1
Wang YL, Liu C, Yang YY, Zhang L, Guo X, Niu C, Zhang NP, Ding J, Wu J. Dynamic changes of gut microbiota in mouse models of metabolic dysfunction-associated steatohepatitis and its transition to hepatocellular carcinoma. FASEB J 2024; 38:e23766. [PMID: 38967214 DOI: 10.1096/fj.202400573rr] [Received: 03/13/2024] [Revised: 06/07/2024] [Accepted: 06/13/2024] [Indexed: 07/06/2024]
Abstract
Dysbiosis of the gut microbiota may account for the pathobiology of simple fatty liver (SFL), metabolic dysfunction-associated steatohepatitis (MASH), fibrotic progression, and transformation to MASH-associated hepatocellular carcinoma (MASH-HCC). The aim of the present study was to investigate gut dysbiosis across this progression. Fecal microbial 16S rRNA sequencing, absolute quantification, and histopathologic and biochemical tests were performed in mice fed a high fat/calorie diet plus high fructose and glucose in drinking water (HFCD-HF/G) or a control diet (CD) for 2 weeks, 16 weeks, or 14 months. Histopathologic examination verified early-stage SFL, MASH, fibrotic, or MASH-HCC progression, with disturbed lipid metabolism, liver injury, and an impaired gut mucosal barrier, as indicated by loss of occludin in the ileal mucosa. Gut dysbiosis occurred as early as 2 weeks, with reduced α diversity and, at the genus level, expansion of Kineothrix, Lactococcus, and Akkermansia and shrinkage of Bifidobacterium, Lactobacillus, and other genera. Dysbiosis was present as early as MASH initiation and became much more profound through MASH-fibrotic and oncogenic progression. Moreover, the expansion of specific species, such as Lactobacillus johnsonii and Kineothrix alysoides, was confirmed by an optimized method for absolute quantification. Dynamic alterations of the gut microbiota were characterized in three stages: early SFL, MASH, and its transformation to HCC. The findings suggest that the extent of dysbiosis tracked MASH progression and its transformation to HCC, and that the shrinkage or emergence of specific microbial species may account, at least in part, for pathologic, metabolic, and immunologic alterations during fibrogenic progression and malignant transition in the liver.
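The abstract reports reduced α diversity. One widely used α-diversity metric is the Shannon index, H = -Σ p_i ln p_i over the relative abundances p_i of the observed taxa; the abstract does not say which index the authors used, so the sketch below is only an illustration on made-up genus counts, not the study's pipeline.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

# Hypothetical genus-level read counts for two samples (not data from the study).
even_sample = [25, 25, 25, 25]    # four equally abundant genera
skewed_sample = [85, 5, 5, 5]     # one genus dominates
print(round(shannon_index(even_sample), 4))    # ln(4) -> 1.3863
print(round(shannon_index(skewed_sample), 4))  # lower diversity -> 0.5875
```

A community dominated by a few expanding genera scores lower than an even one, which is the direction of change the abstract describes.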
Affiliation(s)
- Yu-Li Wang
- Department of Medical Microbiology and Parasitology, MOE/NHC/CAMS Key Laboratory of Medical Molecular Virology, School of Basic Medical Sciences, Fudan University Shanghai Medical College, Shanghai, China
- Chang Liu
- Department of Medical Microbiology and Parasitology, MOE/NHC/CAMS Key Laboratory of Medical Molecular Virology, School of Basic Medical Sciences, Fudan University Shanghai Medical College, Shanghai, China
- Yong-Yu Yang
- Department of Medical Microbiology and Parasitology, MOE/NHC/CAMS Key Laboratory of Medical Molecular Virology, School of Basic Medical Sciences, Fudan University Shanghai Medical College, Shanghai, China
- Li Zhang
- Department of Medical Microbiology and Parasitology, MOE/NHC/CAMS Key Laboratory of Medical Molecular Virology, School of Basic Medical Sciences, Fudan University Shanghai Medical College, Shanghai, China
- Xiao Guo
- Department of Medical Microbiology and Parasitology, MOE/NHC/CAMS Key Laboratory of Medical Molecular Virology, School of Basic Medical Sciences, Fudan University Shanghai Medical College, Shanghai, China
- Chen Niu
- Department of Medical Microbiology and Parasitology, MOE/NHC/CAMS Key Laboratory of Medical Molecular Virology, School of Basic Medical Sciences, Fudan University Shanghai Medical College, Shanghai, China
- Ning-Ping Zhang
- Department of Gastroenterology and Hepatology, Zhongshan Hospital of Fudan University, Shanghai, China
- Shanghai Institute of Liver Diseases, Fudan University Shanghai Medical College, Shanghai, China
- Jia Ding
- Department of Gastroenterology, Shanghai Jing'an District Central Hospital, Fudan University, Shanghai, China
- Jian Wu
- Department of Medical Microbiology and Parasitology, MOE/NHC/CAMS Key Laboratory of Medical Molecular Virology, School of Basic Medical Sciences, Fudan University Shanghai Medical College, Shanghai, China
- Department of Gastroenterology and Hepatology, Zhongshan Hospital of Fudan University, Shanghai, China
- Shanghai Institute of Liver Diseases, Fudan University Shanghai Medical College, Shanghai, China
2
Farzad N, Enninful A, Bao S, Zhang D, Deng Y, Fan R. Spatially resolved epigenome sequencing via Tn5 transposition and deterministic DNA barcoding in tissue. Nat Protoc 2024:10.1038/s41596-024-01013-y. [PMID: 38943021 DOI: 10.1038/s41596-024-01013-y] [Received: 03/21/2023] [Accepted: 04/11/2024] [Indexed: 06/30/2024]
Abstract
Spatial epigenetic mapping of tissues enables the study of gene regulation programs and cellular functions in the context of their local tissue environment. Here we outline a complete procedure for two spatial epigenomic profiling methods: spatially resolved genome-wide profiling of histone modifications using in situ cleavage under targets and tagmentation (CUT&Tag) chemistry (spatial-CUT&Tag), and spatially resolved assay for transposase-accessible chromatin using sequencing (spatial-ATAC-sequencing) for chromatin accessibility. Both assays use in-tissue Tn5 transposition to recognize genomic DNA loci, followed by microfluidic deterministic barcoding to incorporate spatial address codes. Furthermore, neither method requires prior knowledge of transcriptional or epigenetic markers for a given tissue or cell type; both permit unbiased genome-wide profiling pixel by pixel, with a 10 μm pixel size and single-base resolution. To support widespread adoption of these methods, details are provided in five general steps: (1) sample preparation; (2) Tn5 transposition in spatial-ATAC-sequencing or antibody-guided pA-Tn5 tagmentation in spatial-CUT&Tag; (3) library preparation; (4) next-generation sequencing; and (5) data analysis using our custom pipelines, available at https://github.com/dyxmvp/Spatial_ATAC-seq and https://github.com/dyxmvp/spatial-CUT-Tag. The whole procedure can be completed on four samples in 2-3 days. Basic molecular biology and bioinformatics skills, as well as access to a high-performance computing environment, are required. A rudimentary understanding of pathology and specimen sectioning, as well as skills specific to deterministic barcoding in tissue (e.g., design of a multiparameter barcode panel and fabrication of microfluidic devices), is also advantageous.
In this protocol, we mainly focus on spatial profiling of tissue region-specific epigenetic landscapes in mouse embryos and mouse brains using spatial-ATAC-sequencing and spatial-CUT&Tag, but these methods can be used for other species with no need for species-specific probe design.
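To make "spatial address codes" concrete, the sketch below assumes a two-axis layout in which one barcode set is flowed along the rows of the tissue section and a second set along the columns, so each pixel is identified by the pair of barcodes it received. The grid size, the 8-nt barcode sequences, and the helper names `build_pixel_map` and `assign_reads` are hypothetical; the authors' actual decoding is in the pipelines linked above.

```python
from itertools import product

def build_pixel_map(barcodes_a, barcodes_b):
    """Map each concatenated (A, B) barcode tag to its (row, column) pixel.

    Assumes set A is flowed along rows and set B along columns, so pixel (i, j)
    carries the tag A_i + B_j. Tags are unique because all A barcodes share one
    length and all B barcodes share another.
    """
    pixel_map = {}
    for (i, a), (j, b) in product(enumerate(barcodes_a), enumerate(barcodes_b)):
        pixel_map[a + b] = (i, j)
    return pixel_map

def assign_reads(read_tags, pixel_map):
    """Count reads per pixel; tags that match no known barcode pair are dropped."""
    counts = {}
    for tag in read_tags:
        pixel = pixel_map.get(tag)
        if pixel is not None:
            counts[pixel] = counts.get(pixel, 0) + 1
    return counts

# Hypothetical 8-nt barcodes on a toy 3 x 3 grid (real devices use many more channels).
A = ["AACCGGTT", "ACGTACGT", "AGTCAGTC"]
B = ["CATGCATG", "CCAATTGG", "CGCGATAT"]
pixels = build_pixel_map(A, B)
reads = [A[0] + B[1], A[2] + B[2], A[0] + B[1], "N" * 16]
print(assign_reads(reads, pixels))  # {(0, 1): 2, (2, 2): 1}
```

With n row barcodes and m column barcodes, only n + m sequences address n × m pixels, which is why the combinatorial design scales well.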
Affiliation(s)
- Negin Farzad
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Yale Stem Cell Center and Yale Cancer Center, Yale School of Medicine, New Haven, CT, USA
- Archibald Enninful
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Yale Stem Cell Center and Yale Cancer Center, Yale School of Medicine, New Haven, CT, USA
- Shuozhen Bao
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Yale Stem Cell Center and Yale Cancer Center, Yale School of Medicine, New Haven, CT, USA
- Di Zhang
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Yanxiang Deng
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
- Rong Fan
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Yale Stem Cell Center and Yale Cancer Center, Yale School of Medicine, New Haven, CT, USA
- Department of Pathology, Yale School of Medicine, New Haven, CT, USA
- Human and Translational Immunology Program, Yale School of Medicine, New Haven, CT, USA
3
Hlongwane R, Ramaboa KKKM, Mongwe W. Enhancing credit scoring accuracy with a comprehensive evaluation of alternative data. PLoS One 2024; 19:e0303566. [PMID: 38771812 PMCID: PMC11108212 DOI: 10.1371/journal.pone.0303566] [Received: 03/11/2024] [Accepted: 04/27/2024] [Indexed: 05/23/2024]
Abstract
This study explores the potential of alternative data sources to improve the accuracy of credit scoring models compared with relying solely on traditional data sources such as credit bureau data. A comprehensive dataset from the Home Credit Group's home loan portfolio is analysed. The research examines the impact of incorporating alternative predictors that are typically overlooked, such as an applicant's social network default status, regional economic ratings, and local population characteristics. The modelling approach applies the model-X knockoffs framework for systematic variable selection. With these alternative data sources included, the credit scoring models show improved predictive performance, achieving an area under the ROC curve (AUC) of 0.79360 on the Kaggle Home Credit Default Risk competition dataset and outperforming models built on traditional data alone. The findings highlight the value of diverse, non-traditional data sources for strengthening credit risk assessment and overall model accuracy.
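For readers less familiar with the reported metric: the ROC AUC equals the probability that a randomly chosen defaulter receives a higher model score than a randomly chosen non-defaulter, with ties counted as one half. The sketch below computes it directly from that definition on toy labels and scores (not Home Credit data); the quadratic pairwise loop is for clarity only, and production code would use a rank-based or library implementation.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5.

    O(n_pos * n_neg) pairwise comparison: fine for illustration, not for large portfolios.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy default labels (1 = default) and model scores, invented for illustration.
y = [1, 0, 1, 0, 0, 1]
s = [0.9, 0.3, 0.4, 0.4, 0.1, 0.8]
print(roc_auc(y, s))  # 8.5 winning pairs out of 9, about 0.944
```

An AUC of 0.79360 therefore means the model ranks a random defaulter above a random non-defaulter in roughly 79% of such pairs.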
Affiliation(s)
- Rivalani Hlongwane
- Graduate School of Business, University of Cape Town, Cape Town, South Africa
- Wilson Mongwe
- Electrical and Electronic Engineering, University of Johannesburg, Johannesburg, South Africa
4
Fu H, Nicolet D, Mrózek K, Stone RM, Eisfeld A, Byrd JC, Archer KJ. Controlled variable selection in Weibull mixture cure models for high-dimensional data. Stat Med 2022; 41:4340-4366. [PMID: 35792553 PMCID: PMC9545322 DOI: 10.1002/sim.9513] [Received: 12/30/2021] [Revised: 06/14/2022] [Accepted: 06/19/2022] [Indexed: 12/03/2022]
Abstract
Medical breakthroughs in recent years have led to cures for many diseases. The mixture cure model (MCM) is a type of survival model that is often used when a cured fraction exists. Many studies have sought to identify genomic features associated with a time-to-event outcome, which requires variable selection strategies for high-dimensional spaces. However, few variable selection methods currently exist for MCMs, especially when there are more predictors than samples. This study develops high-dimensional penalized Weibull MCMs, which allow identification of prognostic factors associated with cure status and/or survival. We demonstrate how such models may be estimated using two different iterative algorithms. The model-X knockoffs method was combined with these algorithms to control the false discovery rate (FDR) in variable selection. In extensive simulation studies, our penalized MCMs outperformed alternative methods on multiple metrics and achieved high statistical power with the FDR controlled. In an acute myeloid leukemia (AML) application with gene expression data, our proposed approach identified 14 genes associated with potential cure and 12 genes associated with time to relapse, which may help inform treatment decisions for AML patients.
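A mixture cure model is commonly written as S(t) = π + (1 - π) S_u(t), where π is the probability of being cured and S_u is the survival function of uncured subjects; in a Weibull MCM, S_u(t) = exp(-(t / λ)^k). The sketch below evaluates that curve under assumptions not stated in the abstract: a logistic link for π, this particular Weibull parameterization, and invented parameter values. It omits covariates, penalties, and the knockoff step.

```python
import math

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def weibull_survival(t, shape, scale):
    """Latency survival for uncured subjects: S_u(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def mixture_cure_survival(t, cure_logit, shape, scale):
    """Population survival S(t) = pi + (1 - pi) * S_u(t), with pi = logistic(cure_logit)."""
    pi = logistic(cure_logit)
    return pi + (1.0 - pi) * weibull_survival(t, shape, scale)

# Hypothetical parameters: 30% cure probability, Weibull shape 1.5, scale 12 (months).
cure_logit = math.log(0.3 / 0.7)
for t in (0, 12, 60, 240):
    print(t, round(mixture_cure_survival(t, cure_logit, shape=1.5, scale=12.0), 4))
```

The curve starts at 1 and levels off at the cure fraction π rather than falling to 0; that plateau is what distinguishes an MCM from a standard survival model.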
Affiliation(s)
- Han Fu
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, Ohio, USA
- Deedra Nicolet
- Clara D. Bloomfield Center for Leukemia Outcomes Research, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA
- Alliance Statistics and Data Management Center, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA
- Krzysztof Mrózek
- Clara D. Bloomfield Center for Leukemia Outcomes Research, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA
- Richard M. Stone
- Dana‐Farber/Partners Cancer, Harvard University, Boston, Massachusetts, USA
- Ann‐Kathrin Eisfeld
- Clara D. Bloomfield Center for Leukemia Outcomes Research, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA
- John C. Byrd
- Department of Internal Medicine, University of Cincinnati, Cincinnati, Ohio, USA
- Kellie J. Archer
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, Ohio, USA
5
Wang J, Liang H, Zhang Q, Ma S. Replicability in cancer omics data analysis: measures and empirical explorations. Brief Bioinform 2022; 23:bbac304. [PMID: 35876281 PMCID: PMC9487717 DOI: 10.1093/bib/bbac304] [Received: 04/09/2022] [Revised: 06/30/2022] [Accepted: 07/06/2022] [Indexed: 02/05/2023]
Abstract
In biomedical research, the replicability of findings across studies is highly desirable. In this study, we focus on cancer omics data, for which examinations of replicability have mostly concentrated on the important omics variables identified in different studies. Although the published literature contains extensive attention to and ad hoc discussion of replicability, there is insufficient quantitative research on replicability measures and their properties. The goal of this study is to fill this knowledge gap. In particular, we consider three sensible replicability measures, examine their distributional properties, and develop a way of making inference. Applying them to three datasets from The Cancer Genome Atlas (TCGA) reveals generally low replicability and significant variation across datasets. To further understand these findings, we resort to simulation, which confirms the validity of the findings from the TCGA data and further clarifies how replicability depends on signal level (or, equivalently, sample size). Overall, this study can advance our understanding of replicability for cancer omics and other studies that have identification as a key goal.
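The abstract does not define its three replicability measures, so the sketch below is not one of them. It illustrates the general idea with one simple, hypothetical choice: the Jaccard overlap between the sets of variables identified in two studies, which is 1 when the lists agree exactly and 0 when they share nothing. The gene lists are invented.

```python
def jaccard(selected_a, selected_b):
    """Overlap |A & B| / |A | B| of two sets of identified variables (1.0 if both are empty)."""
    a, b = set(selected_a), set(selected_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical gene lists "identified" in two independent datasets.
study1 = ["TP53", "EGFR", "KRAS", "MYC"]
study2 = ["TP53", "KRAS", "BRCA1", "PTEN", "MYC"]
print(jaccard(study1, study2))  # 3 shared / 6 in the union = 0.5
```

A raw overlap like this has no built-in reference point; developing distributional properties and inference for such measures is the gap the study addresses.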
Affiliation(s)
- Jiping Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Hongmin Liang
- Department of Statistics, School of Economics, Xiamen University, Xiamen, Fujian, China
- Qingzhao Zhang
- Department of Statistics, School of Economics, Xiamen University, Xiamen, Fujian, China
- The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen, Fujian, China
- Shuangge Ma
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
6
Li S, Sesia M, Romano Y, Candès E, Sabatti C. Searching for robust associations with a multi-environment knockoff filter. Biometrika 2022; 109:611-629. [PMID: 38633763 PMCID: PMC11022501 DOI: 10.1093/biomet/asab055] [Indexed: 04/19/2024]
Abstract
This paper develops a method based on model-X knockoffs to find conditional associations that are consistent across environments, controlling the false discovery rate. The motivation for this problem is that large data sets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associations replicated under different conditions may be more interesting. In fact, consistency sometimes provably leads to valid causal inferences even if conditional associations do not. While the proposed method is widely applicable, this paper highlights its relevance to genome-wide association studies, in which robustness across populations with diverse ancestries mitigates confounding due to unmeasured variants. The effectiveness of this approach is demonstrated by simulations and applications to the UK Biobank data.
Affiliation(s)
- S Li
- Department of Statistics, Stanford University, Stanford, California 94305, USA
- M Sesia
- Department of Data Sciences and Operations, University of Southern California, Los Angeles, California 90089, USA
- Y Romano
- Departments of Electrical Engineering and of Computer Science, Technion, Haifa, Israel
- E Candès
- Department of Statistics, Stanford University, Stanford, California 94305, USA
- C Sabatti
- Department of Statistics, Stanford University, Stanford, California 94305, USA
7
Li Y, Dai R, Gwon Y, Rennard SI, Make BJ, Foer D, Strand MJ, Austin E, Young KA, Hokanson JE, Pratte KA, Conway R, Kinney GL. Identifying Individual Medications Affecting Pulmonary Outcomes When Multiple Medications are Present. Clin Epidemiol 2022; 14:731-735. [PMID: 35677475 PMCID: PMC9167843 DOI: 10.2147/clep.s364692] [Received: 03/08/2022] [Accepted: 05/19/2022] [Indexed: 11/25/2022]
Affiliation(s)
- Yisha Li
- Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Ran Dai
- Department of Biostatistics, School of Public Health, University of Nebraska Medical Center, Omaha, NE, USA
- Yeongjin Gwon
- Department of Biostatistics, School of Public Health, University of Nebraska Medical Center, Omaha, NE, USA
- Stephen I Rennard
- Division of Pulmonary, Critical Care and Sleep Medicine, University of Nebraska Medical Center, Omaha, NE, USA
- Barry J Make
- Department of Medicine, National Jewish Health, Denver, CO, USA
- Dinah Foer
- Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Erin Austin
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, USA
- Kendra A Young
- Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- John E Hokanson
- Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Rebecca Conway
- Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Gregory L Kinney
- Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Correspondence: Gregory L Kinney, Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA, Tel +1 303-724-4437, Email
8
Liu M, Katsevich E, Janson L, Ramdas A. Fast and powerful conditional randomization testing via distillation. Biometrika 2021. [DOI: 10.1093/biomet/asab039] [Indexed: 12/22/2022]
Abstract
We consider the problem of conditional independence testing: given a response $Y$ and covariates $(X,Z)$, we test the null hypothesis that $Y {\perp\!\!\!\perp} X \mid Z$. The conditional randomization test was recently proposed as a way to use distributional information about $X\mid Z$ to exactly and nonasymptotically control Type-I error using any test statistic in any dimensionality without assuming anything about $Y\mid (X,Z)$. This flexibility, in principle, allows one to derive powerful test statistics from complex prediction algorithms while maintaining statistical validity. Yet the direct use of such advanced test statistics in the conditional randomization test is prohibitively computationally expensive, especially with multiple testing, because the test statistic must be recomputed many times on resampled data. We propose the distilled conditional randomization test, a novel approach to using state-of-the-art machine learning algorithms in the conditional randomization test while drastically reducing the number of times those algorithms need to be run, thereby taking advantage of their power and the conditional randomization test’s statistical guarantees without suffering the usual computational expense. In addition to distillation, we propose a number of other tricks, such as screening and recycling computations, to further speed up the conditional randomization test without sacrificing its high power and exact validity. Indeed, we show in simulations that all our proposals combined lead to a test that has power similar to the most powerful existing conditional randomization test implementations but requires orders of magnitude less computation, making it a practical tool even for large datasets. We demonstrate these benefits on a breast cancer dataset by identifying biomarkers related to cancer stage.
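The plain conditional randomization test that this paper accelerates can be sketched in a few lines: draw K fresh copies of X from its known conditional distribution given Z, recompute the statistic on each, and report p = (1 + #{k : T(X̃ᵏ) ≥ T(X)}) / (K + 1). The sketch assumes a toy Gaussian model X | Z ~ N(0.5 Z, 1) and a simple hand-written statistic; it does not implement distillation, screening, or recycling, which are the paper's contributions.

```python
import random

def crt_pvalue(x, y, z, sample_x_given_z, stat, n_resamples=500, seed=0):
    """Plain (non-distilled) conditional randomization test p-value.

    sample_x_given_z(z, rng) must draw a fresh copy of X from its assumed-known
    conditional distribution given Z; stat(x, y, z) may be any test statistic.
    """
    rng = random.Random(seed)
    t_obs = stat(x, y, z)
    exceed = sum(
        1 for _ in range(n_resamples)
        if stat(sample_x_given_z(z, rng), y, z) >= t_obs
    )
    # The +1 terms count the observed data as one draw, which keeps the test valid.
    return (1 + exceed) / (n_resamples + 1)

# Toy model, assumed for illustration: X | Z ~ N(0.5 * Z, 1).
def sample_x(z, rng):
    return [0.5 * zi + rng.gauss(0.0, 1.0) for zi in z]

def stat(x, y, z):
    # |sum of (X - E[X | Z]) * Y|: large when Y carries information about X beyond Z.
    return abs(sum((xi - 0.5 * zi) * yi for xi, yi, zi in zip(x, y, z)))

data_rng = random.Random(42)
n = 200
z = [data_rng.gauss(0.0, 1.0) for _ in range(n)]
x = sample_x(z, data_rng)
y_alt = [2.0 * xi + zi + data_rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]  # Y depends on X
y_null = [zi + data_rng.gauss(0.0, 1.0) for zi in z]  # Y independent of X given Z
p_alt = crt_pvalue(x, y_alt, z, sample_x, stat)
p_null = crt_pvalue(x, y_null, z, sample_x, stat)
print(p_alt, p_null)
```

The cost is visible in the loop: `stat` runs once per resample, so an expensive machine-learning statistic multiplied by hundreds of resamples and many covariates is exactly the bottleneck that distillation targets.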
Affiliation(s)
- Molei Liu
- Department of Biostatistics, Harvard Chan School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, U.S.A
- Eugene Katsevich
- Department of Statistics and Data Science, Wharton School of the University of Pennsylvania, 265 South 37th Street, Philadelphia, Pennsylvania 19104, U.S.A
- Lucas Janson
- Department of Statistics, Harvard University, One Oxford Street, Cambridge, Massachusetts 02138, U.S.A
- Aaditya Ramdas
- Department of Statistics & Data Science, Carnegie Mellon University, 132H Baker Hall, Pittsburgh, Pennsylvania 15213, U.S.A
9
Chia C, Sesia M, Ho CS, Jeffrey SS, Dionne J, Candes EJ, Howe RT. Interpretable Classification of Bacterial Raman Spectra with Knockoff Wavelets. IEEE J Biomed Health Inform 2021; 26:740-748. [PMID: 34232897 DOI: 10.1109/jbhi.2021.3094873] [Indexed: 11/07/2022]
Abstract
Deep neural networks and other machine learning models are widely applied to biomedical signal data because they can detect complex patterns and compute accurate predictions. However, the difficulty of interpreting such models is a limitation, especially for applications involving high-stakes decisions, including the identification of bacterial infections. This paper considers fast Raman spectroscopy data and demonstrates that a logistic regression model with carefully selected features achieves accuracy comparable to that of neural networks, while being much simpler and more transparent. Our analysis leverages wavelet features with intuitive chemical interpretations and performs controlled variable selection with knockoffs to ensure that the predictors are relevant and non-redundant. Although we focus on a particular data set, the proposed approach is broadly applicable to other types of signal data for which interpretability may be important.
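The abstract does not name the wavelet family, so the sketch below assumes the simplest one, the orthonormal Haar transform, to show how a spectrum becomes a set of localized coefficients. Each coefficient summarizes a particular band position and scale, which is what makes such features candidates for interpretable, knockoff-controlled logistic regression (the selection and classification steps are not shown). The toy spectrum is invented.

```python
import math

def haar_dwt(signal):
    """Full multilevel orthonormal Haar transform of a length-2^k signal.

    Returns [overall approximation] followed by detail coefficients ordered from
    the coarsest level to the finest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a positive power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Prepend this level's details so the coarsest level ends up first.
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details

# Toy 8-point "spectrum" with a single sharp peak (hypothetical intensities).
spectrum = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
coeffs = haar_dwt(spectrum)
print([round(c, 3) for c in coeffs])
```

Because the transform is orthonormal, the sum of squared coefficients equals the signal's energy, and the single peak shows up in only a few nonzero coefficients, one per scale.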
10
Jiang T, Li Y, Motsinger-Reif AA. Knockoff boosted tree for model-free variable selection. Bioinformatics 2021; 37:976-983. [PMID: 32966559 DOI: 10.1093/bioinformatics/btaa770] [Received: 04/04/2020] [Revised: 08/17/2020] [Accepted: 09/09/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The recently proposed knockoff filter is a general framework for controlling the false discovery rate (FDR) when performing variable selection. This powerful approach generates a 'knockoff' of each tested variable to achieve exact FDR control: imitation variables that mimic the correlation structure of the original variables serve as negative controls for statistical inference. Current applications of knockoff methods use linear regression models and conduct variable selection only for variables that appear in the model function. Here, we extend knockoffs to machine learning with boosted trees, which are successful and widely used in problems where no prior knowledge of the model function is required. However, the importance scores currently available in tree models are insufficient for variable selection with FDR control. RESULTS: We propose a novel strategy for variable selection without prior knowledge of model topology, using the knockoff method with boosted tree models, thereby extending the knockoff method to model-free variable selection. Additionally, we propose and evaluate two new sampling methods for generating knockoffs, namely the sparse covariance and principal component knockoff methods, and compare them with the original knockoff method in terms of type I error control and power. In simulation tests, we compare the properties and performance of importance test statistics for tree models across different combinations of knockoffs and importance test statistics. We consider scenarios that include main-effect, interaction, exponential, and second-order models while assuming that the true model structures are unknown. We apply our algorithm to tumor purity estimation and tumor classification using The Cancer Genome Atlas (TCGA) gene expression data. Our results show improved discrimination between difficult-to-discriminate cancer types.
AVAILABILITY AND IMPLEMENTATION: The proposed algorithm is included in the KOBT package, which is available at https://cran.r-project.org/web/packages/KOBT/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
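Once importance scores exist for each original variable and its knockoff (however they are computed, e.g. from a boosted tree), the knockoff filter reduces to a thresholding rule. The sketch below uses the difference statistic W_j = Z_j - Z̃_j and the knockoff+ threshold of Barber and Candès: the smallest t with (1 + #{W_j ≤ -t}) / max(1, #{W_j ≥ t}) ≤ q. This is a generic illustration, not the KOBT package's API; generating the knockoffs themselves is not shown, and the scores are invented integers.

```python
def knockoff_plus_select(importance, knockoff_importance, q=0.1):
    """Knockoff+ selection from paired importance scores.

    W_j = importance_j - knockoff_importance_j; a large positive W_j means the
    original variable beats its knockoff. Returns indices selected at target FDR q.
    """
    w = [a - b for a, b in zip(importance, knockoff_importance)]
    candidates = sorted({abs(v) for v in w if v != 0})
    for t in candidates:  # the smallest qualifying threshold gives the largest selection
        n_neg = sum(1 for v in w if v <= -t)
        n_pos = sum(1 for v in w if v >= t)
        if (1 + n_neg) / max(1, n_pos) <= q:
            return [j for j, v in enumerate(w) if v >= t]
    return []  # no threshold achieves the bound: select nothing

# Hypothetical scores: variables 0-5 are strong signals, 6-11 are nulls.
imp = [50, 40, 38, 30, 25, 27, 2, 3, 4, 1, 5, 2]
ko = [3, 3, 3, 3, 3, 3, 5, 5, 3, 5, 3, 3]
print(knockoff_plus_select(imp, ko, q=0.2))  # [0, 1, 2, 3, 4, 5]
```

Null variables are equally likely to beat or lose to their knockoffs, so the count of negative W values estimates how many false positives sit above a threshold; the "+1" in the numerator is what makes the FDR guarantee exact rather than approximate.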
Affiliation(s)
- Tao Jiang
- Department of Statistics, Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA
- Yuanyuan Li
- Biostatistics & Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC 27709, USA
- Alison A Motsinger-Reif
- Biostatistics & Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC 27709, USA
11
Fu H, Archer KJ. High-dimensional variable selection for ordinal outcomes with error control. Brief Bioinform 2020; 22:334-345. [PMID: 32031572 DOI: 10.1093/bib/bbaa007] [Received: 09/23/2019] [Revised: 01/06/2020] [Indexed: 12/24/2022]
Abstract
Many high-throughput genomic applications involve a large set of potential covariates and a response that is frequently measured on an ordinal scale, and it is crucial to identify which variables are truly associated with the response. Effectively controlling the false discovery rate (FDR) without sacrificing power has been a major challenge in variable selection research. This study reviews two existing variable selection frameworks, model-X knockoffs and a modified version of reference distribution variable selection (RDVS), both of which use artificial variables as benchmarks for decision making. Model-X knockoffs constructs a 'knockoff' variable for each covariate to mimic the covariance structure, while RDVS generates only one null variable and forms a reference distribution by performing multiple runs of model fitting. Herein, we describe how different importance measures for ordinal responses that fit into these two selection frameworks can be constructed, using either penalized regression or machine learning techniques. We compared these measures in terms of FDR and power using simulated data. Moreover, we applied both frameworks to high-throughput methylation data to identify features associated with progression from normal liver tissue to hepatocellular carcinoma, to further compare and contrast their performance.
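The reference-distribution idea described above can be sketched as follows: in each of many runs, generate one pure-noise null variable, record its importance, and use a high empirical quantile of those null importances as the cut-off that real features must exceed. This is a deliberately simplified, hypothetical version: it scores each variable marginally by absolute Pearson correlation with the integer-coded ordinal response, whereas the study fits penalized regression or machine learning models, and the details of its modified RDVS are not given in the abstract. The data are simulated.

```python
import random
import statistics

def abs_corr(x, y):
    """|Pearson correlation|, used here as a stand-in importance measure."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5

def reference_distribution_select(features, y, n_runs=200, alpha=0.05, seed=0):
    """Simplified reference-distribution selection.

    Each run generates one synthetic null variable (pure noise) and records its
    importance; the (1 - alpha) empirical quantile of those values is the cut-off
    that real features must exceed to be selected.
    """
    rng = random.Random(seed)
    null_scores = sorted(
        abs_corr([rng.gauss(0.0, 1.0) for _ in y], y) for _ in range(n_runs)
    )
    cutoff = null_scores[min(n_runs - 1, int((1 - alpha) * n_runs))]
    return [name for name, x in features.items() if abs_corr(x, y) > cutoff]

rng = random.Random(1)
n = 150
y = [rng.choice([0, 1, 2]) for _ in range(n)]  # ordinal stage coded 0 < 1 < 2
features = {
    "informative": [stage + rng.gauss(0.0, 0.7) for stage in y],
    "noise_a": [rng.gauss(0.0, 1.0) for _ in range(n)],
    "noise_b": [rng.gauss(0.0, 1.0) for _ in range(n)],
}
selected = reference_distribution_select(features, y)
print(selected)
```

Unlike model-X knockoffs, which pair every covariate with its own knockoff, this scheme needs only one null variable per run and builds its benchmark from repetition.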
12
Applications of Bioinformatics in Cancer. Cancers (Basel) 2019; 11:cancers11111630. [PMID: 31652939 PMCID: PMC6893424 DOI: 10.3390/cancers11111630] [Received: 10/22/2019] [Accepted: 10/23/2019] [Indexed: 01/02/2023]