1
McDonnell EI, Xie S, Marder K, Cui F, Wang Y. Dynamic undirected graphical models for time-varying clinical symptom and neuroimaging networks. Stat Med 2024; 43:4131-4147. [PMID: 39007408 DOI: 10.1002/sim.10143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2023] [Revised: 05/24/2024] [Accepted: 05/27/2024] [Indexed: 07/16/2024]
Abstract
In this work, we propose methods to examine how the complex interrelationships between clinical symptoms and, separately, brain imaging biomarkers change over time leading up to the diagnosis of a disease in subjects with a known genetic near-certainty of disease. We propose a time-dependent undirected graphical model that ensures temporal and structural smoothness across time-specific networks to examine the trajectories of interactions between markers aligned at the time of disease onset. Specifically, we anchor subjects relative to the time of disease diagnosis (anchoring time) as in a revival process, and we estimate networks at each time point of interest relative to the anchoring time. To use all available data, we apply kernel weights to borrow information across observations that are close to the time of interest. Adaptive lasso weights are introduced to encourage temporal smoothness in edge strength, while a novel elastic fused-$l_0$ penalty removes spurious edges and encourages temporal smoothness in network structure. Our approach can handle practical complications such as unbalanced visit times. We conduct simulation studies to compare our approach with existing methods. We then apply our method to data from PREDICT-HD, a large prospective observational study of pre-manifest Huntington's disease (HD) patients, to identify symptom and imaging network changes that precede clinical diagnosis of HD.
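The kernel-weighting step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel, a hand-picked bandwidth, and made-up visit times; the function names `gaussian_kernel_weights` and `weighted_covariance` are hypothetical, and this is not the authors' estimator, which additionally applies adaptive lasso weights and the fused penalty.

```python
import math

def gaussian_kernel_weights(visit_times, target_time, bandwidth):
    """Normalized Gaussian kernel weights; visits near target_time get more weight."""
    raw = [math.exp(-0.5 * ((t - target_time) / bandwidth) ** 2) for t in visit_times]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_covariance(x, y, weights):
    """Kernel-weighted covariance of two markers (weights must sum to 1)."""
    mean_x = sum(w * a for w, a in zip(weights, x))
    mean_y = sum(w * b for w, b in zip(weights, y))
    return sum(w * (a - mean_x) * (b - mean_y) for w, a, b in zip(weights, x, y))

# Hypothetical visit times in years relative to diagnosis (negative = before).
visit_times = [-4.0, -2.5, -2.0, -1.0, 0.5]
# Weights for estimating a network two years before diagnosis (bandwidth assumed).
weights = gaussian_kernel_weights(visit_times, target_time=-2.0, bandwidth=1.0)
```

A weighted covariance built this way could feed a time-specific graphical model; a smaller bandwidth localizes the estimate in time at the cost of using fewer effective observations.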
Affiliation(s)
- Erin I McDonnell
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, New York
- Shanghong Xie
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, New York
  - Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
- Karen Marder
  - Department of Neurology, Columbia University Medical Center, New York, New York
  - Department of Psychiatry, Columbia University Medical Center, New York, New York
  - The Taub Institute for Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, New York
  - Gertrude H. Sergievsky Center, Columbia University Medical Center, New York, New York
- Fanyu Cui
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, New York
- Yuanjia Wang
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, New York
  - Department of Psychiatry, Columbia University Medical Center, New York, New York
2
Yao S, Li K, Li T, Yu X, Kuan PF, Wang X. GPS-Net: discovering prognostic pathway modules based on network regularized kernel learning. bioRxiv 2024:2024.07.15.603645. [PMID: 39071382 PMCID: PMC11275840 DOI: 10.1101/2024.07.15.603645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
The search for prognostic biomarkers capable of predicting patient outcomes, by analyzing gene expression in tissue samples and other molecular profiles, still relies largely on single-gene-based or global-gene-search approaches. Gene-centric approaches, while foundational, fail to capture the higher-order dependencies that reflect the activities of co-regulated processes, pathway alterations, and regulatory networks, all of which are crucial in determining patient outcomes in complex diseases like cancer. Here, we introduce GPS-Net, a computational framework that fills the gap in efficiently identifying prognostic modules by incorporating holistic pathway structures and the network of gene interactions. By incorporating advanced multiple kernel learning techniques and network-based regularization, the proposed method not only enhances the accuracy of biomarker and pathway identification but also significantly reduces computational complexity, as demonstrated by extensive simulation studies. Applying GPS-Net, we identified key pathways that are predictive of patient outcomes in a cancer immunotherapy study. Overall, our approach provides a novel framework that renders genome-wide pathway-level prognostic analysis both feasible and scalable, combining mechanism-driven and data-driven approaches for precision genomics.
Affiliation(s)
- Sijie Yao
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institution, Tampa, Florida, 33612, USA
- Kaiqiao Li
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, 11794, USA
- Tingyi Li
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institution, Tampa, Florida, 33612, USA
- Xiaoqing Yu
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institution, Tampa, Florida, 33612, USA
- Pei Fen Kuan
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, 11794, USA
- Xuefeng Wang
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institution, Tampa, Florida, 33612, USA
3
Liu Z, Wang H. Simultaneous variable selection and estimation for survival data via the Gaussian seamless-$L_0$ penalty. Stat Med 2024; 43:1509-1526. [PMID: 38320545 DOI: 10.1002/sim.10031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2022] [Revised: 01/20/2024] [Accepted: 01/23/2024] [Indexed: 02/08/2024]
Abstract
We propose a new simultaneous variable selection and estimation procedure with the Gaussian seamless-$L_0$ (GSELO) penalty for the Cox proportional hazards model and the additive hazards model. The GSELO procedure shows good potential to improve on existing variable selection methods by borrowing strength from both best subset selection (BSS) and regularization. In addition, we develop an iterative algorithm to implement the proposed procedure in a computationally efficient way. Theoretically, we establish the convergence properties of the algorithm and the asymptotic properties of the proposed procedure. Since parameter tuning is crucial to the performance of the GSELO procedure, we also propose an extended Bayesian information criterion (EBIC) parameter selector for the GSELO procedure. Simulated and real data studies demonstrate the prediction performance and effectiveness of the proposed method relative to several state-of-the-art methods.
Affiliation(s)
- Zili Liu
  - School of Mathematics and Statistics, Central South University, Changsha, Hunan, China
- Hong Wang
  - School of Mathematics and Statistics, Central South University, Changsha, Hunan, China
4
Frommlet F. A neutral comparison of algorithms to minimize L0 penalties for high-dimensional variable selection. Biom J 2024; 66:e2200207. [PMID: 37421205 DOI: 10.1002/bimj.202200207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 03/09/2023] [Accepted: 04/29/2023] [Indexed: 07/10/2023]
Abstract
Variable selection methods based on L0 penalties have excellent theoretical properties to select sparse models in a high-dimensional setting. There exist modifications of the Bayesian Information Criterion (BIC) which either control the familywise error rate (mBIC) or the false discovery rate (mBIC2) in terms of which regressors are selected to enter a model. However, the minimization of L0 penalties comprises a mixed-integer problem which is known to be NP-hard and therefore becomes computationally challenging with increasing numbers of regressor variables. This is one reason why alternatives like the LASSO have become so popular, which involve convex optimization problems that are easier to solve. The last few years have seen some real progress in developing new algorithms to minimize L0 penalties. The aim of this article is to compare the performance of these algorithms in terms of minimizing L0-based selection criteria. Simulation studies covering a wide range of scenarios that are inspired by genetic association studies are used to compare the values of selection criteria obtained with different algorithms. In addition, some statistical characteristics of the selected models and the runtime of algorithms are compared. Finally, the performance of the algorithms is illustrated in a real data example concerned with expression quantitative trait loci (eQTL) mapping.
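The contrast between L0 and LASSO penalties drawn in this abstract is easiest to see in the special case of an orthonormal design, where both problems decouple into one-dimensional problems with closed-form solutions. The sketch below covers only that textbook special case; the function names and the example values are illustrative assumptions, and none of the algorithms compared in the article are reproduced here.

```python
import math

def hard_threshold(z, lam):
    """Minimizer of 0.5*(z - b)**2 + lam*[b != 0] (L0 penalty): keep z or drop it."""
    return z if z * z > 2.0 * lam else 0.0

def soft_threshold(z, lam):
    """Minimizer of 0.5*(z - b)**2 + lam*abs(b) (LASSO penalty): shrink toward zero."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

# Hypothetical least-squares coefficients under an orthonormal design.
z = [3.0, -0.4, 1.2, 0.1]
lam = 0.5
l0_fit = [hard_threshold(v, lam) for v in z]  # selected coefficients stay unshrunk
l1_fit = [soft_threshold(v, lam) for v in z]  # selected coefficients are shrunk by lam
```

With a general, correlated design no such closed form exists for the L0 problem, which is where the combinatorial difficulty described in the abstract comes from.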
Affiliation(s)
- Florian Frommlet
  - Institute of Medical Statistics, Center for Medical Data Science, Medical University of Vienna, Vienna, Austria
5
Halkola AS, Joki K, Mirtti T, Mäkelä MM, Aittokallio T, Laajala TD. OSCAR: Optimal subset cardinality regression using the L0-pseudonorm with applications to prognostic modelling of prostate cancer. PLoS Comput Biol 2023; 19:e1010333. [PMID: 36897911 PMCID: PMC10032505 DOI: 10.1371/journal.pcbi.1010333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Revised: 03/22/2023] [Accepted: 02/16/2023] [Indexed: 03/11/2023] Open
Abstract
In many real-world applications, such as those based on electronic health records, prognostic prediction of patient survival is based on heterogeneous sets of clinical laboratory measurements. To address the trade-off between the predictive accuracy of a prognostic model and the costs related to its clinical implementation, we propose an optimized L0-pseudonorm approach to learn sparse solutions in multivariable regression. The model sparsity is maintained by restricting the number of nonzero coefficients in the model with a cardinality constraint, which makes the optimization problem NP-hard. In addition, we generalize the cardinality constraint for grouped feature selection, which makes it possible to identify key sets of predictors that may be measured together in a kit in clinical practice. We demonstrate the operation of our cardinality constraint-based feature subset selection method, named OSCAR, in the context of prognostic prediction of prostate cancer patients, where it enables one to determine the key explanatory predictors at different levels of model sparsity. We further explore how the model sparsity affects the model accuracy and implementation cost. Lastly, we demonstrate generalization of the presented methodology to high-dimensional transcriptomics data.
Affiliation(s)
- Anni S. Halkola
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Kaisa Joki
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Tuomas Mirtti
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Department of Pathology, Diagnostic Center, Helsinki University Hospital, Helsinki, Finland
  - Department of Biomedical Engineering, School of Medicine, Emory University, Atlanta, Georgia, United States of America
- Marko M. Mäkelä
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Tero Aittokallio
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Oslo Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Oslo, Norway
- Teemu D. Laajala
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
  - Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
6
Identifying phenotype-associated subpopulations by integrating bulk and single-cell sequencing data. Nat Biotechnol 2022; 40:527-538. [PMID: 34764492 PMCID: PMC9010342 DOI: 10.1038/s41587-021-01091-3] [Citation(s) in RCA: 144] [Impact Index Per Article: 72.0] [Received: 06/05/2020] [Accepted: 09/10/2021] [Indexed: 02/07/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) distinguishes cell types, states and lineages within the context of heterogeneous tissues. However, current single-cell data cannot directly link cell clusters with specific phenotypes. Here we present Scissor, a method that identifies cell subpopulations from single-cell data that are associated with a given phenotype. Scissor integrates phenotype-associated bulk expression data and single-cell data by first quantifying the similarity between each single cell and each bulk sample. It then optimizes a regression model on the correlation matrix with the sample phenotype to identify relevant subpopulations. Applied to a lung cancer scRNA-seq dataset, Scissor identified subsets of cells associated with worse survival and with TP53 mutations. In melanoma, Scissor discerned a T cell subpopulation with low PDCD1/CTLA4 and high TCF7 expression associated with an immunotherapy response. Beyond cancer, Scissor was effective in interpreting facioscapulohumeral muscular dystrophy and Alzheimer's disease datasets. Scissor identifies biologically and clinically relevant cell subpopulations from single-cell assays by leveraging phenotype and bulk-omics datasets.
7
Li L, Liu ZP. Detecting prognostic biomarkers of breast cancer by regularized Cox proportional hazards models. J Transl Med 2021; 19:514. [PMID: 34930307 PMCID: PMC8686664 DOI: 10.1186/s12967-021-03180-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/16/2021] [Accepted: 12/03/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND The successful identification of breast cancer (BRCA) prognostic biomarkers is essential for strategic intervention in BRCA patients. Recently, various methods have been proposed for exploring a small prognostic gene set that can distinguish the high-risk group from the low-risk group. METHODS Regularized Cox proportional hazards (RCPH) models were proposed to discover prognostic biomarkers of BRCA from gene expression data. First, the maximum connected network with 1142 genes was constructed by mapping 956 differentially expressed genes (DEGs) and 677 previously BRCA-related genes into the gene regulatory network (GRN). Then, the 72 union genes of the four feature gene sets identified by Lasso-RCPH, Enet-RCPH, [Formula: see text]-RCPH and SCAD-RCPH models were recognized as the robust prognostic biomarkers. These biomarkers were validated by literature checks, BRCA-specific GRN and functional enrichment analysis. Finally, an index of prognostic risk score (PRS) for BRCA was established based on univariate and multivariate Cox regression analysis. Survival analysis was performed to investigate the PRS on 1080 BRCA patients from the internal validation. In particular, a nomogram was constructed to express the relationship between PRS and other clinical information on the discovery dataset. The PRS was also verified on 1848 BRCA patients of ten external validation datasets or collected cohorts. RESULTS The nomogram highlighted the importance of the PRS in guiding the prognosis of BRCA patients. In addition, the PRS of 301 normal samples and 306 tumor samples from five independent datasets showed that it is significantly higher in tumors than in normal tissues ([Formula: see text]). The protein expression profiles of the three genes involved in the PRS model, i.e., ADRB1, SAV1 and TSPAN14, demonstrated that the latter two genes are more strongly stained in tumor specimens. More importantly, the high-risk group had worse survival than the low-risk group ([Formula: see text]) in both internal and external validations. CONCLUSIONS The proposed pipelines for detecting and validating prognostic biomarker genes for BRCA are effective and efficient. Moreover, the proposed PRS is very promising as an important indicator for judging the prognosis of BRCA patients.
Affiliation(s)
- Lingyu Li
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, China
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, China
8
Generative Adversarial Network-Based Scheme for Diagnosing Faults in Cyber-Physical Power Systems. Sensors 2021; 21:s21155173. [PMID: 34372410 PMCID: PMC8348776 DOI: 10.3390/s21155173] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/30/2021] [Revised: 07/25/2021] [Accepted: 07/27/2021] [Indexed: 11/17/2022]
Abstract
This paper presents a novel diagnostic framework for distributed power systems that is based on using generative adversarial networks for generating artificial knockoffs in the power grid. The proposed framework makes use of the raw data measurements including voltage, frequency, and phase-angle that are collected from each bus in the cyber-physical power systems. The collected measurements are firstly fed into a feature selection module, where multiple state-of-the-art techniques have been used to extract the most informative features from the initial set of available features. The selected features are inputs to a knockoff generation module, where the generative adversarial networks are employed to generate the corresponding knockoffs of the selected features. The generated knockoffs are then fed into a classification module, in which two different classification models are used for the sake of fault diagnosis. Multiple experiments have been designed to investigate the effect of noise, fault resistance value, and sampling rate on the performance of the proposed framework. The effectiveness of the proposed framework is validated through a comprehensive study on the IEEE 118-bus system.
9
Li X, Ivanova A, Tian H, Lim P, Liu K. Continual reassessment method with regularization in phase I clinical trials. J Biopharm Stat 2020; 30:964-978. [PMID: 32926652 DOI: 10.1080/10543406.2020.1818251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
Many Phase I trial designs have been developed to improve upon the standard 3+3 design. These designs can be classified as long-memory designs, for example, the continual reassessment method (CRM), and short-memory designs such as the modified toxicity probability interval (mTPI) design. Long-memory designs use all data, but their performance can be negatively affected by model misspecification. Short-memory designs use only data at the current dose and might lose efficiency as a result. To overcome these issues, we propose a regularized CRM (rCRM). The rCRM offers a trade-off between long-memory and short-memory methods. The rCRM gives more weight to data obtained at the doses with the estimated probability of toxicity closer to the target toxicity rate. The addition of a regularization term has the effect of shrinking the dimension of the model and leads to improved performance of the 2-parameter CRM. The rCRM is a good design choice to guide assignments in an expansion cohort phase of a dose-finding trial since dose assignments do not seem to change as often as in corresponding CRMs.
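For readers unfamiliar with the long-memory design that this abstract builds on, the following is a minimal sketch of a standard one-parameter CRM with a power model, evaluated on a grid. It is not the regularized rCRM proposed in the article; the skeleton, the normal prior standard deviation, the grid range, and the function name `crm_recommend` are all assumptions chosen for illustration.

```python
import math

def crm_recommend(skeleton, doses_given, tox_observed, target, prior_sd=1.16):
    """One-parameter power-model CRM: p_i(a) = skeleton[i] ** exp(a), a ~ N(0, prior_sd^2).

    Returns the index of the dose whose posterior mean toxicity is closest
    to the target, together with the posterior mean toxicity at every dose.
    """
    grid = [-4.0 + 8.0 * i / 400 for i in range(401)]  # grid approximation of the posterior
    post = []
    for a in grid:
        log_w = -0.5 * (a / prior_sd) ** 2  # log prior, up to a constant
        for d, y in zip(doses_given, tox_observed):
            p = skeleton[d] ** math.exp(a)
            log_w += math.log(p) if y else math.log(1.0 - p)  # uses all accumulated data
        post.append(math.exp(log_w))
    total = sum(post)
    est = [
        sum(w * skeleton[i] ** math.exp(a) for a, w in zip(grid, post)) / total
        for i in range(len(skeleton))
    ]
    best = min(range(len(skeleton)), key=lambda i: abs(est[i] - target))
    return best, est

# Hypothetical prior toxicity guesses at four doses and a 25% target rate.
skeleton = [0.05, 0.12, 0.25, 0.40]
dose_prior, est_prior = crm_recommend(skeleton, [], [], target=0.25)
# Three patients treated at the lowest dose, all with toxicity.
dose_tox, est_tox = crm_recommend(skeleton, [0, 0, 0], [1, 1, 1], target=0.25)
```

Because the likelihood pools every patient regardless of dose, a misspecified skeleton influences all later recommendations, which is the sensitivity the abstract attributes to long-memory designs.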
Affiliation(s)
- Xiang Li
  - Statistics and Decision Sciences, Janssen Research & Development, LLC, Raritan, NJ, USA
- Anastasia Ivanova
  - Department of Biostatistics, University of North Carolina at Chapel Hill, NC, USA
- Hong Tian
  - Statistics and Decision Sciences, Janssen Research & Development, LLC, Raritan, NJ, USA
- Pilar Lim
  - Statistics and Decision Sciences, Janssen Research & Development, LLC, Titusville, NJ, USA
- Kevin Liu
  - Biostatistics, Genmab, Princeton, NJ, USA
10
Li X, Li Q, Zeng D, Marder K, Paulsen J, Wang Y. Time-varying Hazards Model for Incorporating Irregularly Measured, High-Dimensional Biomarkers. Stat Sin 2020; 30:1605-1632. [PMID: 32952367 PMCID: PMC7497773 DOI: 10.5705/ss.202017.0375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Clinical studies with time-to-event outcomes often collect measurements of a large number of time-varying covariates over time (e.g., clinical assessments or neuroimaging biomarkers) to build time-sensitive prognostic models. An emerging challenge is that, because data collection can be resource-intensive or invasive (e.g., lumbar puncture), biomarkers may be measured infrequently and thus not be available at every observed event time point. Leveraging all available, infrequently measured time-varying biomarkers to improve prognostic models of event occurrence is an important and challenging problem. In this paper, we propose a kernel-smoothing based approach to borrow information across subjects to remedy infrequent and unbalanced biomarker measurements under a time-varying hazards model. A penalized pseudo-likelihood function is proposed for estimation, and an efficient augmented penalization minimization algorithm related to the alternating direction method of multipliers (ADMM) is adopted for computation. Under some regularity conditions to carefully control approximation bias and stochastic variability, we show that even in the presence of ultra-high dimensionality, the proposed method selects important biomarkers with high probability. Through extensive simulation studies, we demonstrate superior estimation and selection performance compared to alternative methods. Finally, we apply the proposed method to analyze a recently completed real-world study to model time to disease conversion using longitudinal, whole-brain structural magnetic resonance imaging (MRI) biomarkers, and show a substantial improvement in performance over current standards, including using baseline measures only.
Affiliation(s)
- Quefeng Li
  - University of North Carolina, Chapel Hill
11
Vinga S. Structured sparsity regularization for analyzing high-dimensional omics data. Brief Bioinform 2020; 22:77-87. [PMID: 32597465 DOI: 10.1093/bib/bbaa122] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/15/2019] [Revised: 05/15/2020] [Accepted: 05/18/2020] [Indexed: 12/18/2022] Open
Abstract
The development of new molecular and cell technologies is having a significant impact on the quantity of data generated nowadays. The growth of omics databases is creating a considerable potential for knowledge discovery and, concomitantly, is bringing new challenges to statistical learning and computational biology for health applications. Indeed, the high dimensionality of these data may hamper the use of traditional regression methods and parameter estimation algorithms due to the intrinsic non-identifiability of the inherent optimization problem. Regularized optimization has been rising as a promising and useful strategy to solve these ill-posed problems by imposing additional constraints in the solution parameter space. In particular, the field of statistical learning with sparsity has been significantly contributing to building accurate models that also bring interpretability to biological observations and phenomena. Beyond the now-classic elastic net, one of the best-known methods that combine lasso with ridge penalizations, we briefly overview recent literature on structured regularizers and penalty functions that have been applied in biomedical data to build parsimonious models in a variety of underlying contexts, from survival to generalized linear models. These methods include functions of $\ell_k$-norms and network-based penalties that take into account the inherent relationships between the features. The successful application to omics data illustrates the potential of sparse structured regularization for identifying molecular signatures of disease and for creating high-performance clinical decision support systems towards more personalized healthcare. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
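For orientation, the two penalty families named in this abstract can be written out explicitly. The notation below is an assumption for illustration and is not taken from the review: $\mathcal{L}(\beta)$ stands for any loss (for example, a negative log partial likelihood), $\lambda$ and $\alpha$ are tuning parameters in one common parameterization of the elastic net, and $W$ is a nonnegative feature-similarity matrix with degree matrix $D$.

```latex
% Elastic net: convex combination of lasso (l1) and ridge (squared l2) penalties
\hat{\beta}_{\mathrm{enet}}
  = \arg\min_{\beta} \Big\{ \mathcal{L}(\beta)
  + \lambda \Big[ \alpha \lVert \beta \rVert_1
  + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \Big] \Big\},
  \qquad \lambda \ge 0,\ 0 \le \alpha \le 1.

% A generic network-based penalty: lasso plus a graph-Laplacian smoothness term,
% with L = D - W the unnormalized Laplacian of the feature graph with edge set E
P_{\mathrm{net}}(\beta)
  = \lambda_1 \lVert \beta \rVert_1 + \lambda_2\, \beta^{\top} L \beta,
  \qquad
  \beta^{\top} L \beta = \sum_{(j,k) \in E} w_{jk} (\beta_j - \beta_k)^2 .
```

Setting $\alpha = 1$ recovers the lasso and $\alpha = 0$ recovers ridge regression; the Laplacian term encourages connected features to receive similar coefficients.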
Affiliation(s)
- Susana Vinga
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
12
Xu H, Li X, Yang Y, Li Y, Pinheiro J, Sasser K, Hamadeh H, Steven X, Yuan M. High-throughput and efficient multilocus genome-wide association study on longitudinal outcomes. Bioinformatics 2020; 36:3004-3010. [PMID: 32096821 DOI: 10.1093/bioinformatics/btaa120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/29/2019] [Revised: 01/16/2020] [Accepted: 02/18/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION With the emergence of high-dimensional genomic data, genetic analyses such as genome-wide association studies (GWAS) have played an important role in identifying disease-related genetic variants and novel treatments. Complex longitudinal phenotypes are commonly collected in medical studies. However, since limited analytical approaches are available for longitudinal traits, these data are often underutilized. In this article, we develop a high-throughput machine learning approach for multilocus GWAS using longitudinal traits by coupling Empirical Bayesian Estimates from mixed-effects modeling with a novel ℓ0-norm algorithm. RESULTS Extensive simulations demonstrated that the proposed approach provided not only accurate selection of single nucleotide polymorphisms (SNPs) with comparable or higher power but also robust control of false positives. More importantly, this novel approach is highly scalable and can be more than 1000 times faster than recently published approaches, making genome-wide multilocus analysis of longitudinal traits possible. In addition, our proposed approach can simultaneously analyze millions of SNPs if the computer memory allows, thereby potentially allowing a true multilocus analysis for high-dimensional genomic data. With application to the data from the Alzheimer's Disease Neuroimaging Initiative, we confirmed that our approach can identify well-known SNPs associated with AD and was much faster than recently published approaches (≥6000 times). AVAILABILITY AND IMPLEMENTATION The source code and the testing datasets are available at https://github.com/Myuan2019/EBE_APML0. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Huang Xu
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Xiang Li
  - Janssen Research and Development, Raritan, NJ 08869, USA
- Yaning Yang
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Yi Li
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Jose Pinheiro
  - Janssen Research and Development, Raritan, NJ 08869, USA
- Xu Steven
  - Genmab US, Inc., Princeton, NJ 08540, USA
- Min Yuan
  - School of Public Health Administration, Anhui Medical University, Hefei 230032, China
13
Van den Hove A, Verwaeren J, Van den Bossche J, Theunis J, De Baets B. Development of a land use regression model for black carbon using mobile monitoring data and its application to pollution-avoiding routing. Environmental Research 2020; 183:108619. [PMID: 31836206 DOI: 10.1016/j.envres.2019.108619] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/13/2018] [Revised: 05/02/2019] [Accepted: 07/31/2019] [Indexed: 06/10/2023]
Abstract
Black carbon is often used as an indicator for combustion-related air pollution. In urban environments, on-road black carbon concentrations have a large spatial variability, suggesting that the personal exposure of a cyclist to black carbon can heavily depend on the route that is chosen to reach a destination. In this paper, we describe the development of a cyclist routing procedure that minimizes personal exposure to black carbon. Firstly, a land use regression model for predicting black carbon concentrations in an urban environment is developed using mobile monitoring data, collected by cyclists. The optimal model is selected and validated using a spatially stratified cross-validation scheme. The resulting model is integrated in a dedicated routing procedure that minimizes personal exposure to black carbon during cycling. The best model obtains a coefficient of multiple correlation of R=0.520. Simulations with the black carbon exposure minimizing routing procedure indicate that the inhaled amount of black carbon is reduced by 1.58% on average as compared to the shortest-path route, with extreme cases where a reduction of up to 13.35% is obtained. Moreover, we observed that the average exposure to black carbon and the exposure to local peak concentrations on a route are competing objectives, and propose a parametrized cost function for the routing problem that allows for a gradual transition from routes that minimize average exposure to routes that minimize peak exposure.
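The exposure-minimizing routing idea can be sketched with Dijkstra's algorithm on a street graph whose edge costs combine segment length and predicted concentration. The abstract does not give the form of the parametrized cost function, so the cost `length * concentration ** power` used below is purely a hypothetical stand-in, as are the toy graph and the function name `min_exposure_route`.

```python
import heapq

def min_exposure_route(graph, source, target, power=1.0):
    """Dijkstra shortest path with edge cost = length * concentration ** power.

    graph maps node -> list of (neighbor, length_m, concentration).
    power = 0 gives the ordinary shortest path, power = 1 weights length by
    concentration, and larger values penalize high-concentration segments more.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, length, conc in graph.get(u, []):
            nd = d + length * conc ** power
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]

# Toy network: a short polluted street A-B versus a longer, cleaner detour via C.
graph = {
    "A": [("B", 100.0, 5.0), ("C", 80.0, 1.0)],
    "C": [("B", 80.0, 1.0)],
    "B": [],
}
clean_path, clean_cost = min_exposure_route(graph, "A", "B", power=1.0)
short_path, short_cost = min_exposure_route(graph, "A", "B", power=0.0)
```

In this toy network the exposure-weighted route takes the detour while the length-only route does not, mirroring the trade-off between distance and inhaled dose that the study quantifies.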
Affiliation(s)
- Annelies Van den Hove
  - KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure links 653, Ghent, Belgium
- Jan Verwaeren
  - KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure links 653, Ghent, Belgium
- Joris Van den Bossche
  - KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure links 653, Ghent, Belgium
  - Flemish Institute for Technological Research (VITO), Boeretang 200, Mol, Belgium
- Jan Theunis
  - Flemish Institute for Technological Research (VITO), Boeretang 200, Mol, Belgium
- Bernard De Baets
  - KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure links 653, Ghent, Belgium
14
Açıkoğlu M, Tuncer SA. Incorporating feature selection methods into a machine learning-based neonatal seizure diagnosis. Med Hypotheses 2020; 135:109464. [DOI: 10.1016/j.mehy.2019.109464] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/16/2019] [Revised: 10/22/2019] [Accepted: 10/27/2019] [Indexed: 11/16/2022]
15
Xie S, Li X, McColgan P, Scahill RI, Zeng D, Wang Y. Identifying disease-associated biomarker network features through conditional graphical model. Biometrics 2019; 76:995-1006. [PMID: 31850527 DOI: 10.1111/biom.13201] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/17/2019] [Revised: 07/25/2019] [Accepted: 12/04/2019] [Indexed: 01/28/2023]
Abstract
Biomarkers are often organized into networks, in which the strengths of network connections vary across subjects depending on subject-specific covariates (e.g., genetic variants). Variation of network connections, as subject-specific feature variables, has been found to predict disease clinical outcome. In this work, we develop a two-stage method to estimate biomarker networks that account for heterogeneity among subjects and evaluate the networks' association with disease clinical outcome. In the first stage, we propose a conditional Gaussian graphical model with mean and precision matrix depending on covariates to obtain covariate-dependent networks with connection strengths varying across subjects while assuming homogeneous network structure. In the second stage, we evaluate the clinical utility of network measures (connection strengths) estimated from the first stage. The second-stage analysis provides the relative predictive power of between-region network measures on clinical impairment in the context of regional biomarkers and existing disease risk factors. We assess the performance of the proposed method by extensive simulation studies and application to a Huntington's disease (HD) study to investigate the effect of the HD causal gene on the rate of change in motor symptoms through affecting brain subcortical and cortical gray matter atrophy connections. We show that cortical network connections and subcortical volumes, but not subcortical connections, are predictive of clinical motor function deterioration. We validate these findings in an independent HD study. Lastly, highly similar patterns seen in the gray matter connections and a previous white matter connectivity study suggest a shared biological mechanism for HD and support the hypothesis that white matter loss is a direct result of neuronal loss as opposed to the loss of myelin or dysmyelination.
Affiliation(s)
- Shanghong Xie
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York
- Xiang Li
  - Statistics and Decision Sciences, Janssen Research & Development, LLC, Raritan, New Jersey
- Peter McColgan
  - Huntington's Disease Centre, Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK
- Rachael I Scahill
  - Huntington's Disease Centre, Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK
- Donglin Zeng
  - Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina
- Yuanjia Wang
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York
  - Department of Psychiatry, Columbia University Medical Center, New York