1
Qian J, Tanigawa Y, Li R, Tibshirani R, Rivas MA, Hastie T. Large-Scale Multivariate Sparse Regression with Applications to UK Biobank. Ann Appl Stat 2022; 16:1891-1918. [PMID: 36091495 PMCID: PMC9454085 DOI: 10.1214/21-aoas1575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Abstract
In high-dimensional regression problems, often a relatively small subset of the features is relevant for predicting the outcome, and methods that impose sparsity on the solution are popular. When multiple correlated outcomes are available (multitask), reduced rank regression is an effective way to borrow strength and capture latent structures that underlie the data. Our proposal is motivated by the UK Biobank population-based cohort study, where we are faced with large-scale, ultrahigh-dimensional features, and have access to a large number of outcomes (phenotypes): lifestyle measures, biomarkers, and disease outcomes. We are hence led to fit sparse reduced-rank regression models, using computational strategies that allow us to scale to problems of this size. We use a scheme that alternates between solving the sparse regression problem and solving the reduced rank decomposition. For the sparse regression component we propose a scalable iterative algorithm based on adaptive screening that leverages the sparsity assumption and enables us to focus on solving much smaller subproblems. The full solution is reconstructed and tested via an optimality condition to make sure it is a valid solution for the original problem. We further extend the method to cope with practical issues, such as the inclusion of confounding variables and imputation of missing values among the phenotypes. Experiments on both synthetic data and the UK Biobank data demonstrate the effectiveness of the method and the algorithm. We present the multiSnpnet package, available at http://github.com/junyangq/multiSnpnet, which works on top of PLINK2 files and which we anticipate will be a valuable tool for generating polygenic risk scores from human genetic studies.
Affiliation(s)
- Ruilin Li
- Institute for Computational and Mathematical Engineering, Stanford University
- Manuel A Rivas
- Department of Biomedical Data Science, Stanford University
2
Wang W, Kong W, Wang S, Wei K. Detecting Biomarkers of Alzheimer's Disease Based on Multi-constrained Uncertainty-Aware Adaptive Sparse Multi-view Canonical Correlation Analysis. J Mol Neurosci 2022; 72:841-865. [PMID: 35080765 DOI: 10.1007/s12031-021-01963-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2021] [Accepted: 12/29/2021] [Indexed: 12/01/2022]
Abstract
Imaging genetics mainly explores the pathogenesis of Alzheimer's disease (AD) by studying the relationship between genetic data (such as SNPs, gene expression data, and DNA methylation) and imaging data (such as structural MRI (sMRI), fMRI, and PET). Most existing research on brain imaging genomics uses two-way or three-way bi-multivariate methods to analyze the correlation between genes and brain imaging. However, many of these methods are still affected by gradient domination or cannot account for the effect of feature redundancy on the results, so neither the canonical correlation coefficient nor the running speed is significantly improved. To solve these problems, this paper proposes a multi-constrained uncertainty-aware adaptive sparse multi-view canonical correlation analysis method (MC-unAdaSMCCA) to explore associations among SNPs, gene expression data, and sMRI. Building on traditional unAdaSMCCA, orthogonal constraints are imposed on the weights of the three data features through linear programming, which reduces the redundancy of the feature weights to improve the correlation between the data, and reduces the complexity of the algorithm to significantly speed up the program. Three adaptive sparse multi-view canonical correlation analysis methods serve as benchmarks in evaluations on real neuroimaging data and synthetic data. Compared with these three methods, the proposed method obtains better or comparable canonical correlation coefficients and canonical weights. Moreover, the experimental results show that MC-unAdaSMCCA can not only identify biomarkers related to AD and mild cognitive impairment (MCI), but also has a strong ability to resist noise and process high-dimensional data. The proposed method therefore provides a reliable approach for multi-modal imaging genetics research.
Affiliation(s)
- Wenbo Wang
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai, 201306, People's Republic of China
- Wei Kong
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai, 201306, People's Republic of China
- Shuaiqun Wang
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai, 201306, People's Republic of China
- Kai Wei
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai, 201306, People's Republic of China
3
Li Y, Nan B, Zhu J. A Structured Brain-wide and Genome-wide Association Study Using ADNI PET Images. Can J Stat 2021; 49:182-202. [PMID: 34566241 DOI: 10.1002/cjs.11605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
A multi-stage variable selection method is introduced for detecting association signals in structured brain-wide and genome-wide association studies (brain-GWAS). Compared to conventional single-voxel-to-single-SNP approaches, our approach is more efficient and powerful in selecting the important signals because it integrates anatomic and gene grouping structures in the brain and the genome, respectively. It avoids a large number of multiple comparisons while effectively controlling false discoveries. The validity of the proposed approach is demonstrated by both theoretical investigation and numerical simulations. We apply the proposed method to a brain-GWAS using ADNI PET imaging and genomic data. We confirm previously reported association signals and also find several novel SNPs and genes that either are associated with brain glucose metabolism or have their association significantly modified by Alzheimer's disease status.
Affiliation(s)
- Yanming Li
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, KS 66160
- Bin Nan
- Department of Statistics, University of California at Irvine, Irvine, CA 92697
- Ji Zhu
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109
4
Lu P, Colliot O. Multilevel Survival Modeling with Structured Penalties for Disease Prediction from Imaging Genetics data. IEEE J Biomed Health Inform 2021; 26:798-808. [PMID: 34329174 DOI: 10.1109/jbhi.2021.3100918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
This paper introduces a framework for disease prediction from multimodal genetic and imaging data. We propose a multilevel survival model which allows predicting the time of occurrence of a future disease state in patients initially exhibiting mild symptoms. This new multilevel setting allows modeling the interactions between genetic and imaging variables. This is in contrast with classical additive models, which treat all modalities in the same manner and can result in undesirable elimination of specific modalities when their contributions are unbalanced. Moreover, the use of a survival model overcomes the limitations of previous classification-based approaches, which consider a fixed time frame. Furthermore, we introduce specific penalties that take into account the structure of the different types of data, such as a group lasso penalty over the genetic modality and an L2 penalty over the imaging modality. Finally, we propose a fast optimization algorithm based on a proximal gradient method. The approach was applied to the prediction of Alzheimer's disease (AD) among patients with mild cognitive impairment (MCI) based on genetic (single nucleotide polymorphism, SNP) and imaging (anatomical MRI measures) data from the ADNI database. The experiments demonstrate the effectiveness of the method for predicting the time of conversion to AD. They also reveal how genetic variants and brain imaging alterations interact in the prediction of future disease status. The approach is generic and could potentially be useful for the prediction of other diseases.
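As a rough illustration of the penalty structure described in this abstract, the sketch below shows the two proximal operators that a proximal gradient method would typically pair with a group lasso penalty and a squared L2 penalty. It is not the authors' implementation: the function names, the unweighted group penalty (no group-size weights), and the generic gradient arguments standing in for the survival-loss gradient are all assumptions made for the example.

```python
import numpy as np

def prox_group_lasso(v, groups, threshold):
    """Block soft-thresholding: shrink each group's Euclidean norm by `threshold`.

    `groups` is a list of index lists (hypothetical SNP groups); a group whose
    norm is at most `threshold` is set to zero as a whole.
    """
    v = np.asarray(v, dtype=float)
    out = np.zeros_like(v)
    for idx in groups:
        norm = np.linalg.norm(v[idx])
        if norm > threshold:
            out[idx] = (1.0 - threshold / norm) * v[idx]
    return out

def prox_ridge(v, threshold):
    """Proximal operator of (threshold / 2) * ||v||^2: uniform shrinkage."""
    return np.asarray(v, dtype=float) / (1.0 + threshold)

def proximal_gradient_step(genetic, imaging, grad_genetic, grad_imaging,
                           step, groups, lam_group, lam_ridge):
    """One proximal gradient update on both coefficient blocks.

    The gradients of the smooth loss are passed in by the caller; here they
    are placeholders for whatever loss is being minimized.
    """
    genetic = np.asarray(genetic, dtype=float)
    imaging = np.asarray(imaging, dtype=float)
    new_genetic = prox_group_lasso(
        genetic - step * np.asarray(grad_genetic, dtype=float),
        groups, step * lam_group)
    new_imaging = prox_ridge(
        imaging - step * np.asarray(grad_imaging, dtype=float),
        step * lam_ridge)
    return new_genetic, new_imaging
```

The group operator removes whole groups at once, which is what makes the group lasso select at the gene or SNP-set level, whereas the ridge operator only shrinks coefficients and never zeroes out the imaging block.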
5
Deng L, Ma L, Cheng KK, Xu X, Raftery D, Dong J. Sparse PLS-Based Method for Overlapping Metabolite Set Enrichment Analysis. J Proteome Res 2021; 20:3204-3213. [PMID: 34002606 DOI: 10.1021/acs.jproteome.1c00064] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
Metabolite set enrichment analysis (MSEA) has gained increasing research interest for identification of perturbed metabolic pathways in metabolomics. The method incorporates predefined metabolic pathway information in the analysis, where metabolite sets are typically assumed to be mutually exclusive. However, metabolic pathways are known to contain common metabolites and intermediates. This situation, along with limitations in metabolite detection or coverage, leads to overlapping, incomplete metabolite sets in pathway analysis. For overlapping metabolite sets, MSEA tends to produce many false positives because improper weights are allocated to the overlapping metabolites. Here, we propose an extended partial least squares (PLS) model with a new sparse scheme for overlapping metabolite set enrichment analysis, named overlapping group PLS (ogPLS) analysis. The weight vector of the ogPLS model is decomposed into pathway-specific subvectors, and a group lasso penalty is then imposed on these subvectors to achieve a proper weight allocation for the overlapping metabolites. Two strategies are adopted in the proposed ogPLS model to identify the perturbed metabolic pathways. The first strategy is debiasing regularization, which is used to reduce inequalities amongst the predefined metabolic pathways. The second strategy is stable selection, which is used to rank pathways while avoiding the nuisance problems of model parameter optimization. Both simulated and real-world metabolomic datasets were used to evaluate the proposed method and compare it with two other MSEA methods, the Global-test and the multiblock PLS (MB-PLS)-based pathway importance in projection (PIP) methods. Using a simulated dataset with known perturbed pathways, the average true discovery rate for the ogPLS method was found to be higher than for the Global-test and the MB-PLS-based PIP methods. Analysis with a real-world metabolomics dataset also indicated that the developed method was less prone to select pathways with highly overlapped detected metabolite sets. Compared with the two other methods, the proposed method offers higher accuracy, a lower false-positive rate, and greater robustness when applied to overlapping metabolite set analysis. The developed ogPLS method may serve as an alternative MSEA method to facilitate biological interpretation of metabolomics data for overlapping metabolite sets.
Affiliation(s)
- Lingli Deng
- Jiangxi Engineering Technology Research Center of Nuclear Geoscience Data Science and System, East China University of Technology, Nanchang 330013, China; Department of Information Engineering, East China University of Technology, Nanchang 330013, China
- Lei Ma
- Department of Information Engineering, East China University of Technology, Nanchang 330013, China
- Kian-Kai Cheng
- Innovation Centre in Agritechnology, Universiti Teknologi Malaysia, Muar 84600, Johor, Malaysia
- Xiangnan Xu
- School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW 2006, Australia
- Daniel Raftery
- Northwest Metabolomics Research Center, Department of Anesthesiology and Pain Medicine, University of Washington, Seattle, Washington 98109, United States
- Jiyang Dong
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
6
Huang M, Chen X, Yu Y, Lai H, Feng Q. Imaging Genetics Study Based on a Temporal Group Sparse Regression and Additive Model for Biomarker Detection of Alzheimer's Disease. IEEE Trans Med Imaging 2021; 40:1461-1473. [PMID: 33556003 DOI: 10.1109/tmi.2021.3057660] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 06/12/2023]
Abstract
Imaging genetics is an effective tool used to detect potential biomarkers of Alzheimer's disease (AD) in imaging and genetic data. Most existing imaging genetics methods analyze the association between brain imaging quantitative traits (QTs) and genetic data [e.g., single nucleotide polymorphisms (SNPs)] by using a linear model, ignoring correlations between a set of QTs and SNP groups, and disregarding the varied associations between longitudinal imaging QTs and SNPs. To solve these problems, we propose a novel temporal group sparsity regression and additive model (T-GSRAM) to identify associations between longitudinal imaging QTs and SNPs for detection of potential AD biomarkers. We first construct a nonparametric regression model to analyze the nonlinear association between QTs and SNPs, which can accurately model the complex influence of SNPs on QTs. We then use longitudinal QTs to identify the trajectory of imaging genetic patterns over time. Moreover, SNP information at the group and individual levels is incorporated into the proposed method to boost the power of biomarker detection. Finally, we propose an efficient algorithm to solve the whole T-GSRAM model. We evaluated our method using simulation data and real data obtained from the Alzheimer's Disease Neuroimaging Initiative. Experimental results show that our proposed method outperforms several state-of-the-art methods in terms of the receiver operating characteristic curves and area under the curve. Moreover, the detected AD-related genes and QTs have been confirmed in previous studies, which further verifies the effectiveness of our approach and helps in understanding the genetic basis of disease progression over time.
7
Zwep LB, Duisters KLW, Jansen M, Guo T, Meulman JJ, Upadhyay PJ, van Hasselt JGC. Identification of high-dimensional omics-derived predictors for tumor growth dynamics using machine learning and pharmacometric modeling. CPT Pharmacometrics Syst Pharmacol 2021; 10:350-361. [PMID: 33792207 PMCID: PMC8099445 DOI: 10.1002/psp4.12603] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/12/2020] [Revised: 01/07/2021] [Accepted: 02/01/2021] [Indexed: 12/26/2022]
Abstract
Pharmacometric modeling can capture tumor growth inhibition (TGI) dynamics and variability. These approaches do not usually consider covariates in high-dimensional settings, whereas high-dimensional molecular profiling technologies ("omics") are being increasingly considered for prediction of anticancer drug treatment response. Machine learning (ML) approaches have been applied to identify high-dimensional omics predictors for treatment outcome. Here, we aimed to combine TGI modeling and ML approaches for two distinct aims: omics-based prediction of tumor growth profiles and identification of pathways associated with treatment response and resistance. We propose a two-step approach combining ML using least absolute shrinkage and selection operator (LASSO) regression with pharmacometric modeling. We demonstrate our workflow using a previously published dataset consisting of 4706 tumor growth profiles of patient-derived xenograft (PDX) models treated with a variety of mono- and combination regimens. Pharmacometric TGI models were fit to the tumor growth profiles. The TGI parameter values derived from the empirical Bayes estimates were regressed using the LASSO on high-dimensional genomic copy number variation data, which contained over 20,000 variables. The predictive model decreased median prediction error by 4% compared with a model without any genomic information. A total of 74 pathways were identified by LASSO as related to treatment response or resistance development, some of which were supported by the literature. In conclusion, we demonstrate how the combined use of ML and pharmacometric modeling can be used to gain pharmacological understanding of genomic factors driving variation in treatment response.
Affiliation(s)
- Laura B. Zwep
- Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Mathematical Institute, Leiden University, Leiden, The Netherlands
- Martijn Jansen
- Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Tingjie Guo
- Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Department of Intensive Care Medicine, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Parth J. Upadhyay
- Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
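For readers unfamiliar with the LASSO step mentioned in the abstract above, the following is a minimal sketch of LASSO regression solved by cyclic coordinate descent. It is an illustration only, not the workflow used in the paper: the function names, the objective scaling (1/(2n) times the squared error plus lam times the L1 norm), and the fixed iteration count are assumptions, and in the paper's setting the response would be a TGI parameter estimate and the columns of X would be copy number features.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1 / (2n)) * ||y - X beta||^2 + lam * ||beta||_1.

    Cyclic coordinate descent with a running residual; no intercept and no
    standardization, which a real analysis would normally add.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column curvature of the loss
    residual = y.copy()                 # equals y - X @ beta throughout
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                # an all-zero column carries no signal
            residual += X[:, j] * beta[j]        # remove feature j's contribution
            rho = X[:, j] @ residual / n         # partial correlation with residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * beta[j]        # add the updated contribution back
    return beta
```

Coefficients whose partial correlation with the residual stays below `lam` are set exactly to zero, which is how the LASSO performs variable selection among a very large number of candidate predictors.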
8
Wen C, Yang Y, Xiao Q, Huang M, Pan W. Genome-wide association studies of brain imaging data via weighted distance correlation. Bioinformatics 2021; 36:4942-4950. [PMID: 32619001 DOI: 10.1093/bioinformatics/btaa612] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/22/2019] [Revised: 06/17/2020] [Accepted: 06/26/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Imaging genetics is mainly used to reveal the pathogenesis of neuropsychiatric risk genes and to understand the relationship between human brain structure, function, and individual differences. Increasingly, brain-wide imaging phenotypes at the voxel level are available for testing association with genetic markers. A challenge in analyzing such data is their high dimensionality and complex relationships. RESULTS: To tackle this challenge, we introduce a weighted distance correlation (wdCor) that can assess the association between genetic markers and voxel-based imaging data. Importantly, the wdCor test takes the voxel-based data as a whole multivariate phenotype, which preserves the spatial continuity and might enhance the power. In addition, an adaptive permutation procedure is introduced to determine the P-values of the wdCor test and to alleviate the computational burden in GWAS. In extensive simulation studies, wdCor performs much better than the original distance correlation. We also successfully apply wdCor to conduct a large-scale analysis on data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). AVAILABILITY AND IMPLEMENTATION: Our wdCor method provides new research directions and ideas for multivariate analysis of high-dimensional data. It can also be used as a tool for imaging genetics research in practical applications. The R package wdcor and the code for reproducing all results in this article are available on GitHub: https://github.com/yangyuhui0129/wdcor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Canhong Wen
- Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
- Yuhui Yang
- Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
- Quan Xiao
- Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
- Meiyan Huang
- Guangdong Provincial Key Laboratory of Medical Image Processing, School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
- Wenliang Pan
- Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, Guangzhou 510275, China
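The abstract above compares wdCor against the original distance correlation. As background, here is a minimal sketch of the standard (unweighted) sample distance correlation. It does not implement the weighting scheme of wdCor, which the abstract does not specify, and the function names are hypothetical. The authors' own R package is the reference implementation for their method.

```python
import numpy as np

def _double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())

def distance_correlation(x, y):
    """Sample distance correlation between two samples with matching rows.

    x and y may be vectors or (n_samples, n_features) arrays, so y could be a
    whole multivariate phenotype and x a single genetic marker.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x.reshape(x.shape[0], -1)
    y = y.reshape(y.shape[0], -1)
    # Pairwise Euclidean distance matrices, then double centering.
    a = _double_center(np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2))
    b = _double_center(np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2))
    dcov2 = (a * b).mean()      # squared sample distance covariance
    dvar_x = (a * a).mean()     # squared sample distance variance of x
    dvar_y = (b * b).mean()     # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0              # a constant sample has no dependence to measure
    return float(np.sqrt(max(dcov2, 0.0) / denom))
```

Because the statistic is built from pairwise distances rather than coordinates, it applies unchanged when the phenotype is a high-dimensional vector of voxels, which is the property the abstract relies on.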
9
Zhou J, Qiu Y, Chen S, Liu L, Liao H, Chen H, Lv S, Li X. A Novel Three-Stage Framework for Association Analysis Between SNPs and Brain Regions. Front Genet 2020; 11:572350. [PMID: 33193677 PMCID: PMC7542238 DOI: 10.3389/fgene.2020.572350] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/13/2020] [Accepted: 08/17/2020] [Indexed: 12/17/2022]
Abstract
Motivation: A number of methods for correlation analysis between SNPs and ROIs have been devised to explore the pathogenic mechanism of Alzheimer's disease. However, these methods have inherent deficiencies, including a lack of statistical efficacy and of biological meaning. This study aims to address two issues: insufficient correlation in previous methods (relatively high regression error) and the lack of biological meaning in association analysis. Results: In this paper, a novel three-stage framework for correlation analysis between SNPs and ROIs is proposed. First, a clustering algorithm is applied to remove the potential linkage disequilibrium structure between pairs of SNPs. Then, a group sparse model is used to introduce prior information, such as gene structure and linkage disequilibrium structure, to select feature SNPs. After these steps, each SNP has a weight vector with one entry per ROI, the importance of each SNP can be judged from the weights in this vector, and the feature SNPs can then be selected. Finally, for the selected feature SNPs, a support vector machine regression model is used to predict the ROI phenotype values. Experimental results under multiple performance measures show that the proposed method is more accurate than other methods.
Affiliation(s)
- Juan Zhou
- School of Software, East China Jiaotong University, Nanchang, China
- Yangping Qiu
- School of Software, East China Jiaotong University, Nanchang, China
- Shuo Chen
- School of Software, East China Jiaotong University, Nanchang, China
- Liyue Liu
- School of Software, East China Jiaotong University, Nanchang, China
- Huifa Liao
- School of Software, East China Jiaotong University, Nanchang, China
- Hongli Chen
- School of Software, East China Jiaotong University, Nanchang, China
- Shanguo Lv
- School of Software, East China Jiaotong University, Nanchang, China
- Xiong Li
- School of Software, East China Jiaotong University, Nanchang, China
10
Zhang L, Papachristou C, Choudhary PK, Biswas S. A Bayesian Hierarchical Framework for Pathway Analysis in Genome-Wide Association Studies. Hum Hered 2020; 84:240-255. [PMID: 32966977 DOI: 10.1159/000508664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2019] [Accepted: 05/14/2020] [Indexed: 11/19/2022]
Abstract
BACKGROUND Pathway analysis allows joint consideration of multiple SNPs belonging to multiple genes, which in turn belong to a biologically defined pathway. This type of analysis is usually more powerful than single-SNP analyses for detecting joint effects of variants in a pathway. METHODS We develop a Bayesian hierarchical model by fully modeling the 3-level hierarchy, namely, SNP-gene-pathway that is naturally inherent in the structure of the pathways, unlike the currently used ad hoc ways of combining such information. We model the effects at each level conditional on the effects of the levels preceding them within the generalized linear model framework. To deal with the high dimensionality, we regularize the regression coefficients through an appropriate choice of priors. The model is fit using a combination of iteratively weighted least squares and expectation-maximization algorithms to estimate the posterior modes and their standard errors. A normal approximation is used for inference. RESULTS We conduct simulations to study the proposed method and find that our method has higher power than some standard approaches in several settings for identifying pathways with multiple modest-sized variants. We illustrate the method by analyzing data from two genome-wide association studies on breast and renal cancers. CONCLUSION Our method can be helpful in detecting pathway association.
Affiliation(s)
- Lei Zhang
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Pankaj K Choudhary
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
11
Detecting genetic associations with brain imaging phenotypes in Alzheimer's disease via a novel structured SCCA approach. Med Image Anal 2020; 61:101656. [PMID: 32062154 DOI: 10.1016/j.media.2020.101656] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 11/23/2018] [Revised: 11/27/2019] [Accepted: 01/22/2020] [Indexed: 01/15/2023]
Abstract
Brain imaging genetics has become an important research topic because it can reveal complex associations between genetic factors and the structures or functions of the human brain. Sparse canonical correlation analysis (SCCA) is a popular bi-multivariate association identification method. To mine the complex genetic basis of brain imaging phenotypes, many SCCA methods have arisen that use a variety of norms to incorporate different structures of interest. They often use the group lasso penalty, the fused lasso penalty, or the graph/network guided fused lasso penalty. However, the group lasso methods have limited capability because prior knowledge is incomplete or unavailable in real applications. The fused lasso and graph/network guided methods are sensitive to the sign of the sample correlation, which may be incorrectly estimated. In this paper, we introduce two new penalties to improve the fused lasso and the graph/network guided lasso penalties in structured sparse learning. We impose both penalties on the SCCA model and propose an optimization algorithm to solve it. The proposed SCCA method has a strong upper bound on grouping effects for both positively and negatively highly correlated variables. We show that, on both synthetic and real neuroimaging genetics data, the proposed SCCA method performs better than or on par with the conventional methods using fused lasso or graph/network guided fused lasso. In particular, the proposed method identifies higher canonical correlation coefficients and captures clearer canonical weight patterns, demonstrating its promising capability in revealing biologically meaningful imaging genetic associations.
12
Shen L, Thompson PM. Brain Imaging Genomics: Integrated Analysis and Machine Learning. Proc IEEE Inst Electr Electron Eng 2020; 108:125-162. [PMID: 31902950 PMCID: PMC6941751 DOI: 10.1109/jproc.2019.2947272] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Indexed: 06/01/2023]
Abstract
Brain imaging genomics is an emerging data science field, where integrated analysis of brain imaging and genomics data, often combined with other biomarker, clinical and environmental data, is performed to gain new insights into the phenotypic, genetic and molecular characteristics of the brain as well as their impact on normal and disordered brain function and behavior. It has enormous potential to contribute significantly to biomedical discoveries in brain science. Given the increasingly important role of statistical and machine learning in biomedicine and rapidly growing literature in brain imaging genomics, we provide an up-to-date and comprehensive review of statistical and machine learning methods for brain imaging genomics, as well as a practical discussion on method selection for various biomedical applications.
Affiliation(s)
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Paul M Thompson
- Imaging Genetics Center, Mark & Mary Stevens Institute for Neuroimaging & Informatics, Keck School of Medicine, University of Southern California, Los Angeles, CA 90232, USA
13
Stelzer AS, Maccioni L, Gerhold-Ay A, Smedby KE, Schumacher M, Nieters A, Binder H. A multivariable approach for risk markers from pooled molecular data with only partial overlap. BMC Med Genet 2019; 20:128. [PMID: 31324155 PMCID: PMC6642584 DOI: 10.1186/s12881-019-0849-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2018] [Accepted: 06/19/2019] [Indexed: 11/29/2022]
Abstract
Background Increasingly, molecular measurements from multiple studies are pooled to identify risk scores, with only partial overlap of measurements available from different studies. Univariate analyses of such markers have routinely been performed in such settings using meta-analysis techniques in genome-wide association studies for identifying genetic risk scores. In contrast, multivariable techniques such as regularized regression, which might potentially be more powerful, are hampered by only partial overlap of available markers even when the pooling of individual level data is feasible for analysis. This cannot easily be addressed at a preprocessing level, as quality criteria in the different studies may result in differential availability of markers – even after imputation. Methods Motivated by data from the InterLymph Consortium on risk factors for non-Hodgkin lymphoma, which exhibits these challenges, we adapted a regularized regression approach, componentwise boosting, for dealing with partial overlap in SNPs. This synthesis regression approach is combined with resampling to determine stable sets of single nucleotide polymorphisms, which could feed into a genetic risk score. The proposed approach is contrasted with univariate analyses, an application of the lasso, and with an analysis that discards studies causing the partial overlap. The question of statistical significance is faced with an approach called stability selection. Results Using an excerpt of the data from the InterLymph Consortium on two specific subtypes of non-Hodgkin lymphoma, it is shown that componentwise boosting can take into account all applicable information from different SNPs, irrespective of whether they are covered by all investigated studies and for all individuals in the single studies. The results indicate increased power, even when studies that would be discarded in a complete case analysis only comprise a small proportion of individuals. 
Conclusions Given the observed gains in power, the proposed approach can be recommended more generally whenever there is only partial overlap of molecular measurements obtained from pooled studies and/or missing data in single studies. A corresponding software implementation is available upon request. Trial registration All involved studies have provided signed GWAS data submission certifications to the U.S. National Institutes of Health and have been retrospectively registered. Electronic supplementary material The online version of this article (10.1186/s12881-019-0849-0) contains supplementary material, which is available to authorized users.
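The resampling step mentioned in the abstract above can be illustrated with a minimal sketch of selection frequencies over random half-subsamples. This is not the authors' implementation: the selector here is a hypothetical stand-in (the top-k features by marginal correlation) used in place of componentwise boosting or the lasso, and the function names, toy data, and the 0.8 cutoff are all assumptions made for illustration.

```python
import random
from statistics import mean


def abs_corr(x, y):
    """Absolute Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_k_selector(X, y, k):
    """Stand-in selector (assumption): the k features most correlated with y.

    A real analysis would fit a lasso or boosting model here instead.
    """
    p = len(X[0])
    scores = [abs_corr([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: -scores[j])[:k]


def selection_frequencies(X, y, k=2, n_resamples=50, seed=0):
    """Fraction of random half-subsamples in which each feature is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), n // 2)
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        for j in top_k_selector(Xs, ys, k):
            counts[j] += 1
    return [c / n_resamples for c in counts]


# Toy data (hypothetical): only feature 0 carries signal.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]
y = [2.0 * row[0] + 0.1 * rng.gauss(0, 1) for row in X]
freqs = selection_frequencies(X, y)
stable = [j for j, f in enumerate(freqs) if f >= 0.8]  # 0.8 is an arbitrary cutoff
```

Features that are selected in most subsamples form the "stable set"; the cutoff trades off false selections against missed markers and would need to be chosen for the application at hand.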
Affiliation(s)
- Anne-Sophie Stelzer
- Forest Research Institute Baden-Württemberg (FVA), Wonnhaldestraße 4, Freiburg, 79100, Germany; Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Stefan-Meier-Straße 26, Freiburg, 79104, Germany; Freiburg Center for Data Analysis and Modeling, University of Freiburg, Eckerstraße 1, Freiburg, 79104, Germany; Center for Chronic Immunodeficiency, Faculty of Medicine and Medical Center - University of Freiburg, Breisacher Straße 115, Freiburg, 79106, Germany.
| | - Livia Maccioni
- Center for Chronic Immunodeficiency, Faculty of Medicine and Medical Center - University of Freiburg, Breisacher Straße 115, Freiburg, 79106, Germany
| | - Aslihan Gerhold-Ay
- Institute of Medical Biostatistics, Epidemiology and Informatics, University Medical Center Johannes Gutenberg University Mainz, Obere Zahlbacher Straße 69, Mainz, 55131, Germany
| | - Karin E Smedby
- Department of Medicine, Solna (MedS), Eugeniahemmet, T2, Karolinska Universitetssjukhuset, Solna, Stockholm, 17176, Sweden
| | - Martin Schumacher
- Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Stefan-Meier-Straße 26, Freiburg, 79104, Germany
| | - Alexandra Nieters
- Center for Chronic Immunodeficiency, Faculty of Medicine and Medical Center - University of Freiburg, Breisacher Straße 115, Freiburg, 79106, Germany
| | - Harald Binder
- Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Stefan-Meier-Straße 26, Freiburg, 79104, Germany
| |
|
14
|
Jackknife Model Averaging Prediction Methods for Complex Phenotypes with Gene Expression Levels by Integrating External Pathway Information. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2019; 2019:2807470. [PMID: 31089389 PMCID: PMC6476151 DOI: 10.1155/2019/2807470] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/13/2019] [Accepted: 03/20/2019] [Indexed: 01/03/2023]
Abstract
Motivation In the past few years many prediction approaches have been proposed and widely employed in high dimensional genetic data for disease risk evaluation. However, those approaches typically ignore in model fitting the important group structures that naturally exist in genetic data. Methods In the present study, we applied a novel model-averaging approach, called jackknife model averaging prediction (JMAP), for high dimensional genetic risk prediction while incorporating pathway information into the model specification. JMAP selects the optimal weights across candidate models by minimizing a jackknife (leave-one-out) cross validation criterion. Compared with previous approaches, a primary feature of JMAP is that model weights may vary from 0 to 1 without being constrained to sum to one. We evaluated the performance of JMAP using extensive simulation studies and compared it with existing methods. We finally applied JMAP to four real cancer datasets that are publicly available from TCGA. Results The simulations showed that, compared with other existing approaches (e.g., gsslasso), JMAP performed best or was among the best methods across a range of scenarios. For example, in 14 out of 16 simulation settings with PVE = 0.3, JMAP has an average of 0.075 higher prediction accuracy compared with gsslasso. We further found that in the simulation, the model weights for the true candidate models have much smaller chances to be zero compared with those for the null candidate models and are substantially greater in magnitude. In the real data application, JMAP also behaves comparably or better compared with the other methods for continuous phenotypes. For example, for the COAD, CRC, and PAAD datasets, the average gains of predictive accuracy of JMAP are 0.019, 0.064, and 0.052 compared with gsslasso.
Conclusion The proposed method JMAP is a novel model-averaging approach for high dimensional genetic risk prediction while incorporating external useful group structures into the model specification.
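The weight-selection idea in the abstract above (weights boxed to [0, 1], no sum-to-one constraint, chosen by minimizing a leave-one-out criterion) can be sketched as follows. This is a simplified illustration under stated assumptions, not the JMAP implementation: each candidate model is reduced to a simple linear regression on a single hypothetical pathway score, and the weights are found by plain coordinate descent; all names and the toy data are invented for the example.

```python
import random


def loo_predictions(x, y):
    """Leave-one-out predictions from a simple linear regression of y on x."""
    n = len(x)
    preds = []
    for i in range(n):
        xs = x[:i] + x[i + 1:]
        ys = y[:i] + y[i + 1:]
        mx, my = sum(xs) / (n - 1), sum(ys) / (n - 1)
        sxx = sum((a - mx) ** 2 for a in xs)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        preds.append(my + slope * (x[i] - mx))
    return preds


def cv_criterion(P, y, w):
    """Squared error of the weighted combination of held-out predictions."""
    return sum((yi - sum(wm * pm for wm, pm in zip(w, row))) ** 2
               for row, yi in zip(P, y))


def box_weights(P, y, n_sweeps=200):
    """Minimise cv_criterion over 0 <= w_m <= 1 by coordinate descent.

    Each weight is clipped to [0, 1]; the weights are not forced to sum to one.
    """
    n, M = len(P), len(P[0])
    w = [0.0] * M
    for _ in range(n_sweeps):
        for m in range(M):
            # Residual with model m's current contribution removed.
            r = [y[i] - sum(w[l] * P[i][l] for l in range(M) if l != m)
                 for i in range(n)]
            num = sum(P[i][m] * r[i] for i in range(n))
            den = sum(P[i][m] ** 2 for i in range(n))
            w[m] = min(1.0, max(0.0, num / den)) if den > 0 else 0.0
    return w


# Toy data (hypothetical): pathway score A drives the phenotype, B is noise.
rng = random.Random(0)
score_a = [rng.gauss(0, 1) for _ in range(60)]
score_b = [rng.gauss(0, 1) for _ in range(60)]
y = [3.0 + a + 0.1 * rng.gauss(0, 1) for a in score_a]
P = [list(row) for row in zip(loo_predictions(score_a, y),
                              loo_predictions(score_b, y))]
w = box_weights(P, y)
```

In this toy setting the informative candidate model should receive the larger weight, which mirrors the abstract's observation that true candidate models tend to get larger, nonzero weights.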
|
15
|
Tang Z, Lei S, Zhang X, Yi Z, Guo B, Chen JY, Shen Y, Yi N. Gsslasso Cox: a Bayesian hierarchical model for predicting survival and detecting associated genes by incorporating pathway information. BMC Bioinformatics 2019; 20:94. [PMID: 30813883 PMCID: PMC6391807 DOI: 10.1186/s12859-019-2656-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/21/2018] [Accepted: 01/28/2019] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Group structures among genes encoded in functional relationships or biological pathways are valuable and unique features in large-scale molecular data for survival analysis. However, most previous approaches for molecular data analysis ignore such group structures. It is desirable to develop powerful analytic methods for incorporating valuable pathway information for predicting disease survival outcomes and detecting associated genes. RESULTS We here propose a Bayesian hierarchical Cox survival model, called the group spike-and-slab lasso Cox (gsslasso Cox), for predicting disease survival outcomes and detecting associated genes by incorporating group structures of biological pathways. Our hierarchical model employs a novel prior on the coefficients of genes, i.e., the group spike-and-slab double-exponential distribution, to integrate group structures and to adaptively shrink the effects of genes. We have developed a fast and stable deterministic algorithm to fit the proposed models. We performed extensive simulation studies to assess the model fitting properties and the prognostic performance of the proposed method, and also applied our method to analyze three cancer data sets. CONCLUSIONS Both the theoretical and empirical studies show that the proposed method can induce weaker shrinkage on predictors in an active pathway, thereby incorporating the biological similarity of genes within the same pathway into the hierarchical modeling. Compared with several existing methods, the proposed method can more accurately estimate gene effects and can better predict survival outcomes. For the three cancer data sets, the results show that the proposed method generates more powerful models for survival prediction and detecting associated genes. The method has been implemented in a freely available R package BhGLM at https://github.com/nyiuab/BhGLM .
Affiliation(s)
- Zaixiang Tang
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123 China
- Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123 China
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294-0022 USA
| | - Shufeng Lei
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123 China
- Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123 China
| | - Xinyan Zhang
- Department of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA 30458 USA
| | - Zixuan Yi
- Eastern Virginia Medical School, Norfolk, VA 23507 USA
| | - Boyi Guo
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294-0022 USA
| | - Jake Y. Chen
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
| | - Yueping Shen
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123 China
- Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123 China
| | - Nengjun Yi
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294-0022 USA
| |
|
16
|
Yang A, Miller D, Pan Q. Constrained maximum entropy models to select genotype interactions associated with censored failure times. J Bioinform Comput Biol 2018; 16:1840024. [PMID: 30567478 DOI: 10.1142/s0219720018400243] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
We propose a novel screening method targeting genotype interactions associated with disease risks. The proposed method extends the maximum entropy conditional probability model to address disease occurrences over time. Continuous occurrence times are grouped into intervals. The model estimates the conditional distribution over the disease occurrence intervals given individual genotypes by maximizing the corresponding entropy subject to constraints linking genotype interactions to time intervals. The EM algorithm is employed to handle observations with uncertainty, for which the disease occurrence is censored. Stepwise greedy search is proposed to screen a large number of candidate constraints. The minimum description length is employed to select the optimal set of constraints. Extensive simulations show that five or so quantile-dependent intervals are sufficient to categorize disease outcomes into different risk groups. Performance depends on sample size, number of genotypes, and minor allele frequencies. The proposed method outperforms the likelihood ratio test, Lasso, and a previous maximum entropy method with only binary (disease occurrence, non-occurrence) outcomes. Finally, a GWAS study for type 1 diabetes patients is used to illustrate our method. Novel one-genotype and two-genotype interactions associated with neuropathy are identified.
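The grouping of continuous occurrence times into quantile-dependent intervals, as described in the abstract above, can be sketched as below. This is only an illustration of the discretization step: it ignores censoring (which the paper handles with the EM algorithm), and the function names and toy times are assumptions.

```python
def quantile_cutpoints(times, n_intervals=5):
    """Cut points at the 1/k, 2/k, ... empirical quantiles of the times."""
    ordered = sorted(times)
    n = len(ordered)
    return [ordered[(q * n) // n_intervals] for q in range(1, n_intervals)]


def to_interval(t, cutpoints):
    """Index of the interval containing t (0 = earliest)."""
    return sum(t >= c for c in cutpoints)


# Toy occurrence times (hypothetical units), fully observed.
times = list(range(1, 21))
cuts = quantile_cutpoints(times)
labels = [to_interval(t, cuts) for t in times]
```

With five intervals each bin holds roughly a fifth of the observations, so the interval index can serve as an ordered risk category.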
Affiliation(s)
- Aotian Yang
- Department of Statistics, George Washington University, Washington, DC 20052, USA
| | - David Miller
- Department of Electrical Engineering, Pennsylvania State University, State College, PA 16801, USA
| | - Qing Pan
- Department of Statistics, George Washington University, Washington, DC 20052, USA
| |
|
17
|
Tang Z, Shen Y, Li Y, Zhang X, Wen J, Qian C, Zhuang W, Shi X, Yi N. Group spike-and-slab lasso generalized linear models for disease prediction and associated genes detection by incorporating pathway information. Bioinformatics 2018; 34:901-910. [PMID: 29077795 PMCID: PMC5860634 DOI: 10.1093/bioinformatics/btx684] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/15/2017] [Revised: 10/05/2017] [Accepted: 10/24/2017] [Indexed: 01/10/2023] Open
Abstract
Motivation Large-scale molecular data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, standard approaches for omics data analysis ignore the group structure among genes encoded in functional relationships or pathway information. Results We propose new Bayesian hierarchical generalized linear models, called group spike-and-slab lasso GLMs, for predicting disease outcomes and detecting associated genes by incorporating large-scale molecular data and group structures. The proposed model employs a mixture double-exponential prior for coefficients that induces self-adaptive amounts of shrinkage on different coefficients. The group information is incorporated into the model by setting group-specific parameters. We have developed a fast and stable deterministic algorithm to fit the proposed hierarchical GLMs, which can perform variable selection within groups. We assess the performance of the proposed method on several simulated scenarios, by varying the overlap among groups, group size, number of non-null groups, and the correlation within groups. Compared with existing methods, the proposed method provides not only more accurate estimates of the parameters but also better prediction. We further demonstrate the application of the proposed procedure on three cancer datasets by utilizing pathway structures of genes. Our results show that the proposed method generates powerful models for predicting disease outcomes and detecting associated genes. Availability and implementation The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). Contact nyi@uab.edu. Supplementary information Supplementary data are available at Bioinformatics online.
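For readers unfamiliar with the mixture double-exponential prior mentioned in the abstract above, one common way to write a spike-and-slab lasso prior with group-specific inclusion probabilities is sketched below. The notation (s_0, s_1, gamma_j, theta_g) and the non-overlapping-group simplification are assumptions made here for illustration; the abstract itself does not give the formula, and the paper's exact parameterization may differ.

```latex
\begin{aligned}
p(\beta_j \mid S_j) &= \frac{1}{2 S_j} \exp\!\left(-\frac{|\beta_j|}{S_j}\right),
  \qquad S_j = (1 - \gamma_j)\, s_0 + \gamma_j\, s_1, \quad 0 < s_0 < s_1, \\
\gamma_j \mid \theta_g &\sim \mathrm{Bernoulli}(\theta_g),
  \qquad j \in \text{group } g .
\end{aligned}
```

Under this form the implied penalty on a coefficient is |beta_j| / S_j: when gamma_j = 0 the small scale s_0 shrinks the coefficient strongly (the spike), and when gamma_j = 1 the larger scale s_1 shrinks it weakly (the slab). Sharing theta_g within a group is what lets evidence from one gene relax the shrinkage on other genes in the same pathway.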
Affiliation(s)
- Zaixiang Tang
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
- Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, China
- Center for Genetic Epidemiology and Genomics, Medical College of Soochow University, Suzhou, China
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Yueping Shen
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
- Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, China
| | - Yan Li
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Xinyan Zhang
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Jia Wen
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
| | - Chen’ao Qian
- Department of Bioinformatics, School of Biology & Basic Medical Science, Soochow University, Suzhou, China
| | - Wenzhuo Zhuang
- Department of Cell Biology, School of Biology & Basic Medical Science, Soochow University, Suzhou, China
| | - Xinghua Shi
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
| | - Nengjun Yi
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
| |
|
18
|
Machine learning shows association between genetic variability in PPARG and cerebral connectivity in preterm infants. Proc Natl Acad Sci U S A 2017; 114:13744-13749. [PMID: 29229843 PMCID: PMC5748164 DOI: 10.1073/pnas.1704907114] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 12/16/2022] Open
Abstract
Preterm birth affects 11% of births globally; 35% of infants develop long-term neurocognitive problems, and prematurity leads to the loss of 75 million disability adjusted life years per annum worldwide. Imaging studies have shown that these infants have extensive alterations in brain development, but little is known about the molecular or cellular mechanisms involved. This imaging genetics study found a strong association between abnormal cerebral connectivity and variability in the PPARG gene, implicating PPARG signaling in abnormal white-matter development in preterm infants and suggesting a tractable new target for therapeutic research.
Preterm infants show abnormal structural and functional brain development, and have a high risk of long-term neurocognitive problems. The molecular and cellular mechanisms involved are poorly understood, but novel methods now make it possible to address them by examining the relationship between common genetic variability and brain endophenotype. We addressed the hypothesis that variability in the Peroxisome Proliferator Activated Receptor (PPAR) pathway would be related to brain development. We employed machine learning in an unsupervised, unbiased, combined analysis of whole-brain diffusion tractography together with genomewide, single-nucleotide polymorphism (SNP)-based genotypes from a cohort of 272 preterm infants, using Sparse Reduced Rank Regression (sRRR) and correcting for ethnicity and age at birth and imaging. Empirical selection frequencies for SNPs associated with cerebral connectivity ranged from 0.663 to zero, with multiple highly selected SNPs mapping to genes for PPARG (six SNPs), ITGA6 (four SNPs), and FXR1 (two SNPs). SNPs in PPARG were significantly overrepresented (ranked 7–11 and 67 of 556,000 SNPs; P < 2.2 × 10⁻⁷), and were mostly in introns or regulatory regions with predicted effects including protein coding and nonsense-mediated decay.
Edge-centric graph-theoretic analysis showed that highly selected white-matter tracts were consistent across the group and important for information transfer (P < 2.2 × 10⁻¹⁷); they most often connected to the insula (P < 6 × 10⁻¹⁷). These results suggest that the inhibited brain development seen in humans exposed to the stress of a premature extrauterine environment is modulated by genetic factors, and that PPARG signaling has a previously unrecognized role in cerebral development.
|
19
|
Characterizing Gene and Protein Crosstalks in Subjects at Risk of Developing Alzheimer’s Disease: A New Computational Approach. Processes (Basel) 2017. [DOI: 10.3390/pr5030047] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022] Open
|
20
|
Krishnan ML, Wang Z, Silver M, Boardman JP, Ball G, Counsell SJ, Walley AJ, Montana G, Edwards AD. Possible relationship between common genetic variation and white matter development in a pilot study of preterm infants. Brain Behav 2016; 6:e00434. [PMID: 27110435 PMCID: PMC4821839 DOI: 10.1002/brb3.434] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 09/08/2015] [Revised: 12/16/2015] [Accepted: 12/19/2015] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND The consequences of preterm birth are a major public health concern with high rates of ensuing multisystem morbidity, and uncertain biological mechanisms. Common genetic variation may mediate vulnerability to the insult of prematurity and provide opportunities to predict and modify risk. OBJECTIVE To gain novel biological and therapeutic insights from the integrated analysis of magnetic resonance imaging and genetic data, informed by prior knowledge. METHODS We apply our previously validated pathway-based statistical method and a novel network-based method to discover sources of common genetic variation associated with imaging features indicative of structural brain damage. RESULTS Lipid pathways were highly ranked by Pathways Sparse Reduced Rank Regression in a model examining the effect of prematurity, and PPAR (peroxisome proliferator-activated receptor) signaling was the highest ranked pathway once degree of prematurity was accounted for. Within the PPAR pathway, five genes were found by Graph Guided Group Lasso to be highly associated with the phenotype: aquaporin 7 (AQP7), malic enzyme 1, NADP(+)-dependent, cytosolic (ME1), perilipin 1 (PLIN1), solute carrier family 27 (fatty acid transporter), member 1 (SLC27A1), and acetyl-CoA acyltransferase 1 (ACAA1). Expression of four of these (ACAA1, AQP7, ME1, and SLC27A1) is controlled by a common transcription factor, early growth response 4 (EGR-4). CONCLUSIONS This suggests an important role for lipid pathways in influencing development of white matter in preterm infants, and in particular a significant role for interindividual genetic variation in PPAR signaling.
Affiliation(s)
- Michelle L Krishnan
- Centre for the Developing Brain King's College London St Thomas' Hospital London SE1 7EH UK
| | - Zi Wang
- Department of Biomedical Engineering King's College London St Thomas' Hospital London SE1 7EH UK
| | - Matt Silver
- Department of Population Health London School of Hygiene and Tropical Medicine London WC1E 7HT UK
| | - James P Boardman
- MRC Centre for Reproductive Health University of Edinburgh Edinburgh EH16 4TJ UK
| | - Gareth Ball
- Centre for the Developing Brain King's College London St Thomas' Hospital London SE1 7EH UK
| | - Serena J Counsell
- Centre for the Developing Brain King's College London St Thomas' Hospital London SE1 7EH UK
| | - Andrew J Walley
- School of Public Health Faculty of Medicine Imperial College London Norfolk Place London W2 1PG UK
| | - Giovanni Montana
- Department of Biomedical Engineering King's College London St Thomas' Hospital London SE1 7EH UK
| | - Anthony David Edwards
- Centre for the Developing Brain King's College London St Thomas' Hospital London SE1 7EH UK
| |
|
21
|
Edwards SM, Thomsen B, Madsen P, Sørensen P. Partitioning of genomic variance reveals biological pathways associated with udder health and milk production traits in dairy cattle. Genet Sel Evol 2015; 47:60. [PMID: 26169777 PMCID: PMC4499908 DOI: 10.1186/s12711-015-0132-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 09/23/2014] [Accepted: 06/12/2015] [Indexed: 12/20/2022] Open
Abstract
Background We have used a linear mixed model (LMM) approach to examine the joint contribution of genetic markers associated with a biological pathway. However, with these markers being scattered throughout the genome, we are faced with the challenge of modelling the contribution from several, sometimes even all, chromosomes at once. Due to linkage disequilibrium (LD), all markers may be assumed to account for some genomic variance; the question is whether random sets of markers account for the same genomic variance as markers associated with a biological pathway. Results We applied the LMM approach to identify biological pathways associated with udder health and milk production traits in dairy cattle. A random gene sampling procedure was applied to assess the biological pathways in a dataset that has an inherently complex genetic correlation pattern due to the population structure of dairy cattle, and to linkage disequilibrium within the bovine genome and within the genes associated with the biological pathway. Conclusions Several biological pathways that were significantly associated with health and production traits were identified in dairy cattle; i.e. the markers linked to these pathways explained more of the genomic variance and provided a better model fit than 95% of the randomly sampled gene groups. Our results show that immune related pathways are associated with production traits, and that pathways that include a causal marker for production traits are identified with our procedure. We are confident that the LMM approach provides a general framework to exploit and integrate prior biological information and could potentially lead to improved understanding of the genetic architecture of complex traits and diseases. Electronic supplementary material The online version of this article (doi:10.1186/s12711-015-0132-6) contains supplementary material, which is available to authorized users.
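The comparison against randomly sampled gene groups described in the abstract above amounts to ranking a pathway's statistic within an empirical null distribution. A minimal sketch follows; it is not the authors' procedure. In the paper each random set requires fitting an LMM, whereas here the statistic is a hypothetical placeholder (a sum of made-up per-marker contributions), and all names and data are assumptions.

```python
import random


def random_set_null(marker_values, set_size, statistic, n_sets=1000, seed=0):
    """Statistic evaluated on n_sets randomly drawn marker sets of equal size."""
    rng = random.Random(seed)
    return [statistic(rng.sample(marker_values, set_size))
            for _ in range(n_sets)]


def empirical_rank(pathway_stat, null_stats):
    """Fraction of random sets whose statistic is below the pathway's."""
    return sum(s < pathway_stat for s in null_stats) / len(null_stats)


# Toy data (hypothetical): 95 uninformative markers and 5 informative ones
# that together make up the pathway of interest.
contrib = [0.0] * 95 + [1.0] * 5
pathway_stat = sum(contrib[95:])
null = random_set_null(contrib, 5, sum, n_sets=1000)
rank = empirical_rank(pathway_stat, null)
```

A pathway would be flagged when its rank exceeds a chosen level such as 0.95, matching the "better than 95% of the randomly sampled gene groups" criterion in the abstract.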
Affiliation(s)
- Stefan M Edwards
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Blichers Allé 20, P.O. Box 50, Tjele, DK-8830, Denmark.
| | - Bo Thomsen
- Department of Molecular Biology and Genetics, Aarhus University, Blichers Allé 20, P.O. Box 50, Tjele, DK-8830, Denmark.
| | - Per Madsen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Blichers Allé 20, P.O. Box 50, Tjele, DK-8830, Denmark.
| | - Peter Sørensen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Blichers Allé 20, P.O. Box 50, Tjele, DK-8830, Denmark.
| |
|
22
|
Weiner MW, Veitch DP, Aisen PS, Beckett LA, Cairns NJ, Cedarbaum J, Green RC, Harvey D, Jack CR, Jagust W, Luthman J, Morris JC, Petersen RC, Saykin AJ, Shaw L, Shen L, Schwarz A, Toga AW, Trojanowski JQ. 2014 Update of the Alzheimer's Disease Neuroimaging Initiative: A review of papers published since its inception. Alzheimers Dement 2015; 11:e1-120. [PMID: 26073027 PMCID: PMC5469297 DOI: 10.1016/j.jalz.2014.11.001] [Citation(s) in RCA: 203] [Impact Index Per Article: 22.6] [Revised: 04/18/2013] [Indexed: 01/18/2023]
Abstract
The Alzheimer's Disease Neuroimaging Initiative (ADNI) is an ongoing, longitudinal, multicenter study designed to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of Alzheimer's disease (AD). The initial study, ADNI-1, enrolled 400 subjects with early mild cognitive impairment (MCI), 200 with early AD, and 200 cognitively normal elderly controls. ADNI-1 was extended by a 2-year Grand Opportunities grant in 2009 and by a competitive renewal, ADNI-2, which enrolled an additional 550 participants and will run until 2015. This article reviews all papers published since the inception of the initiative and summarizes the results to the end of 2013. The major accomplishments of ADNI have been as follows: (1) the development of standardized methods for clinical tests, magnetic resonance imaging (MRI), positron emission tomography (PET), and cerebrospinal fluid (CSF) biomarkers in a multicenter setting; (2) elucidation of the patterns and rates of change of imaging and CSF biomarker measurements in control subjects, MCI patients, and AD patients. CSF biomarkers are largely consistent with disease trajectories predicted by β-amyloid cascade (Hardy, J Alzheimer's Dis 2006;9(Suppl 3):151-3) and tau-mediated neurodegeneration hypotheses for AD, whereas brain atrophy and hypometabolism levels show predicted patterns but exhibit differing rates of change depending on region and disease severity; (3) the assessment of alternative methods of diagnostic categorization. Currently, the best classifiers select and combine optimum features from multiple modalities, including MRI, [(18)F]-fluorodeoxyglucose-PET, amyloid PET, CSF biomarkers, and clinical tests; (4) the development of blood biomarkers for AD as potentially noninvasive and low-cost alternatives to CSF biomarkers for AD diagnosis and the assessment of α-syn as an additional biomarker; (5) the development of methods for the early detection of AD. 
CSF biomarkers, β-amyloid 42 and tau, as well as amyloid PET may reflect the earliest steps in AD pathology in mildly symptomatic or even nonsymptomatic subjects and are leading candidates for the detection of AD in its preclinical stages; (6) the improvement of clinical trial efficiency through the identification of subjects most likely to undergo imminent future clinical decline and the use of more sensitive outcome measures to reduce sample sizes. Multimodal methods incorporating APOE status and longitudinal MRI proved most highly predictive of future decline. Refinements of clinical tests used as outcome measures such as clinical dementia rating-sum of boxes further reduced sample sizes; (7) the pioneering of genome-wide association studies that leverage quantitative imaging and biomarker phenotypes, including longitudinal data, to confirm recently identified loci, CR1, CLU, and PICALM and to identify novel AD risk loci; (8) worldwide impact through the establishment of ADNI-like programs in Japan, Australia, Argentina, Taiwan, China, Korea, Europe, and Italy; (9) understanding the biology and pathobiology of normal aging, MCI, and AD through integration of ADNI biomarker and clinical data to stimulate research that will resolve controversies about competing hypotheses on the etiopathogenesis of AD, thereby advancing efforts to find disease-modifying drugs for AD; and (10) the establishment of infrastructure to allow sharing of all raw and processed data without embargo to interested scientific investigators throughout the world.
Affiliation(s)
- Michael W Weiner
- Department of Veterans Affairs Medical Center, Center for Imaging of Neurodegenerative Diseases, San Francisco, CA, USA; Department of Radiology, University of California, San Francisco, CA, USA; Department of Medicine, University of California, San Francisco, CA, USA; Department of Psychiatry, University of California, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, CA, USA.
| | - Dallas P Veitch
- Department of Veterans Affairs Medical Center, Center for Imaging of Neurodegenerative Diseases, San Francisco, CA, USA
| | - Paul S Aisen
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA
| | - Laurel A Beckett
- Division of Biostatistics, Department of Public Health Sciences, University of California, Davis, CA, USA
| | - Nigel J Cairns
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, Saint Louis, MO, USA; Department of Neurology, Washington University School of Medicine, Saint Louis, MO, USA
| | - Jesse Cedarbaum
- Neurology Early Clinical Development, Biogen Idec, Cambridge, MA, USA
| | - Robert C Green
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Danielle Harvey
- Division of Biostatistics, Department of Public Health Sciences, University of California, Davis, CA, USA
| | | | - William Jagust
- Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA
| | - Johan Luthman
- Neuroscience Clinical Development, Neuroscience & General Medicine Product Creation Unit, Eisai Inc., Philadelphia, PA, USA
| | - John C Morris
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA
| | | | - Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Leslie Shaw
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Li Shen
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Adam Schwarz
- Tailored Therapeutics, Eli Lilly and Company, Indianapolis, IN, USA
| | - Arthur W Toga
- Laboratory of Neuroimaging, Institute of Neuroimaging and Informatics, Keck School of Medicine of University of Southern California, Los Angeles, CA, USA
| | - John Q Trojanowski
- Institute on Aging, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Alzheimer's Disease Core Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Udall Parkinson's Research Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Pathology and Laboratory Medicine, Center for Neurodegenerative Research, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
23
|
Lin D, Cao H, Calhoun VD, Wang YP. Sparse models for correlative and integrative analysis of imaging and genetic data. J Neurosci Methods 2014; 237:69-78. [PMID: 25218561 DOI: 10.1016/j.jneumeth.2014.09.001] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 05/05/2014] [Revised: 08/27/2014] [Accepted: 09/01/2014] [Indexed: 11/29/2022]
Abstract
The development of advanced medical imaging technologies and high-throughput genomic measurements has enhanced our ability to understand their interplay as well as their relationship with human behavior by integrating these two types of datasets. However, the high dimensionality and heterogeneity of these datasets present a challenge to conventional statistical methods; there is a high demand for the development of both correlative and integrative analysis approaches. Here, we review our recent work on developing sparse representation based approaches to address this challenge. We show how sparse models are applied to the correlation and integration of imaging and genetic data for biomarker identification. We present examples of how these approaches are used for the detection of risk genes and classification of complex diseases such as schizophrenia. Finally, we discuss future directions on the integration of multiple imaging and genomic datasets including their interactions such as epistasis.
Affiliation(s)
- Dongdong Lin
- Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USA; Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA.
| | - Hongbao Cao
- Unit on Statistical Genomics, Intramural Program of Research, National Institute of Mental Health, NIH, Bethesda 20852, USA.
| | - Vince D Calhoun
- The Mind Research Network & LBERI, Albuquerque, NM 87106, USA; Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131, USA.
| | - Yu-Ping Wang
- Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USA; Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA.
|
24
|
Shen L, Thompson PM, Potkin SG, Bertram L, Farrer LA, Foroud TM, Green RC, Hu X, Huentelman MJ, Kim S, Kauwe JSK, Li Q, Liu E, Macciardi F, Moore JH, Munsie L, Nho K, Ramanan VK, Risacher SL, Stone DJ, Swaminathan S, Toga AW, Weiner MW, Saykin AJ. Genetic analysis of quantitative phenotypes in AD and MCI: imaging, cognition and biomarkers. Brain Imaging Behav 2014; 8:183-207. [PMID: 24092460 PMCID: PMC3976843 DOI: 10.1007/s11682-013-9262-z] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Indexed: 01/18/2023]
Abstract
The Genetics Core of the Alzheimer's Disease Neuroimaging Initiative (ADNI), formally established in 2009, aims to provide resources and facilitate research related to genetic predictors of multidimensional Alzheimer's disease (AD)-related phenotypes. Here, we provide a systematic review of genetic studies published between 2009 and 2012 where either ADNI APOE genotype or genome-wide association study (GWAS) data were used. We review and synthesize ADNI genetic associations with disease status or quantitative disease endophenotypes including structural and functional neuroimaging, fluid biomarker assays, and cognitive performance. We also discuss the diverse analytical strategies used in these studies, including univariate and multivariate analysis, meta-analysis, pathway analysis, and interaction and network analysis. Finally, we perform pathway and network enrichment analyses of these ADNI genetic associations to highlight key mechanisms that may drive disease onset and trajectory. Major ADNI findings included all the top 10 AD genes and several of these (e.g., APOE, BIN1, CLU, CR1, and PICALM) were corroborated by ADNI imaging, fluid and cognitive phenotypes. ADNI imaging genetics studies discovered novel findings (e.g., FRMD6) that were later replicated on different data sets. Several other genes (e.g., APOC1, FTO, GRIN2B, MAGI2, and TOMM40) were associated with multiple ADNI phenotypes, warranting further investigation on other data sets. The broad availability and wide scope of ADNI genetic and phenotypic data has advanced our understanding of the genetic basis of AD and has nominated novel targets for future studies employing next-generation sequencing and convergent multi-omics approaches, and for clinical drug and biomarker development.
Affiliation(s)
- Li Shen
- Center for Neuroimaging and Indiana Alzheimer’s Disease Center, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, 355 W 16th Street, Suite 4100, Indianapolis, IN 46202 USA
| | - Paul M. Thompson
- Imaging Genetics Center, Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA 90095 USA
| | - Steven G. Potkin
- Department of Psychiatry and Human Behavior, University of California Irvine, Irvine, CA 92617 USA
| | - Lars Bertram
- Neuropsychiatric Genetics Group, Max-Planck Institute for Molecular Genetics, Berlin, Germany
| | - Lindsay A. Farrer
- Biomedical Genetics L320, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118 USA
| | - Tatiana M. Foroud
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202 USA
| | - Robert C. Green
- Division of Genetics and Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115 USA
| | - Xiaolan Hu
- Clinical Genetics, Exploratory Clinical & Translational Research, Bristol-Myers Squibb, Pennington, NJ 08534 USA
| | - Matthew J. Huentelman
- Neurogenomics Division, The Translational Genomics Research Institute, Phoenix, AZ 85004 USA
| | - Sungeun Kim
- Center for Neuroimaging and Indiana Alzheimer’s Disease Center, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, 355 W 16th Street, Suite 4100, Indianapolis, IN 46202 USA
| | - John S. K. Kauwe
- Departments of Biology, Neuroscience, Brigham Young University, 675 WIDB, Provo, UT 84602 USA
| | - Qingqin Li
- Department of Neuroscience Biomarkers, Janssen Research and Development, LLC, Raritan, NJ 08869 USA
| | - Enchi Liu
- Biomarker Discovery, Janssen Alzheimer Immunotherapy Research and Development, LLC, South San Francisco, CA 94080 USA
| | - Fabio Macciardi
- Department of Psychiatry and Human Behavior, University of California Irvine, Irvine, CA 92617 USA
- Department of Sciences and Biomedical Technologies, University of Milan, Segrate, MI Italy
| | - Jason H. Moore
- Department of Genetics, Computational Genetics Laboratory, Dartmouth Medical School, Lebanon, NH 03756 USA
| | - Leanne Munsie
- Tailored Therapeutics, Eli Lilly and Company, Indianapolis, IN 46285 USA
| | - Kwangsik Nho
- Center for Neuroimaging and Indiana Alzheimer’s Disease Center, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, 355 W 16th Street, Suite 4100, Indianapolis, IN 46202 USA
| | - Vijay K. Ramanan
- Center for Neuroimaging and Indiana Alzheimer’s Disease Center, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, 355 W 16th Street, Suite 4100, Indianapolis, IN 46202 USA
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202 USA
| | - Shannon L. Risacher
- Center for Neuroimaging and Indiana Alzheimer’s Disease Center, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, 355 W 16th Street, Suite 4100, Indianapolis, IN 46202 USA
| | - David J. Stone
- Merck Research Laboratories, 770 Sumneytown Pike, WP53B-120, West Point, PA 19486 USA
| | - Shanker Swaminathan
- Center for Neuroimaging and Indiana Alzheimer’s Disease Center, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, 355 W 16th Street, Suite 4100, Indianapolis, IN 46202 USA
| | - Arthur W. Toga
- Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA 90095 USA
| | - Michael W. Weiner
- Departments of Radiology, Medicine and Psychiatry, UC San Francisco, San Francisco, CA 94143 USA
| | - Andrew J. Saykin
- Center for Neuroimaging and Indiana Alzheimer’s Disease Center, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, 355 W 16th Street, Suite 4100, Indianapolis, IN 46202 USA
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202 USA
| | - for the Alzheimer’s Disease Neuroimaging Initiative
|
25
|
Silver M, Chen P, Li R, Cheng CY, Wong TY, Tai ES, Teo YY, Montana G. Pathways-driven sparse regression identifies pathways and genes associated with high-density lipoprotein cholesterol in two Asian cohorts. PLoS Genet 2013; 9:e1003939. [PMID: 24278029 PMCID: PMC3836716 DOI: 10.1371/journal.pgen.1003939] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 03/05/2013] [Accepted: 09/11/2013] [Indexed: 01/11/2023]
Abstract
Standard approaches to data analysis in genome-wide association studies (GWAS) ignore any potential functional relationships between gene variants. In contrast gene pathways analysis uses prior information on functional structure within the genome to identify pathways associated with a trait of interest. In a second step, important single nucleotide polymorphisms (SNPs) or genes may be identified within associated pathways. The pathways approach is motivated by the fact that genes do not act alone, but instead have effects that are likely to be mediated through their interaction in gene pathways. Where this is the case, pathways approaches may reveal aspects of a trait's genetic architecture that would otherwise be missed when considering SNPs in isolation. Most pathways methods begin by testing SNPs one at a time, and so fail to capitalise on the potential advantages inherent in a multi-SNP, joint modelling approach. Here, we describe a dual-level, sparse regression model for the simultaneous identification of pathways and genes associated with a quantitative trait. Our method takes account of various factors specific to the joint modelling of pathways with genome-wide data, including widespread correlation between genetic predictors, and the fact that variants may overlap multiple pathways. We use a resampling strategy that exploits finite sample variability to provide robust rankings for pathways and genes. We test our method through simulation, and use it to perform pathways-driven gene selection in a search for pathways and genes associated with variation in serum high-density lipoprotein cholesterol levels in two separate GWAS cohorts of Asian adults. By comparing results from both cohorts we identify a number of candidate pathways including those associated with cardiomyopathy, and T cell receptor and PPAR signalling. 
Highlighted genes include those associated with the L-type calcium channel, adenylate cyclase, integrin, laminin, MAPK signalling and immune function.
Affiliation(s)
- Matt Silver
- Statistics Section, Department of Mathematics, Imperial College, London, United Kingdom
- MRC International Nutrition Group, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Peng Chen
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| | - Ruoying Li
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Ching-Yu Cheng
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Department of Ophthalmology, National University of Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Center, Singapore
| | - Tien-Yin Wong
- Department of Ophthalmology, National University of Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Center, Singapore
| | - E-Shyong Tai
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Yik-Ying Teo
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore
- Life Sciences Institute, National University of Singapore, Singapore
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
| | - Giovanni Montana
- Statistics Section, Department of Mathematics, Imperial College, London, United Kingdom
|
26
|
Yang R, Li H, Fu L, Liu Y. An efficient approach to large-scale genotype-phenotype association analyses. Brief Bioinform 2013; 15:814-22. [PMID: 23990269 DOI: 10.1093/bib/bbt061] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Abstract
Modern molecular biotechnology generates a great deal of intermediate information, such as transcriptional and metabolic products in bridging DNA and complex traits. In genome-wide linkage analysis and genome-wide association studies, regression analysis for large-scale correlated phenotypes is applied to map genes for those by-products that are regarded as quantitative traits. For a single trait, the least absolute shrinkage and selection operator with a coordinate descent step can be employed to efficiently shrink sparse non-zero genetic effects of quantitative trait loci (QTLs). However, regression analyses on a trait-by-trait basis do not take account of the correlations among the analyzed traits. In this study, the conditional phenotype of each trait is defined, given the other traits. Large-scale genotype-phenotype association analyses are therefore transformed into separate genotype-conditional phenotype analyses. Meanwhile, the correlation architecture between each trait and the other traits can also be provided by shrinkage estimation for each conditional phenotype. Simulation demonstrates that the proposed conditional mapping method is generally identical to the joint mapping method based on multivariate analysis in terms of statistical detection power and parameter estimation. An application of the method is provided to locate eQTL in yeast.
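As a rough illustration of the single-trait building block this abstract mentions (the lasso fitted by coordinate descent), the following minimal sketch may help. It is not the authors' conditional-phenotype method: the function names, the objective scaling (1/(2n)) and the fixed iteration count are all assumptions made for this example.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of p floats); y is a list of n floats.
    Hypothetical helper written for illustration only.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the partial residual
            # (the residual with column j's current contribution added back).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta
```

With an orthogonal toy design such as `X = [[1, 0], [0, 1], [1, 0], [0, 1]]` and `y = [2, 0.1, 2, 0.1]`, calling `lasso_coordinate_descent(X, y, 0.1)` shrinks the strong first effect and sets the weak second effect exactly to zero, which is the sparsity behaviour the abstract relies on.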
|
27
|
Weiner MW, Veitch DP, Aisen PS, Beckett LA, Cairns NJ, Green RC, Harvey D, Jack CR, Jagust W, Liu E, Morris JC, Petersen RC, Saykin AJ, Schmidt ME, Shaw L, Shen L, Siuciak JA, Soares H, Toga AW, Trojanowski JQ. The Alzheimer's Disease Neuroimaging Initiative: a review of papers published since its inception. Alzheimers Dement 2013; 9:e111-94. [PMID: 23932184 DOI: 10.1016/j.jalz.2013.05.1769] [Citation(s) in RCA: 308] [Impact Index Per Article: 28.0] [Revised: 04/18/2013] [Indexed: 01/19/2023]
Abstract
The Alzheimer's Disease Neuroimaging Initiative (ADNI) is an ongoing, longitudinal, multicenter study designed to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of Alzheimer's disease (AD). The study aimed to enroll 400 subjects with early mild cognitive impairment (MCI), 200 subjects with early AD, and 200 normal control subjects; $67 million funding was provided by both the public and private sectors, including the National Institute on Aging, 13 pharmaceutical companies, and 2 foundations that provided support through the Foundation for the National Institutes of Health. This article reviews all papers published since the inception of the initiative and summarizes the results as of February 2011. The major accomplishments of ADNI have been as follows: (1) the development of standardized methods for clinical tests, magnetic resonance imaging (MRI), positron emission tomography (PET), and cerebrospinal fluid (CSF) biomarkers in a multicenter setting; (2) elucidation of the patterns and rates of change of imaging and CSF biomarker measurements in control subjects, MCI patients, and AD patients. CSF biomarkers are consistent with disease trajectories predicted by β-amyloid cascade (Hardy, J Alzheimers Dis 2006;9(Suppl 3):151-3) and tau-mediated neurodegeneration hypotheses for AD, whereas brain atrophy and hypometabolism levels show predicted patterns but exhibit differing rates of change depending on region and disease severity; (3) the assessment of alternative methods of diagnostic categorization. Currently, the best classifiers combine optimum features from multiple modalities, including MRI, [(18)F]-fluorodeoxyglucose-PET, CSF biomarkers, and clinical tests; (4) the development of methods for the early detection of AD. 
CSF biomarkers, β-amyloid 42 and tau, as well as amyloid PET may reflect the earliest steps in AD pathology in mildly symptomatic or even nonsymptomatic subjects, and are leading candidates for the detection of AD in its preclinical stages; (5) the improvement of clinical trial efficiency through the identification of subjects most likely to undergo imminent future clinical decline and the use of more sensitive outcome measures to reduce sample sizes. Baseline cognitive and/or MRI measures generally predicted future decline better than other modalities, whereas MRI measures of change were shown to be the most efficient outcome measures; (6) the confirmation of the AD risk loci CLU, CR1, and PICALM and the identification of novel candidate risk loci; (7) worldwide impact through the establishment of ADNI-like programs in Europe, Asia, and Australia; (8) understanding the biology and pathobiology of normal aging, MCI, and AD through integration of ADNI biomarker data with clinical data from ADNI to stimulate research that will resolve controversies about competing hypotheses on the etiopathogenesis of AD, thereby advancing efforts to find disease-modifying drugs for AD; and (9) the establishment of infrastructure to allow sharing of all raw and processed data without embargo to interested scientific investigators throughout the world. The ADNI study was extended by a 2-year Grand Opportunities grant in 2009 and a renewal of ADNI (ADNI-2) in October 2010 through to 2016, with enrollment of an additional 550 participants.
Affiliation(s)
- Michael W Weiner
- Department of Veterans Affairs Medical Center, Center for Imaging of Neurodegenerative Diseases, San Francisco, CA, USA.
|
28
|
Ayers KL, Cordell HJ. Identification of grouped rare and common variants via penalized logistic regression. Genet Epidemiol 2013; 37:592-602. [PMID: 23836590 PMCID: PMC3842118 DOI: 10.1002/gepi.21746] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 12/20/2012] [Revised: 05/24/2013] [Accepted: 05/24/2013] [Indexed: 11/09/2022]
Abstract
In spite of the success of genome-wide association studies in finding many common variants associated with disease, these variants seem to explain only a small proportion of the estimated heritability. Data collection has turned toward exome and whole genome sequencing, but it is well known that single marker methods frequently used for common variants have low power to detect rare variants associated with disease, even with very large sample sizes. In response, a variety of methods have been developed that attempt to cluster rare variants so that they may gather strength from one another under the premise that there may be multiple causal variants within a gene. Most of these methods group variants by gene or proximity, and test one gene or marker window at a time. We propose a penalized regression method (PeRC) that analyzes all genes at once, allowing grouping of all (rare and common) variants within a gene, along with subgrouping of the rare variants, thus borrowing strength from both rare and common variants within the same gene. The method can incorporate either a burden-based weighting of the rare variants or one in which the weights are data driven. In simulations, our method performs favorably when compared to many previously proposed approaches, including its predecessor, the sparse group lasso [Friedman et al., 2010].
Affiliation(s)
- Kristin L Ayers
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne NE1 3BZ, United Kingdom.
|
29
|
Lehne B, Schlitt T. Breaking free from the chains of pathway annotation: de novo pathway discovery for the analysis of disease processes. Pharmacogenomics 2013; 13:1967-78. [PMID: 23215889 DOI: 10.2217/pgs.12.170] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/14/2023]
Abstract
Interpreting the biological implications of high-throughput experiments such as gene-expression studies, genome-wide association studies and large-scale sequencing studies is not trivial. Gene-set and pathway analyses are useful tools to support the interpretation of such experiments, but rely on curated pathways or gene sets. The recent development of de novo pathway discovery methods aims to overcome this limitation. This article provides an overview of the methods currently available and reviews the advantages and challenges of this approach. In detail, it highlights the particular issues of de novo pathway discovery based on genome-wide association studies data, for which multiple different strategies have been proposed.
Affiliation(s)
- Benjamin Lehne
- Bioinformatics Group, Department of Medical & Molecular Genetics, 8th Floor Tower Wing Guy's Hospital, London SE1 9RT, UK
|
30
|
Silver M, Janousova E, Hua X, Thompson PM, Montana G. Identification of gene pathways implicated in Alzheimer's disease using longitudinal imaging phenotypes with sparse regression. Neuroimage 2012; 63:1681-94. [PMID: 22982105 PMCID: PMC3549495 DOI: 10.1016/j.neuroimage.2012.08.002] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Received: 04/10/2012] [Revised: 08/01/2012] [Accepted: 08/03/2012] [Indexed: 02/04/2023]
Abstract
We present a new method for the detection of gene pathways associated with a multivariate quantitative trait, and use it to identify causal pathways associated with an imaging endophenotype characteristic of longitudinal structural change in the brains of patients with Alzheimer's disease (AD). Our method, known as pathways sparse reduced-rank regression (PsRRR), uses group lasso penalised regression to jointly model the effects of genome-wide single nucleotide polymorphisms (SNPs), grouped into functional pathways using prior knowledge of gene-gene interactions. Pathways are ranked in order of importance using a resampling strategy that exploits finite sample variability. Our application study uses whole genome scans and MR images from 99 probable AD patients and 164 healthy elderly controls in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. 66,182 SNPs are mapped to 185 gene pathways from the KEGG pathway database. Voxel-wise imaging signatures characteristic of AD are obtained by analysing 3D patterns of structural change at 6, 12 and 24 months relative to baseline. High-ranking, AD endophenotype-associated pathways in our study include those describing insulin signalling, vascular smooth muscle contraction and focal adhesion. All of these have been previously implicated in AD biology. In a secondary analysis, we investigate SNPs and genes that may be driving pathway selection. High ranking genes include a number previously linked in gene expression studies to β-amyloid plaque formation in the AD brain (PIK3R3, PIK3CG, PRKCA and PRKCB), and to AD related changes in hippocampal gene expression (ADCY2, ACTN1, ACACA, and GNAI1). Other high ranking previously validated AD endophenotype-related genes include CR1, TOMM40 and APOE.
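The abstract does not spell out the resampling scheme, so the following is only a generic sketch of one common way to rank features by selection frequency across random subsamples. The function name, the `select` callback, the half-sample fraction and the number of resamples are all assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import Counter


def rank_by_selection_frequency(n_samples, select, n_resamples=100, fraction=0.5, seed=0):
    """Rank features by how often `select` picks them across random subsamples.

    `select` is a caller-supplied (hypothetical) function that takes a list of
    sample indices and returns an iterable of selected feature identifiers,
    for example the pathways with nonzero coefficients in a sparse fit.
    """
    rng = random.Random(seed)
    size = max(1, int(fraction * n_samples))
    counts = Counter()
    for _ in range(n_resamples):
        subsample = rng.sample(range(n_samples), size)  # drawn without replacement
        counts.update(set(select(subsample)))  # count each feature once per resample
    # Most frequently selected first; ties broken by the feature's string form.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return [(feature, count / n_resamples) for feature, count in ranked]
```

In use, `select` would wrap whatever sparse model is being fitted. Features chosen in nearly every subsample rise to the top, while features whose selection depends on a few individual samples receive lower frequencies.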
Affiliation(s)
- Matt Silver
- Statistics Section, Department of Mathematics, Imperial College London, UK
| | - Eva Janousova
- Statistics Section, Department of Mathematics, Imperial College London, UK
- Institute of Biostatistics and Analyses, Masaryk University, Brno, Czech Republic
| | - Xue Hua
- Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA, USA
| | - Paul M. Thompson
- Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA, USA
| | - Giovanni Montana
- Statistics Section, Department of Mathematics, Imperial College London, UK
- Corresponding author.
|
31
|
Vounou M, Janousova E, Wolz R, Stein JL, Thompson PM, Rueckert D, Montana G. Sparse reduced-rank regression detects genetic associations with voxel-wise longitudinal phenotypes in Alzheimer's disease. Neuroimage 2011; 60:700-16. [PMID: 22209813 DOI: 10.1016/j.neuroimage.2011.12.029] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Received: 08/02/2011] [Revised: 11/18/2011] [Accepted: 12/14/2011] [Indexed: 11/17/2022]
Abstract
Scanning the entire genome in search of variants related to imaging phenotypes holds great promise in elucidating the genetic etiology of neurodegenerative disorders. Here we discuss the application of a penalized multivariate model, sparse reduced-rank regression (sRRR), for the genome-wide detection of markers associated with voxel-wise longitudinal changes in the brain caused by Alzheimer's disease (AD). Using a sample from the Alzheimer's Disease Neuroimaging Initiative database, we performed three separate studies that each compared two groups of individuals to identify genes associated with disease development and progression. For each comparison we took a two-step approach: initially, using penalized linear discriminant analysis, we identified voxels that provide an imaging signature of the disease with high classification accuracy; then we used this multivariate biomarker as a phenotype in a genome-wide association study, carried out using sRRR. The genetic markers were ranked in order of importance of association to the phenotypes using a data re-sampling approach. Our findings confirmed the key role of the APOE and TOMM40 genes but also highlighted some novel potential associations with AD.
Affiliation(s)
- Maria Vounou
- Statistics Section, Department of Mathematics, Imperial College London, UK
|