1. Cheek CL, Lindner P, Grigorenko EL. Statistical and Machine Learning Analysis in Brain-Imaging Genetics: A Review of Methods. Behav Genet 2024; 54:233-251. [PMID: 38336922 DOI: 10.1007/s10519-024-10177-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 01/24/2024] [Indexed: 02/12/2024]
Abstract
Brain-imaging-genetic analysis is an emerging field of research that aims at aggregating data from neuroimaging modalities, which characterize brain structure or function, and genetic data, which capture the structure and function of the genome, to explain or predict normal (or abnormal) brain performance. Brain-imaging-genetic studies offer great potential for understanding complex brain-related diseases/disorders of genetic etiology. Still, a combined brain-wide genome-wide analysis is difficult to perform as typical datasets fuse multiple modalities, each with high dimensionality, unique correlational landscapes, and often low statistical signal-to-noise ratios. In this review, we outline the progress in brain-imaging-genetic methodologies starting from early massive univariate to current deep learning approaches, highlighting each approach's strengths and weaknesses and elongating it with the field's development. We conclude by discussing selected remaining challenges and prospects for the field.
Affiliation(s)
- Connor L Cheek: Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA; Department of Physics, University of Houston, Houston, TX, USA.
- Peggy Lindner: Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA; Department of Information Science Technology, University of Houston, Houston, TX, USA.
- Elena L Grigorenko: Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA; Department of Psychology, University of Houston, Houston, TX, USA; Baylor College of Medicine, Houston, TX, USA; Sirius University of Science and Technology, Sochi, Russia.
2. Vergara-Jaramillo KT, Medina-Sánchez CE, Mora-Rojas AF, Carrillo-Tete D, Bolaño-Romero MP. Letter to the editor regarding "Cardiovascular and cerebrovascular events in patients with intracerebral hemorrhage: Clinical characteristics and long-term predictors". J Clin Neurosci 2021; 93:284-285. [PMID: 34391621 DOI: 10.1016/j.jocn.2021.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2021] [Accepted: 08/02/2021] [Indexed: 10/20/2022]
Affiliation(s)
- Maria Paz Bolaño-Romero: Colombian Clinical Research Group in Neurocritical Care, School of Medicine, University of Cartagena, Cartagena, Colombia.
3. Antonakakis M, Zervakis M, van Beijsterveldt CE, Boomsma DI, De Geus EJ, Micheloyannis S, Smit DJ. Genetic effects on source level evoked and induced oscillatory brain responses in a visual oddball task. Biol Psychol 2016; 114:69-80. [DOI: 10.1016/j.biopsycho.2015.12.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/11/2014] [Revised: 11/28/2015] [Accepted: 12/22/2015] [Indexed: 12/31/2022]
4. Grellmann C, Bitzer S, Neumann J, Westlye LT, Andreassen OA, Villringer A, Horstmann A. Comparison of variants of canonical correlation analysis and partial least squares for combined analysis of MRI and genetic data. Neuroimage 2014; 107:289-310. [PMID: 25527238 DOI: 10.1016/j.neuroimage.2014.12.025] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 05/28/2014] [Revised: 11/24/2014] [Accepted: 12/09/2014] [Indexed: 01/31/2023]
Abstract
The standard analysis approach in neuroimaging genetics studies is the mass-univariate linear modeling (MULM) approach. From a statistical view, however, this approach is disadvantageous, as it is computationally intensive, cannot account for complex multivariate relationships, and has to be corrected for multiple testing. In contrast, multivariate methods offer the opportunity to include combined information from multiple variants to discover meaningful associations between genetic and brain imaging data. We assessed three multivariate techniques, partial least squares correlation (PLSC), sparse canonical correlation analysis (sparse CCA) and Bayesian inter-battery factor analysis (Bayesian IBFA), with respect to their ability to detect multivariate genotype-phenotype associations. Our goal was to systematically compare these three approaches with respect to their performance and to assess their suitability for high-dimensional and multi-collinearly dependent data as is the case in neuroimaging genetics studies. In a series of simulations using both linearly independent and multi-collinear data, we show that sparse CCA and PLSC are suitable even for very high-dimensional collinear imaging data sets. Among those two, the predictive power was higher for sparse CCA when voxel numbers were below 400 times sample size and candidate SNPs were considered. Accordingly, we recommend Sparse CCA for candidate phenotype, candidate SNP studies. When voxel numbers exceeded 500 times sample size, the predictive power was the highest for PLSC. Therefore, PLSC can be considered a promising technique for multivariate modeling of high-dimensional brain-SNP-associations. In contrast, Bayesian IBFA cannot be recommended, since additional post-processing steps were necessary to detect causal relations. To verify the applicability of sparse CCA and PLSC, we applied them to an experimental imaging genetics data set provided for us. Most importantly, application of both methods replicated the findings of this data set.
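As an illustration of the PLSC idea named in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes PLSC is taken in its common form: a singular value decomposition of the cross-covariance matrix between a genotype block and an imaging block. For brevity it recovers only the leading singular pair, by power iteration, and uses covariances rather than standardized correlations. The toy data, the variable names, and the planted SNP-to-voxel effect are all hypothetical.

```python
import random

def center(column):
    """Subtract the column mean so covariances reduce to dot products."""
    mean = sum(column) / len(column)
    return [value - mean for value in column]

def cross_covariance(x_cols, y_cols):
    """Return R with R[i][j] = sample covariance of X column i and Y column j."""
    n = len(x_cols[0])
    return [[sum(a * b for a, b in zip(x, y)) / (n - 1) for y in y_cols]
            for x in x_cols]

def unit(vector):
    """Scale a vector to unit Euclidean length."""
    norm = sum(a * a for a in vector) ** 0.5
    return [a / norm for a in vector]

def leading_singular_pair(r, iterations=200):
    """Power iteration for the first left/right singular vectors of R."""
    rows, cols = len(r), len(r[0])
    v = unit([1.0] * cols)
    for _ in range(iterations):
        u = unit([sum(r[i][j] * v[j] for j in range(cols)) for i in range(rows)])
        v = unit([sum(r[i][j] * u[i] for i in range(rows)) for j in range(cols)])
    s = sum(u[i] * r[i][j] * v[j] for i in range(rows) for j in range(cols))
    return u, s, v

# Hypothetical toy data: 60 subjects, 4 SNPs coded 0/1/2, 5 voxel measures.
rng = random.Random(0)
n_subjects, n_snps, n_voxels = 60, 4, 5
snps = [[float(rng.choice([0, 1, 2])) for _ in range(n_subjects)]
        for _ in range(n_snps)]
voxels = [[rng.gauss(0.0, 0.5) for _ in range(n_subjects)]
          for _ in range(n_voxels)]
# Planted association (an assumption for the demo): voxel 0 depends on SNP 1.
voxels[0] = [2.0 * g + rng.gauss(0.0, 0.1) for g in snps[1]]

x_cols = [center(c) for c in snps]
y_cols = [center(c) for c in voxels]
u, s, v = leading_singular_pair(cross_covariance(x_cols, y_cols))

# The largest absolute weights point at the SNP and voxel driving the pattern.
top_snp = max(range(n_snps), key=lambda i: abs(u[i]))
top_voxel = max(range(n_voxels), key=lambda j: abs(v[j]))
print("leading singular value:", round(s, 3))
print("top SNP index:", top_snp, "| top voxel index:", top_voxel)
```

The weight vectors `u` and `v` play the role of genotype and imaging saliences. In contrast to a mass-univariate analysis, a single decomposition summarizes all SNP-voxel pairs at once, although significance would still need to be assessed, for example by permutation.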
Affiliation(s)
- Claudia Grellmann: Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1A, 04103 Leipzig, Germany; Leipzig University Hospital, IFB Adiposity Diseases, Philipp-Rosenthal-Straße 27, 04103 Leipzig, Germany.
- Sebastian Bitzer: Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1A, 04103 Leipzig, Germany.
- Jane Neumann: Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1A, 04103 Leipzig, Germany; Leipzig University Hospital, IFB Adiposity Diseases, Philipp-Rosenthal-Straße 27, 04103 Leipzig, Germany.
- Lars T Westlye: Oslo University Hospital, NORMENT KG Jebsen Centre for Psychosis Research, Kirkeveien 166, PO Box 4956, Nydalen, 0424 Oslo, Norway; University of Oslo, Department of Psychology, PO Box 1094, Blindern, 0317 Oslo, Norway.
- Ole A Andreassen: Oslo University Hospital, NORMENT KG Jebsen Centre for Psychosis Research, Kirkeveien 166, PO Box 4956, Nydalen, 0424 Oslo, Norway.
- Arno Villringer: Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1A, 04103 Leipzig, Germany; Leipzig University Hospital, IFB Adiposity Diseases, Philipp-Rosenthal-Straße 27, 04103 Leipzig, Germany; Leipzig University Hospital, Clinic of Cognitive Neurology, Liebigstraße 16, 04103 Leipzig, Germany; Mind and Brain Institute, Berlin School of Mind and Brain, Humboldt-University, Unter den Linden 6, 10099 Berlin, Germany.
- Annette Horstmann: Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1A, 04103 Leipzig, Germany; Leipzig University Hospital, IFB Adiposity Diseases, Philipp-Rosenthal-Straße 27, 04103 Leipzig, Germany.
5.
Abstract
Specific language impairment (SLI) is a multifactorial neurodevelopmental disorder which occurs unexpectedly and without an obvious cause. Over a decade of research suggests that SLI is highly heritable. Several genes and loci have already been implicated in SLI through linkage and targeted association methods. Recently, genome-wide association studies (GWAS) of SLI and language traits in the general population have been reported and, consequently, new candidate genes have been identified. This review aims to summarise the literature concerning genome-wide studies of SLI. In addition, this review highlights the methodologies that have been used to research the genetics of SLI to date, and also considers the current, and future, contributions that GWAS can offer.
Affiliation(s)
- Rose H Reader: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN UK.
- Laura E Covill: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN UK.
- Ron Nudel: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN UK.
- Dianne F Newbury: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN UK; St John's College, University of Oxford, Oxford, OX1 3JP UK.
6. Wang Y, Goh W, Wong L, Montana G. Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes. BMC Bioinformatics 2013; 14 Suppl 16:S6. [PMID: 24564704 PMCID: PMC3853073 DOI: 10.1186/1471-2105-14-s16-s6] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 01/29/2023]
Abstract
MOTIVATION: Multivariate quantitative traits arise naturally in recent neuroimaging genetics studies, in which both structural and functional variability of the human brain is measured non-invasively through techniques such as magnetic resonance imaging (MRI). There is growing interest in detecting genetic variants associated with such multivariate traits, especially in genome-wide studies. Random forests (RFs) classifiers, which are ensembles of decision trees, are amongst the best performing machine learning algorithms and have been successfully employed for the prioritisation of genetic variants in case-control studies. RFs can also be applied to produce gene rankings in association studies with multivariate quantitative traits, and to estimate genetic similarities measures that are predictive of the trait. However, in studies involving hundreds of thousands of SNPs and high-dimensional traits, a very large ensemble of trees must be inferred from the data in order to obtain reliable rankings, which makes the application of these algorithms computationally prohibitive.
RESULTS: We have developed a parallel version of the RF algorithm for regression and genetic similarity learning tasks in large-scale population genetic association studies involving multivariate traits, called PaRFR (Parallel Random Forest Regression). Our implementation takes advantage of the MapReduce programming model and is deployed on Hadoop, an open-source software framework that supports data-intensive distributed applications. Notable speed-ups are obtained by introducing a distance-based criterion for node splitting in the tree estimation process. PaRFR has been applied to a genome-wide association study on Alzheimer's disease (AD) in which the quantitative trait consists of a high-dimensional neuroimaging phenotype describing longitudinal changes in the human brain structure. PaRFR provides a ranking of SNPs associated to this trait, and produces pair-wise measures of genetic proximity that can be directly compared to pair-wise measures of phenotypic proximity. Several known AD-related variants have been identified, including APOE4 and TOMM40. We also present experimental evidence supporting the hypothesis of a linear relationship between the number of top-ranked mutated states, or frequent mutation patterns, and an indicator of disease severity.
AVAILABILITY: The Java codes are freely available at http://www2.imperial.ac.uk/~gmontana.