1
Mihalik A, Chapman J, Adams RA, Winter NR, Ferreira FS, Shawe-Taylor J, Mourão-Miranda J. Canonical Correlation Analysis and Partial Least Squares for Identifying Brain-Behavior Associations: A Tutorial and a Comparative Study. Biol Psychiatry Cogn Neurosci Neuroimaging 2022; 7:1055-1067. [PMID: 35952973 DOI: 10.1016/j.bpsc.2022.07.012] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/17/2021] [Revised: 06/30/2022] [Accepted: 07/22/2022] [Indexed: 06/15/2023]
Abstract
Canonical correlation analysis (CCA) and partial least squares (PLS) are powerful multivariate methods for capturing associations across 2 modalities of data (e.g., brain and behavior). However, when the sample size is similar to or smaller than the number of variables in the data, standard CCA and PLS models may overfit, i.e., find spurious associations that generalize poorly to new data. Dimensionality reduction and regularized extensions of CCA and PLS have been proposed to address this problem, yet most studies using these approaches have some limitations. This work gives a theoretical and practical introduction into the most common CCA/PLS models and their regularized variants. We examine the limitations of standard CCA and PLS when the sample size is similar to or smaller than the number of variables. We discuss how dimensionality reduction and regularization techniques address this problem and explain their main advantages and disadvantages. We highlight crucial aspects of the CCA/PLS analysis framework, including optimizing the hyperparameters of the model and testing the identified associations for statistical significance. We apply the described CCA/PLS models to simulated data and real data from the Human Connectome Project and Alzheimer's Disease Neuroimaging Initiative (both of n > 500). We use both low- and high-dimensionality versions of these data (i.e., ratios between sample size and variables in the range of ∼1-10 and ∼0.1-0.01, respectively) to demonstrate the impact of data dimensionality on the models. Finally, we summarize the key lessons of the tutorial.
Affiliation(s)
- Agoston Mihalik
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom.
- James Chapman
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- Rick A Adams
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Nils R Winter
- Institute of Translational Psychiatry, University of Münster, Münster, Germany
- Fabio S Ferreira
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- John Shawe-Taylor
- Department of Computer Science, University College London, London, United Kingdom
- Janaina Mourão-Miranda
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
2
Mihalik A, Ferreira FS, Moutoussis M, Ziegler G, Adams RA, Rosa MJ, Prabhu G, de Oliveira L, Pereira M, Bullmore ET, Fonagy P, Goodyer IM, Jones PB, Shawe-Taylor J, Dolan R, Mourão-Miranda J. Multiple Holdouts With Stability: Improving the Generalizability of Machine Learning Analyses of Brain-Behavior Relationships. Biol Psychiatry 2020; 87:368-376. [PMID: 32040421 PMCID: PMC6970221 DOI: 10.1016/j.biopsych.2019.12.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 06/10/2019] [Revised: 12/03/2019] [Accepted: 12/04/2019] [Indexed: 12/27/2022]
Abstract
BACKGROUND: In 2009, the National Institute of Mental Health launched the Research Domain Criteria, an attempt to move beyond diagnostic categories and ground psychiatry within neurobiological constructs that combine different levels of measures (e.g., brain imaging and behavior). Statistical methods that can integrate such multimodal data, however, are often vulnerable to overfitting, poor generalization, and difficulties in interpreting the results.
METHODS: We propose an innovative machine learning framework combining multiple holdouts and a stability criterion with regularized multivariate techniques, such as sparse partial least squares and kernel canonical correlation analysis, for identifying hidden dimensions of cross-modality relationships. To illustrate the approach, we investigated structural brain-behavior associations in an extensively phenotyped developmental sample of 345 participants (312 healthy and 33 with clinical depression). The brain data consisted of whole-brain voxel-based gray matter volumes, and the behavioral data included item-level self-report questionnaires and IQ and demographic measures.
RESULTS: Both sparse partial least squares and kernel canonical correlation analysis captured two hidden dimensions of brain-behavior relationships: one related to age and drinking and the other one related to depression. The applied machine learning framework indicates that these results are stable and generalize well to new data. Indeed, the identified brain-behavior associations are in agreement with previous findings in the literature concerning age, alcohol use, and depression-related changes in brain volume.
CONCLUSIONS: Multivariate techniques (such as sparse partial least squares and kernel canonical correlation analysis) embedded in our novel framework are promising tools to link behavior and/or symptoms to neurobiology and thus have great potential to contribute to a biologically grounded definition of psychiatric disorders.
Affiliation(s)
- Agoston Mihalik
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom.
- Fabio S. Ferreira
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- Michael Moutoussis
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Gabriel Ziegler
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; Institute of Cognitive Neurology and Dementia Research, Otto von Guericke University, Magdeburg, Magdeburg, Germany; German Center for Neurodegenerative Diseases, Bonn, Germany
- Rick A. Adams
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Maria J. Rosa
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- Gita Prabhu
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Leticia de Oliveira
- Laboratory of Neurophysiology of Behaviour, Department of Physiology and Pharmacology, Biomedical Institute, Federal Fluminense University, Niterói, Brazil
- Mirtes Pereira
- Laboratory of Neurophysiology of Behaviour, Department of Physiology and Pharmacology, Biomedical Institute, Federal Fluminense University, Niterói, Brazil
- Edward T. Bullmore
- Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom; Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, United Kingdom; Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, United Kingdom; ImmunoPsychiatry, GlaxoSmithKline Research and Development, Stevenage, United Kingdom
- Peter Fonagy
- Research Department of Clinical, Educational, and Health Psychology, University College London, London, United Kingdom
- Ian M. Goodyer
- Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom; Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, United Kingdom
- Peter B. Jones
- Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom; Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, United Kingdom
- John Shawe-Taylor
- Department of Computer Science, University College London, London, United Kingdom
- Raymond Dolan
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Janaina Mourão-Miranda
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
3
Lambert C, Sam Narean J, Benjamin P, Zeestraten E, Barrick TR, Markus HS. Characterising the grey matter correlates of leukoaraiosis in cerebral small vessel disease. Neuroimage Clin 2015; 9:194-205. [PMID: 26448913 PMCID: PMC4564392 DOI: 10.1016/j.nicl.2015.07.002] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 01/17/2015] [Revised: 06/30/2015] [Accepted: 07/03/2015] [Indexed: 01/05/2023]
Abstract
Cerebral small vessel disease (SVD) is a heterogeneous group of pathological disorders that affect the small vessels of the brain and are an important cause of cognitive impairment. The ischaemic consequences of this disease can be detected using MRI, and include white matter hyperintensities (WMH), lacunar infarcts and microhaemorrhages. The relationship between SVD disease severity, as defined by WMH volume, in sporadic age-related SVD and cortical thickness has not been well defined. However, regional cortical thickness change would be expected due to associated phenomena such as underlying ischaemic white matter damage, and the observation that widespread cortical thinning is observed in the related genetic condition CADASIL (Righart et al., 2013). Using MRI data, we have developed a semi-automated processing pipeline for the anatomical analysis of individuals with cerebral small vessel disease and applied it cross-sectionally to 121 subjects diagnosed with this condition. Using a novel combined automated white matter lesion segmentation algorithm and lesion repair step, highly accurate warping to a group average template was achieved. The volume of white matter affected by WMH was calculated, and used as a covariate of interest in a voxel-based morphometry and voxel-based cortical thickness analysis. Additionally, Gaussian Process Regression (GPR) was used to assess if the severity of SVD, measured by WMH volume, could be predicted from the morphometry and cortical thickness measures. We found significant (Family Wise Error corrected p < 0.05) volumetric decline with increasing lesion load predominately in the parietal lobes, anterior insula and caudate nuclei bilaterally. Widespread significant cortical thinning was found bilaterally in the dorsolateral prefrontal, parietal and posterio-superior temporal cortices. 
These represent distinctive patterns of cortical thinning and volumetric reduction compared to ageing effects in the same cohort, which exhibited greater changes in the occipital and sensorimotor cortices. Using GPR, the absolute WMH volume could be significantly estimated from the grey matter density and cortical thickness maps (Pearson's coefficients 0.80 and 0.75 respectively). We demonstrate that SVD severity is associated with regional cortical thinning. Furthermore a quantitative measure of SVD severity (WMH volume) can be predicted from grey matter measures, supporting an association between white and grey matter damage. The pattern of cortical thinning and volumetric decline is distinctive for SVD severity compared to ageing. These results, taken together, suggest that there is a phenotypic pattern of atrophy associated with SVD severity.
Affiliation(s)
- Christian Lambert
- Neurosciences Research Centre, Cardiovascular and Cell Sciences Research Institute, St George's University of London, United Kingdom
- Janakan Sam Narean
- Neurosciences Research Centre, Cardiovascular and Cell Sciences Research Institute, St George's University of London, United Kingdom
- Philip Benjamin
- Neurosciences Research Centre, Cardiovascular and Cell Sciences Research Institute, St George's University of London, United Kingdom
- Eva Zeestraten
- Neurosciences Research Centre, Cardiovascular and Cell Sciences Research Institute, St George's University of London, United Kingdom
- Thomas R. Barrick
- Neurosciences Research Centre, Cardiovascular and Cell Sciences Research Institute, St George's University of London, United Kingdom
- Hugh S. Markus
- Neurosciences Research Centre, Cardiovascular and Cell Sciences Research Institute, St George's University of London, United Kingdom
- Stroke Research Group, Division of Clinical Neurosciences, University of Cambridge, United Kingdom