1. Rosenblatt M, Tejavibulya L, Sun H, Camp CC, Khaitova M, Adkinson BD, Jiang R, Westwater ML, Noble S, Scheinost D. Power and reproducibility in the external validation of brain-phenotype predictions. Nat Hum Behav 2024. PMID: 39085406. DOI: 10.1038/s41562-024-01931-7.
Abstract
Brain-phenotype predictive models seek to identify reproducible and generalizable brain-phenotype associations. External validation, or the evaluation of a model in an external dataset, is the gold standard for assessing the generalizability of models in neuroimaging. Unlike typical studies, external validation involves two sample sizes: the training sample size and the external sample size. Thus, traditional power calculations may not be appropriate. Here we ran over 900 million resampling-based simulations in functional and structural connectivity data to investigate the relationships among training sample size, external sample size, phenotype effect size, theoretical power and simulated power. Our analysis spanned a wide range of datasets: the Healthy Brain Network, the Adolescent Brain Cognitive Development Study, the Human Connectome Project (Development and Young Adult), the Philadelphia Neurodevelopmental Cohort, the Queensland Twin Adolescent Brain Project and the Chinese Human Connectome Project. Phenotypes included age, body mass index, matrix reasoning, working memory, attention problems, anxiety/depression symptoms and relational processing. High effect size predictions achieved adequate power with training and external sample sizes of a few hundred individuals, whereas low and medium effect size predictions required hundreds to thousands of training and external samples. In addition, most previous external validation studies used sample sizes prone to low power, and theoretical power curves should be adjusted for the training sample size. Furthermore, model performance in internal validation often informed subsequent external validation performance (Pearson's r difference < 0.2), particularly for well-harmonized datasets. These results can help in deciding how to power future external validation studies.
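As a rough illustration of the two-sample-size issue described above, the sketch below computes the theoretical power of a Pearson correlation test at a given external sample size, the curve the abstract says must then be adjusted for the training sample size. This is not the authors' code; the Fisher z approximation and the effect sizes and sample sizes shown are illustrative assumptions.

```python
# A minimal sketch, assuming a two-sided Pearson correlation test and the
# Fisher z approximation; not the authors' code. It gives theoretical power
# at a given external sample size, before any training-sample-size adjustment.
import numpy as np
from scipy.stats import norm

def theoretical_power(r, n_ext, alpha=0.05):
    """Approximate power to detect a Pearson correlation r with n_ext external subjects."""
    z = np.arctanh(r) * np.sqrt(n_ext - 3)        # Fisher-transformed effect, unit variance
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(z - z_crit) + norm.cdf(-z - z_crit)

for r in (0.1, 0.3, 0.5):                          # low, medium, high effect sizes
    for n_ext in (100, 300, 1000):                 # illustrative external sample sizes
        print(f"r={r:.1f}, n_ext={n_ext:4d}: power ~ {theoretical_power(r, n_ext):.2f}")
```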
Affiliation(s)
- Matthew Rosenblatt
  - Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Link Tejavibulya
  - Interdepartmental Neuroscience Program, Yale University, New Haven, CT, USA
- Huili Sun
  - Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Chris C Camp
  - Interdepartmental Neuroscience Program, Yale University, New Haven, CT, USA
- Milana Khaitova
  - Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Brendan D Adkinson
  - Interdepartmental Neuroscience Program, Yale University, New Haven, CT, USA
- Rongtao Jiang
  - Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Margaret L Westwater
  - Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Stephanie Noble
  - Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
  - Department of Bioengineering, Northeastern University, Boston, MA, USA
  - Department of Psychology, Northeastern University, Boston, MA, USA
- Dustin Scheinost
  - Department of Biomedical Engineering, Yale University, New Haven, CT, USA
  - Interdepartmental Neuroscience Program, Yale University, New Haven, CT, USA
  - Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
  - Child Study Center, Yale School of Medicine, New Haven, CT, USA
  - Department of Statistics and Data Science, Yale University, New Haven, CT, USA
2. Wang G, Jiang N, Ma Y, Suo D, Liu T, Funahashi S, Yan T. Using a deep generation network reveals neuroanatomical specificity in hemispheres. Patterns (N Y) 2024; 5:100930. PMID: 38645770. PMCID: PMC11026975. DOI: 10.1016/j.patter.2024.100930.
Abstract
Asymmetry is an important property of brain organization, but its nature is still poorly understood. Capturing the neuroanatomical components specific to each hemisphere helps clarify how brain asymmetry is established. Because deep generative networks (DGNs) have powerful inference and recovery capabilities, we train DGNs to predict one hemisphere from the opposite hemisphere, so that the networks automatically fit the built-in dependencies between the left and right hemispheres. After training, the reconstructed images approximate the homologous components shared between hemispheres. We use the difference between the actual and reconstructed hemispheres to measure hemisphere-specific components arising from the asymmetric expression of environmental and genetic factors. The results show that our model is biologically plausible and that our proposed metric of hemispheric specialization is reliable, capturing a wide range of individual variation. Together, this work provides promising tools for exploring brain asymmetry and new insights into self-supervised DGNs for representing the brain.
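To make the metric concrete, here is a minimal sketch of scoring hemisphere-specific components as the difference between the actual and reconstructed hemisphere. The `generator` callable stands in for a trained DGN mapping left-hemisphere features to right-hemisphere features; the variable names and the residual-norm summary are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch, not the authors' implementation: `generator` stands in for
# a trained deep generative network predicting one hemisphere from the other;
# the residual between actual and reconstructed data approximates the
# hemisphere-specific component.
import numpy as np

def hemispheric_specificity(left, right, generator):
    """left, right: (n_subjects, n_features) arrays; generator: callable mapping left -> predicted right."""
    right_hat = generator(left)                     # homologous component shared across hemispheres
    residual = right - right_hat                    # hemisphere-specific component
    # per-subject specificity: residual magnitude relative to the actual hemisphere
    return np.linalg.norm(residual, axis=1) / np.linalg.norm(right, axis=1)

# toy usage with a placeholder "generator" (identity mapping) on random data
rng = np.random.default_rng(0)
left, right = rng.normal(size=(10, 100)), rng.normal(size=(10, 100))
print(hemispheric_specificity(left, right, lambda x: x))
```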
Affiliation(s)
- Gongshu Wang
  - School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
- Ning Jiang
  - School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
- Yunxiao Ma
  - School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
- Dingjie Suo
  - School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
- Tiantian Liu
  - School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
- Shintaro Funahashi
  - Advanced Research Institute for Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
  - Department of Cognitive and Behavioral Sciences, Graduate School of Human and Environmental Science, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
  - Kokoro Research Center, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Tianyi Yan
  - School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
3. Guzmán Chacón E, Ovando-Tellez M, Thiebaut de Schotten M, Forkel SJ. Embracing digital innovation in neuroscience: 2023 in review at NEUROCCINO. Brain Struct Funct 2024; 229:251-255. PMID: 38386031. PMCID: PMC10917830. DOI: 10.1007/s00429-024-02768-6.
Affiliation(s)
- Eva Guzmán Chacón
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Marcela Ovando-Tellez
  - University Bordeaux, CNRS, CEA, IMN, UMR 5293, GIN, 33000, Bordeaux, France
  - Brain Connectivity and Behaviour Laboratory, Paris, France
- Michel Thiebaut de Schotten
  - University Bordeaux, CNRS, CEA, IMN, UMR 5293, GIN, 33000, Bordeaux, France
  - Brain Connectivity and Behaviour Laboratory, Paris, France
- Stephanie J Forkel
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
  - Brain Connectivity and Behaviour Laboratory, Paris, France
  - Centre for Neuroimaging Sciences, Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
4. Rosenblatt M, Tejavibulya L, Camp CC, Jiang R, Westwater ML, Noble S, Scheinost D. Power and reproducibility in the external validation of brain-phenotype predictions. bioRxiv [Preprint] 2023: 2023.10.25.563971. PMID: 37961654. PMCID: PMC10634903. DOI: 10.1101/2023.10.25.563971.
Abstract
Identifying reproducible and generalizable brain-phenotype associations is a central goal of neuroimaging. Consistent with this goal, prediction frameworks evaluate brain-phenotype models in unseen data. Most prediction studies train and evaluate a model in the same dataset. However, external validation, or the evaluation of a model in an external dataset, provides a better assessment of robustness and generalizability. Despite the promise of external validation and calls for its usage, the statistical power of such studies has yet to be investigated. In this work, we ran over 60 million simulations across several datasets, phenotypes, and sample sizes to better understand how the sizes of the training and external datasets affect statistical power. We found that prior external validation studies used sample sizes prone to low power, which may lead to false negatives and effect size inflation. Furthermore, increases in the external sample size led to increased simulated power directly following theoretical power curves, whereas changes in the training dataset size offset the simulated power curves. Finally, we compared the performance of a model within a dataset to the external performance. The within-dataset performance was typically within r=0.2 of the cross-dataset performance, which could help decide how to power future external validation studies. Overall, our results illustrate the importance of considering the sample sizes of both the training and external datasets when performing external validation.
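Below is a minimal sketch of the resampling idea described in the abstract: estimate simulated power by repeatedly subsampling both the training and external datasets. It is not the authors' pipeline; the ridge model, the placeholder arrays `X_train, y_train, X_ext, y_ext`, and the significance criterion are assumptions for illustration.

```python
# A minimal sketch, not the authors' pipeline: simulated power is estimated by
# repeatedly subsampling the training and external datasets, fitting a simple
# ridge model, and counting how often the external prediction-observation
# correlation is positive and significant. X_train, y_train, X_ext, y_ext are
# placeholder arrays supplied by the user.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def simulated_power(X_train, y_train, X_ext, y_ext,
                    n_train, n_ext, n_resamples=1000, alpha=0.05):
    hits = 0
    for _ in range(n_resamples):
        tr = rng.choice(len(y_train), size=n_train, replace=False)
        ex = rng.choice(len(y_ext), size=n_ext, replace=False)
        model = Ridge(alpha=1.0).fit(X_train[tr], y_train[tr])
        r, p = pearsonr(model.predict(X_ext[ex]), y_ext[ex])
        hits += int(r > 0 and p < alpha)            # count significant positive predictions
    return hits / n_resamples
```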
Affiliation(s)
- Matthew Rosenblatt
  - Department of Biomedical Engineering, Yale University, New Haven, CT
- Link Tejavibulya
  - Interdepartmental Neuroscience Program, Yale University, New Haven, CT
- Chris C. Camp
  - Interdepartmental Neuroscience Program, Yale University, New Haven, CT
- Rongtao Jiang
  - Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT
- Margaret L. Westwater
  - Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT
- Stephanie Noble
  - Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT
  - Department of Bioengineering, Northeastern University, Boston, MA
  - Department of Psychology, Northeastern University, Boston, MA
- Dustin Scheinost
  - Department of Biomedical Engineering, Yale University, New Haven, CT
  - Interdepartmental Neuroscience Program, Yale University, New Haven, CT
  - Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT
  - Child Study Center, Yale School of Medicine, New Haven, CT
  - Department of Statistics & Data Science, Yale University, New Haven, CT
5. Orlichenko A, Qu G, Su KJ, Liu A, Shen H, Deng HW, Wang YP. Identifiability in Functional Connectivity May Unintentionally Inflate Prediction Results. arXiv 2023: arXiv:2308.01451v1. PMID: 37576121. PMCID: PMC10418521.
Abstract
Functional magnetic resonance imaging (fMRI) is an invaluable tool for studying cognitive processes in vivo. Many recent studies use functional connectivity (FC), partial correlation connectivity (PC), or fMRI-derived brain networks to predict phenotypes, with results that sometimes cannot be replicated. At the same time, FC can be used to identify the same subject across different scans with great accuracy. In this paper, we show a method by which one can unknowingly inflate classification results from 61% accuracy to 86% accuracy by treating longitudinal or contemporaneous scans of the same subject as independent data points. Using the UK Biobank dataset, we find that, by exploiting identifiability, one can achieve the same level of variance explained with 50 training subjects as with 10,000 training subjects without double-dipping. We replicate this effect in four different datasets: the UK Biobank (UKB), the Philadelphia Neurodevelopmental Cohort (PNC), the Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP), and an OpenNeuro Fibromyalgia dataset (Fibro). The unintentional improvement ranges between 7% and 25% across the four datasets. Additionally, we find that by using dynamic functional connectivity (dFC), one can apply this method even when limited to a single scan per subject. One major problem is that features such as ROIs or connectivities reported alongside inflated results may mislead future work. This article aims to shed light on how even minor pipeline anomalies can lead to unexpectedly strong results.
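The sketch below illustrates the leakage mechanism described in the abstract on synthetic data; it is not the paper's analysis, and the feature dimensions, noise level, and 1-nearest-neighbour classifier are assumptions for illustration. When two scans of the same subject straddle the train/test split, a classifier can recover the phenotype label through subject identifiability alone, even though the label is random.

```python
# A minimal sketch, not the paper's analysis: synthetic subjects each have a
# stable "fingerprint" feature vector, two noisy scans, and a *random* label,
# so there is no true brain-phenotype relationship. Splitting scans of the
# same subject across train and test inflates accuracy purely through
# identifiability; a subject-level split stays at chance.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_subj, n_feat = 200, 50
fingerprint = rng.normal(size=(n_subj, n_feat))          # stable subject-specific pattern
label = rng.integers(0, 2, n_subj)                       # random phenotype label
X = np.vstack([fingerprint + 0.1 * rng.normal(size=fingerprint.shape) for _ in range(2)])
y = np.tile(label, 2)
groups = np.tile(np.arange(n_subj), 2)                   # subject ID for each scan

def accuracy(train_idx, test_idx):
    clf = KNeighborsClassifier(n_neighbors=1).fit(X[train_idx], y[train_idx])
    return clf.score(X[test_idx], y[test_idx])

# leaky split: train on scan 1, test on scan 2 of the *same* subjects
leaky = (np.arange(n_subj), np.arange(n_subj, 2 * n_subj))
# proper split: train and test subjects do not overlap
proper = next(GroupShuffleSplit(test_size=0.5, random_state=0).split(X, y, groups))
print("same-subject split :", accuracy(*leaky))          # near 1.0, pure leakage
print("subject-level split:", accuracy(*proper))         # near 0.5 (chance)
```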
Affiliation(s)
- Anton Orlichenko
  - Department of Biomedical Engineering, Tulane University, New Orleans, LA, USA
- Gang Qu
  - Department of Biomedical Engineering, Tulane University, New Orleans, LA, USA
- Kuan-Jui Su
  - School of Medicine, Tulane University, New Orleans, LA, USA
- Anqi Liu
  - School of Medicine, Tulane University, New Orleans, LA, USA
- Hui Shen
  - School of Medicine, Tulane University, New Orleans, LA, USA
- Hong-Wen Deng
  - School of Medicine, Tulane University, New Orleans, LA, USA
- Yu-Ping Wang
  - Department of Biomedical Engineering, Tulane University, New Orleans, LA, USA