1. Cheng C, Messerschmidt L, Bravo I, Waldbauer M, Bhavikatti R, Schenk C, Grujic V, Model T, Kubinec R, Barceló J. A General Primer for Data Harmonization. Sci Data 2024; 11:152. [PMID: 38297013; PMCID: PMC10831085; DOI: 10.1038/s41597-024-02956-3] [Received: 05/31/2023] [Accepted: 01/11/2024] [Indexed: 02/02/2024]
Affiliation(s)
- Cindy Cheng
  - Hochschule für Politik, Technical University of Munich, Richard-Wagner Str. 1, Munich, 80333, Bavaria, Germany
- Luca Messerschmidt
  - Hochschule für Politik, Technical University of Munich, Richard-Wagner Str. 1, Munich, 80333, Bavaria, Germany
- Isaac Bravo
  - Hochschule für Politik, Technical University of Munich, Richard-Wagner Str. 1, Munich, 80333, Bavaria, Germany
- Marco Waldbauer
  - Hochschule für Politik, Technical University of Munich, Richard-Wagner Str. 1, Munich, 80333, Bavaria, Germany
- Caress Schenk
  - School of Humanities and Social Sciences, Nazarbayev University, Kabanbay Batyr Ave., 53, Astana, 010000, Kazakhstan
- Vanja Grujic
  - Faculty of Law, University of Brasilia, Campus Universitário Darcy Ribeiro Asa Norte, Brasília, 10587, Brazil
- Tim Model
  - Delve, 2225 3rd St, San Francisco, 94107, California, USA
- Robert Kubinec
  - Division of Social Science, New York University Abu Dhabi, Social Science Building (A5), Abu Dhabi, 129188, United Arab Emirates
- Joan Barceló
  - Division of Social Science, New York University Abu Dhabi, Social Science Building (A5), Abu Dhabi, 129188, United Arab Emirates
2. Adkinson BD, Rosenblatt M, Dadashkarimi J, Tejavibulya L, Jiang R, Noble S, Scheinost D. Brain-phenotype predictions can survive across diverse real-world data. bioRxiv: The Preprint Server for Biology 2024:2024.01.23.576916. [PMID: 38328100; PMCID: PMC10849571; DOI: 10.1101/2024.01.23.576916] [Indexed: 02/09/2024]
Abstract
Recent work suggests that machine learning models predicting psychiatric treatment outcomes based on clinical data may fail when applied to unharmonized samples. Neuroimaging predictive models offer the opportunity to incorporate neurobiological information, which may be more robust to dataset shifts. Yet, among the minority of neuroimaging studies that undertake any form of external validation, there is a notable lack of attention to generalization across dataset-specific idiosyncrasies. Research settings, by design, remove the between-site variations that real-world and, eventually, clinical applications demand. Here, we rigorously test the ability of a range of predictive models to generalize across three diverse, unharmonized samples: the Philadelphia Neurodevelopmental Cohort (n=1291), the Healthy Brain Network (HBN; n=1110), and the Human Connectome Project in Development (HCPD; n=428). These datasets have high inter-dataset heterogeneity, encompassing substantial variations in age distribution, sex, racial and ethnic minority representation, recruitment geography, clinical symptom burdens, fMRI tasks, sequences, and behavioral measures. We demonstrate that reproducible and generalizable brain-behavior associations can be realized across diverse dataset features with sample sizes in the hundreds. Results indicate the potential of functional connectivity-based predictive models to be robust despite substantial inter-dataset variability. Notably, for the HCPD and HBN datasets, the best predictions were not from training and testing in the same dataset (i.e., cross-validation) but across datasets. This result suggests that training on diverse data may improve prediction in specific cases. Overall, this work provides a critical foundation for future work evaluating the generalizability of neuroimaging predictive models in real-world scenarios and clinical settings.
Affiliation(s)
- Brendan D Adkinson
  - Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, CT, 06510, USA
- Matthew Rosenblatt
  - Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA
- Javid Dadashkarimi
  - Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, 02129, USA
  - Department of Radiology, Harvard Medical School, Boston, MA, 02129, USA
- Link Tejavibulya
  - Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, CT, 06510, USA
- Rongtao Jiang
  - Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, 06510, USA
- Stephanie Noble
  - Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, 06510, USA
  - Department of Bioengineering, Northeastern University, Boston, MA, 02120, USA
  - Department of Psychology, Northeastern University, Boston, MA, 02115, USA
- Dustin Scheinost
  - Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, CT, 06510, USA
  - Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA
  - Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, 06510, USA
  - Department of Statistics & Data Science, Yale University, New Haven, CT, 06520, USA
  - Child Study Center, Yale School of Medicine, New Haven, CT, 06510, USA
  - Wu Tsai Institute, Yale University, New Haven, CT, 06510, USA