1
Gage AT, Stone JR, Wilde EA, McCauley SR, Welsh RC, Mugler JP, Tustison N, Avants B, Whitlow CT, Lancashire L, Bhatt SD, Haas M. Normative Neuroimaging Library: Designing a Comprehensive and Demographically Diverse Dataset of Healthy Controls to Support Traumatic Brain Injury Diagnostic and Therapeutic Development. J Neurotrauma 2024. [PMID: 39235436 DOI: 10.1089/neu.2024.0128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2024] Open
Abstract
The past decade has seen impressive advances in neuroimaging, moving from qualitative to quantitative outputs. Available techniques now allow for the inference of microscopic changes occurring in white and gray matter, along with alterations in physiology and function. These existing and emerging techniques hold the potential of providing unprecedented capabilities in achieving a diagnosis and predicting outcomes for traumatic brain injury (TBI) and a variety of other neurological diseases. To see this promise move from the research lab into clinical care, an understanding is needed of what normal data look like for all age ranges, sex, and other demographic and socioeconomic categories. Clinicians can only use the results of imaging scans to support their decision-making if they know how the results for their patient compare with a normative standard. This potential for utilizing magnetic resonance imaging (MRI) in TBI diagnosis motivated the American College of Radiology and Cohen Veterans Bioscience to create a reference database of healthy individuals with neuroimaging, demographic data, and characterization of psychological functioning and neurocognitive data that will serve as a normative resource for clinicians and researchers for development of diagnostics and therapeutics for TBI and other brain disorders. The goal of this article is to introduce the large, well-curated Normative Neuroimaging Library (NNL) to the research community. NNL consists of data collected from ∼1900 healthy participants. 
The highlights of NNL are (1) data are collected across a diverse population, including civilians, veterans, and active-duty service members with an age range (18-64 years) not well represented in existing datasets; (2) comprehensive structural and functional neuroimaging acquisition with state-of-the-art sequences (including structural, diffusion, and functional MRI; raw scanner data are preserved, allowing higher quality data to be derived in the future; standardized imaging acquisition protocols across sites reflect sequences and parameters often recommended for use with various neurological and psychiatric conditions, including TBI, post-traumatic stress disorder, stroke, neurodegenerative disorders, and neoplastic disease); and (3) the collection of comprehensive demographic details, medical history, and a broad structured clinical assessment, including cognition and psychological scales, relevant to multiple neurological conditions with functional sequelae. Thus, NNL provides a demographically diverse population of healthy individuals who can serve as a comparison group for brain injury study and clinical samples, providing a strong foundation for precision medicine. Use cases include the creation of imaging-derived phenotypes (IDPs), derivation of reference ranges of imaging measures, and use of IDPs as training samples for artificial intelligence-based biomarker development and for normative modeling to help identify injury-induced changes as outliers for precision diagnosis and targeted therapeutic development. On its release, NNL is poised to support the use of advanced imaging in clinician decision support tools, the validation of imaging biomarkers, and the investigation of brain-behavior anomalies, moving the field toward precision medicine.
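One use case named in the abstract, deriving reference ranges from healthy controls and flagging injury-related values as outliers, can be illustrated with a minimal sketch. The control values, the `is_outlier` helper, and the 1.96 cutoff are hypothetical assumptions, not taken from NNL, and a plain mean/SD range stands in here for fuller normative models.

```python
from statistics import mean, stdev

# Hypothetical imaging-derived measure from healthy controls (made-up values).
controls = [10.0, 12.0, 11.0, 9.0, 13.0, 11.0]

mu = mean(controls)
sd = stdev(controls)  # sample standard deviation of the control group

# Assumed cutoff: roughly the central 95% of a normal distribution.
Z_CUTOFF = 1.96
reference_range = (mu - Z_CUTOFF * sd, mu + Z_CUTOFF * sd)


def z_score(value):
    """Distance of a new measurement from the control mean, in control SDs."""
    return (value - mu) / sd


def is_outlier(value):
    """Flag a measurement that falls outside the assumed reference range."""
    return abs(z_score(value)) > Z_CUTOFF
```

In practice a normative model would also condition on age, sex, and scanner, which this sketch deliberately omits.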
Affiliation(s)
- James R Stone
  - Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, Virginia, USA
- Elisabeth A Wilde
  - George E. Wahlen VA, Salt Lake City Healthcare System, Salt Lake City, Utah, USA
- Stephen R McCauley
  - Department of Neurology, Baylor College of Medicine, Houston, Texas, USA
- Robert C Welsh
  - Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
- John P Mugler
  - Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, Virginia, USA
- Nick Tustison
  - Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, Virginia, USA
- Brian Avants
  - Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, Virginia, USA
- Christopher T Whitlow
  - Department of Radiology, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
- Magali Haas
  - Cohen Veterans Bioscience, New York, New York, USA
2
Hu F, Lucas A, Chen AA, Coleman K, Horng H, Ng RWS, Tustison NJ, Davis KA, Shou H, Li M, Shinohara RT. DeepComBat: A statistically motivated, hyperparameter-robust, deep learning approach to harmonization of neuroimaging data. Hum Brain Mapp 2024; 45:e26708. [PMID: 39056477 PMCID: PMC11273293 DOI: 10.1002/hbm.26708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 04/19/2024] [Accepted: 04/25/2024] [Indexed: 07/28/2024] Open
Abstract
Neuroimaging data acquired using multiple scanners or protocols are increasingly available. However, such data exhibit technical artifacts across batches which introduce confounding and decrease reproducibility. This is especially true when multi-batch data are analyzed using complex downstream models which are more likely to pick up on and implicitly incorporate batch-related information. Previously proposed image harmonization methods have sought to remove these batch effects; however, batch effects remain detectable in the data after applying these methods. We present DeepComBat, a deep learning harmonization method based on a conditional variational autoencoder and the ComBat method. DeepComBat combines the strengths of statistical and deep learning methods in order to account for the multivariate relationships between features while simultaneously relaxing strong assumptions made by previous deep learning harmonization methods. As a result, DeepComBat can perform multivariate harmonization while preserving data structure and avoiding the introduction of synthetic artifacts. We apply this method to cortical thickness measurements from a cognitive-aging cohort and show DeepComBat qualitatively and quantitatively outperforms existing methods in removing batch effects while preserving biological heterogeneity. Additionally, DeepComBat provides a new perspective for statistically motivated deep learning harmonization methods.
Affiliation(s)
- Fengling Hu
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Alfredo Lucas
  - Center for Neuroengineering and Therapeutics, Department of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Andrew A. Chen
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Kyle Coleman
  - Statistical Center for Single-Cell and Spatial Genomics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Hannah Horng
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Raymond W. S. Ng
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Nicholas J. Tustison
  - Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, Virginia, USA
- Kathryn A. Davis
  - Center for Neuroengineering and Therapeutics, Department of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Haochang Shou
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Mingyao Li
  - Statistical Center for Single-Cell and Spatial Genomics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Russell T. Shinohara
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, Philadelphia, Pennsylvania, USA
3
Shan Y, Huang C, Li Y, Zhu H. Merging or ensembling: integrative analysis in multiple neuroimaging studies. Biometrics 2024; 80:ujae003. [PMID: 38465984 PMCID: PMC10926268 DOI: 10.1093/biomtc/ujae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Revised: 11/27/2023] [Accepted: 01/10/2024] [Indexed: 03/12/2024]
Abstract
The aim of this paper is to systematically investigate merging and ensembling methods for spatially varying coefficient mixed effects models (SVCMEM) in order to carry out integrative learning of neuroimaging data obtained from multiple biomedical studies. The "merged" approach involves training a single learning model using a comprehensive dataset that encompasses information from all the studies. Conversely, the "ensemble" approach involves creating a weighted average of distinct learning models, each developed from an individual study. We systematically investigate the prediction accuracy of the merged and ensemble learners under the presence of different degrees of interstudy heterogeneity. Additionally, we establish asymptotic guidelines for making strategic decisions about when to employ either of these models in different scenarios, along with deriving optimal weights for the ensemble learner. To validate our theoretical results, we perform extensive simulation studies. The proposed methodology is also applied to 3 large-scale neuroimaging studies.
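The distinction between the "merged" and "ensemble" learners can be made concrete with a toy sketch. It uses ordinary least-squares lines rather than the paper's SVCMEM, the two studies and their values are invented, and the sample-size weights are an assumption (the paper derives optimal weights, which are not reproduced here).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


# Hypothetical studies with different underlying relationships.
studies = {
    "study_a": ([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]),  # y = x
    "study_b": ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),  # y = 2x + 1
}

# Merged learner: one model trained on the pooled data.
all_x = [x for xs, _ in studies.values() for x in xs]
all_y = [y for _, ys in studies.values() for y in ys]
merged_model = fit_line(all_x, all_y)

# Ensemble learner: one model per study, combined by a weighted average.
study_models = {name: fit_line(xs, ys) for name, (xs, ys) in studies.items()}
weights = {name: len(xs) / len(all_x) for name, (xs, _) in studies.items()}


def ensemble_predict(x):
    return sum(weights[name] * predict(study_models[name], x)
               for name in study_models)
```

In this balanced toy design the two learners give the same prediction; with unequal designs or stronger interstudy heterogeneity they generally differ, which is the regime the abstract's guidelines address.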
Affiliation(s)
- Yue Shan
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Chao Huang
  - Department of Statistics, Florida State University, Tallahassee, FL 32306, United States
- Yun Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Hongtu Zhu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
  - Department of Statistics, Florida State University, Tallahassee, FL 32306, United States
  - Department of Statistics & Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
  - Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
4
Adkinson BD, Rosenblatt M, Dadashkarimi J, Tejavibulya L, Jiang R, Noble S, Scheinost D. Brain-phenotype predictions can survive across diverse real-world data. bioRxiv 2024:2024.01.23.576916. [PMID: 38328100 PMCID: PMC10849571 DOI: 10.1101/2024.01.23.576916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Recent work suggests that machine learning models predicting psychiatric treatment outcomes based on clinical data may fail when applied to unharmonized samples. Neuroimaging predictive models offer the opportunity to incorporate neurobiological information, which may be more robust to dataset shifts. Yet, among the minority of neuroimaging studies that undertake any form of external validation, there is a notable lack of attention to generalization across dataset-specific idiosyncrasies. Research settings, by design, remove the between-site variations that real-world and, eventually, clinical applications demand. Here, we rigorously test the ability of a range of predictive models to generalize across three diverse, unharmonized samples: the Philadelphia Neurodevelopmental Cohort (n=1291), the Healthy Brain Network (n=1110), and the Human Connectome Project in Development (n=428). These datasets have high inter-dataset heterogeneity, encompassing substantial variations in age distribution, sex, racial and ethnic minority representation, recruitment geography, clinical symptom burdens, fMRI tasks, sequences, and behavioral measures. We demonstrate that reproducible and generalizable brain-behavior associations can be realized across diverse dataset features with sample sizes in the hundreds. Results indicate the potential of functional connectivity-based predictive models to be robust despite substantial inter-dataset variability. Notably, for the HCPD and HBN datasets, the best predictions were not from training and testing in the same dataset (i.e., cross-validation) but across datasets. This result suggests that training on diverse data may improve prediction in specific cases. Overall, this work provides a critical foundation for future work evaluating the generalizability of neuroimaging predictive models in real-world scenarios and clinical settings.
Affiliation(s)
- Brendan D Adkinson
  - Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, CT, 06510, USA
- Matthew Rosenblatt
  - Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA
- Javid Dadashkarimi
  - Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, 02129, USA
  - Department of Radiology, Harvard Medical School, Boston, MA, 02129, USA
- Link Tejavibulya
  - Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, CT, 06510, USA
- Rongtao Jiang
  - Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, 06510, USA
- Stephanie Noble
  - Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, 06510, USA
  - Department of Bioengineering, Northeastern University, Boston, MA, 02120, USA
  - Department of Psychology, Northeastern University, Boston, MA, 02115, USA
- Dustin Scheinost
  - Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, CT, 06510, USA
  - Department of Biomedical Engineering, Yale University, New Haven, CT, 06520, USA
  - Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, 06510, USA
  - Department of Statistics & Data Science, Yale University, New Haven, CT, 06520, USA
  - Child Study Center, Yale School of Medicine, New Haven, CT, 06510, USA
  - Wu Tsai Institute, Yale University, New Haven, CT, 06510, USA
5
Marzi C, Giannelli M, Barucci A, Tessa C, Mascalchi M, Diciotti S. Efficacy of MRI data harmonization in the age of machine learning: a multicenter study across 36 datasets. Sci Data 2024; 11:115. [PMID: 38263181 PMCID: PMC10805868 DOI: 10.1038/s41597-023-02421-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 07/27/2023] [Indexed: 01/25/2024] Open
Abstract
Pooling publicly available MRI data from multiple sites makes it possible to assemble extensive groups of subjects, increase statistical power, and promote data reuse with machine learning techniques. The harmonization of multicenter data is necessary to reduce the confounding effect associated with non-biological sources of variability in the data. However, when applied to the entire dataset before machine learning, harmonization leads to data leakage, because information outside the training set may affect model building and potentially inflate performance estimates. We propose 1) a measurement of the efficacy of data harmonization and 2) a harmonizer transformer, i.e., an implementation of the ComBat harmonization allowing its encapsulation among the preprocessing steps of a machine learning pipeline, avoiding data leakage by design. We tested these tools using brain T1-weighted MRI data from 1740 healthy subjects acquired at 36 sites. After harmonization, the site effect was removed or reduced, and we showed the data leakage effect in predicting individual age from MRI data, highlighting that introducing the harmonizer transformer into a machine learning pipeline allows for avoiding data leakage by design.
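The leakage argument comes down to where the harmonization parameters are estimated. The sketch below is not the authors' harmonizer transformer and not ComBat itself: a simplified per-site location/scale adjustment stands in for it, and the class name `SiteStandardizer` and all data values are hypothetical. The point it illustrates is the fit-on-training, apply-to-test pattern.

```python
from statistics import mean, pstdev


class SiteStandardizer:
    """Simplified per-site location/scale adjustment (a stand-in, not ComBat)."""

    def fit(self, values, sites):
        # All parameters are estimated from the data passed here only.
        self.grand_mean_ = mean(values)
        self.grand_sd_ = pstdev(values)
        self.site_stats_ = {}
        for site in set(sites):
            site_vals = [v for v, s in zip(values, sites) if s == site]
            # Fall back to 1.0 if a site has zero spread (e.g. one sample).
            self.site_stats_[site] = (mean(site_vals), pstdev(site_vals) or 1.0)
        return self

    def transform(self, values, sites):
        # Re-centre and re-scale each value using its site's training statistics.
        # A site not seen during fit raises KeyError.
        out = []
        for v, s in zip(values, sites):
            site_mean, site_sd = self.site_stats_[s]
            out.append((v - site_mean) / site_sd * self.grand_sd_ + self.grand_mean_)
        return out


# Hypothetical single-feature data: two sites with different offsets.
train_vals = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
train_sites = ["A", "A", "A", "B", "B", "B"]
test_vals = [2.0, 12.0]
test_sites = ["A", "B"]

# Leakage-free order: fit on the training split, then apply to both splits.
harmonizer = SiteStandardizer().fit(train_vals, train_sites)
train_h = harmonizer.transform(train_vals, train_sites)
test_h = harmonizer.transform(test_vals, test_sites)
```

Calling `fit` on the concatenation of training and test values would let held-out data shape the site statistics, which is the leakage the abstract describes.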
Affiliation(s)
- Chiara Marzi
  - Department of Statistics, Computer Science and Applications "Giuseppe Parenti", University of Florence, 50134, Florence, Italy
  - "Nello Carrara" Institute of Applied Physics (IFAC), National Research Council (CNR), 50019, Sesto Fiorentino, Florence, Italy
- Marco Giannelli
  - Unit of Medical Physics, Pisa University Hospital "Azienda Ospedaliero-Universitaria Pisana", 56126, Pisa, Italy
- Andrea Barucci
  - "Nello Carrara" Institute of Applied Physics (IFAC), National Research Council (CNR), 50019, Sesto Fiorentino, Florence, Italy
- Carlo Tessa
  - Radiology Unit Apuane e Lunigiana, Azienda USL Toscana Nord Ovest, 54100, Massa, Italy
- Mario Mascalchi
  - Department of Experimental and Clinical Biomedical Sciences "Mario Serio", University of Florence, 50139, Florence, Italy
  - Division of Epidemiology and Clinical Governance, Institute for Study, Prevention and netwoRk in Oncology (ISPRO), 50139, Florence, Italy
- Stefano Diciotti
  - Department of Electrical, Electronic, and Information Engineering "Guglielmo Marconi" - DEI, University of Bologna, 47522, Cesena, Italy
  - Alma Mater Research Institute for Human-Centered Artificial Intelligence, University of Bologna, 40121, Bologna, Italy
6
Zhu H, Li T, Zhao B. Statistical Learning Methods for Neuroimaging Data Analysis with Applications. Annu Rev Biomed Data Sci 2023; 6:73-104. [PMID: 37127052 DOI: 10.1146/annurev-biodatasci-020722-100353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
The aim of this review is to provide a comprehensive survey of statistical challenges in neuroimaging data analysis, from neuroimaging techniques to large-scale neuroimaging studies and statistical learning methods. We briefly review eight popular neuroimaging techniques and their potential applications in neuroscience research and clinical translation. We delineate four themes of neuroimaging data and review major image processing analysis methods for processing neuroimaging data at the individual level. We briefly review four large-scale neuroimaging-related studies and a consortium on imaging genomics and discuss four themes of neuroimaging data analysis at the population level. We review nine major population-based statistical analysis methods and their associated statistical challenges and present recent progress in statistical methodology to address these challenges.
Affiliation(s)
- Hongtu Zhu
  - Department of Biostatistics, Department of Statistics, Department of Genetics, and Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina, USA
  - Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, North Carolina, USA
- Tengfei Li
  - Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, North Carolina, USA
  - Department of Radiology, University of North Carolina, Chapel Hill, North Carolina, USA
- Bingxin Zhao
  - Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
7
Zhang R, Oliver LD, Voineskos AN, Park JY. RELIEF: A structured multivariate approach for removal of latent inter-scanner effects. Imaging Neuroscience (Cambridge, Mass.) 2023; 1:1-16. [PMID: 37719839 PMCID: PMC10503485 DOI: 10.1162/imag_a_00011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 08/02/2023] [Indexed: 09/19/2023]
Abstract
Combining data collected from multiple study sites is becoming common and is advantageous to researchers to increase the generalizability and replicability of scientific discoveries. However, at the same time, unwanted inter-scanner biases are commonly observed across neuroimaging data collected from multiple study sites or scanners, rendering difficulties in integrating such data to obtain reliable findings. While several methods for handling such unwanted variations have been proposed, most of them use univariate approaches that could be too simple to capture all sources of scanner-specific variations. To address these challenges, we propose a novel multivariate harmonization method called RELIEF (REmoval of Latent Inter-scanner Effects through Factorization) for estimating and removing both explicit and latent scanner effects. Our method is the first approach to introduce the simultaneous dimension reduction and factorization of interlinked matrices to a data harmonization context, which provides a new direction in methodological research for correcting inter-scanner biases. Analyzing diffusion tensor imaging (DTI) data from the Social Processes Initiative in Neurobiology of the Schizophrenia (SPINS) study and conducting extensive simulation studies, we show that RELIEF outperforms existing harmonization methods in mitigating inter-scanner biases and retaining biological associations of interest to increase statistical power. RELIEF is publicly available as an R package.
Affiliation(s)
- Rongqian Zhang
  - Department of Statistical Sciences, University of Toronto, Toronto, Canada
- Aristotle N. Voineskos
  - Centre for Addiction and Mental Health, Toronto, Canada
  - Department of Psychiatry, University of Toronto, Toronto, Canada
- Jun Young Park
  - Department of Statistical Sciences, University of Toronto, Toronto, Canada
  - Department of Psychology, University of Toronto, Toronto, Canada
8
Hu F, Chen AA, Horng H, Bashyam V, Davatzikos C, Alexander-Bloch A, Li M, Shou H, Satterthwaite TD, Yu M, Shinohara RT. Image harmonization: A review of statistical and deep learning methods for removing batch effects and evaluation metrics for effective harmonization. Neuroimage 2023; 274:120125. [PMID: 37084926 PMCID: PMC10257347 DOI: 10.1016/j.neuroimage.2023.120125] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 02/14/2023] [Revised: 04/12/2023] [Accepted: 04/19/2023] [Indexed: 04/23/2023] Open
Abstract
Magnetic resonance imaging and computed tomography from multiple batches (e.g. sites, scanners, datasets, etc.) are increasingly used alongside complex downstream analyses to obtain new insights into the human brain. However, significant confounding due to batch-related technical variation, called batch effects, is present in this data; direct application of downstream analyses to the data may lead to biased results. Image harmonization methods seek to remove these batch effects and enable increased generalizability and reproducibility of downstream results. In this review, we describe and categorize current approaches in statistical and deep learning harmonization methods. We also describe current evaluation metrics used to assess harmonization methods and provide a standardized framework to evaluate newly-proposed methods for effective harmonization and preservation of biological information. Finally, we provide recommendations to end-users to advocate for more effective use of current methods and to methodologists to direct future efforts and accelerate development of the field.
Affiliation(s)
- Fengling Hu
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104, United States
- Andrew A Chen
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104, United States
- Hannah Horng
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104, United States
- Vishnu Bashyam
  - Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, United States
- Christos Davatzikos
  - Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, United States
- Aaron Alexander-Bloch
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, United States
  - Penn-CHOP Lifespan Brain Institute, United States
  - Department of Child and Adolescent Psychiatry and Behavioral Science, Children's Hospital of Philadelphia, United States
- Mingyao Li
  - Statistical Center for Single-Cell and Spatial Genomics, Perelman School of Medicine, University of Pennsylvania, United States
- Haochang Shou
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104, United States
  - Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, United States
- Theodore D Satterthwaite
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, United States
  - Penn-CHOP Lifespan Brain Institute, United States
  - The Penn Lifespan Informatics and Neuroimaging Center, Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, United States
- Meichen Yu
  - Indiana Alzheimer's Disease Research Center, Indiana University School of Medicine, United States
- Russell T Shinohara
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104, United States
  - Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, United States
9
Hu F, Lucas A, Chen AA, Coleman K, Horng H, Ng RW, Tustison NJ, Davis KA, Shou H, Li M, Shinohara RT. DeepComBat: A Statistically Motivated, Hyperparameter-Robust, Deep Learning Approach to Harmonization of Neuroimaging Data. bioRxiv 2023:2023.04.24.537396. [PMID: 37163042 PMCID: PMC10168207 DOI: 10.1101/2023.04.24.537396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Neuroimaging data from multiple batches (i.e. acquisition sites, scanner manufacturer, datasets, etc.) are increasingly necessary to gain new insights into the human brain. However, multi-batch data, as well as extracted radiomic features, exhibit pronounced technical artifacts across batches. These batch effects introduce confounding into the data and can obscure biological effects of interest, decreasing the generalizability and reproducibility of findings. This is especially true when multi-batch data are used alongside complex downstream analysis models, such as machine learning methods. Image harmonization methods seeking to remove these batch effects are important for mitigating these issues; however, significant multivariate batch effects remain in the data following harmonization by current state-of-the-art statistical and deep learning methods. We present DeepComBat, a deep learning harmonization method based on a conditional variational autoencoder architecture and the ComBat harmonization model. DeepComBat learns and removes subject-level batch effects by accounting for the multivariate relationships between features. Additionally, DeepComBat relaxes a number of strong assumptions commonly made by previous deep learning harmonization methods and is empirically robust across a wide range of hyperparameter choices. We apply this method to neuroimaging data from a large cognitive-aging cohort and find that DeepComBat outperforms existing methods, as assessed by a battery of machine learning methods, in removing scanner effects from cortical thickness measurements while preserving biological heterogeneity. Additionally, DeepComBat provides a new perspective for statistically-motivated deep learning harmonization methods.
Affiliation(s)
- Fengling Hu
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania
- Alfredo Lucas
  - Center for Neuroengineering and Therapeutics, Department of Engineering, University of Pennsylvania
- Andrew A. Chen
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania
- Kyle Coleman
  - Statistical Center for Single-Cell and Spatial Genomics, Perelman School of Medicine, University of Pennsylvania
- Hannah Horng
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania
- Kathryn A. Davis
  - Center for Neuroengineering and Therapeutics, Department of Engineering, University of Pennsylvania
  - Department of Neurology, Perelman School of Medicine, University of Pennsylvania
- Haochang Shou
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania
  - Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine
- Mingyao Li
  - Statistical Center for Single-Cell and Spatial Genomics, Perelman School of Medicine, University of Pennsylvania
- Russell T. Shinohara
  - Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania
  - Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine
|
10
|
Tian D, Zeng Z, Sun X, Tong Q, Li H, He H, Gao JH, He Y, Xia M. A deep learning-based multisite neuroimage harmonization framework established with a traveling-subject dataset. Neuroimage 2022; 257:119297. [PMID: 35568346 DOI: 10.1016/j.neuroimage.2022.119297] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 12/18/2021] [Revised: 03/31/2022] [Accepted: 05/09/2022] [Indexed: 12/12/2022] Open
Abstract
The accumulation of multisite large-sample MRI datasets collected during large brain research projects in the last decade has provided critical resources for understanding the neurobiological mechanisms underlying cognitive functions and brain disorders. However, the significant site effects observed in imaging data and their derived structural and functional features have prevented the derivation of consistent findings across multiple studies. The development of harmonization methods that can effectively eliminate complex site effects while maintaining biological characteristics in neuroimaging data has become a vital and urgent requirement for multisite imaging studies. Here, we propose a deep learning-based framework to harmonize imaging data obtained from pairs of sites, in which site factors and brain features can be disentangled and encoded. We trained the proposed framework with a publicly available traveling subject dataset from the Strategic Research Program for Brain Sciences (SRPBS) and harmonized the gray matter volume maps derived from eight source sites to a target site. The proposed framework significantly eliminated intersite differences in gray matter volumes. The embedded encoders successfully captured both the abstract textures of site factors and the concrete brain features. Moreover, the proposed framework exhibited outstanding performance relative to conventional statistical harmonization methods in terms of site effect removal, data distribution homogenization, and intrasubject similarity improvement. Finally, the proposed harmonization network provided fixable expandability, through which new sites could be linked to the target site via indirect schema without retraining the whole model. 
Together, the proposed method offers a powerful and interpretable deep learning-based harmonization framework for multisite neuroimaging data that can enhance reliability and reproducibility in multisite studies regarding brain development and brain disorders.
Affiliation(s)
- Dezheng Tian
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China; Beijing Key Laboratory of Brain Imaging and Connectomics, Beijing Normal University, Beijing 100875, China; IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Zilong Zeng
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China; Beijing Key Laboratory of Brain Imaging and Connectomics, Beijing Normal University, Beijing 100875, China; IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Xiaoyi Sun
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China; Beijing Key Laboratory of Brain Imaging and Connectomics, Beijing Normal University, Beijing 100875, China; IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China; School of Systems Science, Beijing Normal University, Beijing 100875, China
- Qiqi Tong
- Research Center for Healthcare Data Science, Zhejiang Lab, Hangzhou 311121, China
- Huanjie Li
- School of Biomedical Engineering, Dalian University of Technology, Dalian 116024, China
- Hongjian He
- Center for Brain Imaging Science and Technology, Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China
- Jia-Hong Gao
- Center for MRI Research, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; Beijing City Key Laboratory for Medical Physics and Engineering, Institute of Heavy Ion Physics, School of Physics, Peking University, Beijing 100871, China; IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
- Yong He
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China; Beijing Key Laboratory of Brain Imaging and Connectomics, Beijing Normal University, Beijing 100875, China; IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China; Chinese Institute for Brain Research, Beijing 102206, China
- Mingrui Xia
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China; Beijing Key Laboratory of Brain Imaging and Connectomics, Beijing Normal University, Beijing 100875, China; IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China.
|