1
Orlichenko A, Qu G, Zhou Z, Liu A, Deng HW, Ding Z, Stephen JM, Wilson TW, Calhoun VD, Wang YP. A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.16.594528. [PMID: 38798580 PMCID: PMC11118390 DOI: 10.1101/2024.05.16.594528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Objective: fMRI and derived measures such as functional connectivity (FC) have been used to predict brain age, general fluid intelligence, psychiatric disease status, and preclinical neurodegenerative disease. However, it is not always clear that all demographic confounds, such as age, sex, and race, have been removed from fMRI data. Additionally, many fMRI datasets are restricted to authorized researchers, making dissemination of these valuable data sources challenging. Methods: We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics and generate high-quality synthetic fMRI data based on user-supplied demographics. We train and validate our model using two large, widely used datasets, the Philadelphia Neurodevelopmental Cohort (PNC) and the Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP). Results: We find that DemoVAE recapitulates group differences in fMRI data while capturing the full breadth of individual variations. Significantly, we also find that most clinical and computerized battery fields that are correlated with fMRI data are not correlated with DemoVAE latents. The exceptions are several fields related to schizophrenia medication and symptom severity. Conclusion: Our model generates fMRI data that captures the full distribution of FC better than traditional VAE or GAN models. We also find that most prediction using fMRI data is dependent on correlation with, and prediction of, demographics. Significance: Our DemoVAE model allows for generation of high-quality synthetic data conditioned on subject demographics as well as the removal of the confounding effects of demographics. We identify that FC-based prediction tasks are highly influenced by demographic confounds.
2
Rokham H, Falakshahi H, Fu Z, Pearlson G, Calhoun VD. Evaluation of boundaries between mood and psychosis disorder using dynamic functional network connectivity (dFNC) via deep learning classification. Hum Brain Mapp 2023; 44:3180-3195. [PMID: 36919656 PMCID: PMC10171526 DOI: 10.1002/hbm.26273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 02/20/2023] [Accepted: 02/27/2023] [Indexed: 03/16/2023]
Abstract
The validity and reliability of diagnoses in psychiatry are a challenging topic in mental health. The current mental health categorization is based primarily on symptoms and clinical course and is not biologically validated. Among multiple ongoing efforts, neurological observations alongside clinical evaluations are considered to be potential solutions to address diagnostic problems. The Bipolar-Schizophrenia Network on Intermediate Phenotypes (B-SNIP) has published multiple papers attempting to reclassify psychotic illnesses based on biological rather than symptomatic measures. However, the effort to investigate the relationship between this new categorization approach and other neuroimaging techniques, including resting-state fMRI data, is still limited. This study focused on investigating the relationship between different psychotic disorder categorization methods and resting-state fMRI-based measures called dynamic functional network connectivity (dFNC) using state-of-the-art artificial intelligence (AI) approaches. We applied our method to 613 subjects, including individuals with psychosis and healthy controls, who were classified using both the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) and the B-SNIP biomarker-based (Biotype) approach. Statistical group differences and cross-validated classifiers were evaluated within each framework to assess how the categories differ. Results highlight interesting differences in occupancy in both DSM-IV and Biotype categorizations compared to healthy individuals, which are distributed across specific transient connectivity states. Biotypes tended to show less distinctiveness in occupancy level and included fewer cellwise differences. Classification accuracies obtained for the DSM-IV and Biotype categories were both well above chance. Results provided new insights and highlighted the benefits of both DSM-IV and biology-based categories while also emphasizing the importance of future work in this direction, including employing further data types.
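The abstract compares groups by occupancy across transient connectivity states but does not define the measure. A minimal sketch, assuming the common convention that occupancy is the fraction of sliding windows assigned to each state; the function name `occupancy` and the toy state sequence are hypothetical and not taken from the paper:

```python
from collections import Counter

def occupancy(state_sequence, n_states):
    """Fraction of time windows spent in each state (assumed definition)."""
    counts = Counter(state_sequence)
    total = len(state_sequence)
    # States that never occur get an occupancy of 0.0.
    return [counts.get(s, 0) / total for s in range(n_states)]

# Hypothetical state labels for one subject, one label per sliding window.
windows = [0, 0, 1, 2, 2, 2, 1, 0]
rates = occupancy(windows, n_states=4)
print(rates)
```

Per-subject vectors like `rates` are what a group comparison of occupancy would operate on under this assumption.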
Affiliation(s)
- Hooman Rokham
- Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA; Tri-institutional Center of Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University, Atlanta, Georgia, USA
- Haleh Falakshahi
- Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA; Tri-institutional Center of Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University, Atlanta, Georgia, USA
- Zening Fu
- Tri-institutional Center of Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University, Atlanta, Georgia, USA
- Godfrey Pearlson
- Department of Psychiatry, Yale University, New Haven, Connecticut, USA; Department of Neuroscience, Yale University, New Haven, Connecticut, USA; Olin Neuropsychiatry Research Center, Hartford Hospital, Hartford, Connecticut, USA
- Vince D. Calhoun
- Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA; Tri-institutional Center of Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University, Atlanta, Georgia, USA; Department of Psychiatry, Yale University, New Haven, Connecticut, USA; Department of Computer Science, Georgia State University, Atlanta, Georgia, USA; Department of Psychology, Georgia State University, Atlanta, Georgia, USA
3
NeuroCrypt: Machine Learning Over Encrypted Distributed Neuroimaging Data. Neuroinformatics 2022; 20:91-108. [PMID: 33948898 PMCID: PMC8566325 DOI: 10.1007/s12021-021-09525-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 04/04/2021] [Indexed: 01/05/2023]
Abstract
The field of neuroimaging can greatly benefit from building machine learning models to detect and predict diseases, and discover novel biomarkers, but much of the data collected at various organizations and research centers cannot be shared due to privacy or regulatory concerns (especially for clinical data or rare disorders). In addition, aggregating data across multiple large studies results in a huge amount of duplicated technical debt, and the resources required can be challenging or impossible for an individual site to build. Training on the data distributed across organizations can result in models that generalize much better than models trained on data from any of the organizations alone. While there are approaches for decentralized sharing, these often do not provide the highest possible guarantees of sample privacy that only cryptography can provide. In addition, such approaches are often focused on probabilistic solutions. In this paper, we propose an approach that leverages the potential of datasets spread among a number of data-collecting organizations by performing joint analyses in a secure and deterministic manner when only encrypted data is shared and manipulated. The approach is based on secure multiparty computation, which refers to cryptographic protocols that enable distributed computation of a function over distributed inputs without revealing additional information about the inputs. It enables multiple organizations to train machine learning models on their joint data and apply the trained models to encrypted data without revealing their sensitive data to the other parties. In our proposed approach, organizations (or sites) securely collaborate to build a machine learning model as if it had been trained on the aggregated data of all the organizations combined. Importantly, the approach does not require a trusted party (i.e., an aggregator); each contributing site plays an equal role in the process, and no site can learn individual data of any other site. We demonstrate the effectiveness of the proposed approach in a range of empirical evaluations using different machine learning algorithms, including logistic regression and convolutional neural network models, on human structural and functional magnetic resonance imaging datasets.
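The abstract does not say which secure multiparty computation protocol NeuroCrypt uses. As a rough illustration of the general idea (a function computed over distributed inputs, with no trusted aggregator and no site seeing another site's value), the sketch below uses additive secret sharing to compute a sum. The modulus `PRIME`, the function names, and the per-site counts are all assumptions made for this example, not details from the paper:

```python
import secrets

PRIME = 2**61 - 1  # field modulus; an illustrative choice, not from the paper

def share(value, n_parties):
    """Split a non-negative integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to value (mod PRIME).
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares; only the full set reveals the secret."""
    return sum(shares) % PRIME

def secure_sum(private_values):
    """Sum the sites' values without any site revealing its own value."""
    n = len(private_values)
    # Site i splits its value and sends share j to site j.
    distributed = [share(v, n) for v in private_values]
    # Site j adds the shares it received and publishes only that partial sum.
    partial = [sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)]
    return reconstruct(partial)

site_counts = [12, 7, 30]  # hypothetical per-site statistics
total = secure_sum(site_counts)
print(total)
```

Every site plays the same role here, which mirrors the "no trusted party" property described above; training a full model securely requires considerably more machinery than this single secure sum.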
4
Abrol A, Fu Z, Salman M, Silva R, Du Y, Plis S, Calhoun V. Deep learning encodes robust discriminative neuroimaging representations to outperform standard machine learning. Nat Commun 2021; 12:353. [PMID: 33441557 PMCID: PMC7806588 DOI: 10.1038/s41467-020-20655-6] [Citation(s) in RCA: 86] [Impact Index Per Article: 28.7] [Received: 04/14/2020] [Accepted: 12/09/2020] [Indexed: 12/27/2022]
Abstract
Recent critical commentaries unfavorably compare deep learning (DL) with standard machine learning (SML) approaches for brain imaging data analysis. However, their conclusions are often based on pre-engineered features depriving DL of its main advantage — representation learning. We conduct a large-scale systematic comparison profiled in multiple classification and regression tasks on structural MRI images and show the importance of representation learning for DL. Results show that if trained following prevalent DL practices, DL methods have the potential to scale particularly well and substantially improve compared to SML methods, while also presenting a lower asymptotic complexity in relative computational time, despite being more complex. We also demonstrate that DL embeddings span comprehensible task-specific projection spectra and that DL consistently localizes task-discriminative brain biomarkers. Our findings highlight the presence of nonlinearities in neuroimaging data that DL can exploit to generate superior task-discriminative representations for characterizing the human brain. Recent critical commentaries unfavorably compare deep learning (DL) with standard machine learning (SML) for brain imaging data analysis. Here, the authors show that if trained following prevalent DL practices, DL methods substantially improve compared to SML methods by encoding robust discriminative brain representations.
Affiliation(s)
- Anees Abrol
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, USA
- Zening Fu
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, USA
- Mustafa Salman
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, USA; School of Electrical & Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Rogers Silva
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, USA
- Yuhui Du
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, USA; School of Computer & Information Technology, Shanxi University, Taiyuan, China
- Sergey Plis
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, USA
- Vince Calhoun
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, USA; School of Electrical & Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
5
A Comparative Analysis of Machine Learning classifiers for Dysphonia-based classification of Parkinson’s Disease. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2020. [DOI: 10.1007/s41060-020-00234-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/23/2022]
6
Rokham H, Pearlson G, Abrol A, Falakshahi H, Plis S, Calhoun VD. Addressing Inaccurate Nosology in Mental Health: A Multilabel Data Cleansing Approach for Detecting Label Noise From Structural Magnetic Resonance Imaging Data in Mood and Psychosis Disorders. BIOLOGICAL PSYCHIATRY. COGNITIVE NEUROSCIENCE AND NEUROIMAGING 2020; 5:819-832. [PMID: 32771180 PMCID: PMC7760893 DOI: 10.1016/j.bpsc.2020.05.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/18/2019] [Revised: 05/04/2020] [Accepted: 05/06/2020] [Indexed: 10/24/2022]
Abstract
BACKGROUND: Mental health diagnostic approaches are seeking to identify biological markers to work alongside advanced machine learning approaches. It is difficult to identify a biological marker of disease when the traditional diagnostic labels themselves are not necessarily valid. METHODS: We worked with T1 structural magnetic resonance imaging data collected from 1493 individuals comprising healthy control subjects, patients with psychosis, and their unaffected first-degree relatives. Specifically, the dataset included 176 bipolar disorder probands, 134 schizoaffective disorder probands, 240 schizophrenia probands, 362 control subjects, and 581 patient relatives. We assumed that there might be noise in the diagnostic labeling process. We detected label noise by classifying the data multiple times using a support vector machine classifier, and then we flagged the individuals whom all classifiers unanimously mislabeled. Next, we assigned a new diagnostic label to these individuals, based on the biological data (magnetic resonance imaging), using an iterative data cleansing approach. RESULTS: Simulation results showed that our method was highly accurate in identifying label noise. Diagnostic and biotype categories showed about 65% and 63% noisy labels, respectively, with the largest amount of relabeling occurring between the healthy control subjects and individuals with bipolar disorder and schizophrenia as well as in unaffected close relatives. The extraction of imaging features highlighted regional brain changes associated with each group. CONCLUSIONS: This approach represents an initial step toward developing strategies that need not assume that existing mental health diagnostic categories are always valid but rather allow us to leverage this information while also acknowledging that there are misassignments.
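The flagging rule in the METHODS (classify the data several times, then flag individuals that every classifier mislabels) can be sketched as follows. This is not the authors' pipeline: a nearest-centroid classifier is swapped in for the support vector machine so the example needs only the standard library, the hold-out scheme per round is an assumption, the relabeling is a single majority vote rather than the iterative cleansing described, and the toy data are hypothetical:

```python
from collections import Counter

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def fit(X, y):
    """Nearest-centroid model (stand-in for the SVM): one mean per label."""
    return {lab: centroid([x for x, l in zip(X, y) if l == lab]) for lab in set(y)}

def predict(model, x):
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda lab: sqdist(model[lab]))

def flag_label_noise(X, y, n_rounds=3):
    """Return {index: suggested_label} for samples misclassified in every round."""
    votes = [[] for _ in X]
    for r in range(n_rounds):
        # Each round trains on a different subset (assumed scheme: drop every
        # n_rounds-th sample, offset by r) and then predicts all samples.
        keep = [i for i in range(len(X)) if i % n_rounds != r]
        model = fit([X[i] for i in keep], [y[i] for i in keep])
        for i, x in enumerate(X):
            votes[i].append(predict(model, x))
    flagged = {}
    for i, preds in enumerate(votes):
        if all(p != y[i] for p in preds):  # unanimous disagreement with the label
            flagged[i] = Counter(preds).most_common(1)[0][0]
    return flagged

# Hypothetical 2-D features: two tight clusters; sample 8 carries a wrong label.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (5.0, 5.1),
     (5.2, 4.9), (4.8, 5.0), (5.1, 5.3), (0.2, 0.2), (4.9, 5.2)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
flagged = flag_label_noise(X, y)
print(flagged)
```

Requiring unanimity across rounds keeps the rule conservative: a sample is flagged only if no trained model agrees with its given label.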
Affiliation(s)
- Hooman Rokham
- Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia; Tri-institutional Center of Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University, Atlanta, Georgia
- Godfrey Pearlson
- Department of Psychiatry, Yale University, New Haven, Connecticut; Department of Neuroscience, Yale University, New Haven, Connecticut; Olin Neuropsychiatry Research Center, Hartford Hospital, Hartford, Connecticut
- Anees Abrol
- Tri-institutional Center of Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University, Atlanta, Georgia; Department of Computer Science, Georgia State University, Atlanta, Georgia
- Haleh Falakshahi
- Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia; Tri-institutional Center of Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University, Atlanta, Georgia
- Sergey Plis
- Tri-institutional Center of Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University, Atlanta, Georgia; Department of Computer Science, Georgia State University, Atlanta, Georgia
- Vince D Calhoun
- Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia; Tri-institutional Center of Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University, Atlanta, Georgia; Department of Computer Science, Georgia State University, Atlanta, Georgia; Department of Psychology, Georgia State University, Atlanta, Georgia; Department of Psychiatry, Yale University, New Haven, Connecticut