1
Qu G, Orlichenko A, Wang J, Zhang G, Xiao L, Zhang K, Wilson TW, Stephen JM, Calhoun VD, Wang YP. Interpretable Cognitive Ability Prediction: A Comprehensive Gated Graph Transformer Framework for Analyzing Functional Brain Networks. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:1568-1578. [PMID: 38109241 PMCID: PMC11090410 DOI: 10.1109/tmi.2023.3343365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2023]
Abstract
Graph convolutional deep learning has emerged as a promising method to explore the functional organization of the human brain in neuroscience research. This paper presents a novel framework that utilizes the gated graph transformer (GGT) model to predict individuals' cognitive ability based on functional connectivity (FC) derived from fMRI. Our framework incorporates prior spatial knowledge and uses a random-walk diffusion strategy that captures the intricate structural and functional relationships between different brain regions. Specifically, our approach employs learnable structural and positional encodings (LSPE) in conjunction with a gating mechanism to efficiently disentangle the learning of positional encoding (PE) and graph embeddings. Additionally, we utilize the attention mechanism to derive multi-view node feature embeddings and dynamically distribute propagation weights between each node and its neighbors, which facilitates the identification of significant biomarkers from functional brain networks and thus enhances the interpretability of the findings. To evaluate our proposed model in cognitive ability prediction, we conduct experiments on two large-scale brain imaging datasets: the Philadelphia Neurodevelopmental Cohort (PNC) and the Human Connectome Project (HCP). The results show that our approach not only outperforms existing methods in prediction accuracy but also provides superior explainability, which can be used to identify important FCs underlying cognitive behaviors.
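The random-walk positional encoding mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common formulation, in which node i is encoded by the probabilities of returning to itself after 1..k random-walk steps; the abstract does not state the exact formulation used in the paper. The function names and the requirement of non-negative edge weights (for example, thresholded absolute FC values) are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def random_walk_pe(adj, k):
    """Return, for each node, its return probabilities after 1..k steps.

    adj is a square matrix of non-negative edge weights (an assumption;
    signed correlations would need to be thresholded or rectified first).
    """
    n = len(adj)
    # Row-normalise so that trans[i][j] is the probability of stepping i -> j.
    trans = []
    for row in adj:
        degree = sum(row)
        trans.append([w / degree if degree > 0 else 0.0 for w in row])
    power = [row[:] for row in trans]  # holds trans^step, starting at step 1
    pe = [[] for _ in range(n)]
    for _ in range(k):
        for i in range(n):
            pe[i].append(power[i][i])  # diagonal entry = return probability
        power = matmul(power, trans)
    return pe


# Path graph 0 - 1 - 2: the middle node always returns after two steps.
print(random_walk_pe([[0, 1, 0], [1, 0, 1], [0, 1, 0]], 2))
# prints [[0.0, 0.5], [0.0, 1.0], [0.0, 0.5]]
```

In this form the encoding depends only on graph structure, so nodes with different local connectivity receive different vectors even when their input features are identical.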
2
Laha N, Huey N, Coull B, Mukherjee R. On statistical inference with high-dimensional sparse CCA. INFORMATION AND INFERENCE: A JOURNAL OF THE IMA 2023; 12:iaad040. [PMID: 37982049 PMCID: PMC10656287 DOI: 10.1093/imaiai/iaad040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Revised: 04/12/2023] [Accepted: 09/08/2023] [Indexed: 11/21/2023]
Abstract
We consider asymptotically exact inference on the leading canonical correlation directions and strengths between two high-dimensional vectors under sparsity restrictions. In this regard, our main contribution is developing a novel representation of the Canonical Correlation Analysis problem, based on which one can operationalize a one-step bias correction on reasonable initial estimators. Our analytic results in this regard are adaptive over suitable structural restrictions of the high-dimensional nuisance parameters, which, in this set-up, correspond to the covariance matrices of the variables of interest. We further supplement the theoretical guarantees behind our procedures with extensive numerical studies.
Affiliation(s)
- Nilanjana Laha: Department of Statistics, Texas A&M, College Station, TX 77843, USA
- Nathan Huey: Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Brent Coull: Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Rajarshi Mukherjee: Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
3
Mandal A, Maji P. Multiview Regularized Discriminant Canonical Correlation Analysis: Sequential Extraction of Relevant Features From Multiblock Data. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:5497-5509. [PMID: 35417362 DOI: 10.1109/tcyb.2022.3155875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
One of the important issues associated with real-life high-dimensional data analysis is how to extract significant and relevant features from multiview data. The multiset canonical correlation analysis (MCCA) is a well-known statistical method for multiview data integration. It finds a linear subspace that maximizes the correlations among different views. However, the existing methods to find the multiset canonical variables are computationally very expensive, which restricts the application of the MCCA in real-life big data analysis. The covariance matrix of each high-dimensional view may also suffer from the singularity problem due to the limited number of samples. Moreover, the existing MCCA-based feature extraction algorithms are, in general, unsupervised in nature. In this regard, a new supervised feature extraction algorithm is proposed, which integrates multimodal multidimensional data sets by solving the maximal correlation problem of the MCCA. A new block matrix representation is introduced to reduce the computational complexity of computing the canonical variables of the MCCA. The analytical formulation enables efficient computation of the multiset canonical variables under a supervised ridge regression optimization technique. It deals with the "curse of dimensionality" problem associated with high-dimensional data and facilitates the sequential generation of relevant features with significantly lower computational cost. The effectiveness of the proposed multiblock data integration algorithm, along with a comparison with other existing methods, is demonstrated on several benchmark and real-life cancer data sets.
4
Song X, Li R, Wang K, Bai Y, Xiao Y, Wang YP. Joint Sparse Collaborative Regression on Imaging Genetics Study of Schizophrenia. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1137-1146. [PMID: 35503837 PMCID: PMC10321021 DOI: 10.1109/tcbb.2022.3172289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The imaging genetics approach generates a large amount of high-dimensional and multi-modal data, providing complementary information for the comprehensive study of schizophrenia, a complex mental disease. However, at the same time, the variety of these data in structure, resolution, and format makes their integrative study a forbidding task. In this paper, we propose a novel model called Joint Sparse Collaborative Regression (JSCoReg), which can extract class-specific features from different health conditions/disease classes. We first evaluate the performance of feature selection in terms of the receiver operating characteristic (ROC) curve and the area under the ROC curve in a simulation experiment. We demonstrate that the JSCoReg model can achieve higher accuracy than similar models, including Joint Sparse Canonical Correlation Analysis and Sparse Collaborative Regression. We then apply the JSCoReg model to the analysis of a schizophrenia dataset collected from the Mind Clinical Imaging Consortium. JSCoReg enables us to better identify biomarkers associated with schizophrenia, which are verified to be both biologically and statistically significant.
Affiliation(s)
- Xueli Song: School of Sciences, Chang’an University, Xi’an, 710064, China
- Rongpeng Li: School of Sciences, Chang’an University, Xi’an, 710064, China
- Kaiming Wang: School of Sciences, Chang’an University, Xi’an, 710064, China
- Yuntong Bai: Biomedical Engineering Department, Tulane University, New Orleans, LA 70118, USA
- Yuzhu Xiao: School of Sciences, Chang’an University, Xi’an, 710064, China
- Yu-ping Wang: Biomedical Engineering Department, Tulane University, New Orleans, LA 70118, USA
5
Wang S, Zheng K, Kong W, Huang R, Liu L, Wen G, Yu Y. Multimodal data fusion based on IGERNNC algorithm for detecting pathogenic brain regions and genes in Alzheimer's disease. Brief Bioinform 2023; 24:6887308. [PMID: 36502428 DOI: 10.1093/bib/bbac515] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/25/2022] [Revised: 09/28/2022] [Accepted: 10/30/2022] [Indexed: 12/14/2022]
Abstract
At present, the study of the pathogenesis of Alzheimer's disease (AD) by multimodal data fusion analysis has attracted wide attention. Multimodal medical data often suffer from small sample sizes and high dimensionality. In view of these characteristics, the existing genetic evolution random neural network cluster (GERNNC) model combines a genetic evolution algorithm with a neural network for the classification of AD patients and the extraction of pathogenic factors. However, the model does not take into account the non-linear relationship between brain regions and genes, nor the fact that the genetic evolution algorithm can fall into local optimal solutions, so its overall performance is unsatisfactory. To solve these two problems, this paper improves the construction of fusion features and the genetic evolution algorithm in the GERNNC model, and proposes an improved genetic evolution random neural network cluster (IGERNNC) model. The IGERNNC model uses a mutual information correlation analysis method to combine resting-state functional magnetic resonance imaging data with single nucleotide polymorphism data for the construction of fusion features. On top of the traditional genetic evolution algorithm, an elite retention strategy and a large variation genetic algorithm are added to keep the model from falling into local optimal solutions. In multiple independent experimental comparisons, the IGERNNC model identifies AD patients and extracts relevant pathogenic factors more effectively, and it is expected to become an effective tool in the field of AD research.
Affiliation(s)
- Shuaiqun Wang: School of Information Engineering, Shanghai Maritime University, Shanghai, China
- Kai Zheng: School of Information Engineering, Shanghai Maritime University, Shanghai, China
- Wei Kong: School of Information Engineering, Shanghai Maritime University, Shanghai, China
- Ruiwen Huang: School of Information Engineering, Shanghai Maritime University, Shanghai, China
- Lulu Liu: School of Information Engineering, Shanghai Maritime University, Shanghai, China
- Gen Wen: School of Information Engineering, Shanghai Maritime University, Shanghai, China
- Yaling Yu: School of Information Engineering, Shanghai Maritime University, Shanghai, China
6
Zhang Y, Zhang H, Xiao L, Bai Y, Calhoun VD, Wang YP. Multi-Modal Imaging Genetics Data Fusion via a Hypergraph-Based Manifold Regularization: Application to Schizophrenia Study. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:2263-2272. [PMID: 35320094 PMCID: PMC9661879 DOI: 10.1109/tmi.2022.3161828] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/14/2023]
Abstract
Recent studies show that multi-modal data fusion techniques combine information from diverse sources for comprehensive diagnosis and prognosis of complex brain disorders, often resulting in improved accuracy compared to single-modality approaches. However, many existing data fusion methods extract features from homogeneous networks, ignoring heterogeneous structural information among multiple modalities. To this end, we propose a Hypergraph-based Multi-modal data Fusion algorithm, namely HMF. Specifically, we first generate a hypergraph similarity matrix to represent the high-order relationships among subjects, and then enforce a regularization term based upon both the inter- and intra-modality relationships of the subjects. Finally, we apply HMF to integrate imaging and genetics datasets. Validation of the proposed method is performed on both synthetic data and real samples from a schizophrenia study. Results show that our algorithm outperforms several competing methods and reveals significant interactions among risk genes, environmental factors and abnormal brain regions.
7
Wang S, Chen H, Kong W, Ke F, Wei K. Identify Biomarkers of Alzheimer's Disease Based on Multi-task Canonical Correlation Analysis and Regression Model. J Mol Neurosci 2022; 72:1749-1763. [PMID: 35698015 DOI: 10.1007/s12031-022-02031-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Accepted: 05/21/2022] [Indexed: 11/29/2022]
Abstract
Imaging genetics uses imaging measures as neuroanatomical phenotypes to evaluate gene single nucleotide polymorphisms and their effects on the structure and function of different brain regions. It plays a vital role in bridging the initial understanding of the genetic basis of brain structure and dysfunction. Sparse canonical correlation analysis (SCCA) has become a widespread technique in this field because of its powerful ability to identify bivariate relationships and perform feature selection. Because most traditional SCCA algorithms assume that the input features are independent, they cannot be applied directly to imaging genetic data. The multi-task SCCA (MT-SCCA) model is unsupervised and cannot identify genotype-phenotype associations for diagnostic guidance. Meanwhile, a single biological clinical index cannot fully reflect the physiological process of a comprehensive disease. Therefore, it is necessary to find biomarkers that reflect Alzheimer's disease and physiological functions that more comprehensively reflect the development of the disease. This article uses a multi-task sparse canonical correlation analysis and regression (MT-SCCAR) model to combine the annual depression level total score (GDSCALE), clinical dementia assessment scale (GLOBAL CDR), functional activity questionnaire (FAQ), and neuropsychiatric symptom questionnaire (NPI-Q). These four clinical measures are used as compensation information and embedded in the algorithm through linear regression. The model shows superiority and robustness over traditional correlation analysis methods on both real and simulated data. Meanwhile, compared with MT-SCCA, the model obtains higher gene-ROI weights and identifies clearer biomarkers, which provides a practical basis for the study of complex human disease pathology.
Affiliation(s)
- Shuaiqun Wang: College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave, Shanghai, 201306, People's Republic of China
- Huiqiu Chen: College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave, Shanghai, 201306, People's Republic of China
- Wei Kong: College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave, Shanghai, 201306, People's Republic of China
- Fengchun Ke: College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave, Shanghai, 201306, People's Republic of China
- Kai Wei: College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave, Shanghai, 201306, People's Republic of China
8
Wang W, Kong W, Wang S, Wei K. Detecting Biomarkers of Alzheimer's Disease Based on Multi-constrained Uncertainty-Aware Adaptive Sparse Multi-view Canonical Correlation Analysis. J Mol Neurosci 2022; 72:841-865. [PMID: 35080765 DOI: 10.1007/s12031-021-01963-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2021] [Accepted: 12/29/2021] [Indexed: 12/01/2022]
Abstract
Imaging genetics mainly explores the pathogenesis of Alzheimer's disease (AD) by studying the relationship between genetic data (such as SNP, gene expression, and DNA methylation data) and imaging data (such as structural MRI (sMRI), fMRI, and PET). Most existing research on brain imaging genomics uses two-way or three-way bi-multivariate methods to analyze the correlation between genes and brain imaging. However, many of these methods are still affected by gradient domination or cannot take into account the effect of feature redundancy on the results, so the canonical correlation coefficient and the running speed are not significantly improved. To solve these problems, this paper proposes a multi-constrained uncertainty-aware adaptive sparse multi-view canonical correlation analysis method (MC-unAdaSMCCA) to explore associations among SNPs, gene expression data, and sMRI. That is, building on traditional unAdaSMCCA, orthogonal constraints are imposed on the weights of the three data features through linear programming, which reduces the redundancy of feature weights to improve the correlation between the data, and reduces the complexity of the algorithm to significantly speed up the program. Three adaptive sparse multi-view canonical correlation analysis methods are used as benchmarks for evaluation on real neuroimaging data and synthetic data. Compared with the other three methods, our proposed method obtains better or comparable canonical correlation coefficients and canonical weights. Moreover, the experimental results show that the MC-unAdaSMCCA method can not only identify biomarkers related to AD and mild cognitive impairment (MCI), but also has a strong ability to resist noise and process high-dimensional data. Therefore, our proposed method provides a reliable approach to multi-modal imaging genetic research.
Affiliation(s)
- Wenbo Wang: College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai, 201306, People's Republic of China
- Wei Kong: College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai, 201306, People's Republic of China
- Shuaiqun Wang: College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai, 201306, People's Republic of China
- Kai Wei: College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai, 201306, People's Republic of China
9
Peng P, Zhang Y, Ju Y, Wang K, Li G, Calhoun VD, Wang YP. Group Sparse Joint Non-Negative Matrix Factorization on Orthogonal Subspace for Multi-Modal Imaging Genetics Data Analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:479-490. [PMID: 32750856 PMCID: PMC7758677 DOI: 10.1109/tcbb.2020.2999397] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 05/20/2023]
Abstract
Despite the development of multi-modal neuroimaging technology and gene detection technology, efforts to integrate multi-modal imaging genetics data to explore the virulence factors of schizophrenia (SZ) are still limited. To address this issue, we propose a novel algorithm called group sparse of joint non-negative matrix factorization on orthogonal subspace (GJNMFO). Our algorithm fuses single nucleotide polymorphism (SNP) data, functional magnetic resonance imaging (fMRI) data and epigenetic factors (DNA methylation) by projecting the three-modal data into a common basis matrix and three different coefficient matrices to identify risk genes, epigenetic factors and abnormal brain regions associated with SZ. Specifically, we introduce orthogonal constraints on the basis matrix to discard unimportant features in the rows of the coefficient matrices. Since imaging genetics data have rich group information, we impose group sparsity on the three coefficient matrices to make the extracted features more accurate. Both simulated and real Mind Clinical Imaging Consortium (MCIC) datasets are used to validate our approach. Simulation results show that our algorithm works better than other competing methods. In the experiments on the MCIC datasets, GJNMFO reveals a set of risk genes, epigenetic factors and abnormal brain functional regions, which have been verified to be both statistically and biologically significant.
10
Zhang A, Fang J, Hu W, Calhoun VD, Wang YP. A Latent Gaussian Copula Model for Mixed Data Analysis in Brain Imaging Genetics. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1350-1360. [PMID: 31689199 PMCID: PMC7756188 DOI: 10.1109/tcbb.2019.2950904] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Recent advances in imaging genetics make it possible to combine different types of data including medical images like functional magnetic resonance imaging (fMRI) and genetic data like single nucleotide polymorphisms (SNPs) for comprehensive diagnosis of mental disorders. Understanding complex interactions among these heterogeneous data may give rise to a new perspective, while at the same time demanding statistical models for their integration. Various graphical models have been proposed for the study of interaction or association networks with continuous, binary, and count data as well as mixtures of them. However, limited efforts have been made for the multinomial case, for instance, SNP data. Our goal is therefore to fill the void by developing a graphical model for the integration of fMRI image and SNP data, which can provide deeper understanding of the unknown neurogenetic mechanism. In this article, we propose a latent Gaussian copula model for mixed data containing multinomial components. We assume that the discrete variable is obtained by discretizing a latent (unobserved) continuous variable and then create a semi-rank based estimator of the graph structure. The simulation results demonstrate that the proposed latent correlation has more stable and accurate performance than several existing methods in detecting graph structure. When applied to real schizophrenia data consisting of SNP array and fMRI image data collected by the Mind Clinical Imaging Consortium (MCIC), the proposed method reveals a set of distinct SNP-brain associations, which are verified to be biologically significant. The proposed model is statistically promising in handling mixed types of data including multinomial components, which can find widespread applications. To promote reproducible research, the R code is available at https://github.com/Aiying0512/LGCM.
11
Wang M, Shao W, Hao X, Shen L, Zhang D. Identify Consistent Cross-Modality Imaging Genetic Patterns via Discriminant Sparse Canonical Correlation Analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1549-1561. [PMID: 31581090 DOI: 10.1109/tcbb.2019.2944825] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/10/2023]
Abstract
Sparse canonical correlation analysis (SCCA) is a bi-multivariate technique used in imaging genetics to identify complex multi-SNP-multi-QT associations. However, the traditional SCCA algorithm has been designed to seek a linear correlation between the SNP genotype and brain imaging phenotype, ignoring the discriminant similarity information between within-class subjects in brain imaging genetics association analysis. In addition, multi-modality brain imaging phenotypes are extracted from different perspectives, and imaging markers from the same region that consistently show up across modalities may provide more insights for the mechanistic understanding of diseases. In this paper, a novel multi-modality discriminant SCCA algorithm (MD-SCCA) is proposed to overcome these limitations as well as to improve learning results by incorporating valuable discriminant similarity information into the SCCA algorithm. Specifically, we first extract the discriminant similarity information between within-class subjects by sparse representation. Second, the discriminant similarity information is enforced within SCCA to construct a discriminant SCCA algorithm (D-SCCA). Finally, the MD-SCCA algorithm is adopted to fully explore the relationships among different modalities of different subjects. In experiments, both a synthetic dataset and real data from the Alzheimer's Disease Neuroimaging Initiative database are used to test the performance of our algorithm. The empirical results demonstrate that the proposed algorithm not only produces improved cross-validation performance but also identifies consistent cross-modality imaging genetic biomarkers.
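As background for the SCCA baseline that this abstract extends, the following is a minimal sketch of one widely used formulation: alternating soft-thresholded updates of the two weight vectors, with the within-view covariances treated as identity. It is not the MD-SCCA or D-SCCA algorithm of the paper. The function names, the penalty parameters `lam_u` and `lam_v`, and the toy data are assumptions made for illustration, and the columns of `x` and `y` are assumed to be centred.

```python
import math


def soft_threshold(vec, lam):
    """Shrink each entry toward zero by lam; entries within lam of zero become 0."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in vec]


def normalize(vec):
    """Scale to unit Euclidean norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def sparse_cca(x, y, lam_u, lam_v, n_iter=50):
    """Return sparse weight vectors (u, v) for the first canonical pair.

    x is n-by-p (e.g. SNPs), y is n-by-q (e.g. imaging QTs), both as lists of
    rows with centred columns.
    """
    n, p, q = len(x), len(x[0]), len(y[0])
    # Cross-product matrix C = X^T Y.
    cross = [[sum(x[i][a] * y[i][b] for i in range(n)) for b in range(q)]
             for a in range(p)]
    u = [0.0] * p
    v = normalize([1.0] * q)
    for _ in range(n_iter):
        # u <- normalised soft-threshold of C v
        u = normalize(soft_threshold(
            [sum(cross[a][b] * v[b] for b in range(q)) for a in range(p)], lam_u))
        # v <- normalised soft-threshold of C^T u
        v = normalize(soft_threshold(
            [sum(cross[a][b] * u[a] for a in range(p)) for b in range(q)], lam_v))
    return u, v


# Toy data: only the first column of x co-varies with the first column of y.
x = [[1, 0.1], [-1, -0.1], [1, -0.1], [-1, 0.1]]
y = [[2, 0.1], [-2, 0.1], [2, -0.1], [-2, -0.1]]
u, v = sparse_cca(x, y, lam_u=0.5, lam_v=0.5)
print(u, v)
```

On this toy input the weights on the second column of each view are driven to zero, which is the feature-selection behaviour that SCCA-style methods rely on.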
12
Wang M, Shao W, Hao X, Zhang D. Identify Complex Imaging Genetic Patterns via Fusion Self-Expressive Network Analysis. IEEE TRANSACTIONS ON MEDICAL IMAGING 2021; 40:1673-1686. [PMID: 33661732 DOI: 10.1109/tmi.2021.3063785] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
In brain imaging genetic studies, it is a challenging task to estimate the association between quantitative traits (QTs) extracted from neuroimaging data and genetic markers such as single-nucleotide polymorphisms (SNPs). Most existing association studies are based on extensions of sparse canonical correlation analysis (SCCA) for the identification of complex bi-multivariate associations, which can take specific structure and group information into consideration. However, they often take the original data as input without considering its underlying complex multi-subspace structure, which degrades the performance of the subsequent integrative analysis. Accordingly, in this paper, the self-expressive property, which can describe the similarity structure well, is exploited to reconstruct the original data before the association analysis. Specifically, we first apply the within-class similarity information to construct self-expressive networks by sparse representation. Then, we use the fusion method to iteratively fuse the self-expressive networks from multi-modality brain phenotypes into one network. Finally, we calculate the imaging genetic association based on the fused self-expressive network. We conduct experiments on both single-modality and multi-modality phenotype data. The experimental results validate that our method can not only better estimate the potential association between genetic markers and quantitative traits but also identify consistent multi-modality imaging genetic biomarkers to guide the interpretation of Alzheimer's disease.
13
Hu W, Meng X, Bai Y, Zhang A, Qu G, Cai B, Zhang G, Wilson TW, Stephen JM, Calhoun VD, Wang YP. Interpretable Multimodal Fusion Networks Reveal Mechanisms of Brain Cognition. IEEE TRANSACTIONS ON MEDICAL IMAGING 2021; 40:1474-1483. [PMID: 33556002 PMCID: PMC8208525 DOI: 10.1109/tmi.2021.3057635] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 05/17/2023]
Abstract
The combination of multimodal imaging and genomics provides a more comprehensive way to study mental illnesses and brain functions. Deep network-based data fusion models have been developed to capture their complex associations, resulting in improved diagnosis of diseases. However, deep learning models are often difficult to interpret, which makes it challenging to uncover biological mechanisms using these models. In this work, we develop an interpretable multimodal fusion model to perform automated diagnosis and result interpretation simultaneously. We name it Grad-CAM guided convolutional collaborative learning (gCAM-CCL), which is achieved by combining intermediate feature maps with gradient-based weights. The gCAM-CCL model can generate interpretable activation maps to quantify pixel-level contributions of the input features. Moreover, the estimated activation maps are class-specific, which can therefore facilitate the identification of biomarkers underlying different groups. We validate the gCAM-CCL model on a brain imaging-genetic study, and demonstrate its applications to both the classification of cognitive function groups and the discovery of underlying biological mechanisms. Specifically, our analysis results suggest that during task-fMRI scans, several object recognition related regions of interest (ROIs) are activated, followed by several downstream encoding ROIs. In addition, the high cognitive group may have stronger neurotransmission signaling while the low cognitive group may have problems in brain/neuron development due to genetic variations.
14
Bai Y, Gong Y, Bai J, Liu J, Deng HW, Calhoun V, Wang YP. A Joint Analysis of Multi-Paradigm fMRI Data With Its Application to Cognitive Study. IEEE TRANSACTIONS ON MEDICAL IMAGING 2021; 40:951-962. [PMID: 33284749 PMCID: PMC7925383 DOI: 10.1109/tmi.2020.3042786] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/06/2023]
Abstract
With the development of neuroimaging techniques, a growing amount of multi-modal brain imaging data is being collected, facilitating comprehensive study of the brain. In this paper, we jointly analyzed functional magnetic resonance imaging (fMRI) data collected under different paradigms in order to understand the cognitive behaviors of an individual. To this end, we proposed a novel multi-view learning algorithm called structure-enforced collaborative regression (SCoRe) to extract co-expressed discriminative brain regions under the guidance of the anatomical structure of the brain. An advantage of SCoRe over its predecessor, collaborative regression (CoRe), lies in its incorporation of group structures in the brain imaging data, which makes the model biologically more meaningful. Results from real data analysis have confirmed that by incorporating prior knowledge of brain structure, SCoRe can deliver better prediction performance and is less sensitive to hyper-parameters than CoRe. After validation with simulation experiments, we applied SCoRe to fMRI data collected from the Philadelphia Neurodevelopmental Cohort and adopted the scores from the wide range achievement test (WRAT) to evaluate an individual's cognitive skills. We located 14 relevant brain regions that can efficiently predict WRAT scores, and these brain regions were further confirmed by other independent studies.
15
Wang M, Huang TZ, Fang J, Calhoun VD, Wang YP. Integration of Imaging (epi)Genomics Data for the Study of Schizophrenia Using Group Sparse Joint Nonnegative Matrix Factorization. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1671-1681. [PMID: 30762565 PMCID: PMC7781159 DOI: 10.1109/tcbb.2019.2899568] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 05/06/2023]
Abstract
Schizophrenia (SZ) is a complex disease. Single nucleotide polymorphism (SNP), brain activity measured by functional magnetic resonance imaging (fMRI) and DNA methylation are all important biomarkers that can be used for the study of SZ. To our knowledge, there has been little effort to combine these three datasets together. In this study, we propose a group sparse joint nonnegative matrix factorization (GSJNMF) model to integrate SNP, fMRI, and DNA methylation for the identification of multi-dimensional modules associated with SZ, which can be used to study regulatory mechanisms underlying SZ at multiple levels. The proposed GSJNMF model projects multiple types of data onto a common feature space, in which heterogeneous variables with large coefficients on the same projected bases are used to identify multi-dimensional modules. We also incorporate group structure information available from each dataset. The genomic factors in such modules have significant correlations or functional associations with several brain activities. At the end, we have applied the method to the analysis of real data collected from the Mind Clinical Imaging Consortium (MCIC) for the study of SZ and identified significant biomarkers. These biomarkers were further used to discover genes and corresponding brain regions, which were confirmed to be significantly associated with SZ.
Affiliation(s)
- Min Wang
  - School of Mathematical Sciences/Research Center for Image and Vision Computing, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
  - School of Information Technology, Jiangxi University of Finance and Economics, Nanchang, Jiangxi, 330013, China
- Ting-Zhu Huang
  - School of Mathematical Sciences/Research Center for Image and Vision Computing, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
- Jian Fang
  - Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118, USA
- Vince D. Calhoun
  - The Mind Research Network, University of New Mexico, NM 87131, USA
- Yu-Ping Wang
  - Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118, USA
  - Corresponding author.
|
16
|
Zhuang X, Yang Z, Cordes D. A technical review of canonical correlation analysis for neuroscience applications. Hum Brain Mapp 2020; 41:3807-3833. [PMID: 32592530 PMCID: PMC7416047 DOI: 10.1002/hbm.25090] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 04/27/2020] [Accepted: 05/23/2020] [Indexed: 12/11/2022]
Abstract
Collecting comprehensive data sets of the same subject has become a standard in neuroscience research, and uncovering multivariate relationships among collected data sets has gained significant attention in recent years. Canonical correlation analysis (CCA) is one of the powerful multivariate tools to jointly investigate relationships among multiple data sets, which can uncover disease or environmental effects in various modalities simultaneously and characterize changes during development, aging, and disease progression comprehensively. In the past 10 years, although an increasing number of studies have utilized CCA in multivariate analysis, simple conventional CCA dominates these applications. Multiple CCA-variant techniques have been proposed to improve the model performance; however, their complicated multivariate formulations and little-known capabilities have delayed their wide application. Therefore, in this study, a comprehensive review of CCA and its variant techniques is provided. Detailed technical formulation with analytical and numerical solutions, current applications in neuroscience research, and advantages and limitations of each CCA-related technique are discussed. Finally, a general guideline on how to select the most appropriate CCA-related technique based on the properties of available data sets and particularly targeted neuroscience questions is provided.
Affiliation(s)
- Xiaowei Zhuang
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, Nevada, USA
- Zhengshi Yang
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, Nevada, USA
- Dietmar Cordes
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, Nevada, USA
  - University of Colorado, Boulder, Colorado, USA
  - Department of Brain Health, University of Nevada, Las Vegas, Nevada, USA
|
17
|
Wheater ENW, Stoye DQ, Cox SR, Wardlaw JM, Drake AJ, Bastin ME, Boardman JP. DNA methylation and brain structure and function across the life course: A systematic review. Neurosci Biobehav Rev 2020; 113:133-156. [PMID: 32151655 PMCID: PMC7237884 DOI: 10.1016/j.neubiorev.2020.03.007] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 08/12/2019] [Revised: 03/03/2020] [Accepted: 03/05/2020] [Indexed: 01/01/2023]
Abstract
MRI has enhanced our capacity to understand variations in brain structure and function conferred by the genome. We identified 60 studies that report associations between DNA methylation (DNAm) and human brain structure/function. Forty-three studies measured candidate loci DNAm; seventeen measured epigenome-wide DNAm. MRI features included region-of-interest and whole-brain structural, diffusion and functional imaging features. The studies report DNAm-MRI associations for: neurodevelopment and neurodevelopmental disorders; major depression and suicidality; alcohol use disorder; schizophrenia and psychosis; ageing, stroke, ataxia and neurodegeneration; post-traumatic stress disorder; and socio-emotional processing. Consistency between MRI features and differential DNAm is modest. Sources of bias: variable inclusion of comparator groups; different surrogate tissues used; variation in DNAm measurement methods; lack of control for genotype and cell-type composition; and variations in image processing. Knowledge of MRI features associated with differential DNAm may improve understanding of the role of DNAm in brain health and disease, but caution is required because conventions for linking DNAm and MRI data are not established, and clinical and methodological heterogeneity in existing literature is substantial.
Affiliation(s)
- Emily N W Wheater
  - Medical Research Council Centre for Reproductive Health, University of Edinburgh, United Kingdom
- David Q Stoye
  - Medical Research Council Centre for Reproductive Health, University of Edinburgh, United Kingdom
- Simon R Cox
  - Department of Psychology, University of Edinburgh, United Kingdom
- Joanna M Wardlaw
  - Centre for Clinical Brain Sciences, University of Edinburgh, United Kingdom
- Amanda J Drake
  - University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, United Kingdom
- Mark E Bastin
  - Centre for Clinical Brain Sciences, University of Edinburgh, United Kingdom
- James P Boardman
  - Medical Research Council Centre for Reproductive Health, University of Edinburgh, United Kingdom; Centre for Clinical Brain Sciences, University of Edinburgh, United Kingdom.
|
18
|
Zhang Y, Peng P, Ju Y, Li G, Calhoun VD, Wang YP. Canonical Correlation Analysis of Imaging Genetics Data Based on Statistical Independence and Structural Sparsity. IEEE J Biomed Health Inform 2020; 24:2621-2629. [PMID: 32071012 DOI: 10.1109/jbhi.2020.2972581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Current developments of neuroimaging and genetics promote an integrative and comprehensive study of schizophrenia. However, it is still difficult to explore how gene mutations are related to brain abnormalities due to the high dimension but low sample size of these data. Conventional approaches reduce the dimension of each dataset separately and then calculate the correlation, but ignore the effects of the response variables and the structure of the data. To improve the identification of risk genes and abnormal brain regions in schizophrenia, in this paper, we propose a novel method called Independence and Structural sparsity Canonical Correlation Analysis (ISCCA). ISCCA combines independent component analysis (ICA) and Canonical Correlation Analysis (CCA) to reduce the collinear effects, and also incorporates the graph structure of the data into the model to improve the accuracy of feature selection. The results from simulation studies demonstrate its higher accuracy in discovering correlations compared with other competing methods. Moreover, by applying ISCCA to a real imaging genetics dataset collected by the Mind Clinical Imaging Consortium (MCIC), we identify a set of distinct gene-ROI interactions, which are verified to be both statistically and biologically significant.
|
19
|
Bi XA, Hu X, Wu H, Wang Y. Multimodal Data Analysis of Alzheimer's Disease Based on Clustering Evolutionary Random Forest. IEEE J Biomed Health Inform 2020; 24:2973-2983. [PMID: 32071013 DOI: 10.1109/jbhi.2020.2973324] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Indexed: 11/06/2022]
Abstract
Alzheimer's disease (AD) has become a severe medical challenge. Advances in technologies produced high-dimensional data of different modalities including functional magnetic resonance imaging (fMRI) and single nucleotide polymorphism (SNP). Understanding the complex association patterns among these heterogeneous and complementary data is of benefit to the diagnosis and prevention of AD. In this paper, we apply the appropriate correlation analysis method to detect the relationships between brain regions and genes, and propose "brain region-gene pairs" as the multimodal features of the sample. In addition, we put forward a novel data analysis method from technology aspect, cluster evolutionary random forest (CERF), which is suitable for "brain region-gene pairs". The idea of clustering evolution is introduced to improve the generalization performance of random forest which is constructed by randomly selecting samples and sample features. Through hierarchical clustering of decision trees in random forest, the decision trees with higher similarity are clustered into one class, and the decision trees with the best performance are retained to enhance the diversity between decision trees. Furthermore, based on CERF, we integrate feature construction, feature selection and sample classification to find the optimal combination of different methods, and design a comprehensive diagnostic framework for AD. The framework is validated by the samples with both fMRI and SNP data from ADNI. The results show that we can effectively identify AD patients and discover some brain regions and genes associated with AD significantly based on this framework. These findings are conducive to the clinical treatment and prevention of AD.
|
20
|
Li G, Han D, Wang C, Hu W, Calhoun VD, Wang YP. Application of deep canonically correlated sparse autoencoder for the classification of schizophrenia. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2020; 183:105073. [PMID: 31525548 DOI: 10.1016/j.cmpb.2019.105073] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 05/22/2019] [Revised: 08/27/2019] [Accepted: 09/06/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND AND OBJECTIVE: Imaging genetics has been widely used to help diagnose and treat mental illness, e.g., schizophrenia, by combining magnetic resonance imaging of the brain and genomic information for comprehensive and systematic analysis. As a result, utilizing the correlation between magnetic resonance imaging of the brain and genomic information is becoming an important challenge. METHODS: In this paper, the joint analysis of single nucleotide polymorphisms and functional magnetic resonance imaging is conducted for comprehensive study of schizophrenia. We developed a deep canonically correlated sparse autoencoder to classify schizophrenia patients from healthy controls, which can address the limitation of many existing methods such as canonical correlation analysis, deep canonical correlation analysis and sparse autoencoder. RESULTS: The proposed deep canonically correlated sparse autoencoder can not only use complex nonlinear transformation and dimension reduction, but also achieve more accurate classifications. Our experiments showed the proposed method achieved an accuracy of 95.65% for SNP data sets and an accuracy of 80.53% for fMRI data sets. CONCLUSIONS: Experiments demonstrated higher accuracy of using the proposed method over other conventional models when classifying schizophrenia patients and healthy controls.
Affiliation(s)
- Gang Li
  - School of Electronic and Control Engineering, Chang'an University, Xi'an 710064, Shaanxi, China; Key Laboratory of Road Construction Technology and Equipment of MOE, Chang'an University, China.
- Depeng Han
  - School of Electronic and Control Engineering, Chang'an University, Xi'an 710064, Shaanxi, China
- Chao Wang
  - School of Electronic and Control Engineering, Chang'an University, Xi'an 710064, Shaanxi, China
- Wenxing Hu
  - Biomedical Engineering Department, Tulane University, New Orleans, LA 70118, USA.
- Vince D Calhoun
  - Mind Research Network and Department of ECE, University of New Mexico, Albuquerque, NM 87106, USA.
- Yu-Ping Wang
  - Biomedical Engineering Department, Tulane University, New Orleans, LA 70118, USA.
|
21
|
Shen L, Thompson PM. Brain Imaging Genomics: Integrated Analysis and Machine Learning. PROCEEDINGS OF THE IEEE. INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS 2020; 108:125-162. [PMID: 31902950 PMCID: PMC6941751 DOI: 10.1109/jproc.2019.2947272] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Indexed: 06/01/2023]
Abstract
Brain imaging genomics is an emerging data science field, where integrated analysis of brain imaging and genomics data, often combined with other biomarker, clinical and environmental data, is performed to gain new insights into the phenotypic, genetic and molecular characteristics of the brain as well as their impact on normal and disordered brain function and behavior. It has enormous potential to contribute significantly to biomedical discoveries in brain science. Given the increasingly important role of statistical and machine learning in biomedicine and rapidly growing literature in brain imaging genomics, we provide an up-to-date and comprehensive review of statistical and machine learning methods for brain imaging genomics, as well as a practical discussion on method selection for various biomedical applications.
Affiliation(s)
- Li Shen
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Paul M Thompson
  - Imaging Genetics Center, Mark & Mary Stevens Institute for Neuroimaging & Informatics, Keck School of Medicine, University of Southern California, Los Angeles, CA 90232, USA
|
22
|
Deng J, Zeng W, Kong W, Shi Y, Mou X, Guo J. Multi-Constrained Joint Non-Negative Matrix Factorization With Application to Imaging Genomic Study of Lung Metastasis in Soft Tissue Sarcomas. IEEE Trans Biomed Eng 2019; 67:2110-2118. [PMID: 31751222 DOI: 10.1109/tbme.2019.2954989] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022]
Abstract
OBJECTIVE: Studying pathogenic mechanisms at the genetic level with imaging genetics methods can effectively reveal the association between histopathology and genetics. However, there is a lack of effective and accurate tools to establish association models from the macroscopic to the microscopic level. METHODS: The multi-constrained joint non-negative matrix factorization (MCJNMF) was developed for simultaneous integration of genomic data and image data to identify common modules related to disease. Two types of data matrices were projected onto a common feature space, in which heterogeneous variables with large coefficients in the same projected direction form a common module. Meanwhile, the correlation between original data features was integrated by using regularization constraints to improve the biological relevance. Sparsity constraints and orthogonal constraints were imposed on decomposition factors to minimize the redundancy between different bases and to reduce algorithm complexity. RESULTS: This algorithm was successfully applied to the module identification of lung metastasis in soft tissue sarcomas (STSs) by integrating FDG-PET image and DNA methylation data features. Multilevel analysis on the top extracted modules revealed that these modules were closely related to the lung metastasis. Particularly, several genes with diagnostic potential for lung metastasis can be discovered from high score modules. CONCLUSION: This method not only can be applied for the accurate identification of patterns related to pathogenic mechanism of diseases, but also has a significant implication for discovering protein biomarkers. SIGNIFICANCE: This method provides avenues for further studies of identifying complex association patterns of diseases according to different types of biological data.
|
23
|
Bai Y, Pascal Z, Hu W, Calhoun VD, Wang YP. Biomarker Identification Through Integrating fMRI and Epigenetics. IEEE Trans Biomed Eng 2019; 67:1186-1196. [PMID: 31395533 DOI: 10.1109/tbme.2019.2932895] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
OBJECTIVE: Integration of multiple datasets is a hot topic in many fields. When studying complex mental disorders, great effort has been dedicated to fusing genetic and brain imaging data. However, an increasing number of studies have pointed out the importance of epigenetic factors in the cause of psychiatric diseases. In this study, we endeavor to fill the gap by combining epigenetics (e.g., DNA methylation) with imaging data (e.g., fMRI) to identify biomarkers for schizophrenia (SZ). METHODS: We propose to combine linear regression with canonical correlation analysis (CCA) in a relaxed yet coupled manner to extract discriminative features for SZ that are co-expressed in the fMRI and DNA methylation data. RESULT: After validation through simulations, we applied our method to real imaging epigenetics data of 184 subjects from the Mental Illness and Neuroscience Discovery Clinical Imaging Consortium. After significance testing, we identified 14 brain regions and 44 cytosine-phosphate-guanine (CpG) sites. Average classification accuracy is [Formula: see text]. By linking the CpG sites to genes, we identified pathways Guanosine ribonucleotides de novo biosynthesis and Guanosine nucleotides de novo biosynthesis, and a GO term Perikaryon. CONCLUSION: This imaging epigenetics study has identified both brain regions and genes that are associated with neuron development and memory processing. These biomarkers contribute to a good understanding of the mechanism underlying SZ but are overlooked by previous imaging genetics studies. SIGNIFICANCE: Our study sheds light on the understanding and diagnosis of SZ with an imaging epigenetics approach, which is demonstrated to be effective in extracting novel biomarkers associated with SZ.
|