1. Kale M, Wankhede N, Pawar R, Ballal S, Kumawat R, Goswami M, Khalid M, Taksande B, Upaganlawar A, Umekar M, Kopalli SR, Koppula S. AI-driven innovations in Alzheimer's disease: Integrating early diagnosis, personalized treatment, and prognostic modelling. Ageing Res Rev 2024; 101:102497. [PMID: 39293530 DOI: 10.1016/j.arr.2024.102497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 08/14/2024] [Accepted: 09/04/2024] [Indexed: 09/20/2024]
Abstract
Alzheimer's disease (AD) presents a significant challenge in neurodegenerative research and clinical practice due to its complex etiology and progressive nature. The integration of artificial intelligence (AI) into the diagnosis, treatment, and prognostic modelling of AD holds promising potential to transform the landscape of dementia care. This review explores recent advancements in AI applications across various stages of AD management. In early diagnosis, AI-enhanced neuroimaging techniques, including MRI, PET, and CT scans, enable precise detection of AD biomarkers. Machine learning models analyze these images to identify patterns indicative of early cognitive decline. Additionally, AI algorithms are employed to detect genetic and proteomic biomarkers, facilitating early intervention. Cognitive and behavioral assessments have also benefited from AI, with tools that enhance the accuracy of neuropsychological tests and analyze speech and language patterns for early signs of dementia. Personalized treatment strategies have been revolutionized by AI-driven approaches. In drug discovery, virtual screening and drug repurposing, guided by predictive modelling, accelerate the identification of effective treatments. AI also aids in tailoring therapeutic interventions by predicting individual responses to treatments and monitoring patient progress, allowing for dynamic adjustment of care plans. Prognostic modelling, another critical area, utilizes AI to predict disease progression through longitudinal data analysis and risk prediction models. The integration of multi-modal data, combining clinical, genetic, and imaging information, enhances the accuracy of these predictions. Deep learning techniques are particularly effective in fusing diverse data types to uncover new insights into disease mechanisms and progression. 
Despite these advancements, challenges remain, including ethical considerations, data privacy, and the need for seamless integration of AI tools into clinical workflows. This review underscores the transformative potential of AI in AD management while highlighting areas for future research and development. By leveraging AI, the healthcare community can improve early diagnosis, personalize treatments, and predict disease outcomes more accurately, ultimately enhancing the quality of life for individuals with AD.
Affiliation(s)
- Mayur Kale
  - Smt. Kishoritai Bhoyar College of Pharmacy, Kamptee, Nagpur, Maharashtra 441002, India.
- Nitu Wankhede
  - Smt. Kishoritai Bhoyar College of Pharmacy, Kamptee, Nagpur, Maharashtra 441002, India.
- Rupali Pawar
  - Smt. Kishoritai Bhoyar College of Pharmacy, Kamptee, Nagpur, Maharashtra 441002, India.
- Suhas Ballal
  - Department of Chemistry and Biochemistry, School of Sciences, JAIN (Deemed to be University), Bangalore, Karnataka, India.
- Rohit Kumawat
  - Department of Neurology, National Institute of Medical Sciences, NIMS University, Jaipur, Rajasthan, India.
- Manish Goswami
  - Chandigarh Pharmacy College, Chandigarh Group of Colleges, Jhanjeri, Mohali, Punjab 140307, India.
- Mohammad Khalid
  - Department of Pharmacognosy, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Alkharj, Saudi Arabia.
- Brijesh Taksande
  - Smt. Kishoritai Bhoyar College of Pharmacy, Kamptee, Nagpur, Maharashtra 441002, India.
- Aman Upaganlawar
  - SNJB's Shriman Sureshdada Jain College of Pharmacy, Neminagar, Chandwad, Nashik, Maharashtra, India.
- Milind Umekar
  - Smt. Kishoritai Bhoyar College of Pharmacy, Kamptee, Nagpur, Maharashtra 441002, India.
- Spandana Rajendra Kopalli
  - Department of Bioscience and Biotechnology, Sejong University, Gwangjin-gu, Seoul 05006, Republic of Korea.
- Sushruta Koppula
  - College of Biomedical and Health Sciences, Konkuk University, Chungju-Si, Chungcheongbuk Do 27478, Republic of Korea.
2. Coburn RP, Graff-Radford J, Machulda MM, Schwarz CG, Lowe VJ, Jones DT, Jack CR, Josephs KA, Whitwell JL, Botha H. Baseline multimodal imaging to predict longitudinal clinical decline in atypical Alzheimer's disease. Cortex 2024; 180:18-34. [PMID: 39305720 PMCID: PMC11532010 DOI: 10.1016/j.cortex.2024.07.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 07/10/2024] [Accepted: 07/31/2024] [Indexed: 09/25/2024]
Abstract
There are recognized neuroimaging regions of interest in typical Alzheimer's disease which have been used to track disease progression and aid prognostication. However, there is a need for validated baseline imaging markers to predict clinical decline in atypical Alzheimer's disease. We aimed to address this need by producing models from baseline imaging features using penalized regression and evaluating their predictive performance on various clinical measures. Baseline multimodal imaging data, in combination with clinical testing data at two time points from 46 atypical Alzheimer's disease patients with a diagnosis of logopenic progressive aphasia (N = 24) or posterior cortical atrophy (N = 22), were used to generate our models. An additional 15 patients (logopenic progressive aphasia = 7, posterior cortical atrophy = 8), whose data were not used in our original analysis, were used to test our models. Patients underwent MRI, FDG-PET and Tau-PET imaging and a full neurologic battery at two time points. The Schaefer functional atlas was used to extract network-based and regional gray matter volume or PET SUVR values from baseline imaging. Penalized regression (Elastic Net) was used to create models to predict scores on testing at Time 2 while controlling for baseline performance, education, age, and sex. In addition, we created models using clinical or meta region of interest (meta-ROI) data to serve as comparisons. We found that the degree of baseline involvement on neuroimaging was predictive of future performance on cognitive testing, while controlling for the above measures, on all three imaging modalities. In many cases, model predictability improved with the addition of network-based neuroimaging data to clinical data. We also found that our network-based models outperformed the comparison models built from only clinical data or a meta-ROI score. 
Creating predictive models from imaging studies at a baseline time point that are agnostic to clinical diagnosis as we have described could prove invaluable in both the clinical and research setting, particularly in the development and implementation of future disease modifying therapies.
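The Elastic Net penalized regression named in this abstract can be illustrated with a minimal sketch. It assumes the common objective (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2) solved by cyclic coordinate descent; the function names, the parameters `lam` and `alpha`, and the toy data are illustrative assumptions, not the authors' pipeline, and the covariate adjustment described above is omitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Fit coefficients by cyclic coordinate descent.

    Assumes the columns of X and the target y are already centered,
    so no intercept is estimated. alpha=1 gives the lasso, alpha=0 ridge.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            col_scale = sum(X[i][j] ** 2 for i in range(n)) / n
            # L1 part thresholds the coefficient, L2 part inflates the denominator
            beta[j] = soft_threshold(rho, lam * alpha) / (col_scale + lam * (1 - alpha))
    return beta


# Hypothetical toy data: the outcome depends only on the first feature.
X = [[-1.0, 1.0], [0.0, -2.0], [1.0, 1.0]]
y = [-2.0, 0.0, 2.0]
print(elastic_net(X, y, lam=0.5, alpha=1.0))
```

With a nonzero penalty the first coefficient is shrunk below its unpenalized value of 2 and the irrelevant second coefficient is set exactly to zero, which is the property that makes this family of models attractive when there are many regional imaging features and few patients.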
Affiliation(s)
- Ryan P Coburn
  - Department of Neurology, Mayo Clinic (Rochester), Rochester, MN, USA.
- Mary M Machulda
  - Department of Psychiatry and Psychology, Mayo Clinic (Rochester), Rochester, MN, USA
- Val J Lowe
  - Department of Nuclear Medicine, Mayo Clinic (Rochester), Rochester, MN, USA
- David T Jones
  - Department of Neurology, Mayo Clinic (Rochester), Rochester, MN, USA; Department of Radiology, Mayo Clinic (Rochester), Rochester, MN, USA
- Clifford R Jack
  - Department of Radiology, Mayo Clinic (Rochester), Rochester, MN, USA
- Keith A Josephs
  - Department of Neurology, Mayo Clinic (Rochester), Rochester, MN, USA
- Hugo Botha
  - Department of Neurology, Mayo Clinic (Rochester), Rochester, MN, USA
3. Cheek CL, Lindner P, Grigorenko EL. Statistical and Machine Learning Analysis in Brain-Imaging Genetics: A Review of Methods. Behav Genet 2024; 54:233-251. [PMID: 38336922 DOI: 10.1007/s10519-024-10177-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 01/24/2024] [Indexed: 02/12/2024]
Abstract
Brain-imaging-genetic analysis is an emerging field of research that aims at aggregating data from neuroimaging modalities, which characterize brain structure or function, and genetic data, which capture the structure and function of the genome, to explain or predict normal (or abnormal) brain performance. Brain-imaging-genetic studies offer great potential for understanding complex brain-related diseases/disorders of genetic etiology. Still, a combined brain-wide genome-wide analysis is difficult to perform as typical datasets fuse multiple modalities, each with high dimensionality, unique correlational landscapes, and often low statistical signal-to-noise ratios. In this review, we outline the progress in brain-imaging-genetic methodologies starting from early massive univariate to current deep learning approaches, highlighting each approach's strengths and weaknesses and situating it within the field's development. We conclude by discussing selected remaining challenges and prospects for the field.
Affiliation(s)
- Connor L Cheek
  - Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA.
  - Department of Physics, University of Houston, Houston, TX, USA.
- Peggy Lindner
  - Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA
  - Department of Information Science Technology, University of Houston, Houston, TX, USA
- Elena L Grigorenko
  - Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA
  - Department of Psychology, University of Houston, Houston, TX, USA
  - Baylor College of Medicine, Houston, TX, USA
  - Sirius University of Science and Technology, Sochi, Russia
4. Zhang J, Ma Z, Yang Y, Guo L, Du L. Modeling genotype-protein interaction and correlation for Alzheimer's disease: a multi-omics imaging genetics study. Brief Bioinform 2024; 25:bbae038. [PMID: 38348747 PMCID: PMC10939371 DOI: 10.1093/bib/bbae038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 11/23/2023] [Accepted: 01/14/2024] [Indexed: 02/15/2024]
Abstract
Integrating and analyzing multiple omics data sets, including genomics, proteomics and radiomics, can significantly advance researchers' comprehensive understanding of Alzheimer's disease (AD). However, current methodologies primarily focus on the main effects of genetic variation and protein, overlooking non-additive effects such as genotype-protein interaction (GPI) and correlation patterns in brain imaging genetics studies. Importantly, these non-additive effects could contribute to intermediate imaging phenotypes, finally leading to disease occurrence. In general, the interaction between genetic variations and proteins, and their correlations are two distinct biological effects, and thus disentangling the two effects for heritable imaging phenotypes is of great interest and need. Unfortunately, this issue has been largely unexploited. In this paper, to fill this gap, we propose Multi-Task Genotype-Protein Interaction and Correlation disentangling method (MT-GPIC) to identify GPI and extract correlation patterns between them. To ensure stability and interpretability, we use novel and off-the-shelf penalties to identify meaningful genetic risk factors, as well as exploit the interconnectedness of different brain regions. Additionally, since computing GPI poses a high computational burden, we develop a fast optimization strategy for solving MT-GPIC, which is guaranteed to converge. Experimental results on the Alzheimer's Disease Neuroimaging Initiative data set show that MT-GPIC achieves higher correlation coefficients and classification accuracy than state-of-the-art methods. Moreover, our approach could effectively identify interpretable phenotype-related GPI and correlation patterns in high-dimensional omics data sets. These findings not only enhance the diagnostic accuracy but also contribute valuable insights into the underlying pathogenic mechanisms of AD.
Affiliation(s)
- Jin Zhang
  - Department of Intelligent Science and Technology, Northwestern Polytechnical University School of Automation, 127 Youyi Road, 710072 Shaanxi, China
- Zikang Ma
  - Department of Intelligent Science and Technology, Northwestern Polytechnical University School of Automation, 127 Youyi Road, 710072 Shaanxi, China
- Yan Yang
  - Department of Intelligent Science and Technology, Northwestern Polytechnical University School of Automation, 127 Youyi Road, 710072 Shaanxi, China
- Lei Guo
  - Department of Intelligent Science and Technology, Northwestern Polytechnical University School of Automation, 127 Youyi Road, 710072 Shaanxi, China
- Lei Du
  - Department of Intelligent Science and Technology, Northwestern Polytechnical University School of Automation, 127 Youyi Road, 710072 Shaanxi, China
5. Beaulac C, Wu S, Gibson E, Miranda MF, Cao J, Rocha L, Beg MF, Nathoo FS. Neuroimaging feature extraction using a neural network classifier for imaging genetics. BMC Bioinformatics 2023; 24:271. [PMID: 37391692 DOI: 10.1186/s12859-023-05394-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Accepted: 06/21/2023] [Indexed: 07/02/2023]
Abstract
BACKGROUND Dealing with the high dimension of both neuroimaging data and genetic data is a difficult problem in the association of genetic data to neuroimaging. In this article, we tackle the latter problem with an eye toward developing solutions that are relevant for disease prediction. Supported by a vast literature on the predictive power of neural networks, our proposed solution uses neural networks to extract from neuroimaging data features that are relevant for predicting Alzheimer's Disease (AD) for subsequent relation to genetics. The neuroimaging-genetic pipeline we propose is comprised of image processing, neuroimaging feature extraction and genetic association steps. We present a neural network classifier for extracting neuroimaging features that are related to the disease. The proposed method is data-driven and requires no expert advice or a priori selection of regions of interest. We further propose a multivariate regression with priors specified in the Bayesian framework that allows for group sparsity at multiple levels including SNPs and genes. RESULTS We find the features extracted with our proposed method are better predictors of AD than features used previously in the literature, suggesting that single nucleotide polymorphisms (SNPs) related to the features extracted by our proposed method are also more relevant for AD. Our neuroimaging-genetic pipeline led to the identification of some overlapping and, more importantly, some different SNPs when compared to those identified with previously used features. CONCLUSIONS The pipeline we propose combines machine learning and statistical methods to benefit from the strong predictive performance of blackbox models to extract relevant features while preserving the interpretation provided by Bayesian models for genetic association. 
Finally, we argue in favour of using automatic feature extraction, such as the method we propose, in addition to ROI or voxelwise analysis to find potentially novel disease-relevant SNPs that may not be detected when using ROIs or voxels alone.
Affiliation(s)
- Cédric Beaulac
  - School of Engineering Science, Simon Fraser University, Burnaby, Canada.
  - Department of Mathematics and Statistics, University of Victoria, Victoria, Canada.
- Sidi Wu
  - Department of Statistics and Actuarial Sciences, Simon Fraser University, Burnaby, Canada
- Erin Gibson
  - School of Engineering Science, Simon Fraser University, Burnaby, Canada
- Michelle F Miranda
  - Department of Mathematics and Statistics, University of Victoria, Victoria, Canada
- Jiguo Cao
  - Department of Statistics and Actuarial Sciences, Simon Fraser University, Burnaby, Canada
- Leno Rocha
  - Department of Mathematics and Statistics, University of Victoria, Victoria, Canada
- Mirza Faisal Beg
  - School of Engineering Science, Simon Fraser University, Burnaby, Canada
- Farouk S Nathoo
  - Department of Mathematics and Statistics, University of Victoria, Victoria, Canada
6. Bao J, Chang C, Zhang Q, Saykin AJ, Shen L, Long Q. Integrative analysis of multi-omics and imaging data with incorporation of biological information via structural Bayesian factor analysis. Brief Bioinform 2023; 24:bbad073. [PMID: 36882008 PMCID: PMC10387302 DOI: 10.1093/bib/bbad073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 01/14/2023] [Accepted: 02/10/2023] [Indexed: 03/09/2023]
Abstract
MOTIVATION With the rapid development of modern technologies, massive data are available for the systematic study of Alzheimer's disease (AD). Though many existing AD studies mainly focus on single-modality omics data, multi-omics datasets can provide a more comprehensive understanding of AD. To bridge this gap, we proposed a novel structural Bayesian factor analysis framework (SBFA) to extract the information shared by multi-omics data through the aggregation of genotyping data, gene expression data, neuroimaging phenotypes and prior biological network knowledge. Our approach can extract common information shared by different modalities and encourage biologically related features to be selected, guiding future AD research in a biologically meaningful way. METHOD Our SBFA model decomposes the mean parameters of the data into a sparse factor loading matrix and a factor matrix, where the factor matrix represents the common information extracted from multi-omics and imaging data. Our framework is designed to incorporate prior biological network information. Our simulation study demonstrated that our proposed SBFA framework could achieve the best performance compared with the other state-of-the-art factor-analysis-based integrative analysis methods. RESULTS We apply our proposed SBFA model together with several state-of-the-art factor analysis models to extract the latent common information from genotyping, gene expression and brain imaging data simultaneously from the ADNI biobank database. The latent information is then used to predict the functional activities questionnaire score, an important measurement for diagnosis of AD quantifying subjects' abilities in daily life. Our SBFA model shows the best prediction performance compared with the other factor analysis models. AVAILABILITY Code is publicly available at https://github.com/JingxuanBao/SBFA. CONTACT qlong@upenn.edu.
Affiliation(s)
- Jingxuan Bao
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA
- Changgee Chang
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA
- Qiyiwen Zhang
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA
- Andrew J Saykin
  - Department of Radiology and Imaging Sciences, Indiana University, Indianapolis, 46202, IN, USA
- Li Shen
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA
- Qi Long
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA
7. Huang W, Tan K, Zhang Z, Hu J, Dong S. A Review of Fusion Methods for Omics and Imaging Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:74-93. [PMID: 35044920 DOI: 10.1109/tcbb.2022.3143900] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 05/04/2023]
Abstract
The development of omics data and biomedical images has greatly advanced the progress of precision medicine in diagnosis, treatment, and prognosis. The fusion of omics and imaging data, i.e., omics-imaging fusion, offers a new strategy for understanding complex diseases. However, due to a variety of issues such as the limited number of samples, high dimensionality of features, and heterogeneity of different data types, efficiently learning complementary or associated discriminative fusion information from omics and imaging data remains a challenge. Recently, numerous machine learning methods have been proposed to alleviate these problems. In this review, from the perspective of fusion levels and fusion methods, we first provide an overview of preprocessing and feature extraction methods for omics and imaging data, and comprehensively analyze and summarize the basic forms and variations of commonly used and newly emerging fusion methods, along with their advantages, disadvantages and the applicable scope. We then describe public datasets and compare experimental results of various fusion methods on the ADNI and TCGA datasets. Finally, we discuss future prospects and highlight remaining challenges in the field.
8. Kline A, Wang H, Li Y, Dennis S, Hutch M, Xu Z, Wang F, Cheng F, Luo Y. Multimodal machine learning in precision health: A scoping review. NPJ Digit Med 2022; 5:171. [PMID: 36344814 PMCID: PMC9640667 DOI: 10.1038/s41746-022-00712-8] [Citation(s) in RCA: 83] [Impact Index Per Article: 41.5] [Received: 06/20/2022] [Accepted: 10/14/2022] [Indexed: 11/09/2022]
Abstract
Machine learning is frequently being leveraged to tackle problems in the health sector, including utilization for clinical decision-support. Its use has historically been focused on single modal data. Attempts to improve prediction and mimic the multimodal nature of clinical expert decision-making have been met in the biomedical field of machine learning by fusing disparate data. This review was conducted to summarize the current studies in this field and identify topics ripe for future research. We conducted this review in accordance with the PRISMA extension for Scoping Reviews to characterize multi-modal data fusion in health. Search strings were established and used in databases: PubMed, Google Scholar, and IEEEXplore from 2011 to 2021. A final set of 128 articles was included in the analysis. The most common health areas utilizing multi-modal methods were neurology and oncology. Early fusion was the most common data merging strategy. Notably, there was an improvement in predictive performance when using data fusion. Lacking from the papers were clear clinical deployment strategies, FDA-approval, and analysis of how using multimodal approaches from diverse sub-populations may improve biases and healthcare disparities. These findings provide a summary on multimodal data fusion as applied to health diagnosis/prognosis problems. Few papers compared the outputs of a multimodal approach with a unimodal prediction. However, those that did achieved an average increase of 6.4% in predictive accuracy. Multi-modal machine learning, while more robust in its estimations over unimodal methods, has drawbacks in its scalability and the time-consuming nature of information concatenation.
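To make the "early fusion" strategy in this abstract concrete, the sketch below contrasts it with late fusion, under the usual reading that early fusion concatenates per-modality features before a single model sees them, while late fusion combines the outputs of separate unimodal models. The function names and the example values are hypothetical; the review does not prescribe an implementation.

```python
def early_fusion(modalities):
    """Concatenate per-modality feature vectors into one input vector."""
    fused = []
    for features in modalities:
        fused.extend(features)
    return fused


def late_fusion(probabilities):
    """Average the class probabilities produced by separate unimodal models."""
    return sum(probabilities) / len(probabilities)


# Hypothetical patient: two imaging features, one clinical value, two genotype flags.
imaging = [0.8, 0.1]
clinical = [63.0]
genetic = [1.0, 0.0]

print(early_fusion([imaging, clinical, genetic]))  # one vector for a single model
print(late_fusion([0.25, 0.75]))                   # one score from two unimodal models
```

The concatenation step is also where the scalability concern raised at the end of the abstract shows up: the fused vector grows with every added modality, and every patient needs all modalities present.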
Affiliation(s)
- Adrienne Kline
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Hanyin Wang
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Yikuan Li
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Saya Dennis
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Meghan Hutch
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Zhenxing Xu
  - Department of Population Health Sciences, Cornell University, New York, 10065, NY, USA
- Fei Wang
  - Department of Population Health Sciences, Cornell University, New York, 10065, NY, USA
- Feixiong Cheng
  - Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, 44195, OH, USA
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA.
9. Guo X, Jiang Y, Zou Q. Structured Sparse Regularized TSK Fuzzy System for predicting therapeutic peptides. Brief Bioinform 2022; 23:6570018. [PMID: 35438149 DOI: 10.1093/bib/bbac135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 03/19/2022] [Accepted: 03/22/2022] [Indexed: 11/13/2022]
Abstract
Therapeutic peptides act on the skeletal system, digestive system and blood system, have antibacterial properties and help relieve inflammation. To reduce the resource consumption of wet experiments, many computational methods have been developed to identify therapeutic peptides. Because traditional machine learning methods deal poorly with feature noise, we propose a novel therapeutic peptide identification method called Structured Sparse Regularized Takagi-Sugeno-Kang Fuzzy System on Within-Class Scatter (SSR-TSK-FS-WCS). Our method achieves good performance on multiple therapeutic peptides and UCI datasets.
Affiliation(s)
- Xiaoyi Guo
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, P.R. China
- Yizhang Jiang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, P.R. China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, P.R. China
10. Mirabnahrazam G, Ma D, Lee S, Popuri K, Lee H, Cao J, Wang L, Galvin JE, Beg MF. Machine Learning Based Multimodal Neuroimaging Genomics Dementia Score for Predicting Future Conversion to Alzheimer's Disease. J Alzheimers Dis 2022; 87:1345-1365. [PMID: 35466939 PMCID: PMC9195128 DOI: 10.3233/jad-220021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
BACKGROUND The increasing availability of databases containing both magnetic resonance imaging (MRI) and genetic data allows researchers to utilize multimodal data to better understand the characteristics of dementia of Alzheimer's type (DAT). OBJECTIVE The goal of this study was to develop and analyze novel biomarkers that can help predict the development and progression of DAT. METHODS We used feature selection and an ensemble learning classifier to develop an image/genotype-based DAT score that represents a subject's likelihood of developing DAT in the future. Three feature types were used: MRI only, genetic only, and combined multimodal data. We used a novel data stratification method to better represent different stages of DAT. Using a pre-defined 0.5 threshold on DAT scores, we predicted whether a subject would develop DAT in the future. RESULTS Our results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database showed that dementia scores using genetic data could better predict future DAT progression for currently normal control subjects (Accuracy = 0.857) compared to MRI (Accuracy = 0.143), while MRI could better characterize subjects with stable mild cognitive impairment (Accuracy = 0.614) compared to genetics (Accuracy = 0.356). Combining MRI and genetic data showed improved classification performance in the remaining stratified groups. CONCLUSION MRI and genetic data can contribute to DAT prediction in different ways. MRI data reflects anatomical changes in the brain, while genetic data can detect the risk of DAT progression prior to the symptomatic onset. Combining information from multimodal data appropriately can improve prediction performance.
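The per-group accuracies quoted in this abstract follow from thresholding each subject's DAT score at 0.5 and scoring the predictions separately within each stratum. A minimal sketch of that bookkeeping is given below. The record layout, the stratum labels, the strict `>` comparison at the threshold, and the toy scores are all assumptions for illustration; the abstract does not specify them, and the numbers are not taken from ADNI.

```python
from collections import defaultdict


def stratified_accuracy(records, threshold=0.5):
    """Compute accuracy per stratum from (stratum, score, converted) records.

    A subject is predicted to convert when score > threshold
    (whether the boundary is inclusive is an assumption here).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for stratum, score, converted in records:
        predicted = score > threshold
        hits[stratum] += int(predicted == converted)
        totals[stratum] += 1
    return {stratum: hits[stratum] / totals[stratum] for stratum in totals}


# Hypothetical subjects: progressive normal controls (pNC) and stable MCI (sMCI).
records = [
    ("pNC", 0.9, True),    # correctly flagged as a future converter
    ("pNC", 0.2, True),    # missed converter
    ("sMCI", 0.1, False),  # correctly predicted to stay stable
]
print(stratified_accuracy(records))
```

Reporting accuracy per stratum rather than pooled is what lets the abstract show that genetic and MRI features succeed in different subgroups.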
Affiliation(s)
- Da Ma
  - School of Engineering, Simon Fraser University, Burnaby, BC, Canada
  - School of Medicine, Wake Forest University, Winston-Salem, NC, USA
- Sieun Lee
  - School of Engineering, Simon Fraser University, Burnaby, BC, Canada
  - Mental Health & Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, United Kingdom
- Karteek Popuri
  - School of Engineering, Simon Fraser University, Burnaby, BC, Canada
- Hyunwoo Lee
  - Division of Neurology, Department of Medicine, University of British Columbia, Vancouver, BC, Canada
- Jiguo Cao
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
- Lei Wang
  - Psychiatry and Behavioral Health, Ohio State University Wexner Medical Center, Columbus, OH, USA
- James E Galvin
  - Comprehensive Center for Brain Health, Department of Neurology, University of Miami Miller School of Medicine, Miami, FL, USA
- Mirza Faisal Beg
  - School of Engineering, Simon Fraser University, Burnaby, BC, Canada
11. Classification of Initial Stages of Alzheimer’s Disease through Pet Neuroimaging Modality and Deep Learning: Quantifying the Impact of Image Filtering Approaches. Mathematics 2021. [DOI: 10.3390/math9233101] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Alzheimer’s disease (AD) is a leading health concern affecting the elderly population worldwide. It is defined by amyloid plaques, neurofibrillary tangles, and neuronal loss. Neuroimaging modalities such as positron emission tomography (PET) and magnetic resonance imaging are routinely used in clinical settings to monitor the alterations in the brain during the course of progression of AD. Deep learning techniques such as convolutional neural networks (CNNs) have found numerous applications in healthcare and other technologies. Together with neuroimaging modalities, they can be deployed in clinical settings to learn effective representations of data for different tasks such as classification, segmentation, detection, etc. Image filtering methods are instrumental in making images viable for image processing operations and have found numerous applications in image-processing-related tasks. In this work, we deployed 3D-CNNs to learn effective representations of PET modality data to quantify the impact of different image filtering approaches. We used box filtering, median filtering, Gaussian filtering, and modified Gaussian filtering approaches to preprocess the images and use them for classification using 3D-CNN architecture. Our findings suggest that these approaches are nearly equivalent and have no distinct advantage over one another. For the multiclass classification task between normal control (NC), mild cognitive impairment (MCI), and AD classes, the 3D-CNN architecture trained using Gaussian-filtered data performed the best. For binary classification between NC and MCI classes, the 3D-CNN architecture trained using median-filtered data performed the best, while, for binary classification between AD and MCI classes, the 3D-CNN architecture trained using modified Gaussian-filtered data performed the best. Finally, for binary classification between AD and NC classes, the 3D-CNN architecture trained using box-filtered data performed the best.
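The box, median, and Gaussian filters compared in this abstract differ only in how each output value is computed from a local neighborhood. The sketch below shows that on a one-dimensional signal to keep it short; the study applies the filters to 3D PET volumes, and the window radius, sigma, edge handling (edge values replicated), and toy signal here are assumptions. The "modified Gaussian" variant is not specified in the abstract and is not reproduced.

```python
import math
from statistics import median


def _windows(signal, radius):
    """Yield the neighborhood of each sample, replicating the edge values."""
    last = len(signal) - 1
    for i in range(len(signal)):
        yield [signal[min(max(i + d, 0), last)] for d in range(-radius, radius + 1)]


def box_filter(signal, radius=1):
    """Replace each sample with the plain mean of its neighborhood."""
    return [sum(w) / len(w) for w in _windows(signal, radius)]


def median_filter(signal, radius=1):
    """Replace each sample with the median of its neighborhood."""
    return [median(w) for w in _windows(signal, radius)]


def gaussian_filter(signal, radius=1, sigma=1.0):
    """Replace each sample with a Gaussian-weighted mean of its neighborhood."""
    weights = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [sum(wt * v for wt, v in zip(weights, w)) / total for w in _windows(signal, radius)]


# Hypothetical intensity profile with one noisy spike.
noisy = [0.0, 0.0, 9.0, 0.0, 0.0]
print(box_filter(noisy))       # spike is spread over its neighbors
print(median_filter(noisy))    # spike is removed entirely
print(gaussian_filter(noisy))  # spike is smoothed, center weighted most
```

On this toy input the median filter discards the isolated spike while the two averaging filters smear it, which is the kind of behavioral difference the study set out to quantify downstream of a 3D-CNN.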
|
12
|
Huang Y, Li L, Jiang J. Radiogenomics of Alzheimer's disease: exploring gene related metabolic imaging markers. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2021; 2021:5772-5775. [PMID: 34892431 DOI: 10.1109/embc46164.2021.9630690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Alzheimer's disease (AD) is the most prevalent neurodegenerative disorder and is considerably determined by genetic factors. Fluorodeoxyglucose positron emission tomography (FDG-PET) can reflect the functional state of glucose metabolism in the brain, and radiomic features of FDG-PET are considered important imaging markers in AD. However, radiomic features are not highly interpretable and, in particular, lack an explanation of the underlying biological and molecular mechanisms. Therefore, this study used radiogenomics analysis to explore prognostic metabolic imaging markers by associating radiomics features and genetic data. In the study, we used the FDG-PET images and genotype data of 389 subjects (Cohort B) enrolled in the ADNI, including 109 AD, 134 healthy controls (HCs), 72 MCI non-converters (MCI-nc) and 74 MCI converters (MCI-c). Firstly, we performed a genome-wide association study (GWAS) on the genotype data of 998 subjects (Cohort A), including 632 AD and 366 HCs after quality control (QC) steps, to identify susceptibility loci as the gene features. Secondly, radiomics features were extracted from the preprocessed PET images. Thirdly, a two-sample t-test, rank sum test and F-score were used as the feature selection step to select effective radiomic features. Fourthly, a support vector machine (SVM) was used to test the ability of the radiomic features to classify HCs, MCI and AD patients. Finally, we performed the Spearman correlation analysis on the genetic data and radiomic features. As a result, we identified rs429358 and rs2075650 as genome-wide significant signals. The radiomic approach achieved good classification abilities. Two prognostic FDG-PET radiomic features in the amygdala were proven to be correlated with the genetic data.
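Two of the steps listed in this abstract, t-test-based feature ranking and Spearman correlation between a genotype and a radiomic feature, can be sketched as follows. This is a hedged illustration only: the Welch form of the t statistic, the function names, and the toy numbers are assumptions, and the rank sum test, F-score, and SVM steps are not shown.

```python
import numpy as np


def welch_t(group_a, group_b):
    """Per-feature Welch t statistic (rows are subjects, columns are features)."""
    var_a = group_a.var(axis=0, ddof=1)
    var_b = group_b.var(axis=0, ddof=1)
    standard_error = np.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (group_a.mean(axis=0) - group_b.mean(axis=0)) / standard_error


def top_features(group_a, group_b, k):
    """Indices of the k features with the largest absolute t statistic."""
    return np.argsort(-np.abs(welch_t(group_a, group_b)))[:k]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    return (np.bincount(inverse, weights=ranks) / counts)[inverse]


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


# Toy radiomic features for two diagnostic groups (hypothetical values).
patients = np.array([[0.0, 5.0], [1.0, 5.1], [0.5, 4.9]])
controls = np.array([[10.0, 5.0], [11.0, 5.1], [10.5, 4.9]])
print("selected feature indices:", top_features(patients, controls, k=1))

# Toy minor-allele counts (0/1/2) against one radiomic feature.
genotype = np.array([0, 0, 1, 1, 2, 2])
feature = np.array([0.1, 0.2, 0.5, 0.6, 0.9, 1.0])
print("Spearman rho:", round(float(spearman(genotype, feature)), 3))
```

Average ranks matter here because genotype counts take only three values, so ties are the norm rather than the exception.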
|
13
|
Sparse robust multiview feature selection via adaptive-weighting strategy. INT J MACH LEARN CYB 2021. [DOI: 10.1007/s13042-021-01453-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
14
|
Vilor-Tejedor N, Garrido-Martín D, Rodriguez-Fernandez B, Lamballais S, Guigó R, Gispert JD. Multivariate Analysis and Modelling of multiple Brain endOphenotypes: Let's MAMBO! Comput Struct Biotechnol J 2021; 19:5800-5810. [PMID: 34765095 PMCID: PMC8567328 DOI: 10.1016/j.csbj.2021.10.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2021] [Revised: 10/08/2021] [Accepted: 10/12/2021] [Indexed: 12/01/2022] Open
Abstract
Imaging genetic studies aim to test how genetic information influences brain structure and function by combining neuroimaging-based brain features and genetic data from the same individual. Most studies focus on individual correlation and association tests between genetic variants and a single measurement of the brain. Despite the great success of univariate approaches, given the capacity of neuroimaging methods to provide a multiplicity of cerebral phenotypes, the development and application of multivariate methods become crucial. In this article, we review novel methods and strategies focused on the analysis of multiple phenotypes and genetic data. We also discuss relevant aspects of multi-trait modelling in the context of neuroimaging data.
Affiliation(s)
- Natalia Vilor-Tejedor
  - Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
  - Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
  - Universitat Pompeu Fabra, Barcelona, Spain
- Diego Garrido-Martín
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Sander Lamballais
  - Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
- Roderic Guigó
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Juan Domingo Gispert
  - Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
  - Centro de Investigación Biomédica en Red Bioingeniería, Biomateriales y Nanomedicina, Madrid, Spain
|
15
|
|
16
|
Sheng J, Wang L, Cheng H, Zhang Q, Zhou R, Shi Y. Strategies for multivariate analyses of imaging genetics study in Alzheimer's disease. Neurosci Lett 2021; 762:136147. [PMID: 34332030 DOI: 10.1016/j.neulet.2021.136147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2020] [Revised: 03/27/2021] [Accepted: 07/26/2021] [Indexed: 11/16/2022]
Abstract
Alzheimer's disease (AD) is an incurable neurodegenerative disease primarily affecting the elderly population. Early diagnosis of AD is critical for the management of this disease. Imaging genetics examines the influence of genetic variants (i.e., single nucleotide polymorphisms (SNPs)) on brain structure and function, and many novel approaches of imaging genetics are proposed for studying AD. We review and synthesize the Alzheimer's Disease Neuroimaging Initiative (ADNI) genetic associations with quantitative disease endophenotypes including structural and functional neuroimaging, diffusion tensor imaging (DTI), positron emission tomography (PET), and fluid biomarker assays. In this review, we survey recent publications using neuroimaging and genetic data of AD, with a focus on methods capturing multivariate effects accommodating the large number of variables from both imaging data and genetic data. We review methods focused on bridging the imaging and genetic data by establishing genotype-phenotype association, including sparse canonical correlation analysis, parallel independent component analysis, sparse reduced rank regression, sparse partial least squares, genome-wide association study, and so on. The broad availability and wide scope of ADNI genetic and phenotypic data has advanced our understanding of the genetic basis of AD and has nominated novel targets for future pharmaceutical therapy and biomarker development.
Affiliation(s)
- Jinhua Sheng
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China.
- Luyun Wang
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China; College of Information Engineering, Hangzhou Vocational & Technical College, Hangzhou, Zhejiang 310018, China
- Hu Cheng
  - Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, USA
- Rougang Zhou
  - Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China; School of Mechanical Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Mstar Technologies Inc., Hangzhou, Zhejiang 310018, China
- Yuchen Shi
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China
|
17
|
Ke F, Kong W, Wang S. Identifying Imaging Genetics Biomarkers of Alzheimer's Disease by Multi-Task Sparse Canonical Correlation Analysis and Regression. Front Genet 2021; 12:706986. [PMID: 34422007 PMCID: PMC8375409 DOI: 10.3389/fgene.2021.706986] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2021] [Accepted: 07/19/2021] [Indexed: 11/29/2022] Open
Abstract
Imaging genetics combines neuroimaging and genetics to assess the relationships between genetic variants and changes in brain structure and metabolism. Sparse canonical correlation analysis (SCCA) models are well-known tools for identifying meaningful biomarkers in imaging genetics. However, most SCCA models incorporate only diagnostic status information, which poses challenges for finding disease-specific biomarkers. In this study, we proposed a multi-task sparse canonical correlation analysis and regression (MT-SCCAR) model to reveal disease-specific associations between single nucleotide polymorphisms and quantitative traits derived from multi-modal neuroimaging data in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort. MT-SCCAR uses complementary information carried by multiple-perspective cognitive scores and encourages group sparsity on genetic variants. In contrast with two other multi-modal SCCA models, MT-SCCAR embedded more accurate neuropsychological assessment information through linear regression and enhanced the correlation coefficients, leading to increased identification of high-risk brain regions. Furthermore, MT-SCCAR identified primary genetic risk factors for Alzheimer’s disease (AD), including rs429358, and found some association patterns between genetic variants and brain regions. Thus, MT-SCCAR contributes to deciphering genetic risk factors of brain structural and metabolic changes by identifying potential risk biomarkers.
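For readers unfamiliar with the baseline this abstract builds on, a plain two-view sparse CCA can be sketched with alternating soft-thresholding. This is a simplified stand-in, not the MT-SCCAR model: it has no multi-task structure, no regression term, and no group sparsity, it treats within-view covariance as identity, and the function names, penalty values, and synthetic data are all assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink each entry towards zero by t (proximal operator of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def _unit(v):
    """Scale to unit length, leaving an all-zero vector unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def sparse_cca(X, Y, lam_u=0.1, lam_v=0.1, n_iter=50):
    """Alternate sparse updates of the two canonical weight vectors."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    cross_cov = X.T @ Y / len(X)
    v = np.linalg.svd(cross_cov)[2][0]  # start from the leading right singular vector
    for _ in range(n_iter):
        u = _unit(soft_threshold(cross_cov @ v, lam_u))
        v = _unit(soft_threshold(cross_cov.T @ u, lam_v))
    corr = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, corr


# Synthetic data: column 0 of each view shares one latent signal (hypothetical).
rng = np.random.default_rng(0)
latent = rng.standard_normal(200)
genetic_view = rng.standard_normal((200, 6))   # continuous stand-in for SNP data
imaging_view = rng.standard_normal((200, 5))   # stand-in for imaging traits
genetic_view[:, 0] = latent + 0.1 * rng.standard_normal(200)
imaging_view[:, 0] = latent + 0.1 * rng.standard_normal(200)

u, v, corr = sparse_cca(genetic_view, imaging_view)
print("largest genetic weight at index", int(np.argmax(np.abs(u))))
print("largest imaging weight at index", int(np.argmax(np.abs(v))))
```

The soft-threshold step is what drives most weights to exactly zero, which is why the surviving nonzero weights can be read as candidate biomarkers.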
Affiliation(s)
- Fengchun Ke
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Wei Kong
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Shuaiqun Wang
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
|
18
|
Lu P, Colliot O. Multilevel Survival Modeling with Structured Penalties for Disease Prediction from Imaging Genetics data. IEEE J Biomed Health Inform 2021; 26:798-808. [PMID: 34329174 DOI: 10.1109/jbhi.2021.3100918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
This paper introduces a framework for disease prediction from multimodal genetic and imaging data. We propose a multilevel survival model which allows predicting the time of occurrence of a future disease state in patients initially exhibiting mild symptoms. This new multilevel setting allows modeling the interactions between genetic and imaging variables. This is in contrast with classical additive models which treat all modalities in the same manner and can result in undesirable elimination of specific modalities when their contributions are unbalanced. Moreover, the use of a survival model allows overcoming the limitations of previous approaches based on classification which consider a fixed time frame. Furthermore, we introduce specific penalties taking into account the structure of the different types of data, such as a group lasso penalty over the genetic modality and an L2-penalty over the imaging modality. Finally, we propose a fast optimization algorithm, based on a proximal gradient method. The approach was applied to the prediction of Alzheimer's disease (AD) among patients with mild cognitive impairment (MCI) based on genetic (single nucleotide polymorphisms - SNP) and imaging (anatomical MRI measures) data from the ADNI database. The experiments demonstrate the effectiveness of the method for predicting the time of conversion to AD. It revealed how genetic variants and brain imaging alterations interact in the prediction of future disease status. The approach is generic and could potentially be useful for the prediction of other diseases.
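The combination of a group lasso penalty on genetic variables, an L2 penalty on imaging variables, and a proximal gradient solver can be sketched as below. This is an illustration under stated assumptions, not the paper's model: a least-squares loss replaces the multilevel survival likelihood, the L2 penalty is taken to be a squared (ridge) term, and the function names, group layout, and synthetic data are hypothetical.

```python
import numpy as np


def group_soft_threshold(w, groups, threshold):
    """Proximal operator of the group lasso: shrink each group's norm by threshold."""
    out = w.copy()
    for idx in groups:
        norm = np.linalg.norm(w[idx])
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        out[idx] = scale * w[idx]
    return out


def proximal_gradient(X_gen, X_img, y, groups, lam_group=0.1, lam_ridge=0.1, n_iter=500):
    """Minimise a least-squares loss (assumed stand-in) with structured penalties."""
    X = np.hstack([X_gen, X_img])
    n, p_gen = X_gen.shape
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L bounds the Lipschitz constant of the smooth part.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + lam_ridge)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        grad[p_gen:] += lam_ridge * w[p_gen:]      # smooth ridge term, imaging block
        w = w - step * grad                        # gradient step on the smooth part
        w[:p_gen] = group_soft_threshold(w[:p_gen], groups, step * lam_group)
    return w


# Synthetic example: two SNP groups of two variables each, plus two imaging variables.
rng = np.random.default_rng(0)
X_gen = rng.standard_normal((20, 4))
X_img = rng.standard_normal((20, 2))
y = X_img @ np.array([1.0, -1.0])
groups = [[0, 1], [2, 3]]

w = proximal_gradient(X_gen, X_img, y, groups, lam_group=0.5, lam_ridge=0.01)
print("genetic weights:", np.round(w[:4], 3))
print("imaging weights:", np.round(w[4:], 3))
```

The non-smooth group penalty is handled only through its proximal operator, which removes or keeps whole groups of genetic variables together, while the ridge term simply shrinks imaging weights without zeroing them.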
|
19
|
Huang ZA, Zhu Z, Yau CH, Tan KC. Identifying Autism Spectrum Disorder From Resting-State fMRI Using Deep Belief Network. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:2847-2861. [PMID: 32692687 DOI: 10.1109/tnnls.2020.3007943] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
With the increasing prevalence of autism spectrum disorder (ASD), it is important to identify ASD patients for effective treatment and intervention, especially in early childhood. Neuroimaging techniques have been used to characterize the complex biomarkers based on the functional connectivity anomalies in ASD. However, the diagnosis of ASD still adopts the symptom-based criteria by clinical observation. The existing computational models tend to achieve unreliable diagnostic classification on the large-scale aggregated data sets. In this work, we propose a novel graph-based classification model using the deep belief network (DBN) and the Autism Brain Imaging Data Exchange (ABIDE) database, which is a worldwide multisite functional and structural brain imaging data aggregation. The remarkable connectivity features are selected through a graph extension of K-nearest neighbors and then refined by a restricted path-based depth-first search algorithm. Thanks to the feature reduction, lower computational complexity could contribute to the shortening of the training time. The automatic hyperparameter-tuning technique is introduced to optimize the hyperparameters of the DBN by exploring the potential parameter space. The simulation experiments demonstrate the superior performance of our model, which is 6.4% higher than the best result reported on the ABIDE database. We also propose to use the data augmentation and the oversampling technique to further identify the possible subtypes within ASD. The interpretability of our model enables the identification of the most remarkable autistic neural correlation patterns from the data-driven outcomes.
|
20
|
Song Y, Ge S, Cao J, Wang L, Nathoo FS. A Bayesian spatial model for imaging genetics. Biometrics 2021; 78:742-753. [PMID: 33765325 DOI: 10.1111/biom.13460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2019] [Revised: 02/08/2021] [Accepted: 02/24/2021] [Indexed: 11/29/2022]
Abstract
We develop a Bayesian bivariate spatial model for multivariate regression analysis applicable to studies examining the influence of genetic variation on brain structure. Our model is motivated by an imaging genetics study of the Alzheimer's Disease Neuroimaging Initiative (ADNI), where the objective is to examine the association between images of volumetric and cortical thickness values summarizing the structure of the brain as measured by magnetic resonance imaging (MRI) and a set of 486 single nucleotide polymorphisms (SNPs) from 33 Alzheimer's disease (AD) candidate genes obtained from 632 subjects. A bivariate spatial process model is developed to accommodate the correlation structures typically seen in structural brain imaging data. First, we allow for spatial correlation on a graph structure in the imaging phenotypes obtained from a neighborhood matrix for measures on the same hemisphere of the brain. Second, we allow for correlation in the same measures obtained from different hemispheres (left/right) of the brain. We develop a mean-field variational Bayes algorithm and a Gibbs sampling algorithm to fit the model. We also incorporate Bayesian false discovery rate (FDR) procedures to select SNPs. We implement the methodology in a new release of the R package bgsmtr. We show that the new spatial model demonstrates superior performance over a standard model in our application. Data used in the preparation of this article were obtained from the ADNI database (https://adni.loni.usc.edu).
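The abstract does not spell out its Bayesian FDR procedure, so the following sketch shows one common form only, as an assumption: given a posterior probability of association for each SNP, select the largest set whose average posterior probability of being null stays at or below a target level. The function name and the example probabilities are hypothetical, and this is not the bgsmtr implementation.

```python
import numpy as np


def bayesian_fdr_select(posterior_prob, alpha=0.05):
    """Return indices of the largest set with estimated Bayesian FDR <= alpha."""
    posterior_prob = np.asarray(posterior_prob, dtype=float)
    order = np.argsort(-posterior_prob, kind="stable")   # most probable first
    prob_null = 1.0 - posterior_prob[order]              # chance each call is false
    running_fdr = np.cumsum(prob_null) / np.arange(1, len(order) + 1)
    passing = np.nonzero(running_fdr <= alpha)[0]
    n_selected = passing[-1] + 1 if passing.size else 0
    return np.sort(order[:n_selected])


# Hypothetical posterior association probabilities for four SNPs.
probabilities = [0.99, 0.95, 0.60, 0.20]
selected = bayesian_fdr_select(probabilities, alpha=0.05)
print("selected SNP indices:", selected.tolist())
```

Because the SNPs are sorted by decreasing posterior probability, the running average of the null probabilities never decreases, so the last index that passes the threshold gives the largest admissible set.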
Affiliation(s)
- Yin Song
  - Department of Mathematics and Statistics, University of Victoria, British Columbia, Canada
- Shufei Ge
  - Institute of Mathematical Sciences, ShanghaiTech University, Shanghai, China
- Jiguo Cao
  - Statistics and Actuarial Science, Simon Fraser University, British Columbia, Canada
- Liangliang Wang
  - Statistics and Actuarial Science, Simon Fraser University, British Columbia, Canada
- Farouk S Nathoo
  - Department of Mathematics and Statistics, University of Victoria, British Columbia, Canada
|
21
|
Du L, Zhang J, Liu F, Wang H, Guo L, Han J, Alzheimer's Disease Neuroimaging Initiative. Identifying associations among genomic, proteomic and imaging biomarkers via adaptive sparse multi-view canonical correlation analysis. Med Image Anal 2021; 70:102003. [PMID: 33735757 DOI: 10.1016/j.media.2021.102003] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2020] [Revised: 02/10/2021] [Accepted: 02/15/2021] [Indexed: 12/13/2022]
Abstract
To uncover the genetic underpinnings of brain disorders, brain imaging genomics usually jointly analyzes genetic variations and imaging measurements. Meanwhile, other biomarkers such as proteomic expressions can also carry valuable complementary information. Therefore, it is necessary yet challenging to investigate the underlying relationships among genetic variations, proteomic expressions, and neuroimaging measurements, which stands a chance of yielding new insights into the pathogenesis of brain disorders. Given multiple types of biomarkers, using sparse multi-view canonical correlation analysis (SMCCA) and its variants to identify the multi-way associations is straightforward. However, due to the gradient domination issue caused by the naive fusion of multiple SCCA objectives, SMCCA is suboptimal. In this paper, we proposed two adaptive SMCCA (AdaSMCCA) methods, i.e. the robustness-aware AdaSMCCA and the uncertainty-aware AdaSMCCA, to analyze the complicated associations among genetic, proteomic, and neuroimaging biomarkers. We also imposed a data-driven feature grouping penalty on the genetic data with the aim of uncovering the joint inheritance of neighboring genetic variations. An efficient optimization algorithm, which is guaranteed to converge, was provided. Using two state-of-the-art SMCCA methods as benchmarks, we evaluated robustness-aware AdaSMCCA and uncertainty-aware AdaSMCCA on both synthetic data and real neuroimaging, proteomics, and genetic data. Both proposed methods obtained higher associations and cleaner canonical weight profiles than comparison methods, indicating their promising capability for association identification and feature selection. In addition, the subsequent analysis showed that the identified biomarkers were related to Alzheimer's disease, demonstrating the power of our methods in identifying multi-way bi-multivariate associations among multiple heterogeneous biomarkers.
Affiliation(s)
- Lei Du
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China.
- Jin Zhang
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Fang Liu
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Huiai Wang
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Lei Guo
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Junwei Han
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
|
22
|
Venugopalan J, Tong L, Hassanzadeh HR, Wang MD. Multimodal deep learning models for early detection of Alzheimer's disease stage. Sci Rep 2021; 11:3254. [PMID: 33547343 PMCID: PMC7864942 DOI: 10.1038/s41598-020-74399-w] [Citation(s) in RCA: 128] [Impact Index Per Article: 42.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2018] [Accepted: 01/22/2020] [Indexed: 02/06/2023] Open
Abstract
Most current Alzheimer's disease (AD) and mild cognitive impairment (MCI) studies use a single data modality to make predictions such as AD stages. The fusion of multiple data modalities can provide a holistic view of AD staging analysis. Thus, we use deep learning (DL) to integrally analyze imaging (magnetic resonance imaging (MRI)), genetic (single nucleotide polymorphisms (SNPs)), and clinical test data to classify patients into AD, MCI, and controls (CN). We use stacked denoising auto-encoders to extract features from clinical and genetic data, and use 3D-convolutional neural networks (CNNs) for imaging data. We also develop a novel data interpretation method to identify top-performing features learned by the deep-models with clustering and perturbation analysis. Using the Alzheimer's disease neuroimaging initiative (ADNI) dataset, we demonstrate that deep models outperform shallow models, including support vector machines, decision trees, random forests, and k-nearest neighbors. In addition, we demonstrate that integrating multi-modality data outperforms single modality models in terms of accuracy, precision, recall, and mean F1 scores. Our models have identified hippocampus, amygdala brain areas, and the Rey Auditory Verbal Learning Test (RAVLT) as top distinguished features, which are consistent with the known AD literature.
Affiliation(s)
- Janani Venugopalan
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Li Tong
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Hamid Reza Hassanzadeh
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- May D Wang
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA.
  - School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA.
  - Winship Cancer Institute, Parker H. Petit Institute for Bioengineering and Biosciences, Institute of People and Technology, Georgia Institute of Technology and Emory University, Atlanta, GA, USA.
|
23
|
Shirbandi K, Khalafi M, Mirza-Aghazadeh-Attari M, Tahmasbi M, Kiani Shahvandi H, Javanmardi P, Rahim F. Accuracy of deep learning model-assisted amyloid positron emission tomography scan in predicting Alzheimer's disease: A Systematic Review and meta-analysis. INFORMATICS IN MEDICINE UNLOCKED 2021. [DOI: 10.1016/j.imu.2021.100710] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
|
24
|
Peng B, Yao X, Risacher SL, Saykin AJ, Shen L, Ning X. Cognitive biomarker prioritization in Alzheimer's Disease using brain morphometric data. BMC Med Inform Decis Mak 2020; 20:319. [PMID: 33267852 PMCID: PMC7709267 DOI: 10.1186/s12911-020-01339-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2020] [Accepted: 11/17/2020] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Cognitive assessments represent the most common clinical routine for the diagnosis of Alzheimer's Disease (AD). Given a large number of cognitive assessment tools and time-limited office visits, it is important to determine a proper set of cognitive tests for different subjects. Most current studies create guidelines of cognitive test selection for a targeted population, but they are not customized for each individual subject. In this manuscript, we develop a machine learning paradigm enabling personalized prioritization of cognitive assessments. METHOD We adapt a newly developed learning-to-rank approach [Formula: see text] to implement our paradigm. This method learns the latent scoring function that pushes the most effective cognitive assessments onto the top of the prioritization list. We also extend [Formula: see text] to better separate the most effective cognitive assessments and the less effective ones. RESULTS Our empirical study on the ADNI data shows that the proposed paradigm outperforms the state-of-the-art baselines on identifying and prioritizing individual-specific cognitive biomarkers. We conduct experiments in cross validation and leave-out validation settings. In the two settings, our paradigm significantly outperforms the best baselines with improvement as much as 22.1% and 19.7%, respectively, on prioritizing cognitive features. CONCLUSIONS The proposed paradigm achieves superior performance on prioritizing cognitive biomarkers. The cognitive biomarkers prioritized on top have great potential to facilitate personalized diagnosis, disease subtyping, and ultimately precision medicine in AD.
Affiliation(s)
- Bo Peng
  - The Ohio State University, Columbus, USA
- Xiaohui Yao
  - University of Pennsylvania, Philadelphia, USA
- Li Shen
  - University of Pennsylvania, Philadelphia, USA
- Xia Ning
  - The Ohio State University, Columbus, USA
- for the ADNI
  - The Ohio State University, Columbus, USA
  - University of Pennsylvania, Philadelphia, USA
  - Indiana University, Indianapolis, USA
|
25
|
Mishra R, Li B. The Application of Artificial Intelligence in the Genetic Study of Alzheimer's Disease. Aging Dis 2020; 11:1567-1584. [PMID: 33269107 PMCID: PMC7673858 DOI: 10.14336/ad.2020.0312] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2020] [Accepted: 03/12/2020] [Indexed: 12/13/2022] Open
Abstract
Alzheimer's disease (AD) is a neurodegenerative disease in which genetic factors contribute approximately 70% of etiological effects. Studies have found many significant genetic and environmental factors, but the pathogenesis of AD is still unclear. With the application of microarray and next-generation sequencing technologies, research using genetic data has shown explosive growth. In addition to conventional statistical methods for the processing of these data, artificial intelligence (AI) technology shows obvious advantages in analyzing such complex projects. This article first briefly reviews the application of AI technology in medicine and the current status of genetic research in AD. Then, a comprehensive review is focused on the application of AI in the genetic research of AD, including the diagnosis and prognosis of AD based on genetic data, the analysis of genetic variation, gene expression profile, gene-gene interaction in AD, and genetic analysis of AD based on a knowledge base. Although many studies have yielded some meaningful results, they are still in a preliminary stage. The main shortcomings include the limitations of the databases, failing to take advantage of AI to conduct a systematic biology analysis of multilevel databases, and lack of a theoretical framework for the analysis results. Finally, we outline directions for future development. It is crucial to develop high-quality, comprehensive, large-sample data-sharing resources; a multi-level systems biology AI analysis strategy is one of the development directions, and computational creativity may play a role in theory model building, verification, and designing new intervention protocols for AD.
Affiliation(s)
- Rohan Mishra
  - Washington Institute for Health Sciences, Arlington, VA 22203, USA
- Bin Li
  - Washington Institute for Health Sciences, Arlington, VA 22203, USA
  - Georgetown University Medical Center, Washington D.C. 20057, USA
|
26
|
Du L, Liu F, Liu K, Yao X, Risacher SL, Han J, Guo L, Saykin AJ, Shen L. Identifying diagnosis-specific genotype-phenotype associations via joint multitask sparse canonical correlation analysis and classification. Bioinformatics 2020; 36:i371-i379. [PMID: 32657360 PMCID: PMC7355274 DOI: 10.1093/bioinformatics/btaa434] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023] Open
Abstract
MOTIVATION Brain imaging genetics studies the complex associations between genotypic data such as single nucleotide polymorphisms (SNPs) and imaging quantitative traits (QTs). Neurodegenerative disorders usually exhibit diversity and heterogeneity, so different diagnostic groups might carry distinct imaging QTs, SNPs and interactions between them. Sparse canonical correlation analysis (SCCA) is widely used to identify bi-multivariate genotype-phenotype associations. However, most existing SCCA methods are unsupervised, leading to an inability to identify diagnosis-specific genotype-phenotype associations. RESULTS In this article, we propose a new joint multitask learning method, named MT-SCCALR, which absorbs the merits of both SCCA and logistic regression. MT-SCCALR learns genotype-phenotype associations of multiple tasks jointly, with each task focusing on identifying one diagnosis-specific genotype-phenotype pattern. Meanwhile, MT-SCCALR not only selects relevant SNPs and imaging QTs for each diagnostic group alone, but also allows the selection of those shared by multiple diagnostic groups. We derive an efficient optimization algorithm whose convergence to a local optimum is guaranteed. Compared with two state-of-the-art methods, MT-SCCALR yields better or similar canonical correlation coefficients and classification performances. In addition, it yields much more discriminative canonical weight patterns than its competitors. This demonstrates the power and capability of MT-SCCALR in identifying diagnostically heterogeneous genotype-phenotype patterns, which would be helpful to understand the pathophysiology of brain disorders. AVAILABILITY AND IMPLEMENTATION The software is publicly available at https://github.com/dulei323/MTSCCALR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Lei Du
- Department of intelligent science and technology, School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
| | - Fang Liu
- Department of intelligent science and technology, School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
| | - Kefei Liu
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Xiaohui Yao
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Shannon L Risacher
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Junwei Han
- Department of intelligent science and technology, School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
| | - Lei Guo
- Department of intelligent science and technology, School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
| | - Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Li Shen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | | |
|
27
|
Brand L, Nichols K, Wang H, Shen L, Huang H. Joint Multi-Modal Longitudinal Regression and Classification for Alzheimer's Disease Prediction. IEEE TRANSACTIONS ON MEDICAL IMAGING 2020; 39:1845-1855. [PMID: 31841400 PMCID: PMC7380699 DOI: 10.1109/tmi.2019.2958943] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 05/18/2023]
Abstract
Alzheimer's disease (AD) is a serious neurodegenerative condition that affects millions of individuals across the world. As the average age of individuals in the United States and the world increases, the prevalence of AD will continue to grow. To address this public health problem, the research community has developed computational approaches to sift through various aspects of clinical data and uncover their insights, among which one of the most challenging problems is to determine the biological mechanisms that cause AD to develop. To study this problem, in this paper we present a novel Joint Multi-Modal Longitudinal Regression and Classification method and show how it can be used to identify the cognitive status of the participants in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort and the underlying biological mechanisms. By intelligently combining clinical data of various modalities (i.e., genetic information and brain scans) using a variety of regularizations that can identify AD-relevant biomarkers, we perform the regression and classification tasks simultaneously. Because the proposed objective is a non-smooth optimization problem that is difficult to solve in general, we derive an efficient iterative algorithm and rigorously prove its convergence. To validate our new method in predicting the cognitive scores of patients and their clinical diagnosis, we conduct comprehensive experiments on the ADNI cohort. Our promising results demonstrate the benefits and flexibility of the proposed method. We anticipate that our new method is of interest to clinical communities beyond AD research and have open-sourced the code of our method online. The code package for the proposed Joint Multi-Modal Longitudinal Regression and Classification model has been made publicly available online at https://github.com/minds-mines/jmmlrc.
|
28
|
Kong D, An B, Zhang J, Zhu H. L2RM: Low-rank Linear Regression Models for High-dimensional Matrix Responses. J Am Stat Assoc 2020; 115:403-424. [PMID: 33408427 PMCID: PMC7781207 DOI: 10.1080/01621459.2018.1555092] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/25/2017] [Revised: 11/11/2018] [Accepted: 11/26/2018] [Indexed: 10/27/2022]
Abstract
The aim of this paper is to develop a low-rank linear regression model (L2RM) to correlate a high-dimensional response matrix with a high dimensional vector of covariates when coefficient matrices have low-rank structures. We propose a fast and efficient screening procedure based on the spectral norm of each coefficient matrix in order to deal with the case when the number of covariates is extremely large. We develop an efficient estimation procedure based on the trace norm regularization, which explicitly imposes the low rank structure of coefficient matrices. When both the dimension of response matrix and that of covariate vector diverge at the exponential order of the sample size, we investigate the sure independence screening property under some mild conditions. We also systematically investigate some theoretical properties of our estimation procedure including estimation consistency, rank consistency and non-asymptotic error bound under some mild conditions. We further establish a theoretical guarantee for the overall solution of our two-step screening and estimation procedure. We examine the finite-sample performance of our screening and estimation methods using simulations and a large-scale imaging genetic dataset collected by the Philadelphia Neurodevelopmental Cohort (PNC) study.
Affiliation(s)
- Dehan Kong
- Department of Statistical Sciences, University of Toronto
| | - Baiguo An
- School of Statistics, Capital University of Economics and Business
| | - Jingwen Zhang
- Department of Biostatistics, University of North Carolina at Chapel Hill
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill
| |
|
29
|
Cheng J, Mei J, Zhong J, Men M, Zhong P. Robust Feature Selection with Feature Correlation via Sparse Multi-Label Learning. PATTERN RECOGNITION AND IMAGE ANALYSIS 2020. [DOI: 10.1134/s1054661820010034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
30
|
Brand L, Nichols K, Wang H, Huang H, Shen L. Predicting Longitudinal Outcomes of Alzheimer's Disease via a Tensor-Based Joint Classification and Regression Model. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020; 25:7-18. [PMID: 31797582 PMCID: PMC6948350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Alzheimer's disease (AD) is a serious neurodegenerative condition that affects millions of people across the world. Recently, machine learning models have been used to predict the progression of AD, although they frequently do not take advantage of the longitudinal and structural components associated with multi-modal medical data. To address this, we present a new algorithm that uses the multi-block alternating direction method of multipliers to optimize a novel objective that combines longitudinal clinical data of various modalities to simultaneously predict the cognitive scores and diagnoses of the participants in the Alzheimer's Disease Neuroimaging Initiative cohort. Our new model is designed to leverage the structure associated with clinical data that is not incorporated into standard machine learning optimization algorithms. This new approach shows state-of-the-art predictive performance and validates a collection of brain and genetic biomarkers that have been recorded previously in AD literature.
Affiliation(s)
- Lodewijk Brand
- Department of Computer Science, Colorado School of Mines, Golden, CO 80401, USA
| | - Kai Nichols
- Department of Computer Science, Colorado School of Mines, Golden, CO 80401, USA
| | - Hua Wang
- To whom correspondence should be addressed.
| | - Heng Huang
- Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15206, USA
| | - Li Shen
- Department of Biostatistics Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | | |
|
31
|
Shen L, Thompson PM. Brain Imaging Genomics: Integrated Analysis and Machine Learning. PROCEEDINGS OF THE IEEE. INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS 2020; 108:125-162. [PMID: 31902950 PMCID: PMC6941751 DOI: 10.1109/jproc.2019.2947272] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Indexed: 06/01/2023]
Abstract
Brain imaging genomics is an emerging data science field, where integrated analysis of brain imaging and genomics data, often combined with other biomarker, clinical and environmental data, is performed to gain new insights into the phenotypic, genetic and molecular characteristics of the brain as well as their impact on normal and disordered brain function and behavior. It has enormous potential to contribute significantly to biomedical discoveries in brain science. Given the increasingly important role of statistical and machine learning in biomedicine and rapidly growing literature in brain imaging genomics, we provide an up-to-date and comprehensive review of statistical and machine learning methods for brain imaging genomics, as well as a practical discussion on method selection for various biomedical applications.
Affiliation(s)
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Paul M Thompson
- Imaging Genetics Center, Mark & Mary Stevens Institute for Neuroimaging & Informatics, Keck School of Medicine, University of Southern California, Los Angeles, CA 90232, USA
| |
|
32
|
Wang J, Wang Q, Zhang H, Chen J, Wang S, Shen D. Sparse Multiview Task-Centralized Ensemble Learning for ASD Diagnosis Based on Age- and Sex-Related Functional Connectivity Patterns. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:3141-3154. [PMID: 29994137 PMCID: PMC6411442 DOI: 10.1109/tcyb.2018.2839693] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Indexed: 05/12/2023]
Abstract
Autism spectrum disorder (ASD) is an age- and sex-related neurodevelopmental disorder that alters the brain's functional connectivity (FC). The changes caused by ASD are associated with different age- and sex-related patterns in neuroimaging data. However, most contemporary computer-assisted ASD diagnosis methods ignore the aforementioned age-/sex-related patterns. In this paper, we propose a novel sparse multiview task-centralized (Sparse-MVTC) ensemble classification method for image-based ASD diagnosis. Specifically, with the age and sex information of each subject, we formulate the classification as a multitask learning problem, where each task corresponds to learning upon a specific age/sex group. We also extract multiview features per subject to better reveal the FC changes. Then, in Sparse-MVTC learning, we select a certain central task and treat the rest as auxiliary tasks. By considering both task-task and view-view relationships between the central task and each auxiliary task, we can learn better upon the entire dataset. Finally, by selecting the central task, in turn, we are able to derive multiple classifiers for each task/group. An ensemble strategy is further adopted, such that the final diagnosis can be integrated for each subject. Our comprehensive experiments on the ABIDE database demonstrate that our proposed Sparse-MVTC ensemble learning can significantly outperform the state-of-the-art classification methods for ASD diagnosis.
Affiliation(s)
- Jun Wang
- Department of Radiology and BRIC, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA, also with the School of Digital Media, Jiangnan University, Wuxi 214122, China, and also with the Jiangsu Key Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi 214122, China
| | - Qian Wang
- Institute for Medical Imaging Technology, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
| | - Han Zhang
- Department of Radiology and BRIC, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
| | - Jiawei Chen
- Department of Radiology and BRIC, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
| | - Shitong Wang
- School of Digital Media, Jiangnan University, Wuxi 214122, China, and also with the Jiangsu Key Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi 214122, China
| | - Dinggang Shen
- Department of Radiology and BRIC, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA, and also with the Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, South Korea
| |
|
33
|
Zhong J, Wang N, Lin Q, Zhong P. Weighted feature selection via discriminative sparse multi-view learning. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.04.024] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 10/26/2022]
|
34
|
|
35
|
Nathoo FS, Kong L, Zhu H. A Review of Statistical Methods in Imaging Genetics. CAN J STAT 2019; 47:108-131. [PMID: 31274952 PMCID: PMC6605768 DOI: 10.1002/cjs.11487] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/24/2017] [Accepted: 10/08/2018] [Indexed: 12/24/2022]
Abstract
With the rapid growth of modern technology, many biomedical studies are being conducted to collect massive datasets with volumes of multi-modality imaging, genetic, neurocognitive, and clinical information from increasingly large cohorts. Simultaneously extracting and integrating rich and diverse heterogeneous information in neuroimaging and/or genomics from these big datasets could transform our understanding of how genetic variants impact brain structure and function, cognitive function, and brain-related disease risk across the lifespan. Such understanding is critical for diagnosis, prevention, and treatment of numerous complex brain-related disorders (e.g., schizophrenia and Alzheimer's disease). However, the development of analytical methods for the joint analysis of both high-dimensional imaging phenotypes and high-dimensional genetic data, a big data squared (BD2) problem, presents major computational and theoretical challenges for existing analytical methods. Besides the high-dimensional nature of BD2, various neuroimaging measures often exhibit strong spatial smoothness and dependence and genetic markers may have a natural dependence structure arising from linkage disequilibrium. We review some recent developments of various statistical techniques for imaging genetics, including massive univariate and voxel-wise approaches, reduced rank regression, mixture models, and group sparse multi-task regression. By doing so, we hope that this review may encourage others in the statistical community to enter into this new and exciting field of research.
Affiliation(s)
- Farouk S Nathoo
- Department of Mathematics and Statistics, University of Victoria
| | - Linglong Kong
- Department of Mathematical and Statistical Sciences, University of Alberta
| | - Hongtu Zhu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center
| |
|
36
|
Shi C, Duan C, Gu Z, Tian Q, An G, Zhao R. Semi-supervised feature selection analysis with structured multi-view sparse regularization. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.10.027] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 10/28/2022]
|
37
|
Zhou T, Thung KH, Liu M, Shen D. Brain-Wide Genome-Wide Association Study for Alzheimer's Disease via Joint Projection Learning and Sparse Regression Model. IEEE Trans Biomed Eng 2019; 66:165-175. [PMID: 29993426 PMCID: PMC6342004 DOI: 10.1109/tbme.2018.2824725] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Indexed: 02/06/2023]
Abstract
A brain-wide and genome-wide association (BW-GWA) study is presented in this paper to identify the associations between the brain imaging phenotypes (i.e., regional volumetric measures) and the genetic variants [i.e., single nucleotide polymorphisms (SNPs)] in Alzheimer's disease (AD). The main challenges of this study include the data heterogeneity, complex phenotype-genotype associations, high-dimensional data (e.g., thousands of SNPs), and the existence of phenotype outliers. Previous BW-GWA studies, while addressing some of these challenges, did not consider the diagnostic label information in their formulations, thus limiting their clinical applicability. To address these issues, we present a novel joint projection and sparse regression model to discover the associations between the phenotypes and genotypes. Specifically, to alleviate the negative influence of data heterogeneity, we first map the genotypes into an intermediate imaging-phenotype-like space. Then, to better reveal the complex phenotype-genotype associations, we project both the mapped genotypes and the original imaging phenotypes into a diagnostic-label-guided joint feature space, where the intraclass projected points are constrained to be close to each other. In addition, we use ℓ2,1-norm minimization on both the regression loss function and the transformation coefficient matrices, to reduce the effect of phenotype outliers and also to encourage sparse feature selections of both the genotypes and phenotypes. We evaluate our method using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, and the results show that our proposed method outperforms several state-of-the-art methods in terms of the average root-mean-square error of genome-to-phenotype predictions. Besides, the associated SNPs and brain regions identified in this study have also been reported in previous AD-related studies, thus verifying the effectiveness and potential of our proposed method in AD pathogenesis study.
Affiliation(s)
- Tao Zhou
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
| | - Kim-Han Thung
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
| | - Mingxia Liu
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
| | - Dinggang Shen
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, NC 27599 USA, and also with the Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, Republic of Korea
| |
|
38
|
Zhou T, Thung KH, Zhu X, Shen D. Effective feature learning and fusion of multimodality data using stage-wise deep neural network for dementia diagnosis. Hum Brain Mapp 2018; 40:1001-1016. [PMID: 30381863 DOI: 10.1002/hbm.24428] [Citation(s) in RCA: 111] [Impact Index Per Article: 18.5] [Received: 10/05/2017] [Revised: 09/04/2018] [Accepted: 10/03/2018] [Indexed: 12/13/2022] Open
Abstract
In this article, the authors aim to maximally utilize multimodality neuroimaging and genetic data for identifying Alzheimer's disease (AD) and its prodromal status, Mild Cognitive Impairment (MCI), from normal aging subjects. Multimodality neuroimaging data such as MRI and PET provide valuable insights into brain abnormalities, while genetic data such as single nucleotide polymorphisms (SNPs) provide information about a patient's AD risk factors. When these data are used together, the accuracy of AD diagnosis may be improved. However, these data are heterogeneous (e.g., with different data distributions), and have different numbers of samples (e.g., far fewer PET samples than MRI or SNP samples). Thus, learning an effective model using these data is challenging. To this end, we present a novel three-stage deep feature learning and fusion framework, where a deep neural network is trained stage-wise. Each stage of the network learns feature representations for different combinations of modalities, via effective training using the maximum number of available samples. Specifically, in the first stage, we learn latent representations (i.e., high-level features) for each modality independently, so that the heterogeneity among modalities can be partially addressed, and high-level features from different modalities can be combined in the next stage. In the second stage, we learn joint latent features for each pairwise combination of modalities by using the high-level features learned from the first stage. In the third stage, we learn the diagnostic labels by fusing the learned joint latent features from the second stage. To further increase the number of samples during training, we also use data at multiple scanning time points for each training subject in the dataset. We evaluate the proposed framework using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset for AD diagnosis, and the experimental results show that the proposed framework outperforms other state-of-the-art methods.
Affiliation(s)
- Tao Zhou
- Department of Radiology and the Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, North Carolina
| | - Kim-Han Thung
- Department of Radiology and the Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, North Carolina
| | - Xiaofeng Zhu
- Department of Radiology and the Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, North Carolina
| | - Dinggang Shen
- Department of Radiology and the Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, North Carolina
- Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
| |
|
39
|
Bhaumik D, Jie F, Nordgren R, Bhaumik R, Sinha BK. A Mixed-Effects Model for Detecting Disrupted Connectivities in Heterogeneous Data. IEEE TRANSACTIONS ON MEDICAL IMAGING 2018; 37:2381-2389. [PMID: 29994089 DOI: 10.1109/tmi.2018.2821655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The human brain is an amazingly complex network. Aberrant activities in this network can lead to various neurological disorders such as multiple sclerosis, Parkinson's disease, Alzheimer's disease, and autism. Functional magnetic resonance imaging has emerged as an important tool to delineate the neural networks affected by such diseases, particularly autism. In this paper, we propose a special type of mixed-effects model together with an appropriate procedure for controlling false discoveries to detect disrupted connectivities for developing a neural network in whole brain studies. Results are illustrated with a large data set known as the Autism Brain Imaging Data Exchange (ABIDE), which includes 361 subjects from eight medical centers.
|
40
|
Zhu X, Zhang W, Fan Y. A Robust Reduced Rank Graph Regression Method for Neuroimaging Genetic Analysis. Neuroinformatics 2018; 16:351-361. [PMID: 29907892 PMCID: PMC6092232 DOI: 10.1007/s12021-018-9382-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 02/02/2023]
Abstract
To characterize associations between genetic and neuroimaging data, a variety of analytic methods have been proposed in neuroimaging genetic studies. These methods have achieved promising performance by taking into account inherent correlation in either the neuroimaging data or the genetic data alone. In this study, we propose a novel robust reduced rank graph regression based method in a linear regression framework by considering correlations inherent in neuroimaging data and genetic data jointly. Particularly, we model the association analysis problem in a reduced rank regression framework with the genetic data as a feature matrix and the neuroimaging data as a response matrix by jointly considering correlations among the neuroimaging data as well as correlations between the genetic data and the neuroimaging data. A new graph representation of genetic data is adopted to exploit their inherent correlations, in addition to robust loss functions for both the regression and the data representation tasks, and a square-root-operator applied to the robust loss functions for achieving adaptive sample weighting. The resulting optimization problem is solved using an iterative optimization method whose convergence has been theoretically proved. Experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset have demonstrated that our method could achieve competitive performance in terms of regression performance between brain structural measures and the Single Nucleotide Polymorphisms (SNPs), compared with state-of-the-art alternative methods.
Affiliation(s)
- Xiaofeng Zhu
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Weihong Zhang
- Peking Union Medical College Hospital, Beijing, 100730, China
| | - Yong Fan
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
|
41
|
Brand L, Wang H, Huang H, Risacher S, Saykin A, Shen L. Joint High-Order Multi-Task Feature Learning to Predict the Progression of Alzheimer's Disease. MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION : MICCAI ... INTERNATIONAL CONFERENCE ON MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION 2018; 11070:555-562. [PMID: 31179446 PMCID: PMC6553480 DOI: 10.1007/978-3-030-00928-1_63] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 02/08/2023]
Abstract
Alzheimer's disease (AD) is a degenerative brain disease that affects millions of people around the world. As populations in the United States and worldwide age, the prevalence of Alzheimer's disease will only increase. In turn, the social and financial costs of AD will create a difficult environment for many families and caregivers across the globe. By combining genetic information, brain scans, and clinical data, gathered over time through the Alzheimer's Disease Neuroimaging Initiative (ADNI), we propose a new Joint High-Order Multi-Modal Multi-Task Feature Learning method to predict the cognitive performance and diagnosis of patients with and without AD.
Affiliation(s)
- Lodewijk Brand
- Department of Computer Science, Colorado School of Mines, Golden, CO, USA
| | - Hua Wang
- Department of Computer Science, Colorado School of Mines, Golden, CO, USA
| | - Heng Huang
- Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA
| | - Shannon Risacher
- Department of Radiology and Imaging Sciences, Department of BioHealth Informatics, Indiana University, Indianapolis, IN, USA
| | - Andrew Saykin
- Department of Radiology and Imaging Sciences, Department of BioHealth Informatics, Indiana University, Indianapolis, IN, USA
| | - Li Shen
- Department of Radiology and Imaging Sciences, Department of BioHealth Informatics, Indiana University, Indianapolis, IN, USA
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
| |
|
42
|
Huo Z, Shen D, Huang H. Genotype-phenotype association study via new multi-task learning model. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018; 23:353-364. [PMID: 29218896 PMCID: PMC5890010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Research on the associations between genetic variations and imaging phenotypes is developing with advances in high-throughput genotyping and brain imaging techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify the quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. the ℓ2,1-norm, leading to better predictive results and insights into SNPs. However, group sparsity is not enough for representing the correlation between multiple tasks, and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We suppose that low-rank structure is also beneficial to uncover the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than the compared methods and presents new insights into SNPs.
Affiliation(s)
- Zhouyuan Huo
- Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15260, United States
| | | | | |
|
43
|
Gui J, Sun Z, Ji S, Tao D, Tan T. Feature Selection Based on Structured Sparsity: A Comprehensive Study. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:1490-1507. [PMID: 28287983 DOI: 10.1109/tnnls.2016.2551724] [Citation(s) in RCA: 122] [Impact Index Per Article: 17.4] [Indexed: 06/06/2023]
Abstract
Feature selection (FS) is an important component of many pattern recognition tasks. In these tasks, one is often confronted with very high-dimensional data. FS algorithms are designed to identify the relevant feature subset from the original features, which can facilitate subsequent analysis, such as clustering and classification. Structured sparsity-inducing feature selection (SSFS) methods have been widely studied in the last few years, and a number of algorithms have been proposed. However, there is no comprehensive study concerning the connections between different SSFS methods, and how they have evolved. In this paper, we attempt to provide a survey on various SSFS methods, including their motivations and mathematical representations. We then explore the relationship among different formulations and propose a taxonomy to elucidate their evolution. We group the existing SSFS methods into two categories, i.e., vector-based feature selection (feature selection based on lasso) and matrix-based feature selection (feature selection based on the ℓr,p-norm). Furthermore, FS has been combined with other machine learning algorithms for specific applications, such as multitask learning, multilabel learning, multiview learning, classification, and clustering. This paper not only compares the differences and commonalities of these methods based on regression and regularization strategies, but also provides practitioners working in related fields with useful guidelines on how to perform feature selection.
|
44
|
Wang J, Wang Q, Peng J, Nie D, Zhao F, Kim M, Zhang H, Wee C, Wang S, Shen D. Multi-task diagnosis for autism spectrum disorders using multi-modality features: A multi-center study. Hum Brain Mapp 2017; 38:3081-3097. [PMID: 28345269 PMCID: PMC5427005 DOI: 10.1002/hbm.23575] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 09/03/2016] [Revised: 12/22/2016] [Accepted: 03/08/2017] [Indexed: 01/11/2023] Open
Abstract
Autism spectrum disorder (ASD) is a neurodevelopment disease characterized by impairment of social interaction, language, behavior, and cognitive functions. Up to now, many imaging-based methods for ASD diagnosis have been developed. For example, one may extract abundant features from multi-modality images and then derive a discriminant function to map the selected features toward the disease label. A lot of recent works, however, are limited to single imaging centers. To this end, we propose a novel multi-modality multi-center classification (M3CC) method for ASD diagnosis. We treat the classification of each imaging center as one task. By introducing the task-task and modality-modality regularizations, we solve the classification for all imaging centers simultaneously. Meanwhile, the optimal feature selection and the modeling of the discriminant functions can be jointly conducted for highly accurate diagnosis. Besides, we also present an efficient iterative optimization solution to our formulated problem and further investigate its convergence. Our comprehensive experiments on the ABIDE database show that our proposed method can significantly improve the performance of ASD diagnosis, compared to the existing methods. Hum Brain Mapp 38:3081-3097, 2017. © 2017 Wiley Periodicals, Inc.
Affiliation(s)
- Jun Wang
  - School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Qian Wang
  - Med‐X Research Institute, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Jialin Peng
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Dong Nie
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Feng Zhao
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Minjeong Kim
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Han Zhang
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Chong‐Yaw Wee
  - Department of Biomedical Engineering, Faculty of Engineering, National University of Singapore, Singapore 119077
- Shitong Wang
  - School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
- Dinggang Shen
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
  - Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea
|
45
|
Wang X, Liu K, Yan J, Risacher SL, Saykin AJ, Shen L, Huang H. Predicting Interrelated Alzheimer's Disease Outcomes via New Self-Learned Structured Low-Rank Model. INFORMATION PROCESSING IN MEDICAL IMAGING : PROCEEDINGS OF THE ... CONFERENCE 2017; 10265:198-209. [PMID: 28848302 PMCID: PMC5571742 DOI: 10.1007/978-3-319-59050-9_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 03/26/2023]
Abstract
Alzheimer's disease (AD) is a progressive neurodegenerative disorder. As the prodromal stage of AD, Mild Cognitive Impairment (MCI) maintains a good chance of converting to AD. How to efficaciously detect this conversion from MCI to AD is significant in AD diagnosis. Unlike standard classification problems where the distributions of classes are independent, the AD outcomes are usually interrelated (their distributions have certain overlaps). Most existing methods fail to examine the interrelations among different classes, such as AD, MCI conversion and MCI non-conversion. In this paper, we proposed a novel self-learned low-rank structured learning model to automatically uncover the interrelations among different classes and utilized such interrelated structures to enhance classification. We conducted experiments on the ADNI cohort data. Empirical results demonstrated advantages of our model.
Affiliation(s)
- Xiaoqian Wang
  - Computer Science & Engineering, University of Texas at Arlington, TX, 76019, USA
- Kefei Liu
  - Radiology & Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
  - BioHealth, Indiana University School of Informatics & Computing, Indianapolis, IN, 46202, USA
- Jingwen Yan
  - Radiology & Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
  - BioHealth, Indiana University School of Informatics & Computing, Indianapolis, IN, 46202, USA
- Shannon L Risacher
  - Radiology & Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Andrew J Saykin
  - Radiology & Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Li Shen
  - Radiology & Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Heng Huang
  - Computer Science & Engineering, University of Texas at Arlington, TX, 76019, USA
|
46
|
Wang X, Yan J, Yao X, Kim S, Nho K, Risacher SL, Saykin AJ, Shen L, Huang H. Longitudinal Genotype-Phenotype Association Study via Temporal Structure Auto-Learning Predictive Model. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY : ... ANNUAL INTERNATIONAL CONFERENCE, RECOMB ... : PROCEEDINGS. RECOMB (CONFERENCE : 2005- ) 2017; 10229:287-302. [PMID: 29696245 PMCID: PMC5912922 DOI: 10.1007/978-3-319-56970-3_18] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/13/2023]
Abstract
With rapid progress in high-throughput genotyping and neuroimaging, imaging genetics has gained significant attention in the research of complex brain disorders, such as Alzheimer's Disease (AD). The genotype-phenotype association study using imaging genetic data has the potential to reveal genetic basis and biological mechanism of brain structure and function. AD is a progressive neurodegenerative disease, thus, it is crucial to look into the relations between SNPs and longitudinal variations of neuroimaging phenotypes. Although some machine learning models were newly presented to capture the longitudinal patterns in genotype-phenotype association study, most of them required fixed longitudinal structures of prediction tasks and could not automatically learn the interrelations among longitudinal prediction tasks. To address this challenge, we proposed a novel temporal structure auto-learning model to automatically uncover longitudinal genotype-phenotype interrelations and utilized such interrelated structures to enhance phenotype prediction in the meantime. We conducted longitudinal phenotype prediction experiments on the ADNI cohort including 3,123 SNPs and 2 types of biomarkers, VBM and FreeSurfer. Empirical results demonstrated advantages of our proposed model over the counterparts. Moreover, available literature was identified for our top selected SNPs, which demonstrated the rationality of our prediction results. An executable program is available online at https://github.com/littleq1991/sparse_lowRank_regression.
Affiliation(s)
- Xiaoqian Wang
  - Computer Science & Engineering, University of Texas at Arlington, TX, 76019, USA
- Jingwen Yan
  - Radiology & Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
  - BioHealth, Indiana University School of Informatics & Computing, Indianapolis, IN, 46202, USA
- Xiaohui Yao
  - Radiology & Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
  - BioHealth, Indiana University School of Informatics & Computing, Indianapolis, IN, 46202, USA
- Sungeun Kim
  - Radiology & Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Kwangsik Nho
  - Radiology & Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Shannon L Risacher
  - Radiology & Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Andrew J Saykin
  - Radiology & Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Li Shen
  - Radiology & Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Heng Huang
  - Computer Science & Engineering, University of Texas at Arlington, TX, 76019, USA
|
47
|
Mahfouz A, Huisman SMH, Lelieveldt BPF, Reinders MJT. Brain transcriptome atlases: a computational perspective. Brain Struct Funct 2017; 222:1557-1580. [PMID: 27909802 PMCID: PMC5406417 DOI: 10.1007/s00429-016-1338-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/25/2016] [Accepted: 11/15/2016] [Indexed: 01/31/2023]
Abstract
The immense complexity of the mammalian brain is largely reflected in the underlying molecular signatures of its billions of cells. Brain transcriptome atlases provide valuable insights into gene expression patterns across different brain areas throughout the course of development. Such atlases allow researchers to probe the molecular mechanisms which define neuronal identities, neuroanatomy, and patterns of connectivity. Despite the immense effort put into generating such atlases, to answer fundamental questions in neuroscience, an even greater effort is needed to develop methods to probe the resulting high-dimensional multivariate data. We provide a comprehensive overview of the various computational methods used to analyze brain transcriptome atlases.
Affiliation(s)
- Ahmed Mahfouz
  - Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
  - Delft Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands
- Sjoerd M H Huisman
  - Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
  - Delft Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands
- Boudewijn P F Lelieveldt
  - Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
  - Delft Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands
- Marcel J T Reinders
  - Delft Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands
|
48
|
Cao P, Liu X, Zhang J, Zhao D, Huang M, Zaiane O. ℓ2,1 norm regularized multi-kernel based joint nonlinear feature selection and over-sampling for imbalanced data classification. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.12.036] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 01/15/2023]
|
49
|
An L, Adeli E, Liu M, Zhang J, Lee SW, Shen D. A Hierarchical Feature and Sample Selection Framework and Its Application for Alzheimer's Disease Diagnosis. Sci Rep 2017; 7:45269. [PMID: 28358032 PMCID: PMC5372170 DOI: 10.1038/srep45269] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 09/19/2016] [Accepted: 02/23/2017] [Indexed: 11/09/2022] Open
Abstract
Classification is one of the most important tasks in machine learning. Due to feature redundancy or outliers in samples, using all available data for training a classifier may be suboptimal. For example, Alzheimer's disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), and identification of relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNPs and then use those features to build the classifier. However, with the presence of many redundant features, the most discriminative features are difficult to identify in a single step. Thus, we formulate a hierarchical feature and sample selection framework to gradually select informative features and discard ambiguous samples in multiple steps for improved classifier learning. To positively guide the data manifold preservation process, we utilize both labeled and unlabeled data during training, making our method semi-supervised. For validation, we conduct experiments on AD diagnosis by selecting mutually informative features from both MRI and SNP, and using the most discriminative samples for training. The superior classification results demonstrate the effectiveness of our approach compared with competing methods.
Affiliation(s)
- Le An
  - Department of Radiology and Biomedical Research Imaging Center (BRIC), University of North Carolina at Chapel Hill, NC 27599, USA
- Ehsan Adeli
  - Department of Radiology and Biomedical Research Imaging Center (BRIC), University of North Carolina at Chapel Hill, NC 27599, USA
- Mingxia Liu
  - Department of Radiology and Biomedical Research Imaging Center (BRIC), University of North Carolina at Chapel Hill, NC 27599, USA
- Jun Zhang
  - Department of Radiology and Biomedical Research Imaging Center (BRIC), University of North Carolina at Chapel Hill, NC 27599, USA
- Seong-Whan Lee
  - Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, Republic of Korea
- Dinggang Shen
  - Department of Radiology and Biomedical Research Imaging Center (BRIC), University of North Carolina at Chapel Hill, NC 27599, USA
  - Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, Republic of Korea
|
50
|
Viswanath SE, Tiwari P, Lee G, Madabhushi A. Dimensionality reduction-based fusion approaches for imaging and non-imaging biomedical data: concepts, workflow, and use-cases. BMC Med Imaging 2017; 17:2. [PMID: 28056889 PMCID: PMC5217665 DOI: 10.1186/s12880-016-0172-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 06/21/2016] [Accepted: 12/09/2016] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND With a wide array of multi-modal, multi-protocol, and multi-scale biomedical data being routinely acquired for disease characterization, there is a pressing need for quantitative tools to combine these varied channels of information. The goal of these integrated predictors is to combine these varied sources of information, while improving on the predictive ability of any individual modality. A number of application-specific data fusion methods have been previously proposed in the literature which have attempted to reconcile the differences in dimensionalities and length scales across different modalities. Our objective in this paper was to help identify methodological choices that need to be made in order to build a data fusion technique, as it is not always clear which strategy is optimal for a particular problem. As a comprehensive review of all possible data fusion methods was outside the scope of this paper, we have focused on fusion approaches that employ dimensionality reduction (DR). METHODS In this work, we quantitatively evaluate 4 non-overlapping existing instantiations of DR-based data fusion, within 3 different biomedical applications comprising over 100 studies. These instantiations utilized different knowledge representation and knowledge fusion methods, allowing us to examine the interplay of these modules in the context of data fusion. The use cases considered in this work involve the integration of (a) radiomics features from T2w MRI with peak area features from MR spectroscopy for identification of prostate cancer in vivo, (b) histomorphometric features (quantitative features extracted from histopathology) with protein mass spectrometry features for predicting 5 year biochemical recurrence in prostate cancer patients, and (c) volumetric measurements on T1w MRI with protein expression features to discriminate between patients with and without Alzheimer's disease.
RESULTS AND CONCLUSIONS Our preliminary results in these specific use cases indicated that the use of kernel representations in conjunction with DR-based fusion may be most effective, as a weighted multi-kernel-based DR approach resulted in the highest area under the ROC curve of over 0.8. By contrast non-optimized DR-based representation and fusion methods yielded the worst predictive performance across all 3 applications. Our results suggest that when the individual modalities demonstrate relatively poor discriminability, many of the data fusion methods may not yield accurate, discriminatory representations either. In summary, to outperform the predictive ability of individual modalities, methodological choices for data fusion must explicitly account for the sparsity of and noise in the feature space.
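To make the idea of a weighted multi-kernel representation concrete, the sketch below builds one RBF kernel per modality and combines them as a normalized weighted sum before any dimensionality reduction would be applied. It is a minimal illustration and not the implementation evaluated in this paper: the function names `rbf_kernel` and `combine_kernels`, the toy feature values, the `gamma` settings, and the modality weights are all assumptions, and the subsequent DR step (for example an eigendecomposition of the combined kernel) is omitted.

```python
import math


def rbf_kernel(X, gamma=1.0):
    """Gram matrix of the RBF kernel exp(-gamma * ||x_i - x_j||^2) for the rows of X."""
    n = len(X)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]


def combine_kernels(kernels, weights):
    """Weighted sum of same-sized kernel matrices, with weights normalized to sum to 1."""
    total = sum(weights)
    n = len(kernels[0])
    return [[sum((w / total) * K[i][j] for K, w in zip(kernels, weights))
             for j in range(n)] for i in range(n)]


# Hypothetical data: 3 patients, 2 imaging features and 1 non-imaging feature each.
imaging = [[0.0, 1.0], [0.0, 2.0], [3.0, 1.0]]
protein = [[1.0], [1.5], [4.0]]

K_fused = combine_kernels(
    [rbf_kernel(imaging, gamma=0.5), rbf_kernel(protein, gamma=0.1)],
    weights=[0.7, 0.3],  # assumed modality weights; in practice these would be tuned
)
```

Working in kernel space sidesteps the mismatch in dimensionality between modalities, because every modality contributes a patient-by-patient similarity matrix of the same size regardless of how many raw features it has.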
Affiliation(s)
- Satish E Viswanath
  - Department of Biomedical Engineering, Case Western Reserve University, 10900 Euclid Ave, Wickenden 523, Cleveland, OH, USA
- Pallavi Tiwari
  - Department of Biomedical Engineering, Case Western Reserve University, 10900 Euclid Ave, Wickenden 523, Cleveland, OH, USA
- George Lee
  - Department of Biomedical Engineering, Case Western Reserve University, 10900 Euclid Ave, Wickenden 523, Cleveland, OH, USA
- Anant Madabhushi
  - Department of Biomedical Engineering, Case Western Reserve University, 10900 Euclid Ave, Wickenden 523, Cleveland, OH, USA
|