1
Er AG, Ding DY, Er B, Uzun M, Cakmak M, Sadee C, Durhan G, Ozmen MN, Tanriover MD, Topeli A, Aydin Son Y, Tibshirani R, Unal S, Gevaert O. Multimodal data fusion using sparse canonical correlation analysis and cooperative learning: a COVID-19 cohort study. NPJ Digit Med 2024; 7:117. [PMID: 38714751 PMCID: PMC11076490 DOI: 10.1038/s41746-024-01128-2] [Received: 11/06/2023] [Accepted: 04/25/2024] [Indexed: 05/10/2024] Open
Abstract
Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: intensive care unit admission. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (cor(Xu1, Zv1) = 0.596, p value < 0.001). Among radiomics features, histogram-based first-order features reporting skewness, kurtosis, and uniformity have the most negative coefficients, whereas entropy-related features have the most positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also preserves the phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task, yielding an area under the curve (AUC) of 0.87 and an accuracy of 0.77. Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
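The cooperative learning step named in the abstract can be illustrated with a small two-view sketch. The abstract does not give the fitting details, so the following assumes the common formulation in which a squared-error fit term is combined with an agreement penalty, weighted by `rho`, that pulls the two views' predictions together. The function name `cooperative_ridge` and the parameters `rho` and `lam` are hypothetical, a ridge penalty is swapped in for the sparsity penalty to keep the example closed-form, and the response is treated as continuous even though the study's outcome (intensive care unit admission) is binary.

```python
import numpy as np

def cooperative_ridge(X, Z, y, rho=0.5, lam=1.0):
    """Two-view cooperative regression with a ridge penalty (illustrative).

    Minimizes
        0.5 * ||y - X bx - Z bz||^2
      + 0.5 * rho * ||X bx - Z bz||^2      (agreement penalty)
      + 0.5 * lam * (||bx||^2 + ||bz||^2)  (ridge penalty, an assumption)
    by rewriting it as one least-squares problem on stacked data.
    """
    n, px = X.shape
    s = np.sqrt(rho)
    # Top rows fit y with both views; bottom rows penalize disagreement.
    A = np.vstack([np.hstack([X, Z]), np.hstack([-s * X, s * Z])])
    b = np.concatenate([y, np.zeros(n)])
    beta = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
    return beta[:px], beta[px:]

# Synthetic two-view data sharing one latent signal (not study data).
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X_demo = latent[:, None] + 0.5 * rng.normal(size=(200, 5))
Z_demo = latent[:, None] + 0.5 * rng.normal(size=(200, 4))
y_demo = latent + 0.3 * rng.normal(size=200)

bx_demo, bz_demo = cooperative_ridge(X_demo, Z_demo, y_demo, rho=1.0)
print("view X coefficients:", np.round(bx_demo, 3))
print("view Z coefficients:", np.round(bz_demo, 3))
```

Setting `rho=0` reduces the sketch to ordinary ridge regression on the concatenated views, while larger values push the two views toward making the same prediction.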
Affiliation(s)
- Ahmet Gorkem Er
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, 06800, Ankara, Turkey
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Daisy Yi Ding
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Berrin Er
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mertcan Uzun
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mehmet Cakmak
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Christoph Sadee
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Gamze Durhan
  - Department of Radiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mustafa Nasuh Ozmen
  - Department of Radiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mine Durusu Tanriover
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Arzu Topeli
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Yesim Aydin Son
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, 06800, Ankara, Turkey
- Robert Tibshirani
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
  - Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Serhat Unal
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
2
Xia H, Luan X, Bao Z, Zhu Q, Wen C, Wang M, Song W. A multi-cohort study of the hippocampal radiomics model and its associated biological changes in Alzheimer's Disease. Transl Psychiatry 2024; 14:111. [PMID: 38395947 PMCID: PMC10891125 DOI: 10.1038/s41398-024-02836-9] [Received: 08/20/2023] [Revised: 02/08/2024] [Accepted: 02/14/2024] [Indexed: 02/25/2024] Open
Abstract
There have been no previous reports of hippocampal radiomics features associated with biological functions in Alzheimer's Disease (AD). This study aims to develop and validate a hippocampal radiomics model from structural magnetic resonance imaging (MRI) data for identifying patients with AD, and to explore the mechanism underlying the developed radiomics model using peripheral blood gene expression. In this retrospective multi-cohort study, a radiomics model was developed based on the radiomics discovery group (n = 420) and validated in other cohorts. The biological functions underlying the model were identified in the radiogenomic analysis group using paired MRI and peripheral blood transcriptome analyses (n = 266). Mediation analysis and external validation were applied to further validate the key module and hub genes. A prediction model based on 12 radiomics features was constructed, and it showed highly robust predictive power for identifying AD patients in the validation cohort and three other cohorts. Radiogenomics mapping showed enrichment of myeloid leukocyte and neutrophil activation, and six hub genes were identified from the key module, which showed the highest correlation with the radiomics model. The correlation between hub genes and cognitive ability was confirmed using the external validation set of the AddNeuroMed dataset. Mediation analysis revealed that the hippocampal radiomics model mediated the association between blood gene expression and cognitive ability. The hippocampal radiomics model can accurately identify patients with AD, and it may be driven by neutrophil-related biological pathways.
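The mediation claim in this abstract can be made concrete with a small sketch. The abstract does not say which mediation estimator was used, so the following assumes the classic product-of-coefficients approach with ordinary least squares: `x` stands in for blood gene expression, `m` for the radiomics model score, and `y` for a cognitive score. The function names and the synthetic data are hypothetical and are not taken from the study.

```python
import numpy as np

def ols_coefs(y, *predictors):
    """Return OLS coefficients of y on the predictors, intercept first."""
    A = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coefs

def mediation_effects(x, m, y):
    """Product-of-coefficients mediation estimates (illustrative).

    a: effect of x on the mediator m
    b: effect of m on y, adjusting for x
    direct: effect of x on y, adjusting for m
    total: effect of x on y without the mediator
    """
    a = ols_coefs(m, x)[1]
    _, direct, b = ols_coefs(y, x, m)
    total = ols_coefs(y, x)[1]
    return {"indirect": a * b, "direct": direct, "total": total}

# Synthetic example in which most of the x -> y effect flows through m.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=500)
m_demo = 0.8 * x_demo + 0.5 * rng.normal(size=500)
y_demo = 0.5 * m_demo + 0.1 * x_demo + 0.5 * rng.normal(size=500)

for name, value in mediation_effects(x_demo, m_demo, y_demo).items():
    print(f"{name}: {value:.3f}")
```

With linear OLS models the total effect equals the direct effect plus the indirect effect, which gives a quick consistency check. In practice the indirect effect is usually reported with a bootstrap confidence interval, which this sketch omits.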
Affiliation(s)
- Huwei Xia
  - Center for Geriatric Medicine and Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, Zhejiang Provincial Clinical Research for Mental Disorders, Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang, 325000, China
- Xiaoqian Luan
  - Center for Geriatric Medicine and Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, Zhejiang Provincial Clinical Research for Mental Disorders, Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
- Zhengkai Bao
  - Center for Geriatric Medicine and Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, Zhejiang Provincial Clinical Research for Mental Disorders, Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
- Qinxin Zhu
  - Center for Geriatric Medicine and Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, Zhejiang Provincial Clinical Research for Mental Disorders, Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
- Caiyun Wen
  - Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
- Meihao Wang
  - Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
- Weihong Song
  - Center for Geriatric Medicine and Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, Zhejiang Provincial Clinical Research for Mental Disorders, Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang, 325000, China
  - Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
3
Er AG, Ding DY, Er B, Uzun M, Cakmak M, Sadee C, Durhan G, Ozmen MN, Tanriover MD, Topeli A, Son YA, Tibshirani R, Unal S, Gevaert O. Multimodal Biomedical Data Fusion Using Sparse Canonical Correlation Analysis and Cooperative Learning: A Cohort Study on COVID-19. RESEARCH SQUARE 2023:rs.3.rs-3569833. [PMID: 38045288 PMCID: PMC10690316 DOI: 10.21203/rs.3.rs-3569833/v1] [Indexed: 12/05/2023]
Abstract
Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu1, Zv1) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting skewness, kurtosis, and uniformity have the most negative coefficients, whereas entropy-related features have the most positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also preserves the phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task, yielding an area under the curve (AUC) of 0.87 and an accuracy of 0.77. Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
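The quantity reported in the abstract as the correlation of Xu1 and Zv1 is the correlation between the first pair of canonical variates, where u1 and v1 are sparse weight vectors over the two modalities. The abstract does not state the algorithm, so the sketch below assumes a simplified rank-one scheme that alternates soft-thresholded updates of `u` and `v` on the cross-covariance matrix with fixed thresholds. The names `sparse_cca_rank1`, `lam_u`, and `lam_v` are hypothetical, and the data are synthetic.

```python
import numpy as np

def soft_threshold(a, lam):
    """Shrink each entry of a toward zero by lam, zeroing small entries."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def unit(v):
    """Scale v to unit length, leaving an all-zero vector unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def sparse_cca_rank1(X, Z, lam_u=0.3, lam_v=0.3, n_iter=50):
    """First pair of sparse canonical vectors (simplified illustration)."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    C = X.T @ Z / X.shape[0]  # cross-covariance of standardized views
    # Start from the leading right singular vector of C.
    v = np.linalg.svd(C, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = unit(soft_threshold(C @ v, lam_u))
        v = unit(soft_threshold(C.T @ u, lam_v))
    cor = np.corrcoef(X @ u, Z @ v)[0, 1]
    return u, v, cor

# Synthetic views: 3 features of X and 2 of Z share one latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
X_demo = rng.normal(size=(500, 20))
Z_demo = rng.normal(size=(500, 15))
X_demo[:, :3] = latent[:, None] + 0.5 * rng.normal(size=(500, 3))
Z_demo[:, :2] = latent[:, None] + 0.5 * rng.normal(size=(500, 2))

u_demo, v_demo, cor_demo = sparse_cca_rank1(X_demo, Z_demo)
print("nonzero weights in u:", np.flatnonzero(u_demo))
print("nonzero weights in v:", np.flatnonzero(v_demo))
print("canonical correlation:", round(float(cor_demo), 3))
```

The thresholds control sparsity: larger values zero out more features, and values that are too large zero out every weight, so in practice they are tuned, for example by permutation or cross-validation.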
Affiliation(s)
- Ahmet Gorkem Er
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Türkiye
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Daisy Yi Ding
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Berrin Er
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mertcan Uzun
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mehmet Cakmak
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Christoph Sadee
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Gamze Durhan
  - Department of Radiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mustafa Nasuh Ozmen
  - Department of Radiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mine Durusu Tanriover
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Arzu Topeli
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Yesim Aydin Son
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Türkiye
- Robert Tibshirani
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
  - Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Serhat Unal
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA