1. Deng Y, Yao Y, Wang Y, Yu T, Cai W, Zhou D, Yin F, Liu W, Liu Y, Xie C, Guan J, Hu Y, Huang P, Li W. An end-to-end deep learning method for mass spectrometry data analysis to reveal disease-specific metabolic profiles. Nat Commun 2024; 15:7136. [PMID: 39164279; PMCID: PMC11335749; DOI: 10.1038/s41467-024-51433-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 08/07/2024] [Indexed: 08/22/2024] [Open access]
Abstract
Untargeted metabolomic analysis using mass spectrometry provides comprehensive metabolic profiling, but its medical application faces challenges of complex data processing, high inter-batch variability, and unidentified metabolites. Here, we present DeepMSProfiler, an explainable deep-learning-based method that enables end-to-end analysis of raw metabolic signals with highly accurate and reliable output. Using 859 cross-hospital human serum samples from patients with lung adenocarcinoma, patients with benign lung nodules, and healthy individuals, DeepMSProfiler successfully differentiates the metabolomic profiles of the different groups (AUC 0.99) and detects early-stage lung adenocarcinoma (accuracy 0.961). Model flow and ablation experiments demonstrate that DeepMSProfiler overcomes inter-hospital variability and the effects of unknown metabolite signals. Our ensemble strategy removes background-category phenomena in multi-classification deep-learning models, and the novel interpretability enables direct access to disease-related metabolite-protein networks. Further application to lipid metabolomic data unveils correlations between important metabolites and proteins. Overall, DeepMSProfiler offers a straightforward and reliable method for disease diagnosis and mechanism discovery, enhancing its broad applicability.
Affiliation(s)
- Yongjie Deng
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Yao Yao
  - State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
  - Metabolic Innovation Platform, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Yanni Wang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Tiantian Yu
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
  - State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
  - Metabolic Innovation Platform, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Wenhao Cai
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Dingli Zhou
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Feng Yin
  - State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
- Wanli Liu
  - State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
- Yuying Liu
  - State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
- Chuanbo Xie
  - State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
- Jian Guan
  - Department of Radiology, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Yumin Hu
  - State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
  - Metabolic Innovation Platform, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Peng Huang
  - State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
  - Metabolic Innovation Platform, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Weizhong Li
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
  - Sun Yat-Sen University School of Medicine, Sun Yat-Sen University, Shenzhen, China
  - Key Laboratory of Tropical Disease Control of Ministry of Education, Sun Yat-sen University, Guangzhou, China
2. Wang K, Theeke LA, Liao C, Wang N, Lu Y, Xiao D, Xu C. Deep learning analysis of UPLC-MS/MS-based metabolomics data to predict Alzheimer's disease. J Neurol Sci 2023; 453:120812. [PMID: 37776718; DOI: 10.1016/j.jns.2023.120812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 08/22/2023] [Accepted: 09/14/2023] [Indexed: 10/02/2023]
Abstract
OBJECTIVE: Metabolic biomarkers can potentially inform disease progression in Alzheimer's disease (AD). The purpose of this study is to identify and describe a new set of diagnostic biomarkers for developing deep learning (DL) tools to predict AD using Ultra Performance Liquid Chromatography Mass Spectrometry (UPLC-MS/MS)-based metabolomics data. METHODS: A total of 177 individuals, including 78 with AD and 99 who were cognitively normal (CN), were selected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort along with 150 metabolomic biomarkers. We performed feature selection using the Least Absolute Shrinkage and Selection Operator (LASSO). The H2O DL function was used to build multilayer feedforward neural networks to predict AD. RESULTS: The LASSO selected 21 metabolic biomarkers. To develop DL models, the 21 biomarkers identified by LASSO were imported into the H2O package. The data were split into 70% for training and 30% for validation. The best DL model, with two layers and 18 neurons, achieved an accuracy of 0.881, an F1-score of 0.892, and an AUC of 0.873. Several metabolomic biomarkers involved in glucose and lipid metabolism, in particular bile acid metabolites, were associated with the APOE-ε4 allele, clinical biomarkers (Aβ42, tTau, pTau), cognitive assessments [the Alzheimer's Disease Assessment Scale-cognitive subscale 13 (ADAS13), the Mini-Mental State Examination (MMSE)], and hippocampus volume. CONCLUSIONS: This study identified a new set of diagnostic metabolomic biomarkers for developing DL tools to predict AD. These biomarkers may help with early diagnosis, prognostic risk stratification, and/or early treatment interventions for patients at risk for AD.
Affiliation(s)
- Kesheng Wang
  - School of Nursing, Health Sciences Center, West Virginia University, Morgantown, WV 26506, USA
- Laurie A Theeke
  - School of Nursing, The George Washington University, Ashburn, VA 20147, USA
- Christopher Liao
  - Department of Electrical and Computer Engineering, Boston University, MA 02215, USA
- Nianyang Wang
  - Department of Health Policy and Management, School of Public Health, University of Maryland, College Park, MD 20742, USA
- Yongke Lu
  - Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV 25755, USA
- Danqing Xiao
  - Department of STEM, School of Arts and Sciences, Regis College, Weston, MA 02493, USA
- Chun Xu
  - Department of Health and Biomedical Sciences, College of Health Professions, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA
3. Bartmanski BJ, Rocha M, Zimmermann-Kogadeeva M. Recent advances in data- and knowledge-driven approaches to explore primary microbial metabolism. Curr Opin Chem Biol 2023; 75:102324. [PMID: 37207402; PMCID: PMC10410306; DOI: 10.1016/j.cbpa.2023.102324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 04/15/2023] [Accepted: 04/18/2023] [Indexed: 05/21/2023]
Abstract
With the rapid progress in metabolomics and sequencing technologies, more data on the metabolomes of single microbes and their communities are becoming available, revealing the potential of microorganisms to metabolize a broad range of chemical compounds. The analysis of microbial metabolomics datasets remains challenging since it inherits the technical challenges of metabolomics analysis, such as compound identification and annotation, while harboring challenges in data interpretation, such as distinguishing metabolite sources in mixed samples. This review outlines the recent advances in computational methods to analyze primary microbial metabolism: knowledge-based approaches that take advantage of metabolic and molecular networks and data-driven approaches that employ machine/deep learning algorithms in combination with large-scale datasets. These methods aim at improving metabolite identification and disentangling reciprocal interactions between microbes and metabolites. We also discuss the perspective of combining these approaches and further developments required to advance the investigation of primary metabolism in mixed microbial samples.
Affiliation(s)
- Miguel Rocha
  - Centre of Biological Engineering, University of Minho, Campus of Gualtar, Braga, Portugal
4. Galal A, Talal M, Moustafa A. Applications of machine learning in metabolomics: Disease modeling and classification. Front Genet 2022; 13:1017340. [PMID: 36506316; PMCID: PMC9730048; DOI: 10.3389/fgene.2022.1017340] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 08/11/2022] [Accepted: 11/07/2022] [Indexed: 11/25/2022] [Open access]
Abstract
Metabolomics research has recently gained popularity because it enables the study of biological traits at the biochemical level and, as a result, can directly reveal what occurs in a cell or a tissue based on health or disease status, complementing other omics such as genomics and transcriptomics. Like other high-throughput biological experiments, metabolomics produces vast volumes of complex data. The application of machine learning (ML) to analyze data, recognize patterns, and build models is expanding across multiple fields. In the same way, ML methods are utilized for the classification, regression, or clustering of highly complex metabolomic data. This review discusses how disease modeling and diagnosis can be enhanced via deep and comprehensive metabolomic profiling using ML. We discuss the general layout of a metabolic workflow and the fundamental ML techniques used to analyze metabolomic data, including support vector machines (SVM), decision trees, random forests (RF), neural networks (NN), and deep learning (DL). Finally, we present the advantages and disadvantages of various ML methods and provide suggestions for different metabolic data analysis scenarios.
Affiliation(s)
- Aya Galal
  - Systems Genomics Laboratory, American University in Cairo, New Cairo, Egypt
  - Institute of Global Health and Human Ecology, American University in Cairo, New Cairo, Egypt
- Marwa Talal
  - Systems Genomics Laboratory, American University in Cairo, New Cairo, Egypt
  - Biotechnology Graduate Program, American University in Cairo, New Cairo, Egypt
- Ahmed Moustafa (corresponding author)
  - Systems Genomics Laboratory, American University in Cairo, New Cairo, Egypt
  - Biotechnology Graduate Program, American University in Cairo, New Cairo, Egypt
  - Department of Biology, American University in Cairo, New Cairo, Egypt