1
|
Zhang S, Li P, Wang S, Zhu J, Huang Z, Cai F, Freidel S, Ling F, Schwarz E, Chen J. BioM2: biologically informed multi-stage machine learning for phenotype prediction using omics data. Brief Bioinform 2024; 25:bbae384. [PMID: 39126426 PMCID: PMC11316398 DOI: 10.1093/bib/bbae384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2024] [Revised: 06/15/2024] [Accepted: 07/24/2024] [Indexed: 08/12/2024] Open
Abstract
Navigating the complex landscape of high-dimensional omics data with machine learning models presents a significant challenge. The integration of biological domain knowledge into these models has shown promise in creating more meaningful stratifications of predictor variables, leading to algorithms that are both more accurate and generalizable. However, the wider availability of machine learning tools capable of incorporating such biological knowledge remains limited. Addressing this gap, we introduce BioM2, a novel R package designed for biologically informed multistage machine learning. BioM2 uniquely leverages biological information to effectively stratify and aggregate high-dimensional biological data in the context of machine learning. Demonstrating its utility with genome-wide DNA methylation and transcriptome-wide gene expression data, BioM2 has shown to enhance predictive performance, surpassing traditional machine learning models that operate without the integration of biological knowledge. A key feature of BioM2 is its ability to rank predictor variables within biological categories, specifically Gene Ontology pathways. This functionality not only aids in the interpretability of the results but also enables a subsequent modular network analysis of these variables, shedding light on the intricate systems-level biology underpinning the predictive outcome. We have proposed a biologically informed multistage machine learning framework termed BioM2 for phenotype prediction based on omics data. BioM2 has been incorporated into the BioM2 CRAN package (https://cran.r-project.org/web/packages/BioM2/index.html).
Collapse
Affiliation(s)
- Shunjie Zhang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
| | - Pan Li
- Center for Intelligent Medicine, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences, Fudan University, No. 6, 2nd Nanjiang Road, Nansha District, 511462 Guangzhou, China
| | - Shenghan Wang
- Center for Intelligent Medicine, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences, Fudan University, No. 6, 2nd Nanjiang Road, Nansha District, 511462 Guangzhou, China
| | - Jijun Zhu
- Center for Intelligent Medicine, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences, Fudan University, No. 6, 2nd Nanjiang Road, Nansha District, 511462 Guangzhou, China
| | - Zhongting Huang
- Center for Intelligent Medicine, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences, Fudan University, No. 6, 2nd Nanjiang Road, Nansha District, 511462 Guangzhou, China
| | - Fuqiang Cai
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
| | - Sebastian Freidel
- Hector Institute for Artificial Intelligence in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, M7, Mannheim 68161, Germany
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, J5, Mannheim 68159, Germany
| | - Fei Ling
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
| | - Emanuel Schwarz
- Hector Institute for Artificial Intelligence in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, M7, Mannheim 68161, Germany
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, J5, Mannheim 68159, Germany
| | - Junfang Chen
- Center for Intelligent Medicine, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences, Fudan University, No. 6, 2nd Nanjiang Road, Nansha District, 511462 Guangzhou, China
- Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
| |
Collapse
|
2
|
Mohr AE, Ortega-Santos CP, Whisner CM, Klein-Seetharaman J, Jasbi P. Navigating Challenges and Opportunities in Multi-Omics Integration for Personalized Healthcare. Biomedicines 2024; 12:1496. [PMID: 39062068 PMCID: PMC11274472 DOI: 10.3390/biomedicines12071496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2024] [Revised: 06/25/2024] [Accepted: 06/28/2024] [Indexed: 07/28/2024] Open
Abstract
The field of multi-omics has witnessed unprecedented growth, converging multiple scientific disciplines and technological advances. This surge is evidenced by a more than doubling in multi-omics scientific publications within just two years (2022-2023) since its first referenced mention in 2002, as indexed by the National Library of Medicine. This emerging field has demonstrated its capability to provide comprehensive insights into complex biological systems, representing a transformative force in health diagnostics and therapeutic strategies. However, several challenges are evident when merging varied omics data sets and methodologies, interpreting vast data dimensions, streamlining longitudinal sampling and analysis, and addressing the ethical implications of managing sensitive health information. This review evaluates these challenges while spotlighting pivotal milestones: the development of targeted sampling methods, the use of artificial intelligence in formulating health indices, the integration of sophisticated n-of-1 statistical models such as digital twins, and the incorporation of blockchain technology for heightened data security. For multi-omics to truly revolutionize healthcare, it demands rigorous validation, tangible real-world applications, and smooth integration into existing healthcare infrastructures. It is imperative to address ethical dilemmas, paving the way for the realization of a future steered by omics-informed personalized medicine.
Collapse
Affiliation(s)
- Alex E. Mohr
- Systems Precision Engineering and Advanced Research (SPEAR), Theriome Inc., Phoenix, AZ 85004, USA; (A.E.M.); (C.P.O.-S.); (C.M.W.); (J.K.-S.)
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
- Biodesign Institute Center for Health Through Microbiomes, Arizona State University, Tempe, AZ 85281, USA
| | - Carmen P. Ortega-Santos
- Systems Precision Engineering and Advanced Research (SPEAR), Theriome Inc., Phoenix, AZ 85004, USA; (A.E.M.); (C.P.O.-S.); (C.M.W.); (J.K.-S.)
- Department of Exercise and Nutrition Sciences, Milken Institute School of Public Health, George Washington University, Washington, DC 20052, USA
| | - Corrie M. Whisner
- Systems Precision Engineering and Advanced Research (SPEAR), Theriome Inc., Phoenix, AZ 85004, USA; (A.E.M.); (C.P.O.-S.); (C.M.W.); (J.K.-S.)
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
- Biodesign Institute Center for Health Through Microbiomes, Arizona State University, Tempe, AZ 85281, USA
| | - Judith Klein-Seetharaman
- Systems Precision Engineering and Advanced Research (SPEAR), Theriome Inc., Phoenix, AZ 85004, USA; (A.E.M.); (C.P.O.-S.); (C.M.W.); (J.K.-S.)
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
- School of Molecular Sciences, Arizona State University, Tempe, AZ 85281, USA
| | - Paniz Jasbi
- Systems Precision Engineering and Advanced Research (SPEAR), Theriome Inc., Phoenix, AZ 85004, USA; (A.E.M.); (C.P.O.-S.); (C.M.W.); (J.K.-S.)
| |
Collapse
|
3
|
Abbasi AF, Asim MN, Ahmed S, Vollmer S, Dengel A. Survival prediction landscape: an in-depth systematic literature review on activities, methods, tools, diseases, and databases. Front Artif Intell 2024; 7:1428501. [PMID: 39021434 PMCID: PMC11252047 DOI: 10.3389/frai.2024.1428501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2024] [Accepted: 06/12/2024] [Indexed: 07/20/2024] Open
Abstract
Survival prediction integrates patient-specific molecular information and clinical signatures to forecast the anticipated time of an event, such as recurrence, death, or disease progression. Survival prediction proves valuable in guiding treatment decisions, optimizing resource allocation, and interventions of precision medicine. The wide range of diseases, the existence of various variants within the same disease, and the reliance on available data necessitate disease-specific computational survival predictors. The widespread adoption of artificial intelligence (AI) methods in crafting survival predictors has undoubtedly revolutionized this field. However, the ever-increasing demand for more sophisticated and effective prediction models necessitates the continued creation of innovative advancements. To catalyze these advancements, it is crucial to bring existing survival predictors knowledge and insights into a centralized platform. The paper in hand thoroughly examines 23 existing review studies and provides a concise overview of their scope and limitations. Focusing on a comprehensive set of 90 most recent survival predictors across 44 diverse diseases, it delves into insights of diverse types of methods that are used in the development of disease-specific predictors. This exhaustive analysis encompasses the utilized data modalities along with a detailed analysis of subsets of clinical features, feature engineering methods, and the specific statistical, machine or deep learning approaches that have been employed. It also provides insights about survival prediction data sources, open-source predictors, and survival prediction frameworks.
Collapse
Affiliation(s)
- Ahtisham Fazeel Abbasi
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
| | - Muhammad Nabeel Asim
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
| | - Sheraz Ahmed
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
| | - Sebastian Vollmer
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
| | - Andreas Dengel
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
| |
Collapse
|
4
|
Shi L, Jia W, Zhang R, Fan Z, Bian W, Mo H. High-throughput analysis of hazards in novel food based on the density functional theory and multimodal deep learning. Food Chem 2024; 442:138468. [PMID: 38266417 DOI: 10.1016/j.foodchem.2024.138468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2023] [Revised: 12/30/2023] [Accepted: 01/15/2024] [Indexed: 01/26/2024]
Abstract
The emergence of cultured meat presents the potential for personalized food additive manufacturing, offering a solution to address future food resource scarcity. Processing raw materials and products in synthetic food products poses challenges in identifying hazards, impacting the entire industrial chain during the industry's further evolution. It is crucial to examine the correlation of biological information at different levels and to reveal the temporal dynamics jointly. Proposed active prevention method includes four aspects: (i) Investigating the molecular-level mechanism underlying the binding and dissociation of hazards with proteins represents a novel approach to mitigate matrix effect. (ii) Identifying distinct fragments is a pivotal advancement toward developing a novel screening strategy for hazards throughout the food chain. (iii) Designing an artificial intelligence model-based approach to acquire multi-dimensional histology data also holds significant potential for various applications. (iv) Integrating multimodal data is a practical approach to enhance evaluation and feedback control accuracy.
Collapse
Affiliation(s)
- Lin Shi
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Wei Jia
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China; Shaanxi Testing Institute of Product Quality Supervision, Xi'an, Shaanxi 710048, China; Shaanxi Research Institute of Agricultural Products Processing Technology, Xi'an 710021, China; Shaanxi Sky Pet Biotechnology Co., Ltd, Xi'an 710075, China.
| | - Rong Zhang
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Zibian Fan
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Wenwen Bian
- Shaanxi Testing Institute of Product Quality Supervision, Xi'an, Shaanxi 710048, China
| | - Haizhen Mo
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China; Shaanxi Research Institute of Agricultural Products Processing Technology, Xi'an 710021, China.
| |
Collapse
|
5
|
Du Z, Jiang Y, Yang Y, Kang X, Yan J, Liu B, Yang M. A multi-omics analysis-based model to predict the prognosis of low-grade gliomas. Sci Rep 2024; 14:9427. [PMID: 38658591 PMCID: PMC11043340 DOI: 10.1038/s41598-024-58434-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2023] [Accepted: 03/29/2024] [Indexed: 04/26/2024] Open
Abstract
Lower-grade gliomas (LGGs) exhibit highly variable clinical behaviors, while classic histology characteristics cannot accurately reflect the authentic biological behaviors, clinical outcomes, and prognosis of LGGs. In this study, we carried out analyses of whole exome sequencing, RNA sequencing and DNA methylation in primary vs. recurrent LGG samples, and also combined the multi-omics data to construct a prognostic prediction model. TCGA-LGG dataset was searched for LGG samples. 523 samples were used for whole exome sequencing analysis, 532 for transcriptional analysis, and 529 for DNA methylation analysis. LASSO regression was used to screen genes with significant association with LGG survival from the frequently mutated genes, differentially expressed genes, and differentially methylated genes, whereby a prediction model for prognosis of LGG was further constructed and validated. The most frequently mutated diver genes in LGGs were IDH1 (77%), TP53 (48%), ATRX (37%), etc. Top significantly up-regulated genes were C6orf15, DAO, MEOX2, etc., and top significantly down-regulated genes were DMBX1, GPR50, HMX2, etc. 2077 genes were more and 299 were less methylated in recurrent vs. primary LGG samples. Thirty-nine genes from the above analysis were included to establish a prediction model of survival, which showed that the high-score group had a very significantly shorter survival than the low-score group in both training and testing sets. ROC analysis showed that AUC was 0.817 for the training set and 0.819 for the testing set. This study will be beneficial to accurately predict the survival of LGGs to identify patients with poor prognosis to take specific treatment as early, which will help improve the treatment outcomes and prognosis of LGG.
Collapse
Affiliation(s)
- Zhijie Du
- The Comprehensive Cancer Centre of Nanjing Drum Tower Hospital, Nanjing Drum Tower Hospital Clinical College of Traditional Chinese and Western Medicine, Nanjing University of Chinese Medicine, Nanjing, China
| | - Yuehui Jiang
- Nanjing Drum Tower Hospital Clinical College of Nanjing University of Chinese Medicine, Nanjing, China
| | - Yueling Yang
- The Comprehensive Cancer Centre of Nanjing Drum Tower Hospital, Nanjing Drum Tower Hospital Clinical College of Traditional Chinese and Western Medicine, Nanjing University of Chinese Medicine, Nanjing, China
| | - Xiaoyu Kang
- Nanjing Hospital of Chinese Medicine Affiliated to Nanjing University of Chinese Medicine, Nanjing, China
| | - Jing Yan
- The Comprehensive Cancer Centre of Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
| | - Baorui Liu
- The Comprehensive Cancer Centre of Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
| | - Mi Yang
- The Comprehensive Cancer Centre of Nanjing Drum Tower Hospital, Nanjing Drum Tower Hospital Clinical College of Traditional Chinese and Western Medicine, Nanjing University of Chinese Medicine, Nanjing, China.
- The Comprehensive Cancer Centre of Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China.
| |
Collapse
|
6
|
Lac L, Leung CK, Hu P. Computational frameworks integrating deep learning and statistical models in mining multimodal omics data. J Biomed Inform 2024; 152:104629. [PMID: 38552994 DOI: 10.1016/j.jbi.2024.104629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2024] [Revised: 02/26/2024] [Accepted: 03/25/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND In health research, multimodal omics data analysis is widely used to address important clinical and biological questions. Traditional statistical methods rely on the strong assumptions of distribution. Statistical methods such as testing and differential expression are commonly used in omics analysis. Deep learning, on the other hand, is an advanced computer science technique that is powerful in mining high-dimensional omics data for prediction tasks. Recently, integrative frameworks or methods have been developed for omics studies that combine statistical models and deep learning algorithms. METHODS AND RESULTS The aim of these integrative frameworks is to combine the strengths of both statistical methods and deep learning algorithms to improve prediction accuracy while also providing interpretability and explainability. This review report discusses the current state-of-the-art integrative frameworks, their limitations, and potential future directions in survival and time-to-event longitudinal analysis, dimension reduction and clustering, regression and classification, feature selection, and causal and transfer learning.
Collapse
Affiliation(s)
- Leann Lac
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada
| | - Carson K Leung
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada
| | - Pingzhao Hu
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Biochemistry, Western University, London, Ontario, Canada; Department of Computer Science, Western University, London, Ontario, Canada; Department of Oncology, Western University, London, Ontario, Canada; Department of Epidemiology and Biostatistics, Western University, London, Ontario, Canada; The Children's Health Research Institute, Lawson Health Research Institute, London, Ontario, Canada.
| |
Collapse
|
7
|
Yan H, Weng D, Li D, Gu Y, Ma W, Liu Q. Prior knowledge-guided multilevel graph neural network for tumor risk prediction and interpretation via multi-omics data integration. Brief Bioinform 2024; 25:bbae184. [PMID: 38670157 PMCID: PMC11052635 DOI: 10.1093/bib/bbae184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 04/06/2024] [Indexed: 04/28/2024] Open
Abstract
The interrelation and complementary nature of multi-omics data can provide valuable insights into the intricate molecular mechanisms underlying diseases. However, challenges such as limited sample size, high data dimensionality and differences in omics modalities pose significant obstacles to fully harnessing the potential of these data. The prior knowledge such as gene regulatory network and pathway information harbors useful gene-gene interaction and gene functional module information. To effectively integrate multi-omics data and make full use of the prior knowledge, here, we propose a Multilevel-graph neural network (GNN): a hierarchically designed deep learning algorithm that sequentially leverages multi-omics data, gene regulatory networks and pathway information to extract features and enhance accuracy in predicting survival risk. Our method achieved better accuracy compared with existing methods. Furthermore, key factors nonlinearly associated with the tumor pathogenesis are prioritized by employing two interpretation algorithms (i.e. GNN-Explainer and IGscore) for neural networks, at gene and pathway level, respectively. The top genes and pathways exhibit strong associations with disease in survival analyses, many of which such as SEC61G and CYP27B1 are previously reported in the literature.
Collapse
Affiliation(s)
- Hongxi Yan
- Department of Computer Science, Beihang University, XueYuan Road, 100191, BeiJing, China
| | - Dawei Weng
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
| | - Dongguo Li
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
| | - Yu Gu
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
| | - Wenji Ma
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, 200025, Shanghai, China
| | - Qingjie Liu
- Department of Computer Science, Beihang University, XueYuan Road, 100191, BeiJing, China
| |
Collapse
|
8
|
Chai H, Lin S, Lin J, He M, Yang Y, OuYang Y, Zhao H. An uncertainty-based interpretable deep learning framework for predicting breast cancer outcome. BMC Bioinformatics 2024; 25:88. [PMID: 38418940 PMCID: PMC10902951 DOI: 10.1186/s12859-024-05716-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Accepted: 02/21/2024] [Indexed: 03/02/2024] Open
Abstract
BACKGROUND Predicting outcome of breast cancer is important for selecting appropriate treatments and prolonging the survival periods of patients. Recently, different deep learning-based methods have been carefully designed for cancer outcome prediction. However, the application of these methods is still challenged by interpretability. In this study, we proposed a novel multitask deep neural network called UISNet to predict the outcome of breast cancer. The UISNet is able to interpret the importance of features for the prediction model via an uncertainty-based integrated gradients algorithm. UISNet improved the prediction by introducing prior biological pathway knowledge and utilizing patient heterogeneity information. RESULTS The model was tested in seven public datasets of breast cancer, and showed better performance (average C-index = 0.691) than the state-of-the-art methods (average C-index = 0.650, ranged from 0.619 to 0.677). Importantly, the UISNet identified 20 genes as associated with breast cancer, among which 11 have been proven to be associated with breast cancer by previous studies, and others are novel findings of this study. CONCLUSIONS Our proposed method is accurate and robust in predicting breast cancer outcomes, and it is an effective way to identify breast cancer-associated genes. The method codes are available at: https://github.com/chh171/UISNet .
Collapse
Affiliation(s)
- Hua Chai
- School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
| | - Siyin Lin
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China
| | - Junqi Lin
- School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
| | - Minfan He
- School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China
| | - Yongzhong OuYang
- School of Mathematics and Big Data, Foshan University, Foshan, 528000, China.
| | - Huiying Zhao
- Department of Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, 510000, China.
| |
Collapse
|
9
|
Luo H, Liang H, Liu H, Fan Z, Wei Y, Yao X, Cong S. TEMINET: A Co-Informative and Trustworthy Multi-Omics Integration Network for Diagnostic Prediction. Int J Mol Sci 2024; 25:1655. [PMID: 38338932 PMCID: PMC10855161 DOI: 10.3390/ijms25031655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2023] [Revised: 01/20/2024] [Accepted: 01/26/2024] [Indexed: 02/12/2024] Open
Abstract
Advancing the domain of biomedical investigation, integrated multi-omics data have shown exceptional performance in elucidating complex human diseases. However, as the variety of omics information expands, precisely perceiving the informativeness of intra- and inter-omics becomes challenging due to the intricate interrelations, thus presenting significant challenges in the integration of multi-omics data. To address this, we introduce a novel multi-omics integration approach, referred to as TEMINET. This approach enhances diagnostic prediction by leveraging an intra-omics co-informative representation module and a trustworthy learning strategy used to address inter-omics fusion. Considering the multifactorial nature of complex diseases, TEMINET utilizes intra-omics features to construct disease-specific networks; then, it applies graph attention networks and a multi-level framework to capture more collective informativeness than pairwise relations. To perceive the contribution of co-informative representations within intra-omics, we designed a trustworthy learning strategy to identify the reliability of each omics in integration. To integrate inter-omics information, a combined-beliefs fusion approach is deployed to harmonize the trustworthy representations of different omics types effectively. Our experiments across four different diseases using mRNA, methylation, and miRNA data demonstrate that TEMINET achieves advanced performance and robustness in classification tasks.
Collapse
Affiliation(s)
- Haoran Luo
- Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China; (H.L.); (Z.F.)
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (H.L.); (H.L.); (Y.W.)
| | - Hong Liang
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (H.L.); (H.L.); (Y.W.)
| | - Hongwei Liu
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (H.L.); (H.L.); (Y.W.)
| | - Zhoujie Fan
- Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China; (H.L.); (Z.F.)
| | - Yanhui Wei
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (H.L.); (H.L.); (Y.W.)
| | - Xiaohui Yao
- Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China; (H.L.); (Z.F.)
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (H.L.); (H.L.); (Y.W.)
| | - Shan Cong
- Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China; (H.L.); (Z.F.)
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (H.L.); (H.L.); (Y.W.)
| |
Collapse
|
10
|
Hou Z, Leng J, Yu J, Xia Z, Wu LY. PathExpSurv: pathway expansion for explainable survival analysis and disease gene discovery. BMC Bioinformatics 2023; 24:434. [PMID: 37968615 PMCID: PMC10648621 DOI: 10.1186/s12859-023-05535-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Accepted: 10/16/2023] [Indexed: 11/17/2023] Open
Abstract
BACKGROUND In the field of biology and medicine, the interpretability and accuracy are both important when designing predictive models. The interpretability of many machine learning models such as neural networks is still a challenge. Recently, many researchers utilized prior information such as biological pathways to develop neural networks-based methods, so as to provide some insights and interpretability for the models. However, the prior biological knowledge may be incomplete and there still exists some unknown information to be explored. RESULTS We proposed a novel method, named PathExpSurv, to gain an insight into the black-box model of neural network for cancer survival analysis. We demonstrated that PathExpSurv could not only incorporate the known prior information into the model, but also explore the unknown possible expansion to the existing pathways. We performed downstream analyses based on the expanded pathways and successfully identified some key genes associated with the diseases and original pathways. CONCLUSIONS Our proposed PathExpSurv is a novel, effective and interpretable method for survival analysis. It has great utility and value in medical diagnosis and offers a promising framework for biological research.
Collapse
Affiliation(s)
- Zhichao Hou
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Jiacheng Leng
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Jiating Yu
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Zheng Xia
- Computational Biology Program, Oregon Health & Science University, Portland, USA.
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, USA.
| | - Ling-Yun Wu
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China.
| |
Collapse
|
11
|
Chandrashekar PB, Alatkar S, Wang J, Hoffman GE, He C, Jin T, Khullar S, Bendl J, Fullard JF, Roussos P, Wang D. DeepGAMI: deep biologically guided auxiliary learning for multimodal integration and imputation to improve genotype-phenotype prediction. Genome Med 2023; 15:88. [PMID: 37904203 PMCID: PMC10617196 DOI: 10.1186/s13073-023-01248-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2022] [Accepted: 10/16/2023] [Indexed: 11/01/2023] Open
Abstract
BACKGROUND Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but due to the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging. Additionally, the partial availability of these multimodal data presents a challenge in developing these predictive models. METHOD To address these challenges, we developed DeepGAMI, an interpretable neural network model to improve genotype-phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. Additionally, it includes an auxiliary learning layer for cross-modal imputation allowing the imputation of latent features of missing modalities and thus predicting phenotypes from a single modality. Finally, DeepGAMI uses integrated gradient to prioritize multimodal features for various phenotypes. RESULTS We applied DeepGAMI to several multimodal datasets including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. Using cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types, and cellular and clinical phenotypes, even using single modalities (e.g., AUC score of 0.79 for Schizophrenia and 0.73 for cognitive impairment in Alzheimer's disease). CONCLUSION We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets in complex brains and brain diseases. Also, it prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.
Collapse
Affiliation(s)
- Pramod Bharadwaj Chandrashekar
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Sayali Alatkar
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Jiebiao Wang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15261, USA
| | - Gabriel E Hoffman
- Center for Disease Neurogenomics, Department of Psychiatry and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Chenfeng He
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Ting Jin
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Saniya Khullar
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Jaroslav Bendl
- Center for Disease Neurogenomics, Department of Psychiatry and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - John F Fullard
- Center for Disease Neurogenomics, Department of Psychiatry and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Panos Roussos
- Center for Disease Neurogenomics, Department of Psychiatry and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Mental Illness Research, Education and Clinical Centers, James J. Peters VA Medical Center, Bronx, NY, 10468, USA
- Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, 10962, USA
| | - Daifeng Wang
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA.
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA.
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, 53076, USA.
| |
Collapse
|
12
|
Jemimah S, AlShehhi A. c-Diadem: a constrained dual-input deep learning model to identify novel biomarkers in Alzheimer's disease. BMC Med Genomics 2023; 16:244. [PMID: 37833700 PMCID: PMC10571239 DOI: 10.1186/s12920-023-01675-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Accepted: 09/27/2023] [Indexed: 10/15/2023] Open
Abstract
BACKGROUND Alzheimer's disease (AD) is an incurable, debilitating neurodegenerative disorder. Current biomarkers for AD diagnosis require expensive neuroimaging or invasive cerebrospinal fluid sampling, thus precluding early detection. Blood-based biomarker discovery in Alzheimer's can facilitate less-invasive, routine diagnostic tests to aid early intervention. Therefore, we propose "c-Diadem" (constrained dual-input Alzheimer's disease model), a novel deep learning classifier which incorporates KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway constraints on the input genotyping data to predict disease, i.e., mild cognitive impairment (MCI)/AD or cognitively normal (CN). SHAP (SHapley Additive exPlanations) was used to explain the model and identify novel, potential blood-based genetic markers of MCI/AD. METHODS We developed a novel constrained deep learning neural network which utilizes SNPs (single nucleotide polymorphisms) and microarray data from ADNI (Alzheimer's Disease Neuroimaging Initiative) to predict the disease status of participants, i.e., CN or with disease (MCI/AD), and identify potential blood-based biomarkers for diagnosis and intervention. The dataset contains samples from 626 participants, of which 212 are CN (average age 74.6 ± 5.4 years) and 414 patients have MCI/AD (average age 72.7 ± 7.6 years). KEGG pathway information was used to generate constraints applied to the input tensors, thus enhancing the interpretability of the model. SHAP scores were used to identify genes which could potentially serve as biomarkers for diagnosis and targets for drug development. RESULTS Our model's performance, with accuracy of 69% and AUC of 70% in the test dataset, is superior to previous models. The SHAP scores show that SNPs in PRKCZ, PLCB1 and ITPR2 as well as expression of HLA-DQB1, EIF1AY, HLA-DQA1, and ZFP57 have more impact on model predictions. CONCLUSIONS In addition to predicting MCI/AD, our model has been interrogated for potential genetic biomarkers using SHAP. From our analysis, we have identified blood-based genetic markers related to Ca2+ ion release in affected regions of the brain, as well as depression. The findings from our study provides insights into disease mechanisms, and can facilitate innovation in less-invasive, cost-effective diagnostics. To the best of our knowledge, our model is the first to use pathway constraints in a multimodal neural network to identify potential genetic markers for AD.
Collapse
Affiliation(s)
- Sherlyn Jemimah
- Department of Biomedical Engineering, Khalifa University, PO Box 127788, Abu Dhabi, United Arab Emirates
| | - Aamna AlShehhi
- Department of Biomedical Engineering, Khalifa University, PO Box 127788, Abu Dhabi, United Arab Emirates.
| |
Collapse
|
13
|
Pang J, Liang B, Ding R, Yan Q, Chen R, Xu J. A denoised multi-omics integration framework for cancer subtype classification and survival prediction. Brief Bioinform 2023; 24:bbad304. [PMID: 37594302 DOI: 10.1093/bib/bbad304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Revised: 07/04/2023] [Accepted: 08/04/2023] [Indexed: 08/19/2023] Open
Abstract
The availability of high-throughput sequencing data creates opportunities to comprehensively understand human diseases as well as challenges to train machine learning models using such high dimensions of data. Here, we propose a denoised multi-omics integration framework, which contains a distribution-based feature denoising algorithm, Feature Selection with Distribution (FSD), for dimension reduction and a multi-omics integration framework, Attention Multi-Omics Integration (AttentionMOI) to predict cancer prognosis and identify cancer subtypes. We demonstrated that FSD improved model performance either using single omic data or multi-omics data in 15 The Cancer Genome Atlas Program (TCGA) cancers for survival prediction and kidney cancer subtype identification. And our integration framework AttentionMOI outperformed machine learning models and current multi-omics integration algorithms with high dimensions of features. Furthermore, FSD identified features that were associated to cancer prognosis and could be considered as biomarkers.
Collapse
Affiliation(s)
- Jiali Pang
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Bilin Liang
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Ruifeng Ding
- Department of Anesthesiology, Changzheng Hospital, Second Affiliated Hospital of Naval Medical University, Shanghai, China
| | - Qiujuan Yan
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Ruiyao Chen
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Jie Xu
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| |
Collapse
|
14
|
Chen C, Wang J, Pan D, Wang X, Xu Y, Yan J, Wang L, Yang X, Yang M, Liu G. Applications of multi-omics analysis in human diseases. MedComm (Beijing) 2023; 4:e315. [PMID: 37533767 PMCID: PMC10390758 DOI: 10.1002/mco2.315] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2023] [Revised: 05/25/2023] [Accepted: 05/31/2023] [Indexed: 08/04/2023] Open
Abstract
Multi-omics usually refers to the crossover application of multiple high-throughput screening technologies represented by genomics, transcriptomics, single-cell transcriptomics, proteomics and metabolomics, spatial transcriptomics, and so on, which play a great role in promoting the study of human diseases. Most of the current reviews focus on describing the development of multi-omics technologies, data integration, and application to a particular disease; however, few of them provide a comprehensive and systematic introduction of multi-omics. This review outlines the existing technical categories of multi-omics, cautions for experimental design, focuses on the integrated analysis methods of multi-omics, especially the approach of machine learning and deep learning in multi-omics data integration and the corresponding tools, and the application of multi-omics in medical researches (e.g., cancer, neurodegenerative diseases, aging, and drug target discovery) as well as the corresponding open-source analysis tools and databases, and finally, discusses the challenges and future directions of multi-omics integration and application in precision medicine. With the development of high-throughput technologies and data integration algorithms, as important directions of multi-omics for future disease research, single-cell multi-omics and spatial multi-omics also provided a detailed introduction. This review will provide important guidance for researchers, especially who are just entering into multi-omics medical research.
Collapse
Affiliation(s)
- Chongyang Chen
- Key Laboratory of Nuclear MedicineMinistry of HealthJiangsu Key Laboratory of Molecular Nuclear MedicineJiangsu Institute of Nuclear MedicineWuxiChina
- Co‐innovation Center of NeurodegenerationNantong UniversityNantongChina
| | - Jing Wang
- Shenzhen Key Laboratory of Modern ToxicologyShenzhen Medical Key Discipline of Health Toxicology (2020–2024)Shenzhen Center for Disease Control and PreventionShenzhenChina
| | - Donghui Pan
- Key Laboratory of Nuclear MedicineMinistry of HealthJiangsu Key Laboratory of Molecular Nuclear MedicineJiangsu Institute of Nuclear MedicineWuxiChina
| | - Xinyu Wang
- Key Laboratory of Nuclear MedicineMinistry of HealthJiangsu Key Laboratory of Molecular Nuclear MedicineJiangsu Institute of Nuclear MedicineWuxiChina
| | - Yuping Xu
- Key Laboratory of Nuclear MedicineMinistry of HealthJiangsu Key Laboratory of Molecular Nuclear MedicineJiangsu Institute of Nuclear MedicineWuxiChina
| | - Junjie Yan
- Key Laboratory of Nuclear MedicineMinistry of HealthJiangsu Key Laboratory of Molecular Nuclear MedicineJiangsu Institute of Nuclear MedicineWuxiChina
| | - Lizhen Wang
- Key Laboratory of Nuclear MedicineMinistry of HealthJiangsu Key Laboratory of Molecular Nuclear MedicineJiangsu Institute of Nuclear MedicineWuxiChina
| | - Xifei Yang
- Shenzhen Key Laboratory of Modern ToxicologyShenzhen Medical Key Discipline of Health Toxicology (2020–2024)Shenzhen Center for Disease Control and PreventionShenzhenChina
| | - Min Yang
- Key Laboratory of Nuclear MedicineMinistry of HealthJiangsu Key Laboratory of Molecular Nuclear MedicineJiangsu Institute of Nuclear MedicineWuxiChina
| | - Gong‐Ping Liu
- Co‐innovation Center of NeurodegenerationNantong UniversityNantongChina
- Department of PathophysiologySchool of Basic MedicineKey Laboratory of Ministry of Education of China and Hubei Province for Neurological DisordersTongji Medical CollegeHuazhong University of Science and TechnologyWuhanChina
| |
Collapse
|
15
|
Raudenska M, Vicar T, Gumulec J, Masarik M. Johann Gregor Mendel: the victory of statistics over human imagination. Eur J Hum Genet 2023; 31:744-748. [PMID: 36755104 PMCID: PMC9909140 DOI: 10.1038/s41431-023-01303-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2022] [Revised: 01/11/2023] [Accepted: 01/24/2023] [Indexed: 02/10/2023] Open
Abstract
In 2022, we celebrated 200 years since the birth of Johann Gregor Mendel. Although his contributions to science went unrecognized during his lifetime, Mendel not only described the principles of monogenic inheritance but also pioneered the modern way of doing science based on precise experimental data acquisition and evaluation. Novel statistical and algorithmic approaches are now at the center of scientific work, showing that work that is considered marginal in one era can become a mainstream research approach in the next era. The onset of data-driven science caused a shift from hypothesis-testing to hypothesis-generating approaches in science. Mendel is remembered here as a promoter of this approach, and the benefits of big data and statistical approaches are discussed.
Collapse
Affiliation(s)
- Martina Raudenska
- Department of Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic
- Department of Pathological Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic
| | - Tomas Vicar
- Department of Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 3058/10, Brno, Czech Republic
| | - Jaromir Gumulec
- Department of Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic
- Department of Pathological Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic
| | - Michal Masarik
- Department of Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic.
- Department of Pathological Physiology, Faculty of Medicine, Masaryk University/Kamenice 5, CZ-625 00, Brno, Czech Republic.
- BIOCEV, First Faculty of Medicine, Charles University, Prumyslova 595, CZ-252 50, Vestec, Czech Republic.
| |
Collapse
|
16
|
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. METHODS This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS We discuss the recent evolutionary arch of DL models in the direction of integrating prior biological relational and network knowledge to support better generalisation (e.g. pathways or Protein-Protein-Interaction networks) and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and according to its taxonomy, we discuss representational methodologies for the integration of domain prior knowledge in such models. CONCLUSIONS The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.
Collapse
Affiliation(s)
- Magdalena Wysocka
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9 PL, UK.
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9 PL, UK.
| | - Oskar Wysocki
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9 PL, UK.
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9 PL, UK.
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920, Martigny, Switzerland.
| | - Marie Zufferey
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920, Martigny, Switzerland
| | - Dónal Landers
- DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP, UK
| | - André Freitas
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9 PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9 PL, UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920, Martigny, Switzerland
| |
Collapse
|
17
|
Zhao L, Qi X, Chen Y, Qiao Y, Bu D, Wu Y, Luo Y, Wang S, Zhang R, Zhao Y. Biological knowledge graph-guided investigation of immune therapy response in cancer with graph neural network. Brief Bioinform 2023; 24:6995380. [PMID: 36682018 DOI: 10.1093/bib/bbad023] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2022] [Revised: 12/18/2022] [Accepted: 01/07/2023] [Indexed: 01/23/2023] Open
Abstract
The determination of transcriptome profiles that mediate immune therapy in cancer remains a major clinical and biological challenge. Despite responses induced by immune-check points inhibitors (ICIs) in diverse tumor types and all the big breakthroughs in cancer immunotherapy, most patients with solid tumors do not respond to ICI therapies. It still remains a big challenge to predict the ICI treatment response. Here, we propose a framework with multiple prior knowledge networks guided for immune checkpoints inhibitors prediction-DeepOmix-ICI (or ICInet for short). ICInet can predict the immune therapy response by leveraging geometric deep learning and prior biological knowledge graphs of gene-gene interactions. Here, we demonstrate more than 600 ICI-treated patients with ICI response data and gene expression profile to apply on ICInet. ICInet was used for ICI therapy responses prediciton across different cancer types-melanoma, gastric cancer and bladder cancer, which includes 7 cohorts from different data sources. ICInet is able to robustly generalize into multiple cancer types. Moreover, the performance of ICInet in those cancer types can outperform other ICI biomarkers in the clinic. Our model [area under the curve (AUC = 0.85)] generally outperformed other measures, including tumor mutational burden (AUC = 0.62) and programmed cell death ligand-1 score (AUC = 0.74). Therefore, our study presents a prior-knowledge guided deep learning method to effectively select immunotherapy-response-associated biomarkers, thereby improving the prediction of immunotherapy response for precision oncology.
Collapse
Affiliation(s)
- Lianhe Zhao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Xiaoning Qi
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yang Chen
- The First People's Hospital of Yunnan Province, Kunming, 650032, Yunnan, China
| | - Yixuan Qiao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Dechao Bu
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Yang Wu
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Yufan Luo
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Sheng Wang
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | | | - Yi Zhao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| |
Collapse
|
18
|
Li Z, Yang N, He L, Wang J, Ping F, Li W, Xu L, Zhang H, Li Y. Development and validation of questionnaire-based machine learning models for predicting all-cause mortality in a representative population of China. Front Public Health 2023; 11:1033070. [PMID: 36778549 PMCID: PMC9911458 DOI: 10.3389/fpubh.2023.1033070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2022] [Accepted: 01/11/2023] [Indexed: 01/28/2023] Open
Abstract
Background Considering that the previously developed mortality prediction models have limited applications to the Chinese population, a questionnaire-based prediction model is of great importance for its accuracy and convenience in clinical practice. Methods Two national cohort, namely, the China Health and Nutrition Survey (8,355 individual older than 18) and the China Health and Retirement Longitudinal Study (12,711 individuals older than 45) were used for model development and validation. One hundred and fifty-nine variables were compiled to generate predictions. The Cox regression model and six machine learning (ML) models were used to predict all-cause mortality. Finally, a simple questionnaire-based ML prediction model was developed using the best algorithm and validated. Results In the internal validation set, all the ML models performed better than the traditional Cox model in predicting 6-year mortality and the random survival forest (RSF) model performed best. The questionnaire-based ML model, which only included 20 variables, achieved a C-index of 0.86 (95%CI: 0.80-0.92). On external validation, the simple questionnaire-based model achieved a C-index of 0.82 (95%CI: 0.77-0.87), 0.77 (95%CI: 0.75-0.79), and 0.79 (95%CI: 0.77-0.81), respectively, in predicting 2-, 9-, and 11-year mortality. Conclusions In this prospective population-based study, a model based on the RSF analysis performed best among all models. Furthermore, there was no significant difference between the prediction performance of the questionnaire-based ML model, which only included 20 variables, and that of the model with all variables (including laboratory variables). The simple questionnaire-based ML prediction model, which needs to be further explored, is of great importance for its accuracy and suitability to the Chinese general population.
Collapse
|
19
|
Wang S, Wang S, Wang Z. A survey on multi-omics-based cancer diagnosis using machine learning with the potential application in gastrointestinal cancer. Front Med (Lausanne) 2023; 9:1109365. [PMID: 36703893 PMCID: PMC9871466 DOI: 10.3389/fmed.2022.1109365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2022] [Accepted: 12/28/2022] [Indexed: 01/12/2023] Open
Abstract
Gastrointestinal cancer is becoming increasingly common, which leads to over 3 million deaths every year. No typical symptoms appear in the early stage of gastrointestinal cancer, posing a significant challenge in the diagnosis and treatment of patients with gastrointestinal cancer. Many patients are in the middle and late stages of gastrointestinal cancer when they feel uncomfortable, unfortunately, most of them will die of gastrointestinal cancer. Recently, various artificial intelligence techniques like machine learning based on multi-omics have been presented for cancer diagnosis and treatment in the era of precision medicine. This paper provides a survey on multi-omics-based cancer diagnosis using machine learning with potential application in gastrointestinal cancer. Particularly, we make a comprehensive summary and analysis from the perspective of multi-omics datasets, task types, and multi-omics-based integration methods. Furthermore, this paper points out the remaining challenges of multi-omics-based cancer diagnosis using machine learning and discusses future topics.
Collapse
Affiliation(s)
- Suixue Wang
- School of Information and Communication Engineering, Hainan University, Haikou, China
| | - Shuling Wang
- Department of Neurology, Affiliated Haikou Hospital of Xiangya School of Medicine, Central South University, Haikou, China
| | - Zhengxia Wang
- School of Computer Science and Technology, Hainan University, Haikou, China
| |
Collapse
|
20
|
Doan LMT, Angione C, Occhipinti A. Machine Learning Methods for Survival Analysis with Clinical and Transcriptomics Data of Breast Cancer. Methods Mol Biol 2023; 2553:325-393. [PMID: 36227551 DOI: 10.1007/978-1-0716-2617-7_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Breast cancer is one of the most common cancers in women worldwide, which causes an enormous number of deaths annually. However, early diagnosis of breast cancer can improve survival outcomes enabling simpler and more cost-effective treatments. The recent increase in data availability provides unprecedented opportunities to apply data-driven and machine learning methods to identify early-detection prognostic factors capable of predicting the expected survival and potential sensitivity to treatment of patients, with the final aim of enhancing clinical outcomes. This tutorial presents a protocol for applying machine learning models in survival analysis for both clinical and transcriptomic data. We show that integrating clinical and mRNA expression data is essential to explain the multiple biological processes driving cancer progression. Our results reveal that machine-learning-based models such as random survival forests, gradient boosted survival model, and survival support vector machine can outperform the traditional statistical methods, i.e., Cox proportional hazard model. The highest C-index among the machine learning models was recorded when using survival support vector machine, with a value 0.688, whereas the C-index recorded using the Cox model was 0.677. Shapley Additive Explanation (SHAP) values were also applied to identify the feature importance of the models and their impact on the prediction outcomes.
Collapse
Affiliation(s)
- Le Minh Thao Doan
- School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
| | - Claudio Angione
- School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
- Centre for Digital Innovation, Teesside University, Middlesbrough, UK
- Healthcare Innovation Centre, Teesside University, Middlesbrough, UK
- National Horizons Centre, Teesside University, Darlington, UK
| | - Annalisa Occhipinti
- School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK.
- Centre for Digital Innovation, Teesside University, Middlesbrough, UK.
- National Horizons Centre, Teesside University, Darlington, UK.
| |
Collapse
|
21
|
Krzyziński M, Spytek M, Baniecki H, Biecek P. SurvSHAP(t): Time-dependent explanations of machine learning survival models. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
|
22
|
Chen S, Zang Y, Xu B, Lu B, Ma R, Miao P, Chen B. An Unsupervised Deep Learning-Based Model Using Multiomics Data to Predict Prognosis of Patients with Stomach Adenocarcinoma. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:5844846. [PMID: 36339684 PMCID: PMC9633210 DOI: 10.1155/2022/5844846] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/19/2022] [Revised: 09/25/2022] [Accepted: 10/08/2022] [Indexed: 09/08/2023]
Abstract
METHODS Patients (363 in total) with stomach adenocarcinoma from The Cancer Genome Atlas (TCGA) cohort were included. An autoencoder was constructed to integrate the RNA sequencing, miRNA sequencing, and methylation data. The features of the bottleneck layer were used to perform the k-means clustering algorithm to obtain different subgroups for evaluating the prognosis-related risk of stomach adenocarcinoma. The model's robustness was verified using a 10-fold cross-validation (CV). Survival was analyzed by the Kaplan-Meier method. Univariate and multivariate Cox regression was used to estimate hazard risk. The model was validated in three independent cohorts with different endpoints. RESULTS The patients were divided into low-risk and high-risk groups according to the k-means clustering algorithm. The high-risk group had a significantly higher risk of poor survival (log-rank P value = 2.80e - 06; adjusted hazard ratio = 2.386, 95% confidence interval: 1.607~3.543), a concordance index (C-index) of 0.714, and a Brier score of 0.184. The model performed well both in the 10-fold CV procedure and three independent cohorts from the Gene Expression Omnibus (GEO) repository. CONCLUSIONS A robust and generalizable model based on the autoencoder was proposed to integrate multiomics data and predict the prognosis of patients with stomach adenocarcinoma. The model demonstrates better performance than two alternative approaches on prognosis prediction. The results might provide the grounds for further exploring the potential biomarkers to predict the prognosis of patients with stomach adenocarcinoma.
Collapse
Affiliation(s)
- Sizhen Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
| | - Yiteng Zang
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
| | - Biyun Xu
- Department of Biostatistics, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing 210008, China
| | - Beier Lu
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
| | - Rongji Ma
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
| | - Pengcheng Miao
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
| | - Bingwei Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
| |
Collapse
|
23
|
Pfeifer B, Baniecki H, Saranti A, Biecek P, Holzinger A. Multi-omics disease module detection with an explainable Greedy Decision Forest. Sci Rep 2022; 12:16857. [PMID: 36207536 PMCID: PMC9546860 DOI: 10.1038/s41598-022-21417-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2022] [Accepted: 09/27/2022] [Indexed: 11/09/2022] Open
Abstract
Machine learning methods can detect complex relationships between variables, but usually do not exploit domain knowledge. This is a limitation because in many scientific disciplines, such as systems biology, domain knowledge is available in the form of graphs or networks, and its use can improve model performance. We need network-based algorithms that are versatile and applicable in many research areas. In this work, we demonstrate subnetwork detection based on multi-modal node features using a novel Greedy Decision Forest (GDF) with inherent interpretability. The latter will be a crucial factor to retain experts and gain their trust in such algorithms. To demonstrate a concrete application example, we focus on bioinformatics, systems biology and particularly biomedicine, but the presented methodology is applicable in many other domains as well. Systems biology is a good example of a field in which statistical data-driven machine learning enables the analysis of large amounts of multi-modal biomedical data. This is important to reach the future goal of precision medicine, where the complexity of patients is modeled on a system level to best tailor medical decisions, health practices and therapies to the individual patient. Our proposed explainable approach can help to uncover disease-causing network modules from multi-omics data to better understand complex diseases such as cancer.
Collapse
Affiliation(s)
- Bastian Pfeifer
- Institute for Medical Informatics Statistics and Documentation, Medical University Graz, Graz, Austria.
| | - Hubert Baniecki
- MI2DataLab, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
| | - Anna Saranti
- Institute for Medical Informatics Statistics and Documentation, Medical University Graz, Graz, Austria.,Human-Centered AI Lab, University of Natural Resources and Life Sciences, Vienna, Austria
| | - Przemyslaw Biecek
- MI2DataLab, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
| | - Andreas Holzinger
- Institute for Medical Informatics Statistics and Documentation, Medical University Graz, Graz, Austria.,Human-Centered AI Lab, University of Natural Resources and Life Sciences, Vienna, Austria.,Alberta Machine Intelligence Institute, Alberta, Canada
| |
Collapse
|
24
|
P D, C G. A systematic review on machine learning and deep learning techniques in cancer survival prediction. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2022; 174:62-71. [PMID: 35933043 DOI: 10.1016/j.pbiomolbio.2022.07.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Revised: 07/13/2022] [Accepted: 07/19/2022] [Indexed: 06/15/2023]
Abstract
Cancer is a disease which is characterised by the unusual and uncontrollable growth of body cells. This usually happens asymptomatically and gets spread to other parts of the body. The major problem in treating cancer is that its progress is not monitored once it is diagnosed. The progress or the prognosis can be done through survival analysis. The survival analysis is the branch of statistics that deals in predicting the time of event of occurrence. In the case of cancer prognosis the event is the survival time of the patient from the onset of the disease or it can be the recurrence of the disease after undergoing a treatment. This study aims to bring out the machine learning and deep learning models involved in providing the prognosis to the cancer patients.
Collapse
Affiliation(s)
- Deepa P
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
| | - Gunavathi C
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
| |
Collapse
|
25
|
Liang B, Gong H, Lu L, Xu J. Risk stratification and pathway analysis based on graph neural network and interpretable algorithm. BMC Bioinformatics 2022; 23:394. [PMID: 36167504 PMCID: PMC9516820 DOI: 10.1186/s12859-022-04950-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Accepted: 09/19/2022] [Indexed: 12/01/2022] Open
Abstract
Background Pathway-based analysis of transcriptomic data has shown greater stability and better performance than traditional gene-based analysis. Until now, some pathway-based deep learning models have been developed for bioinformatic analysis, but these models have not fully considered the topological features of pathways, which limits the performance of the final prediction result. Results To address this issue, we propose a novel model, called PathGNN, which constructs a Graph Neural Networks (GNNs) model that can capture topological features of pathways. As a case, PathGNN was applied to predict long-term survival of four types of cancer and achieved promising predictive performance when compared to other common methods. Furthermore, the adoption of an interpretation algorithm enabled the identification of plausible pathways associated with survival. Conclusion PathGNN demonstrates that GNN can be effectively applied to build a pathway-based model, resulting in promising predictive power. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04950-1.
Collapse
Affiliation(s)
- Bilin Liang
- Shanghai Artificial Intelligence Laboratory, Yunjing Road 701, Shanghai, China
| | - Haifan Gong
- Shanghai Artificial Intelligence Laboratory, Yunjing Road 701, Shanghai, China
| | - Lu Lu
- Shanghai Artificial Intelligence Laboratory, Yunjing Road 701, Shanghai, China
| | - Jie Xu
- Shanghai Artificial Intelligence Laboratory, Yunjing Road 701, Shanghai, China.
| |
Collapse
|
26
|
Qiao Y, Zhao L, Luo C, Luo Y, Wu Y, Li S, Bu D, Zhao Y. Multi-modality artificial intelligence in digital pathology. Brief Bioinform 2022; 23:6702380. [PMID: 36124675 PMCID: PMC9677480 DOI: 10.1093/bib/bbac367] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2022] [Revised: 07/27/2022] [Accepted: 08/05/2022] [Indexed: 12/14/2022] Open
Abstract
In common medical procedures, the time-consuming and expensive nature of obtaining test results plagues doctors and patients. Digital pathology research allows using computational technologies to manage data, presenting an opportunity to improve the efficiency of diagnosis and treatment. Artificial intelligence (AI) has a great advantage in the data analytics phase. Extensive research has shown that AI algorithms can produce more up-to-date and standardized conclusions for whole slide images. In conjunction with the development of high-throughput sequencing technologies, algorithms can integrate and analyze data from multiple modalities to explore the correspondence between morphological features and gene expression. This review investigates using the most popular image data, hematoxylin-eosin stained tissue slide images, to find a strategic solution for the imbalance of healthcare resources. The article focuses on the role that the development of deep learning technology has in assisting doctors' work and discusses the opportunities and challenges of AI.
Collapse
Affiliation(s)
- Yixuan Qiao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Lianhe Zhao
- Corresponding authors: Yi Zhao, Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences; Shandong First Medical University & Shandong Academy of Medical Sciences. Tel.: +86 10 6260 0822; Fax: +86 10 6260 1356; E-mail: ; Lianhe Zhao, Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences. Tel.: +86 18513983324; E-mail:
| | - Chunlong Luo
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yufan Luo
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yang Wu
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Shengtong Li
- Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Dechao Bu
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Yi Zhao
- Corresponding authors: Yi Zhao, Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences; Shandong First Medical University & Shandong Academy of Medical Sciences. Tel.: +86 10 6260 0822; Fax: +86 10 6260 1356; E-mail: ; Lianhe Zhao, Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences. Tel.: +86 18513983324; E-mail:
| |
Collapse
|
27
|
Combining Molecular, Imaging, and Clinical Data Analysis for Predicting Cancer Prognosis. Cancers (Basel) 2022; 14:cancers14133215. [PMID: 35804988 PMCID: PMC9265023 DOI: 10.3390/cancers14133215] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/27/2022] [Indexed: 02/04/2023] Open
Abstract
Simple Summary The rise of Big Data, the widespread use of Machine Learning, and the cheapening of omics techniques have allowed for the creation of more sophisticated and accurate models in biomedical research. This article presents the state-of-the-art predictive models of cancer prognosis that use multimodal data, considering clinical, molecular (omics and non-omics), and image data. The subject of study, the data modalities used, the data processing and modelling methods applied, the validation strategies involved, the integration strategies encompassed, and the evolution of prognostic predictive models are discussed. Finally, we discuss challenges and opportunities in this field of cancer research, with great potential impact on the clinical management of patients and, by extension, on the implementation of personalised and precision medicine. Abstract Cancer is one of the most detrimental diseases globally. Accordingly, the prognosis prediction of cancer patients has become a field of interest. In this review, we have gathered 43 state-of-the-art scientific papers published in the last 6 years that built cancer prognosis predictive models using multimodal data. We have defined the multimodality of data as four main types: clinical, anatomopathological, molecular, and medical imaging; and we have expanded on the information that each modality provides. The 43 studies were divided into three categories based on the modelling approach taken, and their characteristics were further discussed together with current issues and future trends. Research in this area has evolved from survival analysis through statistical modelling using mainly clinical and anatomopathological data to the prediction of cancer prognosis through a multi-faceted data-driven approach by the integration of complex, multimodal, and high-dimensional data containing multi-omics and medical imaging information and by applying Machine Learning and, more recently, Deep Learning techniques. This review concludes that cancer prognosis predictive multimodal models are capable of better stratifying patients, which can improve clinical management and contribute to the implementation of personalised medicine as well as provide new and valuable knowledge on cancer biology and its progression.
Collapse
|
28
|
Leveraging Deep Learning Techniques and Integrated Omics Data for Tailored Treatment of Breast Cancer. J Pers Med 2022; 12:jpm12050674. [PMID: 35629097 PMCID: PMC9147748 DOI: 10.3390/jpm12050674] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2022] [Revised: 03/06/2022] [Accepted: 04/14/2022] [Indexed: 12/12/2022] Open
Abstract
Multiomics data of cancer patients and cell lines, in synergy with deep learning techniques, have aided in unravelling predictive problems related to cancer research and treatment. However, there is still room for improvement in the performance of the existing models based on the aforementioned combination. In this work, we propose two models that complement the treatment of breast cancer patients. First, we discuss our deep learning-based model for breast cancer subtype classification. Second, we propose DCNN-DR, a deep convolute.ion neural network-drug response method for predicting the effectiveness of drugs on in vitro and in vivo breast cancer datasets. Finally, we applied DCNN-DR for predicting effective drugs for the basal-like breast cancer subtype and validated the results with the information available in the literature. The models proposed use late integration methods and have fairly better predictive performance compared to the existing methods. We use the Pearson correlation coefficient and accuracy as the performance measures for the regression and classification models, respectively.
Collapse
|
29
|
Stahlschmidt SR, Ulfenborg B, Synnergren J. Multimodal deep learning for biomedical data fusion: a review. Brief Bioinform 2022; 23:6516346. [PMID: 35089332 PMCID: PMC8921642 DOI: 10.1093/bib/bbab569] [Citation(s) in RCA: 68] [Impact Index Per Article: 34.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2021] [Revised: 12/06/2021] [Accepted: 12/11/2021] [Indexed: 02/06/2023] Open
Abstract
Biomedical data are becoming increasingly multimodal and thereby capture the underlying complex relationships among biological processes. Deep learning (DL)-based data fusion strategies are a popular approach for modeling these nonlinear relationships. Therefore, we review the current state-of-the-art of such methods and propose a detailed taxonomy that facilitates more informed choices of fusion strategies for biomedical applications, as well as research on novel methods. By doing so, we find that deep fusion strategies often outperform unimodal and shallow approaches. Additionally, the proposed subcategories of fusion strategies show different advantages and drawbacks. The review of current methods has shown that, especially for intermediate fusion strategies, joint representation learning is the preferred approach as it effectively models the complex interactions of different levels of biological organization. Finally, we note that gradual fusion, based on prior biological knowledge or on search strategies, is a promising future research path. Similarly, utilizing transfer learning might overcome sample size limitations of multimodal data sets. As these data sets become increasingly available, multimodal DL approaches present the opportunity to train holistic models that can learn the complex regulatory dynamics behind health and disease.
Collapse
Affiliation(s)
| | | | - Jane Synnergren
- Systems Biology Research Center, University of Skövde, Sweden
| |
Collapse
|