1
Ma W, Li M, Chu Z, Chen H. Smart Biosensor for Breast Cancer Survival Prediction Based on Multi-View Multi-Way Graph Learning. Sensors (Basel, Switzerland) 2024; 24:3289. [PMID: 38894082 PMCID: PMC11174864 DOI: 10.3390/s24113289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 05/17/2024] [Accepted: 05/19/2024] [Indexed: 06/21/2024]
Abstract
Biosensors play a crucial role in detecting cancer signals by orchestrating a series of intricate biological and physical transduction processes. Among various cancers, breast cancer stands out due to its genetic underpinnings, which trigger uncontrolled cell proliferation, predominantly impacting women, and resulting in significant mortality rates. The utilization of biosensors in predicting survival time becomes paramount in formulating an optimal treatment strategy. However, conventional biosensors employing traditional machine learning methods encounter challenges in preprocessing features for the learning task. Despite the potential of deep learning techniques to automatically extract useful features, they often struggle to effectively leverage the intricate relationships between features and instances. To address this challenge, our study proposes a novel smart biosensor architecture that integrates a multi-view multi-way graph learning (MVMWGL) approach for predicting breast cancer survival time. This innovative approach enables the assimilation of insights from gene interactions and biosensor similarities. By leveraging real-world data, we conducted comprehensive evaluations, and our experimental results unequivocally demonstrate the superiority of the MVMWGL approach over existing methods.
Affiliation(s)
- Wenming Ma
  - School of Computer and Control Engineering, Yantai University, Yantai 264005, China; (M.L.); (Z.C.); (H.C.)
2
Drouard G, Mykkänen J, Heiskanen J, Pohjonen J, Ruohonen S, Pahkala K, Lehtimäki T, Wang X, Ollikainen M, Ripatti S, Pirinen M, Raitakari O, Kaprio J. Exploring machine learning strategies for predicting cardiovascular disease risk factors from multi-omic data. BMC Med Inform Decis Mak 2024; 24:116. [PMID: 38698395 PMCID: PMC11064347 DOI: 10.1186/s12911-024-02521-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 04/29/2024] [Indexed: 05/05/2024] Open
Abstract
BACKGROUND Machine learning (ML) classifiers are increasingly used for predicting cardiovascular disease (CVD) and related risk factors using omics data, although these outcomes often exhibit categorical nature and class imbalances. However, little is known about which ML classifier, omics data, or upstream dimension reduction strategy has the strongest influence on prediction quality in such settings. Our study aimed to illustrate and compare different machine learning strategies to predict CVD risk factors under different scenarios. METHODS We compared the use of six ML classifiers in predicting CVD risk factors using blood-derived metabolomics, epigenetics and transcriptomics data. Upstream omic dimension reduction was performed using either unsupervised or semi-supervised autoencoders, whose downstream ML classifier performance we compared. CVD risk factors included systolic and diastolic blood pressure measurements and ultrasound-based biomarkers of left ventricular diastolic dysfunction (LVDD; E/e' ratio, E/A ratio, LAVI) collected from 1,249 Finnish participants, of which 80% were used for model fitting. We predicted individuals with low, high or average levels of CVD risk factors, the latter class being the most common. We constructed multi-omic predictions using a meta-learner that weighted single-omic predictions. Model performance comparisons were based on the F1 score. Finally, we investigated whether learned omic representations from pre-trained semi-supervised autoencoders could improve outcome prediction in an external cohort using transfer learning. RESULTS Depending on the ML classifier or omic used, the quality of single-omic predictions varied. Multi-omics predictions outperformed single-omics predictions in most cases, particularly in the prediction of individuals with high or low CVD risk factor levels. Semi-supervised autoencoders improved downstream predictions compared to the use of unsupervised autoencoders. 
In addition, median gains in Area Under the Curve by transfer learning compared to modelling from scratch ranged from 0.09 to 0.14 and 0.07 to 0.11 units for transcriptomic and metabolomic data, respectively. CONCLUSIONS By illustrating the use of different machine learning strategies in different scenarios, our study provides a platform for researchers to evaluate how the choice of omics, ML classifiers, and dimension reduction can influence the quality of CVD risk factor predictions.
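As a rough illustration of the F1-based comparison described in this abstract, the sketch below computes a per-class F1 score for a three-class outcome (low, average, high). The function name `f1_per_class` and the toy labels are assumptions made for illustration; the abstract does not state how F1 was aggregated across classes, so no averaging is applied here.

```python
def f1_per_class(y_true, y_pred, labels):
    """Return {label: F1} computed one-vs-rest for each class label."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy data: "avg" is the majority class, as in the abstract.
y_true = ["low", "avg", "avg", "high", "avg", "high"]
y_pred = ["low", "avg", "high", "high", "avg", "avg"]
f1_scores = f1_per_class(y_true, y_pred, ["low", "avg", "high"])
print(f1_scores)
```

Reporting the score per class rather than as a single number makes it easier to see whether a model does well only on the common "average" class.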
Affiliation(s)
- Gabin Drouard
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
- Juha Mykkänen
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Jarkko Heiskanen
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Joona Pohjonen
  - Research Program in Systems Oncology, University of Helsinki, Helsinki, Finland
- Saku Ruohonen
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Katja Pahkala
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
  - Paavo Nurmi Centre & Unit for Health and Physical Activity, University of Turku, Turku, Finland
- Terho Lehtimäki
  - Department of Clinical Chemistry, Fimlab Laboratories, and Finnish Cardiovascular Research Center - Tampere, Faculty of Medicine and Health Technology, Tampere University, 33520, Tampere, Finland
- Xiaoling Wang
  - Georgia Prevention Institute, Medical College of Georgia, Augusta University, Augusta, GA, USA
- Miina Ollikainen
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Minerva Foundation Institute for Medical Research, Helsinki, Finland
- Samuli Ripatti
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Matti Pirinen
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Olli Raitakari
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
  - Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland
- Jaakko Kaprio
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
3
Katole VR, Kaple M. Unraveling the Landscape of Pediatric Glioblastoma Biomarkers: A Comprehensive Review of Enhancing Diagnostics and Therapeutic Insights. Cureus 2024; 16:e57272. [PMID: 38686271 PMCID: PMC11057698 DOI: 10.7759/cureus.57272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 03/28/2024] [Indexed: 05/02/2024] Open
Abstract
Glioblastoma, the most common and aggressive form of primary brain tumor, poses significant challenges to patients, caregivers, and clinicians alike. Pediatric glioblastoma is a rare and aggressive brain tumor that presents unique challenges in treatment. It differs from its adult counterpart in terms of genetic and molecular characteristics. Its incidence is relatively low, but the prognosis remains grim due to its aggressive behavior. Diagnosis relies on imaging techniques and histopathological analysis. The rarity of the disease underscores the need for effective treatment strategies. In recent years, the quest to understand and manage pediatric glioblastoma has seen a significant shift towards unraveling the intricate landscape of biomarkers. Surgery remains a cornerstone of glioblastoma management, aiming to resect as much of the tumor as possible. Glioblastoma's infiltrative nature presents challenges in achieving a complete surgical resection. This comprehensive review delves into the realm of pediatric glioblastoma biomarkers, shedding light on their potential to not only revolutionize diagnostics but also shape therapeutic strategies. From personalized treatment selection to the development of targeted therapies, the potential impact of these biomarkers on clinical outcomes is undeniable. Moreover, this review underscores the substantial implications of biomarker-driven approaches for therapeutic interventions. All advancements in targeted therapies and immunotherapy hold promise for the treatment of pediatric glioblastoma. The genetic profiling of tumors allows for personalized approaches, potentially improving treatment efficacy. The ethical dilemmas surrounding pediatric cancer treatment, particularly balancing potential benefits with risks, are complex. Ongoing clinical trials and preclinical research suggest exciting avenues for future interventions.
Affiliation(s)
- Vedant R Katole
  - Department of Biochemistry, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Meghali Kaple
  - Department of Biochemistry, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
4
Zhang H, Deng Y, Xiaojie M, Zou Q, Liu H, Tang N, Luo Y, Xiang X. CT radiomics for predicting the prognosis of patients with stage II rectal cancer during the three-year period after surgery, chemotherapy and radiotherapy. Heliyon 2024; 10:e23923. [PMID: 38223741 PMCID: PMC10787243 DOI: 10.1016/j.heliyon.2023.e23923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Revised: 11/29/2023] [Accepted: 12/15/2023] [Indexed: 01/16/2024] Open
Abstract
Objective Pre-treatment enhanced CT image data were used to train and build models to predict the efficacy of conventional radiotherapy and chemotherapy in non-small cell lung cancer (NSCLC) using two classification algorithms, Logistic Regression (LR) and Gaussian Naive Bayes (GNB). Methods In this study, we used pre-treatment enhanced CT image data for region of interest (ROI) sketching and feature extraction. We utilized the least absolute shrinkage and selection operator (LASSO) mutual confidence method for feature screening. We pre-screened the LR and GNB classification algorithms and trained models on the screened features. We plotted 5-fold and 10-fold cross-validated receiver operating characteristic (ROC) curves to calculate the area under the curve (AUC). We performed DeLong's test for validation and plotted calibration curves and decision curves to assess model performance. Results A total of 102 patients were included in this study. In a comparative analysis of the two models, LR had only slightly lower specificity than GNB, and higher sensitivity, accuracy, AUC value, precision, and F1 value than GNB (training set accuracy: 0.787, AUC value: 0.851; test set accuracy: 0.772, AUC value: 0.849). The LR model also performed better in both the decision curve and the calibration curve. Conclusion CT can be used for efficacy prediction after radiotherapy and chemotherapy in NSCLC patients. LR is more suitable for predicting whether NSCLC prognosis is in remission when computing speed is not a consideration.
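The area under the ROC curve used here to compare the two classifiers can be computed directly from ranked scores. The following is a minimal sketch using the pairwise (Mann-Whitney) formulation; the function name `roc_auc` and the toy data are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probability (or any ranking score) for the positive class.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: three of the four positive/negative pairs
# are ranked correctly.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(example_auc)
```

In a k-fold setting, this function would be applied to the held-out predictions of each fold and the fold-level values then summarized.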
Affiliation(s)
- Hanjing Zhang
  - Department of Oncology, Affiliated Hospital of Chuanbei Medical College, Nanchong, Sichuan Province, 637000, China
- Yu Deng
  - The Affiliated Cancer Hospital of Guizhou Medical University, GuiYang, Guizhou Province, 550000, China
- M.A. Xiaojie
  - Department of Oncology, Affiliated Hospital of Chuanbei Medical College, Nanchong, Sichuan Province, 637000, China
- Qian Zou
  - Department of Oncology, Affiliated Hospital of Chuanbei Medical College, Nanchong, Sichuan Province, 637000, China
- Huanhui Liu
  - Department of Oncology, Affiliated Hospital of Chuanbei Medical College, Nanchong, Sichuan Province, 637000, China
- Ni Tang
  - Department of Oncology, Affiliated Hospital of Chuanbei Medical College, Nanchong, Sichuan Province, 637000, China
- Yuanyuan Luo
  - Department of Oncology, Affiliated Hospital of Chuanbei Medical College, Nanchong, Sichuan Province, 637000, China
- Xuejing Xiang
  - Department of Oncology, Affiliated Hospital of Chuanbei Medical College, Nanchong, Sichuan Province, 637000, China
5
Tong L, Shi W, Isgut M, Zhong Y, Lais P, Gloster L, Sun J, Swain A, Giuste F, Wang MD. Integrating Multi-Omics Data With EHR for Precision Medicine Using Advanced Artificial Intelligence. IEEE Rev Biomed Eng 2024; 17:80-97. [PMID: 37824325 DOI: 10.1109/rbme.2023.3324264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2023]
Abstract
With the recent advancement of novel biomedical technologies such as high-throughput sequencing and wearable devices, multi-modal biomedical data ranging from multi-omics molecular data to real-time continuous bio-signals are generated at an unprecedented speed and scale every day. For the first time, these multi-modal biomedical data are able to bring precision medicine close to reality. However, due to the volume and complexity of the data, making good use of these multi-modal biomedical data requires major effort. Researchers and clinicians are actively developing artificial intelligence (AI) approaches for data-driven knowledge discovery and causal inference using a variety of biomedical data modalities. These AI-based approaches have demonstrated promising results in various biomedical and healthcare applications. In this review paper, we summarize the state-of-the-art AI models for integrating multi-omics data and electronic health records (EHRs) for precision medicine. We discuss the challenges and opportunities in integrating multi-omics data with EHRs and future directions. We hope this review can inspire future research and development in integrating multi-omics data with EHRs for precision medicine.
6
Wang H, Han X, Ren J, Cheng H, Li H, Li Y, Li X. A prognostic prediction model for ovarian cancer using a cross-modal view correlation discovery network. Mathematical Biosciences and Engineering: MBE 2024; 21:736-764. [PMID: 38303441 DOI: 10.3934/mbe.2024031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Ovarian cancer is a tumor with different clinicopathological and molecular features, and the vast majority of patients have local or extensive spread at the time of diagnosis. Early diagnosis and prognostic prediction of patients can contribute to the understanding of the underlying pathogenesis of ovarian cancer and the improvement of therapeutic outcomes. The occurrence of ovarian cancer is influenced by multiple complex mechanisms, including the genome, transcriptome and proteome. Different types of omics analysis help predict the survival rate of ovarian cancer patients. Multi-omics data of ovarian cancer exhibit high-dimensional heterogeneity, and existing methods for integrating multi-omics data have not taken into account the variability and inter-correlation between different omics data. In this paper, we propose a deep learning model, MDCADON, which utilizes multi-omics data and cross-modal view correlation discovery network. We introduce random forest into LASSO regression for feature selection on mRNA expression, DNA methylation, miRNA expression and copy number variation (CNV), aiming to select important features highly correlated with ovarian cancer prognosis. A multi-modal deep neural network is used to comprehensively learn feature representations of each omics data and clinical data, and cross-modal view correlation discovery network is employed to construct the multi-omics discovery tensor, exploring the inter-relationships between different omics data. The experimental results demonstrate that MDCADON is superior to the existing methods in predicting ovarian cancer prognosis, which enables survival analysis for patients and facilitates the determination of follow-up treatment plans. Finally, we perform Gene Ontology (GO) term analysis and biological pathway analysis on the genes identified by MDCADON, revealing the underlying mechanisms of ovarian cancer and providing certain support for guiding ovarian cancer treatments.
Affiliation(s)
- Huiqing Wang
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Xiao Han
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Jianxue Ren
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Hao Cheng
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Haolin Li
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Ying Li
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Xue Li
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
7
Gu Y, Wang M, Gong Y, Li X, Wang Z, Wang Y, Jiang S, Zhang D, Li C. Unveiling breast cancer risk profiles: a survival clustering analysis empowered by an online web application. Future Oncol 2023; 19:2651-2667. [PMID: 38095059 DOI: 10.2217/fon-2023-0736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023] Open
Abstract
Aim: To develop a shiny app for doctors to investigate breast cancer treatments through a new approach by incorporating unsupervised clustering and survival information. Materials & methods: Analysis is based on the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, which contains 1726 subjects and 22 variables. Cox regression was used to identify survival risk factors for K-means clustering. Logrank tests and C-statistics were compared across different cluster numbers and Kaplan-Meier plots were presented. Results & conclusion: Our study fills an existing void by introducing a unique combination of unsupervised learning techniques and survival information on the clinician side, demonstrating the potential of survival clustering as a valuable tool in uncovering hidden structures based on distinct risk profiles.
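The Kaplan-Meier plots mentioned in this abstract are built from the product-limit estimator. A minimal sketch of that estimator follows; the function name `kaplan_meier` and the toy follow-up data are hypothetical and do not come from the METABRIC analysis.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each subject.
    events: 1 if the event (death) was observed, 0 if right-censored.
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Subjects still under observation just before time t
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy cohort: five subjects, two of them censored.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for time, prob in km_curve:
    print(f"t={time}: S(t)={prob:.2f}")
```

In a survival-clustering workflow, one such curve would be estimated per cluster and the curves compared, for example with a log-rank test.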
Affiliation(s)
- Yuan Gu
  - Department of Statistics, The George Washington University, Washington, DC 20052, USA
- Mingyue Wang
  - Department of Mathematics, Syracuse University, Syracuse, NY 13244, USA
- Yishu Gong
  - Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
- Xin Li
  - Department of Statistics, The George Washington University, Washington, DC 20052, USA
- Ziyang Wang
  - Department of Computer Science, University of Oxford, Oxford, OX1 3QD, UK
- Yuli Wang
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- Song Jiang
  - Department of Biochemistry, Huzhou Institute of Biological Products Co., Ltd., 313017, China
- Dan Zhang
  - Department of Information Science and Engineering, Shandong University, Shan Dong, China
- Chen Li
  - Department of Biology, Chemistry and Pharmacy, Free University of Berlin, Berlin, 14195, Germany
8
Skingen VE, Hompland T, Fjeldbo CS, Salberg UB, Helgeland H, Ragnum HB, Aarnes EK, Vlatkovic L, Hole KH, Seierstad T, Lyng H. Prostate cancer radiogenomics reveals proliferative gene expression programs associated with distinct MRI-based hypoxia levels. Radiother Oncol 2023; 188:109875. [PMID: 37640161 DOI: 10.1016/j.radonc.2023.109875] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/15/2023] [Revised: 08/21/2023] [Accepted: 08/22/2023] [Indexed: 08/31/2023]
Abstract
BACKGROUND AND PURPOSE The biology behind individual hypoxia levels in patient tumors is poorly understood. Here, we used radiogenomics to identify associations between magnetic resonance imaging (MRI)-based hypoxia levels and biological processes derived from gene expression data in prostate cancer. MATERIALS AND METHODS For 85 prostate cancer patients, MRI-based hypoxia images were constructed by combining diffusion-weighted images reflecting oxygen consumption and supply. The ability to differentiate hypoxia levels in these images was verified by comparison with matched biopsy sections stained for the hypoxia marker pimonidazole. For MRI-defined hypoxia levels, corresponding hypoxic fractions were calculated and correlated with biopsy gene expression profiles. Biological processes were predicted by gene set enrichment analysis (GSEA) and validated by immunohistochemistry (Ki67 proliferation marker, reactive stroma grade) and RT-PCR (MYC). RESULTS Genes with correlation between expression level and hypoxic fraction were identified for 56 MRI-based hypoxia levels. At all levels, GSEA identified proliferation as the predominant biological process enriched among the correlating genes. Two independent proliferative gene signatures were developed. The Peak1 signature, upregulated at moderate/severe hypoxia, reflected MYC upregulation and high Ki67-proliferation index of cancer cells in pimonidazole-positive regions. The Peak2 signature, upregulated at mild to non-hypoxic levels, was associated with fibroblast gene signature and reactive stroma grade. High scores of both Peak1 and Peak2 indicated elevated risk of biochemical recurrence in multiple cohorts. CONCLUSION Radiogenomics identified two gene expression programs activated at different hypoxia levels, reflecting proliferation of cancer cells and stroma cells. Genes involved in these programs could be candidate targets for intervention.
Affiliation(s)
- Vilde Eide Skingen
  - Department of Radiation Biology, Oslo University Hospital, Oslo, Norway; Department of Physics, University of Oslo, Oslo, Norway
- Tord Hompland
  - Department of Radiation Biology, Oslo University Hospital, Oslo, Norway
- Unn Beate Salberg
  - Department of Radiation Biology, Oslo University Hospital, Oslo, Norway
- Hanna Helgeland
  - Department of Radiation Biology, Oslo University Hospital, Oslo, Norway
- Harald Bull Ragnum
  - Department of Radiation Biology, Oslo University Hospital, Oslo, Norway; Department of Oncology and Hematology, Telemark Hospital Trust, Skien, Norway
- Knut Håkon Hole
  - Department of Radiology and Nuclear Medicine, Oslo University Hospital, Oslo, Norway
- Therese Seierstad
  - Department of Radiology and Nuclear Medicine, Oslo University Hospital, Oslo, Norway
- Heidi Lyng
  - Department of Radiation Biology, Oslo University Hospital, Oslo, Norway; Department of Physics, University of Oslo, Oslo, Norway.
9
Yassi M, Chatterjee A, Parry M. Application of deep learning in cancer epigenetics through DNA methylation analysis. Brief Bioinform 2023; 24:bbad411. [PMID: 37985455 PMCID: PMC10661960 DOI: 10.1093/bib/bbad411] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/22/2023] [Revised: 10/08/2023] [Accepted: 10/25/2023] [Indexed: 11/22/2023] Open
Abstract
DNA methylation is a fundamental epigenetic modification involved in various biological processes and diseases. Analysis of DNA methylation data at a genome-wide and high-throughput level can provide insights into diseases influenced by epigenetics, such as cancer. Recent technological advances have led to the development of high-throughput approaches, such as genome-scale profiling, that allow for computational analysis of epigenetics. Deep learning (DL) methods are essential in facilitating computational studies in epigenetics for DNA methylation analysis. In this systematic review, we assessed the various applications of DL applied to DNA methylation data or multi-omics data to discover cancer biomarkers, perform classification, imputation and survival analysis. The review first introduces state-of-the-art DL architectures and highlights their usefulness in addressing challenges related to cancer epigenetics. Finally, the review discusses potential limitations and future research directions in this field.
Affiliation(s)
- Maryam Yassi
  - Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
  - Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand
- Aniruddha Chatterjee
  - Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand
  - Honorary Professor, UPES University, Dehradun, India
- Matthew Parry
  - Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
  - Te Pūnaha Matatini Centre of Research Excellence, University of Auckland, Auckland, New Zealand
10
Zhu J, Oh JH, Simhal AK, Elkin R, Norton L, Deasy JO, Tannenbaum A. Geometric graph neural networks on multi-omics data to predict cancer survival outcomes. Comput Biol Med 2023; 163:107117. [PMID: 37329617 PMCID: PMC10638676 DOI: 10.1016/j.compbiomed.2023.107117] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/23/2023] [Revised: 05/25/2023] [Accepted: 05/30/2023] [Indexed: 06/19/2023]
Abstract
The advance of sequencing technologies has enabled a thorough molecular characterization of the genome in human cancers. To improve patient prognosis predictions and subsequent treatment strategies, it is imperative to develop advanced computational methods to analyze large-scale, high-dimensional genomic data. However, traditional machine learning methods face a challenge in handling the high-dimensional, low-sample size problem that is shown in most genomic data sets. To address this, our group has developed geometric network analysis techniques on multi-omics data in connection with prior biological knowledge derived from protein-protein interactions (PPIs) or pathways. Geometric features obtained from the genomic network, such as Ollivier-Ricci curvature and the invariant measure of the associated Markov chain, have been shown to be predictive of survival outcomes in various cancers. In this study, we propose a novel supervised deep learning method called geometric graph neural network (GGNN) that incorporates such geometric features into deep learning for enhanced predictive power and interpretability. More specifically, we utilize a state-of-the-art graph neural network with sparse connections between the hidden layers based on known biology of the PPI network and pathway information. Geometric features along with multi-omics data are then incorporated into the corresponding layers. The proposed approach utilizes a local-global principle in such a manner that highly predictive features are selected at the front layers and fed directly to the last layer for multivariable Cox proportional-hazards regression modeling. The method was applied to multi-omics data from the CoMMpass study of multiple myeloma and ten major cancers in The Cancer Genome Atlas (TCGA). In most experiments, our method showed superior predictive performance compared to other alternative methods.
Affiliation(s)
- Jiening Zhu
  - Department of Applied Mathematics & Statistics, Stony Brook University, NY, USA.
- Jung Hun Oh
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA.
- Anish K Simhal
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA.
- Rena Elkin
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA.
- Larry Norton
  - Department of Medicine, Memorial Sloan Kettering Cancer Center, NY, USA.
- Joseph O Deasy
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA.
- Allen Tannenbaum
  - Department of Applied Mathematics & Statistics, Stony Brook University, NY, USA; Department of Computer Science, Stony Brook University, NY, USA.
11
Blutt SE, Coarfa C, Neu J, Pammi M. Multiomic Investigations into Lung Health and Disease. Microorganisms 2023; 11:2116. [PMID: 37630676 PMCID: PMC10459661 DOI: 10.3390/microorganisms11082116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 08/08/2023] [Accepted: 08/13/2023] [Indexed: 08/27/2023] Open
Abstract
Diseases of the lung account for more than 5 million deaths worldwide and are a healthcare burden. Improving clinical outcomes, including mortality and quality of life, involves a holistic understanding of the disease, which can be provided by the integration of lung multi-omics data. An enhanced understanding of comprehensive multiomic datasets provides opportunities to leverage those datasets to inform the treatment and prevention of lung diseases by classifying severity, prognostication, and discovery of biomarkers. The main objective of this review is to summarize the use of multiomics investigations in lung disease, including multiomics integration and the use of machine learning computational methods. This review also discusses lung disease models, including animal models, organoids, and single-cell lines, to study multiomics in lung health and disease. We provide examples of lung diseases where multi-omics investigations have provided deeper insight into etiopathogenesis and have resulted in improved preventative and therapeutic interventions.
Affiliation(s)
- Sarah E. Blutt
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX 77030, USA;
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA;
| | - Cristian Coarfa
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA;
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Josef Neu
- Department of Pediatrics, Section of Neonatology, University of Florida, Gainesville, FL 32611, USA;
| | - Mohan Pammi
- Department of Pediatrics, Section of Neonatology, Baylor College of Medicine and Texas Children’s Hospital, Houston, TX 77030, USA
| |
|
12
|
Wen G, Li L. FGCNSurv: dually fused graph convolutional network for multi-omics survival prediction. Bioinformatics 2023; 39:btad472. [PMID: 37522887 PMCID: PMC10412406 DOI: 10.1093/bioinformatics/btad472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Revised: 05/24/2023] [Accepted: 07/29/2023] [Indexed: 08/01/2023] Open
Abstract
MOTIVATION Survival analysis is an important tool for modeling time-to-event data, e.g. to predict the survival time of a patient after a cancer diagnosis or a certain treatment. While deep neural networks work well in standard prediction tasks, it is still unclear how best to use these deep models in survival analysis because of the difficulty of modeling right-censored data, especially for multi-omics data. Although existing methods have shown the advantage of multi-omics integration in survival prediction, it remains challenging to extract complementary information from different omics and improve the prediction accuracy. RESULTS In this work, we propose a novel multi-omics deep survival prediction approach based on a dually fused graph convolutional network (GCN), named FGCNSurv. FGCNSurv is a complete generative model from multi-omics data to the survival outcome of patients, including feature fusion by a factorized bilinear model, graph fusion of multiple graphs, higher-level feature extraction by GCN, and survival prediction by a Cox proportional hazards (Cox-PH) model. The factorized bilinear model makes it possible to capture cross-omics features and quantify complex relations in multi-omics data. By fusing the single-omics features with the cross-omics features, and simultaneously fusing multiple graphs from different omics, the GCN with the generated dually fused graph can capture higher-level features for computing the survival loss in the Cox-PH model. Comprehensive experimental results on real-world datasets with gene expression and microRNA expression data show that the proposed FGCNSurv method outperforms existing survival prediction methods, and suggest that it can extract complementary information for survival prediction from multi-omics data. AVAILABILITY AND IMPLEMENTATION The code is freely available at https://github.com/LiminLi-xjtu/FGCNSurv.
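The abstract above names a Cox proportional hazards model as the survival loss. As a rough illustration of what such a loss computes on right-censored data, here is a minimal pure-Python sketch of the negative log partial likelihood. It is not taken from the FGCNSurv repository: the function name `cox_neg_log_partial_likelihood`, the averaging over observed events, and the Breslow-style handling of tied times are all assumptions made for this example.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Average negative log partial likelihood of a Cox model.

    risk_scores: predicted log-risk per patient (higher = higher hazard).
    times:       observed follow-up time per patient.
    events:      1 if death was observed, 0 if the patient is right-censored.
    Hypothetical helper for illustration; ties are handled Breslow-style.
    """
    n = len(times)
    total = 0.0
    n_events = 0
    for i in range(n):
        if not events[i]:
            # Censored patients contribute only through other patients' risk sets.
            continue
        # Risk set: everyone still under observation at time t_i.
        risk_set = [risk_scores[j] for j in range(n) if times[j] >= times[i]]
        # Numerically stable log-sum-exp over the risk set.
        m = max(risk_set)
        log_sum = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += log_sum - risk_scores[i]
        n_events += 1
    return total / n_events if n_events else 0.0


# Patients who die earlier should receive higher risk scores.
good = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], [1, 2, 3], [1, 1, 1])
bad = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], [1, 2, 3], [1, 1, 1])
print(good < bad)
```

In a deep model the risk scores would be the network outputs and this quantity would be minimized by gradient descent; the sketch only shows the arithmetic.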
Affiliation(s)
- Gang Wen
- School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China
| | - Limin Li
- School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China
| |
|
13
|
Lin SH, Chien CH, Chang KP, Lu MF, Chen YT, Chu YW. SaBrcada: Survival Intervals Prediction for Breast Cancer Patients by Dimension Raising and Age Stratification. Cancers (Basel) 2023; 15:3690. [PMID: 37509351 PMCID: PMC10378351 DOI: 10.3390/cancers15143690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 07/03/2023] [Accepted: 07/18/2023] [Indexed: 07/30/2023] Open
Abstract
(1) Background: Breast cancer is the second leading cause of cancer death among women. The accurate prediction of survival intervals will help physicians make informed decisions about treatment strategies or the use of palliative care. (2) Methods: Gene expression is predictive and correlates to patient prognosis. To establish a reliable prediction tool, we collected a total of 1187 RNA-seq data points from breast cancer patients (median age 58 years) in Fragments Per Kilobase Million (FPKM) format from the TCGA database. Among them, we selected 144 patients with date of death information to establish the SaBrcada-AD dataset. We first normalized the SaBrcada-AD dataset to TPM to build the survival prediction model SaBrcada. After normalization and dimension raising, we used the differential gene expression data to test eight different deep learning architectures. Considering the effect of age on prognosis, we also performed a stratified random sampling test on all ages between the lower and upper quartiles of patient age, 48 and 69 years; (3) Results: Stratifying by age 61, the performance of SaBrcada built by GoogLeNet was improved to a highest accuracy of 0.798. We also built a free website tool to provide five predicted survival periods: within six months, six months to one year, one to three years, three to five years, or over five years, for clinician reference. (4) Conclusions: We built the prediction model, SaBrcada, and the website tool of the same name for breast cancer survival analysis. Through these models and tools, clinicians will be provided with survival interval information as a basis for formulating precision medicine.
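The abstract above mentions normalizing FPKM values to TPM before modeling. The standard conversion rescales each sample so that its values sum to one million. The sketch below shows that arithmetic for a single sample; the function name `fpkm_to_tpm` is hypothetical and the code is not taken from the SaBrcada tool.

```python
def fpkm_to_tpm(fpkm_values):
    """Convert one sample's FPKM values to TPM.

    TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6, so the TPM values of a sample
    always sum to one million. Hypothetical helper for illustration.
    """
    total = sum(fpkm_values)
    if total == 0:
        raise ValueError("cannot normalize a sample whose FPKM values sum to zero")
    return [value / total * 1_000_000 for value in fpkm_values]


sample = [1.0, 1.0, 2.0]      # FPKM for three genes in one sample
print(fpkm_to_tpm(sample))    # [250000.0, 250000.0, 500000.0]
```

Because the conversion is applied per sample, TPM values are comparable across samples in a way that raw FPKM values are not.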
Affiliation(s)
- Shih-Huan Lin
- Ph.D. Program in Medical Biotechnology, National Chung Hsing University, Taichung 40227, Taiwan
| | - Ching-Hsuan Chien
- Ph.D. Program in Medical Biotechnology, National Chung Hsing University, Taichung 40227, Taiwan
| | - Kai-Po Chang
- Department of Pathology, China Medical University Hospital, Taichung 404327, Taiwan
| | - Min-Fang Lu
- Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung 40227, Taiwan
| | - Yu-Ting Chen
- Ph.D. Program in Medical Biotechnology, National Chung Hsing University, Taichung 40227, Taiwan
- Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung 40227, Taiwan
- Biotechnology Center, National Chung Hsing University, Taichung 40227, Taiwan
- Agricultural Biotechnology Center, National Chung Hsing University, Taichung 40227, Taiwan
| | - Yen-Wei Chu
- Ph.D. Program in Medical Biotechnology, National Chung Hsing University, Taichung 40227, Taiwan
- Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung 40227, Taiwan
- Biotechnology Center, National Chung Hsing University, Taichung 40227, Taiwan
- Agricultural Biotechnology Center, National Chung Hsing University, Taichung 40227, Taiwan
- Institute of Molecular Biology, National Chung Hsing University, Taichung 40227, Taiwan
- Smart Sustainable New Agriculture Research Center (SMARTer), Taichung 40227, Taiwan
| |
|
14
|
Gong P, Cheng L, Zhang Z, Meng A, Li E, Chen J, Zhang L. Multi-omics integration method based on attention deep learning network for biomedical data classification. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 231:107377. [PMID: 36739624 DOI: 10.1016/j.cmpb.2023.107377] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/31/2022] [Revised: 01/06/2023] [Accepted: 01/25/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND AND OBJECTIVE Integrating multi-omics data for the comprehensive analysis of the biological processes in human diseases has become one of the most challenging tasks of bioinformatics. Deep learning (DL) algorithms have recently become one of the most promising multi-omics data integration analysis methods. However, most existing DL-based studies integrate the multi-omics data by concatenation in the input data space or the learned feature space, ignoring the correlations between patients and omics. METHODS We propose a novel multi-omics integration method, called Multi-omics Attention Deep Learning Network (MOADLN), for biomedical data classification. First, for each type of omics data, we use three fully-connected layers and the self-attention mechanism to reduce dimensionality and to construct the correlations between patients. We then apply the feature vector learned from self-attention to generate the initial category labels. Second, for the initial labels predicted from each type of omics data, we use an effective Multi-Omics Correlation Discovery Network (MOCDN) to learn the cross-omics correlations in the label space. Finally, we use the softmax classifier for label prediction. RESULTS We demonstrate that our method outperforms several state-of-the-art methods on two datasets with mRNA expression data, DNA methylation data, and miRNA expression data. In addition, we identified essential biomarkers of relevant diseases by MOADLN, and the generality of MOADLN is also demonstrated in the KIRP and KIRC datasets. CONCLUSIONS MOADLN jointly explores correlations between patients within each omics type and cross-omics correlations in label space, making it an effective DL-based method for classifying biomedical data.
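The abstract above uses self-attention to relate patients to one another. The following is a simplified sketch of scaled dot-product self-attention over a small matrix of patient feature vectors. It is an assumption-laden illustration rather than the MOADLN architecture: the learned query, key, and value projections and the fully-connected layers are omitted, and the function name `self_attention` is hypothetical.

```python
import math


def self_attention(patients):
    """Scaled dot-product self-attention over rows of `patients`.

    Each row is one patient's feature vector. The rows serve directly as
    queries, keys, and values (no learned projections; a simplification).
    Returns (attended_features, attention_weights).
    """
    dim = len(patients[0])
    attended, weights = [], []
    for query in patients:
        # Similarity of this patient to every patient, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in patients]
        # Softmax with max-subtraction for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        row = [e / norm for e in exps]
        weights.append(row)
        # Weighted average of all patients' features.
        attended.append([sum(row[j] * patients[j][c] for j in range(len(patients)))
                         for c in range(dim)])
    return attended, weights


features, attn = self_attention([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print([round(sum(row), 6) for row in attn])   # each row of weights sums to 1
```

Each row of the weight matrix can be read as a soft patient-to-patient similarity, which is the sense in which attention "constructs correlations between patients".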
Affiliation(s)
- Ping Gong
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, CN, China.
| | - Lei Cheng
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, CN, China
| | - Zhiyuan Zhang
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, CN, China
| | - Ao Meng
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, CN, China
| | - Enshuo Li
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, CN, China
| | - Jie Chen
- Department of Radiation Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, CN, China
| | - Longzhen Zhang
- Department of Radiation Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, CN, China
| |
|
15
|
Local augmented graph neural network for multi-omics cancer prognosis prediction and analysis. Methods 2023; 213:1-9. [PMID: 36933628 DOI: 10.1016/j.ymeth.2023.02.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/29/2022] [Revised: 12/30/2022] [Accepted: 02/25/2023] [Indexed: 03/17/2023] Open
Abstract
Cancer prognosis prediction and analysis can help patients understand their life expectancy and help clinicians provide correct therapeutic guidance. Thanks to the development of sequencing technology, multi-omics data and biological networks have been used for cancer prognosis prediction. In addition, graph neural networks can simultaneously consider multi-omics features and molecular interactions in biological networks, and have become mainstream in cancer prognosis prediction and analysis. However, the limited number of neighboring genes in biological networks restricts the accuracy of graph neural networks. To solve this problem, a local augmented graph convolutional network named LAGProg is proposed in this paper for cancer prognosis prediction and analysis. The process is as follows: first, given a patient's multi-omics data features and biological network, the corresponding augmented conditional variational autoencoder generates features. Then, the generated augmented features and the original features are fed into a cancer prognosis prediction model to complete the cancer prognosis prediction task. The conditional variational autoencoder consists of two parts: an encoder and a decoder. In the encoding phase, the encoder learns the conditional distribution of the multi-omics data. As a generative model, the decoder takes the conditional distribution and the original feature as inputs to generate the enhanced features. The cancer prognosis prediction model consists of a two-layer graph convolutional neural network and a Cox proportional risk network. The Cox proportional risk network consists of fully connected layers. Extensive experiments on 15 real-world datasets from TCGA demonstrated the effectiveness and efficiency of the proposed method in predicting cancer prognosis. LAGProg improved the C-index values by an average of 8.5% over the state-of-the-art graph neural network method. Moreover, we confirmed that the local augmentation technique could enhance the model's ability to represent multi-omics features, improve the model's robustness to missing multi-omics features, and prevent the model's over-smoothing during training. Finally, based on genes identified through differential expression analysis, we discovered 13 prognostic markers highly associated with breast cancer, among which ten genes are supported by a literature review.
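The abstract above reports improvements in the C-index. For readers unfamiliar with the metric, the sketch below computes Harrell's concordance index in pure Python: among comparable patient pairs, it is the fraction in which the patient who died earlier was assigned the higher predicted risk. The function name `concordance_index` and the half-credit rule for tied risk scores are assumptions of this example, not details taken from LAGProg.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event and
    times[i] < times[j]. The pair is concordant when patient i also has
    the higher predicted risk; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3]
events = [1, 1, 0]            # the third patient is censored
print(concordance_index(times, events, [3.0, 2.0, 1.0]))   # 1.0
print(concordance_index(times, events, [1.0, 2.0, 3.0]))   # 0.0
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of patients by risk.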
|
16
|
Du X, Zhao Y. Multimodal adversarial representation learning for breast cancer prognosis prediction. Comput Biol Med 2023; 157:106765. [PMID: 36963355 DOI: 10.1016/j.compbiomed.2023.106765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Revised: 02/27/2023] [Accepted: 03/07/2023] [Indexed: 03/17/2023]
Abstract
With the increasing incidence of breast cancer, accurate prognosis prediction of breast cancer patients is a key issue in current cancer research, and it is also of great significance for patients' psychological rehabilitation and assisting clinical decision-making. Many studies that integrate data from different heterogeneous modalities such as gene expression profile, clinical data, and copy number alteration, have achieved greater success than those with only one modality in prognostic prediction. However, many of these approaches that exist fail to dramatically reduce the modality gap by aligning multimodal distributions. Therefore, it is crucial to develop a method that fully considers a modality-invariant embedding space to effectively integrate multimodal data. In this study, to reduce the modality gap, we propose a multimodal data adversarial representation framework (MDAR) to reduce the modal heterogeneity by translating source modalities into distributions for the target modality. Additionally, we apply reconstruction and classification losses to embedding space to further constrain it. Then, we design a multi-scale bilinear convolutional neural network (MS-B-CNN) for uni-modality to improve the feature expression ability. In addition, the embedding space generates predictions as stacked feature inputs to the extremely randomized trees classifier. With 10-fold cross-validation, our results show that the proposed adversarial representation learning improves prognostic performance. A comparative study of this method and other existing methods on the METABRIC (1980 patients) dataset showed that Matthews correlation coefficient (Mcc) was significantly enhanced by 7.4% in the prognosis prediction of breast cancer patients.
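The abstract above reports its gain in terms of the Matthews correlation coefficient (Mcc). As a reminder of how that metric is computed for binary prognosis labels, here is a minimal sketch from the confusion-matrix counts. The function name `matthews_corrcoef` and the convention of returning 0.0 when the denominator vanishes are assumptions of this example.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1).

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (assumed convention).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# One of two positive patients is missed; no false positives.
print(round(matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0]), 4))   # 0.5774
```

Unlike accuracy, the coefficient stays informative when the two outcome classes are imbalanced, which is common in survival cohorts.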
Affiliation(s)
- Xiuquan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, China; School of Computer Science and Technology, Anhui University, Hefei, China.
| | - Yuefan Zhao
- School of Computer Science and Technology, Anhui University, Hefei, China
| |
|
17
|
Unlu Yazici M, Marron JS, Bakir-Gungor B, Zou F, Yousef M. Invention of 3Mint for feature grouping and scoring in multi-omics. Front Genet 2023; 14:1093326. [PMID: 37007972 PMCID: PMC10050723 DOI: 10.3389/fgene.2023.1093326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 02/27/2023] [Indexed: 03/17/2023] Open
Abstract
Advanced genomic and molecular profiling technologies have accelerated the elucidation of the regulatory mechanisms behind cancer development and progression, and of targeted therapies in patients. Along this line, intense studies with immense amounts of biological information have boosted the discovery of molecular biomarkers. Cancer has been one of the leading causes of death around the world in recent years. Elucidation of genomic and epigenetic factors in Breast Cancer (BRCA) can provide a roadmap to uncover the disease mechanisms. Accordingly, unraveling the possible systematic connections between omics data types and their contribution to BRCA tumor progression is crucial. In this study, we have developed a novel machine learning (ML) based integrative approach for multi-omics data analysis. This integrative approach combines information from gene expression (mRNA), microRNA (miRNA) and methylation data. Due to the complexity of cancer, this integrated data is expected to improve the prediction, diagnosis and treatment of disease through patterns only available from the 3-way interactions between these 3-omics datasets. In addition, the proposed method bridges the interpretation gap between the disease mechanisms that drive onset and progression. Our fundamental contribution is the 3 Multi-omics integrative tool (3Mint). This tool aims to perform grouping and scoring of groups using biological knowledge. Another major goal is improved gene selection via detection of novel groups of cross-omics biomarkers. Performance of 3Mint is assessed using different metrics. Our computational performance evaluations showed that 3Mint classifies the BRCA molecular subtypes with a lower number of genes than the miRcorrNet tool, which uses miRNA and mRNA gene expression profiles, at similar performance metrics (95% accuracy). The incorporation of methylation data in 3Mint yields a much more focused analysis. The 3Mint tool and all other supplementary files are available at https://github.com/malikyousef/3Mint/.
Affiliation(s)
- Miray Unlu Yazici
- Department of Bioengineering, Abdullah Gül University, Kayseri, Türkiye
| | - J. S. Marron
- Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, United States
| | - Burcu Bakir-Gungor
- Department of Bioengineering, Abdullah Gül University, Kayseri, Türkiye
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Türkiye
| | - Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, Israel
- Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel
- *Correspondence: Malik Yousef
| |
|
18
|
Benkirane H, Pradat Y, Michiels S, Cournède PH. CustOmics: A versatile deep-learning based strategy for multi-omics integration. PLoS Comput Biol 2023; 19:e1010921. [PMID: 36877736 PMCID: PMC10019780 DOI: 10.1371/journal.pcbi.1010921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 03/16/2023] [Accepted: 02/04/2023] [Indexed: 03/07/2023] Open
Abstract
The availability of patient cohorts with several types of omics data opens new perspectives for exploring the disease's underlying biological processes and developing predictive models. It also comes with new challenges in computational biology in terms of integrating high-dimensional and heterogeneous data in a fashion that captures the interrelationships between multiple genes and their functions. Deep learning methods offer promising perspectives for integrating multi-omics data. In this paper, we review the existing integration strategies based on autoencoders and propose a new customizable one whose principle relies on a two-phase approach. In the first phase, we adapt the training to each data source independently before learning cross-modality interactions in the second phase. By taking into account each source's singularity, we show that this approach succeeds at taking advantage of all the sources more efficiently than other strategies. Moreover, by adapting our architecture to the computation of Shapley additive explanations, our model can provide interpretable results in a multi-source setting. Using multiple omics sources from different TCGA cohorts, we demonstrate the performance of the proposed method for cancer on test cases for several tasks, such as the classification of tumor types and breast cancer subtypes, as well as survival outcome prediction. We show through our experiments the great performances of our architecture on seven different datasets with various sizes and provide some interpretations of the results obtained. Our code is available on (https://github.com/HakimBenkirane/CustOmics).
Affiliation(s)
- Hakim Benkirane
- Université Paris-Saclay, CentraleSupélec, Lab of Mathematics and Informatics (MICS), Gif-sur-Yvette, France
- Oncostat U1018, Inserm, Université Paris-Saclay, Équipe Labellisée Ligue Contre le Cancer, CESP, Villejuif, France
| | - Yoann Pradat
- Université Paris-Saclay, CentraleSupélec, Lab of Mathematics and Informatics (MICS), Gif-sur-Yvette, France
| | - Stefan Michiels
- Oncostat U1018, Inserm, Université Paris-Saclay, Équipe Labellisée Ligue Contre le Cancer, CESP, Villejuif, France
- Bureau de Biostatistique et d’Épidémiologie, Gustave Roussy, Université Paris-Saclay, Villejuif, France
| | - Paul-Henry Cournède
- Université Paris-Saclay, CentraleSupélec, Lab of Mathematics and Informatics (MICS), Gif-sur-Yvette, France
| |
|
19
|
Wang S, Wang S, Wang Z. A survey on multi-omics-based cancer diagnosis using machine learning with the potential application in gastrointestinal cancer. Front Med (Lausanne) 2023; 9:1109365. [PMID: 36703893 PMCID: PMC9871466 DOI: 10.3389/fmed.2022.1109365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 12/28/2022] [Indexed: 01/12/2023] Open
Abstract
Gastrointestinal cancer is becoming increasingly common and leads to over 3 million deaths every year. No typical symptoms appear in the early stage of gastrointestinal cancer, posing a significant challenge in the diagnosis and treatment of patients. Many patients are already in the middle or late stages of the disease when they first feel unwell, and unfortunately most of them will die of it. Recently, various artificial intelligence techniques, such as machine learning based on multi-omics, have been presented for cancer diagnosis and treatment in the era of precision medicine. This paper provides a survey on multi-omics-based cancer diagnosis using machine learning with potential application in gastrointestinal cancer. In particular, we provide a comprehensive summary and analysis from the perspective of multi-omics datasets, task types, and multi-omics-based integration methods. Furthermore, this paper points out the remaining challenges of multi-omics-based cancer diagnosis using machine learning and discusses future topics.
Affiliation(s)
- Suixue Wang
- School of Information and Communication Engineering, Hainan University, Haikou, China
| | - Shuling Wang
- Department of Neurology, Affiliated Haikou Hospital of Xiangya School of Medicine, Central South University, Haikou, China
| | - Zhengxia Wang
- School of Computer Science and Technology, Hainan University, Haikou, China
| |
|
20
|
Sun Q, Cheng L, Meng A, Ge S, Chen J, Zhang L, Gong P. SADLN: Self-attention based deep learning network of integrating multi-omics data for cancer subtype recognition. Front Genet 2023; 13:1032768. [PMID: 36685873 PMCID: PMC9846505 DOI: 10.3389/fgene.2022.1032768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 12/15/2022] [Indexed: 01/05/2023] Open
Abstract
Integrating multi-omics data for cancer subtype recognition is an important task in bioinformatics. Recently, deep learning has been applied to recognize cancer subtypes. However, most existing studies integrate the multi-omics data simply by concatenating them into a single dataset and then learning a latent low-dimensional representation through a deep learning model, which does not account for the differing distributions of the omics data. Moreover, these methods ignore the relationships between samples. To tackle these problems, we propose SADLN, a self-attention based deep learning network for integrating multi-omics data for cancer subtype recognition. SADLN combines an encoder, self-attention, a decoder, and a discriminator into a unified framework, which can not only integrate multi-omics data but also adaptively model the relationships between samples to learn an accurate latent low-dimensional representation. With the integrated representation learned from the network, SADLN uses a Gaussian Mixture Model to identify cancer subtypes. Experiments on ten TCGA cancer datasets demonstrated the advantages of SADLN compared to ten methods. SADLN is an effective method of integrating multi-omics data for cancer subtype recognition.
Affiliation(s)
- Qiuwen Sun
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
| | - Lei Cheng
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
| | - Ao Meng
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
| | - Shuguang Ge
- School of Information and Control Engineering, University of Mining and Technology, Xuzhou, China
| | - Jie Chen
- Department of Radiation Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
| | - Longzhen Zhang
- Department of Radiation Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
| | - Ping Gong
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- *Correspondence: Ping Gong
| |
|
21
|
Data augmentation guided breast cancer diagnosis and prognosis using an integrated deep-generative framework based on breast tumor’s morphological information. INFORMATICS IN MEDICINE UNLOCKED 2023. [DOI: 10.1016/j.imu.2023.101171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023] Open
|
22
|
Hao Y, Jing XY, Sun Q. Joint learning sample similarity and correlation representation for cancer survival prediction. BMC Bioinformatics 2022; 23:553. [PMID: 36536289 PMCID: PMC9761951 DOI: 10.1186/s12859-022-05110-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND As a highly aggressive disease, cancer has become the leading cause of death around the world. Accurate prediction of the survival expectancy of cancer patients is important because it can help clinicians choose appropriate therapeutic schemes. As high-throughput sequencing technology becomes more cost-effective, integrating multi-type genome-wide data has become a promising approach to cancer survival prediction. Based on these genomic data, some data-integration methods for cancer survival prediction have been proposed. However, existing methods fail to simultaneously utilize the feature information and structure information of multi-type genome-wide data. RESULTS We propose a Multi-type Data Joint Learning (MDJL) approach based on multi-type genome-wide data, which comprehensively exploits feature information and structure information. Specifically, MDJL exploits correlation representations between any two data types by cross-correlation calculation for learning discriminant features. Moreover, based on the learned multiple correlation representations, MDJL constructs sample similarity matrices for capturing global and local structures across different data types. With the learned discriminant representation matrix and fused similarity matrix, MDJL constructs a graph convolutional network with Cox loss for survival prediction. CONCLUSIONS Experimental results demonstrate that our approach substantially outperforms established integrative methods and is effective for cancer survival prediction.
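The abstract above feeds a representation matrix and a fused sample-similarity matrix into a graph convolutional network. To make that step concrete, the sketch below runs one graph-convolution layer in pure Python using the widely used symmetric normalization with self-loops, ReLU(D^-1/2 (A + I) D^-1/2 H W). Whether MDJL uses exactly this normalization is an assumption, and the names `gcn_layer` and `matmul` are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adjacency, features, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adjacency: non-negative sample-similarity matrix A (n x n).
    features:  per-sample representation matrix H (n x d).
    weight:    layer weight matrix W (d x d_out); fixed here, learned in practice.
    """
    n = len(adjacency)
    # Add self-loops so every sample keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    normalized = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j])
                   for j in range(n)] for i in range(n)]
    propagated = matmul(matmul(normalized, features), weight)
    return [[max(0.0, value) for value in row] for row in propagated]


# Two connected samples exchange information and end up with averaged features.
print(gcn_layer([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]], [[1.0]]))
```

The output of the final layer would serve as the per-patient risk score passed to the Cox loss.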
Affiliation(s)
- Yaru Hao
- School of Computer Science, Wuhan University, Wuhan, China
| | - Xiao-Yuan Jing
- School of Computer Science, Wuhan University, Wuhan, China; Guangdong Provincial Key Laboratory of Petrochemical Equipment Fault Diagnosis and School of Computer, Guangdong University of Petrochemical Technology, Maoming, China; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
| | - Qixing Sun
- School of Computer Science, Wuhan University, Wuhan, China
| |
|
23
|
Leng D, Zheng L, Wen Y, Zhang Y, Wu L, Wang J, Wang M, Zhang Z, He S, Bo X. A benchmark study of deep learning-based multi-omics data fusion methods for cancer. Genome Biol 2022; 23:171. [PMID: 35945544 PMCID: PMC9361561 DOI: 10.1186/s13059-022-02739-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 02/22/2022] [Accepted: 07/26/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A fused method using a combination of multi-omics data enables a comprehensive study of complex biological processes and highlights the interrelationship of relevant biomolecules and their functions. Driven by high-throughput sequencing technologies, several promising deep learning methods have been proposed for fusing multi-omics data generated from a large number of samples. RESULTS In this study, 16 representative deep learning methods are comprehensively evaluated on simulated, single-cell, and cancer multi-omics datasets. For each of the datasets, two tasks are designed: classification and clustering. The classification performance is evaluated by using three benchmarking metrics including accuracy, F1 macro, and F1 weighted. Meanwhile, the clustering performance is evaluated by using four benchmarking metrics including the Jaccard index (JI), C-index, silhouette score, and Davies Bouldin score. For the cancer multi-omics datasets, the methods' strength in capturing the association of multi-omics dimensionality reduction results with survival and clinical annotations is further evaluated. The benchmarking results indicate that moGAT achieves the best classification performance. Meanwhile, efmmdVAE, efVAE, and lfmmdVAE show the most promising performance across all complementary contexts in clustering tasks. CONCLUSIONS Our benchmarking results not only provide a reference for biomedical researchers to choose appropriate deep learning-based multi-omics data fusion methods, but also suggest the future directions for the development of more effective multi-omics data fusion methods. The deep learning frameworks are available at https://github.com/zhenglinyi/DL-mo .
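The abstract above evaluates classification with accuracy, F1 macro, and F1 weighted. The difference between the two F1 averages matters on imbalanced cancer cohorts, so here is a small sketch that computes all three. The function name `classification_scores` is hypothetical and the code is not taken from the DL-mo repository.

```python
def classification_scores(y_true, y_pred):
    """Return (accuracy, macro F1, weighted F1) for multi-class labels.

    Macro F1 is the unweighted mean of per-class F1 scores; weighted F1
    weights each class's F1 by its number of true samples (its support).
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1 = {}
    support = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1[label] = 2 * tp / denom if denom else 0.0
        support[label] = sum(1 for t in y_true if t == label)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    macro = sum(f1.values()) / len(labels)
    weighted = sum(f1[c] * support[c] for c in labels) / len(pairs)
    return accuracy, macro, weighted


acc, macro_f1, weighted_f1 = classification_scores([0, 0, 0, 1], [0, 0, 1, 1])
print(acc, round(macro_f1, 4), round(weighted_f1, 4))   # 0.75 0.7333 0.7667
```

In the example the minority class has the lower F1, so the macro average (which treats classes equally) falls below the weighted average.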
Affiliation(s)
- Dongjin Leng
- Institute of Health Service and Transfusion Medicine, Beijing, People's Republic of China
| | - Linyi Zheng
- School of Informatics, Xiamen University, Xiamen, People's Republic of China
| | - Yuqi Wen
- Institute of Health Service and Transfusion Medicine, Beijing, People's Republic of China
| | - Yunhao Zhang
- School of Informatics, Xiamen University, Xiamen, People's Republic of China
| | - Lianlian Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, People's Republic of China
| | - Jing Wang
- School of Medicine, Tsinghua University, Beijing, People's Republic of China
| | - Meihong Wang
- School of Informatics, Xiamen University, Xiamen, People's Republic of China
| | - Zhongnan Zhang
- School of Informatics, Xiamen University, Xiamen, People's Republic of China.
| | - Song He
- Institute of Health Service and Transfusion Medicine, Beijing, People's Republic of China.
| | - Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, People's Republic of China.
| |
Collapse
|
24
|
Tabakhi S, Lu H. Multi-agent Feature Selection for Integrative Multi-omics Analysis. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:1638-1642. [PMID: 36086594 DOI: 10.1109/embc48229.2022.9871758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Multi-omics data integration is key for cancer prediction as it captures different aspects of molecular mechanisms. Nevertheless, the high dimensionality of multi-omics data, combined with a relatively small number of patients, presents a challenge for cancer prediction tasks. While feature selection techniques have been widely used to tackle the curse of dimensionality of multi-omics data, most existing methods have been applied to each type of omics data separately. In this paper, we propose a multi-agent architecture for feature selection, called MAgentOmics, to consider all omics data together. MAgentOmics extends the ant colony optimization algorithm to multi-omics data, iteratively building candidate solutions and evaluating them. Moreover, a new fitness function is introduced to assess the candidate feature subsets without using a prediction target such as the survival time of patients. Therefore, it can be considered an unsupervised method. We evaluate the performance of MAgentOmics on the TCGA ovarian cancer multi-omics data from 176 patients using 5-fold cross-validation. The results demonstrate that the integration power of MAgentOmics is relatively better than that of the state-of-the-art supervised multi-view method. The code is publicly available at https://github.com/SinaTabakhi/MAgentOmics. Clinical relevance: Discovering knowledge in existing multi-omics datasets through better feature selection enhances the clinical understanding of cancers and speeds up decision-making in the clinic.
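The evaluation protocol mentioned in this abstract (5-fold cross-validation on 176 patients) can be sketched as follows. This is a generic illustration in plain Python, not the MAgentOmics code; the function name `k_fold_indices`, the shuffling seed, and the round-robin fold assignment are assumptions made for this example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles sample indices once, deals them into k
    folds round-robin, and uses each fold exactly once as the test set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 176 samples, as in the cohort size quoted in the abstract
for fold_no, (train_idx, test_idx) in enumerate(k_fold_indices(176, k=5), start=1):
    print(f"fold {fold_no}: train={len(train_idx)} test={len(test_idx)}")
```

Because 176 is not divisible by 5, one fold holds 36 samples and the other four hold 35; every patient appears in exactly one test fold.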
|
25
|
Combining Molecular, Imaging, and Clinical Data Analysis for Predicting Cancer Prognosis. Cancers (Basel) 2022; 14:3215. [PMID: 35804988 PMCID: PMC9265023 DOI: 10.3390/cancers14133215] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/27/2022] [Indexed: 02/04/2023]
Abstract
Simple Summary: The rise of Big Data, the widespread use of Machine Learning, and the cheapening of omics techniques have allowed for the creation of more sophisticated and accurate models in biomedical research. This article presents the state-of-the-art predictive models of cancer prognosis that use multimodal data, considering clinical, molecular (omics and non-omics), and image data. The subject of study, the data modalities used, the data processing and modelling methods applied, the validation strategies involved, the integration strategies encompassed, and the evolution of prognostic predictive models are discussed. Finally, we discuss challenges and opportunities in this field of cancer research, with great potential impact on the clinical management of patients and, by extension, on the implementation of personalised and precision medicine. Abstract: Cancer is one of the most detrimental diseases globally. Accordingly, the prognosis prediction of cancer patients has become a field of interest. In this review, we have gathered 43 state-of-the-art scientific papers published in the last 6 years that built cancer prognosis predictive models using multimodal data. We have defined the multimodality of data as four main types: clinical, anatomopathological, molecular, and medical imaging; and we have expanded on the information that each modality provides. The 43 studies were divided into three categories based on the modelling approach taken, and their characteristics were further discussed together with current issues and future trends. Research in this area has evolved from survival analysis through statistical modelling using mainly clinical and anatomopathological data to the prediction of cancer prognosis through a multi-faceted data-driven approach by the integration of complex, multimodal, and high-dimensional data containing multi-omics and medical imaging information and by applying Machine Learning and, more recently, Deep Learning techniques.
This review concludes that cancer prognosis predictive multimodal models are capable of better stratifying patients, which can improve clinical management and contribute to the implementation of personalised medicine as well as provide new and valuable knowledge on cancer biology and its progression.
|
26
|
Mo H, Breitling R, Francavilla C, Schwartz JM. Data integration and mechanistic modelling for breast cancer biology: Current state and future directions. Curr Opin Endocr Metab Res 2022; 24:100350. [PMID: 36034741 PMCID: PMC9402443 DOI: 10.1016/j.coemr.2022.100350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Breast cancer is one of the most common cancers threatening women worldwide. A limited number of available treatment options, frequent recurrence, and drug resistance worsen the prognosis of breast cancer patients. Thus, there is an urgent need for methods to investigate novel treatment options, while taking into account the vast molecular heterogeneity of breast cancer. Recent advances in molecular profiling technologies, including genomics, epigenomics, transcriptomics, proteomics and metabolomics, enable approaching breast cancer biology at multiple levels of omics interaction networks. Systems biology approaches, including computational inference of ‘big data’ and mechanistic modelling of specific pathways, are emerging to identify potential novel combinations of breast cancer subtype signatures and more diverse targeted therapies.
|
27
|
Sapoval N, Aghazadeh A, Nute MG, Antunes DA, Balaji A, Baraniuk R, Barberan CJ, Dannenfelser R, Dun C, Edrisi M, Elworth RAL, Kille B, Kyrillidis A, Nakhleh L, Wolfe CR, Yan Z, Yao V, Treangen TJ. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 2022; 13:1728. [PMID: 35365602 PMCID: PMC8976012 DOI: 10.1038/s41467-022-29268-7] [Citation(s) in RCA: 61] [Impact Index Per Article: 30.5] [Received: 08/25/2021] [Accepted: 03/09/2022] [Indexed: 11/19/2022]
Abstract
Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.
Affiliation(s)
- Nicolae Sapoval
- Department of Computer Science, Rice University, Houston, TX, USA
- Amirali Aghazadeh
- Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA
- Michael G Nute
- Department of Computer Science, Rice University, Houston, TX, USA
- Dinler A Antunes
- Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
- Advait Balaji
- Department of Computer Science, Rice University, Houston, TX, USA
- Richard Baraniuk
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- C J Barberan
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Chen Dun
- Department of Computer Science, Rice University, Houston, TX, USA
- R A Leo Elworth
- Department of Computer Science, Rice University, Houston, TX, USA
- Bryce Kille
- Department of Computer Science, Rice University, Houston, TX, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
- Cameron R Wolfe
- Department of Computer Science, Rice University, Houston, TX, USA
- Zhi Yan
- Department of Computer Science, Rice University, Houston, TX, USA
- Vicky Yao
- Department of Computer Science, Rice University, Houston, TX, USA
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of Bioengineering, Rice University, Houston, TX, USA
|
28
|
Stahlschmidt SR, Ulfenborg B, Synnergren J. Multimodal deep learning for biomedical data fusion: a review. Brief Bioinform 2022; 23:6516346. [PMID: 35089332 PMCID: PMC8921642 DOI: 10.1093/bib/bbab569] [Citation(s) in RCA: 68] [Impact Index Per Article: 34.0] [Received: 10/21/2021] [Revised: 12/06/2021] [Accepted: 12/11/2021] [Indexed: 02/06/2023]
Abstract
Biomedical data are becoming increasingly multimodal and thereby capture the underlying complex relationships among biological processes. Deep learning (DL)-based data fusion strategies are a popular approach for modeling these nonlinear relationships. Therefore, we review the current state-of-the-art of such methods and propose a detailed taxonomy that facilitates more informed choices of fusion strategies for biomedical applications, as well as research on novel methods. By doing so, we find that deep fusion strategies often outperform unimodal and shallow approaches. Additionally, the proposed subcategories of fusion strategies show different advantages and drawbacks. The review of current methods has shown that, especially for intermediate fusion strategies, joint representation learning is the preferred approach as it effectively models the complex interactions of different levels of biological organization. Finally, we note that gradual fusion, based on prior biological knowledge or on search strategies, is a promising future research path. Similarly, utilizing transfer learning might overcome sample size limitations of multimodal data sets. As these data sets become increasingly available, multimodal DL approaches present the opportunity to train holistic models that can learn the complex regulatory dynamics behind health and disease.
Affiliation(s)
- Jane Synnergren
- Systems Biology Research Center, University of Skövde, Sweden
|
29
|
Benning L, Peintner A, Peintner L. Advances in and the Applicability of Machine Learning-Based Screening and Early Detection Approaches for Cancer: A Primer. Cancers (Basel) 2022; 14:623. [PMID: 35158890 PMCID: PMC8833439 DOI: 10.3390/cancers14030623] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/23/2021] [Revised: 01/22/2022] [Accepted: 01/25/2022] [Indexed: 02/07/2023]
Abstract
Simple Summary: Non-communicable diseases in general, and cancer in particular, contribute greatly to the global burden of disease. Although significant advances have been made to address this burden, cancer is still among the top drivers of mortality, second only to cardiovascular diseases. Consensus has been established that a key factor to reduce the burden of disease from cancer is to improve screening for and the early detection of such conditions. To date, however, most approaches in this field relied on established screening methods, such as a clinical examination, radiographic imaging, tissue staining or biochemical markers. Yet, with the advances of information technology, new data-driven screening and diagnostic tools have been developed. This article provides a brief overview of the theoretical foundations of these data-driven approaches, highlights the promising use cases and underscores the challenges and limitations that come with the introduction of these approaches to the clinical field. Abstract: Despite the efforts of the past decades, cancer is still among the key drivers of global mortality. To increase the detection rates, screening programs and other efforts to improve early detection were initiated to cover the populations at a particular risk for developing a specific malignant condition. These diagnostic approaches have, so far, mostly relied on conventional diagnostic methods and have made little use of the vast amounts of clinical and diagnostic data that are routinely being collected along the diagnostic pathway. Practitioners have lacked the tools to handle this ever-increasing flood of data. Only recently, the clinical field has opened up more to the opportunities that come with the systematic utilisation of high-dimensional computational data analysis.
We aim to introduce the reader to the theoretical background of machine learning (ML) and elaborate on the established and potential use cases of ML algorithms in screening and early detection. Furthermore, we assess and comment on the relevant challenges and misconceptions of the applicability of ML-based diagnostic approaches. Lastly, we emphasise the need for a clear regulatory framework to responsibly introduce ML-based diagnostics in clinical practice and routine care.
Affiliation(s)
- Leo Benning
- Health Care Supply Research and Data Mining Working Group, Emergency Department, University Medical Center Freiburg, 79106 Freiburg, Germany
- Andreas Peintner
- Databases and Information Systems, Department of Computer Science, Leopold-Franzens University of Innsbruck, 6020 Innsbruck, Austria
- Lukas Peintner
- Institute of Molecular Medicine and Cell Research, Albert Ludwigs University of Freiburg, 79085 Freiburg, Germany
- Correspondence: ; Tel.: +49-761-203-9618
|
30
|
Kang M, Ko E, Mersha TB. A roadmap for multi-omics data integration using deep learning. Brief Bioinform 2022; 23:bbab454. [PMID: 34791014 PMCID: PMC8769688 DOI: 10.1093/bib/bbab454] [Citation(s) in RCA: 79] [Impact Index Per Article: 39.5] [Received: 07/27/2021] [Revised: 09/30/2021] [Accepted: 10/05/2021] [Indexed: 12/18/2022]
Abstract
High-throughput next-generation sequencing now makes it possible to generate a vast amount of multi-omics data for various applications. These data have revolutionized biomedical research by providing a more comprehensive understanding of the biological systems and molecular mechanisms of disease development. Recently, deep learning (DL) algorithms have become one of the most promising methods in multi-omics data analysis, due to their predictive performance and capability of capturing nonlinear and hierarchical features. While integrating and translating multi-omics data into useful functional insights remain the biggest bottleneck, there is a clear trend towards incorporating multi-omics analysis in biomedical research to help explain the complex relationships between molecular layers. Multi-omics data have a role to improve prevention, early detection and prediction; monitor progression; interpret patterns and endotyping; and design personalized treatments. In this review, we outline a roadmap of multi-omics integration using DL and offer a practical perspective into the advantages, challenges and barriers to the implementation of DL in multi-omics data.
Affiliation(s)
- Mingon Kang
- Department of Computer Science at the University of Nevada, Las Vegas, NV, USA
- Euiseong Ko
- Department of Computer Science at the University of Nevada, Las Vegas, NV, USA
- Tesfaye B Mersha
- Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
|
31
|
Anklam E, Bahl MI, Ball R, Beger RD, Cohen J, Fitzpatrick S, Girard P, Halamoda-Kenzaoui B, Hinton D, Hirose A, Hoeveler A, Honma M, Hugas M, Ishida S, Kass GEN, Kojima H, Krefting I, Liachenko S, Liu Y, Masters S, Marx U, McCarthy T, Mercer T, Patri A, Pelaez C, Pirmohamed M, Platz S, Ribeiro AJS, Rodricks JV, Rusyn I, Salek RM, Schoonjans R, Silva P, Svendsen CN, Sumner S, Sung K, Tagle D, Tong L, Tong W, van den Eijnden-van-Raaij J, Vary N, Wang T, Waterton J, Wang M, Wen H, Wishart D, Yuan Y, Slikker Jr. W. Emerging technologies and their impact on regulatory science. Exp Biol Med (Maywood) 2022; 247:1-75. [PMID: 34783606 PMCID: PMC8749227 DOI: 10.1177/15353702211052280] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
Abstract
There is an evolving and increasing need for the utilization of emerging cellular, molecular and in silico technologies and novel approaches for safety assessment of food, drugs, and personal care products. Convergence of these emerging technologies is also enabling rapid advances and approaches that may impact regulatory decisions and approvals. Although the development of emerging technologies may allow rapid advances in regulatory decision making, there is concern that these new technologies have not been thoroughly evaluated to determine if they are ready for regulatory application, singularly or in combinations. The magnitude of these combined technical advances may outpace the ability to assess fit for purpose and to allow routine application of these new methods for regulatory purposes. There is a need to develop strategies to evaluate the new technologies to determine which ones are ready for regulatory use. The opportunity to apply these potentially faster, more accurate, and cost-effective approaches remains an important goal to facilitate their incorporation into regulatory use. However, without a clear strategy to evaluate emerging technologies rapidly and appropriately, the value of these efforts may go unrecognized or may take longer to be realized. It is important for the regulatory science field to keep up with the research in these technically advanced areas and to understand the science behind these new approaches. The regulatory field must understand the critical quality attributes of these novel approaches and learn from each other's experience so that workforces can be trained to prepare for emerging global regulatory challenges. Moreover, it is essential that the regulatory community work with the technology developers to harness collective capabilities towards developing a strategy for evaluation of these new and novel assessment tools.
Affiliation(s)
- Reza M Salek
- International Agency for Research on Cancer, France
- Li Tong
- Universities of Georgia Tech and Emory, USA
- Neil Vary
- Canadian Food Inspection Agency, Canada
- Tao Wang
- National Medical Products Administration, China
- May Wang
- Universities of Georgia Tech and Emory, USA
- Hairuo Wen
- National Institutes for Food and Drug Control, China
|
32
|
Vijayakumar S, Magazzù G, Moon P, Occhipinti A, Angione C. A Practical Guide to Integrating Multimodal Machine Learning and Metabolic Modeling. Methods Mol Biol 2022; 2399:87-122. [PMID: 35604554 DOI: 10.1007/978-1-0716-1831-8_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Complex, distributed, and dynamic sets of clinical biomedical data are collectively referred to as multimodal clinical data. In order to accommodate the volume and heterogeneity of such diverse data types and aid in their interpretation when they are combined with a multi-scale predictive model, machine learning is a useful tool that can be wielded to deconstruct biological complexity and extract relevant outputs. Additionally, genome-scale metabolic models (GSMMs) are one of the main frameworks striving to bridge the gap between genotype and phenotype by incorporating prior biological knowledge into mechanistic models. Consequently, the utilization of GSMMs as a foundation for the integration of multi-omic data originating from different domains is a valuable pursuit towards refining predictions. In this chapter, we show how cancer multi-omic data can be analyzed via multimodal machine learning and metabolic modeling. First, we focus on the merits of adopting an integrative, systems biology-led approach to biomedical data mining. Following this, we propose how constraint-based metabolic models can provide a stable yet adaptable foundation for the integration of multimodal data with machine learning. Finally, we provide a step-by-step tutorial for the combination of machine learning and GSMMs, which includes: (i) tissue-specific constraint-based modeling; (ii) survival analysis using time-to-event prediction for cancer; and (iii) classification and regression approaches for multimodal machine learning. The code associated with the tutorial can be found at https://github.com/Angione-Lab/Tutorials_Combining_ML_and_GSMM.
Affiliation(s)
- Supreeta Vijayakumar
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Giuseppe Magazzù
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Pradip Moon
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Annalisa Occhipinti
- Computational Systems Biology and Data Analytics Research Group, Middlesbrough, UK
- Centre for Digital Innovation, Teesside University, Middlesbrough, UK
- Claudio Angione
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Centre for Digital Innovation, Teesside University, Middlesbrough, UK
- Healthcare Innovation Centre, Teesside University, Middlesbrough, UK
|
33
|
Subramanian A, Zakeri P, Mousa M, Alnaqbi H, Alshamsi FY, Bettoni L, Damiani E, Alsafar H, Saeys Y, Carmeliet P. Angiogenesis goes computational – The future way forward to discover new angiogenic targets? Comput Struct Biotechnol J 2022; 20:5235-5255. [PMID: 36187917 PMCID: PMC9508490 DOI: 10.1016/j.csbj.2022.09.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/09/2022] [Accepted: 09/09/2022] [Indexed: 11/26/2022]
Abstract
Multi-omics technologies are being increasingly utilized in angiogenesis research. Yet, computational methods have not been widely used for angiogenic target discovery and prioritization in this field, partly because (wet-lab) vascular biologists are insufficiently familiar with computational biology tools and the opportunities they may offer. With this review, written for vascular biologists who lack expertise in computational methods, we aspire to break boundaries between both fields and to illustrate the potential of these tools for future angiogenic target discovery. We provide a comprehensive survey of currently available computational approaches that may be useful in prioritizing candidate genes, predicting associated mechanisms, and identifying their specificity to endothelial cell subtypes. We specifically highlight tools that use flexible, machine learning frameworks for large-scale data integration and gene prioritization. For each purpose-oriented category of tools, we describe underlying conceptual principles, highlight interesting applications and discuss limitations. Finally, we will discuss challenges and recommend some guidelines which can help to optimize the process of accurate target discovery.
|
34
|
Arslan E, Schulz J, Rai K. Machine Learning in Epigenomics: Insights into Cancer Biology and Medicine. Biochim Biophys Acta Rev Cancer 2021; 1876:188588. [PMID: 34245839 PMCID: PMC8595561 DOI: 10.1016/j.bbcan.2021.188588] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/09/2021] [Revised: 05/29/2021] [Accepted: 07/02/2021] [Indexed: 02/01/2023]
Abstract
The recent deluge of genome-wide technologies for the mapping of the epigenome and resulting data in cancer samples has provided the opportunity for gaining insights into and understanding the roles of epigenetic processes in cancer. However, the complexity, high-dimensionality, sparsity, and noise associated with these data pose challenges for extensive integrative analyses. Machine Learning (ML) algorithms are particularly suited for epigenomic data analyses due to their flexibility and ability to learn underlying hidden structures. We will discuss four overlapping but distinct major categories under ML: dimensionality reduction, unsupervised methods, supervised methods, and deep learning (DL). We review the preferred use cases of these algorithms in analyses of cancer epigenomics data with the hope to provide an overview of how ML approaches can be used to explore fundamental questions on the roles of epigenome in cancer biology and medicine.
Affiliation(s)
- Emre Arslan
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
- Jonathan Schulz
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
- Kunal Rai
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
|
35
|
Kourou K, Exarchos KP, Papaloukas C, Sakaloglou P, Exarchos T, Fotiadis DI. Applied machine learning in cancer research: A systematic review for patient diagnosis, classification and prognosis. Comput Struct Biotechnol J 2021; 19:5546-5555. [PMID: 34712399 PMCID: PMC8523813 DOI: 10.1016/j.csbj.2021.10.006] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/20/2021] [Revised: 10/04/2021] [Accepted: 10/04/2021] [Indexed: 02/08/2023]
Abstract
Artificial Intelligence (AI) has recently altered the landscape of cancer research and medical oncology using traditional Machine Learning (ML) algorithms and cutting-edge Deep Learning (DL) architectures. In this review article we focus on the ML aspect of AI applications in cancer research and present the most indicative studies with respect to the ML algorithms and data used. The PubMed and dblp databases were considered to obtain the most relevant research works of the last five years. Based on a comparison of the proposed studies and their research clinical outcomes concerning the medical ML application in cancer research, three main clinical scenarios were identified. We give an overview of the well-known DL and Reinforcement Learning (RL) methodologies, as well as their application in clinical practice, and we briefly discuss Systems Biology in cancer research. We also provide a thorough examination of the clinical scenarios with respect to disease diagnosis, patient classification and cancer prognosis and survival. The most relevant studies identified in the preceding year are presented along with their primary findings. Furthermore, we examine the effective implementation and the main points that need to be addressed in the direction of robustness, explainability and transparency of predictive models. Finally, we summarize the most recent advances in the field of AI/ML applications in cancer research and medical oncology, as well as some of the challenges and open issues that need to be addressed before data-driven models can be implemented in healthcare systems to assist physicians in their daily practice.
Affiliation(s)
- Konstantina Kourou
- Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, Ioannina, Greece
- Foundation for Research and Technology-Hellas, Institute of Molecular Biology and Biotechnology, Dept. of Biomedical Research, Ioannina GR45110, Greece
- Costas Papaloukas
- Dept. of Biological Applications and Technology, University of Ioannina, Ioannina, Greece
- Prodromos Sakaloglou
- Dept. of Precision and Molecular Medicine, Unit of Liquid Biopsy in Oncology, Ioannina University Hospital, Ioannina, Greece
- Laboratory of Medical Genetics in Clinical Practice, School of Health Sciences, Faculty of Medicine, University of Ioannina, Ioannina, Greece
- Dimitrios I. Fotiadis
- Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, Ioannina, Greece
- Foundation for Research and Technology-Hellas, Institute of Molecular Biology and Biotechnology, Dept. of Biomedical Research, Ioannina GR45110, Greece
|
36
|
Venugopalan J, Tong L, Hassanzadeh HR, Wang MD. Multimodal deep learning models for early detection of Alzheimer's disease stage. Sci Rep 2021; 11:3254. [PMID: 33547343 PMCID: PMC7864942 DOI: 10.1038/s41598-020-74399-w] [Citation(s) in RCA: 106] [Impact Index Per Article: 35.3] [Received: 08/28/2018] [Accepted: 01/22/2020] [Indexed: 02/06/2023]
Abstract
Most current Alzheimer's disease (AD) and mild cognitive impairment (MCI) studies use a single data modality to make predictions such as AD stages. The fusion of multiple data modalities can provide a holistic view of AD staging analysis. Thus, we use deep learning (DL) to integrally analyze imaging (magnetic resonance imaging (MRI)), genetic (single nucleotide polymorphisms (SNPs)), and clinical test data to classify patients into AD, MCI, and controls (CN). We use stacked denoising auto-encoders to extract features from clinical and genetic data, and use 3D convolutional neural networks (CNNs) for imaging data. We also develop a novel data interpretation method to identify top-performing features learned by the deep models with clustering and perturbation analysis. Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, we demonstrate that deep models outperform shallow models, including support vector machines, decision trees, random forests, and k-nearest neighbors. In addition, we demonstrate that integrating multi-modality data outperforms single-modality models in terms of accuracy, precision, recall, and mean F1 scores. Our models have identified the hippocampus and amygdala brain areas, and the Rey Auditory Verbal Learning Test (RAVLT), as the top distinguishing features, which is consistent with the known AD literature.
Affiliation(s)
- Janani Venugopalan
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Li Tong
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Hamid Reza Hassanzadeh
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- May D Wang
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Winship Cancer Institute, Parker H. Petit Institute for Bioengineering and Biosciences, Institute of People and Technology, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
|