1
Li W, Ballard J, Zhao Y, Long Q. Knowledge-guided learning methods for integrative analysis of multi-omics data. Comput Struct Biotechnol J 2024; 23:1945-1950. [PMID: 38736693 PMCID: PMC11087912 DOI: 10.1016/j.csbj.2024.04.053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/17/2024] [Accepted: 04/18/2024] [Indexed: 05/14/2024] Open
Abstract
Integrative analysis of multi-omics data has the potential to yield valuable and comprehensive insights into the molecular mechanisms underlying complex diseases such as cancer and Alzheimer's disease. However, a number of analytical challenges complicate multi-omics data integration. For instance, -omics data are usually high-dimensional, and sample sizes in multi-omics studies tend to be modest. Furthermore, when genes in an important pathway have relatively weak signal, it can be difficult to detect them individually. There is a growing body of literature on knowledge-guided learning methods that can address these challenges by incorporating biological knowledge such as functional genomics and functional proteomics into multi-omics data analysis. These methods have been shown to outperform their counterparts that do not utilize biological knowledge in tasks including prediction, feature selection, clustering, and dimension reduction. In this review, we survey recently developed methods and applications of knowledge-guided multi-omics data integration methods and discuss future research directions.
Affiliation(s)
- Wenrui Li
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, 19104, PA, USA
- Jenna Ballard
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, 19104, PA, USA
- Yize Zhao
  - Department of Biostatistics, School of Public Health, Yale University, 60 College Street, New Haven, 06510, CT, USA
- Qi Long
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, 19104, PA, USA
2
van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Designing interpretable deep learning applications for functional genomics: a quantitative analysis. Brief Bioinform 2024; 25:bbae449. [PMID: 39293804 PMCID: PMC11410376 DOI: 10.1093/bib/bbae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 08/07/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024] Open
Abstract
Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. Recognizing this importance, recent years have seen the development of numerous diverse interpretability strategies, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.
Affiliation(s)
- Arno van Hilten
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Sonja Katz
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Wiro J Niessen
  - Department of Imaging Physics, Delft University of Technology, 2628 CD Delft, The Netherlands
- Gennady V Roshchupkin
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Department of Epidemiology, Erasmus MC, 3015 GD Rotterdam, The Netherlands
3
Chakraborty S, Sharma G, Karmakar S, Banerjee S. Multi-OMICS approaches in cancer biology: New era in cancer therapy. Biochim Biophys Acta Mol Basis Dis 2024; 1870:167120. [PMID: 38484941 DOI: 10.1016/j.bbadis.2024.167120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/06/2024] [Accepted: 03/06/2024] [Indexed: 04/01/2024]
Abstract
Innovative multi-omics frameworks integrate diverse datasets from the same patients to enhance our understanding of the molecular and clinical aspects of cancers. Advanced omics and multi-view clustering algorithms present unprecedented opportunities for classifying cancers into subtypes, refining survival predictions and treatment outcomes, and unravelling key pathophysiological processes across various molecular layers. However, with the increasing availability of cost-effective high-throughput technologies (HTT) that generate vast amounts of data, analyzing single layers often falls short of establishing causal relations. Integrating multi-omics data spanning genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes offers unique prospects to comprehend the underlying biology of complex diseases like cancer. This discussion explores algorithmic frameworks designed to uncover cancer subtypes, disease mechanisms, and methods for identifying pivotal genomic alterations. It also underscores the significance of multi-omics in tumor classifications, diagnostics, and prognostications. Despite its unparalleled advantages, the integration of multi-omics data has been slow to find its way into everyday clinics. A major hurdle is the uneven maturity of different omics approaches and the widening gap between the generation of large datasets and the capacity to process this data. Initiatives promoting the standardization of sample processing and analytical pipelines, as well as multidisciplinary training for experts in data analysis and interpretation, are crucial for translating theoretical findings into practical applications.
Affiliation(s)
- Sohini Chakraborty
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Gaurav Sharma
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Sricheta Karmakar
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Satarupa Banerjee
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
4
Klauschen F, Dippel J, Keyl P, Jurmeister P, Bockmayr M, Mock A, Buchstab O, Alber M, Ruff L, Montavon G, Müller KR. Toward Explainable Artificial Intelligence for Precision Pathology. Annu Rev Pathol 2024; 19:541-570. [PMID: 37871132 DOI: 10.1146/annurev-pathmechdis-051222-113147] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 10/25/2023]
Abstract
The rapid development of precision medicine in recent years has started to challenge diagnostic pathology with respect to its ability to analyze histological images and increasingly large molecular profiling data in a quantitative, integrative, and standardized way. Artificial intelligence (AI) and, more precisely, deep learning technologies have recently demonstrated the potential to facilitate complex data analysis tasks, including clinical, histological, and molecular data for disease classification; tissue biomarker quantification; and clinical outcome prediction. This review provides a general introduction to AI and describes recent developments with a focus on applications in diagnostic pathology and beyond. We explain limitations including the black-box character of conventional AI and describe solutions to make machine learning decisions more transparent with so-called explainable AI. The purpose of the review is to foster a mutual understanding of both the biomedical and the AI side. To that end, in addition to providing an overview of the relevant foundations in pathology and machine learning, we present worked-through examples for a better practical understanding of what AI can achieve and how it should be done.
Affiliation(s)
- Frederick Klauschen
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
  - Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Jonas Dippel
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Philipp Keyl
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Philipp Jurmeister
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
  - German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Michael Bockmayr
  - Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Andreas Mock
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
  - German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Oliver Buchstab
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Maximilian Alber
  - Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Aignostics, Berlin, Germany
- Grégoire Montavon
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Klaus-Robert Müller
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Seoul, Korea
  - Max Planck Institute for Informatics, Saarbrücken, Germany
5
Ranjbari S, Arslanturk S. Integration of incomplete multi-omics data using Knowledge Distillation and Supervised Variational Autoencoders for disease progression prediction. J Biomed Inform 2023; 147:104512. [PMID: 37813325 DOI: 10.1016/j.jbi.2023.104512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 08/31/2023] [Accepted: 10/03/2023] [Indexed: 10/11/2023]
Abstract
OBJECTIVE The rapid advancement of high-throughput technologies in the biomedical field has resulted in the accumulation of diverse omics data types, such as mRNA expression, DNA methylation, and microRNA expression, for studying various diseases. Integrating these multi-omics datasets enables a comprehensive understanding of the molecular basis of cancer and facilitates accurate prediction of disease progression. METHODS However, conventional approaches face challenges due to the dimensionality curse problem. This paper introduces a novel framework called Knowledge Distillation and Supervised Variational AutoEncoders utilizing View Correlation Discovery Network (KD-SVAE-VCDN) to address the integration of high-dimensional multi-omics data with limited common samples. Through our experimental evaluation, we demonstrate that the proposed KD-SVAE-VCDN architecture accurately predicts the progression of breast and kidney carcinoma by effectively classifying patients as long- or short-term survivors. Furthermore, our approach outperforms other state-of-the-art multi-omics integration models. RESULTS Our findings highlight the efficacy of the KD-SVAE-VCDN architecture in predicting the disease progression of breast and kidney carcinoma. By enabling the classification of patients based on survival outcomes, our model contributes to personalized and targeted treatments. The favorable performance of our approach in comparison to several existing models suggests its potential to contribute to the advancement of cancer understanding and management. CONCLUSION The development of a robust predictive model capable of accurately forecasting disease progression at the time of diagnosis holds immense promise for advancing personalized medicine. By leveraging multi-omics data integration, our proposed KD-SVAE-VCDN framework offers an effective solution to this challenge, paving the way for more precise and tailored treatment strategies for patients with different types of cancer.
Affiliation(s)
- Sima Ranjbari
  - Department of Computer Science, Wayne State University, Detroit, 48202, MI, USA
- Suzan Arslanturk
  - Department of Computer Science, Wayne State University, Detroit, 48202, MI, USA
6
Krix S, DeLong LN, Madan S, Domingo-Fernández D, Ahmad A, Gul S, Zaliani A, Fröhlich H. MultiGML: Multimodal graph machine learning for prediction of adverse drug events. Heliyon 2023; 9:e19441. [PMID: 37681175 PMCID: PMC10481305 DOI: 10.1016/j.heliyon.2023.e19441] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/15/2022] [Revised: 08/22/2023] [Accepted: 08/23/2023] [Indexed: 09/09/2023] Open
Abstract
Adverse drug events constitute a major challenge for the success of clinical trials. Several computational strategies have been suggested to estimate the risk of adverse drug events in preclinical drug development. While these approaches have demonstrated high utility in practice, they are at the same time limited to specific information sources. Thus, many current computational approaches neglect a wealth of information which results from the integration of different data sources, such as biological protein function, gene expression, chemical compound structure, cell-based imaging and others. In this work we propose an integrative and explainable multi-modal Graph Machine Learning approach (MultiGML), which fuses knowledge graphs with multiple further data modalities to predict drug related adverse events and general drug target-phenotype associations. MultiGML demonstrates excellent prediction performance compared to alternative algorithms, including various traditional knowledge graph embedding techniques. MultiGML distinguishes itself from alternative techniques by providing in-depth explanations of model predictions, which point towards biological mechanisms associated with predictions of an adverse drug event. Hence, MultiGML could be a versatile tool to support decision making in preclinical drug development.
Affiliation(s)
- Sophia Krix
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115, Bonn, Germany
  - Fraunhofer Center for Machine Learning, Germany
- Lauren Nicole DeLong
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Artificial Intelligence and its Applications Institute, School of Informatics, University of Edinburgh, 10 Crichton Street, EH8 9AB, UK
- Sumit Madan
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Department of Computer Science, University of Bonn, 53115, Bonn, Germany
- Daniel Domingo-Fernández
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Fraunhofer Center for Machine Learning, Germany
  - Enveda Biosciences, Boulder, CO, 80301, USA
- Ashar Ahmad
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115, Bonn, Germany
  - Grunenthal GmbH, 52099, Aachen, Germany
- Sheraz Gul
  - Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Schnackenburgallee 114, 22525, Hamburg, Germany
  - Fraunhofer Cluster of Excellence for Immune-Mediated Diseases CIMD, Schnackenburgallee 114, 22525, Hamburg, Germany
- Andrea Zaliani
  - Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Schnackenburgallee 114, 22525, Hamburg, Germany
  - Fraunhofer Cluster of Excellence for Immune-Mediated Diseases CIMD, Schnackenburgallee 114, 22525, Hamburg, Germany
- Holger Fröhlich
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115, Bonn, Germany
7
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 PMCID: PMC10186658 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. METHODS This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge (e.g. pathways or Protein-Protein-Interaction networks) to support better generalisation and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and, according to its taxonomy, we discuss representational methodologies for the integration of domain prior knowledge in such models. CONCLUSIONS The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability, which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.
Affiliation(s)
- Magdalena Wysocka
  - Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Oskar Wysocki
  - Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
- Marie Zufferey
  - Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
- Dónal Landers
  - DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP, UK
- André Freitas
  - Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
8
Shutova MS, Borowczyk J, Russo B, Sellami S, Drukala J, Wolnicki M, Brembilla NC, Kaya G, Ivanov AI, Boehncke WH. Inflammation modulates intercellular adhesion and mechanotransduction in human epidermis via ROCK2. iScience 2023; 26:106195. [PMID: 36890793 PMCID: PMC9986521 DOI: 10.1016/j.isci.2023.106195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 12/05/2022] [Accepted: 02/08/2023] [Indexed: 02/15/2023] Open
Abstract
Aberrant mechanotransduction and compromised epithelial barrier function are associated with numerous human pathologies including inflammatory skin disorders. However, the cytoskeletal mechanisms regulating inflammatory responses in the epidermis are not well understood. Here we addressed this question by inducing a psoriatic phenotype in human keratinocytes and reconstructed human epidermis using a cytokine stimulation model. We show that the inflammation upregulates the Rho-myosin II pathway and destabilizes adherens junctions (AJs) promoting YAP nuclear entry. The integrity of cell-cell adhesion but not the myosin II contractility per se is the determinative factor for the YAP regulation in epidermal keratinocytes. The inflammation-induced disruption of AJs, increased paracellular permeability, and YAP nuclear translocation are regulated by ROCK2, independently from myosin II activation. Using a specific inhibitor KD025, we show that ROCK2 executes its effects via cytoskeletal and transcription-dependent mechanisms to shape the inflammatory response in the epidermis.
Affiliation(s)
- Maria S. Shutova
  - University of Geneva, Department of Pathology and Immunology, Geneva, Switzerland
  - University Hospitals of Geneva, Division of Dermatology and Venereology, Geneva, Switzerland
  - Geneva Centre for Inflammation Research, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Julia Borowczyk
  - University of Geneva, Department of Pathology and Immunology, Geneva, Switzerland
- Barbara Russo
  - University of Geneva, Department of Pathology and Immunology, Geneva, Switzerland
  - University Hospitals of Geneva, Division of Dermatology and Venereology, Geneva, Switzerland
  - Geneva Centre for Inflammation Research, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Sihem Sellami
  - University of Geneva, Department of Pathology and Immunology, Geneva, Switzerland
- Justyna Drukala
  - Jagiellonian University, Department of Cell Biology, Faculty of Biochemistry, Biophysics and Biotechnology, Cracow, Poland
- Michal Wolnicki
  - Department of Pediatric Urology, Jagiellonian University Medical College, Cracow, Poland
- Nicolo C. Brembilla
  - University Hospitals of Geneva, Division of Dermatology and Venereology, Geneva, Switzerland
  - Geneva Centre for Inflammation Research, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Gurkan Kaya
  - University Hospitals of Geneva, Division of Dermatology and Venereology, Geneva, Switzerland
- Andrei I. Ivanov
  - Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Wolf-Henning Boehncke
  - University of Geneva, Department of Pathology and Immunology, Geneva, Switzerland
  - University Hospitals of Geneva, Division of Dermatology and Venereology, Geneva, Switzerland
  - Geneva Centre for Inflammation Research, Faculty of Medicine, University of Geneva, Geneva, Switzerland
9
Li J, Li L, You P, Wei Y, Xu B. Towards artificial intelligence to multi-omics characterization of tumor heterogeneity in esophageal cancer. Semin Cancer Biol 2023; 91:35-49. [PMID: 36868394 DOI: 10.1016/j.semcancer.2023.02.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Revised: 02/21/2023] [Accepted: 02/28/2023] [Indexed: 03/05/2023]
Abstract
Esophageal cancer is a complex malignancy with substantial tumor heterogeneity: at the cellular level, tumors are composed of tumor and stromal cellular components; at the genetic level, they comprise genetically distinct tumor clones; at the phenotypic level, cells in distinct microenvironmental niches acquire diverse phenotypic features. This heterogeneity affects almost every process of esophageal cancer progression, from onset to metastasis and recurrence. Intertumoral and intratumoral heterogeneity are major obstacles in the treatment of esophageal cancer, but they also offer the potential to manipulate the heterogeneity itself as a new therapeutic strategy. The high-dimensional, multi-faceted characterization of the genomics, epigenomics, transcriptomics, proteomics, and metabonomics of esophageal cancer has opened novel horizons for dissecting tumor heterogeneity. Artificial intelligence, especially machine learning and deep learning algorithms, is able to make decisive interpretations of data from multi-omics layers. To date, artificial intelligence has emerged as a promising computational tool for analyzing and dissecting patient-specific multi-omics data in esophageal cancer. This review provides a comprehensive overview of tumor heterogeneity from a multi-omics perspective. In particular, we discuss the novel techniques of single-cell sequencing and spatial transcriptomics, which have revolutionized our understanding of the cell composition of esophageal cancer and allowed us to determine novel cell types. We focus on the latest advances in artificial intelligence in integrating multi-omics data of esophageal cancer. Artificial intelligence-based computational tools for multi-omics data integration play a key role in tumor heterogeneity assessment, which will potentially boost the development of precision oncology in esophageal cancer.
Affiliation(s)
- Junyu Li
  - Department of Radiation Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China; Jiangxi Health Committee Key (JHCK) Laboratory of Tumor Metastasis, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Lin Li
  - Department of Thoracic Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Peimeng You
  - Nanchang University, Department of Radiation Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Yiping Wei
  - Department of Thoracic Surgery, The Second Affiliated Hospital of Nanchang University, Nanchang 330006, Jiangxi, China
- Bin Xu
  - Jiangxi Health Committee Key (JHCK) Laboratory of Tumor Metastasis, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
10
Minadakis G, Christodoulou K, Tsouloupas G, Spyrou GM. PathIN: an integrated tool for the visualization of pathway interaction networks. Comput Struct Biotechnol J 2022; 21:378-387. [PMID: 36618987 PMCID: PMC9798270 DOI: 10.1016/j.csbj.2022.12.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 12/16/2022] [Accepted: 12/16/2022] [Indexed: 12/23/2022] Open
Abstract
PathIN is a web service that provides an easy and flexible way to rapidly create pathway-based networks at several functional biological levels: genes, compounds and reactions. The tool is supported by a database repository of reference pathway networks across a large set of species, developed from the freely available information in the KEGG, Reactome and WikiPathways database repositories. PathIN provides networks by means of five diverse methodologies: (a) direct connections between pathways of interest, (b) direct connections as well as the first neighbours of the given pathways, (c) direct connections, the first neighbours and the connections in between them, and (d) two additional methodologies for creating complementary pathway-to-pathway networks that involve additional (missing) pathways that interfere in-between pathways of interest. PathIN is expected to be used as a simple yet informative reference tool for understanding networks of molecular mechanisms related to specific diseases.
Affiliation(s)
- George Minadakis
  - Bioinformatics Department, The Cyprus Institute of Neurology & Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus; PO Box 23462, 1683, Nicosia, Cyprus
- Kyproula Christodoulou
  - Neurogenetics Department, The Cyprus Institute of Neurology & Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus; PO Box 23462, 1683, Nicosia, Cyprus
- George Tsouloupas
  - HPC Facility, The Cyprus Institute, 20 Konstantinou Kavafi Street, Aglantzia, 2121, Nicosia, Cyprus
- George M. Spyrou
  - Bioinformatics Department, The Cyprus Institute of Neurology & Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus; PO Box 23462, 1683, Nicosia, Cyprus
11
Athieniti E, Spyrou GM. A guide to multi-omics data collection and integration for translational medicine. Comput Struct Biotechnol J 2022; 21:134-149. [PMID: 36544480 PMCID: PMC9747357 DOI: 10.1016/j.csbj.2022.11.050] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 06/23/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/02/2022] Open
Abstract
The emerging high-throughput technologies have led to the shift in the design of translational medicine projects towards collecting multi-omics patient samples and, consequently, their integrated analysis. However, the complexity of integrating these datasets has triggered new questions regarding the appropriateness of the available computational methods. Currently, there is no clear consensus on the best combination of omics to include and the data integration methodologies required for their analysis. This article aims to guide the design of multi-omics studies in the field of translational medicine regarding the types of omics and the integration method to choose. We review articles that perform the integration of multiple omics measurements from patient samples. We identify five objectives in translational medicine applications: (i) detect disease-associated molecular patterns, (ii) subtype identification, (iii) diagnosis/prognosis, (iv) drug response prediction, and (v) understand regulatory processes. We describe common trends in the selection of omic types combined for different objectives and diseases. To guide the choice of data integration tools, we group them into the scientific objectives they aim to address. We describe the main computational methods adopted to achieve these objectives and present examples of tools. We compare tools based on how they deal with the computational challenges of data integration and comment on how they perform against predefined objective-specific evaluation criteria. Finally, we discuss examples of tools for downstream analysis and further extraction of novel insights from multi-omics datasets.
Affiliation(s)
- Efi Athieniti
- Department of Bioinformatics, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
| | - George M. Spyrou
- Department of Bioinformatics, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
12
Rong Z, Liu Z, Song J, Cao L, Yu Y, Qiu M, Hou Y. MCluster-VAEs: An end-to-end variational deep learning-based clustering method for subtype discovery using multi-omics data. Comput Biol Med 2022; 150:106085. [PMID: 36162197 DOI: 10.1016/j.compbiomed.2022.106085] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/01/2022] [Revised: 07/30/2022] [Accepted: 09/03/2022] [Indexed: 11/03/2022]
Abstract
The discovery of cancer subtypes based on unsupervised clustering helps provide a precise diagnosis, guide treatment, and improve patients' prognoses. Compared with single-omics data, multi-omics data can improve clustering performance because they offer a comprehensive landscape for understanding biological systems and mechanisms. However, heterogeneous data from multiple sources introduce high complexity and different kinds of noise, which are detrimental to the extraction of clustering information. We propose an end-to-end deep learning-based method, called Multi-omics Clustering Variational Autoencoders (MCluster-VAEs), that can extract cluster-friendly representations from multi-omics data. First, a unified network architecture with an attention mechanism was developed for accurately modeling multi-omics data. Then, using a novel objective function built from the Variational Bayes technique, the model was trained to effectively obtain the posterior estimation of the clustering assignments. Compared with 12 other state-of-the-art multi-omics clustering methods, MCluster-VAEs achieved an outstanding performance on benchmark datasets from the TCGA database. On the Pan Cancer dataset, MCluster-VAEs achieved an adjusted Rand index of approximately 0.78 for cancer category recognition, an increase of more than 18% compared with other methods. Furthermore, a survival analysis and clinical parameter enrichment tests conducted on 10 cancer datasets demonstrated that MCluster-VAEs provides results comparable to, and in some cases better than, those of many common integrative approaches. These results demonstrate that MCluster-VAEs is a powerful new tool for dissecting complex multi-omics relationships and providing new insights for cancer subtype discovery.
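The adjusted Rand index (ARI) quoted above measures agreement between a clustering and reference labels, corrected for chance: 1 means identical partitions, and values near 0 mean agreement no better than random. The following Python sketch computes it from the contingency table of the two labelings using the standard pair-counting formula. It is a generic illustration, not code from MCluster-VAEs, and the toy labels are made up.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: partitions agree trivially

    # Contingency table cells and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of sample pairs placed together in both, in one, or the other.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Made-up labels: cluster names do not need to match, only the grouping.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because the index depends only on which samples are grouped together, relabeling clusters does not change the score, which is why it suits unsupervised subtype discovery.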
Affiliation(s)
- Zhiwei Rong
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
| | - Zhilin Liu
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
| | - Jiali Song
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
| | - Lei Cao
- Department of Epidemiology and Biostatistics Harbin, Harbin Medical University School of Public Health, Harbin, 150000, Heilongjiang, China
| | - Yipe Yu
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
| | - Mantang Qiu
- Department of Thoracic Surgery Beijing, Peking University People's Hospital, Beijing, 100000, China.
| | - Yan Hou
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China; Peking University Clinical Research Center, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China.
13
Treppner M, Binder H, Hess M. Interpretable generative deep learning: an illustration with single cell gene expression data. Hum Genet 2022; 141:1481-1498. [PMID: 34988661 PMCID: PMC9360114 DOI: 10.1007/s00439-021-02417-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/14/2021] [Accepted: 08/06/2021] [Indexed: 11/26/2022]
Abstract
Deep generative models can learn the underlying structure, such as pathways or gene programs, from omics data. We provide an introduction as well as an overview of such techniques, specifically illustrating their use with single-cell gene expression data. For example, the low dimensional latent representations offered by various approaches, such as variational auto-encoders, are useful to get a better understanding of the relations between observed gene expressions and experimental factors or phenotypes. Furthermore, by providing a generative model for the latent and observed variables, deep generative models can generate synthetic observations, which allow us to assess the uncertainty in the learned representations. While deep generative models are useful to learn the structure of high-dimensional omics data by efficiently capturing non-linear dependencies between genes, they are sometimes difficult to interpret due to their neural network building blocks. More precisely, to understand the relationship between learned latent variables and observed variables, e.g., gene transcript abundances and external phenotypes, is difficult. Therefore, we also illustrate current approaches that allow us to infer the relationship between learned latent variables and observed variables as well as external phenotypes. Thereby, we render deep learning approaches more interpretable. In an application with single-cell gene expression data, we demonstrate the utility of the discussed methods.
Affiliation(s)
- Martin Treppner
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Stefan-Meier-Str. 26, Freiburg, 79104, Germany.
| | - Harald Binder
- Freiburg Center for Data Analysis and Modeling, University of Freiburg, Freiburg, 79104, Germany
| | - Moritz Hess
- Freiburg Center for Data Analysis and Modeling, University of Freiburg, Freiburg, 79104, Germany
14
Hamamoto R, Takasawa K, Machino H, Kobayashi K, Takahashi S, Bolatkan A, Shinkai N, Sakai A, Aoyama R, Yamada M, Asada K, Komatsu M, Okamoto K, Kameoka H, Kaneko S. Application of non-negative matrix factorization in oncology: one approach for establishing precision medicine. Brief Bioinform 2022; 23:6628783. [PMID: 35788277 PMCID: PMC9294421 DOI: 10.1093/bib/bbac246] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 04/01/2022] [Revised: 05/06/2022] [Accepted: 05/25/2022] [Indexed: 12/19/2022] Open
Abstract
The increase in the expectations of artificial intelligence (AI) technology has led to machine learning technology being actively used in the medical field. Non-negative matrix factorization (NMF) is a machine learning technique used for image analysis, speech recognition, and language processing; recently, it is being applied to medical research. Precision medicine, wherein important information is extracted from large-scale medical data to provide optimal medical care for every individual, is considered important in medical policies globally, and the application of machine learning techniques to this end is being handled in several ways. NMF is also introduced differently because of the characteristics of its algorithms. In this review, the importance of NMF in the field of medicine, with a focus on the field of oncology, is described by explaining the mathematical science of NMF and the characteristics of the algorithm, providing examples of how NMF can be used to establish precision medicine, and presenting the challenges of NMF. Finally, the direction regarding the effective use of NMF in the field of oncology is also discussed.
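For readers unfamiliar with the technique, NMF approximates a non-negative matrix V by a product W H of two smaller non-negative matrices. The Python sketch below uses the classical multiplicative update rules for the squared (Frobenius) reconstruction error. This is one common NMF algorithm chosen for illustration, not necessarily a variant the review focuses on, and the toy matrix, rank, iteration count and helper names are assumptions.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor V (n x m) into W (n x rank) and H (rank x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy non-negative matrix (made-up values, e.g. 4 samples x 3 features).
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 1.0, 0.5], [6.0, 2.0, 1.0]]
W, H = nmf(V, rank=2)
print("reconstruction error:", frobenius_error(V, W, H))
```

Because every update multiplies a non-negative entry by a non-negative ratio, W and H stay non-negative throughout, which is what makes the resulting factors readable as additive parts.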
Affiliation(s)
- Rina Aoyama
- Showa University Graduate School of Medicine School of Medicine
| | - Ken Asada
- RIKEN Center for Advanced Intelligence Project
15
Gomari DP, Schweickart A, Cerchietti L, Paietta E, Fernandez H, Al-Amin H, Suhre K, Krumsiek J. Variational autoencoders learn transferrable representations of metabolomics data. Commun Biol 2022; 5:645. [PMID: 35773471 PMCID: PMC9246987 DOI: 10.1038/s42003-022-03579-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/25/2021] [Accepted: 06/10/2022] [Indexed: 01/14/2023] Open
Abstract
Dimensionality reduction approaches are commonly used for the deconvolution of high-dimensional metabolomics datasets into underlying core metabolic processes. However, current state-of-the-art methods are widely incapable of detecting nonlinearities in metabolomics data. Variational Autoencoders (VAEs) are a deep learning method designed to learn nonlinear latent representations which generalize to unseen data. Here, we trained a VAE on a large-scale metabolomics population cohort of human blood samples consisting of over 4500 individuals. We analyzed the pathway composition of the latent space using a global feature importance score, which demonstrated that latent dimensions represent distinct cellular processes. To demonstrate model generalizability, we generated latent representations of unseen metabolomics datasets on type 2 diabetes, acute myeloid leukemia, and schizophrenia and found significant correlations with clinical patient groups. Notably, the VAE representations showed stronger effects than latent dimensions derived by linear and non-linear principal component analysis. Taken together, we demonstrate that the VAE is a powerful method that learns biologically meaningful, nonlinear, and transferrable latent representations of metabolomics data.
Affiliation(s)
- Daniel P. Gomari
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, 85764 Neuherberg, Germany; Technical University of Munich, School of Life Sciences, 85354 Freising, Germany; Department of Genetics, Stanford University School of Medicine, Stanford, CA USA
| | - Annalise Schweickart
- Department of Physiology and Biophysics, Weill Cornell Medicine, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, New York, NY 10021 USA
| | - Leandro Cerchietti
- Department of Medicine, Hematology and Oncology Division, Weill Cornell Medicine, New York, 10065 NY USA
| | - Elisabeth Paietta
- Albert Einstein College of Medicine-Montefiore Medical Center, Bronx, NY USA
| | - Hugo Fernandez
- Moffitt Malignant Hematology & Cellular Therapy at Memorial Healthcare System, Pembroke Pines, FL USA
| | - Hassen Al-Amin
- Department of Psychiatry, Weill Cornell Medicine-Qatar, Education City, P.O. Box 24144, Doha, Qatar
| | - Karsten Suhre
- Department of Physiology and Biophysics, Weill Cornell Medical College-Qatar Education City, Doha, Qatar
| | - Jan Krumsiek
- Department of Physiology and Biophysics, Weill Cornell Medicine, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, New York, NY 10021 USA
16
A novel liver cancer diagnosis method based on patient similarity network and DenseGCN. Sci Rep 2022; 12:6797. [PMID: 35474072 PMCID: PMC9043215 DOI: 10.1038/s41598-022-10441-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/14/2021] [Accepted: 04/05/2022] [Indexed: 11/17/2022] Open
Abstract
Liver cancer is the main malignancy in terms of mortality rate, and accurate diagnosis can improve its treatment outcome. The patient similarity network carries important information that helps in cancer diagnosis. However, recent works rarely take patient similarity into consideration. To address this issue, we constructed a patient similarity network using three types of liver cancer omics data, and proposed a novel liver cancer diagnosis method consisting of similarity network fusion, a denoising autoencoder and a dense graph convolutional neural network to capitalize on the patient similarity network and multi-omics data. We compared our proposed method with other state-of-the-art methods and machine learning methods on the TCGA-LIHC dataset to evaluate its performance. The results confirmed that our proposed method surpasses these comparison methods in terms of all the metrics. In particular, our proposed method attained an accuracy of up to 0.9857.
17
Stahlschmidt SR, Ulfenborg B, Synnergren J. Multimodal deep learning for biomedical data fusion: a review. Brief Bioinform 2022; 23:bbab569. [PMID: 35089332 PMCID: PMC8921642 DOI: 10.1093/bib/bbab569] [Citation(s) in RCA: 76] [Impact Index Per Article: 38.0] [Received: 10/21/2021] [Revised: 12/06/2021] [Accepted: 12/11/2021] [Indexed: 02/06/2023] Open
Abstract
Biomedical data are becoming increasingly multimodal and thereby capture the underlying complex relationships among biological processes. Deep learning (DL)-based data fusion strategies are a popular approach for modeling these nonlinear relationships. Therefore, we review the current state-of-the-art of such methods and propose a detailed taxonomy that facilitates more informed choices of fusion strategies for biomedical applications, as well as research on novel methods. By doing so, we find that deep fusion strategies often outperform unimodal and shallow approaches. Additionally, the proposed subcategories of fusion strategies show different advantages and drawbacks. The review of current methods has shown that, especially for intermediate fusion strategies, joint representation learning is the preferred approach as it effectively models the complex interactions of different levels of biological organization. Finally, we note that gradual fusion, based on prior biological knowledge or on search strategies, is a promising future research path. Similarly, utilizing transfer learning might overcome sample size limitations of multimodal data sets. As these data sets become increasingly available, multimodal DL approaches present the opportunity to train holistic models that can learn the complex regulatory dynamics behind health and disease.
Affiliation(s)
- Jane Synnergren
- Systems Biology Research Center, University of Skövde, Sweden
18
Kang M, Ko E, Mersha TB. A roadmap for multi-omics data integration using deep learning. Brief Bioinform 2022; 23:bbab454. [PMID: 34791014 PMCID: PMC8769688 DOI: 10.1093/bib/bbab454] [Citation(s) in RCA: 81] [Impact Index Per Article: 40.5] [Received: 07/27/2021] [Revised: 09/30/2021] [Accepted: 10/05/2021] [Indexed: 12/18/2022] Open
Abstract
High-throughput next-generation sequencing now makes it possible to generate a vast amount of multi-omics data for various applications. These data have revolutionized biomedical research by providing a more comprehensive understanding of the biological systems and molecular mechanisms of disease development. Recently, deep learning (DL) algorithms have become one of the most promising methods in multi-omics data analysis, due to their predictive performance and capability of capturing nonlinear and hierarchical features. While integrating and translating multi-omics data into useful functional insights remain the biggest bottleneck, there is a clear trend towards incorporating multi-omics analysis in biomedical research to help explain the complex relationships between molecular layers. Multi-omics data have a role to improve prevention, early detection and prediction; monitor progression; interpret patterns and endotyping; and design personalized treatments. In this review, we outline a roadmap of multi-omics integration using DL and offer a practical perspective into the advantages, challenges and barriers to the implementation of DL in multi-omics data.
Affiliation(s)
- Mingon Kang
- Department of Computer Science at the University of Nevada, Las Vegas, NV, USA
| | - Euiseong Ko
- Department of Computer Science at the University of Nevada, Las Vegas, NV, USA
| | - Tesfaye B Mersha
- Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
19
Vijayakumar S, Magazzù G, Moon P, Occhipinti A, Angione C. A Practical Guide to Integrating Multimodal Machine Learning and Metabolic Modeling. Methods Mol Biol 2022; 2399:87-122. [PMID: 35604554 DOI: 10.1007/978-1-0716-1831-8_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Complex, distributed, and dynamic sets of clinical biomedical data are collectively referred to as multimodal clinical data. In order to accommodate the volume and heterogeneity of such diverse data types and aid in their interpretation when they are combined with a multi-scale predictive model, machine learning is a useful tool that can be wielded to deconstruct biological complexity and extract relevant outputs. Additionally, genome-scale metabolic models (GSMMs) are one of the main frameworks striving to bridge the gap between genotype and phenotype by incorporating prior biological knowledge into mechanistic models. Consequently, the utilization of GSMMs as a foundation for the integration of multi-omic data originating from different domains is a valuable pursuit towards refining predictions. In this chapter, we show how cancer multi-omic data can be analyzed via multimodal machine learning and metabolic modeling. Firstly, we focus on the merits of adopting an integrative systems biology led approach to biomedical data mining. Following this, we propose how constraint-based metabolic models can provide a stable yet adaptable foundation for the integration of multimodal data with machine learning. Finally, we provide a step-by-step tutorial for the combination of machine learning and GSMMs, which includes: (i) tissue-specific constraint-based modeling; (ii) survival analysis using time-to-event prediction for cancer; and (iii) classification and regression approaches for multimodal machine learning. The code associated with the tutorial can be found at https://github.com/Angione-Lab/Tutorials_Combining_ML_and_GSMM .
Affiliation(s)
- Supreeta Vijayakumar
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
| | - Giuseppe Magazzù
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
| | - Pradip Moon
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
| | - Annalisa Occhipinti
- Computational Systems Biology and Data Analytics Research Group, Middlesbrough, UK
- Centre for Digital Innovation, Teesside University, Middlesbrough, UK
| | - Claudio Angione
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK.
- Centre for Digital Innovation, Teesside University, Middlesbrough, UK.
- Healthcare Innovation Centre, Teesside University, Middlesbrough, UK.
20
Arslan E, Schulz J, Rai K. Machine Learning in Epigenomics: Insights into Cancer Biology and Medicine. Biochim Biophys Acta Rev Cancer 2021; 1876:188588. [PMID: 34245839 PMCID: PMC8595561 DOI: 10.1016/j.bbcan.2021.188588] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/09/2021] [Revised: 05/29/2021] [Accepted: 07/02/2021] [Indexed: 02/01/2023]
Abstract
The recent deluge of genome-wide technologies for the mapping of the epigenome and resulting data in cancer samples has provided the opportunity for gaining insights into and understanding the roles of epigenetic processes in cancer. However, the complexity, high-dimensionality, sparsity, and noise associated with these data pose challenges for extensive integrative analyses. Machine Learning (ML) algorithms are particularly suited for epigenomic data analyses due to their flexibility and ability to learn underlying hidden structures. We will discuss four overlapping but distinct major categories under ML: dimensionality reduction, unsupervised methods, supervised methods, and deep learning (DL). We review the preferred use cases of these algorithms in analyses of cancer epigenomics data with the hope to provide an overview of how ML approaches can be used to explore fundamental questions on the roles of epigenome in cancer biology and medicine.
Affiliation(s)
- Emre Arslan
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
| | - Jonathan Schulz
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
| | - Kunal Rai
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America.
21
Withnell E, Zhang X, Sun K, Guo Y. XOmiVAE: an interpretable deep learning model for cancer classification using high-dimensional omics data. Brief Bioinform 2021; 22:bbab315. [PMID: 34402865 PMCID: PMC8575033 DOI: 10.1093/bib/bbab315] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/26/2021] [Revised: 07/04/2021] [Accepted: 07/20/2021] [Indexed: 12/26/2022] Open
Abstract
The lack of explainability is one of the most prominent disadvantages of deep learning applications in omics. This 'black box' problem can undermine the credibility and limit the practical implementation of biomedical deep learning models. Here we present XOmiVAE, a variational autoencoder (VAE)-based interpretable deep learning model for cancer classification using high-dimensional omics data. XOmiVAE is capable of revealing the contribution of each gene and latent dimension for each classification prediction and the correlation between each gene and each latent dimension. It is also demonstrated that XOmiVAE can explain not only the supervised classification but also the unsupervised clustering results from the deep learning network. To the best of our knowledge, XOmiVAE is one of the first activation level-based interpretable deep learning models explaining novel clusters generated by VAE. The explainable results generated by XOmiVAE were validated by both the performance of downstream tasks and the biomedical knowledge. In our experiments, XOmiVAE explanations of deep learning-based cancer classification and clustering aligned with current domain knowledge including biological annotation and academic literature, which shows great potential for novel biomedical knowledge discovery from deep learning models.
Affiliation(s)
- Eloise Withnell
- Data Science Institute Imperial College London, SW7 2AZ London, UK
- Department of Health Informatics University College London, WC1E 6BT London, UK
| | - Xiaoyu Zhang
- Data Science Institute Imperial College London, SW7 2AZ London, UK
| | - Kai Sun
- Data Science Institute Imperial College London, SW7 2AZ London, UK
| | - Yike Guo
- Data Science Institute Imperial College London, SW7 2AZ London, UK
- Department of Computer Science Hong Kong Baptist University, Hong Kong China
22
Lai X, Zhou J, Wessely A, Heppt M, Maier A, Berking C, Vera J, Zhang L. A disease network-based deep learning approach for characterizing melanoma. Int J Cancer 2021; 150:1029-1044. [PMID: 34716589 DOI: 10.1002/ijc.33860] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/26/2021] [Revised: 10/08/2021] [Accepted: 10/19/2021] [Indexed: 12/12/2022]
Abstract
Multiple types of genomic variations are present in cutaneous melanoma and some of the genomic features may have an impact on the prognosis of the disease. The access to genomics data via public repositories such as The Cancer Genome Atlas (TCGA) allows for a better understanding of melanoma at the molecular level, therefore making characterization of substantial heterogeneity in melanoma patients possible. Here, we proposed an approach that integrates genomics data, a disease network, and a deep learning model to classify melanoma patients for prognosis, assess the impact of genomic features on the classification and provide interpretation to the impactful features. We integrated genomics data into a melanoma network and applied an autoencoder model to identify subgroups in TCGA melanoma patients. The model utilizes communities identified in the network to effectively reduce the dimensionality of genomics data into a patient score profile. Based on the score profile, we identified three patient subtypes that show different survival times. Furthermore, we quantified and ranked the impact of genomic features on the patient score profile using a machine-learning technique. Follow-up analysis of the top-ranking features provided us with the biological interpretation of them at both pathway and molecular levels, such as their mutation and interactome profiles in melanoma and their involvement in pathways associated with signaling transduction, immune system and cell cycle. Taken together, we demonstrated the ability of the approach to identify disease subgroups using a deep learning model that captures the most relevant information of genomics data in the melanoma network.
Affiliation(s)
- Xin Lai
- Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.,Deutsches Zentrum Immuntherapie, Erlangen, Germany.,Comprehensive Cancer Center Erlangen, Erlangen, Germany
| | - Jinfei Zhou
- College of Computer Science, Sichuan University, Chengdu, China
| | - Anja Wessely
- Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.,Deutsches Zentrum Immuntherapie, Erlangen, Germany.,Comprehensive Cancer Center Erlangen, Erlangen, Germany
| | - Markus Heppt
- Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.,Deutsches Zentrum Immuntherapie, Erlangen, Germany.,Comprehensive Cancer Center Erlangen, Erlangen, Germany
| | - Andreas Maier
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - Carola Berking
- Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.,Deutsches Zentrum Immuntherapie, Erlangen, Germany.,Comprehensive Cancer Center Erlangen, Erlangen, Germany
| | - Julio Vera
- Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.,Deutsches Zentrum Immuntherapie, Erlangen, Germany.,Comprehensive Cancer Center Erlangen, Erlangen, Germany
| | - Le Zhang
- College of Computer Science, Sichuan University, Chengdu, China
23
Tran KA, Kondrashova O, Bradley A, Williams ED, Pearson JV, Waddell N. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med 2021; 13:152. [PMID: 34579788 PMCID: PMC8477474 DOI: 10.1186/s13073-021-00968-x] [Citation(s) in RCA: 256] [Impact Index Per Article: 85.3] [Received: 12/13/2020] [Accepted: 09/12/2021] [Indexed: 12/13/2022] Open
Abstract
Deep learning is a subdiscipline of artificial intelligence that uses a machine learning technique called artificial neural networks to extract patterns and make predictions from large data sets. The increasing adoption of deep learning across healthcare domains together with the availability of highly characterised cancer datasets has accelerated research into the utility of deep learning in the analysis of the complex biology of cancer. While early results are promising, this is a rapidly evolving field with new knowledge emerging in both cancer biology and deep learning. In this review, we provide an overview of emerging deep learning techniques and how they are being applied to oncology. We focus on the deep learning applications for omics data types, including genomic, methylation and transcriptomic data, as well as histopathology-based genomic inference, and provide perspectives on how the different data types can be integrated to develop decision support tools. We provide specific examples of how deep learning may be applied in cancer diagnosis, prognosis and treatment management. We also assess the current limitations and challenges for the application of deep learning in precision oncology, including the lack of phenotypically rich data and the need for more explainable deep learning models. Finally, we conclude with a discussion of how current obstacles can be overcome to enable future clinical utilisation of deep learning.
Affiliation(s)
- Khoa A. Tran
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
- School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059 Australia
| | - Olga Kondrashova
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
| | - Andrew Bradley
- Faculty of Engineering, Queensland University of Technology (QUT), Brisbane, 4000 Australia
| | - Elizabeth D. Williams
- School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059 Australia
- Australian Prostate Cancer Research Centre - Queensland (APCRC-Q) and Queensland Bladder Cancer Initiative (QBCI), Brisbane, 4102 Australia
| | - John V. Pearson
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
| | - Nicola Waddell
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
| |
|
24
|
Monaco A, Pantaleo E, Amoroso N, Lacalamita A, Lo Giudice C, Fonzino A, Fosso B, Picardi E, Tangaro S, Pesole G, Bellotti R. A primer on machine learning techniques for genomic applications. Comput Struct Biotechnol J 2021; 19:4345-4359. [PMID: 34429852 PMCID: PMC8365460 DOI: 10.1016/j.csbj.2021.07.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/07/2021] [Revised: 07/23/2021] [Accepted: 07/23/2021] [Indexed: 11/28/2022] Open
Abstract
High throughput sequencing technologies have enabled the study of complex biological aspects at single nucleotide resolution, opening the big data era. The analysis of large volumes of heterogeneous "omic" data, however, requires novel and efficient computational algorithms based on the paradigm of Artificial Intelligence. In the present review, we introduce and describe the most common machine learning methodologies, and lately deep learning, applied to a variety of genomics tasks, trying to emphasize capabilities, strengths and limitations through a simple and intuitive language. We highlight the power of the machine learning approach in handling big data by means of a real life example, and underline how described methods could be relevant in all cases in which large amounts of multimodal genomic data are available.
Affiliation(s)
- Alfonso Monaco
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy
| | - Ester Pantaleo
- Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari "Aldo Moro", Via G. Amendola 173, 70125 Bari, Italy
| | - Nicola Amoroso
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy; Dipartimento di Farmacia - Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
| | - Antonio Lacalamita
- National Institute of Gastroenterology "S. de Bellis", Research Hospital, 70013 Castellana Grotte (Bari), Italy
| | - Claudio Lo Giudice
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
| | - Adriano Fonzino
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
| | - Bruno Fosso
- Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
| | - Ernesto Picardi
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy; Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
| | - Sabina Tangaro
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy; Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari "Aldo Moro", Bari, Via G. Amendola 165, 70125 Bari, Italy
| | - Graziano Pesole
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy; Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
| | - Roberto Bellotti
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy; Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari "Aldo Moro", Via G. Amendola 173, 70125 Bari, Italy
| |
|
25
|
Duan R, Gao L, Gao Y, Hu Y, Xu H, Huang M, Song K, Wang H, Dong Y, Jiang C, Zhang C, Jia S. Evaluation and comparison of multi-omics data integration methods for cancer subtyping. PLoS Comput Biol 2021; 17:e1009224. [PMID: 34383739 PMCID: PMC8384175 DOI: 10.1371/journal.pcbi.1009224] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 02/27/2021] [Revised: 08/24/2021] [Accepted: 06/28/2021] [Indexed: 11/18/2022] Open
Abstract
Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating these methods has become a complicated problem due to the lack of gold standards. Moreover, questions of practical importance remain to be addressed regarding the impact of selecting appropriate data types and combinations on the performance of integrative studies. Here, we constructed three classes of benchmarking datasets of nine cancers in TCGA by considering all the eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy measured by combining both clustering accuracy and clinical significance, robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most cancers under our studies, which may be of particular interest to researchers in omics data analysis.
Cancer is one of the most heterogeneous diseases, characterized by diverse morphological, phenotypic, and genomic profiles between tumors and their subtypes. Identifying cancer subtypes can help patients receive precise treatments. With the development of high-throughput technologies, genomics, epigenomics, and transcriptomics data have been generated for large cancer patient cohorts. It is believed that the more omics data we use, the more accurate the identification of cancer subtypes. To examine this assumption, we first constructed three classes of benchmarking datasets to conduct a comprehensive evaluation and comparison of ten representative multi-omics data integration methods for cancer subtyping by considering their accuracy, robustness, and computational efficiency. Then, we investigated the influence of different omics data and their various combinations on the effectiveness of cancer subtyping. Our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. We hope that our work may help researchers choose a proper method and an effective data combination when identifying cancer subtypes using data integration methods.
Affiliation(s)
- Ran Duan
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Yong Gao
- Department of Computer Science, The University of British Columbia Okanagan, Kelowna, British Columbia, Canada
| | - Yuxuan Hu
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Han Xu
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Mingfeng Huang
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Kuo Song
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Hongda Wang
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Yongqiang Dong
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Chaoqun Jiang
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Chenxing Zhang
- School of Computer Science and Technology, Xidian University, Xi’an, China
| | - Songwei Jia
- School of Computer Science and Technology, Xidian University, Xi’an, China
| |
|
26
|
Li Y, Ma L, Wu D, Chen G. Advances in bulk and single-cell multi-omics approaches for systems biology and precision medicine. Brief Bioinform 2021; 22:6189773. [PMID: 33778867 DOI: 10.1093/bib/bbab024] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 07/11/2020] [Revised: 12/31/2020] [Accepted: 01/20/2021] [Indexed: 12/13/2022] Open
Abstract
Multi-omics allows the systematic understanding of the information flow across different omics layers, while single omics can mainly reflect one aspect of the biological system. The advancement of bulk and single-cell sequencing technologies and related computational methods for multi-omics has greatly facilitated the development of systems biology and precision medicine. Single-cell approaches have the advantage of dissecting cellular dynamics and heterogeneity, whereas traditional bulk technologies are limited to individual/population-level investigation. In this review, we first summarize the technologies for producing bulk and single-cell multi-omics data. Then, we survey the computational approaches for integrative analysis of bulk and single-cell multimodal data, respectively. Moreover, the databases and data storage for multi-omics, as well as the tools for visualizing multimodal data, are summarized. We also outline the integration between bulk and single-cell data, and discuss the applications of multi-omics in precision medicine. Finally, we present the challenges and perspectives for multi-omics development.
Affiliation(s)
| | - Lu Ma
- China Normal University, China
| | | | | |
|
27
|
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023] Open
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Affiliation(s)
- Efstathios Iason Vlachavas
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Jonas Bohn
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Frank Ückert
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
| | - Sylvia Nürnberg
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
| |
|
28
|
Ghosh S, Bian J, Guo Y, Prosperi M. Deep propensity network using a sparse autoencoder for estimation of treatment effects. J Am Med Inform Assoc 2021; 28:1197-1206. [PMID: 33594415 DOI: 10.1093/jamia/ocaa346] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/27/2020] [Revised: 11/22/2020] [Accepted: 12/28/2020] [Indexed: 12/11/2022] Open
Abstract
OBJECTIVE: Drawing causal estimates from observational data is problematic, because datasets often contain underlying bias (eg, discrimination in treatment assignment). To examine causal effects, it is important to evaluate what-if scenarios, the so-called "counterfactuals." We propose a novel deep learning architecture for propensity score matching and counterfactual prediction, the deep propensity network using a sparse autoencoder (DPN-SA), to tackle the problems of high dimensionality, nonlinear/nonparallel treatment assignment, and residual confounding when estimating treatment effects. MATERIALS AND METHODS: We used 2 randomized prospective datasets, a semisynthetic one with nonlinear/nonparallel treatment selection bias and simulated counterfactual outcomes from the Infant Health and Development Program and a real-world dataset from LaLonde's employment training program. We compared different configurations of the DPN-SA against logistic regression and LASSO as well as deep counterfactual networks with propensity dropout (DCN-PD). Models' performances were assessed in terms of average treatment effects, mean squared error in precision on effect's heterogeneity, and average treatment effect on the treated, over multiple training/test runs. RESULTS: The DPN-SA outperformed logistic regression and LASSO by 36%-63%, and DCN-PD by 6%-10% across all datasets. All deep learning architectures yielded average treatment effects close to the true ones with low variance. Results were also robust to noise-injection and addition of correlated variables. Code is publicly available at https://github.com/Shantanu48114860/DPN-SAz. DISCUSSION AND CONCLUSION: Deep sparse autoencoders are particularly suited for treatment effect estimation studies using electronic health records because they can handle high-dimensional covariate sets, large sample sizes, and complex heterogeneity in treatment assignments.
Affiliation(s)
- Shantanu Ghosh
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida, USA
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
| | - Yi Guo
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
| | - Mattia Prosperi
- Department of Epidemiology, College of Public Health and Health Professions & College of Medicine, University of Florida, Gainesville, Florida, USA
| |
|
29
|
Biswas N, Chakrabarti S. Artificial Intelligence (AI)-Based Systems Biology Approaches in Multi-Omics Data Analysis of Cancer. Front Oncol 2020; 10:588221. [PMID: 33154949 PMCID: PMC7591760 DOI: 10.3389/fonc.2020.588221] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 07/28/2020] [Accepted: 09/21/2020] [Indexed: 12/13/2022] Open
Abstract
Cancer is the manifestation of abnormalities of different physiological processes involving genes, DNAs, RNAs, proteins, and other biomolecules whose profiles are reflected in different omics data types. As these bio-entities are highly correlated, integrative analysis of different types of omics data, multi-omics data, is required to understand the disease from tumorigenesis to disease progression. Artificial intelligence (AI), specifically machine learning algorithms, has the ability to make decisive interpretation of "big"-sized complex data and, hence, appears as the most effective tool for the analysis and understanding of multi-omics data for patient-specific observations. In this review, we have discussed the recent outcomes of employing AI in multi-omics data analysis of different types of cancer. Based on the research trends and significance in patient treatment, we have primarily focused on the AI-based analysis for determining cancer subtypes, disease prognosis, and therapeutic targets. We have also discussed AI analysis of some non-canonical types of omics data, as they can play a determining role in cancer patient care. Additionally, we have briefly discussed the data repositories because of their pivotal role in multi-omics data storing, processing, and analysis.
Affiliation(s)
- Nupur Biswas
- Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, IICB TRUE Campus, Kolkata, India
| | - Saikat Chakrabarti
- Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, IICB TRUE Campus, Kolkata, India
| |
|
30
|
Goldstein E, Yeghiazaryan K, Ahmad A, Giordano FA, Fröhlich H, Golubnitschaja O. Optimal multiparametric set-up modelled for best survival outcomes in palliative treatment of liver malignancies: unsupervised machine learning and 3 PM recommendations. EPMA J 2020; 11:505-515. [PMID: 32839667 PMCID: PMC7416811 DOI: 10.1007/s13167-020-00221-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/14/2020] [Accepted: 07/24/2020] [Indexed: 02/07/2023]
Abstract
Over the last decade, a rapid rise in deaths due to liver disease has been observed, especially amongst young people. Nowadays liver disease accounts for approximately 2 million deaths per year worldwide: 1 million due to complications of cirrhosis and 1 million due to viral hepatitis and hepatocellular carcinoma. Besides primary liver malignancies, almost all solid tumours are capable of spreading metastases to the liver, in particular, gastrointestinal cancers, breast and genitourinary cancers, lung cancer, melanomas and sarcomas. A large proportion of liver malignancies undergo palliative care. To this end, the paradigm of palliative care in liver cancer management is evolving from "just end of the life" care to careful evaluation of all aspects relevant for survivorship. In the presented study, an evidence-based approach has been taken to target molecular pathways and subcellular components for modelling the optimal conditions with the longest survival rates for patients diagnosed with advanced liver malignancies who underwent palliative treatments. We developed an unsupervised machine learning (UML) approach to robustly identify patient subgroups based on estimated survival curves for each individual patient and each individual potential biomarker. UML using consensus hierarchical clustering of biomarker-derived risk profiles resulted in 3 stable patient subgroups. There were no significant differences in age, gender, therapy, diagnosis or comorbidities across clusters. Survival times across clusters differed significantly. Furthermore, several of the biomarkers demonstrated highly significant pairwise differences between clusters after correction for multiple testing, namely, "comet assay" patterns of classes I, III, IV and expression rates of calgranulin A (S100), SOD2 and profilin, all measured ex vivo in circulating leucocytes.
Considering worst, intermediate and best survival curves with regard to identified clusters and corresponding patterns of parameters measured, clear differences were found for "comet assay" and S100 expression patterns. In conclusion, multi-faceted cancer control within the palliative care of liver malignancies is crucial for improved disease outcomes, including individualised patient profiling, predictive models and implementation of the corresponding cost-effective risk-mitigating measures detailed in the paper. The "proof-of-principle" model is presented.
Affiliation(s)
- Elisha Goldstein
- Machine learning research group, Department of Bioinformatics, Weizmann Institute, Rehovot, Israel
- State NRW-Israel program, Rheinische Friedrich-Wilhelms Universität Bonn, Bonn, Germany
| | - Kristina Yeghiazaryan
- IT-Department, University Hospital Bonn, Rheinische Friedrich-Wilhelms Universität Bonn, Bonn, Germany
| | - Ashar Ahmad
- AI & Data Science, Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53754 Sankt Augustin, Germany
- Bonn-Aachen International Centre for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115 Bonn, Germany
| | - Frank A. Giordano
- Department of Radiation Oncology, University Hospital Bonn, Rheinische Friedrich-Wilhelms Universität Bonn, Bonn, Germany
| | - Holger Fröhlich
- AI & Data Science, Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53754 Sankt Augustin, Germany
- Bonn-Aachen International Centre for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115 Bonn, Germany
| | - Olga Golubnitschaja
- Predictive, Preventive and Personalised (3P) Medicine, Department of Radiation Oncology, University Hospital Bonn, Rheinische Friedrich-Wilhelms Universität Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
| |
|