1
van Hilten A, van Rooij J, Ikram MA, Niessen WJ, van Meurs JBJ, Roshchupkin GV. Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data. NPJ Syst Biol Appl 2024; 10:81. [PMID: 39095438 PMCID: PMC11297229 DOI: 10.1038/s41540-024-00405-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 07/12/2024] [Indexed: 08/04/2024] Open
Abstract
Integrating multi-omics data into predictive models has the potential to enhance accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can unveil novel perspectives on the underlying biological mechanisms associated with traits and complex diseases. We tested the performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, Ntotal = 2940). In a cohort-wise cross-validation setting, the consistency of the diagnostic performance and interpretation was assessed. Performance was consistently high for predicting smoking status with an overall mean AUC of 0.95 (95% CI: 0.90-1.00) and interpretation revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions were only generalized in a single cohort with an R² of 0.07 (95% CI: 0.05-0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97-6.35) years with the genes COL11A2, AFAP1, OTUD7A, PTPRN2, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that using multi-omics networks improved performance, stability and generalizability compared to interpretable single-omics networks. We believe that visible neural networks have great potential for multi-omics analysis; they combine multi-omic data elegantly, are interpretable, and generalize well to data from different cohorts.
Affiliation(s)
- Arno van Hilten
  - Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
- Jeroen van Rooij
  - Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands
- M Arfan Ikram
  - Department of Imaging Physics, Delft University of Technology, Delft, The Netherlands
- Wiro J Niessen
  - Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
  - Department of Imaging Physics, Delft University of Technology, Delft, The Netherlands
- Joyce B J van Meurs
  - Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands
  - Department of Orthopaedics and Sports Medicine, Erasmus MC, Rotterdam, The Netherlands
- Gennady V Roshchupkin
  - Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
2
van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Designing interpretable deep learning applications for functional genomics: a quantitative analysis. Brief Bioinform 2024; 25:bbae449. [PMID: 39293804 PMCID: PMC11410376 DOI: 10.1093/bib/bbae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 08/07/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024] Open
Abstract
Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. Recognizing this importance, recent years have seen the development of numerous diverse interpretability strategies, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.
Affiliation(s)
- Arno van Hilten
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Sonja Katz
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Wiro J Niessen
  - Department of Imaging Physics, Delft University of Technology, 2628 CD Delft, The Netherlands
- Gennady V Roshchupkin
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Department of Epidemiology, Erasmus MC, 3015 GD Rotterdam, The Netherlands
3
Okwori M, Eslami A. Feature engineering from meta-data for prediction of differentially expressed genes: An investigation of Mus musculus exposed to space-conditions. Comput Biol Chem 2024; 109:108026. [PMID: 38335853 DOI: 10.1016/j.compbiolchem.2024.108026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 12/29/2023] [Accepted: 02/02/2024] [Indexed: 02/12/2024]
Abstract
Transcription profiling is a key process that can reveal the biological mechanisms driving the response to various exposure conditions or gene perturbations. In this work, we investigate the prediction of differentially expressed genes (DEGs) when exposed to conditions in space from a set of diverse engineered features. To do this, we collected DEGs and non-differentially expressed genes (NDEGs) from Mus musculus-based experiments in the GeneLab database. We engineered a diverse set of features from factors reported in the literature to affect gene expression. An extreme gradient boosting (XGBoost) model was trained to predict whether a given gene would be differentially expressed at various levels of differential expression. The test results on a separate holdout dataset showed an area under the receiver operating characteristic curve (AUC) of 0.90±0.07, averaged across the five selected percentages of the most and least differentially expressed genes. Subsequently, we investigated the impact of feature selection, both individually with a correlation-based feature-selection procedure and in groups with a combination procedure, on the prediction performance. The feature selection confirmed some known drivers of adaptation to radiation and highlighted some new transcription factors and microRNAs (miRNAs). Finally, gene ontology (GO) analysis revealed biological processes that tend to have expression patterns most suitable for this approach. This work highlights the potential of detecting differentially expressed genes using a machine learning (ML) approach, and provides some evidence of gene expression changes being captured by a diverse feature set not related to the condition under study.
Affiliation(s)
- Michael Okwori
  - Department of Electrical, Computer and Biomedical Engineering, Union College, Schenectady, 12308, NY, United States of America
- Ali Eslami
  - Department of Electrical and Computer Engineering, Wichita State University, Wichita, 67260, KS, United States of America
4
Klauschen F, Dippel J, Keyl P, Jurmeister P, Bockmayr M, Mock A, Buchstab O, Alber M, Ruff L, Montavon G, Müller KR. Toward Explainable Artificial Intelligence for Precision Pathology. Annu Rev Pathol 2024; 19:541-570. [PMID: 37871132 DOI: 10.1146/annurev-pathmechdis-051222-113147] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 10/25/2023]
Abstract
The rapid development of precision medicine in recent years has started to challenge diagnostic pathology with respect to its ability to analyze histological images and increasingly large molecular profiling data in a quantitative, integrative, and standardized way. Artificial intelligence (AI) and, more precisely, deep learning technologies have recently demonstrated the potential to facilitate complex data analysis tasks, including clinical, histological, and molecular data for disease classification; tissue biomarker quantification; and clinical outcome prediction. This review provides a general introduction to AI and describes recent developments with a focus on applications in diagnostic pathology and beyond. We explain limitations including the black-box character of conventional AI and describe solutions to make machine learning decisions more transparent with so-called explainable AI. The purpose of the review is to foster a mutual understanding of both the biomedical and the AI side. To that end, in addition to providing an overview of the relevant foundations in pathology and machine learning, we present worked-through examples for a better practical understanding of what AI can achieve and how it should be done.
Affiliation(s)
- Frederick Klauschen
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
  - Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Jonas Dippel
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Philipp Keyl
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Philipp Jurmeister
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
  - German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Michael Bockmayr
  - Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Andreas Mock
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
  - German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Oliver Buchstab
  - Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Maximilian Alber
  - Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Aignostics, Berlin, Germany
- Grégoire Montavon
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Klaus-Robert Müller
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
  - Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Seoul, Korea
  - Max Planck Institute for Informatics, Saarbrücken, Germany
5
Cheng KP, Shen WX, Jiang YY, Chen Y, Chen YZ, Tan Y. Deep learning of 2D-Restructured gene expression representations for improved low-sample therapeutic response prediction. Comput Biol Med 2023; 164:107245. [PMID: 37480677 DOI: 10.1016/j.compbiomed.2023.107245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 06/27/2023] [Accepted: 07/07/2023] [Indexed: 07/24/2023]
Abstract
Clinical outcome prediction is important for stratified therapeutics. Machine learning (ML) and deep learning (DL) methods facilitate therapeutic response prediction from transcriptomic profiles of cells and clinical samples. Clinical transcriptomic DL is challenged by the small sample sizes (34-286 subjects), high dimensionality (up to 21,653 genes) and unordered nature of clinical transcriptomic data. The established methods rely on ML algorithms at accuracy levels of 0.6-0.8 AUC/ACC values. Low-sample DL algorithms are needed for enhanced prediction capability. Here, an unsupervised manifold-guided algorithm was employed for restructuring transcriptomic data into ordered image-like 2D-representations, followed by efficient DL of these 2D-representations with deep ConvNets. Our DL models significantly outperformed the state-of-the-art (SOTA) ML models on 82% of 17 low-sample benchmark datasets (53% with >0.05 AUC/ACC improvement). They are more robust than the SOTA models in cross-cohort prediction tasks, and in identifying robust biomarkers and response-dependent variational patterns consistent with experimental indications.
Affiliation(s)
- Kai Ping Cheng
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China
  - Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China
- Wan Xiang Shen
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Center for Computational Science and Engineering, National University of Singapore, 117543, Singapore
- Yu Yang Jiang
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, PR China
- Yan Chen
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China
- Yu Zong Chen
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China
  - Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China
- Ying Tan
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China
  - The Institute of Drug Discovery Technology, Ningbo University, Ningbo, 315211, PR China
  - Shenzhen Kivita Innovative Drug Discovery Institute, Shenzhen, 518110, PR China
6
Beaude A, Rafiee Vahid M, Augé F, Zehraoui F, Hanczar B. AttOmics: attention-based architecture for diagnosis and prognosis from omics data. Bioinformatics 2023; 39:i94-i102. [PMID: 37387182 DOI: 10.1093/bioinformatics/btad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: The increasing availability of high-throughput omics data allows for considering a new medicine centered on individual patients. Precision medicine relies on exploiting these high-throughput data with machine-learning models, especially the ones based on deep-learning approaches, to improve diagnosis. Due to the high-dimensional small-sample nature of omics data, current deep-learning models end up with many parameters and have to be fitted with a limited training set. Furthermore, interactions between molecular entities inside an omics profile are not patient specific but are the same for all patients. RESULTS: In this article, we propose AttOmics, a new deep-learning architecture based on the self-attention mechanism. First, we decompose each omics profile into a set of groups, where each group contains related features. Then, by applying the self-attention mechanism to the set of groups, we can capture the different interactions specific to a patient. The results of different experiments carried out in this article show that our model can accurately predict the phenotype of a patient with fewer parameters than deep neural networks. Visualizing the attention maps can provide new insights into the essential groups for a particular phenotype. AVAILABILITY AND IMPLEMENTATION: The code and data are available at https://forge.ibisc.univ-evry.fr/abeaude/AttOmics. TCGA data can be downloaded from the Genomic Data Commons Data Portal.
Affiliation(s)
- Aurélien Beaude
  - IBISC, Université Paris-Saclay, Univ Evry, 23 Boulevard de France, Evry-Courcouronnes 91020, France
  - Artificial Intelligence & Deep Analytics, Omics Data Science, Sanofi R&D Data and Data Science, 1 Av. Pierre Brossolette, Chilly-Mazarin 91385, France
- Milad Rafiee Vahid
  - Sanofi R&D Data and Data Science, Artificial Intelligence & Deep Analytics, Omics Data Science, 450 Water Street, Cambridge, MA 02142, United States
- Franck Augé
  - Artificial Intelligence & Deep Analytics, Omics Data Science, Sanofi R&D Data and Data Science, 1 Av. Pierre Brossolette, Chilly-Mazarin 91385, France
- Farida Zehraoui
  - IBISC, Université Paris-Saclay, Univ Evry, 23 Boulevard de France, Evry-Courcouronnes 91020, France
- Blaise Hanczar
  - IBISC, Université Paris-Saclay, Univ Evry, 23 Boulevard de France, Evry-Courcouronnes 91020, France
7
Silva MC, Eugénio P, Faria D, Pesquita C. Ontologies and Knowledge Graphs in Oncology Research. Cancers (Basel) 2022; 14:1906. [PMID: 35454813 PMCID: PMC9029532 DOI: 10.3390/cancers14081906] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/08/2022] [Revised: 03/25/2022] [Accepted: 04/07/2022] [Indexed: 11/16/2022] Open
Abstract
The complexity of cancer research stems from leaning on several biomedical disciplines for relevant sources of data, many of which are complex in their own right. A holistic view of cancer—which is critical for precision medicine approaches—hinges on integrating a variety of heterogeneous data sources under a cohesive knowledge model, a role which biomedical ontologies can fill. This study reviews the application of ontologies and knowledge graphs in cancer research. In total, our review encompasses 141 published works, which we categorized under 14 hierarchical categories according to their usage of ontologies and knowledge graphs. We also review the most commonly used ontologies and newly developed ones. Our review highlights the growing traction of ontologies in biomedical research in general, and cancer research in particular. Ontologies enable data accessibility, interoperability and integration, support data analysis, facilitate data interpretation and data mining, and more recently, with the emergence of the knowledge graph paradigm, support the application of Artificial Intelligence methods to unlock new knowledge from a holistic view of the available large volumes of heterogeneous data.
8
Bourgeais V, Zehraoui F, Hanczar B. GraphGONet: a self-explaining neural network encapsulating the Gene Ontology graph for phenotype prediction on gene expression. Bioinformatics 2022; 38:2504-2511. [PMID: 35266505 DOI: 10.1093/bioinformatics/btac147] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/21/2021] [Revised: 02/02/2022] [Accepted: 03/07/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Medical care is becoming more and more specific to patients' needs due to the increased availability of omics data. The application to these data of sophisticated machine learning models, in particular deep learning, can improve the field of precision medicine. However, their use in clinics is limited as their predictions are not accompanied by an explanation. The production of accurate and intelligible predictions can benefit from the inclusion of domain knowledge. Therefore, knowledge-based deep learning models appear to be a promising solution. RESULTS: In this paper, we propose GraphGONet, where the Gene Ontology is encapsulated in the hidden layers of a new self-explaining neural network. Each neuron in the layers represents a biological concept, combining the gene expression profile of a patient, and the information from its neighboring neurons. The experiments described in the paper confirm that our model not only performs as accurately as the state-of-the-art (non-explainable ones) but also automatically produces stable and intelligible explanations composed of the biological concepts with the highest contribution. This feature allows experts to use our tool in a medical setting. AVAILABILITY: GraphGONet is freely available at https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Victoria Bourgeais
  - IBISC, Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes, 91020, France
- Farida Zehraoui
  - IBISC, Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes, 91020, France
- Blaise Hanczar
  - IBISC, Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes, 91020, France