1
Wang FA, Li Y, Zeng T. Deep Learning of radiology-genomics integration for computational oncology: A mini review. Comput Struct Biotechnol J 2024; 23:2708-2716. [PMID: 39035833 PMCID: PMC11260400 DOI: 10.1016/j.csbj.2024.06.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 06/18/2024] [Accepted: 06/18/2024] [Indexed: 07/23/2024] Open
Abstract
In computational oncology, patient status is often assessed using radiology-genomics, which combines two key technologies and their data: radiology and genomics. Recent advances in deep learning have facilitated the integration of radiology-genomics data, and even new omics data, significantly improving the robustness and accuracy of clinical predictions. These advances are driving artificial intelligence (AI) closer to practical clinical applications. In particular, deep learning models are crucial in identifying new radiology-genomics biomarkers and therapeutic targets, supported by explainable AI (xAI) methods. This review focuses on recent developments in deep learning for radiology-genomics integration, highlights current challenges, and outlines some research directions for multimodal integration and biomarker discovery of radiology-genomics or radiology-omics that are urgently needed in computational oncology.
Affiliation(s)
- Feng-ao Wang
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- Yixue Li
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- Guangzhou National Laboratory, Guangzhou, China
- GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China
- Tao Zeng
- Guangzhou National Laboratory, Guangzhou, China
- GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China
2
Ma W, Tang W, Kwok JS, Tong AH, Lo CW, Chu AT, Chung BH. A review on trends in development and translation of omics signatures in cancer. Comput Struct Biotechnol J 2024; 23:954-971. [PMID: 38385061 PMCID: PMC10879706 DOI: 10.1016/j.csbj.2024.01.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 01/31/2024] [Accepted: 01/31/2024] [Indexed: 02/23/2024] Open
Abstract
The field of cancer genomics and transcriptomics has evolved from targeted profiling to swift sequencing of individual tumor genomes and transcriptomes. The steady growth in genome, epigenome, and transcriptome datasets on a genome-wide scale has significantly increased our capability to capture signatures that represent both the intrinsic and extrinsic biological features of tumors. These biological differences can help in precise molecular subtyping of cancer and in predicting tumor progression, metastatic potential, and resistance to therapeutic agents. In this review, we summarize the current development of genomic, methylomic, transcriptomic, proteomic and metabolic signatures in the field of cancer research and highlight their potential in clinical applications to improve diagnosis, prognosis, and treatment decisions in cancer patients.
Affiliation(s)
- Wei Ma
- Hong Kong Genome Institute, Hong Kong, China
- Wenshu Tang
- Hong Kong Genome Institute, Hong Kong, China
- Brian H.Y. Chung
- Hong Kong Genome Institute, Hong Kong, China
- Department of Pediatrics and Adolescent Medicine, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Hong Kong Genome Project
- Hong Kong Genome Institute, Hong Kong, China
- Department of Pediatrics and Adolescent Medicine, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
3
van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Designing interpretable deep learning applications for functional genomics: a quantitative analysis. Brief Bioinform 2024; 25:bbae449. [PMID: 39293804 PMCID: PMC11410376 DOI: 10.1093/bib/bbae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 08/07/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024] Open
Abstract
Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. Recognizing this importance, recent years have seen the development of numerous diverse interpretability strategies, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.
Affiliation(s)
- Arno van Hilten
- Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Sonja Katz
- Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Wiro J Niessen
- Department of Imaging Physics, Delft University of Technology, 2628 CD Delft, The Netherlands
- Gennady V Roshchupkin
- Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Department of Epidemiology, Erasmus MC, 3015 GD Rotterdam, The Netherlands
4
Valous NA, Popp F, Zörnig I, Jäger D, Charoentong P. Graph machine learning for integrated multi-omics analysis. Br J Cancer 2024; 131:205-211. [PMID: 38729996 PMCID: PMC11263675 DOI: 10.1038/s41416-024-02706-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 04/25/2024] [Accepted: 04/26/2024] [Indexed: 05/12/2024] Open
Abstract
Multi-omics experiments at bulk or single-cell resolution facilitate the discovery of hypothesis-generating biomarkers for predicting response to therapy, as well as aid in uncovering mechanistic insights into cellular and microenvironmental processes. Many methods for data integration have been developed for the identification of key elements that explain or predict disease risk or other biological outcomes. The heterogeneous graph representation of multi-omics data provides an advantage for discerning patterns suitable for predictive/exploratory analysis, thus permitting the modeling of complex relationships. Graph-based approaches, including graph neural networks, potentially offer a reliable methodological toolset that can provide a tangible alternative to scientists and clinicians who seek ideas and implementation strategies in the integrated analysis of their omics sets for biomedical research. Graph-based workflows continue to push the limits of the technological envelope, and this perspective provides a focused literature review of research articles in which graph machine learning is utilized for integrated multi-omics data analyses, with several examples that demonstrate the effectiveness of graph-based approaches.
Affiliation(s)
- Nektarios A Valous
- Applied Tumor Immunity Clinical Cooperation Unit, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany.
- Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany.
- Ferdinand Popp
- Applied Tumor Immunity Clinical Cooperation Unit, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
- Inka Zörnig
- Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital (UKHD), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
- Dirk Jäger
- Applied Tumor Immunity Clinical Cooperation Unit, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
- Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital (UKHD), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
- Pornpimol Charoentong
- Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital (UKHD), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
5
Ma W, Li M, Chu Z, Chen H. Smart Biosensor for Breast Cancer Survival Prediction Based on Multi-View Multi-Way Graph Learning. Sensors (Basel, Switzerland) 2024; 24:3289. [PMID: 38894082 PMCID: PMC11174864 DOI: 10.3390/s24113289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 05/17/2024] [Accepted: 05/19/2024] [Indexed: 06/21/2024]
Abstract
Biosensors play a crucial role in detecting cancer signals by orchestrating a series of intricate biological and physical transduction processes. Among various cancers, breast cancer stands out due to its genetic underpinnings, which trigger uncontrolled cell proliferation, predominantly impacting women, and resulting in significant mortality rates. The utilization of biosensors in predicting survival time becomes paramount in formulating an optimal treatment strategy. However, conventional biosensors employing traditional machine learning methods encounter challenges in preprocessing features for the learning task. Despite the potential of deep learning techniques to automatically extract useful features, they often struggle to effectively leverage the intricate relationships between features and instances. To address this challenge, our study proposes a novel smart biosensor architecture that integrates a multi-view multi-way graph learning (MVMWGL) approach for predicting breast cancer survival time. This innovative approach enables the assimilation of insights from gene interactions and biosensor similarities. By leveraging real-world data, we conducted comprehensive evaluations, and our experimental results unequivocally demonstrate the superiority of the MVMWGL approach over existing methods.
Affiliation(s)
- Wenming Ma
- School of Computer and Control Engineering, Yantai University, Yantai 264005, China; (M.L.); (Z.C.); (H.C.)
6
Chereda H, Leha A, Beißbarth T. Stable feature selection utilizing Graph Convolutional Neural Network and Layer-wise Relevance Propagation for biomarker discovery in breast cancer. Artif Intell Med 2024; 151:102840. [PMID: 38658129 DOI: 10.1016/j.artmed.2024.102840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 03/05/2024] [Accepted: 03/10/2024] [Indexed: 04/26/2024]
Abstract
High-throughput technologies are becoming increasingly important in discovering prognostic biomarkers and in identifying novel drug targets. With Mammaprint, Oncotype DX, and many other prognostic molecular signatures, breast cancer is one of the paradigmatic examples of the utility of high-throughput data to deliver prognostic biomarkers that can be represented in the form of a rather short gene list. Such gene lists can be obtained as a set of features (genes) that are important for the decisions of a Machine Learning (ML) method applied to high-dimensional gene expression data. Several studies have identified predictive gene lists for patient prognosis in breast cancer, but these lists are unstable and have only a few genes in common. Instability of feature selection impedes biological interpretability: genes that are relevant for cancer pathology should be members of any predictive gene list obtained for the same clinical type of patients. Stability and interpretability of selected features can be improved by including information on molecular networks in ML methods. Graph Convolutional Neural Network (GCNN) is a contemporary deep learning approach applicable to gene expression data structured by a prior knowledge molecular network. Layer-wise Relevance Propagation (LRP) and SHapley Additive exPlanations (SHAP) are methods to explain individual decisions of deep learning models. We used both GCNN+LRP and GCNN+SHAP techniques to construct feature sets by aggregating individual explanations. We suggest a methodology to systematically and quantitatively analyze the stability, the impact on the classification performance, and the interpretability of the selected feature sets. We used this methodology to compare GCNN+LRP to GCNN+SHAP and to more classical ML-based feature selection approaches. Utilizing a large breast cancer gene expression dataset, we show that, while feature selection with SHAP is useful in applications where selected features have to be impactful for classification performance, among all studied methods GCNN+LRP delivers the most stable (reproducible) and interpretable gene lists.
Affiliation(s)
- Hryhorii Chereda
- Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany
- Andreas Leha
- Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany; Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, Göttingen, 37073, Germany; Scientific Core Facility Medical Biometry and Statistical Bioinformatics, University Medical Center Göttingen, Humboldtallee 32, Göttingen, 37073, Germany
- Tim Beißbarth
- Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany; Campus-Institute Data Science (CIDAS), University of Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany.
7
Yao X, Ouyang S, Lian Y, Peng Q, Zhou X, Huang F, Hu X, Shi F, Xia J. PheSeq, a Bayesian deep learning model to enhance and interpret the gene-disease association studies. Genome Med 2024; 16:56. [PMID: 38627848 PMCID: PMC11020195 DOI: 10.1186/s13073-024-01330-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 04/02/2024] [Indexed: 04/19/2024] Open
Abstract
Despite the abundance of genotype-phenotype association studies, the resulting association outcomes often lack robustness and interpretability. To address these challenges, we introduce PheSeq, a Bayesian deep learning model that enhances and interprets association studies through the integration and perception of phenotype descriptions. By implementing the PheSeq model in three case studies on Alzheimer's disease, breast cancer, and lung cancer, we identify 1024 priority genes for Alzheimer's disease and 818 and 566 genes for breast cancer and lung cancer, respectively. Benefiting from data fusion, these findings show moderate positive rates, high recall rates, and interpretability in gene-disease association studies.
Affiliation(s)
- Xinzhi Yao
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Sizhuo Ouyang
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Yulong Lian
- College of Science, Huazhong Agricultural University, Wuhan, China
- Qianqian Peng
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Xionghui Zhou
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Feier Huang
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xuehai Hu
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Feng Shi
- College of Science, Huazhong Agricultural University, Wuhan, China
- Jingbo Xia
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China.
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China.
8
Yan H, Weng D, Li D, Gu Y, Ma W, Liu Q. Prior knowledge-guided multilevel graph neural network for tumor risk prediction and interpretation via multi-omics data integration. Brief Bioinform 2024; 25:bbae184. [PMID: 38670157 PMCID: PMC11052635 DOI: 10.1093/bib/bbae184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 04/06/2024] [Indexed: 04/28/2024] Open
Abstract
The interrelation and complementary nature of multi-omics data can provide valuable insights into the intricate molecular mechanisms underlying diseases. However, challenges such as limited sample size, high data dimensionality and differences in omics modalities pose significant obstacles to fully harnessing the potential of these data. Prior knowledge, such as gene regulatory networks and pathway information, harbors useful gene-gene interaction and gene functional module information. To effectively integrate multi-omics data and make full use of the prior knowledge, here, we propose a Multilevel-graph neural network (GNN): a hierarchically designed deep learning algorithm that sequentially leverages multi-omics data, gene regulatory networks and pathway information to extract features and enhance accuracy in predicting survival risk. Our method achieved better accuracy compared with existing methods. Furthermore, key factors nonlinearly associated with the tumor pathogenesis are prioritized by employing two interpretation algorithms (i.e. GNN-Explainer and IGscore) for neural networks, at gene and pathway level, respectively. The top genes and pathways exhibit strong associations with disease in survival analyses, and many of them, such as SEC61G and CYP27B1, have been previously reported in the literature.
Affiliation(s)
- Hongxi Yan
- Department of Computer Science, Beihang University, XueYuan Road, 100191, BeiJing, China
- Dawei Weng
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Dongguo Li
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Yu Gu
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Wenji Ma
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, 200025, Shanghai, China
- Qingjie Liu
- Department of Computer Science, Beihang University, XueYuan Road, 100191, BeiJing, China
9
Klauschen F, Dippel J, Keyl P, Jurmeister P, Bockmayr M, Mock A, Buchstab O, Alber M, Ruff L, Montavon G, Müller KR. [Explainable artificial intelligence in pathology]. Pathologie (Heidelberg, Germany) 2024; 45:133-139. [PMID: 38315198 DOI: 10.1007/s00292-024-01308-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/19/2024] [Indexed: 02/07/2024]
Abstract
With the advancements in precision medicine, the demands on pathological diagnostics have increased, requiring standardized, quantitative, and integrated assessments of histomorphological and molecular pathological data. Great hopes are placed in artificial intelligence (AI) methods, which have demonstrated the ability to analyze complex clinical, histological, and molecular data for disease classification, biomarker quantification, and prognosis estimation. This paper provides an overview of the latest developments in pathology AI, discusses the limitations, particularly concerning the black box character of AI, and describes solutions to make decision processes more transparent using methods of so-called explainable AI (XAI).
Affiliation(s)
- Frederick Klauschen
- Institute of Pathology, Ludwig-Maximilians-Universität München, Thalkirchner Str. 36, 80337, Munich, Germany.
- Institute of Pathology, Charité - Universitätsmedizin Berlin, Berlin, Germany.
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
- German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany.
- Jonas Dippel
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Philipp Keyl
- Institute of Pathology, Ludwig-Maximilians-Universität München, Thalkirchner Str. 36, 80337, Munich, Germany
- Philipp Jurmeister
- Institute of Pathology, Ludwig-Maximilians-Universität München, Thalkirchner Str. 36, 80337, Munich, Germany
- German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Michael Bockmayr
- Institute of Pathology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Andreas Mock
- Institute of Pathology, Ludwig-Maximilians-Universität München, Thalkirchner Str. 36, 80337, Munich, Germany
- German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Oliver Buchstab
- Institute of Pathology, Ludwig-Maximilians-Universität München, Thalkirchner Str. 36, 80337, Munich, Germany
- Maximilian Alber
- Institute of Pathology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Aignostics GmbH, Berlin, Germany
- Grégoire Montavon
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Klaus-Robert Müller
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany.
- Department of Artificial Intelligence, Korea University, Seoul, South Korea.
- Max Planck Institute for Informatics, Saarbrücken, Germany.
- Machine Learning/Intelligent Data Analysis (IDA), Technische Universität Berlin, Marchstr. 23, 10587, Berlin, Germany.
10
Klauschen F, Dippel J, Keyl P, Jurmeister P, Bockmayr M, Mock A, Buchstab O, Alber M, Ruff L, Montavon G, Müller KR. Toward Explainable Artificial Intelligence for Precision Pathology. Annual Review of Pathology 2024; 19:541-570. [PMID: 37871132 DOI: 10.1146/annurev-pathmechdis-051222-113147] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 10/25/2023]
Abstract
The rapid development of precision medicine in recent years has started to challenge diagnostic pathology with respect to its ability to analyze histological images and increasingly large molecular profiling data in a quantitative, integrative, and standardized way. Artificial intelligence (AI) and, more precisely, deep learning technologies have recently demonstrated the potential to facilitate complex data analysis tasks, including clinical, histological, and molecular data for disease classification; tissue biomarker quantification; and clinical outcome prediction. This review provides a general introduction to AI and describes recent developments with a focus on applications in diagnostic pathology and beyond. We explain limitations including the black-box character of conventional AI and describe solutions to make machine learning decisions more transparent with so-called explainable AI. The purpose of the review is to foster a mutual understanding of both the biomedical and the AI side. To that end, in addition to providing an overview of the relevant foundations in pathology and machine learning, we present worked-through examples for a better practical understanding of what AI can achieve and how it should be done.
Affiliation(s)
- Frederick Klauschen
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Jonas Dippel
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Philipp Keyl
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Philipp Jurmeister
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Michael Bockmayr
- Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Andreas Mock
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Oliver Buchstab
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Maximilian Alber
- Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Aignostics, Berlin, Germany
- Grégoire Montavon
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Klaus-Robert Müller
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Seoul, Korea
- Max Planck Institute for Informatics, Saarbrücken, Germany
11
Brouard C, Mourad R, Vialaneix N. Should we really use graph neural networks for transcriptomic prediction? Brief Bioinform 2024; 25:bbae027. [PMID: 38349060 PMCID: PMC10939369 DOI: 10.1093/bib/bbae027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 12/20/2023] [Accepted: 01/17/2024] [Indexed: 02/15/2024] Open
Abstract
The recent development of deep learning methods has undoubtedly led to great improvement in various machine learning tasks, especially in prediction tasks. This type of method has also been adapted to answer various problems in bioinformatics, including automatic genome annotation, artificial genome generation or phenotype prediction. In particular, a specific type of deep learning method, called graph neural network (GNN), has repeatedly been reported as a good candidate to predict phenotypes from gene expression because of its ability to embed information on gene regulation or co-expression through the use of a gene network. However, to date, no complete and reproducible benchmark has been performed to analyze the trade-off between cost and benefit of this approach compared to more standard (and simpler) machine learning methods. In this article, we provide such a benchmark, based on clear and comparable policies to evaluate the different methods on several datasets. Our conclusion is that GNN rarely provides a real improvement in prediction performance, especially when compared to the computation effort required by the methods. Our findings on a limited but controlled simulated dataset show that this could be explained by the limited quality or predictive power of the input biological gene network itself.
Affiliation(s)
- Céline Brouard
- Université Fédérale de Toulouse, INRAE, MIAT, 31326 Castanet-Tolosan, France
| | - Raphaël Mourad
- Université Fédérale de Toulouse, INRAE, MIAT, 31326 Castanet-Tolosan, France
- Université Paul Sabatier, 31062 Toulouse, France
| | - Nathalie Vialaneix
- Université Fédérale de Toulouse, INRAE, MIAT, 31326 Castanet-Tolosan, France
| |
|
12
|
Somers J, Fenner M, Kong G, Thirumalaisamy D, Yashar WM, Thapa K, Kinali M, Nikolova O, Babur Ö, Demir E. A framework for considering prior information in network-based approaches to omics data analysis. Proteomics 2023; 23:e2200402. [PMID: 37986684 DOI: 10.1002/pmic.202200402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 09/20/2023] [Accepted: 09/21/2023] [Indexed: 11/22/2023]
Abstract
For decades, molecular biologists have been uncovering the mechanics of biological systems. Efforts to bring their findings together have led to the development of multiple databases and information systems that capture and present pathway information in a computable network format. Concurrently, the advent of modern omics technologies has empowered researchers to systematically profile cellular processes across different modalities. Numerous algorithms, methodologies, and tools have been developed to use prior knowledge networks (PKNs) in the analysis of omics datasets. Interestingly, it has been repeatedly demonstrated that the source of prior knowledge can greatly impact the results of a given analysis. For these methods to be successful it is paramount that their selection of PKNs is amenable to the data type and the computational task they aim to accomplish. Here we present a five-level framework that broadly describes network models in terms of their scope, level of detail, and ability to inform causal predictions. To contextualize this framework, we review a handful of network-based omics analysis methods at each level, while also describing the computational tasks they aim to accomplish.
Affiliation(s)
- Julia Somers
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
| | - Madeleine Fenner
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
| | - Garth Kong
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
- Division of Oncological Sciences, Oregon Health and Science University, Portland, Oregon, USA
| | - Dharani Thirumalaisamy
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
| | - William M Yashar
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
- Division of Oncological Sciences, Oregon Health and Science University, Portland, Oregon, USA
| | - Kisan Thapa
- Computer Science Department, University of Massachusetts Boston, College of Science and Mathematics, Boston, Massachusetts, USA
| | - Meric Kinali
- Computer Science Department, University of Massachusetts Boston, College of Science and Mathematics, Boston, Massachusetts, USA
| | - Olga Nikolova
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
- Division of Oncological Sciences, Oregon Health and Science University, Portland, Oregon, USA
| | - Özgün Babur
- Computer Science Department, University of Massachusetts Boston, College of Science and Mathematics, Boston, Massachusetts, USA
| | - Emek Demir
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
| |
|
13
|
Pfeifer B, Chereda H, Martin R, Saranti A, Clemens S, Hauschild AC, Beißbarth T, Holzinger A, Heider D. Ensemble-GNN: federated ensemble learning with graph neural networks for disease module discovery and classification. Bioinformatics 2023; 39:btad703. [PMID: 37988152 PMCID: PMC10684359 DOI: 10.1093/bioinformatics/btad703] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/13/2023] [Revised: 09/06/2023] [Accepted: 11/20/2023] [Indexed: 11/22/2023] Open
Abstract
SUMMARY: Federated learning enables collaboration in medicine, where data is scattered across multiple centers, without the need to aggregate the data in a central cloud. While, in general, machine learning models can be applied to a wide range of data types, graph neural networks (GNNs) are developed specifically for graphs, which are very common in the biomedical domain. For instance, a patient can be represented by a protein-protein interaction (PPI) network where the nodes contain the patient-specific omics features. Here, we present our Ensemble-GNN software package, which can be used to deploy federated, ensemble-based GNNs in Python. Ensemble-GNN allows users to quickly build predictive models utilizing PPI networks consisting of various node features such as gene expression and/or DNA methylation. As an example, we show the results from a public dataset of 981 patients and 8469 genes from The Cancer Genome Atlas (TCGA). AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/pievos101/Ensemble-GNN, and the data at Zenodo (DOI: 10.5281/zenodo.8305122).
Affiliation(s)
- Bastian Pfeifer
- Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz 8036, Austria
| | - Hryhorii Chereda
- Medical Bioinformatics, University Medical Center Göttingen, Göttingen 37077, Germany
| | - Roman Martin
- Data Science in Biomedicine, Department of Mathematics and Computer Science, University of Marburg, Marburg 35043, Germany
| | - Anna Saranti
- Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz 8036, Austria
- Human-Centered AI Lab, University of Natural Resources and Life Sciences, Vienna 1190, Austria
| | - Sandra Clemens
- Data Science in Biomedicine, Department of Mathematics and Computer Science, University of Marburg, Marburg 35043, Germany
| | - Anne-Christin Hauschild
- Institute for Medical Informatics, University Medical Center Göttingen, Göttingen 37075, Germany
| | - Tim Beißbarth
- Medical Bioinformatics, University Medical Center Göttingen, Göttingen 37077, Germany
| | - Andreas Holzinger
- Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz 8036, Austria
- Human-Centered AI Lab, University of Natural Resources and Life Sciences, Vienna 1190, Austria
| | - Dominik Heider
- Data Science in Biomedicine, Department of Mathematics and Computer Science, University of Marburg, Marburg 35043, Germany
| |
|
14
|
Li Y, Zhang SW, Xie MY, Zhang T. PhenoDriver: interpretable framework for studying personalized phenotype-associated driver genes in breast cancer. Brief Bioinform 2023; 24:bbad291. [PMID: 37738403 DOI: 10.1093/bib/bbad291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 07/12/2023] [Accepted: 07/27/2023] [Indexed: 09/24/2023] Open
Abstract
Identifying personalized cancer driver genes and further revealing their oncogenic mechanisms is critical for understanding the mechanisms of cell transformation and aiding clinical diagnosis. Almost all existing methods focus on identifying driver genes at the cohort or individual level but fail to uncover their underlying oncogenic mechanisms. To fill this gap, we present an interpretable framework, PhenoDriver, to identify personalized cancer driver genes, elucidate their roles in cancer development and uncover the association between driver genes and clinical phenotypic alterations. By analyzing 988 breast cancer patients, we demonstrate the outstanding performance of PhenoDriver in identifying breast cancer driver genes at the cohort level compared to other state-of-the-art methods. In addition, PhenoDriver can effectively identify driver genes with both recurrent and rare mutations in individual patients. We further explore and reveal the oncogenic mechanisms of some known and unknown breast cancer driver genes (e.g. TP53, MAP3K1, HTT, etc.) identified by PhenoDriver, and construct the subnetworks through which they regulate abnormal clinical phenotypes. Notably, most of our findings are consistent with existing biological knowledge. Based on the personalized driver profiles, we discover two existing and one unreported breast cancer subtypes and uncover their molecular mechanisms. These results deepen our understanding of breast cancer mechanisms, guide therapeutic decisions and assist in the development of targeted anticancer therapies.
Affiliation(s)
- Yan Li
- School of Automation from Northwestern Polytechnical University, China
| | - Shao-Wu Zhang
- School of Automation from Northwestern Polytechnical University, China
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, China
| | - Ming-Yu Xie
- School of Automation from Northwestern Polytechnical University, China
| | - Tong Zhang
- School of Automation from Northwestern Polytechnical University, China
| |
|
15
|
Tran KA, Addala V, Johnston RL, Lovell D, Bradley A, Koufariotis LT, Wood S, Wu SZ, Roden D, Al-Eryani G, Swarbrick A, Williams ED, Pearson JV, Kondrashova O, Waddell N. Performance of tumour microenvironment deconvolution methods in breast cancer using single-cell simulated bulk mixtures. Nat Commun 2023; 14:5758. [PMID: 37717006 PMCID: PMC10505141 DOI: 10.1038/s41467-023-41385-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Accepted: 09/01/2023] [Indexed: 09/18/2023] Open
Abstract
Cells within the tumour microenvironment (TME) can impact tumour development and influence treatment response. Computational approaches have been developed to deconvolve the TME from bulk RNA-seq. Using scRNA-seq profiling from breast tumours we simulate thousands of bulk mixtures, representing tumour purities and cell lineages, to compare the performance of nine TME deconvolution methods (BayesPrism, Scaden, CIBERSORTx, MuSiC, DWLS, hspe, CPM, Bisque, and EPIC). Some methods are more robust in deconvolving mixtures with high tumour purity levels. Most methods tend to mis-predict normal epithelial for cancer epithelial as tumour purity increases, a finding that is validated in two independent datasets. The breast cancer molecular subtype influences this mis-prediction. BayesPrism and DWLS have the lowest combined numbers of false positives and false negatives, and have the best performance when deconvolving granular immune lineages. Our findings highlight the need for more single-cell characterisation of rarer cell types, and suggest that tumour cell compositions should be considered when deconvolving the TME.
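The idea of simulating bulk mixtures from single-cell profiles at a chosen tumour purity can be illustrated with a minimal sketch. Everything below is an assumption made for illustration (the function name `simulate_bulk`, the toy three-gene reference profiles, and the even split of non-cancer cells); it is not the simulation pipeline used in the study.

```python
import random

# Toy single-cell reference: cell type -> list of expression vectors (3 genes).
# These values are made up for illustration only.
REFERENCE = {
    "cancer": [[10, 0, 1]],
    "t_cell": [[0, 5, 1]],
    "fibroblast": [[0, 0, 8]],
}


def simulate_bulk(reference, purity, n_cells=100, seed=0):
    """Sum randomly drawn single-cell profiles into one pseudo-bulk sample.

    `purity` is the fraction of sampled cells labelled "cancer"; the
    remaining cells are split as evenly as possible over the other types.
    Returns (bulk_expression, true_proportions).
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    rng = random.Random(seed)
    others = [t for t in reference if t != "cancer"]
    n_cancer = round(purity * n_cells)
    remaining = n_cells - n_cancer
    counts = {"cancer": n_cancer}
    for i, cell_type in enumerate(others):
        extra = 1 if i < remaining % len(others) else 0
        counts[cell_type] = remaining // len(others) + extra

    n_genes = len(reference["cancer"][0])
    bulk = [0.0] * n_genes
    for cell_type, k in counts.items():
        for _ in range(k):
            profile = rng.choice(reference[cell_type])
            for gene, value in enumerate(profile):
                bulk[gene] += value

    proportions = {t: k / n_cells for t, k in counts.items()}
    return bulk, proportions


bulk, props = simulate_bulk(REFERENCE, purity=0.5, n_cells=10)
print(bulk, props)
```

Because the true proportions are known by construction, the output of any deconvolution method run on `bulk` could be compared against `props` to count false positives and false negatives.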
Affiliation(s)
- Khoa A Tran
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD, 4000, Australia
| | - Venkateswar Addala
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
| | - Rebecca L Johnston
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
| | - David Lovell
- School of Computer Science, Queensland University of Technology, Brisbane, QLD, 4000, Australia
- QUT Centre for Data Science, Brisbane, QLD, 4000, Australia
| | - Andrew Bradley
- Faculty of Engineering, Queensland University of Technology, Brisbane, QLD, 4000, Australia
| | - Lambros T Koufariotis
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
| | - Scott Wood
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
| | - Sunny Z Wu
- Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
- School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
| | - Daniel Roden
- Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
- School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
| | - Ghamdan Al-Eryani
- Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
- School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
| | - Alexander Swarbrick
- Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
- School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
| | - Elizabeth D Williams
- School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD, 4000, Australia
- Australian Prostate Cancer Research Centre - Queensland (APCRC-Q) and Queensland Bladder Cancer Initiative (QBCI), Brisbane, QLD, 4000, Australia
| | - John V Pearson
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
| | - Olga Kondrashova
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
| | - Nicola Waddell
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia.
- School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD, 4000, Australia.
| |
|
16
|
Tian L, Yu T. An integrated deep learning framework for the interpretation of untargeted metabolomics data. Brief Bioinform 2023; 24:bbad244. [PMID: 37369636 DOI: 10.1093/bib/bbad244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 06/02/2023] [Accepted: 06/12/2023] [Indexed: 06/29/2023] Open
Abstract
Untargeted metabolomics is gaining widespread application. The key aspects of the data analysis include modeling complex activities of the metabolic network, selecting metabolites associated with clinical outcome and finding critical metabolic pathways to reveal biological mechanisms. One key roadblock in data analysis remains poorly addressed: the matching uncertainty between data features and known metabolites. Given the limitations of the experimental technology, the identities of data features cannot be directly revealed in the data. The predominant approach for mapping features to metabolites is to match the mass-to-charge ratio (m/z) of data features to those derived from theoretical values of known metabolites. The relationship between features and metabolites is not one-to-one since some metabolites share molecular composition, and various adduct ions can be derived from the same metabolite. This matching uncertainty causes unreliable metabolite selection and functional analysis results. Here we introduce an integrated deep learning framework for metabolomics data that takes matching uncertainty into consideration. The model is devised with a gradual sparsification neural network based on the known metabolic network and the annotation relationship between features and metabolites. This architecture characterizes metabolomics data and reflects the modular structure of biological systems. Three goals can be achieved simultaneously without requiring much complex inference or additional assumptions: (1) evaluate metabolite importance, (2) infer feature-metabolite matching likelihood and (3) select disease sub-networks. When applied to a COVID metabolomics dataset and an aging mouse brain dataset, our method found metabolic sub-networks that were easily interpretable.
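The m/z matching step described in this abstract, and why it is not one-to-one, can be shown with a small sketch. The adduct mass shifts and monoisotopic masses below are approximate values, and the function name `match_features`, the three-adduct table and the 10 ppm tolerance are assumptions chosen for illustration rather than details taken from the paper.

```python
# Approximate mass shifts (Da) for a few common adduct ions.
PROTON = 1.007276
ADDUCTS = {
    "[M+H]+": PROTON,
    "[M+Na]+": 22.989218,
    "[M-H]-": -PROTON,
}

# Approximate monoisotopic masses (Da). Leucine and isoleucine share the
# formula C6H13NO2, so they cannot be separated by m/z alone.
METABOLITES = {
    "glucose": 180.06339,
    "leucine": 131.09463,
    "isoleucine": 131.09463,
}


def match_features(feature_mzs, metabolites, ppm=10.0):
    """Return, for each observed m/z, every (metabolite, adduct) pair whose
    theoretical m/z lies within `ppm` parts per million of the observation."""
    matches = {}
    for mz in feature_mzs:
        hits = []
        for name, mass in metabolites.items():
            for adduct, shift in ADDUCTS.items():
                theoretical = mass + shift
                if abs(mz - theoretical) / theoretical * 1e6 <= ppm:
                    hits.append((name, adduct))
        matches[mz] = hits
    return matches


example = match_features([132.10191, 181.07067, 203.05261], METABOLITES)
for mz, hits in example.items():
    print(mz, hits)
```

In this toy case one feature maps to two candidate metabolites (leucine and isoleucine), while one metabolite (glucose) accounts for two features through different adducts, which is the ambiguity the abstract calls matching uncertainty.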
Affiliation(s)
- Leqi Tian
- School of Data Science, The Chinese University of Hong Kong - Shenzhen, Guangdong, China
- Shenzhen Research Institute of Big Data, Guangdong, China
| | - Tianwei Yu
- School of Data Science, The Chinese University of Hong Kong - Shenzhen, Guangdong, China
- Shenzhen Research Institute of Big Data, Guangdong, China
- Guangdong Provincial Key Laboratory of Big Data Computing, Guangdong, China
| |
|
17
|
Tian L, Wu W, Yu T. Graph Random Forest: A Graph Embedded Algorithm for Identifying Highly Connected Important Features. Biomolecules 2023; 13:1153. [PMID: 37509188 PMCID: PMC10377046 DOI: 10.3390/biom13071153] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/14/2023] [Revised: 06/26/2023] [Accepted: 06/30/2023] [Indexed: 07/30/2023] Open
Abstract
Random Forest (RF) is a widely used machine learning method with good performance on classification and regression tasks. It works well in low-sample-size situations, which benefits applications in biology. For example, gene expression data often involve far more features (p) than samples (n). Though the predictive accuracy of RF is often high, there are some problems when selecting important genes using RF. The important genes selected by RF are usually scattered across the gene network, which conflicts with the biological assumption of functional consistency between effective features. To improve feature selection by incorporating external topological information between genes, we propose the Graph Random Forest (GRF), which identifies highly connected important features by using the known biological network when constructing the forest. The algorithm can identify effective features that form highly connected sub-graphs and achieve classification accuracy equivalent to RF. To evaluate the capability of our proposed method, we conducted simulation experiments and applied the method to two real datasets: non-small cell lung cancer RNA-seq data from The Cancer Genome Atlas and a human embryonic stem cell RNA-seq dataset (GSE93593). The resulting high classification accuracy, connectivity of selected sub-graphs, and interpretable feature selection results suggest the method is a helpful addition to graph-based classification models and feature selection procedures.
Affiliation(s)
- Leqi Tian
- School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Shenzhen Research Institute of Big Data, Shenzhen 518172, China
| | - Wenbin Wu
- School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
| | - Tianwei Yu
- School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Shenzhen Research Institute of Big Data, Shenzhen 518172, China
- Guangdong Provincial Key Laboratory of Big Data Computing, Shenzhen 518172, China
| |
|
18
|
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 PMCID: PMC10186658 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND: There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. METHODS: This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS: We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge to support better generalisation (e.g. pathways or Protein-Protein-Interaction networks) and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and, according to its taxonomy, we discuss representational methodologies for the integration of domain prior knowledge in such models. CONCLUSIONS: The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability, which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.
Affiliation(s)
- Magdalena Wysocka
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
| | - Oskar Wysocki
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| | - Marie Zufferey
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| | - Dónal Landers
- DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP UK
| | - André Freitas
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| |
|
19
|
Zhang Z, Wei X. Artificial intelligence-assisted selection and efficacy prediction of antineoplastic strategies for precision cancer therapy. Semin Cancer Biol 2023; 90:57-72. [PMID: 36796530 DOI: 10.1016/j.semcancer.2023.02.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/19/2022] [Revised: 01/12/2023] [Accepted: 02/13/2023] [Indexed: 02/16/2023]
Abstract
The rapid development of artificial intelligence (AI) technologies, in the context of the vast amount of collectable data obtained from high-throughput sequencing, has led to an unprecedented understanding of cancer and accelerated the advent of a new era of clinical oncology with an emphasis on precision treatment and personalized medicine. However, the gains achieved by a variety of AI models in clinical oncology practice are far from what one would expect, and in particular, there are still many uncertainties in the selection of clinical treatment options that pose significant challenges to the application of AI in clinical oncology. In this review, we summarize emerging approaches, relevant datasets and open-source software of AI and show how to integrate them to address problems from clinical oncology and cancer research. We focus on the principles and procedures for identifying different antitumor strategies with the assistance of AI, including targeted cancer therapy, conventional cancer therapy, and cancer immunotherapy. In addition, we highlight the current challenges and directions of AI in clinical oncology translation. Overall, we hope this article will provide researchers and clinicians with a deeper understanding of the role and implications of AI in precision cancer therapy, and help AI move more quickly into accepted cancer guidelines.
Affiliation(s)
- Zhe Zhang
- Laboratory of Aging Research and Cancer Drug Target, State Key Laboratory of Biotherapy and Cancer Center, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, PR China; State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu 610041, PR China
| | - Xiawei Wei
- Laboratory of Aging Research and Cancer Drug Target, State Key Laboratory of Biotherapy and Cancer Center, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, PR China.
| |
|
20
|
Nazir S, Dickson DM, Akram MU. Survey of explainable artificial intelligence techniques for biomedical imaging with deep neural networks. Comput Biol Med 2023; 156:106668. [PMID: 36863192 DOI: 10.1016/j.compbiomed.2023.106668] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 08/06/2022] [Revised: 01/12/2023] [Accepted: 02/10/2023] [Indexed: 02/21/2023]
Abstract
Artificial Intelligence (AI) techniques of deep learning have revolutionized disease diagnosis with their outstanding image classification performance. In spite of the outstanding results, the widespread adoption of these techniques in clinical practice is still taking place at a moderate pace. One major hindrance is that a trained Deep Neural Network (DNN) model provides a prediction, but questions about why and how that prediction was made remain unanswered. This linkage is of utmost importance for the regulated healthcare domain to increase the trust of practitioners, patients and other stakeholders in the automated diagnosis system. The application of deep learning to medical imaging has to be interpreted with caution due to health and safety concerns similar to blame attribution in the case of an accident involving autonomous cars. The consequences of both false positive and false negative cases are far reaching for patients' welfare and cannot be ignored. This is exacerbated by the fact that state-of-the-art deep learning algorithms comprise complex interconnected structures and millions of parameters and have a 'black box' nature, offering little understanding of their inner workings, unlike traditional machine learning algorithms. Explainable AI (XAI) techniques help to understand model predictions, which helps to develop trust in the system, accelerate disease diagnosis, and meet regulatory requirements. This survey provides a comprehensive review of the promising field of XAI for biomedical imaging diagnostics. We also provide a categorization of the XAI techniques, discuss the open challenges, and provide future directions for XAI which would be of interest to clinicians, regulators and model developers.
Affiliation(s)
- Sajid Nazir
- Department of Computing, Glasgow Caledonian University, Glasgow, UK.
| | - Diane M Dickson
- Department of Podiatry and Radiography, Research Centre for Health, Glasgow Caledonian University, Glasgow, UK
| | - Muhammad Usman Akram
- Computer and Software Engineering Department, National University of Sciences and Technology, Islamabad, Pakistan
| |
|
21
|
Keyl P, Bischoff P, Dernbach G, Bockmayr M, Fritz R, Horst D, Blüthgen N, Montavon G, Müller KR, Klauschen F. Single-cell gene regulatory network prediction by explainable AI. Nucleic Acids Res 2023; 51:e20. [PMID: 36629274 PMCID: PMC9976884 DOI: 10.1093/nar/gkac1212] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 09/07/2022] [Revised: 11/16/2022] [Accepted: 12/06/2022] [Indexed: 01/12/2023] Open
Abstract
The molecular heterogeneity of cancer cells contributes to the often partial response to targeted therapies and relapse of disease due to the escape of resistant cell populations. While single-cell sequencing has started to improve our understanding of this heterogeneity, it offers a mostly descriptive view on cellular types and states. To obtain more functional insights, we propose scGeneRAI, an explainable deep learning approach that uses layer-wise relevance propagation (LRP) to infer gene regulatory networks from static single-cell RNA sequencing data for individual cells. We benchmark our method with synthetic data and apply it to single-cell RNA sequencing data of a cohort of human lung cancers. From the predicted single-cell networks our approach reveals characteristic network patterns for tumor cells and normal epithelial cells and identifies subnetworks that are observed only in (subgroups of) tumor cells of certain patients. While current state-of-the-art methods are limited by their ability to only predict average networks for cell populations, our approach facilitates the reconstruction of networks down to the level of single cells which can be utilized to characterize the heterogeneity of gene regulation within and across tumors.
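Layer-wise relevance propagation (LRP), the attribution technique named in this abstract, can be sketched on a toy network. The example below applies the epsilon rule to a bias-free two-layer ReLU network in plain Python; the weights, the input and the helper names (`dense`, `relu`, `lrp_dense`) are illustrative assumptions, and this is a generic LRP sketch rather than the scGeneRAI implementation.

```python
def dense(weights, x):
    """Linear layer without bias: weights[k][j] links input j to output k."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(values):
    return [max(0.0, v) for v in values]


def lrp_dense(activations, weights, relevance, eps=1e-9):
    """LRP epsilon rule: redistribute each output's relevance to the inputs
    in proportion to their contribution a_j * w_kj to the pre-activation."""
    r_in = [0.0] * len(activations)
    for k, r_k in enumerate(relevance):
        z = sum(w * a for w, a in zip(weights[k], activations))
        denom = z + (eps if z >= 0 else -eps)  # stabiliser avoids division by 0
        for j, a in enumerate(activations):
            r_in[j] += a * weights[k][j] / denom * r_k
    return r_in


# Toy "expression" input for three genes and made-up weights.
x = [1.0, 2.0, 0.5]
W1 = [[1.0, 0.5, 0.0], [0.5, 0.5, 1.0], [-1.0, 0.2, 0.0]]
W2 = [[1.0, -0.5, 2.0]]

hidden = relu(dense(W1, x))   # forward pass
output = dense(W2, hidden)[0]

r_hidden = lrp_dense(hidden, W2, [output])  # backward relevance pass
r_input = lrp_dense(x, W1, r_hidden)
print(output, r_input)
```

Without bias terms, the input relevances sum (up to the small stabiliser) to the network output, which is the conservation property that makes per-input scores interpretable as shares of a prediction. In a setting like the one described, the inputs would be genes and the output a predicted target gene, so relevance scores suggest candidate regulatory links.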
Affiliation(s)
- Philipp Keyl
- Institute of Pathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität Berlin, Charitéplatz 1, 10117 Berlin, Germany
| | - Philip Bischoff
- Institute of Pathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Anna-Louisa-Karsch-Straße 2, 10178 Berlin, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Berlin partner site, Germany
| | - Gabriel Dernbach
- Institute of Pathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität Berlin, Charitéplatz 1, 10117 Berlin, Germany
- BIFOLD – Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
| | - Michael Bockmayr
- Institute of Pathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany
- Mildred Scheel Cancer Career Center HaTriCS4, University Medical Center Hamburg-Eppendorf Martinistr. 52, 20246 Hamburg, Germany
| | - Rebecca Fritz
- Institute of Pathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität Berlin, Charitéplatz 1, 10117 Berlin, Germany
| | - David Horst
- Institute of Pathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität Berlin, Charitéplatz 1, 10117 Berlin, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Berlin partner site, Germany
| | - Nils Blüthgen
- Institute of Pathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Institut für Biologie, Humboldt University, Free University of Berlin, Unter den Linden 6, 10099 Berlin, Germany
| | - Grégoire Montavon
- BIFOLD – Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Machine Learning Group, Technical University of Berlin, Marchstr. 23, 10587 Berlin, Germany
| | - Klaus-Robert Müller
- BIFOLD – Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Machine Learning Group, Technical University of Berlin, Marchstr. 23, 10587 Berlin, Germany
- Department of Artificial Intelligence, Korea University, Seoul 136-713, South Korea
- Max-Planck-Institute for Informatics, Stuhlsatzenhausweg 4, 66123 Saarbrücken, Germany
| | - Frederick Klauschen
- Institute of Pathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität Berlin, Charitéplatz 1, 10117 Berlin, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Berlin partner site, Germany
- BIFOLD – Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Institute of Pathology, Ludwig-Maximilians-University Munich, Thalkirchner Str. 36, 80337 München, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Munich partner site, Germany
| |
Collapse
|
22
|
Alharbi F, Vakanski A. Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review. Bioengineering (Basel) 2023; 10:bioengineering10020173. [PMID: 36829667 PMCID: PMC9952758 DOI: 10.3390/bioengineering10020173] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 12/22/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread in different parts of the body. According to the World Health Organization (WHO), cancer is the second major cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissue and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods for gene expression data allow quantifying the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on the application of deep learning models due to their comparative advantages for identifying gene patterns that are distinctive for various types of cancers. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, as well as convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning for this task. Furthermore, we review pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, caused by a large number of genes present in data samples. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.
|
23
|
Farahani FV, Fiok K, Lahijanian B, Karwowski W, Douglas PK. Explainable AI: A review of applications to neuroimaging data. Front Neurosci 2022; 16:906290. [PMID: 36583102 PMCID: PMC9793854 DOI: 10.3389/fnins.2022.906290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Accepted: 11/14/2022] [Indexed: 12/05/2022] Open
Abstract
Deep neural networks (DNNs) have transformed the field of computer vision and currently constitute some of the best models for representations learned via hierarchical processing in the human brain. In medical imaging, these models have shown human-level performance and even higher in the early diagnosis of a wide range of diseases. However, the goal is often not only to accurately predict group membership or diagnose but also to provide explanations that support the model decision in a context that a human can readily interpret. The limited transparency has hindered the adoption of DNN algorithms across many domains. Numerous explainable artificial intelligence (XAI) techniques have been developed to peer inside the "black box" and make sense of DNN models, taking somewhat divergent approaches. Here, we suggest that these methods may be considered in light of the interpretation goal, including functional or mechanistic interpretations, developing archetypal class instances, or assessing the relevance of certain features or mappings on a trained model in a post-hoc capacity. We then focus on reviewing recent applications of post-hoc relevance techniques as applied to neuroimaging data. Moreover, this article suggests a method for comparing the reliability of XAI methods, especially in deep neural networks, along with their advantages and pitfalls.
Affiliation(s)
- Farzad V. Farahani
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, United States
  - Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, FL, United States
- Krzysztof Fiok
  - Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, FL, United States
- Behshad Lahijanian
  - Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL, United States
  - H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, United States
- Waldemar Karwowski
  - Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, FL, United States
- Pamela K. Douglas
  - School of Modeling, Simulation, and Training, University of Central Florida, Orlando, FL, United States
|
24
|
Loh HW, Ooi CP, Seoni S, Barua PD, Molinari F, Acharya UR. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011-2022). Comput Methods Programs Biomed 2022; 226:107161. [PMID: 36228495 DOI: 10.1016/j.cmpb.2022.107161] [Citation(s) in RCA: 107] [Impact Index Per Article: 53.5] [Received: 08/29/2022] [Revised: 09/16/2022] [Accepted: 09/25/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND AND OBJECTIVES Artificial intelligence (AI) has branched out to various applications in healthcare, such as health services management, predictive medicine, clinical decision-making, and patient data and diagnostics. Although AI models have achieved human-like performance, their use is still limited because they are seen as a black box. This lack of trust remains the main reason for their low use in practice, especially in healthcare. Hence, explainable artificial intelligence (XAI) has been introduced as a technique that can provide confidence in the model's prediction by explaining how the prediction is derived, thereby encouraging the use of AI systems in healthcare. The primary goal of this review is to provide areas of healthcare that require more attention from the XAI research community. METHODS Multiple journal databases were thoroughly searched using PRISMA guidelines 2020. Studies that do not appear in Q1 journals, which are highly credible, were excluded. RESULTS In this review, we surveyed 99 Q1 articles covering the following XAI techniques: SHAP, LIME, GradCAM, LRP, Fuzzy classifier, EBM, CBR, rule-based systems, and others. CONCLUSION We discovered that detecting abnormalities in 1D biosignals and identifying key text in clinical notes are areas that require more attention from the XAI research community. We hope this review will encourage the development of a holistic cloud system for a smart city.
Affiliation(s)
- Hui Wen Loh
  - School of Science and Technology, Singapore University of Social Sciences, Singapore
- Chui Ping Ooi
  - School of Science and Technology, Singapore University of Social Sciences, Singapore
- Silvia Seoni
  - Department of Electronics and Telecommunications, Biolab, Politecnico di Torino, Torino 10129, Italy
- Prabal Datta Barua
  - Faculty of Engineering and Information Technology, University of Technology Sydney, Australia
  - School of Business (Information Systems), Faculty of Business, Education, Law & Arts, University of Southern Queensland, Australia
- Filippo Molinari
  - Department of Electronics and Telecommunications, Biolab, Politecnico di Torino, Torino 10129, Italy
- U Rajendra Acharya
  - School of Science and Technology, Singapore University of Social Sciences, Singapore
  - School of Business (Information Systems), Faculty of Business, Education, Law & Arts, University of Southern Queensland, Australia
  - School of Engineering, Ngee Ann Polytechnic, Singapore
  - Department of Bioinformatics and Medical Engineering, Asia University, Taiwan
  - Research Organization for Advanced Science and Technology (IROAST), Kumamoto University, Kumamoto, Japan
|
25
|
Jiang X, Xu C. Deep Learning and Machine Learning with Grid Search to Predict Later Occurrence of Breast Cancer Metastasis Using Clinical Data. J Clin Med 2022; 11:jcm11195772. [PMID: 36233640 PMCID: PMC9570670 DOI: 10.3390/jcm11195772] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 06/22/2022] [Revised: 07/30/2022] [Accepted: 09/21/2022] [Indexed: 11/16/2022] Open
Abstract
Background: It is important to be able to predict, for each individual patient, the likelihood of later metastatic occurrence, because the prediction can guide treatment plans tailored to a specific patient to prevent metastasis and to help avoid under-treatment or over-treatment. Deep neural network (DNN) learning, commonly referred to as deep learning, has become popular due to its success in image detection and prediction, but questions such as whether deep learning outperforms other machine learning methods when using non-image clinical data remain unanswered. Grid search has been introduced to deep learning hyperparameter tuning for the purpose of improving its prediction performance, but the effect of grid search on other machine learning methods is under-studied. In this research, we take the empirical approach to study the performance of deep learning and other machine learning methods when using non-image clinical data to predict the occurrence of breast cancer metastasis (BCM) 5, 10, or 15 years after the initial treatment. We developed prediction models using the deep feedforward neural network (DFNN) methods, as well as models using nine other machine learning methods, including naïve Bayes (NB), logistic regression (LR), support vector machine (SVM), LASSO, decision tree (DT), k-nearest neighbor (KNN), random forest (RF), AdaBoost (ADB), and XGBoost (XGB). We used grid search to tune hyperparameters for all methods. We then compared our feedforward deep learning models to the models trained using the nine other machine learning methods. Results: Based on the mean test AUC (Area under the ROC Curve) results, DFNN ranks 6th, 4th, and 3rd when predicting 5-year, 10-year, and 15-year BCM, respectively, out of 10 methods. The top performing methods in predicting 5-year BCM are XGB (1st), RF (2nd), and KNN (3rd). For predicting 10-year BCM, the top performers are XGB (1st), RF (2nd), and NB (3rd). Finally, for 15-year BCM, the top performers are SVM (1st), LR and LASSO (tied for 2nd), and DFNN (3rd). The ensemble methods RF and XGB outperform other methods when data are less balanced, while SVM, LR, LASSO, and DFNN outperform other methods when data are more balanced. Our statistical testing results show that at a significance level of 0.05, DFNN overall performs comparably to other machine learning methods when predicting 5-year, 10-year, and 15-year BCM. Conclusions: Our results show that deep learning with grid search overall performs at least as well as other machine learning methods when using non-image clinical data. It is interesting to note that some of the other machine learning methods, such as XGB, RF, and SVM, are very strong competitors of DFNN when incorporating grid search. It is also worth noting that the computation time required to do grid search with DFNN is much more than that required to do grid search with the other nine machine learning methods.
Affiliation(s)
- Xia Jiang
  - Correspondence: ; Tel.: +412-648-9310
|
26
|
Treppner M, Binder H, Hess M. Interpretable generative deep learning: an illustration with single cell gene expression data. Hum Genet 2022; 141:1481-1498. [PMID: 34988661 PMCID: PMC9360114 DOI: 10.1007/s00439-021-02417-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/14/2021] [Accepted: 08/06/2021] [Indexed: 11/26/2022]
Abstract
Deep generative models can learn the underlying structure, such as pathways or gene programs, from omics data. We provide an introduction as well as an overview of such techniques, specifically illustrating their use with single-cell gene expression data. For example, the low dimensional latent representations offered by various approaches, such as variational auto-encoders, are useful to get a better understanding of the relations between observed gene expressions and experimental factors or phenotypes. Furthermore, by providing a generative model for the latent and observed variables, deep generative models can generate synthetic observations, which allow us to assess the uncertainty in the learned representations. While deep generative models are useful to learn the structure of high-dimensional omics data by efficiently capturing non-linear dependencies between genes, they are sometimes difficult to interpret due to their neural network building blocks. More precisely, to understand the relationship between learned latent variables and observed variables, e.g., gene transcript abundances and external phenotypes, is difficult. Therefore, we also illustrate current approaches that allow us to infer the relationship between learned latent variables and observed variables as well as external phenotypes. Thereby, we render deep learning approaches more interpretable. In an application with single-cell gene expression data, we demonstrate the utility of the discussed methods.
Affiliation(s)
- Martin Treppner
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Stefan-Meier-Str. 26, Freiburg, 79104, Germany
- Harald Binder
  - Freiburg Center for Data Analysis and Modeling, University of Freiburg, Freiburg, 79104, Germany
- Moritz Hess
  - Freiburg Center for Data Analysis and Modeling, University of Freiburg, Freiburg, 79104, Germany
|
27
|
Zhang B, Fan T. Knowledge structure and emerging trends in the application of deep learning in genetics research: A bibliometric analysis [2000–2021]. Front Genet 2022; 13:951939. [PMID: 36081985 PMCID: PMC9445221 DOI: 10.3389/fgene.2022.951939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 07/13/2022] [Indexed: 11/13/2022] Open
Abstract
Introduction: Deep learning technology has been widely used in genetic research because of its characteristics of computability, statistical analysis, and predictability. Herein, we aimed to summarize standardized knowledge and potentially innovative approaches for deep learning applications of genetics by evaluating publications to encourage more research. Methods: The Science Citation Index Expanded (SCIE) database was searched for deep learning applications for genomics-related publications. Original articles and reviews were considered. In this study, we derived a clustered network from 69,806 references that were cited by the 1,754 related manuscripts identified. We used CiteSpace and VOSviewer to identify countries, institutions, journals, co-cited references, keywords, subject evolution, path, current characteristics, and emerging topics. Results: We assessed the rapidly increasing publications concerning deep learning applications of genomics approaches and identified 1,754 articles that published reports focusing on this subject. Among these, a total of 101 countries and 2,487 institutes contributed publications. The United States of America had the most publications (728/1754) and the highest h-index, and the US has been in close collaboration with China and Germany. The reference clusters of SCI articles were clustered into seven categories: deep learning, logic regression, variant prioritization, random forests, scRNA-seq (single-cell RNA-seq), genomic regulation, and recombination. The keywords representing the research frontiers by year were prediction (2016–2021), sequence (2017–2021), mutation (2017–2021), and cancer (2019–2021). Conclusion: Here, we summarized the current literature related to the status of deep learning for genetics applications and analyzed the current research characteristics and future trajectories in this field. This work aims to provide resources for possible further intensive exploration and encourages more researchers to pursue research on deep learning applications in genetics.
Affiliation(s)
- Bijun Zhang
  - Department of Clinical Genetics, Shengjing Hospital of China Medical University, Shenyang, China
- Ting Fan
  - Department of Computer, School of Intelligent Medicine, China Medical University, Shenyang, China
  - *Correspondence: Ting Fan
|
28
|
Fu X, Bates PA. Application of deep learning methods: From molecular modelling to patient classification. Exp Cell Res 2022; 418:113278. [PMID: 35810775 DOI: 10.1016/j.yexcr.2022.113278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Revised: 06/16/2022] [Accepted: 07/05/2022] [Indexed: 11/28/2022]
Abstract
We are now well into the information-driven age, with complex, heterogeneous datasets in the biological sciences continuing to grow at a rapid pace. Moreover, efforts to distill such datasets to find new governing principles are underway. Leading the surge are new and exciting algorithmic developments in computer simulation and machine learning, most notably for the latter, those centred on deep learning. However, practical applications of cell-centric computations within the biological sciences, even when carefully benchmarked against existing experimental datasets, remain challenging. Here we discuss the application of deep learning methodologies to support our understanding of cell functionality and as an aid to patient classification. Whilst comprehensive end-to-end deep learning approaches that utilise knowledge of the cell and its molecular components to aid human disease classification are yet to be implemented, important for opening the door to more effective molecular and cell-based therapies, we illustrate that many deep learning applications have been developed to tackle components of such an ambitious pipeline. We end our discussion on what the future may hold, especially how an integrated framework of computer simulations and deep learning, in conjunction with wet-bench experimentation, could help reveal the governing principles underlying cell functionalities within the tissue environments in which cells operate.
Affiliation(s)
- Xiao Fu
  - Biomolecular Modelling Laboratory, The Francis Crick Institute, 1 Midland Rd, London, NW1 1AT, UK
- Paul A Bates
  - Biomolecular Modelling Laboratory, The Francis Crick Institute, 1 Midland Rd, London, NW1 1AT, UK
|
29
|
Hanczar B, Bourgeais V, Zehraoui F. Assessment of deep learning and transfer learning for cancer prediction based on gene expression data. BMC Bioinformatics 2022; 23:262. [PMID: 35786378 PMCID: PMC9250744 DOI: 10.1186/s12859-022-04807-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/11/2022] [Accepted: 06/15/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Machine learning is now a standard tool for cancer prediction based on gene expression data. However, deep learning is still new for this task, and there is no clear consensus about its performance and utility. Few experimental works have evaluated deep neural networks and compared them with state-of-the-art machine learning. Moreover, their conclusions are not consistent. RESULTS We extensively evaluate the deep learning approach on 22 cancer prediction tasks based on gene expression data. We measure the impact of the main hyper-parameters and compare the performances of neural networks with the state-of-the-art. We also investigate the effectiveness of several transfer learning schemes in different experimental setups. CONCLUSION Based on our experimentations, we provide several recommendations to optimize the construction and training of a neural network model. We show that neural networks outperform the state-of-the-art methods only for very large training set size. For a small training set, we show that transfer learning is possible and may strongly improve the model performance in some cases.
Affiliation(s)
- Blaise Hanczar
  - IBISC, Université Paris-Saclay (Univ. Evry), 23 boulevard de France, 91034, Evry, France
- Victoria Bourgeais
  - IBISC, Université Paris-Saclay (Univ. Evry), 23 boulevard de France, 91034, Evry, France
- Farida Zehraoui
  - IBISC, Université Paris-Saclay (Univ. Evry), 23 boulevard de France, 91034, Evry, France
|
30
|
Huang N, Liu P, Yan Y, Xu L, Huang Y, Fu G, Lan Y, Yang S, Song J, Li Y. Predicting the Risk of Dental Implant Loss Using Deep Learning. J Clin Periodontol 2022; 49:872-883. [PMID: 35734921 DOI: 10.1111/jcpe.13689] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/22/2022] [Revised: 05/15/2022] [Accepted: 06/20/2022] [Indexed: 11/30/2022]
Abstract
AIM To investigate the feasibility of predicting dental implant loss risk with deep learning (DL) based on preoperative cone-beam computed tomography. MATERIALS AND METHODS Six hundred and three patients who underwent implant surgery (279 high-risk patients who did and 324 low-risk patients who did not experience implant loss within 5 years) from January 2012 to January 2020 were enrolled. Three models, a logistic regression clinical model (CM) based on clinical features, a DL model based on radiography features, and an integrated model (IM) developed by combining CM with DL, were developed to predict the 5-year implant loss risk. The area under the receiver operating characteristic curve (AUC) was used to evaluate the model performance. Time to implant loss was considered for both groups, and Kaplan-Meier curves were created and compared by the log-rank test. RESULTS The IM exhibited the best performance in predicting implant loss risk [AUC = 0.90, 95% confidence interval (CI) 0.84-0.95], followed by the DL model (AUC = 0.87, 95% CI 0.80-0.92) and the CM (AUC = 0.72, 95% CI 0.63-0.79). CONCLUSION Our study offers preliminary evidence that both the DL model and IM performed well in predicting implant fate within 5 years and thus may greatly facilitate implant practitioners in assessing preoperative risks.
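The abstract above ranks its three models by the area under the ROC curve (AUC). As a reminder of what that number measures, here is a minimal sketch in plain Python that computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations; they are not the study's data or code.

```python
def roc_auc(labels, scores):
    """AUC via the pairwise (Mann-Whitney) formulation.

    labels: iterable of 0/1 outcomes (1 = event, e.g. implant loss).
    scores: iterable of model risk scores, higher = more likely event.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a cohort of a few hundred patients; a rank-based implementation would be preferable for much larger datasets.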
Affiliation(s)
- Nannan Huang
  - Stomatological Hospital of Chongqing Medical University, Chongqing, P.R China
  - Chongqing Key Laboratory of Oral Diseases and Biomedical Sciences, Chongqing, P.R China
  - Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, P.R China
- Peng Liu
  - Stomatological Hospital of Chongqing Medical University, Chongqing, P.R China
- Youlong Yan
  - Stomatological Hospital of Chongqing Medical University, Chongqing, P.R China
- Ling Xu
  - Stomatological Hospital of Chongqing Medical University, Chongqing, P.R China
  - Chongqing Key Laboratory of Oral Diseases and Biomedical Sciences, Chongqing, P.R China
  - Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, P.R China
- Yuanding Huang
  - Stomatological Hospital of Chongqing Medical University, Chongqing, P.R China
  - Chongqing Key Laboratory of Oral Diseases and Biomedical Sciences, Chongqing, P.R China
  - Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, P.R China
- Gang Fu
  - Stomatological Hospital of Chongqing Medical University, Chongqing, P.R China
  - Chongqing Key Laboratory of Oral Diseases and Biomedical Sciences, Chongqing, P.R China
  - Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, P.R China
- Yiqing Lan
  - Stomatological Hospital of Chongqing Medical University, Chongqing, P.R China
  - Chongqing Key Laboratory of Oral Diseases and Biomedical Sciences, Chongqing, P.R China
  - Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, P.R China
- Sheng Yang
  - Stomatological Hospital of Chongqing Medical University, Chongqing, P.R China
  - Chongqing Key Laboratory of Oral Diseases and Biomedical Sciences, Chongqing, P.R China
  - Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, P.R China
- Jinlin Song
  - Stomatological Hospital of Chongqing Medical University, Chongqing, P.R China
  - Chongqing Key Laboratory of Oral Diseases and Biomedical Sciences, Chongqing, P.R China
  - Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, P.R China
- Yuzhou Li
  - Stomatological Hospital of Chongqing Medical University, Chongqing, P.R China
  - Chongqing Key Laboratory of Oral Diseases and Biomedical Sciences, Chongqing, P.R China
  - Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, P.R China
|
31
|
Patient-level proteomic network prediction by explainable artificial intelligence. NPJ Precis Oncol 2022; 6:35. [PMID: 35672443 PMCID: PMC9174200 DOI: 10.1038/s41698-022-00278-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/29/2021] [Accepted: 04/15/2022] [Indexed: 11/08/2022] Open
Abstract
Understanding the pathological properties of dysregulated protein networks in individual patients’ tumors is the basis for precision therapy. Functional experiments are commonly used, but cover only parts of the oncogenic signaling networks, whereas methods that reconstruct networks from omics data usually only predict average network features across tumors. Here, we show that the explainable AI method layer-wise relevance propagation (LRP) can infer protein interaction networks for individual patients from proteomic profiling data. LRP reconstructs average and individual interaction networks with an AUC of 0.99 and 0.93, respectively, and outperforms state-of-the-art network prediction methods for individual tumors. Using data from The Cancer Proteome Atlas, we identify known and potentially novel oncogenic network features, among which some are cancer-type specific and show only minor variation among patients, while others are present across certain tumor types but differ among individual patients. Our approach may therefore support predictive diagnostics in precision oncology by inferring “patient-level” oncogenic mechanisms.
|
32
|
Zhang G, Sun B, Chen Z, Gao Y, Zhang Z, Li K, Yang W. Diabetic Retinopathy Grading by Deep Graph Correlation Network on Retinal Images Without Manual Annotations. Front Med (Lausanne) 2022; 9:872214. [PMID: 35492360 PMCID: PMC9046841 DOI: 10.3389/fmed.2022.872214] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/10/2022] [Accepted: 03/18/2022] [Indexed: 11/20/2022] Open
Abstract
Background Diabetic retinopathy, as a severe public health problem associated with vision loss, should be diagnosed early using an accurate screening tool. While many previous deep learning models have been proposed for this disease, they need sufficient professional annotation data to train the model, requiring more expensive and time-consuming screening skills. Method This study aims to economize manual power and proposes a deep graph correlation network (DGCN) to develop automated diabetic retinopathy grading without any professional annotations. DGCN involves the novel deep learning algorithm of a graph convolutional network to exploit inherent correlations from independent retinal image features learned by a convolutional neural network. Three designed loss functions of graph-center, pseudo-contrastive, and transformation-invariant constrain the optimisation and application of the DGCN model in an automated diabetic retinopathy grading task. Results To evaluate the DGCN model, this study employed EyePACS-1 and Messidor-2 sets to perform grading results. It achieved an accuracy of 89.9% (91.8%), sensitivity of 88.2% (90.2%), and specificity of 91.3% (93.0%) on EyePACS-1 (Messidor-2) data set with a confidence index of 95% and commendable effectiveness on receiver operating characteristic (ROC) curve and t-SNE plots. Conclusion The grading capability of this study is close to that of retina specialists, but superior to that of trained graders, which demonstrates that the proposed DGCN provides an innovative route for automated diabetic retinopathy grading and other computer-aided diagnostic systems.
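The results above are reported as accuracy, sensitivity, and specificity. The sketch below shows how those three figures follow from a confusion matrix, assuming a binary framing (for example, referable versus non-referable disease), which the abstract does not spell out. The function name `grading_metrics` and the toy labels are hypothetical and are not drawn from EyePACS-1 or Messidor-2.

```python
def grading_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels.

    y_true, y_pred: equal-length sequences of 0/1, where 1 marks the
    positive (diseased) class. The binary framing is an assumption here.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
accuracy, sensitivity, specificity = grading_metrics(y_true, y_pred)
print(accuracy, sensitivity, specificity)
```

Reporting sensitivity and specificity alongside accuracy matters in screening settings because accuracy alone can look strong when one class dominates the dataset.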
Collapse
Affiliation(s)
- Guanghua Zhang
- Department of Intelligence and Automation, Taiyuan University, Taiyuan, China
- Graphics and Imaging Laboratory, University of Girona, Girona, Spain
| | - Bin Sun
- Shanxi Eye Hospital, Taiyuan, China
| | - Zhixian Chen
- Department of Intelligence and Automation, Taiyuan University, Taiyuan, China
| | - Yuxi Gao
- Shanxi Finance and Taxation College, Taiyuan, China
| | | | - Keran Li
- The Laboratory of Artificial Intelligence and Bigdata in Ophthalmology, The Affiliated Eye Hospital of Nanjing Medical University, Nanjing, China
- Keran Li,
| | - Weihua Yang
- The Laboratory of Artificial Intelligence and Bigdata in Ophthalmology, The Affiliated Eye Hospital of Nanjing Medical University, Nanjing, China
- *Correspondence: Weihua Yang,
| |
|
33
|
Non-Systematic Weighted Satisfiability in Discrete Hopfield Neural Network Using Binary Artificial Bee Colony Optimization. Mathematics 2022. [DOI: 10.3390/math10071129] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022]
Abstract
Recently, new variants of non-systematic satisfiability logic were proposed to govern the Discrete Hopfield Neural Network. These variants of the satisfiability logical rule provide flexibility and enhance the diversity of the neuron states in the Discrete Hopfield Neural Network. However, there is no systematic method to control and optimize the logical structure of non-systematic satisfiability. Additionally, the role of negative literals has been neglected, reducing the expressivity of the information that the logical structure holds. This study proposes an additional optimization layer of the Discrete Hopfield Neural Network, called the logic phase, that controls the distribution of negative literals in the logical structure. Hence, a new variant of non-systematic satisfiability named Weighted Random 2 Satisfiability is formulated. A proposed searching technique, the binary Artificial Bee Colony algorithm, ensures the correct distribution of the negative literals. It is worth mentioning that the binary Artificial Bee Colony has flexible and fewer free parameters, with the modifications made to the objective function. Specifically, this study utilizes a binary Artificial Bee Colony algorithm whose updating rule equation is modified using the NOT-AND (NAND) logic gate operator. The performance of the binary Artificial Bee Colony is compared with other variants of binary Artificial Bee Colony algorithms that use different logic gate operators and with conventional binary algorithms such as Particle Swarm Optimization, Exhaustive Search, and the Genetic Algorithm. The experimental results and comparison show that the proposed algorithm is capable of finding the correct logical structure according to the initial ratio of negative literals.
|
34
|
Alachram H, Chereda H, Beißbarth T, Wingender E, Stegmaier P. Text mining-based word representations for biomedical data analysis and protein-protein interaction networks in machine learning tasks. PLoS One 2021; 16:e0258623. [PMID: 34653224 PMCID: PMC8519453 DOI: 10.1371/journal.pone.0258623] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/04/2020] [Accepted: 10/01/2021] [Indexed: 11/18/2022] Open
Abstract
Biomedical and life science literature is an essential way to publish experimental results. With the rapid growth in the number of new publications, the amount of scientific knowledge represented in free text is increasing remarkably. There has been much interest in developing techniques that can extract this knowledge and make it accessible, to aid scientists in discovering new relationships between biological entities and answering biological questions. Making use of the word2vec approach, we generated word vector representations based on a corpus consisting of over 16 million PubMed abstracts. We developed a text mining pipeline to produce word2vec embeddings with different properties and performed validation experiments to assess their utility for biomedical analysis. An important pre-processing step consisted of substituting synonymous terms with their preferred terms in biomedical databases. Furthermore, we extracted gene-gene networks from two embedding versions and used them as prior knowledge to train Graph-Convolutional Neural Networks (CNNs) on large breast cancer gene expression data and on other cancer datasets. The performance of the resulting models was compared to that of Graph-CNNs trained with protein-protein interaction (PPI) networks or with networks derived using other word embedding algorithms. We also assessed the effect of corpus size on the variability of word representations. Finally, we created a web service with a graphical and a RESTful interface to extract and explore relations between biomedical terms using annotated embeddings. Comparisons to biological databases showed that relations between entities such as known PPIs, signaling pathways and cellular functions, or narrower disease ontology groups correlated with higher cosine similarity. Graph-CNNs trained with word2vec-embedding-derived networks performed sufficiently well on the metastatic event prediction tasks compared to other networks. Such performance was good enough to validate the utility of our generated word embeddings in constructing biological networks. Word representations produced by text mining algorithms like word2vec are therefore able to capture biologically meaningful relations between entities. Our generated embeddings are publicly available at https://github.com/genexplain/Word2vec-based-Networks/blob/main/README.md.
Affiliation(s)
- Halima Alachram
- Department of Medical Bioinformatics, University Medical Center, Göttingen, Lower Saxony, Germany
| | - Hryhorii Chereda
- Department of Medical Bioinformatics, University Medical Center, Göttingen, Lower Saxony, Germany
| | - Tim Beißbarth
- Department of Medical Bioinformatics, University Medical Center, Göttingen, Lower Saxony, Germany
| | | | | |
|
35
|
Tran KA, Kondrashova O, Bradley A, Williams ED, Pearson JV, Waddell N. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med 2021; 13:152. [PMID: 34579788 PMCID: PMC8477474 DOI: 10.1186/s13073-021-00968-x] [Citation(s) in RCA: 256] [Impact Index Per Article: 85.3] [Received: 12/13/2020] [Accepted: 09/12/2021] [Indexed: 12/13/2022] Open
Abstract
Deep learning is a subdiscipline of artificial intelligence that uses a machine learning technique called artificial neural networks to extract patterns and make predictions from large data sets. The increasing adoption of deep learning across healthcare domains together with the availability of highly characterised cancer datasets has accelerated research into the utility of deep learning in the analysis of the complex biology of cancer. While early results are promising, this is a rapidly evolving field with new knowledge emerging in both cancer biology and deep learning. In this review, we provide an overview of emerging deep learning techniques and how they are being applied to oncology. We focus on the deep learning applications for omics data types, including genomic, methylation and transcriptomic data, as well as histopathology-based genomic inference, and provide perspectives on how the different data types can be integrated to develop decision support tools. We provide specific examples of how deep learning may be applied in cancer diagnosis, prognosis and treatment management. We also assess the current limitations and challenges for the application of deep learning in precision oncology, including the lack of phenotypically rich data and the need for more explainable deep learning models. Finally, we conclude with a discussion of how current obstacles can be overcome to enable future clinical utilisation of deep learning.
Affiliation(s)
- Khoa A. Tran
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
- School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059 Australia
| | - Olga Kondrashova
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
| | - Andrew Bradley
- Faculty of Engineering, Queensland University of Technology (QUT), Brisbane, 4000 Australia
| | - Elizabeth D. Williams
- School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059 Australia
- Australian Prostate Cancer Research Centre - Queensland (APCRC-Q) and Queensland Bladder Cancer Initiative (QBCI), Brisbane, 4102 Australia
| | - John V. Pearson
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
| | - Nicola Waddell
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
| |
|
36
|
Westerlund AM, Hawe JS, Heinig M, Schunkert H. Risk Prediction of Cardiovascular Events by Exploration of Molecular Data with Explainable Artificial Intelligence. Int J Mol Sci 2021; 22:10291. [PMID: 34638627 PMCID: PMC8508897 DOI: 10.3390/ijms221910291] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/30/2021] [Revised: 09/17/2021] [Accepted: 09/18/2021] [Indexed: 12/11/2022] Open
Abstract
Cardiovascular diseases (CVD) annually take almost 18 million lives worldwide. Most lethal events occur months or years after the initial presentation. Indeed, many patients experience repeated complications or require multiple interventions (recurrent events). Apart from affecting the individual, this leads to high medical costs for society. Personalized treatment strategies aiming at prediction and prevention of recurrent events rely on early diagnosis and precise prognosis. Complementing the traditional environmental and clinical risk factors, multi-omics data provide a holistic view of the patient and disease progression, enabling studies to probe novel angles in risk stratification. Specifically, predictive molecular markers allow insights into regulatory networks, pathways, and mechanisms underlying disease. Moreover, artificial intelligence (AI) represents a powerful, yet adaptive, framework able to recognize complex patterns in large-scale clinical and molecular data with the potential to improve risk prediction. Here, we review the most recent advances in risk prediction of recurrent cardiovascular events, and discuss the value of molecular data and biomarkers for understanding patient risk in a systems biology context. Finally, we introduce explainable AI which may improve clinical decision systems by making predictions transparent to the medical practitioner.
Affiliation(s)
- Annie M. Westerlund
- Department of Cardiology, Deutsches Herzzentrum München, Technical University Munich, Lazarettstrasse 36, 80636 Munich, Germany
- Institute of Computational Biology, HelmholtzZentrum München, Ingolstädter Landstrasse 1, 85764 Munich, Germany
| | - Johann S. Hawe
- Department of Cardiology, Deutsches Herzzentrum München, Technical University Munich, Lazarettstrasse 36, 80636 Munich, Germany
| | - Matthias Heinig
- Institute of Computational Biology, HelmholtzZentrum München, Ingolstädter Landstrasse 1, 85764 Munich, Germany
- Department of Informatics, Technical University Munich, Boltzmannstrasse 3, 85748 Garching, Germany
| | - Heribert Schunkert
- Department of Cardiology, Deutsches Herzzentrum München, Technical University Munich, Lazarettstrasse 36, 80636 Munich, Germany
- Deutsches Zentrum für Herz- und Kreislaufforschung (DZHK), Munich Heart Alliance, Biedersteiner Strasse 29, 80802 Munich, Germany
| |
|
37
|
Zhang XM, Liang L, Liu L, Tang MJ. Graph Neural Networks and Their Current Applications in Bioinformatics. Front Genet 2021; 12:690049. [PMID: 34394185 PMCID: PMC8360394 DOI: 10.3389/fgene.2021.690049] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 04/02/2021] [Accepted: 05/28/2021] [Indexed: 12/22/2022] Open
Abstract
Graph neural networks (GNNs), a branch of deep learning in non-Euclidean space, perform particularly well in various tasks that process graph-structured data. With the rapid accumulation of biological network data, GNNs have also become an important tool in bioinformatics. In this article, a systematic survey of GNNs and their advances in bioinformatics is presented from multiple perspectives. We first introduce some commonly used GNN models and their basic principles. Then, three representative tasks are proposed based on the three levels of structural information that can be learned by GNNs: node classification, link prediction, and graph generation. Meanwhile, according to the specific applications for various omics data, we categorize and discuss the related studies in three aspects: disease prediction, drug discovery, and biomedical imaging. Based on the analysis, we provide an outlook on the shortcomings of current studies and point out their development prospects. Although GNNs have achieved excellent results in many biological tasks at present, they still face challenges in terms of low-quality data processing, methodology, and interpretability, and have a long road ahead. We believe that GNNs are potentially an excellent method for solving various biological problems in bioinformatics research.
Affiliation(s)
- Xiao-Meng Zhang
- School of Information, Yunnan Normal University, Kunming, China
| | - Li Liang
- School of Information, Yunnan Normal University, Kunming, China
| | - Lin Liu
- School of Information, Yunnan Normal University, Kunming, China
- Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
| | - Ming-Jing Tang
- Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
- School of Life Sciences, Yunnan Normal University, Kunming, China
| |
|
38
|
Komatsu M, Sakai A, Dozen A, Shozu K, Yasutomi S, Machino H, Asada K, Kaneko S, Hamamoto R. Towards Clinical Application of Artificial Intelligence in Ultrasound Imaging. Biomedicines 2021; 9:720. [PMID: 34201827 PMCID: PMC8301304 DOI: 10.3390/biomedicines9070720] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 05/22/2021] [Revised: 06/13/2021] [Accepted: 06/18/2021] [Indexed: 12/12/2022] Open
Abstract
Artificial intelligence (AI) is being increasingly adopted in medical research and applications. Medical AI devices have continuously been approved by the Food and Drug Administration in the United States and the responsible institutions of other countries. Ultrasound (US) imaging is commonly used in an extensive range of medical fields. However, AI-based US imaging analysis and its clinical implementation have not progressed steadily compared to other medical imaging modalities. The characteristic issues of US imaging owing to its manual operation and acoustic shadows cause difficulties in image quality control. In this review, we would like to introduce the global trends of medical AI research in US imaging from both clinical and basic perspectives. We also discuss US image preprocessing, ingenious algorithms that are suitable for US imaging analysis, AI explainability for obtaining informed consent, the approval process of medical AI devices, and future perspectives towards the clinical application of AI-based US diagnostic support technologies.
Affiliation(s)
- Masaaki Komatsu
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
| | - Akira Sakai
- Artificial Intelligence Laboratory, Research Unit, Fujitsu Research, Fujitsu Ltd., 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, Kanagawa 211-8588, Japan
- RIKEN AIP—Fujitsu Collaboration Center, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Biomedical Science and Engineering Track, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo 113-8510, Japan
| | - Ai Dozen
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
| | - Kanto Shozu
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
| | - Suguru Yasutomi
- Artificial Intelligence Laboratory, Research Unit, Fujitsu Research, Fujitsu Ltd., 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, Kanagawa 211-8588, Japan
- RIKEN AIP—Fujitsu Collaboration Center, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Hidenori Machino
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
| | - Ken Asada
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
| | - Syuzo Kaneko
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
| | - Ryuji Hamamoto
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Biomedical Science and Engineering Track, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo 113-8510, Japan
| |
|