1
Park S, Hong CH, Son SJ, Roh HW, Kim D, Shin H, Woo HG. Identification of molecular subtypes of dementia by using blood-proteins interaction-aware graph propagational network. Brief Bioinform 2024; 25:bbae428. [PMID: 39226887 PMCID: PMC11370639 DOI: 10.1093/bib/bbae428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 07/26/2024] [Accepted: 08/15/2024] [Indexed: 09/05/2024]
Abstract
Plasma protein biomarkers have been considered promising tools for diagnosing dementia subtypes due to their low variability, cost-effectiveness, and minimal invasiveness in diagnostic procedures. Machine learning (ML) methods have been applied to enhance accuracy of the biomarker discovery. However, previous ML-based studies often overlook interactions between proteins, which are crucial in complex disorders like dementia. While protein-protein interactions (PPIs) have been used in network models, these models often fail to fully capture the diverse properties of PPIs due to their local awareness. This drawback increases the chance of neglecting critical components and magnifying the impact of noisy interactions. In this study, we propose a novel graph-based ML model for dementia subtype diagnosis, the graph propagational network (GPN). By propagating the independent effect of plasma proteins on PPI network, the GPN extracts the globally interactive effects between proteins. Experimental results showed that the interactive effect between proteins yielded to further clarify the differences between dementia subtype groups and contributed to the performance improvement where the GPN outperformed existing methods by 10.4% on average.
Affiliation(s)
- Sunghong Park
  - Department of Physiology, Ajou University School of Medicine, Worldcup-ro 164, Yeongtong-gu, Suwon, 16499, Republic of Korea
- Chang Hyung Hong
  - Department of Psychiatry, Ajou University School of Medicine, Worldcup-ro 164, Yeongtong-gu, Suwon, 16499, Republic of Korea
- Sang Joon Son
  - Department of Psychiatry, Ajou University School of Medicine, Worldcup-ro 164, Yeongtong-gu, Suwon, 16499, Republic of Korea
- Hyun Woong Roh
  - Department of Psychiatry, Ajou University School of Medicine, Worldcup-ro 164, Yeongtong-gu, Suwon, 16499, Republic of Korea
- Doyoon Kim
  - Department of Physiology, Ajou University School of Medicine, Worldcup-ro 164, Yeongtong-gu, Suwon, 16499, Republic of Korea
  - Department of Biomedical Science, Graduate School, Ajou University, Worldcup-ro 164, Yeongtong-gu, Suwon, 16499, Republic of Korea
- Hyunjung Shin
  - Department of Industrial Engineering, Ajou University, Worldcup-ro 206, Yeongtong-gu, Suwon, 16499, Republic of Korea
  - Department of Artificial Intelligence, Ajou University, Worldcup-ro 206, Yeongtong-gu, Suwon, 16499, Republic of Korea
- Hyun Goo Woo
  - Department of Physiology, Ajou University School of Medicine, Worldcup-ro 164, Yeongtong-gu, Suwon, 16499, Republic of Korea
  - Department of Biomedical Science, Graduate School, Ajou University, Worldcup-ro 164, Yeongtong-gu, Suwon, 16499, Republic of Korea
  - Ajou Translational Omics Center (ATOC), Research Institute for Innovative Medicine, Ajou University Medical Center, Worldcup-ro 164, Yeongtong-gu, Suwon, 16499, Republic of Korea
2
Mi H, Sivagnanam S, Ho WJ, Zhang S, Bergman D, Deshpande A, Baras AS, Jaffee EM, Coussens LM, Fertig EJ, Popel AS. Computational methods and biomarker discovery strategies for spatial proteomics: a review in immuno-oncology. Brief Bioinform 2024; 25:bbae421. [PMID: 39179248 PMCID: PMC11343572 DOI: 10.1093/bib/bbae421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 07/11/2024] [Accepted: 08/09/2024] [Indexed: 08/26/2024]
Abstract
Advancements in imaging technologies have revolutionized our ability to deeply profile pathological tissue architectures, generating large volumes of imaging data with unparalleled spatial resolution. This type of data collection, namely, spatial proteomics, offers invaluable insights into various human diseases. Simultaneously, computational algorithms have evolved to manage the increasing dimensionality of spatial proteomics inherent in this progress. Numerous imaging-based computational frameworks, such as computational pathology, have been proposed for research and clinical applications. However, the development of these fields demands diverse domain expertise, creating barriers to their integration and further application. This review seeks to bridge this divide by presenting a comprehensive guideline. We consolidate prevailing computational methods and outline a roadmap from image processing to data-driven, statistics-informed biomarker discovery. Additionally, we explore future perspectives as the field moves toward interfacing with other quantitative domains, holding significant promise for precision care in immuno-oncology.
Affiliation(s)
- Haoyang Mi
  - Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
- Shamilene Sivagnanam
  - The Knight Cancer Institute, Oregon Health and Science University, Portland, OR 97201, United States
  - Department of Cell, Development and Cancer Biology, Oregon Health and Science University, Portland, OR 97201, United States
- Won Jin Ho
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
  - Convergence Institute, Johns Hopkins University, Baltimore, MD 21205, United States
- Shuming Zhang
  - Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
- Daniel Bergman
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
  - Convergence Institute, Johns Hopkins University, Baltimore, MD 21205, United States
- Atul Deshpande
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
  - Convergence Institute, Johns Hopkins University, Baltimore, MD 21205, United States
  - Bloomberg-Kimmel Institute for Cancer Immunotherapy, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
- Alexander S Baras
  - Bloomberg-Kimmel Institute for Cancer Immunotherapy, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
  - The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
- Elizabeth M Jaffee
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
  - Convergence Institute, Johns Hopkins University, Baltimore, MD 21205, United States
  - Bloomberg-Kimmel Institute for Cancer Immunotherapy, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
- Lisa M Coussens
  - The Knight Cancer Institute, Oregon Health and Science University, Portland, OR 97201, United States
  - Department of Cell, Development and Cancer Biology, Oregon Health and Science University, Portland, OR 97201, United States
  - Brenden-Colson Center for Pancreatic Care, Oregon Health and Science University, Portland, OR 97201, United States
- Elana J Fertig
  - Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
  - Convergence Institute, Johns Hopkins University, Baltimore, MD 21205, United States
  - Bloomberg-Kimmel Institute for Cancer Immunotherapy, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
  - Department of Applied Mathematics and Statistics, Johns Hopkins University Whiting School of Engineering, Baltimore, MD 21218, United States
- Aleksander S Popel
  - Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
3
Chereda H, Leha A, Beißbarth T. Stable feature selection utilizing Graph Convolutional Neural Network and Layer-wise Relevance Propagation for biomarker discovery in breast cancer. Artif Intell Med 2024; 151:102840. [PMID: 38658129 DOI: 10.1016/j.artmed.2024.102840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 03/05/2024] [Accepted: 03/10/2024] [Indexed: 04/26/2024]
Abstract
High-throughput technologies are becoming increasingly important in discovering prognostic biomarkers and in identifying novel drug targets. With Mammaprint, Oncotype DX, and many other prognostic molecular signatures breast cancer is one of the paradigmatic examples of the utility of high-throughput data to deliver prognostic biomarkers, that can be represented in a form of a rather short gene list. Such gene lists can be obtained as a set of features (genes) that are important for the decisions of a Machine Learning (ML) method applied to high-dimensional gene expression data. Several studies have identified predictive gene lists for patient prognosis in breast cancer, but these lists are unstable and have only a few genes in common. Instability of feature selection impedes biological interpretability: genes that are relevant for cancer pathology should be members of any predictive gene list obtained for the same clinical type of patients. Stability and interpretability of selected features can be improved by including information on molecular networks in ML methods. Graph Convolutional Neural Network (GCNN) is a contemporary deep learning approach applicable to gene expression data structured by a prior knowledge molecular network. Layer-wise Relevance Propagation (LRP) and SHapley Additive exPlanations (SHAP) are methods to explain individual decisions of deep learning models. We used both GCNN+LRP and GCNN+SHAP techniques to construct feature sets by aggregating individual explanations. We suggest a methodology to systematically and quantitatively analyze the stability, the impact on the classification performance, and the interpretability of the selected feature sets. We used this methodology to compare GCNN+LRP to GCNN+SHAP and to more classical ML-based feature selection approaches. 
Utilizing a large breast cancer gene expression dataset we show that, while feature selection with SHAP is useful in applications where selected features have to be impactful for classification performance, among all studied methods GCNN+LRP delivers the most stable (reproducible) and interpretable gene lists.
Affiliation(s)
- Hryhorii Chereda
  - Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany
- Andreas Leha
  - Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany
  - Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, Göttingen, 37073, Germany
  - Scientific Core Facility Medical Biometry and Statistical Bioinformatics, University Medical Center Göttingen, Humboldtallee 32, Göttingen, 37073, Germany
- Tim Beißbarth
  - Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany
  - Campus-Institute Data Science (CIDAS), University of Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany
4
Jubran J, Slutsky R, Rozenblum N, Rokach L, Ben-David U, Yeger-Lotem E. Machine-learning analysis reveals an important role for negative selection in shaping cancer aneuploidy landscapes. Genome Biol 2024; 25:95. [PMID: 38622679 PMCID: PMC11020441 DOI: 10.1186/s13059-024-03225-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 03/26/2024] [Indexed: 04/17/2024]
Abstract
BACKGROUND Aneuploidy, an abnormal number of chromosomes within a cell, is a hallmark of cancer. Patterns of aneuploidy differ across cancers, yet are similar in cancers affecting closely related tissues. The selection pressures underlying aneuploidy patterns are not fully understood, hindering our understanding of cancer development and progression. RESULTS Here, we apply interpretable machine learning methods to study tissue-selective aneuploidy patterns. We define 20 types of features corresponding to genomic attributes of chromosome-arms, normal tissues, primary tumors, and cancer cell lines (CCLs), and use them to model gains and losses of chromosome arms in 24 cancer types. To reveal the factors that shape the tissue-specific cancer aneuploidy landscapes, we interpret the machine learning models by estimating the relative contribution of each feature to the models. While confirming known drivers of positive selection, our quantitative analysis highlights the importance of negative selection for shaping aneuploidy landscapes. This is exemplified by tumor suppressor gene density being a better predictor of gain patterns than oncogene density, and vice versa for loss patterns. We also identify the importance of tissue-selective features and demonstrate them experimentally, revealing KLF5 as an important driver for chr13q gain in colon cancer. Further supporting an important role for negative selection in shaping the aneuploidy landscapes, we find compensation by paralogs to be among the top predictors of chromosome arm loss prevalence and demonstrate this relationship for one paralog interaction. Similar factors shape aneuploidy patterns in human CCLs, demonstrating their relevance for aneuploidy research. CONCLUSIONS Our quantitative, interpretable machine learning models improve the understanding of the genomic properties that shape cancer aneuploidy landscapes.
Affiliation(s)
- Juman Jubran
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, 84105, Beer Sheva, Israel
- Rachel Slutsky
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Nir Rozenblum
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Lior Rokach
  - Department of Software & Information Systems Engineering, Ben-Gurion University of the Negev, 84105, Beer Sheva, Israel
- Uri Ben-David
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Esti Yeger-Lotem
  - Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, 84105, Beer Sheva, Israel
  - The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, 84105, Beer Sheva, Israel
5
Yan H, Weng D, Li D, Gu Y, Ma W, Liu Q. Prior knowledge-guided multilevel graph neural network for tumor risk prediction and interpretation via multi-omics data integration. Brief Bioinform 2024; 25:bbae184. [PMID: 38670157 PMCID: PMC11052635 DOI: 10.1093/bib/bbae184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 04/06/2024] [Indexed: 04/28/2024]
Abstract
The interrelation and complementary nature of multi-omics data can provide valuable insights into the intricate molecular mechanisms underlying diseases. However, challenges such as limited sample size, high data dimensionality and differences in omics modalities pose significant obstacles to fully harnessing the potential of these data. The prior knowledge such as gene regulatory network and pathway information harbors useful gene-gene interaction and gene functional module information. To effectively integrate multi-omics data and make full use of the prior knowledge, here, we propose a Multilevel-graph neural network (GNN): a hierarchically designed deep learning algorithm that sequentially leverages multi-omics data, gene regulatory networks and pathway information to extract features and enhance accuracy in predicting survival risk. Our method achieved better accuracy compared with existing methods. Furthermore, key factors nonlinearly associated with the tumor pathogenesis are prioritized by employing two interpretation algorithms (i.e. GNN-Explainer and IGscore) for neural networks, at gene and pathway level, respectively. The top genes and pathways exhibit strong associations with disease in survival analyses, many of which such as SEC61G and CYP27B1 are previously reported in the literature.
Affiliation(s)
- Hongxi Yan
  - Department of Computer Science, Beihang University, XueYuan Road, 100191, Beijing, China
- Dawei Weng
  - School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Dongguo Li
  - School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Yu Gu
  - School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Wenji Ma
  - Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, 200025, Shanghai, China
- Qingjie Liu
  - Department of Computer Science, Beihang University, XueYuan Road, 100191, Beijing, China
6
Luo H, Liang H, Liu H, Fan Z, Wei Y, Yao X, Cong S. TEMINET: A Co-Informative and Trustworthy Multi-Omics Integration Network for Diagnostic Prediction. Int J Mol Sci 2024; 25:1655. [PMID: 38338932 PMCID: PMC10855161 DOI: 10.3390/ijms25031655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 01/20/2024] [Accepted: 01/26/2024] [Indexed: 02/12/2024]
Abstract
Advancing the domain of biomedical investigation, integrated multi-omics data have shown exceptional performance in elucidating complex human diseases. However, as the variety of omics information expands, precisely perceiving the informativeness of intra- and inter-omics becomes challenging due to the intricate interrelations, thus presenting significant challenges in the integration of multi-omics data. To address this, we introduce a novel multi-omics integration approach, referred to as TEMINET. This approach enhances diagnostic prediction by leveraging an intra-omics co-informative representation module and a trustworthy learning strategy used to address inter-omics fusion. Considering the multifactorial nature of complex diseases, TEMINET utilizes intra-omics features to construct disease-specific networks; then, it applies graph attention networks and a multi-level framework to capture more collective informativeness than pairwise relations. To perceive the contribution of co-informative representations within intra-omics, we designed a trustworthy learning strategy to identify the reliability of each omics in integration. To integrate inter-omics information, a combined-beliefs fusion approach is deployed to harmonize the trustworthy representations of different omics types effectively. Our experiments across four different diseases using mRNA, methylation, and miRNA data demonstrate that TEMINET achieves advanced performance and robustness in classification tasks.
Affiliation(s)
- Haoran Luo
  - Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Hong Liang
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Hongwei Liu
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Zhoujie Fan
  - Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
- Yanhui Wei
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Xiaohui Yao
  - Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Shan Cong
  - Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
7
Brouard C, Mourad R, Vialaneix N. Should we really use graph neural networks for transcriptomic prediction? Brief Bioinform 2024; 25:bbae027. [PMID: 38349060 PMCID: PMC10939369 DOI: 10.1093/bib/bbae027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 12/20/2023] [Accepted: 01/17/2024] [Indexed: 02/15/2024]
Abstract
The recent development of deep learning methods have undoubtedly led to great improvement in various machine learning tasks, especially in prediction tasks. This type of methods have also been adapted to answer various problems in bioinformatics, including automatic genome annotation, artificial genome generation or phenotype prediction. In particular, a specific type of deep learning method, called graph neural network (GNN) has repeatedly been reported as a good candidate to predict phenotypes from gene expression because its ability to embed information on gene regulation or co-expression through the use of a gene network. However, up to date, no complete and reproducible benchmark has ever been performed to analyze the trade-off between cost and benefit of this approach compared to more standard (and simpler) machine learning methods. In this article, we provide such a benchmark, based on clear and comparable policies to evaluate the different methods on several datasets. Our conclusion is that GNN rarely provides a real improvement in prediction performance, especially when compared to the computation effort required by the methods. Our findings on a limited but controlled simulated dataset shows that this could be explained by the limited quality or predictive power of the input biological gene network itself.
Affiliation(s)
- Céline Brouard
  - Université Fédérale de Toulouse, INRAE, MIAT, 31326 Castanet-Tolosan, France
- Raphaël Mourad
  - Université Fédérale de Toulouse, INRAE, MIAT, 31326 Castanet-Tolosan, France
  - Université Paul Sabatier, 31062 Toulouse, France
- Nathalie Vialaneix
  - Université Fédérale de Toulouse, INRAE, MIAT, 31326 Castanet-Tolosan, France
8
Li B, Nabavi S. A multimodal graph neural network framework for cancer molecular subtype classification. BMC Bioinformatics 2024; 25:27. [PMID: 38225583 PMCID: PMC10789042 DOI: 10.1186/s12859-023-05622-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 12/15/2023] [Indexed: 01/17/2024]
Abstract
BACKGROUND The recent development of high-throughput sequencing has created a large collection of multi-omics data, which enables researchers to better investigate cancer molecular profiles and cancer taxonomy based on molecular subtypes. Integrating multi-omics data has been proven to be effective for building more precise classification models. Most current multi-omics integrative models use either an early fusion in the form of concatenation or late fusion with a separate feature extractor for each omic, which are mainly based on deep neural networks. Due to the nature of biological systems, graphs are a better structural representation of bio-medical data. Although few graph neural network (GNN) based multi-omics integrative methods have been proposed, they suffer from three common disadvantages. One is most of them use only one type of connection, either inter-omics or intra-omic connection; second, they only consider one kind of GNN layer, either graph convolution network (GCN) or graph attention network (GAT); and third, most of these methods have not been tested on a more complex classification task, such as cancer molecular subtypes. RESULTS In this study, we propose a novel end-to-end multi-omics GNN framework for accurate and robust cancer subtype classification. The proposed model utilizes multi-omics data in the form of heterogeneous multi-layer graphs, which combine both inter-omics and intra-omic connections from established biological knowledge. The proposed model incorporates learned graph features and global genome features for accurate classification. We tested the proposed model on the Cancer Genome Atlas (TCGA) Pan-cancer dataset and TCGA breast invasive carcinoma (BRCA) dataset for molecular subtype and cancer subtype classification, respectively. The proposed model shows superior performance compared to four current state-of-the-art baseline models in terms of accuracy, F1 score, precision, and recall. 
The comparative analysis of GAT-based models and GCN-based models reveals that GAT-based models are preferred for smaller graphs with less information and GCN-based models are preferred for larger graphs with extra information.
Affiliation(s)
- Bingjun Li
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, USA
- Sheida Nabavi
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, USA
9
Zou J, Shah O, Chiu YC, Ma T, Atkinson JM, Oesterreich S, Lee AV, Tseng GC. Systems approach for congruence and selection of cancer models towards precision medicine. PLoS Comput Biol 2024; 20:e1011754. [PMID: 38198519 PMCID: PMC10805322 DOI: 10.1371/journal.pcbi.1011754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 01/23/2024] [Accepted: 12/12/2023] [Indexed: 01/12/2024]
Abstract
Cancer models are instrumental as a substitute for human studies and to expedite basic, translational, and clinical cancer research. For a given cancer type, a wide selection of models, such as cell lines, patient-derived xenografts, organoids and genetically modified murine models, are often available to researchers. However, how to quantify their congruence to human tumors and to select the most appropriate cancer model is a largely unsolved issue. Here, we present Congruence Analysis and Selection of CAncer Models (CASCAM), a statistical and machine learning framework for authenticating and selecting the most representative cancer models in a pathway-specific manner using transcriptomic data. CASCAM provides harmonization between human tumor and cancer model omics data, systematic congruence quantification, and pathway-based topological visualization to determine the most appropriate cancer model selection. The systems approach is presented using invasive lobular breast carcinoma (ILC) subtype and suggesting CAMA1 followed by UACC3133 as the most representative cell lines for ILC research. Two additional case studies for triple negative breast cancer (TNBC) and patient-derived xenograft/organoid (PDX/PDO) are further investigated. CASCAM is generalizable to any cancer subtype and will authenticate cancer models for faithful non-human preclinical research towards precision medicine.
Affiliation(s)
- Jian Zou
  - Department of Statistics, School of Public Health, Chongqing Medical University, Chongqing, China
- Osama Shah
  - Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, Pennsylvania, United States of America
  - Magee-Womens Research Institute, Pittsburgh, Pennsylvania, United States of America
  - Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Yu-Chiao Chiu
  - Cancer Therapeutics Program, UPMC Hillman Cancer Center (HCC), Pittsburgh, Pennsylvania, United States of America
  - Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Tianzhou Ma
  - Department of Epidemiology and Biostatistics, University of Maryland, College Park, Maryland, United States of America
- Jennifer M. Atkinson
  - Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, Pennsylvania, United States of America
  - Magee-Womens Research Institute, Pittsburgh, Pennsylvania, United States of America
  - Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Steffi Oesterreich
  - Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, Pennsylvania, United States of America
  - Magee-Womens Research Institute, Pittsburgh, Pennsylvania, United States of America
  - Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Adrian V. Lee
  - Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, Pennsylvania, United States of America
  - Magee-Womens Research Institute, Pittsburgh, Pennsylvania, United States of America
  - Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- George C. Tseng
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
10
Hassan T, Li Z, Javed S, Dias J, Werghi N. Neural Graph Refinement for Robust Recognition of Nuclei Communities in Histopathological Landscape. IEEE Trans Image Process 2023; 33:241-256. [PMID: 38064329 DOI: 10.1109/tip.2023.3337666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2023]
Abstract
Accurate classification of nuclei communities is an important step towards timely treating the cancer spread. Graph theory provides an elegant way to represent and analyze nuclei communities within the histopathological landscape in order to perform tissue phenotyping and tumor profiling tasks. Many researchers have worked on recognizing nuclei regions within the histology images in order to grade cancerous progression. However, due to the high structural similarities between nuclei communities, defining a model that can accurately differentiate between nuclei pathological patterns still needs to be solved. To surmount this challenge, we present a novel approach, dubbed neural graph refinement, that enhances the capabilities of existing models to perform nuclei recognition tasks by employing graph representational learning and broadcasting processes. Based on the physical interaction of the nuclei, we first construct a fully connected graph in which nodes represent nuclei and adjacent nodes are connected to each other via an undirected edge. For each edge and node pair, appearance and geometric features are computed and are then utilized for generating the neural graph embeddings. These embeddings are used for diffusing contextual information to the neighboring nodes, all along a path traversing the whole graph to infer global information over an entire nuclei network and predict pathologically meaningful communities. Through rigorous evaluation of the proposed scheme across four public datasets, we showcase that learning such communities through neural graph refinement produces better results that outperform state-of-the-art methods.
|
11
|
Bhonde SB, Wagh SK, Prasad JR. Identification of cancer types from gene expressions using learning techniques. Comput Methods Biomech Biomed Engin 2023; 26:1951-1965. [PMID: 36562388 DOI: 10.1080/10255842.2022.2160243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Revised: 10/15/2022] [Accepted: 11/15/2022] [Indexed: 12/24/2022]
Abstract
Cancer is a major cause of death worldwide. Early detection and prediction of a cancer type are important for a patient's well-being. Functional genomic data have recently been used for effective and early detection of cancer. According to previous research, the use of microarray data in cancer prediction faces two main problems: high dimensionality and limited sample size. Several researchers have used statistical and machine learning-based methods to classify cancer types, but limitations remain that make cancer classification difficult. Deep Learning (DL) and Convolutional Neural Networks (CNN) have proven effective for analyzing unstructured data, including gene expression data. In the proposed method, gene expression data for five types of cancer are collected from The Cancer Genome Atlas (TCGA). Prominent features are selected using a hybrid Particle Swarm Optimization (PSO) and Random Forest (RF) algorithm, followed by Principal Component Analysis (PCA) for dimensionality reduction. Finally, for classification, a blend of a Convolutional Neural Network (CNN) and Bi-directional Long Short-Term Memory (Bi-LSTM) is used to predict the target type of cancer. Experimental results demonstrate that the accuracy of the proposed method is 96.89%. Compared with existing work, our method achieved better results.
Affiliation(s)
- Swati B Bhonde
  - Smt. Kashibai Navale College of Engineering, Pune, India
|
12
|
Tran KA, Addala V, Johnston RL, Lovell D, Bradley A, Koufariotis LT, Wood S, Wu SZ, Roden D, Al-Eryani G, Swarbrick A, Williams ED, Pearson JV, Kondrashova O, Waddell N. Performance of tumour microenvironment deconvolution methods in breast cancer using single-cell simulated bulk mixtures. Nat Commun 2023; 14:5758. [PMID: 37717006 PMCID: PMC10505141 DOI: 10.1038/s41467-023-41385-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Accepted: 09/01/2023] [Indexed: 09/18/2023] Open
Abstract
Cells within the tumour microenvironment (TME) can impact tumour development and influence treatment response. Computational approaches have been developed to deconvolve the TME from bulk RNA-seq. Using scRNA-seq profiling from breast tumours we simulate thousands of bulk mixtures, representing tumour purities and cell lineages, to compare the performance of nine TME deconvolution methods (BayesPrism, Scaden, CIBERSORTx, MuSiC, DWLS, hspe, CPM, Bisque, and EPIC). Some methods are more robust in deconvolving mixtures with high tumour purity levels. Most methods tend to mis-predict normal epithelial for cancer epithelial as tumour purity increases, a finding that is validated in two independent datasets. The breast cancer molecular subtype influences this mis-prediction. BayesPrism and DWLS have the lowest combined numbers of false positives and false negatives, and have the best performance when deconvolving granular immune lineages. Our findings highlight the need for more single-cell characterisation of rarer cell types, and suggest that tumour cell compositions should be considered when deconvolving the TME.
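The idea of simulating a bulk mixture from single-cell data can be sketched as a proportion-weighted sum of per-cell-type expression profiles at a chosen tumour purity. The sketch below is an illustration under stated assumptions, not the study's simulation pipeline: the function names, the cell-type labels, and the toy expression values are hypothetical, and how the non-cancer remainder is split is an arbitrary choice.

```python
import random


def simulate_bulk(profiles, proportions):
    """Mix per-cell-type expression profiles into one bulk profile.

    profiles: dict cell_type -> list of expression values (same gene order)
    proportions: dict cell_type -> fraction of the mixture (must sum to 1)
    """
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n_genes = len(next(iter(profiles.values())))
    bulk = [0.0] * n_genes
    for cell_type, fraction in proportions.items():
        for g, value in enumerate(profiles[cell_type]):
            bulk[g] += fraction * value
    return bulk


def random_proportions(cell_types, tumour_purity, rng, cancer="cancer_epithelial"):
    """Fix the cancer fraction at tumour_purity; split the rest at random."""
    others = [c for c in cell_types if c != cancer]
    # Weights are kept away from zero so the normalisation is always defined.
    weights = [rng.uniform(0.1, 1.0) for _ in others]
    total = sum(weights)
    props = {c: (1.0 - tumour_purity) * w / total for c, w in zip(others, weights)}
    props[cancer] = tumour_purity
    return props


# Toy profiles over two genes (values are made up for illustration).
profiles = {
    "cancer_epithelial": [10.0, 0.0],
    "normal_epithelial": [8.0, 1.0],
    "t_cell": [0.0, 5.0],
}
rng = random.Random(0)
mixture = random_proportions(list(profiles), tumour_purity=0.7, rng=rng)
bulk = simulate_bulk(profiles, mixture)
```

Because the true proportions are known for every simulated mixture, a deconvolution method's estimates can be scored directly against them, including how often normal epithelial signal is confused with cancer epithelial signal as purity rises.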
Affiliation(s)
- Khoa A Tran
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
  - School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD, 4000, Australia
- Venkateswar Addala
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Rebecca L Johnston
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- David Lovell
  - School of Computer Science, Queensland University of Technology, Brisbane, QLD, 4000, Australia
  - QUT Centre for Data Science, Brisbane, QLD, 4000, Australia
- Andrew Bradley
  - Faculty of Engineering, Queensland University of Technology, Brisbane, QLD, 4000, Australia
- Lambros T Koufariotis
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Scott Wood
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Sunny Z Wu
  - Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
- Daniel Roden
  - Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
- Ghamdan Al-Eryani
  - Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
- Alexander Swarbrick
  - Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
- Elizabeth D Williams
  - School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD, 4000, Australia
  - Australian Prostate Cancer Research Centre - Queensland (APCRC-Q) and Queensland Bladder Cancer Initiative (QBCI), Brisbane, QLD, 4000, Australia
- John V Pearson
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Olga Kondrashova
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Nicola Waddell
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
  - School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD, 4000, Australia
|
13
|
Zolotovskaia M, Kovalenko M, Pugacheva P, Tkachev V, Simonov A, Sorokin M, Seryakov A, Garazha A, Gaifullin N, Sekacheva M, Zakharova G, Buzdin AA. Algorithmically Reconstructed Molecular Pathways as the New Generation of Prognostic Molecular Biomarkers in Human Solid Cancers. Proteomes 2023; 11:26. [PMID: 37755705 PMCID: PMC10535530 DOI: 10.3390/proteomes11030026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 08/18/2023] [Accepted: 08/22/2023] [Indexed: 09/28/2023] Open
Abstract
Individual gene expression and molecular pathway activation profiles were shown to be effective biomarkers in many cancers. Here, we used the human interactome model to algorithmically build 7470 molecular pathways centered around individual gene products. We assessed their associations with tumor type and survival in comparison with the previous generation of molecular pathway biomarkers (3022 "classical" pathways) and with the RNA transcripts or proteomic profiles of individual genes, for 8141 and 1117 samples, respectively. For all analytes in RNA and proteomic data, respectively, we found a total of 7441 and 7343 potential biomarker associations for gene-centric pathways, 3020 and 2950 for classical pathways, and 24,349 and 6742 for individual genes. Overall, the percentage of RNA biomarkers was statistically significantly higher for both types of pathways than for individual genes (p < 0.05). In turn, both types of pathways showed comparable performance. The percentage of cancer-type-specific biomarkers was comparable between proteomic and transcriptomic levels, but the proportion of survival biomarkers was dramatically lower for proteomic data. Thus, we conclude that pathway activation level is an advanced type of biomarker for RNA and proteomic data, and that rapid algorithmic construction of pathways is a credible new alternative to time-consuming, hypothesis-driven manual pathway curation and reconstruction.
Affiliation(s)
- Marianna Zolotovskaia
  - Laboratory for Translational Genomic Bioinformatics, Moscow Institute of Physics and Technology (State University), 141701 Dolgoprudny, Russia
  - Omicsway Corp., Walnut, CA 91789, USA
  - Laboratory of Clinical and Genomic Bioinformatics, I.M. Sechenov First Moscow State Medical University, 119048 Moscow, Russia
- Maks Kovalenko
  - Laboratory for Translational Genomic Bioinformatics, Moscow Institute of Physics and Technology (State University), 141701 Dolgoprudny, Russia
- Polina Pugacheva
  - Laboratory for Translational Genomic Bioinformatics, Moscow Institute of Physics and Technology (State University), 141701 Dolgoprudny, Russia
- Alexander Simonov
  - Laboratory for Translational Genomic Bioinformatics, Moscow Institute of Physics and Technology (State University), 141701 Dolgoprudny, Russia
  - Omicsway Corp., Walnut, CA 91789, USA
- Maxim Sorokin
  - Laboratory for Translational Genomic Bioinformatics, Moscow Institute of Physics and Technology (State University), 141701 Dolgoprudny, Russia
  - Laboratory of Clinical and Genomic Bioinformatics, I.M. Sechenov First Moscow State Medical University, 119048 Moscow, Russia
  - PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), 1200 Brussels, Belgium
- Nurshat Gaifullin
  - Department of Pathology, Faculty of Medicine, Lomonosov Moscow State University, 119991 Moscow, Russia
- Marina Sekacheva
  - Laboratory of Clinical and Genomic Bioinformatics, I.M. Sechenov First Moscow State Medical University, 119048 Moscow, Russia
- Galina Zakharova
  - Laboratory of Clinical and Genomic Bioinformatics, I.M. Sechenov First Moscow State Medical University, 119048 Moscow, Russia
- Anton A. Buzdin
  - Laboratory for Translational Genomic Bioinformatics, Moscow Institute of Physics and Technology (State University), 141701 Dolgoprudny, Russia
  - PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), 1200 Brussels, Belgium
  - World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, 119048 Moscow, Russia
  - Laboratory of Systems Biology, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, 117997 Moscow, Russia
|
14
|
Duan M, Wang Y, Zhao D, Liu H, Zhang G, Li K, Zhang H, Huang L, Zhang R, Zhou F. Orchestrating information across tissues via a novel multitask GAT framework to improve quantitative gene regulation relation modeling for survival analysis. Brief Bioinform 2023; 24:bbad238. [PMID: 37427963 DOI: 10.1093/bib/bbad238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Revised: 05/29/2023] [Accepted: 06/08/2023] [Indexed: 07/11/2023] Open
Abstract
Survival analysis is critical to cancer prognosis estimation. High-throughput technologies have increased the dimensionality of gene features, but the number of clinical samples in cohorts remains relatively small for various reasons, including difficulties in participant recruitment and high data-generation costs. The transcriptome is one of the most abundantly available OMIC data types (OMIC refers to high-throughput data, including genomic, transcriptomic, proteomic and epigenomic data). This study introduced a multitask graph attention network (GAT) framework, DQSurv, for the survival analysis task. We first used a large dataset of healthy tissue samples to pretrain the GAT-based HealthModel for the quantitative measurement of gene regulatory relations. The multitask survival analysis framework DQSurv used transfer learning to initialize the GAT model with the pretrained HealthModel and further fine-tuned this model using two tasks, i.e., the main task of survival analysis and the auxiliary task of gene expression prediction. This refined GAT was denoted as DiseaseModel. We fused the original transcriptomic features with the difference vector between the latent features encoded by the HealthModel and DiseaseModel for the final task of survival analysis. The proposed DQSurv model consistently outperformed the existing models for the survival analysis of 10 benchmark cancer types and an independent dataset. The ablation study also supported the necessity of the main modules. We released the code and the pretrained HealthModel to facilitate feature encoding and survival analysis in future transcriptome-based studies, especially on small datasets. The model and the code are available at http://www.healthinformaticslab.org/supp/.
Affiliation(s)
- Meiyu Duan
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Yueying Wang
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Dong Zhao
  - School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
- Hongmei Liu
  - School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
- Gongyou Zhang
  - School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
- Kewei Li
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Haotian Zhang
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Lan Huang
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
- Ruochi Zhang
  - School of Artificial Intelligence, Jilin University, Changchun, China, 130012
- Fengfeng Zhou
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
|
15
|
Beaude A, Rafiee Vahid M, Augé F, Zehraoui F, Hanczar B. AttOmics: attention-based architecture for diagnosis and prognosis from omics data. Bioinformatics 2023; 39:i94-i102. [PMID: 37387182 DOI: 10.1093/bioinformatics/btad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION The increasing availability of high-throughput omics data allows for considering a new medicine centered on individual patients. Precision medicine relies on exploiting these high-throughput data with machine-learning models, especially the ones based on deep-learning approaches, to improve diagnosis. Due to the high-dimensional small-sample nature of omics data, current deep-learning models end up with many parameters and have to be fitted with a limited training set. Furthermore, interactions between molecular entities inside an omics profile are not patient specific but are the same for all patients. RESULTS In this article, we propose AttOmics, a new deep-learning architecture based on the self-attention mechanism. First, we decompose each omics profile into a set of groups, where each group contains related features. Then, by applying the self-attention mechanism to the set of groups, we can capture the different interactions specific to a patient. The results of different experiments carried out in this article show that our model can accurately predict the phenotype of a patient with fewer parameters than deep neural networks. Visualizing the attention maps can provide new insights into the essential groups for a particular phenotype. AVAILABILITY AND IMPLEMENTATION The code and data are available at https://forge.ibisc.univ-evry.fr/abeaude/AttOmics. TCGA data can be downloaded from the Genomic Data Commons Data Portal.
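Applying self-attention to a set of feature groups can be sketched with scaled dot-product attention written in plain Python. This is a simplified illustration, not the AttOmics architecture: it assumes each group has already been reduced to a fixed-length vector, it uses the group vectors directly as queries, keys, and values (a trained model would apply learned projections), and the names `softmax` and `self_attention` are chosen here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(groups):
    """Scaled dot-product self-attention over group vectors.

    groups: list of equal-length vectors, one per feature group.
    Returns (outputs, attention), where attention[i][j] is how much
    group i attends to group j for this particular sample.
    """
    dim = len(groups[0])
    outputs, attention = [], []
    for query in groups:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in groups
        ]
        weights = softmax(scores)
        attention.append(weights)
        outputs.append(
            [sum(w * g[d] for w, g in zip(weights, groups)) for d in range(dim)]
        )
    return outputs, attention


# Three hypothetical group vectors for a single patient (made-up values).
patient_groups = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attention = self_attention(patient_groups)
```

Because the attention weights are computed from each patient's own group vectors, the resulting interaction pattern differs between patients, and the `attention` matrix is the kind of object that can be inspected as an attention map.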
Affiliation(s)
- Aurélien Beaude
  - IBISC, Université Paris-Saclay, Univ Evry, 23 Boulevard de France, Evry-Courcouronnes 91020, France
  - Artificial Intelligence & Deep Analytics, Omics Data Science, Sanofi R&D Data and Data Science, 1 Av. Pierre Brossolette, Chilly-Mazarin 91385, France
- Milad Rafiee Vahid
  - Sanofi R&D Data and Data Science, Artificial Intelligence & Deep Analytics, Omics Data Science, 450 Water Street, Cambridge, MA 02142, United States
- Franck Augé
  - Artificial Intelligence & Deep Analytics, Omics Data Science, Sanofi R&D Data and Data Science, 1 Av. Pierre Brossolette, Chilly-Mazarin 91385, France
- Farida Zehraoui
  - IBISC, Université Paris-Saclay, Univ Evry, 23 Boulevard de France, Evry-Courcouronnes 91020, France
- Blaise Hanczar
  - IBISC, Université Paris-Saclay, Univ Evry, 23 Boulevard de France, Evry-Courcouronnes 91020, France
|
16
|
Padegal G, Rao MK, Boggaram Ravishankar OA, Acharya S, Athri P, Srinivasa G. Analysis of RNA-Seq data using self-supervised learning for vital status prediction of colorectal cancer patients. BMC Bioinformatics 2023; 24:241. [PMID: 37286944 DOI: 10.1186/s12859-023-05347-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2023] [Accepted: 05/21/2023] [Indexed: 06/09/2023] Open
Abstract
BACKGROUND RNA sequencing (RNA-Seq) is a technique that utilises the capabilities of next-generation sequencing to study a cellular transcriptome, i.e., to determine the amount of RNA at a given time for a given biological sample. The advancement of RNA-Seq technology has resulted in a large volume of gene expression data for analysis. RESULTS Our computational model (built on top of TabNet) is first pretrained on an unlabelled dataset of multiple types of adenomas and adenocarcinomas and later fine-tuned on the labelled dataset, showing promising results in estimating the vital status of colorectal cancer patients. We achieve a final cross-validated ROC-AUC score of 0.88 by using multiple modalities of data. CONCLUSION The results of this study demonstrate that self-supervised learning methods pretrained on a vast corpus of unlabelled data outperform traditional supervised learning methods such as XGBoost, Neural Networks, and Decision Trees that have been prevalent in the tabular domain. The results of this study are further boosted by the inclusion of multiple modalities of data pertaining to the patients in question. We find that genes such as RBM3, GSPT1, MAD2L1, and others identified through model interpretability as important to the computational model's prediction task are corroborated by pathological evidence in the current literature.
Affiliation(s)
- Girivinay Padegal
  - PES Center for Pattern Recognition, Department of Computer Science and Engineering, PES University, Bengaluru, 560085, India
- Murali Krishna Rao
  - PES Center for Pattern Recognition, Department of Computer Science and Engineering, PES University, Bengaluru, 560085, India
- Om Amitesh Boggaram Ravishankar
  - PES Center for Pattern Recognition, Department of Computer Science and Engineering, PES University, Bengaluru, 560085, India
- Sathwik Acharya
  - PES Center for Pattern Recognition, Department of Computer Science and Engineering, PES University, Bengaluru, 560085, India
- Prashanth Athri
  - Department of Computer Science and Engineering, PES University Electronic City Campus, Bengaluru, 560100, India
- Gowri Srinivasa
  - PES Center for Pattern Recognition, Department of Computer Science and Engineering, PES University, Bengaluru, 560085, India
|
17
|
Kesimoglu ZN, Bozdag S. SUPREME: multiomics data integration using graph convolutional networks. NAR Genom Bioinform 2023; 5:lqad063. [PMID: 37680392 PMCID: PMC10481254 DOI: 10.1093/nargab/lqad063] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/19/2023] [Revised: 05/08/2023] [Accepted: 06/07/2023] [Indexed: 09/09/2023] Open
Abstract
To pave the road towards precision medicine in cancer, patients with similar biology ought to be grouped into the same cancer subtypes. Utilizing high-dimensional multiomics datasets, integrative approaches have been developed to uncover cancer subtypes. Recently, Graph Neural Networks have been shown to learn node embeddings utilizing node features and associations on graph-structured data. Some integrative prediction tools have been developed leveraging these advances on multiple networks, with some limitations. Addressing these limitations, we developed SUPREME, a node classification framework, which integrates multiple data modalities on graph-structured data. On breast cancer subtyping, unlike existing tools, SUPREME generates patient embeddings from multiple similarity networks utilizing multiomics features and integrates them with raw features to capture complementary signals. On breast cancer subtype prediction tasks from three datasets, SUPREME outperformed other tools. SUPREME-inferred subtypes had significant survival differences, mostly having more significance than ground truth, and outperformed nine other approaches. These results suggest that with proper multiomics data utilization, SUPREME could demystify undiscovered characteristics in cancer subtypes that cause significant survival differences and could improve the ground truth labels, which depend mainly on one datatype. In addition, to show the model-agnostic property of SUPREME, we applied it to two additional datasets and observed clearly better performance.
Affiliation(s)
- Serdar Bozdag
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX, USA
  - Department of Mathematics, University of North Texas, Denton, TX, USA
  - BioDiscovery Institute, University of North Texas, Denton, TX, USA
|
18
|
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. METHODS This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge to support better generalisation (e.g. pathways or Protein-Protein-Interaction networks) and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and, according to its taxonomy, we discuss representational methodologies for the integration of domain prior knowledge in such models. CONCLUSIONS The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability, which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.
Affiliation(s)
- Magdalena Wysocka
  - Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Oskar Wysocki
  - Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920, Martigny, Switzerland
- Marie Zufferey
  - Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920, Martigny, Switzerland
- Dónal Landers
  - DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP, UK
- André Freitas
  - Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920, Martigny, Switzerland
|
19
|
Zhang Z, Wei X. Artificial intelligence-assisted selection and efficacy prediction of antineoplastic strategies for precision cancer therapy. Semin Cancer Biol 2023; 90:57-72. [PMID: 36796530 DOI: 10.1016/j.semcancer.2023.02.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/19/2022] [Revised: 01/12/2023] [Accepted: 02/13/2023] [Indexed: 02/16/2023]
Abstract
The rapid development of artificial intelligence (AI) technologies, together with the vast amount of data obtainable from high-throughput sequencing, has led to an unprecedented understanding of cancer and accelerated the advent of a new era of clinical oncology characterized by precision treatment and personalized medicine. However, the gains achieved by a variety of AI models in clinical oncology practice are far from what one would expect, and in particular, there are still many uncertainties in the selection of clinical treatment options that pose significant challenges to the application of AI in clinical oncology. In this review, we summarize emerging approaches, relevant datasets and open-source software of AI and show how to integrate them to address problems from clinical oncology and cancer research. We focus on the principles and procedures for identifying different antitumor strategies with the assistance of AI, including targeted cancer therapy, conventional cancer therapy, and cancer immunotherapy. In addition, we highlight the current challenges and directions of AI in clinical oncology translation. Overall, we hope this article will provide researchers and clinicians with a deeper understanding of the role and implications of AI in precision cancer therapy, and help AI move more quickly into accepted cancer guidelines.
Affiliation(s)
- Zhe Zhang
  - Laboratory of Aging Research and Cancer Drug Target, State Key Laboratory of Biotherapy and Cancer Center, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, PR China; State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu 610041, PR China
- Xiawei Wei
  - Laboratory of Aging Research and Cancer Drug Target, State Key Laboratory of Biotherapy and Cancer Center, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, PR China
|
20
|
Bairakdar MD, Tewari A, Truttmann MC. A meta-analysis of RNA-Seq studies to identify novel genes that regulate aging. Exp Gerontol 2023; 173:112107. [PMID: 36731807 PMCID: PMC10653729 DOI: 10.1016/j.exger.2023.112107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Revised: 01/17/2023] [Accepted: 01/23/2023] [Indexed: 02/04/2023]
Abstract
Aging is a ubiquitous biological process that limits the maximal lifespan of most organisms. Significant efforts by many groups have identified mechanisms that, when triggered by natural or artificial stimuli, are sufficient to either enhance or decrease maximal lifespan. Previous aging studies using the nematode Caenorhabditis elegans (C. elegans) generated a wealth of publicly available transcriptomics datasets linking changes in gene expression to lifespan regulation. However, a comprehensive comparison of these datasets across studies in the context of aging biology is missing. Here, we carry out a systematic meta-analysis of over 1200 bulk RNA sequencing (RNASeq) samples obtained from 74 peer-reviewed publications on aging-related transcriptomic changes in C. elegans. Using both differential expression analyses and machine learning approaches, we mine the pooled data for novel pro-longevity genes. We find that both approaches identify known and propose novel pro-longevity genes. Further, we find that inter-lab experimental variance complicates the application of machine learning algorithms, a limitation that was not solved using bulk RNA-Seq batch correction and normalization techniques. Taken as a whole, our results indicate that machine learning approaches may hold promise for the identification of genes that regulate aging but will require more sophisticated batch correction strategies or standardized input data to reliably identify novel pro-longevity genes.
Affiliation(s)
- Mohamad D Bairakdar
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
- Ambuj Tewari
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA; Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- Matthias C Truttmann
  - Department of Molecular & Integrative Physiology, University of Michigan, Ann Arbor, MI, 48109, USA; Geriatrics Center, University of Michigan, Ann Arbor, MI 48109, USA
|
21
|
Liu C, Duan Y, Zhou Q, Wang Y, Gao Y, Kan H, Hu J. A classification method of gastric cancer subtype based on residual graph convolution network. Front Genet 2023; 13:1090394. [PMID: 36685956 PMCID: PMC9845413 DOI: 10.3389/fgene.2022.1090394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Accepted: 12/09/2022] [Indexed: 01/06/2023] Open
Abstract
Background: Clinical diagnosis and treatment of tumors are greatly complicated by their heterogeneity, and the subtype classification of cancer frequently plays a significant role in the subsequent treatment of tumors. Presently, the majority of studies rely far too heavily on gene expression data, omitting the enormous power of multi-omics fusion data and the potential for patient similarities. Method: In this study, we created a gastric cancer subtype classification model called RRGCN based on residual graph convolutional network (GCN) using multi-omics fusion data and patient similarity network. Given the multi-omics data's high dimensionality, we built an artificial neural network Autoencoder (AE) to reduce the dimensionality of the data and extract hidden layer features. The model is then built using the feature data. In addition, we computed the correlation between patients using the Pearson correlation coefficient, and this relationship between patients forms the edge of the graph structure. Four graph convolutional network layers and two residual networks with skip connections make up RRGCN, which reduces the amount of information lost during transmission between layers and prevents model degradation. Results: The results show that RRGCN significantly outperforms other classification methods with an accuracy as high as 0.87 when compared to four other traditional machine learning methods and deep learning models. Conclusion: In terms of subtype classification, RRGCN excels in all areas and has the potential to offer fresh perspectives on disease mechanisms and disease progression. It has the potential to be used for a broader range of disorders and to aid in clinical diagnosis.
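The graph construction and residual layer described in this abstract can be illustrated with a small NumPy sketch. It is not the authors' RRGCN implementation: the correlation threshold of 0.5, the feature sizes, the random weights, and all function names are hypothetical choices made only to show how a Pearson-based patient similarity graph can feed a graph convolution with a skip connection.

```python
import numpy as np


def patient_similarity_adjacency(features, threshold=0.5):
    """Build an undirected patient graph from Pearson correlation.

    features has shape (n_patients, n_features). The threshold is a
    hypothetical value, not one taken from the paper.
    """
    corr = np.corrcoef(features)           # each row (patient) is one variable
    adj = (corr > threshold).astype(float)
    np.fill_diagonal(adj, 1.0)             # keep self-loops
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)                  # >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def residual_gcn_layer(a_norm, h, w):
    """One graph convolution with a skip connection: ReLU(A H W) + H."""
    return np.maximum(a_norm @ h @ w, 0.0) + h


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))                # 6 toy patients, 8 latent features
adj = patient_similarity_adjacency(x)
a_norm = normalize_adjacency(adj)
w = rng.normal(scale=0.1, size=(8, 8))     # square so the skip connection fits
h1 = residual_gcn_layer(a_norm, x, w)
print(h1.shape)
```

The skip connection adds the layer input back to its output, which is one common way to limit information loss across stacked graph convolution layers.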
Affiliation(s)
- Can Liu
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China; Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
| | - Yuchen Duan
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
| | - Qingqing Zhou
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
| | - Yongkang Wang
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China; Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
| | - Yong Gao
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China; Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
| | - Hongxing Kan
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China; Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
| | - Jili Hu
- School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China; Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China; *Correspondence: Jili Hu,
|
22
|
Deep-Learning Algorithm and Concomitant Biomarker Identification for NSCLC Prediction Using Multi-Omics Data Integration. Biomolecules 2022; 12:biom12121839. [PMID: 36551266 PMCID: PMC9775093 DOI: 10.3390/biom12121839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2022] [Revised: 12/05/2022] [Accepted: 12/05/2022] [Indexed: 12/14/2022] Open
Abstract
Early diagnosis of lung cancer remains a critical need for raising the survival rate, which currently sits in the mid-30% range. Despite this, multi-omics data have rarely been applied to non-small-cell lung cancer (NSCLC) diagnosis. We developed a multi-omics data-affinitive artificial intelligence algorithm based on the graph convolutional network that integrates mRNA expression, DNA methylation, and DNA sequencing data. This NSCLC prediction model achieved a 93.7% macro F1-score, indicating that both false positives and false negatives were low, which is desirable for accurate classification. Gene ontology enrichment and pathway analysis of features revealed that two major subtypes of NSCLC, lung adenocarcinoma and lung squamous cell carcinoma, have both specific and common GO biological processes. Numerous biomarkers (i.e., microRNA, long non-coding RNA, differentially methylated regions) were newly identified, whereas some biomarkers were consistent with previous findings in NSCLC (e.g., SPRR1B). Thus, using multi-omics data integration, we developed a promising cancer prediction algorithm.
|
23
|
Jones S, Beyers M, Shukla M, Xia F, Brettin T, Stevens R, Weil MR, Ranganathan Ganakammal S. TULIP: An RNA-seq-based Primary Tumor Type Prediction Tool Using Convolutional Neural Networks. Cancer Inform 2022; 21:11769351221139491. [PMID: 36507076 PMCID: PMC9729992 DOI: 10.1177/11769351221139491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Accepted: 10/28/2022] [Indexed: 12/12/2022] Open
Abstract
Background: With cancer as one of the leading causes of death worldwide, accurate primary tumor type prediction is critical in identifying genetic factors that can inhibit or slow tumor progression. There have been efforts to categorize primary tumor types with gene expression data using machine learning, and more recently with deep learning, in the last several years. Methods: In this paper, we developed four 1-dimensional (1D) Convolutional Neural Network (CNN) models to classify RNA-seq count data as one of 17 highly represented primary tumor types or 32 primary tumor types regardless of imbalanced representation. Additionally, we adapted the models to take as input either all Ensembl genes (60,483) or protein coding genes only (19,758). Unlike previous work, we avoided selection bias by not filtering genes based on expression values. RNA-seq count data expressed as FPKM-UQ of 9,025 and 10,940 samples from The Cancer Genome Atlas (TCGA) were downloaded from the Genomic Data Commons (GDC) corresponding to 17 and 32 primary tumor types respectively for training and validating the models. Results: All four 1D-CNN models had an overall accuracy of 94.7% to 97.6% on the test dataset. Further evaluation indicates that the models with protein coding genes only as features performed with better accuracy compared to the models with all Ensembl genes for both 17 and 32 primary tumor types. For all models, the accuracy by primary tumor type was above 80% for most primary tumor types. Conclusions: We packaged all four models as a Python-based deep learning classification tool called TULIP (TUmor CLassIfication Predictor) for performing quality control on primary tumor samples and characterizing cancer samples of unknown tumor type. Further optimization of the models is needed to improve the accuracy of certain primary tumor types.
Affiliation(s)
- Sara Jones
- Frederick National Laboratory for Cancer Research, Cancer Data Science Initiatives, Cancer Research Technology Program, Rockville, MD, USA
| | - Matthew Beyers
- Frederick National Laboratory for Cancer Research, Cancer Data Science Initiatives, Cancer Research Technology Program, Rockville, MD, USA
| | - Maulik Shukla
- Argonne National Laboratory, Computing, Environment and Life Sciences, Lemont, IL, USA
| | - Fangfang Xia
- Argonne National Laboratory, Computing, Environment and Life Sciences, Lemont, IL, USA
| | - Thomas Brettin
- Argonne National Laboratory, Computing, Environment and Life Sciences, Lemont, IL, USA
| | - Rick Stevens
- Argonne National Laboratory, Computing, Environment and Life Sciences, Lemont, IL, USA
| | - M Ryan Weil
- Frederick National Laboratory for Cancer Research, Cancer Data Science Initiatives, Cancer Research Technology Program, Rockville, MD, USA
| | - Satishkumar Ranganathan Ganakammal
- Frederick National Laboratory for Cancer Research, Cancer Data Science Initiatives, Cancer Research Technology Program, Rockville, MD, USA,Ranganathan Ganakammal Satishkumar, Cancer Data Science Initiatives, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, 9605 Medical Center Dr, Rockville, MD 20852, USA.
|
24
|
Kuang J, Scoglio C, Michel K. Feature learning and network structure from noisy node activity data. Phys Rev E 2022; 106:064301. [PMID: 36671154 PMCID: PMC9869472 DOI: 10.1103/physreve.106.064301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2021] [Accepted: 11/17/2022] [Indexed: 06/17/2023]
Abstract
In studies of network structures, much attention has been devoted to developing approaches to reconstruct networks and predict missing links when edge-related information is given. However, such approaches are not applicable when we are only given noisy node activity data with missing values. This work presents an unsupervised learning framework to learn node vectors and construct networks from such node activity data. First, we design a scheme to generate random node sequences from node context sets, which are generated from node activity data. Then, a three-layer neural network is trained on the node sequences to obtain node vectors, which allow us to construct networks and capture nodes with synergistic roles. Furthermore, we present an entropy-based approach to select the most meaningful neighbors for each node in the resulting network. Finally, the effectiveness of the method is validated through both synthetic and real data.
Affiliation(s)
- Junyao Kuang
- Department of Electrical and Computer Engineering
|
25
|
Characterizing Macrophages Diversity in COVID-19 Patients Using Deep Learning. Genes (Basel) 2022; 13:genes13122264. [PMID: 36553530 PMCID: PMC9777824 DOI: 10.3390/genes13122264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2022] [Revised: 11/23/2022] [Accepted: 11/28/2022] [Indexed: 12/04/2022] Open
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the etiological agent responsible for coronavirus disease 2019 (COVID-19), has affected the lives of billions and killed millions of infected people. This virus has been demonstrated to have different outcomes among individuals, with some of them presenting a mild infection, while others present severe symptoms or even death. The identification of the molecular states related to the severity of a COVID-19 infection has become of the utmost importance to understanding the differences in critical immune response. In this study, we computationally processed a set of publicly available single-cell RNA-Seq (scRNA-Seq) data of 12 Bronchoalveolar Lavage Fluid (BALF) samples diagnosed as having a mild, severe, or no infection, and generated a high-quality dataset that consists of 63,734 cells, each with 23,916 genes. We extended the cell-type and sub-type composition identification and our analysis showed significant differences in cell-type composition in mild and severe groups compared to the normal. Importantly, inflammatory responses were dramatically elevated in the severe group, which was evidenced by the significant increase in macrophages, from 10.56% in the normal group to 20.97% in the mild group and 34.15% in the severe group. As an indicator of immune defense, populations of T cells accounted for 24.76% in the mild group and decreased to 7.35% in the severe group. To verify these findings, we developed several artificial neural networks (ANNs) and graph convolutional neural network (GCNN) models. We showed that the GCNN models reach a prediction accuracy of the infection of 91.16% using data from subtypes of macrophages. Overall, our study indicates significant differences in the gene expression profiles of inflammatory response and immune cells of severely infected patients.
|
26
|
Li MM, Huang K, Zitnik M. Graph representation learning in biomedicine and healthcare. Nat Biomed Eng 2022; 6:1353-1369. [PMID: 36316368 PMCID: PMC10699434 DOI: 10.1038/s41551-022-00942-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2021] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Networks, or graphs, are universal descriptors of systems of interacting elements. In biomedicine and healthcare, they can represent, for example, molecular interactions, signalling pathways, disease co-morbidities or healthcare systems. In this Perspective, we posit that representation learning can realize principles of network medicine, discuss successes and current limitations of the use of representation learning on graphs in biomedicine and healthcare, and outline algorithmic strategies that leverage the topology of graphs to embed them into compact vectorial spaces. We argue that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
Affiliation(s)
- Michelle M Li
- Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Kexin Huang
- Health Data Science Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Harvard Data Science Initiative, Cambridge, MA, USA.
|
27
|
EpICC: A Bayesian neural network model with uncertainty correction for a more accurate classification of cancer. Sci Rep 2022; 12:14628. [PMID: 36028643 PMCID: PMC9418241 DOI: 10.1038/s41598-022-18874-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Accepted: 08/22/2022] [Indexed: 11/09/2022] Open
Abstract
Accurate classification of cancers into their types and subtypes holds the key for choosing the right treatment strategy and can greatly impact patient well-being. However, existence of large-scale variations in the molecular processes driving even a single type of cancer can make accurate classification a challenging problem. Therefore, improved and robust methods for classification are absolutely critical. Although deep learning-based methods for cancer classification have been proposed earlier, they all provide point estimates for predictions without any measure of confidence and thus, can fall short in real-world applications where key decisions are to be made based on the predictions of the classifier. Here we report a Bayesian neural network-based model for classification of cancer types as well as sub-types from transcriptomic data. This model reported a measure of confidence with each prediction through analysis of epistemic uncertainty. We incorporated an uncertainty correction step with the Bayesian network-based model to greatly enhance prediction accuracy of cancer types (> 97% accuracy) and sub-types (> 80%). Our work suggests that reporting uncertainty measure with each classification can enable more accurate and informed decision-making that can be highly valuable in clinical settings.
|
28
|
Hanczar B, Bourgeais V, Zehraoui F. Assessment of deep learning and transfer learning for cancer prediction based on gene expression data. BMC Bioinformatics 2022; 23:262. [PMID: 35786378 PMCID: PMC9250744 DOI: 10.1186/s12859-022-04807-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2022] [Accepted: 06/15/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Machine learning is now a standard tool for cancer prediction based on gene expression data. However, deep learning is still new for this task, and there is no clear consensus about its performance and utility. Few experimental works have evaluated deep neural networks and compared them with state-of-the-art machine learning. Moreover, their conclusions are not consistent. RESULTS We extensively evaluate the deep learning approach on 22 cancer prediction tasks based on gene expression data. We measure the impact of the main hyper-parameters and compare the performances of neural networks with the state-of-the-art. We also investigate the effectiveness of several transfer learning schemes in different experimental setups. CONCLUSION Based on our experimentations, we provide several recommendations to optimize the construction and training of a neural network model. We show that neural networks outperform the state-of-the-art methods only for very large training set size. For a small training set, we show that transfer learning is possible and may strongly improve the model performance in some cases.
Affiliation(s)
- Blaise Hanczar
- IBISC, Université Paris-Saclay (Univ. Evry), 23 boulevard de France, 91034, Evry, France.
| | - Victoria Bourgeais
- IBISC, Université Paris-Saclay (Univ. Evry), 23 boulevard de France, 91034, Evry, France
| | - Farida Zehraoui
- IBISC, Université Paris-Saclay (Univ. Evry), 23 boulevard de France, 91034, Evry, France
|
29
|
Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma. PLoS One 2022; 17:e0269570. [PMID: 35749395 PMCID: PMC9231717 DOI: 10.1371/journal.pone.0269570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Accepted: 05/09/2022] [Indexed: 11/30/2022] Open
Abstract
Deep learning techniques have recently been applied to analyze associations between gene expression data and disease phenotypes. However, there are concerns regarding the black box problem: it is difficult to interpret why the prediction results are obtained using deep learning models from model parameters. New methods have been proposed for interpreting deep learning model predictions but have not been applied to genetics. In this study, we demonstrated that applying SHapley Additive exPlanations (SHAP) to a deep learning model using graph convolutions of genetic pathways can provide pathway-level feature importance for classification prediction of diffuse large B-cell lymphoma (DLBCL) gene expression subtypes. Using Kyoto Encyclopedia of Genes and Genomes pathways, a graph convolutional network (GCN) model was implemented to construct graphs with nodes and edges. DLBCL datasets, including microarray gene expression data and clinical information on subtypes (germinal center B-cell-like type and activated B-cell-like type), were retrieved from the Gene Expression Omnibus to evaluate the model. The GCN model showed an accuracy of 0.914, precision of 0.948, recall of 0.868, and F1 score of 0.906 in analysis of the classification performance for the test datasets. The pathways with high feature importance by SHAP included highly enriched pathways in the gene set enrichment analysis. Moreover, a logistic regression model with explanatory variables of genes in pathways with high feature importance showed good performance in predicting DLBCL subtypes. In conclusion, our GCN model for classifying DLBCL subtypes is useful for interpreting important regulatory pathways that contribute to the prediction.
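As a quick consistency check on the metrics quoted in this abstract, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The helper name below is illustrative only; the two input values are the ones stated in the abstract.

```python
def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
f1 = f1_from_precision_recall(0.948, 0.868)
print(round(f1, 3))  # 0.906, matching the reported F1 score
```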
|
30
|
A Novel Attention-Mechanism Based Cox Survival Model by Exploiting Pan-Cancer Empirical Genomic Information. Cells 2022; 11:cells11091421. [PMID: 35563727 PMCID: PMC9100007 DOI: 10.3390/cells11091421] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2022] [Revised: 04/15/2022] [Accepted: 04/19/2022] [Indexed: 01/27/2023] Open
Abstract
Cancer prognosis is an essential goal for early diagnosis, biomarker selection, and medical therapy. In the past decade, deep learning has successfully solved a variety of biomedical problems. However, because human cancer transcriptome data are high-dimensional and training samples are few, there is still no mature deep learning-based survival analysis model that fully resolves training problems such as overfitting while delivering accurate prognosis. Given these problems, we introduced a novel framework called SAVAE-Cox for survival analysis of high-dimensional transcriptome data. This model adopts a novel attention mechanism and takes full advantage of the adversarial transfer learning strategy. We trained the model on 16 types of TCGA cancer RNA-seq data sets. Experiments show that our model outperformed state-of-the-art survival analysis models such as the Cox proportional hazard model (Cox-ph), Cox-lasso, Cox-ridge, Cox-nnet, and VAECox on the concordance index. In addition, we carried out feature analysis experiments. Based on the experimental results, we concluded that our model is helpful for revealing cancer-related genes and biological functions.
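The concordance index used for the comparison in this abstract can be sketched in a few lines. This is a plain-Python version of Harrell's C under simple assumptions (right-censored data, tied risk scores counted as half-concordant); the function name and the toy data are hypothetical and are not taken from the paper.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before time j. It is concordant when the
    model assigns subject i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5   # ties in risk count as half
    return concordant / comparable


# Toy data: the third subject is censored (event indicator 0).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)  # 4 of 5 comparable pairs are concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the index is a convenient single number for comparing survival models.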
|
31
|
Bourgeais V, Zehraoui F, Hanczar B. GraphGONet: a self-explaining neural network encapsulating the Gene Ontology graph for phenotype prediction on gene expression. Bioinformatics 2022; 38:2504-2511. [PMID: 35266505 DOI: 10.1093/bioinformatics/btac147] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2021] [Revised: 02/02/2022] [Accepted: 03/07/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Medical care is becoming more and more specific to patients' needs due to the increased availability of omics data. The application to these data of sophisticated machine learning models, in particular deep learning, can improve the field of precision medicine. However, their use in clinics is limited as their predictions are not accompanied by an explanation. The production of accurate and intelligible predictions can benefit from the inclusion of domain knowledge. Therefore, knowledge-based deep learning models appear to be a promising solution. RESULTS In this paper, we propose GraphGONet, where the Gene Ontology is encapsulated in the hidden layers of a new self-explaining neural network. Each neuron in the layers represents a biological concept, combining the gene expression profile of a patient, and the information from its neighboring neurons. The experiments described in the paper confirm that our model not only performs as accurately as the state-of-the-art (non-explainable ones) but also automatically produces stable and intelligible explanations composed of the biological concepts with the highest contribution. This feature allows experts to use our tool in a medical setting. AVAILABILITY GraphGONet is freely available at https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Victoria Bourgeais
- IBISC, Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes, 91020, France
| | - Farida Zehraoui
- IBISC, Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes, 91020, France
| | - Blaise Hanczar
- IBISC, Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes, 91020, France
|
32
|
Kakati T, Bhattacharyya DK, Kalita JK, Norden-Krichmar TM. DEGnext: classification of differentially expressed genes from RNA-seq data using a convolutional neural network with transfer learning. BMC Bioinformatics 2022; 23:17. [PMID: 34991439 PMCID: PMC8734099 DOI: 10.1186/s12859-021-04527-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2020] [Accepted: 12/13/2021] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND A limitation of traditional differential expression analysis on small datasets involves the possibility of false positives and false negatives due to sample variation. Considering the recent advances in deep learning (DL) based models, we wanted to expand the state-of-the-art in disease biomarker prediction from RNA-seq data using DL. However, application of DL to RNA-seq data is challenging due to the absence of appropriate labels and the small sample size compared to the number of genes. Deep learning coupled with transfer learning can improve prediction performance on novel data by incorporating patterns learned from other related data. With the emergence of new disease datasets, biomarker prediction would be facilitated by having a generalized model that can transfer the knowledge of trained feature maps to the new dataset. To the best of our knowledge, there is no Convolutional Neural Network (CNN)-based model coupled with transfer learning to predict the significant upregulating (UR) and downregulating (DR) genes from both trained and untrained datasets. RESULTS We implemented a CNN model, DEGnext, to predict UR and DR genes from gene expression data obtained from The Cancer Genome Atlas database. DEGnext uses biologically validated data along with logarithmic fold change values to classify differentially expressed genes (DEGs) as UR and DR genes. We applied transfer learning to our model to leverage the knowledge of trained feature maps to untrained cancer datasets. DEGnext's results were competitive (ROC scores between 88 and 99%) with those of five traditional machine learning methods: Decision Tree, K-Nearest Neighbors, Random Forest, Support Vector Machine, and XGBoost. DEGnext was robust and effective in terms of transferring learned feature maps to facilitate classification of unseen datasets. Additionally, we validated that the predicted DEGs from DEGnext were mapped to significant Gene Ontology terms and pathways related to cancer. CONCLUSIONS DEGnext can classify DEGs into UR and DR genes from RNA-seq cancer datasets with high performance. This type of analysis, using biologically relevant fine-tuning data, may aid in the exploration of potential biomarkers and can be adapted for other disease datasets.
Affiliation(s)
- Tulika Kakati
- Department of Epidemiology and Biostatistics, University of California, Irvine, Irvine, CA, USA; Department of Computer Science, Tezpur University, Assam, India
| | | | - Jugal K Kalita
- Department of Computer Science, University of Colorado, Colorado Springs, Colorado Springs, CO, USA
| | - Trina M Norden-Krichmar
- Department of Epidemiology and Biostatistics, University of California, Irvine, Irvine, CA, USA.
|
33
|
Ghandikota S, Jegga AG. gene2gauss: A multi-view gaussian gene embedding learner for analyzing transcriptomic networks. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2022; 2022:206-215. [PMID: 35854722 PMCID: PMC9285176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 05/01/2023]
Abstract
Analyzing gene co-expression networks can help in the discovery of biological processes and regulatory mechanisms underlying normal or perturbed states. Unlike standard differential analysis, network-based approaches consider the interactions between the genes involved leading to biologically relevant results. Applying such network-based methods to jointly analyze multiple transcriptomic networks representing independent disease cohorts or studies could lead to the identification of more robust gene modules or gene regulatory networks. We present gene2gauss, a novel feature learning framework that is capable of embedding genes as multivariate gaussian distributions by taking into account their long-range interaction neighborhoods across multiple transcriptomic studies. Using multiple gene co-expression networks from idiopathic pulmonary fibrosis, we demonstrate that these multi-dimensional gaussian features are suitable for identifying regulons of known transcription factors (TF). Using standard TF-target libraries, we demonstrate that the features from our method are highly relevant in comparison with other feature learning approaches on transcriptomic data.
Affiliation(s)
- Sudhir Ghandikota
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Electrical Engineering and Computer Science, University of Cincinnati College of Engineering, Cincinnati, Ohio, USA
| | - Anil G Jegga
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
|
34
|
Baranwal M, Krishnan S, Oneka M, Frankel T, Rao A. CGAT: Cell Graph ATtention Network for Grading of Pancreatic Disease Histology Images. Front Immunol 2021; 12:727610. [PMID: 34671349 PMCID: PMC8522581 DOI: 10.3389/fimmu.2021.727610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2021] [Accepted: 09/03/2021] [Indexed: 11/13/2022] Open
Abstract
Early detection of Pancreatic Ductal Adenocarcinoma (PDAC), one of the most aggressive malignancies of the pancreas, is crucial to avoid metastatic spread to other body regions. Detection of pancreatic cancer is typically carried out by assessing the distribution and arrangement of tumor and immune cells in histology images. This is further complicated due to morphological similarities with chronic pancreatitis (CP), and the co-occurrence of precursor lesions in the same tissue. Most of the current automated methods for grading pancreatic cancers rely on extensive feature engineering involving accurate identification of cell features or utilising single number spatially informed indices for grading purposes. Moreover, sophisticated methods involving black-box approaches, such as neural networks, do not offer insights into the model's ability to accurately identify the correct disease grade. In this paper, we develop a novel cell-graph based Cell-Graph Attention (CGAT) network for the precise classification of pancreatic cancer and its precursors from multiplexed immunofluorescence histology images into the six different types of pancreatic diseases. The issue of class imbalance is addressed through bootstrapping multiple CGAT-nets, while the self-attention mechanism facilitates visualization of cell-cell features that are likely responsible for the predictive capabilities of the model. It is also shown that the model significantly outperforms the decision tree classifiers built using spatially informed metric, such as the Morisita-Horn (MH) indices.
Affiliation(s)
- Mayank Baranwal
  - Division of Data & Decision Sciences, Tata Consultancy Services Research, Mumbai, India
  - Department of Systems and Control Engineering, Indian Institute of Technology, Bombay, India
- Santhoshi Krishnan
  - Department of Electrical & Computer Engineering, Rice University, Houston, TX, United States
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Morgan Oneka
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Timothy Frankel
  - Department of Surgery, University of Michigan, Ann Arbor, MI, United States
- Arvind Rao
  - Department of Electrical & Computer Engineering, Rice University, Houston, TX, United States
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
  - Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, United States
  - Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, United States
|
35
|
Tran KA, Kondrashova O, Bradley A, Williams ED, Pearson JV, Waddell N. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med 2021; 13:152. [PMID: 34579788 PMCID: PMC8477474 DOI: 10.1186/s13073-021-00968-x] [Citation(s) in RCA: 244] [Impact Index Per Article: 81.3] [Received: 12/13/2020] [Accepted: 09/12/2021] [Indexed: 12/13/2022] Open
Abstract
Deep learning is a subdiscipline of artificial intelligence that uses a machine learning technique called artificial neural networks to extract patterns and make predictions from large data sets. The increasing adoption of deep learning across healthcare domains together with the availability of highly characterised cancer datasets has accelerated research into the utility of deep learning in the analysis of the complex biology of cancer. While early results are promising, this is a rapidly evolving field with new knowledge emerging in both cancer biology and deep learning. In this review, we provide an overview of emerging deep learning techniques and how they are being applied to oncology. We focus on the deep learning applications for omics data types, including genomic, methylation and transcriptomic data, as well as histopathology-based genomic inference, and provide perspectives on how the different data types can be integrated to develop decision support tools. We provide specific examples of how deep learning may be applied in cancer diagnosis, prognosis and treatment management. We also assess the current limitations and challenges for the application of deep learning in precision oncology, including the lack of phenotypically rich data and the need for more explainable deep learning models. Finally, we conclude with a discussion of how current obstacles can be overcome to enable future clinical utilisation of deep learning.
Affiliation(s)
- Khoa A. Tran
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
  - School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059 Australia
- Olga Kondrashova
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
- Andrew Bradley
  - Faculty of Engineering, Queensland University of Technology (QUT), Brisbane, 4000 Australia
- Elizabeth D. Williams
  - School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059 Australia
  - Australian Prostate Cancer Research Centre - Queensland (APCRC-Q) and Queensland Bladder Cancer Initiative (QBCI), Brisbane, 4102 Australia
- John V. Pearson
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
- Nicola Waddell
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
|
36
|
Chiu YC, Zheng S, Wang LJ, Iskra BS, Rao MK, Houghton PJ, Huang Y, Chen Y. Predicting and characterizing a cancer dependency map of tumors with deep learning. Sci Adv 2021; 7:eabh1275. [PMID: 34417181 PMCID: PMC8378822 DOI: 10.1126/sciadv.abh1275] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 02/26/2021] [Accepted: 06/29/2021] [Indexed: 05/14/2023]
Abstract
Genome-wide loss-of-function screens have revealed genes essential for cancer cell proliferation, called cancer dependencies. It remains challenging to link cancer dependencies to the molecular compositions of cancer cells or to unscreened cell lines and further to tumors. Here, we present DeepDEP, a deep learning model that predicts cancer dependencies using integrative genomic profiles. It uses a unique unsupervised pretraining that captures unlabeled tumor genomic representations to improve the learning of cancer dependencies. We demonstrated DeepDEP's improvement over conventional machine learning methods and validated the performance with three independent datasets. By systematic model interpretations, we extended the current dependency maps with functional characterizations of dependencies and a proof-of-concept in silico assay of synthetic essentiality. We applied DeepDEP to pan-cancer tumor genomics and built the first pan-cancer synthetic dependency map of 8000 tumors with clinical relevance. In summary, DeepDEP is a novel tool for investigating cancer dependency with rapidly growing genomic resources.
Affiliation(s)
- Yu-Chiao Chiu
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Siyuan Zheng
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
  - Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Li-Ju Wang
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Brian S Iskra
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Manjeet K Rao
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
  - Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Peter J Houghton
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
  - Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Yufei Huang
  - University of Pittsburgh Medical Center Hillman Cancer Center, Pittsburgh, PA 15232, USA
  - Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
- Yidong Chen
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
  - Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX 78229, USA
|
37
|
Shawki MM, Azmy MM, Salama M, Shawki S. Mathematical and deep learning analysis based on tissue dielectric properties at low frequencies predict outcome in human breast cancer. Technol Health Care 2021; 30:633-645. [PMID: 34366303 DOI: 10.3233/thc-213096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/17/2023]
Abstract
BACKGROUND Early detection of human breast cancer greatly improves the chance of survival. Malignant tissues have higher water content and electrolyte concentration, and lower fat content, than normal tissues. These biochemical characteristics give malignant tissue high electric permittivity (ε′) and conductivity (σ). OBJECTIVE To examine whether the dielectric behavior of normal and malignant tissues at low frequencies (α dispersion) yields a threshold (separating) line between them, and to find the threshold values of capacitance and resistance. These data are used as input to deep learning neural networks, whose output is normal or malignant. METHODS ε′ and σ were measured in the range of 50 Hz to 100 kHz for 15 human malignant tissues and their corresponding normal tissues. The equation of the separating line between the two classes was found by mathematical calculation and verified with a support vector machine (SVM). The normal range and the threshold values of both capacitance and resistance were calculated. RESULTS Deep learning analysis achieved an accuracy of 91.7%, a sensitivity of 85.7%, and a specificity of 100% for instant, automatic prediction of breast tissue type (normal or malignant). CONCLUSIONS These data can be used in both cancer diagnosis and prognosis follow-up.
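The three figures reported in RESULTS are related through the confusion matrix. The sketch below shows how accuracy, sensitivity, and specificity are computed from true/false positive and negative counts. The abstract does not give the size of the test set, so the counts used here (6 true positives, 1 false negative, 5 true negatives, 0 false positives) are purely hypothetical values chosen because they happen to reproduce the reported percentages; they should not be read as the study's data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: malignant samples predicted malignant/normal
    tn/fp: normal samples predicted normal/malignant
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one normal sample")
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Hypothetical counts (assumption, not from the paper): 7 malignant, 5 normal
metrics = diagnostic_metrics(tp=6, fn=1, tn=5, fp=0)
percentages = {name: round(value * 100, 1) for name, value in metrics.items()}
```

With a test set this small, a single misclassified sample shifts sensitivity by roughly 14 percentage points, which is worth keeping in mind when comparing such figures across studies.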
Affiliation(s)
- Mamdouh M Shawki
  - Medical Biophysics Department, Medical Research Institute, Alexandria University, Alexandria, Egypt
- Mohamed Moustafa Azmy
  - Biomedical Engineering Department, Medical Research Institute, Alexandria University, Alexandria, Egypt
- Mohammed Salama
  - Histochemistry and Cell Biology Department, Medical Research Institute, Alexandria University, Alexandria, Egypt
- Sanaa Shawki
  - Pathology Department, Medical Research Institute, Alexandria University, Alexandria, Egypt
|
38
|
Mostavi M, Chiu YC, Chen Y, Huang Y. CancerSiamese: one-shot learning for predicting primary and metastatic tumor types unseen during model training. BMC Bioinformatics 2021; 22:244. [PMID: 33980137 PMCID: PMC8117642 DOI: 10.1186/s12859-021-04157-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/09/2020] [Accepted: 04/27/2021] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND State-of-the-art deep learning-based cancer type prediction can only predict cancer types whose samples are available during training, where the sample size is commonly large. In this paper, we consider how to utilize the existing training samples to predict cancer types unseen during training. We hypothesize the existence of a set of type-agnostic expression representations that define the similarity/dissimilarity between samples of the same/different types and propose a novel one-shot learning model called CancerSiamese to learn this common representation. CancerSiamese accepts a pair of query and support samples (gene expression profiles) and learns the representation of similar or dissimilar cancer types through two parallel convolutional neural networks joined by a similarity function. RESULTS We trained CancerSiamese for cancer type prediction for primary and metastatic tumors using samples from The Cancer Genome Atlas (TCGA) and MET500. Network transfer learning was utilized to facilitate the training of the CancerSiamese models. CancerSiamese was tested for different N-way predictions and yielded an average accuracy improvement of 8% and 4% over the benchmark 1-Nearest Neighbor (1-NN) classifier for primary and metastatic tumors, respectively. Moreover, we applied the guided gradient saliency map and feature selection to CancerSiamese to examine 100 and 200 top marker-gene candidates for the prediction of primary and metastatic cancers, respectively. Functional analysis of these marker genes revealed several cancer-related functions between primary and metastatic tumors. CONCLUSION This work demonstrated, for the first time, the feasibility of predicting unseen cancer types whose samples are limited. Thus, it could inspire new and ingenious applications of one-shot and few-shot learning solutions for improving cancer diagnosis, prognosis, and our understanding of cancer.
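To make the N-way one-shot setting concrete, here is a minimal sketch of the kind of 1-NN baseline the abstract benchmarks against: each candidate cancer type is represented by a single support sample, and the query is assigned the label of the closest one. The Euclidean distance on raw feature vectors, the function names, and the toy three-gene profiles are assumptions for illustration; the abstract does not specify the distance used, and CancerSiamese itself replaces this fixed distance with a learned similarity.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_shot_1nn(query, support):
    """Classify `query` in an N-way one-shot episode.

    support: dict mapping each of the N class labels to ONE example vector.
    Returns the label whose single support sample is nearest to the query.
    """
    if not support:
        raise ValueError("support set is empty")
    return min(support, key=lambda label: euclidean(query, support[label]))


# Toy 3-way episode with made-up expression profiles (illustrative only)
support_set = {
    "type_A": [5.0, 0.1, 0.2],
    "type_B": [0.2, 4.8, 0.1],
    "type_C": [0.1, 0.3, 5.1],
}
predicted = one_shot_1nn([0.4, 4.5, 0.3], support_set)
```

Because the support set is rebuilt for every episode, the same procedure applies to cancer types that were never seen during training, which is the property the one-shot formulation relies on.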
Affiliation(s)
- Milad Mostavi
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
  - Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA
- Yu-Chiao Chiu
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Yidong Chen
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
  - Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Yufei Huang
  - Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA
  - Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
|
39
|
Gated Graph Attention Network for Cancer Prediction. Sensors 2021; 21:1938. [PMID: 33801894 PMCID: PMC7998488 DOI: 10.3390/s21061938] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/28/2020] [Revised: 03/02/2021] [Accepted: 03/05/2021] [Indexed: 01/17/2023]
Abstract
With its increasing incidence, cancer has become one of the main causes of mortality worldwide. In this work, we propose a novel attention-based neural network model named Gated Graph ATtention network (GGAT) for cancer prediction, in which a gating mechanism (GM) is introduced to work with the attention mechanism (AM) to overcome previous work's limitation to 1-hop neighbourhood reasoning. In this way, GGAT can more fully mine the potential correlation between related samples, helping to improve cancer prediction accuracy. Additionally, to simplify the datasets, we propose a hybrid feature selection algorithm to strictly select gene features, which significantly reduces training time without affecting prediction accuracy. To the best of our knowledge, the proposed GGAT achieves state-of-the-art results on the cancer prediction task on LIHC, LUAD, and KIRC compared to traditional machine learning methods and other neural network models, and improves accuracy by 1% to 2% on the Cora dataset compared to state-of-the-art graph neural network methods.
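The "1-hop neighbourhood" limitation mentioned in this abstract can be illustrated with a minimal sketch of a single attention aggregation step on a graph: a node's new representation is a softmax-weighted average over itself and its direct neighbours only. This is a simplified stand-in, not the GGAT model: the dot-product scoring (in place of a learned scoring function), the absence of weight matrices and gating, and all names and toy values are assumptions made for illustration.

```python
import math


def attention_aggregate(features, neighbors, node):
    """One attention step for `node` over its 1-hop neighbourhood (self included).

    features:  dict node -> feature vector (list of floats)
    neighbors: dict node -> list of directly connected nodes
    Returns (aggregated_vector, attention_weights_by_node).
    """
    h_i = features[node]
    candidates = [node] + list(neighbors[node])
    # Unnormalised scores: plain dot product with the centre node (assumption)
    scores = [sum(a * b for a, b in zip(h_i, features[j])) for j in candidates]
    # Numerically stable softmax over the 1-hop candidates
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    aggregated = [
        sum(w * features[j][k] for w, j in zip(weights, candidates))
        for k in range(len(h_i))
    ]
    return aggregated, dict(zip(candidates, weights))


# Toy path graph a - b - c with made-up sample features
toy_features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
toy_neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
new_a, weights_a = attention_aggregate(toy_features, toy_neighbors, "a")
```

In the toy graph, node `c` is two hops from `a` and therefore receives no attention weight in this step, which is the restriction the gated variant described in the abstract is meant to relax.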
|
40
|
Ramirez R, Chiu YC, Zhang S, Ramirez J, Chen Y, Huang Y, Jin YF. Prediction and interpretation of cancer survival using graph convolution neural networks. Methods 2021; 192:120-130. [PMID: 33484826 DOI: 10.1016/j.ymeth.2021.01.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/28/2020] [Revised: 01/07/2021] [Accepted: 01/12/2021] [Indexed: 12/13/2022] Open
Abstract
The survival rate of cancer has increased significantly during the past two decades for breast, prostate, testicular, and colon cancer, while the brain and pancreatic cancers have a much lower median survival rate that has not improved much over the last forty years. This has imposed the challenge of finding gene markers for early cancer detection and treatment strategies. Different methods including regression-based Cox-PH, artificial neural networks, and recently deep learning algorithms have been proposed to predict the survival rate for cancers. We established in this work a novel graph convolution neural network (GCNN) approach called Surv_GCNN to predict the survival rate for 13 different cancer types using the TCGA dataset. For each cancer type, 6 Surv_GCNN models with graphs generated by correlation analysis, GeneMania database, and correlation + GeneMania were trained with and without clinical data to predict the risk score (RS). The performance of the 6 Surv_GCNN models was compared with two other existing models, Cox-PH and Cox-nnet. The results showed that Cox-PH has the worst performance among 8 tested models across the 13 cancer types while Surv_GCNN models with clinical data reported the best overall performance, outperforming other competing models in 7 out of 13 cancer types including BLCA, BRCA, COAD, LUSC, SARC, STAD, and UCEC. A novel network-based interpretation of Surv_GCNN was also proposed to identify potential gene markers for breast cancer. The signatures learned by the nodes in the hidden layer of Surv_GCNN were identified and were linked to potential gene markers by network modularization. The identified gene markers for breast cancer have been compared to a total of 213 gene markers from three widely cited lists for breast cancer survival analysis. 
About 57% of gene markers obtained by Surv_GCNN with correlation + GeneMania graph either overlap or directly interact with the 213 genes, confirming the effectiveness of the identified markers by Surv_GCNN.
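Models that output a risk score, such as Cox-PH and the neural variants compared in this abstract, are commonly fitted by minimizing the negative Cox partial log-likelihood. The sketch below computes that quantity for a set of risk scores, survival times, and event indicators. It is a generic illustration under stated assumptions: ties are handled in the simple Breslow style, the function name and toy data are hypothetical, and nothing here is taken from the Surv_GCNN implementation.

```python
import math


def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood.

    risk_scores: predicted log-risk per subject (higher = earlier expected event)
    times:       observed follow-up time per subject
    events:      1 if the event was observed, 0 if the subject was censored
    """
    n = len(risk_scores)
    if not (n == len(times) == len(events)):
        raise ValueError("inputs must have the same length")
    loss = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects only appear in others' risk sets
        # Risk set: everyone still under observation at time t_i
        at_risk = [risk_scores[j] for j in range(n) if times[j] >= times[i]]
        top = max(at_risk)
        log_sum = top + math.log(sum(math.exp(r - top) for r in at_risk))
        loss -= risk_scores[i] - log_sum
    return loss


# Toy cohort (made-up values): three subjects, all with observed events
toy_times = [1.0, 2.0, 3.0]
toy_events = [1, 1, 1]
uninformative = cox_neg_partial_log_likelihood([0.0, 0.0, 0.0], toy_times, toy_events)
well_ordered = cox_neg_partial_log_likelihood([2.0, 1.0, 0.0], toy_times, toy_events)
mis_ordered = cox_neg_partial_log_likelihood([0.0, 1.0, 2.0], toy_times, toy_events)
```

The loss depends only on how risk scores rank subjects within each risk set, so a model that assigns higher risk to subjects with earlier events (`well_ordered`) scores lower than one that reverses the ordering (`mis_ordered`).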
Affiliation(s)
- Ricardo Ramirez
  - Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX 78249, USA
- Yu-Chiao Chiu
  - Greehey Children's Cancer Research Institute, The University of Texas Health San Antonio, San Antonio, Texas 78229, USA
- SongYao Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, Department of Intelligent Science And Technology, School of Automation, Northwestern Polytechnical University, Xi'an, China
- Joshua Ramirez
  - Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX 78249, USA
- Yidong Chen
  - Greehey Children's Cancer Research Institute, The University of Texas Health San Antonio, San Antonio, Texas 78229, USA
  - Department of Population Health Sciences, The University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Yufei Huang
  - Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX 78249, USA
  - Department of Population Health Sciences, The University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Yu-Fang Jin
  - Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX 78249, USA
|