Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Mallavarapu T, Hao J, Kim Y, Oh JH, Kang M. Pathway-based deep clustering for molecular subtyping of cancer. Methods 2019;173:24-31. [PMID: 31247294 DOI: 10.1016/j.ymeth.2019.06.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/11/2019] [Revised: 05/24/2019] [Accepted: 06/16/2019] [Indexed: 12/22/2022] Open

Cited by Other Article(s)

Yang Y, Shao X, Li Z, Zhang L, Yang B, Jin B, Hu X, Qu X, Che X, Liu Y. Prognostic heterogeneity of Ki67 in non-small cell lung cancer: A comprehensive reappraisal on immunohistochemistry and transcriptional data. J Cell Mol Med 2024;28:e18521. [PMID: 39021279 PMCID: PMC11255407 DOI: 10.1111/jcmm.18521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 05/26/2024] [Accepted: 06/12/2024] [Indexed: 07/20/2024] Open

Affiliation(s)

Yujing Yang: Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China; Key Laboratory of Anticancer Drugs and Biotherapy of Liaoning Province, The First Hospital of China Medical University, Shenyang, China; Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China; Department of Oncology, Nanfang Hospital, Southern Medical University, Guangzhou, China
Xinye Shao: Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China; Key Laboratory of Anticancer Drugs and Biotherapy of Liaoning Province, The First Hospital of China Medical University, Shenyang, China; Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China
Zhi Li: Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
Lingyun Zhang: Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China; Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China
Bowen Yang: Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
Bo Jin: Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China; Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China
Xuejun Hu: Department of Respiratory and Infectious Disease of Geriatrics, The First Hospital of China Medical University, Shenyang, China
Xiujuan Qu: Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China; Key Laboratory of Anticancer Drugs and Biotherapy of Liaoning Province, The First Hospital of China Medical University, Shenyang, China; Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China
Xiaofang Che: Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China; Key Laboratory of Anticancer Drugs and Biotherapy of Liaoning Province, The First Hospital of China Medical University, Shenyang, China; Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China
Yunpeng Liu: Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China; Key Laboratory of Anticancer Drugs and Biotherapy of Liaoning Province, The First Hospital of China Medical University, Shenyang, China; Clinical Cancer Research Center of Shenyang, The First Hospital of China Medical University, Shenyang, China

Huang F, Welner RS, Chen JY, Yue Z. PAGER-scFGA: unveiling cell functions and molecular mechanisms in cell trajectories through single-cell functional genomics analysis. Frontiers in Bioinformatics 2024;4:1336135. [PMID: 38690527 PMCID: PMC11058213 DOI: 10.3389/fbinf.2024.1336135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 04/01/2024] [Indexed: 05/02/2024] Open

Abstract

Background: Understanding how cells and tissues respond to stress factors and perturbations during disease processes is crucial for developing effective prevention, diagnosis, and treatment strategies. Single-cell RNA sequencing (scRNA-seq) enables high-resolution identification of cells and exploration of cell heterogeneity, shedding light on cell differentiation/maturation and functional differences. Recent advancements in multimodal sequencing technologies have focused on improving access to cell-specific subgroups for functional genomics analysis. To facilitate the functional annotation of cell groups and characterization of molecular mechanisms underlying cell trajectories, we introduce the Pathways, Annotated Gene Lists, and Gene Signatures Electronic Repository for Single-Cell Functional Genomics Analysis (PAGER-scFGA). Results: We have developed PAGER-scFGA, which integrates cell functional annotations and gene-set enrichment analysis into popular single-cell analysis pipelines such as Scanpy. Using differentially expressed genes (DEGs) from pairwise cell clusters, PAGER-scFGA infers cell functions through the enrichment of potential cell-marker genesets. Moreover, PAGER-scFGA provides pathways, annotated gene lists, and gene signatures (PAGs) enriched in specific cell subsets with tissue compositions and continuous transitions along cell trajectories. Additionally, PAGER-scFGA enables the construction of a gene subcellular map based on DEGs and allows examination of the gene functional compartments (GFCs) underlying cell maturation/differentiation. In a real-world case study of mouse natural killer (mNK) cells, PAGER-scFGA revealed two major stages of natural killer (NK) cells and three trajectories from the precursor stage to NK T-like mature stage within blood, spleen, and bone marrow tissues. As the trajectories progress to later stages, the DEGs exhibit greater divergence and variability. 
However, the DEGs in different trajectories still interact within a network during NK cell maturation. Notably, PAGER-scFGA unveiled cell cytotoxicity, exocytosis, and the response to interleukin (IL) signaling pathways and associated network models during the progression from precursor NK cells to mature NK cells. Conclusion: PAGER-scFGA enables in-depth exploration of functional insights and presents a comprehensive knowledge map of gene networks and GFCs, which can be utilized for future studies and hypothesis generation. It is expected to become an indispensable tool for inferring cell functions and detecting molecular mechanisms within cell trajectories in single-cell studies. The web app (accessible at https://au-singlecell.streamlit.app/) is publicly available.
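
The gene-set enrichment step mentioned in this abstract can be illustrated with a one-sided hypergeometric test, a common way to ask whether a list of DEGs overlaps a gene set more than chance would predict. This is a minimal sketch and not PAGER-scFGA's own code; the abstract does not state which statistic the tool uses, and the function name and the toy numbers below are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, gene_set_size, deg_count, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    deg_count genes are drawn without replacement from a universe of
    universe_size genes, of which gene_set_size belong to the gene set.
    All argument names are hypothetical, not taken from PAGER-scFGA.
    """
    max_overlap = min(gene_set_size, deg_count)
    total = comb(universe_size, deg_count)
    # Sum the upper tail: every overlap at least as large as the one observed.
    tail = sum(
        comb(gene_set_size, k) * comb(universe_size - gene_set_size, deg_count - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: 20 genes in total, a 5-gene set, 4 DEGs, 3 of them in the set.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(f"enrichment p-value: {p_value:.4f}")
```

A small p-value suggests the gene set is over-represented among the DEGs; in practice the p-values across many gene sets would also need multiple-testing correction.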

Chafai N, Bonizzi L, Botti S, Badaoui B. Emerging applications of machine learning in genomic medicine and healthcare. Crit Rev Clin Lab Sci 2024;61:140-163. [PMID: 37815417 DOI: 10.1080/10408363.2023.2259466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 09/12/2023] [Indexed: 10/11/2023]

Jiang L, Xu C, Bai Y, Liu A, Gong Y, Wang YP, Deng HW. Autosurv: interpretable deep learning framework for cancer survival analysis incorporating clinical and multi-omics data. NPJ Precis Oncol 2024;8:4. [PMID: 38182734 PMCID: PMC10770412 DOI: 10.1038/s41698-023-00494-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 12/05/2023] [Indexed: 01/07/2024] Open

Jiang L, Xu C, Bai Y, Liu A, Gong Y, Wang YP, Deng HW. Autosurv: interpretable deep learning framework for cancer survival analysis incorporating clinical and multi-omics data. Research Square 2023:rs.3.rs-2486756. [PMID: 37609286 PMCID: PMC10441464 DOI: 10.21203/rs.3.rs-2486756/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]

Liang B, Gong H, Lu L, Xu J. Risk stratification and pathway analysis based on graph neural network and interpretable algorithm. BMC Bioinformatics 2022;23:394. [PMID: 36167504 PMCID: PMC9516820 DOI: 10.1186/s12859-022-04950-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/23/2022] [Accepted: 09/19/2022] [Indexed: 12/01/2022] Open

Diaz-Uriarte R, Gómez de Lope E, Giugno R, Fröhlich H, Nazarov PV, Nepomuceno-Chamorro IA, Rauschenberger A, Glaab E. Ten quick tips for biomarker discovery and validation analyses using machine learning. PLoS Comput Biol 2022;18:e1010357. [PMID: 35951526 PMCID: PMC9371329 DOI: 10.1371/journal.pcbi.1010357] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/19/2022] Open

Weng Z, Yue Z, Zhu Y, Chen JY. DEMA: a distance-bounded energy-field minimization algorithm to model and layout biomolecular networks with quantitative features. Bioinformatics 2022;38:i359-i368. [PMID: 35758816 PMCID: PMC9235497 DOI: 10.1093/bioinformatics/btac261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open

Abstract

SUMMARY

In biology, graph layout algorithms can reveal comprehensive biological contexts by visually positioning graph nodes in their relevant neighborhoods. A layout software algorithm/engine commonly takes a set of nodes and edges and produces layout coordinates of nodes according to edge constraints. However, current layout engines normally do not consider node, edge or node-set properties during layout and only curate these properties after the layout is created. Here, we propose a new layout algorithm, distance-bounded energy-field minimization algorithm (DEMA), to natively consider various biological factors, i.e., the strength of gene-to-gene association, the gene's relative contribution weight and the functional groups of genes, to enhance the interpretation of complex network graphs. In DEMA, we introduce a parameterized energy model where nodes are repelled by the network topology and attracted by a few biological factors, i.e., interaction coefficient, effect coefficient and fold change of gene expression. We generalize these factors as gene weights, protein-protein interaction weights, gene-to-gene correlations and the gene set annotations, the four parameterized functional properties used in DEMA. Moreover, DEMA considers further attraction/repulsion/grouping coefficients to enable different preferences in generating network views. Applying DEMA, we performed two case studies using genetic data in autism spectrum disorder and Alzheimer's disease, respectively, for gene candidate discovery. Furthermore, we implemented our algorithm as a plugin for Cytoscape, an open-source software platform for visualizing networks, which makes it convenient to use. Our software and demo can be freely accessed at http://discovery.informatics.uab.edu/dema.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Quazi S. Artificial intelligence and machine learning in precision and genomic medicine. Med Oncol 2022;39:120. [PMID: 35704152 PMCID: PMC9198206 DOI: 10.1007/s12032-022-01711-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 03/06/2022] [Accepted: 03/14/2022] [Indexed: 10/28/2022]

Quazi S. Artificial intelligence and machine learning in precision and genomic medicine. Med Oncol 2022;39:120. [PMID: 35704152 PMCID: PMC9198206 DOI: 10.1007/s12032-022-01711-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 03/28/2023]

Yue Z, Slominski R, Bharti S, Chen JY. PAGER Web APP: An Interactive, Online Gene Set and Network Interpretation Tool for Functional Genomics. Front Genet 2022;13:820361. [PMID: 35495152 PMCID: PMC9039620 DOI: 10.3389/fgene.2022.820361] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2021] [Accepted: 03/17/2022] [Indexed: 12/30/2022] Open

Artificial Intelligence for Precision Oncology. Advances in Experimental Medicine and Biology 2022;1361:249-268. [DOI: 10.1007/978-3-030-91836-1_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

Rintala TJ, Federico A, Latonen L, Greco D, Fortino V. A systematic comparison of data- and knowledge-driven approaches to disease subtype discovery. Brief Bioinform 2021;22:6350885. [PMID: 34396389 PMCID: PMC8575038 DOI: 10.1093/bib/bbab314] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/29/2021] [Revised: 07/05/2021] [Accepted: 07/20/2021] [Indexed: 12/14/2022] Open

Abstract

Typical clustering analysis for large-scale genomics data combines two unsupervised learning techniques: dimensionality reduction and clustering (DR-CL) methods. It has been demonstrated that transforming gene expression to pathway-level information can improve the robustness and interpretability of disease grouping results. This approach, referred to as the biological knowledge-driven clustering (BK-CL) approach, is often neglected, due to a lack of tools enabling systematic comparisons with more established DR-based methods. Moreover, classic clustering metrics based on group separability tend to favor the DR-CL paradigm, which may increase the risk of identifying less actionable disease subtypes that have ambiguous biological and clinical explanations. Hence, there is a need for developing metrics that assess biological and clinical relevance. To facilitate the systematic analysis of BK-CL methods, we propose a computational protocol for quantitative analysis of clustering results derived from both DR-CL and BK-CL methods. Moreover, we propose a new BK-CL method that combines prior knowledge of disease-relevant genes, network diffusion algorithms and gene set enrichment analysis to generate robust pathway-level information. Benchmarking studies were conducted to compare the grouping results from different DR-CL and BK-CL approaches with respect to standard clustering evaluation metrics, concordance with known subtypes, association with clinical outcomes and disease modules in co-expression networks of genes. No single approach dominated every metric, showing the importance of multi-objective evaluation in clustering analysis. However, we demonstrated that, on gene expression data sets derived from TCGA samples, the BK-CL approach can find groupings that provide significant prognostic value in both breast and prostate cancers.
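
The knowledge-driven idea described in this abstract, turning gene-level expression into pathway-level features before clustering, can be sketched in a few lines. This is an illustrative toy and not the authors' method: it swaps in a plain per-pathway mean where the paper combines network diffusion with gene set enrichment analysis, and the gene names, pathway names and function names are all hypothetical.

```python
import random


def pathway_scores(expression, pathways):
    """Average each sample's gene values within each pathway.

    expression: list of {gene: value} dicts, one per sample.
    pathways:   {pathway_name: [gene, ...]}.
    Mean aggregation is an assumption made for this sketch.
    """
    names = sorted(pathways)
    return [
        [sum(sample[g] for g in pathways[p]) / len(pathways[p]) for p in names]
        for sample in expression
    ]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data: two samples with pathway A active, two with pathway B active.
toy_pathways = {"A": ["g1", "g2"], "B": ["g3", "g4"]}
toy_expression = [
    {"g1": 5.0, "g2": 5.0, "g3": 1.0, "g4": 1.0},
    {"g1": 5.4, "g2": 5.0, "g3": 0.6, "g4": 1.0},
    {"g1": 1.0, "g2": 1.0, "g3": 5.0, "g4": 5.0},
    {"g1": 0.8, "g2": 1.0, "g3": 5.2, "g4": 5.0},
]
toy_labels = kmeans(pathway_scores(toy_expression, toy_pathways), k=2)
print(toy_labels)
```

Clustering in pathway space keeps each feature tied to a named biological process, which is the interpretability argument the abstract makes for BK-CL approaches.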

Duan R, Gao L, Gao Y, Hu Y, Xu H, Huang M, Song K, Wang H, Dong Y, Jiang C, Zhang C, Jia S. Evaluation and comparison of multi-omics data integration methods for cancer subtyping. PLoS Comput Biol 2021;17:e1009224. [PMID: 34383739 PMCID: PMC8384175 DOI: 10.1371/journal.pcbi.1009224] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 02/27/2021] [Revised: 08/24/2021] [Accepted: 06/28/2021] [Indexed: 11/18/2022] Open

Abstract

Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating these methods has become a complicated problem due to the lack of gold standards. Moreover, questions of practical importance remain to be addressed regarding the impact of selecting appropriate data types and combinations on the performance of integrative studies. Here, we constructed three classes of benchmarking datasets of nine cancers in TCGA by considering all the eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy measured by combining both clustering accuracy and clinical significance, robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most cancers under our studies, which may be of particular interest to researchers in omics data analysis.

Cancer is one of the most heterogeneous diseases, characterized by diverse morphological, phenotypic, and genomic profiles between tumors and their subtypes. Identifying cancer subtypes can help patients receive precise treatments. With the development of high-throughput technologies, genomics, epigenomics, and transcriptomics data have been generated for large cancer patient cohorts. It is believed that the more omics data we use, the more accurate identification of cancer subtypes. To examine this assumption, we first constructed three classes of benchmarking datasets to conduct a comprehensive evaluation and comparison of ten representative multi-omics data integration methods for cancer subtyping by considering their accuracy, robustness, and computational efficiency. Then, we investigated the influence of different omics data and their various combinations on the effectiveness of cancer subtyping. Our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. We hope that our work may help researchers choose a proper method and an effective data combination when identifying cancer subtypes using data integration methods.

Oh JH, Choi W, Ko E, Kang M, Tannenbaum A, Deasy JO. PathCNN: interpretable convolutional neural networks for survival prediction and pathway analysis applied to glioblastoma. Bioinformatics 2021;37:i443-i450. [PMID: 34252964 PMCID: PMC8336441 DOI: 10.1093/bioinformatics/btab285] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/20/2023] Open

Liñares-Blanco J, Pazos A, Fernandez-Lozano C. Machine learning analysis of TCGA cancer data. PeerJ Comput Sci 2021;7:e584. [PMID: 34322589 PMCID: PMC8293929 DOI: 10.7717/peerj-cs.584] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/03/2021] [Accepted: 05/17/2021] [Indexed: 06/13/2023]

Abstract

In recent years, machine learning (ML) researchers have changed their focus towards biological problems that are difficult to analyse with standard approaches. Large initiatives such as The Cancer Genome Atlas (TCGA) have allowed the use of omic data for the training of these algorithms. In order to study the state of the art, this review is provided to cover the main works that have used ML with TCGA data. Firstly, the principal discoveries made by the TCGA consortium are presented. Once these bases have been established, we begin with the main objective of this study, the identification and discussion of those works that have used the TCGA data for the training of different ML approaches. After a review of more than 100 different papers, it has been possible to make a classification according to the following three pillars: the type of tumour, the type of algorithm and the predicted biological problem. One of the conclusions drawn in this work shows a high density of studies based on two major algorithms: Random Forest and Support Vector Machines. We also observe the rise in the use of deep artificial neural networks. It is worth emphasizing the increase in integrative models of multi-omic data analysis. The different biological conditions are a consequence of molecular homeostasis, driven by protein coding regions, regulatory elements and the surrounding environment. It is notable that a large number of works make use of genetic expression data, which has been found to be the preferred method by researchers when training the different models. The biological problems addressed have been classified into five types: prognosis prediction, tumour subtypes, microsatellite instability (MSI), immunological aspects and certain pathways of interest. A clear trend was detected in the prediction of these conditions according to the type of tumour. This is why a greater number of works have focused on the BRCA cohort, while specific works for survival, for example, were centred on the GBM cohort, due to its large number of events. Throughout this review, it will be possible to go in depth into the works and the methodologies used to study TCGA cancer data. Finally, it is intended that this work will serve as a basis for future research in this field of study.

Patel SK, George B, Rai V. Artificial Intelligence to Decode Cancer Mechanism: Beyond Patient Stratification for Precision Oncology. Front Pharmacol 2020;11:1177. [PMID: 32903628 PMCID: PMC7438594 DOI: 10.3389/fphar.2020.01177] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/25/2020] [Accepted: 07/20/2020] [Indexed: 12/13/2022] Open

Abstract

The multitude of multi-omics data generated cost-effectively using advanced high-throughput technologies has created a challenging domain for research in Artificial Intelligence (AI). Data curation poses a significant challenge as different parameters, instruments, and sample preparation approaches are employed for generating these big data sets. AI could reduce the fuzziness and randomness in data handling and build a platform for the data ecosystem, and thus serve as the primary choice for data mining and big data analysis to make informed decisions. However, AI implication remains intricate for researchers/clinicians lacking specific training in computational tools and informatics. Cancer is a major cause of death worldwide, accounting for an estimated 9.6 million deaths in 2018. Certain cancers, such as pancreatic and gastric cancers, are detected only after they have reached their advanced stages with frequent relapses. Cancer is one of the most complex diseases affecting a range of organs with diverse disease progression mechanisms and the effectors ranging from gene-epigenetics to a wide array of metabolites. Hence, a comprehensive study, including genomics, epi-genomics, transcriptomics, proteomics, and metabolomics, along with the medical/mass-spectrometry imaging, patient clinical history, treatments provided, genetics, and disease endemicity, is essential. The Cancer Moonshot℠ Research Initiatives by the NIH National Cancer Institute aim to collect as much information as possible from different regions of the world and make a cancer data repository. AI could play an immense role in (a) analysis of complex and heterogeneous data sets (multi-omics and/or inter-omics), (b) data integration to provide a holistic disease molecular mechanism, (c) identification of diagnostic and prognostic markers, and (d) monitoring of patients' responses to drugs/treatments and recovery. AI enables precision disease management well beyond the prevalent disease stratification patterns, such as differential expression and supervised classification. This review highlights critical advances and challenges in omics data analysis, dealing with data variability from lab-to-lab, and data integration. We also describe methods used in data mining and AI methods to obtain robust results for precision medicine from "big" data. In the future, AI could be expanded to achieve ground-breaking progress in disease management.

Hao J, Kim Y, Mallavarapu T, Oh JH, Kang M. Interpretable deep neural network for cancer survival analysis by integrating genomic and clinical data. BMC Med Genomics 2019;12:189. [PMID: 31865908 PMCID: PMC6927105 DOI: 10.1186/s12920-019-0624-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/29/2022] Open

Abstract

Background

Understanding the complex biological mechanisms of cancer patient survival using genomic and clinical data is vital, not only to develop new treatments for patients, but also to improve survival prediction. However, highly nonlinear and high-dimension, low-sample size (HDLSS) data cause computational challenges to applying conventional survival analysis.

Results

We propose a novel biologically interpretable pathway-based sparse deep neural network, named Cox-PASNet, which integrates high-dimensional gene expression data and clinical data on a simple neural network architecture for survival analysis. Cox-PASNet is biologically interpretable in that nodes in the neural network correspond to biological genes and pathways, while capturing the nonlinear and hierarchical effects of biological pathways associated with cancer patient survival. We also propose a heuristic optimization solution to train Cox-PASNet with HDLSS data. Cox-PASNet was intensively evaluated by comparing its predictive performance with that of current state-of-the-art methods on glioblastoma multiforme (GBM) and ovarian serous cystadenocarcinoma (OV) cancer. In the experiments, Cox-PASNet outperformed the benchmarking methods. Moreover, the neural network architecture of Cox-PASNet was biologically interpreted, and several significant prognostic factors of genes and biological pathways were identified.

Conclusions

Cox-PASNet models biological mechanisms in the neural network by incorporating biological pathway databases and sparse coding. The neural network of Cox-PASNet can identify nonlinear and hierarchical associations of genomic and clinical data to cancer patient survival. The open-source code of Cox-PASNet in PyTorch implemented for training, evaluation, and model interpretation is available at: https://github.com/DataX-JieHao/Cox-PASNet.
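
Survival networks of this kind are typically trained by minimizing the negative Cox partial log-likelihood over the risk scores that the network outputs. The sketch below shows that standard loss in plain Python so the formula is concrete; it is not taken from the Cox-PASNet repository, the function and argument names are hypothetical, and tied event times are handled in the simple Breslow style as an assumption.

```python
import math


def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Mean negative Cox partial log-likelihood over observed events.

    risk_scores: predicted log-risk per patient (e.g. a network's output).
    times:       follow-up time per patient.
    events:      1 if the event was observed, 0 if the patient was censored.
    """
    loss = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only appear inside risk sets
        # Risk set: everyone still under observation at time t_i.
        at_risk = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(at_risk)
        log_sum = m + math.log(sum(math.exp(r - m) for r in at_risk))
        loss -= risk_scores[i] - log_sum
        n_events += 1
    return loss / n_events if n_events else 0.0


# Toy check: scores that rank the earliest event as highest risk give a lower loss.
toy_times = [1.0, 2.0, 3.0]
toy_events = [1, 1, 1]
good = cox_neg_partial_log_likelihood([2.0, 1.0, 0.0], toy_times, toy_events)
bad = cox_neg_partial_log_likelihood([0.0, 1.0, 2.0], toy_times, toy_events)
print(good < bad)
```

In a deep learning framework the same expression would be written with tensor operations so that gradients flow back to the network weights; the pure-Python loop here only serves to make each term visible.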
