1
Mallik S, Zhao Z. Graph- and rule-based learning algorithms: a comprehensive review of their applications for cancer type classification and prognosis using genomic data. Brief Bioinform 2020; 21:368-394. [PMID: 30649169 PMCID: PMC7373185 DOI: 10.1093/bib/bby120] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/27/2018] [Revised: 10/26/2018] [Accepted: 11/21/2018] [Indexed: 12/20/2022] Open
Abstract
Cancer is well recognized as a complex disease with dysregulated molecular networks or modules. Graph- and rule-based analytics have been applied extensively for cancer classification as well as prognosis using large genomic and other data over the past decade. This article provides a comprehensive review of various graph- and rule-based machine learning algorithms that have been applied to numerous genomics data to determine the cancer-specific gene modules, identify gene signature-based classifiers and carry out other related objectives of potential therapeutic value. This review focuses mainly on the methodological design and features of these algorithms to facilitate the application of these graph- and rule-based analytical approaches for cancer classification and prognosis. Based on the type of data integration, we divided all the algorithms into three categories: model-based integration, pre-processing integration and post-processing integration. Each category is further divided into four sub-categories (supervised, unsupervised, semi-supervised and survival-driven learning analyses) based on learning style. Therefore, a total of 11 categories of methods are summarized with their inputs, objectives and description, advantages and potential limitations. Next, we briefly demonstrate well-known and most recently developed algorithms for each sub-category along with salient information, such as data profiles, statistical or feature selection methods and outputs. Finally, we summarize the appropriate use and efficiency of all categories of graph- and rule mining-based learning methods when input data and specific objective are given. This review aims to help readers to select and use the appropriate algorithms for cancer classification and prognosis study.
Affiliation(s)
- Saurav Mallik
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
- Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
2
Xi J, Li A, Wang M. HetRCNA: A Novel Method to Identify Recurrent Copy Number Alternations from Heterogeneous Tumor Samples Based on Matrix Decomposition Framework. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:422-434. [PMID: 29994262 DOI: 10.1109/tcbb.2018.2846599] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 06/08/2023]
Abstract
A common strategy for discovering cancer-associated copy number aberrations (CNAs) from a cohort of cancer samples is to detect recurrent CNAs (RCNAs). Although previous methods can successfully identify communal RCNAs shared by nearly all tumor samples, they cannot detect subgroup-specific RCNAs and their related subgroup samples in heterogeneous cancer samples. In this paper, we introduce a novel integrated method called HetRCNA, which can identify statistically significant subgroup-specific RCNAs and their related subgroup samples. Based on a matrix decomposition framework with a weight constraint, HetRCNA identifies the subgroup samples from the coefficients of the left vectors under the weight constraint, and the subgroup-specific RCNAs from the coefficients of the right vectors together with a significance test. When evaluated on a simulated dataset, HetRCNA gives the best performance among the competing methods and is robust to the noise factors of the simulated data. When applied to a real breast cancer dataset, our approach identifies a number of RCNA regions, and the result is highly correlated with the results of the other two investigated approaches. Notably, the genomic regions identified by HetRCNA harbor many breast cancer-related genes reported in previous studies.
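The roles of the left and right vectors described in this abstract can be illustrated with a toy example. The sketch below is not HetRCNA: it uses a plain rank-1 SVD from NumPy in place of the weight-constrained decomposition and omits the significance test. The function names, the simulated matrix, and the `top_regions` parameter are assumptions made only for illustration.

```python
import numpy as np

def rank1_subgroup_sketch(cna, top_regions=3):
    """Plain rank-1 SVD used as a stand-in for a constrained decomposition.

    cna: samples x regions matrix of copy number signal (0 = neutral).
    Returns per-sample weights, per-region loadings, and the indices of
    the regions with the largest absolute loadings.
    """
    u, s, vt = np.linalg.svd(cna, full_matrices=False)
    left, right = u[:, 0], vt[0]
    # The sign of an SVD component is arbitrary; flip it so that the
    # dominant region loading is positive.
    if right[np.argmax(np.abs(right))] < 0:
        left, right = -left, -right
    sample_weights = left * s[0]          # left vector: which samples carry the pattern
    top = np.argsort(-np.abs(right))[:top_regions]  # right vector: which regions
    return sample_weights, right, np.sort(top)

def simulate(n_samples=20, n_regions=30, seed=0):
    """Hypothetical cohort: the first 8 samples share a gain in regions 10-12."""
    rng = np.random.default_rng(seed)
    cna = rng.normal(0.0, 0.1, size=(n_samples, n_regions))
    cna[:8, 10:13] += 1.0
    return cna

weights, loadings, regions = rank1_subgroup_sketch(simulate())
print("candidate regions:", regions)
print("samples with high weight:", np.where(weights > 0.5 * weights.max())[0])
```

In this toy setting the samples with large left-vector weights are the simulated subgroup and the regions with large right-vector loadings are the simulated gain; a real analysis would also need the constraint and statistical testing that the abstract mentions.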
3
Kalamohan K, Gunasekaran P, Ibrahim S. Gene coexpression network analysis of multiple cancers discovers the varying stem cell features between gastric and breast cancer. Meta Gene 2019. [DOI: 10.1016/j.mgene.2019.100576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022] Open
4
Li M, Li C, Liu WX, Liu C, Cui J, Li Q, Ni H, Yang Y, Wu C, Chen C, Zhen X, Zeng T, Zhao M, Chen L, Wu J, Zeng R, Chen L. Dysfunction of PLA2G6 and CYP2C44-associated network signals imminent carcinogenesis from chronic inflammation to hepatocellular carcinoma. J Mol Cell Biol 2019; 9:489-503. [PMID: 28655161 PMCID: PMC5907842 DOI: 10.1093/jmcb/mjx021] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 12/17/2016] [Accepted: 06/16/2017] [Indexed: 12/14/2022] Open
Abstract
Little is known about how chronic inflammation contributes to the progression of hepatocellular carcinoma (HCC), especially the initiation of cancer. To uncover the critical transition from chronic inflammation to HCC and the molecular mechanisms at a network level, we analyzed the time-series proteomic data of woodchuck hepatitis virus/c-myc mice and age-matched wt-C57BL/6 mice using our dynamical network biomarker (DNB) model. DNB analysis indicated that the 5th month after birth of transgenic mice was the critical period of cancer initiation, just before the critical transition, which is consistent with clinical symptoms. Meanwhile, the DNB-associated network showed a drastic inversion of protein expression and coexpression levels before and after the critical transition. Two members of DNB, PLA2G6 and CYP2C44, along with their associated differentially expressed proteins, were found to induce dysfunction of arachidonic acid metabolism, further activate inflammatory responses through inflammatory mediator regulation of transient receptor potential channels, and finally lead to impairments of liver detoxification and malignant transition to cancer. As a c-Myc target, PLA2G6 positively correlated with c-Myc in expression, showing a trend from decreasing to increasing during carcinogenesis, with the minimal point at the critical transition or tipping point. Such trend of homologous PLA2G6 and c-Myc was also observed during human hepatocarcinogenesis, with the minimal point at high-grade dysplastic nodules (a stage just before the carcinogenesis). Our study implies that PLA2G6 might function as an oncogene like famous c-Myc during hepatocarcinogenesis, while downregulation of PLA2G6 and c-Myc could be a warning signal indicating imminent carcinogenesis.
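The abstract does not state how the DNB signal is scored. The DNB literature commonly uses a composite index that rises when a candidate group of molecules fluctuates strongly, correlates tightly within the group, and decouples from the remaining molecules. The sketch below assumes that form (mean within-group standard deviation times mean within-group absolute correlation, divided by mean absolute correlation between group and non-group molecules). The function names and simulated data are hypothetical, and this is not the authors' pipeline.

```python
import numpy as np

def dnb_composite_index(expr, members):
    """Composite index for one time point (assumed form, see lead-in).

    expr: samples x molecules array; members: column indices of the
    candidate group. Returns SD_in * PCC_in / PCC_out.
    """
    members = np.asarray(members)
    others = np.setdiff1d(np.arange(expr.shape[1]), members)
    corr = np.abs(np.corrcoef(expr, rowvar=False))
    sd_in = expr[:, members].std(axis=0, ddof=1).mean()
    upper = np.triu_indices(len(members), k=1)   # each within-group pair once
    pcc_in = corr[np.ix_(members, members)][upper].mean()
    pcc_out = corr[np.ix_(members, others)].mean()
    return sd_in * pcc_in / pcc_out

def simulate_time_point(critical, n_samples=30, n_molecules=10, seed=0):
    """Hypothetical data: near the transition, molecules 0-2 share a large fluctuation."""
    rng = np.random.default_rng(seed)
    expr = rng.normal(0.0, 0.5, size=(n_samples, n_molecules))
    if critical:
        expr[:, :3] += rng.normal(0.0, 2.0, size=(n_samples, 1))
    return expr

members = [0, 1, 2]
baseline = dnb_composite_index(simulate_time_point(False), members)
critical = dnb_composite_index(simulate_time_point(True), members)
print(f"baseline index: {baseline:.2f}, near-transition index: {critical:.2f}")
```

Scanning such an index across time points and looking for a sharp peak is one way a critical period, such as the one reported in the abstract, could be flagged.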
Affiliation(s)
- Meiyi Li
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China.,Minhang Hospital, Fudan University, Shanghai, China
- Chen Li
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China
- Wei-Xin Liu
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,School of Life Science and Technology, ShanghaiTech University, Shanghai, China.,University of Chinese Academy of sciences, Beijing, China
- Conghui Liu
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China.,University of Chinese Academy of sciences, Beijing, China
- Jingru Cui
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China
- Qingrun Li
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China
- Hong Ni
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China
- Yingcheng Yang
- International Co-operation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Institute, Second Military Medical University, Shanghai, China
- Chaochao Wu
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China
- Chunlei Chen
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China
- Xing Zhen
- Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China
- Tao Zeng
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China
- Mujun Zhao
- Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China
- Lei Chen
- International Co-operation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Institute, Second Military Medical University, Shanghai, China.,National Center for Liver Cancer, Shanghai, China
- Jiarui Wu
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China.,School of Life Science and Technology, ShanghaiTech University, Shanghai, China.,Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, China
- Rong Zeng
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China.,School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Luonan Chen
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China.,Minhang Hospital, Fudan University, Shanghai, China.,School of Life Science and Technology, ShanghaiTech University, Shanghai, China
5
Abstract
The diversity and sheer volume of omics data have brought biology and biomedicine research and application into a big data era, much like the one that took hold in human society a decade ago. These data pose a new challenge in moving from horizontal data ensembles (e.g., similar types of data collected from different labs or companies) to vertical data ensembles (e.g., different types of data collected for a group of persons with matched information), which requires integrative analysis in biology and biomedicine and calls for the urgent development of data integration to address the shift from previous population-guided to new individual-guided investigations. Data integration is an effective concept for solving complex problems or understanding complicated systems. Several benchmark studies have revealed the heterogeneity and trade-offs that exist in the analysis of omics data. Integrative analysis can combine and investigate many datasets in a cost-effective, reproducible way. Current integration approaches for biological data have two modes: a "bottom-up integration" mode with follow-up manual integration, and a "top-down integration" mode with follow-up in silico integration. This paper first summarizes combinatory analysis approaches to suggest candidate protocols for biological experiment design in integrative genomics studies, and then surveys data fusion approaches to guide computational model development for detecting biological significance; these approaches have also provided new data resources and analysis tools to support precision medicine that depends on big biomedical data. Finally, the problems and future directions for integrative analysis of omics big data are highlighted.
Affiliation(s)
- Xiang-Tian Yu
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
- Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
6
Zhang W, Chien J, Yong J, Kuang R. Network-based machine learning and graph theory algorithms for precision oncology. NPJ Precis Oncol 2017; 1:25. [PMID: 29872707 PMCID: PMC5871915 DOI: 10.1038/s41698-017-0029-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 02/02/2017] [Revised: 06/28/2017] [Accepted: 06/29/2017] [Indexed: 01/07/2023] Open
Abstract
Network-based analytics plays an increasingly important role in precision oncology. Growing evidence in recent studies suggests that cancer can be better understood through mutated or dysregulated pathways or networks rather than individual mutations and that the efficacy of repositioned drugs can be inferred from disease modules in molecular networks. This article reviews network-based machine learning and graph theory algorithms for integrative analysis of personal genomic data and biomedical knowledge bases to identify tumor-specific molecular mechanisms, candidate targets and repositioned drugs for personalized treatment. The review focuses on the algorithmic design and mathematical formulation of these methods to facilitate applications and implementations of network-based analysis in the practice of precision oncology. We review the methods applied in three scenarios to integrate genomic data and network models in different analysis pipelines, and we examine three categories of network-based approaches for repositioning drugs in drug-disease-gene networks. In addition, we perform a comprehensive subnetwork/pathway analysis of mutations in 31 cancer genome projects in the Cancer Genome Atlas and present a detailed case study on ovarian cancer. Finally, we discuss interesting observations, potential pitfalls and future directions in network-based precision oncology.
Affiliation(s)
- Wei Zhang
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, USA
- Jeremy Chien
- Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS, USA
- Jeongsik Yong
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Twin Cities, Minneapolis, MN, USA
- Rui Kuang
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, USA
7
Delaney JR, Patel CB, Willis KM, Haghighiabyaneh M, Axelrod J, Tancioni I, Lu D, Bapat J, Young S, Cadassou O, Bartakova A, Sheth P, Haft C, Hui S, Saenz C, Schlaepfer DD, Harismendy O, Stupack DG. Haploinsufficiency networks identify targetable patterns of allelic deficiency in low mutation ovarian cancer. Nat Commun 2017; 8:14423. [PMID: 28198375 PMCID: PMC5316854 DOI: 10.1038/ncomms14423] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 05/19/2016] [Accepted: 12/28/2016] [Indexed: 01/11/2023] Open
Abstract
Identification of specific oncogenic gene changes has enabled the modern generation of targeted cancer therapeutics. In high-grade serous ovarian cancer (OV), the bulk of genetic changes is not somatic point mutations, but rather somatic copy-number alterations (SCNAs). The impact of SCNAs on tumour biology remains poorly understood. Here we build haploinsufficiency network analyses to identify which SCNA patterns are most disruptive in OV. Of all KEGG pathways (N=187), autophagy is the most significantly disrupted by coincident gene deletions. Compared with 20 other cancer types, OV is most severely disrupted in autophagy and in compensatory proteostasis pathways. Network analysis prioritizes MAP1LC3B (LC3) and BECN1 as most impactful. Knockdown of LC3 and BECN1 expression confers sensitivity to cells undergoing autophagic stress independent of platinum resistance status. The results support the use of pathway network tools to evaluate how the copy-number landscape of a tumour may guide therapy.
Affiliation(s)
- Joe Ryan Delaney
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Chandni B Patel
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Katelyn McCabe Willis
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Mina Haghighiabyaneh
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Joshua Axelrod
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Isabelle Tancioni
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Dan Lu
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Jaidev Bapat
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Shanique Young
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Octavia Cadassou
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA.,Centre de recherche en Cancérologie, INSERM 1052, CNRS 5286, Centre Léon Bérard, Université de Lyon, Lyon, France
- Alena Bartakova
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Parthiv Sheth
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Carley Haft
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Sandra Hui
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Cheryl Saenz
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- David D Schlaepfer
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
- Olivier Harismendy
- Division of Biomedical Informatics, Department of Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 92093, USA
- Dwayne G Stupack
- Division of Gynecologic Oncology, Department of Reproductive Medicine, UCSD School of Medicine and UCSD Moores Cancer Center, 3855 Health Sciences Drive, La Jolla, California 39216, USA
8
Luo J, Xiang G, Pan C. Discovery of microRNAs and Transcription Factors Co-Regulatory Modules by Integrating Multiple Types of Genomic Data. IEEE Trans Nanobioscience 2017; 16:51-59. [DOI: 10.1109/tnb.2017.2649560] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/07/2022]
9
Reyes-Palomares A, Bueno A, Rodríguez-López R, Medina MÁ, Sánchez-Jiménez F, Corpas M, Ranea JAG. Systematic identification of phenotypically enriched loci using a patient network of genomic disorders. BMC Genomics 2016; 17:232. [PMID: 26980139 PMCID: PMC4792099 DOI: 10.1186/s12864-016-2569-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/22/2015] [Accepted: 03/07/2016] [Indexed: 11/29/2022] Open
Abstract
Background: Network medicine is a promising new discipline that combines systems biology approaches and network science to understand the complexity of pathological phenotypes. Given the growing availability of personalized genomic and phenotypic profiles, network models offer a robust integrative framework for the analysis of "omics" data, allowing the characterization of the molecular aetiology of pathological processes underpinning genetic diseases. Methods: Here we make use of patient genomic data to exploit different network-based analyses to study genetic and phenotypic relationships between individuals. For this method, we analyzed a dataset of structural variants and phenotypes for 6,564 patients from the DECIPHER database, which encompasses one of the most comprehensive collections of pathogenic Copy Number Variations (CNVs) and their associated ontology-controlled phenotypes. We developed a computational strategy that identifies clusters of patients in a synthetic patient network according to their genetic overlap and phenotype enrichments. Results: Many of these clusters of patients represent new genotype-phenotype associations, suggesting the identification of newly discovered phenotypically enriched loci (indicative of potential novel syndromes) that are currently absent from reference genomic disorder databases such as ClinVar, OMIM or DECIPHER itself. Conclusions: We provide a high-resolution map of pathogenic phenotypes associated with their respective significant genomic regions and a new powerful tool for diagnosis of currently uncharacterized mutations leading to deleterious phenotypes and syndromes. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2569-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Armando Reyes-Palomares
- Universidad de Málaga, Andalucía Tech, Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, and IBIMA (Biomedical Research Institute of Málaga), E-29071, Málaga, Spain.,CIBER de Enfermedades Raras (CIBERER), E-29071, Málaga, Spain.,Present address: The European Molecular Biology Laboratory Heidelberg, 69117, Heidelberg, Germany
- Aníbal Bueno
- Universidad de Málaga, Andalucía Tech, Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, and IBIMA (Biomedical Research Institute of Málaga), E-29071, Málaga, Spain
- Rocío Rodríguez-López
- Universidad de Málaga, Andalucía Tech, Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, and IBIMA (Biomedical Research Institute of Málaga), E-29071, Málaga, Spain.,CIBER de Enfermedades Raras (CIBERER), E-29071, Málaga, Spain
- Miguel Ángel Medina
- Universidad de Málaga, Andalucía Tech, Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, and IBIMA (Biomedical Research Institute of Málaga), E-29071, Málaga, Spain.,CIBER de Enfermedades Raras (CIBERER), E-29071, Málaga, Spain
- Francisca Sánchez-Jiménez
- Universidad de Málaga, Andalucía Tech, Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, and IBIMA (Biomedical Research Institute of Málaga), E-29071, Málaga, Spain.,CIBER de Enfermedades Raras (CIBERER), E-29071, Málaga, Spain
- Manuel Corpas
- The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK
- Juan A G Ranea
- Universidad de Málaga, Andalucía Tech, Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, and IBIMA (Biomedical Research Institute of Málaga), E-29071, Málaga, Spain.,CIBER de Enfermedades Raras (CIBERER), E-29071, Málaga, Spain
10
Park S, Kim SJ, Yu D, Peña-Llopis S, Gao J, Park JS, Chen B, Norris J, Wang X, Chen M, Kim M, Yong J, Wardak Z, Choe K, Story M, Starr T, Cheong JH, Hwang TH. An integrative somatic mutation analysis to identify pathways linked with survival outcomes across 19 cancer types. Bioinformatics 2015; 32:1643-51. [PMID: 26635139 DOI: 10.1093/bioinformatics/btv692] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/19/2015] [Accepted: 11/09/2015] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Identification of altered pathways that are clinically relevant across human cancers is a key challenge in cancer genomics. Precise identification and understanding of these altered pathways may provide novel insights into patient stratification, therapeutic strategies and the development of new drugs. However, a challenge remains in accurately identifying pathways altered by somatic mutations across human cancers, due to the diverse mutation spectrum. We developed an innovative approach to integrate somatic mutation data with gene networks and pathways, in order to identify pathways altered by somatic mutations across cancers. RESULTS: We applied our approach to The Cancer Genome Atlas (TCGA) dataset of somatic mutations in 4790 cancer patients with 19 different types of tumors. Our analysis identified cancer-type-specific altered pathways enriched with known cancer-relevant genes and targets of currently available drugs. To investigate the clinical significance of these altered pathways, we performed consensus clustering for patient stratification using member genes in the altered pathways coupled with gene expression datasets from 4870 patients from TCGA, and multiple independent cohorts confirmed that the altered pathways could be used to stratify patients into subgroups with significantly different clinical outcomes. Of particular significance, certain patient subpopulations with poor prognosis were identified because they had specific altered pathways for which there are available targeted therapies. These findings could be used to tailor and intensify therapy in these patients, for whom current therapy is suboptimal. AVAILABILITY AND IMPLEMENTATION: The code is available at http://www.taehyunlab.org. CONTACT: jhcheong@yuhs.ac or taehyun.hwang@utsouthwestern.edu or taehyun.cs@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sunho Park
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Seung-Jun Kim
- Department of Computer Science and Electrical Engineering, University of Maryland at Baltimore County, Baltimore, MD, USA
- Donghyeon Yu
- Department of Statistics, Keimyung University, Daegu, South Korea
- Samuel Peña-Llopis
- Internal Medicine and Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jianjiong Gao
- Center for Molecular Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Jin Suk Park
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Beibei Chen
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Jessie Norris
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Xinlei Wang
- Department of Statistical Science, Southern Methodist University, Dallas, TX, USA
| | - Min Chen
- Department of Mathematical Sciences, University of Texas at Dallas, Dallas, TX, USA
| | - Minsoo Kim
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Jeongsik Yong
- Department of Biochemistry, Molecular Biology and Biophysics, Obstetrics, Gynecology & Women's Health, University of Minnesota Twin Cities, Minneapolis, MN, USA
| | - Zabi Wardak
- Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Kevin Choe
- Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Michael Story
- Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Timothy Starr
- Genetics, Cell Biology, University of Minnesota Twin Cities, Minneapolis, MN, USA, Masonic Cancer Center, University of Minnesota Twin Cities, Minneapolis, MN, USA
| | - Jae-Ho Cheong
- Department of Surgery, Yonsei University College of Medicine, Seoul, South Korea and Open NBI Convergence Technology Research Laboratory, Yonsei University College of Medicine, Seoul, South Korea
| | - Tae Hyun Hwang
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA, Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| |
|
11
|
Prioritizing Clinically Relevant Copy Number Variation from Genetic Interactions and Gene Function Data. PLoS One 2015; 10:e0139656. [PMID: 26437450 PMCID: PMC4593641 DOI: 10.1371/journal.pone.0139656] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 06/11/2014] [Accepted: 09/16/2015] [Indexed: 11/19/2022] Open
Abstract
It is becoming increasingly necessary to develop computerized methods for identifying the few disease-causing variants from hundreds discovered in each individual patient. This problem is especially relevant for Copy Number Variants (CNVs), which can be cheaply interrogated via low-cost hybridization arrays commonly used in clinical practice. We present a method to predict the disease relevance of CNVs that combines functional context and clinical phenotype to discover clinically harmful CNVs (and likely causative genes) in patients with a variety of phenotypes. We compare several feature and gene weighting systems for classifying both genes and CNVs. We combined the best performing methodologies and parameters on over 2,500 Agilent CGH 180k Microarray CNVs derived from 140 patients. Our method achieved an F-score of 91.59%, with 87.08% precision and 97.00% recall. Our methods are freely available at https://github.com/compbio-UofT/cnv-prioritization. Our dataset is included with the supplementary information.
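For readers comparing the reported metrics, the following minimal sketch shows how an F-score combines precision and recall as a weighted harmonic mean. The function name `f_score` and the `beta` parameter are illustrative assumptions, not taken from the paper's code. Plugging in the reported precision and recall gives roughly 0.918, close to but not identical to the reported 91.59%; one possible explanation (an assumption, since the abstract does not say) is that the published figure was averaged over cross-validation folds rather than computed from the pooled precision and recall.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    beta = 1 gives the usual F1 score. This is the textbook definition,
    not code from the cited paper.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is predicted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract above.
f1 = f_score(0.8708, 0.9700)
print(f"F1 from pooled precision/recall = {f1:.4f}")
```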
|
12
|
Ma C, Chen Y, Wilkins D, Chen X, Zhang J. An unsupervised learning approach to find ovarian cancer genes through integration of biological data. BMC Genomics 2015; 16 Suppl 9:S3. [PMID: 26328548 PMCID: PMC4547402 DOI: 10.1186/1471-2164-16-s9-s3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open
Abstract
Cancer is a disease characterized largely by the accumulation of out-of-control somatic mutations during the lifetime of a patient. Distinguishing driver mutations from passenger mutations has posed a challenge in modern cancer research. With the advanced development of microarray experiments and clinical studies, a large number of candidate cancer genes have been extracted, and distinguishing the informative genes among them is essential. We propose to find the informative genes for cancer by using mutation data from ovarian cancers in our framework. In our model, we use the patient gene mutation profile, gene expression data and a gene-gene interaction network to construct a graphical representation of genes and patients. Markov processes for mutations and patients are triggered separately. After this process, cancer genes are prioritized automatically by examining their scores at their stationary distributions in the eigenvector. Extensive experiments demonstrate that the integration of heterogeneous sources of information is essential in finding important cancer genes.
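The idea of ranking genes by their stationary-distribution scores can be illustrated with a small random walk on a toy graph. The sketch below is not the authors' model: the PageRank-style `restart` term, the function name `stationary_distribution` and the four-node star graph are all assumptions chosen only to show how power iteration converges to a stationary vector in which a well-connected node scores highest.

```python
def stationary_distribution(adj, restart=0.15, tol=1e-10, max_iter=1000):
    """Power iteration for a random walk with uniform restart.

    adj is a square list-of-lists of non-negative edge weights.
    The restart term (an assumption of this sketch) guarantees convergence
    even on periodic or disconnected graphs.
    """
    n = len(adj)
    # Row-normalise weights into transition probabilities;
    # a node with no edges jumps uniformly at random.
    trans = []
    for row in adj:
        total = sum(row)
        trans.append([w / total for w in row] if total > 0 else [1.0 / n] * n)

    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [
            restart / n
            + (1 - restart) * sum(p[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: gene 0 interacts with genes 1, 2 and 3.
toy_adj = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = stationary_distribution(toy_adj)
ranking = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
print("scores:", [round(s, 4) for s in scores])
print("top-ranked node:", ranking[0])
```

In a setting like the one described in the abstract, the edge weights would come from mutation profiles, expression data and interaction networks rather than a hand-written matrix.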
|
13
|
Khirade MF, Lal G, Bapat SA. Derivation of a fifteen gene prognostic panel for six cancers. Sci Rep 2015; 5:13248. [PMID: 26272668 PMCID: PMC4536526 DOI: 10.1038/srep13248] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 03/17/2015] [Accepted: 07/22/2015] [Indexed: 12/21/2022] Open
Abstract
The hallmarks of cancer deem biological pathways and molecules to be conserved. This approach may be useful for deriving a prognostic gene signature. Weighted Gene Co-expression Network Analysis of gene expression datasets in eleven cancer types identified modules of highly correlated genes and interactive networks conserved across glioblastoma, breast, ovary, colon, rectal and lung cancers, from which a universal classifier for tumor stratification was extracted. Specific conserved gene modules were validated across different microarray platforms and datasets. Strikingly, preserved genes within these modules defined regulatory networks associated with immune regulation, cell differentiation, metastases, cell migration, oncogenic transformation, and resistance to apoptosis and senescence, with AIF1 and PRRX1 suggested to be master regulators governing these biological processes. A universal classifier from these conserved networks enabled execution of a common set of principles across different cancers that revealed distinct, differential correlation of biological functions with patient survival in a cancer-specific manner. Correlation analysis further identified a panel of 15 risk genes with potential prognostic value, termed the GBOCRL-IIPr panel [(GBM-Breast-Ovary-Colon-Rectal-Lung)–Immune–Invasion–Prognosis], that, surprisingly, were not amongst the master regulators or important network hubs. This panel may now be integrated in predicting patient outcomes in the six cancers.
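The core step of weighted gene co-expression network analysis can be sketched briefly: pairwise correlations between gene expression profiles are raised to a soft-thresholding power so that strong co-expression is emphasised and weak correlation is suppressed. The code below is a minimal illustration of the unsigned adjacency |cor|^beta only; the function names, the toy expression values and the choice beta = 6 are assumptions of this sketch, and it omits the module detection and preservation analyses described in the abstract.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned co-expression adjacency: a_ij = |cor(x_i, x_j)| ** beta.

    expr maps a gene name to its expression values across samples.
    """
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Hypothetical expression profiles across five samples.
toy_expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.5],  # tracks g1 closely
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],   # only weakly related to g1
}
adjacency = soft_threshold_adjacency(toy_expr)
for pair, weight in sorted(adjacency.items()):
    print(pair, round(weight, 4))
```

Genes joined by high adjacency values would then be grouped into modules, typically by hierarchical clustering on a dissimilarity derived from this matrix.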
Affiliation(s)
- Mamata F Khirade
  - National Centre for Cell Science, NCCS Complex, Pune 411007, India
- Girdhari Lal
  - National Centre for Cell Science, NCCS Complex, Pune 411007, India
- Sharmila A Bapat
  - National Centre for Cell Science, NCCS Complex, Pune 411007, India
|
14
|
Magnus N, D'Asti E, Meehan B, Garnier D, Rak J. Oncogenes and the coagulation system--forces that modulate dormant and aggressive states in cancer. Thromb Res 2015; 133 Suppl 2:S1-9. [PMID: 24862126 DOI: 10.1016/s0049-3848(14)50001-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Indexed: 12/27/2022]
Abstract
Cancers arise and progress genetically amidst profound perturbations of the microenvironmental and systemic homeostasis. This includes the coagulation system, which is a part of the vascular milieu (niche) that remains under the control of molecular events occurring within the cancer cell genome. Thus, activation of several prototypic oncogenic pathways, such as RAS, EGFR, HER2, MET, SHH and loss of tumor suppressors (PTEN, TP53) alter the expression, activity and vesicular release of coagulation effectors, as exemplified by tissue factor (TF). The cancer-specific determinants of coagulopathy are also illustrated by the emerging link between the expression profiles of coagulation-related genes (coagulome) in glioblastoma multiforme (GBM), medulloblastoma (MB) and possibly other cancers and molecular subtypes of these respective tumors. The state of the coagulome is consequential for growth, metastasis and angiogenesis of established tumors, but could potentially also affect dormant cancer cells. For example, TF expression may trigger awakening of dormant glioma cells in mice in a manner involving recruitment of vascular and inflammatory cells, and resulting in lasting changes in the cancer cell genome and epigenome. Thus, coagulation system effectors could act as both targets and (indirect) inducers of genetic tumor progression, and a better understanding of this link may hold new diagnostic and therapeutic opportunities.
Affiliation(s)
- Nathalie Magnus
  - Montreal Children's Hospital, RI MUHC, McGill University, Montreal, Quebec, Canada
- Esterina D'Asti
  - Montreal Children's Hospital, RI MUHC, McGill University, Montreal, Quebec, Canada
- Brian Meehan
  - Montreal Children's Hospital, RI MUHC, McGill University, Montreal, Quebec, Canada
- Delphine Garnier
  - Montreal Children's Hospital, RI MUHC, McGill University, Montreal, Quebec, Canada
- Janusz Rak
  - Montreal Children's Hospital, RI MUHC, McGill University, Montreal, Quebec, Canada
|
15
|
Qin G, Zhao XM. A survey on computational approaches to identifying disease biomarkers based on molecular networks. J Theor Biol 2014; 362:9-16. [DOI: 10.1016/j.jtbi.2014.06.007] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 01/28/2014] [Revised: 06/03/2014] [Accepted: 06/04/2014] [Indexed: 11/29/2022]
|
16
|
Tieri P, Zhou X, Zhu L, Nardini C. Multi-omic landscape of rheumatoid arthritis: re-evaluation of drug adverse effects. Front Cell Dev Biol 2014; 2:59. [PMID: 25414848 PMCID: PMC4220167 DOI: 10.3389/fcell.2014.00059] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 07/01/2014] [Accepted: 09/26/2014] [Indexed: 12/19/2022] Open
Abstract
Objective: To provide a framework to estimate the systemic impact (side/adverse events) of (novel) therapeutic targets by taking into consideration drugs' potential on the numerous districts involved in rheumatoid arthritis (RA), from the inflammatory and immune response to the gut-intestinal (GI) microbiome. Methods: We curated the collection of molecules from high-throughput screens of diverse (multi-omic) biochemical origin, experimentally associated with RA. Starting from this collection, we generated RA-related protein-protein interaction (PPI) networks (interactomes) based on experimental PPI data. Pharmacological treatment simulation and topological and functional analyses were further run to gain insight into the proteins most affected by therapy and by multi-omic modeling. Results: Simulation of the administration of MTX results in the activation of expected (apoptosis) and adverse (nitrogenous metabolism alteration) effects. Growth factor receptor-bound protein 2 (GRB2) and Interleukin-1 Receptor Associated Kinase-4 (IRAK4, already an RA target) emerge as relevant nodes. The former controls the activation of inflammatory, proliferative and degenerative pathways in host and pathogens. The latter controls immune alterations and blocks innate response to pathogens. Conclusions: This multi-omic map properly recollects in a single analytical picture known, yet complex, information like the adverse/side effects of MTX, and provides a reliable platform for in silico hypothesis testing or recommendation on novel therapies. These results can support the development of RA translational research in the design of validation experiments and clinical trials. As such, we identify GRB2 as a robust potential new target for RA for its ability to control both synovial degeneracy and dysbiosis and, conversely, warn against the use of the recently promoted IRAK4 inhibitors, as this involves potential adverse effects in the form of impaired innate response to pathogens.
Affiliation(s)
- Paolo Tieri
  - IAC - Istituto per le Applicazioni del Calcolo "Mauro Picone," CNR - Consiglio Nazionale delle Ricerche Rome, Italy; Group of Clinical Genomic Networks, Key Laboratory of Computational Biology, Chinese Academy of Sciences - Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences Shanghai, China
- XiaoYuan Zhou
  - Group of Clinical Genomic Networks, Key Laboratory of Computational Biology, Chinese Academy of Sciences - Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences Shanghai, China
- Lisha Zhu
  - Group of Clinical Genomic Networks, Key Laboratory of Computational Biology, Chinese Academy of Sciences - Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences Shanghai, China
- Christine Nardini
  - Group of Clinical Genomic Networks, Key Laboratory of Computational Biology, Chinese Academy of Sciences - Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences Shanghai, China
|
17
|
Liu Y, Wang M, Feng H, Li A. Comprehensive study of tumour single nucleotide polymorphism array data reveals significant driver aberrations and disrupted signalling pathways in human hepatocellular cancer. IET Syst Biol 2014; 8:24-32. [PMID: 25014222 DOI: 10.1049/iet-syb.2013.0027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/08/2023] Open
Abstract
The authors describe an integrated method for analysing cancer driver aberrations and disrupted pathways by using tumour single nucleotide polymorphism (SNP) arrays. The authors' new method adopts a novel statistical model to explicitly quantify the SNP signals and thereby infer the genomic aberrations, including copy number alteration and loss of heterozygosity. Examination of the dilution series dataset shows that this method can correctly identify the genomic aberrations even in the presence of severe normal cell contamination in the tumour sample. Furthermore, with the results of the aberration identification obtained from multiple tumour samples, a permutation-based approach is proposed for identifying the statistically significant driver aberrations, which are further combined with the known signalling pathways for pathway enrichment analysis. By applying the approach to 286 hepatocellular tumour samples, they successfully uncover numerous driver aberration regions across the cancer genome, for example, chromosomes 4p and 5q, which harbour many known hepatocellular cancer-related genes such as alpha-fetoprotein (AFP) and ectodermal-neural cortex (ENC1). In addition, they identify nine disrupted pathways that are highly enriched by the driver aberrations, including the systemic lupus erythematosus pathway, the vascular endothelial growth factor (VEGF) signalling pathway and so on. These results support the feasibility and the utility of the proposed method for the characterisation of the cancer genome and the downstream analysis of the driver aberrations and the disrupted signalling pathways.
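A permutation test for recurrent aberrations can be sketched as follows. This is a generic illustration rather than the authors' statistical model: the null hypothesis used here (aberration positions shuffled independently within each sample), the function name `permutation_pvalues` and the toy call matrix are all assumptions. Each region's observed recurrence count is compared with the counts obtained under shuffling, yielding an empirical p-value.

```python
import random


def permutation_pvalues(calls, n_perm=2000, seed=0):
    """Empirical p-values for how often each region is aberrant across samples.

    calls[s][j] is 1 if sample s carries an aberration in region j, else 0.
    Null model (an assumption of this sketch): within each sample, the
    aberrations fall on regions uniformly at random.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n_regions = len(calls[0])
    observed = [sum(sample[j] for sample in calls) for j in range(n_regions)]
    exceed = [0] * n_regions
    for _ in range(n_perm):
        # Shuffle each sample's calls independently, preserving its burden.
        shuffled = [rng.sample(sample, len(sample)) for sample in calls]
        for j in range(n_regions):
            if sum(sample[j] for sample in shuffled) >= observed[j]:
                exceed[j] += 1
    # Add-one correction keeps p-values strictly positive.
    return [(1 + e) / (1 + n_perm) for e in exceed]


# Hypothetical calls: 8 samples x 6 regions; region 0 is aberrant in every sample.
toy_calls = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
]
pvals = permutation_pvalues(toy_calls)
for region, p in enumerate(pvals):
    print(f"region {region}: p = {p:.4f}")
```

With real data, the resulting p-values would still need multiple-testing correction before regions are declared significant.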
Affiliation(s)
- Yuanning Liu
  - School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, People's Republic of China
- Minghui Wang
  - Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH230027, People's Republic of China
- Huanqing Feng
  - School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, People's Republic of China
- Ao Li
  - Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH230027, People's Republic of China
|