1
Jung S, Wang S, Lee D. CancerGATE: Prediction of cancer-driver genes using graph attention autoencoders. Comput Biol Med 2024; 176:108568. [PMID: 38744009 DOI: 10.1016/j.compbiomed.2024.108568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 04/13/2024] [Accepted: 05/05/2024] [Indexed: 05/16/2024]
Abstract
Discovery of cancer type-specific driver genes is important for understanding the molecular mechanisms of each cancer type and for providing proper treatment. Recently, graph deep learning methods have become widely used for finding cancer-driver genes. However, previous methods had limited performance in individual cancer types because of the small number of cancer-driver genes used in training and biases toward those genes. Here, we introduce a novel pipeline, CancerGATE, that predicts cancer-driver genes using a graph attention autoencoder (GATE), which learns in a self-supervised manner and can be applied to each cancer type. CancerGATE utilizes biological network topology and multi-omics data from 20,079 samples across 15 cancer types from The Cancer Genome Atlas (TCGA). Attention coefficients calculated in the model are used to prioritize cancer-driver genes by comparing the coefficients of cancer and normal contexts. CancerGATE shows a higher AUPRC than previous graph deep learning models in each cancer type, with differences ranging from 1.5% to 36.5%. We also show that CancerGATE is free from the bias toward cancer-driver genes used in training, revealing mechanisms of the cancer-driver genes in specific cancer types. Finally, we propose novel cancer-driver gene candidates that could be therapeutic targets for specific cancer types.
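The abstract says genes are prioritized by comparing attention coefficients between cancer and normal contexts, but it does not say how the comparison is scored. The following is a minimal sketch under stated assumptions: attention coefficients are given per edge as plain dictionaries, and each gene is scored by the summed absolute difference over its edges. The function name, the input layout, and the scoring rule are all hypothetical and are not taken from CancerGATE itself.

```python
def rank_genes_by_attention_shift(cancer_attn, normal_attn):
    """Rank genes by how much their edge attention differs between contexts.

    cancer_attn / normal_attn: dict mapping (gene_a, gene_b) -> coefficient.
    Assumption: a missing edge counts as coefficient 0.0.
    """
    scores = {}
    for edge in set(cancer_attn) | set(normal_attn):
        diff = abs(cancer_attn.get(edge, 0.0) - normal_attn.get(edge, 0.0))
        # Credit the difference to each distinct gene on the edge.
        for gene in set(edge):
            scores[gene] = scores.get(gene, 0.0) + diff
    # Highest score first; ties broken alphabetically for determinism.
    ranking = sorted(scores, key=lambda g: (-scores[g], g))
    return ranking, scores


# Toy coefficients, invented purely for illustration.
cancer = {("TP53", "MDM2"): 0.9, ("MDM2", "AKT1"): 0.2}
normal = {("TP53", "MDM2"): 0.3, ("MDM2", "AKT1"): 0.2}
ranking, scores = rank_genes_by_attention_shift(cancer, normal)
print(ranking)
```

In this toy input only the TP53/MDM2 edge changes between contexts, so AKT1 ends up at the bottom of the ranking.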
Affiliation(s)
- Seunghwan Jung
  - Department of Bio and Brain Engineering, KAIST, Daejeon 34141, Republic of Korea.
- Seunghyun Wang
  - Department of Bio and Brain Engineering, KAIST, Daejeon 34141, Republic of Korea.
- Doheon Lee
  - Department of Bio and Brain Engineering, KAIST, Daejeon 34141, Republic of Korea.
2
Kańduła MM, Aldoshin AD, Singh S, Kolaczyk ED, Kreil D. ViLoN-a multi-layer network approach to data integration demonstrated for patient stratification. Nucleic Acids Res 2022; 51:e6. [PMID: 36395816 PMCID: PMC9841426 DOI: 10.1093/nar/gkac988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 10/11/2022] [Accepted: 11/02/2022] [Indexed: 11/19/2022]
Abstract
With more and more data being collected, modern network representations exploit the complementary nature of different data sources as well as similarities across patients. We here introduce the Variation of information fused Layers of Networks algorithm (ViLoN), a novel network-based approach for the integration of multiple molecular profiles. As a key innovation, it directly incorporates prior functional knowledge (KEGG, GO). In the constructed network of patients, patients are represented by networks of pathways, comprising genes that are linked by common functions and joint regulation in the disease. Patient stratification remains a key challenge both in the clinic and for research on disease mechanisms and treatments. We thus validated ViLoN for patient stratification on multiple data type combinations (gene expression, methylation, copy number), showing substantial improvements and consistently competitive performance for all. Notably, the incorporation of prior functional knowledge was critical for good results in the smaller cohorts (rectum adenocarcinoma: 90, esophageal carcinoma: 180), where alternative methods failed.
Affiliation(s)
- Maciej M Kańduła
  - Institute of Molecular Biotechnology, Boku University Vienna, Austria
  - Janssen Pharmaceutica NV, Beerse, Belgium
- Swati Singh
  - Institute of Molecular Biotechnology, Boku University Vienna, Austria
  - Department of Biological Sciences and Bioengineering, Indian Institute of Technology Kanpur, Kanpur, India
- Eric D Kolaczyk
  - Correspondence may also be addressed to Eric D. Kolaczyk. Tel: +1 514 398 3805
- David P Kreil
  - To whom correspondence should be addressed. Tel: +43 1 47654 79009
3
de Schaetzen van Brienen L, Miclotte G, Larmuseau M, Van den Eynden J, Marchal K. Network-Based Analysis to Identify Drivers of Metastatic Prostate Cancer Using GoNetic. Cancers (Basel) 2021; 13:5291. [PMID: 34771455 PMCID: PMC8582433 DOI: 10.3390/cancers13215291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2021] [Revised: 10/19/2021] [Accepted: 10/19/2021] [Indexed: 11/16/2022]
Abstract
Most known driver genes of metastatic prostate cancer are frequently mutated. To dig into the long tail of rarely mutated drivers, we performed network-based driver identification on the Hartwig Medical Foundation metastatic prostate cancer data set (HMF cohort). To this end, we developed GoNetic, a method based on probabilistic pathfinding, to identify recurrently mutated subnetworks. In contrast to most state-of-the-art network-based methods, GoNetic can leverage sample-specific mutational information and the weights of the underlying prior network. When applied to the HMF cohort, GoNetic successfully recovered known primary and metastatic drivers of prostate cancer that are frequently mutated in the HMF cohort (TP53, RB1, and CTNNB1). In addition, the identified subnetworks contain frequently mutated genes, reflect processes related to metastatic prostate cancer, and contain rarely mutated driver candidates. To further validate these rarely mutated genes, we used an independent cohort to assess whether the identified genes were mutated more often in metastatic than in primary samples. We then evaluated their association with tumor evolution and with the lymph node status of the patients. This led us to put forward several novel putative driver genes for metastatic prostate cancer, some of which might be prognostic for disease evolution.
Affiliation(s)
- Louise de Schaetzen van Brienen
  - Department of Plant Biotechnology and Bioinformatics, Faculty of Sciences, Ghent University, 9052 Ghent, Belgium
  - Department of Information Technology, Faculty of Engineering and Architecture, Ghent University-IMEC, 9052 Ghent, Belgium
- Giles Miclotte
  - Department of Plant Biotechnology and Bioinformatics, Faculty of Sciences, Ghent University, 9052 Ghent, Belgium
  - Department of Information Technology, Faculty of Engineering and Architecture, Ghent University-IMEC, 9052 Ghent, Belgium
- Maarten Larmuseau
  - Department of Plant Biotechnology and Bioinformatics, Faculty of Sciences, Ghent University, 9052 Ghent, Belgium
  - Department of Information Technology, Faculty of Engineering and Architecture, Ghent University-IMEC, 9052 Ghent, Belgium
- Jimmy Van den Eynden
  - Department of Human Structure and Repair, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
- Kathleen Marchal
  - Department of Plant Biotechnology and Bioinformatics, Faculty of Sciences, Ghent University, 9052 Ghent, Belgium
  - Department of Information Technology, Faculty of Engineering and Architecture, Ghent University-IMEC, 9052 Ghent, Belgium
4
Zhao Y, Shin DG. Deep Pathway Analysis V2.0: A Pathway Analysis Framework Incorporating Multi-Dimensional Omics Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:373-385. [PMID: 31603796 DOI: 10.1109/tcbb.2019.2945959] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Pathway analysis is essential in cancer research, particularly when scientists attempt to interpret genome-wide high-throughput experimental data. If pathway information is organized into a network topology, its use in interpreting omics data can become very powerful. In this paper, we propose a topology-based pathway analysis method, called DPA V2.0, which can combine multiple heterogeneous omics data types in its analysis. In this method, each pathway route is encoded as a Bayesian network which is initialized with a sequence of conditional probabilities specifically designed to encode the directionality of the regulatory relationships defined in the pathway. Unlike other topology-based pathway tools, DPA is capable of identifying pathway routes as representatives of perturbed regulatory signals. We demonstrate the effectiveness of our model by applying it to two well-established TCGA data sets, namely, the breast cancer study (BRCA) and the ovarian cancer study (OV). The analysis combines mRNA-seq, mutation, copy number variation, and phosphorylation data publicly available for both TCGA data sets. We performed survival analysis and patient subtype analysis, and the analysis outcomes revealed the anticipated strengths of our model. We hope that the availability of our model encourages wet lab scientists to generate extra data sets to reap the benefits of using multiple data types in pathway analysis. The majority of the pathways distinguished can be confirmed by the biological literature. Moreover, the proportion of correctly identified pathways is 10 percent higher than in previous work where only mRNA-seq and mutation data are incorporated for breast cancer patients. Consequently, such an in-depth pathway analysis incorporating more diverse data can improve the accuracy of perturbed pathway detection.
5
Rau A, Manansala R, Flister MJ, Rui H, Jaffrézic F, Laloë D, Auer PL. Individualized multi-omic pathway deviation scores using multiple factor analysis. Biostatistics 2020; 23:362-379. [PMID: 32766691 PMCID: PMC9074877 DOI: 10.1093/biostatistics/kxaa029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/05/2019] [Revised: 05/30/2020] [Accepted: 06/28/2020] [Indexed: 01/22/2023]
Abstract
Malignant progression of normal tissue is typically driven by complex networks of somatic changes, including genetic mutations, copy number aberrations, epigenetic changes, and transcriptional reprogramming. To delineate aberrant multi-omic tumor features that correlate with clinical outcomes, we present a novel pathway-centric tool based on the multiple factor analysis framework called padma. Using a multi-omic consensus representation, padma quantifies and characterizes individualized pathway-specific multi-omic deviations and their underlying drivers, with respect to the sampled population. We demonstrate the utility of padma to correlate patient outcomes with complex genetic, epigenetic, and transcriptomic perturbations in clinically actionable pathways in breast and lung cancer.
Affiliation(s)
- Andrea Rau
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Regina Manansala
  - Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI 53201, USA
- Michael J Flister
  - Department of Pathology, Medical College of Wisconsin, Milwaukee, WI 53226, USA, Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA, and Genomic Sciences and Precision Medicine Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Hallgeir Rui
  - Department of Pathology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Florence Jaffrézic
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Denis Laloë
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Paul L Auer
  - Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI 53201, USA
6
Chierici M, Bussola N, Marcolini A, Francescatto M, Zandonà A, Trastulla L, Agostinelli C, Jurman G, Furlanello C. Integrative Network Fusion: A Multi-Omics Approach in Molecular Profiling. Front Oncol 2020; 10:1065. [PMID: 32714870 PMCID: PMC7340129 DOI: 10.3389/fonc.2020.01065] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 03/31/2020] [Accepted: 05/28/2020] [Indexed: 12/20/2022]
Abstract
Recent technological advances and international efforts, such as The Cancer Genome Atlas (TCGA), have made available several pan-cancer datasets encompassing multiple omics layers with detailed clinical information in large collections of samples. The need has thus arisen for the development of computational methods aimed at improving cancer subtyping and biomarker identification from multi-modal data. Here we apply the Integrative Network Fusion (INF) pipeline, which combines multiple omics layers exploiting Similarity Network Fusion (SNF) within a machine learning predictive framework. INF includes a feature ranking scheme (rSNF) on SNF-integrated features, used by a classifier over juxtaposed multi-omics features (juXT). In particular, we show instances of INF implementing Random Forest (RF) and linear Support Vector Machine (LSVM) as the classifier, and two baseline RF and LSVM models are also trained on juXT. A compact RF model, called rSNFi, trained on the intersection of top-ranked biomarkers from the two approaches juXT and rSNF is finally derived. All the classifiers are run in a 10×5-fold cross-validation schema to warrant reproducibility, following the guidelines for an unbiased Data Analysis Plan by the US FDA-led initiatives MAQC/SEQC. INF is demonstrated on four classification tasks on three multi-modal TCGA oncogenomics datasets. Gene expression, protein expression and copy number variants are used to predict estrogen receptor status (BRCA-ER, N = 381) and breast invasive carcinoma subtypes (BRCA-subtypes, N = 305), while gene expression, miRNA expression and methylation data are used as predictor layers for acute myeloid leukemia and renal clear cell carcinoma survival (AML-OS, N = 157; KIRC-OS, N = 181). In test, INF achieved similar Matthews Correlation Coefficient (MCC) values and 97% to 83% smaller feature sizes (FS), compared with juXT for BRCA-ER (MCC: 0.83 vs. 0.80; FS: 56 vs. 1801) and BRCA-subtypes (0.84 vs. 0.80; 302 vs. 1801), improving KIRC-OS performance (0.38 vs. 0.31; 111 vs. 2319). INF predictions are generally more accurate in test than one-dimensional omics models, with smaller signatures too, where transcriptomics consistently plays the leading role. Overall, the INF framework effectively integrates multiple data levels in oncogenomics classification tasks, improving over the performance of single layers alone and naive juxtaposition, and provides compact signature sizes.
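The two quantities the abstract reports, MCC and the relative reduction in feature-set size, can be reproduced with a few lines of arithmetic. The sketch below uses the standard binary-classification form of MCC (a multi-class task such as BRCA-subtypes would need the generalized form) and checks the "97% to 83% smaller" statement against the feature sizes quoted in the abstract. The function names are illustrative and are not part of the INF code base.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional value when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def feature_reduction_percent(compact_size, baseline_size):
    """Percentage by which the compact signature is smaller than the baseline."""
    return 100.0 * (1.0 - compact_size / baseline_size)


# Feature sizes quoted in the abstract (INF vs. juXT).
print(round(feature_reduction_percent(56, 1801)))   # BRCA-ER
print(round(feature_reduction_percent(302, 1801)))  # BRCA-subtypes
```

The two printed values correspond to the 97% and 83% reductions stated for BRCA-ER and BRCA-subtypes.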
Affiliation(s)
- Nicole Bussola
  - Fondazione Bruno Kessler, Trento, Italy
  - University of Trento, Trento, Italy
- Margherita Francescatto
  - Fondazione Bruno Kessler, Trento, Italy
  - Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
7
Paczkowska M, Barenboim J, Sintupisut N, Fox NS, Zhu H, Abd-Rabbo D, Mee MW, Boutros PC, Reimand J. Integrative pathway enrichment analysis of multivariate omics data. Nat Commun 2020; 11:735. [PMID: 32024846 PMCID: PMC7002665 DOI: 10.1038/s41467-019-13983-9] [Citation(s) in RCA: 90] [Impact Index Per Article: 22.5] [Received: 12/17/2018] [Accepted: 12/11/2019] [Indexed: 12/14/2022]
Abstract
Multi-omics datasets represent distinct aspects of the central dogma of molecular biology. Such high-dimensional molecular profiles pose challenges to data interpretation and hypothesis generation. ActivePathways is an integrative method that discovers significantly enriched pathways across multiple datasets using statistical data fusion, rationalizes contributing evidence and highlights associated genes. As part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we integrated genes with coding and non-coding mutations and revealed frequently mutated pathways and additional cancer genes with infrequent mutations. We also analyzed prognostic molecular pathways by integrating genomic and transcriptomic features of 1780 breast cancers and highlighted associations with immune response and anti-apoptotic signaling. Integration of ChIP-seq and RNA-seq data for master regulators of the Hippo pathway across normal human tissues identified processes of tissue regeneration and stem cell regulation. ActivePathways is a versatile method that improves systems-level understanding of cellular organization in health and disease through integration of multiple molecular datasets and pathway annotations.
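The abstract describes "statistical data fusion" of evidence across datasets without naming the test. As a stand-in, the sketch below merges one gene's per-dataset P-values with Fisher's combined probability test, which assumes the P-values are independent. This is a substituted textbook technique chosen for illustration, not a statement of what ActivePathways implements, and the function name is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent P-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom; for even degrees of freedom the survival
    function has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one P-value")
    # Clamp to avoid log(0) for P-values reported as exactly zero.
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# One gene with modest evidence in two datasets (toy numbers).
print(fisher_combined_pvalue([0.05, 0.05]))
```

Two individually borderline P-values combine into a smaller merged P-value, which is the effect that lets a fusion approach surface genes supported weakly but consistently across datasets.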
Affiliation(s)
- Marta Paczkowska
  - Computational Biology Program, Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, ON, M5G 0A3, Canada
- Jonathan Barenboim
  - Computational Biology Program, Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, ON, M5G 0A3, Canada
- Nardnisa Sintupisut
  - Computational Biology Program, Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, ON, M5G 0A3, Canada
- Natalie S Fox
  - Computational Biology Program, Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, ON, M5G 0A3, Canada
  - Department of Medical Biophysics, University of Toronto, 101 College Street Suite 15-701, Toronto, ON, M5G 1L7, Canada
- Helen Zhu
  - Computational Biology Program, Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, ON, M5G 0A3, Canada
  - Department of Medical Biophysics, University of Toronto, 101 College Street Suite 15-701, Toronto, ON, M5G 1L7, Canada
- Diala Abd-Rabbo
  - Computational Biology Program, Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, ON, M5G 0A3, Canada
- Miles W Mee
  - Computational Biology Program, Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, ON, M5G 0A3, Canada
- Paul C Boutros
  - Computational Biology Program, Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, ON, M5G 0A3, Canada
  - Department of Medical Biophysics, University of Toronto, 101 College Street Suite 15-701, Toronto, ON, M5G 1L7, Canada
  - Department of Pharmacology & Toxicology, University of Toronto, 1 King's College Circle Room 4207, Toronto, ON, M5S 1A8, Canada
  - Department of Human Genetics, University of California Los Angeles, 10833 Le Conte Avenue, Los Angeles, CA, 90095, USA
  - Department of Urology, University of California Los Angeles, 200 Medical Plaza Driveway #140, Los Angeles, CA, 90024, USA
  - Institute of Precision Health, University of California Los Angeles, 10833 Le Conte Avenue, Los Angeles, CA, 90024, USA
  - Broad Stem Cell Research Centre, University of California Los Angeles, 615 Charles E Young Drive S, Los Angeles, CA, 90095, USA
  - Jonsson Comprehensive Cancer Centre, University of California Los Angeles, 10833 Le Conte Avenue, Los Angeles, CA, 90024, USA
- Jüri Reimand
  - Computational Biology Program, Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, ON, M5G 0A3, Canada.
  - Department of Medical Biophysics, University of Toronto, 101 College Street Suite 15-701, Toronto, ON, M5G 1L7, Canada.
8
Reyna MA, Haan D, Paczkowska M, Verbeke LPC, Vazquez M, Kahraman A, Pulido-Tamayo S, Barenboim J, Wadi L, Dhingra P, Shrestha R, Getz G, Lawrence MS, Pedersen JS, Rubin MA, Wheeler DA, Brunak S, Izarzugaza JMG, Khurana E, Marchal K, von Mering C, Sahinalp SC, Valencia A, Reimand J, Stuart JM, Raphael BJ. Pathway and network analysis of more than 2500 whole cancer genomes. Nat Commun 2020; 11:729. [PMID: 32024854 PMCID: PMC7002574 DOI: 10.1038/s41467-020-14367-0] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 12/12/2018] [Accepted: 12/18/2019] [Indexed: 12/14/2022]
Abstract
The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well-characterized and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we perform multi-faceted pathway and network analyses of non-coding mutations across 2583 whole cancer genomes from 27 tumor types compiled by the ICGC/TCGA PCAWG project. This work was motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes. While few non-coding genomic elements are recurrently mutated in this cohort, we identify 93 genes harboring non-coding mutations that cluster into several modules of interacting proteins. Among these are promoter mutations associated with reduced mRNA expression in TP53, TLE4, and TCF4. We find that biological processes had variable proportions of coding and non-coding mutations, with chromatin remodeling and proliferation pathways altered primarily by coding mutations, while developmental pathways, including Wnt and Notch, were altered by both coding and non-coding mutations. RNA splicing is primarily altered by non-coding mutations in this cohort, and samples containing non-coding mutations in well-known RNA splicing factors exhibit gene expression signatures similar to those of samples with coding mutations in these genes. These analyses contribute a new repertoire of possible cancer genes and mechanisms that are altered by non-coding mutations and offer insights into additional cancer vulnerabilities that can be investigated for potential therapeutic treatments.
Affiliation(s)
- Matthew A Reyna
  - Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA
  - Department of Biomedical Informatics, Emory University, Atlanta, GA, 30322, USA
- David Haan
  - Department of Biomolecular Engineering and UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95060, USA
- Marta Paczkowska
  - Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Lieven P C Verbeke
  - Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Miguel Vazquez
  - Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
  - Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Abdullah Kahraman
  - Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, CH-8057, Zurich, Switzerland
  - Department of Pathology and Molecular Pathology, University Hospital Zurich, CH-8091, Zurich, Switzerland
- Sergio Pulido-Tamayo
  - Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Jonathan Barenboim
  - Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Lina Wadi
  - Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Priyanka Dhingra
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
- Raunak Shrestha
  - Vancouver Prostate Centre, 2660 Oak Street, Vancouver, BC, V6H 3Z6, Canada
- Gad Getz
  - The Broad Institute of MIT and Harvard, Cambridge, MA, 02124, USA
  - Massachusetts General Hospital Center for Cancer Research, Charlestown, MA, 02129, USA
  - Harvard Medical School, 250 Longwood Avenue, Boston, MA, 02115, USA
  - Massachusetts General Hospital, Department of Pathology, Boston, MA, 02114, USA
- Michael S Lawrence
  - The Broad Institute of MIT and Harvard, Cambridge, MA, 02124, USA
  - Massachusetts General Hospital Center for Cancer Research, Charlestown, MA, 02129, USA
- Jakob Skou Pedersen
  - Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus, Denmark
  - Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, Denmark
- Mark A Rubin
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
- David A Wheeler
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Søren Brunak
  - DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800, Kongens Lyngby, Denmark
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200, Copenhagen, Denmark
- Jose M G Izarzugaza
  - DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800, Kongens Lyngby, Denmark
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200, Copenhagen, Denmark
- Ekta Khurana
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
- Kathleen Marchal
  - Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Christian von Mering
  - Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, CH-8057, Zurich, Switzerland
- S Cenk Sahinalp
  - Vancouver Prostate Centre, 2660 Oak Street, Vancouver, BC, V6H 3Z6, Canada
  - Department of Computer Science, Indiana University, Bloomington, IN, 47405, USA
- Alfonso Valencia
  - Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
  - ICREA, Barcelona, 08010, Spain
- Jüri Reimand
  - Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada.
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
- Joshua M Stuart
  - Department of Biomolecular Engineering and UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95060, USA.
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA.
9
Perez-Romero CA, Weytjens B, Decap D, Swings T, Michiels J, De Maeyer D, Marchal K. IAMBEE: a web-service for the identification of adaptive pathways from parallel evolved clonal populations. Nucleic Acids Res 2019; 47:W151-W157. [PMID: 31127271 PMCID: PMC6602435 DOI: 10.1093/nar/gkz451] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/11/2019] [Revised: 05/02/2019] [Accepted: 05/10/2019] [Indexed: 11/18/2022]
Abstract
IAMBEE is a web server designed for the Identification of Adaptive Mutations in Bacterial Evolution Experiments. Input data consist of genotype information obtained from independently evolved clonal populations or strains that show the same adapted behavior (phenotype). To distinguish adaptive from passenger mutations, IAMBEE searches for neighborhoods in an organism-specific interaction network that are recurrently mutated in the adapted populations. This search for recurrently mutated network neighborhoods, as proxies for pathways, is driven by additional information on the functional impact of the observed genetic changes and their dynamics during adaptive evolution. In addition, the search explicitly accounts for the differences in mutation rate between the independently evolved populations. Using this approach, IAMBEE exploits parallel evolution to identify adaptive pathways. The web server is freely available at http://bioinformatics.intec.ugent.be/iambee/ with no login requirement.
Affiliation(s)
- Camilo Andres Perez-Romero
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
- Bram Weytjens
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
- Dries Decap
  - Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
- Toon Swings
  - VIB Center for Microbiology, Flanders Institute for Biotechnology, Leuven, Belgium
  - Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium
  - VIB Technology Watch, Flanders Institute for Biotechnology, Ghent, Belgium
- Jan Michiels
  - VIB Center for Microbiology, Flanders Institute for Biotechnology, Leuven, Belgium
  - Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium
- Dries De Maeyer
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
- Kathleen Marchal
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Department of Information Technology, IDLab, Ghent University, IMEC, Ghent, Belgium
10
Larmuseau M, Verbeke LPC, Marchal K. Associating expression and genomic data using co-occurrence measures. Biol Direct 2019; 14:10. [PMID: 31072345 PMCID: PMC6507230 DOI: 10.1186/s13062-019-0240-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/16/2018] [Accepted: 04/10/2019] [Indexed: 12/11/2022]
Abstract
Recent technological evolutions have led to an exponential increase in data in all the omics fields. It is expected that integration of these different data sources will drastically enhance our knowledge of the biological mechanisms behind genomic diseases such as cancer. However, the integration of different omics data remains a challenge. In this work we propose an intuitive workflow for the integrative analysis of expression, mutation and copy number data taken from the METABRIC study on breast cancer. First, we present evidence that the expression profile of many important breast cancer genes consists of two modes or 'regimes', which contain important clinical information. Then, we show how the co-occurrence of these expression regimes can be used as an association measure between genes and validate our findings on the TCGA-BRCA study. Finally, we demonstrate how these co-occurrence measures can also be applied to link expression regimes to genomic aberrations, providing a more complete, integrative view on breast cancer. As a case study, an integrative analysis of the identified MLPH-FOXA1 association is performed, illustrating that the obtained expression associations are intimately linked to the underlying genomic changes. Reviewers: This article was reviewed by Dirk Walther, Francisco Garcia and Isabel Nepomuceno.
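The idea of using regime co-occurrence as an association measure can be illustrated with a small example. The abstract does not define the measure, so the sketch below makes two labeled assumptions: samples are assigned to a "high" or "low" regime by a fixed threshold, and the association between two genes is scored by a one-sided hypergeometric tail probability on how many samples are in the high regime for both. Both choices, and the function names, are hypothetical.

```python
from math import comb


def assign_regimes(values, threshold):
    """Label each sample True (high regime) or False (low regime)."""
    return [v > threshold for v in values]


def cooccurrence_pvalue(high_a, high_b):
    """Count joint high-regime samples and score them against chance.

    Returns (overlap, p) where p is the hypergeometric probability of
    seeing at least `overlap` joint samples given the marginal counts.
    """
    n = len(high_a)
    k_a, k_b = sum(high_a), sum(high_b)
    overlap = sum(a and b for a, b in zip(high_a, high_b))
    tail = sum(
        comb(k_a, x) * comb(n - k_a, k_b - x)
        for x in range(overlap, min(k_a, k_b) + 1)
    )
    return overlap, tail / comb(n, k_b)


# Toy expression values for two genes across six samples.
gene_a = assign_regimes([9.1, 8.7, 9.4, 2.0, 1.5, 2.2], threshold=5.0)
gene_b = assign_regimes([7.9, 8.8, 8.1, 1.1, 0.9, 1.4], threshold=5.0)
print(cooccurrence_pvalue(gene_a, gene_b))
```

The same counting scheme extends to pairing an expression regime with a binary genomic event such as a mutation or copy number change, which matches the linking step the abstract describes.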
Affiliation(s)
- Maarten Larmuseau
- Department of Information Technology, Ghent University - Imec, Technologiepark-Zwijnaarde 126, 9052, Ghent, Belgium
- Lieven P C Verbeke
- Department of Plant Biotechnology and Bioinformatics, Ghent University - Imec, Technologiepark-Zwijnaarde 126, 9052, Ghent, Belgium
- Kathleen Marchal
- Department of Plant Biotechnology and Bioinformatics, Ghent University - Imec, Technologiepark-Zwijnaarde 126, 9052, Ghent, Belgium.
|
11
|
Paczkowska M, Barenboim J, Sintupisut N, Fox NC, Zhu H, Abd-rabbo D, Boutros PC, Reimand J, PCAWG Network and Pathway Analysis Group. Integrative pathway enrichment analysis of multivariate omics data. [DOI: 10.1101/399113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Multi-omics datasets quantify complementary aspects of molecular biology and thus pose challenges to data interpretation and hypothesis generation. ActivePathways is an integrative method that discovers significantly enriched pathways across multiple omics datasets using a statistical data fusion approach, rationalizes contributing evidence and highlights associated genes. We demonstrate its utility by analyzing coding and non-coding mutations from 2,583 whole cancer genomes, revealing frequently mutated hallmark pathways and a long tail of known and putative cancer driver genes. We also studied prognostic molecular pathways in breast cancer subtypes by integrating genomic and transcriptomic features of tumors and tumor-adjacent cells and found significant associations with immune response processes and anti-apoptotic signaling pathways. ActivePathways is a versatile method that improves systems-level understanding of cellular organization in health and disease through integration of multiple molecular datasets and pathway annotations.
|
12
|
Zhao Y, Hoang TH, Joshi P, Hong SH, Giardina C, Shin DG. A route-based pathway analysis framework integrating mutation information and gene expression data. Methods 2017. [PMID: 28647608 DOI: 10.1016/j.ymeth.2017.06.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/01/2023]
Abstract
We propose a new way of analyzing biological pathways in which the analysis combines both transcriptome data and mutation information and uses the outcome to identify "routes" of aberrant pathways potentially responsible for the etiology of disease. Each pathway route is encoded as a Bayesian Network which is initialized with a sequence of conditional probabilities designed to encode the directionality of the regulatory relationships in the pathways, i.e. activation and inhibition relationships. First, we demonstrate the effectiveness of our model through simulation in which the model was able to easily separate Test samples from Control samples using fictitiously perturbed pathway routes. Second, we apply our model to analyze the Breast Cancer data set, available from TCGA, against many cancer pathways available from KEGG and rank the significance of identified pathways. The outcome is consistent with what has already been reported in the literature. Third, survival analysis has been carried out on the same data set by using pathway routes as features. Overall, we envision that our model of using pathway routes for analysis can further refine the conventional ways of subtyping cancer patients as it can discover additional characteristics specific to an individual's tumor.
Affiliation(s)
- Yue Zhao
- Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, CT 06269, United States.
- Tham H Hoang
- Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, CT 06269, United States
- Pujan Joshi
- Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, CT 06269, United States
- Seung-Hyun Hong
- Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, CT 06269, United States
- Charles Giardina
- Department of Molecular and Cell Biology, University of Connecticut, 91 North Eagleville Road, Unit 3125, Storrs, CT 06269, United States
- Dong-Guk Shin
- Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, CT 06269, United States
|
13
|
Network-based integration of systems genetics data reveals pathways associated with lignocellulosic biomass accumulation and processing. Proc Natl Acad Sci U S A 2017; 114:1195-1200. [PMID: 28096391 DOI: 10.1073/pnas.1620119114] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Indexed: 12/12/2022]
Abstract
As a consequence of their remarkable adaptability, fast growth, and superior wood properties, eucalypt tree plantations have emerged as key renewable feedstocks (over 20 million ha globally) for the production of pulp, paper, bioenergy, and other lignocellulosic products. However, most biomass properties such as growth, wood density, and wood chemistry are complex traits that are hard to improve in long-lived perennials. Systems genetics, a process of harnessing multiple levels of component trait information (e.g., transcript, protein, and metabolite variation) in populations that vary in complex traits, has proven effective for dissecting the genetics and biology of such traits. We have applied a network-based data integration (NBDI) method for a systems-level analysis of genes, processes and pathways underlying biomass and bioenergy-related traits using a segregating Eucalyptus hybrid population. We show that the integrative approach can link biologically meaningful sets of genes to complex traits and at the same time reveal the molecular basis of trait variation. Gene sets identified for related woody biomass traits were found to share regulatory loci, cluster in network neighborhoods, and exhibit enrichment for molecular functions such as xylan metabolism and cell wall development. These findings offer a framework for identifying the molecular underpinnings of complex biomass and bioprocessing-related traits. A more thorough understanding of the molecular basis of plant biomass traits should provide additional opportunities for the establishment of a sustainable bio-based economy.
|
14
|
Niroula A, Vihinen M. Variation Interpretation Predictors: Principles, Types, Performance, and Choice. Hum Mutat 2016; 37:579-97. [DOI: 10.1002/humu.22987] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Received: 11/16/2015] [Accepted: 03/07/2016] [Indexed: 12/18/2022]
Affiliation(s)
- Abhishek Niroula
- Department of Experimental Medical Science; Lund University; BMC B13 Lund SE-22184 Sweden
- Mauno Vihinen
- Department of Experimental Medical Science; Lund University; BMC B13 Lund SE-22184 Sweden
|
15
|
Mizrachi E, Myburg AA. Systems genetics of wood formation. Curr Opin Plant Biol 2016; 30:94-100. [PMID: 26943939 DOI: 10.1016/j.pbi.2016.02.007] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 12/15/2015] [Revised: 02/12/2016] [Accepted: 02/15/2016] [Indexed: 05/02/2023]
Abstract
In woody plants, xylogenesis is an exceptionally strong carbon sink requiring robust transcriptional control and dynamic coordination of cellular and metabolic processes directing carbon allocation and partitioning into secondary cell wall biosynthesis. As a biological process, wood formation is an excellent candidate for systems modeling due to the strong correlation patterns and interconnectedness observed for transcriptional and metabolic component traits contributing to complex phenotypes such as cell wall chemistry and ultrastructure. Genetic variation in undomesticated tree populations provides abundant perturbation of systems components, adding another dimension to plant systems biology (besides spatial and temporal variation). High-throughput analysis of molecular component traits in adult trees has provided the first insights into the systems genetics of wood, an important renewable feedstock for biomaterials and bioenergy.
Affiliation(s)
- Eshchar Mizrachi
- Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute (GRI), University of Pretoria, Private Bag X20, Pretoria 0028, South Africa.
- Alexander A Myburg
- Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute (GRI), University of Pretoria, Private Bag X20, Pretoria 0028, South Africa.
|
16
|
De Maeyer D, Weytjens B, De Raedt L, Marchal K. Network-Based Analysis of eQTL Data to Prioritize Driver Mutations. Genome Biol Evol 2016; 8:481-94. [PMID: 26802430 PMCID: PMC4825419 DOI: 10.1093/gbe/evw010] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/23/2022]
Abstract
In clonal systems, interpreting driver genes in terms of molecular networks helps in understanding how these drivers elicit an adaptive phenotype. Obtaining such a network-based understanding depends on the correct identification of driver genes. In clonal systems, independently evolved lines can acquire a similar adaptive phenotype by affecting the same molecular pathways, a phenomenon referred to as parallelism at the molecular pathway level. This implies that successful driver identification depends on interpreting mutated genes in terms of molecular networks. Driver identification and obtaining a network-based understanding of the adaptive phenotype are thus confounded problems that ideally should be solved simultaneously. In this study, a network-based eQTL method is presented that solves both the driver identification and the network-based interpretation problem. As input the method uses coupled genotype-expression phenotype data (eQTL data) of independently evolved lines with similar adaptive phenotypes and an organism-specific genome-wide interaction network. The search for mutational consistency at pathway level is defined as a subnetwork inference problem, which consists of inferring a subnetwork from the genome-wide interaction network that best connects the genes containing mutations to differentially expressed genes. Based on their connectivity with the differentially expressed genes, mutated genes are prioritized as driver genes. Based on semisynthetic data and two publicly available data sets, we illustrate the potential of the network-based eQTL method to prioritize driver genes and to gain insights into the molecular mechanisms underlying an adaptive phenotype. The method is available at http://bioinformatics.intec.ugent.be/phenetic_eqtl/index.html
Affiliation(s)
- Dries De Maeyer
- Department of Information Technology (INTEC, iMINDS), UGent, 9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Gent, Belgium; Bioinformatics Institute Ghent, Technologiepark 927, 9052 Ghent, Belgium; Department of Microbial and Molecular Systems, KU Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium
- Bram Weytjens
- Department of Information Technology (INTEC, iMINDS), UGent, 9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Gent, Belgium; Bioinformatics Institute Ghent, Technologiepark 927, 9052 Ghent, Belgium; Department of Microbial and Molecular Systems, KU Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium
- Luc De Raedt
- Department of Computer Science, KU Leuven, Celestijnenlaan 200A, B-3001 Leuven, Belgium
- Kathleen Marchal
- Department of Information Technology (INTEC, iMINDS), UGent, 9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Gent, Belgium; Bioinformatics Institute Ghent, Technologiepark 927, 9052 Ghent, Belgium; Department of Genetics, University of Pretoria, Hatfield Campus, Pretoria 0028, South Africa; Department of Microbial and Molecular Systems, KU Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium
|