1. Shuaibi A, Chitra U, Raphael BJ. A latent variable model for evaluating mutual exclusivity and co-occurrence between driver mutations in cancer. bioRxiv 2024:2024.04.24.590995. [PMID: 38712136 PMCID: PMC11071465 DOI: 10.1101/2024.04.24.590995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
A key challenge in cancer genomics is understanding the functional relationships and dependencies between combinations of somatic mutations that drive cancer development. Such driver mutations frequently exhibit patterns of mutual exclusivity or co-occurrence across tumors, and many methods have been developed to identify such dependency patterns from bulk DNA sequencing data of a cohort of patients. However, while mutual exclusivity and co-occurrence are described as properties of driver mutations, existing methods do not explicitly disentangle functional, driver mutations from neutral, passenger mutations. In particular, nearly all existing methods evaluate mutual exclusivity or co-occurrence at the gene level, marking a gene as mutated if any mutation (driver or passenger) is present. Since some genes have a large number of passenger mutations, existing methods either restrict their analyses to a small subset of suspected driver genes, which limits their ability to identify novel dependencies, or make spurious inferences of mutual exclusivity and co-occurrence involving genes with many passenger mutations. We introduce DIALECT, an algorithm to identify dependencies between pairs of driver mutations from somatic mutation counts. We derive a latent variable mixture model for drivers and passengers that combines existing probabilistic models of passenger mutation rates with a latent variable describing the unknown status of a mutation as a driver or passenger. We use an expectation maximization (EM) algorithm to estimate the parameters of our model, including the rates of mutual exclusivity and co-occurrence between drivers. We demonstrate that DIALECT more accurately infers mutual exclusivity and co-occurrence between driver mutations compared to existing methods on both simulated mutation data and somatic mutation data from 5 cancer types in The Cancer Genome Atlas (TCGA).
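For orientation, the gene-level analysis that this abstract contrasts itself with can be illustrated with a one-sided Fisher's exact test on a binary gene-by-tumor matrix. The sketch below is not DIALECT and does not separate drivers from passengers; the function names and the toy cohort are hypothetical and chosen only for illustration.

```python
from math import comb


def contingency(mut_a, mut_b):
    """Count tumors by joint mutation status of two genes (binary vectors)."""
    both = sum(1 for x, y in zip(mut_a, mut_b) if x and y)
    only_a = sum(1 for x, y in zip(mut_a, mut_b) if x and not y)
    only_b = sum(1 for x, y in zip(mut_a, mut_b) if y and not x)
    neither = len(mut_a) - both - only_a - only_b
    return both, only_a, only_b, neither


def exclusivity_p_value(both, only_a, only_b, neither):
    """Left-tail hypergeometric probability of `both` or fewer co-mutated tumors."""
    n = both + only_a + only_b + neither
    n_a = both + only_a  # tumors with gene A mutated
    n_b = both + only_b  # tumors with gene B mutated
    lowest = max(0, n_a + n_b - n)  # smallest feasible overlap
    favorable = sum(
        comb(n_a, k) * comb(n - n_a, n_b - k) for k in range(lowest, both + 1)
    )
    return favorable / comb(n, n_b)


# Toy cohort of 8 tumors (made-up data): the two genes are never co-mutated.
gene_a = [1, 1, 1, 1, 0, 0, 0, 0]
gene_b = [0, 0, 0, 0, 1, 1, 1, 1]
p = exclusivity_p_value(*contingency(gene_a, gene_b))
print(f"one-sided p-value for mutual exclusivity: {p:.4f}")
```

Because the test treats every mutation in a gene as equivalent, a gene with many passenger mutations can distort the table counts, which is the failure mode the abstract describes.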
2. Luo XG, Kuipers J, Beerenwinkel N. Joint inference of exclusivity patterns and recurrent trajectories from tumor mutation trees. Nat Commun 2023; 14:3676. [PMID: 37344522 DOI: 10.1038/s41467-023-39400-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Accepted: 06/12/2023] [Indexed: 06/23/2023]
Abstract
Cancer progression is an evolutionary process shaped by both deterministic and stochastic forces. Multi-region and single-cell sequencing of tumors enable high-resolution reconstruction of the mutational history of each tumor and highlight the extensive diversity across tumors and patients. Resolving the interactions among mutations and recovering recurrent evolutionary processes may offer greater opportunities for successful therapeutic strategies. To this end, we present a novel probabilistic framework, called TreeMHN, for the joint inference of exclusivity patterns and recurrent trajectories from a cohort of intra-tumor phylogenetic trees. Through simulations, we show that TreeMHN outperforms existing alternatives that can only focus on one aspect of the task. By analyzing datasets of blood, lung, and breast cancers, we find the most likely evolutionary trajectories and mutational patterns, consistent with and enriching our current understanding of tumorigenesis. Moreover, TreeMHN facilitates the prediction of tumor evolution and provides probabilistic measures on the next mutational events given a tumor tree, a prerequisite for evolution-guided treatment strategies.
Affiliation(s)
- Xiang Ge Luo
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058, Basel, Switzerland
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058, Basel, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058, Basel, Switzerland
3. ToMExO: A probabilistic tree-structured model for cancer progression. PLoS Comput Biol 2022; 18:e1010732. [DOI: 10.1371/journal.pcbi.1010732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Revised: 12/15/2022] [Accepted: 11/14/2022] [Indexed: 12/12/2022]
Abstract
Identifying the interrelations among cancer driver genes and the patterns in which the driver genes get mutated is critical for understanding cancer. In this paper, we study cross-sectional data from cohorts of tumors to identify the cancer-type (or subtype) specific process in which the cancer driver genes accumulate critical mutations. We model this mutation accumulation process using a tree, where each node includes a driver gene or a set of driver genes. A mutation in each node enables its children to have a chance of mutating. This model simultaneously explains the mutual exclusivity patterns observed in mutations in specific cancer genes (by its nodes) and the temporal order of events (by its edges). We introduce a computationally efficient dynamic programming procedure for calculating the likelihood of our noisy datasets and use it to build our Markov Chain Monte Carlo (MCMC) inference algorithm, ToMExO. Together with a set of engineered MCMC moves, our fast likelihood calculations enable us to work with datasets with hundreds of genes and thousands of tumors, which cannot be dealt with using available cancer progression analysis methods. We demonstrate our method’s performance on several synthetic datasets covering various scenarios for cancer progression dynamics. Then, a comparison against two state-of-the-art methods on a moderate-size biological dataset shows the merits of our algorithm in identifying significant and valid patterns. Finally, we present our analyses of several large biological datasets, including colorectal cancer, glioblastoma, and pancreatic cancer. In all the analyses, we validate the results using a set of method-independent metrics testing the causality and significance of the relations identified by ToMExO or competing methods.
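The tree-structured generative idea described above can be sketched as follows. This is a simplified illustration, not the ToMExO implementation: the `Node` class, the per-node firing probabilities, the rule that a fired node mutates exactly one of its genes, and the example tree are all assumptions made for the sketch, and the noise model mentioned in the abstract is omitted.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    genes: list          # driver genes grouped in this node
    fire_prob: float     # chance the node mutates once its parent has mutated
    children: list = field(default_factory=list)


def sample_tumor(top_level_nodes, rng):
    """Draw one tumor: a node can fire only after its parent has fired."""
    mutated = set()
    stack = list(top_level_nodes)
    while stack:
        node = stack.pop()
        if rng.random() < node.fire_prob:
            # Assumption: one gene per fired node, giving mutual exclusivity
            # among the genes that share a node.
            mutated.add(rng.choice(node.genes))
            stack.extend(node.children)  # children become eligible
    return mutated


# Hypothetical tree: APC -> {KRAS, BRAF} -> TP53, with made-up probabilities.
tp53 = Node(["TP53"], 0.4)
ras_raf = Node(["KRAS", "BRAF"], 0.6, [tp53])
apc = Node(["APC"], 0.8, [ras_raf])

rng = random.Random(0)
cohort = [sample_tumor([apc], rng) for _ in range(5)]
print(cohort)
```

In this sketch the edges encode temporal order (TP53 never appears without one of KRAS or BRAF) and the nodes encode exclusivity (KRAS and BRAF never appear together).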
4. Zhang W, Wang SL, Liu Y. Identification of Cancer Driver Modules Based on Graph Clustering from Multiomics Data. J Comput Biol 2021; 28:1007-1020. [PMID: 34529511 DOI: 10.1089/cmb.2021.0052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
A major challenge in cancer genomics is to identify cancer driver genes and modules. Most existing methods to identify cancer driver modules (iCDM) identify groups of genes whose somatic mutational patterns exhibit either mutual exclusivity or high coverage of patient samples, without considering other biological information from multiomics data sets. Here we integrate mutual exclusivity, coverage, and protein-protein interaction information to construct an edge-weighted network, and present a graph clustering approach based on symmetric non-negative matrix factorization to iCDM. iCDM was tested on pan-cancer data and the results were compared with those from several advanced computational methods. Our approach outperformed other methods in recovering known cancer driver modules, and the identified driver modules showed high accuracy in classifying normal and tumor samples.
Affiliation(s)
- Wei Zhang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China
- Shu-Lin Wang
  - College of Computer Science and Electronics Engineering, Hunan University, Changsha, China
- Yue Liu
  - College of Computer Science and Electronics Engineering, Hunan University, Changsha, China
5. Nicol PB, Coombes KR, Deaver C, Chkrebtii O, Paul S, Toland AE, Asiaee A. Oncogenetic network estimation with disjunctive Bayesian networks. Computational and Systems Oncology 2021. [DOI: 10.1002/cso2.1027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Affiliation(s)
- Kevin R. Coombes
  - Department of Biomedical Informatics, Ohio State University, Columbus, Ohio
- Courtney Deaver
  - Natural Sciences Division, Pepperdine University, Malibu, California
- Subhadeep Paul
  - Department of Statistics, Ohio State University, Columbus, Ohio
- Amanda E. Toland
  - Department of Cancer Biology and Genetics and Department of Internal Medicine, Division of Human Genetics, Comprehensive Cancer Center, Ohio State University, Columbus, Ohio
- Amir Asiaee
  - Mathematical Biosciences Institute, Ohio State University, Columbus, Ohio
6. Inferring tumor progression in large datasets. PLoS Comput Biol 2020; 16:e1008183. [PMID: 33035204 PMCID: PMC7577444 DOI: 10.1371/journal.pcbi.1008183] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/29/2020] [Revised: 10/21/2020] [Accepted: 07/22/2020] [Indexed: 12/31/2022]
Abstract
Identification of mutations of the genes that give cancer a selective advantage is an important step towards research and clinical objectives. As such, there has been a growing interest in developing methods for identification of driver genes and their temporal order within a single patient (intra-tumor) as well as across a cohort of patients (inter-tumor). In this paper, we develop a probabilistic model for tumor progression, in which the driver genes are clustered into several ordered driver pathways. We develop an efficient inference algorithm that exhibits favorable scalability to the number of genes and samples compared to a previously introduced ILP-based method. Adopting a probabilistic approach also allows principled approaches to model selection and uncertainty quantification. Using a large set of experiments on synthetic datasets, we demonstrate our superior performance compared to the ILP-based method. We also analyze two biological datasets of colorectal and glioblastoma cancers. We emphasize that while the ILP-based method puts many seemingly passenger genes in the driver pathways, our algorithm keeps focused on truly driver genes and outputs more accurate models for cancer progression.
Cancer is a disease caused by the accumulation of somatic mutations in the genome. This process is mainly driven by mutations in certain genes that give the harboring cells some selective advantage. The rather few driver genes are usually masked amongst an abundance of so-called passenger mutations. Identification of the driver genes and the temporal order in which the mutations occur is of great importance towards research and clinical objectives. In this paper, we introduce a probabilistic model for cancer progression and devise an efficient inference algorithm to train the model. We show that our method scales favorably to large datasets and provides superior performance compared to an ILP-based counterpart on a wide set of synthetic data simulations. Our Bayesian approach also allows for systematic model selection and confidence quantification procedures in contrast to the previous non-probabilistic progression models. We also study two large datasets on colorectal and glioblastoma cancers and validate our inferred model in comparison to the ILP-based method.
7. Schill R, Solbrig S, Wettig T, Spang R. Modelling cancer progression using Mutual Hazard Networks. Bioinformatics 2020; 36:241-249. [PMID: 31250881 PMCID: PMC6956791 DOI: 10.1093/bioinformatics/btz513] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/12/2018] [Revised: 03/29/2019] [Accepted: 06/25/2019] [Indexed: 12/26/2022]
Abstract
MOTIVATION: Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap.
RESULTS: Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by multiplicative effects they exert on the rates of successive events. MHN compared favourably to acyclic models in cross-validated model fit on four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations.
AVAILABILITY AND IMPLEMENTATION: Implementation and data are available at https://github.com/RudiSchill/MHN.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
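The rate structure described in the results (a spontaneous rate per event, scaled by multiplicative effects of events already present) can be sketched as below. This is an illustrative reading of the abstract, not code from the MHN repository; the matrix layout (`theta[i][i]` as the base rate of event i, `theta[i][j]` as the factor that event j applies to event i) and all numeric values are assumptions made for the example.

```python
from math import prod


def event_rate(theta, state, i):
    """Rate at which event i fixates, given which events are already present.

    theta[i][i] is the spontaneous rate of event i; theta[i][j] (j != i) is the
    multiplicative effect of an already present event j on event i
    (values above 1 promote, values below 1 inhibit).
    """
    if state[i]:
        return 0.0  # an event that is already present cannot occur again
    effects = prod(theta[i][j] for j, present in enumerate(state) if present and j != i)
    return theta[i][i] * effects


events = ["IDH1", "TP53"]
# Hypothetical parameters: IDH1 triples the rate of TP53; TP53 leaves IDH1 unchanged.
theta = [
    [0.5, 1.0],
    [3.0, 0.2],
]

print(event_rate(theta, [0, 0], 1))  # TP53 rate before IDH1 is present
print(event_rate(theta, [1, 0], 1))  # TP53 rate once IDH1 is present
```

Because the off-diagonal factors are not required to be symmetric or acyclic, two events can influence each other in both directions, which is what allows cyclic interaction patterns.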
Affiliation(s)
- Rudolf Schill
  - Department of Statistical Bioinformatics, Institute of Functional Genomics, Regensburg 93040, Germany
- Stefan Solbrig
  - Department of Physics, University of Regensburg, Regensburg 93040, Germany
- Tilo Wettig
  - Department of Physics, University of Regensburg, Regensburg 93040, Germany
- Rainer Spang
  - Department of Statistical Bioinformatics, Institute of Functional Genomics, Regensburg 93040, Germany
8. Fleck JL, Pavel AB, Cassandras CG. A pan-cancer analysis of progression mechanisms and drug sensitivity in cancer cell lines. Mol Omics 2019; 15:399-405. [PMID: 31570905 DOI: 10.1039/c9mo00119k] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/21/2023]
Abstract
Biomarker discovery involves identifying genetic abnormalities within a tumor. However, one of the main challenges in defining such therapeutic targets is accounting for the molecular heterogeneity of cancer. By integrating somatic mutation and gene expression data from hundreds of heterogeneous cell lines from the Cancer Cell Line Encyclopedia (CCLE), we identify sequences of genetic events that may help explain common patterns of oncogenesis across 22 tumor types, and evaluate the general effect of late-stage mutations on drug sensitivity and resistance mechanisms. Through gene enrichment analysis, we find several cancer-specific and immune pathways that are significantly enriched in each of our three proposed phases of cancer progression. By further analyzing the drug activity area associated with compounds that target the BRAF oncogene, a known predictor of drug sensitivity for several compounds used in cancer treatment, we verify that the acquisition of new driver mutations interferes with the targeted drug mechanism, meaning that cells without late-stage mutations generally respond better to drugs.
Affiliation(s)
- Julia L Fleck
  - Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rua Marques de Sao Vicente, 225, Rio de Janeiro, Brazil
9. Wang M, Yu T, Liu J, Chen L, Stromberg AJ, Villano JL, Arnold SM, Liu C, Wang C. A probabilistic method for leveraging functional annotations to enhance estimation of the temporal order of pathway mutations during carcinogenesis. BMC Bioinformatics 2019; 20:620. [PMID: 31791231 PMCID: PMC6889196 DOI: 10.1186/s12859-019-3218-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2019] [Accepted: 11/12/2019] [Indexed: 12/02/2022]
Abstract
BACKGROUND: Cancer arises through accumulation of somatically acquired genetic mutations. An important question is to delineate the temporal order of somatic mutations during carcinogenesis, which contributes to better understanding of cancer biology and facilitates identification of new therapeutic targets. Although a number of statistical and computational methods have been proposed to estimate the temporal order of mutations, they do not account for the differences in the functional impacts of mutations and thus are likely to be obscured by the presence of passenger mutations that do not contribute to cancer progression. In addition, many methods infer the order of mutations at the gene level, which has limited power due to the low mutation rate in most genes.
RESULTS: In this paper, we develop a Probabilistic Approach for estimating the Temporal Order of Pathway mutations by leveraging functional Annotations of mutations (PATOPA). PATOPA infers the order of mutations at the pathway level, wherein it uses a probabilistic method to characterize the likelihood of mutational events from different pathways occurring in a certain order. The functional impact of each mutation is incorporated to place more weight on a mutation that is more integral to tumor development. A maximum likelihood method is used to estimate parameters and infer the probability of one pathway being mutated prior to another. Simulation studies and analysis of whole exome sequencing data from The Cancer Genome Atlas (TCGA) demonstrate that PATOPA is able to accurately estimate the temporal order of pathway mutations and provides new biological insights on carcinogenesis of colorectal and lung cancers.
CONCLUSIONS: PATOPA provides a useful tool to estimate temporal order of mutations at the pathway level while leveraging functional annotations of mutations.
Affiliation(s)
- Menghan Wang
  - Department of Statistics, University of Kentucky, Lexington, USA
- Tianxin Yu
  - Department of Molecular & Cellular Biology, Roswell Park Comprehensive Cancer Center, Buffalo, USA
- Jinpeng Liu
  - Markey Cancer Center, University of Kentucky, Lexington, USA
- Li Chen
  - Markey Cancer Center, University of Kentucky, Lexington, USA
  - Department of Biostatistics, University of Kentucky, Lexington, USA
- John L. Villano
  - Markey Cancer Center, University of Kentucky, Lexington, USA
  - Department of Internal Medicine, University of Kentucky, Lexington, USA
- Susanne M. Arnold
  - Markey Cancer Center, University of Kentucky, Lexington, USA
  - Department of Internal Medicine, University of Kentucky, Lexington, USA
- Chunming Liu
  - Markey Cancer Center, University of Kentucky, Lexington, USA
  - Department of Molecular & Cellular Biochemistry, University of Kentucky, Lexington, USA
- Chi Wang
  - Markey Cancer Center, University of Kentucky, Lexington, USA
  - Department of Biostatistics, University of Kentucky, Lexington, USA
10. Khakabimamaghani S, Ding D, Snow O, Ester M. Uncovering the subtype-specific temporal order of cancer pathway dysregulation. PLoS Comput Biol 2019; 15:e1007451. [PMID: 31710622 PMCID: PMC6872169 DOI: 10.1371/journal.pcbi.1007451] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/07/2019] [Revised: 11/21/2019] [Accepted: 09/30/2019] [Indexed: 12/20/2022]
Abstract
Cancer is driven by genetic mutations that dysregulate pathways important for proper cell function. Therefore, discovering these cancer pathways and their dysregulation order is key to understanding and treating cancer. However, the heterogeneity of mutations between different individuals makes this challenging and requires that cancer progression is studied in a subtype-specific way. To address this challenge, we provide a mathematical model, called Subtype-specific Pathway Linear Progression Model (SPM), that simultaneously captures cancer subtypes and pathways and order of dysregulation of the pathways within each subtype. Experiments with synthetic data indicate the robustness of SPM to problem specifics including noise compared to an existing method. Moreover, experimental results on glioblastoma multiforme and colorectal adenocarcinoma show the consistency of SPM's results with the existing knowledge and its superiority to an existing method in certain cases. The implementation of our method is available at https://github.com/Dalton386/SPM.
Affiliation(s)
- Dujian Ding
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Oliver Snow
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Martin Ester
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
11. Diaz-Uriarte R, Vasallo C. Every which way? On predicting tumor evolution using cancer progression models. PLoS Comput Biol 2019; 15:e1007246. [PMID: 31374072 PMCID: PMC6693785 DOI: 10.1371/journal.pcbi.1007246] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/28/2018] [Revised: 08/14/2019] [Accepted: 07/05/2019] [Indexed: 11/18/2022]
Abstract
Successful prediction of the likely paths of tumor progression is valuable for diagnostic, prognostic, and treatment purposes. Cancer progression models (CPMs) use cross-sectional samples to identify restrictions in the order of accumulation of driver mutations and thus CPMs encode the paths of tumor progression. Here we analyze the performance of four CPMs to examine whether they can be used to predict the true distribution of paths of tumor progression and to estimate evolutionary unpredictability. Employing simulations we show that if fitness landscapes are single peaked (have a single fitness maximum) there is good agreement between true and predicted distributions of paths of tumor progression when sample sizes are large, but performance is poor with the currently common much smaller sample sizes. Under multi-peaked fitness landscapes (i.e., those with multiple fitness maxima), performance is poor and improves only slightly with sample size. In all cases, detection regime (when tumors are sampled) is a key determinant of performance. Estimates of evolutionary unpredictability from the best performing CPM, among the four examined, tend to overestimate the true unpredictability and the bias is affected by detection regime; CPMs could be useful for estimating upper bounds to the true evolutionary unpredictability. Analysis of twenty-two cancer data sets shows low evolutionary unpredictability for several of the data sets. But most of the predictions of paths of tumor progression are very unreliable, and unreliability increases with the number of features analyzed. Our results indicate that CPMs could be valuable tools for predicting cancer progression but that, currently, obtaining useful predictions of paths of tumor progression from CPMs is dubious, and emphasize the need for methodological work that can account for the probably multi-peaked fitness landscapes in cancer.
Affiliation(s)
- Ramon Diaz-Uriarte
  - Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain
  - Instituto de Investigaciones Biomédicas “Alberto Sols” (UAM-CSIC), Madrid, Spain
- Claudia Vasallo
  - Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain
  - Instituto de Investigaciones Biomédicas “Alberto Sols” (UAM-CSIC), Madrid, Spain
12. Sarto Basso R, Hochbaum DS, Vandin F. Efficient algorithms to discover alterations with complementary functional association in cancer. PLoS Comput Biol 2019; 15:e1006802. [PMID: 31120875 PMCID: PMC6550413 DOI: 10.1371/journal.pcbi.1006802] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/28/2018] [Revised: 06/05/2019] [Accepted: 01/17/2019] [Indexed: 12/20/2022]
Abstract
Recent large cancer studies have measured somatic alterations in an unprecedented number of tumours. These large datasets allow the identification of cancer-related sets of genetic alterations by identifying relevant combinatorial patterns. Among such patterns, mutual exclusivity has been employed by several recent methods that have shown its effectiveness in characterizing gene sets associated with cancer. Mutual exclusivity arises because of the complementarity, at the functional level, of alterations in genes which are part of a group (e.g., a pathway) performing a given function. The availability of quantitative target profiles, from genetic perturbations or from clinical phenotypes, provides additional information that can be leveraged to improve the identification of cancer-related gene sets by discovering groups with complementary functional associations with such targets. In this work we study the problem of finding groups of mutually exclusive alterations associated with a quantitative (functional) target. We propose a combinatorial formulation for the problem, and prove that the associated computational problem is computationally hard. We design two algorithms to solve the problem and implement them in our tool UNCOVER. We provide analytic evidence of the effectiveness of UNCOVER in finding high-quality solutions and show experimentally that UNCOVER finds sets of alterations significantly associated with functional targets in a variety of scenarios. In particular, we show that our algorithms find sets which are better than the ones obtained by the state-of-the-art method, even when sets are evaluated using the statistical score employed by the latter. In addition, our algorithms are much faster than the state-of-the-art, allowing the analysis of large datasets of thousands of target profiles from cancer cell lines. We show that on two such datasets, one from project Achilles and one from the Genomics of Drug Sensitivity in Cancer project, UNCOVER identifies several significant gene sets with complementary functional associations with targets. Software available at: https://github.com/VandinLab/UNCOVER.
Affiliation(s)
- Rebecca Sarto Basso
  - Department of Industrial Engineering and Operations Research, University of California at Berkeley, Berkeley, CA, USA
- Dorit S. Hochbaum
  - Department of Industrial Engineering and Operations Research, University of California at Berkeley, Berkeley, CA, USA
- Fabio Vandin
  - Department of Information Engineering, University of Padova, Padova, Italy
  - Department of Computer Science, Brown University, Providence, RI, USA
  - Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
13. Zhang W, Wang SL. An Integrated Framework for Identifying Mutated Driver Pathway and Cancer Progression. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:455-464. [PMID: 29990286 DOI: 10.1109/tcbb.2017.2788016] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/08/2023]
Abstract
Next-generation sequencing (NGS) technologies provide large amounts of somatic mutation data from a large number of patients. Identifying mutated driver pathways and cancer progression from these data is a challenging task because of interpatient heterogeneity. In addition, modeling cancer progression at the pathway level has been shown to be more reasonable than at the gene level. In this paper, we introduce an integrated framework to identify mutated driver pathways and cancer progression (iMDPCP) at the pathway level from somatic mutation data. First, we use the uncertainty coefficient to quantify mutual exclusivity within driver pathways and develop a computational framework to identify mutated driver pathways based on the adaptive discrete differential evolution algorithm. Then, we construct a cancer progression model for driver pathways based on a Bayesian network. Finally, we evaluate the performance of iMDPCP on real cancer somatic mutation datasets. The experimental results indicate that iMDPCP is more accurate than state-of-the-art methods according to the enrichment of KEGG pathways, and it also provides new insights on identifying cancer progression at the pathway level.
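The uncertainty coefficient mentioned above is a standard information-theoretic quantity (Theil's U), defined as U(X|Y) = (H(X) - H(X|Y)) / H(X). The sketch below computes it for two binary mutation vectors. How iMDPCP combines it across the genes of a pathway is not stated in the abstract, so the function names, the zero-entropy convention, and the toy vectors are assumptions for illustration only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def uncertainty_coefficient(x, y):
    """Theil's U(X|Y): fraction of the uncertainty in X explained by Y."""
    h_x = entropy(x)
    if h_x == 0:
        return 0.0  # convention chosen here: a constant X has nothing to explain
    h_x_given_y = entropy(list(zip(x, y))) - entropy(y)  # H(X|Y) = H(X,Y) - H(Y)
    return (h_x - h_x_given_y) / h_x


# Made-up mutation vectors over four tumors.
exclusive = uncertainty_coefficient([1, 1, 0, 0], [0, 0, 1, 1])
independent = uncertainty_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
print(exclusive, independent)
```

Note that U measures the strength of dependence, not its direction: perfectly co-occurring genes score as high as perfectly exclusive ones, so a method that uses it for exclusivity needs an additional signal, such as the co-mutation count, to tell the two apart.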
14. Diaz-Uriarte R. Cancer progression models and fitness landscapes: a many-to-many relationship. Bioinformatics 2018; 34:836-844. [PMID: 29048486 PMCID: PMC6031050 DOI: 10.1093/bioinformatics/btx663] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/04/2017] [Accepted: 10/17/2017] [Indexed: 11/13/2022]
Abstract
Motivation: The identification of constraints, due to gene interactions, in the order of accumulation of mutations during cancer progression can allow us to single out therapeutic targets. Cancer progression models (CPMs) use genotype frequency data from cross-sectional samples to identify these constraints, and return Directed Acyclic Graphs (DAGs) of restrictions where arrows indicate dependencies or constraints. On the other hand, fitness landscapes, which map genotypes to fitness, contain all possible paths of tumor progression. Thus, we expect a correspondence between DAGs from CPMs and the fitness landscapes where evolution happened. But many fitness landscapes (e.g. those with reciprocal sign epistasis) cannot be represented by CPMs.
Results: Using simulated data under 500 fitness landscapes, I show that CPMs' performance (prediction of genotypes that can exist) degrades with reciprocal sign epistasis. There is large variability in the DAGs inferred from each landscape, which is also affected by mutation rate, detection regime and fitness landscape features, in ways that depend on CPM method. Using three cancer datasets, I show that these problems strongly affect the analysis of empirical data: fitness landscapes that are widely different from each other produce data similar to the empirically observed ones and lead to DAGs that infer very different restrictions. Because reciprocal sign epistasis can be common in cancer, these results question the use and interpretation of CPMs.
Availability and implementation: Code available from Supplementary Material.
Contact: ramon.diaz@iib.uam.es.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ramon Diaz-Uriarte
- Department of Biochemistry, Universidad Autónoma de Madrid, Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid 28029, Spain
15
|
Vandin F. Computational Methods for Characterizing Cancer Mutational Heterogeneity. Front Genet 2017; 8:83. [PMID: 28659971 PMCID: PMC5469877 DOI: 10.3389/fgene.2017.00083] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/07/2017] [Accepted: 05/30/2017] [Indexed: 12/21/2022] Open
Abstract
Advances in DNA sequencing technologies have allowed the characterization of somatic mutations in a large number of cancer genomes at an unprecedented level of detail, revealing the extreme genetic heterogeneity of cancer at two different levels: inter-tumor, with different patients of the same cancer type presenting different collections of somatic mutations, and intra-tumor, with different clones coexisting within the same tumor. Both inter-tumor and intra-tumor heterogeneity have crucial implications for clinical practices. Here, we review computational methods that use somatic alterations measured through next-generation DNA sequencing technologies for characterizing tumor heterogeneity and its association with clinical variables. We first review computational methods for studying inter-tumor heterogeneity, focusing on methods that attempt to summarize cancer heterogeneity by discovering pathways that are commonly mutated across different patients of the same cancer type. We then review computational methods for characterizing intra-tumor heterogeneity using information from bulk sequencing data or from single cell sequencing data. Finally, we present some of the recent computational methodologies that have been proposed to identify and assess the association between inter- or intra-tumor heterogeneity with clinical variables.
Affiliation(s)
- Fabio Vandin
- Department of Information Engineering, University of Padova, Padova, Italy
16
|
Dimitrakopoulos CM, Beerenwinkel N. Computational approaches for the identification of cancer genes and pathways. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 2016; 9. [PMID: 27863091 PMCID: PMC5215607 DOI: 10.1002/wsbm.1364] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 02/28/2016] [Revised: 07/26/2016] [Accepted: 08/23/2016] [Indexed: 12/27/2022]
Abstract
High‐throughput DNA sequencing techniques enable large‐scale measurement of somatic mutations in tumors. Cancer genomics research aims at identifying all cancer‐related genes and solid interpretation of their contribution to cancer initiation and development. However, this venture is characterized by various challenges, such as the high number of neutral passenger mutations and the complexity of the biological networks affected by driver mutations. Based on biological pathway and network information, sophisticated computational methods have been developed to facilitate the detection of cancer driver mutations and pathways. They can be categorized into (1) methods using known pathways from public databases, (2) network‐based methods, and (3) methods learning cancer pathways de novo. Methods in the first two categories use and integrate different types of data, such as biological pathways, protein interaction networks, and gene expression measurements. The third category consists of de novo methods that detect combinatorial patterns of somatic mutations across tumor samples, such as mutual exclusivity and co‐occurrence. In this review, we discuss recent advances, current limitations, and future challenges of these approaches for detecting cancer genes and pathways. We also discuss the most important current resources of cancer‐related genes. WIREs Syst Biol Med 2017, 9:e1364. doi: 10.1002/wsbm.1364
Affiliation(s)
- Christos M Dimitrakopoulos
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
17
|
Shrestha G, MacNeil SM, McQuerry JA, Jenkins DF, Sharma S, Bild AH. The value of genomics in dissecting the RAS-network and in guiding therapeutics for RAS-driven cancers. Semin Cell Dev Biol 2016; 58:108-17. [PMID: 27338857 PMCID: PMC5951171 DOI: 10.1016/j.semcdb.2016.06.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/03/2016] [Accepted: 06/18/2016] [Indexed: 12/11/2022]
Abstract
The rise in genomic knowledge over the past decade has revealed the molecular etiology of many diseases, and has identified intricate signaling network activity in human cancers. Genomics provides the opportunity to determine genome structure and capture the activity of thousands of molecular events concurrently, which is important for deciphering highly complex genetic diseases such as cancer. In this review, we focus on genomic efforts directed towards one of cancer's most frequently mutated networks, the RAS pathway. Genomic tools such as gene expression signatures and assessment of mutations across the RAS network enable the capture of RAS signaling complexity. Due to this high level of interaction and cross-talk within the network, efforts to target RAS signaling in the clinic have generally failed, and we currently lack the ability to directly inhibit the RAS protein with high efficacy. We propose that the use of gene expression data can identify effective treatments that broadly inhibit the RAS network as this approach measures pathway activity independent of mutation status or any single mechanism of activation. Here, we review the genomic studies that map the complexity of the RAS network in cancer, and that show how genomic measurements of RAS pathway activation can identify effective RAS inhibition strategies. We also address the challenges and future directions for treating RAS-driven tumors. In summary, genomic assessment of RAS signaling provides a level of complexity necessary to accurately map the network that matches the intricacy of RAS pathway interactions in cancer.
Affiliation(s)
- Gajendra Shrestha
- Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT, USA
- Shelley M MacNeil
- Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT, USA; Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA
- Jasmine A McQuerry
- Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT, USA; Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA
- David F Jenkins
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
- Sunil Sharma
- Department of Medicine, University of Utah School of Medicine, Salt Lake City, UT, USA; Center for Investigational Therapeutics, Huntsman Cancer Institute, Salt Lake City, UT, USA
- Andrea H Bild
- Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT, USA; Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA.
18
|
Beerenwinkel N, Greenman CD, Lagergren J. Computational Cancer Biology: An Evolutionary Perspective. PLoS Comput Biol 2016; 12:e1004717. [PMID: 26845763 PMCID: PMC4742235 DOI: 10.1371/journal.pcbi.1004717] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Indexed: 01/19/2023] Open
Affiliation(s)
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Chris D. Greenman
- School of Computing Sciences, University of East Anglia, Norwich, United Kingdom
- Jens Lagergren
- Science for Life Laboratory, School of Computer Science and Communication, Swedish E-Science Research Center, KTH Royal Institute of Technology, Solna, Sweden
19
|
Fleck JL, Pavel AB, Cassandras CG. Integrating mutation and gene expression cross-sectional data to infer cancer progression. BMC Systems Biology 2016; 10:12. [PMID: 26810975 PMCID: PMC4727329 DOI: 10.1186/s12918-016-0255-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 08/26/2015] [Accepted: 01/11/2016] [Indexed: 01/21/2023]
Abstract
Background: A major problem in identifying the best therapeutic targets for cancer is the molecular heterogeneity of the disease. Cancer is often caused by an accumulation of mutations which produce irreversible damage to the cell’s control mechanisms of survival and proliferation. Different mutations may affect these cellular mechanisms through a combination of molecular interactions which may be dynamically changing during cancer progression. It has been previously shown that cancer accumulates mutations over time. In this paper we address the problem of cancer heterogeneity by modeling cancer progression using somatic mutation and gene expression cross-sectional data. Results: We propose a novel formulation of integrating somatic mutation and gene expression data to infer the temporal sequence of events from cross-sectional data. Using a mixed integer linear program we model the interaction between groups of different mutated genes and the resulting modifications at the gene expression level. Our approach identifies a partition of mutation events which gradually produce gene expression changes to a partition of genes over time. The proposed formulation is tested using both simulated data and real breast cancer data with matched somatic mutations and gene expression measurements from The Cancer Genome Atlas. First, we classify the genes as oncogenes or tumor suppressors based on the frequency of driver mutations. As expected, the most frequently mutated genes in breast cancer are PIK3CA and TP53 genes. Then, we select those genes with most frequent driver mutations and a set of genes known to play roles in cancer development. Furthermore, we apply the proposed mixed integer linear program to identify the temporal order in which genes mutate and, simultaneously, the changes they produce at the gene expression level during cancer progression. In addition, we are able to identify known causal relationships between mutations and gene expression changes in PI3K/AKT and TP53 pathways. Conclusions: This paper proposes a new model to infer the temporal sequence in which mutations occur and lead to changes at the gene expression level during cancer progression. The approach is general and can be applied to any data sets with available somatic mutations and gene expression measurements. Electronic supplementary material: The online version of this article (doi:10.1186/s12918-016-0255-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Julia L Fleck
- Division of Systems Engineering, Boston University, 15 Saint Mary's Street, Brookline, MA 02446, USA.
- Ana B Pavel
- Graduate Program in Bioinformatics, Boston University, 24 Cummington Mall, Boston, MA 02215, USA; Section of Computational Biomedicine, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA.
- Christos G Cassandras
- Division of Systems Engineering, Boston University, 15 Saint Mary's Street, Brookline, MA 02446, USA.