1
Rupp K, Lösch A, Hu YL, Nie C, Schill R, Klever M, Pfahler S, Grasedyck L, Wettig T, Beerenwinkel N, Spang R. Modeling metastatic progression from cross-sectional cancer genomics data. Bioinformatics 2024; 40:i140-i150. [PMID: 38940126 PMCID: PMC11245855 DOI: 10.1093/bioinformatics/btae250] [Indexed: 06/29/2024]
Abstract
MOTIVATION Metastasis formation is a hallmark of cancer lethality. Yet, metastases are generally unobservable during their early stages of dissemination and spread to distant organs. Genomic datasets of matched primary tumors and metastases may offer insights into the underpinnings and the dynamics of metastasis formation. RESULTS We present metMHN, a cancer progression model designed to deduce the joint progression of primary tumors and metastases using cross-sectional cancer genomics data. The model elucidates the statistical dependencies among genomic events, the formation of metastasis, and the clinical emergence of both primary tumors and their metastatic counterparts. metMHN enables the chronological reconstruction of mutational sequences and facilitates estimation of the timing of metastatic seeding. In a study of nearly 5000 lung adenocarcinomas, metMHN pinpointed TP53 and EGFR as mediators of metastasis formation. Furthermore, the study revealed that post-seeding adaptation is predominantly influenced by frequent copy number alterations. AVAILABILITY AND IMPLEMENTATION All datasets and code are available on GitHub at https://github.com/cbg-ethz/metMHN.
Affiliation(s)
- Kevin Rupp
- Faculty of Informatics and Data Science, Statistical Bioinformatics Group, University of Regensburg, Regensburg 93053, Germany
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Andreas Lösch
- Faculty of Informatics and Data Science, Statistical Bioinformatics Group, University of Regensburg, Regensburg 93053, Germany
- Yanren Linda Hu
- Faculty of Informatics and Data Science, Statistical Bioinformatics Group, University of Regensburg, Regensburg 93053, Germany
- Chenxi Nie
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4056, Switzerland
- Rudolf Schill
- Faculty of Informatics and Data Science, Statistical Bioinformatics Group, University of Regensburg, Regensburg 93053, Germany
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Maren Klever
- Institute for Geometry and Applied Mathematics, RWTH Aachen, Aachen 52062, Germany
- Simon Pfahler
- Faculty of Physics, University of Regensburg, Regensburg 93053, Germany
- Lars Grasedyck
- Institute for Geometry and Applied Mathematics, RWTH Aachen, Aachen 52062, Germany
- Tilo Wettig
- Faculty of Physics, University of Regensburg, Regensburg 93053, Germany
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Rainer Spang
- Faculty of Informatics and Data Science, Statistical Bioinformatics Group, University of Regensburg, Regensburg 93053, Germany
2
Georg P, Grasedyck L, Klever M, Schill R, Spang R, Wettig T. Low-rank tensor methods for Markov chains with applications to tumor progression models. J Math Biol 2023; 86:7. [PMID: 36460900 PMCID: PMC9718722 DOI: 10.1007/s00285-022-01846-9] [Received: 09/13/2021] [Revised: 09/19/2022] [Accepted: 11/22/2022] [Indexed: 12/05/2022]
Abstract
Cancer progression can be described by continuous-time Markov chains whose state space grows exponentially in the number of somatic mutations. The age of a tumor at diagnosis is typically unknown. Therefore, the quantity of interest is the time-marginal distribution over all possible genotypes of tumors, defined as the transient distribution integrated over an exponentially distributed observation time. It can be obtained as the solution of a large linear system. However, the sheer size of this system renders classical solvers infeasible. We consider Markov chains whose transition rates are separable functions, allowing for an efficient low-rank tensor representation of the linear system's operator. Thus we can reduce the computational complexity from exponential to linear. We derive a convergent iterative method using low-rank formats whose result satisfies the normalization constraint of a distribution. We also perform numerical experiments illustrating that the marginal distribution is well approximated with low rank.
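A toy-scale sketch may make the quantity above more concrete. It assumes an MHN-style parameterization, as in the Mutual Hazard Network model listed as entry 8. In that setup, a hypothetical matrix `theta` holds base rates on its diagonal and multiplicative effects off the diagonal, and the observation time is distributed as Exp(1). The generator Q is assembled as a sum of Kronecker products of 2×2 factors, which is the separable structure that makes low-rank tensor formats applicable. For an Exp(1) observation time, the time-marginal distribution solves (I − Q)p = p₀. The dense solve below costs exponential memory and time in the number of events, and it is not the paper's low-rank iterative method.

```python
from functools import reduce

import numpy as np


def mhn_generator(theta: np.ndarray) -> np.ndarray:
    """Assemble the 2^n x 2^n generator Q as a sum of Kronecker products.

    Bit i of a state index (least significant bit = event 0) is 1 if event i
    has occurred. theta[i, i] is the base rate of event i; theta[i, j] scales
    that rate when event j is already present. Column x holds the rates out of
    state x, so every column sums to zero.
    """
    n = theta.shape[0]
    Q = np.zeros((2**n, 2**n))
    for i in range(n):
        factors = []
        for j in range(n):
            if j == i:
                # Event i: leave "absent" at rate theta[i, i] and enter "present".
                factors.append(np.array([[-theta[i, i], 0.0], [theta[i, i], 0.0]]))
            else:
                # Event j keeps its own state but rescales the rate if present.
                factors.append(np.diag([1.0, theta[i, j]]))
        # np.kron makes its first argument the most significant index, so the
        # list is reversed to keep event 0 in the least significant bit.
        Q += reduce(np.kron, factors[::-1])
    return Q


def time_marginal(theta: np.ndarray) -> np.ndarray:
    """Genotype distribution at an Exp(1) observation time: solve (I - Q) p = p0."""
    n = theta.shape[0]
    p0 = np.zeros(2**n)
    p0[0] = 1.0  # all tumors start without any event
    return np.linalg.solve(np.eye(2**n) - mhn_generator(theta), p0)


# Hypothetical parameters for three events.
theta = np.array([[0.5, 2.0, 1.0],
                  [1.0, 0.3, 0.2],
                  [1.0, 1.0, 1.2]])
p = time_marginal(theta)
print(p.round(4), p.sum())
```

Every column of Q sums to zero, so 1ᵀ(I − Q) = 1ᵀ. The solution therefore satisfies 1ᵀp = 1ᵀp₀ = 1 up to floating-point error. This is the normalization property that the abstract requires the low-rank method to preserve.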
Affiliation(s)
- Peter Georg
- Department of Physics, University of Regensburg, 93040 Regensburg, Germany
- Lars Grasedyck
- Institute for Geometry and Applied Mathematics, RWTH Aachen University, 52062 Aachen, Germany
- Maren Klever
- Institute for Geometry and Applied Mathematics, RWTH Aachen University, 52062 Aachen, Germany
- Rudolf Schill
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, 93040 Regensburg, Germany
- Rainer Spang
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, 93040 Regensburg, Germany
- Tilo Wettig
- Department of Physics, University of Regensburg, 93040 Regensburg, Germany
3
Chen J. Timed hazard networks: Incorporating temporal difference for oncogenetic analysis. PLoS One 2023; 18:e0283004. [PMID: 36928529 PMCID: PMC10019724 DOI: 10.1371/journal.pone.0283004] [Received: 10/24/2022] [Accepted: 03/01/2023] [Indexed: 03/18/2023]
Abstract
Oncogenetic graphical models are crucial for understanding cancer progression by analyzing the accumulation of genetic events. These models are used to identify statistical dependencies and temporal order of genetic events, which helps design targeted therapies. However, existing algorithms do not account for temporal differences between samples in oncogenetic analysis. This paper introduces Timed Hazard Networks (TimedHN), a new statistical model that uses temporal differences to improve accuracy and reliability. TimedHN models the accumulation process as a continuous-time Markov chain and includes an efficient gradient computation algorithm for optimization. Our simulation experiments demonstrate that TimedHN outperforms current state-of-the-art graph reconstruction methods. We also compare TimedHN with existing methods on a luminal breast cancer dataset, highlighting its potential utility. The Matlab implementation and data are available at https://github.com/puar-playground/TimedHN.
Affiliation(s)
- Jian Chen
- Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY, United States of America
4
Moen MT, Johnston IG. HyperHMM: efficient inference of evolutionary and progressive dynamics on hypercubic transition graphs. Bioinformatics 2022; 39:6895098. [PMID: 36511587 PMCID: PMC9848056 DOI: 10.1093/bioinformatics/btac803] [Received: 08/11/2022] [Revised: 11/11/2022] [Accepted: 12/12/2022] [Indexed: 12/15/2022]
Abstract
MOTIVATION The evolution of bacterial drug resistance and other features in biology, the progression of cancer and other diseases and a wide range of broader questions can often be viewed as the sequential stochastic acquisition of binary traits (e.g. genetic changes, symptoms or characters). Using potentially noisy or incomplete data to learn the sequences by which such traits are acquired is a problem of general interest. The problem is complicated for large numbers of traits, which may, individually or synergistically, influence the probability of further acquisitions both positively and negatively. Hypercubic inference approaches, based on hidden Markov models on a hypercubic transition network, address these complications, but previous Bayesian instances can consume substantial time for converged results, limiting their practical use. RESULTS Here, we introduce HyperHMM, an adapted Baum-Welch (expectation-maximization) algorithm for hypercubic inference with resampling to quantify uncertainty, and show that it allows orders-of-magnitude faster inference while making few practical sacrifices compared to previous hypercubic inference approaches. We show that HyperHMM allows any combination of traits to exert arbitrary positive or negative influence on the acquisition of other traits, relaxing a common limitation of only independent trait influences. We apply this approach to synthetic and biological datasets and discuss its more general application in learning evolutionary and progressive pathways. AVAILABILITY AND IMPLEMENTATION Code for inference and visualization, and data for example cases, is freely available at https://github.com/StochasticBiology/hypercube-hmm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
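The hypercubic transition graph can be made concrete with a small sketch. The helper names are hypothetical, and the code does not implement the Baum-Welch procedure itself. It enumerates the states and edges of the graph for a few binary traits: each state is a bit tuple, and each edge acquires exactly one missing trait. A short dynamic program then counts the complete accumulation pathways, which number L! for L traits. This shows how quickly the space of possible pathways grows.

```python
from itertools import product


def hypercube_edges(n_traits):
    """All transitions of the hypercubic graph: acquire exactly one missing trait."""
    edges = []
    for state in product((0, 1), repeat=n_traits):
        for i, bit in enumerate(state):
            if bit == 0:
                edges.append((state, state[:i] + (1,) + state[i + 1:]))
    return edges


def count_pathways(n_traits):
    """Count complete accumulation pathways from no traits to all traits."""
    paths = {(0,) * n_traits: 1}
    # Process edges by the number of traits in their source state, so each
    # state's count is final before it is propagated to its successors.
    for src, dst in sorted(hypercube_edges(n_traits), key=lambda e: sum(e[0])):
        paths[dst] = paths.get(dst, 0) + paths[src]
    return paths[(1,) * n_traits]


for n in range(1, 6):
    # traits, states, edges, complete pathways
    print(n, 2**n, len(hypercube_edges(n)), count_pathways(n))
```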
Affiliation(s)
- Marcus T Moen
- Department of Mathematics, University of Bergen, Bergen, Vestland, Norway
5
ToMExO: A probabilistic tree-structured model for cancer progression. PLoS Comput Biol 2022; 18:e1010732. [DOI: 10.1371/journal.pcbi.1010732] [Received: 02/14/2022] [Revised: 12/15/2022] [Accepted: 11/14/2022] [Indexed: 12/12/2022]
Abstract
Identifying the interrelations among cancer driver genes and the patterns in which the driver genes get mutated is critical for understanding cancer. In this paper, we study cross-sectional data from cohorts of tumors to identify the cancer-type (or subtype) specific process in which the cancer driver genes accumulate critical mutations. We model this mutation accumulation process using a tree, where each node includes a driver gene or a set of driver genes. A mutation in each node enables its children to have a chance of mutating. This model simultaneously explains the mutual exclusivity patterns observed in mutations in specific cancer genes (by its nodes) and the temporal order of events (by its edges). We introduce a computationally efficient dynamic programming procedure for calculating the likelihood of our noisy datasets and use it to build our Markov Chain Monte Carlo (MCMC) inference algorithm, ToMExO. Together with a set of engineered MCMC moves, our fast likelihood calculations enable us to work with datasets with hundreds of genes and thousands of tumors, which cannot be dealt with using available cancer progression analysis methods. We demonstrate our method’s performance on several synthetic datasets covering various scenarios for cancer progression dynamics. Then, a comparison against two state-of-the-art methods on a moderate-size biological dataset shows the merits of our algorithm in identifying significant and valid patterns. Finally, we present our analyses of several large biological datasets, including colorectal cancer, glioblastoma, and pancreatic cancer. In all the analyses, we validate the results using a set of method-independent metrics testing the causality and significance of the relations identified by ToMExO or competing methods.
6
Jiang H, Li Q, Lin JT, Lin FC. Classification of disease recurrence using transition likelihoods with expectation-maximization algorithm. Stat Med 2022; 41:4697-4715. [PMID: 35908812 PMCID: PMC9489660 DOI: 10.1002/sim.9534] [Received: 03/11/2021] [Revised: 05/17/2022] [Accepted: 07/10/2022] [Indexed: 11/09/2022]
Abstract
When an infectious disease recurs, the cause may be treatment failure or a new infection. Distinguishing and classifying these two outcomes is critical for effective disease control. A multi-state model based on Markov processes is a typical approach to estimating the transition probabilities between disease states. However, it can perform poorly when the disease state is unknown. This article aims to demonstrate that the transition likelihoods of baseline covariates can distinguish one cause from another with high accuracy in infectious diseases such as malaria. A more general model for disease progression can be constructed to allow for additional disease outcomes. We start from a multinomial logit model to estimate the disease transition probabilities and then use the baseline covariates' transition information to provide a more accurate classification result. We apply the expectation-maximization (EM) algorithm to estimate unknown parameters, including the marginal probabilities of disease outcomes. A simulation study comparing our classifier to the existing two-stage method shows that our classifier has better accuracy, especially when the sample size is small. To demonstrate its practical use, the proposed method is applied to distinguishing relapse from reinfection in two Plasmodium vivax treatment studies from Cambodia that used different genotyping approaches.
Affiliation(s)
- Huijun Jiang
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Quefeng Li
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Jessica T. Lin
- Division of Infectious Disease, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Feng-Chang Lin
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
7
Inferring tumor progression in large datasets. PLoS Comput Biol 2020; 16:e1008183. [PMID: 33035204 PMCID: PMC7577444 DOI: 10.1371/journal.pcbi.1008183] [Received: 04/29/2020] [Revised: 10/21/2020] [Accepted: 07/22/2020] [Indexed: 12/31/2022]
Abstract
Identifying mutations in the genes that give cancer a selective advantage is an important step towards research and clinical objectives. As such, there has been growing interest in methods for identifying driver genes and their temporal order within a single patient (intra-tumor) as well as across a cohort of patients (inter-tumor). In this paper, we develop a probabilistic model for tumor progression in which the driver genes are clustered into several ordered driver pathways. We develop an efficient inference algorithm that scales better with the number of genes and samples than a previously introduced ILP-based method. Adopting a probabilistic approach also allows principled approaches to model selection and uncertainty quantification. Using a large set of experiments on synthetic datasets, we demonstrate superior performance compared to the ILP-based method. We also analyze two biological datasets of colorectal and glioblastoma cancers. We emphasize that while the ILP-based method puts many seemingly passenger genes in the driver pathways, our algorithm remains focused on true driver genes and outputs more accurate models for cancer progression.
Cancer is a disease caused by the accumulation of somatic mutations in the genome. This process is mainly driven by mutations in certain genes that give the harboring cells a selective advantage. The relatively few driver genes are usually masked by an abundance of so-called passenger mutations. Identifying the driver genes and the temporal order in which the mutations occur is of great importance for research and clinical objectives. In this paper, we introduce a probabilistic model for cancer progression and devise an efficient inference algorithm to train the model. We show that our method scales favorably to large datasets and outperforms an ILP-based counterpart on a wide set of synthetic data simulations. Our Bayesian approach also allows for systematic model selection and confidence quantification procedures, in contrast to previous non-probabilistic progression models. We also study two large datasets on colorectal and glioblastoma cancers and validate our inferred model in comparison to the ILP-based method.
8
Schill R, Solbrig S, Wettig T, Spang R. Modelling cancer progression using Mutual Hazard Networks. Bioinformatics 2020; 36:241-249. [PMID: 31250881 PMCID: PMC6956791 DOI: 10.1093/bioinformatics/btz513] [Received: 08/12/2018] [Revised: 03/29/2019] [Accepted: 06/25/2019] [Indexed: 12/26/2022]
Abstract
MOTIVATION Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. RESULTS Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by multiplicative effects they exert on the rates of successive events. MHN compared favourably to acyclic models in cross-validated model fit on four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations. AVAILABILITY AND IMPLEMENTATION Implementation and data are available at https://github.com/RudiSchill/MHN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
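The multiplicative rate model described above fits in a few lines of code. This sketch stores the parameters on the log scale in a hypothetical `log_theta` matrix. Each diagonal entry holds an event's spontaneous log-rate, and each off-diagonal entry holds the log of the factor that an already-present event applies to that rate. Positive off-diagonal values promote and negative values inhibit, and nothing forces the interactions to be acyclic. The function names are illustrative and are not taken from the MHN repository.

```python
import math


def transition_rate(log_theta, state, i):
    """Rate at which event i occurs in a tumor that already carries `state`.

    log_theta[i][i] is the log base rate of event i; log_theta[i][j] for j != i
    is the log of the factor that a present event j applies to that rate.
    """
    if i in state:
        return 0.0  # events are accumulated, never lost
    return math.exp(log_theta[i][i] + sum(log_theta[i][j] for j in state))


def outgoing_rates(log_theta, state):
    """All rates out of `state`, keyed by the event that would occur next."""
    return {i: transition_rate(log_theta, state, i)
            for i in range(len(log_theta)) if i not in state}


# Hypothetical two-event network: event 1 promotes event 0 (factor 3), while
# event 0 inhibits event 1 (factor 0.25), a pair of reciprocal effects that a
# directed acyclic graph cannot express.
log_theta = [[math.log(0.5), math.log(3.0)],
             [math.log(0.25), math.log(1.0)]]
print(outgoing_rates(log_theta, frozenset()))
print(outgoing_rates(log_theta, frozenset({1})))
print(outgoing_rates(log_theta, frozenset({0})))
```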
Affiliation(s)
- Rudolf Schill
- Department of Statistical Bioinformatics, Institute of Functional Genomics, Regensburg 93040, Germany
- Stefan Solbrig
- Department of Physics, University of Regensburg, Regensburg 93040, Germany
- Tilo Wettig
- Department of Physics, University of Regensburg, Regensburg 93040, Germany
- Rainer Spang
- Department of Statistical Bioinformatics, Institute of Functional Genomics, Regensburg 93040, Germany
9
HyperTraPS: Inferring Probabilistic Patterns of Trait Acquisition in Evolutionary and Disease Progression Pathways. Cell Syst 2020; 10:39-51.e10. [PMID: 31786211 DOI: 10.1016/j.cels.2019.10.009] [Received: 09/03/2018] [Revised: 08/23/2019] [Accepted: 10/26/2019] [Indexed: 01/15/2023]
Abstract
The explosion of data throughout the biomedical sciences provides unprecedented opportunities to learn about the dynamics of evolution and disease progression, but harnessing these large and diverse datasets remains challenging. Here, we describe a highly generalizable statistical platform to infer the dynamic pathways by which many, potentially interacting, traits are acquired or lost over time. We use HyperTraPS (hypercubic transition path sampling) to efficiently learn progression pathways from cross-sectional, longitudinal, or phylogenetically linked data, readily distinguishing multiple competing pathways, and identifying the most parsimonious mechanisms underlying given observations. This Bayesian approach allows inclusion of prior knowledge, quantifies uncertainty in pathway structure, and allows predictions, such as which symptom a patient will acquire next. We provide visualization tools for intuitive assessment of multiple, variable pathways. We apply the method to ovarian cancer progression and the evolution of multidrug resistance in tuberculosis, demonstrating its power to reveal previously undetected dynamic pathways.
10
Khakabimamaghani S, Ding D, Snow O, Ester M. Uncovering the subtype-specific temporal order of cancer pathway dysregulation. PLoS Comput Biol 2019; 15:e1007451. [PMID: 31710622 PMCID: PMC6872169 DOI: 10.1371/journal.pcbi.1007451] [Received: 06/07/2019] [Revised: 11/21/2019] [Accepted: 09/30/2019] [Indexed: 12/20/2022]
Abstract
Cancer is driven by genetic mutations that dysregulate pathways important for proper cell function. Therefore, discovering these cancer pathways and their dysregulation order is key to understanding and treating cancer. However, the heterogeneity of mutations between different individuals makes this challenging and requires that cancer progression is studied in a subtype-specific way. To address this challenge, we provide a mathematical model, called Subtype-specific Pathway Linear Progression Model (SPM), that simultaneously captures cancer subtypes and pathways and order of dysregulation of the pathways within each subtype. Experiments with synthetic data indicate the robustness of SPM to problem specifics including noise compared to an existing method. Moreover, experimental results on glioblastoma multiforme and colorectal adenocarcinoma show the consistency of SPM's results with the existing knowledge and its superiority to an existing method in certain cases. The implementation of our method is available at https://github.com/Dalton386/SPM.
Affiliation(s)
- Dujian Ding
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Oliver Snow
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Martin Ester
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
11
Hainke K, Szugat S, Fried R, Rahnenführer J. Variable selection for disease progression models: methods for oncogenetic trees and application to cancer and HIV. BMC Bioinformatics 2017; 18:358. [PMID: 28764644 PMCID: PMC5539896 DOI: 10.1186/s12859-017-1762-1] [Received: 11/16/2016] [Accepted: 07/14/2017] [Indexed: 12/12/2022]
Abstract
Background Disease progression models are important for understanding the critical steps in the development of diseases. The models are embedded in a statistical framework to deal with random variation due to biology and to the sampling process when observing only a finite population. Conditional probabilities are used to describe dependencies between events that characterise the critical steps in the disease process. Many different model classes have been proposed in the literature, from simple path models to complex Bayesian networks. Oncogenetic trees are a popular, easy-to-understand, yet flexible model class. They have been applied to describe the accumulation of genetic aberrations in cancer and HIV data. However, the number of potentially relevant aberrations is often far larger than the maximal number of events that can be used to reliably estimate the progression models. Still, there are only a few approaches to variable selection, and these have not yet been investigated in detail. Results We fill this gap and propose ten variable selection methods specifically for oncogenetic trees, some of them completely new. We compare them in an extensive simulation study and on real data from cancer and HIV. Preselecting events with clique identification algorithms turns out to perform best. Here, events are selected if they belong to the largest or the maximum weight subgraph in which all pairs of vertices are connected. Conclusions The variable selection method of identifying cliques finds both the important frequent events and those related to disease pathways. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1762-1) contains supplementary material, which is available to authorized users.
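The clique-based preselection can be illustrated with a brute-force sketch. The abstract does not say how the event graph is constructed, so the edges and event weights below are hypothetical placeholders. The weights could stand in for event frequencies, for example. The code finds either the largest clique (by setting every weight to one) or the maximum-weight clique by exhaustive search, which is feasible only for small graphs.

```python
from itertools import combinations


def is_clique(nodes, edges):
    """True if every pair of `nodes` is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(nodes, 2))


def max_weight_clique(weights, edges):
    """Exhaustively search for the clique with the largest total weight.

    Runtime is exponential in the number of events, so this only suits toy graphs.
    """
    best, best_weight = set(), 0.0
    events = list(weights)
    for size in range(1, len(events) + 1):
        for nodes in combinations(events, size):
            if is_clique(nodes, edges):
                weight = sum(weights[v] for v in nodes)
                if weight > best_weight:
                    best, best_weight = set(nodes), weight
    return best, best_weight


# Hypothetical event graph and weights, for illustration only.
edges = {frozenset(e) for e in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]}
freq = {"A": 0.30, "B": 0.20, "C": 0.25, "D": 0.40, "E": 0.50}

largest, _ = max_weight_clique({v: 1.0 for v in freq}, edges)
heaviest, total = max_weight_clique(freq, edges)
print(sorted(largest), sorted(heaviest), round(total, 2))
```

On this toy graph, the two criteria disagree. The largest clique is {A, B, C}, while the maximum-weight clique is {D, E}. This shows why the two selection rules named in the abstract can pick different events.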
Affiliation(s)
- Katrin Hainke
- Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
- Sebastian Szugat
- Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
- Roland Fried
- Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
- Jörg Rahnenführer
- Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
12
Montazeri H, Kuipers J, Kouyos R, Böni J, Yerly S, Klimkait T, Aubert V, Günthard HF, Beerenwinkel N. Large-scale inference of conjunctive Bayesian networks. Bioinformatics 2016; 32:i727-i735. [PMID: 27587695 DOI: 10.1093/bioinformatics/btw459] [Indexed: 12/20/2022]
Abstract
The continuous-time conjunctive Bayesian network (CT-CBN) is a graphical model for analyzing the waiting-time process of the accumulation of genetic changes (mutations). CT-CBN models have been successfully used in several biological applications, such as HIV drug resistance development and the genetic progression of cancer. However, current approaches for parameter estimation and network structure learning of CBNs can only deal with a small number of mutations (<20). Here, we address this limitation by presenting an efficient and accurate approximate inference algorithm: a Monte Carlo expectation-maximization algorithm based on importance sampling. The new method can handle a large number of mutations, up to one thousand, an increase of two orders of magnitude. In simulation studies, we evaluate the accuracy and running-time efficiency of the new inference method and compare it with an MLE method, expectation-maximization, and the discrete-time CBN model, i.e. a first-order approximation of the CT-CBN model. We also apply the new model to HIV drug resistance datasets for combination therapy with zidovudine plus lamivudine (AZT + 3TC) as well as under no treatment, both extracted from the Swiss HIV Cohort Study database. AVAILABILITY AND IMPLEMENTATION The proposed method is implemented as an R package available at https://github.com/cbg-ethz/MC-CBN. CONTACT: niko.beerenwinkel@bsse.ethz.ch SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
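A minimal forward-simulation sketch may help readers unfamiliar with CT-CBNs. It is not the Monte Carlo EM inference algorithm described above, and it omits any observation-error model. It assumes the standard CT-CBN generative process. An event's exponential waiting time starts only once all of its parent events have occurred, and the genotype is read out at an independent, exponentially distributed sampling time. The names `parents`, `rates`, and `sampling_rate` are illustrative.

```python
import random
from collections import Counter


def sample_ct_cbn(parents, rates, sampling_rate, rng):
    """Draw one observed genotype from a CT-CBN (no observation noise).

    parents[i]: set of events that must all occur before event i can start.
    rates[i]: rate of event i's exponential waiting time once it is enabled.
    Events must be indexed in topological order (parents before children).
    """
    times = []
    for i, lam in enumerate(rates):
        # Event i becomes possible when its last parent has occurred.
        start = max((times[j] for j in parents[i]), default=0.0)
        times.append(start + rng.expovariate(lam))
    t_obs = rng.expovariate(sampling_rate)
    return frozenset(i for i, t in enumerate(times) if t <= t_obs)


# Hypothetical poset: 1 and 2 require 0, and 3 requires both 1 and 2.
parents = [set(), {0}, {0}, {1, 2}]
rates = [1.0, 0.8, 0.5, 1.2]
rng = random.Random(0)
genotypes = [sample_ct_cbn(parents, rates, 1.0, rng) for _ in range(1000)]
print(Counter(genotypes).most_common(3))
```

Every simulated genotype is compatible with the poset, because a child's occurrence time always exceeds its parents' times. Genotypes that violate the order therefore require an error model, which is omitted here.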
Affiliation(s)
- Hesam Montazeri
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Roger Kouyos
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Institute of Medical Virology
- Jürg Böni
- Swiss National Center for Retroviruses, Institute of Medical Virology, University of Zurich, Zurich 8057, Switzerland
- Sabine Yerly
- Laboratory of Virology, Division of Infectious Diseases, Geneva University Hospital, Geneva, Switzerland
- Thomas Klimkait
- Department of Biomedicine-Petersplatz, University of Basel, Basel, Switzerland
- Vincent Aubert
- Division of Immunology and Allergy, University Hospital Lausanne, Lausanne, Switzerland
- Huldrych F Günthard
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Institute of Medical Virology
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
13
Cristea S, Kuipers J, Beerenwinkel N. pathTiMEx: Joint Inference of Mutually Exclusive Cancer Pathways and Their Progression Dynamics. J Comput Biol 2017; 24:603-615. [DOI: 10.1089/cmb.2016.0171] [Indexed: 12/25/2022]
Affiliation(s)
- Simona Cristea
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- The Swiss Institute of Bioinformatics, Basel, Switzerland
- Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- The Swiss Institute of Bioinformatics, Basel, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- The Swiss Institute of Bioinformatics, Basel, Switzerland
14
Abstract
Rapid advances in high-throughput sequencing and a growing realization of the importance of evolutionary theory to cancer genomics have led to a proliferation of phylogenetic studies of tumour progression. These studies have yielded not only new insights but also a plethora of experimental approaches, sometimes reaching conflicting or poorly supported conclusions. Here, we consider this body of work in light of the key computational principles underpinning phylogenetic inference, with the goal of providing practical guidance on the design and analysis of scientifically rigorous tumour phylogeny studies. We survey the range of methods and tools available to the researcher, their key applications, and the various unsolved problems, closing with a perspective on the prospects and broader implications of this field.
Affiliation(s)
- Russell Schwartz
- Department of Biological Sciences and Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15217, USA
- Alejandro A Schäffer
- Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20892, USA
15
Colijn C, Jones N, Johnston IG, Yaliraki S, Barahona M. Toward Precision Healthcare: Context and Mathematical Challenges. Front Physiol 2017; 8:136. [PMID: 28377724 PMCID: PMC5359292 DOI: 10.3389/fphys.2017.00136] [Received: 12/01/2016] [Accepted: 02/22/2017] [Indexed: 12/12/2022]
Abstract
Precision medicine refers to the idea of delivering the right treatment to the right patient at the right time, usually with a focus on a data-centered approach to this task. In this perspective piece, we use the term "precision healthcare" to describe the development of precision approaches that bridge from the individual to the population, taking advantage of individual-level data, but also taking the social context into account. These problems give rise to a broad spectrum of technical, scientific, policy, ethical and social challenges, and new mathematical techniques will be required to meet them. To ensure that the science underpinning "precision" is robust, interpretable and well-suited to meet the policy, ethical and social questions that such approaches raise, the mathematical methods for data analysis should be transparent, robust, and able to adapt to errors and uncertainties. In particular, precision methodologies should capture the complexity of data, yet produce tractable descriptions at the relevant resolution while preserving intelligibility and traceability, so that they can be used by practitioners to aid decision-making. Through several case studies in this domain of precision healthcare, we argue that this vision requires the development of new mathematical frameworks, both in modeling and in data analysis and interpretation.
Affiliation(s)
- Caroline Colijn
- Department of Mathematics, Imperial College London, London, UK
- EPSRC Centre for Mathematics of Precision Healthcare, Imperial College London, London, UK
- Nick Jones
- Department of Mathematics, Imperial College London, London, UK
- EPSRC Centre for Mathematics of Precision Healthcare, Imperial College London, London, UK
- Iain G. Johnston
- EPSRC Centre for Mathematics of Precision Healthcare, Imperial College London, London, UK
- School of Biosciences, University of Birmingham, Birmingham, UK
- Sophia Yaliraki
- EPSRC Centre for Mathematics of Precision Healthcare, Imperial College London, London, UK
- Department of Chemistry, Imperial College London, London, UK
- Mauricio Barahona
- Department of Mathematics, Imperial College London, London, UK
- EPSRC Centre for Mathematics of Precision Healthcare, Imperial College London, London, UK

16
Gertz EM, Chowdhury SA, Lee WJ, Wangsa D, Heselmeyer-Haddad K, Ried T, Schwartz R, Schäffer AA. FISHtrees 3.0: Tumor Phylogenetics Using a Ploidy Probe. PLoS One 2016; 11:e0158569. [PMID: 27362268 PMCID: PMC4928784 DOI: 10.1371/journal.pone.0158569] [Received: 01/29/2016] [Accepted: 06/19/2016]
Abstract
Advances in fluorescence in situ hybridization (FISH) make it feasible to detect multiple copy-number changes in hundreds of cells of solid tumors. Studies using FISH, sequencing, and other technologies have revealed substantial intra-tumor heterogeneity. The evolution of subclones in tumors may be modeled by phylogenies. Tumors often harbor aneuploid or polyploid cell populations. Using a FISH probe to estimate changes in ploidy can guide the creation of trees that model changes in ploidy and individual gene copy-number variations. We present FISHtrees 3.0, which implements a ploidy-based tree-building method based on mixed integer linear programming (MILP). The ploidy-based modeling in FISHtrees includes a new formulation of the problem of merging trees for changes of a single gene into trees modeling changes in multiple genes and the ploidy. When multiple samples are collected from each patient, varying over time or tumor regions, it is useful to evaluate similarities in tumor progression among the samples. Therefore, we also implemented a new method in FISHtrees 3.0 to build consensus graphs for multiple samples. We validate FISHtrees 3.0 on simulated data, on FISH data from paired cases of cervical primary and metastatic tumors, and on paired breast ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Tests on simulated data show improved accuracy of the ploidy-based approach relative to prior ploidyless methods. Tests on real data further demonstrate the novel insights these methods offer into tumor progression processes. Trees for DCIS samples are significantly less complex than trees for paired IDC samples. Consensus graphs show substantial divergence among most paired samples from both sets. Low consensus between DCIS and IDC trees may help explain the difficulty of finding biomarkers that predict which DCIS cases are at greatest risk of progressing to IDC. The FISHtrees software is available at ftp://ftp.ncbi.nih.gov/pub/FISHtrees.
MESH Headings
- Biomarkers, Tumor/genetics
- Breast Neoplasms/genetics
- Breast Neoplasms/pathology
- Carcinoma, Ductal, Breast/genetics
- Carcinoma, Ductal, Breast/pathology
- Carcinoma, Intraductal, Noninfiltrating/genetics
- Carcinoma, Intraductal, Noninfiltrating/pathology
- Databases, Genetic
- Female
- Humans
- In Situ Hybridization, Fluorescence/methods
- Ploidies
- Uterine Cervical Neoplasms/genetics
- Uterine Cervical Neoplasms/pathology
Affiliation(s)
- E. Michael Gertz
- Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Salim Akhter Chowdhury
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States of America
- Carnegie Mellon/University of Pittsburgh Joint Ph.D. Program in Computational Biology, Pittsburgh, PA, United States of America
- Woei-Jyh Lee
- Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Darawalee Wangsa
- Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Kerstin Heselmeyer-Haddad
- Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Thomas Ried
- Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Russell Schwartz
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States of America
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, United States of America
- Alejandro A. Schäffer
- Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, United States of America

17
Johnston IG, Williams BP. Evolutionary Inference across Eukaryotes Identifies Specific Pressures Favoring Mitochondrial Gene Retention. Cell Syst 2016; 2:101-11. [PMID: 27135164 DOI: 10.1016/j.cels.2016.01.013] [Received: 09/03/2015] [Revised: 12/14/2015] [Accepted: 01/27/2016]
Abstract
Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modeling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondrial genomes, we inferred evolutionary trajectories of mtDNA gene loss across the eukaryotic tree of life. We find that proteins comprising the structural cores of the electron transport chain are preferentially encoded within mitochondrial genomes across eukaryotes. A combination of high GC content and high protein hydrophobicity is required to explain patterns of mtDNA gene retention; a model that accounts for these selective pressures can also predict the success of artificial gene transfer experiments in vivo. This work provides a general method for data-driven inference of the ordering of evolutionary and progressive events, here identifying the distinct features shaping mitochondrial genomes of present-day species.
Affiliation(s)
- Iain G Johnston
- School of Biosciences, University of Birmingham, Birmingham B15 2TT, UK.
- Ben P Williams
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA

18
Beerenwinkel N, Greenman CD, Lagergren J. Computational Cancer Biology: An Evolutionary Perspective. PLoS Comput Biol 2016; 12:e1004717. [PMID: 26845763 PMCID: PMC4742235 DOI: 10.1371/journal.pcbi.1004717]
Affiliation(s)
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Chris D. Greenman
- School of Computing Sciences, University of East Anglia, Norwich, United Kingdom
- Jens Lagergren
- Science for Life Laboratory, School of Computer Science and Communication, Swedish E-Science Research Center, KTH Royal Institute of Technology, Solna, Sweden

19
Chowdhury SA, Gertz EM, Wangsa D, Heselmeyer-Haddad K, Ried T, Schäffer AA, Schwartz R. Inferring models of multiscale copy number evolution for single-tumor phylogenetics. Bioinformatics 2015; 31:i258-67. [PMID: 26072490 PMCID: PMC4481700 DOI: 10.1093/bioinformatics/btv233]
Abstract
Motivation: Phylogenetic algorithms have begun to see widespread use in cancer research to reconstruct processes of evolution in tumor progression. Developing reliable phylogenies for tumor data requires quantitative models of cancer evolution that include the unusual genetic mechanisms by which tumors evolve, such as chromosome abnormalities, and allow for heterogeneity between tumor types and individual patients. Previous work on inferring phylogenies of single tumors by copy number evolution assumed models of uniform rates of genomic gain and loss across different genomic sites and scales, a substantial oversimplification necessitated by a lack of algorithms and quantitative parameters for fitting to more realistic tumor evolution models. Results: We propose a framework for inferring models of tumor progression from single-cell gene copy number data, including variable rates for different gain and loss events. We propose a new algorithm for identification of the most parsimonious combinations of single gene and single chromosome events. We extend it via dynamic programming to include genome duplications. We implement an expectation maximization (EM)-like method to estimate mutation-specific and tumor-specific event rates concurrently with tree reconstruction. Application of our algorithms to real cervical cancer data identifies key genomic events in disease progression consistent with prior literature. Classification experiments on cervical and tongue cancer datasets lead to improved prediction accuracy for the metastasis of primary cervical cancers and for tongue cancer survival. Availability and implementation: Our software (FISHtrees) and two datasets are available at ftp://ftp.ncbi.nlm.nih.gov/pub/FISHtrees. Contact: russells@andrew.cmu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Salim Akhter Chowdhury
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA, Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA, Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- E Michael Gertz
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA, Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA, Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Darawalee Wangsa
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA, Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA, Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Kerstin Heselmeyer-Haddad
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA, Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA, Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Thomas Ried
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA, Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA, Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Alejandro A Schäffer
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA, Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA, Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Russell Schwartz
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA, Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA, Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA

20
Abstract
Mathematical modelling approaches have become increasingly abundant in cancer research. The complexity of cancer is well suited to quantitative approaches as it provides challenges and opportunities for new developments. In turn, mathematical modelling contributes to cancer research by helping to elucidate mechanisms and by providing quantitative predictions that can be validated. The recent expansion of quantitative models addresses many questions regarding tumour initiation, progression and metastases as well as intra-tumour heterogeneity, treatment responses and resistance. Mathematical models can complement experimental and clinical studies, but also challenge current paradigms, redefine our understanding of mechanisms driving tumorigenesis and shape future research in cancer biology.
Affiliation(s)
- Philipp M Altrock
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 450 Brookline Avenue, Boston, Massachusetts 02115, USA
- Program for Evolutionary Dynamics, Harvard University, 1 Brattle Square, Suite 6, Cambridge, Massachusetts 02138, USA
- Lin L Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 450 Brookline Avenue, Boston, Massachusetts 02115, USA
- Franziska Michor
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 450 Brookline Avenue, Boston, Massachusetts 02115, USA

21
Lecca P, Casiraghi N, Demichelis F. Defining order and timing of mutations during cancer progression: the TO-DAG probabilistic graphical model. Front Genet 2015; 6:309. [PMID: 26528329 PMCID: PMC4602157 DOI: 10.3389/fgene.2015.00309] [Received: 07/26/2015] [Accepted: 09/24/2015]
Abstract
Somatic mutations arise and accumulate during both tumorigenesis and progression. However, the order in which mutations occur is an open question, and inferring the temporal ordering at the gene level could potentially affect patient treatment. Thus, exploiting recent observations suggesting that the occurrence of mutations is not a memoryless process, we developed a computational approach to infer timed oncogenetic directed acyclic graphs (TO-DAGs) from human tumor mutation data. Such graphs represent the path and the waiting times of alterations during tumor evolution. The probability of occurrence of each alteration in a path is the probability that the alteration occurs when all alterations prior to it have occurred. The waiting time between an alteration and the subsequent one is modeled as a stochastic function of the conditional probability of the event given the occurrence of the previous one. TO-DAG performance has been evaluated both on synthetic data and on somatic non-silent mutations from prostate cancer and melanoma patients, and then compared with that of current well-established approaches. TO-DAG shows high performance scores on synthetic data and recognizes mutations in gatekeeper tumor suppressor genes as triggers for several downstream mutational events in the human tumor data.
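The path probability described in the TO-DAG abstract can be illustrated with a minimal sketch. It assumes the tumor data are a binary sample-by-alteration matrix and that a candidate path is a list of column indices in temporal order. Each step estimates the empirical probability of the next alteration among only those samples that already carry every earlier alteration on the path. The function name `conditional_path_probabilities` is hypothetical, and the sketch does not reproduce TO-DAG's graph inference or its waiting-time model.

```python
from typing import List, Sequence


def conditional_path_probabilities(
    samples: Sequence[Sequence[int]], path: Sequence[int]
) -> List[float]:
    """Empirical P(a_k | a_1, ..., a_{k-1}) for each alteration a_k on a path.

    Hypothetical helper for illustration only.
    samples: binary matrix, one row per tumor, one column per alteration
             (1 = alteration observed).
    path:    column indices in the hypothesized temporal order.
    """
    probs: List[float] = []
    # Start with all samples: the first alteration is conditioned on nothing.
    eligible = list(samples)
    for col in path:
        if not eligible:
            # No sample carries all prior alterations; the estimate is undefined.
            probs.append(float("nan"))
            continue
        carriers = [row for row in eligible if row[col] == 1]
        probs.append(len(carriers) / len(eligible))
        # Only samples carrying this alteration condition the next step.
        eligible = carriers
    return probs
```

For the matrix `[[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]` and the path `[0, 1, 2]`, the estimates are 3/4, 2/3 and 1/2. Plain frequency counts keep the sketch transparent, but with few samples such estimates are noisy.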
Affiliation(s)
- Paola Lecca
- Laboratory of Computational Oncology, Centre for Integrative Biology, University of Trento, Trento, Italy
- Nicola Casiraghi
- Laboratory of Computational Oncology, Centre for Integrative Biology, University of Trento, Trento, Italy
- Francesca Demichelis
- Laboratory of Computational Oncology, Centre for Integrative Biology, University of Trento, Trento, Italy; Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, NY, USA

22
Ramazzotti D, Caravagna G, Olde Loohuis L, Graudenzi A, Korsunsky I, Mauri G, Antoniotti M, Mishra B. CAPRI: efficient inference of cancer progression models from cross-sectional data. Bioinformatics 2015; 31:3016-26. [DOI: 10.1093/bioinformatics/btv296] [Received: 01/19/2015] [Accepted: 05/04/2015]

23
Raphael BJ, Vandin F. Simultaneous inference of cancer pathways and tumor progression from cross-sectional mutation data. J Comput Biol 2015; 22:510-27. [PMID: 25785493 DOI: 10.1089/cmb.2014.0161]
Abstract
Recent cancer sequencing studies provide a wealth of somatic mutation data from a large number of patients. One of the most intriguing and challenging questions arising from these data is whether the temporal order of somatic mutations in a cancer follows any common progression. Since usually only one sample is obtained per patient, such inferences are commonly made from cross-sectional data from different patients. This analysis is complicated by the extensive variation in the somatic mutations across different patients, variation that is reduced by examining combinations of mutations in various pathways. Thus far, methods to reconstruct tumor progression at the pathway level have restricted attention to known, a priori defined pathways. In this work we show how to simultaneously infer pathways and the temporal order of their mutations from cross-sectional data, leveraging the exclusivity property of driver mutations within a pathway. We define the pathway linear progression model and derive a combinatorial formulation for the problem of finding the optimal model from mutation data. We show that with enough samples the optimal solution to this problem uniquely identifies the correct model with high probability, even when errors are present in the mutation data. We then formulate the problem as an integer linear program (ILP), which allows the analysis of datasets from recent studies with large numbers of samples. We use our algorithm to analyze somatic mutation data from three cancer studies, including two studies from The Cancer Genome Atlas (TCGA) with large numbers of colorectal cancer and glioblastoma samples. The models reconstructed with our method capture most of the current knowledge of the progression of somatic mutations in these cancer types, while also providing new insights into tumor progression at the pathway level.
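The abstract combines two ideas: mutual exclusivity of driver mutations within a pathway and a temporal order across pathways. The sketch below counts how often a dataset departs from each. It assumes that pathways are given as disjoint gene sets in a fixed order, and that a sample fits the linear progression when its mutated pathways form a prefix of that order. Both the function `score_linear_progression` and the scoring are hypothetical illustrations. This is not the authors' combinatorial objective or their ILP, and it does not infer the pathways.

```python
from typing import Dict, List, Sequence, Set


def score_linear_progression(
    samples: Sequence[Set[str]], pathways: Sequence[Set[str]]
) -> Dict[str, int]:
    """Count simple violations of a pathway-level linear progression model.

    Hypothetical scoring for illustration only.
    samples:  each sample is the set of its mutated genes.
    pathways: disjoint gene sets, listed in hypothesized temporal order.
    """
    exclusivity_violations = 0  # extra mutated genes within the same pathway
    order_violations = 0        # samples whose hit pathways are not a prefix
    for genes in samples:
        hits: List[bool] = []
        for pathway in pathways:
            n = len(genes & pathway)
            # Exclusivity: ideally at most one gene per pathway is mutated.
            exclusivity_violations += max(0, n - 1)
            hits.append(n > 0)
        # Linear progression: pathway k should be hit only if all earlier ones are,
        # i.e. the k hit pathways must be exactly the first k in the ordering.
        k = sum(hits)
        if hits[:k] != [True] * k:
            order_violations += 1
    return {"exclusivity": exclusivity_violations, "order": order_violations}
```

With the illustrative ordering `[{"APC"}, {"KRAS", "BRAF"}, {"TP53"}]`, a sample mutated in APC, KRAS and BRAF adds one exclusivity violation, and a sample mutated only in TP53 adds one order violation. The gene-to-pathway assignment here is hypothetical and serves only as example input.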
Affiliation(s)
- Benjamin J Raphael
- Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, Rhode Island
- Fabio Vandin
- Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, Rhode Island; Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark

24
Diaz-Uriarte R. Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling. BMC Bioinformatics 2015; 16:41. [PMID: 25879190 PMCID: PMC4339747 DOI: 10.1186/s12859-015-0466-7] [Received: 07/15/2014] [Accepted: 01/15/2015]
Abstract
BACKGROUND Cancer progression is caused by the sequential accumulation of mutations, but not all orders of accumulation are equally likely. When the fixation of some mutations depends on the presence of previous ones, identifying restrictions in the order of accumulation of mutations can lead to the discovery of therapeutic targets and diagnostic markers. The purpose of this study is to conduct a comprehensive comparison of the performance of all available methods to identify these restrictions from cross-sectional data. I used simulated data sets (where the true restrictions are known) but, in contrast to previous work, I embedded restrictions within evolutionary models of tumor progression that included passengers (mutations not responsible for the development of cancer, known to be very common). This allowed me to assess, for the first time, the effects of having to filter out passengers, of sampling schemes (when, how, and how many samples), and of deviations from order restrictions. RESULTS Poor choices of method, filtering, and sampling lead to large errors in all performance measures. Having to filter passengers led to decreased performance, especially because true restrictions were missed. Overall, the best method for identifying order restrictions was Oncogenetic Trees, a fast and easy-to-use method that, although unable to recover dependencies of mutations on more than one mutation, showed good performance in most scenarios, superior to Conjunctive Bayesian Networks and Progression Networks. Single-cell sampling provided no advantage, but sampling in the final stages of the disease vs. sampling at different stages had severe effects. Evolutionary model and deviations from order restrictions had major, and sometimes counterintuitive, interactions with other factors that affected performance. CONCLUSIONS This paper provides practical recommendations for using these methods with experimental data. 
It also identifies key areas of future methodological work and, in particular, it shows that it is both possible and necessary to embed assumptions about order restrictions and the nature of driver status within evolutionary models of cancer progression to evaluate the performance of inferential approaches.
Affiliation(s)
- Ramon Diaz-Uriarte
- Dept. Biochemistry, Universidad Autónoma de Madrid, Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Arzobispo Morcillo, 4, 28029, Madrid, Spain.

25
Beerenwinkel N, Schwarz RF, Gerstung M, Markowetz F. Cancer evolution: mathematical models and computational inference. Syst Biol 2015; 64:e1-25. [PMID: 25293804 PMCID: PMC4265145 DOI: 10.1093/sysbio/syu081] [Received: 09/27/2013] [Accepted: 09/26/2014]
Abstract
Cancer is a somatic evolutionary process characterized by the accumulation of mutations, which contribute to tumor growth, clinical progression, immune escape, and drug resistance development. Evolutionary theory can be used to analyze the dynamics of tumor cell populations and to make inference about the evolutionary history of a tumor from molecular data. We review recent approaches to modeling the evolution of cancer, including population dynamics models of tumor initiation and progression, phylogenetic methods to model the evolutionary relationship between tumor subclones, and probabilistic graphical models to describe dependencies among mutations. Evolutionary modeling helps to understand how tumors arise and will also play an increasingly important prognostic role in predicting disease progression and the outcome of medical interventions, such as targeted therapy.
Affiliation(s)
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB20RE, United Kingdom
- Roland F Schwarz
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB20RE, United Kingdom
- Moritz Gerstung
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB20RE, United Kingdom
- Florian Markowetz
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB20RE, United Kingdom

26
Loohuis LO, Caravagna G, Graudenzi A, Ramazzotti D, Mauri G, Antoniotti M, Mishra B. Inferring tree causal models of cancer progression with probability raising. PLoS One 2014; 9:e108358. [PMID: 25299648 PMCID: PMC4191986 DOI: 10.1371/journal.pone.0108358] [Received: 04/11/2014] [Accepted: 08/27/2014]
Abstract
Existing techniques to reconstruct tree models of progression for accumulative processes, such as cancer, seek to estimate causation by combining correlation and a frequentist notion of temporal priority. In this paper, we define a novel theoretical framework called CAPRESE (CAncer PRogression Extraction with Single Edges) to reconstruct such models based on the notion of probabilistic causation defined by Suppes. We consider a general reconstruction setting complicated by the presence of noise in the data due to biological variation, as well as experimental or measurement errors. To improve tolerance to noise we define and use a shrinkage-like estimator. We prove the correctness of our algorithm by showing asymptotic convergence to the correct tree under mild constraints on the level of noise. Moreover, on synthetic data, we show that our approach outperforms the state-of-the-art, that it is efficient even with a relatively small number of samples and that its performance quickly converges to its asymptote as the number of samples increases. For real cancer datasets obtained with different technologies, we highlight biologically significant differences in the progressions inferred with respect to other competing techniques and we also show how to validate conjectured biological relations with progression models.
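CAPRESE builds on Suppes' notion of probabilistic causation. As a rough illustration only, the sketch below checks Suppes' two prima facie conditions on raw frequencies from a cohort of tumours: temporal priority, proxied here by a higher marginal frequency, and probability raising. It is a simplification under our own assumptions. It does not implement CAPRESE's shrinkage-like estimator, its noise handling or its tree reconstruction, and the function name is hypothetical.

```python
from itertools import permutations


def prima_facie_causes(samples, events):
    """Return (cause, effect) pairs that pass Suppes' two conditions.

    samples: list of sets, each holding the events observed in one tumour.
    Conditions, estimated from plain frequencies (hypothetical helper):
      1. temporal priority, proxied by marginal frequency: P(cause) > P(effect)
      2. probability raising: P(effect | cause) > P(effect | not cause)
    """
    n = len(samples)

    def marginal(event):
        return sum(event in s for s in samples) / n

    def conditional(effect, cause, cause_present):
        subset = [s for s in samples if (cause in s) == cause_present]
        if not subset:
            return None  # conditional probability undefined on this cohort
        return sum(effect in s for s in subset) / len(subset)

    pairs = []
    for cause, effect in permutations(events, 2):
        p_with = conditional(effect, cause, True)
        p_without = conditional(effect, cause, False)
        if p_with is None or p_without is None:
            continue
        if marginal(cause) > marginal(effect) and p_with > p_without:
            pairs.append((cause, effect))
    return pairs
```

On the toy cohort `[{"A", "B"}, {"A", "B"}, {"A"}, {"A"}, set(), {"C"}]` only `("A", "B")` passes both checks. A tree method such as CAPRESE must then select a single edge per event, a step this sketch omits.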
Affiliation(s)
- Loes Olde Loohuis
- Center for Neurobehavioral Genetics, University of California Los Angeles, Los Angeles, United States of America
- Giulio Caravagna
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi, Milano-Bicocca, Milano, Italy
- Alex Graudenzi
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi, Milano-Bicocca, Milano, Italy
- Daniele Ramazzotti
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi, Milano-Bicocca, Milano, Italy
- Giancarlo Mauri
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi, Milano-Bicocca, Milano, Italy
- Marco Antoniotti
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi, Milano-Bicocca, Milano, Italy
- Bud Mishra
- Courant Institute of Mathematical Sciences, New York University, New York, United States of America
27
Chowdhury SA, Shackney SE, Heselmeyer-Haddad K, Ried T, Schäffer AA, Schwartz R. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics. PLoS Comput Biol 2014; 10:e1003740. [PMID: 25078894 PMCID: PMC4117424 DOI: 10.1371/journal.pcbi.1003740] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 02/02/2014] [Accepted: 06/04/2014] [Indexed: 02/07/2023]
Abstract
We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome. The methods are designed for data collected by fluorescence in situ hybridization (FISH), an experimental technique especially well suited to characterizing intratumor heterogeneity using counts of probes to genetic regions frequently gained or lost in tumor development. Here, we develop new provably optimal methods for computing an edit distance between the copy number states of two cells given evolution by copy number changes of single probes, all probes on a chromosome, or all probes in the genome. We then apply this theory to develop a practical heuristic algorithm, implemented in publicly available software, for inferring tumor phylogenies on data from potentially hundreds of single cells by this evolutionary model. We demonstrate and validate the methods on simulated data and published FISH data from cervical cancers and breast cancers. Our computational experiments show that the new model and algorithm lead to more parsimonious trees than prior methods for single-tumor phylogenetics and to improved performance on various classification tasks, such as distinguishing primary tumors from metastases obtained from the same patient population.
Cancer is an evolutionary system whose growth and development are attributed to aberrations in well-known genes and to cancer-type-specific genomic imbalances. Here, we present methods for reconstructing the evolution of individual tumors based on cell-to-cell variations in the copy numbers of targeted regions of the genome. The methods are designed to work with fluorescence in situ hybridization (FISH), a technique that allows one to profile copy number changes in potentially thousands of single cells per study.
Our work advances the prior art by developing theory and practical algorithms for building evolutionary trees of single tumors that can model gain or loss of genetic regions at the scale of single genes, whole chromosomes, or the entire genome, all common events in tumor evolution. We apply these methods on simulated and real tumor data to demonstrate substantial improvements in tree-building accuracy and in our ability to accurately classify tumors from their inferred evolutionary models. The newly developed algorithms have been released through our publicly available software, FISHtrees.
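The paper computes this edit distance with provably optimal algorithms. Purely to make the event model concrete, the sketch below finds a distance of the same kind by brute-force breadth-first search, under simplifying assumptions of ours: copy numbers are capped at a hypothetical `max_copies`, a loss can never take a probe below zero, and whole-genome duplication doubles every probe count. It is not the FISHtrees algorithm, and the function name is hypothetical.

```python
from collections import deque


def event_distance(start, target, chromosome_of, max_copies=10):
    """Fewest events turning one probe copy-number profile into another.

    Simplified event model (hypothetical, in the spirit of the abstract):
      - gain or loss of one copy of a single probe
      - gain or loss of one copy of every probe on one chromosome
      - whole-genome duplication (every probe count doubles)
    chromosome_of[i] names the chromosome carrying probe i.
    Exhaustive search, exponential in the number of probes.
    """
    start, target = tuple(start), tuple(target)
    chromosomes = sorted(set(chromosome_of))
    groups = [[i for i, c in enumerate(chromosome_of) if c == chrom]
              for chrom in chromosomes]
    moves = [[i] for i in range(len(start))] + groups

    def neighbours(state):
        for probes in moves:
            for delta in (1, -1):
                new = list(state)
                for i in probes:
                    new[i] += delta
                if all(0 <= x <= max_copies for x in new):
                    yield tuple(new)
        doubled = tuple(2 * x for x in state)
        if all(x <= max_copies for x in doubled):
            yield doubled

    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            return dist[state]
        for nxt in neighbours(state):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return None  # target unreachable within the copy-number cap
```

With two probes on the same chromosome, (2, 2) to (3, 3) costs one chromosome gain; with the probes on different chromosomes it costs two events, while (2, 2) to (4, 4) is a single genome duplication. The exponential state space is why dedicated algorithms are needed for realistic probe panels.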
Affiliation(s)
- Salim Akhter Chowdhury
- Joint Carnegie Mellon/University of Pittsburgh Ph.D. Program in Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Stanley E. Shackney
- Intelligent Oncotherapeutics, Pittsburgh, Pennsylvania, United States of America
- Thomas Ried
- Genetics Branch, Center for Cancer Research, NCI, NIH, Bethesda, Maryland, United States of America
- Alejandro A. Schäffer
- Computational Biology Branch, NCBI, NIH, Bethesda, Maryland, United States of America
- Russell Schwartz
- Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
28
Abstract
MOTIVATION Cancer cell genomes acquire several genetic alterations during somatic evolution from a normal cell type. The relative order in which these mutations accumulate and contribute to cell fitness is affected by epistatic interactions. Inferring their evolutionary history is challenging because of the large number of mutations acquired by cancer cells as well as the presence of unknown epistatic interactions. RESULTS We developed Bayesian Mutation Landscape (BML), a probabilistic approach for reconstructing ancestral genotypes from tumor samples for much larger sets of genes than previously feasible. BML infers the likely sequence of mutation accumulation for any set of genes that is recurrently mutated in tumor samples. When applied to tumor samples from colorectal, glioblastoma, lung and ovarian cancer patients, BML identifies the diverse evolutionary scenarios involved in tumor initiation and progression in greater detail, but broadly in agreement with prior results. AVAILABILITY AND IMPLEMENTATION Source code and all datasets are freely available at bml.molgen.mpg.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Navodit Misra
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, D-14195 Berlin, Germany and Department of Biosystems Science and Engineering, ETH Zurich and Swiss Institute of Bioinformatics, CH-4058 Basel, Switzerland
- Ewa Szczurek
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, D-14195 Berlin, Germany and Department of Biosystems Science and Engineering, ETH Zurich and Swiss Institute of Bioinformatics, CH-4058 Basel, Switzerland
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, D-14195 Berlin, Germany and Department of Biosystems Science and Engineering, ETH Zurich and Swiss Institute of Bioinformatics, CH-4058 Basel, Switzerland
29
Purdom E, Ho C, Grasso CS, Quist MJ, Cho RJ, Spellman P. Methods and challenges in timing chromosomal abnormalities within cancer samples. Bioinformatics 2013; 29:3113-20. [PMID: 24064421 PMCID: PMC3842754 DOI: 10.1093/bioinformatics/btt546] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/22/2022]
Abstract
Motivation: Tumors acquire many chromosomal amplifications, and those acquired early in the lifespan of the tumor may not only be important for tumor growth but also be useful for diagnostic purposes. Many methods infer the order of the accumulation of abnormalities based on their occurrence in a large cohort of patients. Recently, Durinck et al. (2011) and Greenman et al. (2012) developed methods to order a single tumor’s chromosomal amplifications based on the patterns of mutations accumulated within those regions. This approach offers an unprecedented opportunity to assess the etiology of a single tumor sample, but has not been widely evaluated. Results: We show that the model for timing chromosomal amplifications is limited in scope, particularly for regions with high levels of amplification. We also show that the estimation of the order of events can be sensitive for events that occur early in the progression of the tumor and that the partial maximum-likelihood method of Greenman et al. (2012) can give biased estimates, particularly for moderate read coverage or normal contamination. We propose a maximum-likelihood estimation procedure that fully accounts for sequencing variability and show that it outperforms the partial maximum-likelihood estimation method. We also propose a Bayesian estimation procedure that stabilizes the estimates in certain settings. We apply these methods to a small number of ovarian tumors, and the results suggest possible differences in how the tumors acquired amplifications. Availability and implementation: We provide an implementation of these methods in the R package cancerTiming, which is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/. Contact: epurdom@stat.Berkeley.edu Supplementary information: Supplementary data are available at Bioinformatics online.
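The single-tumour timing idea referred to here rests on simple counting: in a duplicated region, mutations acquired before the gain sit on more copies than those acquired after it. A minimal point-estimate sketch for the easiest case, assuming a copy-neutral LOH region (one parental copy duplicated to two), a constant per-copy mutation rate, a pure tumour sample and mutation multiplicities that are already known. These are exactly the idealisations the abstract cautions about (read-count noise, normal contamination and high-level amplifications are ignored), and the function is hypothetical, not part of cancerTiming.

```python
def duplication_timing_cnloh(n_two_copies: int, n_one_copy: int) -> float:
    """Fraction of the region's mutational time that elapsed before duplication.

    Hypothetical helper. With per-copy rate mu:
      before the gain, one copy mutates for t_pre:   n_two_copies = mu * t_pre
      after the gain, two copies mutate for t_post:  n_one_copy = 2 * mu * t_post
    so t_pre / (t_pre + t_post) = 2 * n2 / (2 * n2 + n1), independent of mu.
    """
    total = 2 * n_two_copies + n_one_copy
    if total == 0:
        raise ValueError("no mutations in the region")
    return 2 * n_two_copies / total
```

For example, 10 mutations on both copies and 20 on a single copy put the duplication halfway through the region's mutational history. The paper's point is that turning observed allele frequencies into these multiplicities, rather than the final ratio, is where estimates become unstable or biased.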
Affiliation(s)
- Elizabeth Purdom
- Department of Statistics, University of California, Berkeley, 367 Evans Hall Berkeley, CA 94720-3860, USA, Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA and Department of Dermatology, University of California, San Francisco, CA 94115, USA
30
Strino F, Parisi F, Micsinai M, Kluger Y. TrAp: a tree approach for fingerprinting subclonal tumor composition. Nucleic Acids Res 2013; 41:e165. [PMID: 23892400 PMCID: PMC3783191 DOI: 10.1093/nar/gkt641] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Indexed: 01/01/2023]
Abstract
Revealing the clonal composition of a single tumor is essential for identifying cell subpopulations with metastatic potential in primary tumors or with resistance to therapies in metastatic tumors. Sequencing technologies provide only an overview of the aggregate of numerous cells. Computational approaches to de-mix a collective signal composed of the aberrations of a mixed cell population of a tumor sample into its individual components are not available. We propose an evolutionary framework for deconvolving data from a single genome-wide experiment to infer the composition, abundance and evolutionary paths of the underlying cell subpopulations of a tumor. We have developed an algorithm (TrAp) for solving this mixture problem. In silico analyses show that TrAp correctly deconvolves mixed subpopulations when the number of subpopulations and the measurement errors are moderate. We demonstrate the applicability of the method using tumor karyotypes and somatic hypermutation data sets. We applied TrAp to an exome sequencing (Exome-Seq) experiment on a renal cell carcinoma tumor sample and compared the mutational profiles of the inferred subpopulations to the mutational profiles of single cells of the same tumor. Finally, we deconvolve sequencing data from eight acute myeloid leukemia patients and three distinct metastases of one melanoma patient to exhibit the evolutionary relationships of their subpopulations.
Affiliation(s)
- Francesco Strino
- Department of Pathology, Yale University School of Medicine, New Haven, CT 06520, USA, NYU Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, 227 East 30th Street, New York, NY 10016, USA and Yale Cancer Center, New Haven, CT 06520, USA
31
Shahrabi Farahani H, Lagergren J. Learning oncogenetic networks by reducing to mixed integer linear programming. PLoS One 2013; 8:e65773. [PMID: 23799047 PMCID: PMC3683041 DOI: 10.1371/journal.pone.0065773] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 12/07/2012] [Accepted: 04/28/2013] [Indexed: 12/22/2022]
Abstract
Cancer can result from the accumulation of different types of genetic mutations, such as copy number aberrations. The data from tumors are cross-sectional and do not contain the temporal order of the genetic events. Finding the order in which the genetic events occurred, and the progression pathways they follow, is of vital importance for understanding the disease. To model cancer progression, we propose Progression Networks, a special case of Bayesian networks tailored to model disease progression. Progression networks have similarities with Conjunctive Bayesian Networks (CBNs) [1], a variation of Bayesian networks also proposed for modeling disease progression. We also describe an algorithm for learning Bayesian networks in general and progression networks in particular. We reduce the hard problem of learning Bayesian and progression networks to Mixed Integer Linear Programming (MILP). MILP is a Non-deterministic Polynomial-time complete (NP-complete) problem for which very good heuristics exist. We tested our algorithm on synthetic and real cytogenetic data from renal cell carcinoma. We also compared our learned progression networks with the networks proposed in earlier publications. The software is available on the website https://bitbucket.org/farahani/diprog.
Affiliation(s)
- Hossein Shahrabi Farahani
- KTH Royal Institute of Technology, Science for Life Laboratory (SciLifeLab), Center for Industrial and Applied Mathematics, School of Computer Science and Communication, Stockholm, Sweden
- Jens Lagergren
- KTH Royal Institute of Technology, Science for Life Laboratory (SciLifeLab), Center for Industrial and Applied Mathematics, School of Computer Science and Communication, Stockholm, Sweden
32
Czibula G, Bocicor IM, Czibula IG. Temporal ordering of cancer microarray data through a reinforcement learning based approach. PLoS One 2013; 8:e60883. [PMID: 23565283 PMCID: PMC3614992 DOI: 10.1371/journal.pone.0060883] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/25/2012] [Accepted: 03/04/2013] [Indexed: 11/19/2022]
Abstract
Temporal modeling and analysis, and more specifically temporal ordering, are very important problems within the fields of bioinformatics and computational biology, as the temporal analysis of the events characterizing a biological process could provide significant insights into its development and progression. In particular, in the case of cancer, understanding the dynamics and the evolution of the disease could lead to better methods for prediction and treatment. In this paper we tackle, from a computational perspective, the temporal ordering problem, which refers to constructing a sorted collection of multi-dimensional biological data, a collection that reflects an accurate temporal evolution of biological systems. We introduce a novel approach, based on reinforcement learning and more precisely on Q-learning, for the biological temporal ordering problem. The experimental evaluation is performed using several DNA microarray data sets, two of which contain cancer gene expression data. The obtained solutions are correlated either with the given correct ordering (in the cases where this is provided for validation) or with the overall survival time of the patients (in the case of the cancer data sets), thus confirming a good performance of the proposed model and indicating the potential of our proposal.
Affiliation(s)
- Gabriela Czibula
- Department of Computer Science, Babes-Bolyai University, Cluj-Napoca, Romania
- Iuliana M. Bocicor
- Department of Computer Science, Babes-Bolyai University, Cluj-Napoca, Romania
33
Hainke K, Rahnenführer J, Fried R. Cumulative disease progression models for cross-sectional data: a review and comparison. Biom J 2012; 54:617-40. [PMID: 22886685 DOI: 10.1002/bimj.201100186] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 09/05/2011] [Revised: 04/19/2012] [Accepted: 05/25/2012] [Indexed: 11/06/2022]
Abstract
A better understanding of disease progression is beneficial for early diagnosis and appropriate individual therapy. Many different approaches for statistical modelling of cumulative disease progression have been proposed in the literature, ranging from simple path models to complex restricted Bayesian networks. Important fields of application are diseases such as cancer and HIV. Tumour progression is measured by means of chromosome aberrations, whereas people infected with HIV develop drug resistance because of genetic changes in the virus. These two very different diseases have typical courses of disease progression, which can be modelled partly by consecutive and partly by independent steps. This paper gives an overview of the different progression models and points out their advantages and drawbacks. Different models are compared via simulations to analyse how they work if some of their assumptions are violated. In a simulation study, we evaluate how models perform in terms of fitting induced multivariate probability distributions and topological relationships. We often find that the true model class used for generating data is outperformed by either a less or a more complex model class. The more flexible conjunctive Bayesian networks can be used to fit oncogenetic trees, whereas mixtures of oncogenetic trees with three tree components can be well fitted by mixture models with only two tree components.
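Oncogenetic trees, the simplest model class compared here, give each genotype a probability that factorises over tree edges. A minimal sketch of the noise-free single-tree version (as introduced by Desper et al.), assuming each event has exactly one parent and occurs with an edge-specific probability once its parent has occurred. The function name is hypothetical, and none of the review's fitting procedures, mixtures or CBN variants are shown.

```python
def tree_genotype_probability(genotype, parent, edge_prob, root="root"):
    """Probability of a genotype under a noise-free oncogenetic tree.

    genotype:  set of events present in the tumour
    parent:    maps each event to its parent event (or to `root`)
    edge_prob: maps each event to P(event occurs | its parent occurred)
    An event can only occur if its parent did; events whose parent is
    absent are impossible and contribute a factor of 1.
    """
    prob = 1.0
    for event, par in parent.items():
        parent_present = par == root or par in genotype
        if event in genotype:
            if not parent_present:
                return 0.0  # the genotype violates the tree order
            prob *= edge_prob[event]
        elif parent_present:
            prob *= 1.0 - edge_prob[event]
    return prob
```

For a chain root -> A -> B with edge probabilities 0.5 and 0.4, the genotypes {}, {A}, {A, B} and {B} receive 0.5, 0.3, 0.2 and 0, which sum to one. CBNs relax the single-parent restriction by letting an event require several predecessors, which is why they can fit data generated from trees.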
Affiliation(s)
- Katrin Hainke
- Department of Statistics, TU Dortmund University, 44221 Dortmund, Germany.
34
Sakoparnig T, Beerenwinkel N. Efficient sampling for Bayesian inference of conjunctive Bayesian networks. Bioinformatics 2012; 28:2318-24. [PMID: 22782551 DOI: 10.1093/bioinformatics/bts433] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
Abstract
MOTIVATION Cancer development is driven by the accumulation of advantageous mutations and subsequent clonal expansion of cells harbouring these mutations, but the order in which mutations occur remains poorly understood. Advances in genome sequencing and the soon-arriving flood of cancer genome data produced by large cancer sequencing consortia hold the promise to elucidate cancer progression. However, new computational methods are needed to analyse these large datasets. RESULTS We present a Bayesian inference scheme for Conjunctive Bayesian Networks, a probabilistic graphical model in which mutations accumulate according to partial order constraints and cancer genotypes are observed subject to measurement noise. We develop an efficient MCMC sampling scheme specifically designed to overcome local optima induced by dependency structures. We demonstrate the performance advantage of our sampler over traditional approaches on simulated data and show the advantages of adopting a Bayesian perspective when reanalyzing cancer datasets and comparing our results to previous maximum-likelihood-based approaches. AVAILABILITY An R package including the sampler and examples is available at http://www.cbg.ethz.ch/software/bayes-cbn. CONTACTS niko.beerenwinkel@bsse.ethz.ch.
Affiliation(s)
- Thomas Sakoparnig
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
35
Cheng YK, Beroukhim R, Levine RL, Mellinghoff IK, Holland EC, Michor F. A mathematical methodology for determining the temporal order of pathway alterations arising during gliomagenesis. PLoS Comput Biol 2012; 8:e1002337. [PMID: 22241976 PMCID: PMC3252265 DOI: 10.1371/journal.pcbi.1002337] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Received: 09/28/2011] [Accepted: 11/17/2011] [Indexed: 12/31/2022]
Abstract
Human cancer is caused by the accumulation of genetic alterations in cells. Of special importance are changes that occur early during malignant transformation because they may result in oncogene addiction and thus represent promising targets for therapeutic intervention. We have previously described a computational approach, called Retracing the Evolutionary Steps in Cancer (RESIC), to determine the temporal sequence of genetic alterations during tumorigenesis from cross-sectional genomic data of tumors at their fully transformed stage. Since alterations within a set of genes belonging to a particular signaling pathway may have similar or equivalent effects, we applied a pathway-based systems biology approach to the RESIC methodology. This method was used to determine whether alterations of specific pathways develop early or late during malignant transformation. When applied to primary glioblastoma (GBM) copy number data from The Cancer Genome Atlas (TCGA) project, RESIC identified a temporal order of pathway alterations consistent with the order of events in secondary GBMs. We then further subdivided the samples into the four main GBM subtypes and determined the relative contributions of each subtype to the overall results: we found that the overall ordering applied for the proneural subtype but differed for mesenchymal samples. The temporal sequence of events could not be identified for neural and classical subtypes, possibly due to a limited number of samples. Moreover, for samples of the proneural subtype, we detected two distinct temporal sequences of events: (i) RAS pathway activation was followed by TP53 inactivation and finally PI3K2 activation, and (ii) RAS activation preceded only AKT activation. This extension of the RESIC methodology provides an evolutionary mathematical approach to identify the temporal sequence of pathway changes driving tumorigenesis and may be useful in guiding the understanding of signaling rearrangements in cancer development. 
Cancer is a deadly disease that develops through the accumulation of genetic changes over time. Many biological models do not incorporate this temporal aspect of tumor formation and progression, in part due to the difficulty of determining the sequence of events through biological experimentation for most cancer types. We previously developed a computational algorithm with which we can quickly and cost-effectively determine the order in which mutations arise in the tumor even when large numbers of mutations are considered. In this paper, we extended our method to incorporate biological knowledge of the common pathways by which cancer progresses. We applied these techniques to primary glioblastoma, the most common form of brain cancer. We found that when all samples are taken into account, a temporal sequence of pathway events emerges; however, different subtypes of glioblastoma vary in their temporal sequence of events. This algorithm can also be easily applied to other cancer types as clinical data becomes available, showing the benefit of computational and mathematical tools in cancer research. Using temporal information, cancer biologists will be able to develop more accurate animal models of tumor formation and learn more about how mutations interact in time, thus leading to better treatments for cancer.
Affiliation(s)
- Yu-Kang Cheng
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, and Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Cancer Biology and Genetics Program, Brain Tumor Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, New York, United States of America
- Rameen Beroukhim
- Departments of Cancer Biology and Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America, Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America, and Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Ross L. Levine
- Human Oncology and Pathogenesis Program, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Ingo K. Mellinghoff
- Human Oncology and Pathogenesis Program, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Eric C. Holland
- Cancer Biology and Genetics Program, Brain Tumor Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Franziska Michor
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, and Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
36
Ozery-Flato M, Linhart C, Trakhtenbrot L, Izraeli S, Shamir R. Large-scale analysis of chromosomal aberrations in cancer karyotypes reveals two distinct paths to aneuploidy. Genome Biol 2011; 12:R61. [PMID: 21714908 PMCID: PMC3218849 DOI: 10.1186/gb-2011-12-6-r61] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 01/20/2011] [Revised: 05/17/2011] [Accepted: 06/29/2011] [Indexed: 01/05/2023]
Abstract
Background Chromosomal aneuploidy, that is to say the gain or loss of chromosomes, is the most common abnormality in cancer. While certain aberrations, most commonly translocations, are known to be strongly associated with specific cancers and contribute to their formation, most aberrations appear to be non-specific and arbitrary, and do not have a clear effect. The understanding of chromosomal aneuploidy and its role in tumorigenesis is a fundamental open problem in cancer biology. Results We report on a systematic study of the characteristics of chromosomal aberrations in cancers, using over 15,000 karyotypes and 62 cancer classes in the Mitelman Database. Remarkably, we discovered a very high co-occurrence rate of chromosome gains with other chromosome gains, and of losses with losses. Gains and losses rarely show significant co-occurrence. This finding was consistent across cancer classes and was confirmed on an independent comparative genomic hybridization dataset of cancer samples. The results of our analysis are available for further investigation via an accompanying website. Conclusions The broad generality and the intricate characteristics of the dichotomy of aneuploidy, ranging across numerous tumor classes, are revealed here rigorously for the first time using statistical analyses of large-scale datasets. Our finding suggests that aneuploid cancer cells may use extra chromosome gain or loss events to restore a balance in their altered protein ratios, needed for maintaining their cellular fitness.
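The abstract does not state which co-occurrence statistic was used, so the following is only a generic illustration of the kind of pairwise summary involved: tabulate presence and absence of two events across karyotypes in a 2×2 table and summarise it with an odds ratio, here with 0.5 added to every cell (the Haldane-Anscombe correction) so that empty cells do not cause division by zero. Function names are hypothetical.

```python
def cooccurrence_table(karyotypes, a, b):
    """Counts (both, a only, b only, neither) for events a and b.

    karyotypes: list of sets of aberration labels, e.g. {"+7", "-13"}.
    """
    both = sum(a in k and b in k for k in karyotypes)
    a_only = sum(a in k and b not in k for k in karyotypes)
    b_only = sum(b in k and a not in k for k in karyotypes)
    neither = len(karyotypes) - both - a_only - b_only
    return both, a_only, b_only, neither


def odds_ratio(table, pseudocount=0.5):
    """Odds ratio of a 2x2 table after adding `pseudocount` to every cell.

    Values above 1 mean the two events co-occur more often than expected
    under independence; values below 1 mean they tend to exclude each other.
    """
    both, a_only, b_only, neither = (x + pseudocount for x in table)
    return (both * neither) / (a_only * b_only)
```

With the invented toy karyotypes `[{"+7", "+8"}, {"+7", "+8"}, {"+7", "+8", "-13"}, {"-13", "-17"}, {"-13", "-17"}, {"+7"}, {"-17"}]`, the gain pair +7/+8 gives the table (3, 1, 0, 3) and an odds ratio above 1, the loss pair -13/-17 also lands above 1, and the gain/loss pair +7/-13 gives (1, 3, 2, 1) and an odds ratio below 1. A real analysis would still need a significance test and a multiple-testing correction across all chromosome pairs.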
Affiliation(s)
- Michal Ozery-Flato
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
37
Mathematical modeling of carcinogenesis based on chromosome aberration data. Chin J Cancer Res 2009. [DOI: 10.1007/s11670-009-0240-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
38
Gerstung M, Baudis M, Moch H, Beerenwinkel N. Quantifying cancer progression with conjunctive Bayesian networks. Bioinformatics 2009; 25:2809-15. [PMID: 19692554 PMCID: PMC2781752 DOI: 10.1093/bioinformatics/btp505] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION Cancer is an evolutionary process characterized by accumulating mutations. However, the precise timing and the order of genetic alterations that drive tumor progression remain enigmatic. RESULTS We present a specific probabilistic graphical model for the accumulation of mutations and their interdependencies. The Bayesian network models cancer progression by an explicit unobservable accumulation process in time that is separated from the observable but error-prone detection of mutations. Model parameters are estimated by an Expectation-Maximization algorithm and the underlying interaction graph is obtained by a simulated annealing procedure. Applying this method to cytogenetic data for different cancer types, we find multiple complex oncogenetic pathways deviating substantially from simplified models, such as linear pathways or trees. We further demonstrate how the inferred progression dynamics can be used to improve genetics-based survival predictions which could support diagnostics and prognosis. AVAILABILITY The software package ct-cbn is available under a GPL license on the web site cbg.ethz.ch/software/ct-cbn CONTACT moritz.gerstung@bsse.ethz.ch.
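The model described here separates a hidden accumulation process in time from noisy detection. A minimal forward simulation of that generative picture, as we understand it: each event waits an exponentially distributed time that starts only once all of its parents in the partial order have occurred, the tumour is observed at an independent exponential sampling time, and each observed entry is flipped with probability `epsilon`. This is an illustrative sketch with hypothetical names, not the ct-cbn package, and it performs no estimation; the EM and simulated-annealing steps from the abstract are not shown. The partial order must be acyclic.

```python
import random


def simulate_ct_cbn(parents, rates, sampling_rate, n_samples, epsilon=0.0, seed=0):
    """Draw observed genotypes from a continuous-time conjunctive Bayesian network.

    parents:       dict event -> list of parent events (an acyclic partial order)
    rates:         dict event -> rate of the exponential waiting time that
                   starts once all parents of the event have occurred
    sampling_rate: rate of the exponential time at which the tumour is observed
    epsilon:       probability that each observed entry is flipped (noise)
    """
    rng = random.Random(seed)
    genotypes = []
    for _ in range(n_samples):
        times = {}

        def occurrence_time(event):
            # The event's clock starts when the last of its parents has occurred.
            if event not in times:
                start = max((occurrence_time(q) for q in parents[event]), default=0.0)
                times[event] = start + rng.expovariate(rates[event])
            return times[event]

        t_obs = rng.expovariate(sampling_rate)
        hidden = {e for e in parents if occurrence_time(e) <= t_obs}
        # Keep each entry unless a noise flip occurs with probability epsilon.
        observed = {e for e in parents if (e in hidden) != (rng.random() < epsilon)}
        genotypes.append(observed)
    return genotypes
```

With `epsilon=0` every simulated genotype respects the partial order (no event appears without all of its parents); with `epsilon > 0` order violations appear, and explaining them is the job of the error model.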
Affiliation(s)
- Moritz Gerstung
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland.
39
40
Stanchescu R, Betts DR, Yekutieli D, Ambros P, Cohen N, Rechavi G, Amariglio N, Trakhtenbrot L. SKY analysis of childhood neural tumors and cell lines demonstrates a susceptibility of aberrant chromosomes to further rearrangements. Cancer Lett 2007; 250:47-52. [PMID: 17084022 DOI: 10.1016/j.canlet.2006.09.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/08/2006] [Revised: 08/23/2006] [Accepted: 09/15/2006] [Indexed: 11/21/2022]
Abstract
Malignant solid tumors are commonly characterized by a large number of complex structural and numerical chromosomal alterations, which often reflect the level of genomic instability and can be associated with disease progression. The aim of this study was to evaluate whether chromosomes that harbor primary aberrations have a higher susceptibility to accumulate further alterations. We used spectral karyotyping (SKY), to compare the individual chromosomal instability of two chromosome types: chromosomes that have a primary aberration and chromosomes without an aberration, in 13 primary childhood neural tumors and seven cell lines. We found that chromosomes that contain a primary aberration are significantly (p-value<0.001) more likely to gain further structural rearrangements or to undergo numerical changes (22.6%, 36 of 159 chromosomes) than chromosomes with no initial aberration (4.9%, 54 of 1099 chromosomes). These results are highly suggestive that aberrant chromosomes in solid tumors have a higher susceptibility to accumulate further rearrangements than "normal" chromosomes.
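The quoted percentages can be recomputed from the raw counts. As a cross-check of the significance claim, the sketch below also runs a one-sided Fisher exact test on the resulting 2×2 table. The choice of test is ours for illustration; the abstract does not name the test the authors used, and the helper name is hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the top-left cell of [[a, b], [c, d]] with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1))
    return tail / comb(n, row1)


# Counts from the abstract: chromosomes with / without a primary aberration,
# split by whether they acquired a further change.
aberrant_changed, aberrant_total = 36, 159
normal_changed, normal_total = 54, 1099

pct_aberrant = 100 * aberrant_changed / aberrant_total  # about 22.6 %
pct_normal = 100 * normal_changed / normal_total        # about 4.9 %
relative_risk = pct_aberrant / pct_normal               # about 4.6
p_value = fisher_one_sided(aberrant_changed, aberrant_total - aberrant_changed,
                           normal_changed, normal_total - normal_changed)
print(f"{pct_aberrant:.1f}% vs {pct_normal:.1f}%, "
      f"relative risk {relative_risk:.1f}, one-sided p = {p_value:.2e}")
```

The recomputed proportions match the reported 22.6% and 4.9%, so aberrant chromosomes were roughly 4.6 times as likely to change further, and the exact-test p-value falls below the stated 0.001 threshold.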
Affiliation(s)
- Racheli Stanchescu
- Department of Pediatric Hemato-Oncology and Cancer Research Center, The Edmond and Lily Safra Children's Hospital, Sheba Medical Center, Tel-Hashomer, Sackler School of Medicine, Tel-Aviv University, Tel-Aviv, Israel