Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Shahrabi Farahani H, Lagergren J. Learning oncogenetic networks by reducing to mixed integer linear programming. PLoS One 2013;8:e65773. [PMID: 23799047 PMCID: PMC3683041 DOI: 10.1371/journal.pone.0065773] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 12/07/2012] [Accepted: 04/28/2013] [Indexed: 12/22/2022] Open

Cited by Other Article(s)

Chen J. Timed hazard networks: Incorporating temporal difference for oncogenetic analysis. PLoS One 2023;18:e0283004. [PMID: 36928529 PMCID: PMC10019724 DOI: 10.1371/journal.pone.0283004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 03/01/2023] [Indexed: 03/18/2023] Open

An Analysis of Transcriptomic Burden Identifies Biological Progression Roadmaps for Hematological Malignancies and Solid Tumors. Biomedicines 2022;10:biomedicines10112720. [DOI: 10.3390/biomedicines10112720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 10/24/2022] [Indexed: 11/16/2022] Open

Nicol PB, Coombes KR, Deaver C, Chkrebtii O, Paul S, Toland AE, Asiaee A. Oncogenetic network estimation with disjunctive Bayesian networks. Computational and Systems Oncology 2021. [DOI: 10.1002/cso2.1027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open

Inferring tumor progression in large datasets. PLoS Comput Biol 2020;16:e1008183. [PMID: 33035204 PMCID: PMC7577444 DOI: 10.1371/journal.pcbi.1008183] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/29/2020] [Revised: 10/21/2020] [Accepted: 07/22/2020] [Indexed: 12/31/2022] Open

Abstract

Identification of mutations of the genes that give cancer a selective advantage is an important step towards research and clinical objectives. As such, there has been a growing interest in developing methods for identification of driver genes and their temporal order within a single patient (intra-tumor) as well as across a cohort of patients (inter-tumor). In this paper, we develop a probabilistic model for tumor progression, in which the driver genes are clustered into several ordered driver pathways. We develop an efficient inference algorithm that exhibits favorable scalability to the number of genes and samples compared to a previously introduced ILP-based method. Adopting a probabilistic approach also allows principled approaches to model selection and uncertainty quantification. Using a large set of experiments on synthetic datasets, we demonstrate superior performance compared to the ILP-based method. We also analyze two biological datasets of colorectal and glioblastoma cancers. We emphasize that while the ILP-based method puts many seemingly passenger genes in the driver pathways, our algorithm stays focused on true driver genes and outputs more accurate models for cancer progression.

Cancer is a disease caused by the accumulation of somatic mutations in the genome. This process is mainly driven by mutations in certain genes that give the harboring cells some selective advantage. The rather few driver genes are usually masked amongst an abundance of so-called passenger mutations. Identification of the driver genes and the temporal order in which the mutations occur is of great importance towards research and clinical objectives. In this paper, we introduce a probabilistic model for cancer progression and devise an efficient inference algorithm to train the model. We show that our method scales favorably to large datasets and provides superior performance compared to an ILP-based counterpart on a wide set of synthetic data simulations. Our Bayesian approach also allows for systematic model selection and confidence quantification procedures in contrast to the previous non-probabilistic progression models. We also study two large datasets on colorectal and glioblastoma cancers and validate our inferred model in comparison to the ILP-based method.

Schill R, Solbrig S, Wettig T, Spang R. Modelling cancer progression using Mutual Hazard Networks. Bioinformatics 2020;36:241-249. [PMID: 31250881 PMCID: PMC6956791 DOI: 10.1093/bioinformatics/btz513] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/12/2018] [Revised: 03/29/2019] [Accepted: 06/25/2019] [Indexed: 12/26/2022] Open

Wang M, Yu T, Liu J, Chen L, Stromberg AJ, Villano JL, Arnold SM, Liu C, Wang C. A probabilistic method for leveraging functional annotations to enhance estimation of the temporal order of pathway mutations during carcinogenesis. BMC Bioinformatics 2019;20:620. [PMID: 31791231 PMCID: PMC6889196 DOI: 10.1186/s12859-019-3218-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2019] [Accepted: 11/12/2019] [Indexed: 12/02/2022] Open

Abstract

BACKGROUND

Cancer arises through accumulation of somatically acquired genetic mutations. An important question is to delineate the temporal order of somatic mutations during carcinogenesis, which contributes to better understanding of cancer biology and facilitates identification of new therapeutic targets. Although a number of statistical and computational methods have been proposed to estimate the temporal order of mutations, they do not account for the differences in the functional impacts of mutations and thus are likely to be obscured by the presence of passenger mutations that do not contribute to cancer progression. In addition, many methods infer the order of mutations at the gene level, which have limited power due to the low mutation rate in most genes.

RESULTS

In this paper, we develop a Probabilistic Approach for estimating the Temporal Order of Pathway mutations by leveraging functional Annotations of mutations (PATOPA). PATOPA infers the order of mutations at the pathway level, wherein it uses a probabilistic method to characterize the likelihood of mutational events from different pathways occurring in a certain order. The functional impact of each mutation is incorporated to weigh more on a mutation that is more integral to tumor development. A maximum likelihood method is used to estimate parameters and infer the probability of one pathway being mutated prior to another. Simulation studies and analysis of whole exome sequencing data from The Cancer Genome Atlas (TCGA) demonstrate that PATOPA is able to accurately estimate the temporal order of pathway mutations and provides new biological insights on carcinogenesis of colorectal and lung cancers.

CONCLUSIONS

PATOPA provides a useful tool to estimate temporal order of mutations at the pathway level while leveraging functional annotations of mutations.

Khakabimamaghani S, Ding D, Snow O, Ester M. Uncovering the subtype-specific temporal order of cancer pathway dysregulation. PLoS Comput Biol 2019;15:e1007451. [PMID: 31710622 PMCID: PMC6872169 DOI: 10.1371/journal.pcbi.1007451] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/07/2019] [Revised: 11/21/2019] [Accepted: 09/30/2019] [Indexed: 12/20/2022] Open

Diaz-Uriarte R, Vasallo C. Every which way? On predicting tumor evolution using cancer progression models. PLoS Comput Biol 2019;15:e1007246. [PMID: 31374072 PMCID: PMC6693785 DOI: 10.1371/journal.pcbi.1007246] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/28/2018] [Revised: 08/14/2019] [Accepted: 07/05/2019] [Indexed: 11/18/2022] Open

Abstract

Successful prediction of the likely paths of tumor progression is valuable for diagnostic, prognostic, and treatment purposes. Cancer progression models (CPMs) use cross-sectional samples to identify restrictions in the order of accumulation of driver mutations and thus CPMs encode the paths of tumor progression. Here we analyze the performance of four CPMs to examine whether they can be used to predict the true distribution of paths of tumor progression and to estimate evolutionary unpredictability. Employing simulations we show that if fitness landscapes are single peaked (have a single fitness maximum) there is good agreement between true and predicted distributions of paths of tumor progression when sample sizes are large, but performance is poor with the currently common much smaller sample sizes. Under multi-peaked fitness landscapes (i.e., those with multiple fitness maxima), performance is poor and improves only slightly with sample size. In all cases, detection regime (when tumors are sampled) is a key determinant of performance. Estimates of evolutionary unpredictability from the best performing CPM, among the four examined, tend to overestimate the true unpredictability and the bias is affected by detection regime; CPMs could be useful for estimating upper bounds to the true evolutionary unpredictability. Analysis of twenty-two cancer data sets shows low evolutionary unpredictability for several of the data sets. But most of the predictions of paths of tumor progression are very unreliable, and unreliability increases with the number of features analyzed. Our results indicate that CPMs could be valuable tools for predicting cancer progression but that, currently, obtaining useful predictions of paths of tumor progression from CPMs is dubious, and emphasize the need for methodological work that can account for the probably multi-peaked fitness landscapes in cancer.
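The abstract above does not say how evolutionary unpredictability is computed. One common choice, assumed here purely for illustration, is the Shannon entropy of the probability distribution over paths of tumor progression: a single certain path gives zero, and a uniform distribution over many paths gives the maximum. The function name `path_entropy` and the example probabilities are hypothetical.

```python
import math
from typing import Sequence


def path_entropy(path_probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a distribution over progression paths.

    Assumption: unpredictability is measured as entropy. The abstract does
    not state the definition, so this is an illustrative stand-in.
    """
    total = sum(path_probs)
    if total <= 0:
        raise ValueError("path probabilities must sum to a positive value")
    entropy = 0.0
    for p in path_probs:
        if p > 0:
            q = p / total  # normalize so the weights sum to one
            entropy -= q * math.log(q)
    return entropy


# One certain path: fully predictable.
certain = path_entropy([1.0])
# Four equally likely paths: maximally unpredictable for four paths.
uniform = path_entropy([0.25, 0.25, 0.25, 0.25])
```

Under this assumed definition, an upper bound on unpredictability for a given number of paths is the logarithm of that number.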

Mutational interactions define novel cancer subgroups. Nat Commun 2018;9:4353. [PMID: 30341300 PMCID: PMC6195543 DOI: 10.1038/s41467-018-06867-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 09/20/2017] [Accepted: 10/01/2018] [Indexed: 12/12/2022] Open

Ramazzotti D, Graudenzi A, Caravagna G, Antoniotti M. Modeling Cumulative Biological Phenomena with Suppes-Bayes Causal Networks. Evol Bioinform Online 2018;14:1176934318785167. [PMID: 30013303 PMCID: PMC6043942 DOI: 10.1177/1176934318785167] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/11/2018] [Accepted: 05/27/2018] [Indexed: 12/18/2022] Open

Hainke K, Szugat S, Fried R, Rahnenführer J. Variable selection for disease progression models: methods for oncogenetic trees and application to cancer and HIV. BMC Bioinformatics 2017;18:358. [PMID: 28764644 PMCID: PMC5539896 DOI: 10.1186/s12859-017-1762-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/16/2016] [Accepted: 07/14/2017] [Indexed: 12/12/2022] Open

Cristea S, Kuipers J, Beerenwinkel N. pathTiMEx: Joint Inference of Mutually Exclusive Cancer Pathways and Their Progression Dynamics. J Comput Biol 2017;24:603-615. [DOI: 10.1089/cmb.2016.0171] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 12/25/2022] Open

Schwartz R, Schäffer AA. The evolution of tumour phylogenetics: principles and practice. Nat Rev Genet 2017;18:213-229. [PMID: 28190876 PMCID: PMC5886015 DOI: 10.1038/nrg.2016.170] [Citation(s) in RCA: 142] [Impact Index Per Article: 20.3] [Indexed: 12/21/2022]

Catanzaro D, Shackney SE, Schäffer AA, Schwartz R. Classifying the Progression of Ductal Carcinoma from Single-Cell Sampled Data via Integer Linear Programming: A Case Study. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016;13:643-655. [PMID: 26353381 PMCID: PMC5217787 DOI: 10.1109/tcbb.2015.2476808] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]

Gertz EM, Chowdhury SA, Lee WJ, Wangsa D, Heselmeyer-Haddad K, Ried T, Schwartz R, Schäffer AA. FISHtrees 3.0: Tumor Phylogenetics Using a Ploidy Probe. PLoS One 2016;11:e0158569. [PMID: 27362268 PMCID: PMC4928784 DOI: 10.1371/journal.pone.0158569] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 01/29/2016] [Accepted: 06/19/2016] [Indexed: 01/03/2023] Open

Abstract

Advances in fluorescence in situ hybridization (FISH) make it feasible to detect multiple copy-number changes in hundreds of cells of solid tumors. Studies using FISH, sequencing, and other technologies have revealed substantial intra-tumor heterogeneity. The evolution of subclones in tumors may be modeled by phylogenies. Tumors often harbor aneuploid or polyploid cell populations. Using a FISH probe to estimate changes in ploidy can guide the creation of trees that model changes in ploidy and individual gene copy-number variations. We present FISHtrees 3.0, which implements a ploidy-based tree building method based on mixed integer linear programming (MILP). The ploidy-based modeling in FISHtrees includes a new formulation of the problem of merging trees for changes of a single gene into trees modeling changes in multiple genes and the ploidy. When multiple samples are collected from each patient, varying over time or tumor regions, it is useful to evaluate similarities in tumor progression among the samples. Therefore, we further implemented in FISHtrees 3.0 a new method to build consensus graphs for multiple samples. We validate FISHtrees 3.0 on simulated data, on FISH data from paired cases of cervical primary and metastatic tumors, and on paired breast ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Tests on simulated data show improved accuracy of the ploidy-based approach relative to prior ploidyless methods. Tests on real data further demonstrate novel insights these methods offer into tumor progression processes. Trees for DCIS samples are significantly less complex than trees for paired IDC samples. Consensus graphs show substantial divergence among most paired samples from both sets. Low consensus between DCIS and IDC trees may help explain the difficulty in finding biomarkers that predict which DCIS cases are at most risk to progress to IDC. The FISHtrees software is available at ftp://ftp.ncbi.nih.gov/pub/FISHtrees.

Beerenwinkel N, Greenman CD, Lagergren J. Computational Cancer Biology: An Evolutionary Perspective. PLoS Comput Biol 2016;12:e1004717. [PMID: 26845763 PMCID: PMC4742235 DOI: 10.1371/journal.pcbi.1004717] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Indexed: 01/19/2023] Open

Raphael BJ, Vandin F. Simultaneous inference of cancer pathways and tumor progression from cross-sectional mutation data. J Comput Biol 2015;22:510-27. [PMID: 25785493 DOI: 10.1089/cmb.2014.0161] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 01/02/2023] Open

Abstract

Recent cancer sequencing studies provide a wealth of somatic mutation data from a large number of patients. One of the most intriguing and challenging questions arising from this data is to determine whether the temporal order of somatic mutations in a cancer follows any common progression. Since we usually obtain only one sample from a patient, such inferences are commonly made from cross-sectional data from different patients. This analysis is complicated by the extensive variation in the somatic mutations across different patients, variation that is reduced by examining combinations of mutations in various pathways. Thus far, methods to reconstruct tumor progression at the pathway level have restricted attention to known, a priori defined pathways. In this work we show how to simultaneously infer pathways and the temporal order of their mutations from cross-sectional data, leveraging the exclusivity property of driver mutations within a pathway. We define the pathway linear progression model, and derive a combinatorial formulation for the problem of finding the optimal model from mutation data. We show that with enough samples the optimal solution to this problem uniquely identifies the correct model with high probability even when errors are present in the mutation data. We then formulate the problem as an integer linear program (ILP), which allows the analysis of datasets from recent studies with large numbers of samples. We use our algorithm to analyze somatic mutation data from three cancer studies, including two studies from The Cancer Genome Atlas (TCGA) with large numbers of samples on colorectal cancer and glioblastoma. The models reconstructed with our method capture most of the current knowledge of the progression of somatic mutations in these cancer types, while also providing new insights on the tumor progression at the pathway level.
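As a rough illustration of what scoring a pathway linear progression model could look like, the sketch below counts how many entries of a binary mutation matrix would have to be flipped so that every sample is consistent with a given ordered partition of genes into pathways. The flip-count cost, the function names, and the toy data are assumptions made for this example; the abstract does not specify the objective, and this brute-force scorer is not the ILP formulation itself.

```python
from typing import List, Sequence


def flips_for_sample(row: Sequence[int], pathways: Sequence[Sequence[int]]) -> int:
    """Minimum entry flips that make one sample consistent with the model.

    Assumed consistency rule: for some stage s, each of the first s pathways
    has exactly one mutated gene (exclusivity) and all later pathways have
    none (linear progression).
    """
    counts = [sum(row[g] for g in pathway) for pathway in pathways]
    # Cost of forcing a pathway to hold exactly one mutation.
    on_cost = [c - 1 if c >= 1 else 1 for c in counts]
    # Cost of forcing a pathway to hold no mutation.
    off_cost = counts
    suffix_off = sum(off_cost)
    prefix_on = 0
    best = suffix_off  # stage 0: no pathway mutated
    for k in range(len(pathways)):
        prefix_on += on_cost[k]
        suffix_off -= off_cost[k]
        best = min(best, prefix_on + suffix_off)
    return best


def model_cost(matrix: Sequence[Sequence[int]], pathways: Sequence[Sequence[int]]) -> int:
    """Total flips over all samples for one candidate ordered partition."""
    return sum(flips_for_sample(row, pathways) for row in matrix)


# Toy data: 4 samples (rows) by 4 genes (columns), values are 0/1 mutation calls.
samples: List[List[int]] = [
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
forward = model_cost(samples, [[0, 1], [2, 3]])   # genes 0,1 mutate first
backward = model_cost(samples, [[2, 3], [0, 1]])  # genes 2,3 mutate first
```

On this toy matrix the forward ordering needs fewer flips than the reversed one, which is the kind of comparison an optimizer over all ordered partitions would make at scale.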

Diaz-Uriarte R. Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling. BMC Bioinformatics 2015;16:41. [PMID: 25879190 PMCID: PMC4339747 DOI: 10.1186/s12859-015-0466-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 07/15/2014] [Accepted: 01/15/2015] [Indexed: 11/29/2022] Open

Abstract

BACKGROUND

Cancer progression is caused by the sequential accumulation of mutations, but not all orders of accumulation are equally likely. When the fixation of some mutations depends on the presence of previous ones, identifying restrictions in the order of accumulation of mutations can lead to the discovery of therapeutic targets and diagnostic markers. The purpose of this study is to conduct a comprehensive comparison of the performance of all available methods to identify these restrictions from cross-sectional data. I used simulated data sets (where the true restrictions are known) but, in contrast to previous work, I embedded restrictions within evolutionary models of tumor progression that included passengers (mutations not responsible for the development of cancer, known to be very common). This allowed me to assess, for the first time, the effects of having to filter out passengers, of sampling schemes (when, how, and how many samples), and of deviations from order restrictions.

RESULTS

Poor choices of method, filtering, and sampling lead to large errors in all performance measures. Having to filter passengers led to decreased performance, especially because true restrictions were missed. Overall, the best method for identifying order restrictions was Oncogenetic Trees, a fast and easy to use method that, although unable to recover dependencies of mutations on more than one mutation, showed good performance in most scenarios, superior to Conjunctive Bayesian Networks and Progression Networks. Single cell sampling provided no advantage, but sampling in the final stages of the disease vs. sampling at different stages had severe effects. Evolutionary model and deviations from order restrictions had major, and sometimes counterintuitive, interactions with other factors that affected performance.

CONCLUSIONS

This paper provides practical recommendations for using these methods with experimental data. It also identifies key areas of future methodological work and, in particular, it shows that it is both possible and necessary to embed assumptions about order restrictions and the nature of driver status within evolutionary models of cancer progression to evaluate the performance of inferential approaches.

Beerenwinkel N, Schwarz RF, Gerstung M, Markowetz F. Cancer evolution: mathematical models and computational inference. Syst Biol 2015;64:e1-25. [PMID: 25293804 PMCID: PMC4265145 DOI: 10.1093/sysbio/syu081] [Citation(s) in RCA: 201] [Impact Index Per Article: 22.3] [Received: 09/27/2013] [Accepted: 09/26/2014] [Indexed: 12/12/2022] Open

Affiliation(s)

Niko Beerenwinkel Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, United Kingdom
Roland F Schwarz Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, United Kingdom
Moritz Gerstung Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, United Kingdom
Florian Markowetz Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, United Kingdom

Chowdhury SA, Shackney SE, Heselmeyer-Haddad K, Ried T, Schäffer AA, Schwartz R. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics. PLoS Comput Biol 2014;10:e1003740. [PMID: 25078894 PMCID: PMC4117424 DOI: 10.1371/journal.pcbi.1003740] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 02/02/2014] [Accepted: 06/04/2014] [Indexed: 02/07/2023] Open

Abstract

We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome. The methods are designed for data collected by fluorescence in situ hybridization (FISH), an experimental technique especially well suited to characterizing intratumor heterogeneity using counts of probes to genetic regions frequently gained or lost in tumor development. Here, we develop new provably optimal methods for computing an edit distance between the copy number states of two cells given evolution by copy number changes of single probes, all probes on a chromosome, or all probes in the genome. We then apply this theory to develop a practical heuristic algorithm, implemented in publicly available software, for inferring tumor phylogenies on data from potentially hundreds of single cells by this evolutionary model. We demonstrate and validate the methods on simulated data and published FISH data from cervical cancers and breast cancers. Our computational experiments show that the new model and algorithm lead to more parsimonious trees than prior methods for single-tumor phylogenetics and to improved performance on various classification tasks, such as distinguishing primary tumors from metastases obtained from the same patient population.

Cancer is an evolutionary system whose growth and development is attributed to aberrations in well-known genes and to cancer-type specific genomic imbalances. Here, we present methods for reconstructing the evolution of individual tumors based on cell-to-cell variations between copy numbers of targeted regions of the genome. The methods are designed to work with fluorescence in situ hybridization (FISH), a technique that allows one to profile copy number changes in potentially thousands of single cells per study. Our work advances the prior art by developing theory and practical algorithms for building evolutionary trees of single tumors that can model gain or loss of genetic regions at the scale of single genes, whole chromosomes, or the entire genome, all common events in tumor evolution. We apply these methods on simulated and real tumor data to demonstrate substantial improvements in tree-building accuracy and in our ability to accurately classify tumors from their inferred evolutionary models. The newly developed algorithms have been released through our publicly available software, FISHtrees.
