1
Nair NU, Schäffer AA, Gertz EM, Cheng K, Zerbib J, Sahu AD, Leor G, Shulman ED, Aldape KD, Ben-David U, Ruppin E. Chromosome 7 to the rescue: overcoming chromosome 10 loss in gliomas. bioRxiv: the preprint server for biology 2024:2024.01.17.576103. [PMID: 38313282 PMCID: PMC10836086 DOI: 10.1101/2024.01.17.576103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
The co-occurrence of chromosome 10 loss and chromosome 7 gain in gliomas is the most frequent loss-gain co-aneuploidy pair in human cancers, a phenomenon that has been investigated without resolution since the late 1980s. Expanding beyond previous gene-centric studies, we investigate the co-occurrence in a genome-wide manner taking an evolutionary perspective. First, by mining large tumor aneuploidy data, we predict that the more likely order is 10 loss followed by 7 gain. Second, by analyzing extensive genomic and transcriptomic data from both patients and cell lines, we find that this co-occurrence can be explained by functional rescue interactions that are highly enriched on 7, which can possibly compensate for any detrimental consequences arising from the loss of 10. Finally, by analyzing transcriptomic data from normal, non-cancerous, human brain tissues, we provide a plausible reason why this co-occurrence happens preferentially in cancers originating in certain regions of the brain.
2
Rossi N, Gigante N, Vitacolonna N, Piazza C. Inferring Markov Chains to Describe Convergent Tumor Evolution With CIMICE. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:106-119. [PMID: 38015671 DOI: 10.1109/tcbb.2023.3337258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
The field of tumor phylogenetics focuses on studying the differences within cancer cell populations. The scientific community has made many efforts to build cancer progression models that help explain the heterogeneity of such diseases. These models depend heavily on the kind of data used to construct them; therefore, as experimental technologies evolve, it is of major importance to exploit their peculiarities. In this work we describe a cancer progression model based on single-cell DNA sequencing data. When constructing the model, we focus on tailoring the formalism to the specifics of the data. We define a minimal set of assumptions needed to reconstruct a flexible DAG-structured model capable of identifying progression beyond the limitations of the infinite sites assumption. Our proposal is conservative in the sense that we aim neither to discard knowledge nor to infer knowledge that is not represented in the data. We provide simulations and analytical results to show the features of our model, test it on real data, and show how it can be integrated with other approaches to cope with input noise. Moreover, our framework can be used to produce simulated data that follow our theoretical assumptions. Finally, we provide an open-source R implementation of our approach, called CIMICE, that is publicly available on Bioconductor.
3
Nurminen A, Jaatinen S, Taavitsainen S, Högnäs G, Lesluyes T, Ansari-Pour N, Tolonen T, Haase K, Koskenalho A, Kankainen M, Jasu J, Rauhala H, Kesäniemi J, Nikupaavola T, Kujala P, Rinta-Kiikka I, Riikonen J, Kaipia A, Murtola T, Tammela TL, Visakorpi T, Nykter M, Wedge DC, Van Loo P, Bova GS. Cancer origin tracing and timing in two high-risk prostate cancers using multisample whole genome analysis: prospects for personalized medicine. Genome Med 2023; 15:82. [PMID: 37828555 PMCID: PMC10571458 DOI: 10.1186/s13073-023-01242-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/13/2023] [Accepted: 10/02/2023] [Indexed: 10/14/2023]
Abstract
BACKGROUND Prostate cancer (PrCa) genomic heterogeneity causes resistance to therapies such as androgen deprivation. Such heterogeneity can be deciphered in the context of evolutionary principles, but current clinical trials do not include evolution as an essential feature. Whether or not analysis of genomic data in an evolutionary context in primary prostate cancer can provide unique added value in the research and clinical domains remains an open question. METHODS We used novel processing techniques to obtain whole genome data together with 3D anatomic and histomorphologic analysis in two men (GP5 and GP12) with high-risk PrCa undergoing radical prostatectomy. A total of 22 whole genome-sequenced sites (16 primary cancer foci and 6 lymph node metastatic) were analyzed using evolutionary reconstruction tools and spatio-evolutionary models. Probability models were used to trace spatial and chronological origins of the primary tumor and metastases, chart their genetic drivers, and distinguish metastatic and non-metastatic subclones. RESULTS In patient GP5, CDK12 inactivation was among the first mutations, leading to a PrCa tandem duplicator phenotype and initiating the cancer around age 50, followed by rapid cancer evolution after age 57, and metastasis around age 59, 5 years prior to prostatectomy. In patient GP12, accelerated cancer progression was detected after age 54, and metastasis occurred around age 56, 3 years prior to prostatectomy. Multiple metastasis-originating events were identified in each patient and tracked anatomically. Metastasis from prostate to lymph nodes occurred strictly ipsilaterally in all 12 detected events. In this pilot, metastatic subclone content analysis appears to substantially enhance the identification of key drivers. Evolutionary analysis' potential impact on therapy selection appears positive in these pilot cases. 
CONCLUSIONS PrCa evolutionary analysis allows tracking of anatomic site of origin, timing of cancer origin and spread, and distinction of metastatic-capable from non-metastatic subclones. This enables better identification of actionable targets for therapy. If extended to larger cohorts, it appears likely that similar analyses could add substantial biological insight and clinically relevant value.
Affiliation(s)
- Anssi Nurminen
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
- Serafiina Jaatinen
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
- Sinja Taavitsainen
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
- Gunilla Högnäs
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
- Tom Lesluyes
  - The Francis Crick Institute, London, NW1 1AT, UK
- Naser Ansari-Pour
  - MRC Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Teemu Tolonen
  - Fimlab Laboratories, Department of Pathology, Tampere University Hospital, Tampere, Finland
- Kerstin Haase
  - The Francis Crick Institute, London, NW1 1AT, UK
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität Zu Berlin, ECRC Experimental and Clinical Research Center, Berlin, Germany
- Antti Koskenalho
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
- Matti Kankainen
  - Institute for Molecular Medicine Finland, University of Helsinki, Tukholmankatu 8, Helsinki, 00290, Finland
- Juho Jasu
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
- Hanna Rauhala
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
- Jenni Kesäniemi
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
- Tiia Nikupaavola
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
- Paula Kujala
  - Fimlab Laboratories, Department of Pathology, Tampere University Hospital, Tampere, Finland
- Irina Rinta-Kiikka
  - Imaging Centre, Department of Radiology, Tampere University Hospital, Tampere, Finland
- Jarno Riikonen
  - Department of Urology, TAYS Cancer Center, Tampere University Hospital, Tampere, Finland
- Antti Kaipia
  - Department of Urology, TAYS Cancer Center, Tampere University Hospital, Tampere, Finland
- Teemu Murtola
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
  - Department of Urology, TAYS Cancer Center, Tampere University Hospital, Tampere, Finland
- Teuvo L Tammela
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
  - Department of Urology, TAYS Cancer Center, Tampere University Hospital, Tampere, Finland
- Tapio Visakorpi
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
  - Fimlab Laboratories, Department of Pathology, Tampere University Hospital, Tampere, Finland
- Matti Nykter
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
- David C Wedge
  - Manchester Cancer Research Centre, Division of Cancer Sciences, University of Manchester, Manchester, M20 4GJ, UK
- Peter Van Loo
  - The Francis Crick Institute, London, NW1 1AT, UK
  - Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
  - Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- G Steven Bova
  - Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, PO Box 100, 33014, Tampere, Finland
4
Moravec JC, Lanfear R, Spector DL, Diermeier SD, Gavryushkin A. Testing for Phylogenetic Signal in Single-Cell RNA-Seq Data. J Comput Biol 2023; 30:518-537. [PMID: 36475926 PMCID: PMC10125402 DOI: 10.1089/cmb.2022.0357] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Abstract
Phylogenetic methods are emerging as a useful tool to understand cancer evolutionary dynamics, including tumor structure, heterogeneity, and progression. Most currently used approaches utilize either bulk whole genome sequencing or single-cell DNA sequencing and are based on calling copy number alterations and single nucleotide variants (SNVs). Single-cell RNA sequencing (scRNA-seq) is commonly applied to explore differential gene expression of cancer cells throughout tumor progression. The method exacerbates the single-cell sequencing problem of low yield per cell with uneven expression levels. This accounts for low and uneven sequencing coverage and makes SNV detection and phylogenetic analysis challenging. In this article, we demonstrate for the first time that scRNA-seq data contain sufficient evolutionary signal and can also be utilized in phylogenetic analyses. We explore and compare results of such analyses based on both expression levels and SNVs called from scRNA-seq data. Both techniques are shown to be useful for reconstructing phylogenetic relationships between cells, reflecting the clonal composition of a tumor. Both standardized expression values and SNVs appear to be equally capable of reconstructing a similar pattern of phylogenetic relationships. This pattern is stable even when phylogenetic uncertainty is taken into account. Our results open up a new direction of somatic phylogenetics based on scRNA-seq data. Further research is required to refine and improve these approaches to capture the full picture of somatic evolutionary dynamics in cancer.
Affiliation(s)
- Jiří C. Moravec
  - Department of Computer Science, University of Otago, Dunedin, New Zealand
  - School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- Robert Lanfear
  - Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia
- Alex Gavryushkin
  - School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
5
Chen J. Timed hazard networks: Incorporating temporal difference for oncogenetic analysis. PLoS One 2023; 18:e0283004. [PMID: 36928529 PMCID: PMC10019724 DOI: 10.1371/journal.pone.0283004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 03/01/2023] [Indexed: 03/18/2023]
Abstract
Oncogenetic graphical models are crucial for understanding cancer progression by analyzing the accumulation of genetic events. These models are used to identify statistical dependencies and temporal order of genetic events, which helps design targeted therapies. However, existing algorithms do not account for temporal differences between samples in oncogenetic analysis. This paper introduces Timed Hazard Networks (TimedHN), a new statistical model that uses temporal differences to improve accuracy and reliability. TimedHN models the accumulation process as a continuous-time Markov chain and includes an efficient gradient computation algorithm for optimization. Our simulation experiments demonstrate that TimedHN outperforms current state-of-the-art graph reconstruction methods. We also compare TimedHN with existing methods on a luminal breast cancer dataset, highlighting its potential utility. The Matlab implementation and data are available at https://github.com/puar-playground/TimedHN.
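To make the continuous-time Markov chain view of event accumulation concrete, the following is a minimal Python sketch, not the TimedHN implementation (which the abstract says is in Matlab). It assumes a mutual-hazard-style parameterisation in which each event has a base hazard and already acquired events multiply the hazards of the remaining ones; the function name, parameters and numeric values are all hypothetical.

```python
import random

def simulate_accumulation(base_rates, effects, t_obs, rng):
    """Simulate one tumor's event set with a Gillespie-style CTMC.

    base_rates[i] : hazard of event i when no other event is present (assumed values)
    effects[j][i] : factor applied to the hazard of i once event j has occurred (assumed)
    t_obs         : time at which the tumor is sampled
    rng           : a random.Random instance
    """
    n = len(base_rates)
    acquired = set()
    t = 0.0
    while len(acquired) < n:
        # Current hazard of every event that has not happened yet.
        rates = {}
        for i in range(n):
            if i in acquired:
                continue
            rate = base_rates[i]
            for j in acquired:
                rate *= effects[j][i]
            rates[i] = rate
        total = sum(rates.values())
        if total <= 0.0:
            break  # no further event can occur
        # Waiting time to the next event is exponential with the total hazard.
        t += rng.expovariate(total)
        if t > t_obs:
            break  # the tumor was sampled before the next event
        # The next event is chosen with probability proportional to its hazard.
        nxt = rng.choices(list(rates), weights=list(rates.values()))[0]
        acquired.add(nxt)
    return acquired

# Hypothetical example: event 0 triples the hazard of event 1.
effects = [[1.0, 3.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
sample = simulate_accumulation([1.0, 0.2, 0.5], effects, t_obs=1.5, rng=random.Random(0))
```

Varying `t_obs` from sample to sample is one simple way to picture the "temporal differences between samples" that the abstract refers to.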
Affiliation(s)
- Jian Chen
  - Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY, United States of America
6
Pirkl M, Büch J, Devaux C, Böhm M, Sönnerborg A, Incardona F, Abecasis A, Vandamme AM, Zazzi M, Kaiser R, Lengauer T, The EuResist Network Study Group. Analysis of mutational history of multidrug-resistant genotypes with a mutagenetic tree model. J Med Virol 2023; 95:e28389. [PMID: 36484375 DOI: 10.1002/jmv.28389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Revised: 11/24/2022] [Accepted: 12/01/2022] [Indexed: 12/14/2022]
Abstract
Human immunodeficiency virus (HIV) can develop resistance to all antiretroviral drugs. Multidrug resistance, however, is a rare event in modern HIV treatment, but can be life-threatening, particularly in patients with very long therapy histories and in areas with limited access to novel drugs. To understand the evolution of multidrug resistance, we analyzed the EuResist database to uncover the accumulation of mutations over time. We hypothesize that resistance mutations are not acquired simultaneously and randomly across viral genotypes but rather tend to follow a predetermined order. The knowledge of this order might help to elucidate potential mechanisms of multidrug resistance. Our evolutionary model shows an almost monotonic increase of resistance with each acquired mutation, including less well-known nucleoside reverse transcriptase (RT) inhibitor-related mutations like K223Q, L228H, and Q242H. Mutations within the integrase (IN) (T97A, E138A/K, G140S, Q148H, N155H) indicate high probability of multidrug resistance. Hence, these IN mutations also tend to be observed together with mutations in the protease (PR) and RT. We followed up with an analysis of the mutation-specific error rates of our model given the data. We identified several mutations with unusual rates (PR: M41L, L33F, IN: G140S). This could imply the existence of previously unknown virus variants in the viral quasispecies. In conclusion, our bioinformatics model supports the analysis and understanding of multidrug resistance.
Affiliation(s)
- Martin Pirkl
  - Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
- Joachim Büch
  - Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
- Carole Devaux
  - Department of Infection and Immunity, Luxembourg Institute of Health, Strassen, Luxembourg
- Michael Böhm
  - Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
- Anders Sönnerborg
  - Department of Laboratory Medicine, Division of Clinical Microbiology, Karolinska Institute, Solna, Sweden
- Ana Abecasis
  - Center for Global Health and Tropical Medicine, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal
- Anne-Mieke Vandamme
  - Center for Global Health and Tropical Medicine, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal
  - Department of Microbiology, Immunology and Transplantation, Clinical and Epidemiological Virology, Institute for the Future, Rega Institute for Medical Research, KU Leuven, Leuven, Belgium
- Maurizio Zazzi
  - Department of Medical Biotechnologies, University of Siena, Siena, Italy
- Rolf Kaiser
  - Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
- Thomas Lengauer
  - Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
7
ToMExO: A probabilistic tree-structured model for cancer progression. PLoS Comput Biol 2022; 18:e1010732. [DOI: 10.1371/journal.pcbi.1010732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Revised: 12/15/2022] [Accepted: 11/14/2022] [Indexed: 12/12/2022]
Abstract
Identifying the interrelations among cancer driver genes and the patterns in which the driver genes get mutated is critical for understanding cancer. In this paper, we study cross-sectional data from cohorts of tumors to identify the cancer-type (or subtype) specific process in which the cancer driver genes accumulate critical mutations. We model this mutation accumulation process using a tree, where each node includes a driver gene or a set of driver genes. A mutation in each node enables its children to have a chance of mutating. This model simultaneously explains the mutual exclusivity patterns observed in mutations in specific cancer genes (by its nodes) and the temporal order of events (by its edges). We introduce a computationally efficient dynamic programming procedure for calculating the likelihood of our noisy datasets and use it to build our Markov Chain Monte Carlo (MCMC) inference algorithm, ToMExO. Together with a set of engineered MCMC moves, our fast likelihood calculations enable us to work with datasets with hundreds of genes and thousands of tumors, which cannot be dealt with using available cancer progression analysis methods. We demonstrate our method’s performance on several synthetic datasets covering various scenarios for cancer progression dynamics. Then, a comparison against two state-of-the-art methods on a moderate-size biological dataset shows the merits of our algorithm in identifying significant and valid patterns. Finally, we present our analyses of several large biological datasets, including colorectal cancer, glioblastoma, and pancreatic cancer. In all the analyses, we validate the results using a set of method-independent metrics testing the causality and significance of the relations identified by ToMExO or competing methods.
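The tree description above (nodes holding one or more driver genes, with a mutated node enabling its children) can be illustrated with a small generative sketch in Python. This is only a toy reading of the abstract, not ToMExO's likelihood computation or its MCMC moves, and it ignores observation noise. The tree, the firing probabilities and the choice of exactly one mutated gene per fired node are assumptions made for illustration.

```python
import random

# Hypothetical tree: node -> (genes in the node, firing probability, children).
# Gene names and probabilities are illustrative, not fitted values.
TREE = {
    "root": ((), 1.0, ["n1"]),
    "n1": (("APC",), 0.8, ["n2", "n3"]),
    "n2": (("KRAS", "BRAF"), 0.5, []),
    "n3": (("TP53",), 0.4, []),
}

def sample_tumor(tree, rng, node="root"):
    """Sample the set of mutated genes of one tumor from the tree."""
    genes, prob, children = tree[node]
    mutated = set()
    if node != "root":
        if rng.random() >= prob:
            # The node does not fire, so none of its descendants can fire either.
            return mutated
        # Exactly one gene of a fired node is mutated (mutual exclusivity).
        mutated.add(rng.choice(genes))
    for child in children:
        mutated |= sample_tumor(tree, rng, child)
    return mutated

rng = random.Random(1)
cohort = [sample_tumor(TREE, rng) for _ in range(1000)]
```

In such a simulated cohort, KRAS and BRAF never co-occur (they share a node) and neither appears without APC (their parent), which mirrors how nodes encode exclusivity and edges encode order.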
8
Gao Y, Gaither J, Chifman J, Kubatko L. A phylogenetic approach to inferring the order in which mutations arise during cancer progression. PLoS Comput Biol 2022; 18:e1010560. [PMID: 36459515 DOI: 10.1371/journal.pcbi.1010560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2021] [Revised: 12/14/2022] [Accepted: 09/12/2022] [Indexed: 12/05/2022]
Abstract
Although the role of evolutionary process in cancer progression is widely accepted, increasing attention is being given to the evolutionary mechanisms that can lead to differences in clinical outcome. Recent studies suggest that the temporal order in which somatic mutations accumulate during cancer progression is important. Single-cell sequencing (SCS) provides a unique opportunity to examine the effect that the mutation order has on cancer progression and treatment effect. However, the error rates associated with single-cell sequencing are known to be high, which greatly complicates the task. We propose a novel method for inferring the order in which somatic mutations arise within an individual tumor using noisy data from single-cell sequencing. Our method incorporates models at two levels in that the evolutionary process of somatic mutation within the tumor is modeled along with the technical errors that arise from the single-cell sequencing data collection process. Through analyses of simulations across a wide range of realistic scenarios, we show that our method substantially outperforms existing approaches for identifying mutation order. Most importantly, our method provides a unique means to capture and quantify the uncertainty in the inferred mutation order along a given phylogeny. We illustrate our method by analyzing data from colorectal and prostate cancer patients, in which our method strengthens previously reported mutation orders. Our work is an important step towards producing meaningful prediction of mutation order with high accuracy and measuring the uncertainty of predicted mutation order in cancer patients, with the potential to lead to new insights about the evolutionary trajectories of cancer.
Affiliation(s)
- Yuan Gao
  - Division of Biostatistics, The Ohio State University, Columbus, Ohio, United States of America
- Jeff Gaither
  - Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, United States of America
- Julia Chifman
  - Dept of Mathematics and Statistics, American University, Washington D. C., United States of America
- Laura Kubatko
  - Dept of Statistics, The Ohio State University, Columbus, Ohio, United States of America
  - Dept of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, Ohio, United States of America
9
Diaz-Uriarte R, Herrera-Nieto P. EvAM-Tools: tools for evolutionary accumulation and cancer progression models. Bioinformatics 2022; 38:5457-5459. [PMID: 36287062 PMCID: PMC9750106 DOI: 10.1093/bioinformatics/btac710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 10/03/2022] [Accepted: 10/25/2022] [Indexed: 12/25/2022]
Abstract
SUMMARY EvAM-Tools is an R package and web application that provides a unified interface to state-of-the-art cancer progression models and, more generally, evolutionary models of event accumulation. The output includes, in addition to the fitted models, the transition (and transition rate) matrices between genotypes and the probabilities of evolutionary paths. Generation of random cancer progression models is also available. Using the GUI in the web application, users can easily construct models (modifying directed acyclic graphs of restrictions, matrices of mutual hazards or specifying genotype composition), generate data from them (with user-specified observational/genotyping error) and analyze the data. AVAILABILITY AND IMPLEMENTATION Implemented in R and C; open source code available under the GNU Affero General Public License v3.0 at https://github.com/rdiaz02/EvAM-Tools. Docker images freely available from https://hub.docker.com/u/rdiaz02. Web app freely accessible at https://iib.uam.es/evamtools. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
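The relationship between a transition rate matrix, the transition probabilities between genotypes and the probability of an evolutionary path can be sketched in a few lines of Python. This does not use or reproduce the EvAM-Tools R API; it only applies the standard competing-exponentials rule for a continuous-time Markov chain, where the probability of jumping from genotype g to g' is the rate of that jump divided by the total rate out of g. The genotype labels and rates below are hypothetical.

```python
def jump_probabilities(rates):
    """Turn per-genotype transition rates into transition probabilities."""
    probs = {}
    for src, outgoing in rates.items():
        total = sum(outgoing.values())
        # A genotype with no outgoing rate is absorbing.
        probs[src] = {dst: r / total for dst, r in outgoing.items()} if total > 0 else {}
    return probs

def path_probability(probs, path):
    """Probability of following the given sequence of genotypes."""
    p = 1.0
    for src, dst in zip(path, path[1:]):
        p *= probs.get(src, {}).get(dst, 0.0)
    return p

# Hypothetical rates: from wild type (WT), event B is three times as fast as event A.
RATES = {"WT": {"A": 1.0, "B": 3.0}, "A": {"AB": 2.0}, "B": {"AB": 5.0}}
PROBS = jump_probabilities(RATES)
p_via_b = path_probability(PROBS, ["WT", "B", "AB"])
p_via_a = path_probability(PROBS, ["WT", "A", "AB"])
```

With these assumed rates, the two paths to the double mutant carry probabilities 0.75 and 0.25, which sum to one because every path ends in the single absorbing genotype.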
Affiliation(s)
- Pablo Herrera-Nieto
  - Department of Biochemistry, Universidad Autónoma de Madrid, Instituto de Investigaciones Biomédicas “Alberto Sols” (UAM-CSIC), Madrid, Spain
10
Jiang H, Li Q, Lin JT, Lin FC. Classification of disease recurrence using transition likelihoods with expectation-maximization algorithm. Stat Med 2022; 41:4697-4715. [PMID: 35908812 PMCID: PMC9489660 DOI: 10.1002/sim.9534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2021] [Revised: 05/17/2022] [Accepted: 07/10/2022] [Indexed: 11/09/2022]
Abstract
When an infectious disease recurs, it may be due to treatment failure or a new infection. Being able to distinguish and classify these two different outcomes is critical in effective disease control. A multi-state model based on Markov processes is a typical approach to estimating the transition probability between the disease states. However, it can perform poorly when the disease state is unknown. This article aims to demonstrate that the transition likelihoods of baseline covariates can distinguish one cause from another with high accuracy in infectious diseases such as malaria. A more general model for disease progression can be constructed to allow for additional disease outcomes. We start from a multinomial logit model to estimate the disease transition probabilities and then utilize the baseline covariate's transition information to provide a more accurate classification result. We apply the expectation-maximization (EM) algorithm to estimate unknown parameters, including the marginal probabilities of disease outcomes. A simulation study comparing our classifier to the existing two-stage method shows that our classifier has better accuracy, especially when the sample size is small. The proposed method is applied to determining relapse vs reinfection outcomes in two Plasmodium vivax treatment studies from Cambodia that used different genotyping approaches to demonstrate its practical use.
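As a reminder of what the multinomial logit starting point computes, here is a minimal Python sketch that maps a covariate vector to outcome probabilities. It is not the authors' classifier and omits the transition likelihoods and the EM step entirely; the outcome labels, covariates and coefficients are hypothetical, and the first outcome is treated as the reference category with all coefficients fixed at zero.

```python
import math

def multinomial_logit_probs(x, coefs):
    """Return outcome probabilities under a multinomial logit model.

    x     : covariate vector, with a leading 1.0 for the intercept
    coefs : one coefficient vector per non-reference outcome (assumed values)
    """
    # The reference outcome has linear score 0 by convention.
    scores = [0.0] + [sum(b * v for b, v in zip(beta, x)) for beta in coefs]
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical outcomes: [no recurrence, relapse, reinfection]; one covariate plus intercept.
probs = multinomial_logit_probs([1.0, 0.5], [[0.2, -1.0], [1.5, 0.3]])
```

The returned list always sums to one, and with all coefficients set to zero every outcome receives the same probability.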
Affiliation(s)
- Huijun Jiang
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Quefeng Li
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Jessica T. Lin
  - Division of Infectious Disease, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Feng-Chang Lin
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
11
Early Breast Cancer Evolution by Autosomal Broad Copy Number Alterations. Int J Genomics 2022; 2022:9332922. [PMID: 35252434 PMCID: PMC8896957 DOI: 10.1155/2022/9332922] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/26/2021] [Accepted: 02/08/2022] [Indexed: 12/13/2022]
Abstract
The availability of comprehensive genomic datasets across patient populations enables the application of novel methods for reconstructing tumor evolution within individual patients. To this end, we propose studying autosomal broad copy number alterations (CNAs) as a framework to better understand early tumor evolution. We compared the broad CNAs and somatic mutations of patients with 1 to 10 autosomal broad CNAs against the full set of patients, using data from The Cancer Genome Atlas breast cancer project. We reveal here that the frequency of a chromosome arm obtaining a broad CNA and a genome acquiring somatic mutations changes as autosomal broad CNAs accumulate. Therefore, we propose that the number of autosomal broad CNAs is an important characteristic of breast tumors that needs to be taken into consideration when studying breast tumors. To investigate this idea more in-depth, we next studied the frequency that specific chromosome arms acquire broad CNAs in patients with 1 to 10 broad CNAs. With this process, we identified the broad CNAs that exhibit the fastest rates of accumulation across all patients. This finding suggests a likely order of occurrence of these alterations in patients, which is apparent when we consider a subset of patients with few broad CNAs. Here, we lay the foundation for future studies to build upon our findings and use autosomal broad CNAs as a method to monitor breast tumor progression in vivo to further our understanding of how early tumor evolution unfolds.
12
Golas MM, Gunawan B, Cakir M, Cameron S, Enders C, Liersch T, Füzesi L, Sander B. Evolutionary patterns of chromosomal instability and mismatch repair deficiency in proximal and distal colorectal cancer. Colorectal Dis 2022; 24:157-176. [PMID: 34623739 DOI: 10.1111/codi.15946] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/31/2021] [Revised: 07/04/2021] [Accepted: 09/28/2021] [Indexed: 12/27/2022]
Abstract
AIM Colorectal carcinomas (CRCs) progress through heterogeneous pathways. The aim of this study was to analyse whether or not the cytogenetic evolution of CRC is linked to tumour site, level of chromosomal imbalance and metastasis. METHOD A set of therapy-naïve pT3 CRCs comprising 26 proximal and 49 distal pT3 CRCs was studied by combining immunohistochemistry of mismatch repair (MMR) proteins, microsatellite analyses and molecular karyotyping as well as clinical parameters. RESULTS An MMR-deficient/microsatellite-unstable (dMMR/MSI-H) status was associated with location of the primary tumour proximal to the splenic flexure, and dMMR/MSI-H tumours presented with significantly lower levels of chromosomal imbalances compared with MMR-proficient/microsatellite-stable (pMMR/MSS) tumours. Oncogenetic tree modelling suggested two evolutionary clusters characterized by dMMR/MSI-H and chromosomal instability (CIN), respectively, for both proximal and distal CRCs. In CIN cases, +13q, -18q and +20q were predicted as preferentially early events, and -1p, -4 and -5q as late events. Separate oncogenetic tree models of proximal and distal cases indicated similar early events independent of tumour site. However, in cases with high CIN defined by more than 10 copy number aberrations, loss of 17p occurred earlier in cytogenetic evolution than in cases showing low to moderate CIN. Differences in the oncogenetic trees were observed for CRCs with lymph node and distant metastasis. Loss of 8p was modelled as an early event in node-positive CRC, while +7p and +8q comprised early events in CRC with distant metastasis. CONCLUSION CRCs characterized by CIN follow multiple, interconnected genetic pathways in line with the basic 'Vogelgram' concept proposed for the progression of CRC that places the accumulation of genetic changes at the centre of tumour evolution. However, the timing of specific genetic events may favour metastatic potential.
Affiliation(s)
- Mariola Monika Golas
  - Department of Hematology and Medical Oncology, Comprehensive Cancer Center Augsburg, University Medical Center Augsburg, Augsburg, Germany
- Bastian Gunawan
  - Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
- Meliha Cakir
  - Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
- Silke Cameron
  - Department of Gastroenterology and Gastrointestinal Oncology, University Medical Center Göttingen, Göttingen, Germany
- Christina Enders
  - Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
- Torsten Liersch
  - Department of General, Visceral and Pediatric Surgery, University Medical Center Göttingen, Göttingen, Germany
- Laszlo Füzesi
  - Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
  - Institute of Pathology and Molecular Diagnostics, University Medical Center Augsburg, Augsburg, Germany
- Bjoern Sander
  - Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
  - Institute of Pathology, Hannover Medical School, Hannover, Germany
|
13
|
Angaroni F, Chen K, Damiani C, Caravagna G, Graudenzi A, Ramazzotti D. PMCE: efficient inference of expressive models of cancer evolution with high prognostic power. Bioinformatics 2022; 38:754-762. [PMID: 34647978 DOI: 10.1093/bioinformatics/btab717] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/23/2021] [Revised: 10/04/2021] [Accepted: 10/12/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Driver (epi)genomic alterations underlie the positive selection of cancer subpopulations, which promotes drug resistance and relapse. Even though substantial heterogeneity is witnessed in most cancer types, mutation accumulation patterns can be regularly found and can be exploited to reconstruct predictive models of cancer evolution. Yet, available methods cannot infer logical formulas connecting events to represent alternative evolutionary routes or convergent evolution. RESULTS We introduce PMCE, an expressive framework that leverages mutational profiles from cross-sectional sequencing data to infer probabilistic graphical models of cancer evolution including arbitrary logical formulas, and which outperforms the state-of-the-art in terms of accuracy and robustness to noise, on simulations. The application of PMCE to 7866 samples from the TCGA database allows us to identify a highly significant correlation between the predicted evolutionary paths and the overall survival in 7 tumor types, suggesting that our approach can effectively stratify cancer patients into reliable risk groups. AVAILABILITY AND IMPLEMENTATION PMCE is freely available at https://github.com/BIMIB-DISCo/PMCE, in addition to the code to replicate all the analyses presented in the manuscript. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fabrizio Angaroni
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan 20125, Italy
- Kevin Chen
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Chiara Damiani
  - Department of Biotechnology and Biosciences, University of Milan-Bicocca, Milan 20126, Italy
  - Sysbio Centre for Systems Biology, Milan 20100, Italy
- Giulio Caravagna
  - Department of Mathematics and Geosciences, University of Trieste, Trieste 34128, Italy
- Alex Graudenzi
  - Institute of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan 20054, Italy
  - Bicocca Bioinformatics, Biostatistics and Bioimaging Centre (B4), Milan 20100, Italy
- Daniele Ramazzotti
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
  - Department of Pathology, Stanford University, Stanford, CA 94305, USA
  - Department of Medicine and Surgery, University of Milan-Bicocca, Monza 20900, Italy
|
14
|
OUP accepted manuscript. Bioinformatics 2022; 38:i125-i133. [PMID: 35758777 PMCID: PMC9236577 DOI: 10.1093/bioinformatics/btac253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Cancer develops through a process of clonal evolution in which an initially healthy cell gives rise to progeny gradually differentiating through the accumulation of genetic and epigenetic mutations. These mutations can take various forms, including single-nucleotide variants (SNVs), copy number alterations (CNAs) or structural variations (SVs), with each variant type providing complementary insights into tumor evolution as well as offering distinct challenges to phylogenetic inference. RESULTS In this work, we develop a tumor phylogeny method, TUSV-ext, which incorporates SNVs, CNAs and SVs into a single inference framework. We demonstrate on simulated data that the method produces accurate tree inferences in the presence of all three variant types. We further demonstrate the method through application to real prostate tumor data, showing how our approach to coordinated phylogeny inference and clonal construction with all three variant types can reveal a more complicated clonal structure than is suggested by prior work, consistent with extensive polyclonal seeding or migration. AVAILABILITY AND IMPLEMENTATION https://github.com/CMUSchwartzLab/TUSV-ext. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
15
|
Diaz-Colunga J, Diaz-Uriarte R. Conditional prediction of consecutive tumor evolution using cancer progression models: What genotype comes next? PLoS Comput Biol 2021; 17:e1009055. [PMID: 34932572 PMCID: PMC8730404 DOI: 10.1371/journal.pcbi.1009055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/12/2021] [Revised: 01/05/2022] [Accepted: 11/25/2021] [Indexed: 12/13/2022] Open
Abstract
Accurate prediction of tumor progression is key for adaptive therapy and precision medicine. Cancer progression models (CPMs) can be used to infer dependencies in mutation accumulation from cross-sectional data and provide predictions of tumor progression paths. However, their performance when predicting complete evolutionary trajectories is limited by violations of assumptions and the size of available data sets. Instead of predicting full tumor progression paths, here we focus on short-term predictions, more relevant for diagnostic and therapeutic purposes. We examine whether five distinct CPMs can be used to answer the question "Given that a genotype with n mutations has been observed, what genotype with n + 1 mutations is next in the path of tumor progression?" or, shortly, "What genotype comes next?". Using simulated data we find that under specific combinations of genotype and fitness landscape characteristics CPMs can provide predictions of short-term evolution that closely match the true probabilities, and that some genotype characteristics can be much more relevant than global features. Application of these methods to 25 cancer data sets shows that their use is hampered by a lack of information needed to make principled decisions about method choice. Fruitful use of these methods for short-term predictions requires adapting method's use to local genotype characteristics and obtaining reliable indicators of performance; it will also be necessary to clarify the interpretation of the method's results when key assumptions do not hold.
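The "What genotype comes next?" question can be illustrated with a minimal sketch. It assumes, purely for illustration, that each not-yet-mutated gene has a fixed acquisition rate, so that the probability of each genotype with n + 1 mutations is its rate divided by the sum of the competing rates. The gene names and rate values are hypothetical, and the five CPMs examined in the paper derive their transition probabilities in their own, method-specific ways.

```python
from typing import Dict, FrozenSet

Genotype = FrozenSet[str]


def next_genotype_probs(current: Genotype,
                        rates: Dict[str, float]) -> Dict[Genotype, float]:
    """Return P(next genotype | current genotype) for all one-step successors.

    `rates` maps each gene to a hypothetical acquisition rate. Genes already
    present in `current` cannot be acquired again.
    """
    candidates = {gene: rate for gene, rate in rates.items()
                  if gene not in current}
    total = sum(candidates.values())
    if total == 0:
        return {}  # no further mutation is possible
    return {current | {gene}: rate / total
            for gene, rate in candidates.items()}


# Hypothetical example: KRAS is already mutated, TP53 and APC compete.
probs = next_genotype_probs(frozenset({"KRAS"}),
                            {"KRAS": 1.0, "TP53": 3.0, "APC": 1.0})
for genotype, p in probs.items():
    print(sorted(genotype), p)
```

In a real CPM the rates would depend on the current genotype (for example through dependency constraints), which is exactly why the local genotype characteristics mentioned in the abstract matter for prediction quality.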
Affiliation(s)
- Juan Diaz-Colunga
  - Department of Biochemistry, School of Medicine, Universidad Autónoma de Madrid, Madrid, Spain
  - Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (UAM-CSIC), Madrid, Spain
  - Department of Ecology & Evolutionary Biology and Microbial Sciences Institute, Yale University, New Haven, Connecticut, United States of America
- Ramon Diaz-Uriarte
  - Department of Biochemistry, School of Medicine, Universidad Autónoma de Madrid, Madrid, Spain
  - Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (UAM-CSIC), Madrid, Spain
|
16
|
Kwon BC, Anand V, Severson KA, Ghosh S, Sun Z, Frohnert BI, Lundgren M, Ng K. DPVis: Visual Analytics With Hidden Markov Models for Disease Progression Pathways. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3685-3700. [PMID: 32275600 DOI: 10.1109/tvcg.2020.2985689] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/11/2023]
Abstract
Clinical researchers use disease progression models to understand patient status and characterize progression patterns from longitudinal health records. One approach for disease progression modeling is to describe patient status using a small number of states that represent distinctive distributions over a set of observed measures. Hidden Markov models (HMMs) and their variants are a class of models that both discover these states and make inferences of health states for patients. Despite the advantages of using these algorithms for discovering interesting patterns, it remains challenging for medical experts to interpret model outputs, understand complex modeling parameters, and make clinical sense of the patterns. To tackle these problems, we conducted a design study with clinical scientists, statisticians, and visualization experts, with the goal of investigating disease progression pathways of chronic diseases, namely type 1 diabetes (T1D), Huntington's disease, Parkinson's disease, and chronic obstructive pulmonary disease (COPD). As a result, we introduce DPVis, which seamlessly integrates model parameters and outcomes of HMMs into interpretable and interactive visualizations. In this article, we demonstrate that DPVis is successful in evaluating disease progression models, visually summarizing disease states, interactively exploring disease progression patterns, and building, analyzing, and comparing clinically relevant patient subgroups.
|
17
|
Nicol PB, Coombes KR, Deaver C, Chkrebtii O, Paul S, Toland AE, Asiaee A. Oncogenetic network estimation with disjunctive Bayesian networks. COMPUTATIONAL AND SYSTEMS ONCOLOGY 2021. [DOI: 10.1002/cso2.1027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Affiliation(s)
- Kevin R. Coombes
  - Department of Biomedical Informatics, Ohio State University, Columbus, Ohio
- Courtney Deaver
  - Natural Sciences Division, Pepperdine University, Malibu, California
- Subhadeep Paul
  - Department of Statistics, Ohio State University, Columbus, Ohio
- Amanda E. Toland
  - Department of Cancer Biology and Genetics and Department of Internal Medicine, Division of Human Genetics, Comprehensive Cancer Center, Ohio State University, Columbus, Ohio
- Amir Asiaee
  - Mathematical Biosciences Institute, Ohio State University, Columbus, Ohio
|
18
|
Manrubia S, Cuesta JA, Aguirre J, Ahnert SE, Altenberg L, Cano AV, Catalán P, Diaz-Uriarte R, Elena SF, García-Martín JA, Hogeweg P, Khatri BS, Krug J, Louis AA, Martin NS, Payne JL, Tarnowski MJ, Weiß M. From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys Life Rev 2021; 38:55-106. [PMID: 34088608 DOI: 10.1016/j.plrev.2021.03.004] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 11/24/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022]
Abstract
Understanding how genotypes map onto phenotypes, fitness, and eventually organisms is arguably the next major missing piece in a fully predictive theory of evolution. We refer to this generally as the problem of the genotype-phenotype map. Though we are still far from achieving a complete picture of these relationships, our current understanding of simpler questions, such as the structure induced in the space of genotypes by sequences mapped to molecular structures, has revealed important facts that deeply affect the dynamical description of evolutionary processes. Empirical evidence supporting the fundamental relevance of features such as phenotypic bias is mounting as well, while the synthesis of conceptual and experimental progress leads to questioning current assumptions on the nature of evolutionary dynamics, with cancer progression models or synthetic biology approaches being notable examples. This work delves with a critical and constructive attitude into our current knowledge of how genotypes map onto molecular phenotypes and organismal functions, and discusses theoretical and empirical avenues to broaden and improve this comprehension. As a final goal, this community should aim at deriving an updated picture of evolutionary processes soundly relying on the structural properties of genotype spaces, as revealed by modern techniques of molecular and functional analysis.
Affiliation(s)
- Susanna Manrubia
  - Department of Systems Biology, Centro Nacional de Biotecnología (CSIC), Madrid, Spain
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- José A Cuesta
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain
  - Instituto de Biocomputación y Física de Sistemas Complejos (BiFi), Universidad de Zaragoza, Spain
  - UC3M-Santander Big Data Institute (IBiDat), Getafe, Madrid, Spain
- Jacobo Aguirre
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Centro de Astrobiología, CSIC-INTA, ctra. de Ajalvir km 4, 28850 Torrejón de Ardoz, Madrid, Spain
- Sebastian E Ahnert
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
  - The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
- Alejandro V Cano
  - Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pablo Catalán
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain
- Ramon Diaz-Uriarte
  - Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain
  - Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid, Spain
- Santiago F Elena
  - Instituto de Biología Integrativa de Sistemas, I2SysBio (CSIC-UV), València, Spain
  - The Santa Fe Institute, Santa Fe, NM, USA
- Paulien Hogeweg
  - Theoretical Biology and Bioinformatics Group, Utrecht University, the Netherlands
- Bhavin S Khatri
  - The Francis Crick Institute, London, UK
  - Department of Life Sciences, Imperial College London, London, UK
- Joachim Krug
  - Institute for Biological Physics, University of Cologne, Köln, Germany
- Ard A Louis
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK
- Nora S Martin
  - Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK
  - Sainsbury Laboratory, University of Cambridge, Cambridge, UK
- Joshua L Payne
  - Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marcel Weiß
  - Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK
  - Sainsbury Laboratory, University of Cambridge, Cambridge, UK
|
19
|
Haupt S, Zeilmann A, Ahadova A, Bläker H, von Knebel Doeberitz M, Kloor M, Heuveline V. Mathematical modeling of multiple pathways in colorectal carcinogenesis using dynamical systems with Kronecker structure. PLoS Comput Biol 2021; 17:e1008970. [PMID: 34003820 PMCID: PMC8162698 DOI: 10.1371/journal.pcbi.1008970] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/05/2020] [Revised: 05/28/2021] [Accepted: 04/16/2021] [Indexed: 01/02/2023] Open
Abstract
Like many other types of cancer, colorectal cancer (CRC) develops through multiple pathways of carcinogenesis. This is also true for colorectal carcinogenesis in Lynch syndrome (LS), the most common inherited CRC syndrome. However, a comprehensive understanding of the distribution of these pathways of carcinogenesis, which allows for tailored clinical treatment and even prevention, is still lacking. We suggest a linear dynamical system modeling the evolution of different pathways of colorectal carcinogenesis based on the involved driver mutations. The model consists of different components accounting for independent and dependent mutational processes. We define the driver gene mutation graphs and combine them using the Cartesian graph product. This leads to matrix components built by the Kronecker sum and product of the adjacency matrices of the gene mutation graphs, enabling a thorough mathematical analysis and medical interpretation. Using the Kronecker structure, we developed a mathematical model which we applied, as an example, to the three pathways of colorectal carcinogenesis in LS. Besides a pathogenic germline variant in one of the DNA mismatch repair (MMR) genes, driver mutations in APC, CTNNB1, KRAS and TP53 are considered. As examples, we incorporate mutational dependencies, such as increased point mutation rates after MMR deficiency, and, based on recent experimental data, biallelic somatic CTNNB1 mutations as common drivers of LS-associated CRCs. With the model and parameter choice, we obtained simulation results that are in concordance with clinical observations. These include the evolution of MMR-deficient crypts as early precursors in LS carcinogenesis and the influence of variants in MMR genes thereon. The proportions of MMR-deficient and MMR-proficient APC-inactivated crypts as a first measure for the distribution among the pathways in LS-associated colorectal carcinogenesis are compatible with clinical observations.
The approach provides a modular framework for modeling multiple pathways of carcinogenesis yielding promising results in concordance with clinical observations in LS CRCs.
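The link between the Cartesian graph product and the Kronecker sum mentioned in the abstract can be sketched in a few lines. For two graphs with adjacency matrices A and B, the adjacency matrix of their Cartesian product is A ⊗ I + I ⊗ B. The two-state "wild-type to mutated" gene graph below is an assumption made for brevity; the gene mutation graphs in the paper may have more states, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def identity(n: int) -> Matrix:
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: entry (i*p + k, j*q + l) equals a[i][j] * b[k][l]."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def kron_sum(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker sum a (+) b = a (x) I + I (x) b for square a and b."""
    left = kron(a, identity(len(b)))
    right = kron(identity(len(a)), b)
    return [[l + r for l, r in zip(row_l, row_r)]
            for row_l, row_r in zip(left, right)]


# Assumed toy gene graph: state 0 (wild-type) -> state 1 (mutated).
gene = [[0, 1],
        [0, 0]]

# Joint graph for two such genes; states are ordered 00, 01, 10, 11.
combined = kron_sum(gene, gene)
for row in combined:
    print(row)
```

The joint graph has one edge per single-gene change (00 to 01, 00 to 10, 01 to 11, 10 to 11), which is what makes this construction suitable for independent mutational processes; dependent processes need additional terms beyond the plain Kronecker sum.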
Affiliation(s)
- Saskia Haupt
  - Engineering Mathematics and Computing Lab (EMCL), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
  - Data Mining and Uncertainty Quantification (DMQ), Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Alexander Zeilmann
  - Image and Pattern Analysis Group (IPA), Heidelberg University, Heidelberg, Germany
- Aysel Ahadova
  - Department of Applied Tumor Biology (ATB), Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
- Hendrik Bläker
  - Institute of Pathology, University Hospital Leipzig, Leipzig, Germany
- Magnus von Knebel Doeberitz
  - Department of Applied Tumor Biology (ATB), Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
- Matthias Kloor
  - Department of Applied Tumor Biology (ATB), Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
- Vincent Heuveline
  - Engineering Mathematics and Computing Lab (EMCL), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
  - Data Mining and Uncertainty Quantification (DMQ), Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
|
20
|
Inferring tumor progression in large datasets. PLoS Comput Biol 2020; 16:e1008183. [PMID: 33035204 PMCID: PMC7577444 DOI: 10.1371/journal.pcbi.1008183] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/29/2020] [Revised: 10/21/2020] [Accepted: 07/22/2020] [Indexed: 12/31/2022] Open
Abstract
Identification of mutations of the genes that give cancer a selective advantage is an important step towards research and clinical objectives. As such, there has been a growing interest in developing methods for identification of driver genes and their temporal order within a single patient (intra-tumor) as well as across a cohort of patients (inter-tumor). In this paper, we develop a probabilistic model for tumor progression, in which the driver genes are clustered into several ordered driver pathways. We develop an efficient inference algorithm that exhibits favorable scalability to the number of genes and samples compared to a previously introduced ILP-based method. Adopting a probabilistic approach also allows principled approaches to model selection and uncertainty quantification. Using a large set of experiments on synthetic datasets, we demonstrate our superior performance compared to the ILP-based method. We also analyze two biological datasets of colorectal and glioblastoma cancers. We emphasize that while the ILP-based method puts many seemingly passenger genes in the driver pathways, our algorithm keeps focused on truly driver genes and outputs more accurate models for cancer progression.
Cancer is a disease caused by the accumulation of somatic mutations in the genome. This process is mainly driven by mutations in certain genes that give the harboring cells some selective advantage. The rather few driver genes are usually masked amongst an abundance of so-called passenger mutations. Identification of the driver genes and the temporal order in which the mutations occur is of great importance towards research and clinical objectives. In this paper, we introduce a probabilistic model for cancer progression and devise an efficient inference algorithm to train the model. We show that our method scales favorably to large datasets and provides superior performance compared to an ILP-based counterpart on a wide set of synthetic data simulations. Our Bayesian approach also allows for systematic model selection and confidence quantification procedures in contrast to the previous non-probabilistic progression models. We also study two large datasets on colorectal and glioblastoma cancers and validate our inferred model in comparison to the ILP-based method.
|
21
|
Schill R, Solbrig S, Wettig T, Spang R. Modelling cancer progression using Mutual Hazard Networks. Bioinformatics 2020; 36:241-249. [PMID: 31250881 PMCID: PMC6956791 DOI: 10.1093/bioinformatics/btz513] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/12/2018] [Revised: 03/29/2019] [Accepted: 06/25/2019] [Indexed: 12/26/2022] Open
Abstract
MOTIVATION Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. RESULTS Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by multiplicative effects they exert on the rates of successive events. MHN compared favourably to acyclic models in cross-validated model fit on four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations. AVAILABILITY AND IMPLEMENTATION Implementation and data are available at https://github.com/RudiSchill/MHN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
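The rate structure described above (a spontaneous fixation rate per event, multiplied by the effects of events already present) can be written down as a short sketch. It assumes a square parameter matrix `theta` whose diagonal entry `theta[i][i]` is the spontaneous rate of event i and whose off-diagonal entry `theta[i][j]` is the multiplicative effect of event j on event i. The numbers are made up for illustration, and this is not the reference implementation from the linked repository.

```python
from math import prod
from typing import Sequence


def mhn_rate(theta: Sequence[Sequence[float]],
             state: Sequence[int],
             i: int) -> float:
    """Rate at which event i fixates, given the binary tumour state.

    rate = theta[i][i] * product of theta[i][j] over all events j present.
    An event that has already occurred cannot occur again (rate 0).
    """
    if state[i]:
        return 0.0
    return theta[i][i] * prod(theta[i][j]
                              for j, present in enumerate(state) if present)


# Hypothetical 3-event network: event 0 promotes event 1 (factor 4.0),
# event 1 inhibits event 2 (factor 0.5).
theta = [
    [0.5, 1.0, 1.0],
    [4.0, 0.2, 1.0],
    [1.0, 0.5, 0.1],
]

print(mhn_rate(theta, [0, 0, 0], 1))  # spontaneous rate of event 1
print(mhn_rate(theta, [1, 0, 0], 1))  # promoted by event 0
print(mhn_rate(theta, [0, 1, 0], 2))  # inhibited by event 1
```

Because factors may be larger or smaller than one and need not be symmetric, the same parameterisation covers promotion, inhibition and cyclic interactions, which is the flexibility the abstract contrasts with acyclic models.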
Affiliation(s)
- Rudolf Schill
  - Department of Statistical Bioinformatics, Institute of Functional Genomics, Regensburg 93040, Germany
- Stefan Solbrig
  - Department of Physics, University of Regensburg, Regensburg 93040, Germany
- Tilo Wettig
  - Department of Physics, University of Regensburg, Regensburg 93040, Germany
- Rainer Spang
  - Department of Statistical Bioinformatics, Institute of Functional Genomics, Regensburg 93040, Germany
|
22
|
Abstract
BACKGROUND During cancer progression, malignant cells accumulate somatic mutations that can lead to genetic aberrations. In particular, evolutionary events akin to segmental duplications or deletions can alter the copy-number profile (CNP) of a set of genes in a genome. Our aim is to compute the evolutionary distance between two cells for which only CNPs are known. This asks for the minimum number of segmental amplifications and deletions to turn one CNP into another. This was recently formalized into a model where each event is assumed to alter a copy-number by 1 or -1, even though these events can affect large portions of a chromosome. RESULTS We propose a general cost framework where an event can modify the copy-number of a gene by larger amounts. We show that any cost scheme that allows segmental deletions of arbitrary length makes computing the distance strongly NP-hard. We then devise a factor 2 approximation algorithm for the problem when copy-numbers are non-zero and provide an implementation called cnp2cnp. We evaluate our approach experimentally by reconstructing simulated cancer phylogenies from the pairwise distances inferred by cnp2cnp and compare it against two other alternatives, namely the MEDICC distance and the Euclidean distance. CONCLUSIONS The experimental results show that our distance yields more accurate phylogenies on average than these alternatives if the given CNPs are error-free, but that the MEDICC distance is slightly more robust against error in the data. In all cases, our experiments show that either our approach or the MEDICC approach should be preferred over the Euclidean distance.
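To make the objects in this abstract concrete, the sketch below represents a CNP as a list of integers, applies a single segmental event to it, and computes the Euclidean baseline distance. It does not implement cnp2cnp or its approximation algorithm. The rule that a gene with zero copies stays at zero is an assumption commonly made in this family of models, and the function names are hypothetical.

```python
from math import sqrt
from typing import List


def apply_event(cnp: List[int], start: int, end: int, delta: int) -> List[int]:
    """Apply one segmental amplification (delta > 0) or deletion (delta < 0).

    Positions start..end (inclusive) are shifted by delta. Copy-numbers are
    clamped at 0, and a gene already at 0 is assumed lost for good.
    """
    return [max(c + delta, 0) if start <= i <= end and c > 0 else c
            for i, c in enumerate(cnp)]


def euclidean(a: List[int], b: List[int]) -> float:
    """Euclidean distance between two CNPs of equal length."""
    if len(a) != len(b):
        raise ValueError("CNPs must cover the same genes")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


source = [2, 1, 0, 3]
after_deletion = apply_event(source, 0, 2, -1)         # one event, three genes
after_gain = apply_event(after_deletion, 0, 3, +1)     # lost genes stay at 0

print(after_deletion, after_gain)
print(euclidean(source, after_deletion))
```

A single deletion here changes several genes at once, so a gene-by-gene measure such as the Euclidean distance and an event-counting distance can disagree about how far apart two profiles are.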
Affiliation(s)
- Manuel Lafond
  - Department of Computer Science, Université de Sherbrooke, Sherbrooke, Canada
|
23
|
HyperTraPS: Inferring Probabilistic Patterns of Trait Acquisition in Evolutionary and Disease Progression Pathways. Cell Syst 2020; 10:39-51.e10. [PMID: 31786211 DOI: 10.1016/j.cels.2019.10.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/03/2018] [Revised: 08/23/2019] [Accepted: 10/26/2019] [Indexed: 01/15/2023]
Abstract
The explosion of data throughout the biomedical sciences provides unprecedented opportunities to learn about the dynamics of evolution and disease progression, but harnessing these large and diverse datasets remains challenging. Here, we describe a highly generalizable statistical platform to infer the dynamic pathways by which many, potentially interacting, traits are acquired or lost over time. We use HyperTraPS (hypercubic transition path sampling) to efficiently learn progression pathways from cross-sectional, longitudinal, or phylogenetically linked data, readily distinguishing multiple competing pathways, and identifying the most parsimonious mechanisms underlying given observations. This Bayesian approach allows inclusion of prior knowledge, quantifies uncertainty in pathway structure, and allows predictions, such as which symptom a patient will acquire next. We provide visualization tools for intuitive assessment of multiple, variable pathways. We apply the method to ovarian cancer progression and the evolution of multidrug resistance in tuberculosis, demonstrating its power to reveal previously undetected dynamic pathways.
|
24
|
Wang M, Yu T, Liu J, Chen L, Stromberg AJ, Villano JL, Arnold SM, Liu C, Wang C. A probabilistic method for leveraging functional annotations to enhance estimation of the temporal order of pathway mutations during carcinogenesis. BMC Bioinformatics 2019; 20:620. [PMID: 31791231 PMCID: PMC6889196 DOI: 10.1186/s12859-019-3218-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2019] [Accepted: 11/12/2019] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Cancer arises through accumulation of somatically acquired genetic mutations. An important question is to delineate the temporal order of somatic mutations during carcinogenesis, which contributes to better understanding of cancer biology and facilitates identification of new therapeutic targets. Although a number of statistical and computational methods have been proposed to estimate the temporal order of mutations, they do not account for the differences in the functional impacts of mutations and thus are likely to be obscured by the presence of passenger mutations that do not contribute to cancer progression. In addition, many methods infer the order of mutations at the gene level, which have limited power due to the low mutation rate in most genes. RESULTS In this paper, we develop a Probabilistic Approach for estimating the Temporal Order of Pathway mutations by leveraging functional Annotations of mutations (PATOPA). PATOPA infers the order of mutations at the pathway level, wherein it uses a probabilistic method to characterize the likelihood of mutational events from different pathways occurring in a certain order. The functional impact of each mutation is incorporated to weigh more on a mutation that is more integral to tumor development. A maximum likelihood method is used to estimate parameters and infer the probability of one pathway being mutated prior to another. Simulation studies and analysis of whole exome sequencing data from The Cancer Genome Atlas (TCGA) demonstrate that PATOPA is able to accurately estimate the temporal order of pathway mutations and provides new biological insights on carcinogenesis of colorectal and lung cancers. CONCLUSIONS PATOPA provides a useful tool to estimate temporal order of mutations at the pathway level while leveraging functional annotations of mutations.
Affiliation(s)
- Menghan Wang
  - Department of Statistics, University of Kentucky, Lexington, USA
- Tianxin Yu
  - Department of Molecular & Cellular Biology, Roswell Park Comprehensive Cancer Center, Buffalo, USA
- Jinpeng Liu
  - Markey Cancer Center, University of Kentucky, Lexington, USA
- Li Chen
  - Markey Cancer Center, University of Kentucky, Lexington, USA
  - Department of Biostatistics, University of Kentucky, Lexington, USA
- John L. Villano
  - Markey Cancer Center, University of Kentucky, Lexington, USA
  - Department of Internal Medicine, University of Kentucky, Lexington, USA
- Susanne M. Arnold
  - Markey Cancer Center, University of Kentucky, Lexington, USA
  - Department of Internal Medicine, University of Kentucky, Lexington, USA
- Chunming Liu
  - Markey Cancer Center, University of Kentucky, Lexington, USA
  - Department of Molecular & Cellular Biochemistry, University of Kentucky, Lexington, USA
- Chi Wang
  - Markey Cancer Center, University of Kentucky, Lexington, USA
  - Department of Biostatistics, University of Kentucky, Lexington, USA
|
25
|
Khakabimamaghani S, Ding D, Snow O, Ester M. Uncovering the subtype-specific temporal order of cancer pathway dysregulation. PLoS Comput Biol 2019; 15:e1007451. [PMID: 31710622 PMCID: PMC6872169 DOI: 10.1371/journal.pcbi.1007451] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/07/2019] [Revised: 11/21/2019] [Accepted: 09/30/2019] [Indexed: 12/20/2022]
Abstract
Cancer is driven by genetic mutations that dysregulate pathways important for proper cell function. Therefore, discovering these cancer pathways and their dysregulation order is key to understanding and treating cancer. However, the heterogeneity of mutations between different individuals makes this challenging and requires that cancer progression is studied in a subtype-specific way. To address this challenge, we provide a mathematical model, called Subtype-specific Pathway Linear Progression Model (SPM), that simultaneously captures cancer subtypes and pathways and order of dysregulation of the pathways within each subtype. Experiments with synthetic data indicate the robustness of SPM to problem specifics including noise compared to an existing method. Moreover, experimental results on glioblastoma multiforme and colorectal adenocarcinoma show the consistency of SPM's results with the existing knowledge and its superiority to an existing method in certain cases. The implementation of our method is available at https://github.com/Dalton386/SPM.
Affiliation(s)
- Dujian Ding
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Oliver Snow
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Martin Ester
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
|
26
|
Eaton J, Wang J, Schwartz R. Deconvolution and phylogeny inference of structural variations in tumor genomic samples. Bioinformatics 2019; 34:i357-i365. [PMID: 29950001 PMCID: PMC6022719 DOI: 10.1093/bioinformatics/bty270] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/27/2022]
Abstract
Motivation Phylogenetic reconstruction of tumor evolution has emerged as a crucial tool for making sense of the complexity of emerging cancer genomic datasets. Despite the growing use of phylogenetics in cancer studies, though, the field has only slowly adapted to many ways that tumor evolution differs from classic species evolution. One crucial question in that regard is how to handle inference of structural variations (SVs), which are a major mechanism of evolution in cancers but have been largely neglected in tumor phylogenetics to date, in part due to the challenges of reliably detecting and typing SVs and interpreting them phylogenetically. Results We present a novel method for reconstructing evolutionary trajectories of SVs from bulk whole-genome sequence data via joint deconvolution and phylogenetics, to infer clonal sub-populations and reconstruct their ancestry. We establish a novel likelihood model for joint deconvolution and phylogenetic inference on bulk SV data and formulate an associated optimization algorithm. We demonstrate the approach to be efficient and accurate for realistic scenarios of SV mutation on simulated data. Application to breast cancer genomic data from The Cancer Genome Atlas shows it to be practical and effective at reconstructing features of SV-driven evolution in single tumors. Availability and implementation Python source code and associated documentation are available at https://github.com/jaebird123/tusv.
Affiliation(s)
- Jesse Eaton
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Jingyi Wang
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Russell Schwartz
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
|
27
|
Yu XJ, Chen G, Yang J, Yu GC, Zhu PF, Jiang ZK, Feng K, Lu Y, Bao B, Zhong FM. Smoking alters the evolutionary trajectory of non-small cell lung cancer. Exp Ther Med 2019; 18:3315-3324. [PMID: 31602204 PMCID: PMC6777332 DOI: 10.3892/etm.2019.7958] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/20/2018] [Accepted: 05/16/2019] [Indexed: 12/14/2022]
Abstract
Smoking is the biggest risk factor for lung cancer. Smokers have a much higher chance of developing lung tumors with a worse survival rate; however, non-smokers also develop lung tumors. A number of questions remain including the underlying difference between smoker and non-smoker lung cancer patients and the involvement of genetic and epigenetic processes in tumor development. The present study analyzed the mutation data of 100 non-small cell lung cancer (NSCLC) patients, 12 non-smokers, 48 ex-smokers and 40 smokers, from Tracking Non-Small Cell Lung Cancer Evolution through Therapy Consortium. A total of 68 genes exhibited different mutation patterns across non-smokers, ex-smokers and smokers. A number of these 68 genes encode membrane proteins with biological regulation, metabolic process, and response to stimulus functions. For each group of patients, the top 10 most frequently mutated genes were selected and their oncogenetic tree inferred, which reflected how the genes evolve during tumor genesis. By comparing the oncogenetic trees of non-smokers and smokers, it was identified that in non-smokers, the mutation of epidermal growth factor receptor (EGFR) was an early genetic alteration event and EGFR was the key driver, but in smokers, the mutation of titin (TTN) was more important. Based on network analysis, TTN can interact with spectrin α erythrocytic 1 through calmodulin 2 and troponin C1. These genetic differences during tumorigenesis of non-smoker and smoker lung cancer patients provided novel insights into the effects of smoking on the evolutionary trajectory of non-small cell lung cancer and may prove helpful for targeted therapy of different lung cancer subtypes.
Affiliation(s)
- Xiao-Jun Yu
  - Department of Thoracic Surgery, The First People's Hospital of Fuyang Hangzhou, Hangzhou, Zhejiang 311400, P.R. China
- Gang Chen
  - Department of Thoracic Surgery, Hangzhou Red Cross Hospital, Hangzhou, Zhejiang 310003, P.R. China
- Jun Yang
  - Department of Thoracic Surgery, Hangzhou Red Cross Hospital, Hangzhou, Zhejiang 310003, P.R. China
- Guo-Can Yu
  - Department of Thoracic Surgery, Hangzhou Red Cross Hospital, Hangzhou, Zhejiang 310003, P.R. China
- Peng-Fei Zhu
  - Department of Thoracic Surgery, Hangzhou Red Cross Hospital, Hangzhou, Zhejiang 310003, P.R. China
- Zheng-Ke Jiang
  - Department of Surgery, Hangzhou Fuyang Hospital of Traditional Chinese Medicine, Hangzhou, Zhejiang 311400, P.R. China
- Kan Feng
  - Department of Thoracic Surgery, The First People's Hospital of Fuyang Hangzhou, Hangzhou, Zhejiang 311400, P.R. China
- Yong Lu
  - Department of Thoracic Surgery, The First People's Hospital of Fuyang Hangzhou, Hangzhou, Zhejiang 311400, P.R. China
- Bin Bao
  - Department of Thoracic Surgery, The First People's Hospital of Fuyang Hangzhou, Hangzhou, Zhejiang 311400, P.R. China
- Fang-Ming Zhong
  - Department of Thoracic Surgery, Hangzhou Red Cross Hospital, Hangzhou, Zhejiang 310003, P.R. China
|
28
|
Diaz-Uriarte R, Vasallo C. Every which way? On predicting tumor evolution using cancer progression models. PLoS Comput Biol 2019; 15:e1007246. [PMID: 31374072 PMCID: PMC6693785 DOI: 10.1371/journal.pcbi.1007246] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/28/2018] [Revised: 08/14/2019] [Accepted: 07/05/2019] [Indexed: 11/18/2022]
Abstract
Successful prediction of the likely paths of tumor progression is valuable for diagnostic, prognostic, and treatment purposes. Cancer progression models (CPMs) use cross-sectional samples to identify restrictions in the order of accumulation of driver mutations and thus CPMs encode the paths of tumor progression. Here we analyze the performance of four CPMs to examine whether they can be used to predict the true distribution of paths of tumor progression and to estimate evolutionary unpredictability. Employing simulations we show that if fitness landscapes are single peaked (have a single fitness maximum) there is good agreement between true and predicted distributions of paths of tumor progression when sample sizes are large, but performance is poor with the currently common much smaller sample sizes. Under multi-peaked fitness landscapes (i.e., those with multiple fitness maxima), performance is poor and improves only slightly with sample size. In all cases, detection regime (when tumors are sampled) is a key determinant of performance. Estimates of evolutionary unpredictability from the best performing CPM, among the four examined, tend to overestimate the true unpredictability and the bias is affected by detection regime; CPMs could be useful for estimating upper bounds to the true evolutionary unpredictability. Analysis of twenty-two cancer data sets shows low evolutionary unpredictability for several of the data sets. But most of the predictions of paths of tumor progression are very unreliable, and unreliability increases with the number of features analyzed. Our results indicate that CPMs could be valuable tools for predicting cancer progression but that, currently, obtaining useful predictions of paths of tumor progression from CPMs is dubious, and emphasize the need for methodological work that can account for the probably multi-peaked fitness landscapes in cancer.
Affiliation(s)
- Ramon Diaz-Uriarte
  - Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain
  - Instituto de Investigaciones Biomédicas “Alberto Sols” (UAM-CSIC), Madrid, Spain
- Claudia Vasallo
  - Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain
  - Instituto de Investigaciones Biomédicas “Alberto Sols” (UAM-CSIC), Madrid, Spain
|
29
|
Aguse N, Qi Y, El-Kebir M. Summarizing the solution space in tumor phylogeny inference by multiple consensus trees. Bioinformatics 2019; 35:i408-i416. [PMID: 31510657 PMCID: PMC6612807 DOI: 10.1093/bioinformatics/btz312] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 01/17/2023]
Abstract
MOTIVATION Cancer phylogenies are key to studying tumorigenesis and have clinical implications. Due to the heterogeneous nature of cancer and limitations in current sequencing technology, current cancer phylogeny inference methods identify a large solution space of plausible phylogenies. To facilitate further downstream analyses, methods that accurately summarize such a set T of cancer phylogenies are imperative. However, current summary methods are limited to a single consensus tree or graph and may miss important topological features that are present in different subsets of candidate trees. RESULTS We introduce the Multiple Consensus Tree (MCT) problem to simultaneously cluster T and infer a consensus tree for each cluster. We show that MCT is NP-hard, and present an exact algorithm based on mixed integer linear programming (MILP). In addition, we introduce a heuristic algorithm that efficiently identifies high-quality consensus trees, recovering all optimal solutions identified by the MILP in simulated data at a fraction of the time. We demonstrate the applicability of our methods on both simulated and real data, showing that our approach selects the number of clusters depending on the complexity of the solution space T. AVAILABILITY AND IMPLEMENTATION https://github.com/elkebir-group/MCT. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nuraini Aguse
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Yuanyuan Qi
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Mohammed El-Kebir
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
|
30
|
Diaz-Uriarte R. Cancer progression models and fitness landscapes: a many-to-many relationship. Bioinformatics 2018; 34:836-844. [PMID: 29048486 PMCID: PMC6031050 DOI: 10.1093/bioinformatics/btx663] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/04/2017] [Accepted: 10/17/2017] [Indexed: 11/13/2022]
Abstract
Motivation The identification of constraints, due to gene interactions, in the order of accumulation of mutations during cancer progression can allow us to single out therapeutic targets. Cancer progression models (CPMs) use genotype frequency data from cross-sectional samples to identify these constraints, and return Directed Acyclic Graphs (DAGs) of restrictions where arrows indicate dependencies or constraints. On the other hand, fitness landscapes, which map genotypes to fitness, contain all possible paths of tumor progression. Thus, we expect a correspondence between DAGs from CPMs and the fitness landscapes where evolution happened. But many fitness landscapes (e.g. those with reciprocal sign epistasis) cannot be represented by CPMs. Results Using simulated data under 500 fitness landscapes, I show that CPMs' performance (prediction of genotypes that can exist) degrades with reciprocal sign epistasis. There is large variability in the DAGs inferred from each landscape, which is also affected by mutation rate, detection regime and fitness landscape features, in ways that depend on CPM method. Using three cancer datasets, I show that these problems strongly affect the analysis of empirical data: fitness landscapes that are widely different from each other produce data similar to the empirically observed ones and lead to DAGs that infer very different restrictions. Because reciprocal sign epistasis can be common in cancer, these results question the use and interpretation of CPMs. Availability and implementation Code available from Supplementary Material. Contact ramon.diaz@iib.uam.es. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ramon Diaz-Uriarte
  - Department of Biochemistry, Universidad Autónoma de Madrid, Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid 28029, Spain
|
31
|
Ramazzotti D, Graudenzi A, Caravagna G, Antoniotti M. Modeling Cumulative Biological Phenomena with Suppes-Bayes Causal Networks. Evol Bioinform Online 2018; 14:1176934318785167. [PMID: 30013303 PMCID: PMC6043942 DOI: 10.1177/1176934318785167] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/11/2018] [Accepted: 05/27/2018] [Indexed: 12/18/2022]
Abstract
Several diseases related to cell proliferation are characterized by the accumulation of somatic DNA changes, with respect to wild-type conditions. Cancer and HIV are 2 common examples of such diseases, where the mutational load in the cancerous/viral population increases over time. In these cases, selective pressures are often observed along with competition, co-operation, and parasitism among distinct cellular clones. Recently, we presented a mathematical framework to model these phenomena, based on a combination of Bayesian inference and Suppes’ theory of probabilistic causation, depicted in graphical structures dubbed Suppes-Bayes Causal Networks (SBCNs). The SBCNs are generative probabilistic graphical models that recapitulate the potential ordering of accumulation of such DNA changes during the progression of the disease. Such models can be inferred from data by exploiting likelihood-based model selection strategies with regularization. In this article, we discuss the theoretical foundations of our approach and we investigate in depth the influence on the model selection task of (1) the poset based on Suppes’ theory and (2) different regularization strategies. Furthermore, we provide an example of application of our framework to HIV genetic data highlighting the valuable insights provided by the inferred SBCN.
Affiliation(s)
- Alex Graudenzi
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Marco Antoniotti
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
|
32
|
Sun Y, Yao J, Yang L, Chen R, Nowak NJ, Goodison S. Computational approach for deriving cancer progression roadmaps from static sample data. Nucleic Acids Res 2017; 45:e69. [PMID: 28108658 PMCID: PMC5436003 DOI: 10.1093/nar/gkx003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/14/2016] [Accepted: 01/07/2017] [Indexed: 12/26/2022]
Abstract
As with any biological process, cancer development is inherently dynamic. While major efforts continue to catalog the genomic events associated with human cancer, it remains difficult to interpret and extrapolate the accumulating data to provide insights into the dynamic aspects of the disease. Here, we present a computational strategy that enables the construction of a cancer progression model using static tumor sample data. The developed approach overcame many technical limitations of existing methods. Application of the approach to breast cancer data revealed a linear, branching model with two distinct trajectories for malignant progression. The validity of the constructed model was demonstrated in 27 independent breast cancer data sets, and through visualization of the data in the context of disease progression we were able to identify a number of potentially key molecular events in the advance of breast cancer to malignancy.
Affiliation(s)
- Yijun Sun
  - Department of Microbiology and Immunology
  - Department of Computer Science and Engineering
  - Department of Biostatistics, The State University of New York, Buffalo, NY 14203, USA
  - Department of Biochemistry, The State University of New York, Buffalo, NY 14203, USA
- Jin Yao
  - Department of Microbiology and Immunology
- Le Yang
  - Department of Computer Science and Engineering
- Runpu Chen
  - Department of Computer Science and Engineering
- Norma J Nowak
  - Department of Bioinformatics and Biostatistics, Roswell Park Cancer Institute, Buffalo, NY 14201, USA
- Steve Goodison
  - Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL 32224, USA
|
33
|
Hainke K, Szugat S, Fried R, Rahnenführer J. Variable selection for disease progression models: methods for oncogenetic trees and application to cancer and HIV. BMC Bioinformatics 2017; 18:358. [PMID: 28764644 PMCID: PMC5539896 DOI: 10.1186/s12859-017-1762-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/16/2016] [Accepted: 07/14/2017] [Indexed: 12/12/2022]
Abstract
Background Disease progression models are important for understanding the critical steps during the development of diseases. The models are imbedded in a statistical framework to deal with random variations due to biology and the sampling process when observing only a finite population. Conditional probabilities are used to describe dependencies between events that characterise the critical steps in the disease process. Many different model classes have been proposed in the literature, from simple path models to complex Bayesian networks. A popular and easy to understand but yet flexible model class are oncogenetic trees. These have been applied to describe the accumulation of genetic aberrations in cancer and HIV data. However, the number of potentially relevant aberrations is often by far larger than the maximal number of events that can be used for reliably estimating the progression models. Still, there are only a few approaches to variable selection, which have not yet been investigated in detail. Results We fill this gap and propose specifically for oncogenetic trees ten variable selection methods, some of these being completely new. We compare them in an extensive simulation study and on real data from cancer and HIV. It turns out that the preselection of events by clique identification algorithms performs best. Here, events are selected if they belong to the largest or the maximum weight subgraph in which all pairs of vertices are connected. Conclusions The variable selection method of identifying cliques finds both the important frequent events and those related to disease pathways. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1762-1) contains supplementary material, which is available to authorized users.
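The clique criterion mentioned in the abstract (select events that belong to the largest subgraph in which all pairs of vertices are connected) can be illustrated with a minimal sketch. The abstract does not say how edges between events are defined, so the edge list here is assumed to be given; the function name `largest_clique`, the event labels, and the brute-force search are illustrative assumptions, not the authors' implementation, and the search is only practical for a handful of events.

```python
from itertools import combinations


def largest_clique(events, edges):
    """Return one largest set of events in which every pair is connected.

    `events` is a list of event labels and `edges` a list of label pairs
    (both hypothetical inputs). Brute force: try subsets from largest to
    smallest and return the first one that is fully connected.
    """
    edge_set = {frozenset(e) for e in edges}
    for size in range(len(events), 0, -1):
        for subset in combinations(events, size):
            if all(frozenset(p) in edge_set for p in combinations(subset, 2)):
                return set(subset)
    return set()


# Hypothetical example: A, B, C are pairwise connected; D is linked only to C.
events = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
selected = largest_clique(events, edges)
```

In this toy graph the preselected events would be A, B and C; the maximum weight variant named in the abstract would additionally need vertex weights, which this sketch omits.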
Affiliation(s)
- Katrin Hainke
  - Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
- Sebastian Szugat
  - Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
- Roland Fried
  - Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
- Jörg Rahnenführer
  - Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
|
34
|
Montazeri H, Kuipers J, Kouyos R, Böni J, Yerly S, Klimkait T, Aubert V, Günthard HF, Beerenwinkel N. Large-scale inference of conjunctive Bayesian networks. Bioinformatics 2017; 32:i727-i735. [PMID: 27587695 DOI: 10.1093/bioinformatics/btw459] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
Abstract
The continuous time conjunctive Bayesian network (CT-CBN) is a graphical model for analyzing the waiting time process of the accumulation of genetic changes (mutations). CT-CBN models have been successfully used in several biological applications such as HIV drug resistance development and genetic progression of cancer. However, current approaches for parameter estimation and network structure learning of CBNs can only deal with a small number of mutations (<20). Here, we address this limitation by presenting an efficient and accurate approximate inference algorithm using a Monte Carlo expectation-maximization algorithm based on importance sampling. The new method can now be used for a large number of mutations, up to one thousand, an increase by two orders of magnitude. In simulation studies, we present the accuracy as well as the running time efficiency of the new inference method and compare it with an MLE method, expectation-maximization, and discrete time CBN model, i.e. a first-order approximation of the CT-CBN model. We also study the application of the new model on HIV drug resistance datasets for the combination therapy with zidovudine plus lamivudine (AZT + 3TC) as well as under no treatment, both extracted from the Swiss HIV Cohort Study database. AVAILABILITY AND IMPLEMENTATION The proposed method is implemented as an R package available at https://github.com/cbg-ethz/MC-CBN CONTACT: niko.beerenwinkel@bsse.ethz.ch SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hesam Montazeri
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Roger Kouyos
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
  - Institute of Medical Virology
- Jürg Böni
  - Swiss National Center for Retroviruses, Institute of Medical Virology, University of Zurich, Zurich 8057, Switzerland
- Sabine Yerly
  - Laboratory of Virology, Division of Infectious Diseases, Geneva University Hospital, Geneva, Switzerland
- Thomas Klimkait
  - Department of Biomedicine-Petersplatz, University of Basel, Basel, Switzerland
- Vincent Aubert
  - Division of Immunology and Allergy, University Hospital Lausanne, Lausanne, Switzerland
- Huldrych F Günthard
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
  - Institute of Medical Virology
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
|
35
|
Cristea S, Kuipers J, Beerenwinkel N. pathTiMEx: Joint Inference of Mutually Exclusive Cancer Pathways and Their Progression Dynamics. J Comput Biol 2017; 24:603-615. [DOI: 10.1089/cmb.2016.0171] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 12/25/2022]
Affiliation(s)
- Simona Cristea
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - The Swiss Institute of Bioinformatics, Basel, Switzerland
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - The Swiss Institute of Bioinformatics, Basel, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - The Swiss Institute of Bioinformatics, Basel, Switzerland
|
36
|
Abstract
Rapid advances in high-throughput sequencing and a growing realization of the importance of evolutionary theory to cancer genomics have led to a proliferation of phylogenetic studies of tumour progression. These studies have yielded not only new insights but also a plethora of experimental approaches, sometimes reaching conflicting or poorly supported conclusions. Here, we consider this body of work in light of the key computational principles underpinning phylogenetic inference, with the goal of providing practical guidance on the design and analysis of scientifically rigorous tumour phylogeny studies. We survey the range of methods and tools available to the researcher, their key applications, and the various unsolved problems, closing with a perspective on the prospects and broader implications of this field.
Affiliation(s)
- Russell Schwartz
  - Department of Biological Sciences and Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15217, USA
- Alejandro A Schäffer
  - Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20892, USA
|
37
|
Abstract
Epistasis is a key concept in the theory of adaptation. Indicators of epistasis are of interest for large systems where systematic fitness measurements may not be possible. Some recent approaches depend on information theory. We show that considering shared entropy for pairs of loci can be misleading. The reason is that shared entropy does not imply epistasis for the pair. This observation holds true also in the absence of higher order epistasis. We discuss a method for reducing the number of false positives. However, our main conclusion is that entropy-based approaches have serious limitations in this context. Some recent approaches for identifying epistasis from sequence data depend on information theory. We show that considering shared entropy for pairs of loci can be misleading. The reason is that shared entropy does not imply epistasis for the pair. This observation holds true also in the absence of higher order epistasis. We discuss a method for reducing the number of false positives in the proposed method. However, our main conclusion is that shared entropy for pairs of loci is difficult to interpret. Gene frequencies reflect interactions in the entire system, and there is no natural way to decompose frequency data.
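For readers unfamiliar with the quantity discussed above, the following minimal sketch computes a pairwise statistic of this kind, assuming that "shared entropy" refers to the mutual information between two loci estimated from sampled genotypes. The function name `shared_entropy` and the input format (binary genotype strings) are illustrative assumptions, not taken from the paper. The sketch only computes the statistic; as the abstract stresses, a nonzero value does not by itself imply epistasis for that pair.

```python
from collections import Counter
from math import log2


def shared_entropy(genotypes, i, j):
    """Mutual information (in bits) between loci i and j.

    `genotypes` is a non-empty list of equal-length strings over
    {'0', '1'}, one per sampled genotype (hypothetical input format).
    """
    n = len(genotypes)
    pair_counts = Counter((g[i], g[j]) for g in genotypes)
    i_counts = Counter(g[i] for g in genotypes)
    j_counts = Counter(g[j] for g in genotypes)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = i_counts[a] / n
        p_b = j_counts[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical samples: loci always co-vary in the first set,
# and are statistically independent in the second.
linked = ["00", "11", "00", "11"]
independent = ["00", "01", "10", "11"]
mi_linked = shared_entropy(linked, 0, 1)
mi_independent = shared_entropy(independent, 0, 1)
```

With these toy inputs the linked pair shares one full bit and the independent pair shares none, which shows that the statistic reflects genotype frequencies only.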
Affiliation(s)
- Kristina Crona
  - American University, Washington, D.C., United States of America
|
38
|
Wu H, Gao L, Kasabov NK. Network-Based Method for Inferring Cancer Progression at the Pathway Level from Cross-Sectional Mutation Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:1036-1044. [PMID: 26915128 DOI: 10.1109/tcbb.2016.2520934] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
Large-scale cancer genomics projects are providing a wealth of somatic mutation data from a large number of cancer patients. However, it is difficult to obtain several samples with a temporal order from one patient in evaluating the cancer progression. Therefore, one of the most challenging problems arising from the data is to infer the temporal order of mutations across many patients. To solve the problem efficiently, we present a Network-based method (NetInf) to Infer cancer progression at the pathway level from cross-sectional data across many patients, leveraging on the exclusive property of driver mutations within a pathway and the property of linear progression between pathways. To assess the robustness of NetInf, we apply it on simulated data with the addition of different levels of noise. To verify the performance of NetInf, we apply it to analyze somatic mutation data from three real cancer studies with large number of samples. Experimental results reveal that the pathways detected by NetInf show significant enrichment. Our method reduces computational complexity by constructing gene networks without assigning the number of pathways, which also provides new insights on the temporal order of somatic mutations at the pathway level rather than at the gene level.
|
39
|
Gertz EM, Chowdhury SA, Lee WJ, Wangsa D, Heselmeyer-Haddad K, Ried T, Schwartz R, Schäffer AA. FISHtrees 3.0: Tumor Phylogenetics Using a Ploidy Probe. PLoS One 2016; 11:e0158569. [PMID: 27362268 PMCID: PMC4928784 DOI: 10.1371/journal.pone.0158569] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 01/29/2016] [Accepted: 06/19/2016] [Indexed: 01/03/2023]
Abstract
Advances in fluorescence in situ hybridization (FISH) make it feasible to detect multiple copy-number changes in hundreds of cells of solid tumors. Studies using FISH, sequencing, and other technologies have revealed substantial intra-tumor heterogeneity. The evolution of subclones in tumors may be modeled by phylogenies. Tumors often harbor aneuploid or polyploid cell populations. Using a FISH probe to estimate changes in ploidy can guide the creation of trees that model changes in ploidy and individual gene copy-number variations. We present FISHtrees 3.0, which implements a ploidy-based tree building method based on mixed integer linear programming (MILP). The ploidy-based modeling in FISHtrees includes a new formulation of the problem of merging trees for changes of a single gene into trees modeling changes in multiple genes and the ploidy. When multiple samples are collected from each patient, varying over time or tumor regions, it is useful to evaluate similarities in tumor progression among the samples. Therefore, we further implemented in FISHtrees 3.0 a new method to build consensus graphs for multiple samples. We validate FISHtrees 3.0 on simulated data and on FISH data from paired cases of cervical primary and metastatic tumors and on paired breast ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Tests on simulated data show improved accuracy of the ploidy-based approach relative to prior ploidyless methods. Tests on real data further demonstrate novel insights these methods offer into tumor progression processes. Trees for DCIS samples are significantly less complex than trees for paired IDC samples. Consensus graphs show substantial divergence among most paired samples from both sets. Low consensus between DCIS and IDC trees may help explain the difficulty in finding biomarkers that predict which DCIS cases are at most risk to progress to IDC. The FISHtrees software is available at ftp://ftp.ncbi.nih.gov/pub/FISHtrees.
Collapse
MESH Headings
- Biomarkers, Tumor/genetics
- Breast Neoplasms/genetics
- Breast Neoplasms/pathology
- Carcinoma, Ductal, Breast/genetics
- Carcinoma, Ductal, Breast/pathology
- Carcinoma, Intraductal, Noninfiltrating/genetics
- Carcinoma, Intraductal, Noninfiltrating/pathology
- Databases, Genetic
- Female
- Humans
- In Situ Hybridization, Fluorescence/methods
- Ploidies
- Uterine Cervical Neoplasms/genetics
- Uterine Cervical Neoplasms/pathology
Affiliation(s)
- E. Michael Gertz
- Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Salim Akhter Chowdhury
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States of America
- Carnegie Mellon/University of Pittsburgh Joint Ph.D. Program in Computational Biology, Pittsburgh, PA, United States of America
- Woei-Jyh Lee
- Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Darawalee Wangsa
- Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Kerstin Heselmeyer-Haddad
- Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Thomas Ried
- Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Russell Schwartz
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States of America
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, United States of America
- Alejandro A. Schäffer
- Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, United States of America
|
40
|
Algorithmic methods to infer the evolutionary trajectories in cancer progression. Proc Natl Acad Sci U S A 2016; 113:E4025-34. [PMID: 27357673 DOI: 10.1073/pnas.1520213113] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Indexed: 12/13/2022] Open
Abstract
The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the "selective advantage" relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications because it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc's ability to reproduce much of the current knowledge on colorectal cancer progression as well as to suggest novel experimentally verifiable hypotheses.
|
41
|
The Occurrence of Genetic Alterations during the Progression of Breast Carcinoma. BIOMED RESEARCH INTERNATIONAL 2016; 2016:5237827. [PMID: 27190992 PMCID: PMC4848409 DOI: 10.1155/2016/5237827] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/04/2015] [Accepted: 03/28/2016] [Indexed: 11/24/2022]
Abstract
The relationship between the genetic variations that arise during carcinoma development and the order in which they occur is not completely understood. Interpreting the mechanisms of copy number variation (CNV) is essential for understanding the etiology of genetic disorders. An oncogenetic tree is a special phylogenetic tree that gives an inferential pictorial representation of oncogenesis. In the present study, we constructed an oncogenetic tree to model the occurrence of genetic and cytogenetic alterations in human breast cancer. The oncogenetic tree model was built on CNV of the ErbB2, AKT2, KRAS, PIK3CA, PTEN, and CCND1 genes in 963 tumors with sequencing and CNA data of human breast cancer from TCGA. Results from the oncogenetic tree model indicate that ErbB2 copy number variation is a frequent early event in human breast cancer. The oncogenetic tree model based on the phylogenetic tree is a type of mathematical model that may eventually provide a better way to understand the process of oncogenesis.
|
42
|
Beerenwinkel N, Greenman CD, Lagergren J. Computational Cancer Biology: An Evolutionary Perspective. PLoS Comput Biol 2016; 12:e1004717. [PMID: 26845763 PMCID: PMC4742235 DOI: 10.1371/journal.pcbi.1004717] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Indexed: 01/19/2023] Open
Affiliation(s)
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Chris D. Greenman
- School of Computing Sciences, University of East Anglia, Norwich, United Kingdom
- Jens Lagergren
- Science for Life Laboratory, School of Computer Science and Communication, Swedish E-Science Research Center, KTH Royal Institute of Technology, Solna, Sweden
|
43
|
Chowdhury SA, Gertz EM, Wangsa D, Heselmeyer-Haddad K, Ried T, Schäffer AA, Schwartz R. Inferring models of multiscale copy number evolution for single-tumor phylogenetics. Bioinformatics 2015; 31:i258-67. [PMID: 26072490 PMCID: PMC4481700 DOI: 10.1093/bioinformatics/btv233] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 12/31/2022] Open
Abstract
Motivation: Phylogenetic algorithms have begun to see widespread use in cancer research to reconstruct processes of evolution in tumor progression. Developing reliable phylogenies for tumor data requires quantitative models of cancer evolution that include the unusual genetic mechanisms by which tumors evolve, such as chromosome abnormalities, and allow for heterogeneity between tumor types and individual patients. Previous work on inferring phylogenies of single tumors by copy number evolution assumed models of uniform rates of genomic gain and loss across different genomic sites and scales, a substantial oversimplification necessitated by a lack of algorithms and quantitative parameters for fitting to more realistic tumor evolution models. Results: We propose a framework for inferring models of tumor progression from single-cell gene copy number data, including variable rates for different gain and loss events. We propose a new algorithm for identification of most parsimonious combinations of single gene and single chromosome events. We extend it via dynamic programming to include genome duplications. We implement an expectation maximization (EM)-like method to estimate mutation-specific and tumor-specific event rates concurrently with tree reconstruction. Application of our algorithms to real cervical cancer data identifies key genomic events in disease progression consistent with prior literature. Classification experiments on cervical and tongue cancer datasets lead to improved prediction accuracy for the metastasis of primary cervical cancers and for tongue cancer survival. Availability and implementation: Our software (FISHtrees) and two datasets are available at ftp://ftp.ncbi.nlm.nih.gov/pub/FISHtrees. Contact: russells@andrew.cmu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Salim Akhter Chowdhury
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA; Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA; Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA; and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- E Michael Gertz
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA; Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA; Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA; and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Darawalee Wangsa
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA; Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA; Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA; and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Kerstin Heselmeyer-Haddad
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA; Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA; Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA; and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Thomas Ried
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA; Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA; Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA; and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Alejandro A Schäffer
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA; Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA; Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA; and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Russell Schwartz
- Joint Carnegie Mellon/University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA; Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, USA; Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, USA; and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
|
44
|
Abstract
Mathematical modelling approaches have become increasingly abundant in cancer research. The complexity of cancer is well suited to quantitative approaches as it provides challenges and opportunities for new developments. In turn, mathematical modelling contributes to cancer research by helping to elucidate mechanisms and by providing quantitative predictions that can be validated. The recent expansion of quantitative models addresses many questions regarding tumour initiation, progression and metastases as well as intra-tumour heterogeneity, treatment responses and resistance. Mathematical models can complement experimental and clinical studies, but also challenge current paradigms, redefine our understanding of mechanisms driving tumorigenesis and shape future research in cancer biology.
Affiliation(s)
- Philipp M Altrock
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 450 Brookline Avenue, Boston, Massachusetts 02115, USA
- Program for Evolutionary Dynamics, Harvard University, 1 Brattle Square, Suite 6, Cambridge, Massachusetts 02138, USA
- Lin L Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 450 Brookline Avenue, Boston, Massachusetts 02115, USA
- Franziska Michor
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 450 Brookline Avenue, Boston, Massachusetts 02115, USA
|
45
|
Masecchia S, Coco S, Barla A, Verri A, Tonini GP. Genome instability model of metastatic neuroblastoma tumorigenesis by a dictionary learning algorithm. BMC Med Genomics 2015; 8:57. [PMID: 26358114 PMCID: PMC4566396 DOI: 10.1186/s12920-015-0132-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 01/09/2015] [Accepted: 08/28/2015] [Indexed: 12/21/2022] Open
Abstract
Background: Metastatic neuroblastoma (NB) occurs in pediatric patients as stage 4S or stage 4 and is characterized by heterogeneous clinical behavior associated with diverse genotypes. Stage 4 tumors contain several structural copy number aberrations (CNAs) rarely found in stage 4S. To date, NB tumorigenesis has not been elucidated, although it is evident that genomic instability plays a critical role in the genesis of the tumor. Here we propose a mathematical approach to decipher genomic data and we provide a new model of NB metastatic tumorigenesis. Method: We elucidate NB tumorigenesis using the Enhanced Fused Lasso Latent Feature Model (E-FLLat), modeling the array comparative chromosome hybridization (aCGH) data of 190 metastatic NBs (63 stage 4S and 127 stage 4). This model for aCGH segmentation, based on the minimization of functional dictionary learning (DL), combines several penalties tailored to the specificities of aCGH data. In DL, the original signal is approximated by a linear weighted combination of atoms: the elements of the learned dictionary. Results: The hierarchical structure for stage 4S shows, at the first level of the oncogenetic tree, several whole chromosome gains, apart from the unbalanced gains of 17q, 2p and 2q. Conversely, the high CNA complexity found in stage 4 tumors requires two different trees. Both stage 4 oncogenetic trees are markedly divergent, with up to five sublevels, and the 17q gain is the most common event at the first level (2/3 nodes). Moreover, the 11q deletion, one of the major unfavorable markers of disease progression, occurs before 3p loss, indicating that critical chromosome aberrations appear at early stages of tumorigenesis. Finally, we also observed a significant (p = 0.025) association between patient age and chromosome loss in stage 4 cases.
Conclusion: These results led us to propose a progressive genome instability model in which NB cells begin with DNA synthesis uncoupled from cell division, leading either to stage 4S tumors, primarily characterized by numerical aberrations, or to stage 4 tumors with high levels of genome instability resulting in complex chromosome rearrangements associated with high tumor aggressiveness and rapid disease progression.
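The dictionary learning statement in the Method section above (a signal approximated by a linear weighted combination of atoms) can be illustrated with a minimal sketch. The atoms, weights, and function names below are hypothetical toy values chosen for illustration; the sketch shows only the approximation step and does not implement E-FLLat, its penalties, or the learning of the dictionary itself.

```python
# Hypothetical toy dictionary: three piecewise-constant atoms over six probes.
atoms = [
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],  # event spanning the first three probes
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],  # event spanning the last three probes
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],  # focal event in the middle
]


def reconstruct(weights, atoms):
    """Return the linear weighted combination sum_k weights[k] * atoms[k]."""
    n_probes = len(atoms[0])
    return [
        sum(w * atom[i] for w, atom in zip(weights, atoms))
        for i in range(n_probes)
    ]


def squared_error(signal, approximation):
    """Sum of squared differences between a signal and its approximation."""
    return sum((s - a) ** 2 for s, a in zip(signal, approximation))


# Hypothetical per-sample weights: one coefficient per atom.
sample_weights = [0.5, -0.5, 1.0]
approx = reconstruct(sample_weights, atoms)
```

In a full dictionary learning method both the atoms and the weights would be fitted by minimizing a reconstruction error such as `squared_error` plus regularization terms; here they are fixed by hand.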
Affiliation(s)
- Simona Coco
- Lung Cancer Unit; IRCCS A.O.U. San Martino - IST, Genova, Italy.
- Annalisa Barla
- DIBRIS, Università degli Studi di Genova, Genova, Italy.
- Gian Paolo Tonini
- Neuroblastoma Laboratory, Onco/Hematology Laboratory, Department of Woman and Child Health, University of Padua, Pediatric Research Institute, Fondazione Città della Speranza, Padua, Corso Stati Uniti, 4, 35127, Padua, Italy.
|
46
|
Wangsa D, Chowdhury SA, Ryott M, Gertz EM, Elmberger G, Auer G, Åvall Lundqvist E, Küffer S, Ströbel P, Schäffer AA, Schwartz R, Munck-Wikland E, Ried T, Heselmeyer-Haddad K. Phylogenetic analysis of multiple FISH markers in oral tongue squamous cell carcinoma suggests that a diverse distribution of copy number changes is associated with poor prognosis. Int J Cancer 2015; 138:98-109. [PMID: 26175310 DOI: 10.1002/ijc.29691] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 10/30/2014] [Revised: 04/21/2015] [Accepted: 06/19/2015] [Indexed: 12/31/2022]
Abstract
Oral tongue squamous cell carcinoma (OTSCC) is associated with poor prognosis. To improve prognostication, we analyzed four gene probes (TERC, CCND1, EGFR and TP53) and the centromere probe CEP4 as a marker of chromosomal instability, using fluorescence in situ hybridization (FISH) in single cells from the tumors of sixty-five OTSCC patients (Stage I, n = 15; Stage II, n = 30; Stage III, n = 7; Stage IV, n = 13). Unsupervised hierarchical clustering of the FISH data distinguished three clusters related to smoking status. Copy number increases of all five markers were found to be correlated to non-smoking habits, while smokers in this cohort had low-level copy number gains. Using the phylogenetic modeling software FISHtrees, we constructed models of tumor progression for each patient based on the four gene probes. Then, we derived test statistics on the models that are significant predictors of disease-free and overall survival, independent of tumor stage and smoking status in multivariate analysis. The patients whose tumors were modeled as progressing by a more diverse distribution of copy number changes across the four genes have poorer prognosis. This is consistent with the view that multiple genetic pathways need to become deregulated in order for cancer to progress.
Affiliation(s)
- Darawalee Wangsa
- Genetics Branch, Center For Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD; Department of Oncology-Pathology, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
- Salim Akhter Chowdhury
- Joint Carnegie Mellon/University of Pittsburgh Ph.D. Program In Computational Biology, Carnegie Mellon University, Pittsburgh, PA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA
- Michael Ryott
- Department of Otorhinolaryngology, Sophiahemmet Hospital, Stockholm, Sweden
- E Michael Gertz
- Computational Biology Branch, National Center For Biotechnology Information, National Institutes of Health, Bethesda, MD
- Göran Elmberger
- Department of Laboratory Medicine, Pathology, Örebro University Hospital, Örebro, Sweden
- Gert Auer
- Department of Oncology-Pathology, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
- Elisabeth Åvall Lundqvist
- Department of Oncology-Pathology, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden; Department of Oncology And Department Of Clinical And Experimental Medicine, Linköping University, Linköping, Sweden
- Stefan Küffer
- Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
- Philipp Ströbel
- Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
- Alejandro A Schäffer
- Computational Biology Branch, National Center For Biotechnology Information, National Institutes of Health, Bethesda, MD
- Russell Schwartz
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA; Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA
- Eva Munck-Wikland
- Department of Oto-Rhino-Laryngology, Head And Neck Surgery, Karolinska University Hospital and Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Thomas Ried
- Genetics Branch, Center For Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Kerstin Heselmeyer-Haddad
- Genetics Branch, Center For Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
|
47
|
Roman T, Nayyeri A, Fasy BT, Schwartz R. A simplicial complex-based approach to unmixing tumor progression data. BMC Bioinformatics 2015; 16:254. [PMID: 26264682 PMCID: PMC4534068 DOI: 10.1186/s12859-015-0694-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/22/2014] [Accepted: 08/03/2015] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND Tumorigenesis is an evolutionary process by which tumor cells acquire mutations through successive diversification and differentiation. There is much interest in reconstructing this process of evolution due to its relevance to identifying drivers of mutation and predicting future prognosis and drug response. Efforts are challenged by high tumor heterogeneity, though, both within and among patients. In prior work, we showed that this heterogeneity could be turned into an advantage by computationally reconstructing models of cell populations mixed to different degrees in distinct tumors. Such mixed membership model approaches, however, are still limited in their ability to dissect more than a few well-conserved cell populations across a tumor data set. RESULTS We present a method to improve on current mixed membership model approaches by better accounting for conserved progression pathways between subsets of cancers, which imply a structure to the data that has not previously been exploited. We extend our prior methods, which use an interpretation of the mixture problem as that of reconstructing simple geometric objects called simplices, to instead search for structured unions of simplices called simplicial complexes that one would expect to emerge from mixture processes describing branches along an evolutionary tree. We further improve on the prior work with a novel objective function to better identify mixtures corresponding to parsimonious evolutionary tree models. We demonstrate that this approach improves on our ability to accurately resolve mixtures on simulated data sets and demonstrate its practical applicability on a large RNASeq tumor data set. CONCLUSIONS Better exploiting the expected geometric structure for mixed membership models produced from common evolutionary trees allows us to quickly and accurately reconstruct models of cell populations sampled from those trees. 
In the process, we hope to develop a better understanding of tumor evolution as well as other biological problems that involve interpreting genomic data gathered from heterogeneous populations of cells.
Affiliation(s)
- Theodore Roman
- Computational Biology Department, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, USA.
- Amir Nayyeri
- Computer Science Department, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, USA.
- Brittany Terese Fasy
- Department of Computer Science, Tulane University, 6834 St. Charles St., New Orleans, USA.
- Russell Schwartz
- Computational Biology Department, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, USA; Department of Biological Sciences, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, USA.
|
48
|
Montazeri H, Günthard HF, Yang WL, Kouyos R, Beerenwinkel N. Estimating the dynamics and dependencies of accumulating mutations with applications to HIV drug resistance. Biostatistics 2015; 16:713-26. [PMID: 25979750 DOI: 10.1093/biostatistics/kxv019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/01/2014] [Accepted: 03/13/2015] [Indexed: 12/14/2022] Open
Abstract
We introduce a new model called the observed time conjunctive Bayesian network (OT-CBN) that describes the accumulation of genetic events (mutations) under partial temporal ordering constraints. Unlike other CBN models, the OT-CBN model uses sampling time points of genotypes in addition to genotypes themselves to estimate model parameters. We developed an expectation-maximization algorithm to obtain approximate maximum likelihood estimates by accounting for this additional information. In a simulation study, we show that the OT-CBN model outperforms the continuous time CBN (CT-CBN) (Beerenwinkel and Sullivant, 2009. Markov models for accumulating mutations. Biometrika 96: (3), 645-661), which does not take into account individual sampling times for parameter estimation. We also show superiority of the OT-CBN model on several datasets of HIV drug resistance mutations extracted from the Swiss HIV Cohort Study database.
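The partial temporal ordering constraints mentioned above can be made concrete with a small sketch. In a conjunctive Bayesian network, an event can occur only after all of its parent events have occurred, so a genotype is compatible with the order only if it contains the parents of every event it contains. The poset and the event names `A`, `B`, `C` below are hypothetical, and the sketch checks compatibility only; it does not estimate rates or use sampling times as the OT-CBN model does.

```python
# Hypothetical partial order: B requires A; C requires both A and B.
parents = {
    "B": {"A"},
    "C": {"A", "B"},
}


def is_compatible(genotype, parents):
    """Check the conjunctive constraint: every event present in the
    genotype must have all of its parent events present as well."""
    return all(parents.get(event, set()) <= genotype for event in genotype)


# Genotypes are represented as sets of the events observed in a sample.
ordered_ok = is_compatible({"A", "B"}, parents)      # A precedes B: allowed
missing_parent = is_compatible({"B"}, parents)       # B without A: not allowed
```

Representing genotypes as sets keeps the subset test `<=` a direct transcription of the "all parents present" rule.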
Affiliation(s)
- Hesam Montazeri
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland and SIB Swiss Institute of Bioinformatics, Basel 4058, Switzerland
- Huldrych F Günthard
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich 8091, Switzerland; Institute of Medical Virology, University of Zurich, Zurich 8057, Switzerland
- Wan-Lin Yang
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich 8091, Switzerland; Institute of Medical Virology, University of Zurich, Zurich 8057, Switzerland
- Roger Kouyos
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich 8091, Switzerland; Institute of Medical Virology, University of Zurich, Zurich 8057, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
|
49
|
Ramazzotti D, Caravagna G, Olde Loohuis L, Graudenzi A, Korsunsky I, Mauri G, Antoniotti M, Mishra B. CAPRI: efficient inference of cancer progression models from cross-sectional data. Bioinformatics 2015; 31:3016-26. [DOI: 10.1093/bioinformatics/btv296] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 01/19/2015] [Accepted: 05/04/2015] [Indexed: 12/27/2022] Open
|
50
|
Turajlic S, McGranahan N, Swanton C. Inferring mutational timing and reconstructing tumour evolutionary histories. BIOCHIMICA ET BIOPHYSICA ACTA 2015; 1855:264-75. [PMID: 25827356 DOI: 10.1016/j.bbcan.2015.03.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 01/02/2015] [Revised: 03/17/2015] [Accepted: 03/19/2015] [Indexed: 12/28/2022]
Abstract
Cancer evolution can be considered within a Darwinian framework. Both micro and macro-evolutionary theories can be applied to understand tumour progression and treatment failure. Owing to cancers' complexity and heterogeneity the rules of tumour evolution, such as the role of selection, remain incompletely understood. The timing of mutational events during tumour evolution presents diagnostic, prognostic and therapeutic opportunities. Here we review the current sampling and computational approaches for inferring mutational timing and the evidence from next generation sequencing-informed data on mutational timing across all tumour types. We discuss how this knowledge can be used to illuminate the genes and pathways that drive cancer initiation and relapse; and to support drug development and clinical trial design.
Affiliation(s)
- Samra Turajlic
- The Francis Crick Institute, 44 Lincoln's Inn Fields, London WC2A 3LY, UK
- Charles Swanton
- The Francis Crick Institute, 44 Lincoln's Inn Fields, London WC2A 3LY, UK; UCL Cancer Institute, CRUK Lung Cancer Centre of Excellence, Huntley Street, WC1E 6DD, UK.
|