1
Schill R, Klever M, Lösch A, Hu YL, Vocht S, Rupp K, Grasedyck L, Spang R, Beerenwinkel N. Correcting for Observation Bias in Cancer Progression Modeling. J Comput Biol 2024; 31:927-945. [PMID: 39480133 DOI: 10.1089/cmb.2024.0666] [Indexed: 11/02/2024]
Abstract
Tumor progression is driven by the accumulation of genetic alterations, including both point mutations and copy number changes. Understanding the temporal sequence of these events is crucial for comprehending the disease but is not directly discernible from cross-sectional genomic data. Cancer progression models, including Mutual Hazard Networks (MHNs), aim to reconstruct the dynamics of tumor progression by learning the causal interactions between genetic events based on their co-occurrence patterns in cross-sectional data. Here, we highlight a commonly overlooked bias in cross-sectional datasets that can distort progression modeling. Tumors become clinically detectable when they cause symptoms or are identified through imaging or tests. Detection factors, such as size, inflammation (fever, fatigue), and elevated biochemical markers, are influenced by genomic alterations. Ignoring these effects leads to "conditioning on a collider" bias, where events making the tumor more observable appear anticorrelated, creating false suppressive effects or masking promoting effects among genetic events. We enhance MHNs by incorporating the effects of genetic progression events on the inclusion of a tumor in a dataset, thus correcting for collider bias. We derive an efficient tensor formula for the likelihood function and apply it to two datasets from the MSK-IMPACT study. In colon adenocarcinoma, we observe a significantly higher rate of clinical detection for TP53-positive tumors, while in lung adenocarcinoma, the same is true for EGFR-positive tumors. Compared to classical MHNs, this approach eliminates several spurious suppressive interactions and uncovers multiple promoting effects.
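The collider effect described above can be illustrated with a small simulation. This is a minimal sketch, not the likelihood correction derived in the paper: it assumes two hypothetical events A and B that occur independently, each of which raises the probability that a tumor is detected and included in a dataset. All parameter values are illustrative assumptions. Across all simulated tumors the two events are uncorrelated, but among detected tumors they appear anticorrelated.

```python
import random
from typing import List, Tuple


def phi(pairs: List[Tuple[bool, bool]]) -> float:
    """Pearson correlation (phi coefficient) between two binary variables."""
    n = len(pairs)
    pa = sum(a for a, _ in pairs) / n
    pb = sum(b for _, b in pairs) / n
    pab = sum(a and b for a, b in pairs) / n
    return (pab - pa * pb) / (pa * (1 - pa) * pb * (1 - pb)) ** 0.5


def simulate(n=200_000, p_a=0.3, p_b=0.3, base=0.1, boost=0.4, seed=0):
    """Return (corr_all, corr_detected) for two independently occurring events.

    Hypothetical model: events A and B occur independently with probabilities
    p_a and p_b; a tumor is detected, and thus enters the dataset, with
    probability base + boost * A + boost * B (must stay <= 1).
    """
    rng = random.Random(seed)
    everyone, detected = [], []
    for _ in range(n):
        a = rng.random() < p_a
        b = rng.random() < p_b
        everyone.append((a, b))
        # Detection probability rises with each event present (the collider).
        if rng.random() < base + boost * a + boost * b:
            detected.append((a, b))
    return phi(everyone), phi(detected)


corr_all, corr_detected = simulate()
print(f"all tumors: {corr_all:+.3f}  detected only: {corr_detected:+.3f}")
```

With these defaults the correlation over all tumors is close to zero, while among detected tumors it is clearly negative (around -0.25), which a progression model fitted only to detected tumors could misread as a suppressive interaction.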
Affiliation(s)
- Rudolf Schill
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Maren Klever
- Institute for Geometry and Applied Mathematics, RWTH Aachen, Aachen, Germany
- Andreas Lösch
- Department of Statistical Bioinformatics, University of Regensburg, Regensburg, Germany
- Y Linda Hu
- Department of Statistical Bioinformatics, University of Regensburg, Regensburg, Germany
- Stefan Vocht
- Department of Statistical Bioinformatics, University of Regensburg, Regensburg, Germany
- Kevin Rupp
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Lars Grasedyck
- Institute for Geometry and Applied Mathematics, RWTH Aachen, Aachen, Germany
- Rainer Spang
- Department of Statistical Bioinformatics, University of Regensburg, Regensburg, Germany
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland

2
Wang M, Xie Y, Liu J, Li A, Chen L, Stromberg A, Arnold SM, Liu C, Wang C. A Probabilistic Approach to Estimate the Temporal Order of Pathway Mutations Accounting for Intra-Tumor Heterogeneity. Cancers (Basel) 2024; 16:2488. [PMID: 39001551 PMCID: PMC11240401 DOI: 10.3390/cancers16132488] [Received: 06/10/2024] [Accepted: 06/26/2024] [Indexed: 07/16/2024]
Abstract
The development of cancer involves the accumulation of somatic mutations in several essential biological pathways. Delineating the temporal order of pathway mutations during tumorigenesis is crucial for comprehending the biological mechanisms underlying cancer development and identifying potential targets for therapeutic intervention. Several computational and statistical methods have been introduced for estimating the order of somatic mutations based on mutation profile data from a cohort of patients. However, one major issue of current methods is that they do not take into account intra-tumor heterogeneity (ITH), which limits their ability to accurately discern the order of pathway mutations. To address this problem, we propose PATOPAI, a probabilistic approach to estimate the temporal order of mutations at the pathway level by incorporating ITH information as well as pathway and functional annotation information of mutations. PATOPAI uses a maximum likelihood approach to estimate the probability of pathway mutational events occurring in a specific sequence, wherein it focuses on the orders that are consistent with the phylogenetic structure of the tumors. Applications to whole exome sequencing data from The Cancer Genome Atlas (TCGA) illustrate our method's ability to recover the temporal order of pathway mutations in several cancer types.
Affiliation(s)
- Menghan Wang
- Department of Statistics, University of Kentucky, Lexington, KY 40536, USA
- Yanqi Xie
- Department of Molecular & Cellular Biochemistry, University of Kentucky, Lexington, KY 40508, USA
- Jinpeng Liu
- Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA
- Division of Cancer Biostatistics, Department of Internal Medicine, University of Kentucky, Lexington, KY 40536, USA
- Austin Li
- Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
- Li Chen
- Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA
- Division of Cancer Biostatistics, Department of Internal Medicine, University of Kentucky, Lexington, KY 40536, USA
- Arnold Stromberg
- Department of Statistics, University of Kentucky, Lexington, KY 40536, USA
- Susanne M. Arnold
- Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA
- Division of Medical Oncology, Department of Internal Medicine, University of Kentucky, Lexington, KY 40536, USA
- Chunming Liu
- Department of Molecular & Cellular Biochemistry, University of Kentucky, Lexington, KY 40508, USA
- Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA
- Chi Wang
- Department of Statistics, University of Kentucky, Lexington, KY 40536, USA
- Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA
- Division of Cancer Biostatistics, Department of Internal Medicine, University of Kentucky, Lexington, KY 40536, USA

3
Sperber C, Hakim A, Gallucci L, Seiffge D, Rezny-Kasprzak B, Jäger E, Meinel T, Wiest R, Fischer U, Arnold M, Umarova R. A typology of cerebral small vessel disease based on imaging markers. J Neurol 2023; 270:4985-4994. [PMID: 37368130 PMCID: PMC10511610 DOI: 10.1007/s00415-023-11831-x] [Received: 03/24/2023] [Revised: 06/16/2023] [Accepted: 06/17/2023] [Indexed: 06/28/2023]
Abstract
BACKGROUND Lacunes, microbleeds, enlarged perivascular spaces (EPVS), and white matter hyperintensities (WMH) are brain imaging features of cerebral small vessel disease (SVD). Based on these imaging markers, we aimed to identify subtypes of SVD and to evaluate the validity of these markers as part of clinical ratings and as biomarkers for stroke outcome. METHODS In a cross-sectional study, we examined 1207 first-ever anterior circulation ischemic stroke patients (mean age 69.1 ± 15.4 years; mean NIHSS 5.3 ± 6.8). On acute stroke MRI, we assessed the numbers of lacunes and microbleeds and rated EPVS and deep and periventricular WMH. We used unsupervised learning to cluster patients based on these variables. RESULTS We identified five clusters, of which the last three appeared to represent distinct late stages of SVD. The two largest clusters had no to only mild or moderate WMH and EPVS, respectively, and favorable stroke outcome. The third cluster was characterized by the largest number of lacunes and a likewise favorable outcome. The fourth cluster had the highest age, most pronounced WMH, and poor outcome. Showing the worst outcome, the fifth cluster presented pronounced microbleeds and the most severe SVD burden. CONCLUSION The study confirmed the existence of different SVD types with different relationships to stroke outcome. EPVS and WMH were identified as imaging features of presumably early progression. The number of microbleeds and WMH severity appear to be promising biomarkers for distinguishing clinical subgroups. Further understanding of SVD progression might require consideration of refined SVD features, e.g., for EPVS and type of lacunes.
Affiliation(s)
- Christoph Sperber
- Department of Neurology, Inselspital, University Hospital Bern, University of Bern, Bern, Switzerland
- Arsany Hakim
- University Institute of Diagnostic and Interventional Neuroradiology, Inselspital, University Hospital Bern, University of Bern, Bern, Switzerland
- Laura Gallucci
- Department of Neurology, Inselspital, University Hospital Bern, University of Bern, Bern, Switzerland
- David Seiffge
- Department of Neurology, Inselspital, University Hospital Bern, University of Bern, Bern, Switzerland
- Beata Rezny-Kasprzak
- University Institute of Diagnostic and Interventional Neuroradiology, Inselspital, University Hospital Bern, University of Bern, Bern, Switzerland
- Eugen Jäger
- University Institute of Diagnostic and Interventional Neuroradiology, Inselspital, University Hospital Bern, University of Bern, Bern, Switzerland
- Thomas Meinel
- Department of Neurology, Inselspital, University Hospital Bern, University of Bern, Bern, Switzerland
- Roland Wiest
- University Institute of Diagnostic and Interventional Neuroradiology, Inselspital, University Hospital Bern, University of Bern, Bern, Switzerland
- Urs Fischer
- Department of Neurology, Inselspital, University Hospital Bern, University of Bern, Bern, Switzerland
- Department of Neurology, University Hospital Basel, University of Basel, Basel, Switzerland
- Marcel Arnold
- Department of Neurology, Inselspital, University Hospital Bern, University of Bern, Bern, Switzerland
- Roza Umarova
- Department of Neurology, Inselspital, University Hospital Bern, University of Bern, Bern, Switzerland

4
Pirkl M, Büch J, Devaux C, Böhm M, Sönnerborg A, Incardona F, Abecasis A, Vandamme AM, Zazzi M, Kaiser R, Lengauer T, The EuResist Network Study Group. Analysis of mutational history of multidrug-resistant genotypes with a mutagenetic tree model. J Med Virol 2023; 95:e28389. [PMID: 36484375 DOI: 10.1002/jmv.28389] [Received: 10/10/2022] [Revised: 11/24/2022] [Accepted: 12/01/2022] [Indexed: 12/14/2022]
Abstract
Human immunodeficiency virus (HIV) can develop resistance to all antiretroviral drugs. Multidrug resistance, however, is a rare event in modern HIV treatment, but it can be life-threatening, particularly in patients with very long therapy histories and in areas with limited access to novel drugs. To understand the evolution of multidrug resistance, we analyzed the EuResist database to uncover the accumulation of mutations over time. We hypothesize that resistance mutations are not acquired simultaneously and randomly across viral genotypes but rather tend to accumulate in a predetermined order. Knowledge of this order might help to elucidate potential mechanisms of multidrug resistance. Our evolutionary model shows an almost monotonic increase of resistance with each acquired mutation, including less well-known nucleoside reverse transcriptase (RT) inhibitor-related mutations like K223Q, L228H, and Q242H. Mutations within the integrase (IN) (T97A, E138A/K, G140S, Q148H, N155H) indicate a high probability of multidrug resistance; accordingly, these IN mutations also tend to be observed together with mutations in the protease (PR) and RT. We followed up with an analysis of the mutation-specific error rates of our model given the data and identified several mutations with unusual rates (PR: M41L, L33F; IN: G140S). This could imply the existence of previously unknown virus variants in the viral quasispecies. In conclusion, our bioinformatics model supports the analysis and understanding of multidrug resistance.
Affiliation(s)
- Martin Pirkl
- Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
- Joachim Büch
- Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
- Carole Devaux
- Department of Infection and Immunity, Luxembourg Institute of Health, Strassen, Luxembourg
- Michael Böhm
- Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
- Anders Sönnerborg
- Department of Laboratory Medicine, Division of Clinical Microbiology, Karolinska Institute, Solna, Sweden
- Ana Abecasis
- Center for Global Health and Tropical Medicine, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal
- Anne-Mieke Vandamme
- Center for Global Health and Tropical Medicine, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal
- Department of Microbiology, Immunology and Transplantation, Clinical and Epidemiological Virology, Institute for the Future, Rega Institute for Medical Research, KU Leuven, Leuven, Belgium
- Maurizio Zazzi
- Department of Medical Biotechnologies, University of Siena, Siena, Italy
- Rolf Kaiser
- Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
- Thomas Lengauer
- Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany

5
Jiang H, Li Q, Lin JT, Lin FC. Classification of disease recurrence using transition likelihoods with expectation-maximization algorithm. Stat Med 2022; 41:4697-4715. [PMID: 35908812 PMCID: PMC9489660 DOI: 10.1002/sim.9534] [Received: 03/11/2021] [Revised: 05/17/2022] [Accepted: 07/10/2022] [Indexed: 11/09/2022]
Abstract
When an infectious disease recurs, the recurrence may be due to treatment failure or to a new infection. Being able to distinguish between these two outcomes is critical for effective disease control. A multi-state model based on Markov processes is a typical approach to estimating the transition probabilities between disease states; however, it can perform poorly when the disease state is unknown. This article aims to demonstrate that the transition likelihoods of baseline covariates can distinguish one cause from the other with high accuracy in infectious diseases such as malaria. A more general model of disease progression can be constructed to allow for additional disease outcomes. We start from a multinomial logit model to estimate the disease transition probabilities and then use the transition information of the baseline covariates to obtain a more accurate classification. We apply the expectation-maximization (EM) algorithm to estimate unknown parameters, including the marginal probabilities of disease outcomes. A simulation study comparing our classifier to the existing two-stage method shows that our classifier has better accuracy, especially when the sample size is small. To demonstrate its practical use, the proposed method is applied to classifying relapse vs reinfection outcomes in two Plasmodium vivax treatment studies from Cambodia that used different genotyping approaches.
Affiliation(s)
- Huijun Jiang
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Quefeng Li
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Jessica T. Lin
- Division of Infectious Disease, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Feng-Chang Lin
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA

6
Comparing mutational pathways to lopinavir resistance in HIV-1 subtypes B versus C. PLoS Comput Biol 2021; 17:e1008363. [PMID: 34491984 PMCID: PMC8448360 DOI: 10.1371/journal.pcbi.1008363] [Received: 09/18/2020] [Revised: 09/17/2021] [Accepted: 08/09/2021] [Indexed: 11/19/2022]
Abstract
Although combination antiretroviral therapies seem to be effective at controlling HIV-1 infections regardless of the viral subtype, there is increasing evidence for subtype-specific drug resistance mutations. The order and rates at which resistance mutations accumulate in different subtypes also remain poorly understood. Most of this knowledge is derived from studies of subtype B genotypes, although subtype B is not the most abundant subtype worldwide. Here, we present a methodology for comparing mutational networks in different HIV-1 subtypes, based on Hidden Conjunctive Bayesian Networks (H-CBN), a probabilistic model for inferring mutational networks from cross-sectional genotype data. We introduce a Monte Carlo sampling scheme for learning H-CBN models for a larger number of resistance mutations and develop a statistical test to assess differences in the inferred mutational networks between two groups. We apply this method to infer the temporal progression of mutations conferring resistance to the protease inhibitor lopinavir in a large cross-sectional cohort of HIV-1 subtype C genotypes from South Africa, as well as in a data set of subtype B genotypes obtained from the Stanford HIV Drug Resistance Database and the Swiss HIV Cohort Study. We find strong support for different initial mutational events in the protease, namely at residue 46 in subtype B and at residue 82 in subtype C. The inferred mutational networks for subtype B versus C are significantly different, sharing only five constraints on the order of accumulating mutations, with mutation at residue 54 as the parental event. The results also suggest that mutations can accumulate along various alternative paths within subtypes, as opposed to following a unique total temporal ordering. Beyond HIV drug resistance, the statistical methodology is applicable more generally to the comparison of inferred mutational networks between any two groups.
There is a disparity in the distribution of HIV-1 infections by subtype worldwide. Subtype B is predominant in America, Australia, and western and central Europe, and most therapeutic strategies are based on research and clinical studies of this subtype. However, non-B subtypes account for the majority of global HIV-1 infections; for example, subtype C alone accounts for nearly half of all HIV-1 infections. We present a statistical framework enabling the comparison of patterns of accumulating mutations in different HIV-1 subtypes. Specifically, we compare the temporal ordering of lopinavir resistance mutations in HIV-1 subtypes B versus C. To this end, we combine the Hidden Conjunctive Bayesian Network (H-CBN) model with an approximate inference scheme enabling comparisons of larger networks. We show that the development of resistance to lopinavir differs significantly between subtypes B and C, such that findings based on subtype B sequences cannot always be applied to subtype C. The described methodology is suitable for comparing different subgroups in the context of other evolutionary processes.
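The notion of order constraints used above can be made concrete with a short sketch. In a conjunctive Bayesian network, a mutation can only occur once all of its parent mutations are present, so a genotype is compatible with the network exactly when it contains the parents of every mutation it carries. The Python code below assumes this plain CBN rule, ignores the observation-error layer that distinguishes H-CBN, and uses a hypothetical four-mutation network rather than any network inferred in the study.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical network: each mutation maps to the set of parent mutations
# that must already be present before it can occur.
Poset = Dict[str, Set[str]]


def is_compatible(genotype: FrozenSet[str], parents: Poset) -> bool:
    """True if every mutation in the genotype has all of its parents present."""
    return all(parents.get(m, set()) <= genotype for m in genotype)


def compatible_genotypes(parents: Poset) -> List[Tuple[str, ...]]:
    """Enumerate all genotypes (as sorted tuples) allowed by the order constraints."""
    events = sorted(parents)
    result = []
    for k in range(len(events) + 1):
        for combo in combinations(events, k):
            if is_compatible(frozenset(combo), parents):
                result.append(combo)
    return result


if __name__ == "__main__":
    network = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
    for genotype in compatible_genotypes(network):
        print(genotype)
```

For this hypothetical network, in which A precedes B and C and D requires both B and C, `compatible_genotypes` returns six genotypes. B and C can be acquired in either order, which is the kind of alternative path, rather than a unique total ordering, that the abstract refers to.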

7
Feder AF, Harper KN, Brumme CJ, Pennings PS. Understanding patterns of HIV multi-drug resistance through models of temporal and spatial drug heterogeneity. eLife 2021; 10:e69032. [PMID: 34473060 PMCID: PMC8412921 DOI: 10.7554/elife.69032] [Received: 04/01/2021] [Accepted: 08/03/2021] [Indexed: 01/09/2023]
Abstract
Triple-drug therapies have transformed HIV from a fatal condition into a chronic one. These therapies should prevent the evolution of HIV drug resistance, because one or more drugs suppress any partially resistant virus. In practice, such therapies drastically reduced, but did not eliminate, resistance evolution. In this article, we reanalyze published data from an evolutionary perspective and demonstrate several intriguing patterns of HIV resistance evolution: resistance evolves (1) even after years on successful therapy, (2) sequentially, often via one mutation at a time, and (3) in a partially predictable order. We describe how these observations might emerge under two models in which drug exposure varies in space or in time. Despite decades of work in this area, much opportunity remains to create models with realistic parameters for three drugs and to match model outcomes to resistance rates and genetic patterns from individuals on triple-drug therapy. Further, lessons from HIV may inform other systems.
Affiliation(s)
- Alison F Feder
- Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
- Department of Genome Sciences, University of Washington, Seattle, United States
- Kristin N Harper
- Harper Health and Science Communications, LLC, Seattle, United States
- Chanson J Brumme
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada
- Department of Medicine, University of British Columbia, Vancouver, Canada
- Pleuni S Pennings
- Department of Biology, San Francisco State University, San Francisco, United States

8
Nicol PB, Coombes KR, Deaver C, Chkrebtii O, Paul S, Toland AE, Asiaee A. Oncogenetic network estimation with disjunctive Bayesian networks. Computational and Systems Oncology 2021. [DOI: 10.1002/cso2.1027] [Indexed: 11/05/2022]
Affiliation(s)
- Kevin R. Coombes
- Department of Biomedical Informatics, Ohio State University, Columbus, Ohio
- Courtney Deaver
- Natural Sciences Division, Pepperdine University, Malibu, California
- Subhadeep Paul
- Department of Statistics, Ohio State University, Columbus, Ohio
- Amanda E. Toland
- Department of Cancer Biology and Genetics and Department of Internal Medicine, Division of Human Genetics, Comprehensive Cancer Center, Ohio State University, Columbus, Ohio
- Amir Asiaee
- Mathematical Biosciences Institute, Ohio State University, Columbus, Ohio

9
Nguembang Fadja A, Riguzzi F, Lamma E. Learning hierarchical probabilistic logic programs. Mach Learn 2021. [DOI: 10.1007/s10994-021-06016-4] [Indexed: 10/21/2022]
Abstract
Probabilistic logic programming (PLP) combines logic programs and probabilities. Due to its expressiveness and simplicity, it has been considered a powerful tool for learning and reasoning in relational domains characterized by uncertainty. Still, learning the parameters and the structure of general PLPs is computationally expensive due to the cost of inference. We have recently proposed a restriction of the general PLP language, called hierarchical PLP (HPLP), in which clauses and predicates are hierarchically organized. HPLPs can be converted into arithmetic circuits or deep neural networks, and inference is much cheaper than for general PLP. In this paper we present algorithms for learning both the parameters and the structure of HPLPs from data. We first present an algorithm called parameter learning for hierarchical probabilistic logic programs (PHIL), which performs parameter estimation of HPLPs using gradient descent and expectation maximization. We also propose structure learning of hierarchical probabilistic logic programming (SLEAHP), which learns both the structure and the parameters of HPLPs from data. Experiments compared PHIL and SLEAHP with state-of-the-art PLP and Markov Logic Network systems for parameter and structure learning, respectively. PHIL was compared with EMBLEM, ProbLog2 and Tuffy, and SLEAHP with SLIPCOVER, PROBFOIL+, MLN-BC, MLN-BT and RDN-B. The experiments on five well-known datasets show that our algorithms achieve similar and often better accuracies, but in a shorter time.

10
Haupt S, Zeilmann A, Ahadova A, Bläker H, von Knebel Doeberitz M, Kloor M, Heuveline V. Mathematical modeling of multiple pathways in colorectal carcinogenesis using dynamical systems with Kronecker structure. PLoS Comput Biol 2021; 17:e1008970. [PMID: 34003820 PMCID: PMC8162698 DOI: 10.1371/journal.pcbi.1008970] [Received: 10/05/2020] [Revised: 05/28/2021] [Accepted: 04/16/2021] [Indexed: 01/02/2023]
Abstract
Like many other types of cancer, colorectal cancer (CRC) develops through multiple pathways of carcinogenesis. This is also true for colorectal carcinogenesis in Lynch syndrome (LS), the most common inherited CRC syndrome. However, a comprehensive understanding of the distribution of these pathways of carcinogenesis, which would allow for tailored clinical treatment and even prevention, is still lacking. We suggest a linear dynamical system modeling the evolution of different pathways of colorectal carcinogenesis based on the involved driver mutations. The model consists of different components accounting for independent and dependent mutational processes. We define the driver gene mutation graphs and combine them using the Cartesian graph product. This leads to matrix components built by the Kronecker sum and product of the adjacency matrices of the gene mutation graphs, enabling a thorough mathematical analysis and medical interpretation. Using the Kronecker structure, we developed a mathematical model, which we applied as an example to the three pathways of colorectal carcinogenesis in LS. Besides a pathogenic germline variant in one of the DNA mismatch repair (MMR) genes, driver mutations in APC, CTNNB1, KRAS and TP53 are considered. As examples, we incorporate mutational dependencies such as increased point mutation rates after MMR deficiency and, based on recent experimental data, biallelic somatic CTNNB1 mutations as common drivers of LS-associated CRCs. With this model and parameter choice, we obtained simulation results that are in concordance with clinical observations. These include the evolution of MMR-deficient crypts as early precursors in LS carcinogenesis and the influence of variants in MMR genes thereon. The proportions of MMR-deficient and MMR-proficient APC-inactivated crypts, as a first measure of the distribution among the pathways in LS-associated colorectal carcinogenesis, are compatible with clinical observations.
The approach provides a modular framework for modeling multiple pathways of carcinogenesis, yielding promising results in concordance with clinical observations in LS CRCs.
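The Cartesian graph product mentioned above has a standard matrix form: the adjacency matrix of the Cartesian product of graphs G and H is the Kronecker sum A_G ⊕ A_H = A_G ⊗ I + I ⊗ A_H. The dependency-free Python sketch below builds it for two hypothetical two-state gene graphs (wild type → mutated). It covers only the Kronecker-sum part, which corresponds to genes mutating independently; how the model adds dependent processes through Kronecker products, and its actual rates, are not reproduced here.

```python
from typing import List

Matrix = List[List[int]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product A ⊗ B for matrices stored as nested lists."""
    rb, cb = len(b), len(b[0])
    return [
        [a[i // rb][j // cb] * b[i % rb][j % cb] for j in range(len(a[0]) * cb)]
        for i in range(len(a) * rb)
    ]


def identity(n: int) -> Matrix:
    """n x n identity matrix."""
    return [[int(i == j) for j in range(n)] for i in range(n)]


def kron_sum(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker sum A ⊕ B = A ⊗ I_m + I_n ⊗ B for square A (n x n) and B (m x m).

    For adjacency matrices this is the adjacency matrix of the Cartesian
    graph product: along each edge exactly one component changes state.
    """
    left = kron(a, identity(len(b)))
    right = kron(identity(len(a)), b)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(left, right)]


# Hypothetical gene graph: state 0 = wild type, state 1 = mutated, edge 0 -> 1.
gene = [[0, 1], [0, 0]]
for row in kron_sum(gene, gene):  # joint states ordered 00, 01, 10, 11
    print(row)
```

With one such graph per gene, `kron_sum(gene, gene)` returns a 4 × 4 matrix over the joint states 00, 01, 10, 11 with edges 00→01, 00→10, 01→11 and 10→11: each gene can mutate regardless of the state of the other.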
Affiliation(s)
- Saskia Haupt
- Engineering Mathematics and Computing Lab (EMCL), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
- Data Mining and Uncertainty Quantification (DMQ), Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Alexander Zeilmann
- Image and Pattern Analysis Group (IPA), Heidelberg University, Heidelberg, Germany
- Aysel Ahadova
- Department of Applied Tumor Biology (ATB), Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
- Hendrik Bläker
- Institute of Pathology, University Hospital Leipzig, Leipzig, Germany
- Magnus von Knebel Doeberitz
- Department of Applied Tumor Biology (ATB), Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
- Matthias Kloor
- Department of Applied Tumor Biology (ATB), Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
- Vincent Heuveline
- Engineering Mathematics and Computing Lab (EMCL), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
- Data Mining and Uncertainty Quantification (DMQ), Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany

11
Schill R, Solbrig S, Wettig T, Spang R. Modelling cancer progression using Mutual Hazard Networks. Bioinformatics 2020; 36:241-249. [PMID: 31250881 PMCID: PMC6956791 DOI: 10.1093/bioinformatics/btz513] [Received: 08/12/2018] [Revised: 03/29/2019] [Accepted: 06/25/2019] [Indexed: 12/26/2022]
Abstract
MOTIVATION Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. RESULTS Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHNs model events by their spontaneous rate of fixation and by the multiplicative effects they exert on the rates of successive events. MHNs compared favourably to acyclic models in cross-validated model fit on the four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations. AVAILABILITY AND IMPLEMENTATION Implementation and data are available at https://github.com/RudiSchill/MHN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
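The multiplicative rate structure described above can be written as λ_i(x) = Θ_ii · ∏_{j ∈ x} Θ_ij for an event i not yet present in state x. The Python sketch below works with log-parameters θ = log Θ and evaluates these rates for a small, hypothetical parameter matrix. It additionally assumes, following our reading of the MHN formulation, that tumors are observed after an exponentially distributed time with rate 1 starting from the event-free state, so that the observed distribution p solves (I − Q)p = p_∅, where Q is the transition-rate matrix. Exhaustive enumeration of all 2^n states is only feasible for a handful of events; this is meant to make the model concrete, not to replace the implementation in the linked repository.

```python
import math
from typing import List


def rate(theta: List[List[float]], state: int, i: int) -> float:
    """Rate of event i from a state (bitmask) in which i has not yet occurred.

    theta[i][i] is the log base rate of event i; theta[i][j] is the log of the
    multiplicative effect that an already present event j has on event i.
    """
    log_rate = theta[i][i]
    for j in range(len(theta)):
        if j != i and (state >> j) & 1:
            log_rate += theta[i][j]
    return math.exp(log_rate)


def mhn_distribution(theta: List[List[float]]) -> List[float]:
    """Probability of each of the 2**n states, assuming (I - Q) p = p_empty.

    Events are only ever added, so every predecessor of state x is a smaller
    bitmask and a single forward pass solves the triangular system.
    """
    n = len(theta)
    p = [0.0] * (1 << n)
    for x in range(1 << n):
        inflow = 1.0 if x == 0 else 0.0  # p_empty puts all mass on the empty state
        for i in range(n):
            if (x >> i) & 1:
                y = x & ~(1 << i)  # predecessor of x that lacks event i
                inflow += rate(theta, y, i) * p[y]
        outflow = sum(rate(theta, x, i) for i in range(n) if not (x >> i) & 1)
        p[x] = inflow / (1.0 + outflow)
    return p


# Hypothetical 2-event example: event 1 multiplies the rate of event 0 by e.
theta = [[0.0, 1.0], [0.0, 0.0]]
print(mhn_distribution(theta))
```

In this hypothetical example both events start with rate 1, so the empty state receives probability 1/3, and the returned probabilities over the four states sum to 1. A negative off-diagonal entry would instead encode an inhibiting effect, which is what allows MHNs to represent suppression as well as promotion.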
Affiliation(s)
- Rudolf Schill
- Department of Statistical Bioinformatics, Institute of Functional Genomics, Regensburg 93040, Germany
- Stefan Solbrig
- Department of Physics, University of Regensburg, Regensburg 93040, Germany
- Tilo Wettig
- Department of Physics, University of Regensburg, Regensburg 93040, Germany
- Rainer Spang
- Department of Statistical Bioinformatics, Institute of Functional Genomics, Regensburg 93040, Germany

12
Wang M, Yu T, Liu J, Chen L, Stromberg AJ, Villano JL, Arnold SM, Liu C, Wang C. A probabilistic method for leveraging functional annotations to enhance estimation of the temporal order of pathway mutations during carcinogenesis. BMC Bioinformatics 2019; 20:620. [PMID: 31791231 PMCID: PMC6889196 DOI: 10.1186/s12859-019-3218-2] [Received: 06/01/2019] [Accepted: 11/12/2019] [Indexed: 12/02/2022]
Abstract
BACKGROUND Cancer arises through accumulation of somatically acquired genetic mutations. An important question is to delineate the temporal order of somatic mutations during carcinogenesis, which contributes to better understanding of cancer biology and facilitates identification of new therapeutic targets. Although a number of statistical and computational methods have been proposed to estimate the temporal order of mutations, they do not account for the differences in the functional impacts of mutations, and thus their estimates are likely to be obscured by the presence of passenger mutations that do not contribute to cancer progression. In addition, many methods infer the order of mutations at the gene level, which has limited power due to the low mutation rate in most genes. RESULTS In this paper, we develop a Probabilistic Approach for estimating the Temporal Order of Pathway mutations by leveraging functional Annotations of mutations (PATOPA). PATOPA infers the order of mutations at the pathway level, using a probabilistic method to characterize the likelihood of mutational events from different pathways occurring in a certain order. The functional impact of each mutation is incorporated so that mutations more integral to tumor development receive more weight. A maximum likelihood method is used to estimate parameters and infer the probability of one pathway being mutated prior to another. Simulation studies and analysis of whole exome sequencing data from The Cancer Genome Atlas (TCGA) demonstrate that PATOPA is able to accurately estimate the temporal order of pathway mutations and provides new biological insights into carcinogenesis of colorectal and lung cancers. CONCLUSIONS PATOPA provides a useful tool to estimate the temporal order of mutations at the pathway level while leveraging functional annotations of mutations.
Affiliation(s)
- Menghan Wang
- Department of Statistics, University of Kentucky, Lexington, USA
- Tianxin Yu
- Department of Molecular & Cellular Biology, Roswell Park Comprehensive Cancer Center, Buffalo, USA
- Jinpeng Liu
- Markey Cancer Center, University of Kentucky, Lexington, USA
- Li Chen
- Markey Cancer Center, University of Kentucky, Lexington, USA
- Department of Biostatistics, University of Kentucky, Lexington, USA
- John L. Villano
- Markey Cancer Center, University of Kentucky, Lexington, USA
- Department of Internal Medicine, University of Kentucky, Lexington, USA
- Susanne M. Arnold
- Markey Cancer Center, University of Kentucky, Lexington, USA
- Department of Internal Medicine, University of Kentucky, Lexington, USA
- Chunming Liu
- Markey Cancer Center, University of Kentucky, Lexington, USA
- Department of Molecular & Cellular Biochemistry, University of Kentucky, Lexington, USA
- Chi Wang
- Markey Cancer Center, University of Kentucky, Lexington, USA
- Department of Biostatistics, University of Kentucky, Lexington, USA
13
Khakabimamaghani S, Ding D, Snow O, Ester M. Uncovering the subtype-specific temporal order of cancer pathway dysregulation. PLoS Comput Biol 2019; 15:e1007451. [PMID: 31710622 PMCID: PMC6872169 DOI: 10.1371/journal.pcbi.1007451] [Received: 06/07/2019] [Revised: 11/21/2019] [Accepted: 09/30/2019]
Abstract
Cancer is driven by genetic mutations that dysregulate pathways important for proper cell function. Therefore, discovering these cancer pathways and their dysregulation order is key to understanding and treating cancer. However, the heterogeneity of mutations between different individuals makes this challenging and requires that cancer progression be studied in a subtype-specific way. To address this challenge, we provide a mathematical model, called Subtype-specific Pathway Linear Progression Model (SPM), that simultaneously captures cancer subtypes, pathways, and the order of pathway dysregulation within each subtype. Experiments with synthetic data indicate that SPM is more robust than an existing method to problem specifics, including noise. Moreover, experimental results on glioblastoma multiforme and colorectal adenocarcinoma show that SPM's results are consistent with existing knowledge and that it outperforms an existing method in certain cases. The implementation of our method is available at https://github.com/Dalton386/SPM.
Affiliation(s)
- Dujian Ding
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Oliver Snow
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Martin Ester
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
14
Hainke K, Szugat S, Fried R, Rahnenführer J. Variable selection for disease progression models: methods for oncogenetic trees and application to cancer and HIV. BMC Bioinformatics 2017; 18:358. [PMID: 28764644 PMCID: PMC5539896 DOI: 10.1186/s12859-017-1762-1] [Received: 11/16/2016] [Accepted: 07/14/2017]
Abstract
Background Disease progression models are important for understanding the critical steps during the development of diseases. The models are embedded in a statistical framework to deal with random variation due to biology and to the sampling process when only a finite population is observed. Conditional probabilities are used to describe dependencies between events that characterise the critical steps in the disease process. Many different model classes have been proposed in the literature, from simple path models to complex Bayesian networks. Oncogenetic trees are a popular, easy-to-understand, yet flexible model class. They have been applied to describe the accumulation of genetic aberrations in cancer and HIV data. However, the number of potentially relevant aberrations is often far larger than the maximal number of events that can be used for reliably estimating the progression models. Still, there are only a few approaches to variable selection, and these have not yet been investigated in detail. Results We fill this gap and propose ten variable selection methods specifically for oncogenetic trees, some of them completely new. We compare them in an extensive simulation study and on real data from cancer and HIV. It turns out that preselecting events with clique identification algorithms performs best. Here, events are selected if they belong to the largest or the maximum weight subgraph in which all pairs of vertices are connected. Conclusions The variable selection method of identifying cliques finds both the important frequent events and those related to disease pathways. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1762-1) contains supplementary material, which is available to authorized users.
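The clique-based preselection described in this abstract can be sketched as follows. This is a minimal brute-force illustration, not the authors' implementation: how the event graph's edges are built (for example, from pairwise association tests between events) and which vertex weights are used are assumptions left to the caller, and the names `build_adjacency`, `is_clique` and `best_clique` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_adjacency(events: List[str], edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets for an event graph (edge definition is up to the caller)."""
    adj: Dict[str, Set[str]] = {e: set() for e in events}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def is_clique(nodes: Tuple[str, ...], adj: Dict[str, Set[str]]) -> bool:
    """True if every pair of nodes is connected."""
    return all(b in adj[a] for a, b in combinations(nodes, 2))


def best_clique(events: List[str], edges: Iterable[Tuple[str, str]],
                weights: Optional[Dict[str, float]] = None) -> Set[str]:
    """Clique with the largest total vertex weight (largest size if weights is None).

    Exhaustive search over all subsets: exponential in len(events), so only
    suitable as an illustration on small event sets.
    """
    adj = build_adjacency(events, edges)
    w = weights if weights is not None else {e: 1.0 for e in events}
    best: Tuple[str, ...] = ()
    best_score = float("-inf")
    for k in range(1, len(events) + 1):
        for subset in combinations(events, k):
            if is_clique(subset, adj):
                score = sum(w[e] for e in subset)
                if score > best_score:
                    best, best_score = subset, score
    return set(best)


if __name__ == "__main__":
    # Hypothetical events and edges (e.g. pairs judged to be associated).
    events = ["A", "B", "C", "D", "E"]
    edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
    print(sorted(best_clique(events, edges)))
    print(sorted(best_clique(events, edges, {"A": 0.1, "B": 0.1, "C": 0.1, "D": 0.9, "E": 0.9})))
```

With unit weights the sketch returns the largest clique, {A, B, C}. With the hypothetical weights shown, it prefers {D, E}. For realistic numbers of events, a dedicated maximum-clique algorithm such as Bron-Kerbosch would replace the brute-force loop.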
Affiliation(s)
- Katrin Hainke
- Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
- Sebastian Szugat
- Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
- Roland Fried
- Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
- Jörg Rahnenführer
- Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
15
Montazeri H, Kuipers J, Kouyos R, Böni J, Yerly S, Klimkait T, Aubert V, Günthard HF, Beerenwinkel N. Large-scale inference of conjunctive Bayesian networks. Bioinformatics 2017; 32:i727-i735. [PMID: 27587695 DOI: 10.1093/bioinformatics/btw459]
Abstract
The continuous time conjunctive Bayesian network (CT-CBN) is a graphical model for analyzing the waiting time process of the accumulation of genetic changes (mutations). CT-CBN models have been successfully used in several biological applications such as HIV drug resistance development and genetic progression of cancer. However, current approaches for parameter estimation and network structure learning of CBNs can only deal with a small number of mutations (<20). Here, we address this limitation by presenting an efficient and accurate approximate inference algorithm: a Monte Carlo expectation-maximization algorithm based on importance sampling. The new method can be used for a large number of mutations, up to one thousand, an increase of two orders of magnitude. In simulation studies, we present the accuracy as well as the running-time efficiency of the new inference method and compare it with an MLE method, expectation-maximization, and the discrete time CBN model, i.e. a first-order approximation of the CT-CBN model. We also study the application of the new model on HIV drug resistance datasets for the combination therapy with zidovudine plus lamivudine (AZT + 3TC) as well as under no treatment, both extracted from the Swiss HIV Cohort Study database. AVAILABILITY AND IMPLEMENTATION The proposed method is implemented as an R package available at https://github.com/cbg-ethz/MC-CBN CONTACT: niko.beerenwinkel@bsse.ethz.ch SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hesam Montazeri
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Roger Kouyos
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland; Institute of Medical Virology
- Jürg Böni
- Swiss National Center for Retroviruses, Institute of Medical Virology, University of Zurich, Zurich 8057, Switzerland
- Sabine Yerly
- Laboratory of Virology, Division of Infectious Diseases, Geneva University Hospital, Geneva, Switzerland
- Thomas Klimkait
- Department of Biomedicine-Petersplatz, University of Basel, Basel, Switzerland
- Vincent Aubert
- Division of Immunology and Allergy, University Hospital Lausanne, Lausanne, Switzerland
- Huldrych F Günthard
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland; Institute of Medical Virology
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
16
Abstract
Rapid advances in high-throughput sequencing and a growing realization of the importance of evolutionary theory to cancer genomics have led to a proliferation of phylogenetic studies of tumour progression. These studies have yielded not only new insights but also a plethora of experimental approaches, sometimes reaching conflicting or poorly supported conclusions. Here, we consider this body of work in light of the key computational principles underpinning phylogenetic inference, with the goal of providing practical guidance on the design and analysis of scientifically rigorous tumour phylogeny studies. We survey the range of methods and tools available to the researcher, their key applications, and the various unsolved problems, closing with a perspective on the prospects and broader implications of this field.
Affiliation(s)
- Russell Schwartz
- Department of Biological Sciences and Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15217, USA
- Alejandro A Schäffer
- Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20892, USA
17
Algorithmic methods to infer the evolutionary trajectories in cancer progression. Proc Natl Acad Sci U S A 2016; 113:E4025-34. [PMID: 27357673 DOI: 10.1073/pnas.1520213113]
Abstract
The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the "selective advantage" relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications because it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc's ability to reproduce much of the current knowledge on colorectal cancer progression as well as to suggest novel experimentally verifiable hypotheses.
18
Abstract
Mathematical modelling approaches have become increasingly abundant in cancer research. The complexity of cancer is well suited to quantitative approaches as it provides challenges and opportunities for new developments. In turn, mathematical modelling contributes to cancer research by helping to elucidate mechanisms and by providing quantitative predictions that can be validated. The recent expansion of quantitative models addresses many questions regarding tumour initiation, progression and metastases as well as intra-tumour heterogeneity, treatment responses and resistance. Mathematical models can complement experimental and clinical studies, but also challenge current paradigms, redefine our understanding of mechanisms driving tumorigenesis and shape future research in cancer biology.
Affiliation(s)
- Philipp M Altrock
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 450 Brookline Avenue, Boston, Massachusetts 02115, USA
- Program for Evolutionary Dynamics, Harvard University, 1 Brattle Square, Suite 6, Cambridge, Massachusetts 02138, USA
- Lin L Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 450 Brookline Avenue, Boston, Massachusetts 02115, USA
- Franziska Michor
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 450 Brookline Avenue, Boston, Massachusetts 02115, USA
19
Masecchia S, Coco S, Barla A, Verri A, Tonini GP. Genome instability model of metastatic neuroblastoma tumorigenesis by a dictionary learning algorithm. BMC Med Genomics 2015; 8:57. [PMID: 26358114 PMCID: PMC4566396 DOI: 10.1186/s12920-015-0132-y] [Received: 01/09/2015] [Accepted: 08/28/2015]
Abstract
Background Metastatic neuroblastoma (NB) occurs in pediatric patients as stage 4S or stage 4 and is characterized by heterogeneous clinical behavior associated with diverse genotypes. Stage 4 tumors contain several structural copy number aberrations (CNAs) rarely found in stage 4S. To date, NB tumorigenesis has not been fully elucidated, although genomic instability evidently plays a critical role in the genesis of the tumor. Here we propose a mathematical approach to decipher genomic data and provide a new model of NB metastatic tumorigenesis. Method We elucidate NB tumorigenesis using the Enhanced Fused Lasso Latent Feature Model (E-FLLat) to model array comparative genomic hybridization (aCGH) data from 190 metastatic NBs (63 stage 4S and 127 stage 4). This model for aCGH segmentation, based on the minimization of a dictionary learning (DL) functional, combines several penalties tailored to the specificities of aCGH data. In DL, the original signal is approximated by a linear weighted combination of atoms: the elements of the learned dictionary. Results The hierarchical structure for stage 4S shows several whole-chromosome gains at the first level of the oncogenetic tree, apart from the unbalanced gains of 17q, 2p and 2q. Conversely, the high CNA complexity found in stage 4 tumors requires two different trees. Both stage 4 oncogenetic trees are markedly divergent, with up to five sublevels, and the 17q gain is the most common event at the first level (2/3 nodes). Moreover, the 11q deletion, one of the major unfavorable markers of disease progression, occurs before 3p loss, indicating that critical chromosome aberrations appear at early stages of tumorigenesis. Finally, we also observed a significant (p = 0.025) association between patient age and chromosome loss in stage 4 cases.
Conclusion These results led us to propose a genome instability progressive model in which NB cells initiate with DNA synthesis uncoupled from cell division, leading either to stage 4S tumors, primarily characterized by numerical aberrations, or to stage 4 tumors with high levels of genome instability, resulting in complex chromosome rearrangements associated with high tumor aggressiveness and rapid disease progression. Electronic supplementary material: The online version of this article (doi:10.1186/s12920-015-0132-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Simona Coco
- Lung Cancer Unit; IRCCS A.O.U. San Martino - IST, Genova, Italy
- Annalisa Barla
- DIBRIS, Università degli Studi di Genova, Genova, Italy
- Gian Paolo Tonini
- Neuroblastoma Laboratory, Onco/Hematology Laboratory, Department of Woman and Child Health, University of Padua, Pediatric Research Institute, Fondazione Città della Speranza, Padua, Corso Stati Uniti, 4, 35127, Padua, Italy
20
Wangsa D, Chowdhury SA, Ryott M, Gertz EM, Elmberger G, Auer G, Åvall Lundqvist E, Küffer S, Ströbel P, Schäffer AA, Schwartz R, Munck-Wikland E, Ried T, Heselmeyer-Haddad K. Phylogenetic analysis of multiple FISH markers in oral tongue squamous cell carcinoma suggests that a diverse distribution of copy number changes is associated with poor prognosis. Int J Cancer 2015; 138:98-109. [PMID: 26175310 DOI: 10.1002/ijc.29691] [Received: 10/30/2014] [Revised: 04/21/2015] [Accepted: 06/19/2015]
Abstract
Oral tongue squamous cell carcinoma (OTSCC) is associated with poor prognosis. To improve prognostication, we analyzed four gene probes (TERC, CCND1, EGFR and TP53) and the centromere probe CEP4 as a marker of chromosomal instability, using fluorescence in situ hybridization (FISH) in single cells from the tumors of sixty-five OTSCC patients (Stage I, n = 15; Stage II, n = 30; Stage III, n = 7; Stage IV, n = 13). Unsupervised hierarchical clustering of the FISH data distinguished three clusters related to smoking status. Copy number increases of all five markers were found to be correlated to non-smoking habits, while smokers in this cohort had low-level copy number gains. Using the phylogenetic modeling software FISHtrees, we constructed models of tumor progression for each patient based on the four gene probes. Then, we derived test statistics on the models that are significant predictors of disease-free and overall survival, independent of tumor stage and smoking status in multivariate analysis. The patients whose tumors were modeled as progressing by a more diverse distribution of copy number changes across the four genes have poorer prognosis. This is consistent with the view that multiple genetic pathways need to become deregulated in order for cancer to progress.
Affiliation(s)
- Darawalee Wangsa
- Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD; Department of Oncology-Pathology, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
- Salim Akhter Chowdhury
- Joint Carnegie Mellon/University of Pittsburgh Ph.D. Program in Computational Biology, Carnegie Mellon University, Pittsburgh, PA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA
- Michael Ryott
- Department of Otorhinolaryngology, Sophiahemmet Hospital, Stockholm, Sweden
- E Michael Gertz
- Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD
- Göran Elmberger
- Department of Laboratory Medicine, Pathology, Örebro University Hospital, Örebro, Sweden
- Gert Auer
- Department of Oncology-Pathology, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
- Elisabeth Åvall Lundqvist
- Department of Oncology-Pathology, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden; Department of Oncology and Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
- Stefan Küffer
- Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
- Philipp Ströbel
- Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
- Alejandro A Schäffer
- Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD
- Russell Schwartz
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA; Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA
- Eva Munck-Wikland
- Department of Oto-Rhino-Laryngology, Head and Neck Surgery, Karolinska University Hospital and Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Thomas Ried
- Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Kerstin Heselmeyer-Haddad
- Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
21
22
Ramazzotti D, Caravagna G, Olde Loohuis L, Graudenzi A, Korsunsky I, Mauri G, Antoniotti M, Mishra B. CAPRI: efficient inference of cancer progression models from cross-sectional data. Bioinformatics 2015; 31:3016-26. [DOI: 10.1093/bioinformatics/btv296] [Received: 01/19/2015] [Accepted: 05/04/2015]
23
Turajlic S, McGranahan N, Swanton C. Inferring mutational timing and reconstructing tumour evolutionary histories. Biochim Biophys Acta 2015; 1855:264-75. [PMID: 25827356 DOI: 10.1016/j.bbcan.2015.03.005] [Received: 01/02/2015] [Revised: 03/17/2015] [Accepted: 03/19/2015]
Abstract
Cancer evolution can be considered within a Darwinian framework. Both micro and macro-evolutionary theories can be applied to understand tumour progression and treatment failure. Owing to cancers' complexity and heterogeneity the rules of tumour evolution, such as the role of selection, remain incompletely understood. The timing of mutational events during tumour evolution presents diagnostic, prognostic and therapeutic opportunities. Here we review the current sampling and computational approaches for inferring mutational timing and the evidence from next generation sequencing-informed data on mutational timing across all tumour types. We discuss how this knowledge can be used to illuminate the genes and pathways that drive cancer initiation and relapse; and to support drug development and clinical trial design.
Affiliation(s)
- Samra Turajlic
- The Francis Crick Institute, 44 Lincoln's Inn Fields, London WC2A 3LY, UK
- Charles Swanton
- The Francis Crick Institute, 44 Lincoln's Inn Fields, London WC2A 3LY, UK; UCL Cancer Institute, CRUK Lung Cancer Centre of Excellence, Huntley Street, WC1E 6DD, UK
24
Raphael BJ, Vandin F. Simultaneous inference of cancer pathways and tumor progression from cross-sectional mutation data. J Comput Biol 2015; 22:510-27. [PMID: 25785493 DOI: 10.1089/cmb.2014.0161]
Abstract
Recent cancer sequencing studies provide a wealth of somatic mutation data from a large number of patients. One of the most intriguing and challenging questions arising from these data is whether the temporal order of somatic mutations in a cancer follows any common progression. Since we usually obtain only one sample from a patient, such inferences are commonly made from cross-sectional data from different patients. This analysis is complicated by the extensive variation in somatic mutations across patients, variation that is reduced by examining combinations of mutations in various pathways. Thus far, methods to reconstruct tumor progression at the pathway level have restricted attention to known, a priori defined pathways. In this work we show how to simultaneously infer pathways and the temporal order of their mutations from cross-sectional data, leveraging the exclusivity property of driver mutations within a pathway. We define the pathway linear progression model and derive a combinatorial formulation for the problem of finding the optimal model from mutation data. We show that with enough samples the optimal solution to this problem uniquely identifies the correct model with high probability, even when errors are present in the mutation data. We then formulate the problem as an integer linear program (ILP), which allows the analysis of datasets from recent studies with large numbers of samples. We use our algorithm to analyze somatic mutation data from three cancer studies, including two large-sample studies of colorectal cancer and glioblastoma from The Cancer Genome Atlas (TCGA). The models reconstructed with our method capture most of the current knowledge of the progression of somatic mutations in these cancer types, while also providing new insights into tumor progression at the pathway level.
Affiliation(s)
- Benjamin J Raphael
- Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, Rhode Island
- Fabio Vandin
- Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, Rhode Island; Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
25
Diaz-Uriarte R. Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling. BMC Bioinformatics 2015; 16:41. [PMID: 25879190 PMCID: PMC4339747 DOI: 10.1186/s12859-015-0466-7] [Received: 07/15/2014] [Accepted: 01/15/2015]
Abstract
BACKGROUND Cancer progression is caused by the sequential accumulation of mutations, but not all orders of accumulation are equally likely. When the fixation of some mutations depends on the presence of previous ones, identifying restrictions in the order of accumulation of mutations can lead to the discovery of therapeutic targets and diagnostic markers. The purpose of this study is to conduct a comprehensive comparison of the performance of all available methods to identify these restrictions from cross-sectional data. I used simulated data sets (where the true restrictions are known) but, in contrast to previous work, I embedded restrictions within evolutionary models of tumor progression that included passengers (mutations not responsible for the development of cancer, known to be very common). This allowed me to assess, for the first time, the effects of having to filter out passengers, of sampling schemes (when, how, and how many samples), and of deviations from order restrictions. RESULTS Poor choices of method, filtering, and sampling lead to large errors in all performance measures. Having to filter passengers led to decreased performance, especially because true restrictions were missed. Overall, the best method for identifying order restrictions was Oncogenetic Trees, a fast and easy-to-use method that, although unable to recover dependencies of mutations on more than one mutation, showed good performance in most scenarios, superior to Conjunctive Bayesian Networks and Progression Networks. Single cell sampling provided no advantage, but sampling in the final stages of the disease vs. sampling at different stages had severe effects. Evolutionary model and deviations from order restrictions had major, and sometimes counterintuitive, interactions with other factors that affected performance. CONCLUSIONS This paper provides practical recommendations for using these methods with experimental data.
It also identifies key areas of future methodological work and, in particular, it shows that it is both possible and necessary to embed assumptions about order restrictions and the nature of driver status within evolutionary models of cancer progression to evaluate the performance of inferential approaches.
Affiliation(s)
- Ramon Diaz-Uriarte
- Dept. Biochemistry, Universidad Autónoma de Madrid, Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Arzobispo Morcillo, 4, 28029, Madrid, Spain
26
Li G, Theys K, Verheyen J, Pineda-Peña AC, Khouri R, Piampongsant S, Eusébio M, Ramon J, Vandamme AM. A new ensemble coevolution system for detecting HIV-1 protein coevolution. Biol Direct 2015; 10:1. [PMID: 25564011 PMCID: PMC4332441 DOI: 10.1186/s13062-014-0031-8] [Received: 11/28/2014] [Accepted: 12/02/2014]
Abstract
BACKGROUND A key challenge in the field of HIV-1 protein evolution is the identification of coevolving amino acids at the molecular level. In the past decades, many sequence-based methods have been designed to detect position-specific coevolution within and between different proteins. However, an ensemble coevolution system that integrates different methods to improve the detection of HIV-1 protein coevolution has not been developed. RESULTS We integrated 27 sequence-based prediction methods published between 2004 and 2013 into an ensemble coevolution system. This system allowed combinations of different sequence-based methods for coevolution predictions. Using HIV-1 protein structures and experimental data, we evaluated the performance of individual and combined sequence-based methods in the prediction of HIV-1 intra- and inter-protein coevolution. We showed that sequence-based methods clustered according to their methodology, and a combination of four methods outperformed any of the 27 individual methods. This four-method combination estimated that HIV-1 intra-protein coevolving positions were mainly located in functional domains and were in physical contact with each other in the protein tertiary structures. In the analysis of HIV-1 inter-protein coevolving positions between Gag and protease, protease drug resistance positions near the active site mostly coevolved with Gag cleavage positions (V128, S373-T375, A431, F448-P453) and Gag C-terminal positions (S489-Q500) under selective pressure of protease inhibitors. CONCLUSIONS This study presents a new ensemble coevolution system which detects position-specific coevolution using combinations of 27 different sequence-based methods. Our findings highlight key coevolving residues within HIV-1 structural proteins and between Gag and protease, shedding light on HIV-1 intra- and inter-protein coevolution.
Affiliation(s)
- Guangdi Li
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Kristof Theys
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Jens Verheyen
- Institute of Virology, University hospital, University Duisburg-Essen, Essen, Germany.
- Andrea-Clemencia Pineda-Peña
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium; Clinical and Molecular Infectious Disease Group, Faculty of Sciences and Mathematics, Universidad del Rosario, Bogotá, Colombia.
- Ricardo Khouri
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Supinya Piampongsant
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Mónica Eusébio
- Centro de Malária e Outras Doenças Tropicais and Unidade de Microbiologia, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisboa, Portugal.
- Jan Ramon
- Department of Computer Science, KU Leuven - University of Leuven, Leuven, Belgium.
- Anne-Mieke Vandamme
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium; Centro de Malária e Outras Doenças Tropicais and Unidade de Microbiologia, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisboa, Portugal.
27
Beerenwinkel N, Schwarz RF, Gerstung M, Markowetz F. Cancer evolution: mathematical models and computational inference. Syst Biol 2015; 64:e1-25. [PMID: 25293804 PMCID: PMC4265145 DOI: 10.1093/sysbio/syu081] [Citation(s) in RCA: 203] [Impact Index Per Article: 22.6] [Received: 09/27/2013] [Accepted: 09/26/2014] [Indexed: 12/12/2022]
Abstract
Cancer is a somatic evolutionary process characterized by the accumulation of mutations, which contribute to tumor growth, clinical progression, immune escape, and the development of drug resistance. Evolutionary theory can be used to analyze the dynamics of tumor cell populations and to make inferences about the evolutionary history of a tumor from molecular data. We review recent approaches to modeling the evolution of cancer, including population dynamics models of tumor initiation and progression, phylogenetic methods to model the evolutionary relationships between tumor subclones, and probabilistic graphical models to describe dependencies among mutations. Evolutionary modeling helps us understand how tumors arise and will also play an increasingly important prognostic role in predicting disease progression and the outcome of medical interventions, such as targeted therapy.
Affiliation(s)
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, United Kingdom
- Roland F Schwarz
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, United Kingdom
- Moritz Gerstung
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, United Kingdom
- Florian Markowetz
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, United Kingdom
28
Loohuis LO, Caravagna G, Graudenzi A, Ramazzotti D, Mauri G, Antoniotti M, Mishra B. Inferring tree causal models of cancer progression with probability raising. PLoS One 2014; 9:e108358. [PMID: 25299648 PMCID: PMC4191986 DOI: 10.1371/journal.pone.0108358] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 04/11/2014] [Accepted: 08/27/2014] [Indexed: 11/20/2022]
Abstract
Existing techniques to reconstruct tree models of progression for accumulative processes, such as cancer, seek to estimate causation by combining correlation with a frequentist notion of temporal priority. In this paper, we define a novel theoretical framework called CAPRESE (CAncer PRogression Extraction with Single Edges) to reconstruct such models based on the notion of probabilistic causation defined by Suppes. We consider a general reconstruction setting complicated by the presence of noise in the data due to biological variation, as well as experimental or measurement errors. To improve tolerance to noise, we define and use a shrinkage-like estimator. We prove the correctness of our algorithm by showing asymptotic convergence to the correct tree under mild constraints on the level of noise. Moreover, on synthetic data, we show that our approach outperforms state-of-the-art methods, that it is efficient even with a relatively small number of samples, and that its performance quickly converges to its asymptote as the number of samples increases. For real cancer datasets obtained with different technologies, we highlight biologically significant differences between the progressions we infer and those inferred by competing techniques, and we also show how to validate conjectured biological relations with progression models.
Affiliation(s)
- Loes Olde Loohuis
- Center for Neurobehavioral Genetics, University of California Los Angeles, Los Angeles, United States of America
- Giulio Caravagna
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi, Milano-Bicocca, Milano, Italy
- Alex Graudenzi
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi, Milano-Bicocca, Milano, Italy
- Daniele Ramazzotti
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi, Milano-Bicocca, Milano, Italy
- Giancarlo Mauri
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi, Milano-Bicocca, Milano, Italy
- Marco Antoniotti
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi, Milano-Bicocca, Milano, Italy
- Bud Mishra
- Courant Institute of Mathematical Sciences, New York University, New York, United States of America
29
Chowdhury SA, Shackney SE, Heselmeyer-Haddad K, Ried T, Schäffer AA, Schwartz R. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics. PLoS Comput Biol 2014; 10:e1003740. [PMID: 25078894 PMCID: PMC4117424 DOI: 10.1371/journal.pcbi.1003740] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 02/02/2014] [Accepted: 06/04/2014] [Indexed: 02/07/2023]
Abstract
We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome. The methods are designed for data collected by fluorescence in situ hybridization (FISH), an experimental technique especially well suited to characterizing intratumor heterogeneity using counts of probes to genetic regions frequently gained or lost in tumor development. Here, we develop new provably optimal methods for computing an edit distance between the copy number states of two cells, given evolution by copy number changes of single probes, all probes on a chromosome, or all probes in the genome. We then apply this theory to develop a practical heuristic algorithm, implemented in publicly available software, for inferring tumor phylogenies under this evolutionary model on data from potentially hundreds of single cells. We demonstrate and validate the methods on simulated data and on published FISH data from cervical and breast cancers. Our computational experiments show that the new model and algorithm lead to more parsimonious trees than prior methods for single-tumor phylogenetics and to improved performance on various classification tasks, such as distinguishing primary tumors from metastases obtained from the same patient population.
Cancer is an evolutionary system whose growth and development are attributed to aberrations in well-known genes and to cancer-type-specific genomic imbalances. Here, we present methods for reconstructing the evolution of individual tumors based on cell-to-cell variation in the copy numbers of targeted regions of the genome. The methods are designed to work with fluorescence in situ hybridization (FISH), a technique that allows one to profile copy number changes in potentially thousands of single cells per study. Our work advances the prior art by developing theory and practical algorithms for building evolutionary trees of single tumors that can model the gain or loss of genetic regions at the scale of single genes, whole chromosomes, or the entire genome, all of which are common events in tumor evolution. We apply these methods to simulated and real tumor data to demonstrate substantial improvements in tree-building accuracy and in our ability to classify tumors accurately from their inferred evolutionary models. The newly developed algorithms have been released through our publicly available software, FISHtrees.
Affiliation(s)
- Salim Akhter Chowdhury
- Joint Carnegie Mellon/University of Pittsburgh Ph.D. Program in Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Stanley E. Shackney
- Intelligent Oncotherapeutics, Pittsburgh, Pennsylvania, United States of America
- Thomas Ried
- Genetics Branch, Center for Cancer Research, NCI, NIH, Bethesda, Maryland, United States of America
- Alejandro A. Schäffer
- Computational Biology Branch, NCBI, NIH, Bethesda, Maryland, United States of America
- Russell Schwartz
- Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
30
Purdom E, Ho C, Grasso CS, Quist MJ, Cho RJ, Spellman P. Methods and challenges in timing chromosomal abnormalities within cancer samples. Bioinformatics 2013; 29:3113-20. [PMID: 24064421 PMCID: PMC3842754 DOI: 10.1093/bioinformatics/btt546] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/22/2022]
Abstract
Motivation: Tumors acquire many chromosomal amplifications, and those acquired early in the lifespan of the tumor may not only be important for tumor growth but may also be useful for diagnostic purposes. Many methods infer the order in which abnormalities accumulate based on their occurrence in a large cohort of patients. Recently, Durinck et al. (2011) and Greenman et al. (2012) developed methods to order a single tumor’s chromosomal amplifications based on the patterns of mutations accumulated within those regions. This approach offers an unprecedented opportunity to assess the etiology of a single tumor sample but has not been widely evaluated. Results: We show that the model for timing chromosomal amplifications is limited in scope, particularly for regions with high levels of amplification. We also show that the estimated order of events can be sensitive for events that occur early in the progression of the tumor, and that the partial maximum-likelihood method of Greenman et al. (2012) can give biased estimates, particularly for moderate read coverage or normal contamination. We propose a maximum-likelihood estimation procedure that fully accounts for sequencing variability and show that it outperforms the partial maximum-likelihood estimation method. We also propose a Bayesian estimation procedure that stabilizes the estimates in certain settings. We apply these methods to a small number of ovarian tumors, and the results suggest possible differences in how the tumors acquired amplifications. Availability and implementation: We provide an implementation of these methods in the R package cancerTiming, which is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/. Contact: epurdom@stat.Berkeley.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Elizabeth Purdom
- Department of Statistics, University of California, Berkeley, 367 Evans Hall Berkeley, CA 94720-3860, USA, Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA and Department of Dermatology, University of California, San Francisco, CA 94115, USA
31
Shahrabi Farahani H, Lagergren J. Learning oncogenetic networks by reducing to mixed integer linear programming. PLoS One 2013; 8:e65773. [PMID: 23799047 PMCID: PMC3683041 DOI: 10.1371/journal.pone.0065773] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 12/07/2012] [Accepted: 04/28/2013] [Indexed: 12/22/2022]
Abstract
Cancer can result from the accumulation of different types of genetic mutations, such as copy number aberrations. The data from tumors are cross-sectional and do not contain the temporal order of the genetic events. Finding the order in which the genetic events occurred, and identifying progression pathways, is of vital importance for understanding the disease. To model cancer progression, we propose Progression Networks, a special case of Bayesian networks tailored to modeling disease progression. Progression networks have similarities with Conjunctive Bayesian Networks (CBNs) [1], a variation of Bayesian networks also proposed for modeling disease progression. We also describe an algorithm for learning Bayesian networks in general and progression networks in particular. We reduce the hard problem of learning Bayesian and progression networks to Mixed Integer Linear Programming (MILP). MILP is a Non-deterministic Polynomial-time complete (NP-complete) problem for which very good heuristics exist. We tested our algorithm on synthetic and real cytogenetic data from renal cell carcinoma. We also compared our learned progression networks with the networks proposed in earlier publications. The software is available on the website https://bitbucket.org/farahani/diprog.
Affiliation(s)
- Hossein Shahrabi Farahani
- KTH Royal Institute of Technology, Science for Life Laboratory (SciLifeLab), Center for Industrial and Applied Mathematics, School of Computer Science and Communication, Stockholm, Sweden
- Jens Lagergren
- KTH Royal Institute of Technology, Science for Life Laboratory (SciLifeLab), Center for Industrial and Applied Mathematics, School of Computer Science and Communication, Stockholm, Sweden
32
Czibula G, Bocicor IM, Czibula IG. Temporal ordering of cancer microarray data through a reinforcement learning based approach. PLoS One 2013; 8:e60883. [PMID: 23565283 PMCID: PMC3614992 DOI: 10.1371/journal.pone.0060883] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/25/2012] [Accepted: 03/04/2013] [Indexed: 11/19/2022]
Abstract
Temporal modeling and analysis, and more specifically temporal ordering, are very important problems in bioinformatics and computational biology, as the temporal analysis of the events characterizing a biological process could provide significant insights into its development and progression. In the case of cancer in particular, understanding the dynamics and evolution of the disease could lead to better methods for prediction and treatment. In this paper we tackle, from a computational perspective, the temporal ordering problem, which refers to constructing a sorted collection of multi-dimensional biological data that reflects an accurate temporal evolution of a biological system. We introduce a novel approach to the biological temporal ordering problem based on reinforcement learning, more precisely on Q-learning. The experimental evaluation is performed using several DNA microarray data sets, two of which contain cancer gene expression data. The obtained solutions are correlated either with the given correct ordering (where one is provided for validation) or with the overall survival time of the patients (for the cancer data sets), confirming the good performance of the proposed model and indicating the potential of our approach.
Affiliation(s)
- Gabriela Czibula
- Department of Computer Science, Babes-Bolyai University, Cluj-Napoca, Romania
- Iuliana M. Bocicor
- Department of Computer Science, Babes-Bolyai University, Cluj-Napoca, Romania
33
Izu A, Cohen T, Degruttola V. Bayesian estimation of mixture models with prespecified elements to compare drug resistance in treatment-naïve and experienced tuberculosis cases. PLoS Comput Biol 2013; 9:e1002973. [PMID: 23555210 PMCID: PMC3605089 DOI: 10.1371/journal.pcbi.1002973] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/11/2012] [Accepted: 01/21/2013] [Indexed: 11/23/2022]
Abstract
We propose a Bayesian approach for estimating branching tree mixture models to compare drug-resistance pathways (i.e., patterns of sequential acquisition of resistance to individual antibiotics) observed among Mycobacterium tuberculosis isolates collected from treatment-naïve and treatment-experienced patients. Resistant pathogens collected from treatment-naïve patients are strains for which the fitness costs of resistance were not sufficient to prevent transmission, whereas those collected from treatment-experienced patients reflect both transmitted and acquired resistance, the latter of which may or may not be associated with lower transmissibility. Comparing the resistance pathways constructed from these two groups of drug-resistant strains provides insight into which pathways preferentially lead to the development of transmissible multidrug-resistant strains. We apply the proposed statistical methods to data from worldwide surveillance of drug-resistant tuberculosis collected by the World Health Organization over 13 years.
Drug-resistant tuberculosis (TB) initially arises as a result of the sporadic appearance and subsequent selection of drug-resistant M. tuberculosis mutants. Such strains may or may not be associated with fitness costs affecting their ability to transmit and cause disease. Resistant pathogens collected from treatment-naïve patients are strains for which the fitness costs of resistance were not sufficient to prevent transmission. Those collected from treatment-experienced patients reflect strains that may or may not be associated with lower transmissibility. Determining which strains are sufficiently fit to be transmitted and cause disease can aid in developing effective strategies to combat the spread of resistance. Branching trees are graphical models used to infer the sequence of several binary events (i.e., a pathway) that have occurred in an unknown order. We propose a novel method using branching trees with prespecified components to compare evolutionary pathways among different populations. We apply our model to determine whether there are unique drug-resistance pathways, found only among treatment-experienced patients, that might reflect acquired resistant disease associated with fitness costs that limit its ability to transmit. Our methods can be generalized to any biological process for which the assumption of an ascending Markov process applies.
Affiliation(s)
- Alane Izu
- Department of Science and Technology/National Research Foundation, Vaccine Preventable Diseases and Respiratory & Meningeal Pathogens Research Unit, University of Witwatersrand, Faculty of Health Science, Johannesburg, Gauteng, South Africa.
34
Hainke K, Rahnenführer J, Fried R. Cumulative disease progression models for cross-sectional data: a review and comparison. Biom J 2012; 54:617-40. [PMID: 22886685 DOI: 10.1002/bimj.201100186] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 09/05/2011] [Revised: 04/19/2012] [Accepted: 05/25/2012] [Indexed: 11/06/2022]
Abstract
A better understanding of disease progression is beneficial for early diagnosis and appropriate individual therapy. Many different approaches for the statistical modelling of cumulative disease progression have been proposed in the literature, ranging from simple path models to complex restricted Bayesian networks. Important fields of application are diseases such as cancer and HIV. Tumour progression is measured by means of chromosome aberrations, whereas people infected with HIV develop drug resistance because of genetic changes in the virus. These two very different diseases have typical courses of progression, which can be modelled partly by consecutive and partly by independent steps. This paper gives an overview of the different progression models and points out their advantages and drawbacks. Different models are compared via simulations to analyse how they perform when some of their assumptions are violated. In a simulation study, we evaluate how the models perform in terms of fitting induced multivariate probability distributions and topological relationships. We often find that the true model class used for generating the data is outperformed by either a less or a more complex model class. The more flexible conjunctive Bayesian networks can be used to fit oncogenetic trees, whereas mixtures of oncogenetic trees with three tree components can be fitted well by mixture models with only two tree components.
Affiliation(s)
- Katrin Hainke
- Department of Statistics, TU Dortmund University, 44221 Dortmund, Germany.
35
Cheng YK, Beroukhim R, Levine RL, Mellinghoff IK, Holland EC, Michor F. A mathematical methodology for determining the temporal order of pathway alterations arising during gliomagenesis. PLoS Comput Biol 2012; 8:e1002337. [PMID: 22241976 PMCID: PMC3252265 DOI: 10.1371/journal.pcbi.1002337] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Received: 09/28/2011] [Accepted: 11/17/2011] [Indexed: 12/31/2022]
Abstract
Human cancer is caused by the accumulation of genetic alterations in cells. Of special importance are changes that occur early during malignant transformation because they may result in oncogene addiction and thus represent promising targets for therapeutic intervention. We have previously described a computational approach, called Retracing the Evolutionary Steps in Cancer (RESIC), to determine the temporal sequence of genetic alterations during tumorigenesis from cross-sectional genomic data of tumors at their fully transformed stage. Since alterations within a set of genes belonging to a particular signaling pathway may have similar or equivalent effects, we applied a pathway-based systems biology approach to the RESIC methodology. This method was used to determine whether alterations of specific pathways develop early or late during malignant transformation. When applied to primary glioblastoma (GBM) copy number data from The Cancer Genome Atlas (TCGA) project, RESIC identified a temporal order of pathway alterations consistent with the order of events in secondary GBMs. We then further subdivided the samples into the four main GBM subtypes and determined the relative contributions of each subtype to the overall results: we found that the overall ordering applied for the proneural subtype but differed for mesenchymal samples. The temporal sequence of events could not be identified for neural and classical subtypes, possibly due to a limited number of samples. Moreover, for samples of the proneural subtype, we detected two distinct temporal sequences of events: (i) RAS pathway activation was followed by TP53 inactivation and finally PI3K2 activation, and (ii) RAS activation preceded only AKT activation. This extension of the RESIC methodology provides an evolutionary mathematical approach to identify the temporal sequence of pathway changes driving tumorigenesis and may be useful in guiding the understanding of signaling rearrangements in cancer development. 
Cancer is a deadly disease that develops through the accumulation of genetic changes over time. Many biological models do not incorporate this temporal aspect of tumor formation and progression, in part due to the difficulty of determining the sequence of events through biological experimentation for most cancer types. We previously developed a computational algorithm with which we can quickly and cost-effectively determine the order in which mutations arise in the tumor even when large numbers of mutations are considered. In this paper, we extended our method to incorporate biological knowledge of the common pathways by which cancer progresses. We applied these techniques to primary glioblastoma, the most common form of brain cancer. We found that when all samples are taken into account, a temporal sequence of pathway events emerges; however, different subtypes of glioblastoma vary in their temporal sequence of events. This algorithm can also be easily applied to other cancer types as clinical data becomes available, showing the benefit of computational and mathematical tools in cancer research. Using temporal information, cancer biologists will be able to develop more accurate animal models of tumor formation and learn more about how mutations interact in time, thus leading to better treatments for cancer.
Affiliation(s)
- Yu-Kang Cheng
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, and Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Cancer Biology and Genetics Program, Brain Tumor Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, New York, United States of America
- Rameen Beroukhim
- Departments of Cancer Biology and Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America, Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America, and Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Ross L. Levine
- Human Oncology and Pathogenesis Program, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Ingo K. Mellinghoff
- Human Oncology and Pathogenesis Program, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Eric C. Holland
- Cancer Biology and Genetics Program, Brain Tumor Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Franziska Michor
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, and Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
36
Gerstung M, Eriksson N, Lin J, Vogelstein B, Beerenwinkel N. The temporal order of genetic and pathway alterations in tumorigenesis. PLoS One 2011; 6:e27136. [PMID: 22069497 PMCID: PMC3206070 DOI: 10.1371/journal.pone.0027136] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Received: 08/18/2011] [Accepted: 10/11/2011] [Indexed: 01/06/2023]
Abstract
Cancer evolves through the accumulation of mutations, but the order in which mutations occur is poorly understood. Inference of a temporal ordering on the level of genes is challenging because clinically and histologically identical tumors often have few mutated genes in common. This heterogeneity may at least in part be due to mutations in different genes having similar phenotypic effects by acting in the same functional pathway. We estimate the constraints on the order in which alterations accumulate during cancer progression from cross-sectional mutation data using a probabilistic graphical model termed Hidden Conjunctive Bayesian Network (H-CBN). The possible orders are analyzed on the level of genes and, after mapping genes to functional pathways, also on the pathway level. We find stronger evidence for pathway order constraints than for gene order constraints, indicating that temporal ordering results from selective pressure acting at the pathway level. The accumulation of changes in core pathways differs among cancer types, yet a common feature is that progression appears to begin with mutations in genes that regulate apoptosis pathways and to conclude with mutations in genes involved in invasion pathways. H-CBN models provide a quantitative and intuitive model of tumorigenesis showing that the genetic events can be linked to the phenotypic progression on the level of pathways.
Affiliation(s)
- Moritz Gerstung
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- Nicholas Eriksson
- Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
- Jimmy Lin
- Ludwig Center and Howard Hughes Medical Institute, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, Maryland, United States of America
- Bert Vogelstein
- Ludwig Center and Howard Hughes Medical Institute, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, Maryland, United States of America
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
37
Lawyer G, Altmann A, Thielen A, Zazzi M, Sönnerborg A, Lengauer T. HIV-1 mutational pathways under multidrug therapy. AIDS Res Ther 2011; 8:26. [PMID: 21794106 PMCID: PMC3162516 DOI: 10.1186/1742-6405-8-26] [Received: 03/29/2011] [Accepted: 07/27/2011]
Abstract
BACKGROUND Genotype-derived drug resistance profiles are a valuable asset in HIV-1 therapy decisions. Therapy decisions could be further improved, both in terms of predicting length of current therapy success and in preserving follow-up therapy options, through better knowledge of mutational pathways, here defined as specific locations on the viral genome which, when mutant, alter the risk that additional specific mutations arise. We limit the search to locations in the reverse transcriptase region of the HIV-1 genome which host resistance mutations to nucleoside (NRTI) and non-nucleoside (NNRTI) reverse transcriptase inhibitors (as listed in the 2008 International AIDS Society report), or which were mutant at therapy start in 5% or more of the therapies studied. METHODS A Cox proportional hazards model was fit to each location with the hazard of a mutation at that location during therapy proportional to the presence/absence of mutations at the remaining locations at therapy start. A pathway from preexisting to occurring mutation was indicated if the covariate was both selected as important via smoothly clipped absolute deviation (a form of regularized regression) and had a small p-value. The Cox model also allowed controlling for non-genetic parameters and potential nuisance factors such as viral resistance and number of previous therapies. Results were based on 1981 therapies given to 1495 distinct patients drawn from the EuResist database. RESULTS The strongest influence on the hazard of developing NRTI resistance was having more than four previous therapies, not any one existing resistance mutation. Known NRTI resistance pathways were shown, and previously speculated inhibition between the thymidine analog pathways was supported. Evidence was found for a number of specific pathways between NRTI and NNRTI resistance sites. A number of common mutations were shown to increase the hazard of developing both NRTI and NNRTI resistance. Viral resistance to the therapy compounds did not materially affect the hazard of mutation in our model. CONCLUSIONS The accuracy of therapy outcome prediction tools may be increased by including the number of previous treatments, and by considering locations in the HIV genome which increase the hazard of developing resistance mutations.
38
Abstract
Genomic instability, the propensity for aberrations in chromosomes, plays a critical role in the development of many diseases. High throughput genotyping experiments have been performed to study genomic instability in diseases. The output of such experiments can be summarized as high-dimensional binary vectors, where each binary variable records aberration status at one marker locus. It is of keen interest to understand how aberrations may interact with each other, as it provides insight into the process of the disease development. In this article, we propose a novel method, LogitNet, to infer such interactions among these aberration events. The method is based on penalized logistic regression with an extension to account for spatial correlation in the genomic instability data. We conduct extensive simulation studies and show that the proposed method performs well in the situations considered. Finally, we illustrate the method using genomic instability data from breast cancer samples.
Affiliation(s)
- Pei Wang
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
39
Izu A, Cohen T, Mitnick C, Murray M, De Gruttola V. Bayesian methods for fitting mixture models that characterize branching tree processes: An application to development of resistant TB strains. Stat Med 2011; 30:2708-20. [PMID: 21717491 DOI: 10.1002/sim.4287] [Received: 12/01/2010] [Accepted: 03/31/2011]
Abstract
For pathogens that must be treated with combinations of antibiotics and acquire resistance through genetic mutation, knowledge of the order in which drug-resistance mutations occur may be important for determining treatment policies. Diagnostic specimens collected from patients are often available; this makes it possible to determine the presence of individual drug resistance-conferring mutations and combinations of these mutations. In most cases, these specimens are only available from a patient at a single point in time; it is very rare to have access to multiple specimens from a single patient collected over time as resistance accumulates to multiple drugs. Statistical methods that use branching trees have been successfully applied to such cross-sectional data to make inference on the ordering of events that occurred prior to sampling. Here, we propose a Bayesian approach to fitting branching tree models that has several advantages, including the ability to accommodate prior information regarding measurement error or cross resistance and the natural way it permits the characterization of uncertainty. Our methods are applied to a data set for drug-resistant TB in Peru; the goal of the analysis was to determine the order with which patients develop resistance to the drugs commonly used for treating TB in this setting.
Affiliation(s)
- Alane Izu
- Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA.
40
Differences in reversion of resistance mutations to wild-type under structured treatment interruption and related increase in replication capacity. PLoS One 2011; 6:e14638. [PMID: 21297946 PMCID: PMC3031504 DOI: 10.1371/journal.pone.0014638] [Received: 02/11/2010] [Accepted: 12/01/2010]
Abstract
BACKGROUND The CPCRA 064 study examined the effect of structured treatment interruption (STI) of up to 4 months followed by salvage treatment in patients failing therapy with multi-drug resistant HIV. We examined the relationship between the reversion rate of major reverse transcriptase (RT) resistance-associated mutations and change in viral replication capacity (RC). The dataset included 90 patients with RC and genotypic data from virus samples collected at 0 (baseline), 2 and 4 months of STI. PRINCIPAL FINDINGS Rapid shift towards wild-type RC was observed during the first 2 months of STI. Median RC increased from 47.5% at baseline to 86.0% at 2 months and to 97.5% at 4 months. Between baseline and 2 months of STI, T215F had the fastest rate of reversion (41%) and the reversion of E44D and T69D was associated with the largest changes in RC. Among the most prevalent RT mutations, M184V had the fastest rate of reversion from baseline to 2 months (40%), and its reversion was associated with the largest increase in RC. Most rates of reversion increased between 2 months and 4 months, but the change in RC was more limited as it was already close to 100%. The highest frequency of concurrent reversion was found for L100I and K103N. Mutagenesis tree models showed that M184V, when present, was overall the first mutation to revert among all the RT mutations reported in the study. CONCLUSION Longitudinal analysis of combined phenotypic and genotypic data during STI showed a large amount of variability in prevalence and reversion rates to wild-type codons among the RT resistance-associated mutations. The rate of reversion of these mutations may depend on the extent of RC increase as well as the co-occurring reversion of other mutations belonging to the same mutational pathway.
41
Abstract
Viruses are fast-evolving pathogens that continuously adapt to the highly variable environments in which they live and reproduce. Strategies designed to inhibit virus replication and to control viral spread among hosts need to cope with these extremely heterogeneous populations and with their potential to evade medical interventions. Computational techniques such as phylogenetic methods have broadened our picture of viral evolution both in time and space, and mathematical modeling has contributed substantially to our progress in unraveling the dynamics of virus replication, fitness, and virulence. Integration of multiple computational and mathematical approaches with experimental data can help to predict the behavior of viral pathogens and to anticipate their escape dynamics. This information plays a critical role in some aspects of vaccine development, such as viral strain selection for vaccinations or rational attenuation of viruses. Here we review several aspects of viral evolution that can be addressed quantitatively, and we discuss computational methods that have the potential to improve vaccine design.
Affiliation(s)
- Samuel Ojosnegros
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland.
42
Evolution of drug resistance during 48 weeks of zidovudine/lamivudine/tenofovir in the absence of real-time viral load monitoring. J Acquir Immune Defic Syndr 2010; 55:277-83. [PMID: 20686411 DOI: 10.1097/qai.0b013e3181ea0df8]
Abstract
OBJECTIVES To describe the resistance mutations selected by a first-line regimen of zidovudine/lamivudine/tenofovir in the absence of real-time viral load monitoring. DESIGN A substudy of 300 participants from the Development of Antiretroviral Therapy in Africa trial in Uganda and Zimbabwe, which compared managing antiretroviral therapy with and without laboratory monitoring. METHODS Stored plasma samples from selected time points were assayed retrospectively for HIV-1 RNA. The pol gene in all baseline samples and those with HIV RNA >1000 copies per milliliter at weeks 24 and 48 were sequenced. RESULTS The proportion with HIV RNA >1000 copies per milliliter increased from 15% at 24 weeks to 24% at 48 weeks. Eighteen of 31 (58%) genotyped samples at 24 weeks had ≥ 1 major nucleoside reverse transcriptase inhibitor-associated mutations compared with 41 of 47 (87%) at 48 weeks. Excluding 1 nonadherent patient, a mean of 2.0 (95% confidence interval: 1.3 to 2.8) thymidine analogue mutations (TAMs) developed between weeks 24 and 48 among 14 patients with HIV RNA >1000 copies per milliliter at both time points. K65R was detected in 8 of 63 (13%) patients and was negatively associated with number of TAMs (P = 0.01) but not viral subtype (P = 0.30). CONCLUSIONS A high rate of acquisition of TAMs, but not of K65R, among patients with prolonged viraemia was observed. However, most patients were virologically suppressed at 48 weeks, and long-term clinical and immunological outcomes in the Development of Antiretroviral Therapy in Africa trial were favorable.
43
A differentiation-based phylogeny of cancer subtypes. PLoS Comput Biol 2010; 6:e1000777. [PMID: 20463876 PMCID: PMC2865519 DOI: 10.1371/journal.pcbi.1000777] [Received: 02/17/2009] [Accepted: 04/02/2010]
Abstract
Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective systematic method to categorize tumor subtypes by maturation. In this paper, we introduce a novel computational algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia, breast cancer and liposarcoma subtypes and then apply it to a broader group of sarcomas. This ranking of tumor subtypes resulting from the application of our methodology allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors.

Gene expression profiling of malignancies is often held to demonstrate genes that are “up-regulated” or “down-regulated”, but the appropriate frame of reference against which observations should be compared has not been determined. Fully differentiated somatic cells arise from stem cells, with changes in gene expression that can be experimentally determined. If cancers arise as the result of an abruption of the differentiation process, then poorly differentiated cancers would have a gene expression more similar to stem cells than to normal differentiated tissue, and well differentiated cancers would have a gene expression more similar to fully differentiated cells than to stem cells. In this paper, we describe a novel computational algorithm that allows orientation of cancer gene expression between the poles of the gene expression of stem cells and of fully differentiated tissue.
Our methodology allows the construction of a multi-branched phylogeny of human malignancies and can be used to identify genes related to differentiation as well as novel therapeutic targets.
44
Theys K, Deforche K, Libin P, Camacho RJ, Van Laethem K, Vandamme AM. Resistance pathways of human immunodeficiency virus type 1 against the combination of zidovudine and lamivudine. J Gen Virol 2010; 91:1898-1908. [PMID: 20410311 DOI: 10.1099/vir.0.022657-0]
Abstract
A better understanding of human immunodeficiency virus type 1 drug-resistance evolution under the selective pressure of combination treatment is important for the design of long-term effective treatment strategies. We applied Bayesian network learning to sequences from patients treated with the reverse transcriptase inhibitor combination of zidovudine (AZT) and lamivudine (3TC) to identify the role of many treatment-selected mutations in the development of resistance. Based on the Bayesian network structure, an in vivo fitness landscape was built, reflecting the necessary selective pressure under treatment, to evolve naive sequences to sequences obtained from patients treated with the combination. This landscape, combined with an evolutionary model, was used to predict resistance evolution in longitudinal sequence pairs. In our analysis, mutations 41L, 70R, 184V and 215F/Y were identified as major resistance mutations to the combination of AZT and 3TC, as they were associated directly with treatment experience. The network also suggested a possible role in resistance development for a number of novel mutations. Estimated fitness, using the landscape, correlated significantly with in vitro resistance phenotype in genotype-phenotype pairs (R² = 0.70). Variation in predicted evolution under selective pressure correlated significantly with observed in vivo evolution during AZT plus 3TC treatment. In conclusion, we confirmed current knowledge on resistance development to the combination of AZT and 3TC and identified additional novel mutations. Moreover, a model to predict resistance evolution during AZT and 3TC treatment has been built and validated.
Affiliation(s)
- K Theys
- Rega Institute for Medical Research, Katholieke Universiteit Leuven, Leuven, Belgium
- P Libin
- MyBioData, Rotselaar, Belgium
- R J Camacho
- Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal
- K Van Laethem
- Rega Institute for Medical Research, Katholieke Universiteit Leuven, Leuven, Belgium
- A-M Vandamme
- Rega Institute for Medical Research, Katholieke Universiteit Leuven, Leuven, Belgium
45
von Kleist M, Menz S, Huisinga W. Drug-class specific impact of antivirals on the reproductive capacity of HIV. PLoS Comput Biol 2010; 6:e1000720. [PMID: 20361047 PMCID: PMC2845651 DOI: 10.1371/journal.pcbi.1000720] [Received: 09/02/2009] [Accepted: 02/23/2010]
Abstract
Predictive markers linking drug efficacy to clinical outcome are a key component in the drug discovery and development process. In HIV infection, two different measures, viral load decay and phenotypic assays, are used to assess drug efficacy in vivo and in vitro. For the newly introduced class of integrase inhibitors, a large discrepancy between these two measures of efficacy was observed. Hence, a thorough understanding of the relation between these two measures of drug efficacy is imperative for guiding future drug discovery and development activities in HIV. In this article, we developed a novel viral dynamics model, which allows for a mechanistic integration of the mode of action of all approved drugs and drugs in late clinical trials. Subsequently, we established a link between in vivo and in vitro measures of drug efficacy, and extracted important determinants of drug efficacy in vivo. The analysis is based on a new quantity, the reproductive capacity, that represents in mathematical terms the in vivo analog of the read-out of a phenotypic assay. Our results suggest a drug-class specific impact of antivirals on the total amount of viral replication. Moreover, we showed that the (drug-)target half-life, dominated by immune-system related clearance processes, is a key characteristic that affects both the emergence of resistance and the in vitro-in vivo correlation of efficacy measures in HIV treatment. We found that protease and maturation inhibitors, due to their target half-life, decrease the total amount of viral replication and the emergence of resistance most efficiently.
Affiliation(s)
- Max von Kleist
- Hamilton Institute, Computational Physiology Group, National University of Ireland Maynooth, Kildare, Ireland.
46
Buendia P, Cadwallader B, DeGruttola V. A phylogenetic and Markov model approach for the reconstruction of mutational pathways of drug resistance. Bioinformatics 2009; 25:2522-9. [PMID: 19654117 PMCID: PMC2752619 DOI: 10.1093/bioinformatics/btp466] [Received: 06/25/2009] [Revised: 07/24/2009] [Accepted: 07/25/2009]
Abstract
MOTIVATION Modern HIV-1, hepatitis B virus and hepatitis C virus antiviral therapies have been successful at keeping viruses suppressed for prolonged periods of time, but therapy failures attributable to the emergence of drug resistant mutations continue to be a distressing reminder that no therapy can fully eradicate these viruses from their host organisms. To better understand the emergence of drug resistance, we combined phylogenetic and statistical models of viral evolution in a 2-phase computational approach that reconstructs mutational pathways of drug resistance. RESULTS The first phase of the algorithm involved the modeling of the evolution of the virus within the human host environment. The inclusion of longitudinal clonal sequence data was a key aspect of the model due to the progressive fashion in which multiple mutations become linked in the same genome creating drug resistant genotypes. The second phase involved the development of a Markov model to calculate the transition probabilities between the different genotypes. The proposed method was applied to data from an HIV-1 Efavirenz clinical trial study. The obtained model revealed the direction of evolution over time with greater detail than previous models. Our results show that the mutational pathways facilitate the identification of fast versus slow evolutionary pathways to drug resistance. AVAILABILITY Source code for the algorithm is publicly available at http://biorg.cis.fiu.edu/vPhyloMM/
Affiliation(s)
- Patricia Buendia
- Department of Biology and Center for Computational Science, University of Miami, Miami, USA.
47
48
Pathare S, Schäffer AA, Beerenwinkel N, Mahimkar M. Construction of oncogenetic tree models reveals multiple pathways of oral cancer progression. Int J Cancer 2009; 124:2864-71. [PMID: 19267402 DOI: 10.1002/ijc.24267]
Abstract
Oral cancer develops and progresses by accumulation of genetic alterations. The interrelationship between these alterations and their sequence of occurrence in oral cancers has not been thoroughly understood. In the present study, we applied oncogenetic tree models to comparative genomic hybridization (CGH) data of 97 primary oral cancers to identify pathways of progression. CGH revealed the most frequent gains on chromosomes 8q (72.4%) and 9q (41.2%) and frequent losses on 3p (49.5%) and 8p (47.5%). Both mixture and distance-based tree models suggested multiple progression pathways and identified +8q as an early event. The mixture model suggested two independent pathways namely a major pathway with -8p and a less frequent pathway with +9q. The distance-based tree identified three progression pathways, one characterized by -8p, another by -3p and the third by alterations +11q and +7p. Differences were observed in cytogenetic pathways of node-positive and node-negative oral cancers. Node-positive cancers were characterized by more non-random aberrations (n = 11) and progressed via -8p or -3p. On the other hand, node-negative cancers involved fewer non-random alterations (n = 6) and progressed along -3p. In summary, the tree models for oral cancers provided novel information about the interactions between genetic alterations and predicted their probable order of occurrence.
Affiliation(s)
- Swapnali Pathare
- Advanced Centre for Treatment, Research and Education in Cancer (ACTREC), Cancer Research Institute (CRI), Tata Memorial Centre (TMC), Kharghar, Navi Mumbai, India
49
Altmann A, Rosen-Zvi M, Prosperi M, Aharoni E, Neuvirth H, Schülter E, Büch J, Struck D, Peres Y, Incardona F, Sönnerborg A, Kaiser R, Zazzi M, Lengauer T. Comparison of classifier fusion methods for predicting response to anti HIV-1 therapy. PLoS One 2008; 3:e3470. [PMID: 18941628 PMCID: PMC2565127 DOI: 10.1371/journal.pone.0003470] [Received: 07/07/2008] [Accepted: 09/25/2008]
Abstract
Background Analysis of the viral genome for drug resistance mutations is state-of-the-art for guiding treatment selection for human immunodeficiency virus type 1 (HIV-1)-infected patients. These mutations alter the structure of viral target proteins and reduce or, in the worst case, completely abolish the effect of antiretroviral compounds while maintaining the ability for effective replication. Modern anti-HIV-1 regimens comprise multiple drugs in order to prevent or at least delay the development of resistance mutations. However, commonly used HIV-1 genotype interpretation systems provide only classifications for single drugs. The EuResist initiative has collected data from about 18,500 patients to train three classifiers for predicting response to combination antiretroviral therapy, given the viral genotype and further information. In this work we compare different classifier fusion methods for combining the individual classifiers. Principal Findings The individual classifiers yielded similar performance, and all the combination approaches considered performed equally well. The gain in performance due to combining methods did not reach statistical significance compared to the single best individual classifier on the complete training set. However, on smaller training set sizes (200 to 1,600 instances compared to 2,700) the combination significantly outperformed the individual classifiers (p<0.01; paired one-sided Wilcoxon test). Together with a consistent reduction of the standard deviation compared to the individual prediction engines, this shows a more robust behavior of the combined system. Moreover, using the combined system we were able to identify a class of therapy courses that led to a consistent underestimation (about 0.05 AUC) of the system performance. Discovery of these therapy courses is further evidence of the robustness of the combined system. Conclusion The combined EuResist prediction engine is freely available at http://engine.euresist.org.
Affiliation(s)
- André Altmann
- Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany.
50
Bogojeska J, Alexa A, Altmann A, Lengauer T, Rahnenführer J. Rtreemix: an R package for estimating evolutionary pathways and genetic progression scores. Bioinformatics 2008; 24:2391-2. [PMID: 18718947 PMCID: PMC2562010 DOI: 10.1093/bioinformatics/btn410]
Abstract
Summary: In genetics, many evolutionary pathways can be modeled by the ordered accumulation of permanent changes. Mixture models of mutagenetic trees have been used to describe disease progression in cancer and in HIV. In cancer, progression is modeled by the accumulation of chromosomal gains and losses in tumor cells; in HIV, the accumulation of drug resistance-associated mutations in the viral genome is known to be associated with disease progression. From such evolutionary models, genetic progression scores can be derived that assign measures for the disease state to single patients. Rtreemix is an R package for estimating mixture models of evolutionary pathways from observed cross-sectional data and for estimating associated genetic progression scores. The package also provides extended functionality for estimating confidence intervals for estimated model parameters and for evaluating the stability of the estimated evolutionary mixture models. Availability: Rtreemix is an R package that is freely available from the Bioconductor project at http://www.bioconductor.org and runs on Linux and Windows. Contact: jasmina@mpi-inf.mpg.de