1
Schmidt H, Raphael BJ. A regression based approach to phylogenetic reconstruction from multi-sample bulk DNA sequencing of tumors. PLoS Comput Biol 2024; 20:e1012631. [PMID: 39630782 DOI: 10.1371/journal.pcbi.1012631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Revised: 12/20/2024] [Accepted: 11/12/2024] [Indexed: 12/07/2024] Open
Abstract
MOTIVATION: DNA sequencing of multiple bulk samples from a tumor provides the opportunity to investigate tumor heterogeneity and reconstruct a phylogeny of a patient's cancer. However, since bulk DNA sequencing of tumor tissue measures thousands of cells from a heterogeneous mixture of distinct sub-populations, accurate reconstruction of the tumor phylogeny requires simultaneous deconvolution of cancer clones and inference of ancestral relationships, leading to a challenging computational problem. Many existing methods for phylogenetic reconstruction from bulk sequencing data do not scale to large datasets, such as recent datasets containing upwards of ninety samples with dozens of distinct sub-populations. RESULTS: We develop an approach to reconstruct phylogenetic trees from multi-sample bulk DNA sequencing data by separating the reconstruction problem into two parts: a structured regression problem for a fixed tree [Formula: see text], and an optimization over tree space. We derive an algorithm for the regression sub-problem by exploiting the unique, combinatorial structure of the matrices appearing within the problem. This algorithm has both asymptotic and empirical improvements over linear programming (LP) approaches to the problem. Using our algorithm for this regression sub-problem, we develop fastBE, a simple method for phylogenetic inference from multi-sample bulk DNA sequencing data. We demonstrate on simulated data with hundreds of samples and upwards of a thousand distinct sub-populations that fastBE outperforms existing approaches in terms of reconstruction accuracy, sample efficiency, and runtime. Owing to its scalability, fastBE enables both phylogenetic reconstruction directly from individual mutations, without requiring the clustering of mutations into clones, and a new phylogeny-constrained mutation clustering algorithm.
On real data from fourteen B-progenitor acute lymphoblastic leukemia patients, fastBE infers mutation phylogenies with fewer violations of a widely used evolutionary constraint and better agreement with the observed mutational frequencies. Using our phylogeny-constrained mutation clustering algorithm, we also find mutation clusters with lower distortion compared to state-of-the-art approaches. Finally, we show that on two patient-derived colorectal cancer models, fastBE infers mutation phylogenies with fewer violations of a widely used evolutionary constraint compared to existing methods.
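The abstract does not name the evolutionary constraint it evaluates. A common choice in this literature is the sum condition: in every sample, a mutation's frequency must be at least the total frequency of its children in the tree. The sketch below is an illustration under that assumption and is not the fastBE algorithm; the function name, the tree encoding, and the toy frequencies are all hypothetical.

```python
from typing import Dict, List


def sum_condition_violation(children: Dict[str, List[str]],
                            freqs: Dict[str, List[float]]) -> float:
    """Total amount by which children's frequencies exceed their parent's.

    children maps each node to its child nodes in the fixed tree.
    freqs maps each node to its per-sample mutation frequencies.
    (Hypothetical encoding, chosen for illustration only.)
    """
    total = 0.0
    for parent, kids in children.items():
        for s, f_parent in enumerate(freqs[parent]):
            child_sum = sum(freqs[k][s] for k in kids)
            # Only the excess counts; a satisfied constraint contributes 0.
            total += max(0.0, child_sum - f_parent)
    return total


# Toy tree: root -> a, root -> b, a -> c, observed in two samples.
tree = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
freqs = {
    "root": [1.0, 1.0],
    "a": [0.6, 0.3],
    "b": [0.3, 0.9],  # second sample: a + b = 1.2 exceeds root = 1.0
    "c": [0.2, 0.4],  # second sample: c = 0.4 exceeds a = 0.3
}
violation = sum_condition_violation(tree, freqs)
```

A tree that fits the data perfectly under this assumption would give a violation of zero, so the quantity can be used to compare candidate trees on the same frequency matrix.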
Affiliation(s)
- Henri Schmidt
  - Department of Computer Science, Princeton University, New Jersey, United States of America
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, New Jersey, United States of America
2
Lai J, Yang Y, Liu Y, Scharpf RB, Karchin R. Assessing the merits: an opinion on the effectiveness of simulation techniques in tumor subclonal reconstruction. Bioinformatics Advances 2024; 4:vbae094. [PMID: 38948008 PMCID: PMC11213631 DOI: 10.1093/bioadv/vbae094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 05/28/2024] [Accepted: 06/15/2024] [Indexed: 07/02/2024]
Abstract
Summary: Neoplastic tumors originate from a single cell, and their evolution can be traced through lineages characterized by mutations, copy number alterations, and structural variants. These lineages are reconstructed and mapped onto evolutionary trees with algorithmic approaches. However, without ground truth benchmark sets, the validity of an algorithm remains uncertain, limiting potential clinical applicability. With a growing number of algorithms available, there is an urgent need for standardized benchmark sets to evaluate their merits. Benchmark sets rely on in silico simulations of tumor sequence, but there are no accepted standards for simulation tools, presenting a major obstacle to progress in this field. Availability and implementation: All analyses in the paper were based on publicly available data from the publication of each accessed tool.
Affiliation(s)
- Jiaying Lai
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Yi Yang
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Yunzhou Liu
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Robert B Scharpf
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
- Rachel Karchin
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
3
Li L, Xie W, Zhan L, Wen S, Luo X, Xu S, Cai Y, Tang W, Wang Q, Li M, Xie Z, Deng L, Zhu H, Yu G. Resolving tumor evolution: a phylogenetic approach. Journal of the National Cancer Center 2024; 4:97-106. [PMID: 39282584 PMCID: PMC11390690 DOI: 10.1016/j.jncc.2024.03.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/05/2023] [Revised: 02/28/2024] [Accepted: 03/20/2024] [Indexed: 09/19/2024] Open
Abstract
The evolutionary dynamics of cancer, characterized by its profound heterogeneity, demand sophisticated tools for a holistic understanding. This review delves into tumor phylogenetics, an essential approach bridging evolutionary biology with oncology, offering unparalleled insights into cancer's evolutionary trajectory. We provide an overview of the workflow, encompassing study design, data acquisition, and phylogeny reconstruction. Notably, the integration of diverse data sets emerges as a transformative step, enhancing the depth and breadth of evolutionary insights. With this integrated perspective, tumor phylogenetics stands poised to redefine our understanding of cancer evolution and influence therapeutic strategies.
Affiliation(s)
- Lin Li
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Wenqin Xie
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Li Zhan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Shaodi Wen
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - Department of Oncology, The Affiliated Cancer Hospital of Nanjing Medical University & Jiangsu Cancer Hospital, Nanjing, China
- Xiao Luo
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Shuangbin Xu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - Division of Laboratory Medicine, Microbiome Center, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Yantong Cai
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - Dermatology Hospital, Southern Medical University, Guangzhou, China
- Wenli Tang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Qianwen Wang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Ming Li
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Zijing Xie
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Lin Deng
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Hongyuan Zhu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangchuang Yu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
4
Lai J, Liu Y, Scharpf RB, Karchin R. Evaluation of simulation methods for tumor subclonal reconstruction. arXiv 2024:arXiv:2402.09599v1. [PMID: 38410652 PMCID: PMC10896360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
Most neoplastic tumors originate from a single cell, and their evolution can be genetically traced through lineages characterized by common alterations such as small somatic mutations (SSMs), copy number alterations (CNAs), structural variants (SVs), and aneuploidies. Due to the complexity of these alterations in most tumors and the errors introduced by sequencing protocols and calling algorithms, tumor subclonal reconstruction algorithms are necessary to recapitulate the DNA sequence composition and tumor evolution in silico. With a growing number of these algorithms available, there is a pressing need for consistent and comprehensive benchmarking, which relies on realistic tumor sequencing generated by simulation tools. Here, we examine the current simulation methods, identifying their strengths and weaknesses, and provide recommendations for their improvement. Our review also explores potential new directions for research in this area. This work aims to serve as a resource for understanding and enhancing tumor genomic simulations, contributing to the advancement of the field.
Affiliation(s)
- Jiaying Lai
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Yunzhou Liu
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Robert B. Scharpf
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD
- Rachel Karchin
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD
5
Antonello A, Bergamin R, Calonaci N, Househam J, Milite S, Williams MJ, Anselmi F, d'Onofrio A, Sundaram V, Sosinsky A, Cross WCH, Caravagna G. Computational validation of clonal and subclonal copy number alterations from bulk tumor sequencing using CNAqc. Genome Biol 2024; 25:38. [PMID: 38297376 PMCID: PMC10832148 DOI: 10.1186/s13059-024-03170-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 01/10/2024] [Indexed: 02/02/2024] Open
Abstract
Copy number alterations (CNAs) are among the most important genetic events in cancer, but their detection from sequencing data is challenging because of unknown sample purity, tumor ploidy, and general intra-tumor heterogeneity. Here, we present CNAqc, an evolution-inspired method to perform the computational validation of clonal and subclonal CNAs detected from bulk DNA sequencing. CNAqc is validated using single-cell data and simulations, is applied to over 4000 TCGA and PCAWG samples, and is incorporated into the validation process for the clinically accredited bioinformatics pipeline at Genomics England. CNAqc is designed to support automated quality control procedures for tumor somatic data validation.
Affiliation(s)
- Alice Antonello
  - Department of Mathematics, Informatics and Geosciences (MIGe), University of Trieste, Trieste, Italy
- Riccardo Bergamin
  - Department of Mathematics, Informatics and Geosciences (MIGe), University of Trieste, Trieste, Italy
- Nicola Calonaci
  - Department of Mathematics, Informatics and Geosciences (MIGe), University of Trieste, Trieste, Italy
- Jacob Househam
  - Evolution and Cancer Lab, Centre for Genomics and Computational Biology, Barts Cancer Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Salvatore Milite
  - Department of Mathematics, Informatics and Geosciences (MIGe), University of Trieste, Trieste, Italy
  - Centre for Computational Biology, Human Technopole, Milan, Italy
- Marc J Williams
  - Department of Computational Oncology, Memorial Sloan Kettering, New York, USA
- Fabio Anselmi
  - Department of Mathematics, Informatics and Geosciences (MIGe), University of Trieste, Trieste, Italy
- Alberto d'Onofrio
  - Department of Mathematics, Informatics and Geosciences (MIGe), University of Trieste, Trieste, Italy
- William C H Cross
  - Department of Research Pathology, UCL Cancer Institute, University College London, London, UK
- Giulio Caravagna
  - Department of Mathematics, Informatics and Geosciences (MIGe), University of Trieste, Trieste, Italy
  - Evolutionary Genomics and Modelling Team, Centre for Evolution and Cancer, Institute of Cancer Research, London, UK
6
Peroni E, Randi ML, Rosato A, Cagnin S. Acute myeloid leukemia: from NGS, through scRNA-seq, to CAR-T. dissect cancer heterogeneity and tailor the treatment. J Exp Clin Cancer Res 2023; 42:259. [PMID: 37803464 PMCID: PMC10557350 DOI: 10.1186/s13046-023-02841-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/19/2023] [Accepted: 09/25/2023] [Indexed: 10/08/2023] Open
Abstract
Acute myeloid leukemia (AML) is a malignant blood cancer with marked cellular heterogeneity due to altered maturation and differentiation of myeloid blasts, the possible causes of which are transcriptional or epigenetic alterations, impaired apoptosis, and excessive cell proliferation. This neoplasm has a high rate of resistance to anticancer therapies and thus a high risk of relapse and mortality because of both the biological diversity of the patient and intratumoral heterogeneity due to the acquisition of new somatic changes. For more than 40 years, the old gold standard "one size fits all" treatment approach included intensive chemotherapy treatment with anthracyclines and cytarabine. The manuscript first traces the evolution of the understanding of the pathology from the 1970s to the present. The enormous strides made in its categorization prove to be crucial for risk stratification, enabling an increasingly personalized diagnosis and treatment approach. Subsequently, we highlight how, over the past 15 years, technological advances enabling single-cell RNA sequencing and T-cell modification based on genomic tools are affecting the classification and treatment of AML. At the dawn of the new millennium, the advent of high-throughput next-generation sequencing technologies has enabled the profiling of patients evidencing different facets of the same disease, stratifying risk, and identifying new possible therapeutic targets that have subsequently been validated. Currently, it is possible to investigate tumor heterogeneity at the single-cell level, profiling the tumor at the time of diagnosis or after treatment. This would allow the identification of underrepresented cellular subclones or clones resistant to therapeutic approaches and thus responsible for post-treatment relapse that would otherwise be difficult to detect with bulk investigations on the tumor biopsy.
Single-cell investigation will then allow even greater personalization of therapy to the genetic and transcriptional profile of the tumor, saving valuable time and dangerous side effects. The era of personalized medicine will take a huge step forward through the disclosure of each individual piece of the complex puzzle that is cancer pathology, to implement a "tailored" therapeutic approach based also on engineered CAR-T cells.
Affiliation(s)
- Edoardo Peroni
  - Immunology and Molecular Oncology Unit, Veneto Institute of Oncology, IOV-IRCCS, Padova, 35128, Italy
- Maria Luigia Randi
  - First Medical Clinic, Department of Medicine-DIMED, University of Padua, Padua, Italy
- Antonio Rosato
  - Immunology and Molecular Oncology Unit, Veneto Institute of Oncology, IOV-IRCCS, Padova, 35128, Italy
  - Department of Surgery, Oncology and Gastroenterology, University of Padua, Padua, Italy
- Stefano Cagnin
  - Department of Biology, University of Padova, Padova, 35131, Italy
  - CIR-Myo Myology Center, University of Padova, Padova, 35131, Italy
7
Sandmann S, Richter S, Jiang X, Varghese J. Reconstructing Clonal Evolution-A Systematic Evaluation of Current Bioinformatics Approaches. International Journal of Environmental Research and Public Health 2023; 20:5128. [PMID: 36982036 PMCID: PMC10049679 DOI: 10.3390/ijerph20065128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 03/04/2023] [Accepted: 03/13/2023] [Indexed: 06/18/2023]
Abstract
The accurate reconstruction of clonal evolution, including the identification of newly developing, highly aggressive subclones, is essential for the application of precision medicine in cancer treatment. Reconstruction, aiming for correct variant clustering and clonal evolution tree reconstruction, is commonly performed by tedious manual work. While there is a plethora of tools to automatically generate reconstructions, their reliability, and especially the reasons for unreliability, have not been systematically assessed. We developed clevRsim, an approach to simulate clonal evolution data, including single-nucleotide variants as well as (overlapping) copy number variants. From this, we generated 88 data sets and performed a systematic evaluation of the tools for the reconstruction of clonal evolution. The results indicate a major negative influence of a high number of clones on both clustering and tree reconstruction. Low coverage as well as an extreme number of time points usually leads to poor clustering results. An underlying branched independent evolution hampers correct tree reconstruction. A further major decline in performance could be observed for large deletions and duplications overlapping single-nucleotide variants. In summary, to explore the full potential of reconstructing clonal evolution, improved algorithms that can properly handle the identified limitations are greatly needed.
Affiliation(s)
- Sarah Sandmann
  - Institute of Medical Informatics, University of Münster, 48149 Münster, Germany
- Silja Richter
  - Institute of Medical Informatics, University of Münster, 48149 Münster, Germany
- Xiaoyi Jiang
  - Department of Computer Science, University of Münster, 48149 Münster, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, 48149 Münster, Germany
8
Sandmann S, Inserte C, Varghese J. clevRvis: visualization techniques for clonal evolution. Gigascience 2022; 12:giad020. [PMID: 37039116 PMCID: PMC10087014 DOI: 10.1093/gigascience/giad020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/14/2022] [Revised: 01/23/2023] [Accepted: 03/08/2023] [Indexed: 04/12/2023] Open
Abstract
BACKGROUND: A thorough analysis of clonal evolution commonly requires integration of diverse sources of data (e.g., karyotyping, next-generation sequencing, and clinical information). Subsequent to actual reconstruction of clonal evolution, detailed analysis and interpretation of the results are essential. Often, however, only a few tumor samples per patient are available. Thus, information on clonal development and therapy effect may be incomplete. Furthermore, analysis of biallelic events (considered of high relevance with respect to disease course) can commonly only be realized by time-consuming analysis of the raw results and even raw sequencing data. RESULTS: We developed clevRvis, an R/Bioconductor package providing an extensive set of visualization techniques for clonal evolution. In addition to common approaches for visualization, clevRvis offers a unique option for allele-aware representation: plaice plots. Biallelic events may be visualized and inspected at a glance. Analyzing 4 public datasets, we show that plaice plots help to gain new insights into tumor development and investigate hypotheses on disease progression and therapy resistance. In addition to a graphical user interface, automatic phylogeny-aware color coding of the plots, and an approach to explore alternative trees, clevRvis provides 2 algorithms for fully automatic time point interpolation and therapy effect estimation. Analyzing 2 public datasets, we show that both approaches allow for valid approximation of a tumor's development in between measured time points. CONCLUSIONS: clevRvis represents a novel option for user-friendly analysis of clonal evolution, contributing to gaining new insights into tumor development.
Affiliation(s)
- Sarah Sandmann
  - Institute of Medical Informatics, University of Münster, Münster 48149, Germany
- Clara Inserte
  - Institute of Medical Informatics, University of Münster, Münster 48149, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, Münster 48149, Germany
9
Ahmadinejad N, Troftgruben S, Wang J, Chandrashekar PB, Dinu V, Maley C, Liu L. Accurate Identification of Subclones in Tumor Genomes. Mol Biol Evol 2022; 39:msac136. [PMID: 35749590 PMCID: PMC9260306 DOI: 10.1093/molbev/msac136] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Abstract
Understanding intratumor heterogeneity is critical for studying tumorigenesis and designing personalized treatments. To decompose the mixed cell population in a tumor, subclones are inferred computationally based on variant allele frequency (VAF) from bulk sequencing data. In this study, we showed that sequencing depth, mean VAF, and variance of VAF of a subclone are confounded. Without considering this effect, current methods require deep-sequencing data (>300× depth) to reliably infer subclones. Here, we present a novel algorithm that incorporates depth-variance and mean-variance dependencies in a clustering error model and successfully identifies subclones in tumors sequenced at depths as low as 30×. We implemented the algorithm as a model-based adaptive grouping of subclones (MAGOS) method. Analyses of computer simulated data and empirical sequencing data showed that MAGOS outperformed existing methods on minimum sequencing depth, decomposition accuracy, and computation efficiency. The most prominent improvements were observed in analyzing tumors sequenced at depths between 30× and 200×, whereas the performance was comparable between MAGOS and existing methods on deeply sequenced tumors. MAGOS supports analysis of single-nucleotide variants and copy number variants from a single sample or multiple samples of a tumor. We applied MAGOS to whole-exome data of late-stage liver cancers and discovered that high subclone count in a tumor was a significant risk factor for poor prognosis. Lastly, our analysis suggested that sequencing multiple samples of the same tumor at standard depth is more cost-effective and robust for subclone characterization than deep sequencing a single sample. MAGOS is available on GitHub (https://github.com/liliulab/magos).
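The confounding of depth, mean VAF, and VAF variance can be seen in the simplest read-count model. The sketch below assumes that variant reads are binomially distributed, which is a simplification for illustration and not the MAGOS error model; the function name is hypothetical.

```python
import math


def vaf_std(mean_vaf: float, depth: int) -> float:
    """Standard deviation of the observed VAF under a binomial read model.

    Assumes variant reads ~ Binomial(depth, mean_vaf). This is an
    illustrative simplification, not the model used by MAGOS.
    """
    return math.sqrt(mean_vaf * (1.0 - mean_vaf) / depth)


# The same subclone (mean VAF 0.25) is much noisier at 30x than at 300x.
sd_30 = vaf_std(0.25, 30)
sd_300 = vaf_std(0.25, 300)
ratio = sd_30 / sd_300  # sqrt(300 / 30), whatever the mean VAF is

# At a fixed depth, the spread also changes with the mean VAF.
sd_mid = vaf_std(0.5, 30)
sd_low = vaf_std(0.1, 30)
```

Under this assumption, a clustering method that uses one fixed variance for all subclones would treat a low-depth, mid-frequency cluster as tighter than it really is, which is the kind of dependency the abstract describes.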
Affiliation(s)
- Navid Ahmadinejad
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85054, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Shayna Troftgruben
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85054, USA
- Junwen Wang
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85054, USA
  - Department of Health Sciences Research and Center for Individualized Medicine, Mayo Clinic Arizona, Scottsdale, AZ 85259, USA
- Pramod B Chandrashekar
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85054, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Valentin Dinu
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85054, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Carlo Maley
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Li Liu
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85054, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
10
Wintersinger JA, Dobson SM, Kulman E, Stein LD, Dick JE, Morris Q. Reconstructing Complex Cancer Evolutionary Histories from Multiple Bulk DNA Samples Using Pairtree. Blood Cancer Discov 2022; 3:208-219. [PMID: 35247876 PMCID: PMC9780082 DOI: 10.1158/2643-3230.bcd-21-0092] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/03/2021] [Revised: 11/23/2021] [Accepted: 02/28/2022] [Indexed: 01/25/2023] Open
Abstract
Cancers are composed of genetically distinct subpopulations of malignant cells. DNA-sequencing data can be used to determine the somatic point mutations specific to each population and build clone trees describing the evolutionary relationships between them. These clone trees can reveal critical points in disease development and inform treatment. Pairtree is a new method that constructs more accurate and detailed clone trees than previously possible using variant allele frequency data from one or more bulk cancer samples. It does so by first building a Pairs Tensor that captures the evolutionary relationships between pairs of subpopulations, and then it uses these relations to constrain clone trees and infer violations of the infinite sites assumption. Pairtree can accurately build clone trees using up to 100 samples per cancer that contain 30 or more subclonal populations. On 14 B-progenitor acute lymphoblastic leukemias, Pairtree replicates or improves upon expert-derived clone tree reconstructions. SIGNIFICANCE: Clone trees illustrate the evolutionary history of a cancer and can provide insights into how the disease changed through time (e.g., between diagnosis and relapse). Pairtree uses DNA-sequencing data from many samples of the same cancer to build more detailed and accurate clone trees than previously possible. See related commentary by Miller, p. 176. This article is highlighted in the In This Issue feature, p. 171.
Affiliation(s)
- Jeff A. Wintersinger
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Stephanie M. Dobson
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Ethan Kulman
  - Memorial Sloan Kettering Cancer Center, New York, New York
- Lincoln D. Stein
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- John E. Dick
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Quaid Morris
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - Memorial Sloan Kettering Cancer Center, New York, New York
  - Corresponding Author: Quaid Morris, Sloan Kettering Institute, 417 East 68th Street, New York, NY 10021. Phone: 646-888-2201; E-mail:
11
Sashittal P, Zaccaria S, El-Kebir M. Parsimonious Clone Tree Integration in cancer. Algorithms Mol Biol 2022; 17:3. [PMID: 35282838 PMCID: PMC8919608 DOI: 10.1186/s13015-022-00209-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/15/2021] [Accepted: 01/25/2022] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND: Every tumor is composed of heterogeneous clones, each corresponding to a distinct subpopulation of cells that accumulated different types of somatic mutations, ranging from single-nucleotide variants (SNVs) to copy-number aberrations (CNAs). As the analysis of this intra-tumor heterogeneity has important clinical applications, several computational methods have been introduced to identify clones from DNA sequencing data. However, due to technological and methodological limitations, current analyses are restricted to identifying tumor clones only based on either SNVs or CNAs, preventing a comprehensive characterization of a tumor's clonal composition. RESULTS: To overcome these challenges, we formulate the identification of clones in terms of both SNVs and CNAs as an integration problem while accounting for uncertainty in the input SNV and CNA proportions. We thus characterize the computational complexity of this problem and we introduce PACTION (PArsimonious Clone Tree integratION), an algorithm that solves the problem using a mixed integer linear programming formulation. On simulated data, we show that tumor clones can be identified reliably, especially when further taking into account the ancestral relationships that can be inferred from the input SNVs and CNAs. On 49 tumor samples from 10 prostate cancer patients, our integration approach provides a higher resolution view of tumor evolution than previous studies. CONCLUSION: PACTION is an accurate and fast method that reconstructs clonal architecture of cancer tumors by integrating SNV and CNA clones inferred using existing methods.
12
Sandmann S, Richter S, Jiang X, Varghese J. Exploring Current Challenges and Perspectives for Automatic Reconstruction of Clonal Evolution. Cancer Genomics Proteomics 2022; 19:194-204. [PMID: 35181588 DOI: 10.21873/cgp.20314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2021] [Revised: 12/11/2021] [Accepted: 12/17/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND/AIM In the field of cancer research, reconstructing clonal evolution is of major interest. The technique provides new insights for analysis and prediction of tumor development. However, reconstruction based on mutational data is characterized by several challenges. MATERIALS AND METHODS By performing extensive literature research, we identified 51 currently available tools for reconstructing clonal evolution. By analyzing two cancer data sets (n=21), we investigated the applicability and performance of each tool. RESULTS Seventeen out of 51 tools could be applied to our data. Correct clustering of variants was observed for 4 patients in the presence of ≤3 clusters and ≥5 time points. Correct phylogenetic trees were determined for 10 patients. Accurate visualization is possible after applying adjustments to the original algorithms. CONCLUSION Despite bearing considerable potential, automatic reconstruction of clonal evolution remains challenging. To replace tedious manual reconstruction, further research including systematic error analyses using simulation tools needs to be conducted.
Affiliation(s)
- Sarah Sandmann
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Silja Richter
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Xiaoyi Jiang
  - Department of Computer Science, University of Münster, Münster, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, Münster, Germany
13
|
Govek K, Sikes C, Zhou Y, Oesper L. GraPhyC: Using Consensus to Infer Tumor Evolution. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:465-478. [PMID: 33031032 DOI: 10.1109/tcbb.2020.3029689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
We consider the problem of finding a consensus tumor evolution tree from a set of conflicting input trees. In contrast to traditional phylogenetic trees, the tumor trees we consider do not have the same set of labels applied to the leaves of each tree. We describe several distance measures between these tumor trees. Our GraPhyC algorithm solves the consensus problem using a weighted directed graph where vertices are sets of mutations and edges are weighted based on the number of times a parental relationship is observed between their constituent mutations in the input trees. We find a minimum weight spanning arborescence in this graph and prove that it minimizes the total distance to all input trees for one of our distance measures. We also describe several extensions of our GraPhyC approach. On simulated data we show that GraPhyC outperforms a baseline method and demonstrate that GraPhyC can be an effective means of computing centroids in k-medians clustering. We analyze two real sequencing datasets and find that GraPhyC is able to identify a tree not included in the set of input trees, but that contains characteristics supported by other reported evolutionary reconstructions of this tumor.
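The graph construction described in this abstract can be illustrated with a small sketch. It is not the GraPhyC implementation, and it rests on several assumptions: each vertex is a single mutation rather than a set of mutations, each input tree is a hypothetical child-to-parent dictionary rooted at a shared `ROOT` label, and an edge's weight is taken to be the number of input trees that do not contain it. The arborescence is found by brute force, which is only practical for a handful of mutations; a real implementation would use an efficient minimum spanning arborescence algorithm such as Chu-Liu/Edmonds.

```python
from collections import Counter
from itertools import product

ROOT = "root"  # hypothetical label for the shared normal-cell root


def parent_counts(trees):
    """Count how often each (parent, child) edge appears across input trees."""
    counts = Counter()
    for tree in trees:
        for child, parent in tree.items():
            counts[(parent, child)] += 1
    return counts


def reaches_root(parent_of):
    """True if following parent pointers from every vertex ends at ROOT."""
    for start in parent_of:
        seen, node = set(), start
        while node != ROOT:
            if node in seen:
                return False  # walked into a cycle
            seen.add(node)
            node = parent_of[node]
    return True


def consensus_tree(trees):
    """Brute-force minimum weight spanning arborescence rooted at ROOT.

    Assumed edge weight: number of input trees lacking that edge.
    """
    counts = parent_counts(trees)
    vertices = sorted({v for tree in trees for v in tree})
    n_trees = len(trees)
    # Every vertex may pick any other vertex, or ROOT, as its parent.
    options = [[p for p in [ROOT] + vertices if p != v] for v in vertices]
    best, best_cost = None, None
    for choice in product(*options):
        parent_of = dict(zip(vertices, choice))
        if not reaches_root(parent_of):
            continue
        cost = sum(n_trees - counts[(p, c)] for c, p in parent_of.items())
        if best_cost is None or cost < best_cost:
            best, best_cost = parent_of, cost
    return best, best_cost


if __name__ == "__main__":
    trees = [
        {"A": ROOT, "B": "A", "C": "A"},
        {"A": ROOT, "B": "A", "C": "B"},
        {"A": ROOT, "B": ROOT, "C": "A"},
    ]
    print(consensus_tree(trees))
```

Because the vertex set is the union of labels over all input trees, the sketch tolerates input trees that carry different mutation sets, which is the setting the abstract emphasizes.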
14
|
Reconstruction of evolving gene variants and fitness from short sequencing reads. Nat Chem Biol 2021; 17:1188-1198. [PMID: 34635842 PMCID: PMC8551035 DOI: 10.1038/s41589-021-00876-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/25/2020] [Accepted: 08/09/2021] [Indexed: 12/23/2022]
Abstract
Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods, as short read lengths can lose mutation linkages in haplotypes. Here we present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R² = 0.94) and fitness using short-read data from directed evolution experiments, with substantial improvements over related methods. We validate Evoracle on phage-assisted continuous evolution (PACE) and phage-assisted non-continuous evolution (PANCE) of adenine base editors and OrthoRep evolution of drug-resistant enzymes. Evoracle retains strong performance (R² = 0.86) on data with complete linkage loss between neighboring nucleotides and large measurement noise, such as pooled Sanger sequencing data (~US$10 per timepoint), and broadens the accessibility of training machine learning models on gene variant fitnesses. Evoracle can also identify high-fitness variants, including low-frequency 'rising stars', well before they are identifiable from consensus mutations.
15
|
Ali S, Ciccolella S, Lucarella L, Vedova GD, Patterson M. Simpler and Faster Development of Tumor Phylogeny Pipelines. J Comput Biol 2021; 28:1142-1155. [PMID: 34698531 DOI: 10.1089/cmb.2021.0271] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
Abstract
In recent years, there has been an increasing number of single-cell sequencing studies, producing a considerable number of new data sets. This has particularly affected the field of cancer analysis, where more and more articles are published using this sequencing technique, which captures more detailed information about the specific genetic mutations in each individually sampled cell. As the amount of information increases, more sophisticated and rapid tools are needed for analyzing the samples. To this end, we developed plastic (PipeLine Amalgamating Single-cell Tree Inference Components), an easy-to-use and quick-to-adapt pipeline that integrates three different steps: (1) to simplify the input data, (2) to infer tumor phylogenies, and (3) to compare the phylogenies. We have created a pipeline submodule for each of those steps and developed new in-memory data structures that allow for easy and transparent sharing of the information across the tools implementing the above steps. While we use existing open source tools for those steps, we have extended the tool used for simplifying the input data, incorporating two machine learning procedures, which greatly reduce the running time without affecting the quality of the downstream analysis. Moreover, we have introduced the capability of producing some plots to quickly visualize results.
Affiliation(s)
- Sarwan Ali
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Simone Ciccolella
  - Department of Informatics, Systems, and Communications, University of Milano-Bicocca, Milano, Italy
- Lorenzo Lucarella
  - Department of Informatics, Systems, and Communications, University of Milano-Bicocca, Milano, Italy
- Gianluca Della Vedova
  - Department of Informatics, Systems, and Communications, University of Milano-Bicocca, Milano, Italy
- Murray Patterson
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
16
|
Andersson N, Chattopadhyay S, Valind A, Karlsson J, Gisselsson D. DEVOLUTION-A method for phylogenetic reconstruction of aneuploid cancers based on multiregional genotyping data. Commun Biol 2021; 4:1103. [PMID: 34545199 PMCID: PMC8452746 DOI: 10.1038/s42003-021-02637-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/04/2021] [Accepted: 09/03/2021] [Indexed: 02/05/2023]
Abstract
Phylogenetic reconstruction of cancer cell populations remains challenging. There is a particular lack of tools that deconvolve clones based on copy number aberration analyses of multiple tumor biopsies separated in time and space from the same patient. This has hampered investigations of tumors rich in aneuploidy but with few point mutations, as in many childhood cancers and high-risk adult cancers. Here, we present DEVOLUTION, an algorithm for subclonal deconvolution followed by phylogenetic reconstruction from bulk genotyping data. It integrates copy number and sequencing information across multiple tumor regions throughout the inference process, provided that the mutated clone fraction for each mutation is known. We validate DEVOLUTION on data from 56 pediatric tumors comprising 253 tumor biopsies and show robust performance on simulations of bulk genotyping data. We also benchmark DEVOLUTION against similar bioinformatic tools using an external dataset. DEVOLUTION holds the potential to facilitate insights into the development, progression, and response to treatment, particularly in tumors with a high burden of chromosomal copy number alterations.
Affiliation(s)
- Natalie Andersson
  - Division of Clinical Genetics, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Subhayan Chattopadhyay
  - Division of Clinical Genetics, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Anders Valind
  - Division of Clinical Genetics, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Department of Pediatrics, Skåne University Hospital, Lund, Sweden
- Jenny Karlsson
  - Division of Clinical Genetics, Department of Laboratory Medicine, Lund University, Lund, Sweden
- David Gisselsson
  - Division of Clinical Genetics, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Division of Oncology-Pathology, Department of Clinical Sciences, Lund University, Lund, Sweden
  - Clinical Genetics and Pathology, Laboratory Medicine, Lund University Hospital, Skåne Healthcare Region, Lund, Sweden
17
|
Montemurro M, Grassi E, Pizzino CG, Bertotti A, Ficarra E, Urgese G. PhyliCS: a Python library to explore scCNA data and quantify spatial tumor heterogeneity. BMC Bioinformatics 2021; 22:360. [PMID: 34217219 PMCID: PMC8254361 DOI: 10.1186/s12859-021-04277-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/22/2021] [Accepted: 06/21/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Tumors are composed of a number of cancer cell subpopulations (subclones), each characterized by a distinguishable set of mutations. This phenomenon, known as intra-tumor heterogeneity (ITH), may be studied using Copy Number Aberrations (CNAs). ITH can now be assessed at the highest possible resolution using single-cell DNA (scDNA) sequencing technology. Additionally, single-cell CNA (scCNA) profiles from multiple samples of the same tumor can in principle be exploited to study the spatial distribution of subclones within a tumor mass. However, since the technology required to generate large scDNA sequencing datasets is relatively recent, dedicated analytical approaches are still lacking. RESULTS We present PhyliCS, the first tool that exploits scCNA data from multiple samples of the same tumor to estimate whether the different clones of a tumor are well mixed or spatially separated. Starting from the CNA data produced with third party instruments, it computes a score, the Spatial Heterogeneity score, aimed at distinguishing spatially intermixed cell populations from spatially segregated ones. Additionally, it provides functionalities to facilitate scDNA analysis, such as feature selection and dimensionality reduction methods, visualization tools and a flexible clustering module. CONCLUSIONS PhyliCS represents a valuable instrument to explore the extent of spatial heterogeneity in multi-regional tumor sampling, exploiting the potential of scCNA data.
Affiliation(s)
- Marilisa Montemurro
  - Department of Control and Computer Science, Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129, Turin, Italy
- Elena Grassi
  - Department of Oncology, University of Torino, Strada Provinciale, 142 - KM 3.95, 10060, Candiolo, Turin, Italy
  - Candiolo Cancer Institute - FPO IRCCS, Strada Provinciale, 142 - KM 3.95, 10060, Candiolo, TO, Italy
- Carmelo Gabriele Pizzino
  - Department of Oncology, University of Torino, Strada Provinciale, 142 - KM 3.95, 10060, Candiolo, Turin, Italy
  - Candiolo Cancer Institute - FPO IRCCS, Strada Provinciale, 142 - KM 3.95, 10060, Candiolo, TO, Italy
- Andrea Bertotti
  - Department of Oncology, University of Torino, Strada Provinciale, 142 - KM 3.95, 10060, Candiolo, Turin, Italy
  - Candiolo Cancer Institute - FPO IRCCS, Strada Provinciale, 142 - KM 3.95, 10060, Candiolo, TO, Italy
- Elisa Ficarra
  - Enzo Ferrari Engineering Dept, University of Modena and Reggio Emilia, Via Vivarelli 10/1, 41125, Modena, Italy
- Gianvito Urgese
  - Interuniversity Department of Regional and Urban Studies and Planning, Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129, Turin, Italy
18
|
Ciccolella S, Patterson M, Bonizzoni P, Della Vedova G. Effective Clustering for Single Cell Sequencing Cancer Data. IEEE J Biomed Health Inform 2021; 25:4068-4078. [PMID: 34003758 DOI: 10.1109/jbhi.2021.3081380] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/08/2022]
Abstract
Single cell sequencing (SCS) technologies provide a level of resolution that makes them indispensable for inferring, from a sequenced tumor, evolutionary trees or phylogenies representing an accumulation of cancerous mutations. A drawback of SCS is its elevated false negative and missing value rates, which result in a large space of possible solutions and in turn make inference difficult, and sometimes infeasible, with current approaches and tools. One possible solution is to reduce the size of an SCS instance (usually represented as a matrix of presence, absence, and uncertainty of the mutations found in the different sequenced cells) and to infer the tree from this reduced-size instance. In this work, we present celluloid, a new procedure for clustering such categorical vector or matrix data, here representing SCS instances. We show that celluloid clusters mutations with high precision, never pairing too many mutations that are unrelated in the ground truth, and also obtains accurate results in terms of the phylogeny inferred downstream from the reduced instance produced by this method. We demonstrate the usefulness of a clustering step by applying the entire pipeline (clustering + inference method) to a real dataset, showing a significant reduction in the runtime, which considerably raises the upper bound on the size of SCS instances that can be solved in practice. Our approach, celluloid (clustering single cell sequencing data around centroids), is available at https://github.com/AlgoLab/celluloid/ under an MIT license, as well as on the Python Package Index (PyPI) at https://pypi.org/project/celluloid-clust/.
19
|
Ciccolella S, Ricketts C, Soto Gomez M, Patterson M, Silverbush D, Bonizzoni P, Hajirasouliha I, Della Vedova G. Inferring cancer progression from Single-Cell Sequencing while allowing mutation losses. Bioinformatics 2021; 37:326-333. [PMID: 32805010 PMCID: PMC8058767 DOI: 10.1093/bioinformatics/btaa722] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/11/2019] [Revised: 08/06/2020] [Accepted: 08/11/2020] [Indexed: 01/21/2023]
Abstract
Motivation In recent years, the well-known Infinite Sites Assumption has been a fundamental feature of computational methods devised for reconstructing tumor phylogenies and inferring cancer progressions. However, recent studies leveraging single-cell sequencing (SCS) techniques have shown evidence of the widespread recurrence and, especially, loss of mutations in several tumor samples. While there exist established computational methods that infer phylogenies with mutation losses, there remain some advancements to be made. Results We present Simulated Annealing Single-Cell inference (SASC): a new and robust approach based on simulated annealing for the inference of cancer progression from SCS datasets. In particular, we introduce an extension of the model of evolution where mutations are only accumulated, by allowing also a limited amount of mutation loss in the evolutionary history of the tumor: the Dollo-k model. We demonstrate that SASC achieves high levels of accuracy when tested on both simulated and real datasets and in comparison with some other available methods. Availability and implementation The SASC tool is open source and available at https://github.com/sciccolella/sasc. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Simone Ciccolella
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Camir Ricketts
  - Department of Physiology and Biophysics, Tri-I Computational Biology & Medicine Graduate Program, Weill Cornell Medicine of Cornell University, New York, NY 10021, USA
  - Institute for Computational Biomedicine, Englander Institute for Precision Medicine, The Meyer Cancer Center, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York City, NY 10021, USA
- Mauricio Soto Gomez
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Murray Patterson
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
  - Department of Computer Science, College of Arts and Sciences, Georgia State University, Atlanta, GA 30303, USA
- Dana Silverbush
  - Department of Pathology and Center for Cancer Research, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Paola Bonizzoni
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Englander Institute for Precision Medicine, The Meyer Cancer Center, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York City, NY 10021, USA
- Gianluca Della Vedova
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
20
|
Zapatka M, Tausch E, Öztürk S, Yosifov DY, Seiffert M, Zenz T, Schneider C, Blöhdorn J, Döhner H, Mertens D, Lichter P, Stilgenbauer S. Clonal evolution in chronic lymphocytic leukemia is scant in relapsed but accelerated in refractory cases after chemo(immune)therapy. Haematologica 2021; 107:604-614. [PMID: 33691380 PMCID: PMC8883533 DOI: 10.3324/haematol.2020.265777] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/07/2020] [Indexed: 11/20/2022]
Abstract
Clonal evolution is involved in the progression of chronic lymphocytic leukemia (CLL). In order to link evolutionary patterns to different disease courses, we performed a long-term longitudinal mutation profiling study of CLL patients. Tracking somatic mutations and their changes in allele frequency over time and assessing the underlying cancer cell fraction revealed highly distinct evolutionary patterns. Surprisingly, in long-term stable disease and in relapse after long-lasting clinical response to treatment, clonal shifts are minor. In contrast, in refractory disease major clonal shifts occur although there is little impact on leukemia cell counts. As this striking pattern in refractory cases is not linked to a strong contribution of known CLL driver genes, the evolution is mostly driven by treatment-induced selection of sub-clones, underlining the need for novel, non-genotoxic treatment regimens.
Affiliation(s)
- Marc Zapatka
  - Division of Molecular Genetics, German Cancer Research Center, Heidelberg, 69120, Germany
- Eugen Tausch
  - Department of Internal Medicine III, Ulm University Hospital Ulm, 89081, Germany
- Selcen Öztürk
  - Division of Molecular Genetics, German Cancer Research Center, Heidelberg, 69120, Germany
- Deyan Yordanov Yosifov
  - Department of Internal Medicine III, Ulm University Hospital Ulm, 89081, Germany
  - Mechanisms of Leukemogenesis, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- Martina Seiffert
  - Division of Molecular Genetics, German Cancer Research Center, Heidelberg, 69120, Germany
- Thorsten Zenz
  - University Hospital and University of Zürich, 8091, Switzerland
- Christof Schneider
  - Department of Internal Medicine III, Ulm University Hospital Ulm, 89081, Germany
- Johannes Blöhdorn
  - Department of Internal Medicine III, Ulm University Hospital Ulm, 89081, Germany
- Hartmut Döhner
  - Department of Internal Medicine III, Ulm University Hospital Ulm, 89081, Germany
- Daniel Mertens
  - Department of Internal Medicine III, Ulm University Hospital Ulm, 89081, Germany
  - Mechanisms of Leukemogenesis, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- Peter Lichter
  - Division of Molecular Genetics, German Cancer Research Center, Heidelberg, 69120, Germany
- Stephan Stilgenbauer
  - Department of Internal Medicine III, Ulm University Hospital Ulm, 89081, Germany
21
|
Sadeqi Azer E, Rashidi Mehrabadi F, Malikić S, Li XC, Bartok O, Litchfield K, Levy R, Samuels Y, Schäffer AA, Gertz EM, Day CP, Pérez-Guijarro E, Marie K, Lee MP, Merlino G, Ergun F, Sahinalp SC. PhISCS-BnB: a fast branch and bound algorithm for the perfect tumor phylogeny reconstruction problem. Bioinformatics 2021; 36:i169-i176. [PMID: 32657358 DOI: 10.1093/bioinformatics/btaa464] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/27/2022]
Abstract
MOTIVATION Recent advances in single-cell sequencing (SCS) offer an unprecedented insight into tumor emergence and evolution. Principled approaches to tumor phylogeny reconstruction via SCS data are typically based on general computational methods for solving an integer linear program, or a constraint satisfaction program, which, although guaranteeing convergence to the most likely solution, are very slow. Others based on Monte Carlo Markov Chain or alternative heuristics not only offer no such guarantee, but also are not faster in practice. As a result, novel methods that can scale up to handle the size and noise characteristics of emerging SCS data are highly desirable to fully utilize this technology. RESULTS We introduce PhISCS-BnB (phylogeny inference using SCS via branch and bound), a branch and bound algorithm to compute the most likely perfect phylogeny on an input genotype matrix extracted from an SCS dataset. PhISCS-BnB not only offers an optimality guarantee, but is also 10-100 times faster than the best available methods on simulated tumor SCS data. We also applied PhISCS-BnB on a recently published large melanoma dataset derived from the sublineages of a cell line involving 20 clones with 2367 mutations, which returned the optimal tumor phylogeny in <4 h. The resulting phylogeny agrees with and extends the published results by providing a more detailed picture on the clonal evolution of the tumor. AVAILABILITY AND IMPLEMENTATION https://github.com/algo-cancer/PhISCS-BnB. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
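The notion of a perfect phylogeny on a binary genotype matrix, which this abstract relies on, can be made concrete with the classical three-gamete test: a cells-by-mutations 0/1 matrix admits a perfect phylogeny rooted at the mutation-free genotype exactly when no pair of columns shows all three row patterns (1,1), (1,0) and (0,1). The sketch below only checks this condition. It is not the PhISCS-BnB branch and bound search, and the function names are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return the column pairs that violate the three-gamete rule.

    `matrix` is a list of rows (cells), each a list of 0/1 mutation calls.
    """
    n_cols = len(matrix[0]) if matrix else 0
    conflicts = []
    for i, j in combinations(range(n_cols), 2):
        # Collect the distinct (column i, column j) patterns seen in any cell.
        patterns = {(row[i], row[j]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= patterns:
            conflicts.append((i, j))
    return conflicts


def admits_perfect_phylogeny(matrix):
    """True if the matrix is conflict-free, i.e. no column pair conflicts."""
    return not conflicting_pairs(matrix)


if __name__ == "__main__":
    clean = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]  # nested mutations
    noisy = [[1, 1], [1, 0], [0, 1]]           # columns 0 and 1 conflict
    print(admits_perfect_phylogeny(clean), conflicting_pairs(noisy))
```

A search method for noisy SCS data would look for the most likely set of entry corrections that leaves `conflicting_pairs` empty; the check itself is the feasibility test such a search repeatedly needs.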
Affiliation(s)
- Erfan Sadeqi Azer
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Farid Rashidi Mehrabadi
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Salem Malikić
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Xuan Cindy Li
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
  - Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, USA
- Osnat Bartok
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Kevin Litchfield
  - Cancer Evolution and Genome Instability Laboratory, Francis Crick Institute, London NW1 1AT, UK
  - Cancer Research UK Lung Cancer Centre of Excellence London, University College London Cancer Institute, London WC1E 6DD, UK
- Ronen Levy
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Yardena Samuels
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Alejandro A Schäffer
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- E Michael Gertz
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Chi-Ping Day
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Eva Pérez-Guijarro
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Kerrie Marie
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Maxwell P Lee
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Glenn Merlino
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Funda Ergun
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- S Cenk Sahinalp
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
22
|
Tarabichi M, Salcedo A, Deshwar AG, Leathlobhair MN, Wintersinger J, Wedge DC, Loo PV, Morris QD, Boutros PC. A practical guide to cancer subclonal reconstruction from DNA sequencing. Nat Methods 2021; 18:144-155. [PMID: 33398189 PMCID: PMC7867630 DOI: 10.1038/s41592-020-01013-2] [Citation(s) in RCA: 97] [Impact Index Per Article: 24.3] [Received: 09/28/2019] [Accepted: 11/09/2020] [Indexed: 01/28/2023]
Abstract
Subclonal reconstruction from bulk tumor DNA sequencing has become a pillar of cancer evolution studies, providing insight into the clonality and relative ordering of mutations and mutational processes. We provide an outline of the complex computational approaches used for subclonal reconstruction from single and multiple tumor samples. We identify the underlying assumptions and uncertainties in each step and suggest best practices for analysis and quality assessment. This guide provides a pragmatic resource for the growing user community of subclonal reconstruction methods.
Affiliation(s)
- Maxime Tarabichi
  - The Francis Crick Institute, London, United Kingdom
  - Wellcome Sanger Institute, Hinxton, United Kingdom
- Adriana Salcedo
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Department of Human Genetics, University of California, Los Angeles
  - Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles
  - Institute for Precision Health, University of California, Los Angeles
  - Ontario Institute for Cancer Research, Toronto, Canada
- Amit G. Deshwar
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto, Toronto, Canada
- Máire Ni Leathlobhair
  - Big Data Institute, University of Oxford, Oxford, United Kingdom
  - Ludwig Institute for Cancer Research, University of Oxford, Oxford, United Kingdom
- Jeff Wintersinger
  - Department of Computer Science, University of Toronto, Toronto, Canada
- David C. Wedge
  - Big Data Institute, University of Oxford, Oxford, United Kingdom
  - Oxford NIHR Biomedical Research Centre, Oxford, United Kingdom
  - Manchester Cancer Research Centre, University of Manchester, Manchester, United Kingdom
- Quaid D. Morris
  - Ontario Institute for Cancer Research, Toronto, Canada
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Vector Institute, Toronto, Canada
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York
  - Donnelly Centre, University of Toronto, Toronto, Canada
- Paul C. Boutros
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Department of Human Genetics, University of California, Los Angeles
  - Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles
  - Institute for Precision Health, University of California, Los Angeles
  - Vector Institute, Toronto, Canada
  - Department of Pharmacology and Toxicology, University of Toronto, Toronto, Canada
  - Department of Urology, David Geffen School of Medicine, University of California, Los Angeles
23
|
Sundermann LK, Wintersinger J, Rätsch G, Stoye J, Morris Q. Reconstructing tumor evolutionary histories and clone trees in polynomial-time with SubMARine. PLoS Comput Biol 2021; 17:e1008400. [PMID: 33465079 PMCID: PMC7845980 DOI: 10.1371/journal.pcbi.1008400] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/27/2020] [Revised: 01/29/2021] [Accepted: 09/22/2020] [Indexed: 11/18/2022]
Abstract
Tumors contain multiple subpopulations of genetically distinct cancer cells. Reconstructing their evolutionary history can improve our understanding of how cancers develop and respond to treatment. Subclonal reconstruction methods cluster mutations into groups that co-occur within the same subpopulations, estimate the frequency of cells belonging to each subpopulation, and infer the ancestral relationships among the subpopulations by constructing a clone tree. However, often multiple clone trees are consistent with the data and current methods do not efficiently capture this uncertainty; nor can these methods scale to clone trees with a large number of subclonal populations. Here, we formalize the notion of a partially-defined clone tree (partial clone tree for short) that defines a subset of the pairwise ancestral relationships in a clone tree, thereby implicitly representing the set of all clone trees that have these defined pairwise relationships. Also, we introduce a special partial clone tree, the Maximally-Constrained Ancestral Reconstruction (MAR), which summarizes all clone trees fitting the input data equally well. Finally, we extend commonly used clone tree validity conditions to apply to partial clone trees and describe SubMARine, a polynomial-time algorithm producing the subMAR, which approximates the MAR and guarantees that its defined relationships are a subset of those present in the MAR. We also extend SubMARine to work with subclonal copy number aberrations and define equivalence constraints for this purpose. Further, we extend SubMARine to permit noise in the estimates of the subclonal frequencies while retaining its validity conditions and guarantees. In contrast to other clone tree reconstruction methods, SubMARine runs in time and space that scale polynomially in the number of subclones. 
We show through extensive noise-free simulation, a large lung cancer dataset and a prostate cancer dataset that the subMAR equals the MAR in all cases where only a single clone tree exists and that it is a perfect match to the MAR in most of the other cases. Notably, SubMARine runs in less than 70 seconds on a single thread with less than one Gb of memory on all datasets presented in this paper, including ones with 50 nodes in a clone tree. On the real-world data, SubMARine almost perfectly recovers the previously reported trees and identifies minor errors made in the expert-driven reconstructions of those trees. The freely-available open-source code implementing SubMARine can be downloaded at https://github.com/morrislab/submarine.
Cancer cells accumulate mutations over time and consist of genetically distinct subpopulations. Their evolutionary history (as represented by tumor phylogenies) can be inferred from bulk cancer genome sequencing data. Current tumor phylogeny reconstruction methods have two main issues: they are slow, and they do not efficiently represent uncertainty in the reconstruction. To address these issues, we developed SubMARine, a fast algorithm that summarizes all valid phylogenies in an intuitive format. SubMARine solved all reconstruction problems in this manuscript in less than 70 seconds, orders of magnitude faster than other methods. These reconstruction problems included those with up to 50 subclones; problems that are too large for other algorithms to even attempt. SubMARine achieves these results because, unlike other algorithms, it performs its reconstruction by identifying an upper-bound on the solution set of trees and the amount of noise in the estimates of the subclonal frequencies. In the vast majority of cases we checked, i.e., an extensive noise-free simulation, a lung cancer dataset, and a prostate cancer dataset, this upper bound is tight: when only a single solution exists, SubMARine converges to it every time.
When multiple solutions exist, our algorithm correctly recovers the uncertain relationships in 71% of cases. In addition to solving these two major challenges, we introduce some useful new concepts for and open research problems in the field of tumor phylogeny reconstruction. Specifically, we formalize the concept of a partial clone tree which provides a set of constraints on the solution set of clone trees; and provide a complete set of conditions under which a partial clone tree is valid. These conditions guarantee that all trees in the solution set satisfy the constraints implied by the partial clone tree.
Affiliation(s)
- Linda K. Sundermann
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Jeff Wintersinger
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Gunnar Rätsch
  - Department of Computer Science, ETH Zurich, Zurich, Zurich, Switzerland
  - Biomedical Informatics, University Hospital Zurich, Zurich, Zurich, Switzerland
- Jens Stoye
  - Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, North Rhine-Westphalia, Germany
- Quaid Morris
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York City, New York, United States of America
|
24
|
Ciccolella S, Soto Gomez M, Patterson MD, Della Vedova G, Hajirasouliha I, Bonizzoni P. gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data. BMC Bioinformatics 2020; 21:413. [PMID: 33297943 PMCID: PMC7725124 DOI: 10.1186/s12859-020-03736-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/17/2020] [Accepted: 09/03/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Cancer progression reconstruction is an important development stemming from the phylogenetics field. In this context, the reconstruction of the phylogeny representing the evolutionary history presents some peculiar aspects that depend on the technology used to obtain the data to analyze: Single Cell DNA Sequencing data have great specificity, but are affected by moderate false negative and missing value rates. Moreover, there has been some recent evidence of back mutations in cancer: this phenomenon is currently widely ignored. RESULTS We present a new tool, gpps, that reconstructs a tumor phylogeny from Single Cell Sequencing data, allowing each mutation to be lost at most a fixed number of times. The General Parsimony Phylogeny from Single cell (gpps) tool is open source and available at https://github.com/AlgoLab/gpps. CONCLUSIONS gpps provides new insights to the analysis of intra-tumor heterogeneity by proposing a new progression model to the field of cancer phylogeny reconstruction on Single Cell data.
Affiliation(s)
- Simone Ciccolella
  - Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy
- Mauricio Soto Gomez
  - Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy
- Murray D Patterson
  - Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy
  - Georgia State University, Atlanta, GA, USA
- Gianluca Della Vedova
  - Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Weill Cornell Medicine, New York City, NY, USA
  - Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York City, 10021, NY, USA
- Paola Bonizzoni
  - Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy
|
25
|
Sadeqi Azer E, Haghir Ebrahimabadi M, Malikić S, Khardon R, Sahinalp SC. Tumor Phylogeny Topology Inference via Deep Learning. iScience 2020; 23:101655. [PMID: 33117968 PMCID: PMC7582044 DOI: 10.1016/j.isci.2020.101655] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/20/2020] [Revised: 08/10/2020] [Accepted: 10/02/2020] [Indexed: 01/24/2023] Open
Abstract
Principled computational approaches for tumor phylogeny reconstruction via single-cell sequencing typically aim to build the most likely perfect phylogeny tree from the noisy genotype matrix - which represents genotype calls of single cells. This problem is NP-hard, and as a result, existing approaches aim to solve relatively small instances of it through combinatorial optimization techniques or Bayesian inference. As expected, even when the goal is to infer basic topological features of the tumor phylogeny, rather than reconstructing the topology entirely, these approaches could be prohibitively slow. In this paper, we introduce fast deep learning solutions to the problems of inferring whether the most likely tree has a linear (chain) or branching topology and whether a perfect phylogeny is feasible from a given genotype matrix. We also present a reinforcement learning approach for reconstructing the most likely tumor phylogeny. This preliminary work demonstrates that data-driven approaches can reconstruct key features of tumor evolution.
Affiliation(s)
- Erfan Sadeqi Azer
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Mohammad Haghir Ebrahimabadi
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Salem Malikić
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Roni Khardon
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- S. Cenk Sahinalp
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
|
26
|
Clonal evolution of acute myeloid leukemia revealed by high-throughput single-cell genomics. Nat Commun 2020; 11:5327. [PMID: 33087716 PMCID: PMC7577981 DOI: 10.1038/s41467-020-19119-8] [Citation(s) in RCA: 215] [Impact Index Per Article: 43.0] [Received: 08/04/2020] [Accepted: 09/22/2020] [Indexed: 02/04/2023] Open
Abstract
Clonal diversity is a consequence of cancer cell evolution driven by Darwinian selection. Precise characterization of clonal architecture is essential to understand the evolutionary history of tumor development and its association with treatment resistance. Here, using single-cell DNA sequencing, we report the clonal architecture and mutational histories of 123 acute myeloid leukemia (AML) patients. The single-cell data reveal cell-level mutation co-occurrence and enable reconstruction of mutational histories characterized by linear and branching patterns of clonal evolution, with the latter including convergent evolution. Through xenotransplantation, we show leukemia initiating capabilities of individual subclones evolving in parallel. Also, by simultaneous single-cell DNA and cell surface protein analysis, we illustrate both genetic and phenotypic evolution in AML. Lastly, single-cell analysis of longitudinal samples reveals the underlying evolutionary process of therapeutic resistance. Together, these data unravel clonal diversity and evolution patterns of AML, and highlight their clinical relevance in the era of precision medicine.
Understanding the evolutionary trajectory of cancer samples may enable understanding resistance to treatment. Here, the authors use single-cell sequencing of a cohort of acute myeloid leukemia tumours and identify features of linear and branching evolution in tumours.
|
27
|
Xiao Y, Wang X, Zhang H, Ulintz PJ, Li H, Guan Y. FastClone is a probabilistic tool for deconvoluting tumor heterogeneity in bulk-sequencing samples. Nat Commun 2020; 11:4469. [PMID: 32901013 PMCID: PMC7478963 DOI: 10.1038/s41467-020-18169-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/24/2020] [Accepted: 08/06/2020] [Indexed: 02/06/2023] Open
Abstract
Dissecting tumor heterogeneity is a key to understanding the complex mechanisms underlying drug resistance in cancers. The rich literature of pioneering studies on tumor heterogeneity analysis spurred a recent community-wide benchmark study that compares diverse modeling algorithms. Here we present FastClone, a top-performing algorithm in accuracy in this benchmark. FastClone improves over existing methods by allowing the deconvolution of subclones that have independent copy number variation events within the same chromosome regions. We characterize the behavior of FastClone in identifying subclones using stage III colon cancer primary tumor samples as well as simulated data. It achieves approximately 100-fold acceleration in computation for both simulated and patient data. The efficacy of FastClone will allow its application to large-scale data and clinical data, and facilitate personalized medicine in cancers.
Affiliation(s)
- Yao Xiao
  - Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Xueqing Wang
  - Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Hongjiu Zhang
  - Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
  - Microsoft Inc., Redmond, WA, USA
- Peter J Ulintz
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Hongyang Li
  - Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Yuanfang Guan
  - Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
|
28
|
Lee D, Park Y, Kim S. Towards multi-omics characterization of tumor heterogeneity: a comprehensive review of statistical and machine learning approaches. Brief Bioinform 2020; 22:5896573. [PMID: 34020548 DOI: 10.1093/bib/bbaa188] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/02/2020] [Revised: 06/29/2020] [Accepted: 07/21/2020] [Indexed: 12/19/2022] Open
Abstract
The multi-omics molecular characterization of cancer opened a new horizon for our understanding of cancer biology and therapeutic strategies. However, a tumor biopsy comprises diverse types of cells, including not only cancerous cells but also tumor microenvironmental cells and adjacent normal cells. This heterogeneity is a major confounding factor that hampers robust and reproducible bioinformatic analysis for biomarker identification using multi-omics profiles. Besides, the heterogeneity itself has been recognized over the years for its significant prognostic value in some cancer types, thus offering another promising avenue for therapeutic intervention. A number of computational approaches to unravel such heterogeneity from high-throughput molecular profiles of a tumor sample have been proposed, but most of them rely on the data from an individual omics layer. Since the heterogeneity of cells is widely distributed across multi-omics layers, methods based on an individual layer can only partially characterize the heterogeneous admixture of cells. To help facilitate further development of the methodologies that synchronously account for several multi-omics profiles, we wrote a comprehensive review of diverse approaches to characterize tumor heterogeneity based on three different omics layers: genome, epigenome and transcriptome. As a result, this review can be useful for the analysis of multi-omics profiles produced by many large-scale consortia. Contact: sunkim.bioinfo@snu.ac.kr.
Affiliation(s)
- Dohoon Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Youngjune Park
  - Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
- Sun Kim
  - Bioinformatics Institute, Seoul National University, Seoul 08826, Korea
|
29
|
Miura S, Vu T, Deng J, Buturla T, Oladeinde O, Choi J, Kumar S. Power and pitfalls of computational methods for inferring clone phylogenies and mutation orders from bulk sequencing data. Sci Rep 2020; 10:3498. [PMID: 32103044 PMCID: PMC7044161 DOI: 10.1038/s41598-020-59006-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/03/2020] [Accepted: 01/23/2020] [Indexed: 12/13/2022] Open
Abstract
Tumors harbor extensive genetic heterogeneity in the form of distinct clone genotypes that arise over time and across different tissues and regions in cancer. Many computational methods produce clone phylogenies from population bulk sequencing data collected from multiple tumor samples from a patient. These clone phylogenies are used to infer mutation order and clone origins during tumor progression, rendering the selection of the appropriate clonal deconvolution method critical. Surprisingly, the absolute and relative accuracies of these methods in correctly inferring clone phylogenies are yet to be consistently assessed. Therefore, we evaluated the performance of seven computational methods. The accuracy of the reconstructed mutation order and inferred clone groupings varied extensively among methods. All the tested methods showed limited ability to identify ancestral clone sequences present in tumor samples correctly. The presence of copy number alterations, the occurrence of multiple seeding events among tumor sites during metastatic tumor evolution, and extensive intermixture of cancer cells among tumors hindered the detection of clones and the inference of clone phylogenies for all methods tested. Overall, CloneFinder, MACHINA, and LICHeE showed the highest overall accuracy, but none of the methods performed well for all simulated datasets. So, we present guidelines for selecting methods for data analysis.
Affiliation(s)
- Sayaka Miura
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Tracy Vu
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Jiamin Deng
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Tiffany Buturla
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Olumide Oladeinde
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Jiyeong Choi
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
|
30
|
Husić E, Li X, Hujdurović A, Mehine M, Rizzi R, Mäkinen V, Milanič M, Tomescu AI. MIPUP: minimum perfect unmixed phylogenies for multi-sampled tumors via branchings and ILP. Bioinformatics 2019; 35:769-777. [PMID: 30101335 PMCID: PMC6394401 DOI: 10.1093/bioinformatics/bty683] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/13/2018] [Revised: 07/08/2018] [Accepted: 08/07/2018] [Indexed: 12/31/2022] Open
Abstract
Motivation Discovering the evolution of a tumor may help identify driver mutations and provide a more comprehensive view on the history of the tumor. Recent studies have tackled this problem using multiple samples sequenced from a tumor, and due to clinical implications, this has attracted great interest. However, such samples usually mix several distinct tumor subclones, which confounds the discovery of the tumor phylogeny. Results We study a natural problem formulation that requires decomposing the tumor samples into several subclones with the objective of forming a minimum perfect phylogeny. We propose an Integer Linear Programming formulation for it, and implement it in a method called MIPUP. We tested the ability of MIPUP and of four popular tools (LICHeE, AncesTree, CITUP, and Treeomics) to reconstruct the tumor phylogeny. On simulated data, MIPUP shows up to a 34% improvement under the ancestor-descendant relations metric. On four real datasets, MIPUP’s reconstructions proved to be generally more faithful than those of LICHeE. Availability and implementation MIPUP is available at https://github.com/zhero9/MIPUP as open source. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Edin Husić
  - Department of Mathematics, London School of Economics and Political Science, London, UK
- Xinyue Li
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland
- Ademir Hujdurović
  - University of Primorska, UP IAM, Koper, Slovenia
  - University of Primorska, UP FAMNIT, Koper, Slovenia
- Miika Mehine
  - Genome-Scale Biology Research Program, Research Programs Unit, Department of Medical and Clinical Genetics, Faculty of Medicine, University of Helsinki, Medicum, Helsinki, Finland
- Romeo Rizzi
  - Department of Computer Science, University of Verona, Verona, Italy
- Veli Mäkinen
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland
- Martin Milanič
  - University of Primorska, UP IAM, Koper, Slovenia
  - University of Primorska, UP FAMNIT, Koper, Slovenia
- Alexandru I Tomescu
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland
|
31
|
Ismail WM, Nzabarushimana E, Tang H. Algorithmic approaches to clonal reconstruction in heterogeneous cell populations. QUANTITATIVE BIOLOGY 2019; 7:255-265. [PMID: 32431959 PMCID: PMC7236794 DOI: 10.1007/s40484-019-0188-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/02/2019] [Revised: 08/09/2019] [Accepted: 08/25/2019] [Indexed: 12/15/2022]
Abstract
BACKGROUND The reconstruction of clonal haplotypes and their evolutionary history in evolving populations is a common problem in both microbial evolutionary biology and cancer biology. The clonal theory of evolution provides a theoretical framework for modeling the evolution of clones. RESULTS In this paper, we review the theoretical framework and assumptions over which the clonal reconstruction problem is formulated. We formally define the problem and then discuss the complexity and solution space of the problem. Various methods have been proposed to find the phylogeny that best explains the observed data. We categorize these methods based on the type of input data that they use (space-resolved or time-resolved), and also based on their computational formulation as either combinatorial or probabilistic. It is crucial to understand the different types of input data because each provides essential but distinct information for drastically reducing the solution space of the clonal reconstruction problem. Complementary information provided by single cell sequencing or from whole genome sequencing of randomly isolated clones can also improve the accuracy of clonal reconstruction. We briefly review the existing algorithms and their relationships. Finally, we summarize the tools that have been developed for either directly solving the clonal reconstruction problem or a related computational problem. CONCLUSIONS In this review, we discuss the various formulations of the problem of inferring the clonal evolutionary history from allele frequency data, review existing algorithms and categorize them according to their problem formulation and solution approaches. We note that most of the available clonal inference algorithms were developed for elucidating tumor evolution, whereas clonal reconstruction for unicellular genomes is less addressed. We conclude the review by discussing more open problems such as the lack of benchmark datasets and comparison of performance between available tools.
Affiliation(s)
- Wazim Mohammed Ismail
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47405-7000, USA
- Etienne Nzabarushimana
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47405-7000, USA
- Haixu Tang
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47405-7000, USA
|
32
|
Malikic S, Mehrabadi FR, Ciccolella S, Rahman MK, Ricketts C, Haghshenas E, Seidman D, Hach F, Hajirasouliha I, Sahinalp SC. PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data. Genome Res 2019; 29:1860-1877. [PMID: 31628256 PMCID: PMC6836735 DOI: 10.1101/gr.234435.118] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 08/15/2018] [Accepted: 09/11/2019] [Indexed: 12/29/2022]
Abstract
Available computational methods for tumor phylogeny inference via single-cell sequencing (SCS) data typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, the limitations of SCS technologies including frequent allele dropout and variable sequence coverage may prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to the loss of heterozygosity, deletions, and convergent evolution. In order to address such limitations, we introduce the optimal subperfect phylogeny problem which asks to integrate SCS data with matching bulk sequencing data by minimizing a linear combination of potential false negatives (due to allele dropout or variance in sequence coverage), false positives (due to read errors) among mutation calls, and the number of mutations that violate ISA (real or because of incorrect copy number estimation). We then describe a combinatorial formulation to solve this problem which ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both in the form of an integer linear program (ILP) and—as a first in tumor phylogeny reconstruction—a Boolean constraint satisfaction problem (CSP) and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data while accounting for ISA violating mutations. In contrast to the alternative methods, typically based on probabilistic approaches, PhISCS provides a guarantee of optimality in reported solutions. Using simulated and real data sets, we demonstrate that PhISCS is more general and accurate than all available approaches.
Affiliation(s)
- Salem Malikic
  - School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Farid Rashidi Mehrabadi
  - Department of Computer Science, Indiana University, Bloomington, Indiana 47408, USA
  - Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Simone Ciccolella
  - Department of Computer Systems and Communication, University of Milano-Bicocca, 20136 Milan, Italy
  - Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA
- Md Khaledur Rahman
  - Department of Computer Science, Indiana University, Bloomington, Indiana 47408, USA
- Camir Ricketts
  - Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA
  - Tri-I Computational Biology and Medicine Graduate Program, Cornell University, New York, New York 10065, USA
- Ehsan Haghshenas
  - School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Daniel Seidman
  - Tri-I Computational Biology and Medicine Graduate Program, Cornell University, New York, New York 10065, USA
- Faraz Hach
  - School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
  - Department of Urologic Sciences, University of British Columbia, Vancouver, BC V5Z 1M9, Canada
  - Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA
  - Department of Physiology and Biophysics, Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York 10065, USA
- S Cenk Sahinalp
  - Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
|
33
|
Qi Y, Pradhan D, El-Kebir M. Implications of non-uniqueness in phylogenetic deconvolution of bulk DNA samples of tumors. Algorithms Mol Biol 2019; 14:19. [PMID: 31497065 PMCID: PMC6719395 DOI: 10.1186/s13015-019-0155-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/11/2019] [Accepted: 08/17/2019] [Indexed: 12/11/2022] Open
Abstract
Background Tumors exhibit extensive intra-tumor heterogeneity, the presence of groups of cellular populations with distinct sets of somatic mutations. This heterogeneity is the result of an evolutionary process, described by a phylogenetic tree. In addition to enabling clinicians to devise patient-specific treatment plans, phylogenetic trees of tumors enable researchers to decipher the mechanisms of tumorigenesis and metastasis. However, the problem of reconstructing a phylogenetic tree T given bulk sequencing data from a tumor is more complicated than the classic phylogeny inference problem. Rather than observing the leaves of T directly, we are given mutation frequencies that are the result of mixtures of the leaves of T. The majority of current tumor phylogeny inference methods employ the perfect phylogeny evolutionary model. The underlying Perfect Phylogeny Mixture (PPM) combinatorial problem typically has multiple solutions. Results We prove that determining the exact number of solutions to the PPM problem is #P-complete and hard to approximate within a constant factor. Moreover, we show that sampling solutions uniformly at random is hard as well. On the positive side, we provide a polynomial-time computable upper bound on the number of solutions and introduce a simple rejection-sampling based scheme that works well for small instances. Using simulated and real data, we identify factors that contribute to and counteract non-uniqueness of solutions. In addition, we study the sampling performance of current methods, identifying significant biases. Conclusions Awareness of non-uniqueness of solutions to the PPM problem is key to drawing accurate conclusions in downstream analyses based on tumor phylogenies. This work provides the theoretical foundations for non-uniqueness of solutions in tumor phylogeny inference from bulk DNA samples.
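To make the mixture setting in the abstract above concrete, the following is a minimal sketch (not taken from the paper) of a consistency check commonly used under the perfect phylogeny model, often called the sum condition: in every sample, the frequency of a mutation must be at least the summed frequencies of its children in the tree. The function name `satisfies_sum_condition`, the child-list tree encoding, and the toy frequency matrices are illustrative assumptions.

```python
from typing import Dict, List


def satisfies_sum_condition(
    children: Dict[int, List[int]],
    freqs: List[List[float]],
    tol: float = 1e-9,
) -> bool:
    """Check a candidate tree against per-sample mutation frequencies.

    children maps each mutation (vertex) to the list of its children.
    freqs[p][i] is the frequency of mutation i in sample p.
    Returns True if, in every sample, each mutation's frequency is at
    least the sum of the frequencies of its children (up to tol).
    """
    for sample in freqs:
        for parent, kids in children.items():
            if sample[parent] + tol < sum(sample[k] for k in kids):
                return False
    return True


# Two hypothetical trees over mutations 0, 1, 2 (0 is the root).
branching = {0: [1, 2], 1: [], 2: []}  # 0 -> 1 and 0 -> 2
chain = {0: [1], 1: [2], 2: []}        # 0 -> 1 -> 2

# Toy frequency matrix: two samples, three mutations (made-up values).
F = [
    [1.0, 0.6, 0.3],
    [1.0, 0.2, 0.1],
]

# Both trees pass the check on F, so the data alone cannot separate them.
print(satisfies_sum_condition(branching, F))
print(satisfies_sum_condition(chain, F))

# A single sample where mutations 1 and 2 together exceed the root
# rules out the branching tree but not the chain.
F_tight = [[1.0, 0.6, 0.5]]
print(satisfies_sum_condition(branching, F_tight))
print(satisfies_sum_condition(chain, F_tight))
```

In this toy input, the first frequency matrix is consistent with both the branching tree and the chain, a small instance of the non-uniqueness the abstract describes; the second matrix shows how a single additional sample can shrink the solution set.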
|
34
|
Karpov N, Malikic S, Rahman MK, Sahinalp SC. A multi-labeled tree dissimilarity measure for comparing "clonal trees" of tumor progression. Algorithms Mol Biol 2019; 14:17. [PMID: 31372179 PMCID: PMC6661107 DOI: 10.1186/s13015-019-0152-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 01/31/2019] [Accepted: 07/15/2019] [Indexed: 12/18/2022] Open
Abstract
We introduce a new dissimilarity measure between a pair of "clonal trees", each representing the progression and mutational heterogeneity of a tumor sample, constructed by the use of single cell or bulk high throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone, and is labeled with one or more mutations in a way that each mutation is assigned to the oldest clone that harbors it. Given two clonal trees, our multi-labeled tree dissimilarity (MLTD) measure is defined as the minimum number of mutation/label deletions, (empty) leaf deletions, and vertex (clonal) expansions, applied in any order, to convert each of the two trees to the maximum common tree. We show that the MLTD measure can be computed efficiently in polynomial time and it captures the similarity between trees of different clonal granularity well.
Affiliation(s)
- Nikolai Karpov
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
- Salem Malikic
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- S. Cenk Sahinalp
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
|
35
|
Aguse N, Qi Y, El-Kebir M. Summarizing the solution space in tumor phylogeny inference by multiple consensus trees. Bioinformatics 2019; 35:i408-i416. [PMID: 31510657 PMCID: PMC6612807 DOI: 10.1093/bioinformatics/btz312] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 01/17/2023] Open
Abstract
MOTIVATION Cancer phylogenies are key to studying tumorigenesis and have clinical implications. Due to the heterogeneous nature of cancer and limitations in current sequencing technology, current cancer phylogeny inference methods identify a large solution space of plausible phylogenies. To facilitate further downstream analyses, methods that accurately summarize such a set T of cancer phylogenies are imperative. However, current summary methods are limited to a single consensus tree or graph and may miss important topological features that are present in different subsets of candidate trees. RESULTS We introduce the Multiple Consensus Tree (MCT) problem to simultaneously cluster T and infer a consensus tree for each cluster. We show that MCT is NP-hard, and present an exact algorithm based on mixed integer linear programming (MILP). In addition, we introduce a heuristic algorithm that efficiently identifies high-quality consensus trees, recovering all optimal solutions identified by the MILP in simulated data at a fraction of the time. We demonstrate the applicability of our methods on both simulated and real data, showing that our approach selects the number of clusters depending on the complexity of the solution space T. AVAILABILITY AND IMPLEMENTATION https://github.com/elkebir-group/MCT. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nuraini Aguse
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Yuanyuan Qi
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Mohammed El-Kebir
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
|
36
|
Malikic S, Jahn K, Kuipers J, Sahinalp SC, Beerenwinkel N. Integrative inference of subclonal tumour evolution from single-cell and bulk sequencing data. Nat Commun 2019; 10:2750. [PMID: 31227714 PMCID: PMC6588593 DOI: 10.1038/s41467-019-10737-5] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Received: 07/04/2018] [Accepted: 05/30/2019] [Indexed: 02/07/2023] Open
Abstract
Understanding the clonal architecture and evolutionary history of a tumour poses one of the key challenges to overcome treatment failure due to resistant cell populations. Previously, studies on subclonal tumour evolution have been primarily based on bulk sequencing and in some recent cases on single-cell sequencing data. Either data type alone has shortcomings with regard to this task, but methods integrating both data types have been lacking. Here, we present B-SCITE, the first computational approach that infers tumour phylogenies from combined single-cell and bulk sequencing data. Using a comprehensive set of simulated data, we show that B-SCITE systematically outperforms existing methods with respect to tree reconstruction accuracy and subclone identification. B-SCITE provides high-fidelity reconstructions even with a modest number of single cells and in cases where bulk allele frequencies are affected by copy number changes. On real tumour data, B-SCITE generated mutation histories show high concordance with expert generated trees. Intra-tumour heterogeneity provides important information about subclonal tumour evolution. Here, the authors develop B-SCITE, a computational method for inferring tumour phylogenies from combined single-cell and bulk sequencing data.
Affiliation(s)
- Salem Malikic
  - School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, BC, Canada
  - Vancouver Prostate Centre, Vancouver, V6H 3Z6, BC, Canada
- Katharina Jahn
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- S Cenk Sahinalp
  - Department of Computer Science, Indiana University, Bloomington, 47405, IN, USA
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
|
37
|
Myers MA, Satas G, Raphael BJ. CALDER: Inferring Phylogenetic Trees from Longitudinal Tumor Samples. Cell Syst 2019; 8:514-522.e5. [PMID: 31229560 DOI: 10.1016/j.cels.2019.05.010] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 03/08/2019] [Accepted: 05/23/2019] [Indexed: 02/07/2023]
Abstract
Longitudinal DNA sequencing of cancer patients yields insight into how tumors evolve over time or in response to treatment. However, sequencing data from bulk tumor samples often have considerable ambiguity in clonal composition, complicating the inference of ancestral relationships between clones. We introduce Cancer Analysis of Longitudinal Data through Evolutionary Reconstruction (CALDER), an algorithm to infer phylogenetic trees from longitudinal bulk DNA sequencing data. CALDER explicitly models a longitudinally observed phylogeny incorporating constraints that longitudinal sampling imposes on phylogeny reconstruction. We show on simulated bulk tumor data that longitudinal constraints substantially reduce ambiguity in phylogeny reconstruction and that CALDER outperforms existing methods that do not leverage this longitudinal information. On real data from two chronic lymphocytic leukemia patients, we find that CALDER reconstructs more plausible and parsimonious phylogenies than existing methods, with CALDER phylogenies containing fewer tumor clones per sample. CALDER's use of longitudinal information will be advantageous in further studies of tumor heterogeneity and evolution.
Affiliation(s)
- Matthew A Myers
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
- Gryte Satas
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
  - Department of Computer Science, Brown University, Providence, RI 02912, USA
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
|
38
|
Toosi H, Moeini A, Hajirasouliha I. BAMSE: Bayesian model selection for tumor phylogeny inference among multiple samples. BMC Bioinformatics 2019; 20:282. [PMID: 31167637 PMCID: PMC6551234 DOI: 10.1186/s12859-019-2824-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Intra-tumor heterogeneity is known to contribute to cancer complexity and drug resistance. Understanding the number of distinct subclones and the evolutionary relationships between them is scientifically and clinically very important and remains a challenging problem. RESULTS In this paper, we present BAMSE (BAyesian Model Selection for tumor Evolution), a new probabilistic method for inferring subclonal history and reconstructing lineage trees of heterogeneous tumor samples. BAMSE uses somatic mutation read counts as input and can leverage multiple tumor samples accurately and efficiently. In the first step, possible clusterings of mutations into subclones are scored and a user-defined number of them are selected for further analysis. In the next step, for each of these candidates, a list of trees describing the evolutionary relationships between the subclones is generated. These trees are sorted by their posterior probability. The posterior probability is calculated using a Bayesian model that integrates prior belief about the number of subclones, the composition of the tumor and the process of subclonal evolution. BAMSE also takes sequencing error into account. We benchmarked BAMSE against state-of-the-art software using simulated datasets. CONCLUSIONS In this work we developed flexible and fast software to reconstruct the history of a tumor's subclonal evolution using somatic mutation read counts across multiple samples. BAMSE is implemented in Python and is available as open source under GNU GPLv3 at https://github.com/HoseinT/BAMSE.
Affiliation(s)
- Hosein Toosi
  - Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Ali Moeini
  - School of Engineering Sciences, College of Engineering, University of Tehran, Tehran, Iran
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
  - Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
  - The Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
|
39
|
Ramazzotti D, Graudenzi A, De Sano L, Antoniotti M, Caravagna G. Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data. BMC Bioinformatics 2019; 20:210. [PMID: 31023236 PMCID: PMC6485126 DOI: 10.1186/s12859-019-2795-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 05/09/2018] [Accepted: 04/08/2019] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND A large number of algorithms are being developed to reconstruct evolutionary models of individual tumours from genome sequencing data. Most methods can analyze multiple samples collected either through bulk multi-region sequencing experiments or through the sequencing of individual cancer cells. However, the same method can rarely support both data types. RESULTS We introduce TRaIT, a computational framework to infer mutational graphs that model the accumulation of multiple types of somatic alterations driving tumour evolution. Compared to other tools, TRaIT supports multi-region and single-cell sequencing data within the same statistical framework, and delivers expressive models that capture many complex evolutionary phenomena. TRaIT improves accuracy, robustness to data-specific errors and computational complexity compared to competing methods. CONCLUSIONS We show that the application of TRaIT to single-cell and multi-region cancer datasets can produce accurate and reliable models of single-tumour evolution, quantify the extent of intra-tumour heterogeneity and generate new testable experimental hypotheses.
Affiliation(s)
- Alex Graudenzi
  - Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Viale Sarca 336, Milan, 20126 Italy
  - Institute of Molecular Bioimaging and Physiology of the Italian National Research Council (IBFM-CNR), Viale F.lli Cervi 93, Segrate, Milan, 20090 Italy
- Luca De Sano
  - Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Viale Sarca 336, Milan, 20126 Italy
- Marco Antoniotti
  - Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Viale Sarca 336, Milan, 20126 Italy
  - Milan Center for Neuroscience, Università degli Studi di Milano-Bicocca, San Gerardo Hospital, Via Pergolesi 33, Monza, 20052 Italy
- Giulio Caravagna
  - Centre for Evolution and Cancer, The Institute of Cancer Research, 15 Cotswold Road, London, SM2 5NG UK
|
40
|
Wang Y, Zhang X, Ding S, Geng Y, Liu J, Zhao Z, Zhang R, Xiao X, Wang J. A graph-based algorithm for estimating clonal haplotypes of tumor sample from sequencing data. BMC Med Genomics 2019; 12:27. [PMID: 30704456 PMCID: PMC6357344 DOI: 10.1186/s12920-018-0457-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Haplotype phasing is an important step in many bioinformatics workflows. In cancer genomics, it has been suggested that reconstructing the clonal haplotypes of a tumor sample could facilitate a comprehensive understanding of its clonal architecture and provide a valuable reference for clinical diagnosis and treatment. However, the sequencing data are an admixture of reads sampled from different clonal haplotypes, which complicates the computational problem by exponentially enlarging the solution space and drives existing algorithms to unacceptable time and space complexity. In addition, the evolutionary process among clonal haplotypes further weakens those algorithms by introducing indistinguishable candidate solutions. RESULTS To improve the algorithmic performance of phasing clonal haplotypes, we propose MixSubHap, a graph-based computational pipeline for cancer sequencing data. To reduce the computational complexity, MixSubHap adopts three bounding strategies to limit the solution space and filter out false positive candidates. It first estimates the global clonal structure by clustering the variant allele frequencies of sampled point mutations. This provides a prior on the number of clonal haplotypes when copy-number variations are not considered. Then, it uses a greedy extension algorithm to approximately find the longest linkage of the locally assembled contigs. Finally, it incorporates a read-depth stripping algorithm to filter out false linkages according to the posterior estimate of tumor purity and the estimated percentage of each sub-clone in the sample. A series of experiments is conducted to verify the performance of the proposed pipeline. CONCLUSIONS The results demonstrate that MixSubHap is able to identify about 90% of the preset clonal haplotypes on average under different simulation configurations. In particular, MixSubHap is robust to decreasing mutation rates, in which case the longest assembled contig can reach 10 kbp, while the accuracy of assigning a mutation to its haplotype remains above 60% on average. MixSubHap is considered a practical algorithm for reconstructing clonal haplotypes from cancer sequencing data. The source code is maintained at https://github.com/YixuanWang1120/MixSubHap for academic use only.
Affiliation(s)
- Yixuan Wang
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Xuanping Zhang
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Shuai Ding
  - School of Management, Ministry of Education Key Laboratory of Process Optimization and Intelligent Decision-Making, Hefei University of Technology, Hefei, 23009 China
- Yu Geng
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Jianye Liu
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Zhongmeng Zhao
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Rong Zhang
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Xiao Xiao
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
  - Institute of Health Administration and Policy, School of Public Policy and Administration, Xi’an Jiaotong University, Xi’an, 710048 China
- Jiayin Wang
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
|
41
|
Nieboer MM, Dorssers LCJ, Straver R, Looijenga LHJ, de Ridder J. TargetClone: A multi-sample approach for reconstructing subclonal evolution of tumors. PLoS One 2018; 13:e0208002. [PMID: 30496231 PMCID: PMC6264523 DOI: 10.1371/journal.pone.0208002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/14/2018] [Accepted: 11/11/2018] [Indexed: 11/18/2022] Open
Abstract
Most tumors are composed of a heterogeneous population of subclones. A more detailed insight into the subclonal evolution of these tumors can be helpful to study progression and treatment response. Problematically, tumor samples are typically very heterogeneous, making deconvolving individual tumor subclones a major challenge. To overcome this limitation, reducing heterogeneity, such as by means of microdissections, coupled with targeted sequencing, is a viable approach. However, computational methods that enable reconstruction of the evolutionary relationships require unbiased read depth measurements, which are commonly challenging to obtain in this setting. We introduce TargetClone, a novel method to reconstruct the subclonal evolution tree of tumors from single-nucleotide polymorphism allele frequency and somatic single-nucleotide variant measurements. Furthermore, our method infers copy numbers, alleles and the fraction of the tumor component in each sample. TargetClone was specifically designed for targeted sequencing data obtained from microdissected samples. We demonstrate that our method obtains low error rates on simulated data. Additionally, we show that our method is able to reconstruct expected trees in a testicular germ cell cancer and ovarian cancer dataset. The TargetClone package including tree visualization is written in Python and is publicly available at https://github.com/UMCUGenetics/targetclone.
Affiliation(s)
- Marleen M. Nieboer
  - Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Lambert C. J. Dorssers
  - Department of Pathology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Roy Straver
  - Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Leendert H. J. Looijenga
  - Department of Pathology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
  - Princess Maxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jeroen de Ridder
  - Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
|
42
|
Dang HX, White BS, Foltz SM, Miller CA, Luo J, Fields RC, Maher CA. ClonEvol: clonal ordering and visualization in cancer sequencing. Ann Oncol 2018; 28:3076-3082. [PMID: 28950321 DOI: 10.1093/annonc/mdx517] [Citation(s) in RCA: 118] [Impact Index Per Article: 16.9] [Indexed: 12/11/2022] Open
Abstract
Background Reconstruction of clonal evolution is critical for understanding tumor progression and implementing personalized therapies. This is often done by clustering somatic variants based on their cellular prevalence estimated via bulk tumor sequencing of multiple samples. The clusters, consisting of the clonal marker variants, are then ordered based on their estimated cellular prevalence to reconstruct clonal evolution trees, a process referred to as 'clonal ordering'. However, cellular prevalence estimate is confounded by statistical variability and errors in sequencing/data analysis, and therefore inhibits accurate reconstruction of the clonal evolution. This problem is further complicated by intra- and inter-tumor heterogeneity. Furthermore, the field lacks a comprehensive visualization tool to facilitate the interpretation of complex clonal relationships. To address these challenges we developed ClonEvol, a unified software tool for clonal ordering, visualization, and interpretation. Materials and methods ClonEvol uses a bootstrap resampling technique to estimate the cellular fraction of the clones and probabilistically models the clonal ordering constraints to account for statistical variability. The bootstrapping allows identification of the sample founding- and sub-clones, thus enabling interpretation of clonal seeding. ClonEvol automates the generation of multiple widely used visualizations for reconstructing and interpreting clonal evolution. Results ClonEvol outperformed three of the state of the art tools (LICHeE, Canopy and PhyloWGS) for clonal evolution inference, showing more robust error tolerance and producing more accurate trees in a simulation. Building upon multiple recent publications that utilized ClonEvol to study metastasis and drug resistance in solid cancers, here we show that ClonEvol rediscovered relapsed subclones in two published acute myeloid leukemia patients. 
Furthermore, we demonstrated that through noninvasive monitoring ClonEvol recapitulated the emerging subclones throughout metastatic progression observed in the tumors of a published breast cancer patient. Conclusions ClonEvol has broad applicability for longitudinal monitoring of clonal populations in tumor biopsies, or noninvasively, to guide precision medicine. Availability ClonEvol is written in R and is available at https://github.com/ChrisMaherLab/ClonEvol.
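The bootstrap step described in the abstract above can be illustrated with a generic percentile bootstrap over the variant allele frequencies of one cluster. This is a sketch of the general technique only, not ClonEvol's actual procedure (ClonEvol itself is written in R); the function name, parameters, and VAF values are hypothetical.

```python
import random


def bootstrap_mean_interval(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    Resamples the cluster's variants with replacement, recomputes the
    mean each time, and returns the empirical (alpha/2, 1 - alpha/2)
    percentiles of the resampled means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up variant allele frequencies for the variants of one cluster.
cluster_vafs = [0.42, 0.47, 0.39, 0.45, 0.50, 0.41, 0.44]
lo, hi = bootstrap_mean_interval(cluster_vafs)
print(f"cluster mean VAF interval: [{lo:.3f}, {hi:.3f}]")
```

Intervals of this kind are what allow ordering constraints between clusters to be assessed probabilistically rather than from point estimates alone.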
Affiliation(s)
- H X Dang
  - McDonnell Genome Institute
  - Department of Internal Medicine
- B S White
  - McDonnell Genome Institute
  - Department of Internal Medicine
- J Luo
  - Department of Surgery
  - Siteman Cancer Center
- R C Fields
  - Department of Surgery
  - Siteman Cancer Center
- C A Maher
  - McDonnell Genome Institute
  - Department of Internal Medicine
  - Siteman Cancer Center
  - Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, USA
|
43
|
El-Kebir M, Satas G, Raphael BJ. Inferring parsimonious migration histories for metastatic cancers. Nat Genet 2018; 50:718-726. [PMID: 29700472 PMCID: PMC6103651 DOI: 10.1038/s41588-018-0106-z] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 06/20/2017] [Accepted: 03/09/2018] [Indexed: 01/29/2023]
Abstract
Metastasis is the migration of cancerous cells from a primary tumor to other anatomical sites. Although metastasis was long thought to result from monoclonal seeding, or single cellular migrations, recent phylogenetic analyses of metastatic cancers have reported complex patterns of cellular migrations between sites, including polyclonal migrations and reseeding. However, accurate determination of migration patterns from somatic mutation data is complicated by intratumor heterogeneity and discordance between clonal lineage and cellular migration. We introduce MACHINA, a multi-objective optimization algorithm that jointly infers clonal lineages and parsimonious migration histories of metastatic cancers from DNA sequencing data. MACHINA analysis of data from multiple cancers shows that migration patterns are often not uniquely determined from sequencing data alone and that complicated migration patterns among primary tumors and metastases may be less prevalent than previously reported. MACHINA's rigorous analysis of migration histories will aid in studies of the drivers of metastasis.
Affiliation(s)
- Mohammed El-Kebir
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Gryte Satas
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
  - Department of Computer Science, Brown University, Providence, RI, USA
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
|
44
|
Genetic alterations driving metastatic colony formation are acquired outside of the primary tumour in melanoma. Nat Commun 2018; 9:595. [PMID: 29426936 PMCID: PMC5807512 DOI: 10.1038/s41467-017-02674-y] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Received: 10/04/2017] [Accepted: 12/19/2017] [Indexed: 02/07/2023] Open
Abstract
Mouse models indicate that metastatic dissemination occurs extremely early; however, the timing in human cancers is unknown. We therefore determined the time point of metastatic seeding relative to tumour thickness and genomic alterations in melanoma. Here, we find that lymphatic dissemination occurs shortly after dermal invasion of the primary lesion at a median thickness of ~0.5 mm and that typical driver changes, including BRAF mutation and gained or lost regions comprising genes like MET or CDKN2A, are acquired within the lymph node at the time of colony formation. These changes define a colonisation signature that was linked to xenograft formation in immunodeficient mice and death from melanoma. Thus, melanoma cells leave primary tumours early and evolve at different sites in parallel. We propose a model of metastatic melanoma dormancy, evolution and colonisation that will inform direct monitoring of adjuvant therapy targets.
|
45
|
Wen Y, Wei Y, Zhang S, Li S, Liu H, Wang F, Zhao Y, Zhang D, Zhang Y. Cell subpopulation deconvolution reveals breast cancer heterogeneity based on DNA methylation signature. Brief Bioinform 2017; 18:426-440. [PMID: 27016391 DOI: 10.1093/bib/bbw028] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/08/2015] [Indexed: 12/21/2022] Open
Abstract
Tumour heterogeneity describes the coexistence of divergent tumour cell clones within tumours, which is often caused by underlying epigenetic changes. DNA methylation is commonly regarded as a significant regulator that differs across cells and tissues. In this study, we comprehensively reviewed research progress on estimating tumour heterogeneity. Bioinformatics-based analysis of DNA methylation has revealed the evolutionary relationships between breast cancer cell lines and tissues. Further analysis of the DNA methylation profiles in 33 breast cancer-related cell lines identified cell line-specific methylation patterns. Next, we reviewed computational methods for inferring the clonal evolution of tumours from different perspectives and then proposed a deconvolution strategy, based on DNA methylation, for modelling the dynamics of subclonal cell populations in breast cancer tissues. Further analysis of simulated cancer tissues and real cell lines revealed that this approach exhibits satisfactory performance and relative stability in estimating the composition and proportions of cellular subpopulations. The application of this strategy to breast cancer individuals from The Cancer Genome Atlas identified different cellular subpopulations with distinct molecular phenotypes. Moreover, the current and potential future applications of this deconvolution strategy to clinical breast cancer research are discussed, with emphasis on the DNA methylation-based recognition of intra-tumour heterogeneity. The wider use of these methods for estimating heterogeneity in further clinical cohorts will improve our understanding of neoplastic progression and the design of therapeutic interventions for treating breast cancer and other malignancies.
|
46
|
Comprehensive statistical inference of the clonal structure of cancer from multiple biopsies. Sci Rep 2017; 7:16943. [PMID: 29208983 PMCID: PMC5717219 DOI: 10.1038/s41598-017-16813-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/25/2017] [Accepted: 11/17/2017] [Indexed: 11/20/2022] Open
Abstract
A comprehensive characterization of tumor genetic heterogeneity is critical for understanding how cancers evolve and escape treatment. Although many algorithms have been developed for capturing tumor heterogeneity, they are designed for analyzing either a single type of genomic aberration or individual biopsies. Here we present THEMIS (Tumor Heterogeneity Extensible Modeling via an Integrative System), which allows for the joint analysis of different types of genomic aberrations from multiple biopsies taken from the same patient, using a dynamic graphical model. Simulation experiments demonstrate higher accuracy of THEMIS over its ancestor, TITAN. The heterogeneity analysis results from THEMIS are validated with single cell DNA sequencing from a clinical tumor biopsy. When THEMIS is used to analyze tumor heterogeneity among multiple biopsies from the same patient, it helps to reveal the mutation accumulation history, track cancer progression, and identify the mutations related to treatment resistance. We implement our model via an extensible modeling platform, which makes our approach open, reproducible, and easy for others to extend.
|
47
|
An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees. BioMed Research International 2017; 2017:5482750. [PMID: 29279850 PMCID: PMC5723949 DOI: 10.1155/2017/5482750] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/02/2017] [Accepted: 10/18/2017] [Indexed: 12/14/2022]
Abstract
Tumorigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution from genetic variation data. Copy number variation (CNV) is a major genetic marker, affecting more genes, disease loci, and functional elements than other types of variation. Fluorescence in situ hybridization (FISH) accurately measures the copy numbers of multiple genes in hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic trees from FISH data. Topology analysis of the tumor progression trees shows that the pathway of tumor subcell expansion varies greatly across the stages of tumor formation. The classification experiment shows that tree-based features are better than data-based features at distinguishing tumors. The constructed phylogenetic trees characterize the tumor development process well, and the approach outperforms other similar algorithms.
|
48
|
Farahani H, de Souza CPE, Billings R, Yap D, Shumansky K, Wan A, Lai D, Mes-Masson AM, Aparicio S, Shah SP. Engineered in-vitro cell line mixtures and robust evaluation of computational methods for clonal decomposition and longitudinal dynamics in cancer. Sci Rep 2017; 7:13467. [PMID: 29044127 PMCID: PMC5647443 DOI: 10.1038/s41598-017-13338-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/28/2017] [Accepted: 09/19/2017] [Indexed: 11/30/2022]
Abstract
Characterization and quantification of tumour clonal populations over time via longitudinal sampling are essential components in understanding and predicting the response to therapeutic interventions. Computational methods for inferring tumour clonal composition from deep-targeted sequencing data are ubiquitous; however, owing to the lack of ground-truth biological data, evaluating their performance is difficult. In this work, we generate a benchmark data set that simulates tumour longitudinal growth and heterogeneity by in vitro mixing of cancer cell lines in known proportions. We apply four different algorithms to our ground-truth data set and assess their performance in inferring clonal composition using different metrics. We also analyse the performance of these algorithms on breast tumour xenograft samples. We conclude that methods that can simultaneously analyse multiple samples while accounting for copy number alterations as a factor in allelic measurements exhibit the most accurate predictions. These results will inform future functional genomics-oriented studies of model systems, where time-series measurements in the context of therapeutic interventions are becoming increasingly common. These studies will need computational models that accurately reflect the multi-factorial nature of allele measurement in cancer, including, as we show here, segmental aneuploidies.
Affiliation(s)
- Hossein Farahani: BC Cancer Agency, Department of Molecular Oncology, Vancouver, V5Z 1L3, Canada; University of British Columbia, Department of Pathology and Laboratory Medicine, Vancouver, V6T 2B5, Canada
- Camila P E de Souza: BC Cancer Agency, Department of Molecular Oncology, Vancouver, V5Z 1L3, Canada; University of British Columbia, Department of Pathology and Laboratory Medicine, Vancouver, V6T 2B5, Canada
- Raewyn Billings: BC Cancer Agency, Department of Molecular Oncology, Vancouver, V5Z 1L3, Canada
- Damian Yap: BC Cancer Agency, Department of Molecular Oncology, Vancouver, V5Z 1L3, Canada
- Karey Shumansky: BC Cancer Agency, Department of Molecular Oncology, Vancouver, V5Z 1L3, Canada
- Adrian Wan: BC Cancer Agency, Department of Molecular Oncology, Vancouver, V5Z 1L3, Canada
- Daniel Lai: BC Cancer Agency, Department of Molecular Oncology, Vancouver, V5Z 1L3, Canada
- Anne-Marie Mes-Masson: Centre de recherche du Centre hospitalier de l'Université de Montréal (CRCHUM), Montreal, Canada; Institut du cancer de Montréal, Montreal, Canada; Department of Medicine, Université de Montréal, Montreal, Canada
- Samuel Aparicio: BC Cancer Agency, Department of Molecular Oncology, Vancouver, V5Z 1L3, Canada; University of British Columbia, Department of Pathology and Laboratory Medicine, Vancouver, V6T 2B5, Canada
- Sohrab P Shah: BC Cancer Agency, Department of Molecular Oncology, Vancouver, V5Z 1L3, Canada; University of British Columbia, Department of Pathology and Laboratory Medicine, Vancouver, V6T 2B5, Canada; BC Cancer Agency, Michael Smith Genome Sciences Centre, Vancouver, V5Z 1L3, Canada
|
49
|
Kuipers J, Jahn K, Raphael BJ, Beerenwinkel N. Single-cell sequencing data reveal widespread recurrence and loss of mutational hits in the life histories of tumors. Genome Res 2017; 27:1885-1894. [PMID: 29030470 PMCID: PMC5668945 DOI: 10.1101/gr.220707.117] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Received: 01/16/2017] [Accepted: 09/20/2017] [Indexed: 01/04/2023]
Abstract
Intra-tumor heterogeneity poses substantial challenges for cancer treatment. A tumor's composition can be deduced by reconstructing its mutational history. Central to current approaches is the infinite sites assumption that every genomic position can only mutate once over the lifetime of a tumor. The validity of this assumption has never been quantitatively assessed. We developed a rigorous statistical framework to test the infinite sites assumption with single-cell sequencing data. Our framework accounts for the high noise and contamination present in such data. We found strong evidence for the same genomic position being mutationally affected multiple times in individual tumors for 11 of 12 single-cell sequencing data sets from a variety of human cancers. Seven cases involved the loss of earlier mutations, five of which occurred at sites unaffected by large-scale genomic deletions. Four cases exhibited a parallel mutation, potentially indicating convergent evolution at the base pair level. Our results refute the general validity of the infinite sites assumption and indicate that more complex models are needed to adequately quantify intra-tumor heterogeneity for more effective cancer treatment.
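To make the infinite sites assumption concrete, the following is a minimal sketch of a noise-free combinatorial check, not the statistical framework described in the abstract. It assumes a binary cell-by-site genotype matrix with 0 as the unmutated state, and it flags pairs of sites whose observed genotypes cannot both arise from a single mutation each on one tree (the three-gamete condition). The function name `isa_conflicts` and the toy matrices are hypothetical, and real single-cell data would need a model of sequencing errors and dropout that this sketch ignores.

```python
from itertools import combinations


def isa_conflicts(matrix):
    """Return pairs of site indices that violate the three-gamete condition.

    matrix: list of rows (one per cell), each a list of 0/1 calls per site,
    where 0 is the ancestral (unmutated) state. Assumes error-free calls.
    """
    n_sites = len(matrix[0]) if matrix else 0
    conflicts = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the genotype combinations observed at sites i and j.
        seen = {(row[i], row[j]) for row in matrix}
        # If (1,0), (0,1) and (1,1) all occur, no tree with one mutation
        # per site and no losses can explain both sites.
        if {(1, 0), (0, 1), (1, 1)} <= seen:
            conflicts.append((i, j))
    return conflicts


# Hypothetical toy data: nested mutations are consistent with a tree.
tree_like = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
]

# Hypothetical toy data: sites 0 and 1 show all three mutated combinations,
# so at least one site must have mutated twice or been lost.
conflicting = [
    [1, 0],
    [0, 1],
    [1, 1],
]

print(isa_conflicts(tree_like))
print(isa_conflicts(conflicting))
```

A conflict in this sketch only indicates that error-free data would be incompatible with the assumption; distinguishing true recurrence or loss from noise is the part that requires a statistical test.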
Affiliation(s)
- Jack Kuipers: Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
- Katharina Jahn: Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
- Benjamin J Raphael: Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
- Niko Beerenwinkel: Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
|
50
|
Abstract
MOTIVATION A tumor arises from an evolutionary process that can be modeled as a phylogenetic tree. However, reconstructing this tree is challenging as most cancer sequencing uses bulk tumor tissue containing heterogeneous mixtures of cells. RESULTS We introduce Probabilistic Algorithm for Somatic Tree Inference (PASTRI), a new algorithm for bulk-tumor sequencing data that clusters somatic mutations into clones and infers a phylogenetic tree that describes the evolutionary history of the tumor. PASTRI uses an importance sampling algorithm that combines a probabilistic model of DNA sequencing data with an enumeration algorithm based on the combinatorial constraints defined by the underlying phylogenetic tree. As a result, tree inference is fast, accurate and robust to noise. We demonstrate on simulated data that PASTRI outperforms other cancer phylogeny algorithms in terms of runtime and accuracy. On real data from a chronic lymphocytic leukemia (CLL) patient, we show that a simple linear phylogeny better explains the data than the complex branching phylogeny that was previously reported. PASTRI provides a robust approach for phylogenetic tree inference from mixed samples. AVAILABILITY AND IMPLEMENTATION Software is available at compbio.cs.brown.edu/software. CONTACT braphael@princeton.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gryte Satas: Department of Computer Science, Brown University, Providence, RI, USA
- Benjamin J Raphael: Department of Computer Science, Princeton University, Princeton, NJ, USA
|