51
Abstract
Metastatic dissemination occurs very early in the malignant progression of a cancer but the clinical manifestation of metastases often takes years. In recent decades, 5-year survival of patients with many solid cancers has increased due to earlier detection, local disease control and adjuvant therapies. As a consequence, we are confronted with an increase in late relapses as more antiproliferative cancer therapies prolong disease courses, raising questions about how cancer cells survive, evolve or stop growing and finally expand during periods of clinical latency. I argue here that the understanding of early metastasis formation, particularly of the currently invisible phase of metastatic colonization, will be essential for the next stage in adjuvant therapy development that reliably prevents metachronous metastasis.
Affiliation(s)
- Christoph A Klein
- Experimental Medicine and Therapy Research, University of Regensburg, Regensburg, Germany.
- Division of Personalized Tumor Therapy, Fraunhofer Institute for Toxicology and Experimental Medicine, Regensburg, Germany.
52
Wu Y. Accurate and efficient cell lineage tree inference from noisy single cell data: the maximum likelihood perfect phylogeny approach. Bioinformatics 2020; 36:742-750. [PMID: 31504211 DOI: 10.1093/bioinformatics/btz676] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/20/2019] [Revised: 08/21/2019] [Accepted: 08/27/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Cells in an organism share a common evolutionary history, called the cell lineage tree. A cell lineage tree can be inferred from single cell genotypes at genomic variation sites. Cell lineage tree inference from noisy single cell data is a challenging computational problem. Most existing methods for cell lineage tree inference assume uniform uncertainty in genotypes. A key missing aspect is that real single cell data usually have non-uniform uncertainty in individual genotypes. Moreover, existing methods are often sampling based and can be very slow for large data. RESULTS: In this article, we propose a new method called ScisTree, which infers the cell lineage tree and calls genotypes from noisy single cell genotype data. Unlike most existing approaches, ScisTree works with genotype probabilities of individual genotypes (which can be computed by existing single cell genotype callers). ScisTree assumes the infinite sites model. Given uncertain genotypes with individualized probabilities, ScisTree implements a fast heuristic for inferring the cell lineage tree and calling the genotypes that allow the so-called perfect phylogeny and maximize the likelihood of the genotypes. Through simulation, we show that ScisTree performs well on the accuracy of inferred trees, and is much more efficient than existing methods. The efficiency of ScisTree enables new applications including imputation of the so-called doublets. AVAILABILITY AND IMPLEMENTATION: The program ScisTree is available for download at: https://github.com/yufengwudcs/ScisTree. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yufeng Wu
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
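The perfect phylogeny condition mentioned in the abstract above can be illustrated with a short check. This is a minimal sketch, not ScisTree's algorithm: it assumes hard binary genotypes (ScisTree itself works with genotype probabilities), an unmutated root as implied by the infinite sites model, and a hypothetical function name `admits_perfect_phylogeny`. It uses the standard result that such a matrix fits a rooted perfect phylogeny exactly when, for every pair of sites, the sets of mutated cells are either disjoint or nested.

```python
from itertools import combinations


def admits_perfect_phylogeny(genotypes):
    """Return True if a binary cell-by-site matrix fits a rooted perfect phylogeny.

    genotypes: list of rows (one per cell), each a list of 0/1 values (one per site).
    Assumes the root carries no mutations (infinite sites model).
    """
    if not genotypes:
        return True
    n_sites = len(genotypes[0])
    # For each site, collect the set of cells that carry the mutation.
    carriers = [
        frozenset(cell for cell, row in enumerate(genotypes) if row[site] == 1)
        for site in range(n_sites)
    ]
    for a, b in combinations(carriers, 2):
        shared = a & b
        # A conflict arises when two sites overlap but neither contains the other.
        if shared and shared != a and shared != b:
            return False
    return True


# Hypothetical toy data: three cells, sites in columns.
compatible = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
conflicting = [[1, 0], [0, 1], [1, 1]]
print(admits_perfect_phylogeny(compatible))
print(admits_perfect_phylogeny(conflicting))
```

A likelihood-based method would search over genotype calls that pass this check; the sketch only tests one fixed matrix.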
53
Schill R, Solbrig S, Wettig T, Spang R. Modelling cancer progression using Mutual Hazard Networks. Bioinformatics 2020; 36:241-249. [PMID: 31250881 PMCID: PMC6956791 DOI: 10.1093/bioinformatics/btz513] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/12/2018] [Revised: 03/29/2019] [Accepted: 06/25/2019] [Indexed: 12/26/2022] Open
Abstract
MOTIVATION: Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. RESULTS: Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by multiplicative effects they exert on the rates of successive events. MHN compared favourably to acyclic models in cross-validated model fit on four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations. AVAILABILITY AND IMPLEMENTATION: Implementation and data are available at https://github.com/RudiSchill/MHN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rudolf Schill
- Department of Statistical Bioinformatics, Institute of Functional Genomics, Regensburg 93040, Germany
| | - Stefan Solbrig
- Department of Physics, University of Regensburg, Regensburg 93040, Germany
| | - Tilo Wettig
- Department of Physics, University of Regensburg, Regensburg 93040, Germany
| | - Rainer Spang
- Department of Statistical Bioinformatics, Institute of Functional Genomics, Regensburg 93040, Germany
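The rate model described in the Mutual Hazard Networks abstract above (a spontaneous rate per event, scaled by multiplicative effects of events already present) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `event_rate`, the matrix layout (`theta[i][i]` as the spontaneous rate of event `i`, `theta[i][j]` as the multiplicative effect of event `j` on event `i`), and the numbers are all hypothetical.

```python
def event_rate(theta, state, i):
    """Rate at which event i fixes, given the events already present.

    theta: square matrix (list of lists). Assumed layout: theta[i][i] is the
           spontaneous rate of event i; theta[i][j] is the factor by which a
           present event j multiplies the rate of event i.
    state: list of 0/1 flags, one per event (1 = event already occurred).
    """
    if state[i]:
        return 0.0  # an event that has already occurred cannot occur again
    rate = theta[i][i]
    for j, present in enumerate(state):
        if present and j != i:
            rate *= theta[i][j]  # factor > 1 promotes, factor < 1 inhibits
    return rate


# Hypothetical two-event example: event 0 speeds up event 1 fourfold.
theta = [
    [0.5, 1.0],
    [4.0, 0.1],
]
print(event_rate(theta, [0, 0], 1))  # spontaneous rate of event 1
print(event_rate(theta, [1, 0], 1))  # rate of event 1 once event 0 is present
```

Because each factor can be above or below one, and effects can run in both directions between a pair of events, this form permits inhibition and cycles, which the abstract contrasts with acyclic, promotion-only models.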
54
Inferring clonal composition from multiple tumor biopsies. NPJ Syst Biol Appl 2020; 6:27. [PMID: 32843649 PMCID: PMC7447821 DOI: 10.1038/s41540-020-00147-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/17/2020] [Accepted: 07/15/2020] [Indexed: 01/09/2023] Open
Abstract
Knowledge about the clonal evolution of a tumor can help to interpret the function of its genetic alterations by identifying initiating events and events that contribute to the selective advantage of proliferative, metastatic, and drug-resistant subclones. Clonal evolution can be reconstructed from estimates of the relative abundance (frequency) of subclone-specific alterations in tumor biopsies, which, in turn, inform on its composition. However, estimating these frequencies is complicated by the high genetic instability that characterizes many cancers. Models for genetic instability suggest that copy number alterations (CNAs) can influence mutation-frequency estimates and thus impede efforts to reconstruct tumor phylogenies. Our analysis suggested that accurate mutation frequency estimates require accounting for CNAs—a challenging endeavour using the genetic profile of a single tumor biopsy. Instead, we propose an optimization algorithm, Chimæra, to account for the effects of CNAs using profiles of multiple biopsies per tumor. Analyses of simulated data and tumor profiles suggested that Chimæra estimates are consistently more accurate than those of previously proposed methods and resulted in improved phylogeny reconstructions and subclone characterizations. Our analyses inferred recurrent initiating mutations in hepatocellular carcinomas, resolved the clonal composition of Wilms’ tumors, and characterized the acquisition of mutations in drug-resistant prostate cancers.
55
Nam AS, Chaligne R, Landau DA. Integrating genetic and non-genetic determinants of cancer evolution by single-cell multi-omics. Nat Rev Genet 2020; 22:3-18. [PMID: 32807900 DOI: 10.1038/s41576-020-0265-5] [Citation(s) in RCA: 195] [Impact Index Per Article: 48.8] [Accepted: 07/01/2020] [Indexed: 12/17/2022]
Abstract
Cancer represents an evolutionary process through which growing malignant populations genetically diversify, leading to tumour progression, relapse and resistance to therapy. In addition to genetic diversity, the cell-to-cell variation that fuels evolutionary selection also manifests in cellular states, epigenetic profiles, spatial distributions and interactions with the microenvironment. Therefore, the study of cancer requires the integration of multiple heritable dimensions at the resolution of the single cell - the atomic unit of somatic evolution. In this Review, we discuss emerging analytic and experimental technologies for single-cell multi-omics that enable the capture and integration of multiple data modalities to inform the study of cancer evolution. These data show that cancer results from a complex interplay between genetic and non-genetic determinants of somatic evolution.
Affiliation(s)
- Anna S Nam
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- New York Genome Center, New York, NY, USA
- Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
| | - Ronan Chaligne
- New York Genome Center, New York, NY, USA
- Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Dan A Landau
- New York Genome Center, New York, NY, USA
- Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medicine, New York, NY, USA
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
56
Mallory XF, Edrisi M, Navin N, Nakhleh L. Methods for copy number aberration detection from single-cell DNA-sequencing data. Genome Biol 2020; 21:208. [PMID: 32807205 PMCID: PMC7433197 DOI: 10.1186/s13059-020-02119-8] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 03/25/2020] [Accepted: 07/23/2020] [Indexed: 02/06/2023] Open
Abstract
Copy number aberrations (CNAs), which are pathogenic copy number variations (CNVs), play an important role in the initiation and progression of cancer. Single-cell DNA-sequencing (scDNAseq) technologies produce data that are ideal for inferring CNAs. In this review, we survey eight methods that have been developed for detecting CNAs in scDNAseq data, and categorize them according to the steps of a seven-step pipeline that they employ. Furthermore, we review models and methods for evolutionary analyses of CNAs from scDNAseq data and highlight advances and future research directions for computational methods for CNA detection from scDNAseq data.
Affiliation(s)
- Xian F. Mallory
- Department of Computer Science, Rice University, Houston, TX USA
- Department of Computer Science, Florida State University, Tallahassee, FL USA
| | | | - Nicholas Navin
- Department of Genetics, the University of Texas M.D. Anderson Cancer Center, Houston, TX USA
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, TX USA
57
Xin H, Lian Q, Jiang Y, Luo J, Wang X, Erb C, Xu Z, Zhang X, Heidrich-O’Hare E, Yan Q, Duerr RH, Chen K, Chen W. GMM-Demux: sample demultiplexing, multiplet detection, experiment planning, and novel cell-type verification in single cell sequencing. Genome Biol 2020; 21:188. [PMID: 32731885 PMCID: PMC7393741 DOI: 10.1186/s13059-020-02084-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/18/2019] [Accepted: 06/24/2020] [Indexed: 11/10/2022] Open
Abstract
Identifying and removing multiplets are essential to improving the scalability and the reliability of single cell RNA sequencing (scRNA-seq). Multiplets create artificial cell types in the dataset. We propose a Gaussian mixture model-based multiplet identification method, GMM-Demux. GMM-Demux accurately identifies and removes multiplets through sample barcoding, including cell hashing and MULTI-seq. GMM-Demux uses a droplet formation model to authenticate putative cell types discovered from a scRNA-seq dataset. We generate two in-house cell-hashing datasets and compare GMM-Demux against three state-of-the-art sample barcoding classifiers. We show that GMM-Demux is stable and highly accurate and recognizes 9 multiplet-induced fake cell types in a PBMC dataset.
Affiliation(s)
- Hongyi Xin
- University of Michigan-Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai, 200240 China
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
| | - Qiuyu Lian
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- Department of Automation, Tsinghua University, Beijing, 100086 China
| | - Yale Jiang
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- School of Medicine, Tsinghua University, Beijing, 100086 China
| | - Jiadi Luo
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
| | - Xinjun Wang
- Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, 15260 USA
| | - Carla Erb
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
| | - Zhongli Xu
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- School of Medicine, Tsinghua University, Beijing, 100086 China
| | - Xiaoyi Zhang
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
| | - Elisa Heidrich-O’Hare
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
| | - Qi Yan
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
| | - Richard H. Duerr
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
| | - Kong Chen
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
| | - Wei Chen
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
58
Aitken SJ, Anderson CJ, Connor F, Pich O, Sundaram V, Feig C, Rayner TF, Lukk M, Aitken S, Luft J, Kentepozidou E, Arnedo-Pac C, Beentjes SV, Davies SE, Drews RM, Ewing A, Kaiser VB, Khamseh A, López-Arribillaga E, Redmond AM, Santoyo-Lopez J, Sentís I, Talmane L, Yates AD, Semple CA, López-Bigas N, Flicek P, Odom DT, Taylor MS. Pervasive lesion segregation shapes cancer genome evolution. Nature 2020; 583:265-270. [PMID: 32581361 PMCID: PMC7116693 DOI: 10.1038/s41586-020-2435-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/06/2019] [Accepted: 05/07/2020] [Indexed: 02/08/2023]
Abstract
Cancers arise through the acquisition of oncogenic mutations and grow by clonal expansion. Here we reveal that most mutagenic DNA lesions are not resolved into a mutated DNA base pair within a single cell cycle. Instead, DNA lesions segregate, unrepaired, into daughter cells for multiple cell generations, resulting in the chromosome-scale phasing of subsequent mutations. We characterize this process in mutagen-induced mouse liver tumours and show that DNA replication across persisting lesions can produce multiple alternative alleles in successive cell divisions, thereby generating both multiallelic and combinatorial genetic diversity. The phasing of lesions enables accurate measurement of strand-biased repair processes, quantification of oncogenic selection and fine mapping of sister-chromatid-exchange events. Finally, we demonstrate that lesion segregation is a unifying property of exogenous mutagens, including UV light and chemotherapy agents in human cells and tumours, which has profound implications for the evolution and adaptation of cancer genomes.
Affiliation(s)
- Sarah J Aitken
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Department of Pathology, University of Cambridge, Cambridge, UK
- Department of Histopathology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Craig J Anderson
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
| | - Frances Connor
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Oriol Pich
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Vasavi Sundaram
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Christine Feig
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Tim F Rayner
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Margus Lukk
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Stuart Aitken
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
| | - Juliet Luft
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
| | | | - Claudia Arnedo-Pac
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Sjoerd V Beentjes
- School of Mathematics and Maxwell Institute, University of Edinburgh, Edinburgh, UK
| | - Susan E Davies
- Department of Histopathology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Ruben M Drews
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Ailith Ewing
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
| | - Vera B Kaiser
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
| | - Ava Khamseh
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Higgs Centre for Theoretical Physics, University of Edinburgh, Edinburgh, UK
| | - Erika López-Arribillaga
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Aisling M Redmond
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | | | - Inés Sentís
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Lana Talmane
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
| | - Andrew D Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Colin A Semple
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
| | - Núria López-Bigas
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Paul Flicek
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Duncan T Odom
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- German Cancer Research Center (DKFZ), Division of Regulatory Genomics and Cancer Evolution, Heidelberg, Germany.
| | - Martin S Taylor
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK.
59
Skums P, Tsyvina V, Zelikovsky A. Inference of clonal selection in cancer populations using single-cell sequencing data. Bioinformatics 2020; 35:i398-i407. [PMID: 31510696 PMCID: PMC6612866 DOI: 10.1093/bioinformatics/btz392] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/23/2022] Open
Abstract
Summary: Intra-tumor heterogeneity is one of the major factors influencing cancer progression and treatment outcome. However, evolutionary dynamics of cancer clone populations remain poorly understood. Quantification of clonal selection and inference of fitness landscapes of tumors is a key step to understanding evolutionary mechanisms driving cancer. These problems could be addressed using single-cell sequencing (scSeq), which provides an unprecedented insight into intra-tumor heterogeneity, making it possible to study and quantify selective advantages of individual clones. Here, we present Single Cell Inference of FItness Landscape (SCIFIL), a computational tool for inference of fitness landscapes of heterogeneous cancer clone populations from scSeq data. SCIFIL makes it possible to estimate maximum likelihood fitnesses of clone variants, and to measure their selective advantages and order of appearance, by fitting an evolutionary model into the tumor phylogeny. We demonstrate the accuracy of our approach, and show how it could be applied to experimental tumor data to study clonal selection and infer evolutionary history. SCIFIL can be used to provide new insight into the evolutionary dynamics of cancer. Availability and implementation: Its source code is available at https://github.com/compbel/SCIFIL.
Affiliation(s)
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
| | - Viachaslau Tsyvina
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
| | - Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
60
Satas G, Zaccaria S, Mon G, Raphael BJ. SCARLET: Single-cell tumor phylogeny inference with copy-number constrained mutation losses. Cell Syst 2020; 10:323-332.e8. [PMID: 32864481 PMCID: PMC7451135 DOI: 10.1016/j.cels.2020.04.001] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Indexed: 11/17/2022]
Abstract
A small number of somatic mutations drive the development of cancer, but all somatic mutations are markers of the evolutionary history of a tumor. Prominent methods to construct phylogenies from single-cell sequencing data use single-nucleotide variants (SNVs) as markers but fail to adequately account for copy-number aberrations (CNAs), which can overlap SNVs and result in SNV losses. Here, we introduce SCARLET, an algorithm that infers tumor phylogenies from single-cell DNA sequencing data while accounting for both CNA-driven loss of SNVs and sequencing errors. SCARLET outperforms existing methods on simulated data, with more accurate inference of the order in which mutations were acquired and the mutations present in individual cells. Using a single-cell dataset from a patient with colorectal cancer, SCARLET constructs a tumor phylogeny that is consistent with the observed CNAs and suggests an alternate origin for the patient's metastases. SCARLET is available at: github.com/raphael-group/scarlet.
Affiliation(s)
- Gryte Satas
- Department of Computer Science, Brown University, Providence, RI 02912
- Department of Computer Science, Princeton University, Princeton, NJ 08540
| | - Simone Zaccaria
- Department of Computer Science, Princeton University, Princeton, NJ 08540
| | - Geoffrey Mon
- Department of Computer Science, Princeton University, Princeton, NJ 08540
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08540
61
McCarthy DJ, Rostom R, Huang Y, Kunz DJ, Danecek P, Bonder MJ, Hagai T, Lyu R, Wang W, Gaffney DJ, Simons BD, Stegle O, Teichmann SA. Cardelino: computational integration of somatic clonal substructure and single-cell transcriptomes. Nat Methods 2020; 17:414-421. [PMID: 32203388 DOI: 10.1038/s41592-020-0766-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 11/26/2018] [Accepted: 01/31/2020] [Indexed: 02/03/2023]
Abstract
Bulk and single-cell DNA sequencing has enabled reconstructing clonal substructures of somatic tissues from frequency and co-occurrence patterns of somatic variants. However, approaches to characterize phenotypic variations between clones are not established. Here we present cardelino (https://github.com/single-cell-genetics/cardelino), a computational method for inferring the clonal tree configuration and the clone of origin of individual cells assayed using single-cell RNA-seq (scRNA-seq). Cardelino flexibly integrates information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. We apply cardelino to a published cancer dataset and to newly generated matched scRNA-seq and exome-seq data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a role for cell division genes in somatic evolution in healthy skin.
Affiliation(s)
- Davis J McCarthy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
- Melbourne Integrative Genomics, School of Mathematics and Statistics/School of Biosciences, University of Melbourne, Parkville, Victoria, Australia
| | - Raghd Rostom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Yuanhua Huang
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
| | - Daniel J Kunz
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Department of Physics, Cavendish Laboratory, Cambridge, UK
- The Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK
| | - Petr Danecek
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Marc Jan Bonder
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Tzachi Hagai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- School of Molecular Cell Biology and Biotechnology, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Ruqian Lyu
- St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
- Melbourne Integrative Genomics, School of Mathematics and Statistics/School of Biosciences, University of Melbourne, Parkville, Victoria, Australia
| | | | - Wenyi Wang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | | | - Benjamin D Simons
- Department of Physics, Cavendish Laboratory, Cambridge, UK
- The Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK
- The Wellcome Trust/Medical Research Council Stem Cell Institute, University of Cambridge, Cambridge, UK
| | - Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center, Heidelberg, Germany
| | - Sarah A Teichmann
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Department of Physics, Cavendish Laboratory, Cambridge, UK
62
Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CSO, Aparicio S, Baaijens J, Balvert M, Barbanson BD, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowska A, Reinders M, Ridder JD, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. Eleven grand challenges in single-cell data science. Genome Biol 2020; 21:31. [PMID: 32033589 PMCID: PMC7007675 DOI: 10.1186/s13059-020-1926-6] [Citation(s) in RCA: 576] [Impact Index Per Article: 144.0] [Received: 08/02/2019] [Accepted: 01/02/2020] [Indexed: 02/08/2023] Open
Abstract
The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
Affiliation(s)
- David Lähnemann
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf, Germany
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Johannes Köster
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
- Ewa Szczurek
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Davis J. McCarthy
- Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, Fitzroy, Australia
- Melbourne Integrative Genomics, School of BioSciences–School of Mathematics & Statistics, Faculty of Science, University of Melbourne, Melbourne, Australia
- Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD USA
- Mark D. Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zürich, Zürich, Switzerland
- Catalina A. Vallejos
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK
- The Alan Turing Institute, British Library, London, UK
- Kieran R. Campbell
- Department of Statistics, University of British Columbia, Vancouver, Canada
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Data Science Institute, University of British Columbia, Vancouver, Canada
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Ahmed Mahfouz
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
- Luca Pinello
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
- Department of Pathology, Harvard Medical School, Boston, USA
- Broad Institute of Harvard and MIT, Cambridge, MA USA
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, USA
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Samuel Aparicio
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Jasmijn Baaijens
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Marleen Balvert
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Buys de Barbanson
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
- Antonio Cappuccio
- Institute for Advanced Study, University of Amsterdam, Amsterdam, The Netherlands
- Giacomo Corleone
- Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, London, UK
- Bas E. Dutilh
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
- Maria Florescu
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
- Victor Guryev
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Rens Holmer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Katharina Jahn
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Thamar Jessurun Lobo
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Emma M. Keizer
- Biometris, Wageningen University & Research, Wageningen, The Netherlands
- Indu Khatri
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
- Szymon M. Kielbasa
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Jan O. Korbel
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Alexey M. Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Tzu-Hao Kuo
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Boudewijn P.F. Lelieveldt
- PRB lab, Delft University of Technology, Delft, The Netherlands
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Ion I. Mandoiu
- Computer Science & Engineering Department, University of Connecticut, Storrs, USA
- John C. Marioni
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Felix Mölder
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Amir Niknejad
- Computation molecular design, Zuse Institute Berlin, Berlin, Germany
- Mathematics Department, Mount Saint Vincent, New York, USA
- Alicja Rączkowska
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Marcel Reinders
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
- Jeroen de Ridder
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Antoine-Emmanuel Saliba
- Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany
- Antonios Somarakis
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Oliver Stegle
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center–DKFZ, Heidelberg, Germany
- Fabian J. Theis
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
- Huan Yang
- Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research–LACDR–Leiden University, Leiden, The Netherlands
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Alice C. McHardy
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Sohrab P. Shah
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
- Alexander Schönhuth
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
63
Dinh KN, Jaksik R, Kimmel M, Lambert A, Tavaré S. Statistical Inference for the Evolutionary History of Cancer Genomes. Stat Sci 2020. [DOI: 10.1214/19-sts7561] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/19/2022]
64
Noorani A, Li X, Goddard M, Crawte J, Alexandrov LB, Secrier M, Eldridge MD, Bower L, Weaver J, Lao-Sirieix P, Martincorena I, Debiram-Beecham I, Grehan N, MacRae S, Malhotra S, Miremadi A, Thomas T, Galbraith S, Petersen L, Preston SD, Gilligan D, Hindmarsh A, Hardwick RH, Stratton MR, Wedge DC, Fitzgerald RC. Genomic evidence supports a clonal diaspora model for metastases of esophageal adenocarcinoma. Nat Genet 2020; 52:74-83. [PMID: 31907488 PMCID: PMC7100916 DOI: 10.1038/s41588-019-0551-3] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 10/21/2018] [Accepted: 11/19/2019] [Indexed: 01/23/2023]
Abstract
The poor outcomes in esophageal adenocarcinoma (EAC) prompted us to interrogate the pattern and timing of metastatic spread. Whole-genome sequencing and phylogenetic analysis of 388 samples across 18 individuals with EAC showed, in 90% of patients, that multiple subclones from the primary tumor spread very rapidly from the primary site to form multiple metastases, including lymph nodes and distant tissues, a mode of dissemination that we term 'clonal diaspora'. Metastatic subclones at autopsy were present in tissue and blood samples from earlier time points. These findings have implications for our understanding and clinical evaluation of EAC.
Affiliation(s)
- Xiaodun Li
- MRC Cancer Unit, University of Cambridge, Cambridge, UK
- Martin Goddard
- Department of Histopathology, Papworth Hospital NHS Trust, Cambridge, UK
- Jason Crawte
- MRC Cancer Unit, University of Cambridge, Cambridge, UK
- Ludmil B Alexandrov
- Cellular and Molecular Medicine, University of California, San Diego, San Diego, CA, USA
- Maria Secrier
- Cancer Research UK Cambridge Research Institute, Cambridge, UK
- Lawrence Bower
- Cancer Research UK Cambridge Research Institute, Cambridge, UK
- Jamie Weaver
- MRC Cancer Unit, University of Cambridge, Cambridge, UK
- Nicola Grehan
- MRC Cancer Unit, University of Cambridge, Cambridge, UK
- Shona MacRae
- MRC Cancer Unit, University of Cambridge, Cambridge, UK
- Shalini Malhotra
- Department of Histopathology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Ahmad Miremadi
- Department of Histopathology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Sarah Galbraith
- Department of Palliative Care, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Stephen D Preston
- Department of Histopathology, Papworth Hospital NHS Trust, Cambridge, UK
- David Gilligan
- Department of Oncology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Andrew Hindmarsh
- Cambridge Oesophago-Gastric Centre, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Richard H Hardwick
- Cambridge Oesophago-Gastric Centre, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- David C Wedge
- Big Data Institute, University of Oxford, Oxford, UK.
- Oxford NIHR Biomedical Research Centre, Oxford, UK.
65
Abstract
BACKGROUND Accurate inference of the evolutionary history of a tumor has important implications for understanding and potentially treating the disease. While a number of methods have been proposed to reconstruct the evolutionary history of a tumor from DNA sequencing data, it is not clear how aspects of the sequencing data and tumor itself affect these reconstructions. METHODS We investigate when and how well these histories can be reconstructed from multi-sample bulk sequencing data when considering only single nucleotide variants (SNVs). Specifically, we examine the space of all possible tumor phylogenies under the infinite sites assumption (ISA) using several approaches for enumerating phylogenies consistent with the sequencing data. RESULTS On noisy simulated data, we find that the ISA is often violated and that low coverage and high noise make it more difficult to identify phylogenies. Additionally, we find that evolutionary trees with branching topologies are easier to reconstruct accurately. We also apply our reconstruction methods to both chronic lymphocytic leukemia and clear cell renal cell carcinoma datasets and confirm that ISA violations are common in practice, especially in lower-coverage sequencing data. Nonetheless, we show that an ISA-based approach can be relaxed to produce high-quality phylogenies. CONCLUSIONS Consideration of practical aspects of sequencing data such as coverage or the model of tumor evolution (branching, linear, etc.) is essential to effectively using the output of tumor phylogeny inference methods. Additionally, these factors should be considered in the development of new inference methods.
Affiliation(s)
- Kiran Tomlinson
- Department of Computer Science, Carleton College, 1 N College St, Northfield, 55057, MN, USA
- Department of Computer Science, Cornell University, 402 Gates Hall, Ithaca, 14853, NY, USA
- Layla Oesper
- Department of Computer Science, Carleton College, 1 N College St, Northfield, 55057, MN, USA.
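The abstract above refers to phylogenies "consistent with the sequencing data" under the ISA. One widely used consistency requirement for multi-sample bulk data is that, in every sample, the frequencies of a mutation's children do not add up to more than the frequency of the mutation itself. The following is a minimal sketch of that check, not the authors' enumeration method; the function name, the `parent`/`freqs` data layout, and the `tol` noise allowance are all assumptions made for illustration.

```python
def satisfies_sum_condition(parent, freqs, tol=0.0):
    """Check the ISA sum condition for one candidate tree.

    parent: dict mapping each mutation to its parent (the root maps to None).
    freqs:  dict mapping each mutation to a list of per-sample frequencies
            (e.g. cancer cell fractions); all lists have the same length.
    tol:    hypothetical slack to absorb sequencing noise.
    """
    # Build the child lists from the parent pointers.
    children = {}
    for node, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(node)

    n_samples = len(next(iter(freqs.values())))
    for node, kids in children.items():
        for s in range(n_samples):
            # Children partition a subset of the parent's cells in each sample.
            if sum(freqs[k][s] for k in kids) > freqs[node][s] + tol:
                return False
    return True


freqs = {"A": [1.0, 1.0], "B": [0.6, 0.3], "C": [0.3, 0.5]}
branching = {"A": None, "B": "A", "C": "A"}   # B and C are siblings
linear = {"A": None, "B": "A", "C": "B"}      # C descends from B
print(satisfies_sum_condition(branching, freqs))
print(satisfies_sum_condition(linear, freqs))
```

With these made-up frequencies only the branching tree passes, because C is more frequent than B in the second sample and so cannot descend from it.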
66
Zafar H, Navin N, Chen K, Nakhleh L. SiCloneFit: Bayesian inference of population structure, genotype, and phylogeny of tumor clones from single-cell genome sequencing data. Genome Res 2019; 29:1847-1859. [PMID: 31628257 PMCID: PMC6836738 DOI: 10.1101/gr.243121.118] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 08/17/2018] [Accepted: 07/10/2019] [Indexed: 12/12/2022]
Abstract
Accumulation and selection of somatic mutations in a Darwinian framework result in intra-tumor heterogeneity (ITH) that poses significant challenges to the diagnosis and clinical therapy of cancer. Identification of the tumor cell populations (clones) and reconstruction of their evolutionary relationship can elucidate this heterogeneity. Recently developed single-cell DNA sequencing (SCS) technologies promise to resolve ITH to a single-cell level. However, technical errors in SCS data sets, including false-positives (FP) and false-negatives (FN) due to allelic dropout, and cell doublets, significantly complicate these tasks. Here, we propose a nonparametric Bayesian method that reconstructs the clonal populations as clusters of single cells, genotypes of each clone, and the evolutionary relationship between the clones. It employs a tree-structured Chinese restaurant process as the prior on the number and composition of clonal populations. The evolution of the clonal populations is modeled by a clonal phylogeny and a finite-site model of evolution to account for potential mutation recurrence and losses. We probabilistically account for FP and FN errors, and cell doublets are modeled by employing a Beta-binomial distribution. We develop a Gibbs sampling algorithm comprising partial reversible-jump and partial Metropolis-Hastings updates to explore the joint posterior space of all parameters. The performance of our method on synthetic and experimental data sets suggests that joint reconstruction of tumor clones and clonal phylogeny under a finite-site model of evolution leads to more accurate inferences. Our method is the first to enable this joint reconstruction in a fully Bayesian framework, thus providing measures of support of the inferences it makes.
Affiliation(s)
- Hamim Zafar
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Nicholas Navin
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
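To make the false-positive and false-negative error terms mentioned in the abstract above concrete, here is a minimal sketch of a per-entry error likelihood of the kind commonly used for single-cell genotype matrices. It is a generic simplification and not SiCloneFit's actual model: it ignores doublets, priors, and the clonal tree, and the function name and argument layout are assumptions.

```python
import math


def log_likelihood(observed, true, fp_rate, fn_rate):
    """Log-likelihood of an observed binary genotype matrix given true genotypes.

    observed: rows of 0/1 calls per cell; None marks a missing call.
    true:     rows of 0/1 true genotypes with the same shape.
    fp_rate:  probability of observing 1 when the true state is 0.
    fn_rate:  probability of observing 0 when the true state is 1.
    """
    ll = 0.0
    for obs_row, true_row in zip(observed, true):
        for o, t in zip(obs_row, true_row):
            if o is None:
                continue  # missing entries carry no information in this sketch
            if t == 0:
                p = fp_rate if o == 1 else 1.0 - fp_rate
            else:
                p = fn_rate if o == 0 else 1.0 - fn_rate
            ll += math.log(p)
    return ll


observed = [[1, 0, None], [0, 0, 1]]
true = [[1, 1, 0], [0, 0, 1]]
print(log_likelihood(observed, true, fp_rate=0.01, fn_rate=0.2))
```

In a sampler such as the one the abstract describes, a quantity of this kind would be evaluated repeatedly as candidate genotypes and error rates change; here the rates are simply fixed illustrative values.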
67
Malikic S, Mehrabadi FR, Ciccolella S, Rahman MK, Ricketts C, Haghshenas E, Seidman D, Hach F, Hajirasouliha I, Sahinalp SC. PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data. Genome Res 2019; 29:1860-1877. [PMID: 31628256 PMCID: PMC6836735 DOI: 10.1101/gr.234435.118] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 08/15/2018] [Accepted: 09/11/2019] [Indexed: 12/29/2022]
Abstract
Available computational methods for tumor phylogeny inference via single-cell sequencing (SCS) data typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, the limitations of SCS technologies including frequent allele dropout and variable sequence coverage may prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to the loss of heterozygosity, deletions, and convergent evolution. In order to address such limitations, we introduce the optimal subperfect phylogeny problem which asks to integrate SCS data with matching bulk sequencing data by minimizing a linear combination of potential false negatives (due to allele dropout or variance in sequence coverage), false positives (due to read errors) among mutation calls, and the number of mutations that violate ISA (real or because of incorrect copy number estimation). We then describe a combinatorial formulation to solve this problem which ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both in the form of an integer linear program (ILP) and—as a first in tumor phylogeny reconstruction—a Boolean constraint satisfaction problem (CSP) and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data while accounting for ISA violating mutations. In contrast to the alternative methods, typically based on probabilistic approaches, PhISCS provides a guarantee of optimality in reported solutions. Using simulated and real data sets, we demonstrate that PhISCS is more general and accurate than all available approaches.
Affiliation(s)
- Salem Malikic
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Farid Rashidi Mehrabadi
- Department of Computer Science, Indiana University, Bloomington, Indiana 47408, USA; Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Simone Ciccolella
- Department of Computer Systems and Communication, University of Milano-Bicocca, 20136 Milan, Italy; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA
- Md Khaledur Rahman
- Department of Computer Science, Indiana University, Bloomington, Indiana 47408, USA
- Camir Ricketts
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA; Tri-I Computational Biology and Medicine Graduate Program, Cornell University, New York, New York 10065, USA
- Ehsan Haghshenas
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Daniel Seidman
- Tri-I Computational Biology and Medicine Graduate Program, Cornell University, New York, New York 10065, USA
- Faraz Hach
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada; Department of Urologic Sciences, University of British Columbia, Vancouver, BC V5Z 1M9, Canada; Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Iman Hajirasouliha
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA; Department of Physiology and Biophysics, Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York 10065, USA
- S Cenk Sahinalp
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
68
El-Kebir M. SPhyR: tumor phylogeny estimation from single-cell sequencing data under loss and error. Bioinformatics 2019; 34:i671-i679. [PMID: 30423070 PMCID: PMC6153375 DOI: 10.1093/bioinformatics/bty589] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Indexed: 02/07/2023] Open
Abstract
MOTIVATION Cancer is characterized by intra-tumor heterogeneity, the presence of distinct cell populations with distinct complements of somatic mutations, which include single-nucleotide variants (SNVs) and copy-number aberrations (CNAs). Single-cell sequencing technology enables one to study these cell populations at single-cell resolution. Phylogeny estimation algorithms that employ appropriate evolutionary models are key to understanding the evolutionary mechanisms behind intra-tumor heterogeneity. RESULTS We introduce Single-cell Phylogeny Reconstruction (SPhyR), a method for tumor phylogeny estimation from single-cell sequencing data. In light of frequent loss of SNVs due to CNAs in cancer, SPhyR employs the k-Dollo evolutionary model, where a mutation can only be gained once but lost k times. Underlying SPhyR is a novel combinatorial characterization of solutions as constrained integer matrix completions, based on a connection to the cladistic multi-state perfect phylogeny problem. SPhyR outperforms existing methods on simulated data and on a metastatic colorectal cancer. AVAILABILITY AND IMPLEMENTATION SPhyR is available on https://github.com/elkebir-group/SPhyR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mohammed El-Kebir
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
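The k-Dollo model named in the abstract above can be illustrated with a small check on a single mutation: given a rooted tree and the mutation's presence at every node, count gain and loss events along the edges. This is only a sketch of the model's constraint, read here as "at most one gain and at most k losses"; it is not the SPhyR algorithm, and the function name and the dictionary-based tree encoding are assumptions.

```python
def is_k_dollo(parent, state, k):
    """Check whether one mutation's history fits the k-Dollo model on a tree.

    parent: dict mapping each node to its parent (the root maps to None).
    state:  dict mapping each node to 1 (mutation present) or 0 (absent).
    k:      maximum number of loss events allowed.
    """
    gains = 0
    losses = 0
    for node, par in parent.items():
        if par is None:
            # The root is assumed to start unmutated; a mutated root counts as a gain.
            gains += state[node]
        elif state[par] == 0 and state[node] == 1:
            gains += 1
        elif state[par] == 1 and state[node] == 0:
            losses += 1
    return gains <= 1 and losses <= k


tree = {"root": None, "a": "root", "b": "a", "c": "a"}
one_loss = {"root": 0, "a": 1, "b": 0, "c": 1}   # gained at a, lost at b
print(is_k_dollo(tree, one_loss, k=0))
print(is_k_dollo(tree, one_loss, k=1))
```

Setting `k=0` recovers the infinite sites (perfect phylogeny) constraint, which is why the example history is rejected for `k=0` and accepted for `k=1`.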
69
Bonizzoni P, Ciccolella S, Vedova GD, Soto M. Does Relaxing the Infinite Sites Assumption Give Better Tumor Phylogenies? An ILP-Based Comparative Approach. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1410-1423. [PMID: 31603766 DOI: 10.1109/tcbb.2018.2865729] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 06/10/2023]
Abstract
Most of the evolutionary history reconstruction approaches are based on the infinite sites assumption, which states that mutations appear once in the evolutionary history. The Perfect Phylogeny model is the result of the infinite sites assumption and has been widely used to infer cancer evolution. Nonetheless, recent results show that recurrent and back mutations are present in the evolutionary history of tumors, hence the Perfect Phylogeny model might be too restrictive. We propose an approach that allows losing previously acquired mutations and multiple acquisitions of a character. Moreover, we provide an ILP formulation for the evolutionary tree reconstruction problem. Our formulation allows us to tackle both the Incomplete Directed Phylogeny problem and the Clonal Reconstruction problem when general evolutionary models are considered. The latter problem is fundamental in cancer genomics, the goal is to study the evolutionary history of a tumor considering as input data the fraction of cells having a certain mutation in a set of cancer samples. For the Clonal Reconstruction problem, an experimental analysis shows the advantage of allowing mutation losses. Namely, by analyzing real and simulated datasets, our ILP approach provides a better interpretation of the evolutionary history than a Perfect Phylogeny. The software is at https://github.com/AlgoLab/gppf.
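The link between the infinite sites assumption and the Perfect Phylogeny model described above can be checked directly on a binary mutation matrix. For binary characters with an unmutated ancestral state, a perfect phylogeny exists exactly when no pair of mutations shows all three patterns (1,1), (1,0) and (0,1) across the rows. The sketch below applies that classical pairwise test; it is not the authors' ILP formulation, and the function name and matrix layout are assumptions.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return the pairs of mutation columns that rule out a perfect phylogeny.

    matrix[i][j] is 1 if cell (or clone) i carries mutation j, else 0.
    The ancestral state of every mutation is assumed to be 0.
    """
    n_mutations = len(matrix[0]) if matrix else 0
    conflicts = []
    for a, b in combinations(range(n_mutations), 2):
        patterns = {(row[a], row[b]) for row in matrix}
        # Seeing both mutations together and each one alone means one of them
        # must have been gained twice or lost, which violates infinite sites.
        if {(1, 1), (1, 0), (0, 1)} <= patterns:
            conflicts.append((a, b))
    return conflicts


compatible = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
incompatible = [[1, 1], [1, 0], [0, 1]]
print(conflicting_pairs(compatible))
print(conflicting_pairs(incompatible))
```

An empty result means the data admit a perfect phylogeny; any reported pair marks mutations for which a relaxed model, allowing losses or recurrent gains as the abstract proposes, would be needed to explain the data without errors.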
70
Malikic S, Jahn K, Kuipers J, Sahinalp SC, Beerenwinkel N. Integrative inference of subclonal tumour evolution from single-cell and bulk sequencing data. Nat Commun 2019; 10:2750. [PMID: 31227714 PMCID: PMC6588593 DOI: 10.1038/s41467-019-10737-5] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 07/04/2018] [Accepted: 05/30/2019] [Indexed: 02/07/2023] Open
Abstract
Understanding the clonal architecture and evolutionary history of a tumour poses one of the key challenges to overcome treatment failure due to resistant cell populations. Previously, studies on subclonal tumour evolution have been primarily based on bulk sequencing and in some recent cases on single-cell sequencing data. Either data type alone has shortcomings with regard to this task, but methods integrating both data types have been lacking. Here, we present B-SCITE, the first computational approach that infers tumour phylogenies from combined single-cell and bulk sequencing data. Using a comprehensive set of simulated data, we show that B-SCITE systematically outperforms existing methods with respect to tree reconstruction accuracy and subclone identification. B-SCITE provides high-fidelity reconstructions even with a modest number of single cells and in cases where bulk allele frequencies are affected by copy number changes. On real tumour data, B-SCITE generated mutation histories show high concordance with expert generated trees. Intra-tumour heterogeneity provides important information about subclonal tumour evolution. Here, the authors develop B-SCITE, a computational method for inferring tumour phylogenies from combined single-cell and bulk sequencing data.
Affiliation(s)
- Salem Malikic
- School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, BC, Canada; Vancouver Prostate Centre, Vancouver, V6H 3Z6, BC, Canada
- Katharina Jahn
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- S Cenk Sahinalp
- Department of Computer Science, Indiana University, Bloomington, 47405, IN, USA.
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland.
71
Little P, Lin DY, Sun W. Associating somatic mutations to clinical outcomes: a pan-cancer study of survival time. Genome Med 2019; 11:37. [PMID: 31138328 PMCID: PMC6540540 DOI: 10.1186/s13073-019-0643-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/21/2018] [Accepted: 04/30/2019] [Indexed: 02/07/2023] Open
Abstract
We developed subclone multiplicity allocation and somatic heterogeneity (SMASH), a new statistical method for intra-tumor heterogeneity (ITH) inference. SMASH is tailored to the purpose of large-scale association studies with one tumor sample per patient. In a pan-cancer study of 14 cancer types, we studied the associations between survival time and ITH quantified by SMASH, together with other features of somatic mutations. Our results show that ITH is associated with survival time in several cancer types and its effect can be modified by other covariates, such as mutation burden. SMASH is available at https://github.com/Sun-lab/SMASH.
Affiliation(s)
- Paul Little
- Department of Biostatistics, University of North Carolina Chapel Hill, Dauer Drive, Chapel Hill, 27599, NC, USA
- Dan-Yu Lin
- Department of Biostatistics, University of North Carolina Chapel Hill, Dauer Drive, Chapel Hill, 27599, NC, USA.
- Wei Sun
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, 98109, WA, USA; Department of Biostatistics, University of North Carolina Chapel Hill, Dauer Drive, Chapel Hill, 27599, NC, USA; Department of Biostatistics, University of Washington, NE Pacific St, Seattle, 98195, WA, USA.
72
Single-cell mutation identification via phylogenetic inference. Nat Commun 2018; 9:5144. [PMID: 30514897 PMCID: PMC6279798 DOI: 10.1038/s41467-018-07627-7] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Received: 05/07/2018] [Accepted: 11/15/2018] [Indexed: 12/25/2022] Open
Abstract
Reconstructing the evolution of tumors is a key aspect towards the identification of appropriate cancer therapies. The task is challenging because tumors evolve as heterogeneous cell populations. Single-cell sequencing holds the promise of resolving the heterogeneity of tumors; however, it has its own challenges including elevated error rates, allelic drop-out, and uneven coverage. Here, we develop a new approach to mutation detection in individual tumor cells by leveraging the evolutionary relationship among cells. Our method, called SCIΦ, jointly calls mutations in individual cells and estimates the tumor phylogeny among these cells. Employing a Markov Chain Monte Carlo scheme enables us to reliably call mutations in each single cell even in experiments with high drop-out rates and missing data. We show that SCIΦ outperforms existing methods on simulated data and applied it to different real-world datasets, namely a whole exome breast cancer as well as a panel acute lymphoblastic leukemia dataset. Cross-cell heterogeneity of genotypes can be revealed by analyzing single-cell sequencing data. Here the authors develop a tool for single-cell variant calling via phylogenetic inference, and use it to analyze cancer genomics datasets.
73
Abstract
Cellular heterogeneity within and across tumors has been a major obstacle in understanding and treating cancer, and this complex heterogeneity is masked if bulk tumor tissues are used for analysis. Rapidly developing single-cell sequencing technologies, which include methods for single-cell genome, epigenome, transcriptome, and multi-omics sequencing, have been applied to cancer research and have led to exciting new findings in the fields of cancer evolution, metastasis, resistance to therapy, and the tumor microenvironment. In this review, we discuss recent advances and limitations of these new technologies and their potential applications in cancer studies.
Affiliation(s)
- Xianwen Ren
- Beijing Advanced Innovation Centre for Genomics, Peking-Tsinghua Centre for Life Sciences, Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, 100871, China.
- Boxi Kang
- Beijing Advanced Innovation Centre for Genomics, Peking-Tsinghua Centre for Life Sciences, Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, 100871, China
- Zemin Zhang
- Beijing Advanced Innovation Centre for Genomics, Peking-Tsinghua Centre for Life Sciences, Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, 100871, China.
74
75
Genetic alterations driving metastatic colony formation are acquired outside of the primary tumour in melanoma. Nat Commun 2018; 9:595. [PMID: 29426936 PMCID: PMC5807512 DOI: 10.1038/s41467-017-02674-y] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 10/04/2017] [Accepted: 12/19/2017] [Indexed: 02/07/2023] Open
Abstract
Mouse models indicate that metastatic dissemination occurs extremely early; however, the timing in human cancers is unknown. We therefore determined the time point of metastatic seeding relative to tumour thickness and genomic alterations in melanoma. Here, we find that lymphatic dissemination occurs shortly after dermal invasion of the primary lesion at a median thickness of ~0.5 mm and that typical driver changes, including BRAF mutation and gained or lost regions comprising genes like MET or CDKN2A, are acquired within the lymph node at the time of colony formation. These changes define a colonisation signature that was linked to xenograft formation in immunodeficient mice and death from melanoma. Thus, melanoma cells leave primary tumours early and evolve at different sites in parallel. We propose a model of metastatic melanoma dormancy, evolution and colonisation that will inform direct monitoring of adjuvant therapy targets.