1
Bristy NA, Fu X, Schwartz R. Sc-TUSV-Ext: Single-Cell Clonal Lineage Inference from Single Nucleotide Variants, Copy Number Alterations, and Structural Variants. J Comput Biol 2025. [PMID: 40049606 DOI: 10.1089/cmb.2024.0613] [Indexed: 03/12/2025]
Abstract
Clonal lineage inference ("tumor phylogenetics") has become a crucial tool for making sense of somatic evolution processes that underlie cancer development and are increasingly recognized as part of normal tissue growth and aging. The inference of clonal lineage trees from single-cell sequence data offers particular promise for revealing processes of somatic evolution in unprecedented detail. However, most such tools are based on fairly restrictive models of the types of mutation events observed in somatic evolution and of the processes by which they develop. The present work seeks to enhance the power and versatility of tools for single-cell lineage reconstruction by making more comprehensive use of the range of molecular variant types by which tumors evolve. We introduce Sc-TUSV-ext, an integer linear programming-based tumor phylogeny reconstruction method that, for the first time, integrates single nucleotide variants, copy number alterations, and structural variations into clonal lineage reconstruction from single-cell DNA sequencing data. We show on synthetic data that accounting for these variant types collectively leads to improved accuracy in clonal lineage reconstruction relative to prior methods that consider only subsets of the variant types. We further demonstrate the effectiveness of the method on real data in resolving clonal evolution in the presence of multiple variant types, providing a path toward more comprehensive insight into how various forms of somatic mutability collectively shape tissue development.
Affiliation(s)
- Nishat Anjum Bristy
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Xuecong Fu
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Russell Schwartz
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
2
Story B, Velten L, Mönke G, Annan A, Steinmetz L. Mitoclone2: an R package for elucidating clonal structure in single-cell RNA-sequencing data using mitochondrial variants. NAR Genom Bioinform 2024; 6:lqae095. [PMID: 39131821 PMCID: PMC11310777 DOI: 10.1093/nargab/lqae095] [Received: 02/11/2024] [Revised: 06/14/2024] [Accepted: 07/23/2024] [Indexed: 08/13/2024]
Abstract
Clonal cell population dynamics play a critical role in both disease and development. Due to high mitochondrial mutation rates under both healthy and diseased conditions, mitochondrial genomic variability is a particularly useful resource in facilitating the identification of clonal population structure. Here we present mitoClone2, an all-inclusive R package allowing for the identification of clonal populations through integration of mitochondrial heteroplasmic variants discovered from single-cell sequencing experiments. Our package streamlines the investigation of this phenomenon by providing: built-in compatibility with commonly used tools for the delineation of clonal structure, the ability to directly use multiplexed BAM files as input, annotations for both human and mouse mitochondrial genomes, and helper functions for calling, filtering, clustering, and visualizing variants.
Affiliation(s)
- Benjamin Story
  - Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Lars Velten
  - Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Gregor Mönke
  - Developmental Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Ahrmad Annan
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Lars Steinmetz
  - Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Genome Technology Center, Palo Alto, CA, USA
3
Foltz SM, Li Y, Yao L, Terekhanova NV, Weerasinghe A, Gao Q, Dong G, Schindler M, Cao S, Sun H, Jayasinghe RG, Fulton RS, Fronick CC, King J, Kohnen DR, Fiala MA, Chen K, DiPersio JF, Vij R, Ding L. Somatic mutation phasing and haplotype extension using linked-reads in multiple myeloma. bioRxiv 2024:2024.08.09.607342. [PMID: 39149342 PMCID: PMC11326269 DOI: 10.1101/2024.08.09.607342] [Indexed: 08/17/2024]
Abstract
Somatic mutation phasing informs our understanding of cancer-related events, like driver mutations. We generated linked-read whole genome sequencing data for 23 samples across disease stages from 14 multiple myeloma (MM) patients and systematically assigned somatic mutations to haplotypes using linked-reads. Here, we report the reconstructed cancer haplotypes and phase blocks from several MM samples and show how phase block length can be extended by integrating samples from the same individual. We also uncover phasing information in genes frequently mutated in MM, including DIS3, HIST1H1E, KRAS, NRAS, and TP53, phasing 79.4% of 20,705 high-confidence somatic mutations. In some cases, this enabled us to interpret clonal evolution models at higher resolution using pairs of phased somatic mutations. For example, our analysis of one patient suggested that two NRAS hotspot mutations occurred on the same haplotype but were independent events in different subclones. Given sufficient tumor purity and data quality, our framework illustrates how haplotype-aware analysis of somatic mutations in cancer can be beneficial for some cancer cases.
Affiliation(s)
- Steven M. Foltz
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Yize Li
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Lijun Yao
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Nadezhda V. Terekhanova
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Amila Weerasinghe
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Qingsong Gao
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Guanlan Dong
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Moses Schindler
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Song Cao
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Hua Sun
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Reyka G. Jayasinghe
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Robert S. Fulton
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Catrina C. Fronick
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Justin King
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Daniel R. Kohnen
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Mark A. Fiala
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Ken Chen
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- John F. DiPersio
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Ravi Vij
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Li Ding
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
  - Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - Department of Genetics, Washington University in St. Louis, St. Louis, MO, 63110, USA
4
Sashittal P, Chen V, Pasarkar A, Raphael BJ. Joint inference of cell lineage and mitochondrial evolution from single-cell sequencing data. Bioinformatics 2024; 40:i218-i227. [PMID: 38940122 PMCID: PMC11211840 DOI: 10.1093/bioinformatics/btae231] [Indexed: 06/29/2024]
Abstract
MOTIVATION Eukaryotic cells contain organelles called mitochondria that have their own genome. Most cells contain thousands of mitochondria which replicate, even in nondividing cells, by means of a relatively error-prone process resulting in somatic mutations in their genome. Because of the higher mutation rate compared to the nuclear genome, mitochondrial mutations have been used to track cellular lineage, particularly using single-cell sequencing that measures mitochondrial mutations in individual cells. However, existing methods to infer the cell lineage tree from mitochondrial mutations do not model "heteroplasmy," which is the presence of multiple mitochondrial clones with distinct sets of mutations in an individual cell. Single-cell sequencing data thus provide a mixture of the mitochondrial clones in individual cells, with the ancestral relationships between these clones described by a mitochondrial clone tree. While deconvolution of somatic mutations from a mixture of evolutionarily related genomes has been extensively studied in the context of bulk sequencing of cancer tumor samples, the problem of mitochondrial deconvolution has the additional constraint that the mitochondrial clone tree must be concordant with the cell lineage tree. RESULTS We formalize the problem of inferring a concordant pair of a mitochondrial clone tree and a cell lineage tree from single-cell sequencing data as the Nested Perfect Phylogeny Mixture (NPPM) problem. We derive a combinatorial characterization of the solutions to the NPPM problem, and formulate an algorithm, MERLIN, to solve this problem exactly using a mixed integer linear program. We show on simulated data that MERLIN outperforms existing methods that model neither mitochondrial heteroplasmy nor the concordance between the mitochondrial clone tree and the cell lineage tree. We use MERLIN to analyze single-cell whole-genome sequencing data of 5220 cells of a gastric cancer cell line and show that MERLIN infers a more biologically plausible cell lineage tree and mitochondrial clone tree compared to existing methods. AVAILABILITY AND IMPLEMENTATION https://github.com/raphael-group/MERLIN.
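The perfect phylogeny condition that the NPPM problem builds on can be illustrated in its simplest classical form. The sketch below is not MERLIN's algorithm: it assumes a noise-free binary cell-by-mutation matrix with a single clone per cell (no heteroplasmy) and applies the standard three-gamete test, which reports pairs of mutations that cannot both be placed on one perfect phylogeny rooted at the unmutated state. The function name `conflicting_pairs` and the toy matrices are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return (i, j) column pairs that violate the three-gamete condition.

    matrix: list of rows (cells), each a list of 0/1 values (mutations).
    Two mutations conflict if some cell has both, some cell has only the
    first, and some cell has only the second.
    """
    n_cols = len(matrix[0])
    conflicts = []
    for i, j in combinations(range(n_cols), 2):
        # Collect the joint patterns observed for this pair of mutations.
        patterns = {(row[i], row[j]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= patterns:
            conflicts.append((i, j))
    return conflicts


# Hypothetical toy data: mutation 1 is nested inside mutation 0,
# and mutation 2 is on a separate branch, so there is no conflict.
compatible = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
# Here the two mutations overlap without nesting, which is a conflict.
incompatible = [
    [1, 1],
    [1, 0],
    [0, 1],
]

print(conflicting_pairs(compatible))    # []
print(conflicting_pairs(incompatible))  # [(0, 1)]
```

An empty result means the matrix admits a rooted perfect phylogeny under these simplifying assumptions; handling mixtures of clones within a cell, as the abstract describes, requires the more general formulation.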
Affiliation(s)
- Palash Sashittal
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, United States
- Viola Chen
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, United States
- Amey Pasarkar
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, United States
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, United States
5
Lai J, Yang Y, Liu Y, Scharpf RB, Karchin R. Assessing the merits: an opinion on the effectiveness of simulation techniques in tumor subclonal reconstruction. Bioinform Adv 2024; 4:vbae094. [PMID: 38948008 PMCID: PMC11213631 DOI: 10.1093/bioadv/vbae094] [Received: 04/12/2024] [Revised: 05/28/2024] [Accepted: 06/15/2024] [Indexed: 07/02/2024]
Abstract
Summary Neoplastic tumors originate from a single cell, and their evolution can be traced through lineages characterized by mutations, copy number alterations, and structural variants. These lineages are reconstructed and mapped onto evolutionary trees with algorithmic approaches. However, without ground truth benchmark sets, the validity of an algorithm remains uncertain, limiting potential clinical applicability. With a growing number of algorithms available, there is an urgent need for standardized benchmark sets to evaluate their merits. Benchmark sets rely on in silico simulations of tumor sequence, but there are no accepted standards for simulation tools, presenting a major obstacle to progress in this field. Availability and implementation All analysis done in the paper was based on publicly available data from the publication of each accessed tool.
Affiliation(s)
- Jiaying Lai
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Yi Yang
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Yunzhou Liu
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Robert B Scharpf
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
- Rachel Karchin
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
6
Li L, Xie W, Zhan L, Wen S, Luo X, Xu S, Cai Y, Tang W, Wang Q, Li M, Xie Z, Deng L, Zhu H, Yu G. Resolving tumor evolution: a phylogenetic approach. J Natl Cancer Cent 2024; 4:97-106. [PMID: 39282584 PMCID: PMC11390690 DOI: 10.1016/j.jncc.2024.03.001] [Received: 11/05/2023] [Revised: 02/28/2024] [Accepted: 03/20/2024] [Indexed: 09/19/2024]
Abstract
The evolutionary dynamics of cancer, characterized by its profound heterogeneity, demand sophisticated tools for a holistic understanding. This review delves into tumor phylogenetics, an essential approach bridging evolutionary biology with oncology, offering unparalleled insights into cancer's evolutionary trajectory. We provide an overview of the workflow, encompassing study design, data acquisition, and phylogeny reconstruction. Notably, the integration of diverse data sets emerges as a transformative step, enhancing the depth and breadth of evolutionary insights. With this integrated perspective, tumor phylogenetics stands poised to redefine our understanding of cancer evolution and influence therapeutic strategies.
Affiliation(s)
- Lin Li
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Wenqin Xie
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Li Zhan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Shaodi Wen
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - Department of Oncology, The Affiliated Cancer Hospital of Nanjing Medical University & Jiangsu Cancer Hospital, Nanjing, China
- Xiao Luo
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Shuangbin Xu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - Division of Laboratory Medicine, Microbiome Center, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Yantong Cai
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - Dermatology Hospital, Southern Medical University, Guangzhou, China
- Wenli Tang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Qianwen Wang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Ming Li
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Zijing Xie
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Lin Deng
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Hongyuan Zhu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangchuang Yu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
7
Lai J, Liu Y, Scharpf RB, Karchin R. Evaluation of simulation methods for tumor subclonal reconstruction. arXiv 2024:arXiv:2402.09599v1. [PMID: 38410652 PMCID: PMC10896360] [Indexed: 02/28/2024]
Abstract
Most neoplastic tumors originate from a single cell, and their evolution can be genetically traced through lineages characterized by common alterations such as small somatic mutations (SSMs), copy number alterations (CNAs), structural variants (SVs), and aneuploidies. Due to the complexity of these alterations in most tumors and the errors introduced by sequencing protocols and calling algorithms, tumor subclonal reconstruction algorithms are necessary to recapitulate the DNA sequence composition and tumor evolution in silico. With a growing number of these algorithms available, there is a pressing need for consistent and comprehensive benchmarking, which relies on realistic tumor sequencing generated by simulation tools. Here, we examine the current simulation methods, identifying their strengths and weaknesses, and provide recommendations for their improvement. Our review also explores potential new directions for research in this area. This work aims to serve as a resource for understanding and enhancing tumor genomic simulations, contributing to the advancement of the field.
Affiliation(s)
- Jiaying Lai
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Yunzhou Liu
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Robert B. Scharpf
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD
- Rachel Karchin
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD
8
Kennedy VE, Roy R, Peretz CAC, Koh A, Tran E, Smith CC, Olshen AB. Single Nucleotide Polymorphism (SNP) and Antibody-based Cell Sorting (SNACS): A tool for demultiplexing single-cell DNA sequencing data. bioRxiv 2024:2024.02.07.579345. [PMID: 38370638 PMCID: PMC10871358 DOI: 10.1101/2024.02.07.579345] [Indexed: 02/20/2024]
Abstract
Motivation Recently, single-cell DNA sequencing (scDNA-seq) and multi-modal profiling with the addition of cell-surface antibodies (scDAb-seq) have provided key insights into cancer heterogeneity. Scaling these technologies across large patient cohorts, however, is cost- and time-prohibitive. Multiplexing, in which cells from different patients are pooled into a single experiment, offers a possible solution. While multiplexing methods exist for scRNA-seq, accurate demultiplexing in scDNA-seq remains an unmet need. Results Here, we introduce SNACS: Single-Nucleotide Polymorphism (SNP) and Antibody-based Cell Sorting. SNACS relies on a combination of patient-level cell-surface identifiers and natural variation in genetic polymorphisms to demultiplex scDNA-seq data. We demonstrated the performance of SNACS on a dataset consisting of multi-sample experiments from patients with leukemia, for which the ground truth was known from single-sample experiments on the same patients. With SNACS, accuracy ranged from 0.948 to 0.991, versus 0.552 to 0.934 for demultiplexing methods from the single-cell literature.
Affiliation(s)
- Kennedy VE
  - Division of Hematology and Oncology, Department of Medicine, University of California San Francisco, San Francisco, CA, USA, 94143
- Roy R
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA, 94143
- Peretz CAC
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA, 94143
  - Division of Hematology and Oncology, Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA, 94143
- Koh A
  - Division of Hematology and Oncology, Department of Medicine, University of California San Francisco, San Francisco, CA, USA, 94143
- Tran E
  - Division of Hematology and Oncology, Department of Medicine, University of California San Francisco, San Francisco, CA, USA, 94143
- Smith CC
  - Division of Hematology and Oncology, Department of Medicine, University of California San Francisco, San Francisco, CA, USA, 94143
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA, 94143
- Olshen AB
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA, 94143
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA, 94143
9
López Sánchez A, Lafond M. Predicting horizontal gene transfers with perfect transfer networks. Algorithms Mol Biol 2024; 19:6. [PMID: 38321476 PMCID: PMC10848447 DOI: 10.1186/s13015-023-00242-2] [Received: 03/31/2023] [Accepted: 10/25/2023] [Indexed: 02/08/2024]
Abstract
BACKGROUND Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well-known that sequences have difficulty identifying ancient transfers since mutations have enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequences is that homologous genes can have low DNA similarity, but still have retained enough important common motifs that allow them to have common character traits, for instance the same functional or expression profile. A phylogeny that has two separate clades that acquired the same character independently might indicate the presence of a transfer even in the absence of sequence similarity. OUR CONTRIBUTIONS We introduce perfect transfer networks, which are phylogenetic networks that can explain the character diversity of a set of taxa under the assumption that characters have unique births, and that once a character is gained it is rarely lost. Examples of such traits include transposable elements, biochemical markers and emergence of organelles, just to name a few. We study the differences between our model and two similar models: perfect phylogenetic networks and ancestral recombination networks. Our goals are to initiate a study on the structural and algorithmic properties of perfect transfer networks. We then show that in polynomial time, one can decide whether a given network is a valid explanation for a set of taxa, and show how, for a given tree, one can add transfer edges to it so that it explains a set of taxa. We finally provide lower and upper bounds on the number of transfers required to explain a set of taxa, in the worst case.
Affiliation(s)
- Manuel Lafond
  - Department of Computer Science, Université de Sherbrooke, Sherbrooke, Canada
10
Rossi N, Gigante N, Vitacolonna N, Piazza C. Inferring Markov Chains to Describe Convergent Tumor Evolution With CIMICE. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:106-119. [PMID: 38015671 DOI: 10.1109/tcbb.2023.3337258] [Indexed: 11/30/2023]
Abstract
The field of tumor phylogenetics focuses on studying the differences within cancer cell populations. The scientific community has devoted considerable effort to building cancer progression models that help explain the heterogeneity of such diseases. These models depend heavily on the kind of data used to construct them; as experimental technologies evolve, it is therefore important to exploit their peculiarities. In this work we describe a cancer progression model based on single-cell DNA sequencing data. When constructing the model, we focus on tailoring the formalism to the specifics of the data. We operate by defining a minimal set of assumptions needed to reconstruct a flexible DAG-structured model, capable of identifying progression beyond the limitation of the infinite sites assumption. Our proposal is conservative in the sense that we aim to neither discard nor infer knowledge which is not represented in the data. We provide simulations and analytical results to show the features of our model, test it on real data, and show how it can be integrated with other approaches to cope with input noise. Moreover, our framework can be exploited to produce simulated data that follows our theoretical assumptions. Finally, we provide an open source R implementation of our approach, called CIMICE, that is publicly available on Bioconductor.
11
Bristy NA, Fu X, Schwartz R. Sc-TUSV-ext: Single-cell clonal lineage inference from single nucleotide variants (SNV), copy number alterations (CNA) and structural variants (SV). bioRxiv 2023:2023.12.07.570724. [PMID: 38106049 PMCID: PMC10723466 DOI: 10.1101/2023.12.07.570724] [Indexed: 12/19/2023]
Abstract
Clonal lineage inference ("tumor phylogenetics") has become a crucial tool for making sense of somatic evolution processes that underlie cancer development and are increasingly recognized as part of normal tissue growth and aging. The inference of clonal lineage trees from single cell sequence data offers particular promise for revealing processes of somatic evolution in unprecedented detail. However, most such tools are based on fairly restrictive models of the types of mutation events observed in somatic evolution and of the processes by which they develop. The present work seeks to enhance the power and versatility of tools for single-cell lineage reconstruction by making more comprehensive use of the range of molecular variant types by which tumors evolve. We introduce Sc-TUSV-ext, an integer linear programming (ILP) based tumor phylogeny reconstruction method that, for the first time, integrates single nucleotide variants (SNV), copy number alterations (CNA) and structural variations (SV) into clonal lineage reconstruction from single-cell DNA sequencing data. We show on synthetic data that accounting for these variant types collectively leads to improved accuracy in clonal lineage reconstruction relative to prior methods that consider only subsets of the variant types. We further demonstrate the effectiveness on real data in resolving clonal evolution in the presence of multiple variant types, providing a path towards more comprehensive insight into how various forms of somatic mutability collectively shape tissue development.
12
Han Y, Molloy EK. Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model. Algorithms Mol Biol 2023; 18:19. [PMID: 38041123 PMCID: PMC10691101 DOI: 10.1186/s13015-023-00248-w] [Received: 10/31/2023] [Accepted: 11/19/2023] [Indexed: 12/03/2023]
Abstract
Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are then implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics.
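The statement that quartets are implied by mutations present in two cells and absent from two cells can be made concrete with a small counting sketch. It assumes a cell-by-mutation matrix with entries 1 (present), 0 (absent), and `None` (missing); the names `quartet` and `implied_quartets` and the toy data are hypothetical. The code only tallies the quartets implied by the observed data and does not perform the tree search over quartet scores that the abstract describes.

```python
from collections import Counter
from itertools import combinations


def quartet(a, b, c, d):
    """Canonical, order-independent representation of the quartet ab|cd."""
    return frozenset([frozenset([a, b]), frozenset([c, d])])


def implied_quartets(matrix, cells):
    """Count quartets implied by each mutation (column) of the matrix.

    A mutation present in cells a, b and absent from cells c, d implies
    the quartet ab|cd. Cells with a missing value (None) are skipped.
    """
    counts = Counter()
    for j in range(len(matrix[0])):
        present = [c for c, row in zip(cells, matrix) if row[j] == 1]
        absent = [c for c, row in zip(cells, matrix) if row[j] == 0]
        for p in combinations(present, 2):
            for a in combinations(absent, 2):
                counts[quartet(*p, *a)] += 1
    return counts


# Hypothetical toy data: four cells, four mutations, one missing entry.
cells = ["A", "B", "C", "D"]
matrix = [
    [1, 0, 1, 1],     # A
    [1, 0, 1, 0],     # B
    [0, 1, 0, 1],     # C
    [0, None, 0, 0],  # D
]

counts = implied_quartets(matrix, cells)
print(counts[quartet("A", "B", "C", "D")])  # 2 (mutations 0 and 2)
print(counts[quartet("A", "C", "B", "D")])  # 1 (mutation 3)
```

In this toy example the most frequently implied quartet is AB|CD, while the single AC|BD vote could stem from an error; picking the best-supported topology per four cells mirrors, in a very reduced form, the idea of favoring the most probable quartet.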
Affiliation(s)
- Yunheng Han
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Erin K Molloy
- Department of Computer Science, University of Maryland, College Park, MD, USA.
- University of Maryland Institute for Advanced Computer Studies, College Park, MD, USA.

13
Sashittal P, Zhang H, Iacobuzio-Donahue CA, Raphael BJ. ConDoR: tumor phylogeny inference with a copy-number constrained mutation loss model. Genome Biol 2023; 24:272. [PMID: 38037115 PMCID: PMC10688497 DOI: 10.1186/s13059-023-03106-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/27/2022] [Accepted: 11/07/2023] [Indexed: 12/02/2023] Open
Abstract
A tumor contains a diverse collection of somatic mutations that reflect its past evolutionary history and that range in scale from single nucleotide variants (SNVs) to large-scale copy-number aberrations (CNAs). However, no current single-cell DNA sequencing (scDNA-seq) technology produces accurate measurements of both SNVs and CNAs, complicating the inference of tumor phylogenies. We introduce a new evolutionary model, the constrained k-Dollo model, that uses SNVs as phylogenetic markers but constrains losses of SNVs according to clusters of cells. We derive an algorithm, ConDoR, that infers phylogenies from targeted scDNA-seq data using this model. We demonstrate the advantages of ConDoR on simulated and real scDNA-seq data.
Affiliation(s)
- Haochen Zhang
- Gerstner Sloan Kettering Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, NY, USA
- Christine A Iacobuzio-Donahue
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, NY, USA
- David M. Rubenstein Center for Pancreatic Cancer Research, Memorial Sloan Kettering Cancer Center, NY, USA
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, NY, USA

14
Weber LL, Zhang C, Ochoa I, El-Kebir M. Phertilizer: Growing a clonal tree from ultra-low coverage single-cell DNA sequencing of tumors. PLoS Comput Biol 2023; 19:e1011544. [PMID: 37819942 PMCID: PMC10593221 DOI: 10.1371/journal.pcbi.1011544] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/29/2023] [Revised: 10/23/2023] [Accepted: 09/26/2023] [Indexed: 10/13/2023] Open
Abstract
Emerging ultra-low coverage single-cell DNA sequencing (scDNA-seq) technologies have enabled high resolution evolutionary studies of copy number aberrations (CNAs) within tumors. While these sequencing technologies are well suited for identifying CNAs due to the uniformity of sequencing coverage, the sparsity of coverage poses challenges for the study of single-nucleotide variants (SNVs). In order to maximize the utility of increasingly available ultra-low coverage scDNA-seq data and obtain a comprehensive understanding of tumor evolution, it is important to also analyze the evolution of SNVs from the same set of tumor cells. We present Phertilizer, a method to infer a clonal tree from ultra-low coverage scDNA-seq data of a tumor. Based on a probabilistic model, our method recursively partitions the data by identifying key evolutionary events in the history of the tumor. We demonstrate the performance of Phertilizer on simulated data as well as on two real datasets, finding that Phertilizer effectively utilizes the copy-number signal inherent in the data to more accurately uncover clonal structure and genotypes compared to previous methods.
Affiliation(s)
- Leah L. Weber
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana-Champaign, Illinois, United States of America
- Chuanyi Zhang
- Department of Electrical & Computer Engineering, University of Illinois Urbana-Champaign, Urbana-Champaign, Illinois, United States of America
- Idoia Ochoa
- Department of Electrical & Computer Engineering, University of Illinois Urbana-Champaign, Urbana-Champaign, Illinois, United States of America
- Department of Electrical and Electronics Engineering, University of Navarre, Donostia, Spain
- Mohammed El-Kebir
- Department of Electrical and Electronics Engineering, University of Navarre, Donostia, Spain
- Cancer Center at Illinois, University of Illinois Urbana-Champaign, Urbana-Champaign, Illinois, United States of America

15
Liu Y, Li XC, Rashidi Mehrabadi F, Schäffer AA, Pratt D, Crawford DR, Malikić S, Molloy EK, Gopalan V, Mount SM, Ruppin E, Aldape KD, Sahinalp SC. Single-cell methylation sequencing data reveal succinct metastatic migration histories and tumor progression models. Genome Res 2023; 33:1089-1100. [PMID: 37316351 PMCID: PMC10538489 DOI: 10.1101/gr.277608.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 06/06/2023] [Indexed: 06/16/2023]
Abstract
Recent studies exploring the impact of methylation in tumor evolution suggest that although the methylation status of many of the CpG sites are preserved across distinct lineages, others are altered as the cancer progresses. Because changes in methylation status of a CpG site may be retained in mitosis, they could be used to infer the progression history of a tumor via single-cell lineage tree reconstruction. In this work, we introduce the first principled distance-based computational method, Sgootr, for inferring a tumor's single-cell methylation lineage tree and for jointly identifying lineage-informative CpG sites that harbor changes in methylation status that are retained along the lineage. We apply Sgootr on single-cell bisulfite-treated whole-genome sequencing data of multiregionally sampled tumor cells from nine metastatic colorectal cancer patients, as well as multiregionally sampled single-cell reduced-representation bisulfite sequencing data from a glioblastoma patient. We show that the tumor lineages constructed reveal a simple model underlying tumor progression and metastatic seeding. A comparison of Sgootr against alternative approaches shows that Sgootr can construct lineage trees with fewer migration events and with more in concordance with the sequential-progression model of tumor evolution, with a running time a fraction of that used in prior studies. Lineage-informative CpG sites identified by Sgootr are in inter-CpG island (CGI) regions, as opposed to intra-CGIs, which have been the main regions of interest in genomic methylation-related analyses.
Affiliation(s)
- Yuelin Liu
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
- Xuan Cindy Li
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Program in Computational Biology, Bioinformatics, and Genomics, University of Maryland, College Park, Maryland 20742, USA
- Farid Rashidi Mehrabadi
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Department of Computer Science, Indiana University, Bloomington, Indiana 47408, USA
- Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Alejandro A Schäffer
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Drew Pratt
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- David R Crawford
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Program in Computational Biology, Bioinformatics, and Genomics, University of Maryland, College Park, Maryland 20742, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
- Salem Malikić
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Erin K Molloy
- Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
- Vishaka Gopalan
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Stephen M Mount
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
- Eytan Ruppin
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Kenneth D Aldape
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- S Cenk Sahinalp
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA

16
Guang Z, Smith-Erb M, Oesper L. A weighted distance-based approach for deriving consensus tumor evolutionary trees. Bioinformatics 2023; 39:i204-i212. [PMID: 37387177 DOI: 10.1093/bioinformatics/btad230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION The acquisition of somatic mutations by a tumor can be modeled by a type of evolutionary tree. However, it is impossible to observe this tree directly. Instead, numerous algorithms have been developed to infer such a tree from different types of sequencing data. But such methods can produce conflicting trees for the same patient, making it desirable to have approaches that can combine several such tumor trees into a consensus or summary tree. We introduce The Weighted m-Tumor Tree Consensus Problem (W-m-TTCP) to find a consensus tree among multiple plausible tumor evolutionary histories, each assigned a confidence weight, given a specific distance measure between tumor trees. We present an algorithm called TuELiP that is based on integer linear programming which solves the W-m-TTCP, and unlike other existing consensus methods, allows the input trees to be weighted differently. RESULTS On simulated data we show that TuELiP outperforms two existing methods at correctly identifying the true underlying tree used to create the simulations. We also show that the incorporation of weights can lead to more accurate tree inference. On a Triple-Negative Breast Cancer dataset, we show that including confidence weights can have important impacts on the consensus tree identified. AVAILABILITY An implementation of TuELiP and simulated datasets are available at https://bitbucket.org/oesperlab/consensus-ilp/src/main/.
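To make the weighted-consensus objective in this abstract concrete, here is a small Python sketch under stated assumptions; it is not TuELiP, which the abstract describes as an integer linear program. Trees are encoded as sets of (parent, child) edges, the distance is the size of the symmetric difference of those edge sets (one possible choice of distance measure, assumed here), and the search is restricted to the input trees themselves rather than all possible trees. All names and data are hypothetical.

```python
def parent_child_distance(tree_a, tree_b):
    """Number of (parent, child) edges found in exactly one of the two trees."""
    return len(tree_a ^ tree_b)


def weighted_cost(candidate, trees, weights):
    """Weighted sum of distances from the candidate to every input tree."""
    return sum(w * parent_child_distance(candidate, t) for t, w in zip(trees, weights))


def best_input_tree(trees, weights):
    """Brute-force stand-in for a consensus solver: pick the cheapest input tree."""
    return min(trees, key=lambda c: weighted_cost(c, trees, weights))


# Three toy trees over root r and mutations a, b.
chain_ab = frozenset({("r", "a"), ("a", "b")})   # r -> a -> b
star = frozenset({("r", "a"), ("r", "b")})       # r -> a, r -> b
chain_ba = frozenset({("r", "b"), ("b", "a")})   # r -> b -> a
trees = [chain_ab, star, chain_ba]

print(best_input_tree(trees, [1, 1, 1]) == star)      # True
print(best_input_tree(trees, [5, 1, 1]) == chain_ab)  # True
```

With equal weights the star tree sits closest to the other two; raising the confidence weight on the first tree pulls the choice toward it, which illustrates why the weights can change the consensus.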
Affiliation(s)
- Ziyun Guang
- Department of Computer Science, Carleton College, Northfield, MN 55057, USA
- Matthew Smith-Erb
- Department of Computer Science, Carleton College, Northfield, MN 55057, USA
- Layla Oesper
- Department of Computer Science, Carleton College, Northfield, MN 55057, USA

17
Beneyto-Calabuig S, Merbach AK, Kniffka JA, Antes M, Szu-Tu C, Rohde C, Waclawiczek A, Stelmach P, Gräßle S, Pervan P, Janssen M, Landry JJM, Benes V, Jauch A, Brough M, Bauer M, Besenbeck B, Felden J, Bäumer S, Hundemer M, Sauer T, Pabst C, Wickenhauser C, Angenendt L, Schliemann C, Trumpp A, Haas S, Scherer M, Raffel S, Müller-Tidow C, Velten L. Clonally resolved single-cell multi-omics identifies routes of cellular differentiation in acute myeloid leukemia. Cell Stem Cell 2023; 30:706-721.e8. [PMID: 37098346 DOI: 10.1016/j.stem.2023.04.001] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 07/29/2022] [Revised: 02/05/2023] [Accepted: 03/30/2023] [Indexed: 04/27/2023]
Abstract
Inter-patient variability and the similarity of healthy and leukemic stem cells (LSCs) have impeded the characterization of LSCs in acute myeloid leukemia (AML) and their differentiation landscape. Here, we introduce CloneTracer, a novel method that adds clonal resolution to single-cell RNA-seq datasets. Applied to samples from 19 AML patients, CloneTracer revealed routes of leukemic differentiation. Although residual healthy and preleukemic cells dominated the dormant stem cell compartment, active LSCs resembled their healthy counterpart and retained erythroid capacity. By contrast, downstream myeloid progenitors constituted a highly aberrant, disease-defining compartment: their gene expression and differentiation state affected both the chemotherapy response and leukemia's ability to differentiate into transcriptomically normal monocytes. Finally, we demonstrated the potential of CloneTracer to identify surface markers misregulated specifically in leukemic cells. Taken together, CloneTracer reveals a differentiation landscape that mimics its healthy counterpart and may determine biology and therapy response in AML.
Affiliation(s)
- Sergi Beneyto-Calabuig
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Anne Kathrin Merbach
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany; Molecular Medicine Partnership Unit, European Molecular Biology Laboratory (EMBL), University of Heidelberg, 69117 Heidelberg, Germany
- Jonas-Alexander Kniffka
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Magdalena Antes
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany; Heidelberg Institute for Stem Cell Technology and Experimental Medicine (HI-STEM gGmbH), 69120 Heidelberg, Germany; Division of Stem Cells and Cancer, Deutsches Krebsforschungszentrum (DKFZ) and DKFZ-ZMBH Alliance, 69120 Heidelberg, Germany
- Chelsea Szu-Tu
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Christian Rohde
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany; Molecular Medicine Partnership Unit, European Molecular Biology Laboratory (EMBL), University of Heidelberg, 69117 Heidelberg, Germany
- Alexander Waclawiczek
- Heidelberg Institute for Stem Cell Technology and Experimental Medicine (HI-STEM gGmbH), 69120 Heidelberg, Germany; Division of Stem Cells and Cancer, Deutsches Krebsforschungszentrum (DKFZ) and DKFZ-ZMBH Alliance, 69120 Heidelberg, Germany
- Patrick Stelmach
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany; Division of Stem Cells and Cancer, Deutsches Krebsforschungszentrum (DKFZ) and DKFZ-ZMBH Alliance, 69120 Heidelberg, Germany
- Sarah Gräßle
- Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany; Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 10115 Berlin, Germany
- Philip Pervan
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Maike Janssen
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany; Molecular Medicine Partnership Unit, European Molecular Biology Laboratory (EMBL), University of Heidelberg, 69117 Heidelberg, Germany
- Jonathan J M Landry
- Genomics Core Facility, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Vladimir Benes
- Genomics Core Facility, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Anna Jauch
- Institute of Human Genetics, University of Heidelberg, 69120 Heidelberg, Germany
- Michaela Brough
- Institute of Human Genetics, University of Heidelberg, 69120 Heidelberg, Germany
- Marcus Bauer
- Institute of Pathology, University Hospital Halle (Saale), Martin-Luther-University Halle-Wittenberg, 06112 Halle, Germany
- Birgit Besenbeck
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Julia Felden
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Sebastian Bäumer
- Department of Medicine A, Hematology and Oncology, University Hospital, Muenster, Germany
- Michael Hundemer
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Tim Sauer
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Caroline Pabst
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany; Molecular Medicine Partnership Unit, European Molecular Biology Laboratory (EMBL), University of Heidelberg, 69117 Heidelberg, Germany
- Claudia Wickenhauser
- Institute of Pathology, University Hospital Halle (Saale), Martin-Luther-University Halle-Wittenberg, 06112 Halle, Germany
- Linus Angenendt
- Department of Medicine A, Hematology and Oncology, University Hospital, Muenster, Germany; Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Christoph Schliemann
- Department of Medicine A, Hematology and Oncology, University Hospital, Muenster, Germany
- Andreas Trumpp
- Heidelberg Institute for Stem Cell Technology and Experimental Medicine (HI-STEM gGmbH), 69120 Heidelberg, Germany; Division of Stem Cells and Cancer, Deutsches Krebsforschungszentrum (DKFZ) and DKFZ-ZMBH Alliance, 69120 Heidelberg, Germany
- Simon Haas
- Heidelberg Institute for Stem Cell Technology and Experimental Medicine (HI-STEM gGmbH), 69120 Heidelberg, Germany; Division of Stem Cells and Cancer, Deutsches Krebsforschungszentrum (DKFZ) and DKFZ-ZMBH Alliance, 69120 Heidelberg, Germany; Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany; Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 10115 Berlin, Germany
- Michael Scherer
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Simon Raffel
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Carsten Müller-Tidow
- Department of Medicine, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, 69120 Heidelberg, Germany; Molecular Medicine Partnership Unit, European Molecular Biology Laboratory (EMBL), University of Heidelberg, 69117 Heidelberg, Germany.
- Lars Velten
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain.

18
Sashittal P, Zhang H, Iacobuzio-Donahue CA, Raphael BJ. ConDoR: Tumor phylogeny inference with a copy-number constrained mutation loss model. bioRxiv 2023:2023.01.05.522408. [PMID: 36711528 PMCID: PMC9882003 DOI: 10.1101/2023.01.05.522408] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/09/2023]
Abstract
Tumors consist of subpopulations of cells that harbor distinct collections of somatic mutations. These mutations range in scale from single nucleotide variants (SNVs) to large-scale copy-number aberrations (CNAs). While many approaches infer tumor phylogenies using SNVs as phylogenetic markers, CNAs that overlap SNVs may lead to erroneous phylogenetic inference. Specifically, an SNV may be lost in a cell due to a deletion of the genomic segment containing the SNV. Unfortunately, no current single-cell DNA sequencing (scDNA-seq) technology produces accurate measurements of both SNVs and CNAs. For instance, recent targeted scDNA-seq technologies, such as Mission Bio Tapestri, measure SNVs with high fidelity in individual cells, but yield much less reliable measurements of CNAs. We introduce a new evolutionary model, the constrained k-Dollo model, that uses SNVs as phylogenetic markers and partial information about CNAs in the form of clustering of cells with similar copy-number profiles. This copy-number clustering constrains where loss of SNVs can occur in the phylogeny. We develop ConDoR (Constrained Dollo Reconstruction), an algorithm to infer tumor phylogenies from targeted scDNA-seq data using the constrained k-Dollo model. We show that ConDoR outperforms existing methods on simulated data. We use ConDoR to analyze a new multi-region targeted scDNA-seq dataset of 2153 cells from a pancreatic ductal adenocarcinoma (PDAC) tumor and produce a more plausible phylogeny compared to existing methods that conforms to histological results for the tumor from a previous study. We also analyze a metastatic colorectal cancer dataset, deriving a more parsimonious phylogeny than previously published analyses and with a simpler monoclonal origin of metastasis compared to the original study. Code availability Software is available at https://github.com/raphael-group/constrained-Dollo.
Affiliation(s)
- Haochen Zhang
- Gerstner Sloan Kettering Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, NY, USA
- Christine A. Iacobuzio-Donahue
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, NY, USA
- David M. Rubenstein Center for Pancreatic Cancer Research, Memorial Sloan Kettering Cancer Center, NY, USA
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, NY, USA

19
Pellegrina L, Vandin F. Discovering significant evolutionary trajectories in cancer phylogenies. Bioinformatics 2022; 38:ii49-ii55. [PMID: 36124798 DOI: 10.1093/bioinformatics/btac467] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Tumors are the result of a somatic evolutionary process leading to substantial intra-tumor heterogeneity. Single-cell and multi-region sequencing enable the detailed characterization of the clonal architecture of tumors and have highlighted its extensive diversity across tumors. While several computational methods have been developed to characterize the clonal composition and the evolutionary history of tumors, the identification of significantly conserved evolutionary trajectories across tumors is still a major challenge. RESULTS We present a new algorithm, MAximal tumor treeS TRajectOries (MASTRO), to discover significantly conserved evolutionary trajectories in cancer. MASTRO discovers all conserved trajectories in a collection of phylogenetic trees describing the evolution of a cohort of tumors, allowing the discovery of conserved complex relations between alterations. MASTRO assesses the significance of the trajectories using a conditional statistical test that captures the coherence in the order in which alterations are observed in different tumors. We apply MASTRO to data from nonsmall-cell lung cancer bulk sequencing and to acute myeloid leukemia data from single-cell panel sequencing, and find significant evolutionary trajectories recapitulating and extending the results reported in the original studies. AVAILABILITY AND IMPLEMENTATION MASTRO is available at https://github.com/VandinLab/MASTRO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Leonardo Pellegrina
- Department of Information Engineering, University of Padova, Padova, 35129, Italy
- Fabio Vandin
- Department of Information Engineering, University of Padova, Padova, 35129, Italy

20
Kızılkale C, Rashidi Mehrabadi F, Sadeqi Azer E, Pérez-Guijarro E, Marie KL, Lee MP, Day CP, Merlino G, Ergün F, Buluç A, Sahinalp SC, Malikić S. Fast intratumor heterogeneity inference from single-cell sequencing data. Nat Comput Sci 2022; 2:577-583. [PMID: 38177468 PMCID: PMC10765963 DOI: 10.1038/s43588-022-00298-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/09/2021] [Accepted: 07/14/2022] [Indexed: 01/06/2024]
Abstract
We introduce HUNTRESS, a computational method for mutational intratumor heterogeneity inference from noisy genotype matrices derived from single-cell sequencing data, the running time of which is linear with the number of cells and quadratic with the number of mutations. We prove that, under reasonable conditions, HUNTRESS computes the true progression history of a tumor with high probability. On simulated and real tumor sequencing data, HUNTRESS is demonstrated to be faster than available alternatives with comparable or better accuracy. Additionally, the progression histories of tumors inferred by HUNTRESS on real single-cell sequencing datasets agree with the best known evolution scenarios for the associated tumors.
Affiliation(s)
- Can Kızılkale
- Department of Electrical Engineering and Computer Sciences UC Berkeley, Berkeley, CA, USA
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Farid Rashidi Mehrabadi
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Department of Computer Science, Indiana University, Bloomington, IN, USA
- Erfan Sadeqi Azer
- Department of Computer Science, Indiana University, Bloomington, IN, USA
- Google LLC, Sunnyvale, CA, USA
- Eva Pérez-Guijarro
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Kerrie L Marie
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Maxwell P Lee
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Chi-Ping Day
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Glenn Merlino
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Funda Ergün
- Department of Computer Science, Indiana University, Bloomington, IN, USA
- Aydın Buluç
- Department of Electrical Engineering and Computer Sciences UC Berkeley, Berkeley, CA, USA
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- S Cenk Sahinalp
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
- Salem Malikić
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.

21
Foroughmand-Araabi MH, Goliaei S, McHardy AC. Scelestial: Fast and accurate single-cell lineage tree inference based on a Steiner tree approximation algorithm. PLoS Comput Biol 2022; 18:e1009100. [PMID: 35951662 PMCID: PMC9426887 DOI: 10.1371/journal.pcbi.1009100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2021] [Revised: 08/30/2022] [Accepted: 06/23/2022] [Indexed: 11/19/2022] Open
Abstract
Single-cell genome sequencing provides a highly granular view of biological systems but is affected by high error rates, allelic amplification bias, and uneven genome coverage. This creates a need for data-specific computational methods, for purposes such as for cell lineage tree inference. The objective of cell lineage tree reconstruction is to infer the evolutionary process that generated a set of observed cell genomes. Lineage trees may enable a better understanding of tumor formation and growth, as well as of organ development for healthy body cells. We describe a method, Scelestial, for lineage tree reconstruction from single-cell data, which is based on an approximation algorithm for the Steiner tree problem and is a generalization of the neighbor-joining method. We adapt the algorithm to efficiently select a limited subset of potential sequences as internal nodes, in the presence of missing values, and to minimize cost by lineage tree-based missing value imputation. In a comparison against seven state-of-the-art single-cell lineage tree reconstruction algorithms—BitPhylogeny, OncoNEM, SCITE, SiFit, SASC, SCIPhI, and SiCloneFit—on simulated and real single-cell tumor samples, Scelestial performed best at reconstructing trees in terms of accuracy and run time. Scelestial has been implemented in C++. It is also available as an R package named RScelestial.
Affiliation(s)
- Mohammad-Hadi Foroughmand-Araabi
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Sama Goliaei
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Alice C. McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany

22
Wang R, Zheng X, Wang J, Wan S, Song F, Wong MH, Leung KS, Cheng L. Improving bulk RNA-seq classification by transferring gene signature from single cells in acute myeloid leukemia. Brief Bioinform 2022; 23:6523149. [PMID: 35136933 DOI: 10.1093/bib/bbac002] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/25/2021] [Revised: 12/22/2021] [Accepted: 01/04/2022] [Indexed: 12/13/2022] Open
Abstract
The advances in single-cell RNA sequencing (scRNA-seq) technologies enable the characterization of transcriptomic profiles at the cellular level and demonstrate great promise in bulk sample analysis, thereby offering opportunities to transfer gene signatures from scRNA-seq to bulk data. However, the gene expression signatures identified from single cells are typically inapplicable to bulk RNA-seq data due to the profiling differences of distinct sequencing technologies. Here, we propose single-cell pair-wise gene expression (scPAGE), a novel method to develop single-cell gene pair signatures (scGPSs) that benefit bulk RNA-seq classification by transferring knowledge across platforms. PAGE was adopted to tackle the challenge of profiling differences. We applied the method to acute myeloid leukemia (AML) and identified the scGPS from mouse scRNA-seq that allowed discriminating between AML and control cells. The scGPS was validated in bulk RNA-seq datasets and demonstrated better performance (average area under the curve [AUC] = 0.96) than the conventional gene expression strategies (average AUC ≤ 0.88), suggesting its potential in disclosing the molecular mechanism of AML. The scGPS also outperformed its bulk counterpart, which highlighted the benefit of gene signature transfer. Furthermore, we confirmed the utility of scPAGE in sepsis as an example of other disease scenarios. scPAGE leveraged the advantages of single-cell profiles to enhance the analysis of bulk samples, revealing great potential of transferring knowledge from single-cell to bulk transcriptome studies.
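To illustrate why pair-wise gene expression can help across platforms, here is a minimal Python sketch. It assumes that a gene-pair feature is the within-sample ordering of two genes, which is one common reading of the idea and not necessarily how scPAGE defines or selects its signatures. The gene names and expression values are hypothetical.

```python
from itertools import combinations


def gene_pair_features(expression):
    """Turn one sample's expression values into binary gene-pair features.

    expression: dict mapping gene name -> expression value for one sample.
    Returns a dict mapping (gene_a, gene_b) -> 1 if gene_a is expressed
    more highly than gene_b within this sample, else 0.
    """
    genes = sorted(expression)
    return {
        (a, b): int(expression[a] > expression[b])
        for a, b in combinations(genes, 2)
    }


# Hypothetical values: similar biology measured on two very different scales.
single_cell_like = {"GENE_A": 5.0, "GENE_B": 1.0, "GENE_C": 3.0}
bulk_like = {"GENE_A": 5000.0, "GENE_B": 900.0, "GENE_C": 2500.0}

sc_features = gene_pair_features(single_cell_like)
bulk_features = gene_pair_features(bulk_like)
```

Because each feature depends only on the relative order of two genes inside a sample, the two hypothetical profiles above yield identical features despite their different scales.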
Affiliation(s)
- Ran Wang
- Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China.,Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Xubin Zheng
- Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China.,Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Jun Wang
- Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
| | - Shibiao Wan
- Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
| | - Fangda Song
- School of Data Science, The Chinese University of Hong Kong, Shenzhen 518000, China
| | - Man Hon Wong
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Kwong Sak Leung
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Lixin Cheng
- Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
| |
|
23
|
Abstract
MOTIVATION Cancer develops through a process of clonal evolution in which an initially healthy cell gives rise to progeny gradually differentiating through the accumulation of genetic and epigenetic mutations. These mutations can take various forms, including single-nucleotide variants (SNVs), copy number alterations (CNAs) or structural variations (SVs), with each variant type providing complementary insights into tumor evolution as well as offering distinct challenges to phylogenetic inference. RESULTS In this work, we develop a tumor phylogeny method, TUSV-ext, which incorporates SNVs, CNAs and SVs into a single inference framework. We demonstrate on simulated data that the method produces accurate tree inferences in the presence of all three variant types. We further demonstrate the method through application to real prostate tumor data, showing how our approach to coordinated phylogeny inference and clonal construction with all three variant types can reveal a more complicated clonal structure than is suggested by prior work, consistent with extensive polyclonal seeding or migration. AVAILABILITY AND IMPLEMENTATION https://github.com/CMUSchwartzLab/TUSV-ext. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
24
|
Govek K, Sikes C, Zhou Y, Oesper L. GraPhyC: Using Consensus to Infer Tumor Evolution. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:465-478. [PMID: 33031032 DOI: 10.1109/tcbb.2020.3029689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
We consider the problem of finding a consensus tumor evolution tree from a set of conflicting input trees. In contrast to traditional phylogenetic trees, the tumor trees we consider do not have the same set of labels applied to the leaves of each tree. We describe several distance measures between these tumor trees. Our GraPhyC algorithm solves the consensus problem using a weighted directed graph where vertices are sets of mutations and edges are weighted based on the number of times a parental relationship is observed between their constituent mutations in the input trees. We find a minimum weight spanning arborescence in this graph and prove that it minimizes the total distance to all input trees for one of our distance measures. We also describe several extensions of our GraPhyC approach. On simulated data we show that GraPhyC outperforms a baseline method and demonstrate that GraPhyC can be an effective means of computing centroids in k-medians clustering. We analyze two real sequencing datasets and find that GraPhyC is able to identify a tree not included in the set of input trees, but that contains characteristics supported by other reported evolutionary reconstructions of this tumor.
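The graph construction described in this abstract can be sketched roughly as follows. This is an illustrative Python sketch under simplifying assumptions, not the GraPhyC implementation: each vertex is a single mutation instead of a set of mutations, trees are given as hypothetical child-to-parent dictionaries, and the final step picks each mutation's most frequently observed parent greedily.

```python
from collections import Counter


def parent_edge_counts(trees):
    """Count how often each (parent, child) pair occurs across input trees.

    trees: list of dicts mapping child mutation -> parent mutation
           (the root does not appear as a key).
    """
    counts = Counter()
    for tree in trees:
        for child, parent in tree.items():
            counts[(parent, child)] += 1
    return counts


def most_supported_parents(counts, root):
    """Greedily pick, for every non-root mutation, its most observed parent.

    This is only a stand-in for a proper spanning arborescence algorithm:
    on other inputs the greedy choice can create cycles, which algorithms
    such as Chu-Liu/Edmonds are designed to resolve.
    """
    best = {}
    for (parent, child), n in counts.items():
        if child == root:
            continue
        if child not in best or n > best[child][1]:
            best[child] = (parent, n)
    return {child: parent for child, (parent, _) in best.items()}


# Three hypothetical, conflicting input trees over mutations a, b, c, d.
trees = [
    {"b": "a", "c": "b", "d": "b"},
    {"b": "a", "c": "a", "d": "b"},
    {"b": "a", "c": "b", "d": "c"},
]
counts = parent_edge_counts(trees)
consensus = most_supported_parents(counts, root="a")
```

How the counts are converted into edge weights, and the proof that the resulting arborescence minimizes a tree distance, are specific to the paper and are not reproduced here.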
|
25
|
Lei H, Guo XA, Tao Y, Ding K, Fu X, Oesterreich S, Lee AV, Schwartz R. OUP accepted manuscript. Bioinformatics 2022; 38:i386-i394. [PMID: 35758822 PMCID: PMC9235482 DOI: 10.1093/bioinformatics/btac262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Motivation Identifying cell types and their abundances and how these evolve during tumor progression is critical to understanding the mechanisms of metastasis and identifying predictors of metastatic potential that can guide the development of new diagnostics or therapeutics. Single-cell RNA sequencing (scRNA-seq) has been especially promising in resolving heterogeneity of expression programs at the single-cell level, but is not always feasible, e.g. for large cohort studies or longitudinal analysis of archived samples. In such cases, clonal subpopulations may still be inferred via genomic deconvolution, but deconvolution methods have limited ability to resolve fine clonal structure and may require reference cell type profiles that are missing or imprecise. Prior methods can eliminate the need for reference profiles but show unstable performance when few bulk samples are available. Results In this work, we develop a new method using reference scRNA-seq to interpret sample collections for which only bulk RNA-seq is available for some samples, e.g. clonally resolving archived primary tissues using scRNA-seq from metastases. By integrating such information in a Quadratic Programming framework, our method can recover more accurate cell types and corresponding cell type abundances in bulk samples. Application to a breast tumor bone metastases dataset confirms the power of scRNA-seq data to improve cell type inference and quantification in same-patient bulk samples. Availability and implementation Source code is available on Github at https://github.com/CMUSchwartzLab/RADs.
Affiliation(s)
| | | | - Yifeng Tao
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Kai Ding
- Department of Pharmacology and Chemical Biology, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, PA 15213, USA
| | - Xuecong Fu
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Steffi Oesterreich
- Department of Pharmacology and Chemical Biology, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, PA 15213, USA
| | - Adrian V Lee
- Department of Pharmacology and Chemical Biology, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, PA 15213, USA
| | | |
|
26
|
Baghaarabani L, Goliaei S, Foroughmand-Araabi MH, Shariatpanahi SP, Goliaei B. Conifer: clonal tree inference for tumor heterogeneity with single-cell and bulk sequencing data. BMC Bioinformatics 2021; 22:416. [PMID: 34461827 PMCID: PMC8404257 DOI: 10.1186/s12859-021-04338-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/21/2021] [Accepted: 08/16/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Genetic heterogeneity of a cancer tumor that develops during clonal evolution is one of the reasons for cancer treatment failure, by increasing the chance of drug resistance. Clones are cell populations with different genotypes, resulting from differences in somatic mutations that occur and accumulate during cancer development. An appropriate approach for identifying clones is determining the variant allele frequency of mutations that occurred in the tumor. Although bulk sequencing data can be used to provide that information, the frequencies are not informative enough for identifying different clones with the same prevalence and their evolutionary relationships. On the other hand, single-cell sequencing data provides valuable information about branching events in the evolution of a cancerous tumor. However, the temporal order of mutations may be determined with ambiguities using only single-cell data, while variant allele frequencies from bulk sequencing data can provide beneficial information for inferring the temporal order of mutations with fewer ambiguities. RESULT In this study, a new method called Conifer (ClONal tree Inference For hEterogeneity of tumoR) is proposed which combines aggregated variant allele frequency from bulk sequencing data with branching event information from single-cell sequencing data to more accurately identify clones and their evolutionary relationships. It is proven that the accuracy of clone identification and clonal tree inference is increased by using Conifer compared to other existing methods on various sets of simulated data. In addition, it is discussed that the evolutionary tree provided by Conifer on real cancer data sets is highly consistent with information in both bulk and single-cell data. CONCLUSIONS In this study, we have provided an accurate and robust method to identify clones of tumor heterogeneity and their evolutionary history by combining single-cell and bulk sequencing data.
Affiliation(s)
- Leila Baghaarabani
- Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Sama Goliaei
- Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran
| | | | | | - Bahram Goliaei
- Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
| |
|
27
|
Malikić S, Mehrabadi FR, Azer ES, Ebrahimabadi MH, Sahinalp SC. Studying the History of Tumor Evolution from Single-Cell Sequencing Data by Exploring the Space of Binary Matrices. J Comput Biol 2021; 28:857-879. [PMID: 34297621 DOI: 10.1089/cmb.2020.0595] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
Abstract
Single-cell sequencing (SCS) data have great potential in reconstructing the evolutionary history of tumors. Rapid advances in SCS technology in the past decade were followed by the design of various computational methods for inferring trees of tumor evolution. Some of the earliest methods were based on the direct search in the space of trees with the goal of finding the maximum likelihood tree. However, it can be shown that instead of searching directly in the tree space, we can perform a search in the space of binary matrices and obtain maximum likelihood tree directly from the maximum likelihood matrix. The potential of the latter tree search strategy has recently been recognized by different research groups and several related methods were published in the past 2 years. Here we provide a review of the theoretical background of these methods and a detailed discussion, which are largely missing in the available publications, of the correlation between the two tree search strategies. We also discuss each of the existing methods based on the search in the space of binary matrices and summarize the best-known single-cell DNA sequencing data sets, which can be used in the future for assessing performance on real data of newly developed methods.
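The link between binary matrices and trees rests on a standard result: a binary cell-by-mutation matrix corresponds to a perfect phylogeny (with a mutation-free root) exactly when it is conflict-free, meaning no pair of columns contains all three row patterns (1,1), (1,0), and (0,1). The Python sketch below checks that condition on hypothetical, noise-free matrices. It illustrates only the feasibility test, not any of the likelihood-based search methods reviewed in the paper.

```python
from itertools import combinations


def is_conflict_free(matrix):
    """Return True if no column pair shows all of (1,1), (1,0), and (0,1).

    matrix: list of rows (cells), each a list of 0/1 mutation calls.
    """
    if not matrix:
        return True
    n_cols = len(matrix[0])
    forbidden = {(1, 1), (1, 0), (0, 1)}
    for i, j in combinations(range(n_cols), 2):
        patterns = {(row[i], row[j]) for row in matrix}
        if forbidden <= patterns:
            return False
    return True


# Hypothetical matrices: rows are cells, columns are mutations.
tree_like = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
]
conflicting = [
    [1, 1],
    [1, 0],
    [0, 1],
]
```

Searching the space of binary matrices then amounts to finding a conflict-free matrix that best explains the noisy observed matrix under a chosen error model.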
Affiliation(s)
- Salem Malikić
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Farid Rashidi Mehrabadi
- Department of Computer Science, Indiana University, Bloomington, Indiana, USA.,Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Erfan Sadeqi Azer
- Department of Computer Science, Indiana University, Bloomington, Indiana, USA
| | - Mohammad Haghir Ebrahimabadi
- Department of Computer Science, Indiana University, Bloomington, Indiana, USA.,Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Suleyman Cenk Sahinalp
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
| |
|
28
|
Weber LL, Sashittal P, El-Kebir M. doubletD: detecting doublets in single-cell DNA sequencing data. Bioinformatics 2021; 37:i214-i221. [PMID: 34252961 PMCID: PMC8275324 DOI: 10.1093/bioinformatics/btab266] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 04/21/2021] [Indexed: 11/13/2022]
Abstract
Motivation While single-cell DNA sequencing (scDNA-seq) has enabled the study of intratumor heterogeneity at an unprecedented resolution, current technologies are error-prone and often result in doublets where two or more cells are mistaken for a single cell. Not only do doublets confound downstream analyses, but the increase in doublet rate is also a major bottleneck preventing higher throughput with current single-cell technologies. Although doublet detection and removal are standard practice in scRNA-seq data analysis, options for scDNA-seq data are limited. Current methods attempt to detect doublets while also performing complex downstream analyses tasks, leading to decreased efficiency and/or performance. Results We present doubletD, the first standalone method for detecting doublets in scDNA-seq data. Underlying our method is a simple maximum likelihood approach with a closed-form solution. We demonstrate the performance of doubletD on simulated data as well as real datasets, outperforming current methods for downstream analysis of scDNA-seq data that jointly infer doublets as well as standalone approaches for doublet detection in scRNA-seq data. Incorporating doubletD in scDNA-seq analysis pipelines will reduce complexity and lead to more accurate results. Availability and implementation https://github.com/elkebir-group/doubletD. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Leah L Weber
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Palash Sashittal
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.,Department of Aerospace Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Mohammed El-Kebir
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| |
|
29
|
Weber LL, El-Kebir M. Distinguishing linear and branched evolution given single-cell DNA sequencing data of tumors. Algorithms Mol Biol 2021; 16:14. [PMID: 34229713 PMCID: PMC8259357 DOI: 10.1186/s13015-021-00194-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/02/2021] [Accepted: 06/22/2021] [Indexed: 01/24/2023]
Abstract
Background Cancer arises from an evolutionary process where somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor’s evolutionary process as either linear or branched and understanding what cancer types and which patients have each of these trajectories could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to limitations with current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor’s evolutionary history as either linear or branched. Results We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem as a means of testing two alternative hypotheses for the pattern of evolution, which we prove to be NP-hard. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and real data application, we demonstrate the performance of our method, outperforming a competing machine learning approach. Conclusion Phyolin is an accurate, easy to use and fast method for classifying an evolutionary trajectory as linear or branched given a tumor’s single-cell DNA sequencing data.
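For error-free data, the distinction between linear and branched evolution has a simple combinatorial form: the phylogeny is linear exactly when the sets of cells carrying each mutation are nested in a single chain. The Python sketch below tests that condition on hypothetical matrices. It is only the noise-free check and does not attempt the flipping of entries that makes the LPPF problem NP-hard, nor the constraint programming approach used by Phyolin.

```python
def is_linear(matrix):
    """Return True if the mutation columns form one chain under set inclusion.

    matrix: list of rows (cells), each a list of 0/1 mutation calls.
    """
    n_cols = len(matrix[0]) if matrix else 0
    # For each mutation, collect the indices of the cells that carry it.
    cell_sets = [
        frozenset(r for r, row in enumerate(matrix) if row[c] == 1)
        for c in range(n_cols)
    ]
    # Sort from largest to smallest; a chain requires each set to contain
    # the next one, and transitivity then covers all remaining pairs.
    ordered = sorted(cell_sets, key=len, reverse=True)
    return all(small <= large for large, small in zip(ordered, ordered[1:]))


# Hypothetical matrices: rows are cells, columns are mutations.
linear = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
branched = [
    [1, 1, 0],
    [1, 0, 1],
]
```

In the branched example the second and third mutations occur in disjoint sets of cells, so no single chain of nested clones can explain the data.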
|
30
|
Jahn K, Beerenwinkel N, Zhang L. The Bourque distances for mutation trees of cancers. Algorithms Mol Biol 2021; 16:9. [PMID: 34112201 PMCID: PMC8193869 DOI: 10.1186/s13015-021-00188-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/01/2021] [Accepted: 06/02/2021] [Indexed: 12/02/2022]
Abstract
Background Mutation trees are rooted trees in which nodes are of arbitrary degree and labeled with a mutation set. These trees, also referred to as clonal trees, are used in computational oncology to represent the mutational history of tumours. Classical tree metrics such as the popular Robinson–Foulds distance are of limited use for the comparison of mutation trees. One reason is that mutation trees inferred with different methods or for different patients often contain different sets of mutation labels. Results We generalize the Robinson–Foulds distance into a set of distance metrics called Bourque distances for comparing mutation trees. We show the basic version of the Bourque distance for mutation trees can be computed in linear time. We also make a connection between the Robinson–Foulds distance and the nearest neighbor interchange distance. Supplementary Information The online version contains supplementary material available at 10.1186/s13015-021-00188-3.
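As background for the kind of comparison being generalized, the Python sketch below computes a Robinson–Foulds-style distance between two rooted trees by collecting, for every node, the set of labels in its subtree and counting the sets that appear in only one tree. This is an illustrative adaptation and not the Bourque distance defined in the paper; the child-to-parent dictionaries and single-mutation node labels are assumptions made for the example.

```python
def clades(parent_of):
    """Return the set of subtree label sets, one per node of a rooted tree.

    parent_of: dict mapping child label -> parent label
               (the root does not appear as a key).
    """
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    nodes = set(parent_of) | set(parent_of.values())

    def below(node):
        # Labels at this node plus everything underneath it.
        labels = {node}
        for child in children.get(node, []):
            labels |= below(child)
        return frozenset(labels)

    return {below(node) for node in nodes}


def clade_distance(tree_a, tree_b):
    """Count subtree label sets present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Two hypothetical mutation trees rooted at "a".
tree_1 = {"b": "a", "c": "b", "d": "b"}
tree_2 = {"b": "a", "c": "a", "d": "b"}
distance = clade_distance(tree_1, tree_2)
```

Moving mutation c from below b to directly below the root changes only the subtree of b, so the two trees differ in exactly two label sets.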
|
31
|
Ciccolella S, Bernardini G, Denti L, Bonizzoni P, Previtali M, Della Vedova G. Triplet-based similarity score for fully multilabeled trees with poly-occurring labels. Bioinformatics 2021; 37:178-184. [PMID: 32730595 PMCID: PMC8055217 DOI: 10.1093/bioinformatics/btaa676] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/29/2020] [Revised: 06/29/2020] [Accepted: 07/22/2020] [Indexed: 01/06/2023]
Abstract
MOTIVATION The latest advances in cancer sequencing, and the availability of a wide range of methods to infer the evolutionary history of tumors, have made it important to evaluate, reconcile and cluster different tumor phylogenies. Recently, several notions of distance or similarities have been proposed in the literature, but none of them has emerged as the gold standard. Moreover, none of the known similarity measures is able to manage mutations occurring multiple times in the tree, a circumstance often occurring in real cases. RESULTS To overcome these limitations, in this article, we propose MP3, the first similarity measure for tumor phylogenies able to effectively manage cases where multiple mutations can occur at the same time and mutations can occur multiple times. Moreover, a comparison of MP3 with other measures shows that it is able to classify correctly similar and dissimilar trees, both on simulated and on real data. AVAILABILITY AND IMPLEMENTATION An open source implementation of MP3 is publicly available at https://github.com/AlgoLab/mp3treesim. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Simone Ciccolella
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan 20126, Italy
| | - Giulia Bernardini
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan 20126, Italy
| | - Luca Denti
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan 20126, Italy
| | - Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan 20126, Italy
| | - Marco Previtali
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan 20126, Italy
| | - Gianluca Della Vedova
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan 20126, Italy
| |
|
32
|
Ciccolella S, Ricketts C, Soto Gomez M, Patterson M, Silverbush D, Bonizzoni P, Hajirasouliha I, Della Vedova G. Inferring cancer progression from Single-Cell Sequencing while allowing mutation losses. Bioinformatics 2021; 37:326-333. [PMID: 32805010 PMCID: PMC8058767 DOI: 10.1093/bioinformatics/btaa722] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/11/2019] [Revised: 08/06/2020] [Accepted: 08/11/2020] [Indexed: 01/21/2023]
Abstract
Motivation In recent years, the well-known Infinite Sites Assumption has been a fundamental feature of computational methods devised for reconstructing tumor phylogenies and inferring cancer progressions. However, recent studies leveraging single-cell sequencing (SCS) techniques have shown evidence of the widespread recurrence and, especially, loss of mutations in several tumor samples. While there exist established computational methods that infer phylogenies with mutation losses, there remain some advancements to be made. Results We present Simulated Annealing Single-Cell inference (SASC): a new and robust approach based on simulated annealing for the inference of cancer progression from SCS datasets. In particular, we introduce an extension of the model of evolution where mutations are only accumulated, by allowing also a limited amount of mutation loss in the evolutionary history of the tumor: the Dollo-k model. We demonstrate that SASC achieves high levels of accuracy when tested on both simulated and real datasets and in comparison with some other available methods. Availability and implementation The SASC tool is open source and available at https://github.com/sciccolella/sasc. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Simone Ciccolella
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
| | - Camir Ricketts
- Department of Physiology and Biophysics, Tri-I Computational Biology & Medicine Graduate Program, Weill Cornell Medicine of Cornell University, New York, NY 10021, USA.,Institute for Computational Biomedicine, Englander Institute for Precision Medicine, The Meyer Cancer Center, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York City, NY 10021, USA
| | - Mauricio Soto Gomez
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
| | - Murray Patterson
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy.,Department of Computer Science, College of Arts and Sciences, Georgia State University, Atlanta, GA 30303, USA
| | - Dana Silverbush
- Department of Pathology and Center for Cancer Research, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
| | - Iman Hajirasouliha
- Institute for Computational Biomedicine, Englander Institute for Precision Medicine, The Meyer Cancer Center, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York City, NY 10021, USA
| | - Gianluca Della Vedova
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
| |
|
33
|
Vavoulis DV, Cutts A, Taylor JC, Schuh A. A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data. Bioinformatics 2021; 37:147-154. [PMID: 32722772 PMCID: PMC8055230 DOI: 10.1093/bioinformatics/btaa672] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/29/2020] [Revised: 05/13/2020] [Accepted: 07/20/2020] [Indexed: 01/22/2023]
Abstract
MOTIVATION Tumours are composed of distinct cancer cell populations (clones), which continuously adapt to their local micro-environment. Standard methods for clonal deconvolution seek to identify groups of mutations and estimate the prevalence of each group in the tumour, while considering its purity and copy number profile. These methods have been applied on cross-sectional data and on longitudinal data after discarding information on the timing of sample collection. Two key questions are how can we incorporate such information in our analyses and is there any benefit in doing so? RESULTS We developed a clonal deconvolution method, which incorporates explicitly the temporal spacing of longitudinally sampled tumours. By merging a Dirichlet Process Mixture Model with Gaussian Process priors and using as input a sequence of several sparsely collected samples, our method can reconstruct the temporal profile of the abundance of any mutation cluster supported by the data as a continuous function of time. We benchmarked our method on whole genome, whole exome and targeted sequencing data from patients with chronic lymphocytic leukaemia, on liquid biopsy data from a patient with melanoma and on synthetic data and we found that incorporating information on the timing of tissue collection improves model performance, as long as data of sufficient volume and complexity are available for estimating free model parameters. Thus, our approach is particularly useful when collecting a relatively long sequence of tumour samples is feasible, as in liquid cancers (e.g. leukaemia) and liquid biopsies. AVAILABILITY AND IMPLEMENTATION The statistical methodology presented in this paper is freely available at github.com/dvav/clonosGP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dimitrios V Vavoulis
- Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Nuffield Department of Medicine, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Trust, Oxford, OX3 9DU, UK
- Department of Oncology, Molecular Diagnostic Centre, University of Oxford, Oxford OX3 9DU, UK
| | - Anthony Cutts
- Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Department of Oncology, Molecular Diagnostic Centre, University of Oxford, Oxford OX3 9DU, UK
| | - Jenny C Taylor
- Nuffield Department of Medicine, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Trust, Oxford, OX3 9DU, UK
| | - Anna Schuh
- Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Trust, Oxford, OX3 9DU, UK
- Department of Oncology, Molecular Diagnostic Centre, University of Oxford, Oxford OX3 9DU, UK
- Department of Haematology, Oxford University Hospitals NHS Trust, Oxford OX3 9DU, UK
| |
|
34
|
Zhang C, El-Kebir M, Ochoa I. Moss enables high sensitivity single-nucleotide variant calling from multiple bulk DNA tumor samples. Nat Commun 2021; 12:2204. [PMID: 33850139 PMCID: PMC8044184 DOI: 10.1038/s41467-021-22466-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2020] [Accepted: 03/05/2021] [Indexed: 11/17/2022]
Abstract
Intra-tumor heterogeneity renders the identification of somatic single-nucleotide variants (SNVs) a challenging problem. In particular, low-frequency SNVs are hard to distinguish from sequencing artifacts. While the increasing availability of multi-sample tumor DNA sequencing data holds the potential for more accurate variant calling, there is a lack of high-sensitivity multi-sample SNV callers that utilize these data. Here we report Moss, a method to identify low-frequency SNVs that recur in multiple sequencing samples from the same tumor. Moss provides any existing single-sample SNV caller the ability to support multiple samples with little additional time overhead. We demonstrate that Moss improves recall while maintaining high precision in a simulated dataset. On multi-sample hepatocellular carcinoma, acute myeloid leukemia and colorectal cancer datasets, Moss identifies new low-frequency variants that meet manual review criteria and are consistent with the tumor's mutational signature profile. In addition, Moss detects the presence of variants in more samples of the same tumor than reported by the single-sample caller. Moss' improved sensitivity in SNV calling will enable more detailed downstream analyses in cancer genomics.
Affiliation(s)
- Chuanyi Zhang
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Mohammed El-Kebir
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
| | - Idoia Ochoa
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
- Department of Electrical Engineering, University of Navarra, Tecnun, San Sebastian, Spain.
| |
Collapse
|
35
|
Sadeqi Azer E, Rashidi Mehrabadi F, Malikić S, Li XC, Bartok O, Litchfield K, Levy R, Samuels Y, Schäffer AA, Gertz EM, Day CP, Pérez-Guijarro E, Marie K, Lee MP, Merlino G, Ergun F, Sahinalp SC. PhISCS-BnB: a fast branch and bound algorithm for the perfect tumor phylogeny reconstruction problem. Bioinformatics 2021; 36:i169-i176. [PMID: 32657358 DOI: 10.1093/bioinformatics/btaa464] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/27/2022]
Abstract
MOTIVATION Recent advances in single-cell sequencing (SCS) offer an unprecedented insight into tumor emergence and evolution. Principled approaches to tumor phylogeny reconstruction via SCS data are typically based on general computational methods for solving an integer linear program, or a constraint satisfaction program, which, although guaranteeing convergence to the most likely solution, are very slow. Others based on Monte Carlo Markov Chain or alternative heuristics not only offer no such guarantee, but also are not faster in practice. As a result, novel methods that can scale up to handle the size and noise characteristics of emerging SCS data are highly desirable to fully utilize this technology. RESULTS We introduce PhISCS-BnB (phylogeny inference using SCS via branch and bound), a branch and bound algorithm to compute the most likely perfect phylogeny on an input genotype matrix extracted from an SCS dataset. PhISCS-BnB not only offers an optimality guarantee, but is also 10-100 times faster than the best available methods on simulated tumor SCS data. We also applied PhISCS-BnB on a recently published large melanoma dataset derived from the sublineages of a cell line involving 20 clones with 2367 mutations, which returned the optimal tumor phylogeny in <4 h. The resulting phylogeny agrees with and extends the published results by providing a more detailed picture on the clonal evolution of the tumor. AVAILABILITY AND IMPLEMENTATION https://github.com/algo-cancer/PhISCS-BnB. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Erfan Sadeqi Azer
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Farid Rashidi Mehrabadi
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Salem Malikić
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Xuan Cindy Li
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
  - Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, USA
- Osnat Bartok
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Kevin Litchfield
  - Cancer Evolution and Genome Instability Laboratory, Francis Crick Institute, London NW1 1AT, UK
  - Cancer Research UK Lung Cancer Centre of Excellence London, University College London Cancer Institute, London WC1E 6DD, UK
- Ronen Levy
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Yardena Samuels
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Alejandro A Schäffer
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- E Michael Gertz
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Chi-Ping Day
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Eva Pérez-Guijarro
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Kerrie Marie
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Maxwell P Lee
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Glenn Merlino
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Funda Ergun
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- S Cenk Sahinalp
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA

36
Hodzic E, Shrestha R, Malikic S, Collins CC, Litchfield K, Turajlic S, Sahinalp SC. Identification of conserved evolutionary trajectories in tumors. Bioinformatics 2021; 36:i427-i435. [PMID: 32657374 DOI: 10.1093/bioinformatics/btaa453] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
MOTIVATION As multi-region, time-series and single-cell sequencing data become more widely available, it is becoming clear that certain tumors share evolutionary characteristics with others. In the last few years, several computational methods have been developed with the goal of inferring the subclonal composition and evolutionary history of tumors from tumor biopsy sequencing data. However, the phylogenetic trees that they report differ significantly between tumors (even those with similar characteristics). RESULTS In this article, we present a novel combinatorial optimization method, CONETT, for detection of recurrent tumor evolution trajectories. Our method constructs a consensus tree of conserved evolutionary trajectories based on the information about temporal order of alteration events in a set of tumors. We apply our method to previously published datasets of 100 clear-cell renal cell carcinoma and 99 non-small-cell lung cancer patients and identify both conserved trajectories that were reported in the original studies, as well as new trajectories. AVAILABILITY AND IMPLEMENTATION CONETT is implemented in C++ and available at https://github.com/ehodzic/CONETT. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ermin Hodzic
  - Department of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Raunak Shrestha
  - Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- Salem Malikic
  - Department of Computer Science, Indiana University Bloomington, Bloomington, IN, USA
- Colin C Collins
  - Department of Urologic Sciences, University of British Columbia, Vancouver, BC, Canada
  - Laboratory for Advanced Genome Analysis, Vancouver Prostate Centre, Vancouver, BC, Canada
- Kevin Litchfield
  - Cancer Dynamics Laboratory, The Francis Crick Institute, Genome Instability Laboratory, Francis Crick Institute, London, UK
- Samra Turajlic
  - Cancer Dynamics Laboratory, The Francis Crick Institute, Genome Instability Laboratory, Francis Crick Institute, London, UK
  - Skin and Renal Units, The Royal Marsden NHS Foundation Trust, London, UK
- S Cenk Sahinalp
  - Cancer Data Science Lab., National Cancer Institute, NIH, Bethesda, MD, USA

37
Borgsmüller N, Bonet J, Marass F, Gonzalez-Perez A, Lopez-Bigas N, Beerenwinkel N. BnpC: Bayesian non-parametric clustering of single-cell mutation profiles. Bioinformatics 2021; 36:4854-4859. [PMID: 32592465 PMCID: PMC7750970 DOI: 10.1093/bioinformatics/btaa599] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/02/2020] [Revised: 05/23/2020] [Accepted: 06/19/2020] [Indexed: 02/06/2023]
Abstract
Motivation The high resolution of single-cell DNA sequencing (scDNA-seq) offers great potential to resolve intratumor heterogeneity (ITH) by distinguishing clonal populations based on their mutation profiles. However, the increasing size of scDNA-seq datasets and technical limitations, such as high error rates and a large proportion of missing values, complicate this task and limit the applicability of existing methods. Results Here, we introduce BnpC, a novel non-parametric method to cluster individual cells into clones and infer their genotypes based on their noisy mutation profiles. We benchmarked our method comprehensively against state-of-the-art methods on simulated data using various data sizes, and applied it to three cancer scDNA-seq datasets. On simulated data, BnpC compared favorably against current methods in terms of accuracy, runtime and scalability. Its inferred genotypes were the most accurate, especially on highly heterogeneous data, and it was the only method able to run and produce results on datasets with 5000 cells. On tumor scDNA-seq data, BnpC was able to identify clonal populations missed by the original cluster analysis but supported by Supplementary Experimental Data. With ever growing scDNA-seq datasets, scalable and accurate methods such as BnpC will become increasingly relevant, not only to resolve ITH but also as a preprocessing step to reduce data size. Availability and implementation BnpC is freely available under MIT license at https://github.com/cbg-ethz/BnpC. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nico Borgsmüller
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland
  - SIB, Swiss Institute of Bioinformatics, Basel 4058, Switzerland
- Jose Bonet
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona 08028, Spain
  - Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Catalonia 08002, Spain
- Francesco Marass
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland
  - SIB, Swiss Institute of Bioinformatics, Basel 4058, Switzerland
- Abel Gonzalez-Perez
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona 08028, Spain
  - Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Catalonia 08002, Spain
- Nuria Lopez-Bigas
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona 08028, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona 08010, Spain
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland
  - SIB, Swiss Institute of Bioinformatics, Basel 4058, Switzerland

38
Velten L, Story BA, Hernández-Malmierca P, Raffel S, Leonce DR, Milbank J, Paulsen M, Demir A, Szu-Tu C, Frömel R, Lutz C, Nowak D, Jann JC, Pabst C, Boch T, Hofmann WK, Müller-Tidow C, Trumpp A, Haas S, Steinmetz LM. Identification of leukemic and pre-leukemic stem cells by clonal tracking from single-cell transcriptomics. Nat Commun 2021; 12:1366. [PMID: 33649320 PMCID: PMC7921413 DOI: 10.1038/s41467-021-21650-1] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 06/02/2020] [Accepted: 02/05/2021] [Indexed: 01/07/2023]
Abstract
Cancer stem cells drive disease progression and relapse in many types of cancer. Despite this, a thorough characterization of these cells remains elusive and with it the ability to eradicate cancer at its source. In acute myeloid leukemia (AML), leukemic stem cells (LSCs) underlie mortality but are difficult to isolate due to their low abundance and high similarity to healthy hematopoietic stem cells (HSCs). Here, we demonstrate that LSCs, HSCs, and pre-leukemic stem cells can be identified and molecularly profiled by combining single-cell transcriptomics with lineage tracing using both nuclear and mitochondrial somatic variants. While mutational status discriminates between healthy and cancerous cells, gene expression distinguishes stem cells and progenitor cell populations. Our approach enables the identification of LSC-specific gene expression programs and the characterization of differentiation blocks induced by leukemic mutations. Taken together, we demonstrate the power of single-cell multi-omic approaches in characterizing cancer stem cells.
Affiliation(s)
- Lars Velten
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Benjamin A Story
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
  - Swiss Federal Institute of Technology (ETH) Zurich, Department of Biosystems Science and Engineering, Basel, Switzerland
- Pablo Hernández-Malmierca
  - Heidelberg Institute for Stem Cell Technology and Experimental Medicine (HI-STEM gGmbH), Heidelberg, Germany
  - Division of Stem Cells and Cancer, Deutsches Krebsforschungszentrum (DKFZ) and DKFZ-ZMBH Alliance, Heidelberg, Germany
- Simon Raffel
  - Heidelberg Institute for Stem Cell Technology and Experimental Medicine (HI-STEM gGmbH), Heidelberg, Germany
  - Division of Stem Cells and Cancer, Deutsches Krebsforschungszentrum (DKFZ) and DKFZ-ZMBH Alliance, Heidelberg, Germany
  - Department of Internal Medicine V, Hematology, Oncology and Rheumatology, University of Heidelberg, Heidelberg, Germany
- Daniel R Leonce
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Jennifer Milbank
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Malte Paulsen
  - European Molecular Biology Laboratory (EMBL), Flow Cytometry Core Facility, Heidelberg, Germany
- Aykut Demir
  - Department of Internal Medicine V, Hematology, Oncology and Rheumatology, University of Heidelberg, Heidelberg, Germany
- Chelsea Szu-Tu
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Robert Frömel
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Christoph Lutz
  - Department of Internal Medicine V, Hematology, Oncology and Rheumatology, University of Heidelberg, Heidelberg, Germany
- Daniel Nowak
  - Department of Hematology and Oncology, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Johann-Christoph Jann
  - Department of Hematology and Oncology, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Caroline Pabst
  - Department of Internal Medicine V, Hematology, Oncology and Rheumatology, University of Heidelberg, Heidelberg, Germany
  - Molecular Medicine Partnership Unit (MMPU), University of Heidelberg and European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Tobias Boch
  - Swiss Federal Institute of Technology (ETH) Zurich, Department of Biosystems Science and Engineering, Basel, Switzerland
  - Division of Stem Cells and Cancer, Deutsches Krebsforschungszentrum (DKFZ) and DKFZ-ZMBH Alliance, Heidelberg, Germany
  - Department of Hematology and Oncology, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Wolf-Karsten Hofmann
  - Department of Hematology and Oncology, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Carsten Müller-Tidow
  - Department of Internal Medicine V, Hematology, Oncology and Rheumatology, University of Heidelberg, Heidelberg, Germany
- Andreas Trumpp
  - Heidelberg Institute for Stem Cell Technology and Experimental Medicine (HI-STEM gGmbH), Heidelberg, Germany
  - Division of Stem Cells and Cancer, Deutsches Krebsforschungszentrum (DKFZ) and DKFZ-ZMBH Alliance, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Heidelberg, Germany
- Simon Haas
  - Heidelberg Institute for Stem Cell Technology and Experimental Medicine (HI-STEM gGmbH), Heidelberg, Germany
  - Division of Stem Cells and Cancer, Deutsches Krebsforschungszentrum (DKFZ) and DKFZ-ZMBH Alliance, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Heidelberg, Germany
  - Berlin Institute of Health (BIH), Berlin, Germany
  - Charité-Universitätsmedizin, Berlin, Germany
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Lars M Steinmetz
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Genome Technology Center, Palo Alto, CA, USA

39
Ciccolella S, Soto Gomez M, Patterson MD, Della Vedova G, Hajirasouliha I, Bonizzoni P. gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data. BMC Bioinformatics 2020; 21:413. [PMID: 33297943 PMCID: PMC7725124 DOI: 10.1186/s12859-020-03736-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/17/2020] [Accepted: 09/03/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Cancer progression reconstruction is an important development stemming from the phylogenetics field. In this context, the reconstruction of the phylogeny representing the evolutionary history presents some peculiar aspects that depend on the technology used to obtain the data to analyze: Single Cell DNA Sequencing data have great specificity, but are affected by moderate false negative and missing value rates. Moreover, there has been some recent evidence of back mutations in cancer: this phenomenon is currently widely ignored. RESULTS We present a new tool, gpps, that reconstructs a tumor phylogeny from Single Cell Sequencing data, allowing each mutation to be lost at most a fixed number of times. The General Parsimony Phylogeny from Single cell (gpps) tool is open source and available at https://github.com/AlgoLab/gpps. CONCLUSIONS gpps provides new insights to the analysis of intra-tumor heterogeneity by proposing a new progression model to the field of cancer phylogeny reconstruction on Single Cell data.
Affiliation(s)
- Simone Ciccolella
  - Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy
- Mauricio Soto Gomez
  - Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy
- Murray D Patterson
  - Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy
  - Georgia State University, Atlanta, GA, USA
- Gianluca Della Vedova
  - Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Weill Cornell Medicine, New York City, NY, USA
  - Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York City, 10021, NY, USA
- Paola Bonizzoni
  - Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy

40
Sadeqi Azer E, Haghir Ebrahimabadi M, Malikić S, Khardon R, Sahinalp SC. Tumor Phylogeny Topology Inference via Deep Learning. iScience 2020; 23:101655. [PMID: 33117968 PMCID: PMC7582044 DOI: 10.1016/j.isci.2020.101655] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/20/2020] [Revised: 08/10/2020] [Accepted: 10/02/2020] [Indexed: 01/24/2023]
Abstract
Principled computational approaches for tumor phylogeny reconstruction via single-cell sequencing typically aim to build the most likely perfect phylogeny tree from the noisy genotype matrix - which represents genotype calls of single cells. This problem is NP-hard, and as a result, existing approaches aim to solve relatively small instances of it through combinatorial optimization techniques or Bayesian inference. As expected, even when the goal is to infer basic topological features of the tumor phylogeny, rather than reconstructing the topology entirely, these approaches could be prohibitively slow. In this paper, we introduce fast deep learning solutions to the problems of inferring whether the most likely tree has a linear (chain) or branching topology and whether a perfect phylogeny is feasible from a given genotype matrix. We also present a reinforcement learning approach for reconstructing the most likely tumor phylogeny. This preliminary work demonstrates that data-driven approaches can reconstruct key features of tumor evolution.
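For orientation, the feasibility question mentioned in this abstract has a classical exact answer when the genotype matrix is binary, complete, and noise-free: a perfect phylogeny rooted at the all-zero genotype exists exactly when no pair of mutation columns shows all three patterns (1,1), (1,0) and (0,1). The sketch below illustrates that textbook three-gamete check only; it is not the deep learning method of the paper, and the function name and toy matrices are hypothetical.

```python
from itertools import combinations


def admits_perfect_phylogeny(matrix):
    """Three-gamete test on a complete binary genotype matrix.

    Rows are cells, columns are mutations (1 = mutation present).
    Assumes no missing entries and no noise; the root is the all-zero genotype.
    """
    if not matrix:
        return True
    conflict = {(1, 1), (1, 0), (0, 1)}
    n_mutations = len(matrix[0])
    for a, b in combinations(range(n_mutations), 2):
        # Collect every (mutation a, mutation b) pattern seen across cells.
        seen = {(row[a], row[b]) for row in matrix}
        if conflict <= seen:
            return False  # the two mutations cannot sit on one tree without loss or recurrence
    return True


# Hypothetical toy inputs, not data from the paper.
consistent = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
conflicting = [[1, 0, 0], [1, 1, 0], [0, 1, 0]]
print(admits_perfect_phylogeny(consistent))
print(admits_perfect_phylogeny(conflicting))
```

Real single-cell matrices contain false negatives and missing values, so this exact test typically fails on them, which is part of why the noisy version of the problem is hard.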
Affiliation(s)
- Erfan Sadeqi Azer
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Mohammad Haghir Ebrahimabadi
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Salem Malikić
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Roni Khardon
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- S. Cenk Sahinalp
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA

41
Tsyvina V, Zelikovsky A, Snir S, Skums P. Inference of mutability landscapes of tumors from single cell sequencing data. PLoS Comput Biol 2020; 16:e1008454. [PMID: 33253159 PMCID: PMC7728263 DOI: 10.1371/journal.pcbi.1008454] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/28/2020] [Revised: 12/10/2020] [Accepted: 10/20/2020] [Indexed: 11/18/2022]
Abstract
One of the hallmarks of cancer is the extremely high mutability and genetic instability of tumor cells. Inherent heterogeneity of intra-tumor populations manifests itself in high variability of clone instability rates. Analogously to fitness landscapes, the instability rates of clonal populations form their mutability landscapes. Here, we present MULAN (MUtability LANdscape inference), a maximum-likelihood computational framework for inference of mutation rates of individual cancer subclones using single-cell sequencing data. It utilizes the partial information about the orders of mutation events provided by cancer mutation trees and extends it by inferring the full evolutionary history and mutability landscape of a tumor. Evaluating mutation rates at the level of subclones rather than individual genes makes it possible to capture the effects of genomic interactions and epistasis. We estimate the accuracy of our approach and demonstrate that it can be used to study the evolution of genetic instability and infer tumor evolutionary history from experimental data. MULAN is available at https://github.com/compbel/MULAN.
Affiliation(s)
- Viachaslau Tsyvina
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, United States of America
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, United States of America
- Sagi Snir
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Pavel Skums
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, United States of America

42
Weber LL, Aguse N, Chia N, El-Kebir M. PhyDOSE: Design of follow-up single-cell sequencing experiments of tumors. PLoS Comput Biol 2020; 16:e1008240. [PMID: 33001973 PMCID: PMC7553321 DOI: 10.1371/journal.pcbi.1008240] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/26/2020] [Revised: 10/13/2020] [Accepted: 08/12/2020] [Indexed: 01/07/2023]
Abstract
The combination of bulk and single-cell DNA sequencing data of the same tumor enables the inference of high-fidelity phylogenies that form the input to many important downstream analyses in cancer genomics. While many studies simultaneously perform bulk and single-cell sequencing, some studies have analyzed initial bulk data to identify which mutations to target in a follow-up single-cell sequencing experiment, thereby decreasing cost. Bulk data provide an additional untapped source of valuable information, composed of candidate phylogenies and associated clonal prevalence. Here, we introduce PhyDOSE, a method that uses this information to strategically optimize the design of follow-up single cell experiments. Underpinning our method is the observation that only a small number of clones uniquely distinguish one candidate tree from all other trees. We incorporate distinguishing features into a probabilistic model that infers the number of cells to sequence so as to confidently reconstruct the phylogeny of the tumor. We validate PhyDOSE using simulations and a retrospective analysis of a leukemia patient, concluding that PhyDOSE's computed number of cells resolves tree ambiguity even in the presence of typical single-cell sequencing errors. We also conduct a retrospective analysis on an acute myeloid leukemia cohort, demonstrating the potential to achieve similar results with a significant reduction in the number of cells sequenced. In a prospective analysis, we demonstrate the advantage of selecting cells to sequence across multiple biopsies and that only a small number of cells suffice to disambiguate the solution space of trees in a recent lung cancer cohort. In summary, PhyDOSE proposes cost-efficient single-cell sequencing experiments that yield high-fidelity phylogenies, which will improve downstream analyses aimed at deepening our understanding of cancer biology.
Affiliation(s)
- Leah L Weber
  - Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Nuraini Aguse
  - Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Nicholas Chia
  - Microbiome Program, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, United States of America
  - Division of Surgical Research, Department of Surgery, Mayo Clinic, Rochester, Minnesota, United States of America
- Mohammed El-Kebir
  - Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America

43
Lee D, Park Y, Kim S. Towards multi-omics characterization of tumor heterogeneity: a comprehensive review of statistical and machine learning approaches. Brief Bioinform 2020; 22:5896573. [PMID: 34020548 DOI: 10.1093/bib/bbaa188] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/02/2020] [Revised: 06/29/2020] [Accepted: 07/21/2020] [Indexed: 12/19/2022]
Abstract
The multi-omics molecular characterization of cancer opened a new horizon for our understanding of cancer biology and therapeutic strategies. However, a tumor biopsy comprises diverse types of cells: not only cancerous cells but also tumor microenvironmental cells and adjacent normal cells. This heterogeneity is a major confounding factor that hampers robust and reproducible bioinformatic analysis for biomarker identification using multi-omics profiles. Besides, the heterogeneity itself has been recognized over the years for its significant prognostic value in some cancer types, thus offering another promising avenue for therapeutic intervention. A number of computational approaches to unravel such heterogeneity from high-throughput molecular profiles of a tumor sample have been proposed, but most of them rely on the data from an individual omics layer. Since the heterogeneity of cells is widely distributed across multi-omics layers, methods based on an individual layer can only partially characterize the heterogeneous admixture of cells. To help facilitate further development of methodologies that jointly account for several multi-omics profiles, we wrote a comprehensive review of diverse approaches to characterize tumor heterogeneity based on three different omics layers: genome, epigenome and transcriptome. As a result, this review can be useful for the analysis of multi-omics profiles produced by many large-scale consortia. Contact: sunkim.bioinfo@snu.ac.kr.
Affiliation(s)
- Dohoon Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Youngjune Park
  - Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
- Sun Kim
  - Bioinformatics Institute, Seoul National University, Seoul 08826, Korea

44
Mallory XF, Edrisi M, Navin N, Nakhleh L. Methods for copy number aberration detection from single-cell DNA-sequencing data. Genome Biol 2020; 21:208. [PMID: 32807205 PMCID: PMC7433197 DOI: 10.1186/s13059-020-02119-8] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Received: 03/25/2020] [Accepted: 07/23/2020] [Indexed: 02/06/2023]
Abstract
Copy number aberrations (CNAs), which are pathogenic copy number variations (CNVs), play an important role in the initiation and progression of cancer. Single-cell DNA-sequencing (scDNAseq) technologies produce data that are ideal for inferring CNAs. In this review, we survey eight methods that have been developed for detecting CNAs in scDNAseq data, and categorize them according to the steps of a seven-step pipeline that they employ. Furthermore, we review models and methods for evolutionary analyses of CNAs from scDNAseq data and highlight advances and future research directions for computational methods for CNA detection from scDNAseq data.
Affiliation(s)
- Xian F. Mallory
  - Department of Computer Science, Rice University, Houston, TX USA
  - Department of Computer Science, Florida State University, Tallahassee, FL USA
- Nicholas Navin
  - Department of Genetics, the University of Texas M.D. Anderson Cancer Center, Houston, TX USA
- Luay Nakhleh
  - Department of Computer Science, Rice University, Houston, TX USA

45
Abstract
MOTIVATION: Recent single-cell DNA sequencing technologies enable whole-genome sequencing of hundreds to thousands of individual cells. However, these technologies have ultra-low sequencing coverage (<0.5× per cell), which has limited their use to the analysis of large copy-number aberrations (CNAs) in individual cells. While CNAs are useful markers in cancer studies, single-nucleotide mutations are equally important, both in cancer studies and in other applications. However, ultra-low coverage sequencing yields single-nucleotide mutation data that are too sparse for current single-cell analysis methods. RESULTS: We introduce SBMClone, a method to infer clusters of cells, or clones, that share groups of somatic single-nucleotide mutations. SBMClone uses a stochastic block model to overcome sparsity in ultra-low coverage single-cell sequencing data, and we show that SBMClone accurately infers the true clonal composition on simulated datasets with coverage as low as 0.2×. We applied SBMClone to single-cell whole-genome sequencing data from two breast cancer patients obtained using two different sequencing technologies. On the first patient, sequenced using the 10X Genomics CNV solution with sequencing coverage ≈0.03×, SBMClone recovers the major clonal composition when incorporating a small amount of additional information. On the second patient, where pre- and post-treatment tumor samples were sequenced using DOP-PCR with sequencing coverage ≈0.5×, SBMClone shows that tumor cells are present in the post-treatment sample, contrary to published analysis of this dataset. AVAILABILITY AND IMPLEMENTATION: SBMClone is available on the GitHub repository https://github.com/raphael-group/SBMClone. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matthew A Myers
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Simone Zaccaria
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
46
Satas G, Zaccaria S, Mon G, Raphael BJ. SCARLET: Single-cell tumor phylogeny inference with copy-number constrained mutation losses. Cell Syst 2020; 10:323-332.e8. [PMID: 32864481 PMCID: PMC7451135 DOI: 10.1016/j.cels.2020.04.001] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Indexed: 11/17/2022]
Abstract
A small number of somatic mutations drive the development of cancer, but all somatic mutations are markers of the evolutionary history of a tumor. Prominent methods to construct phylogenies from single-cell sequencing data use single-nucleotide variants (SNVs) as markers but fail to adequately account for copy-number aberrations (CNAs), which can overlap SNVs and result in SNV losses. Here, we introduce SCARLET, an algorithm that infers tumor phylogenies from single-cell DNA sequencing data while accounting for both CNA-driven loss of SNVs and sequencing errors. SCARLET outperforms existing methods on simulated data, with more accurate inference of the order in which mutations were acquired and the mutations present in individual cells. Using a single-cell dataset from a patient with colorectal cancer, SCARLET constructs a tumor phylogeny that is consistent with the observed CNAs and suggests an alternate origin for the patient's metastases. SCARLET is available at: github.com/raphael-group/scarlet.
Affiliation(s)
- Gryte Satas
- Department of Computer Science, Brown University, Providence, RI 02912
- Department of Computer Science, Princeton University, Princeton, NJ 08540
- Simone Zaccaria
- Department of Computer Science, Princeton University, Princeton, NJ 08540
- Geoffrey Mon
- Department of Computer Science, Princeton University, Princeton, NJ 08540
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08540
47
DiNardo Z, Tomlinson K, Ritz A, Oesper L. Distance measures for tumor evolutionary trees. Bioinformatics 2020; 36:2090-2097. [PMID: 31750900 PMCID: PMC7141873 DOI: 10.1093/bioinformatics/btz869] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 05/21/2019] [Revised: 09/04/2019] [Accepted: 11/19/2019] [Indexed: 12/14/2022]
Abstract
MOTIVATION: There has been recent increased interest in using algorithmic methods to infer the evolutionary tree underlying the developmental history of a tumor. Quantitative measures that compare such trees are vital to a number of different applications including benchmarking tree inference methods and evaluating common inheritance patterns across patients. However, few appropriate distance measures exist, and those that do have low resolution for differentiating trees or do not fully account for the complex relationship between tree topology and the inheritance of the mutations labeling that topology. RESULTS: Here, we present two novel distance measures, Common Ancestor Set distance (CASet) and Distinctly Inherited Set Comparison distance (DISC), that are specifically designed to account for the subclonal mutation inheritance patterns characteristic of tumor evolutionary trees. We apply CASet and DISC to multiple simulated datasets and two breast cancer datasets and show that our distance measures allow for more nuanced and accurate delineation between tumor evolutionary trees than existing distance measures. AVAILABILITY AND IMPLEMENTATION: Implementations of CASet and DISC are freely available at: https://bitbucket.org/oesperlab/stereodist. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zach DiNardo
- Department of Computer Science, Carleton College, Northfield, MN 55057, USA
- Kiran Tomlinson
- Department of Computer Science, Carleton College, Northfield, MN 55057, USA
- Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
- Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, USA
- Layla Oesper
- Department of Computer Science, Carleton College, Northfield, MN 55057, USA
48
Lei H, Lyu B, Gertz EM, Schäffer AA, Shi X, Wu K, Li G, Xu L, Hou Y, Dean M, Schwartz R. Tumor Copy Number Deconvolution Integrating Bulk and Single-Cell Sequencing Data. J Comput Biol 2020; 27:565-598. [PMID: 32181683 DOI: 10.1089/cmb.2019.0302] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/29/2022]
Abstract
Characterizing intratumor heterogeneity (ITH) is crucial to understanding cancer development, but it is hampered by limits of available data sources. Bulk DNA sequencing is the most common technology to assess ITH, but involves the analysis of a mixture of many genetically distinct cells in each sample, which must then be computationally deconvolved. Single-cell sequencing is a promising alternative, but its limitations (for example, high noise, difficulty scaling to large populations, technical artifacts, and large data sets) have so far made it impractical for studying cohorts of sufficient size to identify statistically robust features of tumor evolution. We have developed strategies for deconvolution and tumor phylogenetics combining limited amounts of bulk and single-cell data to gain some advantages of single-cell resolution with much lower cost, with specific focus on deconvolving genomic copy number data. We developed a mixed membership model for clonal deconvolution via non-negative matrix factorization, balancing deconvolution quality with similarity to single-cell samples via an associated efficient coordinate descent algorithm. We then improve on that algorithm by integrating deconvolution with clonal phylogeny inference, using a mixed integer linear programming model to incorporate a minimum evolution phylogenetic tree cost in the problem objective. We demonstrate the effectiveness of these methods on semisimulated data of known ground truth, showing improved deconvolution accuracy relative to bulk data alone.
Affiliation(s)
- Haoyun Lei
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Bochuan Lyu
- Department of Mathematics, Rose-Hulman Institute of Technology, Terre Haute, Indiana
- E Michael Gertz
- National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, Maryland; Cancer Data Science Laboratory, National Cancer Institute, U.S. National Institutes of Health, Bethesda, Maryland
- Alejandro A Schäffer
- National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, Maryland; Cancer Data Science Laboratory, National Cancer Institute, U.S. National Institutes of Health, Bethesda, Maryland
- Kui Wu
- BGI-Shenzhen, Shenzhen, China
- Michael Dean
- Laboratory of Translational Genomics, Division of Cancer Epidemiology & Genetics, National Cancer Institute, U.S. National Institutes of Health, Gaithersburg, Maryland
- Russell Schwartz
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania; Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania
49
Martín-Vide C, Vega-Rodríguez MA, Wheeler T. Comparing Integer Linear Programming to SAT-Solving for Hard Problems in Computational and Systems Biology. ALGORITHMS FOR COMPUTATIONAL BIOLOGY 2020. [PMCID: PMC7197060 DOI: 10.1007/978-3-030-42266-0_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
It is useful to have general-purpose solution methods that can be applied to a wide range of problems, rather than relying on the development of clever, intricate algorithms for each specific problem. Integer Linear Programming (ILP) is the most widely used such general-purpose solution method. It is successful in a wide range of problems. However, there are some problems in computational biology where integer linear programming has had only limited success. In this paper, we explore an alternate, general-purpose solution method: SAT-solving, i.e., constructing Boolean formulas in conjunctive normal form (CNF) that encode a problem instance, and using a SAT-solver to determine if the CNF formula is satisfiable or not. In three hard problems examined, we were very surprised to find the SAT-solving approach was dramatically better than the ILP approach in two problems; and a little slower, but more robust, in the third problem. We also re-examined and confirmed an earlier result on a fourth problem, using current ILP and SAT-solvers. These results should encourage further efforts to exploit SAT-solving in computational biology.
50
Zafar H, Navin N, Chen K, Nakhleh L. SiCloneFit: Bayesian inference of population structure, genotype, and phylogeny of tumor clones from single-cell genome sequencing data. Genome Res 2019; 29:1847-1859. [PMID: 31628257 PMCID: PMC6836738 DOI: 10.1101/gr.243121.118] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 08/17/2018] [Accepted: 07/10/2019] [Indexed: 12/12/2022]
Abstract
Accumulation and selection of somatic mutations in a Darwinian framework result in intra-tumor heterogeneity (ITH) that poses significant challenges to the diagnosis and clinical therapy of cancer. Identification of the tumor cell populations (clones) and reconstruction of their evolutionary relationship can elucidate this heterogeneity. Recently developed single-cell DNA sequencing (SCS) technologies promise to resolve ITH to a single-cell level. However, technical errors in SCS data sets, including false-positives (FP) and false-negatives (FN) due to allelic dropout, and cell doublets, significantly complicate these tasks. Here, we propose a nonparametric Bayesian method that reconstructs the clonal populations as clusters of single cells, genotypes of each clone, and the evolutionary relationship between the clones. It employs a tree-structured Chinese restaurant process as the prior on the number and composition of clonal populations. The evolution of the clonal populations is modeled by a clonal phylogeny and a finite-site model of evolution to account for potential mutation recurrence and losses. We probabilistically account for FP and FN errors, and cell doublets are modeled by employing a Beta-binomial distribution. We develop a Gibbs sampling algorithm comprising partial reversible-jump and partial Metropolis-Hastings updates to explore the joint posterior space of all parameters. The performance of our method on synthetic and experimental data sets suggests that joint reconstruction of tumor clones and clonal phylogeny under a finite-site model of evolution leads to more accurate inferences. Our method is the first to enable this joint reconstruction in a fully Bayesian framework, thus providing measures of support of the inferences it makes.
Affiliation(s)
- Hamim Zafar
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Nicholas Navin
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA