1
Sashittal P, Zhang RY, Law BK, Strzalkowski A, Schmidt H, Bolondi A, Chan MM, Raphael BJ. Inferring cell differentiation maps from lineage tracing data. bioRxiv 2024:2024.09.09.611835. [PMID: 39314473 PMCID: PMC11419031 DOI: 10.1101/2024.09.09.611835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
During development, multipotent cells differentiate through a hierarchy of increasingly restricted progenitor cell types until they realize specialized cell types. A cell differentiation map describes this hierarchy, and inferring these maps is an active area of research spanning traditional single marker lineage studies to data-driven trajectory inference methods on single-cell RNA-seq data. Recent high-throughput lineage tracing technologies profile lineages and cell types at scale, but current methods to infer cell differentiation maps from these data rely on simple models with restrictive assumptions about the developmental process. We introduce a mathematical framework for cell differentiation maps based on the concept of potency, and develop an algorithm, Carta, that infers an optimal cell differentiation map from single-cell lineage tracing data. The key insight in Carta is to balance the trade-off between the complexity of the cell differentiation map and the number of unobserved cell type transitions on the lineage tree. We show that Carta more accurately infers cell differentiation maps on both simulated and real data compared to existing methods. In models of mammalian trunk development and mouse hematopoiesis, Carta identifies important features of development that are not revealed by other methods, including convergent differentiation of specialized cell types, progenitor differentiation dynamics, and the refinement of routes of differentiation via new intermediate progenitors.
Affiliation(s)
- Palash Sashittal
  - Dept. of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Richard Y. Zhang
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Benjamin K. Law
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
  - Dept. of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
- Henri Schmidt
  - Dept. of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Adriano Bolondi
  - Dept. of Genome Regulation, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Michelle M. Chan
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
  - Dept. of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
2
Sashittal P, Chen V, Pasarkar A, Raphael BJ. Joint inference of cell lineage and mitochondrial evolution from single-cell sequencing data. Bioinformatics 2024; 40:i218-i227. [PMID: 38940122 PMCID: PMC11211840 DOI: 10.1093/bioinformatics/btae231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
Motivation: Eukaryotic cells contain organelles called mitochondria that have their own genome. Most cells contain thousands of mitochondria which replicate, even in nondividing cells, by means of a relatively error-prone process resulting in somatic mutations in their genome. Because of the higher mutation rate compared to the nuclear genome, mitochondrial mutations have been used to track cellular lineage, particularly using single-cell sequencing that measures mitochondrial mutations in individual cells. However, existing methods to infer the cell lineage tree from mitochondrial mutations do not model "heteroplasmy," which is the presence of multiple mitochondrial clones with distinct sets of mutations in an individual cell. Single-cell sequencing data thus provide a mixture of the mitochondrial clones in individual cells, with the ancestral relationships between these clones described by a mitochondrial clone tree. While deconvolution of somatic mutations from a mixture of evolutionarily related genomes has been extensively studied in the context of bulk sequencing of cancer tumor samples, the problem of mitochondrial deconvolution has the additional constraint that the mitochondrial clone tree must be concordant with the cell lineage tree. Results: We formalize the problem of inferring a concordant pair of a mitochondrial clone tree and a cell lineage tree from single-cell sequencing data as the Nested Perfect Phylogeny Mixture (NPPM) problem. We derive a combinatorial characterization of the solutions to the NPPM problem, and formulate an algorithm, MERLIN, to solve this problem exactly using a mixed integer linear program. We show on simulated data that MERLIN outperforms existing methods that model neither mitochondrial heteroplasmy nor the concordance between the mitochondrial clone tree and the cell lineage tree. We use MERLIN to analyze single-cell whole-genome sequencing data of 5220 cells of a gastric cancer cell line and show that MERLIN infers a more biologically plausible cell lineage tree and mitochondrial clone tree compared to existing methods. Availability and implementation: https://github.com/raphael-group/MERLIN.
Affiliation(s)
- Palash Sashittal
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, United States
- Viola Chen
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, United States
- Amey Pasarkar
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, United States
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, United States
3
Yan J, Ma M, Yu Z. bmVAE: a variational autoencoder method for clustering single-cell mutation data. Bioinformatics 2022; 39:6881080. [PMID: 36478203 PMCID: PMC9825778 DOI: 10.1093/bioinformatics/btac790] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/12/2022] [Revised: 10/26/2022] [Accepted: 12/06/2022] [Indexed: 12/13/2022]
Abstract
Motivation: Genetic intra-tumor heterogeneity (ITH) characterizes the differences in genomic variations between tumor clones, and accurately unmasking ITH is important for personalized cancer therapy. Single-cell DNA sequencing has emerged as a powerful means for deciphering underlying ITH based on point mutations of single cells. However, detecting tumor clones from single-cell mutation data remains challenging due to the error-prone and discrete nature of the data. Results: We introduce bmVAE, a bioinformatics tool that learns a low-dimensional latent representation of single cells with a variational autoencoder and then clusters cells into subpopulations in the latent space. bmVAE takes single-cell binary mutation data as input, and outputs inferred cell subpopulations as well as their genotypes. To achieve this, the bmVAE framework consists of three modules: dimensionality reduction, cell clustering and genotype estimation. We assess the method on various synthetic datasets where different factors including false negative rate, data size and data heterogeneity are considered in simulation, and further demonstrate its effectiveness on two real datasets. The results suggest bmVAE is highly effective in inferring ITH, and performs competitively with existing methods. Availability and implementation: bmVAE is freely available at https://github.com/zhyu-lab/bmvae. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiaqian Yan
  - School of Information Engineering, Ningxia University, Yinchuan 750021, China
- Ming Ma
  - School of Information Engineering, Ningxia University, Yinchuan 750021, China
- Zhenhua Yu
  - To whom correspondence should be addressed.
4
Kızılkale C, Rashidi Mehrabadi F, Sadeqi Azer E, Pérez-Guijarro E, Marie KL, Lee MP, Day CP, Merlino G, Ergün F, Buluç A, Sahinalp SC, Malikić S. Fast intratumor heterogeneity inference from single-cell sequencing data. Nature Computational Science 2022; 2:577-583. [PMID: 38177468 PMCID: PMC10765963 DOI: 10.1038/s43588-022-00298-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/09/2021] [Accepted: 07/14/2022] [Indexed: 01/06/2024]
Abstract
We introduce HUNTRESS, a computational method for mutational intratumor heterogeneity inference from noisy genotype matrices derived from single-cell sequencing data, the running time of which is linear with the number of cells and quadratic with the number of mutations. We prove that, under reasonable conditions, HUNTRESS computes the true progression history of a tumor with high probability. On simulated and real tumor sequencing data, HUNTRESS is demonstrated to be faster than available alternatives with comparable or better accuracy. Additionally, the progression histories of tumors inferred by HUNTRESS on real single-cell sequencing datasets agree with the best known evolution scenarios for the associated tumors.
Affiliation(s)
- Can Kızılkale
  - Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, CA, USA
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Farid Rashidi Mehrabadi
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
- Erfan Sadeqi Azer
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
  - Google LLC, Sunnyvale, CA, USA
- Eva Pérez-Guijarro
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Kerrie L Marie
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Maxwell P Lee
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Chi-Ping Day
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Glenn Merlino
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Funda Ergün
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
- Aydın Buluç
  - Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, CA, USA
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- S Cenk Sahinalp
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Salem Malikić
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
5
Kuipers J, Singer J, Beerenwinkel N. Single-cell mutation calling and phylogenetic tree reconstruction with loss and recurrence. Bioinformatics 2022; 38:4713-4719. [PMID: 36000873 PMCID: PMC9563700 DOI: 10.1093/bioinformatics/btac577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 07/08/2022] [Accepted: 08/23/2022] [Indexed: 11/13/2022]
Abstract
Motivation: Tumours evolve as heterogeneous populations of cells, which may be distinguished by different genomic aberrations. The resulting intra-tumour heterogeneity plays an important role in cancer patient relapse and treatment failure, so that obtaining a clear understanding of each patient’s tumour composition and evolutionary history is key for personalized therapies. Single-cell sequencing (SCS) now provides the possibility to resolve tumour heterogeneity at the highest resolution of individual tumour cells, but brings with it challenges related to the particular noise profiles of the sequencing protocols as well as the complexity of the underlying evolutionary process. Results: By modelling the noise processes and allowing mutations to be lost or to reoccur during tumour evolution, we present a method to jointly call mutations in each cell, reconstruct the phylogenetic relationship between cells, and determine the locations of mutational losses and recurrences. Our Bayesian approach allows us to accurately call mutations as well as to quantify our certainty in such predictions. We show the advantages of allowing mutational loss or recurrence with simulated data and present its application to tumour SCS data. Availability and implementation: SCIΦN is available at https://github.com/cbg-ethz/SCIPhIN. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Jochen Singer
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
6
Yu Z, Du F, Song L. SCClone: Accurate Clustering of Tumor Single-Cell DNA Sequencing Data. Front Genet 2022; 13:823941. [PMID: 35154282 PMCID: PMC8830741 DOI: 10.3389/fgene.2022.823941] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/28/2021] [Accepted: 01/04/2022] [Indexed: 12/11/2022]
Abstract
Single-cell DNA sequencing (scDNA-seq) enables high-resolution profiling of genetic diversity among single cells and is especially useful for deciphering the intra-tumor heterogeneity and evolutionary history of tumors. Specific technical issues such as allele dropout, false-positive errors, and doublets make scDNA-seq data incomplete and error-prone, which makes it very challenging to accurately infer the clonal architecture of a tumor. To effectively address these issues, we introduce a new computational method called SCClone for inferring subclones from single nucleotide variation (SNV) data of single cells. Specifically, SCClone leverages a probability mixture model for binary data to cluster single cells into distinct subclones. To accurately decipher the underlying clonal composition, a novel model selection scheme based on inter-cluster variance is employed to find the optimal number of subclones. Extensive evaluations on various simulated datasets suggest SCClone is robust to different sources of technical noise in scDNA-seq data and achieves better performance than state-of-the-art methods in inferring clonal composition. Further evaluations of SCClone on three real scDNA-seq datasets show that it can effectively find the underlying subclones from severely disturbed data. The SCClone software is freely available at https://github.com/qasimyu/scclone.
Affiliation(s)
- Zhenhua Yu
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
- Fang Du
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
- Lijuan Song
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
7
Kozlov A, Alves JM, Stamatakis A, Posada D. CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data. Genome Biol 2022; 23:37. [PMID: 35081992 PMCID: PMC8790911 DOI: 10.1186/s13059-021-02583-w] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 05/06/2021] [Accepted: 12/20/2021] [Indexed: 01/15/2023]
Abstract
We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and considers amplification error and allelic dropout. We implement CellPhy into RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well on large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that CellPhy is more robust to single-cell genomics errors and outperforms state-of-the-art methods under realistic scenarios, both in accuracy and speed. CellPhy is freely available at https://github.com/amkozlov/cellphy.
Affiliation(s)
- Alexey Kozlov
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
- Joao M. Alves
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- David Posada
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
8
Malikić S, Mehrabadi FR, Azer ES, Ebrahimabadi MH, Sahinalp SC. Studying the History of Tumor Evolution from Single-Cell Sequencing Data by Exploring the Space of Binary Matrices. J Comput Biol 2021; 28:857-879. [PMID: 34297621 DOI: 10.1089/cmb.2020.0595] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
Abstract
Single-cell sequencing (SCS) data have great potential in reconstructing the evolutionary history of tumors. Rapid advances in SCS technology in the past decade were followed by the design of various computational methods for inferring trees of tumor evolution. Some of the earliest methods were based on a direct search in the space of trees with the goal of finding the maximum likelihood tree. However, it can be shown that instead of searching directly in the tree space, we can perform a search in the space of binary matrices and obtain the maximum likelihood tree directly from the maximum likelihood matrix. The potential of the latter tree search strategy has recently been recognized by different research groups and several related methods were published in the past 2 years. Here we provide a review of the theoretical background of these methods and a detailed discussion, which are largely missing in the available publications, of the correlation between the two tree search strategies. We also discuss each of the existing methods based on the search in the space of binary matrices and summarize the best-known single-cell DNA sequencing data sets, which can be used in the future for assessing the performance of newly developed methods on real data.
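The step from a binary matrix to a tree mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the review: it assumes a noise-free, conflict-free cell-by-mutation matrix (rows are cells, columns are mutations, 1 means the cell carries the mutation), and the function name `mutation_tree` is hypothetical. Under that assumption, the sets of cells carrying each mutation are nested or disjoint, so each mutation's parent is the mutation with the smallest strictly larger carrier set.

```python
def mutation_tree(matrix):
    """Map each mutation (column index) to its parent mutation.

    Assumes `matrix` is a conflict-free binary cell-by-mutation matrix.
    A parent of None means the mutation hangs directly off the root.
    """
    n_cols = len(matrix[0])
    # Set of cells (row indices) that carry each mutation.
    carriers = [
        frozenset(i for i, row in enumerate(matrix) if row[j])
        for j in range(n_cols)
    ]
    parent = {}
    for j in range(n_cols):
        # Mutations whose carrier set strictly contains that of mutation j.
        supersets = [k for k in range(n_cols) if carriers[j] < carriers[k]]
        # The closest ancestor is the smallest such superset.
        parent[j] = (
            min(supersets, key=lambda k: len(carriers[k])) if supersets else None
        )
    return parent


# Three cells, three mutations: mutation 0 is shared by all cells,
# mutations 1 and 2 are private to cells 0 and 2 respectively.
example = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
]
tree = mutation_tree(example)
```

Here `tree` places mutation 0 at the top with mutations 1 and 2 as its children. Real data are noisy, which is why the methods discussed in the review search for the most likely conflict-free matrix first.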
Affiliation(s)
- Salem Malikić
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
- Farid Rashidi Mehrabadi
  - Department of Computer Science, Indiana University, Bloomington, Indiana, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
- Erfan Sadeqi Azer
  - Department of Computer Science, Indiana University, Bloomington, Indiana, USA
- Mohammad Haghir Ebrahimabadi
  - Department of Computer Science, Indiana University, Bloomington, Indiana, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
- Suleyman Cenk Sahinalp
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
9
Yu Z, Liu H, Du F, Tang X. GRMT: Generative Reconstruction of Mutation Tree From Scratch Using Single-Cell Sequencing Data. Front Genet 2021; 12:692964. [PMID: 34149820 PMCID: PMC8212059 DOI: 10.3389/fgene.2021.692964] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/09/2021] [Accepted: 05/17/2021] [Indexed: 12/11/2022]
Abstract
Single-cell sequencing (SCS) now promises to reveal the landscape of genetic diversity at the single-cell level, and is particularly useful for reconstructing the evolutionary history of tumors. Multiple types of noise make SCS data notoriously error-prone and significantly complicate tumor tree reconstruction. Existing methods for tumor phylogeny estimation suffer from either high computational cost or low-resolution indication of clonal architecture, which creates a need for new methods for efficient and accurate reconstruction of tumor trees. We introduce GRMT (Generative Reconstruction of Mutation Tree from scratch), a method for inferring tumor mutation trees from SCS data. GRMT exploits the k-Dollo parsimony model to allow each mutation to be gained once and lost at most k times. Under this constraint on mutation evolution, GRMT searches for mutation tree structures by generating the tree from scratch: an iterative process gradually increases the tree size by introducing one new mutation at a time until a complete tree structure that contains all mutations is obtained. This enables GRMT to efficiently recover the chronological order of mutations and scale well to large datasets. Extensive evaluations on simulated and real datasets suggest GRMT outperforms state-of-the-art methods on multiple performance metrics. The GRMT software is freely available at https://github.com/qasimyu/grmt.
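The k-Dollo constraint named in this abstract (a mutation is gained once and lost at most k times) can be made concrete with a small check. This is an illustrative sketch only, not GRMT's search procedure: the tree encoding (a `parent` dictionary with the root mapped to `None`), the per-node 0/1 `state` dictionary, and the function name `satisfies_k_dollo` are all assumptions made for the example, and the root's ancestor is assumed to lack the mutation.

```python
def satisfies_k_dollo(parent, state, k):
    """Check one mutation against the k-Dollo model on a rooted tree.

    parent: dict mapping each node to its parent (None for the root).
    state:  dict mapping each node to 1 (mutation present) or 0 (absent).
    Returns True if the mutation is gained at most once and lost at most k times.
    """
    gains = 0
    losses = 0
    for node, par in parent.items():
        # The state above the root is taken to be 0 (mutation absent).
        before = 0 if par is None else state[par]
        after = state[node]
        if before == 0 and after == 1:
            gains += 1
        elif before == 1 and after == 0:
            losses += 1
    return gains <= 1 and losses <= k


# Root r has child a; a has children b and c.
parent = {"r": None, "a": "r", "b": "a", "c": "a"}
one_loss = {"r": 0, "a": 1, "b": 0, "c": 1}   # gained at a, lost at b
two_gains = {"r": 0, "a": 0, "b": 1, "c": 1}  # gained independently at b and c

ok_k1 = satisfies_k_dollo(parent, one_loss, 1)
ok_k0 = satisfies_k_dollo(parent, one_loss, 0)
ok_recurrent = satisfies_k_dollo(parent, two_gains, 1)
```

With k = 0 the check reduces to the infinite-sites assumption (no losses), which is why the labelling with one loss passes for k = 1 but fails for k = 0.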
Affiliation(s)
- Zhenhua Yu
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
- Huidong Liu
  - School of Information Engineering, Ningxia University, Yinchuan, China
- Fang Du
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
- Xiaofen Tang
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
10
Espinoza DA, Mortlock RD, Koelle SJ, Wu C, Dunbar CE. Interrogation of clonal tracking data using barcodetrackR. Nature Computational Science 2021; 1:280-289. [PMID: 37621673 PMCID: PMC10449013 DOI: 10.1038/s43588-021-00057-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/18/2020] [Accepted: 03/17/2021] [Indexed: 08/26/2023]
Abstract
Clonal tracking methods provide quantitative insights into the cellular output of genetically labelled progenitor cells across time and cellular compartments. In the context of gene and cell therapies, clonal tracking methods have enabled the tracking of progenitor cell output both in humans receiving therapies and in corresponding animal models, providing valuable insight into lineage reconstitution, clonal dynamics, and vector genotoxicity. However, the absence of a toolbox for analysis of clonal tracking data has precluded the development of standardized analytical frameworks within the field. Thus, we developed barcodetrackR, an R package and accompanying Shiny app containing diverse tools for the analysis and visualization of clonal tracking data. We demonstrate the utility of barcodetrackR in exploring longitudinal clonal patterns and lineage relationships in a number of clonal tracking studies of hematopoietic stem and progenitor cells (HSPCs) in humans receiving HSPC gene therapy and in animals receiving lentivirally transduced HSPC transplants or tumor cells.
Affiliation(s)
- Diego A. Espinoza
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Translational Stem Cell Biology Branch, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Ryland D. Mortlock
  - Translational Stem Cell Biology Branch, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Samson J. Koelle
  - Translational Stem Cell Biology Branch, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
  - Department of Statistics, University of Washington, Seattle, WA, USA
- Chuanfeng Wu
  - Translational Stem Cell Biology Branch, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Cynthia E. Dunbar
  - Translational Stem Cell Biology Branch, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
11
Sadeqi Azer E, Haghir Ebrahimabadi M, Malikić S, Khardon R, Sahinalp SC. Tumor Phylogeny Topology Inference via Deep Learning. iScience 2020; 23:101655. [PMID: 33117968 PMCID: PMC7582044 DOI: 10.1016/j.isci.2020.101655] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/20/2020] [Revised: 08/10/2020] [Accepted: 10/02/2020] [Indexed: 01/24/2023]
Abstract
Principled computational approaches for tumor phylogeny reconstruction via single-cell sequencing typically aim to build the most likely perfect phylogeny tree from the noisy genotype matrix - which represents genotype calls of single cells. This problem is NP-hard, and as a result, existing approaches aim to solve relatively small instances of it through combinatorial optimization techniques or Bayesian inference. As expected, even when the goal is to infer basic topological features of the tumor phylogeny, rather than reconstructing the topology entirely, these approaches could be prohibitively slow. In this paper, we introduce fast deep learning solutions to the problems of inferring whether the most likely tree has a linear (chain) or branching topology and whether a perfect phylogeny is feasible from a given genotype matrix. We also present a reinforcement learning approach for reconstructing the most likely tumor phylogeny. This preliminary work demonstrates that data-driven approaches can reconstruct key features of tumor evolution.
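For a noise-free binary genotype matrix with an all-zero ancestral state, the question of whether a perfect phylogeny is feasible has a classical exact answer, the three-gamete test: no pair of mutations may show all three patterns (1,1), (1,0) and (0,1) across cells. The sketch below illustrates that test only; it is not the deep learning approach of the paper, and the function name `is_conflict_free` is hypothetical.

```python
from itertools import combinations


def is_conflict_free(matrix):
    """Three-gamete test on a binary cell-by-mutation matrix.

    Returns True if no pair of columns contains all of the row patterns
    (1, 1), (1, 0) and (0, 1), i.e. a perfect phylogeny rooted at the
    all-zero genotype exists.
    """
    if not matrix:
        return True
    n_cols = len(matrix[0])
    forbidden = {(1, 1), (1, 0), (0, 1)}
    for a, b in combinations(range(n_cols), 2):
        # Patterns observed for this pair of mutations across all cells.
        seen = {(row[a], row[b]) for row in matrix}
        if forbidden <= seen:
            return False
    return True


# Two mutations on separate branches: compatible with a perfect phylogeny.
branching = [[1, 0], [0, 1], [1, 0]]
# One cell has both mutations, and each mutation also occurs alone: a conflict.
conflicting = [[1, 1], [1, 0], [0, 1]]

branching_ok = is_conflict_free(branching)
conflicting_ok = is_conflict_free(conflicting)
```

The test is quadratic in the number of mutations and exact only for error-free data; with false positives and dropouts, finding the most likely conflict-free matrix is the NP-hard problem the abstract refers to.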
Affiliation(s)
- Erfan Sadeqi Azer
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Mohammad Haghir Ebrahimabadi
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Salem Malikić
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Roni Khardon
  - Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- S. Cenk Sahinalp
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
12
Tsyvina V, Zelikovsky A, Snir S, Skums P. Inference of mutability landscapes of tumors from single cell sequencing data. PLoS Comput Biol 2020; 16:e1008454. [PMID: 33253159 PMCID: PMC7728263 DOI: 10.1371/journal.pcbi.1008454] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/28/2020] [Revised: 12/10/2020] [Accepted: 10/20/2020] [Indexed: 11/18/2022]
Abstract
One of the hallmarks of cancer is the extremely high mutability and genetic instability of tumor cells. Inherent heterogeneity of intra-tumor populations manifests itself in high variability of clone instability rates. Analogously to fitness landscapes, the instability rates of clonal populations form their mutability landscapes. Here, we present MULAN (MUtability LANdscape inference), a maximum-likelihood computational framework for inference of mutation rates of individual cancer subclones using single-cell sequencing data. It utilizes the partial information about the orders of mutation events provided by cancer mutation trees and extends it by inferring the full evolutionary history and mutability landscape of a tumor. Evaluating mutation rates at the level of subclones rather than individual genes makes it possible to capture the effects of genomic interactions and epistasis. We estimate the accuracy of our approach and demonstrate that it can be used to study the evolution of genetic instability and infer tumor evolutionary history from experimental data. MULAN is available at https://github.com/compbel/MULAN.
Affiliation(s)
- Viachaslau Tsyvina
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, United States of America
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, United States of America
- Sagi Snir
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Pavel Skums
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, United States of America
13
Lee J, Park J, Kim J, Jeong B, Choi SY, Jang HS, Yang H. Targeted Isolation of Cytotoxic Sesquiterpene Lactones from Eupatorium fortunei by the NMR Annotation Tool, SMART 2.0. ACS Omega 2020; 5:23989-23995. [PMID: 32984720 PMCID: PMC7513349 DOI: 10.1021/acsomega.0c03270] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/08/2020] [Accepted: 08/26/2020] [Indexed: 05/12/2023]
Abstract
Small Molecular Accurate Recognition Technology (SMART 2.0) has recently been introduced as an NMR-based machine learning tool for the discovery and characterization of natural products. We attempted targeted isolation of sesquiterpene lactones from Eupatorium fortunei with the aid of structural annotation by SMART 2.0 and chemical profiling. Eight germacrene-type (1-7 and 10) and two eudesmane-type sesquiterpene lactones (8 and 9) were isolated from the whole plant of Eupatorium fortunei. Guided by the SMART 2.0 results for the subfractions of E. fortunei, their cytotoxic activities were evaluated against five cancer cell lines (SKOV3, A549, PC3, HEp-2, and MCF-7). Compounds 4 and 8 exhibited IC50 values of 3.9 ± 1.2 and 3.9 ± 0.6 μM, respectively, against PC3 prostate cancer cells. Compound 7 showed good cytotoxicity, with an IC50 value of 5.8 ± 0.1 μM against MCF-7 breast cancer cells. In the present study, the rapid annotation of the mixture of compounds in a fraction by the NMR-based machine learning tool helped the targeted isolation of bioactive compounds from natural products.