1
Nadalin F, Marzi MJ, Pirra Piscazzi M, Fuentes-Bravo P, Procaccia S, Climent M, Bonetti P, Rubolino C, Giuliani B, Papatheodorou I, Marioni JC, Nicassio F. Multi-omic lineage tracing predicts the transcriptional, epigenetic and genetic determinants of cancer evolution. Nat Commun 2024; 15:7609. [PMID: 39218912 PMCID: PMC11366763 DOI: 10.1038/s41467-024-51424-4] [Received: 08/10/2023] [Accepted: 08/05/2024] [Indexed: 09/04/2024]
Abstract
Cancer is a highly heterogeneous disease, where phenotypically distinct subpopulations coexist and can be primed to different fates. Both genetic and epigenetic factors may drive cancer evolution; however, little is known about whether and how such a process is pre-encoded in cancer clones. Using single-cell multi-omic lineage tracing and phenotypic assays, we investigate the predictive features of either tumour initiation or drug tolerance within the same cancer population. Clones primed to tumour initiation in vivo display two distinct transcriptional states at baseline. Remarkably, these states share a distinctive DNA accessibility profile, highlighting an epigenetic basis for tumour initiation. The drug tolerant niche is also largely pre-encoded, but only partially overlaps the tumour-initiating one and evolves following two genetically and transcriptionally distinct trajectories. Our study highlights coexisting genetic, epigenetic and transcriptional determinants of cancer evolution, unravelling the molecular complexity of pre-encoded tumour phenotypes.
Affiliation(s)
- F Nadalin
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- M J Marzi
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- M Pirra Piscazzi
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- P Fuentes-Bravo
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- S Procaccia
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- M Climent
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- P Bonetti
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- C Rubolino
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- B Giuliani
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- I Papatheodorou
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- J C Marioni
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- F Nicassio
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
2
Foltz SM, Li Y, Yao L, Terekhanova NV, Weerasinghe A, Gao Q, Dong G, Schindler M, Cao S, Sun H, Jayasinghe RG, Fulton RS, Fronick CC, King J, Kohnen DR, Fiala MA, Chen K, DiPersio JF, Vij R, Ding L. Somatic mutation phasing and haplotype extension using linked-reads in multiple myeloma. bioRxiv: the preprint server for biology 2024:2024.08.09.607342. [PMID: 39149342 PMCID: PMC11326269 DOI: 10.1101/2024.08.09.607342] [Indexed: 08/17/2024]
Abstract
Somatic mutation phasing informs our understanding of cancer-related events, like driver mutations. We generated linked-read whole genome sequencing data for 23 samples across disease stages from 14 multiple myeloma (MM) patients and systematically assigned somatic mutations to haplotypes using linked-reads. Here, we report the reconstructed cancer haplotypes and phase blocks from several MM samples and show how phase block length can be extended by integrating samples from the same individual. We also uncover phasing information in genes frequently mutated in MM, including DIS3, HIST1H1E, KRAS, NRAS, and TP53, phasing 79.4% of 20,705 high-confidence somatic mutations. In some cases, this enabled us to interpret clonal evolution models at higher resolution using pairs of phased somatic mutations. For example, our analysis of one patient suggested that two NRAS hotspot mutations occurred on the same haplotype but were independent events in different subclones. Given sufficient tumor purity and data quality, our framework illustrates how haplotype-aware analysis of somatic mutations in cancer can be beneficial for some cancer cases.
Affiliation(s)
- Steven M. Foltz
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Yize Li
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Lijun Yao
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Nadezhda V. Terekhanova
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Amila Weerasinghe
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Qingsong Gao
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Guanlan Dong
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Moses Schindler
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Song Cao
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Hua Sun
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Reyka G. Jayasinghe
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Robert S. Fulton
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Catrina C. Fronick
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Justin King
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Daniel R. Kohnen
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Mark A. Fiala
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Ken Chen
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- John F. DiPersio
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Ravi Vij
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Li Ding
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
  - Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
  - Department of Genetics, Washington University in St. Louis, St. Louis, MO, 63110, USA
3
Baciu-Drăgan MA, Beerenwinkel N. Oncotree2vec - a method for embedding and clustering of tumor mutation trees. Bioinformatics 2024; 40:i180-i188. [PMID: 38940124 PMCID: PMC11211817 DOI: 10.1093/bioinformatics/btae214] [Indexed: 06/29/2024]
Abstract
MOTIVATION Understanding the genomic heterogeneity of tumors is an important task in computational oncology, especially in the context of finding personalized treatments based on the genetic profile of each patient's tumor. Tumor clustering that takes into account the temporal order of genetic events, as represented by tumor mutation trees, is a powerful approach for grouping together patients with genetically and evolutionarily similar tumors and can provide insights into discovering tumor subtypes, for more accurate clinical diagnosis and prognosis. RESULTS Here, we propose oncotree2vec, a method for clustering tumor mutation trees by learning vector representations of mutation trees that capture the different relationships between subclones in an unsupervised manner. Learning low-dimensional tree embeddings facilitates the visualization of relations between trees in large cohorts and can be used for downstream analyses, such as deep learning approaches for single-cell multi-omics data integration. We assessed the performance and the usefulness of our method in three simulation studies and on two real datasets: a cohort of 43 trees from six cancer types with different branching patterns corresponding to different modes of spatial tumor evolution and a cohort of 123 AML mutation trees. AVAILABILITY AND IMPLEMENTATION https://github.com/cbg-ethz/oncotree2vec.
Affiliation(s)
- Monica-Andreea Baciu-Drăgan
  - Department of Biosystems Science and Engineering, ETH Zürich, Schanzenstrasse 44, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Schanzenstrasse 44, Basel 4056, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zürich, Schanzenstrasse 44, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Schanzenstrasse 44, Basel 4056, Switzerland
4
Lai J, Yang Y, Liu Y, Scharpf RB, Karchin R. Assessing the merits: an opinion on the effectiveness of simulation techniques in tumor subclonal reconstruction. Bioinformatics Advances 2024; 4:vbae094. [PMID: 38948008 PMCID: PMC11213631 DOI: 10.1093/bioadv/vbae094] [Received: 04/12/2024] [Revised: 05/28/2024] [Accepted: 06/15/2024] [Indexed: 07/02/2024]
Abstract
Summary Neoplastic tumors originate from a single cell, and their evolution can be traced through lineages characterized by mutations, copy number alterations, and structural variants. These lineages are reconstructed and mapped onto evolutionary trees with algorithmic approaches. However, without ground truth benchmark sets, the validity of an algorithm remains uncertain, limiting potential clinical applicability. With a growing number of algorithms available, there is urgent need for standardized benchmark sets to evaluate their merits. Benchmark sets rely on in silico simulations of tumor sequence, but there are no accepted standards for simulation tools, presenting a major obstacle to progress in this field. Availability and implementation All analysis done in the paper was based on publicly available data from the publication of each accessed tool.
Affiliation(s)
- Jiaying Lai
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Yi Yang
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Yunzhou Liu
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Robert B Scharpf
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
- Rachel Karchin
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
5
Dai L, Fan G, Xie T, Li L, Tang L, Chen H, Shi Y, Han X. Single-cell and spatial transcriptomics reveal a high glycolysis B cell and tumor-associated macrophages cluster correlated with poor prognosis and exhausted immune microenvironment in diffuse large B-cell lymphoma. Biomark Res 2024; 12:58. [PMID: 38840205 PMCID: PMC11155084 DOI: 10.1186/s40364-024-00605-w] [Received: 05/15/2024] [Accepted: 05/22/2024] [Indexed: 06/07/2024]
Abstract
BACKGROUND Diffuse large B-cell lymphoma (DLBCL) is a heterogeneous malignancy characterized by varied responses to treatment and prognoses. Understanding the metabolic characteristics driving DLBCL progression is crucial for developing personalized therapies. METHODS This study utilized multiple omics technologies, including single-cell transcriptomics (n = 5), bulk transcriptomics (n = 966), spatial transcriptomics (n = 10), immunohistochemistry (n = 34), and multiple immunofluorescence (n = 20), to elucidate the metabolic features of highly malignant DLBCL cells and tumor-associated macrophages (TAMs), along with their associated tumor microenvironment. Metabolic pathway analysis facilitated by scMetabolism and integrated analysis via hdWGCNA identified glycolysis genes correlating with malignancy, and the prognostic value of glycolysis genes (STMN1, ENO1, PKM, and CDK1) and TAMs was verified. RESULTS High-glycolysis malignant DLBCL tissues exhibited an immunosuppressive microenvironment characterized by abundant IFN_TAMs (CD68+CXCL10+PD-L1+) and diminished CD8+ T cell infiltration. Glycolysis genes were positively correlated with malignancy degree. IFN_TAMs exhibited high glycolysis activity and communicated closely with high-malignancy DLBCL cells identified within datasets. The glycolysis score, evaluated by seven genes, emerged as an independent prognostic factor (HR = 1.796, 95% CI: 1.077-2.995, p = 0.025 and HR = 2.631, 95% CI: 1.207-5.735, p = 0.015), and IFN_TAMs were positively correlated with poor survival (p < 0.05) in DLBCL. Immunohistochemical validation of glycolysis markers (STMN1, ENO1, PKM, and CDK1) and multiple immunofluorescence validation of IFN_TAMs underscored their prognostic value (p < 0.05) in DLBCL. CONCLUSIONS This study underscores the significance of glycolysis in tumor progression and modulation of the immune microenvironment. The identified glycolysis genes and IFN_TAMs represent potential prognostic markers and therapeutic targets in DLBCL.
Affiliation(s)
- Liyuan Dai
  - National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, No. 17 Panjiayuan Nanli, Chaoyang District, Beijing, 100021, China
- Guangyu Fan
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, No. 17 Panjiayuan Nanli, Chaoyang District, Beijing, 100021, China
- Tongji Xie
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, No. 17 Panjiayuan Nanli, Chaoyang District, Beijing, 100021, China
- Lin Li
  - Department of Pathology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, No. 17 Panjiayuan Nanli, Chaoyang District, Beijing, 100021, China
- Le Tang
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, No. 17 Panjiayuan Nanli, Chaoyang District, Beijing, 100021, China
- Haizhu Chen
  - Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Breast Tumor Centre, Department of Medical Oncology, Phase I Clinical Trial Centre, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, 510120, P. R. China
- Yuankai Shi
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, No. 17 Panjiayuan Nanli, Chaoyang District, Beijing, 100021, China
- Xiaohong Han
  - Clinical Pharmacology Research Center, Peking Union Medical College Hospital, State Key Laboratory of Complex Severe and Rare Diseases, NMPA Key Laboratory for Clinical Research and Evaluation of Drug, Beijing Key Laboratory of Clinical PK & PD Investigation for Innovative Drugs, Chinese Academy of Medical Sciences & Peking Union Medical College, No.1, Shuaifuyuan, Dongcheng District, Beijing, 100730, China
6
Li L, Xie W, Zhan L, Wen S, Luo X, Xu S, Cai Y, Tang W, Wang Q, Li M, Xie Z, Deng L, Zhu H, Yu G. Resolving tumor evolution: a phylogenetic approach. Journal of the National Cancer Center 2024; 4:97-106. [PMID: 39282584 PMCID: PMC11390690 DOI: 10.1016/j.jncc.2024.03.001] [Received: 11/05/2023] [Revised: 02/28/2024] [Accepted: 03/20/2024] [Indexed: 09/19/2024]
Abstract
The evolutionary dynamics of cancer, characterized by its profound heterogeneity, demand sophisticated tools for a holistic understanding. This review delves into tumor phylogenetics, an essential approach bridging evolutionary biology with oncology, offering unparalleled insights into cancer's evolutionary trajectory. We provide an overview of the workflow, encompassing study design, data acquisition, and phylogeny reconstruction. Notably, the integration of diverse data sets emerges as a transformative step, enhancing the depth and breadth of evolutionary insights. With this integrated perspective, tumor phylogenetics stands poised to redefine our understanding of cancer evolution and influence therapeutic strategies.
Affiliation(s)
- Lin Li
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Wenqin Xie
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Li Zhan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Shaodi Wen
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - Department of Oncology, The Affiliated Cancer Hospital of Nanjing Medical University & Jiangsu Cancer Hospital, Nanjing, China
- Xiao Luo
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Shuangbin Xu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - Division of Laboratory Medicine, Microbiome Center, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Yantong Cai
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - Dermatology Hospital, Southern Medical University, Guangzhou, China
- Wenli Tang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Qianwen Wang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Ming Li
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Zijing Xie
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Lin Deng
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Hongyuan Zhu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangchuang Yu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
7
Coorens THH, Spencer Chapman M, Williams N, Martincorena I, Stratton MR, Nangalia J, Campbell PJ. Reconstructing phylogenetic trees from genome-wide somatic mutations in clonal samples. Nat Protoc 2024; 19:1866-1886. [PMID: 38396041 DOI: 10.1038/s41596-024-00962-8] [Received: 11/23/2022] [Accepted: 12/13/2023] [Indexed: 02/25/2024]
Abstract
Phylogenetic trees are a powerful means to display the evolutionary history of species, pathogens and, more recently, individual cells of the human body. Whole-genome sequencing of laser capture microdissections or expanded stem cells has allowed the discovery of somatic mutations in clones, which can be used as natural barcodes to reconstruct the developmental history of individual cells. Here we describe Sequoia, our pipeline to reconstruct lineage trees from clones of normal cells. Candidate somatic mutations are called against the human reference genome and filtered to exclude germline mutations and artifactual variants. These filtered somatic mutations form the basis for phylogeny reconstruction using a maximum parsimony framework. Lastly, we use a maximum likelihood framework to explicitly map mutations to branches in the phylogenetic tree. The resulting phylogenies can then serve as a basis for many subsequent analyses, including investigating embryonic development, tissue dynamics in health and disease, and mutational signatures. Sequoia can be readily applied to any clonal somatic mutation dataset, including single-cell DNA sequencing datasets, using the commands and scripts provided. Moreover, Sequoia is highly flexible and can be easily customized. Typically, the runtime of the core script ranges from minutes to an hour for datasets with a moderate number (50,000-150,000) of variants. Competent bioinformatic skills, including in-depth knowledge of the R programming language, are required. A high-performance computing cluster (one that is capable of running mutation-calling algorithms and other aspects of the analysis at scale) is also required, especially if handling large datasets.
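As a rough illustration of the maximum parsimony idea mentioned in this abstract, the sketch below scores one binary mutation on a fixed rooted tree with Fitch's algorithm, counting the minimum number of state changes needed to explain the observed clone genotypes. It is not Sequoia's code. The tree, the clone names (`c1` to `c4`), and the function `fitch_score` are hypothetical and exist only for this example. A full parsimony analysis would also search over tree topologies, which is omitted here.

```python
def fitch_score(tree, root, leaf_states):
    """Minimum number of state changes for one site on a rooted binary tree.

    tree: dict mapping each internal node to its (left, right) children.
    leaf_states: dict mapping each leaf to its observed state (0 = absent, 1 = mutated).
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if node not in tree:               # leaf: its state set is the observation
            return {leaf_states[node]}
        left, right = tree[node]
        a, b = visit(left), visit(right)
        if a & b:                          # children agree on at least one state
            return a & b
        changes += 1                       # disagreement costs one change
        return a | b

    visit(root)
    return changes


# Hypothetical four-clone tree: ((c1, c2), (c3, c4)).
toy_tree = {"root": ("x", "y"), "x": ("c1", "c2"), "y": ("c3", "c4")}

# A mutation shared by sister clones needs a single change on their common branch.
shared = fitch_score(toy_tree, "root", {"c1": 1, "c2": 1, "c3": 0, "c4": 0})

# A mutation scattered across both subtrees needs two changes on this topology.
scattered = fitch_score(toy_tree, "root", {"c1": 1, "c2": 0, "c3": 1, "c4": 0})

print(shared, scattered)
```

Summing this score over all filtered mutations gives the parsimony length of a candidate tree, the quantity a parsimony search tries to minimise.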
Affiliation(s)
- Tim H H Coorens
  - Wellcome Sanger Institute, Hinxton, UK
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael Spencer Chapman
  - Wellcome Sanger Institute, Hinxton, UK
  - Department of Haematology, Barts Health NHS Trust, London, UK
  - Department of Haemato-oncology, Barts Cancer Institute, Queen Mary University of London, London, UK
- Jyoti Nangalia
  - Wellcome Sanger Institute, Hinxton, UK
  - Wellcome-Medical Research Council Cambridge Stem Cell Institute, Cambridge, UK
  - Department of Haematology, University of Cambridge, Cambridge, UK
- Peter J Campbell
  - Wellcome Sanger Institute, Hinxton, UK
  - Wellcome-Medical Research Council Cambridge Stem Cell Institute, Cambridge, UK
8
Sammut SJ, Galson JD, Minter R, Sun B, Chin SF, De Mattos-Arruda L, Finch DK, Schätzle S, Dias J, Rueda OM, Seoane J, Osbourn J, Caldas C, Bashford-Rogers RJM. Predictability of B cell clonal persistence and immunosurveillance in breast cancer. Nat Immunol 2024; 25:916-924. [PMID: 38698238 PMCID: PMC11065701 DOI: 10.1038/s41590-024-01821-0] [Received: 02/20/2024] [Accepted: 03/15/2024] [Indexed: 05/05/2024]
Abstract
B cells and T cells are important components of the adaptive immune system and mediate anticancer immunity. The T cell landscape in cancer is well characterized, but the contribution of B cells to anticancer immunosurveillance is less well explored. Here we show an integrative analysis of the B cell and T cell receptor repertoire from individuals with metastatic breast cancer and individuals with early breast cancer during neoadjuvant therapy. Using immune receptor, RNA and whole-exome sequencing, we show that both B cell and T cell responses seem to coevolve with the metastatic cancer genomes and mirror tumor mutational and neoantigen architecture. B cell clones associated with metastatic immunosurveillance and temporal persistence were more expanded and distinct from site-specific clones. B cell clonal immunosurveillance and temporal persistence are predictable from the clonal structure, with higher-centrality B cell antigen receptors more likely to be detected across multiple metastases or across time. This predictability was generalizable across other immune-mediated disorders. This work lays a foundation for prioritizing antibody sequences for therapeutic targeting in cancer.
MESH Headings
- Humans
- Female
- Breast Neoplasms/immunology
- B-Lymphocytes/immunology
- Immunologic Surveillance
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/immunology
- Receptors, Antigen, T-Cell/metabolism
- Receptors, Antigen, B-Cell/metabolism
- Receptors, Antigen, B-Cell/genetics
- Receptors, Antigen, B-Cell/immunology
- T-Lymphocytes/immunology
- Monitoring, Immunologic
- Exome Sequencing
- Antigens, Neoplasm/immunology
- Neoplasm Metastasis
- Clone Cells
Affiliation(s)
- Stephen-John Sammut
  - Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
  - The Royal Marsden Hospital NHS Foundation Trust, London, UK
- Bo Sun
  - Wellcome Centre for Human Genetics, Oxford, UK
  - Nuffield Department of Clinical Neuroscience, University of Oxford, Oxford, UK
- Suet-Feung Chin
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Leticia De Mattos-Arruda
  - IrsiCaixa, Germans Trias i Pujol University Hospital, Badalona, Spain
  - Germans Trias i Pujol Research Institute (IGTP), Badalona, Spain
- Oscar M Rueda
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Joan Seoane
  - Vall d'Hebron Institute of Oncology (VHIO), Vall d'Hebron University Hospital, Institució Catalana de Recerca i Estudis Avançats (ICREA), Universitat Autònoma de Barcelona (UAB), CIBERONC, Barcelona, Spain
- Carlos Caldas
  - School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Rachael J M Bashford-Rogers
  - Wellcome Centre for Human Genetics, Oxford, UK
  - Department of Biochemistry, University of Oxford, Oxford, UK
  - Oxford Cancer Centre, Oxford, UK
9
Koptagel H, Jun SH, Hård J, Lagergren J. Scuphr: A probabilistic framework for cell lineage tree reconstruction. PLoS Comput Biol 2024; 20:e1012094. [PMID: 38723024 PMCID: PMC11125557 DOI: 10.1371/journal.pcbi.1012094] [Received: 12/12/2022] [Revised: 05/24/2024] [Accepted: 04/20/2024] [Indexed: 05/25/2024]
Abstract
Cell lineage tree reconstruction methods are developed for various tasks, such as investigating development, differentiation, and cancer progression. Single-cell sequencing technologies enable more thorough analysis with higher resolution. We present Scuphr, a distance-based cell lineage tree reconstruction method using bulk and single-cell DNA sequencing data from healthy tissues. Common challenges of single-cell DNA sequencing, such as allelic dropouts and amplification errors, are included in Scuphr. Scuphr computes the distance between cell pairs and reconstructs the lineage tree using the neighbor-joining algorithm. With its embarrassingly parallel design, Scuphr can do faster analysis than the state-of-the-art methods while obtaining better accuracy. The method's robustness is investigated using various synthetic datasets and a biological dataset of 18 cells.
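To make the neighbor-joining step named in this abstract concrete, here is a minimal sketch of the textbook algorithm applied to a small pairwise distance matrix. It does not reproduce Scuphr's probabilistic distance computation or its parallel design. The cell labels, the toy distances, and the function name `neighbor_joining` are assumptions made purely for illustration.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining.

    names: list of leaf labels (at least two).
    dist: dict mapping frozenset({a, b}) to the distance between a and b.
    Returns a list of edges (node, node, branch_length).
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence: summed distance from each node to all others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimises the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        u = f"n{next_id}"                  # label for the new internal node
        next_id += 1
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((i, u, li))
        edges.append((j, u, dij - li))
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Toy additive distance matrix for five hypothetical cells.
cells = ["a", "b", "c", "d", "e"]
rows = {
    "a": [0, 5, 9, 9, 8],
    "b": [5, 0, 10, 10, 9],
    "c": [9, 10, 0, 8, 7],
    "d": [9, 10, 8, 0, 3],
    "e": [8, 9, 7, 3, 0],
}
toy_dist = {
    frozenset((x, y)): rows[x][cells.index(y)] for x, y in combinations(cells, 2)
}

nj_edges = neighbor_joining(cells, toy_dist)
for node_a, node_b, length in nj_edges:
    print(node_a, node_b, length)
```

Each round of the loop scans all pairs of remaining nodes, so this plain version is only practical for small inputs such as the toy matrix above.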
Affiliation(s)
- Hazal Koptagel
  - School of EECS, KTH Royal Institute of Technology, Stockholm, Sweden
  - Science for Life Laboratory, Stockholm, Sweden
- Seong-Hwan Jun
  - Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York, United States of America
- Joanna Hård
  - Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- Jens Lagergren
  - School of EECS, KTH Royal Institute of Technology, Stockholm, Sweden
  - Science for Life Laboratory, Stockholm, Sweden
10
Weideman AMK, Wang R, Ibrahim JG, Jiang Y. Canopy2: tumor phylogeny inference by bulk DNA and single-cell RNA sequencing. bioRxiv: the preprint server for biology 2024:2024.03.18.585595. [PMID: 38562795 PMCID: PMC10983938 DOI: 10.1101/2024.03.18.585595] [Indexed: 04/04/2024]
Abstract
Tumors are comprised of a mixture of distinct cell populations that differ in terms of genetic makeup and function. Such heterogeneity plays a role in the development of drug resistance and the ineffectiveness of targeted cancer therapies. Insight into this complexity can be obtained through the construction of a phylogenetic tree, which illustrates the evolutionary lineage of tumor cells as they acquire mutations over time. We propose Canopy2, a Bayesian framework that uses single nucleotide variants derived from bulk DNA and single-cell RNA sequencing to infer tumor phylogeny and conduct mutational profiling of tumor subpopulations. Canopy2 uses Markov chain Monte Carlo methods to sample from a joint probability distribution involving a mixture of binomial and beta-binomial distributions, specifically chosen to account for the sparsity and stochasticity of the single-cell data. Canopy2 demystifies the sources of zeros in the single-cell data and separates zeros categorized as non-cancerous (cells without mutations), stochastic (mutations not expressed due to bursting), and technical (expressed mutations not picked up by sequencing). Simulations demonstrate that Canopy2 consistently outperforms competing methods and reconstructs the clonal tree with high fidelity, even in situations involving low sequencing depth, poor single-cell yield, and highly-advanced and polyclonal tumors. We further assess the performance of Canopy2 through application to breast cancer and glioblastoma data, benchmarking against existing methods. Canopy2 is an open-source R package available at https://github.com/annweideman/canopy2.
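The abstract mentions binomial and beta-binomial distributions chosen to handle sparse, stochastic single-cell data. The sketch below is not Canopy2's likelihood. It only compares the probability of observing zero variant reads under a binomial and under a beta-binomial with the same mean, to show how overdispersion puts more mass on zeros. The read depth, allele fraction, and shape parameters are arbitrary assumptions.

```python
from math import comb, exp, lgamma


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) when the success probability is drawn from Beta(alpha, beta)."""
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


# Assumed toy setting: 10 reads covering a site, mean variant fraction 0.2.
n_reads = 10
mean_fraction = 0.2
alpha, beta = 0.2, 0.8   # Beta mean alpha / (alpha + beta) = 0.2, strongly overdispersed

p_zero_binomial = binomial_pmf(0, n_reads, mean_fraction)
p_zero_beta_binomial = beta_binomial_pmf(0, n_reads, alpha, beta)

print(f"P(0 variant reads), binomial:      {p_zero_binomial:.3f}")
print(f"P(0 variant reads), beta-binomial: {p_zero_beta_binomial:.3f}")
```

With the same mean, the beta-binomial assigns a noticeably higher probability to zero counts. This is one reason such a distribution can suit data where expression is bursty.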
Affiliation(s)
- Ann Marie K. Weideman
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Rujin Wang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Joseph G. Ibrahim
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Yuchao Jiang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
11
|
Cho JW, Cao J, Hemberg M. Joint analysis of mutational and transcriptional landscapes in human cancer reveals key perturbations during cancer evolution. Genome Biol 2024; 25:65. [PMID: 38459554 PMCID: PMC10921788 DOI: 10.1186/s13059-024-03201-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 02/19/2024] [Indexed: 03/10/2024] Open
Abstract
BACKGROUND Tumors are able to acquire new capabilities, including traits such as drug resistance and metastasis that are associated with unfavorable clinical outcomes. Single-cell technologies have made it possible to study both mutational and transcriptomic profiles, but as most studies have been conducted on model systems, little is known about cancer evolution in human patients. Hence, a better understanding of cancer evolution could have important implications for treatment strategies.
RESULTS Here, we analyze cancer evolution and clonal selection by jointly considering mutational and transcriptomic profiles of single cells acquired from tumor biopsies from 49 lung cancer samples and 51 samples with chronic myeloid leukemia. Comparing the two profiles, we find that each clone is associated with a preferred transcriptional state. For metastasis and drug resistance, we find that the number of mutations affecting related genes increases as the clone evolves, while changes in gene expression profiles are limited. Surprisingly, we find that mutations affecting ligand-receptor interactions with the tumor microenvironment frequently emerge as clones acquire drug resistance.
CONCLUSIONS Our results show that lung cancer and chronic myeloid leukemia maintain a high clonal and transcriptional diversity, and we find little evidence in favor of clonal sweeps. This suggests that for these cancers selection based solely on growth rate is unlikely to be the dominating driving force during cancer evolution.
Affiliation(s)
- Jae-Won Cho
- The Gene Lay Institute of Immunology and Inflammation, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Jingyi Cao
- The Gene Lay Institute of Immunology and Inflammation, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Martin Hemberg
- The Gene Lay Institute of Immunology and Inflammation, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
|
12
|
Lai J, Liu Y, Scharpf RB, Karchin R. Evaluation of simulation methods for tumor subclonal reconstruction. arXiv 2024:arXiv:2402.09599v1. [PMID: 38410652 PMCID: PMC10896360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
Most neoplastic tumors originate from a single cell, and their evolution can be genetically traced through lineages characterized by common alterations such as small somatic mutations (SSMs), copy number alterations (CNAs), structural variants (SVs), and aneuploidies. Due to the complexity of these alterations in most tumors and the errors introduced by sequencing protocols and calling algorithms, tumor subclonal reconstruction algorithms are necessary to recapitulate the DNA sequence composition and tumor evolution in silico. With a growing number of these algorithms available, there is a pressing need for consistent and comprehensive benchmarking, which relies on realistic tumor sequencing generated by simulation tools. Here, we examine the current simulation methods, identifying their strengths and weaknesses, and provide recommendations for their improvement. Our review also explores potential new directions for research in this area. This work aims to serve as a resource for understanding and enhancing tumor genomic simulations, contributing to the advancement of the field.
Affiliation(s)
- Jiaying Lai
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
| | - Yunzhou Liu
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
| | - Robert B. Scharpf
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD
- Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD
| | - Rachel Karchin
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD
- Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD
|
13
|
Liu F, Shi F, Yu Z. Inferring single-cell copy number profiles through cross-cell segmentation of read counts. BMC Genomics 2024; 25:25. [PMID: 38166601 PMCID: PMC10762977 DOI: 10.1186/s12864-023-09901-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 12/12/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND Copy number alteration (CNA) is one of the major genomic variations that frequently occur in cancers, and accurate inference of CNAs is essential for unmasking intra-tumor heterogeneity (ITH) and tumor evolutionary history. Single-cell DNA sequencing (scDNA-seq) makes it convenient to profile CNAs at single-cell resolution, and thus aids in better characterization of ITH. Although several computational methods have been proposed to decipher single-cell CNAs, their performance is limited in either breakpoint detection or copy number estimation due to the high dimensionality and noisy nature of read count data.
RESULTS By treating breakpoint detection as a process to segment a high-dimensional read count sequence, we develop a novel method called DeepCNA for cross-cell segmentation of read count sequences and per-cell inference of CNAs. To cope with the difficulty of segmentation, an autoencoder (AE) network is employed in DeepCNA to project the original data into a low-dimensional space, where the breakpoints can be efficiently detected along each latent dimension and further merged to obtain the final breakpoints. Unlike the existing methods that manually calculate certain statistics of read counts to find breakpoints, the AE model makes it convenient to automatically learn the representations. Based on the inferred breakpoints, we employ a mixture model to predict copy numbers of segments for each cell, and leverage an expectation-maximization algorithm to efficiently estimate cell ploidy by exploring the most abundant copy number state. Benchmarking results on simulated and real data demonstrate that our method is able to accurately infer breakpoints as well as absolute copy numbers and surpasses the existing methods under different test conditions. DeepCNA can be accessed at https://github.com/zhyu-lab/deepcna.
CONCLUSIONS Profiling single-cell CNAs based on deep learning is becoming a new paradigm of scDNA-seq data analysis, and DeepCNA is an enhancement to the current arsenal of computational methods for investigating cancer genomics.
Affiliation(s)
- Furui Liu
- School of Information Engineering, Ningxia University, Yinchuan, 750021, China
| | - Fangyuan Shi
- School of Information Engineering, Ningxia University, Yinchuan, 750021, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-Founded By Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, 750021, China
| | - Zhenhua Yu
- School of Information Engineering, Ningxia University, Yinchuan, 750021, China.
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-Founded By Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, 750021, China.
|
14
|
Rossi N, Gigante N, Vitacolonna N, Piazza C. Inferring Markov Chains to Describe Convergent Tumor Evolution With CIMICE. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:106-119. [PMID: 38015671 DOI: 10.1109/tcbb.2023.3337258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
The field of tumor phylogenetics focuses on studying the differences within cancer cell populations. Considerable effort within the scientific community goes into building cancer progression models that try to explain the heterogeneity of such diseases. These models are highly dependent on the kind of data used for their construction; therefore, as experimental technologies evolve, it is of major importance to exploit their peculiarities. In this work we describe a cancer progression model based on single-cell DNA sequencing data. When constructing the model, we focus on tailoring the formalism to the specificity of the data. We operate by defining a minimal set of assumptions needed to reconstruct a flexible DAG-structured model, capable of identifying progression beyond the limitation of the infinite sites assumption. Our proposal is conservative in the sense that we aim to neither discard nor infer knowledge which is not represented in the data. We provide simulations and analytical results to show the features of our model, test it on real data, and show how it can be integrated with other approaches to cope with input noise. Moreover, our framework can be exploited to produce simulated data that follow our theoretical assumptions. Finally, we provide an open-source R implementation of our approach, called CIMICE, that is publicly available on Bioconductor.
|
15
|
Han Y, Molloy EK. Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model. Algorithms Mol Biol 2023; 18:19. [PMID: 38041123 PMCID: PMC10691101 DOI: 10.1186/s13015-023-00248-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 11/19/2023] [Indexed: 12/03/2023] Open
Abstract
Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are then implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics.
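The quartet construction described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the input layout (one dict per mutation, mapping a cell name to 1, 0, or None for missing) and all function names are assumptions made for illustration. Each mutation that is present in two cells and absent from two others contributes one quartet, and for any four cells the topology with the most support is selected.

```python
from collections import Counter
from itertools import combinations


def implied_quartets(column):
    """Yield the quartets ab|cd implied by one mutation.

    `column` is a hypothetical layout: dict of cell name -> 1 (present),
    0 (absent), or None (missing). A quartet is stored as a frozenset of
    two sorted cell pairs, so ab|cd and cd|ab compare equal.
    """
    present = sorted(c for c, v in column.items() if v == 1)
    absent = sorted(c for c, v in column.items() if v == 0)
    for with_mut in combinations(present, 2):
        for without_mut in combinations(absent, 2):
            yield frozenset([with_mut, without_mut])


def quartet_counts(columns):
    """Count how many mutations support each quartet."""
    counts = Counter()
    for column in columns:
        counts.update(implied_quartets(column))
    return counts


def most_supported_topology(counts, cells):
    """Return the best-supported of the three unrooted topologies on four cells."""
    a, b, c, d = sorted(cells)
    candidates = [
        frozenset([(a, b), (c, d)]),
        frozenset([(a, c), (b, d)]),
        frozenset([(a, d), (b, c)]),
    ]
    # Ties are broken in favour of the first candidate listed.
    return max(candidates, key=lambda q: counts[q])


if __name__ == "__main__":
    mutations = [
        {"A": 1, "B": 1, "C": 0, "D": 0},
        {"A": 1, "B": 1, "C": 0, "D": 0},
        {"A": 1, "B": 0, "C": 1, "D": 0},     # a conflicting (possibly erroneous) site
        {"A": 1, "B": None, "C": 0, "D": 0},  # missing value: implies no quartet
    ]
    counts = quartet_counts(mutations)
    print(most_supported_topology(counts, ["A", "B", "C", "D"]))
```

Enumerating all pairs is quadratic in the number of present and absent cells per mutation, so this sketch only suits small inputs; it shows the counting idea rather than a scalable estimator.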
Affiliation(s)
- Yunheng Han
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Erin K Molloy
- Department of Computer Science, University of Maryland, College Park, MD, USA.
- University of Maryland Institute for Advanced Computer Studies, College Park, MD, USA.
|
16
|
Sashittal P, Zhang H, Iacobuzio-Donahue CA, Raphael BJ. ConDoR: tumor phylogeny inference with a copy-number constrained mutation loss model. Genome Biol 2023; 24:272. [PMID: 38037115 PMCID: PMC10688497 DOI: 10.1186/s13059-023-03106-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Accepted: 11/07/2023] [Indexed: 12/02/2023] Open
Abstract
A tumor contains a diverse collection of somatic mutations that reflect its past evolutionary history and that range in scale from single nucleotide variants (SNVs) to large-scale copy-number aberrations (CNAs). However, no current single-cell DNA sequencing (scDNA-seq) technology produces accurate measurements of both SNVs and CNAs, complicating the inference of tumor phylogenies. We introduce a new evolutionary model, the constrained k-Dollo model, that uses SNVs as phylogenetic markers but constrains losses of SNVs according to clusters of cells. We derive an algorithm, ConDoR, that infers phylogenies from targeted scDNA-seq data using this model. We demonstrate the advantages of ConDoR on simulated and real scDNA-seq data.
Affiliation(s)
| | - Haochen Zhang
- Gerstner Sloan Kettering Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, NY, USA
| | - Christine A Iacobuzio-Donahue
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, NY, USA
- David M. Rubenstein Center for Pancreatic Cancer Research, Memorial Sloan Kettering Cancer Center, NY, USA
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, NY, USA
|
17
|
Sollier E, Kuipers J, Takahashi K, Beerenwinkel N, Jahn K. COMPASS: joint copy number and mutation phylogeny reconstruction from amplicon single-cell sequencing data. Nat Commun 2023; 14:4921. [PMID: 37582954 PMCID: PMC10427627 DOI: 10.1038/s41467-023-40378-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/25/2022] [Accepted: 07/19/2023] [Indexed: 08/17/2023] Open
Abstract
Reconstructing the history of somatic DNA alterations can help understand the evolution of a tumor and predict its resistance to treatment. Single-cell DNA sequencing (scDNAseq) can be used to investigate clonal heterogeneity and to inform phylogeny reconstruction. However, most existing phylogenetic methods for scDNAseq data are designed either for single nucleotide variants (SNVs) or for large copy number alterations (CNAs), or are not applicable to targeted sequencing. Here, we develop COMPASS, a computational method for inferring the joint phylogeny of SNVs and CNAs from targeted scDNAseq data. We evaluate COMPASS on simulated data and apply it to several datasets including a cohort of 123 patients with acute myeloid leukemia. COMPASS detected clonal CNAs that could be orthogonally validated with bulk data, in addition to subclonal ones that require single-cell resolution, some of which point toward convergent evolution.
Affiliation(s)
- Etienne Sollier
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Koichi Takahashi
- Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Katharina Jahn
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland.
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany.
|
18
|
Liu X, Griffiths JI, Bishara I, Liu J, Bild AH, Chang JT. Phylogenetic inference from single-cell RNA-seq data. Sci Rep 2023; 13:12854. [PMID: 37553438 PMCID: PMC10409753 DOI: 10.1038/s41598-023-39995-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 08/03/2023] [Indexed: 08/10/2023] Open
Abstract
Tumors are comprised of subpopulations of cancer cells that harbor distinct genetic profiles and phenotypes that evolve over time and during treatment. By reconstructing the course of cancer evolution, we can understand the acquisition of the malignant properties that drive tumor progression. Unfortunately, recovering the evolutionary relationships of individual cancer cells linked to their phenotypes remains a difficult challenge. To address this need, we have developed PhylinSic, a method that reconstructs the phylogenetic relationships among cells linked to their gene expression profiles from single cell RNA-sequencing (scRNA-Seq) data. This method calls nucleotide bases using a probabilistic smoothing approach and then estimates a phylogenetic tree using a Bayesian modeling algorithm. We showed that PhylinSic identified evolutionary relationships underpinning drug selection and metastasis and was sensitive enough to identify subclones from genetic drift. We found that breast cancer tumors resistant to chemotherapies harbored multiple genetic lineages that independently acquired high K-Ras and β-catenin, suggesting that therapeutic strategies may need to control multiple lineages to be durable. These results demonstrated that PhylinSic can reconstruct evolution and link the genotypes and phenotypes of cells across monophyletic tumors using scRNA-Seq.
Affiliation(s)
- Xuan Liu
- Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, 6431 Fannin St, MSB 4.218, Houston, TX, 77030, USA
| | - Jason I Griffiths
- Division of Molecular Pharmacology, Department of Medical Oncology & Clinical Therapeutics, City of Hope, Monrovia, CA, USA
| | - Isaac Bishara
- Division of Molecular Pharmacology, Department of Medical Oncology & Clinical Therapeutics, City of Hope, Monrovia, CA, USA
| | - Jiayi Liu
- Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, 6431 Fannin St, MSB 4.218, Houston, TX, 77030, USA
| | - Andrea H Bild
- Division of Molecular Pharmacology, Department of Medical Oncology & Clinical Therapeutics, City of Hope, Monrovia, CA, USA
| | - Jeffrey T Chang
- Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, 6431 Fannin St, MSB 4.218, Houston, TX, 77030, USA.
|
19
|
Becker D, Champredon D, Chato C, Gugan G, Poon A. SUP: a probabilistic framework to propagate genome sequence uncertainty, with applications. NAR Genom Bioinform 2023; 5:lqad038. [PMID: 37101658 PMCID: PMC10124968 DOI: 10.1093/nargab/lqad038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 02/15/2023] [Accepted: 04/06/2023] [Indexed: 04/28/2023] Open
Abstract
Genetic sequencing is subject to many different types of errors, but most analyses treat the resultant sequences as if they are known without error. Next generation sequencing methods rely on significantly larger numbers of reads than previous sequencing methods in exchange for a loss of accuracy in each individual read. Still, the coverage of such machines is imperfect and leaves uncertainty in many of the base calls. In this work, we demonstrate that the uncertainty in sequencing techniques will affect downstream analysis and propose a straightforward method to propagate the uncertainty. Our method (which we have dubbed Sequence Uncertainty Propagation, or SUP) uses a probabilistic matrix representation of individual sequences which incorporates base quality scores as a measure of uncertainty that naturally lead to resampling and replication as a framework for uncertainty propagation. With the matrix representation, resampling possible base calls according to quality scores provides a bootstrap- or prior distribution-like first step towards genetic analysis. Analyses based on these re-sampled sequences will include a more complete evaluation of the error involved in such analyses. We demonstrate our resampling method on SARS-CoV-2 data. The resampling procedures add a linear computational cost to the analyses, but the large impact on the variance in downstream estimates makes it clear that ignoring this uncertainty may lead to overly confident conclusions. We show that SARS-CoV-2 lineage designations via Pangolin are much less certain than the bootstrap support reported by Pangolin would imply and the clock rate estimates for SARS-CoV-2 are much more variable than reported.
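The resampling idea in this abstract can be sketched as follows. This is an illustrative approximation, not the SUP software: it assumes Phred-scaled integer quality scores, converts each to an error probability with the standard relation p = 10^(-Q/10), and, as a simplifying assumption not stated in the abstract, splits the error mass evenly over the three uncalled bases. All function names are hypothetical.

```python
import random

BASES = "ACGT"


def phred_to_error(q):
    """Standard Phred relation: error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def probability_matrix(seq, quals):
    """One row per position: probabilities for A, C, G, T.

    Assumption: the called base gets 1 - p and each other base gets p / 3.
    """
    rows = []
    for base, q in zip(seq, quals):
        p_err = phred_to_error(q)
        rows.append([1 - p_err if b == base else p_err / 3 for b in BASES])
    return rows


def resample(matrix, rng):
    """Draw one sequence by sampling each position from its row."""
    return "".join(rng.choices(BASES, weights=row, k=1)[0] for row in matrix)


def resample_many(seq, quals, n, seed=None):
    """Draw n replicate sequences for use in a downstream analysis."""
    rng = random.Random(seed)
    matrix = probability_matrix(seq, quals)
    return [resample(matrix, rng) for _ in range(n)]


if __name__ == "__main__":
    # Low-quality positions (Q=5) will vary between replicates far more
    # often than high-quality ones (Q=40).
    for replicate in resample_many("ACGTAC", [40, 40, 5, 5, 40, 40], n=5, seed=1):
        print(replicate)
```

Running a downstream analysis once per replicate and summarising the spread of the results is what propagates the base-call uncertainty; the cost grows linearly with the number of replicates, in line with the abstract's remark on computational cost.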
Affiliation(s)
- Devan Becker
- To whom correspondence should be addressed. Tel: +1 519 884 1970 (Ext 2464);
| | | | - Connor Chato
- Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
| | - Gopi Gugan
- Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
| | - Art Poon
- Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
|
20
|
Brockley LJ, Souza VGP, Forder A, Pewarchuk ME, Erkan M, Telkar N, Benard K, Trejo J, Stewart MD, Stewart GL, Reis PP, Lam WL, Martinez VD. Sequence-Based Platforms for Discovering Biomarkers in Liquid Biopsy of Non-Small-Cell Lung Cancer. Cancers (Basel) 2023; 15:2275. [PMID: 37190212 PMCID: PMC10136462 DOI: 10.3390/cancers15082275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Revised: 04/07/2023] [Accepted: 04/11/2023] [Indexed: 05/17/2023] Open
Abstract
Lung cancer detection and monitoring are hampered by a lack of sensitive biomarkers, which results in diagnosis at late stages and difficulty in tracking response to treatment. Recent developments have established liquid biopsies as promising non-invasive methods for detecting biomarkers in lung cancer patients. With concurrent advances in high-throughput sequencing technologies and bioinformatics tools, new approaches for biomarker discovery have emerged. In this article, we survey established and emerging biomarker discovery methods using nucleic acid materials derived from bodily fluids in the context of lung cancer. We introduce nucleic acid biomarkers extracted from liquid biopsies and outline biological sources and methods of isolation. We discuss next-generation sequencing (NGS) platforms commonly used to identify novel biomarkers and describe how these have been applied to liquid biopsy. We highlight emerging biomarker discovery methods, including applications of long-read sequencing, fragmentomics, whole-genome amplification methods for single-cell analysis, and whole-genome methylation assays. Finally, we discuss advanced bioinformatics tools, describing methods for processing NGS data, as well as recently developed software tailored for liquid biopsy biomarker detection, which holds promise for early diagnosis of lung cancer.
Affiliation(s)
- Liam J. Brockley
- British Columbia Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada
| | - Vanessa G. P. Souza
- British Columbia Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada
- Molecular Oncology Laboratory, Experimental Research Unit, School of Medicine, São Paulo State University (UNESP), Botucatu 18618-687, SP, Brazil
| | - Aisling Forder
- British Columbia Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada
| | - Michelle E. Pewarchuk
- British Columbia Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada
| | - Melis Erkan
- Department of Pathology and Laboratory Medicine, IWK Health Centre, Halifax, NS B3K 6R8, Canada
- Department of Pathology, Faculty of Medicine, Dalhousie University, Halifax, NS B3K 6R8, Canada
- Beatrice Hunter Cancer Research Institute, Halifax, NS B3H 4R2, Canada
| | - Nikita Telkar
- British Columbia Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada
- British Columbia Children’s Hospital Research Institute, Vancouver, BC V5Z 4H4, Canada
| | - Katya Benard
- British Columbia Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada
| | - Jessica Trejo
- British Columbia Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada
| | - Matt D. Stewart
- British Columbia Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada
| | - Greg L. Stewart
- British Columbia Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada
| | - Patricia P. Reis
- Molecular Oncology Laboratory, Experimental Research Unit, School of Medicine, São Paulo State University (UNESP), Botucatu 18618-687, SP, Brazil
- Department of Surgery and Orthopedics, Faculty of Medicine, São Paulo State University (UNESP), Botucatu 18618-687, SP, Brazil
| | - Wan L. Lam
- British Columbia Cancer Research Institute, Vancouver, BC V5Z 1L3, Canada
| | - Victor D. Martinez
- Department of Pathology and Laboratory Medicine, IWK Health Centre, Halifax, NS B3K 6R8, Canada
- Department of Pathology, Faculty of Medicine, Dalhousie University, Halifax, NS B3K 6R8, Canada
- Beatrice Hunter Cancer Research Institute, Halifax, NS B3H 4R2, Canada
|
21
|
Guo J, Han X, Li J, Li Z, Yi J, Gao Y, Zhao X, Yue W. Single-cell transcriptomics in ovarian cancer identify a metastasis-associated cell cluster overexpressed RAB13. J Transl Med 2023; 21:254. [PMID: 37046345 PMCID: PMC10091580 DOI: 10.1186/s12967-023-04094-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Accepted: 03/28/2023] [Indexed: 04/14/2023] Open
Abstract
BACKGROUND Metastasis, the leading cause of cancer-related death in patients diagnosed with ovarian cancer (OC), is a complex process that involves multiple biological effects. With the continuous development of sequencing technology, single-cell sequencing has emerged as a promising strategy to understand the pathogenesis of ovarian cancer.
METHODS By integrating 10x single-cell data from 12 samples, we developed a single-cell map of primary and metastatic OC. Using copy-number variation analysis, pseudotime analysis, enrichment analysis, and cell-cell communication analysis, we explored the heterogeneity among OC cells. We performed differential expression analysis and high-dimensional weighted gene co-expression network analysis to identify the hub genes of C4. The effects of RAB13 on OC cell lines were validated in vitro.
RESULTS We discovered a cell subcluster, referred to as C4, that is closely associated with metastasis and poor prognosis in OC. This subcluster correlated with an epithelial-mesenchymal transition (EMT) and angiogenesis signature, and RAB13 was identified as its key marker. Downregulation of RAB13 reduced the migration and invasion of OC cells. Additionally, we predicted several potential drugs that might inhibit RAB13.
CONCLUSIONS Our study has identified a cell subcluster that is closely linked to metastasis in OC, and we have also identified RAB13 as its hub gene, which has great potential to become a new therapeutic target for OC.
Affiliation(s)
- Jiahao Guo
- Central Laboratory, Beijing Maternal and Child Health Care Hospital, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing, 100026, China
| | - Xiaoyang Han
- Central Laboratory, Beijing Maternal and Child Health Care Hospital, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing, 100026, China
| | - Jie Li
- Central Laboratory, Beijing Maternal and Child Health Care Hospital, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing, 100026, China
| | - Zhefeng Li
- Central Laboratory, Beijing Maternal and Child Health Care Hospital, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing, 100026, China
| | - Junjie Yi
- Central Laboratory, Beijing Maternal and Child Health Care Hospital, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing, 100026, China
| | - Yan Gao
- Central Laboratory, Beijing Maternal and Child Health Care Hospital, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing, 100026, China.
| | - Xiaoting Zhao
- Central Laboratory, Beijing Maternal and Child Health Care Hospital, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing, 100026, China.
| | - Wentao Yue
- Central Laboratory, Beijing Maternal and Child Health Care Hospital, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing, 100026, China.
|
22
|
Chen Z, Zhang B, Gong F, Wan L, Ma L. RobustTree: An adaptive, robust PCA algorithm for embedded tree structure recovery from single-cell sequencing data. Front Genet 2023; 14:1110899. [PMID: 36968591 PMCID: PMC10030613 DOI: 10.3389/fgene.2023.1110899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 02/13/2023] [Indexed: 03/11/2023] Open
Abstract
Robust Principal Component Analysis (RPCA) offers a powerful tool for recovering a low-rank matrix from highly corrupted data, with growing applications in computational biology. Biological processes commonly form intrinsic hierarchical structures, such as tree structures of cell development trajectories and tumor evolutionary history. The rapid development of single-cell sequencing (SCS) technology calls for the recovery of embedded tree structures from noisy and heterogeneous SCS data. In this study, we propose RobustTree, a unified framework to reconstruct the inherent topological structure underlying high-dimensional data with noise. By extending RPCA to handle tree structure optimization, RobustTree leverages data denoising, clustering, and tree structure reconstruction. It solves the tree optimization problem with an adaptive parameter selection scheme that we proposed. In addition to recovering real datasets, RobustTree can reconstruct continuous topological structure and discrete-state topological structure of underlying SCS data. We apply RobustTree on multiple synthetic and real datasets and demonstrate its high accuracy and robustness when analyzing high-noise SCS data with embedded complex structures. The code is available at https://github.com/ucasdp/RobustTree.
Affiliation(s)
- Ziwei Chen
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, United States
| | - Bingwei Zhang
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Fuzhou Gong
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Lin Wan
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- *Correspondence: Lin Wan; Liang Ma
| | - Liang Ma
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- *Correspondence: Lin Wan; Liang Ma
| |
|
23
|
Jun SH, Toosi H, Mold J, Engblom C, Chen X, O'Flanagan C, Hagemann-Jensen M, Sandberg R, Aparicio S, Hartman J, Roth A, Lagergren J. Reconstructing clonal tree for phylo-phenotypic characterization of cancer using single-cell transcriptomics. Nat Commun 2023; 14:982. [PMID: 36813776 PMCID: PMC9946941 DOI: 10.1038/s41467-023-36202-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/17/2021] [Accepted: 01/20/2023] [Indexed: 02/24/2023] Open
Abstract
Functional characterization of cancer clones can shed light on the evolutionary mechanisms driving cancer proliferation and relapse. Single-cell RNA sequencing data provide grounds for understanding the functional state of cancer as a whole; however, much research remains to identify and reconstruct clonal relationships toward characterizing the changes in functions of individual clones. We present PhylEx, which integrates bulk genomics data with co-occurrences of mutations from single-cell RNA sequencing data to reconstruct high-fidelity clonal trees. We evaluate PhylEx on synthetic and well-characterized high-grade serous ovarian cancer cell line datasets. PhylEx outperforms state-of-the-art methods both in clonal tree reconstruction and in identifying clones. We analyze high-grade serous ovarian cancer and breast cancer data to show that PhylEx exploits clonal expression profiles beyond what is possible with expression-based clustering methods and clears the way for accurate inference of clonal trees and robust phylo-phenotypic analysis of cancer.
Affiliation(s)
- Seong-Hwan Jun
- SciLifeLab, School of EECS, KTH Royal Institute of Technology, Stockholm, Sweden
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, USA
| | - Hosein Toosi
- SciLifeLab, School of EECS, KTH Royal Institute of Technology, Stockholm, Sweden
| | - Jeff Mold
- Department of Cell and Molecular Biology, Karolinska Institutet, Solna, Sweden
| | - Camilla Engblom
- Department of Cell and Molecular Biology, Karolinska Institutet, Solna, Sweden
| | - Xinsong Chen
- Department of Oncology and Pathology, Karolinska Institutet, Solna, Sweden
| | - Ciara O'Flanagan
- Department of Molecular Oncology, BC Cancer, Vancouver, BC, Canada
| | | | - Rickard Sandberg
- Department of Cell and Molecular Biology, Karolinska Institutet, Solna, Sweden
| | - Samuel Aparicio
- Department of Molecular Oncology, BC Cancer, Vancouver, BC, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
| | - Johan Hartman
- Department of Oncology and Pathology, Karolinska Institutet, Solna, Sweden
- Department of Clinical Pathology and Cytology, Karolinska University Laboratory, Stockholm, Sweden
| | - Andrew Roth
- Department of Molecular Oncology, BC Cancer, Vancouver, BC, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Department of Computer Science, University of British Columbia, Vancouver, Canada.
| | - Jens Lagergren
- SciLifeLab, School of EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
| |
|
24
|
Moen MT, Johnston IG. HyperHMM: efficient inference of evolutionary and progressive dynamics on hypercubic transition graphs. Bioinformatics 2022; 39:6895098. [PMID: 36511587 PMCID: PMC9848056 DOI: 10.1093/bioinformatics/btac803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 11/11/2022] [Accepted: 12/12/2022] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION The evolution of bacterial drug resistance and other features in biology, the progression of cancer and other diseases and a wide range of broader questions can often be viewed as the sequential stochastic acquisition of binary traits (e.g. genetic changes, symptoms or characters). Using potentially noisy or incomplete data to learn the sequences by which such traits are acquired is a problem of general interest. The problem is complicated for large numbers of traits, which may, individually or synergistically, influence the probability of further acquisitions both positively and negatively. Hypercubic inference approaches, based on hidden Markov models on a hypercubic transition network, address these complications, but previous Bayesian instances can consume substantial time for converged results, limiting their practical use. RESULTS Here, we introduce HyperHMM, an adapted Baum-Welch (expectation-maximization) algorithm for hypercubic inference with resampling to quantify uncertainty, and show that it allows orders-of-magnitude faster inference while making few practical sacrifices compared to previous hypercubic inference approaches. We show that HyperHMM allows any combination of traits to exert arbitrary positive or negative influence on the acquisition of other traits, relaxing a common limitation of only independent trait influences. We apply this approach to synthetic and biological datasets and discuss its more general application in learning evolutionary and progressive pathways. AVAILABILITY AND IMPLEMENTATION Code for inference and visualization, and data for example cases, is freely available at https://github.com/StochasticBiology/hypercube-hmm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marcus T Moen
- Department of Mathematics, University of Bergen, Bergen, Vestland, Norway
| | | |
|
25
|
Yan J, Ma M, Yu Z. bmVAE: a variational autoencoder method for clustering single-cell mutation data. Bioinformatics 2022; 39:6881080. [PMID: 36478203 PMCID: PMC9825778 DOI: 10.1093/bioinformatics/btac790] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/12/2022] [Revised: 10/26/2022] [Accepted: 12/06/2022] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION Genetic intra-tumor heterogeneity (ITH) characterizes the differences in genomic variations between tumor clones, and accurately unmasking ITH is important for personalized cancer therapy. Single-cell DNA sequencing is emerging as a powerful means of deciphering underlying ITH based on point mutations of single cells. However, detecting tumor clones from single-cell mutation data remains challenging due to the error-prone and discrete nature of the data. RESULTS We introduce bmVAE, a bioinformatics tool that learns a low-dimensional latent representation of single cells with a variational autoencoder and then clusters cells into subpopulations in the latent space. bmVAE takes single-cell binary mutation data as input, and outputs inferred cell subpopulations as well as their genotypes. To achieve this, the bmVAE framework consists of three modules: dimensionality reduction, cell clustering and genotype estimation. We assess the method on various synthetic datasets where different factors including false negative rate, data size and data heterogeneity are considered in simulation, and further demonstrate its effectiveness on two real datasets. The results suggest bmVAE is highly effective in inferring ITH, and performs competitively with existing methods. AVAILABILITY AND IMPLEMENTATION bmVAE is freely available at https://github.com/zhyu-lab/bmvae. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiaqian Yan
- School of Information Engineering, Ningxia University, Yinchuan 750021, China
| | - Ming Ma
- School of Information Engineering, Ningxia University, Yinchuan 750021, China
| | - Zhenhua Yu
- To whom correspondence should be addressed.
| |
|
26
|
Kang S, Borgsmüller N, Valecha M, Kuipers J, Alves JM, Prado-López S, Chantada D, Beerenwinkel N, Posada D, Szczurek E. SIEVE: joint inference of single-nucleotide variants and cell phylogeny from single-cell DNA sequencing data. Genome Biol 2022; 23:248. [PMID: 36451239 PMCID: PMC9714196 DOI: 10.1186/s13059-022-02813-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Accepted: 11/08/2022] [Indexed: 12/02/2022] Open
Abstract
We present SIEVE, a statistical method for the joint inference of somatic variants and cell phylogeny under the finite-sites assumption from single-cell DNA sequencing. SIEVE leverages raw read counts for all nucleotides and corrects the acquisition bias of branch lengths. In our simulations, SIEVE outperforms other methods in phylogenetic reconstruction and variant calling accuracy, especially in the inference of homozygous variants. Applying SIEVE to three datasets, one for triple-negative breast (TNBC), and two for colorectal cancer (CRC), we find that double mutant genotypes are rare in CRC but unexpectedly frequent in the TNBC samples.
Affiliation(s)
- Senbai Kang
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| | - Nico Borgsmüller
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
| | - Monica Valecha
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
| | - Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
| | - Joao M. Alves
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
| | - Sonia Prado-López
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Institute of Solid State Electronics E362, Technische Universität Wien, Vienna, Austria
| | - Débora Chantada
- Department of Pathology, Hospital Álvaro Cunqueiro, Vigo, Spain
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
| | - David Posada
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| | - Ewa Szczurek
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| |
|
27
|
Pellegrina L, Vandin F. Discovering significant evolutionary trajectories in cancer phylogenies. Bioinformatics 2022; 38:ii49-ii55. [PMID: 36124798 DOI: 10.1093/bioinformatics/btac467] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Tumors are the result of a somatic evolutionary process leading to substantial intra-tumor heterogeneity. Single-cell and multi-region sequencing enable the detailed characterization of the clonal architecture of tumors and have highlighted its extensive diversity across tumors. While several computational methods have been developed to characterize the clonal composition and the evolutionary history of tumors, the identification of significantly conserved evolutionary trajectories across tumors is still a major challenge. RESULTS We present a new algorithm, MAximal tumor treeS TRajectOries (MASTRO), to discover significantly conserved evolutionary trajectories in cancer. MASTRO discovers all conserved trajectories in a collection of phylogenetic trees describing the evolution of a cohort of tumors, allowing the discovery of conserved complex relations between alterations. MASTRO assesses the significance of the trajectories using a conditional statistical test that captures the coherence in the order in which alterations are observed in different tumors. We apply MASTRO to data from nonsmall-cell lung cancer bulk sequencing and to acute myeloid leukemia data from single-cell panel sequencing, and find significant evolutionary trajectories recapitulating and extending the results reported in the original studies. AVAILABILITY AND IMPLEMENTATION MASTRO is available at https://github.com/VandinLab/MASTRO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Leonardo Pellegrina
- Department of Information Engineering, University of Padova, Padova, 35129, Italy
| | - Fabio Vandin
- Department of Information Engineering, University of Padova, Padova, 35129, Italy
| |
|
28
|
Kızılkale C, Rashidi Mehrabadi F, Sadeqi Azer E, Pérez-Guijarro E, Marie KL, Lee MP, Day CP, Merlino G, Ergün F, Buluç A, Sahinalp SC, Malikić S. Fast intratumor heterogeneity inference from single-cell sequencing data. Nat Comput Sci 2022; 2:577-583. [PMID: 38177468 PMCID: PMC10765963 DOI: 10.1038/s43588-022-00298-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/09/2021] [Accepted: 07/14/2022] [Indexed: 01/06/2024]
Abstract
We introduce HUNTRESS, a computational method for mutational intratumor heterogeneity inference from noisy genotype matrices derived from single-cell sequencing data, the running time of which is linear with the number of cells and quadratic with the number of mutations. We prove that, under reasonable conditions, HUNTRESS computes the true progression history of a tumor with high probability. On simulated and real tumor sequencing data, HUNTRESS is demonstrated to be faster than available alternatives with comparable or better accuracy. Additionally, the progression histories of tumors inferred by HUNTRESS on real single-cell sequencing datasets agree with the best known evolution scenarios for the associated tumors.
Affiliation(s)
- Can Kızılkale
- Department of Electrical Engineering and Computer Sciences UC Berkeley, Berkeley, CA, USA
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Farid Rashidi Mehrabadi
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Department of Computer Science, Indiana University, Bloomington, IN, USA
| | - Erfan Sadeqi Azer
- Department of Computer Science, Indiana University, Bloomington, IN, USA
- Google LLC, Sunnyvale, CA, USA
| | - Eva Pérez-Guijarro
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Kerrie L Marie
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Maxwell P Lee
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Chi-Ping Day
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Glenn Merlino
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Funda Ergün
- Department of Computer Science, Indiana University, Bloomington, IN, USA
| | - Aydın Buluç
- Department of Electrical Engineering and Computer Sciences UC Berkeley, Berkeley, CA, USA
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - S Cenk Sahinalp
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Salem Malikić
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
| |
|
29
|
Foroughmand-Araabi MH, Goliaei S, McHardy AC. Scelestial: Fast and accurate single-cell lineage tree inference based on a Steiner tree approximation algorithm. PLoS Comput Biol 2022; 18:e1009100. [PMID: 35951662 PMCID: PMC9426887 DOI: 10.1371/journal.pcbi.1009100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2021] [Revised: 08/30/2022] [Accepted: 06/23/2022] [Indexed: 11/19/2022] Open
Abstract
Single-cell genome sequencing provides a highly granular view of biological systems but is affected by high error rates, allelic amplification bias, and uneven genome coverage. This creates a need for data-specific computational methods, for purposes such as cell lineage tree inference. The objective of cell lineage tree reconstruction is to infer the evolutionary process that generated a set of observed cell genomes. Lineage trees may enable a better understanding of tumor formation and growth, as well as of organ development for healthy body cells. We describe a method, Scelestial, for lineage tree reconstruction from single-cell data, which is based on an approximation algorithm for the Steiner tree problem and is a generalization of the neighbor-joining method. We adapt the algorithm to efficiently select a limited subset of potential sequences as internal nodes, in the presence of missing values, and to minimize cost by lineage tree-based missing value imputation. In a comparison against seven state-of-the-art single-cell lineage tree reconstruction algorithms (BitPhylogeny, OncoNEM, SCITE, SiFit, SASC, SCIPhI, and SiCloneFit) on simulated and real single-cell tumor samples, Scelestial performed best at reconstructing trees in terms of accuracy and run time. Scelestial has been implemented in C++. It is also available as an R package named RScelestial.
Affiliation(s)
- Mohammad-Hadi Foroughmand-Araabi
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
| | - Sama Goliaei
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
| | - Alice C. McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
| |
|
30
|
Zhang X, Yang L, Lei W, Hou Q, Huang M, Zhou R, Enver T, Wu S. Single-cell sequencing reveals CD133+CD44−-originating evolution and novel stemness related variants in human colorectal cancer. EBioMedicine 2022; 82:104125. [PMID: 35785618 PMCID: PMC9254347 DOI: 10.1016/j.ebiom.2022.104125] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/22/2022] [Revised: 06/06/2022] [Accepted: 06/07/2022] [Indexed: 11/30/2022] Open
Abstract
Background: Tumor heterogeneity of human colorectal cancer (CRC)-initiating cells (CRCICs) in cancer tissues often represents aggressive features of cancer progression. For high-resolution examination of CRCICs, we performed single-cell whole-exome sequencing (scWES) and bulk cell targeted exome sequencing (TES) of CRCICs to investigate stemness-specific somatic alterations or clonal evolution. Methods: Single cells of three subpopulations of CRCICs (CD133+CD44+, CD133−CD44+, and CD133+CD44− cells), CRC cells (CRCCs), and control cells from one CRC tissue were sorted for scWES. Then, we set up a mutation panel from scWES data, and TES was used to validate mutation distribution and clonal evolution in an additional 96 samples (20 patients) that were also sorted into the same three groups of CRCICs and CRCCs. Knock-down experiments were used to analyze stemness-related mutant genes. Neoantigens of these mutant genes and their MHC binding affinity were also analyzed. Findings: Clonal evolution analysis of scWES and TES showed that the CD133+CD44− CRCICs were the likely origin of CRC before evolving into other groups of CRCICs/CRCCs. We revealed that the AHNAK2, PLIN4, HLA-B, ALK, CCDC92 and ALMS1 genes were specifically mutated in CRCICs, and subsequently validated their functions. Furthermore, four predicted neoantigens of AHNAK2 were identified and validated, which might have applications in immunotherapy for CRC patients. Interpretation: The integrative analyses above revealed clonal evolution of CRC and new markers for CRCICs, and demonstrate the important roles of CRCICs in tumorigenesis and progression of CRCs. Funding: A full list of funding bodies that contributed to this study can be found in the Acknowledgements section.
Affiliation(s)
- Xiaoyan Zhang
- Department of Radiotherapy, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
| | - Ling Yang
- Department of Radiotherapy, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
| | - Wanjun Lei
- Novogene Bioinformatics Institute, Beijing, China
| | - Qiang Hou
- Clinical laboratory, Hangzhou Cancer Hospital, Hangzhou, China
| | - Ming Huang
- Clinical laboratory, Hangzhou Cancer Hospital, Hangzhou, China
| | - Rongjing Zhou
- Department of Pathology, Hangzhou Cancer Hospital, Hangzhou, China
| | - Tariq Enver
- Cancer Institute, University College London, United Kingdom.
| | - Shixiu Wu
- Department of Radiotherapy, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, China.
| |
|
31
|
Chen K, Moravec JC, Gavryushkin A, Welch D, Drummond AJ. Accounting for errors in data improves divergence time estimates in single-cell cancer evolution. Mol Biol Evol 2022; 39:6613463. [PMID: 35733333 PMCID: PMC9356729 DOI: 10.1093/molbev/msac143] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
Single-cell sequencing provides a new way to explore the evolutionary history of cells. Compared to traditional bulk sequencing, where a population of heterogeneous cells is pooled to form a single observation, single-cell sequencing isolates and amplifies genetic material from individual cells, thereby preserving the information about the origin of the sequences. However, single-cell data is more error-prone than bulk sequencing data due to the limited genomic material available per cell. Here, we present error and mutation models for evolutionary inference of single-cell data within a mature and extensible Bayesian framework, BEAST2. Our framework enables integration with biologically informative models such as relaxed molecular clocks and population dynamic models. Our simulations show that modeling errors increase the accuracy of relative divergence times and substitution parameters. We reconstruct the phylogenetic history of a colorectal cancer patient and a healthy patient from single-cell DNA sequencing data. We find that the estimated times of terminal splitting events are shifted forward in time compared to models which ignore errors. We observed that not accounting for errors can overestimate the phylogenetic diversity in single-cell DNA sequencing data. We estimate that 30-50% of the apparent diversity can be attributed to error. Our work enables a full Bayesian approach capable of accounting for errors in the data within the integrative Bayesian software framework BEAST2.
Affiliation(s)
- Kylie Chen
- School of Computer Science, University of Auckland, Auckland, New Zealand
| | - Jiří C Moravec
- Department of Computer Science, University of Otago, Dunedin, New Zealand
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| | - Alex Gavryushkin
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| | - David Welch
- School of Computer Science, University of Auckland, Auckland, New Zealand
| | - Alexei J Drummond
- School of Computer Science, University of Auckland, Auckland, New Zealand
- School of Biological Sciences, University of Auckland, Auckland, New Zealand
| |
|
32
|
Markowska M, Cąkała T, Miasojedow B, Aybey B, Juraeva D, Mazur J, Ross E, Staub E, Szczurek E. CONET: copy number event tree model of evolutionary tumor history for single-cell data. Genome Biol 2022; 23:128. [PMID: 35681161 PMCID: PMC9185904 DOI: 10.1186/s13059-022-02693-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/30/2021] [Accepted: 05/23/2022] [Indexed: 11/10/2022] Open
Abstract
Copy number alterations constitute important phenomena in tumor evolution. Whole genome single-cell sequencing gives insight into copy number profiles of individual cells, but is highly noisy. Here, we propose CONET, a probabilistic model for joint inference of the evolutionary tree on copy number events and copy number calling. CONET employs an efficient, regularized MCMC procedure to search the space of possible model structures and parameters. We introduce a range of model priors and penalties for efficient regularization. CONET reveals copy number evolution in two breast cancer samples, and outperforms other methods in tree reconstruction, breakpoint identification and copy number calling.
Affiliation(s)
- Magda Markowska
- University of Warsaw, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, Warsaw, Poland
- Medical University of Warsaw, Postgraduate School of Molecular Medicine, Ks. Trojdena 2a Street, Warsaw, Poland
| | - Tomasz Cąkała
- University of Warsaw, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, Warsaw, Poland
| | - Błażej Miasojedow
- University of Warsaw, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, Warsaw, Poland
| | - Bogac Aybey
- Merck Healthcare KGaA, Translational Medicine, Oncology Bioinformatics, Frankfurter Str. 250, Darmstadt, 64293 Germany
| | - Dilafruz Juraeva
- Merck Healthcare KGaA, Translational Medicine, Oncology Bioinformatics, Frankfurter Str. 250, Darmstadt, 64293 Germany
| | - Johanna Mazur
- Merck Healthcare KGaA, Translational Medicine, Oncology Bioinformatics, Frankfurter Str. 250, Darmstadt, 64293 Germany
| | - Edith Ross
- Merck Healthcare KGaA, Translational Medicine, Oncology Bioinformatics, Frankfurter Str. 250, Darmstadt, 64293 Germany
| | - Eike Staub
- Merck Healthcare KGaA, Translational Medicine, Oncology Bioinformatics, Frankfurter Str. 250, Darmstadt, 64293 Germany
| | - Ewa Szczurek
- University of Warsaw, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, Warsaw, Poland
| |
|
33
|
Thomas DS, Cisneros LH, Anderson ARA, Maley CC. In Silico Investigations of Multi-Drug Adaptive Therapy Protocols. Cancers (Basel) 2022; 14:2699. [PMID: 35681680 PMCID: PMC9179496 DOI: 10.3390/cancers14112699] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/05/2022] [Revised: 05/21/2022] [Accepted: 05/25/2022] [Indexed: 11/17/2022] Open
Abstract
The standard of care for cancer patients aims to eradicate the tumor by killing the maximum number of cancer cells using the maximum tolerated dose (MTD) of a drug. MTD causes significant toxicity and selects for resistant cells, eventually making the tumor refractory to treatment. Adaptive therapy aims to maximize time to progression (TTP) by maintaining sensitive cells to compete with resistant cells. We explored both dose modulation (DM) protocols and fixed dose (FD) protocols interspersed with drug holidays. In contrast to previous single-drug protocols, we explored the determinants of success of two-drug adaptive therapy protocols, using an agent-based model. In almost all cases, DM protocols (but not FD protocols) increased TTP relative to MTD. DM protocols worked well when there was more competition: a higher cost of resistance, greater cell turnover, and crowded proliferating cells that could replace their neighbors. The amount by which the drug dose was changed mattered less. The more sensitive the protocol was to tumor burden changes, the better. In general, protocols that used as little drug as possible worked best. Preclinical experiments should test these predictions, especially dose modulation protocols, with the goal of generating successful clinical trials for greater cancer control.
Affiliation(s)
- Daniel S. Thomas
- Arizona Cancer Evolution Center, Arizona State University, Tempe, AZ 85287, USA; (D.S.T.); (L.H.C.)
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Biodesign Center for Biocomputing, Security and Society, Arizona State University, Tempe, AZ 85287, USA
| | - Luis H. Cisneros
- Arizona Cancer Evolution Center, Arizona State University, Tempe, AZ 85287, USA; (D.S.T.); (L.H.C.)
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Biodesign Center for Biocomputing, Security and Society, Arizona State University, Tempe, AZ 85287, USA
| | | | - Carlo C. Maley
- Arizona Cancer Evolution Center, Arizona State University, Tempe, AZ 85287, USA; (D.S.T.); (L.H.C.)
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Biodesign Center for Biocomputing, Security and Society, Arizona State University, Tempe, AZ 85287, USA
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ 85287, USA
| |
|
34
|
Feng X, Chen L. SCSilicon: a tool for synthetic single-cell DNA sequencing data generation. BMC Genomics 2022; 23:359. [PMID: 35546390 PMCID: PMC9092674 DOI: 10.1186/s12864-022-08566-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Accepted: 04/19/2022] [Indexed: 11/25/2022] Open
Abstract
Background: Single-cell DNA sequencing is becoming indispensable in the study of cell-specific cancer genomics. The performance of computational tools that tackle single-cell genome aberrations may nevertheless be undervalued or overvalued, owing to the insufficient size of benchmarking data. In silico simulation is a cost-effective approach to generate as many single-cell genomes as possible in a controlled manner for reliable and valid benchmarking. Results: This study proposes a new tool, SCSilicon, which efficiently generates single-cell in silico DNA reads with minimal manual intervention. SCSilicon automatically creates a set of genomic aberrations, including SNP, SNV, Indel, and CNV. In addition, SCSilicon yields the ground truth of CNV segmentation breakpoints and subclone cell labels. We have manually inspected a series of synthetic variations. We conducted a sanity check of the state-of-the-art single-cell CNV callers and found SCYN was the most robust one. Conclusions: SCSilicon is a user-friendly software package for users to develop and benchmark single-cell CNV callers. Source code of SCSilicon is available at https://github.com/xikanfeng2/SCSilicon. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08566-w.
Affiliation(s)
- Xikang Feng
- School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi, 710072, China.
| | - Lingxi Chen
- Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
| |
|
35
|
Chen Z, Gong F, Wan L, Ma L. BiTSC$^2$: Bayesian inference of tumor clonal tree by joint analysis of single-cell SNV and CNA data. Brief Bioinform 2022; 23:6562684. [PMID: 35368055 PMCID: PMC9116244 DOI: 10.1093/bib/bbac092] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/06/2021] [Revised: 01/29/2022] [Accepted: 02/23/2022] [Indexed: 12/14/2022] Open
Abstract
The rapid development of single-cell DNA sequencing (scDNA-seq) technology has greatly enhanced the resolution of tumor cell profiling, providing an unprecedented perspective in characterizing intra-tumoral heterogeneity and understanding tumor progression and metastasis. However, prominent algorithms for constructing tumor phylogeny based on scDNA-seq data usually only take single nucleotide variations (SNVs) as markers, failing to consider the effect caused by copy number alterations (CNAs). Here, we propose BiTSC$^2$, Bayesian inference of Tumor clonal Tree by joint analysis of Single-Cell SNV and CNA data. BiTSC$^2$ takes raw reads from scDNA-seq as input, accounts for the overlapping of CNA and SNV, models allelic dropout rate, sequencing errors and missing rate, as well as assigns single cells into subclones. By applying Markov Chain Monte Carlo sampling, BiTSC$^2$ can simultaneously estimate the subclonal scCNA and scSNV genotype matrices, subclonal assignments and tumor subclonal evolutionary tree. In comparison with existing methods on synthetic and real tumor data, BiTSC$^2$ shows high accuracy in genotype recovery, subclonal assignment and tree reconstruction. BiTSC$^2$ also performs robustly in dealing with scDNA-seq data with low sequencing depth and variant missing rate. BiTSC$^2$ software is available at https://github.com/ucasdp/BiTSC2.
Affiliation(s)
- Ziwei Chen
- Institute of Zoology, Chinese Academy of Sciences, Beichen West Road, 100101, Beijing, China
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Zhongguancun East Road, 100190, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Yuquan Road, 100049, Beijing, China
| | - Fuzhou Gong
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Zhongguancun East Road, 100190, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Yuquan Road, 100049, Beijing, China
| | - Lin Wan
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Zhongguancun East Road, 100190, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Yuquan Road, 100049, Beijing, China
| | - Liang Ma
- Institute of Zoology, Chinese Academy of Sciences, Beichen West Road, 100101, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Yuquan Road, 100049, Beijing, China
| |
|
36
|
Farswan A, Gupta R, Gupta A. ARCANE-ROG: Algorithm for Reconstruction of Cancer Evolution from single-cell data using Robust Graph Learning. J Biomed Inform 2022; 129:104055. [PMID: 35337943 DOI: 10.1016/j.jbi.2022.104055] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/22/2021] [Revised: 02/17/2022] [Accepted: 03/12/2022] [Indexed: 11/27/2022]
Abstract
Tumor heterogeneity, marked by the presence of divergent clonal subpopulations of tumor cells, impedes the treatment response in cancer patients. Single-cell sequencing technology provides substantial prospects to gain an in-depth understanding of the cellular phenotypic variability driving tumor progression. A comprehensive insight into the intra-tumor heterogeneity may further assist in dealing with the treatment-resistant clones in cancer patients, thereby improving their overall survival. However, this task is hampered due to the challenges associated with the single-cell data, such as false positives, false negatives and missing bases, and the increase in their size. As a result, the computational cost of the existing methods increases, thereby limiting their usage. In this work, we propose a robust graph learning-based method, ARCANE-ROG (Algorithm for Reconstruction of CANcer Evolution via RObust Graph learning), for inferring clonal evolution from single-cell datasets. The first step of the proposed method is a joint framework of denoising with data imputation for the noisy and incomplete matrix while simultaneously learning an adjacency graph. Both the operations in the joint framework boost each other such that the overall performance of the denoising algorithm is improved. In the second step, an optimal number of clusters are identified via the Leiden method. In the last step, clonal evolution trees are inferred via a minimum spanning tree algorithm. The method has been benchmarked against a state-of-the-art method, RobustClone, using simulated datasets of varying sizes and five real datasets. The performance of our proposed method is found to be significantly superior (p-value < 0.05) in terms of reconstruction error, False Positive to False Negative (FPFN) ratio, tree distance error and V-measure compared to the other method. 
Overall, the proposed method is an improvement over the existing methods as it enhances cluster assignment and inference on clonal hierarchies.
Affiliation(s)
- Akanksha Farswan
- SBILab, Department of ECE, Indraprastha Institute of Information Technology, New Delhi, India
| | - Ritu Gupta
- Laboratory Oncology Unit, Dr. B.R.A. IRCH, AIIMS, New Delhi, India.
| | - Anubha Gupta
- SBILab, Department of ECE, Indraprastha Institute of Information Technology, New Delhi, India.
| |
|
37
|
Yu Z, Du F, Song L. SCClone: Accurate Clustering of Tumor Single-Cell DNA Sequencing Data. Front Genet 2022; 13:823941. [PMID: 35154282 PMCID: PMC8830741 DOI: 10.3389/fgene.2022.823941] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/28/2021] [Accepted: 01/04/2022] [Indexed: 12/11/2022] Open
Abstract
Single-cell DNA sequencing (scDNA-seq) enables high-resolution profiling of genetic diversity among single cells and is especially useful for deciphering the intra-tumor heterogeneity and evolutionary history of tumor. Specific technical issues such as allele dropout, false-positive errors, and doublets make scDNA-seq data incomplete and error-prone, giving rise to a severe challenge of accurately inferring clonal architecture of tumor. To effectively address these issues, we introduce a new computational method called SCClone for reasoning subclones from single nucleotide variation (SNV) data of single cells. Specifically, SCClone leverages a probability mixture model for binary data to cluster single cells into distinct subclones. To accurately decipher underlying clonal composition, a novel model selection scheme based on inter-cluster variance is employed to find the optimal number of subclones. Extensive evaluations on various simulated datasets suggest SCClone has strong robustness against different technical noises in scDNA-seq data and achieves better performance than the state-of-the-art methods in reasoning clonal composition. Further evaluations of SCClone on three real scDNA-seq datasets show that it can effectively find the underlying subclones from severely disturbed data. The SCClone software is freely available at https://github.com/qasimyu/scclone.
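The clustering idea in this abstract, a probability mixture model for binary data, can be illustrated with a plain Bernoulli mixture fitted by expectation-maximization. This is a generic textbook sketch under stated assumptions, not SCClone's own model or code: the function name, the farthest-point seeding, and the fixed number of clusters `k` are hypothetical, and the paper's model selection scheme based on inter-cluster variance is not reproduced.

```python
import math


def bernoulli_mixture_em(X, k, n_iter=50, eps=1e-6):
    """Cluster the rows of a binary matrix X (cells x sites) into k groups.

    Generic Bernoulli mixture fitted by EM (an illustrative assumption,
    not the SCClone implementation). Returns one cluster label per row.
    """
    n, m = len(X), len(X[0])

    def dist_to_seeds(i, seeds):
        # Hamming distance from cell i to its nearest seed cell.
        return min(sum(a != b for a, b in zip(X[i], X[s])) for s in seeds)

    # Deterministic farthest-point seeding: start from cell 0, then add
    # the cell that is farthest from all seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: dist_to_seeds(i, seeds)))

    # theta[c][j]: probability that site j is mutated in cluster c.
    theta = [[0.25 + 0.5 * x for x in X[s]] for s in seeds]
    pi = [1.0 / k] * k  # mixing weights
    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for row in X:
            logp = [
                math.log(pi[c])
                + sum(math.log(t if x else 1.0 - t) for x, t in zip(row, theta[c]))
                for c in range(k)
            ]
            top = max(logp)
            weights = [math.exp(v - top) for v in logp]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: update mixing weights and per-site mutation probabilities,
        # clipping to (eps, 1 - eps) so the logarithms stay finite.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            pi[c] = max(nc / n, eps)
            theta[c] = [
                min(1.0 - eps, max(eps, sum(r[c] * row[j] for r, row in zip(resp, X)) / max(nc, eps)))
                for j in range(m)
            ]
    return [max(range(k), key=lambda c: r[c]) for r in resp]


# Six cells, four sites: the first three and last three cells share a profile
# up to one flipped entry each.
cells = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
print(bernoulli_mixture_em(cells, k=2))
```

In practice the number of subclones is unknown, which is why a model selection step such as the one described in the abstract is needed on top of a fit like this.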
Affiliation(s)
- Zhenhua Yu
- School of Information Engineering, Ningxia University, Yinchuan, China; Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
| | - Fang Du
- School of Information Engineering, Ningxia University, Yinchuan, China; Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
| | - Lijuan Song
- School of Information Engineering, Ningxia University, Yinchuan, China; Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
| |
|
38
|
Kester L, de Barbanson B, Lyubimova A, Chen LT, van der Schrier V, Alemany A, Mooijman D, Peterson-Maduro J, Drost J, de Ridder J, van Oudenaarden A. Integration of multiple lineage measurements from the same cell reconstructs parallel tumor evolution. Cell Genomics 2022; 2:100096. [PMID: 36778661 PMCID: PMC9903660 DOI: 10.1016/j.xgen.2022.100096] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/14/2020] [Revised: 05/23/2021] [Accepted: 01/19/2022] [Indexed: 12/30/2022]
Abstract
Organoid evolution models complemented with integrated single-cell sequencing technology provide a powerful platform to characterize intra-tumor heterogeneity (ITH) and tumor evolution. Here, we conduct a parallel evolution experiment to mimic the tumor evolution process by evolving a colon cancer organoid model over 100 generations, spanning 6 months in time. We use single-cell whole-genome sequencing (WGS) in combination with viral lineage tracing at 12 time points to simultaneously monitor clone size, CNV states, SNV states, and viral lineage barcodes for 1,641 single cells. We integrate these measurements to construct clonal evolution trees with high resolution. We characterize the order of events in which chromosomal aberrations occur and identify aberrations that recur multiple times within the same tumor sub-population. We observe recurrent sequential loss of chromosome 4 after loss of chromosome 18 in four unique tumor clones. SNVs and CNVs identified in our organoid experiments are also frequently reported in colorectal carcinoma samples, and out of 334 patients with chromosome 18 loss in a Memorial Sloan Kettering colorectal cancer cohort, 99 (29.6%) also harbor chromosome 4 loss. Our study reconstructs tumor evolution in a colon cancer organoid model at high resolution, demonstrating an approach to identify potentially clinically relevant genomic aberrations in tumor evolution.
Affiliation(s)
- Lennart Kester
- Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), 3584 CT Utrecht, the Netherlands
| | - Buys de Barbanson
- Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), 3584 CT Utrecht, the Netherlands; Oncode Institute, Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, the Netherlands
| | - Anna Lyubimova
- Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), 3584 CT Utrecht, the Netherlands
| | - Li-Ting Chen
- Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), 3584 CT Utrecht, the Netherlands; Oncode Institute, Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, the Netherlands
| | - Valérie van der Schrier
- Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), 3584 CT Utrecht, the Netherlands
| | - Anna Alemany
- Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), 3584 CT Utrecht, the Netherlands
| | - Dylan Mooijman
- Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), 3584 CT Utrecht, the Netherlands
| | - Josi Peterson-Maduro
- Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), 3584 CT Utrecht, the Netherlands
| | - Jarno Drost
- Oncode Institute, Princess Máxima Center for Pediatric Oncology, 3584 CS Utrecht, the Netherlands
| | - Jeroen de Ridder
- Oncode Institute, Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, the Netherlands (corresponding author)
| | - Alexander van Oudenaarden
- Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), 3584 CT Utrecht, the Netherlands (corresponding author)
| |
|
39
|
Kozlov A, Alves JM, Stamatakis A, Posada D. CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data. Genome Biol 2022; 23:37. [PMID: 35081992 PMCID: PMC8790911 DOI: 10.1186/s13059-021-02583-w] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 05/06/2021] [Accepted: 12/20/2021] [Indexed: 01/15/2023] Open
Abstract
We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and considers amplification error and allelic dropout. We implement CellPhy into RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well on large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that CellPhy is more robust to single-cell genomics errors and outperforms state-of-the-art methods under realistic scenarios, both in accuracy and speed. CellPhy is freely available at https://github.com/amkozlov/cellphy.
Affiliation(s)
- Alexey Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
| | - Joao M. Alves
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
| | - Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
| | - David Posada
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
| |
|
40
|
Govek K, Sikes C, Zhou Y, Oesper L. GraPhyC: Using Consensus to Infer Tumor Evolution. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:465-478. [PMID: 33031032 DOI: 10.1109/tcbb.2020.3029689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
We consider the problem of finding a consensus tumor evolution tree from a set of conflicting input trees. In contrast to traditional phylogenetic trees, the tumor trees we consider do not have the same set of labels applied to the leaves of each tree. We describe several distance measures between these tumor trees. Our GraPhyC algorithm solves the consensus problem using a weighted directed graph where vertices are sets of mutations and edges are weighted based on the number of times a parental relationship is observed between their constituent mutations in the input trees. We find a minimum weight spanning arborescence in this graph and prove that it minimizes the total distance to all input trees for one of our distance measures. We also describe several extensions of our GraPhyC approach. On simulated data we show that GraPhyC outperforms a baseline method and demonstrate that GraPhyC can be an effective means of computing centroids in k-medians clustering. We analyze two real sequencing datasets and find that GraPhyC is able to identify a tree not included in the set of input trees, but that contains characteristics supported by other reported evolutionary reconstructions of this tumor.
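The graph construction described in this abstract can be sketched on a toy example. Everything specific here is an assumption for illustration: each vertex is a single mutation (the paper uses sets of mutations), trees are given as hypothetical child-to-parent dictionaries with a shared `"root"` label, an edge's weight is the negated number of input trees supporting it, and the arborescence is found by brute force rather than by an efficient algorithm, so it only suits very small inputs.

```python
from collections import Counter
from itertools import product


def parent_counts(trees):
    """Count how often each (parent, child) relationship occurs across trees."""
    counts = Counter()
    for tree in trees:
        for child, parent in tree.items():
            counts[(parent, child)] += 1
    return counts


def reaches_root(tree, node, root):
    """True if following parent pointers from `node` ends at `root` (no cycle)."""
    seen = set()
    while node != root:
        if node in seen or node not in tree:
            return False
        seen.add(node)
        node = tree[node]
    return True


def consensus_tree(trees, root):
    """Brute-force spanning arborescence with maximum total support.

    Maximizing the summed counts is the same as minimizing the summed
    negated counts, the edge weighting assumed in this sketch.
    """
    counts = parent_counts(trees)
    nodes = sorted({n for edge in counts for n in edge} - {root})
    # Candidate parents for each node: any parent observed in some input tree.
    candidates = {n: sorted(p for (p, c) in counts if c == n) for n in nodes}
    best, best_score = None, -1
    for parents in product(*(candidates[n] for n in nodes)):
        tree = dict(zip(nodes, parents))
        if all(reaches_root(tree, n, root) for n in nodes):
            score = sum(counts[(p, c)] for c, p in tree.items())
            if score > best_score:
                best, best_score = tree, score
    return best


# Three conflicting input trees over mutations A, B, C (child -> parent).
input_trees = [
    {"A": "root", "B": "A", "C": "B"},
    {"A": "root", "B": "A", "C": "A"},
    {"B": "root", "A": "B", "C": "B"},
]
print(consensus_tree(input_trees, root="root"))
```

For realistic numbers of mutations the brute-force search would be replaced by a polynomial-time spanning arborescence algorithm such as Chu-Liu/Edmonds.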
|
41
|
RDAClone: Deciphering Tumor Heterozygosity through Single-Cell Genomics Data Analysis with Robust Deep Autoencoder. Genes (Basel) 2021; 12:genes12121847. [PMID: 34946794 PMCID: PMC8701080 DOI: 10.3390/genes12121847] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/22/2021] [Revised: 11/19/2021] [Accepted: 11/22/2021] [Indexed: 12/27/2022] Open
Abstract
Rapid advances in single-cell genomics sequencing (SCGS) have allowed researchers to characterize tumor heterozygosity with unprecedented resolution and reveal the phylogenetic relationships between tumor cells or clones. However, high sequencing error rates of current SCGS data, i.e., false positives, false negatives, and missing bases, severely limit its application. Here, we present a deep learning framework, RDAClone, to recover genotype matrices from noisy data with an extended robust deep autoencoder, cluster cells into subclones by the Louvain-Jaccard method, and further infer evolutionary relationships between subclones by the minimum spanning tree. Studies on both simulated and real datasets demonstrate its robustness and superiority in data denoising, cell clustering, and evolutionary tree reconstruction, particularly for large datasets.
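The final step mentioned in this abstract, linking subclones by a minimum spanning tree, can be sketched as follows. This is a generic Prim's algorithm over Hamming distances between subclone genotype profiles; the function names, the toy profiles, and the choice of Hamming distance and of a `"normal"` starting node are assumptions for illustration, not details taken from RDAClone.

```python
def hamming(a, b):
    """Number of sites at which two binary genotype profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles, root):
    """Prim's algorithm over subclone profiles, grown outward from `root`.

    Returns a list of (parent, child, distance) edges in the order added.
    """
    names = sorted(profiles)
    in_tree = {root}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge that connects a node in the tree to one outside it.
        u, v = min(
            ((u, v) for u in sorted(in_tree) for v in names if v not in in_tree),
            key=lambda e: hamming(profiles[e[0]], profiles[e[1]]),
        )
        edges.append((u, v, hamming(profiles[u], profiles[v])))
        in_tree.add(v)
    return edges


# Hypothetical consensus genotypes for a normal population and three subclones.
subclones = {
    "normal": [0, 0, 0, 0],
    "c1": [1, 0, 0, 0],
    "c2": [1, 1, 0, 0],
    "c3": [1, 0, 1, 1],
}
for edge in minimum_spanning_tree(subclones, root="normal"):
    print(edge)
```

Rooting the tree at the unmutated profile gives the edges a direction that can be read as an order of mutation acquisition, although a minimum spanning tree by itself carries no direction.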
|
42
|
Lähnemann D, Köster J, Fischer U, Borkhardt A, McHardy AC, Schönhuth A. Accurate and scalable variant calling from single cell DNA sequencing data with ProSolo. Nat Commun 2021; 12:6744. [PMID: 34795237 PMCID: PMC8602313 DOI: 10.1038/s41467-021-26938-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/26/2020] [Accepted: 10/22/2021] [Indexed: 01/14/2023] Open
Abstract
Accurate single cell mutational profiles can reveal genomic cell-to-cell heterogeneity. However, sequencing libraries suitable for genotyping require whole genome amplification, which introduces allelic bias and copy errors. The resulting data violates assumptions of variant callers developed for bulk sequencing. Thus, only dedicated models accounting for amplification bias and errors can provide accurate calls. We present ProSolo for calling single nucleotide variants from multiple displacement amplified (MDA) single cell DNA sequencing data. ProSolo probabilistically models a single cell jointly with a bulk sequencing sample and integrates all relevant MDA biases in a site-specific and scalable (because computationally efficient) manner. This achieves a higher accuracy in calling and genotyping single nucleotide variants in single cells in comparison to state-of-the-art tools and supports imputation of insufficiently covered genotypes, when downstream tools cannot handle missing data. Moreover, ProSolo implements the first approach to control the false discovery rate reliably and flexibly. ProSolo is implemented in an extendable framework, with code and usage at: https://github.com/prosolo/prosolo.
Affiliation(s)
- David Lähnemann
- Department for Computational Biology of Infection Research, Helmholtz Centre for Infection Research, 38124, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106, Braunschweig, Germany
- Algorithmic Bioinformatics, Faculty of Mathematics and Natural Sciences, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
- Department of Paediatric Oncology, Haematology and Immunology, University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
- Algorithms for Reproducible Bioinformatics, Institute of Human Genetics, University of Duisburg-Essen, 45147, Essen, Germany
| | - Johannes Köster
- Algorithms for Reproducible Bioinformatics, Institute of Human Genetics, University of Duisburg-Essen, 45147, Essen, Germany
- Genome Data Science, Life Sciences Group, Centrum Wiskunde & Informatica, 1098 XG, Amsterdam, The Netherlands
| | - Ute Fischer
- Department of Paediatric Oncology, Haematology and Immunology, University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
| | - Arndt Borkhardt
- Department of Paediatric Oncology, Haematology and Immunology, University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
| | - Alice C McHardy
- Department for Computational Biology of Infection Research, Helmholtz Centre for Infection Research, 38124, Braunschweig, Germany.
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106, Braunschweig, Germany.
- Algorithmic Bioinformatics, Faculty of Mathematics and Natural Sciences, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany.
| | - Alexander Schönhuth
- Genome Data Science, Life Sciences Group, Centrum Wiskunde & Informatica, 1098 XG, Amsterdam, The Netherlands.
- Genome Data Science, Faculty of Technology, Bielefeld University, 33615, Bielefeld, Germany.
| |
|
43
|
Utro F, Levovitz C, Rhrissorrakrai K, Parida L. A common methodological phylogenomics framework for intra-patient heteroplasmies to infer SARS-CoV-2 sublineages and tumor clones. BMC Genomics 2021; 22:518. [PMID: 34789161 PMCID: PMC8596094 DOI: 10.1186/s12864-021-07660-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/25/2021] [Accepted: 04/28/2021] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND All diseases involving genetic material, including cancer and infection, undergo genetic evolution and give rise to heterogeneity. Although these illnesses are biologically very different, the ability for phylogenetic retrodiction based on the genomic reads is common between them and thus tree-based principles and assumptions are shared. Just as the different frequencies of tumor genomic variants presuppose the existence of multiple tumor clones and provide a handle to computationally infer them, we postulate that the different variant frequencies in viral reads offer the means to infer multiple co-infecting sublineages. RESULTS We present a common methodological framework to infer the phylogenomics from genomic data, be it reads of SARS-CoV-2 of multiple COVID-19 patients or bulk DNAseq of the tumor of a cancer patient. We describe the Concerti computational framework for inferring phylogenies in each of the two scenarios. To demonstrate the accuracy of the method, we reproduce some known results in both scenarios. We also make some additional discoveries. CONCLUSIONS Concerti successfully extracts and integrates information from multi-point samples, enabling the discovery of clinically plausible phylogenetic trees that capture the heterogeneity known to exist both spatially and temporally. These models can have direct therapeutic implications by highlighting "birth" of clones that may harbor resistance mechanisms to treatment, "death" of subclones with drug targets, and acquisition of functionally pertinent mutations in clones that may have seemed clinically irrelevant. Specifically in this paper we uncover new potential parallel mutations in the evolution of the SARS-CoV-2 virus. In the context of cancer, we identify new clones harboring resistant mutations to therapy.
Affiliation(s)
- Filippo Utro
- IBM Research, T.J. Watson Research Center, Yorktown Heights, USA
| | - Chaya Levovitz
- IBM Research, T.J. Watson Research Center, Yorktown Heights, USA
| | | | - Laxmi Parida
- IBM Research, T.J. Watson Research Center, Yorktown Heights, USA
| |
|
44
|
Improved SNV Discovery in Barcode-Stratified scRNA-seq Alignments. Genes (Basel) 2021; 12:genes12101558. [PMID: 34680953 PMCID: PMC8535975 DOI: 10.3390/genes12101558] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/05/2021] [Revised: 09/25/2021] [Accepted: 09/28/2021] [Indexed: 11/17/2022] Open
Abstract
Currently, the detection of single nucleotide variants (SNVs) from 10x Genomics single-cell RNA sequencing (scRNA-seq) data is typically performed on the pooled sequencing reads across all cells in a sample. Here, we assess the information gained for SNV assessment from individual-cell scRNA-seq data, wherein the alignments are split by cellular barcode prior to the variant call. We also reanalyze publicly available data on the MCF7 cell line during anticancer treatment. We assessed SNV calls by three variant callers (GATK, Strelka2, and Mutect2) in combination with SCReadCounts (single-cell read counts), a method for the cell-level tabulation of the sequencing read counts bearing variant alleles. Our analysis shows that variant calls on individual cell alignments identify at least a two-fold higher number of SNVs as compared to the pooled scRNA-seq; these SNVs are enriched in novel variants and in stop-codon and missense substitutions. Our study indicates an immense potential of SNV calls from individual cell scRNA-seq data and emphasizes the need for cell-level variant detection approaches and tools, which can contribute to the understanding of the cellular heterogeneity and the relationships to phenotypes, and help elucidate somatic mutation evolution and functionality.
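The barcode stratification step described above, splitting alignments by cellular barcode before variant calling, might look like the following minimal sketch. It assumes plain-text SAM records that carry the cell barcode in the `CB:Z:` optional tag, as in 10x Genomics alignments; the function name and the toy records are hypothetical, and this is not the pipeline used in the paper.

```python
from collections import defaultdict


def split_by_barcode(sam_lines):
    """Group SAM alignment records by their CB:Z: cell-barcode tag.

    Header lines (starting with '@') and records without a CB tag are skipped.
    Returns a dict mapping barcode -> list of SAM record strings.
    """
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM has 11 mandatory fields; optional tags start at index 11.
        barcode = next((f[5:] for f in fields[11:] if f.startswith("CB:Z:")), None)
        if barcode is not None:
            groups[barcode].append(line)
    return dict(groups)


# Toy records (hypothetical read names, positions, and barcodes).
records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC-1\tUB:Z:GGTT",
    "r2\t0\tchr1\t120\t60\t4M\t*\t0\t0\tTTGA\tFFFF\tCB:Z:CCGT-1",
    "r3\t0\tchr1\t130\t60\t4M\t*\t0\t0\tGGCA\tFFFF\tCB:Z:AAAC-1",
    "r4\t0\tchr1\t140\t60\t4M\t*\t0\t0\tCATG\tFFFF",
]
per_cell = split_by_barcode(records)
print({barcode: len(reads) for barcode, reads in per_cell.items()})
```

Each per-barcode group could then be written to its own alignment file and passed to a variant caller separately.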
|
45
|
Tang J, Tu K, Lu K, Zhang J, Luo K, Jin H, Wang L, Yang L, Xiao W, Zhang Q, Liu X, Ge XY, Li G, Zhou Z, Xie D. Single-cell exome sequencing reveals multiple subclones in metastatic colorectal carcinoma. Genome Med 2021; 13:148. [PMID: 34507604 PMCID: PMC8434739 DOI: 10.1186/s13073-021-00962-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/23/2020] [Accepted: 08/12/2021] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Colorectal cancer (CRC) is a major cancer type whose mechanism of metastasis remains elusive. METHODS In this study, we characterised the evolutionary pattern of metastatic CRC (mCRC) by analysing bulk and single-cell exome sequencing data of primary and metastatic tumours from 7 CRC patients with liver metastases. Here, 7 CRC patients were analysed by bulk whole-exome sequencing (WES); 4 of these were also analysed using single-cell sequencing. RESULTS Despite low genomic divergence between paired primary and metastatic cancers in the bulk data, single-cell WES (scWES) data revealed rare mutations and defined two separate cell populations, indicative of the diverse evolutionary trajectories between primary and metastatic tumour cells. We further identified 24 metastatic cell-specific-mutated genes and validated their functions in cell migration capacity. CONCLUSIONS In summary, scWES revealed rare mutations that failed to be detected by bulk WES. These rare mutations better define the distinct genomic profiles of primary and metastatic tumour cell clones.
Affiliation(s)
- Jie Tang
  - National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, No. 17, Section 3, Renmin South Road, Chengdu, 610041, Sichuan, China
- Kailing Tu
  - National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, No. 17, Section 3, Renmin South Road, Chengdu, 610041, Sichuan, China
- Keying Lu
  - National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, No. 17, Section 3, Renmin South Road, Chengdu, 610041, Sichuan, China
- Jiaxun Zhang
  - National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, No. 17, Section 3, Renmin South Road, Chengdu, 610041, Sichuan, China
- Kai Luo
  - National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, No. 17, Section 3, Renmin South Road, Chengdu, 610041, Sichuan, China
- Lei Wang
  - BGI-Shenzhen, Shenzhen, 518083, China
- Lie Yang
  - Department of Gastrointestinal Surgery, West China Hospital, Sichuan University, No. 37, Guoxue Lane, Wuhou District, Chengdu, 610041, Sichuan, China
- Weiran Xiao
  - National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, No. 17, Section 3, Renmin South Road, Chengdu, 610041, Sichuan, China
- Qilin Zhang
  - National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, No. 17, Section 3, Renmin South Road, Chengdu, 610041, Sichuan, China
- Xiaoling Liu
  - National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, No. 17, Section 3, Renmin South Road, Chengdu, 610041, Sichuan, China
- Xin Yi Ge
  - National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, No. 17, Section 3, Renmin South Road, Chengdu, 610041, Sichuan, China
- Guibo Li
  - BGI-Shenzhen, Shenzhen, 518083, China
- Zongguang Zhou
  - Department of Gastrointestinal Surgery, West China Hospital, Sichuan University, No. 37, Guoxue Lane, Wuhou District, Chengdu, 610041, Sichuan, China
- Dan Xie
  - National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, No. 17, Section 3, Renmin South Road, Chengdu, 610041, Sichuan, China
  - Department of Gastrointestinal Surgery, West China Hospital, Sichuan University, No. 37, Guoxue Lane, Wuhou District, Chengdu, 610041, Sichuan, China
46
Morgan D, Jost TA, De Santiago C, Brock A. Applications of high-resolution clone tracking technologies in cancer. Curr Opin Biomed Eng 2021; 19:100317. [PMID: 34901584 PMCID: PMC8658740 DOI: 10.1016/j.cobme.2021.100317] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022]
Abstract
Tumors are composed of dynamic, heterogeneous cell populations characterized by numerous genetic and non-genetic alterations that accumulate and change with disease progression and treatment. Retrospective analyses of tumor evolution have relied on the measurement of genetic markers (such as copy number variants) to infer clonal dynamics. However, these approaches neglect the critical contributions of non-genetic drivers of disease. Techniques that harness the power of prospective clone tracking via heritable barcode tags provide an alternative strategy. In this review, we discuss methods for high-resolution, quantitative clone tracking, including recent advancements to pair barcode-specific functionality with scRNA-seq, clonal cell isolation, and in situ hybridization and imaging. We discuss these approaches in the context of cancer cell heterogeneity and treatment resistance.
Affiliation(s)
- Daylin Morgan
  - Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas, 78712, United States
- Tyler A. Jost
  - Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas, 78712, United States
- Carolina De Santiago
  - Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas, 78712, United States
- Amy Brock
  - Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas, 78712, United States
47
Baghaarabani L, Goliaei S, Foroughmand-Araabi MH, Shariatpanahi SP, Goliaei B. Conifer: clonal tree inference for tumor heterogeneity with single-cell and bulk sequencing data. BMC Bioinformatics 2021; 22:416. [PMID: 34461827 PMCID: PMC8404257 DOI: 10.1186/s12859-021-04338-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/21/2021] [Accepted: 08/16/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The genetic heterogeneity that a tumor develops during clonal evolution is one of the reasons cancer treatment fails, because it increases the chance of drug resistance. Clones are cell populations with different genotypes, resulting from differences in the somatic mutations that occur and accumulate during cancer development. One approach to identifying clones is to determine the variant allele frequency of the mutations that occurred in the tumor. Although bulk sequencing data can provide that information, the frequencies are not informative enough to distinguish different clones with the same prevalence or to resolve their evolutionary relationships. On the other hand, single-cell sequencing data provides valuable information about branching events in the evolution of a tumor. However, the temporal order of mutations may be determined only ambiguously from single-cell data alone, while variant allele frequencies from bulk sequencing data can help infer the temporal order of mutations with fewer ambiguities. RESULT In this study, a new method called Conifer (ClONal tree Inference For hEterogeneity of tumoR) is proposed, which combines aggregated variant allele frequency from bulk sequencing data with branching event information from single-cell sequencing data to more accurately identify clones and their evolutionary relationships. On various sets of simulated data, Conifer is shown to identify clones and infer clonal trees more accurately than existing methods. In addition, it is discussed that the evolutionary tree provided by Conifer on real cancer data sets is highly consistent with the information in both bulk and single-cell data. CONCLUSIONS In this study, we have provided an accurate and robust method to identify the clones underlying tumor heterogeneity and their evolutionary history by combining single-cell and bulk sequencing data.
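The limitation of bulk frequencies described in this abstract can be illustrated with a toy calculation. The sketch below is not Conifer's algorithm; it only shows, with made-up read counts and made-up error-free single-cell genotypes, why two mutations with the same variant allele frequency (VAF) cannot be told apart from bulk data alone, while single-cell data reveals that they sit on different branches. The function names, mutation labels, and all numbers are hypothetical.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads at a locus that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def co_occur(cells, mut_a, mut_b):
    """True if at least one cell carries both mutations."""
    return any(mut_a in cell and mut_b in cell for cell in cells)


# Hypothetical bulk read counts: (variant reads, total reads) per mutation.
bulk_counts = {"mutA": (30, 100), "mutB": (30, 100)}
vaf = {m: variant_allele_frequency(alt, tot) for m, (alt, tot) in bulk_counts.items()}

# Hypothetical, error-free single-cell genotypes: the mutations seen in each cell.
cells = [{"mutA"}, {"mutA"}, {"mutB"}, {"mutB"}, set()]

# Bulk data alone: both mutations have the same frequency.
same_vaf = vaf["mutA"] == vaf["mutB"]
# Single-cell data: the mutations never share a cell, so they lie on different branches.
branching = not co_occur(cells, "mutA", "mutB")
```

Real single-cell data contains allelic dropout and false positives, so a practical method has to model those errors rather than read branching directly from the genotypes as this sketch does.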
Affiliation(s)
- Leila Baghaarabani
  - Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Sama Goliaei
  - Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran
- Bahram Goliaei
  - Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
48
Kumar S, Tao Q, Weaver S, Sanderford M, Caraballo-Ortiz MA, Sharma S, Pond SLK, Miura S. An Evolutionary Portrait of the Progenitor SARS-CoV-2 and Its Dominant Offshoots in COVID-19 Pandemic. Mol Biol Evol 2021; 38:3046-3059. [PMID: 33942847 PMCID: PMC8135569 DOI: 10.1093/molbev/msab118] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Indexed: 12/14/2022] Open
Abstract
Global sequencing of genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to reveal new genetic variants that are the key to unraveling its early evolutionary history and tracking its global spread over time. Here we present the heretofore cryptic mutational history and spatiotemporal dynamics of SARS-CoV-2 from an analysis of thousands of high-quality genomes. We report the likely most recent common ancestor of SARS-CoV-2, reconstructed through a novel application and advancement of computational methods initially developed to infer the mutational history of tumor cells in a patient. This progenitor genome differs from genomes of the first coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the United States harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide months before and after the first reported cases of COVID-19 in China. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains that have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic in 2020. There have been multiple replacements of predominant coronavirus strains in Europe and Asia as well as continued presence of multiple high-frequency strains in Asia and North America. We have developed a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/).
Affiliation(s)
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Qiqing Tao
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Steven Weaver
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Maxwell Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Marcos A Caraballo-Ortiz
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Sudip Sharma
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Sergei L K Pond
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Sayaka Miura
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
49
Malikić S, Mehrabadi FR, Azer ES, Ebrahimabadi MH, Sahinalp SC. Studying the History of Tumor Evolution from Single-Cell Sequencing Data by Exploring the Space of Binary Matrices. J Comput Biol 2021; 28:857-879. [PMID: 34297621 DOI: 10.1089/cmb.2020.0595] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022] Open
Abstract
Single-cell sequencing (SCS) data have great potential in reconstructing the evolutionary history of tumors. Rapid advances in SCS technology in the past decade were followed by the design of various computational methods for inferring trees of tumor evolution. Some of the earliest methods were based on a direct search in the space of trees with the goal of finding the maximum likelihood tree. However, it can be shown that instead of searching directly in the tree space, we can perform a search in the space of binary matrices and obtain the maximum likelihood tree directly from the maximum likelihood matrix. The potential of the latter tree search strategy has recently been recognized by different research groups, and several related methods were published in the past 2 years. Here we provide a review of the theoretical background of these methods and a detailed discussion of the correlation between the two tree search strategies, both of which are largely missing from the available publications. We also discuss each of the existing methods based on the search in the space of binary matrices and summarize the best-known single-cell DNA sequencing data sets, which can be used in the future for assessing the performance of newly developed methods on real data.
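The step from a binary matrix to a tree mentioned in this abstract can be sketched as follows. This is a minimal illustration, not any of the reviewed methods: it assumes a noise-free cell-by-mutation matrix under the infinite-sites model, checks the standard pairwise condition (no two columns show all three patterns (1,1), (1,0), (0,1)), and, if the matrix is conflict-free, reads off each mutation's parent. The function names and the example matrix are hypothetical, and the likelihood search over candidate matrices that the reviewed methods perform is not shown.

```python
from itertools import combinations


def mutation_cell_sets(matrix):
    """For each mutation (column), the set of cells (rows) that carry it."""
    n_mut = len(matrix[0]) if matrix else 0
    return [frozenset(i for i, row in enumerate(matrix) if row[j])
            for j in range(n_mut)]


def is_conflict_free(matrix):
    """True if every pair of columns is either disjoint or nested."""
    for a, b in combinations(mutation_cell_sets(matrix), 2):
        if a & b and a - b and b - a:  # patterns (1,1), (1,0) and (0,1) all present
            return False
    return True


def mutation_tree(matrix):
    """Parent index of each mutation (-1 means attached to the root).

    Assumes every mutation occurs in at least one cell. Mutations with
    identical cell sets are reported as siblings rather than merged.
    """
    if not is_conflict_free(matrix):
        raise ValueError("matrix contains a conflict; no perfect phylogeny exists")
    cols = mutation_cell_sets(matrix)
    parents = []
    for j, cj in enumerate(cols):
        # Ancestors are mutations whose cell set strictly contains this one;
        # the parent is the ancestor with the smallest cell set.
        ancestors = [k for k, ck in enumerate(cols) if k != j and cj < ck]
        parents.append(min(ancestors, key=lambda k: len(cols[k])) if ancestors else -1)
    return parents


# Hypothetical example: 4 cells (rows) x 3 mutations (columns).
example = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
parents = mutation_tree(example)
```

In this example mutation 1 is nested inside mutation 0 and mutation 2 is disjoint from both, so mutation 1 hangs below mutation 0 while mutations 0 and 2 attach to the root.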
Affiliation(s)
- Salem Malikić
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
- Farid Rashidi Mehrabadi
  - Department of Computer Science, Indiana University, Bloomington, Indiana, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
- Erfan Sadeqi Azer
  - Department of Computer Science, Indiana University, Bloomington, Indiana, USA
- Mohammad Haghir Ebrahimabadi
  - Department of Computer Science, Indiana University, Bloomington, Indiana, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
- Suleyman Cenk Sahinalp
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
50
Weber LL, Sashittal P, El-Kebir M. doubletD: detecting doublets in single-cell DNA sequencing data. Bioinformatics 2021; 37:i214-i221. [PMID: 34252961 PMCID: PMC8275324 DOI: 10.1093/bioinformatics/btab266] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 04/21/2021] [Indexed: 11/13/2022] Open
Abstract
Motivation: While single-cell DNA sequencing (scDNA-seq) has enabled the study of intratumor heterogeneity at an unprecedented resolution, current technologies are error-prone and often result in doublets, where two or more cells are mistaken for a single cell. Not only do doublets confound downstream analyses, but the increase in doublet rate is also a major bottleneck preventing higher throughput with current single-cell technologies. Although doublet detection and removal are standard practice in scRNA-seq data analysis, options for scDNA-seq data are limited. Current methods attempt to detect doublets while also performing complex downstream analysis tasks, leading to decreased efficiency and/or performance. Results: We present doubletD, the first standalone method for detecting doublets in scDNA-seq data. Underlying our method is a simple maximum likelihood approach with a closed-form solution. We demonstrate the performance of doubletD on simulated data as well as real datasets, where it outperforms both current scDNA-seq downstream analysis methods that jointly infer doublets and standalone doublet detection approaches designed for scRNA-seq data. Incorporating doubletD in scDNA-seq analysis pipelines will reduce complexity and lead to more accurate results. Availability and implementation: https://github.com/elkebir-group/doubletD. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Leah L Weber
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Palash Sashittal
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Department of Aerospace Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Mohammed El-Kebir
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA