1
Cardner M, Marass F, Gedvilaite E, Yang JL, Tsui DWY, Beerenwinkel N. Predicting tumour content of liquid biopsies from cell-free DNA. BMC Bioinformatics 2023; 24:368. [PMID: 37777714 PMCID: PMC10543881 DOI: 10.1186/s12859-023-05478-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 09/12/2023] [Indexed: 10/02/2023]
Abstract
BACKGROUND Liquid biopsy is a minimally-invasive method of sampling bodily fluids, capable of revealing evidence of cancer. The distribution of cell-free DNA (cfDNA) fragment lengths has been shown to differ between healthy subjects and cancer patients, whereby the distributional shift correlates with the sample's tumour content. These fragmentomic data have not yet been utilised to directly quantify the proportion of tumour-derived cfDNA in a liquid biopsy. RESULTS We used statistical learning to predict tumour content from Fourier and wavelet transforms of cfDNA length distributions in samples from 118 cancer patients. The model was validated on an independent dilution series of patient plasma. CONCLUSIONS This proof of concept suggests that our fragmentomic methodology could be useful for predicting tumour content in liquid biopsies.
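As an illustration of the kind of input feature this abstract describes (not the authors' model), the sketch below builds a normalised fragment-length histogram and takes the magnitude of its discrete Fourier transform with NumPy. The function name `length_spectrum`, the 50-450 bp window, and the simulated fragment lengths are assumptions made for this example only.

```python
import numpy as np


def length_spectrum(lengths, min_len=50, max_len=450):
    """Return the normalised length histogram and the magnitude of its real FFT.

    The window [min_len, max_len] is an illustrative assumption; fragments
    outside it are ignored.
    """
    # One bin per base pair: edges min_len, min_len + 1, ..., max_len + 1.
    edges = np.arange(min_len, max_len + 2)
    counts, _ = np.histogram(lengths, bins=edges)
    total = counts.sum()
    if total == 0:
        raise ValueError("no fragment lengths fall inside the window")
    density = counts / total
    # rfft is enough because the histogram is real-valued.
    spectrum = np.abs(np.fft.rfft(density))
    return density, spectrum


# Synthetic lengths for demonstration only (not real cfDNA data).
rng = np.random.default_rng(0)
lengths = rng.normal(167, 20, size=10_000).round().astype(int)
density, spectrum = length_spectrum(lengths)
```

The resulting `spectrum` vector (or a subset of its coefficients) could then serve as a feature vector for a regression model of tumour content; which coefficients and which model to use are left open here.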
Affiliation(s)
- Mathias Cardner
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland
- Francesco Marass
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland
  - PetDx, Inc, La Jolla, USA
- Erika Gedvilaite
  - Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Julie L Yang
  - Epigenetics Research Center, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Dana W Y Tsui
  - Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - PetDx, Inc, La Jolla, USA
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland
2
Sandmann S, Richter S, Jiang X, Varghese J. Reconstructing Clonal Evolution-A Systematic Evaluation of Current Bioinformatics Approaches. International Journal of Environmental Research and Public Health 2023; 20:5128. [PMID: 36982036 PMCID: PMC10049679 DOI: 10.3390/ijerph20065128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 03/04/2023] [Accepted: 03/13/2023] [Indexed: 06/18/2023]
Abstract
The accurate reconstruction of clonal evolution, including the identification of newly developing, highly aggressive subclones, is essential for the application of precision medicine in cancer treatment. Reconstruction, aiming for correct variant clustering and clonal evolution tree reconstruction, is commonly performed by tedious manual work. While there is a plethora of tools to automatically generate reconstructions, their reliability, and especially the reasons for unreliability, have not been systematically assessed. We developed clevRsim, an approach to simulate clonal evolution data, including single-nucleotide variants as well as (overlapping) copy number variants. From this, we generated 88 data sets and performed a systematic evaluation of the tools for the reconstruction of clonal evolution. The results indicate a major negative influence of a high number of clones on both clustering and tree reconstruction. Low coverage as well as an extreme number of time points usually leads to poor clustering results. An underlying branched independent evolution hampers correct tree reconstruction. A further major decline in performance could be observed for large deletions and duplications overlapping single-nucleotide variants. In summary, to explore the full potential of reconstructing clonal evolution, improved algorithms that can properly handle the identified limitations are greatly needed.
Affiliation(s)
- Sarah Sandmann
  - Institute of Medical Informatics, University of Münster, 48149 Münster, Germany
- Silja Richter
  - Institute of Medical Informatics, University of Münster, 48149 Münster, Germany
- Xiaoyi Jiang
  - Department of Computer Science, University of Münster, 48149 Münster, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, 48149 Münster, Germany
3
Chen Z, Gong F, Wan L, Ma L. BiTSC$^2$: Bayesian inference of tumor clonal tree by joint analysis of single-cell SNV and CNA data. Brief Bioinform 2022; 23:6562684. [PMID: 35368055 PMCID: PMC9116244 DOI: 10.1093/bib/bbac092] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/06/2021] [Revised: 01/29/2022] [Accepted: 02/23/2022] [Indexed: 12/14/2022]
Abstract
The rapid development of single-cell DNA sequencing (scDNA-seq) technology has greatly enhanced the resolution of tumor cell profiling, providing an unprecedented perspective in characterizing intra-tumoral heterogeneity and understanding tumor progression and metastasis. However, prominent algorithms for constructing tumor phylogeny based on scDNA-seq data usually only take single nucleotide variations (SNVs) as markers, failing to consider the effect caused by copy number alterations (CNAs). Here, we propose BiTSC$^2$, Bayesian inference of Tumor clonal Tree by joint analysis of Single-Cell SNV and CNA data. BiTSC$^2$ takes raw reads from scDNA-seq as input, accounts for the overlapping of CNA and SNV, models allelic dropout rate, sequencing errors and missing rate, as well as assigns single cells into subclones. By applying Markov Chain Monte Carlo sampling, BiTSC$^2$ can simultaneously estimate the subclonal scCNA and scSNV genotype matrices, subclonal assignments and tumor subclonal evolutionary tree. In comparison with existing methods on synthetic and real tumor data, BiTSC$^2$ shows high accuracy in genotype recovery, subclonal assignment and tree reconstruction. BiTSC$^2$ also performs robustly in dealing with scDNA-seq data with low sequencing depth and variant missing rate. BiTSC$^2$ software is available at https://github.com/ucasdp/BiTSC2.
Affiliation(s)
- Ziwei Chen
  - Institute of Zoology, Chinese Academy of Sciences, Beichen West Road, 100101, Beijing, China
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Zhongguancun East Road, 100190, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Yuquan Road, 100049, Beijing, China
- Fuzhou Gong
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Zhongguancun East Road, 100190, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Yuquan Road, 100049, Beijing, China
- Lin Wan
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Zhongguancun East Road, 100190, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Yuquan Road, 100049, Beijing, China
- Liang Ma
  - Institute of Zoology, Chinese Academy of Sciences, Beichen West Road, 100101, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Yuquan Road, 100049, Beijing, China
4
Trethewey CS, Walter HS, Alqahtani ANM, Schmid R, Guttery DS, Griffin Y, Ahearne MJ, Saldanha GS, Jayne SPN, Dyer MJS. Limitations of Monitoring Disease Progression Using Circulating Tumor DNA in Lymphoma: An Example From Primary Cutaneous DLBCL Leg-type. Hemasphere 2022; 6:e690. [PMID: 35261967 PMCID: PMC8893288 DOI: 10.1097/hs9.0000000000000690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2020] [Accepted: 01/24/2022] [Indexed: 12/03/2022]
Affiliation(s)
- Christopher S. Trethewey
  - Ernest and Helen Scott Haematological Research Institute, Leicester Cancer Research Centre, University of Leicester, United Kingdom
  - Leicester Cancer Research Centre, University of Leicester, United Kingdom
- Harriet S. Walter
  - Ernest and Helen Scott Haematological Research Institute, Leicester Cancer Research Centre, University of Leicester, United Kingdom
  - Leicester Cancer Research Centre, University of Leicester, United Kingdom
  - University Hospitals of Leicester NHS Trust, Leicester, United Kingdom
- Abdullah N. M. Alqahtani
  - Ernest and Helen Scott Haematological Research Institute, Leicester Cancer Research Centre, University of Leicester, United Kingdom
  - Leicester Cancer Research Centre, University of Leicester, United Kingdom
- Ralf Schmid
  - Department of Molecular and Cell Biology, University of Leicester, United Kingdom
  - Leicester Institute of Structural and Chemical Biology, University of Leicester, United Kingdom
- David S. Guttery
  - Leicester Cancer Research Centre, University of Leicester, United Kingdom
- Yvette Griffin
  - University Hospitals of Leicester NHS Trust, Leicester, United Kingdom
- Matthew J. Ahearne
  - Ernest and Helen Scott Haematological Research Institute, Leicester Cancer Research Centre, University of Leicester, United Kingdom
  - Leicester Cancer Research Centre, University of Leicester, United Kingdom
  - University Hospitals of Leicester NHS Trust, Leicester, United Kingdom
- Gerald S. Saldanha
  - Leicester Cancer Research Centre, University of Leicester, United Kingdom
  - University Hospitals of Leicester NHS Trust, Leicester, United Kingdom
- Sandrine P. N. Jayne
  - Ernest and Helen Scott Haematological Research Institute, Leicester Cancer Research Centre, University of Leicester, United Kingdom
  - Leicester Cancer Research Centre, University of Leicester, United Kingdom
- Martin J. S. Dyer
  - Ernest and Helen Scott Haematological Research Institute, Leicester Cancer Research Centre, University of Leicester, United Kingdom
  - Leicester Cancer Research Centre, University of Leicester, United Kingdom
  - University Hospitals of Leicester NHS Trust, Leicester, United Kingdom
5
Multiparametric Circulating Tumor Cell Analysis to Select Targeted Therapies for Breast Cancer Patients. Cancers (Basel) 2021; 13:6004. [PMID: 34885114 PMCID: PMC8657376 DOI: 10.3390/cancers13236004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/22/2021] [Revised: 11/22/2021] [Accepted: 11/24/2021] [Indexed: 02/06/2023]
Abstract
Simple Summary: Liquid biopsies may act as a dynamic tool for identification of targets for precision therapy while circumventing limitations of tissue biopsies. In contrast to most liquid biopsy-related studies, which analyze limited patient material for only one parameter, this study is based on a longitudinal and multiparametric analysis of circulating tumor cells (CTCs). A metastatic breast cancer patient was followed over a period of three years and analyses of the genome, RNA profiling, and in vitro drug testing on cultured CTCs were performed in a unique manner. We show that combining the strengths of multiple technologies for analysis yielded maximum information on the ongoing disease and, eventually, allowed choosing an effective therapy, which led to a massive reduction in CTC numbers. This approach provides a concept for future detailed longitudinal and multiparametric CTC analyses.
Abstract: Background: The analysis of liquid biopsies, e.g., circulating tumor cells (CTCs), is an appealing diagnostic concept for targeted therapy selection. In this proof-of-concept study, we aimed to perform multiparametric analyses of CTCs to select targeted therapies for metastatic breast cancer patients. Methods: First, CTCs of five metastatic breast cancer patients were analyzed by whole exome sequencing (WES). Based on the results, one patient was selected and monitored by longitudinal and multiparametric liquid biopsy analyses over more than three years, including WES, RNA profiling, and in vitro drug testing of CTCs. Results: Mutations addressable by targeted therapies were detected in all patients, including mutations that were not detected in biopsies of the primary tumor. For the index patient, the clonal evolution of the tumor cells was retraced and resistance mechanisms were identified. The AKT1 E17K mutation was uncovered as the driver of the metastatic process. Drug testing on the patient’s CTCs confirmed the efficacy of drugs targeting the AKT1 pathway. During a targeted therapy chosen based on the CTC characterization and including the mTOR inhibitor everolimus, CTC numbers dropped by 97.3% and the disease remained stable as determined by computed tomography/magnetic resonance imaging. Conclusion: These results illustrate the strength of a multiparametric CTC analysis to choose and validate targeted therapies to optimize cancer treatment in the future. Furthermore, from a scientific point of view, such studies promote the understanding of the biology of CTCs during different treatment regimens.
6
Ali S, Ciccolella S, Lucarella L, Vedova GD, Patterson M. Simpler and Faster Development of Tumor Phylogeny Pipelines. J Comput Biol 2021; 28:1142-1155. [PMID: 34698531 DOI: 10.1089/cmb.2021.0271] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
In recent years, there has been an increasing number of single-cell sequencing studies, producing a considerable number of new data sets. This has particularly affected the field of cancer analysis, where more and more articles are published using this sequencing technique, which allows for capturing more detailed information regarding the specific genetic mutations in each individually sampled cell. As the amount of information increases, it is necessary to have more sophisticated and rapid tools for analyzing the samples. To this end, we developed plastic (PipeLine Amalgamating Single-cell Tree Inference Components), an easy-to-use and quick-to-adapt pipeline that integrates three different steps: (1) to simplify the input data, (2) to infer tumor phylogenies, and (3) to compare the phylogenies. We have created a pipeline submodule for each of those steps and developed new in-memory data structures that allow for easy and transparent sharing of the information across the tools implementing the above steps. While we use existing open source tools for those steps, we have extended the tool used for simplifying the input data, incorporating two machine learning procedures, which greatly reduce the running time without affecting the quality of the downstream analysis. Moreover, we have introduced the capability of producing some plots to quickly visualize results.
Affiliation(s)
- Sarwan Ali
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Simone Ciccolella
  - Department of Informatics, Systems, and Communications, University of Milano-Bicocca, Milano, Italy
- Lorenzo Lucarella
  - Department of Informatics, Systems, and Communications, University of Milano-Bicocca, Milano, Italy
- Gianluca Della Vedova
  - Department of Informatics, Systems, and Communications, University of Milano-Bicocca, Milano, Italy
- Murray Patterson
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
7
He S, Schein A, Sarsani V, Flaherty P. A Bayesian nonparametric model for inferring subclonal populations from structured DNA sequencing data. Ann Appl Stat 2021; 15:925-951. [PMID: 34262633 DOI: 10.1214/20-aoas1434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Abstract
There are distinguishing features or "hallmarks" of cancer that are found across tumors, individuals, and types of cancer, and these hallmarks can be driven by specific genetic mutations. Yet, within a single tumor there is often extensive genetic heterogeneity as evidenced by single-cell and bulk DNA sequencing data. The goal of this work is to jointly infer the underlying genotypes of tumor subpopulations and the distribution of those subpopulations in individual tumors by integrating single-cell and bulk sequencing data. Understanding the genetic composition of the tumor at the time of treatment is important in the personalized design of targeted therapeutic combinations and monitoring for possible recurrence after treatment. We propose a hierarchical Dirichlet process mixture model that incorporates the correlation structure induced by a structured sampling arrangement and we show that this model improves the quality of inference. We develop a representation of the hierarchical Dirichlet process prior as a Gamma-Poisson hierarchy and we use this representation to derive a fast Gibbs sampling inference algorithm using the augment-and-marginalize method. Experiments with simulation data show that our model outperforms standard numerical and statistical methods for decomposing admixed count data. Analyses of a real acute lymphoblastic leukemia sequencing dataset show that our model improves upon state-of-the-art bioinformatic methods. An interpretation of the results of our model on this real dataset reveals co-mutated loci across samples.
Affiliation(s)
- Shai He
  - Department of Mathematics and Statistics, University of Massachusetts Amherst
- Vishal Sarsani
  - Department of Mathematics and Statistics, University of Massachusetts Amherst
- Patrick Flaherty
  - Department of Mathematics and Statistics, University of Massachusetts Amherst
8
Spatial Distribution of Private Gene Mutations in Clear Cell Renal Cell Carcinoma. Cancers (Basel) 2021; 13:2163. [PMID: 33946379 PMCID: PMC8124666 DOI: 10.3390/cancers13092163] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/01/2021] [Revised: 04/02/2021] [Accepted: 04/27/2021] [Indexed: 12/15/2022]
Abstract
Simple Summary: Tumours consist of multiple groups of similar cells resulting from differing evolutionary trajectories, i.e., subclones. These subclones are prevalent in clear cell renal cell carcinoma (ccRCC). The aim of this study is to determine how similar or dissimilar the subclones in 89 ccRCC tumours are from one another regarding their gene mutations and expression profiles, i.e., the extent of intra-tumour heterogeneity. The implications of these alterations with respect to signalling pathways are also assessed. Deep sequencing allows for the identification of mutations with low allele frequencies, providing a more comprehensive view of the heterogeneity present in the tumours. With an average of 62% of mutations having been identified in only one of the two biopsies, some of which in turn are found to impact gene expression, the complex makeup of ccRCC tumours is evident, and this can drastically influence treatment outcome.
Abstract: Intra-tumour heterogeneity is the molecular hallmark of renal cancer, and the molecular tumour composition determines the treatment outcome of renal cancer patients. In renal cancer tumourigenesis, in general, different tumour clones evolve over time. We analysed intra-tumour heterogeneity and subclonal mutation patterns in 178 tumour samples obtained from 89 clear cell renal cell carcinoma patients. In an initial discovery phase, whole-exome and transcriptome sequencing data from paired tumour biopsies from 16 ccRCC patients were used to design a gene panel for follow-up analysis. In this second phase, 826 selected genes were targeted at deep coverage in an extended cohort of 89 patients for a detailed analysis of tumour heterogeneity. On average, we found 22 mutations per patient. Pairwise comparison of the two biopsies from the same tumour revealed that on average, 62% of the mutations in a patient were detected in only one of the two samples. In addition to commonly mutated genes (VHL, PBRM1, SETD2 and BAP1), frequent subclonal mutations with low variant allele frequency (<10%) were observed in TP53 and in mucin coding genes MUC6, MUC16, and MUC3A. Of the 89 ccRCC tumours, 87 (~98%) harboured private mutations, occurring in only one of the paired tumour samples. Clonally exclusive pathway pairs were identified using the WES data set from 16 ccRCC patients. Our findings imply that shared and private mutations significantly contribute to the complexity of differential gene expression and pathway interaction and might explain the clonal evolution of different molecular renal cancer subgroups. Multi-regional sequencing is central for the identification of subclones within ccRCC.
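The private-mutation share reported above can be read as the fraction of a patient's mutations seen in exactly one of the two biopsies. A minimal sketch of that arithmetic follows, assuming mutation calls are available as sets of identifiers; the function name `private_fraction`, the choice of denominator (the union of both biopsies), and the toy identifiers are assumptions for illustration and are not taken from the study.

```python
def private_fraction(biopsy_a, biopsy_b):
    """Fraction of a patient's mutations found in exactly one of two biopsies.

    Assumption: the denominator is the union of mutations across both biopsies.
    """
    all_mutations = biopsy_a | biopsy_b
    if not all_mutations:
        raise ValueError("no mutations called in either biopsy")
    private = biopsy_a ^ biopsy_b  # symmetric difference: in one sample only
    return len(private) / len(all_mutations)


# Toy identifiers, invented for this example.
biopsy_a = {"m1", "m2", "m3"}
biopsy_b = {"m1", "m4"}
fraction = private_fraction(biopsy_a, biopsy_b)  # 3 private out of 4 total
```

Averaging this value over patients would give a cohort-level figure comparable in spirit to the one quoted in the abstract, provided the same definition is used.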
9
Ciccolella S, Ricketts C, Soto Gomez M, Patterson M, Silverbush D, Bonizzoni P, Hajirasouliha I, Della Vedova G. Inferring cancer progression from Single-Cell Sequencing while allowing mutation losses. Bioinformatics 2021; 37:326-333. [PMID: 32805010 PMCID: PMC8058767 DOI: 10.1093/bioinformatics/btaa722] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 11/11/2019] [Revised: 08/06/2020] [Accepted: 08/11/2020] [Indexed: 01/21/2023]
Abstract
Motivation In recent years, the well-known Infinite Sites Assumption has been a fundamental feature of computational methods devised for reconstructing tumor phylogenies and inferring cancer progressions. However, recent studies leveraging single-cell sequencing (SCS) techniques have shown evidence of the widespread recurrence and, especially, loss of mutations in several tumor samples. While there exist established computational methods that infer phylogenies with mutation losses, there remain some advancements to be made. Results We present Simulated Annealing Single-Cell inference (SASC): a new and robust approach based on simulated annealing for the inference of cancer progression from SCS datasets. In particular, we introduce an extension of the model of evolution where mutations are only accumulated, by allowing also a limited amount of mutation loss in the evolutionary history of the tumor: the Dollo-k model. We demonstrate that SASC achieves high levels of accuracy when tested on both simulated and real datasets and in comparison with some other available methods. Availability and implementation The SASC tool is open source and available at https://github.com/sciccolella/sasc. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Simone Ciccolella
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Camir Ricketts
  - Department of Physiology and Biophysics, Tri-I Computational Biology & Medicine Graduate Program, Weill Cornell Medicine of Cornell University, New York, NY 10021, USA
  - Institute for Computational Biomedicine, Englander Institute for Precision Medicine, The Meyer Cancer Center, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York City, NY 10021, USA
- Mauricio Soto Gomez
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Murray Patterson
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
  - Department of Computer Science, College of Arts and Sciences, Georgia State University, Atlanta, GA 30303, USA
- Dana Silverbush
  - Department of Pathology and Center for Cancer Research, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Paola Bonizzoni
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Englander Institute for Precision Medicine, The Meyer Cancer Center, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York City, NY 10021, USA
- Gianluca Della Vedova
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
10
Vavoulis DV, Cutts A, Taylor JC, Schuh A. A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data. Bioinformatics 2021; 37:147-154. [PMID: 32722772 PMCID: PMC8055230 DOI: 10.1093/bioinformatics/btaa672] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/29/2020] [Revised: 05/13/2020] [Accepted: 07/20/2020] [Indexed: 01/22/2023]
Abstract
MOTIVATION Tumours are composed of distinct cancer cell populations (clones), which continuously adapt to their local micro-environment. Standard methods for clonal deconvolution seek to identify groups of mutations and estimate the prevalence of each group in the tumour, while considering its purity and copy number profile. These methods have been applied on cross-sectional data and on longitudinal data after discarding information on the timing of sample collection. Two key questions are how can we incorporate such information in our analyses and is there any benefit in doing so? RESULTS We developed a clonal deconvolution method, which incorporates explicitly the temporal spacing of longitudinally sampled tumours. By merging a Dirichlet Process Mixture Model with Gaussian Process priors and using as input a sequence of several sparsely collected samples, our method can reconstruct the temporal profile of the abundance of any mutation cluster supported by the data as a continuous function of time. We benchmarked our method on whole genome, whole exome and targeted sequencing data from patients with chronic lymphocytic leukaemia, on liquid biopsy data from a patient with melanoma and on synthetic data and we found that incorporating information on the timing of tissue collection improves model performance, as long as data of sufficient volume and complexity are available for estimating free model parameters. Thus, our approach is particularly useful when collecting a relatively long sequence of tumour samples is feasible, as in liquid cancers (e.g. leukaemia) and liquid biopsies. AVAILABILITY AND IMPLEMENTATION The statistical methodology presented in this paper is freely available at github.com/dvav/clonosGP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dimitrios V Vavoulis
  - Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
  - Nuffield Department of Medicine, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
  - NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Trust, Oxford, OX3 9DU, UK
  - Department of Oncology, Molecular Diagnostic Centre, University of Oxford, Oxford OX3 9DU, UK
- Anthony Cutts
  - Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
  - Department of Oncology, Molecular Diagnostic Centre, University of Oxford, Oxford OX3 9DU, UK
- Jenny C Taylor
  - Nuffield Department of Medicine, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
  - NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Trust, Oxford, OX3 9DU, UK
- Anna Schuh
  - Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
  - NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Trust, Oxford, OX3 9DU, UK
  - Department of Oncology, Molecular Diagnostic Centre, University of Oxford, Oxford OX3 9DU, UK
  - Department of Haematology, Oxford University Hospitals NHS Trust, Oxford OX3 9DU, UK
11
Sadeqi Azer E, Rashidi Mehrabadi F, Malikić S, Li XC, Bartok O, Litchfield K, Levy R, Samuels Y, Schäffer AA, Gertz EM, Day CP, Pérez-Guijarro E, Marie K, Lee MP, Merlino G, Ergun F, Sahinalp SC. PhISCS-BnB: a fast branch and bound algorithm for the perfect tumor phylogeny reconstruction problem. Bioinformatics 2021; 36:i169-i176. [PMID: 32657358 DOI: 10.1093/bioinformatics/btaa464] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/27/2022]
Abstract
MOTIVATION Recent advances in single-cell sequencing (SCS) offer an unprecedented insight into tumor emergence and evolution. Principled approaches to tumor phylogeny reconstruction via SCS data are typically based on general computational methods for solving an integer linear program, or a constraint satisfaction program, which, although guaranteeing convergence to the most likely solution, are very slow. Others based on Monte Carlo Markov Chain or alternative heuristics not only offer no such guarantee, but also are not faster in practice. As a result, novel methods that can scale up to handle the size and noise characteristics of emerging SCS data are highly desirable to fully utilize this technology. RESULTS We introduce PhISCS-BnB (phylogeny inference using SCS via branch and bound), a branch and bound algorithm to compute the most likely perfect phylogeny on an input genotype matrix extracted from an SCS dataset. PhISCS-BnB not only offers an optimality guarantee, but is also 10-100 times faster than the best available methods on simulated tumor SCS data. We also applied PhISCS-BnB on a recently published large melanoma dataset derived from the sublineages of a cell line involving 20 clones with 2367 mutations, which returned the optimal tumor phylogeny in <4 h. The resulting phylogeny agrees with and extends the published results by providing a more detailed picture on the clonal evolution of the tumor. AVAILABILITY AND IMPLEMENTATION https://github.com/algo-cancer/PhISCS-BnB. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
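For context, a noise-free binary genotype matrix (cells by mutations, with an all-zero root) admits a perfect phylogeny exactly when no pair of mutation columns shows all three patterns (1,1), (1,0) and (0,1), the classical three-gamete condition. The sketch below only performs that feasibility check; it is not the branch and bound search of PhISCS-BnB, and the function names are chosen here for illustration.

```python
from itertools import combinations


def has_conflict(col_a, col_b):
    """True if two mutation columns contain all of (1,1), (1,0) and (0,1)."""
    patterns = set(zip(col_a, col_b))
    return {(1, 1), (1, 0), (0, 1)} <= patterns


def is_conflict_free(matrix):
    """Check the three-gamete condition on a cells-by-mutations 0/1 matrix."""
    columns = list(zip(*matrix))  # transpose rows (cells) into columns (mutations)
    return not any(has_conflict(a, b) for a, b in combinations(columns, 2))


# Toy matrices, invented for this example.
nested = [[1, 0], [1, 1], [0, 0]]       # mutation 2 occurs only within mutation 1
conflicting = [[1, 0], [0, 1], [1, 1]]  # all three patterns are present
```

A method that searches for the most likely perfect phylogeny has to flip entries of a noisy matrix until such a check passes; the pairwise test here takes time quadratic in the number of mutations.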
Collapse
Affiliation(s)
- Erfan Sadeqi Azer
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
| | - Farid Rashidi Mehrabadi
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA.,Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Salem Malikić
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
| | - Xuan Cindy Li
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.,Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, USA
| | - Osnat Bartok
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Kevin Litchfield
- Cancer Evolution and Genome Instability Laboratory, Francis Crick Institute, London NW1 1AT, UK.,Cancer Research UK Lung Cancer Centre of Excellence London, University College London Cancer Institute, London WC1E 6DD, UK
| | - Ronen Levy
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Yardena Samuels
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Alejandro A Schäffer
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - E Michael Gertz
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Chi-Ping Day
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Eva Pérez-Guijarro
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Kerrie Marie
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Maxwell P Lee
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Glenn Merlino
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Funda Ergun
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
| | - S Cenk Sahinalp
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| |
|
12
|
Jhwueng DC, Wang CP. Phylogenetic Curved Optimal Regression for Adaptive Trait Evolution. ENTROPY (BASEL, SWITZERLAND) 2021; 23:218. [PMID: 33579023 PMCID: PMC7916804 DOI: 10.3390/e23020218] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/15/2020] [Revised: 02/07/2021] [Accepted: 02/08/2021] [Indexed: 11/16/2022]
Abstract
Regression analysis using line equations has been broadly applied in studying the evolutionary relationship between the response trait and its covariates. However, closely related species in nature show abundant diversity, and nonlinear relationships between traits have been frequently observed. By treating the evolution of quantitative traits along a phylogenetic tree as a set of continuous stochastic variables, statistical models for describing the dynamics of the optimum of the response trait and its covariates are built herein. Analytical representations for the response trait variables, as well as their optima among a group of related species, are derived. Due to the models' lack of tractable likelihood, a procedure that implements the Approximate Bayesian Computation (ABC) technique is applied for statistical inference. Simulation results show that the new models perform well, with posterior means of the parameters close to the true parameter values. Empirical analysis supports the new models when analyzing the trait relationship among kangaroo species.
|
13
|
Tarabichi M, Salcedo A, Deshwar AG, Leathlobhair MN, Wintersinger J, Wedge DC, Loo PV, Morris QD, Boutros PC. A practical guide to cancer subclonal reconstruction from DNA sequencing. Nat Methods 2021; 18:144-155. [PMID: 33398189 PMCID: PMC7867630 DOI: 10.1038/s41592-020-01013-2] [Citation(s) in RCA: 76] [Impact Index Per Article: 25.3] [Received: 09/28/2019] [Accepted: 11/09/2020] [Indexed: 01/28/2023]
Abstract
Subclonal reconstruction from bulk tumor DNA sequencing has become a pillar of cancer evolution studies, providing insight into the clonality and relative ordering of mutations and mutational processes. We provide an outline of the complex computational approaches used for subclonal reconstruction from single and multiple tumor samples. We identify the underlying assumptions and uncertainties in each step and suggest best practices for analysis and quality assessment. This guide provides a pragmatic resource for the growing user community of subclonal reconstruction methods.
Affiliation(s)
- Maxime Tarabichi
- The Francis Crick Institute, London, United Kingdom
- Wellcome Sanger Institute, Hinxton, United Kingdom
| | - Adriana Salcedo
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
- Ontario Institute for Cancer Research, Toronto, Canada
| | - Amit G. Deshwar
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto, Toronto, Canada
| | - Máire Ni Leathlobhair
- Big Data Institute, University of Oxford, Oxford, United Kingdom
- Ludwig Institute for Cancer Research, University of Oxford, Oxford, United Kingdom
| | - Jeff Wintersinger
- Department of Computer Science, University of Toronto, Toronto, Canada
| | - David C. Wedge
- Big Data Institute, University of Oxford, Oxford, United Kingdom
- Oxford NIHR Biomedical Research Centre, Oxford, United Kingdom
- Manchester Cancer Research Centre, University of Manchester, Manchester, United Kingdom
| | | | - Quaid D. Morris
- Ontario Institute for Cancer Research, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
- Vector Institute, Toronto, Canada
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York
- Donnelly Centre, University of Toronto, Toronto, Canada
| | - Paul C. Boutros
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Department of Human Genetics, University of California, Los Angeles
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles
- Institute for Precision Health, University of California, Los Angeles
- Vector Institute, Toronto, Canada
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, Canada
- Department of Urology, David Geffen School of Medicine, University of California, Los Angeles
| |
|
14
|
Ciccolella S, Soto Gomez M, Patterson MD, Della Vedova G, Hajirasouliha I, Bonizzoni P. gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data. BMC Bioinformatics 2020; 21:413. [PMID: 33297943 PMCID: PMC7725124 DOI: 10.1186/s12859-020-03736-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/17/2020] [Accepted: 09/03/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Cancer progression reconstruction is an important development stemming from the phylogenetics field. In this context, the reconstruction of the phylogeny representing the evolutionary history presents some peculiar aspects that depend on the technology used to obtain the data to analyze: Single Cell DNA Sequencing data have great specificity, but are affected by moderate false negative and missing value rates. Moreover, there has been some recent evidence of back mutations in cancer: this phenomenon is currently widely ignored. RESULTS We present a new tool, gpps, that reconstructs a tumor phylogeny from Single Cell Sequencing data, allowing each mutation to be lost at most a fixed number of times. The General Parsimony Phylogeny from Single cell (gpps) tool is open source and available at https://github.com/AlgoLab/gpps . CONCLUSIONS gpps provides new insights to the analysis of intra-tumor heterogeneity by proposing a new progression model to the field of cancer phylogeny reconstruction on Single Cell data.
Affiliation(s)
- Simone Ciccolella
- Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy.
| | - Mauricio Soto Gomez
- Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy
| | - Murray D Patterson
- Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy.,Georgia State University, Atlanta, GA, USA
| | - Gianluca Della Vedova
- Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy
| | - Iman Hajirasouliha
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York City, NY, USA.,Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York City, 10021, NY, USA
| | - Paola Bonizzoni
- Department of Informatics, Systems, and Communication, University of Milano - Bicocca, Milan, Italy
| |
|
15
|
Zhou T, Sengupta S, Müller P, Ji Y. RNDClone: Tumor subclone reconstruction based on integrating DNA and RNA sequence data. Ann Appl Stat 2020. [DOI: 10.1214/20-aoas1368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
16
|
Lee D, Park Y, Kim S. Towards multi-omics characterization of tumor heterogeneity: a comprehensive review of statistical and machine learning approaches. Brief Bioinform 2020; 22:5896573. [PMID: 34020548 DOI: 10.1093/bib/bbaa188] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/02/2020] [Revised: 06/29/2020] [Accepted: 07/21/2020] [Indexed: 12/19/2022] Open
Abstract
The multi-omics molecular characterization of cancer opened a new horizon for our understanding of cancer biology and therapeutic strategies. However, a tumor biopsy comprises diverse types of cells limited not only to cancerous cells but also to tumor microenvironmental cells and adjacent normal cells. This heterogeneity is a major confounding factor that hampers a robust and reproducible bioinformatic analysis for biomarker identification using multi-omics profiles. Besides, the heterogeneity itself has been recognized over the years for its significant prognostic values in some cancer types, thus offering another promising avenue for therapeutic intervention. A number of computational approaches to unravel such heterogeneity from high-throughput molecular profiles of a tumor sample have been proposed, but most of them rely on the data from an individual omics layer. Since the heterogeneity of cells is widely distributed across multi-omics layers, methods based on an individual layer can only partially characterize the heterogeneous admixture of cells. To help facilitate further development of the methodologies that synchronously account for several multi-omics profiles, we wrote a comprehensive review of diverse approaches to characterize tumor heterogeneity based on three different omics layers: genome, epigenome and transcriptome. As a result, this review can be useful for the analysis of multi-omics profiles produced by many large-scale consortia. Contact: sunkim.bioinfo@snu.ac.kr.
Affiliation(s)
- Dohoon Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| | - Youngjune Park
- Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
| | - Sun Kim
- Bioinformatics Institute, Seoul National University, Seoul 08826, Korea
| |
|
17
|
DiNardo Z, Tomlinson K, Ritz A, Oesper L. Distance measures for tumor evolutionary trees. Bioinformatics 2020; 36:2090-2097. [PMID: 31750900 PMCID: PMC7141873 DOI: 10.1093/bioinformatics/btz869] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/21/2019] [Revised: 09/04/2019] [Accepted: 11/19/2019] [Indexed: 12/14/2022] Open
Abstract
MOTIVATION There has been recent increased interest in using algorithmic methods to infer the evolutionary tree underlying the developmental history of a tumor. Quantitative measures that compare such trees are vital to a number of different applications including benchmarking tree inference methods and evaluating common inheritance patterns across patients. However, few appropriate distance measures exist, and those that do have low resolution for differentiating trees or do not fully account for the complex relationship between tree topology and the inheritance of the mutations labeling that topology. RESULTS Here, we present two novel distance measures, Common Ancestor Set distance (CASet) and Distinctly Inherited Set Comparison distance (DISC), that are specifically designed to account for the subclonal mutation inheritance patterns characteristic of tumor evolutionary trees. We apply CASet and DISC to multiple simulated datasets and two breast cancer datasets and show that our distance measures allow for more nuanced and accurate delineation between tumor evolutionary trees than existing distance measures. AVAILABILITY AND IMPLEMENTATION Implementations of CASet and DISC are freely available at: https://bitbucket.org/oesperlab/stereodist. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zach DiNardo
- Department of Computer Science, Carleton College, Northfield, MN 55057, USA
| | - Kiran Tomlinson
- Department of Computer Science, Carleton College, Northfield, MN 55057, USA
- Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
| | - Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, USA
| | - Layla Oesper
- Department of Computer Science, Carleton College, Northfield, MN 55057, USA
| |
|
18
|
Miura S, Vu T, Deng J, Buturla T, Oladeinde O, Choi J, Kumar S. Power and pitfalls of computational methods for inferring clone phylogenies and mutation orders from bulk sequencing data. Sci Rep 2020; 10:3498. [PMID: 32103044 PMCID: PMC7044161 DOI: 10.1038/s41598-020-59006-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/03/2020] [Accepted: 01/23/2020] [Indexed: 12/13/2022] Open
Abstract
Tumors harbor extensive genetic heterogeneity in the form of distinct clone genotypes that arise over time and across different tissues and regions in cancer. Many computational methods produce clone phylogenies from population bulk sequencing data collected from multiple tumor samples from a patient. These clone phylogenies are used to infer mutation order and clone origins during tumor progression, rendering the selection of the appropriate clonal deconvolution method critical. Surprisingly, absolute and relative accuracies of these methods in correctly inferring clone phylogenies are yet to be consistently assessed. Therefore, we evaluated the performance of seven computational methods. The accuracy of the reconstructed mutation order and inferred clone groupings varied extensively among methods. All the tested methods showed limited ability to identify ancestral clone sequences present in tumor samples correctly. The presence of copy number alterations, the occurrence of multiple seeding events among tumor sites during metastatic tumor evolution, and extensive intermixture of cancer cells among tumors hindered the detection of clones and the inference of clone phylogenies for all methods tested. Overall, CloneFinder, MACHINA, and LICHeE showed the highest overall accuracy, but none of the methods performed well for all simulated datasets. So, we present guidelines for selecting methods for data analysis.
Affiliation(s)
- Sayaka Miura
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA.,Department of Biology, Temple University, Philadelphia, PA, 19122, USA
| | - Tracy Vu
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA.,Department of Biology, Temple University, Philadelphia, PA, 19122, USA
| | - Jiamin Deng
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA.,Department of Biology, Temple University, Philadelphia, PA, 19122, USA
| | - Tiffany Buturla
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA.,Department of Biology, Temple University, Philadelphia, PA, 19122, USA
| | - Olumide Oladeinde
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA.,Department of Biology, Temple University, Philadelphia, PA, 19122, USA
| | - Jiyeong Choi
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA.,Department of Biology, Temple University, Philadelphia, PA, 19122, USA
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA.,Department of Biology, Temple University, Philadelphia, PA, 19122, USA.,Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| |
|
19
|
Ismail WM, Nzabarushimana E, Tang H. Algorithmic approaches to clonal reconstruction in heterogeneous cell populations. QUANTITATIVE BIOLOGY 2019; 7:255-265. [PMID: 32431959 PMCID: PMC7236794 DOI: 10.1007/s40484-019-0188-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/02/2019] [Revised: 08/09/2019] [Accepted: 08/25/2019] [Indexed: 12/15/2022]
Abstract
BACKGROUND The reconstruction of clonal haplotypes and their evolutionary history in evolving populations is a common problem in both microbial evolutionary biology and cancer biology. The clonal theory of evolution provides a theoretical framework for modeling the evolution of clones. RESULTS In this paper, we review the theoretical framework and assumptions over which the clonal reconstruction problem is formulated. We formally define the problem and then discuss the complexity and solution space of the problem. Various methods have been proposed to find the phylogeny that best explains the observed data. We categorize these methods based on the type of input data that they use (space-resolved or time-resolved), and also based on their computational formulation as either combinatorial or probabilistic. It is crucial to understand the different types of input data because each provides essential but distinct information for drastically reducing the solution space of the clonal reconstruction problem. Complementary information provided by single cell sequencing or from whole genome sequencing of randomly isolated clones can also improve the accuracy of clonal reconstruction. We briefly review the existing algorithms and their relationships. Finally, we summarize the tools that are developed for either directly solving the clonal reconstruction problem or a related computational problem. CONCLUSIONS In this review, we discuss the various formulations of the problem of inferring the clonal evolutionary history from allele frequency data, review existing algorithms and categorize them according to their problem formulation and solution approaches. We note that most of the available clonal inference algorithms were developed for elucidating tumor evolution, whereas clonal reconstruction for unicellular genomes is less addressed. We conclude the review by discussing more open problems such as the lack of benchmark datasets and comparison of performance between available tools.
Affiliation(s)
- Wazim Mohammed Ismail
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47405-7000, USA
| | - Etienne Nzabarushimana
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47405-7000, USA
| | - Haixu Tang
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47405-7000, USA
| |
|
20
|
Malikic S, Mehrabadi FR, Ciccolella S, Rahman MK, Ricketts C, Haghshenas E, Seidman D, Hach F, Hajirasouliha I, Sahinalp SC. PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data. Genome Res 2019; 29:1860-1877. [PMID: 31628256 PMCID: PMC6836735 DOI: 10.1101/gr.234435.118] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 08/15/2018] [Accepted: 09/11/2019] [Indexed: 12/29/2022]
Abstract
Available computational methods for tumor phylogeny inference via single-cell sequencing (SCS) data typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, the limitations of SCS technologies including frequent allele dropout and variable sequence coverage may prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to the loss of heterozygosity, deletions, and convergent evolution. In order to address such limitations, we introduce the optimal subperfect phylogeny problem which asks to integrate SCS data with matching bulk sequencing data by minimizing a linear combination of potential false negatives (due to allele dropout or variance in sequence coverage), false positives (due to read errors) among mutation calls, and the number of mutations that violate ISA (real or because of incorrect copy number estimation). We then describe a combinatorial formulation to solve this problem which ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both in the form of an integer linear program (ILP) and—as a first in tumor phylogeny reconstruction—a Boolean constraint satisfaction problem (CSP) and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data while accounting for ISA violating mutations. In contrast to the alternative methods, typically based on probabilistic approaches, PhISCS provides a guarantee of optimality in reported solutions. Using simulated and real data sets, we demonstrate that PhISCS is more general and accurate than all available approaches.
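The perfect phylogeny condition mentioned in this abstract can be illustrated with the classical pairwise conflict test. The following is a minimal sketch, assuming a complete, noise-free binary genotype matrix (rows are cells, columns are mutations) and an unmutated root; it is not the ILP/CSP formulation used by PhISCS, and the function names are hypothetical.

```python
from itertools import combinations


def has_conflict(col_a, col_b):
    """True if two mutation columns jointly show the patterns (1,1), (1,0) and (0,1)."""
    patterns = set(zip(col_a, col_b))
    return {(1, 1), (1, 0), (0, 1)} <= patterns


def admits_perfect_phylogeny(matrix):
    """Check a binary cells-by-mutations matrix for pairwise column conflicts.

    With an all-zero root, a conflict-free matrix admits a perfect phylogeny.
    """
    columns = list(zip(*matrix))
    return not any(has_conflict(a, b) for a, b in combinations(columns, 2))


# Illustrative toy matrices (not data from the paper).
conflict_free = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
conflicting = [
    [1, 1],
    [1, 0],
    [0, 1],
]

print(admits_perfect_phylogeny(conflict_free))
print(admits_perfect_phylogeny(conflicting))
```

Real single-cell data contain false negatives, false positives and missing entries, which is why the methods listed here search for a minimally corrected matrix rather than testing the observed one directly.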
Affiliation(s)
- Salem Malikic
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
| | - Farid Rashidi Mehrabadi
- Department of Computer Science, Indiana University, Bloomington, Indiana 47408, USA.,Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Simone Ciccolella
- Department of Computer Systems and Communication, University of Milano-Bicocca, 20136 Milan, Italy.,Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA
| | - Md Khaledur Rahman
- Department of Computer Science, Indiana University, Bloomington, Indiana 47408, USA
| | - Camir Ricketts
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA.,Tri-I Computational Biology and Medicine Graduate Program, Cornell University, New York, New York 10065, USA
| | - Ehsan Haghshenas
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
| | - Daniel Seidman
- Tri-I Computational Biology and Medicine Graduate Program, Cornell University, New York, New York 10065, USA
| | - Faraz Hach
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.,Department of Urologic Sciences, University of British Columbia, Vancouver, BC V5Z 1M9, Canada.,Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
| | - Iman Hajirasouliha
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA.,Department of Physiology and Biophysics, Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York 10065, USA
| | - S Cenk Sahinalp
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| |
|
21
|
Ricketts C, Seidman D, Popic V, Hormozdiari F, Batzoglou S, Hajirasouliha I. Meltos: multi-sample tumor phylogeny reconstruction for structural variants. Bioinformatics 2019; 36:1082-1090. [PMID: 31584621 PMCID: PMC8215921 DOI: 10.1093/bioinformatics/btz737] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/15/2018] [Revised: 08/10/2019] [Accepted: 09/25/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume the evolutionary progression of SVs is necessarily the same as SNVs, we show that a tumor phylogeny tree using high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events. RESULTS In order to assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets with five genomes in both. We also assessed Meltos on two real cancer datasets. We tested Meltos on multiple samples from a liposarcoma tumor and on a multi-sample breast cancer data (Yates et al., 2015), where the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show Meltos has the ability to place high confidence validated SV calls on a refined tumor phylogeny tree. We also showed the flexibility of Meltos to either estimate VAFs directly from genomic data or to use copy number corrected estimates. AVAILABILITY AND IMPLEMENTATION Meltos is available at https://github.com/ih-lab/Meltos. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | - Victoria Popic
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Fereydoun Hormozdiari
- Department of Biochemistry and Molecular Medicine, MIND Institute and Genome Center, University of California, Davis, CA 95616, USA
| | - Serafim Batzoglou
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | | |
|
22
|
Malikic S, Jahn K, Kuipers J, Sahinalp SC, Beerenwinkel N. Integrative inference of subclonal tumour evolution from single-cell and bulk sequencing data. Nat Commun 2019; 10:2750. [PMID: 31227714 PMCID: PMC6588593 DOI: 10.1038/s41467-019-10737-5] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 07/04/2018] [Accepted: 05/30/2019] [Indexed: 02/07/2023] Open
Abstract
Understanding the clonal architecture and evolutionary history of a tumour poses one of the key challenges to overcome treatment failure due to resistant cell populations. Previously, studies on subclonal tumour evolution have been primarily based on bulk sequencing and in some recent cases on single-cell sequencing data. Either data type alone has shortcomings with regard to this task, but methods integrating both data types have been lacking. Here, we present B-SCITE, the first computational approach that infers tumour phylogenies from combined single-cell and bulk sequencing data. Using a comprehensive set of simulated data, we show that B-SCITE systematically outperforms existing methods with respect to tree reconstruction accuracy and subclone identification. B-SCITE provides high-fidelity reconstructions even with a modest number of single cells and in cases where bulk allele frequencies are affected by copy number changes. On real tumour data, B-SCITE generated mutation histories show high concordance with expert generated trees. Intra-tumour heterogeneity provides important information about subclonal tumour evolution. Here, the authors develop B-SCITE, a computational method for inferring tumour phylogenies from combined single-cell and bulk sequencing data.
Affiliation(s)
- Salem Malikic
- School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, BC, Canada.,Vancouver Prostate Centre, Vancouver, V6H 3Z6, BC, Canada
| | - Katharina Jahn
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
| | - Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
| | - S Cenk Sahinalp
- Department of Computer Science, Indiana University, Bloomington, 47405, IN, USA
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
| |
|
23
|
Zhou T, Sengupta S, Müller P, Ji Y. TreeClone: Reconstruction of tumor subclone phylogeny based on mutation pairs using next generation sequencing data. Ann Appl Stat 2019. [DOI: 10.1214/18-aoas1224] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
|
24
|
Zeng L, Warren JL, Zhao H. Phylogeny-based tumor subclone identification using a Bayesian feature allocation model. Ann Appl Stat 2019. [DOI: 10.1214/18-aoas1223] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
|
25
|
Ogundijo OE, Zhu K, Wang X, Anastassiou D. A sequential Monte Carlo algorithm for inference of subclonal structure in cancer. PLoS One 2019; 14:e0211213. [PMID: 30682127 PMCID: PMC6347199 DOI: 10.1371/journal.pone.0211213] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/02/2018] [Accepted: 01/03/2019] [Indexed: 11/19/2022] Open
Abstract
Tumors are heterogeneous in the sense that they consist of multiple subpopulations of cells, referred to as subclones, each of which is characterized by a distinct profile of genomic variations such as somatic mutations. Inferring the underlying clonal landscape has become an important topic in that it can help in understanding cancer development and progression, and thereby help in improving treatment. We describe a novel state-space model, based on the feature allocation framework and an efficient sequential Monte Carlo (SMC) algorithm, using the somatic mutation data obtained from tumor samples to estimate the number of subclones, as well as their characterization. Our approach, by design, is capable of handling any number of mutations. Via extensive simulations, our method exhibits high accuracy, in most cases, and compares favorably with existing methods. Moreover, we demonstrated the validity of our method through analyzing real tumor samples from patients from multiple cancer types (breast, prostate, and lung). Our results reveal driver mutation events specific to cancer types, and indicate clonal expansion by manual phylogenetic analysis. MATLAB code and datasets are available to download at: https://github.com/moyanre/tumor_clones.
Affiliation(s)
- Oyetunji E. Ogundijo
  - Department of Electrical Engineering, Columbia University, New York, NY, United States of America
- Kaiyi Zhu
  - Department of Electrical Engineering, Columbia University, New York, NY, United States of America
  - Department of Systems Biology, Columbia University, New York, NY, United States of America
- Xiaodong Wang
  - Department of Electrical Engineering, Columbia University, New York, NY, United States of America
- Dimitris Anastassiou
  - Department of Electrical Engineering, Columbia University, New York, NY, United States of America
  - Department of Systems Biology, Columbia University, New York, NY, United States of America
|
26
|
Ogundijo OE, Wang X. SeqClone: sequential Monte Carlo based inference of tumor subclones. BMC Bioinformatics 2019; 20:6. [PMID: 30611189 PMCID: PMC6320595 DOI: 10.1186/s12859-018-2562-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/23/2018] [Accepted: 12/06/2018] [Indexed: 11/13/2022] [Open Access]
Abstract
Background: Tumor samples are heterogeneous. They consist of varying cell populations or subclones and each subclone is characterized with a distinct single nucleotide variant (SNV) profile. This explains the source of genetic heterogeneity observed in tumor sequencing data. To make precise prognosis and design effective therapy for cancer, ascertaining the subclonal composition of a tumor is of great importance. Results: In this paper, we propose a state-space formulation of the feature allocation model. This model is interpreted as the blind deconvolution of the expected variant allele fractions (VAFs). VAFs are deconvolved into a binary matrix of genotypes and a matrix of genotype proportions in the samples. Specifically, we consider a sequential construction of the genotype matrix which we model by Indian buffet process (IBP). We describe an efficient sequential Monte Carlo (SMC) algorithm, SeqClone, that jointly estimates the genotypes of subclones and their proportions in the samples. When compared to other methods for resolving tumor heterogeneity, SeqClone provides comparable and sometimes, better estimates of model parameters. By design, SeqClone conveniently handles any number of probed SNVs in the samples. In particular, we can analyze VAFs from newly probed SNVs to improve existing estimates, an attribute not present in existing solutions. Conclusions: We show that the SMC algorithm for deconvolving VAFs from tumor sequencing data is a robust and promising alternative for explaining the observed genetic heterogeneity in tumor samples.
Affiliation(s)
- Oyetunji E Ogundijo
  - Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Xiaodong Wang
  - Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
|
27
|
Zhou T, Müller P, Sengupta S, Ji Y. PairClone: a Bayesian subclone caller based on mutation pairs. J R Stat Soc Ser C Appl Stat 2018. [DOI: 10.1111/rssc.12328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Tianjian Zhou
  - University of Chicago, NorthShore University HealthSystem, Evanston, USA
  - University of Texas at Austin, USA
- Yuan Ji
  - University of Chicago and NorthShore University HealthSystem, Evanston, USA
|
28
|
Kuipers J, Jahn K, Beerenwinkel N. Advances in understanding tumour evolution through single-cell sequencing. Biochim Biophys Acta Rev Cancer 2017; 1867:127-138. [PMID: 28193548 PMCID: PMC5813714 DOI: 10.1016/j.bbcan.2017.02.001] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Received: 11/01/2016] [Revised: 02/02/2017] [Accepted: 02/04/2017] [Indexed: 12/14/2022]
Abstract
The mutational heterogeneity observed within tumours poses additional challenges to the development of effective cancer treatments. A thorough understanding of a tumour's subclonal composition and its mutational history is essential to open up the design of treatments tailored to individual patients. Comparative studies on a large number of tumours permit the identification of mutational patterns which may refine forecasts of cancer progression, response to treatment and metastatic potential. The composition of tumours is shaped by evolutionary processes. Recent advances in next-generation sequencing offer the possibility to analyse the evolutionary history and accompanying heterogeneity of tumours at an unprecedented resolution, by sequencing single cells. New computational challenges arise when moving from bulk to single-cell sequencing data, leading to the development of novel modelling frameworks. In this review, we present the state of the art methods for understanding the phylogeny encoded in bulk or single-cell sequencing data, and highlight future directions for developing more comprehensive and informative pictures of tumour evolution. This article is part of a Special Issue entitled: Evolutionary principles - heterogeneity in cancer?, edited by Dr. Robert A. Gatenby.
MESH Headings
- Adaptation, Physiological
- Animals
- Biomarkers, Tumor/genetics
- Biomarkers, Tumor/metabolism
- Cell Transformation, Neoplastic/genetics
- Cell Transformation, Neoplastic/metabolism
- Cell Transformation, Neoplastic/pathology
- Evolution, Molecular
- Gene Expression Regulation, Neoplastic
- Genetic Fitness
- Genetic Heterogeneity
- Genetic Predisposition to Disease
- Heredity
- Humans
- Models, Genetic
- Mutation
- Neoplasms/drug therapy
- Neoplasms/genetics
- Neoplasms/metabolism
- Neoplasms/pathology
- Pedigree
- Phenotype
- Phylogeny
- Sequence Analysis, DNA
- Signal Transduction/genetics
- Single-Cell Analysis/methods
- Time Factors
Affiliation(s)
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Katharina Jahn
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
|
29
|
Zainulabadeen A, Yao P, Zare H. Underexpression of Specific Interferon Genes Is Associated with Poor Prognosis of Melanoma. PLoS One 2017; 12:e0170025. [PMID: 28114321 PMCID: PMC5256985 DOI: 10.1371/journal.pone.0170025] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/26/2016] [Accepted: 12/26/2016] [Indexed: 01/30/2023] [Open Access]
Abstract
Because the prognosis of melanoma is challenging and inaccurate when using current clinical approaches, clinicians are seeking more accurate molecular markers to improve risk models. Accordingly, we performed a survival analysis on 404 samples from The Cancer Genome Atlas (TCGA) cohort of skin cutaneous melanoma. Using our recently developed gene network model, we identified biological signatures that confidently predict the prognosis of melanoma (p-value < 10⁻⁵). Our model predicted 38 cases as low-risk and 54 cases as high-risk. The probability of surviving at least 5 years was 64% for low-risk and 14% for high-risk cases. In particular, we found that the overexpression of specific genes in the mitotic cell cycle pathway and the underexpression of specific genes in the interferon pathway are both associated with poor prognosis. We show that our predictive model assesses the risk more accurately than the traditional Clark staging method. Therefore, our model can help clinicians design treatment strategies more effectively. Furthermore, our findings shed light on the biology of melanoma and its prognosis. This is the first in vivo study that demonstrates the association between the interferon pathway and the prognosis of melanoma.
Affiliation(s)
- Aamir Zainulabadeen
  - Department of Computer Science, Texas State University, San Marcos, Texas, United States of America
  - Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Philip Yao
  - Department of Computer Science, Texas State University, San Marcos, Texas, United States of America
  - Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, United States of America
- Habil Zare
  - Department of Computer Science, Texas State University, San Marcos, Texas, United States of America
|