1
Minussi DC, Nicholson MD, Ye H, Davis A, Wang K, Baker T, Tarabichi M, Sei E, Du H, Rabbani M, Peng C, Hu M, Bai S, Lin YW, Schalck A, Multani A, Ma J, McDonald TO, Casasent A, Barrera A, Chen H, Lim B, Arun B, Meric-Bernstam F, Van Loo P, Michor F, Navin NE. Breast tumours maintain a reservoir of subclonal diversity during expansion. Nature 2021; 592:302-308. [PMID: 33762732 PMCID: PMC8049101 DOI: 10.1038/s41586-021-03357-x] [Received: 05/13/2020] [Accepted: 02/12/2021]
Abstract
Our knowledge of copy number evolution during the expansion of primary breast tumours is limited¹,². Here, to investigate this process, we developed a single-cell, single-molecule DNA-sequencing method and performed copy number analysis of 16,178 single cells from 8 human triple-negative breast cancers and 4 cell lines. The results show that breast tumours and cell lines comprise a large milieu of subclones (7-22) that are organized into a few (3-5) major superclones. Evolutionary analysis suggests that after clonal TP53 mutations, multiple loss-of-heterozygosity events and genome doubling, there was a period of transient genomic instability followed by ongoing copy number evolution during the primary tumour expansion. By subcloning single daughter cells in culture, we show that tumour cells rediversify their genomes and do not retain isogenic properties. These data show that triple-negative breast cancers continue to evolve chromosome aberrations and maintain a reservoir of subclonal diversity during primary tumour growth.
Affiliation(s)
- Darlan C Minussi
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth, Houston, TX, USA
- Michael D Nicholson
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Hanghui Ye
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth, Houston, TX, USA
- Alexander Davis
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth, Houston, TX, USA
- Kaile Wang
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Toby Baker
- Cancer Genomics Laboratory, The Francis Crick Institute, London, UK
- Maxime Tarabichi
- Cancer Genomics Laboratory, The Francis Crick Institute, London, UK
- Emi Sei
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Haowei Du
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Graduate Program in Diagnostic Genetics, School of Health Professions, MD Anderson Cancer Center, Houston, TX, USA
- Mashiat Rabbani
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Graduate Program in Diagnostic Genetics, School of Health Professions, MD Anderson Cancer Center, Houston, TX, USA
- Cheng Peng
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Graduate Program in Diagnostic Genetics, School of Health Professions, MD Anderson Cancer Center, Houston, TX, USA
- Min Hu
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Shanshan Bai
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Yu-Wei Lin
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth, Houston, TX, USA
- Aislyn Schalck
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth, Houston, TX, USA
- Asha Multani
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jin Ma
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Thomas O McDonald
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, MA, USA
- Anna Casasent
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth, Houston, TX, USA
- Angelica Barrera
- Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Hui Chen
- Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Bora Lim
- Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Banu Arun
- Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Funda Meric-Bernstam
- Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peter Van Loo
- Cancer Genomics Laboratory, The Francis Crick Institute, London, UK
- Franziska Michor
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, MA, USA
- The Ludwig Center at Harvard, Boston, MA, and the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nicholas E Navin
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth, Houston, TX, USA
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
2
Zhang Z, Cheng H, Hong X, Di Narzo AF, Franzen O, Peng S, Ruusalepp A, Kovacic JC, Bjorkegren JLM, Wang X, Hao K. EnsembleCNV: an ensemble machine learning algorithm to identify and genotype copy number variation using SNP array data. Nucleic Acids Res 2019; 47:e39. [PMID: 30722045 PMCID: PMC6468244 DOI: 10.1093/nar/gkz068] [Received: 10/04/2018] [Revised: 12/17/2018] [Accepted: 01/25/2019]
Abstract
The associations between diseases/traits and copy number variants (CNVs) have not been systematically investigated in genome-wide association studies (GWASs), primarily owing to a lack of robust and accurate tools for CNV genotyping. Herein, we propose a novel ensemble learning framework, ensembleCNV, to detect and genotype CNVs using single nucleotide polymorphism (SNP) array data. EnsembleCNV (a) identifies and eliminates batch effects at the raw-data level; (b) assembles individual CNV calls from multiple existing callers with complementary strengths into CNV regions (CNVRs) using a heuristic algorithm; (c) re-genotypes each CNVR with a local likelihood model adjusted by global information across multiple CNVRs; (d) refines CNVR boundaries using the local correlation structure of copy number intensities; and (e) provides direct CNV genotypes accompanied by confidence scores, which are directly usable for downstream quality control and association analysis. Benchmarked on two large datasets, ensembleCNV outperformed competing methods and achieved a high call rate (93.3%) and reproducibility (98.6%), while also achieving high sensitivity by capturing 85% of the common CNVs documented in the 1000 Genomes Project. Given that this CNV call rate and accuracy are comparable to those of SNP genotyping, we suggest that ensembleCNV holds significant promise for genome-wide CNV association studies and for investigating how CNVs predispose to human diseases.
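Step (b) assembles calls from several callers into CNV regions, but the abstract does not spell out the heuristic. As a minimal sketch only, assuming calls are plain (chromosome, start, end, caller) tuples with inclusive coordinates, the Python below unions overlapping calls and records which callers support each region. The function name `merge_calls_into_regions` and the caller labels are hypothetical, and this simple interval union is not ensembleCNV's own algorithm.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical input format: (chromosome, start, end, caller), inclusive coordinates.
Call = Tuple[str, int, int, str]
Region = Tuple[str, int, int, FrozenSet[str]]


def merge_calls_into_regions(calls: List[Call]) -> List[Region]:
    """Union overlapping CNV calls from all callers into regions.

    A plain interval-union sketch, not ensembleCNV's heuristic.
    """
    regions: List[Region] = []
    # Sort by chromosome, then start, so overlapping calls become adjacent.
    for chrom, start, end, caller in sorted(calls, key=lambda c: (c[0], c[1], c[2])):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # Overlaps the current region: extend it and record the extra caller.
            _, r_start, r_end, callers = regions[-1]
            regions[-1] = (chrom, r_start, max(r_end, end), callers | {caller})
        else:
            # No overlap: open a new region supported by this caller alone.
            regions.append((chrom, start, end, frozenset({caller})))
    return regions


calls = [
    ("chr1", 100, 500, "callerA"),
    ("chr1", 450, 900, "callerB"),
    ("chr1", 2000, 2500, "callerC"),
    ("chr2", 100, 300, "callerA"),
]
for region in merge_calls_into_regions(calls):
    print(region)
```

Here the first two calls overlap and merge into one region, chr1:100-900, supported by callerA and callerB; the other two calls remain separate regions. Keeping the set of supporting callers per region is one simple way to let later steps weigh regions found by several callers differently from single-caller regions.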
Affiliation(s)
- Zhongyang Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Haoxiang Cheng
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Xiumei Hong
- Center on the Early Life Origins of Disease, Department of Population, Family and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Antonio F Di Narzo
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Oscar Franzen
- Integrated Cardio Metabolic Centre, Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
- Shouneng Peng
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Arno Ruusalepp
- Department of Cardiac Surgery, Tartu University Hospital, Tartu, Estonia
- Jason C Kovacic
- Cardiovascular Research Center, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Johan L M Bjorkegren
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Integrated Cardio Metabolic Centre, Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
- Xiaobin Wang
- Center on the Early Life Origins of Disease, Department of Population, Family and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Division of General Pediatrics & Adolescent Medicine, Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Ke Hao
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- The Tenth People's Hospital, Tongji University, Shanghai 200072, China
- College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
6
Zhang C, Cai H, Huang J, Song Y. nbCNV: a multi-constrained optimization model for discovering copy number variants in single-cell sequencing data. BMC Bioinformatics 2016; 17:384. [PMID: 27639558 PMCID: PMC5027123 DOI: 10.1186/s12859-016-1239-7] [Received: 03/03/2016] [Accepted: 09/04/2016]
Abstract
Background: Variations in DNA copy number make an important contribution to the development of several diseases, including autism, schizophrenia and cancer. Single-cell sequencing technology allows genomic heterogeneity to be dissected at the single-cell level, thereby providing important evolutionary information about cancer cells. In contrast to traditional bulk sequencing, single-cell sequencing requires amplification of the whole genome of a single cell to obtain enough material for sequencing. However, the amplification process inevitably introduces amplification bias, making part of the sequencing data over-dispersed. A recent study showed that the over-dispersed portion of single-cell sequencing data can be well modelled by negative binomial distributions.
Results: We developed a read-depth-based method, nbCNV, to detect copy number variants (CNVs). nbCNV uses two constraints, sparsity and smoothness, to fit CNV patterns under the assumption that the read signals follow a negative binomial distribution. CNV detection was formulated as a quadratic optimization problem and solved by an efficient numerical scheme based on the classical alternating direction minimization method.
Conclusions: Extensive experiments comparing nbCNV with existing benchmark models were conducted on both simulated data and empirical single-cell sequencing data. The results demonstrate that nbCNV achieves superior performance and high robustness for the detection of CNVs in single-cell sequencing data.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1239-7) contains supplementary material.
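The abstract rests on negative binomial read counts, whose variance exceeds the mean (Var = μ + μ²/r for mean μ and size parameter r), whereas a Poisson model forces the two to be equal. As a minimal sketch, assuming the per-bin counts all come from a single copy-number state, the Python below gives a method-of-moments estimate of r. The function name `nb_size_moments` is hypothetical, and this illustrates the distributional assumption only; it is not part of nbCNV's optimization.

```python
import statistics
from typing import Optional, Sequence


def nb_size_moments(counts: Sequence[int]) -> Optional[float]:
    """Method-of-moments estimate of the negative binomial size parameter r.

    Uses Var = mu + mu**2 / r, i.e. r = mu**2 / (Var - mu).
    Returns None when the counts are not over-dispersed (Var <= mu),
    in which case a Poisson model would already suffice.
    """
    mu = statistics.fmean(counts)
    var = statistics.variance(counts)  # sample variance, n - 1 denominator
    if var <= mu:
        return None
    return mu * mu / (var - mu)


# Toy per-bin read counts (illustrative only, not data from the paper).
print(nb_size_moments([2, 4, 6, 8, 10]))  # → 9.0
print(nb_size_moments([5, 5, 5, 5]))      # → None
```

Smaller values of r mean stronger over-dispersion; as r grows large, μ²/r vanishes and the negative binomial approaches the Poisson case.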
Affiliation(s)
- Changsheng Zhang
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China
- Hongmin Cai
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China
- Jingying Huang
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China
- Yan Song
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China