1
Liu F, Shi F, Yu Z. Inferring single-cell copy number profiles through cross-cell segmentation of read counts. BMC Genomics 2024; 25:25. [PMID: 38166601 PMCID: PMC10762977 DOI: 10.1186/s12864-023-09901-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 12/12/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND Copy number alteration (CNA) is one of the major genomic variations that frequently occur in cancers, and accurate inference of CNAs is essential for unmasking intra-tumor heterogeneity (ITH) and tumor evolutionary history. Single-cell DNA sequencing (scDNA-seq) makes it convenient to profile CNAs at single-cell resolution, and thus aids in better characterization of ITH. Although several computational methods have been proposed to decipher single-cell CNAs, their performance is limited in either breakpoint detection or copy number estimation due to the high dimensionality and noisy nature of read count data. RESULTS By treating breakpoint detection as the segmentation of a high-dimensional read count sequence, we develop a novel method called DeepCNA for cross-cell segmentation of the read count sequence and per-cell inference of CNAs. To cope with the difficulty of segmentation, an autoencoder (AE) network is employed in DeepCNA to project the original data into a low-dimensional space, where the breakpoints can be efficiently detected along each latent dimension and further merged to obtain the final breakpoints. Unlike existing methods that manually calculate certain statistics of read counts to find breakpoints, the AE model learns the representations automatically. Based on the inferred breakpoints, we employ a mixture model to predict copy numbers of segments for each cell, and leverage an expectation-maximization algorithm to efficiently estimate cell ploidy by exploring the most abundant copy number state. Benchmarking results on simulated and real data demonstrate that our method accurately infers breakpoints as well as absolute copy numbers and surpasses the existing methods under different test conditions. DeepCNA can be accessed at: https://github.com/zhyu-lab/deepcna.
CONCLUSIONS Profiling single-cell CNAs based on deep learning is becoming a new paradigm of scDNA-seq data analysis, and DeepCNA is an enhancement to the current arsenal of computational methods for investigating cancer genomics.
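The abstract says that breakpoints are detected along each latent dimension and then merged, but it does not say how the merge is done. The sketch below is only an illustration of one plausible rule, not DeepCNA's implementation: candidate breakpoints (bin indices) that lie within a hypothetical tolerance `tol` of the previous candidate are collapsed into one group, and each group is represented by its lower median. The function name `merge_breakpoints` and the parameter `tol` are assumptions introduced here.

```python
from statistics import median_low


def merge_breakpoints(per_dim_breakpoints, tol=2):
    """Merge breakpoints found independently along each latent dimension.

    per_dim_breakpoints: list of lists of bin indices, one list per dimension.
    tol: hypothetical tolerance (in bins); a candidate at most `tol` bins away
         from the previous candidate joins the same group.
    Returns one representative bin index per group, in ascending order.
    """
    # Pool the candidates from all latent dimensions and sort them.
    candidates = sorted(b for dim in per_dim_breakpoints for b in dim)

    merged, group = [], []
    for b in candidates:
        # A gap larger than `tol` closes the current group.
        if group and b - group[-1] > tol:
            merged.append(median_low(group))
            group = []
        group.append(b)
    if group:
        merged.append(median_low(group))
    return merged


# Three latent dimensions that roughly agree on three breakpoints.
latent_breakpoints = [[100, 250], [101, 400], [99, 252, 401]]
print(merge_breakpoints(latent_breakpoints))  # [100, 250, 400]
```

A larger `tol` merges more aggressively and yields fewer, longer segments; the right value would depend on bin size and noise level, which the abstract does not specify.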
Affiliation(s)
- Furui Liu
- School of Information Engineering, Ningxia University, Yinchuan, 750021, China
- Fangyuan Shi
- School of Information Engineering, Ningxia University, Yinchuan, 750021, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-Founded By Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, 750021, China
- Zhenhua Yu
- School of Information Engineering, Ningxia University, Yinchuan, 750021, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-Founded By Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, 750021, China
2
Bravo‐Estupiñan DM, Aguilar‐Guerrero K, Quirós S, Acón M, Marín‐Müller C, Ibáñez‐Hernández M, Mora‐Rodríguez RA. Gene dosage compensation: Origins, criteria to identify compensated genes, and mechanisms including sensor loops as an emerging systems-level property in cancer. Cancer Med 2023; 12:22130-22155. [PMID: 37987212 PMCID: PMC10757140 DOI: 10.1002/cam4.6719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 10/31/2023] [Accepted: 11/07/2023] [Indexed: 11/22/2023] Open
Abstract
The gene dosage compensation hypothesis presents a mechanism through which the expression of certain genes is modulated to compensate for differences in the dose of genes when additional chromosomes are present. It is one of the means through which cancer cells actively cope with the potential damaging effects of aneuploidy, a hallmark of most cancers. Dosage compensation arises through several processes, including downregulation or overexpression of specific genes and the relocation of dosage-sensitive genes. In cancer, a majority of compensated genes are generally thought to be regulated at the translational or post-translational level, and include the basic components of a compensation loop, including sensors of gene dosage and modulators of gene expression. Post-translational regulation is mostly undertaken by a general degradation or aggregation of remaining protein subunits of macromolecular complexes. An increasingly important role has also been observed for transcriptional level regulation. This article reviews the process of targeted gene dosage compensation in cancer and other biological conditions, along with the mechanisms by which cells regulate specific genes to restore cellular homeostasis. These mechanisms represent potential targets for the inhibition of dosage compensation of specific genes in aneuploid cancers. This article critically examines the process of targeted gene dosage compensation in cancer and other biological contexts, alongside the criteria for identifying genes subject to dosage compensation and the intricate mechanisms by which cells orchestrate the regulation of specific genes to reinstate cellular homeostasis. Ultimately, our aim is to gain a comprehensive understanding of the intricate nature of a systems-level property. This property hinges upon the kinetic parameters of regulatory motifs, which we have termed "gene dosage sensor loops." 
These loops have the potential to operate at both the transcriptional and translational levels, thus emerging as promising candidates for the inhibition of dosage compensation in specific genes. Additionally, they represent novel and highly specific therapeutic targets in the context of aneuploid cancer.
Affiliation(s)
- Diana M. Bravo‐Estupiñan
- CICICA, Centro de Investigación en Cirugía y Cáncer (Research Center on Surgery and Cancer), Universidad de Costa Rica, San José, Costa Rica
- Programa de Doctorado en Ciencias, Sistema de Estudios de Posgrado (SEP), Universidad de Costa Rica, San José, Costa Rica
- Laboratorio de Terapia Génica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas del Instituto Politécnico Nacional, Ciudad de México, Mexico
- Speratum Biopharma, Inc., Centro Nacional de Innovación Biotecnológica Nacional (CENIBiot), San José, Costa Rica
- Karol Aguilar‐Guerrero
- CICICA, Centro de Investigación en Cirugía y Cáncer (Research Center on Surgery and Cancer), Universidad de Costa Rica, San José, Costa Rica
- Maestría académica en Microbiología, Programa de Posgrado en Microbiología, Parasitología, Química Clínica e Inmunología, Universidad de Costa Rica, San José, Costa Rica
- Steve Quirós
- CICICA, Centro de Investigación en Cirugía y Cáncer (Research Center on Surgery and Cancer), Universidad de Costa Rica, San José, Costa Rica
- Laboratorio de Quimiosensibilidad tumoral (LQT), Centro de Investigación en enfermedades Tropicales (CIET), Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
- Man‐Sai Acón
- CICICA, Centro de Investigación en Cirugía y Cáncer (Research Center on Surgery and Cancer), Universidad de Costa Rica, San José, Costa Rica
- Christian Marín‐Müller
- Speratum Biopharma, Inc., Centro Nacional de Innovación Biotecnológica Nacional (CENIBiot), San José, Costa Rica
- Miguel Ibáñez‐Hernández
- Laboratorio de Terapia Génica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas del Instituto Politécnico Nacional, Ciudad de México, Mexico
- Rodrigo A. Mora‐Rodríguez
- CICICA, Centro de Investigación en Cirugía y Cáncer (Research Center on Surgery and Cancer), Universidad de Costa Rica, San José, Costa Rica
- Laboratorio de Quimiosensibilidad tumoral (LQT), Centro de Investigación en enfermedades Tropicales (CIET), Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
3
Sarwal V, Niehus S, Ayyala R, Kim M, Sarkar A, Chang S, Lu A, Rajkumar N, Darci-Maher N, Littman R, Chhugani K, Soylev A, Comarova Z, Wesel E, Castellanos J, Chikka R, Distler MG, Eskin E, Flint J, Mangul S. A comprehensive benchmarking of WGS-based deletion structural variant callers. Brief Bioinform 2022; 23:bbac221. [PMID: 35753701 PMCID: PMC9294411 DOI: 10.1093/bib/bbac221] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/08/2021] [Revised: 04/30/2022] [Accepted: 05/11/2022] [Indexed: 01/10/2023] Open
Abstract
Advances in whole-genome sequencing (WGS) promise to enable accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges, and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we have evaluated the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to previous benchmarking studies, our gold standard dataset included a complete set of SVs, allowing us to report both precision and sensitivity rates of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance, as SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.
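Reporting both precision and sensitivity requires a complete gold standard, so that false positives as well as false negatives can be counted. The abstract does not state the matching criterion used in the benchmark; the sketch below assumes, purely for illustration, that a called deletion matches a gold-standard deletion when it is on the same chromosome and both breakpoints lie within a hypothetical `tol` base pairs of the true coordinates. The function name and the example coordinates are invented.

```python
def evaluate_deletions(calls, gold, tol=100):
    """Compute (precision, sensitivity) of deletion calls against a gold set.

    calls, gold: lists of (chrom, start, end) tuples.
    tol: hypothetical breakpoint tolerance in base pairs (an assumption).
    Each gold-standard deletion can be matched by at most one call.
    """
    def matches(call, truth):
        return (call[0] == truth[0]
                and abs(call[1] - truth[1]) <= tol
                and abs(call[2] - truth[2]) <= tol)

    matched_gold = set()
    true_pos = 0
    for call in calls:
        for i, truth in enumerate(gold):
            if i not in matched_gold and matches(call, truth):
                matched_gold.add(i)
                true_pos += 1
                break

    false_pos = len(calls) - true_pos   # calls with no gold-standard partner
    false_neg = len(gold) - true_pos    # gold-standard deletions that were missed
    precision = true_pos / (true_pos + false_pos) if calls else 0.0
    sensitivity = true_pos / (true_pos + false_neg) if gold else 0.0
    return precision, sensitivity


gold = [("chr1", 1000, 2000), ("chr1", 5000, 5600), ("chr2", 300, 900)]
calls = [("chr1", 1010, 1990), ("chr2", 310, 2000),
         ("chr1", 5050, 5580), ("chr3", 10, 500)]
precision, sensitivity = evaluate_deletions(calls, gold)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

In this toy input two of four calls match, and two of three gold-standard deletions are recovered. Other matching rules, such as reciprocal overlap, would give different counts for the same calls.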
Affiliation(s)
- Varuni Sarwal
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Indian Institute of Technology Delhi, Hauz Khas, New Delhi, Delhi 110016, India
- Sebastian Niehus
- Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178 Berlin, Germany
- Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, 10117 Berlin, Germany
- Ram Ayyala
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Minyoung Kim
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089
- Aditya Sarkar
- School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Kamand, Mandi, Himachal Pradesh 175001, India
- Sei Chang
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Angela Lu
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Neha Rajkumar
- Department of Bioengineering, University of California Los Angeles, Los Angeles, CA, 90095
- Nicholas Darci-Maher
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Russell Littman
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Karishma Chhugani
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA 90089-9121
- Arda Soylev
- Department of Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
- Zoia Comarova
- Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, CA, United States
- Emily Wesel
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Jacqueline Castellanos
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Rahul Chikka
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Margaret G Distler
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, 695 Charles E. Young Drive South, Box 708822, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, 73-235 CHS, Los Angeles, CA, 90095, USA
- Jonathan Flint
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, 760 Westwood Plaza, Los Angeles, CA 90095, USA
- Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA 90089-9121
4
Tisserant E, Vitobello A, Callegarin D, Verdez S, Bruel AL, Aho Glele LS, Sorlin A, Viora-Dupont E, Konyukh M, Marle N, Nambot S, Moutton S, Racine C, Garde A, Delanne J, Tran-Mau-Them F, Philippe C, Kuentz P, Poulleau M, Payet M, Poe C, Thauvin-Robinet C, Faivre L, Mosca-Boidron AL, Thevenon J, Duffourd Y, Callier P. Copy number variants calling from WES data through eXome hidden Markov model (XHMM) identifies additional 2.5% pathogenic genomic imbalances smaller than 30 kb undetected by array-CGH. Ann Hum Genet 2022; 86:171-180. [PMID: 35141892 DOI: 10.1111/ahg.12459] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/11/2021] [Revised: 12/14/2021] [Accepted: 01/11/2022] [Indexed: 12/14/2022]
Abstract
It has been estimated that Copy Number Variants (CNVs) account for 10%-20% of patients affected by Developmental Disorder (DD)/Intellectual Disability (ID). Although array comparative genomic hybridization (array-CGH) represents the gold standard for the detection of genomic imbalances, common Agilent array-CGH 4 × 180 kb arrays fail to detect CNVs smaller than 30 kb. Whole exome sequencing (WES) is becoming the reference application for the detection of gene variants and also makes it possible to infer genomic imbalances at single-exon resolution. However, the contribution of small CNVs in DD/ID is still underinvestigated. We used the eXome Hidden Markov Model (XHMM) software, a tool utilized by the ExAC consortium, to detect CNVs from WES data in a cohort of 200 DD/ID patients who remained unsolved after array-CGH and WES-based single nucleotide/indel variant analyses. In five out of 200 patients (2.5%), we identified pathogenic CNV(s) smaller than 30 kb, ranging from one to six exons. They included two heterozygous deletions in TCF4 and STXBP1 and three homozygous deletions in PPT1, CLCN2, and PIGN. After reverse phenotyping, all variants were reported as causative. This study shows the interest in applying sequencing-based CNV detection to available WES data to shorten the diagnostic odyssey of additional unsolved DD/ID patients, and compares the CNV-detection yield of Agilent array-CGH 4 × 180 kb versus WES.
Affiliation(s)
- Emilie Tisserant
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France
- Antonio Vitobello
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
- Davide Callegarin
- Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
- Simon Verdez
- Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
- Ange-Line Bruel
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France
- Arthur Sorlin
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
- Eleonore Viora-Dupont
- Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
- Marina Konyukh
- Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
- Nathalie Marle
- Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
- Sophie Nambot
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Hospital Hygiene and Epidemiology Unit, Dijon University Hospital, Dijon Cedex, France
- Sébastien Moutton
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France; Reference Center for Intellectual Disorders, Dijon University Hospital, Dijon, France
- Caroline Racine
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France; Genetics Department and Reference Center for Developmental Disorders and Malformative Syndromes for East France, FHU TRANSLAD, Dijon University Hospital, Dijon, France
- Aurore Garde
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
- Julian Delanne
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Genetics Department and Reference Center for Developmental Disorders and Malformative Syndromes for East France, FHU TRANSLAD, Dijon University Hospital, Dijon, France
- Frédéric Tran-Mau-Them
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France
- Christophe Philippe
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
- Paul Kuentz
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France
- Marlène Poulleau
- Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
- Muriel Payet
- Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
- Charlotte Poe
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France
- Christel Thauvin-Robinet
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Genetics Department and Reference Center for Developmental Disorders and Malformative Syndromes for East France, FHU TRANSLAD, Dijon University Hospital, Dijon, France; Reference Center for Intellectual Disorders, Dijon University Hospital, Dijon, France
- Laurence Faivre
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Genetics Department and Reference Center for Developmental Disorders and Malformative Syndromes for East France, FHU TRANSLAD, Dijon University Hospital, Dijon, France; Reference Center for Intellectual Disorders, Dijon University Hospital, Dijon, France
- Anne-Laure Mosca-Boidron
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
- Julien Thevenon
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Genetics Department and Reference Center for Developmental Disorders and Malformative Syndromes for East France, FHU TRANSLAD, Dijon University Hospital, Dijon, France
- Yannis Duffourd
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France
- Patrick Callier
- Inserm UMR 1231 GAD, Faculty of Health Sciences, University of Burgundy and Franche-Comté, Dijon, France; Molecular and chromosomal genetics laboratory, Biology Transfer Platform, Dijon University Hospital, Dijon, France
5
Khalil AIS, Chattopadhyay A, Sanyal A. Analysis of Aneuploidy Spectrum From Whole-Genome Sequencing Provides Rapid Assessment of Clonal Variation Within Established Cancer Cell Lines. Cancer Inform 2021; 20:11769351211049236. [PMID: 34671179 PMCID: PMC8521761 DOI: 10.1177/11769351211049236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2021] [Accepted: 09/02/2021] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND The revolution in next-generation sequencing (NGS) technology has allowed easy access and sharing of high-throughput sequencing datasets of cancer cell lines and their integrative analyses. However, long-term passaging and culture conditions introduce high levels of genomic and phenotypic diversity in established cell lines resulting in strain differences. Thus, clonal variation in cultured cell lines with respect to the reference standard is a major barrier in systems biology data analyses. Therefore, there is a pressing need for a fast and entry-level assessment of clonal variations within cell lines using their high-throughput sequencing data. RESULTS We developed a Python-based software, AStra, for de novo estimation of the genome-wide segmental aneuploidy to measure and visually interpret strain-level similarities or differences of cancer cell lines from whole-genome sequencing (WGS). We demonstrated that aneuploidy spectrum can capture the genetic variations in 27 strains of MCF7 breast cancer cell line collected from different laboratories. Performance evaluation of AStra using several cancer sequencing datasets revealed that cancer cell lines exhibit distinct aneuploidy spectra which reflect their previously-reported karyotypic observations. Similarly, AStra successfully identified large-scale DNA copy number variations (CNVs) artificially introduced in simulated WGS datasets. CONCLUSIONS AStra provides an analytical and visualization platform for rapid and easy comparison between different strains or between cell lines based on their aneuploidy spectra solely using the raw BAM files representing mapped reads. We recommend AStra for rapid first-pass quality assessment of cancer cell lines before integrating scientific datasets that employ deep sequencing. AStra is an open-source software and is available at https://github.com/AISKhalil/AStra.
Affiliation(s)
- Anupam Chattopadhyay
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Amartya Sanyal
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
6
Critical evaluation of CNA estimators for DNA data using matching confidence masks and WGS technology. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.103004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
7
Smolander J, Khan S, Singaravelu K, Kauko L, Lund RJ, Laiho A, Elo LL. Evaluation of tools for identifying large copy number variations from ultra-low-coverage whole-genome sequencing data. BMC Genomics 2021; 22:357. [PMID: 34000988 PMCID: PMC8130438 DOI: 10.1186/s12864-021-07686-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/07/2020] [Accepted: 05/07/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Detection of copy number variations (CNVs) from high-throughput next-generation whole-genome sequencing (WGS) data has become a widely used research method in recent years. However, little is known about the applicability of the developed algorithms to ultra-low-coverage (0.0005-0.8×) data that is used in various research and clinical applications, such as digital karyotyping and single-cell CNV detection. RESULTS Here, the performance of six popular read-depth based CNV detection algorithms (BIC-seq2, Canvas, CNVnator, FREEC, HMMcopy, and QDNAseq) was studied using ultra-low-coverage WGS data. Real-world array- and karyotyping kit-based validation were used as a benchmark in the evaluation. Additionally, ultra-low-coverage WGS data was simulated to investigate the ability of the algorithms to identify CNVs in the sex chromosomes and the theoretical minimum coverage at which these tools can accurately function. Our results suggest that while all the methods were able to detect large CNVs, many methods were susceptible to producing false positives when smaller CNVs (< 2 Mbp) were detected. There was also significant variability in their ability to identify CNVs in the sex chromosomes. Overall, BIC-seq2 was found to be the best method in terms of statistical performance. However, its significant drawback was that it had by far the slowest runtime among the methods (> 3 h), compared with FREEC (~ 3 min), which we considered the second-best method. CONCLUSIONS Our comparative analysis demonstrates that CNV detection from ultra-low-coverage WGS data can be a highly accurate method for the detection of large copy number variations when their length is in millions of base pairs. These findings facilitate applications that utilize ultra-low-coverage CNV detection.
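To give a sense of scale for the 0.0005-0.8× range, coverage can be related to read counts with the standard depth formula, coverage = (number of reads × read length) / genome size. The sketch below is only a back-of-the-envelope illustration: the genome size of 3.1 Gbp and the read length of 100 bp are assumed values, not figures taken from the study.

```python
def coverage(n_reads, read_length, genome_size):
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Approximate number of reads required to reach a target mean depth."""
    return round(target_coverage * genome_size / read_length)


GENOME_SIZE = 3_100_000_000  # assumed human genome size in bp (illustrative)
READ_LENGTH = 100            # assumed read length in bp (illustrative)

# Read counts at the two ends of the ultra-low-coverage range.
for target in (0.0005, 0.8):
    n = reads_needed(target, READ_LENGTH, GENOME_SIZE)
    print(f"{target}x -> about {n:,} reads")
```

Under these assumptions the lower end of the range corresponds to only tens of thousands of reads per sample, which is why read-depth methods at this depth rely on large bins and can resolve only large CNVs.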
Affiliation(s)
- Johannes Smolander
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Sofia Khan
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Kalaimathy Singaravelu
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Leni Kauko
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Riikka J Lund
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Asta Laiho
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Laura L Elo
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Institute of Biomedicine, University of Turku, 20520, Turku, Finland
8
Jäger N. Bioinformatics workflows for clinical applications in precision oncology. Semin Cancer Biol 2021; 84:103-112. [PMID: 33476720 DOI: 10.1016/j.semcancer.2020.12.020] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/30/2020] [Revised: 12/15/2020] [Accepted: 12/28/2020] [Indexed: 12/23/2022]
Abstract
High-throughput molecular profiling of tumors is a fundamental aspect of precision oncology, enabling the identification of genomic alterations that can be targeted therapeutically. In this context, a patient is matched to a specific drug or therapy based on the tumor's underlying genetic driver events rather than the histologic classification. This approach requires extensive bioinformatics methodology and workflows, including raw sequencing data processing and quality control, variant calling and annotation, integration of different molecular data types, visualization and finally reporting the data to physicians, cancer researchers and pharmacologists in a format that is readily interpretable for clinical decision making. This review comprises a broad overview of these bioinformatics aspects and discusses the multiple analytical, technical and interpretational challenges that remain to efficiently translate molecular findings into personalized treatment recommendations.
Affiliation(s)
- Natalie Jäger
- Hopp Children's Cancer Center Heidelberg (KiTZ) & Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany.
9
Koboldt DC. Best practices for variant calling in clinical sequencing. Genome Med 2020; 12:91. [PMID: 33106175 PMCID: PMC7586657 DOI: 10.1186/s13073-020-00791-w] [Citation(s) in RCA: 149] [Impact Index Per Article: 37.3] [Received: 10/30/2019] [Accepted: 10/08/2020] [Indexed: 02/08/2023] Open
Abstract
Next-generation sequencing technologies have enabled a dramatic expansion of clinical genetic testing both for inherited conditions and diseases such as cancer. Accurate variant calling in NGS data is a critical step upon which virtually all downstream analysis and interpretation processes rely. Just as NGS technologies have evolved considerably over the past 10 years, so too have the software tools and approaches for detecting sequence variants in clinical samples. In this review, I discuss the current best practices for variant calling in clinical sequencing studies, with a particular emphasis on trio sequencing for inherited disorders and somatic mutation detection in cancer patients. I describe the relative strengths and weaknesses of panel, exome, and whole-genome sequencing for variant detection. Recommended tools and strategies for calling variants of different classes are also provided, along with guidance on variant review, validation, and benchmarking to ensure optimal performance. Although NGS technologies are continually evolving, and new capabilities (such as long-read single-molecule sequencing) are emerging, the “best practice” principles in this review should be relevant to clinical variant calling in the long term.
Affiliation(s)
- Daniel C Koboldt
- Steve and Cindy Rasmussen Institute for Genomic Medicine at Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University, Columbus, OH, USA
10
Khalil AIS, Khyriem C, Chattopadhyay A, Sanyal A. Hierarchical discovery of large-scale and focal copy number alterations in low-coverage cancer genomes. BMC Bioinformatics 2020; 21:147. [PMID: 32299346 PMCID: PMC7160937 DOI: 10.1186/s12859-020-3480-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/05/2019] [Accepted: 04/01/2020] [Indexed: 12/15/2022] Open
Abstract
Background Detection of DNA copy number alterations (CNAs) is critical to understand genetic diversity, genome evolution and pathological conditions such as cancer. Cancer genomes are plagued with widespread multi-level structural aberrations of chromosomes that pose challenges to discover CNAs of different length scales, and distinct biological origins and functions. Although several computational tools are available to identify CNAs using read depth (RD) signal, they fail to distinguish between large-scale and focal alterations due to inaccurate modeling of the RD signal of cancer genomes. Additionally, RD signal is affected by overdispersion-driven biases at low coverage, which significantly inflate false detection of CNA regions. Results We have developed CNAtra framework to hierarchically discover and classify ‘large-scale’ and ‘focal’ copy number gain/loss from a single whole-genome sequencing (WGS) sample. CNAtra first utilizes a multimodal-based distribution to estimate the copy number (CN) reference from the complex RD profile of the cancer genome. We implemented Savitzky-Golay smoothing filter and Modified Varri segmentation to capture the change points of the RD signal. We then developed a CN state-driven merging algorithm to identify the large segments with distinct copy numbers. Next, we identified focal alterations in each large segment using coverage-based thresholding to mitigate the adverse effects of signal variations. Using cancer cell lines and patient datasets, we confirmed CNAtra’s ability to detect and distinguish the segmental aneuploidies and focal alterations. We used realistic simulated data for benchmarking the performance of CNAtra against other single-sample detection tools, where we artificially introduced CNAs in the original cancer profiles. We found that CNAtra is superior in terms of precision, recall and f-measure. CNAtra shows the highest sensitivity of 93 and 97% for detecting large-scale and focal alterations respectively. 
Visual inspection of CNAs revealed that CNAtra is the most robust detection tool for low-coverage cancer data. Conclusions CNAtra is a single-sample CNA detection tool that provides an analytical and visualization framework for CNA profiling without relying on any reference control. It can detect chromosome-level segmental aneuploidies and high-confidence focal alterations, even from low-coverage data. CNAtra is an open-source software implemented in MATLAB®. It is freely available at https://github.com/AISKhalil/CNAtra.
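The CN state-driven merging step described in this abstract can be illustrated with a minimal sketch. It is not CNAtra's implementation (CNAtra is written in MATLAB, and its merging rules are not given here); the `Segment` type, the rounding rule, and the length-weighted mean are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int      # first bin index (inclusive)
    end: int        # last bin index (inclusive)
    mean_cn: float  # mean copy-number estimate over the segment


def merge_by_cn_state(segments):
    """Merge adjacent segments whose rounded copy-number state is equal.

    Hypothetical rule: the integer CN state is taken as round(mean_cn).
    """
    merged = []
    for seg in segments:
        if merged and round(merged[-1].mean_cn) == round(seg.mean_cn):
            prev = merged[-1]
            n_prev = prev.end - prev.start + 1
            n_cur = seg.end - seg.start + 1
            # Length-weighted mean of the two segments being joined.
            mean = (prev.mean_cn * n_prev + seg.mean_cn * n_cur) / (n_prev + n_cur)
            merged[-1] = Segment(prev.start, seg.end, mean)
        else:
            merged.append(Segment(seg.start, seg.end, seg.mean_cn))
    return merged


example = [
    Segment(0, 9, 2.1),
    Segment(10, 19, 1.9),
    Segment(20, 29, 3.2),
    Segment(30, 39, 2.9),
]
large_segments = merge_by_cn_state(example)
```

In this toy input the four change-point segments collapse into two large segments, one near CN 2 and one near CN 3; focal alterations would then be searched for inside each large segment.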
Affiliation(s)
- Ahmed Ibrahim Samir Khalil
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore
| | - Costerwell Khyriem
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
| | - Anupam Chattopadhyay
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore.
| | - Amartya Sanyal
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore.
|
11
|
Xing Y, Dabney AR, Li X, Wang G, Gill CA, Casola C. SECNVs: A Simulator of Copy Number Variants and Whole-Exome Sequences From Reference Genomes. Front Genet 2020; 11:82. [PMID: 32153642 PMCID: PMC7046838 DOI: 10.3389/fgene.2020.00082] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/05/2019] [Accepted: 01/24/2020] [Indexed: 01/26/2023] Open
Abstract
Copy number variants are duplications and deletions of the genome that play an important role in phenotypic changes and human disease. Many software applications have been developed to detect copy number variants using either whole-genome sequencing or whole-exome sequencing data. However, there is poor agreement in the results from these applications. Simulated datasets containing copy number variants allow comprehensive comparisons of the operating characteristics of existing and novel copy number variant detection methods. Several software applications have been developed to simulate copy number variants and other structural variants in whole-genome sequencing data. However, none of the applications reliably simulate copy number variants in whole-exome sequencing data. We have developed and tested Simulator of Exome Copy Number Variants (SECNVs), a fast, robust and customizable software application for simulating copy number variants and whole-exome sequences from a reference genome. SECNVs is easy to install, implements a wide range of commands to customize simulations, can output multiple samples at once, and incorporates a pipeline to output rearranged genomes, short reads and BAM files in a single command. Variants generated by SECNVs are detected with high sensitivity and precision by tools commonly used to detect copy number variants. SECNVs is publicly available at https://github.com/YJulyXing/SECNVs.
Affiliation(s)
- Yue Xing
- Interdisciplinary Program in Genetics, Texas A&M University, College Station, TX, United States
- Department of Statistics, Texas A&M University, College Station, TX, United States
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, United States
| | - Alan R. Dabney
- Department of Statistics, Texas A&M University, College Station, TX, United States
| | - Xiao Li
- Department of Molecular and Cellular Medicine, Texas A&M University, College Station, TX, United States
| | - Guosong Wang
- Department of Animal Science, Texas A&M University, College Station, TX, United States
| | - Clare A. Gill
- Department of Animal Science, Texas A&M University, College Station, TX, United States
| | - Claudio Casola
- Department of Ecosystem Science and Management, Texas A&M University, College Station, TX, United States
|
12
|
Luo F. A systematic evaluation of copy number alterations detection methods on real SNP array and deep sequencing data. BMC Bioinformatics 2019; 20:692. [PMID: 31874603 PMCID: PMC6929333 DOI: 10.1186/s12859-019-3266-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Copy number alterations (CNAs) are tightly associated with cancers, so detecting them accurately is one of the most important tasks in cancer genomics. A series of CNA detection methods have been proposed, and new ones are still being developed. Because of the complexity of CNAs in cancers, no CNA detection method has been accepted as the gold-standard caller. Several evaluation studies have attempted to characterize the performance of typical CNA detection methods. Limited by the scale of their evaluation data, these comparisons have not reached a consensus, and researchers remain unsure how to choose a suitable CNA caller for their analysis. A more comprehensive evaluation of typical CNA detection methods is therefore needed. RESULTS In this work, we use a large-scale real dataset from the CAGEKID consortium to evaluate a total of 12 typical CNA detection methods. These methods are among the most widely used in cancer research and are commonly used as benchmarks for newly proposed CNA detection methods. The dataset comprises SNP array data on 94 samples and whole-genome sequencing data on 10 samples. The evaluation covers the current scenarios of CNA detection: on SNP array data, on sequencing data with matched tumor and normal samples, and on sequencing data with a single tumor sample. First, three SNP-based methods are ranked. The results of the best SNP-based method are then used as the benchmark to compare six matched-sample methods and three single-tumor-sample methods in terms of preprocessing, recall rate, Jaccard index and segmentation characteristics. CONCLUSIONS Our survey reveals the strengths and weaknesses of the 12 typical methods. We explain, from a methodological standpoint, why the methods show their specific characteristics. Finally, we present guiding principles for choosing a suitable CNA detection method under specific conditions. Some unsolved problems and expectations for upcoming CNA detection methods are also addressed.
Affiliation(s)
- Fei Luo
- School of Computer Science, Wuhan University, Wuhan, China.
|
13
|
Bartha Á, Győrffy B. Comprehensive Outline of Whole Exome Sequencing Data Analysis Tools Available in Clinical Oncology. Cancers (Basel) 2019; 11:E1725. [PMID: 31690036 PMCID: PMC6895801 DOI: 10.3390/cancers11111725] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 09/30/2019] [Revised: 10/31/2019] [Accepted: 11/01/2019] [Indexed: 12/17/2022] Open
Abstract
Whole exome sequencing (WES) enables the analysis of all protein coding sequences in the human genome. This technology enables the investigation of cancer-related genetic aberrations that are predominantly located in the exonic regions. WES delivers high-throughput results at a reasonable price. Here, we review analysis tools enabling utilization of WES data in clinical and research settings. Technically, WES initially allows the detection of single nucleotide variants (SNVs) and copy number variations (CNVs), and data obtained through these methods can be combined and further utilized. Variant calling algorithms for SNVs range from standalone tools to machine learning-based combined pipelines. Tools for CNV detection compare the number of reads aligned to a dedicated segment. Both SNVs and CNVs help to identify mutations resulting in pharmacologically druggable alterations. The identification of homologous recombination deficiency enables the use of PARP inhibitors. Determining microsatellite instability and tumor mutation burden helps to select patients eligible for immunotherapy. To pave the way for clinical applications, we have to recognize some limitations of WES, including its restricted ability to detect CNVs, low coverage compared to targeted sequencing, and the missing consensus regarding references and minimal application requirements. Recently, Galaxy became the leading platform in non-command line-based WES data processing. The maturation of next-generation sequencing is reinforced by Food and Drug Administration (FDA)-approved methods for cancer screening, detection, and follow-up. WES is on the verge of becoming an affordable and sufficiently evolved technology for everyday clinical use.
Affiliation(s)
- Áron Bartha
- Semmelweis University, Department of Bioinformatics and 2nd Department of Pediatrics, H-1094 Budapest, Hungary.
- TTK Cancer Biomarker Research Group, Institute of Enzymology, Magyar tudósok körútja 2., H-1117 Budapest, Hungary.
| | - Balázs Győrffy
- Semmelweis University, Department of Bioinformatics and 2nd Department of Pediatrics, H-1094 Budapest, Hungary.
- TTK Cancer Biomarker Research Group, Institute of Enzymology, Magyar tudósok körútja 2., H-1117 Budapest, Hungary.
|
14
|
Ried T, Meijer GA, Harrison DJ, Grech G, Franch-Expósito S, Briffa R, Carvalho B, Camps J. The landscape of genomic copy number alterations in colorectal cancer and their consequences on gene expression levels and disease outcome. Mol Aspects Med 2019; 69:48-61. [PMID: 31365882 DOI: 10.1016/j.mam.2019.07.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 03/19/2019] [Revised: 07/23/2019] [Accepted: 07/26/2019] [Indexed: 12/18/2022]
Abstract
Aneuploidy, the unbalanced state of the chromosome content, represents a hallmark of most solid tumors, including colorectal cancer. Such aneuploidies result in tumor specific genomic imbalances, which emerge in premalignant precursor lesions. Moreover, increasing levels of chromosomal instability have been observed in adenocarcinomas and are maintained in distant metastases. A number of studies have systematically integrated copy number alterations with gene expression changes in primary carcinomas, cell lines, and experimental models of aneuploidy. In fact, chromosomal aneuploidies target a number of genes conferring a selective advantage for the metabolism of the cancer cell. Copy number alterations not only have a positive correlation with expression changes of the majority of genes on the altered genomic segment, but also have effects on the transcriptional levels of genes genome-wide. Finally, copy number alterations have been associated with disease outcome; nevertheless, the translational applicability in clinical practice requires further studies. Here, we (i) review the spectrum of genetic alterations that lead to colorectal cancer, (ii) describe the most frequent copy number alterations at different stages of colorectal carcinogenesis, (iii) exemplify their positive correlation with gene expression levels, and (iv) discuss copy number alterations that are potentially involved in disease outcome of individual patients.
Affiliation(s)
- Thomas Ried
- Genetics Branch, Center for Cancer Research, National Cancer Institute/National Institutes of Health, Bethesda, MD, USA.
| | - Gerrit A Meijer
- Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - David J Harrison
- School of Medicine, University of St Andrews, St Andrews, Scotland, UK
| | - Godfrey Grech
- Laboratory of Molecular Pathology, Department of Pathology, Faculty of Medicine and Surgery, University of Malta, Msida, Malta
| | - Sebastià Franch-Expósito
- Gastrointestinal and Pancreatic Oncology Group, Institut D'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), CIBEREHD, Barcelona, Spain
| | - Romina Briffa
- School of Medicine, University of St Andrews, St Andrews, Scotland, UK; Laboratory of Molecular Pathology, Department of Pathology, Faculty of Medicine and Surgery, University of Malta, Msida, Malta
| | - Beatriz Carvalho
- Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Jordi Camps
- Gastrointestinal and Pancreatic Oncology Group, Institut D'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), CIBEREHD, Barcelona, Spain; Unitat de Biologia Cel·lular i Genètica Mèdica, Departament de Biologia Cel·lular, Fisiologia i Immunologia, Facultat de Medicina, Universitat Autònoma de Barcelona, Bellaterra, Spain.
|
15
|
Singer J, Irmisch A, Ruscheweyh HJ, Singer F, Toussaint NC, Levesque MP, Stekhoven DJ, Beerenwinkel N. Bioinformatics for precision oncology. Brief Bioinform 2019; 20:778-788. [PMID: 29272324 PMCID: PMC6585151 DOI: 10.1093/bib/bbx143] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 06/06/2017] [Revised: 09/29/2017] [Indexed: 12/13/2022] Open
Abstract
Molecular profiling of tumor biopsies plays an increasingly important role not only in cancer research, but also in the clinical management of cancer patients. Multi-omics approaches hold the promise of improving diagnostics, prognostics and personalized treatment. To deliver on this promise of precision oncology, appropriate bioinformatics methods for managing, integrating and analyzing large and complex data are necessary. Here, we discuss the specific requirements of bioinformatics methods and software that arise in the setting of clinical oncology, owing to a stricter regulatory environment and the need for rapid, highly reproducible and robust procedures. We describe the workflow of a molecular tumor board and the specific bioinformatics support that it requires, from the primary analysis of raw molecular profiling data to the automatic generation of a clinical report and its delivery to decision-making clinical oncologists. Such workflows have to various degrees been implemented in many clinical trials, as well as in molecular tumor boards at specialized cancer centers and university hospitals worldwide. We review these and more recent efforts to include other high-dimensional multi-omics patient profiles into the tumor board, as well as the state of clinical decision support software to translate molecular findings into treatment recommendations.
Affiliation(s)
- Jochen Singer
- Department of Biosystems Science and Engineering of ETH Zurich in Basel, Switzerland
| | - Anja Irmisch
- Department of Dermatology at the University of Zurich Hospital in Zurich, Switzerland
| | | | | | | | | | | | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering of ETH Zurich in Basel, Switzerland
|
16
|
Zhang M, Liu D, Tang J, Feng Y, Wang T, Dobbin KK, Schliekelman P, Zhao S. SEG - A Software Program for Finding Somatic Copy Number Alterations in Whole Genome Sequencing Data of Cancer. Comput Struct Biotechnol J 2018; 16:335-341. [PMID: 30258547 PMCID: PMC6154469 DOI: 10.1016/j.csbj.2018.09.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/21/2018] [Revised: 08/31/2018] [Accepted: 09/01/2018] [Indexed: 01/15/2023] Open
Abstract
As next-generation sequencing technology advances and the cost decreases, whole genome sequencing (WGS) has become the preferred platform for the identification of somatic copy number alteration (CNA) events in cancer genomes. To more effectively decipher these massive sequencing data, we developed a software program named SEG, shortened from the word “segment”. SEG utilizes mapped read or fragment density for CNA discovery. To reduce CNA artifacts arising from sequencing and mapping biases, SEG first normalizes the data by taking the log2-ratio of each tumor density against its matching normal density. SEG then uses dynamic programming to find change-points in a contiguous log2-ratio data series along a chromosome, dividing the chromosome into segments. SEG finally identifies those segments that carry a CNA. Our analyses with both simulated and real sequencing data indicate that SEG finds more small CNAs than other published software tools.
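The two steps named in this abstract, log2-ratio normalization followed by dynamic-programming change-point detection, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give SEG's objective function, so the least-squares segment cost, the per-segment `penalty`, and the `pseudo` count are hypothetical choices, and the function names are invented for the example.

```python
import math


def log2_ratios(tumor, normal, pseudo=0.5):
    """Per-bin log2(tumor / normal); `pseudo` (an assumption) avoids division by zero."""
    return [math.log2((t + pseudo) / (n + pseudo)) for t, n in zip(tumor, normal)]


def find_change_points(values, penalty):
    """Optimal partitioning by dynamic programming (O(n^2)).

    Minimizes the within-segment sum of squared deviations plus `penalty`
    for every segment. Returns the indices where a new segment starts.
    """
    n = len(values)
    # Prefix sums give the cost of any segment in constant time.
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):  # squared error of values[i:j] around its mean
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [math.inf] * n  # best[j]: optimal cost of values[:j]
    prev = [0] * (n + 1)           # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j], prev[j] = c, i

    # Walk back through the stored segment starts.
    points, j = [], n
    while j > 0:
        j = prev[j]
        if j > 0:
            points.append(j)
    return sorted(points)


tumor = [100] * 5 + [200] * 5   # toy densities with a gain in the second half
normal = [100] * 10
change_points = find_change_points(log2_ratios(tumor, normal), penalty=0.5)
```

A larger `penalty` yields fewer, longer segments; a smaller one makes the segmentation more sensitive to short events, at the risk of more false change-points.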
Affiliation(s)
- Mucheng Zhang
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA30602-7229, USA
| | - Deli Liu
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA30602-7229, USA
| | - Jie Tang
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA30602-7229, USA
| | - Yuan Feng
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA30602-7229, USA
| | - Tianfang Wang
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA30602-7229, USA
| | - Kevin K Dobbin
- Department of Biostatistics, University of Georgia, Athens, GA30602-7229, USA
| | - Paul Schliekelman
- Department of Statistics, University of Georgia, Athens, GA30602-7229, USA
| | - Shaying Zhao
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA30602-7229, USA
|
17
|
Rieber N, Bohnert R, Ziehm U, Jansen G. Reliability of algorithmic somatic copy number alteration detection from targeted capture data. Bioinformatics 2018; 33:2791-2798. [PMID: 28472276 PMCID: PMC5870863 DOI: 10.1093/bioinformatics/btx284] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 12/20/2016] [Accepted: 05/03/2017] [Indexed: 01/11/2023] Open
Abstract
Motivation Whole exome and gene panel sequencing are increasingly used for oncological diagnostics. To investigate the accuracy of SCNA detection algorithms on simulated and clinical tumor samples, the precision and sensitivity of four SCNA callers were measured using 50 simulated whole exome and 50 simulated targeted gene panel datasets, and using 119 TCGA tumor samples for which SNP array data were available. Results On synthetic exome and panel data, VarScan2 mostly called false positives, whereas Control-FREEC was precise (>90% correct calls) at the cost of low sensitivity (<40% detected). ONCOCNV was slightly less precise on gene panel data, with similarly low sensitivity. This could be explained by low sensitivity for amplifications and high precision for deletions. Surprisingly, these results were not strongly affected by moderate tumor impurities; only contaminations with more than 60% non-cancerous cells resulted in strongly declining precision and sensitivity. On the 119 clinical samples, both Control-FREEC and CNVkit called 71.8% and 94%, respectively, of the SCNAs found by the SNP arrays, but with a considerable amount of false positives (precision 29% and 4.9%). Discussion Whole exome and targeted gene panel methods by design limit the precision of SCNA callers, making them prone to false positives. SCNA calls cannot easily be integrated in clinical pipelines that use data from targeted capture-based sequencing. If used at all, they need to be cross-validated using orthogonal methods. Availability and implementation Scripts are provided as supplementary information. Supplementary information Supplementary data are available at Bioinformatics online.
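The precision and sensitivity figures quoted in this abstract follow the standard definitions, which the short sketch below spells out. The counts used in the example are hypothetical and are not taken from the study; they were chosen only to show how a caller can recover most true SCNAs and still have very low precision.

```python
def precision(true_pos, false_pos):
    """Fraction of reported calls that are correct."""
    called = true_pos + false_pos
    return true_pos / called if called else 0.0


def sensitivity(true_pos, false_neg):
    """Fraction of true events that were detected (recall)."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0


# Hypothetical counts: 100 true SCNAs, 94 of them detected,
# plus 1824 calls that match no true event.
tp, fn, fp = 94, 6, 1824
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"precision   = {precision(tp, fp):.3f}")
```

With these counts the caller finds 94% of the true events, yet fewer than 5% of its calls are correct, which is the pattern that motivates cross-validation with an orthogonal method.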
Affiliation(s)
- Nora Rieber
- Molecular Health GmbH, Kurfürsten-Anlage 21, 69115 Heidelberg, Germany
| | - Regina Bohnert
- Molecular Health GmbH, Kurfürsten-Anlage 21, 69115 Heidelberg, Germany
| | - Ulrike Ziehm
- Molecular Health GmbH, Kurfürsten-Anlage 21, 69115 Heidelberg, Germany
| | - Gunther Jansen
- Molecular Health GmbH, Kurfürsten-Anlage 21, 69115 Heidelberg, Germany
|
18
|
Luo Z, Fan X, Su Y, Huang YS. Accurity: accurate tumor purity and ploidy inference from tumor-normal WGS data by jointly modelling somatic copy number alterations and heterozygous germline single-nucleotide-variants. Bioinformatics 2018; 34:2004-2011. [PMID: 29385401 PMCID: PMC9881684 DOI: 10.1093/bioinformatics/bty043] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 09/05/2016] [Accepted: 01/26/2018] [Indexed: 02/02/2023] Open
Abstract
Motivation Tumor purity and ploidy have a substantial impact on next-gen sequence analyses of tumor samples and may alter the biological and clinical interpretation of results. Despite the existence of several computational methods that are dedicated to estimate tumor purity and/or ploidy from The Cancer Genome Atlas (TCGA) tumor-normal whole-genome-sequencing (WGS) data, an accurate, fast and fully-automated method that works in a wide range of sequencing coverage, level of tumor purity and level of intra-tumor heterogeneity, is still missing. Results We describe a computational method called Accurity that infers tumor purity, tumor cell ploidy and absolute allelic copy numbers for somatic copy number alterations (SCNAs) from tumor-normal WGS data by jointly modelling SCNAs and heterozygous germline single-nucleotide-variants (HGSNVs). Results from both in silico and real sequencing data demonstrated that Accurity is highly accurate and robust, even in low-purity, high-ploidy and low-coverage settings in which several existing methods perform poorly. Accounting for tumor purity and ploidy, Accurity significantly increased signal/noise gaps between different copy numbers. We are hopeful that Accurity is of clinical use for identifying cancer diagnostic biomarkers. Availability and implementation Accurity is implemented in C++/Rust, available at http://www.yfish.org/software/. Supplementary information Supplementary data are available at Bioinformatics online.
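Why purity and ploidy affect the separation between copy-number states can be seen from the textbook mixture model of read depth, sketched below. This is not Accurity's model, which is not specified in the abstract; the function names are hypothetical, and the sketch assumes that the contaminating normal cells are diploid.

```python
def expected_depth_ratio(copy_number, purity, ploidy):
    """Expected tumor/normal read-depth ratio of a segment.

    Assumes a mixture of tumor cells (fraction `purity`, average
    copy number `ploidy`) and diploid normal cells.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    segment_signal = purity * copy_number + (1.0 - purity) * 2.0
    genome_average = purity * ploidy + (1.0 - purity) * 2.0
    return segment_signal / genome_average


def gap_between_adjacent_states(purity, ploidy):
    """Difference in expected ratio between copy numbers c and c + 1."""
    return purity / (purity * ploidy + (1.0 - purity) * 2.0)


# A pure diploid tumor separates adjacent states by 0.5;
# at 50% purity the same states are only 0.25 apart.
gap_pure = gap_between_adjacent_states(1.0, 2.0)
gap_mixed = gap_between_adjacent_states(0.5, 2.0)
```

Under this model, lower purity or higher ploidy shrinks the gap between adjacent copy-number states, which is why the low-purity and high-ploidy settings mentioned in the abstract are the difficult ones.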
Affiliation(s)
| | | | - Yao Su
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Yu S Huang
- To whom correspondence should be addressed.
|
19
|
Kotelnikova EA, Pyatnitskiy M, Paleeva A, Kremenetskaya O, Vinogradov D. Practical aspects of NGS-based pathways analysis for personalized cancer science and medicine. Oncotarget 2018; 7:52493-52516. [PMID: 27191992 PMCID: PMC5239569 DOI: 10.18632/oncotarget.9370] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 10/02/2015] [Accepted: 04/18/2016] [Indexed: 12/17/2022] Open
Abstract
Nowadays, the personalized approach to health care and cancer care in particular is becoming more and more popular and is taking an important place in the translational medicine paradigm. In some cases, detection of the patient-specific individual mutations that point to a targeted therapy has already become a routine practice for clinical oncologists. Wider panels of genetic markers are also on the market which cover a greater number of possible oncogenes including those with lower reliability of resulting medical conclusions. In light of the large availability of high-throughput technologies, it is very tempting to use complete patient-specific New Generation Sequencing (NGS) or other "omics" data for cancer treatment guidance. However, there are still no gold standard methods and protocols to evaluate them. Here we will discuss the clinical utility of each of the data types and describe a systems biology approach adapted for single patient measurements. We will try to summarize the current state of the field focusing on the clinically relevant case-studies and practical aspects of data processing.
Affiliation(s)
- Ekaterina A Kotelnikova
- Personal Biomedicine, Moscow, Russia; A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia; Institute Biomedical Research August Pi Sunyer (IDIBAPS), Hospital Clinic of Barcelona, Barcelona, Spain
| | - Mikhail Pyatnitskiy
- Personal Biomedicine, Moscow, Russia; Orekhovich Institute of Biomedical Chemistry, Moscow, Russia; Pirogov Russian National Research Medical University, Moscow, Russia
| | | | - Olga Kremenetskaya
- Personal Biomedicine, Moscow, Russia; Center for Theoretical Problems of Physicochemical Pharmacology, Russian Academy of Sciences, Moscow, Russia
| | - Dmitriy Vinogradov
- Personal Biomedicine, Moscow, Russia; A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia; Lomonosov Moscow State University, Moscow, Russia
|
20
|
Added Value of Whole-Exome and Transcriptome Sequencing for Clinical Molecular Screenings of Advanced Cancer Patients With Solid Tumors. Cancer J 2018; 24:153-162. [DOI: 10.1097/ppo.0000000000000322] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 12/12/2022]
|
21
|
Gao J, Wan C, Zhang H, Li A, Zang Q, Ban R, Ali A, Yu Z, Shi Q, Jiang X, Zhang Y. Anaconda: AN automated pipeline for somatic COpy Number variation Detection and Annotation from tumor exome sequencing data. BMC Bioinformatics 2017; 18:436. [PMID: 28974218 PMCID: PMC5627484 DOI: 10.1186/s12859-017-1833-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/07/2017] [Accepted: 09/11/2017] [Indexed: 12/05/2022] Open
Abstract
Background Copy number variations (CNVs) are the main genetic structural variations in cancer genomes. Detecting CNVs in exonic regions is an efficient and cost-effective way to identify cancer-associated genes. Many tools have been developed for this purpose, yet they lack reliability because of a high false negative rate, which is intrinsically caused by exonic bias. Results To provide an alternative, we report Anaconda, a comprehensive pipeline that allows flexible integration of multiple CNV-calling methods and systematic annotation of CNVs when analyzing WES data. With a single command, Anaconda can generate CNV detection results from up to four CNV detection tools. Combined with comprehensive annotation of the genes in shared CNV regions, Anaconda delivers a more reliable and useful report to support CNV-associated cancer research. Conclusion The Anaconda package and manual can be freely accessed at http://mcg.ustc.edu.cn/bsc/ANACONDA/. Electronic supplementary material The online version of this article (10.1186/s12859-017-1833-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jianing Gao
- Molecular and Cell Genetics Laboratory, The CAS Key Laboratory of Innate Immunity and Chronic Diseases, Hefei National Laboratory for Physical Sciences at Microscale, School of Life Sciences, CAS Center for Excellence in Molecular Cell Science, University of Science and Technology of China, Hefei, Anhui, 230027, China
| | - Changlin Wan
- Molecular and Cell Genetics Laboratory, The CAS Key Laboratory of Innate Immunity and Chronic Diseases, Hefei National Laboratory for Physical Sciences at Microscale, School of Life Sciences, CAS Center for Excellence in Molecular Cell Science, University of Science and Technology of China, Hefei, Anhui, 230027, China
| | - Huan Zhang
- Molecular and Cell Genetics Laboratory, The CAS Key Laboratory of Innate Immunity and Chronic Diseases, Hefei National Laboratory for Physical Sciences at Microscale, School of Life Sciences, CAS Center for Excellence in Molecular Cell Science, University of Science and Technology of China, Hefei, Anhui, 230027, China; Reproductive Medicine Center of Jinghua Hospital, USTC-Shenyang Jinghua Hospital Joint Center of Human Reproduction and Genetics, Shenyang, Liaoning, 110005, China
| | - Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei, 230027, China
| | - Qiguang Zang
- School of Information Science and Technology, University of Science and Technology of China, Hefei, 230027, China
| | - Rongjun Ban
- Molecular and Cell Genetics Laboratory, The CAS Key Laboratory of Innate Immunity and Chronic Diseases, Hefei National Laboratory for Physical Sciences at Microscale, School of Life Sciences, CAS Center for Excellence in Molecular Cell Science, University of Science and Technology of China, Hefei, Anhui, 230027, China
| | - Asim Ali
- Molecular and Cell Genetics Laboratory, The CAS Key Laboratory of Innate Immunity and Chronic Diseases, Hefei National Laboratory for Physical Sciences at Microscale, School of Life Sciences, CAS Center for Excellence in Molecular Cell Science, University of Science and Technology of China, Hefei, Anhui, 230027, China
| | - Zhenghua Yu
- School of Information Science and Technology, University of Science and Technology of China, Hefei, 230027, China
| | - Qinghua Shi
- Molecular and Cell Genetics Laboratory, The CAS Key Laboratory of Innate Immunity and Chronic Diseases, Hefei National Laboratory for Physical Sciences at Microscale, School of Life Sciences, CAS Center for Excellence in Molecular Cell Science, University of Science and Technology of China, Hefei, Anhui, 230027, China
| | - Xiaohua Jiang
- Molecular and Cell Genetics Laboratory, The CAS Key Laboratory of Innate Immunity and Chronic Diseases, Hefei National Laboratory for Physical Sciences at Microscale, School of Life Sciences, CAS Center for Excellence in Molecular Cell Science, University of Science and Technology of China, Hefei, Anhui, 230027, China; Reproductive Medicine Center of Jinghua Hospital, USTC-Shenyang Jinghua Hospital Joint Center of Human Reproduction and Genetics, Shenyang, Liaoning, 110005, China
| | - Yuanwei Zhang
- Molecular and Cell Genetics Laboratory, The CAS Key Laboratory of Innate Immunity and Chronic Diseases, Hefei National Laboratory for Physical Sciences at Microscale, School of Life Sciences, CAS Center for Excellence in Molecular Cell Science, University of Science and Technology of China, Hefei, Anhui, 230027, China; Reproductive Medicine Center of Jinghua Hospital, USTC-Shenyang Jinghua Hospital Joint Center of Human Reproduction and Genetics, Shenyang, Liaoning, 110005, China
|
22
|
Afyounian E, Annala M, Nykter M. Segmentum: a tool for copy number analysis of cancer genomes. BMC Bioinformatics 2017; 18:215. [PMID: 28407731 PMCID: PMC5390478 DOI: 10.1186/s12859-017-1626-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/27/2016] [Accepted: 04/06/2017] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Somatic alterations, including loss of heterozygosity, can affect the expression of oncogenes and tumor suppressor genes. Whole genome sequencing enables detailed characterization of such aberrations. However, due to the limitations of current high throughput sequencing technologies, this task remains challenging. Hence, accurate and reliable detection of such events is crucial for the identification of cancer-related alterations. RESULTS We introduce a new tool called Segmentum for determining somatic copy numbers using whole genome sequencing from paired tumor/normal samples. In our approach, read depth and B-allele fraction signals are smoothed, and double sliding windows are used to detect breakpoints, which makes our approach fast and straightforward. Because the breakpoint detection is performed simultaneously at different scales, it allows accurate detection as suggested by the evaluation results from simulated and real data. We applied Segmentum to paired tumor/normal whole genome sequencing samples from 38 patients with low-grade glioma from the TCGA dataset and were able to confirm the recurrence of copy-neutral loss of heterozygosity in chromosome 17p in low-grade astrocytoma characterized by IDH1/2 mutation and lack of 1p/19q co-deletion, which was previously reported using SNP array data. CONCLUSIONS Segmentum is an accurate, user-friendly tool for somatic copy number analysis of tumor samples. We demonstrate that this tool is suitable for the analysis of large cohorts, such as the TCGA dataset.
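The double-sliding-window idea in the abstract above can be illustrated with a minimal sketch. This is not Segmentum's implementation: the function name `double_window_breakpoints`, the `window` and `threshold` parameters, and the rule of keeping the strongest position in each run of candidates are all assumptions made for illustration. The sketch uses a single window size and a single read-depth-like signal, and it omits the smoothing, B-allele fraction signal, and multi-scale detection that the abstract mentions.

```python
import numpy as np

def double_window_breakpoints(signal, window=5, threshold=0.5):
    """Hypothetical helper: flag positions where the mean of the window to the
    right differs from the mean of the window to the left by more than
    `threshold`, keeping one position per run of adjacent candidates."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    # Prefix sums let each window mean be computed in constant time.
    csum = np.concatenate(([0.0], np.cumsum(x)))
    diffs = np.zeros(n)
    for i in range(window, n - window + 1):
        left = (csum[i] - csum[i - window]) / window    # mean of x[i-window:i]
        right = (csum[i + window] - csum[i]) / window   # mean of x[i:i+window]
        diffs[i] = abs(right - left)

    breakpoints = []
    i = window
    while i <= n - window:
        if diffs[i] > threshold:
            # Extend over the run of consecutive candidate positions.
            j = i
            while j + 1 <= n - window and diffs[j + 1] > threshold:
                j += 1
            # Report the position with the largest left/right difference.
            breakpoints.append(i + int(np.argmax(diffs[i:j + 1])))
            i = j + 1
        else:
            i += 1
    return breakpoints

# A step from 0 to 1 at index 20 yields a single breakpoint at that index.
print(double_window_breakpoints([0] * 20 + [1] * 20, window=5, threshold=0.5))
```

Repeating the scan with several `window` values and merging the results would be one way to approximate detection at different scales, but how Segmentum combines scales is not described in the abstract.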
Affiliation(s)
- Ebrahim Afyounian
- Faculty of Medicine and Life Sciences and BioMediTech institute, University of Tampere, Tampere, Finland
| | - Matti Annala
- Faculty of Medicine and Life Sciences and BioMediTech institute, University of Tampere, Tampere, Finland
| | - Matti Nykter
- Faculty of Medicine and Life Sciences and BioMediTech institute, University of Tampere, Tampere, Finland.
|
23
|
Silva GO, Siegel MB, Mose LE, Parker JS, Sun W, Perou CM, Chen M. SynthEx: a synthetic-normal-based DNA sequencing tool for copy number alteration detection and tumor heterogeneity profiling. Genome Biol 2017; 18:66. [PMID: 28390427 PMCID: PMC5385048 DOI: 10.1186/s13059-017-1193-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 11/15/2016] [Accepted: 03/16/2017] [Indexed: 01/22/2023] Open
Abstract
Changes in the quantity of genetic material, known as somatic copy number alterations (CNAs), can drive tumorigenesis. Many methods exist for assessing CNAs using microarrays, but considerable technical issues limit current CNA calling based upon DNA sequencing. We present SynthEx, a novel tool for detecting CNAs from whole exome and genome sequencing. SynthEx utilizes a “synthetic-normal” strategy to overcome technical and financial issues. In terms of accuracy and precision, SynthEx is highly comparable to array-based methods and outperforms sequencing-based CNA detection tools. SynthEx robustly identifies CNAs using sequencing data without the additional costs associated with matched normal specimens.
Affiliation(s)
- Grace O Silva
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC, 27599, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Marni B Siegel
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Lisle E Mose
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Joel S Parker
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Wei Sun
- Public Health Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
| | - Charles M Perou
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC, 27599, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Mengjie Chen
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, 900 East 57th Street, KCBD 3220A, Chicago, IL, 60637, USA.
|
24
|
Mason-Suares H, Landry L, Lebo MS. Detecting Copy Number Variation via Next Generation Technology. Current Genetic Medicine Reports 2016. [DOI: 10.1007/s40142-016-0091-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
|
25
|
Enhanced whole exome sequencing by higher DNA insert lengths. BMC Genomics 2016; 17:399. [PMID: 27225215 PMCID: PMC4880973 DOI: 10.1186/s12864-016-2698-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/22/2015] [Accepted: 05/06/2016] [Indexed: 01/28/2023] Open
Abstract
Background Whole exome sequencing (WES) has been proven to serve as a valuable basis for various applications such as variant calling and copy number variation (CNV) analyses. For those analyses the read coverage should be optimally balanced throughout protein coding regions at sufficient read depth. Unfortunately, WES is known for its uneven coverage within coding regions due to GC-rich regions or off-target enrichment. Results In order to examine the irregularities of WES within genes, we applied Agilent SureSelectXT exome capture on human samples and sequenced these via Illumina in 2 × 101 paired-end mode. As we suspected the sequenced insert length to be crucial in the uneven coverage of exome captured samples, we sheared 12 genomic DNA samples to two different DNA insert size lengths, namely 130 and 170 bp. Interestingly, although mean coverages of target regions were clearly higher in samples of 130 bp insert length, the level of evenness was more pronounced in 170 bp samples. Moreover, merging overlapping paired-end reads revealed a positive effect on evenness, indicating overlapping reads as another reason for the unevenness. In addition, mutation analysis on a subset of the samples was performed. In these isogenic subclones, the false negative rate in the 130 bp samples was almost double that in the 170 bp samples. Visual inspection of the discarded mutation sites exposed low coverages at the sites flanked by high amplitudes of coverage depth. Conclusions Producing longer insert reads could be a good strategy to achieve more uniform read coverage in coding regions and thereby enhance the effective sequencing yield to provide an improved basis for further variant calling and CNV analyses.
|
26
|
Roller E, Ivakhno S, Lee S, Royce T, Tanner S. Canvas: versatile and scalable detection of copy number variants. Bioinformatics 2016; 32:2375-7. [PMID: 27153601 DOI: 10.1093/bioinformatics/btw163] [Citation(s) in RCA: 120] [Impact Index Per Article: 15.0] [Received: 01/05/2016] [Accepted: 03/21/2016] [Indexed: 12/26/2022]
Abstract
MOTIVATION Versatile and efficient variant calling tools are needed to analyze large scale sequencing datasets. In particular, identification of copy number changes remains a challenging task due to their complexity, susceptibility to sequencing biases, variation in coverage data and dependence on genome-wide sample properties, such as tumor polyploidy or polyclonality in cancer samples. RESULTS We have developed a new tool, Canvas, for identification of copy number changes from diverse sequencing experiments including whole-genome matched tumor-normal and single-sample normal re-sequencing, as well as whole-exome matched and unmatched tumor-normal studies. In addition to variant calling, Canvas infers genome-wide parameters such as cancer ploidy, purity and heterogeneity. It provides fast and easy-to-run workflows that can scale to thousands of samples and can be easily incorporated into variant calling pipelines. AVAILABILITY AND IMPLEMENTATION Canvas is distributed under an open source license and can be downloaded from https://github.com/Illumina/canvas CONTACT eroller@illumina.com SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Sergii Ivakhno
- Illumina Cambridge Ltd, Chesterford Research Park, Little Chesterford, Essex CB10 1XL, UK
| | - Steve Lee
- Illumina Inc, San Diego, CA 92122, USA
|
27
|
Alioto TS, Buchhalter I, Derdak S, Hutter B, Eldridge MD, Hovig E, Heisler LE, Beck TA, Simpson JT, Tonon L, Sertier AS, Patch AM, Jäger N, Ginsbach P, Drews R, Paramasivam N, Kabbe R, Chotewutmontri S, Diessl N, Previti C, Schmidt S, Brors B, Feuerbach L, Heinold M, Gröbner S, Korshunov A, Tarpey PS, Butler AP, Hinton J, Jones D, Menzies A, Raine K, Shepherd R, Stebbings L, Teague JW, Ribeca P, Giner FC, Beltran S, Raineri E, Dabad M, Heath SC, Gut M, Denroche RE, Harding NJ, Yamaguchi TN, Fujimoto A, Nakagawa H, Quesada V, Valdés-Mas R, Nakken S, Vodák D, Bower L, Lynch AG, Anderson CL, Waddell N, Pearson JV, Grimmond SM, Peto M, Spellman P, He M, Kandoth C, Lee S, Zhang J, Létourneau L, Ma S, Seth S, Torrents D, Xi L, Wheeler DA, López-Otín C, Campo E, Campbell PJ, Boutros PC, Puente XS, Gerhard DS, Pfister SM, McPherson JD, Hudson TJ, Schlesner M, Lichter P, Eils R, Jones DTW, Gut IG. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nat Commun 2015; 6:10001. [PMID: 26647970 PMCID: PMC4682041 DOI: 10.1038/ncomms10001] [Citation(s) in RCA: 207] [Impact Index Per Article: 23.0] [Received: 06/16/2015] [Accepted: 10/23/2015] [Indexed: 12/13/2022] Open
Abstract
As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ~100× shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy.
Affiliation(s)
- Tyler S. Alioto
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Ivo Buchhalter
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Sophia Derdak
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Barbara Hutter
- Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Matthew D. Eldridge
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Eivind Hovig
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
- Department of Informatics, University of Oslo, 0373 Oslo, Norway
| | - Lawrence E. Heisler
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
| | - Timothy A. Beck
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
| | - Jared T. Simpson
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
| | - Laurie Tonon
- Synergie Lyon Cancer Foundation, Centre Léon Bérard, Cheney C, 28 rue Laennec, Lyon 69373, France
| | - Anne-Sophie Sertier
- Synergie Lyon Cancer Foundation, Centre Léon Bérard, Cheney C, 28 rue Laennec, Lyon 69373, France
| | - Ann-Marie Patch
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, Queensland 4072, Australia
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4006, Australia
| | - Natalie Jäger
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Department of Genetics, Stanford University, Mail Stop-5120, Stanford, California 94305-5120, USA
| | - Philip Ginsbach
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Ruben Drews
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Nagarajan Paramasivam
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Rolf Kabbe
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Sasithorn Chotewutmontri
- Genome and Proteome Core Facility, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
| | - Nicolle Diessl
- Genome and Proteome Core Facility, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
| | - Christopher Previti
- Genome and Proteome Core Facility, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
| | - Sabine Schmidt
- Genome and Proteome Core Facility, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
| | - Benedikt Brors
- Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Lars Feuerbach
- Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Michael Heinold
- Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Susanne Gröbner
- Department of Pediatric Hematology and Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 430, Heidelberg 69120, Germany
| | - Andrey Korshunov
- Department of Neuropathology, Heidelberg University Hospital, Im Neuenheimer Feld 224, Heidelberg 69120, Germany
| | - Adam P. Butler
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Jonathan Hinton
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - David Jones
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Andrew Menzies
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Keiran Raine
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Rebecca Shepherd
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Lucy Stebbings
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Jon W. Teague
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Paolo Ribeca
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Francesc Castro Giner
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Sergi Beltran
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Emanuele Raineri
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Marc Dabad
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Simon C. Heath
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Marta Gut
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Robert E. Denroche
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
| | - Nicholas J. Harding
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
| | - Takafumi N. Yamaguchi
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
| | - Akihiro Fujimoto
- RIKEN Center for Integrative Medical Sciences, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
| | - Hidewaki Nakagawa
- RIKEN Center for Integrative Medical Sciences, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
| | - Víctor Quesada
- Universidad de Oviedo—IUOPA, C/Fernando Bongera s/n, 33006 Oviedo, Spain
| | - Rafael Valdés-Mas
- Universidad de Oviedo—IUOPA, C/Fernando Bongera s/n, 33006 Oviedo, Spain
| | - Sigve Nakken
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
| | - Daniel Vodák
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
- The Bioinformatics Core Facility, Institute for Cancer Genetics and Informatics, Oslo University Hospital, 0310 Oslo, Norway
| | - Lawrence Bower
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Andrew G. Lynch
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Charlotte L. Anderson
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Victorian Life Sciences Computation Initiative, The University of Melbourne, Melbourne, Victoria 3053, Australia
| | - Nicola Waddell
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, Queensland 4072, Australia
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4006, Australia
| | - John V. Pearson
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, Queensland 4072, Australia
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4006, Australia
| | - Sean M. Grimmond
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, Queensland 4072, Australia
- WolfsonWohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, Scotland G61 1QH, UK
| | - Myron Peto
- Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon 97239-3098, USA
| | - Paul Spellman
- Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon 97239-3098, USA
| | - Cyriac Kandoth
- The Genome Institute, Washington University, St Louis, Missouri 63108, USA
| | - Semin Lee
- Harvard Medical School, Boston, Massachusetts 02115, USA
| | - John Zhang
- Harvard Medical School, Boston, Massachusetts 02115, USA
- MD Anderson Cancer Center, Houston, Texas 77030, USA
| | - Singer Ma
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
| | - Sahil Seth
- MD Anderson Cancer Center, Houston, Texas 77030, USA
| | - David Torrents
- IRB-BSC Joint Research Program on Computational Biology, Barcelona Supercomputing Center, 08034 Barcelona, Spain
| | - Liu Xi
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
| | - David A. Wheeler
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
| | - Carlos López-Otín
- Universidad de Oviedo—IUOPA, C/Fernando Bongera s/n, 33006 Oviedo, Spain
| | - Elías Campo
- Hematopathology Unit, Department of Pathology, Hospital Clinic, University of Barcelona, Institut d'Investigacions Biomèdiques August Pi i Sunyer, 08036 Barcelona, Spain
| | - Paul C. Boutros
- Synergie Lyon Cancer Foundation, Centre Léon Bérard, Cheney C, 28 rue Laennec, Lyon 69373, France
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada M5G 1L7
| | - Xose S. Puente
- Universidad de Oviedo—IUOPA, C/Fernando Bongera s/n, 33006 Oviedo, Spain
| | - Daniela S. Gerhard
- National Cancer Institute, Office of Cancer Genomics, 31 Center Drive, 10A07, Bethesda, Maryland 20892-2580, USA
| | - Stefan M. Pfister
- Department of Pediatric Hematology and Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 430, Heidelberg 69120, Germany
- Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - John D. McPherson
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada M5G 1L7
| | - Thomas J. Hudson
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada M5G 1L7
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
| | - Matthias Schlesner
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Peter Lichter
- Division of Molecular Genetics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Heidelberg Center for Personalised Oncology (DKFZ-HIPO), German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Roland Eils
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Heidelberg Center for Personalised Oncology (DKFZ-HIPO), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Institute of Pharmacy and Molecular Biotechnology, University of Heidelberg, Heidelberg 69120, Germany
- Bioquant Center, University of Heidelberg, Im Neuenheimer Feld 267, Heidelberg 69120, Germany
| | - David T. W. Jones
- Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Ivo G. Gut
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
|
28
|
Ma L, Qin M, Liu B, Hu Q, Wei L, Wang J, Liu S. cnvCurator: an interactive visualization and editing tool for somatic copy number variations. BMC Bioinformatics 2015; 16:331. [PMID: 26472134 PMCID: PMC4608136 DOI: 10.1186/s12859-015-0766-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/19/2015] [Accepted: 10/08/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Copy number variations (CNVs), among the most important somatic aberrations in tumor genomes, are believed to have a high probability of harboring oncotargets. Detection of somatic CNVs is an essential part of cancer genome sequencing analysis, but the accuracy is usually limited due to various factors. A post-processing procedure including manual review and refinement of CNV segments is often needed in practice to achieve better accuracy. RESULTS cnvCurator is a user-friendly tool with functions specifically designed to facilitate the process of interactively visualizing and editing somatic CNV calling results. Unlike general genomics viewers, cnvCurator indexes and displays CNV calling results in a segment-centric way. It incorporates multiple kinds of CNV-specific information for concurrent, interactive display, as well as a number of relevant features allowing the user to examine and curate the CNV calls. CONCLUSIONS cnvCurator provides important and practical utilities to assist the manual review and editing of results from a chosen somatic CNV caller, so that curated CNV segments can be used for downstream applications.
Affiliation(s)
- Lingnan Ma
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA; Department of Mathematics, University at Buffalo, Buffalo, NY, 14260, USA; College of Engineering, University of Michigan, Ann Arbor, MI, 48109, USA.
| | - Maochun Qin
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA.
| | - Biao Liu
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA.
| | - Qiang Hu
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA.
| | - Lei Wei
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA.
| | - Jianmin Wang
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA.
| | - Song Liu
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA.
|
29
|
Wang X, Li X, Cheng Y, Sun X, Sun X, Self S, Kooperberg C, Dai JY. Copy number alterations detected by whole-exome and whole-genome sequencing of esophageal adenocarcinoma. Hum Genomics 2015; 9:22. [PMID: 26374103 PMCID: PMC4570720 DOI: 10.1186/s40246-015-0044-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 06/18/2015] [Accepted: 08/25/2015] [Indexed: 02/08/2023] Open
Abstract
Background Esophageal adenocarcinoma (EA) is among the leading causes of cancer mortality, especially in developed countries. A high level of somatic copy number alterations (CNAs) accumulates over the decades in the progression from Barrett’s esophagus, the precursor lesion, to EA. Accurate identification of somatic CNAs is essential to understand cancer development. Many studies have been conducted for the detection of CNA in EA using microarrays. Next-generation sequencing (NGS) technologies are believed to have advantages in sensitivity and accuracy to detect CNA, yet no NGS-based CNA detection in EA has been reported. Results In this study, we analyzed whole-exome (WES) and whole-genome sequencing (WGS) data for detecting CNA from a published large-scale genomic study of EA. Two specific comparisons were conducted. First, the recurrent CNAs based on WGS and WES data from 145 EA samples were compared to those found in five previous microarray-based studies. We found that the majority of the previously identified regions were also detected in this study. Interestingly, some novel amplifications and deletions were discovered using the NGS data. In particular, SKI and PRKCZ detected in a deletion region are involved in transforming growth factor-β pathway, suggesting the potential utility of novel biomarkers for EA. Second, we compared CNAs detected in WGS and WES data from the same 15 EA samples. No large-scale CNA was identified statistically more frequently by WES or WGS, while more focal-scale CNAs were detected by WGS than by WES. Conclusions Our results suggest that NGS can replace microarrays to detect CNA in EA. WGS is superior to WES in that it can offer finer resolution for the detection, though if the interest is on recurrent CNAs, WES can be preferable to WGS for its cost-effectiveness. 
Affiliation(s)
- Xiaoyu Wang
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
| | - Xiaohong Li
- Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Public Health Science Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
| | - Yichen Cheng
- Public Health Science Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
| | - Xin Sun
- Institute of Occupational Health and Poison Control, Chinese Center for Disease Control and Prevention, Beijing, China.
| | - Xibin Sun
- Henan Office for Cancer Research and Control, Henan Cancer Hospital, Zhengzhou, Henan, China.
| | - Steve Self
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
| | - Charles Kooperberg
- Public Health Science Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
| | - James Y Dai
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA. .,Public Health Science Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
| |
Collapse
|
30
|
Cheng F, Zhao J, Zhao Z. Advances in computational approaches for prioritizing driver mutations and significantly mutated genes in cancer genomes. Brief Bioinform 2015; 17:642-56. [PMID: 26307061 DOI: 10.1093/bib/bbv068] [Citation(s) in RCA: 94] [Impact Index Per Article: 10.4] [Received: 05/31/2015] [Indexed: 12/27/2022] Open
Abstract
Cancer is often driven by the accumulation of genetic alterations, including single nucleotide variants, small insertions or deletions, gene fusions, copy-number variations, and large chromosomal rearrangements. Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data and catalog somatic mutations in both common and rare cancer types. So far, the somatic mutation landscapes and signatures of >10 major cancer types have been reported; however, pinpointing driver mutations and cancer genes from millions of available cancer somatic mutations remains a monumental challenge. To tackle this important task, many methods and computational tools have been developed during the past several years and, thus, a review of its advances is urgently needed. Here, we first summarize the main features of these methods and tools for whole-exome, whole-genome and whole-transcriptome sequencing data. Then, we discuss major challenges like tumor intra-heterogeneity, tumor sample saturation and functionality of synonymous mutations in cancer, all of which may result in false-positive discoveries. Finally, we highlight new directions in studying regulatory roles of noncoding somatic mutations and quantitatively measuring circulating tumor DNA in cancer. This review may help investigators find an appropriate tool for detecting potential driver or actionable mutations in rapidly emerging precision cancer medicine.
|
31
|
Duan J, Wan M, Deng HW, Wang YP. A Sparse Model Based Detection of Copy Number Variations From Exome Sequencing Data. IEEE Trans Biomed Eng 2015; 63:496-505. [PMID: 26258935 DOI: 10.1109/tbme.2015.2464674] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022]
Abstract
GOAL Whole-exome sequencing provides a more cost-effective way than whole-genome sequencing for detecting genetic variants, such as copy number variations (CNVs). Although a number of approaches have been proposed to detect CNVs from whole-genome sequencing, a direct adoption of these approaches to whole-exome sequencing will often fail because exons are separately located along a genome. Therefore, an appropriate method is needed to target the specific features of exome sequencing data. METHODS In this paper, a novel sparse model based method is proposed to discover CNVs from multiple exome sequencing data. First, exome sequencing data are represented with a penalized matrix approximation, and technical variability and random sequencing errors are assumed to follow a generalized Gaussian distribution. Second, an iteratively reweighted least squares algorithm is used to estimate the solution. RESULTS The method is tested and validated on both synthetic and real data, and compared with other approaches including CoNIFER, XHMM, and cn.MOPS. The test demonstrates that the proposed method outperforms other approaches. CONCLUSION The proposed sparse model can detect CNVs from exome sequencing data with high power and precision. SIGNIFICANCE The sparse model can target the specific features of exome sequencing data. The software codes are freely available at http://www.tulane.edu/ wyp/software/Exon_CNV.m.
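The link between a generalized Gaussian error model and iteratively reweighted least squares (IRLS) can be illustrated on a toy problem. For a generalized Gaussian with shape parameter p, the negative log-likelihood of a residual r is proportional to |r|^p, and IRLS minimizes the sum of |r|^p by repeatedly solving a weighted least-squares problem with weights |r|^(p-2). The sketch below applies this to a single location parameter only; it is not the authors' penalized matrix approximation, and the function name `irls_location` and its defaults are hypothetical.

```python
def irls_location(values, p=1.0, n_iter=100, eps=1e-8):
    """Estimate m minimizing sum(|x - m| ** p) by IRLS (illustrative only).

    p=2 reduces to the ordinary mean; p=1 approaches the median, which is
    far less sensitive to outliers. `eps` clamps tiny residuals so that the
    weights stay finite (an assumption of this sketch).
    """
    m = sum(values) / len(values)  # start from the least-squares solution
    for _ in range(n_iter):
        # Reweighting step: w_i = |r_i|^(p-2) with r_i = x_i - m.
        weights = [max(abs(x - m), eps) ** (p - 2.0) for x in values]
        # Weighted least-squares step: closed form for a single location.
        m = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return m


depths = [1.0, 2.0, 3.0, 4.0, 100.0]   # one gross outlier
robust = irls_location(depths, p=1.0)  # pulled toward the bulk of the data
plain = irls_location(depths, p=2.0)   # equals the arithmetic mean
```

With a heavy-tailed shape (p close to 1), the outlier receives a small weight and barely moves the estimate, which is the motivation for pairing a generalized Gaussian noise model with IRLS.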
|
32
|
Nam JY, Kim NKD, Kim SC, Joung JG, Xi R, Lee S, Park PJ, Park WY. Evaluation of somatic copy number estimation tools for whole-exome sequencing data. Brief Bioinform 2015. [PMID: 26210357 DOI: 10.1093/bib/bbv055] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Indexed: 11/12/2022] Open
Abstract
Whole-exome sequencing (WES) has become a standard method for detecting genetic variants in human diseases. Although the primary use of WES data has been the identification of single nucleotide variations and indels, these data also offer a possibility of detecting copy number variations (CNVs) at high resolution. However, WES data have uneven read coverage along the genome owing to the target capture step, and the development of a robust WES-based CNV tool is challenging. Here, we evaluate six WES somatic CNV detection tools: ADTEx, CONTRA, Control-FREEC, EXCAVATOR, ExomeCNV and Varscan2. Using WES data from 50 kidney chromophobe, 50 bladder urothelial carcinoma, and 50 stomach adenocarcinoma patients from The Cancer Genome Atlas, we compared the CNV calls from the six tools with a reference CNV set that was identified by both single nucleotide polymorphism array 6.0 and whole-genome sequencing data. We found that these algorithms gave highly variable results: visual inspection reveals significant differences between the WES-based segmentation profiles and the reference profile, as well as among the WES-based profiles. Using a 50% overlap criterion, 13-77% of WES CNV calls were covered by CNVs from the reference set, up to 21% of the copy gains were called as losses or vice versa, and dramatic differences in CNV sizes and CNV numbers were observed. Overall, ADTEx and EXCAVATOR had the best performance with relatively high precision and sensitivity. We suggest that the current algorithms for somatic CNV detection from WES data are limited in their performance and that more robust algorithms are needed.
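One plausible reading of the 50% overlap criterion can be sketched as follows: a WES call counts as covered when at least half of its length lies inside reference CNVs on the same chromosome. The abstract does not specify the exact definition, so the half-open `(start, end)` interval convention, the one-directional overlap, and all function names here are assumptions.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals so no base is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(call, reference):
    """Fraction of one call's length that lies inside the reference CNVs."""
    start, end = call
    length = end - start
    if length <= 0:
        return 0.0
    overlap = sum(
        max(0, min(end, ref_end) - max(start, ref_start))
        for ref_start, ref_end in merge_intervals(reference)
    )
    return overlap / length


def fraction_of_calls_covered(calls, reference, min_overlap=0.5):
    """Share of calls whose covered fraction reaches the overlap threshold."""
    if not calls:
        return 0.0
    hits = sum(1 for c in calls if covered_fraction(c, reference) >= min_overlap)
    return hits / len(calls)


# Hypothetical coordinates on a single chromosome.
wes_calls = [(0, 100), (200, 300), (400, 500)]
reference_cnvs = [(50, 150), (200, 240), (230, 260)]
share = fraction_of_calls_covered(wes_calls, reference_cnvs)
```

A reciprocal criterion (the overlap must also cover half of the reference CNV) would be stricter; which variant the study used is not stated in the abstract.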
|
33
|
Varadan V, Singh S, Nosrati A, Ravi L, Lutterbaugh J, Barnholtz-Sloan JS, Markowitz SD, Willis JE, Guda K. ENVE: a novel computational framework characterizes copy-number mutational landscapes in colorectal cancers from African American patients. Genome Med 2015; 7:69. [PMID: 26269717 PMCID: PMC4534088 DOI: 10.1186/s13073-015-0192-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/26/2015] [Accepted: 06/30/2015] [Indexed: 01/16/2023] Open
Abstract
Reliable detection of somatic copy-number alterations (sCNAs) in tumors using whole-exome sequencing (WES) remains challenging owing to technical (inherent noise) and sample-associated variability in WES data. We present a novel computational framework, ENVE, which models inherent noise in any WES dataset, enabling robust detection of sCNAs across WES platforms. ENVE achieved high concordance with orthogonal sCNA assessments across two colorectal cancer (CRC) WES datasets, and consistently outperformed a best-in-class algorithm, Control-FREEC. We subsequently used ENVE to characterize global sCNA landscapes in African American CRCs, identifying genomic aberrations potentially associated with CRC pathogenesis in this population. ENVE is downloadable at https://github.com/ENVE-Tools/ENVE.
Affiliation(s)
- Vinay Varadan
  - Division of General Medical Sciences-Oncology, Case Western Reserve University, Cleveland, OH 44106 USA; Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA; Case Western Reserve University, 2103 Cornell Road, Wolstein Research Building, Cleveland, OH 44106 USA
- Salendra Singh
  - Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA
- Arman Nosrati
  - Division of Hematology and Oncology, Case Western Reserve University, Cleveland, OH 44106 USA
- Lakshmeswari Ravi
  - Division of Hematology and Oncology, Case Western Reserve University, Cleveland, OH 44106 USA
- James Lutterbaugh
  - Division of Hematology and Oncology, Case Western Reserve University, Cleveland, OH 44106 USA
- Jill S Barnholtz-Sloan
  - Division of General Medical Sciences-Oncology, Case Western Reserve University, Cleveland, OH 44106 USA; Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA
- Sanford D Markowitz
  - Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA; Division of Hematology and Oncology, Case Western Reserve University, Cleveland, OH 44106 USA; Department of Medicine, Case Western Reserve University, Cleveland, OH 44106 USA; Case Medical Center, Case Western Reserve University, Cleveland, OH 44106 USA
- Joseph E Willis
  - Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA; Department of Medicine, Case Western Reserve University, Cleveland, OH 44106 USA; Case Medical Center, Case Western Reserve University, Cleveland, OH 44106 USA; Department of Pathology, Case Western Reserve University, Cleveland, OH 44106 USA
- Kishore Guda
  - Division of General Medical Sciences-Oncology, Case Western Reserve University, Cleveland, OH 44106 USA; Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA; Department of Medicine, Case Western Reserve University, Cleveland, OH 44106 USA; Case Western Reserve University, 2103 Cornell Road, Wolstein Research Building, Cleveland, OH 44106 USA
|
34
|
Tattini L, D'Aurizio R, Magi A. Detection of Genomic Structural Variants from Next-Generation Sequencing Data. Front Bioeng Biotechnol 2015; 3:92. [PMID: 26161383 PMCID: PMC4479793 DOI: 10.3389/fbioe.2015.00092] [Citation(s) in RCA: 155] [Impact Index Per Article: 17.2] [Received: 12/10/2014] [Accepted: 06/10/2015] [Indexed: 01/16/2023] Open
Abstract
Structural variants are genomic rearrangements larger than 50 bp accounting for around 1% of the variation among human genomes. They impact on phenotypic diversity and play a role in various diseases including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges and a number of approaches have been proposed in the literature. In this mini review, we describe and summarize the latest tools – and their underlying algorithms – designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures, and amplicon sequencing data, pointing out the major advantages/drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides a guided indication – with particular emphasis on human genetics and copy number variants – for researchers involved in the investigation of these genomic events.
Affiliation(s)
- Lorenzo Tattini
  - Department of Neurosciences, Psychology, Pharmacology and Child Health, University of Florence, Florence, Italy
- Romina D'Aurizio
  - Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Alberto Magi
  - Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy
|
35
|
Kim J, Kim S, Nam H, Kim S, Lee D. SoloDel: a probabilistic model for detecting low-frequent somatic deletions from unmatched sequencing data. Bioinformatics 2015; 31:3105-13. [PMID: 26071141 DOI: 10.1093/bioinformatics/btv358] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/18/2014] [Accepted: 06/05/2015] [Indexed: 01/26/2023] Open
Abstract
MOTIVATION Finding somatic mutations from massively parallel sequencing data is becoming a standard process in genome-based biomedical studies. There are a number of robust methods developed for detecting somatic single nucleotide variations. However, detection of somatic copy number alteration has been substantially less explored and remains vulnerable to frequently raised sampling issues: low frequency in cell population and absence of the matched control samples. RESULTS We developed a novel computational method, SoloDel, that accurately classifies low-frequent somatic deletions from germline ones with or without matched control samples. We first constructed a probabilistic, somatic mutation progression model that describes the occurrence and propagation of the event in the cellular lineage of the sample. We then built a Gaussian mixture model to represent the mixed population of somatic and germline deletions. Parameters of the mixture model could be estimated using the expectation-maximization algorithm with the observed distribution of read-depth ratios at the points of discordant-read based initial deletion calls. Combined with a conventional structural variation caller, SoloDel greatly increased the accuracy in classifying somatic mutations. Even without control, SoloDel maintained a comparable performance in a wide range of mutated subpopulation size (10-70%). SoloDel could also successfully recall experimentally validated somatic deletions from previously reported neuropsychiatric whole-genome sequencing data. AVAILABILITY AND IMPLEMENTATION Java-based implementation of the method is available at http://sourceforge.net/projects/solodel/ CONTACT swkim@yuhs.ac or dhlee@biosoft.kaist.ac.kr SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
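The mixture-model step can be illustrated with a generic two-component, one-dimensional Gaussian mixture fitted by expectation-maximization to read-depth ratios. This is a textbook EM sketch, not the SoloDel implementation (which is Java-based and includes a mutation progression model); the function names, the initialization, the variance floor, and the example ratios are all hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(values, n_iter=200):
    """Fit a 2-component 1-D Gaussian mixture by EM (illustrative only)."""
    lo, hi = min(values), max(values)
    means = [lo, hi]                          # crude initialization at the extremes
    sds = [max((hi - lo) / 4.0, 1e-3)] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p) or 1e-300          # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and standard deviation.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue                      # leave an empty component unchanged
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsing component
    return weights, means, sds


# Hypothetical read-depth ratios: one cluster near 0.5, one near 0.9.
ratios = [0.48, 0.49, 0.50, 0.51, 0.52, 0.88, 0.89, 0.90, 0.91, 0.92]
weights, means, sds = fit_two_component_gmm(ratios)
```

In such a setup, each deletion call could then be assigned to the component with the higher posterior probability; how the components map to germline and somatic events depends on the model described in the paper.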
Affiliation(s)
- Junho Kim
  - Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul 120-752, Korea; Department of Bio and Brain Engineering, KAIST, Yuseong-Gu, Daejeon 305-701, Korea
- Sanghyeon Kim
  - Stanley Brain Research Laboratory, Stanley Medical Research Institute, Rockville, MD 20850, USA
- Hojung Nam
  - School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju 500-712, Korea
- Sangwoo Kim
  - Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul 120-752, Korea
- Doheon Lee
  - Department of Bio and Brain Engineering, KAIST, Yuseong-Gu, Daejeon 305-701, Korea
|
36
|
Tetreault M, Bareke E, Nadaf J, Alirezaie N, Majewski J. Whole-exome sequencing as a diagnostic tool: current challenges and future opportunities. Expert Rev Mol Diagn 2015; 15:749-60. [PMID: 25959410 DOI: 10.1586/14737159.2015.1039516] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Indexed: 12/26/2022]
Abstract
Whole-exome sequencing (WES) represents a significant breakthrough in the field of human genetics. This technology has largely contributed to the identification of new disease-causing genes and is now entering clinical laboratories. WES represents a powerful tool for diagnosis and could reduce the 'diagnostic odyssey' for many patients. In this review, we present a technical overview of WES analysis, variants annotation and interpretation in a clinical setting. We evaluate the usefulness of clinical WES in different clinical indications, such as rare diseases, cancer and complex diseases. Finally, we discuss the efficacy of WES as a diagnostic tool and the impact on patient management.
Affiliation(s)
- Martine Tetreault
  - Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada
|
37
|
Pirooznia M, Goes FS, Zandi PP. Whole-genome CNV analysis: advances in computational approaches. Front Genet 2015; 6:138. [PMID: 25918519 PMCID: PMC4394692 DOI: 10.3389/fgene.2015.00138] [Citation(s) in RCA: 123] [Impact Index Per Article: 13.7] [Received: 01/26/2015] [Accepted: 03/23/2015] [Indexed: 01/04/2023] Open
Abstract
Accumulating evidence indicates that DNA copy number variation (CNV) is likely to make a significant contribution to human diversity and also play an important role in disease susceptibility. Recent advances in genome sequencing technologies have enabled the characterization of a variety of genomic features, including CNVs. This has led to the development of several bioinformatics approaches to detect CNVs from next-generation sequencing data. Here, we review recent advances in CNV detection from whole genome sequencing. We discuss the informatics approaches and current computational tools that have been developed as well as their strengths and limitations. This review will assist researchers and analysts in choosing the most suitable tools for CNV analysis as well as provide suggestions for new directions in future development.
Affiliation(s)
- Mehdi Pirooznia
  - Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Fernando S Goes
  - Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Peter P Zandi
  - Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
|
38
|
Oleksiewicz U, Tomczak K, Woropaj J, Markowska M, Stępniak P, Shah PK. Computational characterisation of cancer molecular profiles derived using next generation sequencing. Contemp Oncol (Pozn) 2015; 19:A78-91. [PMID: 25691827 PMCID: PMC4322529 DOI: 10.5114/wo.2014.47137] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/30/2023] Open
Abstract
Our current understanding of cancer genetics is grounded on the principle that cancer arises from a clone that has accumulated the requisite somatically acquired genetic aberrations, leading to the malignant transformation. It also results in aberrant gene and protein expression. Next generation sequencing (NGS) or deep sequencing platforms are being used to create large catalogues of changes in copy numbers, mutations, structural variations, gene fusions, gene expression, and other types of information for cancer patients. However, inferring different types of biological changes from raw reads generated using the sequencing experiments is algorithmically and computationally challenging. In this article, we outline common steps for the quality control and processing of NGS data. We highlight the importance of accurate and application-specific alignment of these reads and the methodological steps and challenges in obtaining different types of information. We comment on the importance of integrating these data and building infrastructure to analyse them. We also provide exhaustive lists of available software to obtain information and point the readers to articles comparing software for deeper insight in specialised areas. We hope that the article will guide readers in choosing the right tools for analysing oncogenomic datasets.
Affiliation(s)
- Urszula Oleksiewicz
  - Laboratory of Gene Therapy, Department of Cancer Immunology, The Greater Poland Cancer Centre, Poznan, Poland; Department of Cancer Immunology and Diagnostics, Chair of Medical Biotechnology, Poznan University of Medical Sciences, Poznan, Poland; These authors contributed equally to this paper
- Katarzyna Tomczak
  - Laboratory of Gene Therapy, Department of Cancer Immunology, The Greater Poland Cancer Centre, Poznan, Poland; Department of Cancer Immunology and Diagnostics, Chair of Medical Biotechnology, Poznan University of Medical Sciences, Poznan, Poland; Postgraduate School of Molecular Medicine, Medical University of Warsaw, Warsaw; These authors contributed equally to this paper
- Jakub Woropaj
  - Poznan University of Economics, Poznań, Poland; These authors contributed equally to this paper
- Parantu K Shah
  - Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
|
39
|
Nadaf J, Majewski J, Fahiminiya S. ExomeAI: detection of recurrent allelic imbalance in tumors using whole-exome sequencing data. Bioinformatics 2014; 31:429-31. [PMID: 25297069 DOI: 10.1093/bioinformatics/btu665] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/19/2022]
Abstract
SUMMARY Whole-exome sequencing (WES) has extensively been used in cancer genome studies; however, the use of WES data in the study of loss of heterozygosity or more generally allelic imbalance (AI) has so far been very limited, which highlights the need for user-friendly and flexible software that can handle low-quality datasets. We have developed a statistical approach, ExomeAI, for the detection of recurrent AI events using WES datasets, specifically where matched normal samples are not available. AVAILABILITY ExomeAI is a web-based application, publicly available at: http://genomequebec.mcgill.ca/exomeai. CONTACT JavadNadaf@gmail.com or somayyeh.fahiminiya@mcgill.ca SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Javad Nadaf
  - Department of Human Genetics, Faculty of Medicine, McGill University and Genome Quebec Innovation Center, Montreal, Quebec, Canada
- Jacek Majewski
  - Department of Human Genetics, Faculty of Medicine, McGill University and Genome Quebec Innovation Center, Montreal, Quebec, Canada
- Somayyeh Fahiminiya
  - Department of Human Genetics, Faculty of Medicine, McGill University and Genome Quebec Innovation Center, Montreal, Quebec, Canada
|
40
|
Kadalayil L, Rafiq S, Rose-Zerilli MJJ, Pengelly RJ, Parker H, Oscier D, Strefford JC, Tapper WJ, Gibson J, Ennis S, Collins A. Exome sequence read depth methods for identifying copy number changes. Brief Bioinform 2014; 16:380-92. [PMID: 25169955 DOI: 10.1093/bib/bbu027] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Received: 04/20/2014] [Accepted: 07/10/2014] [Indexed: 01/04/2023] Open
Abstract
Copy number variants (CNVs) play important roles in a number of human diseases and in pharmacogenetics. Powerful methods exist for CNV detection in whole genome sequencing (WGS) data, but such data are costly to obtain. Many disease causal CNVs span or are found in genome coding regions (exons), which makes CNV detection using whole exome sequencing (WES) data attractive. If reliably validated against WGS-based CNVs, exome-derived CNVs have potential applications in a clinical setting. Several algorithms have been developed to exploit exome data for CNV detection and comparisons made to find the most suitable methods for particular data samples. The results are not consistent across studies. Here, we review some of the exome CNV detection methods based on depth of coverage profiles and examine their performance to identify problems contributing to discrepancies in published results. We also present a streamlined strategy that uses a single metric, the likelihood ratio, to compare exome methods, and we demonstrated its utility using the VarScan 2 and eXome Hidden Markov Model (XHMM) programs using paired normal and tumour exome data from chronic lymphocytic leukaemia patients. We use array-based somatic CNV (SCNV) calls as a reference standard to compute prevalence-independent statistics, such as sensitivity, specificity and likelihood ratio, for validation of the exome-derived SCNVs. We also account for factors known to influence the performance of exome read depth methods, such as CNV size and frequency, while comparing our findings with published results.
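The prevalence-independent statistics named above follow standard definitions and can be computed from a confusion table of exome-derived calls against the array-based reference. The sketch below uses those textbook formulas; how the study counted true and false calls (for example per exon or per segment) is not stated in the abstract, and the function name and example counts are hypothetical.

```python
def diagnostic_stats(tp, fp, tn, fn):
    """Sensitivity, specificity, and likelihood ratios from a 2x2 table.

    tp/fn: reference-positive units called / missed by the exome method.
    tn/fp: reference-negative units correctly left uncalled / wrongly called.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Positive likelihood ratio: how much more often a call occurs in true CNVs.
    lr_pos = sensitivity / (1.0 - specificity) if specificity < 1.0 else float("inf")
    # Negative likelihood ratio: how much more often a non-call occurs in true CNVs.
    lr_neg = (1.0 - sensitivity) / specificity if specificity > 0.0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Hypothetical counts for one method on one sample set.
stats = diagnostic_stats(tp=80, fp=10, tn=90, fn=20)
```

Because the positive likelihood ratio folds sensitivity and specificity into one number that does not depend on how common CNVs are in the sample, it is a convenient single metric for ranking methods.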
|
41
|
Prandi D, Baca SC, Romanel A, Barbieri CE, Mosquera JM, Fontugne J, Beltran H, Sboner A, Garraway LA, Rubin MA, Demichelis F. Unraveling the clonal hierarchy of somatic genomic aberrations. Genome Biol 2014; 15:439. [PMID: 25160065 PMCID: PMC4167267 DOI: 10.1186/s13059-014-0439-6] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 07/14/2014] [Accepted: 08/13/2014] [Indexed: 12/25/2022] Open
Abstract
Defining the chronology of molecular alterations may identify milestones in carcinogenesis. To unravel the temporal evolution of aberrations from clinical tumors, we developed CLONET, which upon estimation of tumor admixture and ploidy infers the clonal hierarchy of genomic aberrations. Comparative analysis across 100 sequenced genomes from prostate, melanoma, and lung cancers established diverse evolutionary hierarchies, demonstrating the early disruption of tumor-specific pathways. The analyses highlight the diversity of clonal evolution within and across tumor types that might be informative for risk stratification and patient selection for targeted therapies. CLONET addresses heterogeneous clinical samples seen in the setting of precision medicine.
|
42
|
Sekar D, Thirugnanasambantham K, Hairul Islam VI, Saravanan S. Sequencing approaches in cancer treatment. Cell Prolif 2014; 47:391-5. [PMID: 25131793 DOI: 10.1111/cpr.12124] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/18/2014] [Accepted: 05/23/2014] [Indexed: 12/15/2022] Open
Abstract
Use of sequencing approaches is an important aspect in the field of cancer genomics, where next-generation sequencing has already been utilized for targeting oncogenes or tumour-suppressor genes, that can be sequenced in a short time period. Alterations such as point mutations, insertions/deletions, copy number alterations, chromosomal rearrangements and epigenetic changes are encountered in cancer cell genomes, and application of various NGS technologies in cancer research will encounter such modifications. Rapid advancement in technology has led to exponential growth in the field of genomic analysis. The $1000 Genome Project (in which the goal is to sequence an entire human genome for $1000), and deep sequencing techniques (which have greater accuracy and provide a more complete analysis of the genome), are examples of rapid advancements in the field of cancer genomics. In this mini review, we explore sequencing techniques, correlating their importance in cancer therapy and treatment.
Affiliation(s)
- D Sekar
  - Pondicherry Centre for Biological Sciences, Pondicherry, 605 005, India
|