51
Abstract
This chapter contains a step-by-step protocol for identifying somatic SNPs and small indels from next-generation sequencing data of tumor samples and matching normal samples. The workflow presented here is largely based on the Broad Institute's "Best Practices" guidelines and makes use of their Genome Analysis Toolkit (GATK) platform. Variants are annotated with population allele frequencies and curated resources such as gnomAD and ClinVar, and with curated effect predictions from dbNSFP, using VCFtools, SnpEff, and SnpSift.
Affiliation(s)
- Peter J Ulintz
  - BRCF Bioinformatics Core, University of Michigan, Ann Arbor, MI, USA
  - Division of Hematology and Oncology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Weisheng Wu
  - BRCF Bioinformatics Core, University of Michigan, Ann Arbor, MI, USA
- Chris M Gates
  - BRCF Bioinformatics Core, University of Michigan, Ann Arbor, MI, USA

52
Meng J, Chen YPP. A database of simulated tumor genomes towards accurate detection of somatic small variants in cancer. PLoS One 2018; 13:e0202982. [PMID: 30161165 PMCID: PMC6116990 DOI: 10.1371/journal.pone.0202982] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/21/2018] [Accepted: 08/13/2018] [Indexed: 11/19/2022]
Abstract
Somatic mutations promote the transformation of normal cells to cancer. Accurate identification of such mutations facilitates cancer diagnosis and treatment, but biological and technological noise, including intra-tumor heterogeneity, sample contamination, and uncertainties in base sequencing and read alignment, poses a major challenge to somatic mutation discovery. A number of callers have been developed to predict somatic mutations from paired tumor/normal or unpaired tumor sequencing data. However, the small number of experimentally validated somatic sites currently available limits the evaluation, and therefore the improvement, of callers. Fortunately, the genome of NIST reference material NA12878 has been well characterized, with publicly available high-confidence genotype calls, and biological and technological noise can be modeled computationally through the number of sub-clones, the VAFs, and the sequencing and mapping qualities. We used BAMSurgeon to create simulated tumors by introducing somatic small variants (SNVs and small indels) into homozygous reference or wildtype sites of NA12878. We generated 135 simulated tumors from 5 pre-tumors/normals. These simulated tumors vary in sequencing and subsequent mapping error profiles, read length, the number of sub-clones, the VAF, the mutation frequency across the genome, and the genomic context. Furthermore, these pure tumor/normal pairs can be mixed at desired ratios within each pair to simulate sample contamination. This database (a total size of 15 terabytes) will be of great use for benchmarking somatic small variant callers and guiding their improvement.
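The effect of mixing a pure tumor with its matched normal can be illustrated with a little arithmetic. The sketch below is not taken from the paper or from BAMSurgeon. It assumes that a share `tumor_fraction` of the reads at a site comes from the pure tumor and the rest from the normal, with equal depth and no copy-number change; the function name and parameters are hypothetical.

```python
def expected_mixed_vaf(tumor_fraction, tumor_vaf, normal_vaf=0.0):
    """Expected variant allele fraction after mixing tumor and normal reads.

    Assumption (not from the paper): a share `tumor_fraction` of reads is
    drawn from the pure tumor and the remainder from the normal, so the
    expected VAF is a weighted average of the two samples' VAFs.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    return tumor_fraction * tumor_vaf + (1.0 - tumor_fraction) * normal_vaf


# A somatic variant at 40% VAF in the pure tumor, absent from the normal,
# after dilution to 25% tumor reads:
print(expected_mixed_vaf(0.25, 0.40))
```

Under these assumptions, heavier contamination lowers the expected VAF proportionally, which is one reason contaminated samples are harder for callers.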
Affiliation(s)
- Jing Meng
  - College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, Australia
- Yi-Ping Phoebe Chen
  - College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, Australia

53
Kamps-Hughes N, McUsic A, Kurihara L, Harkins TT, Pal P, Ray C, Ionescu-Zanetti C. ERASE-Seq: Leveraging replicate measurements to enhance ultralow frequency variant detection in NGS data. PLoS One 2018; 13:e0195272. [PMID: 29630678 PMCID: PMC5890993 DOI: 10.1371/journal.pone.0195272] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 09/07/2017] [Accepted: 03/19/2018] [Indexed: 12/30/2022]
Abstract
The accurate detection of ultralow allele frequency variants in DNA samples is of interest in both research and medical settings, particularly in liquid biopsies, where cancer mutational status is monitored from circulating DNA. Next-generation sequencing (NGS) technologies employing molecular barcoding have shown promise, but significant sensitivity and specificity improvements are still needed to detect mutations in a majority of patients before the metastatic stage. To address this, we present analytical validation data for ERASE-Seq (Elimination of Recurrent Artifacts and Stochastic Errors), a method for accurate and sensitive detection of ultralow frequency DNA variants in NGS data. ERASE-Seq differs from previous methods by creating a robust statistical framework that uses technical replicates in conjunction with background error modeling, providing a 10- to 100-fold reduction in false positive rates compared to published molecular barcoding methods. ERASE-Seq was tested using spiked human DNA mixtures with clinically realistic DNA input quantities to detect SNVs and indels between 0.05% and 1% allele frequency, the range commonly found in liquid biopsy samples. Variants were detected with greater than 90% sensitivity and a false positive rate below 0.1 calls per 10,000 possible variants. The approach represents a significant performance improvement compared to molecular barcoding methods and does not require changing molecular reagents.
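The general idea of combining technical replicates with a background error model can be sketched as follows. This is not the ERASE-Seq algorithm, whose statistical details are not given in the abstract. It is a toy illustration that assumes a single known per-site background error rate and requires every replicate to show more variant reads than that background would explain under a binomial model. All function names, parameters, and thresholds are hypothetical.

```python
import math


def binomial_tail(alt_reads, depth, error_rate):
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate).

    Terms are computed in log space so large depths do not overflow.
    """
    total = 0.0
    for k in range(alt_reads, depth + 1):
        log_pmf = (
            math.lgamma(depth + 1)
            - math.lgamma(k + 1)
            - math.lgamma(depth - k + 1)
            + k * math.log(error_rate)
            + (depth - k) * math.log1p(-error_rate)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def call_with_replicates(replicates, error_rate, alpha=0.01):
    """Call a variant only if every replicate exceeds the background.

    `replicates` is a list of (alt_reads, depth) pairs for one site.
    Assumption: `error_rate` (strictly between 0 and 1) is the background
    error rate for this site, estimated elsewhere.
    """
    return all(
        binomial_tail(alt, depth, error_rate) < alpha
        for alt, depth in replicates
    )


# Three replicates at depth 2000 with a 0.1% background error rate:
consistent = [(12, 2000), (10, 2000), (11, 2000)]
one_dropout = [(12, 2000), (2, 2000), (11, 2000)]
print(call_with_replicates(consistent, 0.001))
print(call_with_replicates(one_dropout, 0.001))
```

Requiring agreement across replicates is the design choice being illustrated: a stochastic error that is elevated in one library is unlikely to be elevated in all of them.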
Affiliation(s)
- Nick Kamps-Hughes
  - Fluxion Biosciences Inc., South San Francisco, California, United States of America
- Andrew McUsic
  - Swift Biosciences Inc., Ann Arbor, Michigan, United States of America
- Laurie Kurihara
  - Swift Biosciences Inc., Ann Arbor, Michigan, United States of America
- Timothy T Harkins
  - Swift Biosciences Inc., Ann Arbor, Michigan, United States of America
- Prithwish Pal
  - Illumina Inc., San Diego, California, United States of America
- Claire Ray
  - Illumina Inc., San Diego, California, United States of America

54
Xu C. A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data. Comput Struct Biotechnol J 2018; 16:15-24. [PMID: 29552334 PMCID: PMC5852328 DOI: 10.1016/j.csbj.2018.01.003] [Citation(s) in RCA: 149] [Impact Index Per Article: 24.8] [Received: 09/08/2017] [Revised: 01/20/2018] [Accepted: 01/28/2018] [Indexed: 02/06/2023]
Abstract
Detection of somatic mutations holds great potential in cancer treatment and has been a very active research field in the past few years, especially since the breakthrough of next-generation sequencing technology. A collection of variant calling pipelines has been developed with different underlying models, filters, input data requirements, and targeted applications. This review aims to enumerate these unique features of the state-of-the-art variant callers, in the hope of providing a practical guide for selecting the appropriate pipeline for specific applications. We will focus on the detection of somatic single nucleotide variants, ranging from traditional variant callers based on whole genome or exome sequencing of paired tumor-normal samples to recent low-frequency variant callers designed for targeted sequencing protocols with unique molecular identifiers. The variant callers have been extensively benchmarked, with inconsistent performance across studies. We will review the reference materials, datasets, and performance metrics that have been used in the benchmarking studies. Finally, we will discuss emerging trends and future directions of variant calling algorithms.
Affiliation(s)
- Chang Xu
  - Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland 21703, USA

55
Abstract
The rapid development of immunomodulatory cancer therapies has led to a concurrent increase in the application of informatics techniques to the analysis of tumors, the tumor microenvironment, and measures of systemic immunity. In this review, the use of tumors to gather genetic and expression data will first be explored. Next, techniques to assess tumor immunity are reviewed, including HLA status, predicted neoantigens, immune microenvironment deconvolution, and T-cell receptor sequencing. Attempts to integrate these data are in early stages of development and are discussed in this review. Finally, we review the application of these informatics strategies to therapy development, with a focus on vaccines, adoptive cell transfer, and checkpoint blockade therapies.
Affiliation(s)
- J Hammerbacher
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York
  - Department of Microbiology and Immunology, Medical University of South Carolina, Charleston
- A Snyder
  - Department of Medicine, Memorial Sloan Kettering Cancer Center, New York
  - Adaptive Biotechnologies, Seattle, USA

56
Eitan R, Shamir R. Reconstructing cancer karyotypes from short read data: the half empty and half full glass. BMC Bioinformatics 2017; 18:488. [PMID: 29141589 PMCID: PMC5688766 DOI: 10.1186/s12859-017-1929-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/20/2017] [Accepted: 11/06/2017] [Indexed: 02/01/2023]
Abstract
Background: During cancer progression, genomes undergo point mutations as well as larger segmental changes. The latter include, among others, segmental deletions, duplications, translocations, and inversions. The result is a highly complex, patient-specific cancer karyotype. Using high-throughput technologies of deep sequencing and microarrays, it is possible to interrogate a cancer genome and produce chromosomal copy number profiles and a list of breakpoints (“jumps”) relative to the normal genome. This information is very detailed but local, and does not give the overall picture of the cancer genome. One of the basic challenges in cancer genome research is to use such information to infer the cancer karyotype. We present here an algorithmic approach, based on graph theory and integer linear programming, that receives segmental copy number and breakpoint data as input and produces a cancer karyotype that is most concordant with them. We used simulations to evaluate the utility of our approach, and applied it to real data.
Results: By using a simulation model, we were able to estimate the correctness and robustness of the algorithm in a spectrum of scenarios. Under our base scenario, designed according to observations in real data, the algorithm correctly inferred 69% of the karyotypes. However, when using less stringent correctness metrics that account for incomplete and noisy data, 87% of the reconstructed karyotypes were correct. Furthermore, in scenarios where the data were very clean and complete, accuracy rose to 90%–100%. Some examples of analysis of real data, and the reconstructed karyotypes suggested by our algorithm, are also presented.
Conclusion: While reconstruction of a complete, perfect karyotype based on short read data is very hard, a large fraction of the reconstruction will still be correct and can provide useful information.
Affiliation(s)
- Rami Eitan
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv-Yafo, Israel
- Ron Shamir
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv-Yafo, Israel

57
Vijayan V, Yiu SM, Zhang L. Improving somatic variant identification through integration of genome and exome data. BMC Genomics 2017. [PMID: 29513195 PMCID: PMC5657037 DOI: 10.1186/s12864-017-4134-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Affiliation(s)
- Vinaya Vijayan
  - Genetics, Bioinformatics and Computational Biology, Virginia Tech, Blacksburg, VA, USA
- Siu-Ming Yiu
  - Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong
- Liqing Zhang
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA

58
Takayama K, Akita N, Mimura N, Akahira R, Taniguchi Y, Ikeda M, Sakurai F, Ohara O, Morio T, Sekiguchi K, Mizuguchi H. Generation of safe and therapeutically effective human induced pluripotent stem cell-derived hepatocyte-like cells for regenerative medicine. Hepatol Commun 2017; 1:1058-1069. [PMID: 29404442 PMCID: PMC5721405 DOI: 10.1002/hep4.1111] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 06/28/2017] [Revised: 09/11/2017] [Accepted: 09/19/2017] [Indexed: 12/23/2022]
Abstract
Hepatocyte-like cells (HLCs) differentiated from human induced pluripotent stem (iPS) cells are expected to be applied in regenerative medicine. In this study, we attempted to generate safe and therapeutically effective human iPS-HLCs for hepatocyte transplantation. First, human iPS-HLCs were generated from a human leukocyte antigen-homozygous donor on the assumption that allogeneic transplantation might be carried out. Highly efficient hepatocyte differentiation was performed under a feeder-free condition using human recombinant laminin 111, laminin 511, and type IV collagen. The percentage of asialoglycoprotein receptor 1-positive cells was greater than 80%, while the percentage of residual undifferentiated cells was approximately 0.003%. In addition, no teratoma formation was observed even at 16 weeks after human iPS-HLC transplantation. Furthermore, harmful somatic single-nucleotide substitutions were not observed during the hepatocyte differentiation process. We also developed a cryopreservation protocol for hepatoblast-like cells that, by programming the freezing temperature, does not negatively affect their hepatocyte differentiation potential. To evaluate the therapeutic potential of human iPS-HLCs, these cells (1 × 10^6 cells/mouse) were intrasplenically transplanted into acute liver injury mice treated with 3 mL/kg CCl4 only once and chronic liver injury mice treated with 0.6 mL/kg CCl4 twice weekly for 8 weeks. With human iPS-HLC transplantation, the survival rate of the acute liver injury mice was significantly increased and the liver fibrosis level of the chronic liver injury mice was significantly decreased. Conclusion: We were able to generate safe and therapeutically effective human iPS-HLCs for hepatocyte transplantation. (Hepatology Communications 2017;1:1058–1069)
Affiliation(s)
- Kazuo Takayama
  - Laboratory of Biochemistry and Molecular Biology, Graduate School of Pharmaceutical Sciences, Osaka University, Osaka, Japan
  - PRESTO, Japan Science and Technology Agency, Saitama, Japan
  - Laboratory of Hepatocyte Regulation, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Naoki Akita
  - Laboratory of Biochemistry and Molecular Biology, Graduate School of Pharmaceutical Sciences, Osaka University, Osaka, Japan
  - Laboratory of Hepatocyte Regulation, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Natsumi Mimura
  - Laboratory of Hepatocyte Regulation, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Makoto Ikeda
  - Department of Technology Development, Kazusa DNA Research Institute, Chiba, Japan
- Fuminori Sakurai
  - Laboratory of Biochemistry and Molecular Biology, Graduate School of Pharmaceutical Sciences, Osaka University, Osaka, Japan
  - Laboratory of Regulatory Sciences for Oligonucleotide Therapeutics, Clinical Drug Development Project, Graduate School of Pharmaceutical Sciences, Osaka University, Osaka, Japan
- Osamu Ohara
  - Department of Technology Development, Kazusa DNA Research Institute, Chiba, Japan
- Tomohiro Morio
  - Department of Pediatrics and Developmental Biology, Tokyo Medical and Dental University, Tokyo, Japan
- Hiroyuki Mizuguchi
  - Laboratory of Biochemistry and Molecular Biology, Graduate School of Pharmaceutical Sciences, Osaka University, Osaka, Japan
  - Laboratory of Hepatocyte Regulation, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
  - Global Center for Medical Engineering and Informatics, Osaka University, Osaka, Japan

59
Xu C, Nezami Ranjbar MR, Wu Z, DiCarlo J, Wang Y. Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller. BMC Genomics 2017; 18:5. [PMID: 28049435 PMCID: PMC5209917 DOI: 10.1186/s12864-016-3425-4] [Citation(s) in RCA: 72] [Impact Index Per Article: 10.3] [Received: 05/27/2016] [Accepted: 12/14/2016] [Indexed: 01/20/2023]
Abstract
BACKGROUND: Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next-generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low-level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing of a small, yet highly relevant, region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models.
RESULTS: We present here a highly efficient, simple, and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5% and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions.
CONCLUSIONS: We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
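How molecular barcodes help separate true variants from amplification and sequencing errors can be illustrated with a simple majority-vote consensus. This sketch is not smCounter and does not implement its Bayesian model. It assumes reads at a single position have already been grouped by barcode, and the function names and thresholds (`min_reads`, `min_agreement`) are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_reads=2, min_agreement=0.8):
    """Collapse reads at one position into one base call per barcode.

    `reads` is an iterable of (barcode, base) pairs. A barcode yields a
    consensus base only if it has at least `min_reads` reads and the most
    common base accounts for at least `min_agreement` of them.
    """
    bases_by_barcode = defaultdict(list)
    for barcode, base in reads:
        bases_by_barcode[barcode].append(base)

    consensus = {}
    for barcode, bases in bases_by_barcode.items():
        if len(bases) < min_reads:
            continue  # too few reads to form a consensus
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[barcode] = base
    return consensus


def barcode_allele_fraction(consensus, alt_base):
    """Fraction of consensus molecules that carry `alt_base`."""
    if not consensus:
        return 0.0
    supporting = sum(1 for base in consensus.values() if base == alt_base)
    return supporting / len(consensus)


reads = (
    [("AAC", "A")] * 5 + [("AAC", "T")]    # one discordant read, outvoted
    + [("GGT", "T")] * 3                   # consistent variant molecule
    + [("CCA", "T")]                       # single read, dropped
    + [("TTG", "A"), ("TTG", "T")]         # no clear majority, dropped
)
calls = consensus_by_barcode(reads)
print(calls)
print(barcode_allele_fraction(calls, "T"))
```

Counting molecules rather than reads means an error introduced in a single read is outvoted within its barcode family instead of contributing to the allele fraction.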
Affiliation(s)
- Chang Xu
  - Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland, 21703, USA
- Mohammad R Nezami Ranjbar
  - Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland, 21703, USA
- Zhong Wu
  - Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland, 21703, USA
- John DiCarlo
  - Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland, 21703, USA
- Yexun Wang
  - Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland, 21703, USA

60
Fan Y, Xi L, Hughes DST, Zhang J, Zhang J, Futreal PA, Wheeler DA, Wang W. MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data. Genome Biol 2016; 17:178. [PMID: 27557938 PMCID: PMC4995747 DOI: 10.1186/s13059-016-1029-6] [Citation(s) in RCA: 161] [Impact Index Per Article: 20.1] [Received: 02/18/2016] [Accepted: 07/18/2016] [Indexed: 12/26/2022]
Abstract
Subclonal mutations reveal important features of the genetic architecture of tumors. However, accurate detection of mutations in genetically heterogeneous tumor cell populations using next-generation sequencing remains challenging. We develop MuSE (http://bioinformatics.mdanderson.org/main/MuSE), Mutation calling using a Markov Substitution model for Evolution, a novel approach for modeling the evolution of the allelic composition of the tumor and normal tissue at each reference base. MuSE adopts a sample-specific error model that reflects the underlying tumor heterogeneity to greatly improve the overall accuracy. We demonstrate the accuracy of MuSE in calling subclonal mutations in the context of large-scale tumor sequencing projects using whole exome and whole genome sequencing.
Affiliation(s)
- Yu Fan
  - Department of Bioinformatics and Computational Biology - Unit 1410, The University of Texas MD Anderson Cancer Center, P. O. Box 301402, Houston, 77230-1402, TX, USA
- Liu Xi
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Alkek N1419, Houston, 77030-3411, TX, USA
- Daniel S T Hughes
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Alkek N1419, Houston, 77030-3411, TX, USA
- Jianjun Zhang
  - Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, 77030, TX, USA
- Jianhua Zhang
  - Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, 77030, TX, USA
- P Andrew Futreal
  - Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, 77030, TX, USA
- David A Wheeler
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Alkek N1419, Houston, 77030-3411, TX, USA
- Wenyi Wang
  - Department of Bioinformatics and Computational Biology - Unit 1410, The University of Texas MD Anderson Cancer Center, P. O. Box 301402, Houston, 77230-1402, TX, USA

61
Lai Z, Markovets A, Ahdesmaki M, Chapman B, Hofmann O, McEwen R, Johnson J, Dougherty B, Barrett JC, Dry JR. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res 2016; 44:e108. [PMID: 27060149 PMCID: PMC4914105 DOI: 10.1093/nar/gkw227] [Citation(s) in RCA: 536] [Impact Index Per Article: 67.0] [Received: 12/05/2015] [Accepted: 03/22/2016] [Indexed: 12/22/2022]
Abstract
Accurate variant calling in next-generation sequencing (NGS) is critical to understanding cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNVs, MNVs, InDels, and complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly with sequencing depth, enabling the ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon-aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research.
Affiliation(s)
- Zhongwu Lai
  - Oncology iMed, AstraZeneca, Waltham, MA 02451, USA
- Brad Chapman
  - Bioinformatics Core, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Oliver Hofmann
  - Bioinformatics Core, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
  - Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Bearsden Glasgow, G61 1QH, UK