1
Maruzani R, Brierley L, Jorgensen A, Fowler A. Benchmarking UMI-aware and standard variant callers for low frequency ctDNA variant detection. BMC Genomics 2024; 25:827. [PMID: 39227777 PMCID: PMC11370058 DOI: 10.1186/s12864-024-10737-w] [Received: 11/14/2023] [Accepted: 08/22/2024] [Indexed: 09/05/2024]
Abstract
BACKGROUND Circulating tumour DNA (ctDNA) is a subset of cell-free DNA (cfDNA) released by tumour cells into the bloodstream. Circulating tumour DNA has shown great potential as a biomarker to inform treatment in cancer patients. Collecting ctDNA is minimally invasive and reflects the entire genetic makeup of a patient's cancer. Because of their low abundance, particularly in the early stages of cancer, ctDNA variants in NGS data can be difficult to distinguish from sequencing and PCR artefacts. Unique Molecular Identifiers (UMIs) are short sequences ligated to the sequencing library before amplification. These sequences are useful for filtering out low frequency artefacts. The utility of ctDNA as a cancer biomarker depends on accurate detection of cancer variants. RESULTS In this study, we benchmarked six variant calling tools, including two UMI-aware callers, for their ability to call ctDNA variants. The standard variant callers tested were Mutect2, bcftools, LoFreq and FreeBayes. The UMI-aware variant callers benchmarked were UMI-VarCal and UMIErrorCorrect. We used datasets with known variants spiked in at low frequencies as well as datasets containing ctDNA, and generated synthetic UMI sequences for these datasets. Variant callers displayed different preferences for sensitivity and specificity. Mutect2 showed high sensitivity, while returning more privately called variants than any other caller in data without synthetic UMIs (an indicator of false positive variant discovery). In data encoded with synthetic UMIs, UMI-VarCal detected fewer putative false positive variants than all other callers in synthetic datasets. Mutect2 showed a balance between high sensitivity and specificity in data encoded with synthetic UMIs. CONCLUSIONS Our results indicate UMI-aware variant callers have potential to improve sensitivity and specificity in calling low frequency ctDNA variants over standard variant calling tools.
There is a growing need for further development of UMI-aware variant calling tools if effective early detection methods for cancer using ctDNA samples are to be realised.
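The abstract treats privately called variants, those reported by only one caller, as an indicator of false positive discovery. The Python sketch below counts private calls by tallying how many callers report each variant. It is a minimal example, not the study's pipeline: the variant key format and the example call sets are assumptions made for illustration.

```python
from collections import Counter

# A variant is keyed by (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def private_calls(calls: dict[str, set[Variant]]) -> dict[str, set[Variant]]:
    """Return, for each caller, the variants that no other caller reported."""
    # Count how many callers report each variant.
    support = Counter(v for variants in calls.values() for v in variants)
    return {
        caller: {v for v in variants if support[v] == 1}
        for caller, variants in calls.items()
    }


# Hypothetical call sets for illustration only (not data from the study).
calls = {
    "Mutect2": {("chr7", 100, "A", "T"), ("chr7", 200, "G", "C"), ("chr12", 50, "C", "T")},
    "LoFreq": {("chr7", 100, "A", "T")},
    "UMI-VarCal": {("chr7", 100, "A", "T"), ("chr7", 200, "G", "C")},
}
private = private_calls(calls)
print(private["Mutect2"])  # {('chr12', 50, 'C', 'T')}
```

A private call is not necessarily a false positive. A more sensitive caller can also be the only one to find a true low-frequency variant, which is why the count serves as an indicator rather than a verdict.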
Affiliation(s)
- Rugare Maruzani
- Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building, Block F, Brownlow Street, Liverpool, L69 3GF, UK
- Liam Brierley
- Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building, Block F, Brownlow Street, Liverpool, L69 3GF, UK
- MRC-University of Glasgow Centre for Virus Research, University of Glasgow, Garscube Campus, 464 Bearsden Road, Glasgow, G61 1QH, UK
- Andrea Jorgensen
- Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building, Block F, Brownlow Street, Liverpool, L69 3GF, UK
- Anna Fowler
- Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building, Block F, Brownlow Street, Liverpool, L69 3GF, UK
2
Andersson D, Kebede FT, Escobar M, Österlund T, Ståhlberg A. Principles of digital sequencing using unique molecular identifiers. Mol Aspects Med 2024; 96:101253. [PMID: 38367531 DOI: 10.1016/j.mam.2024.101253] [Received: 11/16/2023] [Revised: 01/26/2024] [Accepted: 02/03/2024] [Indexed: 02/19/2024]
Abstract
Massively parallel sequencing technologies have long been used in both basic research and clinical routine. The recent introduction of digital sequencing has made previously challenging applications possible by significantly improving sensitivity and specificity to now allow detection of rare sequence variants, even at single molecule level. Digital sequencing utilizes unique molecular identifiers (UMIs) to minimize sequencing-induced errors and quantification biases. Here, we discuss the principles of UMIs and how they are used in digital sequencing. We outline the properties of different UMI types and the consequences of various UMI approaches in relation to experimental protocols and bioinformatics. Finally, we describe how digital sequencing can be applied in specific research fields, focusing on cancer management where it can be used in screening of asymptomatic individuals, diagnosis, treatment prediction, prognostication, monitoring treatment efficacy and early detection of treatment resistance as well as relapse.
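The core UMI principle can be illustrated with a minimal Python sketch of single-strand consensus building. Reads that carry the same UMI are assumed to be copies of one original molecule. An error present in only some copies can therefore be outvoted, while a true variant present in every copy survives. The function name and the thresholds (`min_family_size`, `min_agreement`) are hypothetical choices, not the defaults of any particular tool. Real pipelines also handle alignment, UMI errors and base qualities.

```python
from collections import Counter, defaultdict


def consensus_by_umi(reads, min_family_size=3, min_agreement=0.7):
    """Collapse reads that share a UMI into one consensus sequence per UMI.

    `reads` is an iterable of (umi, sequence) pairs. For simplicity the
    sequences are assumed to be pre-aligned and of equal length.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to tell errors from signal
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Mask positions where no base reaches the agreement threshold.
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[umi] = "".join(bases)
    return consensus


reads = [
    ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"),  # one PCR/sequencing error
    ("CCA", "TCGT"), ("CCA", "TCGT"), ("CCA", "TCGT"),                    # variant in every copy
    ("GGC", "ACGA"),                                                      # singleton family
]
print(consensus_by_umi(reads))  # {'AAT': 'ACGT', 'CCA': 'TCGT'}
```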
Affiliation(s)
- Daniel Andersson
- Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 413 90, Gothenburg, Sweden
- Firaol Tamiru Kebede
- Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 413 90, Gothenburg, Sweden
- Mandy Escobar
- Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 413 90, Gothenburg, Sweden
- Tobias Österlund
- Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 413 90, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 413 90, Gothenburg, Sweden; Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, 413 45, Gothenburg, Sweden
- Anders Ståhlberg
- Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 413 90, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 413 90, Gothenburg, Sweden; Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, 413 45, Gothenburg, Sweden
3
Xiang X, Lu B, Song D, Li J, Shu K, Pu D. Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data. Sci Rep 2023; 13:20444. [PMID: 37993475 PMCID: PMC10665316 DOI: 10.1038/s41598-023-47135-3] [Received: 08/11/2023] [Accepted: 11/09/2023] [Indexed: 11/24/2023]
Abstract
Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches due to the high error rates of NGS. To accurately distinguish low-level true variants from these errors, many statistical variant calling tools for calling low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal) for their capability to call single nucleotide variants (SNVs) with allelic frequency as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers regarding detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of the sequencing depth, MAGERI showed the fastest analysis, while smCounter2 consistently took the longest to finish the variant calling process. Overall, DeepSNVMiner and UMI-VarCal performed the best, with sensitivity and precision of 88% and 100% for DeepSNVMiner and 84% and 100% for UMI-VarCal. In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of sensitivity and precision. We recommend using DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information regarding future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications.
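Sensitivity and precision figures like those above are computed against a set of known variants. The Python sketch below shows the calculation. It assumes variants are matched by exact key, and the example sets are made up for illustration.

```python
def sensitivity_precision(called: set, truth: set) -> tuple[float, float]:
    """Compare a caller's variant set with a truth set of known variants."""
    true_positives = len(called & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0  # TP / (TP + FN)
    precision = true_positives / len(called) if called else 0.0  # TP / (TP + FP)
    return sensitivity, precision


# Hypothetical example: 8 spiked-in SNVs, of which the caller recovers 7.
truth = {("chr1", pos) for pos in range(100, 900, 100)}
called = {("chr1", pos) for pos in range(100, 800, 100)}
print(sensitivity_precision(called, truth))  # (0.875, 1.0)
```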
Affiliation(s)
- Xudong Xiang
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Bowen Lu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Dongyang Song
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Jie Li
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Kunxian Shu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Dan Pu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
4
Levin I, Štrajbl M, Fastman Y, Baran D, Twito S, Mioduser J, Keren A, Fischman S, Zhenin M, Nimrod G, Levitin N, Mayor MB, Gadrich M, Ofran Y. Accurate profiling of full-length Fv in highly homologous antibody libraries using UMI tagged short reads. Nucleic Acids Res 2023; 51:e61. [PMID: 37014016 PMCID: PMC10287906 DOI: 10.1093/nar/gkad235] [Received: 02/21/2022] [Revised: 03/14/2023] [Accepted: 03/29/2023] [Indexed: 04/05/2023]
Abstract
Deep parallel sequencing (NGS) is a viable tool for monitoring scFv and Fab library dynamics in many antibody engineering high-throughput screening efforts. Although very useful, the commonly used Illumina NGS platform cannot cover the entire sequence of an scFv or Fab in a single read. Studies therefore usually focus on specific CDRs or resort to sequencing the VH and VL variable domains separately, which limits the platform's utility for comprehensive monitoring of selection dynamics. Here we present a simple and robust method for deep sequencing repertoires of full-length scFv, Fab and Fv antibody sequences. This process uses standard molecular procedures and unique molecular identifiers (UMIs) to pair separately sequenced VH and VL. We show that UMI-assisted VH-VL matching allows comprehensive and highly accurate mapping of full-length Fv clonal dynamics in large, highly homologous antibody libraries, as well as identification of rare variants. In addition to its utility in synthetic antibody discovery processes, our method can be instrumental in generating large datasets for machine learning (ML) applications. In the field of antibody engineering, such applications have been hampered by a conspicuous paucity of large-scale full-length Fv data.
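The VH-VL pairing step can be pictured as a join on the UMI. The Python sketch below is a minimal illustration, not the authors' implementation. It assumes each UMI has already been reduced to one consensus VH or VL sequence, and the identifiers are hypothetical placeholders. Counting the joined pairs gives per-clone frequencies of the kind used to follow clonal dynamics.

```python
from collections import Counter


def pair_by_umi(vh_by_umi: dict[str, str], vl_by_umi: dict[str, str]) -> Counter:
    """Count (VH, VL) pairs linked by a shared UMI.

    Each dict maps a UMI to the consensus VH or VL sequence read from it.
    UMIs seen in only one of the two chains cannot be paired and are skipped.
    """
    shared = vh_by_umi.keys() & vl_by_umi.keys()
    return Counter((vh_by_umi[umi], vl_by_umi[umi]) for umi in shared)


vh = {"U1": "VH1", "U2": "VH1", "U3": "VH2", "U4": "VH3"}
vl = {"U1": "VL1", "U2": "VL1", "U3": "VL2", "U5": "VL9"}
print(pair_by_umi(vh, vl))  # Counter({('VH1', 'VL1'): 2, ('VH2', 'VL2'): 1})
```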
Affiliation(s)
- Adi Keren
- Biolojic Design, Ltd, Rehovot, Israel
- Yanay Ofran
- Biolojic Design, Ltd, Rehovot, Israel
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel
5
Peng X, Dorman KS. Accurate estimation of molecular counts from amplicon sequence data with unique molecular identifiers. Bioinformatics 2023; 39:6971842. [PMID: 36610988 PMCID: PMC9891248 DOI: 10.1093/bioinformatics/btad002] [Received: 06/12/2022] [Revised: 11/16/2022] [Accepted: 01/04/2023] [Indexed: 01/09/2023]
Abstract
MOTIVATION Amplicon sequencing is widely applied to explore heterogeneity and rare variants in genetic populations. Resolving true biological variants and quantifying their abundance is crucial for downstream analyses, but measured abundances are distorted by stochasticity and bias in amplification, plus errors during polymerase chain reaction (PCR) and sequencing. One solution attaches unique molecular identifiers (UMIs) to sample sequences before amplification. Counting UMIs instead of sequences provides unbiased estimates of abundance. While modern methods improve over naïve counting by UMI identity, most do not account for UMI reuse or collision, and they do not adequately model PCR and sequencing errors in the UMIs and sample sequences. RESULTS We introduce Deduplication and Abundance estimation with UMIs (DAUMI), a probabilistic framework to detect true biological amplicon sequences and accurately estimate their deduplicated abundance. DAUMI recognizes UMI collision, even on highly similar sequences, and detects and corrects most PCR and sequencing errors in the UMI and sampled sequences. DAUMI performs better on simulated and real data compared to other UMI-aware clustering methods. AVAILABILITY AND IMPLEMENTATION Source code is available at https://github.com/DormanLab/AmpliCI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
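UMI collision can be illustrated with a simple occupancy calculation. Suppose n molecules each receive one of M = 4^L random UMIs of length L. The expected number of distinct UMIs is then M(1 - (1 - 1/M)^n), which falls increasingly below n as n approaches M. The Python sketch below assumes uniform UMI usage and error-free UMIs. It is a textbook correction shown for illustration, not the DAUMI model.

```python
import math


def expected_distinct_umis(n_molecules: int, umi_length: int) -> float:
    """Expected number of distinct UMIs when each molecule draws one of
    4**umi_length barcodes uniformly at random (a simplifying assumption)."""
    m = 4 ** umi_length
    return m * (1 - (1 - 1 / m) ** n_molecules)


def estimate_molecules(distinct_umis: float, umi_length: int) -> float:
    """Invert the formula above to correct an observed distinct-UMI count."""
    m = 4 ** umi_length
    if distinct_umis >= m:
        raise ValueError("UMI space saturated; molecule count cannot be estimated")
    return math.log(1 - distinct_umis / m) / math.log(1 - 1 / m)


observed = expected_distinct_umis(1000, 6)
print(round(observed))                    # 887
print(estimate_molecules(observed, 6))    # close to 1000 (recovers the input)
```

With 6-nt UMIs, about 113 of 1000 molecules are expected to share a UMI with another molecule. Naive counting by UMI identity would miss them.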
Affiliation(s)
- Xiyu Peng
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
6
Österlund T, Filges S, Johansson G, Ståhlberg A. UMIErrorCorrect and UMIAnalyzer: Software for Consensus Read Generation, Error Correction, and Visualization Using Unique Molecular Identifiers. Clin Chem 2022; 68:1425-1435. [PMID: 36031761 DOI: 10.1093/clinchem/hvac136] [Received: 05/03/2022] [Accepted: 07/08/2022] [Indexed: 11/14/2022]
Abstract
BACKGROUND Targeted sequencing using unique molecular identifiers (UMIs) enables detection of rare variant alleles in challenging applications, such as cell-free DNA analysis from liquid biopsies. Standard bioinformatics pipelines for data processing and variant calling are not adapted for deep-sequencing data containing UMIs, are inflexible, and require multistep workflows or dedicated computing resources. METHODS We developed a bioinformatics pipeline using Python and an R package for data analysis and visualization. To validate our pipeline, we analyzed cell-free DNA reference material with known mutant allele frequencies (0%, 0.125%, 0.25%, and 1%) and public data sets. RESULTS We developed UMIErrorCorrect, a bioinformatics pipeline for analyzing sequencing data containing UMIs. UMIErrorCorrect only requires fastq files as inputs and performs alignment, UMI clustering, error correction, and variant calling. We also provide UMIAnalyzer, a graphical user interface, for data mining, visualization, variant interpretation, and report generation. UMIAnalyzer allows the user to adjust analysis parameters and study their effect on variant calling. We demonstrated the flexibility of UMIErrorCorrect by analyzing data from 4 different targeted sequencing protocols. We also show its ability to detect different mutant allele frequencies in standardized cell-free DNA reference material. UMIErrorCorrect outperformed existing pipelines for targeted UMI sequencing data in terms of variant detection sensitivity. CONCLUSIONS UMIErrorCorrect and UMIAnalyzer are comprehensive and customizable bioinformatics tools that can be applied to any type of library preparation protocol and enrichment chemistry using UMIs. Access to simple, generic, and open-source bioinformatics tools will facilitate the implementation of UMI-based sequencing approaches in basic research and clinical applications.
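UMI clustering matters because errors inside the UMI itself would otherwise split one molecule into several families. The Python sketch below is a minimal greedy version. It assumes equal-length UMIs and uses a merge rule similar to the directional method described for UMI-tools. UMIErrorCorrect's own clustering algorithm and parameters may differ.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def cluster_umis(umi_counts: Counter) -> dict[str, str]:
    """Map every UMI to a representative UMI (its cluster head)."""
    heads: list[str] = []
    assignment: dict[str, str] = {}
    # Visit UMIs from most to least abundant so likely-true UMIs become heads first.
    for umi, count in umi_counts.most_common():
        for head in heads:
            # Treat a rarer UMI one mismatch away from a much more abundant head
            # as an error copy of that head (count-ratio rule as in UMI-tools' directional method).
            if hamming(umi, head) == 1 and umi_counts[head] >= 2 * count - 1:
                assignment[umi] = head
                break
        else:
            heads.append(umi)
            assignment[umi] = umi
    return assignment


counts = Counter({"ACGT": 50, "ACGA": 3, "TTTT": 20, "TTTA": 18})
print(cluster_umis(counts))
# {'ACGT': 'ACGT', 'TTTT': 'TTTT', 'TTTA': 'TTTA', 'ACGA': 'ACGT'}
```

ACGA is absorbed into ACGT as a likely UMI error. TTTA stays separate because its abundance is close to that of TTTT, which suggests a distinct molecule.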
Affiliation(s)
- Tobias Österlund
- Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, Region Västra Götaland, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden; Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Stefan Filges
- Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Gustav Johansson
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden; Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden; SiMSen Diagnostics AB, Gothenburg, Sweden
- Anders Ståhlberg
- Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, Region Västra Götaland, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden; Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
7
Coto-Llerena M, Benjak A, Gallon J, Meier MA, Boldanova T, Terracciano LM, Ng CKY, Piscuoglio S. Circulating Cell-Free DNA Captures the Intratumor Heterogeneity in Multinodular Hepatocellular Carcinoma. JCO Precis Oncol 2022; 6:e2100335. [PMID: 35263170 PMCID: PMC8926063 DOI: 10.1200/po.21.00335] [Indexed: 12/15/2022]
Abstract
Hepatocellular carcinoma (HCC) is a highly heterogeneous disease, with more than 40% of patients initially diagnosed with multinodular HCCs. Although circulating cell-free DNA (cfDNA) has been shown to effectively detect somatic mutations, little is known about its utility to capture intratumor heterogeneity in patients with multinodular HCC undergoing systemic treatment.
Affiliation(s)
- Mairene Coto-Llerena
- Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland; Visceral Surgery and Precision Medicine Research Laboratory, Department of Biomedicine, University of Basel, Basel, Switzerland
- Andrej Benjak
- Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland
- John Gallon
- Visceral Surgery and Precision Medicine Research Laboratory, Department of Biomedicine, University of Basel, Basel, Switzerland
- Marie-Anne Meier
- Hepatology Laboratory, Department of Biomedicine, University of Basel, Basel, Switzerland; Division of Gastroenterology and Hepatology, University Hospital Basel, Basel, Switzerland
- Tuyana Boldanova
- Hepatology Laboratory, Department of Biomedicine, University of Basel, Basel, Switzerland; Division of Gastroenterology and Hepatology, University Hospital Basel, Basel, Switzerland
- Luigi M Terracciano
- Department of Anatomic Pathology, IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy; Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
- Charlotte K Y Ng
- Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Salvatore Piscuoglio
- Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland; Visceral Surgery and Precision Medicine Research Laboratory, Department of Biomedicine, University of Basel, Basel, Switzerland
8
Biezuner T, Brilon Y, Arye AB, Oron B, Kadam A, Danin A, Furer N, Minden MD, Hwan Kim DD, Shapira S, Arber N, Dick J, Thavendiranathan P, Moskovitz Y, Kaushansky N, Chapal-Ilani N, Shlush LI. An improved molecular inversion probe based targeted sequencing approach for low variant allele frequency. NAR Genom Bioinform 2022; 4:lqab125. [PMID: 35156021 PMCID: PMC8826764 DOI: 10.1093/nargab/lqab125] [Received: 10/03/2021] [Revised: 11/25/2021] [Accepted: 01/25/2022] [Indexed: 11/23/2022]
Abstract
Deep targeted sequencing technologies are still not widely used in clinical practice due to the complexity of the methods and their cost. The Molecular Inversion Probes (MIP) technology is cost-effective and scalable in the number of targets; however, it suffers from low overall performance, especially in GC-rich regions. To improve MIP performance, we sequenced a large cohort of healthy individuals (n = 4417), with a panel of 616 MIPs, at high depth in duplicates. To improve the previous state-of-the-art statistical model for low variant allele frequency, we selected 4635 potentially positive variants and validated them using amplicon sequencing. Using machine learning prediction tools, we significantly improved precision of 10–56.25% (P < 0.0004) for detecting variants with VAF > 0.005. We further developed a biochemically modified MIP protocol and improved its turnaround time to ∼4 h. Our new biochemistry significantly improved uniformity and coverage of GC-rich regions, and enabled 95% on-target reads in a large MIP panel of 8349 genomic targets. Overall, we demonstrate an enhancement of the MIP targeted sequencing approach both in detection of low-frequency variants and in other key parameters, paving the way for it to become an ultrafast, cost-effective research and clinical diagnostic tool.
Affiliation(s)
- Tamir Biezuner
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Yardena Brilon
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Asaf Ben Arye
- Department of Statistics and Operations Research, Tel Aviv University, Ramat Aviv, Israel
- Barak Oron
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Aditee Kadam
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Adi Danin
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Nili Furer
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Mark D Minden
- Princess Margaret Cancer Centre, University Health Network (UHN), Department of Medical Oncology & Hematology, Toronto, ON, Canada
- Dennis Dong Hwan Kim
- Princess Margaret Cancer Centre, University Health Network (UHN), Department of Medical Oncology & Hematology, Toronto, ON, Canada
- John Dick
- Princess Margaret Cancer Centre, University Health Network (UHN), Department of Molecular Genetics, Toronto, ON, Canada
- Paaladinesh Thavendiranathan
- Department of Medicine, Division of Cardiology, Ted Rogers Program in Cardiotoxicity Prevention, Peter Munk Cardiac Center, Toronto General Hospital, University Health Network, University of Toronto, Toronto, ON, Canada
- Yoni Moskovitz
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Nathali Kaushansky
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Noa Chapal-Ilani
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Liran I Shlush
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
9
A Retrospective Statistical Validation Approach for Panel of Normal-Based Single-Nucleotide Variant Detection in Tumor Sequencing. J Mol Diagn 2022; 24:41-47. [PMID: 34974877 DOI: 10.1016/j.jmoldx.2021.09.010] [Received: 03/21/2021] [Revised: 08/28/2021] [Accepted: 09/28/2021] [Indexed: 11/22/2022]
Abstract
An important step of somatic variant calling algorithms for deep sequencing data is quantifying the errors. For targeted sequencing in which hotspot mutations are of interest, site-specific error estimation allows more accurate calling. The site-specific error rates are often estimated from a panel of normal samples, which has limited size and is subject to sampling bias and variance. We propose a novel statistical validation method for single-nucleotide variation (SNV) calling based on historical data. The validation method extracts the high-quality reads from the Binary Alignment/Map (BAM) files, finds the negative samples in the data, and builds a statistical model to call individual samples. It is particularly useful in detecting low-frequency variants that may be missed by traditional panel of normal-based SNV methods. The proposed method makes it possible to launch a simple and parallel validation pipeline for SNV calling and improve the detection limit.
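The idea of calibrating a site-specific error rate from negative samples and then testing each sample against it can be sketched in Python. This is a deliberately simple pooled binomial model with a hypothetical pseudocount, not the statistical model proposed in the paper. The function names and example counts are illustrative assumptions.

```python
import math


def site_error_rate(normal_alt: list[int], normal_depth: list[int]) -> float:
    """Pooled alt-read rate at one site across normal samples.

    A +1/+2 pseudocount keeps the rate strictly between 0 and 1 (an assumption).
    """
    return (sum(normal_alt) + 1) / (sum(normal_depth) + 2)


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_terms = (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    )
    return sum(math.exp(t) for t in log_terms)


# Hypothetical site: four normals with sporadic error reads, then two test samples.
error = site_error_rate([2, 1, 3, 0], [5000, 5000, 5000, 5000])
print(binomial_upper_tail(12, 5000, error))  # small: 12 alt reads exceed background
print(binomial_upper_tail(2, 5000, error))   # large: consistent with noise
```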
10
Zhao X, Hu AC, Wang S, Wang X. Calling small variants using universality with Bayes-factor-adjusted odds ratios. Brief Bioinform 2021; 23:6427501. [PMID: 34791010 DOI: 10.1093/bib/bbab458] [Received: 07/16/2021] [Revised: 09/26/2021] [Accepted: 10/07/2021] [Indexed: 11/12/2022]
Abstract
The application of next-generation sequencing in research and particularly in clinical routine requires highly accurate variant calling. Here we describe UVC, a method for calling small variants of germline or somatic origin. By unifying opposite assumptions with sublation, we discovered the following two empirical laws to improve variant calling: allele fraction at high sequencing depth is inversely proportional to the cubic root of variant-calling error rate, and odds ratios adjusted with Bayes factors can model various sequencing biases. UVC outperformed other variant callers on the GIAB germline truth sets, 192 scenarios of in silico mixtures simulating 192 combinations of tumor/normal sequencing depths and tumor/normal purities, the GIAB somatic truth sets derived from physical mixture, and the SEQC2 somatic reference sets derived from the breast-cancer cell-line HCC1395. UVC achieved 100% concordance with the manual review conducted by multiple independent researchers on a Qiagen 71-gene-panel dataset derived from 16 patients with colon adenoma. UVC outperformed other unique molecular identifier (UMI)-aware variant callers on the datasets used for publishing these variant callers. Performance was measured as the sensitivity-specificity trade-off for called variants. The improved variant calls generated by UVC from previously published UMI-based sequencing data provided additional insight about DNA damage repair. UVC is open-sourced under the BSD 3-Clause license at https://github.com/genetronhealth/uvc and quay.io/genetronhealth/gcc-6-3-0-uvc-0-6-0-441a694.
Affiliation(s)
- Xiaofei Zhao
- Genetron Health (Beijing) Co. Ltd, Beijing 102208, China
- Allison C Hu
- Genetron Health (Beijing) Co. Ltd, Beijing 102208, China
- Sizhen Wang
- Genetron Health (Beijing) Co. Ltd, Beijing 102208, China
- Xiaoyue Wang
- State Key Laboratory of Medical Molecular Biology, Center for Bioinformatics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing 100005, China
11
De Luca G, Dono M. The Opportunities and Challenges of Molecular Tagging Next-Generation Sequencing in Liquid Biopsy. Mol Diagn Ther 2021; 25:537-547. [PMID: 34224097 DOI: 10.1007/s40291-021-00542-6] [Accepted: 06/20/2021] [Indexed: 10/20/2022]
Abstract
Liquid biopsy (LB) is a promising tool that is rapidly evolving as a standard of care in early and advanced stages of cancer settings. Next-generation sequencing (NGS) methods have become essential in molecular diagnostics and clinical laboratories dealing with LB analytes, i.e., cell-free DNA and RNA. The sensitivity and high-throughput capacity of NGS enable us to overcome technical issues that are mainly attributable to low-abundance (below 1% mutated allelic frequency) tumour genetic material circulating within biological fluids. In this context, the introduction of unique molecular identifiers (UMIs), also known as molecular barcodes, applied to various NGS platforms greatly improved the characterization of rare genetic alterations, as they resulted in a drastic reduction in background noise while maintaining high levels of positive predictive value and sensitivity. Different UMI strategies have been developed, such as single (e.g., safe-sequencing system, Safe-SeqS) or double (duplex-sequencing system, Duplex-Seq) strand-based labelling, and, currently, considerable results corroborate their potential implementation in a routine laboratory. Recently, the US Food and Drug Administration approved the clinical use of two comprehensive UMI-based NGS assays (FoundationOne Liquid CDx and Guardant360 CDx) in cfDNA mutational assessment. However, to definitively translate LB into clinical practice, UMI-based NGS protocols should meet certain feasibility requirements in terms of cost-effectiveness, wet laboratory performance and easy access to web-source and bioinformatic tools for downstream molecular data.
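The difference between single- and double-strand labelling can be shown with a minimal Python sketch of the duplex step. A base is kept only when the independent consensus sequences from the two complementary strands of the same molecule agree. Damage or an early PCR error on one strand is not normally mirrored on the other, so disagreement marks a likely artefact. The function is hypothetical and assumes both strand consensus sequences have already been built and oriented identically.

```python
def duplex_consensus(top: str, bottom: str) -> str:
    """Combine the single-strand consensus sequences of one molecule's two strands.

    `bottom` is assumed to be already reverse-complemented into the orientation
    of `top`. Positions where the strands disagree are masked as 'N'.
    """
    if len(top) != len(bottom):
        raise ValueError("strand consensus sequences must have equal length")
    return "".join(a if a == b else "N" for a, b in zip(top, bottom))


print(duplex_consensus("ACGTAC", "ACGAAC"))  # ACGNAC
```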
Affiliation(s)
- Giuseppa De Luca
- Molecular Diagnostic Unit, IRCCS Ospedale Policlinico San Martino, 16132, Genova, Italy
- Mariella Dono
- Molecular Diagnostic Unit, IRCCS Ospedale Policlinico San Martino, 16132, Genova, Italy
12
Estimation of tuna population by the improved analytical pipeline of unique molecular identifier-assisted HaCeD-Seq (haplotype count from eDNA). Sci Rep 2021; 11:7031. [PMID: 33846364 PMCID: PMC8041778 DOI: 10.1038/s41598-021-86190-6] [Received: 11/04/2020] [Accepted: 03/12/2021] [Indexed: 02/01/2023]
Abstract
Many studies have investigated the ability to identify species from environmental DNA (eDNA). However, even when individual species are identified, accurately estimating their abundance with traditional eDNA analyses has still been difficult. We previously developed a novel analytical method called HaCeD-Seq (Haplotype Count from eDNA), which focuses on the mitochondrial D-loop sequence. The D-loop is a rapidly evolving sequence and has been used to estimate the abundance of eel species in breeding water. In the current study, we have further improved this method by applying unique molecular identifier (UMI) tags, which eliminate PCR and sequencing errors and extend the detection range by an order of magnitude. Based on this improved HaCeD-Seq pipeline, we computed the abundance of Pacific bluefin tuna (Thunnus orientalis) in aquarium tanks at the Tokyo Sea Life Park (Kasai, Tokyo, Japan). This tuna species is commercially important but is at high risk of resource depletion. With the developed UMI tag method, 90 out of 96 haplotypes (94%) were successfully detected from Pacific bluefin tuna eDNA. By contrast, only 29 out of 96 haplotypes (30%) were detected when UMI tags were not used. Our findings indicate the potential for conducting non-invasive fish stock surveys by sampling eDNA.
13
Abstract
Applying next-generation sequencing to liquid biopsy, in particular urine, requires specific bioinformatics methods to deal with its peculiarities. Many aspects of cancer can be characterized starting from nucleic acids, especially cell-free DNA and circulating tumor DNA. It is possible to detect small mutations, such as single nucleotide variants and small insertions and deletions, as well as copy-number alterations and epigenetic profiles. Because circulating tumor DNA makes up only a small fraction of total cell-free DNA, several methods have been developed to improve detection. One of them is attaching a unique barcode to each DNA fragment in order to lower the limit of detection of cancer-related variants. Some bioinformatics workflows and tools are the same as those used in a classic analysis of tumor tissue, but some steps require specific algorithms.
14
Sater V, Viailly PJ, Lecroq T, Prieur-Gaston É, Bohers É, Viennot M, Ruminy P, Dauchel H, Vera P, Jardin F. UMI-VarCal: a new UMI-based variant caller that efficiently improves low-frequency variant detection in paired-end sequencing NGS libraries. Bioinformatics 2020; 36:2718-2724. [PMID: 31985795 DOI: 10.1093/bioinformatics/btaa053] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/19/2019] [Revised: 11/18/2019] [Accepted: 01/20/2020] [Indexed: 01/03/2023] Open
Abstract
MOTIVATION Next-generation sequencing has become the standard method for detecting single-nucleotide variants in tumor cells. These technologies require a PCR amplification step and a sequencing step, both of which introduce artifacts at very low frequencies. These artifacts are often confused with true low-frequency variants found in tumor cells and cell-free DNA. The recent use of unique molecular identifiers (UMI) in targeted sequencing protocols has offered a trustworthy approach to filter out artefactual variants and accurately call low-frequency variants. However, integrating UMI analysis into the variant calling process has led to tools that are significantly slower and more memory-intensive than raw-read-based variant callers. RESULTS We present UMI-VarCal, a UMI-based variant caller for targeted sequencing data with better sensitivity than other variant callers. Developed with performance in mind, UMI-VarCal is one of the few variant callers that do not rely on SAMtools for pileup. Instead, at its core runs a custom pileup algorithm specifically designed to handle the UMI tags in the reads. After the pileup, a Poisson statistical test is applied at every position to determine whether the frequency of the variant is significantly higher than the background error noise. Finally, the UMI tags are analysed, and strand-bias and homopolymer-length filters are applied to improve accuracy. We illustrate the results obtained with UMI-VarCal by sequencing tumor samples and show that UMI-VarCal is both faster and more sensitive than other publicly available solutions. AVAILABILITY AND IMPLEMENTATION The entire pipeline is available at https://gitlab.com/vincent-sater/umi-varcal-master under MIT license. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
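The per-position Poisson test can be illustrated with a small Python sketch. It assumes a known per-base background error rate and asks whether the observed alternate-allele count is improbably high under Poisson(depth × error rate). This is not UMI-VarCal's code: how UMI-VarCal estimates background noise, and its significance threshold, are not given in the abstract, so the `error_rate` input, the `alpha` cut-off and all function names are assumptions.

```python
import math


def log_pmf(i, lam):
    """Natural log of the Poisson probability P(X = i) for rate lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), lam > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        # The upper tail is large here, so 1 - CDF loses little precision.
        cdf = sum(math.exp(log_pmf(i, lam)) for i in range(k))
        return max(0.0, 1.0 - cdf)
    # For k > lam the terms shrink monotonically, so sum the tail directly;
    # this keeps precision for the very small p-values that matter here.
    term = math.exp(log_pmf(k, lam))
    total, i = 0.0, k
    while term > 0.0 and term >= total * 1e-17:
        total += term
        i += 1
        term *= lam / i
    return total


def is_variant(alt_count, depth, error_rate, alpha=1e-6):
    """Test whether alt_count exceeds what background noise would produce.

    error_rate and alpha are assumed inputs, not values taken from UMI-VarCal.
    """
    lam = depth * error_rate  # expected number of error-derived alt reads
    p_value = poisson_sf(alt_count, lam)
    return p_value < alpha, p_value


print(is_variant(8, 5000, 1e-3))   # about 5 errors expected: not significant
print(is_variant(30, 5000, 1e-3))  # far above background: significant
```

At a depth of 5,000 and an error rate of 0.1%, about five erroneous reads are expected, so eight alternate reads are unremarkable while thirty are highly significant.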
Affiliation(s)
- Pierre-Julien Viailly
- Department of Pathology, Centre Henri Becquerel; INSERM U1245, University of Normandie UNIROUEN, Rouen 76000, France
- Élodie Bohers
- Department of Pathology, Centre Henri Becquerel; INSERM U1245, University of Normandie UNIROUEN, Rouen 76000, France
- Mathieu Viennot
- Department of Pathology, Centre Henri Becquerel; INSERM U1245, University of Normandie UNIROUEN, Rouen 76000, France
- Philippe Ruminy
- Department of Pathology, Centre Henri Becquerel; INSERM U1245, University of Normandie UNIROUEN, Rouen 76000, France
- Hélène Dauchel
- Department of Pathology, Centre Henri Becquerel; INSERM U1245, University of Normandie UNIROUEN, Rouen 76000, France
- Pierre Vera
- University of Normandie UNIROUEN, LITIS EA 4108; Department of Pathology, Centre Henri Becquerel
- Fabrice Jardin
- Department of Pathology, Centre Henri Becquerel; INSERM U1245, University of Normandie UNIROUEN, Rouen 76000, France
15
Wu L, Deng Q, Xu Z, Zhou S, Li C, Li YX. A novel virtual barcode strategy for accurate panel-wide variant calling in circulating tumor DNA. BMC Bioinformatics 2020; 21:127. [PMID: 32245364 PMCID: PMC7118954 DOI: 10.1186/s12859-020-3412-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/18/2019] [Accepted: 02/12/2020] [Indexed: 01/19/2023] Open
Abstract
Background Hybrid capture-based next-generation sequencing of DNA has been widely applied to the detection of circulating tumor DNA (ctDNA). Various methods have been proposed for ctDNA detection, but low-allelic-fraction (AF) variants remain a great challenge. In addition, no panel-wide calling algorithm is available, which hinders full use of ctDNA-based ‘liquid biopsy’. We therefore developed VBCALAVD (Virtual Barcode-based Calling Algorithm for Low Allelic Variant Detection) in silico to overcome these limitations. Results Based on the nature of ctDNA fragmentation, a novel platform-independent virtual barcode strategy was established to eliminate random sequencing errors by clustering sequencing reads into virtual families. Stereotypical mutant-family-level background artifacts were removed by constructing AF distributions. Three additional robust fine-tuning filters were derived to eliminate stochastic mutant-family-level noise. The performance of our algorithm was validated using cell-free DNA reference standard samples (cfDNA RSDs) and cfDNA samples from healthy individuals (cfDNA controls). For the RSDs with AFs of 0.1, 0.2, 0.5, 1 and 5%, the mean F1 scores were 0.43 (0.25–0.56), 0.77, 0.92, 0.926 (0.86–1.0) and 0.89 (0.75–1.0), respectively, which indicates that the proposed approach significantly outperforms the published algorithms. No false positives were detected among the controls. The characteristics of mutant-family-level noise, and the quantitative determinants of divergence between mutant-family-level noise in controls and in RSDs, were also characterized. Conclusions Owing to its good performance in detecting low-AF variants, our algorithm will greatly facilitate noninvasive panel-wide detection of ctDNA in research and clinical settings. The whole pipeline is available at https://github.com/zhaodalv/VBCALAVD.
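A virtual barcode replaces a synthetic UMI with properties of the cfDNA fragment itself. A common choice, assumed here because the abstract does not spell out VBCALAVD's exact key, is the mapped start, end and strand of each fragment. The Python sketch below groups reads into such virtual families and counts only families whose consensus carries the alternate base, so an isolated sequencing error inside a family, or a singleton read, does not count as support. It omits the AF-distribution modelling and the three fine-tuning filters; the read tuple format, `min_size`, `min_agree` and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def virtual_families(reads):
    """Group reads by a virtual barcode: (fragment start, fragment end, strand).

    Each read is a (start, end, strand, base_at_site) tuple; this input format
    is a simplification chosen for the example.
    """
    families = defaultdict(list)
    for start, end, strand, base in reads:
        families[(start, end, strand)].append(base)
    return families


def family_alt_support(reads, alt, min_size=2, min_agree=0.8):
    """Count virtual families whose consensus base at the site is `alt`.

    Families smaller than min_size (e.g. singletons) are skipped, and a family
    only counts if at least min_agree of its reads agree on the consensus base,
    so an isolated sequencing error inside a family is outvoted.
    """
    support = 0
    for bases in virtual_families(reads).values():
        if len(bases) < min_size:
            continue
        base, n = Counter(bases).most_common(1)[0]
        if base == alt and n / len(bases) >= min_agree:
            support += 1
    return support


reads = [
    (100, 260, "+", "T"), (100, 260, "+", "T"), (100, 260, "+", "T"),  # consistent T family
    (105, 270, "-", "C"), (105, 270, "-", "T"), (105, 270, "-", "C"),  # one T is a random error
    (110, 280, "+", "T"),                                              # singleton, skipped
]
print(family_alt_support(reads, "T"))
```

Fragment-end keys can collide when two independent molecules happen to share endpoints, which is one reason a real caller needs additional family-level filters.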
Affiliation(s)
- Leilei Wu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Qinfang Deng
- Department of Oncology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, 200433, China
- Ze Xu
- Smartquerier Biomedicine, Shanghai, 201203, China
- Songwen Zhou
- Department of Oncology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, 200433, China
- Chao Li
- Smartquerier Biomedicine, Shanghai, 201203, China; Shanghai Center for Bioinformation Technology, Shanghai, 201203, China
- Yi-Xue Li
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China; Shanghai Center for Bioinformation Technology, Shanghai, 201203, China; CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
16
Stoler N, Arbeithuber B, Povysil G, Heinzl M, Salazar R, Makova KD, Tiemann-Boege I, Nekrutenko A. Family reunion via error correction: an efficient analysis of duplex sequencing data. BMC Bioinformatics 2020; 21:96. [PMID: 32131723 PMCID: PMC7057607 DOI: 10.1186/s12859-020-3419-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/18/2019] [Accepted: 02/17/2020] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND Duplex sequencing is the most accurate approach for identifying sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of the original DNA molecules, which makes it possible to distinguish true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are discarded. RESULTS In this paper we demonstrate that a significant fraction of these reads contain PCR or sequencing errors within their duplex tags. Correcting such errors allows these reads to be "reunited" with their respective families, increasing the output of the method and making it more cost-effective. CONCLUSIONS We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and Github: https://github.com/galaxyproject/dunovo.
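Tag error correction can be pictured as reassigning a rare tag to a more abundant tag that differs from it by a single base, so that its reads rejoin the family they came from. The Python sketch below uses a plain Hamming-distance rule for illustration only; it is not Du Novo 2.0's algorithm, it ignores the two-half structure of duplex tags, and `correct_tags` and `max_dist` are hypothetical names.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))


def correct_tags(tags, max_dist=1):
    """Map every observed tag to the tag of the family it most likely belongs to.

    Tags are visited from most to least abundant. A tag is merged into an
    earlier (strictly more abundant) uncorrected tag of the same length within
    max_dist mismatches; otherwise it keeps its own family.
    """
    counts = Counter(tags)
    ordered = [tag for tag, _ in counts.most_common()]
    mapping = {}
    for i, tag in enumerate(ordered):
        parent = next(
            (p for p in ordered[:i]
             if mapping[p] == p
             and len(p) == len(tag)
             and counts[p] > counts[tag]
             and hamming(p, tag) <= max_dist),
            None,
        )
        mapping[tag] = tag if parent is None else parent
    return mapping


tags = ["AACCGGTT"] * 5 + ["AACCGGTA"] + ["TTGGCCAA"] * 3 + ["TTGGCCAT"]
mapping = correct_tags(tags)
reunited = Counter(mapping[t] for t in tags)
print(reunited)  # the two singleton tags rejoin their larger families
```

Requiring the parent to be strictly more abundant is a design choice of this sketch: it stops two equally small families from being merged on a single mismatch.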
Affiliation(s)
- Nicholas Stoler
- Graduate Program in Bioinformatics and Genomics, The Huck Institutes for Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Barbara Arbeithuber
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
- Gundula Povysil
- Institut für Biophysik, Johannes Kepler Universität, Linz, Austria
- Present Address: Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Monika Heinzl
- Institut für Biophysik, Johannes Kepler Universität, Linz, Austria
- Renato Salazar
- Institut für Biophysik, Johannes Kepler Universität, Linz, Austria
- Kateryna D Makova
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
- Irene Tiemann-Boege
- Institut für Biophysik, Johannes Kepler Universität, Linz, Austria
- Anton Nekrutenko
- Graduate Program in Bioinformatics and Genomics, The Huck Institutes for Life Sciences, The Pennsylvania State University, University Park, PA, USA
17
Xu C, Gu X, Padmanabhan R, Wu Z, Peng Q, DiCarlo J, Wang Y. smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers. Bioinformatics 2020; 35:1299-1309. [PMID: 30192920 PMCID: PMC6477992 DOI: 10.1093/bioinformatics/bty790] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 03/13/2018] [Revised: 08/03/2018] [Accepted: 09/05/2018] [Indexed: 12/30/2022] Open
Abstract
MOTIVATION Low-frequency DNA mutations are often confounded with technical artifacts from sample preparation and sequencing. With unique molecular identifiers (UMIs), most sequencing errors can be corrected. However, errors introduced before UMI tagging, such as DNA polymerase errors during end repair and the first PCR cycle, cannot be corrected with single-strand UMIs and impose fundamental limits on UMI-based variant calling. RESULTS We developed smCounter2, a UMI-based variant caller for targeted sequencing data and an upgrade of the current version of smCounter. Compared with smCounter, smCounter2 features a lower detection limit (0.5% instead of 1%), better overall accuracy (particularly in non-coding regions), a consistent threshold that can be applied to both deep and shallow sequencing runs, and easier use via a Docker image and code for read pre-processing. We benchmarked smCounter2 against several state-of-the-art UMI-based variant calling methods using multiple datasets and demonstrated smCounter2's superior performance in detecting somatic variants. At the core of smCounter2 is a statistical test that determines whether the allele frequency of a putative variant is significantly above the background error rate, which was carefully modeled using an independent dataset. The improved accuracy in non-coding regions was mainly achieved with novel repetitive-region filters specifically designed for UMI data. AVAILABILITY AND IMPLEMENTATION The entire pipeline is available at https://github.com/qiaseq/qiaseq-dna under MIT license. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chang Xu
- Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
- Xiujing Gu
- Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
- Zhong Wu
- Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
- Quan Peng
- Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
- John DiCarlo
- Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
- Yexun Wang
- Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
18
Chowdhury R, Maranas CD. From directed evolution to computational enzyme engineering—A review. AIChE J 2019. [DOI: 10.1002/aic.16847] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 12/13/2022]
Affiliation(s)
- Ratul Chowdhury
- Department of Chemical Engineering, The Pennsylvania State University, University Park, Pennsylvania
- Costas D. Maranas
- Department of Chemical Engineering, The Pennsylvania State University, University Park, Pennsylvania
19
Huang CC, Du M, Wang L. Bioinformatics Analysis for Circulating Cell-Free DNA in Cancer. Cancers (Basel) 2019; 11:cancers11060805. [PMID: 31212602 PMCID: PMC6627444 DOI: 10.3390/cancers11060805] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 04/22/2019] [Revised: 06/03/2019] [Accepted: 06/06/2019] [Indexed: 12/28/2022] Open
Abstract
Molecular analysis of cell-free DNA (cfDNA) that circulates in plasma and other body fluids represents a "liquid biopsy" approach for non-invasive cancer screening or monitoring. The rapid development of sequencing technologies has made cfDNA a promising source for studying cancer development and progression. Specific genetic and epigenetic alterations have been found in plasma, serum, and urine cfDNA and could potentially be used as diagnostic or prognostic biomarkers in various cancer types. In this review, we discuss the molecular characteristics of cancer cfDNA and the major bioinformatics approaches used to analyze cfDNA sequencing data for detecting genetic mutations, copy number alterations, methylation changes, and nucleosome positioning variation. We highlight specific challenges in the sensitivity of detecting genetic aberrations and in the robustness of statistical analysis. Finally, we provide perspectives on the standardization and continuing development of bioinformatics analysis needed to move this promising screening tool into clinical practice.
Affiliation(s)
- Chiang-Ching Huang
- Zilber School of Public Health, University of Wisconsin, Milwaukee, WI 53205, USA
- Meijun Du
- Department of Pathology and MCW Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Liang Wang
- Department of Pathology and MCW Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
20
Mbala-Kingebeni P, Aziza A, Di Paola N, Wiley MR, Makiala-Mandanda S, Caviness K, Pratt CB, Ladner JT, Kugelman JR, Prieto K, Chitty JA, Larson PA, Beitzel B, Ayouba A, Vidal N, Karhemere S, Diop M, Diagne MM, Faye M, Faye O, Aruna A, Nsio J, Mulangu F, Mukadi D, Mukadi P, Kombe J, Mulumba A, Villabona-Arenas CJ, Pukuta E, Gonzalez J, Bartlett ML, Sozhamannan S, Gross SM, Schroth GP, Tim R, Zhao JJ, Kuhn JH, Diallo B, Yao M, Fall IS, Ndjoloko B, Mossoko M, Lacroix A, Delaporte E, Sanchez-Lockhart M, Sall AA, Muyembe-Tamfum JJ, Peeters M, Palacios G, Ahuka-Mundeke S. Medical countermeasures during the 2018 Ebola virus disease outbreak in the North Kivu and Ituri Provinces of the Democratic Republic of the Congo: a rapid genomic assessment. Lancet Infect Dis 2019; 19:648-657. [PMID: 31000464 DOI: 10.1016/s1473-3099(19)30118-5] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 12/10/2018] [Revised: 02/21/2019] [Accepted: 03/06/2019] [Indexed: 11/30/2022]
Abstract
BACKGROUND The real-time generation of information about pathogen genomes has become a vital goal for transmission analysis and characterisation in rapid outbreak responses. In response to the recently established genomic capacity in the Democratic Republic of the Congo, we explored the real-time generation of genomic information at the start of the 2018 Ebola virus disease (EVD) outbreak in North Kivu Province. METHODS We used targeted-enrichment sequencing to produce two coding-complete Ebola virus genomes 5 days after declaration of the EVD outbreak in North Kivu. Subsequent sequencing efforts yielded an additional 46 genomes. Genomic information was used to assess early transmission, medical countermeasures, and evolution of Ebola virus. FINDINGS The genomic information demonstrated that the EVD outbreak in the North Kivu and Ituri Provinces was distinct from the 2018 EVD outbreak in Équateur Province of the Democratic Republic of the Congo. Primer and probe mismatches to Ebola virus were identified in silico for all deployed diagnostic PCR assays, with the exception of the Cepheid GeneXpert GP assay. INTERPRETATION The first two coding-complete genomes provided actionable information in real-time for the deployment of the rVSVΔG-ZEBOV-GP Ebola virus envelope glycoprotein vaccine, available therapeutics, and sequence-based diagnostic assays. Based on the mutations identified in the Ebola virus surface glycoprotein (GP12) observed in all 48 genomes, deployed monoclonal antibody therapeutics (mAb114 and ZMapp) should be efficacious against the circulating Ebola virus variant. Rapid Ebola virus genomic characterisation should be included in routine EVD outbreak response procedures to ascertain efficacy of medical countermeasures. FUNDING Defense Biological Product Assurance Office.
Affiliation(s)
- Placide Mbala-Kingebeni
- Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of the Congo; TransVIHMI, Institut de Recherche pour le Développement, Institut National de la Santé et de la Recherche Médicale, Université de Montpellier, Montpellier, France; Service de Microbiologie, Cliniques Universitaires de Kinshasa, Kinshasa, Democratic Republic of the Congo
- Amuri Aziza
- Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of the Congo
- Nicholas Di Paola
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA
- Michael R Wiley
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA; College of Public Health, Northern Arizona University, Flagstaff, AZ, USA
- Sheila Makiala-Mandanda
- Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of the Congo; Service de Microbiologie, Cliniques Universitaires de Kinshasa, Kinshasa, Democratic Republic of the Congo
- Katie Caviness
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA
- Catherine B Pratt
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA; College of Public Health, Northern Arizona University, Flagstaff, AZ, USA
- Jason T Ladner
- University of Nebraska Medical Center, Omaha, NE, USA; The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Karla Prieto
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA; College of Public Health, Northern Arizona University, Flagstaff, AZ, USA
- Joseph A Chitty
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA
- Peter A Larson
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA
- Brett Beitzel
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA
- Ahidjo Ayouba
- TransVIHMI, Institut de Recherche pour le Développement, Institut National de la Santé et de la Recherche Médicale, Université de Montpellier, Montpellier, France
- Nicole Vidal
- TransVIHMI, Institut de Recherche pour le Développement, Institut National de la Santé et de la Recherche Médicale, Université de Montpellier, Montpellier, France
- Stomy Karhemere
- Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of the Congo
- Aaron Aruna
- Direction Générale de Lutte contre la Maladie, Kinshasa, Democratic Republic of the Congo
- Justus Nsio
- Direction Générale de Lutte contre la Maladie, Kinshasa, Democratic Republic of the Congo
- Felix Mulangu
- Direction Générale de Lutte contre la Maladie, Kinshasa, Democratic Republic of the Congo
- Daniel Mukadi
- Service de Microbiologie, Cliniques Universitaires de Kinshasa, Kinshasa, Democratic Republic of the Congo
- Patrick Mukadi
- Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of the Congo
- John Kombe
- Direction Générale de Lutte contre la Maladie, Kinshasa, Democratic Republic of the Congo
- Anastasie Mulumba
- l'Organisation Mondiale de la Santé, Kinshasa, Democratic Republic of the Congo
- Elisabeth Pukuta
- Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of the Congo
- Jeanette Gonzalez
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA
- Maggie L Bartlett
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA; Department of Pathology and Microbiology, Northern Arizona University, Flagstaff, AZ, USA
- Shanmuga Sozhamannan
- Defense Biological Product Assurance Office, Joint Program Executive Office for Chemical, Biological, Radiological and Nuclear Defense-Joint Project Management Office for Guardian, Frederick, MA, USA; The Tauri Group, Alexandria, VA, USA
- Jens H Kuhn
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Frederick, MD, USA
- Michel Yao
- World Health Organization, Geneva, Switzerland
- Bathe Ndjoloko
- Direction Générale de Lutte contre la Maladie, Kinshasa, Democratic Republic of the Congo
- Mathias Mossoko
- Direction Générale de Lutte contre la Maladie, Kinshasa, Democratic Republic of the Congo
- Audrey Lacroix
- TransVIHMI, Institut de Recherche pour le Développement, Institut National de la Santé et de la Recherche Médicale, Université de Montpellier, Montpellier, France
- Eric Delaporte
- TransVIHMI, Institut de Recherche pour le Développement, Institut National de la Santé et de la Recherche Médicale, Université de Montpellier, Montpellier, France
- Mariano Sanchez-Lockhart
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA; Department of Pathology and Microbiology, Northern Arizona University, Flagstaff, AZ, USA
- Jean-Jacques Muyembe-Tamfum
- Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of the Congo; Service de Microbiologie, Cliniques Universitaires de Kinshasa, Kinshasa, Democratic Republic of the Congo
- Martine Peeters
- TransVIHMI, Institut de Recherche pour le Développement, Institut National de la Santé et de la Recherche Médicale, Université de Montpellier, Montpellier, France
- Gustavo Palacios
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA
- Steve Ahuka-Mundeke
- Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of the Congo; Service de Microbiologie, Cliniques Universitaires de Kinshasa, Kinshasa, Democratic Republic of the Congo
21
Calling Variants in the Clinic: Informed Variant Calling Decisions Based on Biological, Clinical, and Laboratory Variables. Comput Struct Biotechnol J 2019; 17:561-569. [PMID: 31049166 PMCID: PMC6482431 DOI: 10.1016/j.csbj.2019.04.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/21/2018] [Revised: 03/12/2019] [Accepted: 04/03/2019] [Indexed: 01/10/2023] Open
Abstract
Deep sequencing genomic analysis is becoming increasingly common in clinical research and practice, enabling accurate identification of diagnostic, prognostic, and predictive determinants. Variant calling, which distinguishes true mutations from experimental errors, is a central task of genomic analysis and often requires sophisticated statistical, computational, and/or heuristic techniques. Although variant callers seek to overcome the noise inherent in biological experiments, variant calling can be significantly affected by external factors, including the methods used to prepare, store, and analyze samples. The goal of this review is to discuss known experimental features, such as sample preparation, library preparation, and sequencing, alongside diverse biological and clinical variables, and to evaluate their effect on variant caller selection and optimization.
22
Koessler T, Addeo A, Nouspikel T. Implementing circulating tumor DNA analysis in a clinical laboratory: A user manual. Adv Clin Chem 2019; 89:131-188. [PMID: 30797468 DOI: 10.1016/bs.acc.2018.12.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/17/2022]
Abstract
Liquid biopsy, the analysis of cell-free circulating tumor DNA (ctDNA), is becoming one of the most promising tools in oncology. It has already shown its usefulness in selecting and modulating therapy via remote analysis of the tumor genome, and it holds considerable promise in cancer therapy and management, such as assessing the success of key therapeutic steps, monitoring residual disease, detecting relapses early, and establishing prognosis. Yet ctDNA analysis is technically challenging, and its implementation in the laboratory raises multiple strategic and practical issues. For oncology clinics, integrating this novel test into well-established therapeutic protocols can also raise numerous questions. This review is intended as a field guide for (1) diagnostic laboratories wishing to implement, validate and possibly accredit ctDNA testing and (2) clinical oncologists interested in integrating the various applications of liquid biopsy into their daily practice. We provide advice and practical recommendations based on our own experience with the technical validation of these methods and on a review of the current literature, with a focus on gastro-intestinal, lung and breast cancers.
Affiliation(s)
- Thibaud Koessler
- Department of Oncology, Geneva University Hospital, Geneva, Switzerland
- Alfredo Addeo
- Department of Oncology, Geneva University Hospital, Geneva, Switzerland
- Thierry Nouspikel
- Service of Medical Genetics, Diagnostics Department, Geneva University Hospital, Geneva, Switzerland
23
Teder H, Koel M, Paluoja P, Jatsenko T, Rekker K, Laisk-Podar T, Kukuškina V, Velthut-Meikas A, Fjodorova O, Peters M, Kere J, Salumets A, Palta P, Krjutškov K. TAC-seq: targeted DNA and RNA sequencing for precise biomarker molecule counting. NPJ Genom Med 2018; 3:34. [PMID: 30588329 PMCID: PMC6299075 DOI: 10.1038/s41525-018-0072-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/01/2018] [Accepted: 11/26/2018] [Indexed: 12/22/2022] Open
Abstract
Targeted next-generation sequencing (NGS) methods have become essential in medical research and diagnostics. In addition to the sensitivity and high-throughput capacity of NGS, precise biomolecule counting based on unique molecular identifiers (UMIs) has the potential to increase biomolecule detection accuracy. Although UMIs are widely used in basic research, their introduction into clinical assays is still in progress. Here, we present TAC-seq (Targeted Allele Counting by sequencing), a robust and cost-effective method that uses UMIs to estimate the original molecule counts of mRNAs, microRNAs, and cell-free DNA. We applied TAC-seq in three different clinical applications and compared the results with standard NGS. RNA samples extracted from human endometrial biopsies were analyzed using 57 previously described mRNA-based receptivity biomarkers and 49 selected microRNAs at different expression levels. Cell-free DNA aneuploidy testing was based on genomic DNA from a cell line (47,XX,+21). TAC-seq mRNA profiling showed clustering results identical to transcriptome RNA sequencing, and microRNA detection demonstrated a significant reduction in amplification bias, allowing minor expression changes between samples to be detected that remained undetected by standard NGS. The mock experiment for cell-free DNA fetal aneuploidy analysis showed that TAC-seq can be applied to count highly fragmented DNA, detecting a significant (p = 7.6 × 10⁻⁴) excess of chromosome 21 molecules at a 10% fetal fraction. Based on these three proof-of-principle applications, we demonstrate that TAC-seq is an accurate and promising biomarker profiling method for advanced medical research and diagnostics.
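Molecule counting with UMIs amounts to counting distinct UMIs per target instead of reads, which removes the distortion caused by unequal PCR amplification. A minimal Python sketch, not the TAC-seq implementation: it assumes reads have already been assigned to a target and their UMI extracted, the `(target, umi)` input format and `count_molecules` name are hypothetical, and UMI sequencing errors are not corrected.

```python
from collections import defaultdict


def count_molecules(reads):
    """Return (reads per target, distinct UMIs per target) from (target, umi) pairs."""
    read_counts = defaultdict(int)
    umis = defaultdict(set)
    for target, umi in reads:
        read_counts[target] += 1
        umis[target].add(umi)
    return dict(read_counts), {target: len(u) for target, u in umis.items()}


# Hypothetical targets: target_A has two molecules, one heavily over-amplified;
# target_B has six molecules sequenced once each.
reads = (
    [("target_A", "ACGTACGT")] * 40
    + [("target_A", "TTGCAACG")] * 2
    + [("target_B", umi) for umi in ["AACG", "CCTA", "GGTT", "TACA", "ATGC", "CGAT"]]
)
read_counts, molecule_counts = count_molecules(reads)
print(read_counts)      # reads suggest target_A is 7x more abundant
print(molecule_counts)  # molecule counts show the opposite
```

In practice, UMIs that differ by a sequencing error would inflate the molecule count, so a production pipeline would cluster near-identical UMIs before counting.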
Affiliation(s)
- Hindrek Teder
- Competence Centre on Health Technologies, Tartu, Estonia; Institute of Biomedicine and Translational Medicine, University of Tartu, Tartu, Estonia
- Mariann Koel
- Competence Centre on Health Technologies, Tartu, Estonia; Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Priit Paluoja
- Competence Centre on Health Technologies, Tartu, Estonia; Institute of Computer Science, University of Tartu, Tartu, Estonia
- Kadri Rekker
- Competence Centre on Health Technologies, Tartu, Estonia; Institute of Clinical Medicine, Department of Obstetrics and Gynaecology, University of Tartu, Tartu, Estonia
- Triin Laisk-Podar
- Competence Centre on Health Technologies, Tartu, Estonia; Institute of Clinical Medicine, Department of Obstetrics and Gynaecology, University of Tartu, Tartu, Estonia; Estonian Genome Center, University of Tartu, Tartu, Estonia
- Agne Velthut-Meikas
- Competence Centre on Health Technologies, Tartu, Estonia; Department of Chemistry and Biotechnology, School of Science, Tallinn University of Technology, Tallinn, Estonia
- Olga Fjodorova
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Maire Peters
- Competence Centre on Health Technologies, Tartu, Estonia; Institute of Clinical Medicine, Department of Obstetrics and Gynaecology, University of Tartu, Tartu, Estonia
- Juha Kere
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden; Research Program of Molecular Neurology, Research Programs Unit, University of Helsinki, and Folkhälsan Institute of Genetics, Helsinki, Finland; School of Basic and Medical Biosciences, Guy's Hospital, King's College London, London, UK
- Andres Salumets
- Competence Centre on Health Technologies, Tartu, Estonia; Institute of Clinical Medicine, Department of Obstetrics and Gynaecology, University of Tartu, Tartu, Estonia; Institute of Biomedicine and Translational Medicine, Department of Biomedicine, University of Tartu, Tartu, Estonia; Department of Obstetrics and Gynecology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Priit Palta
- Estonian Genome Center, University of Tartu, Tartu, Estonia; Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Kaarel Krjutškov
- Competence Centre on Health Technologies, Tartu, Estonia; Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden; Research Program of Molecular Neurology, Research Programs Unit, University of Helsinki, and Folkhälsan Institute of Genetics, Helsinki, Finland
24
Davydov AN, Obraztsova AS, Lebedin MY, Turchaninova MA, Staroverov DB, Merzlyak EM, Sharonov GV, Kladova O, Shugay M, Britanova OV, Chudakov DM. Comparative Analysis of B-Cell Receptor Repertoires Induced by Live Yellow Fever Vaccine in Young and Middle-Age Donors. Front Immunol 2018; 9:2309. [PMID: 30356675 PMCID: PMC6189279 DOI: 10.3389/fimmu.2018.02309] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 04/26/2018] [Accepted: 09/17/2018] [Indexed: 12/25/2022]
Abstract
Age-related changes can significantly alter the state of the adaptive immune system and often lead to attenuated responses to novel pathogens and vaccination. In the present study, we employed 5′RACE UMI-based, full-length, nearly error-free immunoglobulin profiling to compare plasma cell antibody repertoires in young (19–26 years) and middle-aged (45–58 years) individuals vaccinated with a live yellow fever vaccine, which models a newly encountered pathogen. Our analysis revealed age-related differences in the responding antibody repertoire. These ranged from distinct IGH CDR3 repertoire properties to differences in somatic hypermutation intensity and efficiency and in antibody lineage tree structure. Overall, our findings suggest that, in response to a newly encountered pathogen, younger individuals mount a more diverse antibody repertoire and employ a more efficient somatic hypermutation process than older individuals.
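Repertoire "diversity" can be quantified in several ways, and the abstract does not say which measure the authors used. As one common example, the sketch below computes the Shannon entropy (in bits) of clonotype frequencies. It assumes each UMI-collapsed molecule has already been assigned a CDR3 clonotype. The function name and the toy CDR3 strings are hypothetical.

```python
import math
from collections import Counter


def shannon_diversity(clonotypes):
    """Shannon entropy (bits) of the clonotype frequency distribution.

    Higher values mean molecules are spread over more clonotypes, or spread
    more evenly across them.
    """
    counts = Counter(clonotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one clonotype")
    # sum of p * log2(1/p) over clonotypes, where p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Two equally frequent clonotypes give exactly 1 bit; a monoclonal sample gives 0.
print(shannon_diversity(["CARDYW", "CARDYW", "CASSLW", "CASSLW"]))  # 1.0
print(shannon_diversity(["CARDYW"] * 4))                           # 0.0
```

Entropy rises both with the number of clonotypes and with how evenly molecules are distributed among them. Comparisons between samples therefore usually normalize for sequencing depth, for example by downsampling to the same number of molecules.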
Affiliation(s)
- Alexey N Davydov
- Adaptive Immunity Group, Central European Institute of Technology, Brno, Czechia
- Anna S Obraztsova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia; Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
- Mikhail Y Lebedin
- Genomics of Adaptive Immunity Department, Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Maria A Turchaninova
- Genomics of Adaptive Immunity Department, Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; Department of Molecular Technologies, Pirogov Russian National Research Medical University, Moscow, Russia; Laboratory of Genomics of Antitumor Adaptive Immunity, Privolzhsky Research Medical University, Nizhny Novgorod, Russia
- Dmitriy B Staroverov
- Genomics of Adaptive Immunity Department, Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; Department of Molecular Technologies, Pirogov Russian National Research Medical University, Moscow, Russia
- Ekaterina M Merzlyak
- Genomics of Adaptive Immunity Department, Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; Department of Molecular Technologies, Pirogov Russian National Research Medical University, Moscow, Russia
- George V Sharonov
- Genomics of Adaptive Immunity Department, Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; Laboratory of Genomics of Antitumor Adaptive Immunity, Privolzhsky Research Medical University, Nizhny Novgorod, Russia
- Olga Kladova
- Department of Molecular Technologies, Pirogov Russian National Research Medical University, Moscow, Russia
- Mikhail Shugay
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia; Genomics of Adaptive Immunity Department, Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; Department of Molecular Technologies, Pirogov Russian National Research Medical University, Moscow, Russia; Laboratory of Genomics of Antitumor Adaptive Immunity, Privolzhsky Research Medical University, Nizhny Novgorod, Russia
- Olga V Britanova
- Genomics of Adaptive Immunity Department, Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; Department of Molecular Technologies, Pirogov Russian National Research Medical University, Moscow, Russia; Laboratory of Genomics of Antitumor Adaptive Immunity, Privolzhsky Research Medical University, Nizhny Novgorod, Russia
- Dmitriy M Chudakov
- Adaptive Immunity Group, Central European Institute of Technology, Brno, Czechia; Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia; Genomics of Adaptive Immunity Department, Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; Department of Molecular Technologies, Pirogov Russian National Research Medical University, Moscow, Russia; Laboratory of Genomics of Antitumor Adaptive Immunity, Privolzhsky Research Medical University, Nizhny Novgorod, Russia
25
Wong WH, Tong RS, Young AL, Druley TE. Rare Event Detection Using Error-corrected DNA and RNA Sequencing. J Vis Exp 2018. [PMID: 30124656 PMCID: PMC6126605 DOI: 10.3791/57509] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 10/31/2022]
Abstract
Conventional next-generation sequencing (NGS) techniques have enabled extensive genomic characterization for over a decade. In particular, NGS has been used to analyze the spectrum of clonal mutations in malignancy. Although far more efficient than traditional Sanger methods, NGS struggles to identify rare clonal and subclonal mutations because of its high error rate of ~0.5-2.0%. As a result, standard NGS can reliably detect only mutations with a variant allele fraction (VAF) above 0.02. The clinical significance of such rare mutations in patients without known disease remains unclear. However, patients treated for leukemia have significantly improved outcomes when residual disease is below 0.0001 by flow cytometry. Numerous methods have been developed to mitigate this artefactual background of NGS. Here we describe a method for Error-corrected DNA and RNA Sequencing (ECS), which tags individual molecules with both a 16 bp random index for error correction and an 8 bp patient-specific index for multiplexing. Our method can detect and track clonal mutations at VAFs two orders of magnitude below the detection limit of standard NGS, as rare as 0.0001 VAF.
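Error correction with random molecular indexes generally works by grouping reads that share an index into a "family" and keeping only each family's consensus. A sequencing or late-PCR error present in a minority of family members is then voted out. The sketch below illustrates that general idea only; it is not the ECS protocol's actual pipeline. It rests on three assumptions:

- The 16 bp random index has already been extracted into the `umi` field.
- Reads in a family are equal-length and aligned.
- The thresholds `min_family_size` and `min_agreement` are illustrative choices.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=3, min_agreement=0.7):
    """Collapse reads sharing a UMI into one consensus sequence per UMI.

    `reads` is an iterable of (umi, sequence) pairs. Families smaller than
    `min_family_size` are discarded. At each position the majority base is kept
    only if at least `min_agreement` of the family supports it; otherwise
    the position is masked as "N".
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to out-vote an error
        if len({len(s) for s in seqs}) != 1:
            continue  # this simple sketch requires equal-length, aligned reads
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[umi] = "".join(bases)
    return consensus


# Hypothetical family: one read carries an error at position 3 (G -> T);
# UMI2 is a singleton and is dropped.
reads = [("UMI1", "ACGT")] * 3 + [("UMI1", "ACTT"), ("UMI2", "GGGG")]
print(umi_consensus(reads))                     # {'UMI1': 'ACGT'}
print(umi_consensus(reads, min_agreement=0.9))  # {'UMI1': 'ACNT'}
```

Raising `min_family_size` or `min_agreement` trades sensitivity for specificity. Fewer molecules survive filtering, but the survivors are less likely to carry artefactual variants.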
Affiliation(s)
- Wing H Wong
- Department of Pediatrics, Division of Hematology and Oncology, Washington University School of Medicine; Center for Genome Sciences and Systems Biology, Washington University School of Medicine
- R Spencer Tong
- Department of Pediatrics, Division of Hematology and Oncology, Washington University School of Medicine; Center for Genome Sciences and Systems Biology, Washington University School of Medicine
- Andrew L Young
- Department of Pediatrics, Division of Hematology and Oncology, Washington University School of Medicine; Center for Genome Sciences and Systems Biology, Washington University School of Medicine
- Todd E Druley
- Department of Pediatrics, Division of Hematology and Oncology, Washington University School of Medicine; Center for Genome Sciences and Systems Biology, Washington University School of Medicine
26
Clement K, Farouni R, Bauer DE, Pinello L. AmpUMI: design and analysis of unique molecular identifiers for deep amplicon sequencing. Bioinformatics 2018; 34:i202-i210. [PMID: 29949956 PMCID: PMC6022702 DOI: 10.1093/bioinformatics/bty264] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/18/2022]
Abstract
Motivation Unique molecular identifiers (UMIs) are added to DNA fragments before PCR amplification. They make it possible to discriminate between alleles arising from the same genomic locus and sequencing reads produced by PCR amplification. Computational methods have been developed to take UMI information into account in genome-wide and single-cell sequencing studies. However, they are not designed for modern amplicon-based sequencing experiments, especially in cases of high allelic diversity. Importantly, no guidelines exist for choosing the optimal UMI length in amplicon-based sequencing experiments. Results Based on the total number of DNA fragments and the distribution of allele frequencies, we present a model that determines the minimum UMI length required to prevent UMI collisions and reduce allelic distortion. We also introduce AmpUMI, a user-friendly software tool to assist in the design and analysis of UMI-based amplicon sequencing studies. AmpUMI provides quality control metrics on the frequency and quality of UMIs. It also trims and deduplicates amplicon sequences with user-specified parameters for downstream analysis. Availability and implementation AmpUMI is open-source and freely available at http://github.com/pinellolab/AmpUMI.
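The question of how long a UMI must be can be illustrated with the classic birthday-problem calculation. This is a simplified stand-in, not AmpUMI's actual model. It assumes UMIs are drawn uniformly at random from all 4^L sequences of length L. It also ignores the allele-frequency distribution that the AmpUMI model takes into account. The function names are hypothetical.

```python
import math


def collision_probability(n_molecules, umi_length):
    """Probability that at least two of n molecules receive the same UMI.

    Assumes each UMI is drawn independently and uniformly from 4**umi_length
    possible sequences (the birthday problem).
    """
    space = 4 ** umi_length
    if n_molecules > space:
        return 1.0  # pigeonhole: a collision is certain
    # log P(all distinct) = sum_{i=0}^{n-1} log(1 - i / space), accumulated in
    # log space so the product does not underflow for large n.
    log_p_distinct = sum(math.log1p(-i / space) for i in range(n_molecules))
    return -math.expm1(log_p_distinct)


def min_umi_length(n_molecules, max_collision_prob=0.01, max_length=32):
    """Smallest UMI length whose collision probability is within the target."""
    for length in range(1, max_length + 1):
        if collision_probability(n_molecules, length) <= max_collision_prob:
            return length
    raise ValueError("no UMI length up to max_length meets the target")


# For 1,000 tagged molecules and at most a 1% chance of any collision:
print(min_umi_length(1000))  # 13
```

At 12 nt the collision probability for 1,000 molecules is about 3%; at 13 nt it falls below 1%. Collisions grow roughly with the square of the number of molecules, so deeper libraries need longer UMIs.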
Affiliation(s)
- Kendell Clement
- Molecular Pathology Unit and Cancer Center, Massachusetts General Hospital, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Rick Farouni
- Molecular Pathology Unit and Cancer Center, Massachusetts General Hospital, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Daniel E Bauer
- Division of Hematology/Oncology, Boston Children's Hospital; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Harvard Stem Cell Institute, Cambridge, MA, USA
- Luca Pinello
- Molecular Pathology Unit and Cancer Center, Massachusetts General Hospital, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
27
Xu C. A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data. Comput Struct Biotechnol J 2018; 16:15-24. [PMID: 29552334 PMCID: PMC5852328 DOI: 10.1016/j.csbj.2018.01.003] [Citation(s) in RCA: 149] [Impact Index Per Article: 24.8] [Received: 09/08/2017] [Revised: 01/20/2018] [Accepted: 01/28/2018] [Indexed: 02/06/2023]
Abstract
Detection of somatic mutations holds great potential in cancer treatment and has been a very active research field in the past few years, especially since the advent of next-generation sequencing technology. Many variant calling pipelines have been developed, with different underlying models, filters, input data requirements, and targeted applications. This review aims to enumerate the distinguishing features of state-of-the-art variant callers, in the hope of providing a practical guide for selecting the appropriate pipeline for specific applications. We will focus on the detection of somatic single nucleotide variants. The callers covered range from traditional variant callers based on whole-genome or exome sequencing of paired tumor-normal samples to recent low-frequency variant callers designed for targeted sequencing protocols with unique molecular identifiers. The variant callers have been extensively benchmarked, with inconsistent performance across these studies. We will review the reference materials, datasets, and performance metrics that have been used in the benchmarking studies. Finally, we will discuss emerging trends and future directions of variant calling algorithms.
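Many low-frequency callers ask, in some form, whether the number of reads supporting an alternative allele exceeds what sequencing and PCR errors alone would plausibly produce. The sketch below shows the simplest version of that reasoning: a one-sided binomial test against a single constant per-read error rate. It is not the model of any particular caller covered by the review; real tools typically use per-base quality scores, strand information, and further filters. The error rate and significance threshold are illustrative assumptions, and the function names are hypothetical.

```python
import math


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total


def call_variant(alt_reads, depth, error_rate, alpha=1e-6):
    """Return (p-value, is_called) for the alt-read count at one position.

    Null hypothesis: every alt read is an error, occurring independently with
    probability `error_rate` per read.
    """
    p_value = binomial_tail(alt_reads, depth, error_rate)
    return p_value, p_value < alpha


# At 1,000x depth and a 0.1% error rate, about one error read is expected.
print(call_variant(12, 1000, 0.001)[1])  # True: 12 alt reads are very unlikely to be errors
print(call_variant(2, 1000, 0.001)[1])   # False: 2 alt reads are consistent with noise
```

The test makes clear why low error rates matter for ctDNA-scale VAFs. UMI consensus lowers the effective `error_rate`, so fewer supporting reads are needed to reject the error-only explanation.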
Affiliation(s)
- Chang Xu
- Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland 21703, USA
28
Shagin DA, Turchaninova MA, Shagina IA, Shugay M, Zaretsky AR, Zueva OI, Bolotin DA, Lukyanov S, Chudakov DM. Application of nonsense-mediated primer exclusion (NOPE) for preparation of unique molecular barcoded libraries. BMC Genomics 2017; 18:440. [PMID: 28583065 PMCID: PMC5460480 DOI: 10.1186/s12864-017-3815-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/01/2016] [Accepted: 05/24/2017] [Indexed: 12/18/2022]
Abstract
Background We recently proposed an efficient method, here termed NOPE (NOnsense-mediated Primer Exclusion), to exclude undesirable primers at any stage of an amplification reaction. In this method, an added oligonucleotide (the NOPE oligo) overlaps the 3′ end of the unwanted amplification primer and simultaneously provides a template for the primer's elongation. This elongation disrupts the specificity of the unwanted primer, preventing its further participation in PCR. The approach makes it possible to control the course of PCR reactions rationally. This facilitates the analysis of complex DNA mixtures and allows multistage PCR without intermediate purification steps. Results Here we apply the NOPE method to DNA library preparation for high-throughput sequencing (HTS) with PCR-based introduction of unique molecular identifiers (UMIs). We show that the NOPE oligo efficiently neutralizes UMI-containing oligonucleotides after UMIs have been introduced into sample DNA molecules. Further amplification steps can therefore proceed without purification and the associated loss of starting material. At the same time, the NOPE oligo does not affect the efficiency of target PCR amplification. Conclusion We describe a simple, robust and inexpensive modification of the procedure for preparing UMI-labeled HTS libraries. It bypasses the purification step and thus preserves starting material that may be limited, e.g. circulating tumor DNA, circulating fetal DNA, or small numbers of isolated cells of interest. Furthermore, the demonstrated simplicity and robustness of the NOPE method should encourage its adoption in various PCR protocols.
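The primer-exclusion mechanism can be modelled at the sequence level. In the sketch below, a NOPE oligo is designed as the reverse complement of the primer's 3′ end followed by an arbitrary "nonsense" tail. Polymerase extension of the primer on this oligo appends the tail, so the primer's 3′ end no longer matches its original target. The helper names, overlap length, and tail sequence are hypothetical. The sketch ignores hybridization thermodynamics. It also does not model whether the oligo's own 3′ end is blocked against extension, which the abstract does not specify.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_nope_oligo(primer, overlap, tail):
    """Hypothetical design rule: an oligo (written 5'->3') whose 3' part pairs with
    the last `overlap` bases of the primer and whose 5' overhang templates `tail`."""
    if not 0 < overlap <= len(primer):
        raise ValueError("overlap must be between 1 and len(primer)")
    return revcomp(primer[-overlap:] + tail)


def extend_primer(primer, oligo, overlap):
    """Simulate one polymerase extension of `primer` annealed to `oligo`."""
    if not 0 < overlap <= len(primer):
        raise ValueError("overlap must be between 1 and len(primer)")
    if oligo[-overlap:] != revcomp(primer[-overlap:]):
        return primer  # the primer's 3' end does not anneal: no extension
    overhang = oligo[:-overlap]  # single-stranded 5' overhang acts as template
    return primer + revcomp(overhang)


primer = "ACGTTGCAAGGC"  # hypothetical UMI-introducing primer
oligo = design_nope_oligo(primer, overlap=6, tail="TTTTTT")
print(oligo)                            # AAAAAAGCCTTG
print(extend_primer(primer, oligo, 6))  # ACGTTGCAAGGCTTTTTT
```

After extension the primer ends in the nonsense tail rather than in its original 3′ sequence. This is the sense in which elongation "disrupts specificity": the primer can no longer prime from its intended binding site.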
Affiliation(s)
- Dmitriy A Shagin
- Pirogov Russian National Research Medical University, Moscow, Russia; Shemiakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Science, Moscow, Russia
- Maria A Turchaninova
- Pirogov Russian National Research Medical University, Moscow, Russia; Shemiakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Science, Moscow, Russia; Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Irina A Shagina
- Pirogov Russian National Research Medical University, Moscow, Russia; Shemiakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Science, Moscow, Russia
- Mikhail Shugay
- Pirogov Russian National Research Medical University, Moscow, Russia; Shemiakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Science, Moscow, Russia; Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Andrew R Zaretsky
- Pirogov Russian National Research Medical University, Moscow, Russia; Evrogen JSC, Moscow, Russia
- Olga I Zueva
- Shemiakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Science, Moscow, Russia
- Dmitriy A Bolotin
- Pirogov Russian National Research Medical University, Moscow, Russia; Shemiakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Science, Moscow, Russia
- Sergey Lukyanov
- Pirogov Russian National Research Medical University, Moscow, Russia; Shemiakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Science, Moscow, Russia
- Dmitriy M Chudakov
- Pirogov Russian National Research Medical University, Moscow, Russia; Shemiakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Science, Moscow, Russia; Central European Institute of Technology, Masaryk University, Brno, Czech Republic; Skolkovo Institute of Science and Technology, Moscow, Russia
29
Shagin DA, Shagina IA, Zaretsky AR, Barsova EV, Kelmanson IV, Lukyanov S, Chudakov DM, Shugay M. A high-throughput assay for quantitative measurement of PCR errors. Sci Rep 2017; 7:2718. [PMID: 28578414 PMCID: PMC5457411 DOI: 10.1038/s41598-017-02727-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 01/24/2017] [Accepted: 04/18/2017] [Indexed: 01/01/2023]
Abstract
The accuracy with which a DNA polymerase replicates a template DNA sequence is an extremely important property that can vary by an order of magnitude from one enzyme to another. The rate of nucleotide misincorporation is shaped by multiple factors, including PCR conditions and proofreading capabilities. Proper assessment of polymerase error rate is essential for a wide range of sensitive PCR-based assays. In this paper, we describe a method for studying polymerase errors with exceptional resolution that combines unique molecular identifier tagging and high-throughput sequencing. Our protocol is less laborious than commonly used methods, and is also scalable, robust and accurate. In a series of nine PCR assays, we measured a range of polymerase accuracies in line with previous observations. Beyond these aggregate rates, we were able to comprehensively describe the individual errors introduced by each polymerase after either 20 PCR cycles or a linear amplification. This revealed specific substitution preferences and the diversity of PCR error frequency profiles. We also demonstrate that the detected high-frequency PCR errors are highly recurrent. The position in the template sequence and polymerase-specific substitution preferences are among the major factors influencing the observed PCR error rate.
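A common way to normalize polymerase error measurements in the PCR-fidelity literature is to express errors per base per template doubling. This may not be the exact calculation used in this paper. The rate is the observed error frequency divided by the number of doublings, which is log2 of the fold amplification. The sketch below assumes exponential amplification, so it does not apply to the linear-amplification condition. The function name and the numbers in the example are hypothetical.

```python
import math


def error_rate_per_base_per_doubling(errors, bases_sequenced, fold_amplification):
    """Errors per base per doubling for an exponentially amplified template."""
    if fold_amplification <= 1 or bases_sequenced <= 0:
        raise ValueError("need amplification (fold > 1) and sequenced bases > 0")
    doublings = math.log2(fold_amplification)  # effective number of duplications
    return errors / bases_sequenced / doublings


# Hypothetical run: 50 errors in 1,000,000 sequenced bases after a
# 2**15-fold amplification (15 effective doublings).
rate = error_rate_per_base_per_doubling(50, 1_000_000, 2 ** 15)
print(f"{rate:.2e}")  # 3.33e-06
```

Dividing by doublings rather than by cycle number accounts for PCR efficiency below 100%. With imperfect efficiency, 20 cycles yield fewer than 20 effective duplications of each template.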
Affiliation(s)
- Dmitriy A Shagin
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow, Russia; Pirogov Russian National Research Medical University, Moscow, Russia; Evrogen JSC, Moscow, Russia
- Irina A Shagina
- Pirogov Russian National Research Medical University, Moscow, Russia; Evrogen JSC, Moscow, Russia
- Andrew R Zaretsky
- Pirogov Russian National Research Medical University, Moscow, Russia; Evrogen JSC, Moscow, Russia
- Ekaterina V Barsova
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow, Russia; Evrogen JSC, Moscow, Russia
- Ilya V Kelmanson
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow, Russia; Evrogen JSC, Moscow, Russia
- Sergey Lukyanov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow, Russia; Pirogov Russian National Research Medical University, Moscow, Russia
- Dmitriy M Chudakov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow, Russia; Pirogov Russian National Research Medical University, Moscow, Russia; Skolkovo Institute of Science and Technology, Moscow, Russia; Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Mikhail Shugay
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow, Russia; Pirogov Russian National Research Medical University, Moscow, Russia; Central European Institute of Technology, Masaryk University, Brno, Czech Republic