1
Zhou S, Hill CS, Spielvogel E, Clark MU, Hudgens MG, Swanstrom R. Unique Molecular Identifiers and Multiplexing Amplicons Maximize the Utility of Deep Sequencing To Critically Assess Population Diversity in RNA Viruses. ACS Infect Dis 2022; 8:2505-2514. [PMID: 36326446 PMCID: PMC9742341 DOI: 10.1021/acsinfecdis.2c00319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Next generation sequencing (NGS)/deep sequencing has become an important tool in the study of viruses. The use of unique molecular identifiers (UMI) can overcome the limitations of PCR errors and PCR-mediated recombination and reveal the true sampling depth of a viral population being sequenced in an NGS experiment. This approach of enhanced sequence data represents an ideal tool to study both high and low abundance drug resistance mutations and more generally to explore the genetic structure of viral populations. Central to the use of the UMI/Primer ID approach is the creation of a template consensus sequence (TCS) for each genome sequenced. Here we describe a series of experiments to validate several aspects of the Multiplexed Primer ID (MPID) sequencing approach using the MiSeq platform. We have evaluated how multiplexing of cDNA synthesis and amplicons affects the sampling depth of the viral population for each individual cDNA and amplicon to understand the relationship between broader genome coverage versus maximal sequencing depth. We have validated reproducibility of the MPID assay in the detection of minority mutations in viral genomes. We have also examined the determinants that allow sequencing reads of PCR recombinants to contaminate the final TCS data set and show how such contamination can be limited. Finally, we provide several examples where we have applied MPID to analyze features of minority variants and describe limits on their detection in viral populations of HIV-1 and SARS-CoV-2 to demonstrate the generalizable utility of this approach with any RNA virus.
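The template consensus idea described in this abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' TCS pipeline: the function name `build_consensus`, the fixed `min_family_size` cutoff, and the `min_agreement` threshold are hypothetical choices, and reads are assumed to be already trimmed to equal length.

```python
from collections import Counter, defaultdict


def build_consensus(reads, min_family_size=3, min_agreement=0.5):
    """Collapse reads that share a UMI into one consensus sequence each.

    reads: iterable of (umi, sequence) pairs. Sequences within a UMI
    family are assumed to have equal length (hypothetical simplification).
    """
    # Group reads by the UMI that tagged the original template.
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        # Small families cannot out-vote a PCR or sequencing error, so skip them.
        if len(seqs) < min_family_size:
            continue
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Emit an ambiguity code when no base has a clear majority.
            bases.append(base if count / len(column) > min_agreement else "N")
        consensus[umi] = "".join(bases)
    return consensus


reads = [
    ("AAAC", "ACGT"),
    ("AAAC", "ACGT"),
    ("AAAC", "ACTT"),  # one read carries an error at position 3
    ("GGGT", "ACGT"),  # singleton family, dropped
]
print(build_consensus(reads))
```

Because each retained UMI contributes exactly one sequence, the number of keys in the returned dictionary is a direct count of templates sampled under these assumptions.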
Affiliation(s)
- Shuntai Zhou
- UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA. Corresponding Author: Shuntai Zhou, UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, 27599, USA.
- Collin S. Hill
- UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Ean Spielvogel
- UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Michael U. Clark
- UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Michael G. Hudgens
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Ronald Swanstrom
- UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
2
King DJ, Freimanis G, Lasecka-Dykes L, Asfor A, Ribeca P, Waters R, King DP, Laing E. A Systematic Evaluation of High-Throughput Sequencing Approaches to Identify Low-Frequency Single Nucleotide Variants in Viral Populations. Viruses 2020; 12:E1187. [PMID: 33092085 PMCID: PMC7594041 DOI: 10.3390/v12101187] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/28/2020] [Revised: 10/01/2020] [Accepted: 10/12/2020] [Indexed: 12/28/2022] Open
Abstract
High-throughput sequencing such as those provided by Illumina are an efficient way to understand sequence variation within viral populations. However, challenges exist in distinguishing process-introduced error from biological variance, which significantly impacts our ability to identify sub-consensus single-nucleotide variants (SNVs). Here we have taken a systematic approach to evaluate laboratory and bioinformatic pipelines to accurately identify low-frequency SNVs in viral populations. Artificial DNA and RNA "populations" were created by introducing known SNVs at predetermined frequencies into template nucleic acid before being sequenced on an Illumina MiSeq platform. These were used to assess the effects of abundance and starting input material type, technical replicates, read length and quality, short-read aligner, and percentage frequency thresholds on the ability to accurately call variants. Analyses revealed that the abundance and type of input nucleic acid had the greatest impact on the accuracy of SNV calling as measured by a micro-averaged Matthews correlation coefficient score, with DNA and high RNA inputs (10^7 copies) allowing for variants to be called at a 0.2% frequency. Reduced input RNA (10^5 copies) required more technical replicates to maintain accuracy, while low RNA inputs (10^3 copies) suffered from consensus-level errors. Base errors identified at specific motifs identified in all technical replicates were also identified which can be excluded to further increase SNV calling accuracy. These findings indicate that samples with low RNA inputs should be excluded for SNV calling and reinforce the importance of optimising the technical and bioinformatics steps in pipelines that are used to accurately identify sequence variants.
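For readers unfamiliar with the score named in this abstract, a micro-averaged Matthews correlation coefficient pools the confusion-matrix counts from all samples before computing a single MCC. The sketch below shows that calculation under assumed inputs: the `(tp, fp, tn, fn)` tuple layout and the name `micro_mcc` are hypothetical, and how the study defined true and false calls per position is not stated here.

```python
import math


def micro_mcc(confusions):
    """Micro-averaged MCC: sum the counts across samples, then score once.

    confusions: iterable of (tp, fp, tn, fn) tuples, one per sample or
    replicate (hypothetical layout chosen for this illustration).
    """
    tp = fp = tn = fn = 0
    for c_tp, c_fp, c_tn, c_fn in confusions:
        tp += c_tp
        fp += c_fp
        tn += c_tn
        fn += c_fn

    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # A common convention: report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Two hypothetical replicates, each with 100 evaluated positions.
print(micro_mcc([(8, 2, 88, 2), (9, 1, 89, 1)]))
```

Pooling counts first means replicates with more evaluated positions carry more weight than they would in a simple mean of per-replicate scores.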
Affiliation(s)
- David J. King
- The Pirbright Institute, Woking, Surrey GU24 0NF, UK
- Department of Microbial and Cellular Sciences, Faculty of Health and Medical Sciences, School of Biosciences and Medicine, University of Surrey, Guildford GU2 7XH, UK
- Graham Freimanis
- The Pirbright Institute, Woking, Surrey GU24 0NF, UK
- Lidia Lasecka-Dykes
- The Pirbright Institute, Woking, Surrey GU24 0NF, UK
- Amin Asfor
- The Pirbright Institute, Woking, Surrey GU24 0NF, UK
- Department of Pathology and Infectious Diseases, Faculty of Health and Medical Sciences, School of Veterinary Medicine, University of Surrey, Guildford GU2 7XH, UK
- Paolo Ribeca
- Biomathematics and Statistics Scotland, Edinburgh, Midlothian EH9 3FD, UK
- Ryan Waters
- The Pirbright Institute, Woking, Surrey GU24 0NF, UK
- Donald P. King
- The Pirbright Institute, Woking, Surrey GU24 0NF, UK
- Emma Laing
- Department of Microbial and Cellular Sciences, Faculty of Health and Medical Sciences, School of Biosciences and Medicine, University of Surrey, Guildford GU2 7XH, UK
3
Lu IN, Muller CP, He FQ. Applying next-generation sequencing to unravel the mutational landscape in viral quasispecies. Virus Res 2020; 283:197963. [PMID: 32278821 PMCID: PMC7144618 DOI: 10.1016/j.virusres.2020.197963] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 12/18/2019] [Revised: 04/03/2020] [Accepted: 04/04/2020] [Indexed: 02/07/2023]
Abstract
Next-generation sequencing (NGS) has revolutionized the scale and depth of biomedical sciences. Because of its unique ability for the detection of sub-clonal variants within genetically diverse populations, NGS has been successfully applied to analyze and quantify the exceptionally-high diversity within viral quasispecies, and many low-frequency drug- or vaccine-resistant mutations of therapeutic importance have been discovered. Although many works have intensively discussed the latest NGS approaches and applications in general, none of them has focused on applying NGS in viral quasispecies studies, mostly due to the limited ability of current NGS technologies to accurately detect and quantify rare viral variants. Here, we summarize several error-correction strategies that have been developed to enhance the detection accuracy of minority variants. We also discuss critical considerations for preparing a sequencing library from viral RNAs and for analyzing NGS data to unravel the mutational landscape.
Affiliation(s)
- I-Na Lu
- DKFZ-Division Translational Neurooncology at the WTZ, DKTK partner site, University Hospital Essen, D-45147 Essen, Germany; Department of Infectious Diseases, Aarhus University Hospital, DK-8200 Aarhus N, Denmark.
- Claude P Muller
- Department of Infection and Immunity, Luxembourg Institute of Health, L-4354 Esch-Sur-Alzette, Luxembourg; Laboratoire National de Santé, L-3583 Dudelange, Luxembourg
- Feng Q He
- Department of Infection and Immunity, Luxembourg Institute of Health, L-4354 Esch-Sur-Alzette, Luxembourg; Institute of Medical Microbiology, University Hospital Essen, University Duisburg-Essen, Essen, Germany.
4
Next-Generation Sequencing in High-Sensitive Detection of Mutations in Tumors: Challenges, Advances, and Applications. J Mol Diagn 2020; 22:994-1007. [PMID: 32480002 DOI: 10.1016/j.jmoldx.2020.04.213] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 01/20/2020] [Revised: 03/17/2020] [Accepted: 04/23/2020] [Indexed: 02/06/2023] Open
Abstract
Next-generation sequencing (NGS) technologies have come of age as preferred technologies for screening of genomic variants of pathologic and therapeutic potential. Because of their capability for high-throughput and massively parallel sequencing, they can screen for a variety of genomic changes in multiple samples simultaneously. This has made them platforms of choice for clinical testing of solid tumors and hematological malignancies. Consequently, they are increasingly replacing conventional technologies, such as Sanger sequencing and pyrosequencing, expression arrays, real-time PCR, and fluorescence in situ hybridization methods, for routine molecular testing of tumors. However, one limitation of routinely used NGS technologies is the inability to detect low-level genomic variants with high accuracy. This can be attributed to the frequent occurrence of low-level sequencing errors and artifacts in NGS workflow that need specialized approaches to be identified and eliminated. This review focuses on the origins and nature of these artifacts and recent improvements in the NGS technologies to overcome them to facilitate accurate high-sensitive detection of low-level mutations. Potential applications of high-sensitive NGS in oncology and comparisons with non-NGS technologies of similar capabilities are also summarized.
5
Mallampati S, Duose DY, Harmon MA, Mehrotra M, Kanagal-Shamanna R, Zalles S, Wistuba II, Sun X, Luthra R. Rational "Error Elimination" Approach to Evaluating Molecular Barcoded Next-Generation Sequencing Data Identifies Low-Frequency Mutations in Hematologic Malignancies. J Mol Diagn 2019; 21:471-482. [PMID: 30794984 PMCID: PMC6521894 DOI: 10.1016/j.jmoldx.2019.01.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/18/2018] [Revised: 10/31/2018] [Accepted: 01/18/2019] [Indexed: 12/18/2022] Open
Abstract
The emergence of highly sensitive molecular diagnostic approaches, such as droplet digital PCR, has allowed the accurate identification of low-frequency variant alleles in clinical specimens; however, the multiplex capabilities of droplet digital PCR for variant detection are inadequate. The incorporation of molecular barcodes or unique IDs into next-generation sequencing libraries through PCR has enabled the detection of low-frequency variant alleles across multiple genomic regions. However, rational library preparation and sequencing data analytic strategies that integrate molecular barcodes have rarely been applied to clinical settings. In this study, we evaluated the parameters that are crucial in the use of molecular barcodes in next-generation sequencing for genotyping clinical specimens from patients with hematologic malignancies. The uniform incorporation of molecular barcodes into DNA templates through PCR was found to be crucial, and the extent of uniformity was governed by multiple interdependent variables. An error elimination strategy was developed for removing sequencing background errors by using molecular barcode sequence information as an alternative to the conventional error correction approach. This approach was successfully used to identify mutations with frequencies as low as 0.15%, and the clonal heterogeneity of hematologic malignancies was revealed. These findings have implications for elucidating heterogeneity and temporal and spatial clonal evolution, evaluating response to therapy, and monitoring relapse in patients with hematologic malignancies.
Affiliation(s)
- Saradhi Mallampati
- Department of Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas; Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Dzifa Y Duose
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Meenakshi Mehrotra
- Department of Hematopathology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Rashmi Kanagal-Shamanna
- Department of Hematopathology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Stephanie Zalles
- Department of Hematopathology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Ignacio I Wistuba
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Xiaoping Sun
- Department of Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas.
- Rajyalakshmi Luthra
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, Texas; Department of Hematopathology, The University of Texas MD Anderson Cancer Center, Houston, Texas.
6
DeWitt WS, Mesin L, Victora GD, Minin VN, Matsen FA. Using Genotype Abundance to Improve Phylogenetic Inference. Mol Biol Evol 2019; 35:1253-1265. [PMID: 29474671 PMCID: PMC5913685 DOI: 10.1093/molbev/msy020] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Indexed: 12/13/2022] Open
Abstract
Modern biological techniques enable very dense genetic sampling of unfolding evolutionary histories, and thus frequently sample some genotypes multiple times. This motivates strategies to incorporate genotype abundance information in phylogenetic inference. In this article, we synthesize a stochastic process model with standard sequence-based phylogenetic optimality, and show that tree estimation is substantially improved by doing so. Our method is validated with extensive simulations and an experimental single-cell lineage tracing study of germinal center B cell receptor affinity maturation.
Affiliation(s)
- William S DeWitt
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA; Department of Genome Sciences, University of Washington, Seattle, WA
- Luka Mesin
- Laboratory of Lymphocyte Dynamics, The Rockefeller University, New York, NY
- Gabriel D Victora
- Laboratory of Lymphocyte Dynamics, The Rockefeller University, New York, NY
- Frederick A Matsen
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA
7
Petrova VN, Muir L, McKay PF, Vassiliou GS, Smith KGC, Lyons PA, Russell CA, Anderson CA, Kellam P, Bashford-Rogers RJM. Combined Influence of B-Cell Receptor Rearrangement and Somatic Hypermutation on B-Cell Class-Switch Fate in Health and in Chronic Lymphocytic Leukemia. Front Immunol 2018; 9:1784. [PMID: 30147686 PMCID: PMC6095981 DOI: 10.3389/fimmu.2018.01784] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 03/07/2018] [Accepted: 07/19/2018] [Indexed: 01/21/2023] Open
Abstract
A diverse B-cell receptor (BCR) repertoire is required to bind a wide range of antigens. BCRs are generated through genetic recombination and can be diversified through somatic hypermutation (SHM) or class-switch recombination (CSR). Patterns of repertoire diversity can vary substantially between different health conditions. We use isotype-resolved BCR sequencing to compare B-cell evolution and class-switch fate in healthy individuals and in patients with chronic lymphocytic leukemia (CLL). We show that the patterns of SHM and CSR in B-cells from healthy individuals are distinct from CLL. We identify distinct properties of clonal expansion that lead to the generation of antibodies of different classes in healthy, malignant, and non-malignant CLL BCR repertoires. We further demonstrate that BCR diversity is affected by relationships between antibody variable and constant regions leading to isotype-specific signatures of variable gene usage. This study provides powerful insights into the mechanisms underlying the evolution of the adaptive immune responses in health and their aberration during disease.
MESH Headings
- B-Lymphocytes/immunology
- B-Lymphocytes/metabolism
- B-Lymphocytes/pathology
- Gene Rearrangement, B-Lymphocyte
- Humans
- Immunoglobulin Class Switching/genetics
- Immunoglobulin Isotypes/genetics
- Immunoglobulin Joining Region/genetics
- Immunoglobulin Variable Region/genetics
- Leukemia, Lymphocytic, Chronic, B-Cell/genetics
- Leukemia, Lymphocytic, Chronic, B-Cell/immunology
- Leukemia, Lymphocytic, Chronic, B-Cell/metabolism
- Leukocytes, Mononuclear/immunology
- Leukocytes, Mononuclear/metabolism
- Leukocytes, Mononuclear/pathology
- Multigene Family
- Receptors, Antigen, B-Cell/genetics
- Somatic Hypermutation, Immunoglobulin
Affiliation(s)
- Luke Muir
- Department of Medicine, Division of Infectious Diseases, Imperial College London, London, United Kingdom
- Paul F. McKay
- Department of Medicine, Division of Infectious Diseases, Imperial College London, London, United Kingdom
- Paul A. Lyons
- Department of Medicine, University of Cambridge, Cambridge, United Kingdom
- Colin A. Russell
- Department of Medical Microbiology, Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands
- Paul Kellam
- Department of Medicine, Division of Infectious Diseases, Imperial College London, London, United Kingdom
8
Boone M, De Koker A, Callewaert N. Capturing the 'ome': the expanding molecular toolbox for RNA and DNA library construction. Nucleic Acids Res 2018; 46:2701-2721. [PMID: 29514322 PMCID: PMC5888575 DOI: 10.1093/nar/gky167] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 11/05/2017] [Revised: 02/05/2018] [Accepted: 02/23/2018] [Indexed: 12/14/2022] Open
Abstract
All sequencing experiments and most functional genomics screens rely on the generation of libraries to comprehensively capture pools of targeted sequences. In the past decade especially, driven by the progress in the field of massively parallel sequencing, numerous studies have comprehensively assessed the impact of particular manipulations on library complexity and quality, and characterized the activities and specificities of several key enzymes used in library construction. Fortunately, careful protocol design and reagent choice can substantially mitigate many of these biases, and enable reliable representation of sequences in libraries. This review aims to guide the reader through the vast expanse of literature on the subject to promote informed library generation, independent of the application.
Affiliation(s)
- Morgane Boone
- Center for Medical Biotechnology, VIB, Zwijnaarde 9052, Belgium
- Department of Biochemistry and Microbiology, Ghent University, Ghent 9000, Belgium
- Andries De Koker
- Center for Medical Biotechnology, VIB, Zwijnaarde 9052, Belgium
- Department of Biochemistry and Microbiology, Ghent University, Ghent 9000, Belgium
- Nico Callewaert
- Center for Medical Biotechnology, VIB, Zwijnaarde 9052, Belgium
- Department of Biochemistry and Microbiology, Ghent University, Ghent 9000, Belgium
9
Ogawa T, Kryukov K, Imanishi T, Shiroguchi K. The efficacy and further functional advantages of random-base molecular barcodes for absolute and digital quantification of nucleic acid molecules. Sci Rep 2017; 7:13576. [PMID: 29051542 PMCID: PMC5648891 DOI: 10.1038/s41598-017-13529-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 06/27/2017] [Accepted: 09/25/2017] [Indexed: 01/18/2023] Open
Abstract
Accurate quantification of biomolecules in system-wide measurements is in high demand, especially for systems with limited sample amounts such as single cells. Because of this, digital quantification of nucleic acid molecules using molecular barcodes has been developed, making, e.g., transcriptome analysis highly reproducible and quantitative. This counting scheme was shown to work using sequence-restricted barcodes, and non-sequence-restricted (random-base) barcodes that may provide a much higher dynamic range at significantly lower cost have been widely used. However, the efficacy of random-base barcodes is significantly affected by base changes due to amplification and/or sequencing errors and has not been investigated experimentally or quantitatively. Here, we show experimentally that random-base barcodes enable absolute and digital quantification of DNA molecules with high dynamic range (from one to more than 10^4, potentially up to 10^15 molecules) conditional on our barcode design and variety, a certain range of sequencing depths, and computational analyses. Moreover, we quantitatively show further functional advantages of the molecular barcodes: the molecular barcodes enable one to find contaminants and misidentifications of target sequences. Our scheme here may be generally used to confirm that the digital quantification works in each platform.
Affiliation(s)
- Taisaku Ogawa
- Laboratory for Integrative Omics, RIKEN Quantitative Biology Center (QBiC), 6-2-3 Furuedai Suita, Osaka, 565-0874, Japan
- Kirill Kryukov
- Biomedical Informatics Laboratory, Department of Molecular Life Science, Tokai University School of Medicine, 143 Shimokasuya, Isehara, Kanagawa, 259-1193, Japan
- Tadashi Imanishi
- Biomedical Informatics Laboratory, Department of Molecular Life Science, Tokai University School of Medicine, 143 Shimokasuya, Isehara, Kanagawa, 259-1193, Japan
- Katsuyuki Shiroguchi
- Laboratory for Integrative Omics, RIKEN Quantitative Biology Center (QBiC), 6-2-3 Furuedai Suita, Osaka, 565-0874, Japan; Laboratory for Immunogenetics, RIKEN Center for Integrative Medical Sciences (IMS), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan; JST PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama, 332-0012, Japan.
10
Canzoniero JV, Cravero K, Park BH. The Impact of Collisions on the Ability to Detect Rare Mutant Alleles Using Barcode-Type Next-Generation Sequencing Techniques. Cancer Inform 2017. [DOI: 10.1177/1176935117719236] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022] Open
Abstract
Barcoding techniques are used to reduce error from next-generation sequencing, with applications ranging from understanding tumor subclone populations to detecting circulating tumor DNA. Collisions occur when more than one sample molecule is tagged by the same unique identifier (UID) and can result in failure to detect very-low-frequency mutations and error in estimating mutation frequency. Here, we created computer models of barcoding technique, with and without amplification bias introduced by the UID, and analyzed the effect of collisions for a range of mutant allele frequencies (1e−6 to 0.2), number of sample molecules (10 000 to 1e7), and number of UIDs (4^10 to 4^14). Inability to detect rare mutant alleles occurred in 0% to 100% of simulations, depending on collisions and number of mutant molecules. Collisions also introduced error in estimating mutant allele frequency resulting in underestimation of minor allele frequency. Incorporating an understanding of the effect of collisions into experimental design can allow for optimization of the number of sample molecules and number of UIDs to minimize the negative impact on rare mutant detection and mutant frequency estimation.
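The scale of the collision problem described in this abstract can be gauged with a closed-form estimate. The sketch below is not the authors' simulation model: it assumes every molecule draws its UID uniformly and independently, with no amplification bias, and the function name `collision_probability` is hypothetical. Under that assumption, a given molecule shares its UID with at least one other molecule with probability 1 - (1 - 1/U)^(N-1) for N molecules and U possible UIDs.

```python
import math


def collision_probability(n_molecules, n_uids):
    """Probability that a given molecule shares its UID with another one.

    Assumes uniform, independent UID assignment (a simplification; real
    barcode pools can be biased).
    """
    if n_molecules <= 1:
        return 0.0  # a lone molecule has nothing to collide with
    if n_uids <= 1:
        return 1.0  # a single UID is shared by every molecule
    # 1 - (1 - 1/U)^(N-1), computed with log1p/expm1 for numerical stability.
    return -math.expm1((n_molecules - 1) * math.log1p(-1.0 / n_uids))


# The extremes of the ranges quoted in the abstract.
for uid_length in (10, 14):
    p = collision_probability(10**7, 4**uid_length)
    print(f"UID length {uid_length}: P(collision) = {p:.4f}")
```

With 10^7 molecules, this estimate suggests that a 10-base UID leaves nearly every molecule in a collision, while a 14-base UID keeps the fraction at a few percent, which is consistent in direction with the design guidance the abstract gives.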
Affiliation(s)
- Karen Cravero
- The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins Medicine, Baltimore, MD, USA
- Ben Ho Park
- The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins Medicine, Baltimore, MD, USA
11
Error rates, PCR recombination, and sampling depth in HIV-1 whole genome deep sequencing. Virus Res 2016; 239:106-114. [PMID: 28039047 DOI: 10.1016/j.virusres.2016.12.009] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 06/29/2016] [Revised: 11/25/2016] [Accepted: 12/16/2016] [Indexed: 11/20/2022]
Abstract
Deep sequencing is a powerful and cost-effective tool to characterize the genetic diversity and evolution of virus populations. While modern sequencing instruments readily cover viral genomes many thousand fold and very rare variants can in principle be detected, sequencing errors, amplification biases, and other artifacts can limit sensitivity and complicate data interpretation. For this reason, the number of studies using whole genome deep sequencing to characterize viral quasi-species in clinical samples is still limited. We have previously undertaken a large scale whole genome deep sequencing study of HIV-1 populations. Here we discuss the challenges, error profiles, control experiments, and computational test we developed to quantify the accuracy of variant frequency estimation.
12
Casadellà M, Paredes R. Deep sequencing for HIV-1 clinical management. Virus Res 2016; 239:69-81. [PMID: 27818211 DOI: 10.1016/j.virusres.2016.10.019] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 09/13/2016] [Revised: 10/10/2016] [Accepted: 10/18/2016] [Indexed: 02/05/2023]
Abstract
The emerging HIV-1 resistance epidemic is threatening the impressive global advances in HIV-1 infection treatment and prevention achieved in the last decade. Next-generation sequencing is improving our ability to understand, diagnose and prevent HIV-1 resistance, being increasingly cost-effective and more accessible. However, NGS still faces a number of limitations that need to be addressed to enable its widespread use. Here, we will review the main NGS platforms available for HIV-1 diagnosis, the factors affecting the clinical utility of NGS testing and the evidence supporting -or not- ultrasensitive genotyping over Sanger sequencing for routine HIV-1 diagnosis. Now that global HIV-1 eradication might be within our reach, making NGS accessible also to LMICs has become a priority. Reductions in sequencing costs, particularly in library preparation, and accessibility to low-cost, robust but simplified automated bioinformatic analyses of NGS data will remain essential to end the HIV-1 pandemic.
Affiliation(s)
- Maria Casadellà
- IrsiCaixa AIDS Research Institute, Badalona, Spain; Universitat Autònoma de Barcelona, Catalonia, Spain.
- Roger Paredes
- IrsiCaixa AIDS Research Institute, Badalona, Spain; Universitat Autònoma de Barcelona, Catalonia, Spain; Universitat de Vic - Central de Catalunya, Vic, Catalonia, Spain; HIV-1 Unit, Hospital Universitari Germans Trias i Pujol, Badalona, Catalonia, Spain
13
Friedensohn S, Khan TA, Reddy ST. Advanced Methodologies in High-Throughput Sequencing of Immune Repertoires. Trends Biotechnol 2016; 35:203-214. [PMID: 28341036 DOI: 10.1016/j.tibtech.2016.09.010] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 07/06/2016] [Revised: 09/19/2016] [Accepted: 09/30/2016] [Indexed: 11/19/2022]
Abstract
In recent years, major efforts have been made to develop sophisticated experimental and bioinformatic workflows for sequencing adaptive immune repertoires. The immunological insight gained has been applied to fields as varied as lymphocyte biology, immunodiagnostics, vaccines, cancer immunotherapy, and antibody engineering. In this review, we provide a detailed overview of these advanced methodologies, focusing specifically on strategies to reduce sequencing errors and bias and to achieve high-throughput pairing of variable regions (e.g., heavy-light or alpha-beta chains). In addition, we highlight recent technologies for single-cell transcriptome sequencing that can be integrated with immune repertoires. Finally, we provide a perspective on advanced immune repertoire sequencing and its ability to impact basic immunology, biopharmaceutical drug discovery and development, and cancer immunotherapy.
Affiliation(s)
- Simon Friedensohn
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Tarik A Khan
- Pharmaceutical Development & Supplies Biologics Europe, F. Hoffman-La Roche Ltd, Basel, Switzerland
- Sai T Reddy
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland.
14
Glanville J, D'Angelo S, Khan TA, Reddy ST, Naranjo L, Ferrara F, Bradbury ARM. Deep sequencing in library selection projects: what insight does it bring? Curr Opin Struct Biol 2016; 33:146-60. [PMID: 26451649 DOI: 10.1016/j.sbi.2015.09.001] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 05/19/2015] [Revised: 08/19/2015] [Accepted: 09/17/2015] [Indexed: 11/17/2022]
Abstract
High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology.
Affiliation(s)
- J Glanville
  - Program in Computational and Systems Immunology, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA, USA
- S D'Angelo
  - University of New Mexico Comprehensive Cancer Center, and Division of Molecular Medicine, University of New Mexico School of Medicine, Albuquerque, NM, USA
- T A Khan
  - ETH Zurich, Department of Biosystems Science and Engineering, Basel, Switzerland
- S T Reddy
  - ETH Zurich, Department of Biosystems Science and Engineering, Basel, Switzerland
- L Naranjo
  - Bioscience division, Los Alamos National Laboratory, Los Alamos, NM, USA
- F Ferrara
  - University of New Mexico Comprehensive Cancer Center, and Division of Molecular Medicine, University of New Mexico School of Medicine, Albuquerque, NM, USA
- A R M Bradbury
  - Bioscience division, Los Alamos National Laboratory, Los Alamos, NM, USA
|
15
|
Turchaninova MA, Davydov A, Britanova OV, Shugay M, Bikos V, Egorov ES, Kirgizova VI, Merzlyak EM, Staroverov DB, Bolotin DA, Mamedov IZ, Izraelson M, Logacheva MD, Kladova O, Plevova K, Pospisilova S, Chudakov DM. High-quality full-length immunoglobulin profiling with unique molecular barcoding. Nat Protoc 2016; 11:1599-616. [DOI: 10.1038/nprot.2016.093] [Citation(s) in RCA: 134] [Impact Index Per Article: 16.8] [Indexed: 01/19/2023]
|
16
|
Khan TA, Friedensohn S, de Vries ARG, Straszewski J, Ruscheweyh HJ, Reddy ST. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting. Science Advances 2016; 2:e1501371. [PMID: 26998518 PMCID: PMC4795664 DOI: 10.1126/sciadv.1501371] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Received: 10/02/2015] [Accepted: 01/17/2016] [Indexed: 05/31/2023]
Abstract
High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification to tag transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion, the intraclonal diversity index, which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error- and bias-corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology.
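The abstract describes the intraclonal diversity index as the number of unique transcripts associated with an antibody clone. One simple reading of that description is sketched below, under the assumption that each distinct UID observed for a clone represents one transcript. The function name `intraclonal_diversity` and the `(clone_id, uid)` input format are hypothetical and are not taken from the MAF pipeline, whose exact definition may differ.

```python
from collections import defaultdict


def intraclonal_diversity(reads):
    """Count distinct UIDs per clone.

    reads: iterable of (clone_id, uid) pairs, one pair per sequencing read.
    Assumption: one distinct UID corresponds to one tagged transcript.
    """
    uids_per_clone = defaultdict(set)
    for clone_id, uid in reads:
        # Repeated reads of the same UID are PCR copies and count once.
        uids_per_clone[clone_id].add(uid)
    return {clone: len(uids) for clone, uids in uids_per_clone.items()}


if __name__ == "__main__":
    example = [("A", "u1"), ("A", "u1"), ("A", "u2"), ("B", "u3")]
    print(intraclonal_diversity(example))
```

Counting UIDs rather than reads is what keeps PCR amplification from inflating the value for a clone.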
Affiliation(s)
- Tarik A. Khan
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Simon Friedensohn
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Jakub Straszewski
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
  - Scientific IT Services, ETH Zurich, 4058 Basel, Switzerland
- Hans-Joachim Ruscheweyh
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
  - Scientific IT Services, ETH Zurich, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- Sai T. Reddy
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
|
17
|
A benchmark study on error-correction by read-pairing and tag-clustering in amplicon-based deep sequencing. BMC Genomics 2016; 17:108. [PMID: 26868371 PMCID: PMC4751728 DOI: 10.1186/s12864-016-2388-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 09/17/2015] [Accepted: 01/08/2016] [Indexed: 11/10/2022] Open
Abstract
Background: The high error rate of next generation sequencing (NGS) restricts some of its applications, such as monitoring virus mutations and detecting rare mutations in tumors. Two commonly employed sequencing library preparation strategies improve sequencing accuracy by correcting sequencing errors: the read-pairing method and the tag-clustering method (i.e., primer ID or UID). Here, we constructed a homogeneous library from a single clone and compared the variant calling accuracy of these error-correction methods. Results: We comprehensively described the strengths and pitfalls of these methods. We found that both read-pairing and tag-clustering methods significantly decreased the sequencing error rate. While the read-pairing method was more effective than the tag-clustering method at correcting insertion and deletion errors, it was not as effective as the tag-clustering method at correcting substitution errors. In addition, we observed that when the read quality was poor, the tag-clustering method led to a large loss of coverage. We also tested the effect of applying quality score filtering to the error-correction methods and demonstrated that quality score filtering yielded a minor, yet statistically significant, improvement in the error-correction methods tested in this study. Conclusions: Our study provides a benchmark for researchers to select suitable error-correction methods based on the goal of the experiment by balancing the trade-off between sequencing cost (i.e., sequencing coverage requirement) and detection sensitivity.
|
18
|
Kou R, Lam H, Duan H, Ye L, Jongkam N, Chen W, Zhang S, Li S. Benefits and Challenges with Applying Unique Molecular Identifiers in Next Generation Sequencing to Detect Low Frequency Mutations. PLoS One 2016; 11:e0146638. [PMID: 26752634 PMCID: PMC4709065 DOI: 10.1371/journal.pone.0146638] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 09/16/2015] [Accepted: 12/21/2015] [Indexed: 11/18/2022] Open
Abstract
Indexing individual template molecules with a unique identifier (UID) before PCR and deep sequencing is promising for detecting low-frequency mutations, as true mutations can be distinguished from PCR errors or sequencing errors based on consensus among reads sharing the same index. In an effort to develop a robust assay to detect low-abundance bladder cancer cells carrying well-documented mutations in urine, we first tested the idea on a set of mock templates, with wild type and known mutants mixed at defined ratios. We measured the combined error rate for PCR and Illumina sequencing at each nucleotide position of three exons and demonstrated the power of a UID in distinguishing and correcting errors. In addition, we demonstrated that PCR sampling bias, rather than PCR errors, challenges the UID-deep sequencing method in faithfully detecting low-frequency mutations.
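The consensus step mentioned in this abstract, where reads sharing a UID are collapsed so that isolated PCR or sequencing errors are outvoted, can be illustrated with a minimal sketch. It assumes reads are already trimmed to equal length and supplied as `(uid, sequence)` pairs. The function name `uid_consensus` and the `min_reads` threshold are hypothetical choices for illustration, not parameters reported by the authors.

```python
from collections import Counter, defaultdict


def uid_consensus(reads, min_reads=3):
    """Build one majority-vote consensus sequence per UID.

    reads: iterable of (uid, sequence) pairs.
    min_reads: hypothetical minimum family size needed to call a consensus.
    """
    families = defaultdict(list)
    for uid, seq in reads:
        families[uid].append(seq)

    consensus = {}
    for uid, seqs in families.items():
        if len(seqs) < min_reads:
            continue  # too few reads to outvote an error
        if len({len(s) for s in seqs}) != 1:
            continue  # this sketch does not handle indels
        # Take the most common base in each column of the read family.
        consensus[uid] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


if __name__ == "__main__":
    example = [
        ("AAT", "ACGT"),
        ("AAT", "ACGT"),
        ("AAT", "ACTT"),  # one read carries a substitution error
        ("GGC", "ACGA"),  # singleton family, dropped
    ]
    print(uid_consensus(example))
```

A real pipeline would also need to handle errors within the UID itself and unequal read lengths, which this sketch deliberately skips.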
Affiliation(s)
- Ruqin Kou
  - Department of Development, GENEWIZ LLC, 115 Corporate Blvd., South Plainfield, NJ, 07080, United States of America
- Ham Lam
  - Department of Development, GENEWIZ LLC, 115 Corporate Blvd., South Plainfield, NJ, 07080, United States of America
- Hairong Duan
  - Department of Bioinformatics, GENEWIZ CN, 218 Xinghu Street, Suzhou, Jiangsu, 215123, China
- Li Ye
  - Department of Bioinformatics, GENEWIZ CN, 218 Xinghu Street, Suzhou, Jiangsu, 215123, China
- Narisra Jongkam
  - Department of Development, GENEWIZ LLC, 115 Corporate Blvd., South Plainfield, NJ, 07080, United States of America
- Weizhi Chen
  - Department of Bioinformatics, GENEWIZ CN, 218 Xinghu Street, Suzhou, Jiangsu, 215123, China
- Shifang Zhang
  - Department of Development, GENEWIZ LLC, 115 Corporate Blvd., South Plainfield, NJ, 07080, United States of America
- Shihong Li
  - Department of Development, GENEWIZ LLC, 115 Corporate Blvd., South Plainfield, NJ, 07080, United States of America
|
19
|
A Comprehensive Analysis of Primer IDs to Study Heterogeneous HIV-1 Populations. J Mol Biol 2015; 428:238-250. [PMID: 26711506 DOI: 10.1016/j.jmb.2015.12.012] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 08/07/2015] [Revised: 11/25/2015] [Accepted: 12/16/2015] [Indexed: 01/01/2023]
Abstract
Determining the composition of viral populations is becoming increasingly important in the field of medical virology. While recently developed computational tools for viral haplotype analysis allow for correcting sequencing errors, they do not always allow for the removal of errors occurring in the upstream experimental protocol, such as PCR errors. Primer IDs (pIDs) are one method to address this problem by harnessing redundant template resampling for error correction. By using a reference mixture of five HIV-1 strains, we show how pIDs can be useful for estimating key experimental parameters, such as the substitution rate of the PCR process and the reverse transcription (RT) error rate. In addition, we introduce a hidden Markov model for determining the recombination rate of the RT PCR process. We found no strong sequence-specific bias in pID abundances (the same RT efficiencies as compared to commonly used short, specific RT primers) and no effects of pIDs on the estimated distribution of the reference viruses.
|
20
|
Primer ID Validates Template Sampling Depth and Greatly Reduces the Error Rate of Next-Generation Sequencing of HIV-1 Genomic RNA Populations. J Virol 2015; 89:8540-55. [PMID: 26041299 DOI: 10.1128/jvi.00522-15] [Citation(s) in RCA: 94] [Impact Index Per Article: 10.4] [Received: 02/25/2015] [Accepted: 05/30/2015] [Indexed: 12/29/2022] Open
Abstract
Validating the sampling depth and reducing sequencing errors are critical for studies of viral populations using next-generation sequencing (NGS). We previously described the use of Primer ID to tag each viral RNA template with a block of degenerate nucleotides in the cDNA primer. We now show that low-abundance Primer IDs (offspring Primer IDs) are generated due to PCR/sequencing errors. These artifactual Primer IDs can be removed using a cutoff model for the number of reads required to make a template consensus sequence. We have modeled the fraction of sequences lost due to Primer ID resampling. For a typical sequencing run, less than 10% of the raw reads are lost to offspring Primer ID filtering and resampling. The remaining raw reads are used to correct for PCR resampling and sequencing errors. We also demonstrate that Primer ID reveals bias intrinsic to PCR, especially at low template input or utilization. cDNA synthesis and PCR convert ca. 20% of RNA templates into recoverable sequences, and 30-fold sequence coverage recovers most of these template sequences. We have directly measured the residual error rate to be around 1 in 10,000 nucleotides. We use this error rate and the Poisson distribution to define the cutoff to identify preexisting drug resistance mutations at low abundance in an HIV-infected subject. Collectively, these studies show that >90% of the raw sequence reads can be used to validate template sampling depth and to dramatically reduce the error rate in assessing a genetically diverse viral population using NGS. Importance: Although next-generation sequencing (NGS) has revolutionized sequencing strategies, it suffers from serious limitations in defining sequence heterogeneity in a genetically diverse population, such as HIV-1, due to PCR resampling and PCR/sequencing errors. The Primer ID approach reveals the true sampling depth and greatly reduces errors. Knowing the sampling depth allows the construction of a model of how to maximize the recovery of sequences from input templates and to reduce resampling of the Primer ID so that appropriate multiplexing can be included in the experimental design. With the defined sampling depth and measured error rate, we are able to assign cutoffs for the accurate detection of minority variants in viral populations. This approach allows the power of NGS to be realized without having to guess about sampling depth or to ignore the problem of PCR resampling, while also being able to correct most of the errors in the data set.
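The abstract states that a residual error rate of about 1 in 10,000 nucleotides and the Poisson distribution are used to define a cutoff for calling low-abundance mutations. A minimal sketch of that kind of calculation follows. It assumes that the number of residual errors at one position across `n_templates` consensus sequences is Poisson with mean `n_templates * error_rate`. The function name `poisson_cutoff` and the significance level `alpha` are hypothetical, and the published method may additionally correct for multiple positions.

```python
import math


def poisson_cutoff(n_templates, error_rate=1e-4, alpha=0.05):
    """Smallest count k such that P(X >= k) < alpha for X ~ Poisson(lam).

    lam = n_templates * error_rate is the expected number of residual
    errors at a single position. alpha is an assumed significance level.
    """
    lam = n_templates * error_rate
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0              # P(X <= k - 1), starts empty for k = 0
    k = 0
    while True:
        tail = 1.0 - cdf   # P(X >= k)
        if tail < alpha:
            return k
        cdf += term        # add P(X = k)
        k += 1
        term *= lam / k    # advance to P(X = k) for the new k


if __name__ == "__main__":
    for depth in (1000, 10000):
        print(depth, poisson_cutoff(depth))
```

Under these assumptions, a mutation would need to be seen in at least 2 of 1,000 template consensus sequences, or at least 4 of 10,000, before residual error becomes an unlikely explanation.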
|