1
Sidstedt M, Gynnå AH, Kiesler KM, Jansson L, Steffen CR, Håkansson J, Johansson G, Österlund T, Bogestål Y, Tillmar A, Rådström P, Ståhlberg A, Vallone PM, Hedman J. Ultrasensitive sequencing of STR markers utilizing unique molecular identifiers and the SiMSen-Seq method. Forensic Sci Int Genet 2024; 71:103047. [PMID: 38598919 DOI: 10.1016/j.fsigen.2024.103047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 03/27/2024] [Accepted: 04/01/2024] [Indexed: 04/12/2024]
Abstract
Massively parallel sequencing (MPS) is increasingly applied in forensic short tandem repeat (STR) analysis. The presence of stutter artefacts and other PCR or sequencing errors in the MPS-STR data partly limits the detection of low DNA amounts, e.g., in complex mixtures. Unique molecular identifiers (UMIs) have been applied in several scientific fields to reduce noise in sequencing. UMIs consist of a stretch of random nucleotides, a unique barcode for each starting DNA molecule, that is incorporated in the DNA template using either ligation or PCR. The barcode is used to generate consensus reads, thus removing errors. The SiMSen-Seq (Simple, multiplexed, PCR-based barcoding of DNA for sensitive mutation detection using sequencing) method relies on PCR-based introduction of UMIs and includes a sophisticated hairpin design to reduce unspecific primer binding as well as PCR protocol adjustments to further optimize the reaction. In this study, SiMSen-Seq is applied to develop a proof-of-concept seven STR multiplex for MPS library preparation and an associated bioinformatics pipeline. Additionally, machine learning (ML) models were evaluated to further improve UMI allele calling. Overall, the seven STR multiplex resulted in complete detection and concordant alleles for 47 single-source samples at 1 ng input DNA as well as for low-template samples at 62.5 pg input DNA. For twelve challenging mixtures with minor contributions of 10 pg to 150 pg and ratios of 1-15% relative to the major donor, 99.2% of the expected alleles were detected by applying the UMIs in combination with an ML filter. The main impact of UMIs was a substantially lowered number of artefacts as well as reduced stutter ratios, which were generally below 5% of the parental allele. In conclusion, UMI-based STR sequencing opens new means for improved analysis of challenging crime scene samples including complex mixtures.
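The consensus step described in this abstract can be illustrated with a minimal sketch. It is not the SiMSen-Seq pipeline: the function name `umi_consensus`, the `min_family_size` parameter, the strict-majority rule, and the toy reads are all assumptions made for illustration. Reads sharing a UMI are treated as copies of one starting molecule, and the majority sequence in each family is taken as the consensus.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=2):
    """Group (umi, sequence) pairs by UMI and keep one majority sequence per family.

    Hypothetical helper: families smaller than min_family_size, or without a
    strict-majority sequence, are discarded.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to call a consensus
        seq, count = Counter(seqs).most_common(1)[0]
        if count * 2 > len(seqs):  # strict majority within the family
            consensus[umi] = seq
    return consensus


# Toy data (invented): two templates plus one stutter-like read and one singleton.
reads = [
    ("ACGT", "TCTATCTATCTA"),
    ("ACGT", "TCTATCTATCTA"),
    ("ACGT", "TCTATCTA"),          # shorter, stutter-like product in the same family
    ("GGCA", "TCTATCTATCTATCTA"),
    ("GGCA", "TCTATCTATCTATCTA"),
    ("TTAG", "TCTATCTA"),          # singleton family, dropped
]
result = umi_consensus(reads)
allele_counts = Counter(result.values())
```

In this toy case the stutter-like read is outvoted inside its family, so each consensus read counts one starting molecule. Real pipelines would also need to tolerate sequencing errors within the UMI itself, which this sketch ignores.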
Affiliation(s)
- Maja Sidstedt
  - National Forensic Centre, Swedish Police Authority, Linköping SE-581 94, Sweden
- Arvid H Gynnå
  - National Forensic Centre, Swedish Police Authority, Linköping SE-581 94, Sweden
- Kevin M Kiesler
  - National Institute of Standards and Technology, 100 Bureau Drive, M/S 8314, Gaithersburg, MD 20899, USA
- Linda Jansson
  - National Forensic Centre, Swedish Police Authority, Linköping SE-581 94, Sweden; Applied Microbiology, Department of Chemistry, Lund University, Lund SE-221 00, Sweden
- Carolyn R Steffen
  - National Institute of Standards and Technology, 100 Bureau Drive, M/S 8314, Gaithersburg, MD 20899, USA
- Joakim Håkansson
  - RISE Unit of Biological Function, Division Materials and Production, Box 857, Borås SE-501 15, Sweden; Department of Laboratory Medicine, Institute of Biomedicine, University of Gothenburg, Gothenburg SE-405 30, Sweden; Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg SE-405 30, Sweden
- Gustav Johansson
  - SIMSEN Diagnostics, Sahlgrenska Science Park, Gothenburg, Sweden
- Tobias Österlund
  - Department of Laboratory Medicine, Sahlgrenska Center for Cancer Research, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Medicinaregatan 1F, Gothenburg 41390, Sweden; Wallenberg Center for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden; Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, Gothenburg, Region Västra Götaland 41390, Sweden
- Yalda Bogestål
  - RISE Unit of Biological Function, Division Materials and Production, Box 857, Borås SE-501 15, Sweden
- Andreas Tillmar
  - Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping SE-587 58, Sweden
- Peter Rådström
  - Applied Microbiology, Department of Chemistry, Lund University, Lund SE-221 00, Sweden
- Anders Ståhlberg
  - Department of Laboratory Medicine, Sahlgrenska Center for Cancer Research, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Medicinaregatan 1F, Gothenburg 41390, Sweden; Wallenberg Center for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden; Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, Gothenburg, Region Västra Götaland 41390, Sweden
- Peter M Vallone
  - National Institute of Standards and Technology, 100 Bureau Drive, M/S 8314, Gaithersburg, MD 20899, USA
- Johannes Hedman
  - National Forensic Centre, Swedish Police Authority, Linköping SE-581 94, Sweden; Applied Microbiology, Department of Chemistry, Lund University, Lund SE-221 00, Sweden
2
Staadig A, Hedman J, Tillmar A. Applying Unique Molecular Indices with an Extensive All-in-One Forensic SNP Panel for Improved Genotype Accuracy and Sensitivity. Genes (Basel) 2023; 14:genes14040818. [PMID: 37107576 PMCID: PMC10137749 DOI: 10.3390/genes14040818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Revised: 03/23/2023] [Accepted: 03/27/2023] [Indexed: 03/31/2023] [Open access]
Abstract
One of the major challenges in forensic genetics is being able to detect very small amounts of DNA. Massively parallel sequencing (MPS) enables sensitive detection; however, genotype errors may exist and could interfere with the interpretation. Common errors in MPS-based analysis are often induced during PCR or sequencing. Unique molecular indices (UMIs) are short random nucleotide sequences ligated to each template molecule prior to amplification. Applying UMIs can improve the limit of detection by enabling accurate counting of initial template molecules and removal of erroneous data. In this study, we applied the FORCE panel, which includes ~5500 SNPs, with a QIAseq Targeted DNA Custom Panel (Qiagen), including UMIs. Our main objective was to investigate whether UMIs can enhance the sensitivity and accuracy of forensic genotyping and to evaluate the overall assay performance. We analyzed the data both with and without the UMI information, and the results showed that both genotype accuracy and sensitivity were improved when applying UMIs. The results showed very high genotype accuracies (>99%) for both reference DNA and challenging samples, down to 125 pg. To conclude, we show successful assay performance for several forensic applications and improvements in forensic genotyping when applying UMIs.
3
Using unique molecular identifiers to improve allele calling in low-template mixtures. Forensic Sci Int Genet 2023; 63:102807. [PMID: 36462297 DOI: 10.1016/j.fsigen.2022.102807] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/14/2022] [Revised: 10/20/2022] [Accepted: 11/18/2022] [Indexed: 11/27/2022]
Abstract
PCR artifacts are an ever-present challenge in sequencing applications. These artifacts can seriously limit the analysis and interpretation of low-template samples and mixtures, especially with respect to a minor contributor. In medicine, molecular barcoding techniques have been employed to decrease the impact of PCR error and to allow the examination of low-abundance somatic variation. In principle, it should be possible to apply the same techniques to the forensic analysis of mixtures. To that end, several short tandem repeat loci were selected for targeted sequencing, and a bioinformatic pipeline for analyzing the sequence data was developed. The pipeline notes the relevant unique molecular identifiers (UMIs) attached to each read and, using machine learning, filters the noise products out of the set of potential alleles. To evaluate this pipeline, DNA from pairs of individuals was mixed at different ratios (1-1, 1-9) and sequenced with different starting amounts of DNA (10, 1 and 0.1 ng). Naïvely using the information in the molecular barcodes led to increased performance, with the machine learning resulting in an additional benefit. In concrete terms, using the UMI data results in less noise for a given amount of dropout. For instance, if thresholds are selected that filter out a quarter of the true alleles, using read counts accepts 2381 noise alleles and using raw UMI counts accepts 1726 noise alleles, while the machine learning approach only accepts 307.
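The comparison at the end of this abstract (noise accepted at a fixed dropout level) can be sketched as follows. This is an illustrative reading, not the authors' pipeline: `noise_at_dropout` is a hypothetical helper, the scores stand in for read counts, UMI counts, or a model score, and the toy numbers are invented.

```python
def noise_at_dropout(true_scores, noise_scores, dropout_fraction):
    """Pick the threshold that filters out dropout_fraction of the true alleles,
    then count how many noise alleles still pass it.

    Hypothetical helper; higher scores are assumed to mean "more allele-like".
    """
    ranked = sorted(true_scores)
    n_drop = int(len(ranked) * dropout_fraction)
    if n_drop >= len(ranked):
        raise ValueError("dropout_fraction would remove every true allele")
    threshold = ranked[n_drop]  # lowest true score that is still retained
    accepted_noise = sum(1 for s in noise_scores if s >= threshold)
    return threshold, accepted_noise


# Toy scores (invented for illustration only).
true_scores = [5, 8, 12, 20, 25, 30, 40, 50]
noise_scores = [1, 2, 3, 4, 6, 9, 13, 15]
threshold, accepted = noise_at_dropout(true_scores, noise_scores, 0.25)
```

Running the same function on scores from different scoring methods, at the same `dropout_fraction`, gives the kind of like-for-like noise comparison the abstract reports.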
4
Cheng C, Fei Z, Xiao P. Methods to improve the accuracy of next-generation sequencing. Front Bioeng Biotechnol 2023; 11:982111. [PMID: 36741756 PMCID: PMC9895957 DOI: 10.3389/fbioe.2023.982111] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 06/30/2022] [Accepted: 01/11/2023] [Indexed: 01/21/2023] [Open access]
Abstract
Next-generation sequencing (NGS) is present in all fields of life science; it has greatly promoted the development of basic research and is gradually being applied in clinical diagnosis. However, the cost and throughput advantages of next-generation sequencing are offset by large tradeoffs with respect to read length and accuracy. Specifically, its high error rate makes it extremely difficult to detect SNPs or low-abundance mutations, limiting its clinical applications, such as pharmacogenomics studies primarily based on SNPs and early clinical diagnosis primarily based on low-abundance mutations. Currently, Sanger sequencing is still considered to be the gold standard due to its high accuracy, so the results of next-generation sequencing require verification by Sanger sequencing in clinical practice. In order to maintain high-quality next-generation sequencing data, a variety of improvements at the levels of template preparation, sequencing strategy and data processing have been developed. This study summarizes the general procedures of next-generation sequencing platforms, highlighting the improvements involved in eliminating errors at each step. Furthermore, the challenges and future development of next-generation sequencing in clinical application are discussed.
5
De Luca G, Dono M. The Opportunities and Challenges of Molecular Tagging Next-Generation Sequencing in Liquid Biopsy. Mol Diagn Ther 2021; 25:537-547. [PMID: 34224097 DOI: 10.1007/s40291-021-00542-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 06/20/2021] [Indexed: 10/20/2022]
Abstract
Liquid biopsy (LB) is a promising tool that is rapidly evolving as a standard of care in early and advanced stages of cancer settings. Next-generation sequencing (NGS) methods have become essential in molecular diagnostics and clinical laboratories dealing with LB analytes, i.e., cell-free DNA and RNA. The sensitivity and high-throughput capacity of NGS enable us to overcome technical issues that are mainly attributable to low-abundance (below 1% mutated allelic frequency) tumour genetic material circulating within biological fluids. In this context, the introduction of unique molecular identifiers (UMIs), also known as molecular barcodes, applied to various NGS platforms greatly improved the characterization of rare genetic alterations, as they resulted in a drastic reduction in background noise while maintaining high levels of positive predictive value and sensitivity. Different UMI strategies have been developed, such as single (e.g., safe-sequencing system, Safe-SeqS) or double (duplex-sequencing system, Duplex-Seq) strand-based labelling, and, currently, considerable results corroborate their potential implementation in a routine laboratory. Recently, the US Food and Drug Administration approved the clinical use of two comprehensive UMI-based NGS assays (FoundationOne Liquid CDx and Guardant360 CDx) in cfDNA mutational assessment. However, to definitively translate LB into clinical practice, UMI-based NGS protocols should meet certain feasibility requirements in terms of cost-effectiveness, wet laboratory performance and easy access to web-source and bioinformatic tools for downstream molecular data.
Affiliation(s)
- Giuseppa De Luca
  - Molecular Diagnostic Unit, IRCCS Ospedale Policlinico San Martino, 16132, Genova, Italy
- Mariella Dono
  - Molecular Diagnostic Unit, IRCCS Ospedale Policlinico San Martino, 16132, Genova, Italy
6
Anastasakis DG, Jacob A, Konstantinidou P, Meguro K, Claypool D, Cekan P, Haase AD, Hafner M. A non-radioactive, improved PAR-CLIP and small RNA cDNA library preparation protocol. Nucleic Acids Res 2021; 49:e45. [PMID: 33503264 DOI: 10.1093/nar/gkab011] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/27/2020] [Revised: 12/28/2020] [Accepted: 01/06/2021] [Indexed: 02/06/2023] [Open access]
Abstract
Crosslinking and immunoprecipitation (CLIP) methods are powerful techniques to interrogate direct protein-RNA interactions and dissect posttranscriptional gene regulatory networks. One widely used CLIP variant is photoactivatable ribonucleoside enhanced CLIP (PAR-CLIP) that involves in vivo labeling of nascent RNAs with the photoreactive nucleosides 4-thiouridine (4SU) or 6-thioguanosine (6SG), which can efficiently crosslink to interacting proteins using UVA and UVB light. Crosslinking of 4SU or 6SG to interacting amino acids changes their base-pairing properties and results in characteristic mutations in cDNA libraries prepared for high-throughput sequencing, which can be computationally exploited to remove abundant background from non-crosslinked sequences and help pinpoint RNA binding protein binding sites at nucleotide resolution on a transcriptome-wide scale. Here we present a streamlined protocol for fluorescence-based PAR-CLIP (fPAR-CLIP) that eliminates the need to use radioactivity. It is based on direct ligation of a fluorescently labeled adapter to the 3' end of crosslinked RNA on immobilized ribonucleoproteins, followed by isolation of the adapter-ligated RNA and efficient conversion into cDNA without the previously needed size fractionation on denaturing polyacrylamide gels. These improvements cut the experimentation by half to 2 days and increase sensitivity by 10-100-fold.
Affiliation(s)
- Dimitrios G Anastasakis
  - Laboratory of Muscle Stem Cells and Gene Regulation, National Institute for Arthritis and Musculoskeletal and Skin Disease, National Institutes of Health, Bethesda, 20892 MD, USA
- Alexis Jacob
  - Laboratory of Muscle Stem Cells and Gene Regulation, National Institute for Arthritis and Musculoskeletal and Skin Disease, National Institutes of Health, Bethesda, 20892 MD, USA
- Parthena Konstantinidou
  - Laboratory of Cellular and Molecular Biology, National Institutes of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, 20892 MD, USA
- Kazuyuki Meguro
  - Laboratory of Clinical Immunology & Microbiology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, 20892 MD, USA
- Duncan Claypool
  - Laboratory of Muscle Stem Cells and Gene Regulation, National Institute for Arthritis and Musculoskeletal and Skin Disease, National Institutes of Health, Bethesda, 20892 MD, USA
- Pavol Cekan
  - MultiplexDX s.r.o., 841 04 Bratislava, Slovakia
- Astrid D Haase
  - Laboratory of Cellular and Molecular Biology, National Institutes of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, 20892 MD, USA
- Markus Hafner
  - Laboratory of Muscle Stem Cells and Gene Regulation, National Institute for Arthritis and Musculoskeletal and Skin Disease, National Institutes of Health, Bethesda, 20892 MD, USA
7
Woerner AE, Mandape S, King JL, Muenzler M, Crysup B, Budowle B. Reducing noise and stutter in short tandem repeat loci with unique molecular identifiers. Forensic Sci Int Genet 2020; 51:102459. [PMID: 33429137 DOI: 10.1016/j.fsigen.2020.102459] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/11/2020] [Revised: 10/28/2020] [Accepted: 12/21/2020] [Indexed: 12/24/2022]
Abstract
Unique molecular identifiers (UMIs) are a promising approach to contend with errors generated during PCR and massively parallel sequencing (MPS). With UMI technology, random molecular barcodes are ligated to template DNA molecules prior to PCR, allowing PCR and sequencing error to be tracked and corrected bioinformatically. UMIs have the potential to be particularly informative for the interpretation of short tandem repeats (STRs). Traditional MPS approaches may simply lead to the observation of alleles that are consistent with the hypotheses of stutter, while with UMIs stutter products may be bioinformatically re-associated with their parental alleles and subsequently removed. Herein, a bioinformatics pipeline named strumi is described that is designed for the analysis of STRs that are tagged with UMIs. Unlike other tools, strumi is an alignment-free, machine learning-driven algorithm that clusters individual MPS reads into UMI families, infers consensus super-reads that represent each family and provides an estimate of the resulting haplotype's accuracy. Super-reads, in turn, approximate independent measurements not of the PCR products, but of the original template molecules, both in terms of quantity and sequence identity. Provisional assessments show that naïve threshold-based approaches generate super-reads that are accurate (∼97 % haplotype accuracy, compared to ∼78 % when UMIs are not used), and the application of a more nuanced machine learning approach increases the accuracy to ∼99.5 % depending on the level of certainty desired. With these features, UMIs may greatly simplify probabilistic genotyping systems and reduce uncertainty. However, the ability to interpret alleles at trace levels also permits the interpretation, characterization and quantification of contamination as well as somatic variation (including somatic stutter), which may present newfound challenges.
Affiliation(s)
- August E Woerner
  - Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA; Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA
- Sammed Mandape
  - Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA
- Jonathan L King
  - Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA
- Melissa Muenzler
  - Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA
- Benjamin Crysup
  - Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA
- Bruce Budowle
  - Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA; Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA
8
Zurek PJ, Knyphausen P, Neufeld K, Pushpanath A, Hollfelder F. UMI-linked consensus sequencing enables phylogenetic analysis of directed evolution. Nat Commun 2020; 11:6023. [PMID: 33243970 PMCID: PMC7691348 DOI: 10.1038/s41467-020-19687-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/22/2020] [Accepted: 10/12/2020] [Indexed: 11/09/2022] [Open access]
Abstract
The success of protein evolution campaigns is strongly dependent on the sequence context in which mutations are introduced, stemming from pervasive non-additive interactions between a protein's amino acids ('intra-gene epistasis'). Our limited understanding of such epistasis hinders the correct prediction of the functional contributions and adaptive potential of mutations. Here we present a straightforward unique molecular identifier (UMI)-linked consensus sequencing workflow (UMIC-seq) that simplifies mapping of evolutionary trajectories based on full-length sequences. Attaching UMIs to gene variants allows accurate consensus generation for closely related genes with nanopore sequencing. We exemplify the utility of this approach by reconstructing the artificial phylogeny emerging in three rounds of directed evolution of an amine dehydrogenase biocatalyst via ultrahigh throughput droplet screening. Uniquely, we are able to identify lineages and their founding variant, as well as non-additive interactions between mutations within a full gene showing sign epistasis. Access to deep and accurate long reads will facilitate prediction of key beneficial mutations and adaptive potential based on in silico analysis of large sequence datasets.
Affiliation(s)
- Paul Jannis Zurek
  - Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
  - Johnson Matthey Plc, Cambridge, CB4 0WE, UK
- Philipp Knyphausen
  - Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Katharina Neufeld
  - Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
  - Johnson Matthey Plc, Cambridge, CB4 0WE, UK
- Florian Hollfelder
  - Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
9
Meiser LC, Koch J, Antkowiak PL, Stark WJ, Heckel R, Grass RN. DNA synthesis for true random number generation. Nat Commun 2020; 11:5869. [PMID: 33208744 PMCID: PMC7675991 DOI: 10.1038/s41467-020-19757-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 06/03/2020] [Accepted: 10/28/2020] [Indexed: 11/09/2022] [Open access]
Abstract
The volume of securely encrypted data transmission required by today's network complexity of people, transactions and interactions increases continuously. To guarantee security of encryption and decryption schemes for exchanging sensitive information, large volumes of true random numbers are required. Here we present a method to exploit the stochastic nature of chemistry by synthesizing DNA strands composed of random nucleotides. We compare three commercial random DNA syntheses giving a measure for robustness and synthesis distribution of nucleotides and show that using DNA for random number generation, we can obtain 7 million GB of randomness from one synthesis run, which can be read out using state-of-the-art sequencing technologies at rates of ca. 300 kB/s. Using the von Neumann algorithm for data compression, we remove bias introduced from human or technological sources and assess randomness using NIST's statistical test suite.
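The von Neumann step mentioned in this abstract can be sketched in a few lines. This is the textbook debiasing procedure, not the authors' code: bits are read in non-overlapping pairs, unequal pairs emit their first bit, and equal pairs are discarded. The nucleotide-to-bit mapping in `bases_to_bits` is an assumption made only for illustration.

```python
def von_neumann_extract(bits):
    """Classic von Neumann debiasing: (0,1) -> 0, (1,0) -> 1, equal pairs dropped."""
    out = []
    for i in range(0, len(bits) - 1, 2):  # non-overlapping pairs; a trailing odd bit is ignored
        first, second = bits[i], bits[i + 1]
        if first != second:
            out.append(first)
    return out


def bases_to_bits(sequence):
    """Hypothetical mapping (an assumption, not from the paper): A/C -> 0, G/T -> 1."""
    mapping = {"A": 0, "C": 0, "G": 1, "T": 1}
    return [mapping[base] for base in sequence]


debiased = von_neumann_extract([0, 1, 1, 0, 1, 1, 0, 0, 1, 0])
from_dna = von_neumann_extract(bases_to_bits("AGTCGGCA"))
```

The output is unbiased only if the input bits are independent with a fixed bias, and the procedure discards at least half of the input, which is the price paid for removing the bias.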
Affiliation(s)
- Linda C Meiser
  - Department of Chemistry and Applied Biosciences, Institute for Chemical and Bioengineering, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093, Zurich, Switzerland
- Julian Koch
  - Department of Chemistry and Applied Biosciences, Institute for Chemical and Bioengineering, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093, Zurich, Switzerland
- Philipp L Antkowiak
  - Department of Chemistry and Applied Biosciences, Institute for Chemical and Bioengineering, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093, Zurich, Switzerland
- Wendelin J Stark
  - Department of Chemistry and Applied Biosciences, Institute for Chemical and Bioengineering, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093, Zurich, Switzerland
- Reinhard Heckel
  - Department of Electrical and Computer Engineering, Technical University of Munich, Arcistrasse 21, 80333, Munich, Germany
- Robert N Grass
  - Department of Chemistry and Applied Biosciences, Institute for Chemical and Bioengineering, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093, Zurich, Switzerland
10
Next-Generation Sequencing in High-Sensitive Detection of Mutations in Tumors: Challenges, Advances, and Applications. J Mol Diagn 2020; 22:994-1007. [PMID: 32480002 DOI: 10.1016/j.jmoldx.2020.04.213] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 01/20/2020] [Revised: 03/17/2020] [Accepted: 04/23/2020] [Indexed: 02/06/2023] [Open access]
Abstract
Next-generation sequencing (NGS) technologies have come of age as preferred technologies for screening of genomic variants of pathologic and therapeutic potential. Because of their capability for high-throughput and massively parallel sequencing, they can screen for a variety of genomic changes in multiple samples simultaneously. This has made them platforms of choice for clinical testing of solid tumors and hematological malignancies. Consequently, they are increasingly replacing conventional technologies, such as Sanger sequencing and pyrosequencing, expression arrays, real-time PCR, and fluorescence in situ hybridization methods, for routine molecular testing of tumors. However, one limitation of routinely used NGS technologies is the inability to detect low-level genomic variants with high accuracy. This can be attributed to the frequent occurrence of low-level sequencing errors and artifacts in NGS workflow that need specialized approaches to be identified and eliminated. This review focuses on the origins and nature of these artifacts and recent improvements in the NGS technologies to overcome them to facilitate accurate high-sensitive detection of low-level mutations. Potential applications of high-sensitive NGS in oncology and comparisons with non-NGS technologies of similar capabilities are also summarized.
11
Comparison of Target Enrichment Platforms for Circulating Tumor DNA Detection. Sci Rep 2020; 10:4124. [PMID: 32139724 PMCID: PMC7057974 DOI: 10.1038/s41598-020-60375-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/23/2019] [Accepted: 02/04/2020] [Indexed: 12/20/2022] [Open access]
Abstract
Cancer-related mortality of solid tumors remains the major cause of death worldwide. Circulating tumor DNA (ctDNA) released from cancer cells harbors specific somatic mutations. Sequencing ctDNA opens opportunities for non-invasive population screening and lays foundations for personalized therapy. In this study, two commercially available platforms, Roche's Avenio ctDNA Expanded panel and Qiagen's QIAseq Human Comprehensive Cancer panel, were compared for (1) panel coverage of clinically relevant variants; (2) target enrichment specificity and sequencing performance; (3) sensitivity; (4) concordance and (5) sequencing coverage using the same human blood sample with ultra-deep next-generation sequencing. Our findings suggest that Avenio detected somatic mutations in common cancers in over 70% of patients while QIAseq covered nearly 90% with a higher average number of variants per patient (Avenio: 3; QIAseq: 8 variants per patient). Both panels demonstrated similar on-target rate and percentage of reads mapped. However, Avenio had more uniform sequencing coverage across regions with different GC content. Avenio had a higher sensitivity and concordance compared with QIAseq at the same sequencing depth. This study identifies a unique niche for the application of each panel and allows the scientific community to make an informed decision on the technologies to meet research or application needs.
12
Schmidt L, Werner S, Kemmer T, Niebler S, Kristen M, Ayadi L, Johe P, Marchand V, Schirmeister T, Motorin Y, Hildebrandt A, Schmidt B, Helm M. Graphical Workflow System for Modification Calling by Machine Learning of Reverse Transcription Signatures. Front Genet 2019; 10:876. [PMID: 31608115 PMCID: PMC6774277 DOI: 10.3389/fgene.2019.00876] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/17/2019] [Accepted: 08/21/2019] [Indexed: 01/28/2023] [Open access]
Abstract
Modification mapping from cDNA data has become a tremendously important approach in epitranscriptomics. So-called reverse transcription signatures in cDNA contain information on the position and nature of their causative RNA modifications. Data mining of, e.g. Illumina-based high-throughput sequencing data, is therefore fast growing in importance, and the field is still lacking effective tools. Here we present a versatile user-friendly graphical workflow system for modification calling based on machine learning. The workflow commences with a principal module for trimming, mapping, and postprocessing. The latter includes a quantification of mismatch and arrest rates with single-nucleotide resolution across the mapped transcriptome. Further downstream modules include tools for visualization, machine learning, and modification calling. From the machine-learning module, quality assessment parameters are provided to gauge the suitability of the initial dataset for effective machine learning and modification calling. This output is useful to improve the experimental parameters for library preparation and sequencing. In summary, the automation of the bioinformatics workflow allows a faster turnaround of the optimization cycles in modification calling.
Affiliation(s)
- Lukas Schmidt
  - Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany
- Stephan Werner
  - Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany
- Thomas Kemmer
  - Institute of Computer Science, Scientific Computing and Bioinformatics, Johannes Gutenberg-University, Mainz, Germany
- Stefan Niebler
  - Institute of Computer Science, High Performance Computing, Johannes Gutenberg-University, Mainz, Germany
- Marco Kristen
  - Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany
- Lilia Ayadi
  - Next-Generation Sequencing Core Facility UMS2008 IBSLor CNRS-UL-INSERM, Biopôle, University of Lorraine, Vandœuvre-lès-Nancy, France; IMoPA UMR7365 CNRS-UL, Biopôle, University of Lorraine, Vandœuvre-lès-Nancy, France
- Patrick Johe
  - Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany
- Virginie Marchand
  - Next-Generation Sequencing Core Facility UMS2008 IBSLor CNRS-UL-INSERM, Biopôle, University of Lorraine, Vandœuvre-lès-Nancy, France
- Tanja Schirmeister
  - Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany
- Yuri Motorin
  - Next-Generation Sequencing Core Facility UMS2008 IBSLor CNRS-UL-INSERM, Biopôle, University of Lorraine, Vandœuvre-lès-Nancy, France; IMoPA UMR7365 CNRS-UL, Biopôle, University of Lorraine, Vandœuvre-lès-Nancy, France
- Andreas Hildebrandt
  - Institute of Computer Science, Scientific Computing and Bioinformatics, Johannes Gutenberg-University, Mainz, Germany
- Bertil Schmidt
  - Institute of Computer Science, High Performance Computing, Johannes Gutenberg-University, Mainz, Germany
- Mark Helm
  - Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany
13
|
Kumar V, Rosenbaum J, Wang Z, Forcier T, Ronemus M, Wigler M, Levy D. Partial bisulfite conversion for unique template sequencing. Nucleic Acids Res 2019; 46:e10. [PMID: 29161423 PMCID: PMC5778454 DOI: 10.1093/nar/gkx1054] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/25/2017] [Accepted: 10/18/2017] [Indexed: 11/15/2022]
Abstract
We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135 000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone.
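The read-clustering idea in this abstract can be illustrated with a minimal Python sketch. It assumes error-free sequencing and full-length copies (no fragmentation); the reference sequence, the 50% conversion rate, and the function names are hypothetical and are not taken from the muSeq software.

```python
import random
from collections import Counter

def convert(template, rate, rng):
    """Convert each C to T independently with probability `rate` (one strand only)."""
    return "".join("T" if base == "C" and rng.random() < rate else base
                   for base in template)

def count_templates(reads):
    """Cluster reads by identical conversion pattern; one cluster per inferred template."""
    return Counter(reads)

rng = random.Random(0)
reference = "ACCGTCCATCGCCTACCGGACTCCATGCCGTACCGCTCCA"  # hypothetical sequence
n_templates, copies_per_template = 20, 5

# Each initial template acquires its own signature, which every copy inherits.
templates = [convert(reference, 0.5, rng) for _ in range(n_templates)]
reads = [t for t in templates for _ in range(copies_per_template)]
rng.shuffle(reads)

clusters = count_templates(reads)
print(len(clusters), "templates inferred from", len(reads), "reads")
```

Two templates that happen to receive the same conversion pattern would be merged into one cluster, so the inferred count is a lower bound on the true template number in this simplified setting.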
Affiliation(s)
- Vijay Kumar
  - Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724 USA
- Julie Rosenbaum
  - Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724 USA
- Zihua Wang
  - Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724 USA
- Talitha Forcier
  - Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724 USA
- Michael Ronemus
  - Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724 USA
- Michael Wigler
  - Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724 USA
- Dan Levy
  - Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724 USA
|
14
|
López-Santibáñez-Jácome L, Avendaño-Vázquez SE, Flores-Jasso CF. The Pipeline Repertoire for Ig-Seq Analysis. Front Immunol 2019; 10:899. [PMID: 31114573 PMCID: PMC6503734 DOI: 10.3389/fimmu.2019.00899] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 12/23/2018] [Accepted: 04/08/2019] [Indexed: 11/22/2022]
Abstract
With the advent of high-throughput sequencing of immunoglobulin genes (Ig-Seq), the understanding of antibody repertoires and their dynamics among individuals and populations has become an exciting area of research. There is an increasing number of computational tools that aid in every step of the immune repertoire characterization. However, since not all tools function identically, every pipeline has its unique rationale and capabilities, creating a rich blend of useful features that may appear intimidating for newcomer laboratories with the desire to plunge into immune repertoire analysis to expand and improve their research; hence, all pipeline strengths and differences may not seem evident. In this review we provide a practical and organized list of the current set of computational tools, focusing on their most attractive features and differences in order to carry out the characterization of antibody repertoires so that the reader better decides a strategic approach for the experimental design, and computational pathways for the analyses of immune repertoires.
Affiliation(s)
- Laura López-Santibáñez-Jácome
  - Consorcio de Metabolismo de RNA, Instituto Nacional de Medicina Genómica, Mexico City, Mexico
  - Maestría en Ciencia de Datos, Instituto Tecnológico Autónomo de México, Mexico City, Mexico
|
15
|
Narayan A, Johnkennedy R, Zakaria M, Lee V, Patel AA. META RNA profiling: Multiplexed quantitation of targeted RNAs across large numbers of samples. Methods 2019; 152:41-47. [PMID: 30308315 DOI: 10.1016/j.ymeth.2018.09.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/14/2018] [Revised: 09/06/2018] [Accepted: 09/30/2018] [Indexed: 01/18/2023]
Abstract
META RNA profiling is a simple and inexpensive method to measure the expression of multiple targeted RNAs across many samples. By assigning sample-specific tags up-front during reverse-transcription, cDNAs from multiple samples can be pooled prior to amplification and deep sequencing. Such early parallelization of samples simplifies the workflow, minimizes cross-sample experimental variability, and reduces reagent and sequencing costs. Herein we describe the theoretical framework of the method and provide a detailed protocol to facilitate its implementation.
Affiliation(s)
- Azeet Narayan
  - Department of Therapeutic Radiology, Yale University, New Haven, CT, United States
- Rofina Johnkennedy
  - Department of Therapeutic Radiology, Yale University, New Haven, CT, United States
- Maheen Zakaria
  - Department of Therapeutic Radiology, Yale University, New Haven, CT, United States
- Victor Lee
  - Department of Therapeutic Radiology, Yale University, New Haven, CT, United States
- Abhijit A Patel
  - Department of Therapeutic Radiology, Yale University, New Haven, CT, United States
|
16
|
Sena JA, Galotto G, Devitt NP, Connick MC, Jacobi JL, Umale PE, Vidali L, Bell CJ. Unique Molecular Identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis. Sci Rep 2018; 8:13121. [PMID: 30177820 PMCID: PMC6120941 DOI: 10.1038/s41598-018-31064-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 05/04/2018] [Accepted: 08/07/2018] [Indexed: 12/22/2022]
Abstract
Attaching Unique Molecular Identifiers (UMI) to RNA molecules in the first step of sequencing library preparation establishes a distinct identity for each input molecule. This makes it possible to eliminate the effects of PCR amplification bias, which is particularly important where many PCR cycles are required, for example, in single cell studies. After PCR, molecules sharing a UMI are assumed to be derived from the same input molecule. In our single cell RNA-Seq studies of Physcomitrella patens, we discovered that reads sharing a UMI, and therefore presumed to be derived from the same mRNA molecule, frequently map to different, but closely spaced locations. This behaviour occurs in all such libraries that we have produced, and in multiple other UMI-containing RNA-Seq data sets in the public domain. This apparent paradox, that reads of identical origin map to distinct genomic coordinates may be partially explained by PCR stutter, which is often seen in low-entropy templates and those containing simple tandem repeats. In the absence of UMI this artefact is undetectable. We show that the common assumption that sequence reads having different mapping coordinates are derived from different starting molecules does not hold. Unless taken into account, this artefact is likely to result in over-estimation of certain transcript abundances, depending on the counting method employed.
Affiliation(s)
- Johnny A Sena
  - National Center for Genome Resources, Santa Fe, NM, 87505, United States
- Giulia Galotto
  - Worcester Polytechnic Institute, Department of Biology and Biotechnology, Worcester, MA, 01609, United States
- Nico P Devitt
  - National Center for Genome Resources, Santa Fe, NM, 87505, United States
- Melanie C Connick
  - National Center for Genome Resources, Santa Fe, NM, 87505, United States
- Jennifer L Jacobi
  - National Center for Genome Resources, Santa Fe, NM, 87505, United States
- Pooja E Umale
  - National Center for Genome Resources, Santa Fe, NM, 87505, United States
- Luis Vidali
  - Worcester Polytechnic Institute, Department of Biology and Biotechnology, Worcester, MA, 01609, United States
- Callum J Bell
  - National Center for Genome Resources, Santa Fe, NM, 87505, United States
|
17
|
Canzoniero JV, Cravero K, Park BH. The Impact of Collisions on the Ability to Detect Rare Mutant Alleles Using Barcode-Type Next-Generation Sequencing Techniques. Cancer Inform 2017. [DOI: 10.1177/1176935117719236] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]
Abstract
Barcoding techniques are used to reduce error from next-generation sequencing, with applications ranging from understanding tumor subclone populations to detecting circulating tumor DNA. Collisions occur when more than one sample molecule is tagged by the same unique identifier (UID) and can result in failure to detect very-low-frequency mutations and error in estimating mutation frequency. Here, we created computer models of barcoding technique, with and without amplification bias introduced by the UID, and analyzed the effect of collisions for a range of mutant allele frequencies (1e−6 to 0.2), number of sample molecules (10 000 to 1e7), and number of UIDs (4¹⁰ to 4¹⁴). Inability to detect rare mutant alleles occurred in 0% to 100% of simulations, depending on collisions and number of mutant molecules. Collisions also introduced error in estimating mutant allele frequency resulting in underestimation of minor allele frequency. Incorporating an understanding of the effect of collisions into experimental design can allow for optimization of the number of sample molecules and number of UIDs to minimize the negative impact on rare mutant detection and mutant frequency estimation.
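The dependence of collisions on the number of molecules and UIDs can be sketched in a few lines of Python. The sketch assumes UIDs are assigned uniformly at random with no amplification bias, which is only one of the scenarios the abstract describes; the function names and the choice of 10 000 molecules are illustrative, not the authors' model.

```python
import random
from collections import Counter

def expected_collision_fraction(n_molecules, n_uids):
    """Probability that a molecule shares its UID with at least one other
    molecule, assuming UIDs are drawn uniformly at random."""
    return 1.0 - (1.0 - 1.0 / n_uids) ** (n_molecules - 1)

def simulated_collision_fraction(n_molecules, n_uids, seed=0):
    """Monte Carlo estimate of the same quantity from one random tagging run."""
    rng = random.Random(seed)
    usage = Counter(rng.randrange(n_uids) for _ in range(n_molecules))
    collided = sum(n for n in usage.values() if n > 1)
    return collided / n_molecules

for uid_length in (10, 12, 14):
    n_uids = 4 ** uid_length  # number of distinct UIDs of this length
    print(uid_length,
          expected_collision_fraction(10_000, n_uids),
          simulated_collision_fraction(10_000, n_uids))
```

Under these assumptions, each two-nucleotide increase in UID length multiplies the UID pool by 16 and reduces the collision fraction by roughly the same factor.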
Affiliation(s)
- Karen Cravero
  - The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins Medicine, Baltimore, MD, USA
- Ben Ho Park
  - The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins Medicine, Baltimore, MD, USA
|
18
|
Smith T, Heger A, Sudbery I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res 2017; 27:491-499. [PMID: 28100584 PMCID: PMC5340976 DOI: 10.1101/gr.209601.116] [Citation(s) in RCA: 1002] [Impact Index Per Article: 143.1] [Received: 05/11/2016] [Accepted: 01/17/2017] [Indexed: 01/06/2023]
Abstract
Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package.
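A network-based grouping of UMIs can be sketched as follows. This is a simplified, self-contained illustration and not the UMI-tools code: it assumes all UMIs come from reads at a single mapping position, connects UMI a to UMI b when they differ by one base and count(a) ≥ 2·count(b) − 1 (the directional-style threshold, stated here as an assumption), and uses hypothetical counts.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))

def directional_groups(umi_counts):
    """Group UMIs that plausibly derive from one molecule via sequencing errors."""
    # Visit UMIs from most to least abundant so each group is seeded by its likely parent.
    umis = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    edges = {
        u: [v for v in umis
            if v != u and hamming(u, v) == 1
            and umi_counts[u] >= 2 * umi_counts[v] - 1]
        for u in umis
    }
    assigned, groups = set(), []
    for seed in umis:
        if seed in assigned:
            continue
        group, stack = [seed], [seed]
        assigned.add(seed)
        while stack:  # follow directed edges to collect the error-derived UMIs
            for neighbour in edges[stack.pop()]:
                if neighbour not in assigned:
                    assigned.add(neighbour)
                    group.append(neighbour)
                    stack.append(neighbour)
        groups.append(group)
    return groups

# Hypothetical UMI counts at one genomic position.
counts = {"ACGT": 456, "AAAT": 90, "ACAT": 72, "TCGT": 2, "CCGT": 2, "ACAG": 1}
groups = directional_groups(counts)
print(len(groups), "inferred molecules:", groups)
```

Counting every distinct UMI would report six molecules here, whereas the grouping reports two, because the low-count UMIs are attributed to errors in their abundant neighbours.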
Affiliation(s)
- Tom Smith
  - Computational Genomics Analysis and Training Programme, MRC WIMM Centre for Computational Biology, University of Oxford, Oxford OX3 9DS, United Kingdom
- Andreas Heger
  - Computational Genomics Analysis and Training Programme, MRC WIMM Centre for Computational Biology, University of Oxford, Oxford OX3 9DS, United Kingdom
- Ian Sudbery
  - Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield S10 2TN, United Kingdom
|
19
|
Abstract
New high-throughput DNA sequencing (HTS) technologies developed in the past decade have begun to be applied to the study of the complex gene rearrangements that encode human antibodies. This article first reviews the genetic features of Ig loci and the HTS technologies that have been applied to human repertoire studies, then discusses key choices for experimental design and data analysis in these experiments and the insights gained in immunological and infectious disease studies with the use of these approaches.
|
20
|
Best K, Oakes T, Heather JM, Shawe-Taylor J, Chain B. Computational analysis of stochastic heterogeneity in PCR amplification efficiency revealed by single molecule barcoding. Sci Rep 2015; 5:14629. [PMID: 26459131 PMCID: PMC4602216 DOI: 10.1038/srep14629] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 11/18/2014] [Accepted: 09/02/2015] [Indexed: 12/22/2022]
Abstract
The polymerase chain reaction (PCR) is one of the most widely used techniques in molecular biology. In combination with High Throughput Sequencing (HTS), PCR is widely used to quantify transcript abundance for RNA-seq, and in the context of analysis of T and B cell receptor repertoires. In this study, we combine DNA barcoding with HTS to quantify PCR output from individual target molecules. We develop computational tools that simulate both the PCR branching process itself, and the subsequent subsampling which typically occurs during HTS sequencing. We explore the influence of different types of heterogeneity on sequencing output, and compare them to experimental results where the efficiency of amplification is measured by barcodes uniquely identifying each molecule of starting template. Our results demonstrate that the PCR process introduces substantial amplification heterogeneity, independent of primer sequence and bulk experimental conditions. This heterogeneity can be attributed both to inherited differences between different template DNA molecules, and the inherent stochasticity of the PCR process. The results demonstrate that PCR heterogeneity arises even when reaction and substrate conditions are kept as constant as possible, and therefore single molecule barcoding is essential in order to derive reproducible quantitative results from any protocol combining PCR with HTS.
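The two stages named in this abstract, a PCR branching process followed by subsampling for sequencing, can be sketched in Python as below. The sketch assumes a single fixed per-cycle efficiency shared by all molecules, so it captures only the inherent stochasticity and not inherited template-to-template differences; all parameter values and function names are hypothetical.

```python
import random

def amplify(n_templates, cycles, efficiency, rng):
    """Return the family size of each barcoded template after `cycles` rounds,
    where every molecule is copied with probability `efficiency` per cycle."""
    sizes = [1] * n_templates
    for _ in range(cycles):
        sizes = [n + sum(rng.random() < efficiency for _ in range(n))
                 for n in sizes]
    return sizes

def subsample(sizes, n_reads, rng):
    """Draw reads without replacement from the pooled PCR product and
    count how many reads each barcode receives."""
    pool = [i for i, n in enumerate(sizes) for _ in range(n)]
    counts = [0] * len(sizes)
    for i in rng.sample(pool, min(n_reads, len(pool))):
        counts[i] += 1
    return counts

rng = random.Random(1)
sizes = amplify(n_templates=50, cycles=10, efficiency=0.8, rng=rng)
reads_per_barcode = subsample(sizes, n_reads=2000, rng=rng)
print("family sizes range from", min(sizes), "to", max(sizes))
print("reads per barcode range from", min(reads_per_barcode), "to", max(reads_per_barcode))
```

Even with identical efficiency for every molecule, the family sizes spread widely because a failed duplication in an early cycle halves that family's eventual yield.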
Affiliation(s)
- Katharine Best
  - Division of Infection and Immunity, UCL, London
  - CoMPLEX, UCL, London
- Benny Chain
  - Division of Infection and Immunity, UCL, London
|
21
|
Maslov AY, Quispe-Tintaya W, Gorbacheva T, White RR, Vijg J. High-throughput sequencing in mutation detection: A new generation of genotoxicity tests? Mutat Res 2015; 776:136-43. [PMID: 25934519 DOI: 10.1016/j.mrfmmm.2015.03.014] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 08/30/2014] [Revised: 03/07/2015] [Accepted: 03/27/2015] [Indexed: 10/23/2022]
Abstract
The advent of next generation sequencing (NGS) technology has provided the means to directly analyze the genetic material in primary cells or tissues of any species in a high throughput manner for mutagenic effects of potential genotoxic agents. In principle, direct, genome-wide sequencing of human primary cells and/or tissue biopsies would open up opportunities to identify individuals possibly exposed to mutagenic agents, thereby replacing current risk assessment procedures based on surrogate markers and extrapolations from animal studies. NGS-based tests can also precisely characterize the mutation spectra induced by genotoxic agents, improving our knowledge of their mechanism of action. Thus far, NGS has not been widely employed in genetic toxicology due to the difficulties in measuring low-abundant somatic mutations. Here, we review different strategies to employ NGS for the detection of somatic mutations in a cost-effective manner and discuss the potential applicability of these methods in testing the mutagenicity of genotoxic agents.
Affiliation(s)
- Alexander Y Maslov
  - Department of Genetics, Albert Einstein College of Medicine, 1301 Morris Park Ave., Bronx, NY 10461, USA
- Wilber Quispe-Tintaya
  - Department of Genetics, Albert Einstein College of Medicine, 1301 Morris Park Ave., Bronx, NY 10461, USA
- Tatyana Gorbacheva
  - Department of Genetics, Albert Einstein College of Medicine, 1301 Morris Park Ave., Bronx, NY 10461, USA
- Ryan R White
  - Department of Genetics, Albert Einstein College of Medicine, 1301 Morris Park Ave., Bronx, NY 10461, USA
- Jan Vijg
  - Department of Genetics, Albert Einstein College of Medicine, 1301 Morris Park Ave., Bronx, NY 10461, USA
|
22
|
Abstract
Presently, inferring the long-range structure of the DNA templates is limited by short read lengths. Accurate template counts suffer from distortions occurring during PCR amplification. We explore the utility of introducing random mutations in identical or nearly identical templates to create distinguishable patterns that are inherited during subsequent copying. We simulate the applications of this process under assumptions of error-free sequencing and perfect mapping, using cytosine deamination as a model for mutation. The simulations demonstrate that within readily achievable conditions of nucleotide conversion and sequence coverage, we can accurately count the number of otherwise identical molecules as well as connect variants separated by long spans of identical sequence. We discuss many potential applications, such as transcript profiling, isoform assembly, haplotype phasing, and de novo genome assembly.
|
23
|
Abstract
Duplex Sequencing (DS) is a next-generation sequencing methodology capable of detecting a single mutation among >1 × 10⁷ wild-type nucleotides, thereby enabling the study of heterogeneous populations and very-low-frequency genetic alterations. DS can be applied to any double-stranded DNA sample, but it is ideal for small genomic regions of <1 Mb in size. The method relies on the ligation of sequencing adapters harboring random yet complementary double-stranded nucleotide sequences to the sample DNA of interest. Individually labeled strands are then PCR-amplified, creating sequence 'families' that share a common tag sequence derived from the two original complementary strands. Mutations are scored only if the variant is present in the PCR families arising from both of the two DNA strands. Here we provide a detailed protocol for efficient DS adapter synthesis, library preparation and target enrichment, as well as an overview of the data analysis workflow. The protocol typically takes 1-3 d.
|
24
|
Wang Z, Wang J, Yue T, Yuan Y, Cai R, Niu C. Immunomagnetic separation combined with polymerase chain reaction for the detection of Alicyclobacillus acidoterrestris in apple juice. PLoS One 2013; 8:e82376. [PMID: 24349270 PMCID: PMC3857787 DOI: 10.1371/journal.pone.0082376] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 08/18/2013] [Accepted: 10/29/2013] [Indexed: 11/23/2022]
Abstract
A combination of immunomagnetic separation (IMS) and polymerase chain reaction (PCR) was used to detect Alicyclobacillus acidoterrestris (A. acidoterrestris) in apple juice. The optimum technological parameters of the IMS system were investigated. The results indicated that the immunocapture reactions could be finished in 60 min and the quantity of IMPs used for IMS was 2.5 mg/mL. Then the combined IMS-PCR procedure was assessed by detecting A. acidoterrestris in apple juice samples. The agarose gel electrophoresis results of 20 different strains showed that the IMS-PCR procedure presented high specificity to the A. acidoterrestris. The sensitivity of the IMS-PCR was 2×10¹ CFU/mL and the total detection time was 3 to 4 h. Of the 78 naturally contaminated apple juice samples examined, the sensitivity, specificity and accuracy of IMS-PCR compared with the standardized pour plate method were 90.9%, 97.0% and 96.2%, respectively. The results exhibited that the developed IMS-PCR method will be a valuable tool for detecting A. acidoterrestris and improving food quality in juice samples.
Affiliation(s)
- Zhouli Wang
  - College of Food Science and Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Jun Wang
  - College of Food Science and Engineering, XuChang University, XuChang, Henan, China
- Tianli Yue
  - College of Food Science and Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Yahong Yuan
  - College of Food Science and Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Rui Cai
  - College of Food Science and Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Chen Niu
  - College of Food Science and Engineering, Northwest A&F University, Yangling, Shaanxi, China
|
25
|
Detection of ultra-rare mutations by next-generation sequencing. Proc Natl Acad Sci U S A 2012; 109:14508-13. [PMID: 22853953 DOI: 10.1073/pnas.1208715109] [Citation(s) in RCA: 699] [Impact Index Per Article: 58.3] [Indexed: 11/18/2022]
Abstract
Next-generation DNA sequencing promises to revolutionize clinical medicine and basic research. However, while this technology has the capacity to generate hundreds of billions of nucleotides of DNA sequence in a single experiment, the error rate of ~1% results in hundreds of millions of sequencing mistakes. These scattered errors can be tolerated in some applications but become extremely problematic when "deep sequencing" genetically heterogeneous mixtures, such as tumors or mixed microbial populations. To overcome limitations in sequencing accuracy, we have developed a method termed Duplex Sequencing. This approach greatly reduces errors by independently tagging and sequencing each of the two strands of a DNA duplex. As the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors result in mutations in only one strand and can thus be discounted as technical error. We determine that Duplex Sequencing has a theoretical background error rate of less than one artifactual mutation per billion nucleotides sequenced. In addition, we establish that detection of mutations present in only one of the two strands of duplex DNA can be used to identify sites of DNA damage. We apply the method to directly assess the frequency and pattern of random mutations in mitochondrial DNA from human cells.
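The strand-agreement logic described in this abstract can be sketched in Python. The sketch assumes reads have already been grouped by tag and strand and oriented to the same reference strand; the strand labels "ab" and "ba", the 0.6 majority threshold, and the example reads are hypothetical, and this is not the published analysis pipeline.

```python
from collections import Counter, defaultdict

def consensus(reads, min_fraction=0.6):
    """Single-strand consensus: majority base per position, 'N' if support is weak."""
    out = []
    for column in zip(*reads):
        base, n = Counter(column).most_common(1)[0]
        out.append(base if n / len(column) >= min_fraction else "N")
    return "".join(out)

def duplex_consensus(families):
    """Keep a base only where the consensus of both strands of a duplex agrees."""
    by_tag = defaultdict(dict)
    for (tag, strand), reads in families.items():
        by_tag[tag][strand] = consensus(reads)
    result = {}
    for tag, strands in by_tag.items():
        if len(strands) != 2:
            continue  # one strand missing: no duplex call is possible
        first, second = strands.values()
        result[tag] = "".join(x if x == y else "N" for x, y in zip(first, second))
    return result

# Hypothetical read families; the reference sequence is taken to be ACGT.
families = {
    ("T1", "ab"): ["ACGT", "ACGT", "ACTT"],  # one sporadic read error
    ("T1", "ba"): ["ACGT", "ACGT", "ACGT"],
    ("T2", "ab"): ["AGGT", "AGGT"],          # variant seen on both strands
    ("T2", "ba"): ["AGGT", "AGGT"],
    ("T3", "ab"): ["ACGA", "ACGA"],          # change seen on one strand only
    ("T3", "ba"): ["ACGT", "ACGT"],
    ("T4", "ab"): ["ACGT"],                  # partner strand not recovered
}
print(duplex_consensus(families))
```

In this toy input, the T2 variant survives because both strand families carry it, whereas the T3 change is masked as "N" because only one strand supports it.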
|
26
|
Milbury CA, Correll M, Quackenbush J, Rubio R, Makrigiorgos GM. COLD-PCR enrichment of rare cancer mutations prior to targeted amplicon resequencing. Clin Chem 2011; 58:580-9. [PMID: 22194627 DOI: 10.1373/clinchem.2011.176198] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Indexed: 01/28/2023]
Abstract
BACKGROUND Despite widespread interest in next-generation sequencing (NGS), the adoption of personalized clinical genomics and mutation profiling of cancer specimens is lagging, in part because of technical limitations. Tumors are genetically heterogeneous and often contain normal/stromal cells, features that lead to low-abundance somatic mutations that generate ambiguous results or reside below NGS detection limits, thus hindering the clinical sensitivity/specificity standards of mutation calling. We applied COLD-PCR (coamplification at lower denaturation temperature PCR), a PCR methodology that selectively enriches variants, to improve the detection of unknown mutations before NGS-based amplicon resequencing. METHODS We used both COLD-PCR and conventional PCR (for comparison) to amplify serially diluted mutation-containing cell-line DNA diluted into wild-type DNA, as well as DNA from lung adenocarcinoma and colorectal cancer samples. After amplification of TP53 (tumor protein p53), KRAS (v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog), IDH1 [isocitrate dehydrogenase 1 (NADP(+)), soluble], and EGFR (epidermal growth factor receptor) gene regions, PCR products were pooled for library preparation, bar-coded, and sequenced on the Illumina HiSeq 2000. RESULTS In agreement with recent findings, sequencing errors by conventional targeted-amplicon approaches dictated a mutation-detection limit of approximately 1%-2%. Conversely, COLD-PCR amplicons enriched mutations above the error-related noise, enabling reliable identification of mutation abundances of approximately 0.04%. Sequencing depth was not a large factor in the identification of COLD-PCR-enriched mutations. For the clinical samples, several missense mutations were not called with conventional amplicons, yet they were clearly detectable with COLD-PCR amplicons. Tumor heterogeneity for the TP53 gene was apparent. 
CONCLUSIONS As cancer care shifts toward personalized intervention based on each patient's unique genetic abnormalities and tumor genome, we anticipate that COLD-PCR combined with NGS will elucidate the role of mutations in tumor progression, enabling NGS-based analysis of diverse clinical specimens within clinical practice.
Affiliation(s)
- Coren A Milbury
  - Division of DNA Repair and Genome Stability, Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
|
27
|
Qiu F, Gu K, Yang B, Ding Y, Jiang D, Wu Y, Huang LL. DNA assay based on monolayer-barcoded nanoparticles for mass spectrometry in combination with magnetic microprobes. Talanta 2011; 85:1698-702. [PMID: 21807242 DOI: 10.1016/j.talanta.2011.06.045] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/21/2011] [Revised: 06/14/2011] [Accepted: 06/15/2011] [Indexed: 01/05/2023]
Abstract
Mass spectrometry (MS) based methodology offers simple, fast and sensitive diagnosis. While it has become the predominate approach in biomolecular analysis, it has not been suitable for analyzing nucleic acid due to its low ionization efficiency. We report herein on a DNA assay based on monolayer-barcoded nanoparticles that were encoded with reporter mass molecules, which act as surrogate molecules for the matrix-assisted laser desorption/ionization time-of-flight MS (MALDI-TOF MS) identification of target DNA through mass spectrometry in combination with magnetic microprobes. This assay demonstrated high MS sensitivity, with the ability to detect target DNA at femtomolar (10⁻¹⁵ M) levels. This inaugural effort using combined techniques is significant because it showed an extraordinary analytical capability for differentiating the single nucleotide polymorphism (SNP), which comprises the most abundant source of genetic variation in the human genome. We also report herein the feasibility of MS detection of two target DNAs that have the same mass but different nucleotide base composition, which classic MS methodology is inherently unable to differentiate.
Affiliation(s)
- Fei Qiu
  - Institute of Molecular Medicine, Huaqiao University, Quanzhou, PR China
|
28
|
Kinde I, Wu J, Papadopoulos N, Kinzler KW, Vogelstein B. Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci U S A 2011; 108:9530-5. [PMID: 21586637 PMCID: PMC3111315 DOI: 10.1073/pnas.1105422108] [Citation(s) in RCA: 858] [Impact Index Per Article: 66.0] [Indexed: 12/26/2022]
Abstract
The identification of mutations that are present in a small fraction of DNA templates is essential for progress in several areas of biomedical research. Although massively parallel sequencing instruments are in principle well suited to this task, the error rates in such instruments are generally too high to allow confident identification of rare variants. We here describe an approach that can substantially increase the sensitivity of massively parallel sequencing instruments for this purpose. The keys to this approach, called the Safe-Sequencing System ("Safe-SeqS"), are (i) assignment of a unique identifier (UID) to each template molecule, (ii) amplification of each uniquely tagged template molecule to create UID families, and (iii) redundant sequencing of the amplification products. PCR fragments with the same UID are considered mutant ("supermutants") only if ≥95% of them contain the identical mutation. We illustrate the utility of this approach for determining the fidelity of a polymerase, the accuracy of oligonucleotides synthesized in vitro, and the prevalence of mutations in the nuclear and mitochondrial genomes of normal cells.
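The ≥95% rule for UID families can be sketched in Python as follows. The sketch looks at a single reference position; the minimum family size of 2, the function name, and the example families are assumptions added for illustration and are not specified in the abstract.

```python
from collections import Counter

def call_supermutants(families, reference_base, threshold=0.95, min_family_size=2):
    """Return {uid: mutant_base} for UID families in which at least `threshold`
    of the reads carry the same non-reference base at the inspected position."""
    calls = {}
    for uid, bases in families.items():
        if len(bases) < min_family_size:
            continue  # too few reads to judge agreement (assumed cutoff)
        base, n = Counter(bases).most_common(1)[0]
        if base != reference_base and n / len(bases) >= threshold:
            calls[uid] = base
    return calls

# Hypothetical base calls at one position (reference base: C), keyed by UID.
families = {
    "UID1": ["T"] * 20,              # all reads mutant
    "UID2": ["T"] * 39 + ["C"],      # 97.5% mutant
    "UID3": ["T"] * 18 + ["C"] * 2,  # 90% mutant: likely a PCR or sequencing error
    "UID4": ["C"] * 20,              # wild type
    "UID5": ["T"],                   # single read, below the assumed family size
}
print(call_supermutants(families, reference_base="C"))
```

Only UID1 and UID2 are reported here; UID3 falls below the 95% agreement threshold, which is how errors arising after the tagging step are filtered out.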
Affiliation(s)
- Isaac Kinde
  - The Ludwig Center for Cancer Genetics and Therapeutics and The Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231
- Jian Wu
  - The Ludwig Center for Cancer Genetics and Therapeutics and The Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231
- Nick Papadopoulos
  - The Ludwig Center for Cancer Genetics and Therapeutics and The Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231
- Kenneth W. Kinzler
  - The Ludwig Center for Cancer Genetics and Therapeutics and The Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231
- Bert Vogelstein
  - The Ludwig Center for Cancer Genetics and Therapeutics and The Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231
|
29
|
Piperi C, Papavassiliou AG. Strategies for DNA methylation analysis in developmental studies. Dev Growth Differ 2011; 53:287-99. [PMID: 21447098 DOI: 10.1111/j.1440-169x.2011.01253.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Developmental processes in eukaryotes are highly dependent on DNA methylation. 5-methylcytosine (m⁵C) is the most prevalent and best understood DNA modification implicated in maintenance of genomic integrity and function across species. Although m⁵C occurs almost exclusively in symmetrical CpG context in vertebrates, additional asymmetrical distribution in CpHpG and CpHpH sites has been observed in plants and embryonic stem cells. Accurate and reproducible methodology for full analysis of the DNA methylome is therefore in high demand. Fortunately, a variety of methods enable quantitative DNA methylation mapping at single-base resolution and on a large scale. Here, we provide a critical overview of methods applied primarily to m⁵C detection, with particular emphasis on technical improvements of the classical bisulfite-conversion protocol. We further describe strategies in combination with emerging technologies that allow acquisition of highly reliable data for developmental studies.
Affiliation(s)
- Christina Piperi
- Department of Biological Chemistry, University of Athens Medical School, 11527 Athens, Greece
|
30
|
Abstract
Common DNA sequence variants inadequately explain variability in fat mass among individuals. Abnormal body weights are characteristic of specific imprinted-gene disorders. However, the relevance of imprinted genes to our understanding of obesity among the general population is uncertain. Hitherto unidentified imprinted genes and epigenetic mosaicism are two of the challenges for this emerging field of epigenetics. Subtle epigenetic differences in imprinted genes and gene networks are likely to be present among cells, tissues and individuals. In order to advance obesity research it will be necessary to use genome-wide, next-generation sequencing approaches that allow the detection of such epigenetic differences.
Affiliation(s)
- Reinhard Stöger
- Department of Biology, University of Washington, 156 Kincaid Hall, Box 351800, Seattle, WA, 98195-1800, USA.
|