1
Menon V, Brash DE. Next-generation sequencing methodologies to detect low-frequency mutations: "Catch me if you can". MUTATION RESEARCH. REVIEWS IN MUTATION RESEARCH 2023; 792:108471. [PMID: 37716438 PMCID: PMC10843083 DOI: 10.1016/j.mrrev.2023.108471] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/04/2023] [Revised: 09/06/2023] [Accepted: 09/07/2023] [Indexed: 09/18/2023]
Abstract
Mutations, the irreversible changes in an organism's DNA sequence, are present in tissues at a variant allele frequency (VAF) ranging from ∼10⁻⁸ per bp for a founder mutation to ∼10⁻³ for a histologically normal tissue sample containing several independent clones, compared with 1%-50% for a heterozygous tumor mutation or a polymorphism. The rarity of these events poses a challenge for accurate clinical diagnosis and prognosis, for toxicology, and for the discovery of new disease etiologies. Standard Next-Generation Sequencing (NGS) technologies report VAFs as low as 0.5% per nt, but reliably observing rarer precursor events requires additional sophistication to measure ultralow-frequency mutations. We detail the challenge; define terms used to characterize the results, which vary between laboratories and sometimes conflict between biologists and bioinformaticists; and describe recent innovations to improve standard NGS methodologies, including single-strand consensus sequence methods such as Safe-SeqS and SiMSen-Seq; tandem-strand consensus sequence methods such as o2n-Seq and SMM-Seq; and ultrasensitive parent-strand consensus sequence methods such as DuplexSeq, PacBio HiFi, SinoDuplex, OPUSeq, EcoSeq, BotSeqS, Hawk-Seq, NanoSeq, SaferSeq, and CODEC. Practical applications are also noted. Several methods quantify VAF down to 10⁻⁵ at a nt and mutation frequency (MF) in a target region down to 10⁻⁷ per nt. By expanding to > 1 Mb of sites never observed twice, thus forgoing VAF, other methods quantify MF < 10⁻⁹ per nt or < 15 errors per haploid genome. Clonal expansion cannot be directly distinguished from independent mutations by sequencing, so it is essential for a paper to report whether its MF counted only different mutations (the minimum independent-mutation frequency, MFminI) or all mutations observed including recurrences (the larger maximum independent-mutation frequency, MFmaxI, which may reflect clonal expansion). Ultrasensitive methods reveal that, without their use, even mutations with VAF 0.5-1% are usually spurious.
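The difference between MFminI and MFmaxI comes down to whether recurrent observations of the same mutation are collapsed before dividing by the number of nucleotides interrogated. The sketch below assumes a hypothetical `MutationCall` record (chromosome, position, reference and alternate base) and takes the denominator to be the total number of consensus nucleotides sequenced; both are illustrative assumptions, not code from the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen -> hashable, so identical calls collapse in a set
class MutationCall:
    """One mutation observed in a consensus read (hypothetical record layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def mutation_frequencies(calls, nt_sequenced):
    """Return (MFminI, MFmaxI) per nt.

    MFminI counts each distinct mutation once; MFmaxI counts every observation,
    including recurrences that may reflect clonal expansion. nt_sequenced is
    assumed to be the total number of consensus nucleotides interrogated.
    """
    if nt_sequenced <= 0:
        raise ValueError("nt_sequenced must be positive")
    distinct = len(set(calls))  # recurrences counted once
    total = len(calls)          # recurrences counted every time
    return distinct / nt_sequenced, total / nt_sequenced


if __name__ == "__main__":
    calls = [
        MutationCall("chr1", 100, "C", "T"),
        MutationCall("chr1", 100, "C", "T"),  # same change seen again
        MutationCall("chr2", 5, "G", "A"),
    ]
    print(mutation_frequencies(calls, 10**7))
```

With two identical C>T calls and one G>A call over 10⁷ nt, MFminI counts 2 mutations and MFmaxI counts 3, so MFminI ≤ MFmaxI always holds.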
Affiliation(s)
- Vijay Menon
- Department of Therapeutic Radiology, Yale School of Medicine, New Haven, CT 06520-8040, USA.
- Douglas E Brash
- Department of Therapeutic Radiology, Yale School of Medicine, New Haven, CT 06520-8040, USA; Department of Dermatology, Yale School of Medicine, New Haven, CT 06520-8059, USA; Yale Cancer Center, Yale School of Medicine, New Haven, CT 06520-8028, USA.
2
Sequencing barcode construction and identification methods based on block error-correction codes. SCIENCE CHINA-LIFE SCIENCES 2020; 63:1580-1592. [PMID: 32303959 DOI: 10.1007/s11427-019-1651-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/23/2019] [Accepted: 02/11/2020] [Indexed: 02/07/2023]
Abstract
Multiplexed sequencing relies on sample-specific labels, the barcodes, to tag DNA fragments belonging to different samples and to separate the output of the sequencers. However, the barcodes are often corrupted by insertion, deletion and substitution errors introduced during sequencing, which may lead to sample misassignment. In this paper, we propose a barcode construction method that combines a block error-correction code with a predetermined pseudorandom sequence to generate a base sequence for labeling different samples. Furthermore, to identify corrupted barcodes and assign reads to their respective samples, we present a soft-decision identification method consisting of inner decoding and outer decoding. The inner decoder builds a hidden Markov model (HMM) that uses the pseudorandom sequence to estimate base insertions and deletions, and adapts the forward-backward (FB) algorithm to output soft information for each bit of the block code. The outer decoder uses this soft information for soft-decision decoding, which effectively corrects multiple errors in the barcodes. Simulation results show that the proposed methods are highly robust to high rates of insertion, deletion and substitution errors in the barcodes. In addition, compared with the inner decoding algorithm for watermark-based barcodes, the proposed inner decoding algorithm greatly reduces decoding complexity.
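A much-simplified sketch of the construction idea, assuming an extended [8,4] Hamming code in place of the paper's block code, a seeded Python `random.Random` stream as the predetermined pseudorandom sequence, and a mapping of two bits per base. All names (`encode_84`, `make_barcode`, `barcode_to_codeword`) and the seed are hypothetical. It covers only construction and removal of the pseudorandom sequence; the HMM/forward-backward inner decoder and the soft-decision outer decoder are not implemented here.

```python
import random

# Systematic generator matrix of a [7,4] Hamming code (one standard choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
BASES = "ACGT"  # assumed 2-bit mapping: 00->A, 01->C, 10->G, 11->T


def encode_84(msg_bits):
    """Encode 4 message bits with the [7,4] Hamming code plus an overall parity bit."""
    code = [sum(m * g for m, g in zip(msg_bits, column)) % 2 for column in zip(*G)]
    code.append(sum(code) % 2)  # extended [8,4,4] code: minimum distance 4
    return code


def pseudorandom_bits(n, seed):
    """Predetermined pseudorandom bit sequence, reproducible from a shared seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n)]


def make_barcode(sample_index, seed=2020):
    """Sample index (0-15) -> 4-nt barcode: encode, XOR with pseudorandom bits, map bit pairs to bases."""
    if not 0 <= sample_index < 16:
        raise ValueError("this toy code labels at most 16 samples")
    msg = [(sample_index >> i) & 1 for i in range(4)]
    scrambled = [c ^ p for c, p in zip(encode_84(msg), pseudorandom_bits(8, seed))]
    return "".join(BASES[2 * scrambled[i] + scrambled[i + 1]] for i in range(0, 8, 2))


def barcode_to_codeword(barcode, seed=2020):
    """Invert the base mapping and strip the pseudorandom sequence (assumes no indels)."""
    bits = []
    for base in barcode:
        value = BASES.index(base)
        bits += [value >> 1, value & 1]
    return [b ^ p for b, p in zip(bits, pseudorandom_bits(len(bits), seed))]
```

In a real system the recovered bits would carry errors and be passed to a decoder; here the round trip simply shows that the pseudorandom layer is transparent to the block code once both sides share the seed.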
3
Sanhueza D, Guégan JF, Jordan H, Chevillon C. Environmental Variations in Mycobacterium ulcerans Transcriptome: Absence of Mycolactone Expression in Suboptimal Environments. Toxins (Basel) 2019; 11:E146. [PMID: 30836720 PMCID: PMC6468629 DOI: 10.3390/toxins11030146] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/29/2019] [Revised: 02/18/2019] [Accepted: 02/27/2019] [Indexed: 12/30/2022]
Abstract
Buruli ulcer is a neglected tropical infectious disease caused by the environmentally persistent pathogen Mycobacterium ulcerans (MU). Neither the ecological niche nor the exact mode of transmission of MU has been fully elucidated. However, some environmental factors, such as chitin concentration and pH, have been reported to promote MU growth in vitro. We pursued this research using next-generation sequencing (NGS) and mRNA sequencing to investigate potential changes in MU genome-wide expression profiles across in vitro environmental conditions known to be suitable for MU growth. Supplementing the culture medium with chitin alone, calcium alone, or both chitin and calcium significantly altered the MU transcriptome and thus several metabolic pathways, for instance those involved in DNA synthesis or cell wall production. By contrast, some genes carried by the virulence plasmid and required for production of the mycolactone toxin were expressed neither in the control nor in any of the modified environments. We hypothesized that these genes are expressed only under stressful conditions. Our results describe important environmental determinants of MU pathogenicity, help clarify its complex natural life cycle, and encourage further research using genomic approaches.
Affiliation(s)
- Daniel Sanhueza
- MIVEGEC, IRD, CNRS, University Montpellier, 34394 Montpellier, France.
- Jean-François Guégan
- MIVEGEC, IRD, CNRS, University Montpellier, 34394 Montpellier, France.
- ASTRE, INRA, Cirad, University Montpellier, 34394 Montpellier, France.
- Heather Jordan
- Department of Biological Sciences, Mississippi State University, Starkville, MS 39762, USA.
4
Wang B, Zheng X, Zhou S, Zhou C, Wei X, Zhang Q, Wei Z. Constructing DNA Barcode Sets Based on Particle Swarm Optimization. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:999-1002. [PMID: 28287980 DOI: 10.1109/tcbb.2017.2679004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 06/06/2023]
Abstract
Following the completion of the Human Genome Project, large amounts of high-throughput biological data were generated. To analyze these data, massively parallel sequencing, namely next-generation sequencing, was rapidly developed. DNA barcodes, attached at the beginning or end of sequencing reads, are used to identify which sample each sequence belongs to. Constructing DNA barcode sets provides the candidate DNA barcodes for this application. To improve DNA barcode sets, this paper modifies a particle swarm optimization (PSO) algorithm and uses it to construct them. Compared with previously reported results, some lower bounds on the size of DNA barcode sets are improved. The results show that the proposed algorithm is effective in constructing DNA barcode sets.
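The abstract does not list the constraints a valid barcode set must satisfy. Assuming the common pair of a minimum pairwise Hamming distance `d` and a fixed GC count, a lexicographic greedy search gives the kind of baseline set that optimizers such as the paper's PSO try to enlarge. This is a swapped-in greedy method, not PSO, and `greedy_barcode_set` is a hypothetical helper.

```python
from itertools import product


def gc_count(word):
    """Number of G or C bases in a word."""
    return sum(base in "GC" for base in word)


def hamming(a, b):
    """Number of mismatched positions between equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def greedy_barcode_set(n, d, gc):
    """Lexicographic greedy baseline (not PSO): scan all length-n words in
    ACGT order and keep each word with exactly `gc` G/C bases whose Hamming
    distance to every word already kept is at least d."""
    chosen = []
    for letters in product("ACGT", repeat=n):
        word = "".join(letters)
        if gc_count(word) != gc:
            continue
        if all(hamming(word, other) >= d for other in chosen):
            chosen.append(word)
    return chosen


if __name__ == "__main__":
    barcodes = greedy_barcode_set(4, 3, 2)
    print(len(barcodes), barcodes)
```

The size of the returned set is a lower bound for the given (n, d, GC) constraints; a search method such as PSO is useful precisely because greedy scanning order often leaves room for more words.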
5
Groot-Kormelink PJ, Ferrand S, Kelley N, Bill A, Freuler F, Imbert PE, Marelli A, Gerwin N, Sivilotti LG, Miraglia L, Orth AP, Oakeley EJ, Schopfer U, Siehler S. High Throughput Random Mutagenesis and Single Molecule Real Time Sequencing of the Muscle Nicotinic Acetylcholine Receptor. PLoS One 2016; 11:e0163129. [PMID: 27649498 PMCID: PMC5029940 DOI: 10.1371/journal.pone.0163129] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/01/2016] [Accepted: 09/03/2016] [Indexed: 12/15/2022]
Abstract
High-throughput random mutagenesis is a powerful tool for identifying which residues are important for the function of a protein and for gaining insight into its structure-function relationship. The human muscle nicotinic acetylcholine receptor was used to test whether this technique, previously used for monomeric receptors, can be applied to a pentameric ligand-gated ion channel. A mutant library for the α1 subunit of the channel was generated by error-prone PCR, and full-length sequences of all 2816 mutants were retrieved using single molecule real time sequencing. Each α1 mutant was co-transfected with wild-type β1, δ, and ε subunits, and channel function was characterized by an ion flux assay. To test whether the strategy could map the structure-function relationship of this receptor, we attempted to identify mutations that conferred resistance to competitive antagonists. Mutant hits were defined as receptors that responded to the nicotinic agonist epibatidine but were not inhibited by either α-bungarotoxin or tubocurarine. Eight α1 subunit mutant hits were identified, six of which contained mutations at position Y233 or V275 in the transmembrane domain. Three single point mutations (Y233N, Y233H, and V275M) were studied further and found to enhance the potencies of all five channel agonists tested. This suggests that the mutations made the channel resistant to the antagonists not by impairing antagonist binding but by producing a gain-of-function phenotype, e.g. increased agonist sensitivity. Our data show that random high-throughput mutagenesis is applicable to multimeric proteins for discovering novel functional mutants, and outline the benefits of single molecule real time sequencing for quality control of the mutant library and for downstream interpretation of mutant data.
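Point mutations such as Y233N follow the convention of wild-type residue, 1-based position, mutant residue. A minimal sketch of annotating substitutions once a full-length read has been translated, assuming equal-length protein sequences (no insertions or deletions); `point_mutations` is a hypothetical helper, and the sequences in the usage line are made up.

```python
def point_mutations(wild_type, mutant):
    """List amino-acid substitutions as e.g. 'Y233N' (1-based residue numbering).

    Handles substitutions only: both sequences must be the same length.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be equal length (no indel handling)")
    return [
        f"{wt}{position}{mt}"
        for position, (wt, mt) in enumerate(zip(wild_type, mutant), start=1)
        if wt != mt
    ]


if __name__ == "__main__":
    print(point_mutations("MKYVL", "MKNVL"))
```

A library-level quality-control step could then count how many clones carry zero, one or several substitutions before they enter the functional assay.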
Affiliation(s)
- Paul J. Groot-Kormelink
- Musculoskeletal Disease Area, Novartis Institutes for BioMedical Research, Basel, Switzerland
- Sandrine Ferrand
- Center for Proteomic Chemistry, Novartis Institutes for BioMedical Research, Basel, Switzerland
- Nicholas Kelley
- Analytical Sciences and Imaging, Novartis Institutes for BioMedical Research, Basel, Switzerland
- Anke Bill
- Center for Proteomic Chemistry, Novartis Institutes for BioMedical Research, Cambridge, Massachusetts, United States of America
- Felix Freuler
- Center for Proteomic Chemistry, Novartis Institutes for BioMedical Research, Basel, Switzerland
- Pierre-Eloi Imbert
- Center for Proteomic Chemistry, Novartis Institutes for BioMedical Research, Basel, Switzerland
- Anthony Marelli
- Genomics Institute of the Novartis Research Foundation, Novartis Institutes for BioMedical Research, San Diego, California, United States of America
- Nicole Gerwin
- Musculoskeletal Disease Area, Novartis Institutes for BioMedical Research, Basel, Switzerland
- Lucia G. Sivilotti
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, United Kingdom
- Loren Miraglia
- Genomics Institute of the Novartis Research Foundation, Novartis Institutes for BioMedical Research, San Diego, California, United States of America
- Anthony P. Orth
- Genomics Institute of the Novartis Research Foundation, Novartis Institutes for BioMedical Research, San Diego, California, United States of America
- Edward J. Oakeley
- Analytical Sciences and Imaging, Novartis Institutes for BioMedical Research, Basel, Switzerland
- Ulrich Schopfer
- Center for Proteomic Chemistry, Novartis Institutes for BioMedical Research, Basel, Switzerland
- Sandra Siehler
- Center for Proteomic Chemistry, Novartis Institutes for BioMedical Research, Basel, Switzerland
6
Effects of early feeding on the host rumen transcriptome and bacterial diversity in lambs. Sci Rep 2016; 6:32479. [PMID: 27576848 PMCID: PMC5006043 DOI: 10.1038/srep32479] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 04/26/2016] [Accepted: 08/08/2016] [Indexed: 11/08/2022]
Abstract
Early consumption of starter feed promotes rumen development in lambs. We examined rumen development in lambs fed starter feed for 5 weeks using histological and biochemical analyses and high-throughput sequencing of rumen tissues. Additionally, rumen contents of starter feed-fed lambs were compared with those of breast milk-fed controls. Our physiological and biochemical findings revealed that early starter consumption facilitated rumen development, changed the pattern of ruminal fermentation, and increased the amylase and carboxymethylcellulase activities of rumen microorganisms. RNA-seq analysis revealed 225 differentially expressed genes (DEGs) between the rumens of breast milk- and starter feed-fed lambs. These DEGs were involved in many metabolic pathways, particularly lipid and carbohydrate metabolism, and included HMGCL and HMGCS2. Sequencing analysis of 16S rRNA genes revealed that ruminal bacterial communities were more diverse in breast milk-fed than in starter feed-fed lambs, and each group had a distinct microbiota. We conclude that early starter feeding is beneficial to rumen development and physiological function in lambs. The underlying mechanism may involve stimulation of ruminal ketogenesis and butanoate metabolism via HMGCL and HMGCS2, combined with changes in fermentation type induced by the ruminal microbiota. Overall, this study provides insights into the molecular mechanisms of rumen development in sheep.
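The abstract does not say which diversity metric was used. As an assumption, the Shannon index H' = -Σ pᵢ ln pᵢ, computed over per-taxon (e.g. OTU) read counts, is one common way to compare 16S rRNA communities. A minimal sketch, with `shannon_index` a hypothetical helper:

```python
import math
from typing import Iterable


def shannon_index(counts: Iterable[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    nonzero = [c for c in counts if c > 0]  # absent taxa contribute nothing
    total = sum(nonzero)
    if total == 0:
        raise ValueError("no reads to summarize")
    return -sum((c / total) * math.log(c / total) for c in nonzero)


if __name__ == "__main__":
    even = [10, 10, 10, 10]   # four equally abundant taxa
    skewed = [37, 1, 1, 1]    # one dominant taxon
    print(shannon_index(even), shannon_index(skewed))
```

An even community of four taxa reaches the maximum ln 4, while a community dominated by one taxon scores much lower, which is the sense in which one group's microbiota can be called "more diverse".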
7
Embryonal Control of Yellow Seed Coat Locus ECY1 Is Related to Alanine and Phenylalanine Metabolism in the Seed Embryo of Brassica napus. G3-GENES GENOMES GENETICS 2016; 6:1073-81. [PMID: 26896439 PMCID: PMC4825642 DOI: 10.1534/g3.116.027110] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/24/2022]
Abstract
Seed coat color is determined by the type of pigment deposited in the seed coat cells. It is related to important agronomic seed traits such as dormancy, longevity, oil content, protein content and fiber content. In Brassica napus, inheritance of seed coat color is related to maternal effects and pollen effects (xenia effects). In this research we isolated a yellow-seeded B. napus mutant controlled by a single Mendelian locus, named Embryonal Control of Yellow seed coat 1 (Ecy1). Microscopy of transverse sections of the mature seed shows that pigment is deposited only in the outer layer of the seed coat. Using Illumina HiSeq 2000 sequencing technology, a total of 12 GB of clean data, corresponding to 116× coverage of B. napus coding sequences, was obtained from seeds 26 days after pollination (DAP). The data were assembled into 172,238 independent transcripts and 55,637 unigenes. A total of 139 orthologs of Arabidopsis transparent testa (TT) genes were mapped in silico to 19 chromosomes of B. napus. Only 49 of the TT orthologs are transcribed in seeds. However, transcription of all orthologs was independent of embryonal control of seed coat color. Only 55 genes were found to be differentially expressed between brown seeds and the yellow mutant; of these, 50 were upregulated and five were downregulated in yellow seeds relative to brown seeds. By KEGG classification, 14 metabolic pathways were significantly enriched. Of these, five pathways (phenylpropanoid biosynthesis, cyanoamino acid metabolism, plant hormone signal transduction, metabolic pathways, and biosynthesis of secondary metabolites) were related to seed coat pigmentation. Free amino acid quantification showed that Ala and Phe were present at higher levels in the embryos of yellow seeds than in those of brown seeds; this increase was not observed in the seed coat. Moreover, the excess amount of free Ala was exactly twice that of Phe in the embryo. The pigment substrate chalcone is synthesized from two molecules of Ala and one molecule of Phe. The correlation between the accumulation of Ala and Phe and the disappearance of pigment in the yellow-seeded mutant suggests that embryonal control of seed coat color is related to Phe and Ala metabolism in the embryo of B. napus.
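As a rough consistency check on the sequencing figures, assume "12 GB" denotes about 12 × 10⁹ sequenced bases and that coverage C is the number of sequenced bases N divided by the target length L (both are interpretive assumptions). The reported 116× coverage then implies a coding-sequence target of roughly 100 Mb:

```latex
C = \frac{N}{L}
\quad\Longrightarrow\quad
L = \frac{N}{C} \approx \frac{12 \times 10^{9}\ \text{nt}}{116} \approx 1.03 \times 10^{8}\ \text{nt}
```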
8
Tapia E, Spetale F, Krsticevic F, Angelone L, Bulacio P. DNA Barcoding through Quaternary LDPC Codes. PLoS One 2015; 10:e0140459. [PMID: 26492348 PMCID: PMC4619643 DOI: 10.1371/journal.pone.0140459] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/25/2015] [Accepted: 09/23/2015] [Indexed: 12/04/2022]
Abstract
Many parallel applications of Next-Generation Sequencing (NGS) technologies demand short barcodes able to accurately multiplex a large number of samples. To address these competing requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide fine-grained choice of barcode size (BCH) or have intrinsically poor error-correcting ability (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, quaternary LDPC codes may be particularly advantageous because of their lower rates of read loss and undetected sample misidentification. Even at mismatch error rates of 10⁻² per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate on the order of 10⁻⁹, at the expense of a read-loss rate on the order of only 10⁻⁶.
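A toy illustration of the trade-off between read loss and sample misidentification, using bounded-distance nearest-neighbour decoding over the quaternary alphabet. This swaps in plain minimum-distance decoding for the paper's LDPC decoding; the four-barcode set, the uniform substitution error model, and all helper names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(read_barcode, barcodes, max_errors):
    """Return the index of the unique closest barcode within max_errors
    substitutions, or None (a read loss) if none qualifies or there is a tie."""
    dists = [hamming(read_barcode, b) for b in barcodes]
    best = min(dists)
    if best > max_errors or dists.count(best) > 1:
        return None
    return dists.index(best)


def substitute(seq, p, rng):
    """Uniform substitution channel: each base is replaced with probability p."""
    return "".join(
        rng.choice([b for b in "ACGT" if b != base]) if rng.random() < p else base
        for base in seq
    )


def simulate(barcodes, p, trials, max_errors, seed=0):
    """Count (correct, lost, misassigned) reads over a seeded simulation."""
    rng = random.Random(seed)
    correct = lost = wrong = 0
    for _ in range(trials):
        true_idx = rng.randrange(len(barcodes))
        call = demultiplex(substitute(barcodes[true_idx], p, rng), barcodes, max_errors)
        if call is None:
            lost += 1
        elif call == true_idx:
            correct += 1
        else:
            wrong += 1
    return correct, lost, wrong


if __name__ == "__main__":
    toy = ["AAAA", "CCCC", "GGGG", "TTTT"]  # minimum pairwise distance 4
    for t in (0, 1):
        print(t, simulate(toy, 0.1, 10_000, max_errors=t, seed=1))
```

Lowering `max_errors` rejects more reads (higher read loss) but makes undetected misassignment rarer; a stronger code such as LDPC aims to improve both at once.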
Affiliation(s)
- Elizabeth Tapia
- CIFASIS-Conicet Institute, Rosario, Argentina
- Fac. de Cs. Exactas e Ingeniería, Universidad Nac. de Rosario, Rosario, Argentina
- Flavio Spetale
- CIFASIS-Conicet Institute, Rosario, Argentina
- Fac. de Cs. Exactas e Ingeniería, Universidad Nac. de Rosario, Rosario, Argentina
- Laura Angelone
- CIFASIS-Conicet Institute, Rosario, Argentina
- Fac. de Cs. Exactas e Ingeniería, Universidad Nac. de Rosario, Rosario, Argentina
- Pilar Bulacio
- CIFASIS-Conicet Institute, Rosario, Argentina
- Fac. de Cs. Exactas e Ingeniería, Universidad Nac. de Rosario, Rosario, Argentina
9
Kracht D, Schober S. Insertion and deletion correcting DNA barcodes based on watermarks. BMC Bioinformatics 2015; 16:50. [PMID: 25887410 PMCID: PMC4339740 DOI: 10.1186/s12859-015-0482-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/07/2014] [Accepted: 01/29/2015] [Indexed: 01/12/2023]
Abstract
Background: Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: synthetic DNA tags, called barcodes, are attached to natural DNA fragments during library preparation. Different libraries can individually be labeled with barcodes for a joint sequencing run. A post-processing step is needed to sort the sequencing data according to their origin using these DNA labels. This final separation step is called demultiplexing, and its performance is mainly determined by the characteristics of the DNA code words used as labels. Currently there are two different strategies for barcoding: one is based on the Hamming distance, the other uses the edit metric to measure distances between code words. The theory of channel coding provides well-known code constructions for the Hamming metric. They provide a large number of code words with variable lengths and maximal correction capability for substitution errors. However, some sequencing platforms are known to have exceptionally high numbers of insertion or deletion errors. Barcodes based on the edit distance can take insertion and deletion errors into account in the decoding process. Unfortunately, no explicit code construction is known that gives optimal codes for the edit metric.
Results: In the present work we take an entirely different perspective on obtaining DNA barcodes. We consider a concatenated code construction producing so-called watermark codes, first proposed by Davey and MacKay for communication over binary channels with synchronization errors. We adapt and extend the concepts of watermark codes for use in DNA sequencing. Moreover, we provide an exemplary set of barcodes that are experimentally compatible with common next-generation sequencing platforms. Finally, a realistic simulation scenario is used to evaluate the proposed codes and show that the watermark concept is suitable for DNA sequencing applications.
Conclusion: Our adaptation of watermark codes enables the construction of barcodes capable of correcting substitution, insertion and deletion errors. The presented approach has the advantage of not needing any markers or technical sequences to recover the position of the barcode in the sequencing reads, a requirement that poses a significant restriction for other approaches.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0482-7) contains supplementary material, which is available to authorized users.
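Why substitution-oriented (Hamming) barcodes struggle with indels: a single deletion shifts every downstream base, so the Hamming distance to the true barcode becomes large while the edit (Levenshtein) distance stays small. A minimal sketch using the standard dynamic-programming edit distance; the barcode and read prefix are made-up examples, and this is not the watermark decoder itself.

```python
def levenshtein(a, b):
    """Edit distance: minimum substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def hamming(a, b):
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    barcode = "ACGTACGT"
    # Read with the second base (C) deleted; one downstream base (A) shifts in.
    read_prefix = "AGTACGTA"
    print(hamming(barcode, read_prefix), levenshtein(barcode, read_prefix))  # prints: 7 2
```

A Hamming-metric decoder would treat this read as hopelessly corrupted, whereas an edit-metric view sees only two operations; watermark codes pursue the same robustness without an explicit optimal edit-metric code.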
Affiliation(s)
- David Kracht
- Institute of Communications Engineering, Ulm University, Albert-Einstein-Allee 43, Ulm, 89081, Germany.
- Steffen Schober
- Institute of Communications Engineering, Ulm University, Albert-Einstein-Allee 43, Ulm, 89081, Germany.