1
Matsumoto Y, Nakamura S. Rapid and Comprehensive Identification of Nontuberculous Mycobacteria. Methods Mol Biol 2023; 2632:247-255. [PMID: 36781733 DOI: 10.1007/978-1-0716-2996-3_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
Abstract
Next-generation sequencing is a powerful tool to accurately identify pathogens. The MinION sequencer is best suited for the rapid identification of bacterial species due to its real-time sequence output. In this chapter, we introduce a method to identify nontuberculous mycobacteria (NTM) in one sequencing analysis from culture isolates using the MinION sequencer. NTM disease is now recognized as a growing global health concern due to its increasing incidence and prevalence. There are over 200 NTM species, of which the major pathogens are further classified into many subspecies showing different antibiotic susceptibilities. Therefore, identifying the pathogens at the subspecies level of NTM is necessary to select an appropriate treatment regimen. The protocol described here includes DNA extraction by lysis using silica beads, library preparation, sequencing by the MinION sequencer, and analysis of multilocus sequence typing using the software "mlstverse" and enables rapid and comprehensive identification of 175 species of NTM at the subspecies level with high sensitivity and accuracy.
Affiliation(s)
- Yuki Matsumoto
- Department of Infection Metagenomics, Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
- Shota Nakamura
- Department of Infection Metagenomics, Research Institute for Microbial Diseases, Osaka University, Osaka, Japan.
- Center for Infectious Disease Education and Research, Osaka University, Osaka, Japan.
2
Fu YB, Peterson GW, Dong Y. Increasing Genome Sampling and Improving SNP Genotyping for Genotyping-by-Sequencing with New Combinations of Restriction Enzymes. G3 (Bethesda, Md.) 2016; 6:845-56. [PMID: 26818077 PMCID: PMC4825655 DOI: 10.1534/g3.115.025775] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 12/07/2015] [Accepted: 01/22/2016] [Indexed: 12/15/2022]
Abstract
Genotyping-by-sequencing (GBS) has emerged as a useful genomic approach for exploring genome-wide genetic variation. However, GBS commonly samples a genome unevenly and can generate a substantial amount of missing data. These technical features limit the power of various GBS-based genetic and genomic analyses. Here we present software called IgCoverage for in silico evaluation of genomic coverage through GBS with a single restriction enzyme or a pair of restriction enzymes on one sequenced genome, and report a new set of 21 restriction enzyme combinations that can be applied to enhance GBS applications. These enzyme combinations were developed through an application of IgCoverage on 22 plant, animal, and fungus species with sequenced genomes, and some of them were empirically evaluated with different runs of Illumina MiSeq sequencing in 12 plant species. The in silico analysis of 22 organisms revealed up to eight times more genome coverage for the new combinations, which pair four- or five-cutter restriction enzymes, than for the commonly used enzyme combination PstI + MspI. The empirical evaluation of the new enzyme combination (HinfI + HpyCH4IV) in 12 plant species showed 1.7-6 times more genome coverage than PstI + MspI, and 2.3 times more genome coverage in dicots than monocots. Also, the SNP genotyping in 12 Arabidopsis and 12 rice plants revealed that HinfI + HpyCH4IV generated 7 and 1.3 times more SNPs (with 0-16.7% missing observations) than PstI + MspI, respectively. These findings demonstrate that these novel enzyme combinations can be utilized to increase genome sampling and improve SNP genotyping in various GBS applications.
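The idea of evaluating genome coverage in silico for an enzyme pair can be illustrated with a minimal sketch. This is not the IgCoverage implementation, whose algorithm is not described in the abstract. The function names, the 100 to 500 bp size window, and the rule that only fragments flanked by one site of each enzyme are counted are all assumptions made for illustration. Cut positions are approximated by the start of each recognition site (PstI CTGCAG, MspI CCGG, HinfI GANTC, HpyCH4IV ACGT).

```python
import re

# Recognition sequences; the IUPAC code N is expanded to a regex character class.
SITES = {
    "PstI": "CTGCAG",
    "MspI": "CCGG",
    "HinfI": "GA[ACGT]TC",
    "HpyCH4IV": "ACGT",
}


def cut_positions(seq, enzyme):
    """Return the start index of every (possibly overlapping) site of one enzyme."""
    pattern = re.compile("(?=" + SITES[enzyme] + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


def double_digest_coverage(seq, enzyme_a, enzyme_b, min_len=100, max_len=500):
    """Fraction of seq lying in fragments that are flanked by one site of each
    enzyme and whose length falls within [min_len, max_len] (hypothetical window)."""
    cuts = sorted(
        [(pos, enzyme_a) for pos in cut_positions(seq, enzyme_a)]
        + [(pos, enzyme_b) for pos in cut_positions(seq, enzyme_b)]
    )
    covered = 0
    # Walk over adjacent cut sites; each adjacent pair bounds one fragment.
    for (left, left_enzyme), (right, right_enzyme) in zip(cuts, cuts[1:]):
        length = right - left
        if left_enzyme != right_enzyme and min_len <= length <= max_len:
            covered += length
    return covered / len(seq) if seq else 0.0
```

Calling `double_digest_coverage(genome, "HinfI", "HpyCH4IV")` and `double_digest_coverage(genome, "PstI", "MspI")` on the same sequence string gives two fractions that can be compared directly; a real evaluation would also need to handle both strands, exact cut offsets, and multi-chromosome genomes.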
Affiliation(s)
- Yong-Bi Fu
- Plant Gene Resources of Canada, Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan S7N 0X2, Canada
- Gregory W Peterson
- Plant Gene Resources of Canada, Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan S7N 0X2, Canada
- Yibo Dong
- Plant Gene Resources of Canada, Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan S7N 0X2, Canada
3
Abstract
Genotyping-by-sequencing (GBS) approaches provide low-cost, high-density genotype information. However, GBS has unique technical considerations, including a substantial amount of missing data and a nonuniform distribution of sequence reads. The goal of this study was to characterize technical variation using this method and to develop methods to optimize read depth to obtain desired marker coverage. To empirically assess the distribution of fragments produced using GBS, ∼8.69 Gb of GBS data were generated on the Zea mays reference inbred B73, utilizing ApeKI for genome reduction and single-end reads between 75 and 81 bp in length. We observed wide variation in sequence coverage across sites. Approximately 76% of potentially observable cut site-adjacent sequence fragments had no sequencing reads whereas a portion had substantially greater read depth than expected, up to 2369 times the expected mean. The methods described in this article facilitate determination of sequencing depth in the context of empirically defined read depth to achieve desired marker density for genetic mapping studies.
4
Coverage theories for metagenomic DNA sequencing based on a generalization of Stevens' theorem. J Math Biol 2012; 67:1141-61. [PMID: 22965653 PMCID: PMC3795925 DOI: 10.1007/s00285-012-0586-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/06/2011] [Revised: 08/28/2012] [Indexed: 11/21/2022]
Abstract
Metagenomic project design has relied variously upon speculation, semi-empirical and ad hoc heuristic models, and elementary extensions of single-sample Lander–Waterman expectation theory, all of which are demonstrably inadequate. Here, we propose an approach based upon a generalization of Stevens’ Theorem for randomly covering a domain. We extend this result to account for the presence of multiple species, from which are derived useful probabilities for fully recovering a particular target microbe of interest and for average contig length. These show improved specificities compared to older measures and recommend deeper data generation than the levels chosen by some early studies, supporting the view that poor assemblies were due at least somewhat to insufficient data. We assess predictions empirically by generating roughly 4.5 Gb of sequence from a twelve member bacterial community, comparing coverage for two particular members, Selenomonas artemidis and Enterococcus faecium, which are the least (~3%) and most (~12%) abundant species, respectively. Agreement is reasonable, with differences likely attributable to coverage biases. We show that, in some cases, bias is simple in the sense that a small reduction in read length to simulate less efficient covering brings data and theory into essentially complete accord. Finally, we describe two applications of the theory. One plots coverage probability over the relevant parameter space, constructing essentially a “metagenomic design map” to enable straightforward analysis and design of future projects. The other gives an overview of the data requirements for various types of sequencing milestones, including a desired number of contact reads and contig length, for detection of a rare viral species.
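For orientation, the classical single-target form of Stevens' theorem can be evaluated directly. The sketch below is not the multi-species generalization proposed in the paper; it only computes the textbook result that n arcs, each spanning a fraction a of a circle and placed uniformly at random, cover the whole circle with probability sum over k from 0 to floor(1/a) of (-1)^k C(n, k) (1 - k a)^(n-1). The function name and the use of exact rational arithmetic (chosen here to avoid cancellation in the alternating sum) are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def stevens_full_coverage(n_reads, read_fraction):
    """Probability that n_reads arcs, each covering read_fraction of a circle
    (0 < read_fraction < 1) and placed uniformly at random, leave no gap."""
    a = Fraction(read_fraction)
    if not 0 < a < 1:
        raise ValueError("read_fraction must lie strictly between 0 and 1")
    total = Fraction(0)
    k = 0
    # Terms with k * a >= 1 vanish, so the alternating sum stops there.
    while k <= n_reads and k * a < 1:
        total += (-1) ** k * comb(n_reads, k) * (1 - k * a) ** (n_reads - 1)
        k += 1
    return total
```

For example, `stevens_full_coverage(3, Fraction(1, 2))` evaluates the chance that three random half-circle arcs cover a circular target. Exact fractions become slow for realistic read counts, so a practical calculation would switch to a numerically stable approximation.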
5
Stanhope SA. Occupancy modeling, maximum contig size probabilities and designing metagenomics experiments. PLoS One 2010; 5:e11652. [PMID: 20686599 PMCID: PMC2912229 DOI: 10.1371/journal.pone.0011652] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 03/24/2010] [Accepted: 06/22/2010] [Indexed: 11/19/2022] [Open Access]
Abstract
Mathematical aspects of coverage and gaps in genome assembly have received substantial attention by bioinformaticians. Typical problems under consideration suppose that reads can be experimentally obtained from a single genome and that the number of reads will be set to cover a large percentage of that genome at a desired depth. In metagenomics experiments genomes from multiple species are simultaneously analyzed and obtaining large numbers of reads per genome is unlikely. We propose the probability of obtaining at least one contig of a desired minimum size from each novel genome in the pool without restriction based on depth of coverage as a metric for metagenomic experimental design. We derive an approximation to the distribution of maximum contig size for single genome assemblies using relatively few reads. This approximation is verified in simulation studies and applied to a number of different metagenomic experimental design problems, ranging in difficulty from detecting a single novel genome in a pool of known species to detecting each of a random number of novel genomes collectively sized and with abundances corresponding to given distributions in a single pool.
Affiliation(s)
- Stephen A Stanhope
- Biological Sciences Division, University of Chicago, Chicago, Illinois, United States of America.
6
Hooper SD, Dalevi D, Pati A, Mavromatis K, Ivanova NN, Kyrpides NC. Estimating DNA coverage and abundance in metagenomes using a gamma approximation. Bioinformatics 2009; 26:295-301. [PMID: 20008478 PMCID: PMC2815663 DOI: 10.1093/bioinformatics/btp687] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]
Abstract
Motivation: Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets. Contact: sean.d.hooper@genpat.uu.se Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sean D Hooper
- Department of Energy Joint Genome Institute (DOE-JGI), Genome Biology Program, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA.
7
Wendl MC, Wilson RK. Aspects of coverage in medical DNA sequencing. BMC Bioinformatics 2008; 9:239. [PMID: 18485222 PMCID: PMC2430974 DOI: 10.1186/1471-2105-9-239] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 10/08/2007] [Accepted: 05/16/2008] [Indexed: 11/25/2022] [Open Access]
Abstract
Background: DNA sequencing is now emerging as an important component in biomedical studies of diseases like cancer. Short-read, highly parallel sequencing instruments are expected to be used heavily for such projects, but many design specifications have yet to be conclusively established. Perhaps the most fundamental of these is the redundancy required to detect sequence variations, which bears directly upon genomic coverage and the consequent resolving power for discerning somatic mutations. Results: We address the medical sequencing coverage problem via an extension of the standard mathematical theory of haploid coverage. The expected diploid multi-fold coverage, as well as its generalization for aneuploidy, are derived and these expressions can be readily evaluated for any project. The resulting theory is used as a scaling law to calibrate performance to that of standard BAC sequencing at 8× to 10× redundancy, i.e. for expected coverages that exceed 99% of the unique sequence. A differential strategy is formalized for tumor/normal studies wherein tumor samples are sequenced more deeply than normal ones. In particular, both tumor alleles should be detected at least twice, while both normal alleles are detected at least once. Our theory predicts these requirements can be met for tumor and normal redundancies of approximately 26× and 21×, respectively. We explain why these values do not differ by a factor of 2, as might intuitively be expected. Future technology developments should prompt even deeper sequencing of tumors, but the 21× value for normal samples is essentially a constant. Conclusion: Given the assumptions of standard coverage theory, our model gives pragmatic estimates for required redundancy. The differential strategy should be an efficient means of identifying potential somatic mutations for further study.
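The calibration point quoted above (8× to 10× redundancy giving more than 99% expected coverage) follows from the standard haploid result, in which reads placed at random at redundancy c cover an expected fraction 1 - e^(-c) of the target. The sketch below evaluates only that classical expression; it does not implement the diploid or aneuploid extension derived in the paper and therefore does not reproduce the 26× and 21× figures. Function names are illustrative.

```python
import math


def expected_coverage(redundancy):
    """Expected fraction of a haploid target covered at least once,
    under the classical random-placement model: 1 - exp(-c)."""
    if redundancy < 0:
        raise ValueError("redundancy must be non-negative")
    return 1.0 - math.exp(-redundancy)


def redundancy_for_coverage(target_fraction):
    """Redundancy c needed so that expected coverage equals target_fraction."""
    if not 0 <= target_fraction < 1:
        raise ValueError("target_fraction must be in [0, 1)")
    return -math.log(1.0 - target_fraction)
```

Under this model `expected_coverage(8)` is about 0.9997, consistent with the abstract's statement that 8× to 10× redundancy exceeds 99% expected coverage.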
Affiliation(s)
- Michael C Wendl
- Genome Sequencing Center and Department of Genetics, Washington University, St Louis MO 63108, USA.
8
Djikeng A, Halpin R, Kuzmickas R, Depasse J, Feldblyum J, Sengamalay N, Afonso C, Zhang X, Anderson NG, Ghedin E, Spiro DJ. Viral genome sequencing by random priming methods. BMC Genomics 2008; 9:5. [PMID: 18179705 PMCID: PMC2254600 DOI: 10.1186/1471-2164-9-5] [Citation(s) in RCA: 255] [Impact Index Per Article: 15.9] [Received: 04/10/2007] [Accepted: 01/07/2008] [Indexed: 12/05/2022] [Open Access]
Abstract
Background: Most emerging health threats are of zoonotic origin. For the overwhelming majority, their causative agents are RNA viruses, which include but are not limited to HIV, Influenza, SARS, Ebola, Dengue, and Hantavirus. Of increasing importance therefore is a better understanding of global viral diversity to enable better surveillance and prediction of pandemic threats; this will require rapid and flexible methods for complete viral genome sequencing. Results: We have adapted the SISPA methodology [1-3] to genome sequencing of RNA and DNA viruses. We have demonstrated the utility of the method on various types and sources of viruses, obtaining near complete genome sequence of viruses ranging in size from 3,000–15,000 bp with a median depth of coverage of 14.33. We used this technique to generate full viral genome sequence in the presence of host contaminants, using viral preparations from cell culture supernatant, allantoic fluid and fecal matter. Conclusion: The method described is of great utility in generating whole genome assemblies for viruses with little or no available sequence information, viruses from greatly divergent families, previously uncharacterized viruses, or to more fully describe mixed viral infections.
9
Moriarty J, Marchesi JR, Metcalfe A. Bounds on the distribution of the number of gaps when circles and lines are covered by fragments: theory and practical application to genomic and metagenomic projects. BMC Bioinformatics 2007; 8:70. [PMID: 17335566 PMCID: PMC1821341 DOI: 10.1186/1471-2105-8-70] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/03/2006] [Accepted: 03/02/2007] [Indexed: 11/13/2022] [Open Access]
Abstract
Background: The question of how a circle or line segment becomes covered when random arcs are marked off has arisen repeatedly in bioinformatics. The number of uncovered gaps is of particular interest. Approximate distributions for the number of gaps have been given in the literature, one motivation being ease of computation. Error bounds for these approximate distributions have not been given. Results: We give bounds on the probability distribution of the number of gaps when a circle is covered by fragments of fixed size. The absolute error in the approximation is typically on the order of 0.1% at 10× coverage depth. The method can be applied to coverage problems on the interval, including edge effects, and applications are given to metagenomic libraries and shotgun sequencing.
Affiliation(s)
- John Moriarty
- Department of Mathematics/Boole Centre for Research in Informatics, University College Cork, Cork, Ireland
- Julian R Marchesi
- Alimentary Pharmabiotic Centre, University College Cork, Cork, Ireland
- Department of Microbiology, University College Cork, Cork, Ireland
- Anthony Metcalfe
- Department of Mathematics/Boole Centre for Research in Informatics, University College Cork, Cork, Ireland
10
Abstract
The classical theory of shotgun DNA sequencing accounts for neither the placement dependencies that are a fundamental consequence of the forward-reverse sequencing strategy, nor the edge effect that arises for small to moderate-sized genomic targets. These phenomena are relevant to a number of sequencing scenarios, including large-insert BAC and fosmid clones, filtered genomic libraries, and macro-nuclear chromosomes. Here, we report a model that considers these two effects and provides both the expected value of coverage and its variance. Comparison to methyl-filtered maize data shows significant improvement over classical theory. The model is used to analyze coverage performance over a range of small to moderately-sized genomic targets. We find that the read pairing effect and the edge effect interact in a non-trivial fashion. Shorter reads give superior coverage per unit sequence depth relative to longer ones. In principle, end-sequences can be optimized with respect to template insert length; however, optimal performance is unlikely to be realized in most cases because of inherent size variation in any set of targets. Conversely, single-stranded reads exhibit roughly the same coverage attributes as optimized end-reads. Although linking information is lost, single-stranded data should not pose a significant assembly liability if the target represents predominantly low-copy sequence. We also find that random sequencing should be halted at substantially lower redundancies than those now associated with larger projects. Given the enormous amount of data generated per cycle on pyrosequencing instruments, this observation suggests devising schemes to split each run cycle between two or more projects. This would prevent over-sequencing and would further leverage the pyrosequencing method.
Affiliation(s)
- Michael C Wendl
- Genome Sequencing Center, Washington University, St. Louis, Missouri 63108, USA.