1
|
Matsumoto Y, Nakamura S. Rapid and Comprehensive Identification of Nontuberculous Mycobacteria. Methods Mol Biol 2023; 2632:247-255. [PMID: 36781733 DOI: 10.1007/978-1-0716-2996-3_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
Abstract
Next-generation sequencing is a powerful tool to accurately identify pathogens. The MinION sequencer is best suited for the rapid identification of bacterial species due to its real-time sequence output. In this chapter, we introduce a method to identify nontuberculous mycobacteria (NTM) in one sequencing analysis from culture isolates using the MinION sequencer. NTM disease is now recognized as a growing global health concern due to its increasing incidence and prevalence. There are over 200 NTM species, of which the major pathogens are further classified into many subspecies showing different antibiotic susceptibilities. Therefore, identifying the pathogens at the subspecies level of NTM is necessary to select an appropriate treatment regimen. The protocol described here includes DNA extraction by lysis using silica beads, library preparation, sequencing by the MinION sequencer, and analysis of multilocus sequence typing using the software "mlstverse" and enables rapid and comprehensive identification of 175 species of NTM at the subspecies level with high sensitivity and accuracy.
Affiliation(s)
- Yuki Matsumoto
- Department of Infection Metagenomics, Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
- Shota Nakamura
- Department of Infection Metagenomics, Research Institute for Microbial Diseases, Osaka University, Osaka, Japan.
- Center for Infectious Disease Education and Research, Osaka University, Osaka, Japan.
|
2
|
Abstract
Arctic permafrost is thawing due to global warming, with unknown consequences on the microbial inhabitants or associated viruses. DNA viruses have previously been shown to be abundant and active in thawing permafrost, but little is known about RNA viruses in these systems. To address this knowledge gap, we assessed the composition of RNA viruses in thawed permafrost samples that were incubated for 97 days at 4°C to simulate thaw conditions. A diverse RNA viral community was assembled from metatranscriptome data including double-stranded RNA viruses, dominated by Reoviridae and Hypoviridae, and negative and positive single-stranded RNA viruses, with relatively high representations of Rhabdoviridae and Leviviridae, respectively. Sequences corresponding to potential plant and human pathogens were also detected. The detected RNA viruses primarily targeted dominant eukaryotic taxa in the samples (e.g., fungi, Metazoa and Viridiplantae) and the viral community structures were significantly associated with predicted host populations. These results indicate that RNA viruses are linked to eukaryotic host dynamics. Several of the RNA viral sequences contained auxiliary metabolic genes encoding proteins involved in carbon utilization (e.g., polygalacturonase), implying their potential roles in carbon cycling in thawed permafrost. IMPORTANCE Permafrost is thawing at a rapid pace in the Arctic with largely unknown consequences on ecological processes that are fundamental to Arctic ecosystems. This is the first study to determine the composition of RNA viruses in thawed permafrost. Other recent studies have characterized DNA viruses in thawing permafrost, but the majority of DNA viruses are bacteriophages that target bacterial hosts. By contrast, RNA viruses primarily target eukaryotic hosts and thus represent potential pathogenic threats to humans, animals, and plants.
Here, we find that RNA viruses in permafrost are novel and distinct from those in other habitats studied to date. The COVID-19 pandemic has heightened awareness of the importance of potential environmental reservoirs of emerging RNA viral pathogens. We demonstrate that some potential pathogens were detected after an experimental thawing regime. These results are important for understanding critical viral-host interactions and provide a better understanding of the ecological roles that RNA viruses play as permafrost thaws.
|
3
|
Ma Z(S). Estimating the Optimum Coverage and Quality of Amplicon Sequencing With Taylor's Power Law Extensions. Front Bioeng Biotechnol 2020; 8:372. [PMID: 32500062 PMCID: PMC7242763 DOI: 10.3389/fbioe.2020.00372] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/20/2020] [Accepted: 04/03/2020] [Indexed: 11/13/2022] Open
Abstract
The DNA sequencing coverage problem has been analyzed theoretically with complex mathematical models such as Lander-Waterman expectation theory and Stevens' theorem for randomly covering a domain. In the field of metagenomic sequencing, several approaches have been developed to estimate the coverage of whole-genome shotgun sequencing, but surprisingly few studies have addressed the coverage problem for marker-gene amplicon sequencing, for which arguably the biggest challenge is the complexity or heterogeneity of microbial communities. Overall, much of the practice still relies variously on speculation and on semi-empirical and ad hoc heuristic models. Conservatively raising coverage may ensure the success of a sequencing project, but often at undue cost. In this study, we borrow the principles and approaches of optimum sampling methodology, which originated in applied entomology, achieved equal success in plant pathology and parasitology, and has played a critical role in decision-making for global crop and forest protection against economic pests since the 1970s, when the pesticide crisis and food safety concerns forced reductions in pesticide use, which in turn required reliable sampling techniques for monitoring pest populations. We realized that sequencing coverage is essentially an optimum sampling problem. Perhaps the only essential difference between sampling insects and sampling a microbiome is the "instrument" used. In traditional entomology, it is usually humans who visually count the insects, occasionally aided by a binocular microscope. In metagenomics research, it is the DNA sequencers that count the DNA reads.
Furthermore, a key theoretical foundation for sampling insect pest populations, i.e., Taylor's power law, which has achieved the rare status of an ecological law and captures population aggregation, has recently been extended to the community level for describing community heterogeneity and stability, namely, Taylor's power law extensions (TPLEs). This theoretical advance enabled us to develop a novel approach to assessing the quality and determining the optimum number of reads (coverage) of amplicon sequencing operations. Specifically, two applications were developed: one, applied in hindsight, assesses the quality of an amplicon sequencing operation in terms of its precision and confidence levels; the other, applied prior to sequencing, determines the minimum sequencing effort a project needs to achieve preset precision and confidence levels.
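The abstract above does not spell out its formulas, so the following is only a minimal sketch of the classic form of Taylor's power law, V = a·m^b, and of the textbook sample-size rule from applied entomology that follows from it, n = (z/D)^2 · a · m^(b-2). The function names, the choice of z = 1.96 (about 95% confidence), and the reading of D as the tolerated half-width of the confidence interval relative to the mean are assumptions made here for illustration; the community-level TPLE variants used in the paper may differ.

```python
import math


def fit_taylor_power_law(means, variances):
    """Fit V = a * m**b by least squares on log(V) = log(a) + b*log(m).

    Hypothetical helper: `means` and `variances` are paired positive
    values, one pair per sampled population (or community).
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope: aggregation parameter
    a = math.exp(y_bar - b * x_bar)    # intercept back-transformed
    return a, b


def min_sample_size(mean, a, b, precision, z=1.96):
    """Classic rule n = (z / D)**2 * a * m**(b - 2), rounded up.

    `precision` (D) is the allowed half-width of the confidence
    interval as a fraction of the mean; z = 1.96 is an assumed
    normal quantile for roughly 95% confidence.
    """
    return math.ceil((z / precision) ** 2 * a * mean ** (b - 2))


if __name__ == "__main__":
    # Synthetic data generated exactly from V = 2 * m**1.5.
    means = [1.0, 10.0, 100.0]
    variances = [2.0 * m ** 1.5 for m in means]
    a, b = fit_taylor_power_law(means, variances)
    print(round(a, 6), round(b, 6))
    print(min_sample_size(mean=100.0, a=a, b=b, precision=0.1))
```

What counts as one sampling unit (a read, a sample, a library) is left open in this sketch; it has to be fixed before the fitted parameters can be interpreted.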
Affiliation(s)
- Zhanshan (Sam) Ma
- Computational Biology and Medical Ecology Lab, State Key Lab of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
|
4
|
Deng C, Daley T, De Sena Brandine G, Smith AD. Molecular Heterogeneity in Large-Scale Biological Data: Techniques and Applications. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021339] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
High-throughput sequencing technologies have evolved at a stellar pace for almost a decade and have greatly advanced our understanding of genome biology. In these sampling-based technologies, there is an important detail that is often overlooked in the analysis of the data and the design of the experiments, specifically that the sampled observations often do not give a representative picture of the underlying population. This has long been recognized as a problem in statistical ecology and in the broader statistics literature. In this review, we discuss the connections between these fields, methodological advances that parallel both the needs and opportunities of large-scale data analysis, and specific applications in modern biology. In the process we describe unique aspects of applying these approaches to sequencing technologies, including sequencing error, population and individual heterogeneity, and the design of experiments.
Affiliation(s)
- Chao Deng
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Timothy Daley
- Department of Statistics and Department of Bioengineering, Stanford University, Stanford, California 94305, USA
- Guilherme De Sena Brandine
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Andrew D. Smith
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
|
5
|
Abstract
Background Metagenomic sequencing is a powerful technology for studying the mixture of microbes or the microbiomes on humans and in the environment. One basic task of analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition, limited availability of known reference genomes, and usually insufficient sequencing coverage. Results As an initial step toward understanding the complete composition of a metagenomic sample, we studied the problem of estimating the total length of all distinct component genomes in a metagenomic sample. We showed that this problem can be solved by estimating the total number of distinct k-mers in all the metagenomic sequencing data. We proposed a method for this estimation based on the sequencing coverage distribution of observed k-mers, and introduced a k-mer redundancy index (KRI) to fill in the gap between the count of distinct k-mers and the total genome length. We showed the effectiveness of the proposed method on a set of carefully designed simulation data corresponding to multiple situations of true metagenomic data. Results on real data indicate that the uncaptured genomic information can vary dramatically across metagenomic samples, with the potential to mislead downstream analyses. Conclusions We proposed the question of how long the total genome length of all different species in a microbial community is and introduced a method to answer it. Electronic supplementary material The online version of this article (10.1186/s12864-019-5467-x) contains supplementary material, which is available to authorized users.
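The estimator in the abstract above starts from two observable quantities: the set of distinct k-mers seen in the reads and the distribution of how often each was seen. The sketch below computes only those inputs; it does not implement the authors' estimator or the KRI. The function names are hypothetical, and for simplicity it treats a k-mer and its reverse complement as different strings, which a real pipeline would likely merge.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        # A read of length L contributes L - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(counts):
    """Map each coverage value c to the number of distinct k-mers seen c times."""
    return Counter(counts.values())


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG"]
    counts = kmer_counts(reads, k=3)
    print(len(counts))                  # number of distinct observed k-mers
    print(abundance_histogram(counts))  # coverage distribution of observed k-mers
```

A histogram dominated by k-mers seen only once suggests that many k-mers in the sample were never observed at all, which is the gap an estimator of the total distinct k-mer count has to fill.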
Affiliation(s)
- Kui Hua
- MOE Key Laboratory of Bioinformatics Division and Center for Synthetic & System Biology, BNRIST, Beijing, 100084, China; Department of Automation, Tsinghua University, Beijing, 100084, China
- Xuegong Zhang
- MOE Key Laboratory of Bioinformatics Division and Center for Synthetic & System Biology, BNRIST, Beijing, 100084, China; Department of Automation, Tsinghua University, Beijing, 100084, China; School of Life Sciences, Tsinghua University, Beijing, 100084, China
|
6
|
Almeida OGG, De Martinis ECP. Bioinformatics tools to assess metagenomic data for applied microbiology. Appl Microbiol Biotechnol 2018; 103:69-82. [DOI: 10.1007/s00253-018-9464-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/14/2018] [Revised: 10/15/2018] [Accepted: 10/16/2018] [Indexed: 12/14/2022]
|
7
|
Joint Analysis of Long and Short Reads Enables Accurate Estimates of Microbiome Complexity. Cell Syst 2018; 7:192-200.e3. [PMID: 30056005 DOI: 10.1016/j.cels.2018.06.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/01/2018] [Revised: 05/05/2018] [Accepted: 06/15/2018] [Indexed: 01/09/2023]
Abstract
Reduced microbiome diversity has been linked to several diseases. However, estimating the diversity of bacterial communities (the number and the total length of distinct genomes within a metagenome) remains an open problem in microbial ecology. Here, we describe an algorithm for estimating the microbial diversity in a metagenomic sample based on a joint analysis of short and long reads. Unlike previous approaches, the algorithm does not make any assumptions on the distribution of the frequencies of genomes within a metagenome (as in parametric methods) and does not require a large database that covers the total diversity (as in non-parametric methods). We estimate that genomes comprising a human gut metagenome have total length varying from 1.3 to 3.5 billion nucleotides, with genomes responsible for 50% of total abundance having total length varying from only 25 to 61 million nucleotides. In contrast, genomes comprising an aquifer sediment metagenome have more than two orders of magnitude larger total length (≈840 billion nucleotides).
|
8
|
Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity. mSystems 2018; 3:mSystems00039-18. [PMID: 29657970 PMCID: PMC5893860 DOI: 10.1128/msystems.00039-18] [Citation(s) in RCA: 127] [Impact Index Per Article: 21.2] [Received: 03/22/2018] [Accepted: 03/23/2018] [Indexed: 01/15/2023] Open
Abstract
Estimations of microbial community diversity based on metagenomic data sets are affected, often to an unknown degree, by biases derived from insufficient coverage and reference database-dependent estimations of diversity. For instance, the completeness of reference databases cannot be generally estimated since it depends on the extant diversity sampled to date, which, with the exception of a few habitats such as the human gut, remains severely undersampled. Further, estimation of the degree of coverage of a microbial community by a metagenomic data set is prohibitively time-consuming for large data sets, and coverage values may not be directly comparable between data sets obtained with different sequencing technologies. Here, we extend Nonpareil, a database-independent tool for the estimation of coverage in metagenomic data sets, to a high-performance computing implementation that scales up to hundreds of cores and includes, in addition, a k-mer-based estimation as sensitive as the original alignment-based version but about three hundred times as fast. Further, we propose a metric of sequence diversity (Nd) derived directly from Nonpareil curves that correlates well with alpha diversity assessed by traditional metrics. We use this metric in different experiments demonstrating the correlation with the Shannon index estimated on 16S rRNA gene profiles and show that Nd additionally reveals seasonal patterns in marine samples that are not captured by the Shannon index and more precise rankings of the magnitude of diversity of microbial communities in different habitats. Therefore, the new version of Nonpareil, called Nonpareil 3, advances the toolbox for metagenomic analyses of microbiomes.
IMPORTANCE Estimation of the coverage provided by a metagenomic data set, i.e., what fraction of the microbial community was sampled by DNA sequencing, represents an essential first step of every culture-independent genomic study that aims to robustly assess the sequence diversity present in a sample. However, estimation of coverage remains elusive because of several technical limitations associated with high computational requirements and limiting statistical approaches to quantify diversity. Here we described Nonpareil 3, a new bioinformatics algorithm that circumvents several of these limitations and thus can facilitate culture-independent studies in clinical or environmental settings, independent of the sequencing platform employed. In addition, we present a new metric of sequence diversity based on rarefied coverage and demonstrate its use in communities from diverse ecosystems.
|
9
|
Affiliation(s)
- Bing Xing Wang
- Department of Statistics, Zhejiang Gongshang University, Hangzhou, China
- Fangtao Wu
- Department of Statistics, Zhejiang Gongshang University, Hangzhou, China
|
10
|
García-Ortega LF, Martínez O. How Many Genes Are Expressed in a Transcriptome? Estimation and Results for RNA-Seq. PLoS One 2015; 10:e0130262. [PMID: 26107654 PMCID: PMC4479379 DOI: 10.1371/journal.pone.0130262] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 01/30/2015] [Accepted: 05/19/2015] [Indexed: 01/02/2023] Open
Abstract
RNA-seq experiments estimate the number of genes expressed in a transcriptome as well as their relative frequencies. However, an undetermined number of genes can remain undetected due to their low expression relative to the sample size (sequence depth). Estimation of the true number of genes expressed in a transcriptome is essential in order to determine which genes are exclusively expressed in specific tissues or under particular conditions. A reliable estimate of the true number of expressed genes is also required to accurately measure transcriptome changes and to predict the sequencing depth needed to increase the proportion of detected genes. This problem is analogous to ecological sampling problems such as estimating the number of species at a given site. Here we present a non-parametric estimator for the number of undetected genes as well as for the extra sample size needed to detect a given proportion of the undetected genes. Our estimators are superior to ones already published by having smaller standard errors and biases. We applied our method to a set of 32 publicly available RNA-seq experiments, including the evaluation of 311 individually sequenced libraries. We found that in the majority of the cases more than one thousand genes are undetected, and that on average approximately 6% of the expressed genes per accession remain undetected. This figure increases to approximately 10% if individual sequencing libraries are analyzed. Our method is also applicable to metagenomic experiments. Using our method, the number of undetected genes as well as the sample size needed to detect them can be calculated, leading to more accurate and complete gene expression studies.
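The abstract above does not give the form of its estimator, so the sketch below substitutes a different, well-known non-parametric estimator from the ecological sampling literature the abstract refers to: the bias-corrected Chao1 estimate of unseen classes, f1(f1 - 1) / (2(f2 + 1)), where f1 and f2 are the numbers of genes observed exactly once and exactly twice. It is not the authors' method; the function name and the input layout (one read count per gene) are assumptions for illustration.

```python
from collections import Counter


def chao1_undetected(read_counts):
    """Bias-corrected Chao1 estimate of the number of undetected genes.

    `read_counts` holds one non-negative integer per gene; genes with
    a count of zero are ignored because they were not observed.
    """
    freq = Counter(c for c in read_counts if c > 0)
    f1 = freq[1]  # genes supported by exactly one read
    f2 = freq[2]  # genes supported by exactly two reads
    return f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    counts = [1, 1, 1, 2, 2, 5, 0]
    detected = sum(1 for c in counts if c > 0)
    undetected = chao1_undetected(counts)
    print(detected, undetected, detected + undetected)
```

The intuition carries over from species counts: many singletons relative to doubletons signal that sequencing depth is still too low to have seen every expressed gene.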
Affiliation(s)
- Luis Fernando García-Ortega
- Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav-IPN), Irapuato, Guanajuato, México
- Octavio Martínez
- Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav-IPN), Irapuato, Guanajuato, México
|
11
|
|
12
|
|
13
|
Rodriguez-R LM, Konstantinidis KT. Nonpareil: a redundancy-based approach to assess the level of coverage in metagenomic datasets. Bioinformatics 2013; 30:629-35. [PMID: 24123672 DOI: 10.1093/bioinformatics/btt584] [Citation(s) in RCA: 152] [Impact Index Per Article: 13.8] [Indexed: 11/12/2022]
Abstract
MOTIVATION Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Owing to these limitations, central ecological questions with respect to the global distribution of microbes and the functional diversity of their communities cannot be robustly assessed. RESULTS We introduce Nonpareil, a method to estimate and project coverage in metagenomes. Nonpareil does not rely on high-quality assemblies, operational taxonomic unit calling or comprehensive reference databases; thus, it is broadly applicable to metagenomic studies. Application of Nonpareil on available metagenomic datasets provided estimates on the relative complexity of soil, freshwater and human microbiome communities, and suggested that ∼200 Gb of sequencing data are required for 95% abundance-weighted average coverage of the soil communities analyzed. AVAILABILITY AND IMPLEMENTATION Nonpareil is available at https://github.com/lmrodriguezr/nonpareil/ under the Artistic License 2.0.
Affiliation(s)
- Luis M Rodriguez-R
- Center for Bioinformatics and Computational Genomics, School of Biology and School of Civil and Environmental Engineering, Georgia Institute of Technology, 311 Ferst Drive, Ford ES&T Building, Suite 3224, Atlanta, GA 30332, USA
|
14
|
Lindner MS, Kollock M, Zickmann F, Renard BY. Analyzing genome coverage profiles with applications to quality control in metagenomics. Bioinformatics 2013; 29:1260-7. [DOI: 10.1093/bioinformatics/btt147] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022] Open
|
15
|
Luo C, Rodriguez-R LM, Konstantinidis KT. A user's guide to quantitative and comparative analysis of metagenomic datasets. Methods Enzymol 2013; 531:525-47. [PMID: 24060135 DOI: 10.1016/b978-0-12-407863-5.00023-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]
Abstract
Metagenomics has revolutionized microbiological studies during the past decade and provided new insights into the diversity, dynamics, and metabolic potential of natural microbial communities. However, metagenomics still represents a field in development, and standardized tools and approaches to handle and compare metagenomes have not been established yet. An important reason accounting for the latter is the continuous changes in the type of sequencing data available, for example, long versus short sequencing reads. Here, we provide a guide to bioinformatic pipelines developed to accomplish the following tasks, focusing primarily on those developed by our team: (i) assemble a metagenomic dataset; (ii) determine the level of sequence coverage obtained and the amount of sequencing required to obtain complete coverage; (iii) identify the taxonomic affiliation of a metagenomic read or assembled contig; and (iv) determine differentially abundant genes, pathways, and species between different datasets. Most of these pipelines do not depend on the type of sequences available or can be easily adjusted to fit different types of sequences, and are freely available (for instance, through our lab Web site: http://www.enve-omics.gatech.edu/). The limitations of current approaches, as well as the computational aspects that can be further improved, will also be briefly discussed. The work presented here provides practical guidelines on how to perform metagenomic analysis of microbial communities characterized by varied levels of diversity and establishes approaches to handle the resulting data, independent of the sequencing platform employed.
Affiliation(s)
- Chengwei Luo
- Center for Bioinformatics and Computational Genomics, Georgia Institute of Technology, Atlanta, Georgia, USA; School of Biology, Georgia Institute of Technology, Atlanta, Georgia, USA; School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
|
16
|
Abstract
Metagenomic studies have truly revolutionised biology and medicine, and changed the way we study genomics. As genome sequencing becomes cheaper it is being applied to study complex metagenomes. 'Metagenome' is the genetic material recovered directly from an environmental sample or niche. By delivering fast, cheap, and large volumes of data, Next Generation Sequencing (NGS) platforms have facilitated a deeper understanding of the fundamentals of genomes, gene functions and regulation. Metagenomics, also referred to as environmental or community genomics, has brought about radical changes in our ability to analyse complex microbial communities by direct sampling of their natural habitat, paving the way for the creation of innovative new areas for biomedical research. Many metagenomic studies involving the 'human microbiome' have been undertaken to date. Samples from a number of diverse habitats, including different human body sites, have been subject to metagenomic examinations. Huge national and international projects with the purpose of elucidating the biogeography of microbial communities living within and on the human body are well underway. The analysis of human microbiome data has brought about a paradigm shift in our understanding of the role of resident microflora in human health and disease and brings non-traditional areas such as gut ecology to the forefront of personalised medicine. In this chapter we present an overview of the state-of-the-art in current literature and projects pertaining to human microbiome studies.
Affiliation(s)
- Ramana Madupu
- Genomic Medicine group at the J. Craig Venter Institute, USA
|
17
|
Boulund F, Johnning A, Pereira MB, Larsson DGJ, Kristiansson E. A novel method to discover fluoroquinolone antibiotic resistance (qnr) genes in fragmented nucleotide sequences. BMC Genomics 2012; 13:695. [PMID: 23231464 PMCID: PMC3543242 DOI: 10.1186/1471-2164-13-695] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 07/26/2012] [Accepted: 12/04/2012] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Broad-spectrum fluoroquinolone antibiotics are central in modern health care and are used to treat and prevent a wide range of bacterial infections. The recently discovered qnr genes provide a mechanism of resistance with the potential to rapidly spread between bacteria using horizontal gene transfer. As for many antibiotic resistance genes present in pathogens today, qnr genes are hypothesized to originate from environmental bacteria. The vast amount of data generated by shotgun metagenomics can therefore be used to explore the diversity of qnr genes in more detail. RESULTS In this paper we describe a new method to identify qnr genes in nucleotide sequence data. We show, using cross-validation, that the method has a high statistical power of correctly classifying sequences from novel classes of qnr genes, even for fragments as short as 100 nucleotides. Based on sequences from public repositories, the method was able to identify all previously reported plasmid-mediated qnr genes. In addition, several fragments from novel putative qnr genes were identified in metagenomes. The method was also able to annotate 39 chromosomal variants of which 11 have previously not been reported in literature. CONCLUSIONS The method described in this paper significantly improves the sensitivity and specificity of identification and annotation of qnr genes in nucleotide sequence data. The predicted novel putative qnr genes in the metagenomic data support the hypothesis of a large and uncharacterized diversity within this family of resistance genes in environmental bacterial communities. An implementation of the method is freely available at http://bioinformatics.math.chalmers.se/qnr/.
Affiliation(s)
- Fredrik Boulund
- Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Göteborg, SE-412 96, Sweden
|
18
|
Coverage theories for metagenomic DNA sequencing based on a generalization of Stevens' theorem. J Math Biol 2012; 67:1141-61. [PMID: 22965653 PMCID: PMC3795925 DOI: 10.1007/s00285-012-0586-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/06/2011] [Revised: 08/28/2012] [Indexed: 11/21/2022]
Abstract
Metagenomic project design has relied variously upon speculation, semi-empirical and ad hoc heuristic models, and elementary extensions of single-sample Lander–Waterman expectation theory, all of which are demonstrably inadequate. Here, we propose an approach based upon a generalization of Stevens’ Theorem for randomly covering a domain. We extend this result to account for the presence of multiple species, from which are derived useful probabilities for fully recovering a particular target microbe of interest and for average contig length. These show improved specificities compared to older measures and recommend deeper data generation than the levels chosen by some early studies, supporting the view that poor assemblies were due at least somewhat to insufficient data. We assess predictions empirically by generating roughly 4.5 Gb of sequence from a twelve member bacterial community, comparing coverage for two particular members, Selenomonas artemidis and Enterococcus faecium, which are the least (∼3%) and most (∼12%) abundant species, respectively. Agreement is reasonable, with differences likely attributable to coverage biases. We show that, in some cases, bias is simple in the sense that a small reduction in read length to simulate less efficient covering brings data and theory into essentially complete accord. Finally, we describe two applications of the theory. One plots coverage probability over the relevant parameter space, constructing essentially a “metagenomic design map” to enable straightforward analysis and design of future projects. The other gives an overview of the data requirements for various types of sequencing milestones, including a desired number of contact reads and contig length, for detection of a rare viral species.
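For orientation, the "elementary extension" of Lander-Waterman expectation theory that the abstract above calls inadequate can be written in a few lines: a species receiving a share p of N sequenced bases over a genome of length G has expected depth c = pN/G, and the expected fraction of its genome covered at least once is 1 - e^(-c). The sketch below implements only this baseline, not the generalized Stevens' theorem result. The function name is hypothetical, the abundance is assumed to be a fraction of sequenced bases, and the 2.5 Mb genome size in the example is an illustrative placeholder rather than a value from the source.

```python
import math


def expected_coverage_fraction(total_bases, abundance, genome_size):
    """Lander-Waterman style expectation for one member of a community.

    total_bases: bases sequenced across the whole metagenome.
    abundance:   assumed fraction of those bases drawn from the species.
    genome_size: length of the species' genome in bases.
    Returns the expected fraction of the genome covered at least once.
    """
    depth = total_bases * abundance / genome_size
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # 4.5 Gb of data and ~3% abundance come from the abstract;
    # the 2.5 Mb genome size is a made-up placeholder.
    print(expected_coverage_fraction(4.5e9, 0.03, 2.5e6))
    # At an expected depth of exactly 1x, about 63% of the genome is covered.
    print(expected_coverage_fraction(1e6, 1.0, 1e6))
```

This expectation says nothing about the probability of recovering the genome completely or about contig lengths, which are the quantities the abstract's generalized theory targets.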
|
19
|
Zhang T, Zhang XX, Ye L. Plasmid metagenome reveals high levels of antibiotic resistance genes and mobile genetic elements in activated sludge. PLoS One 2011; 6:e26041. [PMID: 22016806 PMCID: PMC3189950 DOI: 10.1371/journal.pone.0026041] [Citation(s) in RCA: 198] [Impact Index Per Article: 15.2] [Received: 06/27/2011] [Accepted: 09/16/2011] [Indexed: 11/19/2022] Open
Abstract
The overuse or misuse of antibiotics has accelerated antibiotic resistance, creating a major challenge for public health worldwide. Sewage treatment plants (STPs) are considered important reservoirs for antibiotic resistance genes (ARGs), and activated sludge, characterized by high microbial density and diversity, facilitates ARG horizontal gene transfer (HGT) via mobile genetic elements (MGEs). However, little is known regarding the pool of ARGs and MGEs in the sludge microbiome. In this study, the transposon aided capture (TRACA) system was employed to isolate novel plasmids from activated sludge of one STP in Hong Kong, China. We also used Illumina Hiseq 2000 high-throughput sequencing and metagenomics analysis to investigate the plasmid metagenome. Two novel plasmids were acquired from the sludge microbiome using the TRACA system and one novel plasmid was identified through metagenomics analysis. Our results revealed high levels of various ARGs as well as MGEs for HGT, including integrons, transposons and plasmids. The application of the TRACA system to isolate novel plasmids from the environmental metagenome, coupled with subsequent high-throughput sequencing and metagenomic analysis, highlighted the prevalence of ARGs and MGEs in the microbial communities of STPs.
Affiliation(s)
- Tong Zhang
- Environmental Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China.
|
20
|
Lombard N, Prestat E, van Elsas JD, Simonet P. Soil-specific limitations for access and analysis of soil microbial communities by metagenomics. FEMS Microbiol Ecol 2011; 78:31-49. [PMID: 21631545 DOI: 10.1111/j.1574-6941.2011.01140.x] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.6] [Indexed: 11/26/2022]
Abstract
Metagenomics approaches represent an important way to acquire information on the microbial communities present in complex environments like soil. However, to what extent do these approaches provide us with a true picture of soil microbial diversity? Soil is a challenging environment to work with. Its physicochemical properties affect microbial distributions inside the soil matrix, metagenome extraction and its subsequent analyses. To better understand the bias inherent to soil metagenome 'processing', we focus on soil physicochemical properties and their effects on the perceived bacterial distribution. In the light of this information, each step of soil metagenome processing is then discussed, with an emphasis on strategies for optimal soil sampling. Then, the interaction of cells and DNA with the soil matrix and the consequences for microbial DNA extraction are examined. Soil DNA extraction methods are compared and the veracity of the microbial profiles obtained is discussed. Finally, soil metagenomic sequence analysis and exploitation methods are reviewed.
Affiliation(s)
- Nathalie Lombard
- Department of Marine Biotechnology, Institute of Marine Environmental Technology, University of Maryland Baltimore County, Baltimore, MD 21202, USA.
|
21
|
Kristiansson E, Fick J, Janzon A, Grabic R, Rutgersson C, Weijdegård B, Söderström H, Larsson DGJ. Pyrosequencing of antibiotic-contaminated river sediments reveals high levels of resistance and gene transfer elements. PLoS One 2011; 6:e17038. [PMID: 21359229 PMCID: PMC3040208 DOI: 10.1371/journal.pone.0017038] [Citation(s) in RCA: 353] [Impact Index Per Article: 27.2] [Received: 11/15/2010] [Accepted: 01/11/2011] [Indexed: 12/18/2022]
Abstract
The high and sometimes inappropriate use of antibiotics has accelerated the development of antibiotic resistance, creating a major challenge for the sustainable treatment of infections worldwide. Bacterial communities often respond to antibiotic selection pressure by acquiring resistance genes, i.e. mobile genetic elements that can be shared horizontally between species. Environmental microbial communities maintain diverse collections of resistance genes, which can be mobilized into pathogenic bacteria. Recently, exceptional environmental releases of antibiotics have been documented, but the effects on the promotion of resistance genes and the potential for horizontal gene transfer have so far received limited attention. In this study, we have used culture-independent shotgun metagenomics to investigate microbial communities in river sediments exposed to waste water from the production of antibiotics in India. Our analysis identified very high levels of several classes of resistance genes as well as elements for horizontal gene transfer, including integrons, transposons and plasmids. In addition, two abundant previously uncharacterized resistance plasmids were identified. The results suggest that antibiotic contamination plays a role in the promotion of resistance genes and their mobilization from environmental microbes to other species and eventually to human pathogens. The entire life-cycle of antibiotic substances, before, during and after usage, should therefore be considered to fully evaluate their role in the promotion of resistance.
Affiliation(s)
- Erik Kristiansson
- Department of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg, Göteborg, Sweden
- Department of Mathematical Statistics, Chalmers University of Technology, Göteborg, Sweden
| | - Jerker Fick
- Department of Chemistry, Umeå University, Umeå, Sweden
| | - Anders Janzon
- Department of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg, Göteborg, Sweden
| | - Roman Grabic
- Department of Chemistry, Umeå University, Umeå, Sweden
| | - Carolin Rutgersson
- Department of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg, Göteborg, Sweden
| | - Birgitta Weijdegård
- Department of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg, Göteborg, Sweden
| | | | - D. G. Joakim Larsson
- Department of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg, Göteborg, Sweden
|
22
|
Wu X, Ren C, Joshi T, Vuong T, Xu D, Nguyen HT. SNP discovery by high-throughput sequencing in soybean. BMC Genomics 2010; 11:469. [PMID: 20701770 PMCID: PMC3091665 DOI: 10.1186/1471-2164-11-469] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.1] [Received: 03/02/2010] [Accepted: 08/11/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND With the advance of new massively parallel genotyping technologies, quantitative trait loci (QTL) fine mapping and map-based cloning become more achievable in identifying genes for important and complex traits. Development of high-density genetic markers in the QTL regions of specific mapping populations is essential for fine-mapping and map-based cloning of economically important genes. Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation existing between any diverse genotypes that are usually used for QTL mapping studies. The massively parallel sequencing technologies (Roche GS/454, Illumina GA/Solexa, and ABI/SOLiD) have been widely applied to identify genome-wide sequence variations. However, it remains unclear whether sequence data at a low sequencing depth are enough to detect the variations existing in any QTL regions of interest in a crop genome, and how to prepare sequencing samples for a complex genome such as soybean. Therefore, with the aims of identifying SNP markers in a cost-effective way for fine-mapping several QTL regions, and testing the validation rate of the putative SNPs predicted with Solexa short sequence reads at a low sequencing depth, we evaluated a pooled DNA fragment reduced representation library and SNP detection methods applied to short read sequences generated by Solexa high-throughput sequencing technology.
RESULTS A total of 39,022 putative SNPs were identified by the Illumina/Solexa sequencing system using a reduced representation DNA library of two parental lines of a mapping population. The validation rates of these putative SNPs predicted with low and high stringency were 72% and 85%, respectively. One hundred sixty-four SNP markers resulted from the validation of putative SNPs and were selectively chosen to target a known QTL, thereby increasing the marker density of the targeted region to one marker per 42 kbp.
CONCLUSIONS We have demonstrated how to quickly identify large numbers of SNPs for fine mapping of QTL regions by applying massively parallel sequencing combined with genome complexity reduction techniques. This SNP discovery approach is more efficient for targeting multiple QTL regions in the same genetic population and can be applied to other crops.
Affiliation(s)
- Xiaolei Wu
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO 65211, USA
| | - Chengwei Ren
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO 65211, USA
- Beta Seed Inc., Tangent, OR 97389, USA
| | - Trupti Joshi
- Digital Biology Laboratory, Computer Science Department and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
| | - Tri Vuong
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO 65211, USA
| | - Dong Xu
- Digital Biology Laboratory, Computer Science Department and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
| | - Henry T Nguyen
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO 65211, USA
|