Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Hooper SD, Dalevi D, Pati A, Mavromatis K, Ivanova NN, Kyrpides NC. Estimating DNA coverage and abundance in metagenomes using a gamma approximation. Bioinformatics 2009;26:295-301. [PMID: 20008478 PMCID: PMC2815663 DOI: 10.1093/bioinformatics/btp687] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]

Cited by Other Article(s)

Matsumoto Y, Nakamura S. Rapid and Comprehensive Identification of Nontuberculous Mycobacteria. Methods Mol Biol 2023;2632:247-255. [PMID: 36781733 DOI: 10.1007/978-1-0716-2996-3_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]

RNA Viruses Linked to Eukaryotic Hosts in Thawed Permafrost. mSystems 2022;7:e0058222. [PMID: 36453933 PMCID: PMC9765123 DOI: 10.1128/msystems.00582-22] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/03/2022]

Abstract

Arctic permafrost is thawing due to global warming, with unknown consequences for the microbial inhabitants or associated viruses. DNA viruses have previously been shown to be abundant and active in thawing permafrost, but little is known about RNA viruses in these systems. To address this knowledge gap, we assessed the composition of RNA viruses in thawed permafrost samples that were incubated for 97 days at 4°C to simulate thaw conditions. A diverse RNA viral community was assembled from metatranscriptome data including double-stranded RNA viruses, dominated by Reoviridae and Hypoviridae, and negative and positive single-stranded RNA viruses, with relatively high representations of Rhabdoviridae and Leviviridae, respectively. Sequences corresponding to potential plant and human pathogens were also detected. The detected RNA viruses primarily targeted dominant eukaryotic taxa in the samples (e.g., fungi, Metazoa and Viridiplantae) and the viral community structures were significantly associated with predicted host populations. These results indicate that RNA viruses are linked to eukaryotic host dynamics. Several of the RNA viral sequences contained auxiliary metabolic genes encoding proteins involved in carbon utilization (e.g., polygalacturonase), implying their potential roles in carbon cycling in thawed permafrost.

IMPORTANCE

Permafrost is thawing at a rapid pace in the Arctic with largely unknown consequences for ecological processes that are fundamental to Arctic ecosystems. This is the first study to determine the composition of RNA viruses in thawed permafrost. Other recent studies have characterized DNA viruses in thawing permafrost, but the majority of DNA viruses are bacteriophages that target bacterial hosts. By contrast, RNA viruses primarily target eukaryotic hosts and thus represent potential pathogenic threats to humans, animals, and plants. Here, we find that RNA viruses in permafrost are novel and distinct from those in other habitats studied to date. The COVID-19 pandemic has heightened awareness of the importance of potential environmental reservoirs of emerging RNA viral pathogens. We demonstrate that some potential pathogens were detected after an experimental thawing regime. These results are important for understanding critical viral-host interactions and provide a better understanding of the ecological roles that RNA viruses play as permafrost thaws.

Ma ZS. Estimating the Optimum Coverage and Quality of Amplicon Sequencing With Taylor's Power Law Extensions. Front Bioeng Biotechnol 2020;8:372. [PMID: 32500062 PMCID: PMC7242763 DOI: 10.3389/fbioe.2020.00372] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/20/2020] [Accepted: 04/03/2020] [Indexed: 11/13/2022]

Abstract

The DNA sequencing coverage problem has been analyzed theoretically with complex mathematical models such as Lander-Waterman expectation theory and Stevens' theorem for randomly covering a domain. In metagenomic sequencing, several approaches have been developed to estimate the coverage of whole-genome shotgun sequencing, but surprisingly few studies have addressed the coverage problem for marker-gene amplicon sequencing, for which arguably the biggest challenge is the complexity or heterogeneity of microbial communities. Overall, much of the practice still relies variously on speculation, semi-empirical and ad hoc heuristic models. Conservatively raising coverage may ensure the success of a sequencing project, but often at undue cost. In this study, we borrow the principles and approaches of optimum sampling methodology, which originated in applied entomology, achieved equal success in plant pathology and parasitology, and has played a critical role in decision-making for global crop and forest protection against economic pests since the 1970s, when the pesticide crisis and food safety concerns forced reductions in pesticide usage, which in turn required reliable sampling techniques for monitoring pest populations. We realized that sequencing coverage is essentially an optimum sampling problem. Perhaps the only essential difference between sampling insects and sampling microbiomes is the "instrument" used. In traditional entomology, it is usually humans that visually count the numbers of insects, occasionally aided by a binocular microscope. In metagenomics research, it is the DNA sequencers that count the number of DNA reads. Furthermore, a key theoretical foundation for sampling insect pest populations, i.e., Taylor's power law, which has achieved the rare status of an ecological law and captures population aggregation, has recently been extended to the community level for describing community heterogeneity and stability, namely, Taylor's power law extensions (TPLEs). This theoretical advance enabled us to develop a novel approach to assessing the quality and determining the optimum reads (coverage) of amplicon sequencing operations. Specifically, two applications were developed: one is, in hindsight, to assess the quality of an amplicon sequencing operation in terms of the precision and confidence levels. The other is, prior to a sequencing operation, to determine the minimum sequencing effort for a sequencing project to achieve preset precision and confidence levels.

Deng C, Daley T, De Sena Brandine G, Smith AD. Molecular Heterogeneity in Large-Scale Biological Data: Techniques and Applications. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021339] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]

Hua K, Zhang X. Estimating the total genome length of a metagenomic sample using k-mers. BMC Genomics 2019;20:183. [PMID: 30967110 PMCID: PMC6456951 DOI: 10.1186/s12864-019-5467-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]

Almeida OGG, De Martinis ECP. Bioinformatics tools to assess metagenomic data for applied microbiology. Appl Microbiol Biotechnol 2018;103:69-82. [DOI: 10.1007/s00253-018-9464-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/14/2018] [Revised: 10/15/2018] [Accepted: 10/16/2018] [Indexed: 12/14/2022]

Joint Analysis of Long and Short Reads Enables Accurate Estimates of Microbiome Complexity. Cell Syst 2018;7:192-200.e3. [PMID: 30056005 DOI: 10.1016/j.cels.2018.06.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/01/2018] [Revised: 05/05/2018] [Accepted: 06/15/2018] [Indexed: 01/09/2023]

Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity. mSystems 2018;3:mSystems00039-18. [PMID: 29657970 PMCID: PMC5893860 DOI: 10.1128/msystems.00039-18] [Citation(s) in RCA: 127] [Impact Index Per Article: 21.2] [Received: 03/22/2018] [Accepted: 03/23/2018] [Indexed: 01/15/2023]

Abstract

Estimations of microbial community diversity based on metagenomic data sets are affected, often to an unknown degree, by biases derived from insufficient coverage and reference database-dependent estimations of diversity. For instance, the completeness of reference databases cannot be generally estimated since it depends on the extant diversity sampled to date, which, with the exception of a few habitats such as the human gut, remains severely undersampled. Further, estimation of the degree of coverage of a microbial community by a metagenomic data set is prohibitively time-consuming for large data sets, and coverage values may not be directly comparable between data sets obtained with different sequencing technologies. Here, we extend Nonpareil, a database-independent tool for the estimation of coverage in metagenomic data sets, to a high-performance computing implementation that scales up to hundreds of cores and includes, in addition, a k-mer-based estimation as sensitive as the original alignment-based version but about three hundred times as fast. Further, we propose a metric of sequence diversity (N_d) derived directly from Nonpareil curves that correlates well with alpha diversity assessed by traditional metrics. We use this metric in different experiments demonstrating the correlation with the Shannon index estimated on 16S rRNA gene profiles and show that N_d additionally reveals seasonal patterns in marine samples that are not captured by the Shannon index and more precise rankings of the magnitude of diversity of microbial communities in different habitats. Therefore, the new version of Nonpareil, called Nonpareil 3, advances the toolbox for metagenomic analyses of microbiomes.

IMPORTANCE

Estimation of the coverage provided by a metagenomic data set, i.e., what fraction of the microbial community was sampled by DNA sequencing, represents an essential first step of every culture-independent genomic study that aims to robustly assess the sequence diversity present in a sample. However, estimation of coverage remains elusive because of several technical limitations associated with high computational requirements and limiting statistical approaches to quantify diversity. Here we describe Nonpareil 3, a new bioinformatics algorithm that circumvents several of these limitations and thus can facilitate culture-independent studies in clinical or environmental settings, independent of the sequencing platform employed. In addition, we present a new metric of sequence diversity based on rarefied coverage and demonstrate its use in communities from diverse ecosystems.

Wang BX, Wu F. Inference on the Gamma Distribution. Technometrics 2017. [DOI: 10.1080/00401706.2017.1328377] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 10/19/2022]

García-Ortega LF, Martínez O. How Many Genes Are Expressed in a Transcriptome? Estimation and Results for RNA-Seq. PLoS One 2015;10:e0130262. [PMID: 26107654 PMCID: PMC4479379 DOI: 10.1371/journal.pone.0130262] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 01/30/2015] [Accepted: 05/19/2015] [Indexed: 01/02/2023]

A variance ratio test of fit for Gamma distributions. Stat Probab Lett 2015. [DOI: 10.1016/j.spl.2014.10.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/22/2022]

Estimating coverage in metagenomic data sets and why it matters. ISME J 2014;8:2349-51. [PMID: 24824669 DOI: 10.1038/ismej.2014.76] [Citation(s) in RCA: 113] [Impact Index Per Article: 11.3] [Indexed: 01/28/2023]

Rodriguez-R LM, Konstantinidis KT. Nonpareil: a redundancy-based approach to assess the level of coverage in metagenomic datasets. Bioinformatics 2013;30:629-35. [PMID: 24123672 DOI: 10.1093/bioinformatics/btt584] [Citation(s) in RCA: 152] [Impact Index Per Article: 13.8] [Indexed: 11/12/2022]

Lindner MS, Kollock M, Zickmann F, Renard BY. Analyzing genome coverage profiles with applications to quality control in metagenomics. Bioinformatics 2013;29:1260-7. [DOI: 10.1093/bioinformatics/btt147] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]

Luo C, Rodriguez-R LM, Konstantinidis KT. A user's guide to quantitative and comparative analysis of metagenomic datasets. Methods Enzymol 2013;531:525-47. [PMID: 24060135 DOI: 10.1016/b978-0-12-407863-5.00023-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]

Madupu R, Szpakowski S, Nelson KE. Microbiome in human health and disease. Sci Prog 2013;96:153-70. [PMID: 23901633 PMCID: PMC10365526 DOI: 10.3184/003685013x13683759820813] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 12/31/2022]

Boulund F, Johnning A, Pereira MB, Larsson DGJ, Kristiansson E. A novel method to discover fluoroquinolone antibiotic resistance (qnr) genes in fragmented nucleotide sequences. BMC Genomics 2012;13:695. [PMID: 23231464 PMCID: PMC3543242 DOI: 10.1186/1471-2164-13-695] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 07/26/2012] [Accepted: 12/04/2012] [Indexed: 12/19/2022]

Coverage theories for metagenomic DNA sequencing based on a generalization of Stevens' theorem. J Math Biol 2012;67:1141-61. [PMID: 22965653 PMCID: PMC3795925 DOI: 10.1007/s00285-012-0586-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/06/2011] [Revised: 08/28/2012] [Indexed: 11/21/2022]

Abstract

Metagenomic project design has relied variously upon speculation, semi-empirical and ad hoc heuristic models, and elementary extensions of single-sample Lander–Waterman expectation theory, all of which are demonstrably inadequate. Here, we propose an approach based upon a generalization of Stevens’ Theorem for randomly covering a domain. We extend this result to account for the presence of multiple species, from which are derived useful probabilities for fully recovering a particular target microbe of interest and for average contig length. These show improved specificities compared to older measures and recommend deeper data generation than the levels chosen by some early studies, supporting the view that poor assemblies were due at least somewhat to insufficient data. We assess predictions empirically by generating roughly 4.5 Gb of sequence from a twelve member bacterial community, comparing coverage for two particular members, Selenomonas artemidis and Enterococcus faecium, which are the least (~3%) and most (~12%) abundant species, respectively. Agreement is reasonable, with differences likely attributable to coverage biases. We show that, in some cases, bias is simple in the sense that a small reduction in read length to simulate less efficient covering brings data and theory into essentially complete accord. Finally, we describe two applications of the theory. One plots coverage probability over the relevant parameter space, constructing essentially a “metagenomic design map” to enable straightforward analysis and design of future projects. The other gives an overview of the data requirements for various types of sequencing milestones, including a desired number of contact reads and contig length, for detection of a rare viral species.

Zhang T, Zhang XX, Ye L. Plasmid metagenome reveals high levels of antibiotic resistance genes and mobile genetic elements in activated sludge. PLoS One 2011;6:e26041. [PMID: 22016806 PMCID: PMC3189950 DOI: 10.1371/journal.pone.0026041] [Citation(s) in RCA: 198] [Impact Index Per Article: 15.2] [Received: 06/27/2011] [Accepted: 09/16/2011] [Indexed: 11/19/2022]

Lombard N, Prestat E, van Elsas JD, Simonet P. Soil-specific limitations for access and analysis of soil microbial communities by metagenomics. FEMS Microbiol Ecol 2011;78:31-49. [PMID: 21631545 DOI: 10.1111/j.1574-6941.2011.01140.x] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.6] [Indexed: 11/26/2022]

Kristiansson E, Fick J, Janzon A, Grabic R, Rutgersson C, Weijdegård B, Söderström H, Larsson DGJ. Pyrosequencing of antibiotic-contaminated river sediments reveals high levels of resistance and gene transfer elements. PLoS One 2011;6:e17038. [PMID: 21359229 PMCID: PMC3040208 DOI: 10.1371/journal.pone.0017038] [Citation(s) in RCA: 353] [Impact Index Per Article: 27.2] [Received: 11/15/2010] [Accepted: 01/11/2011] [Indexed: 12/18/2022]

Wu X, Ren C, Joshi T, Vuong T, Xu D, Nguyen HT. SNP discovery by high-throughput sequencing in soybean. BMC Genomics 2010;11:469. [PMID: 20701770 PMCID: PMC3091665 DOI: 10.1186/1471-2164-11-469] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.1] [Received: 03/02/2010] [Accepted: 08/11/2010] [Indexed: 11/10/2022]

Abstract

BACKGROUND

With the advance of new massively parallel genotyping technologies, quantitative trait loci (QTL) fine mapping and map-based cloning become more achievable in identifying genes for important and complex traits. Development of high-density genetic markers in the QTL regions of specific mapping populations is essential for fine-mapping and map-based cloning of economically important genes. Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation existing between any diverse genotypes that are usually used for QTL mapping studies. The massively parallel sequencing technologies (Roche GS/454, Illumina GA/Solexa, and ABI/SOLiD) have been widely applied to identify genome-wide sequence variations. However, it still remains unclear whether sequence data at a low sequencing depth are enough to detect the variations existing in any QTL regions of interest in a crop genome, and how to prepare sequencing samples for a complex genome such as soybean. Therefore, with the aims of identifying SNP markers in a cost-effective way for fine-mapping several QTL regions, and testing the validation rate of the putative SNPs predicted with Solexa short sequence reads at a low sequencing depth, we evaluated a pooled DNA fragment reduced representation library and SNP detection methods applied to short read sequences generated by Solexa high-throughput sequencing technology.

RESULTS

A total of 39,022 putative SNPs were identified by the Illumina/Solexa sequencing system using a reduced representation DNA library of two parental lines of a mapping population. The validation rates of these putative SNPs predicted with low and high stringency were 72% and 85%, respectively. One hundred sixty-four SNP markers resulted from the validation of putative SNPs and have been selectively chosen to target a known QTL, thereby increasing the marker density of the targeted region to one marker per 42 kbp.

CONCLUSIONS

We have demonstrated how to quickly identify large numbers of SNPs for fine mapping of QTL regions by applying massively parallel sequencing combined with genome complexity reduction techniques. This SNP discovery approach is more efficient for targeting multiple QTL regions in the same genetic population, and it can be applied to other crops.
