Reference Citation Analysis

For: Vinje H, Liland KH, Almøy T, Snipen L. Comparing K-mer based methods for improved classification of 16S sequences. BMC Bioinformatics 2015;16:205. [PMID: 26130333 DOI: 10.1186/s12859-015-0647-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 02/13/2015] [Accepted: 06/06/2015] [Indexed: 11/10/2022] Open

Cited by Other Article(s)

Tian Q, Zhang P, Zhai Y, Wang Y, Zou Q. Application and Comparison of Machine Learning and Database-Based Methods in Taxonomic Classification of High-Throughput Sequencing Data. Genome Biol Evol 2024;16:evae102. [PMID: 38748485 PMCID: PMC11135637 DOI: 10.1093/gbe/evae102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/12/2024] [Indexed: 05/30/2024] Open

Abstract

The advent of high-throughput sequencing technologies has not only revolutionized the field of bioinformatics but has also heightened the demand for efficient taxonomic classification. Despite technological advancements, efficiently processing and analyzing the deluge of sequencing data for precise taxonomic classification remains a formidable challenge. Existing classification approaches primarily fall into two categories, database-based methods and machine learning methods, each presenting its own set of challenges and advantages. On this basis, the aim of our study was to conduct a comparative analysis between these two methods while also investigating the merits of integrating multiple database-based methods. Through an in-depth comparative study, we evaluated the performance of both methodological categories in taxonomic classification by utilizing simulated data sets. Our analysis revealed that database-based methods excel in classification accuracy when backed by a rich and comprehensive reference database. Conversely, while machine learning methods show superior performance in scenarios where reference sequences are sparse or lacking, they generally show inferior performance compared with database methods under most conditions. Moreover, our study confirms that integrating multiple database-based methods does, in fact, enhance classification accuracy. These findings shed new light on the taxonomic classification of high-throughput sequencing data and bear substantial implications for the future development of computational biology. For those interested in further exploring our methods, the source code of this study is publicly available on https://github.com/LoadStar822/Genome-Classifier-Performance-Evaluator. Additionally, a dedicated webpage showcasing our collected database, data sets, and various classification software can be found at http://lab.malab.cn/~tqz/project/taxonomic/.

Trecarten S, Fongang B, Liss M. Current Trends and Challenges of Microbiome Research in Prostate Cancer. Curr Oncol Rep 2024;26:477-487. [PMID: 38573440 DOI: 10.1007/s11912-024-01520-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/18/2024] [Indexed: 04/05/2024]

Liao C, Wang L, Quon G. Microbiome-based classification models for fresh produce safety and quality evaluation. Microbiol Spectr 2024;12:e0344823. [PMID: 38445872 PMCID: PMC10986475 DOI: 10.1128/spectrum.03448-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 02/17/2024] [Indexed: 03/07/2024] Open

Abstract

Small sample sizes and loss of sequencing reads during the microbiome data preprocessing can limit the statistical power of differentiating fresh produce phenotypes and prevent the detection of important bacterial species associated with produce contamination or quality reduction. Here, we explored a machine learning-based k-mer hash analysis strategy to identify DNA signatures predictive of produce safety (PS) and produce quality (PQ) and compared it against the amplicon sequence variant (ASV) strategy that uses a typical denoising step and ASV-based taxonomy strategy. Random forest-based classifiers for PS and PQ using 7-mer hash data sets had significantly higher classification accuracy than those using the ASV data sets. We also demonstrated that the proposed combination of integrating multiple data sets and leveraging a 7-mer hash strategy leads to better classification performance for PS and PQ compared to the ASV method but presents lower PS classification accuracy compared to the feature-selected ASV-based taxonomy strategy. Due to the current limitation of generating taxonomy using the 7-mer hash strategy, the ASV-based taxonomy strategy with remarkably less computing time and memory usage is more efficient for PS and PQ classification and applicable for important taxa identification. Results generated from this study lay the foundation for future studies that wish and need to incorporate and/or compare different microbiome sequencing data sets for the application of machine learning in the area of microbial safety and quality of food.

IMPORTANCE

Identification of generalizable indicators for produce safety (PS) and produce quality (PQ) improves the detection of produce contamination and quality decline. However, effective sequencing read loss during microbiome data preprocessing and the limited sample size of individual studies restrain statistical power to identify important features contributing to differentiating PS and PQ phenotypes. We applied machine learning-based models using individual and integrated k-mer hash and amplicon sequence variant (ASV) data sets for PS and PQ classification and evaluated their classification performance and found that random forest (RF)-based models using integrated 7-mer hash data sets achieved significantly higher PS and PQ classification accuracy. Due to the limitation of taxonomic analysis for the 7-mer hash, we also developed RF-based models using feature-selected ASV-based taxonomic data sets, which performed better PS classification than those using the integrated 7-mer hash data set. The RF feature selection method identified 480 PS indicators and 263 PQ indicators with a positive contribution to the PS and PQ classification.

Li R, Ernst J. Identifying associations of de novo noncoding variants with autism through integration of gene expression, sequence and sex information. bioRxiv 2024:2024.03.20.585624. [PMID: 38562739 PMCID: PMC10983996 DOI: 10.1101/2024.03.20.585624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]

Ecological Observations Based on Functional Gene Sequencing Are Sensitive to the Amplicon Processing Method. mSphere 2022;7:e0032422. [PMID: 35938727 PMCID: PMC9429940 DOI: 10.1128/msphere.00324-22] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023] Open

Abstract

Until recently, the de facto method for short-read-based amplicon reconstruction was a sequence similarity threshold approach (operational taxonomic units [OTUs]). This has changed with the amplicon sequence variant (ASV) method where distributions are fitted to abundance profiles of individual genes using a noise-error model. While OTU-based approaches are still useful for 16S rRNA/18S rRNA genes, where thresholds of 97% to 99% are used, their use for functional genes is still debatable as there is no consensus on clustering thresholds. Here, we compare OTU- and ASV-based reconstruction approaches and taxonomy assignment methods, the naive Bayesian classifier (NBC) and Bayesian lowest common ancestor (BLCA) algorithm, using a functional gene data set from the microbial nitrogen-cycling community in the Brouage mudflat (France). A range of OTU similarity thresholds and ASVs were used to compare amoA (ammonia-oxidizing archaea [AOA] and ammonia-oxidizing bacteria [AOB]), nxrB, nirS, nirK, and nrfA communities between differing sedimentary structures. Significant effects of the sedimentary structure on weighted UniFrac (WUniFrac) distances were observed for AOA amoA when using ASVs, an OTU at a threshold of 97% sequence identity (OTU-97%), and OTU-85%; AOB amoA when using OTU-85%; and nirS when using ASV, OTU-90%, and OTU-85%. For AOB amoA, significant effects of the sedimentary structures on UniFrac distances were observed when using OTU-97% but not ASVs, and the inverse was found for nrfA. Interestingly, conclusions drawn for nirK and nxrB were consistent between amplicon reconstruction methods. We also show that when the sequences in the reference database are related to the environment in question, the BLCA algorithm leads to more phylogenetically relevant classifications. However, when the reference database contains sequences more dissimilar to the ones retrieved, the NBC obtains more information. 

IMPORTANCE

Several analysis pipelines are available to microbial ecologists to process amplicon sequencing data, yet to date, there is no consensus as to the most appropriate method, and it becomes more difficult for genes that encode a specific function (functional genes). Standardized approaches need to be adopted to increase the reliability and reproducibility of environmental amplicon-sequencing-based data sets. In this paper, we argue that the recently developed ASV approach offers a better opportunity to achieve such standardization than OTUs for functional genes. We also propose a comprehensive framework for quality filtering of the sequencing reads based on protein sequence verification.

Wang H, Wang S, Zhang Y, Bi S, Zhu X. A brief review of machine learning methods for RNA methylation sites prediction. Methods 2022;203:399-421. [DOI: 10.1016/j.ymeth.2022.03.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/31/2021] [Revised: 02/15/2022] [Accepted: 03/01/2022] [Indexed: 02/07/2023] Open

Almeida H, Palys S, Tsang A, Diallo AB. TOUCAN: a framework for fungal biosynthetic gene cluster discovery. NAR Genom Bioinform 2020;2:lqaa098. [PMID: 33575642 PMCID: PMC7694738 DOI: 10.1093/nargab/lqaa098] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/24/2020] [Revised: 09/28/2020] [Accepted: 11/05/2020] [Indexed: 12/23/2022] Open

Chen X, Xiong Y, Liu Y, Chen Y, Bi S, Zhu X. m5CPred-SVM: a novel method for predicting m5C sites of RNA. BMC Bioinformatics 2020;21:489. [PMID: 33126851 PMCID: PMC7602301 DOI: 10.1186/s12859-020-03828-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 06/30/2020] [Accepted: 10/21/2020] [Indexed: 02/08/2023] Open

Abstract

BACKGROUND

As one of the most common post-transcriptional modifications (PTCM) in RNA, 5-cytosine-methylation plays important roles in many biological functions such as RNA metabolism and cell fate decision. Through accurate identification of 5-methylcytosine (m5C) sites on RNA, researchers can better understand the exact role of 5-cytosine-methylation in these biological functions. In recent years, computational methods of predicting m5C sites have attracted considerable interest because of their efficiency and low cost. However, neither the accuracy nor the efficiency of these methods is yet satisfactory, and both need further improvement.

RESULTS

In this work, we have developed a new computational method, m5CPred-SVM, to identify m5C sites in three species, H. sapiens, M. musculus and A. thaliana. To build this model, we first collected benchmark datasets following three recently published methods. Then, six types of sequence-based features were generated based on RNA segments and the sequential forward feature selection strategy was used to obtain the optimal feature subset. After that, the performance of models based on different learning algorithms was compared, and the model based on the support vector machine provided the highest prediction accuracy. Finally, our proposed method, m5CPred-SVM, was compared with several existing methods, and the result showed that m5CPred-SVM offered substantially higher prediction accuracy than previously published methods. It is expected that our method, m5CPred-SVM, can become a useful tool for accurate identification of m5C sites.

CONCLUSION

In this study, by introducing position-specific propensity related features, we built a new model, m5CPred-SVM, to predict RNA m5C sites of three different species. The result shows that our model outperformed the existing state-of-the-art models. Our model is available for users through a web server at https://zhulab.ahu.edu.cn/m5CPred-SVM.

F. Escapa I, Huang Y, Chen T, Lin M, Kokaras A, Dewhirst FE, Lemon KP. Construction of habitat-specific training sets to achieve species-level assignment in 16S rRNA gene datasets. Microbiome 2020;8:65. [PMID: 32414415 PMCID: PMC7291764 DOI: 10.1186/s40168-020-00841-w] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 02/18/2020] [Accepted: 04/15/2020] [Indexed: 05/10/2023]

Abstract

BACKGROUND

The low cost of 16S rRNA gene sequencing facilitates population-scale molecular epidemiological studies. Existing computational algorithms can resolve 16S rRNA gene sequences into high-resolution amplicon sequence variants (ASVs), which represent consistent labels comparable across studies. Assigning these ASVs to species-level taxonomy strengthens the ecological and/or clinical relevance of 16S rRNA gene-based microbiota studies and further facilitates data comparison across studies.

RESULTS

To achieve this, we developed a broadly applicable method for constructing high-resolution training sets based on the phylogenic relationships among microbes found in a habitat of interest. When used with the naïve Bayesian Ribosomal Database Project (RDP) Classifier, this training set achieved species/supraspecies-level taxonomic assignment of 16S rRNA gene-derived ASVs. The key steps for generating such a training set are (1) constructing an accurate and comprehensive phylogenetic-based, habitat-specific database; (2) compiling multiple 16S rRNA gene sequences to represent the natural sequence variability of each taxon in the database; (3) trimming the training set to match the sequenced regions, if necessary; and (4) placing species sharing closely related sequences into a training-set-specific supraspecies taxonomic level to preserve subgenus-level resolution. As proof of principle, we developed a V1-V3 region training set for the bacterial microbiota of the human aerodigestive tract using the full-length 16S rRNA gene reference sequences compiled in our expanded Human Oral Microbiome Database (eHOMD). We also overcame technical limitations to successfully use Illumina sequences for the 16S rRNA gene V1-V3 region, the most informative segment for classifying bacteria native to the human aerodigestive tract. Finally, we generated a full-length eHOMD 16S rRNA gene training set, which we used in conjunction with an independent PacBio single molecule, real-time (SMRT)-sequenced sinonasal dataset to validate the representation of species in our training set. This also established the effectiveness of a full-length training set for assigning taxonomy of long-read 16S rRNA gene datasets.

CONCLUSION

Here, we present a systematic approach for constructing a phylogeny-based, high-resolution, habitat-specific training set that permits species/supraspecies-level taxonomic assignment to short- and long-read 16S rRNA gene-derived ASVs. This advancement enhances the ecological and/or clinical relevance of 16S rRNA gene-based microbiota studies.

Hur M, Park SJ. Identification of Microbial Profiles in Heavy-Metal-Contaminated Soil from Full-Length 16S rRNA Reads Sequenced by a PacBio System. Microorganisms 2019;7:357. [PMID: 31527468 PMCID: PMC6780547 DOI: 10.3390/microorganisms7090357] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 08/05/2019] [Revised: 09/10/2019] [Accepted: 09/13/2019] [Indexed: 11/16/2022] Open

Meola M, Rifa E, Shani N, Delbès C, Berthoud H, Chassard C. DAIRYdb: a manually curated reference database for improved taxonomy annotation of 16S rRNA gene sequences from dairy products. BMC Genomics 2019;20:560. [PMID: 31286860 PMCID: PMC6615214 DOI: 10.1186/s12864-019-5914-8] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 08/07/2018] [Accepted: 06/18/2019] [Indexed: 12/14/2022] Open

Taxonomy based performance metrics for evaluating taxonomic assignment methods. BMC Bioinformatics 2019;20:310. [PMID: 31185897 PMCID: PMC6561758 DOI: 10.1186/s12859-019-2896-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/08/2018] [Accepted: 05/13/2019] [Indexed: 02/01/2023] Open

Zhao X, Zhang Y, Ning Q, Zhang H, Ji J, Yin M. Identifying N6-methyladenosine sites using extreme gradient boosting system optimized by particle swarm optimizer. J Theor Biol 2019;467:39-47. [DOI: 10.1016/j.jtbi.2019.01.035] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/13/2018] [Revised: 01/04/2019] [Accepted: 01/30/2019] [Indexed: 01/15/2023]

He J, Fang T, Zhang Z, Huang B, Zhu X, Xiong Y. PseUI: Pseudouridine sites identification based on RNA sequence information. BMC Bioinformatics 2018;19:306. [PMID: 30157750 PMCID: PMC6114832 DOI: 10.1186/s12859-018-2321-0] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 04/19/2018] [Accepted: 08/21/2018] [Indexed: 01/28/2023] Open

Abstract

Background

Pseudouridylation is the most prevalent type of posttranscriptional modification in various stable RNAs of all organisms, which significantly affects many cellular processes that are regulated by RNA. Thus, accurate identification of pseudouridine (Ψ) sites in RNA will be of great benefit for understanding these cellular processes. Due to the low efficiency and high cost of currently available experimental methods, it is highly desirable to develop computational methods for accurately and efficiently detecting Ψ sites in RNA sequences. However, the predictive accuracy of existing computational methods is not satisfactory and still needs improvement.

Results

In this study, we developed a new model, PseUI, for Ψ sites identification in three species, which are H. sapiens, S. cerevisiae, and M. musculus. Firstly, five different kinds of features including nucleotide composition (NC), dinucleotide composition (DC), pseudo dinucleotide composition (pseDNC), position-specific nucleotide propensity (PSNP), and position-specific dinucleotide propensity (PSDP) were generated based on RNA segments. Then, a sequential forward feature selection strategy was used to gain an effective feature subset with a compact representation but discriminative prediction power. Based on the selected feature subsets, we built our model by using a support vector machine (SVM). Finally, the generalization of our model was validated by both the jackknife test and independent validation tests on the benchmark datasets. The experimental results showed that our model is more accurate and stable than the previously published models. We have also provided a user-friendly web server for our model at http://zhulab.ahu.edu.cn/PseUI, and a brief instruction for the web server is provided in this paper. By using this instruction, the academic users can conveniently get their desired results without complicated calculations.

Conclusion

In this study, we proposed a new predictor, PseUI, to detect Ψ sites in RNA sequences. It is shown that our model outperformed the existing state-of-the-art models. It is expected that our model, PseUI, will become a useful tool for accurate identification of RNA Ψ sites.

Murali A, Bhargava A, Wright ES. IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences. Microbiome 2018;6:140. [PMID: 30092815 PMCID: PMC6085705 DOI: 10.1186/s40168-018-0521-5] [Citation(s) in RCA: 249] [Impact Index Per Article: 41.5] [Received: 03/21/2018] [Accepted: 07/25/2018] [Indexed: 05/11/2023]

Abstract

BACKGROUND

Microbiome studies often involve sequencing a marker gene to identify the microorganisms in samples of interest. Sequence classification is a critical component of this process, whereby sequences are assigned to a reference taxonomy containing known sequence representatives of many microbial groups. Previous studies have shown that existing classification programs often assign sequences to reference groups even if they belong to novel taxonomic groups that are absent from the reference taxonomy. This high rate of "over classification" is particularly detrimental in microbiome studies because reference taxonomies are far from comprehensive.

RESULTS

Here, we introduce IDTAXA, a novel approach to taxonomic classification that employs principles from machine learning to reduce over classification errors. Using multiple reference taxonomies, we demonstrate that IDTAXA has higher accuracy than popular classifiers such as BLAST, MAPSeq, QIIME, SINTAX, SPINGO, and the RDP Classifier. Similarly, IDTAXA yields far fewer over classifications on Illumina mock microbial community data when the expected taxa are absent from the training set. Furthermore, IDTAXA offers many practical advantages over other classifiers, such as maintaining low error rates across varying input sequence lengths and withholding classifications from input sequences composed of random nucleotides or repeats.

CONCLUSIONS

IDTAXA's classifications may lead to different conclusions in microbiome studies because of the substantially reduced number of taxa that are incorrectly identified through over classification. Although misclassification error is relatively minor, we believe that many remaining misclassifications are likely caused by errors in the reference taxonomy. We describe how IDTAXA is able to identify many putative mislabeling errors in reference taxonomies, enabling training sets to be automatically corrected by eliminating spurious sequences. IDTAXA is part of the DECIPHER package for the R programming language, available through the Bioconductor repository or accessible online (http://DECIPHER.codes).

McGovern E, Waters SM, Blackshields G, McCabe MS. Evaluating Established Methods for Rumen 16S rRNA Amplicon Sequencing With Mock Microbial Populations. Front Microbiol 2018;9:1365. [PMID: 29988486 PMCID: PMC6026621 DOI: 10.3389/fmicb.2018.01365] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 02/12/2018] [Accepted: 06/05/2018] [Indexed: 11/22/2022] Open

Abstract

The rumen microbiome scientific community has utilized amplicon sequencing as an aid in identifying potential community compositional trends that could be used as an estimation of various production and performance traits including methane emission, animal protein production efficiency, and ruminant health status. In order to translate rumen microbiome studies into executable application, there is a need for experimental and analytical concordance within the community. The objective of this study was to assess these factors in relation to selected currently established methods for 16S phylogenetic community analysis on a microbial community standard (MC) and a DNA standard (DS; ZymoBIOMICS™). DNA was extracted from MC using the RBBC method commonly used for microbial DNA extraction from rumen digesta samples. 16S rRNA amplicon libraries were generated for the MC and DS using primers routinely used for rumen bacterial and archaeal community analysis. The primers targeted the V4 and V3–V4 region of the 16S rRNA gene and samples were subjected to both 20 and 28 polymerase chain reaction (PCR) cycles under identical cycle conditions. Sequencing was conducted using the Illumina MiSeq platform. As the bacteria contained in the microbial mock community were well-classified species, and for ease of explanation, we used the results of the Basic Local Alignment Search Tool classification to assess the DNA, PCR cycle number, and primer type. Sequence classification methodology was assessed independently. Spearman’s correlation analysis indicated that utilizing the repeated bead beating and column method for DNA extraction in combination with primers targeting the 16S rRNA gene using 20 first-round PCR cycles was sufficient for amplicon sequencing to generate a relatively accurate depiction of the bacterial communities present in rumen samples. These results also emphasize the requirement to develop and utilize positive mock community controls for all rumen microbiomic studies in order to discern errors which may arise at any step during a next-generation sequencing protocol.

McAllister T, Dunière L, Drouin P, Xu S, Wang Y, Munns K, Zaheer R. Silage review: Using molecular approaches to define the microbial ecology of silage. J Dairy Sci 2018;101:4060-4074. [DOI: 10.3168/jds.2017-13704] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 08/18/2017] [Accepted: 10/21/2017] [Indexed: 12/11/2022]

Zhang J, Guo J, Zhang M, Yu X, Yu X, Guo W, Zeng T, Chen L. Efficient Mining Multi-mers in a Variety of Biological Sequences. IEEE/ACM Trans Comput Biol Bioinform 2018;17:949-958. [PMID: 29993642 DOI: 10.1109/tcbb.2018.2828313] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/08/2023]

Liland KH, Vinje H, Snipen L. microclass: an R-package for 16S taxonomy classification. BMC Bioinformatics 2017;18:172. [PMID: 28302051 PMCID: PMC5353803 DOI: 10.1186/s12859-017-1583-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 08/21/2016] [Accepted: 03/03/2017] [Indexed: 11/10/2022] Open

Li GQ, Liu Z, Shen HB, Yu DJ. TargetM6A: Identifying N6-Methyladenosine Sites From RNA Sequences via Position-Specific Nucleotide Propensities and a Support Vector Machine. IEEE Trans Nanobioscience 2016;15:674-682. [DOI: 10.1109/tnb.2016.2599115] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Indexed: 11/05/2022]

de la Cuesta-Zuluaga J, Escobar JS. Considerations For Optimizing Microbiome Analysis Using a Marker Gene. Front Nutr 2016;3:26. [PMID: 27551678 PMCID: PMC4976105 DOI: 10.3389/fnut.2016.00026] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 05/25/2016] [Accepted: 07/26/2016] [Indexed: 12/22/2022] Open

Myer PR, Kim M, Freetly HC, Smith TPL. Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers. J Microbiol Methods 2016;127:132-140. [PMID: 27282101 DOI: 10.1016/j.mimet.2016.06.004] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 03/28/2016] [Revised: 06/03/2016] [Accepted: 06/03/2016] [Indexed: 11/16/2022]

Abstract

Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplification primer selection, and read length, which can affect the apparent microbial community. In this study, we compared short read 16S rRNA variable regions, V1-V3, with that of near-full length 16S regions, V1-V8, using highly diverse steer rumen microbial communities, in order to examine the impact of technology selection on phylogenetic profiles. Short paired-end reads from the Illumina MiSeq platform were used to generate V1-V3 sequence, while long "circular consensus" reads from the Pacific Biosciences RSII instrument were used to generate V1-V8 data. The two platforms revealed similar microbial operational taxonomic units (OTUs), as well as similar species richness, Good's coverage, and Shannon diversity metrics. However, the V1-V8 amplified ruminal community resulted in significant increases in several orders of taxa, such as phyla Proteobacteria and Verrucomicrobia (P < 0.05). Taxonomic classification accuracy was also greater in the near full-length read. UniFrac distance matrices using jackknifed UPGMA clustering also noted differences between the communities. These data support the consensus that longer reads result in a finer phylogenetic resolution that may not be achieved by shorter 16S rRNA gene fragments. Our work on the cattle rumen bacterial community demonstrates that utilizing near full-length 16S reads may be useful in conducting a more thorough study, or for developing a niche-specific database to use in analyzing data from shorter read technologies when budgetary constraints preclude use of near-full length 16S sequencing.
