Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Piro VC, Matschkowski M, Renard BY. MetaMeta: integrating metagenome analysis tools to improve taxonomic profiling. Microbiome 2017;5:101. [PMID: 28807044 PMCID: PMC5557516 DOI: 10.1186/s40168-017-0318-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/22/2016] [Accepted: 07/25/2017] [Indexed: 05/11/2023]

Cited by Other Article(s)

Jang CS, Kim H, Kim D, Han B. MicroPredict: predicting species-level taxonomic abundance of whole-shotgun metagenomic data using only 16S amplicon sequencing data. Genes Genomics 2024;46:701-712. [PMID: 38700829 PMCID: PMC11102407 DOI: 10.1007/s13258-024-01514-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2024] [Accepted: 03/26/2024] [Indexed: 05/19/2024]

Tian Q, Zhang P, Zhai Y, Wang Y, Zou Q. Application and Comparison of Machine Learning and Database-Based Methods in Taxonomic Classification of High-Throughput Sequencing Data. Genome Biol Evol 2024;16:evae102. [PMID: 38748485 PMCID: PMC11135637 DOI: 10.1093/gbe/evae102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/12/2024] [Indexed: 05/30/2024] Open

Abstract

The advent of high-throughput sequencing technologies has not only revolutionized the field of bioinformatics but has also heightened the demand for efficient taxonomic classification. Despite technological advancements, efficiently processing and analyzing the deluge of sequencing data for precise taxonomic classification remains a formidable challenge. Existing classification approaches primarily fall into two categories, database-based methods and machine learning methods, each presenting its own set of challenges and advantages. On this basis, the aim of our study was to conduct a comparative analysis between these two methods while also investigating the merits of integrating multiple database-based methods. Through an in-depth comparative study, we evaluated the performance of both methodological categories in taxonomic classification by utilizing simulated data sets. Our analysis revealed that database-based methods excel in classification accuracy when backed by a rich and comprehensive reference database. Conversely, while machine learning methods show superior performance in scenarios where reference sequences are sparse or lacking, they generally show inferior performance compared with database methods under most conditions. Moreover, our study confirms that integrating multiple database-based methods does, in fact, enhance classification accuracy. These findings shed new light on the taxonomic classification of high-throughput sequencing data and bear substantial implications for the future development of computational biology. For those interested in further exploring our methods, the source code of this study is publicly available on https://github.com/LoadStar822/Genome-Classifier-Performance-Evaluator. Additionally, a dedicated webpage showcasing our collected database, data sets, and various classification software can be found at http://lab.malab.cn/~tqz/project/taxonomic/.

Walsh LH, Coakley M, Walsh AM, O'Toole PW, Cotter PD. Bioinformatic approaches for studying the microbiome of fermented food. Crit Rev Microbiol 2023;49:693-725. [PMID: 36287644 DOI: 10.1080/1040841x.2022.2132850] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2022] [Revised: 08/11/2022] [Accepted: 09/28/2022] [Indexed: 11/03/2022]

Budiš J, Krampl W, Kucharík M, Hekel R, Goga A, Sitarčík J, Lichvár M, Smol’ak D, Böhmer M, Baláž A, Ďuriš F, Gazdarica J, Šoltys K, Turňa J, Radvánszky J, Szemes T. SnakeLines: integrated set of computational pipelines for sequencing reads. J Integr Bioinform 2023;20:jib-2022-0059. [PMID: 37602733 PMCID: PMC10757078 DOI: 10.1515/jib-2022-0059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2022] [Accepted: 03/21/2023] [Indexed: 08/22/2023] Open

Affiliation(s)

Jaroslav Budiš Geneton Ltd., 841 04 Bratislava, Slovakia Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia Comenius University Science Park, 841 04 Bratislava, Slovakia
Werner Krampl Geneton Ltd., 841 04 Bratislava, Slovakia Comenius University Science Park, 841 04 Bratislava, Slovakia Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
Marcel Kucharík Geneton Ltd., 841 04 Bratislava, Slovakia Comenius University Science Park, 841 04 Bratislava, Slovakia
Rastislav Hekel Geneton Ltd., 841 04 Bratislava, Slovakia Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
Adrián Goga Comenius University Science Park, 841 04 Bratislava, Slovakia Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University, 841 04 Bratislava, Slovakia
Jozef Sitarčík Geneton Ltd., 841 04 Bratislava, Slovakia Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia Comenius University Science Park, 841 04 Bratislava, Slovakia
Michal Lichvár Geneton Ltd., 841 04 Bratislava, Slovakia Comenius University Science Park, 841 04 Bratislava, Slovakia
Dávid Smol’ak Geneton Ltd., 841 04 Bratislava, Slovakia Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
Miroslav Böhmer Geneton Ltd., 841 04 Bratislava, Slovakia Comenius University Science Park, 841 04 Bratislava, Slovakia Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
Andrej Baláž Geneton Ltd., 841 04 Bratislava, Slovakia Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University, 841 04 Bratislava, Slovakia
František Ďuriš Geneton Ltd., 841 04 Bratislava, Slovakia Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
Juraj Gazdarica Geneton Ltd., 841 04 Bratislava, Slovakia Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
Katarína Šoltys Comenius University Science Park, 841 04 Bratislava, Slovakia Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
Ján Turňa Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia Comenius University Science Park, 841 04 Bratislava, Slovakia Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
Ján Radvánszky Geneton Ltd., 841 04 Bratislava, Slovakia Comenius University Science Park, 841 04 Bratislava, Slovakia Institute of Clinical and Translational Research, Biomedical Research Center, Slovak Academy of Sciences, 845 05 Bratislava, Slovakia
Tomáš Szemes Geneton Ltd., 841 04 Bratislava, Slovakia Comenius University Science Park, 841 04 Bratislava, Slovakia Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia

Shen K, Din AU, Sinha B, Zhou Y, Qian F, Shen B. Translational informatics for human microbiota: data resources, models and applications. Brief Bioinform 2023;24:7152256. [PMID: 37141135 DOI: 10.1093/bib/bbad168] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2022] [Revised: 04/07/2023] [Accepted: 04/11/2023] [Indexed: 05/05/2023] Open

Gihawi A, Cardenas R, Hurst R, Brewer DS. Quality Control in Metagenomics Data. Methods Mol Biol 2023;2649:21-54. [PMID: 37258856 DOI: 10.1007/978-1-0716-3072-3_2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]

Garrido-Sanz L, Àngel Senar M, Piñol J. Drastic reduction of false positive species in samples of insects by intersecting the default output of two popular metagenomic classifiers. PLoS One 2022;17:e0275790. [PMID: 36282811 PMCID: PMC9595558 DOI: 10.1371/journal.pone.0275790] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2022] [Accepted: 09/15/2022] [Indexed: 11/19/2022] Open

Abstract

The use of high-throughput sequencing to recover short DNA reads of many species has been widely applied in biodiversity studies, either as amplicon metabarcoding or shotgun metagenomics. These reads are assigned to taxa using classifiers. However, for different reasons, the results often contain many false positives. Here we focus on the reduction of false positive species attributable to the classifiers. We benchmarked two popular classifiers, BLASTn followed by MEGAN6 (BM) and Kraken2 (K2), to analyse shotgun sequenced artificial single-species samples of insects. To reduce the number of misclassified reads, we combined the output of the two classifiers in two different ways: (1) by keeping only the reads that were attributed to the same species by both classifiers (intersection approach); and (2) by keeping the reads assigned to some species by any classifier (union approach). In addition, we applied an analytical detection limit to further reduce the number of false positive species. As expected, both metagenomic classifiers used with default parameters generated an unacceptably high number of misidentified species (tens with BM, hundreds with K2). The false positive species were not necessarily phylogenetically close, as some of them belonged to different orders of insects. The union approach failed to reduce the number of false positives, but the intersection approach removed most of them. The addition of an analytical detection limit of 0.001 further reduced the number to ca. 0.5 false positive species per sample. The misidentification of species by most classifiers undermines confidence in DNA-based methods for assessing the biodiversity of biological samples. Our approach to alleviate the problem is straightforward and significantly reduced the number of reported false positive species.
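The intersection and union approaches described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the per-read dictionary input, the handling of conflicting calls in the union mode, and the reading of the detection limit as a minimum fraction of retained reads are all assumptions made for illustration.

```python
from collections import Counter


def species_counts(calls_a, calls_b, mode):
    """Count reads per species after combining two classifiers.

    calls_a, calls_b: dict mapping read id -> species name (or None if
    the classifier left the read unassigned). Hypothetical input format.
    """
    counts = Counter()
    for read_id in set(calls_a) | set(calls_b):
        a = calls_a.get(read_id)
        b = calls_b.get(read_id)
        if mode == "intersection":
            # Keep a read only if both classifiers agree on the species.
            if a is not None and a == b:
                counts[a] += 1
        elif mode == "union":
            # Keep every assignment made by either classifier.
            # Assumption: a read with conflicting calls counts for both.
            for species in {a, b} - {None}:
                counts[species] += 1
        else:
            raise ValueError(f"unknown mode: {mode}")
    return counts


def apply_detection_limit(counts, limit=0.001):
    """Drop species whose share of retained reads is below `limit`.

    Assumption: the limit is a fraction of all retained reads.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: n for s, n in counts.items() if n / total >= limit}
```

With two toy classifiers that disagree on a single read, the intersection keeps only the species both agree on, while the union also keeps every disputed species, which is consistent with the abstract's observation that the union did not reduce false positives.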

Bhattacharya C, Tierney BT, Ryon KA, Bhattacharyya M, Hastings JJA, Basu S, Bhattacharya B, Bagchi D, Mukherjee S, Wang L, Henaff EM, Mason CE. Supervised Machine Learning Enables Geospatial Microbial Provenance. Genes (Basel) 2022;13:1914. [PMID: 36292799 PMCID: PMC9601318 DOI: 10.3390/genes13101914] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2022] [Revised: 10/14/2022] [Accepted: 10/18/2022] [Indexed: 11/04/2022] Open

Affiliation(s)

Chandrima Bhattacharya Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine, New York, NY 10065, USA The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA Integrated Design and Media, Center for Urban Science and Progress, NYU Tandon School of Engineering, Brooklyn, New York, NY 11201, USA
Braden T. Tierney The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
Krista A. Ryon The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
Malay Bhattacharyya Center for Artificial Intelligence and Machine Learning, Indian Statistical Institute, Kolkata 700108, India Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India
Jaden J. A. Hastings The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
Srijani Basu Department of Medicine, Weill Cornell Medicine, New York, NY 10065, USA
Bodhisatwa Bhattacharya Department of Electrical and Electronics Engineering, Birla Institute of Technology, Mesra, Ranchi 835215, India
Debneel Bagchi Department of Metallurgy & Materials Engineering, Indian Institute of Engineering Science & Technology, Shibpur, Howrah 711103, India
Somsubhro Mukherjee Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore
Lu Wang Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore
Elizabeth M. Henaff Integrated Design and Media, Center for Urban Science and Progress, NYU Tandon School of Engineering, Brooklyn, New York, NY 11201, USA
Christopher E. Mason The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA Integrated Design and Media, Center for Urban Science and Progress, NYU Tandon School of Engineering, Brooklyn, New York, NY 11201, USA WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY 10065, USA

Bartoszewicz JM, Nasri F, Nowicka M, Renard BY. Detecting DNA of novel fungal pathogens using ResNets and a curated fungi-hosts data collection. Bioinformatics 2022;38:ii168-ii174. [PMID: 36124807 DOI: 10.1093/bioinformatics/btac495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/08/2022] [Indexed: 12/25/2022] Open

PathoLive—Real-Time Pathogen Identification from Metagenomic Illumina Datasets. Life (Basel) 2022;12:life12091345. [PMID: 36143382 PMCID: PMC9505849 DOI: 10.3390/life12091345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2022] [Revised: 08/24/2022] [Accepted: 08/24/2022] [Indexed: 11/18/2022] Open

Crowdsourced benchmarking of taxonomic metagenome profilers: lessons learned from the sbv IMPROVER Microbiomics challenge. BMC Genomics 2022;23:624. [PMID: 36042406 PMCID: PMC9429340 DOI: 10.1186/s12864-022-08803-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2022] [Accepted: 07/25/2022] [Indexed: 11/10/2022] Open

Abstract

Background

Selection of optimal computational strategies for analyzing metagenomics data is a decisive step in determining the microbial composition of a sample, and this procedure is complex because of the numerous tools currently available. The aim of this research was to summarize the results of the crowdsourced sbv IMPROVER Microbiomics Challenge, designed to evaluate the performance of off-the-shelf metagenomics software, as well as to investigate the robustness of these results through an extended post-challenge analysis. In total, 21 off-the-shelf taxonomic metagenome profiling pipelines were benchmarked for their capacity to identify the microbiome composition at various taxon levels across 104 shotgun metagenomics datasets of bacterial genomes (representative of various microbiome samples) from public databases. Performance was determined by comparing predicted taxonomy profiles with the gold standard.

Results

Most taxonomic profilers performed homogeneously well at the phylum level but generated intermediate and heterogeneous scores at the genus and species levels, respectively. kmer-based pipelines using Kraken with and without Bracken or using CLARK-S performed best overall, but they exhibited lower precision than the two marker-gene-based methods MetaPhlAn and mOTU. Filtering out the 1% least abundant species, which were not reliably predicted, helped increase the performance of most profilers by increasing precision but at the cost of recall. However, the use of adaptive filtering thresholds determined from the sample’s Shannon index increased the performance of most kmer-based profilers while mitigating the tradeoff between precision and recall.

Conclusions

kmer-based metagenomic pipelines using Kraken/Bracken or CLARK-S performed most robustly across a large variety of microbiome datasets. Removing non-reliably predicted low-abundance species by using diversity-dependent adaptive filtering thresholds further enhanced the performance of these tools. This work demonstrates the applicability of computational pipelines for accurately determining taxonomic profiles in clinical and environmental contexts and exemplifies the power of crowdsourcing for unbiased evaluation.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12864-022-08803-2.
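The abstract above mentions filtering low-abundance species with a threshold derived from the sample's Shannon index, but it does not state the rule used. The Python sketch below therefore only illustrates the idea: `shannon_index` is the standard definition, while `adaptive_threshold` (a cutoff inversely proportional to the effective number of species, exp(H)) and its `scale` parameter are hypothetical choices, not the challenge's method. Filtering by a relative-abundance cutoff is likewise an assumption.

```python
import math


def shannon_index(abundances):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero abundance."""
    total = sum(abundances.values())
    props = [v / total for v in abundances.values() if v > 0]
    return -sum(p * math.log(p) for p in props)


def adaptive_threshold(abundances, scale=0.05):
    """Hypothetical rule: lower the cutoff for more diverse samples.

    exp(H) is the effective number of species, so a sample with many
    evenly distributed taxa gets a smaller relative-abundance cutoff.
    """
    return scale / math.exp(shannon_index(abundances))


def filter_profile(abundances, threshold):
    """Drop taxa below `threshold` relative abundance and renormalise."""
    total = sum(abundances.values())
    kept = {t: v / total for t, v in abundances.items() if v / total >= threshold}
    kept_total = sum(kept.values())
    if kept_total == 0:
        return {}
    return {t: p / kept_total for t, p in kept.items()}
```

A fixed cutoff corresponds to calling `filter_profile(profile, 0.01)` directly, while the adaptive variant passes `adaptive_threshold(profile)` instead.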

Growth promotion and antibiotic induced metabolic shifts in the chicken gut microbiome. Commun Biol 2022;5:293. [PMID: 35365748 PMCID: PMC8975857 DOI: 10.1038/s42003-022-03239-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2021] [Accepted: 03/08/2022] [Indexed: 02/07/2023] Open

Ulrich JU, Lutfi A, Rutzen K, Renard BY. OUP accepted manuscript. Bioinformatics 2022;38:i153-i160. [PMID: 35758774 PMCID: PMC9235500 DOI: 10.1093/bioinformatics/btac223] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Voigt B, Fischer O, Krumnow C, Herta C, Dabrowski PW. NGS read classification using AI. PLoS One 2021;16:e0261548. [PMID: 34936673 PMCID: PMC8694450 DOI: 10.1371/journal.pone.0261548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2021] [Accepted: 12/03/2021] [Indexed: 11/19/2022] Open

Accessing Dietary Effects on the Rumen Microbiome: Different Sequencing Methods Tell Different Stories. Vet Sci 2021;8:vetsci8070138. [PMID: 34357930 PMCID: PMC8310016 DOI: 10.3390/vetsci8070138] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2021] [Revised: 07/02/2021] [Accepted: 07/14/2021] [Indexed: 12/29/2022] Open

Dall'Olio D, Curti N, Fonzi E, Sala C, Remondini D, Castellani G, Giampieri E. Impact of concurrency on the performance of a whole exome sequencing pipeline. BMC Bioinformatics 2021;22:60. [PMID: 33563206 PMCID: PMC7874478 DOI: 10.1186/s12859-020-03780-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2020] [Accepted: 09/24/2020] [Indexed: 11/12/2022] Open

Abstract

Background

Current high-throughput technologies—i.e. whole genome sequencing, RNA-Seq, ChIP-Seq, etc.—generate huge amounts of data and their usage gets more widespread with each passing year. Complex analysis pipelines involving several computationally-intensive steps have to be applied on an increasing number of samples. Workflow management systems allow parallelization and a more efficient usage of computational power. Nevertheless, this mostly happens by assigning the available cores to a single or few samples’ pipeline at a time. We refer to this approach as naive parallel strategy (NPS). Here, we discuss an alternative approach, which we refer to as concurrent execution strategy (CES), which equally distributes the available processors across every sample’s pipeline.

Results

Theoretically, we show that the CES results, under loose conditions, in a substantial speedup, with an ideal gain range spanning from 1 to the number of samples. Also, we observe that the CES yields even faster executions since parallelly computable tasks scale sub-linearly. Practically, we tested both strategies on a whole exome sequencing pipeline applied to three publicly available matched tumour-normal sample pairs of gastrointestinal stromal tumour. The CES achieved speedups in latency up to 2–2.4 compared to the NPS.

Conclusions

Our results hint that if resource distribution is further tailored to fit specific situations, an even greater gain in the performance of multi-sample pipeline execution could be achieved. For this to be feasible, a benchmarking of the tools included in the pipeline would be necessary. It is our opinion that these benchmarks should be consistently performed by the tools’ developers. Finally, these results suggest that concurrent strategies might also lead to energy and cost savings by making feasible the usage of low-power machine clusters.
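The stated gain range of 1 to the number of samples can be reproduced with a toy model. The sketch below is not the paper's derivation: it assumes every sample's pipeline follows Amdahl's law with the same parallelisable fraction, that the NPS runs samples one after another on all cores, and that the CES runs all samples at once on an equal share of cores. All function and parameter names are hypothetical.

```python
def pipeline_time(cores, parallel_fraction, serial_time=1.0):
    """Run time of one sample's pipeline under Amdahl's law (assumed model)."""
    return serial_time * ((1.0 - parallel_fraction) + parallel_fraction / cores)


def ces_speedup(n_samples, cores, parallel_fraction):
    """Speedup of the concurrent strategy (CES) over the naive one (NPS).

    NPS: samples processed sequentially, each using all `cores`.
    CES: all samples processed at once, each using cores / n_samples.
    """
    if cores < n_samples:
        raise ValueError("model assumes at least one core per sample")
    nps_total = n_samples * pipeline_time(cores, parallel_fraction)
    ces_total = pipeline_time(cores / n_samples, parallel_fraction)
    return nps_total / ces_total
```

In this model a perfectly parallel pipeline (`parallel_fraction=1.0`) gives a speedup of 1, a fully serial one (`parallel_fraction=0.0`) gives a speedup equal to the number of samples, and anything in between falls inside that range.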

Prasanna A, Niranjan V. Clin-mNGS: Automated Pipeline for Pathogen Detection from Clinical Metagenomic Data. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200608130029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Abstract

Background

Since bacteria are the earliest known organisms, there has been significant interest in their variety and biology, most certainly concerning human health. Recent advances in Metagenomics sequencing (mNGS), a culture-independent sequencing technology, have facilitated an accelerated development in clinical microbiology and our understanding of pathogens.

Objective

For the implementation of mNGS in routine clinical practice to become feasible, a practical and scalable strategy for the study of mNGS data is essential. This study presents a robust automated pipeline to analyze clinical metagenomic data for pathogen identification and classification.

Method

The proposed Clin-mNGS pipeline is an integrated, open-source, scalable, reproducible, and user-friendly framework scripted using the Snakemake workflow management software. The implementation avoids the hassle of manual installation and configuration of the multiple commandline tools and dependencies. The approach directly screens pathogens from clinical raw reads and generates consolidated reports for each sample.

Results

The pipeline is demonstrated using publicly available data and is tested on a desktop Linux system and a High-performance cluster. The study compares variability in results from different tools and versions. The versions of the tools are made user modifiable. The pipeline results in quality check, filtered reads, host subtraction, assembled contigs, assembly metrics, relative abundances of bacterial species, antimicrobial resistance genes, plasmid finding, and virulence factors identification. The results obtained from the pipeline are evaluated based on sensitivity and positive predictive value.

Conclusion

Clin-mNGS is an automated Snakemake pipeline validated for the analysis of microbial clinical metagenomics reads to perform taxonomic classification and antimicrobial resistance prediction.
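The two evaluation measures named in this abstract have standard definitions: sensitivity is TP / (TP + FN) and positive predictive value is TP / (TP + FP). As a minimal illustration, the Python sketch below computes both from a set of expected pathogens and a set of detected ones. The set-based comparison and the function name are assumptions; the abstract does not describe how the authors computed the measures.

```python
def sensitivity_and_ppv(expected, detected):
    """Return (sensitivity, positive predictive value) for two species sets.

    expected: species known to be in the sample (hypothetical ground truth)
    detected: species reported by the pipeline
    """
    expected, detected = set(expected), set(detected)
    tp = len(expected & detected)   # correctly reported species
    fn = len(expected - detected)   # present but missed
    fp = len(detected - expected)   # reported but absent
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv
```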

Benavides A, Sanchez F, Alzate JF, Cabarcas F. DATMA: Distributed AuTomatic Metagenomic Assembly and annotation framework. PeerJ 2020;8:e9762. [PMID: 32953263 PMCID: PMC7474881 DOI: 10.7717/peerj.9762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2019] [Accepted: 07/28/2020] [Indexed: 11/20/2022] Open

Muñoz-Benavent M, Hartkopf F, Van Den Bossche T, Piro VC, García-Ferris C, Latorre A, Renard BY, Muth T. gNOMO: a multi-omics pipeline for integrated host and microbiome analysis of non-model organisms. NAR Genom Bioinform 2020;2:lqaa058. [PMID: 33575609 PMCID: PMC7671378 DOI: 10.1093/nargab/lqaa058] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2020] [Revised: 06/19/2020] [Accepted: 08/03/2020] [Indexed: 01/14/2023] Open

Sim M, Lee J, Lee D, Kwon D, Kim J. TAMA: improved metagenomic sequence classification through meta-analysis. BMC Bioinformatics 2020;21:185. [PMID: 32397982 PMCID: PMC7218625 DOI: 10.1186/s12859-020-3533-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2020] [Accepted: 05/05/2020] [Indexed: 11/10/2022] Open

Khachatryan L, de Leeuw RH, Kraakman MEM, Pappas N, Te Raa M, Mei H, de Knijff P, Laros JFJ. Taxonomic classification and abundance estimation using 16S and WGS-A comparison using controlled reference samples. Forensic Sci Int Genet 2020;46:102257. [PMID: 32058299 DOI: 10.1016/j.fsigen.2020.102257] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2019] [Revised: 12/30/2019] [Accepted: 01/27/2020] [Indexed: 12/30/2022]

Cooper RO, Cressler CE. Characterization of key bacterial species in the Daphnia magna microbiota using shotgun metagenomics. Sci Rep 2020;10:652. [PMID: 31959775 PMCID: PMC6971282 DOI: 10.1038/s41598-019-57367-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2019] [Accepted: 12/24/2019] [Indexed: 12/28/2022] Open

Gihawi A, Rallapalli G, Hurst R, Cooper CS, Leggett RM, Brewer DS. SEPATH: benchmarking the search for pathogens in human tissue whole genome sequence data leads to template pipelines. Genome Biol 2019;20:208. [PMID: 31639030 PMCID: PMC6805339 DOI: 10.1186/s13059-019-1819-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2019] [Accepted: 09/11/2019] [Indexed: 02/07/2023] Open

Ye SH, Siddle KJ, Park DJ, Sabeti PC. Benchmarking Metagenomics Tools for Taxonomic Classification. Cell 2019;178:779-794. [PMID: 31398336 PMCID: PMC6716367 DOI: 10.1016/j.cell.2019.07.010] [Citation(s) in RCA: 249] [Impact Index Per Article: 49.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2019] [Revised: 06/18/2019] [Accepted: 07/08/2019] [Indexed: 01/17/2023]

Seiler E, Trappe K, Renard BY. Where did you come from, where did you go: Refining metagenomic analysis tools for horizontal gene transfer characterisation. PLoS Comput Biol 2019;15:e1007208. [PMID: 31335917 PMCID: PMC6677323 DOI: 10.1371/journal.pcbi.1007208] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2019] [Revised: 08/02/2019] [Accepted: 06/24/2019] [Indexed: 12/22/2022] Open

Somerville V, Lutz S, Schmid M, Frei D, Moser A, Irmler S, Frey JE, Ahrens CH. Long-read based de novo assembly of low-complexity metagenome samples results in finished genomes and reveals insights into strain diversity and an active phage system. BMC Microbiol 2019;19:143. [PMID: 31238873 PMCID: PMC6593500 DOI: 10.1186/s12866-019-1500-0] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2018] [Accepted: 05/31/2019] [Indexed: 01/18/2023] Open

Abstract

BACKGROUND

Complete and contiguous genome assemblies greatly improve the quality of subsequent systems-wide functional profiling studies and the ability to gain novel biological insights. While a de novo genome assembly of an isolated bacterial strain is in most cases straightforward, more informative data about co-existing bacteria as well as synergistic and antagonistic effects can be obtained from a direct analysis of microbial communities. However, the complexity of metagenomic samples represents a major challenge. While third generation sequencing technologies have been suggested to enable finished metagenome-assembled genomes, to our knowledge, the complete genome assembly of all dominant strains in a microbiome sample has not been demonstrated. Natural whey starter cultures (NWCs) are used in cheese production and represent low-complexity microbiomes. Previous studies of Swiss Gruyère and selected Italian hard cheeses, mostly based on amplicon metagenomics, concurred that three species generally predominate: Streptococcus thermophilus, Lactobacillus helveticus and Lactobacillus delbrueckii.

RESULTS

Two NWCs from Swiss Gruyère producers were subjected to whole metagenome shotgun sequencing using the Pacific Biosciences Sequel and Illumina MiSeq platforms. In addition, longer Oxford Nanopore Technologies MinION reads had to be generated for one to resolve repeat regions. Thereby, we achieved the complete assembly of all dominant bacterial genomes from these low-complexity NWCs, which was corroborated by a 16S rRNA amplicon survey. Moreover, two distinct L. helveticus strains were successfully co-assembled from the same sample. Besides bacterial chromosomes, we could also assemble several bacterial plasmids and phages and a corresponding prophage. Biologically relevant insights were uncovered by linking the plasmids and phages to their respective host genomes using DNA methylation motifs on the plasmids and by matching prokaryotic CRISPR spacers with the corresponding protospacers on the phages. These results could only be achieved by employing long-read sequencing data able to span intragenomic as well as intergenomic repeats.

CONCLUSIONS

Here, we demonstrate the feasibility of complete de novo genome assembly of all dominant strains from low-complexity NWCs based on whole metagenomics shotgun sequencing data. This allowed us to gain novel biological insights and is a fundamental basis for subsequent systems-wide omics analyses, functional profiling and phenotype to genotype analysis of specific microbial communities.

Mesophilic Sporeformers Identified in Whey Powder by Using Shotgun Metagenomic Sequencing. Appl Environ Microbiol 2018;84:AEM.01305-18. [PMID: 30076196 DOI: 10.1128/aem.01305-18] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2018] [Accepted: 07/31/2018] [Indexed: 01/19/2023] Open

Abstract

Spoilage and pathogenic spore-forming bacteria are a major cause of concern for producers of dairy products. Traditional agar-based detection methods employed by the dairy industry have limitations with respect to their sensitivity and specificity. The aim of this study was to identify low-abundance sporeformers in samples of a powdered dairy product, whey powder, produced monthly over 1 year, using novel culture-independent shotgun metagenomics-based approaches. Although mesophilic sporeformers were the main target of this study, in one instance thermophilic sporeformers were also targeted using this culture-independent approach. For comparative purposes, mesophilic and thermophilic sporeformers were also tested for within the same sample using culture-based approaches. Ultimately, the approaches taken highlighted differences in the taxa identified due to treatment and isolation methods. Despite this, low levels of transient, mesophilic, and in some cases potentially pathogenic sporeformers were consistently detected in powder samples. Although the specific sporeformers changed from one month to the next, it was apparent that 3 groups of mesophilic sporeformers, namely, Bacillus cereus, Bacillus licheniformis/Bacillus paralicheniformis, and a third, more heterogeneous group containing Brevibacillus brevis, dominated across the 12 samples. Total thermophilic sporeformer taxonomy was considerably different from mesophilic taxonomy, as well as from the culturable thermophilic taxonomy, in the one sample analyzed by all four approaches. Ultimately, through the application of shotgun metagenomic sequencing to dairy powders, the potential for this technology to facilitate the detection of undesirable bacteria present in these food ingredients is highlighted.

IMPORTANCE

The ability of sporeformers to remain dormant in a desiccated state is of concern from a safety and spoilage perspective in dairy powder. Traditional culturing techniques are slow and provide little information without further investigation. We describe the identification of mesophilic sporeformers present in powders produced over 1 year, using novel shotgun metagenomic sequencing. This method allows detection and identification of possible pathogens and spoilage bacteria in parallel. Strain-level analysis and functional gene analysis, such as identification of toxin genes, were also performed. This approach has the potential to be of great value with respect to the detection of spore-forming bacteria and could allow a processor to make an informed decision surrounding process changes to reduce the risk of spore contamination.

Uritskiy GV, DiRuggiero J, Taylor J. MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 2018;6:158. [PMID: 30219103 PMCID: PMC6138922 DOI: 10.1186/s40168-018-0541-1] [Citation(s) in RCA: 955] [Impact Index Per Article: 159.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/06/2018] [Accepted: 08/29/2018] [Indexed: 05/18/2023]

Abstract

BACKGROUND

The study of microbiomes using whole-metagenome shotgun sequencing enables the analysis of uncultivated microbial populations that may have important roles in their environments. Extracting individual draft genomes (bins) facilitates metagenomic analysis at the single genome level. Software and pipelines for such analysis have become diverse and sophisticated, resulting in a significant burden for biologists to access and use them. Furthermore, while bin extraction algorithms are rapidly improving, there is still a lack of tools for their evaluation and visualization.

RESULTS

To address these challenges, we present metaWRAP, a modular pipeline software for shotgun metagenomic data analysis. MetaWRAP deploys state-of-the-art software to handle metagenomic data processing starting from raw sequencing reads and ending in metagenomic bins and their analysis. MetaWRAP is flexible enough to give investigators control over the analysis, while still being easy-to-install and easy-to-use. It includes hybrid algorithms that leverage the strengths of a variety of software to extract and refine high-quality bins from metagenomic data through bin consolidation and reassembly. MetaWRAP's hybrid bin extraction algorithm outperforms individual binning approaches and other bin consolidation programs in both synthetic and real data sets. Finally, metaWRAP comes with numerous modules for the analysis of metagenomic bins, including taxonomy assignment, abundance estimation, functional annotation, and visualization.

CONCLUSIONS

MetaWRAP is an easy-to-use modular pipeline that automates the core tasks in metagenomic analysis, while contributing significant improvements to the extraction and interpretation of high-quality metagenomic bins. The bin refinement and reassembly modules of metaWRAP consistently outperform other binning approaches. Each module of metaWRAP is also a standalone component, making it a flexible and versatile tool for tackling metagenomic shotgun sequencing data. MetaWRAP is open-source software available at https://github.com/bxlab/metaWRAP .

Walsh AM, Crispie F, O'Sullivan O, Finnegan L, Claesson MJ, Cotter PD. Species classifier choice is a key consideration when analysing low-complexity food microbiome data. Microbiome 2018;6:50. [PMID: 29554948 PMCID: PMC5859664 DOI: 10.1186/s40168-018-0437-0] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/01/2017] [Accepted: 03/05/2018] [Indexed: 05/03/2023]

Abstract

BACKGROUND

The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads.

RESULTS

Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R² = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R² = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Again, the outputs from functional profiling analysis using SUPER-FOCUS were generally accordant between the platforms at different sequencing depths. Finally, and expectedly, metagenome assembly completeness was significantly lower on the MiSeq than either on the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth.

CONCLUSIONS

Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.
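
The genome-size bias mentioned in the results above can be illustrated with a minimal sketch. It assumes, as a simplification not taken from the abstract, that a classifier reports raw read counts per species and that reads are sampled in proportion to genome length; the function name, species labels, and numbers are hypothetical and do not reproduce any of the benchmarked tools.

```python
def relative_abundance(read_counts, genome_lengths):
    """Genome-length-normalized relative abundance (hypothetical helper).

    Dividing each read count by the genome length gives a per-base coverage
    proxy; rescaling the coverages to sum to 1 gives relative abundances.
    """
    coverage = {sp: read_counts[sp] / genome_lengths[sp] for sp in read_counts}
    total = sum(coverage.values())
    return {sp: cov / total for sp, cov in coverage.items()}


# Hypothetical mock community: equal cell numbers of two species, but
# species_a has a genome twice as long, so it yields twice as many reads.
counts = {"species_a": 2000, "species_b": 1000}
lengths = {"species_a": 4_000_000, "species_b": 2_000_000}

# Naive abundance from raw read fractions over-represents the larger genome.
naive = {sp: n / sum(counts.values()) for sp, n in counts.items()}

# Length-normalized abundance recovers the even mixture.
normalized = relative_abundance(counts, lengths)

print(naive)
print(normalized)
```

Under these assumptions the raw read fractions favor the species with the larger genome, while the normalized values reflect the even mixture; real classifiers differ in whether and how they apply such a correction.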

Neves ALA, Li F, Ghoshal B, McAllister T, Guan LL. Enhancing the Resolution of Rumen Microbial Classification from Metatranscriptomic Data Using Kraken and Mothur. Front Microbiol 2017;8:2445. [PMID: 29270165 PMCID: PMC5725470 DOI: 10.3389/fmicb.2017.02445] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2017] [Accepted: 11/24/2017] [Indexed: 12/23/2022] Open

Abstract

The advent of next generation sequencing and bioinformatics tools has greatly advanced our knowledge about the phylogenetic diversity and ecological role of microbes inhabiting the mammalian gut. However, there is a lack of information on the evaluation of these computational tools in the context of the rumen microbiome as these programs have mostly been benchmarked on real or simulated datasets generated from human studies. In this study, we compared the outcomes of two methods, Kraken (mRNA based) and a pipeline developed in-house based on Mothur (16S rRNA based), to assess the taxonomic profiles (bacteria and archaea) of rumen microbial communities using total RNA sequencing of rumen fluid collected from 12 cattle with differing feed conversion ratios (FCR). Both approaches revealed a similar phyla distribution of the most abundant taxa, with Bacteroidetes, Firmicutes, and Proteobacteria accounting for approximately 80% of total bacterial abundance. For bacterial taxa, although 69 genera were commonly detected by both methods, an additional 159 genera were exclusively identified by Kraken. Kraken detected 423 species, while Mothur was not able to assign bacterial sequences to the species level. For archaea, both methods generated similar results only for the abundance of Methanomassiliicoccaceae (previously referred to as RCC), which comprised more than 65% of the total archaeal families. Taxon R4-41B was exclusively identified by Mothur in the rumen of feed-efficient bulls, whereas Kraken uniquely identified Methanococcaceae in inefficient bulls. Although Kraken enhanced the microbial classification at the species level, identification of bacteria or archaea in the rumen is limited due to a lack of reference genomes for the rumen microbiome. The findings from this study suggest that the development of the combined pipelines using Mothur and Kraken is needed for a more inclusive and representative classification of microbiomes.
