1
Bickhart DM, Watson M, Koren S, Panke-Buisse K, Cersosimo LM, Press MO, Van Tassell CP, Van Kessel JAS, Haley BJ, Kim SW, Heiner C, Suen G, Bakshy K, Liachko I, Sullivan ST, Myer PR, Ghurye J, Pop M, Weimer PJ, Phillippy AM, Smith TPL. Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation. Genome Biol 2019; 20:153. [PMID: 31375138 PMCID: PMC6676630 DOI: 10.1186/s13059-019-1760-x] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 12/08/2018] [Accepted: 07/02/2019] [Indexed: 11/10/2022]
Abstract
We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.
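The Hi-C linkage idea above can be illustrated with a minimal Python sketch that assigns a viral contig to the host bin sharing the most Hi-C read-pair links with it. It assumes Hi-C reads have already been mapped and reduced to (contig, contig) link pairs and that non-viral contigs have already been binned. The function `assign_host` and its `min_links` cutoff are hypothetical; this is not the authors' pipeline, which also drew on long-read alignments.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional, Tuple


def assign_host(
    viral_contig: str,
    hic_links: Iterable[Tuple[str, str]],
    contig_to_bin: Mapping[str, str],
    min_links: int = 5,  # hypothetical support cutoff, not taken from the paper
) -> Optional[str]:
    """Return the host bin with the most Hi-C links to viral_contig, or None."""
    counts: Counter = Counter()
    for a, b in hic_links:
        # Count a link only when one end is the viral contig and the other end is binned.
        if a == viral_contig and b in contig_to_bin:
            counts[contig_to_bin[b]] += 1
        elif b == viral_contig and a in contig_to_bin:
            counts[contig_to_bin[a]] += 1
    if not counts:
        return None
    best_bin, n_links = counts.most_common(1)[0]
    # Report a host only if the best-supported bin clears the minimum link count.
    return best_bin if n_links >= min_links else None
```

Requiring a minimum number of links is one simple way to avoid calling a host from a handful of spurious read pairs; the threshold would need tuning on real data.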
Affiliation(s)
- Derek M Bickhart
  - Cell Wall Biology and Utilization Laboratory, Dairy Forage Research Center, USDA, Madison, WI, 53706, USA
- Mick Watson
  - Division of Genetics and Genomics, The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG, UK
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Kevin Panke-Buisse
  - Cell Wall Biology and Utilization Laboratory, Dairy Forage Research Center, USDA, Madison, WI, 53706, USA
- Laura M Cersosimo
  - Department of Animal Sciences, University of Florida, Gainesville, FL, 32611, USA
- Curtis P Van Tassell
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
- Jo Ann S Van Kessel
  - Environmental Microbial and Food Safety Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
- Bradd J Haley
  - Environmental Microbial and Food Safety Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
- Seon Woo Kim
  - Environmental Microbial and Food Safety Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
- Garret Suen
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Kiranmayee Bakshy
  - Cell Wall Biology and Utilization Laboratory, Dairy Forage Research Center, USDA, Madison, WI, 53706, USA
- Phillip R Myer
  - Department of Animal Science, University of Tennessee, Knoxville, TN, 37996, USA
- Jay Ghurye
  - Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Mihai Pop
  - Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Paul J Weimer
  - Cell Wall Biology and Utilization Laboratory, Dairy Forage Research Center, USDA, Madison, WI, 53706, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Timothy P L Smith
  - USDA-ARS U.S. Meat Animal Research Center, Clay Center, NE, 68933, USA
2
Kunath BJ, Minniti G, Skaugen M, Hagen LH, Vaaje-Kolstad G, Eijsink VGH, Pope PB, Arntzen MØ. Metaproteomics: Sample Preparation and Methodological Considerations. Adv Exp Med Biol 2019; 1073:187-215. [DOI: 10.1007/978-3-030-12298-0_8] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 12/12/2022]
3
Zhou W, Gay N, Oh J. ReprDB and panDB: minimalist databases with maximal microbial representation. Microbiome 2018; 6:15. [PMID: 29347966 PMCID: PMC5774170 DOI: 10.1186/s40168-018-0399-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 04/24/2017] [Accepted: 01/10/2018] [Indexed: 05/11/2023]
Abstract
BACKGROUND Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible with various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. RESULTS We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines produce two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. Using the databases, we assigned taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, a significant improvement over a previous database-guided analysis of the same datasets. CONCLUSIONS reprDB and panDB leverage the rapid increase in the number of open access microbial genomes to profile metagenomic samples more fully. Additionally, the databases exclude redundant sequence information to avoid inflating storage and memory requirements as well as indexing and analysis time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses.
Affiliation(s)
- Wei Zhou
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Nicole Gay
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Stanford University, Stanford, CA, USA
- Julia Oh
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
4
Zhang Q. Metagenome Assembly and Contig Assignment. Methods Mol Biol 2018; 1849:179-192. [PMID: 30298255 DOI: 10.1007/978-1-4939-8728-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The recent development of metagenomic assembly has revolutionized metagenomic data analysis, thanks to the improvement of sequencing techniques, more powerful computational infrastructure and the development of novel algorithms and methods. Using longer assembled contigs rather than raw reads improves the process of metagenomic binning and annotation significantly, ultimately resulting in a deeper understanding of the microbial dynamics of the metagenomic samples being analyzed. In this chapter, we demonstrate a typical metagenomic analysis pipeline including raw read quality evaluation and trimming, assembly and contig binning. Alternative tools that can be used for each step are also discussed.
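The trimming step of such a pipeline can be illustrated with a minimal Python sketch that removes low-quality bases from the 3' end of a read. It assumes Phred+33 quality encoding (as used by Sanger and Illumina 1.8+ FASTQ); the function names and the default cutoff of Q20 are illustrative choices, not taken from the chapter, and dedicated trimming tools offer more elaborate strategies such as sliding-window trimming.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20) -> tuple[str, str]:
    """Drop trailing bases whose quality is below min_q (illustrative cutoff)."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings must have equal length")
    scores = phred33_scores(quality)
    end = len(scores)
    # Walk back from the 3' end while bases fall below the quality cutoff.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


print(trim_3prime("ACGTAC", "IIII##"))  # ('ACGT', 'IIII')
```

Here 'I' decodes to Q40 and '#' to Q2, so the two trailing bases are removed.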
Affiliation(s)
- Qingpeng Zhang
  - Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA
5
Herath D, Tang SL, Tandon K, Ackland D, Halgamuge SK. CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision. BMC Bioinformatics 2017; 18:571. [PMID: 29297295 PMCID: PMC5751405 DOI: 10.1186/s12859-017-1967-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/26/2022]
Abstract
Background In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps evaluate the underlying microbial population structure and recover individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet uses coverage values and the compositional features of metagenomic contigs. Its binning strategy first groups contigs in guanine-cytosine (GC) content-coverage space and then refines the bins in tetranucleotide frequency space, in a purely unsupervised manner. CoMet uses the clustering algorithm DBSCAN to bin contigs. The performance of CoMet was compared against four existing approaches for binning a single metagenomic sample, namely MaxBin, Metawatt, MyCC (default), and MyCC (coverage), using multiple datasets, including a sample comprising multiple strains. Results Binning methods based on both the compositional features and the coverage of contigs performed better than the method based only on compositional features. CoMet yielded higher or comparable precision compared with the existing binning methods on benchmark datasets of varying complexity. MyCC (coverage) ranked highest by F1-score; however, CoMet outperformed MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18-39% higher in precision than the compared methods in discriminating species in the sample of multiple strains.
CoMet achieved higher precision than MyCC (default) and MyCC (coverage) on a real metagenome. Conclusions The CoMet approach to binning contigs improves binning precision while characterizing more species, both in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained with different binning strategies vary across datasets; however, CoMet yields the highest F1-score on a sample comprising multiple strains.
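The two compositional feature spaces described above can be illustrated with a short Python sketch that computes a contig's GC content and its tetranucleotide frequency vector. This is an illustration under stated assumptions, not CoMet's code: it counts all 256 tetranucleotides without merging reverse complements (a normalization many binners apply), and it omits the coverage dimension and the DBSCAN clustering step entirely.

```python
from itertools import product

# All 256 tetranucleotides in a fixed (lexicographic) order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_freqs(seq: str) -> list[float]:
    """Relative frequency of each tetranucleotide over all sliding 4-mer windows."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing N or other ambiguity codes
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in KMERS]
```

Each contig then becomes a point (GC content, coverage) for the initial grouping and a 256-dimensional vector for refinement; a clustering algorithm such as DBSCAN would operate on those points.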
Affiliation(s)
- Damayanthi Herath
  - Department of Mechanical Engineering, The University of Melbourne, Parkville, Melbourne, 3010, Australia
  - Department of Computer Engineering, University of Peradeniya, Prof. E. O. E. Pereira Mawatha, Peradeniya, 20400, Sri Lanka
- Sen-Lin Tang
  - Biodiversity Research Center, Academia Sinica, Nan-Kang, Taipei, 11529, Taiwan
- Kshitij Tandon
  - Biodiversity Research Center, Academia Sinica, Nan-Kang, Taipei, 11529, Taiwan
  - Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, 300, Taiwan
  - Bioinformatics Program, Institute of Information Science, Taiwan International Graduate Program, Academia Sinica, Taipei, 115, Taiwan
- David Ackland
  - Department of Biomedical Engineering, The University of Melbourne, Victoria, 3010, Australia
- Saman Kumara Halgamuge
  - Research School of Engineering, College of Engineering and Computer Science, The Australian National University, Canberra ACT, 2601, Australia
6
Papudeshi B, Haggerty JM, Doane M, Morris MM, Walsh K, Beattie DT, Pande D, Zaeri P, Silva GGZ, Thompson F, Edwards RA, Dinsdale EA. Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes. BMC Genomics 2017; 18:915. [PMID: 29183281 PMCID: PMC5706307 DOI: 10.1186/s12864-017-4294-1] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 06/08/2017] [Accepted: 11/13/2017] [Indexed: 11/12/2022]
Abstract
Background Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics involves sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstruction of DNA fragments into genomes from metagenomes (called metagenome-assembled genomes) assigns unknown fragments to taxa/function and facilitates discovery of novel organisms. Genome reconstruction incorporates sequence assembly and the sorting of assembled sequences into bins, each characteristic of a genome. However, the microbial community composition, including taxonomic and phylogenetic diversity, may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects that differed in sequencing platform (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select optimal assembly and binning tools. Methods We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of the assemblers using statistics that included contig continuity and contig chimerism, and the effectiveness of the binning tools using genome completeness and taxonomic identification. Results We concluded that SPAdes assembled more contigs (143,718 ± 124 contigs) of longer length (N50 = 1632 ± 108 bp) and incorporated the most sequences (sequences-assembled = 19.65%). Microbial richness and evenness were maintained across the assembly, suggesting few chimeric contigs. Compared with the other assemblers, SPAdes assembly was responsive to the biological and technological variations within each project.
Among binning tools, we conclude that MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects, of which 66 bins were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete but showed low similarity to genomes in databases. Conclusions We present a set of biologically relevant parameters for evaluating and selecting optimal assembly and binning tools. Among the tools we tested, the SPAdes assembler and the MetaBat binning tool reconstructed quality metagenome-assembled genomes for the four projects. We also conclude that metagenomes from microbial communities with high coverage of phylogenetically distinct taxa and low taxonomic diversity yield the highest-quality metagenome-assembled genomes.
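N50, one of the assembly statistics compared above, is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that standard definition (the function name is illustrative):

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (0 for an empty assembly)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig at which the cumulative sum reaches half the total is the N50.
        if running >= half_total:
            return length
    return 0


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the example the total is 54 bp; the three longest contigs (10 + 9 + 8) reach the 27 bp halfway mark, so the N50 is 8.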
Affiliation(s)
- Bhavya Papudeshi
  - Bioinformatics and Medical Informatics, San Diego State University, San Diego, California, USA
  - National Center for Genome Analysis Support, Indiana University, Bloomington, Indiana, USA
- J Matthew Haggerty
  - Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, 92115, California, USA
- Michael Doane
  - Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, 92115, California, USA
- Megan M Morris
  - Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, 92115, California, USA
- Kevin Walsh
  - Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, 92115, California, USA
- Douglas T Beattie
  - Department of Biology, University of New South Wales, Sydney, New South Wales, Australia
- Dnyanada Pande
  - Bioinformatics and Medical Informatics, San Diego State University, San Diego, California, USA
- Parisa Zaeri
  - Department of Mathematics and Statistics, San Diego State University, San Diego, California, USA
- Genivaldo G Z Silva
  - Computational Science Research Center, San Diego State University, San Diego, California, USA
- Fabiano Thompson
  - Institute of Biology, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
- Robert A Edwards
  - Department of Computer Science, San Diego State University, 5500 Campanile Drive, San Diego, California, USA
- Elizabeth A Dinsdale
  - Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, 92115, California, USA
7
Becraft ED, Woyke T, Jarett J, Ivanova N, Godoy-Vitorino F, Poulton N, Brown JM, Brown J, Lau MCY, Onstott T, Eisen JA, Moser D, Stepanauskas R. Rokubacteria: Genomic Giants among the Uncultured Bacterial Phyla. Front Microbiol 2017; 8:2264. [PMID: 29234309 PMCID: PMC5712423 DOI: 10.3389/fmicb.2017.02264] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 08/14/2017] [Accepted: 11/02/2017] [Indexed: 01/08/2023]
Abstract
Recent advances in single-cell genomic and metagenomic techniques have facilitated the discovery of numerous previously unknown, deep branches of the tree of life that lack cultured representatives. Many of these candidate phyla are composed of microorganisms with minimalistic, streamlined genomes lacking some core metabolic pathways, which may contribute to their resistance to growth in pure culture. Here we analyzed single-cell genomes and metagenome bins to show that the "Candidate phylum Rokubacteria," formerly known as SPAM, represents an interesting exception, by having large genomes (6-8 Mbps), high GC content (66-71%), and the potential for a versatile, mixotrophic metabolism. We also observed an unusually high genomic heterogeneity among individual Rokubacteria cells in the studied samples. These features may have contributed to the limited recovery of sequences of this candidate phylum in prior cultivation and metagenomic studies. Our analyses suggest that Rokubacteria are distributed globally in diverse terrestrial ecosystems, including soils, the rhizosphere, volcanic mud, oil wells, aquifers, and the deep subsurface, with no reports from marine environments to date.
Affiliation(s)
- Eric D Becraft
  - Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, United States
- Tanja Woyke
  - Joint Genome Institute, Walnut Creek, CA, United States
- Filipa Godoy-Vitorino
  - Department of Natural Sciences, Inter American University of Puerto Rico, San Juan, Puerto Rico
- Nicole Poulton
  - Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, United States
- Julia M Brown
  - Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, United States
- Joseph Brown
  - Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, United States
- M C Y Lau
  - Department of Geosciences, Princeton University, Princeton, NJ, United States
- Tullis Onstott
  - Department of Geosciences, Princeton University, Princeton, NJ, United States
- Jonathan A Eisen
  - College of Biological Sciences, Genome Center, University of California, Davis, Davis, CA, United States
- Duane Moser
  - Desert Research Institute, Las Vegas, NV, United States
8
Roux S, Emerson JB, Eloe-Fadrosh EA, Sullivan MB. Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity. PeerJ 2017; 5:e3817. [PMID: 28948103 PMCID: PMC5610896 DOI: 10.7717/peerj.3817] [Citation(s) in RCA: 169] [Impact Index Per Article: 24.1] [Received: 06/24/2017] [Accepted: 08/26/2017] [Indexed: 12/20/2022]
Abstract
Background Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates. Results Tools designed specifically for metagenomes, namely metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5× coverage typically assembled well, whereas lower coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following detection thresholds were applied: (i) ≥10 kb contig length to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of the contig length covered at ≥1×. Finally, although the data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but diverged more (up to 80%) when sequencing depth was uneven across the dataset.
In the latter cases, normalization methods developed specifically for metagenomes provided the best estimates. Conclusions These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development toward better access to RNA, rare, and/or diverse viral populations, together with improved availability of reference viral genomes, will alleviate many of viromics' remaining limitations.
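The detection thresholds listed above translate naturally into a filter over per-contig mapping summaries. The Python sketch below assumes those summaries (contig length, bases covered at ≥1×, total aligned bases) have already been computed from reads aligned at ≥90% identity. The `ContigStats` fields and the function names are hypothetical, and relative abundance is computed here as simple mean-coverage proportions rather than with any of the normalization methods the study compared.

```python
from dataclasses import dataclass


@dataclass
class ContigStats:
    name: str
    length: int          # contig length in bp
    covered_bases: int   # bases with >=1x coverage (reads already filtered to >=90% identity)
    mapped_bases: int    # total aligned bases from those same reads


def detected(c: ContigStats, min_len: int = 10_000, min_breadth: float = 0.75) -> bool:
    """Apply the length (>=10 kb) and breadth (>=75% covered at >=1x) thresholds."""
    if c.length < min_len:
        return False
    return c.covered_bases / c.length >= min_breadth


def relative_abundances(contigs: list[ContigStats]) -> dict[str, float]:
    """Mean coverage of each detected contig, scaled so the values sum to 1."""
    mean_cov = {c.name: c.mapped_bases / c.length for c in contigs if detected(c)}
    total = sum(mean_cov.values())
    return {name: cov / total for name, cov in mean_cov.items()} if total else {}
```

Contigs that fail either threshold are treated as undetected rather than assigned a low abundance, which mirrors the idea of applying detection cut-offs before estimating composition.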
Affiliation(s)
- Simon Roux
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Joanne B Emerson
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Emiley A Eloe-Fadrosh
  - Joint Genome Institute, Department of Energy, Walnut Creek, CA, United States of America
- Matthew B Sullivan
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
  - Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, United States of America
9
Olson ND, Zook JM, Morrow JB, Lin NJ. Challenging a bioinformatic tool's ability to detect microbial contaminants using in silico whole genome sequencing data. PeerJ 2017; 5:e3729. [PMID: 28924496 PMCID: PMC5600177 DOI: 10.7717/peerj.3729] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/03/2017] [Accepted: 08/02/2017] [Indexed: 12/20/2022]
Abstract
High-sensitivity methods such as next-generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or a culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants because of its sensitivity and because it needs no a priori assumptions about the contaminant. Before applying WGS, we must first understand its limitations for detecting contaminants and its potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets of ten genera as individual genomes, and of binary mixtures of eight organisms at varying ratios, were analyzed to evaluate the role of contaminant concentration and taxonomy in detection. For the individual genomes, the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures, the contaminant was detected in the in silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of 1 in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, helping to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.
Affiliation(s)
- Nathan D Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Jayne B Morrow
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Nancy J Lin
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States of America
10
Jünemann S, Kleinbölting N, Jaenicke S, Henke C, Hassa J, Nelkner J, Stolze Y, Albaum SP, Schlüter A, Goesmann A, Sczyrba A, Stoye J. Bioinformatics for NGS-based metagenomics and the application to biogas research. J Biotechnol 2017; 261:10-23. [PMID: 28823476 DOI: 10.1016/j.jbiotec.2017.08.012] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 03/20/2017] [Revised: 08/08/2017] [Accepted: 08/09/2017] [Indexed: 12/19/2022]
Abstract
Metagenomics has proven to be one of the most important research fields for microbial ecology during the last decade. From 16S rRNA marker gene analysis for characterizing community composition to whole metagenome shotgun sequencing, which additionally allows functional analysis, metagenomics has been applied in a wide spectrum of research areas. Reduced costs, paired with the growing volume of data since the advent of next-generation sequencing, led to a rapidly growing demand for bioinformatic software in metagenomics. A large number of tools for analyzing metagenomic datasets has since been developed. The Bielefeld-Gießen center for microbial bioinformatics, part of the German Network for Bioinformatics Infrastructure, bundles and imparts expert knowledge in the analysis of metagenomic datasets, especially in research on microbial communities involved in anaerobic digestion in biogas reactors. In this review, we give an overview of the field of metagenomics and introduce important bioinformatic tools and possible workflows, accompanied by application examples of biogas surveys successfully conducted at the Center for Biotechnology of Bielefeld University.
Affiliation(s)
- Sebastian Jünemann
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Nils Kleinbölting
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Sebastian Jaenicke
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
  - Bioinformatics and Systems Biology, Justus-Liebig-Universität, Gießen, Germany
- Christian Henke
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Julia Hassa
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Johanna Nelkner
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Yvonne Stolze
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Stefan P Albaum
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Andreas Schlüter
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Alexander Goesmann
  - Bioinformatics and Systems Biology, Justus-Liebig-Universität, Gießen, Germany
- Alexander Sczyrba
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Jens Stoye
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
11
Piro VC, Matschkowski M, Renard BY. MetaMeta: integrating metagenome analysis tools to improve taxonomic profiling. Microbiome 2017; 5:101. [PMID: 28807044 PMCID: PMC5557516 DOI: 10.1186/s40168-017-0318-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 12/22/2016] [Accepted: 07/25/2017] [Indexed: 05/11/2023]
Abstract
BACKGROUND Many metagenome analysis tools are presently available to classify sequences and profile environmental samples. In particular, taxonomic profiling and binning methods are commonly used for such tasks. Tools in these two categories use several techniques, e.g., read mapping, k-mer alignment, and composition analysis. The construction of the corresponding reference sequence databases also commonly varies. In addition, different tools perform well on different datasets and configurations. All this variation makes it difficult for researchers to decide which methods to use. Installation, configuration, and execution can also be difficult, especially when dealing with multiple datasets and tools. RESULTS We propose MetaMeta: a pipeline to execute and integrate results from metagenome analysis tools. MetaMeta provides an easy workflow to run multiple tools with multiple samples, producing a single enhanced output profile for each sample. MetaMeta includes database generation, pre-processing, execution, and integration steps, allowing easy execution and parallelization. The integration relies on the co-occurrence of organisms across different methods as the main feature to improve community profiling while accounting for differences in their databases. CONCLUSIONS In a controlled case with simulated and real data, we show that the integrated profiles of MetaMeta outperform the best single profile. Using the same input data, it provides more sensitive and reliable results, with the presence of each organism supported by several methods. MetaMeta uses Snakemake and has six pre-configured tools, all available on the Bioconda channel for easy installation (conda install -c bioconda metameta). The MetaMeta pipeline is open-source and can be downloaded at: https://gitlab.com/rki_bioinformatics .
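The co-occurrence idea behind this kind of integration can be illustrated, in deliberately simplified form, as a vote across tools: keep a taxon only if enough tools report it. This Python sketch is not MetaMeta's integration algorithm, which is more involved and accounts for differences between the tools' databases; the function name, the `min_tools` threshold, and the tool names in the usage example are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def consensus_taxa(
    profiles: Mapping[str, Iterable[str]],
    min_tools: int = 2,  # hypothetical support threshold
) -> dict[str, int]:
    """Map each taxon reported by at least min_tools tools to its number of supporting tools.

    profiles: tool name -> taxa that tool reported for one sample.
    """
    # set() ensures a tool that lists a taxon twice still counts as one vote.
    support = Counter(taxon for taxa in profiles.values() for taxon in set(taxa))
    return {taxon: n for taxon, n in support.items() if n >= min_tools}


profiles = {
    "toolA": {"E. coli", "B. fragilis"},
    "toolB": {"E. coli"},
    "toolC": {"E. coli", "S. aureus", "B. fragilis"},
}
print(consensus_taxa(profiles))  # E. coli supported by 3 tools, B. fragilis by 2
```

Raising `min_tools` trades sensitivity for precision: taxa reported by a single tool, such as S. aureus here, are dropped as unsupported.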
Affiliation(s)
- Vitor C. Piro
- Research Group Bioinformatics (NG4), Robert Koch Institute, Nordufer 20, Berlin, 13353 Germany
- CAPES Foundation, Ministry of Education of Brazil, Brasília, 70040-020 DF Brazil
- Marcel Matschkowski
- Research Group Bioinformatics (NG4), Robert Koch Institute, Nordufer 20, Berlin, 13353 Germany
- Bernhard Y. Renard
- Research Group Bioinformatics (NG4), Robert Koch Institute, Nordufer 20, Berlin, 13353 Germany
12
Interpreting Microbial Biosynthesis in the Genomic Age: Biological and Practical Considerations. Mar Drugs 2017; 15:md15060165. [PMID: 28587290 PMCID: PMC5484115 DOI: 10.3390/md15060165] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 04/28/2017] [Revised: 05/22/2017] [Accepted: 05/31/2017] [Indexed: 02/06/2023]
Abstract
Genome mining has become an increasingly powerful, scalable, and economically accessible tool for the study of natural product biosynthesis and drug discovery. However, there remain important biological and practical problems that can complicate or obscure biosynthetic analysis in genomic and metagenomic sequencing projects. Here, we focus on limitations of available technology as well as computational and experimental strategies to overcome them. We review the unique challenges and approaches in the study of symbiotic and uncultured systems, as well as those associated with biosynthetic gene cluster (BGC) assembly and product prediction. Finally, to explore sequencing parameters that affect the recovery and contiguity of large and repetitive BGCs assembled de novo, we simulate Illumina and PacBio sequencing of the Salinispora tropica genome focusing on assembly of the salinilactam (slm) BGC.
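How read length and coverage affect contiguity can be estimated, in an idealized way, with the classic Lander-Waterman model: with genome length G, read length L, coverage c, minimum detectable overlap T and σ = 1 - T/L, the expected number of contigs is N·e^(-cσ) with N = cG/L reads, and the expected contig length is L[(e^(cσ) - 1)/c + 1 - σ]. This is a textbook approximation, not the simulation procedure used in the review: it assumes uniformly random reads, no sequencing errors and, crucially, no repeats, so it is an optimistic bound for large, repetitive BGCs such as slm. The function name and all numbers in the usage example are hypothetical.

```python
import math
from typing import Tuple


def lander_waterman(genome_len: float, read_len: float, coverage: float,
                    min_overlap: float) -> Tuple[float, float]:
    """Idealized Lander-Waterman expectations (no repeats, no errors).

    Returns (expected number of contigs, expected contig length in bases).
    """
    if not 0 <= min_overlap < read_len:
        raise ValueError("min_overlap must be in [0, read_len)")
    n_reads = coverage * genome_len / read_len
    sigma = 1 - min_overlap / read_len  # fraction of a read usable for overlap detection
    n_contigs = n_reads * math.exp(-coverage * sigma)
    contig_len = read_len * ((math.exp(coverage * sigma) - 1) / coverage + 1 - sigma)
    return n_contigs, contig_len


# Hypothetical 5 Mb genome sequenced to 5x with short and long reads.
short = lander_waterman(5_000_000, 150, 5, 30)
long_ = lander_waterman(5_000_000, 10_000, 5, 1_000)
print(f"short reads: ~{short[0]:.0f} contigs, mean {short[1]:.0f} bp")
print(f"long reads:  ~{long_[0]:.0f} contigs, mean {long_[1]:.0f} bp")
```

At equal coverage, longer reads mean fewer reads and a larger usable overlap fraction, so the model predicts far fewer, longer contigs; repeats longer than the read length, which the model ignores, are what break real BGC assemblies.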
13
Abstract
Microorganisms play a primary role in regulating biogeochemical cycles and are a valuable source of enzymes with biotechnological applications, such as carbohydrate-active enzymes (CAZymes). However, the majority of microorganisms in natural ecosystems cannot be cultured with common culture-dependent techniques, which restricts access to potentially novel cellulolytic bacteria and beneficial enzymes. The development of molecular, culture-independent methods such as metagenomics enables researchers to study microbial communities directly from environmental samples, and provides a platform from which enzymes of interest can be sourced. We outline the key methodological stages required and describe specific protocols currently used in metagenomic projects dedicated to CAZyme discovery.
Affiliation(s)
- Benoit J Kunath
- Department of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, 5003, 1432, Ås, Norway
- Andreas Bremges
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, 38124, Braunschweig, Germany
- German Center for Infection Research (DZIF), 38124, Braunschweig, Germany
- Aaron Weimann
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, 38124, Braunschweig, Germany
- Alice C McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, 38124, Braunschweig, Germany
- Phillip B Pope
- Department of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, 5003, 1432, Ås, Norway