101
Mitra S. Multiple Data Analyses and Statistical Approaches for Analyzing Data from Metagenomic Studies and Clinical Trials. Methods Mol Biol 2019; 1910:605-634. [PMID: 31278679 DOI: 10.1007/978-1-4939-9074-0_20] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 06/09/2023]
Abstract
Metagenomics, also known as environmental genomics, is the study of the genomic content of a sample of organisms (microbes) obtained from a common habitat. Metagenomics and other "omics" disciplines have captured the attention of researchers for several decades. The effect of microbes in our body is a relevant concern for health studies. There are plenty of studies using metagenomics which examine microorganisms that inhabit niches in the human body, sometimes causing disease, and are often correlated with multiple treatment conditions. No matter from which environment it comes, the analyses are often aimed at determining either the presence or absence of specific species of interest in a given metagenome or comparing the biological diversity and the functional activity of a wider range of microorganisms within their communities. The importance increases for comparison within different environments such as multiple patients with different conditions, multiple drugs, and multiple time points of same treatment or same patient. Thus, no matter how many hypotheses we have, we need a good understanding of genomics, bioinformatics, and statistics to work together to analyze and interpret these datasets in a meaningful way. This chapter provides an overview of different data analyses and statistical approaches (with example scenarios) to analyze metagenomics samples from different medical projects or clinical trials.
Affiliation(s)
- Suparna Mitra
  - Leeds Institute of Medical Research, University of Leeds, Microbiology, Old Medical School, Leeds General Infirmary, Leeds LS1 3EX, West Yorkshire, UK.
102
Beilsmith K, Thoen MPM, Brachi B, Gloss AD, Khan MH, Bergelson J. Genome-wide association studies on the phyllosphere microbiome: Embracing complexity in host-microbe interactions. The Plant Journal: for Cell and Molecular Biology 2019; 97:164-181. [PMID: 30466152 DOI: 10.1111/tpj.14170] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 07/25/2018] [Revised: 11/08/2018] [Accepted: 11/16/2018] [Indexed: 05/18/2023]
Abstract
Environmental sequencing shows that plants harbor complex communities of microbes that vary across environments. However, many approaches for mapping plant genetic variation to microbe-related traits were developed in the relatively simple context of binary host-microbe interactions under controlled conditions. Recent advances in sequencing and statistics make genome-wide association studies (GWAS) an increasingly promising approach for identifying the plant genetic variation associated with microbes in a community context. This review discusses early efforts on GWAS of the plant phyllosphere microbiome and the outlook for future studies based on human microbiome GWAS. A workflow for GWAS of the phyllosphere microbiome is then presented, with particular attention to how perspectives on the mechanisms, evolution and environmental dependence of plant-microbe interactions will influence the choice of traits to be mapped.
Affiliation(s)
- Kathleen Beilsmith
  - Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, IL, 60637, USA
- Manus P M Thoen
  - Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, IL, 60637, USA
- Benjamin Brachi
  - BIOGECO, INRA, University of Bordeaux, 33610, Cestas, France
- Andrew D Gloss
  - Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, IL, 60637, USA
- Mohammad H Khan
  - Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, IL, 60637, USA
- Joy Bergelson
  - Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, IL, 60637, USA
103
Blaustein RA, McFarland AG, Ben Maamar S, Lopez A, Castro-Wallace S, Hartmann EM. Pangenomic Approach To Understanding Microbial Adaptations within a Model Built Environment, the International Space Station, Relative to Human Hosts and Soil. mSystems 2019; 4:e00281-18. [PMID: 30637341 PMCID: PMC6325168 DOI: 10.1128/msystems.00281-18] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 11/05/2018] [Accepted: 12/07/2018] [Indexed: 12/11/2022] Open
Abstract
Understanding underlying mechanisms involved in microbial persistence in the built environment (BE) is essential for strategically mitigating potential health risks. To test the hypothesis that BEs impose selective pressures resulting in characteristic adaptive responses, we performed a pangenomics meta-analysis leveraging 189 genomes (accessed from GenBank) of two epidemiologically important taxa, Bacillus cereus and Staphylococcus aureus, isolated from various origins: the International Space Station (ISS; a model BE), Earth-based BEs, soil, and humans. Our objectives were to (i) identify differences in the pangenomic composition of generalist and host-associated organisms, (ii) characterize genes and functions involved in BE-associated selection, and (iii) identify genomic signatures of ISS-derived strains of potential relevance for astronaut health. The pangenome of B. cereus was more expansive than that of S. aureus, which had a dominant core component. Genomic contents of both taxa significantly correlated with isolate origin, demonstrating an importance for biogeography and potential niche adaptations. ISS/BE-enriched functions were often involved in biosynthesis, catabolism, materials transport, metabolism, and stress response. Multiple origin-enriched functions also overlapped across taxa, suggesting conserved adaptive processes. We further characterized two mobile genetic elements with local neighborhood genes encoding biosynthesis and stress response functions that distinctively associated with B. cereus from the ISS. Although antibiotic resistance genes were present in ISS/BE isolates, they were also common in counterparts elsewhere. Overall, despite differences in microbial lifestyle, some functions appear common to remaining viable in the BE, and those functions are not typically associated with direct impacts on human health. 
IMPORTANCE The built environment contains a variety of microorganisms, some of which pose critical human health risks (e.g., hospital-acquired infection, antibiotic resistance dissemination). We uncovered a combination of complex biological functions that may play a role in bacterial survival under the presumed selective pressures in a model built environment-the International Space Station-by using an approach to compare pangenomes of bacterial strains from two clinically relevant species (B. cereus and S. aureus) isolated from both built environments and humans. Our findings suggest that the most crucial bacterial functions involved in this potential adaptive response are specific to bacterial lifestyle and do not appear to have direct impacts on human health.
Affiliation(s)
- Ryan A. Blaustein
  - Department of Civil and Environmental Engineering, Northwestern University, Evanston, Illinois, USA
- Alexander G. McFarland
  - Department of Civil and Environmental Engineering, Northwestern University, Evanston, Illinois, USA
- Sarah Ben Maamar
  - Department of Civil and Environmental Engineering, Northwestern University, Evanston, Illinois, USA
- Alberto Lopez
  - Department of Microbiology-Immunology, Northwestern University, Evanston, Illinois, USA
- Sarah Castro-Wallace
  - Biomedical Research and Environmental Sciences Division, NASA Johnson Space Center, Houston, Texas, USA
- Erica M. Hartmann
  - Department of Civil and Environmental Engineering, Northwestern University, Evanston, Illinois, USA
104
Carrera-Quintanar L, Ortuño-Sahagún D, Franco-Arroyo NN, Viveros-Paredes JM, Zepeda-Morales AS, Lopez-Roa RI. The Human Microbiota and Obesity: A Literature Systematic Review of In Vivo Models and Technical Approaches. Int J Mol Sci 2018; 19:ijms19123827. [PMID: 30513674 PMCID: PMC6320813 DOI: 10.3390/ijms19123827] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/27/2018] [Revised: 11/23/2018] [Accepted: 11/24/2018] [Indexed: 12/14/2022] Open
Abstract
Obesity is a noncommunicable disease that affects a considerable part of humanity. Recently, it has been recognized that gut microbiota constitutes a fundamental factor in the triggering and development of a large number of pathologies, among which obesity is one of the most closely related to the processes of dysbiosis. In this review, different animal model approaches, methodologies, and genome-scale metabolic databases were revisited to study the gut microbiota and its relationship with metabolic disease. As a data source, PubMed was screened for English-language material published from 1 January 2013 to 22 August 2018. Some earlier studies were included if they were considered classics or highly relevant. Studies that included innovative technical approaches or different in vivo or in vitro models for the study of the relationship between gut microbiota and obesity were selected after an exhaustive search with 16 different keywords. A clear panorama of the currently available options for the study of microbiota’s influence on obesity, both for animal model selection and technical approaches, is presented to the researcher. All the knowledge generated from the study of the microbiota opens the possibility of considering fecal transplantation as a relevant therapeutic alternative for the treatment of obesity and other metabolic diseases.
Affiliation(s)
- Lucrecia Carrera-Quintanar
  - Laboratorio de Ciencias de los Alimentos, Departamento de Reproducción Humana, Crecimiento y Desarrollo Infantil, Universidad de Guadalajara, CUCS, Guadalajara Jalisco 45180, Mexico.
- Daniel Ortuño-Sahagún
  - Laboratorio de Neuroinmunobiología Molecular, Instituto de Investigación en Ciencias Biomédicas (IICB) CUCS, Universidad de Guadalajara, Guadalajara Jalisco 45180, Mexico.
- Noel N Franco-Arroyo
  - Laboratorio de Investigación y Desarrollo Farmacéutico, Universidad de Guadalajara, CUCEI, Guadalajara Jalisco 44430, Mexico.
- Juan M Viveros-Paredes
  - Laboratorio de Investigación y Desarrollo Farmacéutico, Universidad de Guadalajara, CUCEI, Guadalajara Jalisco 44430, Mexico.
- Adelaida S Zepeda-Morales
  - Laboratorio de Investigación y Desarrollo Farmacéutico, Universidad de Guadalajara, CUCEI, Guadalajara Jalisco 44430, Mexico.
- Rocio I Lopez-Roa
  - Laboratorio de Investigación y Desarrollo Farmacéutico, Universidad de Guadalajara, CUCEI, Guadalajara Jalisco 44430, Mexico.
105
Tangherlini M, Miralto M, Colantuono C, Sangiovanni M, Dell’Anno A, Corinaldesi C, Danovaro R, Chiusano ML. GLOSSary: the GLobal Ocean 16S subunit web accessible resource. BMC Bioinformatics 2018; 19:443. [PMID: 30497362 PMCID: PMC6266928 DOI: 10.1186/s12859-018-2423-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Environmental metagenomics is a challenging approach that is exponentially spreading in the scientific community to investigate taxonomic diversity and possible functions of the biological components. The massive amount of sequence data produced, often endowed with rich environmental metadata, needs suitable computational tools to fully explore the embedded information. Bioinformatics plays a key role in providing methodologies to manage, process and mine molecular data, integrated with environmental metagenomics collections. One such relevant example is represented by the Tara Ocean Project. RESULTS We considered the Tara 16S miTAGs released by the consortium, representing raw sequences from a shotgun metagenomics approach with similarities to 16S rRNA genes. We generated assembled 16S rDNA sequences, which were classified according to their lengths, the possible presence of chimeric reads, the putative taxonomic affiliation. The dataset was included in GLOSSary (the GLobal Ocean 16S Subunit web accessible resource), a bioinformatics platform to organize environmental metagenomics data. The aims of this work were: i) to present alternative computational approaches to manage challenging metagenomics data; ii) to set up user friendly web-based platforms to allow the integration of environmental metagenomics sequences and of the associated metadata; iii) to implement an appropriate bioinformatics platform supporting the analysis of 16S rDNA sequences exploiting reference datasets, such as the SILVA database. We organized the data in a next-generation NoSQL "schema-less" database, allowing flexible organization of large amounts of data and supporting native geospatial queries. A web interface was developed to permit an interactive exploration and a visual geographical localization of the data, either raw miTAG reads or 16S contigs, from our processing pipeline. Information on unassembled sequences is also available. 
The taxonomic affiliations of contigs and miTAGs, and the spatial distribution of the sampling sites and their associated sequence libraries, as they are contained in the Tara metadata, can be explored by a query interface, which allows both textual and visual investigations. In addition, all the sequence data were made available for a dedicated BLAST-based web application alongside the SILVA collection. CONCLUSIONS GLOSSary provides an expandable bioinformatics environment, able to support the scientific community in current and forthcoming environmental metagenomics analyses.
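To make the idea of a geospatial query over sampling sites concrete, the following is a minimal sketch in plain Python. It is not the GLOSSary implementation: the abstract does not name the database engine or its query syntax, so the record layout (`station`, `lat`, `lon`) and the function names are hypothetical, and a simple great-circle (haversine) radius filter stands in for whatever native geospatial index the platform uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def samples_within(records, lat, lon, radius_km):
    """Return the records whose coordinates lie within radius_km of (lat, lon).

    `records` is a list of dicts with hypothetical keys 'station', 'lat', 'lon'.
    """
    return [
        r for r in records
        if haversine_km(lat, lon, r["lat"], r["lon"]) <= radius_km
    ]


# Illustrative records only; these are not real Tara sampling stations.
example_records = [
    {"station": "A", "lat": 0.0, "lon": 0.0},
    {"station": "B", "lat": 0.0, "lon": 1.0},
    {"station": "C", "lat": 40.0, "lon": 14.0},
]
nearby = samples_within(example_records, 0.0, 0.0, 150.0)
```

A document database with a native geospatial index would answer the same question without scanning every record; the linear scan here only illustrates what such a query returns.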
Affiliation(s)
- M. Tangherlini
  - Stazione Zoologica “Anton Dohrn”, Villa Comunale, 80121 Naples, Italy
- M. Miralto
  - Stazione Zoologica “Anton Dohrn”, Villa Comunale, 80121 Naples, Italy
- C. Colantuono
  - Stazione Zoologica “Anton Dohrn”, Villa Comunale, 80121 Naples, Italy
- M. Sangiovanni
  - Stazione Zoologica “Anton Dohrn”, Villa Comunale, 80121 Naples, Italy
- A. Dell’Anno
  - Dipartimento di Scienze della Vita e dell’Ambiente, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy
- C. Corinaldesi
  - Dipartimento di Scienze e Ingegneria della Materia, dell’Ambiente ed Urbanistica, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy
- R. Danovaro
  - Stazione Zoologica “Anton Dohrn”, Villa Comunale, 80121 Naples, Italy
  - Dipartimento di Scienze della Vita e dell’Ambiente, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy
- M. L. Chiusano
  - Stazione Zoologica “Anton Dohrn”, Villa Comunale, 80121 Naples, Italy
  - Dipartimento di Agraria, University of Naples “Federico II”, via Università 100, 80055 Portici, Italy
106
Guerin E, Shkoporov A, Stockdale SR, Clooney AG, Ryan FJ, Sutton TDS, Draper LA, Gonzalez-Tortuero E, Ross RP, Hill C. Biology and Taxonomy of crAss-like Bacteriophages, the Most Abundant Virus in the Human Gut. Cell Host Microbe 2018; 24:653-664.e6. [PMID: 30449316 DOI: 10.1016/j.chom.2018.10.002] [Citation(s) in RCA: 164] [Impact Index Per Article: 27.3] [Received: 04/16/2018] [Revised: 07/02/2018] [Accepted: 09/17/2018] [Indexed: 12/22/2022]
Abstract
CrAssphages represent the most abundant virus in the human gut microbiota, but the lack of available genome sequences for comparison has kept them enigmatic. Recently, sequence-based classification of distantly related crAss-like phages from multiple environments was reported, leading to a proposed familial-level taxonomic group. Here, we assembled the metagenomic sequencing reads from 702 human fecal virome/phageome samples and analyzed 99 complete circular crAss-like phage genomes and 150 contigs ≥70 kb. In silico comparative genomics and taxonomic analysis enabled a classification scheme of crAss-like phages from human fecal microbiomes into four candidate subfamilies composed of ten candidate genera. Laboratory analysis was performed on fecal samples from an individual harboring seven distinct crAss-like phages. We achieved crAss-like phage propagation in ex vivo human fecal fermentations and visualized short-tailed podoviruses by electron microscopy. Mass spectrometry of a crAss-like phage capsid protein could be linked to metagenomic sequencing data, confirming crAss-like phage structural annotations.
Affiliation(s)
- Emma Guerin
  - APC Microbiome Ireland, University College Cork, Cork, Ireland; School of Microbiology, University College Cork, Cork, Ireland
- Stephen R Stockdale
  - APC Microbiome Ireland, University College Cork, Cork, Ireland; Teagasc Food Research Centre, Moorepark, Fermoy, Co., Cork, Ireland
- Adam G Clooney
  - APC Microbiome Ireland, University College Cork, Cork, Ireland
- Feargal J Ryan
  - APC Microbiome Ireland, University College Cork, Cork, Ireland
- Thomas D S Sutton
  - APC Microbiome Ireland, University College Cork, Cork, Ireland; School of Microbiology, University College Cork, Cork, Ireland
- R Paul Ross
  - APC Microbiome Ireland, University College Cork, Cork, Ireland; School of Microbiology, University College Cork, Cork, Ireland; Teagasc Food Research Centre, Moorepark, Fermoy, Co., Cork, Ireland
- Colin Hill
  - APC Microbiome Ireland, University College Cork, Cork, Ireland; School of Microbiology, University College Cork, Cork, Ireland.
107
Gerner SM, Rattei T, Graf AB. Assessment of urban microbiome assemblies with the help of targeted in silico gold standards. Biol Direct 2018; 13:22. [PMID: 30621760 PMCID: PMC6889603 DOI: 10.1186/s13062-018-0225-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/16/2018] [Accepted: 09/25/2018] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Microbial communities play a crucial role in our environment and may influence human health tremendously. Although urban spaces are where human interaction is most abundant, we still know little about the urban microbiome. This is highlighted by the large number of unclassified DNA reads found in urban metagenome samples. The only in silico approach that allows us to find unknown species is the assembly and classification of draft genomes from a metagenomic dataset. In this study we (1) investigate the applicability of an assembly and binning approach for urban metagenome datasets, and (2) develop a new method for the generation of in silico gold standards to better understand the specific challenges of such datasets and provide a guide in the selection of available software. RESULTS We applied combinations of three assembly (Megahit, SPAdes and MetaSPAdes) and three binning tools (MaxBin, MetaBAT and CONCOCT) to whole genome shotgun datasets from the CAMDA 2017 Challenge. Complex in silico gold standards with a simulated bacterial fraction were generated for representative samples of each surface type and city. Using these gold standards, we found the combination of SPAdes and MetaBAT to be optimal for urban metagenome datasets, providing the best trade-off between the number of high-quality genome draft bins (MIMAG standards) retrieved and the lowest amounts of misassemblies and contamination. The assembled draft genomes included known species like Propionibacterium acnes but also novel species according to respective ANI values. CONCLUSIONS In our work, we showed that, even for datasets with high diversity and low sequencing depth from urban environments, assembly and binning-based methods can provide high-quality genome drafts. Sequencing depth is of vital importance for retrieving high-quality genome drafts, but even more important is a high proportion of bacterial sequences, which is needed to achieve high coverage of bacterial genomes.
In contrast to read-based methods relying on database knowledge, genome-centric methods as applied in this study can provide valuable information about unknown species and strains as well as functional contributions of single community members within a sample. Furthermore, we present a method for the generation of sample-specific highly complex in silico gold standards. REVIEWERS This article was reviewed by Craig Herbold, Serghei Mangul and Yana Bromberg.
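The "MIMAG standards" mentioned above grade draft genome bins mainly by estimated completeness and contamination. The sketch below shows one way such a grading could be expressed. The thresholds follow the published MIMAG tiers as commonly cited (high quality: more than 90% complete, less than 5% contaminated, with 5S, 16S and 23S rRNA genes and at least 18 tRNAs; medium quality: at least 50% complete and less than 10% contaminated; low quality: less than 50% complete and less than 10% contaminated). The abstract does not say exactly which criteria the authors applied, so the function name, its parameters, and the "unclassified" label for bins with 10% or more contamination are assumptions for illustration.

```python
def mimag_category(completeness, contamination, has_rrna_set=False, trna_count=0):
    """Grade a genome bin using MIMAG-style thresholds (percentages in 0-100).

    has_rrna_set: True if 5S, 16S and 23S rRNA genes were all detected.
    trna_count:   number of distinct tRNAs detected.
    The 'unclassified' label is a hypothetical catch-all for bins that
    exceed the contamination limit of every MIMAG tier.
    """
    if contamination >= 10:
        return "unclassified"
    if completeness > 90 and contamination < 5 and has_rrna_set and trna_count >= 18:
        return "high"
    if completeness >= 50:
        return "medium"
    return "low"


# Hypothetical bins: (completeness %, contamination %, rRNA set found, tRNA count)
example_bins = {
    "bin_1": (95.0, 2.0, True, 20),
    "bin_2": (95.0, 2.0, False, 0),
    "bin_3": (60.0, 8.0, False, 5),
    "bin_4": (30.0, 3.0, False, 2),
    "bin_5": (95.0, 12.0, True, 20),
}
grades = {name: mimag_category(*stats) for name, stats in example_bins.items()}
```

Note that a nearly complete, clean bin without the rRNA and tRNA evidence (`bin_2`) falls back to the medium tier, which is why short-read assemblies often yield few bins in the top tier.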
Affiliation(s)
- Samuel M. Gerner
  - Department Bioengineering, University of Applied Sciences FH Campus Wien, Vienna, Austria
  - Division of Computational System Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
- Thomas Rattei
  - Division of Computational System Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
- Alexandra B. Graf
  - Department Bioengineering, University of Applied Sciences FH Campus Wien, Vienna, Austria
108
Deltaproteobacteria (Pelobacter) and Methanococcoides are responsible for choline-dependent methanogenesis in a coastal saltmarsh sediment. ISME Journal 2018; 13:277-289. [PMID: 30206424 PMCID: PMC6331629 DOI: 10.1038/s41396-018-0269-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 01/22/2018] [Revised: 06/11/2018] [Accepted: 07/26/2018] [Indexed: 11/08/2022]
Abstract
Coastal saltmarsh sediments represent an important source of natural methane emissions, much of which originates from quaternary and methylated amines, such as choline and trimethylamine. In this study, we combine DNA stable isotope probing with high throughput sequencing of 16S rRNA genes and 13C2-choline enriched metagenomes, followed by metagenome data assembly, to identify the key microbes responsible for methanogenesis from choline. Microcosm incubation with 13C2-choline leads to the formation of trimethylamine and subsequent methane production, suggesting that choline-dependent methanogenesis is a two-step process involving trimethylamine as the key intermediate. Amplicon sequencing analysis identifies Deltaproteobacteria of the genus Pelobacter as the major choline utilizers. Methanogenic Archaea of the genus Methanococcoides become enriched in choline-amended microcosms, indicating their role in methane formation from trimethylamine. The binning of metagenomic DNA results in the identification of bins classified as Pelobacter and Methanococcoides. Analyses of these bins reveal that Pelobacter have the genetic potential to degrade choline to trimethylamine using the choline-trimethylamine lyase pathway, whereas Methanococcoides are capable of methanogenesis using the pyrrolysine-containing trimethylamine methyltransferase pathway. Together, our data provide new insight into the diversity of choline-utilizing organisms in coastal sediments and support a syntrophic relationship between Bacteria and Archaea as the dominant route for methanogenesis from choline in this environment.
109
Ma ZS, Li L. Measuring metagenome diversity and similarity with Hill numbers. Mol Ecol Resour 2018; 18:1339-1355. [PMID: 29985552 DOI: 10.1111/1755-0998.12923] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 10/13/2017] [Revised: 01/31/2018] [Accepted: 02/17/2018] [Indexed: 11/27/2022]
Abstract
The first step of any metagenome sequencing project is to get the inventory of OTU abundances (operational taxonomic units) and/or metagenomic gene abundances. The former is generated with 16S-rRNA-tagged amplicon sequencing technology, and the latter can be generated from either gene-targeted or whole-sample shotgun metagenomics technologies. With 16S-rRNA data sets, measuring community diversity with diversity indexes such as species richness and Shannon's index has been a de facto standard analysis; nevertheless, similarly comprehensive approaches to metagenomic gene abundances are still largely missing, despite that both OTU and gene abundances are DNA reads. Here, we adapt the Hill numbers, which were reintroduced to macrocommunity ecology recently and are now widely regarded as a most appropriate measure system for ecological diversity, for measuring metagenome alpha-, beta- and gamma-diversities, and similarity. Our proposal includes the following: (a) Metagenomic gene (MG) diversity measures the single-gene-level metagenome diversity; (b) Type-I metagenome functional gene cluster (MFGC) diversity measures the diversity of functional gene clusters but ignoring within-cluster gene abundance information; (c) Type-II MFGC diversity considers within-cluster gene abundances information and integrates gene-cluster-level metagenome diversity and functional gene redundancy information; and (d) Four classes of Hill-numbers-based similarity metrics, including local gene overlap, regional gene overlap, gene homogeneity measure and gene turnover complement, were introduced in terms of MG and MFGC, respectively. We demonstrate the proposal with the gut metagenomes from healthy and IBD (inflammatory bowel disease) cohorts. The Hill numbers offer a unified approach to cohesively and comprehensively measuring the ecological and metagenome diversities of microbiomes.
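For readers unfamiliar with Hill numbers, the standard definition for relative abundances p_i and diversity order q is D_q = (sum of p_i^q)^(1/(1-q)), with the q = 1 case defined as the limit, exp(Shannon entropy). The following minimal sketch computes this for a vector of OTU or gene abundances. It illustrates only the basic alpha-diversity formula, not the paper's MFGC variants or similarity metrics, and the function name and example abundances are hypothetical.

```python
import math


def hill_number(abundances, q):
    """Hill number (effective number of types) of order q.

    abundances: non-negative counts or relative abundances (OTUs or genes).
    q = 0 gives richness, q = 1 the exponential of Shannon entropy,
    q = 2 the inverse Simpson index.
    """
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 1:
        # Limit of the general formula as q approaches 1.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Hypothetical abundance vectors for illustration.
even_community = [5, 5, 5, 5]
skewed_community = [90, 5, 5]
even_profile = [hill_number(even_community, q) for q in (0, 1, 2)]
skewed_profile = [hill_number(skewed_community, q) for q in (0, 1, 2)]
```

For a perfectly even community every order returns the number of types, whereas for a skewed community the values fall as q increases, because higher orders weight dominant types more heavily.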
Affiliation(s)
- Zhanshan Sam Ma
  - Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Lianwei Li
  - Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
110
Forouzan E, Shariati P, Mousavi Maleki MS, Karkhane AA, Yakhchali B. Practical evaluation of 11 de novo assemblers in metagenome assembly. J Microbiol Methods 2018; 151:99-105. [PMID: 29953874 DOI: 10.1016/j.mimet.2018.06.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 02/18/2018] [Revised: 06/16/2018] [Accepted: 06/23/2018] [Indexed: 11/18/2022]
Abstract
Next Generation Sequencing (NGS) technologies are revolutionizing the field of biology and metagenomic-based research. Since the volume of metagenomic data is typically very large, de novo metagenomic assembly can be effectively used to reduce the total amount of data and enhance the quality of downstream analyses, such as annotation and binning. Although many assemblers are freely available, selecting one suitable for a specific goal can be highly challenging. In this study, the performance of 11 well-known assemblers was evaluated in the assembly of three different metagenomes. The results obtained show that metaSPAdes is the best assembler and that Megahit is a good choice for a conservative assembly strategy. In addition, this research provides useful information regarding the pros and cons of each assembler and the effect of read length on assembly, thereby helping scholars to select the optimal assembler based on their objectives.
Affiliation(s)
- Esmaeil Forouzan
  - Institute of Industrial and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
- Parvin Shariati
  - Institute of Industrial and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
- Masoumeh Sadat Mousavi Maleki
  - Institute of Industrial and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
- Ali Asghar Karkhane
  - Institute of Industrial and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
- Bagher Yakhchali
  - Institute of Industrial and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran.
111
Batut B, Gravouil K, Defois C, Hiltemann S, Brugère JF, Peyretaillade E, Peyret P. ASaiM: a Galaxy-based framework to analyze microbiota data. Gigascience 2018; 7:5001424. [PMID: 29790941 PMCID: PMC6007547 DOI: 10.1093/gigascience/giy057] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 09/08/2017] [Accepted: 05/10/2018] [Indexed: 12/24/2022] Open
Abstract
Background New generations of sequencing platforms coupled to numerous bioinformatics tools have led to rapid technological progress in metagenomics and metatranscriptomics to investigate complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions from microbiota studies. Modular and user-friendly tools would greatly improve such studies. Findings We therefore developed ASaiM, an open-source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore, and visualize microbiota information from raw metataxonomic, metagenomic, or metatranscriptomic sequences. To guide the analyses, several customizable workflows are included and are supported by tutorials and Galaxy interactive tours, which guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to thousands of datasets but can also be used on a normal PC. The associated source code is available under the Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io). Conclusions Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation, and training to scientists working on complex microorganism communities. It makes the analysis and exploration of microbiota data easy, quick, transparent, reproducible, and shareable.
Affiliation(s)
- Bérénice Batut
  - Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany
- Kévin Gravouil
  - Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
  - Université Clermont Auvergne, INRA, MEDIS, 63000 Clermont-Ferrand, France
  - Université Clermont Auvergne, CNRS, LMGE, 63000 Clermont-Ferrand, France
  - Université Clermont Auvergne, CNRS, LIMOS, 63000 Clermont-Ferrand, France
- Clémence Defois
  - Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
  - Université Clermont Auvergne, INRA, MEDIS, 63000 Clermont-Ferrand, France
- Saskia Hiltemann
  - Department of Bioinformatics, Erasmus University Medical Center, Rotterdam, 3015 CE, Netherlands
- Jean-François Brugère
  - Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
- Eric Peyretaillade
  - Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
  - Université Clermont Auvergne, CNRS, LMGE, 63000 Clermont-Ferrand, France
- Pierre Peyret
  - Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
  - Université Clermont Auvergne, INRA, MEDIS, 63000 Clermont-Ferrand, France
112
MetaHMM: A webserver for identifying novel genes with specified functions in metagenomic samples. Genomics 2018; 111:883-885. [PMID: 29802977 DOI: 10.1016/j.ygeno.2018.05.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/06/2018] [Revised: 05/17/2018] [Accepted: 05/18/2018] [Indexed: 11/20/2022]
Abstract
The fast and affordable sequencing of large clinical and environmental metagenomic datasets opens up new horizons in medical and biotechnological applications. It is believed that today we have described only about 1% of the microorganisms on the Earth, therefore, metagenomic analysis mostly deals with unknown species in the samples. Microbial communities in extreme environments may contain genes with high biotechnological potential, and clinical metagenomes, related to diseases, may uncover still unknown pathogens and pathological mechanisms in known diseases. While the species-level identification and description of the taxa in the samples do not seem to be possible today, we can search for novel genes with known functions in these samples, using numerous techniques, including artificial intelligence tools, like the hidden Markov models (HMMs). Here we describe a simple-to-use webserver, the MetaHMM, which is capable of homology-based automatic model-building for the genes to be searched for, and it also finds the closest matches in the metagenome. The webserver uses already highly successful building blocks: it performs multiple alignments by applying Clustal Omega, builds a hidden Markov model with HMMER components of hmmbuild and uses hmmsearch for finding similar sequences to the specified model in the metagenomes. The webserver is publicly available at https://metahmm.pitgroup.org.
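The three building blocks named in this abstract (Clustal Omega, hmmbuild, hmmsearch) can be chained roughly as sketched below. This is only an illustration of the general align, build, search pattern, not the webserver's actual implementation: the function name, file names and working directory are hypothetical, and the sketch assumes the `clustalo`, `hmmbuild` and `hmmsearch` executables are installed. The function only assembles the command lines and runs nothing.

```python
from pathlib import Path


def homology_search_commands(seed_fasta, metagenome_fasta, workdir="work"):
    """Return the three command lines of a Clustal Omega + HMMER search.

    Nothing is executed here; all file names are hypothetical placeholders.
    """
    work = Path(workdir)
    alignment = work / "seed.sto"  # multiple alignment of the seed genes
    model = work / "seed.hmm"      # profile HMM built from the alignment
    hits = work / "hits.tbl"       # tabular hmmsearch output
    return [
        # 1. Align the seed sequences, writing Stockholm format.
        ["clustalo", "-i", str(seed_fasta), "-o", str(alignment), "--outfmt=st"],
        # 2. Build a profile HMM from the alignment.
        ["hmmbuild", str(model), str(alignment)],
        # 3. Search the metagenome sequences with the model.
        ["hmmsearch", "--tblout", str(hits), str(model), str(metagenome_fasta)],
    ]


if __name__ == "__main__":
    for command in homology_search_commands("seed_genes.fasta", "metagenome.fasta"):
        print(" ".join(command))
```

To actually run the steps, each list could be passed to `subprocess.run(command, check=True)` after creating the working directory.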
113
114
Kohl KD. An Introductory "How-to" Guide for Incorporating Microbiome Research into Integrative and Comparative Biology. Integr Comp Biol 2018; 57:674-681. [PMID: 28985331 DOI: 10.1093/icb/icx013] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023] [Open Access]
Abstract
Research on host-associated microbial communities has grown rapidly. Despite the great body of work, inclusion of microbiota-related questions into integrative and comparative biology is still lagging behind other disciplines. The purpose of this paper is to offer an introduction into the basic tools and techniques of host-microbe research. Specifically, what considerations should be made before embarking on such projects (types of samples, types of controls)? How is microbiome data analyzed and integrated with data measured from the hosts? How can researchers experimentally manipulate the microbiome? With this information, integrative and comparative biologists should be able to include host-microbe studies into their research and push the boundaries of both fields.
Affiliation(s)
- Kevin D Kohl
  - Department of Biological Sciences, University of Pittsburgh, 4249 Fifth Avenue, Pittsburgh, PA 15260, USA
115
Zhang Q. Metagenome Assembly and Contig Assignment. Methods Mol Biol 2018; 1849:179-192. [PMID: 30298255 DOI: 10.1007/978-1-4939-8728-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The recent development of metagenomic assembly has revolutionized metagenomic data analysis, thanks to the improvement of sequencing techniques, more powerful computational infrastructure and the development of novel algorithms and methods. Using longer assembled contigs rather than raw reads improves the process of metagenomic binning and annotation significantly, ultimately resulting in a deeper understanding of the microbial dynamics of the metagenomic samples being analyzed. In this chapter, we demonstrate a typical metagenomic analysis pipeline including raw read quality evaluation and trimming, assembly and contig binning. Alternative tools that can be used for each step are also discussed.
Affiliation(s)
- Qingpeng Zhang
  - Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA
116
Papudeshi B, Haggerty JM, Doane M, Morris MM, Walsh K, Beattie DT, Pande D, Zaeri P, Silva GGZ, Thompson F, Edwards RA, Dinsdale EA. Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes. BMC Genomics 2017; 18:915. [PMID: 29183281 PMCID: PMC5706307 DOI: 10.1186/s12864-017-4294-1] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 06/08/2017] [Accepted: 11/13/2017] [Indexed: 11/12/2022] [Open Access]
Abstract
Background Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics includes sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstruction of DNA fragments into genomes from metagenomes (called metagenome-assembled genomes) assigns unknown fragments to taxa/function and facilitates discovery of novel organisms. Genome reconstruction incorporates sequence assembly and sorting of assembled sequences into bins, characteristic of a genome. However, the microbial community composition, including taxonomic and phylogenetic diversity, may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects that had variable sequencing platforms (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select for optimal assembly and binning tools. Methods We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of assemblers using statistics including contig continuity and contig chimerism, and the effectiveness of binning tools using genome completeness and taxonomic identification. Results We concluded that SPAdes assembled more contigs (143,718 ± 124 contigs) of longer length (N50 = 1632 ± 108 bp) and incorporated the most sequences (sequences-assembled = 19.65%). The microbial richness and evenness were maintained across the assembly, suggesting low contig chimeras. SPAdes assembly was responsive to the biological and technological variations within the project, compared with other assemblers. Among binning tools, we conclude that MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects, of which 66 bins were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete but showed low similarity to genomes within databases. Conclusions In conclusion, we present a set of biologically relevant parameters for evaluation to select for optimal assembly and binning tools. For the tools we tested, the SPAdes assembler and MetaBat binning tool reconstructed quality metagenome-assembled genomes for the four projects. We also conclude that metagenomes from microbial communities that have high coverage of phylogenetically distinct taxa and low taxonomic diversity result in the highest-quality metagenome-assembled genomes. Electronic supplementary material The online version of this article (10.1186/s12864-017-4294-1) contains supplementary material, which is available to authorized users.
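The N50 statistic reported above as a measure of contig continuity can be computed as in the following minimal sketch. It uses the common definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly); the function name and the example lengths are illustrative and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```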
Affiliation(s)
- Bhavya Papudeshi
  - Bioinformatics and Medical Informatics, San Diego State University, San Diego, California, USA
  - National Center for Genome Analysis Support, Indiana University, Bloomington, Indiana, USA
- J Matthew Haggerty
  - Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, 92115, California, USA
- Michael Doane
  - Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, 92115, California, USA
- Megan M Morris
  - Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, 92115, California, USA
- Kevin Walsh
  - Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, 92115, California, USA
- Douglas T Beattie
  - Department of Biology, University of New South Wales, Sydney, New South Wales, Australia
- Dnyanada Pande
  - Bioinformatics and Medical Informatics, San Diego State University, San Diego, California, USA
- Parisa Zaeri
  - Department of Mathematics and Statistics, San Diego State University, San Diego, California, USA
- Genivaldo G Z Silva
  - Computational Science Research Center, San Diego State University, San Diego, California, USA
- Fabiano Thompson
  - Institute of Biology, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
- Robert A Edwards
  - Department of Computer Science, San Diego State University, 5500 Campanile Drive, San Diego, California, USA
- Elizabeth A Dinsdale
  - Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, 92115, California, USA
117
Comparative metagenomics of hydrocarbon and methane seeps of the Gulf of Mexico. Sci Rep 2017; 7:16015. [PMID: 29167487 PMCID: PMC5700182 DOI: 10.1038/s41598-017-16375-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 07/27/2017] [Accepted: 11/10/2017] [Indexed: 11/18/2022] [Open Access]
Abstract
Oil and gas percolate profusely through the sediments of the Gulf of Mexico, leading to numerous seeps at the seafloor, where complex microbial, and sometimes animal, communities flourish. Sediments from three areas (two cold seeps with contrasting hydrocarbon composition and a site outside any area of active seepage) of the Gulf of Mexico were investigated and compared. Consistent with the existence of a seep microbiome, a distinct microbial community was observed in seep areas compared to sediment from outside areas of active seepage. The microbial community from sediments without any influence from hydrocarbon seepage was characterized by Planctomycetes, and the metabolic potential was consistent with detrital marine snow degradation. By contrast, in seep samples with methane as the principal hydrocarbon, methane oxidation by abundant members of ANME-1 was likely the predominant process. Seep samples with fluids containing both methane and complex hydrocarbons were characterized by abundant Chloroflexi (Anaerolinaceae) and deltaproteobacterial lineages and exhibited potential for complex hydrocarbon degradation. These different metabolic capacities suggest that microorganisms in cold seeps can potentially rely on processes beyond methane oxidation and that the hydrocarbon composition of the seep fluids may be a critical factor structuring the composition and function of the seafloor microbial community.
118
Gene Prediction in Metagenomic Fragments with Deep Learning. BIOMED RESEARCH INTERNATIONAL 2017; 2017:4740354. [PMID: 29250541 PMCID: PMC5698827 DOI: 10.1155/2017/4740354] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/30/2017] [Accepted: 10/08/2017] [Indexed: 01/14/2023]
Abstract
Next-generation sequencing technologies used in metagenomics yield numerous sequencing fragments that come from thousands of different species. Accurately identifying genes from metagenomic fragments is one of the most fundamental issues in metagenomics. In this article, by fusing multiple features (i.e., monocodon usage, monoamino acid usage, ORF length coverage, and Z-curve features) and using a deep stacking network learning model, we present a novel method (called Meta-MFDL) to predict metagenomic genes. The results of 10-fold cross-validation (10 CV) and independent tests show that Meta-MFDL is a powerful tool for identifying genes from metagenomic fragments.
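As an illustration of the first feature type listed above, monocodon usage can be represented as a 64-dimensional vector of codon frequencies for a candidate ORF. The sketch below shows one plausible way to compute such a vector; the exact feature definition used by Meta-MFDL is not given in the abstract, so the normalization, the handling of ambiguous bases and the function name are assumptions.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed (alphabetical) order, so vectors are comparable.
CODONS = ["".join(bases) for bases in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def monocodon_usage(orf):
    """Return codon frequencies of an in-frame ORF as a 64-element list."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    codons = (orf[i:i + 3] for i in range(0, usable, 3))
    # Codons containing ambiguous bases (e.g. N) are skipped.
    counts = Counter(codon for codon in codons if codon in CODON_SET)
    total = sum(counts.values())
    return [counts[codon] / total if total else 0.0 for codon in CODONS]


if __name__ == "__main__":
    vector = monocodon_usage("ATGAAATAA")
    print(len(vector), vector[CODONS.index("ATG")])
```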
119
Zuñiga C, Zaramela L, Zengler K. Elucidation of complexity and prediction of interactions in microbial communities. Microb Biotechnol 2017; 10:1500-1522. [PMID: 28925555 PMCID: PMC5658597 DOI: 10.1111/1751-7915.12855] [Citation(s) in RCA: 77] [Impact Index Per Article: 11.0] [Received: 07/11/2017] [Revised: 08/10/2017] [Accepted: 08/11/2017] [Indexed: 12/11/2022] [Open Access]
Abstract
Microorganisms engage in complex interactions with other members of the microbial community, higher organisms as well as their environment. However, determining the exact nature of these interactions can be challenging due to the large number of members in these communities and the manifold of interactions they can engage in. Various omic data, such as 16S rRNA gene sequencing, shotgun metagenomics, metatranscriptomics, metaproteomics and metabolomics, have been deployed to unravel the community structure, interactions and resulting community dynamics in situ. Interpretation of these multi-omic data often requires advanced computational methods. Modelling approaches are powerful tools to integrate, contextualize and interpret experimental data, thus shedding light on the underlying processes shaping the microbiome. Here, we review current methods and approaches, both experimental and computational, to elucidate interactions in microbial communities and to predict their responses to perturbations.
Affiliation(s)
- Cristal Zuñiga
  - Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
- Livia Zaramela
  - Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
- Karsten Zengler
  - Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
120
Noyes NR, Weinroth ME, Parker JK, Dean CJ, Lakin SM, Raymond RA, Rovira P, Doster E, Abdo Z, Martin JN, Jones KL, Ruiz J, Boucher CA, Belk KE, Morley PS. Enrichment allows identification of diverse, rare elements in metagenomic resistome-virulome sequencing. MICROBIOME 2017; 5:142. [PMID: 29041965 PMCID: PMC5645900 DOI: 10.1186/s40168-017-0361-8] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 05/29/2017] [Accepted: 10/05/2017] [Indexed: 05/29/2023]
Abstract
BACKGROUND Shotgun metagenomic sequencing is increasingly utilized as a tool to evaluate ecological-level dynamics of antimicrobial resistance and virulence, in conjunction with microbiome analysis. Interest in use of this method for environmental surveillance of antimicrobial resistance and pathogenic microorganisms is also increasing. In published metagenomic datasets, the total of all resistance- and virulence-related sequences accounts for < 1% of all sequenced DNA, leading to limitations in detection of low-abundance resistome-virulome elements. This study describes the extent and composition of the low-abundance portion of the resistome-virulome, using a bait-capture and enrichment system that incorporates unique molecular indices to count DNA molecules and correct for enrichment bias. RESULTS The use of the bait-capture and enrichment system significantly increased on-target sequencing of the resistome-virulome, enabling detection of an additional 1441 gene accessions and revealing a low-abundance portion of the resistome-virulome that was more diverse and compositionally different than that detected by more traditional metagenomic assays. The low-abundance portion of the resistome-virulome also contained resistance genes with public health importance, such as extended-spectrum betalactamases, that were not detected using traditional shotgun metagenomic sequencing. In addition, the use of the bait-capture and enrichment system enabled identification of rare resistance gene haplotypes that were used to discriminate between sample origins. CONCLUSIONS These results demonstrate that the rare resistome-virulome contains valuable and unique information that can be utilized for both surveillance and population genetic investigations of resistance. Access to the rare resistome-virulome using the bait-capture and enrichment system validated in this study can greatly advance our understanding of microbiome-resistome dynamics.
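The idea of using unique molecular indices (UMIs) to count DNA molecules rather than reads can be sketched as follows: reads that share both a target gene and a UMI are treated as copies of one original molecule, which removes amplification duplicates introduced during enrichment. This is a simplified illustration, not the study's pipeline; the input format, the gene names and the function name are hypothetical, and a real workflow would typically also consider alignment position and sequencing errors in the UMI.

```python
from collections import defaultdict


def molecules_per_gene(read_hits):
    """Count distinct UMIs per gene from (gene, umi) pairs, one pair per read."""
    umis_by_gene = defaultdict(set)
    for gene, umi in read_hits:
        umis_by_gene[gene].add(umi)  # duplicate (gene, UMI) reads collapse here
    return {gene: len(umis) for gene, umis in umis_by_gene.items()}


if __name__ == "__main__":
    hits = [
        ("geneA", "AACGT"),
        ("geneA", "AACGT"),  # amplification duplicate of the read above
        ("geneA", "GTTCA"),
        ("geneB", "AACGT"),
    ]
    print(molecules_per_gene(hits))
```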
Affiliation(s)
- Noelle R Noyes
  - Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, CO, USA
- Maggie E Weinroth
  - Department of Animal Sciences, Colorado State University, Fort Collins, CO, USA
- Jennifer K Parker
  - Department of Clinical Sciences, Colorado State University, Fort Collins, CO, USA
- Chris J Dean
  - Department of Computer Sciences, Colorado State University, Fort Collins, CO, USA
- Steven M Lakin
  - Department of Clinical Sciences, Colorado State University, Fort Collins, CO, USA
- Robert A Raymond
  - Department of Computer Sciences, Colorado State University, Fort Collins, CO, USA
- Pablo Rovira
  - Department of Animal Sciences, Colorado State University, Fort Collins, CO, USA
- Enrique Doster
  - Department of Clinical Sciences, Colorado State University, Fort Collins, CO, USA
- Zaid Abdo
  - Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, CO, USA
- Jennifer N Martin
  - Department of Animal Sciences, Colorado State University, Fort Collins, CO, USA
- Kenneth L Jones
  - Department of Pediatrics, Section of Hematology Oncology and Bone Marrow Transplant, University of Colorado School of Medicine, Aurora, CO, USA
- Jaime Ruiz
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida, USA
- Christina A Boucher
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida, USA
- Keith E Belk
  - Department of Animal Sciences, Colorado State University, Fort Collins, CO, USA
- Paul S Morley
  - Department of Clinical Sciences, Colorado State University, Fort Collins, CO, USA
121
Kono N, Tomita M, Arakawa K. eRP arrangement: a strategy for assembled genomic contig rearrangement based on replication profiling in bacteria. BMC Genomics 2017; 18:784. [PMID: 29029602 PMCID: PMC5640929 DOI: 10.1186/s12864-017-4162-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/21/2017] [Accepted: 10/05/2017] [Indexed: 12/15/2022] [Open Access]
Abstract
Background The reduced cost of sequencing has made de novo sequencing and the assembly of draft microbial genomes feasible in any ordinary biology lab. However, the process of finishing and completing the genome remains labor-intensive and computationally challenging in some cases, such as in the study of complete genome sequences, genomic rearrangements, long-range syntenic relationships, and structural variations. Methods Here, we show a contig reordering strategy based on experimental replication profiling (eRP) to recapitulate the bacterial genome structure within draft genomes. During the exponential growth phase, the majority of bacteria show a global genomic copy number gradient that is enriched near the replication origin and gradually declines toward the terminus. Therefore, if genome sequencing is performed with appropriate timing, the short-read coverage reflects this copy number gradient, providing information about the contig positions relative to the replication origin and terminus. Results We therefore investigated the appropriate timing for genomic DNA sampling and developed an algorithm for the reordering of the contigs based on eRP. As a result, this strategy successfully recapitulates the genomic structure of various structural mutants with draft genome sequencing. Conclusions Our strategy was successful for contig rearrangement with intracellular DNA replication behavior mechanisms and can be applied to almost all bacteria because the DNA replication system is highly conserved. Therefore, eRP makes it possible to understand genomic structural information and long-range syntenic relationships using a draft genome that is based on short reads. Electronic supplementary material The online version of this article (10.1186/s12864-017-4162-z) contains supplementary material, which is available to authorized users.
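The copy number gradient described above suggests a simple way to place contigs relative to the replication origin: contigs with the highest read coverage lie near the origin and those with the lowest near the terminus. The sketch below ranks contigs by a relative origin-to-terminus position derived from log2 coverage. It is not the eRP algorithm itself, which the abstract does not specify; it assumes a roughly log-linear gradient, it cannot tell which replication arm a contig lies on, and the contig names are hypothetical.

```python
import math


def rank_by_origin_distance(mean_coverage):
    """Order contigs from origin-proximal (0.0) to terminus-proximal (1.0).

    mean_coverage maps contig name -> mean read depth (must be > 0).
    Returns a list of (contig, relative_position) sorted by position.
    """
    if any(depth <= 0 for depth in mean_coverage.values()):
        raise ValueError("coverage values must be positive")
    log_cov = {name: math.log2(depth) for name, depth in mean_coverage.items()}
    highest, lowest = max(log_cov.values()), min(log_cov.values())
    span = highest - lowest
    # Scale so the best-covered contig sits at 0.0 and the least-covered at 1.0.
    positions = {
        name: (highest - value) / span if span else 0.0
        for name, value in log_cov.items()
    }
    return sorted(positions.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for contig, position in rank_by_origin_distance(
        {"contig_1": 80.0, "contig_2": 40.0, "contig_3": 20.0}
    ):
        print(contig, round(position, 2))
```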
Affiliation(s)
- Nobuaki Kono
  - Institute for Advanced Biosciences, Keio University, Mizukami 246-2, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan
- Masaru Tomita
  - Institute for Advanced Biosciences, Keio University, Mizukami 246-2, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan
- Kazuharu Arakawa
  - Institute for Advanced Biosciences, Keio University, Mizukami 246-2, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan
122
Metagenomic Sequencing of Microbial Communities from Brackish Water of Pangong Lake of the Northwest Indian Himalayas. GENOME ANNOUNCEMENTS 2017; 5(40):e01029-17. [PMID: 28982995 PMCID: PMC5629052 DOI: 10.1128/genomea.01029-17] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/20/2022]
Abstract
Pangong is a brackish water lake having environmental conditions that are hostile to supporting life. This is the first report unveiling the microbial diversity of sediment from Pangong Lake, Ladakh, India, using a high-throughput metagenomic approach. Metagenomic data analysis revealed a community structure of microbes in which functional genetic diversity facilitates their survival.
123
Roux S, Emerson JB, Eloe-Fadrosh EA, Sullivan MB. Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity. PeerJ 2017; 5:e3817. [PMID: 28948103 PMCID: PMC5610896 DOI: 10.7717/peerj.3817] [Citation(s) in RCA: 170] [Impact Index Per Article: 24.3] [Received: 06/24/2017] [Accepted: 08/26/2017] [Indexed: 12/20/2022] [Open Access]
Abstract
Background Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates. Results Tools specifically designed for metagenomes, namely metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5× coverage typically assembled well, whereas lower coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI, ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following thresholds for detection were applied: (i) ≥10 kb contig lengths to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1× coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but more divergent (up to 80%) when sequencing depth was uneven across the dataset. In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates. Conclusions These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better access RNA, rare, and/or diverse viral populations and improved reference viral genome availability will alleviate many of viromics' remaining limitations.
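The three detection thresholds quoted in the Results (contigs of at least 10 kb, reads mapped at 90% identity or more, and at least 75% of the contig length covered at 1× or more) can be combined as in the sketch below. The function names and the input representation are assumptions made for illustration; in practice the covered-base count would come from a read-mapping depth profile computed after the identity filter.

```python
def read_passes_identity(identity, min_identity=0.90):
    """Keep a mapped read only if its alignment identity meets the cut-off."""
    return identity >= min_identity


def population_detected(contig_length, covered_bases,
                        min_length=10_000, min_covered_fraction=0.75):
    """Apply the length and breadth-of-coverage thresholds to one contig.

    covered_bases is the number of positions with >= 1x depth, counted only
    from reads that already passed read_passes_identity().
    """
    if contig_length < min_length:
        return False  # too short to define a viral population
    return covered_bases / contig_length >= min_covered_fraction


if __name__ == "__main__":
    print(population_detected(contig_length=12_000, covered_bases=9_000))
```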
Affiliation(s)
- Simon Roux
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Joanne B Emerson
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Emiley A Eloe-Fadrosh
  - Joint Genome Institute, Department of Energy, Walnut Creek, CA, United States of America
- Matthew B Sullivan
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
  - Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, United States of America
124
van der Walt AJ, van Goethem MW, Ramond JB, Makhalanyane TP, Reva O, Cowan DA. Assembling metagenomes, one community at a time. BMC Genomics 2017; 18:521. [PMID: 28693474 PMCID: PMC5502489 DOI: 10.1186/s12864-017-3918-9] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 04/04/2017] [Accepted: 07/02/2017] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Metagenomics allows unprecedented access to uncultured environmental microorganisms. The analysis of metagenomic sequences facilitates gene prediction and annotation, and enables the assembly of draft genomes, including uncultured members of a community. However, while several platforms have been developed for this critical step, there is currently no clear framework for the assembly of metagenomic sequence data. RESULTS To assist with selection of an appropriate metagenome assembler we evaluated the capabilities of nine prominent assembly tools on nine publicly-available environmental metagenomes, as well as three simulated datasets. Overall, we found that SPAdes provided the largest contigs and highest N50 values across 6 of the 9 environmental datasets, followed by MEGAHIT and metaSPAdes. MEGAHIT emerged as a computationally inexpensive alternative to SPAdes, assembling the most complex dataset using less than 500 GB of RAM and within 10 hours. CONCLUSIONS We found that assembler choice ultimately depends on the scientific question, the available resources and the bioinformatic competence of the researcher. We provide a concise workflow for the selection of the best assembly tool.
Affiliation(s)
- Andries Johannes van der Walt
  - Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028, South Africa
  - Centre for Bioinformatics and Computational Biology, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Marc Warwick van Goethem
  - Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028, South Africa
- Jean-Baptiste Ramond
  - Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028, South Africa
- Thulani Peter Makhalanyane
  - Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028, South Africa
- Oleg Reva
  - Centre for Bioinformatics and Computational Biology, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Don Arthur Cowan
  - Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028, South Africa