17201
Debelius J, Song SJ, Vazquez-Baeza Y, Xu ZZ, Gonzalez A, Knight R. Tiny microbes, enormous impacts: what matters in gut microbiome studies? Genome Biol 2016; 17:217. [PMID: 27760558 PMCID: PMC5072314 DOI: 10.1186/s13059-016-1086-x] [Citation(s) in RCA: 112] [Impact Index Per Article: 12.4] [Indexed: 02/08/2023] Open
Abstract
Many factors affect the microbiomes of humans, mice, and other mammals, but substantial challenges remain in determining which of these factors are of practical importance. Considering the relative effect sizes of both biological and technical covariates can help improve study design and the quality of biological conclusions. Care must be taken to avoid technical bias that can lead to incorrect biological conclusions. The presentation of quantitative effect sizes in addition to P values will improve our ability to perform meta-analysis and to evaluate potentially relevant biological effects. A better consideration of effect size and statistical power will lead to more robust biological conclusions in microbiome studies.
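The recommendation to report quantitative effect sizes alongside P values can be illustrated with a minimal sketch. It assumes Cohen's d as the effect-size measure and a permutation test for the P value; neither choice is taken from the paper, and the diversity values below are hypothetical.

```python
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation P value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[: len(a)]) - statistics.mean(pooled[len(a):])
        )
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical alpha-diversity values for two groups (illustrative only).
group_a = [3.1, 3.4, 2.9, 3.6, 3.3, 3.0]
group_b = [2.6, 2.8, 2.5, 3.0, 2.7, 2.4]

d = cohens_d(group_a, group_b)
p = permutation_p_value(group_a, group_b)
print(f"Cohen's d = {d:.2f}, permutation P = {p:.4f}")
```

Reporting `d` together with `p` lets a later meta-analysis pool the magnitude of the difference, which a P value alone does not convey.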
Affiliation(s)
- Justine Debelius
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Se Jin Song
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, USA
- Yoshiki Vazquez-Baeza
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Zhenjiang Zech Xu
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Antonio Gonzalez
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Rob Knight
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
17202
mockrobiota: a Public Resource for Microbiome Bioinformatics Benchmarking. mSystems 2016; 1:mSystems00062-16. [PMID: 27822553 PMCID: PMC5080401 DOI: 10.1128/msystems.00062-16] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 05/20/2016] [Accepted: 09/14/2016] [Indexed: 12/04/2022] Open
Abstract
Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing. IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, comparisons across studies that share source data, and greater transparency and access and eliminate redundancy. These are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community.
17203
Abstract
The UniFrac distance metric is often used to separate groups in microbiome analysis, but requires a constant sequencing depth to work properly. Here we demonstrate that unweighted UniFrac is highly sensitive to rarefaction instance and to sequencing depth in uniform data sets with no clear structure or separation between groups. We show that this arises because of subcompositional effects. We introduce information UniFrac and ratio UniFrac, two new weightings that are not as sensitive to rarefaction and allow greater separation of outliers than classic unweighted and weighted UniFrac. With this expansion of the UniFrac toolbox, we hope to empower researchers to extract more varied information from their data.
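The sensitivity of an unweighted (presence/absence) distance to the rarefaction instance can be sketched as follows. This is not UniFrac: it substitutes the Jaccard distance, which needs no phylogenetic tree, and the taxon counts, depth, and function names are hypothetical choices made for illustration.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a taxon-count dict."""
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    picked = rng.sample(reads, depth)
    rarefied = {}
    for taxon in picked:
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied


def jaccard_distance(a, b):
    """Presence/absence distance: 1 - |shared taxa| / |all observed taxa|."""
    taxa_a, taxa_b = set(a), set(b)
    return 1 - len(taxa_a & taxa_b) / len(taxa_a | taxa_b)


# Hypothetical samples: two dominant taxa plus a few rare ones.
sample_1 = {"taxon_A": 500, "taxon_B": 300, "taxon_C": 5, "taxon_D": 3, "taxon_E": 2}
sample_2 = {"taxon_A": 450, "taxon_B": 340, "taxon_C": 4, "taxon_D": 4, "taxon_F": 2}

rng = random.Random(42)
depth = 100  # assumed rarefaction depth
distances = [
    jaccard_distance(rarefy(sample_1, depth, rng), rarefy(sample_2, depth, rng))
    for _ in range(20)
]
print(f"distance range over 20 rarefactions: {min(distances):.2f} to {max(distances):.2f}")
```

Because rare taxa are retained or lost by chance in each subsample, the presence/absence distance between the same two samples shifts from one rarefaction instance to the next, which is the kind of instability the abstract describes for unweighted UniFrac.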
Affiliation(s)
- Ruth G. Wong
  - Department of Biochemistry, University of Western Ontario, London, Ontario, Canada
- Jia R. Wu
  - Department of Biochemistry, University of Western Ontario, London, Ontario, Canada
- Gregory B. Gloor
  - Department of Biochemistry, University of Western Ontario, London, Ontario, Canada
17204
Eren AM, Sogin ML, Maignien L. Editorial: New Insights into Microbial Ecology through Subtle Nucleotide Variation. Front Microbiol 2016; 7:1318. [PMID: 27605925 PMCID: PMC4995221 DOI: 10.3389/fmicb.2016.01318] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 05/30/2016] [Accepted: 08/09/2016] [Indexed: 11/24/2022] Open
Affiliation(s)
- A Murat Eren
  - Department of Medicine, The University of Chicago, Chicago, IL, USA; Marine Biological Laboratory, Josephine Bay Paul Center, Woods Hole, MA, USA
- Mitchell L Sogin
  - Marine Biological Laboratory, Josephine Bay Paul Center, Woods Hole, MA, USA
- Loïs Maignien
  - Marine Biological Laboratory, Josephine Bay Paul Center, Woods Hole, MA, USA; Laboratory of Microbiology of Extreme Environments, UMR 6197, Institut Européen de la Mer, Université de Bretagne Occidentale, Plouzane, France
17205
Ramiro-Garcia J, Hermes GDA, Giatsis C, Sipkema D, Zoetendal EG, Schaap PJ, Smidt H. NG-Tax, a highly accurate and validated pipeline for analysis of 16S rRNA amplicons from complex biomes. F1000Res 2016; 5:1791. [PMID: 30918626 PMCID: PMC6419982 DOI: 10.12688/f1000research.9227.1] [Citation(s) in RCA: 103] [Impact Index Per Article: 11.4] [Accepted: 07/21/2016] [Indexed: 01/03/2023] Open
Abstract
Background: Massive high-throughput sequencing of short, hypervariable segments of the 16S ribosomal RNA (rRNA) gene has transformed the methodological landscape describing microbial diversity within and across complex biomes. However, several studies have shown that the methodology rather than the biological variation is responsible for the observed sample composition and distribution. This compromises true meta-analyses, although this fact is often disregarded. Results: To facilitate true meta-analysis of microbiome studies, we developed NG-Tax, a pipeline for 16S rRNA gene amplicon sequence analysis that was validated with different mock communities and benchmarked against QIIME as the currently most frequently used pipeline. The microbial composition of 49 independently amplified mock samples was characterized by sequencing two variable 16S rRNA gene regions, V4 and V5-V6, in three separate sequencing runs on Illumina's HiSeq2000 platform. This allowed evaluating important factors of technical bias in taxonomic classification: 1) run-to-run sequencing variation, 2) PCR-error, and 3) region/primer specific amplification bias. Despite the short read length (~140 nt) and all technical biases, the average specificity of the taxonomic assignment for the phylotypes included in the mock communities was 96%. On average 99.94% and 92.02% of the reads could be assigned to at least family or genus level, respectively, while assignment to 'spurious genera' represented on average only 0.02% of the reads per sample. Analysis of α- and β-diversity confirmed conclusions guided by biology rather than the aforementioned methodological aspects, which was not the case when samples were analysed using QIIME. Conclusions: Different biological outcomes are commonly observed due to 16S rRNA region-specific performance.
NG-Tax demonstrated high robustness against choice of region and other technical biases associated with 16S rRNA gene amplicon sequencing studies, diminishing their impact and providing accurate qualitative and quantitative representation of the true sample composition. This will improve comparability between studies and facilitate efforts towards standardization.
Affiliation(s)
- Javier Ramiro-Garcia
  - TI Food and Nutrition (TIFN), Wageningen, 6703 HB, The Netherlands
  - Laboratory of Microbiology, Wageningen University, Wageningen, 6708 WE, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, 6708 WE, The Netherlands
- Gerben D. A. Hermes
  - TI Food and Nutrition (TIFN), Wageningen, 6703 HB, The Netherlands
  - Laboratory of Microbiology, Wageningen University, Wageningen, 6708 WE, The Netherlands
- Christos Giatsis
  - Aquaculture and Fisheries Group, Wageningen University, Wageningen, 6708 WD, The Netherlands
- Detmer Sipkema
  - Laboratory of Microbiology, Wageningen University, Wageningen, 6708 WE, The Netherlands
- Erwin G. Zoetendal
  - TI Food and Nutrition (TIFN), Wageningen, 6703 HB, The Netherlands
  - Laboratory of Microbiology, Wageningen University, Wageningen, 6708 WE, The Netherlands
- Peter J. Schaap
  - TI Food and Nutrition (TIFN), Wageningen, 6703 HB, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, 6708 WE, The Netherlands
- Hauke Smidt
  - Laboratory of Microbiology, Wageningen University, Wageningen, 6708 WE, The Netherlands
17206
Ramiro-Garcia J, Hermes GDA, Giatsis C, Sipkema D, Zoetendal EG, Schaap PJ, Smidt H. NG-Tax, a highly accurate and validated pipeline for analysis of 16S rRNA amplicons from complex biomes. F1000Res 2016; 5:1791. [PMID: 30918626 PMCID: PMC6419982 DOI: 10.12688/f1000research.9227.2] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Accepted: 11/15/2018] [Indexed: 02/01/2023] Open
Abstract
Background: Massive high-throughput sequencing of short, hypervariable segments of the 16S ribosomal RNA (rRNA) gene has transformed the methodological landscape describing microbial diversity within and across complex biomes. However, several studies have shown that the methodology rather than the biological variation is responsible for the observed sample composition and distribution. This compromises meta-analyses, although this fact is often disregarded. Results: To facilitate true meta-analysis of microbiome studies, we developed NG-Tax, a pipeline for 16S rRNA gene amplicon sequence analysis that was validated with different mock communities and benchmarked against QIIME as a frequently used pipeline. The microbial composition of 49 independently amplified mock samples was characterized by sequencing two variable 16S rRNA gene regions, V4 and V5-V6, in three separate sequencing runs on Illumina's HiSeq2000 platform. This allowed for the evaluation of important causes of technical bias in taxonomic classification: 1) run-to-run sequencing variation, 2) PCR-error, and 3) region/primer specific amplification bias. Despite the short read length (~140 nt) and all technical biases, the average specificity of the taxonomic assignment for the phylotypes included in the mock communities was 97.78%. On average 99.95% and 88.43% of the reads could be assigned to at least family or genus level, respectively, while assignment to 'spurious genera' represented on average only 0.21% of the reads per sample. Analysis of α- and β-diversity confirmed conclusions guided by biology rather than the aforementioned methodological aspects, which was not achieved with QIIME. Conclusions: Different biological outcomes are commonly observed due to 16S rRNA region-specific performance. 
NG-Tax demonstrated high robustness against choice of region and other technical biases associated with 16S rRNA gene amplicon sequencing studies, diminishing their impact and providing accurate qualitative and quantitative representation of the true sample composition. This will improve comparability between studies and facilitate efforts towards standardization.
Affiliation(s)
- Javier Ramiro-Garcia
  - TI Food and Nutrition (TIFN), Wageningen, 6703 HB, The Netherlands
  - Laboratory of Microbiology, Wageningen University, Wageningen, 6708 WE, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, 6708 WE, The Netherlands
- Gerben D. A. Hermes
  - TI Food and Nutrition (TIFN), Wageningen, 6703 HB, The Netherlands
  - Laboratory of Microbiology, Wageningen University, Wageningen, 6708 WE, The Netherlands
- Christos Giatsis
  - Aquaculture and Fisheries Group, Wageningen University, Wageningen, 6708 WD, The Netherlands
- Detmer Sipkema
  - Laboratory of Microbiology, Wageningen University, Wageningen, 6708 WE, The Netherlands
- Erwin G. Zoetendal
  - TI Food and Nutrition (TIFN), Wageningen, 6703 HB, The Netherlands
  - Laboratory of Microbiology, Wageningen University, Wageningen, 6708 WE, The Netherlands
- Peter J. Schaap
  - TI Food and Nutrition (TIFN), Wageningen, 6703 HB, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, 6708 WE, The Netherlands
- Hauke Smidt
  - Laboratory of Microbiology, Wageningen University, Wageningen, 6708 WE, The Netherlands
17207
Emerging Technologies for Gut Microbiome Research. Trends Microbiol 2016; 24:887-901. [PMID: 27426971 DOI: 10.1016/j.tim.2016.06.008] [Citation(s) in RCA: 131] [Impact Index Per Article: 14.6] [Received: 04/05/2016] [Revised: 06/06/2016] [Accepted: 06/23/2016] [Indexed: 02/06/2023]
Abstract
Understanding the importance of the gut microbiome on modulation of host health has become a subject of great interest for researchers across disciplines. As an intrinsically multidisciplinary field, microbiome research has been able to reap the benefits of technological advancements in systems and synthetic biology, biomaterials engineering, and traditional microbiology. Gut microbiome research has been revolutionized by high-throughput sequencing technology, permitting compositional and functional analyses that were previously an unrealistic undertaking. Emerging technologies, including engineered organoids derived from human stem cells, high-throughput culturing, and microfluidics assays allowing for the introduction of novel approaches, will improve the efficiency and quality of microbiome research. Here, we discuss emerging technologies and their potential impact on gut microbiome studies.
17208
Callahan BJ, Sankaran K, Fukuyama JA, McMurdie PJ, Holmes SP. Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses. F1000Res 2016; 5:1492. [PMID: 27508062 PMCID: PMC4955027 DOI: 10.12688/f1000research.8986.2] [Citation(s) in RCA: 510] [Impact Index Per Article: 56.7] [Accepted: 10/17/2016] [Indexed: 11/20/2022] Open
Abstract
High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package.
Affiliation(s)
- Ben J Callahan
  - Statistics Department, Stanford University, Stanford, CA, 94305, USA
- Kris Sankaran
  - Statistics Department, Stanford University, Stanford, CA, 94305, USA
- Julia A Fukuyama
  - Statistics Department, Stanford University, Stanford, CA, 94305, USA
- Susan P Holmes
  - Statistics Department, Stanford University, Stanford, CA, 94305, USA
17209
Callahan BJ, Sankaran K, Fukuyama JA, McMurdie PJ, Holmes SP. Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses. F1000Res 2016; 5:1492. [PMID: 27508062 DOI: 10.12688/f1000research.8986.1] [Citation(s) in RCA: 269] [Impact Index Per Article: 29.9] [Accepted: 06/14/2016] [Indexed: 11/20/2022] Open
Abstract
High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package.
Affiliation(s)
- Ben J Callahan
  - Statistics Department, Stanford University, Stanford, CA, 94305, USA
- Kris Sankaran
  - Statistics Department, Stanford University, Stanford, CA, 94305, USA
- Julia A Fukuyama
  - Statistics Department, Stanford University, Stanford, CA, 94305, USA
- Susan P Holmes
  - Statistics Department, Stanford University, Stanford, CA, 94305, USA