1. Forry SP, Servetas SL, Kralj JG, Soh K, Hadjithomas M, Cano R, Carlin M, Amorim MGD, Auch B, Bakker MG, Bartelli TF, Bustamante JP, Cassol I, Chalita M, Dias-Neto E, Duca AD, Gohl DM, Kazantseva J, Haruna MT, Menzel P, Moda BS, Neuberger-Castillo L, Nunes DN, Patel IR, Peralta RD, Saliou A, Schwarzer R, Sevilla S, Takenaka IKTM, Wang JR, Knight R, Gevers D, Jackson SA. Variability and bias in microbiome metagenomic sequencing: an interlaboratory study comparing experimental protocols. Sci Rep 2024; 14:9785. [PMID: 38684791 PMCID: PMC11059151 DOI: 10.1038/s41598-024-57981-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 03/24/2024] [Indexed: 05/02/2024] Open
Abstract
Several studies have documented the significant impact of methodological choices in microbiome analyses. The myriad methodological options available complicate the replication of results and generally limit the comparability of findings between independent studies that use differing techniques and measurement pipelines. Here we describe the Mosaic Standards Challenge (MSC), an international interlaboratory study designed to assess the impact of methodological variables on the results. The MSC did not prescribe methods but rather asked participating labs to analyze 7 shared reference samples (5 × human stool samples and 2 × mock communities) using their standard laboratory methods. To capture the array of methodological variables, each participating lab completed a metadata reporting sheet that included 100 different questions regarding the details of their protocol. The goal of this study was to survey the methodological landscape for microbiome metagenomic sequencing (MGS) analyses and the impact of methodological decisions on metagenomic sequencing results. A total of 44 labs participated in the MSC by submitting results (16S or WGS) along with accompanying metadata; thirty 16S rRNA gene amplicon datasets and 14 WGS datasets were collected. The inclusion of two types of reference materials (human stool and mock communities) enabled analysis of both MGS measurement variability between different protocols using the biologically relevant stool samples, and MGS bias with respect to ground truth values using the DNA mixtures. Owing to the compositional nature of MGS measurements, analyses were conducted on the Firmicutes:Bacteroidetes ratio, allowing us to directly apply common statistical methods.
The resulting analysis demonstrated that protocol choices have significant effects, including both bias of the MGS measurement associated with particular methodological choices and effects on measurement robustness, as observed through the spread of results between labs making similar methodological choices. In the analysis of the DNA mock communities, MGS measurement bias was observed even when there was general consensus among the participating laboratories. This study was the result of a collaborative effort that included academic, commercial, and government labs. In addition to highlighting the impact of different methodological decisions on MGS result comparability, this work also provides insights for consideration in future microbiome measurement study design.
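The abstract's reason for working with a ratio of two taxa can be illustrated with a minimal sketch. The counts below are hypothetical, not MSC data, and the helper `log_ratio` is an illustrative name rather than anything from the study. The sketch shows only that a ratio of two taxa does not change when sequencing depth changes, which is one reason ratios suit compositional data.

```python
import math

def log_ratio(counts, numerator, denominator):
    """Natural log of the ratio between two taxa's read counts."""
    return math.log(counts[numerator] / counts[denominator])

# Hypothetical phylum-level read counts for one sample (not MSC data).
sample = {"Firmicutes": 6000, "Bacteroidetes": 3000, "Proteobacteria": 1000}

# The same sample sequenced ten times deeper: every count scales by 10.
deeper = {taxon: 10 * n for taxon, n in sample.items()}

fb_shallow = log_ratio(sample, "Firmicutes", "Bacteroidetes")
fb_deep = log_ratio(deeper, "Firmicutes", "Bacteroidetes")

# Both values equal log(2): the ratio ignores total read depth.
print(fb_shallow, fb_deep)
```

Because the log ratio is an ordinary real number rather than a proportion constrained to sum to one, standard statistics such as means and variances across labs can be applied to it directly.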
Affiliation(s)
- Samuel P Forry
- Complex Microbial Systems Group, National Institute of Standards and Technology (NIST), Gaithersburg, MD, USA.
- Stephanie L Servetas
- Complex Microbial Systems Group, National Institute of Standards and Technology (NIST), Gaithersburg, MD, USA
- Jason G Kralj
- Complex Microbial Systems Group, National Institute of Standards and Technology (NIST), Gaithersburg, MD, USA
- Keng Soh
- Novo Nordisk, Copenhagen, Denmark
- Michalis Hadjithomas
- LifeMine Therapeutics, Cambridge Discovery Park, 30 Acorn Park Drive, Cambridge, MA, 02140, USA
- Raul Cano
- The BioCollective, LLC, 5650 Washington Street, Suite C9, Denver, CO, 80216, USA
- Martha Carlin
- The BioCollective, LLC, 5650 Washington Street, Suite C9, Denver, CO, 80216, USA
- Maria G de Amorim
- Laboratory of Medical Genomics, A. C. Camargo Cancer Center, Sao Paulo, SP, 01508-010, Brazil
- Benjamin Auch
- University of Minnesota Genomics Center, Minneapolis, MN, 55455, USA
- Matthew G Bakker
- Department of Microbiology, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada
- Thais F Bartelli
- Laboratory of Medical Genomics, A. C. Camargo Cancer Center, Sao Paulo, SP, 01508-010, Brazil
- Juan P Bustamante
- Laboratorio de Investigación, Desarrollo y Transferencia de la Facultad de Ingeniería de la Universidad Austral (LIDTUA), CIC-Austral, Pilar, Argentina
- Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática (IBB), CONICET-UNER, Oro Verde, Argentina
- Facultad de Ingeniería, Universidad Nacional de Entre Ríos, Concepción del Uruguay, Argentina
- Ignacio Cassol
- Laboratorio de Investigación, Desarrollo y Transferencia de la Facultad de Ingeniería de la Universidad Austral (LIDTUA), CIC-Austral, Pilar, Argentina
- Emmanuel Dias-Neto
- Laboratory of Medical Genomics, A. C. Camargo Cancer Center, Sao Paulo, SP, 01508-010, Brazil
- Daryl M Gohl
- University of Minnesota Genomics Center, Minneapolis, MN, 55455, USA
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN, 55455, USA
- Jekaterina Kazantseva
- Center of Food and Fermentation Technologies (TFTAK), Mäealuse 2/4, 12618, Tallinn, Estonia
- Muyideen T Haruna
- Bioenvironmental Program, Morgan State University, Baltimore, MD, USA
- Peter Menzel
- Labor Berlin Charité Vivantes GmbH, Sylter Str. 2, 13353, Berlin, Germany
- Bruno S Moda
- Laboratory of Medical Genomics, A. C. Camargo Cancer Center, Sao Paulo, SP, 01508-010, Brazil
- Laboratory of Computational Biology and Bioinformatics, A.C. Camargo Cancer Center, Sao Paulo, SP, 01508-010, Brazil
- Diana N Nunes
- Laboratory of Medical Genomics, A. C. Camargo Cancer Center, Sao Paulo, SP, 01508-010, Brazil
- Isha R Patel
- Center for Food Safety and Applied Nutrition, Office of Applied Research and Safety Assessment, U. S. Food and Drug Administration, Laurel, MD, 20708, USA
- Rodrigo D Peralta
- Laboratorio de Investigación, Desarrollo y Transferencia de la Facultad de Ingeniería de la Universidad Austral (LIDTUA), CIC-Austral, Pilar, Argentina
- Facultad de Ingeniería, Universidad Nacional de Entre Ríos, Concepción del Uruguay, Argentina
- Adrien Saliou
- OMICS Hub, BIOASTER, Microbiology Research Institute, Lyon, France
- Rolf Schwarzer
- Labor Berlin Charité Vivantes GmbH, Sylter Str. 2, 13353, Berlin, Germany
- Samantha Sevilla
- Center for Cancer Research, CCR Collaborative Bioinformatics Resource, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Advanced Biomedical Computational Sciences, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, 21701, USA
- Isabella K T M Takenaka
- Laboratory of Medical Genomics, A. C. Camargo Cancer Center, Sao Paulo, SP, 01508-010, Brazil
- Jeremy R Wang
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Rob Knight
- Departments of Pediatrics, Bioengineering and Computer Science & Engineering, and Center for Microbiome Innovation, University of California at San Diego, 9500 Gilman Drive, MC 0763, La Jolla, CA, 92093-0763, USA
- Dirk Gevers
- Seed Health, 2100 Abbot Kinney Blvd, Venice, CA, 90291-7003, USA
- Scott A Jackson
- Complex Microbial Systems Group, National Institute of Standards and Technology (NIST), Gaithersburg, MD, USA
2. Huttenhower C, Finn RD, McHardy AC. Challenges and opportunities in sharing microbiome data and analyses. Nat Microbiol 2023; 8:1960-1970. [PMID: 37783751 DOI: 10.1038/s41564-023-01484-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/16/2021] [Accepted: 08/28/2023] [Indexed: 10/04/2023]
Abstract
Microbiome data, metadata and analytical workflows have become 'big' in terms of volume and complexity. Although the infrastructure and technologies to share data have been established, the interdisciplinary and multi-omic nature of the field can make resources difficult to identify and use. Following best practices for data deposition requires substantial effort, with sometimes little obvious reward. Gaps remain where microbiome-specific resources for data sharing or reproducibility do not yet exist. We outline available best practices, challenges to their adoption and opportunities in data sharing in microbiome research. We showcase examples of best practices and advocate for their enforcement and incentivization for data sharing. This includes recognition of data curation and sharing endeavours by individuals, institutions, journals and funders. Opportunities for progress include enabling microbiome-specific databases to incorporate future methods for data analysis, integration and reuse.
Affiliation(s)
- Curtis Huttenhower
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Departments of Biostatistics and Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Alice Carolyn McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany.
3. Hu Y, Satten GA, Hu YJ. Impact of Experimental Bias on Compositional Analysis of Microbiome Data. Genes (Basel) 2023; 14:1777. [PMID: 37761917 PMCID: PMC10530728 DOI: 10.3390/genes14091777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 09/05/2023] [Accepted: 09/07/2023] [Indexed: 09/29/2023] Open
Abstract
Microbiome data are subject to experimental bias that is caused by DNA extraction and PCR amplification, among other sources, but this important feature is often ignored when developing statistical methods for analyzing microbiome data. McLaren, Willis, and Callahan (2019) proposed a model for how such biases affect the observed taxonomic profiles; this model assumes the main effects of bias without taxon-taxon interactions. Our newly developed method for testing the differential abundance of taxa, LOCOM, is the first method to account for experimental bias and is robust to the main effect biases. However, there is also evidence for taxon-taxon interactions. In this report, we formulated a model for interaction biases and used simulations based on this model to evaluate the impact of interaction biases on the performance of LOCOM as well as other available compositional analysis methods. Our simulation results indicate that LOCOM remained robust to a reasonable range of interaction biases. The other methods tend to have an inflated FDR even when there were only main effect biases. LOCOM maintained the highest sensitivity even when the other methods could not control the FDR. We thus conclude that LOCOM outperforms the other methods for compositional analysis of microbiome data considered here.
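The main-effect bias model that the abstract attributes to McLaren, Willis, and Callahan can be sketched as follows, under stated assumptions. Each taxon's true proportion is multiplied by a taxon-specific efficiency and the result is renormalised. The efficiencies, sample compositions, and the function name `observe` are hypothetical choices for illustration. The interaction-bias model introduced in this paper is not shown.

```python
def observe(true_props, efficiency):
    """Apply multiplicative per-taxon bias, then renormalise to proportions."""
    weighted = {t: p * efficiency[t] for t, p in true_props.items()}
    total = sum(weighted.values())
    return {t: w / total for t, w in weighted.items()}

# Hypothetical measurement efficiencies (assumed values, illustration only).
efficiency = {"A": 4.0, "B": 1.0, "C": 0.5}

# Two hypothetical samples with different true compositions.
sample_1 = {"A": 0.2, "B": 0.3, "C": 0.5}
sample_2 = {"A": 0.6, "B": 0.3, "C": 0.1}

obs_1 = observe(sample_1, efficiency)
obs_2 = observe(sample_2, efficiency)

# The fold error in the proportion of A depends on the rest of the sample...
fold_error_1 = obs_1["A"] / sample_1["A"]
fold_error_2 = obs_2["A"] / sample_2["A"]

# ...but the fold error in the A:B ratio is efficiency["A"] / efficiency["B"]
# in every sample, regardless of composition.
ratio_error_1 = (obs_1["A"] / obs_1["B"]) / (sample_1["A"] / sample_1["B"])
ratio_error_2 = (obs_2["A"] / obs_2["B"]) / (sample_2["A"] / sample_2["B"])
```

Under this model the distortion of a single proportion varies from sample to sample, while the distortion of a ratio between two taxa is constant. Ratio-based methods exploit that constancy.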
Affiliation(s)
- Yingtian Hu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Glen A. Satten
- Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Yi-Juan Hu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
4. Amit G, Bashan A. Top-down identification of keystone taxa in the microbiome. Nat Commun 2023; 14:3951. [PMID: 37402745 DOI: 10.1038/s41467-023-39459-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2021] [Accepted: 06/14/2023] [Indexed: 07/06/2023] Open
Abstract
Keystone taxa in ecological communities are native taxa that play an especially important role in the stability of their ecosystem. However, we still lack an effective framework for identifying these taxa from the available high-throughput sequencing without the notoriously difficult step of reconstructing the detailed network of inter-specific interactions. In addition, while most microbial interaction models assume pair-wise relationships, it is yet unclear whether pair-wise interactions dominate the system, or whether higher-order interactions are relevant. Here we propose a top-down identification framework, which detects keystones by their total influence on the rest of the taxa. Our method does not assume a priori knowledge of pairwise interactions or any specific underlying dynamics and is appropriate to both perturbation experiments and metagenomic cross-sectional surveys. When applied to real high-throughput sequencing of the human gastrointestinal microbiome, we detect a set of candidate keystones and find that they are often part of a keystone module - multiple candidate keystone species with correlated occurrence. The keystone analysis of single-time-point cross-sectional data is also later verified by the evaluation of two-time-points longitudinal sampling. Our framework represents a necessary advancement towards the reliable identification of these key players of complex, real-world microbial communities.
Affiliation(s)
- Guy Amit
- Department of Physics, Bar-Ilan University, Ramat-Gan, 590002, Israel
- Department of Natural Sciences, The Open University of Israel, Raanana, 4353701, Israel
- Amir Bashan
- Department of Physics, Bar-Ilan University, Ramat-Gan, 590002, Israel.
5. LOCOM: A logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control. Proc Natl Acad Sci U S A 2022; 119:e2122788119. [PMID: 35867822 PMCID: PMC9335309 DOI: 10.1073/pnas.2122788119] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 01/24/2023] Open
Abstract
Compositional analysis is based on the premise that a relatively small proportion of taxa are differentially abundant, while the ratios of the relative abundances of the remaining taxa remain unchanged. Most existing methods use log-transformed data, but log-transformation of data with pervasive zero counts is problematic, and these methods cannot always control the false discovery rate (FDR). Further, high-throughput microbiome data such as 16S amplicon or metagenomic sequencing are subject to experimental biases that are introduced in every step of the experimental workflow. McLaren et al. [eLife 8, e46923 (2019)] have recently proposed a model for how these biases affect relative abundance data. Motivated by this model, we show that the odds ratios in a logistic regression comparing counts in two taxa are invariant to experimental biases. With this motivation, we propose logistic compositional analysis (LOCOM), a robust logistic regression approach to compositional analysis, that does not require pseudocounts. Inference is based on permutation to account for overdispersion and small sample sizes. Traits can be either binary or continuous, and adjustment for confounders is supported. Our simulations indicate that LOCOM always preserved FDR and had much improved sensitivity over existing methods. In contrast, analysis of composition of microbiomes (ANCOM) and ANCOM with bias correction (ANCOM-BC)/ANOVA-Like Differential Expression tool (ALDEx2) had inflated FDR when the effect sizes were small and large, respectively. Only LOCOM was robust to experimental biases in every situation. The flexibility of our method for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies. Our R package LOCOM is publicly available.
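The invariance claim in this abstract can be checked numerically with a small sketch. This is not the LOCOM package or its permutation-based logistic regression. It only computes a plain odds ratio between two conditions from hypothetical counts, with hypothetical multiplicative bias factors, to show why the bias cancels.

```python
def odds_ratio(case, control, taxon, reference):
    """Odds of taxon versus reference in cases, divided by the same odds in controls."""
    return (case[taxon] / case[reference]) / (control[taxon] / control[reference])

# Hypothetical unbiased counts for two taxa under two conditions.
true_control = {"A": 100, "B": 200}
true_case = {"A": 300, "B": 200}

# Hypothetical taxon-specific bias factors shared by all samples (assumed values).
bias = {"A": 5.0, "B": 0.5}

biased_control = {t: n * bias[t] for t, n in true_control.items()}
biased_case = {t: n * bias[t] for t, n in true_case.items()}

or_true = odds_ratio(true_case, true_control, "A", "B")
or_biased = odds_ratio(biased_case, biased_control, "A", "B")

# The factor bias["A"] / bias["B"] appears in both the numerator and the
# denominator of the odds ratio, so it cancels.
print(or_true, or_biased)
```

The cancellation relies on the bias being the same multiplicative factor in both conditions, which is the main-effect assumption the abstract refers to.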
6. Zheng L, Sun R, Zhu Y, Li Z, She X, Jian X, Yu F, Deng X, Sai B, Wang L, Zhou W, Wu M, Li G, Tang J, Jia W, Xiang J. Lung microbiome alterations in NSCLC patients. Sci Rep 2021; 11:11736. [PMID: 34083661 PMCID: PMC8175694 DOI: 10.1038/s41598-021-91195-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 10/13/2020] [Accepted: 05/21/2021] [Indexed: 12/22/2022] Open
Abstract
The lung is colonized by a diverse array of microbes, and the lung microbiota is profoundly involved in the development of respiratory diseases. There is little knowledge about the role of lung microbiota dysbiosis in lung cancer. In this study, we performed metagenomic sequencing on bronchoalveolar lavage (BAL) from two different sampling methods in non-small cell lung cancer (NSCLC) patients and non-cancer controls. We found obvious variation between bronchoscopy samples and lobectomy samples. Oral taxa were found in both bronchoscopy and lobectomy samples, with a higher abundance in bronchoscopy samples. Although the NSCLC patients had microbial communities similar to those of non-cancer controls, rare species such as Lactobacillus rossiae, Bacteroides pyogenes, Paenibacillus odorifer, Pseudomonas entomophila, Magnetospirillum gryphiswaldense, the fungus Chaetomium globosum and others showed obvious differences between NSCLC patients and non-cancer controls. Age-, gender-, and smoking-specific species and EGFR expression-related species in NSCLC patients were detected. These results imply that different lung segments have different lung microbiome compositions. Oral taxa were found in the lobectomy samples, suggesting that oral microbiota are true members of the lung microbiota rather than contamination introduced during bronchoscopy. Lung cancer does not obviously alter the global microbial composition, while rare species are altered more than common species. Certain microbes may be associated with lung cancer progression.
Affiliation(s)
- Leliang Zheng
- Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan, China.,NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha, 410013, Hunan, China
- Ruizheng Sun
- Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan, China.,NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha, 410013, Hunan, China
- Yinghong Zhu
- Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan, China.,NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha, 410013, Hunan, China
- Zheng Li
- Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan, China.,NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha, 410013, Hunan, China
- Xiaoling She
- Department of Pathology, The Second Xiangya Hospital, Central South University, Changsha, 410013, Hunan, China
- Xingxing Jian
- Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan, China.,NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha, 410013, Hunan, China
- Fenglei Yu
- Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410013, Hunan, China
- Xueyu Deng
- Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410013, Hunan, China
- Buqing Sai
- Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan, China.,NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha, 410013, Hunan, China
- Lujuan Wang
- Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan, China.,NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha, 410013, Hunan, China
- Wen Zhou
- Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan, China.,NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha, 410013, Hunan, China
- Minghua Wu
- Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan, China.,NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha, 410013, Hunan, China
- Guiyuan Li
- Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan, China.,NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha, 410013, Hunan, China
- Jingqun Tang
- Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410013, Hunan, China.
- Wei Jia
- Hong Kong Phenome Research Centre, School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China.
- Juanjuan Xiang
- Hunan Cancer Hospital, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, Hunan, China.,NHC Key Laboratory of Carcinogenesis and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Xiangya Hospital, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha, 410013, Hunan, China
7. Reid T, Bergsveinson J. How Do the Players Play? A Post-Genomic Analysis Paradigm to Understand Aquatic Ecosystem Processes. Front Mol Biosci 2021; 8:662888. [PMID: 34026835 PMCID: PMC8138469 DOI: 10.3389/fmolb.2021.662888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2021] [Accepted: 04/26/2021] [Indexed: 12/01/2022] Open
Abstract
Culture-independent and meta-omics sequencing methods have shed considerable light on the so-called “microbial dark matter” of Earth’s environmental microbiome, improving our understanding of phylogeny, the tree of life, and the vast functional diversity of microorganisms. This influx of sequence data has led to refined and reimagined hypotheses about the role and importance of microbial biomass, that paradoxically, sequencing approaches alone are unable to effectively test. Post-genomic approaches such as metabolomics are providing more sensitive and insightful data to unravel the fundamental operations and intricacies of microbial communities within aquatic systems. We assert that the implementation of integrated post-genomic approaches, specifically metabolomics and metatranscriptomics, is the new frontier of environmental microbiology and ecology, expanding conventional assessments toward a holistic systems biology understanding. Progressing beyond siloed phylogenetic assessments and cataloging of metabolites, toward integrated analysis of expression (metatranscriptomics) and activity (metabolomics) is the most effective approach to provide true insight into microbial contributions toward local and global ecosystem functions. This data in turn creates opportunity for improved regulatory guidelines, biomarker discovery and better integration of modeling frameworks. To that end, critical aquatic environmental issues related to climate change, such as ocean warming and acidification, contamination mitigation, and macro-organism health have reasonable opportunity of being addressed through such an integrative approach. Lastly, we argue that the “post-genomics” paradigm is well served to proactively address the systemic technical issues experienced throughout the genomics revolution and focus on collaborative assessment of field-wide experimental standards of sampling, bioinformatics and statistical treatments.
Affiliation(s)
- Thomas Reid
- Canada Centre for Inland Waters, Environment and Climate Change Canada, Burlington, ON, Canada
- Jordyn Bergsveinson
- National Hydrology Research Centre, Environment and Climate Change Canada, Saskatoon, SK, Canada
8. Hupfauf S, Etemadi M, Fernández-Delgado Juárez M, Gómez-Brandón M, Insam H, Podmirseg SM. CoMA - an intuitive and user-friendly pipeline for amplicon-sequencing data analysis. PLoS One 2020; 15:e0243241. [PMID: 33264369 PMCID: PMC7710066 DOI: 10.1371/journal.pone.0243241] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/15/2020] [Accepted: 11/17/2020] [Indexed: 12/23/2022] Open
Abstract
In recent years, there has been a veritable boost in next-generation sequencing (NGS) of gene amplicons in biological and medical studies. Huge amounts of data are produced and need to be analyzed adequately. Various online and offline analysis tools are available; however, most of them require extensive expertise in computer science or bioinformatics, and often a Linux-based operating system. Here, we introduce "CoMA-Comparative Microbiome Analysis" as a free and intuitive analysis pipeline for amplicon-sequencing data, compatible with any common operating system. Moreover, the tool offers various useful services including data pre-processing, quality checking, clustering to operational taxonomic units (OTUs), taxonomic assignment, data post-processing, data visualization, and statistical appraisal. The workflow results in highly esthetic and publication-ready graphics, as well as output files in standardized formats (e.g. tab-delimited OTU-table, BIOM, NEWICK tree) that can be used for more sophisticated analyses. The CoMA output was validated by a benchmark test, using three mock communities with different sample characteristics (primer set, amplicon length, diversity). The performance was compared with that of Mothur, QIIME and QIIME2-DADA2, popular packages for NGS data analysis. Furthermore, the functionality of CoMA is demonstrated on a practical example, investigating microbial communities from three different soils (grassland, forest, swamp). All tools performed well in the benchmark test and were able to reveal the majority of all genera in the mock communities. Also for the soil samples, the results of CoMA were congruent to those of the other pipelines, in particular when looking at the key microbial players.
Affiliation(s)
- Sebastian Hupfauf
- Department of Microbiology, University of Innsbruck, Innsbruck, Austria
- Mohammad Etemadi
- Department of Horticultural Science, School of Agriculture, Shiraz University, Shiraz, Iran
- María Gómez-Brandón
- Department of Ecology and Animal Biology, GEA Group, University of Vigo, Vigo, Spain
- Heribert Insam
- Department of Microbiology, University of Innsbruck, Innsbruck, Austria
9. Seasonal and diel patterns of abundance and activity of viruses in the Red Sea. Proc Natl Acad Sci U S A 2020; 117:29738-29747. [PMID: 33172994 DOI: 10.1073/pnas.2010783117] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 12/18/2022] Open
Abstract
Virus-microbe interactions have been studied in great molecular detail for many years in cultured model systems, yielding a plethora of knowledge on how viruses use and manipulate host machinery. Since the advent of molecular techniques and high-throughput sequencing, methods such as cooccurrence, nucleotide composition, and other statistical frameworks have been widely used to infer virus-microbe interactions, overcoming the limitations of culturing methods. However, their accuracy and relevance are still debatable, as cooccurrence does not necessarily mean interaction. Here we introduce an ecological perspective of marine viral communities and potential interaction with their hosts, using analyses that make no prior assumptions on specific virus-host pairs. By size fractionating water samples into free viruses and microbes (i.e., also viruses inside or attached to their hosts) and looking at how viral group abundance changes over time along both fractions, we show that the viral community is undergoing a change in rank abundance across seasons, suggesting a seasonal succession of viruses in the Red Sea. We use abundance patterns in the different size fractions to classify viral clusters, indicating potential diverse interactions with their hosts and potential differences in life history traits between major viral groups. Finally, we show hourly resolved variations of intracellular abundance of similar viral groups, which might indicate differences in their infection cycles or metabolic capacities.
|
10
|
Leiten EO, Nielsen R, Wiker HG, Bakke PS, Martinsen EMH, Drengenes C, Tangedal S, Husebø GR, Eagan TML. The airway microbiota and exacerbations of COPD. ERJ Open Res 2020; 6:00168-2020. [PMID: 32904583 PMCID: PMC7456643 DOI: 10.1183/23120541.00168-2020] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/03/2020] [Accepted: 05/27/2020] [Indexed: 02/06/2023] Open
Abstract
Aim: The aim of this study was to investigate whether the compositionality of the lower airway microbiota predicts later exacerbation risk in persons with COPD in a cohort study. Materials and methods: We collected lower airway microbiota samples by bronchoalveolar lavage and protected specimen brushes, and oral wash samples, from 122 participants with COPD. Bacterial DNA was extracted from all samples before we sequenced the V3-V4 region of the 16S rRNA gene. The frequency of moderate and severe COPD exacerbations was surveyed in telephone interviews and in a follow-up visit. Compositional taxonomy and α and β diversity were compared between participants with and without later exacerbations. Results: The four most abundant phyla were Firmicutes, Bacteroidetes, Proteobacteria and Fusobacteria in both groups, and the four most abundant genera were Streptococcus, Veillonella, Prevotella and Gemella. The relative abundances of different taxa showed a large variation between samples and individuals, and no statistically significant difference in compositional taxonomy, or in α or β diversity, could be found between participants with and without COPD exacerbations within follow-up. Conclusion: The findings from the current study indicate that individual differences in the lower airway microbiota in persons with COPD far outweigh group differences between frequent and nonfrequent COPD exacerbators, and that the compositionality of the microbiota is so complex as to present large challenges for use as a biomarker of later exacerbations. Contrary to previous reports, in this study there were no significant associations between the lung microbiota in stable COPD and COPD exacerbation frequency. https://bit.ly/2ZVcNdG
Affiliation(s)
- Rune Nielsen
- Dept of Clinical Science, University of Bergen, Bergen, Norway
- Dept of Thoracic Medicine, Haukeland University Hospital, Bergen, Norway
- Harald Gotten Wiker
- Dept of Clinical Science, University of Bergen, Bergen, Norway
- Dept of Microbiology, Haukeland University Hospital, Bergen, Norway
- Christine Drengenes
- Dept of Clinical Science, University of Bergen, Bergen, Norway
- Dept of Thoracic Medicine, Haukeland University Hospital, Bergen, Norway
- Solveig Tangedal
- Dept of Clinical Science, University of Bergen, Bergen, Norway
- Dept of Thoracic Medicine, Haukeland University Hospital, Bergen, Norway
- Gunnar Reksten Husebø
- Dept of Clinical Science, University of Bergen, Bergen, Norway
- Dept of Thoracic Medicine, Haukeland University Hospital, Bergen, Norway
- Tomas Mikal Lind Eagan
- Dept of Clinical Science, University of Bergen, Bergen, Norway
- Dept of Thoracic Medicine, Haukeland University Hospital, Bergen, Norway
|
11
|
Berman HL, McLaren MR, Callahan BJ. Understanding and interpreting community sequencing measurements of the vaginal microbiome. BJOG 2020; 127:139-146. [PMID: 31597208 PMCID: PMC10801814 DOI: 10.1111/1471-0528.15978] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Accepted: 09/30/2019] [Indexed: 02/03/2023]
Abstract
Community-wide high-throughput sequencing has transformed the study of the vaginal microbiome, and clinical applications are on the horizon. Here we outline the three main community sequencing methods: (1) amplicon sequencing, (2) shotgun metagenomic sequencing, and (3) metatranscriptomic sequencing. We discuss the advantages and limitations of community sequencing generally, and the unique strengths and weaknesses of each method. We briefly review the contributions of community sequencing to vaginal microbiome research and practice. We develop suggestions for critically interpreting research results and potential clinical applications based on community sequencing of the vaginal microbiome. TWEETABLE ABSTRACT: We review the advantages and limitations of amplicon sequencing, metagenomics, and metatranscriptomics methods for the study of the vaginal microbiome.
Affiliation(s)
- HL Berman
- Department of Population Health and Pathobiology, North Carolina State University, Raleigh, NC, USA
- MR McLaren
- Department of Population Health and Pathobiology, North Carolina State University, Raleigh, NC, USA
- BJ Callahan
- Department of Population Health and Pathobiology, North Carolina State University, Raleigh, NC, USA
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
|
12
|
Allen-Vercoe E, Carmical JR, Forry SP, Gail MH, Sinha R. Perspectives for Consideration in the Development of Microbial Cell Reference Materials. Cancer Epidemiol Biomarkers Prev 2019; 28:1949-1954. [PMID: 31515292 DOI: 10.1158/1055-9965.epi-19-0557] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/16/2019] [Revised: 07/25/2019] [Accepted: 09/06/2019] [Indexed: 12/16/2022] Open
Abstract
Microbiome measurement and analyses benefit greatly from incorporation of reference materials as controls. However, there are many points to consider in defining an ideal whole-cell reference material standard. Such a standard would embody all the diversity and measurement challenges present in real samples, would be completely characterized to provide "ground truth" data, and would be inexpensive and widely available. This ideal is, unfortunately, not readily attainable because of the diverse nature of different sequencing projects. Some applications may benefit most from highly complex reference materials, while others will value characterization or low expense more highly. The selection of appropriate microbial whole-cell reference materials to benchmark and validate microbial measurements should be considered carefully and may vary among specific applications. In this article, we describe a perspective on the development of whole-cell microbial reference materials for use in metagenomics analyses.
Affiliation(s)
- Emma Allen-Vercoe
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada.
- Joseph Russell Carmical
- Alkek Center for Metagenomics & Microbiome Research (CMMR), Baylor College of Medicine, Houston, Texas
- Samuel P Forry
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, Maryland
- Mitchell H Gail
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Rashmi Sinha
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
|
13
|
McLaren MR, Willis AD, Callahan BJ. Consistent and correctable bias in metagenomic sequencing experiments. eLife 2019; 8:46923. [PMID: 31502536 PMCID: PMC6739870 DOI: 10.7554/elife.46923] [Citation(s) in RCA: 200] [Impact Index Per Article: 40.0] [Received: 03/16/2019] [Accepted: 08/10/2019] [Indexed: 12/22/2022] Open
Abstract
Marker-gene and metagenomic sequencing have profoundly expanded our ability to measure biological communities. But the measurements they provide differ from the truth, often dramatically, because these experiments are biased toward detecting some taxa over others. This experimental bias makes the taxon or gene abundances measured by different protocols quantitatively incomparable and can lead to spurious biological conclusions. We propose a mathematical model for how bias distorts community measurements based on the properties of real experiments. We validate this model with 16S rRNA gene and shotgun metagenomics data from defined bacterial communities. Our model better fits the experimental data despite being simpler than previous models. We illustrate how our model can be used to evaluate protocols, to understand the effect of bias on downstream statistical analyses, and to measure and correct bias given suitable calibration controls. These results illuminate new avenues toward truly quantitative and reproducible metagenomics measurements.
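As a rough illustration of the kind of model this abstract refers to, the sketch below assumes that bias acts multiplicatively: each taxon has a fixed relative detection efficiency, and the measured proportions are the true proportions weighted by those efficiencies and then renormalized. The abstract does not state the form of the model, so the multiplicative assumption, the function names (`apply_bias`, `estimate_efficiency`, `correct_bias`) and the example numbers are illustrative only and are not the authors' code.

```python
def apply_bias(actual, efficiency):
    """Simulate a measurement: weight true proportions by per-taxon
    efficiencies (assumed multiplicative), then renormalize to sum to 1."""
    weighted = [a * e for a, e in zip(actual, efficiency)]
    total = sum(weighted)
    return [w / total for w in weighted]


def estimate_efficiency(observed, actual):
    """Estimate efficiencies from a calibration control whose true
    composition is known. Only ratios are identifiable, so the result
    is expressed relative to the first taxon."""
    ratios = [o / a for o, a in zip(observed, actual)]
    return [r / ratios[0] for r in ratios]


def correct_bias(observed, efficiency):
    """Undo the assumed multiplicative bias: divide by the efficiencies
    and renormalize."""
    weighted = [o / e for o, e in zip(observed, efficiency)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical three-taxon mock community and efficiencies.
mock_actual = [0.5, 0.3, 0.2]
true_efficiency = [1.0, 4.0, 0.5]

mock_observed = apply_bias(mock_actual, true_efficiency)
estimated = estimate_efficiency(mock_observed, mock_actual)

# A second sample measured with the same protocol can then be corrected.
sample_actual = [0.1, 0.1, 0.8]
sample_observed = apply_bias(sample_actual, true_efficiency)
sample_corrected = correct_bias(sample_observed, estimated)
```

Under this assumption the ratio between any two taxa is distorted by the same factor in every sample, which is why a single calibration control can in principle be reused across samples measured with the same protocol.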
Affiliation(s)
- Michael R McLaren
- Department of Population Health and Pathobiology, North Carolina State University, Raleigh, United States
- Amy D Willis
- Department of Biostatistics, University of Washington, Seattle, United States
- Benjamin J Callahan
- Department of Population Health and Pathobiology, North Carolina State University, Raleigh, United States
- Bioinformatics Research Center, North Carolina State University, Raleigh, United States
|
14
|
McLaren MR, Willis AD, Callahan BJ. Consistent and correctable bias in metagenomic sequencing experiments. eLife 2019; 8:46923. [PMID: 31502536 DOI: 10.1101/559831] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/16/2019] [Accepted: 08/10/2019] [Indexed: 05/26/2023] Open
Abstract
Marker-gene and metagenomic sequencing have profoundly expanded our ability to measure biological communities. But the measurements they provide differ from the truth, often dramatically, because these experiments are biased toward detecting some taxa over others. This experimental bias makes the taxon or gene abundances measured by different protocols quantitatively incomparable and can lead to spurious biological conclusions. We propose a mathematical model for how bias distorts community measurements based on the properties of real experiments. We validate this model with 16S rRNA gene and shotgun metagenomics data from defined bacterial communities. Our model better fits the experimental data despite being simpler than previous models. We illustrate how our model can be used to evaluate protocols, to understand the effect of bias on downstream statistical analyses, and to measure and correct bias given suitable calibration controls. These results illuminate new avenues toward truly quantitative and reproducible metagenomics measurements.
Affiliation(s)
- Michael R McLaren
- Department of Population Health and Pathobiology, North Carolina State University, Raleigh, United States
- Amy D Willis
- Department of Biostatistics, University of Washington, Seattle, United States
- Benjamin J Callahan
- Department of Population Health and Pathobiology, North Carolina State University, Raleigh, United States
- Bioinformatics Research Center, North Carolina State University, Raleigh, United States
|
15
|
Abe K, Hirayama M, Ohno K, Shimamura T. A latent allocation model for the analysis of microbial composition and disease. BMC Bioinformatics 2018; 19:519. [PMID: 30598099 PMCID: PMC6311924 DOI: 10.1186/s12859-018-2530-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022] Open
Abstract
Background: Establishing the relationship between microbiota and specific diseases is important but requires appropriate statistical methodology. A distinctive feature of microbiome count data is the presence of a large number of zeros, which makes it difficult to analyze in case-control studies. Most existing approaches either add a small number called a pseudo-count or use probability models such as the multinomial and Dirichlet-multinomial distributions to explain the excess zero counts, which may produce unnecessary biases and impose a correlation structure that is unsuitable for microbiome data. Results: The purpose of this article is to develop a new probabilistic model, called BERnoulli and MUltinomial Distribution-based latent Allocation (BERMUDA), to address these problems. BERMUDA enables us to describe the differences in bacteria composition and a certain disease among samples. We also provide a simple and efficient learning procedure for the proposed model using an annealing EM algorithm. Conclusion: We illustrate the performance of the proposed method through both simulation and real data analysis. BERMUDA is implemented in R and is available from GitHub (https://github.com/abikoushi/Bermuda).
Affiliation(s)
- Ko Abe
- Division of Systems Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, 4668550, Japan
- Masaaki Hirayama
- School of Health Sciences, Nagoya University Graduate School of Medicine, 1-1-20 Daiko-Minami, Higashi-ku, Nagoya, 61-8873, Japan
- Kinji Ohno
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, 4668550, Japan
- Teppei Shimamura
- Division of Systems Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, 4668550, Japan.
|
16
|
Impact of DNA extraction method and targeted 16S-rRNA hypervariable region on oral microbiota profiling. Sci Rep 2018; 8:16321. [PMID: 30397210 PMCID: PMC6218491 DOI: 10.1038/s41598-018-34294-x] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Received: 01/05/2018] [Accepted: 09/28/2018] [Indexed: 12/15/2022] Open
Abstract
Amplification and sequencing of 16S amplicons are widely used for profiling the structure of oral microbiota. However, it remains unclear whether and to what degree DNA extraction and targeted 16S rRNA hypervariable regions influence the analysis. Based on a mock community consisting of five oral bacterial species in equal abundance, we compared the 16S amplicon sequencing results on the Illumina MiSeq platform from six frequently employed DNA extraction procedures and three pairs of widely used 16S rRNA hypervariable primers targeting different 16S rRNA regions. Technical reproducibility of selected 16S regions was also assessed. DNA extraction method exerted considerable influence on the observed bacterial diversity, while hypervariable regions had a relatively minor effect. Protocols with beads added to the enzyme-mediated DNA extraction reaction produced more accurate bacterial community structure than those without either beads or enzymes. Hypervariable regions targeting V3-V4 and V4-V5 seemed to produce more reproducible results than V1-V3. Neither sequencing batch nor change of operator affected the reproducibility of bacterial diversity profiles. Therefore, DNA extraction strategy and 16S rDNA hypervariable regions both influenced the results of oral microbiota biodiversity profiling, and thus should be carefully considered in study design and data interpretation.
|
17
|
Abstract
The human microbiome is associated with complex disorders such as diabetes, cancer, obesity and cardiovascular disorders. Recent technological developments have allowed researchers to fully quantify the composition of the microbiome using culture-independent approaches, resulting in a large amount of microbiome data, which provide invaluable opportunities to assess the important contributions of the microbiome to human health and disease. In this chapter, we discuss and evaluate multiple statistical approaches for processing, summarizing, and analyzing microbiome data. Specifically, we provide programming scripts for processing microbiome data using QIIME and calculating alpha and beta diversities, assessing the association between diversities and outcomes of interest using R programs, as well as interpretation of results. We illustrate the methods in the context of analyzing the foregut microbiome in esophageal adenocarcinoma.
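The chapter's own scripts use QIIME and R and are not reproduced here. As a minimal stand-in, assuming each sample is simply a list of taxon counts in a shared taxon order, the Python sketch below computes one common alpha diversity measure (the Shannon index) and one common beta diversity measure (Bray-Curtis dissimilarity). The function names and the example counts are hypothetical.

```python
from math import log


def shannon(counts):
    """Shannon index (natural log) of one sample's taxon counts.
    Taxa with zero count contribute nothing to the sum."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples whose counts are
    listed in the same taxon order: 0 means identical, 1 means no
    shared taxa."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(u) + sum(v)
    return numerator / denominator


# Hypothetical counts for two samples over the same four taxa.
sample_a = [40, 30, 20, 10]
sample_b = [10, 20, 30, 40]

alpha_a = shannon(sample_a)
alpha_b = shannon(sample_b)
beta_ab = bray_curtis(sample_a, sample_b)
```

Alpha diversity summarizes a single sample, so it can be related to an outcome directly; beta diversity is defined on pairs of samples, so it is usually analyzed as a distance matrix.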
|
18
|
Brown EG, Tanner CM, Goldman SM. The Microbiome in Neurodegenerative Disease. Current Geriatrics Reports 2018. [DOI: 10.1007/s13670-018-0240-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
|
19
|
Mallick H, Ma S, Franzosa EA, Vatanen T, Morgan XC, Huttenhower C. Experimental design and quantitative analysis of microbial community multiomics. Genome Biol 2017; 18:228. [PMID: 29187204 PMCID: PMC5708111 DOI: 10.1186/s13059-017-1359-z] [Citation(s) in RCA: 114] [Impact Index Per Article: 16.3] [Indexed: 02/07/2023] Open
Abstract
Studies of the microbiome have become increasingly sophisticated, and multiple sequence-based, molecular methods as well as culture-based methods exist for population-scale microbiome profiles. To link the resulting host and microbial data types to human health, several experimental design considerations, data analysis challenges, and statistical epidemiological approaches must be addressed. Here, we survey current best practices for experimental design in microbiome molecular epidemiology, including technologies for generating, analyzing, and integrating microbiome multiomics data. We highlight studies that have identified molecular bioactives that influence human health, and we suggest steps for scaling translational microbiome research to high-throughput target discovery across large populations.
Affiliation(s)
- Himel Mallick
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Siyuan Ma
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Eric A Franzosa
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Tommi Vatanen
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Xochitl C Morgan
- Department of Microbiology and Immunology, The University of Otago, Dunedin, New Zealand
- Curtis Huttenhower
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
|
20
|
Statistical analysis of co-occurrence patterns in microbial presence-absence datasets. PLoS One 2017; 12:e0187132. [PMID: 29145425 PMCID: PMC5689832 DOI: 10.1371/journal.pone.0187132] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 07/28/2017] [Accepted: 10/13/2017] [Indexed: 12/31/2022] Open
Abstract
Drawing on a long history in macroecology, correlation analysis of microbiome datasets is becoming a common practice for identifying relationships or shared ecological niches among bacterial taxa. However, many of the statistical issues that plague such analyses in macroscale communities remain unresolved for microbial communities. Here, we discuss problems in the analysis of microbial species correlations based on presence-absence data. We focus on presence-absence data because this information is more readily obtainable from sequencing studies, especially for whole-genome sequencing, where abundance estimation is still in its infancy. First, we show how Pearson's correlation coefficient (r) and Jaccard's index (J), two of the most common metrics for correlation analysis of presence-absence data, can contradict each other when applied to a typical microbiome dataset. In our dataset, for example, 14% of species-pairs predicted to be significantly correlated by r were not predicted to be significantly correlated using J, while 37.4% of species-pairs predicted to be significantly correlated by J were not predicted to be significantly correlated using r. Mismatch was particularly common among species-pairs with at least one rare species (<10% prevalence), explaining why r and J might differ more strongly in microbiome datasets, where there are large numbers of rare taxa. Indeed, 74% of all species-pairs in our study had at least one rare species. Next, we show how Pearson's correlation coefficient can result in artificial inflation of positive taxon relationships and how this is a particular problem for microbiome studies. We then illustrate how Jaccard's index of similarity (J) can yield improvements over Pearson's correlation coefficient. However, the standard null model for Jaccard's index is flawed, and thus introduces its own set of spurious conclusions. We thus identify a better null model based on a hypergeometric distribution, which appropriately corrects for species prevalence. This model is available from recent statistics literature, and can be used for evaluating the significance of any value of an empirically observed Jaccard's index. The resulting simple, yet effective method for handling correlation analysis of microbial presence-absence datasets provides a robust means of testing and finding relationships and/or shared environmental responses among microbial taxa.
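A hypergeometric null of the kind described above can be sketched in a few lines. The version below is one plausible reading, not the paper's implementation: it assumes that, with each species' prevalence held fixed and the two species independent, the number of samples in which both occur follows a hypergeometric distribution. For fixed prevalences Jaccard's index increases with that co-occurrence count, so an upper-tail probability on the count serves as a one-sided p-value for the observed index. The function names and the example vectors are hypothetical.

```python
from math import comb


def jaccard(x, y):
    """Jaccard's index of two equal-length presence-absence vectors."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0


def cooccurrence_pvalue(x, y):
    """P(co-occurrences >= observed) when species x occupies a of n
    samples, species y occupies b of n samples, and the two are placed
    independently (hypergeometric upper tail)."""
    n = len(x)
    a = sum(1 for v in x if v)
    b = sum(1 for v in y if v)
    k_obs = sum(1 for u, v in zip(x, y) if u and v)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible co-occurrence counts drop out of the sum.
    tail = sum(comb(a, k) * comb(n - a, b - k)
               for k in range(k_obs, min(a, b) + 1))
    return tail / comb(n, b)


# Two rare species (3 of 10 samples each) found in exactly the same samples.
species_x = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
species_y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

j = jaccard(species_x, species_y)
p = cooccurrence_pvalue(species_x, species_y)
```

Because the tail probability is computed from each species' own prevalence, the same Jaccard value yields different p-values for rare and common species, which is the prevalence correction the abstract refers to.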
|
21
|
Nascimento MM, Zaura E, Mira A, Takahashi N, Ten Cate JM. Second Era of OMICS in Caries Research: Moving Past the Phase of Disillusionment. J Dent Res 2017; 96:733-740. [PMID: 28384412 DOI: 10.1177/0022034517701902] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Indexed: 11/15/2022] Open
Abstract
Novel approaches using OMICS techniques enable a collective assessment of multiple related biological units, including genes, gene expression, proteins, and metabolites. In the past decade, next-generation sequencing (NGS) technologies were improved by longer sequence reads and the development of genome databases and user-friendly pipelines for data analysis, all accessible at lower cost. This has generated an outburst of high-throughput data. The application of OMICS has provided more depth to existing hypotheses as well as new insights in the etiology of dental caries. For example, the determination of complete bacterial microbiomes of oral samples rather than selected species, together with oral metatranscriptome and metabolome analyses, supports the viewpoint of dysbiosis of the supragingival biofilms. In addition, metabolome studies have been instrumental in disclosing the contributions of major pathways for central carbon and amino acid metabolisms to biofilm pH homeostasis. New, often noncultured, oral streptococci have been identified, and their phenotypic characterization has revealed candidates for probiotic therapy. Although findings from OMICS research have been greatly informative, problems related to study design, data quality, integration, and reproducibility still need to be addressed. Also, the emergence and continuous updates of these computationally demanding technologies require expertise in advanced bioinformatics for reliable interpretation of data. Despite the obstacles cited above, OMICS research is expected to encourage the discovery of novel caries biomarkers and the development of next-generation diagnostics and therapies for caries control. These observations apply equally to the study of other oral diseases.
Affiliation(s)
- M M Nascimento
- Department of Restorative Dental Sciences, Division of Operative Dentistry, College of Dentistry, University of Florida, Gainesville, FL, USA
- E Zaura
- Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam, University of Amsterdam and Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- A Mira
- Department of Health & Genomics, Center for Advanced Research in Public Health, FISABIO Foundation, Valencia, Spain
- N Takahashi
- Department of Oral Biology, Division of Oral Ecology and Biochemistry, Tohoku University Graduate School of Dentistry, Sendai, Japan
- J M Ten Cate
- Royal Netherlands Academy of Arts and Sciences (KNAW), Amsterdam, the Netherlands
|
22
|
Foxman B, Seitz SM, Rothenberg R. Epidemiology and the microbiome. Ann Epidemiol 2016; 26:386-7. [PMID: 27180115 PMCID: PMC10519180 DOI: 10.1016/j.annepidem.2016.04.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/07/2016] [Accepted: 04/09/2016] [Indexed: 12/12/2022]
Affiliation(s)
- Betsy Foxman
- Department of Epidemiology, University of Michigan, Ann Arbor.
- Richard Rothenberg
- Division of Epidemiology and Biostatistics, Georgia State University, Atlanta
|