1
Regueira-Iglesias A, Suárez-Rodríguez B, Blanco-Pintos T, Relvas M, Alonso-Sampedro M, Balsa-Castro C, Tomás I. The salivary microbiome as a diagnostic biomarker of periodontitis: a 16S multi-batch study before and after the removal of batch effects. Front Cell Infect Microbiol 2024; 14:1405699. [PMID: 39071165 PMCID: PMC11272481 DOI: 10.3389/fcimb.2024.1405699] [Received: 03/23/2024] [Accepted: 06/17/2024] [Indexed: 07/30/2024] Open
Abstract
Introduction Microbiome-based clinical applications that improve diagnosis related to oral health are of great interest to precision dentistry. Predictive studies on the salivary microbiome are scarce and of low methodological quality (low sample sizes, lack of biological heterogeneity, and absence of a validation process). None of them evaluates the impact of confounding factors such as batch effects (BEs). This is the first 16S multi-batch study to analyze the salivary microbiome at the amplicon sequence variant (ASV) level in terms of differential abundance and machine learning models. This is done in periodontally healthy and periodontitis patients before and after removing BEs. Methods Saliva was collected from 124 patients (50 healthy, 74 periodontitis) in our setting. Sequencing of the V3-V4 16S rRNA gene region was performed on an Illumina MiSeq. In parallel, searches were conducted on four databases to identify previous Illumina V3-V4 sequencing studies on the salivary microbiome. Investigations that met predefined criteria were included in the analysis, and our own and the external sequences were processed using the same bioinformatics protocol. The statistical analysis was performed in the R-Bioconductor environment. Results The elimination of BEs reduced the number of ASVs with differential abundance between the groups by approximately one-third (Before=265; After=190). Before removing BEs, the model constructed using all study samples (796) comprised 16 ASVs (0.16%) and had an area under the curve (AUC) of 0.944, sensitivity of 90.73%, and specificity of 87.16%. The model built using two-thirds of the specimens (training=531) comprised 35 ASVs (0.36%) and had an AUC of 0.955, sensitivity of 86.54%, and specificity of 90.06% after being validated in the remaining one-third (test=265).
After removing BEs, the models required more ASVs (all samples=200, 2.03%; training=100, 1.01%) to obtain slightly lower AUC (all=0.935; test=0.947), lower sensitivity (all=81.79%; test=78.85%), and similar specificity (all=91.51%; test=90.68%). Conclusions The removal of BEs controls false positive ASVs in the differential abundance analysis. However, their elimination implies a significantly larger number of predictor taxa to achieve optimal performance, creating less robust classifiers. As all the provided models can accurately discriminate health from periodontitis, implying good/excellent sensitivities/specificities, the salivary microbiome demonstrates potential clinical applicability as a precision diagnostic tool for periodontitis.
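The arithmetic behind the figures quoted in this abstract can be checked with a short Python sketch. The sample and ASV counts (796, 265, 190) come from the abstract; the function names and the confusion-matrix counts in the usage lines are hypothetical and are not taken from the study, which was analyzed in R-Bioconductor.

```python
def split_sizes(n_samples: int, train_fraction: float = 2 / 3) -> tuple[int, int]:
    """Return (n_train, n_test) for a simple hold-out split."""
    n_train = round(n_samples * train_fraction)
    return n_train, n_samples - n_train


def sensitivity(tp: int, fn: int) -> float:
    """Proportion of true cases (periodontitis) that the model flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of true non-cases (healthy) that the model clears."""
    return tn / (tn + fp)


def relative_reduction(before: int, after: int) -> float:
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    # 796 samples split two-thirds / one-third, as reported in the abstract.
    print(split_sizes(796))
    # Drop in differentially abundant ASVs after batch-effect removal (265 to 190).
    print(round(relative_reduction(265, 190), 3))
    # Hypothetical confusion-matrix counts, for illustration only.
    print(sensitivity(tp=90, fn=10), specificity(tn=85, fp=15))
```

The split reproduces the reported training and test sizes (531 and 265), and the reduction in differentially abundant ASVs works out to roughly 28%, consistent with the "approximately one-third" wording.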
Affiliation(s)
- Alba Regueira-Iglesias
  - Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Instituto de Investigación Sanitaria de Santiago (IDIS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Berta Suárez-Rodríguez
  - Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Instituto de Investigación Sanitaria de Santiago (IDIS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Triana Blanco-Pintos
  - Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Instituto de Investigación Sanitaria de Santiago (IDIS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Marta Relvas
  - Instituto Universitário de Ciências da Saúde, Cooperativa de Ensino Superior Politécnico e Universitário (IUCS-CESPU), Unidade de Investigação em Patologia e Reabilitação Oral (UNIPRO), Gandra, Portugal
- Manuela Alonso-Sampedro
  - Department of Internal Medicine and Clinical Epidemiology, Instituto de Investigación Sanitaria de Santiago (IDIS), Complejo Hospitalario Universitario, Santiago de Compostela, Spain
- Carlos Balsa-Castro
  - Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Instituto de Investigación Sanitaria de Santiago (IDIS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Inmaculada Tomás
  - Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Instituto de Investigación Sanitaria de Santiago (IDIS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
2
Isokääntä H, Tomnikov N, Vanhatalo S, Munukka E, Huovinen P, Hakanen AJ, Kallonen T. High-throughput DNA extraction strategy for fecal microbiome studies. Microbiol Spectr 2024; 12:e0293223. [PMID: 38747618 DOI: 10.1128/spectrum.02932-23] [Received: 08/09/2023] [Accepted: 04/19/2024] [Indexed: 06/06/2024] Open
Abstract
Microbiome studies are becoming larger in size to detect the potentially small effect that environmental factors have on our gut microbiomes, or that the microbiome has on our health. Therefore, fast and reproducible DNA isolation methods are needed to handle thousands of fecal samples. We used the Chemagic 360 chemistry and Magnetic Separation Module I (MSMI) instrument to compare two sample preservatives and four different pre-treatment protocols to find an optimal method for DNA isolation from thousands of fecal samples. The pre-treatments included bead beating, sample handling in tube and plate format, and proteinase K incubation. The optimal method offers a sufficient yield of high-quality DNA without contamination. Three human fecal samples (adult, senior, and infant) with technical replicates were extracted. The extraction included negative controls (OMNIgeneGUT, DNA/RNA shield fluid, and Chemagic Lysis Buffer 1) to detect cross-contamination and the ZymoBIOMICS Gut Microbiome Standard as a positive control to mimic the human gut microbiome and assess the sensitivity of the extraction method. All samples were extracted using the Chemagic DNA Stool 200 H96 kit (PerkinElmer, Finland). The samples were collected in two preservatives, OMNIgeneGUT and DNA/RNA shield fluid. DNA quantity was measured using a Qubit fluorometer, DNA purity and quality using gel electrophoresis, and taxonomic signatures using 16S rRNA gene-based sequencing of the V3V4 and V4 regions. Bead beating increased bacterial diversity. The largest increase was detected in the gram-positive genera Blautia, Bifidobacterium, and Ruminococcus. Preservatives showed minor differences in bacterial abundances. The profiles between the V3V4 and V4 regions differed considerably in lower-diversity samples. Negative controls showed signals from genera abundant in fecal samples. Technical replicates of the Gut Standard and stool samples showed low variation.
The selected isolation protocol included the steps recommended by the manufacturer as well as bead beating. Bead beating was found to be necessary to detect hard-to-lyse bacteria. The protocol was reproducible in terms of DNA yield among different stool replicates and the ZymoBIOMICS Gut Microbiome Standard. The MSMI instrument and pre-treatment in a 96-format offered the possibility of automation and handling of large sample collections. Both preservatives were feasible in terms of sample handling and had low variation in taxonomic signatures. The 16S rRNA target region had a high impact on the composition of the bacterial profile. IMPORTANCE Next-generation sequencing (NGS) is a widely used method for determining the composition of the gut microbiota. Due to the differences in the gut microbiota composition between individuals, microbiome studies have expanded into large population studies to maximize detection of small effects on microbe-host interactions. Thus, the demand for rapid and reliable microbial profiling is continuously increasing, making the optimization of high-throughput 96-format DNA extraction integral for NGS-based downstream applications. However, experimental protocols are prone to bias and errors from sample collection and storage, to DNA extraction, primer selection and sequencing, and bioinformatics analyses. Methodological bias can contribute to differences in microbiome profiles, causing variability across studies and laboratories using different protocols. To improve consistency and confidence of the measurements, the standardization of microbiome analysis methods has been recognized in many fields.
Affiliation(s)
- Heidi Isokääntä
  - Infections and Immunity Unit, Institute of Biomedicine, University of Turku, Turku, Finland
  - Centre for Population Health Research, University of Turku, Turku, Finland
- Natalie Tomnikov
  - Department of Clinical Microbiology, Tyks Laboratories, Turku University Hospital, Turku, Finland
- Sanja Vanhatalo
  - Infections and Immunity Unit, Institute of Biomedicine, University of Turku, Turku, Finland
- Eveliina Munukka
  - Clinical Microbiome Bank, Microbe Center, Turku University Hospital and University of Turku, Turku, Finland
  - Division of Digestive Surgery and Urology, Turku University Hospital, Turku, Finland
- Pentti Huovinen
  - Infections and Immunity Unit, Institute of Biomedicine, University of Turku, Turku, Finland
- Antti J Hakanen
  - Infections and Immunity Unit, Institute of Biomedicine, University of Turku, Turku, Finland
  - Department of Clinical Microbiology, Tyks Laboratories, Turku University Hospital, Turku, Finland
  - Clinical Microbiome Bank, Microbe Center, Turku University Hospital and University of Turku, Turku, Finland
- Teemu Kallonen
  - Infections and Immunity Unit, Institute of Biomedicine, University of Turku, Turku, Finland
  - Department of Clinical Microbiology, Tyks Laboratories, Turku University Hospital, Turku, Finland
  - Clinical Microbiome Bank, Microbe Center, Turku University Hospital and University of Turku, Turku, Finland
3
Gao W, Lin W, Li Q, Chen W, Yin W, Zhu X, Gao S, Liu L, Li W, Wu D, Zhang G, Zhu R, Jiao N. Identification and validation of microbial biomarkers from cross-cohort datasets using xMarkerFinder. Nat Protoc 2024:10.1038/s41596-024-00999-9. [PMID: 38745111 DOI: 10.1038/s41596-024-00999-9] [Received: 01/24/2023] [Accepted: 03/05/2024] [Indexed: 05/16/2024]
Abstract
Microbial signatures have emerged as promising biomarkers for disease diagnostics and prognostics, yet their variability across different studies calls for a standardized approach to biomarker research. Therefore, we introduce xMarkerFinder, a four-stage computational framework for microbial biomarker identification with comprehensive validations from cross-cohort datasets, including differential signature identification, model construction, model validation and biomarker interpretation. xMarkerFinder enables the identification and validation of reproducible biomarkers for cross-cohort studies, along with the establishment of classification models and potential microbiome-induced mechanisms. Originally developed for gut microbiome research, xMarkerFinder's adaptable design makes it applicable to various microbial habitats and data types. Distinct from existing biomarker research tools that typically concentrate on a singular aspect, xMarkerFinder uniquely incorporates a sophisticated feature selection process, specifically designed to address the heterogeneity between different cohorts, extensive internal and external validations, and detailed specificity assessments. Execution time varies depending on the sample size, selected algorithm and computational resource. Accessible via GitHub (https://github.com/tjcadd2020/xMarkerFinder), xMarkerFinder supports users with diverse expertise levels through different execution options, including step-by-step scripts with detailed tutorials and frequently asked questions, a single-command execution script, a ready-to-use Docker image and a user-friendly web server (https://www.biosino.org/xmarkerfinder).
Affiliation(s)
- Wenxing Gao
  - The Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, P. R. China
- Weili Lin
  - The Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, P. R. China
- Qiang Li
  - National Genomics Data Center & Bio-Med Big Data Center, Chinese Academy of Sciences Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of the Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, P. R. China
- Wanning Chen
  - The Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, P. R. China
- Wenjing Yin
  - The Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, P. R. China
- Xinyue Zhu
  - The Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, P. R. China
- Sheng Gao
  - The Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, P. R. China
- Lei Liu
  - The Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, P. R. China
- Wenjie Li
  - Shanghai Southgene Technology Co., Ltd., Shanghai, P. R. China
- Dingfeng Wu
  - National Clinical Research Center for Child Health, the Children's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, P. R. China
- Guoqing Zhang
  - National Genomics Data Center & Bio-Med Big Data Center, Chinese Academy of Sciences Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of the Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, P. R. China
- Ruixin Zhu
  - The Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, P. R. China
- Na Jiao
  - National Clinical Research Center for Child Health, the Children's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, P. R. China
  - State Key Laboratory of Genetic Engineering, Fudan Microbiome Center, School of Life Sciences, Fudan University, Shanghai, P. R. China
4
Pasciullo Boychuck S, Brenner LJ, Gagorik CN, Schamel JT, Baker S, Tran E, vonHoldt BM, Koepfli K, Maldonado JE, DeCandia AL. The gut microbiomes of Channel Island foxes and island spotted skunks exhibit fine-scale differentiation across host species and island populations. Ecol Evol 2024; 14:e11017. [PMID: 38362164 PMCID: PMC10867392 DOI: 10.1002/ece3.11017] [Received: 10/25/2023] [Revised: 12/09/2023] [Accepted: 12/11/2023] [Indexed: 02/17/2024] Open
Abstract
California's Channel Islands are home to two endemic mammalian carnivores: island foxes (Urocyon littoralis) and island spotted skunks (Spilogale gracilis amphiala). Although it is rare for two insular terrestrial carnivores to coexist, these known competitors persist on both Santa Cruz Island and Santa Rosa Island. We hypothesized that examination of their gut microbial communities would provide insight into the factors that enable this coexistence, as microbial symbionts often reflect host evolutionary history and contemporary ecology. Using rectal swabs collected from island foxes and island spotted skunks sampled across both islands, we generated 16S rRNA amplicon sequencing data to characterize their gut microbiomes. While island foxes and island spotted skunks both harbored the core mammalian microbiome, host species explained the largest proportion of variation in the dataset. We further identified intraspecific variation between island populations, with greater differentiation observed between more specialist island spotted skunk populations compared to more generalist island fox populations. This pattern may reflect differences in resource utilization following fine-scale niche differentiation. It may further reflect evolutionary differences regarding the timing of intraspecific separation. Considered together, this study contributes to the growing catalog of wildlife microbiome studies, with important implications for understanding how eco-evolutionary processes enable the coexistence of terrestrial carnivores, and their microbiomes, in island environments.
Affiliation(s)
- Elton Tran
  - Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, USA
- Klaus‐Peter Koepfli
  - Center for Species Survival, Smithsonian's National Zoo & Conservation Biology Institute, Front Royal, Virginia, USA
  - Smithsonian‐Mason School of Conservation, George Mason University, Front Royal, Virginia, USA
- Jesús E. Maldonado
  - Center for Conservation Genomics, Smithsonian's National Zoo & Conservation Biology Institute, Washington, DC, USA
- Alexandra L. DeCandia
  - Biology, Georgetown University, Washington, DC, USA
  - Center for Conservation Genomics, Smithsonian's National Zoo & Conservation Biology Institute, Washington, DC, USA
5
Kim MJ, Jung DR, Lee JM, Kim I, Son H, Kim ES, Shin JH. Microbial dysbiosis index for assessing colitis status in mouse models: A systematic review and meta-analysis. iScience 2024; 27:108657. [PMID: 38205250 PMCID: PMC10777064 DOI: 10.1016/j.isci.2023.108657] [Received: 07/28/2023] [Revised: 09/07/2023] [Accepted: 12/04/2023] [Indexed: 01/12/2024] Open
Abstract
Although countless gut microbiome studies on colitis using mouse models have been carried out, experiments with small sample sizes have encountered reproducibility limitations because of batch effects and statistical errors. In this study, dextran-sodium-sulfate-induced microbial dysbiosis index (DiMDI) was introduced as a reliable dysbiosis index that can be used to assess the state of microbial dysbiosis in DSS-induced mouse models. Meta-analysis of 189 datasets from 11 independent studies was performed to construct the DiMDI. Microbial dysbiosis biomarkers, Muribaculaceae, Alistipes, Turicibacter, and Bacteroides, were selected through four different feature selection methods and used to construct the DiMDI. This index demonstrated a high accuracy of 82.3% and showed strong robustness (88.9%) in the independent cohort. Therefore, DiMDI may be used as a standard for assessing microbial imbalance in DSS-induced mouse models and may contribute to the development of reliable colitis microbiome studies in mouse experiments.
Affiliation(s)
- Min-Ji Kim
  - Department of Applied Biosciences, Kyungpook National University, Daegu 41566, Republic of Korea
- Da-Ryung Jung
  - Department of Applied Biosciences, Kyungpook National University, Daegu 41566, Republic of Korea
- Ji-Min Lee
  - Cell & Matrix Research Institute, Kyungpook National University, Daegu 41940, Republic of Korea
- Ikwhan Kim
  - NGS Core Facility, Kyungpook National University, Daegu 41566, Republic of Korea
- HyunWoo Son
  - Department of Applied Biosciences, Kyungpook National University, Daegu 41566, Republic of Korea
- Eun Soo Kim
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, School of Medicine, Kyungpook National University, Daegu 41944, Republic of Korea
- Jae-Ho Shin
  - Department of Applied Biosciences, Kyungpook National University, Daegu 41566, Republic of Korea
  - NGS Core Facility, Kyungpook National University, Daegu 41566, Republic of Korea
6
Rojas-Velazquez D, Kidwai S, Kraneveld AD, Tonda A, Oberski D, Garssen J, Lopez-Rincon A. Methodology for biomarker discovery with reproducibility in microbiome data using machine learning. BMC Bioinformatics 2024; 25:26. [PMID: 38225565 PMCID: PMC10789030 DOI: 10.1186/s12859-024-05639-3] [Received: 12/03/2023] [Accepted: 01/04/2024] [Indexed: 01/17/2024] Open
Abstract
BACKGROUND In recent years, human microbiome studies have received increasing attention as this field is considered a potential source for clinical applications. With the advancements in omics technologies and AI, research focused on the discovery of potential biomarkers in the human microbiome using machine learning tools has produced positive outcomes. Despite the promising results, several issues can still be found in these studies, such as datasets with a small number of samples, inconsistent results, lack of uniform processing and methodologies, and other factors that lead to a lack of reproducibility in biomedical research. In this work, we propose a methodology that combines the DADA2 pipeline for 16S rRNA sequence processing and Recursive Ensemble Feature Selection (REFS) in multiple datasets to increase reproducibility and obtain robust and reliable results in biomedical research. RESULTS Three experiments were performed analyzing microbiome data from patients/cases in Inflammatory Bowel Disease (IBD), Autism Spectrum Disorder (ASD), and Type 2 Diabetes (T2D). In each experiment, we found a biomarker signature in one dataset and applied it to two others as further validation. The effectiveness of the proposed methodology was compared with other feature selection methods such as K-Best with F-score and random selection as a baseline. The Area Under the Curve (AUC) was employed as a measure of diagnostic accuracy and used as a metric for comparing the results of the proposed methodology with other feature selection methods. Additionally, we use the Matthews Correlation Coefficient (MCC) as a metric to evaluate the performance of the methodology as well as for comparison with other feature selection methods. CONCLUSIONS We developed a methodology for reproducible biomarker discovery for 16S rRNA microbiome sequence analysis, addressing the issues related to data dimensionality, inconsistent results and validation across independent datasets.
The findings from the three experiments, across 9 different datasets, show that the proposed methodology achieved higher accuracy compared to other feature selection methods. This methodology is a first approach to increasing reproducibility and providing robust and reliable results.
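For readers unfamiliar with the Matthews Correlation Coefficient named in this abstract, the standard definition from binary confusion-matrix counts can be sketched as follows. This is a generic illustration and not the authors' code; the function name and the choice to return 0.0 when the denominator vanishes are assumptions (the latter is a common convention).

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column of the matrix yields 0.
        return 0.0
    return numerator / denominator
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which is why it is often preferred for imbalanced case-control datasets.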
Affiliation(s)
- David Rojas-Velazquez
  - Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, University of Utrecht, Utrecht, The Netherlands
  - Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Sarah Kidwai
  - Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, University of Utrecht, Utrecht, The Netherlands
- Aletta D Kraneveld
  - Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, University of Utrecht, Utrecht, The Netherlands
  - Department of Neuroscience, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Alberto Tonda
  - UMR 518 MIA-PS, INRAE, Institut des Systèmes Complexes de Paris, Île-de-France (ISC-PIF) - UAR 3611 CNRS, Université Paris-Saclay, Paris, France
- Daniel Oberski
  - Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Johan Garssen
  - Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, University of Utrecht, Utrecht, The Netherlands
  - Global Centre of Excellence Immunology, Danone Nutricia Research, Utrecht, The Netherlands
- Alejandro Lopez-Rincon
  - Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, University of Utrecht, Utrecht, The Netherlands
  - Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
7
Lee S, Lee I. Comprehensive assessment of machine learning methods for diagnosing gastrointestinal diseases through whole metagenome sequencing data. Gut Microbes 2024; 16:2375679. [PMID: 38972064 PMCID: PMC11229738 DOI: 10.1080/19490976.2024.2375679] [Received: 01/29/2024] [Accepted: 06/28/2024] [Indexed: 07/09/2024] Open
Abstract
The gut microbiome, linked significantly to host diseases, offers potential for disease diagnosis through machine learning (ML) pipelines. These pipelines, crucial in modeling diseases using high-dimensional microbiome data, involve selecting profile modalities, data preprocessing techniques, and classification algorithms, each impacting the model accuracy and generalizability. Despite whole metagenome shotgun sequencing (WMS) gaining popularity for human gut microbiome profiling, a consensus on the optimal methods for ML pipelines in disease diagnosis using WMS data remains elusive. Addressing this gap, we comprehensively evaluated ML methods for diagnosing Crohn's disease and colorectal cancer, using 2,553 fecal WMS samples from 21 case-control studies. Our study uncovered crucial insights: gut-specific, species-level taxonomic features proved to be the most effective for profiling; batch correction was not consistently beneficial for model performance; compositional data transformations markedly improved the models; and while nonlinear ensemble classification algorithms typically offered superior performance, linear models with proper regularization were found to be more effective for diseases that are linearly separable based on microbiome data. An optimal ML pipeline, integrating the most effective methods, was validated for generalizability using holdout data. This research offers practical guidelines for constructing reliable disease diagnostic ML models with fecal WMS data.
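The abstract does not say which compositional data transformations were evaluated. As one widely used example of that family, a centered log-ratio (CLR) transform can be sketched in Python as below. The function name and the pseudocount of 0.5 used to handle zero counts are assumptions made for illustration, not details from the study.

```python
import math


def clr(counts: list[float], pseudocount: float = 0.5) -> list[float]:
    """Centered log-ratio transform of one sample's taxon counts.

    Each value becomes log(x_i) minus the mean of log(x) over the sample,
    so the transformed values always sum to zero. The pseudocount (an
    assumed default) keeps zero counts from producing log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]
```

Without a pseudocount the transform is invariant to sequencing depth: `clr([1, 2, 4], pseudocount=0)` and `clr([10, 20, 40], pseudocount=0)` give the same values up to floating-point error.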
Affiliation(s)
- Sungho Lee
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Insuk Lee
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
  - POSTECH Biotech Center, Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea
8
Chetty A, Blekhman R. Multi-omic approaches for host-microbiome data integration. Gut Microbes 2024; 16:2297860. [PMID: 38166610 PMCID: PMC10766395 DOI: 10.1080/19490976.2023.2297860] [Received: 07/19/2023] [Accepted: 12/18/2023] [Indexed: 01/05/2024] Open
Abstract
The gut microbiome interacts with the host through complex networks that affect physiology and health outcomes. It is becoming clear that these interactions can be measured across many different omics layers, including the genome, transcriptome, epigenome, metabolome, and proteome, among others. Multi-omic studies of the microbiome can provide insight into the mechanisms underlying host-microbe interactions. As more omics layers are considered, increasingly sophisticated statistical methods are required to integrate them. In this review, we provide an overview of approaches currently used to characterize multi-omic interactions between host and microbiome data. While a large number of studies have generated a deeper understanding of host-microbiome interactions, there is still a need for standardization across approaches. Furthermore, microbiome studies would also benefit from the collection and curation of large, publicly available multi-omics datasets.
Affiliation(s)
- Ashwin Chetty
  - Committee on Genetics, Genomics and Systems Biology, The University of Chicago, Chicago, IL, USA
- Ran Blekhman
  - Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
9
Xia Y. Statistical normalization methods in microbiome data with application to microbiome cancer research. Gut Microbes 2023; 15:2244139. [PMID: 37622724 PMCID: PMC10461514 DOI: 10.1080/19490976.2023.2244139] [Received: 02/20/2023] [Revised: 07/12/2023] [Accepted: 07/31/2023] [Indexed: 08/26/2023] Open
Abstract
Mounting evidence has shown that the gut microbiome is associated with various cancers, including gastrointestinal (GI) tract and non-GI tract cancers. However, microbiome data have unique characteristics and pose major challenges for standard statistical methods, which can produce invalid or misleading results. Thus, analyzing microbiome data requires not only appropriate statistical methods but also normalization of the data prior to statistical analysis. Here, we first describe the unique characteristics of microbiome data and the challenges in analyzing them (Section 2). Then, we provide an overall review of the available normalization methods for 16S rRNA and shotgun metagenomic data along with examples of their applications in microbiome cancer research (Section 3). In Section 4, we comprehensively investigate how the normalization methods for 16S rRNA and shotgun metagenomic data are evaluated. Finally, we summarize and conclude with remarks on statistical normalization methods (Section 5). Altogether, this review aims to provide a broad and comprehensive view of, and remarks on, the promises and challenges of statistical normalization methods in microbiome data, with microbiome cancer research examples.
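The abstract does not list the individual normalization methods covered by the review. As a minimal illustration of why normalization matters, the sketch below applies total sum scaling (conversion of raw counts to relative abundances), one of the simplest approaches for count data of this kind. The function name and the example counts are hypothetical.

```python
def total_sum_scaling(counts: list[float]) -> list[float]:
    """Convert one sample's raw taxon counts to relative abundances.

    Dividing by the sample total removes differences in sequencing depth,
    so samples sequenced to different depths become comparable.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize a sample with zero total counts")
    return [c / total for c in counts]


if __name__ == "__main__":
    # Two hypothetical samples with the same composition but a tenfold
    # difference in sequencing depth.
    shallow = [1, 3, 6]
    deep = [10, 30, 60]
    print(total_sum_scaling(shallow))
    print(total_sum_scaling(deep))
```

Both hypothetical samples map to the same proportions after scaling, although relative abundances remain compositional, which is one of the challenges reviews of this kind address.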
Affiliation(s)
- Yinglin Xia
  - Division of Gastroenterology and Hepatology, Department of Medicine, University of Illinois Chicago, Chicago, USA
10
Qian W, Yang Z. Identification of cell-type-specific genes in multimodal single-cell data using deep neural network algorithm. Comput Biol Med 2023; 166:107498. [PMID: 37738895 DOI: 10.1016/j.compbiomed.2023.107498] [Received: 03/25/2023] [Revised: 08/15/2023] [Accepted: 09/15/2023] [Indexed: 09/24/2023]
Abstract
The emergence of single-cell RNA sequencing (scRNA-seq) technology makes it possible to measure DNA, RNA, and protein in a single cell. Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) is a powerful multimodal single-cell research innovation, allowing researchers to capture RNA and surface protein expression on the same cells. Currently, identification of cell-type-specific genes in CITE-seq data is still challenging. In this study, we obtained a set of CITE-seq datasets from the Kaggle database, which included the sequencing dataset of seven cell types during bone marrow stem cell differentiation. We used Student's t-test to analyze the transcribed RNAs and picked out 133 significantly differentially expressed genes (DEGs) among all cell types. Functional enrichment revealed that these DEGs were strongly associated with blood-related diseases, providing important insights into the cellular heterogeneity within bone marrow stem cells. The relation between RNA and protein levels was modeled with a deep neural network (DNN), which achieved a high prediction score of 0.867. Based on their coefficients in the DNN model, three genes (LGALS1, CENPV, TRIM24) were identified as cell-type-specific genes in erythrocyte progenitors. Our work provides a novel perspective on the differentiation of stem cells in the bone marrow and valuable insights for further research in this field.
Affiliation(s)
- Weiye Qian
- School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, PR China
- Zhiyuan Yang
- School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, PR China.
11
Zhou R, Ng SK, Sung JJY, Goh WWB, Wong SH. Data pre-processing for analyzing microbiome data - A mini review. Comput Struct Biotechnol J 2023; 21:4804-4815. [PMID: 37841330 PMCID: PMC10569954 DOI: 10.1016/j.csbj.2023.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 10/01/2023] [Accepted: 10/01/2023] [Indexed: 10/17/2023] Open
Abstract
The human microbiome is an emerging research frontier due to its profound impacts on health. High-throughput microbiome sequencing enables studying microbial communities but suffers from analytical challenges. In particular, the lack of dedicated preprocessing methods to improve data quality impedes effective minimization of biases prior to downstream analysis. This review aims to address this gap by providing a comprehensive overview of preprocessing techniques relevant to microbiome research. We outline a typical workflow for microbiome data analysis. Preprocessing methods discussed include quality filtering, batch effect correction, imputation of missing values, normalization, and data transformation. We highlight strengths and limitations of each technique to serve as a practical guide for researchers and identify areas needing further methodological development. Establishing robust, standardized preprocessing will be essential for drawing valid biological conclusions from microbiome studies.
Affiliation(s)
- Ruwen Zhou
- Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Road, 308232, Singapore
- Siu Kin Ng
- Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Road, 308232, Singapore
- Joseph Jao Yiu Sung
- Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Road, 308232, Singapore
- Department of Gastroenterology and Hepatology, Tan Tock Seng Hospital, National Healthcare Group, 11 Jalan Tan Tock Seng, 308433, Singapore
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Road, 308232, Singapore
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, 59 Nanyang Drive, 636921, Singapore
- Sunny Hei Wong
- Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Road, 308232, Singapore
- Department of Gastroenterology and Hepatology, Tan Tock Seng Hospital, National Healthcare Group, 11 Jalan Tan Tock Seng, 308433, Singapore
12
Regueira-Iglesias A, Balsa-Castro C, Blanco-Pintos T, Tomás I. Critical review of 16S rRNA gene sequencing workflow in microbiome studies: From primer selection to advanced data analysis. Mol Oral Microbiol 2023; 38:347-399. [PMID: 37804481 DOI: 10.1111/omi.12434] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/26/2023] [Revised: 09/01/2023] [Accepted: 09/14/2023] [Indexed: 10/09/2023]
Abstract
The multi-batch reanalysis approach of jointly reevaluating gene/genome sequences from different works has gained particular relevance in the literature in recent years. The large amount of 16S ribosomal ribonucleic acid (rRNA) gene sequence data stored in public repositories and information in taxonomic databases of the same gene far exceeds that related to complete genomes. This review is intended to guide researchers new to studying microbiota, particularly the oral microbiota, using 16S rRNA gene sequencing and those who want to expand and update their knowledge to optimise their decision-making and improve their research results. First, we describe the advantages and disadvantages of using the 16S rRNA gene as a phylogenetic marker and the latest findings on the impact of primer pair selection on diversity and taxonomic assignment outcomes in oral microbiome studies. Strategies for primer selection based on these results are introduced. Second, we identified the key factors to consider in selecting the sequencing technology and platform. The process and particularities of the main steps for processing 16S rRNA gene-derived data are described in detail to enable researchers to choose the most appropriate bioinformatics pipeline and analysis methods based on the available evidence. We then produce an overview of the different types of advanced analyses, both the most widely used in the literature and the most recent approaches. Several indices, metrics and software for studying microbial communities are included, highlighting their advantages and disadvantages. Considering the principles of clinical metagenomics, we conclude that future research should focus on rigorous analytical approaches, such as developing predictive models to identify microbiome-based biomarkers to classify health and disease states. Finally, we address the batch effect concept and the microbiome-specific methods for accounting for or correcting them.
Affiliation(s)
- Alba Regueira-Iglesias
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Universidade de Santiago de Compostela, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, A Coruña, Spain
- Carlos Balsa-Castro
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Universidade de Santiago de Compostela, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, A Coruña, Spain
- Triana Blanco-Pintos
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Universidade de Santiago de Compostela, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, A Coruña, Spain
- Inmaculada Tomás
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Universidade de Santiago de Compostela, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, A Coruña, Spain
13
Papoutsoglou G, Tarazona S, Lopes MB, Klammsteiner T, Ibrahimi E, Eckenberger J, Novielli P, Tonda A, Simeon A, Shigdel R, Béreux S, Vitali G, Tangaro S, Lahti L, Temko A, Claesson MJ, Berland M. Machine learning approaches in microbiome research: challenges and best practices. Front Microbiol 2023; 14:1261889. [PMID: 37808286 PMCID: PMC10556866 DOI: 10.3389/fmicb.2023.1261889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 09/04/2023] [Indexed: 10/10/2023] Open
Abstract
Microbiome data predictive analysis within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. It is demonstrated that the use of compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, the multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm in combination with random forest modeling, provided the most accurate performance estimates. Lastly, we showed how linear modeling by logistic regression coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
Affiliation(s)
- Georgios Papoutsoglou
- Department of Computer Science, University of Crete, Heraklion, Greece
- JADBio Gnosis DA S.A., Science and Technology Park of Crete, Heraklion, Greece
- Sonia Tarazona
- Department of Applied Statistics and Operations Research and Quality, Polytechnic University of Valencia, Valencia, Spain
- Marta B. Lopes
- Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
- Research and Development Unit for Mechanical and Industrial Engineering (UNIDEMI), Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
- Thomas Klammsteiner
- Department of Ecology, Universität Innsbruck, Innsbruck, Austria
- Department of Microbiology, Universität Innsbruck, Innsbruck, Austria
- Eliana Ibrahimi
- Department of Biology, University of Tirana, Tirana, Albania
- Julia Eckenberger
- School of Microbiology, University College Cork, Cork, Ireland
- APC Microbiome Ireland, Cork, Ireland
- Pierfrancesco Novielli
- Department of Soil, Plant, and Food Sciences, University of Bari Aldo Moro, Bari, Italy
- National Institute for Nuclear Physics, Bari Division, Bari, Italy
- Alberto Tonda
- UMR 518 MIA-PS, INRAE, Paris-Saclay University, Palaiseau, France
- Complex Systems Institute of Paris Ile-de-France (ISC-PIF) - UAR 3611 CNRS, Paris, France
- Andrea Simeon
- BioSense Institute, University of Novi Sad, Novi Sad, Serbia
- Rajesh Shigdel
- Department of Clinical Science, University of Bergen, Bergen, Norway
- Stéphane Béreux
- MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- MaIAGE, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- Giacomo Vitali
- MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- Sabina Tangaro
- Department of Soil, Plant, and Food Sciences, University of Bari Aldo Moro, Bari, Italy
- National Institute for Nuclear Physics, Bari Division, Bari, Italy
- Leo Lahti
- Department of Computing, University of Turku, Turku, Finland
- Andriy Temko
- Department of Electrical and Electronic Engineering, University College Cork, Cork, Ireland
- Marcus J. Claesson
- School of Microbiology, University College Cork, Cork, Ireland
- APC Microbiome Ireland, Cork, Ireland
- Magali Berland
- MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
14
Vazquez-Munoz R, Thompson A, Sobue T, Dongari-Bagtzoglou A. A prebiotic diet modulates the oral microbiome composition and results in the attenuation of oropharyngeal candidiasis in mice. Microbiol Spectr 2023; 11:e0173423. [PMID: 37671879 PMCID: PMC10580959 DOI: 10.1128/spectrum.01734-23] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/25/2023] [Accepted: 07/06/2023] [Indexed: 09/07/2023] Open
Abstract
Oral bacteria can influence the ability of Candida albicans to cause oropharyngeal candidiasis (OPC). We recently reported that a Lactobacillus johnsonii-enriched oral microbiota reduced C. albicans virulence in an immunosuppressed OPC mouse model. As a follow-up, in this work, we aimed to enrich the resident oral Lactobacillus communities with a prebiotic diet to further assess their effect on the severity of OPC. We tested the effect of a prebiotic xylo-oligosaccharides (XOS)-enriched diet on the global oral bacterial composition and the severity of OPC. We assessed changes in the oral microbiome composition via 16S-rRNA gene high-throughput sequencing, validated by qPCR. The impact of the prebiotic diet on Candida infection was assessed by quantifying changes in oral fungal and bacterial biomass and scoring tongue lesions. Contrary to expectations, oral Lactobacillus communities were not enriched by the XOS-supplemented diet. Yet, XOS modulated the oral microbiome composition, increasing Bifidobacterium abundance and reducing enterococci and staphylococci. In the OPC model, the XOS diet attenuated Candida virulence and bacterial dysbiosis, increasing lactobacilli and reducing enterococci on the oral mucosa. We conclude that XOS attenuates Candida virulence by promoting a bacterial microbiome structure more resilient to Candida infection. IMPORTANCE This is the first study on the effects of a prebiotic diet on the oral mucosal bacterial microbiome and an oropharyngeal candidiasis (OPC) mouse model. We found that xylo-oligosaccharides change the oral bacterial community composition and attenuate OPC. Our results contribute to the understanding of the impact of the oral bacterial communities on Candida virulence.
Affiliation(s)
- Roberto Vazquez-Munoz
- Department of General Dentistry, The University of Connecticut Health Center, Farmington, Connecticut, USA
- Angela Thompson
- Department of General Dentistry, The University of Connecticut Health Center, Farmington, Connecticut, USA
- Takanori Sobue
- Department of General Dentistry, The University of Connecticut Health Center, Farmington, Connecticut, USA
- Anna Dongari-Bagtzoglou
- Department of General Dentistry, The University of Connecticut Health Center, Farmington, Connecticut, USA
15
Goh WWB, Hui HWH, Wong L. How missing value imputation is confounded with batch effects and what you can do about it. Drug Discov Today 2023; 28:103661. [PMID: 37301250 DOI: 10.1016/j.drudis.2023.103661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 05/31/2023] [Accepted: 06/05/2023] [Indexed: 06/12/2023]
Abstract
In data-processing pipelines, upstream steps can influence downstream processes because of their sequential nature. Among these data-processing steps, batch effect (BE) correction (BEC) and missing value imputation (MVI) are crucial for ensuring data suitability for advanced modeling and reducing the likelihood of false discoveries. Although BEC-MVI interactions are not well studied, they are ultimately interdependent. Batch sensitization can improve the quality of MVI. Conversely, accounting for missingness also improves proper BE estimation in BEC. Here, we discuss how BEC and MVI are interconnected and interdependent. We show how batch sensitization can improve any MVI and bring attention to the idea of BE-associated missing values (BEAMs). Finally, we discuss how batch-class imbalance problems can be mitigated by borrowing ideas from machine learning.
Affiliation(s)
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore; Center for Biomedical Informatics, Nanyang Technological University, Singapore.
- Harvard Wai Hann Hui
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore
- Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore; Department of Pathology, National University of Singapore, Singapore.
16
Morton JT, Jin DM, Mills RH, Shao Y, Rahman G, McDonald D, Zhu Q, Balaban M, Jiang Y, Cantrell K, Gonzalez A, Carmel J, Frankiensztajn LM, Martin-Brevet S, Berding K, Needham BD, Zurita MF, David M, Averina OV, Kovtun AS, Noto A, Mussap M, Wang M, Frank DN, Li E, Zhou W, Fanos V, Danilenko VN, Wall DP, Cárdenas P, Baldeón ME, Jacquemont S, Koren O, Elliott E, Xavier RJ, Mazmanian SK, Knight R, Gilbert JA, Donovan SM, Lawley TD, Carpenter B, Bonneau R, Taroncher-Oldenburg G. Multi-level analysis of the gut-brain axis shows autism spectrum disorder-associated molecular and microbial profiles. Nat Neurosci 2023:10.1038/s41593-023-01361-0. [PMID: 37365313 DOI: 10.1038/s41593-023-01361-0] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Received: 11/11/2022] [Accepted: 05/13/2023] [Indexed: 06/28/2023]
Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by heterogeneous cognitive, behavioral and communication impairments. Disruption of the gut-brain axis (GBA) has been implicated in ASD although with limited reproducibility across studies. In this study, we developed a Bayesian differential ranking algorithm to identify ASD-associated molecular and taxa profiles across 10 cross-sectional microbiome datasets and 15 other datasets, including dietary patterns, metabolomics, cytokine profiles and human brain gene expression profiles. We found a functional architecture along the GBA that correlates with heterogeneity of ASD phenotypes, and it is characterized by ASD-associated amino acid, carbohydrate and lipid profiles predominantly encoded by microbial species in the genera Prevotella, Bifidobacterium, Desulfovibrio and Bacteroides and correlates with brain gene expression changes, restrictive dietary patterns and pro-inflammatory cytokine profiles. The functional architecture revealed in age-matched and sex-matched cohorts is not present in sibling-matched cohorts. We also show a strong association between temporal changes in microbiome composition and ASD phenotypes. In summary, we propose a framework to leverage multi-omic datasets from well-defined cohorts and investigate how the GBA influences ASD.
Affiliation(s)
- James T Morton
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Biostatistics & Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Dong-Min Jin
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Yan Shao
- Host-Microbiota Interactions Laboratory, Wellcome Sanger Institute, Hinxton, UK
- Gibraan Rahman
- Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
- Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Daniel McDonald
- Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Qiyun Zhu
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
- Metin Balaban
- Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
- Yueyu Jiang
- Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA, USA
- Kalen Cantrell
- Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Department of Computer Science and Engineering, Jacobs School of Engineering, University of California, San Diego, La Jolla, CA, USA
- Antonio Gonzalez
- Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Julie Carmel
- Azrieli Faculty of Medicine, Bar Ilan University, Safed, Israel
- Sandra Martin-Brevet
- Laboratory for Research in Neuroimaging, Centre for Research in Neurosciences, Department of Clinical Neurosciences, Centre Hospitalier Universitaire Vaudois, University of Lausanne, Lausanne, Switzerland
- Kirsten Berding
- Division of Nutritional Sciences, University of Illinois, Urbana, IL, USA
- Brittany D Needham
- Stark Neurosciences Research Institute, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Anatomy, Cell Biology and Physiology, Indiana University School of Medicine, Indianapolis, IN, USA
- María Fernanda Zurita
- Microbiology Institute and Health Science College, Universidad San Francisco de Quito, Quito, Ecuador
- Maude David
- Departments of Microbiology & Pharmaceutical Sciences, Oregon State University, Corvallis, OR, USA
- Olga V Averina
- Vavilov Institute of General Genetics Russian Academy of Sciences, Moscow, Russia
- Alexey S Kovtun
- Vavilov Institute of General Genetics Russian Academy of Sciences, Moscow, Russia
- Skolkovo Institute of Science and Technology, Skolkovo, Russia
- Antonio Noto
- Department of Biomedical Sciences, School of Medicine, University of Cagliari, Cagliari, Italy
- Michele Mussap
- Laboratory Medicine, Department of Surgical Sciences, School of Medicine, University of Cagliari, Cagliari, Italy
- Mingbang Wang
- Shanghai Key Laboratory of Birth Defects, Division of Neonatology, Children's Hospital of Fudan University, National Center for Children's Health, Shanghai, China
- Microbiome Therapy Center, South China Hospital, Health Science Center, Shenzhen University, Shenzhen, China
- Daniel N Frank
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Ellen Li
- Department of Medicine, Division of Gastroenterology and Hepatology, Stony Brook University, Stony Brook, NY, USA
- Wenhao Zhou
- Shanghai Key Laboratory of Birth Defects, Division of Neonatology, Children's Hospital of Fudan University, National Center for Children's Health, Shanghai, China
- Vassilios Fanos
- Neonatal Intensive Care Unit and Neonatal Pathology, Department of Surgical Sciences, School of Medicine, University of Cagliari, Cagliari, Italy
- Valery N Danilenko
- Vavilov Institute of General Genetics Russian Academy of Sciences, Moscow, Russia
- Dennis P Wall
- Pediatrics (Systems Medicine), Biomedical Data Science, and Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
- Paúl Cárdenas
- Institute of Microbiology, COCIBA, Universidad San Francisco de Quito, Quito, Ecuador
- Manuel E Baldeón
- Facultad de Ciencias Médicas, de la Salud y la Vida, Universidad Internacional del Ecuador, Quito, Ecuador
- Sébastien Jacquemont
- Sainte Justine Hospital Research Center, Montréal, QC, Canada
- Department of Pediatrics, Université de Montréal, Montréal, QC, Canada
- Omry Koren
- Azrieli Faculty of Medicine, Bar Ilan University, Safed, Israel
- Evan Elliott
- Azrieli Faculty of Medicine, Bar Ilan University, Safed, Israel
- The Leslie and Susan Gonda Multidisciplinary Brain Research Center, Bar Ilan University, Ramat Gan, Israel
- Ramnik J Xavier
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
- Center for the Study of Inflammatory Bowel Disease, Massachusetts General Hospital, Boston, MA, USA
- Sarkis K Mazmanian
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Rob Knight
- Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Department of Computer Science and Engineering, Jacobs School of Engineering, University of California, San Diego, La Jolla, CA, USA
- Department of Bioengineering, University of California, San Diego, La Jolla, California, USA
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, California, USA
- Jack A Gilbert
- Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, California, USA
- Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
- Sharon M Donovan
- Division of Nutritional Sciences, University of Illinois, Urbana, IL, USA
- Trevor D Lawley
- Host-Microbiota Interactions Laboratory, Wellcome Sanger Institute, Hinxton, UK
- Bob Carpenter
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Richard Bonneau
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Prescient Design, a Genentech Accelerator, New York, NY, USA
- Gaspar Taroncher-Oldenburg
- Gaspar Taroncher Consulting, Philadelphia, PA, USA.
- Simons Foundation Autism Research Initiative, Simons Foundation, New York, NY, USA.
17
Rahnenführer J, De Bin R, Benner A, Ambrogi F, Lusa L, Boulesteix AL, Migliavacca E, Binder H, Michiels S, Sauerbrei W, McShane L. Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges. BMC Med 2023; 21:182. [PMID: 37189125 DOI: 10.1186/s12916-023-02858-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Accepted: 04/03/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions. METHODS Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 "High-dimensional data" of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD. RESULTS The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided. 
CONCLUSIONS This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.
Affiliation(s)
- Axel Benner
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Federico Ambrogi
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- Scientific Directorate, IRCCS Policlinico San Donato, San Donato Milanese, Italy
- Lara Lusa
- Department of Mathematics, Faculty of Mathematics, Natural Sciences and Information Technology, University of Primorska, Koper, Slovenia
- Institute of Biostatistics and Medical Informatics, University of Ljubljana, Ljubljana, Slovenia
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany
- Harald Binder
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Stefan Michiels
- Service de Biostatistique et d'Épidémiologie, Gustave Roussy, Université Paris-Saclay, Villejuif, France
- Oncostat U1018, Inserm, Université Paris-Saclay, Labeled Ligue Contre le Cancer, Villejuif, France
- Willi Sauerbrei
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Lisa McShane
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD, USA.
18
Burnham CM, McKenney EA, van Heugten KA, Minter LJ, Trivedi S. Effects of age, seasonality, and reproductive status on the gut microbiome of Southern White Rhinoceros (Ceratotherium simum simum) at the North Carolina zoo. Anim Microbiome 2023; 5:27. [PMID: 37147724 PMCID: PMC10163733 DOI: 10.1186/s42523-023-00249-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/19/2022] [Accepted: 04/22/2023] [Indexed: 05/07/2023] Open
Abstract
BACKGROUND Managed southern white rhinoceros (Ceratotherium simum simum) serve as assurance populations for wild conspecifics threatened by poaching and other anthropocentric effects, though many managed populations experience subfertility and reproductive failure. Gut microbiome and host health are inextricably linked, and reproductive outcomes in managed southern white rhinoceros may be mediated in part by their diet and gut microbial diversity. Thus, understanding microbial dynamics within managed populations may help improve conservation efforts. We characterized the taxonomic composition of the gut microbiome in the managed population of female southern white rhinoceros (n = 8) at the North Carolina Zoo and investigated the effects of seasonality (summer vs. winter) and age classes (juveniles (n = 2; 0-2 years), subadults (n = 2; 3-7 years), and adults (n = 4; >7 years)) on microbial richness and community structure. Collection of a fecal sample was attempted for each individual once per month from July-September 2020 and January-March 2021 resulting in a total of 41 samples analyzed. Microbial DNA was extracted and sequenced using the V3-V4 region of the 16S rRNA bacterial gene. Total operational taxonomic units (OTUs), alpha diversity (species richness, Shannon diversity), and beta diversity (Bray-Curtis dissimilarity, linear discriminant analysis effect size) indices were examined, and differentially enriched taxa were identified. RESULTS There were differences (p < 0.05) in alpha and beta diversity indices across individuals, age groups, and sampling months. Subadult females had higher levels of Shannon diversity (Wilcoxon, p < 0.05) compared to adult females and harbored a community cluster distinct from both juveniles and adults. Samples collected during winter months (January-March 2021) possessed higher species richness and statistically distinct communities compared to summer months (July-September 2020) (PERMANOVA, p < 0.05). 
Reproductively active (n = 2) and currently nonreproductive adult females (n = 2) harbored differentially enriched taxa, with the gut microbiome of nonreproductive females significantly enriched (p = 0.001) in unclassified members of Mobiluncus, a genus which possesses species associated with poor reproductive outcomes in other animal species when identified in the cervicovaginal microbiome. CONCLUSION Together, our results increase the understanding of age and season related microbial variation in southern white rhinoceros at the North Carolina Zoo and have identified a potential microbial biomarker for reproductive concern within managed female southern white rhinoceros.
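The alpha and beta diversity measures named in this abstract (species richness, Shannon diversity, Bray-Curtis dissimilarity) can be illustrated with a minimal sketch. The function names and the toy count vectors are assumptions made here for illustration; the abstract does not state which software or settings the authors used.

```python
import math


def richness(counts):
    """Observed richness: number of taxa with a nonzero count."""
    return sum(1 for c in counts if c > 0)


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("count vectors must have the same length")
    denom = sum(u) + sum(v)
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


# Hypothetical OTU counts for two fecal samples (illustrative values only).
summer = [120, 30, 0, 5]
winter = [80, 40, 25, 10]
print(richness(summer), richness(winter))
print(shannon_diversity(summer), shannon_diversity(winter))
print(bray_curtis(summer, winter))
```

Bray-Curtis is computed here on raw counts; in practice counts are often rarefied or converted to relative abundances first, which changes the values.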
Affiliation(s)
- Christina M Burnham
  - Department of Animal Science, North Carolina State University, 120 W Broughton Dr, Raleigh, NC, 27607, USA
- Erin A McKenney
  - Department of Applied Ecology, North Carolina State University, 100 Brooks Ave, Raleigh, NC, 27607, USA
- Kimberly Ange-van Heugten
  - Department of Animal Science, North Carolina State University, 120 W Broughton Dr, Raleigh, NC, 27607, USA
- Larry J Minter
  - North Carolina Zoo, 4401 Zoo Parkway, Asheboro, NC, 27205, USA
- Shweta Trivedi
  - Department of Animal Science, North Carolina State University, 120 W Broughton Dr, Raleigh, NC, 27607, USA
|
19
|
Olbrich M, Künstner A, Busch H. MBECS: Microbiome Batch Effects Correction Suite. BMC Bioinformatics 2023; 24:182. [PMID: 37138207 PMCID: PMC10155362 DOI: 10.1186/s12859-023-05252-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/12/2022] [Accepted: 03/20/2023] [Indexed: 05/05/2023] Open
Abstract
Despite the availability of batch effect correcting algorithms (BECAs), no comprehensive tool that combines batch correction with evaluation of the results exists for microbiome datasets. This work outlines the development of the Microbiome Batch Effects Correction Suite, which integrates several BECAs and evaluation metrics into a software package for the statistical computing framework R.
Affiliation(s)
- Michael Olbrich
  - Lübeck Institute for Experimental Dermatology, University of Lübeck, Lübeck, Germany
  - Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany
  - Center for Biotechnology, Khalifa University, Abu Dhabi, United Arab Emirates
- Axel Künstner
  - Lübeck Institute for Experimental Dermatology, University of Lübeck, Lübeck, Germany
  - Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany
- Hauke Busch
  - Lübeck Institute for Experimental Dermatology, University of Lübeck, Lübeck, Germany
|
20
|
Guo F, Lin G, Dong L, Cheng KK, Deng L, Xu X, Raftery D, Dong J. Concordance-Based Batch Effect Correction for Large-Scale Metabolomics. Anal Chem 2023; 95:7220-7228. [PMID: 37115661 DOI: 10.1021/acs.analchem.2c05748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
For a large-scale metabolomics study, sample collection, preparation, and analysis may last several days, months, or even (intermittently) over years. This may lead to apparent batch effects in the acquired metabolomics data due to variability in instrument status, environmental conditions, or experimental operators. Batch effects may confound the true biological relationships among metabolites and thus obscure real metabolic changes. At present, most of the commonly used batch effect correction (BEC) methods are based on quality control (QC) samples, which require sufficient and stable QC samples. However, the quality of the QC samples may deteriorate if the experiment lasts for a long time. Alternatively, isotope-labeled internal standards have been used, but they generally do not provide good coverage of the metabolome. On the other hand, BEC can also be conducted through a data-driven method, in which no QC sample is needed. Here, we propose a novel data-driven BEC method, namely, CordBat, to achieve concordance between each batch of samples. In the proposed CordBat method, a reference batch is first selected from all batches of data, and the remaining batches are referred to as "other batches." The reference batch serves as the baseline for the batch adjustment by providing a coordinate of correlation between metabolites. Next, a Gaussian graphical model is built on the combined dataset of reference and other batches, and finally, BEC is achieved by optimizing the correction coefficients in the other batches so that the correlation between metabolites of each batch and their combinations are in concordance with that of the reference batch. Three real-world metabolomics datasets are used to evaluate the performance of CordBat by comparing it with five commonly used BEC methods. The present experimental results showed the effectiveness of CordBat in batch effect removal and the concordance of correlation between metabolites after BEC. 
CordBat was found to be comparable to the QC-based methods and achieved better performance in the preservation of biological effects. The proposed CordBat method may serve as an alternative BEC method for large-scale metabolomics that lack proper QC samples.
Affiliation(s)
- Fanjing Guo
  - Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Genjin Lin
  - Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Liheng Dong
  - School of Computer Science and Technology, Xiamen University Malaysia, Sepang 43600, Malaysia
- Kian-Kai Cheng
  - Faculty of Chemical and Energy Engineering, Universiti Teknologi Malaysia, Johor 81310, Malaysia
- Lingli Deng
  - Department of Information Engineering, East China University of Technology, Nanchang 330013, China
- Xiangnan Xu
  - School of Mathematics and Statistics, The University of Sydney, Sydney, New South Wales 2006, Australia
- Daniel Raftery
  - Northwest Metabolomics Research Center, University of Washington, Seattle, Washington 98109, United States
- Jiyang Dong
  - Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
|
21
|
Wang Y, Lê Cao KA. PLSDA-batch: a multivariate framework to correct for batch effects in microbiome data. Brief Bioinform 2023; 24:6991121. [PMID: 36653900 PMCID: PMC10025448 DOI: 10.1093/bib/bbac622] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 10/19/2022] [Revised: 12/14/2022] [Accepted: 12/17/2022] [Indexed: 01/20/2023] Open
Abstract
Microbial communities are highly dynamic and sensitive to changes in the environment. Thus, microbiome data are highly susceptible to batch effects, defined as sources of unwanted variation that are not related to and obscure any factors of interest. Existing batch effect correction methods have been primarily developed for gene expression data. As such, they do not consider the inherent characteristics of microbiome data, including zero inflation, overdispersion and correlation between variables. We introduce new multivariate and non-parametric batch effect correction methods based on Partial Least Squares Discriminant Analysis (PLSDA). PLSDA-batch first estimates treatment and batch variation with latent components, then subtracts batch-associated components from the data. The resulting batch-effect-corrected data can then be input in any downstream statistical analysis. Two variants are proposed to handle unbalanced batch × treatment designs and to avoid overfitting when estimating the components via variable selection. We compare our approaches with popular methods managing batch effects, namely, removeBatchEffect, ComBat and Surrogate Variable Analysis, in simulated data and three case studies using various visual and numerical assessments. We show that our three methods lead to competitive performance in removing batch variation while preserving treatment variation, especially for unbalanced batch × treatment designs. Our downstream analyses show selections of biologically relevant taxa. This work demonstrates that batch effect correction methods can improve microbiome research outputs. Reproducible code and vignettes are available on GitHub.
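The "subtract batch-associated components" step described in this abstract can be illustrated with a deliberately simplified sketch. This is not the PLSDA-batch implementation: it skips the treatment-component estimation entirely, handles only two batches, and removes a single PLS-style latent component. The function names and the toy matrix are hypothetical.

```python
def center_columns(X):
    """Subtract each feature's mean so columns have mean zero."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]


def remove_one_batch_component(X, batch):
    """Deflate X (samples x features) by one batch-associated latent component.

    batch holds 0/1 labels per sample. Returns (corrected matrix, scores t).
    Simplification: no treatment component is estimated or protected.
    """
    Xc = center_columns(X)
    n, p = len(Xc), len(Xc[0])
    mean_b = sum(batch) / n
    y = [b - mean_b for b in batch]  # centered batch indicator
    # Weight vector w proportional to X^T y, scaled to unit length.
    w = [sum(Xc[i][j] * y[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0:
        return Xc, [0.0] * n  # no covariance with batch: nothing to remove
    w = [v / norm for v in w]
    # Latent component (scores) t = X w and its loadings = X^T t / (t^T t).
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    tt = sum(v * v for v in t)
    load = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
    # Deflation: subtract the rank-one batch-associated part from the data.
    corrected = [[Xc[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
    return corrected, t


# Toy data: feature 0 is shifted upward in batch 1 (illustrative values only).
X = [[5.0, 1.0], [6.0, 2.0], [1.0, 1.5], [2.0, 2.5]]
batch = [1, 1, 0, 0]
corrected, t = remove_one_batch_component(X, batch)
print(corrected)
```

After deflation every feature is orthogonal to the removed component, which is why the between-batch shift in feature 0 largely disappears in this toy case. In a real design with a treatment factor, deflating without first accounting for treatment could also remove biological signal, which is the problem the published method is designed to avoid.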
Affiliation(s)
- Yiwen Wang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 97 Buxin Rd, Shenzhen, 518000, Guangdong, China
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, 30 Royal Parade, Melbourne, 3052, VIC, Australia
- Kim-Anh Lê Cao
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, 30 Royal Parade, Melbourne, 3052, VIC, Australia
|
22
|
Rahman G, Morton JT, Martino C, Sepich-Poore GD, Allaband C, Guccione C, Chen Y, Hakim D, Estaki M, Knight R. BIRDMAn: A Bayesian differential abundance framework that enables robust inference of host-microbe associations. bioRxiv 2023:2023.01.30.526328. [PMID: 36778470 PMCID: PMC9915500 DOI: 10.1101/2023.01.30.526328] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/05/2023]
Abstract
Quantifying the differential abundance (DA) of specific taxa among experimental groups in microbiome studies is challenging due to data characteristics (e.g., compositionality, sparsity) and specific study designs (e.g., repeated measures, meta-analysis, cross-over). Here we present BIRDMAn (Bayesian Inferential Regression for Differential Microbiome Analysis), a flexible DA method that can account for microbiome data characteristics and diverse experimental designs. Simulations show that BIRDMAn models are robust to uneven sequencing depth and provide a >20-fold improvement in statistical power over existing methods. We then use BIRDMAn to identify antibiotic-mediated perturbations undetected by other DA methods due to subject-level heterogeneity. Finally, we demonstrate how BIRDMAn can construct state-of-the-art cancer-type classifiers using The Cancer Genome Atlas (TCGA) dataset, with substantial accuracy improvements over random forests and existing DA tools across multiple sequencing centers. Collectively, BIRDMAn extracts more informative biological signals while accounting for study-specific experimental conditions than existing approaches.
Affiliation(s)
- Gibraan Rahman
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- James T Morton
  - Biostatistics & Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Cameron Martino
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Celeste Allaband
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Caitlin Guccione
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
  - Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA
- Yang Chen
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Dermatology, University of California San Diego, La Jolla, CA, USA
  - Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA
- Daniel Hakim
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Mehrbod Estaki
  - Department of Physiology & Pharmacology, University of Calgary, Calgary, Canada
- Rob Knight
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, USA
|
23
|
Hsieh PC, Chang CS, Chen KL, Cho YT, Chu CY, Chen KY. Temporal shifts of the microbiome associated with antibiotic treatment of purpuric drug eruptions related to epidermal growth factor receptor inhibitors. J Eur Acad Dermatol Venereol 2023; 37:382-389. [PMID: 36200415 DOI: 10.1111/jdv.18640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Accepted: 09/21/2022] [Indexed: 01/18/2023]
Abstract
BACKGROUND Epidermal growth factor receptor (EGFR) inhibitors are selective and effective treatments for cancers with relevant mutations. Purpuric drug eruptions are an uncommon but clinically significant dermatological side effect related to EGFR inhibitor use that are associated with positive bacterial cultures and responsive to antibiotic treatment. However, the longitudinal temporal shifts in the skin microbiome that occur before and after antibiotic treatment of purpuric drug eruptions remain largely unknown. OBJECTIVES To characterize temporal changes in the skin and mucosal microbiomes before and after antibiotic treatment of EGFR inhibitor-related purpuric drug eruptions. METHODS Twelve patients who experienced EGFR inhibitor-related purpuric drug eruptions were recruited from a dermato-oncology clinic in Taiwan from May 2017 to April 2018. Swabs were obtained from skin lesions and the nasal mucosa before and after antibiotic treatment of purpuric drug eruptions. After the amplification and sequencing of bacterial 16S rRNA genes, the diversity and compositions of microbiomes sampled at different time points were compared. RESULTS The alpha diversity (represented by the Shannon index) of the skin microbiome increased significantly in the recovered phase of purpuric drug eruptions compared with that of the active phase. By contrast, the nasal microbiome showed no significant change in alpha diversity. The relative abundance of Staphylococcus significantly decreased in samples from skin of the recovered phase, which was confirmed by analysis of compositions of microbiomes (ANCOM) and the ALDEx2 analysis packages in R. CONCLUSIONS The cutaneous microbiome of purpuric drug eruptions showed a significant increase in alpha diversity and a decrease in the relative abundance of Staphylococcus following antibiotic treatment. These findings may help guide antimicrobial therapy of this EGFR inhibitor-related condition.
Affiliation(s)
- Paul-Chen Hsieh
  - Department of Dermatology, National Taiwan University Hospital and National Taiwan University College of Medicine, Taipei, Taiwan
  - Department of Dermatology, National Taiwan University Hospital Hsin-Chu Branch, Hsinchu, Taiwan
- Chi-Sheng Chang
  - Department of Animal Science, Chinese Culture University, Taipei, Taiwan
- Kai-Lung Chen
  - Department of Dermatology, National Taiwan University Cancer Center, Taipei, Taiwan
- Yung-Tsu Cho
  - Department of Dermatology, National Taiwan University Hospital and National Taiwan University College of Medicine, Taipei, Taiwan
- Chia-Yu Chu
  - Department of Dermatology, National Taiwan University Hospital and National Taiwan University College of Medicine, Taipei, Taiwan
- Kuan-Yu Chen
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, National Taiwan University Hospital and College of Medicine, Taipei, Taiwan
|
24
|
Busato S, Gordon M, Chaudhari M, Jensen I, Akyol T, Andersen S, Williams C. Compositionality, sparsity, spurious heterogeneity, and other data-driven challenges for machine learning algorithms within plant microbiome studies. Curr Opin Plant Biol 2023; 71:102326. [PMID: 36538837 PMCID: PMC9925409 DOI: 10.1016/j.pbi.2022.102326] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/31/2022] [Revised: 11/08/2022] [Accepted: 11/21/2022] [Indexed: 06/17/2023]
Abstract
The plant-associated microbiome is a key component of plant systems, contributing to their health, growth, and productivity. The application of machine learning (ML) in this field promises to help untangle the relationships involved. However, measurements of microbial communities by high-throughput sequencing pose challenges for ML. Noise from low sample sizes, soil heterogeneity, and technical factors can impact the performance of ML. Additionally, the compositional and sparse nature of these datasets can impact the predictive accuracy of ML. We review recent literature from plant studies to illustrate that these properties often go unmentioned. We expand our analysis to other fields to quantify the degree to which mitigation approaches improve the performance of ML and describe the mathematical basis for this. With the advent of accessible analytical packages for microbiome data including learning models, researchers must be familiar with the nature of their datasets.
Affiliation(s)
- Sebastiano Busato
  - Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA; NC Plant Sciences Initiative, North Carolina State University, Raleigh, USA
- Max Gordon
  - Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA; NC Plant Sciences Initiative, North Carolina State University, Raleigh, USA
- Meenal Chaudhari
  - Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA; NC Plant Sciences Initiative, North Carolina State University, Raleigh, USA
- Ib Jensen
  - Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Turgut Akyol
  - Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Stig Andersen
  - Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Cranos Williams
  - Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA; NC Plant Sciences Initiative, North Carolina State University, Raleigh, USA; Department of Plant and Microbial Biology, North Carolina State University, Raleigh, USA
|
25
|
Song X, Zhai Y, Song J, Zhang J, Li X. The structural discrepancy between the small and large gut microbiota of Asiatic toad (Bufo gargarizans) during hibernation. Folia Microbiol (Praha) 2023:10.1007/s12223-023-01031-5. [PMID: 36637770 DOI: 10.1007/s12223-023-01031-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/17/2022] [Accepted: 01/01/2023] [Indexed: 01/14/2023]
Abstract
Hibernating amphibians are suitable for research on the adaptation of gut microbiota to long-term fasting and cold stresses. However, previous studies have mainly focused on the large or whole gut microbiota rather than the small gut microbiota. To test the structural discrepancy between the small and large gut microbiota during hibernation, we performed two independent batches of 16S rRNA gene amplicon sequencing to profile the small and large gut microbiota of hibernating Asiatic toad (Bufo gargarizans) from two wild populations. Both batches of data revealed that Proteobacteria, Bacteroidetes, and Firmicutes were the three most dominant phyla in the small and large gut microbiota. Three core OTUs with 100% occurrence in all gut microbiotas were annotated as Pseudomonas. A significant structural discrepancy was detected between the small and large gut microbiota. For instance, Proteobacteria assembled in the small intestine with a higher proportion than it did in the large intestine, but Bacteroidetes and Firmicutes assembled in the large intestine with a higher proportion than they did in the small intestine. The large gut microbiota exhibited higher diversity than the small gut microbiota. Nevertheless, a severe batch effect existed in the structural analysis of the gut microbiotas. The large gut microbiota showed better resistance to the batch effect than the small gut microbiota did. This study provides preliminary evidence that microbes assemble in the small and large intestines of amphibians with discrepant patterns during hibernation.
Affiliation(s)
- Xiaowei Song
  - College of Software Engineering, Chengdu University of Information and Technology, Chengdu, Sichuan, China
  - College of Life Sciences, Institute for Conservation and Utilization of Agro-Bioresources in Dabie Mountains, Xinyang Normal University, Xinyang, Henan, China
  - CAS Key Laboratory of Environmental and Applied Microbiology, Environmental Microbiology Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, Sichuan, China
- Yuanyuan Zhai
  - College of Life Sciences, Institute for Conservation and Utilization of Agro-Bioresources in Dabie Mountains, Xinyang Normal University, Xinyang, Henan, China
- Jinghan Song
  - College of Life Sciences, Institute for Conservation and Utilization of Agro-Bioresources in Dabie Mountains, Xinyang Normal University, Xinyang, Henan, China
- Jingwei Zhang
  - Hospital of Xinyang Normal University, Xinyang Normal University, Henan, Xinyang, China
- Xiangzhen Li
  - CAS Key Laboratory of Environmental and Applied Microbiology, Environmental Microbiology Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, Sichuan, China
|
26
|
Osman EO, Vohsen SA, Girard F, Cruz R, Glickman O, Bullock LM, Anderson KE, Weinnig AM, Cordes EE, Fisher CR, Baums IB. Capacity of deep-sea corals to obtain nutrition from cold seeps aligned with microbiome reorganization. Glob Chang Biol 2023; 29:189-205. [PMID: 36271605 PMCID: PMC10092215 DOI: 10.1111/gcb.16447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2022] [Revised: 09/08/2022] [Accepted: 09/16/2022] [Indexed: 06/16/2023]
Abstract
Cold seeps in the deep sea harbor various animals that have adapted to utilize seepage chemicals with the aid of chemosynthetic microbes that serve as primary producers. Corals are among the animals that live near seep habitats and yet, there is a lack of evidence that corals gain benefits and/or incur costs from cold seeps. Here, we focused on Callogorgia delta and Paramuricea sp. type B3 that live near and far from visual signs of currently active seepage at five sites in the deep Gulf of Mexico. We tested whether these corals rely on chemosynthetically derived food in seep habitats and how the proximity to cold seeps may influence (i) coral colony traits (i.e., health status, growth rate, regrowth after sampling, and branch loss) and associated epifauna, (ii) associated microbiome, and (iii) host transcriptomes. Stable isotope data showed that many coral colonies utilized chemosynthetically derived food, but the feeding strategy differed by coral species. The microbiome composition of C. delta, unlike Paramuricea sp., varied significantly between seep and non-seep colonies and both coral species were associated with various sulfur-oxidizing bacteria (SUP05). Interestingly, the relative abundances of SUP05 varied among seep and non-seep colonies and were strongly correlated with carbon and nitrogen stable isotope values. In contrast, the proximity to cold seeps did not have a measurable effect on gene expression, colony traits, or associated epifauna in coral species. Our work provides the first evidence that some corals may gain benefits from living near cold seeps with apparently limited costs to the colonies. Cold seeps provide not only hard substrate but also food to cold-water corals. Furthermore, restructuring of the microbiome communities (particularly SUP05) is likely the key adaptive process to aid corals in utilizing seepage-derived carbon. 
This highlights that those deep-sea corals may upregulate particular microbial symbiont communities to cope with environmental gradients.
Affiliation(s)
- Eslam O. Osman
  - Department of Biology, The Pennsylvania State University, State College, Pennsylvania, USA
  - Marine Biology Lab, Zoology Department, Faculty of Science, Al-Azhar University, Cairo, Egypt
  - Red Sea Research Center (RSRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Samuel A. Vohsen
  - Department of Biology, The Pennsylvania State University, State College, Pennsylvania, USA
- Fanny Girard
  - Department of Biology, The Pennsylvania State University, State College, Pennsylvania, USA
  - Monterey Bay Aquarium Research Institute, Moss Landing, CA, USA
- Rafaelina Cruz
  - Department of Biology, The Pennsylvania State University, State College, Pennsylvania, USA
- Orli Glickman
  - Department of Biology, The Pennsylvania State University, State College, Pennsylvania, USA
- Lena M. Bullock
  - Department of Biology, The Pennsylvania State University, State College, Pennsylvania, USA
- Kaitlin E. Anderson
  - Department of Biology, The Pennsylvania State University, State College, Pennsylvania, USA
- Charles R. Fisher
  - Department of Biology, The Pennsylvania State University, State College, Pennsylvania, USA
- Iliana B. Baums
  - Department of Biology, The Pennsylvania State University, State College, Pennsylvania, USA
  - Helmholtz Institute for Functional Marine Biodiversity (HIFMB), Ammerländer Heerstraße 231, 26129 Oldenburg, Germany
|
27
|
Van Pee T, Hogervorst J, Dockx Y, Witters K, Thijs S, Wang C, Bongaerts E, Van Hamme JD, Vangronsveld J, Ameloot M, Raes J, Nawrot TS. Accumulation of Black Carbon Particles in Placenta, Cord Blood, and Childhood Urine in Association with the Intestinal Microbiome Diversity and Composition in Four- to Six-Year-Old Children in the ENVIRONAGE Birth Cohort. Environ Health Perspect 2023; 131:17010. [PMID: 36719212 PMCID: PMC9888258 DOI: 10.1289/ehp11257] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/05/2023]
Abstract
BACKGROUND The gut microbiome plays an essential role in human health. Despite the link between air pollution exposure and various diseases, evidence on its association with the gut microbiome during susceptible life periods remains scarce. OBJECTIVES In this study, we examined the association between black carbon particles quantified in prenatal and postnatal biological matrices and bacterial richness and diversity measures, and bacterial families. METHODS A total of 85 stool samples were collected from 4- to 6-y-old children enrolled in the ENVIRonmental influence ON early AGEing birth cohort. We performed 16S rRNA gene sequencing to calculate bacterial richness and diversity indices (Chao1 richness, Shannon diversity, and Simpson diversity) and the relative abundance of bacterial families. Black carbon particles were quantified via white light generation under femtosecond pulsed laser illumination in placental tissue and cord blood, employed as prenatal exposure biomarkers, and in urine, used as a postnatal exposure biomarker. We used robust multivariable-adjusted linear models to examine the associations between quantified black carbon loads and measures of richness (Chao1 index) and diversity (Shannon and Simpson indices), adjusting for parity, season of delivery, sequencing batch, age, sex, weight and height of the child, and maternal education. Additionally, we performed a differential relative abundance analysis of bacterial families with a correction for sampling fraction bias. Results are expressed as percentage difference for a doubling in black carbon loads with 95% confidence interval (CI). 
RESULTS Two diversity indices were negatively associated with placental black carbon [Shannon: -4.38% (95% CI: -8.31%, -0.28%); Simpson: -0.90% (95% CI: -1.76%, -0.04%)], cord blood black carbon [Shannon: -3.38% (95% CI: -5.66%, -0.84%); Simpson: -0.91% (95% CI: -1.66%, -0.16%)], and urinary black carbon [Shannon: -3.39% (95% CI: -5.77%, -0.94%); Simpson: -0.89% (95% CI: -1.37%, -0.40%)]. The explained variance of black carbon on the above indices varied from 6.1% to 16.6%. No statistically significant associations were found between black carbon load and the Chao1 richness index. After multiple testing correction, placental black carbon was negatively associated with relative abundance of the bacterial families Defluviitaleaceae and Marinifilaceae, and urinary black carbon with Christensenellaceae and Coriobacteriaceae; associations with cord blood black carbon were not statistically significant after correction. CONCLUSION Black carbon particles quantified in prenatal and postnatal biological matrices were associated with the composition and diversity of the childhood intestinal microbiome. These findings address the influential role of exposure to air pollution during pregnancy and early life in human health. https://doi.org/10.1289/EHP11257.
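The Chao1 richness and Simpson diversity indices referred to in this abstract can be sketched as follows. The abstract does not say which variants were used, so this assumes the bias-corrected Chao1 estimator and the Gini-Simpson form (1 minus the sum of squared proportions); the function names and the example counts are hypothetical.

```python
def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa observed exactly once and exactly twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def gini_simpson(counts):
    """Gini-Simpson diversity: 1 - sum(p_i^2), with p_i the relative abundances."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical taxon counts for one stool sample (illustrative values only).
sample = [40, 12, 2, 1, 1, 0]
print(chao1(sample))
print(gini_simpson(sample))
```

Chao1 depends on singleton and doubleton counts, so it is sensitive to upstream denoising and filtering choices that remove rare taxa.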
Affiliation(s)
- Thessa Van Pee
  - Centre for Environmental Sciences, Hasselt University, Diepenbeek, Belgium
- Janneke Hogervorst
  - Centre for Environmental Sciences, Hasselt University, Diepenbeek, Belgium
- Yinthe Dockx
  - Centre for Environmental Sciences, Hasselt University, Diepenbeek, Belgium
- Katrien Witters
  - Centre for Environmental Sciences, Hasselt University, Diepenbeek, Belgium
- Sofie Thijs
  - Centre for Environmental Sciences, Hasselt University, Diepenbeek, Belgium
- Congrong Wang
  - Centre for Environmental Sciences, Hasselt University, Diepenbeek, Belgium
- Eva Bongaerts
  - Centre for Environmental Sciences, Hasselt University, Diepenbeek, Belgium
- Jonathan D Van Hamme
  - Department of Biological Sciences, Thompson Rivers University, Kamloops, British Columbia, Canada
- Jaco Vangronsveld
  - Centre for Environmental Sciences, Hasselt University, Diepenbeek, Belgium
  - Department of Plant Physiology and Biophysics, Faculty of Biology and Biotechnology, Maria Curie-Skłodowska University, Lublin, Poland
- Marcel Ameloot
  - Biomedical Research Institute, Hasselt University, Diepenbeek, Belgium
- Jeroen Raes
  - Department of Microbiology and Immunology, Rega Instituut, KU Leuven-University of Leuven, Leuven, Belgium
  - Center for Microbiology, VIB, Leuven, Belgium
- Tim S Nawrot
  - Centre for Environmental Sciences, Hasselt University, Diepenbeek, Belgium
  - Department of Public Health and Primary Care, Leuven University, Leuven, Belgium
|
28
|
Huang Z, Liu K, Ma W, Li D, Mo T, Liu Q. The gut microbiome in human health and disease-Where are we and where are we going? A bibliometric analysis. Front Microbiol 2022; 13:1018594. [PMID: 36590421 PMCID: PMC9797740 DOI: 10.3389/fmicb.2022.1018594] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/15/2022] [Accepted: 11/17/2022] [Indexed: 12/23/2022] Open
Abstract
Background There are trillions of microbiota in our intestinal tract, and they play a significant role in health and disease by interacting with the host through metabolic, immune, neural, and endocrine pathways. Over the past decades, numerous studies have been published in the field of gut microbiome and disease. Although there are narrative reviews of the gut microbiome and certain diseases, the field as a whole lacks systematic and quantitative analysis. Therefore, we outline the research status of the gut microbiome and disease, and present insights into the developments and characteristics of this field to provide a holistic grasp and future research directions. Methods An advanced search was carried out in the Web of Science Core Collection (WoSCC), based on the term "gut microbiome" and its synonyms. The current status and developing trends of this scientific domain were evaluated by bibliometric methodology. CiteSpace was used to perform collaboration network analysis, co-citation analysis and citation burst detection. Results A total of 29,870 articles and 13,311 reviews were retrieved from the database, involving 42,900 keywords, 176 countries/regions, 19,065 institutions, 147,225 authors and 4,251 journals. Research on the gut microbiome and disease is active and has received increasing attention. Co-cited reference analysis revealed the landmark articles in the field. The United States had the largest number of publications and close cooperation with other countries. Current research mainly focuses on gastrointestinal diseases, such as inflammatory bowel disease (IBD), ulcerative colitis (UC) and Crohn's disease (CD), while research on extra-intestinal diseases, such as obesity, diabetes, cardiovascular disease, Alzheimer's disease and Parkinson's disease, is also rising. Omics technologies, fecal microbiota transplantation (FMT) and metabolites linked to mechanism are likely to receive more attention in the future.
Conclusion The gut microbiome and disease has been a booming field of research, and the trend is expected to continue. Overall, this research field shows a multitude of challenges and great opportunities.
|
29
|
Martinez SS, Stebliankin V, Hernandez J, Martin H, Tamargo J, Rodriguez JB, Teeman C, Johnson A, Seminario L, Campa A, Narasimhan G, Baum MK. Multiomic analysis reveals microbiome-related relationships between cocaine use and metabolites. AIDS 2022; 36:2089-2099. [PMID: 36382433 PMCID: PMC9673179 DOI: 10.1097/qad.0000000000003363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
OBJECTIVE Over 19 million individuals globally have a cocaine use disorder, a significant public health crisis. Cocaine has also been associated with a pro-inflammatory state and recently with imbalances in the intestinal microbiota as compared to nonuse. The objective of this pilot study was to characterize the gut microbiota and plasma metabolites in people with HIV (PWH) who use cocaine compared with those who do not. DESIGN Cross-sectional study. METHODS A pilot study in PWH was conducted on 25 cocaine users and 25 cocaine nonusers from the Miami Adult Studies on HIV cohort. Stool samples and blood plasma were collected. Bacterial composition was characterized using 16S rRNA sequencing. Metabolomics in plasma were determined using gas and liquid chromatography/mass spectrometry. RESULTS The relative abundances of the Lachnospira genus, Oscillospira genus, Bifidobacterium adolescentis species, and Euryarchaeota phylum were significantly higher in the cocaine-using PWH compared to cocaine-nonusing PWH. Cocaine use was associated with higher levels of several metabolites: products of dopamine catabolism (3-methoxytyrosine and 3-methoxytyramine sulfate), phenylacetate, benzoate, butyrate, and butyrylglycine. CONCLUSIONS Cocaine use was associated with higher abundances of taxa and metabolites known to be associated with pathogenic states that include gastrointestinal conditions. Understanding key intestinal bacterial functional pathways that are altered due to cocaine use in PWH will provide a better understanding of the relationships between the host intestinal microbiome and potentially provide novel treatments to improve health.
Affiliation(s)
| | - Vitalii Stebliankin
- Florida International University, Bioinformatics Research Group (BioRG), Miami, FL, USA
| | - Jacqueline Hernandez
- Florida International University, R. Stempel College of Public Health and Social Work
| | - Haley Martin
- Florida International University, R. Stempel College of Public Health and Social Work
| | - Javier Tamargo
- Florida International University, R. Stempel College of Public Health and Social Work
| | | | - Colby Teeman
- Florida International University, R. Stempel College of Public Health and Social Work
| | - Angelique Johnson
- Florida International University, R. Stempel College of Public Health and Social Work
| | - Leslie Seminario
- Florida International University, R. Stempel College of Public Health and Social Work
| | - Adriana Campa
- Florida International University, R. Stempel College of Public Health and Social Work
| | - Giri Narasimhan
- Florida International University, Bioinformatics Research Group (BioRG), Miami, FL, USA
| | - Marianna K Baum
- Florida International University, R. Stempel College of Public Health and Social Work
|
30
|
Huo Y, Jiang Q, Zhao W. Meta-analysis of metagenomics reveals the signatures of vaginal microbiome in preterm birth. Medicine in Microecology 2022. [DOI: 10.1016/j.medmic.2022.100065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022] Open
|
31
|
Gut Microbiota in Nutrition and Health with a Special Focus on Specific Bacterial Clusters. Cells 2022; 11:cells11193091. [PMID: 36231053 PMCID: PMC9563262 DOI: 10.3390/cells11193091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 09/21/2022] [Accepted: 09/24/2022] [Indexed: 11/25/2022] Open
Abstract
Health is influenced by how the gut microbiome develops as a result of external and internal factors, such as nutrition, the environment, medication use, age, sex, and genetics. Alpha and beta diversity metrics and (enterotype) clustering methods are commonly employed to perform population studies and to analyse the effects of various treatments, yet, with the continuous development of (new) sequencing technologies, and as various omics fields as a result become more accessible for investigation, increasingly sophisticated methodologies are needed and indeed being developed in order to disentangle the complex ways in which the gut microbiome and health are intertwined. Diseases of affluence, such as type 2 diabetes (T2D) and cardiovascular diseases (CVD), are commonly linked to species associated with the Bacteroides enterotype(s) and a decline of various (beneficial) complex microbial trophic networks, which are in turn linked to the aforementioned factors. In this review, we (1) explore the effects that some of the most common internal and external factors have on the gut microbiome composition and how these in turn relate to T2D and CVD, and (2) discuss research opportunities enabled by and the limitations of some of the latest technical developments in the microbiome sector, including the use of artificial intelligence (AI), strain tracking, and peak to trough ratios.
|
32
|
Ling W, Lu J, Zhao N, Lulla A, Plantinga AM, Fu W, Zhang A, Liu H, Song H, Li Z, Chen J, Randolph TW, Koay WLA, White JR, Launer LJ, Fodor AA, Meyer KA, Wu MC. Batch effects removal for microbiome data via conditional quantile regression. Nat Commun 2022; 13:5418. [PMID: 36109499 PMCID: PMC9477887 DOI: 10.1038/s41467-022-33071-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 09/03/2021] [Accepted: 08/29/2022] [Indexed: 11/10/2022] Open
Abstract
Batch effects in microbiome data arise from differential processing of specimens and can lead to spurious findings and obscure true signals. Strategies designed for genomic data to mitigate batch effects usually fail to address the zero-inflated and over-dispersed microbiome data. Most strategies tailored for microbiome data are restricted to association testing or specialized study designs, failing to allow other analytic goals or general designs. Here, we develop the Conditional Quantile Regression (ConQuR) approach to remove microbiome batch effects using a two-part quantile regression model. ConQuR is a comprehensive method that accommodates the complex distributions of microbial read counts by non-parametric modeling, and it generates batch-removed zero-inflated read counts that can be used in and benefit usual subsequent analyses. We apply ConQuR to simulated and real microbiome datasets and demonstrate its advantages in removing batch effects while preserving the signals of interest.
Affiliation(s)
- Wodan Ling
- Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, 98109, Seattle, USA
| | - Jiuyao Lu
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, 21205, Baltimore, USA
| | - Ni Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, 21205, Baltimore, USA.
| | - Anju Lulla
- Nutrition Research Institute and Department of Nutrition, University of North Carolina, 500 Laureate Way, 28081, Kannapolis, USA
| | - Anna M Plantinga
- Department of Mathematics and Statistics, Williams College, 18 Hoxsey St, 01267, Williamstown, USA
| | - Weijia Fu
- Department of Biostatistics, School of Public Health, University of Washington, 1705 NE Pacific St, 98195, Seattle, USA
| | - Angela Zhang
- Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, 98109, Seattle, USA
- Department of Biostatistics, School of Public Health, University of Washington, 1705 NE Pacific St, 98195, Seattle, USA
| | - Hongjiao Liu
- Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, 98109, Seattle, USA
- Department of Biostatistics, School of Public Health, University of Washington, 1705 NE Pacific St, 98195, Seattle, USA
| | - Hoseung Song
- Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, 98109, Seattle, USA
| | - Zhigang Li
- Department of Biostatistics, College of Public Health & Health Professions, College of Medicine, University of Florida, 2004 Mowry Rd, 32611, Gainesville, USA
| | - Jun Chen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 First St SW, 55905, Rochester, USA
| | - Timothy W Randolph
- Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, 98109, Seattle, USA
| | - Wei Li A Koay
- Children's National Hospital, 111 Michigan Ave NW, 20010, Washington DC, USA
- Department of Pediatrics, George Washington University, Ross Hall 2300 Eye St NW, 20037, Washington DC, USA
| | - James R White
- Resphera Biosciences, 1529 Lancaster St, 21231, Baltimore, USA
| | - Lenore J Launer
- Laboratory of Epidemiology and Population Science, NIA, NIH, 7201 Wisconsin Ave, 20814, Bethesda, USA
| | - Anthony A Fodor
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, 28223, Charlotte, USA
| | - Katie A Meyer
- Nutrition Research Institute and Department of Nutrition, University of North Carolina, 500 Laureate Way, 28081, Kannapolis, USA
| | - Michael C Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, 98109, Seattle, USA.
- Department of Biostatistics, School of Public Health, University of Washington, 1705 NE Pacific St, 98195, Seattle, USA.
|
33
|
Uchiyama J, Osumi T, Mizukami K, Fukuyama T, Shima A, Unno A, Takemura-Uchiyama I, Une Y, Murakami H, Sakaguchi M. Characterization of the oral and fecal microbiota associated with atopic dermatitis in dogs selected from a purebred Shiba Inu colony. Lett Appl Microbiol 2022; 75:1607-1616. [PMID: 36067033 DOI: 10.1111/lam.13828] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/10/2022] [Revised: 08/21/2022] [Accepted: 08/31/2022] [Indexed: 11/30/2022]
Abstract
Atopic dermatitis (AD) is a chronic and relapsing multifactorial inflammatory skin disease that also affects dogs. The oral and gut microbiota are associated with many disorders, including allergy. Few studies have addressed the oral and gut microbiota in dogs, although the skin microbiota has been studied relatively well in these animals. Here, we studied the AD-associated oral and gut microbiota in 16 healthy and nine AD dogs from a purebred Shiba Inu colony. First, we found that the diversity of the oral microbiota was significantly different among the dogs, whereas no significant difference was observed in the gut microbiota. Second, a differential abundance analysis detected the Family_XIII_AD3011_group (Anaerovoracaceae) in the gut microbiota of AD dogs; however, no bacterial taxa were detected in the oral microbiota. Third, the comparison of the microbial co-occurrence patterns between AD and healthy dogs identified differential networks in which the bacteria in the oral microbiota that were most strongly associated with AD were related to human periodontitis, whereas those in the gut microbiota were related to dysbiosis and gut inflammation. These results suggest that AD can alter the oral and gut microbiota in dogs.
Affiliation(s)
- Jumpei Uchiyama
- Department of Bacteriology, Graduate School of Medicine Dentistry and Pharmaceutical Sciences, Okayama University, Okayama, Japan.,School of Veterinary Medicine, Azabu University, Kanagawa, Japan
| | - Takafumi Osumi
- Laboratory of Veterinary Internal Medicine, Division of Animal Life Science, Graduate School, Tokyo University of Agriculture and Technology, Tokyo, Japan
| | - Keijiro Mizukami
- School of Veterinary Medicine, Azabu University, Kanagawa, Japan.,Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Tomoki Fukuyama
- School of Veterinary Medicine, Azabu University, Kanagawa, Japan
| | - Ayaka Shima
- Anicom Specialty Medical Institute Inc., Tokyo, Japan
| | - Asaka Unno
- School of Veterinary Medicine, Azabu University, Kanagawa, Japan
| | - Iyo Takemura-Uchiyama
- Department of Bacteriology, Graduate School of Medicine Dentistry and Pharmaceutical Sciences, Okayama University, Okayama, Japan.,School of Veterinary Medicine, Azabu University, Kanagawa, Japan
| | - Yumi Une
- Faculty of Veterinary Medicine, Okayama University of Science, Ehime, Japan
| | | | - Masahiro Sakaguchi
- School of Veterinary Medicine, Azabu University, Kanagawa, Japan.,Institute of Tokyo Environmental Allergy, Tokyo, Japan
|
34
|
Pantaleón García J, Dickson RP, Evans SE. Minimizing caging effects in murine lung microbiome studies. Am J Physiol Lung Cell Mol Physiol 2022; 323:L219-L220. [PMID: 35944140 PMCID: PMC9377779 DOI: 10.1152/ajplung.00144.2022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022] Open
Affiliation(s)
| | - Robert P Dickson
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan.,Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, Michigan.,Weil Institute for Critical Care Research & Innovation, Ann Arbor, Michigan
| | - Scott E Evans
- Department of Pulmonary Medicine, University of Texas MD Anderson Cancer Center, Houston, Texas.,MD Anderson Cancer Center UT Health Graduate School of Biomedical Sciences, Houston, Texas
|
35
|
Díez López C, Montiel González D, Vidaki A, Kayser M. Prediction of Smoking Habits From Class-Imbalanced Saliva Microbiome Data Using Data Augmentation and Machine Learning. Front Microbiol 2022; 13:886201. [PMID: 35928158 PMCID: PMC9343866 DOI: 10.3389/fmicb.2022.886201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 06/21/2022] [Indexed: 11/24/2022] Open
Abstract
Human microbiome research is moving from characterization and association studies to translational applications in medical research, clinical diagnostics, and others. One of these applications is the prediction of human traits, where machine learning (ML) methods are often employed, but face practical challenges. Class imbalance in available microbiome data is one of the major problems, which, if unaccounted for, leads to spurious prediction accuracies and limits the classifier's generalization. Here, we investigated the predictability of smoking habits from class-imbalanced saliva microbiome data by combining data augmentation techniques to account for class imbalance with ML methods for prediction. We collected publicly available saliva 16S rRNA gene sequencing data and smoking habit metadata demonstrating a serious class imbalance problem, i.e., 175 current vs. 1,070 non-current smokers. Three data augmentation techniques (synthetic minority over-sampling technique, adaptive synthetic, and tree-based associative data augmentation) were applied together with seven ML methods: logistic regression, k-nearest neighbors, support vector machine with linear and radial kernels, decision trees, random forest, and extreme gradient boosting. K-fold nested cross-validation was used with the different augmented data types and baseline non-augmented data to validate the prediction outcome. Combining data augmentation with ML generally outperformed baseline methods in our dataset. The final prediction model combined tree-based associative data augmentation and support vector machine with linear kernel, and achieved a classification performance expressed as Matthews correlation coefficient of 0.36 and AUC of 0.81. Our method successfully addresses the problem of class imbalance in microbiome data for reliable prediction of smoking habits.
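Note on the reported metric: the abstract above expresses performance as a Matthews correlation coefficient (MCC) because plain accuracy can look good on class-imbalanced data. The following is a minimal sketch, not the authors' code; the function name and the confusion-matrix counts are hypothetical, and only the class sizes (175 current vs. 1,070 non-current smokers) come from the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: a degenerate classifier (one empty row/column) scores 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical classifier that labels every sample as "non-current smoker".
# Class sizes follow the abstract: 175 positives, 1,070 negatives.
tp, fn = 0, 175
tn, fp = 1070, 0

accuracy = (tp + tn) / (tp + tn + fp + fn)  # about 0.86 despite no true positives
mcc_majority = matthews_corrcoef(tp, tn, fp, fn)  # 0.0

# A perfect classifier on the same class sizes, for comparison.
mcc_perfect = matthews_corrcoef(175, 1070, 0, 0)
```

In this illustration the always-negative classifier reaches an accuracy of roughly 0.86 while its MCC is 0, which is why a balanced metric is informative here.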
Affiliation(s)
| | | | | | - Manfred Kayser
- Department of Genetic Identification, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
|
36
|
Kodikara S, Ellul S, Lê Cao KA. Statistical challenges in longitudinal microbiome data analysis. Brief Bioinform 2022; 23:bbac273. [PMID: 35830875 PMCID: PMC9294433 DOI: 10.1093/bib/bbac273] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 04/02/2022] [Revised: 05/28/2022] [Accepted: 06/12/2022] [Indexed: 11/13/2022] Open
Abstract
The microbiome is a complex and dynamic community of microorganisms that co-exist interdependently within an ecosystem, and interact with its host or environment. Longitudinal studies can capture temporal variation within the microbiome to gain mechanistic insights into microbial systems; however, current statistical methods are limited due to the complex and inherent features of the data. We have identified three analytical objectives in longitudinal microbial studies: (1) differential abundance over time and between sample groups, demographic factors or clinical variables of interest; (2) clustering of microorganisms evolving concomitantly across time and (3) network modelling to identify temporal relationships between microorganisms. This review explores the strengths and limitations of current methods to fulfill these objectives, compares different methods in simulation and case studies for objectives (1) and (2), and highlights opportunities for further methodological developments. R tutorials are provided to reproduce the analyses conducted in this review.
Affiliation(s)
- Saritha Kodikara
- Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Royal Parade, 3052, Victoria, Australia
| | - Susan Ellul
- Murdoch Children’s Research Institute and Department of Paediatrics, University of Melbourne, Bouverie Street, 3052, Victoria, Australia
| | - Kim-Anh Lê Cao
- Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Royal Parade, 3052, Victoria, Australia
|
37
|
Fresco S, Marie-Etancelin C, Meynadier A, Martinez Boggio G. Variation in Rumen Bacteria of Lacaune Dairy Ewes From One Week to the Next. Front Microbiol 2022; 13:848518. [PMID: 35814674 PMCID: PMC9260014 DOI: 10.3389/fmicb.2022.848518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Accepted: 06/06/2022] [Indexed: 11/13/2022] Open
Abstract
Bacteria are the most abundant microorganisms in the rumen microbiota and play essential roles, mainly fermenting plant compounds that yield fatty acids. In this study, we aimed to assess the stability of both bacterial composition and its associations with rumen and milk fatty acid phenotypes over a 1-week period. The study was performed using 118 Lacaune dairy ewes from the INRAE Experimental Unit of La Fage. Rumen and milk samples were obtained from the ewes twice, 1 week apart, and microbiota composition and volatile and long-chain fatty acid concentrations were analyzed. Bacterial composition was assessed using 16S rRNA gene sequencing, and microbiota and fatty acids were analyzed as compositional data. Because relative abundances are expressed in a constrained space, the centered log-ratio transformation was used to transform the data so that multivariate analyses could be performed in Euclidean space. Bacterial composition differed between the 2 weeks of sampling, characterized by different proportions of the two main phyla, Bacteroidetes and Firmicutes. The repeatability of the operational taxonomic units (OTUs) was low, although it varied significantly. However, 66 of them presented a repeatability of over 0.50 and were particularly associated with fatty acid phenotypes. Even though the OTUs from the same bacterial families presented similar correlations to fatty acids in both weeks, only a few OTUs were conserved over the 2 weeks. Using sequencing data, we showed that microbial composition changes significantly over a week in terms of the abundance of different bacterial families. Further studies are required to determine the impact of bacterial composition alterations over 1 week, and the specificities of the highly repeatable OTUs.
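Note on the transformation: the centered log-ratio (CLR) step mentioned in the abstract above can be illustrated with a short sketch. This is not the authors' pipeline; the function name, the pseudocount of 0.5 used to handle zero counts, and the toy count matrix are all assumptions made for illustration.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples x taxa count matrix.

    The pseudocount (an assumed value) avoids log(0) for absent taxa.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtracting the per-sample mean of the logs is equivalent to dividing
    # each value by the sample's geometric mean before taking the log.
    return log_x - log_x.mean(axis=1, keepdims=True)


# Hypothetical OTU table: 2 samples (rows) x 3 OTUs (columns).
otu_counts = np.array([[10, 0, 90],
                       [5, 5, 40]])
transformed = clr(otu_counts)
```

Each transformed sample sums to zero, so the values are no longer constrained to a fixed total and standard Euclidean methods (for example, principal component analysis) can be applied to them.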
|
38
|
Li R, Yi X, Yang J, Zhu Z, Wang Y, Liu X, Huang X, Wan Y, Fu X, Shu W, Zhang W, Wang Z. Gut Microbiome Signatures in the Progression of Hepatitis B Virus-Induced Liver Disease. Front Microbiol 2022; 13:916061. [PMID: 35733959 PMCID: PMC9208012 DOI: 10.3389/fmicb.2022.916061] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/08/2022] [Accepted: 04/27/2022] [Indexed: 12/12/2022] Open
Abstract
The gut microbiome is associated with hepatitis B virus (HBV)-induced liver disease, which progresses from chronic hepatitis B, to liver cirrhosis, and eventually to hepatocellular carcinoma. Studies have analyzed the gut microbiome at each stage of HBV-induced liver diseases, but a consensus has not been reached on the microbial signatures across these stages. Here, we conducted a systematic meta-analysis of 486 fecal samples from publicly available 16S rRNA gene datasets across all disease stages, and validated the results by a gut microbiome characterization on an independent cohort of 15 controls, 23 chronic hepatitis B, 20 liver cirrhosis, and 22 hepatocellular carcinoma patients. The integrative analyses revealed 13 genera consistently altered at each of the disease stages both in public and validation datasets, suggesting highly robust microbiome signatures. Specifically, Colidextribacter and Monoglobus were enriched in healthy controls. An unclassified Lachnospiraceae genus was specifically elevated in chronic hepatitis B, whereas Bilophila was depleted. Prevotella and Oscillibacter were depleted in liver cirrhosis. Coprococcus and Faecalibacterium were depleted in hepatocellular carcinoma. Classifiers established using these 13 genera showed diagnostic power across all disease stages in a cross-validation between public and validation datasets (AUC = 0.65–0.832). The identified microbial taxa serve as non-invasive biomarkers for monitoring the progression of HBV-induced liver disease, and may contribute to microbiome-based therapies.
Affiliation(s)
- Ranxi Li
- South China Normal University-Panyu Central Hospital Joint Laboratory of Basic and Translational Medical Research, Guangzhou Panyu Central Hospital, Guangzhou, China
- School of Life Sciences, South China Normal University, Guangzhou, China
| | - Xinzhu Yi
- School of Life Sciences, South China Normal University, Guangzhou, China
| | - Junhao Yang
- School of Life Sciences, South China Normal University, Guangzhou, China
| | - Zhou Zhu
- South China Normal University-Panyu Central Hospital Joint Laboratory of Basic and Translational Medical Research, Guangzhou Panyu Central Hospital, Guangzhou, China
- School of Life Sciences, South China Normal University, Guangzhou, China
| | - Yifei Wang
- South China Normal University-Panyu Central Hospital Joint Laboratory of Basic and Translational Medical Research, Guangzhou Panyu Central Hospital, Guangzhou, China
- School of Life Sciences, South China Normal University, Guangzhou, China
| | - Xiaomin Liu
- School of Life Sciences, South China Normal University, Guangzhou, China
| | - Xili Huang
- South China Normal University-Panyu Central Hospital Joint Laboratory of Basic and Translational Medical Research, Guangzhou Panyu Central Hospital, Guangzhou, China
- School of Life Sciences, South China Normal University, Guangzhou, China
| | - Yu Wan
- Department of Gastroenterology, Guangzhou Panyu Central Hospital, Guangzhou, China
| | - Xihua Fu
- Department of Infectious Diseases, Guangzhou Panyu Central Hospital, Guangzhou, China
| | - Wensheng Shu
- School of Life Sciences, South China Normal University, Guangzhou, China
- *Correspondence: Wensheng Shu
| | - Wenjie Zhang
- Department of Science and Education, Guangzhou Panyu Central Hospital, Guangzhou, China
| | - Zhang Wang
- South China Normal University-Panyu Central Hospital Joint Laboratory of Basic and Translational Medical Research, Guangzhou Panyu Central Hospital, Guangzhou, China
- School of Life Sciences, South China Normal University, Guangzhou, China
|
39
|
Priya S, Burns MB, Ward T, Mars RAT, Adamowicz B, Lock EF, Kashyap PC, Knights D, Blekhman R. Identification of shared and disease-specific host gene-microbiome associations across human diseases using multi-omic integration. Nat Microbiol 2022; 7:780-795. [PMID: 35577971 PMCID: PMC9159953 DOI: 10.1038/s41564-022-01121-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 03/01/2021] [Accepted: 04/06/2022] [Indexed: 12/19/2022]
Abstract
While gut microbiome and host gene regulation independently contribute to gastrointestinal disorders, it is unclear how the two may interact to influence host pathophysiology. Here we developed a machine learning-based framework to jointly analyse paired host transcriptomic (n = 208) and gut microbiome (n = 208) profiles from colonic mucosal samples of patients with colorectal cancer, inflammatory bowel disease and irritable bowel syndrome. We identified associations between gut microbes and host genes that depict shared as well as disease-specific patterns. We found that a common set of host genes and pathways implicated in gastrointestinal inflammation, gut barrier protection and energy metabolism are associated with disease-specific gut microbes. Additionally, we also found that mucosal gut microbes that have been implicated in all three diseases, such as Streptococcus, are associated with different host pathways in each disease, suggesting that similar microbes can affect host pathophysiology in a disease-specific manner through regulation of different host genes. Our framework can be applied to other diseases for the identification of host gene-microbiome associations that may influence disease outcomes.
Affiliation(s)
- Sambhawa Priya
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN, USA
- Bioinformatics and Computational Biology, University of Minnesota, Minneapolis, MN, USA
| | - Michael B Burns
- Department of Biology, Loyola University Chicago, Chicago, IL, USA
| | - Tonya Ward
- BioTechnology Institute, College of Biological Sciences, University of Minnesota, Minneapolis, MN, USA
| | - Ruben A T Mars
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Mayo Clinic, Rochester, MN, USA
| | - Beth Adamowicz
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN, USA
| | - Eric F Lock
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| | - Purna C Kashyap
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Mayo Clinic, Rochester, MN, USA
| | - Dan Knights
- BioTechnology Institute, College of Biological Sciences, University of Minnesota, Minneapolis, MN, USA
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
| | - Ran Blekhman
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN, USA.
- Department of Ecology, Evolution, and Behavior, University of Minnesota, Minneapolis, MN, USA.
|
40
|
Xiao L, Zhang F, Zhao F. Large-scale microbiome data integration enables robust biomarker identification. Nature Computational Science 2022; 2:307-316. [PMID: 38177817 PMCID: PMC10766547 DOI: 10.1038/s43588-022-00247-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 09/22/2021] [Accepted: 04/12/2022] [Indexed: 01/06/2024]
Abstract
The close association between gut microbiota dysbiosis and human diseases is being increasingly recognized. However, contradictory results are frequently reported, as confounding effects exist. The lack of unbiased data integration methods is also impeding the discovery of disease-associated microbial biomarkers from different cohorts. Here we propose an algorithm, NetMoss, for assessing shifts of microbial network modules to identify robust biomarkers associated with various diseases. Compared to previous approaches, the NetMoss method shows better performance in removing batch effects. Through comprehensive evaluations on both simulated and real datasets, we demonstrate that NetMoss has great advantages in the identification of disease-related biomarkers. Based on analysis of pandisease microbiota studies, there is a high prevalence of multidisease-related bacteria in global populations. We believe that large-scale data integration will help in understanding the role of the microbiome from a more comprehensive perspective and that accurate biomarker identification will greatly promote microbiome-based medical diagnosis.
Affiliation(s)
- Liwen Xiao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Fengyi Zhang
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
| | - Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China.
- University of Chinese Academy of Sciences, Beijing, China.
- Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China.
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
| |
|
41
|
Abaach M, Morilla I. Learning models for colorectal cancer signature reconstruction and classification in patients with chronic inflammatory bowel disease. Artif Intell Cancer 2022; 3:27-41. [DOI: 10.35713/aic.v3.i2.27] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Revised: 02/16/2022] [Accepted: 04/28/2022] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND In their everyday life, clinicians face an overabundance of biological indicators that are potentially helpful during disease therapy. In this context, reliably identifying a reduced number of markers that optimise the classification of treatment outcomes becomes vitally important to medical prognosis. In this work, we focus on inflammatory bowel disease (IBD), a lifelong threat with a continuously increasing prevalence worldwide. In particular, IBD can be described as a set of autoimmune conditions affecting the gastrointestinal tract, whose two main types are Crohn’s disease and ulcerative colitis.
AIM To identify the minimal signature of microRNA (miRNA) associated with colorectal cancer (CRC) in patients with one chronic IBD.
METHODS We provide a framework of well-established statistical and computational learning methods wisely adapted to reconstructing a CRC network leveraged to stratify these patients.
RESULTS Our strategy resulted in an adjusted signature of 5 miRNAs out of approximately 2600 in Crohn’s Disease (resp. 8 in Ulcerative Colitis) with a percentage of success in patient classification of 82% (resp. 81%).
CONCLUSION Importantly, these two signatures optimally balance the proportion between the number of significant miRNAs and their percentage of success in patients’ stratification.
Affiliation(s)
- Mariem Abaach
- Mathématiques Appliquées à Paris 5, Unité mixte de Recherche, Centre National de la Recherche Scientifique, Université de Paris, Paris 75006, France
| | - Ian Morilla
- Laboratoire Analyse, Géométrie et Applications, Centre National de la Recherche Scientifique (Unité mixte de Recherche), Université Sorbonne Paris Nord, Villetaneuse, Paris 93430, France
| |
|
42
|
Are batch effects still relevant in the age of big data? Trends Biotechnol 2022; 40:1029-1040. [DOI: 10.1016/j.tibtech.2022.02.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/10/2021] [Revised: 02/13/2022] [Accepted: 02/18/2022] [Indexed: 12/30/2022]
|
43
|
Zha Y, Ning K. Ontology-aware neural network: a general framework for pattern mining from microbiome data. Brief Bioinform 2022; 23:6517031. [PMID: 35091743 PMCID: PMC8921649 DOI: 10.1093/bib/bbac005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2021] [Revised: 12/30/2021] [Accepted: 01/04/2022] [Indexed: 11/23/2022] Open
Abstract
With the rapid accumulation of microbiome data around the world, numerous computational bioinformatics methods have been developed for pattern mining from such paramount microbiome data. Current microbiome data mining methods, such as gene and species mining, rely heavily on sequence comparison. Most of these methods, however, have a clear trade-off, particularly, when it comes to big-data analytical efficiency and accuracy. Microbiome entities are usually organized in ontology structures, and pattern mining methods that have considered ontology structures could offer advantages in mining efficiency and accuracy. Here, we have summarized the ontology-aware neural network (ONN) as a novel framework for microbiome data mining. We have discussed the applications of ONN in multiple contexts, including gene mining, species mining and microbial community dynamic pattern mining. We have then highlighted one of the most important characteristics of ONN, namely, novel knowledge discovery, which makes ONN a standout among all microbiome data mining methods. Finally, we have provided several applications to showcase the advantage of ONN over other methods in microbiome data mining. In summary, ONN represents a paradigm shift for pattern mining from microbiome data: from traditional machine learning approach to ontology-aware and model-based approach, which has found its broad application scenarios in microbiome data mining.
Affiliation(s)
- Yuguo Zha
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, Center of AI Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 1037 Luoyu Road Wuhan, Hubei, Wuhan 430074, China
| | - Kang Ning
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, Center of AI Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 1037 Luoyu Road Wuhan, Hubei, Wuhan 430074, China
| |
|
44
|
Sung JJY, Wong SH. What is unknown in using microbiota as a therapeutic? J Gastroenterol Hepatol 2022; 37:39-44. [PMID: 34668228 DOI: 10.1111/jgh.15716] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/30/2021] [Revised: 10/15/2021] [Accepted: 10/18/2021] [Indexed: 12/17/2022]
Abstract
Fecal microbiota transplantation (FMT) has been used extensively in the treatment of various gastrointestinal and extraintestinal conditions, although there are still many gaps in our knowledge of the gut microbiota and its behavior. This article describes the unknowns in microbiota biology (undetected microbes, uncertain colonization, unclear mechanisms of action, uncertain indications, unsure long-term efficacy or side effects). We discuss how these unknowns may affect the therapeutic uses of FMT, and the potential and caveats of other related microbiota-based therapies. When FMT is used as an experimental therapy or last resort in difficult conditions, caution should be taken against inadvertent complications. Clear documentation of post-treatment events should be made mandatory, and events should be classified and graded as in clinical trials. Further robust scientific experiments and properly designed clinical studies are needed.
Affiliation(s)
- Joseph J Y Sung
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
| | - Sunny H Wong
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
| |
|
45
|
Kubinski R, Djamen-Kepaou JY, Zhanabaev T, Hernandez-Garcia A, Bauer S, Hildebrand F, Korcsmaros T, Karam S, Jantchou P, Kafi K, Martin RD. Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease. Front Genet 2022; 13:784397. [PMID: 35251123 PMCID: PMC8895431 DOI: 10.3389/fgene.2022.784397] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/27/2021] [Accepted: 01/13/2022] [Indexed: 12/14/2022] Open
Abstract
Patients with inflammatory bowel disease (IBD) wait months and undergo numerous invasive procedures between the initial appearance of symptoms and receiving a diagnosis. In order to reduce time until diagnosis and improve patient wellbeing, machine learning algorithms capable of diagnosing IBD from the gut microbiome's composition are currently being explored. To date, these models have had limited clinical application due to decreased performance when applied to a new cohort of patient samples. Various methods have been developed to analyze microbiome data which may improve the generalizability of machine learning IBD diagnostic tests. With an abundance of methods, there is a need to benchmark the performance and generalizability of various machine learning pipelines (from data processing to training a machine learning model) for microbiome-based IBD diagnostic tools. We collected fifteen 16S rRNA microbiome datasets (7,707 samples) from North America to benchmark combinations of gut microbiome features, data normalization and transformation methods, batch effect correction methods, and machine learning models. Pipeline generalizability to new cohorts of patients was evaluated with two binary classification metrics following leave-one-dataset-out (LODO) cross-validation, where all samples from one study were left out of the training set and used for testing. We demonstrate that taxonomic features processed with a compositional transformation method and batch effect correction with the naive zero-centering method attain the best classification performance. In addition, machine learning models that identify non-linear decision boundaries between labels are more generalizable than those that are linearly constrained. Lastly, we illustrate the importance of generating a curated training dataset to ensure similar performance across patient demographics. These findings will help improve the generalizability of machine learning models as we move towards non-invasive diagnostic and disease management tools for patients with IBD.
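The two ideas named in this abstract, naive zero-centering per dataset and leave-one-dataset-out (LODO) splitting, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the sample records, the field names (`dataset`, `label`, `features`) and both function names are hypothetical, and the feature values are assumed to have already been through a compositional transformation.

```python
from collections import defaultdict


def zero_center_by_batch(samples):
    """Subtract each dataset's per-feature mean from that dataset's samples."""
    by_batch = defaultdict(list)
    for s in samples:
        by_batch[s["dataset"]].append(s)
    centered = []
    for batch, members in by_batch.items():
        n_features = len(members[0]["features"])
        means = [
            sum(m["features"][j] for m in members) / len(members)
            for j in range(n_features)
        ]
        for m in members:
            centered.append({
                "dataset": batch,
                "label": m["label"],
                "features": [x - mu for x, mu in zip(m["features"], means)],
            })
    return centered


def lodo_splits(samples):
    """Yield (held_out, train, test), holding out one whole dataset per fold."""
    for held_out in sorted({s["dataset"] for s in samples}):
        train = [s for s in samples if s["dataset"] != held_out]
        test = [s for s in samples if s["dataset"] == held_out]
        yield held_out, train, test


# Hypothetical toy data: three datasets, two already-transformed features each.
samples = [
    {"dataset": "A", "label": 1, "features": [2.0, 4.0]},
    {"dataset": "A", "label": 0, "features": [4.0, 8.0]},
    {"dataset": "B", "label": 1, "features": [10.0, 1.0]},
    {"dataset": "B", "label": 0, "features": [12.0, 3.0]},
    {"dataset": "C", "label": 1, "features": [0.0, 0.0]},
]
centered = zero_center_by_batch(samples)
splits = list(lodo_splits(centered))
```

Because centering here uses only each dataset's own mean, the held-out dataset never borrows statistics from the training datasets; a model would then be fitted on `train` and scored on `test` in every fold.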
Affiliation(s)
- Ryszard Kubinski
- Phyla Technologies Inc, Montréal, QC, Canada
- *Correspondence: Ryszard Kubinski; Ryan D. Martin
| | | | | | - Alex Hernandez-Garcia
- Mila, Quebec Artificial Intelligence Institute, University of Montreal, Montréal, QC, Canada
| | - Stefan Bauer
- Max Planck Institute for Intelligent Systems, Tübingen, Germany
| | - Falk Hildebrand
- Gut Microbes and Health, Quadram Institute Bioscience, Norwich, United Kingdom
- Earlham Institute, Norwich, United Kingdom
| | - Tamas Korcsmaros
- Gut Microbes and Health, Quadram Institute Bioscience, Norwich, United Kingdom
- Earlham Institute, Norwich, United Kingdom
| | - Sani Karam
- Phyla Technologies Inc, Montréal, QC, Canada
| | - Prévost Jantchou
- Centre Hospitalier Universitaire Sainte-Justine, Montréal, QC, Canada
| | - Kamran Kafi
- Phyla Technologies Inc, Montréal, QC, Canada
| | - Ryan D. Martin
- Phyla Technologies Inc, Montréal, QC, Canada
- *Correspondence: Ryszard Kubinski; Ryan D. Martin
| |
|
46
|
Integrative genomic analysis of PPP3R1 in Alzheimer's disease: a potential biomarker for predictive, preventive, and personalized medical approach. EPMA J 2021; 12:647-658. [PMID: 34956428 DOI: 10.1007/s13167-021-00261-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/24/2021] [Accepted: 10/18/2021] [Indexed: 01/26/2023]
Abstract
Alzheimer's disease (AD) is associated with abnormal calcium signaling, a pathway regulated by the calcium-dependent protein phosphatase. This study aimed to investigate the molecular function of protein phosphatase 3 regulatory subunit B (PPP3R1) underlying AD, which may provide novel insights for the predictive diagnostics, targeted prevention, and personalization of medical services in AD by targeting PPP3R1. A total of 1860 differentially expressed genes (DEGs) from 13,049 background genes were overlapped in AD/control and PPP3R1-low/high cohorts. Based on these DEGs, six co-expression modules were constructed by weighted gene co-expression network analysis (WGCNA). The turquoise module had the strongest correlation with AD and low PPP3R1, in which DEGs participated in axon guidance, glutamatergic synapse, long-term potentiation (LTP), mitogen-activated protein kinase (MAPK), Ras, and hypoxia-inducible factor 1 (HIF-1) signaling pathways. Furthermore, the cross-talking pathways of PPP3R1, such as axon guidance, glutamatergic synapse, LTP, and MAPK signaling pathways, were identified in the global regulatory network. The area under the curve (AUC) analysis showed that low PPP3R1 could accurately predict the onset of AD. Therefore, our findings highlight the involvement of PPP3R1 in the pathogenesis of AD via axon guidance, glutamatergic synapse, LTP, and MAPK signaling pathways, and identify downregulation of PPP3R1 as a potential biomarker for AD treatment in the context of 3P medicine: predictive diagnostics, targeted prevention, and personalization of medical services. Supplementary Information The online version contains supplementary material available at 10.1007/s13167-021-00261-2.
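For readers unfamiliar with the AUC statistic mentioned in this abstract, it can be read as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pair counting. The expression values and labels are invented for illustration and are not data from the study; the function name `auc` is likewise hypothetical. Because low expression is the risk marker, the score is taken as the negated expression.

```python
def auc(scores, labels):
    """Rank-based AUC: share of (case, control) pairs where the case scores higher."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))


# Invented expression values; 1 = case, 0 = control.
expression = [2.1, 2.5, 3.0, 5.2, 5.9, 6.4]
is_case = [1, 1, 0, 1, 0, 0]

# Low expression marks risk, so negate it to obtain a score where higher = riskier.
risk_score = [-x for x in expression]
value = auc(risk_score, is_case)
```

An AUC of 0.5 corresponds to a marker no better than chance, and 1.0 to perfect separation of cases from controls.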
|
47
|
Narayana JK, Mac Aogáin M, Goh WWB, Xia K, Tsaneva-Atanasova K, Chotirmall SH. Mathematical-based microbiome analytics for clinical translation. Comput Struct Biotechnol J 2021; 19:6272-6281. [PMID: 34900137 PMCID: PMC8637001 DOI: 10.1016/j.csbj.2021.11.029] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/25/2021] [Revised: 11/17/2021] [Accepted: 11/17/2021] [Indexed: 12/20/2022] Open
Abstract
Traditionally, human microbiology has been strongly built on the laboratory-focused culture of microbes isolated from human specimens in patients with acute or chronic infection. These approaches primarily view human disease through the lens of a single species and its relevant clinical setting; however, such approaches fail to account for the surrounding environment and wide microbial diversity that exists in vivo. Given the emergence of next generation sequencing technologies and advancing bioinformatic pipelines, researchers now have unprecedented capabilities to characterise the human microbiome in terms of its taxonomy, function, antibiotic resistance and even bacteriophages. Despite this, analysis of microbial communities has largely been restricted to ordination, ecological measures, and discriminant taxa analysis. This is predominantly due to a lack of suitable computational tools to facilitate microbiome analytics. In this review, we first evaluate the key concerns related to the inherent structure of microbiome datasets, which include compositionality and batch effects. We describe the available and emerging analytical techniques including integrative analysis, machine learning, microbial association networks, topological data analysis (TDA) and mathematical modelling. We also present how these methods may translate to clinical settings including tools for implementation. Mathematical-based analytics for microbiome analysis represents a promising avenue for clinical translation across a range of acute and chronic disease states.
Affiliation(s)
- Jayanth Kumar Narayana
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
| | - Micheál Mac Aogáin
- Biochemical Genetics Laboratory, Department of Biochemistry, St. James’s Hospital, Dublin, Ireland
- Clinical Biochemistry Unit, School of Medicine, Trinity College Dublin, Dublin, Ireland
| | - Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
| | - Krasimira Tsaneva-Atanasova
- Department of Mathematics & Living Systems Institute, College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter EX4 4QF, UK
| | - Sanjay H. Chotirmall
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Department of Respiratory and Critical Care Medicine, Tan Tock Seng Hospital, Singapore
| |
|
48
|
Cameron ES, Schmidt PJ, Tremblay BJM, Emelko MB, Müller KM. Enhancing diversity analysis by repeatedly rarefying next generation sequencing data describing microbial communities. Sci Rep 2021; 11:22302. [PMID: 34785722 PMCID: PMC8595385 DOI: 10.1038/s41598-021-01636-1] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 06/28/2021] [Accepted: 10/27/2021] [Indexed: 12/13/2022] Open
Abstract
Amplicon sequencing has revolutionized our ability to study DNA collected from environmental samples by providing a rapid and sensitive technique for microbial community analysis that eliminates the challenges associated with lab cultivation and taxonomic identification through microscopy. In water resources management, it can be especially useful to evaluate ecosystem shifts in response to natural and anthropogenic landscape disturbances to signal potential water quality concerns, such as the detection of toxic cyanobacteria or pathogenic bacteria. Amplicon sequencing data consist of discrete counts of sequence reads, the sum of which is the library size. Groups of samples typically have different library sizes that are not representative of biological variation; library size normalization is required to meaningfully compare diversity between them. Rarefaction is a widely used normalization technique that involves the random subsampling of sequences from the initial sample library to a selected normalized library size. This process is often dismissed as statistically invalid because subsampling effectively discards a portion of the observed sequences, yet it remains prevalent in practice and the suitability of rarefying, relative to many other normalization approaches, for diversity analysis has been argued. Here, repeated rarefying is proposed as a tool to normalize library sizes for diversity analyses. This enables (i) proportionate representation of all observed sequences and (ii) characterization of the random variation introduced to diversity analyses by rarefying to a smaller library size shared by all samples. While many deterministic data transformations are not tailored to produce equal library sizes, repeatedly rarefying reflects the probabilistic process by which amplicon sequencing data are obtained as a representation of the amplified source microbial community. 
Specifically, it evaluates which data might have been obtained if a particular sample's library size had been smaller and allows graphical representation of the effects of this library size normalization process upon diversity analysis results.
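The repeated-rarefying idea described in this abstract can be illustrated with a short sketch: subsample the same library to a fixed depth many times and look at the spread of a diversity measure across the draws. The assumptions are mine, not the authors': subsampling is done without replacement, observed richness stands in for the diversity measure, and the taxon names, counts, depth and function names are all hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a dict of taxon counts."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the sample's library size")
    return Counter(rng.sample(pool, depth))


def repeated_rarefy(counts, depth, repeats, seed=0):
    """Rarefy the same sample `repeats` times with one seeded generator."""
    rng = random.Random(seed)
    return [rarefy(counts, depth, rng) for _ in range(repeats)]


def richness(table):
    """Number of taxa observed at least once."""
    return sum(1 for n in table.values() if n > 0)


# Hypothetical library of 817 reads across four taxa.
sample = {"taxon_a": 500, "taxon_b": 300, "taxon_c": 15, "taxon_d": 2}
draws = repeated_rarefy(sample, depth=100, repeats=50)
richness_values = [richness(d) for d in draws]
spread = (min(richness_values), max(richness_values))
```

The spread of `richness_values` across draws is the random variation that a single rarefied table would hide; rare taxa such as `taxon_d` appear in some draws and not in others.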
Affiliation(s)
- Ellen S Cameron
- Department of Biology, University of Waterloo, 200 University Ave. W, Waterloo, ON, N2L 3G1, Canada
| | - Philip J Schmidt
- Department of Civil and Environmental Engineering, University of Waterloo, 200 University Ave. W, Waterloo, ON, N2L 3G1, Canada
| | - Benjamin J-M Tremblay
- Department of Biology, University of Waterloo, 200 University Ave. W, Waterloo, ON, N2L 3G1, Canada
| | - Monica B Emelko
- Department of Civil and Environmental Engineering, University of Waterloo, 200 University Ave. W, Waterloo, ON, N2L 3G1, Canada
| | - Kirsten M Müller
- Department of Biology, University of Waterloo, 200 University Ave. W, Waterloo, ON, N2L 3G1, Canada.
| |
|
49
|
Bredeck G, Kämpfer AAM, Sofranko A, Wahle T, Lison D, Ambroise J, Stahlmecke B, Albrecht C, Schins RPF. Effects of dietary exposure to the engineered nanomaterials CeO2, SiO2, Ag, and TiO2 on the murine gut microbiome. Nanotoxicology 2021; 15:934-950. [PMID: 34380002 DOI: 10.1080/17435390.2021.1940339] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]
Abstract
Rodent studies on the effects of engineered nanomaterials (ENM) on the gut microbiome have revealed contradictory results. Our aim was to assess the effects of four well-investigated model ENM using a realistic exposure scenario. Two independent ad libitum feeding studies were performed. In study 1, female mice from the local breeding facility received feed pellets containing 1% CeO2 or 1% SiO2 for three weeks. In study 2, both female and male mice were purchased and exposed to 0.2% Ag-PVP or 1% TiO2 for four weeks. A next generation 16S rDNA sequencing-based approach was applied to assess impacts on the gut microbiome. None of the ENM had an effect on the α- or β-diversity. A decreased relative abundance of the phylum Actinobacteria was observed in SiO2 exposed mice. In female mice, the relative abundance of the genus Roseburia was increased with Ag exposure. Furthermore, in study 2, a sex-related difference in the β-diversity was observed. A difference in the β-diversity was also shown between the female control mice of the two studies. We did not find major effects on the gut microbiome. This contrast to other studies may be due to variations in the study design. Our investigation underlined the important role of the sex of test animals and their microbiome composition prior to ENM exposure initiation. Hence, standardization of microbiome studies is strongly required to increase comparability. The ENM-specific effects on Actinobacteria and Roseburia, two taxa pivotal for the human gut homeostasis, warrant further research on their relevance for health.
Affiliation(s)
- Gerrit Bredeck
- IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
| | - Angela A M Kämpfer
- IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
| | - Adriana Sofranko
- IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
| | - Tina Wahle
- IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
| | - Dominique Lison
- Louvain Centre for Toxicology and Applied Pharmacology, Université Catholique de Louvain, Brussels, Belgium
| | - Jérôme Ambroise
- Centre de Technologies Moléculaires Appliquées, Institut de Recherche Expérimentale et Clinique, Université Catholique de Louvain, Brussels, Belgium
| | - Burkhard Stahlmecke
- Institute for Energy and Environmental Technology e.V. (IUTA), Duisburg, Germany
| | - Catrin Albrecht
- IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
| | - Roel P F Schins
- IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
| |
|
50
|
Peschel S, Müller CL, von Mutius E, Boulesteix AL, Depner M. NetCoMi: network construction and comparison for microbiome data in R. Brief Bioinform 2021; 22:bbaa290. [PMID: 33264391 PMCID: PMC8293835 DOI: 10.1093/bib/bbaa290] [Citation(s) in RCA: 146] [Impact Index Per Article: 48.7] [Received: 07/17/2020] [Revised: 09/24/2020] [Accepted: 10/07/2020] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data normalization and computing microbial associations. Since microbial interactions are likely to change between conditions, e.g. between healthy individuals and patients, identifying network differences between groups is often an integral secondary analysis step. Thus far, however, no unifying computational tool is available that facilitates the whole analysis workflow of constructing, analysing and comparing microbial association networks from high-throughput sequencing data. RESULTS Here, we introduce NetCoMi (Network Construction and comparison for Microbiome data), an R package that integrates existing methods for each analysis step in a single reproducible computational workflow. The package offers functionality for constructing and analysing single microbial association networks as well as quantifying network differences. This enables insights into whether single taxa, groups of taxa or the overall network structure change between groups. NetCoMi also contains functionality for constructing differential networks, thus allowing to assess whether single pairs of taxa are differentially associated between two groups. Furthermore, NetCoMi facilitates the construction and analysis of dissimilarity networks of microbiome samples, enabling a high-level graphical summary of the heterogeneity of an entire microbiome sample collection. We illustrate NetCoMi's wide applicability using data sets from the GABRIELA study to compare microbial associations in settled dust from children's rooms between samples from two study centers (Ulm and Munich). 
AVAILABILITY R scripts used for producing the examples shown in this manuscript are provided as supplementary data. The NetCoMi package, together with a tutorial, is available at https://github.com/stefpeschel/NetCoMi. CONTACT Tel:+49 89 3187 43258; stefanie.peschel@mail.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Briefings in Bioinformatics online.
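The workflow steps listed in this abstract (zero handling, normalization, computing associations, then comparing networks between groups) can be sketched generically as below. This is not the NetCoMi API, which is an R package; it is a plain-Python illustration in which every function name is hypothetical, and the specific choices (a pseudocount for zeros, a centered log-ratio transform, Pearson correlation, a hard threshold of 0.8) are assumptions picked for brevity.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample; the pseudocount handles zeros."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def association_edges(count_table, taxa, threshold=0.8):
    """Edges between taxa whose CLR profiles correlate with |r| >= threshold.

    `count_table` has one row per sample and one column per taxon.
    """
    transformed = [clr(row) for row in count_table]
    columns = list(zip(*transformed))
    edges = {}
    for i in range(len(taxa)):
        for j in range(i + 1, len(taxa)):
            r = pearson(columns[i], columns[j])
            if abs(r) >= threshold:
                edges[(taxa[i], taxa[j])] = r
    return edges


def differing_edges(edges_a, edges_b):
    """Taxon pairs connected in exactly one of the two networks."""
    return set(edges_a) ^ set(edges_b)


# Hypothetical count tables for two groups (rows = samples, columns = taxa).
taxa = ["t1", "t2", "t3"]
group_a = [[10, 12, 3], [20, 25, 1], [5, 6, 9], [40, 38, 0]]
group_b = [[10, 2, 3], [20, 1, 8], [5, 30, 2], [40, 4, 6]]
net_a = association_edges(group_a, taxa)
net_b = association_edges(group_b, taxa)
changed = differing_edges(net_a, net_b)
```

Comparing edge sets is only the crudest form of network comparison; the abstract also mentions quantifying differences for single taxa, groups of taxa and overall structure, which this sketch does not attempt.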
Affiliation(s)
- Stefanie Peschel
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
| | - Christian L Müller
- Department of Statistics, LMU München, Munich, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
- Center for Computational Mathematics, Flatiron Institute, New York, USA
| | - Erika von Mutius
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
- Dr von Hauner Children’s Hospital, LMU München, Munich, Germany
- Comprehensive Pneumology Center Munich (CPC-M), Member of the German Center for Lung Research, Munich, Germany
| | - Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU München, Munich, Germany
| | - Martin Depner
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
| |
|