1. Chang CC, Liu TC, Lu CJ, Chiu HC, Lin WN. Explainable machine learning model for identifying key gut microbes and metabolites biomarkers associated with myasthenia gravis. Comput Struct Biotechnol J 2024; 23:1572-1583. [PMID: 38650589 PMCID: PMC11035017 DOI: 10.1016/j.csbj.2024.04.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Revised: 03/14/2024] [Accepted: 04/07/2024] [Indexed: 04/25/2024] [Open access]
Abstract
Diagnostic markers for myasthenia gravis (MG) are limited; thus, innovative approaches are required for supportive diagnosis and personalized care. Gut microbes are associated with MG pathogenesis; however, few studies have adopted machine learning (ML) to identify the associations among MG, gut microbiota, and metabolites. In this study, we developed an explainable ML model to predict biomarkers for MG diagnosis. We enrolled 19 MG patients and 10 non-MG individuals. Stool samples were collected and microbiome assessment was performed using 16S rRNA sequencing. Untargeted metabolic profiling was conducted to identify fecal amplicon sequence variants (ASVs) and metabolites. We developed an explainable ML model in which the top ASVs and metabolites are combined to identify the best predictive performance. This model uses the SHapley Additive exPlanations method to generate both global and personalized explanations. Fecal microbe-metabolite composition differed significantly between groups. The key bacterial families were Lachnospiraceae and Ruminococcaceae, and the top three features were Lachnospiraceae, inosine, and methylhistidine. An ML model trained with the top 1% ASVs and top 15% metabolites combined outperformed all other models. Personalized explanations revealed different patterns of microbe-metabolite contributions in patients with MG. The integration of the microbiota-metabolite features and the development of an explainable ML framework can accurately identify MG and provide personalized explanations, revealing the associations between gut microbiota, metabolites, and MG. An online calculator employing this algorithm was developed that provides a streamlined interface for MG diagnosis screening and conducting personalized evaluations.
Affiliation(s)
- Che-Cheng Chang
  - PhD Program in Nutrition and Food Science, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan
  - Graduate Institute of Biomedical and Pharmaceutical Science, Fu Jen Catholic University, New Taipei City, Taiwan
- Tzu-Chi Liu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Chi-Jie Lu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
  - Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Information Management, Fu Jen Catholic University, New Taipei City, Taiwan
- Hou-Chang Chiu
  - School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Neurology, Taipei Medical University, Shuang-Ho Hospital, New Taipei City, Taiwan
- Wei-Ning Lin
  - PhD Program in Nutrition and Food Science, Fu Jen Catholic University, New Taipei City, Taiwan
  - Graduate Institute of Biomedical and Pharmaceutical Science, Fu Jen Catholic University, New Taipei City, Taiwan
2. Ezra S, Bashan A. Network impact of a single-time-point microbial sample. PLoS One 2024; 19:e0301683. [PMID: 38814902 PMCID: PMC11139317 DOI: 10.1371/journal.pone.0301683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 03/20/2024] [Indexed: 06/01/2024] [Open access]
Abstract
The human microbiome plays a crucial role in determining our well-being and can significantly influence human health. The individualized nature of the microbiome may reveal host-specific information about the health state of the subject. In particular, the microbiome is an ecosystem shaped by a tangled network of species-species and host-species interactions. Thus, analysis of the ecological balance of microbial communities can provide insights into these underlying interrelations. However, traditional methods for network analysis require many samples, while in practice only a single-time-point microbial sample is available in clinical screening. Recently, a method for the analysis of a single-time-point sample, which evaluates its 'network impact' with respect to a reference cohort, has been applied to analyze microbial samples from women with Gestational Diabetes Mellitus. Here, we introduce different variations of the network impact approach and systematically study their performance using simulated 'samples' fabricated via the Generalized Lotka-Volterra model of ecological dynamics. We show that the network impact of a single sample captures the effect of the interactions between the species, and thus can be applied to anomaly detection of shuffled samples, which are 'normal' in terms of species abundance but 'abnormal' in terms of species-species interrelations. In addition, we demonstrate the use of the network impact in binary and multiclass classifications, where the reference cohorts have similar abundance profiles but different species-species interactions. Individualized analysis of the human microbiome has the potential to improve diagnosis and personalized treatments.
Affiliation(s)
- Shir Ezra
  - Physics Department, Bar-Ilan University, Ramat Gan, Israel
- Amir Bashan
  - Physics Department, Bar-Ilan University, Ramat Gan, Israel
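
As an illustration of the simulation setup described in the Ezra and Bashan abstract above, here is a minimal Python sketch. It is not the authors' code. It assumes the standard generalized Lotka-Volterra form dx_i/dt = x_i (r_i + sum_j A_ij x_j), forward-Euler integration, and arbitrary parameter values. The names `simulate_glv` and `shuffle_samples` are hypothetical. The shuffling step permutes each species' abundances across samples, which keeps per-species abundance distributions intact while breaking species-species relations.

```python
import numpy as np

def simulate_glv(r, A, x0, dt=0.01, steps=5000):
    """Forward-Euler integration of dx_i/dt = x_i * (r_i + sum_j A_ij x_j)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x = x + dt * x * (r + A @ x)
        x = np.clip(x, 0.0, None)  # abundances cannot be negative
    return x

def shuffle_samples(samples, rng):
    """Permute each species (column) independently across samples (rows)."""
    shuffled = samples.copy()
    for j in range(shuffled.shape[1]):
        shuffled[:, j] = rng.permutation(shuffled[:, j])
    return shuffled

# Assumed toy parameters: weak random interactions, self-limitation of -1.
rng = np.random.default_rng(0)
n_species, n_samples = 5, 20
r = np.ones(n_species)
A = 0.05 * rng.normal(size=(n_species, n_species))
np.fill_diagonal(A, -1.0)

# Each simulated sample starts from a random subset of species (absent ones stay at 0).
samples = np.array([
    simulate_glv(r, A, rng.uniform(0.1, 1.0, n_species) * (rng.random(n_species) < 0.8))
    for _ in range(n_samples)
])
shuffled = shuffle_samples(samples, rng)
```

The shuffled cohort has the same per-species abundance values as the simulated one, so any method that separates the two must rely on co-occurrence structure rather than on abundance alone.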
3. Teixeira M, Silva F, Ferreira RM, Pereira T, Figueiredo C, Oliveira HP. A review of machine learning methods for cancer characterization from microbiome data. NPJ Precis Oncol 2024; 8:123. [PMID: 38816569 PMCID: PMC11139966 DOI: 10.1038/s41698-024-00617-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 05/17/2024] [Indexed: 06/01/2024] [Open access]
Abstract
Recent studies have shown that the microbiome can impact cancer development, progression, and response to therapies, suggesting microbiome-based approaches for cancer characterization. As cancer-related signatures are complex and implicate many taxa, their discovery often requires Machine Learning approaches. This review discusses Machine Learning methods for cancer characterization from microbiome data. It focuses on the implications of choices undertaken during sample collection, feature selection and pre-processing. It also discusses ML model selection, guiding how to choose an ML model, and model validation. Finally, it enumerates current limitations and how these may be surpassed. Proposed methods, often based on Random Forests, show promising results that are nevertheless insufficient for widespread clinical usage. Studies often report conflicting results, mainly due to ML models with poor generalizability. We expect that evaluating models with expanded, hold-out datasets, removing technical artifacts, exploring representations of the microbiome other than taxonomical profiles, leveraging advances in deep learning, and developing ML models better adapted to the characteristics of microbiome data will improve the performance and generalizability of models and enable their usage in the clinic.
Affiliation(s)
- Marco Teixeira
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Engineering, University of Porto, Porto, Portugal
- Francisco Silva
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Science, University of Porto, Porto, Portugal
- Rui M Ferreira
  - Ipatimup - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde, University of Porto, Porto, Portugal
- Tania Pereira
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
- Ceu Figueiredo
  - Ipatimup - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde, University of Porto, Porto, Portugal
  - Faculty of Medicine, University of Porto, Porto, Portugal
- Hélder P Oliveira
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Science, University of Porto, Porto, Portugal
4. Hagen M, Dass R, Westhues C, Blom J, Schultheiss SJ, Patz S. Interpretable machine learning decodes soil microbiome's response to drought stress. Environmental Microbiome 2024; 19:35. [PMID: 38812054 PMCID: PMC11138018 DOI: 10.1186/s40793-024-00578-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 05/10/2024] [Indexed: 05/31/2024]
Abstract
BACKGROUND Extreme weather events induced by climate change, particularly droughts, have detrimental consequences for crop yields and food security. Concurrently, these conditions provoke substantial changes in the soil bacterial microbiota and affect plant health. Early recognition of soil affected by drought enables farmers to implement appropriate agricultural management practices. In this context, interpretable machine learning holds immense potential for drought stress classification of soil based on marker taxa. RESULTS This study demonstrates that the 16S rRNA-based metagenomic approach of Differential Abundance Analysis methods and machine learning-based Shapley Additive Explanation values provide similar information. They exhibit their potential as complementary approaches for identifying marker taxa and investigating their enrichment or depletion under drought stress in grass lineages. Additionally, the Random Forest Classifier trained on a diverse range of relative abundance data from the soil bacterial microbiome of various plant species achieves a high accuracy of 92.3% at the genus rank for drought stress prediction. It demonstrates its generalization capacity for the lineages tested. CONCLUSIONS In the detection of drought stress in soil bacterial microbiota, this study emphasizes the potential of an optimized and generalized location-based ML classifier. By identifying marker taxa, this approach holds promising implications for microbe-assisted plant breeding programs and contributes to the development of sustainable agriculture practices. These findings are crucial for preserving global food security in the face of climate change.
Affiliation(s)
- Michelle Hagen
  - Computomics GmbH, Eisenbahnstraße 1, 72072, Tübingen, Baden-Württemberg, Germany
- Rupashree Dass
  - Computomics GmbH, Eisenbahnstraße 1, 72072, Tübingen, Baden-Württemberg, Germany
- Cathy Westhues
  - Computomics GmbH, Eisenbahnstraße 1, 72072, Tübingen, Baden-Württemberg, Germany
- Jochen Blom
  - Bioinformatics and Systems Biology, Justus Liebig University Gießen, Heinrich-Buff-Ring 58, 35390, Gießen, Hesse, Germany
- Sascha Patz
  - Computomics GmbH, Eisenbahnstraße 1, 72072, Tübingen, Baden-Württemberg, Germany
5. Bombaywala S, Bajaj A, Dafale NA. Meta-analysis of wastewater microbiome for antibiotic resistance profiling. J Microbiol Methods 2024; 223:106953. [PMID: 38754482 DOI: 10.1016/j.mimet.2024.106953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 05/12/2024] [Accepted: 05/12/2024] [Indexed: 05/18/2024]
Abstract
The microbial composition and stress molecules are the main drivers influencing the development and spread of antibiotic-resistant bacteria (ARBs) and antibiotic resistance genes (ARGs) in the environment. A reliable and rapid method for identifying associations between microbiome composition and resistome remains challenging. In the present study, secondary metagenome data of sewage and hospital wastewaters were assessed for differential taxonomic and ARG profiling. Subsequently, Random Forest (RF)-based ML models were used to predict ARG profiles based on taxonomic composition, and the models were validated on hospital wastewaters. Total ARG abundance was significantly higher in hospital wastewaters (15 ppm) than in sewage (5 ppm), while resistance towards methicillin, carbapenem, and fluoroquinolone was predominant. Although Pseudomonas constituted a major fraction, Streptomyces, Enterobacter, and Klebsiella were characteristic of hospital wastewaters. Prediction modeling showed that the relative abundance of the pathogenic genera Escherichia, Vibrio, and Pseudomonas contributed most towards variations in total ARG count. Moreover, the model was able to identify host-specific patterns for contributing taxa and related ARGs with >90% accuracy in predicting the ARG subtype abundance. More than 80% accuracy was obtained for hospital wastewaters, demonstrating that the model can be validly extrapolated to different types of wastewater systems. Findings from the study showed that the ML approach could identify the ARG profile based on bacterial composition, including 16S rDNA amplicon data, and can serve as a viable alternative to metagenomic binning for identification of potential hosts of ARGs. Overall, this study demonstrates the promising application of ML techniques for predicting the spread of ARGs and provides guidance for early warning of ARB emergence.
Affiliation(s)
- Sakina Bombaywala
  - Environmental Biotechnology & Genomics Division, CSIR-National Environmental Engineering Research Institute (NEERI), Nehru Marg, Nagpur 440020, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Abhay Bajaj
  - Environmental Biotechnology & Genomics Division, CSIR-National Environmental Engineering Research Institute (NEERI), Nehru Marg, Nagpur 440020, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Nishant A Dafale
  - Environmental Biotechnology & Genomics Division, CSIR-National Environmental Engineering Research Institute (NEERI), Nehru Marg, Nagpur 440020, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
6. Gorman ED, Lladser ME. Interpretable metric learning in comparative metagenomics: The adaptive Haar-like distance. PLoS Comput Biol 2024; 20:e1011543. [PMID: 38768195 PMCID: PMC11142682 DOI: 10.1371/journal.pcbi.1011543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 05/31/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024] [Open access]
Abstract
Random forests have emerged as a promising tool in comparative metagenomics because they can predict environmental characteristics based on microbial composition in datasets where β-diversity metrics fall short of revealing meaningful relationships between samples. Nevertheless, despite this efficacy, they lack biological insight in tandem with their predictions, potentially hindering scientific advancement. To overcome this limitation, we leverage a geometric characterization of random forests to introduce a data-driven phylogenetic β-diversity metric, the adaptive Haar-like distance. This new metric assigns a weight to each internal node (i.e., split or bifurcation) of a reference phylogeny, indicating the relative importance of that node in discerning environmental samples based on their microbial composition. Alongside this, a weighted nearest-neighbors classifier, constructed using the adaptive metric, can be used as a proxy for the random forest while maintaining accuracy on par with that of the original forest and another state-of-the-art classifier, CoDaCoRe. As shown in datasets from diverse microbial environments, however, the new metric and classifier significantly enhance the biological interpretability and visualization of high-dimensional metagenomic samples.
Affiliation(s)
- Evan D. Gorman
  - Department of Applied Mathematics, University of Colorado, Boulder, Colorado, United States of America
- Manuel E. Lladser
  - Department of Applied Mathematics, University of Colorado, Boulder, Colorado, United States of America
7. Yerke A, Fry Brumit D, Fodor AA. Proportion-based normalizations outperform compositional data transformations in machine learning applications. Microbiome 2024; 12:45. [PMID: 38443997 PMCID: PMC10913632 DOI: 10.1186/s40168-023-01747-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 12/22/2023] [Indexed: 03/07/2024]
Abstract
BACKGROUND Normalization, as a pre-processing step, can significantly affect the resolution of machine learning analysis for microbiome studies. There are countless options for normalization scheme selection. In this study, we examined compositionally aware algorithms including the additive log ratio (alr), the centered log ratio (clr), and a recent evolution of the isometric log ratio (ilr) in the form of balance trees made with the PhILR R package. We also looked at compositionally naïve transformations such as raw counts tables and several transformations that are based on relative abundance, such as proportions, the Hellinger transformation, and a transformation based on the logarithm of proportions (which we call "lognorm"). RESULTS In our evaluation, we used 65 metadata variables culled from four publicly available datasets at the amplicon sequence variant (ASV) level with a random forest machine learning algorithm. We found that different common pre-processing steps in the creation of the balance trees made very little difference in overall performance. Overall, we found that the compositionally aware data transformations such as alr, clr, and ilr (PhILR) performed generally slightly worse or only as well as compositionally naïve transformations. However, relative abundance-based transformations outperformed most other transformations by a small but reliably statistically significant margin. CONCLUSIONS Our results suggest that minimizing the complexity of transformations while correcting for read depth may be a generally preferable strategy in preparing data for machine learning compared to more sophisticated, but more complex, transformations that attempt to better correct for compositionality.
Affiliation(s)
- Aaron Yerke
  - Department of Bioinformatics and Genomics, Bioinformatics Building, UNC Charlotte, The University of North Carolina, Charlotte 9331 Robert D. Snyder Rd, Charlotte, USA
  - Food Components and Health Laboratory, USDA, ARS, Beltsville Human Nutrition Research Center, Beltsville, USA
- Daisy Fry Brumit
  - Department of Bioinformatics and Genomics, Bioinformatics Building, UNC Charlotte, The University of North Carolina, Charlotte 9331 Robert D. Snyder Rd, Charlotte, USA
- Anthony A Fodor
  - Department of Bioinformatics and Genomics, Bioinformatics Building, UNC Charlotte, The University of North Carolina, Charlotte 9331 Robert D. Snyder Rd, Charlotte, USA
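
For readers unfamiliar with the transformations named in the Yerke et al. abstract above, the following Python sketch shows textbook definitions of proportions, the Hellinger transformation, clr, and alr on a samples-by-taxa count table. It is an illustration only and not the authors' pipeline: the function names, the pseudocount of 1 used to avoid log(0), and the choice of the last taxon as the alr reference are all assumptions. The paper's "lognorm" and PhILR balance trees are omitted because the abstract does not define them.

```python
import numpy as np

def proportions(counts):
    """Relative abundance: each sample (row) is divided by its read depth."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def hellinger(counts):
    """Hellinger transformation: square root of the proportions."""
    return np.sqrt(proportions(counts))

def clr(counts, pseudocount=1.0):
    """Centered log ratio: log counts minus the per-sample mean log count."""
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

def alr(counts, ref=-1, pseudocount=1.0):
    """Additive log ratio against one reference taxon, which is then dropped."""
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return np.delete(logged - logged[:, [ref]], ref, axis=1)

# Toy table: 2 samples (rows) by 3 taxa (columns).
counts = np.array([[10, 0, 90], [5, 5, 40]])
props = proportions(counts)
```

Proportions and the Hellinger transformation only correct for read depth, whereas clr and alr re-express each sample relative to an internal reference, which is the distinction the abstract draws between compositionally naïve and compositionally aware methods.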
8. Gradisteanu Pircalabioru G, Raileanu M, Dionisie MV, Lixandru-Petre IO, Iliescu C. Fast detection of bacterial gut pathogens on miniaturized devices: an overview. Expert Rev Mol Diagn 2024; 24:201-218. [PMID: 38347807 DOI: 10.1080/14737159.2024.2316756] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/17/2023] [Accepted: 02/06/2024] [Indexed: 03/23/2024]
Abstract
INTRODUCTION Gut microbes pose challenges like colon inflammation, deadly diarrhea, antimicrobial resistance dissemination, and chronic disease onset. Development of early, rapid and specific diagnosis tools is essential for improving infection control. Point-of-care testing (POCT) systems offer rapid, sensitive, low-cost and sample-to-answer methods for microbe detection from various clinical and environmental samples, bringing the advantages of portability, automation, and simple operation. AREAS COVERED Rapid detection of gut microbes can be done using a wide array of techniques including biosensors, immunological assays, electrochemical impedance spectroscopy, mass spectrometry and molecular biology. Inclusion of Internet of Things, machine learning, and smartphone-based point-of-care applications is an important aspect of POCT. In this review, the authors discuss various fast diagnostic platforms for gut pathogens and their main challenges. EXPERT OPINION Developing effective assays for microbe detection can be complex. Assay design must consider factors like target selection, real-time and multiplex detection, sample type, reagent stability and storage, primer/probe design, and optimizing reaction conditions for accuracy and sensitivity. Mitigating these challenges requires interdisciplinary collaboration among scientists, clinicians, engineers, and industry partners. Future efforts are essential to enhance sensitivity, specificity, and versatility of POCT systems for gut microbe detection and quantification, advancing infectious disease diagnostics and management.
Affiliation(s)
- Gratiela Gradisteanu Pircalabioru
  - eBio-hub Research Centre, National University of Science and Technology "Politehnica" Bucharest, Bucharest, Romania
  - Division of Earth, Environmental and Life Sciences, The Research Institute of University of Bucharest (ICUB), Bucharest, Romania
  - Academy of Romanian Scientists, Bucharest, Romania
- Mina Raileanu
  - eBio-hub Research Centre, National University of Science and Technology "Politehnica" Bucharest, Bucharest, Romania
  - Department of Life and Environmental Physics, Horia Hulubei National Institute of Physics and Nuclear Engineering, Magurele, Romania
- Mihai Viorel Dionisie
  - eBio-hub Research Centre, National University of Science and Technology "Politehnica" Bucharest, Bucharest, Romania
- Irina-Oana Lixandru-Petre
  - eBio-hub Research Centre, National University of Science and Technology "Politehnica" Bucharest, Bucharest, Romania
- Ciprian Iliescu
  - eBio-hub Research Centre, National University of Science and Technology "Politehnica" Bucharest, Bucharest, Romania
  - Academy of Romanian Scientists, Bucharest, Romania
  - Microsystems in Biomedical and Environmental Applications, National Research and Development Institute for Microtechnology, Bucharest, Romania
9. Brochu HN, Smith E, Jeong S, Carlson M, Hansen SG, Tisoncik-Go J, Law L, Picker LJ, Gale M, Peng X. Pre-challenge gut microbial signature predicts RhCMV/SIV vaccine efficacy in rhesus macaques. bioRxiv 2024:2024.02.27.582186. [PMID: 38464179 PMCID: PMC10925241 DOI: 10.1101/2024.02.27.582186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Background RhCMV/SIV vaccines protect ∼59% of vaccinated rhesus macaques against repeated limiting-dose intra-rectal exposure with highly pathogenic SIVmac239M, but the exact mechanism responsible for the vaccine efficacy is not known. It is becoming evident that complex interactions exist between gut microbiota and the host immune system. Here we aimed to investigate if the rhesus gut microbiome impacts RhCMV/SIV vaccine-induced protection. Methods Three groups of 15 rhesus macaques naturally pre-exposed to RhCMV were vaccinated with RhCMV/SIV vaccines. Rectal swabs were collected longitudinally both before SIV challenge (after vaccination) and post challenge and were profiled using 16S rRNA based microbiome analysis. Results We identified ∼2,400 16S rRNA amplicon sequence variants (ASVs), representing potential bacterial species/strains. Global gut microbial profiles were strongly associated with each of the three vaccination groups, and all animals tended to maintain consistent profiles throughout the pre-challenge phase. Despite vaccination group differences, using newly developed compositional data analysis techniques we identified a common gut microbial signature predictive of vaccine protection outcome across the three vaccination groups. Part of this microbial signature persisted even after SIV challenge. We also observed a strong correlation between this microbial signature and an early signature derived from whole blood transcriptomes in the same animals. Conclusions Our findings indicate that changes in gut microbiomes are associated with RhCMV/SIV vaccine-induced protection and early host response to vaccination in rhesus macaques.
10. Baddal B, Taner F, Uzun Ozsahin D. Harnessing of Artificial Intelligence for the Diagnosis and Prevention of Hospital-Acquired Infections: A Systematic Review. Diagnostics (Basel) 2024; 14:484. [PMID: 38472956 DOI: 10.3390/diagnostics14050484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 01/23/2024] [Accepted: 02/19/2024] [Indexed: 03/14/2024] [Open access]
Abstract
Healthcare-associated infections (HAIs) are the most common adverse events in healthcare and constitute a major global public health concern. Surveillance represents the foundation for the effective prevention and control of HAIs, yet conventional surveillance is costly and labor intensive. Artificial intelligence (AI) and machine learning (ML) have the potential to support the development of HAI surveillance algorithms for the understanding of HAI risk factors, the improvement of patient risk stratification as well as the prediction and timely detection and prevention of infections. AI-supported systems have so far been explored for clinical laboratory testing and imaging diagnosis, antimicrobial resistance profiling, antibiotic discovery and prediction-based clinical decision support tools in terms of HAIs. This review aims to provide a comprehensive summary of the current literature on AI applications in the field of HAIs and discuss the future potentials of this emerging technology in infection practice. Following the PRISMA guidelines, this study examined the articles in databases including PubMed and Scopus until November 2023, which were screened based on the inclusion and exclusion criteria, resulting in 162 included articles. By elucidating the advancements in the field, we aim to highlight the potential applications of AI in the field, report related issues and shortcomings and discuss the future directions.
Affiliation(s)
- Buket Baddal
  - Department of Medical Microbiology and Clinical Microbiology, Faculty of Medicine, Near East University, North Cyprus, Mersin 10, 99138 Nicosia, Turkey
  - DESAM Research Institute, Near East University, North Cyprus, Mersin 10, 99138 Nicosia, Turkey
- Ferdiye Taner
  - Department of Medical Microbiology and Clinical Microbiology, Faculty of Medicine, Near East University, North Cyprus, Mersin 10, 99138 Nicosia, Turkey
  - DESAM Research Institute, Near East University, North Cyprus, Mersin 10, 99138 Nicosia, Turkey
- Dilber Uzun Ozsahin
  - Department of Medical Diagnostic Imaging, College of Health Science, University of Sharjah, Sharjah 27272, United Arab Emirates
  - Research Institute for Medical and Health Sciences, University of Sharjah, Sharjah 27272, United Arab Emirates
  - Operational Research Centre in Healthcare, Near East University, North Cyprus, Mersin 10, 99138 Nicosia, Turkey
11. Walsh C, Stallard-Olivera E, Fierer N. Nine (not so simple) steps: a practical guide to using machine learning in microbial ecology. mBio 2024; 15:e0205023. [PMID: 38126787 PMCID: PMC10865974 DOI: 10.1128/mbio.02050-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023] [Open access]
Abstract
Due to the complex nature of microbiome data, the field of microbial ecology has many current and potential uses for machine learning (ML) modeling. With the increased use of predictive ML models across many disciplines, including microbial ecology, there is extensive published information on the specific ML algorithms available and how those algorithms have been applied. Thus, our goal is not to summarize the breadth of ML models available or compare their performances. Rather, our goal is to provide more concrete and actionable information to guide microbial ecologists in how to select, run, and interpret ML algorithms to predict the taxa or genes associated with particular sample categories or environmental gradients of interest. Such microbial data often have unique characteristics that require careful consideration of how to apply ML models and how to interpret the associated results. This review is intended for practicing microbial ecologists who may be unfamiliar with some of the intricacies of ML models. We provide examples and discuss common opportunities and pitfalls specific to applying ML models to the types of data sets most frequently collected by microbial ecologists.
Affiliation(s)
- Corinne Walsh
  - Cooperative Institute of Research in Environmental Sciences, CU Boulder, Boulder, Colorado, USA
  - Ecology and Evolutionary Biology Department, CU Boulder, Boulder, Colorado, USA
- Elías Stallard-Olivera
  - Cooperative Institute of Research in Environmental Sciences, CU Boulder, Boulder, Colorado, USA
  - Ecology and Evolutionary Biology Department, CU Boulder, Boulder, Colorado, USA
- Noah Fierer
  - Cooperative Institute of Research in Environmental Sciences, CU Boulder, Boulder, Colorado, USA
  - Ecology and Evolutionary Biology Department, CU Boulder, Boulder, Colorado, USA
12. Soldan R, Fusi M, Cardinale M, Homma F, Santos LG, Wenzl P, Bach-Pages M, Bitocchi E, Chacon Sanchez MI, Daffonchio D, Preston GM. Consistent effects of independent domestication events on the plant microbiota. Curr Biol 2024; 34:557-567.e4. [PMID: 38232731 DOI: 10.1016/j.cub.2023.12.056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 12/01/2023] [Accepted: 12/18/2023] [Indexed: 01/19/2024]
Abstract
The effect of plant domestication on plant-microbe interactions remains difficult to prove. In this study, we provide evidence of a domestication effect on the composition and abundance of the plant microbiota. We focused on the genus Phaseolus, which underwent four independent domestication events within two species (P. vulgaris and P. lunatus), providing multiple replicates of a process spanning thousands of years. We targeted Phaseolus seeds to identify a link between domesticated traits and bacterial community composition as Phaseolus seeds have been subject to large and consistent phenotypic changes during these independent domestication events. The seed bacterial communities of representative plant accessions from subpopulations descended from each domestication event were analyzed under controlled and field conditions. The results showed that independent domestication events led to similar seed bacterial community signatures in independently domesticated plant populations, which could be partially explained by selection for common domesticated plant phenotypes. Our results therefore provide evidence of a consistent effect of plant domestication on seed microbial community composition and abundance and offer avenues for applying knowledge of the impact of plant domestication on the plant microbiota to improve microbial applications in agriculture.
Affiliation(s)
- Marco Fusi
- Center for Conservation and Restoration Science, Edinburgh Napier University, Edinburgh, UK
- Massimiliano Cardinale
- University of Salento, Department of Biological and Environmental Sciences and Technologies, Lecce, Italy
- Felix Homma
- University of Oxford, Department of Biology, Oxford, UK
- Luis Guillermo Santos
- The Alliance Biodiversity International and the International Center for Tropical Agriculture (CIAT), Palmira, Colombia
- Peter Wenzl
- The Alliance Biodiversity International and the International Center for Tropical Agriculture (CIAT), Palmira, Colombia
- Elena Bitocchi
- Dipartimento di Scienze Agrarie, Alimentari ed Ambientali, Università Politecnica delle Marche, Ancona, Italy
- Maria Isabel Chacon Sanchez
- Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Bogotá, Colombia
- Daniele Daffonchio
- Red Sea Research Center (RSRC), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Gail M Preston
- University of Oxford, Department of Biology, Oxford, UK
13
Yang MQ, Wang ZJ, Zhai CB, Chen LQ. Research progress on the application of 16S rRNA gene sequencing and machine learning in forensic microbiome individual identification. Front Microbiol 2024; 15:1360457. [PMID: 38371926 PMCID: PMC10869621 DOI: 10.3389/fmicb.2024.1360457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Accepted: 01/23/2024] [Indexed: 02/20/2024] Open
Abstract
Forensic microbiome research is a field with a wide range of applications and a number of protocols have been developed for its use in this area of research. As individuals host radically different microbiota, the human microbiome is expected to become a new biomarker for forensic identification. To achieve an effective use of this procedure an understanding of factors which can alter the human microbiome and determinations of stable and changing elements will be critical in selecting appropriate targets for investigation. The 16S rRNA gene, which is notable for its conservation and specificity, represents a potentially ideal marker for forensic microbiome identification. Gene sequencing involving 16S rRNA is currently the method of choice for use in investigating microbiomes. While the sequencing involved with microbiome determinations can generate large multi-dimensional datasets that can be difficult to analyze and interpret, machine learning methods can be useful in surmounting this analytical challenge. In this review, we describe the research methods and related sequencing technologies currently available for application of 16S rRNA gene sequencing and machine learning in the field of forensic identification. In addition, we assess the potential value of 16S rRNA and machine learning in forensic microbiome science.
Affiliation(s)
- Mai-Qing Yang
- Department of Pathology, Weifang People's Hospital (First Affiliated Hospital of Shandong Second Medical University), Weifang, China
- Zheng-Jiang Wang
- Department of Pathology, Weifang People's Hospital (First Affiliated Hospital of Shandong Second Medical University), Weifang, China
- Chun-Bo Zhai
- Department of Second Ward of Thoracic Surgery, Weifang People's Hospital (First Affiliated Hospital of Shandong Second Medical University), Weifang, China
- Li-Qian Chen
- Department of Pathology, Weifang People's Hospital (First Affiliated Hospital of Shandong Second Medical University), Weifang, China
14
Rojas-Velazquez D, Kidwai S, Kraneveld AD, Tonda A, Oberski D, Garssen J, Lopez-Rincon A. Methodology for biomarker discovery with reproducibility in microbiome data using machine learning. BMC Bioinformatics 2024; 25:26. [PMID: 38225565 PMCID: PMC10789030 DOI: 10.1186/s12859-024-05639-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 01/04/2024] [Indexed: 01/17/2024] Open
Abstract
BACKGROUND: In recent years, human microbiome studies have received increasing attention as this field is considered a potential source for clinical applications. With the advancements in omics technologies and AI, research focused on the discovery of potential biomarkers in the human microbiome using machine learning tools has produced positive outcomes. Despite the promising results, several issues can still be found in these studies, such as datasets with small numbers of samples, inconsistent results, a lack of uniform processing and methodologies, and other factors that lead to a lack of reproducibility in biomedical research. In this work, we propose a methodology that combines the DADA2 pipeline for 16S rRNA sequence processing and Recursive Ensemble Feature Selection (REFS) in multiple datasets to increase reproducibility and obtain robust and reliable results in biomedical research. RESULTS: Three experiments were performed analyzing microbiome data from patients/cases in Inflammatory Bowel Disease (IBD), Autism Spectrum Disorder (ASD), and Type 2 Diabetes (T2D). In each experiment, we found a biomarker signature in one dataset and applied it to two others as further validation. The effectiveness of the proposed methodology was compared with other feature selection methods such as K-Best with F-score and random selection as a baseline. The Area Under the Curve (AUC) was employed as a measure of diagnostic accuracy and used as a metric for comparing the results of the proposed methodology with other feature selection methods. Additionally, we use the Matthews Correlation Coefficient (MCC) as a metric to evaluate the performance of the methodology as well as for comparison with other feature selection methods. CONCLUSIONS: We developed a methodology for reproducible biomarker discovery for 16S rRNA microbiome sequence analysis, addressing the issues related to data dimensionality, inconsistent results and validation across independent datasets. The findings from the three experiments, across 9 different datasets, show that the proposed methodology achieved higher accuracy compared to other feature selection methods. This methodology is a first approach to increase reproducibility and to provide robust and reliable results.
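The two evaluation metrics named in this abstract, AUC and MCC, can be illustrated with a minimal Python sketch. The labels and scores are made-up toy values, not data from the study, and the helper names `auc` and `mcc` are hypothetical. The AUC is computed through its rank (Mann-Whitney) interpretation rather than with any particular library.

```python
import math

def auc(labels, scores):
    """AUC as the chance that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mcc(labels, predictions):
    """Matthews Correlation Coefficient from binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Toy data (assumed for illustration only): 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an arbitrary choice

toy_auc = auc(labels, scores)
toy_mcc = mcc(labels, predictions)
```

AUC depends only on how the scores rank the samples, while MCC depends on the chosen decision threshold, which is one reason the two are often reported together.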
Affiliation(s)
- David Rojas-Velazquez
- Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, University of Utrecht, Utrecht, The Netherlands
- Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Sarah Kidwai
- Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, University of Utrecht, Utrecht, The Netherlands
- Aletta D Kraneveld
- Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, University of Utrecht, Utrecht, The Netherlands
- Department of Neuroscience, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Alberto Tonda
- UMR 518 MIA - PS, INRAE, Institut des Systèmes Complexes de Paris, Île - de - France (ISC-PIF) - UAR 3611 CNRS, Université Paris-Saclay, Paris, France
- Daniel Oberski
- Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Johan Garssen
- Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, University of Utrecht, Utrecht, The Netherlands
- Global Centre of Excellence Immunology, Danone Nutricia Research, Utrecht, The Netherlands
- Alejandro Lopez-Rincon
- Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, University of Utrecht, Utrecht, The Netherlands
- Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
15
Zhang Y, Wu H, Xu R, Wang Y, Chen L, Wei C. Machine learning modeling for the prediction of phosphorus and nitrogen removal efficiency and screening of crucial microorganisms in wastewater treatment plants. Sci Total Environ 2024; 907:167730. [PMID: 37852495 DOI: 10.1016/j.scitotenv.2023.167730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 10/08/2023] [Accepted: 10/08/2023] [Indexed: 10/20/2023]
Abstract
The effectiveness of wastewater treatment plants (WWTPs) is largely determined by the microbial community structure in their activated sludge (AS). Interactions among microbial communities in AS systems and their indirect effects on water quality changes are crucial for WWTP performance. However, there is currently no quantitative method to evaluate the contribution of microorganisms to the operating efficiency of WWTPs. Traditional assessments of WWTP performance are limited by experimental conditions, methods, and other factors, resulting in increased costs and experimental pollutants. Therefore, an effective method is needed to predict WWTP efficiency based on AS community structure and quantitatively evaluate the contribution of microorganisms in the AS system. This study evaluated and compared microbial communities and water quality changes from WWTPs worldwide by meta-analysis of published high-throughput sequencing data. Six machine learning (ML) models were utilized to predict the efficiency of phosphorus and nitrogen removal in WWTPs; among them, XGBoost showed the highest prediction accuracy. Cross-entropy was used to screen the crucial microorganisms related to phosphorus and nitrogen removal efficiency, and the modeling confirmed the reasonableness of the results. Thirteen genera with nitrogen and phosphorus cycling pathways obtained from the screening were considered highly appropriate for the simultaneous removal of phosphorus and nitrogen. The results showed that the microbes Haliangium, Vicinamibacteraceae, Tolumonas, and SWB02 are potentially crucial for phosphorus and nitrogen removal, as they may be involved in the process of phosphorus and nitrogen removal in sewage treatment plants. Overall, these findings have deepened our understanding of the relationship between microbial community structure and performance of WWTPs, indicating that microbial data should play a critical role in the future design of sewage treatment plants. 
The ML model of this study can efficiently screen crucial microbes associated with WWTP system performance, and it is promising for the discovery of potential microbial metabolic pathways.
Affiliation(s)
- Yinan Zhang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, PR China
- Haizhen Wu
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, PR China
- Rui Xu
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, PR China
- Ying Wang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, PR China
- Liping Chen
- School of Environment and Energy, South China University of Technology, Guangzhou Higher Education Mega Centre, Guangzhou 510006, PR China
- Chaohai Wei
- School of Environment and Energy, South China University of Technology, Guangzhou Higher Education Mega Centre, Guangzhou 510006, PR China
16
Peralta-Marzal LN, Rojas-Velazquez D, Rigters D, Prince N, Garssen J, Kraneveld AD, Perez-Pardo P, Lopez-Rincon A. A robust microbiome signature for autism spectrum disorder across different studies using machine learning. Sci Rep 2024; 14:814. [PMID: 38191575 PMCID: PMC10774349 DOI: 10.1038/s41598-023-50601-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 12/21/2023] [Indexed: 01/10/2024] Open
Abstract
Autism spectrum disorder (ASD) is a highly complex neurodevelopmental disorder characterized by deficits in sociability and repetitive behaviour; however, there is great heterogeneity in the comorbidities that accompany ASD. Recently, the gut microbiome has been pointed out as a plausible contributing factor for ASD development, as individuals diagnosed with ASD often suffer from intestinal problems and show a differentiated intestinal microbial composition. Nevertheless, gut microbiome studies in ASD rarely agree on the specific bacterial taxa involved in this disorder. Regarding the potential role of the gut microbiome in ASD pathophysiology, our aim is to investigate whether there is a set of bacterial taxa relevant for ASD classification by using a sibling-controlled dataset. Additionally, we aim to validate these results across two independent cohorts, as several confounding factors, such as lifestyle, influence both ASD and gut microbiome studies. A machine learning approach, recursive ensemble feature selection (REFS), was applied to 16S rRNA gene sequencing data from 117 subjects (60 ASD cases and 57 siblings), identifying 26 bacterial taxa that discriminate ASD cases from controls. The average area under the curve (AUC) of this specific set of bacteria in the sibling-controlled dataset was 81.6%. Moreover, we applied the selected bacterial taxa in a tenfold cross-validation scheme using two independent cohorts (a total of 223 samples: 125 ASD cases and 98 controls). We obtained average AUCs of 74.8% and 74%, respectively. Analysis of the gut microbiome using REFS identified a set of bacterial taxa that can be used to predict the ASD status of children in three distinct cohorts with AUC over 80% for the best-performing classifiers. Our results indicate that the gut microbiome has a strong association with ASD and should not be disregarded as a potential target for therapeutic interventions. Furthermore, the proposed approach can be used to identify microbiome signatures across other 16S rRNA gene sequencing datasets.
Affiliation(s)
- Lucia N Peralta-Marzal
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- David Rojas-Velazquez
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Douwe Rigters
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Naika Prince
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Johan Garssen
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Global Centre of Excellence Immunology, Danone Nutricia Research, Utrecht, The Netherlands
- Aletta D Kraneveld
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Department of Neuroscience, Faculty of Science, VU University, Amsterdam, The Netherlands
- Paula Perez-Pardo
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Alejandro Lopez-Rincon
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
17
Lyu R, Qu Y, Divaris K, Wu D. Methodological Considerations in Longitudinal Analyses of Microbiome Data: A Comprehensive Review. Genes (Basel) 2023; 15:51. [PMID: 38254941 PMCID: PMC11154524 DOI: 10.3390/genes15010051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 12/22/2023] [Accepted: 12/26/2023] [Indexed: 01/24/2024] Open
Abstract
Biological processes underlying health and disease are inherently dynamic and are best understood when characterized in a time-informed manner. In this comprehensive review, we discuss challenges inherent in time-series microbiome data analyses and compare available approaches and methods to overcome them. Appropriate handling of longitudinal microbiome data can shed light on important roles, functions, patterns, and potential interactions between large numbers of microbial taxa or genes in the context of health, disease, or interventions. We present a comprehensive review and comparison of existing microbiome time-series analysis methods, for both preprocessing and downstream analyses, including differential analysis, clustering, network inference, and trait classification. We posit that the careful selection and appropriate utilization of computational tools for longitudinal microbiome analyses can help advance our understanding of the dynamic host-microbiome relationships that underlie health-maintaining homeostases, progressions to disease-promoting dysbioses, as well as phases of physiologic development like those encountered in childhood.
Affiliation(s)
- Ruiqi Lyu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yixiang Qu
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Kimon Divaris
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Di Wu
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
18
Vänni P, Tejesvi MV, Paalanne N, Aagaard K, Ackermann G, Camargo CA, Eggesbø M, Hasegawa K, Hoen AG, Karagas MR, Kolho KL, Laursen MF, Ludvigsson J, Madan J, Ownby D, Stanton C, Stokholm J, Tapiainen T. Machine-learning analysis of cross-study samples according to the gut microbiome in 12 infant cohorts. mSystems 2023; 8:e0036423. [PMID: 37874156 PMCID: PMC10734493 DOI: 10.1128/msystems.00364-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 09/13/2023] [Indexed: 10/25/2023] Open
Abstract
IMPORTANCE There are challenges in merging microbiome data from diverse research groups due to the intricate and multifaceted nature of such data. To address this, we utilized a combination of machine-learning (ML) models to analyze 16S sequencing data from a substantial set of gut microbiome samples, sourced from 12 distinct infant cohorts that were gathered prospectively. Our initial focus was on the mode of delivery due to its prior association with changes in infant gut microbiomes. Through ML analysis, we demonstrated the effective merging and comparison of various gut microbiome data sets, facilitating the identification of robust microbiome biomarkers applicable across varied study populations.
Affiliation(s)
- Petri Vänni
- Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland
- Mysore V. Tejesvi
- Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland
- Ecology and Genetics, Faculty of Science, University of Oulu, Oulu, Finland
- Niko Paalanne
- Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland
- Department of Pediatrics and Adolescent Medicine, Oulu University Hospital, University of Oulu, Oulu, Finland
- Kjersti Aagaard
- Department of Obstetrics & Gynecology, Division of Maternal-Fetal Medicine, Baylor College of Medicine and Texas Children’s Hospital, Houston, Texas, USA
- Gail Ackermann
- Department of Pediatrics, University of California, San Diego, California, USA
- Carlos A. Camargo
- Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Merete Eggesbø
- Department of Climate and Environmental Health, Norwegian Institute of Public Health, Oslo, Norway
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Kohei Hasegawa
- Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Anne G. Hoen
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire, USA
- Margaret R. Karagas
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire, USA
- Kaija-Leena Kolho
- Children’s Hospital, University of Helsinki and HUS, Helsinki, Finland
- Martin F. Laursen
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
- Johnny Ludvigsson
- Crown Princess Victoria Children’s Hospital and Division of Pediatrics, Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
- Juliette Madan
- Department of Psychiatry, Dartmouth Hitchcock Medical Center, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA
- Department of Pediatrics, Dartmouth Hitchcock Medical Center, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA
- Dennis Ownby
- Medical College of Georgia, Augusta, Georgia, USA
- Catherine Stanton
- Teagasc Food Research Centre & APC Microbiome Ireland, Moorepark, Fermoy, Co. Cork, Ireland
- Jakob Stokholm
- Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Department of Food Science, University of Copenhagen, Copenhagen, Denmark
- Terhi Tapiainen
- Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland
- Department of Obstetrics & Gynecology, Division of Maternal-Fetal Medicine, Baylor College of Medicine and Texas Children’s Hospital, Houston, Texas, USA
- Biocenter Oulu, University of Oulu, Oulu, Finland
19
Gautam A, Bhowmik D, Basu S, Zeng W, Lahiri A, Huson DH, Paul S. Microbiome Metabolome Integration Platform (MMIP): a web-based platform for microbiome and metabolome data integration and feature identification. Brief Bioinform 2023; 24:bbad325. [PMID: 37771003 DOI: 10.1093/bib/bbad325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 08/12/2023] [Indexed: 09/30/2023] Open
Abstract
A microbial community maintains its ecological dynamics via metabolite crosstalk. Hence, knowledge of the metabolome, alongside its populace, would help us understand the functionality of a community and also predict how it will change in atypical conditions. Methods that employ low-cost metagenomic sequencing data can predict the metabolic potential of a community, that is, its ability to produce or utilize specific metabolites. These, in turn, can potentially serve as markers of biochemical pathways that are associated with different communities. We developed MMIP (Microbiome Metabolome Integration Platform), a web-based analytical and predictive tool that can be used to compare the taxonomic content, diversity variation and the metabolic potential between two sets of microbial communities from targeted amplicon sequencing data. MMIP is capable of highlighting statistically significant taxonomic, enzymatic and metabolic attributes as well as learning-based features associated with one group in comparison with another. Furthermore, MMIP can predict linkages among species or groups of microbes in the community, specific enzyme profiles, compounds or metabolites associated with such a group of organisms. With MMIP, we aim to provide a user-friendly, online web server for performing key microbiome-associated analyses of targeted amplicon sequencing data, predicting metabolite signature, and using learning-based linkage analysis, without the need for initial metabolomic analysis, and thereby helping in hypothesis generation.
Affiliation(s)
- Anupam Gautam
- Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Cluster of Excellence: EXC 2124: Controlling Microbes to Fight Infection, Tübingen, Germany
- Debaleena Bhowmik
- Cell Biology and Physiology Division, CSIR-Indian Institute of Chemical Biology, Kolkata, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Sayantani Basu
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
- Wenhuan Zeng
- Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Cluster of Excellence: EXC 2064: Machine Learning: New Perspectives for Science, University of Tübingen, Tübingen, Germany
- Abhishake Lahiri
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Infectious Diseases and Immunology Division, CSIR-Indian Institute of Chemical Biology, Kolkata, India
- Centre for Health Science and Technology, JIS Institute of Advanced Studies and Research Kolkata, JIS University, West Bengal, India
- Daniel H Huson
- Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Cluster of Excellence: EXC 2124: Controlling Microbes to Fight Infection, Tübingen, Germany
- Sandip Paul
- Centre for Health Science and Technology, JIS Institute of Advanced Studies and Research Kolkata, JIS University, West Bengal, India
20
Huang S, Ailer E, Kilbertus N, Pfister N. Supervised learning and model analysis with compositional data. PLoS Comput Biol 2023; 19:e1011240. [PMID: 37390111 DOI: 10.1371/journal.pcbi.1011240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 06/03/2023] [Indexed: 07/02/2023] Open
Abstract
Supervised learning, such as regression and classification, is an essential tool for analyzing modern high-throughput sequencing data, for example in microbiome research. However, due to the compositionality and sparsity, existing techniques are often inadequate. Either they rely on extensions of the linear log-contrast model (which adjust for compositionality but cannot account for complex signals or sparsity) or they are based on black-box machine learning methods (which may capture useful signals, but lack interpretability due to the compositionality). We propose KernelBiome, a kernel-based nonparametric regression and classification framework for compositional data. It is tailored to sparse compositional data and is able to incorporate prior knowledge, such as phylogenetic structure. KernelBiome captures complex signals, including in the zero-structure, while automatically adapting model complexity. We demonstrate on par or improved predictive performance compared with state-of-the-art machine learning methods on 33 publicly available microbiome datasets. Additionally, our framework provides two key advantages: (i) We propose two novel quantities to interpret contributions of individual components and prove that they consistently estimate average perturbation effects of the conditional mean, extending the interpretability of linear log-contrast coefficients to nonparametric models. (ii) We show that the connection between kernels and distances aids interpretability and provides a data-driven embedding that can augment further analysis. KernelBiome is available as an open-source Python package on PyPI and at https://github.com/shimenghuang/KernelBiome.
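The linear log-contrast model mentioned in this abstract handles compositionality by constraining the coefficients on the log-abundances to sum to zero, so a prediction does not change when a sample is rescaled (for example, raw counts versus relative abundances). The minimal Python sketch below illustrates only that property. It is not KernelBiome's API, and the function name `log_contrast`, the coefficients, and the toy sample are assumptions chosen for illustration.

```python
import math

def log_contrast(x, beta, intercept=0.0, tol=1e-9):
    """Linear log-contrast prediction: intercept + sum_j beta_j * log(x_j), with sum_j beta_j = 0."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    if abs(sum(beta)) > tol:
        raise ValueError("log-contrast coefficients must sum to zero")
    if any(v <= 0 for v in x):
        # Zeros have no logarithm; sparse data would need extra handling
        # (for example a pseudocount), which this sketch does not attempt.
        raise ValueError("all components must be strictly positive")
    return intercept + sum(b * math.log(v) for b, v in zip(beta, x))

# Toy sample (assumed values): the same community as counts and as proportions.
counts = [120.0, 30.0, 50.0]
total = sum(counts)
proportions = [c / total for c in counts]
beta = [0.5, -0.2, -0.3]  # hypothetical coefficients that sum to zero

pred_counts = log_contrast(counts, beta)
pred_props = log_contrast(proportions, beta)
# The two predictions agree up to floating-point error because the
# log(total) term is multiplied by sum(beta) = 0 and cancels.
```

The explicit rejection of zeros in the sketch mirrors the limitation the abstract raises: log-contrast models adjust for compositionality but do not by themselves cope with sparse data.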
Affiliation(s)
- Shimeng Huang
- Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark
- Niki Kilbertus
- Helmholtz Munich, Munich, Germany
- Technical University of Munich, Munich, Germany
- Niklas Pfister
- Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark
21
Neri-Rosario D, Martínez-López YE, Esquivel-Hernández DA, Sánchez-Castañeda JP, Padron-Manrique C, Vázquez-Jiménez A, Giron-Villalobos D, Resendis-Antonio O. Dysbiosis signatures of gut microbiota and the progression of type 2 diabetes: a machine learning approach in a Mexican cohort. Front Endocrinol (Lausanne) 2023; 14:1170459. [PMID: 37441494 PMCID: PMC10333697 DOI: 10.3389/fendo.2023.1170459] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/20/2023] [Accepted: 06/09/2023] [Indexed: 07/15/2023] Open
Abstract
Introduction: Gut microbiota (GM) dysbiosis is one of the causal factors for the progression of different chronic metabolic diseases, including type 2 diabetes mellitus (T2D). Understanding the basis that underlies this association may lead to developing new therapeutic strategies for preventing and treating T2D, such as probiotics, prebiotics, and fecal microbiota transplants. It may also help identify potential early detection biomarkers and develop personalized interventions based on an individual's gut microbiota profile. Here, we explore how supervised Machine Learning (ML) methods help to distinguish taxa for individuals with prediabetes or T2D. Methods: To this aim, we analyzed the GM profile (16S rRNA gene sequencing) in a cohort of 410 Mexican naïve patients stratified into normoglycemic, prediabetes, and T2D individuals. Then, we compared six different ML algorithms and found that Random Forest had the highest predictive performance in classifying T2D and prediabetes patients versus controls. Results: We identified a set of taxa for predicting patients with T2D compared to normoglycemic individuals, including Allisonella, Slackia, Ruminococcus_2, Megasphaera, Escherichia/Shigella, and Prevotella, among others. In addition, we concluded that Anaerostipes, Intestinibacter, Prevotella_9, Blautia, Granulicatella, and Veillonella were the relevant genera in patients with prediabetes compared to normoglycemic subjects. Discussion: These findings allow us to postulate that GM is a distinctive signature in prediabetes and T2D patients during the development and progression of the disease. Our study highlights the role of GM and opens a window toward the rational design of new preventive and personalized strategies for the control of this disease.
Affiliation(s)
- Daniel Neri-Rosario
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico
- Programa de Maestría y Doctorado en Ciencias Bioquímicas, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, Mexico
- Jean Paul Sánchez-Castañeda
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico
- Programa de Maestría y Doctorado en Ciencias Bioquímicas, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, Mexico
- Cristian Padron-Manrique
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico
- Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, Mexico
- Aarón Vázquez-Jiménez
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico
- David Giron-Villalobos
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico
- Programa de Maestría y Doctorado en Ciencias Bioquímicas, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, Mexico
- Osbaldo Resendis-Antonio
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico
- Coordinación de la Investigación Científica – Red de Apoyo a la Investigación, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, Mexico
22
Manghi P, Blanco-Míguez A, Manara S, NabiNejad A, Cumbo F, Beghini F, Armanini F, Golzato D, Huang KD, Thomas AM, Piccinno G, Punčochář M, Zolfo M, Lesker TR, Bredon M, Planchais J, Glodt J, Valles-Colomer M, Koren O, Pasolli E, Asnicar F, Strowig T, Sokol H, Segata N. MetaPhlAn 4 profiling of unknown species-level genome bins improves the characterization of diet-associated microbiome changes in mice. Cell Rep 2023; 42:112464. [PMID: 37141097 PMCID: PMC10242440 DOI: 10.1016/j.celrep.2023.112464] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/28/2022] [Revised: 03/10/2023] [Accepted: 04/17/2023] [Indexed: 05/05/2023] Open
Abstract
Mouse models are key tools for investigating host-microbiome interactions. However, shotgun metagenomics can only profile a limited fraction of the mouse gut microbiome. Here, we employ a metagenomic profiling method, MetaPhlAn 4, which exploits a large catalog of metagenome-assembled genomes (including 22,718 metagenome-assembled genomes from mice) to improve the profiling of the mouse gut microbiome. We combine 622 samples from eight public datasets and an additional cohort of 97 mouse microbiomes, and we assess the potential of MetaPhlAn 4 to better identify diet-related changes in the host microbiome using a meta-analysis approach. We find multiple, strong, and reproducible diet-related microbial biomarkers, largely increasing those identifiable by other available methods relying only on reference information. The strongest drivers of the diet-induced changes are uncharacterized and previously undetected taxa, confirming the importance of adopting metagenomic methods integrating metagenomic assemblies for comprehensive profiling.
Affiliation(s)
- Paolo Manghi
- Department CIBIO, University of Trento, Trento, Italy
- Serena Manara
- Department CIBIO, University of Trento, Trento, Italy
- Amir NabiNejad
- Department CIBIO, University of Trento, Trento, Italy; IEO, European Institute of Oncology IRCCS, Milan, Italy
- Fabio Cumbo
- Department CIBIO, University of Trento, Trento, Italy
- Kun D Huang
- Department CIBIO, University of Trento, Trento, Italy
- Moreno Zolfo
- Department CIBIO, University of Trento, Trento, Italy
- Till R Lesker
- Department of Microbial Immune Regulation, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Marius Bredon
- Gastroenterology Department, Sorbonne Université, INSERM, Centre de Recherche Saint Antoine, CRSA, AP-HP, Saint Antoine Hospital, 75012 Paris, France; Paris Centre for Microbiome Medicine (PaCeMM) FHU, Paris, France
- Julien Planchais
- Paris Centre for Microbiome Medicine (PaCeMM) FHU, Paris, France; INRAE, UMR1319 Micalis & AgroParisTech, Jouy en Josas, France
- Jeremy Glodt
- Paris Centre for Microbiome Medicine (PaCeMM) FHU, Paris, France; INRAE, UMR1319 Micalis & AgroParisTech, Jouy en Josas, France
- Omry Koren
- Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Edoardo Pasolli
- Department of Agricultural Sciences, University of Naples, Naples, Italy
- Till Strowig
- Department of Microbial Immune Regulation, Helmholtz Centre for Infection Research, Braunschweig, Germany; Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
- Harry Sokol
- Gastroenterology Department, Sorbonne Université, INSERM, Centre de Recherche Saint Antoine, CRSA, AP-HP, Saint Antoine Hospital, 75012 Paris, France; Paris Centre for Microbiome Medicine (PaCeMM) FHU, Paris, France; INRAE, UMR1319 Micalis & AgroParisTech, Jouy en Josas, France
- Nicola Segata
- Department CIBIO, University of Trento, Trento, Italy; IEO, European Institute of Oncology IRCCS, Milan, Italy.
23
Tapio M, Fischer D, Mäntysaari P, Tapio I. Rumen Microbiota Predicts Feed Efficiency of Primiparous Nordic Red Dairy Cows. Microorganisms 2023; 11:1116. [PMID: 37317090 DOI: 10.3390/microorganisms11051116] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/24/2023] [Revised: 04/17/2023] [Accepted: 04/23/2023] [Indexed: 06/16/2023] Open
Abstract
Efficient feed utilization in dairy cows is crucial for economic and environmental reasons. The rumen microbiota plays a significant role in feed efficiency, but studies utilizing microbial data to predict host phenotype are limited. In this study, 87 primiparous Nordic Red dairy cows were ranked for feed efficiency during their early lactation based on residual energy intake, and the rumen liquid microbial ecosystem was subsequently evaluated using 16S rRNA amplicon and metagenome sequencing. The study used amplicon data to build an extreme gradient boosting model, demonstrating that taxonomic microbial variation can predict efficiency (r_test = 0.55). Prediction interpreters and a microbial network revealed that predictions were based on microbial consortia and that the efficient animals had more of the highly interacting microbes and consortia. Rumen metagenome data were used to evaluate carbohydrate-active enzymes and metabolic pathway differences between efficiency phenotypes. The study showed that an efficient rumen had a higher abundance of glycoside hydrolases, while an inefficient rumen had more glycosyl transferases. Enrichment of metabolic pathways was observed in the inefficient group, while efficient animals emphasized bacterial environmental sensing and motility over microbial growth. The results suggest that inter-kingdom interactions should be further analyzed to understand their association with the feed efficiency of animals.
Affiliation(s)
- Miika Tapio
- Genomics and Breeding, Production Systems, Natural Resources Institute Finland (Luke), 31600 Jokioinen, Finland
- Daniel Fischer
- Applied Statistical Methods, Natural Resources, Natural Resources Institute Finland (Luke), 31600 Jokioinen, Finland
- Päivi Mäntysaari
- Animal Nutrition, Production Systems, Natural Resources Institute Finland (Luke), 31600 Jokioinen, Finland
- Ilma Tapio
- Genomics and Breeding, Production Systems, Natural Resources Institute Finland (Luke), 31600 Jokioinen, Finland
24
Lee CY, Dillard LR, Papin JA, Arnold KB. New perspectives into the vaginal microbiome with systems biology. Trends Microbiol 2023; 31:356-368. [PMID: 36272885 DOI: 10.1016/j.tim.2022.09.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/11/2022] [Revised: 09/19/2022] [Accepted: 09/21/2022] [Indexed: 10/28/2022]
Abstract
The vaginal microbiome (VMB) is critical to female reproductive health; however, the mechanisms associated with optimal and non-optimal states remain poorly understood due to the complex community structure and dynamic nature. Quantitative systems biology techniques applied to the VMB have improved understanding of community composition and function using primarily statistical methods. In contrast, fewer mechanistic models that use a priori knowledge of VMB features to develop predictive models have been implemented despite their use for microbiomes at other sites, including the gastrointestinal tract. Here, we explore systems biology approaches that have been applied in the VMB, highlighting successful techniques and discussing new directions that hold promise for improving understanding of health and disease.
Affiliation(s)
- Christina Y Lee
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, USA
- Lillian R Dillard
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA; Department of Biochemistry & Molecular Genetics, University of Virginia, Charlottesville, VA, USA
- Jason A Papin
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
- Kelly B Arnold
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, USA.
25
Chung T, Yan R, Weller DL, Kovac J. Conditional Forest Models Built Using Metagenomic Data Accurately Predicted Salmonella Contamination in Northeastern Streams. Microbiol Spectr 2023; 11:e0038123. [PMID: 36946722 PMCID: PMC10100987 DOI: 10.1128/spectrum.00381-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Accepted: 02/27/2023] [Indexed: 03/23/2023] Open
Abstract
The use of water contaminated with Salmonella for produce production contributes to foodborne disease burden. To reduce human health risks, there is a need for novel, targeted approaches for assessing the pathogen status of agricultural water. We investigated the utility of water microbiome data for predicting Salmonella contamination of streams used to source water for produce production. Grab samples were collected from 60 New York streams in 2018 and tested for Salmonella. Separately, DNA was extracted from the samples and used for Illumina shotgun metagenomic sequencing. Reads were trimmed and used to assign taxonomy with Kraken2. Conditional forest (CF), regularized random forest (RRF), and support vector machine (SVM) models were implemented to predict Salmonella contamination. Model performance was assessed using 10-fold cross-validation repeated 10 times to quantify area under the curve (AUC) and Kappa score. CF models outperformed the other two algorithms based on AUC (0.86, CF; 0.81, RRF; 0.65, SVM) and Kappa score (0.53, CF; 0.41, RRF; 0.12, SVM). The taxa that were most informative for accurately predicting Salmonella contamination based on CF were compared to taxa identified by ALDEx2 as being differentially abundant between Salmonella-positive and -negative samples. CF and differential abundance tests both identified Aeromonas salmonicida (variable importance [VI] = 0.012) and Aeromonas sp. strain CA23 (VI = 0.025) as the two most informative taxa for predicting Salmonella contamination. Our findings suggest that microbiome-based models may provide an alternative to or complement existing water monitoring strategies. Similarly, the informative taxa identified in this study warrant further investigation as potential indicators of Salmonella contamination of agricultural water. 
IMPORTANCE Understanding the associations between surface water microbiome composition and the presence of foodborne pathogens, such as Salmonella, can facilitate the identification of novel indicators of Salmonella contamination. This study assessed the utility of microbiome data and three machine learning algorithms for predicting Salmonella contamination of Northeastern streams. The research reported here both expanded the knowledge on the microbiome composition of surface waters and identified putative novel indicators (i.e., Aeromonas species) for Salmonella in Northeastern streams. These putative indicators warrant further research to assess whether they are consistent indicators of Salmonella contamination across regions, waterways, and years not represented in the data set used in this study. Validated indicators identified using microbiome data may be used as targets in the development of rapid (e.g., PCR-based) detection assays for the assessment of microbial safety of agricultural surface waters.
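The evaluation protocol named in this abstract (10-fold cross-validation repeated 10 times, scored by AUC and Kappa) can be illustrated with a minimal standard-library sketch. The function names `repeated_kfold`, `auc`, and `cohen_kappa` are hypothetical helpers written for this illustration; they are not the authors' code, and the study's actual tooling is not stated in the abstract.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels and predictions, corrected for chance."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    labels = set(y_true) | set(y_pred)
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one class is present")
    return (observed - expected) / (1.0 - expected)
```

With 60 samples, as in the 60 streams described above, `repeated_kfold(60)` yields 100 train/test splits of 54 and 6 samples each; a model would be fit on each training split and the two metrics averaged over all splits.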
Affiliation(s)
- Taejung Chung
- Department of Food Science, The Pennsylvania State University, University Park, Pennsylvania, USA
- Microbiome Center, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA
- Runan Yan
- Department of Food Science, The Pennsylvania State University, University Park, Pennsylvania, USA
- Microbiome Center, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA
- Daniel L. Weller
- Department of Statistics and Computational Biology, University of Rochester Medical Center, Rochester, New York, USA
- Jasna Kovac
- Department of Food Science, The Pennsylvania State University, University Park, Pennsylvania, USA
- Microbiome Center, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA
26
Shen Y, Zhu J, Deng Z, Lu W, Wang H. EnsDeepDP: An Ensemble Deep Learning Approach for Disease Prediction Through Metagenomics. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:986-998. [PMID: 36001521 DOI: 10.1109/tcbb.2022.3201295] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
A growing number of studies show that the human microbiome plays a vital role in human health and can be a crucial factor in predicting certain human diseases. However, microbiome data are often characterized by limited samples and high-dimensional features, which pose a great challenge for machine learning methods. Therefore, this paper proposes a novel ensemble deep learning disease prediction method that combines unsupervised and supervised learning paradigms. First, unsupervised deep learning methods are used to learn latent representations of the samples. Afterwards, a disease scoring strategy is developed based on the deep representations, and the resulting scores serve as informative features for ensemble analysis. To ensure the optimal ensemble, a score selection mechanism is constructed, and performance-boosting features are combined with the original sample. Finally, a gradient boosting classifier is trained on the composite features to decide health status. As a case study, the ensemble deep learning flowchart has been demonstrated on six public datasets extracted from human microbiome profiling. The results show that, compared with existing algorithms, our framework achieves better performance on disease prediction.
27
Leveraging Scheme for Cross-Study Microbiome Machine Learning Prediction and Feature Evaluations. Bioengineering (Basel) 2023; 10:231. [PMID: 36829725 PMCID: PMC9952031 DOI: 10.3390/bioengineering10020231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 02/02/2023] [Accepted: 02/04/2023] [Indexed: 02/11/2023] Open
Abstract
The microbiota has proved to be one of the critical factors for many diseases, and researchers have been using microbiome data for disease prediction. However, models trained on one independent microbiome study may not be easily applicable to other independent studies due to the high level of variability in microbiome data. In this study, we developed a method for improving the generalizability and interpretability of machine learning models for predicting three different diseases (colorectal cancer, Crohn's disease, and immunotherapy response) using nine independent microbiome datasets. Our method involves combining a smaller dataset with a larger dataset, and we found that using at least 25% of the target samples in the source data resulted in improved model performance. We determined random forest as our top model and employed feature selection to identify common and important taxa for disease prediction across the different studies. Our results suggest that this leveraging scheme is a promising approach for improving the accuracy and interpretability of machine learning models for predicting diseases based on microbiome data.
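One possible reading of the leveraging scheme described above is sketched below: a fraction of the target study's samples is moved into the training pool alongside the source study, and the remaining target samples are held out for evaluation. The function name `leverage_split`, its parameters, and the exact splitting rule are assumptions made for illustration; the abstract does not specify how the authors implemented the combination.

```python
import random


def leverage_split(source, target, fraction=0.25, seed=0):
    """Return (train, test): source plus `fraction` of target, and the held-out rest.

    Hypothetical helper; the 0.25 default mirrors the "at least 25%" figure
    quoted in the abstract, not a documented implementation detail.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)
    shuffled = list(target)
    rng.shuffle(shuffled)  # choose the leveraged target samples at random
    n_moved = round(fraction * len(shuffled))
    train = list(source) + shuffled[:n_moved]
    test = shuffled[n_moved:]
    return train, test
```

Calling `leverage_split(source, target)` with 100 source samples and 40 target samples gives a training pool of 110 samples and a held-out target set of 30; sweeping `fraction` would show how much target data is needed before performance on the held-out set improves.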
28
Busato S, Gordon M, Chaudhari M, Jensen I, Akyol T, Andersen S, Williams C. Compositionality, sparsity, spurious heterogeneity, and other data-driven challenges for machine learning algorithms within plant microbiome studies. Curr Opin Plant Biol 2023; 71:102326. [PMID: 36538837 PMCID: PMC9925409 DOI: 10.1016/j.pbi.2022.102326] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/31/2022] [Revised: 11/08/2022] [Accepted: 11/21/2022] [Indexed: 06/17/2023]
Abstract
The plant-associated microbiome is a key component of plant systems, contributing to their health, growth, and productivity. The application of machine learning (ML) in this field promises to help untangle the relationships involved. However, measurements of microbial communities by high-throughput sequencing pose challenges for ML. Noise from low sample sizes, soil heterogeneity, and technical factors can impact the performance of ML. Additionally, the compositional and sparse nature of these datasets can impact the predictive accuracy of ML. We review recent literature from plant studies to illustrate that these properties often go unmentioned. We expand our analysis to other fields to quantify the degree to which mitigation approaches improve the performance of ML and describe the mathematical basis for this. With the advent of accessible analytical packages for microbiome data including learning models, researchers must be familiar with the nature of their datasets.
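To make the compositionality and sparsity issues concrete, the sketch below applies a centered log-ratio (CLR) transform with a pseudocount to a vector of taxon counts. CLR is one widely used mitigation for compositional data; it is shown here as an assumption for illustration, since the abstract does not name the specific approaches the review evaluates. The function name `clr` and the default pseudocount of 0.5 are likewise illustrative choices.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount keeps zero counts (sparsity) from producing log(0).
    Each output value is the log count minus the mean log count, so the
    result no longer depends on the sample's total sequencing depth
    (exactly so when pseudocount is 0 and no count is zero).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]
```

For example, `clr([1, 2, 4], pseudocount=0)` and `clr([10, 20, 40], pseudocount=0)` return the same values, which is the depth invariance that raw counts and simple proportions lack; the transformed values of any sample sum to zero.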
Affiliation(s)
- Sebastiano Busato
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA; NC Plant Sciences Initiative, North Carolina State University, Raleigh, USA
- Max Gordon
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA; NC Plant Sciences Initiative, North Carolina State University, Raleigh, USA
- Meenal Chaudhari
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA; NC Plant Sciences Initiative, North Carolina State University, Raleigh, USA
- Ib Jensen
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Turgut Akyol
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Stig Andersen
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Cranos Williams
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA; NC Plant Sciences Initiative, North Carolina State University, Raleigh, USA; Department of Plant and Microbial Biology, North Carolina State University, Raleigh, USA.
29
Interpreting tree ensemble machine learning models with endoR. PLoS Comput Biol 2022; 18:e1010714. [PMID: 36516158 PMCID: PMC9797088 DOI: 10.1371/journal.pcbi.1010714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 12/28/2022] [Accepted: 11/07/2022] [Indexed: 12/15/2022] Open
Abstract
Tree ensemble machine learning models are increasingly used in microbiome science as they are compatible with the compositional, high-dimensional, and sparse structure of sequence-based microbiome data. While such models are often good at predicting phenotypes based on microbiome data, they only yield limited insights into how microbial taxa may be associated. We developed endoR, a method to interpret tree ensemble models. First, endoR simplifies the fitted model into a decision ensemble. Then, it extracts information on the importance of individual features and their pairwise interactions, displaying them as an interpretable network. Both the endoR network and importance scores provide insights into how features, and interactions between them, contribute to the predictive performance of the fitted model. Adjustable regularization and bootstrapping help reduce the complexity and ensure that only essential parts of the model are retained. We assessed endoR on both simulated and real metagenomic data. We found endoR to have comparable accuracy to other common approaches while easing and enhancing model interpretation. Using endoR, we also confirmed published results on gut microbiome differences between cirrhotic and healthy individuals. Finally, we utilized endoR to explore associations between human gut methanogens and microbiome components. Indeed, these hydrogen consumers are expected to interact with fermenting bacteria in a complex syntrophic network. Specifically, we analyzed a global metagenome dataset of 2203 individuals and confirmed the previously reported association between Methanobacteriaceae and Christensenellales. Additionally, we observed that Methanobacteriaceae are associated with a network of hydrogen-producing bacteria. Our method accurately captures how tree ensembles use features and interactions between them to predict a response. 
As demonstrated by our applications, the resultant visualizations and summary outputs facilitate model interpretation and enable the generation of novel hypotheses about complex systems.
30
Loganathan T, Priya Doss C G. The influence of machine learning technologies in gut microbiome research and cancer studies - A review. Life Sci 2022; 311:121118. [DOI: 10.1016/j.lfs.2022.121118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/19/2022] [Accepted: 10/19/2022] [Indexed: 11/18/2022]
31
Sampling from four geographically divergent young female populations demonstrates forensic geolocation potential in microbiomes. Sci Rep 2022; 12:18547. [PMID: 36329122 PMCID: PMC9633824 DOI: 10.1038/s41598-022-21779-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 10/04/2022] [Indexed: 11/06/2022] Open
Abstract
Studies of human microbiomes using new sequencing techniques have increasingly demonstrated that their ecologies are partly determined by the lifestyle and habits of individuals. As such, significant forensic information could be obtained from high-throughput sequencing of the human microbiome. This approach, combined with multiple analytical techniques, demonstrates that bacterial DNA can be used to uniquely identify an individual and to provide information about their life and behavioral patterns. However, the transformation of these findings into actionable forensic information, including the geolocation of the samples, remains limited by incomplete understanding of the effects of confounding factors and the paucity of diverse sequences. We obtained 16S rRNA sequences of stool and oral microbiomes collected from 206 young and healthy females from four globally diverse populations, in addition to supporting metadata, including dietary and medical information. Analysis of these microbiomes revealed detectable geolocation signals between the populations, even for populations living within the same city. Accounting for other lifestyle variables, such as diet and smoking, lessened but did not remove the geolocation signal.
32
Ahmed E, Hens K. Microbiome in Precision Psychiatry: An Overview of the Ethical Challenges Regarding Microbiome Big Data and Microbiome-Based Interventions. AJOB Neurosci 2022; 13:270-286. [PMID: 34379050 DOI: 10.1080/21507740.2021.1958096] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/09/2023]
Abstract
There has been a spurt in both fundamental and translational research that examines the underlying mechanisms of the human microbiome in psychiatric disorders. The personalized and dynamic features of the human microbiome suggest the potential of its manipulation for precision psychiatry in ways to improve mental health and avoid disease. However, findings in the field of microbiome also raise philosophical and ethical questions. From a philosophical point of view, they may yet be another attempt at providing a biological cause for phenomena that ultimately cannot be so easily localized. From an ethical point of view, it is relevant that the human gut microbiome comprises data on the individual's lifestyle, disease history, previous medications, and mental health. Massive datasets of microbiome sequences are collected to facilitate comparative studies to identify specific links between the microbiome and mental health. Although this emerging research domain may show promise for psychiatric patients, it is surrounded by ethical challenges regarding patient privacy, health risks, effects on personal identity, and concerns about responsibility. This narrative overview displays the roles and advances of microbiome research in psychiatry and discusses the philosophical and ethical implications of microbiome big data and microbiome-based interventions for psychiatric patients. We also investigate whether these issues are really "new," or "old wine in new bottles."
Affiliation(s)
- Eman Ahmed
- University of Antwerp; Suez Canal University
33
Pietrucci D, Teofani A, Milanesi M, Fosso B, Putignani L, Messina F, Pesole G, Desideri A, Chillemi G. Machine Learning Data Analysis Highlights the Role of Parasutterella and Alloprevotella in Autism Spectrum Disorders. Biomedicines 2022; 10:2028. [PMID: 36009575 PMCID: PMC9405825 DOI: 10.3390/biomedicines10082028] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/20/2022] [Revised: 08/10/2022] [Accepted: 08/15/2022] [Indexed: 11/25/2022] Open
Abstract
In recent years, the involvement of the gut microbiota in disease and health has been investigated by sequencing the 16S gene from fecal samples. Dysbiotic gut microbiota was also observed in Autism Spectrum Disorder (ASD), a neurodevelopmental disorder characterized by gastrointestinal symptoms. However, despite the relevant number of studies, it is still difficult to identify a typical dysbiotic profile in ASD patients. The discrepancies among these studies are due to technical factors (i.e., experimental procedures) and external parameters (i.e., dietary habits). In this paper, we collected 959 samples from eight available projects (540 ASD and 419 Healthy Controls, HC) and reduced the observed bias among studies. Then, we applied a Machine Learning (ML) approach to create a predictor able to discriminate between ASD and HC. We tested and optimized three algorithms: Random Forest, Support Vector Machine and Gradient Boosting Machine. All three algorithms confirmed the importance of five different genera, including Parasutterella and Alloprevotella. Furthermore, our results show that ML algorithms could identify common taxonomic features by comparing datasets obtained from countries characterized by latent confounding variables.
Affiliation(s)
- Daniele Pietrucci
- Department for Innovation in Biological, Agro-Food and Forest Systems (DIBAF), University of Tuscia, 01100 Viterbo, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, IBIOM, CNR, 70126 Bari, Italy
- Adelaide Teofani
- Department of Biology, University of Rome Tor Vergata, Via Montpellier 1, 00133 Rome, Italy
- Marco Milanesi
- Department for Innovation in Biological, Agro-Food and Forest Systems (DIBAF), University of Tuscia, 01100 Viterbo, Italy
- Bruno Fosso
- Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari “A. Moro”, Piazza Umberto I, 1, 70121 Bari, Italy
- Lorenza Putignani
- Unit of Microbiology and Diagnostic Immunology, Units of Microbiomics, Department of Diagnostic and Laboratory Medicine, Bambino Gesù Children’s Hospital, IRCCS, 00146 Rome, Italy
- Francesco Messina
- Laboratory of Microbiology and Biological Bank National Institute for Infectious Diseases “Lazzaro Spallanzani” Istituto di Ricovero e Cura a Carattere Scientifico, 00149 Rome, Italy
- Graziano Pesole
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, IBIOM, CNR, 70126 Bari, Italy
- Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari “A. Moro”, Piazza Umberto I, 1, 70121 Bari, Italy
- Alessandro Desideri
- Department of Biology, University of Rome Tor Vergata, Via Montpellier 1, 00133 Rome, Italy
- Giovanni Chillemi
- Department for Innovation in Biological, Agro-Food and Forest Systems (DIBAF), University of Tuscia, 01100 Viterbo, Italy
- Correspondence: ; Tel.: +39-0761-357-429
34
Evaluation of Prebiotics through an In Vitro Gastrointestinal Digestion and Fecal Fermentation Experiment: Further Idea on the Implementation of Machine Learning Technique. Foods 2022; 11:2490. [PMID: 36010490 PMCID: PMC9407061 DOI: 10.3390/foods11162490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 08/12/2022] [Accepted: 08/16/2022] [Indexed: 11/17/2022] Open
Abstract
Prebiotics are non-digestible food ingredients that promote the growth of beneficial gut microorganisms and foster their activities. The performance of prebiotics has often been tested in mouse models in which the gut ecology differs from that of humans. In this study, we instead performed an in vitro gastrointestinal digestion and fecal fermentation experiment to evaluate the efficiency of eight different prebiotics. Feces obtained from 11 different individuals were used to ferment digested prebiotics. The total DNA from each sample was extracted and sequenced through Illumina MiSeq for microbial community analysis. The amount of short-chain fatty acids was assessed through gas chromatography. We found links between community shifts and the increased amount of short-chain fatty acids after prebiotics treatment. The results from differential abundance analysis showed increases in beneficial gut microorganisms, such as Bifidobacterium, Faecalibacterium, and Agathobacter, after prebiotics treatment. We were also able to construct well-performing machine-learning models that could predict the amount of short-chain fatty acids based on the gut microbial community structure. Finally, we provide an idea for further implementation of machine-learning techniques to find customized prebiotics.
|
35
|
Zhou YH, Sun G. Improve the Colorectal Cancer Diagnosis Using Gut Microbiome Data. Front Mol Biosci 2022; 9:921945. [PMID: 36032686 PMCID: PMC9415616 DOI: 10.3389/fmolb.2022.921945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Accepted: 06/16/2022] [Indexed: 11/17/2022] Open
Abstract
In the United States, colorectal cancer is the second leading cause of cancer death, and accurate early detection and identification of high-risk patients is a high priority. Although fecal screening tests are available, the close relationship between colorectal cancer and the gut microbiome has generated considerable interest. We describe a machine learning method for gut microbiome data to assist in diagnosing colorectal cancer. Our methodology integrates feature engineering, mediation analysis, statistical modeling, and network analysis into a novel unified pipeline. Simulation results illustrate the value of the method in comparison to existing methods. For predicting colorectal cancer in two real datasets, this pipeline showed an 8.7% higher prediction accuracy and 13% higher area under the receiver operator characteristic curve than other published work. Additionally, the approach highlights important colorectal cancer-related taxa for prioritization, such as high levels of Bacteroides fragilis, which can help elucidate disease pathology. Our algorithms and approach can be widely applied for colorectal cancer prediction using either 16S rRNA or shotgun metagenomics data.
Affiliation(s)
- Yi-Hui Zhou
- Department of Biological Sciences, North Carolina State University, Raleigh, NC, United States
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States
- *Correspondence: Yi-Hui Zhou,
- George Sun
- Alston Ridge Middle School, Cary, NC, United States
|
36
|
Zhou L, Zhao Z, Shao L, Fang S, Li T, Gan L, Guo C. Predicting the abundance of metal resistance genes in subtropical estuaries using amplicon sequencing and machine learning. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2022; 241:113844. [PMID: 36068766 DOI: 10.1016/j.ecoenv.2022.113844] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/27/2022] [Revised: 06/24/2022] [Accepted: 07/01/2022] [Indexed: 06/15/2023]
Abstract
Heavy metals are a group of anthropogenic contaminants in estuary ecosystems. Bacteria in estuaries counteract the highly concentrated metal toxicity through metal resistance genes (MRGs). Presently, metagenomic technology is popularly used to study MRGs. However, an easier and less expensive method of acquiring MRG information is needed to deepen our understanding of the fate of MRGs. Thus, this study explores the feasibility of using a machine learning approach, namely random forests (RF), to predict MRG abundance based on the 16S rRNA amplicon sequenced datasets from subtropical estuaries in China. Our results showed that the total MRG abundance could be predicted by RF models using bacterial composition at different taxonomic levels. Among them, the relative abundance of bacterial phyla had the highest predicted accuracy (71.7%). In addition, the RF models constructed by bacterial phyla predicted the abundance of six MRG types and nine MRG subtypes with substantial accuracy (R² > 0.600). Five bacterial phyla (Firmicutes, Bacteroidetes, Patescibacteria, Armatimonadetes, and Nitrospirae) substantially determined the variations in MRG abundance. Our findings prove that RF models can predict MRG abundance in South China estuaries during the wet season by using the bacterial composition obtained by 16S rRNA amplicon sequencing.
Affiliation(s)
- Lei Zhou
- College of Marine Sciences, South China Agricultural University, 510642 Guangzhou, China
- Zelong Zhao
- Liaoning Key Lab of Germplasm Improvement and Fine Seed Breeding of Marine Aquatic animals, Liaoning Ocean and Fisheries Science Research Institute, Dalian 116023, China
- Liyi Shao
- College of Marine Sciences, South China Agricultural University, 510642 Guangzhou, China
- Shiyun Fang
- College of Marine Sciences, South China Agricultural University, 510642 Guangzhou, China
- Tongzhou Li
- College of Marine Sciences, South China Agricultural University, 510642 Guangzhou, China
- Lihong Gan
- College of Marine Sciences, South China Agricultural University, 510642 Guangzhou, China
- Chuanbo Guo
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
|
37
|
New-Generation Sequencing Technology in Diagnosis of Fungal Plant Pathogens: A Dream Comes True? J Fungi (Basel) 2022; 8:jof8070737. [PMID: 35887492 PMCID: PMC9320658 DOI: 10.3390/jof8070737] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2022] [Revised: 07/01/2022] [Accepted: 07/11/2022] [Indexed: 02/01/2023] Open
Abstract
The fast and continued progress of high-throughput sequencing (HTS) and the drastic reduction of its costs have boosted new and unpredictable developments in the field of plant pathology. The cost of whole-genome sequencing, which, until a few years ago, was prohibitive for many projects, is now so affordable that a new branch, phylogenomics, is being developed. Fungal taxonomy is being deeply influenced by genome comparison, too. It is now easier to discover new genes as potential targets for an accurate diagnosis of new or emerging pathogens, notably those of quarantine concern. Similarly, with the development of metabarcoding and metagenomics techniques, it is now possible to unravel complex diseases or answer crucial questions, such as "What's in my soil?", to a good approximation, including fungi, bacteria, nematodes, etc. The new technologies make it possible to redraw the approach for disease control strategies, considering the pathogens within their environment and deciphering the complex interactions between microorganisms and the cultivated crops. This kind of analysis usually generates big data that need sophisticated bioinformatic tools (machine learning, artificial intelligence) for their management. Herein, examples of the use of new technologies for research in fungal diversity and diagnosis of some fungal pathogens are reported.
|
38
|
Wang Q, Wei Y. Quantifying uncertainty of subsampling-based ensemble methods under a U-statistic framework. J STAT COMPUT SIM 2022. [DOI: 10.1080/00949655.2022.2081969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Affiliation(s)
- Qing Wang
- Department of Mathematics, Wellesley College, Wellesley, MA, USA
- Yujie Wei
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
|
39
|
Wani AK, Roy P, Kumar V, Mir TUG. Metagenomics and artificial intelligence in the context of human health. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2022; 100:105267. [PMID: 35278679 DOI: 10.1016/j.meegid.2022.105267] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/19/2021] [Revised: 03/03/2022] [Accepted: 03/04/2022] [Indexed: 12/12/2022]
Abstract
The human microbiome is a ubiquitous, dynamic, and site-specific consortium of microbial communities. The pathogenic nature of microorganisms within human tissues has led to an increase in microbial studies. Characterization of genera such as Streptococcus, Cutibacterium, Staphylococcus, Bifidobacterium, Lactococcus and Lactobacillus through culture-dependent and culture-independent techniques has been reported. However, due to the unique environment within human tissues, it is difficult to culture these microorganisms, which makes their molecular study strenuous. Metagenomics (MGs) offers a gateway to explore and characterize hidden microbial communities through a culture-independent mode by direct DNA isolation. With function- and sequence-based MGs, scientists can explore the mechanistic details of numerous microbes and their interaction with the niche. Since the data generated from MG studies are highly complex and multi-dimensional, accurate analytical tools are required to evaluate and interpret them. Artificial intelligence (AI) can automatically learn the dimensionality of the data and ease its complexity, which makes disease diagnosis and the assessment of disease response easy, accurate and timely. This review provides insight into the human microbiota and its exploration and expansion through MG studies. The review elucidates the significance of MGs in studying the changing microbiota during disease conditions, besides highlighting the role of AI in computational analysis of MG data.
Affiliation(s)
- Atif Khurshid Wani
- Department of Biotechnology, School of Bioengineering and Biosciences, Lovely Professional University, Punjab 144411, India
- Priyanka Roy
- Department of Basic and Applied Sciences, National Institute of Food Technology Entrepreneurship and Management, Sonipat 131 028, Haryana, India
- Vijay Kumar
- Department of Basic and Applied Sciences, National Institute of Food Technology Entrepreneurship and Management, Sonipat 131 028, Haryana, India
- Tahir Ul Gani Mir
- Department of Biotechnology, School of Bioengineering and Biosciences, Lovely Professional University, Punjab 144411, India
|
40
|
Morgan EW, Perdew GH, Patterson AD. Multi-Omics Strategies for Investigating the Microbiome in Toxicology Research. Toxicol Sci 2022; 187:189-213. [PMID: 35285497 PMCID: PMC9154275 DOI: 10.1093/toxsci/kfac029] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Microbial communities on and within the host contact environmental pollutants, toxic compounds, and other xenobiotic compounds. These communities of bacteria, fungi, viruses, and archaea possess diverse metabolic potential to catabolize compounds and produce new metabolites. Microbes alter chemical disposition thus making the microbiome a natural subject of interest for toxicology. Sequencing and metabolomics technologies permit the study of microbiomes altered by acute or long-term exposure to xenobiotics. These investigations have already contributed to and are helping to re-interpret traditional understandings of toxicology. The purpose of this review is to provide a survey of the current methods used to characterize microbes within the context of toxicology. This will include discussion of commonly used techniques for conducting omic-based experiments, their respective strengths and deficiencies, and how forward-looking techniques may address present shortcomings. Finally, a perspective will be provided regarding common assumptions that currently impede microbiome studies from producing causal explanations of toxicologic mechanisms.
Affiliation(s)
- Ethan W Morgan
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Gary H Perdew
- Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Andrew D Patterson
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
41
|
Chen X, Zhu Z, Zhang W, Wang Y, Wang F, Yang J, Wong KC. Human disease prediction from microbiome data by multiple feature fusion and deep learning. iScience 2022; 25:104081. [PMID: 35372808 PMCID: PMC8971930 DOI: 10.1016/j.isci.2022.104081] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2021] [Revised: 09/16/2021] [Accepted: 03/13/2022] [Indexed: 10/29/2022] Open
Abstract
Human disease prediction from microbiome data has broad implications in metagenomics. It is rare for the existing methods to consider abundance profiles from both known and unknown microbial organisms, or capture the taxonomic relationships among microbial taxa, leading to significant information loss. On the other hand, deep learning has shown unprecedented advantages in classification tasks for its feature-learning ability. However, it encounters the opposite situation in metagenome-based disease prediction, since high-dimensional low-sample-size metagenomic datasets can lead to severe overfitting, and black-box models fail to provide biological explanations. To circumvent the related problems, we developed MetaDR, a comprehensive machine learning-based framework that integrates various information and deep learning to predict human diseases. Experimental results indicate that MetaDR achieves competitive prediction performance with a reduction in running time, and effectively discovers the informative features with biological insights.
Affiliation(s)
- Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Zifan Zhu
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Weitong Zhang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Yuchen Wang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Fuzhou Wang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR; Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
|
42
|
Zhou H, He K, Chen J, Zhang X. LinDA: linear models for differential abundance analysis of microbiome compositional data. Genome Biol 2022; 23:95. [PMID: 35421994 PMCID: PMC9012043 DOI: 10.1186/s13059-022-02655-5] [Citation(s) in RCA: 67] [Impact Index Per Article: 33.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2021] [Accepted: 03/14/2022] [Indexed: 12/12/2022] Open
Abstract
Differential abundance analysis is at the core of statistical analysis of microbiome data. The compositional nature of microbiome sequencing data makes false positive control challenging. Here, we show that the compositional effects can be addressed by a simple, yet highly flexible and scalable, approach. The proposed method, LinDA, only requires fitting linear regression models on the centered log-ratio transformed data, and correcting the bias due to compositional effects. We show that LinDA enjoys asymptotic FDR control and can be extended to mixed-effect models for correlated microbiome data. Using simulations and real examples, we demonstrate the effectiveness of LinDA.
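The core idea in the abstract above (linear regression on centered log-ratio transformed data) can be illustrated with a minimal sketch. This is not the LinDA implementation: the pseudocount of 0.5, the toy count table, the `group` covariate, and the helper names `clr` and `ols_slope` are all assumptions made for illustration, and the bias-correction step that the abstract mentions is deliberately omitted.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumed value) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical toy data: rows are samples, columns are taxa.
counts = [
    [10, 200, 30],
    [12, 180, 25],
    [80, 190, 28],
    [95, 210, 33],
]
group = [0, 0, 1, 1]  # hypothetical covariate, e.g. control (0) vs. case (1)

clr_rows = [clr(row) for row in counts]

# One regression per taxon: CLR abundance of taxon j against the covariate.
slopes = [
    ols_slope(group, [row[j] for row in clr_rows])
    for j in range(len(counts[0]))
]
```

In this toy table only the first taxon changes much in raw counts, yet the slopes of the other taxa come out negative after the transform. That shift is the kind of compositional effect the abstract says must be corrected before testing.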
|
43
|
Parvandeh S, Donehower LA, Katsonis P, Hsu TK, Asmussen J, Lee K, Lichtarge O. EPIMUTESTR: a nearest neighbor machine learning approach to predict cancer driver genes from the evolutionary action of coding variants. Nucleic Acids Res 2022; 50:e70. [PMID: 35412634 PMCID: PMC9262594 DOI: 10.1093/nar/gkac215] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2021] [Revised: 03/17/2022] [Accepted: 03/21/2022] [Indexed: 02/01/2023] Open
Abstract
Discovering rare cancer driver genes is difficult because their mutational frequency is too low for statistical detection by computational methods. EPIMUTESTR is an integrative nearest-neighbor machine learning algorithm that identifies such marginal genes by modeling the fitness of their mutations with the phylogenetic Evolutionary Action (EA) score. Over cohorts of sequenced patients from The Cancer Genome Atlas representing 33 tumor types, EPIMUTESTR detected 214 previously inferred cancer driver genes and 137 new candidates never identified computationally before, of which seven genes are supported in the COSMIC Cancer Gene Census. EPIMUTESTR achieved better robustness and specificity than existing methods in a number of benchmark methods and datasets.
Affiliation(s)
- Saeid Parvandeh
- To whom correspondence should be addressed. Tel: +1 713 798 7677;
- Lawrence A Donehower
- Department of Molecular Virology and Microbiology, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Teng-Kuei Hsu
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Jennifer K Asmussen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Kwanghyuk Lee
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Olivier Lichtarge
- Correspondence may also be addressed to Olivier Lichtarge. Tel: +1 713 798 5646;
|
44
|
Host phenotype classification from human microbiome data is mainly driven by the presence of microbial taxa. PLoS Comput Biol 2022; 18:e1010066. [PMID: 35446845 PMCID: PMC9064115 DOI: 10.1371/journal.pcbi.1010066] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2021] [Revised: 05/03/2022] [Accepted: 03/29/2022] [Indexed: 12/14/2022] Open
Abstract
Machine learning-based classification approaches are widely used to predict host phenotypes from microbiome data. Classifiers are typically employed by considering operational taxonomic units or relative abundance profiles as input features. Such types of data are intrinsically sparse, which opens the opportunity to make predictions from the presence/absence rather than the relative abundance of microbial taxa. This also poses the question whether it is the presence rather than the abundance of particular taxa to be relevant for discrimination purposes, an aspect that has been so far overlooked in the literature. In this paper, we aim at filling this gap by performing a meta-analysis on 4,128 publicly available metagenomes associated with multiple case-control studies. At species-level taxonomic resolution, we show that it is the presence rather than the relative abundance of specific microbial taxa to be important when building classification models. Such findings are robust to the choice of the classifier and confirmed by statistical tests applied to identifying differentially abundant/present taxa. Results are further confirmed at coarser taxonomic resolutions and validated on 4,026 additional 16S rRNA samples coming from 30 public case-control studies. The composition of the human microbiome has been linked to a large number of different diseases. In this context, classification methodologies based on machine learning approaches have represented a promising tool for diagnostic purposes from metagenomics data. The link between microbial population composition and host phenotypes has been usually performed by considering taxonomic profiles represented by relative abundances of microbial species. In this study, we show that it is more the presence rather than the relative abundance of microbial taxa to be relevant to maximize classification accuracy. 
This is accomplished by conducting a meta-analysis on more than 4,000 shotgun metagenomes coming from 25 case-control studies and in which original relative abundance data are degraded to presence/absence profiles. Findings are also extended to 16S rRNA data and advance the research field in building prediction models directly from human microbiome data.
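The degradation of relative abundance data to presence/absence profiles described above can be sketched in a few lines. The detection threshold of zero, the function name `to_presence_absence`, and the two toy profiles are assumptions for illustration; the abstract does not state which cutoff the authors used.

```python
def to_presence_absence(profile, threshold=0.0):
    """Map each relative abundance to 1 (present) or 0 (absent).

    A taxon counts as present when its abundance exceeds `threshold`
    (an assumed cutoff; zero means any detected signal counts).
    """
    return [1 if value > threshold else 0 for value in profile]

# Hypothetical relative abundance profiles: one row per sample, one column per taxon.
profiles = [
    [0.00, 0.62, 0.38, 0.00],
    [0.05, 0.00, 0.90, 0.05],
]

binary = [to_presence_absence(p) for p in profiles]
```

The binary rows can then be passed to any classifier in place of the original abundances, which is the comparison the study reports.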
|
45
|
Liu B, Sträuber H, Saraiva J, Harms H, Silva SG, Kasmanas JC, Kleinsteuber S, Nunes da Rocha U. Machine learning-assisted identification of bioindicators predicts medium-chain carboxylate production performance of an anaerobic mixed culture. MICROBIOME 2022; 10:48. [PMID: 35331330 PMCID: PMC8952268 DOI: 10.1186/s40168-021-01219-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/28/2021] [Accepted: 12/17/2021] [Indexed: 05/10/2023]
Abstract
BACKGROUND The ability to quantitatively predict ecophysiological functions of microbial communities provides an important step to engineer microbiota for desired functions related to specific biochemical conversions. Here, we present the quantitative prediction of medium-chain carboxylate production in two continuous anaerobic bioreactors from 16S rRNA gene dynamics in enriched communities. RESULTS By progressively shortening the hydraulic retention time (HRT) from 8 to 2 days with different temporal schemes in two bioreactors operated for 211 days, we achieved higher productivities and yields of the target products n-caproate and n-caprylate. The datasets generated from each bioreactor were applied independently for training and testing machine learning algorithms using 16S rRNA genes to predict n-caproate and n-caprylate productivities. Our dataset consisted of 14 and 40 samples from HRT of 8 and 2 days, respectively. Because of the size and balance of our dataset, we compared linear regression, support vector machine and random forest regression algorithms using the original and balanced datasets generated using synthetic minority oversampling. Further, we performed cross-validation to estimate model stability. The random forest regression was the best algorithm producing more consistent results with median of error rates below 8%. More than 90% accuracy in the prediction of n-caproate and n-caprylate productivities was achieved. Four inferred bioindicators belonging to the genera Olsenella, Lactobacillus, Syntrophococcus and Clostridium IV suggest their relevance to the higher carboxylate productivity at shorter HRT. The recovery of metagenome-assembled genomes of these bioindicators confirmed their genetic potential to perform key steps of medium-chain carboxylate production. CONCLUSIONS Shortening the hydraulic retention time of the continuous bioreactor systems allows to shape the communities with desired chain elongation functions. 
Using machine learning, we demonstrated that 16S rRNA amplicon sequencing data can be used to predict bioreactor process performance quantitatively and accurately. Characterizing and harnessing bioindicators holds promise to manage reactor microbiota towards selection of the target processes. Our mathematical framework is transferrable to other ecosystem processes and microbial systems where community dynamics is linked to key functions. The general methodology used here can be adapted to data types of other functional categories such as genes, transcripts, proteins or metabolites. Video Abstract.
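The abstract above mentions balancing the 14-sample and 40-sample groups with synthetic minority oversampling. A minimal sketch of the interpolation idea behind that family of methods follows. It is not the authors' pipeline: the function name `smote_like`, the neighbour count `k`, the seed, and the toy feature vectors are hypothetical, and a real analysis would more likely rely on an established library implementation.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-group feature vectors (e.g. relative abundances of two taxa).
minority = [[0.10, 0.50], [0.20, 0.40], [0.30, 0.70], [0.15, 0.55]]

# 40 - 14 = 26 extra samples would balance the group sizes quoted in the abstract.
new_points = smote_like(minority, n_new=26)
```

Each synthetic point lies on a line segment between two observed samples, so the oversampled set stays within the range of the measured data rather than extrapolating beyond it.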
Affiliation(s)
- Bin Liu
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ, Leipzig, Germany
- Heike Sträuber
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ, Leipzig, Germany
- João Saraiva
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ, Leipzig, Germany
- Hauke Harms
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ, Leipzig, Germany
- Sandra Godinho Silva
- Institute for Bioengineering and Biosciences, Department of Bioengineering, Instituto Superior Técnico Universidade de Lisboa, Lisbon, Portugal
- Jonas Coelho Kasmanas
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ, Leipzig, Germany
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
- Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Germany
- Sabine Kleinsteuber
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ, Leipzig, Germany
- Ulisses Nunes da Rocha
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research – UFZ, Leipzig, Germany
|
46
|
David MM, Tataru C, Pope Q, Baker LJ, English MK, Epstein HE, Hammer A, Kent M, Sieler MJ, Mueller RS, Sharpton TJ, Tomas F, Vega Thurber R, Fern XZ. Revealing General Patterns of Microbiomes That Transcend Systems: Potential and Challenges of Deep Transfer Learning. mSystems 2022; 7:e0105821. [PMID: 35040699 PMCID: PMC8765061 DOI: 10.1128/msystems.01058-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open
Abstract
A growing body of research has established that the microbiome can mediate the dynamics and functional capacities of diverse biological systems. Yet, we understand little about what governs the response of these microbial communities to host or environmental changes. Most efforts to model microbiomes focus on defining the relationships between the microbiome, host, and environmental features within a specified study system and therefore fail to capture those that may be evident across multiple systems. In parallel with these developments in microbiome research, computer scientists have developed a variety of machine learning tools that can identify subtle, but informative, patterns from complex data. Here, we recommend using deep transfer learning to resolve microbiome patterns that transcend study systems. By leveraging diverse public data sets in an unsupervised way, such models can learn contextual relationships between features and build on those patterns to perform subsequent tasks (e.g., classification) within specific biological contexts.
Affiliation(s)
- Maude M. David
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Department of Pharmaceutical Sciences, Oregon State University, Corvallis, Oregon, USA
- Christine Tataru
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Quintin Pope
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, USA
- Lydia J. Baker
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Mary K. English
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Hannah E. Epstein
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Austin Hammer
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Michael Kent
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Michael J. Sieler
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Ryan S. Mueller
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Thomas J. Sharpton
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Department of Statistics, Oregon State University, Corvallis, Oregon, USA
- Fiona Tomas
- Instituto Mediterráneo de Estudios Avanzados, IMEDEA, Esporles, Balearic Islands, Spain
- Xiaoli Z. Fern
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, USA
|
47
|
Jin BT, Xu F, Ng RT, Hogg JC. Mian: interactive web-based microbiome data table visualization and machine learning platform. Bioinformatics 2022; 38:1176-1178. [PMID: 34788784 DOI: 10.1093/bioinformatics/btab754] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2021] [Revised: 09/21/2021] [Accepted: 11/03/2021] [Indexed: 02/03/2023] Open
Abstract
SUMMARY Mian is a web application to interactively visualize, run statistical tools and train machine learning models on operational taxonomic unit (OTU) or amplicon sequence variant (ASV) datasets to identify key taxonomic groups, diversity trends or taxonomic composition shifts in the context of provided categorical or numerical sample metadata. Tools, including Fisher's exact test, Boruta feature selection, alpha and beta diversity, and random forest and deep neural network classifiers, facilitate open-ended data exploration and hypothesis generation on microbial datasets. AVAILABILITY Mian is freely available at: miandata.org. Mian is an open-source platform licensed under the MIT license with source code available at github.com/tbj128/mian. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Boyang Tom Jin
- Department of Computer Science, Stanford University, Stanford, CA, USA 94305
- Feng Xu
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada V6Z1Y6
- Raymond T Ng
- Department of Computer Science, University of British Columbia, Vancouver, BC, Canada V6T1Z4
- James C Hogg
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada V6Z1Y6
|
48
|
Youngblut ND, de la Cuesta-Zuluaga J, Ley RE. Incorporating genome-based phylogeny and functional similarity into diversity assessments helps to resolve a global collection of human gut metagenomes. Environ Microbiol 2022; 24:3966-3984. [PMID: 35049120 DOI: 10.1111/1462-2920.15910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2021] [Accepted: 01/15/2022] [Indexed: 11/29/2022]
Abstract
Tree-based diversity measures incorporate phylogenetic or functional relatedness into comparisons of microbial communities. This can improve the identification of explanatory factors compared to tree-agnostic diversity measures. However, applying tree-based diversity measures to metagenome data is more challenging than for single-locus sequencing (e.g., 16S rRNA gene). Utilizing the Genome Taxonomy Database (GTDB) for species-level metagenome profiling allows for functional diversity measures based on genomic content or traits inferred from it. Still, it is unclear how metagenome-based assessments of microbiome diversity benefit from incorporating phylogeny or function into measures of diversity. We assessed this by measuring phylogeny-based, function-based, and tree-agnostic diversity measures from a large, global collection of human gut metagenomes composed of 30 studies and 2943 samples. We found tree-based measures to explain phenotypic variation (e.g., westernization, disease status, and gender) better or equivalent to tree-agnostic measures. Ecophylogenetic and functional diversity measures provided unique insight into how microbiome diversity was partitioned by phenotype. Tree-based measures greatly improved machine learning model performance for predicting westernization, disease status, and gender, relative to models trained solely on tree-agnostic measures. Our findings illustrate the usefulness of tree- and function-based measures for metagenomic assessments of microbial diversity, which is a fundamental component of microbiome science.
Affiliation(s)
- Nicholas D Youngblut: Department of Microbiome Science, Max Planck Institute for Developmental Biology, Max Planck Ring 5, 72076, Tübingen, Germany
- Jacobo de la Cuesta-Zuluaga: Department of Microbiome Science, Max Planck Institute for Developmental Biology, Max Planck Ring 5, 72076, Tübingen, Germany
- Ruth E Ley: Department of Microbiome Science, Max Planck Institute for Developmental Biology, Max Planck Ring 5, 72076, Tübingen, Germany
|
49
|
Multimodal deep learning applied to classify healthy and disease states of human microbiome. Sci Rep 2022; 12:824. [PMID: 35039534 PMCID: PMC8763943 DOI: 10.1038/s41598-022-04773-3] [Received: 06/11/2021] [Accepted: 12/30/2021] [Indexed: 12/22/2022]
Abstract
Metagenomic sequencing methods provide considerable genomic information regarding human microbiomes, enabling us to discover and understand microbial diseases. Compositional differences have been reported between patients and healthy people, which could be used in the diagnosis of patients. Despite significant progress in this regard, the accuracy of these tools needs to be improved for applications in diagnostics and therapeutics. MDL4Microbiome, the method developed herein, demonstrated high accuracy in predicting disease status by using various features from metagenome sequences and a multimodal deep learning model. We propose combining three different features, i.e., conventional taxonomic profiles, genome-level relative abundance, and metabolic functional characteristics, to enhance classification accuracy. This deep learning model enabled the construction of a classifier that combines these various modalities encoded in the human microbiome. We achieved accuracies of 0.98, 0.76, 0.84, and 0.97 for predicting patients with inflammatory bowel disease, type 2 diabetes, liver cirrhosis, and colorectal cancer, respectively; these are comparable to or higher than those of classical machine learning methods. A deeper analysis was also performed on the resulting sets of selected features to understand the contribution of their different characteristics. MDL4Microbiome is a classifier whose accuracy is comparable to or higher than that of other machine learning methods, and it offers perspectives on feature generation with metagenome sequences in deep learning models and their advantages in the classification of host disease status.
|
50
|
Deng Z, Zhang J, Li J, Zhang X. Application of Deep Learning in Plant-Microbiota Association Analysis. Front Genet 2021; 12:697090. [PMID: 34691142 PMCID: PMC8531731 DOI: 10.3389/fgene.2021.697090] [Received: 04/19/2021] [Accepted: 08/31/2021] [Indexed: 01/04/2023]
Abstract
Unraveling the association between the microbiome and plant phenotype can illustrate the effect of the microbiome on its host and thereby guide agricultural management. Adequate identification of species and appropriate choice of models are two challenges in microbiome data analysis. Computational models of microbiome data could help in association analysis between the microbiome and the plant host. Deep learning methods have been widely used to learn from microbiome data because of their strength in handling complex, sparse, noisy, and high-dimensional data. Here, we review the analytic strategies in microbiome data analysis and describe the applications of deep learning models for plant–microbiome correlation studies. We also introduce application cases of different models in plant–microbiome correlation analysis and discuss how to adapt the models to the critical steps in data processing. In terms of data processing, model structure, and operating principle, most deep learning models are suitable for plant microbiome data analysis. The ability to represent features and recognize patterns is the advantage of deep learning methods in modeling and interpretation for association analysis. Based on published computational experiments, convolutional neural networks and graph neural networks could be recommended for plant microbiome analysis.
Affiliation(s)
- Zhiyu Deng: Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China; Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China; University of Chinese Academy of Sciences, Beijing, China
- Jinming Zhang: Department of Infectious Diseases, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Junya Li: Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China; Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China; University of Chinese Academy of Sciences, Beijing, China
- Xiujun Zhang: Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China; Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China
|