Reference Citation Analysis

For: Zhou YH, Gallins P. A Review and Tutorial of Machine Learning Methods for Microbiome Host Trait Prediction. Front Genet 2019;10:579. [PMID: 31293616 PMCID: PMC6603228 DOI: 10.3389/fgene.2019.00579] [Citation(s) in RCA: 88] [Impact Index Per Article: 17.6] [Received: 01/13/2019] [Accepted: 06/04/2019] [Indexed: 12/19/2022]

Cited by Other Article(s)

Chang CC, Liu TC, Lu CJ, Chiu HC, Lin WN. Explainable machine learning model for identifying key gut microbes and metabolites biomarkers associated with myasthenia gravis. Comput Struct Biotechnol J 2024;23:1572-1583. [PMID: 38650589 PMCID: PMC11035017 DOI: 10.1016/j.csbj.2024.04.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Revised: 03/14/2024] [Accepted: 04/07/2024] [Indexed: 04/25/2024]

Ezra S, Bashan A. Network impact of a single-time-point microbial sample. PLoS One 2024;19:e0301683. [PMID: 38814902 PMCID: PMC11139317 DOI: 10.1371/journal.pone.0301683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 03/20/2024] [Indexed: 06/01/2024]

Teixeira M, Silva F, Ferreira RM, Pereira T, Figueiredo C, Oliveira HP. A review of machine learning methods for cancer characterization from microbiome data. NPJ Precis Oncol 2024;8:123. [PMID: 38816569 PMCID: PMC11139966 DOI: 10.1038/s41698-024-00617-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 05/17/2024] [Indexed: 06/01/2024]

Hagen M, Dass R, Westhues C, Blom J, Schultheiss SJ, Patz S. Interpretable machine learning decodes soil microbiome's response to drought stress. Environ Microbiome 2024;19:35. [PMID: 38812054 PMCID: PMC11138018 DOI: 10.1186/s40793-024-00578-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 05/10/2024] [Indexed: 05/31/2024]

Bombaywala S, Bajaj A, Dafale NA. Meta-analysis of wastewater microbiome for antibiotic resistance profiling. J Microbiol Methods 2024;223:106953. [PMID: 38754482 DOI: 10.1016/j.mimet.2024.106953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 05/12/2024] [Accepted: 05/12/2024] [Indexed: 05/18/2024]

Abstract

Microbial composition and stress molecules are the main drivers of the development and spread of antibiotic-resistant bacteria (ARBs) and antibiotic resistance genes (ARGs) in the environment. Identifying associations between microbiome composition and the resistome reliably and rapidly remains challenging. In the present study, secondary metagenome data from sewage and hospital wastewaters were assessed for differential taxonomic and ARG profiling. Random Forest (RF)-based machine learning (ML) models were then used to predict ARG profiles from taxonomic composition, and the models were validated on hospital wastewaters. Total ARG abundance was significantly higher in hospital wastewaters (15 ppm) than in sewage (5 ppm), and resistance to methicillin, carbapenems, and fluoroquinolones predominated. Although Pseudomonas constituted the major fraction, Streptomyces, Enterobacter, and Klebsiella were characteristic of hospital wastewaters. Prediction modeling showed that the relative abundances of the pathogenic genera Escherichia, Vibrio, and Pseudomonas contributed most to variation in total ARG count. Moreover, the model identified host-specific patterns for contributing taxa and related ARGs, with >90% accuracy in predicting ARG subtype abundance. Accuracy of >80% was obtained for hospital wastewaters, demonstrating that the model can be validly extrapolated to different types of wastewater systems. These findings show that the ML approach can identify ARG profiles from bacterial composition, including 16S rDNA amplicon data, and can serve as a viable alternative to metagenomic binning for identifying potential hosts of ARGs. Overall, this study demonstrates the promising application of ML techniques for predicting the spread of ARGs and provides guidance for early warning of ARB emergence.

Gorman ED, Lladser ME. Interpretable metric learning in comparative metagenomics: The adaptive Haar-like distance. PLoS Comput Biol 2024;20:e1011543. [PMID: 38768195 PMCID: PMC11142682 DOI: 10.1371/journal.pcbi.1011543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 05/31/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]

Yerke A, Fry Brumit D, Fodor AA. Proportion-based normalizations outperform compositional data transformations in machine learning applications. Microbiome 2024;12:45. [PMID: 38443997 PMCID: PMC10913632 DOI: 10.1186/s40168-023-01747-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 12/22/2023] [Indexed: 03/07/2024]

Gradisteanu Pircalabioru G, Raileanu M, Dionisie MV, Lixandru-Petre IO, Iliescu C. Fast detection of bacterial gut pathogens on miniaturized devices: an overview. Expert Rev Mol Diagn 2024;24:201-218. [PMID: 38347807 DOI: 10.1080/14737159.2024.2316756] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/17/2023] [Accepted: 02/06/2024] [Indexed: 03/23/2024]

Brochu HN, Smith E, Jeong S, Carlson M, Hansen SG, Tisoncik-Go J, Law L, Picker LJ, Gale M, Peng X. Pre-challenge gut microbial signature predicts RhCMV/SIV vaccine efficacy in rhesus macaques. bioRxiv 2024:2024.02.27.582186. [PMID: 38464179 PMCID: PMC10925241 DOI: 10.1101/2024.02.27.582186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]

Baddal B, Taner F, Uzun Ozsahin D. Harnessing of Artificial Intelligence for the Diagnosis and Prevention of Hospital-Acquired Infections: A Systematic Review. Diagnostics (Basel) 2024;14:484. [PMID: 38472956 DOI: 10.3390/diagnostics14050484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 01/23/2024] [Accepted: 02/19/2024] [Indexed: 03/14/2024]

Walsh C, Stallard-Olivera E, Fierer N. Nine (not so simple) steps: a practical guide to using machine learning in microbial ecology. mBio 2024;15:e0205023. [PMID: 38126787 PMCID: PMC10865974 DOI: 10.1128/mbio.02050-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]

Soldan R, Fusi M, Cardinale M, Homma F, Santos LG, Wenzl P, Bach-Pages M, Bitocchi E, Chacon Sanchez MI, Daffonchio D, Preston GM. Consistent effects of independent domestication events on the plant microbiota. Curr Biol 2024;34:557-567.e4. [PMID: 38232731 DOI: 10.1016/j.cub.2023.12.056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 12/01/2023] [Accepted: 12/18/2023] [Indexed: 01/19/2024]

Yang MQ, Wang ZJ, Zhai CB, Chen LQ. Research progress on the application of 16S rRNA gene sequencing and machine learning in forensic microbiome individual identification. Front Microbiol 2024;15:1360457. [PMID: 38371926 PMCID: PMC10869621 DOI: 10.3389/fmicb.2024.1360457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Accepted: 01/23/2024] [Indexed: 02/20/2024]

Rojas-Velazquez D, Kidwai S, Kraneveld AD, Tonda A, Oberski D, Garssen J, Lopez-Rincon A. Methodology for biomarker discovery with reproducibility in microbiome data using machine learning. BMC Bioinformatics 2024;25:26. [PMID: 38225565 PMCID: PMC10789030 DOI: 10.1186/s12859-024-05639-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 01/04/2024] [Indexed: 01/17/2024]

Abstract

BACKGROUND

In recent years, human microbiome studies have received increasing attention because the field is considered a potential source of clinical applications. With advances in omics technologies and AI, research on discovering potential biomarkers in the human microbiome with machine learning tools has produced positive outcomes. Despite these promising results, such studies still face several issues, including datasets with small numbers of samples, inconsistent results, and a lack of uniform processing and methodologies; these and other factors lead to a lack of reproducibility in biomedical research. In this work, we propose a methodology that combines the DADA2 pipeline for 16S rRNA sequence processing with Recursive Ensemble Feature Selection (REFS) across multiple datasets to increase reproducibility and obtain robust, reliable results in biomedical research.

RESULTS

Three experiments were performed analyzing microbiome data from patients/cases with Inflammatory Bowel Disease (IBD), Autism Spectrum Disorder (ASD), and Type 2 Diabetes (T2D). In each experiment, we identified a biomarker signature in one dataset and applied it to two other datasets for validation. The effectiveness of the proposed methodology was compared with other feature selection methods, such as K-Best with F-score, and with random selection as a baseline. The Area Under the Curve (AUC) was used as a measure of diagnostic accuracy and as the metric for comparing the proposed methodology with the other feature selection methods. Additionally, we used the Matthews Correlation Coefficient (MCC) to evaluate the performance of the methodology and to compare it with the other feature selection methods.

CONCLUSIONS

We developed a methodology for reproducible biomarker discovery in 16S rRNA microbiome sequence analysis that addresses issues related to data dimensionality, inconsistent results, and validation across independent datasets. The findings from the three experiments, across nine different datasets, show that the proposed methodology achieved higher accuracy than the other feature selection methods. This methodology is a first step toward increasing reproducibility and providing robust, reliable results.
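
The two metrics named in this abstract, AUC and MCC, can be computed directly from a classifier's output. The sketch below is a minimal, stdlib-only Python illustration, not the authors' implementation; it assumes binary labels coded 0/1, thresholded predictions for MCC, and continuous scores for AUC. The function names `mcc` and `roc_auc` and the example numbers are hypothetical.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any confusion-matrix margin is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical case (1) / control (0) labels and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]
print(roc_auc(labels, scores))  # 0.75
print(mcc(labels, preds))       # 0.0
```

The pairwise AUC loop costs O(n_pos × n_neg), which is acceptable for cohorts of a few hundred samples; a rank-based formula scales better. MCC uses all four confusion-matrix cells, so unlike plain accuracy it does not reward a model that simply predicts the majority class.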

Zhang Y, Wu H, Xu R, Wang Y, Chen L, Wei C. Machine learning modeling for the prediction of phosphorus and nitrogen removal efficiency and screening of crucial microorganisms in wastewater treatment plants. Sci Total Environ 2024;907:167730. [PMID: 37852495 DOI: 10.1016/j.scitotenv.2023.167730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 10/08/2023] [Accepted: 10/08/2023] [Indexed: 10/20/2023]

Abstract

The effectiveness of wastewater treatment plants (WWTPs) is largely determined by the microbial community structure of their activated sludge (AS). Interactions among microbial communities in AS systems, and their indirect effects on water quality, are crucial for WWTP performance. However, there is currently no quantitative method to evaluate the contribution of microorganisms to the operating efficiency of WWTPs. Traditional assessments of WWTP performance are limited by experimental conditions, methods, and other factors, resulting in increased costs and experimental pollutants. An effective method is therefore needed to predict WWTP efficiency from AS community structure and to quantitatively evaluate the contribution of microorganisms in the AS system. This study evaluated and compared microbial communities and water quality changes in WWTPs worldwide through a meta-analysis of published high-throughput sequencing data. Six machine learning (ML) models were used to predict the efficiency of phosphorus and nitrogen removal in WWTPs; among them, XGBoost showed the highest prediction accuracy. Cross-entropy was used to screen the crucial microorganisms related to phosphorus and nitrogen removal efficiency, and the modeling confirmed the reasonableness of the results. Thirteen genera with nitrogen and phosphorus cycling pathways obtained from the screening were considered highly appropriate for the simultaneous removal of phosphorus and nitrogen. The results showed that the microbes Haliangium, Vicinamibacteraceae, Tolumonas, and SWB02 are potentially crucial for phosphorus and nitrogen removal, as they may be involved in these removal processes in sewage treatment plants. Overall, these findings deepen our understanding of the relationship between microbial community structure and WWTP performance, indicating that microbial data should play a critical role in the future design of sewage treatment plants.
The ML model of this study can efficiently screen crucial microbes associated with WWTP system performance, and it is promising for the discovery of potential microbial metabolic pathways.

Peralta-Marzal LN, Rojas-Velazquez D, Rigters D, Prince N, Garssen J, Kraneveld AD, Perez-Pardo P, Lopez-Rincon A. A robust microbiome signature for autism spectrum disorder across different studies using machine learning. Sci Rep 2024;14:814. [PMID: 38191575 PMCID: PMC10774349 DOI: 10.1038/s41598-023-50601-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 12/21/2023] [Indexed: 01/10/2024]

Abstract

Autism spectrum disorder (ASD) is a highly complex neurodevelopmental disorder characterized by deficits in sociability and by repetitive behaviour; however, the comorbidities that accompany ASD are highly heterogeneous. Recently, the gut microbiome has been proposed as a plausible contributing factor in ASD development, as individuals diagnosed with ASD often suffer from intestinal problems and show a distinct intestinal microbial composition. Nevertheless, gut microbiome studies in ASD rarely agree on the specific bacterial taxa involved in this disorder. Given the potential role of the gut microbiome in ASD pathophysiology, our aim is to investigate whether there is a set of bacterial taxa relevant for ASD classification by using a sibling-controlled dataset. Additionally, we aim to validate these results across two independent cohorts, as several confounding factors, such as lifestyle, influence both ASD and gut microbiome studies. A machine learning approach, recursive ensemble feature selection (REFS), was applied to 16S rRNA gene sequencing data from 117 subjects (60 ASD cases and 57 siblings), identifying 26 bacterial taxa that discriminate ASD cases from controls. The average area under the curve (AUC) of this specific set of bacteria in the sibling-controlled dataset was 81.6%. Moreover, we applied the selected bacterial taxa in a tenfold cross-validation scheme using two independent cohorts (223 samples in total: 125 ASD cases and 98 controls) and obtained average AUCs of 74.8% and 74%, respectively. Analysis of the gut microbiome using REFS identified a set of bacterial taxa that can be used to predict the ASD status of children in three distinct cohorts, with AUC over 80% for the best-performing classifiers. Our results indicate that the gut microbiome has a strong association with ASD and should not be disregarded as a potential target for therapeutic interventions.
Furthermore, the proposed approach could be used to identify microbiome signatures in other 16S rRNA gene sequencing datasets.
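
Tenfold cross-validation, as mentioned above, partitions the samples into ten folds and trains on nine while testing on the held-out one, in turn. The abstract does not say how the folds were formed, so the stdlib-only Python sketch below assumes class-stratified folds (each fold keeps roughly the ASD/control ratio); the function `stratified_kfold` is hypothetical and only produces index splits, leaving model fitting out.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class proportions.

    Indices of each class are shuffled and dealt round-robin into k folds; the
    running position carries over between classes so fold sizes stay balanced.
    """
    n = len(labels)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:
            folds[position % k].append(i)
            position += 1
    for fold in folds:
        test_set = set(fold)
        train = [i for i in range(n) if i not in test_set]
        yield train, sorted(fold)


# Class sizes of the sibling-controlled dataset (60 ASD cases, 57 siblings).
labels = [1] * 60 + [0] * 57
for train, test in stratified_kfold(labels, k=10, seed=42):
    print(len(train), len(test), sum(labels[i] for i in test))
```

With these class sizes every test fold holds 11 or 12 samples, 6 of them ASD cases, so no fold is dominated by one class. Each sample is tested exactly once, which is what lets per-fold AUCs be averaged into a single cross-validated estimate.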

Lyu R, Qu Y, Divaris K, Wu D. Methodological Considerations in Longitudinal Analyses of Microbiome Data: A Comprehensive Review. Genes (Basel) 2023;15:51. [PMID: 38254941 PMCID: PMC11154524 DOI: 10.3390/genes15010051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 12/22/2023] [Accepted: 12/26/2023] [Indexed: 01/24/2024]

Vänni P, Tejesvi MV, Paalanne N, Aagaard K, Ackermann G, Camargo CA, Eggesbø M, Hasegawa K, Hoen AG, Karagas MR, Kolho KL, Laursen MF, Ludvigsson J, Madan J, Ownby D, Stanton C, Stokholm J, Tapiainen T. Machine-learning analysis of cross-study samples according to the gut microbiome in 12 infant cohorts. mSystems 2023;8:e0036423. [PMID: 37874156 PMCID: PMC10734493 DOI: 10.1128/msystems.00364-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 09/13/2023] [Indexed: 10/25/2023]

Affiliation(s)

Petri Vänni Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland
Mysore V. Tejesvi Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland; Ecology and Genetics, Faculty of Science, University of Oulu, Oulu, Finland
Niko Paalanne Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland; Department of Pediatrics and Adolescent Medicine, Oulu University Hospital, University of Oulu, Oulu, Finland
Kjersti Aagaard Department of Obstetrics & Gynecology, Division of Maternal-Fetal Medicine, Baylor College of Medicine and Texas Children’s Hospital, Houston, Texas, USA
Gail Ackermann Department of Pediatrics, University of California, San Diego, California, USA
Carlos A. Camargo Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
Merete Eggesbø Department of Climate and Environmental Health, Norwegian Institute of Public Health, Oslo, Norway; Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
Kohei Hasegawa Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
Anne G. Hoen Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire, USA
Margaret R. Karagas Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire, USA
Kaija-Leena Kolho Children’s Hospital, University of Helsinki and HUS, Helsinki, Finland
Martin F. Laursen National Food Institute, Technical University of Denmark, Lyngby, Denmark
Johnny Ludvigsson Crown Princess Victoria Children’s Hospital and Division of Pediatrics, Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
Juliette Madan Department of Psychiatry, Dartmouth Hitchcock Medical Center, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA; Department of Pediatrics, Dartmouth Hitchcock Medical Center, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA
Dennis Ownby Medical College of Georgia, Augusta, Georgia, USA
Catherine Stanton Teagasc Food Research Centre & APC Microbiome Ireland, Moorepark, Fermoy, Co. Cork, Ireland
Jakob Stokholm Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark; Department of Food Science, University of Copenhagen, Copenhagen, Denmark
Terhi Tapiainen Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland; Department of Obstetrics & Gynecology, Division of Maternal-Fetal Medicine, Baylor College of Medicine and Texas Children’s Hospital, Houston, Texas, USA; Biocenter Oulu, University of Oulu, Oulu, Finland

Gautam A, Bhowmik D, Basu S, Zeng W, Lahiri A, Huson DH, Paul S. Microbiome Metabolome Integration Platform (MMIP): a web-based platform for microbiome and metabolome data integration and feature identification. Brief Bioinform 2023;24:bbad325. [PMID: 37771003 DOI: 10.1093/bib/bbad325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 08/12/2023] [Indexed: 09/30/2023]

Huang S, Ailer E, Kilbertus N, Pfister N. Supervised learning and model analysis with compositional data. PLoS Comput Biol 2023;19:e1011240. [PMID: 37390111 DOI: 10.1371/journal.pcbi.1011240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 06/03/2023] [Indexed: 07/02/2023]

Neri-Rosario D, Martínez-López YE, Esquivel-Hernández DA, Sánchez-Castañeda JP, Padron-Manrique C, Vázquez-Jiménez A, Giron-Villalobos D, Resendis-Antonio O. Dysbiosis signatures of gut microbiota and the progression of type 2 diabetes: a machine learning approach in a Mexican cohort. Front Endocrinol (Lausanne) 2023;14:1170459. [PMID: 37441494 PMCID: PMC10333697 DOI: 10.3389/fendo.2023.1170459] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/20/2023] [Accepted: 06/09/2023] [Indexed: 07/15/2023]

Affiliation(s)

Daniel Neri-Rosario Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico; Programa de Maestría y Doctorado en Ciencias Bioquímicas, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, Mexico
Yoscelina Estrella Martínez-López Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico
Diego A. Esquivel-Hernández Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico
Jean Paul Sánchez-Castañeda Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico; Programa de Maestría y Doctorado en Ciencias Bioquímicas, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, Mexico
Cristian Padron-Manrique Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico; Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, Mexico
Aarón Vázquez-Jiménez Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico
David Giron-Villalobos Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico; Programa de Maestría y Doctorado en Ciencias Bioquímicas, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, Mexico
Osbaldo Resendis-Antonio Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), México City, Mexico; Coordinación de la Investigación Científica – Red de Apoyo a la Investigación, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, Mexico

Manghi P, Blanco-Míguez A, Manara S, NabiNejad A, Cumbo F, Beghini F, Armanini F, Golzato D, Huang KD, Thomas AM, Piccinno G, Punčochář M, Zolfo M, Lesker TR, Bredon M, Planchais J, Glodt J, Valles-Colomer M, Koren O, Pasolli E, Asnicar F, Strowig T, Sokol H, Segata N. MetaPhlAn 4 profiling of unknown species-level genome bins improves the characterization of diet-associated microbiome changes in mice. Cell Rep 2023;42:112464. [PMID: 37141097 PMCID: PMC10242440 DOI: 10.1016/j.celrep.2023.112464] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/28/2022] [Revised: 03/10/2023] [Accepted: 04/17/2023] [Indexed: 05/05/2023]

Affiliation(s)

Paolo Manghi Department CIBIO, University of Trento, Trento, Italy
Aitor Blanco-Míguez Department CIBIO, University of Trento, Trento, Italy
Serena Manara Department CIBIO, University of Trento, Trento, Italy
Amir NabiNejad Department CIBIO, University of Trento, Trento, Italy; IEO, European Institute of Oncology IRCCS, Milan, Italy
Fabio Cumbo Department CIBIO, University of Trento, Trento, Italy
Francesco Beghini Department CIBIO, University of Trento, Trento, Italy
Federica Armanini Department CIBIO, University of Trento, Trento, Italy
Davide Golzato Department CIBIO, University of Trento, Trento, Italy
Kun D Huang Department CIBIO, University of Trento, Trento, Italy
Andrew M Thomas Department CIBIO, University of Trento, Trento, Italy
Gianmarco Piccinno Department CIBIO, University of Trento, Trento, Italy
Michal Punčochář Department CIBIO, University of Trento, Trento, Italy
Moreno Zolfo Department CIBIO, University of Trento, Trento, Italy
Till R Lesker Department of Microbial Immune Regulation, Helmholtz Centre for Infection Research, Braunschweig, Germany
Marius Bredon Gastroenterology Department, Sorbonne Université, INSERM, Centre de Recherche Saint Antoine, CRSA, AP-HP, Saint Antoine Hospital, 75012 Paris, France; Paris Centre for Microbiome Medicine (PaCeMM) FHU, Paris, France
Julien Planchais Paris Centre for Microbiome Medicine (PaCeMM) FHU, Paris, France; INRAE, UMR1319 Micalis & AgroParisTech, Jouy en Josas, France
Jeremy Glodt Paris Centre for Microbiome Medicine (PaCeMM) FHU, Paris, France; INRAE, UMR1319 Micalis & AgroParisTech, Jouy en Josas, France
Mireia Valles-Colomer Department CIBIO, University of Trento, Trento, Italy
Omry Koren Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
Edoardo Pasolli Department of Agricultural Sciences, University of Naples, Naples, Italy
Francesco Asnicar Department CIBIO, University of Trento, Trento, Italy
Till Strowig Department of Microbial Immune Regulation, Helmholtz Centre for Infection Research, Braunschweig, Germany; Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
Harry Sokol Gastroenterology Department, Sorbonne Université, INSERM, Centre de Recherche Saint Antoine, CRSA, AP-HP, Saint Antoine Hospital, 75012 Paris, France; Paris Centre for Microbiome Medicine (PaCeMM) FHU, Paris, France; INRAE, UMR1319 Micalis & AgroParisTech, Jouy en Josas, France
Nicola Segata Department CIBIO, University of Trento, Trento, Italy; IEO, European Institute of Oncology IRCCS, Milan, Italy

Tapio M, Fischer D, Mäntysaari P, Tapio I. Rumen Microbiota Predicts Feed Efficiency of Primiparous Nordic Red Dairy Cows. Microorganisms 2023;11:1116. [PMID: 37317090 DOI: 10.3390/microorganisms11051116] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/24/2023] [Revised: 04/17/2023] [Accepted: 04/23/2023] [Indexed: 06/16/2023]

Lee CY, Dillard LR, Papin JA, Arnold KB. New perspectives into the vaginal microbiome with systems biology. Trends Microbiol 2023;31:356-368. [PMID: 36272885 DOI: 10.1016/j.tim.2022.09.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/11/2022] [Revised: 09/19/2022] [Accepted: 09/21/2022] [Indexed: 10/28/2022]

Chung T, Yan R, Weller DL, Kovac J. Conditional Forest Models Built Using Metagenomic Data Accurately Predicted Salmonella Contamination in Northeastern Streams. Microbiol Spectr 2023;11:e0038123. [PMID: 36946722 PMCID: PMC10100987 DOI: 10.1128/spectrum.00381-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Accepted: 02/27/2023] [Indexed: 03/23/2023]

Abstract

The use of water contaminated with Salmonella for produce production contributes to foodborne disease burden. To reduce human health risks, there is a need for novel, targeted approaches for assessing the pathogen status of agricultural water. We investigated the utility of water microbiome data for predicting Salmonella contamination of streams used to source water for produce production. Grab samples were collected from 60 New York streams in 2018 and tested for Salmonella. Separately, DNA was extracted from the samples and used for Illumina shotgun metagenomic sequencing. Reads were trimmed and used to assign taxonomy with Kraken2. Conditional forest (CF), regularized random forest (RRF), and support vector machine (SVM) models were implemented to predict Salmonella contamination. Model performance was assessed using 10-fold cross-validation repeated 10 times to quantify area under the curve (AUC) and Kappa score. CF models outperformed the other two algorithms based on AUC (0.86, CF; 0.81, RRF; 0.65, SVM) and Kappa score (0.53, CF; 0.41, RRF; 0.12, SVM). The taxa that were most informative for accurately predicting Salmonella contamination based on CF were compared to taxa identified by ALDEx2 as being differentially abundant between Salmonella-positive and -negative samples. CF and differential abundance tests both identified Aeromonas salmonicida (variable importance [VI] = 0.012) and Aeromonas sp. strain CA23 (VI = 0.025) as the two most informative taxa for predicting Salmonella contamination. Our findings suggest that microbiome-based models may provide an alternative to or complement existing water monitoring strategies. Similarly, the informative taxa identified in this study warrant further investigation as potential indicators of Salmonella contamination of agricultural water. 

IMPORTANCE

Understanding the associations between surface water microbiome composition and the presence of foodborne pathogens, such as Salmonella, can facilitate the identification of novel indicators of Salmonella contamination. This study assessed the utility of microbiome data and three machine learning algorithms for predicting Salmonella contamination of Northeastern streams. The research reported here both expanded knowledge of the microbiome composition of surface waters and identified putative novel indicators (i.e., Aeromonas species) for Salmonella in Northeastern streams. These putative indicators warrant further research to assess whether they are consistent indicators of Salmonella contamination across regions, waterways, and years not represented in the data set used in this study. Validated indicators identified using microbiome data may be used as targets in the development of rapid (e.g., PCR-based) detection assays for assessing the microbial safety of agricultural surface waters.
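The evaluation protocol described in this abstract (10-fold cross-validation repeated 10 times, scored by AUC and Kappa) can be sketched in plain Python. This is a minimal illustration, not the authors' code, and it rests on stated assumptions: folds are assigned by unstratified random splitting, AUC uses the rank (Mann-Whitney) formulation, the Kappa score is taken to be Cohen's kappa, and the function names `repeated_kfold`, `auc` and `cohen_kappa` are hypothetical. Fitting the CF, RRF or SVM model inside each split is left out.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def repeated_kfold(n: int, k: int = 10, repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each repeat reshuffles and splits into k folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Agreement between true and predicted classes, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    classes = set(y_true) | set(y_pred)
    expected = sum((list(y_true).count(c) / n) * (list(y_pred).count(c) / n)
                   for c in classes)
    if expected == 1.0:
        return float("nan")  # undefined when both vectors contain a single class
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # 60 samples, matching the number of streams; model fitting is omitted here.
    splits = list(repeated_kfold(60))
    print(len(splits))                               # 100 train/test splits
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))   # 0.5
```

In a real analysis each of the 100 splits would train a model on `train`, score `test`, and the reported AUC and Kappa would be averages over splits; stratifying folds by Salmonella status would be a reasonable refinement with only 60 samples.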

Shen Y, Zhu J, Deng Z, Lu W, Wang H. EnsDeepDP: An Ensemble Deep Learning Approach for Disease Prediction Through Metagenomics. IEEE/ACM Trans Comput Biol Bioinform 2023;20:986-998. [PMID: 36001521 DOI: 10.1109/tcbb.2022.3201295] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]

Leveraging Scheme for Cross-Study Microbiome Machine Learning Prediction and Feature Evaluations. Bioengineering (Basel) 2023;10:bioengineering10020231. [PMID: 36829725 PMCID: PMC9952031 DOI: 10.3390/bioengineering10020231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 02/02/2023] [Accepted: 02/04/2023] [Indexed: 02/11/2023]

Busato S, Gordon M, Chaudhari M, Jensen I, Akyol T, Andersen S, Williams C. Compositionality, sparsity, spurious heterogeneity, and other data-driven challenges for machine learning algorithms within plant microbiome studies. Curr Opin Plant Biol 2023;71:102326. [PMID: 36538837 PMCID: PMC9925409 DOI: 10.1016/j.pbi.2022.102326] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/31/2022] [Revised: 11/08/2022] [Accepted: 11/21/2022] [Indexed: 06/17/2023]

Interpreting tree ensemble machine learning models with endoR. PLoS Comput Biol 2022;18:e1010714. [PMID: 36516158 PMCID: PMC9797088 DOI: 10.1371/journal.pcbi.1010714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 12/28/2022] [Accepted: 11/07/2022] [Indexed: 12/15/2022]

Abstract

Tree ensemble machine learning models are increasingly used in microbiome science as they are compatible with the compositional, high-dimensional, and sparse structure of sequence-based microbiome data. While such models are often good at predicting phenotypes based on microbiome data, they yield only limited insights into how microbial taxa may be associated. We developed endoR, a method to interpret tree ensemble models. First, endoR simplifies the fitted model into a decision ensemble. Then, it extracts information on the importance of individual features and their pairwise interactions, displaying them as an interpretable network. Both the endoR network and importance scores provide insights into how features, and interactions between them, contribute to the predictive performance of the fitted model. Adjustable regularization and bootstrapping help reduce complexity and ensure that only essential parts of the model are retained. We assessed endoR on both simulated and real metagenomic data. We found endoR to have accuracy comparable to other common approaches while easing and enhancing model interpretation. Using endoR, we also confirmed published results on gut microbiome differences between cirrhotic and healthy individuals. Finally, we utilized endoR to explore associations between human gut methanogens and microbiome components. Indeed, these hydrogen consumers are expected to interact with fermenting bacteria in a complex syntrophic network. Specifically, we analyzed a global metagenome dataset of 2203 individuals and confirmed the previously reported association between Methanobacteriaceae and Christensenellales. Additionally, we observed that Methanobacteriaceae are associated with a network of hydrogen-producing bacteria. Our method accurately captures how tree ensembles use features and interactions between them to predict a response. As demonstrated by our applications, the resultant visualizations and summary outputs facilitate model interpretation and enable the generation of novel hypotheses about complex systems.
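The general idea of turning decision rules into feature importances and a pairwise-interaction network can be illustrated with a toy aggregation. This is a deliberately simplified sketch and not endoR's actual algorithm: it assumes each rule has already been reduced to the set of features it tests plus a hypothetical weight (standing in for its contribution to predictive performance), scores a feature node by summing the weights of rules that use it, and scores an edge by summing the weights of rules that use both features. The function name `rule_network` and the feature names are placeholders.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple

# A decision rule reduced to the features it tests plus a weight (hypothetical scoring).
Rule = Tuple[Iterable[str], float]


def rule_network(rules: Iterable[Rule]) -> Tuple[Dict[str, float],
                                                  Dict[Tuple[str, str], float]]:
    """Aggregate rule weights into node (feature) and edge (feature-pair) scores."""
    nodes: Dict[str, float] = defaultdict(float)
    edges: Dict[Tuple[str, str], float] = defaultdict(float)
    for features, weight in rules:
        feats = sorted(set(features))        # dedupe and fix pair ordering
        for f in feats:
            nodes[f] += weight
        for a, b in combinations(feats, 2):  # every pair co-occurring in the rule
            edges[(a, b)] += weight
    return dict(nodes), dict(edges)


if __name__ == "__main__":
    toy_rules = [
        (["taxon_A", "taxon_B"], 0.5),
        (["taxon_A"], 0.25),
        (["taxon_B", "taxon_C"], 0.25),
    ]
    nodes, edges = rule_network(toy_rules)
    print(nodes)  # {'taxon_A': 0.75, 'taxon_B': 0.75, 'taxon_C': 0.25}
    print(edges)  # {('taxon_A', 'taxon_B'): 0.5, ('taxon_B', 'taxon_C'): 0.25}
```

The node and edge dictionaries map directly onto a weighted graph, which is the kind of structure the abstract describes visualizing; the weighting scheme is the part a real method would define far more carefully.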

Loganathan T, Priya Doss CG. The influence of machine learning technologies in gut microbiome research and cancer studies - A review. Life Sci 2022;311:121118. [DOI: 10.1016/j.lfs.2022.121118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/19/2022] [Accepted: 10/19/2022] [Indexed: 11/18/2022]

Sampling from four geographically divergent young female populations demonstrates forensic geolocation potential in microbiomes. Sci Rep 2022;12:18547. [PMID: 36329122 PMCID: PMC9633824 DOI: 10.1038/s41598-022-21779-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 10/04/2022] [Indexed: 11/06/2022]

Ahmed E, Hens K. Microbiome in Precision Psychiatry: An Overview of the Ethical Challenges Regarding Microbiome Big Data and Microbiome-Based Interventions. AJOB Neurosci 2022;13:270-286. [PMID: 34379050 DOI: 10.1080/21507740.2021.1958096] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/09/2023]

Pietrucci D, Teofani A, Milanesi M, Fosso B, Putignani L, Messina F, Pesole G, Desideri A, Chillemi G. Machine Learning Data Analysis Highlights the Role of Parasutterella and Alloprevotella in Autism Spectrum Disorders. Biomedicines 2022;10:biomedicines10082028. [PMID: 36009575 PMCID: PMC9405825 DOI: 10.3390/biomedicines10082028] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/20/2022] [Revised: 08/10/2022] [Accepted: 08/15/2022] [Indexed: 11/25/2022]

Evaluation of Prebiotics through an In Vitro Gastrointestinal Digestion and Fecal Fermentation Experiment: Further Idea on the Implementation of Machine Learning Technique. Foods 2022;11:foods11162490. [PMID: 36010490 PMCID: PMC9407061 DOI: 10.3390/foods11162490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 08/12/2022] [Accepted: 08/16/2022] [Indexed: 11/17/2022]

Zhou YH, Sun G. Improve the Colorectal Cancer Diagnosis Using Gut Microbiome Data. Front Mol Biosci 2022;9:921945. [PMID: 36032686 PMCID: PMC9415616 DOI: 10.3389/fmolb.2022.921945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 06/16/2022] [Indexed: 11/17/2022]

Zhou L, Zhao Z, Shao L, Fang S, Li T, Gan L, Guo C. Predicting the abundance of metal resistance genes in subtropical estuaries using amplicon sequencing and machine learning. Ecotoxicol Environ Saf 2022;241:113844. [PMID: 36068766 DOI: 10.1016/j.ecoenv.2022.113844] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/27/2022] [Revised: 06/24/2022] [Accepted: 07/01/2022] [Indexed: 06/15/2023]

New-Generation Sequencing Technology in Diagnosis of Fungal Plant Pathogens: A Dream Comes True? J Fungi (Basel) 2022;8:jof8070737. [PMID: 35887492 PMCID: PMC9320658 DOI: 10.3390/jof8070737] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/03/2022] [Revised: 07/01/2022] [Accepted: 07/11/2022] [Indexed: 02/01/2023]

Wang Q, Wei Y. Quantifying uncertainty of subsampling-based ensemble methods under a U-statistic framework. J Stat Comput Simul 2022. [DOI: 10.1080/00949655.2022.2081969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

Wani AK, Roy P, Kumar V, Mir TUG. Metagenomics and artificial intelligence in the context of human health. Infect Genet Evol 2022;100:105267. [PMID: 35278679 DOI: 10.1016/j.meegid.2022.105267] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/19/2021] [Revised: 03/03/2022] [Accepted: 03/04/2022] [Indexed: 12/12/2022]

Morgan EW, Perdew GH, Patterson AD. Multi-Omics Strategies for Investigating the Microbiome in Toxicology Research. Toxicol Sci 2022;187:189-213. [PMID: 35285497 PMCID: PMC9154275 DOI: 10.1093/toxsci/kfac029] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]

Chen X, Zhu Z, Zhang W, Wang Y, Wang F, Yang J, Wong KC. Human disease prediction from microbiome data by multiple feature fusion and deep learning. iScience 2022;25:104081. [PMID: 35372808 PMCID: PMC8971930 DOI: 10.1016/j.isci.2022.104081] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/11/2021] [Revised: 09/16/2021] [Accepted: 03/13/2022] [Indexed: 10/29/2022]

Zhou H, He K, Chen J, Zhang X. LinDA: linear models for differential abundance analysis of microbiome compositional data. Genome Biol 2022;23:95. [PMID: 35421994 PMCID: PMC9012043 DOI: 10.1186/s13059-022-02655-5] [Citation(s) in RCA: 67] [Impact Index Per Article: 33.5] [Received: 07/27/2021] [Accepted: 03/14/2022] [Indexed: 12/12/2022]

Parvandeh S, Donehower LA, Katsonis P, Hsu TK, Asmussen J, Lee K, Lichtarge O. EPIMUTESTR: a nearest neighbor machine learning approach to predict cancer driver genes from the evolutionary action of coding variants. Nucleic Acids Res 2022;50:e70. [PMID: 35412634 PMCID: PMC9262594 DOI: 10.1093/nar/gkac215] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/28/2021] [Revised: 03/17/2022] [Accepted: 03/21/2022] [Indexed: 02/01/2023]

Host phenotype classification from human microbiome data is mainly driven by the presence of microbial taxa. PLoS Comput Biol 2022;18:e1010066. [PMID: 35446845 PMCID: PMC9064115 DOI: 10.1371/journal.pcbi.1010066] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/13/2021] [Revised: 05/03/2022] [Accepted: 03/29/2022] [Indexed: 12/14/2022]

Abstract

Machine learning-based classification approaches are widely used to predict host phenotypes from microbiome data. Classifiers are typically built using operational taxonomic units or relative abundance profiles as input features. Such data are intrinsically sparse, which opens the opportunity to make predictions from the presence/absence rather than the relative abundance of microbial taxa. This also raises the question of whether it is the presence, rather than the abundance, of particular taxa that is relevant for discrimination, an aspect that has so far been overlooked in the literature. In this paper, we aim to fill this gap by performing a meta-analysis of 4,128 publicly available metagenomes associated with multiple case-control studies. At species-level taxonomic resolution, we show that it is the presence rather than the relative abundance of specific microbial taxa that is important when building classification models. These findings are robust to the choice of classifier and are confirmed by statistical tests for identifying differentially abundant/present taxa. Results are further confirmed at coarser taxonomic resolutions and validated on 4,026 additional 16S rRNA samples from 30 public case-control studies.

The composition of the human microbiome has been linked to a large number of different diseases. In this context, classification methodologies based on machine learning represent a promising diagnostic tool for metagenomics data. The link between microbial population composition and host phenotypes has usually been investigated using taxonomic profiles represented by the relative abundances of microbial species. In this study, we show that the presence, rather than the relative abundance, of microbial taxa is what matters most for maximizing classification accuracy. We demonstrate this through a meta-analysis of more than 4,000 shotgun metagenomes from 25 case-control studies, in which the original relative abundance data are reduced to presence/absence profiles. The findings also extend to 16S rRNA data and advance the field of building prediction models directly from human microbiome data.
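Reducing relative abundance profiles to presence/absence, as in the meta-analysis described above, amounts to thresholding each value. A minimal sketch, assuming samples are rows of relative abundances and that any value above a detection threshold (a hypothetical default of 0 here) counts as present; the helper names `to_presence_absence` and `prevalence_by_class` are illustrative, not from the paper. The second function computes the per-class prevalence of each taxon, a simple way to see which taxa differ in presence between cases and controls.

```python
from typing import Dict, List, Sequence


def to_presence_absence(profiles: Sequence[Sequence[float]],
                        threshold: float = 0.0) -> List[List[int]]:
    """Map each relative abundance to 1 (present) if it exceeds threshold, else 0."""
    return [[1 if x > threshold else 0 for x in row] for row in profiles]


def prevalence_by_class(binary: Sequence[Sequence[int]],
                        labels: Sequence[str]) -> Dict[str, List[float]]:
    """Fraction of samples in each class in which each taxon is present."""
    out: Dict[str, List[float]] = {}
    for cls in sorted(set(labels)):
        rows = [row for row, y in zip(binary, labels) if y == cls]
        out[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return out


if __name__ == "__main__":
    # Rows: samples; columns: taxa (toy relative abundances summing to 1 per sample).
    rel_ab = [[0.70, 0.30, 0.00],
              [0.50, 0.00, 0.50],
              [0.00, 0.60, 0.40],
              [0.10, 0.90, 0.00]]
    labels = ["case", "case", "control", "control"]
    pa = to_presence_absence(rel_ab)
    print(pa)  # [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 0]]
    print(prevalence_by_class(pa, labels))
    # {'case': [1.0, 0.5, 0.5], 'control': [0.5, 1.0, 0.5]}
```

The binarized matrix can be fed to the same classifier as the abundance matrix, which is what makes a like-for-like comparison of the two representations possible.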

Liu B, Sträuber H, Saraiva J, Harms H, Silva SG, Kasmanas JC, Kleinsteuber S, Nunes da Rocha U. Machine learning-assisted identification of bioindicators predicts medium-chain carboxylate production performance of an anaerobic mixed culture. Microbiome 2022;10:48. [PMID: 35331330 PMCID: PMC8952268 DOI: 10.1186/s40168-021-01219-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Accepted: 12/17/2021] [Indexed: 05/10/2023]

Abstract

BACKGROUND

The ability to quantitatively predict ecophysiological functions of microbial communities is an important step toward engineering microbiota for desired functions related to specific biochemical conversions. Here, we present the quantitative prediction of medium-chain carboxylate production in two continuous anaerobic bioreactors from 16S rRNA gene dynamics in enriched communities.

RESULTS

By progressively shortening the hydraulic retention time (HRT) from 8 to 2 days with different temporal schemes in two bioreactors operated for 211 days, we achieved higher productivities and yields of the target products n-caproate and n-caprylate. The datasets generated from each bioreactor were used independently to train and test machine learning algorithms that predict n-caproate and n-caprylate productivities from 16S rRNA gene data. Our dataset consisted of 14 and 40 samples from HRTs of 8 and 2 days, respectively. Because the dataset was small and imbalanced, we compared linear regression, support vector machine and random forest regression algorithms using both the original dataset and balanced datasets generated by synthetic minority oversampling. We also performed cross-validation to estimate model stability. Random forest regression was the best algorithm, producing the most consistent results with median error rates below 8%. More than 90% accuracy in predicting n-caproate and n-caprylate productivities was achieved. Four inferred bioindicators, belonging to the genera Olsenella, Lactobacillus, Syntrophococcus and Clostridium IV, appear relevant to the higher carboxylate productivity at shorter HRT. The recovery of metagenome-assembled genomes of these bioindicators confirmed their genetic potential to perform key steps of medium-chain carboxylate production.

CONCLUSIONS

Shortening the hydraulic retention time of continuous bioreactor systems allows the communities to be shaped toward desired chain elongation functions. Using machine learning, we demonstrated that 16S rRNA amplicon sequencing data can be used to predict bioreactor process performance quantitatively and accurately. Characterizing and harnessing bioindicators holds promise for managing reactor microbiota toward selection of the target processes. Our mathematical framework is transferable to other ecosystem processes and microbial systems in which community dynamics are linked to key functions. The general methodology used here can be adapted to other functional data types, such as genes, transcripts, proteins or metabolites.
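The balancing step mentioned in the RESULTS paragraph, synthetic minority oversampling (SMOTE), creates new samples by interpolating between a minority-group sample and one of its nearest minority-group neighbours. The sketch below is a generic plain-Python illustration of that idea, not the authors' pipeline: it oversamples feature vectors only, uses Euclidean distance, and the function name `smote` and its parameters (`k`, `seed`) are hypothetical. Treating the 14 samples from the 8-day HRT as the minority group is likewise an assumption. For a continuous target such as productivity, a SMOTE-for-regression variant would also interpolate the target value; that extension is omitted.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_synthetic points on segments between minority samples and their neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-group neighbours of x by Euclidean distance (excluding x).
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(x, minority[j]))[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    extra = smote(minority, n_synthetic=6, k=2)
    print(len(extra))  # 6
```

Because every synthetic point lies on a segment between two real minority samples, oversampling should be applied only within training folds; applying it before cross-validation would leak information into the test folds.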

David MM, Tataru C, Pope Q, Baker LJ, English MK, Epstein HE, Hammer A, Kent M, Sieler MJ, Mueller RS, Sharpton TJ, Tomas F, Vega Thurber R, Fern XZ. Revealing General Patterns of Microbiomes That Transcend Systems: Potential and Challenges of Deep Transfer Learning. mSystems 2022;7:e0105821. [PMID: 35040699 PMCID: PMC8765061 DOI: 10.1128/msystems.01058-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/11/2023]

Jin BT, Xu F, Ng RT, Hogg JC. Mian: interactive web-based microbiome data table visualization and machine learning platform. Bioinformatics 2022;38:1176-1178. [PMID: 34788784 DOI: 10.1093/bioinformatics/btab754] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/11/2021] [Revised: 09/21/2021] [Accepted: 11/03/2021] [Indexed: 02/03/2023]

Youngblut ND, de la Cuesta-Zuluaga J, Ley RE. Incorporating genome-based phylogeny and functional similarity into diversity assessments helps to resolve a global collection of human gut metagenomes. Environ Microbiol 2022;24:3966-3984. [PMID: 35049120 DOI: 10.1111/1462-2920.15910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2021] [Accepted: 01/15/2022] [Indexed: 11/29/2022]

Multimodal deep learning applied to classify healthy and disease states of human microbiome. Sci Rep 2022;12:824. [PMID: 35039534 PMCID: PMC8763943 DOI: 10.1038/s41598-022-04773-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/11/2021] [Accepted: 12/30/2021] [Indexed: 12/22/2022]

Deng Z, Zhang J, Li J, Zhang X. Application of Deep Learning in Plant-Microbiota Association Analysis. Front Genet 2021;12:697090. [PMID: 34691142 PMCID: PMC8531731 DOI: 10.3389/fgene.2021.697090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/19/2021] [Accepted: 08/31/2021] [Indexed: 01/04/2023]