Reference Citation Analysis

For: Vangay P, Hillmann BM, Knights D. Microbiome Learning Repo (ML Repo): A public repository of microbiome regression and classification tasks. Gigascience 2019;8:giz042. [PMID: 31042284 PMCID: PMC6493971 DOI: 10.1093/gigascience/giz042] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 08/21/2018] [Revised: 02/24/2019] [Accepted: 03/26/2019] [Indexed: 01/05/2023] Open

Cited by Other Article(s)

Fernández-Edreira D, Liñares-Blanco J, V.-del-Río P, Fernandez-Lozano C. VIBES: A consensus subtyping of the vaginal microbiota reveals novel classification criteria. Comput Struct Biotechnol J 2024;23:148-156. [PMID: 38144944 PMCID: PMC10749217 DOI: 10.1016/j.csbj.2023.11.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 11/16/2023] [Accepted: 11/27/2023] [Indexed: 12/26/2023] Open

Gonia S, Heisel T, Miller N, Haapala J, Harnack L, Georgieff MK, Fields DA, Knights D, Jacobs K, Seburg E, Demerath EW, Gale CA, Swanson MH. Maternal oral probiotic use is associated with decreased breastmilk inflammatory markers, infant fecal microbiome variation, and altered recognition memory responses in infants-a pilot observational study. Front Nutr 2024;11:1456111. [PMID: 39385777 PMCID: PMC11462058 DOI: 10.3389/fnut.2024.1456111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Accepted: 09/02/2024] [Indexed: 10/12/2024] Open

Abstract

Introduction

Early life gut microbiomes are important for brain and immune system development in animal models. Probiotic use has been proposed as a strategy to promote health via modulation of microbiomes. In this observational study, we explore if early life exposure to probiotics via the mother during pregnancy and lactation, is associated with decreased inflammation in breastmilk, maternal and infant microbiome variation, and altered infant neurodevelopmental features.

Methods

Exclusively breastfeeding mother-infant dyads were recruited as part of the "Mothers and Infants Linked for Healthy Growth (MILk) Study." Probiotic comparison groups were defined by exposure to maternal probiotics (NO/YES) and by timing of probiotic exposure (prenatal, postnatal, total). C-reactive protein (CRP) and IL-6 levels were determined in breastmilk by immunoassays, and microbiomes were characterized from 1-month milk and from 1- and 6-month infant feces by 16S rDNA sequencing. Infant brain function was profiled via electroencephalogram (EEG); we assessed recognition memory using event-related potential (ERP) responses to familiar and novel auditory (1 month) and visual (6 months) stimuli. Statistical comparisons of study outcomes between probiotic groups were performed using permutational analysis of variance (PERMANOVA) (microbiome) and linear models (all other study outcomes), including relevant covariables as indicated.

Results

We observed associations between probiotic exposure and lower breastmilk CRP and IL-6 levels, and infant gut microbiome variation at 1- and 6-months of age (including higher abundances of Bifidobacteria and Lactobacillus). In addition, maternal probiotic exposure was associated with differences in infant ERP features at 6-months of age. Specifically, infants who were exposed to postnatal maternal probiotics (between the 1- and 6-month study visits) via breastfeeding/breastmilk, had larger differential responses between familiar and novel visual stimuli with respect to the late slow wave component of the EEG, which may indicate greater memory updating potential. The milk of mothers of this subgroup of infants had lower IL-6 levels and infants had different 6-month fecal microbiomes as compared to those in the "NO" maternal probiotics group.

Discussion

These results support continued research into "Microbiota-Gut-Brain" connections during early life and the role of pre- and postnatal probiotics in mothers to promote healthy microbiome-associated outcomes in infants.

Maaskant A, Voermans B, Levin E, de Goffau MC, Plomp N, Schuren F, Remarque EJ, Smits A, Langermans JAM, Bakker J, Montijn R. Microbiome signature suggestive of lactose-intolerance in rhesus macaques (Macaca mulatta) with intermittent chronic diarrhea. Anim Microbiome 2024;6:53. [PMID: 39313845 PMCID: PMC11421201 DOI: 10.1186/s42523-024-00338-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Accepted: 09/06/2024] [Indexed: 09/25/2024] Open

Gorman ED, Lladser ME. Interpretable metric learning in comparative metagenomics: The adaptive Haar-like distance. PLoS Comput Biol 2024;20:e1011543. [PMID: 38768195 PMCID: PMC11142682 DOI: 10.1371/journal.pcbi.1011543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 05/31/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024] Open

Quinn TP, Hess JL, Marshe VS, Barnett MM, Hauschild AC, Maciukiewicz M, Elsheikh SSM, Men X, Schwarz E, Trakadis YJ, Breen MS, Barnett EJ, Zhang-James Y, Ahsen ME, Cao H, Chen J, Hou J, Salekin A, Lin PI, Nicodemus KK, Meyer-Lindenberg A, Bichindaritz I, Faraone SV, Cairns MJ, Pandey G, Müller DJ, Glatt SJ. A primer on the use of machine learning to distil knowledge from data in biological psychiatry. Mol Psychiatry 2024;29:387-401. [PMID: 38177352 PMCID: PMC11228968 DOI: 10.1038/s41380-023-02334-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 09/21/2023] [Accepted: 11/17/2023] [Indexed: 01/06/2024]

Affiliation(s)

Thomas P Quinn Applied Artificial Intelligence Institute (A2I2), Burwood, VIC, 3125, Australia
Jonathan L Hess Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
Victoria S Marshe Institute of Medical Science, University of Toronto, Toronto, ON, M5S 1A1, Canada Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
Michelle M Barnett School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
Anne-Christin Hauschild Department of Medical Informatics, Medical University Center Göttingen, Göttingen, Lower Saxony, 37075, Germany
Malgorzata Maciukiewicz Hospital Zurich, University of Zurich, Zurich, 8091, Switzerland Department of Rheumatology and Immunology, University Hospital Bern, Bern, 3010, Switzerland Department for Biomedical Research (DBMR), University of Bern, Bern, 3010, Switzerland
Samar S M Elsheikh Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
Xiaoyu Men Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, M5S 1A1, Canada
Emanuel Schwarz Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
Yannis J Trakadis Department Human Genetics, McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
Michael S Breen Psychiatry, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
Eric J Barnett Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
Yanli Zhang-James Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
Mehmet Eren Ahsen Department of Business Administration, Gies College of Business, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA Department of Biomedical and Translational Sciences, Carle-Illinois School of Medicine, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
Han Cao Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
Junfang Chen Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
Jiahui Hou Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
Asif Salekin Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, 13244, USA
Ping-I Lin Discipline of Psychiatry and Mental Health, University of New South Wales, Sydney, NSW, 2052, Australia Mental Health Research Unit, South Western Sydney Local Health District, Liverpool, NSW, 2170, Australia
Kristin K Nicodemus Usher Institute, University of Edinburgh, Edinburgh, EH8 9YL, UK
Andreas Meyer-Lindenberg Clinical Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
Isabelle Bichindaritz Biomedical and Health Informatics/Computer Science Department, State University of New York at Oswego, Oswego, NY, 13126, USA Intelligent Bio Systems Lab, State University of New York at Oswego, Oswego, NY, 13126, USA
Stephen V Faraone Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
Murray J Cairns School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
Gaurav Pandey Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
Daniel J Müller Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada Department of Psychiatry, University of Toronto, Toronto, ON, M5S 1A1, Canada Department of Psychiatry, Psychosomatics and Psychotherapy, Center of Mental Health, University Hospital of Würzburg, Würzburg, 97080, Germany
Stephen J Glatt Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA. Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA. Department of Public Health and Preventive Medicine, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.

Ibrahimi E, Lopes MB, Dhamo X, Simeon A, Shigdel R, Hron K, Stres B, D’Elia D, Berland M, Marcos-Zambrano LJ. Overview of data preprocessing for machine learning applications in human microbiome research. Front Microbiol 2023;14:1250909. [PMID: 37869650 PMCID: PMC10588656 DOI: 10.3389/fmicb.2023.1250909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 09/22/2023] [Indexed: 10/24/2023] Open

Deschênes T, Tohoundjona FWE, Plante PL, Di Marzo V, Raymond F. Gene-based microbiome representation enhances host phenotype classification. mSystems 2023;8:e0053123. [PMID: 37404032 PMCID: PMC10469787 DOI: 10.1128/msystems.00531-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 05/24/2023] [Indexed: 07/06/2023] Open

Abstract

With the concomitant advances in both the microbiome and machine learning fields, the gut microbiome has become of great interest for the potential discovery of biomarkers to be used in the classification of the host health status. Shotgun metagenomics data derived from the human microbiome is composed of a high-dimensional set of microbial features. The use of such complex data for the modeling of host-microbiome interactions remains a challenge as retaining de novo content yields a highly granular set of microbial features. In this study, we compared the prediction performances of machine learning approaches according to different types of data representations derived from shotgun metagenomics. These representations include commonly used taxonomic and functional profiles and the more granular gene cluster approach. For the five case-control datasets used in this study (Type 2 diabetes, obesity, liver cirrhosis, colorectal cancer, and inflammatory bowel disease), gene-based approaches, whether used alone or in combination with reference-based data types, allowed improved or similar classification performances as the taxonomic and functional profiles. In addition, we show that using subsets of gene families from specific functional categories of genes highlight the importance of these functions on the host phenotype. This study demonstrates that both reference-free microbiome representations and curated metagenomic annotations can provide relevant representations for machine learning based on metagenomic data.

Importance

Data representation is an essential part of machine learning performance when using metagenomic data. In this work, we show that different microbiome representations provide varied host phenotype classification performance depending on the dataset. In classification tasks, untargeted microbiome gene content can provide similar or improved classification compared to taxonomical profiling. Feature selection based on biological function also improves classification performance for some pathologies. Function-based feature selection combined with interpretable machine learning algorithms can generate new hypotheses that can potentially be assayed mechanistically. This work thus proposes new approaches to represent microbiome data for machine learning that can potentiate the findings associated with metagenomic data.

Affiliation(s)

Thomas Deschênes Centre Nutrition, Santé et Société (NUTRISS) – Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Université Laval, Québec, Canada Canada Research Excellence Chair on the Microbiome-Endocannabinoidome Axis in Metabolic Health (CERC-MEND), Quebec City, Quebec, Canada Institut Intelligence et Données, Université Laval, Québec, Canada
Fred Wilfried Elom Tohoundjona Centre Nutrition, Santé et Société (NUTRISS) – Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Université Laval, Québec, Canada Canada Research Excellence Chair on the Microbiome-Endocannabinoidome Axis in Metabolic Health (CERC-MEND), Quebec City, Quebec, Canada
Pier-Luc Plante Centre Nutrition, Santé et Société (NUTRISS) – Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Université Laval, Québec, Canada Canada Research Excellence Chair on the Microbiome-Endocannabinoidome Axis in Metabolic Health (CERC-MEND), Quebec City, Quebec, Canada Institut Intelligence et Données, Université Laval, Québec, Canada
Vincenzo Di Marzo Centre Nutrition, Santé et Société (NUTRISS) – Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Université Laval, Québec, Canada Canada Research Excellence Chair on the Microbiome-Endocannabinoidome Axis in Metabolic Health (CERC-MEND), Quebec City, Quebec, Canada École de nutrition, Faculté des sciences de l’agriculture et de l’alimentation (FSAA), Université Laval, Québec, Canada Centre de recherche de l’Institut universitaire de cardiologie et de pneumologie de Québec (IUCPQ), Québec, Canada Département de médecine, Faculté de Médecine, Université Laval, Québec, Canada Joint International Unit on Chemical and Biomolecular Research on the Microbiome and its Impact on Metabolic Health and Nutrition (UMI-MicroMeNu), Quebec City, Canada
Frédéric Raymond Centre Nutrition, Santé et Société (NUTRISS) – Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Université Laval, Québec, Canada Canada Research Excellence Chair on the Microbiome-Endocannabinoidome Axis in Metabolic Health (CERC-MEND), Quebec City, Quebec, Canada Institut Intelligence et Données, Université Laval, Québec, Canada École de nutrition, Faculté des sciences de l’agriculture et de l’alimentation (FSAA), Université Laval, Québec, Canada

Shtossel O, Isakov H, Turjeman S, Koren O, Louzoun Y. Ordering taxa in image convolution networks improves microbiome-based machine learning accuracy. Gut Microbes 2023;15:2224474. [PMID: 37345233 PMCID: PMC10288916 DOI: 10.1080/19490976.2023.2224474] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/15/2022] [Accepted: 06/08/2023] [Indexed: 06/23/2023] Open

Loganathan T, Priya Doss C G. The influence of machine learning technologies in gut microbiome research and cancer studies - A review. Life Sci 2022;311:121118. [DOI: 10.1016/j.lfs.2022.121118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/19/2022] [Accepted: 10/19/2022] [Indexed: 11/18/2022]

Heisel T, Johnson AJ, Gonia S, Dillon A, Skalla E, Haapala J, Jacobs KM, Nagel E, Pierce S, Fields D, Demerath E, Knights D, Gale CA. Bacterial, fungal, and interkingdom microbiome features of exclusively breastfeeding dyads are associated with infant age, antibiotic exposure, and birth mode. Front Microbiol 2022;13:1050574. [PMID: 36466688 PMCID: PMC9714262 DOI: 10.3389/fmicb.2022.1050574] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/21/2022] [Accepted: 10/26/2022] [Indexed: 11/19/2022] Open

Abstract

The composition and function of early life gut bacterial communities (microbiomes) have been proposed to modulate health for the long term. In addition to bacteria, fungi (mycobiomes) also colonize the early life gut and have been implicated in health disorders such as asthma and obesity. Despite the potential importance of mycobiomes in health, there has been a lack of study regarding fungi and their interkingdom interactions with bacteria during infancy. The goal of this study was to obtain a more complete understanding of microbial communities thought to be relevant for the early life programming of health. Breastmilk and infant feces were obtained from a unique cohort of healthy, exclusively breastfeeding dyads recruited as part of the Mothers and Infants Linked for Healthy Growth (MILk) study with microbial taxa characterized using amplicon-based sequencing approaches. Bacterial and fungal communities in breastmilk were both distinct from those of infant feces, consistent with niche-specific microbial community development. Nevertheless, overlap was observed among sample types (breastmilk, 1-month feces, 6-month feces) with respect to the taxa that were the most prevalent and abundant. Self-reported antibacterial antibiotic exposure was associated with micro- as well as mycobiome variation, which depended upon the subject receiving antibiotics (mother or infant), timing of exposure (prenatal, peri- or postpartum), and sample type. In addition, birth mode was associated with bacterial and fungal community variation in infant feces, but not breastmilk. Correlations between bacterial and fungal taxa abundances were identified in all sample types. For infant feces, congruency between bacterial and fungal communities was higher for older infants, consistent with the idea of co-maturation of bacterial and fungal gut communities. Interkingdom connectedness also tended to be higher in older infants. Additionally, higher interkingdom connectedness was associated with Cesarean section birth and with antibiotic exposure for microbial communities of both breastmilk and infant feces. Overall, these results implicate infant age, birth mode, and antibiotic exposure in bacterial, fungal and interkingdom relationship variation in early-life-relevant microbiomes, expanding the current literature beyond bacteria.

Boyraz A, Pawlowsky-Glahn V, Egozcue JJ, Acar AC. Principal microbial groups: compositional alternative to phylogenetic grouping of microbiome data. Brief Bioinform 2022;23:6675749. [PMID: 36007229 DOI: 10.1093/bib/bbac328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 07/19/2022] [Accepted: 07/20/2022] [Indexed: 11/13/2022] Open

Leske M, Bottacini F, Afli H, Andrade BGN. BiGAMi: Bi-Objective Genetic Algorithm Fitness Function for Feature Selection on Microbiome Datasets. Methods Protoc 2022;5:42. [PMID: 35645350 PMCID: PMC9149982 DOI: 10.3390/mps5030042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/16/2022] [Accepted: 05/18/2022] [Indexed: 11/23/2022] Open

Agostinetto G, Bozzi D, Porro D, Casiraghi M, Labra M, Bruno A. SKIOME Project: a curated collection of skin microbiome datasets enriched with study-related metadata. Database (Oxford) 2022;2022:6586378. [PMID: 35576001 PMCID: PMC9216470 DOI: 10.1093/database/baac033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/08/2021] [Revised: 02/25/2022] [Accepted: 05/09/2022] [Indexed: 04/07/2023]

Abstract

Large amounts of data from microbiome-related studies have been (and are currently being) deposited on international public databases. These datasets represent a valuable resource for the microbiome research community and could serve future researchers interested in integrating multiple datasets into powerful meta-analyses. However, this huge amount of data lacks harmonization and it is far from being completely exploited in its full potential to build a foundation that places microbiome research at the nexus of many subdisciplines within and beyond biology. Thus, it urges the need for data accessibility and reusability, according to findable, accessible, interoperable and reusable (FAIR) principles, as supported by National Microbiome Data Collaborative and FAIR Microbiome. To tackle the challenge of accelerating discovery and advances in skin microbiome research, we collected, integrated and organized existing microbiome data resources from human skin 16S rRNA amplicon-sequencing experiments. We generated a comprehensive collection of datasets, enriched in metadata, and organized this information into data frames ready to be integrated into microbiome research projects and advanced post-processing analyses, such as data science applications (e.g. machine learning). Furthermore, we have created a data retrieval and curation framework built on three different stages to maximize the retrieval of datasets and metadata associated with them. Lastly, we highlighted some caveats regarding metadata retrieval and suggested ways to improve future metadata submissions. Overall, our work resulted in a curated skin microbiome datasets collection accompanied by a state-of-the-art analysis of the last 10 years of the skin microbiome field. Database URL: https://github.com/giuliaago/SKIOMEMetadataRetrieval.

Chen X, Zhu Z, Zhang W, Wang Y, Wang F, Yang J, Wong KC. Human disease prediction from microbiome data by multiple feature fusion and deep learning. iScience 2022;25:104081. [PMID: 35372808 PMCID: PMC8971930 DOI: 10.1016/j.isci.2022.104081] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/11/2021] [Revised: 09/16/2021] [Accepted: 03/13/2022] [Indexed: 10/29/2022] Open

Host phenotype classification from human microbiome data is mainly driven by the presence of microbial taxa. PLoS Comput Biol 2022;18:e1010066. [PMID: 35446845 PMCID: PMC9064115 DOI: 10.1371/journal.pcbi.1010066] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/13/2021] [Revised: 05/03/2022] [Accepted: 03/29/2022] [Indexed: 12/14/2022] Open

Abstract

Machine learning-based classification approaches are widely used to predict host phenotypes from microbiome data. Classifiers are typically employed by considering operational taxonomic units or relative abundance profiles as input features. Such types of data are intrinsically sparse, which opens the opportunity to make predictions from the presence/absence rather than the relative abundance of microbial taxa. This also poses the question whether it is the presence rather than the abundance of particular taxa to be relevant for discrimination purposes, an aspect that has been so far overlooked in the literature. In this paper, we aim at filling this gap by performing a meta-analysis on 4,128 publicly available metagenomes associated with multiple case-control studies. At species-level taxonomic resolution, we show that it is the presence rather than the relative abundance of specific microbial taxa to be important when building classification models. Such findings are robust to the choice of the classifier and confirmed by statistical tests applied to identifying differentially abundant/present taxa. Results are further confirmed at coarser taxonomic resolutions and validated on 4,026 additional 16S rRNA samples coming from 30 public case-control studies.

The composition of the human microbiome has been linked to a large number of different diseases. In this context, classification methodologies based on machine learning approaches have represented a promising tool for diagnostic purposes from metagenomics data. The link between microbial population composition and host phenotypes has been usually performed by considering taxonomic profiles represented by relative abundances of microbial species. In this study, we show that it is more the presence rather than the relative abundance of microbial taxa to be relevant to maximize classification accuracy. This is accomplished by conducting a meta-analysis on more than 4,000 shotgun metagenomes coming from 25 case-control studies and in which original relative abundance data are degraded to presence/absence profiles. Findings are also extended to 16S rRNA data and advance the research field in building prediction models directly from human microbiome data.

Chen Y, Li J, Zhang Y, Zhang M, Sun Z, Jing G, Huang S, Su X. Parallel-Meta Suite: Interactive and rapid microbiome data analysis on multiple platforms. IMETA 2022;1:e1. [PMID: 38867729 PMCID: PMC10989749 DOI: 10.1002/imt2.1] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 11/30/2021] [Revised: 12/13/2021] [Accepted: 12/17/2021] [Indexed: 06/14/2024]

Liu B, Huang L, Liu Z, Pan X, Cui Z, Pan J, Xie L. EasyMicroPlot: An Efficient and Convenient R Package in Microbiome Downstream Analysis and Visualization for Clinical Study. Front Genet 2022;12:803627. [PMID: 35058973 PMCID: PMC8764268 DOI: 10.3389/fgene.2021.803627] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/28/2021] [Accepted: 12/02/2021] [Indexed: 01/03/2023] Open

Affiliation(s)

Bingdong Liu The First Affiliated Hospital of Jinan University, Guangzhou, China; State Key Laboratory of Applied Microbiology Southern China, Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, Guangdong Open Laboratory of Applied Microbiology, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou, China
Liujing Huang State Key Laboratory of Applied Microbiology Southern China, Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, Guangdong Open Laboratory of Applied Microbiology, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou, China; Zhujiang Hospital, Southern Medical University, Guangzhou, China
Zhihong Liu State Key Laboratory of Applied Microbiology Southern China, Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, Guangdong Open Laboratory of Applied Microbiology, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou, China
Xiaohan Pan Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
Zongbing Cui State Key Laboratory of Applied Microbiology Southern China, Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, Guangdong Open Laboratory of Applied Microbiology, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou, China
Jiyang Pan The First Affiliated Hospital of Jinan University, Guangzhou, China
Liwei Xie State Key Laboratory of Applied Microbiology Southern China, Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, Guangdong Open Laboratory of Applied Microbiology, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou, China; Zhujiang Hospital, Southern Medical University, Guangzhou, China; School of Public Health, Xinxiang Medical University, Xinxiang, China

Zhao L, Cho WC, Nicolls MR. Colorectal Cancer-Associated Microbiome Patterns and Signatures. Front Genet 2022;12:787176. [PMID: 35003221 PMCID: PMC8729777 DOI: 10.3389/fgene.2021.787176] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 09/30/2021] [Accepted: 12/07/2021] [Indexed: 01/02/2023] Open

Abstract

The gut microbiome is dynamic and shaped by diet, age, geography, and environment. The disruption of normal gut microbiota (dysbiosis) is closely related to colorectal cancer (CRC) risk and progression. To better identify and characterize CRC-associated dysbiosis, we collected six independent cohorts with matched normal pairs (when available) for comparison and exploration of the microbiota and their interactions with the host. Comparing the microbial community compositions between cancerous and adjacent noncancerous tissues, we found that more microbes were depleted than enriched in tumors. Despite taxonomic variations among cohorts, consistent depletion of normal microbiota (members of Clostridia and Bacteroidia) and significant enrichment of oral-originated pathogens (such as Fusobacterium nucleatum and Parvimonas micra) were observed in CRC compared to normal tissues. Sets of hub and hub-connecting microbes were subsequently identified to infer microbe-microbe interaction networks in CRC. Furthermore, biclustering was used for identifying coherent patterns between patients and microbes. Two patient-microbe interaction patterns, named P0 and P1, can be consistently identified among the investigated six CRC cohorts. Characterization of the microbial community composition of the two patterns revealed that patients in P0 and P1 differed significantly in microbial alpha and beta diversity, and CRC-associated microbiota changes consist of continuous populations of widespread taxa rather than discrete enterotypes. In contrast to the P0, the patients in P1 have reduced microbial alpha diversity compared to the adjacent normal tissues, and P1 possesses more oral-related pathogens than P0 and controls. Collectively, our study investigated the CRC-associated microbiome changes, and identified reproducible microbial signatures across multiple independent cohorts. More importantly, we revealed that the CRC heterogeneity can be partially attributed to the variety and compositional differences of microbes and their interactions with humans.

Giulia A, Anna S, Antonia B, Dario P, Maurizio C. Extending Association Rule Mining to Microbiome Pattern Analysis: Tools and Guidelines to Support Real Applications. Frontiers in Bioinformatics 2022;1:794547. [PMID: 36303759 PMCID: PMC9580939 DOI: 10.3389/fbinf.2021.794547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Accepted: 12/07/2021] [Indexed: 11/24/2022]

Abstract

Boosted by the exponential growth of microbiome-based studies, analyzing microbiome patterns is now a hot topic, finding different fields of application. In particular, the use of machine learning techniques is increasing in microbiome studies, providing deep insights into microbial community composition. In this context, in order to investigate microbial patterns from 16S rRNA metabarcoding data, we explored the effectiveness of the Association Rule Mining (ARM) technique, a supervised-machine learning procedure, to extract patterns (in this work, intended as groups of species or taxa) from microbiome data. ARM can generate huge amounts of data, making spurious information removal and visualizing results challenging. Our work sheds light on the strengths and weaknesses of the pattern mining strategy in the study of microbial patterns, in particular from 16S rRNA microbiome datasets, applying ARM on real case studies and providing guidelines for future usage. Our results highlighted issues related to the type of input and the use of metadata in microbial pattern extraction, identifying the key steps that must be considered to apply ARM consciously on 16S rRNA microbiome data. To promote the use of ARM and the visualization of microbiome patterns, specifically, we developed microFIM (microbial Frequent Itemset Mining), a versatile Python tool that facilitates the use of ARM integrating common microbiome outputs, such as taxa tables. microFIM implements interest measures to remove spurious information and merges the results of ARM analysis with the common microbiome outputs, providing similar microbiome strategies that help scientists to integrate ARM in microbiome applications. With this work, we aimed at creating a bridge between microbial ecology researchers and the ARM technique, making researchers aware of the strengths and weaknesses of the association rule mining approach.

Gordon-Rodriguez E, Quinn TP, Cunningham JP. Learning sparse log-ratios for high-throughput sequencing data. Bioinformatics 2021;38:157-163. [PMID: 34498030 PMCID: PMC8696089 DOI: 10.1093/bioinformatics/btab645] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/06/2021] [Revised: 08/09/2021] [Accepted: 09/03/2021] [Indexed: 02/03/2023]

Dupras C, Bunnik EM. Toward a Framework for Assessing Privacy Risks in Multi-Omic Research and Databases. The American Journal of Bioethics: AJOB 2021;21:46-64. [PMID: 33433298 DOI: 10.1080/15265161.2020.1863516] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/12/2023]

Yang F, Zou Q. mAML: an automated machine learning pipeline with a microbiome repository for human disease classification. Database: The Journal of Biological Databases and Curation 2021;2020:5862399. [PMID: 32588040 PMCID: PMC7316531 DOI: 10.1093/database/baaa050] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 02/11/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 12/20/2022]

García-Jiménez B, Muñoz J, Cabello S, Medina J, Wilkinson MD. Predicting microbiomes through a deep latent space. Bioinformatics 2021;37:1444-1451. [PMID: 33289510 PMCID: PMC8208755 DOI: 10.1093/bioinformatics/btaa971] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/17/2020] [Revised: 10/21/2020] [Accepted: 11/06/2020] [Indexed: 12/28/2022]

Abstract

Motivation

Microbial communities influence their environment by modifying the availability of compounds, such as nutrients or chemical elicitors. Knowing the microbial composition of a site is therefore relevant to improve productivity or health. However, sequencing facilities are not always available, or may be prohibitively expensive in some cases. Thus, it would be desirable to computationally predict the microbial composition from more accessible, easily-measured features.

Results

Integrating deep learning techniques with microbiome data, we propose an artificial neural network architecture based on heterogeneous autoencoders to condense the long vector of microbial abundance values into a deep latent space representation. Then, we design a model to predict the deep latent space and, consequently, to predict the complete microbial composition using environmental features as input. The performance of our system is examined using the rhizosphere microbiome of Maize. We reconstruct the microbial composition (717 taxa) from the deep latent space (10 values) with high fidelity (>0.9 Pearson correlation). We then successfully predict microbial composition from environmental variables, such as plant age, temperature or precipitation (0.73 Pearson correlation, 0.42 Bray–Curtis). We extend this to predict microbiome composition under hypothetical scenarios, such as future climate change conditions. Finally, via transfer learning, we predict microbial composition in a distinct scenario with only 100 sequences, and distinct environmental features. We propose that our deep latent space may assist microbiome-engineering strategies when technical or financial resources are limited, through predicting current or future microbiome compositions.

Availability and implementation

Software, results and data are available at https://github.com/jorgemf/DeepLatentMicrobiome

Supplementary information

Supplementary data are available at Bioinformatics online.

Chen X, Liu L, Zhang W, Yang J, Wong KC. Human host status inference from temporal microbiome changes via recurrent neural networks. Brief Bioinform 2021;22:6307015. [PMID: 34151933 DOI: 10.1093/bib/bbab223] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/24/2021] [Revised: 04/21/2021] [Accepted: 04/21/2021] [Indexed: 01/04/2023]

DiMucci D, Kon M, Segrè D. BowSaw: Inferring Higher-Order Trait Interactions Associated With Complex Biological Phenotypes. Front Mol Biosci 2021;8:663532. [PMID: 34222331 PMCID: PMC8245782 DOI: 10.3389/fmolb.2021.663532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2021] [Accepted: 05/24/2021] [Indexed: 11/15/2022]

Liu YX, Qin Y, Chen T, Lu M, Qian X, Guo X, Bai Y. A practical guide to amplicon and metagenomic analysis of microbiome data. Protein Cell 2021;12:315-330. [PMID: 32394199 PMCID: PMC8106563 DOI: 10.1007/s13238-020-00724-8] [Citation(s) in RCA: 346] [Impact Index Per Article: 115.3] [Received: 02/04/2020] [Accepted: 04/10/2020] [Indexed: 12/22/2022]

Affiliation(s)

Yong-Xin Liu State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China. CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, 100049, China. CAS-JIC Centre of Excellence for Plant and Microbial Science, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.
Yuan Qin State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China. CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, 100049, China. CAS-JIC Centre of Excellence for Plant and Microbial Science, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China. College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China.
Tong Chen National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
Meiping Lu Department of Rheumatology Immunology & Allergy, Children's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province, 310053, China
Xubo Qian Department of Rheumatology Immunology & Allergy, Children's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province, 310053, China
Xiaoxuan Guo State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China. CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, 100049, China. CAS-JIC Centre of Excellence for Plant and Microbial Science, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.
Yang Bai State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China. CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, 100049, China. CAS-JIC Centre of Excellence for Plant and Microbial Science, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China. College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China.

Wu S, Chen Y, Li Z, Li J, Zhao F, Su X. Towards multi-label classification: Next step of machine learning for microbiome research. Comput Struct Biotechnol J 2021;19:2742-2749. [PMID: 34093989 PMCID: PMC8131981 DOI: 10.1016/j.csbj.2021.04.054] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/24/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 11/22/2022]

Zhang W, Chen X, Wong KC. Noninvasive early diagnosis of intestinal diseases based on artificial intelligence in genomics and microbiome. J Gastroenterol Hepatol 2021;36:823-831. [PMID: 33880763 DOI: 10.1111/jgh.15500] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/10/2021] [Revised: 03/15/2021] [Accepted: 03/17/2021] [Indexed: 12/15/2022]

Shestopaloff K, Dong M, Gao F, Xu W. DCMD: Distance-based classification using mixture distributions on microbiome data. PLoS Comput Biol 2021;17:e1008799. [PMID: 33711013 PMCID: PMC7990174 DOI: 10.1371/journal.pcbi.1008799] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/10/2020] [Revised: 03/24/2021] [Accepted: 02/15/2021] [Indexed: 11/21/2022]

Abstract

Current advances in next-generation sequencing techniques have allowed researchers to conduct comprehensive research on the microbiome and human diseases, with recent studies identifying associations between the human microbiome and health outcomes for a number of chronic conditions. However, microbiome data structure, characterized by sparsity and skewness, presents challenges to building effective classifiers. To address this, we present an innovative approach for distance-based classification using mixture distributions (DCMD). The method aims to improve classification performance using microbiome community data, where the predictors are composed of sparse and heterogeneous count data. This approach models the inherent uncertainty in sparse counts by estimating a mixture distribution for the sample data and representing each observation as a distribution, conditional on observed counts and the estimated mixture, which are then used as inputs for distance-based classification. The method is implemented into a k-means classification and k-nearest neighbours framework. We develop two distance metrics that produce optimal results. The performance of the model is assessed using simulated and human microbiome study data, with results compared against a number of existing machine learning and distance-based classification approaches. The proposed method is competitive when compared to the other machine learning approaches, and shows a clear improvement over commonly used distance-based classifiers, underscoring the importance of modelling sparsity for achieving optimal results. The range of applicability and robustness make the proposed method a viable alternative for classification using sparse microbiome count data. The source code is available at https://github.com/kshestop/DCMD for academic use.

The uneven performance of conventional distance-based classifiers when using microbiome profiles to predict disease status has motivated us to develop a novel distance-based method that accounts for uncertainty when modeling sparse counts. We propose a classification algorithm that uses mixture distributions to measure normed distances between microbiome distributions, which better models the underlying structure by handling excess zeros and sparsity inherent in microbial abundance counts. Applications of DCMD have shown improved classification performance and robustness, making the proposed method an improved alternative for classification using microbiome data.

Marcos-Zambrano LJ, Karaduzovic-Hadziabdic K, Loncar Turukalo T, Przymus P, Trajkovik V, Aasmets O, Berland M, Gruca A, Hasic J, Hron K, Klammsteiner T, Kolev M, Lahti L, Lopes MB, Moreno V, Naskinova I, Org E, Paciência I, Papoutsoglou G, Shigdel R, Stres B, Vilne B, Yousef M, Zdravevski E, Tsamardinos I, Carrillo de Santa Pau E, Claesson MJ, Moreno-Indias I, Truu J. Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment. Front Microbiol 2021;12:634511. [PMID: 33737920 PMCID: PMC7962872 DOI: 10.3389/fmicb.2021.634511] [Citation(s) in RCA: 126] [Impact Index Per Article: 42.0] [Received: 11/27/2020] [Accepted: 02/01/2021] [Indexed: 12/19/2022]

Abstract

The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all information from these biological datasets, taking into account the peculiarities of microbiome data, i.e., compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host-phenotypes based on taxonomy-informed feature selection to establish an association between microbiome and predict disease states is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, infer host phenotypes to predict diseases and use microbial communities to stratify patients by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here is more related to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. The manual identification of data sources has been complemented with: (1) automated publication search through digital libraries of the three major publishers using natural language processing (NLP) Toolkit, and (2) an automated identification of relevant software repositories on GitHub and ranking of the related research papers relying on learning to rank approach.

Affiliation(s)

Laura Judith Marcos-Zambrano Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
Kanita Karaduzovic-Hadziabdic Faculty of Engineering and Natural Sciences, International University of Sarajevo, Sarajevo, Bosnia and Herzegovina
Tatjana Loncar Turukalo Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
Piotr Przymus Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
Vladimir Trajkovik Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
Oliver Aasmets Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia; Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
Magali Berland Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
Aleksandra Gruca Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
Jasminka Hasic University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
Karel Hron Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czechia
Thomas Klammsteiner Department of Microbiology, University of Innsbruck, Innsbruck, Austria
Mikhail Kolev South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
Leo Lahti Department of Computing, University of Turku, Turku, Finland
Marta B. Lopes NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, Caparica, Portugal; Centro de Matemática e Aplicações (CMA), FCT, UNL, Caparica, Portugal
Victor Moreno Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), Barcelona, Spain; Colorectal Cancer Group, Institut de Recerca Biomedica de Bellvitge (IDIBELL), Barcelona, Spain; Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Barcelona, Spain; Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
Irina Naskinova South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
Elin Org Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
Inês Paciência EPIUnit – Instituto de Saúde Pública da Universidade do Porto, Porto, Portugal
Georgios Papoutsoglou Department of Computer Science, University of Crete, Heraklion, Greece
Rajesh Shigdel Department of Clinical Science, University of Bergen, Bergen, Norway
Blaz Stres Group for Microbiology and Microbial Biotechnology, Department of Animal Science, University of Ljubljana, Ljubljana, Slovenia
Baiba Vilne Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia
Malik Yousef Department of Information Systems, Zefat Academic College, Zefat, Israel; Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
Eftim Zdravevski Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
Ioannis Tsamardinos Department of Computer Science, University of Crete, Heraklion, Greece
Enrique Carrillo de Santa Pau Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
Marcus J. Claesson School of Microbiology & APC Microbiome Ireland, University College Cork, Cork, Ireland
Isabel Moreno-Indias Unidad de Gestión Clínica de Endocrinología y Nutrición, Instituto de Investigación Biomédica de Málaga (IBIMA), Hospital Clínico Universitario Virgen de la Victoria, Universidad de Málaga, Málaga, Spain; Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y la Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
Jaak Truu Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia

Ghannam RB, Techtmann SM. Machine learning applications in microbial ecology, human microbiome studies, and environmental monitoring. Comput Struct Biotechnol J 2021;19:1092-1107. [PMID: 33680353 PMCID: PMC7892807 DOI: 10.1016/j.csbj.2021.01.028] [Citation(s) in RCA: 89] [Impact Index Per Article: 29.7] [Received: 10/02/2020] [Revised: 01/16/2021] [Accepted: 01/18/2021] [Indexed: 01/04/2023]

Reiman D, Farhat AM, Dai Y. Predicting Host Phenotype Based on Gut Microbiome Using a Convolutional Neural Network Approach. Methods Mol Biol 2021;2190:249-266. [PMID: 32804370 DOI: 10.1007/978-1-0716-0826-5_12] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/11/2023]

Ghosh A, Firdous S, Saha S. Bioinformatics for Human Microbiome. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

Bokulich NA, Ziemski M, Robeson MS, Kaehler BD. Measuring the microbiome: Best practices for developing and benchmarking microbiomics methods. Comput Struct Biotechnol J 2020;18:4048-4062. [PMID: 33363701 PMCID: PMC7744638 DOI: 10.1016/j.csbj.2020.11.049] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 09/25/2020] [Revised: 11/27/2020] [Accepted: 11/28/2020] [Indexed: 12/12/2022]

Gut microbiota and artificial intelligence approaches: A scoping review. Health and Technology 2020. [DOI: 10.1007/s12553-020-00486-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/19/2022]

De Filippis F, Pasolli E, Ercolini D. The food-gut axis: lactic acid bacteria and their link to food, the gut microbiome and human health. FEMS Microbiol Rev 2020;44:454-489. [PMID: 32556166 PMCID: PMC7391071 DOI: 10.1093/femsre/fuaa015] [Citation(s) in RCA: 115] [Impact Index Per Article: 28.8] [Received: 01/23/2020] [Accepted: 05/20/2020] [Indexed: 12/18/2022]

Topçuoğlu BD, Lesniak NA, Ruffin MT, Wiens J, Schloss PD. A Framework for Effective Application of Machine Learning to Microbiome-Based Classification Problems. mBio 2020;11:e00434-20. [PMID: 32518182 PMCID: PMC7373189 DOI: 10.1128/mbio.00434-20] [Citation(s) in RCA: 75] [Impact Index Per Article: 18.8] [Received: 02/24/2020] [Accepted: 05/06/2020] [Indexed: 12/12/2022]

Abstract

Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, there appears to be a preference by many researchers to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739) but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability.

Importance

Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black box models without a discussion of the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step toward developing more-reproducible ML practices in applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.

Vangay P, Hillmann BM, Knights D. Microbiome Learning Repo (ML Repo): A public repository of microbiome regression and classification tasks. Gigascience 2019;8:giz042. [PMID: 31042284 PMCID: PMC6493971 DOI: 10.1093/gigascience/giz042] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 08/21/2018] [Revised: 02/24/2019] [Accepted: 03/26/2019] [Indexed: 01/05/2023]