1
Kim J, Jang H, Koh H. MiMultiCat: A Unified Cloud Platform for the Analysis of Microbiome Data with Multi-Categorical Responses. Bioengineering (Basel) 2024; 11:60. [PMID: 38247937 PMCID: PMC10813402 DOI: 10.3390/bioengineering11010060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Revised: 12/21/2023] [Accepted: 12/31/2023] [Indexed: 01/23/2024] Open
Abstract
The field of the human microbiome is growing rapidly owing to recent advances in high-throughput sequencing technologies. Meanwhile, many new analytic pipelines, methods and tools have been developed for microbiome data preprocessing and analytics. They usually focus on microbiome data with continuous (e.g., body mass index) or binary (e.g., diseased vs. healthy) responses, yet multi-categorical responses, which have more than two categories, are also common in practice. In this paper, we introduce a new unified cloud platform, named MiMultiCat, for the analysis of microbiome data with multi-categorical responses. MiMultiCat has two main distinguishing features. First, MiMultiCat streamlines a long sequence of microbiome data preprocessing and analytic procedures on user-friendly web interfaces; as such, it is easy to use for people in various disciplines (e.g., biology, medicine, public health). Second, MiMultiCat performs both association testing and prediction modeling. For association testing, MiMultiCat handles both ecological (e.g., alpha and beta diversity) and taxonomical (e.g., phylum, class, order, family, genus, species) contexts through covariate-adjusted or unadjusted analysis. For prediction modeling, MiMultiCat employs the random forest and gradient boosting algorithms, which are well suited to microbiome data and provide clear visual interpretations. We demonstrate its use through the reanalysis of gut microbiome data on obesity with body mass index categories. MiMultiCat is freely available on our web server.
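As a rough illustration of the ecological measures mentioned in the abstract, the sketch below computes one common alpha-diversity index (Shannon) and one common beta-diversity measure (Bray-Curtis dissimilarity) from raw taxon counts. The abstract does not say which indices MiMultiCat uses, so the choice of these two, the function names, and the toy counts are all assumptions made for illustration.

```python
import math


def shannon_index(counts):
    """Shannon alpha diversity: H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples measured on the same taxa."""
    if len(counts_a) != len(counts_b):
        raise ValueError("samples must cover the same taxa")
    denominator = sum(counts_a) + sum(counts_b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    numerator = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    return numerator / denominator


# Hypothetical toy samples: one perfectly even, one dominated by a single taxon.
even_sample = [10, 10, 10, 10]
dominated_sample = [40, 0, 0, 0]

if __name__ == "__main__":
    print("Shannon (even):", shannon_index(even_sample))
    print("Shannon (dominated):", shannon_index(dominated_sample))
    print("Bray-Curtis:", bray_curtis(even_sample, dominated_sample))
```

An even community maximizes the Shannon index for a given number of taxa, while a community with a single taxon scores zero; Bray-Curtis ranges from 0 (identical counts) to 1 (no shared taxa).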
Affiliation(s)
- Hyunwook Koh
  - Department of Applied Mathematics and Statistics, The State University of New York (SUNY), Incheon 21985, Republic of Korea
2
Kim J, Koh H. MiTree: A Unified Web Cloud Analytic Platform for User-Friendly and Interpretable Microbiome Data Mining Using Tree-Based Methods. Microorganisms 2023; 11:2816. [PMID: 38004827 PMCID: PMC10672986 DOI: 10.3390/microorganisms11112816] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/11/2023] [Revised: 11/05/2023] [Accepted: 11/17/2023] [Indexed: 11/26/2023] Open
Abstract
The advent of next-generation sequencing has greatly accelerated the field of human microbiome studies. Investigators are currently striving and competing to find new ways to diagnose, treat and prevent human diseases through the human microbiome. Machine learning is a promising approach for this effort, especially given the high complexity of microbiome data. However, many current machine learning algorithms are "black boxes", i.e., they are difficult to understand and interpret. In addition, clinicians, public health practitioners and biologists are not usually skilled at computer programming, and they do not always have high-end computing devices. Thus, in this study, we introduce a unified web cloud analytic platform, named MiTree, for user-friendly and interpretable microbiome data mining. MiTree employs tree-based learning methods, including decision tree, random forest and gradient boosting, that are well understood and suited to human microbiome studies. We also stress that MiTree can address both classification and regression problems through covariate-adjusted or unadjusted analysis. MiTree should serve as an easy-to-use and interpretable data mining tool for microbiome-based disease prediction modeling, and should provide new insights into microbiome-based diagnostics, treatment and prevention. MiTree is open-source software that is available on our web server.
3
Jang H, Park S, Koh H. Comprehensive microbiome causal mediation analysis using MiMed on user-friendly web interfaces. Biol Methods Protoc 2023; 8:bpad023. [PMID: 37840574 PMCID: PMC10576642 DOI: 10.1093/biomethods/bpad023] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/26/2023] [Revised: 09/19/2023] [Accepted: 10/02/2023] [Indexed: 10/17/2023] Open
Abstract
It is a central goal of human microbiome studies to understand the role of the microbiome as a mediator that transmits environmental, behavioral, or medical exposures to health or disease outcomes. Yet, mediation analysis is not used as much as it should be. One reason is the lack of carefully planned routines, compilers, and automated computing systems for microbiome mediation analysis (MiMed) that perform the required series of data processing, diversity calculation, data normalization, downstream data analysis, and visualizations. Many researchers in various disciplines (e.g. clinicians, public health practitioners, and biologists) are also unfamiliar with the related statistical methods and with programming languages on command-line interfaces. Thus, in this article, we introduce a web cloud computing platform, named MiMed, that enables comprehensive MiMed on user-friendly web interfaces. The main features of MiMed are as follows. First, MiMed can survey the microbiome in various spheres (i) as a whole microbial ecosystem using different ecological measures (e.g. alpha- and beta-diversity indices) or (ii) as individual microbial taxa (e.g. phyla, classes, orders, families, genera, and species) using different data normalization methods. Second, MiMed enables covariate-adjusted analysis to control for potential confounding factors (e.g. age and gender), which is essential for strengthening the causal interpretation of the results, especially for observational studies. Third, MiMed enables a breadth of statistical inferences in both mediation effect estimation and significance testing. Fourth, MiMed provides flexible and easy-to-use data processing and analytic modules and creates clear graphical representations. Finally, MiMed uses ChatGPT, an artificial intelligence technology, to search for what is known about the microbial taxa identified as significant mediators. For demonstration purposes, we applied MiMed to a study of the mediating roles of the oral microbiome in subgingival niches between e-cigarette smoking and gingival inflammation. MiMed is freely available on our web server (http://mimed.micloud.kr).
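To make the idea of a mediation effect concrete, here is a minimal sketch of the classic product-of-coefficients estimate for a single scalar mediator (for example, one alpha-diversity index). It is not MiMed's implementation, which the abstract does not describe in detail; it assumes linear models, no exposure-mediator interaction, and no unmeasured confounding, and every function name and data value is hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def slope_and_residuals(x, y):
    """Least-squares fit of y on x with an intercept; returns (slope, residuals)."""
    mean_x, mean_y = _mean(x), _mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return slope, residuals


def indirect_effect(exposure, mediator, outcome):
    """Product-of-coefficients estimate (a, b, a * b).

    a: effect of the exposure on the mediator.
    b: effect of the mediator on the outcome, adjusted for the exposure.
       By the Frisch-Waugh-Lovell theorem, b equals the slope obtained by
       regressing the exposure-residualized outcome on the
       exposure-residualized mediator.
    """
    a, mediator_resid = slope_and_residuals(exposure, mediator)
    _, outcome_resid = slope_and_residuals(exposure, outcome)
    b, _ = slope_and_residuals(mediator_resid, outcome_resid)
    return a, b, a * b


# Hypothetical toy data: outcome = 2 * mediator + 1 * exposure, with no noise.
exposure = [0, 0, 0, 1, 1, 1]
mediator = [1, 2, 3, 3, 4, 5]
outcome = [2, 4, 6, 7, 9, 11]

if __name__ == "__main__":
    a, b, ab = indirect_effect(exposure, mediator, outcome)
    print("a =", a, "b =", b, "indirect effect =", ab)
```

In practice, significance of the indirect effect is usually assessed by resampling (for example, a bootstrap of `a * b`) rather than from the point estimate alone.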
Affiliation(s)
- Hyojung Jang
  - Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon, South Korea
- Solha Park
  - Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon, South Korea
- Hyunwook Koh
  - Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon, South Korea
4
Papoutsoglou G, Tarazona S, Lopes MB, Klammsteiner T, Ibrahimi E, Eckenberger J, Novielli P, Tonda A, Simeon A, Shigdel R, Béreux S, Vitali G, Tangaro S, Lahti L, Temko A, Claesson MJ, Berland M. Machine learning approaches in microbiome research: challenges and best practices. Front Microbiol 2023; 14:1261889. [PMID: 37808286 PMCID: PMC10556866 DOI: 10.3389/fmicb.2023.1261889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 09/04/2023] [Indexed: 10/10/2023] Open
Abstract
Predictive analysis of microbiome data within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. We show that the use of compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm, in combination with random forest modeling, provided the most accurate performance estimates. Lastly, we showed how linear modeling by logistic regression coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
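The "compositional transformations" mentioned in the abstract are not listed there; one widely used example is the centered log-ratio (CLR) transform, sketched below for a single sample. The pseudocount used to handle zero counts and its default value of 0.5 are assumptions for illustration, not a recommendation drawn from the paper.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    Each value is log(x_i) minus the mean of log(x) over all taxa, which is
    the log of x_i divided by the sample's geometric mean. A pseudocount is
    added first because the logarithm is undefined for zero counts.
    """
    shifted = [c + pseudocount for c in counts]
    if any(value <= 0 for value in shifted):
        raise ValueError("all shifted counts must be positive")
    logs = [math.log(value) for value in shifted]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    # Hypothetical sample with one unobserved taxon.
    print(clr([120, 30, 0, 850]))
```

CLR values always sum to zero within a sample, which removes the dependence on sequencing depth but also means the transformed features are linearly dependent.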
Affiliation(s)
- Georgios Papoutsoglou
  - Department of Computer Science, University of Crete, Heraklion, Greece
  - JADBio Gnosis DA S.A., Science and Technology Park of Crete, Heraklion, Greece
- Sonia Tarazona
  - Department of Applied Statistics and Operations Research and Quality, Polytechnic University of Valencia, Valencia, Spain
- Marta B. Lopes
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - Research and Development Unit for Mechanical and Industrial Engineering (UNIDEMI), Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
- Thomas Klammsteiner
  - Department of Ecology, Universität Innsbruck, Innsbruck, Austria
  - Department of Microbiology, Universität Innsbruck, Innsbruck, Austria
- Eliana Ibrahimi
  - Department of Biology, University of Tirana, Tirana, Albania
- Julia Eckenberger
  - School of Microbiology, University College Cork, Cork, Ireland
  - APC Microbiome Ireland, Cork, Ireland
- Pierfrancesco Novielli
  - Department of Soil, Plant, and Food Sciences, University of Bari Aldo Moro, Bari, Italy
  - National Institute for Nuclear Physics, Bari Division, Bari, Italy
- Alberto Tonda
  - UMR 518 MIA-PS, INRAE, Paris-Saclay University, Palaiseau, France
  - Complex Systems Institute of Paris Ile-de-France (ISC-PIF) - UAR 3611 CNRS, Paris, France
- Andrea Simeon
  - BioSense Institute, University of Novi Sad, Novi Sad, Serbia
- Rajesh Shigdel
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Stéphane Béreux
  - MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
  - MaIAGE, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- Giacomo Vitali
  - MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- Sabina Tangaro
  - Department of Soil, Plant, and Food Sciences, University of Bari Aldo Moro, Bari, Italy
  - National Institute for Nuclear Physics, Bari Division, Bari, Italy
- Leo Lahti
  - Department of Computing, University of Turku, Turku, Finland
- Andriy Temko
  - Department of Electrical and Electronic Engineering, University College Cork, Cork, Ireland
- Marcus J. Claesson
  - School of Microbiology, University College Cork, Cork, Ireland
  - APC Microbiome Ireland, Cork, Ireland
- Magali Berland
  - MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
5
Gu W, Koh H, Jang H, Lee B, Kang B. MiSurv: an Integrative Web Cloud Platform for User-Friendly Microbiome Data Analysis with Survival Responses. Microbiol Spectr 2023; 11:e0505922. [PMID: 37039671 PMCID: PMC10269532 DOI: 10.1128/spectrum.05059-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Accepted: 03/12/2023] [Indexed: 04/12/2023] Open
Abstract
Investigators have studied treatment effects on human health or disease, treatment effects on the human microbiome, and the roles of the microbiome in human health or disease. In clinical trials in particular, investigators commonly track disease status over a lengthy period to survey disease progression across treatment groups (e.g., treatment versus placebo, new treatment versus old treatment). Hence, disease responses are often available in the form of survival (i.e., time-to-event) responses stratified by treatment groups. While recent web cloud platforms have enabled user-friendly microbiome data processing and analytics, there is currently no web cloud platform to analyze microbiome data with survival responses. Therefore, we introduce here an integrative web cloud platform, called MiSurv, for comprehensive microbiome data analysis with survival responses.
IMPORTANCE: MiSurv consists of a data processing module followed by four data analytic modules: (i) Module 1: Comparative survival analysis between treatment groups, (ii) Module 2: Comparative analysis in microbial composition between treatment groups, (iii) Module 3: Association testing between microbial composition and survival responses, (iv) Module 4: Prediction modeling using microbial taxa on survival responses. We demonstrate its use through an example trial on the effects of antibiotic use on the survival rate against type 1 diabetes (T1D) onset and on gut microbiome composition, and the effects of the gut microbiome on the survival rate against T1D onset. MiSurv is freely available on our web server (http://misurv.micloud.kr) or can alternatively run on the user's local computer (https://github.com/wg99526/MiSurvGit).
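For readers unfamiliar with survival (time-to-event) responses, the sketch below implements the Kaplan-Meier estimator, a standard way to estimate a survival curve for one treatment group when some subjects are censored. The abstract does not state which estimators MiSurv uses in Module 1, so this choice, the function name, and the toy data are assumptions for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each subject.
    events: 1 if the event (e.g. disease onset) was observed at that time,
            0 if the subject was censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        observed = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if observed > 0:
            # Subjects censored at time t still count as at risk at t.
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times (arbitrary units) with two censored subjects.
    follow_up = [1, 2, 2, 3, 4]
    observed_event = [1, 1, 0, 1, 0]
    for time_point, probability in kaplan_meier(follow_up, observed_event):
        print(time_point, round(probability, 3))
```

Comparing two treatment groups then amounts to estimating one curve per group and testing whether they differ, for example with a log-rank test.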
Affiliation(s)
- Won Gu
  - Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon, South Korea
- Hyunwook Koh
  - Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon, South Korea
- Hyojung Jang
  - Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon, South Korea
- Byungho Lee
  - Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon, South Korea
- Byungkon Kang
  - Department of Computer Science, The State University of New York, Korea, Incheon, South Korea
6
Dietrich A, Matchado MS, Zwiebel M, Ölke B, Lauber M, Lagkouvardos I, Baumbach J, Haller D, Brandl B, Skurk T, Hauner H, Reitmeier S, List M. Namco: a microbiome explorer. Microb Genom 2022; 8. [PMID: 35917163 PMCID: PMC9484756 DOI: 10.1099/mgen.0.000852] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Abstract
16S rRNA gene profiling is currently the most widely used technique in microbiome research and allows the study of microbial diversity, taxonomic profiling, phylogenetics, functional and network analysis. While a plethora of tools have been developed for the analysis of 16S rRNA gene data, only a few platforms offer a user-friendly interface and none comprehensively covers the whole analysis pipeline from raw data processing down to complex analysis. We introduce Namco, an R shiny application that offers a streamlined interface and serves as a one-stop solution for microbiome analysis. We demonstrate Namco's capabilities by studying the association between a fibre-rich diet and the gut microbiota composition. Namco helped to support the hypothesis that butyrate-producing bacteria are promoted by a fibre-enriched intervention. Namco provides a broad range of features from raw data processing and basic statistics down to machine learning and network analysis, thus covering complex data analysis tasks that are not comprehensively covered elsewhere. Namco is freely available at https://exbio.wzw.tum.de/namco/.
Affiliation(s)
- Alexander Dietrich
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Monica Steffi Matchado
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Maximilian Zwiebel
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Benjamin Ölke
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Michael Lauber
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Ilias Lagkouvardos
  - ZIEL - Institute for Food & Health, Technical University of Munich, 85354 Freising, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
  - Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Dirk Haller
  - ZIEL - Institute for Food & Health, Technical University of Munich, 85354 Freising, Germany
  - Chair of Nutrition and Immunology, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Beate Brandl
  - ZIEL - Institute for Food & Health, Technical University of Munich, 85354 Freising, Germany
- Thomas Skurk
  - ZIEL - Institute for Food & Health, Technical University of Munich, 85354 Freising, Germany
- Hans Hauner
  - ZIEL - Institute for Food & Health, Technical University of Munich, 85354 Freising, Germany
  - Institute of Nutritional Medicine, TUM School of Medicine, Technical University of Munich, Munich, Germany
- Sandra Reitmeier
  - ZIEL - Institute for Food & Health, Technical University of Munich, 85354 Freising, Germany
  - Chair of Nutrition and Immunology, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Markus List
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
7
Abstract
Unraveling the association between the microbiome and plant phenotype can illuminate the effect of the microbiome on its host and thereby guide agricultural management. Adequate identification of species and an appropriate choice of models are two challenges in microbiome data analysis. Computational models of microbiome data can help in analyzing associations between the microbiome and the plant host. Deep learning methods have been widely used to model microbiome data because of their strength in handling complex, sparse, noisy, and high-dimensional data. Here, we review analytic strategies in microbiome data analysis and describe applications of deep learning models to plant–microbiome correlation studies. We also introduce application cases of different models in plant–microbiome correlation analysis and discuss how to adapt the models to the critical steps in data processing. In terms of data processing, model structure, and operating principle, most deep learning models are suitable for plant microbiome data analysis. Their capacity for feature representation and pattern recognition is the main advantage of deep learning methods in modeling and interpretation for association analysis. Based on published computational experiments, convolutional neural networks and graph neural networks can be recommended for plant microbiome analysis.
Affiliation(s)
- Zhiyu Deng
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jinming Zhang
  - Department of Infectious Diseases, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Junya Li
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China
  - University of Chinese Academy of Sciences, Beijing, China
- Xiujun Zhang
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China