1
Liu Z, Sun Y, Li Y, Ma A, Willaims NF, Jahanbahkshi S, Hoyd R, Wang X, Zhang S, Zhu J, Xu D, Spakowicz D, Ma Q, Liu B. An Explainable Graph Neural Framework to Identify Cancer-Associated Intratumoral Microbial Communities. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024:e2403393. [PMID: 39225619 DOI: 10.1002/advs.202403393] [Received: 04/01/2024] [Revised: 06/26/2024] [Indexed: 09/04/2024]
Abstract
Microbes are extensively present among various cancer tissues and play critical roles in carcinogenesis and treatment responses. However, the underlying relationships between intratumoral microbes and tumors remain poorly understood. Here, a MIcrobial Cancer-association Analysis using a Heterogeneous graph transformer (MICAH) to identify intratumoral cancer-associated microbial communities is presented. MICAH integrates metabolic and phylogenetic relationships among microbes into a heterogeneous graph representation. It uses a graph transformer to holistically capture relationships between intratumoral microbes and cancer tissues, which improves the explainability of the associations between identified microbial communities and cancers. MICAH is applied to intratumoral bacterial data across 5 cancer types and 5 fungi datasets, and its generalizability and reproducibility are demonstrated. After experimentally testing a representative observation using a mouse model of tumor-microbe-immune interactions, a result consistent with MICAH's identified relationship is observed. Source tracking analysis reveals that the primary known contributor to a cancer-associated microbial community is the organs affected by the type of cancer. Overall, this graph neural network framework refines the number of microbes that can be used for follow-up experimental validation from thousands to tens, thereby helping to accelerate the understanding of the relationship between tumors and intratumoral microbiomes.
Affiliation(s)
- Zhaoqian Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- College of Sciences, Xi'an University of Science and Technology, Xi'an, Shanxi, 710054, China
- Yuhan Sun
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Yingjie Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Anjun Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The Ohio State University, Columbus, OH, 43210, USA
- Nyelia F Willaims
- Department of Internal Medicine, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Shiva Jahanbahkshi
- Department of Food Science and Technology, College of Food, Agricultural, and Environmental Sciences, The Ohio State University, Columbus, OH, 43210, USA
- Rebecca Hoyd
- Department of Internal Medicine, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Xiaoying Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The Ohio State University, Columbus, OH, 43210, USA
- Shiqi Zhang
- Department of Human Sciences, College of Education and Human Ecology, The Ohio State University, Columbus, OH, 43210, USA
- Jiangjiang Zhu
- Department of Human Sciences, College of Education and Human Ecology, The Ohio State University, Columbus, OH, 43210, USA
- Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65201, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65201, USA
- Daniel Spakowicz
- Pelotonia Institute for Immuno-Oncology, The Ohio State University, Columbus, OH, 43210, USA
- Department of Internal Medicine, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Qin Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The Ohio State University, Columbus, OH, 43210, USA
- Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Shandong National Center for Applied Mathematics, Jinan, Shandong, 250199, China
2
Roy G, Prifti E, Belda E, Zucker JD. Deep learning methods in metagenomics: a review. Microb Genom 2024; 10. [PMID: 38630611 DOI: 10.1099/mgen.0.001231] [Indexed: 04/19/2024]
Abstract
The ever-decreasing cost of sequencing and the growing potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome plays a crucial role in human health, providing vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging due to several factors, including reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is also their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome's key role in our health.
Affiliation(s)
- Gaspar Roy
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Edi Prifti
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l'hopital, 75013 Paris, France
- Eugeni Belda
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l'hopital, 75013 Paris, France
- Jean-Daniel Zucker
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l'hopital, 75013 Paris, France
3
Sharma D, Lou W, Xu W. phylaGAN: data augmentation through conditional GANs and autoencoders for improving disease prediction accuracy using microbiome data. Bioinformatics 2024; 40:btae161. [PMID: 38569898 PMCID: PMC11256914 DOI: 10.1093/bioinformatics/btae161] [Received: 11/24/2023] [Revised: 02/18/2024] [Accepted: 04/01/2024] [Indexed: 04/05/2024]
Abstract
MOTIVATION Research is improving our understanding of how the microbiome interacts with the human body and its impact on human health. Existing machine learning methods have shown great potential in discriminating healthy from diseased microbiome states. However, machine learning-based prediction using microbiome data faces challenges such as small sample sizes, imbalance between cases and controls, and the high cost of collecting large numbers of samples. To address these challenges, we propose a deep learning framework, phylaGAN, to augment existing datasets with generated microbiome data using a combination of a conditional generative adversarial network (C-GAN) and an autoencoder. Conditional generative adversarial networks train two models against each other to compute larger simulated datasets that are representative of the original dataset. The autoencoder maps the original and the generated samples onto a common subspace to make the prediction more accurate. RESULTS Extensive evaluation and predictive analysis were conducted on two datasets, a T2D study and a cirrhosis study, showing an improvement in mean AUC with data augmentation of 11% and 5%, respectively. External validation on a cohort classifying between obese and lean subjects, with a smaller sample size, provided an improvement in mean AUC close to 32% when augmented through phylaGAN as compared to using the original cohort. Our findings not only indicate that generative adversarial networks can create samples that mimic the original data across various diversity metrics, but also highlight the potential of enhancing disease prediction through machine learning models trained on synthetic data. AVAILABILITY AND IMPLEMENTATION https://github.com/divya031090/phylaGAN.
Affiliation(s)
- Divya Sharma
- Biostatistics Department, Princess Margaret Cancer Center, University Health Network, Toronto, ON, M5G2C4, Canada
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, M5T3M7, Canada
- Wendy Lou
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, M5T3M7, Canada
- Wei Xu
- Biostatistics Department, Princess Margaret Cancer Center, University Health Network, Toronto, ON, M5G2C4, Canada
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, M5T3M7, Canada
4
Shtossel O, Koren O, Shai I, Rinott E, Louzoun Y. Gut microbiome-metabolome interactions predict host condition. MICROBIOME 2024; 12:24. [PMID: 38336867 PMCID: PMC10858481 DOI: 10.1186/s40168-023-01737-1] [Received: 01/04/2023] [Accepted: 12/10/2023] [Indexed: 02/12/2024]
Abstract
BACKGROUND The effect of microbes on their human host is often mediated through changes in metabolite concentrations. As such, multiple tools have been proposed to predict metabolite concentrations from microbial taxa frequencies. Such tools typically fail to capture the dependence of the microbiome-metabolite relation on the environment. RESULTS We propose to treat the microbiome-metabolome relation as the equilibrium of a complex interaction and to relate the host condition to a latent representation of the interaction between the log concentration of the metabolome and the log frequencies of the microbiome. We develop LOCATE (Latent variables Of miCrobiome And meTabolites rElations), a machine learning tool to predict the metabolite concentration from the microbiome composition and produce a latent representation of the interaction. This representation is then used to predict the host condition. LOCATE's accuracy in predicting the metabolome is higher than that of all current predictors. The metabolite concentration prediction accuracy significantly decreases across datasets and across conditions, especially in 16S data. LOCATE's latent representation predicts the host condition better than either the microbiome or the metabolome. This representation is strongly correlated with host demographics. A significant improvement in accuracy (0.793 vs. 0.724 average accuracy) is obtained even with a small number of metabolite samples ([Formula: see text]). CONCLUSION These results suggest that a latent representation of the microbiome-metabolome interaction leads to a better association with the host condition than either of the two separately or a simple combination of the two.
Affiliation(s)
- Oshrit Shtossel
- Department of Mathematics, Bar-Ilan University, Ramat Gan, 52900, Israel
- Omry Koren
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Iris Shai
- Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Ehud Rinott
- Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Yoram Louzoun
- Department of Mathematics, Bar-Ilan University, Ramat Gan, 52900, Israel.
5
Asher EE, Bashan A. Model-free prediction of microbiome compositions. MICROBIOME 2024; 12:17. [PMID: 38303006 PMCID: PMC10832217 DOI: 10.1186/s40168-023-01721-9] [Received: 01/25/2022] [Accepted: 11/15/2023] [Indexed: 02/03/2024]
Abstract
BACKGROUND The recent recognition of the importance of the microbiome to the host's health and well-being has yielded efforts to develop therapies that aim to shift the microbiome from a disease-associated state to a healthier one. Direct manipulation techniques of the species' assemblage are currently available, e.g., using probiotics or narrow-spectrum antibiotics to introduce or eliminate specific taxa. However, predicting the species' abundances at the new state remains a challenge, mainly due to the difficulties of deciphering the delicate underlying network of ecological interactions or constructing a predictive model for such complex ecosystems. RESULTS Here, we propose a model-free method to predict the species' abundances at the new steady state based on their presence/absence configuration by utilizing a multi-dimensional k-nearest-neighbors (kNN) regression algorithm. By analyzing data from numeric simulations of ecological dynamics, we show that our predictions, which consider the presence/absence of all species holistically, outperform both the null model that uses the statistics of each species independently and a predictive neural network model. We analyze real metagenomic data of human-associated microbial communities and find that by relying on a small number of "neighboring" samples, i.e., samples with similar species assemblage, the kNN predicts the species abundance better than the whole-cohort average. By studying both real metagenomic and simulated data, we show that the predictability of our method is tightly related to the dissimilarity-overlap relationship of the training data. CONCLUSIONS Our results demonstrate how model-free methods can prove useful in predicting microbial communities and may facilitate the development of microbial-based therapies.
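The kNN idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Hamming distance on presence/absence patterns, the plain averaging of neighbour abundances, and the names `presence`, `hamming` and `knn_predict` are all assumptions made for illustration.

```python
from typing import List, Sequence


def presence(abundances: Sequence[float]) -> List[int]:
    """Convert an abundance profile into a presence/absence pattern."""
    return [1 if a > 0 else 0 for a in abundances]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count positions where two presence/absence patterns differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train: List[Sequence[float]],
                query_presence: Sequence[int],
                k: int = 2) -> List[float]:
    """Predict relative abundances for a presence/absence pattern.

    The k training samples with the most similar species assemblage are
    averaged, absent species are set to zero, and the result is renormalized.
    """
    if not train or k < 1:
        raise ValueError("need at least one training sample and k >= 1")
    # sorted() is stable, so ties keep the training order.
    ranked = sorted(train, key=lambda s: hamming(presence(s), query_presence))
    neighbours = ranked[:k]
    n_species = len(query_presence)
    mean = [sum(s[i] for s in neighbours) / len(neighbours) for i in range(n_species)]
    masked = [m * p for m, p in zip(mean, query_presence)]
    total = sum(masked)
    return [m / total for m in masked] if total > 0 else masked
```

For example, with three training profiles and a query in which only the first two species are present, `knn_predict(train, [1, 1, 0], k=2)` averages the two training samples that share that assemblage.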
Affiliation(s)
- Eitan E Asher
- Physics Department, Bar-Ilan University, Ramat-Gan, Israel
- Amir Bashan
- Physics Department, Bar-Ilan University, Ramat-Gan, Israel.
6
Xu H, Wang T, Miao Y, Qian M, Yang Y, Wang S. MK-BMC: a Multi-Kernel framework with Boosted distance metrics for Microbiome data for Classification. Bioinformatics 2024; 40:btad757. [PMID: 38200571 PMCID: PMC10789312 DOI: 10.1093/bioinformatics/btad757] [Received: 06/09/2023] [Revised: 10/30/2023] [Accepted: 01/09/2024] [Indexed: 01/12/2024]
Abstract
MOTIVATION Research on the human microbiome has suggested associations with human health, opening opportunities to predict health outcomes using the microbiome. Studies have also suggested that diverse forms of taxa, such as rare taxa that are evolutionarily related and abundant taxa that are evolutionarily unrelated, could be associated with or predictive of a health outcome. Although prediction models were developed for microbiome data, no prediction models currently exist that use multiple forms of microbiome-outcome associations. RESULTS We developed MK-BMC, a Multi-Kernel framework with Boosted distance Metrics for Classification using microbiome data. We propose to first boost widely used distance metrics for microbiome data using taxon-level association signal strengths to up-weight taxa that are potentially associated with an outcome of interest. We then propose a multi-kernel prediction model with one kernel capturing one form of association between taxa and the outcome, where a kernel measures similarities of microbiome compositions between pairs of samples and is transformed from a proposed boosted distance metric. We demonstrated superior prediction performance of (i) boosted distance metrics for microbiome data over original ones and (ii) MK-BMC over competing methods through extensive simulations. We applied MK-BMC to predict thyroid, obesity, and inflammatory bowel disease status using gut microbiome data from the American Gut Project and observed much-improved prediction performance over that of competing methods. The learned kernel weights help us understand the contributions of individual microbiome signal forms. AVAILABILITY AND IMPLEMENTATION Source code together with a sample input dataset is available at https://github.com/HXu06/MK-BMC.
Affiliation(s)
- Huang Xu
- Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Tian Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, United States
- Yuqi Miao
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, United States
- Min Qian
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, United States
- Yaning Yang
- Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, United States
7
Das A, Behera RN, Kapoor A, Ambatipudi K. The Potential of Meta-Proteomics and Artificial Intelligence to Establish the Next Generation of Probiotics for Personalized Healthcare. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2023; 71:17528-17542. [PMID: 37955263 DOI: 10.1021/acs.jafc.3c03834] [Indexed: 11/14/2023]
Abstract
The symbiosis of probiotic bacteria with humans has rendered various health benefits while providing nutrition and a suitable environment for their survival. However, the probiotics must survive unfavorable gut conditions to exert beneficial effects. The intrinsic resistance of probiotics to survive harsh conditions results from a myriad of proteins. Interaction of microbial proteins with the host is indispensable for modulating the gut microbiome, such as interaction with cell receptors and protective action against pathogens. The complex interplay of proteins should be unraveled by utilizing metaproteomic strategies. The contribution of probiotics to health is now widely accepted. However, due to the inconsistency of generalized probiotics, contemporary research toward precision probiotics has gained momentum for customized treatment. This review explores the application of metaproteomics and AI/ML algorithms in resolving multiomics data analysis and in silico prediction of microbial features for screening specific beneficial probiotic organisms. Implementing these integrative strategies could augment the potential of precision probiotics for personalized healthcare.
Affiliation(s)
- Arpita Das
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
- Rama N Behera
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
- Ayushi Kapoor
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
- Kiran Ambatipudi
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
8
Sharma D, Xu W. ReGeNNe: genetic pathway-based deep neural network using canonical correlation regularizer for disease prediction. Bioinformatics 2023; 39:btad679. [PMID: 37963055 PMCID: PMC10666205 DOI: 10.1093/bioinformatics/btad679] [Received: 07/14/2023] [Revised: 10/06/2023] [Accepted: 11/13/2023] [Indexed: 11/16/2023]
Abstract
MOTIVATION Common human diseases result from the interplay of genes and their biologically associated pathways. Genetic pathway analyses provide more biological insight than conventional gene-based analysis. In this article, we propose a framework combining genetic data into pathway structure and using an ensemble of convolutional neural networks (CNNs) along with a Canonical Correlation Regularizer layer for comprehensive prediction of disease risk. The novelty of our approach lies in our two-step framework: (i) utilizing the CNN's effectiveness to extract the complex gene associations within individual genetic pathways and (ii) fusing features from the ensemble of CNNs through a Canonical Correlation Regularization layer to incorporate the interactions between pathways which share common genes. During prediction, we also address the important issues of interpretability of neural network models and of identifying the pathways and genes playing an important role in prediction. RESULTS Application of our methodology to three real cancer genetic datasets for different prediction tasks validates our model's generalizability and robustness. Our methodology provides consistently better performance, with AUC improvements of 11% on predicting early/late-stage kidney cancer, 10% on predicting kidney versus liver cancer type and 7% on predicting survival status in ovarian cancer as compared to the next best conventional machine learning model. The robust performance of our deep learning algorithm indicates that disease prediction using neural networks in multiple functionally related genes across different pathways improves genetic data-based prediction and understanding of the molecular mechanisms of diseases. AVAILABILITY AND IMPLEMENTATION https://github.com/divya031090/ReGeNNe.
Affiliation(s)
- Divya Sharma
- Biostatistics Department, Princess Margaret Cancer Center, University Health Network, Toronto, ON M5G2C4, Canada
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada
- Wei Xu
- Biostatistics Department, Princess Margaret Cancer Center, University Health Network, Toronto, ON M5G2C4, Canada
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada
9
Li B, Wang T, Qian M, Wang S. MKMR: a multi-kernel machine regression model to predict health outcomes using human microbiome data. Brief Bioinform 2023; 24:7142722. [PMID: 37099694 DOI: 10.1093/bib/bbad158] [Received: 10/28/2022] [Revised: 03/24/2023] [Accepted: 04/03/2023] [Indexed: 04/28/2023]
Abstract
Studies have found that the human microbiome is associated with and predictive of human health and diseases. Many statistical methods developed for microbiome data focus on different distance metrics that can capture various information in microbiomes. Prediction models were also developed for microbiome data, including deep learning methods with convolutional neural networks that consider both taxa abundance profiles and taxonomic relationships among microbial taxa from a phylogenetic tree. Studies have also suggested that a health outcome could be associated with multiple forms of microbiome profiles. In addition to the abundance of some taxa that are associated with a health outcome, the presence/absence of some taxa is also associated with and predictive of the same health outcome. Moreover, associated taxa may be close to each other on a phylogenetic tree or spread apart on a phylogenetic tree. No prediction models currently exist that use multiple forms of microbiome-outcome associations. To address this, we propose a multi-kernel machine regression (MKMR) method that is able to capture various types of microbiome signals when making predictions. MKMR utilizes multiple forms of microbiome signals through multiple kernels transformed from multiple distance metrics for microbiomes and learns an optimal conic combination of these kernels, with kernel weights helping us understand the contributions of individual microbiome signal types. Simulation studies suggest a much-improved prediction performance over competing methods with a mixture of microbiome signals. Real data applications to predict multiple health outcomes using throat and gut microbiome data also suggest better prediction by MKMR than by competing methods.
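To make the phrase "conic combination of kernels" concrete, here is a minimal sketch under stated assumptions. It does not reproduce MKMR: the Gaussian-type distance-to-kernel transformation, the `bandwidth` parameter and the function names are hypothetical, and the weights are supplied by hand rather than learned.

```python
import math
from typing import List

Matrix = List[List[float]]


def distance_to_kernel(dist: Matrix, bandwidth: float = 1.0) -> Matrix:
    """Turn a pairwise distance matrix into a similarity kernel.

    Assumption: a Gaussian-type transformation exp(-d^2 / (2 * bandwidth^2)).
    """
    return [[math.exp(-(d * d) / (2.0 * bandwidth ** 2)) for d in row] for row in dist]


def conic_combination(kernels: List[Matrix], weights: List[float]) -> Matrix:
    """Return sum_m w_m * K_m with all w_m >= 0 (a conic combination)."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    if any(w < 0 for w in weights):
        raise ValueError("conic combination requires non-negative weights")
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]
```

In this sketch a larger weight means the corresponding distance metric contributes more to the combined similarity, which is the sense in which kernel weights can be read as contributions of individual signal types.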
Affiliation(s)
- Bing Li
- Department of Biostatistics, School of Public Health, Brown University, Providence, Rhode Island, U.S.A
- Tian Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 West 168th Street, New York, New York, 10032 U.S.A
- Min Qian
- Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 West 168th Street, New York, New York, 10032 U.S.A
- Shuang Wang
- Department of Biostatistics, School of Public Health, Brown University, Providence, Rhode Island, U.S.A
10
Syama K, Jothi JAA, Khanna N. Automatic disease prediction from human gut metagenomic data using boosting GraphSAGE. BMC Bioinformatics 2023; 24:126. [PMID: 37003965 PMCID: PMC10067187 DOI: 10.1186/s12859-023-05251-x] [Received: 01/27/2023] [Accepted: 03/23/2023] [Indexed: 04/03/2023]
Abstract
BACKGROUND The human microbiome plays a critical role in maintaining human health. Due to recent advances in high-throughput sequencing technologies, the microbiome profiles present in the human body have become publicly available. Hence, many studies have analyzed human microbiome profiles. These studies have found that, for different diseases, healthy and sick individuals carry different microbiome profiles. Recently, several computational methods have utilized the microbiome profiles to automatically diagnose and classify the host phenotype. RESULTS In this work, a novel deep learning framework based on boosting GraphSAGE is proposed for automatic prediction of diseases from metagenomic data. The proposed framework has two main components: (a) a Metagenomic Disease graph (MD-graph) construction module and (b) a Disease prediction Network (DP-Net) module. The graph construction module constructs a graph by considering each metagenomic sample as a node in the graph. The graph captures the relationship between the samples using a proximity measure. The DP-Net consists of a boosting GraphSAGE model which predicts the status of a sample as sick or healthy. The effectiveness of the proposed method is verified using real and synthetic datasets corresponding to diseases like inflammatory bowel disease and colorectal cancer. The proposed model achieved a highest AUC of 93%, accuracy of 95%, F1-score of 95% and AUPRC of 95% for the real inflammatory bowel disease dataset and a best AUC of 90%, accuracy of 91%, F1-score of 87% and AUPRC of 93% for the real colorectal cancer dataset. CONCLUSION The proposed framework outperforms other machine learning and deep learning models in terms of classification accuracy, AUC, F1-score and AUPRC for both synthetic and real metagenomic data.
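The graph construction step described in this abstract (each sample a node, edges from a proximity measure) can be sketched as follows. This is an illustration only: the abstract does not name the proximity measure, so Bray-Curtis dissimilarity, the k-nearest-neighbour linking rule and the function names are assumptions, and the GraphSAGE model itself is not shown.

```python
from typing import List, Sequence, Set, Tuple


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles (assumed measure)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den > 0 else 0.0


def build_sample_graph(samples: List[Sequence[float]], k: int = 1) -> Set[Tuple[int, int]]:
    """Link every sample (node) to its k most similar samples.

    Returns undirected edges as (smaller_index, larger_index) pairs.
    """
    edges: Set[Tuple[int, int]] = set()
    for i, profile in enumerate(samples):
        others = sorted((j for j in range(len(samples)) if j != i),
                        key=lambda j: bray_curtis(profile, samples[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges
```

With four toy samples forming two similar pairs, `build_sample_graph(samples, k=1)` links each sample to its partner; a node classifier would then operate on this graph.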
Affiliation(s)
- K Syama
- Department of Computer Science, Birla Institute of Technology and Science Pilani Dubai Campus, Dubai International Academic City, Dubai, UAE
- J Angel Arul Jothi
- Department of Computer Science, Birla Institute of Technology and Science Pilani Dubai Campus, Dubai International Academic City, Dubai, UAE
11
Peng S, Luo M, Long D, Liu Z, Tan Q, Huang P, Shen J, Pu S. Full-length 16S rRNA gene sequencing and machine learning reveal the bacterial composition of inhalable particles from two different breeding stages in a piggery. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2023; 253:114712. [PMID: 36863163 DOI: 10.1016/j.ecoenv.2023.114712] [Received: 10/25/2022] [Revised: 02/15/2023] [Accepted: 02/27/2023] [Indexed: 06/18/2023]
Abstract
Bacterial loading aggravates the harm of particulate matter (PM) to public health and ecological systems, especially in operations of concentrated animal production. This study aimed to explore the characteristics and influencing factors of bacterial components of inhalable particles at a piggery. The morphology and elemental composition of coarse particles (PM10, aerodynamic diameter ≤ 10 µm) and fine particles (PM2.5, aerodynamic diameter ≤ 2.5 µm) were analyzed. Full-length 16S rRNA sequencing technology was used to identify bacterial components according to breeding stage, particle size, and diurnal rhythm. Machine learning (ML) algorithms were used to further explore the relationship between bacteria and the environment. The results showed that the morphology of particles in the piggery differed, and the morphologies of the suspected bacterial components were elliptical deposited particles. Full-length 16S rRNA sequencing indicated that most of the airborne bacteria in the fattening and gestation houses were bacilli. The analysis of beta diversity and difference between samples showed that the relative abundance of some bacteria in PM2.5 was significantly higher than that in PM10 in the same pig house (P < 0.01). There were significant differences in the bacterial composition of inhalable particles between the fattening and gestation houses (P < 0.01). The aggregated boosted tree (ABT) model showed that PM2.5 had a great influence on airborne bacteria among air pollutants. Fast expectation-maximization microbial source tracking (FEAST) showed that feces was a major potential source of airborne bacteria in pig houses (contribution 52.64-80.58%). These results will provide a scientific basis for exploring the potential risks of airborne bacteria in a piggery to human and animal health.
Affiliation(s)
- Siyi Peng
- Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; College of Animal Science and Technology, Southwest University, Chongqing 402460, China
- Min Luo
- Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China
- Dingbiao Long
- Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; Scientific Observation and Experiment Station of Livestock Equipment Engineering in Southwest, Ministry of Agriculture and Rural Affairs, Chongqing 402460, China; Innovation and Entrepreneurship Team for Livestock Environment Control and Equipment R&D, Chongqing 402460, China; National Center of Technology Innovation for pigs, Chongqing 402460, China
- Zuohua Liu
- Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; National Center of Technology Innovation for pigs, Chongqing 402460, China; College of Animal Science and Technology, Southwest University, Chongqing 402460, China
| | - Qiong Tan
- Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; National Center of Technology Innovation for pigs, Chongqing 402460, China
| | - Ping Huang
- Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; National Center of Technology Innovation for pigs, Chongqing 402460, China
| | - Jie Shen
- Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; National Center of Technology Innovation for pigs, Chongqing 402460, China
| | - Shihua Pu
- Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; Scientific Observation and Experiment Station of Livestock Equipment Engineering in Southwest, Ministry of Agriculture and Rural Affairs, Chongqing 402460, China; Innovation and Entrepreneurship Team for Livestock Environment Control and Equipment R&D, Chongqing 402460, China; National Center of Technology Innovation for pigs, Chongqing 402460, China.
| |
|
12
|
Shen Y, Zhu J, Deng Z, Lu W, Wang H. EnsDeepDP: An Ensemble Deep Learning Approach for Disease Prediction Through Metagenomics. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:986-998. [PMID: 36001521 DOI: 10.1109/tcbb.2022.3201295] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
A growing number of studies show that the human microbiome plays a vital role in human health and can be a crucial factor in predicting certain human diseases. However, microbiome data are often characterized by the limited samples and high-dimensional features, which pose a great challenge for machine learning methods. Therefore, this paper proposes a novel ensemble deep learning disease prediction method that combines unsupervised and supervised learning paradigms. First, unsupervised deep learning methods are used to learn the potential representation of the sample. Afterwards, the disease scoring strategy is developed based on the deep representations as the informative features for ensemble analysis. To ensure the optimal ensemble, a score selection mechanism is constructed, and performance boosting features are engaged with the original sample. Finally, the composite features are trained with gradient boosting classifier for health status decision. For case study, the ensemble deep learning flowchart has been demonstrated on six public datasets extracted from the human microbiome profiling. The results show that compared with the existing algorithms, our framework achieves better performance on disease prediction.
|
13
|
Shtossel O, Isakov H, Turjeman S, Koren O, Louzoun Y. Ordering taxa in image convolution networks improves microbiome-based machine learning accuracy. Gut Microbes 2023; 15:2224474. [PMID: 37345233 PMCID: PMC10288916 DOI: 10.1080/19490976.2023.2224474] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/15/2022] [Accepted: 06/08/2023] [Indexed: 06/23/2023]
Abstract
The human gut microbiome is associated with a large number of disease etiologies. As such, it is a natural candidate for machine-learning-based biomarker development for multiple diseases and conditions. The microbiome is often analyzed using 16S rRNA gene sequencing or shotgun metagenomics. However, several properties of microbial sequence-based studies hinder machine learning (ML), including non-uniform representation, a small number of samples compared with the dimension of each sample, and sparsity of the data, with the majority of taxa present in a small subset of samples. We show here using a graph representation that the cladogram structure is as informative as the taxa frequency. We then suggest a novel method to combine information from different taxa and improve data representation for ML using microbial taxonomy. iMic (image microbiome) translates the microbiome to images through an iterative ordering scheme, and applies convolutional neural networks to the resulting image. We show that iMic has a higher precision in static microbiome gene sequence-based ML than state-of-the-art methods. iMic also facilitates the interpretation of the classifiers through an explainable artificial intelligence (AI) algorithm to iMic to detect taxa relevant to each condition. iMic is then extended to dynamic microbiome samples by translating them to movies.
Affiliation(s)
- Oshrit Shtossel
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
| | - Haim Isakov
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
| | - Sondra Turjeman
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
| | - Omry Koren
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
| | - Yoram Louzoun
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
| |
|
14
|
Imangaliyev S, Schlötterer J, Meyer F, Seifert C. Diagnosis of Inflammatory Bowel Disease and Colorectal Cancer through Multi-View Stacked Generalization Applied on Gut Microbiome Data. Diagnostics (Basel) 2022; 12:diagnostics12102514. [PMID: 36292203 PMCID: PMC9600435 DOI: 10.3390/diagnostics12102514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 10/08/2022] [Accepted: 10/11/2022] [Indexed: 12/02/2022]
Abstract
Most of the microbiome studies suggest that using ensemble models such as Random Forest results in best predictive power. In this study, we empirically evaluate a more powerful ensemble learning algorithm, multi-view stacked generalization, on pediatric inflammatory bowel disease and adult colorectal cancer patients’ cohorts. We aim to check whether stacking would lead to better results compared to using a single best machine learning algorithm. Stacking achieves the best test set Average Precision (AP) on inflammatory bowel disease dataset reaching AP = 0.69, outperforming both the best base classifier (AP = 0.61) and the baseline meta learner built on top of base classifiers (AP = 0.63). On colorectal cancer dataset, the stacked classifier also outperforms (AP = 0.81) both the best base classifier (AP = 0.79) and the baseline meta learner (AP = 0.75). Stacking achieves best predictive performance on test set outperforming the best classifiers on both patient cohorts. Application of the stacking solves the issue of choosing the most appropriate machine learning algorithm by automating the model selection procedure. Clinical application of such a model is not limited to diagnosis task only, but it also can be extended to biomarker selection thanks to feature selection procedure.
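The Average Precision (AP) values quoted in this abstract can be read with a small worked example. The sketch below is a minimal, dependency-free illustration of one common definition of AP (the mean of the precision values taken at the rank of each positive sample); the function name `average_precision` and the toy labels and scores are assumptions made for illustration and are not taken from the cited study.

```python
def average_precision(y_true, scores):
    """Mean of precision@k evaluated at the rank of each positive sample."""
    # Rank samples from highest to lowest score (ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            # Precision among the top `rank` predictions.
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("average precision is undefined without positive samples")
    return sum(precisions) / len(precisions)


# Hypothetical toy data: 1 = diseased, 0 = healthy.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
ap = average_precision(labels, scores)
print(round(ap, 4))  # 0.8333: positives sit at ranks 1 and 3, so (1/1 + 2/3) / 2
```

Under this definition a classifier that ranks every positive above every negative reaches AP = 1.0, which gives a sense of scale for the differences between stacked and base classifiers reported above.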
Affiliation(s)
- Sultan Imangaliyev
- Institute for Artificial Intelligence in Medicine, University of Duisburg-Essen, 45131 Essen, Germany
- Cancer Research Center Cologne Essen (CCCE), 45147 Essen, Germany
| | - Jörg Schlötterer
- Institute for Artificial Intelligence in Medicine, University of Duisburg-Essen, 45131 Essen, Germany
- Cancer Research Center Cologne Essen (CCCE), 45147 Essen, Germany
| | - Folker Meyer
- Institute for Artificial Intelligence in Medicine, University of Duisburg-Essen, 45131 Essen, Germany
| | - Christin Seifert
- Institute for Artificial Intelligence in Medicine, University of Duisburg-Essen, 45131 Essen, Germany
- Cancer Research Center Cologne Essen (CCCE), 45147 Essen, Germany
| |
|
15
|
Hernández Medina R, Kutuzova S, Nielsen KN, Johansen J, Hansen LH, Nielsen M, Rasmussen S. Machine learning and deep learning applications in microbiome research. ISME COMMUNICATIONS 2022; 2:98. [PMID: 37938690 PMCID: PMC9723725 DOI: 10.1038/s43705-022-00182-9] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 03/24/2022] [Revised: 09/12/2022] [Accepted: 09/16/2022] [Indexed: 05/27/2023]
Abstract
The many microbial communities around us form interactive and dynamic ecosystems called microbiomes. Though concealed from the naked eye, microbiomes govern and influence macroscopic systems including human health, plant resilience, and biogeochemical cycling. Such feats have attracted interest from the scientific community, which has recently turned to machine learning and deep learning methods to interrogate the microbiome and elucidate the relationships between its composition and function. Here, we provide an overview of how the latest microbiome studies harness the inductive prowess of artificial intelligence methods. We start by highlighting that microbiome data - being compositional, sparse, and high-dimensional - necessitates special treatment. We then introduce traditional and novel methods and discuss their strengths and applications. Finally, we discuss the outlook of machine and deep learning pipelines, focusing on bottlenecks and considerations to address them.
Affiliation(s)
- Ricardo Hernández Medina
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
| | - Svetlana Kutuzova
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Department of Computer Science, University of Copenhagen, DK-2100, Copenhagen Ø, Denmark
| | - Knud Nor Nielsen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871, Frederiksberg, Denmark
| | - Joachim Johansen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
| | - Lars Hestbjerg Hansen
- Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871, Frederiksberg, Denmark
| | - Mads Nielsen
- Department of Computer Science, University of Copenhagen, DK-2100, Copenhagen Ø, Denmark.
| | - Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark.
| |
|
16
|
Zeng W, Gautam A, Huson DH. DeepToA: An Ensemble Deep-Learning Approach to Predicting the Theater of Activity of a Microbiome. Bioinformatics 2022; 38:4670-4676. [PMID: 36029249 DOI: 10.1093/bioinformatics/btac584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 07/19/2022] [Accepted: 08/26/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION Metagenomics is the study of microbiomes using DNA sequencing. A microbiome consists of an assemblage of microbes that is associated with a "theater of activity" (ToA). An important question is, to what degree does the taxonomic and functional content of the former depend on the (details of the) latter? Here we investigate a related technical question: Given a taxonomic and/or functional profile estimated from metagenomic sequencing data, how to predict the associated ToA? We present a deep-learning approach to this question. We use both taxonomic and functional profiles as input. We apply node2vec to embed hierarchical taxonomic profiles into numerical vectors. We then perform dimension reduction using clustering, to address the sparseness of the taxonomic data and thus make the problem more amenable to deep-learning algorithms. Functional features are combined with textual descriptions of protein families or domains. We present an ensemble deep-learning framework DeepToA for predicting the "theater of activity" of a microbial community, based on taxonomic and functional profiles. We use SHAP (SHapley Additive exPlanations) values to determine which taxonomic and functional features are important for the prediction. RESULTS Based on 7,560 metagenomic profiles downloaded from MGnify, classified into ten different theaters of activity, we demonstrate that DeepToA has an accuracy of 98.30%. We show that adding textual information to functional features increases the accuracy. AVAILABILITY Our approach is available at http://ab.inf.uni-tuebingen.de/software/deeptoa. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wenhuan Zeng
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, 72076, Germany
| | - Anupam Gautam
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, 72076, Germany; International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, Tübingen, 72076, Germany
| | - Daniel H Huson
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, 72076, Germany; International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, Tübingen, 72076, Germany; Cluster of Excellence: Controlling Microbes to Fight Infection, Tübingen, Germany
| |
|
17
|
Zhou X, Chen L, Liu HX. Applications of Machine Learning Models to Predict and Prevent Obesity: A Mini-Review. Front Nutr 2022; 9:933130. [PMID: 35866076 PMCID: PMC9294383 DOI: 10.3389/fnut.2022.933130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 05/19/2022] [Indexed: 11/28/2022]
Abstract
Research on obesity and related diseases has received attention from government policymakers; interventions targeting nutrient intake, dietary patterns, and physical activity are deployed globally. An urgent issue now is how can we improve the efficiency of obesity research or obesity interventions. Currently, machine learning (ML) methods have been widely applied in obesity-related studies to detect obesity disease biomarkers or discover intervention strategies to optimize weight loss results. In addition, an open source of these algorithms is necessary to check the reproducibility of the research results. Furthermore, appropriate applications of these algorithms could greatly improve the efficiency of similar studies by other researchers. Here, we proposed a mini-review of several open-source ML algorithms, platforms, or related databases that are of particular interest or can be applied in the field of obesity research. We focus our topic on nutrition, environment and social factor, genetics or genomics, and microbiome-adopting ML algorithms.
Affiliation(s)
- Xiaobei Zhou
- Health Sciences Institute, China Medical University, Shenyang, China
- Liaoning Key Laboratory of Obesity and Glucose/Lipid Associated Metabolic Diseases, China Medical University, Shenyang, China
| | - Lei Chen
- Health Sciences Institute, China Medical University, Shenyang, China
- Liaoning Key Laboratory of Obesity and Glucose/Lipid Associated Metabolic Diseases, China Medical University, Shenyang, China
- Institute of Life Sciences, China Medical University, Shenyang, China
| | - Hui-Xin Liu
- Health Sciences Institute, China Medical University, Shenyang, China
- Liaoning Key Laboratory of Obesity and Glucose/Lipid Associated Metabolic Diseases, China Medical University, Shenyang, China
- Institute of Life Sciences, China Medical University, Shenyang, China
| |
|
18
|
Lim H, Cankara F, Tsai CJ, Keskin O, Nussinov R, Gursoy A. Artificial intelligence approaches to human-microbiome protein–protein interactions. Curr Opin Struct Biol 2022; 73:102328. [DOI: 10.1016/j.sbi.2022.102328] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/18/2021] [Revised: 12/01/2021] [Accepted: 12/31/2021] [Indexed: 02/08/2023]
|
19
|
Michel‐Mata S, Wang X, Liu Y, Angulo MT. Predicting microbiome compositions from species assemblages through deep learning. IMETA 2022; 1:e3. [PMID: 35757098 PMCID: PMC9221840 DOI: 10.1002/imt2.3] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 10/20/2021] [Revised: 12/29/2021] [Accepted: 01/04/2022] [Indexed: 05/13/2023]
Abstract
Microbes can form complex communities that perform critical functions in maintaining the integrity of their environment or their hosts' well-being. Rationally managing these microbial communities requires improving our ability to predict how different species assemblages affect the final species composition of the community. However, making such a prediction remains challenging because of our limited knowledge of the diverse physical, biochemical, and ecological processes governing microbial dynamics. To overcome this challenge, we present a deep learning framework that automatically learns the map between species assemblages and community compositions from training data only, without knowing any of the above processes. First, we systematically validate our framework using synthetic data generated by classical population dynamics models. Then, we apply our framework to data from in vitro and in vivo microbial communities, including ocean and soil microbiota, Drosophila melanogaster gut microbiota, and human gut and oral microbiota. We find that our framework learns to perform accurate out-of-sample predictions of complex community compositions from a small number of training samples. Our results demonstrate how deep learning can enable us to understand better and potentially manage complex microbial communities.
Affiliation(s)
- Sebastian Michel‐Mata
- Center for Applied Physics and Advanced Technology, Universidad Nacional Autónoma de México, Juriquilla, Mexico
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, USA
| | - Xu‐Wen Wang
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
| | - Yang‐Yu Liu
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
| | - Marco Tulio Angulo
- CONACyT - Institute of Mathematics, Universidad Nacional Autónoma de México, Juriquilla, Mexico
| |
|
20
|
David MM, Tataru C, Pope Q, Baker LJ, English MK, Epstein HE, Hammer A, Kent M, Sieler MJ, Mueller RS, Sharpton TJ, Tomas F, Vega Thurber R, Fern XZ. Revealing General Patterns of Microbiomes That Transcend Systems: Potential and Challenges of Deep Transfer Learning. mSystems 2022; 7:e0105821. [PMID: 35040699 PMCID: PMC8765061 DOI: 10.1128/msystems.01058-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/11/2023]
Abstract
A growing body of research has established that the microbiome can mediate the dynamics and functional capacities of diverse biological systems. Yet, we understand little about what governs the response of these microbial communities to host or environmental changes. Most efforts to model microbiomes focus on defining the relationships between the microbiome, host, and environmental features within a specified study system and therefore fail to capture those that may be evident across multiple systems. In parallel with these developments in microbiome research, computer scientists have developed a variety of machine learning tools that can identify subtle, but informative, patterns from complex data. Here, we recommend using deep transfer learning to resolve microbiome patterns that transcend study systems. By leveraging diverse public data sets in an unsupervised way, such models can learn contextual relationships between features and build on those patterns to perform subsequent tasks (e.g., classification) within specific biological contexts.
Affiliation(s)
- Maude M. David
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Department of Pharmaceutical Sciences, Oregon State University, Corvallis, Oregon, USA
| | - Christine Tataru
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
| | - Quintin Pope
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, USA
| | - Lydia J. Baker
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
| | - Mary K. English
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
| | - Hannah E. Epstein
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
| | - Austin Hammer
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
| | - Michael Kent
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
| | - Michael J. Sieler
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
| | - Ryan S. Mueller
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
| | - Thomas J. Sharpton
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Department of Statistics, Oregon State University, Corvallis, Oregon, USA
| | - Fiona Tomas
- Instituto Mediterráneo de Estudios Avanzados, IMEDEA, Esporles, Balearic Islands, Spain
| | | | - Xiaoli Z. Fern
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, USA
| |
|
21
|
Sharma D, Xu W. phyLoSTM: a novel deep learning model on disease prediction from longitudinal microbiome data. Bioinformatics 2021; 37:3707-3714. [PMID: 34213529 DOI: 10.1093/bioinformatics/btab482] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 03/19/2021] [Revised: 05/24/2021] [Accepted: 06/30/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION Research shows that human microbiome is highly dynamic on longitudinal timescales, changing dynamically with diet, or due to medical interventions. In this paper, we propose a novel deep learning framework "phyLoSTM", using a combination of Convolutional Neural Networks and Long Short Term Memory Networks (LSTM) for feature extraction and analysis of temporal dependency in longitudinal microbiome sequencing data along with host's environmental factors for disease prediction. Additional novelty in terms of handling variable timepoints in subjects through LSTMs, as well as, weight balancing between imbalanced cases and controls is proposed. RESULTS We simulated 100 datasets across multiple time points for model testing. To demonstrate the model's effectiveness, we also implemented this novel method into two real longitudinal human microbiome studies: (i) DIABIMMUNE three country cohort with food allergy outcomes (Milk, Egg, Peanut and Overall) (ii) DiGiulio study with preterm delivery as outcome. Extensive analysis and comparison of our approach yields encouraging performance with an AUC of 0.897 (increased by 5%) on simulated studies and AUCs of 0.762 (increased by 19%) and 0.713 (increased by 8%) on the two real longitudinal microbiome studies respectively, as compared to the next best performing method, Random Forest. The proposed methodology improves predictive accuracy on longitudinal human microbiome studies containing spatially correlated data, and evaluates the change of microbiome composition contributing to outcome prediction. AVAILABILITY AND IMPLEMENTATION https://github.com/divya031090/phyLoSTM.
Affiliation(s)
- Divya Sharma
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada
| | - Wei Xu
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada; Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| |
|
22
|
García-Jiménez B, Muñoz J, Cabello S, Medina J, Wilkinson MD. Predicting microbiomes through a deep latent space. Bioinformatics 2021; 37:1444-1451. [PMID: 33289510 PMCID: PMC8208755 DOI: 10.1093/bioinformatics/btaa971] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/17/2020] [Revised: 10/21/2020] [Accepted: 11/06/2020] [Indexed: 12/28/2022]
Abstract
Motivation Microbial communities influence their environment by modifying the availability of compounds, such as nutrients or chemical elicitors. Knowing the microbial composition of a site is therefore relevant to improve productivity or health. However, sequencing facilities are not always available, or may be prohibitively expensive in some cases. Thus, it would be desirable to computationally predict the microbial composition from more accessible, easily-measured features. Results Integrating deep learning techniques with microbiome data, we propose an artificial neural network architecture based on heterogeneous autoencoders to condense the long vector of microbial abundance values into a deep latent space representation. Then, we design a model to predict the deep latent space and, consequently, to predict the complete microbial composition using environmental features as input. The performance of our system is examined using the rhizosphere microbiome of Maize. We reconstruct the microbial composition (717 taxa) from the deep latent space (10 values) with high fidelity (>0.9 Pearson correlation). We then successfully predict microbial composition from environmental variables, such as plant age, temperature or precipitation (0.73 Pearson correlation, 0.42 Bray–Curtis). We extend this to predict microbiome composition under hypothetical scenarios, such as future climate change conditions. Finally, via transfer learning, we predict microbial composition in a distinct scenario with only 100 sequences, and distinct environmental features. We propose that our deep latent space may assist microbiome-engineering strategies when technical or financial resources are limited, through predicting current or future microbiome compositions. Availability and implementation Software, results and data are available at https://github.com/jorgemf/DeepLatentMicrobiome Supplementary information Supplementary data are available at Bioinformatics online.
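The two agreement measures quoted in this abstract, Pearson correlation and Bray–Curtis, can be illustrated on a toy pair of abundance vectors. The sketch below is a minimal, dependency-free example and is not the authors' code: the function names and the `observed`/`predicted` vectors are hypothetical, and it assumes the standard Bray–Curtis dissimilarity (0 means identical compositions), which may differ from how the cited paper scales its reported value.

```python
import math


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Hypothetical relative abundances for four taxa (each vector sums to 1).
observed = [0.5, 0.3, 0.2, 0.0]
predicted = [0.4, 0.3, 0.2, 0.1]

print(bray_curtis(observed, predicted))
print(pearson(observed, predicted))
```

Reporting both measures is a reasonable design choice because Pearson correlation captures whether the predicted profile rises and falls with the observed one, while Bray–Curtis penalizes absolute differences in abundance.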
Affiliation(s)
- Beatriz García-Jiménez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223-Pozuelo de Alarcón, Madrid, Spain
| | - Jorge Muñoz
- Serendeepia Research, 28905 Getafe (Madrid), Spain
| | - Sara Cabello
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223-Pozuelo de Alarcón, Madrid, Spain
| | - Joaquín Medina
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223-Pozuelo de Alarcón, Madrid, Spain
| | - Mark D Wilkinson
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223-Pozuelo de Alarcón, Madrid, Spain; Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
| |
|
23
|
Wu S, Chen Y, Li Z, Li J, Zhao F, Su X. Towards multi-label classification: Next step of machine learning for microbiome research. Comput Struct Biotechnol J 2021; 19:2742-2749. [PMID: 34093989 PMCID: PMC8131981 DOI: 10.1016/j.csbj.2021.04.054] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/24/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 11/22/2022]
Abstract
Machine learning (ML) has been widely used in microbiome research for biomarker selection and disease prediction. By training on microbial profiles of samples from patients and healthy controls, ML classifiers construct data models from community features that are highly correlated with the target diseases, so as to determine the status of new samples. To clearly understand the host-microbe interaction of specific diseases, previous studies always focused on well-designed cohorts, in which each sample was labeled with exactly one status type. However, an individual may in fact be associated with multiple diseases simultaneously, which introduces additional variation in microbial patterns that interferes with status detection. More importantly, comorbidities or complications can be missed by regular ML models, limiting the practical application of microbiome techniques. In this review, we summarize the typical ML approaches of single-label classification for microbiome research, and demonstrate their limitations in multi-label disease detection using a real dataset. We then look ahead to a further step of ML towards multi-label classification that could potentially solve the aforementioned problem, including a series of promising strategies and key technical issues for applying multi-label classification in microbiome-based studies.
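The difference between single-label and multi-label targets described in this abstract can be made concrete with a small encoding example. The sketch below is an illustration under stated assumptions rather than a method from the cited review: the helper `to_indicator`, the disease names in `classes`, and the three samples are all hypothetical. It shows the label-indicator matrix that common multi-label strategies (for example, training one binary classifier per column) take as their target.

```python
def to_indicator(sample_labels, classes):
    """Encode each sample's set of labels as a 0/1 row, one column per class."""
    return [[1 if c in labels else 0 for c in classes] for labels in sample_labels]


# Hypothetical disease classes and per-sample diagnoses.
classes = ["IBD", "T2D", "CRC"]
samples = [
    {"IBD"},         # single disease: expressible as a single label
    {"IBD", "T2D"},  # comorbidity: needs two labels at once
    set(),           # healthy control: no disease label
]

Y = to_indicator(samples, classes)
print(Y)  # [[1, 0, 0], [1, 1, 0], [0, 0, 0]]
```

A single-label classifier would have to collapse the second sample into one class, which is the loss of comorbidity information the abstract points to.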
Affiliation(s)
- Shunyao Wu
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
| | - Yuzhu Chen
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
| | - Zhiruo Li
- School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong 266071, China
| | - Jian Li
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
| | - Fengyang Zhao
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
| | - Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
| |
|