1
Wang L, Lu W, Song Y, Liu S, Fu YV. Using machine learning to identify environmental factors that collectively determine microbial community structure of activated sludge. Environmental Research 2024; 260:119635. [PMID: 39025351 DOI: 10.1016/j.envres.2024.119635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/12/2024] [Accepted: 07/15/2024] [Indexed: 07/20/2024]
Abstract
Activated sludge (AS) microbial communities are influenced by various environmental variables. However, a comprehensive analysis of how these variables jointly and nonlinearly shape the AS microbial community remains challenging. In this study, we employed advanced machine learning techniques to elucidate the collective effects of environmental variables on the structure and function of AS microbial communities. Applying Dirichlet multinomial mixtures analysis to 311 global AS samples, we identified four distinct microbial community types (AS-types), each characterized by unique microbial compositions and metabolic profiles. We used 14 classical linear and nonlinear machine learning methods to select a baseline model. The extremely randomized trees demonstrated optimal performance in learning the relationship between environmental factors and AS types (with an accuracy of 71.43%). Feature selection identified critical environmental factors and their importance rankings, including latitude (Lat), longitude (Long), precipitation during sampling (Precip), solids retention time (SRT), effluent total nitrogen (Effluent TN), average temperature during sampling month (Avg Temp), mixed liquor temperature (Mixed Temp), influent biochemical oxygen demand (Influent BOD), and annual precipitation (Annual Precip). Significantly, Lat, Long, Precip, Avg Temp, and Annual Precip, influenced metabolic variations among AS types. These findings emphasize the pivotal role of environmental variables in shaping microbial community structures and enhancing metabolic pathways within activated sludge. Our study encourages the application of machine learning techniques to design artificial activated sludge microbial communities for specific environmental purposes.
Affiliation(s)
- Lu Wang
  - State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Weilai Lu
  - State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China
- Yang Song
  - State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China
- Shuangjiang Liu
  - State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China
- Yu Vincent Fu
  - State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China; Savaid Medical School, University of Chinese Academy of Sciences, Beijing, 100049, China.
2
Teixeira M, Silva F, Ferreira RM, Pereira T, Figueiredo C, Oliveira HP. A review of machine learning methods for cancer characterization from microbiome data. NPJ Precis Oncol 2024; 8:123. [PMID: 38816569 PMCID: PMC11139966 DOI: 10.1038/s41698-024-00617-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 05/17/2024] [Indexed: 06/01/2024] Open
Abstract
Recent studies have shown that the microbiome can impact cancer development, progression, and response to therapies suggesting microbiome-based approaches for cancer characterization. As cancer-related signatures are complex and implicate many taxa, their discovery often requires Machine Learning approaches. This review discusses Machine Learning methods for cancer characterization from microbiome data. It focuses on the implications of choices undertaken during sample collection, feature selection and pre-processing. It also discusses ML model selection, guiding how to choose an ML model, and model validation. Finally, it enumerates current limitations and how these may be surpassed. Proposed methods, often based on Random Forests, show promising results, however insufficient for widespread clinical usage. Studies often report conflicting results mainly due to ML models with poor generalizability. We expect that evaluating models with expanded, hold-out datasets, removing technical artifacts, exploring representations of the microbiome other than taxonomical profiles, leveraging advances in deep learning, and developing ML models better adapted to the characteristics of microbiome data will improve the performance and generalizability of models and enable their usage in the clinic.
Affiliation(s)
- Marco Teixeira
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal.
  - Faculty of Engineering, University of Porto, Porto, Portugal.
- Francisco Silva
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Science, University of Porto, Porto, Portugal
- Rui M Ferreira
  - Ipatimup - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde, University of Porto, Porto, Portugal
- Tania Pereira
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
- Ceu Figueiredo
  - Ipatimup - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde, University of Porto, Porto, Portugal
  - Faculty of Medicine, University of Porto, Porto, Portugal
- Hélder P Oliveira
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Science, University of Porto, Porto, Portugal
3
Roy G, Prifti E, Belda E, Zucker JD. Deep learning methods in metagenomics: a review. Microb Genom 2024; 10:001231. [PMID: 38630611 PMCID: PMC11092122 DOI: 10.1099/mgen.0.001231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/27/2024] [Indexed: 04/19/2024] Open
Abstract
The ever-decreasing cost of sequencing and the growing potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome plays a crucial role in human health, providing vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging due to several factors, including reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is also their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome's key role in our health.
Affiliation(s)
- Gaspar Roy
  - IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Edi Prifti
  - IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
  - Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Eugeni Belda
  - IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
  - Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Jean-Daniel Zucker
  - IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
  - Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
4
Franco Meléndez K, Schuster L, Donahey MC, Kairalla E, Jansen MA, Reisch C, Rivers AR. MicroMPN: methods and software for high-throughput screening of microbe suppression in mixed populations. Microbiol Spectr 2024; 12:e0357823. [PMID: 38353567 PMCID: PMC10923211 DOI: 10.1128/spectrum.03578-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 01/22/2024] [Indexed: 02/23/2024] Open
Abstract
Screening assays are used to test if one or more microbes suppress a pathogen of interest. In the presence of more than one microbe, the screening method must be able to accurately distinguish viable pathogen cells from non-viable and non-target microbes in a sample. Current screening methods are time-consuming and require special reagents to detect viability in mixed microbial communities. Screening assays performed using soil or other complex matrices present additional challenges for screening. Here, we develop an experimental workflow based on the most probable number (MPN) assay for testing the ability of synthetic microbial communities to suppress a soil-borne pathogen. Our approach, fluorMPN, uses a fluorescently labeled pathogen and microplate format to enable high-throughput comparative screening. In parallel, we developed a command-line tool, MicroMPN, which significantly reduces the complexity of calculating MPN values from microplates. We compared the performance of the fluorMPN assay with spotting on agar and found that both methods produced strongly correlated counts of equal precision. The suppressive effect of synthetic communities on the pathogen was equally recoverable by both methods. The application of this workflow for discriminating which communities lead to pathogen reduction helps narrow down candidates for additional characterization. Together, the resources offered here are meant to facilitate and simplify the application of MPN-based assays for comparative screening projects. IMPORTANCE We created a unified set of software and laboratory protocols for screening microbe libraries to assess the suppression of a pathogen in a mixed microbial community. Existing methods of fluorescent labeling were combined with the most probable number (MPN) assay in a microplate format to enumerate the reduction of a pathogenic soil microbe from complex soil matrices. 
This work provides a fluorescent expression vector available from Addgene, step-by-step laboratory protocols hosted by protocols.io, and MicroMPN, a command-line software for processing plate reader outputs. MicroMPN simplifies MPN estimation from 96- and 384-well microplates. The microplate screening assay is amenable to robotic automation with standard liquid handling robots, further reducing the hands-on processing time. This tool was designed to evaluate synthetic microbial communities for use as microbial inoculates or probiotics. The fluorMPN method is also useful for screening chemical and antimicrobial libraries for pathogen suppression in complex bacterial communities like soil.
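The MPN calculation that the abstract says MicroMPN automates can be illustrated with a small sketch. This is not MicroMPN's code or interface: the function name `mpn_estimate` and its arguments are hypothetical, and the sketch only solves the standard maximum-likelihood MPN equation, sum(g_i * v_i / (1 - exp(-lambda * v_i))) = sum(t_i * v_i), by bisection, where g_i is the number of positive wells, t_i the number of inoculated wells, and v_i the amount of original sample per well at dilution i.

```python
import math

def mpn_estimate(positives, totals, volumes, tol=1e-10):
    """Maximum-likelihood MPN (organisms per unit volume) for a dilution series.

    positives[i]: wells scored positive at dilution i
    totals[i]:    wells inoculated at dilution i
    volumes[i]:   amount of original sample per well at dilution i
    (Hypothetical helper for illustration; not part of MicroMPN.)
    """
    if not any(positives):
        return 0.0        # no growth anywhere: the estimate is zero
    if all(g == t for g, t in zip(positives, totals)):
        return math.inf   # every well positive: the MLE is unbounded

    target = sum(t * v for t, v in zip(totals, volumes))

    def lhs(lam):
        # Sum of g_i * v_i / (1 - exp(-lam * v_i)); decreases as lam grows.
        return sum(g * v / -math.expm1(-lam * v)
                   for g, v in zip(positives, volumes) if g)

    lo, hi = 1e-12, 1.0
    while lhs(hi) > target:       # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:     # bisection on the monotone function
        mid = (lo + hi) / 2.0
        if lhs(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With a single dilution the equation has the closed form lambda = -ln(1 - g/t) / v, which gives a quick sanity check: 5 positive wells out of 10 at unit volume should yield ln 2.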
Affiliation(s)
- Karla Franco Meléndez
  - United States Department of Agriculture, Agricultural Research Service, Genomics and Bioinformatics Research Unit, Gainesville, Florida, USA
- Layla Schuster
  - Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, Florida, USA
- Melinda Chue Donahey
  - Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, Florida, USA
- Emily Kairalla
  - Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, Florida, USA
- M. Andrew Jansen
  - United States Department of Agriculture, Agricultural Research Service, Systematic Entomology Laboratory, Electron and Confocal Microscopy Unit, Beltsville, Maryland, USA
- Christopher Reisch
  - Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, Florida, USA
- Adam R. Rivers
  - United States Department of Agriculture, Agricultural Research Service, Genomics and Bioinformatics Research Unit, Gainesville, Florida, USA
5
Asher EE, Bashan A. Model-free prediction of microbiome compositions. Microbiome 2024; 12:17. [PMID: 38303006 PMCID: PMC10832217 DOI: 10.1186/s40168-023-01721-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Accepted: 11/15/2023] [Indexed: 02/03/2024]
Abstract
BACKGROUND: The recent recognition of the importance of the microbiome to the host's health and well-being has yielded efforts to develop therapies that aim to shift the microbiome from a disease-associated state to a healthier one. Direct manipulation techniques of the species' assemblage are currently available, e.g., using probiotics or narrow-spectrum antibiotics to introduce or eliminate specific taxa. However, predicting the species' abundances at the new state remains a challenge, mainly due to the difficulties of deciphering the delicate underlying network of ecological interactions or constructing a predictive model for such complex ecosystems. RESULTS: Here, we propose a model-free method to predict the species' abundances at the new steady state based on their presence/absence configuration by utilizing a multi-dimensional k-nearest-neighbors (kNN) regression algorithm. By analyzing data from numeric simulations of ecological dynamics, we show that our predictions, which consider the presence/absence of all species holistically, outperform both the null model that uses the statistics of each species independently and a predictive neural network model. We analyze real metagenomic data of human-associated microbial communities and find that by relying on a small number of "neighboring" samples, i.e., samples with similar species assemblage, the kNN predicts the species abundance better than the whole-cohort average. By studying both real metagenomic and simulated data, we show that the predictability of our method is tightly related to the dissimilarity-overlap relationship of the training data. CONCLUSIONS: Our results demonstrate how model-free methods can prove useful in predicting microbial communities and may facilitate the development of microbial-based therapies.
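The kNN idea in this abstract, predicting abundances from the presence/absence pattern of "neighboring" samples, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the distance measure or the weighting, so the Jaccard distance, the unweighted neighbour average, and the names `jaccard_distance` and `knn_predict` are all hypothetical choices.

```python
def jaccard_distance(a, b):
    """Dissimilarity between two species assemblages given as sets (assumed metric)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def knn_predict(assemblage, training, k=3):
    """Predict relative abundances for `assemblage` (a set of species names).

    training: list of dicts mapping species -> abundance; the keys of each
    dict define that sample's presence/absence pattern.
    """
    # Rank training samples by how similar their assemblage is to the query.
    ranked = sorted(training, key=lambda s: jaccard_distance(assemblage, set(s)))
    neighbours = ranked[:k]
    # Average each queried species over the neighbours (absent counts as 0).
    raw = {sp: sum(n.get(sp, 0.0) for n in neighbours) / len(neighbours)
           for sp in assemblage}
    total = sum(raw.values())
    # Renormalise so the prediction is a composition summing to 1.
    return {sp: v / total for sp, v in raw.items()} if total else raw
```

For example, given training samples `{"A": 0.6, "B": 0.4}`, `{"A": 0.8, "B": 0.2}` and `{"C": 1.0}`, a query for the assemblage `{"A", "B"}` with `k=2` averages the first two samples and ignores the third, which shares no species with the query.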
Affiliation(s)
- Eitan E Asher
  - Physics Department, Bar-Ilan University, Ramat-Gan, Israel
- Amir Bashan
  - Physics Department, Bar-Ilan University, Ramat-Gan, Israel.
6
Delogu F, Kunath BJ, Queirós PM, Halder R, Lebrun LA, Pope PB, May P, Widder S, Muller EEL, Wilmes P. Forecasting the dynamics of a complex microbial community using integrated meta-omics. Nat Ecol Evol 2024; 8:32-44. [PMID: 37957315 PMCID: PMC10781640 DOI: 10.1038/s41559-023-02241-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 10/02/2023] [Indexed: 11/15/2023]
Abstract
Predicting the behaviour of complex microbial communities is challenging. However, this is essential for complex biotechnological processes such as those in biological wastewater treatment plants (BWWTPs), which require sustainable operation. Here we summarize 14 months of longitudinal meta-omics data from a BWWTP anaerobic tank into 17 temporal signals, explaining 91.1% of the temporal variance, and link those signals to ecological events within the community. We forecast the signals over the subsequent five years and use 21 extra samples collected at defined time intervals for testing and validation. Our forecasts are correct for six signals and hint on phenomena such as predation cycles. Using all the 17 forecasts and the environmental variables, we predict gene abundance and expression, with a coefficient of determination ≥0.87 for the subsequent three years. Our study demonstrates the ability to forecast the dynamics of open microbial ecosystems using interactions between community cycles and environmental parameters.
Affiliation(s)
- Francesco Delogu
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg.
- Benoit J Kunath
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Pedro M Queirós
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Rashi Halder
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Laura A Lebrun
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Phillip B Pope
  - Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Patrick May
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Stefanie Widder
  - Department of Medicine 1, Research Division Infection Biology, Medical University of Vienna, Vienna, Austria
- Emilie E L Muller
  - Génétique Moléculaire, Génomique, Microbiologie, UMR 7156 CNRS, Université de Strasbourg, Strasbourg, France
- Paul Wilmes
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg.
  - Department of Life Sciences and Medicine, Faculty of Science, Technology and Medicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg.
7
Fung DLX, Li X, Leung CK, Hu P. A self-knowledge distillation-driven CNN-LSTM model for predicting disease outcomes using longitudinal microbiome data. Bioinformatics Advances 2023; 3:vbad059. [PMID: 37228387 PMCID: PMC10203376 DOI: 10.1093/bioadv/vbad059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 04/03/2023] [Accepted: 05/01/2023] [Indexed: 05/27/2023]
Abstract
Motivation: The human microbiome is complex and highly dynamic in nature. Dynamic patterns of the microbiome can capture more information than single-point inference because they contain information on temporal changes. However, dynamic information on the human microbiome can be hard to capture because longitudinal data are difficult to obtain and contain a large volume of missing data, which, in conjunction with heterogeneity, poses a challenge for data analysis. Results: We propose an efficient hybrid deep learning architecture, a convolutional neural network-long short-term memory (CNN-LSTM) model combined with self-knowledge distillation, to create highly accurate models that analyze longitudinal microbiome profiles to predict disease outcomes. Using our proposed models, we analyzed the datasets from the Predicting Response to Standardized Pediatric Colitis Therapy (PROTECT) study and the DIABIMMUNE study. We showed a significant improvement in the area under the receiver operating characteristic curve scores, achieving 0.889 and 0.798 on the PROTECT study and the DIABIMMUNE study, respectively, compared with state-of-the-art temporal deep learning models. Our findings provide an effective artificial intelligence-based tool to predict disease outcomes using longitudinal microbiome profiles from collected patients. Availability and implementation: The data and source code can be accessed at https://github.com/darylfung96/UC-disease-TL.
Affiliation(s)
- Daryl L X Fung
  - Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
- Xu Li
  - Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada
- Carson K Leung
  - Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
8
Peng S, Luo M, Long D, Liu Z, Tan Q, Huang P, Shen J, Pu S. Full-length 16S rRNA gene sequencing and machine learning reveal the bacterial composition of inhalable particles from two different breeding stages in a piggery. Ecotoxicology and Environmental Safety 2023; 253:114712. [PMID: 36863163 DOI: 10.1016/j.ecoenv.2023.114712] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/25/2022] [Revised: 02/15/2023] [Accepted: 02/27/2023] [Indexed: 06/18/2023]
Abstract
Bacterial loading aggravates the harm of particulate matter (PM) to public health and ecological systems, especially in operations of concentrated animal production. This study aimed to explore the characteristics and influencing factors of bacterial components of inhalable particles at a piggery. The morphology and elemental composition of coarse particles (PM10, aerodynamic diameter ≤ 10 µm) and fine particles (PM2.5, aerodynamic diameter ≤ 2.5 µm) were analyzed. Full-length 16S rRNA sequencing technology was used to identify bacterial components according to breeding stage, particle size, and diurnal rhythm. Machine learning (ML) algorithms were used to further explore the relationship between bacteria and the environment. The results showed that the morphology of particles in the piggery differed, and the morphologies of the suspected bacterial components were elliptical deposited particles. Full-length 16S rRNA indicated that most of the airborne bacteria in the fattening and gestation houses were bacilli. The analysis of beta diversity and difference between samples showed that the relative abundance of some bacteria in PM2.5 was significantly higher than that in PM10 at the same pig house (P < 0.01). There were significant differences in the bacterial composition of inhalable particles between the fattening and gestation houses (P < 0.01). The aggregated boosted tree (ABT) model showed that PM2.5 had a great influence on airborne bacteria among air pollutants. Fast expectation-maximization microbial source tracking (FEAST) showed that feces was a major potential source of airborne bacteria in pig houses (contribution 52.64-80.58%). These results will provide a scientific basis for exploring the potential risks of airborne bacteria in a piggery to human and animal health.
Affiliation(s)
- Siyi Peng
  - Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; College of Animal Science and Technology, Southwest University, Chongqing 402460, China
- Min Luo
  - Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China
- Dingbiao Long
  - Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; Scientific Observation and Experiment Station of Livestock Equipment Engineering in Southwest, Ministry of Agriculture and Rural Affairs, Chongqing 402460, China; Innovation and Entrepreneurship Team for Livestock Environment Control and Equipment R&D, Chongqing 402460, China; National Center of Technology Innovation for pigs, Chongqing 402460, China
- Zuohua Liu
  - Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; National Center of Technology Innovation for pigs, Chongqing 402460, China; College of Animal Science and Technology, Southwest University, Chongqing 402460, China
- Qiong Tan
  - Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; National Center of Technology Innovation for pigs, Chongqing 402460, China
- Ping Huang
  - Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; National Center of Technology Innovation for pigs, Chongqing 402460, China
- Jie Shen
  - Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; National Center of Technology Innovation for pigs, Chongqing 402460, China
- Shihua Pu
  - Chongqing Academy of Animal Sciences, No. 51, Changlong Avenue, Rong chang District, Chongqing 402460, China; Scientific Observation and Experiment Station of Livestock Equipment Engineering in Southwest, Ministry of Agriculture and Rural Affairs, Chongqing 402460, China; Innovation and Entrepreneurship Team for Livestock Environment Control and Equipment R&D, Chongqing 402460, China; National Center of Technology Innovation for pigs, Chongqing 402460, China.
9
Hallin S. Environmental microbiology going computational-Predictive ecology and unpredicted discoveries. Environ Microbiol 2023; 25:111-114. [PMID: 36181387 PMCID: PMC10092848 DOI: 10.1111/1462-2920.16232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 09/28/2022] [Indexed: 01/21/2023]
Affiliation(s)
- Sara Hallin
  - Swedish University of Agricultural Sciences, Department of Forest Mycology and Plant Pathology, Uppsala, Sweden
10
Hernández Medina R, Kutuzova S, Nielsen KN, Johansen J, Hansen LH, Nielsen M, Rasmussen S. Machine learning and deep learning applications in microbiome research. ISME Communications 2022; 2:98. [PMID: 37938690 PMCID: PMC9723725 DOI: 10.1038/s43705-022-00182-9] [Citation(s) in RCA: 53] [Impact Index Per Article: 26.5] [Received: 03/24/2022] [Revised: 09/12/2022] [Accepted: 09/16/2022] [Indexed: 05/27/2023]
Abstract
The many microbial communities around us form interactive and dynamic ecosystems called microbiomes. Though concealed from the naked eye, microbiomes govern and influence macroscopic systems including human health, plant resilience, and biogeochemical cycling. Such feats have attracted interest from the scientific community, which has recently turned to machine learning and deep learning methods to interrogate the microbiome and elucidate the relationships between its composition and function. Here, we provide an overview of how the latest microbiome studies harness the inductive prowess of artificial intelligence methods. We start by highlighting that microbiome data - being compositional, sparse, and high-dimensional - necessitates special treatment. We then introduce traditional and novel methods and discuss their strengths and applications. Finally, we discuss the outlook of machine and deep learning pipelines, focusing on bottlenecks and considerations to address them.
Affiliation(s)
- Ricardo Hernández Medina
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Svetlana Kutuzova
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
  - Department of Computer Science, University of Copenhagen, DK-2100, Copenhagen Ø, Denmark
- Knud Nor Nielsen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
  - Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871, Frederiksberg, Denmark
- Joachim Johansen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Lars Hestbjerg Hansen
  - Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871, Frederiksberg, Denmark
- Mads Nielsen
  - Department of Computer Science, University of Copenhagen, DK-2100, Copenhagen Ø, Denmark.
- Simon Rasmussen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark.
11
Borgman J, Stark K, Carson J, Hauser L. Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data. Frontiers in Bioinformatics 2022; 2:871256. [PMID: 36304316 PMCID: PMC9580936 DOI: 10.3389/fbinf.2022.871256] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/08/2022] [Accepted: 05/30/2022] [Indexed: 11/18/2022] Open
Abstract
We present a novel approach for rapidly identifying sequences that leverages the representational power of Deep Learning techniques and is applied to the analysis of microbiome data. The method involves the creation of a latent sequence space, training a convolutional neural network to rapidly identify sequences by mapping them into that space, and we leverage the novel encoded latent space for denoising to correct sequencing errors. Using mock bacterial communities of known composition, we show that this approach achieves single nucleotide resolution, generating results for sequence identification and abundance estimation that match the best available microbiome algorithms in terms of accuracy while vastly increasing the speed of accurate processing. We further show the ability of this approach to support phenotypic prediction at the sample level on an experimental data set for which the ground truth for sequence identities and abundances is unknown, but the expected phenotypes of the samples are definitive. Moreover, this approach offers a potential solution for the analysis of data from other types of experiments that currently rely on computationally intensive sequence identification.
12
Legeay J, Hijri M. A Comprehensive Insight of Current and Future Challenges in Large-Scale Soil Microbiome Analyses. Microbial Ecology 2022:10.1007/s00248-022-02060-2. [PMID: 35739325 DOI: 10.1007/s00248-022-02060-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 06/15/2022] [Indexed: 06/15/2023]
Abstract
In the last decade, various large-scale projects describing soil microbial diversity across large geographical gradients have been undertaken. However, many questions remain unanswered about the best ways to conduct these studies. In this review, we present an overview of the experience gathered during these projects, and of the challenges that future projects will face, such as standardization of protocols and results, considering the temporal variation of microbiomes, and the legal constraints limiting such studies. We also present the arguments for and against the exhaustive description of soil microbiomes. Finally, we look at future developments of soil microbiome studies, notably emphasizing the important role of cultivation techniques.
Affiliation(s)
- Jean Legeay
  - African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
- Mohamed Hijri
  - African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
  - Institut de La Recherche en Biologie Végétale, Département de Sciences Biologiques, Université de Montréal, Montreal, QE, H1X 2B2, Canada
|
13
|
Taneishi K, Tsuchiya Y. Structure-based analyses of gut microbiome-related proteins by neural networks and molecular dynamics simulations. Curr Opin Struct Biol 2022; 73:102336. [DOI: 10.1016/j.sbi.2022.102336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 11/18/2021] [Accepted: 01/14/2022] [Indexed: 11/03/2022]
|
14
|
Michel‐Mata S, Wang X, Liu Y, Angulo MT. Predicting microbiome compositions from species assemblages through deep learning. IMETA 2022; 1:e3. [PMID: 35757098 PMCID: PMC9221840 DOI: 10.1002/imt2.3] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/20/2021] [Revised: 12/29/2021] [Accepted: 01/04/2022] [Indexed: 05/13/2023]
Abstract
Microbes can form complex communities that perform critical functions in maintaining the integrity of their environment or their hosts' well-being. Rationally managing these microbial communities requires improving our ability to predict how different species assemblages affect the final species composition of the community. However, making such a prediction remains challenging because of our limited knowledge of the diverse physical, biochemical, and ecological processes governing microbial dynamics. To overcome this challenge, we present a deep learning framework that automatically learns the map between species assemblages and community compositions from training data only, without knowing any of the above processes. First, we systematically validate our framework using synthetic data generated by classical population dynamics models. Then, we apply our framework to data from in vitro and in vivo microbial communities, including ocean and soil microbiota, Drosophila melanogaster gut microbiota, and human gut and oral microbiota. We find that our framework learns to perform accurate out-of-sample predictions of complex community compositions from a small number of training samples. Our results demonstrate how deep learning can enable us to better understand and potentially manage complex microbial communities.
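As a rough illustration of what a map from species assemblages to community compositions can look like, the Python sketch below takes a binary presence/absence vector and returns relative abundances that sum to one and are zero for absent species. This is not the authors' framework: the single linear layer, the random untrained weights, and the names `masked_softmax` and `predict_composition` are all hypothetical stand-ins, and the zero-abundance constraint for absent species is an assumption made here for the example.

```python
import numpy as np

def masked_softmax(scores, assemblage):
    """Convert raw scores into relative abundances over present species only."""
    scores = np.asarray(scores, dtype=float)
    present = np.asarray(assemblage, dtype=bool)
    if not present.any():
        raise ValueError("assemblage must contain at least one species")
    out = np.zeros_like(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    shifted = scores[present] - scores[present].max()
    weights = np.exp(shifted)
    out[present] = weights / weights.sum()
    return out

def predict_composition(assemblage, W, b):
    """Hypothetical one-layer map: assemblage (0/1 vector) -> composition."""
    z = np.asarray(assemblage, dtype=float)
    scores = W @ z + b  # in a trained model, W and b would be learned
    return masked_softmax(scores, z)

# Illustrative, untrained parameters for a five-species pool.
rng = np.random.default_rng(0)
n_species = 5
W = rng.normal(size=(n_species, n_species))
b = rng.normal(size=n_species)

assemblage = np.array([1, 0, 1, 1, 0])  # species 0, 2 and 3 are present
composition = predict_composition(assemblage, W, b)
print(composition)
```

In a real setting the parameters would be fitted to paired (assemblage, composition) samples; the masking step is only one simple way to keep predictions consistent with the input assemblage.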
Affiliation(s)
- Sebastian Michel‐Mata
  - Center for Applied Physics and Advanced Technology, Universidad Nacional Autónoma de México, Juriquilla, Mexico
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, USA
- Xu‐Wen Wang
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Yang‐Yu Liu
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Marco Tulio Angulo
  - CONACyT, Institute of Mathematics, Universidad Nacional Autónoma de México, Juriquilla, Mexico
|
15
|
David MM, Tataru C, Pope Q, Baker LJ, English MK, Epstein HE, Hammer A, Kent M, Sieler MJ, Mueller RS, Sharpton TJ, Tomas F, Vega Thurber R, Fern XZ. Revealing General Patterns of Microbiomes That Transcend Systems: Potential and Challenges of Deep Transfer Learning. mSystems 2022; 7:e0105821. [PMID: 35040699 PMCID: PMC8765061 DOI: 10.1128/msystems.01058-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/11/2023]
Abstract
A growing body of research has established that the microbiome can mediate the dynamics and functional capacities of diverse biological systems. Yet, we understand little about what governs the response of these microbial communities to host or environmental changes. Most efforts to model microbiomes focus on defining the relationships between the microbiome, host, and environmental features within a specified study system and therefore fail to capture those that may be evident across multiple systems. In parallel with these developments in microbiome research, computer scientists have developed a variety of machine learning tools that can identify subtle, but informative, patterns from complex data. Here, we recommend using deep transfer learning to resolve microbiome patterns that transcend study systems. By leveraging diverse public data sets in an unsupervised way, such models can learn contextual relationships between features and build on those patterns to perform subsequent tasks (e.g., classification) within specific biological contexts.
Affiliation(s)
- Maude M. David
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
  - Department of Pharmaceutical Sciences, Oregon State University, Corvallis, Oregon, USA
- Christine Tataru
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Quintin Pope
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, USA
- Lydia J. Baker
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Mary K. English
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Hannah E. Epstein
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Austin Hammer
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Michael Kent
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Michael J. Sieler
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Ryan S. Mueller
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Thomas J. Sharpton
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
  - Department of Statistics, Oregon State University, Corvallis, Oregon, USA
- Fiona Tomas
  - Instituto Mediterráneo de Estudios Avanzados, IMEDEA, Esporles, Balearic Islands, Spain
- Xiaoli Z. Fern
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, USA
|