151
|
Separation of Donor and Recipient Microbial Diversity Allows Determination of Taxonomic and Functional Features of Gut Microbiota Restructuring following Fecal Transplantation. mSystems 2021; 6:e0081121. [PMID: 34402648 PMCID: PMC8407411 DOI: 10.1128/msystems.00811-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
Abstract
Fecal microbiota transplantation (FMT) is currently used in medicine to treat recurrent clostridial colitis and other intestinal diseases. However, neither the therapeutic mechanism of FMT nor the mechanism that allows the donor bacteria to colonize the intestine of the recipient has yet been clearly described. From a biological point of view, FMT can be considered a useful model for studying the ecology of host-associated microbial communities. FMT experiments can shed light on the relationship between the host and its gut microbiota. This creates the need for experimentation with approaches to metagenomic data analysis which may be useful for the interpretation of observed biological phenomena. Here, we present the recipient intestine colonization analysis tool (RECAST), a novel computational approach that sorts the metagenomic reads of the recipient’s post-FMT stool metagenome by their origin. Using the RECAST algorithm, taxonomic/functional annotation, and machine learning approaches, the metagenomes from three FMT studies, including healthy volunteers, patients with clostridial colitis, and patients with metabolic syndrome, were analyzed. Using our computational pipeline, the donor-derived and recipient-derived microbes which formed the recipient post-FMT stool metagenomes (successful microbes) were identified. Their presence is well explained by a higher relative abundance in donor/pre-FMT recipient metagenomes or other metagenomes from the human population. In addition, successful microbes are enriched with gene groups potentially related to antibiotic resistance, including resistance to antimicrobial peptides. Interestingly, the observed reorganization features are universal and independent of the disease.
IMPORTANCE We assumed that the enrichment of successful gut microbes by lantibiotic/antibiotic resistance genes can be related to gut microbiota colonization resistance by third-party microbe phenomena and resistance to bacterium-derived or host-derived antimicrobial substances. According to this assumption, competition between the donor-derived and recipient-derived microbes as well as host immunity may play a key role in the FMT-related colonization and redistribution of recipient gut microbiota structure. Author Video: An author video summary of this article is available.
|
152
|
Sahu A, Blätke MA, Szymański JJ, Töpfer N. Advances in flux balance analysis by integrating machine learning and mechanism-based models. Comput Struct Biotechnol J 2021; 19:4626-4640. [PMID: 34471504 PMCID: PMC8382995 DOI: 10.1016/j.csbj.2021.08.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/15/2021] [Revised: 08/03/2021] [Accepted: 08/03/2021] [Indexed: 02/08/2023]
Abstract
The availability of multi-omics data sets and genome-scale metabolic models for various organisms provides a platform for modeling and analyzing genotype-to-phenotype relationships. Flux balance analysis is the main tool for predicting flux distributions in genome-scale metabolic models, and various data-integrative approaches enable modeling context-specific network behavior. Due to its linear nature, this optimization framework is readily scalable to multi-tissue or -organ and even multi-organism models. However, both data and model size can hamper a straightforward biological interpretation of the estimated fluxes. Moreover, flux balance analysis simulates metabolism at steady state and thus, in its most basic form, does not consider kinetics or regulatory events. The integration of flux balance analysis with complementary data analysis and modeling techniques offers the potential to overcome these challenges. In particular, machine learning approaches have emerged as the tool of choice for data reduction and selection of the most important variables in big data sets. Kinetic models and formal languages can be used to simulate dynamic behavior. This review article provides an overview of integrative studies that combine flux balance analysis with machine learning approaches, kinetic models, such as physiology-based pharmacokinetic models, and formal graphical modeling languages, such as Petri nets. We discuss the mathematical aspects and biological applications of these integrated approaches and outline challenges and future perspectives.
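For orientation, the basic optimization problem behind flux balance analysis can be written as a linear program. The symbols below (S, v, c, and the flux bounds) follow the standard textbook formulation and are not taken from the review itself.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for every reaction } j
\end{aligned}
```

Here S is the stoichiometric matrix (metabolites by reactions), v is the flux vector, and c holds the objective weights, for example selecting a biomass reaction. The constraint S v = 0 encodes the steady-state assumption, which is why kinetics and regulation do not appear in the basic form.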
Affiliation(s)
- Ankur Sahu
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Gatersleben, Germany
- Mary-Ann Blätke
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Gatersleben, Germany
- Jędrzej Jakub Szymański
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Gatersleben, Germany
- Nadine Töpfer
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Gatersleben, Germany
|
153
|
Zhou Z, Hu S, Zhang R, Ma Y, Du K, Sun M, Zhang H, Jiang X, Tu H, Wang X, Chen P. A simple and novel biomarker panel for serofluid dish rapid quality and safety assessment based on gray relational analysis. Food Biosci 2021. [DOI: 10.1016/j.fbio.2021.101188] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/14/2023]
|
154
|
Hassouneh SAD, Loftus M, Yooseph S. Linking Inflammatory Bowel Disease Symptoms to Changes in the Gut Microbiome Structure and Function. Front Microbiol 2021; 12:673632. [PMID: 34349736 PMCID: PMC8326577 DOI: 10.3389/fmicb.2021.673632] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/28/2021] [Accepted: 06/25/2021] [Indexed: 12/12/2022]
Abstract
Inflammatory bowel disease (IBD) is a chronic disease of the gastrointestinal tract that is often characterized by abdominal pain, rectal bleeding, inflammation, and weight loss. Many studies have posited that the gut microbiome may play an integral role in the onset and exacerbation of IBD. Here, we present a novel computational analysis of a previously published IBD dataset. This dataset consists of shotgun sequence data generated from fecal samples collected from individuals with IBD and an internal control group. Utilizing multiple external controls, together with appropriate techniques to handle the compositionality aspect of sequence data, our computational framework can identify and corroborate differences in the taxonomic profiles, bacterial association networks, and functional capacity within the IBD gut microbiome. Our analysis identified 42 bacterial species that are differentially abundant between IBD and every control group (one internal control and two external controls) with at least a twofold difference. Of the 42 species, 34 were significantly elevated in IBD, relative to every other control. These 34 species were still present in the control groups and appear to play important roles, according to network centrality and degree, in all bacterial association networks. Many of the species elevated in IBD have been implicated in modulating the immune response, mucin degradation, antibiotic resistance, and inflammation. We also identified elevated relative abundances of protein families related to signal transduction, sporulation and germination, and polysaccharide degradation as well as decreased relative abundance of protein families related to menaquinone and ubiquinone biosynthesis. Finally, we identified differences in functional capacities between IBD and healthy controls, and subsequently linked the changes in the functional capacity to previously published clinical research and to symptoms that commonly occur in IBD.
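The abstract mentions handling the compositionality of sequence data and a twofold abundance threshold without naming the methods used. As one common illustration (an assumption, not necessarily what the authors did), the centered log-ratio (CLR) transform and a log2 fold-change check can be sketched in Python. The function names and the pseudocount value are hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumed value) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [x - mean_log for x in logs]


def log2_fold_change(mean_a, mean_b):
    """log2 ratio of mean abundance in group A over group B."""
    return math.log2(mean_a / mean_b)


def is_at_least_twofold(mean_a, mean_b):
    """True when the two groups differ by a factor of two or more, in either direction."""
    return abs(log2_fold_change(mean_a, mean_b)) >= 1.0
```

CLR values sum to zero within a sample, which removes the arbitrary sequencing depth before groups are compared.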
Affiliation(s)
- Sayf Al-Deen Hassouneh
  - Burnett School of Biomedical Sciences, Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL, United States
- Mark Loftus
  - Burnett School of Biomedical Sciences, Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL, United States
- Shibu Yooseph
  - Department of Computer Science, Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL, United States
|
155
|
Estaki M, Jiang L, Bokulich NA, McDonald D, González A, Kosciolek T, Martino C, Zhu Q, Birmingham A, Vázquez-Baeza Y, Dillon MR, Bolyen E, Caporaso JG, Knight R. QIIME 2 Enables Comprehensive End-to-End Analysis of Diverse Microbiome Data and Comparative Studies with Publicly Available Data. Curr Protoc Bioinformatics 2021; 70:e100. [PMID: 32343490 PMCID: PMC9285460 DOI: 10.1002/cpbi.100] [Citation(s) in RCA: 185] [Impact Index Per Article: 61.7] [Indexed: 12/30/2022]
Abstract
QIIME 2 is a completely re‐engineered microbiome bioinformatics platform based on the popular QIIME platform, which it has replaced. QIIME 2 facilitates comprehensive and fully reproducible microbiome data science, improving accessibility to diverse users by adding multiple user interfaces. QIIME 2 can be combined with Qiita, an open‐source web‐based platform, to re‐use available data for meta‐analysis. The following basic protocol describes how to install QIIME 2 on a single computer and analyze microbiome sequence data, from processing of raw DNA sequence reads through generating publishable interactive figures. These interactive figures allow readers of a study to interact with data with the same ease as its authors, advancing microbiome science transparency and reproducibility. We also show how plug‐ins developed by the community to add analysis capabilities can be installed and used with QIIME 2, enhancing various aspects of microbiome analyses—e.g., improving taxonomic classification accuracy. Finally, we illustrate how users can perform meta‐analyses combining different datasets using readily available public data through Qiita. In this tutorial, we analyze a subset of the Early Childhood Antibiotics and the Microbiome (ECAM) study, which tracked the microbiome composition and development of 43 infants in the United States from birth to 2 years of age, identifying microbiome associations with antibiotic exposure, delivery mode, and diet. For more information about QIIME 2, see https://qiime2.org. To troubleshoot or ask questions about QIIME 2 and microbiome analysis, join the active community at https://forum.qiime2.org. © 2020 The Authors. Basic Protocol: Using QIIME 2 with microbiome data Support Protocol: Further microbiome analyses
Affiliation(s)
- Mehrbod Estaki
  - Department of Pediatrics, University of California San Diego, La Jolla, California
- Lingjing Jiang
  - Division of Biostatistics, University of California San Diego, La Jolla, California
- Nicholas A Bokulich
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona; Department of Biological Sciences, Northern Arizona University, Flagstaff, Arizona
- Daniel McDonald
  - Department of Pediatrics, University of California San Diego, La Jolla, California
- Antonio González
  - Department of Pediatrics, University of California San Diego, La Jolla, California
- Tomasz Kosciolek
  - Department of Pediatrics, University of California San Diego, La Jolla, California; Małopolska Centre of Biotechnology, Jagiellonian University, Kraków, Poland
- Cameron Martino
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, California; Center for Microbiome Innovation, University of California San Diego, La Jolla, California
- Qiyun Zhu
  - Department of Pediatrics, University of California San Diego, La Jolla, California
- Amanda Birmingham
  - Center for Computational Biology and Bioinformatics, University of California San Diego, La Jolla, California
- Yoshiki Vázquez-Baeza
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, California; Jacobs School of Engineering, University of California San Diego, La Jolla, California
- Matthew R Dillon
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona
- Evan Bolyen
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona
- J Gregory Caporaso
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona; Department of Biological Sciences, Northern Arizona University, Flagstaff, Arizona
- Rob Knight
  - Department of Pediatrics, University of California San Diego, La Jolla, California; Center for Microbiome Innovation, University of California San Diego, La Jolla, California; Department of Computer Science and Engineering, University of California San Diego, La Jolla, California; Department of Bioengineering, University of California San Diego, La Jolla, California
|
156
|
Yang F, Zou Q. mAML: an automated machine learning pipeline with a microbiome repository for human disease classification. Database: The Journal of Biological Databases and Curation 2021; 2020:5862399. [PMID: 32588040 PMCID: PMC7316531 DOI: 10.1093/database/baaa050] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 02/11/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 12/20/2022]
Abstract
Due to the concerted efforts to utilize microbial features to improve disease prediction capabilities, automated machine learning (AutoML) systems that remove the tedium of manually performing ML tasks are in great demand. Here we developed mAML, an ML model-building pipeline, which can automatically and rapidly generate optimized and interpretable models for personalized microbiome-based classification tasks in a reproducible way. The pipeline is deployed on a web-based platform; the server is user-friendly and flexible and has been designed to be scalable according to specific requirements. This pipeline exhibits high performance on 13 benchmark datasets including both binary and multi-class classification tasks. In addition, to facilitate the application of mAML and expand the human disease-related microbiome learning repository, we developed the GMrepo ML repository (GMrepo Microbiome Learning repository) from the GMrepo database. The repository comprises 120 microbiome-based classification tasks for 85 human-disease phenotypes, covering 12 429 metagenomic samples and 38 643 amplicon samples. The mAML pipeline and the GMrepo ML repository are expected to be important resources for research in microbiology and algorithm development. Database URL: http://lab.malab.cn/soft/mAML
Affiliation(s)
- Fenglong Yang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 4, Section 2, North Jianshe Road, Chengdu 610054, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 4, Section 2, North Jianshe Road, Chengdu 610054, China
|
157
|
García-Jiménez B, Muñoz J, Cabello S, Medina J, Wilkinson MD. Predicting microbiomes through a deep latent space. Bioinformatics 2021; 37:1444-1451. [PMID: 33289510 PMCID: PMC8208755 DOI: 10.1093/bioinformatics/btaa971] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/17/2020] [Revised: 10/21/2020] [Accepted: 11/06/2020] [Indexed: 12/28/2022]
Abstract
Motivation: Microbial communities influence their environment by modifying the availability of compounds, such as nutrients or chemical elicitors. Knowing the microbial composition of a site is therefore relevant to improve productivity or health. However, sequencing facilities are not always available, or may be prohibitively expensive in some cases. Thus, it would be desirable to computationally predict the microbial composition from more accessible, easily-measured features. Results: Integrating deep learning techniques with microbiome data, we propose an artificial neural network architecture based on heterogeneous autoencoders to condense the long vector of microbial abundance values into a deep latent space representation. Then, we design a model to predict the deep latent space and, consequently, to predict the complete microbial composition using environmental features as input. The performance of our system is examined using the rhizosphere microbiome of maize. We reconstruct the microbial composition (717 taxa) from the deep latent space (10 values) with high fidelity (>0.9 Pearson correlation). We then successfully predict microbial composition from environmental variables, such as plant age, temperature or precipitation (0.73 Pearson correlation, 0.42 Bray–Curtis). We extend this to predict microbiome composition under hypothetical scenarios, such as future climate change conditions. Finally, via transfer learning, we predict microbial composition in a distinct scenario with only 100 sequences, and distinct environmental features. We propose that our deep latent space may assist microbiome-engineering strategies when technical or financial resources are limited, through predicting current or future microbiome compositions. Availability and implementation: Software, results and data are available at https://github.com/jorgemf/DeepLatentMicrobiome. Supplementary information: Supplementary data are available at Bioinformatics online.
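The abstract reports Pearson correlation and Bray–Curtis values between predicted and observed compositions. As a reading aid, here is a minimal Python sketch of both metrics for two abundance vectors. It uses the standard definitions; the function names are hypothetical and the code is not taken from the authors' repository.

```python
import math


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared taxa."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)
```

Note the opposite directions: a higher Pearson value means better agreement, while a lower Bray–Curtis value means the predicted and observed communities are more alike.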
Affiliation(s)
- Beatriz García-Jiménez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223-Pozuelo de Alarcón, Madrid, Spain
- Jorge Muñoz
  - Serendeepia Research, 28905 Getafe (Madrid), Spain
- Sara Cabello
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223-Pozuelo de Alarcón, Madrid, Spain
- Joaquín Medina
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223-Pozuelo de Alarcón, Madrid, Spain
- Mark D Wilkinson
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223-Pozuelo de Alarcón, Madrid, Spain; Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
|
158
|
Nishimura N, Kaji K, Kitagawa K, Sawada Y, Furukawa M, Ozutsumi T, Fujinaga Y, Tsuji Y, Takaya H, Kawaratani H, Moriya K, Namisaki T, Akahane T, Fukui H, Yoshiji H. Intestinal Permeability Is a Mechanical Rheostat in the Pathogenesis of Liver Cirrhosis. Int J Mol Sci 2021; 22:6921. [PMID: 34203178 PMCID: PMC8267717 DOI: 10.3390/ijms22136921] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/26/2021] [Revised: 06/22/2021] [Accepted: 06/24/2021] [Indexed: 12/12/2022]
Abstract
Recent studies have suggested that an alteration in the gut microbiota and their products, particularly endotoxins derived from Gram-negative bacteria, may play a major role in the pathogenesis of liver diseases. Gut dysbiosis caused by a high-fat diet and alcohol consumption induces increased intestinal permeability, the so-called "leaky gut", which means higher translocation of bacteria and their products and components, including endotoxins. Clinical studies have found that plasma endotoxin levels are elevated in patients with chronic liver diseases, including alcoholic liver disease and nonalcoholic liver disease. A decrease in commensal nonpathogenic bacteria including Ruminococcaceae and Lactobacillus and an overgrowth of pathogenic bacteria such as Bacteroidaceae and Enterobacteriaceae are observed in cirrhotic patients. The decreased diversity of the gut microbiota in cirrhotic patients before liver transplantation is also related to a higher incidence of post-transplant infections and cognitive impairment. The exposure to endotoxins activates macrophages via Toll-like receptor 4 (TLR4), leading to a greater production of proinflammatory cytokines and chemokines including tumor necrosis factor-alpha, interleukin (IL)-6, and IL-8, which play key roles in the progression of liver diseases. TLR4 is a major receptor activated by the binding of endotoxins in macrophages, and its downstream signal induces proinflammatory cytokines. The expression of TLR4 is also observed in nonimmune cells in the liver, such as hepatic stellate cells, which play a crucial role in the progression of liver fibrosis that develops into hepatocarcinogenesis, suggesting the importance of the interaction between endotoxemia and TLR4 signaling as a target for preventing liver disease progression. In this review, we summarize the findings for the role of gut-derived endotoxemia underlying the progression of liver pathogenesis.
|
159
|
Chen X, Liu L, Zhang W, Yang J, Wong KC. Human host status inference from temporal microbiome changes via recurrent neural networks. Brief Bioinform 2021; 22:6307015. [PMID: 34151933 DOI: 10.1093/bib/bbab223] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/24/2021] [Revised: 04/21/2021] [Accepted: 04/21/2021] [Indexed: 01/04/2023]
Abstract
With the rapid increase in sequencing data, human host status inference (e.g. healthy or sick) from microbiome data has become an important issue. Existing studies are mostly based on single-point microbiome composition, and the host status is rarely predicted from longitudinal microbiome data. However, single-point-based methods cannot capture the dynamic patterns between the temporal changes and host status. Therefore, it remains challenging to build good predictive models and to scale them to different microbiome contexts. In addition, existing methods mainly target disease prediction and seldom investigate other host statuses. To fill the gap, we propose a comprehensive deep learning-based framework that utilizes longitudinal microbiome data as input to infer the human host status. Specifically, the framework is composed of specific data preparation strategies and a recurrent neural network tailored for longitudinal microbiome data. In experiments, we evaluated the proposed method on both semi-synthetic and real datasets based on different sequencing technologies and metagenomic contexts. The results indicate that our method achieves robust performance compared to other baseline and state-of-the-art classifiers and provides a significant reduction in prediction time.
Affiliation(s)
- Xingjian Chen
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
- Lingjing Liu
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
- Weitong Zhang
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
- Jianyi Yang
  - School of Mathematical Sciences, Nankai University, Kowloon, Hong Kong SAR
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
|
160
|
Jiao N, Loomba R, Yang ZH, Wu D, Fang S, Bettencourt R, Lan P, Zhu R, Zhu L. Alterations in bile acid metabolizing gut microbiota and specific bile acid genes as a precision medicine to subclassify NAFLD. Physiol Genomics 2021; 53:336-348. [PMID: 34151600 DOI: 10.1152/physiolgenomics.00011.2021] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]
Abstract
Multiple mechanisms for the gut microbiome contributing to the pathogenesis of nonalcoholic fatty liver disease (NAFLD) have been implicated. Here, we aim to investigate the contribution and potential application for altered bile acids (BA) metabolizing microbes in NAFLD by post hoc analysis of whole metagenome sequencing (WMS) data. The discovery cohort consisted of 86 well-characterized patients with biopsy-proven NAFLD and 38 healthy controls. Assembly-based analysis was performed to identify BA-metabolizing microbes. Statistical tests, feature selection, and microbial coabundance analysis were integrated to identify microbial alterations and markers in NAFLD. An independent validation cohort was subjected to similar analyses. NAFLD microbiota exhibited decreased diversity and microbial associations. We established a classifier model with 53 differential species exhibiting a robust diagnostic accuracy [area under the receiver-operator curve (AUC) = 0.97] for detecting NAFLD. Next, eight important differential pathway markers including secondary BA biosynthesis were identified. Specifically, increased abundance of 7α-hydroxysteroid dehydrogenase (7α-HSDH), 3α-hydroxysteroid dehydrogenase (baiA), and bile acid-coenzyme A ligase (baiB) was detected in NAFLD. Furthermore, 10 of 50 BA-metabolizing metagenome-assembled genomes (MAGs) from Bacteroides ovatus and Eubacterium biforme were dominant in NAFLD and interplayed as a synergetic ecological guild. Importantly, two subtypes of patients with NAFLD were observed according to secondary BA metabolism potentials. Elevated capability for secondary BA biosynthesis was also observed in the validation cohort. These bacterial BA-metabolizing genes and microbes identified in this study may serve as disease markers. Microbial differences in BA-metabolism and strain-specific differences among patients highlight the potential for precision medicine in NAFLD treatment.
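The abstract summarizes classifier quality as an area under the receiver-operator curve (AUC). For readers unfamiliar with the metric, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control. This is a generic illustration with a hypothetical function name, not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for cases (e.g. NAFLD), 0 for controls.
    scores: classifier outputs, higher meaning more case-like.
    Ties between a case and a control count as half a correct pair.
    """
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    correct = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                correct += 1.0
            elif case_score == control_score:
                correct += 0.5
    return correct / (len(cases) * len(controls))
```

Under this reading, an AUC of 0.97 means that about 97% of case and control pairs are ordered correctly by the model's score.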
Affiliation(s)
- Na Jiao
  - Department of Colorectal Surgery, Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China; Department of Bioinformatics, Putuo People's Hospital, Tongji University, Shanghai, People's Republic of China
- Rohit Loomba
  - Division of Gastroenterology and Epidemiology, Department of Medicine, NAFLD Research Center, University of California San Diego, La Jolla, California
- Zi-Huan Yang
  - Department of Colorectal Surgery, Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China
- Dingfeng Wu
  - Department of Bioinformatics, Putuo People's Hospital, Tongji University, Shanghai, People's Republic of China
- Sa Fang
  - Department of Bioinformatics, Putuo People's Hospital, Tongji University, Shanghai, People's Republic of China
- Richele Bettencourt
  - Division of Gastroenterology and Epidemiology, Department of Medicine, NAFLD Research Center, University of California San Diego, La Jolla, California
- Ping Lan
  - Department of Colorectal Surgery, Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China
- Ruixin Zhu
  - Department of Bioinformatics, Putuo People's Hospital, Tongji University, Shanghai, People's Republic of China
- Lixin Zhu
  - Department of Colorectal Surgery, Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China; Department of Biochemistry, Genome, Environment and Microbiome Community of Excellence, The State University of New York at Buffalo, Buffalo, New York
|
161
|
Jasner Y, Belogolovski A, Ben-Itzhak M, Koren O, Louzoun Y. Microbiome Preprocessing Machine Learning Pipeline. Front Immunol 2021; 12:677870. [PMID: 34220823 PMCID: PMC8250139 DOI: 10.3389/fimmu.2021.677870] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/08/2021] [Accepted: 05/07/2021] [Indexed: 11/13/2022]
Abstract
Background: 16S sequencing results are often used for machine learning (ML) tasks. 16S gene sequences are represented as feature counts, which are associated with taxonomic representation. Raw feature counts may not be the optimal representation for ML. Methods: We checked multiple preprocessing steps and tested the optimal combination for 16S sequencing-based classification tasks. We computed the contribution of each step to the accuracy as measured by the Area Under the Curve (AUC) of the classification. Results: We show that the log of the feature counts is much more informative than the relative counts. We further show that merging features associated with the same taxonomy at a given level, through a dimension reduction step for each group of bacteria, improves the AUC. Finally, we show that z-scoring has a very limited effect on the results. Conclusions: The preprocessing of microbiome 16S data is crucial for optimal microbiome-based machine learning. These preprocessing steps are integrated into MIPMLP (Microbiome Preprocessing Machine Learning Pipeline), which is available as a stand-alone version at https://github.com/louzounlab/microbiome/tree/master/Preprocess or as a service at http://mip-mlp.math.biu.ac.il/Home. Both contain the code and standard test sets.
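The preprocessing steps named in the abstract (merging features at a taxonomic level, log versus relative counts, and z-scoring) can be sketched in plain Python. This is a minimal illustration under stated assumptions and does not use the MIPMLP API: the function names, the semicolon-separated taxonomy strings, and the `epsilon` offset are all hypothetical, and merging here is a simple sum standing in for the per-group dimension reduction the authors describe.

```python
import math
from collections import defaultdict


def merge_by_taxonomy(counts, taxonomy, level):
    """Sum counts of features whose taxonomy shares the first `level` ranks.

    A plain sum is an assumed stand-in for the paper's dimension reduction step.
    """
    merged = defaultdict(float)
    for feature, count in counts.items():
        key = ";".join(taxonomy[feature].split(";")[:level])
        merged[key] += count
    return dict(merged)


def log_transform(counts, epsilon=0.1):
    """log10 of counts; epsilon (assumed value) keeps zero counts finite."""
    return {k: math.log10(v + epsilon) for k, v in counts.items()}


def relative_abundance(counts):
    """Divide each count by the sample total so the values sum to 1."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def z_score(values):
    """Standardize one feature across samples to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std > 0 else 0.0 for v in values]
```

A typical order, following the abstract, would be to merge first, then apply either the log or the relative transform per sample, and finally z-score each feature across samples.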
Affiliation(s)
- Yoel Jasner
  - Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
- Omry Koren
  - Azrieli Faculty of Medicine, Bar-Ilan University, Ramat Gan, Israel
- Yoram Louzoun
  - Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
|
162
|
Lima KM, Davis RR, Liu SY, Greenhalgh DG, Tran NK. Longitudinal profiling of the burn patient cutaneous and gastrointestinal microbiota: a pilot study. Sci Rep 2021; 11:10667. [PMID: 34021204 PMCID: PMC8139985 DOI: 10.1038/s41598-021-89822-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2021] [Accepted: 04/15/2021] [Indexed: 11/09/2022]
Abstract
Sepsis is a leading cause of morbidity and mortality in patients who have sustained a severe burn injury. Early detection and treatment of infections improves outcomes, and understanding changes in the host microbiome following injury and during treatment may aid in burn care. The loss of functional barriers, systemic inflammation, and commensal community perturbations all contribute to a burn patient’s increased risk of infection. We sampled 10 burn patients to evaluate cutaneous microbial populations on the burn wound and corresponding spared skin on days 0, 3, 7, 14, 21, and 28 post-intensive care unit admission. In addition, skin samples were paired with perianal and rectal locations to evaluate changes in the burn patient gut microbiome following injury and treatment. We found a significant (P = 0.011) reduction in alpha diversity on the burn wound compared to spared skin throughout the sampling period, as well as a reduction in common skin commensal bacteria such as Propionibacterium acnes and Staphylococcus epidermidis. Compared to healthy volunteers (n = 18), the burn patient spared skin also exhibited a significant reduction in alpha diversity (P = 0.001). Treatments such as systemic or topical antibiotic administration, skin grafting, and nutritional formulations also impact diversity and community composition at the sampling locations. When evaluating each subject individually, an increase in relative abundance of taxa isolated clinically by bacterial culture could be seen in 5/9 infections detected among the burn patient cohort.
Affiliation(s)
- Kelly M Lima
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V St., Sacramento, CA, 95817, USA
- Ryan R Davis
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V St., Sacramento, CA, 95817, USA
- Stephenie Y Liu
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V St., Sacramento, CA, 95817, USA
- David G Greenhalgh
- Division of Burn Surgery, Department of Surgery, 2221 Stockton Blvd., Sacramento, CA, 95817, USA
- Nam K Tran
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V St., Sacramento, CA, 95817, USA
163
Gene-level metagenomic architectures across diseases yield high-resolution microbiome diagnostic indicators. Nat Commun 2021; 12:2907. [PMID: 34006865 PMCID: PMC8131609 DOI: 10.1038/s41467-021-23029-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 10/16/2020] [Accepted: 04/13/2021] [Indexed: 02/06/2023] Open
Abstract
We propose microbiome disease “architectures”: linking >1 million microbial features (species, pathways, and genes) to 7 host phenotypes from 13 cohorts using a pipeline designed to identify associations that are robust to analytical model choice. Here, we quantify conservation and heterogeneity in microbiome-disease associations, using gene-level analysis to identify strain-specific, cross-disease, positive and negative associations. We find coronary artery disease, inflammatory bowel diseases, and liver cirrhosis to share gene-level signatures ascribed to the Streptococcus genus. Type 2 diabetes, by comparison, has a distinct metagenomic signature not linked to any one specific species or genus. We additionally find that at the species-level, the prior-reported connection between Solobacterium moorei and colorectal cancer is not consistently identified across models—however, our gene-level analysis unveils a group of robust, strain-specific gene associations. Finally, we validate our findings regarding colorectal cancer and inflammatory bowel diseases in independent cohorts and identify that features inversely associated with disease tend to be less reproducible than features enriched in disease. Overall, our work is not only a step towards gene-based, cross-disease microbiome diagnostic indicators, but it also illuminates the nuances of the genetic architecture of the human microbiome, including tension between gene- and species-level associations. Here, combing the massive gene-universe of the gut microbiome to identify strain-specific, cross-disease, associations across seven human diseases, the authors introduce the concept of microbiome architecture, defined as the complete set of positive and negative associations between microbial genes and human host disease, highlighting microbiome architectures as potential diagnostic indicators.
164
Beghini F, McIver LJ, Blanco-Míguez A, Dubois L, Asnicar F, Maharjan S, Mailyan A, Manghi P, Scholz M, Thomas AM, Valles-Colomer M, Weingart G, Zhang Y, Zolfo M, Huttenhower C, Franzosa EA, Segata N. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife 2021; 10:65088. [PMID: 33944776 PMCID: PMC8096432 DOI: 10.7554/elife.65088] [Citation(s) in RCA: 723] [Impact Index Per Article: 241.0] [Received: 11/22/2020] [Accepted: 04/21/2021] [Indexed: 02/06/2023] Open
Abstract
Culture-independent analyses of microbial communities have progressed dramatically in the last decade, particularly due to advances in methods for biological profiling via shotgun metagenomics. Opportunities for improvement continue to accelerate, with greater access to multi-omics, microbial reference genomes, and strain-level diversity. To leverage these, we present bioBakery 3, a set of integrated, improved methods for taxonomic, strain-level, functional, and phylogenetic profiling of metagenomes newly developed to build on the largest set of reference sequences now available. Compared to current alternatives, MetaPhlAn 3 increases the accuracy of taxonomic profiling, and HUMAnN 3 improves that of functional potential and activity. These methods detected novel disease-microbiome links in applications to CRC (1262 metagenomes) and IBD (1635 metagenomes and 817 metatranscriptomes). Strain-level profiling of an additional 4077 metagenomes with StrainPhlAn 3 and PanPhlAn 3 unraveled the phylogenetic and functional structure of the common gut microbe Ruminococcus bromii, previously described by only 15 isolate genomes. With open-source implementations and cloud-deployable reproducible workflows, the bioBakery 3 platform can help researchers deepen the resolution, scale, and accuracy of multi-omic profiling for microbial community studies.
Affiliation(s)
- Lauren J McIver
- Harvard T.H. Chan School of Public Health, Boston, United States
- Sagun Maharjan
- Harvard T.H. Chan School of Public Health, Boston, United States
- The Broad Institute of MIT and Harvard, Cambridge, United States
- Ana Mailyan
- Harvard T.H. Chan School of Public Health, Boston, United States
- The Broad Institute of MIT and Harvard, Cambridge, United States
- Paolo Manghi
- Department CIBIO, University of Trento, Trento, Italy
- Matthias Scholz
- Department of Food Quality and Nutrition, Research and Innovation Center, Edmund Mach Foundation, San Michele all'Adige, Italy
- George Weingart
- Harvard T.H. Chan School of Public Health, Boston, United States
- The Broad Institute of MIT and Harvard, Cambridge, United States
- Yancong Zhang
- Harvard T.H. Chan School of Public Health, Boston, United States
- The Broad Institute of MIT and Harvard, Cambridge, United States
- Moreno Zolfo
- Department CIBIO, University of Trento, Trento, Italy
- Curtis Huttenhower
- Harvard T.H. Chan School of Public Health, Boston, United States
- The Broad Institute of MIT and Harvard, Cambridge, United States
- Eric A Franzosa
- Harvard T.H. Chan School of Public Health, Boston, United States
- The Broad Institute of MIT and Harvard, Cambridge, United States
- Nicola Segata
- Department CIBIO, University of Trento, Trento, Italy
- IEO, European Institute of Oncology IRCCS, Milan, Italy
165
Simon TG, Chan AT, Huttenhower C. Microbiome Biomarkers: One Step Closer in NAFLD Cirrhosis. Hepatology 2021; 73:2063-2066. [PMID: 33283299 DOI: 10.1002/hep.31660] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/28/2020] [Revised: 10/25/2020] [Accepted: 11/23/2020] [Indexed: 12/17/2022]
Affiliation(s)
- Tracey G Simon
- Division of Gastroenterology, Department of Medicine, Massachusetts General Hospital, Boston, MA
- Harvard Medical School, Boston, MA
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital, Boston, MA
- Andrew T Chan
- Division of Gastroenterology, Department of Medicine, Massachusetts General Hospital, Boston, MA
- Harvard Medical School, Boston, MA
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital, Boston, MA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA
- Broad Institute, Boston, MA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA
- Curtis Huttenhower
- Broad Institute, Boston, MA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
166
Wilkinson JE, Franzosa EA, Everett C, Li C, Hu FB, Wirth DF, Song M, Chan AT, Rimm E, Garrett WS, Huttenhower C. A framework for microbiome science in public health. Nat Med 2021; 27:766-774. [PMID: 33820996 DOI: 10.1038/s41591-021-01258-0] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 08/19/2020] [Accepted: 01/19/2021] [Indexed: 12/12/2022]
Abstract
Human microbiome science has advanced rapidly and reached a scale at which basic biology, clinical translation and population health are increasingly integrated. It is thus now possible for public health researchers, practitioners and policymakers to take specific action leveraging current and future microbiome-based opportunities and best practices. Here we provide an outline of considerations for research, education, interpretation and scientific communication concerning the human microbiome and public health. This includes guidelines for population-scale microbiome study design; necessary physical platforms and analysis methods; integration into public health areas such as epidemiology, nutrition, chronic disease, and global and environmental health; entrepreneurship and technology transfer; and educational curricula. Particularly in the near future, there are both opportunities for the incorporation of microbiome-based technologies into public health practice, and a growing need for policymaking and regulation around related areas such as prebiotic and probiotic supplements, novel live-cell therapies and fecal microbiota transplants.
Affiliation(s)
- Jeremy E Wilkinson
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Eric A Franzosa
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA, USA
- Christine Everett
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Chengchen Li
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Frank B Hu
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Dyann F Wirth
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA, USA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Mingyang Song
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Andrew T Chan
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Eric Rimm
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Wendy S Garrett
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA
- Department of Molecular Metabolism, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Curtis Huttenhower
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA, USA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
167
Wu S, Chen Y, Li Z, Li J, Zhao F, Su X. Towards multi-label classification: Next step of machine learning for microbiome research. Comput Struct Biotechnol J 2021; 19:2742-2749. [PMID: 34093989 PMCID: PMC8131981 DOI: 10.1016/j.csbj.2021.04.054] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/24/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 11/22/2022] Open
Abstract
Machine learning (ML) has been widely used in microbiome research for biomarker selection and disease prediction. By training on microbial profiles of samples from patients and healthy controls, ML classifiers construct data models from community features that are highly correlated with the target diseases, so as to determine the status of new samples. To clearly understand the host-microbe interaction of specific diseases, previous studies have generally focused on well-designed cohorts, in which each sample was labeled with exactly one status type. In practice, however, an individual may be associated with multiple diseases simultaneously, which introduces additional variation in microbial patterns that interferes with status detection. More importantly, comorbidities or complications can be missed by regular ML models, limiting the practical application of microbiome techniques. In this review, we summarize the typical ML approaches of single-label classification for microbiome research, and demonstrate their limitations in multi-label disease detection using a real dataset. We then look ahead to a further step of ML towards multi-label classification that could potentially solve the aforementioned problem, including a series of promising strategies and key technical issues for applying multi-label classification in microbiome-based studies.
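The difference between single-label and multi-label prediction can be made concrete with binary relevance, one of the simplest multi-label strategies: one independent yes/no classifier per disease. The sketch below is illustrative only and is not taken from the review; the nearest-centroid classifier, the two-feature profiles, and the disease labels are all assumptions, and every label is assumed to have at least one positive and one negative training sample.

```python
def centroid(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_binary_relevance(X, Y):
    """Fit one positive and one negative centroid per label.

    X: list of feature vectors; Y: list of label sets (possibly empty).
    """
    model = {}
    for label in sorted(set().union(*Y)):
        pos = [x for x, y in zip(X, Y) if label in y]
        neg = [x for x, y in zip(X, Y) if label not in y]
        model[label] = (centroid(pos), centroid(neg))
    return model


def predict(model, x):
    """Return every label whose positive centroid is closer than its negative one."""
    return {
        label
        for label, (pos, neg) in model.items()
        if sq_dist(x, pos) < sq_dist(x, neg)
    }


# Hypothetical relative abundances of two taxa; labels are illustrative.
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.9, 0.9], [0.1, 0.1]]
Y = [{"IBD"}, {"IBD"}, {"T2D"}, {"T2D"}, {"IBD", "T2D"}, set()]

model = fit_binary_relevance(X, Y)
```

A single-label classifier would be forced to pick one status for the sample `[0.85, 0.85]`; here `predict` can return both labels, or none for a healthy-looking profile.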
Affiliation(s)
- Shunyao Wu
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
- Yuzhu Chen
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
- Zhiruo Li
- School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong 266071, China
- Jian Li
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
- Fengyang Zhao
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
- Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
168
Anyaso-Samuel S, Sachdeva A, Guha S, Datta S. Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier. Front Genet 2021; 12:642282. [PMID: 33959149 PMCID: PMC8093763 DOI: 10.3389/fgene.2021.642282] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/15/2020] [Accepted: 03/18/2021] [Indexed: 11/13/2022] Open
Abstract
Microbiome samples harvested from urban environments can be informative in predicting the geographic location of unknown samples. The idea that different cities may have geographically disparate microbial signatures can be utilized to predict the geographical location based on city-specific microbiome samples. We first implemented this idea by using standard bioinformatics procedures to pre-process the raw metagenomics samples provided by the CAMDA organizers. We trained several component classifiers and a robust ensemble classifier with data generated from taxonomy-dependent and taxonomy-free approaches. Also, we implemented class weighting and an optimal oversampling technique to overcome the class imbalance in the primary data. In each instance, we observed that the component classifiers performed differently, whereas the ensemble classifier consistently yielded optimal performance. Finally, we predicted the source cities of mystery samples provided by the organizers. Our results highlight the unreliability of relying on a single classification algorithm to assign metagenomic samples to their source origins. By combining several component classifiers via the ensemble approach, we obtained classification results that were as good as the best-performing component classifier.
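Two ideas from this abstract, class weighting for imbalanced data and combining component classifiers, can be sketched in a few lines. This is a simplified illustration and not the authors' adaptive ensemble: plain majority voting is swapped in for their combination scheme, inverse-frequency weights are one common weighting choice, and the city names and predictions are hypothetical.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * m) for cls, m in counts.items()}


def majority_vote(predictions):
    """Combine per-classifier prediction lists into one label per sample.

    `predictions` holds one list per component classifier, all of equal
    length. Ties go to the label seen first among the votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical training labels: one city is heavily under-represented.
weights = class_weights(["NYC", "NYC", "NYC", "Tokyo"])

# Hypothetical predictions from three component classifiers on two samples.
ensemble = majority_vote([
    ["NYC", "Tokyo"],
    ["NYC", "Lima"],
    ["Lima", "Tokyo"],
])
```

The rare class receives the larger weight, so errors on it cost more during training; the vote then smooths over the disagreements between individual classifiers.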
Affiliation(s)
- Samuel Anyaso-Samuel
- Department of Biostatistics, University of Florida, Gainesville, FL, United States
- Archie Sachdeva
- Department of Biostatistics, University of Florida, Gainesville, FL, United States
- Subharup Guha
- Department of Biostatistics, University of Florida, Gainesville, FL, United States
- Somnath Datta
- Department of Biostatistics, University of Florida, Gainesville, FL, United States
169
Young C, Wood HM, Fuentes Balaguer A, Bottomley D, Gallop N, Wilkinson L, Benton SC, Brealey M, John C, Burtonwood C, Thompson KN, Yan Y, Barrett JH, Morris EJA, Huttenhower C, Quirke P. Microbiome Analysis of More Than 2,000 NHS Bowel Cancer Screening Programme Samples Shows the Potential to Improve Screening Accuracy. Clin Cancer Res 2021; 27:2246-2254. [PMID: 33658300 PMCID: PMC7610626 DOI: 10.1158/1078-0432.ccr-20-3807] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/04/2020] [Revised: 12/05/2020] [Accepted: 02/12/2021] [Indexed: 02/03/2023]
Abstract
PURPOSE: There is potential for fecal microbiome profiling to improve colorectal cancer screening. This has been demonstrated by research studies, but it has not been quantified at scale using samples collected and processed routinely by a national screening program. EXPERIMENTAL DESIGN: Between 2016 and 2019, the largest of the NHS Bowel Cancer Screening Programme hubs prospectively collected processed guaiac fecal occult blood test (gFOBT) samples with subsequent colonoscopy outcomes: blood-negative [n = 491 (22%)]; colorectal cancer [n = 430 (19%)]; adenoma [n = 665 (30%)]; colonoscopy-normal [n = 300 (13%)]; nonneoplastic [n = 366 (16%)]. Samples were transported and stored at room temperature. DNA underwent 16S rRNA gene V4 amplicon sequencing. Taxonomic profiling was performed to provide features for classification via random forests (RF). RESULTS: Samples provided 16S amplicon-based microbial profiles, which confirmed previously described colorectal cancer-microbiome associations. Microbiome-based RF models showed potential as a first-tier screen, distinguishing colorectal cancer or neoplasm (colorectal cancer or adenoma) from blood-negative with AUC 0.86 (0.82-0.89) and AUC 0.78 (0.74-0.82), respectively. Microbiome-based models also showed potential as a second-tier screen, distinguishing, among gFOBT blood-positive samples, colorectal cancer or neoplasm from colonoscopy-normal with AUC 0.79 (0.74-0.83) and AUC 0.73 (0.68-0.77), respectively. Models remained robust when restricted to 15 taxa, and performed similarly during external validation with metagenomic datasets. CONCLUSIONS: Microbiome features can be assessed using gFOBT samples collected and processed routinely by a national colorectal cancer screening program to improve accuracy as a first- or second-tier screen. The models required as few as 15 taxa, raising the potential of an inexpensive qPCR test. This could reduce the number of colonoscopies in countries that use fecal occult blood test screening.
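The AUC values quoted in this abstract summarize how well a model's scores rank cases above controls. As a minimal sketch of what the metric measures (not the study's evaluation code; the labels and scores below are invented), AUC can be computed as the fraction of case-control pairs in which the case receives the higher score, counting ties as half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for cases, 0 for controls; scores: higher means more
    case-like. Quadratic in sample size, which is fine for a small
    illustration.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for two cases and two controls.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported 0.73 to 0.86.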
Affiliation(s)
- Caroline Young
- Pathology & Data Analytics, Leeds Institute of Medical Research at St James's University Hospital, University of Leeds, Leeds, United Kingdom
- Henry M Wood
- Pathology & Data Analytics, Leeds Institute of Medical Research at St James's University Hospital, University of Leeds, Leeds, United Kingdom
- Alba Fuentes Balaguer
- Pathology & Data Analytics, Leeds Institute of Medical Research at St James's University Hospital, University of Leeds, Leeds, United Kingdom
- Daniel Bottomley
- Pathology & Data Analytics, Leeds Institute of Medical Research at St James's University Hospital, University of Leeds, Leeds, United Kingdom
- Niall Gallop
- Pathology & Data Analytics, Leeds Institute of Medical Research at St James's University Hospital, University of Leeds, Leeds, United Kingdom
- Lyndsay Wilkinson
- Pathology & Data Analytics, Leeds Institute of Medical Research at St James's University Hospital, University of Leeds, Leeds, United Kingdom
- Sally C Benton
- NHS Bowel Cancer Screening Programme - Southern Hub, Surrey Research Park, Guildford, United Kingdom
- Martin Brealey
- NHS Bowel Cancer Screening Programme - Southern Hub, Surrey Research Park, Guildford, United Kingdom
- Cerin John
- NHS Bowel Cancer Screening Programme - Southern Hub, Surrey Research Park, Guildford, United Kingdom
- Carole Burtonwood
- NHS Bowel Cancer Screening Programme - Southern Hub, Surrey Research Park, Guildford, United Kingdom
- Kelsey N Thompson
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts
- Yan Yan
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts
- Jennifer H Barrett
- Pathology & Data Analytics, Leeds Institute of Medical Research at St James's University Hospital, University of Leeds, Leeds, United Kingdom
- Eva J A Morris
- Pathology & Data Analytics, Leeds Institute of Medical Research at St James's University Hospital, University of Leeds, Leeds, United Kingdom
- Big Data Institute, Nuffield Department of Population Health, Old Road Campus, University of Oxford, Oxford, United Kingdom
- Curtis Huttenhower
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts
- Philip Quirke
- Pathology & Data Analytics, Leeds Institute of Medical Research at St James's University Hospital, University of Leeds, Leeds, United Kingdom
170
Johns MS, Petrelli NJ. Microbiome and colorectal cancer: A review of the past, present, and future. Surg Oncol 2021; 37:101560. [PMID: 33848761 DOI: 10.1016/j.suronc.2021.101560] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/19/2020] [Revised: 11/22/2020] [Accepted: 03/28/2021] [Indexed: 12/27/2022]
Abstract
The gastrointestinal tract is home to diverse and abundant microorganisms, collectively referred to as the microbiome. This ecosystem typically contains trillions of microbial cells that play an important role in regulation of human health. The microbiome has been implicated in host immunity, nutrient absorption, digestion, and metabolism. In recent years, researchers have shown that alteration of the microbiome is associated with disease development, such as obesity, inflammatory bowel disease, and cancer. This review discusses the five decades of research into the human microbiome and the development of colorectal cancer - the historical context including experiments that sparked interest, the explosion of research that has occurred in the last decade, and finally the future of testing and treatment.
Affiliation(s)
- Michael S Johns
- Department of Surgical Oncology, Helen F. Graham Cancer Center, ChristianaCare, Newark, DE, USA
- Nicholas J Petrelli
- Department of Surgical Oncology, Helen F. Graham Cancer Center, ChristianaCare, Newark, DE, USA
171
Zhang W, Chen X, Wong KC. Noninvasive early diagnosis of intestinal diseases based on artificial intelligence in genomics and microbiome. J Gastroenterol Hepatol 2021; 36:823-831. [PMID: 33880763 DOI: 10.1111/jgh.15500] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/10/2021] [Revised: 03/15/2021] [Accepted: 03/17/2021] [Indexed: 12/15/2022]
Abstract
Maturing developments in artificial intelligence (AI) and genomics have propelled advances in the study of intestinal diseases, including intestinal cancer, inflammatory bowel disease (IBD), and irritable bowel syndrome (IBS). Colorectal cancer is the second most deadly and the third most common type of cancer in the world according to GLOBOCAN 2020 data, while the mechanisms behind IBD and IBS are still speculative. The conventional methods to identify colorectal cancer, IBD, and IBS are based on endoscopy or colonoscopy to identify lesions. However, these procedures are invasive, demanding, and time-consuming for early-stage intestinal diseases. To address those problems, new strategies based on blood and/or the human microbiome in the gut, colon, or even feces were developed; those methods took advantage of high-throughput sequencing and machine learning approaches. In this review, we summarize the recent research and methods to diagnose intestinal diseases with machine learning technologies based on cell-free DNA and microbiome data generated by amplicon sequencing or whole-genome sequencing. Those methods are expected to play an important role not only in intestinal disease diagnosis but also in therapy development in the near future.
Affiliation(s)
- Weitong Zhang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
172
Wirbel J, Zych K, Essex M, Karcher N, Kartal E, Salazar G, Bork P, Sunagawa S, Zeller G. Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox. Genome Biol 2021; 22:93. [PMID: 33785070 PMCID: PMC8008609 DOI: 10.1186/s13059-021-02306-1] [Citation(s) in RCA: 96] [Impact Index Per Article: 32.0] [Received: 07/28/2020] [Accepted: 02/24/2021] [Indexed: 02/08/2023] Open
Abstract
The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers using machine learning (ML). However, metagenomics-specific software is scarce, and overoptimistic evaluation and limited cross-study generalization are prevailing issues. To address these, we developed SIAMCAT, a versatile R toolbox for ML-based comparative metagenomics. We demonstrate its capabilities in a meta-analysis of fecal metagenomic studies (10,803 samples). When naively transferred across studies, ML models lost accuracy and disease specificity, which could, however, be resolved by a novel training set augmentation strategy. This reveals some biomarkers to be disease-specific, with others shared across multiple conditions. SIAMCAT is freely available from siamcat.embl.de.
Collapse
Affiliation(s)
- Jakob Wirbel
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Konrad Zych
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Present Address: Clinical Microbiomics A/S, Ole Maaløes Vej 3, 2200 København, Denmark
- Morgan Essex
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Present Address: Experimental and Clinical Research Center (ECRC) of the Max Delbrück Center for Molecular Medicine and Charité University Hospital, 13125 Berlin, Germany
- Nicolai Karcher
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Department CIBIO, University of Trento, 38123 Trento, Italy
- Ece Kartal
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Guillem Salazar
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, 8093 Zürich, Switzerland
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Molecular Medicine Partnership Unit, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
- Shinichi Sunagawa
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, 8093 Zürich, Switzerland
- Georg Zeller
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
173
Rahman MA, Rangwala H. IDMIL: an alignment-free Interpretable Deep Multiple Instance Learning (MIL) for predicting disease from whole-metagenomic data. Bioinformatics 2021; 36:i39-i47. [PMID: 32657370 PMCID: PMC7355246 DOI: 10.1093/bioinformatics/btaa477] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/22/2022] Open
Abstract
Motivation: The human body hosts more microbial organisms than human cells. Analysis of this microbial diversity provides key insight into the role played by these microorganisms in human health. Metagenomics is the collective DNA sequencing of coexisting microbial organisms in an environmental sample or a host. This has several applications in precision medicine, agriculture, environmental science and forensics. State-of-the-art predictive models for phenotype predictions from metagenomic data rely on alignments, assembly, extensive pruning, taxonomic profiling and reference sequence databases. These processes are time consuming and they do not consider novel microbial sequences when aligned with the reference genome, limiting the potential of whole metagenomics. We formulate the problem of predicting human disease from whole-metagenomic data using Multiple Instance Learning (MIL), a popular supervised learning paradigm. Our proposed alignment-free approach provides higher accuracy in prediction by harnessing the capability of a deep convolutional neural network (CNN) within a MIL framework and provides interpretability via a neural attention mechanism. Results: The MIL formulation combined with the hierarchical feature extraction capability of deep-CNN provides significantly better predictive performance compared to popular existing approaches. The attention mechanism allows for the identification of groups of sequences that are likely to be correlated to diseases, providing the much-needed interpretation. Our proposed approach does not rely on alignment, assembly and reference sequence databases, making it fast and scalable for large-scale metagenomic data. We evaluate our method on well-known large-scale metagenomic studies and show that our proposed approach outperforms comparative state-of-the-art methods for disease prediction. Availability and implementation: https://github.com/mrahma23/IDMIL.
Affiliation(s)
- Huzefa Rangwala
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
174
Manandhar I, Alimadadi A, Aryal S, Munroe PB, Joe B, Cheng X. Gut microbiome-based supervised machine learning for clinical diagnosis of inflammatory bowel diseases. Am J Physiol Gastrointest Liver Physiol 2021; 320:G328-G337. [PMID: 33439104 PMCID: PMC8828266 DOI: 10.1152/ajpgi.00360.2020] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Indexed: 02/08/2023]
Abstract
Despite the availability of various diagnostic tests for inflammatory bowel diseases (IBD), misdiagnosis of IBD occurs frequently, and thus, there is a clinical need to further improve the diagnosis of IBD. As gut dysbiosis is reported in patients with IBD, we hypothesized that supervised machine learning (ML) could be used to analyze gut microbiome data for predictive diagnostics of IBD. To test our hypothesis, fecal 16S metagenomic data of 729 subjects with IBD and 700 subjects without IBD from the American Gut Project were analyzed using five different ML algorithms. Fifty differential bacterial taxa were identified [linear discriminant analysis effect size (LEfSe): linear discriminant analysis (LDA) score > 3] between the IBD and non-IBD groups, and ML classifications trained with these taxonomic features using random forest (RF) achieved a testing area under the receiver operating characteristic curve (AUC) of ∼0.80. Next, we tested if operational taxonomic units (OTUs), instead of bacterial taxa, could be used as ML features for diagnostic classification of IBD. The top 500 high-variance OTUs were used for ML training, and an improved testing AUC of ∼0.82 (RF) was achieved. Lastly, we tested if supervised ML could be used for differentiating Crohn's disease (CD) and ulcerative colitis (UC). Using 331 CD and 141 UC samples, 117 differential bacterial taxa (LEfSe: LDA score > 3) were identified, and the RF model trained with differential taxonomic features or high-variance OTU features achieved a testing AUC > 0.90. In summary, our study demonstrates the promising potential of artificial intelligence via supervised ML modeling for predictive diagnostics of IBD using gut microbiome data. NEW & NOTEWORTHY: Our study demonstrates the promising potential of artificial intelligence via supervised machine learning modeling for predictive diagnostics of different types of inflammatory bowel diseases using fecal gut microbiome data.
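The two generic building blocks named in this abstract, selecting high-variance OTUs as features and scoring a classifier by AUC, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abundance table is a plain list of rows (samples) by columns (OTUs), the helper names `top_variance_features` and `auc` are hypothetical, the toy values are invented, and the random forest step is omitted, with a single OTU abundance standing in for a classifier score.

```python
from statistics import pvariance


def top_variance_features(table, k):
    """Return the (sorted) column indices of the k highest-variance columns."""
    n_features = len(table[0])
    variances = [pvariance([row[j] for row in table]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): 4 samples x 3 OTUs of relative abundances.
table = [
    [0.10, 0.50, 0.40],
    [0.10, 0.20, 0.70],
    [0.10, 0.60, 0.30],
    [0.10, 0.10, 0.80],
]
labels = [1, 0, 1, 0]  # 1 = case, 0 = control

selected = top_variance_features(table, k=2)  # drops the constant OTU 0
scores = [row[1] for row in table]            # OTU 1 abundance as a stand-in score
print(selected, auc(scores, labels))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a toy table; a rank-based formulation would be preferable at the scale reported in the abstract.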
Affiliation(s)
- Ishan Manandhar
- Bioinformatics & Artificial Intelligence Laboratory, Center for Hypertension and Precision Medicine, Program in Physiological Genomics, Department of Physiology and Pharmacology, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio
- Ahmad Alimadadi
- Bioinformatics & Artificial Intelligence Laboratory, Center for Hypertension and Precision Medicine, Program in Physiological Genomics, Department of Physiology and Pharmacology, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio
- Sachin Aryal
- Bioinformatics & Artificial Intelligence Laboratory, Center for Hypertension and Precision Medicine, Program in Physiological Genomics, Department of Physiology and Pharmacology, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio
- Patricia B. Munroe
- Clinical Pharmacology, William Harvey Research Institute & National Institute of Health Research Barts Cardiovascular Biomedical Research Centre, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- Bina Joe
- Bioinformatics & Artificial Intelligence Laboratory, Center for Hypertension and Precision Medicine, Program in Physiological Genomics, Department of Physiology and Pharmacology, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio
- Xi Cheng
- Bioinformatics & Artificial Intelligence Laboratory, Center for Hypertension and Precision Medicine, Program in Physiological Genomics, Department of Physiology and Pharmacology, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio
175
Shanahan ER, McMaster JJ, Staudacher HM. Conducting research on diet-microbiome interactions: A review of current challenges, essential methodological principles, and recommendations for best practice in study design. J Hum Nutr Diet 2021; 34:631-644. [PMID: 33639033 DOI: 10.1111/jhn.12868] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/24/2020] [Revised: 01/07/2021] [Accepted: 01/19/2021] [Indexed: 12/21/2022]
Abstract
Diet is one of the strongest modulators of the gut microbiome. However, the complexity of the interactions between diet and the microbial community emphasises the need for a robust study design and continued methodological development. This review aims to summarise considerations for conducting high-quality diet-microbiome research, outline key challenges unique to the field, and provide advice for addressing these in a practical manner useful to dietitians, microbiologists, gastroenterologists and other diet-microbiome researchers. Searches of databases and references from relevant articles were conducted using the primary search terms 'diet', 'diet intervention', 'dietary analysis', 'microbiome' and 'microbiota', alone or in combination. Publications were considered relevant if they addressed methods for diet and/or microbiome research, or were a human study relevant to diet-microbiome interactions. Best-practice design in diet-microbiome research requires appropriate consideration of the study population and careful choice of trial design and data collection methodology. Ongoing challenges include the collection of dietary data that accurately reflects intake at a timescale relevant to microbial community structure and metabolism, measurement of nutrients in foods pertinent to microbes, improving ability to measure and understand microbial metabolic and functional properties, adequately powering studies, and the considered analysis of multivariate compositional datasets. Collaboration across the disciplines of nutrition science and microbiology is crucial for high-quality diet-microbiome research. Improvements in our understanding of the interaction between nutrient intake and microbial metabolism, as well as continued methodological innovation, will facilitate development of effective evidence-based personalised dietary treatments.
Affiliation(s)
- Erin R Shanahan
- School of Life and Environmental Sciences, Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia
- Heidi M Staudacher
- IMPACT (The Institute for Mental and Physical Health and Clinical Translation) Food & Mood Centre, Deakin University, Geelong, VIC, Australia
176
Carrieri AP, Haiminen N, Maudsley-Barton S, Gardiner LJ, Murphy B, Mayes AE, Paterson S, Grimshaw S, Winn M, Shand C, Hadjidoukas P, Rowe WPM, Hawkins S, MacGuire-Flanagan A, Tazzioli J, Kenny JG, Parida L, Hoptroff M, Pyzer-Knapp EO. Explainable AI reveals changes in skin microbiome composition linked to phenotypic differences. Sci Rep 2021; 11:4565. [PMID: 33633172 PMCID: PMC7907326 DOI: 10.1038/s41598-021-83922-6] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 10/14/2020] [Accepted: 02/08/2021] [Indexed: 02/06/2023] Open
Abstract
Alterations in the human microbiome have been observed in a variety of conditions such as asthma, gingivitis, dermatitis and cancer, and much remains to be learned about the links between the microbiome and human health. The fusion of artificial intelligence with rich microbiome datasets can offer an improved understanding of the microbiome’s role in human health. To gain actionable insights it is essential to consider both the predictive power and the transparency of the models by providing explanations for the predictions. We combine the collection of leg skin microbiome samples from two healthy cohorts of women with the application of an explainable artificial intelligence (EAI) approach that provides accurate predictions of phenotypes with explanations. The explanations are expressed in terms of variations in the relative abundance of key microbes that drive the predictions. We predict skin hydration, subject's age, pre/post-menopausal status and smoking status from the leg skin microbiome. The changes in microbial composition linked to skin hydration can accelerate the development of personalized treatments for healthy skin, while those associated with age may offer insights into the skin aging process. The leg microbiome signatures associated with smoking and menopausal status are consistent with previous findings from oral/respiratory tract microbiomes and vaginal/gut microbiomes respectively. This suggests that easily accessible microbiome samples could be used to investigate health-related phenotypes, offering potential for non-invasive diagnosis and condition monitoring. Our EAI approach sets the stage for new work focused on understanding the complex relationships between microbial communities and phenotypes. Our approach can be applied to predict any condition from microbiome samples and has the potential to accelerate the development of microbiome-based personalized therapeutics and non-invasive diagnostics.
Affiliation(s)
- Anna Paola Carrieri
- The Hartree Centre, Sci-Tech Daresbury, IBM Research, Daresbury, WA4 4AD, UK
- Niina Haiminen
- T.J. Watson Research Center, IBM Research, Yorktown Heights, NY, 10598, USA
- Sean Maudsley-Barton
- The Hartree Centre, Sci-Tech Daresbury, IBM Research, Daresbury, WA4 4AD, UK
- Department of Computing and Mathematics, Manchester Metropolitan University (MUU), Manchester, M15 6BH, UK
- Barry Murphy
- Unilever Research & Development, Port Sunlight, CH63 3JW, UK
- Andrew E Mayes
- Unilever Research and Development, Sharnbrook, MK44 1LQ, UK
- Sarah Paterson
- Unilever Research & Development, Port Sunlight, CH63 3JW, UK
- Sally Grimshaw
- Unilever Research & Development, Port Sunlight, CH63 3JW, UK
- Martyn Winn
- Scientific Computing Department, STFC Daresbury Lab, Daresbury, WA4 4AD, UK
- Cameron Shand
- The Hartree Centre, Sci-Tech Daresbury, IBM Research, Daresbury, WA4 4AD, UK
- Department of Computer Science, University of Manchester (UoM), Manchester, M13 9LP, UK
- Stacy Hawkins
- Unilever Research & Development, Trumbull, CT, 06611, USA
- Jane Tazzioli
- Unilever Research & Development, Trumbull, CT, 06611, USA
- John G Kenny
- Institute of Integrative Biology, The University of Liverpool, The Bioscience Building, Liverpool, L69 7ZB, UK
- Laxmi Parida
- T.J. Watson Research Center, IBM Research, Yorktown Heights, NY, 10598, USA
177
Moreno-Indias I, Lahti L, Nedyalkova M, Elbere I, Roshchupkin G, Adilovic M, Aydemir O, Bakir-Gungor B, Santa Pau ECD, D’Elia D, Desai MS, Falquet L, Gundogdu A, Hron K, Klammsteiner T, Lopes MB, Marcos-Zambrano LJ, Marques C, Mason M, May P, Pašić L, Pio G, Pongor S, Promponas VJ, Przymus P, Saez-Rodriguez J, Sampri A, Shigdel R, Stres B, Suharoschi R, Truu J, Truică CO, Vilne B, Vlachakis D, Yilmaz E, Zeller G, Zomer AL, Gómez-Cabrero D, Claesson MJ. Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions. Front Microbiol 2021; 12:635781. [PMID: 33692771 PMCID: PMC7937616 DOI: 10.3389/fmicb.2021.635781] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 11/30/2020] [Accepted: 01/28/2021] [Indexed: 12/23/2022] Open
Abstract
The human microbiome has emerged as a central research topic in human biology and biomedicine. Current microbiome studies generate high-throughput omics data across different body sites, populations, and life stages. While many of the challenges in microbiome research are similar to those of other high-throughput studies, the quantitative analyses need to address the heterogeneity of data, specific statistical properties, and the remarkable variation in microbiome composition across individuals and body sites. This has led to a broad spectrum of statistical and machine learning challenges that range from study design, data processing, and standardization to analysis, modeling, cross-study comparison, prediction, data science ecosystems, and reproducible reporting. Nevertheless, although many statistics and machine learning approaches and tools have been developed, new techniques are needed to deal with emerging applications and the vast heterogeneity of microbiome data. We review and discuss emerging applications of statistical and machine learning techniques in human microbiome studies and introduce the COST Action CA18131 "ML4Microbiome" that brings together microbiome researchers and machine learning experts to address current challenges such as standardization of analysis pipelines for reproducibility of data analysis results, benchmarking, improvement, or development of existing and new tools and ontologies.
Affiliation(s)
- Isabel Moreno-Indias
- Instituto de Investigación Biomédica de Málaga (IBIMA), Unidad de Gestión Clínica de Endocrinología y Nutrición, Hospital Clínico Universitario Virgen de la Victoria, Universidad de Málaga, Málaga, Spain
- Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y la Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Leo Lahti
- Department of Computing, University of Turku, Turku, Finland
- Miroslava Nedyalkova
- Human Genetics and Disease Mechanisms, Latvian Biomedical Research and Study Centre, Riga, Latvia
- Ilze Elbere
- Latvian Biomedical Research and Study Centre, Riga, Latvia
- Muhamed Adilovic
- Department of Genetics and Bioengineering, International University of Sarajevo, Sarajevo, Bosnia and Herzegovina
- Onder Aydemir
- Department of Electrical and Electronics Engineering, Karadeniz Technical University, Trabzon, Turkey
- Burcu Bakir-Gungor
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Domenica D’Elia
- Department for Biomedical Sciences, Institute for Biomedical Technologies, National Research Council, Bari, Italy
- Mahesh S. Desai
- Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg
- Odense Research Center for Anaphylaxis, Department of Dermatology and Allergy Center, Odense University Hospital, University of Southern Denmark, Odense, Denmark
- Laurent Falquet
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Aycan Gundogdu
- Department of Microbiology and Clinical Microbiology, Faculty of Medicine, Erciyes University, Kayseri, Turkey
- Metagenomics Laboratory, Genome and Stem Cell Center (GenKök), Erciyes University, Kayseri, Turkey
- Karel Hron
- Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czechia
- Marta B. Lopes
- NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, Caparica, Portugal
- Centro de Matemática e Aplicações (CMA), FCT, UNL, Caparica, Portugal
- Laura Judith Marcos-Zambrano
- Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
- Cláudia Marques
- CINTESIS, NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal
- Michael Mason
- Computational Oncology, Sage Bionetworks, Seattle, WA, United States
- Patrick May
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Lejla Pašić
- Sarajevo Medical School, University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
- Gianvito Pio
- Department of Computer Science, University of Bari Aldo Moro, Bari, Italy
- Sándor Pongor
- Faculty of Information Technology and Bionics, Pázmány University, Budapest, Hungary
- Vasilis J. Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Piotr Przymus
- Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
- Julio Saez-Rodriguez
- Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Heidelberg, Germany
- Alexia Sampri
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Rajesh Shigdel
- Department of Clinical Science, University of Bergen, Bergen, Norway
- Blaz Stres
- Jozef Stefan Institute, Ljubljana, Slovenia
- Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Faculty of Civil and Geodetic Engineering, University of Ljubljana, Ljubljana, Slovenia
- Ramona Suharoschi
- Molecular Nutrition and Proteomics Lab, Faculty of the Food Science and Technology, Institute of Life Sciences, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Cluj-Napoca, Romania
- Jaak Truu
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Ciprian-Octavian Truică
- Department of Computer Science and Engineering, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Bucharest, Romania
- Baiba Vilne
- Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia
- Dimitrios Vlachakis
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- Ercument Yilmaz
- Department of Computer Technologies, Karadeniz Technical University, Trabzon, Turkey
- Georg Zeller
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
- Aldert L. Zomer
- Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, Netherlands
- David Gómez-Cabrero
- Navarrabiomed, Complejo Hospitalario de Navarra (CHN), IdiSNA, Universidad Pública de Navarra (UPNA), Pamplona, Spain
- Marcus J. Claesson
- School of Microbiology and APC Microbiome Ireland, University College Cork, Cork, Ireland
178
Marcos-Zambrano LJ, Karaduzovic-Hadziabdic K, Loncar Turukalo T, Przymus P, Trajkovik V, Aasmets O, Berland M, Gruca A, Hasic J, Hron K, Klammsteiner T, Kolev M, Lahti L, Lopes MB, Moreno V, Naskinova I, Org E, Paciência I, Papoutsoglou G, Shigdel R, Stres B, Vilne B, Yousef M, Zdravevski E, Tsamardinos I, Carrillo de Santa Pau E, Claesson MJ, Moreno-Indias I, Truu J. Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment. Front Microbiol 2021; 12:634511. [PMID: 33737920 PMCID: PMC7962872 DOI: 10.3389/fmicb.2021.634511] [Citation(s) in RCA: 113] [Impact Index Per Article: 37.7] [Received: 11/27/2020] [Accepted: 02/01/2021] [Indexed: 12/19/2022] Open
Abstract
The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all information from these biological datasets, taking into account the peculiarities of microbiome data, i.e., compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host-phenotypes based on taxonomy-informed feature selection to establish an association between microbiome and predict disease states is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, infer host phenotypes to predict diseases and use microbial communities to stratify patients by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here is more related to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. 
The manual identification of data sources has been complemented with: (1) automated publication search through digital libraries of the three major publishers using natural language processing (NLP) Toolkit, and (2) an automated identification of relevant software repositories on GitHub and ranking of the related research papers relying on learning to rank approach.
Affiliation(s)
- Laura Judith Marcos-Zambrano
- Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
- Piotr Przymus
- Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
- Vladimir Trajkovik
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
- Oliver Aasmets
- Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
- Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Magali Berland
- Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
- Aleksandra Gruca
- Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
- Jasminka Hasic
- University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
- Karel Hron
- Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czechia
- Mikhail Kolev
- South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
- Leo Lahti
- Department of Computing, University of Turku, Turku, Finland
- Marta B. Lopes
- NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, Caparica, Portugal
- Centro de Matemática e Aplicações (CMA), FCT, UNL, Caparica, Portugal
- Victor Moreno
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), Barcelona, Spain
- Colorectal Cancer Group, Institut de Recerca Biomedica de Bellvitge (IDIBELL), Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Barcelona, Spain
- Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
- Irina Naskinova
- South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
- Elin Org
- Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
- Inês Paciência
- EPIUnit – Instituto de Saúde Pública da Universidade do Porto, Porto, Portugal
- Rajesh Shigdel
- Department of Clinical Science, University of Bergen, Bergen, Norway
- Blaz Stres
- Group for Microbiology and Microbial Biotechnology, Department of Animal Science, University of Ljubljana, Ljubljana, Slovenia
- Baiba Vilne
- Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia
- Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, Israel
- Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
- Eftim Zdravevski
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
- Marcus J. Claesson
- School of Microbiology & APC Microbiome Ireland, University College Cork, Cork, Ireland
- Isabel Moreno-Indias
- Unidad de Gestión Clínica de Endocrinología y Nutrición, Instituto de Investigación Biomédica de Málaga (IBIMA), Hospital Clínico Universitario Virgen de la Victoria, Universidad de Málaga, Málaga, Spain
- Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y la Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Jaak Truu
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
179
Shi K, Zhang L, Yu J, Chen Z, Lai S, Zhao X, Li WG, Luo Q, Lin W, Feng J, Bork P, Zhao XM, Li F. A 12-genus bacterial signature identifies a group of severe autistic children with differential sensory behavior and brain structures. Clin Transl Med 2021; 11:e314. [PMID: 33634969 PMCID: PMC7893807 DOI: 10.1002/ctm2.314] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/09/2020] [Revised: 01/20/2021] [Accepted: 01/21/2021] [Indexed: 01/01/2023] Open
Affiliation(s)
- Kai Shi
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China
- School of Mathematical Sciences, SCMS, and SCAM, Fudan University, Shanghai, China
- College of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Lingli Zhang
- Department of Developmental and Behavioural Pediatric & Child Primary Care, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Brain and Behavioural Research Unit of Shanghai Institute for Pediatric Research and MOE Shanghai Key Laboratory for Children's Environmental Health, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Juehua Yu
- Department of Developmental and Behavioural Pediatric & Child Primary Care, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- NHC Key Laboratory of Drug Addiction Medicine (Kunming Medical University), First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
- Zilin Chen
- Department of Developmental and Behavioural Pediatric & Child Primary Care, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Brain and Behavioural Research Unit of Shanghai Institute for Pediatric Research and MOE Shanghai Key Laboratory for Children's Environmental Health, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Senying Lai
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China
- Xingzhong Zhao
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China
- Wei-Guang Li
- Collaborative Innovation Center for Brain Science, Department of Anatomy and Physiology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Qiang Luo
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China
- Wei Lin
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China
- School of Mathematical Sciences, SCMS, and SCAM, Fudan University, Shanghai, China
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
- Jianfeng Feng
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China
- Peer Bork
- European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg, Germany
- Xing-Ming Zhao
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-inspired Intelligence, and Frontiers Center for Brain Science, Shanghai, China
- Fei Li
- Department of Developmental and Behavioural Pediatric & Child Primary Care, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Brain and Behavioural Research Unit of Shanghai Institute for Pediatric Research and MOE Shanghai Key Laboratory for Children's Environmental Health, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
|
180
|
Sharma D, Paterson AD, Xu W. TaxoNN: ensemble of neural networks on stratified microbiome data for disease prediction. Bioinformatics 2021; 36:4544-4550. [PMID: 32449747 PMCID: PMC7750934 DOI: 10.1093/bioinformatics/btaa542] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 03/10/2020] [Revised: 05/08/2020] [Accepted: 05/19/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Research supports the potential use of the microbiome as a predictor of some diseases. Motivated by the findings that microbiome data are complex in nature and that there is an inherent correlation due to the hierarchical taxonomy of microbial Operational Taxonomic Units (OTUs), we propose a novel machine learning method incorporating a stratified approach to group OTUs into phylum clusters. Convolutional Neural Networks (CNNs) were trained within each of the clusters individually. Further, through an ensemble learning approach, features obtained from each cluster were then concatenated to improve prediction accuracy. Our two-step approach, comprising stratification prior to combining multiple CNNs, aided in efficiently capturing the relationships between OTUs sharing a phylum, as compared to using a single CNN ignoring OTU correlations. RESULTS: We used simulated datasets containing 168 OTUs in 200 cases and 200 controls for model testing. Thirty-two OTUs potentially associated with risk of disease were randomly selected, and interactions between three OTUs were used to introduce non-linearity. We also implemented this novel method in two human microbiome studies, (i) cirrhosis with 118 cases and 114 controls and (ii) type 2 diabetes (T2D) with 170 cases and 174 controls, to demonstrate the model's effectiveness. Extensive experimentation and comparison against conventional machine learning techniques yielded encouraging results. We obtained mean AUC values of 0.88, 0.92, and 0.75, showing a consistent increment (5%, 3%, 7%) in simulations, cirrhosis, and T2D data, respectively, against the next best performing method, Random Forest. AVAILABILITY AND IMPLEMENTATION: https://github.com/divya031090/TaxoNN_OTU. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
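The two-step idea in this abstract (stratify OTUs by phylum, then concatenate per-cluster features) can be illustrated with a minimal Python sketch. It is not the TaxoNN implementation: the function names, the toy OTU-to-phylum mapping, and the abundance values are all hypothetical, and a simple sum/max summary stands in for the per-cluster CNN.

```python
from collections import defaultdict


def stratify_by_phylum(otu_to_phylum):
    """Group OTU identifiers into clusters keyed by phylum (sorted for determinism)."""
    clusters = defaultdict(list)
    for otu, phylum in otu_to_phylum.items():
        clusters[phylum].append(otu)
    return {phylum: sorted(otus) for phylum, otus in sorted(clusters.items())}


def cluster_features(sample, otus):
    """Hypothetical stand-in for a per-cluster model: total and max abundance."""
    values = [sample.get(otu, 0.0) for otu in otus]
    return [sum(values), max(values)]


def ensemble_features(sample, clusters):
    """Concatenate the per-cluster feature vectors into one feature vector."""
    features = []
    for otus in clusters.values():
        features.extend(cluster_features(sample, otus))
    return features


# Toy data, invented for illustration only.
otu_to_phylum = {"OTU1": "Firmicutes", "OTU2": "Bacteroidetes", "OTU3": "Firmicutes"}
sample = {"OTU1": 0.5, "OTU2": 0.25, "OTU3": 0.125}

clusters = stratify_by_phylum(otu_to_phylum)
features = ensemble_features(sample, clusters)
print(clusters)
print(features)
```

In a fuller pipeline, the concatenated vector would feed a downstream classifier; the point of the sketch is only that each phylum cluster is processed separately before the features are combined.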
Affiliation(s)
- Divya Sharma
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada M5T 3M7
| | - Andrew D Paterson
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada M5T 3M7; Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada, M5G 1X8
| | - Wei Xu
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada M5T 3M7; Department of Biostatistics, Princess Margaret Cancer Center, University Health Network, Toronto, ON, Canada, M5G 2C1
|
181
|
Aasmets O, Lüll K, Lang JM, Pan C, Kuusisto J, Fischer K, Laakso M, Lusis AJ, Org E. Machine Learning Reveals Time-Varying Microbial Predictors with Complex Effects on Glucose Regulation. mSystems 2021; 6:e01191-20. [PMID: 33594006 PMCID: PMC8573957 DOI: 10.1128/msystems.01191-20] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/13/2020] [Accepted: 01/22/2021] [Indexed: 12/11/2022] Open
Abstract
The incidence of type 2 diabetes (T2D) has been increasing globally, and a growing body of evidence links type 2 diabetes with altered microbiota composition. Type 2 diabetes is preceded by a long prediabetic state characterized by changes in various metabolic parameters. We tested whether the gut microbiome could have predictive potential for T2D development during the healthy and prediabetic disease stages. We used prospective data of 608 well-phenotyped Finnish men collected from the population-based Metabolic Syndrome in Men (METSIM) study to build machine learning models for predicting continuous glucose and insulin measures in a shorter (1.5 year) and longer (4 year) period. Our results show that the inclusion of the gut microbiome improves prediction accuracy for modeling T2D-associated parameters such as glycosylated hemoglobin and insulin measures. We identified novel microbial biomarkers and described their effects on the predictions using interpretable machine learning techniques, which revealed complex linear and nonlinear associations. Additionally, the modeling strategy carried out allowed us to compare the stability of model performance and biomarker selection, also revealing differences in short-term and long-term predictions. The identified microbiome biomarkers provide a predictive measure for various metabolic traits related to T2D, thus providing an additional parameter for personal risk assessment. Our work also highlights the need for robust modeling strategies and the value of interpretable machine learning.
IMPORTANCE: Recent studies have shown a clear link between gut microbiota and type 2 diabetes. However, current results are based on cross-sectional studies that aim to determine the microbial dysbiosis when the disease is already prevalent. In order to consider the microbiome as a factor in disease risk assessment, prospective studies are needed. Our study is the first study that assesses the gut microbiome as a predictive measure for several type 2 diabetes-associated parameters in a longitudinal study setting. Our results revealed a number of novel microbial biomarkers that can improve the prediction accuracy for continuous insulin measures and glycosylated hemoglobin levels. These results make the prospect of using the microbiome in personalized medicine promising.
Affiliation(s)
- Oliver Aasmets
- Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
- Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| | - Kreete Lüll
- Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
- Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| | - Jennifer M Lang
- Department of Medicine, University of California, Los Angeles, California, USA
| | - Calvin Pan
- Department of Medicine, University of California, Los Angeles, California, USA
| | - Johanna Kuusisto
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, and Kuopio University Hospital, Kuopio, Finland
| | - Krista Fischer
- Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia
| | - Markku Laakso
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, and Kuopio University Hospital, Kuopio, Finland
| | - Aldons J Lusis
- Department of Medicine, University of California, Los Angeles, California, USA
- Department of Human Genetics, University of California, Los Angeles, California, USA
- Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, California, USA
| | - Elin Org
- Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
|
182
|
PM2RA: A Framework for Detecting and Quantifying Relationship Alterations in Microbial Community. Genomics Proteomics Bioinformatics 2021; 19:154-167. [PMID: 33581337 PMCID: PMC8498968 DOI: 10.1016/j.gpb.2020.07.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2020] [Revised: 06/28/2020] [Accepted: 08/09/2020] [Indexed: 11/21/2022]
Abstract
The dysbiosis of gut microbiota is associated with the pathogenesis of human diseases. However, observing shifts in the microbe abundance cannot fully reveal underlying perturbations. Examining the relationship alterations (RAs) in the microbiome between health and disease statuses provides additional hints about the pathogenesis of human diseases, but no methods were designed to detect and quantify the RAs between different conditions directly. Here, we present profile monitoring for microbial relationship alteration (PM2RA), an analysis framework to identify and quantify the microbial RAs. The performance of PM2RA was evaluated with synthetic data, and it showed higher specificity and sensitivity than the co-occurrence-based methods. Analyses of real microbial datasets showed that PM2RA was robust for quantifying microbial RAs across different datasets in several diseases. By applying PM2RA, we identified several novel or previously reported microbes implicated in multiple diseases. PM2RA is now implemented as a web-based application available at http://www.pm2ra-xingyinliulab.cn/.
|
183
|
Microbial source tracking using metagenomics and other new technologies. J Microbiol 2021; 59:259-269. [DOI: 10.1007/s12275-021-0668-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/18/2020] [Revised: 01/08/2021] [Accepted: 01/08/2021] [Indexed: 12/12/2022]
|
184
|
Microbiome connections with host metabolism and habitual diet from 1,098 deeply phenotyped individuals. Nat Med 2021; 27:321-332. [PMID: 33432175 PMCID: PMC8353542 DOI: 10.1038/s41591-020-01183-8] [Citation(s) in RCA: 416] [Impact Index Per Article: 138.7] [Received: 03/10/2020] [Accepted: 11/16/2020] [Indexed: 02/07/2023]
Abstract
The gut microbiome is shaped by diet and influences host metabolism; however, these links are complex and can be unique to each individual. We performed deep metagenomic sequencing of 1,203 gut microbiomes from 1,098 individuals enrolled in the Personalised Responses to Dietary Composition Trial (PREDICT 1) study, whose detailed long-term diet information, as well as hundreds of fasting and same-meal postprandial cardiometabolic blood marker measurements were available. We found many significant associations between microbes and specific nutrients, foods, food groups and general dietary indices, which were driven especially by the presence and diversity of healthy and plant-based foods. Microbial biomarkers of obesity were reproducible across external publicly available cohorts and in agreement with circulating blood metabolites that are indicators of cardiovascular disease risk. While some microbes, such as Prevotella copri and Blastocystis spp., were indicators of favorable postprandial glucose metabolism, overall microbiome composition was predictive for a large panel of cardiometabolic blood markers including fasting and postprandial glycemic, lipemic and inflammatory indices. The panel of intestinal species associated with healthy dietary habits overlapped with those associated with favorable cardiometabolic and postprandial markers, indicating that our large-scale resource can potentially stratify the gut microbiome into generalizable health levels in individuals without clinically manifest disease.
|
185
|
Ghannam RB, Techtmann SM. Machine learning applications in microbial ecology, human microbiome studies, and environmental monitoring. Comput Struct Biotechnol J 2021; 19:1092-1107. [PMID: 33680353 PMCID: PMC7892807 DOI: 10.1016/j.csbj.2021.01.028] [Citation(s) in RCA: 76] [Impact Index Per Article: 25.3] [Received: 10/02/2020] [Revised: 01/16/2021] [Accepted: 01/18/2021] [Indexed: 01/04/2023] Open
Abstract
Advances in nucleic acid sequencing technology have enabled expansion of our ability to profile microbial diversity. These large datasets of taxonomic and functional diversity are key to better understanding microbial ecology. Machine learning has proven to be a useful approach for analyzing microbial community data and making predictions about outcomes including human and environmental health. Machine learning applied to microbial community profiles has been used to predict disease states in human health, environmental quality and presence of contamination in the environment, and as trace evidence in forensics. Machine learning has appeal as a powerful tool that can provide deep insights into microbial communities and identify patterns in microbial community data. However, often machine learning models can be used as black boxes to predict a specific outcome, with little understanding of how the models arrived at predictions. Complex machine learning algorithms often may value higher accuracy and performance at the sacrifice of interpretability. In order to leverage machine learning into more translational research related to the microbiome and strengthen our ability to extract meaningful biological information, it is important for models to be interpretable. Here we review current trends in machine learning applications in microbial ecology as well as some of the important challenges and opportunities for more broad application of machine learning to understanding microbial communities.
Key Words
- 16S rRNA
- ANN, Artificial Neural Networks
- ASV, Amplicon Sequence Variant
- AUC, Area Under the Curve
- Forensics
- GB, Gradient Boosting
- ML, Machine Learning
- Machine learning
- Marker genes
- Metagenomics
- PCoA, Principal Coordinate Analysis
- RF, Random Forests
- ROC, Receiver Operating Characteristic
- SML, Supervised Machine Learning
- SVM, Support Vector Machines
- USML, Unsupervised Machine Learning
- tSNE, t-distributed Stochastic Neighbor Embedding
Affiliation(s)
- Ryan B. Ghannam
- Department of Biological Sciences, Michigan Technological University, Houghton MI, United States
| | - Stephen M. Techtmann
- Department of Biological Sciences, Michigan Technological University, Houghton MI, United States
|
186
|
Dhungel E, Mreyoud Y, Gwak HJ, Rajeh A, Rho M, Ahn TH. MegaR: an interactive R package for rapid sample classification and phenotype prediction using metagenome profiles and machine learning. BMC Bioinformatics 2021; 22:25. [PMID: 33461494 PMCID: PMC7814621 DOI: 10.1186/s12859-020-03933-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/28/2020] [Accepted: 12/11/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Diverse microbiome communities drive biogeochemical processes and the evolution of animals in their ecosystems. Many microbiome projects have demonstrated the power of using metagenomics to understand the structures and factors influencing the function of the microbiomes in their environments. In order to characterize the effects of microbiome composition on human health, diseases, and even ecosystems, one must first understand the relationship between microbes and their environment in different samples. Running machine learning models on metagenomic sequencing data is encouraged for this purpose, but it is not easy to build an appropriate machine learning model for every diverse metagenomic dataset. RESULTS: We introduce MegaR, an R Shiny package and web application, to build an unbiased machine learning model effortlessly with interactive visual analysis. MegaR employs taxonomic profiles from either whole metagenome sequencing or 16S rRNA sequencing data to develop machine learning models and classify the samples into two or more categories. It provides various options for model fine-tuning throughout the analysis pipeline, such as data processing, multiple machine learning techniques, model validation, and unknown sample prediction, which can be used to achieve the highest prediction accuracy possible for any given dataset while still maintaining a user-friendly experience. CONCLUSIONS: Metagenomic sample classification and phenotype prediction are important, particularly when applied as a diagnostic method for identifying and predicting microbe-related human diseases. MegaR provides various interactive visualizations that help users build an accurate machine learning model without difficulty. Unknown sample prediction with a properly trained model using MegaR will enable researchers to identify sample properties with a fast turnaround time.
Affiliation(s)
- Eliza Dhungel
- Program in Bioinformatics and Computational Biology, Saint Louis University, Saint Louis, MO, USA
| | - Yassin Mreyoud
- Program in Bioinformatics and Computational Biology, Saint Louis University, Saint Louis, MO, USA
| | - Ho-Jin Gwak
- Department of Computer Science and Engineering, Hanyang University, Seoul, Korea
| | - Ahmad Rajeh
- Program in Bioinformatics and Computational Biology, Saint Louis University, Saint Louis, MO, USA
| | - Mina Rho
- Department of Computer Science and Engineering, Hanyang University, Seoul, Korea
| | - Tae-Hyuk Ahn
- Program in Bioinformatics and Computational Biology, Saint Louis University, Saint Louis, MO, USA.
- Department of Computer Science, Saint Louis University, Saint Louis, MO, USA.
|
187
|
Reiman D, Farhat AM, Dai Y. Predicting Host Phenotype Based on Gut Microbiome Using a Convolutional Neural Network Approach. Methods Mol Biol 2021; 2190:249-266. [PMID: 32804370 DOI: 10.1007/978-1-0716-0826-5_12] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/11/2023]
Abstract
Accurate prediction of the host phenotypes from a microbial sample and identification of the associated microbial markers are important in understanding the impact of the microbiome on the pathogenesis and progression of various diseases within the host. A deep learning tool, PopPhy-CNN, has been developed for the task of predicting host phenotypes using a convolutional neural network (CNN). By representing samples as annotated taxonomic trees and further representing these trees as matrices, PopPhy-CNN utilizes the CNN's innate ability to explore locally similar microbes on the taxonomic tree. Furthermore, PopPhy-CNN can be used to evaluate the importance of each taxon in the prediction of host status. Here, we describe the underlying methodology, architecture, and core utility of PopPhy-CNN. We also demonstrate the use of PopPhy-CNN on a microbial dataset.
Affiliation(s)
- Derek Reiman
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, USA
| | - Ali M Farhat
- College of Medicine, University of Illinois at Chicago, Chicago, IL, USA
| | - Yang Dai
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, USA.
|
188
|
Mancin L, Rollo I, Mota JF, Piccini F, Carletti M, Susto GA, Valle G, Paoli A. Optimizing Microbiota Profiles for Athletes. Exerc Sport Sci Rev 2021; 49:42-49. [PMID: 33044333 DOI: 10.1249/jes.0000000000000236] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/16/2022]
Abstract
The gut microbiome influences athletes' physiology, but because of the complexity of sport performance and the great inter-individual variability of microbiome features, it is not reasonable to define a single healthy microbiota profile for athletes. We suggest the use of specific meta-omics analysis coupled with innovative computational systems to uncover the hidden associations between microbes and athletes' physiology and to predict personalized recommendations.
Affiliation(s)
- Joao Felipe Mota
- Clinical and Sports Nutrition Research Laboratory (LABINCE), Federal University of Goiás, Goiânia, Goiás, Brazil
|
189
|
Fouladi F, Carroll IM, Sharpton TJ, Bulik-Sullivan E, Heinberg L, Steffen KJ, Fodor AA. A microbial signature following bariatric surgery is robustly consistent across multiple cohorts. Gut Microbes 2021; 13:1930872. [PMID: 34159880 PMCID: PMC8224199 DOI: 10.1080/19490976.2021.1930872] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/10/2020] [Revised: 04/28/2021] [Accepted: 05/05/2021] [Indexed: 02/07/2023] Open
Abstract
Bariatric surgery induces significant shifts in the gut microbiota which could potentially contribute to weight loss and metabolic benefits. The aim of this study was to characterize a microbial signature following Roux-en-Y Gastric bypass (RYGB) surgery using novel and existing gut microbiota sequence data. We generated 16S rRNA gene and metagenomic sequences from fecal samples from patients undergoing RYGB surgery (n = 61 for 16S rRNA gene and n = 135 for metagenomics) at pre-surgical baseline and one, six, and twelve months post-surgery. We compared these data with three smaller publicly available 16S rRNA gene datasets and one metagenomic dataset from patients who also underwent RYGB surgery. Linear mixed models and machine learning approaches were used to examine the presence of a common microbial signature across studies. Comparison of our new sequences with previous longitudinal studies revealed strikingly similar profiles in both fecal microbiota composition (r = 0.41 ± 0.10; p < .05) and metabolic pathways (r = 0.70 ± 0.05; p < .001) early after surgery across multiple datasets. Notably, Veillonella, Streptococcus, Gemella, Fusobacterium, Escherichia/Shigella, and Akkermansia increased after surgery, while Blautia decreased. Machine learning approaches revealed that the replicable gut microbiota signature associated with RYGB surgery could be used to discriminate pre- and post-surgical samples. Opportunistic pathogen abundance also increased post-surgery in a consistent manner across cohorts. Our study reveals a robust microbial signature involving many commensal and pathogenic taxa and metabolic pathways early after RYGB surgery across different studies and sites. Characterization of the effects of this robust microbial signature on outcomes of bariatric surgery could provide insights into the development of microbiome-based interventions for predicting or improving outcomes following surgery.
Affiliation(s)
- Farnaz Fouladi
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, USA
| | - Ian M. Carroll
- Department of Nutrition, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Center for Gastrointestinal Biology and Disease, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Thomas J. Sharpton
- Department of Microbiology, Department of Statistics, Center for Genome Research and Biocomputing, Oregon State University, Corvallis, USA
| | - Emily Bulik-Sullivan
- Department of Nutrition, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Leslie Heinberg
- Department of Psychiatry and Psychology, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, USA
| | - Kristine J. Steffen
- School of Pharmacy, College of Health Professions, North Dakota State University, Fargo, USA
- Director of Biomedical Research, Center for Biobehavioral Research/Sanford Research, Fargo, USA
| | - Anthony A. Fodor
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, USA
|
190
|
Elgart M, Redline S, Sofer T. Machine and Deep Learning in Molecular and Genetic Aspects of Sleep Research. Neurotherapeutics 2021; 18:228-243. [PMID: 33829409 PMCID: PMC8116376 DOI: 10.1007/s13311-021-01014-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 01/18/2021] [Indexed: 12/11/2022] Open
Abstract
Epidemiological sleep research strives to identify the interactions and causal mechanisms by which sleep affects human health, and to design intervention strategies for improving sleep throughout the lifespan. These goals can be advanced by further focusing on the environmental and genetic etiology of sleep disorders, and by development of risk stratification algorithms to identify people who are at risk of, or are affected by, sleep disorders. These studies rely on comprehensive sleep-related data, which often contain complex multi-dimensional physiological and molecular measurements across multiple timepoints. Thus, sleep research is well-suited for the application of computational approaches that can handle high-dimensional data. Here, we survey recent advances in machine and deep learning together with the availability of large human cohort studies with sleep data that can jointly drive the next breakthroughs in the sleep-research field. We describe sleep-related data types and datasets, and present some of the tasks in the field that can be targets for algorithmic approaches, as well as the challenges and opportunities in pursuing them.
Affiliation(s)
- Michael Elgart
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA USA
- Department of Medicine, Harvard Medical School, Boston, MA USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA USA
- Department of Medicine, Harvard Medical School, Boston, MA USA
| | - Tamar Sofer
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA USA
- Department of Medicine, Harvard Medical School, Boston, MA USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA USA
|
191
|
McCoubrey LE, Elbadawi M, Orlu M, Gaisford S, Basit AW. Harnessing machine learning for development of microbiome therapeutics. Gut Microbes 2021; 13:1-20. [PMID: 33522391 PMCID: PMC7872042 DOI: 10.1080/19490976.2021.1872323] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 11/27/2020] [Accepted: 12/20/2020] [Indexed: 02/06/2023] Open
Abstract
The last twenty years of seminal microbiome research has uncovered microbiota's intrinsic relationship with human health. Studies elucidating the relationship between an unbalanced microbiome and disease are currently published daily. As such, microbiome big data have become a reality that provide a mine of information for the development of new therapeutics. Machine learning (ML), a branch of artificial intelligence, offers powerful techniques for big data analysis and prediction-making, that are out of reach of human intellect alone. This review will explore how ML can be applied for the development of microbiome-targeted therapeutics. A background on ML will be given, followed by a guide on where to find reliable microbiome big data. Existing applications and opportunities will be discussed, including the use of ML to discover, design, and characterize microbiome therapeutics. The use of ML to optimize advanced processes, such as 3D printing and in silico prediction of drug-microbiome interactions, will also be highlighted. Finally, barriers to adoption of ML in academic and industrial settings will be examined, concluded by a future outlook for the field.
Affiliation(s)
- Moe Elbadawi
- UCL School of Pharmacy, University College London, London, UK
| | - Mine Orlu
- UCL School of Pharmacy, University College London, London, UK
| | - Simon Gaisford
- UCL School of Pharmacy, University College London, London, UK
- FabRx Ltd., Ashford, Kent, UK
| | - Abdul W. Basit
- UCL School of Pharmacy, University College London, London, UK
|
192
|
Hsu CK, Su SC, Chang LC, Shao SC, Yang KJ, Chen CY, Chen YT, Wu IW. Effects of Low Protein Diet on Modulating Gut Microbiota in Patients with Chronic Kidney Disease: A Systematic Review and Meta-analysis of International Studies. Int J Med Sci 2021; 18:3839-3850. [PMID: 34790060 PMCID: PMC8579282 DOI: 10.7150/ijms.66451] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/24/2021] [Accepted: 10/09/2021] [Indexed: 12/11/2022] Open
Abstract
Background: Although associations between low protein diet (LPD) and changes of gut microbiota have been reported, systematic discernment of the effects of LPD on diet-microbiome-host interaction in patients with chronic kidney disease (CKD) is lacking. Methods: We searched PUBMED and EMBASE for articles published until July 2021 on changes of gut microbiota associated with implementation of LPD in CKD patients. Independent researchers extracted data and assessed risks of bias. We conducted meta-analyses of combined p-values, mean differences and random effects for gut microbiota and related metabolites. Study heterogeneity was measured by the Tau² and I² statistics. This study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Results: Five articles met inclusion criteria. The meta-analyses of gut microbiota exhibited enrichments of Lactobacillaceae (meta-p = 0.010), Bacteroidaceae (meta-p = 0.048) and Streptococcus anginosus (meta-p < 0.001), but revealed depletion of Bacteroides eggerthii (p = 0.017) and Roseburia faecis (meta-p = 0.019) in LPD patients compared to patients undergoing normal protein diet. The serum IS levels (mean difference: 0.68 µg/mL, 95% CI: -8.38 to 9.68, p = 0.89) and pCS levels (mean difference: -3.85 µg/mL, 95% CI: -15.49 to 7.78, p < 0.52) did not differ between groups. We did not find significant differences in renal function associated with change of microbiota between groups (eGFR, mean difference: -7.21 mL/min/1.73 m², 95% CI: -33.2 to 18.79, p = 0.59; blood urea nitrogen, mean difference: -6.8 mg/dL, 95% CI: -46.42 to 32.82, p = 0.74). Other clinical (sodium, potassium, phosphate, albumin, fasting sugar, uric acid, total cholesterol, triglycerides, C-reactive protein and hemoglobin) and anthropometric estimates (body mass index, systolic blood pressure and diastolic blood pressure) did not differ between the two groups.
Conclusions: This systematic review and meta-analysis suggested that the effects of LPD on the microbiota were observed predominantly at the family and species levels but were minimal for microbial diversity or richness. In the absence of global compositional microbiota shifts, the species-level changes appear insufficient to alter metabolic or clinical outputs.
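The heterogeneity measures named in the Methods can be illustrated with a short Python sketch. It assumes the common textbook definitions (Cochran's Q with inverse-variance weights, I² derived from Q, and the DerSimonian-Laird estimator for Tau²); the abstract does not say which estimator the authors used, and the effect sizes and variances below are invented for illustration, not taken from the study.

```python
def heterogeneity(effects, variances):
    """Return (Q, I2 in percent, tau2) for per-study effects and their variances.

    Assumes inverse-variance weights and the DerSimonian-Laird tau2 estimator.
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    # Fixed-effect pooled estimate used as the reference point for Q.
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance, truncated at zero.
    c = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2


# Hypothetical two-study example (mean differences with unit variances).
q, i2, tau2 = heterogeneity([1.0, 3.0], [1.0, 1.0])
print(q, i2, tau2)
```

With only a handful of studies, as in the five-article meta-analysis above, both statistics are estimated imprecisely, so they are best read alongside the confidence intervals of the pooled effects.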
Affiliation(s)
- Cheng-Kai Hsu
- Department of Nephrology, Chang Gung Memorial Hospital, Keelung, Taiwan
| | - Shih-Chi Su
- Whole-Genome Research Core Laboratory of Human Diseases, Chang Gung Memorial Hospital, Keelung, Taiwan
| | - Lun-Ching Chang
- Department of Mathematical Sciences, Florida Atlantic University, Florida, US
| | - Shih-Chieh Shao
- School of Pharmacy, Institute of Clinical Pharmacy and Pharmaceutical Sciences, College of Medicine, National Cheng Kung University, Tainan, Taiwan; Department of Pharmacy, Keelung Chang Gung Memorial Hospital, Keelung, Taiwan
| | - Kai-Jie Yang
- Department of Nephrology, Chang Gung Memorial Hospital, Keelung, Taiwan
| | - Chun-Yu Chen
- Department of Nephrology, Chang Gung Memorial Hospital, Keelung, Taiwan
| | - Yih-Ting Chen
- Department of Nephrology, Chang Gung Memorial Hospital, Keelung, Taiwan
| | - I-Wen Wu
- Department of Nephrology, Chang Gung Memorial Hospital, Keelung, Taiwan.,College of Medicine, Chang Gung University, Taoyuan, Taiwan
| |
Collapse
|
193
|
Trivieri N, Pracella R, Cariglia MG, Panebianco C, Parrella P, Visioli A, Giani F, Soriano AA, Barile C, Canistro G, Latiano TP, Dimitri L, Bazzocchi F, Cassano D, Vescovi AL, Pazienza V, Binda E. BRAF V600E mutation impinges on gut microbial markers defining novel biomarkers for serrated colorectal cancer effective therapies. Journal of Experimental & Clinical Cancer Research 2020; 39:285. [PMID: 33317591 PMCID: PMC7737386 DOI: 10.1186/s13046-020-01801-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/12/2020] [Accepted: 12/04/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND Colorectal cancer (CRC) harboring the BRAF V600E mutation exhibits a low response to conventional therapy and the poorest prognosis. Given the emerging correlation between gut microbiota and CRC carcinogenesis, we investigated, in serrated BRAF V600E cases, the existence of a distinctive fecal microbial fingerprint and specific bacterial markers that might support the development of more effective clinical strategies. METHODS By injecting human CRC stem-like cells isolated from BRAF V600E patients into immunocompromised mice, we established a new xenogeneic model of this CRC subtype. Using bacterial 16S rRNA sequencing, the fecal microbiota profile was then investigated both in CRC-carrying mice and in a cohort of human CRC subjects. The functional profile of the microbial communities was also predicted. Data were compared with the Mann-Whitney U test, Welch's t-test for unequal variances and the Kruskal-Wallis test with Benjamini-Hochberg false discovery rate (FDR) correction, extracted as potential BRAF class biomarkers and selected as model features. The obtained mean test prediction scores were subjected to receiver operating characteristic (ROC) analysis. To discriminate BRAF status, a Random Forest (RF) classifier was employed. RESULTS A specific microbial signature distinctive for BRAF status emerged, with BRAF-mutated cases closer to healthy controls than their BRAF wild-type counterparts. In agreement, a considerable correlation was also found between bacterial abundance in BRAF-mutated cases and the levels of markers distinctive of the BRAF V600E pathway, including those involved in inflammation, innate immune response and epithelial-mesenchymal transition.
We provide evidence that two candidate bacterial markers, Prevotella enoeca and Ruthenibacterium lactatiformans, more abundant in BRAFV600E and BRAF wild-type subjects respectively, emerged as single factors with the best performance in distinguishing BRAF status (AUROC = 0.72 and 0.74, respectively, 95% confidence interval). Furthermore, the combination of the 10 differentially represented microorganisms between the two groups improved performance in discriminating serrated CRC driven by BRAF mutation from BRAF wild-type CRC cases (AUROC = 0.85, 95% confidence interval, 0.69-1.01). CONCLUSION Overall, our results suggest that BRAFV600E mutation itself drives a distinctive gut microbiota signature and provide predictive CRC-associated bacterial biomarkers able to discriminate BRAF status in CRC patients and, thus, useful to devise non-invasive patient-selective diagnostic strategies and patient-tailored optimized therapies.
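Two of the statistical steps named in this abstract, Benjamini-Hochberg FDR correction and the area under the ROC curve, can be sketched in a few lines. This is an illustration only, not the authors' pipeline; the function names `benjamini_hochberg` and `auroc` are hypothetical, and the AUROC is computed through its pairwise-comparison form, which is equivalent to the normalized Mann-Whitney U statistic.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def auroc(scores_pos, scores_neg):
    """AUROC as the probability that a positive outscores a negative.

    Ties count as half a win, matching the normalized Mann-Whitney U.
    """
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))
```

In a workflow like the one described, per-taxon p-values would be passed to `benjamini_hochberg`, and the abundance or classifier score of each sample, split by BRAF status, to `auroc`.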
Affiliation(s)
- Nadia Trivieri
- Cancer Stem Cells Unit, ISBReMIT, IRCSS Casa Sollievo della Sofferenza, Opera di San Pio da Pietrelcina, San Giovanni Rotondo, FG, Italy
- Riccardo Pracella
- Cancer Stem Cells Unit, ISBReMIT, IRCSS Casa Sollievo della Sofferenza, Opera di San Pio da Pietrelcina, San Giovanni Rotondo, FG, Italy
- Maria Grazia Cariglia
- Cancer Stem Cells Unit, ISBReMIT, IRCSS Casa Sollievo della Sofferenza, Opera di San Pio da Pietrelcina, San Giovanni Rotondo, FG, Italy
- Concetta Panebianco
- Gastroenterology Unit, IRCSS Casa Sollievo della Sofferenza, Opera di San Pio da Pietrelcina, San Giovanni Rotondo, FG, Italy
- Paola Parrella
- Oncology Laboratory, IRCSS Casa Sollievo della Sofferenza, Opera di San Pio da Pietrelcina, San Giovanni Rotondo, FG, Italy
- Amata Amy Soriano
- Cancer Stem Cells Unit, ISBReMIT, IRCSS Casa Sollievo della Sofferenza, Opera di San Pio da Pietrelcina, San Giovanni Rotondo, FG, Italy
- Chiara Barile
- Cancer Stem Cells Unit, ISBReMIT, IRCSS Casa Sollievo della Sofferenza, Opera di San Pio da Pietrelcina, San Giovanni Rotondo, FG, Italy
- Giuseppe Canistro
- Abdominal Surgery Unit, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, FG, Italy
- Tiziana Pia Latiano
- Division of Medical Oncology, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, FG, Italy
- Lucia Dimitri
- Anatomical Pathology Unit, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, FG, Italy
- Francesca Bazzocchi
- Abdominal Surgery Unit, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, FG, Italy
- Dario Cassano
- Abdominal Surgery Unit, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, FG, Italy
- Angelo L Vescovi
- StemGen SpA, Milan, Italy; Science Directorate, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, FG, Italy
- Valerio Pazienza
- Gastroenterology Unit, IRCSS Casa Sollievo della Sofferenza, Opera di San Pio da Pietrelcina, San Giovanni Rotondo, FG, Italy
- Elena Binda
- Cancer Stem Cells Unit, ISBReMIT, IRCSS Casa Sollievo della Sofferenza, Opera di San Pio da Pietrelcina, San Giovanni Rotondo, FG, Italy; Cancer Stem Cells Unit, Fondazione IRCCS Casa Sollievo della Sofferenza, Institute for Stem Cell Biology, Regenerative Medicine and Innovative Therapeutics (ISBReMIT), 71013, San Giovanni Rotondo, FG, Italy
|
194
|
Chen JCY, Tyler AD. Systematic evaluation of supervised machine learning for sample origin prediction using metagenomic sequencing data. Biol Direct 2020; 15:29. [PMID: 33302990 PMCID: PMC7731568 DOI: 10.1186/s13062-020-00287-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/17/2020] [Accepted: 12/01/2020] [Indexed: 02/07/2023] Open
Abstract
Background The advent of metagenomic sequencing provides microbial abundance patterns that can be leveraged for sample origin prediction. Supervised machine learning classification approaches have been reported to predict sample origin accurately when the origin has been previously sampled. Using metagenomic datasets provided by the 2019 CAMDA challenge, we evaluated the influence of variable technical, analytical and machine learning approaches on result interpretation and novel source prediction. Results Comparison between 16S rRNA amplicon and shotgun sequencing approaches, as well as between metagenomic analytical tools, showed differences in normalized microbial abundance, especially for organisms present at low abundance. Shotgun sequence data analyzed using Kraken2 and Bracken for taxonomic annotation had higher detection sensitivity. As classification models are limited to labeling pre-trained origins, we took an alternative approach using Lasso-regularized multivariate regression to predict geographic coordinates for comparison. In both models, the prediction errors were much higher in leave-one-city-out than in 10-fold cross-validation; the former more realistically reflected the increased difficulty of accurately predicting samples from new origins. This challenge was further confirmed when applying the model to a set of samples obtained from new origins. Overall, the prediction performance of the regression and classification models, as measured by mean squared error, was comparable on mystery samples. Because of the higher prediction error rates for samples from new origins, we provided an additional strategy based on prediction ambiguity to infer whether a sample is from a new origin. Lastly, we report increased prediction error when data from different sequencing protocols were included as training data.
Conclusions Herein, we highlight the capacity of predicting sample origin accurately with pre-trained origins and the challenge of predicting new origins through both regression and classification models. Overall, this work provides a summary of the impact of sequencing technique, protocol, taxonomic analytical approaches, and machine learning approaches on the use of metagenomics for prediction of sample origin. Supplementary Information The online version contains supplementary material available at 10.1186/s13062-020-00287-y.
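The contrast drawn above between leave-one-city-out and 10-fold cross-validation comes down to how samples are split. The sketch below is a plain-Python illustration of the two schemes, not the authors' code; the function names and the round-robin fold assignment are assumptions made for brevity.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out one whole group per fold.

    With city labels as groups, the test fold never shares a city with the
    training data, which mimics predicting samples from an unseen origin.
    """
    for held_out in sorted(set(groups)):
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


def k_fold(n_samples, k):
    """Yield (train_idx, test_idx) for k folds assigned round-robin by index.

    Samples from the same city can land in both train and test folds, so the
    estimated error reflects previously sampled origins only.
    """
    for fold in range(k):
        test_idx = [i for i in range(n_samples) if i % k == fold]
        train_idx = [i for i in range(n_samples) if i % k != fold]
        yield train_idx, test_idx
```

Under the first scheme a model is always scored on a city it has not seen, which is consistent with the higher prediction errors the abstract reports for that setting.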
Affiliation(s)
- Julie Chih-Yu Chen
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, Manitoba, R3E 3R2, Canada
- Andrea D Tyler
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, Manitoba, R3E 3R2, Canada
|
195
|
Bokulich NA, Ziemski M, Robeson MS, Kaehler BD. Measuring the microbiome: Best practices for developing and benchmarking microbiomics methods. Comput Struct Biotechnol J 2020; 18:4048-4062. [PMID: 33363701 PMCID: PMC7744638 DOI: 10.1016/j.csbj.2020.11.049] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 09/25/2020] [Revised: 11/27/2020] [Accepted: 11/28/2020] [Indexed: 12/12/2022] Open
Abstract
Microbiomes are integral components of diverse ecosystems, and increasingly recognized for their roles in the health of humans, animals, plants, and other hosts. Given their complexity (both in composition and function), the effective study of microbiomes (microbiomics) relies on the development, optimization, and validation of computational methods for analyzing microbial datasets, such as from marker-gene (e.g., 16S rRNA gene) and metagenome data. This review describes best practices for benchmarking and implementing computational methods (and software) for studying microbiomes, with particular focus on unique characteristics of microbiomes and microbiomics data that should be taken into account when designing and testing microbiomics methods.
Affiliation(s)
- Nicholas A. Bokulich
- Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zurich, Switzerland
- Michal Ziemski
- Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zurich, Switzerland
- Michael S. Robeson
- University of Arkansas for Medical Sciences, Department of Biomedical Informatics, Little Rock, AR, USA
|
196
|
Kohli A, Holzwanger EA, Levy AN. Emerging use of artificial intelligence in inflammatory bowel disease. World J Gastroenterol 2020; 26:6923-6928. [PMID: 33311940 PMCID: PMC7701951 DOI: 10.3748/wjg.v26.i44.6923] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 08/31/2020] [Revised: 10/24/2020] [Accepted: 11/12/2020] [Indexed: 02/06/2023] Open
Abstract
Inflammatory bowel disease (IBD) is a complex, immune-mediated gastrointestinal disorder with ill-defined etiology, multifaceted diagnostic criteria, and unpredictable treatment response. Innovations in IBD diagnostics, including developments in genomic sequencing and molecular analytics, have generated tremendous interest in turning these large data platforms into clinically meaningful tools. Artificial intelligence, through machine learning, facilitates the interpretation of large arrays of data and may provide insight into improving IBD outcomes. While potential applications of machine learning models are vast, further research is needed to generate standardized models that can be adapted to target IBD populations.
Affiliation(s)
- Arushi Kohli
- Department of Internal Medicine, Tufts Medical Center, Boston, MA 02111, United States
- Erik A Holzwanger
- Division of Gastroenterology and Hepatology, Tufts Medical Center, Boston, MA 02111, United States
- Alexander N Levy
- Division of Gastroenterology and Hepatology, Tufts Medical Center, Boston, MA 02111, United States
|
197
|
Fermented food products in the era of globalization: tradition meets biotechnology innovations. Curr Opin Biotechnol 2020; 70:36-41. [PMID: 33232845 DOI: 10.1016/j.copbio.2020.10.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 08/04/2020] [Revised: 09/18/2020] [Accepted: 10/19/2020] [Indexed: 02/06/2023]
Abstract
Omics tools offer the opportunity to characterize and trace traditional and industrial fermented foods. Bioinformatics, through machine learning and other advanced statistical approaches, is able to disentangle fermentation processes and to predict the evolution and metabolic outcomes of a food microbial ecosystem. By assembling artificial microbial consortia, biotechnological advances will also be able to enhance the nutritional value and organoleptic characteristics of fermented food while preserving the potential of autochthonous microbial consortia and metabolic pathways, which are difficult to reproduce. Preserving traditional methods helps protect the hidden value of local biodiversity and exploits its potential in industrial processes, with the final aim of guaranteeing food security and safety, even in developing countries.
|
198
|
Alvarez-Pitti J, de Blas A, Lurbe E. Innovations in Infant Feeding: Future Challenges and Opportunities in Obesity and Cardiometabolic Disease. Nutrients 2020; 12:nu12113508. [PMID: 33202614 PMCID: PMC7697724 DOI: 10.3390/nu12113508] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/16/2020] [Revised: 11/11/2020] [Accepted: 11/11/2020] [Indexed: 12/15/2022] Open
Abstract
The field of nutrition in early life, as an effective tool to prevent and treat chronic diseases, has attracted a large amount of interest over recent years. The vital roles of food products and nutrients in the body’s molecular mechanisms have been demonstrated. Knowledge of these mechanisms, and the possibility of controlling them via what we eat, has opened up the field of precision nutrition, which aims to set dietary strategies that improve health with the greatest effectiveness. However, this objective is achieved only if the genetic profile of individuals and their living conditions are also considered. The relevance of this topic is strengthened by the importance of nutrition during childhood and its impact on the development of obesity. In fact, the global prevalence of childhood obesity has increased substantially since 1990 and has now reached epidemic proportions. The current narrative review presents recent research on precision nutrition and its role in the prevention and treatment of obesity during the pediatric years, a novel and promising area of research.
Affiliation(s)
- Julio Alvarez-Pitti
- Department of Pediatrics, Consorcio Hospital General, University of Valencia, 46014 Valencia, Spain
- CIBER Fisiopatología Obesidad y Nutrición (CB06/03), Instituto de Salud Carlos III, 28029 Madrid, Spain
- INCLIVA Biomedical Research Institute, Hospital Clínico, University of Valencia, 46010 Valencia, Spain
- Correspondence: Tel.: +34-96-1820772
- Ana de Blas
- Department of Pediatrics, Consorcio Hospital General, University of Valencia, 46014 Valencia, Spain
- Empar Lurbe
- Department of Pediatrics, Consorcio Hospital General, University of Valencia, 46014 Valencia, Spain
- CIBER Fisiopatología Obesidad y Nutrición (CB06/03), Instituto de Salud Carlos III, 28029 Madrid, Spain
- INCLIVA Biomedical Research Institute, Hospital Clínico, University of Valencia, 46010 Valencia, Spain
|
199
|
Abstract
Today massive amounts of sequenced metagenomic and metatranscriptomic data from different ecological niches and environmental locations are available. Scientific progress depends critically on methods that allow extracting useful information from the various types of sequence data. Here, we will first discuss types of information contained in the various flavours of biological sequence data, and how this information can be interpreted to increase our scientific knowledge and understanding. We argue that a mechanistic understanding of biological systems analysed from different perspectives is required to consistently interpret experimental observations, and that this understanding is greatly facilitated by the generation and analysis of dynamic mathematical models. We conclude that, in order to construct mathematical models and to test mechanistic hypotheses, time-series data are of critical importance. We review diverse techniques to analyse time-series data and discuss various approaches by which time-series of biological sequence data have been successfully used to derive and test mechanistic hypotheses. Analysing the bottlenecks of current strategies in the extraction of knowledge and understanding from data, we conclude that combined experimental and theoretical efforts should be implemented as early as possible during the planning phase of individual experiments and scientific research projects. This article is part of the theme issue ‘Integrative research perspectives on marine conservation’.
Affiliation(s)
- Ovidiu Popa
- Institute of Quantitative and Theoretical Biology, CEPLAS, Heinrich-Heine University Düsseldorf, Germany
- Ellen Oldenburg
- Institute of Quantitative and Theoretical Biology, CEPLAS, Heinrich-Heine University Düsseldorf, Germany
- Oliver Ebenhöh
- Institute of Quantitative and Theoretical Biology, CEPLAS, Heinrich-Heine University Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, CEPLAS, Heinrich-Heine University Düsseldorf, Germany
|
200
|
Microbiome of the first stool after birth and infantile colic. Pediatr Res 2020; 88:776-783. [PMID: 32053826 DOI: 10.1038/s41390-020-0804-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 09/23/2019] [Revised: 12/16/2019] [Accepted: 01/28/2020] [Indexed: 11/08/2022]
Abstract
BACKGROUND Recent studies have shown a diverse microbiome in the first stool after birth. The clinical significance of the microbiome of the first stool is not known. Infantile colic has earlier been associated with the composition of the intestinal microbiome. METHODS We set out to test whether the microbiome of the first stool is associated with subsequent infantile colic in a prospective, population-based cohort study of 212 consecutive newborn infants. We used next-generation sequencing of the bacterial 16S rRNA gene. RESULTS The newborns who later developed infantile colic (n = 19) had a lower relative abundance of the genus Lactobacillus and the phylum Firmicutes in the first stool than those who remained healthy (n = 139). Using all microbiome data, a random forest algorithm distinguished newborns with subsequent colic from those who remained healthy with an area under the curve of 0.66 (SD 0.03), compared with that of shuffled samples (P value <0.001). CONCLUSIONS In this prospective, population-based study, the microbiome of the first-pass meconium was associated with subsequent infantile colic. Our results suggest that the pathogenesis of infantile colic is closely related to the intestinal microbiome at birth.
|