1
Hosseiniyan Khatibi SM, Dimaano NG, Veliz E, Sundaresan V, Ali J. Exploring and exploiting the rice phytobiome to tackle climate change challenges. Plant Communications 2024; 5:101078. [PMID: 39233440 PMCID: PMC11671768 DOI: 10.1016/j.xplc.2024.101078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 08/07/2024] [Accepted: 09/02/2024] [Indexed: 09/06/2024]
Abstract
The future of agriculture is uncertain under the current climate change scenario. Climate change directly and indirectly affects the biotic and abiotic elements that control agroecosystems, jeopardizing the safety of the world's food supply. A new research area focused on characterizing the phytobiome is emerging. The phytobiome comprises plants and their immediate surroundings, involving numerous interdependent microscopic and macroscopic organisms that affect the health and productivity of plants. Phytobiome studies primarily focus on the microbial communities associated with plants, which are referred to as the plant microbiome. The development of high-throughput sequencing technologies over the past 10 years has dramatically advanced our understanding of the structure, functionality, and dynamics of the phytobiome; however, comprehensive methods for using this knowledge are lacking, particularly for major crops such as rice. Considering the impact of rice production on world food security, gaining fresh perspectives on the interdependent and interrelated components of the rice phytobiome could enhance rice production and crop health, sustain rice ecosystem function, and combat the effects of climate change. Our review re-conceptualizes the complex dynamics of the microscopic and macroscopic components in the rice phytobiome as influenced by human interventions and changing environmental conditions driven by climate change. We also discuss interdisciplinary and systematic approaches to decipher and reprogram the sophisticated interactions in the rice phytobiome using novel strategies and cutting-edge technology. Integrating the large, complex datasets on the rice phytobiome and applying them in the context of regenerative agriculture could lead to sustainable rice farming practices that are resilient to the impacts of climate change.
Affiliation(s)
- Niña Gracel Dimaano
- International Rice Research Institute, Los Baños, Laguna, Philippines; College of Agriculture and Food Science, University of the Philippines Los Baños, Los Baños, Laguna, Philippines
- Esteban Veliz
- College of Biological Sciences, University of California, Davis, Davis, CA, USA
- Venkatesan Sundaresan
- College of Biological Sciences, University of California, Davis, Davis, CA, USA; College of Agricultural and Environmental Sciences, University of California, Davis, Davis, CA, USA
- Jauhar Ali
- International Rice Research Institute, Los Baños, Laguna, Philippines
2
Muller E, Shiryan I, Borenstein E. Multi-omic integration of microbiome data for identifying disease-associated modules. bioRxiv: The Preprint Server for Biology 2024:2023.07.03.547607. [PMID: 37461534 PMCID: PMC10349976 DOI: 10.1101/2023.07.03.547607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2023]
Abstract
The human gut microbiome is a complex ecosystem with profound implications for health and disease. This recognition has led to a surge in multi-omic microbiome studies, employing various molecular assays to elucidate the microbiome's role in diseases across multiple functional layers. However, despite the clear value of these multi-omic datasets, rigorous integrative analysis of such data poses significant challenges, hindering a comprehensive understanding of microbiome-disease interactions. Perhaps most notably, multiple approaches, including univariate and multivariate analyses as well as machine learning, have been applied to such data to identify disease-associated markers, namely, specific features (e.g., species, pathways, metabolites) that are significantly altered in the disease state. These methods, however, often yield extensive lists of disease-associated features without effectively capturing the multi-layered structure of multi-omic data or offering clear, interpretable hypotheses about underlying microbiome-disease mechanisms. Here, we address this challenge by introducing MintTea, an intermediate integration-based method for analyzing multi-omic microbiome data. MintTea combines a canonical correlation analysis (CCA) extension, consensus analysis, and an evaluation protocol to robustly identify disease-associated multi-omic modules. Each such module consists of a set of features from the various omics that both shift in concert and collectively associate with the disease. Applying MintTea to diverse case-control cohorts with multi-omic data, we show that this framework is able to capture modules with high predictive power for disease, significant cross-omic correlations, and alignment with known microbiome-disease associations.
For example, analyzing samples from a metabolic syndrome (MS) study, we found an MS-associated module comprising a highly correlated cluster of serum glutamate- and TCA cycle-related metabolites, as well as bacterial species previously implicated in insulin resistance. In another cohort, we identified a module associated with late-stage colorectal cancer, featuring Peptostreptococcus and Gemella species and several fecal amino acids, in agreement with these species' reported role in the metabolism of these amino acids and their coordinated increase in abundance during disease development. Finally, comparing modules identified in different datasets, we detected multiple significant overlaps, suggesting common interactions between microbiome features. Combined, this work serves as a proof of concept for the potential benefits of advanced integration methods in generating integrated multi-omic hypotheses underlying microbiome-disease interactions, and offers a promising avenue for researchers seeking systems-level insights into coherent mechanisms governing microbiome-related diseases.
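The consensus step described above can be illustrated with a small, hypothetical sketch. It does not reproduce MintTea (the CCA extension and the evaluation protocol are omitted); it assumes that repeated runs on data subsamples have already produced candidate modules as sets of feature names, and keeps only feature pairs that were grouped together in at least a chosen fraction of runs. The function name, the `min_fraction` threshold, and the feature labels are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def consensus_modules(runs, min_fraction=0.8):
    """Return consensus modules from repeated module-detection runs.

    runs: list of runs; each run is a list of modules, and each module is a
          set of feature names (e.g. "sp:..." species, "met:..." metabolites).
    A pair of features is linked if it co-occurs in one module in at least
    `min_fraction` of the runs; consensus modules are the connected
    components of the resulting co-occurrence graph.
    """
    pair_counts = Counter()
    for modules in runs:
        for module in modules:
            for a, b in combinations(sorted(module), 2):
                pair_counts[(a, b)] += 1

    # Build an adjacency list from sufficiently frequent pairs only.
    threshold = min_fraction * len(runs)
    graph = {}
    for (a, b), count in pair_counts.items():
        if count >= threshold:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)

    # Connected components via iterative depth-first search.
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components
```

Raising `min_fraction` trades module size for stability: only feature groupings that recur across most subsamples survive.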
3
Ibrahimi E, Lopes MB, Dhamo X, Simeon A, Shigdel R, Hron K, Stres B, D’Elia D, Berland M, Marcos-Zambrano LJ. Overview of data preprocessing for machine learning applications in human microbiome research. Front Microbiol 2023; 14:1250909. [PMID: 37869650 PMCID: PMC10588656 DOI: 10.3389/fmicb.2023.1250909] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/30/2023] [Accepted: 09/22/2023] [Indexed: 10/24/2023]
Abstract
Although metagenomic sequencing is now the preferred technique to study microbiome-host interactions, analyzing and interpreting microbiome sequencing data presents challenges primarily attributed to the statistical specificities of the data (e.g., sparse, over-dispersed, compositional, inter-variable dependency). This mini review explores preprocessing and transformation methods applied in recent human microbiome studies to address microbiome data analysis challenges. Our results indicate limited adoption of transformation methods targeting the statistical characteristics of microbiome sequencing data. Instead, relative-abundance and normalization-based transformations, which do not account for the specific attributes of microbiome data, are prevalent. Information on the preprocessing and transformations applied before analysis was incomplete or missing in many publications, raising reproducibility concerns, comparability issues, and questionable results. We hope this mini review will provide researchers and newcomers to the field of human microbiome research with an up-to-date point of reference for various data transformation tools and assist them in choosing the most suitable transformation method based on their research questions, objectives, and data characteristics.
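To make the contrast between relative and compositionally aware transformations concrete, here is a minimal sketch of the relative-abundance (closure) transform and the centered log-ratio (CLR) transform for a single sample. The pseudocount of 0.5 used to handle zero counts is an arbitrary assumption, not a recommendation from the review.

```python
import math


def closure(counts):
    """Relative abundances: divide each count by the sample's total."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's counts.

    A pseudocount is added first because log(0) is undefined in sparse
    microbiome data; its value (0.5 here) is an assumption.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]
```

Because CLR values are log-ratios to the sample's geometric mean, they sum to zero and do not depend on sequencing depth: scaling all counts by a constant (with no pseudocount) leaves the result unchanged, which the closure transform alone does not achieve for downstream log-scale methods.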
Affiliation(s)
- Eliana Ibrahimi
- Department of Biology, Faculty of Natural Sciences, University of Tirana, Tirana, Albania
- Marta B. Lopes
- Department of Mathematics, Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
- UNIDEMI, Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
- Xhilda Dhamo
- Department of Applied Mathematics, Faculty of Natural Sciences, University of Tirana, Tirana, Albania
- Andrea Simeon
- BioSense Institute, University of Novi Sad, Novi Sad, Serbia
- Rajesh Shigdel
- Department of Clinical Science, University of Bergen, Bergen, Norway
- Karel Hron
- Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University Olomouc, Olomouc, Czechia
- Blaž Stres
- Department of Catalysis and Chemical Reaction Engineering, National Institute of Chemistry, Ljubljana, Slovenia
- Faculty of Civil and Geodetic Engineering, Institute of Sanitary Engineering, Ljubljana, Slovenia
- Department of Automation, Biocybernetics and Robotics, Jožef Stefan Institute, Ljubljana, Slovenia
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Domenica D’Elia
- Department of Biomedical Sciences, National Research Council, Institute for Biomedical Technologies, Bari, Italy
- Magali Berland
- INRAE, MetaGenoPolis, Université Paris-Saclay, Jouy-en-Josas, France
- Laura Judith Marcos-Zambrano
- Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
4
Wassan JT, Wang H, Zheng H. Developing a New Phylogeny-Driven Random Forest Model for Functional Metagenomics. IEEE Trans Nanobioscience 2023; 22:763-770. [PMID: 37279136 DOI: 10.1109/tnb.2023.3283462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Metagenomics is an unobtrusive science linking microbial genes to biological functions or environmental states. Classifying microbial genes into their functional repertoire is an important task in the downstream analysis of metagenomic studies. The task relies on supervised machine learning (ML) methods to achieve good classification performance. Random Forest (RF) has been widely applied to microbial gene abundance profiles, mapping them to functional phenotypes. The current research tunes RF using the evolutionary ancestry of microbial phylogeny, developing a Phylogeny-RF model for functional classification of metagenomes. This method captures the effects of phylogenetic relatedness within the ML classifier itself, rather than simply applying a supervised classifier to the raw abundances of microbial genes. The idea is rooted in the fact that phylogenetically close microbes are highly correlated and tend to have similar genetic and phenotypic traits. Such microbes behave similarly and hence tend to be selected together, or one of them could be dropped from the analysis to improve the ML process. The proposed Phylogeny-RF algorithm was compared with state-of-the-art classification methods, including RF and the phylogeny-aware methods MetaPhyl and PhILR, on three real-world 16S rRNA metagenomic datasets. The proposed method not only performed significantly better than the traditional RF model but also outperformed the other phylogeny-driven benchmarks (p < 0.05). For example, on soil microbiomes Phylogeny-RF attained the highest AUC (0.949) and Kappa (0.891) among the compared methods.
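The intuition that phylogenetically related, highly correlated microbes are redundant can be sketched as a simple pre-filter. This is not the Phylogeny-RF algorithm: it is a hypothetical illustration that uses a clade label (e.g. genus) as a stand-in for a real phylogenetic tree and greedily keeps one representative from each group of strongly correlated features within a clade. The function names and the 0.9 correlation threshold are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length abundance vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def prune_within_clades(abundances, clade_of, threshold=0.9):
    """Keep one representative of each set of highly correlated features
    that share a clade; features are visited in input order.

    abundances: dict feature -> list of abundances across samples
    clade_of:   dict feature -> clade label (hypothetical stand-in for a tree)
    """
    kept = []
    for feature, values in abundances.items():
        redundant = any(
            clade_of[k] == clade_of[feature]
            and abs(pearson(abundances[k], values)) >= threshold
            for k in kept
        )
        if not redundant:
            kept.append(feature)
    return kept
```

The retained features could then be passed to any standard classifier; a phylogeny-aware model such as the one described above instead incorporates this relatedness inside the classifier.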
5
Zhao X, Zhang T, Dang B, Guo M, Jin M, Li C, Hou N, Bai S. Microalgae-based constructed wetland system enhances nitrogen removal and reduce carbon emissions: Performance and mechanisms. Science of the Total Environment 2023; 877:162883. [PMID: 36934950 DOI: 10.1016/j.scitotenv.2023.162883] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/11/2022] [Revised: 03/11/2023] [Accepted: 03/11/2023] [Indexed: 05/06/2023]
Abstract
Combining constructed wetlands (CWs) with microalgae-based technologies has proved to be an effective wastewater treatment option; however, little attention has been paid to the optimal way of combining them. This study showed that the integrated system (IS) connecting a microalgal pond with CWs exhibited improved pollutant-removal efficiencies and greater carbon reduction than alternatives such as a coupled system or independent CWs. Microbial analysis demonstrated that the core microorganisms of the IS (e.g., Acinetobacter and Thermomonas) were mostly associated with carbon, nitrogen, and energy metabolism. Based on co-occurrence networks, microorganisms with denitrification function accounted for 71.01% of the nitrogen-metabolism-related microorganisms in the IS, compared with 48.84% in the independent CWs, indicating that the presence of microalgae in the IS played an important role in promoting biological denitrification. These findings provide insights into the microbial mechanism and highlight the complementary effects of microalgae and CWs.
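A share such as "71.01% of the microorganisms related to nitrogen metabolism" is a simple ratio over functionally annotated taxa. The sketch below shows one way to compute it, assuming each taxon (e.g. a co-occurrence network node) carries a set of hypothetical functional labels and, optionally, an abundance weight; the label names and weighting scheme are assumptions, not the study's annotation pipeline.

```python
def functional_share(annotations, weights=None,
                     target="denitrification", scope="nitrogen_metabolism"):
    """Percentage of in-scope taxa (optionally abundance-weighted) that
    also carry the target function.

    annotations: dict taxon -> set of functional labels (hypothetical)
    weights:     optional dict taxon -> abundance; unweighted taxa count as 1
    """
    weights = weights or {}
    in_scope = [t for t, funcs in annotations.items() if scope in funcs]
    total = sum(weights.get(t, 1.0) for t in in_scope)
    if total == 0:
        return 0.0
    hits = sum(weights.get(t, 1.0) for t in in_scope if target in annotations[t])
    return 100.0 * hits / total
```

Whether a share is computed over taxon counts or abundances can change the result noticeably, so it is worth stating explicitly when such percentages are compared between systems.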
Affiliation(s)
- Xinyue Zhao
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Tuoshi Zhang
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Bin Dang
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Mengran Guo
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Ming Jin
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Chunyan Li
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Ning Hou
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Shunwen Bai
- School of Environment, State Key Laboratory of Urban Water Resource and Environment, Harbin Institute of Technology, Harbin 150090, China
6
Sherif FF, Ahmed KS. Unsupervised clustering of SARS-CoV-2 using deep convolutional autoencoder. Journal of Engineering and Applied Science 2022. [PMCID: PMC9383682 DOI: 10.1186/s44147-022-00125-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
Identifying the population structure of SARS-CoV-2 could have a substantial impact on public health management and diagnostics. It is critical to rapidly monitor and characterize the lineages circulating globally for more accurate diagnosis, improved care, and faster treatment. For a clearer picture of the SARS-CoV-2 population structure, clustering the sequencing data is essential. Here, deep clustering techniques were used to automatically group 29,017 different strains of SARS-CoV-2 into clusters. We aim to identify the main clusters of the SARS-CoV-2 population structure using a convolutional autoencoder (CAE) trained on numerical feature vectors mapped from coronavirus Spike peptide sequences. Our clustering revealed six large SARS-CoV-2 population clusters (C1, C2, C3, C4, C5, C6). These clusters contained 43 unique lineages, across which the 29,017 publicly accessible strains were distributed. In all six clusters, the genetic distances within the same cluster (intra-cluster distances) are smaller than the distances between clusters (inter-cluster distances) (P-value 0.0019, Wilcoxon rank-sum test), providing substantial evidence of a connection between each cluster's lineages. Furthermore, the proposed deep learning clustering method was compared with K-means and hierarchical clustering: its intra-cluster genetic distances were smaller than those of K-means alone and of hierarchical clustering. We used t-distributed stochastic neighbor embedding (t-SNE) to visualize the outcomes of the deep learning clustering; in the t-SNE plot, the strains were correctly separated between clusters. Our results showed that cluster C5 consists exclusively of the Gamma lineage (P.1), suggesting that P.1 strains in C5 are more diversified than those in the other clusters.
Our study indicates that the genetic similarity between strains in the same cluster enables a better understanding of the major features of unknown population lineages when compared with some of the more prevalent viral isolates. This information helps researchers understand how the virus changed over time and spread worldwide.
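The comparison of intra- and inter-cluster genetic distances can be illustrated with a short sketch. It assumes the Spike peptide sequences are already aligned to equal length and uses Hamming distance (count of mismatched positions) as a stand-in for the paper's genetic distance; neither the autoencoder nor the Wilcoxon rank-sum test is reproduced here.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_intra_inter(seqs, labels):
    """Mean pairwise Hamming distance within and between clusters."""
    intra, inter = [], []
    for i, j in combinations(range(len(seqs)), 2):
        d = hamming(seqs[i], seqs[j])
        (intra if labels[i] == labels[j] else inter).append(d)

    def _mean(values):
        return sum(values) / len(values) if values else float("nan")

    return _mean(intra), _mean(inter)
```

A good clustering should give a mean intra-cluster distance well below the inter-cluster mean; the two lists of pairwise distances are also what a rank-sum test would compare.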
7
Loganathan T, Priya Doss C G. The influence of machine learning technologies in gut microbiome research and cancer studies - A review. Life Sci 2022; 311:121118. [DOI: 10.1016/j.lfs.2022.121118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/19/2022] [Accepted: 10/19/2022] [Indexed: 11/18/2022]
8
Hernández Medina R, Kutuzova S, Nielsen KN, Johansen J, Hansen LH, Nielsen M, Rasmussen S. Machine learning and deep learning applications in microbiome research. ISME Communications 2022; 2:98. [PMID: 37938690 PMCID: PMC9723725 DOI: 10.1038/s43705-022-00182-9] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 03/24/2022] [Revised: 09/12/2022] [Accepted: 09/16/2022] [Indexed: 05/27/2023]
Abstract
The many microbial communities around us form interactive and dynamic ecosystems called microbiomes. Though concealed from the naked eye, microbiomes govern and influence macroscopic systems including human health, plant resilience, and biogeochemical cycling. Such feats have attracted interest from the scientific community, which has recently turned to machine learning and deep learning methods to interrogate the microbiome and elucidate the relationships between its composition and function. Here, we provide an overview of how the latest microbiome studies harness the inductive prowess of artificial intelligence methods. We start by highlighting that microbiome data, being compositional, sparse, and high-dimensional, necessitate special treatment. We then introduce traditional and novel methods and discuss their strengths and applications. Finally, we discuss the outlook of machine and deep learning pipelines, focusing on bottlenecks and considerations to address them.
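Sparsity is one of the properties that calls for special treatment. A common first step is to measure it and drop rarely detected taxa before modeling; the sketch below shows both, assuming a count table stored as a dict of per-sample counts. The 10% prevalence cut-off is an arbitrary, hypothetical default, not a value taken from the review.

```python
def sparsity(table):
    """Fraction of zero entries in the whole count table."""
    cells = [c for counts in table.values() for c in counts]
    return sum(c == 0 for c in cells) / len(cells)


def prevalence_filter(table, min_prevalence=0.1):
    """Drop taxa detected (count > 0) in fewer than `min_prevalence`
    of samples.

    table: dict taxon -> list of counts, one per sample
    """
    kept = {}
    for taxon, counts in table.items():
        prevalence = sum(c > 0 for c in counts) / len(counts)
        if prevalence >= min_prevalence:
            kept[taxon] = counts
    return kept
```

Filtering reduces dimensionality and the zero fraction, but the threshold should be chosen, and reported, with the cohort size in mind.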
Affiliation(s)
- Ricardo Hernández Medina
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Svetlana Kutuzova
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Department of Computer Science, University of Copenhagen, DK-2100, Copenhagen Ø, Denmark
- Knud Nor Nielsen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871, Frederiksberg, Denmark
- Joachim Johansen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Lars Hestbjerg Hansen
- Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871, Frederiksberg, Denmark
- Mads Nielsen
- Department of Computer Science, University of Copenhagen, DK-2100, Copenhagen Ø, Denmark
- Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
9
Zhao X, Guo M, Chen J, Zhuang Z, Zhang T, Wang X, Li C, Hou N, Bai S. Successional dynamics of microbial communities in response to concentration perturbation in constructed wetland system. Bioresource Technology 2022; 361:127733. [PMID: 35932946 DOI: 10.1016/j.biortech.2022.127733] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/29/2022] [Revised: 07/29/2022] [Accepted: 07/29/2022] [Indexed: 06/15/2023]
Abstract
Constructed wetlands (CWs) are widely considered resilient systems able to adapt to environmental perturbations. However, little attention has been paid to microbial dynamics as CWs withstand and recover from external shocks. To understand the resilience of CWs, this study investigated rhizosphere microbial dynamics when CWs were subjected to influent COD perturbations (200 to 1600 mg/L). Results demonstrated that CWs adapted strongly to different influent perturbations, with pollutant removal transitioning from fluctuating to stable. Microbial analysis showed that rhizosphere microorganisms competed for niches in response to increased COD concentrations, and that Trichococcus played a key role in resisting concentration perturbations. Structural equation modeling indicated that rhizosphere community succession and microbial energy metabolism were shaped by pH and dissolved oxygen (DO). These findings provide insights into the mechanisms by which CWs maintain stability when facing concentration perturbations.
Affiliation(s)
- Xinyue Zhao
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Mengran Guo
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Juntong Chen
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Zhixuan Zhuang
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Tuoshi Zhang
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Xiaohui Wang
- Beijing Engineering Research Center of Environmental Material for Water Purification, College of Chemical Engineering, Beijing University of Chemical Technology, Beijing 100029, China
- Chunyan Li
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Ning Hou
- College of Resource and Environment, Northeast Agricultural University, Harbin 150030, China
- Shunwen Bai
- School of Environment, State Key Laboratory of Urban Water Resource and Environment, Harbin Institute of Technology, Harbin 150090, China
10
Using an Unsupervised Clustering Model to Detect the Early Spread of SARS-CoV-2 Worldwide. Genes (Basel) 2022; 13:genes13040648. [PMID: 35456454 PMCID: PMC9030792 DOI: 10.3390/genes13040648] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/27/2022] [Revised: 03/29/2022] [Accepted: 04/05/2022] [Indexed: 02/04/2023]
Abstract
Deciphering the population structure of SARS-CoV-2 is critical to inform public health management and reduce the risk of future dissemination. With SARS-CoV-2 genomes continuously accruing worldwide, an effective way to group these genomes is critical for organizing the landscape of the virus's population structure. Taking advantage of recently published state-of-the-art machine learning algorithms, we used an unsupervised deep learning clustering algorithm to group a total of 16,873 SARS-CoV-2 genomes. Using single nucleotide polymorphisms (SNPs) as input features, we identified six major subtypes of SARS-CoV-2. The proportions of the clusters across continents revealed distinct geographical distributions. Comprehensive analysis indicated that both genetic factors and human migration shaped the specific geographical distribution of the population structure. This study provides a different, clustering-based approach to studying the population structure of a novel, fast-growing species such as SARS-CoV-2. Moreover, clustering techniques can be used in further studies of local population structures of the proliferating virus.
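Using SNPs as input features implies turning each genome into a numeric vector. A minimal sketch of one such encoding follows: genomes already aligned to a reference are converted to 0/1 vectors over the positions where any genome differs from the reference. Treating gaps (`-`) and ambiguous bases (`N`) as non-SNPs is an assumption made here for simplicity; the study's exact encoding may differ.

```python
def snp_matrix(reference, genomes, ignore=frozenset("-N")):
    """Encode aligned genomes as binary SNP vectors relative to a reference.

    Returns (positions, rows): the variable positions, and one 0/1 row per
    genome where 1 means the genome differs from the reference there.
    Characters in `ignore` (gaps, ambiguous bases) never count as SNPs.
    """
    for g in genomes:
        if len(g) != len(reference):
            raise ValueError("genomes must be aligned to the reference length")

    def is_snp(base, ref_base):
        return base != ref_base and base not in ignore

    positions = sorted({
        i
        for g in genomes
        for i, (r, b) in enumerate(zip(reference, g))
        if is_snp(b, r)
    })
    rows = [[1 if is_snp(g[i], reference[i]) else 0 for i in positions]
            for g in genomes]
    return positions, rows
```

The resulting matrix (genomes by variable sites) is the kind of input a clustering algorithm, deep or classical, can consume directly.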
11
Li Y, Liu Q, Zeng Z, Luo Y. Unsupervised clustering analysis of SARS-Cov-2 population structure reveals six major subtypes at early stage across the world. bioRxiv: The Preprint Server for Biology 2021:2020.09.04.283358. [PMID: 34845455 PMCID: PMC8629198 DOI: 10.1101/2020.09.04.283358] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 05/14/2023]
Abstract
Identifying the population structure of the newly emerged coronavirus SARS-CoV-2 has significant potential to inform public health management and diagnosis. As SARS-CoV-2 sequencing data accrued, grouping them into clusters became important for organizing the landscape of the virus's population structure. Because of the limited prior information on the newly emerged coronavirus, we used four different clustering algorithms to group 16,873 SARS-CoV-2 strains, which enables automatic identification of the spatial structure of SARS-CoV-2. A total of six distinct genomic clusters were identified using mutation profiles as input features. Comparison of the clustering results revealed that the four algorithms produced highly consistent results, but the state-of-the-art unsupervised deep learning clustering algorithm performed best and produced the smallest intra-cluster pairwise genetic distances. The varied proportions of the six clusters within different continents revealed specific geographical distributions. In particular, our analysis found that Oceania was the only continent on which the strains were distributed across all six clusters. In summary, this study provides a concrete framework for the use of clustering methods to study the global population structure of SARS-CoV-2. In addition, clustering methods can be used in future studies of variant population structures of these fast-growing viruses in specific regions.
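Consistency between clustering algorithms can be quantified with a pair-counting index. The sketch below computes the plain Rand index, the fraction of strain pairs that two clusterings either both group together or both separate; the abstract does not say which agreement measure was used, so this is an illustrative choice rather than the authors' method. It needs at least two items.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree (placed
    together in both, or apart in both). 1.0 means identical partitions,
    regardless of how the cluster labels are named."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Because the index compares pairs rather than label values, clusters named differently by different algorithms (e.g. cluster 0 versus cluster 5) are still recognized as the same grouping. A chance-corrected variant, the adjusted Rand index, is often preferred when many clusters are present.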
Affiliation(s)
- Yawei Li
- Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL 60611, USA
- Qingyun Liu
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Zexian Zeng
- Department of Data Science, Dana Farber Cancer Institute, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Yuan Luo
- Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL 60611, USA
12
Yuan B, Ma B, Yu J, Meng Q, Du T, Li H, Zhu Y, Sun Z, Ma S, Song C. Fecal Bacteria as Non-Invasive Biomarkers for Colorectal Adenocarcinoma. Front Oncol 2021; 11:664321. [PMID: 34447694 PMCID: PMC8383742 DOI: 10.3389/fonc.2021.664321] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 02/05/2021] [Accepted: 06/07/2021] [Indexed: 12/14/2022]
Abstract
Colorectal adenocarcinoma (CRC) ranks among the five most lethal malignant tumors both in China and worldwide. Early diagnosis and treatment of CRC could substantially increase the survival rate. Emerging evidence has revealed the importance of the gut microbiome in CRC; thus, the fecal microbial community could serve as a potential screen for non-invasive diagnosis. Importantly, a small number of bacterial genera serving as non-invasive biomarkers with high sensitivity and specificity would cost less and be of greater clinical benefit than analysis of the whole microbial community. Here we compared the gut microbiome of CRC patients and healthy people using 16S rRNA sequencing, showing divergence in microbial composition between cases and controls. Furthermore, an ExtraTrees classifier was used to classify CRC and healthy-control gut microbiomes, and 13 bacteria were screened as biomarkers for CRC. In addition, 13 biomarkers, including 12 bacterial genera and the fecal occult blood test (FOBT), showed outstanding sensitivity and specificity for discriminating CRC patients from healthy controls. This approach could be used as a non-invasive method for the early diagnosis of CRC.
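Sensitivity and specificity, the two metrics used to judge the biomarker panel, follow directly from a classifier's predictions on labeled samples. The sketch below computes both from predicted and true labels; the label strings ("CRC", "control") are hypothetical, and producing the predictions themselves (e.g. with an ExtraTrees model) is outside its scope.

```python
def sensitivity_specificity(y_true, y_pred, positive="CRC"):
    """Sensitivity (true positive rate) and specificity (true negative
    rate) for a binary classification; any label other than `positive`
    is treated as negative. Both classes must be present in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)
```

For a screening test, sensitivity measures how many true CRC cases are caught, while specificity measures how many healthy people are correctly spared a positive result.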
Affiliation(s)
- Biao Yuan
- Department of Gastroenterological Surgery, Shanghai East Hospital, Tongji University of Medicine, Shanghai, China
- Bin Ma
- Department of Colorectal Surgery, Cancer Hospital of China Medical University, Liaoning Cancer Hospital and Institute, Shenyang, China
- Jing Yu
- Research and Development Department, Shanghai Personal Biotechnology Co., Ltd, Shanghai, China; ECNU-PERSONAL Joint Laboratory of Genetic Detection and Application, Shanghai Personal Biotechnology Co., Ltd, Shanghai, China
- Qingkai Meng
- Department of Colorectal Surgery, Cancer Hospital of China Medical University, Liaoning Cancer Hospital and Institute, Shenyang, China
- Tao Du
- Department of Gastroenterological Surgery, Shanghai East Hospital, Tongji University of Medicine, Shanghai, China
- Hongyi Li
- Research and Development Department, Shanghai Personal Biotechnology Co., Ltd, Shanghai, China
- Yueyan Zhu
- Research and Development Department, Shanghai Personal Biotechnology Co., Ltd, Shanghai, China
- Zikui Sun
- Research and Development Department, Shanghai Personal Biotechnology Co., Ltd, Shanghai, China; ECNU-PERSONAL Joint Laboratory of Genetic Detection and Application, Shanghai Personal Biotechnology Co., Ltd, Shanghai, China
- Siping Ma
- Department of Colorectal Surgery, Cancer Hospital of China Medical University, Liaoning Cancer Hospital and Institute, Shenyang, China
- Chun Song
- Department of Gastroenterological Surgery, Shanghai East Hospital, Tongji University of Medicine, Shanghai, China
13
Marcos-Zambrano LJ, Karaduzovic-Hadziabdic K, Loncar Turukalo T, Przymus P, Trajkovik V, Aasmets O, Berland M, Gruca A, Hasic J, Hron K, Klammsteiner T, Kolev M, Lahti L, Lopes MB, Moreno V, Naskinova I, Org E, Paciência I, Papoutsoglou G, Shigdel R, Stres B, Vilne B, Yousef M, Zdravevski E, Tsamardinos I, Carrillo de Santa Pau E, Claesson MJ, Moreno-Indias I, Truu J. Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment. Front Microbiol 2021; 12:634511. [PMID: 33737920 PMCID: PMC7962872 DOI: 10.3389/fmicb.2021.634511] [Citation(s) in RCA: 139] [Impact Index Per Article: 34.8] [Received: 11/27/2020] [Accepted: 02/01/2021] [Indexed: 12/19/2022]
Abstract
The growing number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all the information in these biological datasets, taking into account the peculiarities of microbiome data, i.e., their compositional, heterogeneous, and sparse nature. Predicting host phenotypes through taxonomy-informed feature selection, in order to establish associations between the microbiome and disease states, is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into developing models that predict outputs (such as classification and prediction in microbiology), infer host phenotypes to predict diseases, and use microbial communities to stratify patients by state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here relate mainly to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic follows the scoping review methodology.
The manual identification of data sources has been complemented with: (1) an automated publication search through the digital libraries of the three major publishers using a natural language processing (NLP) toolkit, and (2) an automated identification of relevant software repositories on GitHub and ranking of the related research papers using a learning-to-rank approach.
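The review stresses that microbiome data are compositional, heterogeneous and sparse. One widely used way to handle compositionality before ML modelling is the centered log-ratio (CLR) transform. The sketch below is illustrative only: it assumes a samples × taxa count matrix and a pseudocount of 0.5 for zeros, neither of which comes from the review, and `clr_transform` is a hypothetical helper name.

```python
import numpy as np

def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio (CLR) transform of a samples x taxa count matrix.

    A pseudocount (assumed value) is added so that zero counts can be logged;
    each row is then closed to proportions, logged and centered on its mean.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    x = x / x.sum(axis=1, keepdims=True)  # closure: each sample sums to 1
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)


counts = np.array([[10, 0, 5, 85],
                   [3, 7, 0, 90]])
z = clr_transform(counts)
print(z.sum(axis=1))  # every CLR row sums to (numerically) zero
```

Because CLR depends only on ratios within a sample, rescaling a sample's counts (for example, a different sequencing depth) leaves its CLR values unchanged when no pseudocount is added; the pseudocount is the price paid for sparsity.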
Affiliation(s)
- Laura Judith Marcos-Zambrano
- Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
- Piotr Przymus
- Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
- Vladimir Trajkovik
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
- Oliver Aasmets
- Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
- Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Magali Berland
- Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
- Aleksandra Gruca
- Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
- Jasminka Hasic
- University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
- Karel Hron
- Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czechia
- Mikhail Kolev
- South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
- Leo Lahti
- Department of Computing, University of Turku, Turku, Finland
- Marta B. Lopes
- NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, Caparica, Portugal
- Centro de Matemática e Aplicações (CMA), FCT, UNL, Caparica, Portugal
- Victor Moreno
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), Barcelona, Spain
- Colorectal Cancer Group, Institut de Recerca Biomedica de Bellvitge (IDIBELL), Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Barcelona, Spain
- Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
- Irina Naskinova
- South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
- Elin Org
- Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
- Inês Paciência
- EPIUnit – Instituto de Saúde Pública da Universidade do Porto, Porto, Portugal
- Rajesh Shigdel
- Department of Clinical Science, University of Bergen, Bergen, Norway
- Blaz Stres
- Group for Microbiology and Microbial Biotechnology, Department of Animal Science, University of Ljubljana, Ljubljana, Slovenia
- Baiba Vilne
- Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia
- Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, Israel
- Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
- Eftim Zdravevski
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
- Marcus J. Claesson
- School of Microbiology & APC Microbiome Ireland, University College Cork, Cork, Ireland
- Isabel Moreno-Indias
- Unidad de Gestión Clínica de Endocrinología y Nutrición, Instituto de Investigación Biomédica de Málaga (IBIMA), Hospital Clínico Universitario Virgen de la Victoria, Universidad de Málaga, Málaga, Spain
- Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y la Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Jaak Truu
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
14
Wang X, Zhao K, Zhou X, Street N. Predicting User Posting Activities in Online Health Communities with Deep Learning. ACM TRANSACTIONS ON MANAGEMENT INFORMATION SYSTEMS 2020. [DOI: 10.1145/3383780] [Indexed: 12/20/2022]
Abstract
Online health communities (OHCs) represent a great source of social support for patients and their caregivers. Better predictions of user activities in OHCs can help improve user engagement and retention, which are important to manage and sustain a successful OHC. This article proposes a general framework to predict OHC user posting activities. Deep learning methods are adopted to learn from users’ temporal trajectories in both the volumes and content of posts published over time. Experiments based on data from a popular OHC for cancer survivors demonstrate that the proposed approach can improve the performance of user activity predictions. In addition, several topics of users’ posts are found to have strong impact on predicting users’ activities in the OHC.
Affiliation(s)
- Xun Zhou
- University of Iowa, Iowa City, IA
15
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that the concepts of correlation and association analyses have shifted with the introduction of network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach.
Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and future direction of association analysis in microbiome and multiomics studies.
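Several of the association tests reviewed here take alpha- or beta-diversity values as their input. As a minimal illustration, the sketch below computes one common alpha-diversity index (Shannon) and one common beta-diversity dissimilarity (Bray-Curtis). These two are picked only as familiar examples (an assumption); the chapter covers many other choices, and the function names are hypothetical.

```python
import numpy as np

def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) of one sample; zero counts are skipped."""
    c = np.asarray(counts, dtype=float)
    p = c[c > 0] / c.sum()
    return float(-(p * np.log(p)).sum())

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity sum|a_i - b_i| / sum(a_i + b_i) between two count vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.abs(a - b).sum() / (a + b).sum())


print(shannon([25, 25, 25, 25]))      # perfectly even community: ln(4)
print(bray_curtis([10, 0], [0, 10]))  # no shared taxa: maximal dissimilarity of 1
```

A per-sample Shannon value can then enter an ordinary regression on a host covariate, while a matrix of pairwise Bray-Curtis values is the usual input to distance-based tests such as PERMANOVA.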
Affiliation(s)
- Yinglin Xia
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States
16
Woloszynek S, Mell JC, Zhao Z, Simpson G, O’Connor MP, Rosen GL. Exploring thematic structure and predicted functionality of 16S rRNA amplicon data. PLoS One 2019; 14:e0219235. [PMID: 31825995 PMCID: PMC6905537 DOI: 10.1371/journal.pone.0219235] [Received: 06/18/2019] [Accepted: 10/19/2019] [Indexed: 12/21/2022]
Abstract
Analysis of microbiome data involves identifying co-occurring groups of taxa associated with sample features of interest (e.g., disease state). Elucidating such relations is often difficult as microbiome data are compositional, sparse, and have high dimensionality. Also, the configuration of co-occurring taxa may represent overlapping subcommunities that contribute to sample characteristics such as host status. Preserving the configuration of co-occurring microbes rather than detecting specific indicator species is more likely to facilitate biologically meaningful interpretations. Additionally, analyses that use taxonomic relative abundances to predict the abundances of different gene functions aggregate predicted functional profiles across taxa. This precludes straightforward identification of predicted functional components associated with subsets of co-occurring taxa. We provide an approach to explore co-occurring taxa using "topics" generated via a topic model and link these topics to specific sample features (e.g., disease state). Rather than inferring predicted functional content based on overall taxonomic relative abundances, we instead focus on inference of functional content within topics, which we parse by estimating interactions between topics and pathways through a multilevel, fully Bayesian regression model. We apply our methods to three publicly available 16S amplicon sequencing datasets: an inflammatory bowel disease dataset, an oral cancer dataset, and a time-series dataset. Using our topic model approach to uncover latent structure in 16S rRNA amplicon surveys, investigators can (1) capture groups of co-occurring taxa termed topics; (2) uncover within-topic functional potential; (3) link taxa co-occurrence, gene function, and environmental/host features; and (4) explore the way in which sets of co-occurring taxa behave and evolve over time. 
These methods have been implemented in a freely available R package: https://cran.r-project.org/package=themetagenomics, https://github.com/EESI/themetagenomics.
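The "topics" described above are latent groups of co-occurring taxa. To make the idea concrete, the sketch below fits plain latent Dirichlet allocation (LDA) by collapsed Gibbs sampling to a samples × taxa count matrix, treating every read as a word. This is an assumed stand-in, not the themetagenomics implementation, which may use a different topic model and inference scheme; the hyperparameters `alpha` and `beta` are illustrative defaults and `lda_gibbs` is a hypothetical name.

```python
import numpy as np

def lda_gibbs(counts, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA on a samples x taxa count matrix.

    Each read is one token of its taxon. Returns (theta, phi): per-sample
    topic proportions and per-topic taxon distributions.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=int)
    n_docs, n_taxa = counts.shape
    # Expand the count matrix into one (sample, taxon) pair per read.
    doc_ids, taxon_ids = np.nonzero(counts)
    reps = counts[doc_ids, taxon_ids]
    d_tok = np.repeat(doc_ids, reps)
    w_tok = np.repeat(taxon_ids, reps)
    z = rng.integers(n_topics, size=d_tok.size)  # random initial topic per read
    ndk = np.zeros((n_docs, n_topics))  # reads per (sample, topic)
    nkw = np.zeros((n_topics, n_taxa))  # reads per (topic, taxon)
    nk = np.zeros(n_topics)             # reads per topic
    np.add.at(ndk, (d_tok, z), 1)
    np.add.at(nkw, (z, w_tok), 1)
    np.add.at(nk, z, 1)
    for _ in range(n_iter):
        for i in range(d_tok.size):
            d, w, k = d_tok[i], w_tok[i], z[i]
            # Take read i out of the counts before resampling its topic.
            ndk[d, k] -= 1
            nkw[k, w] -= 1
            nk[k] -= 1
            # Full conditional p(z_i = k | all other assignments), up to a constant.
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_taxa * beta)
            k = rng.choice(n_topics, p=p / p.sum())
            z[i] = k
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return theta, phi
```

With `theta, phi = lda_gibbs(counts, n_topics=2)`, each row of `theta` describes a sample as a mixture of topics and each row of `phi` lists which taxa co-occur within a topic. The paper's further steps (linking topics to sample features and to predicted functions through a Bayesian regression) are not reproduced here.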
Affiliation(s)
- Stephen Woloszynek
- Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America
- Joshua Chang Mell
- Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, Pennsylvania, United States of America
- Zhengqiao Zhao
- Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America
- Gideon Simpson
- Department of Mathematics, Drexel University, Philadelphia, Pennsylvania, United States of America
- Michael P. O’Connor
- Department of Biodiversity, Earth, and Environmental Science, Drexel University, Philadelphia, Pennsylvania, United States of America
- Gail L. Rosen
- Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America
17
Nembrini S. On what to permute in test-based approaches for variable importance measures in Random Forests. Bioinformatics 2019; 35:2701-2705. [PMID: 30561510 DOI: 10.1093/bioinformatics/bty1025] [Received: 11/09/2018] [Revised: 12/09/2018] [Accepted: 12/12/2018] [Indexed: 11/13/2022]
Abstract
MOTIVATION In bioinformatics applications, it is currently customary to permute the outcome variable in order to produce inference on covariates to test novel methods or statistics whose distributions are poorly known. The seminal publication of Altmann et al. in Bioinformatics uses the same permutation scheme to obtain P-values that can be treated as a corrected measure of feature importance to rectify the bias of the Gini variable importance in Random Forests. Since then, this method has also been used in applied work to draw statistical conclusions on variable importance measures from the resulting P-values. RESULTS In this paper, we show that permuting the outcome may produce unexpected results, including P-values with undesirable properties, and illustrate how more refined permutation schemes can be appropriate to obtain desirable results, including high power in discovering relevant variables. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
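The outcome-permutation scheme the abstract questions can be sketched in a few lines: compute an importance score per feature, recompute it many times with the outcome shuffled, and report the share of permutations that match or beat the observed score. To keep the sketch dependency-free, it uses the absolute Pearson correlation with the outcome as a stand-in importance score instead of random-forest importance (an assumption), and both function names are hypothetical. It shows only the outcome-permutation scheme, not the refined schemes the paper proposes.

```python
import numpy as np

def importance(X, y):
    """Stand-in importance score: |Pearson correlation| of each column of X with y.
    (Random-forest importance is replaced here for simplicity; constant columns
    are assumed absent, since they would divide by zero.)"""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

def outcome_permutation_pvalues(X, y, n_perm=999, seed=0):
    """Permute the outcome, recompute every importance, and return empirical
    p-values (1 + #{null >= observed}) / (1 + n_perm), one per feature."""
    rng = np.random.default_rng(seed)
    observed = importance(X, y)
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        exceed += importance(X, rng.permutation(y)) >= observed
    return (1 + exceed) / (1 + n_perm)
```

Shuffling `y` breaks the link between the outcome and all covariates at once; this is the property the abstract argues can yield P-values with undesirable properties, which motivates permuting something other than the outcome.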
Affiliation(s)
- Stefano Nembrini
- Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA
18
Wassan JT, Wang H, Browne F, Zheng H. Phy-PMRFI: Phylogeny-Aware Prediction of Metagenomic Functions Using Random Forest Feature Importance. IEEE Trans Nanobioscience 2019; 18:273-282. [PMID: 31021803 DOI: 10.1109/tnb.2019.2912824] [Indexed: 11/06/2022]
Abstract
High-throughput sequencing techniques have accelerated functional metagenomics studies through the generation of large volumes of omics data. The integration of these data using computational approaches is potentially useful for predicting metagenomic functions. Machine learning (ML) models can be trained on microbial features and then used to classify microbial data into different functional classes. For example, ML analyses of human microbiome data have been linked to the prediction of important biological states. For analysing omics data, integrating the abundance counts of taxonomic features with their biological relationships is important. These relationships can potentially be uncovered from the phylogenetic tree of microbial taxa. In this paper, we propose a novel integrative framework, Phy-PMRFI. This framework is driven by the phylogeny-based modeling of omics data to predict metagenomic functions using important features selected by a random forest importance (RFI) strategy. The proposed framework integrates the underlying phylogenetic tree information with abundance measures of microbial species (features) by creating a novel phylogeny and abundance aware matrix structure (PAAM). Phy-PMRFI then ranks the microbial features using an RFI measure, and this ranking is used as input for microbiome classification. The resultant feature set enhances the performance of state-of-the-art methods such as support vector machines. Our proposed integrative framework also outperforms the state-of-the-art pipelines of phylogenetic isometric log-ratio transform (PhILR) and MetaPhyl. Phy-PMRFI achieves a prediction accuracy of 90% on the human throat microbiome, compared with 53% for PhILR and 71% for MetaPhyl.
19
Degenhardt F, Seifert S, Szymczak S. Evaluation of variable selection methods for random forests and omics data sets. Brief Bioinform 2019; 20:492-503. [PMID: 29045534 PMCID: PMC6433899 DOI: 10.1093/bib/bbx124] [Received: 04/28/2017] [Revised: 09/06/2017] [Indexed: 12/28/2022]
Abstract
Machine learning methods and in particular random forests are promising approaches for prediction based on high dimensional omics data sets. They provide variable importance measures to rank predictors according to their predictive power. If building a prediction model is the main goal of a study, often a minimal set of variables with good prediction performance is selected. However, if the objective is the identification of involved variables to find active networks and pathways, approaches that aim to select all relevant variables should be preferred. We evaluated several variable selection procedures based on simulated data as well as publicly available experimental methylation and gene expression data. Our comparison included the Boruta algorithm, the Vita method, recurrent relative variable importance, a permutation approach and its parametric variant (Altmann) as well as recursive feature elimination (RFE). In our simulation studies, Boruta was the most powerful approach, followed closely by the Vita method. Both approaches demonstrated similar stability in variable selection, while Vita was the most robust approach under a pure null model without any predictor variables related to the outcome. In the analysis of the different experimental data sets, Vita demonstrated slightly better stability in variable selection and was less computationally intensive than Boruta. In conclusion, we recommend the Boruta and Vita approaches for the analysis of high-dimensional data sets. Vita is considerably faster than Boruta and thus more suitable for large data sets, but only Boruta can also be applied in low-dimensional settings.
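The core idea of the Boruta algorithm evaluated above can be illustrated with "shadow" features: shuffled copies of every predictor that, by construction, carry no signal. A feature counts as a hit in a round if its importance beats the best shadow, and a one-sided binomial test on the hit count decides whether it is kept. This sketch is a simplification under stated assumptions: it substitutes absolute correlation for random-forest importance, runs a fixed number of rounds instead of Boruta's iterative confirm/reject loop, and applies no multiple-testing correction. `boruta_like` is a hypothetical name, not the API of any Boruta package.

```python
import math
import numpy as np

def abs_corr(X, y):
    """Stand-in importance: |Pearson correlation| of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

def boruta_like(X, y, n_rounds=50, alpha=0.05, seed=0):
    """Return a boolean mask of features that beat the best shadow feature
    significantly more often than the 50% expected by chance."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for _ in range(n_rounds):
        # Shadow features: every column shuffled independently, breaking its link to y.
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        imp = abs_corr(np.hstack([X, shadows]), y)
        hits += imp[:n_features] > imp[n_features:].max()
    # One-sided binomial tail P(H >= h) with H ~ Binomial(n_rounds, 0.5).
    p_values = np.array([
        sum(math.comb(n_rounds, k) for k in range(h, n_rounds + 1)) / 2 ** n_rounds
        for h in hits
    ])
    return p_values < alpha
```

Because shadows are rebuilt each round, a feature has to outperform the strongest pure-noise competitor repeatedly, which is what lets this family of methods aim at all relevant variables rather than a minimal predictive subset.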
Affiliation(s)
- Stephan Seifert
- Institute of Medical Informatics and Statistics, Kiel University, Germany
- Silke Szymczak
- Institute of Medical Informatics and Statistics, Kiel University, Germany
20
Xiao J, Chen L, Yu Y, Zhang X, Chen J. A Phylogeny-Regularized Sparse Regression Model for Predictive Modeling of Microbial Community Data. Front Microbiol 2018; 9:3112. [PMID: 30619188 PMCID: PMC6305753 DOI: 10.3389/fmicb.2018.03112] [Received: 08/31/2018] [Accepted: 12/03/2018] [Indexed: 12/16/2022]
Abstract
Fueled by technological advancement, there has been a surge of human microbiome studies surveying the microbial communities associated with the human body and their links with health and disease. As a complement to the human genome, the human microbiome holds great potential for precision medicine. Efficient predictive models based on microbiome data could be potentially used in various clinical applications such as disease diagnosis, patient stratification and drug response prediction. One important characteristic of the microbial community data is the phylogenetic tree that relates all the microbial taxa based on their evolutionary history. The phylogenetic tree is an informative prior for more efficient prediction since the microbial community changes are usually not randomly distributed on the tree but tend to occur in clades at varying phylogenetic depths (clustered signal). Although community-wide changes are possible for some conditions, it is also likely that the community changes are only associated with a small subset of "marker" taxa (sparse signal). Unfortunately, predictive models of microbial community data taking into account both the sparsity and the tree structure remain under-developed. In this paper, we propose a predictive framework to exploit sparse and clustered microbiome signals using a phylogeny-regularized sparse regression model. Our approach is motivated by evolutionary theory, where a natural correlation structure among microbial taxa exists according to the phylogenetic relationship. A novel phylogeny-based smoothness penalty is proposed to smooth the coefficients of the microbial taxa with respect to the phylogenetic tree. Using simulated and real datasets, we show that our method achieves better prediction performance than competing sparse regression methods for sparse and clustered microbiome signals.
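A penalty of the kind described above can be sketched as a lasso term plus a graph-Laplacian smoothness term. The Laplacian term pulls the coefficients of phylogenetically close taxa toward each other, because b'Lb = 1/2 · Σ_jk W_jk (b_j - b_k)². The sketch assumes a symmetric taxon-similarity matrix `W` has already been derived from the tree and fits the objective by proximal gradient descent (ISTA). How `W` is built, and the exact penalty and solver used by Xiao et al., may differ. `phylo_sparse_regression` is a hypothetical name.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def phylo_sparse_regression(X, y, W, lam1=0.1, lam2=0.1, n_iter=2000):
    """Minimize (1/2n)||y - Xb||^2 + lam1*||b||_1 + lam2 * b'Lb by ISTA.

    W: symmetric taxa x taxa similarity matrix (assumed to be derived from the
    phylogeny); L = diag(W 1) - W is its graph Laplacian.
    """
    n, p = X.shape
    L = np.diag(W.sum(axis=1)) - W
    # Lipschitz constant of the smooth part's gradient gives a safe step size.
    lip = np.linalg.norm(X, 2) ** 2 / n + 2 * lam2 * np.linalg.norm(L, 2)
    step = 1.0 / lip
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ b) / n + 2 * lam2 * (L @ b)
        b = soft_threshold(b - step * grad, step * lam1)
    return b
```

Setting `lam2=0` recovers an ordinary lasso (sparse signal only). Larger `lam2` values make taxa that are linked in `W` share more similar coefficients, which corresponds to the clustered-signal scenario in the abstract.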
Affiliation(s)
- Jian Xiao
- Division of Biomedical Statistics and Informatics, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, United States
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Li Chen
- Department of Health Outcomes Research and Policy, Harrison School of Pharmacy, Auburn University, Auburn, AL, United States
- Yue Yu
- Division of Biomedical Statistics and Informatics, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, United States
- Xianyang Zhang
- Department of Statistics, Texas A&M University, College Station, TX, United States
- Jun Chen
- Division of Biomedical Statistics and Informatics, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, United States
21
Forbes JD, Chen CY, Knox NC, Marrie RA, El-Gabalawy H, de Kievit T, Alfa M, Bernstein CN, Van Domselaar G. A comparative study of the gut microbiota in immune-mediated inflammatory diseases-does a common dysbiosis exist? MICROBIOME 2018; 6:221. [PMID: 30545401 PMCID: PMC6292067 DOI: 10.1186/s40168-018-0603-4] [Received: 03/28/2018] [Accepted: 11/25/2018] [Indexed: 05/12/2023]
Abstract
BACKGROUND Immune-mediated inflammatory disease (IMID) represents a substantial health concern. It is widely recognized that IMID patients are at a higher risk for developing secondary inflammation-related conditions. While an ambiguous etiology is common to all IMIDs, in recent years, considerable knowledge has emerged regarding the plausible role of the gut microbiome in IMIDs. This study used 16S rRNA gene amplicon sequencing to compare the gut microbiota of patients with Crohn's disease (CD; N = 20), ulcerative colitis (UC; N = 19), multiple sclerosis (MS; N = 19), and rheumatoid arthritis (RA; N = 21) versus healthy controls (HC; N = 23). Biological replicates were collected from participants within a 2-month interval. This study aimed to identify common (or unique) taxonomic biomarkers of IMIDs using both differential abundance testing and a machine learning approach. RESULTS Significant microbial community differences between cohorts were observed (pseudo F = 4.56; p = 0.01). Richness and diversity were significantly different between cohorts (pFDR < 0.001) and were lowest in CD and highest in HC. Abundances of Actinomyces, Eggerthella, Clostridium III, Faecalicoccus, and Streptococcus (pFDR < 0.001) were significantly higher in all disease cohorts relative to HC, whereas significantly lower abundances were observed for Gemmiger, Lachnospira, and Sporobacter (pFDR < 0.001). Several taxa were found to be differentially abundant in IMIDs versus HC, including significantly higher abundances of Intestinibacter in CD, Bifidobacterium in UC, and unclassified Erysipelotrichaceae in MS and significantly lower abundances of Coprococcus in CD, Dialister in MS, and Roseburia in RA. Machine learning classification of disease versus HC performed best for CD (AUC = 0.93 and AUC = 0.95 for OTU and genus features, respectively), followed by MS, RA, and UC. Gemmiger and Faecalicoccus were identified as important features for classification of subjects to CD and HC.
In general, features identified by differential abundance testing were consistent with machine learning feature importance. CONCLUSIONS This study identified several gut microbial taxa with differential abundance patterns common to IMIDs. We also found differentially abundant taxa between IMIDs. These taxa may serve as biomarkers for the detection and diagnosis of IMIDs and suggest there may be a common component to IMID etiology.
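The classification results above are summarized as AUC values. The AUC has a simple rank interpretation: the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control, with ties counted as one half (a normalized Mann-Whitney U statistic). The sketch below computes it directly from that definition; it makes no assumption about which classifier produced the scores, and `auc_mann_whitney` is a hypothetical name.

```python
import numpy as np

def auc_mann_whitney(scores, labels):
    """ROC AUC as P(score of a random case > score of a random control),
    counting ties as 1/2. labels: 1 (or True) for cases, 0 for controls."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Compare every case with every control via broadcasting.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


# 3 of the 4 case-control pairs are ordered correctly, so the AUC is 0.75.
print(auc_mann_whitney([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

The pairwise comparison is quadratic in sample size, which is harmless for cohorts of the size described here; a rank-based formula gives the same value more cheaply for large data.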
Affiliation(s)
- Jessica D. Forbes
- Department of Internal Medicine, University of Manitoba, Winnipeg, MB Canada
- University of Manitoba IBD Clinical and Research Centre, Winnipeg, MB Canada
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, MB R3E 3R2 Canada
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB Canada
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada
- Chih-yu Chen
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, MB R3E 3R2 Canada
- Natalie C. Knox
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, MB R3E 3R2 Canada
- Ruth-Ann Marrie
- Department of Internal Medicine, University of Manitoba, Winnipeg, MB Canada
- Department of Community Health Sciences, University of Manitoba, Winnipeg, MB Canada
- Hani El-Gabalawy
- Department of Internal Medicine, University of Manitoba, Winnipeg, MB Canada
- Arthritis Centre, University of Manitoba, Winnipeg, MB Canada
- Teresa de Kievit
- Department of Microbiology, University of Manitoba, Winnipeg, MB Canada
- Michelle Alfa
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB Canada
- Charles N. Bernstein
- Department of Internal Medicine, University of Manitoba, Winnipeg, MB Canada
- University of Manitoba IBD Clinical and Research Centre, Winnipeg, MB Canada
- Gary Van Domselaar
- University of Manitoba IBD Clinical and Research Centre, Winnipeg, MB Canada
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, MB R3E 3R2 Canada
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB Canada
22
Sylvester EVA, Bentzen P, Bradbury IR, Clément M, Pearce J, Horne J, Beiko RG. Applications of random forest feature selection for fine-scale genetic population assignment. Evol Appl 2017; 11:153-165. [PMID: 29387152 PMCID: PMC5775496 DOI: 10.1111/eva.12524] [Received: 09/19/2016] [Accepted: 07/11/2017] [Indexed: 01/10/2023]
Abstract
Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine‐learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with FST ranking for selection of single nucleotide polymorphisms (SNP) for fine‐scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self‐assignment accuracy of at least 90% using each method to create panels of 50–700 markers. Panels of SNPs identified using random forest‐based methods performed up to 7.8 and 11.2 percentage points better than FST‐selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self‐assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using FST‐selected panels. Our results demonstrate a role for machine‐learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.
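The FST-ranking baseline that the random-forest methods were compared against can be sketched as follows: estimate FST for every SNP from population allele frequencies, then keep the top-ranked markers. The estimator below is a simple Nei-style (H_T - H_S) / H_T with equally weighted populations; this is an assumption for illustration, and the study may have used a different FST estimator (for example, Weir and Cockerham's). Both function names are hypothetical.

```python
import numpy as np

def fst_nei(freqs):
    """Nei-style F_ST = (H_T - H_S) / H_T for each SNP.

    freqs: populations x SNPs matrix of reference-allele frequencies.
    Populations are weighted equally (an assumption); monomorphic SNPs get 0.
    """
    f = np.asarray(freqs, dtype=float)
    h_s = np.mean(2 * f * (1 - f), axis=0)  # mean within-population heterozygosity
    p_bar = f.mean(axis=0)
    h_t = 2 * p_bar * (1 - p_bar)           # heterozygosity of the pooled population
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(h_t > 0, (h_t - h_s) / h_t, 0.0)

def top_snps(freqs, k):
    """Indices of the k SNPs with the highest F_ST, i.e. a ranked marker panel."""
    return np.argsort(fst_nei(freqs))[::-1][:k]


freqs = np.array([[0.5, 0.0, 0.3],
                  [0.5, 1.0, 0.7]])
print(fst_nei(freqs))      # identical, fixed-different and moderately diverged SNPs
print(top_snps(freqs, 2))  # SNP 1 first, then SNP 2
```

Ranking by a per-marker statistic like this ignores interactions among markers, which is one reason a multivariate selector such as a random forest can assemble a better panel of the same size.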
Affiliation(s)
- Paul Bentzen
- Marine Gene Probe Laboratory, Department of Biology, Dalhousie University, Halifax, NS, Canada
- Marie Clément
- Centre for Fisheries Ecosystems Research, Fisheries and Marine Institute, Memorial University of Newfoundland, St. John's, NL, Canada
- Labrador Institute, Memorial University of Newfoundland, Happy Valley-Goose Bay, NL, Canada
- Jon Pearce
- Northern SE Regional Aquaculture Association, Hidden Falls Hatchery, Sitka, AK, USA
- John Horne
- Marine Gene Probe Laboratory, Department of Biology, Dalhousie University, Halifax, NS, Canada
- Robert G Beiko
- Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
23
Signature of Microbial Dysbiosis in Periodontitis. Appl Environ Microbiol 2017; 83:AEM.00462-17. [PMID: 28476771 DOI: 10.1128/aem.00462-17] [Received: 02/23/2017] [Accepted: 05/02/2017] [Indexed: 01/11/2023]
Abstract
Periodontitis is driven by disproportionate host inflammatory immune responses induced by an imbalance in the composition of oral bacteria; this instigates microbial dysbiosis, along with failed resolution of the chronic destructive inflammation. The objectives of this study were to identify microbial signatures for health and chronic periodontitis at the genus level and to propose a model of dysbiosis, including the calculation of bacterial ratios. Published sequencing data obtained from several different studies (196 subgingival samples from patients with chronic periodontitis and 422 subgingival samples from healthy subjects) were pooled and subjected to a new microbiota analysis using the same Visualization and Analysis of Microbial Population Structures (VAMPS) pipeline, to identify microbiota specific to health and disease. Microbiota were visualized using CoNet and Cytoscape. Dysbiosis ratios, defined as the percentage of genera associated with disease relative to the percentage of genera associated with health, were calculated to distinguish disease from health. Correlations between the proposed dysbiosis ratio and the periodontal pocket depth were tested with a different set of data obtained from a recent study, to confirm the relevance of the ratio as a potential indicator of dysbiosis. Beta diversity showed significant clustering of periodontitis-associated microbiota, at the genus level, according to the clinical status and independent of the methods used. Specific genera (Veillonella, Neisseria, Rothia, Corynebacterium, and Actinomyces) were highly prevalent (>95%) in health, while other genera (Eubacterium, Campylobacter, Treponema, and Tannerella) were associated with chronic periodontitis. The calculation of dysbiosis ratios based on the relative abundance of the genera found in health versus periodontitis was tested. Nonperiodontitis samples were significantly identifiable by low ratios, compared to chronic periodontitis samples. 
When applied to a subgingival sample set with well-defined clinical data, the method showed a strong correlation between the dysbiosis ratio, as well as a simplified ratio (Porphyromonas, Treponema, and Tannerella to Rothia and Corynebacterium), and pocket depth. Microbial analysis of chronic periodontitis can be correlated with the pocket depth through specific signatures for microbial dysbiosis. IMPORTANCE Defining microbiota typical of oral health or chronic periodontitis is difficult. The evaluation of periodontal disease is currently based on probing of the periodontal pocket. However, the status of pockets "on the mend" or sulci at risk of periodontitis cannot be addressed solely through pocket depth measurements or current microbiological tests available for practitioners. Thus, a more specific microbiological measure of dysbiosis could help in future diagnoses of periodontitis. In this work, data from different studies were pooled, to improve the accuracy of the results. However, analysis of multiple species from different studies intensified the bacterial network and complicated the search for reproducible microbial signatures. Despite the use of different methods in each study, investigation of the microbiota at the genus level showed that some genera were prevalent (up to 95% of the samples) in health or disease, allowing the calculation of bacterial ratios (i.e., dysbiosis ratios). The correlation between the proposed ratios and the periodontal pocket depth was tested, which confirmed the link between dysbiosis ratios and the severity of the disease. The results of this work are promising, but longitudinal studies will be required to improve the ratios and to define the microbial signatures of the disease, which will allow monitoring of periodontal pocket recovery and, conceivably, determination of the potential risk of periodontitis among healthy patients.
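The simplified ratio named above can be computed from per-sample genus relative abundances. The sketch reads the ratio as the summed relative abundance of the disease-associated genera divided by the summed relative abundance of the health-associated genera. That reading, counting absent genera as zero, and returning infinity for a zero denominator are all assumptions, and the example abundances are made-up numbers, not data from the study. `dysbiosis_ratio` is a hypothetical name.

```python
def dysbiosis_ratio(rel_abundance, disease_genera, health_genera):
    """Summed relative abundance of disease-associated genera divided by that of
    health-associated genera (an assumed reading of the ratio in the abstract).
    Genera missing from the sample count as 0."""
    disease = sum(rel_abundance.get(g, 0.0) for g in disease_genera)
    health = sum(rel_abundance.get(g, 0.0) for g in health_genera)
    if health == 0:
        return float("inf")  # no health-associated genera detected
    return disease / health


# Genera of the simplified ratio named in the abstract.
SIMPLIFIED_DISEASE = ("Porphyromonas", "Treponema", "Tannerella")
SIMPLIFIED_HEALTH = ("Rothia", "Corynebacterium")

# Made-up relative abundances (%) for one hypothetical sample.
sample = {"Porphyromonas": 4.0, "Treponema": 2.0, "Tannerella": 2.0,
          "Rothia": 3.0, "Corynebacterium": 1.0, "Veillonella": 10.0}
ratio = dysbiosis_ratio(sample, SIMPLIFIED_DISEASE, SIMPLIFIED_HEALTH)
print(ratio)  # (4 + 2 + 2) / (3 + 1) = 2.0
```

Low values point toward a health-like profile and high values toward a periodontitis-like one; per-sample ratios computed this way could then be correlated with pocket depth, as the study did.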
24
Washburne AD, Silverman JD, Leff JW, Bennett DJ, Darcy JL, Mukherjee S, Fierer N, David LA. Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets. PeerJ 2017; 5:e2969. [PMID: 28289558 PMCID: PMC5345826 DOI: 10.7717/peerj.2969] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Received: 10/18/2016] [Accepted: 01/09/2017] [Indexed: 01/06/2023]
Abstract
Marker gene sequencing of microbial communities has generated big datasets of microbial relative abundances varying across environmental conditions, sample sites and treatments. These data often come with putative phylogenies, providing unique opportunities to investigate how shared evolutionary history affects microbial abundance patterns. Here, we present a method to identify the phylogenetic factors driving patterns in microbial community composition. We use the method, "phylofactorization," to re-analyze datasets from the human body and soil microbial communities, demonstrating how phylofactorization is a dimensionality-reducing tool, an ordination-visualization tool, and an inferential tool for identifying edges in the phylogeny along which putative functional ecological traits may have arisen.
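Phylofactorization works on compositional data. Each edge of the phylogeny splits the taxa into two groups, and the contrast between those groups can be summarized per sample as an isometric log-ratio (ILR) balance. The edge whose balance is best explained by a covariate is selected as a factor. The following is a minimal NumPy sketch of one such greedy step, not the authors' implementation. It assumes strictly positive abundances, a single covariate, and explained variance from an ordinary least-squares fit as the objective. The published method iterates this step and may use other objectives. The tree encoding (`edges`), the function names, and the toy data are hypothetical.

```python
import numpy as np


def ilr_balance(X, group, rest):
    """ILR balance contrasting two groups of taxa, one value per sample.

    X: samples-by-taxa array of strictly positive relative abundances
       (zeros would need a pseudocount first; not handled here).
    Computes sqrt(r*s/(r+s)) * log(g(group) / g(rest)), where g is the
    geometric mean across the taxa in each group.
    """
    r, s = len(group), len(rest)
    log_x = np.log(X)
    log_ratio = log_x[:, group].mean(axis=1) - log_x[:, rest].mean(axis=1)
    return np.sqrt(r * s / (r + s)) * log_ratio


def explained_variance(y, z):
    """Sum of squares of y explained by an ordinary least-squares fit y ~ 1 + z."""
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    return float(np.sum((fitted - y.mean()) ** 2))


def best_phylofactor(X, edges, z):
    """One greedy step: the edge whose balance is best explained by covariate z.

    `edges` is a hypothetical tree encoding: each entry lists the taxa (column
    indices) on one side of an edge, with the rest of the taxa on the other side.
    Returns (edge_index, explained_sum_of_squares).
    """
    n_taxa = X.shape[1]
    best = None
    for i, group in enumerate(edges):
        rest = [j for j in range(n_taxa) if j not in group]
        score = explained_variance(ilr_balance(X, list(group), rest), z)
        if best is None or score > best[1]:
            best = (i, score)
    return best


# Toy data (hypothetical): tree ((A,B),(C,D)); taxa C and D rise when z = 1.
X = np.array([
    [0.40, 0.40, 0.10, 0.10],
    [0.35, 0.45, 0.10, 0.10],
    [0.10, 0.10, 0.40, 0.40],
    [0.10, 0.10, 0.45, 0.35],
])
z = np.array([0.0, 0.0, 1.0, 1.0])
edges = [[0], [1], [2], [3], [0, 1]]  # tip edges plus the (A,B) | (C,D) split

print(best_phylofactor(X, edges, z))  # edge 4, the (A,B) | (C,D) split, scores highest
```

The balance depends only on log-ratios, so it is unaffected by the sum-to-one constraint of relative abundances. That constraint is why a compositional method scores edges on balances rather than on raw abundances.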
Affiliation(s)
- Alex D. Washburne
  - Nicholas School of the Environment, Duke University, Durham, NC, United States
- Justin D. Silverman
  - Program for Computational Biology and Bioinformatics, Duke University, Durham, NC, United States
  - Medical Scientist Training Program, Duke University, Durham, NC, United States
  - Center for Genomic and Computational Biology, Duke University, Durham, NC, United States
  - Department of Molecular Genetics and Microbiology, Duke University, Durham, NC, United States
- Jonathan W. Leff
  - Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, CO, United States
- Dominic J. Bennett
  - Department of Earth Science and Engineering, Imperial College London, London, United Kingdom
  - Institute of Zoology, Zoological Society of London, London, United Kingdom
- John L. Darcy
  - Department of Ecology and Evolution, University of Colorado Boulder, Boulder, CO, United States
- Sayan Mukherjee
  - Program for Computational Biology and Bioinformatics, Duke University, Durham, NC, United States
  - Department of Statistical Science, Mathematics, and Computer Science, Duke University, Durham, NC, United States
- Noah Fierer
  - Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, CO, United States
- Lawrence A. David
  - Program for Computational Biology and Bioinformatics, Duke University, Durham, NC, United States
  - Center for Genomic and Computational Biology, Duke University, Durham, NC, United States
  - Department of Molecular Genetics and Microbiology, Duke University, Durham, NC, United States
25
Microbial Malaise: How Can We Classify the Microbiome? Trends Microbiol 2015; 23:671-679. [DOI: 10.1016/j.tim.2015.08.009] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 06/19/2015] [Revised: 08/11/2015] [Accepted: 08/21/2015] [Indexed: 01/05/2023]