Reference Citation Analysis

For: Xiao J, Chen L, Johnson S, Yu Y, Zhang X, Chen J. Predictive Modeling of Microbiome Data Using a Phylogeny-Regularized Generalized Linear Mixed Model. Front Microbiol 2018;9:1391. [PMID: 29997602 PMCID: PMC6030386 DOI: 10.3389/fmicb.2018.01391] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 02/09/2018] [Accepted: 06/06/2018] [Indexed: 12/21/2022]

Cited by Other Article(s)

Xu H, Wang T, Miao Y, Qian M, Yang Y, Wang S. MK-BMC: a Multi-Kernel framework with Boosted distance metrics for Microbiome data for Classification. Bioinformatics 2024;40:btad757. [PMID: 38200571 PMCID: PMC10789312 DOI: 10.1093/bioinformatics/btad757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 10/30/2023] [Accepted: 01/09/2024] [Indexed: 01/12/2024]

Regueira-Iglesias A, Balsa-Castro C, Blanco-Pintos T, Tomás I. Critical review of 16S rRNA gene sequencing workflow in microbiome studies: From primer selection to advanced data analysis. Mol Oral Microbiol 2023;38:347-399. [PMID: 37804481 DOI: 10.1111/omi.12434] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/26/2023] [Revised: 09/01/2023] [Accepted: 09/14/2023] [Indexed: 10/09/2023]

Abstract

The multi-batch reanalysis approach, which jointly re-evaluates gene/genome sequences from different studies, has gained particular relevance in the literature in recent years. The amount of 16S ribosomal ribonucleic acid (rRNA) gene sequence data stored in public repositories, and of information in taxonomic databases for the same gene, far exceeds that related to complete genomes. This review is intended to guide researchers who are new to studying microbiota, particularly the oral microbiota, using 16S rRNA gene sequencing, as well as those who want to expand and update their knowledge, so that they can optimise their decision-making and improve their research results. First, we describe the advantages and disadvantages of using the 16S rRNA gene as a phylogenetic marker and the latest findings on the impact of primer pair selection on diversity and taxonomic assignment outcomes in oral microbiome studies. Strategies for primer selection based on these results are introduced. Second, we identify the key factors to consider in selecting the sequencing technology and platform. The process and particularities of the main steps for processing 16S rRNA gene-derived data are described in detail to enable researchers to choose the most appropriate bioinformatics pipeline and analysis methods based on the available evidence. We then provide an overview of the different types of advanced analyses, covering both the most widely used in the literature and the most recent approaches. Several indices, metrics and software tools for studying microbial communities are included, highlighting their advantages and disadvantages. Considering the principles of clinical metagenomics, we conclude that future research should focus on rigorous analytical approaches, such as developing predictive models to identify microbiome-based biomarkers that classify health and disease states. Finally, we address the concept of batch effects and the microbiome-specific methods for accounting for or correcting them.
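
The review lists several indices for studying microbial communities without naming them in the abstract. As one illustration (our choice, not necessarily an index the review singles out), the Shannon index of a single sample is H' = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i. A minimal Python sketch, assuming each sample is a plain list of non-negative taxon counts and that no rarefaction or depth correction is needed:

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) for one sample's taxon counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample must contain at least one read")
    h = 0.0
    for c in counts:
        if c < 0:
            raise ValueError("counts must be non-negative")
        if c > 0:  # absent taxa contribute 0, since p*ln(p) -> 0 as p -> 0
            p = c / total
            h -= p * math.log(p)
    return h


def observed_richness(counts: Sequence[int]) -> int:
    """Number of taxa with a non-zero count (observed OTUs/ASVs)."""
    return sum(1 for c in counts if c > 0)


even = [25, 25, 25, 25]  # hypothetical, perfectly even sample
skewed = [97, 1, 1, 1]   # hypothetical sample dominated by one taxon
print(shannon_index(even), shannon_index(skewed))
print(observed_richness(even), observed_richness(skewed))
```

Both hypothetical samples have four observed taxa, but the even one reaches H' = ln 4 ≈ 1.386 while the skewed one scores far lower, which is why richness and evenness-sensitive indices are often reported side by side.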

Li B, Wang T, Qian M, Wang S. MKMR: a multi-kernel machine regression model to predict health outcomes using human microbiome data. Brief Bioinform 2023;24:7142722. [PMID: 37099694 DOI: 10.1093/bib/bbad158] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/28/2022] [Revised: 03/24/2023] [Accepted: 04/03/2023] [Indexed: 04/28/2023]

Yang L, Chen J. Benchmarking differential abundance analysis methods for correlated microbiome sequencing data. Brief Bioinform 2023;24:bbac607. [PMID: 36617187 PMCID: PMC9851339 DOI: 10.1093/bib/bbac607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 11/16/2022] [Accepted: 12/10/2022] [Indexed: 01/09/2023]

Yang L, Chen J. A comprehensive evaluation of microbial differential abundance analysis methods: current status and potential solutions. Microbiome 2022;10:130. [PMID: 35986393 PMCID: PMC9392415 DOI: 10.1186/s40168-022-01320-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 09/09/2021] [Accepted: 07/04/2022] [Indexed: 06/12/2023]

Abstract

BACKGROUND

Differential abundance analysis (DAA) is a central statistical task in microbiome data analysis. A robust and powerful DAA tool can help identify highly confident microbial candidates for further biological validation. Numerous DAA tools have been proposed in the past decade to address the special characteristics of microbiome data, such as zero inflation and compositional effects. Disturbingly, different DAA tools can sometimes produce quite discordant results, opening the possibility of cherry-picking the tool that favors one's own hypothesis. To recommend the best DAA tool or practice to the field, a comprehensive evaluation that covers as many biologically relevant scenarios as possible is critically needed.

RESULTS

We performed by far the most comprehensive evaluation of existing DAA tools using real data-based simulations. We found that DAA methods explicitly addressing compositional effects, such as ANCOM-BC, ALDEx2, metagenomeSeq (fitFeatureModel), and DACOMP, did have improved performance in false-positive control. But they are still not optimal: type 1 error inflation or low statistical power has been observed in many settings. The recent LDM method generally had the best power, but its false-positive control in the presence of strong compositional effects was not satisfactory. Overall, none of the evaluated methods is simultaneously robust, powerful, and flexible, which makes the selection of the best DAA tool difficult. To meet the analysis needs, we designed an optimized procedure, ZicoSeq, drawing on the strengths of the existing DAA methods. We show that ZicoSeq generally controlled false positives across settings, and its power was among the highest. Application of DAA methods to a large collection of real datasets revealed a pattern similar to that observed in the simulation studies.

CONCLUSIONS

Based on the benchmarking study, we conclude that none of the existing DAA methods evaluated can be applied blindly to any real microbiome dataset. The applicability of an existing DAA method depends on specific settings, which are usually unknown a priori. To circumvent the difficulty of selecting the best DAA tool in practice, we designed ZicoSeq, which addresses the major challenges in DAA and remedies the drawbacks of existing DAA methods. ZicoSeq can be applied to microbiome datasets from diverse settings and is a useful DAA tool for robust microbiome biomarker discovery.
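
The compositional effect that these DAA methods try to handle can be shown with a toy example. The sketch below is a hypothetical illustration, not ANCOM-BC, ZicoSeq or any other evaluated tool: only one taxon changes in absolute abundance, yet every other taxon's relative abundance appears to fall, because proportions must sum to one. It also computes the centred log-ratio (CLR) transform, a common starting point for compositional analysis; the abundances are made up and the pseudocount default is an assumption.

```python
import math
from typing import List, Sequence


def relative_abundance(counts: Sequence[float]) -> List[float]:
    """Divide each taxon's count by the sample total (closure to proportions)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample total must be positive")
    return [c / total for c in counts]


def clr(counts: Sequence[float], pseudocount: float = 0.0) -> List[float]:
    """Centred log-ratio: log of each part minus the mean log over all parts.

    A positive pseudocount is needed when zeros are present; its value is an
    analyst's choice (assumption), and 0.0 is used here because there are no zeros.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical absolute abundances: only taxon 0 truly changes (a 10-fold bloom).
control = [100.0, 200.0, 300.0, 400.0]
treated = [1000.0, 200.0, 300.0, 400.0]

rel_control = relative_abundance(control)
rel_treated = relative_abundance(treated)
# Taxa 1-3 are unchanged in absolute terms, yet their proportions all fall.
for i in range(1, 4):
    print(f"taxon {i}: {rel_control[i]:.3f} -> {rel_treated[i]:.3f}")

# On the CLR scale the unchanged taxa all move by one shared offset, so their
# log-ratios to one another are untouched; deciding which taxa form the
# unchanged reference is the hard part that compositional DAA methods address.
shift = [t - c for t, c in zip(clr(treated), clr(control))]
print([round(s, 3) for s in shift])
```

Here taxon 1 drops from 0.200 to about 0.105 in relative abundance, and on the CLR scale taxa 1 to 3 all shift by -ln(10)/4 ≈ -0.576, so a naive per-taxon test on proportions would flag three false positives.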

Huang C, Callahan BJ, Wu MC, Holloway ST, Brochu H, Lu W, Peng X, Tzeng JY. Phylogeny-guided microbiome OTU-specific association test (POST). Microbiome 2022;10:86. [PMID: 35668471 PMCID: PMC9171974 DOI: 10.1186/s40168-022-01266-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Accepted: 04/01/2022] [Indexed: 06/15/2023]

Abstract

BACKGROUND

The relationship between host conditions and microbiome profiles, typically characterized by operational taxonomic units (OTUs), contains important information about the microbial role in human health. Traditional association testing frameworks are challenged by the high dimensionality and sparsity of typical microbiome profiles. Phylogenetic information is often incorporated to address these challenges with the assumption that evolutionarily similar taxa tend to behave similarly. However, this assumption may not always be valid due to the complex effects of microbes, and phylogenetic information should be incorporated in a data-supervised fashion.

RESULTS

In this work, we propose a local collapsing test called the phylogeny-guided microbiome OTU-specific association test (POST). In POST, whether to borrow information from the neighboring OTUs in the phylogenetic tree, and how much to borrow, is supervised by the phylogenetic distance and the outcome-OTU association. POST is constructed under the kernel machine framework to accommodate complex OTU effects and extends kernel machine microbiome tests from the community level to the OTU level. Using simulation studies, we show that when the phylogenetic tree is informative, POST performs better than existing OTU-level association tests. When the phylogenetic tree is not informative, POST achieves performance similar to that of existing methods. Finally, in real data applications on bacterial vaginosis and on preterm birth, we find that POST can identify a similar or larger number of biologically relevant outcome-associated OTUs compared with existing methods.

CONCLUSIONS

Using POST, we show that adaptively leveraging the phylogenetic information can enhance the selection performance of associated microbiome features by improving the overall true-positive and false-positive detection. We developed a user-friendly R package, POSTm, which is freely available on CRAN (https://CRAN.R-project.org/package=POSTm).

Zhou H, He K, Chen J, Zhang X. LinDA: linear models for differential abundance analysis of microbiome compositional data. Genome Biol 2022;23:95. [PMID: 35421994 PMCID: PMC9012043 DOI: 10.1186/s13059-022-02655-5] [Citation(s) in RCA: 70] [Impact Index Per Article: 35.0] [Received: 07/27/2021] [Accepted: 03/14/2022] [Indexed: 12/12/2022]

Liu B, Sträuber H, Saraiva J, Harms H, Silva SG, Kasmanas JC, Kleinsteuber S, Nunes da Rocha U. Machine learning-assisted identification of bioindicators predicts medium-chain carboxylate production performance of an anaerobic mixed culture. Microbiome 2022;10:48. [PMID: 35331330 PMCID: PMC8952268 DOI: 10.1186/s40168-021-01219-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Accepted: 12/17/2021] [Indexed: 05/10/2023]

Abstract

BACKGROUND

The ability to quantitatively predict ecophysiological functions of microbial communities provides an important step to engineer microbiota for desired functions related to specific biochemical conversions. Here, we present the quantitative prediction of medium-chain carboxylate production in two continuous anaerobic bioreactors from 16S rRNA gene dynamics in enriched communities.

RESULTS

By progressively shortening the hydraulic retention time (HRT) from 8 to 2 days with different temporal schemes in two bioreactors operated for 211 days, we achieved higher productivities and yields of the target products n-caproate and n-caprylate. The datasets generated from each bioreactor were used independently to train and test machine learning algorithms that use 16S rRNA gene data to predict n-caproate and n-caprylate productivities. Our dataset consisted of 14 and 40 samples from HRTs of 8 and 2 days, respectively. Because the dataset was small and imbalanced, we compared linear regression, support vector machine and random forest regression algorithms using both the original datasets and balanced datasets generated by synthetic minority oversampling. We also performed cross-validation to estimate model stability. Random forest regression was the best algorithm, producing more consistent results with median error rates below 8%. More than 90% accuracy in the prediction of n-caproate and n-caprylate productivities was achieved. Four inferred bioindicators belonging to the genera Olsenella, Lactobacillus, Syntrophococcus and Clostridium IV suggest that these taxa are relevant to the higher carboxylate productivity at shorter HRT. The recovery of metagenome-assembled genomes of these bioindicators confirmed their genetic potential to perform key steps of medium-chain carboxylate production.

CONCLUSIONS

Shortening the hydraulic retention time of the continuous bioreactor systems allows the communities to be shaped toward desired chain elongation functions. Using machine learning, we demonstrated that 16S rRNA amplicon sequencing data can be used to predict bioreactor process performance quantitatively and accurately. Characterizing and harnessing bioindicators holds promise for steering reactor microbiota toward the target processes. Our mathematical framework is transferable to other ecosystem processes and microbial systems in which community dynamics are linked to key functions. The general methodology used here can be adapted to other functional data types, such as genes, transcripts, proteins or metabolites.
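
SMOTE (synthetic minority oversampling technique) creates new minority-group samples by interpolating between a minority sample and one of its nearest minority neighbours. The abstract does not say how the authors configured it (for example, how groups were defined for their regression targets or which k was used), so the following is a generic sketch of the interpolation step only. It assumes samples are plain feature vectors and that the minority group, such as the 14 samples from the 8-day HRT, has already been separated out; the feature values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-group neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducibility
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority group, excluding itself
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: _euclidean(base, minority[j]))[:k]
        partner = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Hypothetical two-feature profiles (e.g. relative abundances of two taxa).
minority_profiles = [[0.10, 0.30], [0.12, 0.28], [0.08, 0.35], [0.15, 0.25]]
new_samples = smote(minority_profiles, n_new=3, k=2)
print(len(new_samples), new_samples[0])
```

Because every synthetic point is a convex combination of two real minority samples, it never leaves the range spanned by the observed data; the trade-off is that oversampling adds no genuinely new information and can make cross-validation optimistic if synthetic points leak into test folds.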

Zhang L, Wang Y, Chen J, Chen J. RFtest: A Robust and Flexible Community-Level Test for Microbiome Data Powerfully Detects Phylogenetically Clustered Signals. Front Genet 2022;12:749573. [PMID: 35140735 PMCID: PMC8819960 DOI: 10.3389/fgene.2021.749573] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/29/2021] [Accepted: 11/09/2021] [Indexed: 12/31/2022]

Revers A, Zhang X, Zwinderman AH. A Bayesian Negative Binomial Hierarchical Model for Identifying Diet-Gut Microbiome Associations. Front Microbiol 2021;12:711861. [PMID: 34690956 PMCID: PMC8529249 DOI: 10.3389/fmicb.2021.711861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2021] [Accepted: 08/20/2021] [Indexed: 11/13/2022]

Bien J, Yan X, Simpson L, Müller CL. Tree-aggregated predictive modeling of microbiome data. Sci Rep 2021;11:14505. [PMID: 34267244 PMCID: PMC8282688 DOI: 10.1038/s41598-021-93645-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/15/2020] [Accepted: 06/22/2021] [Indexed: 01/05/2023]

Goren E, Wang C, He Z, Sheflin AM, Chiniquy D, Prenni JE, Tringe S, Schachtman DP, Liu P. Feature selection and causal analysis for microbiome studies in the presence of confounding using standardization. BMC Bioinformatics 2021;22:362. [PMID: 34229628 PMCID: PMC8261956 DOI: 10.1186/s12859-021-04232-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/31/2021] [Accepted: 06/03/2021] [Indexed: 12/25/2022]

Abstract

BACKGROUND

Microbiome studies have uncovered associations between microbes and human, animal, and plant health outcomes. This has led to an interest in developing microbial interventions for the treatment of disease and the optimization of crop yields, which requires identifying the microbiome features that affect the outcome in the population of interest. This task is challenging because of the high dimensionality of microbiome data and the confounding that results from the complex and dynamic interactions among host, environment, and microbiome. In the presence of such confounding, variable selection and estimation procedures may perform poorly at identifying microbial features that have an effect on the outcome.

RESULTS

In this manuscript, we aim to estimate population-level effects of individual microbiome features while controlling for confounding by a categorical variable. Due to the high dimensionality and confounding-induced correlation between features, we propose feature screening, selection, and estimation conditional on each stratum of the confounder followed by a standardization approach to estimation of population-level effects of individual features. Comprehensive simulation studies demonstrate the advantages of our approach in recovering relevant features. Utilizing a potential-outcomes framework, we outline assumptions required to ascribe causal, rather than associational, interpretations to the identified microbiome effects. We conducted an agricultural study of the rhizosphere microbiome of sorghum in which nitrogen fertilizer application is a confounding variable. In this study, the proposed approach identified microbial taxa that are consistent with biological understanding of potential plant-microbe interactions.

CONCLUSIONS

Standardization enables more accurate identification of individual microbiome features with an effect on the outcome of interest compared to other variable selection and estimation procedures when there is confounding by a categorical variable.
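
Standardization, in the sense used here, averages stratum-specific effects over the distribution of the confounder. The following is a minimal sketch under simplifying assumptions, not the authors' procedure: a single feature, ordinary least-squares estimation within each stratum, and the sample's own stratum proportions as weights. The paper's screening and selection steps and its causal (potential-outcomes) assumptions are not reproduced, and the stratum names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (single feature, with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("feature is constant within this stratum")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def standardized_effect(strata: Dict[str, Tuple[List[float], List[float]]]) -> float:
    """Weight each stratum-specific slope by that stratum's share of the sample."""
    n_total = sum(len(x) for x, _ in strata.values())
    return sum(len(x) / n_total * ols_slope(x, y) for x, y in strata.values())


# Hypothetical data: feature abundance (x) and outcome (y) under two nitrogen levels.
strata = {
    "low_N": ([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]),  # slope 2.0, 4 samples
    "high_N": ([0.0, 2.0], [1.0, 2.0]),                     # slope 0.5, 2 samples
}
print(standardized_effect(strata))
```

The population-level effect here is 4/6 × 2.0 + 2/6 × 0.5 = 1.5. Estimating within strata first keeps the confounder from mixing the two slopes, and the weighting step then maps the stratum results back to the population of interest.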

Jiang R, Li WV, Li JJ. mbImpute: an accurate and robust imputation method for microbiome data. Genome Biol 2021;22:192. [PMID: 34183041 PMCID: PMC8240317 DOI: 10.1186/s13059-021-02400-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 02/12/2021] [Accepted: 06/04/2021] [Indexed: 12/22/2022]

Sharma D, Paterson AD, Xu W. TaxoNN: ensemble of neural networks on stratified microbiome data for disease prediction. Bioinformatics 2021;36:4544-4550. [PMID: 32449747 PMCID: PMC7750934 DOI: 10.1093/bioinformatics/btaa542] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 03/10/2020] [Revised: 05/08/2020] [Accepted: 05/19/2020] [Indexed: 11/13/2022]

Liu L, Gu H, Van Limbergen J, Kenney T. SuRF: A new method for sparse variable selection, with application in microbiome data analysis. Stat Med 2020;40:897-919. [PMID: 33219557 DOI: 10.1002/sim.8809] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/23/2019] [Revised: 10/25/2020] [Accepted: 10/27/2020] [Indexed: 01/16/2023]

Dong M, Li L, Chen M, Kusalik A, Xu W. Predictive analysis methods for human microbiome data with application to Parkinson's disease. PLoS One 2020;15:e0237779. [PMID: 32834004 PMCID: PMC7446854 DOI: 10.1371/journal.pone.0237779] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/25/2019] [Accepted: 08/03/2020] [Indexed: 12/22/2022]

Abstract

Microbiome data consist of operational taxonomic unit (OTU) counts characterized by zero inflation, overdispersion, and grouping structure among samples. Currently, statistical testing methods are commonly used to identify OTUs that are associated with a phenotype. The limitations of statistical testing methods include that the validity of p-values/q-values depends sensitively on the correctness of the model and that statistical significance does not necessarily imply predictivity. Predictive analysis using methods such as LASSO is an alternative approach for identifying associated OTUs and for measuring how well the phenotype can be predicted from OTUs and other covariates. We investigate three strategies for performing predictive analysis: (1) LASSO: fitting a LASSO multinomial logistic regression model to all OTU counts after a specific transformation; (2) screening+GLM: screening OTUs by the q-values returned from fitting a GLMM to each OTU, then fitting a GLM to the subset of selected OTUs; (3) screening+LASSO: fitting a LASSO to a subset of OTUs selected with the GLMM. We conducted empirical studies using three simulated datasets generated from Dirichlet-multinomial models and a real gut microbiome dataset related to Parkinson’s disease to investigate the performance of the three strategies for predictive analysis. Our simulation studies show that LASSO with an appropriate variable transformation performs remarkably well on zero-inflated data. Our real-data analysis shows that Parkinson’s disease can be predicted with high accuracy (error rate = 0.199, AUC = 0.872, AUPRC = 0.912) from selected OTUs after binary transformation, together with age and sex. These results provide strong evidence of a relationship between Parkinson’s disease and the gut microbiome.
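
The LASSO strategy can be sketched in a few lines. This is a simplified stand-in under stated assumptions, not the authors' pipeline: it applies the binary (presence/absence) transformation mentioned in the abstract, but fits a squared-error LASSO by cyclic coordinate descent instead of the multinomial logistic LASSO they describe, fixes the penalty λ (`lam`) by hand rather than by cross-validation, and uses made-up counts. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def binarize(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """Presence/absence transform: 1.0 if an OTU was observed, else 0.0."""
    return [[1.0 if c > 0 else 0.0 for c in row] for row in counts]


def soft_threshold(z: float, lam: float) -> float:
    """Proximal step for the L1 penalty: shrink z toward 0 by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
             n_iter: int = 200) -> Tuple[float, List[float]]:
    """Minimise (1/2n)||y - b0 - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    # Centre X and y so the unpenalised intercept drops out of the updates.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual when all coefficients are 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # OTU present (or absent) in every sample: no information
            # Correlation of feature j with the partial residual (its own term added back)
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
    intercept = y_mean - sum(m * b for m, b in zip(x_means, beta))
    return intercept, beta


# Hypothetical OTU counts for 4 samples x 2 OTUs, and a 0/1 phenotype.
counts = [[12, 3], [5, 0], [0, 7], [0, 0]]
phenotype = [1.0, 1.0, 0.0, 0.0]
b0, coefs = lasso_cd(binarize(counts), phenotype, lam=0.05)
print(b0, coefs)
```

In this toy data the presence of OTU 1 is unrelated to the phenotype, so the L1 penalty sets its coefficient exactly to zero while OTU 0 keeps a shrunken coefficient of about 0.8; that exact-zero behaviour is what makes LASSO usable for selecting OTUs as well as predicting.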

Pelpolage SW, Yoshida A, Nagata R, Shimada K, Fukuma N, Bochimoto H, Hamamoto T, Hoshizawa M, Nakano K, Han KH, Fukushima M. Frozen Autoclaved Sorghum Enhanced Colonic Fermentation and Lower Visceral Fat Accumulation in Rats. Nutrients 2020;12:E2412. [PMID: 32806549 PMCID: PMC7570106 DOI: 10.3390/nu12082412] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/14/2020] [Revised: 08/05/2020] [Accepted: 08/10/2020] [Indexed: 01/09/2023]

Affiliation(s)

Samanthi W. Pelpolage: Department of Life and Food Sciences, Obihiro University of Agriculture and Veterinary Medicine, West 2-11, Inada, Obihiro 080-8555, Hokkaido, Japan
Atsushi Yoshida: Department of Life and Food Sciences, Obihiro University of Agriculture and Veterinary Medicine, West 2-11, Inada, Obihiro 080-8555, Hokkaido, Japan
Ryuji Nagata: Department of Life and Food Sciences, Obihiro University of Agriculture and Veterinary Medicine, West 2-11, Inada, Obihiro 080-8555, Hokkaido, Japan
Kenichiro Shimada: Department of Life and Food Sciences, Obihiro University of Agriculture and Veterinary Medicine, West 2-11, Inada, Obihiro 080-8555, Hokkaido, Japan
Naoki Fukuma: Department of Life and Food Sciences, Obihiro University of Agriculture and Veterinary Medicine, West 2-11, Inada, Obihiro 080-8555, Hokkaido, Japan; Research Center for Global Agromedicine, Obihiro University of Agriculture and Veterinary Medicine, West 2-11, Inada, Obihiro 080-8555, Hokkaido, Japan
Hiroki Bochimoto: Division of Aerospace Medicine, Department of Cell Physiology, The Jikei University School of Medicine, 3-25-8 Nishishimbashi, Minato-ku, Tokyo 105-8461, Japan
Tetsuo Hamamoto: U.S. Grains Council, 11th Floor, Toranomon Denki Building No. 3, 1-2-20 Toranomon, Minato-ku, Tokyo 105-0001, Japan
Michiyo Hoshizawa: U.S. Grains Council, 11th Floor, Toranomon Denki Building No. 3, 1-2-20 Toranomon, Minato-ku, Tokyo 105-0001, Japan
Koichi Nakano: Nakano Industry Co., Asahishinmachi 33-25 Takamatsu, Kagawa 760-0064, Japan
Kyu-Ho Han: Department of Life and Food Sciences, Obihiro University of Agriculture and Veterinary Medicine, West 2-11, Inada, Obihiro 080-8555, Hokkaido, Japan; Research Center for Global Agromedicine, Obihiro University of Agriculture and Veterinary Medicine, West 2-11, Inada, Obihiro 080-8555, Hokkaido, Japan
Michihiro Fukushima: Department of Life and Food Sciences, Obihiro University of Agriculture and Veterinary Medicine, West 2-11, Inada, Obihiro 080-8555, Hokkaido, Japan

Zhang L, Shi Y, Jenq RR, Do KA, Peterson CB. Bayesian compositional regression with structured priors for microbiome feature selection. Biometrics 2020;77:824-838. [PMID: 32686846 DOI: 10.1111/biom.13335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/07/2019] [Accepted: 07/13/2020] [Indexed: 01/10/2023]

Crawford J, Greene CS. Incorporating biological structure into machine learning models in biomedicine. Curr Opin Biotechnol 2020;63:126-134. [PMID: 31962244 PMCID: PMC7308204 DOI: 10.1016/j.copbio.2019.12.021] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/17/2019] [Revised: 12/17/2019] [Accepted: 12/19/2019] [Indexed: 12/19/2022]

Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Prog Mol Biol Transl Sci 2020;171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 02/07/2023]

Abstract

Correlation and association analyses are among the most widely used statistical methods in many research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target the statistical issues arising from the unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that the concepts of correlation and association analyses have shifted with the introduction of network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized into the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on hypothesis testing in univariate and multivariate regression-based association methods, including alpha and beta diversity-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach.
Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and future direction of association analysis in microbiome and multiomics studies.

Wang Y, Bhattacharya T, Jiang Y, Qin X, Wang Y, Liu Y, Saykin AJ, Chen L. A novel deep learning method for predictive modeling of microbiome data. Brief Bioinform 2020;22:5835556. [PMID: 32406914 DOI: 10.1093/bib/bbaa073] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/17/2019] [Revised: 02/22/2020] [Accepted: 04/10/2020] [Indexed: 12/22/2022]

Abstract

With the development and decreasing cost of next-generation sequencing technologies, the study of the human microbiome has become a rapidly expanding research field, which provides an unprecedented opportunity for various clinical applications such as drug response prediction and disease diagnosis. It is thus essential and desirable to build prediction models for clinical outcomes based on microbiome data, which usually consist of taxon abundances and a phylogenetic tree. Importantly, microbial species are not uniformly distributed in the phylogenetic tree but tend to be clustered at different phylogenetic depths. Therefore, the phylogenetic tree represents a unique correlation structure of the microbiome, which can be an important prior for improving prediction performance. However, prediction methods that consider the phylogenetic tree in an efficient and rigorous way are underdeveloped. Here, we develop a novel deep learning prediction method, MDeep (microbiome-based deep learning method), to predict both continuous and binary outcomes. Conceptually, MDeep designs convolutional layers to mimic taxonomic ranks, with multiple convolutional filters on each convolutional layer to capture the phylogenetic correlation among microbial species in a local receptive field and to maintain the correlation structure across different convolutional layers via feature mapping. Taken together, the convolutional layers and their built-in convolutional filters capture microbial signals at different taxonomic levels while encouraging local smoothing and preserving the local connectivity induced by the phylogenetic tree. We use both simulation studies and real data applications to demonstrate that MDeep outperforms competing methods in both regression and binary classification. Availability and implementation: MDeep software is available at https://github.com/lichen-lab/MDeep (contact: chen61@iu.edu).
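
The idea of a convolutional filter with a local receptive field over phylogenetically ordered taxa can be illustrated without a deep-learning framework. This is a hypothetical toy, not MDeep's code or architecture: it assumes the taxa have already been ordered so that neighbours in the vector are neighbours in the tree, uses hand-set rather than learned weights, and shows a single layer only.

```python
from typing import List, Sequence


def conv1d_valid(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation, as in most deep-learning libraries)."""
    k = len(kernel)
    if k > len(x):
        raise ValueError("kernel longer than input")
    return [sum(w * x[i + t] for t, w in enumerate(kernel)) + bias
            for i in range(len(x) - k + 1)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def conv_layer(x: Sequence[float], kernels: Sequence[Sequence[float]],
               biases: Sequence[float]) -> List[List[float]]:
    """Several filters over the same input; each output list is one feature map."""
    return [relu(conv1d_valid(x, k, b)) for k, b in zip(kernels, biases)]


# Hypothetical abundances of 6 taxa, already ordered so that phylogenetic
# neighbours (e.g. sister leaves) sit next to each other.
abundance = [1.0, 3.0, 0.0, 0.0, 4.0, 2.0]
# Two hand-set filters of width 2 (in a trained network these are learned):
#   [1, 1]  sums each pair of neighbouring taxa (a local "clade total")
#   [1, -1] contrasts neighbouring taxa
maps = conv_layer(abundance, kernels=[[1.0, 1.0], [1.0, -1.0]], biases=[0.0, 0.0])
print(maps)
```

The two feature maps are [4.0, 3.0, 0.0, 4.0, 6.0] and [0.0, 3.0, 0.0, 0.0, 2.0]. Because each output position only sees adjacent taxa, the ordering carries the phylogenetic information; shuffling the taxa would change what every filter aggregates.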

Bichat A, Plassais J, Ambroise C, Mariadassou M. Incorporating Phylogenetic Information in Microbiome Differential Abundance Studies Has No Effect on Detection Power and FDR Control. Front Microbiol 2020;11:649. [PMID: 32351481 PMCID: PMC7174607 DOI: 10.3389/fmicb.2020.00649] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/02/2019] [Accepted: 03/20/2020] [Indexed: 12/18/2022]

Martinez S, Garcia JG, Williams R, Elmassry M, West A, Hamood A, Hurtado D, Gudenkauf B, Ventolini G, Schlabritz-Loutsevitch N. Lactobacilli spp.: real-time evaluation of biofilm growth. BMC Microbiol 2020;20:64. [PMID: 32209050 PMCID: PMC7092459 DOI: 10.1186/s12866-020-01753-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/04/2019] [Accepted: 03/13/2020] [Indexed: 01/10/2023]

Abstract

BACKGROUND

Biofilm is a fundamental bacterial survival mode that proceeds through three main generalized phases: adhesion, maturation, and dispersion. Lactobacilli spp. (LB) are critical components of gut and reproductive health and are widely used as probiotics. Evaluating the time-dependent mechanisms of biofilm formation is important for understanding host-microbial interactions and for developing therapeutic interventions. Time-dependent LB biofilm growth was studied in two systems: a continuous-flow system with large biofilm output (microfermenter (M), Institut Pasteur, France) and an electrical impedance-based, real-time, label-free cell analyzer (C) (xCELLigence, ACEA Bioscience Inc., San Diego, CA). L. plantarum biofilm growth in the M system was video-recorded and then analyzed using IMARIS software (Bitplane, Oxford Instrument Company, Concord, MA, USA). Additionally, whole-genome expression analyses of the attached (A) and dispersed (D) biofilm phases were performed at 24 and 48 h.

RESULTS

The dynamics of L. plantarum biofilm growth were similar in both systems except for the D phases. Comparison of the A- and D-phase transcriptomes revealed that 121 transcripts differed between the two phases at 24 h, and 35 transcripts at 48 h, of M growth. The main pathways down-regulated in A compared with D phases after 24 h were transcriptional regulation, purine nucleotide biosynthesis, and L-aspartate biosynthesis, and the up-regulated pathways were fatty acid and phospholipid metabolism as well as ABC transporters and purine nucleotide biosynthesis. Four LB species differed in the duration and amplitude of their attachment phases, while their growth phases were similar.

CONCLUSION

LB spp. biofilm growth and propagation are dynamic, time-dependent processes with species-specific and time-specific characteristics. The dynamics of LB biofilm growth agree with published pathophysiological data and indicate that real-time evaluation is an important tool for understanding the growth of microbial communities.


Xiao J, Chen L, Yu Y, Zhang X, Chen J. A Phylogeny-Regularized Sparse Regression Model for Predictive Modeling of Microbial Community Data. Front Microbiol 2018;9:3112. [PMID: 30619188 PMCID: PMC6305753 DOI: 10.3389/fmicb.2018.03112] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/31/2018] [Accepted: 12/03/2018] [Indexed: 12/16/2022]

Abstract

Fueled by technological advancement, there has been a surge of human microbiome studies surveying the microbial communities associated with the human body and their links with health and disease. As a complement to the human genome, the human microbiome holds great potential for precision medicine. Efficient predictive models based on microbiome data could potentially be used in various clinical applications such as disease diagnosis, patient stratification, and drug response prediction. One important characteristic of microbial community data is the phylogenetic tree, which relates all the microbial taxa based on their evolutionary history. The phylogenetic tree is an informative prior for more efficient prediction, since microbial community changes are usually not randomly distributed on the tree but tend to occur in clades at varying phylogenetic depths (clustered signal). Although community-wide changes are possible for some conditions, it is also likely that the community changes are associated with only a small subset of "marker" taxa (sparse signal). Unfortunately, predictive models of microbial community data that take into account both the sparsity and the tree structure remain under-developed. In this paper, we propose a predictive framework to exploit sparse and clustered microbiome signals using a phylogeny-regularized sparse regression model. Our approach is motivated by evolutionary theory, according to which a natural correlation structure among microbial taxa exists as a result of their phylogenetic relationships. A novel phylogeny-based smoothness penalty is proposed to smooth the coefficients of the microbial taxa with respect to the phylogenetic tree. Using simulated and real datasets, we show that our method achieves better prediction performance than competing sparse regression methods for sparse and clustered microbiome signals.
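One common way to express a phylogeny-based smoothness penalty is a weighted sum of squared coefficient differences, sum over i<j of w_ij (beta_i - beta_j)^2, where closely related taxa receive large weights. The sketch below assumes that form, a hypothetical exponential kernel w_ij = exp(-d_ij / rho) on patristic distances d_ij, and an objective of squared error plus an L1 (lasso) term plus the smoothness term. The paper's exact penalty and weighting may differ. The code also computes the equivalent graph-Laplacian form beta^T (D - W) beta as a check.

```python
import math
from typing import List, Sequence


def similarity_matrix(dist: Sequence[Sequence[float]], rho: float) -> List[List[float]]:
    """Hypothetical kernel: w_ij = exp(-d_ij / rho), with the diagonal set to 0."""
    p = len(dist)
    return [[0.0 if i == j else math.exp(-dist[i][j] / rho) for j in range(p)] for i in range(p)]


def smoothness_penalty(beta: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Sum over i < j of w_ij * (beta_i - beta_j)^2: large when related taxa disagree."""
    p = len(beta)
    return sum(w[i][j] * (beta[i] - beta[j]) ** 2 for i in range(p) for j in range(i + 1, p))


def laplacian_quadratic(beta: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Equivalent form beta^T (D - W) beta, where D is the diagonal degree matrix of W."""
    p = len(beta)
    deg = [sum(w[i]) for i in range(p)]
    total = 0.0
    for i in range(p):
        for j in range(p):
            l_ij = (deg[i] if i == j else 0.0) - w[i][j]
            total += beta[i] * l_ij * beta[j]
    return total


def objective(beta, X, y, w, lam1: float, lam2: float) -> float:
    """Squared error / (2n) + lam1 * ||beta||_1 (sparsity) + lam2 * smoothness (clustering)."""
    n = len(y)
    rss = sum((y[k] - sum(x * b for x, b in zip(X[k], beta))) ** 2 for k in range(n))
    return rss / (2 * n) + lam1 * sum(abs(b) for b in beta) + lam2 * smoothness_penalty(beta, w)


# Hypothetical tree: taxa 0 and 1 are sisters; taxon 2 is distantly related.
dist = [[0.0, 0.2, 1.0],
        [0.2, 0.0, 1.0],
        [1.0, 1.0, 0.0]]
w = similarity_matrix(dist, rho=0.5)

clade_consistent = [1.0, 1.0, 0.0]  # sisters share an effect
clade_breaking = [1.0, 0.0, 1.0]    # sisters disagree

print(smoothness_penalty(clade_consistent, w))  # about 0.271
print(smoothness_penalty(clade_breaking, w))    # about 0.806
```

The L1 term pushes most coefficients toward zero (sparse signal), while the smoothness term pulls related taxa toward a shared value (clustered signal), so together they target the sparse-and-clustered setting described above. Actually fitting the model would also require an optimizer, such as proximal gradient descent, which this sketch omits.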
