For: Prost V, Gazut S, Brüls T. A zero inflated log-normal model for inference of sparse microbial association networks. PLoS Comput Biol 2021;17:e1009089. [PMID: 34143768 PMCID: PMC8244920 DOI: 10.1371/journal.pcbi.1009089] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/21/2020] [Revised: 06/30/2021] [Accepted: 05/17/2021] [Indexed: 01/03/2023] Open

Cited by Other Article(s)

Lee KH, Pedroza C, Avritscher EBC, Mosquera RA, Tyson JE. Evaluation of negative binomial and zero-inflated negative binomial models for the analysis of zero-inflated count data: application to the telemedicine for children with medical complexity trial. Trials 2023;24:613. [PMID: 37752579 PMCID: PMC10523642 DOI: 10.1186/s13063-023-07648-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Accepted: 09/12/2023] [Indexed: 09/28/2023] Open

Abstract

BACKGROUND

Many commonly used outcomes in medical research are zero-inflated, non-negative integer counts; examples include the number of hospital admissions or emergency department visits, where the majority of patients have zero counts. Zero-inflated regression models were devised to analyze this type of data. However, neither the performance of zero-inflated regression models nor the properties of the data best suited to these analyses have been thoroughly investigated.

METHODS

We conducted a simulation study to evaluate the performance of two generalized linear models, negative binomial and zero-inflated negative binomial, for analyzing zero-inflated count data. Simulation scenarios assumed a randomized controlled trial design and varied the true underlying distribution, sample size, and rate of zero inflation. We compared the models in terms of bias, mean squared error, and coverage. Additionally, we used logistic regression to determine which data properties are most important for predicting the best-fitting model.

RESULTS

First, we found that, regardless of the rate of zero inflation, there was little difference between the conventional negative binomial model and its zero-inflated counterpart in terms of bias of the marginal treatment group coefficient. Second, even when the outcome was simulated from a zero-inflated distribution, the negative binomial model was favored over its zero-inflated counterpart in terms of the Akaike Information Criterion. Third, the mean and skewness of the non-zero part of the data were stronger predictors of model preference than the percentage of zero counts. These results were not affected by the sample size, which ranged from 60 to 800.

CONCLUSIONS

We recommend that the rate of zero inflation and overdispersion in the outcome should not be the sole or main justification for choosing zero-inflated regression models. Investigators should also consider other data characteristics when choosing a model for count data. In addition, if the performance of the negative binomial (NB) and zero-inflated negative binomial (ZINB) regression models is reasonably comparable even with zero-inflated outcomes, we advocate the use of the NB regression model because its results have a clear and straightforward interpretation.
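
The three performance measures named in the Methods (bias, mean squared error, and coverage) can be illustrated with a minimal Python sketch. The helper name `summarize_simulation`, the toy numbers, and the use of a Wald-type 95% interval (estimate ± 1.96 × standard error) are assumptions made for illustration; they are not taken from the study.

```python
from statistics import mean


def summarize_simulation(estimates, std_errors, true_value, z=1.96):
    """Bias, MSE, and interval coverage across repeated simulated fits.

    Hypothetical helper: each entry of `estimates` is one simulated
    replicate's coefficient estimate, with its standard error.
    """
    errors = [est - true_value for est in estimates]
    bias = mean(errors)                    # average signed error
    mse = mean(e * e for e in errors)      # average squared error
    covered = [
        est - z * se <= true_value <= est + z * se
        for est, se in zip(estimates, std_errors)
    ]
    coverage = sum(covered) / len(covered)  # share of intervals containing the truth
    return {"bias": bias, "mse": mse, "coverage": coverage}


# Toy input (made-up numbers): four replicates, true coefficient 0.5.
summary = summarize_simulation(
    estimates=[0.4, 0.5, 0.6, 1.0],
    std_errors=[0.1, 0.1, 0.1, 0.1],
    true_value=0.5,
)
print(summary)
```

In a real comparison, the same summary would be computed once per model (NB and ZINB) and per simulation scenario, so that the models can be compared on identical replicates.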

Lin BM, Cho H, Liu C, Roach J, Ribeiro AA, Divaris K, Wu D. BZINB Model-Based Pathway Analysis and Module Identification Facilitates Integration of Microbiome and Metabolome Data. Microorganisms 2023;11:766. [PMID: 36985339 PMCID: PMC10056694 DOI: 10.3390/microorganisms11030766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 03/04/2023] [Accepted: 03/12/2023] [Indexed: 03/19/2023] Open

Abstract

Integration of multi-omics data is a challenging but necessary step to advance our understanding of the biology underlying human health and disease processes. To date, investigations seeking to integrate multi-omics data (e.g., microbiome and metabolome) have employed simple correlation-based network analyses; however, these methods are not always well suited to microbiome analyses because they do not accommodate the excess zeros typically present in these data. In this paper, we introduce a bivariate zero-inflated negative binomial (BZINB) model-based network and module analysis method that addresses this limitation and improves microbiome-metabolome correlation-based model fitting by accommodating excess zeros. We use real and simulated data based on a multi-omics study of childhood oral health (ZOE 2.0; investigating early childhood dental caries, ECC) and find that the BZINB model-based correlation method is more accurate than Spearman's rank and Pearson correlations in approximating the underlying relationships between microbial taxa and metabolites. The new method, BZINB-iMMPath, facilitates the construction of metabolite-species and species-species correlation networks using BZINB and identifies modules of correlated species by combining BZINB and similarity-based clustering. Perturbations in correlation networks and modules can be efficiently tested between groups (e.g., healthy and diseased study participants). Upon application of the new method to the ZOE 2.0 study microbiome-metabolome data, we find that several biologically relevant correlations of ECC-associated microbial taxa with carbohydrate metabolites differ between healthy and dental caries-affected participants. In sum, we find that the BZINB model is a useful alternative to Spearman or Pearson correlations for estimating the underlying correlation of zero-inflated bivariate count data and thus is suitable for integrative analyses of multi-omics data such as those encountered in microbiome and metabolome studies.
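
Why excess zeros matter for correlation-based analyses can be seen in a small simulation. The Python sketch below is not the BZINB model; it only assumes that two positively correlated latent abundances are each replaced by zero independently with probability `zero_prob`, and compares the Pearson correlation before and after. All names and parameter values are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation, written out with the standard library only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
n, zero_prob = 5000, 0.5  # assumed sample size and excess-zero probability

latent_x, latent_y, observed_x, observed_y = [], [], [], []
for _ in range(n):
    shared = rng.gauss(0, 1)  # common factor that induces the correlation
    x = math.exp(0.6 * shared + 0.3 * rng.gauss(0, 1))
    y = math.exp(0.6 * shared + 0.3 * rng.gauss(0, 1))
    latent_x.append(x)
    latent_y.append(y)
    # Excess zeros: each value is zeroed independently of its size.
    observed_x.append(0.0 if rng.random() < zero_prob else x)
    observed_y.append(0.0 if rng.random() < zero_prob else y)

latent_r = pearson(latent_x, latent_y)
observed_r = pearson(observed_x, observed_y)
print(f"latent r = {latent_r:.2f}, zero-inflated r = {observed_r:.2f}")
```

Under this independence assumption the correlation on the zero-inflated values understates the latent correlation, which is the kind of distortion a model that explicitly represents excess zeros is meant to avoid.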

Acharjee A, Singh U, Choudhury SP, Gkoutos GV. The diagnostic potential and barriers of microbiome based therapeutics. Diagnosis (Berl) 2022;9:411-420. [PMID: 36000189 DOI: 10.1515/dx-2022-0052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/28/2022] [Accepted: 08/03/2022] [Indexed: 02/07/2023]

Cappellato M, Baruzzo G, Di Camillo B. Investigating differential abundance methods in microbiome data: A benchmark study. PLoS Comput Biol 2022;18:e1010467. [PMID: 36074761 PMCID: PMC9488820 DOI: 10.1371/journal.pcbi.1010467] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 12/23/2021] [Revised: 09/20/2022] [Accepted: 08/03/2022] [Indexed: 11/19/2022] Open

Abstract

The development of increasingly efficient and cost-effective high throughput DNA sequencing techniques has enhanced the possibility of studying complex microbial systems. Recently, researchers have shown great interest in studying the microorganisms that characterise different ecological niches. Differential abundance analysis aims to find the differences in the abundance of each taxon between two classes of subjects or samples, assigning a significance value to each comparison. Several bioinformatic methods have been specifically developed, taking into account the challenges of microbiome data, such as sparsity, differences in sequencing depth between samples, and compositionality. Differential abundance analysis has led to important conclusions in different fields, from health to the environment. However, the lack of a known biological truth makes it difficult to validate the results obtained. In this work we exploit metaSPARSim, a microbial sequencing count data simulator, to simulate data with differential abundance features between experimental groups. We perform a complete comparison of recently developed and established methods on a common benchmark, with particular attention to the reliability of both the simulated scenarios and the evaluation metrics. The performance overview includes the investigation of numerous scenarios, studying the effect on the methods’ results of the main covariates, such as sample size, percentage of differentially abundant features, sequencing depth, feature variability, normalisation approach and ecological niche. Mainly, we find that methods show good control of the type I error and, generally, also of the false discovery rate at high sample size, while recall seems to depend on the dataset and sample size.

The microbiota is the set of microorganisms that characterize an ecological environment or niche. Several studies have shown that the microbiota is involved in various biological mechanisms that affect the health or balance of the host organism or the ecosystem. New discoveries and insights have been possible thanks to increasingly efficient sequencing technologies together with the development of bioinformatic computational methods. One of the most interesting analyses in this landscape is the identification of microorganisms that show significantly different abundances when two groups of subjects are analysed. Although many computational methods have been developed, it is still unclear which one has the best performance. Therefore, we exploited a simulator of microbiome data to build a simulation framework that allowed us to carry out an extensive benchmarking of the known tools for differential abundance analysis. Our work is not only a starting point to guide analysts in the choice of tools, but also a first step towards a robust, reliable and fair simulation framework.
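
When the truly differential features are known, as in a simulation, the quantities mentioned in the abstract (recall, false discoveries, type I errors) reduce to set arithmetic. The following Python sketch shows this for a single simulated dataset. The helper name `evaluate_calls` and the toy feature labels are assumptions for illustration and do not reproduce the benchmark's actual evaluation code.

```python
def evaluate_calls(truly_differential, called_differential, all_features):
    """Recall, false discovery proportion, and false positive rate for one dataset.

    Hypothetical helper. The false discovery rate is the expectation of the
    false discovery proportion over repeated datasets; the type I error rate
    is estimated by the false positive rate among truly null features.
    """
    truth = set(truly_differential)
    called = set(called_differential)
    nulls = set(all_features) - truth

    true_pos = len(truth & called)
    false_pos = len(called - truth)

    recall = true_pos / len(truth) if truth else float("nan")
    fdp = false_pos / len(called) if called else 0.0
    fpr = false_pos / len(nulls) if nulls else float("nan")
    return {"recall": recall, "fdp": fdp, "fpr": fpr}


# Toy example (made-up labels): 10 taxa, 4 truly differential, 4 called.
features = [f"taxon{i}" for i in range(1, 11)]
result = evaluate_calls(
    truly_differential=["taxon1", "taxon2", "taxon3", "taxon4"],
    called_differential=["taxon1", "taxon2", "taxon3", "taxon9"],
    all_features=features,
)
print(result)
```

Averaging these values over many simulated datasets per scenario gives the per-method summaries that a benchmark of this kind compares.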

Ma S, Ren B, Mallick H, Moon YS, Schwager E, Maharjan S, Tickle TL, Lu Y, Carmody RN, Franzosa EA, Janson L, Huttenhower C. A statistical model for describing and simulating microbial community profiles. PLoS Comput Biol 2021;17:e1008913. [PMID: 34516542 PMCID: PMC8491899 DOI: 10.1371/journal.pcbi.1008913] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/23/2021] [Revised: 10/05/2021] [Accepted: 08/19/2021] [Indexed: 12/26/2022] Open

Abstract

Many methods have been developed for statistical analysis of microbial community profiles, but due to the complex nature of typical microbiome measurements (e.g. sparsity, zero-inflation, non-independence, and compositionality) and of the associated underlying biology, it is difficult to compare or evaluate such methods within a single systematic framework. To address this challenge, we developed SparseDOSSA (Sparse Data Observations for the Simulation of Synthetic Abundances): a statistical model of microbial ecological population structure, which can be used to parameterize real-world microbial community profiles and to simulate new, realistic profiles of known structure for methods evaluation. Specifically, SparseDOSSA's model captures marginal microbial feature abundances as a zero-inflated log-normal distribution, with additional model components for absolute cell counts and the sequence read generation process, microbe-microbe, and microbe-environment interactions. Together, these allow fully known covariance structure between synthetic features (i.e. "taxa") or between features and "phenotypes" to be simulated for method benchmarking. Here, we demonstrate SparseDOSSA's performance for 1) accurately modeling human-associated microbial population profiles; 2) generating synthetic communities with controlled population and ecological structures; 3) spiking-in true positive synthetic associations to benchmark analysis methods; and 4) recapitulating an end-to-end mouse microbiome feeding experiment. Together, these represent the most common analysis types in assessment of real microbial community environmental and epidemiological statistics, thus demonstrating SparseDOSSA's utility as a general-purpose aid for modeling communities and evaluating quantitative methods. An open-source implementation is available at http://huttenhower.sph.harvard.edu/sparsedossa2.
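
The zero-inflated log-normal distribution used for marginal feature abundances can be sketched in a few lines of Python. This is only an illustration of the marginal distribution, not SparseDOSSA's implementation; the function names and parameter values are assumptions. A draw is zero with probability `zero_prob` and otherwise log-normal with log-scale mean `mu` and standard deviation `sigma`.

```python
import math
import random


def sample_ziln(n, zero_prob, mu, sigma, seed=0):
    """Draw n values from a zero-inflated log-normal (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        0.0 if rng.random() < zero_prob else rng.lognormvariate(mu, sigma)
        for _ in range(n)
    ]


def ziln_mean(zero_prob, mu, sigma):
    """Theoretical mean: (1 - zero_prob) * exp(mu + sigma^2 / 2)."""
    return (1 - zero_prob) * math.exp(mu + sigma ** 2 / 2)


# Illustrative parameters, not estimates from any real dataset.
draws = sample_ziln(n=20000, zero_prob=0.6, mu=1.0, sigma=0.5)
zero_fraction = sum(d == 0.0 for d in draws) / len(draws)
sample_mean = sum(draws) / len(draws)
print(f"zeros: {zero_fraction:.3f}, mean: {sample_mean:.3f}, "
      f"theory: {ziln_mean(0.6, 1.0, 0.5):.3f}")
```

The full model described in the abstract adds further components (cell counts, read generation, and interactions between features) that this marginal sketch leaves out.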

Affiliation(s)

Siyuan Ma: Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Broad Institute, Cambridge, Massachusetts, United States of America
Boyu Ren: Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Broad Institute, Cambridge, Massachusetts, United States of America
Himel Mallick: Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Broad Institute, Cambridge, Massachusetts, United States of America
Yo Sup Moon: Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
Emma Schwager: Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
Sagun Maharjan: Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Broad Institute, Cambridge, Massachusetts, United States of America
Timothy L. Tickle: Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Broad Institute, Cambridge, Massachusetts, United States of America
Yiren Lu: Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
Rachel N. Carmody: Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
Eric A. Franzosa: Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Broad Institute, Cambridge, Massachusetts, United States of America
Lucas Janson: Department of Statistics, Harvard University, Cambridge, Massachusetts, United States of America
Curtis Huttenhower: Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Broad Institute, Cambridge, Massachusetts, United States of America; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
