1. Lee KH, Coull BA, Moscicki AB, Paster BJ, Starr JR. Bayesian variable selection for multivariate zero-inflated models: Application to microbiome count data. Biostatistics 2021; 21:499-517. [PMID: 30590511 DOI: 10.1093/biostatistics/kxy067] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/27/2017] [Revised: 09/06/2018] [Accepted: 09/09/2018] [Indexed: 01/22/2023]
Abstract
Microorganisms play critical roles in human health and disease. They live in diverse communities in which they interact synergistically or antagonistically. Thus for estimating microbial associations with clinical covariates, such as treatment effects, joint (multivariate) statistical models are preferred. Multivariate models allow one to estimate and exploit complex interdependencies among multiple taxa, yielding more powerful tests of exposure or treatment effects than application of taxon-specific univariate analyses. Analysis of microbial count data also requires special attention because data commonly exhibit zero inflation, i.e., more zeros than expected from a standard count distribution. To meet these needs, we developed a Bayesian variable selection model for multivariate count data with excess zeros that incorporates information on the covariance structure of the outcomes (counts for multiple taxa), while estimating associations with the mean levels of these outcomes. Though there has been much work on zero-inflated models for longitudinal data, little attention has been given to high-dimensional multivariate zero-inflated data modeled via a general correlation structure. Through simulation, we compared performance of the proposed method to that of existing univariate approaches, for both the binary ("excess zero") and count parts of the model. When outcomes were correlated the proposed variable selection method maintained type I error while boosting the ability to identify true associations in the binary component of the model. For the count part of the model, in some scenarios the univariate method had higher power than the multivariate approach. This higher power was at a cost of a highly inflated false discovery rate not observed with the proposed multivariate method. We applied the approach to oral microbiome data from the Pediatric HIV/AIDS Cohort Oral Health Study and identified five (of 44) species associated with HIV infection.
Affiliation(s)
- Kyu Ha Lee
  - The Forsyth Institute, 245 First Street, Cambridge, MA 02142, USA and Department of Oral Health Policy and Epidemiology, Harvard School of Dental Medicine, Boston, MA 02115, USA
- Brent A Coull
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 665 Huntington Avenue, Boston, MA 02115, USA
- Anna-Barbara Moscicki
  - Department of Pediatrics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 10833, USA
- Bruce J Paster
  - The Forsyth Institute, 245 First Street, Cambridge, MA 02142, USA and Department of Oral Medicine, Infection, and Immunity, Harvard School of Dental Medicine, Boston, MA 02115, USA
- Jacqueline R Starr
  - The Forsyth Institute, 245 First Street, Cambridge, MA 02142, USA and Department of Oral Health Policy and Epidemiology, Harvard School of Dental Medicine, Boston, MA 02115, USA
2. Liu X, Zhang B, Tang L, Zhang Z, Zhang N, Allison JJ, Srivastava DK, Zhang H. Are marginalized two-part models superior to non-marginalized two-part models for count data with excess zeroes? Estimation of marginal effects, model misspecification, and model selection. Health Services and Outcomes Research Methodology 2018. [DOI: 10.1007/s10742-018-0183-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/14/2022]