1
Xie H, Rolka DB, Barker LE. Modeling County-Level Rare Disease Prevalence Using Bayesian Hierarchical Sampling Weighted Zero-Inflated Regression. Journal of Data Science 2023; 21:145-157. [PMID: 38799122 PMCID: PMC11119276 DOI: 10.6339/22-jds1049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Estimates of county-level disease prevalence have a variety of applications. Such estimation is often done via model-based small-area estimation using survey data. However, for conditions with low prevalence (i.e., rare diseases or newly diagnosed diseases), counties with a high fraction of zero counts in surveys are common. Zeros are often more common than the model used would lead one to expect; such zeros are called 'excess zeros'. Excess zeros can be structural (there are no cases to find) or sampling (there are cases, but none were selected for sampling). These issues are often addressed by combining multiple years of data. However, this approach can obscure trends in annual estimates and prevent estimates from being timely. Using single-year survey data, we proposed a Bayesian weighted Binomial Zero-inflated (BBZ) model to estimate county-level rare disease prevalence. The BBZ model accounts for excess zero counts and the sampling weights, and it uses a power prior. We evaluated BBZ with American Community Survey results and simulated data. We showed that BBZ yielded less bias and smaller variance than estimates based on the binomial distribution, a common approach to this problem. Since BBZ uses only a single year of survey data, BBZ produces more timely county-level incidence estimates. These timely estimates help pinpoint areas of special county-level need and help medical researchers and public health practitioners promptly evaluate rare disease trends and associations with other health conditions.
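For orientation, the distinction between structural and sampling zeros can be written as a generic zero-inflated binomial mixture. This is a textbook form offered as a sketch, not the authors' weighted BBZ model; the symbols $\pi_i$ (probability of a structural zero in county $i$), $p_i$ (prevalence) and $n_i$ (number sampled) are assumptions introduced here for illustration.

```latex
\[
P(Y_i = y) =
\begin{cases}
\pi_i + (1 - \pi_i)\,(1 - p_i)^{n_i}, & y = 0,\\[4pt]
(1 - \pi_i)\,\binom{n_i}{y}\, p_i^{y}\,(1 - p_i)^{n_i - y}, & y = 1, \dots, n_i .
\end{cases}
\]
```

The first term of the $y = 0$ case corresponds to structural zeros and the second to sampling zeros.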
Affiliation(s)
- Hui Xie: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Diabetes Translation, Atlanta, Georgia, USA
- Deborah B. Rolka: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Diabetes Translation, Atlanta, Georgia, USA
- Lawrence E. Barker: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office of the Director, Atlanta, Georgia, USA (retired)
2
Chen HF, Karim SA. Relationship between political partisanship and COVID-19 deaths: future implications for public health. J Public Health (Oxf) 2022; 44:716-723. [PMID: 33912968 PMCID: PMC8135482 DOI: 10.1093/pubmed/fdab136] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 03/30/2021] [Revised: 03/30/2021] [Accepted: 04/12/2021] [Indexed: 01/15/2023]
Abstract
BACKGROUND COVID-19 has impacted more than 200 countries. However, in the USA, the response to the COVID-19 pandemic has been politically polarized. The objective of this study is to investigate the association between political partisanship and COVID-19 death rates in the USA. METHODS This study used longitudinal county-level panel data, segmented into ten 30-day time periods, consisting of all counties in the USA, from 22 January 2020 to 5 December 2020. The outcome measure is the total number of COVID-19 deaths per 30-day period. The key explanatory variable is county political partisanship, dichotomized as Democratic or Republican. The analysis used a zero-inflated negative binomial (ZINB) regression. RESULTS When compared with Republican counties, COVID-19 death rates in Democratic counties were significantly higher (IRRs ranged from 2.0 to 18.3, P < 0.001) in Time 1-Time 5, but in Time 9-Time 10 were significantly lower (IRRs ranged from 0.43 to 0.69, P < 0.001). CONCLUSION The reversed trend in COVID-19 death rates between Democratic and Republican counties was influenced by the politically polarized response to the pandemic. The findings support the necessity of evidence-based public health leadership and management in maneuvering the USA out of the current COVID-19 pandemic and preparing for future public health crises.
Affiliation(s)
- Hsueh-Fen Chen: Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, 100, Shih-Chuan 1st Road, Kaohsiung 80708, Taiwan
- Saleema A Karim: Department of Health Policy and Management, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
3
Fernandez GA, Vatcheva KP. A comparison of statistical methods for modeling count data with an application to hospital length of stay. BMC Med Res Methodol 2022; 22:211. [PMID: 35927612 PMCID: PMC9351158 DOI: 10.1186/s12874-022-01685-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/16/2022] [Accepted: 07/11/2022] [Indexed: 11/22/2022]
Abstract
Background Hospital length of stay (LOS) is a key indicator of hospital care management efficiency, cost of care, and hospital planning. Hospital LOS is often used as a measure of a post-medical procedure outcome, as a guide to the benefit of a treatment of interest, or as an important risk factor for adverse events. Therefore, understanding hospital LOS variability is always an important healthcare focus. Hospital LOS data can be treated as count data, with discrete and non-negative values, typically right skewed, and often exhibiting excessive zeros. In this study, we compared the performance of the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB) regression models using simulated and empirical data. Methods Data were generated under different simulation scenarios with varying sample sizes, proportions of zeros, and levels of overdispersion. Analysis of hospital LOS was conducted using empirical data from the Medical Information Mart for Intensive Care database. Results Results showed that Poisson and ZIP models performed poorly on overdispersed data. ZIP outperformed the rest of the regression models when the overdispersion was due to zero-inflation only. NB and ZINB regression models faced substantial convergence issues when incorrectly used to model equidispersed data. The NB model provided the best fit on overdispersed data and outperformed the ZINB model in many simulation scenarios with combinations of zero-inflation and overdispersion, regardless of the sample size. In the empirical data analysis, we demonstrated that fitting incorrect models to overdispersed data led to incorrect regression coefficient estimates and overstated significance of some of the predictors. Conclusions Based on this study, we recommend that researchers consider ZIP models for count data with zero-inflation only and NB models for overdispersed data or data with combinations of zero-inflation and overdispersion. If the researcher believes there are two different data generating mechanisms producing zeros, then the ZINB regression model may provide greater flexibility when modeling the zero-inflation and overdispersion.
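As a rough illustration of the zero-inflation and overdispersion discussed in this abstract, the sketch below simulates zero-inflated Poisson counts and compares the observed share of zeros with what a plain Poisson of the same mean would predict. It is not the authors' simulation design; the function names `simulate_zip` and `zero_summary` and the parameter values (`lam=3.0`, `pi=0.3`, `n=5000`) are hypothetical choices made for this example.

```python
import numpy as np


def simulate_zip(n, lam, pi, rng):
    """Draw n zero-inflated Poisson counts.

    With probability pi an observation is a structural zero;
    otherwise it is an ordinary Poisson(lam) draw (which may also be 0).
    """
    structural = rng.random(n) < pi
    counts = rng.poisson(lam, size=n)
    counts[structural] = 0
    return counts


def zero_summary(y):
    """Return (observed zero share, Poisson-implied zero share, variance/mean)."""
    mean = y.mean()
    observed_zero = float(np.mean(y == 0))
    poisson_zero = float(np.exp(-mean))  # P(Y = 0) for a Poisson with the same mean
    dispersion = float(y.var(ddof=1) / mean)
    return observed_zero, poisson_zero, dispersion


rng = np.random.default_rng(0)
y = simulate_zip(5000, lam=3.0, pi=0.3, rng=rng)
observed_zero, poisson_zero, dispersion = zero_summary(y)
print(f"observed zeros: {observed_zero:.3f}")
print(f"Poisson-implied zeros: {poisson_zero:.3f}")
print(f"variance / mean: {dispersion:.2f}")
```

An observed zero share well above the Poisson-implied share, together with a variance-to-mean ratio above 1, is the pattern that motivates considering ZIP, NB, or ZINB models rather than plain Poisson regression.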
Affiliation(s)
- Gustavo A Fernandez: School of Mathematical and Statistical Sciences, University of Texas Rio Grande Valley, One West University Boulevard, Brownsville Campus, Brownsville, TX, 78520, USA
- Kristina P Vatcheva: School of Mathematical and Statistical Sciences, University of Texas Rio Grande Valley, One West University Boulevard, Brownsville Campus, Brownsville, TX, 78520, USA
4
Luo D, Liu W, Chen T, An L. A Distribution-Free Model for Longitudinal Metagenomic Count Data. Genes (Basel) 2022; 13:1183. [PMID: 35885966 PMCID: PMC9316307 DOI: 10.3390/genes13071183] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/11/2022] [Revised: 06/25/2022] [Accepted: 06/28/2022] [Indexed: 02/05/2023]
Abstract
Longitudinal metagenomics has been widely studied over the past decade to provide valuable insight for understanding microbial dynamics. Correlation within each subject can be observed across repeated measurements. However, previous methods that assume independence between repeated measurements may suffer from incorrect inferences. In addition, methods that do account for intra-sample correlation may not be applicable to count data. We proposed a distribution-free approach, namely CorrZIDF, which extends the current method to model correlated zero-inflated metagenomic count data, offering a powerful and accurate solution for detecting significant features. This method can handle different working correlation structures without specifying each marginal distribution of the count data. Through simulation studies, we have shown the robustness of CorrZIDF when selecting a working correlation structure for repeated measures studies to enhance the efficiency of estimation. We also compared four methods using two real datasets, and the newly proposed method identified more unique features that were reported previously in the relevant research.
Affiliation(s)
- Dan Luo: Department of Epidemiology and Biostatistics, The University of Arizona, Tucson, AZ 85721, USA
- Wenwei Liu: Interdisciplinary Program of Statistics and Data Science, The University of Arizona, Tucson, AZ 85721, USA
- Tian Chen: Statistical and Quantitative Sciences, Takeda Pharmaceuticals, Cambridge, MA 02139, USA
- Lingling An: Department of Epidemiology and Biostatistics, The University of Arizona, Tucson, AZ 85721, USA; Interdisciplinary Program of Statistics and Data Science, The University of Arizona, Tucson, AZ 85721, USA; Department of Biosystems Engineering, The University of Arizona, Tucson, AZ 85721, USA
5
Chen C, Shen C. Distribution‐free model selection for longitudinal zero‐inflated count data with missing responses and covariates. Stat Med 2022; 41:3180-3198. [DOI: 10.1002/sim.9411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2021] [Revised: 03/24/2022] [Accepted: 04/01/2022] [Indexed: 11/05/2022]
Affiliation(s)
- Chun‐Shu Chen: Graduate Institute of Statistics, National Central University, Taoyuan, Taiwan, Republic of China
- Chung‐Wei Shen: Department of Mathematics, National Chung Cheng University, Chia‐Yi, Taiwan, Republic of China
6
Time Series Regression for Zero-Inflated and Overdispersed Count Data: A Functional Response Model Approach. Journal of Statistical Theory and Practice 2020. [DOI: 10.1007/s42519-020-00094-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
7
Gushi LL, Sousa MDLRD, Frias AC, Antunes JLF. Factors associated with the impact of oral health conditions on the daily activities of adolescents, State of São Paulo, 2015. Revista Brasileira de Epidemiologia 2020; 23:e200098. [DOI: 10.1590/1980-549720200098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2019] [Accepted: 04/01/2020] [Indexed: 11/21/2022]
Abstract
ABSTRACT: Objective: To assess the association between impact on daily activities and sociodemographic variables and oral health parameters among adolescents in the State of São Paulo. Methods: Cross-sectional study with data from 5,409 adolescents who participated in the 2015 “Pesquisa Estadual de Saúde Bucal de São Paulo - SB” (São Paulo State Oral Health Survey). Impact on daily activities was assessed with the oral impacts on daily performances (OIDP) index, by prevalence (presence or absence of impact) and by severity of impact (OIDP scores). A zero-inflated negative binomial regression model was used, taking the sampling weights into account. Prevalence ratios (PR), ratios of means (RM), and confidence intervals (CI) were calculated. Results: The prevalence of impact on daily activities was 37.4%. After adjustment, female sex remained associated with higher prevalence (PR = 1.59; 95% CI 1.36-1.81) and severity of impact (RM = 1.49; 95% CI 1.22-1.81). Compared with whites, the other groups had a higher prevalence of impact. Family income above R$ 2,501 (RM = 0.79; 95% CI 0.64-0.98) and household crowding (RM = 1.18; 95% CI 1.00-1.39) were associated with severity of impact. Among oral health conditions, untreated caries (PR = 1.46; 95% CI 1.23-1.74) and gingival bleeding (PR = 1.35; 95% CI 1.14-1.60) remained associated with a higher prevalence of impact. Conclusion: Female sex, non-white skin color, untreated caries, and gingival bleeding were associated with greater impact on daily activities. Income above R$ 2,500 and living in less crowded households were associated with lower impact.
8
Chen T, Zhang H, Zhang B. A semiparametric marginalized zero-inflated model for analyzing healthcare utilization panel data with missingness. J Appl Stat 2019; 46:2862-2883. [PMID: 32952258 DOI: 10.1080/02664763.2019.1620705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. However, interpretations of those models focus on the at-risk subpopulation of a two-component population mixture and fail to provide direct inference about marginal effects for the overall population. Recently, new approaches have been proposed to facilitate such marginal inferences for count responses with excess zeros. However, they are likelihood based and impose strong assumptions on data distributions. In this paper, we propose a new distribution-free, or semiparametric, alternative to provide robust inference for marginal effects when population mixtures are defined by zero-inflated count outcomes. The proposed method also applies to longitudinal studies with missing data following the general missing at random mechanism. The proposed approach is illustrated with both simulated and real study data.
Affiliation(s)
- Tian Chen: Department of Mathematics and Statistics, University of Toledo, Toledo, OH 43606, U.S.A.
- Hui Zhang: Department of Biostatistics, St. Jude Children's Research Hospital, TN 38105, U.S.A.
- Bo Zhang: Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA 01605, U.S.A.
9
Ye P, Tang W, He J, He H. A GEE-type approach to untangle structural and random zeros in predictors. Stat Methods Med Res 2018; 28:3683-3696. [PMID: 30472921 DOI: 10.1177/0962280218812228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
Count outcomes with excessive zeros are common in behavioral and social studies, and zero-inflated count models such as zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) can be applied when such zero-inflated count data are used as the response variable. However, when zero-inflated count data are used as predictors, ignoring the difference between structural and random zeros can result in biased estimates. In this paper, a generalized estimating equation (GEE)-type mixture model is proposed to jointly model the response of interest and the zero-inflated count predictors. Simulation studies show that the proposed method performs well in practical settings and is more robust to model misspecification than the likelihood-based approach. A case study is also provided for illustration.
Affiliation(s)
- Peng Ye: School of Statistics, University of International Business and Economics, Beijing, China; Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Wan Tang: Department of Global Biostatistics and Data Science, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Jiang He: Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Hua He: Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
10
He H, Zhang H, Ye P, Tang W. A test of inflated zeros for Poisson regression models. Stat Methods Med Res 2017; 28:1157-1169. [PMID: 29284370 DOI: 10.1177/0962280217749991] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Excessive zeros are common in practice and may cause overdispersion and invalidate inference when fitting Poisson regression models. There is a large body of literature on zero-inflated Poisson models. However, methods for testing whether there are excessive zeros are less well developed. The Vuong test comparing a Poisson and a zero-inflated Poisson model is commonly applied in practice. However, the type I error of the test often deviates seriously from the nominal level, casting serious doubt on the validity of the test in such applications. In this paper, we develop a new approach for testing inflated zeros under the Poisson model. Unlike the Vuong test for inflated zeros, our method does not require a zero-inflated Poisson model to perform the test. Simulation studies show that, compared with the Vuong test, our approach is not only better at controlling the type I error rate but also yields more power.
Affiliation(s)
- Hua He: Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Hui Zhang: Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Peng Ye: Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA; School of Statistics, University of International Business and Economics, Beijing, China
- Wan Tang: Department of Global Biostatistics and Data Science, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
11
Chen T, Kowalski J, Chen R, Wu P, Zhang H, Feng C, Tu XM. Rank-preserving regression: a more robust rank regression model against outliers. Stat Med 2016; 35:3333-46. [DOI: 10.1002/sim.6930] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/29/2015] [Revised: 02/10/2016] [Accepted: 02/16/2016] [Indexed: 11/07/2022]
Affiliation(s)
- Tian Chen: Department of Mathematics and Statistics, University of Toledo, Toledo, OH 43606, U.S.A.
- Jeanne Kowalski: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, U.S.A.
- Rui Chen: Consumer Behavior, Amazon.com, Inc., 333 Boren Ave N, Seattle, WA 98109, U.S.A.
- Pan Wu: Value Institute, Christiana Care Health System, John H Ammon Medical Education Center, Newark, DE 19718, U.S.A.
- Hui Zhang: Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, U.S.A.
- Changyong Feng: Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, U.S.A.
- Xin M. Tu: Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, U.S.A.
12
Chen T, Wu P, Tang W, Zhang H, Feng C, Kowalski J, Tu XM. Variable selection for distribution-free models for longitudinal zero-inflated count responses. Stat Med 2016; 35:2770-85. [PMID: 26844819 DOI: 10.1002/sim.6892] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/24/2015] [Revised: 01/08/2016] [Accepted: 01/08/2016] [Indexed: 11/08/2022]
Abstract
Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from assumed distributions. Recently, new approaches have been proposed to provide distribution-free, or semi-parametric, alternatives. These methods extend the generalized estimating equations to provide robust inference for population mixtures defined by zero-inflated count outcomes. In this paper, we propose methods to extend smoothly clipped absolute deviation (SCAD)-based variable selection methods to these new models. Variable selection has been gaining popularity in modern clinical research studies, as determining differential treatment effects of interventions for different subgroups has become the norm, rather than the exception, in the era of patient-centered outcome research. Such moderation analysis in general creates many explanatory variables in regression analysis, and the advantages of SCAD-based methods over their traditional counterparts render them a great choice for addressing this important and timely issue in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Tian Chen: Department of Mathematics and Statistics, University of Toledo, Toledo, OH 43606, U.S.A.
- Pan Wu: Value Institute, Christiana Care Health System, John H Ammon Medical Education Center, Newark, DE 19718, U.S.A.
- Wan Tang: Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA 70112, U.S.A.
- Hui Zhang: Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, U.S.A.
- Changyong Feng: Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, U.S.A.
- Jeanne Kowalski: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, U.S.A.
- Xin M Tu: Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, U.S.A.; Department of Psychiatry, University of Rochester, Rochester, NY 14642, U.S.A.