1
|
Bindele HF, Nguelifack BM. Generalized signed-rank estimation for regression models with non-ignorable missing responses. Comput Stat Data Anal 2019. [DOI: 10.1016/j.csda.2019.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
2
|
Xu D, Tang N. Bayesian adaptive Lasso for quantile regression models with nonignorably missing response data. COMMUN STAT-SIMUL C 2019. [DOI: 10.1080/03610918.2018.1468452] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Affiliation(s)
- Dengke Xu
- Department of Statistics, Yunnan University, Kunming, People’s Republic of China
- Department of Statistics, Zhejiang Agriculture and Forestry University, Hangzhou, People’s Republic of China
- Niansheng Tang
- Department of Statistics, Yunnan University, Kunming, People’s Republic of China
|
3
|
Affiliation(s)
- Huybrechts F. Bindele
- Department of Mathematics and Statistics; University of South Alabama; Mobile AL 36688-0002 U.S.A
- Akim Adekpedjou
- Department of Mathematics and Statistics; Missouri University of Science and Technology; Rolla MO 65409 U.S.A
|
4
|
Abstract
It is common in longitudinal studies that missing data occur due to subjects' nonresponse, missed visits, dropout, death or other reasons during the course of study. To perform valid analysis in this setting, data missing not at random (MNAR) have to be considered. However, models for data MNAR often suffer from the identifiability issue and hence result in difficulty in estimation and computational convergence. To ameliorate this issue, we propose the LASSO and ridge-regularized selection models that regularize the missing data mechanism model to handle data MNAR, with the regularization parameter selected via a cross-validation procedure. The proposed models can also be employed for sensitivity analysis to examine the effects on inference of different assumptions about the missing data mechanism. We illustrate the performance of the proposed models via simulation studies and the analysis of data from a randomized clinical trial.
Affiliation(s)
- Chi-Hong Tseng
- Department of Medicine, University of California, Los Angeles
|
5
|
Fu YZ. Stochastic EM algorithm of a finite mixture model from hurdle Poisson distribution with missing responses. COMMUN STAT-THEOR M 2016. [DOI: 10.1080/03610926.2014.953689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
6
|
Xu D, Daniels MJ, Winterstein AG. Sequential BART for imputation of missing covariates. Biostatistics 2016; 17:589-602. [PMID: 26980459 DOI: 10.1093/biostatistics/kxw009] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 08/27/2015] [Accepted: 02/01/2016] [Indexed: 11/13/2022] Open
Abstract
To conduct comparative effectiveness research using electronic health records (EHR), many covariates are typically needed to adjust for selection and confounding biases. Unfortunately, it is typical to have missingness in these covariates. Just using cases with complete covariates will result in considerable efficiency losses and likely bias. Here, we consider the covariates missing at random with missing data mechanism either depending on the response or not. Standard methods for multiple imputation can either fail to capture nonlinear relationships or suffer from the incompatibility and uncongeniality issues. We explore a flexible Bayesian nonparametric approach to impute the missing covariates, which involves factoring the joint distribution of the covariates with missingness into a set of sequential conditionals and applying Bayesian additive regression trees to model each of these univariate conditionals. Using data augmentation, the posterior for each conditional can be sampled simultaneously. We provide details on the computational algorithm and make comparisons to other methods, including parametric sequential imputation and two versions of multiple imputation by chained equations. We illustrate the proposed approach on EHR data from an affiliated tertiary care institution to examine factors related to hyperglycemia.
Affiliation(s)
- Dandan Xu
- Department of Statistics, University of Florida, Gainesville, FL 32601, USA
- Michael J Daniels
- Departments of Integrative Biology, and Statistics & Data Sciences, The University of Texas at Austin, Austin, TX 78712, USA
- Almut G Winterstein
- Departments of Pharmaceutical Outcomes & Policy, and Epidemiology, University of Florida, Gainesville, FL 32601, USA
|
7
|
Wang F, Song PXK, Wang L. Merging multiple longitudinal studies with study-specific missing covariates: A joint estimating function approach. Biometrics 2015; 71:929-40. [PMID: 26193911 DOI: 10.1111/biom.12356] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 04/01/2014] [Revised: 04/01/2015] [Accepted: 05/01/2015] [Indexed: 11/28/2022]
Abstract
Merging multiple datasets collected from studies with identical or similar scientific objectives is often undertaken in practice to increase statistical power. This article concerns the development of an effective statistical method that enables merging of multiple longitudinal datasets subject to various heterogeneous characteristics, such as different follow-up schedules and study-specific missing covariates (e.g., covariates observed in some studies but missing in other studies). The presence of study-specific missing covariates presents a great methodological challenge in data merging and analysis. We propose a joint estimating function approach to addressing this challenge, in which a novel nonparametric estimating function constructed via splines-based sieve approximation is utilized to bridge estimating equations from studies with missing covariates to those with fully observed covariates. Under mild regularity conditions, we show that the proposed estimator is consistent and asymptotically normal. We evaluate finite-sample performances of the proposed method through simulation studies. In comparison to the conventional multiple imputation approach, our method exhibits smaller estimation bias. We provide an illustrative data analysis using longitudinal cohorts collected in Mexico City to assess the effect of lead exposures on children's somatic growth.
Affiliation(s)
- Fei Wang
- Global Analytics, Ford Motor Credit, Dearborn, Michigan 48126, U.S.A
- Peter X-K Song
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A
- Lu Wang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A
|
8
|
Chen Q, Wu H, Ware LB, Koyama T. A Bayesian Approach for the Cox Proportional Hazards Model with Covariates Subject to Detection Limit. Int J Stat Med Res 2014; 3:32-43. [PMID: 24772198 PMCID: PMC3998726 DOI: 10.6000/1929-6029.2014.03.01.5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/05/2022]
Abstract
Research on biomarkers has been limited in its effectiveness because biomarker levels can only be measured within the thresholds of assays and laboratory instruments, a challenge referred to as a detection limit (DL) problem. In this paper, we propose a Bayesian approach to the Cox proportional hazards model with explanatory variables subject to lower, upper, or interval DLs. We demonstrate that by formulating the time-to-event outcome using the Poisson density with counting process notation, implementing the proposed approach in OpenBUGS and JAGS is straightforward. We have conducted extensive simulations to compare the proposed Bayesian approach to four other commonly used methods and to evaluate its robustness with respect to the distribution assumption of the biomarkers. The proposed Bayesian approach and other methods were applied to an acute lung injury study, in which a panel of cytokine biomarkers was studied for the biomarkers’ association with ventilation-free survival.
Affiliation(s)
- Qingxia Chen
- Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, 37232, USA; Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, 37232, USA
- Huiyun Wu
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee, 38105, USA
- Lorraine B Ware
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, 37232, USA
- Tatsuki Koyama
- Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, 37232, USA
|
9
|
Tang NS, Zhao H. Bayesian Analysis of Nonlinear Reproductive Dispersion Mixed Models for Longitudinal Data with Nonignorable Missing Covariates. COMMUN STAT-SIMUL C 2013. [DOI: 10.1080/03610918.2012.732175] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
|
10
|
Kalaylioglu ZI, Ozturk O. Bayesian semiparametric models for nonignorable missing mechanisms in generalized linear models. J Appl Stat 2013. [DOI: 10.1080/02664763.2013.794329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
11
|
Fonseca RS, Valença DM, Bolfarine H. Cure rate survival models with missing covariates: a simulation study. J STAT COMPUT SIM 2013. [DOI: 10.1080/00949655.2011.613396] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/17/2022]
|
12
|
Wu H, Chen Q, Ware LB, Koyama T. A Bayesian Approach for Generalized Linear Models with Explanatory Biomarker Measurement Variables Subject to Detection Limit - an Application to Acute Lung Injury. J Appl Stat 2012; 39:1733-1747. [PMID: 23049157 PMCID: PMC3463110 DOI: 10.1080/02664763.2012.681362] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 02/08/2023]
Abstract
Biomarkers have the potential to improve our understanding of disease diagnosis and prognosis. Biomarker levels that fall below the assay detection limits (DLs), however, compromise the application of biomarkers in research and practice. Most existing methods to handle non-detects focus on a scenario in which the response variable is subject to detection limit; only a few methods consider explanatory variables when dealing with DLs. We propose a Bayesian approach for generalized linear models with explanatory variables subject to lower, upper, or interval DLs. In simulation studies, we compared the proposed Bayesian approach to four commonly used methods in a logistic regression model with explanatory variable measurements subject to DL. We also applied the Bayesian approach and other four methods in a real study, in which a panel of cytokine biomarkers was studied for their association with acute lung injury (ALI). We found that IL8 was associated with a moderate increase in risk for ALI in the model based on the proposed Bayesian approach.
Affiliation(s)
- Huiyun Wu
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Qingxia Chen
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Lorraine B. Ware
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
- Tatsuki Koyama
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
|
13
|
Fu YZ. Model Selection of Zero-Inflated Generalized Power Series Distribution with Missing Responses. COMMUN STAT-THEOR M 2012. [DOI: 10.1080/03610926.2010.535633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
|
14
|
Fu YZ, Chen XD. Model selection of generalized partially linear models with missing covariates. J Stat Plan Inference 2012. [DOI: 10.1016/j.jspi.2011.07.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]
|
15
|
Wand H, Whitaker C, Ramjee G. Geoadditive models to assess spatial variation of HIV infections among women in local communities of Durban, South Africa. Int J Health Geogr 2011; 10:28. [PMID: 21496324 PMCID: PMC3098769 DOI: 10.1186/1476-072x-10-28] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 10/27/2010] [Accepted: 04/17/2011] [Indexed: 11/21/2022] Open
Abstract
BACKGROUND: The severity of the HIV/AIDS epidemic in South Africa varies between and within provinces, with differences noted even at the suburban scale. We investigated the geographical variability of HIV infection in rural areas of the eThekwini Metropolitan Municipality in KwaZulu-Natal province, South Africa. METHOD: We used geoadditive models to assess nonlinear geographical variation in HIV prevalence while simultaneously controlling for important demographic and sexual risk factors. A total of 3,469 women who were screened for a Phase-III randomized trial were included in the current analysis. RESULTS: We found significant spatial patterns that could not be explained by demographic and sexual risk behaviors. In particular, the epidemic was determined to be much worse 44 km south of Durban after controlling for all demographic and sexual risk behaviors. CONCLUSION: The study revealed significant geographic variability in HIV infection in the eThekwini Metropolitan Municipality in KwaZulu-Natal, South Africa.
Affiliation(s)
- Handan Wand
- National Centre in HIV Epidemiology and Clinical Research, Sydney, Australia
- Claire Whitaker
- HIV Prevention Research Unit, Medical Research Council, Durban, South Africa
- Gita Ramjee
- HIV Prevention Research Unit, Medical Research Council, Durban, South Africa
|
16
|
Seo B. A gradient-based algorithm for semiparametric models with missing covariates. J STAT COMPUT SIM 2011. [DOI: 10.1080/00949650903359848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
17
|
|
18
|
Mireku N, Wang Y, Ager J, Reddy RC, Baptist AP. Changes in weather and the effects on pediatric asthma exacerbations. Ann Allergy Asthma Immunol 2009; 103:220-4. [PMID: 19788019 DOI: 10.1016/s1081-1206(10)60185-8] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Indexed: 10/19/2022]
Abstract
BACKGROUND: Pediatric asthma exacerbations may correlate with changes in weather, yet this relationship is not well defined. OBJECTIVE: To determine the effects of fluctuations in climatic factors (temperature, humidity, and barometric pressure) on pediatric asthma exacerbations. METHODS: A retrospective study was performed at 1 large urban hospital during a 2-year period (January 1, 2004, to December 31, 2005). Children presenting to the emergency department (ED) for an asthma exacerbation were included. Data on climatic factors, pollutants, and aeroallergens were collected daily. The relationship of daily (intraday) or between-day (interday) changes in climatic factors and asthma ED visits was evaluated using time series analysis, controlling for seasonality, air pollution, and aeroallergen exposure. The effects of climatic factors were evaluated on the day of admission (T=0) and up to 5 days before admission (T-5 through T-1). RESULTS: There were 25,401 asthma ED visits. A 10% intraday increase in humidity on day T-1 or day T-2 was associated with approximately 1 additional ED visit for asthma (P < .001 and P = .01, respectively). Interday changes in humidity from day T-3 to T-2 were also associated with more ED visits (P < .001). Interday changes in temperature from T-1 to T=0 increased ED visits, with a 10°F increase being associated with 1.8 additional visits (P = .006). No association was found with changes in barometric pressure. CONCLUSION: Fluctuations in humidity and temperature, but not barometric pressure, appear to influence ED visits for pediatric asthma. The additional ED visits occur 1 to 2 days after the fluctuation.
Affiliation(s)
- Nana Mireku
- Division of Allergy and Immunology; Children's Hospital of Michigan, Wayne State University School of Medicine, Detroit, Michigan, USA
|
19
|
Ghosh P, Tu W. Assessing Sexual Attitudes and Behaviors of Young Women: A Joint Model with Nonlinear Time Effects, Time Varying Covariates, and Dropouts. J Am Stat Assoc 2009. [DOI: 10.1198/jasa.2009.0013] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/21/2022]
|
20
|
Abstract
Incomplete data are quite common in biomedical and other types of research, especially in longitudinal studies. During the last three decades, a vast amount of work has been done in the area. This has led, on the one hand, to a rich taxonomy of missing-data concepts, issues, and methods and, on the other hand, to a variety of data-analytic tools. Elements of taxonomy include: missing data patterns, mechanisms, and modeling frameworks; inferential paradigms; and sensitivity analysis frameworks. These are described in detail. A variety of concrete modeling devices is presented. To make matters concrete, two case studies are considered. The first one concerns quality of life among breast cancer patients, while the second one examines data from the Muscatine children's obesity study.
Affiliation(s)
- Joseph G. Ibrahim
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Geert Molenberghs
- Center for Statistics, International Institute for Biostatistics and Statistical Bioinformatics, Hasselt University and Catholic University Leuven, Agoralaan 1, 3590 Diepenbeek, Belgium
|
21
|
|
22
|
Abstract
Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology - thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.
Affiliation(s)
- David Ruppert
- School of Operations Research and Information Engineering, Cornell University, 1170 Comstock Hall, Ithaca, NY 14853, U.S.A
|
23
|
Ghosh P, Tu W. Assessing Sexual Attitudes and Behaviors of Young Women: A Joint Model with Nonlinear Time Effects, Time Varying Covariates, and Dropouts. J Am Stat Assoc 2008; 103:1496-1507. [PMID: 19300533 PMCID: PMC2657729 DOI: 10.1198/016214508000000850] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
Understanding human sexual behaviors is essential for the effective prevention of sexually transmitted infections. Analysis of longitudinally measured sexual behavioral data, however, is often complicated by zero-inflation of event counts, nonlinear time trend, time-varying covariates, and informative dropouts. Ignoring these complicating factors could undermine the validity of the study findings. In this paper, we put forth a unified joint modeling structure that accommodates these features of the data. Specifically, we propose a pair of simultaneous models for the zero-inflated event counts: Each of these models contains an auto-regressive structure for the accommodation of the effect of recent event history, and a nonparametric component for the modeling of nonlinear time effect. Informative dropout and time varying covariates are modeled explicitly in the process. Model fitting and parameter estimation are carried out in a Bayesian paradigm by the use of a Markov Chain Monte Carlo (MCMC) method. Analytical results showed that adolescent sexual behaviors tended to evolve nonlinearly over time and they were strongly influenced by the day-to-day variations in mood and sexual interests. These findings suggest that adolescent sex is to a large extent driven by intrinsic factors rather than being compelled by circumstances, thus highlighting the need of education on self protective measures against infection risks.
Affiliation(s)
- Pulak Ghosh
- Assistant Professor, Department of Mathematics and Statistics, Georgia State University, Atlanta, GA 30303
- Wanzhu Tu
- Associate Professor, Division of Biostatistics, Indiana University School of Medicine, 410 West 10th Street, Suite 3000, Indianapolis IN 46202-3002; also Research Scientist at Regenstrief Institute, Inc.
|
24
|
Chen Q, Ibrahim JG, Chen MH, Senchaudhuri P. Theory and Inference for Regression Models with Missing Responses and Covariates. J MULTIVARIATE ANAL 2008; 99:1302-1331. [PMID: 19169388 DOI: 10.1016/j.jmva.2007.08.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 10/22/2022]
Abstract
In this paper, we carry out an in-depth theoretical investigation for inference with missing response and covariate data for general regression models. We assume that the missing data are Missing at Random (MAR) or Missing Completely at Random (MCAR) throughout. Previous theoretical investigations in the literature have focused only on missing covariates or missing responses, but not both. Here, we consider theoretical properties of the estimates under three different estimation settings: complete case analysis (CC), a complete response analysis (CR) that involves an analysis of those subjects with only completely observed responses, and the all case analysis (AC), which is an analysis based on all of the cases. Under each scenario, we derive general expressions for the likelihood and devise estimation schemes based on the EM algorithm. We carry out a theoretical investigation of the three estimation methods in the normal linear model and analytically characterize the loss of information for each method, as well as derive and compare the asymptotic variances for each method assuming the missing data are MAR or MCAR. In addition, a theoretical investigation of bias for the CC method is also carried out. A simulation study and real dataset are given to illustrate the methodology.
Affiliation(s)
- Qingxia Chen
- Qingxia Chen is Assistant Professor, Department of Biostatistics, Vanderbilt University, Nashville, TN 37232. Joseph G. Ibrahim is Professor, Department of Biostatistics, University of North Carolina, McGavran-Greenberg Hall, Chapel Hill, NC 27599. Ming-Hui Chen is Professor, Department of Statistics, University of Connecticut, 215 Glenbrook Road, U-4120, Storrs, CT 06269-4120. Pralay Senchaudhuri is Director of Cytel Software Corporation, Cambridge, MA 02139.
|