1. Bridge H, Morgan KE, Frost C. Negative variance components and intercept-slope correlations greater than one in magnitude: How do such "non-regular" random intercept and slope models arise, and what should be done when they do? Stat Med 2024; 43:2747-2764. [PMID: 38695394 DOI: 10.1002/sim.10070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 02/17/2024] [Accepted: 03/19/2024] [Indexed: 06/15/2024]
Abstract
Statistical models with random intercepts and slopes (RIAS models) are commonly used to analyze longitudinal data. Fitting such models sometimes results in negative estimates of variance components or estimates on parameter space boundaries. This can be an unlucky chance occurrence, but can also occur because certain marginal distributions are mathematically identical to those from RIAS models with negative intercept and/or slope variance components and/or intercept-slope correlations greater than one in magnitude. We term such parameters "pseudo-variances" and "pseudo-correlations," and the models "non-regular." We use eigenvalue theory to explore how and when such non-regular RIAS models arise, showing: (i) A small number of measurements, short follow-up, and large residual variance increase the parameter space for which data (with a positive semidefinite marginal variance-covariance matrix) are compatible with non-regular RIAS models. (ii) Non-regular RIAS models can arise from model misspecification, when non-linearity in fixed effects is ignored or when random effects are omitted. (iii) A non-regular RIAS model can sometimes be interpreted as a regular linear mixed model with one or more additional random effects, which may not be identifiable from the data. (iv) Particular parameterizations of non-regular RIAS models have no generality for all possible numbers of measurements over time. Because of this lack of generality, we conclude that non-regular RIAS models can only be regarded as plausible data-generating mechanisms in some situations. Nevertheless, fitting a non-regular RIAS model can be acceptable, allowing unbiased inference on fixed effects where commonly recommended alternatives such as dropping the random slope result in bias.
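The core observation can be made concrete with a small numerical sketch. A RIAS model with measurement times t implies the marginal variance-covariance matrix V = Z G Z' + σ²I, where Z = [1, t] and G holds the intercept variance, slope variance and intercept-slope covariance. The parameter values below (intercept "pseudo-variance" -0.5, slope variance 0.1, zero covariance, residual variance 2, times 0, 1, 2) are hypothetical and chosen only for illustration; they are not taken from the paper. The eigenvalues show that G is not a valid covariance matrix, yet V is positive definite. With a small residual variance, the same G no longer yields a valid V.

```python
import numpy as np

def rias_marginal_cov(times, var_int, var_slope, cov_int_slope, var_resid):
    """Return (G, V) for a random intercept and slope model, where
    V = Z G Z' + var_resid * I and Z = [1, t]."""
    t = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(t), t])        # random-effects design
    G = np.array([[var_int, cov_int_slope],
                  [cov_int_slope, var_slope]])       # random-effects (pseudo-)covariance
    V = Z @ G @ Z.T + var_resid * np.eye(len(t))     # marginal covariance of y
    return G, V

# Hypothetical non-regular parameters with a large residual variance
G, V = rias_marginal_cov([0, 1, 2], -0.5, 0.1, 0.0, var_resid=2.0)
print(np.linalg.eigvalsh(G))   # contains a negative eigenvalue: G is not a covariance matrix
print(np.linalg.eigvalsh(V))   # all positive: V is a valid marginal covariance

# Same G with a small residual variance: V itself is no longer valid
_, V_small = rias_marginal_cov([0, 1, 2], -0.5, 0.1, 0.0, var_resid=0.1)
print(np.linalg.eigvalsh(V_small).min() < 0)   # True
```

Checking eigenvalues of V rather than of G reflects the paper's framing. What the data can identify is the marginal distribution, so a fit is only "non-regular" relative to the random-effects parameterization.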
Affiliation(s)
- Helen Bridge
- Alumna, London School of Hygiene and Tropical Medicine, London, UK
- Katy E Morgan
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Chris Frost
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
2. Kundu D, Sarkar P, Gogoi MP, Das K. A Bayesian joint model for multivariate longitudinal and time-to-event data with application to ALL maintenance studies. J Biopharm Stat 2024; 34:37-54. [PMID: 36882959 DOI: 10.1080/10543406.2023.2187413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2022] [Accepted: 02/25/2023] [Indexed: 03/09/2023]
Abstract
The most common type of cancer diagnosed among children is acute lymphocytic leukemia (ALL). A study was conducted by the Tata Translational Cancer Research Center (TTCRC), Kolkata, in which 236 children diagnosed with ALL were treated for approximately the first two years with two standard drugs (6MP and MTx) and were then followed for nearly 3 more years. The goal is to identify the longitudinal biomarkers that are associated with time-to-relapse, and also to assess the effectiveness of the drugs. We develop a Bayesian joint model in which a linear mixed model is used to jointly model three biomarkers (white blood cell count, neutrophil count, and platelet count) and a semi-parametric proportional hazards model is used to model the time-to-relapse. Our proposed joint model can assess the effects of different covariates on the progression of the biomarkers, and the effects of the biomarkers (and the covariates) on time-to-relapse. In addition, the proposed joint model can efficiently impute missing longitudinal biomarker values. Our analysis shows that the white blood cell (WBC) count is not associated with time-to-relapse, but the neutrophil count and the platelet count are significantly associated with it. We also infer that a lower dose of 6MP and a higher dose of MTx jointly result in a lower relapse probability in the follow-up period. Interestingly, we find that relapse probability is lowest for the patients classified into the "high-risk" group at presentation. The effectiveness of the proposed joint model is assessed through extensive simulation studies.
Affiliation(s)
- Damitri Kundu
- Applied Statistics Division, Indian Statistical Institute, Kolkata, India
- Partha Sarkar
- Department of Statistics, University of Florida, Gainesville, Florida, USA
- Manash Pratim Gogoi
- Tata Translational Cancer Research Centre, Tata Medical Center, Kolkata, India
- Kiranmoy Das
- Applied Statistics Division, Indian Statistical Institute, Kolkata, India
3. Cai M, van Buuren S, Vink G. Graphical and numerical diagnostic tools to assess multiple imputation models by posterior predictive checking. Heliyon 2023; 9:e17077. [PMID: 37360073 PMCID: PMC10285146 DOI: 10.1016/j.heliyon.2023.e17077] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/24/2022] [Revised: 06/03/2023] [Accepted: 06/06/2023] [Indexed: 06/28/2023]
Abstract
Problem: The congeniality of the imputation model is crucial for valid statistical inference. Hence, it is important to develop methodologies for diagnosing imputation models. Aim: We propose and evaluate a new diagnostic method based on posterior predictive checking to diagnose the congeniality of fully conditional imputation models. Our method applies to multiple imputation by chained equations, which is widely used in statistical software. Methods: The proposed method compares the observed data with their replicates generated under the corresponding posterior predictive distributions to diagnose the performance of imputation models. The method applies to various imputation models, including parametric and semi-parametric approaches, and to both continuous and discrete incomplete variables. We studied the validity of the method through simulation and application. Results: The proposed diagnostic method based on posterior predictive checking demonstrates its validity in assessing the performance of imputation models. The method can diagnose the consistency of imputation models with the substantive model and can be applied to a broad range of research contexts. Conclusion: The diagnostic method based on posterior predictive checking provides a valuable tool for researchers who use fully conditional specification to handle missing data. By assessing the performance of imputation models, our method can help researchers improve the accuracy and reliability of their analyses. Because it applies to different imputation models, it is a versatile tool for identifying plausible imputation models.
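The general mechanics of posterior predictive checking can be sketched for the simplest case: a univariate normal model under the standard noninformative prior p(μ, σ²) ∝ 1/σ². The sketch draws parameters from the posterior, replicates the observed data, and reports how often a test statistic of the replicate is at least as large as the observed one. This is a generic illustration of the technique, not the authors' procedure for chained-equation imputation models. The function name `ppc_pvalue` and the choice of test statistic are assumptions.

```python
import math
import random
import statistics

def ppc_pvalue(y_obs, stat, n_draws=2000, seed=1):
    """Posterior predictive p-value P(T(y_rep) >= T(y_obs) | y_obs) for a
    normal model with prior p(mu, sigma^2) proportional to 1 / sigma^2."""
    rng = random.Random(seed)
    n = len(y_obs)
    ybar = statistics.fmean(y_obs)
    s2 = statistics.variance(y_obs)
    t_obs = stat(y_obs)
    exceed = 0
    for _ in range(n_draws):
        # sigma^2 | y is scaled inverse chi-square: (n - 1) s^2 / chi2_{n-1};
        # a chi-square with k d.f. is a Gamma(shape=k/2, scale=2) variate
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # mu | sigma^2, y ~ N(ybar, sigma^2 / n)
        mu = rng.gauss(ybar, math.sqrt(sigma2 / n))
        # Replicate a dataset of the same size from the posterior predictive
        y_rep = [rng.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]
        if stat(y_rep) >= t_obs:
            exceed += 1
    return exceed / n_draws

# A skewed sample that a normal model fits poorly in the upper tail
skewed = [0.0] * 19 + [10.0]
print(ppc_pvalue(skewed, max))  # expected to be close to 0
```

A p-value near 0 or 1 flags a feature of the observed data that the model rarely reproduces. Values near 0.5 indicate no evidence of misfit for that particular statistic.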
Affiliation(s)
- Mingyang Cai
- Corresponding author at: Sjoerd Groenman building, Padualaan 14, 3584 CH, Utrecht, the Netherlands.
4. Kundu D, Sarkar P, Das K. A Bayesian joint model for multivariate longitudinal and time-to-event data with application to ALL maintenance studies. J Biopharm Stat 2023:1-18. [PMID: 36762772 DOI: 10.1080/10543406.2023.2171430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]
Abstract
The most common type of cancer diagnosed among children is acute lymphocytic leukemia (ALL). A study was conducted by the Tata Translational Cancer Research Center (TTCRC), Kolkata, in which 236 children diagnosed with ALL were treated for approximately the first two years with two standard drugs (6MP and MTx) and were then followed for nearly three more years. The goal is to identify the longitudinal biomarkers that are associated with time-to-relapse, and also to assess the effectiveness of the drugs. We develop a Bayesian joint model in which a linear mixed model is used to jointly model three biomarkers (white blood cell count, neutrophil count, and platelet count) and a semi-parametric proportional hazards model is used to model the time-to-relapse. Our proposed joint model can assess the effects of different covariates on the progression of the biomarkers, and the effects of the biomarkers (and the covariates) on time-to-relapse. In addition, the proposed joint model can efficiently impute missing longitudinal biomarker values. Our analysis shows that the white blood cell (WBC) count is not associated with time-to-relapse, but the neutrophil count and the platelet count are significantly associated with it. We also infer that a lower dose of 6MP and a higher dose of MTx jointly result in a lower relapse probability in the follow-up period. Interestingly, we find that relapse probability is lowest for the patients classified into the "high-risk" group at presentation. The effectiveness of the proposed joint model is assessed through extensive simulation studies.
Affiliation(s)
- Damitri Kundu
- Applied Statistics Division, Indian Statistical Institute, Kolkata, India
- Partha Sarkar
- Applied Statistics Division, Indian Statistical Institute, Kolkata, India; Department of Statistics, University of Florida, Gainesville, Florida, USA
- Kiranmoy Das
- Applied Statistics Division, Indian Statistical Institute, Kolkata, India
5. Zhao Y. Diagnostic checking of multiple imputation models. ASTA ADVANCES IN STATISTICAL ANALYSIS 2022. [DOI: 10.1007/s10182-021-00429-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
6. Liang Z, Wang Q, Wei Y. Robust model selection with covariables missing at random. ANN I STAT MATH 2021. [DOI: 10.1007/s10463-021-00806-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
7. Takahashi M. Multiple imputation regression discontinuity designs: Alternative to regression discontinuity designs to estimate the local average treatment effect at the cutoff. COMMUN STAT-SIMUL C 2021. [DOI: 10.1080/03610918.2021.1960374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Masayoshi Takahashi
- School of Information and Data Sciences, Nagasaki University, Nagasaki, Japan
8. Hu Z. Assessing conditional causal effect via characteristic score. Stat Med 2021; 40:5188-5198. [PMID: 34181277 DOI: 10.1002/sim.9119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2020] [Revised: 05/28/2021] [Accepted: 06/10/2021] [Indexed: 11/10/2022]
Abstract
Observational studies usually include participants representing a wide, heterogeneous population. The conditional causal effect, that is, the treatment effect conditional on baseline characteristics, is of practical importance. Its estimation faces two challenges. First, the causal effect is not observable in any individual due to counterfactuality. Second, high-dimensional baseline variables are needed to satisfy the ignorable treatment selection assumption and to attain better estimation efficiency. In this work, a nonparametric estimation procedure, along with a pseudo-response, is proposed to estimate the conditional treatment effect through a "characteristic score", a parsimonious representation of the influence of baseline variables on treatment benefit. Adopting sparse dimension reduction with variable prescreening in the proposed estimation, we aim to identify the key baseline variables that impact the conditional treatment effect and to uncover the characteristic score that best predicts the treatment effect. This approach is applied to an HIV study to assess the benefit of antiretroviral regimens and to identify the beneficiary subpopulation.
Affiliation(s)
- Zonghui Hu
- Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
9. Wright WJ, Irvine KM, Higgs MD. Identifying occupancy model inadequacies: can residuals separately assess detection and presence? Ecology 2019; 100:e02703. [PMID: 30932179 DOI: 10.1002/ecy.2703] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/25/2018] [Revised: 02/06/2019] [Accepted: 02/22/2019] [Indexed: 11/08/2022]
Abstract
Occupancy models are widely applied to estimate species distributions, but few methods exist for model checking. Thorough model assessments can uncover inadequacies and allow for deeper ecological insight by exploring structure in the observed data not accounted for by a model. We introduce occupancy model residual definitions that utilize the posterior distribution of the partially latent occupancy states. Residual-based assessments are valuable because they can target specific assumptions and identify ways to improve a model, such as adding spatial correlation or meaningful covariates. Our approach defines separate residuals for occupancy and detection, and we use simulation to examine whether missing structure for modeling detection probabilities can be distinguished from that for occupancy probabilities. In many scenarios, our residual diagnostics were able to separate inadequacies at the different model levels successfully, but we describe other situations where this may not be the case. Applying Moran's I residual diagnostics to assess models for silver-haired (Lasionycteris noctivagans) and little brown (Myotis lucifugus) bats provided evidence of residual spatial correlation only among detections. Targeting specific model assumptions using carefully chosen residual diagnostics is valuable for any analysis, and we remove previous barriers to occupancy analyses: a lack of examples and practical advice.
Affiliation(s)
- Wilson J Wright
- Department of Ecology, Montana State University, Bozeman, Montana, 59717, USA
- Kathryn M Irvine
- U.S. Geological Survey, Northern Rocky Mountain Science Center, 2327 University Way Suite 2, Bozeman, Montana, 59717, USA
- Megan D Higgs
- Department of Mathematical Sciences, Montana State University, Bozeman, Montana, 59717, USA
10. Kuo I, Liu T, Patrick R, Trezza C, Bazerman L, Uhrig Castonguay BJ, Peterson J, Kurth A, Beckwith CG. Use of an mHealth Intervention to Improve Engagement in HIV Community-Based Care Among Persons Recently Released from a Correctional Facility in Washington, DC: A Pilot Study. AIDS Behav 2019; 23:1016-1031. [PMID: 30627850 DOI: 10.1007/s10461-018-02389-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/30/2022]
Abstract
We examined the preliminary effectiveness of a computerized counseling session plus post-incarceration text messaging intervention (CARE + Corrections) to support ART adherence and linkage/engagement in community care among recently incarcerated persons with HIV in Washington, D.C. Recently incarcerated persons with HIV aged ≥ 18 years were recruited from the D.C. jail or through community outreach and randomized to the CARE + Corrections or control arm. Participants completed assessments at baseline, 3 months, and 6 months. Multivariable random effects modeling identified predictors of suppressed viral load (≤ 200 copies/mL) and engagement in HIV care at 6 months. Participants (N = 110) had a median age of 42 (IQR 30-49); 58% were male, 24% female, and 18% transgender; 85% were Black; and the median lifetime incarceration was 7 years (IQR 2-15). More controls had a regular healthcare provider at baseline. Although not statistically significant, intervention participants had increased odds of viral suppression versus controls at 6 months (AOR 2.04; 95% CI 0.62, 6.70). Those reporting high ART adherence at baseline had higher odds of viral suppression at follow-up (AOR 10.77; 95% CI 1.83, 63.31). HIV care engagement was similar between the two groups, although both groups reported increased engagement at 6 months versus baseline. We observed a positive but non-significant association between viral suppression and the CARE + Corrections group, and care engagement increased in both groups after 6 months. Further attention to increasing viral suppression among criminal justice (CJ)-involved persons with HIV upon community reentry is warranted.
Affiliation(s)
- Irene Kuo
- Department of Epidemiology and Biostatistics, George Washington University Milken Institute School of Public Health, 950 New Hampshire Avenue NW, Suite 500, Washington, DC, 20052, USA.
- Tao Liu
- Brown University School of Public Health, Providence, RI, USA
- Rudy Patrick
- Department of Epidemiology and Biostatistics, George Washington University Milken Institute School of Public Health, 950 New Hampshire Avenue NW, Suite 500, Washington, DC, 20052, USA
- University of California San Diego, San Diego, CA, USA
- Claudia Trezza
- Department of Epidemiology and Biostatistics, George Washington University Milken Institute School of Public Health, 950 New Hampshire Avenue NW, Suite 500, Washington, DC, 20052, USA
- James Peterson
- Department of Epidemiology and Biostatistics, George Washington University Milken Institute School of Public Health, 950 New Hampshire Avenue NW, Suite 500, Washington, DC, 20052, USA
- Ann Kurth
- Yale University School of Nursing, New Haven, CT, USA
- Curt G Beckwith
- The Miriam Hospital, Providence, RI, USA
- Alpert Medical School of Brown University, Providence, RI, USA
11. Gu C, Gutman R. Development of a common patient assessment scale across the continuum of care: A nested multiple imputation approach. Ann Appl Stat 2019. [DOI: 10.1214/18-aoas1202] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022]
12. Hegde H, Shimpi N, Panny A, Glurich I, Christie P, Acharya A. MICE vs PPCA: Missing data imputation in healthcare. INFORMATICS IN MEDICINE UNLOCKED 2019. [DOI: 10.1016/j.imu.2019.100275] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 11/25/2022]
13. Merchant RC, Zhang Z, Zhang Z, Liu T, Baird JR. Lack of efficacy in a randomised trial of a brief intervention to reduce drug use and increase drug treatment services utilisation among adult emergency department patients over a 12-month period. Emerg Med J 2018; 35:282-288. [PMID: 29437758 DOI: 10.1136/emermed-2016-206540] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/16/2016] [Revised: 01/03/2018] [Accepted: 01/10/2018] [Indexed: 11/04/2022]
Abstract
OBJECTIVES Assess the 12-month efficacy of a brief intervention (BI) on reducing drug use and increasing drug treatment services utilisation among adult emergency department (ED) patients. METHODS This randomised, controlled trial enrolled 18-64-year-old ED patients needing a drug use intervention. Treatment arm participants received a tailored BI while control arm participants only completed the study questionnaires. Self-reported past 3-month drug use and engagement in drug treatment services were compared by study arm at 3-month intervals over 1 year. Multiple imputation was used to address loss to follow-up. RESULTS Of the 1030 participants, follow-up completion ranged from 55% to 64% over the four follow-ups. At 12 months, the two study arms were similar with regard to mean: (1) proportion reporting any drug use (treatment: 67.1% (61.6 to 72.6), control: 74.4% (69.4 to 79.4)); (2) drug use frequency on a five-point scale (treatment: 3.7 (3.3 to 4.2), control: 4.6 (4.0 to 5.2)); (3) total days of drug use (treatment: 28.3 (23.2 to 33.4), control: 33.4 (28.5 to 38.2)); (4) maximum number of times drugs were used per day (treatment: 4.6 (3.6 to 5.5), control: 6.1 (4.8 to 7.3)) and (5) typical number of times drugs were used per day (treatment: 3.3 (2.5 to 4.1), control: 5.1 (3.9 to 6.2)). Utilisation of drug treatment services was also similar by study arm. In multivariable regression analyses, patients who were homeless or had higher drug use at baseline continued to have greater drug use in follow-up. CONCLUSIONS Among adult ED patients requiring a drug use intervention, this BI did not decrease drug use or increase drug treatment services utilisation over a 12-month period more than the control condition. TRIAL REGISTRATION NUMBER NCT01124591; Pre-trial.
Affiliation(s)
- Roland C Merchant
- Department of Emergency Medicine, Alpert Medical School, Brown University, Providence, Rhode Island, USA; Department of Epidemiology, School of Public Health, Brown University, Providence, Rhode Island, USA
- Zhongli Zhang
- Department of Biostatistics, Center for Statistical Sciences, School of Public Health, Brown University, Providence, Rhode Island, USA
- Zihao Zhang
- Department of Biostatistics, Center for Statistical Sciences, School of Public Health, Brown University, Providence, Rhode Island, USA
- Tao Liu
- Department of Biostatistics, Center for Statistical Sciences, School of Public Health, Brown University, Providence, Rhode Island, USA
- Janette R Baird
- Department of Emergency Medicine, Alpert Medical School, Brown University, Providence, Rhode Island, USA
14. Bernhardt PW. Model validation and influence diagnostics for regression models with missing covariates. Stat Med 2018; 37:1325-1342. [PMID: 29318652 DOI: 10.1002/sim.7584] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 04/16/2017] [Revised: 11/01/2017] [Accepted: 11/14/2017] [Indexed: 11/11/2022]
Abstract
Missing covariate values are prevalent in regression applications. While an array of methods has been developed for estimating parameters in regression models with missing covariate data for a variety of response types, minimal attention has been given to validation of the response model and to influence diagnostics. Previous research has mainly focused on estimating residuals for observations with missing covariates using expected values, after which specialized techniques are needed to conduct proper inference. We suggest a multiple imputation strategy that allows standard methods for residual analysis to be applied to the imputed data sets or to a stacked data set. We demonstrate the suggested multiple imputation method by analyzing the Sleep in Mammals data in the context of a linear regression model and the New York Social Indicators Status data with a logistic regression model.
Affiliation(s)
- Paul W Bernhardt
- Department of Mathematics and Statistics, Villanova University, Villanova, PA 19085, USA
15. Merchant RC, Romanoff J, Zhang Z, Liu T, Baird JR. Impact of a brief intervention on reducing alcohol use and increasing alcohol treatment services utilization among alcohol- and drug-using adult emergency department patients. Alcohol 2017; 65:71-80. [PMID: 29084632 PMCID: PMC5681406 DOI: 10.1016/j.alcohol.2017.07.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/24/2016] [Revised: 05/30/2017] [Accepted: 07/13/2017] [Indexed: 11/22/2022]
Abstract
Most previous brief intervention (BI) studies have focused on alcohol or drug use, rather than both substances. Our primary aim was to determine whether an alcohol- and drug-use BI reduced alcohol use and increased alcohol treatment services utilization among adult emergency department (ED) patients who drink alcohol and require an intervention for their drug use. Our secondary aims were to assess when the greatest relative reductions in alcohol use occurred, and which patients (stratified by need for an alcohol use intervention) reduced their alcohol use the most. In this secondary analysis, we studied a sub-sample of participants from the Brief Intervention for Drug Misuse in the Emergency Department (BIDMED) randomized, controlled trial of a BI vs. no BI, whose responses to the Alcohol, Smoking and Substance Involvement Screening Test (ASSIST) indicated a need for a BI for any drug use, and who also reported alcohol use. Participants were stratified by their ASSIST alcohol subscore: 1) no BI needed, 2) a BI needed, or 3) an intensive intervention needed for alcohol use. Alcohol use and alcohol treatment services utilization were measured every 3 months for 12 months post-enrollment. Of these 833 participants, the median age was 29 years; 46% were female; 55% were white/non-Hispanic, 27% black/non-Hispanic, and 15% Hispanic. Although any alcohol use, alcohol use frequency, days of alcohol use, typical drinks consumed/day, and most drinks consumed/day decreased in both the BI and no BI arms, there were no differences between study arms. Few patients sought alcohol use treatment services in follow-up, and utilization also did not differ by study arm. Compared to baseline, alcohol use decreased the most during the first 3 months after enrollment, but decreased little afterward. Participants whose ASSIST alcohol subscores indicated a need for an intensive intervention generally had the greatest relative decreases in alcohol use. These results indicate that the BI was no more efficacious in reducing alcohol use among alcohol- and drug-using adult ED patients than the self-assessments alone, but suggest that self-assessments with or without a BI may confer reductions in alcohol use.
Affiliation(s)
- Roland C Merchant
- Department of Emergency Medicine, Alpert Medical School, Brown University, Providence, RI, USA; Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA.
- Justin Romanoff
- Department of Biostatistics, Center for Statistical Sciences, School of Public Health, Brown University, Providence, RI, USA
- Zihao Zhang
- Department of Biostatistics, Center for Statistical Sciences, School of Public Health, Brown University, Providence, RI, USA
- Tao Liu
- Department of Biostatistics, Center for Statistical Sciences, School of Public Health, Brown University, Providence, RI, USA
- Janette R Baird
- Department of Emergency Medicine, Alpert Medical School, Brown University, Providence, RI, USA
16. Nguyen CD, Carlin JB, Lee KJ. Model checking in multiple imputation: an overview and case study. Emerg Themes Epidemiol 2017; 14:8. [PMID: 28852415 PMCID: PMC5569512 DOI: 10.1186/s12982-017-0062-6] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Received: 02/10/2017] [Accepted: 08/07/2017] [Indexed: 11/20/2022]
Abstract
Background: Multiple imputation has become very popular as a general-purpose method for handling missing data. The validity of multiple-imputation-based analyses relies on the use of an appropriate model to impute the missing values. Despite the widespread use of multiple imputation, there are few guidelines available for checking imputation models.
Analysis: In this paper, we provide an overview of currently available methods for checking imputation models. These include graphical checks and numerical summaries, as well as simulation-based methods such as posterior predictive checking. These model checking techniques are illustrated using an analysis affected by missing data from the Longitudinal Study of Australian Children.
Conclusions: As multiple imputation becomes further established as a standard approach for handling missing data, it will become increasingly important that researchers employ appropriate model checking approaches to ensure that reliable results are obtained when using this method.
Electronic supplementary material: The online version of this article (doi:10.1186/s12982-017-0062-6) contains supplementary material, which is available to authorized users.
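One simple numerical summary of the kind described above contrasts the observed and imputed values of an incomplete variable. The sketch below reports their means and standard deviations, a standardized mean difference, and an SD ratio. The function name and the particular summaries are assumptions for illustration, not the paper's prescribed set. Under MAR, observed and imputed values can legitimately differ, so a large discrepancy is a prompt for closer inspection, not proof that the imputation model is wrong.

```python
import statistics

def compare_observed_imputed(observed, imputed):
    """Numerical summaries contrasting observed and imputed values of one
    incomplete variable (e.g. pooled over the imputed datasets)."""
    m_obs, m_imp = statistics.fmean(observed), statistics.fmean(imputed)
    sd_obs, sd_imp = statistics.stdev(observed), statistics.stdev(imputed)
    pooled_sd = ((sd_obs ** 2 + sd_imp ** 2) / 2) ** 0.5
    return {
        "mean_obs": m_obs,
        "mean_imp": m_imp,
        "sd_obs": sd_obs,
        "sd_imp": sd_imp,
        # Standardized difference in means (imputed minus observed)
        "std_diff": (m_imp - m_obs) / pooled_sd,
        # Ratios far from 1 suggest imputations are too spread out or too tight
        "sd_ratio": sd_imp / sd_obs,
    }

print(compare_observed_imputed([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

In this toy call, the imputed values are shifted up by one unit, giving a `std_diff` of about 0.63 and an `sd_ratio` of 1.0.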
Affiliation(s)
- Cattram D Nguyen
- Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road, Parkville, VIC 3052 Australia; Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, The Royal Children's Hospital, University of Melbourne, Flemington Road, Parkville, VIC 3052 Australia
- John B Carlin
- Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road, Parkville, VIC 3052 Australia; Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, The Royal Children's Hospital, University of Melbourne, Flemington Road, Parkville, VIC 3052 Australia
- Katherine J Lee
- Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road, Parkville, VIC 3052 Australia; Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, The Royal Children's Hospital, University of Melbourne, Flemington Road, Parkville, VIC 3052 Australia
17. Yin P, Shi JQ. Simulation-based sensitivity analysis for non-ignorably missing data. Stat Methods Med Res 2017; 28:289-308. [PMID: 28747095 DOI: 10.1177/0962280217722382] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
Sensitivity analysis is popular for dealing with missing data problems, particularly non-ignorable missingness, where full-likelihood methods cannot be adopted. It analyses how sensitively the conclusions (output) depend on assumptions or parameters (input) about the missing data, i.e. the missing data mechanism. We call models with this problem of uncertainty sensitivity models. To make conventional sensitivity analysis more useful in practice, we need to define simple and interpretable statistical quantities to assess the sensitivity models and support evidence-based analysis. In this paper we propose a novel approach to investigate the plausibility of each assumed missing data mechanism model by comparing, non-parametrically, the simulated datasets from various MNAR models with the observed data, using K-nearest-neighbour distances. Some asymptotic theory is also provided. A key step of this method is to plug in a plausibility evaluation system for each sensitivity parameter, to select plausible values and reject unlikely values, instead of considering all proposed values of the sensitivity parameters as in the conventional sensitivity analysis method. The method is generic and has been applied successfully to several specific models in this paper, including a meta-analysis model with publication bias, analysis of incomplete longitudinal data, and mean estimation with non-ignorable missing data.
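The comparison step can be illustrated in one dimension. For each observed value, take the distance to its k-th nearest neighbour in a dataset simulated under a candidate sensitivity parameter, average these distances, and rank candidates by that average (smaller means closer to the observed data). This is a simplified, hypothetical scoring rule for illustration only. The paper's actual statistic, its calibration, and its asymptotic theory are not reproduced here, and `simulate` is a placeholder for a user-supplied MNAR model.

```python
from typing import Callable, Sequence

def mean_knn_distance(observed: Sequence[float], simulated: Sequence[float], k: int = 1) -> float:
    """Average distance from each observed value to its k-th nearest
    neighbour among the simulated values (one-dimensional data)."""
    if not 1 <= k <= len(simulated):
        raise ValueError("k must be between 1 and len(simulated)")
    total = 0.0
    for x in observed:
        dists = sorted(abs(x - s) for s in simulated)
        total += dists[k - 1]
    return total / len(observed)

def rank_candidates(observed: Sequence[float],
                    simulate: Callable[[float], Sequence[float]],
                    candidates: Sequence[float], k: int = 1) -> list:
    """Order candidate sensitivity-parameter values from most to least
    plausible, i.e. by increasing mean k-NN distance to the observed data.
    `simulate` (hypothetical) maps a parameter value to a simulated dataset."""
    scores = {c: mean_knn_distance(observed, simulate(c), k) for c in candidates}
    return sorted(candidates, key=scores.__getitem__)

# Toy usage: a hypothetical 'simulate' that shifts a fixed template by d
observed = [0.0, 1.0, 2.0]
print(rank_candidates(observed, lambda d: [d, 1 + d, 2 + d], [-1.0, 0.0, 0.5]))
# prints [0.0, -1.0, 0.5]
```

In practice the simulated datasets would be random, so scores would typically be averaged over several simulations per candidate before ranking.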
Affiliation(s)
- Peng Yin
- Department of Biostatistics, University of Liverpool, UK
- Jian Q Shi
- School of Mathematics & Statistics, Newcastle University, UK
18
Yucel R. Impact of the non-distinctness and non-ignorability on the inference by multiple imputation in multivariate multilevel data: a simulation assessment. J Stat Comput Simul 2017. [DOI: 10.1080/00949655.2017.1288233] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
Affiliation(s)
- Recai Yucel
- Department of Epidemiology and Biostatistics, State University of New York, Albany, NY, USA
19
Xu D, Chatterjee A, Daniels M. A note on posterior predictive checks to assess model fit for incomplete data. Stat Med 2016; 35:5029-5039. [PMID: 27426216 DOI: 10.1002/sim.7040] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/07/2016] [Revised: 06/15/2016] [Accepted: 06/16/2016] [Indexed: 11/05/2022]
Abstract
We examine two posterior predictive distribution based approaches to assess model fit for incomplete longitudinal data. The first approach assesses fit based on replicated complete data as advocated in Gelman et al. (2005). The second approach assesses fit based on replicated observed data. Differences between the two approaches are discussed and an analytic example is presented for illustration and understanding. Both checks are applied to data from a longitudinal clinical trial. The proposed checks can easily be implemented in standard software like (Win)BUGS/JAGS/Stan. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Dandan Xu
- Department of Statistics, University of Florida, Gainesville, 32611, FL, U.S.A
- Michael Daniels
- Department of Integrative Biology, Department of Statistics and Data Sciences, The University of Texas, Austin, 78712, TX, U.S.A.
20
Abstract
Latent trait models have long been used in the social science literature for studying variables that can only be measured indirectly through multiple items. However, such models are also very useful in accounting for correlation in multivariate and longitudinal data, particularly when outcomes have mixed measurement scales. Bayesian methods implemented with Markov chain Monte Carlo provide a flexible framework for routine fitting of a broad class of latent variable (LV) models, including very general structural equation models. However, in considering LV models, a number of challenging issues arise, including identifiability, confounding between the mean and variance, uncertainty in different aspects of the model, and difficulty in computation. Motivated by the problem of modelling multidimensional longitudinal data, this article reviews the recent literature, provides some recommendations and highlights areas in need of additional research, focusing on methods for model uncertainty.
Affiliation(s)
- David B Dunson
- Biostatistics Branch, National Institute of Environmental Health Sciences, NC 27709, USA.
21
Griffith SD, Shiffman S, Li Y, Heitjan DF. Model-based imputation of latent cigarette counts using data from a calibration study. Int J Methods Psychiatr Res 2016; 25:112-22. [PMID: 26081923 PMCID: PMC6877209 DOI: 10.1002/mpr.1468] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/08/2014] [Revised: 05/15/2014] [Accepted: 06/27/2014] [Indexed: 11/10/2022]
Abstract
In addition to dichotomous measures of abstinence, smoking studies may use daily cigarette consumption as an outcome variable. These counts hold the promise of more efficient and detailed analyses than dichotomous measures, but present serious quality issues (measurement error and heaping) if obtained by retrospective recall. A doubly-coded dataset with a retrospective recall measurement (timeline followback, TLFB) and a more precise instantaneous measurement (ecological momentary assessment, EMA) serves as a calibration dataset, allowing us to predict EMA given TLFB and baseline factors. We apply this model to multiply impute precise cigarette counts for a randomized, placebo-controlled trial of bupropion with only TLFB measurements available. To account for repeated measurements on a subject, we induce correlation in the imputed counts. Finally, we analyze the imputed data in a longitudinal model that accommodates random subject effects and zero inflation. Both raw and imputed data show a significant drug effect for reducing the odds of non-abstinence and the number of cigarettes smoked among non-abstainers, but the imputed data provide efficiency gains. This method permits the analysis of daily cigarette consumption data previously deemed suspect due to reporting error and is applicable to other self-reported count data sets for which calibration samples are available. Copyright © 2015 John Wiley & Sons, Ltd.
Affiliation(s)
- Yimei Li
- Biostatistics & Epidemiology, University of Pennsylvania
- Daniel F Heitjan
- Statistical Science, Southern Methodist University; Clinical Sciences, University of Texas Southwestern
22
Si Y, Reiter JP, Hillygus DS. Bayesian latent pattern mixture models for handling attrition in panel studies with refreshment samples. Ann Appl Stat 2016. [DOI: 10.1214/15-aoas876] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022]
23
Lee MC, Mitra R. Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models. Comput Stat Data Anal 2016. [DOI: 10.1016/j.csda.2015.08.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 02/03/2023]
24
Nguyen CD, Lee KJ, Carlin JB. Posterior predictive checking of multiple imputation models. Biom J 2015; 57:676-94. [PMID: 25939490 DOI: 10.1002/bimj.201400034] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 02/13/2014] [Revised: 11/13/2014] [Accepted: 12/05/2014] [Indexed: 11/09/2022]
Abstract
Multiple imputation is gaining popularity as a strategy for handling missing data, but there is a scarcity of tools for checking imputation models, a critical step in model fitting. Posterior predictive checking (PPC) has been recommended as an imputation diagnostic. PPC involves simulating "replicated" data from the posterior predictive distribution of the model under scrutiny. Model fit is assessed by examining whether the analysis from the observed data appears typical of results obtained from the replicates produced by the model. A proposed diagnostic measure is the posterior predictive "p-value", an extreme value of which (i.e., a value close to 0 or 1) suggests a misfit between the model and the data. The aim of this study was to evaluate the performance of the posterior predictive p-value as an imputation diagnostic. Using simulation methods, we deliberately misspecified imputation models to determine whether posterior predictive p-values were effective in identifying these problems. When estimating the regression parameter of interest, we found that more extreme p-values were associated with poorer imputation model performance, although the results highlighted that traditional thresholds for classical p-values do not apply in this context. A shortcoming of the PPC method was its reduced ability to detect misspecified models with increasing amounts of missing data. Despite the limitations of posterior predictive p-values, they appear to have a valuable place in the imputer's toolkit. In addition to automated checking using p-values, we recommend imputers perform graphical checks and examine other summaries of the test quantity distribution.
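The posterior predictive p-value described in this abstract can be estimated as the proportion of posterior draws in which the test quantity computed on the replicated data is at least as large as the one computed on the observed (or completed) data. The sketch below assumes those paired test quantities have already been computed; the function name and the one-sided "greater than or equal" convention are assumptions, and the numbers in the usage lines are made up for illustration.

```python
from typing import Sequence


def posterior_predictive_p_value(t_obs: Sequence[float], t_rep: Sequence[float]) -> float:
    """One-sided posterior predictive p-value, estimated as Pr(T_rep >= T_obs).

    t_obs[l] and t_rep[l] are the test quantity for posterior draw l, evaluated
    on the observed/completed data and on the replicated data respectively.
    Values close to 0 or 1 suggest misfit between model and data.
    """
    if len(t_obs) != len(t_rep) or not t_obs:
        raise ValueError("need two non-empty sequences of equal length")
    exceed = sum(1 for o, r in zip(t_obs, t_rep) if r >= o)
    return exceed / len(t_obs)


# Hypothetical test quantities (e.g. a regression coefficient) from 5 draws.
t_obs = [1.0, 1.1, 0.9, 1.0, 1.2]
t_rep = [1.3, 0.8, 1.4, 1.5, 1.6]
print(posterior_predictive_p_value(t_obs, t_rep))
```

Pairing the two sequences by draw lets the test quantity depend on the parameters as well as the data; if it depends only on the data, `t_obs` can simply repeat the same value. As the abstract notes, conventional thresholds for classical p-values should not be carried over to this quantity.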
Affiliation(s)
- Cattram D Nguyen
- Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road, Parkville, Victoria, 3052, Australia; Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, The Royal Children's Hospital, Flemington Road, Parkville, Victoria, 3052, Australia
- Katherine J Lee
- Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road, Parkville, Victoria, 3052, Australia; Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, The Royal Children's Hospital, Flemington Road, Parkville, Victoria, 3052, Australia
- John B Carlin
- Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road, Parkville, Victoria, 3052, Australia; Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, The Royal Children's Hospital, Flemington Road, Parkville, Victoria, 3052, Australia
25
The rise of multiple imputation: a review of the reporting and implementation of the method in medical research. BMC Med Res Methodol 2015; 15:30. [PMID: 25880850 PMCID: PMC4396150 DOI: 10.1186/s12874-015-0022-1] [Citation(s) in RCA: 214] [Impact Index Per Article: 23.8] [Received: 11/14/2014] [Accepted: 03/18/2015] [Indexed: 12/16/2022]
Abstract
Background: Missing data are common in medical research and can lead to a loss in statistical power and potentially biased results if not handled appropriately. Multiple imputation (MI) is a statistical method, widely adopted in practice, for dealing with missing data. Many academic journals now emphasise the importance of reporting information regarding missing data, and proposed guidelines for documenting the application of MI have been published. This review evaluated the reporting of missing data, the application of MI including the details provided regarding the imputation model, and the frequency of sensitivity analyses within the MI framework in medical research articles. Methods: A systematic review was carried out of articles published in the Lancet and New England Journal of Medicine between January 2008 and December 2013 in which MI was implemented. Results: We identified 103 papers that used MI, with the number of papers increasing from 11 in 2008 to 26 in 2013. Nearly half of the papers specified the proportion of complete cases or the proportion with missing data by each variable. In the majority of the articles (86%) the imputed variables were specified. Of the 38 papers (37%) that stated the method of imputation, 20 used chained equations, 8 used multivariate normal imputation, and 10 used alternative methods. Very few articles (9%) detailed how they handled non-normally distributed variables during imputation. Thirty-nine papers (38%) stated the variables included in the imputation model. Fewer than half of the papers (46%) reported the number of imputations, and only two papers compared the distribution of imputed and observed data. Sixty-six papers presented the results from MI as a secondary analysis. Only three articles carried out a sensitivity analysis following MI to assess departures from the missing at random assumption, with details of the sensitivity analyses provided by only one article.
Conclusions: This review outlined deficiencies in the documentation of missing data and the details provided about imputation. Furthermore, only a few articles performed sensitivity analyses following MI, even though this is strongly recommended in guidelines. Authors are encouraged to follow the available guidelines and provide information on missing data and the imputation process. Electronic supplementary material: The online version of this article (doi:10.1186/s12874-015-0022-1) contains supplementary material, which is available to authorized users.
26
27
Shortreed SM, Laber E, Stroup TS, Pineau J, Murphy SA. A multiple imputation strategy for sequential multiple assignment randomized trials. Stat Med 2014; 33:4202-14. [PMID: 24919867 PMCID: PMC4184954 DOI: 10.1002/sim.6223] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 04/04/2013] [Revised: 02/20/2014] [Accepted: 05/09/2014] [Indexed: 12/14/2022]
Abstract
Sequential multiple assignment randomized trials (SMARTs) are increasingly being used to inform clinical and intervention science. In a SMART, each patient is repeatedly randomized over time. Each randomization occurs at a critical decision point in the treatment course. These critical decision points often correspond to milestones in the disease process or other changes in a patient's health status. Thus, the timing and number of randomizations may vary across patients and depend on evolving patient-specific information. This presents unique challenges when analyzing data from a SMART in the presence of missing data. This paper presents the first comprehensive discussion of missing data issues typical of SMART studies: we describe five specific challenges and propose a flexible imputation strategy to facilitate valid statistical estimation and inference using incomplete data from a SMART. To illustrate these contributions, we consider data from the Clinical Antipsychotic Trial of Intervention and Effectiveness, one of the most well-known SMARTs to date.
Affiliation(s)
- Susan M. Shortreed
- Biostatistics Unit, Group Health Research Institute, Seattle, WA, 98101, U.S.A
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, U.S.A
- Eric Laber
- Department of Statistics, North Carolina State University, Raleigh, NC, 27695, U.S.A
- T. Scott Stroup
- NYS Psychiatric Institute, Columbia University, New York, NY 10032, U.S.A
- Joelle Pineau
- School of Computer Science, McGill University, Montreal, Quebec H3A 0E9, Canada
- Susan A. Murphy
- Department of Statistics, University of Michigan, Ann Arbor, MI, 48109, U.S.A
28
Park KY, Qiu P. Model selection and diagnostics for joint modeling of survival and longitudinal data with crossing hazard rate functions. Stat Med 2014; 33:4532-46. [PMID: 25043230 DOI: 10.1002/sim.6259] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/22/2013] [Revised: 05/09/2014] [Accepted: 06/10/2014] [Indexed: 11/11/2022]
Abstract
Comparison of two hazard rate functions is important for evaluating treatment effect in studies concerning times to some important events. In practice, the two hazard rate functions may cross each other at one or more unknown time points, representing temporal changes of the treatment effect. Also, besides survival data, longitudinal data may be available on some time-dependent covariates. When jointly modeling the survival and longitudinal data in such cases, model selection and model diagnostics, which are lacking in the literature, are especially important for providing reliable statistical analysis of the data. In this paper, we discuss several criteria for assessing model fit that have been used for model selection and apply them to the joint modeling of survival and longitudinal data for comparing two crossing hazard rate functions. We also propose hypothesis testing and graphical methods for model diagnostics of the proposed joint modeling approach. Our proposed methods are illustrated by a simulation study and by a real-data example concerning two early breast cancer treatments.
Affiliation(s)
- Ka Young Park
- Department of Biostatistics, University of Florida, Gainesville, FL 32610, U.S.A
29
Geerlings H, Laros JA, Tellegen PJ, Glas CAW. Testing the difficulty theory of the SON-R 5(1/2)-17, a non-verbal test of intelligence. Br J Math Stat Psychol 2014; 67:248-265. [PMID: 23773035 DOI: 10.1111/bmsp.12017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/03/2009] [Revised: 04/26/2013] [Indexed: 06/02/2023]
Abstract
Fischer's (1973) linear logistic test model can be used to test hypotheses regarding the effect of covariates on item difficulty and to predict the difficulty of newly constructed test items. However, its assumptions of equal discriminatory power across items and a perfect prediction of item difficulty are never absolutely met. The amount of misfit in an application of a Bayesian version of the model to two subtests of the SON-R 5(1/2)-17 is investigated by means of item fit statistics in the framework of posterior predictive checks and by means of a comparison with a model that allows for residual (co)variance in the item parameters. The effect of the degree of residual (co)variance on the robustness of inferences is investigated in a simulation study.
30
Nguyen CD, Carlin JB, Lee KJ. Diagnosing problems with imputation models using the Kolmogorov-Smirnov test: a simulation study. BMC Med Res Methodol 2013; 13:144. [PMID: 24252653 PMCID: PMC3840572 DOI: 10.1186/1471-2288-13-144] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 08/09/2013] [Accepted: 11/12/2013] [Indexed: 11/20/2022]
Abstract
Background: Multiple imputation (MI) is becoming increasingly popular as a strategy for handling missing data, but there is a scarcity of tools for checking the adequacy of imputation models. The Kolmogorov-Smirnov (KS) test has been identified as a potential diagnostic method for assessing whether the distribution of imputed data deviates substantially from that of the observed data. The aim of this study was to evaluate the performance of the KS test as an imputation diagnostic. Methods: Using simulation, we examined whether the KS test could reliably identify departures from assumptions made in the imputation model. To do this we examined how the p-values from the KS test behaved when skewed and heavy-tailed data were imputed using a normal imputation model. We varied the amount of missing data, the missing data models and the amount of skewness, and evaluated the performance of the KS test in diagnosing issues with the imputation models under these different scenarios. Results: The KS test was able to flag differences between the observations and imputed values; however, these differences did not always correspond to problems with MI inference for the regression parameter of interest. When there was a strong missing at random dependency, the KS p-values were very small regardless of whether or not the MI estimates were biased, so the KS test was not able to discriminate between imputed variables that required further investigation and those that did not. The p-values were also sensitive to sample size and the proportion of missing data, adding to the challenge of interpreting the results from the KS test. Conclusions: Given our study results, it is difficult to establish guidelines or recommendations for using the KS test as a diagnostic tool for MI. The investigation of other imputation diagnostics and their incorporation into statistical software are important areas for future research.
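The statistic behind this diagnostic is the largest vertical gap between the empirical CDFs of the observed and imputed values of a variable. A minimal sketch, assuming the two samples are plain lists of numbers, is shown below; it returns only the statistic D, and for p-values a library routine such as SciPy's `scipy.stats.ks_2samp` could be used instead. The function name `ks_statistic` is hypothetical, and, as the abstract cautions, a large D does not by itself mean the MI estimates are biased.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(observed: Sequence[float], imputed: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_obs(x) - F_imp(x)|."""
    if not observed or not imputed:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(observed), sorted(imputed)
    d = 0.0
    # Both empirical CDFs are step functions, so the supremum is attained
    # at one of the pooled sample points.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # proportion of observed values <= x
        f_b = bisect_right(b, x) / len(b)  # proportion of imputed values <= x
        d = max(d, abs(f_a - f_b))
    return d


print(ks_statistic([1.0, 2.0], [2.0, 3.0]))
```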
Affiliation(s)
- Cattram D Nguyen
- Clinical Epidemiology & Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road, Parkville, Melbourne, Victoria 3052, Australia.
31
Gastonguay MR, French JL, Heitjan DF, Rogers JA, Ahn JE, Ravva P. Missing Data in Model-Based Pharmacometric Applications: Points to Consider. J Clin Pharmacol 2013; 50:63S-74S. [DOI: 10.1177/0091270010378409] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 11/15/2022]
32
Daniels MJ, Chatterjee AS, Wang C. Bayesian model selection for incomplete data using the posterior predictive distribution. Biometrics 2012; 68:1055-63. [PMID: 22551040 PMCID: PMC3890150 DOI: 10.1111/j.1541-0420.2012.01766.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/26/2022]
Abstract
We explore the use of a posterior predictive loss criterion for model selection for incomplete longitudinal data. We begin by identifying a property that most model selection criteria for incomplete data should consider. We then show that a straightforward extension of the Gelfand and Ghosh (1998, Biometrika, 85, 1-11) criterion to incomplete data has two problems. First, it introduces an extra term (in addition to the goodness of fit and penalty terms) that compromises the criterion. Second, it does not satisfy the aforementioned property. We propose an alternative and explore its properties via simulations and on a real dataset and compare it to the deviance information criterion (DIC). In general, the DIC outperforms the posterior predictive criterion, but the latter criterion appears to work well overall and is very easy to compute, unlike the DIC in certain classes of models for missing data.
Affiliation(s)
- Michael J Daniels
- Department of Statistics, University of Florida, Gainesville, FL 32611, USA.
33
34
Rogers JA, Polhamus D, Gillespie WR, Ito K, Romero K, Qiu R, Stephenson D, Gastonguay MR, Corrigan B. Combining patient-level and summary-level data for Alzheimer's disease modeling and simulation: a β regression meta-analysis. J Pharmacokinet Pharmacodyn 2012; 39:479-98. [PMID: 22821139 DOI: 10.1007/s10928-012-9263-3] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Received: 02/14/2012] [Accepted: 07/03/2012] [Indexed: 11/25/2022]
Abstract
Our objective was to develop a beta regression (BR) model to describe the longitudinal progression of the 11-item Alzheimer's disease (AD) assessment scale cognitive subscale (ADAS-cog) in AD patients in both natural history and randomized clinical trial settings, utilizing both individual patient and summary-level literature data. Patient data from the Coalition Against Major Diseases database (3,223 patients), the Alzheimer's Disease Neuroimaging Initiative study database (186 patients), and summary data from 73 literature references (representing 17,235 patients) were fit to a BR drug-disease-trial model. Treatment effects for currently available acetylcholinesterase inhibitors, longitudinal changes in disease severity, dropout rate, placebo effect, and factors influencing these parameters were estimated in the model. Based on predictive checks and external validation, an adequate BR meta-analysis model for ADAS-cog using both summary-level and patient-level data was developed. Baseline ADAS-cog was estimated from baseline MMSE score. Disease progression was dependent on time, ApoE4 status, age, and gender. Study dropout was a function of time, baseline age, and baseline MMSE. Using BR constrained the simulations to the 0-70 range of the ADAS-cog, even when residuals were incorporated. The model allows for simultaneous fitting of summary and patient-level data, allowing for integration of all available information. A further advantage of the BR model is that it constrains values to the range of the original instrument for simulation purposes, in contrast to methodologies that provide appropriate constraints only for conditional expectations.
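The range constraint mentioned in this abstract follows from simulating on the unit interval and rescaling to the instrument's 0-70 range. The sketch below assumes a logit-linked mean μ and a precision φ, a common mean-precision beta regression parameterization that is not necessarily the one used by Rogers et al.; it draws Y ~ Beta(μφ, (1 - μ)φ) and multiplies by 70. The function name, the linear-predictor value and φ are hypothetical illustration values, not estimates from the paper.

```python
import math
import random
from typing import List

ADAS_COG_MAX = 70.0  # the ADAS-cog scores range from 0 to 70


def inv_logit(eta: float) -> float:
    """Map a linear predictor on the logit scale to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_adas_cog(eta: float, phi: float, n: int, rng: random.Random) -> List[float]:
    """Draw n scores from a mean-precision beta model rescaled to 0-70.

    eta: linear predictor on the logit scale (hypothetical value).
    phi: precision; larger phi means less spread around the mean.
    """
    mu = inv_logit(eta)  # mean on the (0, 1) scale
    alpha, beta = mu * phi, (1.0 - mu) * phi
    # Each beta draw lies in [0, 1], so every rescaled score lies in [0, 70],
    # including the residual variability, not just the conditional mean.
    return [ADAS_COG_MAX * rng.betavariate(alpha, beta) for _ in range(n)]


rng = random.Random(2012)
scores = simulate_adas_cog(eta=-1.0, phi=20.0, n=1000, rng=rng)
print(min(scores), max(scores), sum(scores) / len(scores))
```

By contrast, adding normal residuals to a predicted mean on the original scale can produce simulated scores below 0 or above 70, which is the limitation of mean-only constraints that the abstract contrasts with BR.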
35
Inferring Upon Heterogeneous Associations in Dairy Cattle Performance Using a Bivariate Hierarchical Model. J Agric Biol Environ Stat 2012. [DOI: 10.1007/s13253-012-0084-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/28/2022]
36
Wang H, Shiffman S, Griffith SD, Heitjan DF. Truth and Memory: Linking Instantaneous and Retrospective Self-Reported Cigarette Consumption. Ann Appl Stat 2012; 6:1689-1706. [PMID: 24432181 PMCID: PMC3889075 DOI: 10.1214/12-aoas557] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
Abstract
Studies of smoking behavior commonly use the time-line follow-back (TLFB) method, or periodic retrospective recall, to gather data on daily cigarette consumption. TLFB is considered adequate for identifying periods of abstinence and lapse but not for measurement of daily cigarette consumption, owing to substantial recall and digit preference biases. With the development of the hand-held electronic diary (ED), it has become possible to collect cigarette consumption data using ecological momentary assessment (EMA), or the instantaneous recording of each cigarette as it is smoked. EMA data, because they do not rely on retrospective recall, are thought to measure cigarette consumption more accurately. In this article we present an analysis of consumption data collected simultaneously by both methods from 236 active smokers in the pre-quit phase of a smoking cessation study. We define a statistical model that describes the genesis of the TLFB records as a two-stage process of mis-remembering and rounding, including fixed and random effects at each stage. We use Bayesian methods to estimate the model, and we evaluate its adequacy by studying histograms of imputed values of the latent remembered cigarette count. Our analysis suggests that both mis-remembering and heaping contribute substantially to the distortion of self-reported cigarette counts. Higher nicotine dependence, white ethnicity and male sex are associated with greater remembered smoking given the EMA count. The model is potentially useful in other applications where it is desirable to understand the process by which subjects remember and report true observations.
37
He Y, Zaslavsky AM. Diagnosing imputation models by applying target analyses to posterior replicates of completed data. Stat Med 2011; 31:1-18. [PMID: 22139814 DOI: 10.1002/sim.4413] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 06/08/2010] [Accepted: 08/02/2011] [Indexed: 01/10/2023]
Abstract
Multiple imputation fills in missing data with posterior predictive draws from imputation models. To assess the adequacy of imputation models, we can compare completed data with their replicates simulated under the imputation model. We apply analyses of substantive interest to both datasets and use posterior predictive checks of the differences of these estimates to quantify the evidence of model inadequacy. We can further integrate out the imputed missing data and their replicates over the completed-data analyses to reduce variance in the comparison. In many cases, the checking procedure can be easily implemented using standard imputation software by treating re-imputations under the model as posterior predictive replicates. Thus, it can be applied to non-Bayesian imputation methods. We also sketch several strategies for applying the method in the context of practical imputation analyses. We illustrate the method using two real data applications and study its properties using a simulation.
Affiliation(s)
- Yulei He
- Department of Health Care Policy, Harvard Medical School, Boston, MA 02115, USA.
38
Su L. A marginalized conditional linear model for longitudinal binary data when informative dropout occurs in continuous time. Biostatistics 2011; 13:355-68. [PMID: 22133756 PMCID: PMC3297830 DOI: 10.1093/biostatistics/kxr041] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022]
Abstract
Within the pattern-mixture modeling framework for informative dropout, conditional linear models (CLMs) are a useful approach to deal with dropout that can occur at any point in continuous time (not just at observation times). However, in contrast with selection models, inferences about marginal covariate effects in CLMs are not readily available if nonidentity links are used in the mean structures. In this article, we propose a CLM for long series of longitudinal binary data with marginal covariate effects directly specified. The association between the binary responses and the dropout time is taken into account by modeling the conditional mean of the binary response as well as the dependence between the binary responses given the dropout time. Specifically, parameters in both the conditional mean and dependence models are assumed to be linear or quadratic functions of the dropout time; and the continuous dropout time distribution is left completely unspecified. Inference is fully Bayesian. We illustrate the proposed model using data from a longitudinal study of depression in HIV-infected women, where the strategy of sensitivity analysis based on the extrapolation method is also demonstrated.
Affiliation(s)
- Li Su
- MRC Biostatistics Unit, Robinson Way, Cambridge, UK.
39
Yucel RM, He Y, Zaslavsky AM. Gaussian-based routines to impute categorical variables in health surveys. Stat Med 2011; 30:3447-60. [PMID: 21976366 DOI: 10.1002/sim.4355] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 06/22/2010] [Revised: 06/22/2011] [Accepted: 06/30/2011] [Indexed: 11/08/2022]
Abstract
The multivariate normal (MVN) distribution is arguably the most popular parametric model used in imputation and is available in most software packages (e.g., SAS PROC MI, R package norm). When it is applied to categorical variables as an approximation, practitioners often either apply simple rounding techniques for ordinal variables or create a distinct 'missing' category and/or exclude the nominal variable from the imputation phase. All of these practices can potentially lead to biased and/or uninterpretable inferences. In this work, we develop a new rounding methodology, calibrated to preserve observed distributions, to multiply impute missing categorical covariates. The main attraction of this method is its flexibility to use any 'working' imputation software, particularly software based on MVN, allowing practitioners to obtain usable imputations with small biases. A simulation study demonstrates the clear advantage of the proposed method in rounding ordinal variables and, in some scenarios, its plausibility in imputing nominal variables. We illustrate our methods on the widely used National Survey of Children with Special Health Care Needs, where incomplete values on race posed a genuine threat to inferences pertaining to disparities.
Affiliation(s)
- Recai M Yucel
- Department of Epidemiology and Biostatistics, School of Public Health, University at Albany, SUNY, One University Place, Rensselaer, NY 12144-3456, USA.
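The idea of a rounding cutoff "calibrated to preserve observed distributions" can be sketched in simplified form for a single binary variable. This is our illustration, not a description of the authors' routines. It assumes that the observed values of the variable have been masked in a duplicate copy of the data and re-imputed on the continuous MVN scale. The cutoff is then chosen so that the share of re-imputed values rounded to 1 matches the observed share of 1s, and the same cutoff would be applied to the genuinely missing values. Function names and data are hypothetical.

```python
def calibrated_cutoff(observed_binary, reimputed):
    """Cutoff c such that the share of re-imputed continuous values >= c
    matches the share of 1s among the observed binary values.
    (Ties at the cutoff can push the share slightly above the target.)"""
    p_obs = sum(observed_binary) / len(observed_binary)
    ranked = sorted(reimputed, reverse=True)
    k = round(p_obs * len(ranked))   # how many re-imputed values should become 1
    if k == 0:
        return float("inf")          # nothing should be rounded up
    return ranked[k - 1]

def round_with_cutoff(values, cutoff):
    """Round continuous imputations to 0/1 using the calibrated cutoff."""
    return [1 if v >= cutoff else 0 for v in values]

observed = [1, 0, 0, 0]                                    # observed share of 1s: 0.25
reimputed = [0.1, 0.4, 0.3, 0.2, 0.35, 0.05, 0.6, 0.15]   # continuous re-imputations
cutoff = calibrated_cutoff(observed, reimputed)
print(cutoff, sum(round_with_cutoff(reimputed, cutoff)))  # prints "0.4 2"
```

Naive rounding at 0.5 would turn only one of the eight values into a 1 (12.5%), whereas the calibrated cutoff reproduces the observed 25%.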
40
41
Brun M, Abraham C, Jarry M, Dumas J, Lange F, Prévost E. Estimating an homogeneous series of a population abundance indicator despite changes in data collection procedure: A hierarchical Bayesian modelling approach. Ecol Modell 2011. [DOI: 10.1016/j.ecolmodel.2010.10.030] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/26/2022]
42
43
Informing sequential clinical decision-making through reinforcement learning: an empirical study. Mach Learn 2010; 84:109-136. [PMID: 21799585 DOI: 10.1007/s10994-010-5229-0] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.1] [Indexed: 10/18/2022]
Abstract
This paper highlights the role that reinforcement learning can play in the optimization of treatment policies for chronic illnesses. Before applying any off-the-shelf reinforcement learning methods in this setting, we must first tackle a number of challenges. We outline some of these challenges and present methods for overcoming them. First, we describe a multiple imputation approach to overcome the problem of missing data. Second, we discuss the use of function approximation in the context of a highly variable observation set. Finally, we discuss approaches to summarizing the evidence in the data for recommending a particular action and quantifying the uncertainty around the Q-function of the recommended policy. We present the results of applying these methods to real clinical trial data of patients with schizophrenia.
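The Q-function at the heart of this approach can be illustrated with a deliberately simplified, tabular version of two-stage Q-learning by backward induction. This is not the paper's method, which uses function approximation. Here, sample means stand in for the regression models, and the multiple imputation and uncertainty steps described above are omitted. All names and the toy trajectories are hypothetical.

```python
from collections import defaultdict

def fit_two_stage_q(trajectories):
    """Backward induction with tabular (sample-mean) Q-functions.
    Each trajectory is (s1, a1, r1, s2, a2, r2) with discrete states and actions."""
    # Stage 2: Q2(s2, a2) is the mean final reward for that state-action pair
    sum2, n2 = defaultdict(float), defaultdict(int)
    for _, _, _, s2, a2, r2 in trajectories:
        sum2[(s2, a2)] += r2
        n2[(s2, a2)] += 1
    q2 = {k: sum2[k] / n2[k] for k in sum2}

    def v2(state):
        # Value of the best stage-2 action observed in this state
        return max(v for (s, _), v in q2.items() if s == state)

    # Stage 1: average the pseudo-outcome r1 + V2(s2) within each (s1, a1)
    sum1, n1 = defaultdict(float), defaultdict(int)
    for s1, a1, r1, s2, _, _ in trajectories:
        sum1[(s1, a1)] += r1 + v2(s2)
        n1[(s1, a1)] += 1
    q1 = {k: sum1[k] / n1[k] for k in sum1}
    return q1, q2

def greedy_action(q, state):
    """Recommended action: the one with the largest estimated Q-value in this state."""
    options = {a: v for (s, a), v in q.items() if s == state}
    return max(options, key=options.get)

trajectories = [
    ("sick", "A", 0, "better", "X", 5),
    ("sick", "A", 0, "better", "Y", 1),
    ("sick", "B", 1, "worse", "X", 0),
    ("sick", "B", 1, "worse", "Y", 2),
]
q1, q2 = fit_two_stage_q(trajectories)
print(greedy_action(q1, "sick"), greedy_action(q2, "worse"))  # prints "A Y"
```

Action A earns less immediate reward than B but leads to a state where the best stage-2 action is worth more. The backward-induction step makes this trade-off visible.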
44
Jin L, Wang S. A Model Validation Procedure when Covariate Data are Missing at Random. Scand Stat Theory Appl 2010. [DOI: 10.1111/j.1467-9469.2009.00674.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
45
Abstract
Missing data are a pervasive problem in health investigations. We describe some background of missing data analysis and criticize ad hoc methods that are prone to serious problems. We then focus on multiple imputation, in which missing values are first filled in with several sets of plausible values to create multiple completed datasets, standard complete-data procedures are then applied to each completed dataset, and finally the multiple sets of results are combined to yield a single inference. We introduce the basic concepts and general methodology and provide some guidance for application. For illustration, we use a study assessing the effect of cardiovascular diseases on hospice discussion for late-stage lung cancer patients.
Affiliation(s)
- Yulei He
- Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave, Boston, MA 02115, USA.
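The final combining step of multiple imputation is usually carried out with Rubin's rules. The following is a minimal sketch, assuming that each completed-data analysis returns a scalar estimate and its squared standard error, that there are at least two imputations, and that the average within-imputation variance is positive. `pool_rubin` is a hypothetical helper name.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.
    estimates: point estimates; variances: their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b
    # Rubin's original degrees of freedom for the t reference distribution;
    # with no between-imputation variation, fall back to a normal reference.
    r = (1 + 1 / m) * b / w_bar
    df = (m - 1) * (1 + 1 / r) ** 2 if b > 0 else math.inf
    return q_bar, total, df

q, t, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance inflates the average within-imputation variance by the between-imputation spread. This is how the uncertainty due to the missing values enters the single combined inference.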
46
Yucel RM, Demirtas H. Impact of non-normal random effects on inference by multiple imputation: A simulation assessment. Comput Stat Data Anal 2010; 54:790-801. [PMID: 20526424 DOI: 10.1016/j.csda.2009.01.016] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 10/21/2022]
Abstract
Multivariate extensions of well-known linear mixed-effects models have been increasingly utilized in inference by multiple imputation in the analysis of multilevel incomplete data. The normality assumption for the underlying error terms and random effects plays a crucial role in simulating the posterior predictive distribution from which the multiple imputations are drawn. The plausibility of this normality assumption on the subject-specific random effects is assessed. Specifically, the performance of multiple imputation created under a multivariate linear mixed-effects model is investigated on a diverse set of incomplete data sets simulated under varying distributional characteristics. Under moderate amounts of missing data, the simulation study confirms that the underlying model leads to a well-calibrated procedure with negligible biases and actual coverage rates close to nominal rates in estimates of the regression coefficients. Estimation quality of the random-effect variance and association measures, however, is adversely affected by both the misspecification of the random-effect distribution and the number of incompletely observed variables. Some of the adverse impacts include lower coverage rates and increased biases.
Affiliation(s)
- Recai M Yucel
- Department of Epidemiology and Biostatistics, School of Public Health, University at Albany, SUNY, One University Place Room 139, Rensselaer, NY 12144, United States
47
Molenberghs G. Incomplete Data in Clinical Studies: Analysis, Sensitivity, and Sensitivity Analysis. ACTA ACUST UNITED AC 2009. [DOI: 10.1177/009286150904300404] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
48
He Y, Zaslavsky AM, Landrum MB, Harrington DP, Catalano P. Multiple imputation in a large-scale complex survey: a practical guide. Stat Methods Med Res 2009; 19:653-70. [PMID: 19654173 DOI: 10.1177/0962280208101273] [Citation(s) in RCA: 110] [Impact Index Per Article: 7.3] [Indexed: 11/15/2022]
Abstract
The Cancer Care Outcomes Research and Surveillance (CanCORS) Consortium is a multisite, multimode, multiwave study of the quality and patterns of care delivered to population-based cohorts of newly diagnosed patients with lung and colorectal cancer. As is typical in observational studies, missing data are a serious concern for CanCORS, following complicated patterns that pose severe challenges to the consortium investigators. Despite the popularity of multiple imputation of missing data, its acceptance and application still lag in large-scale studies with complicated data sets such as CanCORS. We use sequential regression multiple imputation, implemented in publicly available software, to deal with non-response in the CanCORS surveys and construct a centralised completed database that can be easily used by investigators from multiple sites. Our work illustrates the feasibility of multiple imputation in a large-scale multiobjective survey, showing its capacity to handle complex missing data. We present the implementation process in detail as an example for practitioners and discuss some of the challenging issues that need further research.
Affiliation(s)
- Y He
- Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave., Boston, MA 02115, USA.
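Sequential regression multiple imputation cycles through the incomplete variables, regressing each one on the others and redrawing its missing values from the fitted model. The code below is a deliberately simplified Python sketch for two continuous variables, assuming linear regressions with normal residuals. Unlike a proper implementation, such as the software used in the paper, it does not draw the regression parameters from their posterior, so repeated runs would understate between-imputation variability. Running it m times with different seeds would give m completed datasets. All names and data are hypothetical.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit ys = a + b * xs; returns (a, b, residual standard deviation)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / max(len(xs) - 2, 1)) ** 0.5

def impute_column(target, predictor, missing, rng):
    """Regress target on predictor over rows where target is observed,
    then replace the missing target entries with prediction plus normal noise."""
    obs = [i for i, m in enumerate(missing) if not m]
    a, b, sd = fit_line([predictor[i] for i in obs], [target[i] for i in obs])
    return [a + b * p + rng.gauss(0.0, sd) if m else t
            for t, p, m in zip(target, predictor, missing)]

def chained_impute(x, y, n_iter=10, seed=0):
    """x, y: equal-length lists with None marking missing values.
    Returns one completed (x, y) pair after n_iter sweeps."""
    rng = random.Random(seed)
    miss_x = [v is None for v in x]
    miss_y = [v is None for v in y]
    # Start from mean imputation, then refine by cycling through the variables
    x_fill = mean(v for v in x if v is not None)
    y_fill = mean(v for v in y if v is not None)
    x = [x_fill if m else v for v, m in zip(x, miss_x)]
    y = [y_fill if m else v for v, m in zip(y, miss_y)]
    for _ in range(n_iter):
        x = impute_column(x, y, miss_x, rng)
        y = impute_column(y, x, miss_y, rng)
    return x, y

x = [1.0, 2.0, 3.0, 4.0, 5.0, None]
y = [2.0, 4.0, 6.0, None, 10.0, 12.0]
completed_x, completed_y = chained_impute(x, y)
```

Each variable gets its own conditional model. This is what lets the approach handle the complicated, mixed missingness patterns described in the abstract, at the cost of not always corresponding to a single coherent joint model.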
49
Steinbakk GH, Storvik GO. Posterior Predictive p-values in Bayesian Hierarchical Models. Scand Stat Theory Appl 2009. [DOI: 10.1111/j.1467-9469.2008.00630.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/11/2022]
50
Liu X, Daniels MJ, Marcus B. Joint Models for the Association of Longitudinal Binary and Continuous Processes With Application to a Smoking Cessation Trial. J Am Stat Assoc 2009; 104:429-438. [PMID: 20161053 DOI: 10.1198/016214508000000904] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
Abstract
Joint models for the association of a longitudinal binary and a longitudinal continuous process are proposed for situations in which their association is of direct interest. The models are parameterized such that the dependence between the two processes is characterized by unconstrained regression coefficients. Bayesian variable selection techniques are used to parsimoniously model these coefficients. A Markov chain Monte Carlo (MCMC) sampling algorithm is developed for sampling from the posterior distribution, using data augmentation steps to handle missing data. Several technical issues are addressed to implement the MCMC algorithm efficiently. The models are motivated by, and are used for, the analysis of a smoking cessation clinical trial in which an important question of interest was the effect of the (exercise) treatment on the relationship between smoking cessation and weight gain.
Affiliation(s)
- Xuefeng Liu
- Department of Biostatistics and Epidemiology, East Tennessee State University, Johnson City, TN 37614