1
Oganisian A, Mitra N, Roy JA. Hierarchical Bayesian bootstrap for heterogeneous treatment effect estimation. Int J Biostat 2024; 20:93-106. [PMID: 36584112 DOI: 10.1515/ijb-2022-0051] [Received: 04/25/2022] [Accepted: 12/05/2022]
Abstract
A major focus of causal inference is the estimation of heterogeneous average treatment effects (HTE): average treatment effects within strata of another variable of interest, such as levels of a biomarker, education, or age strata. Inference involves estimating a stratum-specific regression and integrating it over the distribution of confounders in that stratum, which itself must be estimated. Standard practice involves estimating these stratum-specific confounder distributions independently (e.g. via the empirical distribution or Rubin's Bayesian bootstrap), which becomes problematic for sparsely populated strata with few observed confounder vectors. In this paper, we develop a nonparametric hierarchical Bayesian bootstrap (HBB) prior over the stratum-specific confounder distributions for HTE estimation. The HBB partially pools the stratum-specific distributions, thereby allowing principled borrowing of confounder information across strata when sparsity is a concern. We show that posterior inference under the HBB can yield efficiency gains over standard marginalization approaches while avoiding strong parametric assumptions about the confounder distribution. We use our approach to estimate the adverse event risk of proton versus photon chemoradiotherapy across various cancer types.
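The baseline the abstract contrasts against can be sketched in a few lines: marginalizing a stratum-specific regression over one stratum's observed confounders with Rubin's Bayesian bootstrap. This is a minimal illustration under stated assumptions, not the HBB: `bb_stratum_ate` and `example_model` are hypothetical names, the regression is treated as known rather than fitted, and only a single stratum is handled.

```python
import random
from typing import Callable, List, Sequence


def bb_stratum_ate(
    confounders: Sequence[float],
    outcome_model: Callable[[int, float], float],
    n_draws: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Posterior draws of a stratum-specific ATE under Rubin's Bayesian bootstrap.

    Each draw reweights the stratum's observed confounder values with
    Dirichlet(1, ..., 1) weights and averages the treatment contrast
    outcome_model(1, x) - outcome_model(0, x) over that reweighted distribution.
    """
    rng = random.Random(seed)
    contrasts = [outcome_model(1, x) - outcome_model(0, x) for x in confounders]
    draws = []
    for _ in range(n_draws):
        # Normalized Gamma(1, 1) (i.e. Exp(1)) variates are Dirichlet(1, ..., 1).
        g = [rng.gammavariate(1.0, 1.0) for _ in confounders]
        total = sum(g)
        draws.append(sum(gi * c for gi, c in zip(g, contrasts)) / total)
    return draws


# Hypothetical regression with a treatment-by-confounder interaction.
def example_model(a: int, x: float) -> float:
    return 1.0 + 0.5 * a + 0.2 * x + 0.3 * a * x


draws = bb_stratum_ate([0.0, 1.0, 2.0], example_model)
print(sum(draws) / len(draws))  # approximately 0.8, the unweighted mean contrast
```

With only three observed confounder values, every draw is a convex combination of the contrasts 0.5, 0.8 and 1.1, which is the sparse-stratum situation in which the abstract proposes borrowing confounder information across strata.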
Affiliation(s)
- Arman Oganisian
- Department of Biostatistics, Brown University, Providence, RI, USA
- Nandita Mitra
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jason A Roy
- Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, NJ, USA
2
Kim MK, Rouphael C, McMichael J, Welch N, Dasarathy S. Challenges in and Opportunities for Electronic Health Record-Based Data Analysis and Interpretation. Gut Liver 2024; 18:201-208. [PMID: 37905424 PMCID: PMC10938158 DOI: 10.5009/gnl230272] [Received: 07/14/2023] [Accepted: 08/15/2023]
Abstract
Electronic health records (EHRs) have been increasingly adopted in clinical practices across the United States, providing a primary source of data for clinical research, particularly observational cohort studies. EHRs are a high-yield, low-maintenance source of longitudinal real-world data for large patient populations and provide a wealth of information and clinical contexts that are useful for clinical research and translation into practice. Despite these strengths, it is important to recognize the multiple limitations and challenges related to the use of EHR data in clinical research. Missing data are a major source of error and biases and can affect the representativeness of the cohort of interest, as well as the accuracy of the outcomes and exposures. Here, we aim to provide a critical understanding of the types of data available in EHRs and describe the impact of data heterogeneity, quality, and generalizability, which should be evaluated prior to and during the analysis of EHR data. We also identify challenges pertaining to data quality, including errors and biases, and examine potential sources of such biases and errors. Finally, we discuss approaches to mitigate and remediate these limitations. A proactive approach to addressing these issues can help ensure the integrity and quality of EHR data and the appropriateness of their use in clinical studies.
Affiliation(s)
- Michelle Kang Kim
- Department of Gastroenterology, Hepatology, and Nutrition, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH, USA
- Carol Rouphael
- Department of Gastroenterology, Hepatology, and Nutrition, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH, USA
- John McMichael
- Department of Surgery, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH, USA
- Nicole Welch
- Department of Gastroenterology, Hepatology, and Nutrition, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Srinivasan Dasarathy
- Department of Gastroenterology, Hepatology, and Nutrition, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
3
Shepherd DA, Baer BR, Moreno-Betancur M. Confounding-adjustment methods for the causal difference in medians. BMC Med Res Methodol 2023; 23:288. [PMID: 38062364 PMCID: PMC10702096 DOI: 10.1186/s12874-023-02100-6] [Received: 05/30/2023] [Accepted: 11/07/2023]
Abstract
BACKGROUND With continuous outcomes, the average causal effect is typically defined using a contrast of expected potential outcomes. However, in the presence of skewed outcome data, the expectation (population mean) may no longer be meaningful. In practice, the typical approach is to continue defining the estimand this way or to transform the outcome to obtain a more symmetric distribution, although neither approach may be entirely satisfactory. Alternatively, the causal effect can be redefined as a contrast of median potential outcomes, yet discussion of confounding-adjustment methods to estimate the causal difference in medians is limited. In this study we described and compared confounding-adjustment methods to address this gap. METHODS The methods considered were multivariable quantile regression, an inverse probability weighted (IPW) estimator, weighted quantile regression (another form of IPW) and two little-known implementations of g-computation for this problem. Methods were evaluated within a simulation study under varying degrees of skewness in the outcome and applied to an empirical study using data from the Longitudinal Study of Australian Children. RESULTS Simulation results indicated the IPW estimator, weighted quantile regression and g-computation implementations minimised bias across all settings when the relevant models were correctly specified, with g-computation additionally minimising the variance. Multivariable quantile regression, which relies on a constant-effect assumption, consistently yielded biased results. Application to the empirical study illustrated the practical value of these methods. CONCLUSION The presented methods provide appealing avenues for estimating the causal difference in medians.
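One simple version of an IPW estimator of the causal difference in medians takes a weighted median of each arm, weighting treated units by 1/e(X) and controls by 1/(1 - e(X)). The sketch below assumes the propensity scores e(X) are already known (in practice they would be estimated, e.g. by logistic regression); the function names and data are hypothetical and this is not the authors' implementation.

```python
from typing import Sequence


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Lower weighted median: smallest value whose cumulative weight reaches half the total."""
    if not values:
        raise ValueError("weighted_median() requires at least one value")
    total = sum(weights)
    cum = 0.0
    for v, w in sorted(zip(values, weights)):
        cum += w
        if cum >= 0.5 * total:
            return v
    return max(values)  # guard against floating-point shortfall


def ipw_median_difference(y: Sequence[float], a: Sequence[int], ps: Sequence[float]) -> float:
    """IPW estimate of median{Y(1)} - median{Y(0)}.

    Treated units get weight 1 / e(X) and controls 1 / (1 - e(X)),
    where ps holds the propensity scores e(X) = P(A = 1 | X).
    """
    treated = [(yi, 1.0 / p) for yi, ai, p in zip(y, a, ps) if ai == 1]
    control = [(yi, 1.0 / (1.0 - p)) for yi, ai, p in zip(y, a, ps) if ai == 0]
    m1 = weighted_median([t[0] for t in treated], [t[1] for t in treated])
    m0 = weighted_median([c[0] for c in control], [c[1] for c in control])
    return m1 - m0


# Hypothetical data: outcomes, treatment indicators, known propensity scores.
y = [1, 2, 3, 10, 20, 30]
a = [0, 0, 0, 1, 1, 1]
ps = [0.5, 0.5, 0.5, 0.9, 0.9, 0.1]
print(ipw_median_difference(y, a, ps))  # 28
```

The treated unit with propensity 0.1 receives weight 10, so it pulls the weighted treated median up to 30, whereas the unweighted treated median would be 20.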
Affiliation(s)
- Daisy A Shepherd
- Clinical Epidemiology & Biostatistics Unit, Department of Paediatrics, The University of Melbourne, The Royal Children's Hospital, Melbourne, VIC, 3052, Australia.
- Clinical Epidemiology & Biostatistics Unit, The Murdoch Children's Research Institute, The Royal Children's Hospital, Melbourne, VIC, 3052, Australia.
- Benjamin R Baer
- Department of Biostatistics and Computational Biology, The University of Rochester, Rochester, New York, 14642, USA
- Margarita Moreno-Betancur
- Clinical Epidemiology & Biostatistics Unit, Department of Paediatrics, The University of Melbourne, The Royal Children's Hospital, Melbourne, VIC, 3052, Australia
- Clinical Epidemiology & Biostatistics Unit, The Murdoch Children's Research Institute, The Royal Children's Hospital, Melbourne, VIC, 3052, Australia
4
Linero AR. Prior and posterior checking of implicit causal assumptions. Biometrics 2023; 79:3153-3164. [PMID: 37325868 DOI: 10.1111/biom.13886] [Received: 05/19/2022] [Accepted: 05/18/2023]
Abstract
Causal inference practitioners have increasingly adopted machine learning techniques with the aim of producing principled uncertainty quantification for causal effects while minimizing the risk of model misspecification. Bayesian nonparametric approaches have attracted attention as well, both for their flexibility and their promise of providing natural uncertainty quantification. Priors on high-dimensional or nonparametric spaces, however, can often unintentionally encode prior information that is at odds with substantive knowledge in causal inference; specifically, the regularization required for high-dimensional Bayesian models to work can indirectly imply that the magnitude of the confounding is negligible. In this paper, we explain this problem and provide tools for (i) verifying that the prior distribution does not encode an inductive bias away from confounded models and (ii) verifying that the posterior distribution contains sufficient information to overcome this issue if it exists. We provide a proof-of-concept on simulated data from a high-dimensional probit-ridge regression model, and illustrate on a Bayesian nonparametric decision tree ensemble applied to a large medical expenditure survey.
Affiliation(s)
- Antonio R Linero
- Department of Statistics and Data Science, University of Texas at Austin, Austin, Texas, USA
5
Peskoe SB, Arterburn D, Coleman KJ, Herrinton LJ, Daniels MJ, Haneuse S. Adjusting for selection bias due to missing data in electronic health records-based research. Stat Methods Med Res 2021; 30:2221-2238. [PMID: 34445911 PMCID: PMC10942747 DOI: 10.1177/09622802211027601]
Abstract
While electronic health records data provide unique opportunities for research, numerous methodological issues must be considered. Among these, selection bias due to incomplete/missing data has received far less attention than other issues. Unfortunately, standard missing data approaches (e.g. inverse-probability weighting and multiple imputation) generally fail to acknowledge the complex interplay of heterogeneous decisions made by patients, providers, and health systems that govern whether specific data elements in the electronic health records are observed. This, in turn, renders the missing-at-random assumption difficult to believe in standard approaches. In the clinical literature, the collection of decisions that gives rise to the observed data is referred to as the data provenance. Building on a recently proposed framework for modularizing the data provenance, we develop a general and scalable framework for estimation and inference with respect to regression models based on inverse-probability weighting that allows for a hierarchy of missingness mechanisms to better align with the complex nature of electronic health records data. We show that the proposed estimator is consistent and asymptotically Normal, derive the form of the asymptotic variance, and propose two consistent estimators. Simulations show that naïve application of standard methods may yield biased point estimates, that the proposed estimators have good small-sample properties, and that researchers may have to contend with a bias-variance trade-off as they consider how to handle missing data. The proposed methods are motivated by an ongoing electronic health records-based study of bariatric surgery.
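One way to picture a hierarchy of missingness mechanisms is as a chain of sub-mechanisms (for example, a patient remains enrolled, has a qualifying visit, and has the measurement recorded), so that the probability of a complete record is the product of the stage-specific probabilities and each complete case is weighted by its inverse. The sketch below assumes those stage probabilities are already given and fits a weighted linear regression on complete cases; the stage structure, names, and numbers are hypothetical, and the paper's variance estimators are not shown.

```python
from math import prod
from typing import Sequence, Tuple


def provenance_weight(stage_probs: Sequence[float]) -> float:
    """Inverse of P(record complete), taken as the product of stage-specific observation probabilities."""
    return 1.0 / prod(stage_probs)


def ipw_linear_fit(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> Tuple[float, float]:
    """Weighted least squares fit of y = b0 + b1 * x on complete cases; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Hypothetical complete cases, each with probabilities for three observation stages.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.2, 2.9, 5.1, 7.0]
stages = [(0.9, 0.8, 0.5), (0.9, 0.6, 0.5), (0.8, 0.5, 0.4), (0.7, 0.5, 0.3)]
weights = [provenance_weight(s) for s in stages]
print(ipw_linear_fit(x, y, weights))
```

Records that had to pass through more unlikely stages receive larger weights, so they stand in for similar patients whose records were lost somewhere along the chain.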
Affiliation(s)
- Sarah B Peskoe
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- David Arterburn
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Karen J Coleman
- Kaiser Permanente Department of Research & Evaluation, Pasadena, CA, USA
- Michael J Daniels
- Department of Statistics, University of Florida, Gainesville, FL, USA
- Sebastien Haneuse
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
6
Linero AR. Simulation-based estimators of analytically intractable causal effects. Biometrics 2021; 78:1001-1017. [PMID: 34051105 DOI: 10.1111/biom.13499] [Received: 02/22/2020] [Revised: 03/31/2021] [Accepted: 04/28/2021]
Abstract
In causal inference problems, one is often tasked with estimating causal effects which are analytically intractable functionals of the data-generating mechanism. Relevant settings include estimating intention-to-treat effects in longitudinal problems with missing data or computing direct and indirect effects in mediation analysis. One approach to computing these effects is to use the g-formula implemented via Monte Carlo integration; when simulation-based methods such as the nonparametric bootstrap or Markov chain Monte Carlo are used for inference, Monte Carlo integration must be nested within an already computationally intensive algorithm. We develop a widely applicable approach to accelerating this Monte Carlo integration step which greatly reduces the computational burden of existing g-computation algorithms. We refer to our method as accelerated g-computation (AGC). The algorithms we present are similar in spirit to multiple imputation, but require removing within-imputation variance from the standard error rather than adding it. We illustrate the use of AGC on a mediation analysis problem using a beta regression model and in a longitudinal clinical trial subject to nonignorable missingness using a Bayesian additive regression trees model.
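The step that AGC is designed to accelerate is ordinary Monte Carlo integration of the g-formula. A minimal sketch for a mediation problem, assuming randomized treatment (so no confounders appear) and hypothetical fitted models M | A ~ Normal(0.5 + A, 1) and E[Y | A, M] = 1 + 2A + 3M, computes the natural direct and indirect effects by simulating mediators. This is plain g-computation, not AGC, and the linear-Gaussian models are a stand-in, not the beta regression used in the paper; in a bootstrap or MCMC analysis this loop would be rerun for every replicate or posterior draw.

```python
import random


def draw_mediator(a: int, rng: random.Random) -> float:
    # Hypothetical mediator model: M | A = a ~ Normal(0.5 + 1.0 * a, 1).
    return rng.gauss(0.5 + 1.0 * a, 1.0)


def outcome_mean(a: int, m: float) -> float:
    # Hypothetical outcome model: E[Y | A = a, M = m] = 1 + 2a + 3m.
    return 1.0 + 2.0 * a + 3.0 * m


def mc_g_formula(a_outcome: int, a_mediator: int, n_sim: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[Y(a_outcome, M(a_mediator))].

    Integrates the outcome regression over simulated mediator draws; reusing the
    same seed across calls gives common random numbers for the contrasts.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        total += outcome_mean(a_outcome, draw_mediator(a_mediator, rng))
    return total / n_sim


nde = mc_g_formula(1, 0) - mc_g_formula(0, 0)  # natural direct effect
nie = mc_g_formula(1, 1) - mc_g_formula(1, 0)  # natural indirect effect
print(nde, nie)  # approximately 2.0 and 3.0 under these hypothetical models
```

Under these linear models the exact values are NDE = 2 and NIE = 3 × 1 = 3, which gives a check on the simulation; with nonlinear models such as beta regression no closed form is generally available, which is why the Monte Carlo step is needed.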
Affiliation(s)
- Antonio R Linero
- Department of Statistics and Data Science, University of Texas at Austin, Austin, Texas, USA
7
Oganisian A, Roy JA. A practical introduction to Bayesian estimation of causal effects: Parametric and nonparametric approaches. Stat Med 2021; 40:518-551. [PMID: 33015870 PMCID: PMC8640942 DOI: 10.1002/sim.8761] [Received: 04/16/2020] [Revised: 09/07/2020] [Accepted: 09/08/2020]
Abstract
Substantial advances in Bayesian methods for causal inference have been made in recent years. We provide an introduction to Bayesian inference for causal effects for practicing statisticians who have some familiarity with Bayesian models and would like an overview of what it can add to causal estimation in practical settings. In the paper, we demonstrate how priors can induce shrinkage and sparsity in parametric models and be used to perform probabilistic sensitivity analyses around causal assumptions. We provide an overview of nonparametric Bayesian estimation and survey its applications in the causal inference literature. Inference in the point-treatment and time-varying treatment settings is considered. For the latter, we explore both static and dynamic treatment regimes. Throughout, we illustrate implementation using off-the-shelf open source software. We hope to leave the reader with implementation-level knowledge of Bayesian causal inference using both parametric and nonparametric models. All synthetic examples and code used in the paper are publicly available on a companion GitHub repository.
Affiliation(s)
- Arman Oganisian
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Pennsylvania, USA
- Jason A. Roy
- Department of Biostatistics and Epidemiology, Rutgers University, New Jersey, USA
8
Xie Y, Cotton C, Zhu Y. Multiply robust estimation of causal quantile treatment effects. Stat Med 2020; 39:4238-4251. [PMID: 32857876 DOI: 10.1002/sim.8722] [Received: 08/19/2019] [Revised: 06/15/2020] [Accepted: 07/16/2020]
Abstract
In causal inference, often the interest lies in the estimation of the average causal effect. Other quantities such as the quantile treatment effect may be of interest as well. In this article, we propose a multiply robust method for estimating the marginal quantiles of potential outcomes by achieving mean balance in (a) the propensity score, and (b) the conditional distributions of potential outcomes. An empirical likelihood or entropy measure approach can be utilized for estimation instead of inverse probability weighting, which is known to be sensitive to the misspecification of the propensity score model. Simulation studies are conducted across different scenarios of correctness in both the propensity score models and the outcome models. Both simulation results and theoretical development indicate that our proposed estimator is consistent if any of the models are correctly specified. In the data analysis, we investigate the quantile treatment effect of mothers' smoking status on infants' birthweight.
Affiliation(s)
- Yuying Xie
- Biometrics Department, Hoffmann-La Roche Limited, Mississauga, Ontario, Canada
- Cecilia Cotton
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Yeying Zhu
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
9
Oganisian A, Mitra N, Roy JA. A Bayesian nonparametric model for zero-inflated outcomes: Prediction, clustering, and causal estimation. Biometrics 2020; 77:125-135. [PMID: 32125699 DOI: 10.1111/biom.13244] [Received: 02/23/2019] [Revised: 01/19/2020] [Accepted: 02/13/2020]
Abstract
Researchers are often interested in predicting outcomes, detecting distinct subgroups of their data, or estimating causal treatment effects. Pathological data distributions that exhibit skewness and zero-inflation complicate these tasks, requiring highly flexible, data-adaptive modeling. In this paper, we present a multipurpose Bayesian nonparametric model for continuous, zero-inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero-inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of this model flows through to the causal effect estimates of interest, allowing easy point estimation, interval estimation, and posterior predictive checks verifying positivity, a required causal identification assumption. Our simulation results show point estimates to have low bias and interval estimates to have close to nominal coverage under complicated data settings. Under simpler settings, these results hold while incurring lower efficiency loss than comparator methods. We use our proposed method to analyze zero-inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy or radiation therapy in the SEER-Medicare database.
Affiliation(s)
- Arman Oganisian
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Nandita Mitra
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Jason A Roy
- Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey
10
Franks A, D’Amour A, Feller A. Flexible Sensitivity Analysis for Observational Studies Without Observable Implications. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2019.1604369]
Affiliation(s)
- Avi Feller
- University of California, Berkeley, Berkeley, CA
11
Greenland S, Fay MP, Brittain EH, Shih JH, Follmann DA, Gabriel EE, Robins JM. On Causal Inferences for Personalized Medicine: How Hidden Causal Assumptions Led to Erroneous Causal Claims About the D-Value. Am Stat 2019; 74:243-248. [PMID: 33487634 DOI: 10.1080/00031305.2019.1575771]
Abstract
Personalized medicine asks if a new treatment will help a particular patient, rather than if it improves the average response in a population. Without a causal model to distinguish these questions, interpretational mistakes arise. These mistakes are seen in an article by Demidenko [2016] that recommends the "D-value," which is the probability that a randomly chosen person from the new-treatment group has a higher value for the outcome than a randomly chosen person from the control-treatment group. That article's abstract states "The D-value has a clear interpretation as the proportion of patients who get worse after the treatment," with similar assertions appearing later. We show these statements are incorrect because they require assumptions about the potential outcomes which are neither testable in randomized experiments nor plausible in general. The D-value will not equal the proportion of patients who get worse after treatment if (as expected) those outcomes are correlated. Independence of potential outcomes is unrealistic and eliminates any personalized treatment effects; with dependence, the D-value can even imply that treatment is better than control when most patients are harmed by the treatment. Thus, D-values are misleading for personalized medicine. To prevent misunderstandings, we advise incorporating causal models into basic statistics education.
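A toy calculation makes this point concrete. The D-value depends only on the two marginal outcome distributions, whereas the proportion harmed depends on the joint distribution of each patient's potential outcomes (Y(0), Y(1)), which is never observed. The five patients and their paired outcomes below are hypothetical, higher outcomes are taken to be worse, and the arm distributions are treated as if they came from large randomized groups.

```python
from fractions import Fraction
from typing import Sequence


def d_value(y_treated: Sequence[float], y_control: Sequence[float]) -> Fraction:
    """P(Y_t > Y_c) for independent random draws from the treated and control outcome distributions."""
    wins = sum(1 for yt in y_treated for yc in y_control if yt > yc)
    return Fraction(wins, len(y_treated) * len(y_control))


def proportion_harmed(y0: Sequence[float], y1: Sequence[float]) -> Fraction:
    """Share of patients whose own outcome is higher (worse) under treatment."""
    return Fraction(sum(1 for a, b in zip(y0, y1) if b > a), len(y0))


# Hypothetical potential outcomes for five patients: Y(0) under control, Y(1) under treatment.
y0 = [1, 2, 3, 4, 100]
y1 = [2, 3, 4, 5, 0]

print(d_value(y1, y0))            # 2/5: below 1/2, so treatment "looks" beneficial
print(proportion_harmed(y0, y1))  # 4/5: yet four of five patients are harmed
```

Read as "the proportion of patients who get worse," the D-value of 2/5 would suggest 40%, while 80% of these patients are actually harmed; the gap comes entirely from the dependence between Y(0) and Y(1).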
Affiliation(s)
- Sander Greenland
- Department of Epidemiology and Department of Statistics, University of California, Los Angeles, CA, U.S.A.
- Michael P Fay
- Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Bethesda MD, U.S.A
- Erica H Brittain
- Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Bethesda MD, U.S.A
- Joanna H Shih
- Biometric Research Branch, National Cancer Institute, Rockville, MD, U.S.A
- Dean A Follmann
- Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Bethesda MD, U.S.A
- Erin E Gabriel
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
- James M Robins
- Department of Epidemiology and Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
12
Affiliation(s)
- Joseph Antonelli
- Department of Statistics, University of Florida, Gainesville, FL
13
Capistrano ESM, Moodie EEM, Schmidt AM. Bayesian estimation of the average treatment effect on the treated using inverse weighting. Stat Med 2019; 38:2447-2466. [PMID: 30859603 DOI: 10.1002/sim.8121] [Received: 03/23/2018] [Revised: 01/17/2019] [Accepted: 01/20/2019]
Abstract
We develop a Bayesian approach to estimate the average treatment effect on the treated in the presence of confounding. The approach builds on developments proposed by Saarela et al in the context of marginal structural models, using importance sampling weights to adjust for confounding and estimate a causal effect. The Bayesian bootstrap is adopted to approximate posterior distributions of interest and avoid the issue of feedback that arises in Bayesian causal estimation relying on a joint likelihood. We present results from simulation studies to estimate the average treatment effect on the treated, evaluating the impact of sample size and the strength of confounding on estimation. We illustrate our approach using the classic Right Heart Catheterization data set and find a negative causal effect of the exposure on 30-day survival, in accordance with previous analyses of these data. We also apply our approach to the data set of the National Center for Health Statistics Birth Data and obtain a negative effect of maternal smoking during pregnancy on birth weight.
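A minimal sketch of pairing the Bayesian bootstrap with inverse weighting for the average treatment effect on the treated, assuming the propensity scores are fixed and known. Treated units get weight 1 and controls get the odds e(X) / (1 - e(X)), so the weighted controls resemble the treated population; each posterior draw multiplies these by Bayesian bootstrap (Dirichlet) weights. A fuller analysis would typically re-estimate the propensity model within each draw, which this sketch omits; names and data are hypothetical and this is not the authors' code.

```python
import random
from typing import List, Sequence


def bb_att_draws(
    y: Sequence[float],
    a: Sequence[int],
    ps: Sequence[float],
    n_draws: int = 2000,
    seed: int = 0,
) -> List[float]:
    """Bayesian-bootstrap draws of a weighted ATT estimate with known propensity scores."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Unnormalized Dirichlet(1, ..., 1) weights; the within-group weighted means
        # below are scale invariant, so normalizing is unnecessary.
        g = [rng.gammavariate(1.0, 1.0) for _ in y]
        # ATT weights: 1 for treated units, odds e / (1 - e) for controls.
        w = [gi * (1.0 if ai == 1 else p / (1.0 - p)) for gi, ai, p in zip(g, a, ps)]
        mean_t = (sum(wi * yi for wi, yi, ai in zip(w, y, a) if ai == 1)
                  / sum(wi for wi, ai in zip(w, a) if ai == 1))
        mean_c = (sum(wi * yi for wi, yi, ai in zip(w, y, a) if ai == 0)
                  / sum(wi for wi, ai in zip(w, a) if ai == 0))
        draws.append(mean_t - mean_c)
    return draws


# Hypothetical data: outcomes, treatment indicators, known propensity scores.
y = [5.0, 6.0, 4.0, 2.0, 3.0, 1.0]
a = [1, 1, 1, 0, 0, 0]
ps = [0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
draws = sorted(bb_att_draws(y, a, ps))
print(draws[len(draws) // 2])  # posterior median of the ATT
```

Because each arm's estimate is a weighted mean, every draw here lies between 4 - 3 = 1 and 6 - 1 = 5, and the spread of the draws summarizes posterior uncertainty without a joint likelihood for outcome and treatment.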
Affiliation(s)
- Erica E M Moodie
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montréal, Canada
- Alexandra M Schmidt
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montréal, Canada
14
Ro J, Lum KJ, Zeldow B, Dworkin JD, Re VL, Daniels MJ. Bayesian nonparametric generative models for causal inference with missing at random covariates. Biometrics 2018; 74:1193-1202. [PMID: 29579341 PMCID: PMC7568223 DOI: 10.1111/biom.12875] [Received: 02/01/2017] [Revised: 02/01/2018] [Accepted: 02/01/2018]
Abstract
We propose a general Bayesian nonparametric (BNP) approach to causal inference in the point treatment setting. The joint distribution of the observed data (outcome, treatment, and confounders) is modeled using an enriched Dirichlet process. The combination of the observed data model and causal assumptions allows us to identify any type of causal effect (differences, ratios, or quantile effects), either marginally or for subpopulations of interest. The proposed BNP model is well-suited for causal inference problems, as it does not require parametric assumptions about the distribution of confounders and naturally leads to a computationally efficient Gibbs sampling algorithm. By flexibly modeling the joint distribution, we are also able to impute (via data augmentation) values for missing covariates within the algorithm under an assumption of ignorable missingness, obviating the need to create separate imputed data sets. This approach for imputing the missing covariates has the additional advantage of guaranteeing congeniality between the imputation model and the analysis model, and because we use a BNP approach, parametric models are avoided for imputation. The performance of the method is assessed using simulation studies. The method is applied to data from a cohort study of human immunodeficiency virus/hepatitis C virus co-infected patients.
Affiliation(s)
- Jason Ro
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A
- Kirsten J. Lum
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A
- Bret Zeldow
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A
- Jordan D. Dworkin
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A
- Vincent Lo Re
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A
- Michael J. Daniels
- Department of Statistics, University of Florida, Gainesville, Florida 32611, U.S.A