1
Bae W, Daniels MJ, Perri MG. A Bayesian nonparametric approach for causal mediation with a post-treatment confounder. Biometrics 2024; 80:ujae099. [PMID: 39311673 PMCID: PMC11418020 DOI: 10.1093/biomtc/ujae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Revised: 08/13/2024] [Accepted: 08/29/2024] [Indexed: 09/26/2024]
Abstract
We propose a new Bayesian nonparametric method for estimating the causal effects of mediation in the presence of a post-treatment confounder. The methodology is motivated by the Rural Lifestyle Intervention Treatment Effectiveness Trial (Rural LITE), in which estimating causal mediation effects is of interest but is complicated by the presence of a post-treatment confounder. We specify an enriched Dirichlet process mixture (EDPM) to model the joint distribution of the observed data (outcome, mediator, post-treatment confounder, treatment, and baseline confounders). For identifiability, we use the extended version of the standard sequential ignorability (SI) assumption introduced in Hong et al., along with a Gaussian copula model assumption. The observed data model and causal identification assumptions enable us to identify and estimate the causal effects of mediation, that is, the natural direct effects (NDE) and natural indirect effects (NIE). Our method enables easy computation of NIE and NDE for a subset of confounding variables and addresses missing data through data augmentation under the assumption of ignorable missingness. We conduct simulation studies to assess the performance of our proposed method. Furthermore, we apply this approach to evaluate the causal mediation effect in the Rural LITE trial, finding no strong evidence for the potential mediator.
Affiliation(s)
- Woojung Bae
- Department of Statistics, University of Florida, Gainesville, FL 32611, USA
- Michael J Daniels
- Department of Statistics, University of Florida, Gainesville, FL 32611, USA
- Michael G Perri
- Department of Clinical and Health Psychology, University of Florida, Gainesville, FL 32611, USA
2
Roy S, Daniels MJ, Roy J. A Bayesian nonparametric approach for multiple mediators with applications in mental health studies. Biostatistics 2024; 25:919-932. [PMID: 38332624 PMCID: PMC11247183 DOI: 10.1093/biostatistics/kxad038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 12/14/2023] [Accepted: 12/15/2023] [Indexed: 02/10/2024] Open
Abstract
Mediation analysis with contemporaneously observed multiple mediators is a significant area of causal inference. Recent approaches for multiple mediators are often based on parametric models and thus may suffer from model misspecification. Also, much of the existing literature either allows estimation of only the joint mediation effect or estimates the joint mediation effect simply as the sum of individual mediator effects, ignoring interactions among the mediators. In this article, we propose a novel Bayesian nonparametric method that overcomes the two aforementioned drawbacks. We model the joint distribution of the observed data (outcome, mediators, treatment, and confounders) flexibly using an enriched Dirichlet process mixture with three levels. We use standardization (g-computation) to compute all possible mediation effects, including pairwise and all other possible interactions among the mediators. We thoroughly explore our method via simulations and apply it to mental health data from the Wisconsin Longitudinal Study, where we estimate how the effect of births from unintended pregnancies on later-life mental depression (CES-D) among the mothers is mediated through lack of self-acceptance and autonomy, employment instability, lack of social participation, and increased family stress. Our method identified significant individual mediators, along with some significant pairwise effects.
Affiliation(s)
- Samrat Roy
- Operations and Decision Sciences, Indian Institute of Management Ahmedabad, Gujarat, India
- Jason Roy
- Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, USA
3
Oganisian A, Mitra N, Roy JA. Hierarchical Bayesian bootstrap for heterogeneous treatment effect estimation. Int J Biostat 2024; 20:93-106. [PMID: 36584112 DOI: 10.1515/ijb-2022-0051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 12/05/2022] [Indexed: 12/31/2022]
Abstract
A major focus of causal inference is the estimation of heterogeneous average treatment effects (HTE) - average treatment effects within strata of another variable of interest such as levels of a biomarker, education, or age strata. Inference involves estimating a stratum-specific regression and integrating it over the distribution of confounders in that stratum - which itself must be estimated. Standard practice involves estimating these stratum-specific confounder distributions independently (e.g. via the empirical distribution or Rubin's Bayesian bootstrap), which becomes problematic for sparsely populated strata with few observed confounder vectors. In this paper, we develop a nonparametric hierarchical Bayesian bootstrap (HBB) prior over the stratum-specific confounder distributions for HTE estimation. The HBB partially pools the stratum-specific distributions, thereby allowing principled borrowing of confounder information across strata when sparsity is a concern. We show that posterior inference under the HBB can yield efficiency gains over standard marginalization approaches while avoiding strong parametric assumptions about the confounder distribution. We use our approach to estimate the adverse event risk of proton versus photon chemoradiotherapy across various cancer types.
Affiliation(s)
- Arman Oganisian
- Department of Biostatistics, Brown University, Providence, RI, USA
- Nandita Mitra
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jason A Roy
- Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, NJ, USA
4
Zorzetto D, Bargagli-Stoffi FJ, Canale A, Dominici F. Confounder-dependent Bayesian mixture model: Characterizing heterogeneity of causal effects in air pollution epidemiology. Biometrics 2024; 80:ujae025. [PMID: 38640436 PMCID: PMC11028589 DOI: 10.1093/biomtc/ujae025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 01/24/2024] [Accepted: 03/14/2024] [Indexed: 04/21/2024]
Abstract
Several epidemiological studies have provided evidence that long-term exposure to fine particulate matter (PM2.5) increases mortality rate. Furthermore, some population characteristics (e.g., age, race, and socioeconomic status) might play a crucial role in understanding vulnerability to air pollution. To inform policy, it is necessary to identify groups of the population that are more or less vulnerable to air pollution. In the causal inference literature, the group average treatment effect (GATE) is a distinctive facet of the conditional average treatment effect. This widely employed metric serves to characterize the heterogeneity of a treatment effect based on some population characteristics. In this paper, we introduce a novel Confounder-Dependent Bayesian Mixture Model (CDBMM) to characterize causal effect heterogeneity. More specifically, our method leverages the flexibility of the dependent Dirichlet process to model the distribution of the potential outcomes conditionally on the covariates and the treatment levels, thus enabling us to: (i) identify heterogeneous and mutually exclusive population groups defined by similar GATEs in a data-driven way, and (ii) estimate and characterize the causal effects within each of the identified groups. Through simulations, we demonstrate the effectiveness of our method in uncovering key insights about treatment effect heterogeneity. We apply our method to claims data from Medicare enrollees in Texas. We found six mutually exclusive groups where the causal effects of PM2.5 on mortality rate are heterogeneous.
Affiliation(s)
- Dafne Zorzetto
- Department of Statistics, University of Padova, Padova 35121, Italy
- Falco J Bargagli-Stoffi
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
- Antonio Canale
- Department of Statistics, University of Padova, Padova 35121, Italy
- Francesca Dominici
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
5
Cardoso P, Dennis JM, Bowden J, Shields BM, McKinley TJ. Dirichlet process mixture models to impute missing predictor data in counterfactual prediction models: an application to predict optimal type 2 diabetes therapy. BMC Med Inform Decis Mak 2024; 24:12. [PMID: 38191403 PMCID: PMC10773072 DOI: 10.1186/s12911-023-02400-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/25/2023] [Accepted: 12/11/2023] [Indexed: 01/10/2024] Open
Abstract
BACKGROUND: The handling of missing data is a challenge for inference and regression modelling. A particular challenge is dealing with missing predictor information, particularly when trying to build and make predictions from models for use in clinical practice.
METHODS: We utilise a flexible Bayesian approach for handling missing predictor information in regression models. This provides practitioners with full posterior predictive distributions for both the missing predictor information (conditional on the observed predictors) and the outcome-of-interest. We apply this approach to a previously proposed counterfactual treatment selection model for type 2 diabetes second-line therapies. Our approach combines a regression model and a Dirichlet process mixture model (DPMM), where the former defines the treatment selection model, and the latter provides a flexible way to model the joint distribution of the predictors.
RESULTS: We show that DPMMs can model complex relationships between predictor variables and can provide powerful means of fitting models to incomplete data (under missing-completely-at-random and missing-at-random assumptions). This framework ensures that the posterior distribution for the parameters and the conditional average treatment effect estimates automatically reflect the additional uncertainties associated with missing data due to the hierarchical model structure. We also demonstrate that in the presence of multiple missing predictors, the DPMM model can be used to explore which variable(s), if collected, could provide the most additional information about the likely outcome.
CONCLUSIONS: When developing clinical prediction models, DPMMs offer a flexible way to model complex covariate structures and handle missing predictor information. DPMM-based counterfactual prediction models can also provide additional information to support clinical decision-making, including allowing predictions with appropriate uncertainty to be made for individuals with incomplete predictor data.
Affiliation(s)
- Pedro Cardoso
- University of Exeter, Medical School, Exeter, England
- John M Dennis
- University of Exeter, Medical School, Exeter, England
- Jack Bowden
- University of Exeter, Medical School, Exeter, England
6
Zang H, Kim HJ, Huang B, Szczesniak R. Bayesian causal inference for observational studies with missingness in covariates and outcomes. Biometrics 2023; 79:3624-3636. [PMID: 37553770 PMCID: PMC10840608 DOI: 10.1111/biom.13918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 07/13/2023] [Indexed: 08/10/2023]
Abstract
Missing data are a pervasive issue in observational studies using electronic health records or patient registries. They present unique challenges for statistical inference, especially causal inference. Inappropriately handling missing data in causal inference could potentially bias causal estimation. Besides missing data problems, observational health data structures typically have mixed-type variables - continuous and categorical covariates - whose joint distribution is often too complex to be modeled by simple parametric models. The existence of missing values in covariates and outcomes makes causal inference even more challenging, while most standard causal inference approaches assume fully observed data or begin only after imputing missing values in a separate preprocessing stage. To address these problems, we introduce a Bayesian nonparametric causal model to estimate causal effects with missing data. The proposed approach can simultaneously impute missing values, account for multiple outcomes, and estimate causal effects under the potential outcomes framework. We provide three simulation studies to show the performance of our proposed method under complicated data settings whose features are similar to our case studies. For example, Simulation Study 3 assumes the case where missing values exist in both outcomes and covariates. Two case studies were conducted applying our method to evaluate the comparative effectiveness of treatments for chronic disease management in juvenile idiopathic arthritis and cystic fibrosis.
Affiliation(s)
- Huaiyu Zang
- Heart Institute, Cincinnati Children’s Hospital Medical Center, OH, U.S.A
- Hang J. Kim
- Division of Statistics and Data Science, University of Cincinnati, OH, U.S.A
- Bin Huang
- Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, OH, U.S.A
- Department of Pediatrics, University of Cincinnati, OH, U.S.A
- Rhonda Szczesniak
- Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, OH, U.S.A
- Department of Pediatrics, University of Cincinnati, OH, U.S.A
7
Daniels MJ, Lee M, Feng W. Dirichlet process mixture models for the analysis of repeated attempt designs. Biometrics 2023; 79:3907-3915. [PMID: 37349969 PMCID: PMC11091717 DOI: 10.1111/biom.13894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Accepted: 05/22/2023] [Indexed: 06/24/2023]
Abstract
In longitudinal studies, it is not uncommon to make multiple attempts to collect a measurement after baseline. Recording whether these attempts are successful provides useful information for the purposes of assessing missing data assumptions. This is because measurements from subjects who provide the data after numerous failed attempts may differ from those who provide the measurement after fewer attempts. Previous models for these designs were parametric and/or did not allow sensitivity analysis. For the former, there are always concerns about model misspecification and for the latter, sensitivity analysis is essential when conducting inference in the presence of missing data. Here, we propose a new approach which minimizes issues with model misspecification by using Bayesian nonparametrics for the observed data distribution. We also introduce a novel approach for identification and sensitivity analysis. We re-analyze the repeated attempts data from a clinical trial involving patients with severe mental illness and conduct simulations to better understand the properties of our approach.
Affiliation(s)
- Michael J. Daniels
- Department of Statistics, University of Florida, Gainesville, Florida, USA
- Minji Lee
- Edwards Lifesciences, Irvine, California, USA
- Wei Feng
- Keros Therapeutics, Lexington, Massachusetts, USA
8
Linero AR. Prior and posterior checking of implicit causal assumptions. Biometrics 2023; 79:3153-3164. [PMID: 37325868 DOI: 10.1111/biom.13886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Accepted: 05/18/2023] [Indexed: 06/17/2023]
Abstract
Causal inference practitioners have increasingly adopted machine learning techniques with the aim of producing principled uncertainty quantification for causal effects while minimizing the risk of model misspecification. Bayesian nonparametric approaches have attracted attention as well, both for their flexibility and their promise of providing natural uncertainty quantification. Priors on high-dimensional or nonparametric spaces, however, can often unintentionally encode prior information that is at odds with substantive knowledge in causal inference. Specifically, the regularization required for high-dimensional Bayesian models to work can indirectly imply that the magnitude of the confounding is negligible. In this paper, we explain this problem and provide tools for (i) verifying that the prior distribution does not encode an inductive bias away from confounded models and (ii) verifying that the posterior distribution contains sufficient information to overcome this issue if it exists. We provide a proof-of-concept on simulated data from a high-dimensional probit-ridge regression model, and illustrate on a Bayesian nonparametric decision tree ensemble applied to a large medical expenditure survey.
Affiliation(s)
- Antonio R Linero
- Department of Statistics and Data Science, University of Texas at Austin, Austin, Texas, USA
9
Franzolini B, Cremaschi A, van den Boom W, De Iorio M. Bayesian clustering of multiple zero-inflated outcomes. Philos Trans A Math Phys Eng Sci 2023; 381:20220145. [PMID: 36970823 PMCID: PMC10041346 DOI: 10.1098/rsta.2022.0145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 09/15/2022] [Indexed: 06/18/2023]
Abstract
Several applications involving counts present a large proportion of zeros (excess-of-zeros data). A popular model for such data is the hurdle model, which explicitly models the probability of a zero count, while assuming a sampling distribution on the positive integers. We consider data from multiple count processes. In this context, it is of interest to study the patterns of counts and cluster the subjects accordingly. We introduce a novel Bayesian approach to cluster multiple, possibly related, zero-inflated processes. We propose a joint model for zero-inflated counts, specifying a hurdle model for each process with a shifted Negative Binomial sampling distribution. Conditionally on the model parameters, the different processes are assumed independent, leading to a substantial reduction in the number of parameters as compared with traditional multivariate approaches. The subject-specific probabilities of zero-inflation and the parameters of the sampling distribution are flexibly modelled via an enriched finite mixture with random number of components. This induces a two-level clustering of the subjects based on the zero/non-zero patterns (outer clustering) and on the sampling distribution (inner clustering). Posterior inference is performed through tailored Markov chain Monte Carlo schemes. We demonstrate the proposed approach on an application involving the use of the messaging service WhatsApp. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
Affiliation(s)
- Beatrice Franzolini
- Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Andrea Cremaschi
- Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Willem van den Boom
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
- Maria De Iorio
- Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
- Department of Statistical Science, University College London, London, UK
10
Li F, Ding P, Mealli F. Bayesian causal inference: a critical review. Philos Trans A Math Phys Eng Sci 2023; 381:20220153. [PMID: 36970828 DOI: 10.1098/rsta.2022.0153] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/28/2022] [Accepted: 10/23/2022] [Indexed: 06/18/2023]
Abstract
This paper provides a critical review of the Bayesian perspective of causal inference based on the potential outcomes framework. We review the causal estimands, assignment mechanism, the general structure of Bayesian inference of causal effects and sensitivity analysis. We highlight issues that are unique to Bayesian causal inference, including the role of the propensity score, the definition of identifiability, and the choice of priors in both low- and high-dimensional regimes. We point out the central role of covariate overlap and more generally the design stage in Bayesian causal inference. We extend the discussion to two complex assignment mechanisms: instrumental variable and time-varying treatments. We identify the strengths and weaknesses of the Bayesian approach to causal inference. Throughout, we illustrate the key concepts via examples. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
Affiliation(s)
- Fan Li
- Duke University, Durham, NC, USA
- Peng Ding
- University of California, Berkeley, CA, USA
11
Comment L, Coull BA, Zigler C, Valeri L. Bayesian data fusion: Probabilistic sensitivity analysis for unmeasured confounding using informative priors based on secondary data. Biometrics 2022; 78:730-741. [PMID: 33527348 PMCID: PMC8326294 DOI: 10.1111/biom.13436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2019] [Revised: 04/10/2020] [Accepted: 01/13/2021] [Indexed: 11/28/2022]
Abstract
Bayesian causal inference offers a principled approach to policy evaluation of proposed interventions on mediators or time-varying exposures. Building on the Bayesian g-formula method introduced by Keil et al., we outline a general approach for the estimation of population-level causal quantities involving dynamic and stochastic treatment regimes, including regimes related to mediation estimands such as natural direct and indirect effects. We further extend this approach to propose Bayesian data fusion (BDF), an algorithm for performing probabilistic sensitivity analysis when a confounder unmeasured in a primary data set is available in an external data source. When the relevant relationships are causally transportable between the two source populations, BDF corrects confounding bias and supports causal inference and decision-making within the main study population without sharing of the individual-level external data set. We present results from a simulation study comparing BDF to two common frequentist correction methods for unmeasured mediator-outcome confounding bias in the mediation setting. We use these methods to analyze data on the role of stage at cancer diagnosis in contributing to Black-White colorectal cancer survival disparities.
Affiliation(s)
- Leah Comment
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Brent A. Coull
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Corwin Zigler
- Department of Statistics and Data Sciences, University of Texas, Austin, Texas
- Linda Valeri
- Department of Biostatistics, Columbia University Mailman School of Public Health, New York, New York
12
Cui J, Li X, Zhao H, Wang H, Li B, Li X. Epoch-Evolving Gaussian Process Guided Learning for Classification. IEEE Trans Neural Netw Learn Syst 2022; PP:326-337. [PMID: 35604997 DOI: 10.1109/tnnls.2022.3174207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The conventional mini-batch gradient descent algorithms are usually trapped in the local batch-level distribution information, resulting in the "zig-zag" effect in the learning process. To characterize the correlation information between the batch-level distribution and the global data distribution, we propose a novel learning scheme called epoch-evolving Gaussian process guided learning (GPGL) to encode the global data distribution information in a non-parametric way. Upon a set of class-aware anchor samples, our GP model is built to estimate the class distribution for each sample in mini-batch through label propagation from the anchor samples to the batch samples. The class distribution, also named the context label, is provided as a complement for the ground-truth one-hot label. Such a class distribution structure has a smooth property and usually carries a rich body of contextual information that is capable of speeding up the convergence process. With the guidance of the context label and ground-truth label, the GPGL scheme provides a more efficient optimization through updating the model parameters with a triangle consistency loss. Furthermore, our GPGL scheme can be generalized and naturally applied to the current deep models, outperforming the state-of-the-art optimization methods on six benchmark datasets.
13
Josefsson M, Daniels MJ. Bayesian semi-parametric G-computation for causal inference in a cohort study with MNAR dropout and death. J R Stat Soc Ser C Appl Stat 2021; 70:398-414. [PMID: 33692597 PMCID: PMC7939177 DOI: 10.1111/rssc.12464] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/10/2023]
Abstract
Causal inference with observational longitudinal data and time-varying exposures is often complicated by time-dependent confounding and attrition. The G-computation formula is one approach for estimating a causal effect in this setting. The parametric modeling approach typically used in practice relies on strong modeling assumptions for valid inference, and moreover depends on an assumption of missing at random, which is not appropriate when the data are missing not at random (MNAR) or the missingness is due to death. In this work we develop a flexible Bayesian semi-parametric G-computation approach for assessing the causal effect on the subpopulation that would survive irrespective of exposure, in a setting with MNAR dropout. The approach is to specify models for the observed data using Bayesian additive regression trees, and then use assumptions with embedded sensitivity parameters to identify and estimate the causal effect. The proposed approach is motivated by a longitudinal cohort study on cognition, health, and aging, and we apply our approach to study the effect of becoming a widow on memory. We also compare our approach to several standard methods.
Affiliation(s)
- Maria Josefsson
- Centre for Demographic and Ageing Research, Umeå University, Sweden
14
Roy J, Mitra N. Measured and accounted-for confounding in pharmacoepidemiologic studies: Some thoughts for practitioners. Pharmacoepidemiol Drug Saf 2021; 30:277-282. [PMID: 33372303 PMCID: PMC8635757 DOI: 10.1002/pds.5189] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/09/2020] [Accepted: 12/25/2020] [Indexed: 11/06/2022]
Abstract
BACKGROUND: Valid causal inference from observational pharmacoepidemiologic studies relies on adequately adjusting for confounding.
AIMS: The goal of this article is to provide clarity and guidance on issues related to confounding and provide motivation for using more flexible models for causal inference in pharmacoepidemiology.
MATERIALS & METHODS: In this article we elucidate two important components of making valid inference from observational data: measuring the necessary set of variables at the design/data collection phase (measured confounding) and properly accounting for confounding at the modeling/analysis phase (accounted-for confounding). For the latter concept, we contrast parametric modeling approaches, which are susceptible to model misspecification bias, with data adaptive approaches.
DISCUSSION: Both measuring and properly accounting for confounding are critical to obtaining valid causal inference from pharmacoepidemiology studies. Carefully thought-out DAGs, based on subject matter knowledge, can help to better identify confounders and confounding. Even when confounding has been adequately measured, mis-specified models may lead to unaccounted-for confounding, and increasing the sample size often does not help. We recommend modern analytic techniques such as flexible data adaptive approaches that do not rely on strong parametric assumptions. Further, sensitivity analyses and other modern bounding approaches are recommended to account for the effects of unmeasured confounding.
CONCLUSION: Confounding must be considered at both the design and analysis stages of a study. DAGs and data adaptive approaches can help.
Affiliation(s)
- Jason Roy
- Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, New Brunswick, NJ
- Nandita Mitra
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
15
Oganisian A, Roy JA. A practical introduction to Bayesian estimation of causal effects: Parametric and nonparametric approaches. Stat Med 2021; 40:518-551. [PMID: 33015870 PMCID: PMC8640942 DOI: 10.1002/sim.8761] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/16/2020] [Revised: 09/07/2020] [Accepted: 09/08/2020] [Indexed: 12/27/2022]
Abstract
Substantial advances in Bayesian methods for causal inference have been made in recent years. We provide an introduction to Bayesian inference for causal effects for practicing statisticians who have some familiarity with Bayesian models and would like an overview of what it can add to causal estimation in practical settings. In the paper, we demonstrate how priors can induce shrinkage and sparsity in parametric models and be used to perform probabilistic sensitivity analyses around causal assumptions. We provide an overview of nonparametric Bayesian estimation and survey its applications in the causal inference literature. Inference in the point-treatment and time-varying treatment settings is considered. For the latter, we explore both static and dynamic treatment regimes. Throughout, we illustrate implementation using off-the-shelf open source software. We hope to leave the reader with implementation-level knowledge of Bayesian causal inference using both parametric and nonparametric models. All synthetic examples and code used in the paper are publicly available on a companion GitHub repository.
Affiliation(s)
- Arman Oganisian
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Pennsylvania, USA
- Jason A. Roy
- Department of Biostatistics and Epidemiology, Rutgers University, New Jersey, USA
16
Oganisian A, Mitra N, Roy JA. A Bayesian nonparametric model for zero-inflated outcomes: Prediction, clustering, and causal estimation. Biometrics 2020; 77:125-135. [PMID: 32125699 DOI: 10.1111/biom.13244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2019] [Revised: 01/19/2020] [Accepted: 02/13/2020] [Indexed: 12/01/2022]
Abstract
Researchers are often interested in predicting outcomes, detecting distinct subgroups of their data, or estimating causal treatment effects. Pathological data distributions that exhibit skewness and zero-inflation complicate these tasks, requiring highly flexible, data-adaptive modeling. In this paper, we present a multipurpose Bayesian nonparametric model for continuous, zero-inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero-inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of this model flows through to the causal effect estimates of interest, allowing easy point estimation, interval estimation, and posterior predictive checks verifying positivity, a required causal identification assumption. Our simulation results show point estimates to have low bias and interval estimates to have close to nominal coverage under complicated data settings. Under simpler settings, these results hold while incurring lower efficiency loss than comparator methods. We use our proposed method to analyze zero-inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy or radiation therapy in the SEER-Medicare database.
Affiliation(s)
- Arman Oganisian
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Nandita Mitra
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Jason A Roy
- Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey
17
Affiliation(s)
- Joseph Antonelli
- Department of Statistics, University of Florida, Gainesville, FL
18
Capistrano ESM, Moodie EEM, Schmidt AM. Bayesian estimation of the average treatment effect on the treated using inverse weighting. Stat Med 2019; 38:2447-2466. [PMID: 30859603 DOI: 10.1002/sim.8121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/23/2018] [Revised: 01/17/2019] [Accepted: 01/20/2019] [Indexed: 11/06/2022]
Abstract
We develop a Bayesian approach to estimate the average treatment effect on the treated in the presence of confounding. The approach builds on developments proposed by Saarela et al in the context of marginal structural models, using importance sampling weights to adjust for confounding and estimate a causal effect. The Bayesian bootstrap is adopted to approximate posterior distributions of interest and avoid the issue of feedback that arises in Bayesian causal estimation relying on a joint likelihood. We present results from simulation studies to estimate the average treatment effect on the treated, evaluating the impact of sample size and the strength of confounding on estimation. We illustrate our approach using the classic Right Heart Catheterization data set and find a negative causal effect of the exposure on 30-day survival, in accordance with previous analyses of these data. We also apply our approach to the data set of the National Center for Health Statistics Birth Data and obtain a negative effect of maternal smoking during pregnancy on birth weight.
Affiliation(s)
- Erica E M Moodie
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montréal, Canada
- Alexandra M Schmidt
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montréal, Canada