1
Shin H, Antonelli J. Improved inference for doubly robust estimators of heterogeneous treatment effects. Biometrics 2023; 79:3140-3152. [PMID: 36745745 DOI: 10.1111/biom.13837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Revised: 01/16/2023] [Accepted: 01/30/2023] [Indexed: 02/08/2023]
Abstract
We propose a doubly robust approach to characterizing treatment effect heterogeneity in observational studies. We develop a frequentist inferential procedure that utilizes posterior distributions for both the propensity score and outcome regression models to provide valid inference on the conditional average treatment effect even when high-dimensional or nonparametric models are used. We show that our approach leads to conservative inference in finite samples or under model misspecification and provides a consistent variance estimator when both models are correctly specified. In simulations, we illustrate the utility of these results in difficult settings such as high-dimensional covariate spaces or highly flexible models for the propensity score and outcome regression. Lastly, we analyze environmental exposure data from NHANES to identify how the effects of these exposures vary by subject-level characteristics.
Affiliation(s)
- Heejun Shin
  - Department of Statistics, University of Florida, Gainesville, Florida, USA
- Joseph Antonelli
  - Department of Statistics, University of Florida, Gainesville, Florida, USA
2
Li Y, Miao W, Shpitser I, Tchetgen Tchetgen EJ. A self-censoring model for multivariate nonignorable nonmonotone missing data. Biometrics 2023; 79:3203-3214. [PMID: 37488709 DOI: 10.1111/biom.13916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 07/10/2023] [Indexed: 07/26/2023]
Abstract
We introduce an itemwise modeling approach called "self-censoring" for multivariate nonignorable nonmonotone missing data, where the missingness process of each outcome can be affected by its own value and associated with missingness indicators of other outcomes, while conditionally independent of the other outcomes. The self-censoring model complements previous graphical approaches for the analysis of multivariate nonignorable missing data. It is identified under a completeness condition stating that any variability in one outcome can be captured by variability in the other outcomes among complete cases. For estimation, we propose a suite of semiparametric estimators including doubly robust estimators that deliver valid inferences under partial misspecification of the full-data distribution. We also provide a novel and flexible global sensitivity analysis procedure anchored at the self-censoring. We evaluate the performance of the proposed methods with simulations and apply them to analyze a study about the effect of highly active antiretroviral therapy on preterm delivery of HIV-positive mothers.
Affiliation(s)
- Yilin Li
  - Department of Probability and Statistics, Peking University, Beijing, China
- Wang Miao
  - Department of Probability and Statistics, Peking University, Beijing, China
- Ilya Shpitser
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Eric J Tchetgen Tchetgen
  - Department of Statistics, The Wharton School of the University of Pennsylvania, Philadelphia, Pennsylvania, USA
3
Rostami M, Saarela O. Targeted L1-Regularization and Joint Modeling of Neural Networks for Causal Inference. Entropy (Basel) 2022; 24:1290. [PMID: 36141175 PMCID: PMC9497603 DOI: 10.3390/e24091290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 09/07/2022] [Accepted: 09/08/2022] [Indexed: 06/16/2023]
Abstract
The calculation of the Augmented Inverse Probability Weighting (AIPW) estimator of the Average Treatment Effect (ATE) is carried out in two steps, where in the first step, the treatment and outcome are modeled, and in the second step, the predictions are inserted into the AIPW estimator. The model misspecification in the first step has led researchers to utilize Machine Learning algorithms instead of parametric algorithms. However, the existence of strong confounders and/or Instrumental Variables (IVs) can lead the complex ML algorithms to provide perfect predictions for the treatment model which can violate the positivity assumption and elevate the variance of AIPW estimators. Thus the complexity of ML algorithms must be controlled to avoid perfect predictions for the treatment model while still learning the relationship between the confounders and the treatment and outcome. We use two NN architectures with an L1-regularization on specific NN parameters and investigate how their certain hyperparameters should be tuned in the presence of confounders and IVs to achieve a low bias-variance tradeoff for ATE estimators such as AIPW estimator. Through simulation results, we will provide recommendations as to how NNs can be employed for ATE estimation.
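The second step described in this abstract, inserting first-step predictions into the AIPW estimator, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `aipw_ate` and the toy data are assumptions, and the first-step predictions are passed in as plain lists so that any model (parametric or neural network) could have produced them.

```python
from statistics import fmean


def aipw_ate(y, a, e_hat, mu1_hat, mu0_hat):
    """Second step of AIPW: plug first-step predictions into the estimator.

    y        observed outcomes
    a        treatment indicators (1 = treated, 0 = control)
    e_hat    predicted propensity scores P(A = 1 | X)
    mu1_hat  predicted outcomes under treatment, E[Y | A = 1, X]
    mu0_hat  predicted outcomes under control,   E[Y | A = 0, X]
    """
    scores = []
    for yi, ai, ei, m1, m0 in zip(y, a, e_hat, mu1_hat, mu0_hat):
        # Positivity: a propensity of exactly 0 or 1 makes a weight undefined.
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        treated_part = m1 + ai * (yi - m1) / ei
        control_part = m0 + (1 - ai) * (yi - m0) / (1.0 - ei)
        scores.append(treated_part - control_part)
    return fmean(scores)


# Hypothetical toy data (not from the paper).
y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]
e_hat = [0.5, 0.5, 0.5, 0.5]

# Outcome predictions that match the observed arm: the residual terms vanish.
ate_good_outcome = aipw_ate(y, a, e_hat, [3.0, 3.0, 4.0, 4.0], [1.0, 1.0, 2.0, 2.0])

# Outcome predictions set to zero: the estimator reduces to plain IPW.
ate_ipw_only = aipw_ate(y, a, e_hat, [0.0] * 4, [0.0] * 4)
```

The explicit range check mirrors the concern raised in the abstract: a treatment model that predicts treatment perfectly yields propensity scores at 0 or 1, where the inverse weights are undefined, and scores close to those bounds inflate the variance.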
4
Antonelli J, Papadogeorgou G, Dominici F. Causal inference in high dimensions: A marriage between Bayesian modeling and good frequentist properties. Biometrics 2022; 78:100-114. [PMID: 33349923 PMCID: PMC8209114 DOI: 10.1111/biom.13417] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/12/2018] [Revised: 12/02/2020] [Accepted: 12/04/2020] [Indexed: 11/30/2022]
Abstract
We introduce a framework for estimating causal effects of binary and continuous treatments in high dimensions. We show how posterior distributions of treatment and outcome models can be used together with doubly robust estimators. We propose an approach to uncertainty quantification for the doubly robust estimator, which utilizes posterior distributions of model parameters and (1) results in good frequentist properties in small samples, (2) is based on a single run of a Markov chain Monte Carlo (MCMC) algorithm, and (3) improves over frequentist measures of uncertainty which rely on asymptotic properties. We consider a flexible framework for modeling the treatment and outcome processes within the Bayesian paradigm that reduces model dependence, accommodates nonlinearity, and achieves dimension reduction of the covariate space. We illustrate the ability of the proposed approach to flexibly estimate causal effects in high dimensions and appropriately quantify uncertainty. We show that our proposed variance estimation strategy is consistent when both models are correctly specified, and we see empirically that it performs well in finite samples and under model misspecification. Finally, we estimate the effect of continuous environmental exposures on cholesterol and triglyceride levels.
Affiliation(s)
- Joseph Antonelli
  - Department of Statistics, University of Florida, Gainesville, FL, 32611
- Francesca Dominici
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
5
Zhong Y, Kennedy EH, Bodnar LM, Naimi AI. AIPW: An R Package for Augmented Inverse Probability-Weighted Estimation of Average Causal Effects. Am J Epidemiol 2021; 190:2690-2699. [PMID: 34268567 PMCID: PMC8796813 DOI: 10.1093/aje/kwab207] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/28/2020] [Revised: 07/09/2021] [Accepted: 07/13/2021] [Indexed: 12/26/2022] Open
Abstract
An increasing number of recent studies have suggested that doubly robust estimators with cross-fitting should be used when estimating causal effects with machine learning methods. However, not all existing programs that implement doubly robust estimators support machine learning methods and cross-fitting, or provide estimates on multiplicative scales. To address these needs, we developed AIPW, a software package implementing augmented inverse probability weighting (AIPW) estimation of average causal effects in R (R Foundation for Statistical Computing, Vienna, Austria). Key features of the AIPW package include cross-fitting and flexible covariate adjustment for observational studies and randomized controlled trials (RCTs). In this paper, we use a simulated RCT to illustrate implementation of the AIPW estimator. We also perform a simulation study to evaluate the performance of the AIPW package compared with other doubly robust implementations, including CausalGAM, npcausal, tmle, and tmle3. Our simulation showed that the AIPW package yields performance comparable to that of other programs. Furthermore, we also found that cross-fitting substantively decreases the bias and improves the confidence interval coverage for doubly robust estimators fitted with machine learning algorithms. Our findings suggest that the AIPW package can be a useful tool for estimating average causal effects with machine learning methods in RCTs and observational studies.
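Cross-fitting, which this abstract credits with reducing bias and improving coverage, means that each unit's nuisance prediction comes from a model trained without that unit. The sketch below is written in Python for illustration and is not the AIPW R package's interface; `k_folds`, `cross_fit`, and the stand-in learner `fit_mean` are hypothetical names. Folds are contiguous here to keep the example deterministic, whereas in practice the indices would usually be shuffled first.

```python
def k_folds(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    bounds = [i * n // k for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


def cross_fit(x, target, k, fit):
    """Out-of-fold predictions: each unit is scored by a model that never saw it.

    `fit(x_train, target_train)` must return a callable that maps one
    covariate value to a prediction.
    """
    n = len(x)
    preds = [None] * n
    for test_idx in k_folds(n, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = fit([x[i] for i in train_idx], [target[i] for i in train_idx])
        for i in test_idx:
            preds[i] = model(x[i])
    return preds


def fit_mean(x_train, target_train):
    """Stand-in learner (assumption): ignores covariates, predicts the training mean."""
    mean = sum(target_train) / len(target_train)
    return lambda x_new: mean


# Hypothetical data: covariate values and treatment indicators.
x = [0, 1, 2, 3, 4, 5]
a = [1, 1, 0, 0, 1, 0]

# Cross-fitted propensity scores with 3 folds.
e_hat = cross_fit(x, a, 3, fit_mean)
```

The same `cross_fit` call can be reused for the outcome models by passing outcomes as `target` and a different `fit` function; the resulting out-of-fold predictions are then plugged into the doubly robust estimator.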
Affiliation(s)
- Ashley I Naimi
  - Correspondence to Dr. Ashley I. Naimi, Department of Epidemiology, Rollins School of Public Health, Emory University, 1518 Clifton Road, Atlanta, GA 30322
6
Berchialla P, Sciannameo V, Urru S, Lanera C, Azzolina D, Gregori D, Baldi I. Adjustment for Baseline Covariates to Increase Efficiency in RCTs with Binary Endpoint: A Comparison of Bayesian and Frequentist Approaches. Int J Environ Res Public Health 2021; 18:7758. [PMID: 34360051 PMCID: PMC8345531 DOI: 10.3390/ijerph18157758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2021] [Revised: 07/20/2021] [Accepted: 07/21/2021] [Indexed: 11/16/2022]
Abstract
BACKGROUND In a randomized controlled trial (RCT) with binary outcome the estimate of the marginal treatment effect can be biased by prognostic baseline covariates adjustment. Methods that target the marginal odds ratio, allowing for improved precision and power, have been developed. METHODS The performance of different estimators for the treatment effect in the frequentist (targeted maximum likelihood estimator, inverse-probability-of-treatment weighting, parametric G-computation, and the semiparametric locally efficient estimator) and Bayesian (model averaging), adjustment for confounding, and generalized Bayesian causal effect estimation frameworks are assessed and compared in a simulation study under different scenarios. The use of these estimators is illustrated on an RCT in type II diabetes. RESULTS Model mis-specification does not increase the bias. The approaches that are not doubly robust have increased standard error (SE) under the scenario of mis-specification of the treatment model. The Bayesian estimators showed a higher type II error than frequentist estimators if noisy covariates are included in the treatment model. CONCLUSIONS Adjusting for prognostic baseline covariates in the analysis of RCTs can have more power than intention-to-treat based tests. However, for some classes of model, when the regression model is mis-specified, inflated type I error and potential bias on treatment effect estimate may arise.
Affiliation(s)
- Paola Berchialla
  - Department of Clinical and Biological Sciences, University of Torino, 10100 Torino, Italy
- Veronica Sciannameo
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35121 Padova, Italy
- Sara Urru
  - Department of Clinical and Biological Sciences, University of Torino, 10100 Torino, Italy
- Corrado Lanera
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35121 Padova, Italy
- Danila Azzolina
  - Department of Medical Sciences, University of Ferrara, 44121 Ferrara, Italy
- Dario Gregori
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35121 Padova, Italy
- Ileana Baldi
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35121 Padova, Italy
7
Choi BY, Wang CP, Gelfond J. Machine learning outcome regression improves doubly robust estimation of average causal effects. Pharmacoepidemiol Drug Saf 2020; 29:1120-1133. [PMID: 32716126 PMCID: PMC8098857 DOI: 10.1002/pds.5074] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/18/2020] [Revised: 06/17/2020] [Accepted: 06/18/2020] [Indexed: 11/06/2022]
Abstract
BACKGROUND Doubly robust estimation produces an unbiased estimator for the average treatment effect unless both propensity score (PS) and outcome models are incorrectly specified. Studies have shown that the doubly robust estimator is subject to more bias than the standard weighting estimator when both PS and outcome models are incorrectly specified. METHOD We evaluated whether various machine learning methods can be used for estimating conditional means of the potential outcomes to enhance the robustness of the doubly robust estimator to various degrees of model misspecification in terms of reducing bias and standard error. We considered four types of methods to predict the outcomes: least squares, tree-based methods, generalized additive models and shrinkage methods. We also considered an ensemble method called the Super Learner (SL), which is a linear combination of multiple learners. We conducted simulations considering different scenarios by the complexity of PS and outcome-generating models and some ranges of treatment prevalence. RESULTS The shrinkage methods performed well with robust doubly robust estimates in terms of bias and mean squared error across the scenarios when the models became rich by including all 2-way interactions of the covariates. The SL performed similarly to the best method in each scenario. CONCLUSIONS Our findings indicate that machine learning methods such as the SL or the shrinkage methods using interaction models should be used for more accurate doubly robust estimators.
Affiliation(s)
- Byeong Yeob Choi
  - Department of Population Health Sciences, UT Health San Antonio, San Antonio, Texas, USA
- Chen-Pin Wang
  - Department of Population Health Sciences, UT Health San Antonio, San Antonio, Texas, USA
- Jonathan Gelfond
  - Department of Population Health Sciences, UT Health San Antonio, San Antonio, Texas, USA
8
Abstract
Deep learning is a class of machine learning algorithms that are popular for building risk prediction models. When observations are censored, the outcomes are only partially observed and standard deep learning algorithms cannot be directly applied. We develop a new class of deep learning algorithms for outcomes that are potentially censored. To account for censoring, the unobservable loss function used in the absence of censoring is replaced by a censoring unbiased transformation. The resulting class of algorithms can be used to estimate both survival probabilities and restricted mean survival. We show how the deep learning algorithms can be implemented by adapting software for uncensored data by using a form of response transformation. We provide comparisons of the proposed deep learning algorithms to existing risk prediction algorithms for predicting survival probabilities and restricted mean survival through both simulated datasets and analysis of data from breast cancer patients.
Affiliation(s)
- Samantha Morrison
  - Department of Biostatistics, Brown University, Providence, Rhode Island, USA
9
Kim D, Ahn BI. Eating Out and Consumers' Health: Evidence on Obesity and Balanced Nutrition Intakes. Int J Environ Res Public Health 2020; 17:E586. [PMID: 31963262 DOI: 10.3390/ijerph17020586] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 12/04/2019] [Revised: 01/12/2020] [Accepted: 01/13/2020] [Indexed: 01/02/2023]
Abstract
Changes in demographic and socioeconomic characteristics have contributed to an increase in away-from-home food consumption. Although consumers are increasingly demanding higher quality food, unbalanced nutrition intakes and health issues such as obesity remain prominent predicaments. This paper investigates the relationship between the frequency of having Food Away From Home (FAFH), balanced dietary intakes, and obesity (controlling for covariates) among Korean adults aged 19 to 64. Whether there exists a linear relationship between the number of having FAFH and health outcome is investigated and the optimal number of having FAFH that leads to the best health outcome is identified in the study. The results suggest that Food Away From Home generally increases deviations of dietary intakes from the reference intakes and high-frequency FAFH consumers have an elevated chance of being obese (36.22%). However, having FAFH 1–7 times per week is associated with decreased body mass index (BMI) and a lower chance of being obese in comparison to the outcomes of having food at home. The optimal level of consuming FAFH is identified to be 5–7 times per week in terms of BMI and obesity. However, consuming no FAFH is suggested to be the best in terms of balanced nutrition intake.
10
Zetterqvist J, Vermeulen K, Vansteelandt S, Sjölander A. Doubly robust conditional logistic regression. Stat Med 2019; 38:4749-4760. [PMID: 31373403 DOI: 10.1002/sim.8332] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/25/2018] [Revised: 06/05/2019] [Accepted: 06/28/2019] [Indexed: 11/07/2022]
Abstract
Epidemiologic research often aims to estimate the association between a binary exposure and a binary outcome, while adjusting for a set of covariates (eg, confounders). When data are clustered, as in, for instance, matched case-control studies and co-twin-control studies, it is common to use conditional logistic regression. In this model, all cluster-constant covariates are absorbed into a cluster-specific intercept, whereas cluster-varying covariates are adjusted for by explicitly adding these as explanatory variables to the model. In this paper, we propose a doubly robust estimator of the exposure-outcome odds ratio in conditional logistic regression models. This estimator protects against bias in the odds ratio estimator due to misspecification of the part of the model that contains the cluster-varying covariates. The doubly robust estimator uses two conditional logistic regression models for the odds ratio, one prospective and one retrospective, and is consistent for the exposure-outcome odds ratio if at least one of these models is correctly specified, not necessarily both. We demonstrate the properties of the proposed method by simulations and by re-analyzing a publicly available dataset from a matched case-control study on induced abortion and infertility.
Affiliation(s)
- Johan Zetterqvist
  - Institute for Evaluation of Labour Market and Education Policy, Uppsala, Sweden
- Karel Vermeulen
  - Department of Data Analysis and Mathematical Modelling, Faculty of Bioscience Engineering, Ghent University, Gent, Belgium
- Stijn Vansteelandt
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
  - Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Arvid Sjölander
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
11
Abstract
As individuals may respond differently to treatment, estimating subgroup effects is important to understand the characteristics of individuals who may benefit. Factors that define subgroups may be correlated, complicating evaluation of subgroup effects, especially in observational studies requiring control of confounding variables. We address this problem when propensity score methods are used for confounding control. A common practice is to evaluate candidate subgroup identifiers one at a time without adjusting for other candidate identifiers. We show that this practice can be misleading if the treatment effect modification attributed to a candidate identifier is in truth due to the effect of other correlated true effect modifiers. Whereas jointly analyzing multiple identifiers provides estimates of the desired subgroup effects adjusted for the effects of the other identifiers, it requires the propensity scores to adequately reflect the underlying treatment selection processes and balance the covariates within each subgroup of interest. Satisfying the requirement in practice is hard since the number of strata may increase quickly, while the per stratum sample size may decrease dramatically. A practically helpful approach is utilizing the whole cohort for the propensity score estimation with modeling of interaction terms to reflect the potentially different treatment selection processes across strata. We empirically examine the performance of the whole cohort approach by itself and with subjecting the interaction terms to variable selection. Our results using both simulations and real data analysis suggest that the whole cohort approach should explore inclusion of high-order interactions in the propensity score model to ensure adequate covariate balance across strata, and that variable selection is of limited utility.
Affiliation(s)
- Shan-Yu Liu
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, USA
- Chunyan Liu
  - Division of Biostatistics and Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- Eddie Nehus
  - Division of Nephrology and Hypertension, Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- Maurizio Macaluso
  - Division of Biostatistics and Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- Bo Lu
  - Division of Biostatistics, The Ohio State University, Columbus, USA
- Mi-Ok Kim
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, USA
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, USA
12
Ridgeway G, Nørgaard M, Rasmussen TB, Finkle WD, Pedersen L, Bøtker HE, Sørensen HT. Benchmarking Danish hospitals on mortality and readmission rates after cardiovascular admission. Clin Epidemiol 2019; 11:67-80. [PMID: 30655706 PMCID: PMC6324920 DOI: 10.2147/clep.s189263] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022] Open
Abstract
Objective The aim of this study was to examine hospital performance measures that account more comprehensively for unique mixes of patients' characteristics. Design Nationwide cohort registry-based study within a population-based health care system. Participants In this study, 331,513 patients discharged with a primary cardiovascular diagnosis from 1 of 26 Danish hospitals during 2011-2015 were included. Data covering all Danish hospitals were drawn from the Danish National Patient Registry and the Danish National Health Service Prescription Database. Main outcome measures Thirty-day post-admission mortality rates, 30-day post-discharge readmission rates, and the associated numbers needed to harm were measured. Methods For each index hospital, we used a non-parametric logistic regression model to compute propensity scores. Propensity score weighted patients treated at other hospitals collectively resembled patients treated at the index hospital in terms of age, sex, primary discharge diagnosis, diagnosis history, medications, previous cardiac procedures, and comorbidities. Outcomes for the weighted patients treated at other hospitals formed benchmarks for the index hospital. Doubly robust regression formally tested whether the outcomes of patients at the index hospital differed from the outcomes of the patients used to form the benchmarks. For each index hospital, we computed the false discovery rate, ie, the probability of being incorrect if we claimed the hospital differed from its benchmark. Results Five hospitals exceeded their benchmark for 30-day mortality rates, with the number needed to harm ranging between 55 and 137. Seven hospitals exceeded their benchmark for readmission, with the number needed to harm ranging from 22 to 71. Our benchmarking approach flagged fewer hospitals as outliers compared with conventional regression methods. Conclusion Conventional methods flag more hospitals as outliers than our benchmarking approach. Our benchmarking approach accounts more thoroughly for differences in hospitals' patient case mix, reducing the risk of false-positive selection of suspected outliers. A more comprehensive system of hospital performance measurement could be based on this approach.
Affiliation(s)
- Greg Ridgeway
  - Department of Criminology, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Statistics, University of Pennsylvania, Philadelphia, PA, USA
  - Consolidated Research, Inc., Los Angeles, CA, USA
- Mette Nørgaard
  - Department of Clinical Epidemiology, Institute of Clinical Medicine, Aarhus University Hospital, Aarhus, Denmark
- Thomas Bøjer Rasmussen
  - Department of Clinical Epidemiology, Institute of Clinical Medicine, Aarhus University Hospital, Aarhus, Denmark
- Lars Pedersen
  - Department of Clinical Epidemiology, Institute of Clinical Medicine, Aarhus University Hospital, Aarhus, Denmark
- Hans Erik Bøtker
  - Department of Cardiology, Aarhus University Hospital, Aarhus, Denmark
- Henrik Toft Sørensen
  - Department of Clinical Epidemiology, Institute of Clinical Medicine, Aarhus University Hospital, Aarhus, Denmark
13
Abstract
Inverse probability weighting can be used to estimate the average treatment effect in propensity score analysis. When there is lack of overlap in the propensity score distributions between the treatment groups under comparison, some weights may be excessively large, causing numerical instability and bias in point and variance estimation. We study a class of modified inverse probability weighting estimators that can be used to avoid this problem. These weights cause the estimand to deviate from the average treatment effect. We provide some justification for this deviation from the perspective of treatment effect discovery. We show that when lack of overlap occurs, the modified weights can achieve substantial gains in statistical power compared with inverse probability weighting and other propensity score methods. We develop analytical variance estimates that properly adjust for the sampling variability of the estimated propensity scores, and augment the modified inverse probability weighting estimator with outcome models for improved efficiency, a property that resembles double robustness. Results from extensive simulations and a real data application support our conclusions. The proposed methodology is implemented in R package PSW.
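The instability described in this abstract can be seen by comparing standard inverse probability weights with a bounded alternative. The abstract does not say which modified weights the authors study, so the sketch below assumes overlap weights (1 - e for treated units, e for controls) purely as one well-known example of a modified weight; it is written in Python and does not use the PSW R package. All function names and data are hypothetical.

```python
def ipw_weights(a, e_hat):
    """Standard inverse probability weights: 1/e for treated, 1/(1 - e) for controls."""
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e) for t, e in zip(a, e_hat)]


def overlap_weights(a, e_hat):
    """Overlap weights (assumed example of a modified weight): bounded in [0, 1]."""
    return [1.0 - e if t == 1 else e for t, e in zip(a, e_hat)]


def weighted_difference(y, a, w):
    """Difference of weighted outcome means, treated minus control."""
    def arm_mean(arm):
        num = sum(wi * yi for yi, ai, wi in zip(y, a, w) if ai == arm)
        den = sum(wi for ai, wi in zip(a, w) if ai == arm)
        return num / den
    return arm_mean(1) - arm_mean(0)


# Hypothetical data: the first treated unit has a propensity score near zero.
a = [1, 1, 0, 0]
e_hat = [0.01, 0.5, 0.5, 0.4]
y = [10.0, 2.0, 1.0, 1.0]

w_ipw = ipw_weights(a, e_hat)          # first weight is about 100
w_overlap = overlap_weights(a, e_hat)  # every weight stays at or below 1

effect_ipw = weighted_difference(y, a, w_ipw)
effect_overlap = weighted_difference(y, a, w_overlap)
```

Under inverse probability weighting the single unit with a propensity score of 0.01 dominates the treated-arm mean. The bounded weights limit its influence, at the cost noted in the abstract: the quantity being estimated is no longer the average treatment effect for the whole population.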
Affiliation(s)
- Huzhang Mao
  - Department of Biostatistics, University of Texas School of Public Health, Houston, TX, USA
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Liang Li
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Tom Greene
  - Department of Population Health Sciences, University of Utah, Salt Lake City, UT, USA
14
Nasseh K, Vujicic M, Glick M. The Relationship between Periodontal Interventions and Healthcare Costs and Utilization. Evidence from an Integrated Dental, Medical, and Pharmacy Commercial Claims Database. Health Econ 2017; 26:519-527. [PMID: 26799518 PMCID: PMC5347922 DOI: 10.1002/hec.3316] [Citation(s) in RCA: 72] [Impact Index Per Article: 10.3] [Received: 04/28/2014] [Revised: 08/03/2015] [Accepted: 12/10/2015] [Indexed: 05/03/2023]
Abstract
Periodontal disease has been linked to poor glycemic control among individuals with type 2 diabetes. Using integrated dental, medical, and pharmacy commercial claims from Truven MarketScan® Research Databases, we implement inverse probability weighting and doubly robust methods to estimate a relationship between a periodontal intervention and healthcare costs and utilization. Among individuals newly diagnosed with type 2 diabetes, we find that a periodontal intervention is associated with lower total healthcare costs (-$1799), lower total medical costs excluding pharmacy costs (-$1577), and lower total type 2 diabetes-related healthcare costs (-$408). © 2016 The Authors. Health Economics Published by John Wiley & Sons Ltd.
Affiliation(s)
- Kamyar Nasseh
  - American Dental Association, Health Policy Institute, Chicago, IL, USA
- Marko Vujicic
  - American Dental Association, Health Policy Institute, Chicago, IL, USA
- Michael Glick
  - University of Buffalo (The State University of New York), Buffalo, NY, USA
15
Li J, Handorf E, Bekelman J, Mitra N. Propensity score and doubly robust methods for estimating the effect of treatment on censored cost. Stat Med 2015; 35:1985-99. [PMID: 26678242 DOI: 10.1002/sim.6842] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/21/2015] [Revised: 10/19/2015] [Accepted: 11/17/2015] [Indexed: 11/07/2022]
Abstract
The estimation of treatment effects on medical costs is complicated by the need to account for informative censoring, skewness, and the effects of confounders. Because medical costs are often collected from observational claims data, we investigate propensity score (PS) methods such as covariate adjustment, stratification, and inverse probability weighting taking into account informative censoring of the cost outcome. We compare these more commonly used methods with doubly robust (DR) estimation. We then use a machine learning approach called super learner (SL) to choose among conventional cost models to estimate regression parameters in the DR approach and to choose among various model specifications for PS estimation. Our simulation studies show that when the PS model is correctly specified, weighting and DR perform well. When the PS model is misspecified, the combined approach of DR with SL can still provide unbiased estimates. SL is especially useful when the underlying cost distribution comes from a mixture of different distributions or when the true PS model is unknown. We apply these approaches to a cost analysis of two bladder cancer treatments, cystectomy versus bladder preservation therapy, using SEER-Medicare data. Copyright © 2015 John Wiley & Sons, Ltd.
Affiliation(s)
- Jiaqi Li
  - Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, 19104, U.S.A
- Elizabeth Handorf
  - Biostatistics and Bioinformatics Facility, Temple University Health System Fox Chase Cancer Center, Philadelphia, PA, 19111, U.S.A
- Justin Bekelman
  - Department of Radiation Oncology, University of Pennsylvania, Philadelphia, PA, 19104, U.S.A
- Nandita Mitra
  - Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, 19104, U.S.A
16
Brinkley J. A doubly robust estimator for the attributable benefit of a treatment regime. Stat Med 2014; 33:5057-73. [PMID: 25382146 DOI: 10.1002/sim.6312] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/06/2013] [Revised: 09/02/2014] [Accepted: 09/03/2014] [Indexed: 11/08/2022]
Abstract
The identification and study of treatment regimes (algorithms or policies for dictating treatments to patients) are a growing area of study in the statistical sciences. Many methods have been put forth to identify the 'best' or optimal treatment regime from observed data. Once the optimal treatment regime is identified, a secondary question of interest is to determine the public health impact of that health policy. Simply put, what is the benefit that can be attributed to using such a regime in practice? The attributable benefit of a treatment regime is a measure of the reduction in poor outcomes that would have been observed had the regime of interest been utilized. Methods for identifying the optimal treatment regime can use statistical modeling techniques which are susceptible to model misspecification in the identification of both the optimal treatment regime and its attributable benefit. Using notions from causal inference and building upon previous work, this paper identifies an estimator for attributable benefit that offers a second layer of protection in cases where an outcome regression model may be misspecified. The estimator is dubbed doubly robust in that it is unbiased for the true benefit if either a model for the outcome or a propensity model for treatment is correctly specified. Large sample properties are explored, and two sets of confidence intervals proposed. Simulation studies compare the proposed estimator with previous work, with a focus on model misspecification. The estimator is applied to real data to examine its utility in practice.
Affiliation(s)
- Jason Brinkley
  - John Wiley & Sons, Ltd., The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, U.K