1
Liu J, Xi D. Covariate adjustment and estimation of difference in proportions in randomized clinical trials. Pharm Stat 2024. [PMID: 38763917 DOI: 10.1002/pst.2397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 04/08/2024] [Accepted: 05/03/2024] [Indexed: 05/21/2024]
Abstract
Difference in proportions is frequently used to measure treatment effect for binary outcomes in randomized clinical trials. The estimation of difference in proportions can be assisted by adjusting for prognostic baseline covariates to enhance precision and bolster statistical power. Standardization or g-computation is a widely used method for covariate adjustment in estimating unconditional difference in proportions, because of its robustness to model misspecification. Various inference methods have been proposed to quantify the uncertainty and confidence intervals based on large-sample theories. However, their performances under small sample sizes and model misspecification have not been comprehensively evaluated. We propose an alternative approach to estimate the unconditional variance of the standardization estimator based on the robust sandwich estimator to further enhance the finite sample performance. Extensive simulations are provided to demonstrate the performances of the proposed method, spanning a wide range of sample sizes, randomization ratios, and model specification. We apply the proposed method in a real data example to illustrate the practical utility.
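As an illustration of the point estimator this abstract builds on, the sketch below computes a standardization (g-computation) estimate of the unconditional difference in proportions from a logistic working model. It is a minimal sketch on simulated data, not the authors' implementation: the function names `fit_logistic` and `gcomp_risk_difference`, the data-generating values, and the hand-rolled Newton-Raphson fit are all assumptions made for the example, and the sandwich-based variance estimator proposed in the paper is not reproduced here.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X must already contain an intercept column. Hypothetical helper,
    written out so the example needs only NumPy.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def gcomp_risk_difference(y, a, x):
    """Standardization estimate of P(Y=1 | A=1) - P(Y=1 | A=0).

    Fits a logistic working model for Y given treatment A and covariates x,
    predicts every participant's outcome under A=1 and under A=0, and
    averages the two sets of predictions over the whole sample.
    """
    n = len(y)
    ones = np.ones(n)
    beta = fit_logistic(np.column_stack([ones, a, x]), y)
    p1 = expit(np.column_stack([ones, ones, x]) @ beta)          # everyone treated
    p0 = expit(np.column_stack([ones, np.zeros(n), x]) @ beta)   # everyone control
    return p1.mean() - p0.mean()


# Simulated 1:1 randomized trial with one prognostic covariate (assumed values).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
a = rng.binomial(1, 0.5, size=n)
y = rng.binomial(1, expit(-0.5 + 0.8 * a + 1.0 * x))

rd = gcomp_risk_difference(y, a, x)
print(f"g-computation risk difference: {rd:.3f}")
```

With no covariates in the working model, the same function reduces to the raw difference in arm-specific proportions, which gives a quick sanity check on the implementation.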
Affiliation(s)
- Jialuo Liu
  - Department of Biostatistics, Gilead Sciences, Foster City, California, USA
- Dong Xi
  - Department of Biostatistics, Gilead Sciences, Foster City, California, USA
2
Liu S, Yang S, Zhang Y, Liu G(F. Multiply robust estimators in longitudinal studies with missing data under control-based imputation. Biometrics 2024; 80:ujad036. [PMID: 38393335 PMCID: PMC10885818 DOI: 10.1093/biomtc/ujad036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 12/05/2023] [Accepted: 12/19/2023] [Indexed: 02/25/2024]
Abstract
Longitudinal studies are often subject to missing data. The recent guidance from regulatory agencies, such as the ICH E9(R1) addendum, addresses the importance of defining a treatment effect estimand with the consideration of intercurrent events. Jump-to-reference (J2R) is one classical control-based scenario for the treatment effect evaluation, where the participants in the treatment group after intercurrent events are assumed to have the same disease progress as those with identical covariates in the control group. We establish new estimators to assess the average treatment effect based on a proposed potential outcomes framework under J2R. Various identification formulas are constructed, motivating estimators that rely on different parts of the observed data distribution. Moreover, we obtain a novel estimator inspired by the efficient influence function, with multiple robustness in the sense that it achieves $n^{1/2}$-consistency if any pairs of multiple nuisance functions are correctly specified, or if the nuisance functions converge at a rate not slower than $n^{-1/4}$ when using flexible modeling approaches. The finite-sample performance of the proposed estimators is validated in simulation studies and an antidepressant clinical trial.
Affiliation(s)
- Siyi Liu
  - Department of Statistics, North Carolina State University, Raleigh, NC 27607, United States
- Shu Yang
  - Department of Statistics, North Carolina State University, Raleigh, NC 27607, United States
- Yilong Zhang
  - Merck & Co., Inc., Kenilworth, NJ 07033, United States
3
Souli Y, Trudel X, Diop A, Brisson C, Talbot D. Longitudinal plasmode algorithms to evaluate statistical methods in realistic scenarios: an illustration applied to occupational epidemiology. BMC Med Res Methodol 2023; 23:242. [PMID: 37853309 PMCID: PMC10585912 DOI: 10.1186/s12874-023-02062-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Accepted: 10/09/2023] [Indexed: 10/20/2023]
Abstract
INTRODUCTION: Plasmode simulations are a type of simulation that uses real data to determine the synthetic data-generating equations. Such simulations thus allow evaluating statistical methods under realistic conditions. As far as we know, no plasmode algorithm has been proposed for simulating longitudinal data. In this paper, we propose a longitudinal plasmode framework to generate realistic data with both a time-varying exposure and time-varying covariates. This work was motivated by the objective of comparing different methods for estimating the causal effect of a cumulative exposure to psychosocial stressors at work over time.
METHODS: We developed two longitudinal plasmode algorithms: a parametric and a nonparametric algorithm. Data from the PROspective Québec (PROQ) Study on Work and Health were used as an input to generate data with the proposed plasmode algorithms. We evaluated the performance of multiple estimators of the parameters of marginal structural models (MSMs): inverse probability of treatment weighting, g-computation and targeted maximum likelihood estimation. These estimators were also compared to standard regression approaches with either adjustment for baseline covariates only or with adjustment for both baseline and time-varying covariates.
RESULTS: Standard regression methods were prone to yielding biased estimates with confidence intervals having coverage probability lower than their nominal level. The bias was much lower and coverage of confidence intervals was much closer to the nominal level when considering MSMs. Among MSM estimators, g-computation overall produced the best results relative to bias, root mean squared error and coverage of confidence intervals. No method produced unbiased estimates with adequate coverage for all parameters in the more realistic nonparametric plasmode simulation.
CONCLUSION: The proposed longitudinal plasmode algorithms can be important methodological tools for evaluating and comparing analytical methods in realistic simulation scenarios. To facilitate the use of these algorithms, we provide R functions on GitHub. We also recommend using MSMs when estimating the effect of cumulative exposure to psychosocial stressors at work.
Affiliation(s)
- Youssra Souli
  - Institute for Stochastics, Johannes Kepler University, Linz, Austria
- Xavier Trudel
  - Université Laval, Département de médecine sociale et préventive, Québec, Canada
  - Centre de recherche du CHU de Québec - Université Laval, Axe santé des populations et pratiques optimales en santé, Québec, Canada
- Awa Diop
  - Université Laval, Département de médecine sociale et préventive, Québec, Canada
  - Centre de recherche du CHU de Québec - Université Laval, Axe santé des populations et pratiques optimales en santé, Québec, Canada
- Chantal Brisson
  - Université Laval, Département de médecine sociale et préventive, Québec, Canada
  - Centre de recherche du CHU de Québec - Université Laval, Axe santé des populations et pratiques optimales en santé, Québec, Canada
- Denis Talbot
  - Université Laval, Département de médecine sociale et préventive, Québec, Canada
  - Centre de recherche du CHU de Québec - Université Laval, Axe santé des populations et pratiques optimales en santé, Québec, Canada
4
Naimi AI, Mishler AE, Kennedy EH. Challenges in Obtaining Valid Causal Effect Estimates with Machine Learning Algorithms. Am J Epidemiol 2023; 192:kwab201. [PMID: 34268558 DOI: 10.1093/aje/kwab201] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 05/04/2020] [Revised: 01/06/2021] [Accepted: 01/08/2021] [Indexed: 11/14/2022]
Abstract
Unlike parametric regression, machine learning (ML) methods do not generally require precise knowledge of the true data generating mechanisms. As such, numerous authors have advocated for ML methods to estimate causal effects. Unfortunately, ML algorithms can perform worse than parametric regression. We demonstrate the performance of ML-based single- and double-robust estimators. We use 100 Monte Carlo samples with sample sizes of 200, 1200, and 5000 to investigate bias and confidence interval coverage under several scenarios. In a simple confounding scenario, confounders were related to the treatment and the outcome via parametric models. In a complex confounding scenario, the simple confounders were transformed to induce complicated nonlinear relationships. In the simple scenario, when ML algorithms were used, double-robust estimators were superior to single-robust estimators. In the complex scenario, single-robust estimators with ML algorithms were at least as biased as estimators using misspecified parametric models. Double-robust estimators were less biased, but coverage was well below nominal. The use of sample splitting, inclusion of confounder interactions, reliance on a richly specified ML algorithm, and use of doubly robust estimators was the only explored approach that yielded negligible bias and nominal coverage. Our results suggest that ML-based singly robust methods should be avoided.
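For readers unfamiliar with the terminology, a doubly robust estimator combines an outcome regression with a propensity score model. The sketch below shows one common form, augmented inverse probability weighting (AIPW), with simple parametric nuisance models on simulated data. It is illustrative only: it does not use the machine learning algorithms, sample splitting, or simulation design evaluated in the paper, and the function names and data-generating values are assumptions made for the example.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson (hypothetical helper).

    X must already contain an intercept column.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        step = np.linalg.solve(X.T @ (X * (p * (1.0 - p))[:, None]),
                               X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def aipw_ate(y, a, X):
    """AIPW estimate of the average treatment effect and its standard error.

    Nuisance models (both assumptions of this sketch):
      - propensity score g(X) = P(A=1 | X): logistic regression
      - outcome regressions m_a(X) = E[Y | A=a, X]: OLS fitted within each arm
    """
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    g = expit(Xd @ fit_logistic(Xd, a))
    b1 = np.linalg.lstsq(Xd[a == 1], y[a == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(Xd[a == 0], y[a == 0], rcond=None)[0]
    m1, m0 = Xd @ b1, Xd @ b0
    # Outcome-model contrast plus inverse-probability-weighted residual corrections.
    phi = m1 - m0 + a * (y - m1) / g - (1 - a) * (y - m0) / (1 - g)
    return phi.mean(), phi.std(ddof=1) / np.sqrt(n)


# Simulated observational data with two confounders; the true effect is 1.0.
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
a = rng.binomial(1, expit(0.5 * X[:, 0] - 0.5 * X[:, 1]))
y = 1.0 * a + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

est, se = aipw_ate(y, a, X)
print(f"AIPW estimate: {est:.3f} (SE {se:.3f})")
```

Because both nuisance models are correctly specified in this toy setup, it corresponds to the favourable end of the scenarios the abstract describes, not to the complex confounding case.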
Affiliation(s)
- Alan E Mishler
  - Department of Statistics & Data Science, Carnegie Mellon University
- Edward H Kennedy
  - Department of Statistics & Data Science, Carnegie Mellon University
5
Ye T, Bannick M, Yi Y, Shao J. Robust Variance Estimation for Covariate-Adjusted Unconditional Treatment Effect in Randomized Clinical Trials with Binary Outcomes. Statistical Theory and Related Fields 2023; 7:159-163. [PMID: 37997606 PMCID: PMC10665030 DOI: 10.1080/24754269.2023.2205802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 03/28/2023] [Accepted: 04/18/2023] [Indexed: 11/25/2023]
Abstract
To improve precision of estimation and power of testing hypothesis for an unconditional treatment effect in randomized clinical trials with binary outcomes, researchers and regulatory agencies recommend using g-computation as a reliable method of covariate adjustment. However, the practical application of g-computation is hindered by the lack of an explicit robust variance formula that can be used for different unconditional treatment effects of interest. To fill this gap, we provide explicit and robust variance estimators for g-computation estimators and demonstrate through simulations that the variance estimators can be reliably applied in practice.
Affiliation(s)
- Ting Ye
  - Department of Biostatistics, University of Washington, Seattle, Washington 98195, U.S.A
- Marlena Bannick
  - Department of Biostatistics, University of Washington, Seattle, Washington 98195, U.S.A
- Yanyao Yi
  - Global Statistical Sciences, Eli Lilly and Company, Indianapolis, Indiana 46285, U.S.A
- Jun Shao
  - School of Statistics, East China Normal University, Shanghai 200241, China
  - Department of Statistics, University of Wisconsin, Madison, Wisconsin 53706, U.S.A
6
Das M, Kennedy EH, Jewell NP. Doubly robust capture-recapture methods for estimating population size. J Am Stat Assoc 2023. [DOI: 10.1080/01621459.2023.2187814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]
Affiliation(s)
- Manjari Das
  - Department of Statistics & Data Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh PA 15213, USA
- Edward H. Kennedy
  - Department of Statistics & Data Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh PA 15213, USA
- Nicholas P. Jewell
  - Department of Medical Statistics, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
  - Division of Biostatistics and Epidemiology, University of California, 2121 Berkeley Way, Berkeley CA 94720, USA
7
Hejazi NS, Boileau P, van der Laan MJ, Hubbard AE. A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology. Stat Methods Med Res 2023; 32:539-554. [PMID: 36573044 PMCID: PMC11078029 DOI: 10.1177/09622802221146313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]
Abstract
The widespread availability of high-dimensional biological data has made the simultaneous screening of many biological characteristics a central problem in computational and high-dimensional biology. As the dimensionality of datasets continues to grow, so too does the complexity of identifying biomarkers linked to exposure patterns. The statistical analysis of such data often relies upon parametric modeling assumptions motivated by convenience, inviting opportunities for model misspecification. While estimation frameworks incorporating flexible, data adaptive regression strategies can mitigate this, their standard variance estimators are often unstable in high-dimensional settings, resulting in inflated Type-I error even after standard multiple testing corrections. We adapt a shrinkage approach compatible with parametric modeling strategies to semiparametric variance estimators of a family of efficient, asymptotically linear estimators of causal effects, defined by counterfactual exposure contrasts. Augmenting the inferential stability of these estimators in high-dimensional settings yields a data adaptive approach for robustly uncovering stable causal associations, even when sample sizes are limited. Our generalized variance estimator is evaluated against appropriate alternatives in numerical experiments, and an open source R/Bioconductor package, biotmle, is introduced. The proposal is demonstrated in an analysis of high-dimensional DNA methylation data from an observational study on the epigenetic effects of tobacco smoking.
Affiliation(s)
- Nima S Hejazi
  - Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Philippe Boileau
  - Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA
  - Center for Computational Biology, University of California, Berkeley, CA, USA
- Mark J van der Laan
  - Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA
  - Center for Computational Biology, University of California, Berkeley, CA, USA
  - Department of Statistics, University of California, Berkeley, CA, USA
- Alan E Hubbard
  - Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA
  - Center for Computational Biology, University of California, Berkeley, CA, USA
8
Moosavi N, Häggström J, de Luna X. The Costs and Benefits of Uniformly Valid Causal Inference with High-Dimensional Nuisance Parameters. Stat Sci 2023. [DOI: 10.1214/21-sts843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Affiliation(s)
- Niloofar Moosavi
  - Niloofar Moosavi is Ph.D. Student, Department of Statistics, USBE, Umeå University, 901 87, Umeå, Sweden
- Jenny Häggström
  - Jenny Häggström is Associate Professor, Department of Statistics, USBE, Umeå University, 901 87, Umeå, Sweden
- Xavier de Luna
  - Xavier de Luna is Professor, Department of Statistics, USBE, Umeå University, 901 87, Umeå, Sweden
9
Ogburn EL, Sofrygin O, Díaz I, van der Laan MJ. Causal Inference for Social Network Data. J Am Stat Assoc 2022; 119:597-611. [PMID: 38800714 PMCID: PMC11114213 DOI: 10.1080/01621459.2022.2131557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2020] [Accepted: 09/26/2022] [Indexed: 10/17/2022]
Abstract
We describe semiparametric estimation and inference for causal effects using observational data from a single social network. Our asymptotic results are the first to allow for dependence of each observation on a growing number of other units as sample size increases. In addition, while previous methods have implicitly permitted only one of two possible sources of dependence among social network observations, we allow for both dependence due to transmission of information across network ties and for dependence due to latent similarities among nodes sharing ties. We propose new causal effects that are specifically of interest in social network settings, such as interventions on network ties and network structure. We use our methods to reanalyze an influential and controversial study that estimated causal peer effects of obesity using social network data from the Framingham Heart Study; after accounting for network structure we find no evidence for causal peer effects.
Affiliation(s)
- Elizabeth L Ogburn
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Oleg Sofrygin
  - Kaiser Permanente Division of Research, 2000 Broadway, Oakland, CA, 94612, USA
- Iván Díaz
  - Division of Biostatistics and Epidemiology, Weill Cornell Medicine, New York, NY, USA
- Mark J van der Laan
  - Department of Biostatistics, University of California Berkeley, 2121 Berkeley Way, Berkeley, CA, 94720, USA
10
Lee D, Yang S, Wang X. Doubly robust estimators for generalizing treatment effects on survival outcomes from randomized controlled trials to a target population. Journal of Causal Inference 2022; 10:415-440. [PMID: 37637433 PMCID: PMC10457100 DOI: 10.1515/jci-2022-0004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/29/2023]
Abstract
In the presence of heterogeneity between the randomized controlled trial (RCT) participants and the target population, evaluating the treatment effect solely based on the RCT often leads to biased quantification of the real-world treatment effect. To address the problem of lack of generalizability for the treatment effect estimated by the RCT sample, we leverage observational studies with large samples that are representative of the target population. This article concerns evaluating treatment effects on survival outcomes for a target population and considers a broad class of estimands that are functionals of treatment-specific survival functions, including differences in survival probability and restricted mean survival times. Motivated by two intuitive but distinct approaches, i.e., imputation based on survival outcome regression and weighting based on inverse probability of sampling, censoring, and treatment assignment, we propose a semiparametric estimator through the guidance of the efficient influence function. The proposed estimator is doubly robust in the sense that it is consistent for the target population estimands if either the survival model or the weighting model is correctly specified and is locally efficient when both are correct. In addition, as an alternative to parametric estimation, we employ the nonparametric method of sieves for flexible and robust estimation of the nuisance functions and show that the resulting estimator retains the root-n consistency and efficiency, the so-called rate-double robustness. Simulation studies confirm the theoretical properties of the proposed estimator and show that it outperforms competitors. We apply the proposed method to estimate the effect of adjuvant chemotherapy on survival in patients with early-stage resected non-small cell lung cancer.
Affiliation(s)
- Dasom Lee
  - Department of Statistics, North Carolina State University, Raleigh, NC 27695, United States
- Shu Yang
  - Department of Statistics, North Carolina State University, Raleigh, NC 27695, United States
- Xiaofei Wang
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, United States
11
Zivich PN, Hudgens MG, Brookhart MA, Moody J, Weber DJ, Aiello AE. Targeted maximum likelihood estimation of causal effects with interference: A simulation study. Stat Med 2022; 41:4554-4577. [PMID: 35852017 PMCID: PMC9489667 DOI: 10.1002/sim.9525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2021] [Revised: 06/20/2022] [Accepted: 06/28/2022] [Indexed: 11/08/2022]
Abstract
Interference, the dependency of an individual's potential outcome on the exposure of other individuals, is a common occurrence in medicine and public health. Recently, targeted maximum likelihood estimation (TMLE) has been extended to settings of interference, including in the context of estimation of the mean of an outcome under a specified distribution of exposure, referred to as a policy. This paper summarizes how TMLE for independent data is extended to general interference (network-TMLE). An extensive simulation study is presented of network-TMLE, consisting of four data generating mechanisms (unit-treatment effect only, spillover effects only, unit-treatment and spillover effects, infection transmission) in networks of varying structures. Simulations show that network-TMLE performs well across scenarios with interference, but issues manifest when policies are not well-supported by the observed data, potentially leading to poor confidence interval coverage. Guidance for practical application, freely available software, and areas of future work are provided.
Affiliation(s)
- Paul N Zivich
  - Department of Epidemiology, Gillings School of Global Public Health, UNC Chapel Hill, Chapel Hill, North Carolina, USA
  - Carolina Population Center, UNC Chapel Hill, Chapel Hill, North Carolina, USA
- Michael G Hudgens
  - Department of Biostatistics, Gillings School of Global Public Health, UNC Chapel Hill, Chapel Hill, North Carolina, USA
- Maurice A Brookhart
  - NoviSci, Durham, North Carolina, USA
  - Department of Population Health Sciences, Duke University, Durham, North Carolina, USA
- James Moody
  - Department of Sociology, Duke University, Durham, North Carolina, USA
- David J Weber
  - Division of Infectious Diseases, Department of Medicine, UNC Chapel Hill, Chapel Hill, North Carolina, USA
- Allison E Aiello
  - Department of Epidemiology, Gillings School of Global Public Health, UNC Chapel Hill, Chapel Hill, North Carolina, USA
  - Carolina Population Center, UNC Chapel Hill, Chapel Hill, North Carolina, USA
12
Diop SA, Duchesne T, Cumming SG, Diop A, Talbot D. Confounding adjustment methods for multi-level treatment comparisons under lack of positivity and unknown model specification. J Appl Stat 2022; 49:2570-2592. [PMID: 35757044 PMCID: PMC9225669 DOI: 10.1080/02664763.2021.1911966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
Imbalances in covariates between treatment groups are frequent in observational studies and can lead to biased comparisons. Various adjustment methods can be employed to correct these biases in the context of multi-level treatments (> 2). Analytical challenges, such as positivity violations and incorrect model specification due to unknown functional relationships between covariates and treatment or outcome, may affect their ability to yield unbiased results. Such challenges were expected in a comparison of fire-suppression interventions for preventing fire growth. We identified the overlap weights, augmented overlap weights, bias-corrected matching and targeted maximum likelihood as methods with the best potential to address those challenges. A simple variance estimator for the overlap weight estimators that can naturally be combined with machine learning is proposed. In a simulation study, we investigated the performance of these methods as well as those of simpler alternatives. Adjustment methods that included an outcome modeling component performed better than those that focused on the treatment mechanism in our simulations. Additionally, machine learning implementation was observed to efficiently compensate for the unknown model specification for the former methods, but not the latter. Based on these results, we compared the effectiveness of fire-suppression interventions using the augmented overlap weight estimator.
Affiliation(s)
- S. Arona Diop
  - Département de mathématiques et de statistique, Université Laval, Québec, Canada
- Thierry Duchesne
  - Département de mathématiques et de statistique, Université Laval, Québec, Canada
- Steven G. Cumming
  - Département des sciences du bois et de la forêt, Université Laval, Québec, Canada
- Awa Diop
  - Département de médecine sociale et préventive, Université Laval, Québec, Canada
- Denis Talbot
  - Département de médecine sociale et préventive, Université Laval, Québec, Canada
13
Hasegawa RB, Small DS. Estimating Malaria Vaccine Efficacy in the Absence of a Gold Standard Case Definition: Mendelian Factorial Design. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2020.1863222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Raiden B. Hasegawa
  - Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA
- Dylan S. Small
  - Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA
14
Wyss R, Yanover C, El-Hay T, Bennett D, Platt RW, Zullo AR, Sari G, Wen X, Ye Y, Yuan H, Gokhale M, Patorno E, Lin KJ. Machine learning for improving high-dimensional proxy confounder adjustment in healthcare database studies: an overview of the current literature. Pharmacoepidemiol Drug Saf 2022; 31:932-943. [PMID: 35729705 PMCID: PMC9541861 DOI: 10.1002/pds.5500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Revised: 06/01/2022] [Accepted: 06/05/2022] [Indexed: 11/10/2022]
Abstract
Controlling for large numbers of variables that collectively serve as 'proxies' for unmeasured factors can often improve confounding control in pharmacoepidemiologic studies utilizing administrative healthcare databases. There is a growing body of evidence showing that data-driven machine learning algorithms for high-dimensional proxy confounder adjustment can supplement investigator-specified variables to improve confounding control compared to adjustment based on investigator-specified variables alone. Consequently, there has been a recent focus on the development of data-driven methods for high-dimensional proxy confounder adjustment. In this paper, we discuss the considerations underpinning three areas for data-driven high-dimensional proxy confounder adjustment: 1) feature generation: transforming raw data into covariates (or features) to be used for proxy adjustment; 2) covariate prioritization, selection and adjustment; and 3) diagnostic assessment. We survey current approaches and recent advancements within each area, including the most widely used approach to proxy confounder adjustment in healthcare database studies (the high-dimensional propensity score or hdPS). We also discuss limitations of the hdPS and outline recent advancements that incorporate the principles of proxy adjustment with machine learning extensions to improve performance. We further discuss challenges and avenues of future development within each area. This manuscript is endorsed by the International Society for Pharmacoepidemiology (ISPE).
Affiliation(s)
- Richard Wyss
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Tal El-Hay
  - KI Research Institute, Kfar Malal, Israel
  - IBM Research-Haifa Labs, Haifa, Israel
- Dimitri Bennett
  - Global Evidence and Outcomes, Takeda Pharmaceutical Company Ltd., Cambridge, MA, USA
- Andrew R Zullo
  - Department of Health Services, Policy, and Practice, Brown University School of Public Health and Center of Innovation in Long-Term Services and Supports, Providence Veterans Affairs Medical Center, Providence, RI, USA
- Grammati Sari
  - Real World Evidence Strategy Lead, Visible Analytics Ltd, Oxford, UK
- Xuerong Wen
  - Health Outcomes, Pharmacy Practice, College of Pharmacy, University of Rhode Island, Kingston, RI, USA
- Yizhou Ye
  - Global Epidemiology, AbbVie Inc., North Chicago, IL, USA
- Hongbo Yuan
  - Canadian Agency for Drugs and Technologies in Health, Ottawa, Canada
- Mugdha Gokhale
  - Pharmacoepidemiology, Center for Observational and Real-world Evidence, Merck, PA, USA
- Elisabetta Patorno
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Kueiyu Joshua Lin
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
15
Jiang Z, Yang S, Ding P. Multiply robust estimation of causal effects under principal ignorability. J R Stat Soc Series B Stat Methodol 2022. [DOI: 10.1111/rssb.12538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Zhichao Jiang
  - Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, Massachusetts, USA
- Shu Yang
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Peng Ding
  - University of California, Berkeley, Berkeley, California, USA
16
Smith MJ, Mansournia MA, Maringe C, Zivich PN, Cole SR, Leyrat C, Belot A, Rachet B, Luque-Fernandez MA. Introduction to computational causal inference using reproducible Stata, R, and Python code: A tutorial. Stat Med 2022; 41:407-432. [PMID: 34713468 DOI: 10.1002/sim.9234] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 03/03/2021] [Revised: 10/08/2021] [Accepted: 10/11/2021] [Indexed: 11/09/2022]
Abstract
The main purpose of many medical studies is to estimate the effects of a treatment or exposure on an outcome. However, it is not always possible to randomize the study participants to a particular treatment, therefore observational study designs may be used. There are major challenges with observational studies; one of which is confounding. Controlling for confounding is commonly performed by direct adjustment of measured confounders; although, sometimes this approach is suboptimal due to modeling assumptions and misspecification. Recent advances in the field of causal inference have dealt with confounding by building on classical standardization methods. However, these recent advances have progressed quickly with a relative paucity of computational-oriented applied tutorials contributing to some confusion in the use of these methods among applied researchers. In this tutorial, we show the computational implementation of different causal inference estimators from a historical perspective where new estimators were developed to overcome the limitations of the previous estimators (ie, nonparametric and parametric g-formula, inverse probability weighting, double-robust, and data-adaptive estimators). We illustrate the implementation of different methods using an empirical example from the Connors study based on intensive care medicine, and most importantly, we provide reproducible and commented code in Stata, R, and Python for researchers to adapt in their own observational study. The code can be accessed at https://github.com/migariane/Tutorial_Computational_Causal_Inference_Estimators.
Affiliation(s)
- Matthew J Smith
- Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
| | - Mohammad A Mansournia
- Department of Epidemiology and Biostatistics, Tehran University of Medical Sciences, Tehran, Iran
| | - Camille Maringe
- Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
| | - Paul N Zivich
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Stephen R Cole
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Clémence Leyrat
- Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
| | - Aurélien Belot
- Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
| | - Bernard Rachet
- Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
| | - Miguel A Luque-Fernandez
- Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
- Non-communicable Disease and Cancer Epidemiology Group, Instituto de Investigacion Biosanitaria de Granada (ibs.GRANADA), Andalusian School of Public Health, University of Granada, Granada, Spain
- Biomedical Network Research Centers of Epidemiology and Public Health (CIBERESP), Madrid, Spain
| |
|
17
|
Schnitzer ME, Guerra SF, Longo C, Blais L, Platt RW. A potential outcomes approach to defining and estimating gestational age-specific exposure effects during pregnancy. Stat Methods Med Res 2022; 31:300-314. [PMID: 34986058 PMCID: PMC8829732 DOI: 10.1177/09622802211065158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Many studies seek to evaluate the effects of potentially harmful pregnancy exposures during specific gestational periods. We consider an observational pregnancy cohort where pregnant individuals can initiate medication usage or become exposed to a drug at various times during their pregnancy. An important statistical challenge involves how to define and estimate exposure effects when pregnancy loss or delivery can occur over time. Without proper consideration, the results of standard analysis may be vulnerable to selection bias, immortal time-bias, and time-dependent confounding. In this study, we apply the “target trials” framework of Hernán and Robins in order to define effects based on the counterfactual approach often used in causal inference. This effect is defined relative to a hypothetical randomized trial of timed pregnancy exposures where delivery may precede and thus potentially interrupt exposure initiation. We describe specific implementations of inverse probability weighting, G-computation, and Targeted Maximum Likelihood Estimation to estimate the effects of interest. We demonstrate the performance of all estimators using simulated data and show that a standard implementation of inverse probability weighting is biased. We then apply our proposed methods to a pharmacoepidemiology study to evaluate the potentially time-dependent effect of exposure to inhaled corticosteroids on birthweight in pregnant people with mild asthma.
Affiliation(s)
- Mireille E Schnitzer
- Faculty of Pharmacy, Université de Montréal, Canada
- Department of Social and Preventive Medicine, Université de Montréal, Canada
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Canada
| | - Steve Ferreira Guerra
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Canada
| | - Cristina Longo
- Academisch Medisch Centrum Universiteit van Amsterdam, the Netherlands
| | - Lucie Blais
- Faculty of Pharmacy, Université de Montréal, Canada
- Hôpital du Sacré Coeur de Montréal, Centre intégré universitaire de santé et de services sociaux du Nord-de-l'île-de-Montréal, Canada
| | - Robert W Platt
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Canada
- Research Institute of the McGill University Health Centre, Canada
| |
|
18
|
Conzuelo Rodriguez G, Bodnar LM, Brooks MM, Wahed A, Kennedy EH, Schisterman E, Naimi AI. Performance Evaluation of Parametric and Nonparametric Methods When Assessing Effect Measure Modification. Am J Epidemiol 2022; 191:198-207. [PMID: 34409985 DOI: 10.1093/aje/kwab220] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/08/2020] [Revised: 08/13/2021] [Accepted: 08/13/2021] [Indexed: 12/20/2022]
Abstract
Effect measure modification is often evaluated using parametric models. These models, although efficient when correctly specified, make strong parametric assumptions. While nonparametric models avoid important functional form assumptions, they often require larger samples to achieve a given accuracy. We conducted a simulation study to evaluate performance tradeoffs between correctly specified parametric and nonparametric models to detect effect modification of a binary exposure by both binary and continuous modifiers. We evaluated generalized linear models and doubly robust (DR) estimators, with and without sample splitting. Continuous modifiers were modeled with cubic splines, fractional polynomials, and nonparametric DR-learner. For binary modifiers, generalized linear models showed the greatest power to detect effect modification, ranging from 0.42 to 1.00 in the worst and best scenario, respectively. Augmented inverse probability weighting had the lowest power, with an increase of 23% when using sample splitting. For continuous modifiers, the DR-learner was comparable to flexible parametric models in capturing quadratic and nonlinear monotonic functions. However, for nonlinear, nonmonotonic functions, the DR-learner had lower integrated bias than splines and fractional polynomials, with values of 141.3, 251.7, and 209.0, respectively. Our findings suggest comparable performance between nonparametric and correctly specified parametric models in evaluating effect modification.
|
19
|
Applying Machine Learning in Distributed Data Networks for Pharmacoepidemiologic and Pharmacovigilance Studies: Opportunities, Challenges, and Considerations. Drug Saf 2022; 45:493-510. [PMID: 35579813 PMCID: PMC9112258 DOI: 10.1007/s40264-022-01158-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 02/13/2022] [Indexed: 01/28/2023]
Abstract
Increasing availability of electronic health databases capturing real-world experiences with medical products has garnered much interest in their use for pharmacoepidemiologic and pharmacovigilance studies. The traditional practice of having numerous groups use single databases to accomplish similar tasks and address common questions about medical products can be made more efficient through well-coordinated multi-database studies, greatly facilitated through distributed data network (DDN) architectures. Access to larger amounts of electronic health data within DDNs has created a growing interest in using data-adaptive machine learning (ML) techniques that can automatically model complex associations in high-dimensional data with minimal human guidance. However, the siloed storage and diverse nature of the databases in DDNs create unique challenges for using ML. In this paper, we discuss opportunities, challenges, and considerations for applying ML in DDNs for pharmacoepidemiologic and pharmacovigilance studies. We first discuss major types of activities performed by DDNs and how ML may be used. Next, we discuss practical data-related factors influencing how DDNs work in practice. We then combine these discussions and jointly consider how opportunities for ML are affected by practical data-related factors for DDNs, leading to several challenges. We present different approaches for addressing these challenges and highlight efforts that real-world DDNs have taken or are currently taking to help mitigate them. Despite these challenges, the time is ripe for the emerging interest to use ML in DDNs, and the utility of these data-adaptive modeling techniques in pharmacoepidemiologic and pharmacovigilance studies will likely continue to increase in the coming years.
|
20
|
Zhong Y, Kennedy EH, Bodnar LM, Naimi AI. AIPW: An R Package for Augmented Inverse Probability-Weighted Estimation of Average Causal Effects. Am J Epidemiol 2021; 190:2690-2699. [PMID: 34268567 PMCID: PMC8796813 DOI: 10.1093/aje/kwab207] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/28/2020] [Revised: 07/09/2021] [Accepted: 07/13/2021] [Indexed: 12/26/2022]
Abstract
An increasing number of recent studies have suggested that doubly robust estimators with cross-fitting should be used when estimating causal effects with machine learning methods. However, not all existing programs that implement doubly robust estimators support machine learning methods and cross-fitting, or provide estimates on multiplicative scales. To address these needs, we developed AIPW, a software package implementing augmented inverse probability weighting (AIPW) estimation of average causal effects in R (R Foundation for Statistical Computing, Vienna, Austria). Key features of the AIPW package include cross-fitting and flexible covariate adjustment for observational studies and randomized controlled trials (RCTs). In this paper, we use a simulated RCT to illustrate implementation of the AIPW estimator. We also perform a simulation study to evaluate the performance of the AIPW package compared with other doubly robust implementations, including CausalGAM, npcausal, tmle, and tmle3. Our simulation showed that the AIPW package yields performance comparable to that of other programs. Furthermore, we also found that cross-fitting substantively decreases the bias and improves the confidence interval coverage for doubly robust estimators fitted with machine learning algorithms. Our findings suggest that the AIPW package can be a useful tool for estimating average causal effects with machine learning methods in RCTs and observational studies.
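The idea of AIPW with cross-fitting can be sketched in a few lines. The example below is not the AIPW R package and does not reproduce its interface. It is a minimal Python illustration for a simulated 1:1 randomized trial, in which the treatment probability is treated as known (0.5) and ordinary least squares stands in for the machine learning outcome models. The function names (`fit_ols`, `predict_ols`, `crossfit_aipw`) and the simulation settings are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept (stand-in for an ML learner)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def crossfit_aipw(X, A, Y, pi=0.5, k=5, seed=0):
    """Cross-fitted AIPW estimate of the average treatment effect.

    Outcome models are fitted on k-1 folds and evaluated on the held-out
    fold; pi is the (assumed known) randomization probability.
    """
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % k
    phi = np.empty(n)
    for j in range(k):
        test = folds == j
        train = ~test
        b1 = fit_ols(X[train & (A == 1)], Y[train & (A == 1)])
        b0 = fit_ols(X[train & (A == 0)], Y[train & (A == 0)])
        m1, m0 = predict_ols(b1, X[test]), predict_ols(b0, X[test])
        a, y = A[test], Y[test]
        # Outcome-model contrast plus inverse-probability-weighted residuals.
        phi[test] = m1 - m0 + a * (y - m1) / pi - (1 - a) * (y - m0) / (1 - pi)
    return phi.mean(), phi.std(ddof=1) / np.sqrt(n)

# Simulated trial with a true average treatment effect of 1.0 (assumed).
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 2))
A = rng.binomial(1, 0.5, n)
Y = 1.0 * A + X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)

est, se = crossfit_aipw(X, A, Y)
print(f"ATE estimate = {est:.3f} (SE {se:.3f})")
```

Because the randomization probability is known here, the estimate remains consistent even though the linear outcome model misses the quadratic term. In an observational study the propensity score would also have to be estimated, within the same folds.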
Affiliation(s)
| | | | | | - Ashley I Naimi
- Correspondence to Dr. Ashley I. Naimi, Department of Epidemiology, Rollins School of Public Health, Emory University, 1518 Clifton Road, Atlanta, GA 30322
| |
|
21
|
Lee Y, Kennedy EH, Mitra N. Doubly robust nonparametric instrumental variable estimators for survival outcomes. Biostatistics 2021; 24:518-537. [PMID: 34676400 DOI: 10.1093/biostatistics/kxab036] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/21/2021] [Revised: 09/15/2021] [Accepted: 09/20/2021] [Indexed: 11/12/2022]
Abstract
Instrumental variable (IV) methods make it possible to address unmeasured confounding in causal inference. However, most IV methods are only applicable to discrete or continuous outcomes, and very few are available for censored survival outcomes. In this article, we propose nonparametric estimators for the local average treatment effect on survival probabilities under both covariate-dependent and outcome-dependent censoring. We provide an efficient influence function-based estimator and a simple estimation procedure when the IV is either binary or continuous. The proposed estimators possess double-robustness properties and can easily incorporate nonparametric estimation using machine learning tools. In simulation studies, we demonstrate the flexibility and double robustness of our proposed estimators under various plausible scenarios. We apply our method to the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial for estimating the causal effect of screening on survival probabilities and investigate the causal contrasts between the two interventions under different censoring assumptions.
Affiliation(s)
- Youjin Lee
- Department of Biostatistics, Brown University, 121 S Main St, Providence, RI 02912, USA
| | - Edward H Kennedy
- Department of Statistics and Data Science, Carnegie Mellon University, 132 J Baker Hall, Pittsburgh, PA 15213, USA
| | - Nandita Mitra
- Department of Biostatistics and Epidemiology, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA 19104, USA
| |
|
22
|
Han B, Paddock SM, Burgette L. Causal inference under interference with prognostic scores for dynamic group therapy studies. Int J Biostat 2021; 18:439-453. [PMID: 34391217 PMCID: PMC9973534 DOI: 10.1515/ijb-2019-0126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2019] [Accepted: 07/20/2021] [Indexed: 01/10/2023]
Abstract
Group therapy is a common treatment modality for behavioral health conditions. Patients often enter and exit groups on an ongoing basis, leading to dynamic therapy groups. Examining the effect of high versus low session attendance on patient outcomes is a research question of interest. However, there are several challenges to identifying causal effects in this setting, including the lack of randomization, interference among patients, and the interrelatedness of patient participation. Dynamic therapy groups motivate a unique causal inference scenario, as the treatment statuses are completely defined by the patient attendance record for the therapy session, which is also the structure inducing interference. We adopt the Rubin causal model framework to define the causal effect of high versus low session attendance of group therapy at both the individual patient and peer levels. We propose a strategy to identify individual, peer, and total effects of high attendance versus low attendance on patient outcomes by the prognostic score stratification. We examine performance of our approach via simulation and apply it to data from a group cognitive behavioral therapy trial for treating depression among patients in a substance use disorders treatment setting.
Affiliation(s)
- Bing Han
- Southern California Kaiser Permanente, Pasadena, CA
| | | | | |
|
23
|
Abdollahpour I, Nedjat S, Almasi-Hashiani A, Nazemipour M, Mansournia MA, Luque-Fernandez MA. Estimating the Marginal Causal Effect and Potential Impact of Waterpipe Smoking on Risk of Multiple Sclerosis Using the Targeted Maximum Likelihood Estimation Method: A Large, Population-Based Incident Case-Control Study. Am J Epidemiol 2021; 190:1332-1340. [PMID: 33576427 DOI: 10.1093/aje/kwab036] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/21/2020] [Revised: 02/09/2021] [Accepted: 02/09/2021] [Indexed: 12/11/2022]
Abstract
There are few if any reports regarding the role of lifetime waterpipe smoking in the etiology of multiple sclerosis (MS). In a population-based incident case-control study conducted in Tehran, Iran, we investigated the association between waterpipe smoking and MS, adjusted for confounders. Cases (n = 547) were patients aged 15-50 years identified from the Iranian Multiple Sclerosis Society between 2013 and 2015. Population-based controls (n = 1,057) were persons aged 15-50 years recruited through random digit telephone dialing. A doubly robust estimation method, the targeted maximum likelihood estimator (TMLE), was used to estimate the marginal risk ratio and odds ratio for the association between waterpipe smoking and MS. The estimated risk ratio and odds ratio were both 1.70 (95% confidence interval: 1.34, 2.17). The population attributable fraction was 21.4% (95% confidence interval: 4.0, 38.8). Subject to the limitations of case-control studies in interpreting associations causally, these results suggest that waterpipe use, or strongly related but undetermined factors, increases the risk of MS. Further epidemiologic studies, including nested case-control studies, are needed to confirm these findings.
|
24
|
Affiliation(s)
- Kevin Guo
- Stanford University, Statistics, Stanford, 94305-6104 United States
| | - Guillaume Basse
- Stanford University, Statistics, Stanford, 94305-6104 United States
| |
|
25
|
Hines O, Vansteelandt S, Diaz-Ordaz K. Robust Inference for Mediated Effects in Partially Linear Models. Psychometrika 2021; 86:595-618. [PMID: 34008127 DOI: 10.1007/s11336-021-09768-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2020] [Revised: 04/17/2021] [Accepted: 04/24/2021] [Indexed: 06/12/2023]
Abstract
We consider mediated effects of an exposure, X on an outcome, Y, via a mediator, M, under no unmeasured confounding assumptions in the setting where models for the conditional expectation of the mediator and outcome are partially linear. We propose G-estimators for the direct and indirect effects and demonstrate consistent asymptotic normality for indirect effects when models for the conditional means of M, or X and Y are correctly specified, and for direct effects, when models for the conditional means of Y, or X and M are correct. This marks an improvement, in this particular setting, over previous 'triple' robust methods, which do not assume partially linear mean models. Testing of the no-mediation hypothesis is inherently problematic due to the composite nature of the test (either X has no effect on M or M no effect on Y), leading to low power when both effect sizes are small. We use generalized methods of moments (GMM) results to construct a new score testing framework, which includes as special cases the no-mediation and the no-direct-effect hypotheses. The proposed tests rely on an orthogonal estimation strategy for estimating nuisance parameters. Simulations show that the GMM-based tests perform better in terms of power and small sample performance compared with traditional tests in the partially linear setting, with drastic improvement under model misspecification. New methods are illustrated in a mediation analysis of data from the COPERS trial, a randomized trial investigating the effect of a non-pharmacological intervention of patients suffering from chronic pain. An accompanying R package implementing these methods can be found at github.com/ohines/plmed.
Affiliation(s)
- Oliver Hines
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
| | - Stijn Vansteelandt
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Karla Diaz-Ordaz
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
| |
|
26
|
Oliveira NL, Kennedy EH, Tibshirani R, Levine A, Martin E, Munro C, Ragin AB, Rubin LH, Sacktor N, Seaberg EC, Weinstein A, Becker JT. Longitudinal 5-year prediction of cognitive impairment among men with HIV disease. AIDS 2021; 35:889-898. [PMID: 33534203 PMCID: PMC8881797 DOI: 10.1097/qad.0000000000002827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
BACKGROUND: Although combination antiretroviral therapy reduced the prevalence of HIV-associated dementia, milder syndromes persist. Our goals were to predict cognitive impairment in Multicenter AIDS Cohort Study (MACS) participants 5 years ahead and, from a large pool of factors, to select those that contributed most to our predictions. DESIGN: Longitudinal, natural and treated history of HIV infection among MSM. METHODS: The MACS is a longitudinal study of the natural and treated history of HIV disease in MSM; the neuropsychological substudy aims to characterize cognitive disorders in men with HIV disease. RESULTS: We modeled on an annual basis the risk of cognitive impairment 5 years in the future. We were able to predict cognitive impairment at the individual level with high precision and to outperform default methods. We found that while a diagnosis of AIDS is a critical risk factor, HIV infection per se does not necessarily convey additional risk. Other infectious processes, most notably hepatitis B and C, are independently associated with increased risk of impairment. The relative importance of an AIDS diagnosis diminished across calendar time. CONCLUSION: Our prediction models are a powerful tool to help clinicians address dementia in its early stages for MACS participants. The strongest predictors of future cognitive impairment included the presence of clinical AIDS and hepatitis B or C infection. The fact that the pattern of predictive power differs by calendar year suggests a clinically critical change to the face of the epidemic.
Affiliation(s)
- Natalia L. Oliveira
- Department of Statistics and Data Science, Carnegie Mellon University
- Machine Learning Department, Carnegie Mellon University
| | - Edward H. Kennedy
- Department of Statistics and Data Science, Carnegie Mellon University
| | - Ryan Tibshirani
- Department of Statistics and Data Science, Carnegie Mellon University
- Machine Learning Department, Carnegie Mellon University
| | - Andrew Levine
- Department of Neurology, David Geffen School of Medicine, UCLA
| | - Eileen Martin
- Department of Psychiatry, Rush University School of Medicine
| | - Cynthia Munro
- Departments of Psychiatry, The Johns Hopkins University School of Medicine
| | - Ann B. Ragin
- Department of Radiology, Northwestern University
| | - Leah H. Rubin
- Departments of Psychiatry, The Johns Hopkins University School of Medicine
- Departments of Neurology, The Johns Hopkins University School of Medicine
| | - Ned Sacktor
- Departments of Neurology, The Johns Hopkins University School of Medicine
| | - Eric C. Seaberg
- Department of Epidemiology, Bloomberg School of Public Health, The Johns Hopkins University
| | | | - James T. Becker
- Departments of Psychiatry, University of Pittsburgh
- Departments of Neurology, University of Pittsburgh
- Departments of Psychology, University of Pittsburgh
| | | |
|
27
|
Su CL, Platt RW, Plante JF. Causal inference for recurrent event data using pseudo-observations. Biostatistics 2020; 23:189-206. [PMID: 32432686 DOI: 10.1093/biostatistics/kxaa020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/19/2019] [Revised: 04/01/2020] [Accepted: 04/02/2020] [Indexed: 11/13/2022]
Abstract
Recurrent event data are commonly encountered in observational studies where each subject may experience a particular event repeatedly over time. In this article, we aim to compare cumulative rate functions (CRFs) of two groups when treatment assignment may depend on the unbalanced distribution of confounders. Several estimators based on pseudo-observations are proposed to adjust for the confounding effects, namely inverse probability of treatment weighting estimator, regression model-based estimators, and doubly robust estimators. The proposed marginal regression estimator and doubly robust estimators based on pseudo-observations are shown to be consistent and asymptotically normal. A bootstrap approach is proposed for the variance estimation of the proposed estimators. Model diagnostic plots of residuals are presented to assess the goodness-of-fit for the proposed regression models. A family of adjusted two-sample pseudo-score tests is proposed to compare two CRFs. Simulation studies are conducted to assess finite sample performance of the proposed method. The proposed technique is demonstrated through an application to a hospital readmission data set.
Affiliation(s)
- Chien-Lin Su
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University and Centre for Clinical Epidemiology, Lady Davis Institute, Jewish General Hospital, Montréal, Québec, Canada
| | - Robert W Platt
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University and Centre for Clinical Epidemiology, Lady Davis Institute, Jewish General Hospital, Montréal, Québec, Canada
| | | |
|
28
|
Affiliation(s)
| | - Edward H. Kennedy
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA
| |
|
29
|
Ozenne BMH, Scheike TH, Stærk L, Gerds TA. On the estimation of average treatment effects with right-censored time to event outcome and competing risks. Biom J 2020; 62:751-763. [DOI: 10.1002/bimj.201800298] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/20/2018] [Revised: 10/28/2019] [Accepted: 11/04/2019] [Indexed: 12/12/2022]
Affiliation(s)
- Brice Maxime Hugues Ozenne
- Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Neurobiology Research Unit, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
| | | | - Laila Stærk
- Department of Cardiology, Copenhagen University Hospital Herlev and Gentofte, Hellerup, Denmark
| | - Thomas Alexander Gerds
- Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Danish Heart Foundation, Copenhagen, Denmark
| |
|
30
|
Wang Y, Zubizarreta JR. Minimal dispersion approximately balancing weights: asymptotic properties and practical considerations. Biometrika 2019. [DOI: 10.1093/biomet/asz050] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
Summary
Weighting methods are widely used to adjust for covariates in observational studies, sample surveys, and regression settings. In this paper, we study a class of recently proposed weighting methods, which find the weights of minimum dispersion that approximately balance the covariates. We call these weights ‘minimal weights’ and study them under a common optimization framework. Our key observation is that finding weights which achieve approximate covariate balance is equivalent to performing shrinkage estimation of the inverse propensity score. This connection leads to both theoretical and practical developments. From a theoretical standpoint, we characterize the asymptotic properties of minimal weights and show that, under standard smoothness conditions on the propensity score function, minimal weights are consistent estimates of the true inverse probability weights. In addition, we show that the resulting weighting estimator is consistent, asymptotically normal and semiparametrically efficient. From a practical standpoint, we give a finite-sample oracle inequality that bounds the loss incurred by balancing more functions of the covariates than strictly needed. This inequality shows that minimal weights implicitly bound the number of active covariate balance constraints. Finally, we provide a tuning algorithm for choosing the degree of approximate balance in minimal weights. The paper concludes with an empirical study which suggests that approximate balance is preferable to exact balance, especially when there is limited overlap in covariate distributions. Further studies show that the root mean squared error of the weighting estimator can be reduced by as much as a half with approximate balance.
Affiliation(s)
- Yixin Wang
- Department of Statistics, Columbia University, 1255 Amsterdam Ave, New York, New York 10027, U.S.A
| | - Jose R Zubizarreta
- Department of Health Care Policy, Harvard University, 180 Longwood Avenue, Boston, Massachusetts 02115, U.S.A
| |
|
31
|
Kennedy EH. Nonparametric Causal Effects Based on Incremental Propensity Score Interventions. J Am Stat Assoc 2018. [DOI: 10.1080/01621459.2017.1422737] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 10/18/2022]
Affiliation(s)
- Edward H. Kennedy
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA
| |
|