1
Dashti SG, Lee KJ, Simpson JA, White IR, Carlin JB, Moreno-Betancur M. Handling missing data when estimating causal effects with targeted maximum likelihood estimation. Am J Epidemiol 2024; 193:1019-1030. [PMID: 38400653; PMCID: PMC11228874; DOI: 10.1093/aje/kwae012] [Received: 12/18/2022] [Revised: 02/04/2024] [Accepted: 02/20/2024] [Indexed: 02/25/2024]
Abstract
Targeted maximum likelihood estimation (TMLE) is increasingly used for doubly robust causal inference, but how missing data should be handled when using TMLE with data-adaptive approaches is unclear. Based on data (1992-1998) from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate 8 missing-data methods in this context: complete-case analysis, extended TMLE incorporating an outcome-missingness model, the missing covariate missing indicator method, and 5 multiple imputation (MI) approaches using parametric or machine-learning models. We considered 6 scenarios that varied in terms of exposure/outcome generation models (presence of confounder-confounder interactions) and missingness mechanisms (whether outcome influenced missingness in other variables and presence of interaction/nonlinear terms in missingness models). Complete-case analysis and extended TMLE had small biases when outcome did not influence missingness in other variables. Parametric MI without interactions had large bias when exposure/outcome generation models included interactions. Parametric MI including interactions performed best in bias and variance reduction across all settings, except when missingness models included a nonlinear term. When choosing a method for handling missing data in the context of TMLE, researchers must consider the missingness mechanism and, for MI, compatibility with the analysis method. In many settings, a parametric MI approach that incorporates interactions and nonlinearities is expected to perform well.
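For readers less familiar with TMLE, the following is a minimal pure-Python sketch of its targeting (fluctuation) step for the average treatment effect with a binary outcome. It assumes complete data and assumes that initial outcome predictions Q(1, W), Q(0, W) and propensity scores g(W) have already been produced by some learner. It is not the authors' code; the function name `tmle_ate_update`, the Newton-Raphson solver, and the toy data are hypothetical. Most of the missing-data methods compared in the study act before this step, by deciding which rows (complete-case analysis), which indicator columns (missing indicator method), or which imputed datasets (MI) are passed in.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def tmle_ate_update(y, a, q1, q0, g, tol=1e-10, max_iter=50):
    """Hypothetical sketch of the TMLE targeting step for the ATE (binary Y).

    y, a   : observed binary outcomes and 0/1 treatments
    q1, q0 : initial predictions of P(Y=1 | A=1, W) and P(Y=1 | A=0, W)
    g      : propensity scores P(A=1 | W), assumed bounded away from 0 and 1
    Returns (epsilon, targeted ATE, updated q1, updated q0).
    """
    n = len(y)
    # Clever covariate H(A, W) evaluated at A=1, at A=0 and at the observed A.
    h1 = [1.0 / gi for gi in g]
    h0 = [-1.0 / (1.0 - gi) for gi in g]
    h = [h1[i] if a[i] == 1 else h0[i] for i in range(n)]
    offset = [logit(q1[i] if a[i] == 1 else q0[i]) for i in range(n)]

    # Logistic fluctuation logit Q*(A, W) = logit Q(A, W) + eps * H(A, W),
    # fitted by Newton-Raphson (one parameter, offset, no intercept).
    eps = 0.0
    for _ in range(max_iter):
        p = [expit(offset[i] + eps * h[i]) for i in range(n)]
        score = sum(h[i] * (y[i] - p[i]) for i in range(n))
        info = sum(h[i] ** 2 * p[i] * (1.0 - p[i]) for i in range(n))
        step = score / info
        eps += step
        if abs(step) < tol:
            break

    # Update both counterfactual predictions and plug them into the ATE.
    q1_star = [expit(logit(q1[i]) + eps * h1[i]) for i in range(n)]
    q0_star = [expit(logit(q0[i]) + eps * h0[i]) for i in range(n)]
    ate = sum(u - v for u, v in zip(q1_star, q0_star)) / n
    return eps, ate, q1_star, q0_star

# Toy example: deliberately crude initial predictions (0.6 for everyone).
y = [1, 0, 1, 1, 0, 0, 1, 0]
a = [1, 1, 1, 0, 0, 0, 1, 0]
eps, ate, q1_star, q0_star = tmle_ate_update(y, a, [0.6] * 8, [0.6] * 8, [0.5] * 8)
```

In this toy example, with a constant propensity of 0.5 and equal arm sizes, solving the score equation forces the targeted estimate to equal the difference in observed arm means (0.75 - 0.25 = 0.5). In practice the initial predictions would come from data-adaptive learners, and inference would use the estimated influence function.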
Affiliation(s)
- S Ghazaleh Dashti
- Corresponding author: S. Ghazaleh Dashti, Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Royal Children’s Hospital, 50 Flemington Road, Parkville, VIC 3052, Australia
2
Smith MJ, Phillips RV, Luque-Fernandez MA, Maringe C. Application of targeted maximum likelihood estimation in public health and epidemiological studies: a systematic review. Ann Epidemiol 2023; 86:34-48.e28. [PMID: 37343734; DOI: 10.1016/j.annepidem.2023.06.004] [Received: 03/03/2023] [Revised: 05/24/2023] [Accepted: 06/06/2023] [Indexed: 06/23/2023]
Abstract
PURPOSE The targeted maximum likelihood estimation (TMLE) statistical data analysis framework integrates machine learning, statistical theory, and statistical inference to provide a minimally biased, efficient, and robust strategy for estimation and inference of a variety of statistical and causal parameters. We describe and evaluate the epidemiological applications that have benefited from recent methodological developments. METHODS We conducted a systematic literature review in PubMed for articles that applied any form of TMLE in observational studies. We summarized the epidemiological discipline, geographical location, expertise of the authors, and TMLE methods over time. We used the Roadmap of Targeted Learning and Causal Inference to extract key methodological aspects of the publications. We showcase the contributions of these TMLE results to the literature. RESULTS Of the 89 publications included, 33% originated from the University of California at Berkeley, where the framework was first developed by Professor Mark van der Laan. By 2022, 59% of the publications originated from outside the United States, and in 2021-2022 they spanned up to seven different epidemiological disciplines. Double robustness, bias reduction, and concerns about model misspecification were the main motivations that drew researchers toward the TMLE framework. Over time, a wide variety of methodological, tutorial, and software-specific articles were cited, owing to the constant growth of methodological developments around TMLE. CONCLUSIONS The TMLE framework is clearly spreading to various epidemiological disciplines and to a growing number of geographical areas. The availability of R packages, publication of tutorial papers, and involvement of methodological experts in applied publications have contributed to an exponential increase in the number of studies that recognized the benefits of TMLE and adopted it.
Affiliation(s)
- Matthew J Smith
- Inequalities in Cancer Outcomes Network, London School of Hygiene and Tropical Medicine, London, UK
- Rachael V Phillips
- Division of Biostatistics, School of Public Health, University of California at Berkeley, Berkeley, CA
- Miguel Angel Luque-Fernandez
- Inequalities in Cancer Outcomes Network, London School of Hygiene and Tropical Medicine, London, UK; Department of Statistics and Operations Research, University of Granada, Granada, Spain
- Camille Maringe
- Inequalities in Cancer Outcomes Network, London School of Hygiene and Tropical Medicine, London, UK
3
Moreno-Betancur M, Lynch JW, Pilkington RM, Schuch HS, Gialamas A, Sawyer MG, Chittleborough CR, Schurer S, Gurrin LC. Emulating a target trial of intensive nurse home visiting in the policy-relevant population using linked administrative data. Int J Epidemiol 2023; 52:119-131. [PMID: 35588223; PMCID: PMC9908050; DOI: 10.1093/ije/dyac092] [Received: 07/06/2021] [Accepted: 04/21/2022] [Indexed: 11/14/2022]
Abstract
BACKGROUND Populations willing to participate in randomized trials may not correspond well to policy-relevant target populations. Evidence of effectiveness that is complementary to randomized trials may be obtained by combining the 'target trial' causal inference framework with whole-of-population linked administrative data. METHODS We demonstrate this approach in an evaluation of the South Australian Family Home Visiting Program, a nurse home visiting programme targeting socially disadvantaged families. Using de-identified data from 2004-10 in the ethics-approved Better Evidence Better Outcomes Linked Data (BEBOLD) platform, we characterized the policy-relevant population and emulated a trial evaluating effects on child developmental vulnerability at 5 years (n = 4160) and academic achievement at 9 years (n = 6370). Linkage to seven health, welfare and education data sources allowed adjustment for 29 confounders using Targeted Maximum Likelihood Estimation (TMLE) with SuperLearner. Sensitivity analyses assessed robustness to analytical choices. RESULTS We demonstrated how the target trial framework may be used with linked administrative data to generate evidence for an intervention as it is delivered in practice in the community, in the policy-relevant target population, and for outcomes measured years later. The target trial lens also helped in understanding and limiting the increased risks of measurement, confounding and selection bias that arise with such data. Substantively, we did not find robust evidence of a meaningful beneficial intervention effect. CONCLUSIONS This approach could be a valuable avenue for generating high-quality, policy-relevant evidence that is complementary to trials, particularly when the target populations are multiply disadvantaged and less likely to participate in trials.
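The TMLE analysis above relies on SuperLearner. As a hedged illustration of the cross-validation principle behind it, the sketch below implements the simpler discrete super learner, which selects the single candidate with the lowest cross-validated risk, rather than the weighted ensemble produced by the R `SuperLearner` package. The two candidate learners, the deterministic fold assignment, and all function names are assumptions made for this sketch, not details of the study.

```python
def fit_mean(x, y):
    # Candidate 1: intercept-only model, predicts the training mean of y.
    m = sum(y) / len(y)
    return lambda xi: m

def fit_line(x, y):
    # Candidate 2: simple least-squares line y = b0 + b1 * x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx if sxx > 0 else 0.0
    b0 = my - b1 * mx
    return lambda xi: b0 + b1 * xi

def discrete_super_learner(x, y, learners, k=5):
    """Pick the candidate with the smallest k-fold cross-validated MSE.

    Returns (selected name, dict of CV risks, selected learner refit on all data).
    """
    n = len(x)
    folds = [i % k for i in range(n)]  # deterministic folds, for reproducibility
    risks = {}
    for name, fit in learners.items():
        sq_err = 0.0
        for f in range(k):
            train = [i for i in range(n) if folds[i] != f]
            model = fit([x[i] for i in train], [y[i] for i in train])
            # Out-of-fold squared errors only: each point is scored exactly once.
            sq_err += sum((y[i] - model(x[i])) ** 2 for i in range(n) if folds[i] == f)
        risks[name] = sq_err / n
    best = min(risks, key=risks.get)
    return best, risks, learners[best](x, y)

x = [float(i) for i in range(20)]
y = [2.0 + 0.5 * xi for xi in x]  # noiseless linear toy data
best, risks, final_model = discrete_super_learner(x, y, {"mean": fit_mean, "line": fit_line})
```

On these noiseless linear data the line is selected. The `SuperLearner` package instead combines the candidates with weights chosen to minimize cross-validated risk, which the discrete version approximates by putting all weight on one learner.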
Affiliation(s)
- Margarita Moreno-Betancur
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, VIC, Australia
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Parkville, VIC, Australia
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, VIC, Australia
- John W Lynch
- School of Public Health, University of Adelaide, Adelaide, SA, Australia
- Robinson Research Institute, University of Adelaide, Adelaide, SA, Australia
- Bristol Medical School, Population Health Sciences, University of Bristol, Bristol, UK
- Rhiannon M Pilkington
- School of Public Health, University of Adelaide, Adelaide, SA, Australia
- Robinson Research Institute, University of Adelaide, Adelaide, SA, Australia
- Helena S Schuch
- School of Public Health, University of Adelaide, Adelaide, SA, Australia
- Robinson Research Institute, University of Adelaide, Adelaide, SA, Australia
- Postgraduate programme in Dentistry, Federal University of Pelotas, Pelotas, Brazil
- Angela Gialamas
- School of Public Health, University of Adelaide, Adelaide, SA, Australia
- Robinson Research Institute, University of Adelaide, Adelaide, SA, Australia
- Michael G Sawyer
- Robinson Research Institute, University of Adelaide, Adelaide, SA, Australia
- School of Medicine, University of Adelaide, Adelaide, SA, Australia
- Catherine R Chittleborough
- School of Public Health, University of Adelaide, Adelaide, SA, Australia
- Robinson Research Institute, University of Adelaide, Adelaide, SA, Australia
- Stefanie Schurer
- School of Economics, University of Sydney, Sydney, NSW, Australia
- Lyle C Gurrin
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, VIC, Australia
4
Wang SJ, Huang Z, Zhu H. Performance of LTMLE in the presence of missing data in control-matched longitudinal studies. Stat Biopharm Res 2022. [DOI: 10.1080/19466315.2022.2108136] [Indexed: 10/16/2022]
Affiliation(s)
- Sue-Jane Wang
- Division of Biometrics I, Office of Biostatistics, Office of Translational Sciences, Center for Drug Evaluation and Research, U.S. Food and Drug Administration
- Zhipeng Huang
- Division of Biometrics I, Office of Biostatistics, Office of Translational Sciences, Center for Drug Evaluation and Research, U.S. Food and Drug Administration
- Hai Zhu
- Department of Biometrics and Clinical Development, SystImmune, Inc
5
Díaz I, Williams N, Hoffman KL, Schenck EJ. Nonparametric Causal Effects Based on Longitudinal Modified Treatment Policies. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1955691] [Indexed: 10/20/2022]
Affiliation(s)
- Iván Díaz
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, New York
- Nicholas Williams
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, New York
- Katherine L. Hoffman
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, New York
- Edward J. Schenck
- Division of Pulmonary & Critical Care Medicine, Department of Medicine, Weill Cornell Medicine, New York
6
Díaz I, Savenkov O, Kamel H. Nonparametric targeted Bayesian estimation of class proportions in unlabeled data. Biostatistics 2020; 23:274-293. [PMID: 32529244; DOI: 10.1093/biostatistics/kxaa022] [Received: 10/04/2019] [Revised: 04/21/2020] [Accepted: 04/23/2020] [Indexed: 12/20/2022]
Abstract
We introduce a novel Bayesian estimator for the class proportion in an unlabeled dataset, based on the targeted learning framework. The procedure requires the specification of a prior (and outputs a posterior) only for the target of inference, and yields a tightly concentrated posterior. When the scientific question can be characterized by a low-dimensional parameter functional, this focus on target prior and posterior distributions perfectly aligns with Bayesian subjectivism. We prove a Bernstein-von Mises-type result for our proposed Bayesian procedure, which guarantees that the posterior distribution converges to the distribution of an efficient, asymptotically linear estimator. In particular, the posterior is Gaussian, doubly robust, and efficient in the limit, under the sole assumption that certain nuisance parameters are estimated at slower-than-parametric rates. We perform numerical studies illustrating the frequentist properties of the method. We also illustrate its use in a motivating application to estimate the proportion of embolic strokes of undetermined source arising from occult cardiac sources or large-artery atherosclerotic lesions. Though we focus on the motivating example of the proportion of cases in an unlabeled dataset, the procedure is general and can be adapted to estimate any pathwise differentiable parameter in a nonparametric model.
Affiliation(s)
- Iván Díaz
- Division of Biostatistics, Weill Cornell Medicine, New York, NY 10065, USA
- Hooman Kamel
- Department of Neurology, Weill Cornell Medicine, New York, NY 10065, USA
7
Díaz I. Statistical inference for data-adaptive doubly robust estimators with survival outcomes. Stat Med 2019; 38:2735-2748. [PMID: 30950107; DOI: 10.1002/sim.8156] [Received: 04/05/2018] [Revised: 02/25/2019] [Accepted: 03/08/2019] [Indexed: 11/06/2022]
Abstract
The consistency of doubly robust estimators relies on the consistent estimation of at least one of two nuisance regression parameters. In moderate-to-large dimensions, the use of flexible data-adaptive regression estimators may aid in achieving this consistency. However, n^{1/2}-consistency of doubly robust estimators is not guaranteed if one of the nuisance estimators is inconsistent. In this paper, we present a doubly robust estimator for survival analysis with the novel property that it converges to a Gaussian variable at an n^{1/2} rate for a large class of data-adaptive estimators of the nuisance parameters, under the sole assumption that at least one of them is consistently estimated at an n^{1/4} rate. This result is achieved through the adaptation of recent ideas in semiparametric inference, which amount to (i) Gaussianizing (i.e., making asymptotically linear) a drift term that arises in the asymptotic analysis of the doubly robust estimator and (ii) using cross-fitting to avoid entropy conditions on the nuisance estimators. We present the formula for the asymptotic variance of the estimator, which allows for the computation of doubly robust confidence intervals and p values. We illustrate the finite-sample properties of the estimator in simulation studies and demonstrate its use in a phase III clinical trial for estimating the effect of a novel therapy for the treatment of human epidermal growth factor receptor 2 (HER2)-positive breast cancer.
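Point (ii), cross-fitting, can be illustrated outside the survival setting. The sketch below is a simplified point-treatment analogue and not the survival estimator of the paper: it computes a cross-fitted augmented inverse probability weighted (AIPW, i.e. doubly robust) estimate of the treated mean E[Y(1)], with a Wald confidence interval from the estimated influence function. The stratified-mean nuisance estimators for a single discrete covariate W, the propensity bounds of 0.01 and 0.99, and all names are illustrative assumptions.

```python
import math

def stratum_mean(values, strata, level, default):
    # Mean of `values` among units whose discrete covariate equals `level`.
    sel = [v for v, s in zip(values, strata) if s == level]
    return sum(sel) / len(sel) if sel else default

def crossfit_aipw_treated_mean(y, a, w, k=2, z=1.959963984540054):
    """Cross-fitted AIPW estimate of E[Y(1)] and a Wald 95% CI (discrete W)."""
    n = len(y)
    folds = [i % k for i in range(n)]
    pseudo = [0.0] * n  # per-unit AIPW pseudo-outcomes
    for f in range(k):
        train = [i for i in range(n) if folds[i] != f]
        treated = [i for i in train if a[i] == 1]
        for i in (j for j in range(n) if folds[j] == f):
            # Nuisances fitted on the other folds only (the cross-fitting step).
            g = stratum_mean([a[j] for j in train], [w[j] for j in train], w[i], 0.5)
            g = min(max(g, 0.01), 0.99)  # bound propensity away from 0 and 1
            q = stratum_mean([y[j] for j in treated], [w[j] for j in treated], w[i], 0.0)
            pseudo[i] = q + a[i] / g * (y[i] - q)
    est = sum(pseudo) / n
    # Estimated influence-function values are pseudo[i] - est.
    se = math.sqrt(sum((p - est) ** 2 for p in pseudo) / (n - 1) / n)
    return est, (est - z * se, est + z * se)

w = [0, 0, 1, 1] * 4
a = [1, 1, 1, 1, 0, 0, 0, 0] * 2
y = [1 if (ai == 1) == (wi == 0) else 0 for ai, wi in zip(a, w)]
est, (lo, hi) = crossfit_aipw_treated_mean(y, a, w)
```

In this toy design the nuisance estimates are exact in every fold, so the estimate equals the share of units with W = 0 (0.5). With real data one would plug in data-adaptive learners, which is where cross-fitting matters: no nuisance model is ever evaluated on the observations it was trained on.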
Affiliation(s)
- Iván Díaz
- Division of Biostatistics, Weill Cornell Medicine, New York, New York
8
Moreno-Betancur M, Lee KJ, Leacy FP, White IR, Simpson JA, Carlin JB. Canonical Causal Diagrams to Guide the Treatment of Missing Data in Epidemiologic Studies. Am J Epidemiol 2018; 187:2705-2715. [PMID: 30124749; PMCID: PMC6269242; DOI: 10.1093/aje/kwy173] [Received: 01/02/2018] [Accepted: 08/03/2018] [Indexed: 12/02/2022]
Abstract
With incomplete data, the “missing at random” (MAR) assumption is widely understood to enable unbiased estimation with appropriate methods. While the need to assess the plausibility of MAR and to perform sensitivity analyses considering “missing not at random” (MNAR) scenarios has been emphasized, the practical difficulty of these tasks is rarely acknowledged. With multivariable missingness, what MAR means is difficult to grasp, and in many MNAR scenarios unbiased estimation is possible using methods commonly associated with MAR. Directed acyclic graphs (DAGs) have been proposed as an alternative framework for specifying practically accessible assumptions beyond the MAR-MNAR dichotomy. However, there is currently no general algorithm for deciding how to handle the missing data given a specific DAG. Here we construct “canonical” DAGs capturing typical missingness mechanisms in epidemiologic studies with incomplete data on exposure, outcome, and confounding factors. For each DAG, we determine whether common target parameters are “recoverable,” meaning that they can be expressed as functions of the available data distribution and thus estimated consistently, or whether sensitivity analyses are necessary. We investigate the performance of available-case and multiple-imputation procedures. Using data from waves 1–3 of the Longitudinal Study of Australian Children (2004–2008), we illustrate how our findings can guide the treatment of missing data in point-exposure studies.
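As a toy illustration of what “recoverable” means, and not an example from the paper (which treats richer DAGs with incomplete exposure, outcome, and confounders), consider an incomplete outcome Y, a fully observed covariate W, and a missingness indicator R that depends only on W, so that Y is independent of R given W. Then E[Y] can be written as a function of the available data, E[Y] = Σ_w E[Y | W = w, R = 1] P(W = w), while the complete-case mean is generally biased. The function names and data below are hypothetical.

```python
def complete_case_mean(y):
    # Naive estimate: average of the observed outcomes only.
    obs = [v for v in y if v is not None]
    return sum(obs) / len(obs)

def standardized_mean(y, w):
    """E[Y] = sum_w E[Y | W=w, R=1] * P(W=w), assuming Y independent of R given W."""
    n = len(y)
    total = 0.0
    for level in sorted(set(w)):
        in_level = [i for i in range(n) if w[i] == level]
        obs = [y[i] for i in in_level if y[i] is not None]
        # Observed-case mean in this stratum, weighted by the stratum's full-sample share.
        total += sum(obs) / len(obs) * len(in_level) / n
    return total

# Toy data: outcomes in the W = 1 stratum are mostly missing (None).
w = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2.0, 2.0, 2.0, 2.0, 6.0, None, None, None]
cc = complete_case_mean(y)     # 2.8: over-represents the W = 0 stratum
rec = standardized_mean(y, w)  # 4.0: the full-data mean if unseen W = 1 outcomes are also 6.0
```

The standardized estimate uses the same observed values as the complete-case mean; it differs only in reweighting each stratum by its share of the full sample, which is valid precisely because missingness depends on W alone.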
Affiliation(s)
- Margarita Moreno-Betancur
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia
- Katherine J Lee
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
- Department of Paediatrics, Melbourne Medical School, University of Melbourne, Melbourne, Victoria, Australia
- Finbarr P Leacy
- Data Science Centre, Royal College of Surgeons in Ireland, Dublin, Ireland
- Ian R White
- MRC Clinical Trials Unit, London, United Kingdom
- Julie A Simpson
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia
- John B Carlin
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia
9
Zheng W, Luo Z, van der Laan MJ. Marginal Structural Models with Counterfactual Effect Modifiers. Int J Biostat 2018; 14:ijb-2018-0039. [PMID: 29883322; PMCID: PMC6682415; DOI: 10.1515/ijb-2018-0039] [Received: 04/03/2018] [Accepted: 05/09/2018] [Indexed: 11/15/2022]
Abstract
In health and social sciences, research questions often involve systematic assessment of how patient characteristics modify the causal effect of treatment. In longitudinal settings, time-varying or post-intervention effect modifiers are also of interest. In this work, we investigate the robust and efficient estimation of the Counterfactual-History-Adjusted Marginal Structural Model (van der Laan MJ, Petersen M. Statistical learning of origin-specific statically optimal individualized treatment rules. Int J Biostat. 2007;3), which models the conditional intervention-specific mean outcome given a counterfactual modifier history in an ideal experiment. We establish the semiparametric efficiency theory for these models, and present a substitution-based, semiparametric efficient and doubly robust estimator using the targeted maximum likelihood estimation methodology (TMLE; e.g. van der Laan MJ, Rubin DB. Targeted maximum likelihood learning. Int J Biostat. 2006;2; van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data, 1st ed. Springer Series in Statistics. Springer, 2011). To facilitate implementation in applications where the effect modifier is high dimensional, our third contribution is a projected influence function (and the corresponding projected TMLE estimator), which retains most of the robustness of its efficient peer and can be easily implemented in applications where the use of the efficient influence function becomes taxing. We compare the projected TMLE estimator with an Inverse Probability of Treatment Weighted estimator (e.g. Robins JM. Marginal structural models. In: Proceedings of the American Statistical Association. Section on Bayesian Statistical Science, 1-10. 1997a; Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000;11:561-570), and a non-targeted G-computation estimator (Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods: application to control of the healthy worker survivor effect. Math Modell. 1986;7:1393-1512). The comparative performance of these estimators is assessed in a simulation study. The use of the projected TMLE estimator is illustrated in a secondary data analysis of the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial, where effect modifiers are missing at random.
Affiliation(s)
- Wenjing Zheng
- Division of Biostatistics, University of California, Berkeley, USA
- Center for Targeted Learning, University of California, Berkeley, USA
- Zhehui Luo
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, USA
- Mark J van der Laan
- Division of Biostatistics, University of California, Berkeley, USA
- Center for Targeted Learning, University of California, Berkeley, USA