1
Yin A, Yuan A, Tan MT. Highly robust causal semiparametric U-statistic with applications in biomedical studies. Int J Biostat 2024; 20:69-91. [PMID: 36433631 PMCID: PMC10225018 DOI: 10.1515/ijb-2022-0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 10/31/2022] [Indexed: 11/28/2022]
Abstract
With our increased ability to capture large data, causal inference has received renewed attention and is playing an ever-important role in biomedicine and economics. However, one major methodological hurdle is that existing methods rely on many unverifiable model assumptions. Thus robust modeling is a critically important approach complementary to sensitivity analysis, where it compares results under various model assumptions. The more robust a method is with respect to model assumptions, the more worthy it is. The doubly robust estimator (DRE) is a significant advance in this direction. However, in practice, many outcome measures are functionals of multiple distributions, and so are the associated estimands, which can only be estimated via U-statistics. Thus most existing DREs do not apply. This article proposes a broad class of highly robust U-statistic estimators (HREs), which use semiparametric specifications for both the propensity score and outcome models in constructing the U-statistic. Thus, the HRE is more robust than the existing DREs. We derive comprehensive asymptotic properties of the proposed estimators and perform extensive simulation studies to evaluate their finite sample performance and compare them with the corresponding parametric U-statistics and the naive estimators, which show significant advantages. Then we apply the method to analyze a clinical trial from the AIDS Clinical Trials Group.
Affiliation(s)
- Anqi Yin
- Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC 20057, USA
- Ao Yuan
- Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC 20057, USA
- Ming T. Tan
- Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC 20057, USA
2
Wang L, Wang X, Liao KP, Cai T. Semisupervised transfer learning for evaluation of model classification performance. Biometrics 2024; 80:ujae002. [PMID: 38465982 PMCID: PMC10926267 DOI: 10.1093/biomtc/ujae002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 12/17/2023] [Accepted: 01/17/2024] [Indexed: 03/12/2024]
Abstract
In many modern machine learning applications, changes in covariate distributions and difficulty in acquiring outcome information have posed challenges to robust model training and evaluation. Numerous transfer learning methods have been developed to robustly adapt the model itself to some unlabeled target populations using existing labeled data in a source population. However, there is a paucity of literature on transferring performance metrics, especially receiver operating characteristic (ROC) parameters, of a trained model. In this paper, we aim to evaluate the performance of a trained binary classifier on an unlabeled target population based on ROC analysis. We propose Semisupervised Transfer lEarning of Accuracy Measures (STEAM), an efficient three-step estimation procedure that employs (1) double-index modeling to construct calibrated density ratio weights and (2) robust imputation to leverage the large amount of unlabeled data to improve estimation efficiency. We establish the consistency and asymptotic normality of the proposed estimator under the correct specification of either the density ratio model or the outcome model. We also correct for potential overfitting bias in the estimators in finite samples with cross-validation. We compare our proposed estimators to existing methods and show reductions in bias and gains in efficiency through simulations. We illustrate the practical utility of the proposed method in evaluating the prediction performance of a phenotyping model for rheumatoid arthritis (RA) on a temporally evolving EHR cohort.
Affiliation(s)
- Linshanshan Wang
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
- Xuan Wang
- Division of Biostatistics, Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84108, United States
- Katherine P Liao
- Division of Rheumatology, Brigham and Women’s Hospital, Boston, MA 02115, United States
- Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
3
Li Y, Li L. Propensity score analysis with local balance. Stat Med 2023; 42:2637-2660. [PMID: 37012676 PMCID: PMC11390285 DOI: 10.1002/sim.9741] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/18/2022] [Revised: 02/01/2023] [Accepted: 03/25/2023] [Indexed: 04/05/2023]
Abstract
Most propensity score (PS) analysis methods rely on a correctly specified parametric PS model, which may result in biased estimation of the average treatment effect (ATE) when the model is misspecified. More flexible nonparametric models for treatment assignment alleviate this issue, but they do not always guarantee covariate balance. Methods that force balance in the means of covariates and their transformations between the treatment groups, termed global balance in this article, do not always lead to unbiased estimation of the ATE. Their estimated propensity scores only ensure global balance but not the balancing property, which is defined as the conditional independence between treatment assignment and covariates given the propensity score. The balancing property implies not only global balance but also local balance, that is, the mean balance of covariates in propensity score stratified sub-populations. Local balance implies global balance, but the reverse is false. We propose the propensity score with local balance (PSLB) methodology, which incorporates nonparametric propensity score models and optimizes local balance. Extensive numerical studies showed that the proposed method can substantially outperform existing methods that estimate the propensity score by optimizing global balance when the model is misspecified. The proposed method is implemented in the R package PSLB.
Affiliation(s)
- Yan Li
- The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, Texas, USA
- Liang Li
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
4
Westling T, Luedtke A, Gilbert PB, Carone M. Inference for treatment-specific survival curves using machine learning. J Am Stat Assoc 2023; 119:1541-1553. [PMID: 39184837 PMCID: PMC11339859 DOI: 10.1080/01621459.2023.2205060] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/11/2021] [Accepted: 04/11/2023] [Indexed: 08/27/2024]
Abstract
In the absence of data from a randomized trial, researchers may aim to use observational data to draw causal inference about the effect of a treatment on a time-to-event outcome. In this context, interest often focuses on the treatment-specific survival curves, that is, the survival curves were the population under study to be assigned to receive the treatment or not. Under certain conditions, including that all confounders of the treatment-outcome relationship are observed, the treatment-specific survival curve can be identified with a covariate-adjusted survival curve. In this article, we propose a novel cross-fitted doubly-robust estimator that incorporates data-adaptive (e.g. machine learning) estimators of the conditional survival functions. We establish conditions on the nuisance estimators under which our estimator is consistent and asymptotically linear, both pointwise and uniformly in time. We also propose a novel ensemble learner for combining multiple candidate estimators of the conditional survival estimators. Notably, our methods and results accommodate events occurring in discrete or continuous time, or an arbitrary mix of the two. We investigate the practical performance of our methods using numerical studies and an application to the effect of a surgical treatment to prevent metastases of parotid carcinoma on mortality.
Affiliation(s)
- Ted Westling
- Department of Mathematics and Statistics, University of Massachusetts Amherst
- Alex Luedtke
- Department of Statistics, University of Washington
- Peter B. Gilbert
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center
- Marco Carone
- Department of Biostatistics, University of Washington
5
Bannick MS, Gao F, Brown ER, Janes HE. Retrospective, Observational Studies for Estimating Vaccine Effects on the Secondary Attack Rate of SARS-CoV-2. Am J Epidemiol 2023; 192:1016-1028. [PMID: 36883907 PMCID: PMC10505422 DOI: 10.1093/aje/kwad046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 11/21/2022] [Accepted: 02/23/2023] [Indexed: 03/09/2023] Open
Abstract
Coronavirus disease 2019 (COVID-19) vaccines are highly efficacious at preventing symptomatic infection, severe disease, and death. Most of the evidence that COVID-19 vaccines also reduce transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is based on retrospective, observational studies. Specifically, an increasing number of studies are evaluating vaccine effectiveness against the secondary attack rate of SARS-CoV-2 using data available in existing health-care databases or contact-tracing databases. Since these types of databases were designed for clinical diagnosis or management of COVID-19, they are limited in their ability to provide accurate information on infection, infection timing, and transmission events. We highlight challenges with using existing databases to identify transmission units and confirm potential SARS-CoV-2 transmission events. We discuss the impact of common diagnostic testing strategies, including event-prompted and infrequent testing, and illustrate their potential biases in estimating vaccine effectiveness against the secondary attack rate of SARS-CoV-2. We articulate the need for prospective observational studies of vaccine effectiveness against the SARS-CoV-2 secondary attack rate, and we provide design and reporting considerations for studies using retrospective databases.
Affiliation(s)
- Marlena S Bannick
- Correspondence to Marlena Bannick, Department of Biostatistics, Hans Rosling Center for Population Health, Box 357232, University of Washington, Seattle, WA 98195
6
Lee MJ, Lee S. Review and comparison of treatment effect estimators using propensity and prognostic scores. Int J Biostat 2022; 18:357-380. [PMID: 35942611 DOI: 10.1515/ijb-2021-0005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2020] [Accepted: 01/03/2022] [Indexed: 01/10/2023]
Abstract
In finding effects of a binary treatment, practitioners use mostly either propensity score matching (PSM) or inverse probability weighting (IPW). However, many new treatment effect estimators are available now using propensity score and "prognostic score", and some of these estimators are much better than PSM and IPW in several aspects. In this paper, we review those recent treatment effect estimators to show how they are related to one another, and why they are better than PSM and IPW. We compare 26 estimators in total through extensive simulation and empirical studies. Based on these, we recommend recent treatment effect estimators using "overlap weight", and "targeted MLE" using statistical/machine learning, as well as a simple regression imputation/adjustment estimator using linear prognostic score models.
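To make the weighting schemes named in this abstract concrete, the following minimal sketch contrasts inverse probability weights with overlap weights for a binary treatment. It assumes the propensity scores have already been estimated and lie strictly between 0 and 1; the function names are hypothetical and are not taken from the paper, and the sketch does not reproduce any of the 26 estimators compared there.

```python
from typing import List, Sequence


def ipw_weights(treated: Sequence[int], ps: Sequence[float]) -> List[float]:
    """Inverse probability weights: 1/e for treated units, 1/(1 - e) for controls."""
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e) for t, e in zip(treated, ps)]


def overlap_weights(treated: Sequence[int], ps: Sequence[float]) -> List[float]:
    """Overlap weights: 1 - e for treated units, e for controls (always within [0, 1])."""
    return [1.0 - e if t == 1 else e for t, e in zip(treated, ps)]


def weighted_mean_difference(
    y: Sequence[float], treated: Sequence[int], w: Sequence[float]
) -> float:
    """Normalized weighted difference in mean outcomes, treated minus control."""
    sum_w1 = sum(wi for wi, t in zip(w, treated) if t == 1)
    sum_w0 = sum(wi for wi, t in zip(w, treated) if t == 0)
    mean1 = sum(wi * yi for wi, yi, t in zip(w, y, treated) if t == 1) / sum_w1
    mean0 = sum(wi * yi for wi, yi, t in zip(w, y, treated) if t == 0) / sum_w0
    return mean1 - mean0


if __name__ == "__main__":
    # Toy data, purely illustrative.
    y = [3.0, 2.0, 1.0, 0.0]
    treated = [1, 1, 0, 0]
    ps = [0.8, 0.5, 0.5, 0.2]
    print(weighted_mean_difference(y, treated, ipw_weights(treated, ps)))
    print(weighted_mean_difference(y, treated, overlap_weights(treated, ps)))
```

Because overlap weights are bounded, units with propensity scores near 0 or 1 cannot dominate the estimate, whereas the inverse probability weights grow without bound as the score approaches either extreme.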
Affiliation(s)
- Myoung-Jae Lee
- Department of Economics, Korea University, Seoul 02841, Korea
- Sanghyeok Lee
- Department of Economics, American University in Cairo, New Cairo 11835, Egypt
7
Williams N, Rosenblum M, Díaz I. Optimising precision and power by machine learning in randomised trials with ordinal and time-to-event outcomes with an application to COVID-19. Journal of the Royal Statistical Society. Series A (Statistics in Society) 2022; 185:RSSA12915. [PMID: 36246572 PMCID: PMC9539267 DOI: 10.1111/rssa.12915] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/05/2021] [Revised: 05/23/2022] [Accepted: 07/05/2022] [Indexed: 05/23/2023]
Abstract
The rapid finding of effective therapeutics requires efficient use of available resources in clinical trials. Covariate adjustment can yield statistical estimates with improved precision, resulting in a reduction in the number of participants required to draw futility or efficacy conclusions. We focus on time-to-event and ordinal outcomes. When more than a few baseline covariates are available, a key question for covariate adjustment in randomised studies is how to fit a model relating the outcome and the baseline covariates to maximise precision. We present a novel theoretical result establishing conditions for asymptotic normality of a variety of covariate-adjusted estimators that rely on machine learning (e.g., ℓ1-regularisation, Random Forests, XGBoost, and Multivariate Adaptive Regression Splines [MARS]), under the assumption that outcome data are missing completely at random. We further present a consistent estimator of the asymptotic variance. Importantly, the conditions do not require the machine learning methods to converge to the true outcome distribution conditional on baseline variables, as long as they converge to some (possibly incorrect) limit. We conducted a simulation study to evaluate the performance of the aforementioned prediction methods in COVID-19 trials. Our simulation is based on resampling longitudinal data from over 1500 patients hospitalised with COVID-19 at Weill Cornell Medicine New York Presbyterian Hospital. We found that using ℓ1-regularisation led to estimators and corresponding hypothesis tests that control type 1 error and are more precise than an unadjusted estimator across all sample sizes tested. We also show that when covariates are not prognostic of the outcome, ℓ1-regularisation remains as precise as the unadjusted estimator, even at small sample sizes (n = 100). We give an R package adjrct that performs model-robust covariate adjustment for ordinal and time-to-event outcomes.
Affiliation(s)
- Nicholas Williams
- Department of Epidemiology, Columbia University Mailman School of Public Health, New York City, New York, USA
- Michael Rosenblum
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Iván Díaz
- Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York City, New York, USA
8
Su M, Wang R, Wang Q. A two-stage optimal subsampling estimation for missing data problems with large-scale data. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
9
Wen L, Hernán MA, Robins JM. Multiply robust estimators of causal effects for survival outcomes. Scand Stat Theory Appl 2022; 49:1304-1328. [PMID: 36033967 PMCID: PMC9401091 DOI: 10.1111/sjos.12561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2020] [Accepted: 08/30/2021] [Indexed: 11/27/2022]
Abstract
Multiply robust estimators of the longitudinal g-formula have recently been proposed to protect against model misspecification better than the standard augmented inverse probability weighted estimator (Rotnitzky et al., 2017; Luedtke et al., 2018). These multiply robust estimators ensure consistency if one of the models for the treatment process or outcome process is correctly specified at each time point. We study the multiply robust estimators of Rotnitzky et al. (2017) in the context of a survival outcome. Specifically, we compare various estimators of the g-formula for survival outcomes in order to 1) understand how the estimators may be related to one another, 2) understand each estimator's robustness to model misspecification, and 3) construct estimators that can be more efficient than others in certain model misspecification scenarios. We propose a modification of the multiply robust estimators to gain efficiency under misspecification of the outcome model by using calibrated propensity scores over non-calibrated propensity scores at each time point. Theoretical results are confirmed via simulation studies, and a practical comparison of these estimators is conducted through an application to the US Veterans Aging Cohort Study.
Affiliation(s)
- Lan Wen
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
- CAUSALab, Harvard T.H. Chan School of Public Health
- Miguel A Hernán
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
- CAUSALab, Harvard T.H. Chan School of Public Health
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
- James M Robins
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
- CAUSALab, Harvard T.H. Chan School of Public Health
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
10
Bagmar MSH, Shen H. Causal inference with missingness in confounder. J Stat Comput Sim 2022. [DOI: 10.1080/00949655.2022.2089672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Md. Shaddam Hossain Bagmar
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
- Institute of Statistical Research and Training (ISRT), University of Dhaka, Dhaka, Bangladesh
- Hua Shen
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
11
Choi JY, Lee MJ. Overlap weight and propensity score residual for heterogeneous effects: A review with extensions. J Stat Plan Inference 2022. [DOI: 10.1016/j.jspi.2022.04.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]
12
Su M, Wang Q. A convex programming solution based debiased estimator for quantile with missing response and high-dimensional covariables. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2021.107371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
13
Butera NM, Zeng D, Howard AG, Gordon-Larsen P, Cai J. A doubly robust method to handle missing multilevel outcome data with application to the China Health and Nutrition Survey. Stat Med 2022; 41:769-785. [PMID: 34786739 PMCID: PMC8795489 DOI: 10.1002/sim.9260] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/30/2021] [Revised: 08/17/2021] [Accepted: 10/25/2021] [Indexed: 11/12/2022]
Abstract
Missing data are common in longitudinal cohort studies and can lead to bias, particularly in studies with informative missingness. Many common methods for handling informatively missing data in survey samples require correctly specifying a model for missingness. Although doubly robust methods exist to provide unbiased regression coefficients in the presence of missing outcome data, these methods do not account for correlation due to clustering inherent in longitudinal or cluster-sampled studies. In this work, we developed a doubly robust method to estimate the regression of an outcome on a predictor in the presence of missing multilevel data on the outcome, which results in consistent estimation of regression coefficients assuming correct specification of either (1) the probability of missingness or (2) the outcome model. This method involves specification of separate hierarchical models for missingness and for the outcome, conditional on observed auxiliary variables and cluster-specific random effects, to account for correlation among observations. We showed this proposed estimator is doubly robust and derived its asymptotic distribution, conducted simulation studies to compare the method to an existing doubly robust method developed for independent data, and applied the method to data from the China Health and Nutrition Survey, an ongoing multilevel longitudinal cohort study.
Affiliation(s)
- Nicole M. Butera
- The Biostatistics Center and Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Rockville, Maryland
- Donglin Zeng
- Department of Biostatistics, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Annie Green Howard
- Department of Biostatistics, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Penny Gordon-Larsen
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Department of Nutrition, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Jianwen Cai
- Department of Biostatistics, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
14
Shah RD, Bühlmann P. Double-Estimation-Friendly Inference for High-Dimensional Misspecified Models. Stat Sci 2022. [DOI: 10.1214/22-sts850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Rajen D. Shah
- Rajen D. Shah is Professor of Statistics, Statistical Laboratory, University of Cambridge, Cambridge, United Kingdom
- Peter Bühlmann
- Peter Bühlmann is Professor of Statistics, Seminar for Statistics, ETH Zurich, Zurich, Switzerland
15
Han K, Shaw PA, Lumley T. Combining multiple imputation with raking of weights: An efficient and robust approach in the setting of nearly true models. Stat Med 2021; 40:6777-6791. [PMID: 34585424 PMCID: PMC8963275 DOI: 10.1002/sim.9210] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/09/2020] [Revised: 07/30/2021] [Accepted: 09/14/2021] [Indexed: 01/01/2023]
Abstract
Multiple imputation (MI) provides us with efficient estimators in model-based methods for handling missing data under the true model. It is also well-understood that design-based estimators are robust methods that do not require accurately modeling the missing data; however, they can be inefficient. In any applied setting, it is difficult to know whether a missing data model may be good enough to win the bias-efficiency trade-off. Raking of weights is one approach that relies on constructing an auxiliary variable from data observed on the full cohort, which is then used to adjust the weights for the usual Horvitz-Thompson estimator. Computing the optimally efficient raking estimator requires evaluating the expectation of the efficient score given the full cohort data, which is generally infeasible. We demonstrate MI as a practical method to compute a raking estimator that will be optimal. We compare this estimator to common parametric and semi-parametric estimators, including standard MI. We show that while estimators, such as the semi-parametric maximum likelihood and MI estimator, obtain optimal performance under the true model, the proposed raking estimator utilizing MI maintains a better robustness-efficiency trade-off even under mild model misspecification. We also show that the standard raking estimator, without MI, is often competitive with the optimal raking estimator. We demonstrate these properties through several numerical examples and provide a theoretical discussion of conditions for asymptotically superior relative efficiency of the proposed raking estimator.
Affiliation(s)
- Kyunghee Han
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Pamela A. Shaw
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Thomas Lumley
- Department of Statistics, University of Auckland, Auckland, New Zealand
16
Mo W, Liu Y. Efficient learning of optimal individualized treatment rules for heteroscedastic or misspecified treatment-free effect models. J R Stat Soc Series B Stat Methodol 2021. [DOI: 10.1111/rssb.12474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Weibin Mo
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Yufeng Liu
- Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, Carolina Center for Genome Sciences, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
17
Zhou M, Yao W. Sensitivity analysis of unmeasured confounding in causal inference based on exponential tilting and super learner. J Appl Stat 2021; 50:744-760. [PMID: 36819084 PMCID: PMC9930795 DOI: 10.1080/02664763.2021.1999398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Causal inference under the potential outcome framework relies on the strongly ignorable treatment assumption. This assumption is usually questionable in observational studies, and unmeasured confounding is one of the fundamental challenges in causal inference. To this end, we propose a new sensitivity analysis method to evaluate the impact of the unmeasured confounder by leveraging ideas of doubly robust estimators, the exponential tilt method, and the super learner algorithm. Compared to other existing methods of sensitivity analysis that parameterize the unmeasured confounder as a latent variable in the working models, the exponential tilting method does not impose any restrictions on the structure or models of the unmeasured confounders. In addition, in order to reduce the modeling bias of traditional parametric methods, we propose incorporating the super learner machine learning algorithm to perform nonparametric model estimation and the corresponding sensitivity analysis. Furthermore, most existing sensitivity analysis methods require multivariate sensitivity parameters, which makes their choice difficult and subjective in practice. In comparison, the new method has a univariate sensitivity parameter with a simple interpretation as a log-odds ratio for binary outcomes, which makes its choice and the application of the new sensitivity analysis method very easy for practitioners.
Affiliation(s)
- Mi Zhou
- Department of Statistics, University of California, Riverside, CA, USA
- Weixin Yao
- Department of Statistics, University of California, Riverside, CA 92521, USA
18
Zhang Y, Bradic J. High-dimensional semi-supervised learning: in search of optimal inference of the mean. Biometrika 2021. [DOI: 10.1093/biomet/asab042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
A fundamental challenge in semi-supervised learning lies in the observed data’s disproportional size when compared with the size of the data collected with missing outcomes. An implicit understanding is that the dataset with missing outcomes, being significantly larger, ought to improve estimation and inference. However, it is unclear to what extent this is correct. We illustrate one clear benefit: root-n inference of the outcome’s mean is possible while only requiring a consistent estimation of the outcome, possibly at a rate slower than root-n. This is achieved by a novel k-fold cross-fitted, double robust estimator. We discuss both linear and nonlinear outcomes. Such an estimator is particularly suited for models that naturally do not admit root-n consistency, such as high-dimensional, nonparametric, or semiparametric models. We apply our methods to the heterogeneous treatment effects.
Affiliation(s)
- Yuqian Zhang
- Department of Mathematics, University of California San Diego, 9500 Gilman Drive, La Jolla, California 92093-0112, U.S.A
- Jelena Bradic
- Department of Mathematics, University of California San Diego, 9500 Gilman Drive, La Jolla, California 92093-0112, U.S.A
19
Hines O, Vansteelandt S, Diaz-Ordaz K. Robust Inference for Mediated Effects in Partially Linear Models. Psychometrika 2021; 86:595-618. [PMID: 34008127 DOI: 10.1007/s11336-021-09768-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2020] [Revised: 04/17/2021] [Accepted: 04/24/2021] [Indexed: 06/12/2023]
Abstract
We consider mediated effects of an exposure, X, on an outcome, Y, via a mediator, M, under no unmeasured confounding assumptions in the setting where models for the conditional expectation of the mediator and outcome are partially linear. We propose G-estimators for the direct and indirect effects and demonstrate consistent asymptotic normality for indirect effects when models for the conditional means of M, or X and Y are correctly specified, and for direct effects, when models for the conditional means of Y, or X and M are correct. This marks an improvement, in this particular setting, over previous 'triple' robust methods, which do not assume partially linear mean models. Testing of the no-mediation hypothesis is inherently problematic due to the composite nature of the test (either X has no effect on M or M has no effect on Y), leading to low power when both effect sizes are small. We use generalized methods of moments (GMM) results to construct a new score testing framework, which includes as special cases the no-mediation and the no-direct-effect hypotheses. The proposed tests rely on an orthogonal estimation strategy for estimating nuisance parameters. Simulations show that the GMM-based tests perform better in terms of power and small sample performance compared with traditional tests in the partially linear setting, with drastic improvement under model misspecification. New methods are illustrated in a mediation analysis of data from the COPERS trial, a randomized trial investigating the effect of a non-pharmacological intervention for patients suffering from chronic pain. An accompanying R package implementing these methods can be found at github.com/ohines/plmed.
Affiliation(s)
- Oliver Hines
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Stijn Vansteelandt
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Karla Diaz-Ordaz
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
|
20
|
Cheng D, Ananthakrishnan AN, Cai T. Robust and efficient semi-supervised estimation of average treatment effects with application to electronic health records data. Biometrics 2021; 77:413-423. [PMID: 32413171 PMCID: PMC7758040 DOI: 10.1111/biom.13298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2018] [Revised: 04/30/2020] [Accepted: 05/01/2020] [Indexed: 11/29/2022]
Abstract
We consider the problem of estimating the average treatment effect (ATE) in a semi-supervised learning setting, where a very small proportion of the entire set of observations are labeled with the true outcome but features predictive of the outcome are available among all observations. This problem arises, for example, when estimating treatment effects in electronic health records (EHR) data because gold-standard outcomes are often not directly observable from the records but are observed for a limited number of patients through small-scale manual chart review. We develop an imputation-based approach for estimating the ATE that is robust to misspecification of the imputation model. This effectively allows information from the predictive features to be safely leveraged to improve efficiency in estimating the ATE. The estimator is additionally doubly robust in that it is consistent under correct specification of either an initial propensity score model or a baseline outcome model. It is also locally semiparametric efficient under an ideal semi-supervised model where the distribution of the unlabeled data is known. Simulations exhibit the efficiency and robustness of the proposed method compared to existing approaches in finite samples. We illustrate the method by comparing rates of treatment response to two biologic agents for treatment of inflammatory bowel disease using EHR data from Partners' Healthcare.
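The double robustness property described in this abstract can be illustrated with the standard augmented inverse probability weighting (AIPW) form of the ATE estimator. The sketch below is not the semi-supervised estimator of Cheng et al.; it is a minimal fully labeled version on simulated data, and the function name `aipw_ate`, the data-generating process, and the deliberately misspecified nuisance values are all assumptions made for illustration.

```python
import numpy as np

def aipw_ate(y, a, ps, mu1, mu0):
    """AIPW estimate of the ATE from fitted nuisance values.

    y: outcomes, a: binary treatment indicator, ps: propensity scores,
    mu1/mu0: outcome-model predictions under treatment/control.
    """
    y, a, ps, mu1, mu0 = map(np.asarray, (y, a, ps, mu1, mu0))
    # Outcome-model prediction plus an inverse-probability-weighted residual correction
    term1 = mu1 + a * (y - mu1) / ps
    term0 = mu0 + (1 - a) * (y - mu0) / (1 - ps)
    return float(np.mean(term1 - term0))

# Simulated data (hypothetical): the true ATE is 2 and x confounds treatment and outcome.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
ps_true = 1.0 / (1.0 + np.exp(-0.5 * x))
a = rng.binomial(1, ps_true)
y = 2.0 * a + x + rng.normal(size=n)

zeros = np.zeros(n)
# Correct propensity score, wrong outcome model (predicts 0 everywhere)
est_ps_only = aipw_ate(y, a, ps_true, zeros, zeros)
# Wrong propensity score (constant 0.5), correct outcome model
est_or_only = aipw_ate(y, a, np.full(n, 0.5), 2.0 + x, x)
print(est_ps_only, est_or_only)
```

In this simulation both estimates should land close to the true value of 2, which is the sense in which the estimator is consistent when either nuisance model, but not necessarily both, is correct.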
Affiliation(s)
- David Cheng
- VA Boston Healthcare System, Boston, Massachusetts, U.S.A.
- Tianxi Cai
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, U.S.A.
|
21
|
Dhar SS, Das U. On distance based goodness of fit tests for missing data when missing occurs at random. Aust N Z J Stat 2021. [DOI: 10.1111/anzs.12313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Subhra Sankar Dhar
- Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, Uttar Pradesh 208016, India
- Ujjwal Das
- OM, QM & IS Area, Indian Institute of Management Udaipur, Rajasthan 313001, India
|
22
|
Liu J, Li W. A semiparametric method for evaluating causal effects in the presence of error-prone covariates. Biom J 2021; 63:1202-1222. [PMID: 34357652 DOI: 10.1002/bimj.202000069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2020] [Revised: 02/18/2021] [Accepted: 03/10/2021] [Indexed: 11/12/2022]
Abstract
The goal of most empirical studies in social sciences and medical research is to determine whether an alteration in an intervention or a treatment will cause a change in the desired outcome response. Unlike in randomized designs, establishing a causal relationship from observational studies is challenging because the ceteris paribus condition is violated. When the covariates of interest are measured with error, evaluating the causal effects becomes a thorny issue. We propose a semiparametric method to establish the causal relationship, which yields a consistent estimator of the average causal effect. The proposed method yields locally efficient estimators of the covariate effects. We study their theoretical properties and demonstrate their finite sample performance on simulated data. We further apply the proposed method to the Stroke Recovery in Underserved Populations (SRUP) study by the National Institute on Aging.
Affiliation(s)
- Jianxuan Liu
- Department of Mathematics, Syracuse University, Syracuse, NY, USA
- Center for Policy Research, Maxwell School of Citizenship and Public Affairs, Syracuse University, Syracuse, NY, USA
- Wei Li
- Department of Mathematics, Syracuse University, Syracuse, NY, USA
|
23
|
Wang Q, Su M, Wang R. A beyond multiple robust approach for missing response problem. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2020.107111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
24
|
Enhanced Doubly Robust Procedure for Causal Inference. Statistics in Biosciences 2021. [DOI: 10.1007/s12561-021-09300-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
25
|
Yiu A, Goudie RJB, Tom BDM. Inference under unequal probability sampling with the Bayesian exponentially tilted empirical likelihood. Biometrika 2020; 107:857-873. [PMID: 34992304 PMCID: PMC7612173 DOI: 10.1093/biomet/asaa028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open
Abstract
Fully Bayesian inference in the presence of unequal probability sampling requires stronger structural assumptions on the data-generating distribution than frequentist semiparametric methods, but offers the potential for improved small-sample inference and convenient evidence synthesis. We demonstrate that the Bayesian exponentially tilted empirical likelihood can be used to combine the practical benefits of Bayesian inference with the robustness and attractive large-sample properties of frequentist approaches. Estimators defined as the solutions to unbiased estimating equations can be used to define a semiparametric model through the set of corresponding moment constraints. We prove Bernstein-von Mises theorems which show that the posterior constructed from the resulting exponentially tilted empirical likelihood becomes approximately normal, centred at the chosen estimator with matching asymptotic variance; thus, the posterior has properties analogous to those of the estimator, such as double robustness, and the frequentist coverage of any credible set will be approximately equal to its credibility. The proposed method can be used to obtain modified versions of existing estimators with improved properties, such as guarantees that the estimator lies within the parameter space. Unlike existing Bayesian proposals, our method does not prescribe a particular choice of prior or require posterior variance correction, and simulations suggest that it provides superior performance in terms of frequentist criteria.
Affiliation(s)
- A Yiu
- Medical Research Council Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Robinson Way, Cambridge CB2 0SR, U.K.
- R J B Goudie
- Medical Research Council Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Robinson Way, Cambridge CB2 0SR, U.K.
- B D M Tom
- Medical Research Council Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Robinson Way, Cambridge CB2 0SR, U.K.
|
26
|
Zhang Y, Qin G, Zhu Z, Fu B. Robust estimation of models for longitudinal data with dropouts and outliers. J Appl Stat 2020; 49:902-925. [PMID: 35707815 PMCID: PMC9042061 DOI: 10.1080/02664763.2020.1845623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2020] [Accepted: 10/26/2020] [Indexed: 10/23/2022]
Abstract
Missing data and outliers usually arise in longitudinal studies. Ignoring the effects of missing data and outliers will make the classical generalized estimating equation approach invalid. The longitudinal cohort study of rheumatoid arthritis patients was designed to investigate whether the Health Assessment Questionnaire score was associated with baseline covariates and changed with time. There exist dropouts and outliers in the data. In order to analyze the data, we develop a robust estimating equation approach. To deal with the responses missing at random, we extend a doubly robust method. To achieve robustness against outliers, we utilize an outlier robust method, which corrects the bias induced by outliers through centralizing the covariate matrix in the estimating equation. The doubly robust method for dropouts is easy to combine with the outlier robust method. The proposed method has the property of robustness in the sense that the proposed estimator is not only doubly robust against model misspecification for dropouts when there is no outlier in the data, but also robust against outliers. Consistency and asymptotic normality of the proposed estimator are established under regularity conditions. A comprehensive simulation study and real data analysis demonstrate that the proposed estimator does have the property of robustness.
Affiliation(s)
- Yuexia Zhang
- Department of Computer and Mathematical Sciences, University of Toronto, Toronto, Canada
- Guoyou Qin
- Department of Biostatistics, School of Public Health, and The Key Laboratory of Public Health Safety of Ministry of Education, Fudan University, Shanghai, People's Republic of China
- Zhongyi Zhu
- Department of Statistics, Fudan University, Shanghai, People's Republic of China
- Bo Fu
- School of Data Science, Fudan University, Shanghai, People's Republic of China
|
27
|
Canto MI, Trindade AJ, Abrams J, Rosenblum M, Dumot J, Chak A, Iyer P, Diehl D, Khara HS, Corbett FS, McKinley M, Shin EJ, Waxman I, Infantolino A, Tofani C, Samarasena J, Chang K, Wang B, Goldblum J, Voltaggio L, Montgomery E, Lightdale CJ, Shaheen NJ. Multifocal Cryoballoon Ablation for Eradication of Barrett's Esophagus-Related Neoplasia: A Prospective Multicenter Clinical Trial. Am J Gastroenterol 2020; 115:1879-1890. [PMID: 33009064 DOI: 10.14309/ajg.0000000000000822] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/11/2022]
Abstract
INTRODUCTION:
Ablation of Barrett's esophagus (BE) is the preferred approach for the treatment of neoplasia without visible lesions. Limited data on cryoballoon ablation (CBA) suggest its potential clinical utility. We evaluated the safety and efficacy of CBA in a multicenter study of patients with neoplastic BE.
METHODS:
In a prospective clinical trial, 11 academic and community centers recruited consecutive patients with BE of 1–6 cm length and low-grade dysplasia, high-grade dysplasia (HGD), or intramucosal adenocarcinoma (ImCA) confirmed by central pathology. Patients with symptomatic pre-existing strictures or visible BE lesions had dilation or endoscopic mucosal resection (EMR), respectively, before enrollment. A nitrous oxide cryoballoon focal ablation system was used to treat all visible columnar mucosa in up to 5 sessions. Study end points included complete eradication of all dysplasia (CE-D) and intestinal metaplasia (CE-IM) at 1 year.
RESULTS:
One hundred twenty patients with BE with ImCA (20%), HGD (56%), or low-grade dysplasia (23%) were enrolled. In the intention-to-treat analysis, the CE-D and CE-IM rates were 76% and 72%, respectively. In the per-protocol analysis (94 patients), the CE-D and CE-IM rates were 97% and 91%, respectively. Postablation pain was mild and short lived. Fifteen subjects (12.5%) developed strictures requiring dilation. One patient (0.8%) with HGD progressed to ImCA, which was successfully treated with EMR. Another patient (0.8%) developed gastrointestinal bleeding associated with clopidogrel use. One patient (0.8%) had buried BE with HGD in 1 biopsy, not confirmed by subsequent EMR.
DISCUSSION:
In patients with neoplastic BE, CBA was safe and effective. Head-to-head comparisons between CBA and other ablation modalities are warranted (clinicaltrials.gov registration NCT02514525).
Affiliation(s)
- Marcia Irene Canto
- Department of Medicine (Gastroenterology), Johns Hopkins Medical Institutions, Baltimore, Maryland, USA
- Arvind J. Trindade
- Division of Gastroenterology at the Zucker School of Medicine of Hofstra/Northwell, Long Island Jewish Medical Center, Northwell Health System, New Hyde Park, New York, USA
- Julian Abrams
- Department of Medicine, Columbia University Medical Center, New York, New York, USA
- Michael Rosenblum
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, USA
- John Dumot
- Division of Gastroenterology at University Hospitals of Cleveland Medical Center, Cleveland, Ohio, USA
- Amitabh Chak
- Division of Gastroenterology at University Hospitals of Cleveland Medical Center, Cleveland, Ohio, USA
- Prasad Iyer
- Division of Gastroenterology, Mayo Clinic, Rochester, Minnesota, USA
- David Diehl
- Division of Gastroenterology, Geisinger Medical Center, Danby Pennsylvania, USA
- Harshit S. Khara
- Division of Gastroenterology, Geisinger Medical Center, Danby Pennsylvania, USA
- Matthew McKinley
- Division of Gastroenterology at the Zucker School of Medicine of Hofstra/Northwell, Long Island Jewish Medical Center, Northwell Health System, New Hyde Park, New York, USA
- Eun Ji Shin
- Department of Medicine (Gastroenterology), Johns Hopkins Medical Institutions, Baltimore, Maryland, USA
- Irving Waxman
- Division of Gastroenterology, University of Chicago Medical Center, Chicago, Illinois, USA
- Anthony Infantolino
- Division of Gastroenterology, Jefferson Medical Center, Philadelphia, Pennsylvania, USA
- Christina Tofani
- Division of Gastroenterology, University of Chicago Medical Center, Chicago, Illinois, USA
- Jason Samarasena
- Division of Gastroenterology, University of California Irvine Medical Center, Irvine, California, USA
- Kenneth Chang
- Division of Gastroenterology, University of California Irvine Medical Center, Irvine, California, USA
- Bingkai Wang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, USA
- John Goldblum
- Department of Pathology, The Cleveland Clinic Foundation, Cleveland, Ohio, USA
- Lysandra Voltaggio
- Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, Maryland, USA
- Elizabeth Montgomery
- Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, Maryland, USA
- Charles J. Lightdale
- Division of Gastroenterology at the Zucker School of Medicine of Hofstra/Northwell, Long Island Jewish Medical Center, Northwell Health System, New Hyde Park, New York, USA
- Nicholas J. Shaheen
- Division of Gastroenterology, University of North Carolina, Chapel Hill, North Carolina, USA
|
28
|
Affiliation(s)
- Muxuan Liang
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA
- Menggang Yu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
|
29
|
Parast L, Cai T, Tian L. Evaluating multiple surrogate markers with censored data. Biometrics 2020; 77:1315-1327. [PMID: 32920821 DOI: 10.1111/biom.13370] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/23/2019] [Revised: 06/11/2020] [Accepted: 09/01/2020] [Indexed: 11/27/2022]
Abstract
The utilization of surrogate markers offers the opportunity to reduce the length of required follow-up time and/or costs of a randomized trial examining the effectiveness of an intervention or treatment. There are many available methods for evaluating the utility of a single surrogate marker including both parametric and nonparametric approaches. However, as the dimension of the surrogate marker increases, a completely nonparametric procedure becomes infeasible due to the curse of dimensionality. In this paper, we define a quantity to assess the value of multiple surrogate markers in a time-to-event outcome setting and propose a robust estimation approach for censored data. We focus on surrogate markers that are measured at some landmark time, t0, which occurs earlier than the end of the study. Our approach is based on a dimension reduction procedure with an option to incorporate weights to guard against potential misspecification of the working model, resulting in three different proposed estimators, two of which can be shown to be doubly robust. We examine the finite sample performance of the estimators under various scenarios using a simulation study. We illustrate the estimation and inference procedures using data from the Diabetes Prevention Program (DPP) to examine multiple potential surrogate markers for diabetes.
Affiliation(s)
- Layla Parast
- Statistics Group, RAND Corporation, Santa Monica, California
- Tianxi Cai
- Department of Biostatistics, Harvard University, Boston, Massachusetts
- Lu Tian
- Department of Biomedical Data Science, Stanford University, Stanford, California
|
30
|
Fang Y, He W, Wang H, Wu M. Key considerations in the design of real-world studies. Contemp Clin Trials 2020; 96:106091. [PMID: 32717351 DOI: 10.1016/j.cct.2020.106091] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/31/2020] [Revised: 06/12/2020] [Accepted: 07/21/2020] [Indexed: 12/28/2022]
Abstract
Randomized controlled clinical trials (RCTs) are the gold standard for evaluating the safety and efficacy of pharmaceutical drugs, but in many cases their costs, duration, limited generalizability, and ethical or technical feasibility have caused some to look for real-world studies as alternatives. However, real-world studies may be less convincing due to the lack of randomization and blinding. In this article, we discuss some key considerations in the design of real-world studies, which include experimental studies (e.g., hybrid or pragmatic clinical trials and non-randomized single-arm clinical trials with external controls) and non-experimental studies (e.g., cohort studies, cross-sectional studies, and case-control studies). Causal inference plays a critical role in the derivation of robust real-world evidence (RWE) from the analysis of real-world data (RWD). Therefore, we apply the hypothetical strategy, along with the concept of potential outcome, to lay out these key considerations, and we hope these considerations are helpful for the design, conduct, and analysis of real-world studies.
Affiliation(s)
- Yixin Fang
- AbbVie, 1 North Waukegan Rd, North Chicago, IL 60064, United States of America
- Weili He
- AbbVie, 1 North Waukegan Rd, North Chicago, IL 60064, United States of America
- Hongwei Wang
- AbbVie, 1 North Waukegan Rd, North Chicago, IL 60064, United States of America
- Meijing Wu
- AbbVie, 1 North Waukegan Rd, North Chicago, IL 60064, United States of America
|
31
|
Abstract
Summary
For estimating the population mean of a response variable subject to ignorable missingness, a new class of methods, called multiply robust procedures, has been proposed. The advantage of multiply robust procedures over the traditional doubly robust methods is that they permit the use of multiple candidate models for both the propensity score and the outcome regression, and they are consistent if any one of the multiple models is correctly specified, a property termed multiple robustness. This paper shows that, somewhat surprisingly, multiply robust estimators are special cases of doubly robust estimators, where the final propensity score and outcome regression models are certain combinations of the candidate models. To further improve model specifications in the doubly robust estimators, we adapt a model mixing procedure as an alternative method for combining multiple candidate models. We show that multiple robustness and asymptotic normality can also be achieved by our mixing-based doubly robust estimator. Moreover, our estimator and its theoretical properties are not confined to parametric models. Numerical examples demonstrate that the proposed estimator is comparable to and can even outperform existing multiply robust estimators.
Affiliation(s)
- Wei Li
- School of Statistics, Renmin University of China, 59 Zhongguancun Street, Beijing 100872, China
- Yuwen Gu
- Department of Statistics, University of Connecticut, 215 Glenbrook Road, Storrs, Connecticut 06269, U.S.A.
- Lan Liu
- School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church St SE, Minneapolis, Minnesota 55455, U.S.A.
|
32
|
Robbins MW, Ann Griffin B, Shih RA, Ellen Slaughter M. Robust estimation of the causal effect of time-varying neighborhood factors on health outcomes. Stat Med 2020; 39:544-561. [PMID: 31820833 PMCID: PMC9706720 DOI: 10.1002/sim.8423] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/12/2019] [Revised: 09/25/2019] [Accepted: 10/05/2019] [Indexed: 11/07/2022]
Abstract
The fundamental difficulty of establishing causal relationships between an exposure and an outcome in observational data involves disentangling causality from confounding factors. This problem underlies much of neighborhoods research, which abounds with studies that consider associations between neighborhood characteristics and health outcomes in longitudinal data. Such analyses are confounded by selection issues; individuals with above average health outcomes (or associated characteristics) may self-select into advantaged neighborhoods. Techniques commonly used to assess causal inferences in observational longitudinal data, such as inverse probability of treatment weighting (IPTW), may be inappropriate in neighborhoods data due to unique characteristics of such data. We advance the IPTW toolkit by introducing a procedure based on a multivariate kernel density function which is more appropriate for neighborhoods data. The proposed weighting method is applied in conjunction with a marginal structural model. Our empirical analyses use longitudinal data from the Health and Retirement Study; our exposure of interest is an index of neighborhood socioeconomic status (NSES), and we examine its influence on cognitive function. Our findings illustrate the importance of the choice of method for IPTW: the comparison weighting methods provide poor balance across the set of covariates (which is not the case for our preferred procedure) and yield misleading results when applied in the outcomes models. The utility of the multivariate kernel is also validated via simulation. In addition, our findings emphasize the importance of IPTW: controlling for covariates within a regression without IPTW indicates that NSES affects cognition, whereas IPTW-weighted models fail to show a statistically significant effect.
|
33
|
Díaz I, Colantuoni E, Hanley DF, Rosenblum M. Improved precision in the analysis of randomized trials with survival outcomes, without assuming proportional hazards. Lifetime Data Analysis 2019; 25:439-468. [PMID: 29492746 DOI: 10.1007/s10985-018-9428-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 02/01/2017] [Accepted: 02/18/2018] [Indexed: 06/08/2023]
Abstract
We present a new estimator of the restricted mean survival time in randomized trials where there is right censoring that may depend on treatment and baseline variables. The proposed estimator leverages prognostic baseline variables to obtain equal or better asymptotic precision compared to traditional estimators. Under regularity conditions and random censoring within strata of treatment and baseline variables, the proposed estimator has the following features: (i) it is interpretable under violations of the proportional hazards assumption; (ii) it is consistent and at least as precise as the Kaplan-Meier and inverse probability weighted estimators, under identifiability conditions; (iii) it remains consistent under violations of independent censoring (unlike the Kaplan-Meier estimator) when either the censoring or survival distributions, conditional on covariates, are estimated consistently; and (iv) it achieves the nonparametric efficiency bound when both of these distributions are consistently estimated. We illustrate the performance of our method using simulations based on resampling data from a completed, phase 3 randomized clinical trial of a new surgical treatment for stroke; the proposed estimator achieves a 12% gain in relative efficiency compared to the Kaplan-Meier estimator. The proposed estimator has potential advantages over existing approaches for randomized trials with time-to-event outcomes, since existing methods either rely on model assumptions that are untenable in many applications, or lack some of the efficiency and consistency properties (i)-(iv). We focus on estimation of the restricted mean survival time, but our methods may be adapted to estimate any treatment effect measure defined as a smooth contrast between the survival curves for each study arm. We provide R code to implement the estimator.
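For readers unfamiliar with the estimand, the restricted mean survival time (RMST) up to a horizon tau is the area under the survival curve on [0, tau]. The sketch below computes it from an unadjusted Kaplan-Meier curve, which is the comparator named in the abstract and not the covariate-adjusted estimator the authors propose; the function name `km_rmst` and the toy data are assumptions for illustration.

```python
import numpy as np

def km_rmst(time, event, tau):
    """Area under the Kaplan-Meier survival curve on [0, tau].

    time: follow-up times, event: 1 if the event was observed, 0 if censored.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0      # current value of the Kaplan-Meier step function
    prev_t = 0.0    # left end of the current step
    rmst = 0.0
    for t in np.unique(time[event]):   # sorted distinct event times
        if t > tau:
            break
        rmst += surv * (t - prev_t)    # area of the step ending at t
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        surv *= 1.0 - deaths / at_risk
        prev_t = t
    rmst += surv * (tau - prev_t)      # final partial step up to tau
    return rmst

# Toy data: four subjects, the second is censored at time 2.
rmst_all_events = km_rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=4)
rmst_censored = km_rmst([1, 2, 3, 4], [1, 0, 1, 1], tau=4)
print(rmst_all_events, rmst_censored)
```

A treatment effect on this scale would be the difference in RMST between arms, computed by applying the function to each arm separately.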
Affiliation(s)
- Iván Díaz
- Division of Biostatistics and Epidemiology, Weill Cornell Medicine, New York, NY, USA
- Elizabeth Colantuoni
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Daniel F Hanley
- Division of Brain Injury Outcomes, Johns Hopkins Medical Institutions, Baltimore, MD, USA
- Michael Rosenblum
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
|
34
|
Ma S, Zhu L, Zhang Z, Tsai CL, Carroll RJ. A robust and efficient approach to causal inference based on sparse sufficient dimension reduction. Ann Stat 2019; 47:1505-1535. [PMID: 31231143 PMCID: PMC6588012 DOI: 10.1214/18-aos1722] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022]
Abstract
A fundamental assumption used in causal inference with observational data is that treatment assignment is ignorable given measured confounding variables. This assumption of no missing confounders is plausible if a large number of baseline covariates are included in the analysis, as we often have no prior knowledge of which variables can be important confounders. Thus, estimation of treatment effects with a large number of covariates has received considerable attention in recent years. Most existing methods require specifying certain parametric models involving the outcome, treatment and confounding variables, and employ a variable selection procedure to identify confounders. However, selection of a proper set of confounders depends on correct specification of the working models. The bias due to model misspecification and incorrect selection of confounding variables can yield misleading results. We propose a robust and efficient approach for inference about the average treatment effect via a flexible modeling strategy incorporating penalized variable selection. Specifically, we consider an estimator constructed based on an efficient influence function that involves a propensity score and an outcome regression. We then propose a new sparse sufficient dimension reduction method to estimate these two functions without making restrictive parametric modeling assumptions. The proposed estimator of the average treatment effect is asymptotically normal and semiparametrically efficient without the need for variable selection consistency. The proposed methods are illustrated via simulation studies and a biomedical application.
Affiliation(s)
- Shujie Ma
- Department of Statistics, University of California, Riverside, Riverside, California 92521, USA
- Liping Zhu
- Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing 100872, China
- Zhiwei Zhang
- Department of Statistics, University of California, Riverside, Riverside, California 92521, USA
- Chih-Ling Tsai
- Graduate School of Management, University of California, Davis, Davis, California 95616, USA
- Raymond J Carroll
- Department of Statistics, Texas A&M University, College Station, Texas 77843, USA
- School of Mathematical Sciences, University of Technology, Sydney, Broadway NSW 2007, Australia
|
35
|
Dahabreh IJ, Robertson SE, Tchetgen EJT, Stuart EA, Hernán MA. Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals. Biometrics 2019; 75:685-694. [PMID: 30488513 PMCID: PMC10938232 DOI: 10.1111/biom.13009] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 09/20/2017] [Accepted: 11/02/2018] [Indexed: 12/20/2022]
Abstract
We consider methods for causal inference in randomized trials nested within cohorts of trial-eligible individuals, including those who are not randomized. We show how baseline covariate data from the entire cohort, and treatment and outcome data only from randomized individuals, can be used to identify potential (counterfactual) outcome means and average treatment effects in the target population of all eligible individuals. We review identifiability conditions, propose estimators, and assess the estimators' finite-sample performance in simulation studies. As an illustration, we apply the estimators in a trial nested within a cohort of trial-eligible individuals to compare coronary artery bypass grafting surgery plus medical therapy vs. medical therapy alone for chronic coronary artery disease.
Affiliation(s)
- Issa J. Dahabreh
- Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, RI, U.S.A.
- Departments of Health Services, Policy & Practice and Epidemiology, Brown University, Providence, RI, U.S.A.
- Department of Epidemiology, Harvard-T.H. Chan School of Public Health, Boston, MA, U.S.A.
- Sarah E. Robertson
- Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, RI, U.S.A.
- Elizabeth A. Stuart
- Departments of Mental Health, Biostatistics, and Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, U.S.A.
- Miguel A. Hernán
- Department of Epidemiology, Harvard-T.H. Chan School of Public Health, Boston, MA, U.S.A.
- Department of Biostatistics, Harvard-T.H. Chan School of Public Health, Boston, MA, U.S.A.
- Harvard-MIT Division of Health Sciences and Technology, Boston, MA, U.S.A.
|
36
|
Affiliation(s)
- Richard Berk
- Department of Criminology, University of Pennsylvania, Philadelphia, PA
- Department of Statistics, University of Pennsylvania, Philadelphia, PA
- Andreas Buja
- Department of Statistics, University of Pennsylvania, Philadelphia, PA
- Lawrence Brown
- Department of Statistics, University of Pennsylvania, Philadelphia, PA
- Edward George
- Department of Statistics, University of Pennsylvania, Philadelphia, PA
- Weijie Su
- Department of Statistics, University of Pennsylvania, Philadelphia, PA
- Linda Zhao
- Department of Statistics, University of Pennsylvania, Philadelphia, PA
37
Shortreed SM, Cook AJ, Coley RY, Bobb JF, Nelson JC. Challenges and Opportunities for Using Big Health Care Data to Advance Medical Science and Public Health. Am J Epidemiol 2019; 188:851-861. [PMID: 30877288 DOI: 10.1093/aje/kwy292] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 10/04/2018] [Accepted: 12/20/2018] [Indexed: 12/14/2022]
Abstract
Methodological advancements in epidemiology, biostatistics, and data science have strengthened the research world's ability to use data captured from electronic health records (EHRs) to address pressing medical questions, but gaps remain. We describe methods investments that are needed to curate EHR data toward research quality and to integrate complementary data sources when EHR data alone are insufficient for research goals. We highlight new methods and directions for improving the integrity of medical evidence generated from pragmatic trials, observational studies, and predictive modeling. We also discuss needed methods contributions to further ease data sharing across multisite EHR data networks. Throughout, we identify opportunities for training and for bolstering collaboration among subject matter experts, methodologists, practicing clinicians, and health system leaders to help ensure that methods problems are identified and resulting advances are translated into mainstream research practice more quickly.
Affiliation(s)
- Susan M Shortreed
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington
- Andrea J Cook
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington
- R Yates Coley
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington
- Jennifer F Bobb
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington
- Jennifer C Nelson
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington
38
Zhang Z, Hu Z, Liu C. Estimating the Population Average Treatment Effect in Observational Studies with Choice-Based Sampling. Int J Biostat 2019; 15:/j/ijb.ahead-of-print/ijb-2018-0093/ijb-2018-0093.xml. [PMID: 30990786 DOI: 10.1515/ijb-2018-0093] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/11/2018] [Accepted: 04/02/2019] [Indexed: 11/15/2022]
Abstract
We consider causal inference in observational studies with choice-based sampling, in which subject enrollment is stratified on treatment choice. Choice-based sampling has been considered mainly in the econometrics literature, but it can be useful for biomedical studies as well, especially when one of the treatments being compared is uncommon. We propose new methods for estimating the population average treatment effect under choice-based sampling, including doubly robust methods motivated by semiparametric theory. A doubly robust, locally efficient estimator may be obtained by replacing nuisance functions in the efficient influence function with estimates based on parametric models. The use of machine learning methods to estimate nuisance functions leads to estimators that are consistent and asymptotically efficient under broader conditions. The methods are compared in simulation experiments and illustrated in the context of a large observational study in obstetrics. We also make suggestions on how to choose the target proportion of treated subjects and the sample size in designing a choice-based observational study.
Affiliation(s)
- Zhiwei Zhang
- Department of Statistics, University of California, Riverside, CA, USA
- Zonghui Hu
- Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, USA
- Chunling Liu
- Department of Applied Mathematics, Hong Kong Polytechnic University, Hong Kong, China
39
40
Tran L, Yiannoutsos C, Wools-Kaloustian K, Siika A, van der Laan M, Petersen M. Double Robust Efficient Estimators of Longitudinal Treatment Effects: Comparative Performance in Simulations and a Case Study. Int J Biostat 2019; 15:ijb-2017-0054. [PMID: 30811344 DOI: 10.1515/ijb-2017-0054] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/19/2017] [Accepted: 11/16/2018] [Indexed: 11/15/2022]
Abstract
A number of sophisticated estimators of longitudinal effects have been proposed for estimating the intervention-specific mean outcome. However, there is a relative paucity of research comparing these methods directly to one another. In this study, we compare various approaches to estimating a causal effect in a longitudinal treatment setting using both simulated data and data measured from a human immunodeficiency virus cohort. Six distinct estimators are considered: (i) an iterated conditional expectation representation, (ii) an inverse propensity weighted method, (iii) an augmented inverse propensity weighted method, (iv) a double robust iterated conditional expectation estimator, (v) a modified version of the double robust iterated conditional expectation estimator, and (vi) a targeted minimum loss-based estimator. The details of each estimator and its implementation are presented along with nuisance parameter estimation details, which include potentially pooling the observed data across all subjects regardless of treatment history and using data adaptive machine learning algorithms. Simulations are constructed over six time points, with each time point steadily increasing in positivity violations. Estimation is carried out for both the simulations and applied example using each of the six estimators under both stratified and pooled approaches of nuisance parameter estimation. Simulation results show that double robust estimators remained without meaningful bias as long as at least one of the two nuisance parameters were estimated with a correctly specified model. Under full misspecification, the bias of the double robust estimators remained better than that of the inverse propensity estimator under misspecification, but worse than the iterated conditional expectation estimator. Weighted estimators tended to show better performance than the covariate estimators. As positivity violations increased, the mean squared error and bias of all estimators considered became worse, with covariate-based double robust estimators especially susceptible. Applied analyses showed similar estimates at most time points, with the important exception of the inverse propensity estimator which deviated markedly as positivity violations increased. Given its efficiency, ability to respect the parameter space, and observed performance, we recommend the pooled and weighted targeted minimum loss-based estimator.
Affiliation(s)
- Linh Tran
- Department of Biostatistics, University of California Berkeley, Berkeley, CA, USA
- Constantin Yiannoutsos
- Department of Biostatistics, Indiana University Richard M Fairbanks School of Public Health, Indianapolis, IN, USA
- Kara Wools-Kaloustian
- Infectious Diseases, Howard Hughes Medical Institute - Indiana University School of Medicine, Indianapolis, IN, USA
- Maya Petersen
- University of California at Berkeley, Berkeley, CA, USA
41
Liu L, Hudgens MG, Saul B, Clemens JD, Ali M, Emch ME. Doubly Robust Estimation in Observational Studies with Partial Interference. Stat (Int Stat Inst) 2019; 8. [PMID: 31440374 DOI: 10.1002/sta4.214] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
Interference occurs when the treatment (or exposure) of one individual affects the outcomes of others. In some settings it may be reasonable to assume individuals can be partitioned into clusters such that there is no interference between individuals in different clusters, i.e., there is partial interference. In observational studies with partial interference, inverse probability weighted (IPW) estimators have been proposed of different possible treatment effects. However, the validity of IPW estimators depends on the propensity score being known or correctly modeled. Alternatively, one can estimate the treatment effect using an outcome regression model. In this paper, we propose doubly robust (DR) estimators which utilize both models and are consistent and asymptotically normal if either model, but not necessarily both, is correctly specified. Empirical results are presented to demonstrate the DR property of the proposed estimators, as well as the efficiency gain of DR over IPW estimators when both models are correctly specified. The different estimators are illustrated using data from a study examining the effects of cholera vaccination in Bangladesh.
Affiliation(s)
- Lan Liu
- School of Statistics, University of Minnesota at Twin Cities, Minnesota, U.S.A
- Michael G Hudgens
- Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, U.S.A
- Bradley Saul
- Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, U.S.A
- John D Clemens
- Department of Epidemiology, University of California, Los Angeles, California, U.S.A
- Mohammad Ali
- Department of International Health, Johns Hopkins University, Maryland, U.S.A
- Michael E Emch
- Department of Geography, University of North Carolina at Chapel Hill, North Carolina, U.S.A
42
Statti F, Sued M, Yohai VJ. High breakdown point robust estimators with missing data. COMMUN STAT-THEOR M 2018. [DOI: 10.1080/03610926.2017.1388396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Florencia Statti
- Instituto de Cálculo, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, CONICET, Argentina
- Mariela Sued
- Instituto de Cálculo, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, CONICET, Argentina
- Victor J. Yohai
- Departamento de Matemática, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires and CONICET, Argentina
43
Tao Y, Fu H. Doubly robust estimation of the weighted average treatment effect for a target population. Stat Med 2018; 38:315-325. [DOI: 10.1002/sim.7980] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/26/2017] [Revised: 07/26/2018] [Accepted: 09/04/2018] [Indexed: 02/05/2023]
Affiliation(s)
- Yebin Tao
- Eli Lilly and Company; Indianapolis IN 46285
- Haoda Fu
- Eli Lilly and Company; Indianapolis IN 46285
44
Li T, Xie F, Feng X, Ibrahim JG, Zhu H. Functional Linear Regression Models for Nonignorable Missing Scalar Responses. Stat Sin 2018; 28:1867-1886. [PMID: 30344426 PMCID: PMC6191855 DOI: 10.5705/ss.202016.0350] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/06/2022]
Abstract
As an important part of modern health care, medical imaging data, which can be regarded as densely sampled functional data, have been widely used for diagnosis, screening, treatment, and prognosis, such as finding breast cancer through mammograms. The aim of this paper is to propose a functional linear regression model for using functional (or imaging) predictors to predict clinical outcomes (e.g., disease status), while addressing missing clinical outcomes. We introduce an exponential tilting semiparametric model to account for the nonignorable missing data mechanism. We develop a set of estimating equations and its associated computational methods for both parameter estimation and the selection of the tuning parameters. We also propose a bootstrap resampling procedure for carrying out statistical inference. Under some regularity conditions, we systematically establish the asymptotic properties (e.g., consistency and convergence rate) of the estimates calculated from the proposed estimating equations. Simulation studies and a real data analysis are used to illustrate the finite sample performance of the proposed methods.
Affiliation(s)
- Tengfei Li
- University of Texas MD Anderson Cancer Center
- Hongtu Zhu
- University of Texas MD Anderson Cancer Center
- University of North Carolina at Chapel Hill
45
Liu J, Ma Y, Wang L. An alternative robust estimator of average treatment effect in causal inference. Biometrics 2018; 74:910-923. [PMID: 29441521 PMCID: PMC6089681 DOI: 10.1111/biom.12859] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/01/2016] [Revised: 11/01/2018] [Accepted: 12/01/2018] [Indexed: 10/18/2022]
Abstract
The problem of estimating the average treatment effects is important when evaluating the effectiveness of medical treatments or social intervention policies. Most of the existing methods for estimating the average treatment effect rely on some parametric assumptions about the propensity score model or the outcome regression model one way or the other. In reality, both models are prone to misspecification, which can have undue influence on the estimated average treatment effect. We propose an alternative robust approach to estimating the average treatment effect based on observational data in the challenging situation when neither a plausible parametric outcome model nor a reliable parametric propensity score model is available. Our estimator can be considered as a robust extension of the popular class of propensity score weighted estimators. This approach has the advantage of being robust, flexible, data adaptive, and it can handle many covariates simultaneously. Adopting a dimension reduction approach, we estimate the propensity score weights semiparametrically by using a non-parametric link function to relate the treatment assignment indicator to a low-dimensional structure of the covariates which are formed typically by several linear combinations of the covariates. We develop a class of consistent estimators for the average treatment effect and study their theoretical properties. We demonstrate the robust performance of the estimators on simulated data and a real data example of investigating the effect of maternal smoking on babies' birth weight.
Affiliation(s)
- Jianxuan Liu
- Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403
- Yanyuan Ma
- Department of Statistics, Penn State University, University Park, PA 16802
- Lan Wang
- School of Statistics, University of Minnesota, Minneapolis, MN 55455
46
Hu Z, Qin J. Generalizability of causal inference in observational studies under retrospective convenience sampling. Stat Med 2018; 37:2874-2883. [PMID: 29781220 DOI: 10.1002/sim.7808] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/25/2017] [Revised: 02/15/2018] [Accepted: 04/12/2018] [Indexed: 11/10/2022]
Abstract
Many observational studies adopt what we call retrospective convenience sampling (RCS). With the sample size in each arm prespecified, RCS randomly selects subjects from the treatment-inclined subpopulation into the treatment arm and those from the control-inclined into the control arm. Samples in each arm are representative of the respective subpopulation, but the proportion of the 2 subpopulations is usually not preserved in the sample data. We show in this work that, under RCS, existing causal effect estimators actually estimate the treatment effect over the sample population instead of the underlying study population. We investigate how to correct existing methods for consistent estimation of the treatment effect over the underlying population. Although RCS is adopted in medical studies for ethical and cost-effective purposes, it also has a big advantage for statistical inference: When the tendency to receive treatment is low in a study population, treatment effect estimators under RCS, with proper correction, are more efficient than their parallels under random sampling. These properties are investigated both theoretically and through numerical demonstration.
Affiliation(s)
- Zonghui Hu
- Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD 20852, USA
- Jing Qin
- Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD 20852, USA
47
Abstract
Most methods for handling incomplete data can be broadly classified as inverse probability weighting (IPW) strategies or imputation strategies. The former model the occurrence of incomplete data; the latter, the distribution of the missing variables given observed variables in each missingness pattern. Imputation strategies are typically more efficient, but they can involve extrapolation, which is difficult to diagnose and can lead to large bias. Double robust (DR) methods combine the two approaches. They are typically more efficient than IPW and more robust to model misspecification than imputation. We give a formal introduction to DR estimation of the mean of a partially observed variable, before moving to more general incomplete-data scenarios. We review strategies to improve the performance of DR estimators under model misspecification, reveal connections between DR estimators for incomplete data and 'design-consistent' estimators used in sample surveys, and explain the value of double robustness when using flexible data-adaptive methods for IPW or imputation.
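The simplest case named in this abstract, the mean of a partially observed variable, can be illustrated with a minimal sketch of an augmented IPW (double robust) estimator. Everything below is an illustrative assumption rather than the authors' implementation: the function names `fit_logistic` and `dr_mean`, the logistic working model for the probability of being observed, the linear working model for the outcome, and the simulated data are all hypothetical choices.

```python
import numpy as np

def fit_logistic(X, r, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (r - p)                        # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def dr_mean(x, y, r):
    """Augmented IPW estimate of E[Y] when y is observed only where r == 1."""
    X = np.column_stack([np.ones_like(x), x])
    # Working model 1 (assumed): logistic model for P(observed | x)
    pi = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, r)))
    # Working model 2 (assumed): linear outcome regression fitted on complete cases
    obs = r == 1
    gamma, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    m = X @ gamma
    # Imputed mean plus inverse-probability-weighted residual correction
    y_filled = np.where(obs, y, 0.0)
    return np.mean(m + r * (y_filled - m) / pi)

# Simulated illustration (hypothetical data): missingness depends on x
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)       # true mean of y is 1.0
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x)))
r = rng.binomial(1, p_obs)
y_obs = np.where(r == 1, y, np.nan)          # y is missing where r == 0

estimate = dr_mean(x, y_obs, r)
naive = np.nanmean(y_obs)                    # complete-case mean, biased here
print(f"double robust: {estimate:.3f}, complete-case: {naive:.3f}")
```

In this sketch both working models happen to be correctly specified. The double robustness described in the abstract means the estimator is expected to stay consistent if only one of the two is correct, which could be checked by deliberately misspecifying either model.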
Affiliation(s)
- Shaun R Seaman
- Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, UK
- Stijn Vansteelandt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK
48
Sensitivity of adaptive enrichment trial designs to accrual rates, time to outcome measurement, and prognostic variables. Contemp Clin Trials Commun 2017; 8:39-48. [PMID: 29696195 PMCID: PMC5898543 DOI: 10.1016/j.conctc.2017.08.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2016] [Revised: 04/19/2017] [Accepted: 08/11/2017] [Indexed: 11/21/2022]
49
Abstract
Doubly robust estimators have now been proposed for a variety of target parameters in the causal inference and missing data literature. These consistently estimate the parameter of interest under a semiparametric model when one of two nuisance working models is correctly specified, regardless of which. The recently proposed bias-reduced doubly robust estimation procedure aims to partially retain this robustness in more realistic settings where both working models are misspecified. These so-called bias-reduced doubly robust estimators make use of special (finite-dimensional) nuisance parameter estimators that are designed to locally minimize the squared asymptotic bias of the doubly robust estimator in certain directions of these finite-dimensional nuisance parameters under misspecification of both parametric working models. In this article, we extend this idea to incorporate the use of data-adaptive estimators (infinite-dimensional nuisance parameters), by exploiting the bias reduction estimation principle in the direction of only one nuisance parameter. We additionally provide an asymptotic linearity theorem which gives the influence function of the proposed doubly robust estimator under correct specification of a parametric nuisance working model for the missingness mechanism/propensity score but a possibly misspecified (finite- or infinite-dimensional) outcome working model. Simulation studies confirm the desirable finite-sample performance of the proposed estimators relative to a variety of other doubly robust estimators.
50
Molina J, Rotnitzky A, Sued M, Robins JM. Multiple robustness in factorized likelihood models. Biometrika 2017; 104:561-581. [PMID: 29430033 PMCID: PMC5793686 DOI: 10.1093/biomet/asx027] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/09/2015] [Indexed: 11/30/2022]
Abstract
We consider inference under a nonparametric or semiparametric model with likelihood that factorizes as the product of two or more variation-independent factors. We are interested in a finite-dimensional parameter that depends on only one of the likelihood factors and whose estimation requires the auxiliary estimation of one or several nuisance functions. We investigate general structures conducive to the construction of so-called multiply robust estimating functions, whose computation requires postulating several dimension-reducing models but which have mean zero at the true parameter value provided one of these models is correct.
Affiliation(s)
- J Molina
- Instituto de Cálculo, Universidad de Buenos Aires, Intendente Guiraldes 2160, Pabellon II, Buenos Aires 1428
- A Rotnitzky
- Department of Economics, Di Tella University, Figueroa Alcorta 7350, Buenos Aires 1428
- M Sued
- Instituto de Cálculo, Universidad de Buenos Aires, Intendente Guiraldes 2160, Pabellon II, Buenos Aires 1428
- J M Robins
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115