1
Bonander C, Nilsson A, Li H, Sharma S, Nwaru C, Gisslén M, Lindh M, Hammar N, Björk J, Nyberg F. A Capture-Recapture-based Ascertainment Probability Weighting Method for Effect Estimation With Under-ascertained Outcomes. Epidemiology 2024; 35:340-348. [PMID: 38442421 PMCID: PMC11022997 DOI: 10.1097/ede.0000000000001717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 01/18/2024] [Indexed: 03/07/2024]
Abstract
Outcome under-ascertainment, characterized by the incomplete identification or reporting of cases, poses a substantial challenge in epidemiologic research. While capture-recapture methods can estimate unknown case numbers, their role in estimating exposure effects in observational studies is not well established. This paper presents an ascertainment probability weighting framework that integrates capture-recapture and propensity score weighting. We propose a nonparametric estimator of effects on binary outcomes that combines exposure propensity scores with data from two conditionally independent outcome measurements to simultaneously adjust for confounding and under-ascertainment. Demonstrating its practical application, we apply the method to estimate the relationship between health care work and coronavirus disease 2019 testing in a Swedish region. We find that ascertainment probability weighting greatly influences the estimated association compared to conventional inverse probability weighting, underscoring the importance of accounting for under-ascertainment in studies with limited outcome data coverage. We conclude with practical guidelines for the method's implementation, discussing its strengths, limitations, and suitable scenarios for application.
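For readers unfamiliar with the capture-recapture idea this abstract builds on, the following is a minimal sketch of the classical two-source estimate of an unknown case count. It is not the ascertainment probability weighting estimator proposed in the paper. The function names and example counts are hypothetical, and the calculation assumes the two sources ascertain cases independently.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-source estimate of the total number of cases: n1 * n2 / m.

    n1, n2: cases found by source 1 and source 2; m: cases found by both.
    """
    if m <= 0:
        raise ValueError("need at least one case found by both sources")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's small-sample variant, defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts: source A finds 120 cases, source B finds 90, 60 appear in both.
n_a, n_b, n_both = 120, 90, 60
total_hat = lincoln_petersen(n_a, n_b, n_both)

# Share of the estimated total that at least one source ascertained.
observed = n_a + n_b - n_both
ascertained_share = observed / total_hat
```

With these made-up counts the two sources together observe 150 distinct cases, so the ratio of observed to estimated cases gives a rough sense of how much under-ascertainment remains after combining sources.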
Affiliation(s)
- Carl Bonander
  - School of Public Health and Community Medicine, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden
  - Centre for Societal Risk Management, Karlstad University, Karlstad, Sweden
- Anton Nilsson
  - Epidemiology, Population Studies, and Infrastructures (EPI@LUND), Lund University, Lund, Sweden
- Huiqi Li
  - School of Public Health and Community Medicine, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden
- Shambhavi Sharma
  - School of Public Health and Community Medicine, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden
- Chioma Nwaru
  - School of Public Health and Community Medicine, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden
- Magnus Gisslén
  - Department of Infectious Diseases, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
  - Region Västra Götaland, Department of Infectious Diseases, Sahlgrenska University Hospital, Gothenburg, Sweden
- Magnus Lindh
  - Department of Infectious Diseases, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
  - Department of Clinical Microbiology, Sahlgrenska University Hospital, Gothenburg, Sweden
- Niklas Hammar
  - Unit of Epidemiology, Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
- Jonas Björk
  - Epidemiology, Population Studies, and Infrastructures (EPI@LUND), Lund University, Lund, Sweden
  - Clinical Studies Sweden, Forum South, Skåne University Hospital, Lund, Sweden
- Fredrik Nyberg
  - School of Public Health and Community Medicine, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden
2
Bong S, Lee K, Dominici F. Differential recall bias in estimating treatment effects in observational studies. Biometrics 2024; 80:ujae058. [PMID: 38919141 PMCID: PMC11199734 DOI: 10.1093/biomtc/ujae058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 04/02/2024] [Accepted: 06/05/2024] [Indexed: 06/27/2024]
Abstract
Observational studies are frequently used to estimate the effect of an exposure or treatment on an outcome. To obtain an unbiased estimate of the treatment effect, it is crucial to measure the exposure accurately. A common type of exposure misclassification is recall bias, which occurs in retrospective cohort studies when study subjects may inaccurately recall their past exposure. Particularly challenging is differential recall bias in the context of self-reported binary exposures, where the bias may be directional rather than random and its extent varies according to the outcomes experienced. This paper makes several contributions: (1) it establishes bounds for the average treatment effect even when a validation study is not available; (2) it proposes multiple estimation methods across various strategies predicated on different assumptions; and (3) it suggests a sensitivity analysis technique to assess the robustness of the causal conclusion, incorporating insights from prior research. The effectiveness of these methods is demonstrated through simulation studies that explore various model misspecification scenarios. These approaches are then applied to investigate the effect of childhood physical abuse on mental health in adulthood.
Affiliation(s)
- Suhwan Bong
  - Department of Statistics, Seoul National University, Seoul 08826, Republic of Korea
- Kwonsang Lee
  - Department of Statistics, Seoul National University, Seoul 08826, Republic of Korea
- Francesca Dominici
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
3
Edwards JK, Cole SR, Shook-Sa BE, Zivich PN, Zhang N, Lesko CR. When Does Differential Outcome Misclassification Matter for Estimating Prevalence? Epidemiology 2023; 34:192-200. [PMID: 36722801 PMCID: PMC10237297 DOI: 10.1097/ede.0000000000001572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
BACKGROUND: When accounting for misclassification, investigators make assumptions about whether misclassification is "differential" or "nondifferential." Most guidance on differential misclassification considers settings where outcome misclassification varies across levels of exposure, or vice versa. Here, we examine when covariate-differential misclassification must be considered when estimating overall outcome prevalence.
METHODS: We generated datasets with outcome misclassification under five data generating mechanisms. In each, we estimated prevalence using estimators that (a) ignored misclassification, (b) assumed misclassification was nondifferential, and (c) allowed misclassification to vary across levels of a covariate. We compared bias and precision in estimated prevalence in the study sample and an external target population using different sources of validation data to account for misclassification. We illustrated use of each approach to estimate HIV prevalence using self-reported HIV status among people in East Africa cross-border areas.
RESULTS: The estimator that allowed misclassification to vary across levels of the covariate produced results with little bias for both populations in all scenarios but had higher variability when the validation study contained sparse strata. Estimators that assumed nondifferential misclassification produced results with little bias when the covariate distribution in the validation data matched the covariate distribution in the target population; otherwise estimates assuming nondifferential misclassification were biased.
CONCLUSIONS: If validation data are a simple random sample from the target population, assuming nondifferential outcome misclassification will yield prevalence estimates with little bias regardless of whether misclassification varies across covariates. Otherwise, obtaining valid prevalence estimates requires incorporating covariates into the estimators used to account for misclassification.
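As a point of reference for the contrast between estimators (b) and (c) in this abstract, a standard way to correct an observed prevalence for nondifferential misclassification is the Rogan-Gladen formula. The sketch below is illustrative only and is not taken from the paper. The function names and all numeric values are hypothetical, and the stratified version assumes that validation data supply a sensitivity and specificity for each covariate stratum and that the stratum weights describe the target population.

```python
def rogan_gladen(p_obs: float, se: float, sp: float) -> float:
    """Correct an observed prevalence for imperfect sensitivity and specificity."""
    denom = se + sp - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    p = (p_obs + sp - 1.0) / denom
    # Truncate to the valid probability range.
    return min(1.0, max(0.0, p))


def stratified_prevalence(strata) -> float:
    """Covariate-specific correction, standardized to target-population weights.

    strata: iterable of (weight, p_obs, se, sp), with weights summing to 1.
    """
    return sum(w * rogan_gladen(p, se, sp) for w, p, se, sp in strata)


# Hypothetical example: misclassification differs across two covariate strata.
strata = [
    (0.5, 0.20, 0.80, 0.95),  # stratum 1
    (0.5, 0.30, 0.90, 0.90),  # stratum 2
]
overall = stratified_prevalence(strata)
```

Changing the weights in `strata` while holding the stratum-specific inputs fixed shows how the overall corrected prevalence depends on the covariate distribution of the target population.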
Affiliation(s)
- Jessie K. Edwards
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill
- Stephen R. Cole
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill
- Bonnie E. Shook-Sa
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill
- Paul N. Zivich
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill
- Ning Zhang
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill
- Catherine R. Lesko
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins
4
Sengupta D, Roy S, Banerjee T. Testing of Poisson mean with under-reported counts. Braz J Probab Stat 2021. [DOI: 10.1214/20-bjps493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Debjit Sengupta
  - Department of Statistics, St. Xavier’s College, 30, Mother Teresa Sarani, Kolkata-700016, India
- Surupa Roy
  - Department of Statistics, St. Xavier’s College, 30, Mother Teresa Sarani, Kolkata-700016, India
5
Postmyocardial Infarction Statin Exposure and the Risk of Stroke with Weighting for Outcome Misclassification. Epidemiology 2020; 31:880-888. [PMID: 33003152 DOI: 10.1097/ede.0000000000001253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
BACKGROUND: Observational healthcare data can be used for drug safety and effectiveness research. The use of inverse probability of treatment weights (IPW) reduces measured confounding under the assumption of accurate measurement of the outcome variable; however, many datasets suffer from systematic outcome misclassification.
METHODS: We introduced a modification to IPW to correct for the presence of outcome misclassification. To demonstrate the utility of these modified weights in realistic settings, we investigated postmyocardial infarction statin use and the 1-year risk of stroke in the Clinical Practice Research Datalink.
RESULTS: We computed an IPW-adjusted odds ratio (OR = 0.67; 95% confidence interval (CI) = 0.48, 0.93). We employed a technique to modify IPW for the presence of outcome misclassification using linked hospital records for outcome validation (modified IPW adjusted OR = 0.77; 95% CI = 0.52, 1.15) and compared the results with a meta-analysis of randomized controlled trials (RCTs) (pooled OR = 0.80; 95% CI = 0.74, 0.87). Finally, we present simulation studies to investigate the impact of model selection on bias reduction and variability.
CONCLUSION: Ignoring outcome misclassification yielded biased estimates whereas the use of the modified IPW approach produced encouraging results when compared with the meta-analytic RCT findings.
6
Penning de Vries BB, van Smeden M, Groenwold RH. A weighting method for simultaneous adjustment for confounding and joint exposure-outcome misclassifications. Stat Methods Med Res 2020; 30:473-487. [PMID: 32998668 PMCID: PMC8008432 DOI: 10.1177/0962280220960172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Joint misclassification of exposure and outcome variables can lead to considerable bias in epidemiological studies of causal exposure-outcome effects. In this paper, we present a new maximum likelihood based estimator for marginal causal effects that simultaneously adjusts for confounding and several forms of joint misclassification of the exposure and outcome variables. The proposed method relies on validation data for the construction of weights that account for both sources of bias. The weighting estimator, which is an extension of the outcome misclassification weighting estimator proposed by Gravel and Platt (Weighted estimation for confounded binary outcomes subject to misclassification. Stat Med 2018; 37: 425–436), is applied to reinfarction data. Simulation studies were carried out to study its finite sample properties and compare it with methods that do not account for confounding or misclassification. The new estimator showed favourable large sample properties in the simulations. Further research is needed to study the sensitivity of the proposed method and that of alternatives to violations of their assumptions. The implementation of the estimator is facilitated by a new R function (ipwm) in an existing R package (mecor).
Affiliation(s)
- Bas B L Penning de Vries
  - Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Maarten van Smeden
  - Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Rolf H H Groenwold
  - Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
7
Edwards JK, Cole SR, Fox MP. Flexibly Accounting for Exposure Misclassification With External Validation Data. Am J Epidemiol 2020; 189:850-860. [PMID: 31971584 DOI: 10.1093/aje/kwaa011] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/02/2019] [Revised: 01/07/2020] [Accepted: 01/13/2020] [Indexed: 11/14/2022]
Abstract
Measurement error is common in epidemiology, but few studies use quantitative methods to account for bias due to mismeasurement. One potential barrier is that some intuitive approaches that readily combine with methods to account for other sources of bias, like multiple imputation for measurement error (MIME), rely on internal validation data, which are rarely available. Here, we present a reparameterized imputation approach for measurement error (RIME) that can be used with internal or external validation data. We illustrate the advantages of RIME over a naive approach that ignores measurement error and MIME using a hypothetical example and a series of simulation experiments. In both the example and simulations, we combine MIME and RIME with inverse probability weighting to account for confounding when estimating hazard ratios and counterfactual risk functions. MIME and RIME performed similarly when rich external validation data were available and the prevalence of exposure did not vary between the main study and the validation data. However, RIME outperformed MIME when validation data included only true and mismeasured versions of the exposure or when exposure prevalence differed between the data sources. RIME allows investigators to leverage external validation data to account for measurement error in a wide range of scenarios.
Affiliation(s)
- Jessie K Edwards
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Stephen R Cole
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Matthew P Fox
  - Department of Epidemiology, School of Public Health, Boston University, Boston, Massachusetts
  - Department of Global Health, School of Public Health, Boston University, Boston, Massachusetts
8
Shu D, Yi GY. Causal inference with noisy data: Bias analysis and estimation approaches to simultaneously addressing missingness and misclassification in binary outcomes. Stat Med 2020; 39:456-468. [PMID: 31802532 DOI: 10.1002/sim.8419] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/26/2019] [Revised: 08/21/2019] [Accepted: 10/13/2019] [Indexed: 11/08/2022]
Abstract
Causal inference has been widely conducted in various fields and many methods have been proposed for different settings. However, for noisy data with both mismeasurements and missing observations, those methods often break down. In this paper, we consider a problem that binary outcomes are subject to both missingness and misclassification, when the interest is in estimation of the average treatment effects (ATE). We examine the asymptotic biases caused by ignoring missingness and/or misclassification and establish the intrinsic connections between missingness effects and misclassification effects on the estimation of ATE. We develop valid weighted estimation methods to simultaneously correct for missingness and misclassification effects. To provide protection against model misspecification, we further propose a doubly robust correction method which yields consistent estimators when either the treatment model or the outcome model is misspecified. Simulation studies are conducted to assess the performance of the proposed methods. An application to smoking cessation data is reported to illustrate the use of the proposed methods.
Affiliation(s)
- Di Shu
  - Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Grace Y Yi
  - Department of Statistical and Actuarial Sciences, Department of Computer Science, University of Western Ontario, London, Ontario, Canada
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
9
van Smeden M, Lash TL, Groenwold RHH. Reflection on modern methods: five myths about measurement error in epidemiological research. Int J Epidemiol 2020; 49:338-347. [PMID: 31821469 PMCID: PMC7124512 DOI: 10.1093/ije/dyz251] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Accepted: 11/16/2019] [Indexed: 02/02/2023]
Abstract
Epidemiologists are often confronted with datasets to analyse which contain measurement error due to, for instance, mistaken data entries, inaccurate recordings and measurement instrument or procedural errors. If the effect of measurement error is misjudged, the data analyses are hampered and the validity of the study's inferences may be affected. In this paper, we describe five myths that contribute to misjudgments about measurement error, regarding expected structure, impact and solutions to mitigate the problems resulting from mismeasurements. The aim is to clarify these measurement error misconceptions. We show that the influence of measurement error in an epidemiological data analysis can play out in ways that go beyond simple heuristics, such as heuristics about whether or not to expect attenuation of the effect estimates. Whereas we encourage epidemiologists to deliberate about the structure and potential impact of measurement error in their analyses, we also recommend exercising restraint when making claims about the magnitude or even direction of effect of measurement error if not accompanied by statistical measurement error corrections or quantitative bias analysis. Suggestions for alleviating the problems or investigating the structure and magnitude of measurement error are given.
Affiliation(s)
- Maarten van Smeden
  - Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Timothy L Lash
  - Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Rolf H H Groenwold
  - Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
10
Tong J, Huang J, Chubak J, Wang X, Moore JH, Hubbard RA, Chen Y. An augmented estimation procedure for EHR-based association studies accounting for differential misclassification. J Am Med Inform Assoc 2020; 27:244-253. [PMID: 31617899 DOI: 10.1093/jamia/ocz180] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/07/2019] [Revised: 08/14/2019] [Accepted: 09/15/2019] [Indexed: 11/12/2022]
Abstract
OBJECTIVES: The ability to identify novel risk factors for health outcomes is a key strength of electronic health record (EHR)-based research. However, the validity of such studies is limited by error in EHR-derived phenotypes. The objective of this study was to develop a novel procedure for reducing bias in estimated associations between risk factors and phenotypes in EHR data.
MATERIALS AND METHODS: The proposed method combines the strengths of a gold-standard phenotype obtained through manual chart review for a small validation set of patients and an automatically-derived phenotype that is available for all patients but is potentially error-prone (hereafter referred to as the algorithm-derived phenotype). An augmented estimator of associations is obtained by optimally combining these 2 phenotypes. We conducted simulation studies to evaluate the performance of the augmented estimator and conducted an analysis of risk factors for second breast cancer events using data on a cohort from Kaiser Permanente Washington.
RESULTS: The proposed method was shown to reduce bias relative to an estimator using only the algorithm-derived phenotype and reduce variance compared to an estimator using only the validation data.
DISCUSSION: Our simulation studies and real data application demonstrate that, compared to the estimator using validation data only, the augmented estimator has lower variance (ie, higher statistical efficiency). Compared to the estimator using error-prone EHR-derived phenotypes, the augmented estimator has smaller bias.
CONCLUSIONS: The proposed estimator can effectively combine an error-prone phenotype with gold-standard data from a limited chart review in order to improve analyses of risk factors using EHR data.
Affiliation(s)
- Jiayi Tong
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jing Huang
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jessica Chubak
  - Department of Epidemiology, Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
- Xuan Wang
  - Department of Statistics, School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Jason H Moore
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Rebecca A Hubbard
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yong Chen
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
11
Shu D, Yi GY. Inverse-probability-of-treatment weighted estimation of causal parameters in the presence of error-contaminated and time-dependent confounders. Biom J 2019; 61:1507-1525. [DOI: 10.1002/bimj.201600228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/04/2016] [Revised: 03/19/2019] [Accepted: 06/19/2019] [Indexed: 11/09/2022]
Affiliation(s)
- Di Shu
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Grace Y. Yi
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
12
Shu D, Yi GY. Weighted causal inference methods with mismeasured covariates and misclassified outcomes. Stat Med 2019; 38:1835-1854. [DOI: 10.1002/sim.8073] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/22/2018] [Revised: 10/26/2018] [Accepted: 11/19/2018] [Indexed: 11/08/2022]
Affiliation(s)
- Di Shu
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Grace Y. Yi
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada