1
Shih JH, Albert PS, Fine J, Liu D. An imputation approach for a time-to-event analysis subject to missing outcomes due to noncoverage in disease registries. Biostatistics 2023; 25:117-133. [PMID: 36534828 PMCID: PMC10939403 DOI: 10.1093/biostatistics/kxac049] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/26/2022] [Revised: 11/29/2022] [Accepted: 12/04/2022] [Indexed: 12/17/2023] Open
Abstract
Disease incidence data in a national-based cohort study would ideally be obtained through a national disease registry. Unfortunately, no such registry currently exists in the United States. Instead, the results from individual state registries need to be combined to ascertain certain disease diagnoses in the United States. The National Cancer Institute has initiated a program to assemble all state registries to provide a complete assessment of all cancers in the United States. Unfortunately, not all registries have agreed to participate. In this article, we develop an imputation-based approach that uses self-reported cancer diagnosis from longitudinally collected questionnaires to impute cancer incidence not covered by the combined registry. We propose a two-step procedure, where in the first step a mover-stayer model is used to impute a participant's registry coverage status when it is only reported at the time of the questionnaires given at 10-year intervals and the time of the last-alive vital status and death. In the second step, we propose a semiparametric working model, fit using an imputed coverage area sample identified from the mover-stayer model, to impute registry-based survival outcomes for participants in areas not covered by the registry. The simulation studies show the approach performs well as compared with alternative ad hoc approaches for dealing with this problem. We illustrate the methodology with an analysis that links the United States Radiologic Technologists study cohort with the combined registry that includes 32 of the 50 states.
Affiliation(s)
- Joanna H Shih
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, 9609 Medical Center Drive, Bethesda, MD 20892, USA
- Paul S Albert
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Bethesda, MD 20892, USA
- Jason Fine
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Bethesda, MD 20892, USA
- Danping Liu
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Bethesda, MD 20892, USA
2
Hou J, Chan SF, Wang X, Cai T. Risk prediction with imperfect survival outcome information from electronic health records. Biometrics 2023; 79:190-202. [PMID: 34747010 PMCID: PMC9741856 DOI: 10.1111/biom.13599] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/04/2020] [Revised: 10/28/2021] [Accepted: 10/29/2021] [Indexed: 12/14/2022]
Abstract
Readily available proxies for the time of disease onset, such as the time of the first diagnostic code, can lead to substantial risk prediction error when analyses are based on poor proxies. Due to the lack of detailed documentation and the labor intensiveness of manual annotation, it is often only feasible to ascertain, for a small subset, the current status of the disease by a follow-up time rather than the exact onset time. In this paper, we aim to develop risk prediction models for the onset time that efficiently leverage both a small number of labels on the current status and a large number of unlabeled observations on imperfect proxies. Under a semiparametric transformation model for onset and a highly flexible measurement error model for proxy onset time, we propose a semisupervised risk prediction method that efficiently combines information from proxies and limited labels. Starting from an initial estimator based solely on the labeled subset, we perform a one-step correction with the full data, augmenting against a mean-zero rank correlation score derived from the proxies. We establish the consistency and asymptotic normality of the proposed semisupervised estimator and provide a resampling procedure for interval estimation. Simulation studies demonstrate that the proposed estimator performs well in finite samples. We illustrate the proposed estimator by developing a genetic risk prediction model for obesity using data from Mass General Brigham Healthcare Biobank.
Affiliation(s)
- Jue Hou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Stephanie F. Chan
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Xuan Wang
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Tianxi Cai
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
3
Valeri L. Invited Perspective: A Multivariate Disease Process Perspective for Environmental Epidemiology. Environmental Health Perspectives 2023; 131:11302. [PMID: 36696107 PMCID: PMC9875848 DOI: 10.1289/ehp12509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 12/19/2022] [Indexed: 06/17/2023]
Affiliation(s)
- Linda Valeri
- Columbia University Mailman School of Public Health, New York, New York, USA
4
Abrahamowicz M, Beauchamp ME, Moura CS, Bernatsky S, Ferreira Guerra S, Danieli C. Adapting SIMEX to correct for bias due to interval-censored outcomes in survival analysis with time-varying exposure. Biom J 2022; 64:1467-1485. [PMID: 36065586 DOI: 10.1002/bimj.202100013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2021] [Revised: 05/16/2022] [Accepted: 05/28/2022] [Indexed: 12/14/2022]
Abstract
Many clinical and epidemiological applications of survival analysis focus on interval-censored events that can be ascertained only at discrete times of clinic visits. This implies that the values of time-varying covariates are not correctly aligned with the true, unknown event times, inducing a bias in the estimated associations. To address this issue, we adapted the simulation-extrapolation (SIMEX) methodology, based on assessing how the estimates change as the time between clinic visits is artificially increased. We propose diagnostics to choose the extrapolating function. In simulations, the SIMEX-corrected estimates considerably reduced the bias toward the null and generally yielded a better bias/variance trade-off than conventional estimates. In a real-life pharmacoepidemiological application, the proposed method increased by 27% the estimated excess hazard of nonmelanoma skin cancer (interval-censored events) associated with a time-varying exposure representing the 2-year cumulative duration of past use of a hypertensive medication. These simulation-based and real-life results suggest that the proposed SIMEX-based correction may help improve the accuracy of estimated associations between time-varying exposures and the hazard of interval-censored events in large cohort studies where the events are recorded only at relatively sparse times of clinic visits/assessments. However, these advantages may be less certain for smaller studies and/or weak associations.
Affiliation(s)
- Michal Abrahamowicz
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada; Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Marie-Eve Beauchamp
- Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Cristiano Soares Moura
- Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Sasha Bernatsky
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada; Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Steve Ferreira Guerra
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Coraline Danieli
- Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
5
Cloodt E, Lindgren A, Lauge-Pedersen H, Rodby-Bousquet E. Sequence of flexion contracture development in the lower limb: a longitudinal analysis of 1,071 children with cerebral palsy. BMC Musculoskelet Disord 2022; 23:629. [PMID: 35780097 PMCID: PMC9250270 DOI: 10.1186/s12891-022-05548-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/20/2021] [Accepted: 06/14/2022] [Indexed: 11/10/2022] Open
Abstract
Background: To prevent severe contractures and their impact on adjacent joints in children with cerebral palsy (CP), it is crucial to treat the reduced range of motion early and to understand the order in which contractures appear. The aim of this study was to determine how a hip, knee, or ankle contracture is associated with the time to and sequence of contracture development in adjacent joints. Methods: This was a longitudinal cohort study of 1,071 children (636 boys, 435 girls) with CP born 1990 to 2018 who were registered before 5 years of age in the Swedish surveillance program for CP and had a hip, knee, or ankle flexion contracture of ≥ 10°. The results were based on 1,636 legs followed for an average of 4.6 years (range 0–17 years). The Cox proportional-hazards model adjusted for Gross Motor Function Classification System (GMFCS) levels I–V was used to compare the percentage of legs with and without more than one contracture. Results: A second contracture developed in 44% of the legs. The frequency of multiple contractures increased with higher GMFCS level. Children with a primary hip or foot contracture were more likely to develop a second knee contracture. Children with a primary knee contracture developed either a hip or ankle contracture as a second contracture. Conclusions: Multiple contractures were associated with higher GMFCS level. Lower limb contractures appeared in specific patterns where the location of the primary contracture and GMFCS level were associated with contracture development in adjacent joints.
Affiliation(s)
- Erika Cloodt
- Department of Clinical Sciences Lund, Orthopaedics, Lund University, Lund, Sweden; Department of Research and Development, Region Kronoberg, Växjö, Sweden
- Anna Lindgren
- Centre for Mathematical Sciences, Lund University, Lund, Sweden
- Elisabet Rodby-Bousquet
- Department of Clinical Sciences Lund, Orthopaedics, Lund University, Lund, Sweden; Centre for Clinical Research Västerås, Uppsala University-Region Västmanland, Västerås, Sweden
6
Sevilimedu V, Yu L. Simulation extrapolation method for measurement error: A review. Stat Methods Med Res 2022; 31:1617-1636. [PMID: 35607297 PMCID: PMC10062410 DOI: 10.1177/09622802221102619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Measurement error is pervasive in statistics due to the non-availability of authentic data. The reasons for measurement error mainly relate to cost, convenience, and human error. Measurement error can result in non-negligible bias due to attenuated estimates, reduced power of statistical tests, and lower coverage probabilities of the coefficient estimators in a regression model. Several methods have been proposed to correct for measurement error, all of which can be grouped into two broad categories based on the underlying model: functional and structural. Functional models provide flexibility and robustness to estimators by placing minimal or no assumptions on the distribution of the mismeasured covariate or by treating it as a fixed entity, as opposed to a structural model, which treats the underlying mismeasured covariates as random with a specified structure. The simulation extrapolation method is one method that is used for the partial correction of measurement error in both structural and functional models. Reviews of measurement error correction techniques are available in the literature. However, none of the previously conducted reviews has focused exclusively on simulation extrapolation and its application in continuous measurement error models, despite its widespread use and ease of application. We attempt to close this gap in the literature by highlighting its development over the past two and a half decades.
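The simulation extrapolation idea can be illustrated with a minimal sketch. It is not taken from the review: it assumes a simple linear regression with classical additive covariate error of known standard deviation (`sigma_u`), a quadratic extrapolant, and simulated data; all function and variable names are hypothetical. Extra noise is added at increasing levels `lam`, the naive slope is re-estimated at each level, and the fitted trend is extrapolated back to `lam = -1` (no measurement error).

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def quadratic_fit(ts, vs):
    """Least-squares coefficients [c0, c1, c2] of v = c0 + c1*t + c2*t^2."""
    s = [sum(t ** k for t in ts) for k in range(5)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    b = [sum(v * t ** i for t, v in zip(ts, vs)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the normal equations.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, 3):
            f = a[r][i] / a[i][i]
            for c in range(i, 3):
                a[r][c] -= f * a[i][c]
            b[r] -= f * b[i]
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(a[i][c] * coef[c] for c in range(i + 1, 3))
        coef[i] = (b[i] - rest) / a[i][i]
    return coef

def simex_slope(w, y, sigma_u, rng, lambdas=(0.5, 1.0, 1.5, 2.0), n_rep=30):
    """SIMEX-corrected slope, assuming sigma_u (error SD) is known."""
    ts, vs = [0.0], [slope(w, y)]  # lam = 0 is the naive estimate
    for lam in lambdas:
        extra_sd = (lam ** 0.5) * sigma_u
        reps = []
        for _ in range(n_rep):
            w_star = [wi + rng.gauss(0.0, extra_sd) for wi in w]
            reps.append(slope(w_star, y))
        ts.append(lam)
        vs.append(sum(reps) / n_rep)
    c0, c1, c2 = quadratic_fit(ts, vs)
    return c0 - c1 + c2  # extrapolant evaluated at lam = -1

# Simulated example (all values are illustrative assumptions).
rng = random.Random(42)
n, beta, sigma_u = 2000, 1.0, 0.5 ** 0.5
x = [rng.gauss(0.0, 1.0) for _ in range(n)]        # true covariate
w = [xi + rng.gauss(0.0, sigma_u) for xi in x]     # mismeasured covariate
y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]  # outcome
naive = slope(w, y)
corrected = simex_slope(w, y, sigma_u, rng)
print(f"naive={naive:.3f}  simex={corrected:.3f}  true={beta}")
```

Because the quadratic extrapolant only approximates the true attenuation curve, the corrected slope typically moves toward the true value without fully reaching it, which matches the abstract's description of a partial correction.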
Affiliation(s)
- Varadan Sevilimedu
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, Manhattan, New York, USA
- Lili Yu
- JPHCOPH, Georgia Southern University, Statesboro, Georgia, USA
7
Irlmeier R, Hughey JJ, Bastarache L, Denny JC, Chen Q. Cox regression is robust to inaccurate EHR-extracted event time: an application to EHR-based GWAS. Bioinformatics 2022; 38:2297-2306. [PMID: 35157022 PMCID: PMC10060718 DOI: 10.1093/bioinformatics/btac086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2021] [Revised: 12/14/2021] [Accepted: 02/09/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Logistic regression models are used in genomic studies to analyze the genetic data linked to electronic health records (EHRs), and do not make full use of the time-to-event information available in EHRs. Previous work has shown that Cox regression, which can account for left truncation and right censoring in EHRs, increased the power to detect genotype-phenotype associations compared to logistic regression. We extend this to evaluate the relative performance of Cox regression and various logistic regression models in the presence of positive errors in event time (delayed event time), relating to recorded event time accuracy. RESULTS: One Cox model and three logistic regression models were considered under different scenarios of delayed event time. Extensive simulations and a genomic study application were used to evaluate the impact of delayed event time. While logistic regression does not model the time-to-event directly, various logistic regression models used in the literature were more sensitive to delayed event time than Cox regression. Results highlighted the importance of identifying and excluding patients diagnosed before entry time. Cox regression had similar or modestly improved statistical power relative to various logistic regression models at controlled type I error. This was supported by the empirical data, where the Cox models consistently had the highest sensitivity to detect known genotype-phenotype associations under all scenarios of delayed event time. AVAILABILITY AND IMPLEMENTATION: Access to individual-level EHR and genotype data is restricted by the IRB. Simulation code and R scripts for data processing are at: https://github.com/QingxiaCindyChen/CoxRobustEHR.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rebecca Irlmeier
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Jacob J Hughey
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA; Department of Biomedical Sciences, Vanderbilt University, Nashville, TN 37203, USA
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Joshua C Denny
- All of Us Research Program, National Institutes of Health, Bethesda, MD 20892, USA
- Qingxia Chen
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
8
Cao Z, Wong MY. Approximate profile likelihood estimation for Cox regression with covariate measurement error. Stat Med 2022; 41:910-931. [PMID: 35067954 DOI: 10.1002/sim.9324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2020] [Revised: 09/24/2021] [Accepted: 12/21/2021] [Indexed: 11/09/2022]
Abstract
In nutritional epidemiology, measurement error in covariates is a well-known problem since dietary intakes are usually assessed through self-reporting. In this article, we consider an additive error model in which error variables are highly correlated, and propose a new method called approximate profile likelihood estimation (APLE) for covariates measured with error in the Cox regression. Asymptotic normality of this estimator is established under regularity conditions, and simulation studies are conducted to examine the finite sample performance of the proposed estimator empirically. Moreover, the popular correction method called regression calibration is shown to be a special case of APLE. We then apply APLE to deal with measurement error in some nutrients of interest in the EPIC-InterAct Study under a sensitivity analysis framework.
Affiliation(s)
- Zhiqiang Cao
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China
- Man Yu Wong
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China
9
Innes GK, Bhondoekhan F, Lau B, Gross AL, Ng DK, Abraham AG. The Measurement Error Elephant in the Room: Challenges and Solutions to Measurement Error in Epidemiology. Epidemiol Rev 2022; 43:94-105. [PMID: 34664648 PMCID: PMC9005058 DOI: 10.1093/epirev/mxab011] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/18/2020] [Revised: 09/30/2021] [Accepted: 10/06/2021] [Indexed: 11/12/2022] Open
Abstract
Measurement error, although ubiquitous, is uncommonly acknowledged and rarely assessed or corrected in epidemiologic studies. This review offers a straightforward guide to common problems caused by measurement error in research studies and a review of several accessible bias-correction methods for epidemiologists and data analysts. Although most correction methods require criterion validation including a gold standard, there are also ways to evaluate the impact of measurement error and potentially correct for it without such data. Technical difficulty ranges from simple algebra to more complex algorithms that require expertise, fine tuning, and computational power. However, at all skill levels, software packages and methods are available and can be used to understand the threat to inferences that arises from imperfect measurements.
Affiliation(s)
- Alison G Abraham
- Correspondence to Dr. Alison G. Abraham, Department of Epidemiology, University of Colorado, Anschutz Medical Campus, 1635 Aurora Ct, Aurora, CO 80045
10
Manderson AA, Goudie RJB. Combining chains of Bayesian models with Markov melding. Bayesian Analysis 2022; 18:807-840. [PMID: 37587923 PMCID: PMC7614958 DOI: 10.1214/22-ba1327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/18/2023]
Abstract
A challenge for practitioners of Bayesian inference is specifying a model that incorporates multiple relevant, heterogeneous data sets. It may be easier to instead specify distinct submodels for each source of data, then join the submodels together. We consider chains of submodels, where submodels directly relate to their neighbours via common quantities which may be parameters or deterministic functions thereof. We propose chained Markov melding, an extension of Markov melding, a generic method to combine chains of submodels into a joint model. One challenge we address is appropriately capturing the prior dependence between common quantities within a submodel, whilst also reconciling differences in priors for the same common quantity between two adjacent submodels. Estimating the posterior of the resulting overall joint model is also challenging, so we describe a sampler that uses the chain structure to incorporate information contained in the submodels in multiple stages, possibly in parallel. We demonstrate our methodology using two examples. The first example considers an ecological integrated population model, where multiple data sets are required to accurately estimate population immigration and reproduction rates. We also consider a joint longitudinal and time-to-event model with uncertain, submodel-derived event times. Chained Markov melding is a conceptually appealing approach to integrating submodels in these settings.
Affiliation(s)
- Andrew A. Manderson
- MRC Biostatistics Unit, University of Cambridge, United Kingdom, and The Alan Turing Institute
11
Oh EJ, Shepherd BE, Lumley T, Shaw PA. Improved generalized raking estimators to address dependent covariate and failure-time outcome error. Biom J 2021; 63:1006-1027. [PMID: 33709462 PMCID: PMC8211389 DOI: 10.1002/bimj.202000187] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/13/2020] [Revised: 10/05/2020] [Accepted: 01/05/2021] [Indexed: 11/12/2022]
Abstract
Biomedical studies that use electronic health records (EHR) data for inference are often subject to bias due to measurement error. The measurement error present in EHR data is typically complex, consisting of errors of unknown functional form in covariates and the outcome, which can be dependent. To address the bias resulting from such errors, generalized raking has recently been proposed as a robust method that yields consistent estimates without the need to model the error structure. We provide rationale for why these previously proposed raking estimators can be expected to be inefficient in failure-time outcome settings involving misclassification of the event indicator. We propose raking estimators that utilize multiple imputation, to impute either the target variables or auxiliary variables, to improve the efficiency. We also consider outcome-dependent sampling designs and investigate their impact on the efficiency of the raking estimators, either with or without multiple imputation. We present an extensive numerical study to examine the performance of the proposed estimators across various measurement error settings. We then apply the proposed methods to our motivating setting, in which we seek to analyze HIV outcomes in an observational cohort with EHR data from the Vanderbilt Comprehensive Care Clinic.
Affiliation(s)
- Eric J. Oh
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Bryan E. Shepherd
- Department of Biostatistics, Vanderbilt University, Nashville, TN, USA
- Thomas Lumley
- Department of Statistics, University of Auckland, Auckland, New Zealand
- Pamela A. Shaw
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
12
Mocanu A, Noja GG, Istodor AV, Moise G, Leretter M, Rusu LC, Marza AM, Mederle AO. Individual Characteristics as Prognostic Factors of the Evolution of Hospitalized COVID-19 Romanian Patients: A Comparative Observational Study between the First and Second Waves Based on Gaussian Graphical Models and Structural Equation Modeling. J Clin Med 2021; 10:1958. [PMID: 34063243 PMCID: PMC8124435 DOI: 10.3390/jcm10091958] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/15/2021] [Revised: 04/29/2021] [Accepted: 04/29/2021] [Indexed: 02/07/2023] Open
Abstract
This study examines the role played by individual characteristics and specific treatment methods in the evolution of hospitalized patients with coronavirus disease 2019 (COVID-19), through an observational study comparing the first and second waves of the coronavirus pandemic in Romania. The research follows a two-fold approach: a detailed observation of the evolution of 274 hospitalized patients with COVID-19 (145 in the first wave and 129 in the second wave of infection) according to the specific treatment methods applied and patients' individual features, and an econometric (quantitative) analysis through structural equation modeling and Gaussian graphical models (GGM) designed to identify the correlations and causal relationships among all considered variables. The main results highlight that the specific treatment methods applied had a positive influence on the evolution of COVID-19 patients, particularly in the second wave of the coronavirus pandemic. For the first wave of COVID-19 infection, GGM results indicate a strong positive correlation between the evolution of the patients and the COVID-19 disease form, which is further positively correlated with the treatment scheme. The evolution of the patients is strongly and inversely correlated with the symptomatology and the ICU hospitalization. Moreover, the disease form is strongly and inversely correlated with oxygen saturation and the residence of patients (urban/rural). The symptomatology at first appearance also strongly depends on the age of the patients (positive correlation) and on whether the patient is a smoker and has other comorbidities. Age and gender are also important characteristics that shape the disease degree and patient evolution in responding to treatment; our study attests strong interconnections among these variables, the form of disease, symptomatology, and overall evolution of the patients.
Affiliation(s)
- Alexandra Mocanu
- Department XIII, Discipline of Infectious Diseases, “Victor Babes” University of Medicine and Pharmacy Timisoara, 2 Eftimie Murgu Square, 300041 Timisoara, Romania
- Gratiela Georgiana Noja
- Department of Marketing and International Economic Relations, Faculty of Economics and Business Administration, West University of Timisoara, 16 Pestalozzi Street, 300115 Timisoara, Romania
- Alin Viorel Istodor
- First Department of Surgery, Second Discipline of Surgical Semiology, “Victor Babes” University of Medicine and Pharmacy Timisoara, 2 Eftimie Murgu Square, 300041 Timisoara, Romania
- Georgiana Moise
- Department of Clinical Pharmacology, “Victor Babes” University of Medicine and Pharmacy, “Pius Brinzeu” County Emergency Clinical Hospital Timisoara, 2 Eftimie Murgu Square, 300041 Timisoara, Romania
- Marius Leretter
- Department of Prosthodontics, Multidisciplinary Center for Research, Evaluation, Diagnosis and Therapies in Oral Medicine, “Victor Babeș” University of Medicine and Pharmacy Timisoara, 2 Eftimie Murgu Square, 300041 Timisoara, Romania
- Laura-Cristina Rusu
- Department of Oral Pathology, Multidisciplinary Center for Research, Evaluation, Diagnosis and Therapies in Oral Medicine, “Victor Babeș” University of Medicine and Pharmacy Timisoara, 2 Eftimie Murgu Square, 300041 Timisoara, Romania
- Adina Maria Marza
- Department of Surgery, Multidisciplinary Center for Research, Evaluation, Diagnosis and Therapies in Oral Medicine, “Victor Babes” University of Medicine and Pharmacy Timisoara, 2 Eftimie Murgu Square, 300041 Timisoara, Romania
- Alexandru Ovidiu Mederle
- Department of Surgery, Multidisciplinary Center for Research, Evaluation, Diagnosis and Therapies in Oral Medicine, “Victor Babes” University of Medicine and Pharmacy Timisoara, 2 Eftimie Murgu Square, 300041 Timisoara, Romania
13
Cloodt E, Wagner P, Lauge-Pedersen H, Rodby-Bousquet E. Knee and foot contracture occur earliest in children with cerebral palsy: a longitudinal analysis of 2,693 children. Acta Orthop 2021; 92:222-227. [PMID: 33228441 PMCID: PMC8158222 DOI: 10.1080/17453674.2020.1848154] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 02/08/2023] Open
Abstract
Background and purpose - Joint contracture is a common problem among children with cerebral palsy (CP). To prevent severe contracture and its effects on adjacent joints, it is crucial to identify children with a reduced range of motion (ROM) early. We examined whether significant hip, knee, or foot contracture occurs earliest in children with CP. Patients and methods - This was a longitudinal study involving 27,230 measurements obtained for 2,693 children (59% boys, 41% girls) with CP born 1990 to 2018 and registered before 5 years of age in the Swedish surveillance program for CP. The analysis was based on 4,751 legs followed up for an average of 5.0 years. Separate Kaplan-Meier (KM) curves were drawn for each ROM to illustrate the proportions of contracture-free legs at a given time during the follow-up. Using a clustered bootstrap method and considering the child as the unit of clustering, 95% pointwise confidence intervals were generated for equally spaced time points every 2.5 years for each KM curve. Results - Contracture developed in 34% of all legs, and the median time to the first contracture was 10 years from the first examination. Contracture was most common in children with a higher Gross Motor Function Classification System (GMFCS) level. The first contracture was a flexion contracture preventing dorsiflexion in children with GMFCS level I or II and preventing knee extension in children with GMFCS level III to V. Interpretation - Early interventions to prevent knee and foot contractures in children with CP should be considered.
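The Kaplan-Meier curve and clustered bootstrap described in this abstract can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the data are invented, each child contributes a list of `(time, event)` pairs (one per leg), the interval is a simple percentile interval, and all names are hypothetical.

```python
import random

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # events observed at time t
        m = 0  # subjects leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            m += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= m
    return curve

def surv_at(curve, t):
    """Step-function value of the KM curve at time t."""
    s = 1.0
    for u, v in curve:
        if u <= t:
            s = v
    return s

def cluster_bootstrap_ci(clusters, t, n_boot=200, seed=0):
    """Percentile 95% interval for S(t), resampling whole children."""
    rng = random.Random(seed)
    ids = list(clusters)
    stats = []
    for _ in range(n_boot):
        legs = []
        for _ in range(len(ids)):
            # Draw a child with replacement and keep all of its legs together.
            legs.extend(clusters[rng.choice(ids)])
        curve = kaplan_meier([a for a, _ in legs], [b for _, b in legs])
        stats.append(surv_at(curve, t))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]

# Toy data: child id -> legs as (years of follow-up, 1 = contracture, 0 = censored).
clusters = {
    "c1": [(1.0, 1), (4.0, 0)],
    "c2": [(2.0, 1), (2.0, 0)],
    "c3": [(3.0, 1), (5.0, 1)],
    "c4": [(6.0, 0), (6.0, 0)],
    "c5": [(2.5, 1), (7.0, 0)],
    "c6": [(8.0, 0)],
}
demo_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
lo, hi = cluster_bootstrap_ci(clusters, t=2.5)
print(demo_curve, (lo, hi))
```

Resampling children rather than legs keeps the two legs of the same child together in every bootstrap sample, which is the point of treating the child as the unit of clustering.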
Affiliation(s)
- Erika Cloodt
- Department of Clinical Sciences Lund, Orthopaedics, Lund University, Lund; Department of Research and Development, Region Kronoberg, Växjö
- Philippe Wagner
- Centre for Clinical Research Västerås, Uppsala University-Region Västmanland, Västerås, Sweden
- Elisabet Rodby-Bousquet
- Department of Clinical Sciences Lund, Orthopaedics, Lund University, Lund; Centre for Clinical Research Västerås, Uppsala University-Region Västmanland, Västerås, Sweden
14
Oh EJ, Shepherd BE, Lumley T, Shaw PA. Raking and regression calibration: Methods to address bias from correlated covariate and time-to-event error. Stat Med 2021; 40:631-649. [PMID: 33140432 PMCID: PMC7874496 DOI: 10.1002/sim.8793] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/17/2020] [Revised: 08/05/2020] [Accepted: 10/11/2020] [Indexed: 11/11/2022]
Abstract
Medical studies that depend on electronic health records (EHR) data are often subject to measurement error, as the data are not collected to support research questions under study. These data errors, if not accounted for in study analyses, can obscure or cause spurious associations between patient exposures and disease risk. Methodology to address covariate measurement error has been well developed; however, time-to-event error has also been shown to cause significant bias, but methods to address it are relatively underdeveloped. More generally, it is possible to observe errors in both the covariate and the time-to-event outcome that are correlated. We propose regression calibration (RC) estimators to simultaneously address correlated error in the covariates and the censored event time. Although RC can perform well in many settings with covariate measurement error, it is biased for nonlinear regression models, such as the Cox model. Thus, we additionally propose raking estimators which are consistent estimators of the parameter defined by the population estimating equation. Raking can improve upon RC in certain settings with failure-time data, require no explicit modeling of the error structure, and can be utilized under outcome-dependent sampling designs. We discuss features of the underlying estimation problem that affect the degree of improvement the raking estimator has over the RC approach. Detailed simulation studies are presented to examine the performance of the proposed estimators under varying levels of signal, error, and censoring. The methodology is illustrated on observational EHR data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.
Affiliation(s)
- Eric J. Oh
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Bryan E. Shepherd
- Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, USA
- Thomas Lumley
- Department of Statistics, University of Auckland, Auckland, New Zealand
- Pamela A. Shaw
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
15
|
Giganti MJ, Shepherd BE. Multiple-Imputation Variance Estimation in Studies With Missing or Misclassified Inclusion Criteria. Am J Epidemiol 2020; 189:1628-1632. [PMID: 32685964 PMCID: PMC7705600 DOI: 10.1093/aje/kwaa153] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/26/2019] [Revised: 07/13/2020] [Accepted: 07/14/2020] [Indexed: 11/15/2022] Open
Abstract
In observational studies using routinely collected data, a variable with a high level of missingness or misclassification may determine whether an observation is included in the analysis. In settings where inclusion criteria are assessed after imputation, the popular multiple-imputation variance estimator proposed by Rubin ("Rubin's rules" (RR)) is biased due to incompatibility between imputation and analysis models. While alternative approaches exist, most analysts are not familiar with them. Using partially validated data from a human immunodeficiency virus cohort, we illustrate the calculation of an imputation variance estimator proposed by Robins and Wang (RW) in a scenario where the study exclusion criteria are based on a variable that must be imputed. In this motivating example, the corresponding imputation variance estimate for the log odds was 29% smaller using the RW estimator than using the RR estimator. We further compared these 2 variance estimators with a simulation study which showed that coverage probabilities of 95% confidence intervals based on the RR estimator were too high and became worse as more observations were imputed and more subjects were excluded from the analysis. The RW imputation variance estimator performed much better and should be employed when there is incompatibility between imputation and analysis models. We provide analysis code to aid future analysts in implementing this method.
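For orientation, the standard Rubin's rules (RR) combination that the abstract contrasts with the Robins and Wang (RW) estimator can be sketched as follows. This is the textbook RR formula only; it is not the authors' analysis code and it does not implement the RW estimator. The function name is hypothetical.

```python
from statistics import mean, variance


def rubins_rules(estimates, within_variances):
    """Pool m multiply-imputed point estimates with Rubin's rules.

    estimates        -- point estimate from each imputed dataset
    within_variances -- estimated variance of each of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w_bar = mean(within_variances)   # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = w_bar + (1.0 + 1.0 / m) * b
    return q_bar, total
```

As the abstract notes, this total variance can be biased when the imputation and analysis models are incompatible, for example when inclusion criteria are assessed after imputation.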
Affiliation(s)
- Mark J Giganti
- Correspondence to Dr. Mark J. Giganti, Center for Biostatistics in AIDS Research, Harvard T.H. Chan School of Public Health, 651 Huntington Avenue, Boston, MA 02115
|
16
|
Parast L, Garcia TP, Prentice RL, Carroll RJ. Robust methods to correct for measurement error when evaluating a surrogate marker. Biometrics 2020; 78:9-23. [PMID: 33021738 DOI: 10.1111/biom.13386] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2019] [Revised: 09/16/2020] [Accepted: 09/30/2020] [Indexed: 11/27/2022]
Abstract
The identification of valid surrogate markers of disease or disease progression has the potential to decrease the length and costs of future studies. Most available methods that assess the value of a surrogate marker ignore the fact that surrogates are often measured with error. Failing to adjust for measurement error can erroneously identify a useful surrogate marker as not useful or vice versa. We investigate and propose robust methods to correct for the effect of measurement error when evaluating a surrogate marker using multiple estimators developed for parametric and nonparametric estimates of the proportion of treatment effect explained by the surrogate marker. In addition, we quantify the attenuation bias induced by measurement error and develop inference procedures to allow for variance and confidence interval estimation. Through a simulation study, we show that our proposed estimators correct for measurement error in the surrogate marker and that our inference procedures perform well in finite samples. We illustrate these methods by examining a potential surrogate marker that is measured with error, hemoglobin A1c, using data from the Diabetes Prevention Program clinical trial.
Affiliation(s)
- Layla Parast
- RAND Corporation, Statistics Group, Santa Monica, California
- Tanya P Garcia
- Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Ross L Prentice
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Raymond J Carroll
- Department of Statistics, Texas A&M University, College Station, Texas; School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway, NSW, Australia
|
17
|
Shepherd BE, Shaw PA. Errors in multiple variables in human immunodeficiency virus (HIV) cohort and electronic health record data: statistical challenges and opportunities. Statistical Communications in Infectious Diseases 2020; 12:20190015. [PMID: 35880997 PMCID: PMC9204761 DOI: 10.1515/scid-2019-0015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2019] [Accepted: 08/21/2020] [Indexed: 06/15/2023]
Abstract
Objectives: Observational data derived from patient electronic health records (EHR) data are increasingly used for human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) research. There are challenges to using these data, in particular with regards to data quality; some are recognized, some unrecognized, and some recognized but ignored. There are great opportunities for the statistical community to improve inference by incorporating validation subsampling into analyses of EHR data. Methods: Methods to address measurement error, misclassification, and missing data are relevant, as are sampling designs such as two-phase sampling. However, many of the existing statistical methods for measurement error, for example, only address relatively simple settings, whereas the errors seen in these datasets span multiple variables (both predictors and outcomes), are correlated, and even affect who is included in the study. Results/Conclusion: We will discuss some preliminary methods in this area with a particular focus on time-to-event outcomes and outline areas of future research.
Affiliation(s)
- Bryan E. Shepherd
- Biostatistics, Vanderbilt University, 2525 West End, Suite 11000, Nashville, Tennessee 37203, USA
- Pamela A. Shaw
- Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
18
|
Keogh RH, Shaw PA, Gustafson P, Carroll RJ, Deffner V, Dodd KW, Küchenhoff H, Tooze JA, Wallace MP, Kipnis V, Freedman LS. STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 1-Basic theory and simple methods of adjustment. Stat Med 2020; 39:2197-2231. [PMID: 32246539 PMCID: PMC7450672 DOI: 10.1002/sim.8532] [Citation(s) in RCA: 79] [Impact Index Per Article: 19.8] [Received: 12/09/2018] [Revised: 02/25/2020] [Accepted: 02/28/2020] [Indexed: 11/11/2022]
Abstract
Measurement error and misclassification of variables frequently occur in epidemiology and involve variables important to public health. Their presence can impact strongly on results of statistical analyses involving such variables. However, investigators commonly fail to pay attention to biases resulting from such mismeasurement. We provide, in two parts, an overview of the types of error that occur, their impacts on analytic results, and statistical methods to mitigate the biases that they cause. In this first part, we review different types of measurement error and misclassification, emphasizing the classical, linear, and Berkson models, and on the concepts of nondifferential and differential error. We describe the impacts of these types of error in covariates and in outcome variables on various analyses, including estimation and testing in regression models and estimating distributions. We outline types of ancillary studies required to provide information about such errors and discuss the implications of covariate measurement error for study design. Methods for ascertaining sample size requirements are outlined, both for ancillary studies designed to provide information about measurement error and for main studies where the exposure of interest is measured with error. We describe two of the simpler methods, regression calibration and simulation extrapolation (SIMEX), that adjust for bias in regression coefficients caused by measurement error in continuous covariates, and illustrate their use through examples drawn from the Observing Protein and Energy (OPEN) dietary validation study. Finally, we review software available for implementing these methods. The second part of the article deals with more advanced topics.
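As a rough illustration of the simplest case the abstract mentions, the sketch below corrects a naive slope for attenuation under the classical, nondifferential error model W = X + U in a simple linear regression with one error-prone covariate. It is a hedged toy example, not the guidance document's software: the function name and arguments are hypothetical, and the error variance `var_u` is assumed to be known, for example from a replicate or validation study.

```python
def regression_calibration_slope(naive_slope, var_w, var_u):
    """Correct a naive regression slope for classical measurement error.

    naive_slope -- slope from regressing the outcome on the error-prone W
    var_w       -- variance of the observed covariate W = X + U
    var_u       -- variance of the measurement error U (assumed known)

    Under classical error the naive slope is attenuated by
    lambda = var(X) / var(W) = (var_w - var_u) / var_w,
    so dividing by lambda recovers the slope on the true covariate X.
    """
    if not 0.0 <= var_u < var_w:
        raise ValueError("need 0 <= var_u < var_w")
    attenuation = (var_w - var_u) / var_w
    return naive_slope / attenuation
```

For instance, with `var_w = 2.0` and `var_u = 1.0` the attenuation factor is 0.5, so a naive slope of 0.5 is corrected to 1.0.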
Affiliation(s)
- Ruth H Keogh
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Pamela A Shaw
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Paul Gustafson
- Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
- Raymond J Carroll
- Department of Statistics, Texas A&M University, College Station, Texas, USA
- School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway, New South Wales, Australia
- Veronika Deffner
- Statistical Consulting Unit StaBLab, Department of Statistics, Ludwig-Maximilians-Universität, Munich, Germany
- Kevin W Dodd
- Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
- Helmut Küchenhoff
- Department of Statistics, Statistical Consulting Unit StaBLab, Ludwig-Maximilians-Universität, Munich, Germany
- Janet A Tooze
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Michael P Wallace
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Victor Kipnis
- Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
- Laurence S Freedman
- Biostatistics and Biomathematics Unit, Gertner Institute for Epidemiology and Health Policy Research, Tel Hashomer, Israel
- Information Management Services Inc., Rockville, Maryland, USA
|
19
|
Giganti MJ, Shaw PA, Chen G, Bebawy SS, Turner MM, Sterling TR, Shepherd BE. Accounting for dependent errors in predictors and time-to-event outcomes using electronic health records, validation samples, and multiple imputation. Ann Appl Stat 2020; 14:1045-1061. [PMID: 32999698 PMCID: PMC7523695 DOI: 10.1214/20-aoas1343] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
Abstract
Data from electronic health records (EHR) are prone to errors, which are often correlated across multiple variables. The error structure is further complicated when analysis variables are derived as functions of two or more error-prone variables. Such errors can substantially impact estimates, yet we are unaware of methods that simultaneously account for errors in covariates and time-to-event outcomes. Using EHR data from 4217 patients, the hazard ratio for an AIDS-defining event associated with a 100 cell/mm³ increase in CD4 count at ART initiation was 0.74 (95% CI: 0.68-0.80) using unvalidated data and 0.60 (95% CI: 0.53-0.68) using fully validated data. Our goal is to obtain unbiased and efficient estimates after validating a random subset of records. We propose fitting discrete failure time models to the validated subsample and then multiply imputing values for unvalidated records. We demonstrate how this approach simultaneously addresses dependent errors in predictors, time-to-event outcomes, and inclusion criteria. Using the fully validated dataset as a gold standard, we compare the mean squared error of our estimates with those from the unvalidated dataset and the corresponding subsample-only dataset for various subsample sizes. By incorporating reasonably sized validated subsamples and appropriate imputation models, our approach had improved estimation over both the naive analysis and the analysis using only the validation subsample.
Affiliation(s)
- Pamela A. Shaw
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania
- Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin
|
20
|
Shortreed SM, Cook AJ, Coley RY, Bobb JF, Nelson JC. Challenges and Opportunities for Using Big Health Care Data to Advance Medical Science and Public Health. Am J Epidemiol 2019; 188:851-861. [PMID: 30877288 DOI: 10.1093/aje/kwy292] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 10/04/2018] [Accepted: 12/20/2018] [Indexed: 12/14/2022] Open
Abstract
Methodological advancements in epidemiology, biostatistics, and data science have strengthened the research world's ability to use data captured from electronic health records (EHRs) to address pressing medical questions, but gaps remain. We describe methods investments that are needed to curate EHR data toward research quality and to integrate complementary data sources when EHR data alone are insufficient for research goals. We highlight new methods and directions for improving the integrity of medical evidence generated from pragmatic trials, observational studies, and predictive modeling. We also discuss needed methods contributions to further ease data sharing across multisite EHR data networks. Throughout, we identify opportunities for training and for bolstering collaboration among subject matter experts, methodologists, practicing clinicians, and health system leaders to help ensure that methods problems are identified and resulting advances are translated into mainstream research practice more quickly.
Affiliation(s)
- Susan M Shortreed
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington
- Andrea J Cook
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington
- R Yates Coley
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington
- Jennifer F Bobb
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington
- Jennifer C Nelson
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington
|