1
Sevilimedu V, Yu L, Samawi H. Misclassification simulation extrapolation method for a Weibull accelerated failure time model. Stat Methods Med Res 2023; 32:1478-1493. [PMID: 37122155 PMCID: PMC10939450 DOI: 10.1177/09622802231168248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
The problem of misclassification in covariates is ubiquitous in survival data and often leads to biased estimates. The misclassification simulation extrapolation method is a popular method to correct this bias. However, its impact on Weibull accelerated failure time models has not been studied. In this paper, we study the bias caused by misclassification in one or more binary covariates in Weibull accelerated failure time models and explore the use of the misclassification simulation extrapolation in correcting for this bias, along with its asymptotic properties. Simulation studies are carried out to investigate the numerical properties of the resulting estimator for finite samples. The proposed method is then applied to colon cancer data obtained from the cancer registry at Memorial Sloan Kettering Cancer Center.
Affiliation(s)
- Varadan Sevilimedu
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Lili Yu
  - Department of Biostatistics, Epidemiology and Environmental Health Sciences, JPH College of Public Health, Georgia Southern University, Statesboro, GA, USA
- Hani Samawi
  - Department of Biostatistics, Epidemiology and Environmental Health Sciences, JPH College of Public Health, Georgia Southern University, Statesboro, GA, USA
2
Cheng C, Spiegelman D, Li F. Mediation analysis in the presence of continuous exposure measurement error. Stat Med 2023; 42:1669-1686. [PMID: 36869626 PMCID: PMC11320713 DOI: 10.1002/sim.9693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 01/06/2023] [Accepted: 02/16/2023] [Indexed: 03/05/2023]
Abstract
The difference method is used in mediation analysis to quantify the extent to which a mediator explains the mechanisms underlying the pathway between an exposure and an outcome. In many health science studies, exposures are almost never measured without error, which can result in biased effect estimates. This article investigates methods for mediation analysis when a continuous exposure is mismeasured. Under a linear exposure measurement error model, we prove that the bias of the indirect effect and the mediation proportion can go in either direction, but the mediation proportion is usually less biased when the associations between the exposure and its error-prone counterpart are similar with and without adjustment for the mediator. We further propose methods to adjust for exposure measurement error with continuous and binary outcomes. The proposed approaches require a main study/validation study design in which the validation study provides data for characterizing the relationship between the true exposure and its error-prone counterpart. The proposed approaches are then applied to the Health Professionals Follow-up Study, 1986-2016, to investigate the role of body mass index (BMI) as a mediator of the effect of physical activity on the risk of cardiovascular diseases. Our results reveal that physical activity is significantly associated with a lower risk of cardiovascular disease incidence, and approximately half of the total effect of physical activity is mediated by BMI after accounting for exposure measurement error. Extensive simulation studies are conducted to demonstrate the validity and efficiency of the proposed approaches in finite samples.
Affiliation(s)
- Chao Cheng
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
  - Center for Methods in Implementation and Prevention Science, Yale School of Public Health, New Haven, Connecticut, USA
- Donna Spiegelman
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
  - Center for Methods in Implementation and Prevention Science, Yale School of Public Health, New Haven, Connecticut, USA
- Fan Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
  - Center for Methods in Implementation and Prevention Science, Yale School of Public Health, New Haven, Connecticut, USA
3
Cao Z, Wong MY, Cheng GH. Logistic regression with correlated measurement error and misclassification in covariates. Stat Methods Med Res 2023; 32:789-805. [PMID: 36790894 DOI: 10.1177/09622802231154324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Abstract
Many areas of research, such as nutritional epidemiology, may encounter measurement error in continuous covariates and misclassification of categorical variables when modeling. It is well known that ignoring measurement error or misclassification can lead to biased results, but most research has focused on solving these two problems separately. Addressing both measurement error and misclassification simultaneously in a single analysis is less actively studied. In this article, we propose a new correction method for logistic regression that simultaneously handles correlated errors in multivariate continuous covariates and misclassification in a categorical variable. It is not computationally intensive, since a closed form of the approximate likelihood function conditional on observed covariates is derived. The asymptotic normality of the proposed estimator is established under regularity conditions, and its finite-sample performance is empirically examined by simulation studies. We apply this new estimation method to handle measurement error in some nutrients of interest and misclassification of a categorical physical activity variable in the European Prospective Investigation into Cancer and Nutrition-InterAct Study data. Analyses show that fruit is negatively associated with type 2 diabetes among physically active women, protein is positively associated with type 2 diabetes in the less physically active group, and actual physical activity has a greater effect on reducing the risk of type 2 diabetes than observed physical activity.
Affiliation(s)
- Zhiqiang Cao
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China
- Man Yu Wong
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China
- Garvin Hl Cheng
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China
4
Yi GY, Chen LP. Estimation of the average treatment effect with variable selection and measurement error simultaneously addressed for potential confounders. Stat Methods Med Res 2023; 32:691-711. [PMID: 36694932 PMCID: PMC10119903 DOI: 10.1177/09622802221146308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
In the framework of causal inference, the inverse probability weighting estimation method and its variants have been commonly employed to estimate the average treatment effect. Such methods, however, are challenged by the presence of irrelevant pre-treatment variables and measurement error. Ignoring these features and naively applying the usual inverse probability weighting estimation procedures may typically yield biased inference results. In this article, we develop an inference method for estimating the average treatment effect with those features taken into account. We establish theoretical properties for the resulting estimator and carry out numerical studies to assess the finite sample performance of the proposed estimator.
Affiliation(s)
- Grace Y Yi
  - Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Canada
  - Department of Computer Science, University of Western Ontario, London, Canada
- Li-Pang Chen
  - Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Canada
  - Department of Statistics, National Chengchi University, Taipei, Taiwan
5
Zhang Q, Yi GY. Zero-inflated Poisson models with measurement error in the response. Biometrics 2022. [PMID: 35261029 DOI: 10.1111/biom.13657] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/09/2020] [Revised: 02/02/2022] [Accepted: 02/14/2022] [Indexed: 11/26/2022]
Abstract
Zero-inflated count data arise frequently from genomics studies. Analysis of such data is often based on a mixture model which facilitates excess zeros in combination with a Poisson distribution, and various inference methods have been proposed under such a model. Those analysis procedures, however, are challenged by the presence of measurement error in count responses. In this article, we propose a new measurement error model to describe error-contaminated count data. We show that ignoring the measurement error effects in the analysis may generally lead to invalid inference results, and meanwhile, we identify situations where ignoring measurement error can still yield consistent estimators. Furthermore, we propose a Bayesian method to address the effects of measurement error under the zero-inflated Poisson model and discuss the identifiability issues. We develop a data-augmentation algorithm that is easy to implement. Simulation studies are conducted to evaluate the performance of the proposed method. We apply our method to analyze the data arising from a prostate adenocarcinoma genomics study.
Affiliation(s)
- Qihuang Zhang
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, N2L 3G1, Canada
- Grace Y Yi
  - Department of Statistical and Actuarial Sciences, Department of Computer Science, University of Western Ontario, London, N6A 5B7, Canada
6
Chen LP, Yi GY. De-noising analysis of noisy data under mixed graphical models. Electron J Stat 2022. [DOI: 10.1214/22-ejs2028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Li-Pang Chen
  - Department of Statistics, National Chengchi University
- Grace Y. Yi
  - Department of Statistical and Actuarial Sciences, Department of Computer Science, University of Western Ontario
7
Chen X, Chang J, Spiegelman D, Li F. A Bayesian approach for estimating the partial potential impact fraction with exposure measurement error under a main study/internal validation design. Stat Methods Med Res 2021; 31:404-418. [PMID: 34841964 DOI: 10.1177/09622802211060514] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
Abstract
The partial potential impact fraction describes the proportion of disease cases that can be prevented if the distribution of modifiable continuous exposures is shifted in a population, while other risk factors are not modified. It is a useful quantity for evaluating the burden of disease in epidemiologic and public health studies. When exposures are measured with error, the partial potential impact fraction estimates may be biased, which necessitates methods to correct for the exposure measurement error. Motivated by the Health Professionals Follow-up Study, we develop a Bayesian approach to adjust for exposure measurement error when estimating the partial potential impact fraction under the main study/internal validation study design. We adopt the reclassification approach that leverages the strength of the main study/internal validation study design and clarifies transportability assumptions for valid inference. We assess the finite-sample performance of both the point and credible interval estimators via extensive simulations and apply the proposed approach in the Health Professionals Follow-up Study to estimate the partial potential impact fraction for colorectal cancer incidence under interventions that shift the distributions of red meat, alcohol, and/or folate intake.
Affiliation(s)
- Xinyuan Chen
  - Department of Mathematics and Statistics, Mississippi State University, Mississippi State, MS, USA
- Joseph Chang
  - Department of Statistics and Data Science, Yale University, New Haven, CT, USA
- Donna Spiegelman
  - Department of Statistics and Data Science, Yale University, New Haven, CT, USA
  - Department of Biostatistics, Yale University School of Public Health, New Haven, CT, USA
  - Center for Methods in Implementation and Preventive Science, Yale University, New Haven, CT, USA
- Fan Li
  - Department of Biostatistics, Yale University School of Public Health, New Haven, CT, USA
  - Center for Methods in Implementation and Preventive Science, Yale University, New Haven, CT, USA
8
Wong BHW, Lee J, Spiegelman D, Wang M. Estimation and inference for the population attributable risk in the presence of misclassification. Biostatistics 2021; 22:805-818. [PMID: 32112073 DOI: 10.1093/biostatistics/kxz067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2018] [Revised: 12/27/2019] [Accepted: 12/29/2019] [Indexed: 11/14/2022] Open
Abstract
Because it describes the proportion of disease cases that could be prevented if an exposure were entirely eliminated from a target population as a result of an intervention, estimation of the population attributable risk (PAR) has become an important goal of public health research. In epidemiologic studies, categorical covariates are often misclassified. We present methods for obtaining point and interval estimates of the PAR and the partial PAR (pPAR) in the presence of misclassification, filling an important existing gap in public health evaluation methods. We use a likelihood-based approach to estimate parameters in the models for the disease and for the misclassification process, under main study/internal validation study and main study/external validation study designs, and various plausible assumptions about transportability. We assessed the finite sample performance of this method via a simulation study, and used it to obtain corrected point and interval estimates of the pPAR for high red meat intake and alcohol intake in relation to colorectal cancer incidence in the HPFS, where we found that the estimated pPAR for the two risk factors increased by up to 317% after correcting for bias due to misclassification.
Affiliation(s)
- Benedict H W Wong
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Jooyoung Lee
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA and Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Donna Spiegelman
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA, Department of Epidemiology, Harvard T.H. Chan School of Public Health, 181 Longwood Ave, Boston, MA 02115, USA, Department of Nutrition and Global Health & Population, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA and Department of Biostatistics, Center on Methods in Implementation and Prevention Science, Yale School of Public Health, 60 College St, New Haven, CT 06510, USA
- Molin Wang
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA, Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA and Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 181 Longwood Ave, Boston, MA 02115
9
Zhang C, Gu X, Chen Y. Estimation for frailty measurement error Cox models based on profile likelihood and Bayes methods. COMMUN STAT-SIMUL C 2021. [DOI: 10.1080/03610918.2019.1572763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Caiya Zhang
  - Department of Statistics, Zhejiang University City College, Hangzhou, China
- Xiaolu Gu
  - Department of Statistics, Zhejiang University City College, Hangzhou, China
- Yingyu Chen
  - Department of Statistics, Zhejiang University City College, Hangzhou, China
10
Zhang Q, Yi GY. Marginal analysis of bivariate mixed responses with measurement error and misclassification. Stat Methods Med Res 2021; 30:1155-1186. [PMID: 33635738 DOI: 10.1177/0962280220983587] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
Bivariate responses with mixed continuous and binary variables arise commonly in applications such as clinical trials and genetic studies. Statistical methods based on jointly modeling continuous and binary variables are available. However, such methods ignore the effects of response mismeasurement, a ubiquitous feature in applications. It is well established that, in many settings, ignoring mismeasurement in variables usually results in biased estimation. In this paper, we consider the setting with a bivariate outcome vector which contains a continuous component and a binary component, both subject to mismeasurement. We propose estimating equation approaches to handle measurement error in the continuous response and misclassification in the binary response simultaneously. The proposed estimators are consistent and robust to certain model misspecification, provided regularity conditions hold. Extensive simulation studies confirm that the proposed methods successfully correct the biases resulting from errors in variables under various settings. The proposed methods are applied to analyze the outbred Carworth Farms White mice data arising from a genome-wide association study.
Affiliation(s)
- Qihuang Zhang
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
- Grace Y Yi
  - Department of Statistics and Actuarial Sciences and Department of Computer Science, University of Western Ontario, London, ON, Canada
11
Curley B. A nonlinear measurement error model and its application to describing the dependency of health outcomes on dietary intake. J Appl Stat 2021; 49:1485-1518. [DOI: 10.1080/02664763.2020.1870671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- B. Curley
  - Moravian College, Bethlehem, PA, USA
12
Zhang Q, Yi GY. Genetic association studies with bivariate mixed responses subject to measurement error and misclassification. Stat Med 2020; 39:3700-3719. [PMID: 32914420 DOI: 10.1002/sim.8688] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/18/2019] [Revised: 04/12/2020] [Accepted: 06/13/2020] [Indexed: 01/01/2023]
Abstract
In genetic association studies, mixed effects models have been widely used in detecting pleiotropic effects, which occur when one gene affects multiple phenotype traits. In particular, bivariate mixed effects models are useful for describing the association of a gene with a continuous trait and a binary trait. However, such models are inadequate for data with response mismeasurement, a characteristic that is often overlooked. It is well established that, in univariate settings, ignoring mismeasurement in variables usually results in biased estimation. In this paper, we consider the setting with a bivariate outcome vector which contains a continuous component and a binary component, both subject to mismeasurement. We propose an induced likelihood approach and an EM algorithm method to handle measurement error in the continuous response and misclassification in the binary response simultaneously. Simulation studies confirm that the proposed methods successfully remove the bias induced by the response mismeasurement.
Affiliation(s)
- Qihuang Zhang
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, N2L3G1, Canada
- Grace Y Yi
  - Department of Statistical and Actuarial Sciences, Department of Computer Science, University of Western Ontario, London, Ontario, Canada, N6A 5B7
13
Cao Z, Wong MY. Approximate maximum likelihood estimation for logistic regression with covariate measurement error. Biom J 2020; 63:27-45. [PMID: 32914478 DOI: 10.1002/bimj.202000024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/15/2020] [Revised: 05/03/2020] [Accepted: 06/26/2020] [Indexed: 11/07/2022]
Abstract
In nutritional epidemiology, dietary intake assessed with a food frequency questionnaire is prone to measurement error. Ignoring the measurement error in covariates causes estimates to be biased and leads to a loss of power. In this paper, we consider an additive error model according to the characteristics of the European Prospective Investigation into Cancer and Nutrition (EPIC)-InterAct Study data, and derive an approximate maximum likelihood estimation (AMLE) for covariates with measurement error under logistic regression. This method can be regarded as an adjusted version of regression calibration and can provide an approximately consistent estimator. Asymptotic normality of this estimator is established under regularity conditions, and simulation studies are conducted to empirically examine the finite sample performance of the proposed method. We apply AMLE to deal with measurement errors in some nutrients of interest in the EPIC-InterAct Study under a sensitivity analysis framework.
Affiliation(s)
- Zhiqiang Cao
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, P. R. China
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, P. R. China
- Man Yu Wong
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, P. R. China
14
Zhang X, Ma Y, Carroll RJ. MALMEM: model averaging in linear measurement error models. J R Stat Soc Series B Stat Methodol 2020; 81:763-779. [PMID: 32863735 DOI: 10.1111/rssb.12317] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
We develop model averaging estimation in the linear regression model where some covariates are subject to measurement error. The absence of the true covariates in this framework makes the calculation of the standard residual-based loss function impossible. We take advantage of the explicit form of the parameter estimators and construct a weight choice criterion. It is asymptotically equivalent to the unknown model average estimator minimizing the loss function. When the true model is not included in the set of candidate models, the method achieves optimality in terms of minimizing the relative loss, whereas, when the true model is included, the method estimates the model parameter with root n rate. Simulation results in comparison with existing Bayesian information criterion and Akaike information criterion model selection and model averaging methods strongly favour our model averaging method. The method is applied to a study on health.
Affiliation(s)
- Xinyu Zhang
  - University of Science and Technology of China, Hefei, and Chinese Academy of Sciences, Beijing, People's Republic of China
- Yanyuan Ma
  - Pennsylvania State University, University Park, USA
- Raymond J Carroll
  - Texas A&M University, College Station, USA, and University of Technology Sydney, Australia
15
Chen LP, Yi GY. Analysis of noisy survival data with graphical proportional hazards measurement error models. Biometrics 2020; 77:956-969. [PMID: 32687216 DOI: 10.1111/biom.13331] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/09/2019] [Revised: 04/15/2020] [Accepted: 07/09/2020] [Indexed: 01/07/2023]
Abstract
In survival data analysis, the Cox proportional hazards (PH) model is perhaps the most widely used model to feature the dependence of survival times on covariates. While many inference methods have been developed under such a model or its variants, those models are not adequate for handling data with complex structured covariates. High-dimensional survival data often entail several features: (1) many covariates are inactive in explaining the survival information, (2) active covariates are associated in a network structure, and (3) some covariates are error-contaminated. To handle such survival data, we propose graphical PH measurement error models and develop inferential procedures for the parameters of interest. Our proposed models significantly enlarge the scope of the usual Cox PH model and have great flexibility in characterizing survival data. Theoretical results are established to justify the proposed methods. Numerical studies are conducted to assess the performance of the proposed methods.
Affiliation(s)
- Li-Pang Chen
  - Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario, Canada
- Grace Y Yi
  - Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario, Canada
  - Department of Computer Science, University of Western Ontario, London, Ontario, Canada
16
Shaw PA, Gustafson P, Carroll RJ, Deffner V, Dodd KW, Keogh RH, Kipnis V, Tooze JA, Wallace MP, Küchenhoff H, Freedman LS. STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 2-More complex methods of adjustment and advanced topics. Stat Med 2020; 39:2232-2263. [PMID: 32246531 PMCID: PMC7272296 DOI: 10.1002/sim.8531] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 12/09/2018] [Revised: 02/27/2020] [Accepted: 02/28/2020] [Indexed: 12/24/2022]
Abstract
We continue our review of issues related to measurement error and misclassification in epidemiology. We further describe methods of adjusting for biased estimation caused by measurement error in continuous covariates, covering likelihood methods, Bayesian methods, moment reconstruction, moment-adjusted imputation, and multiple imputation. We then describe which methods can also be used with misclassification of categorical covariates. Methods of adjusting estimation of distributions of continuous variables for measurement error are then reviewed. Illustrative examples are provided throughout these sections. We provide lists of available software for implementing these methods and also provide the code for implementing our examples in the Supporting Information. Next, we present several advanced topics, including data subject to both classical and Berkson error, modeling continuous exposures with measurement error, and categorical exposures with misclassification in the same model, variable selection when some of the variables are measured with error, adjusting analyses or design for error in an outcome variable, and categorizing continuous variables measured with error. Finally, we provide some advice for the often met situations where variables are known to be measured with substantial error, but there is only an external reference standard or partial (or no) information about the type or magnitude of the error.
Affiliation(s)
- Pamela A Shaw
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Paul Gustafson
  - Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
- Raymond J Carroll
  - Department of Statistics, Texas A&M University, College Station, Texas, USA
  - School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway, New South Wales, Australia
- Veronika Deffner
  - Statistical Consulting Unit StaBLab, Department of Statistics, Ludwig-Maximilians-Universität, Munich, Germany
- Kevin W Dodd
  - Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
- Ruth H Keogh
  - Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Victor Kipnis
  - Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
- Janet A Tooze
  - Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Michael P Wallace
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Helmut Küchenhoff
  - Statistical Consulting Unit StaBLab, Department of Statistics, Ludwig-Maximilians-Universität, Munich, Germany
- Laurence S Freedman
  - Biostatistics and Biomathematics Unit, Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, Tel Hashomer, Israel
  - Information Management Services Inc., Rockville, Maryland, USA
17
Semiparametric methods for left-truncated and right-censored survival data with covariate measurement error. ANN I STAT MATH 2020. [DOI: 10.1007/s10463-020-00755-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]
18
Locally efficient estimation in generalized partially linear model with measurement error in nonlinear function. TEST-SPAIN 2020. [DOI: 10.1007/s11749-019-00668-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
19
Chen LP, Yi GY. Model selection and model averaging for analysis of truncated and censored data with measurement error. Electron J Stat 2020. [DOI: 10.1214/20-ejs1762] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/19/2023]
20
|
Geng P, Koul HL. Minimum distance model checking in Berkson measurement error models with validation data. TEST 2019. [DOI: 10.1007/s11749-018-0610-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
|
21
|
Zhang Q, Yi GY. R package for analysis of data with mixed measurement error and misclassification in covariates: augSIMEX. J Stat Comput Simul 2019. [DOI: 10.1080/00949655.2019.1615911] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
Affiliation(s)
- Qihuang Zhang
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
| | - Grace Y. Yi
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
| |
|
22
|
Lin YC, Hsu HK, Lai TS, Chiang WC, Lin SL, Chen YM, Chen CC, Chu TS. Emergency department utilization and resuscitation rate among patients receiving maintenance hemodialysis. J Formos Med Assoc 2019; 118:1652-1660. [PMID: 30711255 DOI: 10.1016/j.jfma.2019.01.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/20/2018] [Revised: 01/03/2019] [Accepted: 01/09/2019] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND: End-stage renal disease (ESRD) is a growing global health concern with increased disease burden and high medical costs. Utilization of the emergency department (ED) among dialyzed patients and the associated risk factors remain unknown. METHODS: Participants of this study, selected from the National Health Insurance Database in Taiwan, were aged 19-90 years and received maintenance hemodialysis from January 1, 2010, to December 31, 2010. A control group consisting of individuals who did not receive dialysis, selected from the same data source, was matched for age, sex, and the Charlson Comorbidity Index (CCI). Subgroup analysis with hemodialysis frequency was also performed. ED utilization among enrolled individuals was assessed in 2012. Generalized estimating equations with multiple variable adjustments were used to identify risk factors associated with resuscitation during ED visits. RESULTS: One group of 2985 individuals who received maintenance hemodialysis, and another group of 2985 patients who did not receive hemodialysis, between January 1, 2010, and December 31, 2010, were included in this study. There were 4822 ED visits in the hemodialysis group, and 1755 ED visits in the non-dialysis group between January 1, 2012, and December 31, 2012. Analysis of multivariable generalized estimating equations identified the risk associated with resuscitation during ED visits to be greater in individuals who were receiving maintenance hemodialysis, aged older than 55 years, hospitalized in the past year, and assigned first and second degree of triage. CONCLUSION: Patients receiving maintenance hemodialysis had higher ED utilization and a significantly higher risk of resuscitation during ED visits than those without hemodialysis.
Affiliation(s)
- Yi-Chih Lin
- Department of Medicine, National Taiwan University Hospital Jinshan Branch, New Taipei City, Taiwan
| | - Hua-Kuei Hsu
- Department of Health Care Management, National Taipei University of Nursing and Health Science, Taipei, Taiwan
| | - Tai-Shuan Lai
- Division of Nephrology, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
| | - Wen-Chih Chiang
- Division of Nephrology, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
| | - Shuei-Liong Lin
- Division of Nephrology, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
| | - Yung-Ming Chen
- Division of Nephrology, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
| | - Chu-Chieh Chen
- Department of Health Care Management, National Taipei University of Nursing and Health Science, Taipei, Taiwan
| | - Tzong-Shinn Chu
- Division of Nephrology, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
| |
|
23
|
Liu J, Ma Y. Locally efficient semiparametric estimators for a class of Poisson models with measurement error. Can J Stat 2019. [DOI: 10.1002/cjs.11483] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Affiliation(s)
- Jianxuan Liu
- Department of Mathematics, Syracuse University, Syracuse, NY 13244, U.S.A.
| | - Yanyuan Ma
- Department of Statistics, Penn State University, University Park, PA 16802, U.S.A.
| |
|
24
|
Yi GY, Yan Y, Liao X, Spiegelman D. Parametric Regression Analysis with Covariate Misclassification in Main Study/Validation Study Designs. Int J Biostat 2018; 15:/j/ijb.ahead-of-print/ijb-2017-0002/ijb-2017-0002.xml. [PMID: 30864410 DOI: 10.1515/ijb-2017-0002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/03/2017] [Accepted: 11/06/2018] [Indexed: 11/15/2022]
Abstract
Measurement error and misclassification have long been a concern in many fields, including medicine, administrative health care data, epidemiology, and survey sampling. It is known that measurement error and misclassification may seriously degrade the quality of estimation and inference, and should be avoided whenever possible. However, in practice, it is inevitable that measurements contain error for a variety of reasons. It is thus necessary to develop statistical strategies to cope with this issue. Although many inference methods have been proposed in the literature to address mis-measurement effects, some important issues remain unexplored. Typically, it is generally unclear how the available methods may perform relative to each other. In this paper, capitalizing on the unique feature of discrete variables, we consider settings with misclassified binary covariates and investigate issues concerning covariate misclassification; our development parallels available strategies for handling measurement error in continuous covariates. Under a unified framework, we examine a number of valid inferential procedures for practical settings where a validation study, either internal or external, is available besides a main study. Furthermore, we compare the relative performance of these methods and make practical recommendations.
Affiliation(s)
- Grace Y Yi
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
| | - Ying Yan
- Department of Statistical Science, School of Mathematics, Sun Yat-sen University, Guangzhou, China
| | - Xiaomei Liao
- Departments of Epidemiology and Biostatistics, Harvard School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
| | - Donna Spiegelman
- Departments of Epidemiology and Biostatistics, Harvard School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA; Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510
| |
|
25
|
An Explorative Study on Estimating Local Accuracies in Land-Cover Information Using Logistic Regression and Class-Heterogeneity-Stratified Data. Remote Sens 2018. [DOI: 10.3390/rs10101581] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
It is increasingly recognized that classification accuracy should be characterized locally at the level of individual pixels to depict its spatial variability to better inform users and producers of land-cover information than by conventional error-matrix-based methods. Local or per-pixel accuracy is usually estimated through empirical modelling, such as logistic regression, which often proceeds in a class-aggregated or a class-stratified way, with the latter being generally more accurate due to its accommodation for between-class inhomogeneity in accuracy-context relations. As an extension to class-stratified modelling, class-heterogeneity-stratified modelling, in which logistic models are built separately for contextually heterogeneous vs. homogeneous sub-strata in individual strata of map classes, is proposed in this paper for proper handling of within-class inhomogeneity in accuracy-context relations to increase accuracy of estimation. Unlike in existing literature where sampling is usually approached separately, the double-stratification method is also adopted in sampling design so that more sample data are likely allocated to heterogeneous sub-strata (which are more prone to misclassifications than homogeneous ones). This class-heterogeneity-stratified method furnished for sampling and modelling jointly thus constitutes an integrative framework for accuracy estimation and information refinement. As the first step in building up such a framework, this paper investigates the proposed double-stratification method’s performance and sensitivity to sample size regarding local accuracy estimation in comparison with those of existing methods through a case study concerning Globeland30 2010 land cover over Wuhan, China. A detailed review of existing methods for analyses, estimation, and use of local accuracy was provided, helping to put the proposed research in a broader context. 
Candidate explanatory variables for logistic regression included sample pixels’ map classes, positions, and contextual features that were computed in different-sized moving windows. Relative performances of these methods were evaluated based on an independent reference sample, with all methods found reliable. It was confirmed that the proposed method is in general the most accurate, as observed with varying sample sizes. The proposed method’s competitive performance is thus proved, reinforcing its potential for information refinement. Extensions to and uncertainty aspects of the proposed method were discussed, with further research proposed.
|
26
|
Su Y, Reedy J, Carroll RJ. Clustering in General Measurement Error Models. Stat Sin 2018; 28:2337-2351. [PMID: 30636855 PMCID: PMC6329467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
This paper is dedicated to the memory of Peter G. Hall. It concerns a deceptively simple question: if one observes variables corrupted with measurement error of possibly very complex form, can one recreate asymptotically the clusters that would have been found had there been no measurement error? We show that the answer is yes, and that the solution is surprisingly simple and general. The method itself is to simulate, by computer, realizations with the same distribution as that of the true variables, and then to apply clustering to these realizations. Technically, we show that if one uses K-means clustering or any other risk minimizing clustering, and a multivariate deconvolution device with certain smoothness and convergence properties, then, in the limit, the cluster means based on our method converge to the same cluster means as if there is no measurement error. Along with the method and its technical justification, we analyze two important nutrition data sets, finding patterns that make sense nutritionally.
Affiliation(s)
- Ya Su
- Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143
| | - Jill Reedy
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD 20892
| | - Raymond J Carroll
- Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143, and School of Mathematical and Physical Sciences, University of Technology, Sydney, Broadway NSW 2007, Australia
| |
|
27
|
Shu D, Yi GY. Causal inference with measurement error in outcomes: Bias analysis and estimation methods. Stat Methods Med Res 2017; 28:2049-2068. [DOI: 10.1177/0962280217743777] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
Inverse probability weighting estimation has been popularly used to consistently estimate the average treatment effect. Its validity, however, is challenged by the presence of error-prone variables. In this paper, we explore the inverse probability weighting estimation with mismeasured outcome variables. We study the impact of measurement error for both continuous and discrete outcome variables and reveal interesting consequences of the naive analysis which ignores measurement error. When a continuous outcome variable is mismeasured under an additive measurement error model, the naive analysis may still yield a consistent estimator; when the outcome is binary, we derive the asymptotic bias in a closed-form. Furthermore, we develop consistent estimation procedures for practical scenarios where either validation data or replicates are available. With validation data, we propose an efficient method for estimation of average treatment effect; the efficiency gain is substantial relative to usual methods of using validation data. To provide protection against model misspecification, we further propose a doubly robust estimator which is consistent even when either the treatment model or the outcome model is misspecified. Simulation studies are reported to assess the performance of the proposed methods. An application to a smoking cessation dataset is presented.
Affiliation(s)
- Di Shu
- Department of Statistics and Actuarial Science, University of Waterloo, Ontario, Canada
| | - Grace Y Yi
- Department of Statistics and Actuarial Science, University of Waterloo, Ontario, Canada
| |
|
28
|
Zhang X, Wang H, Ma Y, Carroll RJ. Linear Model Selection when Covariates Contain Errors. J Am Stat Assoc 2017; 112:1553-1561. [PMID: 29416191 PMCID: PMC5798903 DOI: 10.1080/01621459.2016.1219262] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/13/2016] [Revised: 05/27/2016] [Indexed: 10/21/2022]
Abstract
Prediction precision is arguably the most relevant criterion of a model in practice and is often a sought after property. A common difficulty with covariates measured with errors is the impossibility of performing prediction evaluation on the data even if a model is completely given without any unknown parameters. We bypass this inherent difficulty by using special properties on moment relations in linear regression models with measurement errors. The end product is a model selection procedure that achieves the same optimality properties that are achieved in classical linear regression models without covariate measurement error. Asymptotically, the procedure selects the model with the minimum prediction error in general, and selects the smallest correct model if the regression relation is indeed linear. Our model selection procedure is useful in prediction when future covariates without measurement error become available, e.g., due to improved technology or better management and design of data collection procedures.
Affiliation(s)
- Xinyu Zhang
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
| | - Haiying Wang
- Department of Mathematics and Statistics, University of New Hampshire, Durham, NH 03824
| | - Yanyuan Ma
- Department of Statistics, Penn State University, State College, PA 16802
| | - Raymond J Carroll
- Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143, and School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway NSW 2007
| |
|
29
|
Dlugosz S, Mammen E, Wilke RA. Generalized partially linear regression with misclassified data and an application to labour market transitions. Comput Stat Data Anal 2017. [DOI: 10.1016/j.csda.2017.01.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
30
|
Xu Y, Kim JK, Li Y. Semiparametric estimation for measurement error models with validation data. Can J Stat 2017. [DOI: 10.1002/cjs.11314] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Affiliation(s)
- Yuhang Xu
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE 68583, U.S.A.
| | - Jae Kwang Kim
- Department of Statistics, Iowa State University, Ames, IA 50011, U.S.A.
- Department of Mathematical Sciences, KAIST, Daejeon, 34141, Korea
| | - Yehua Li
- Department of Statistics, Iowa State University, Ames, IA 50011, U.S.A.
| |
|
31
|
Agogo GO, van der Voet H, van ’t Veer P, Ferrari P, Muller DC, Sánchez-Cantalejo E, Bamia C, Braaten T, Knüppel S, Johansson I, van Eeuwijk FA, Boshuizen HC. A method for sensitivity analysis to assess the effects of measurement error in multiple exposure variables using external validation data. BMC Med Res Methodol 2016; 16:139. [PMID: 27737637 PMCID: PMC5064985 DOI: 10.1186/s12874-016-0240-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/30/2016] [Accepted: 10/05/2016] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND: Measurement error in self-reported dietary intakes is known to bias the association between dietary intake and a health outcome of interest such as risk of a disease. The association can be distorted further by mismeasured confounders, leading to invalid results and conclusions. It is, however, difficult to adjust for the bias in the association when there is no internal validation data. METHODS: We proposed a method to adjust for the bias in the diet-disease association (hereafter, association), due to measurement error in dietary intake and a mismeasured confounder, when there is no internal validation data. The method combines prior information on the validity of the self-report instrument with the observed data to adjust for the bias in the association. We compared the proposed method with the method that ignores the confounder effect, and with the method that ignores measurement errors completely. We assessed the sensitivity of the estimates to various magnitudes of measurement error, error correlations and uncertainty in the literature-reported validation data. We applied the methods to fruits and vegetables (FV) intakes, cigarette smoking (confounder) and all-cause mortality data from the European Prospective Investigation into Cancer and Nutrition study. RESULTS: Using the proposed method resulted in an approximately fourfold increase in the strength of association between FV intake and mortality. For weakly correlated errors, measurement error in the confounder minimally affected the hazard ratio estimate for FV intake. The effect was more pronounced for strong error correlations. CONCLUSIONS: The proposed method permits sensitivity analysis on measurement error structures and accounts for uncertainties in the reported validity coefficients. The method is useful in assessing the direction and quantifying the magnitude of bias in the association due to measurement errors in the confounders.
Affiliation(s)
- George O. Agogo
- Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands
- Department of Internal Medicine, Yale University, New Haven, USA
| | - Hilko van der Voet
- Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands
| | - Pieter van ’t Veer
- Department of Human Nutrition, Wageningen University and Research Centre, Wageningen, The Netherlands
| | - Pietro Ferrari
- Nutritional Epidemiology Group, International Agency for Research on Cancer, Lyon, France
| | - David C. Muller
- Genetic Epidemiology Group, International Agency for Research on Cancer, Lyon, France
| | | | - Christina Bamia
- Department of Hygiene, Epidemiology and Medical Statistics, University of Athens Medical School, Athens, Greece
| | - Tonje Braaten
- Department of Community Medicine, University of Tromsø, N-9037 Tromsø, Norway
| | - Sven Knüppel
- Department of Epidemiology, German Institute of Human Nutrition Potsdam-Rehbrücke, Nuthetal, Germany
| | | | - Fred A. van Eeuwijk
- Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands
| | - Hendriek C. Boshuizen
- Department of Statistics, mathematical modelling and data logistics, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
| |
|