1
Keogh RH, Shaw PA, Gustafson P, Carroll RJ, Deffner V, Dodd KW, Küchenhoff H, Tooze JA, Wallace MP, Kipnis V, Freedman LS. STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 1-Basic theory and simple methods of adjustment. Stat Med 2020; 39:2197-2231. [PMID: 32246539 PMCID: PMC7450672 DOI: 10.1002/sim.8532] [Citation(s) in RCA: 75] [Impact Index Per Article: 18.8] [Received: 12/09/2018] [Revised: 02/25/2020] [Accepted: 02/28/2020] [Indexed: 11/11/2022]
Abstract
Measurement error and misclassification of variables frequently occur in epidemiology and involve variables important to public health. Their presence can impact strongly on results of statistical analyses involving such variables. However, investigators commonly fail to pay attention to biases resulting from such mismeasurement. We provide, in two parts, an overview of the types of error that occur, their impacts on analytic results, and statistical methods to mitigate the biases that they cause. In this first part, we review different types of measurement error and misclassification, emphasizing the classical, linear, and Berkson models, and the concepts of nondifferential and differential error. We describe the impacts of these types of error in covariates and in outcome variables on various analyses, including estimation and testing in regression models and estimating distributions. We outline types of ancillary studies required to provide information about such errors and discuss the implications of covariate measurement error for study design. Methods for ascertaining sample size requirements are outlined, both for ancillary studies designed to provide information about measurement error and for main studies where the exposure of interest is measured with error. We describe two of the simpler methods, regression calibration and simulation extrapolation (SIMEX), that adjust for bias in regression coefficients caused by measurement error in continuous covariates, and illustrate their use through examples drawn from the Observing Protein and Energy Nutrition (OPEN) dietary validation study. Finally, we review software available for implementing these methods. The second part of the article deals with more advanced topics.
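As a rough illustration of the regression calibration idea mentioned in the abstract, the following Python sketch simulates one continuous covariate measured with classical, nondifferential error and corrects the attenuated slope using two replicate measurements. It is a minimal sketch on simulated data, not the authors' OPEN analysis: the function names, sample size, and variance values are assumptions chosen for illustration, and the correction shown applies to simple linear regression with a single error-prone covariate.

```python
import numpy as np


def simulate(n=20000, beta=1.0, sd_x=1.0, sd_u=1.0, seed=0):
    """Simulate outcome y and two replicates w1, w2 of an error-prone exposure.

    Classical error model (assumed): w = x + u, with u independent of x and y.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sd_x, n)          # true, unobserved exposure
    w1 = x + rng.normal(0.0, sd_u, n)     # first error-prone measurement
    w2 = x + rng.normal(0.0, sd_u, n)     # independent replicate
    y = beta * x + rng.normal(0.0, 1.0, n)
    return y, w1, w2


def slope(y, w):
    """Ordinary least squares slope of y on w."""
    c = np.cov(w, y)
    return c[0, 1] / c[0, 0]


def regression_calibration_slope(y, w1, w2):
    """Correct the naive slope using an attenuation factor estimated from replicates."""
    # Under classical error, var(w1 - w2) = 2 * var(u).
    var_u = np.var(w1 - w2, ddof=1) / 2.0
    var_w = np.var(w1, ddof=1)
    lam = (var_w - var_u) / var_w         # estimated var(x) / var(w)
    return slope(y, w1) / lam, lam


y, w1, w2 = simulate()
naive = slope(y, w1)
corrected, lam = regression_calibration_slope(y, w1, w2)
print(f"naive slope: {naive:.3f}")
print(f"attenuation factor: {lam:.3f}")
print(f"corrected slope: {corrected:.3f}")
```

With equal variances for the true exposure and the error, the attenuation factor is about 0.5, so the naive slope is roughly half the true slope of 1 and the corrected slope is close to it.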
Affiliation(s)
- Ruth H Keogh
  - Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Pamela A Shaw
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Paul Gustafson
  - Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
- Raymond J Carroll
  - Department of Statistics, Texas A&M University, College Station, Texas, USA
  - School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway, New South Wales, Australia
- Veronika Deffner
  - Statistical Consulting Unit StaBLab, Department of Statistics, Ludwig-Maximilians-Universität, Munich, Germany
- Kevin W Dodd
  - Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
- Helmut Küchenhoff
  - Department of Statistics, Statistical Consulting Unit StaBLab, Ludwig-Maximilians-Universität, Munich, Germany
- Janet A Tooze
  - Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Michael P Wallace
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Victor Kipnis
  - Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
- Laurence S Freedman
  - Biostatistics and Biomathematics Unit, Gertner Institute for Epidemiology and Health Policy Research, Tel Hashomer, Israel
  - Information Management Services Inc., Rockville, Maryland, USA
2
The Impact of Joint Misclassification of Exposures and Outcomes on the Results of Epidemiologic Research. Curr Epidemiol Rep 2018. [DOI: 10.1007/s40471-018-0147-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/26/2022]
3
Gribble MO, Karimi R, Feingold BJ, Nyland JF, O'Hara TM, Gladyshev MI, Chen CY. Mercury, selenium and fish oils in marine food webs and implications for human health. Journal of the Marine Biological Association of the United Kingdom 2016; 96:43-59. [PMID: 26834292 PMCID: PMC4720108 DOI: 10.1017/s0025315415001356] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 04/27/2015] [Accepted: 07/23/2015] [Indexed: 05/04/2023]
Abstract
Humans who eat fish are exposed to mixtures of healthful nutrients and harmful contaminants that are influenced by environmental and ecological factors. Marine fisheries are composed of a multitude of species with varying life histories, and harvested in oceans, coastal waters and estuaries where environmental and ecological conditions determine fish exposure to both nutrients and contaminants. Many of these nutrients and contaminants are thought to influence similar health outcomes (i.e., neurological, cardiovascular, immunological systems). Therefore, our understanding of the risks and benefits of consuming seafood requires balanced assessments of contaminants and nutrients found in fish and shellfish. In this paper, we review some of the reported benefits of fish consumption with a focus on the potential hazards of mercury exposure, and compare the environmental variability of fish oils, selenium and mercury in fish. A major scientific gap identified is that fish tissue concentrations are rarely measured for both contaminants and nutrients across a range of species and geographic regions. Interpreting the implications of seafood for human health will require a better understanding of these multiple exposures, particularly as environmental conditions in the oceans change.
Affiliation(s)
- Matthew O. Gribble
  - Department of Preventive Medicine, University of Southern California Keck School of Medicine, Los Angeles, CA, USA
- Roxanne Karimi
  - School of Marine and Atmospheric Sciences, Stony Brook University, Stony Brook, NY, USA
- Beth J. Feingold
  - Department of Environmental Health Sciences, University at Albany School of Public Health, State University of New York, Rensselaer, NY, USA
- Jennifer F. Nyland
  - Department of Pathology, Microbiology and Immunology, University of South Carolina School of Medicine, Columbia, SC, USA
- Todd M. O'Hara
  - Department of Veterinary Medicine, College of Natural Science and Mathematics, University of Alaska Fairbanks, Fairbanks, AK, USA
- Michail I. Gladyshev
  - Institute of Biophysics of Siberian Branch of Russian Academy of Sciences, Akademgorodok, Krasnoyarsk, Russia
  - Siberian Federal University, Krasnoyarsk, Russia
- Celia Y. Chen
  - Department of Biological Sciences, Dartmouth College, Hanover, NH, USA
4
Midthune D, Carroll RJ, Freedman LS, Kipnis V. Measurement error models with interactions. Biostatistics 2015; 17:277-90. [PMID: 26530858 DOI: 10.1093/biostatistics/kxv043] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/13/2014] [Accepted: 10/07/2015] [Indexed: 11/14/2022] Open
Abstract
An important use of measurement error models is to correct regression models for bias due to covariate measurement error. Most measurement error models assume that the observed error-prone covariate (W) is a linear function of the unobserved true covariate (X) plus other covariates (Z) in the regression model. In this paper, we consider models for W that include interactions between X and Z. We derive the conditional distribution of X given W and Z and use it to extend the method of regression calibration to this class of measurement error models. We apply the model to dietary data and test whether self-reported dietary intake includes an interaction between true intake and body mass index. We also perform simulations to compare the model to simpler approximate calibration models.
Affiliation(s)
- Douglas Midthune
  - Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Room 5E122, Bethesda, MD 20892, USA
- Raymond J Carroll
  - Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143, USA and School of Mathematical Sciences, University of Technology, Sydney, Broadway, NSW 2007, Australia
- Laurence S Freedman
  - Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, Tel Hashomer 52161, Israel
- Victor Kipnis
  - Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Room 5E118, Bethesda, MD 20892, USA
5
Valeri L, Lin X, VanderWeele TJ. Mediation analysis when a continuous mediator is measured with error and the outcome follows a generalized linear model. Stat Med 2014; 33:4875-90. [PMID: 25220625 PMCID: PMC4224977 DOI: 10.1002/sim.6295] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 04/22/2013] [Accepted: 08/17/2014] [Indexed: 11/09/2022]
Abstract
Mediation analysis is a popular approach to examine the extent to which the effect of an exposure on an outcome is through an intermediate variable (mediator) and the extent to which the effect is direct. When the mediator is mis-measured, the validity of mediation analysis can be severely undermined. In this paper, we first study the bias of classical, non-differential measurement error on a continuous mediator in the estimation of direct and indirect causal effects in generalized linear models when the outcome is either continuous or discrete and exposure-mediator interaction may be present. Our theoretical results as well as a numerical study demonstrate that in the presence of non-linearities, the bias of naive estimators for direct and indirect effects that ignore measurement error can take unintuitive directions. We then develop methods to correct for measurement error. Three correction approaches using method of moments, regression calibration, and SIMEX are compared. We apply the proposed method to the Massachusetts General Hospital lung cancer study to evaluate the effect of genetic variants mediated through smoking on lung cancer risk.
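SIMEX, one of the three correction approaches compared in this abstract, can be sketched for the much simpler case of a single error-prone covariate in linear regression. The Python below is a minimal illustration on simulated data, not the authors' mediation estimator: the function names, the grid of added-error multipliers, the number of simulations, and the assumption that the error standard deviation is known are all choices made for this example. It adds extra simulated error in increasing amounts, tracks how the naive slope degrades, and extrapolates a quadratic fit back to the no-error case.

```python
import numpy as np


def naive_slope(y, w):
    """Ordinary least squares slope of y on w, ignoring measurement error."""
    c = np.cov(w, y)
    return c[0, 1] / c[0, 0]


def simex_slope(y, w, sd_u, zetas=(0.0, 0.5, 1.0, 1.5, 2.0), n_sim=50, seed=1):
    """SIMEX estimate of the slope with a quadratic extrapolant.

    sd_u is the (assumed known) standard deviation of the classical error in w.
    """
    rng = np.random.default_rng(seed)
    mean_slopes = []
    for zeta in zetas:
        if zeta == 0.0:
            mean_slopes.append(naive_slope(y, w))
            continue
        # Simulation step: inflate the error variance to (1 + zeta) * sd_u**2.
        sims = [
            naive_slope(y, w + rng.normal(0.0, np.sqrt(zeta) * sd_u, w.size))
            for _ in range(n_sim)
        ]
        mean_slopes.append(np.mean(sims))
    # Extrapolation step: fit a quadratic in zeta and evaluate at zeta = -1,
    # which corresponds to zero measurement error variance.
    coefs = np.polyfit(zetas, mean_slopes, deg=2)
    return float(np.polyval(coefs, -1.0))


# Simulated data: true slope 1, classical error with the same variance as x.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(0.0, 1.0, n)
w = x + rng.normal(0.0, 1.0, n)
y = 1.0 * x + rng.normal(0.0, 1.0, n)

naive = naive_slope(y, w)
simex = simex_slope(y, w, sd_u=1.0)
print(f"naive slope: {naive:.3f}")
print(f"SIMEX slope: {simex:.3f}")
```

In this setting the quadratic extrapolant moves the estimate from about 0.5 toward the true slope of 1 but does not reach it, because the true relationship between the slope and the added error is not quadratic. The choice of extrapolation function therefore matters when the error variance is large.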
Affiliation(s)
- Linda Valeri
  - Department of Biostatistics and Epidemiology, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, U.S.A
6
Gene-environment dependence creates spurious gene-environment interaction. Am J Hum Genet 2014; 95:301-7. [PMID: 25152454 PMCID: PMC4157149 DOI: 10.1016/j.ajhg.2014.07.014] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Received: 07/02/2014] [Accepted: 07/31/2014] [Indexed: 01/21/2023] Open
Abstract
Gene-environment interactions have the potential to shed light on biological processes leading to disease and to improve the accuracy of epidemiological risk models. However, relatively few such interactions have yet been confirmed. In part this is because genetic markers such as tag SNPs are usually studied, rather than the causal variants themselves. Previous work has shown that this leads to substantial loss of power and increased sample size when gene and environment are independent. However, dependence between gene and environment can arise in several ways including mediation, pleiotropy, and confounding, and several examples of gene-environment interaction under gene-environment dependence have recently been published. Here we show that under gene-environment dependence, a statistical interaction can be present between a marker and environment even if there is no interaction between the causal variant and the environment. We give simple conditions under which there is no marker-environment interaction and note that they do not hold in general when there is gene-environment dependence. Furthermore, the gene-environment dependence applies to the causal variant and cannot be assessed from marker data. Gene-gene interactions are susceptible to the same problem if two causal variants are in linkage disequilibrium. In addition to existing concerns about mechanistic interpretations, we suggest further caution in reporting interactions for genetic markers.
7
Murad H, Kipnis V, Freedman LS. Estimating and testing interactions when explanatory variables are subject to non-classical measurement error. Stat Methods Med Res 2013; 25:1991-2013. [PMID: 24334284 DOI: 10.1177/0962280213509720] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]
Abstract
Assessing interactions in linear regression models when covariates have measurement error (ME) is complex. We previously described regression calibration (RC) methods that yield consistent estimators and standard errors for interaction coefficients of normally distributed covariates having classical ME. Here we extend normal based RC (NBRC) and linear RC (LRC) methods to a non-classical ME model, and describe more efficient versions that combine estimates from the main study and internal sub-study. We apply these methods to data from the Observing Protein and Energy Nutrition (OPEN) study. Using simulations we show that (i) for normally distributed covariates efficient NBRC and LRC were nearly unbiased and performed well with sub-study size ≥200; (ii) efficient NBRC had lower MSE than efficient LRC; (iii) the naïve test for a single interaction had type I error probability close to the nominal significance level, whereas efficient NBRC and LRC were slightly anti-conservative but more powerful; (iv) for markedly non-normal covariates, efficient LRC yielded less biased estimators with smaller variance than efficient NBRC. Our simulations suggest that it is preferable to use: (i) efficient NBRC for estimating and testing interaction effects of normally distributed covariates and (ii) efficient LRC for estimating and testing interactions for markedly non-normal covariates.
Affiliation(s)
- Havi Murad
  - Biostatistics Unit, Gertner Institute for Epidemiology and Health Policy Research, Tel-Hashomer, Israel
- Victor Kipnis
  - Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, MD, USA
- Laurence S Freedman
  - Biostatistics Unit, Gertner Institute for Epidemiology and Health Policy Research, Tel-Hashomer, Israel
8
Strand M, Sillau S, Grunwald GK, Rabinovitch N. Regression calibration for models with two predictor variables measured with error and their interaction, using instrumental variables and longitudinal data. Stat Med 2013; 33:470-87. [PMID: 23901041 DOI: 10.1002/sim.5904] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/30/2012] [Accepted: 06/18/2013] [Indexed: 11/11/2022]
Abstract
Regression calibration provides a way to obtain unbiased estimators of fixed effects in regression models when one or more predictors are measured with error. Recent development of measurement error methods has focused on models that include interaction terms between measured-with-error predictors, and separately, methods for estimation in models that account for correlated data. In this work, we derive explicit and novel forms of regression calibration estimators and associated asymptotic variances for longitudinal models that include interaction terms, when data from instrumental and unbiased surrogate variables are available but not the actual predictors of interest. The longitudinal data are fit using linear mixed models that contain random intercepts and account for serial correlation and unequally spaced observations. The motivating application involves a longitudinal study of exposure to two pollutants (predictors) - outdoor fine particulate matter and cigarette smoke - and their association in interactive form with levels of a biomarker of inflammation, leukotriene E4 (LTE4, outcome) in asthmatic children. Because the exposure concentrations could not be directly observed, we used measurements from a fixed outdoor monitor and urinary cotinine concentrations as instrumental variables, and we used concentrations of fine ambient particulate matter and cigarette smoke measured with error by personal monitors as unbiased surrogate variables. We applied the derived regression calibration methods to estimate coefficients of the unobserved predictors and their interaction, allowing for direct comparison of toxicity of the different pollutants. We used simulations to verify accuracy of inferential methods based on asymptotic theory.
Affiliation(s)
- Matthew Strand
  - Division of Biostatistics & Bioinformatics, National Jewish Health, Denver, CO, U.S.A.; Department of Biostatistics & Informatics, Colorado School of Public Health, University of Colorado Denver, Denver, CO, U.S.A
9
Score tests in the presence of errors in covariates in matched case-control studies. J Multivariate Anal 2013. [DOI: 10.1016/j.jmva.2012.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
10
Pollack AZ, Perkins NJ, Mumford SL, Ye A, Schisterman EF. Correlated biomarker measurement error: an important threat to inference in environmental epidemiology. Am J Epidemiol 2013; 177:84-92. [PMID: 23221725 DOI: 10.1093/aje/kws209] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022] Open
Abstract
Utilizing multiple biomarkers is increasingly common in epidemiology. However, the combined impact of correlated exposure measurement error, unmeasured confounding, interaction, and limits of detection (LODs) on inference for multiple biomarkers is unknown. We conducted data-driven simulations evaluating bias from correlated measurement error with varying reliability coefficients (R), odds ratios (ORs), levels of correlation between exposures and error, LODs, and interactions. Blood cadmium and lead levels in relation to anovulation served as the motivating example, based on findings from the BioCycle Study (2005-2007). For most scenarios, main-effect estimates for cadmium and lead with increasing levels of positively correlated measurement error created increasing downward or upward bias for OR > 1.00 and OR < 1.00, respectively, that was also a function of effect size. Some scenarios showed bias for cadmium away from the null. Results subject to LODs were similar. Bias for main and interaction effects ranged from -130% to 36% and from -144% to 84%, respectively. A closed-form continuous outcome case solution provides a useful tool for estimating the bias in logistic regression. Investigators should consider how measurement error and LODs may bias findings when examining biomarkers measured in the same medium, prepared with the same process, or analyzed using the same method.
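The abstract refers to a closed-form solution for the continuous outcome case. The authors' expression is not reproduced here; as a hedged stand-in, the Python sketch below uses the standard large-sample result for linear regression with additive measurement error that is independent of the true exposures and of the outcome: the naive coefficients converge to (Σx + Σu)⁻¹ Σx β, where Σx and Σu are the covariance matrices of the true exposures and of the errors. All numerical values and the function name are hypothetical and chosen only to show how correlation between the errors changes the bias.

```python
import numpy as np


def naive_limit(beta, sigma_x, sigma_u):
    """Large-sample limit of naive least squares coefficients.

    Assumes W = X + U with U independent of X and of the outcome error,
    so that cov(W) = sigma_x + sigma_u and cov(W, Y) = sigma_x @ beta.
    """
    return np.linalg.solve(sigma_x + sigma_u, sigma_x @ beta)


# Hypothetical setup: exposure 1 has a true effect, exposure 2 has none.
beta = np.array([1.0, 0.0])
sigma_x = np.array([[1.0, 0.5],
                    [0.5, 1.0]])

# Same error variances, with and without correlation between the two errors.
sigma_u_independent = np.array([[0.5, 0.0],
                                [0.0, 0.5]])
sigma_u_correlated = np.array([[0.5, 0.4],
                               [0.4, 0.5]])

limit_independent = naive_limit(beta, sigma_x, sigma_u_independent)
limit_correlated = naive_limit(beta, sigma_x, sigma_u_correlated)
print("independent errors:", limit_independent)
print("correlated errors: ", limit_correlated)
```

In this example the null exposure picks up a spurious positive coefficient when the errors are independent and a spurious negative one when they are positively correlated, while the coefficient of the exposure with a true effect is attenuated in both cases.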
Affiliation(s)
- A Z Pollack
  - Epidemiology Branch, Division of Epidemiology, Statistics, and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, USA.
11
Majeske KD, Lynch-Caris T, Brelin-Fornari J. Quantifying R² bias in the presence of measurement error. J Appl Stat 2010. [DOI: 10.1080/02664760902814542] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]