1
Xia M, Akakpo RM. A Bayesian approach to simultaneous adjustment of misclassification and missingness in categorical covariates. Stat Methods Med Res 2022; 31:1449-1469. [PMID: 35473473 DOI: 10.1177/09622802221094941]
Abstract
This study considers concurrent adjustment of misclassification and missingness in categorical covariates in regression models. Under various misclassification and missingness mechanisms, we derive a general mixture regression structure for regression models that can incorporate multiple surrogates of categorical covariates that are subject to misclassification and missingness. In simulation studies, we demonstrate that including observations with missingness and/or multiple surrogates of the covariate helps alleviate the efficiency loss caused by misclassification. In addition, we study the efficacy of misclassification adjustment when the number of categories increases for the covariate of interest. Using data from the Longitudinal Studies of HIV-Associated Lung Infections and Complications, we perform simultaneous adjustment of misclassification and missingness in the self-reported cocaine and heroin use variable when assessing its association with lung density measures.
Affiliation(s)
- Michelle Xia
- Department of Statistics and Actuarial Science, Northern Illinois University, DeKalb, IL 60115, USA
- Rexford M Akakpo
- Department of Statistics and Actuarial Science, Northern Illinois University, DeKalb, IL 60115, USA
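The mixture structure in the entry above rests on a law-of-total-probability identity: when a covariate X is observed only through a surrogate X*, the regression of Y on X* averages the true-covariate risks over the reverse misclassification probabilities, P(Y=1 | X*=j) = Σ_k P(Y=1 | X=k) P(X=k | X*=j). The Python sketch below illustrates that identity for a single surrogate, assuming nondifferential misclassification (X* independent of Y given X). It is not the authors' model, which also handles multiple surrogates and missingness; the function names and all numbers are hypothetical.

```python
import numpy as np


def surrogate_risks(risk_by_true, prev_true, misclass):
    """Return P(Y=1 | X*=j) for each surrogate level j (hypothetical helper).

    risk_by_true[k]: P(Y=1 | X=k); prev_true[k]: P(X=k);
    misclass[k, j]: P(X*=j | X=k), each row summing to 1.
    Assumes nondifferential misclassification (X* independent of Y given X).
    """
    risk_by_true = np.asarray(risk_by_true, dtype=float)
    prev_true = np.asarray(prev_true, dtype=float)
    misclass = np.asarray(misclass, dtype=float)
    # Joint distribution P(X=k, X*=j).
    joint = prev_true[:, None] * misclass
    # Reverse probabilities P(X=k | X*=j): normalize each column.
    reverse = joint / joint.sum(axis=0, keepdims=True)
    # Mixture: P(Y=1 | X*=j) = sum_k P(Y=1 | X=k) * P(X=k | X*=j).
    return reverse.T @ risk_by_true


def odds_ratio(p_exposed, p_unexposed):
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))


# Hypothetical binary example: sensitivity 0.8, specificity 0.9.
true_risks = [0.10, 0.25]
prevalence = [0.7, 0.3]
classification = [[0.9, 0.1],   # true X=0: P(X*=0), P(X*=1)
                  [0.2, 0.8]]   # true X=1: P(X*=0), P(X*=1)

observed = surrogate_risks(true_risks, prevalence, classification)
print("risks by surrogate level:", observed)
print("true OR:", odds_ratio(true_risks[1], true_risks[0]))
print("OR on the surrogate:", odds_ratio(observed[1], observed[0]))
```

With these inputs the true odds ratio is 3 while the odds ratio computed on the surrogate is roughly 2.16, the attenuation toward the null that misclassification adjustment aims to undo.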
2
Lash TL, Ahern TP, Collin LJ, Fox MP, MacLehose RF. Bias Analysis Gone Bad. Am J Epidemiol 2021; 190:1604-1612. [PMID: 33778845 DOI: 10.1093/aje/kwab072] [Received: 04/07/2020] [Accepted: 12/15/2020]
Abstract
Quantitative bias analysis comprises the tools used to estimate the direction, magnitude, and uncertainty from systematic errors affecting epidemiologic research. Despite the availability of methods and tools, and guidance for good practices, few reports of epidemiologic research incorporate quantitative estimates of bias impacts. The lack of familiarity with bias analysis allows for the possibility of misuse, which is likely most often unintentional but could occasionally include intentional efforts to mislead. We identified 3 examples of suboptimal bias analysis, one for each common bias. For each, we describe the original research and its bias analysis, compare the bias analysis with good practices, and describe how the bias analysis and research findings might have been improved. We assert no motive to the suboptimal bias analysis by the original authors. Common shortcomings in the examples were lack of a clear bias model, computed example, and computing code; poor selection of the values assigned to the bias model's parameters; and little effort to understand the range of uncertainty associated with the bias. Until bias analysis becomes more common, community expectations for the presentation, explanation, and interpretation of bias analyses will remain unstable. Attention to good practices should improve quality, avoid errors, and discourage manipulation.
3
Greenland S. Invited Commentary: Dealing With the Inevitable Deficiencies of Bias Analysis-and All Analyses. Am J Epidemiol 2021; 190:1617-1621. [PMID: 33778862 DOI: 10.1093/aje/kwab069] [Received: 01/26/2021] [Revised: 01/26/2021] [Accepted: 02/10/2021]
Abstract
Lash et al. (Am J Epidemiol. 2021;190(8):1604-1612) have presented detailed critiques of 3 bias analyses that they identify as "suboptimal." This identification raises the question of what "optimal" means for bias analysis, because it is practically impossible to do statistically optimal analyses of typical population studies, with or without bias analysis. At best the analysis can only attempt to satisfy practice guidelines and account for available information both within and outside the study. One should not expect a full accounting for all sources of uncertainty; hence, interval estimates and distributions for causal effects should never be treated as valid uncertainty assessments; they are instead only example analyses that follow from collections of often questionable assumptions. These observations reinforce those of Lash et al. and point to the need for more development of methods for judging bias-parameter distributions and utilization of available information.
4
Manuel CM, Sinha S, Wang S. Matched case-control data with a misclassified exposure: what can be done with instrumental variables? Biostatistics 2021; 22:1-18. [PMID: 31086943 DOI: 10.1093/biostatistics/kxz012] [Received: 07/20/2018] [Revised: 01/13/2019] [Accepted: 04/02/2019]
Abstract
Matched case-control studies are used for finding the association between a disease and an exposure after controlling for the effect of important confounding variables. It is well known that the disease-exposure association parameter estimators are biased when the exposure is misclassified, and a matched case-control study is no exception. Any bias correction method relies on validation data that contain the true exposure and the misclassified exposure value, and in turn the validation data help to estimate the misclassification probabilities. The question is what we can do when there are no validation data and no prior knowledge on the misclassification probabilities, but some instrumental variables are observed. To answer this previously unexplored question, we propose two methods of reducing the exposure misclassification bias in the analysis of matched case-control data when instrumental variables are measured for each subject of the study. The significance of these approaches is that the proposed methods are designed to work without any validation data, which often are not available when the true exposure is impossible or too costly to measure. A simulation study explores different types of instrumental variable scenarios and investigates when the proposed methods work, and how much bias can be reduced. For the purpose of illustration, we apply the methods to nested case-control data sampled from the 1989 US birth registry.
Affiliation(s)
- Christopher M Manuel
- Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA
- Samiran Sinha
- Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA
- Suojin Wang
- Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA
5
Shaw PA, Gustafson P, Carroll RJ, Deffner V, Dodd KW, Keogh RH, Kipnis V, Tooze JA, Wallace MP, Küchenhoff H, Freedman LS. STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 2-More complex methods of adjustment and advanced topics. Stat Med 2020; 39:2232-2263. [PMID: 32246531 PMCID: PMC7272296 DOI: 10.1002/sim.8531] [Received: 12/09/2018] [Revised: 02/27/2020] [Accepted: 02/28/2020]
Abstract
We continue our review of issues related to measurement error and misclassification in epidemiology. We further describe methods of adjusting for biased estimation caused by measurement error in continuous covariates, covering likelihood methods, Bayesian methods, moment reconstruction, moment-adjusted imputation, and multiple imputation. We then describe which methods can also be used with misclassification of categorical covariates. Methods of adjusting estimation of distributions of continuous variables for measurement error are then reviewed. Illustrative examples are provided throughout these sections. We provide lists of available software for implementing these methods and also provide the code for implementing our examples in the Supporting Information. Next, we present several advanced topics, including data subject to both classical and Berkson error, modeling continuous exposures with measurement error and categorical exposures with misclassification in the same model, variable selection when some of the variables are measured with error, adjusting analyses or design for error in an outcome variable, and categorizing continuous variables measured with error. Finally, we provide some advice for the frequently encountered situations where variables are known to be measured with substantial error, but there is only an external reference standard or partial (or no) information about the type or magnitude of the error.
Affiliation(s)
- Pamela A Shaw
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Paul Gustafson
- Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
- Raymond J Carroll
- Department of Statistics, Texas A&M University, College Station, Texas, USA
- School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway, New South Wales, Australia
- Veronika Deffner
- Statistical Consulting Unit StaBLab, Department of Statistics, Ludwig-Maximilians-Universität, Munich, Germany
- Kevin W Dodd
- Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
- Ruth H Keogh
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Victor Kipnis
- Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
- Janet A Tooze
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Michael P Wallace
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Helmut Küchenhoff
- Statistical Consulting Unit StaBLab, Department of Statistics, Ludwig-Maximilians-Universität, Munich, Germany
- Laurence S Freedman
- Biostatistics and Biomathematics Unit, Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, Tel Hashomer, Israel
- Information Management Services Inc., Rockville, Maryland, USA
6
Gustafson P, Karim ME. When exposure is subject to nondifferential misclassification, are validation data helpful in testing for an exposure–disease association? Can J Stat 2019. [DOI: 10.1002/cjs.11490]
Affiliation(s)
- Paul Gustafson
- Department of Statistics, University of British Columbia, Vancouver, Canada
- Mohammad Ehsanul Karim
- School of Population and Public Health, University of British Columbia, Vancouver, Canada
- Centre for Health Evaluation and Outcome Sciences, Providence Health Care, Vancouver, Canada
7
The impact of maternal smoking during pregnancy on childhood asthma: adjusted for exposure misclassification; results from the National Health and Nutrition Examination Survey, 2011-2012. Ann Epidemiol 2018; 28:697-703. [PMID: 30150159 DOI: 10.1016/j.annepidem.2018.07.011] [Received: 11/23/2017] [Revised: 06/07/2018] [Accepted: 07/23/2018]
Abstract
PURPOSE We sought to examine the association between childhood asthma and self-reported maternal smoking during pregnancy (MSDP) after adjusting for a range of exposure misclassification scenarios using a Bayesian approach that incorporated exposure misclassification probability estimates from the literature. METHODS Self-reported MSDP and asthma data were extracted from the National Health and Nutrition Examination Survey 2011-2012. The association between self-reported MSDP and asthma was adjusted for exposure misclassification using a Bayesian bias model approach. RESULTS We included 3074 subjects who were 1-15 years of age, including 492 asthma cases. The mean (SD) age of the participants was 8.5 (4.1) and 7.1 (4.2) years, and the number (percentage) of females was 205 (42%) and 1314 (51%) among the asthmatic and nonasthmatic groups, respectively. The odds ratio (OR) for the association between self-reported MSDP and asthma in logistic regression adjusted for confounders was 1.28 (95% confidence interval: 0.92, 1.77). In a Bayesian analysis that adjusted for exposure misclassification using external data, we found different ORs between MSDP and asthma by applying different priors (posterior ORs 0.90 [95% credible interval {CRI}: 0.47, 1.60] to 3.05 [95% CRI: 1.73, 5.53] in differential and 1.22 [95% CRI: 0.62, 2.25] to 1.60 [95% CRI: 1.18, 2.19] in nondifferential misclassification settings). CONCLUSIONS Given the assumptions and the accuracy of the bias model, the estimated effect of MSDP on asthma after adjusting for misclassification was strengthened in many scenarios.
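The bias model in the study above is Bayesian, but its core step can be seen in the deterministic "simple bias analysis" that fixes sensitivity (Se) and specificity (Sp) instead of giving them priors. Because the observed number exposed in a group of size N is a = Se·A + (1 - Sp)(N - A), the expected true count is A = (a - (1 - Sp)N)/(Se + Sp - 1). The Python sketch below applies this back-calculation to a hypothetical case-control table. The counts, bias parameters and function names are invented for illustration, confounder adjustment is ignored, and this is a stand-in for, not a reproduction of, the authors' analysis.

```python
def corrected_exposed(observed_exposed, total, se, sp):
    """Expected true number exposed in one group, given fixed Se and Sp.

    Solves observed_exposed = se * A + (1 - sp) * (total - A) for A.
    """
    if se + sp <= 1:
        raise ValueError("bias model requires se + sp > 1")
    true_exposed = (observed_exposed - (1 - sp) * total) / (se + sp - 1)
    if not 0 < true_exposed < total:
        raise ValueError("these Se/Sp values are incompatible with the observed counts")
    return true_exposed


def corrected_odds_ratio(cases_exposed, n_cases, controls_exposed, n_controls,
                         se_cases, sp_cases, se_controls, sp_controls):
    """Exposure OR after back-correcting each group; differential Se/Sp allowed."""
    a = corrected_exposed(cases_exposed, n_cases, se_cases, sp_cases)
    b = corrected_exposed(controls_exposed, n_controls, se_controls, sp_controls)
    return (a / (n_cases - a)) / (b / (n_controls - b))


# Hypothetical table: 120/500 cases and 150/1000 controls report exposure.
crude = corrected_odds_ratio(120, 500, 150, 1000, 1.0, 1.0, 1.0, 1.0)
nondiff = corrected_odds_ratio(120, 500, 150, 1000, 0.8, 0.95, 0.8, 0.95)
print(f"crude OR {crude:.2f}, corrected OR {nondiff:.2f}")
```

Under these hypothetical nondifferential values the odds ratio moves from about 1.79 to about 2.21, away from the null; passing different Se/Sp values for cases and controls gives a differential scenario. A Bayesian bias model replaces the fixed Se and Sp with prior distributions and propagates their uncertainty into the posterior.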
8
Xia M, Gustafson P. Bayesian regression models adjusting for unidirectional covariate misclassification. Can J Stat 2016. [DOI: 10.1002/cjs.11284]
Affiliation(s)
- Michelle Xia
- Division of Statistics, Northern Illinois University, DeKalb, IL, USA
- Paul Gustafson
- Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
9
Mak TSH, Best N, Rushton L. Robust Bayesian sensitivity analysis for case-control studies with uncertain exposure misclassification probabilities. Int J Biostat 2016; 11:135-49. [PMID: 25720128 DOI: 10.1515/ijb-2013-0044]
Abstract
Exposure misclassification in case-control studies leads to bias in odds ratio estimates. There has been considerable recent interest in accounting for misclassification in estimation, both to adjust for bias and to quantify uncertainty more accurately. These methods require users to elicit suitable values or prior distributions for the misclassification probabilities. When exposure misclassification is highly uncertain, these methods are of limited use, because the resulting posterior uncertainty intervals tend to be too wide to be informative. Posterior inference also becomes very dependent on the subjectively elicited prior distribution. In this paper, we propose an alternative "robust Bayesian" approach, where instead of eliciting prior distributions for the misclassification probabilities, a feasible region is given. The extrema of posterior inference within the region are sought using an inequality-constrained optimization algorithm. This method enables sensitivity analyses to be conducted in a useful way, as we do not need to restrict all of our unknown parameters to fixed values but can instead consider ranges of values at a time.
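The idea of replacing a single prior with a feasible region can be previewed with a much simpler deterministic calculation: instead of fixing Se and Sp, scan a rectangle of plausible values and report the smallest and largest bias-corrected odds ratio. The Python sketch below does this with a grid search, assuming nondifferential misclassification. The authors instead locate extrema of posterior inference with an inequality-constrained optimization algorithm, so this grid version, its function names, and the example counts and bounds are illustrative assumptions only.

```python
import itertools

import numpy as np


def corrected_or(cases_exposed, n_cases, controls_exposed, n_controls, se, sp):
    """Nondifferential-misclassification-corrected OR, or None if infeasible."""
    denom = se + sp - 1
    if denom <= 0:
        return None
    a = (cases_exposed - (1 - sp) * n_cases) / denom
    b = (controls_exposed - (1 - sp) * n_controls) / denom
    if not (0 < a < n_cases and 0 < b < n_controls):
        return None  # these Se/Sp values could not have produced the data
    return (a / (n_cases - a)) / (b / (n_controls - b))


def or_range(table, se_bounds, sp_bounds, steps=41):
    """Smallest and largest corrected OR over a rectangular (Se, Sp) grid."""
    values = []
    grid = itertools.product(np.linspace(*se_bounds, steps),
                             np.linspace(*sp_bounds, steps))
    for se, sp in grid:
        value = corrected_or(*table, se, sp)
        if value is not None:
            values.append(value)
    if not values:
        raise ValueError("no feasible (Se, Sp) pair in the region")
    return min(values), max(values)


# Hypothetical counts: (cases exposed, cases, controls exposed, controls).
table = (120, 500, 150, 1000)
low, high = or_range(table, se_bounds=(0.7, 0.9), sp_bounds=(0.9, 0.99))
print(f"corrected OR ranges from {low:.2f} to {high:.2f}")
```

With these hypothetical bounds the corrected odds ratio spans roughly 1.87 to 3.35, the kind of range-based statement the robust approach aims for; a finer grid or a constrained optimizer would pin the extrema down more precisely.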
10
Karim ME, Gustafson P. Hypothesis Testing for an Exposure–Disease Association in Case–Control Studies Under Nondifferential Exposure Misclassification in the Presence of Validation Data: Bayesian and Frequentist Adjustments. Stat Biosci 2016. [DOI: 10.1007/s12561-015-9141-9]
11
12
van Gelder MMHJ, Donders ART, Devine O, Roeleveld N, Reefhuis J. Using Bayesian models to assess the effects of under-reporting of cannabis use on the association with birth defects, National Birth Defects Prevention Study, 1997-2005. Paediatr Perinat Epidemiol 2014; 28:424-33. [PMID: 25155701 PMCID: PMC4532339 DOI: 10.1111/ppe.12140]
Abstract
BACKGROUND Studies on associations between periconceptional cannabis exposure and birth defects have mainly relied on self-reported exposure. Therefore, the results may be biased due to under-reporting of the exposure. The aim of this study was to quantify the potential effects of this form of exposure misclassification. METHODS Using multivariable logistic regression, we re-analysed associations between periconceptional cannabis use and 20 specific birth defects using data from the National Birth Defects Prevention Study from 1997-2005 for 13 859 case infants and 6556 control infants. For seven birth defects, we implemented four Bayesian models based on various assumptions concerning the sensitivity of self-reported cannabis use to estimate odds ratios (ORs), adjusted for confounding and under-reporting of the exposure. We used information on sensitivity of self-reported cannabis use from the literature for prior assumptions. RESULTS The results unadjusted for under-reporting of the exposure showed an association between cannabis use and anencephaly (posterior OR 1.9 [95% credible interval (CRI) 1.1, 3.2]) which persisted after adjustment for potential exposure misclassification. Initially, no statistically significant associations were observed between cannabis use and the other birth defect categories studied. Although adjustment for under-reporting did not notably change these effect estimates, cannabis use was associated with esophageal atresia (posterior OR 1.7 [95% CRI 1.0, 2.9]), diaphragmatic hernia (posterior OR 1.8 [95% CRI 1.1, 3.0]), and gastroschisis (posterior OR 1.7 [95% CRI 1.2, 2.3]) after correction for exposure misclassification. CONCLUSIONS Under-reporting of the exposure may have obscured some cannabis-birth defect associations in previous studies. However, the resulting bias is likely to be limited.
Affiliation(s)
- T. Donders
- Department for Health Evidence, Radboud university medical center, Nijmegen, The Netherlands
- Owen Devine
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
- Nel Roeleveld
- Department for Health Evidence, Radboud university medical center, Nijmegen, The Netherlands
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
- Jennita Reefhuis
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
13
Hamra G, MacLehose R, Richardson D. Markov chain Monte Carlo: an introduction for epidemiologists. Int J Epidemiol 2013; 42:627-34. [PMID: 23569196 DOI: 10.1093/ije/dyt043]
Abstract
Markov Chain Monte Carlo (MCMC) methods are increasingly popular among epidemiologists. The reason for this may in part be that MCMC offers an appealing approach to handling some difficult types of analyses. Additionally, MCMC methods are those most commonly used for Bayesian analysis. However, epidemiologists are still largely unfamiliar with MCMC. They may lack familiarity either with the implementation of MCMC or with interpretation of the resultant output. As with tutorials outlining the calculus behind maximum likelihood in previous decades, a simple description of the machinery of MCMC is needed. We provide an introduction to conducting analyses with MCMC, and show that, given the same data and under certain model specifications, the results of an MCMC simulation match those of methods based on standard maximum-likelihood estimation (MLE). In addition, we highlight examples of instances in which MCMC approaches to data analysis provide a clear advantage over MLE. We hope that this brief tutorial will encourage epidemiologists to consider MCMC approaches as part of their analytic tool-kit.
Affiliation(s)
- Ghassan Hamra
- Division of Environment and Radiation, International Agency for Research on Cancer, Lyon, France.
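The tutorial's central claim, that an MCMC run can reproduce a maximum-likelihood answer under suitable model specifications, can be checked on a toy problem. The Python sketch below runs a random-walk Metropolis sampler for the log-odds of a binomial proportion, assuming a flat prior on the log-odds so that the posterior is proportional to the likelihood; the posterior mean and standard deviation should then land close to the MLE and its standard error. The data (30 events in 100 trials), step size, chain length and burn-in are hypothetical choices, and the code is not taken from the tutorial.

```python
import math
import random
import statistics


def log_likelihood(theta, events, trials):
    """Binomial log-likelihood for log-odds theta (additive constant dropped)."""
    return events * theta - trials * math.log1p(math.exp(theta))


def metropolis(events, trials, n_iter=20000, step=0.5, burn_in=5000, seed=1):
    """Random-walk Metropolis draws of theta under a flat prior on theta."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_likelihood(theta, events, trials)
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_likelihood(proposal, events, trials)
        # Accept with probability min(1, L(proposal) / L(theta)); the flat prior cancels.
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        draws.append(theta)
    return draws[burn_in:]  # discard the burn-in period


events, trials = 30, 100
draws = metropolis(events, trials)
mle = math.log(events / (trials - events))
mle_se = math.sqrt(1 / events + 1 / (trials - events))
print(f"MCMC mean {statistics.mean(draws):.3f} (sd {statistics.stdev(draws):.3f})")
print(f"MLE       {mle:.3f} (se {mle_se:.3f})")
```

The small remaining gap between the posterior mean and the MLE reflects Monte Carlo error and the slight skew of the posterior on the log-odds scale; under a flat prior it is the posterior mode, not the mean, that coincides exactly with the MLE.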
14
Abstract
Sparse-data problems are common, and approaches are needed to evaluate the sensitivity of parameter estimates based on sparse data. We propose a Bayesian approach that uses weakly informative priors to quantify sensitivity of parameters to sparse data. The weakly informative prior is based on accumulated evidence regarding the expected magnitude of relationships using relative measures of disease association. We illustrate the use of weakly informative priors with an example of the association of lifetime alcohol consumption and head and neck cancer. When data are sparse and the observed information is weak, a weakly informative prior will shrink parameter estimates toward the prior mean. Additionally, the example shows that when data are not sparse and the observed information is not weak, a weakly informative prior is not influential. Advancements in implementation of Markov Chain Monte Carlo simulation make this sensitivity analysis easily accessible to the practicing epidemiologist.
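The shrinkage behavior described above can be seen without MCMC by using a normal approximation: treat the estimated log odds ratio as normal with its standard error, give the true log odds ratio a normal prior, and combine the two by precision weighting. The Python sketch below does this. The prior (mean 0, with 95% of its mass on odds ratios between 1/4 and 4) and both example estimates are hypothetical, and the closed-form conjugate update is a simplification standing in for the MCMC approach the abstract mentions.

```python
import math

# Hypothetical weakly informative prior: log OR ~ Normal(0, sd) with
# 95% of the prior mass between OR = 1/4 and OR = 4.
PRIOR_MEAN = 0.0
PRIOR_SD = math.log(4) / 1.96


def posterior_log_or(estimate, se, prior_mean=PRIOR_MEAN, prior_sd=PRIOR_SD):
    """Normal-normal (precision-weighted) posterior mean and sd for a log OR."""
    data_precision = 1.0 / se ** 2
    prior_precision = 1.0 / prior_sd ** 2
    post_var = 1.0 / (data_precision + prior_precision)
    post_mean = post_var * (data_precision * estimate + prior_precision * prior_mean)
    return post_mean, math.sqrt(post_var)


# Sparse data: large OR with a wide standard error.
sparse_mean, sparse_sd = posterior_log_or(math.log(8.0), se=1.2)
# Ample data: moderate OR with a narrow standard error.
ample_mean, ample_sd = posterior_log_or(math.log(1.5), se=0.1)

print(f"sparse: OR 8.00 -> {math.exp(sparse_mean):.2f}")
print(f"ample:  OR 1.50 -> {math.exp(ample_mean):.2f}")
```

Under these assumptions the sparse estimate of 8 is pulled down to about 1.7, while the well-estimated odds ratio of 1.5 barely moves (about 1.49), matching the qualitative pattern the abstract reports: the prior matters when the data are weak and is not influential when they are not.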
15
de Vocht F, Cherry N, Wakefield J. A Bayesian mixture modeling approach for assessing the effects of correlated exposures in case-control studies. J Expo Sci Environ Epidemiol 2012; 22:352-60. [PMID: 22588215 DOI: 10.1038/jes.2012.22]
Abstract
Predisposition to a disease is usually caused by cumulative effects of a multitude of exposures and lifestyle factors in combination with individual susceptibility. Failure to include all relevant variables may result in biased risk estimates and decreased power, whereas inclusion of all variables may lead to computational difficulties, especially when variables are correlated. We describe a Bayesian Mixture Model (BMM) incorporating a variable-selection prior and compare its performance with a logistic multiple regression model (LM) in simulated case-control data with up to twenty exposures with varying prevalences and correlations. In addition, as a practical example, we reanalyzed data on male infertility and occupational exposures (Chaps-UK). BMM mean-squared errors (MSE) were smaller than those of the LM and were independent of the number of model parameters. BMM type I errors were minimal (≤1), whereas for the LM they increased with the number of parameters and the correlation between exposures. The numbers of type II errors were comparable. Reanalysis of the Chaps-UK data demonstrated more convincingly than an LM that occupational exposure to glycol ethers and VOCs are likely risk factors for male infertility. This BMM proves to be an appealing alternative to standard logistic regression when dealing with the analysis of (correlated) exposures in case-control studies.
Affiliation(s)
- Frank de Vocht
- Centre for Occupational and Environmental Health, School of Community Based Medicine, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK.