1
Camero G, Villamizar G, Pombo LM, Saba M, Frank AL, Teherán AA, Acero GM. Epidemiology of Asbestosis between 2010-2014 and 2015-2019 Periods in Colombia: Descriptive Study. Ann Glob Health 2023; 89:54. [PMID: 37637467 PMCID: PMC10453953 DOI: 10.5334/aogh.3963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 06/26/2023] [Indexed: 08/29/2023]
Abstract
Background Asbestosis is a prevalent worldwide problem, but data from developing countries are scarce. We describe the sociodemographic characteristics and patterns in the occurrence of care provided for asbestosis in Colombia during the periods 2010-2014 and 2015-2019 to establish the behavior, trends, and variables associated with concentrations among people attended for asbestosis. Methods A retrospective descriptive study was carried out with data from the Integrated Social Protection Information System (SISPRO) for two 5-year periods. People attended for asbestosis (ICD-10: J61) were identified; the frequency of patient visits, sociodemographic characteristics, case distribution patterns, and trends in both five-year periods were described, as were the crude frequency (cFr, 95% CI) of asbestosis (per 1,000,000 people/year) in each five-year period and the ratio between periods (cFr ratio, 95% CI). Results During the period 2010-2019, 765 people attended for asbestosis were identified; there were 308 people attended for asbestosis between 2010-2014 (cFr: 2.20, 1.96-2.47), and there were 457 people attended for asbestosis between 2015-2019 (cFr: 3.14, 2.92-3.50). In both periods, the estimated cFr in men was nine times the estimated cFr in women. The cFr increased in the 2015-2019 period (cFr_ratio: 1.23, 1.06-1.43). Compared with the 2010-2014 period, the cFr of asbestosis increased in women (cFr_ratio: 1.44, 1.03-2.01), in the Andean (cFr_ratio: 1.61, 1.35-1.95) and Caribbean regions (cFr_ratio: 1.66, 1.21-2.30), in the urban area (cFr_ratio: 1.24, 1.05-1.48), and in the age groups 45-59 years (cFr_ratio: 1.34, 1.001-1.79) and ≥60 years (cFr_ratio: 1.43, 1.13-1.83). Discussion During two five-year periods, the cFr of asbestosis was higher in men; between the first and second five-year periods, it increased significantly, especially in urbanized geographic areas and in populations aged ≥45 years. The estimates possibly reflect the effect of disease latency or the expected impact of public health policies to monitor asbestos exposure and complications.
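A crude frequency ratio of this kind compares two rates of cases per person-time. The following is a minimal sketch of one common way to compute such a ratio with a Wald-type 95% CI on the log scale. The abstract does not say which interval method the authors used, so the method here is an assumption, and the person-year denominators are placeholder values rather than SISPRO figures. Only the case counts (308 and 457) come from the abstract.

```python
import math


def rate_ratio(cases_ref, py_ref, cases_cmp, py_cmp, z=1.96):
    """Ratio of two crude rates with a log-scale Wald confidence interval.

    cases_* are event counts, py_* are person-years at risk.
    The standard error of log(ratio) depends only on the two counts.
    """
    rate_ref = cases_ref / py_ref
    rate_cmp = cases_cmp / py_cmp
    rr = rate_cmp / rate_ref
    se_log_rr = math.sqrt(1.0 / cases_ref + 1.0 / cases_cmp)
    lower = rr * math.exp(-z * se_log_rr)
    upper = rr * math.exp(z * se_log_rr)
    return rr, lower, upper


# Case counts are taken from the abstract; the person-years are HYPOTHETICAL
# placeholders chosen only so that the example runs.
PY_2010_2014 = 230_000_000
PY_2015_2019 = 240_000_000

rr, lower, upper = rate_ratio(308, PY_2010_2014, 457, PY_2015_2019)
print(f"rate ratio = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the placeholder denominators are not the study's, the printed ratio is not expected to reproduce the published estimate.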
Affiliation(s)
- Gabriel Camero
- Cruz Roja Colombiana—Seccional Cundinamarca-Bogotá, Grupo de Investigación Emergencias, Desastres y Ayuda Humanitaria, Cruz Roja Cundinamarca y Bogotá, USA
- Luis M. Pombo
- Fundación Universitaria Juan N. Corpas, Grupos de Investigación COMPLEXUS, GIFVTA, Colombia
- Manuel Saba
- Universidad de Cartagena, Facultad de Ingeniería. Grupo de Investigación de Modelación Ambiental (GIMA), Cartagena, Colombia
- Aníbal A. Teherán
- Fundación Universitaria Juan N. Corpas, Grupos de Investigación COMPLEXUS, GIFVTA, Colombia
- Cruz Roja Colombiana—Seccional Cundinamarca-Bogotá, Grupo de Investigación Emergencias, Desastres y Ayuda Humanitaria, Cruz Roja Cundinamarca y Bogotá, Colombia
2
Wang MC, Zhu Y. Bias correction via outcome reassignment for cross-sectional data with binary disease outcome. Lifetime Data Analysis 2022; 28:659-674. [PMID: 35748999 DOI: 10.1007/s10985-022-09559-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Accepted: 06/06/2022] [Indexed: 06/15/2023]
Abstract
Cross-sectionally sampled data with binary disease outcome are commonly analyzed in observational studies to identify the relationship between covariates and disease outcome. A cross-sectional population is defined as a population of living individuals at the sampling or observational time. It is generally understood that binary disease outcome from cross-sectional data contains less information than longitudinally collected time-to-event data, but there is insufficient understanding as to whether bias can possibly exist in cross-sectional data and how the bias is related to the population risk of interest. Wang and Yang (2021) presented the complexity and bias in cross-sectional data with binary disease outcome with detailed analytical explorations into the data structure. As the distribution of the cross-sectional binary outcome is quite different from the population risk distribution, bias can arise when using cross-sectional data analysis to draw inference for population risk. In this paper we argue that the commonly adopted age-specific risk probability is biased for the estimation of population risk and propose an outcome reassignment approach which reassigns a portion of the observed binary outcome, 0 or 1, to the other disease category. A sign test and a semiparametric pseudo-likelihood method are developed for analyzing cross-sectional data using the OR approach. Simulations and an analysis based on Alzheimer's Disease data are presented to illustrate the proposed methods.
Affiliation(s)
- Mei-Cheng Wang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA.
- Yuxin Zhu
- Department of Neurology, Johns Hopkins School of Medicine, Baltimore, USA
3
Owora AH. Maternal major depression disorder misclassification errors: Remedies for valid individual- and population-level inference. Brain Behav 2022; 12:e2614. [PMID: 35587518 PMCID: PMC9226807 DOI: 10.1002/brb3.2614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Revised: 04/17/2022] [Accepted: 04/21/2022] [Indexed: 11/10/2022]
Abstract
Individual- and population-level inference about the risk and burden of major depressive disorder (MDD), particularly maternal MDD, is often made using case-finding tools that are imperfect and prone to misclassification error (i.e. false positives and negatives). These errors or biases are rarely accounted for and lead to inappropriate clinical decisions, inefficient allocation of scarce resources, and poor planning of maternal MDD prevention and treatment interventions. The argument that the use of existing maternal MDD case-finding instruments results in misclassification errors is not new; in fact, it has been argued for decades, but by and large its implications, and particularly how to correct for these errors for valid inference, remain unexplored. Correction of the estimates of maternal MDD prevalence and of case-finding tool sensitivity and specificity is possible and should be done to inform valid individual- and population-level inferences.
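One standard way to correct an apparent prevalence for known tool sensitivity and specificity is the Rogan-Gladen estimator. The sketch below is offered only as an illustration of the kind of correction the abstract refers to. The abstract does not name a specific method, and the sensitivity, specificity, and prevalence values used here are hypothetical.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct an apparent (test-positive) prevalence for misclassification.

    true_prevalence = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    The result is clipped to [0, 1] because sampling noise can push the raw
    estimate outside that range.
    """
    youden = sensitivity + specificity - 1.0
    if youden <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent_prevalence + specificity - 1.0) / youden
    return min(1.0, max(0.0, estimate))


# HYPOTHETICAL screening tool: sensitivity 0.80, specificity 0.90.
# If the true prevalence were 0.10, the tool would flag
# 0.10 * 0.80 + 0.90 * 0.10 = 0.17 of the population.
apparent = 0.10 * 0.80 + (1 - 0.10) * (1 - 0.90)
print(round(rogan_gladen(apparent, 0.80, 0.90), 4))
```

The correction is only as good as the assumed sensitivity and specificity, so in practice it is often repeated over a range of plausible values.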
Affiliation(s)
- Arthur H Owora
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University, Bloomington, Indiana
4
Abstract
Purpose of review Epidemiologists frequently must handle competing events, which prevent the event of interest from occurring. We review considerations for handling competing events when interpreting results causally. Recent findings When interpreting statistical associations as causal effects, we recommend following a causal inference "roadmap" as one would in an analysis without competing events. There are, however, special considerations to be made for competing events when choosing the causal estimand that best answers the question of interest, selecting the statistical estimand (e.g. the cause-specific or subdistribution) that will target that causal estimand, and assessing whether causal identification conditions (e.g., conditional exchangeability, positivity, and consistency) have been sufficiently met. Summary When doing causal inference in the competing events setting, it is critical to first ascertain the relevant question and the causal estimand that best answers it, with the choice often being between estimands that do and do not eliminate competing events.
Affiliation(s)
- Jacqueline E Rudolph
- Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh
- Ashley I Naimi
- Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh
5
Linet MS, Schubauer-Berigan MK, Berrington de González A. Outcome Assessment in Epidemiological Studies of Low-Dose Radiation Exposure and Cancer Risks: Sources, Level of Ascertainment, and Misclassification. J Natl Cancer Inst Monogr 2020; 2020:154-175. [PMID: 32657350 PMCID: PMC8454197 DOI: 10.1093/jncimonographs/lgaa007] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/29/2019] [Accepted: 02/18/2020] [Indexed: 12/31/2022]
Abstract
BACKGROUND Outcome assessment problems and errors that could lead to biased risk estimates in low-dose radiation epidemiological studies of cancer risks have not been systematically evaluated. METHODS Incidence or mortality risks for all cancers or all solid cancers combined and for leukemia were examined in 26 studies published in 2006-2017 involving low-dose (mean dose ≤100 mGy) radiation from environmental, medical, or occupational sources. We evaluated the impact of loss to follow-up, under- or overascertainment, outcome misclassification, and changing classifications occurring similarly or differentially across radiation dose levels. RESULTS Loss to follow-up was not reported in 62% of studies, but when reported it was generally small. Only one study critically evaluated the completeness of the sources of vital status. Underascertainment of cancers ("false negatives") was a potential shortcoming for cohorts that could not be linked with high-quality population-based registries, particularly during early years of exposure in five studies, in two lacking complete residential history, and in one with substantial emigration. False positives may have occurred as a result of cancer ascertainment from self- or next-of-kin report in three studies or from enhanced medical surveillance of exposed patients that could lead to detection bias (eg, reporting precancer lesions as physician-diagnosed cancer) in one study. Most pediatric but few adult leukemia studies used expert hematopathology review or current classifications. Only a few studies recoded solid cancers to the latest International Classification of Diseases or International Classification of Diseases for Oncology codes. These outcome assessment shortcomings were generally nondifferential in relation to radiation exposure level except possibly in four studies. 
CONCLUSION The majority of studies lacked information to enable comprehensive evaluation of all major sources of outcome assessment errors, although reported data suggested that the outcome assessment limitations generally had little effect on risk or biased estimates towards the null except possibly in four studies.
Affiliation(s)
- Martha S Linet
- Radiation Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Rockville, MD
- Mary K Schubauer-Berigan
- Monographs Programme, Evidence Synthesis and Classification Section, International Agency for Research on Cancer, Lyon, France
- Amy Berrington de González
- Radiation Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Rockville, MD
6
Tan KS. Misclassification of the actual causes of death and its impact on analysis: A case study in non-small cell lung cancer. Lung Cancer 2019; 134:16-24. [PMID: 31319976 DOI: 10.1016/j.lungcan.2019.05.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/17/2019] [Revised: 05/07/2019] [Accepted: 05/14/2019] [Indexed: 11/27/2022]
Abstract
OBJECTIVES Cumulative incidence of lung cancer deaths (LC-CID) is an important metric to understand cancer prognosis and to determine treatment options. However, credible estimates of LC-CID rely on accurate cause-of-death coding in death certificates. Results from lung cancer screening trials estimated 15% under-reporting and 1% over-reporting of lung cancer deaths due to misclassification. This study investigated the impact of cause-of-death misclassification on the estimation of LC-CID. MATERIALS AND METHODS Patients with stage I/II non-small cell lung cancer (NSCLC) from the Surveillance, Epidemiology, and End Results registry were included. LC-CID was estimated using the competing-risk approach in two ways: (1) reporting observed estimates that ignore potential cause-of-death misclassification and (2) correcting for plausible misclassification rates reported in the literature (15% under-reporting and 1% over-reporting). Bias was quantified as the difference between observed and corrected 10-year LC-CIDs: positive values indicated that observed LC-CID overestimated true LC-CID, whereas negative values indicated the opposite. RESULTS Among 66,179 patients, the impact of over-reporting on 10-year LC-CID was negligible across all age groups. In contrast, under-reporting resulted in substantial underestimation of 10-year LC-CID. The biases increased as age increased due to higher LC-CIDs: 10-year LC-CIDs among stage I patients 18-44, 45-59, 60-74 and ≥75 years were 25%, 32%, 41%, and 50%, respectively, and the corresponding biases given the plausible misclassification rates were -4.4%, -5.6%, -7.1%, and -8.6%. Because the observed LC-CIDs among patients with stage II disease were higher than those with stage I disease, the biases were greater among stage II patients, up to -12.5% in the oldest age group. 
CONCLUSIONS In lung cancer, LC-CID may be severely underestimated due to under-reporting of lung cancer deaths, particularly among older patients or those with late-stage disease. Future studies that involve such subpopulations should present the corrected LC-CIDs based on plausible misclassification rates alongside the observed LC-CIDs.
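The direction and rough size of the under-reporting bias can be illustrated with a deliberately simplified calculation. The sketch below assumes that a fixed 15% of true lung cancer deaths are recorded as another cause and ignores the 1% over-reporting and the competing-risk time structure. It is not the correction procedure used in the study. Under that assumption, an observed 25% LC-CID gives a bias of about -4.4 percentage points, close to the value quoted for the youngest group. The agreement is looser for the older groups, as expected from a simplification.

```python
def corrected_cid(observed_cid, under_reporting=0.15):
    """Scale an observed cumulative incidence up for under-reported deaths.

    Simplifying ASSUMPTION: only a fraction (1 - under_reporting) of true
    cause-specific deaths are recorded as such, and nothing else is wrong.
    """
    return observed_cid / (1.0 - under_reporting)


def bias(observed_cid, under_reporting=0.15):
    """Observed minus corrected; negative means the observed value is too low."""
    return observed_cid - corrected_cid(observed_cid, under_reporting)


# Observed 10-year LC-CIDs for stage I patients, as quoted in the abstract.
for age_group, observed in [("18-44", 0.25), ("45-59", 0.32),
                            ("60-74", 0.41), (">=75", 0.50)]:
    print(f"{age_group}: bias = {100 * bias(observed):+.1f} percentage points")
```

The bias grows in absolute terms with the observed incidence, which matches the abstract's remark that older and stage II patients are affected most.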
Affiliation(s)
- Kay See Tan
- Department of Biostatistics and Epidemiology, Memorial Sloan Kettering Cancer Center, 485 Lexington Ave, 2nd Floor, New York, NY, 10017, United States.
7
Abstract
Background: National estimates of the sizes of key populations, including female sex workers, men who have sex with men, and transgender women are critical to inform national and international responses to the HIV pandemic. However, epidemiologic studies typically provide size estimates for only limited high priority geographic areas. This article illustrates a two-stage approach to obtain a national key population size estimate in the Dominican Republic using available estimates and publicly available contextual information. Methods: Available estimates of key population size in priority areas were augmented with targeted additional data collection in other areas. To combine information from data collected at each stage, we used statistical methods for handling missing data, including inverse probability weights, multiple imputation, and augmented inverse probability weights. Results: Using the augmented inverse probability weighting approach, which provides some protection against parametric model misspecification, we estimated that 3.7% (95% CI = 2.9, 4.7) of the total population of women in the Dominican Republic between the ages of 15 and 49 years were engaged in sex work, 1.2% (95% CI = 1.1, 1.3) of men aged 15–49 had sex with other men, and 0.19% (95% CI = 0.17, 0.21) of people assigned the male sex at birth were transgender. Conclusions: Viewing the size estimation of key populations as a missing data problem provides a framework for articulating and evaluating the assumptions necessary to obtain a national size estimate. In addition, this paradigm allows use of methods for missing data familiar to epidemiologists.
8
Edwards JK, Cole SR, Moore RD, Mathews WC, Kitahata M, Eron JJ. Sensitivity Analyses for Misclassification of Cause of Death in the Parametric G-Formula. Am J Epidemiol 2018; 187:1808-1816. [PMID: 29420696 DOI: 10.1093/aje/kwy028] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/19/2017] [Accepted: 02/02/2018] [Indexed: 01/03/2023]
Abstract
Cause-specific mortality is an important outcome in studies of interventions to improve survival, yet causes of death can be misclassified. Here, we present an approach to performing sensitivity analyses for misclassification of cause of death in the parametric g-formula. The g-formula is a useful method to estimate effects of interventions in epidemiologic research because it appropriately accounts for time-varying confounding affected by prior treatment and can estimate risk under dynamic treatment plans. We illustrate our approach using an example comparing acquired immune deficiency syndrome (AIDS)-related mortality under immediate and delayed treatment strategies in a cohort of therapy-naive adults entering care for human immunodeficiency virus infection in the United States. In the standard g-formula approach, 10-year risk of AIDS-related mortality under delayed treatment was 1.73 (95% CI: 1.17, 2.54) times the risk under immediate treatment. In a sensitivity analysis assuming that AIDS-related death was measured with sensitivity of 95% and specificity of 90%, the 10-year risk ratio comparing AIDS-related mortality between treatment plans was 1.89 (95% CI: 1.13, 3.14). When sensitivity and specificity are unknown, this approach can be used to estimate the effects of dynamic treatment plans under a range of plausible values of sensitivity and specificity of the recorded event type.
Affiliation(s)
- Jessie K Edwards
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Stephen R Cole
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Richard D Moore
- School of Medicine, Johns Hopkins University, Baltimore, Maryland
- Mari Kitahata
- Department of Medicine, University of Washington, Seattle, Washington
- Joseph J Eron
- School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
9
Lévêque E, Lacourt A, Luce D, Sylvestre MP, Guénel P, Stücker I, Leffondré K. Time-dependent effect of intensity of smoking and of occupational exposure to asbestos on the risk of lung cancer: results from the ICARE case-control study. Occup Environ Med 2018; 75:586-592. [PMID: 29777039 DOI: 10.1136/oemed-2017-104953] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/11/2017] [Revised: 03/28/2018] [Accepted: 04/27/2018] [Indexed: 11/03/2022]
Abstract
OBJECTIVE To estimate the impact of intensity of both smoking and occupational exposure to asbestos on the risk of lung cancer throughout the whole exposure history. METHODS Data on 2026 male cases and 2610 male controls came from the French ICARE (Investigation of occupational and environmental causes of respiratory cancers) population-based, case-control study. Lifetime smoking history and occupational history were collected from standardised questionnaires and face-to-face interviews. Occupational exposure to asbestos was assessed using a job exposure matrix. The effects of annual average daily intensity of smoking (reported average number of cigarettes smoked per day) and asbestos exposure (estimated average daily air concentration of asbestos fibres at work) were estimated using a flexible weighted cumulative index of exposure in logistic regression models. RESULTS Intensity of smoking in the 10 years preceding diagnosis had a much stronger association with the risk of lung cancer than more distant intensity. By contrast, intensity of asbestos exposure that occurred more than 40 years before diagnosis had a stronger association with the risk of lung cancer than more recent intensity, even if intensity in the 10 years preceding diagnosis also had a significant effect. CONCLUSION Our results illustrate the dynamic of the effect of intensity of both smoking and occupational exposure to asbestos on the risk of lung cancer. They confirm that the timing of exposure plays an important role, and suggest that standard analytical methods assuming equal weights of intensity over the whole exposure history may be questionable.
Affiliation(s)
- Emilie Lévêque
- Université de Bordeaux, ISPED, INSERM, Bordeaux Population Health Research Center, Team Biostatistics, UMR 1219, Bordeaux, France.,Université de Bordeaux, INSERM, Bordeaux Population Health Research Center, Team EPICENE, UMR 1219, Bordeaux, France
- Aude Lacourt
- Université de Bordeaux, INSERM, Bordeaux Population Health Research Center, Team EPICENE, UMR 1219, Bordeaux, France
- Danièle Luce
- Université de Rennes, INSERM, EHESP, IRSET (Institut de recherche en santé, environnement et travail), UMR_S 1085, Pointe-à-Pitre, France
- Marie-Pierre Sylvestre
- Department of Social and Preventive Medicine, Montreal School of Public Health (ESPUM), University of Montreal, Montreal, Quebec, Canada.,Research Center, University of Montreal Health Center (CRCHUM), Montreal, Quebec, Canada
- Pascal Guénel
- INSERM, CESP, Cancer and Environment Team, Université Paris Saclay, Université de Paris-Sud, UVSQ, Villejuif, France
- Isabelle Stücker
- INSERM, CESP, Cancer and Environment Team, Université Paris Saclay, Université de Paris-Sud, UVSQ, Villejuif, France
- Karen Leffondré
- Université de Bordeaux, ISPED, INSERM, Bordeaux Population Health Research Center, Team Biostatistics, UMR 1219, Bordeaux, France
10
Missingness in the Setting of Competing Risks: from missing values to missing potential outcomes. Curr Epidemiol Rep 2018; 5:153-159. [PMID: 30386717 DOI: 10.1007/s40471-018-0142-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/30/2023]
Abstract
Purpose of review The setting of competing risks, in which there is an event that precludes the event of interest from occurring, is prevalent in epidemiological research. Unless studying all-cause mortality, any study following up individuals is subject to having a competing risk should individuals die during the time period that the study covers. While there are prior papers discussing the need for competing risk methods in epidemiologic research, we are not aware of any review that discusses issues of missing data in a competing risk setting. Recent Findings We provide an overview of causal inference in competing risks as potential outcomes are missing, provide some strategies for dealing with missing (or misclassified) event type, and address missing covariate data in competing risks. The strategies presented are specifically focused on those that may easily be implemented in standard statistical packages. There is ongoing work in terms of causal analyses, dealing with missing event type information, and missing covariate values specific to competing risk analyses. Summary Competing events are common in epidemiologic research. While there has been a focus on why one should conduct a proper competing risk analysis, a perhaps unrecognized issue is missingness. Strategies exist to minimize the impact of missingness in analyses of competing risks.
11
Hajizadeh N, Pourhoseingholi MA, Baghestani AR, Abadi A, Zali MR. Bayesian adjustment of gastric cancer mortality rate in the presence of misclassification. World J Gastrointest Oncol 2017; 9:160-165. [PMID: 28451063 PMCID: PMC5390301 DOI: 10.4251/wjgo.v9.i4.160] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/25/2016] [Revised: 10/20/2016] [Accepted: 01/14/2017] [Indexed: 02/05/2023]
Abstract
AIM To correct for misclassification error in the registration of causes of death in the Iranian death registry using a Bayesian method.
METHODS National death statistics for gastric cancer from 2006 to 2010, reported annually by the Ministry of Health and Medical Education, were included in this study. To correct the gastric cancer mortality rate by reassigning deaths due to gastric cancer that were registered as cancer without detail, a Bayesian method was implemented with Poisson count regression and a beta prior for the misclassification parameter, assuming 20% misclassification in registering causes of death in Iran.
RESULTS Registered mortality due to gastric cancer from 2006 to 2010 was considered in this study. According to the Bayesian re-estimate, about 3%-7% of deaths due to gastric cancer were registered as cancer without mentioning details, which results in an undercount of gastric cancer mortality in the Iranian population. The number and age-standardized rate of gastric cancer deaths were estimated to be 5805 (10.17 per 100000 population), 5862 (10.51 per 100000 population), 5731 (10.23 per 100000 population), 5946 (10.44 per 100000 population), and 6002 (10.35 per 100000 population), respectively, for the years 2006 to 2010.
CONCLUSION There is an undercount of gastric cancer mortality in Iranian registered data, which researchers and authorities should take into account in sequential estimations and policy making.
12
Zawistowski M, Sussman JB, Hofer TP, Bentley D, Hayward RA, Wiitala WL. Corrected ROC analysis for misclassified binary outcomes. Stat Med 2017; 36:2148-2160. [PMID: 28245528 DOI: 10.1002/sim.7260] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/06/2016] [Revised: 01/25/2017] [Accepted: 01/26/2017] [Indexed: 11/06/2022]
Abstract
Creating accurate risk prediction models from Big Data resources such as Electronic Health Records (EHRs) is a critical step toward achieving precision medicine. A major challenge in developing these tools is accounting for imperfect aspects of EHR data, particularly the potential for misclassified outcomes. Misclassification, the swapping of case and control outcome labels, is well known to bias effect size estimates for regression prediction models. In this paper, we study the effect of misclassification on accuracy assessment for risk prediction models and find that it leads to bias in the area under the curve (AUC) metric from standard ROC analysis. The extent of the bias is determined by the false positive and false negative misclassification rates as well as disease prevalence. Notably, we show that simply correcting for misclassification while building the prediction model is not sufficient to remove the bias in AUC. We therefore introduce an intuitive misclassification-adjusted ROC procedure that accounts for uncertainty in observed outcomes and produces bias-corrected estimates of the true AUC. The method requires that misclassification rates are either known or can be estimated, quantities typically required for the modeling step. The computational simplicity of our method is a key advantage, making it ideal for efficiently comparing multiple prediction models on very large datasets. Finally, we apply the correction method to a hospitalization prediction model from a cohort of over 1 million patients from the Veterans Health Administration's EHR. Implementations of the ROC correction are provided for Stata and R. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
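The dependence of the AUC bias on the misclassification rates and prevalence can be made concrete under one simplifying assumption: labels are flipped independently of the risk score given true disease status. In that case an observed case is a true case with probability PPV, an observed control is a true control with probability NPV, and the observed AUC satisfies AUC_obs - 0.5 = (PPV + NPV - 1)(AUC_true - 0.5). The sketch below implements this identity and its inverse. It is a textbook-style illustration with hypothetical inputs, not the adjusted ROC procedure or the Stata/R implementations described in the paper.

```python
def predictive_values(prevalence, fpr, fnr):
    """PPV and NPV of the recorded label relative to true status.

    prevalence: true disease prevalence
    fpr: P(recorded case | true control); fnr: P(recorded control | true case)
    """
    true_pos = prevalence * (1.0 - fnr)
    false_pos = (1.0 - prevalence) * fpr
    true_neg = (1.0 - prevalence) * (1.0 - fpr)
    false_neg = prevalence * fnr
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def observed_auc(true_auc, prevalence, fpr, fnr):
    """AUC measured against misclassified labels (score-independent flips)."""
    ppv, npv = predictive_values(prevalence, fpr, fnr)
    return 0.5 + (ppv + npv - 1.0) * (true_auc - 0.5)


def corrected_auc(obs_auc, prevalence, fpr, fnr):
    """Invert the attenuation to recover the AUC against true labels."""
    ppv, npv = predictive_values(prevalence, fpr, fnr)
    scale = ppv + npv - 1.0
    if scale <= 0:
        raise ValueError("labels carry no usable information (PPV + NPV <= 1)")
    return 0.5 + (obs_auc - 0.5) / scale


# HYPOTHETICAL inputs: true AUC 0.80, prevalence 20%, 5% false positives,
# 10% false negatives.
obs = observed_auc(0.80, prevalence=0.20, fpr=0.05, fnr=0.10)
print(round(obs, 4), round(corrected_auc(obs, 0.20, 0.05, 0.10), 4))
```

The attenuation factor PPV + NPV - 1 shrinks toward zero as prevalence falls or error rates rise, which is why the same misclassification rates can distort AUC far more for rare outcomes.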
Affiliation(s)
- Matthew Zawistowski
- Veterans Affairs Center for Clinical Management Research, Ann Arbor, 48105, MI, U.S.A.,Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, U.S.A
- Jeremy B Sussman
- Veterans Affairs Center for Clinical Management Research, Ann Arbor, 48105, MI, U.S.A.,Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, 48109, MI, U.S.A
- Timothy P Hofer
- Veterans Affairs Center for Clinical Management Research, Ann Arbor, 48105, MI, U.S.A.,Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, 48109, MI, U.S.A
- Douglas Bentley
- Veterans Affairs Center for Clinical Management Research, Ann Arbor, 48105, MI, U.S.A
- Rodney A Hayward
- Veterans Affairs Center for Clinical Management Research, Ann Arbor, 48105, MI, U.S.A.,Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, 48109, MI, U.S.A
- Wyndy L Wiitala
- Veterans Affairs Center for Clinical Management Research, Ann Arbor, 48105, MI, U.S.A
13
A Bayesian Approach to Account for Misclassification and Overdispersion in Count Data. International Journal of Environmental Research and Public Health 2015; 12:10648-61. [PMID: 26343704 PMCID: PMC4586634 DOI: 10.3390/ijerph120910648] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/30/2015] [Revised: 07/17/2015] [Accepted: 08/25/2015] [Indexed: 12/24/2022]
Abstract
Count data are subject to considerable sources of what is often referred to as non-sampling error. Errors such as misclassification, measurement error and unmeasured confounding can lead to substantially biased estimators. It is strongly recommended that epidemiologists not only acknowledge these sorts of errors in data, but incorporate sensitivity analyses into part of the total data analysis. We extend previous work on Poisson regression models that allow for misclassification by thoroughly discussing the basis for the models and allowing for extra-Poisson variability in the form of random effects. Via simulation we show the improvements in inference that are brought about by accounting for both the misclassification and the overdispersion.
14
Akinkugbe AA, Saraiya VM, Preisser JS, Offenbacher S, Beck JD. Bias in estimating the cross-sectional smoking, alcohol, obesity and diabetes associations with moderate-severe periodontitis in the Atherosclerosis Risk in Communities study: comparison of full versus partial-mouth estimates. J Clin Periodontol 2015; 42:609-21. [PMID: 26076661 DOI: 10.1111/jcpe.12425] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Accepted: 05/17/2015] [Indexed: 01/18/2023]
Abstract
OBJECTIVE To assess whether partial-mouth protocols (PRPs) result in biased estimates of the associations between smoking, alcohol, obesity and diabetes with periodontitis. METHODS Using a sample (n = 6129) of the 1996-1998 Atherosclerosis Risk in Communities study, we used measures of probing pocket depth and clinical attachment level to identify moderate-severe periodontitis. Adjusting for confounders, unconditional binary logistic regression estimated prevalence odds ratios (POR) and 95% confidence limits. Specifically, we compared POR for smoking, alcohol, obesity and diabetes with periodontitis derived from full-mouth to those derived from 4-PRPs (Ramfjörd, National Health and Nutrition Examination survey-III, modified-NHANES-IV and 42-site-Random-site selection-method). Finally, we conducted a simple sensitivity analysis of periodontitis misclassification by changing the case definition threshold for each PRP. RESULTS In comparison to full-mouth PORs, PRP PORs were biased in terms of magnitude and direction. Holding the full-mouth case definition at moderate-severe periodontitis and setting it at mild-moderate-severe for the PRPs did not consistently produce POR estimates that were either biased towards or away from the null in comparison to full-mouth estimates. CONCLUSIONS Partial-mouth protocols result in misclassification of periodontitis and may bias epidemiologic measures of association. The magnitude and direction of this bias depends on choice of PRP and case definition threshold used.
Affiliation(s)
- Aderonke A Akinkugbe
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Dental Ecology, School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Veeral M Saraiya
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Dental Ecology, School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- John S Preisser
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Steven Offenbacher
  - Department of Periodontology, School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- James D Beck
  - Department of Dental Ecology, School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
15
|
Funk MJ, Landi SN. Misclassification in administrative claims data: quantifying the impact on treatment effect estimates. Curr Epidemiol Rep 2015. [PMID: 26085977 DOI: 10.1007/s40471-014-0027-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/26/2022]
Abstract
Misclassification is present in nearly every epidemiologic study, yet is rarely quantified in analysis in favor of a focus on random error. In this review, we discuss past and present wisdom on misclassification and what measures should be taken to quantify this influential bias, with a focus on bias in pharmacoepidemiologic studies. To date, pharmacoepidemiology primarily utilizes data obtained from administrative claims, a rich source of prescription data but susceptible to bias from unobservable factors including medication sample use, medications filled but not taken, health conditions that are not reported in the administrative billing data, and inadequate capture of confounders. Due to the increasing focus on comparative effectiveness research, we provide a discussion of misclassification in the context of an active comparator, including a demonstration of treatment effects biased away from the null in the presence of nondifferential misclassification. Finally, we highlight recently developed methods to quantify bias and offer these methods as potential options for strengthening the validity and quantifying uncertainty of results obtained from pharmacoepidemiologic research.
Affiliation(s)
- Michele Jonsson Funk
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
- Suzanne N Landi
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
|
16
|
Funk MJ, Landi SN. Misclassification in administrative claims data: quantifying the impact on treatment effect estimates. Curr Epidemiol Rep 2014; 1:175-185. [PMID: 26085977 PMCID: PMC4465810 DOI: 10.1007/s40471-014-0027-z] [Citation(s) in RCA: 128] [Impact Index Per Article: 12.8] [Indexed: 10/24/2022]
|