1
Haque MA, Gedara MLB, Nickel N, Turgeon M, Lix LM. The validity of electronic health data for measuring smoking status: a systematic review and meta-analysis. BMC Med Inform Decis Mak 2024; 24:33. [PMID: 38308231 PMCID: PMC10836023 DOI: 10.1186/s12911-024-02416-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 01/03/2024] [Indexed: 02/04/2024]
Abstract
BACKGROUND Smoking is a risk factor for many chronic diseases. Multiple smoking status ascertainment algorithms have been developed for population-based electronic health databases such as administrative databases and electronic medical records (EMRs). Evidence syntheses of algorithm validation studies have often focused on chronic diseases rather than risk factors. We conducted a systematic review and meta-analysis of smoking status ascertainment algorithms to describe the characteristics and validity of these algorithms. METHODS The Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines were followed. We searched articles published from 1990 to 2022 in EMBASE, MEDLINE, Scopus, and Web of Science with key terms such as validity, administrative data, electronic health records, smoking, and tobacco use. The extracted information, including article characteristics, algorithm characteristics, and validity measures, was descriptively analyzed. Sources of heterogeneity in validity measures were estimated using a meta-regression model. Risk of bias (ROB) in the reviewed articles was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. RESULTS The initial search yielded 2086 articles; 57 were selected for review and 116 algorithms were identified. Almost three-quarters (71.6%) of algorithms were based on EMR data. The algorithms were primarily constructed using diagnosis codes for smoking-related conditions, although prescription medication codes for smoking treatments were also adopted. About half of the algorithms were developed using machine-learning models. The pooled estimates of positive predictive value, sensitivity, and specificity were 0.843, 0.672, and 0.918 respectively. Algorithm sensitivity and specificity were highly variable and ranged from 3 to 100% and 36 to 100%, respectively. Model-based algorithms had significantly greater sensitivity (p = 0.006) than rule-based algorithms. Algorithms for EMR data had higher sensitivity than algorithms for administrative data (p = 0.001). The ROB was low in most of the articles (76.3%) that underwent the assessment. CONCLUSIONS Multiple algorithms using different data sources and methods have been proposed to ascertain smoking status in electronic health data. Many algorithms had low sensitivity and positive predictive value, but the data source influenced their validity. Algorithms based on machine-learning models for multiple linked data sources have improved validity.
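For readers unfamiliar with the validity measures reported above, the following minimal Python sketch shows how sensitivity, specificity, and positive predictive value are computed for a single algorithm from a 2x2 table against a reference standard. The class name, function name, and all counts are hypothetical illustrations and are not taken from the review; the pooling across studies described in the abstract is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of an algorithm's smoking classification vs. a reference standard."""
    tp: int  # algorithm: smoker, reference: smoker
    fp: int  # algorithm: smoker, reference: non-smoker
    fn: int  # algorithm: non-smoker, reference: smoker
    tn: int  # algorithm: non-smoker, reference: non-smoker


def validity_measures(c: ConfusionCounts) -> dict:
    """Return the three validity measures named in the abstract."""
    return {
        # Share of true smokers that the algorithm identifies.
        "sensitivity": c.tp / (c.tp + c.fn),
        # Share of true non-smokers that the algorithm identifies.
        "specificity": c.tn / (c.tn + c.fp),
        # Share of algorithm-flagged smokers who truly smoke.
        "ppv": c.tp / (c.tp + c.fp),
    }


# Hypothetical counts for one validation study (illustration only).
counts = ConfusionCounts(tp=80, fp=20, fn=40, tn=160)
measures = validity_measures(counts)
print(measures)
```

With these made-up counts, sensitivity is lower than specificity, which is the general pattern the abstract describes for smoking status algorithms.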
Affiliation(s)
- Md Ashiqul Haque
- Department of Community Health Sciences, University of Manitoba, Winnipeg, MB, Canada
- Nathan Nickel
- Department of Community Health Sciences, University of Manitoba, Winnipeg, MB, Canada
- Maxime Turgeon
- Department of Statistics, University of Manitoba, Winnipeg, MB, Canada
- Lisa M Lix
- Department of Community Health Sciences, University of Manitoba, Winnipeg, MB, Canada.
2
Aliabadi A, Sheikhtaheri A, Ansari H. Electronic health record-based disease surveillance systems: A systematic literature review on challenges and solutions. J Am Med Inform Assoc 2021; 27:1977-1986. [PMID: 32929458 DOI: 10.1093/jamia/ocaa186] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 04/21/2020] [Revised: 06/20/2020] [Accepted: 07/22/2020] [Indexed: 01/11/2023]
Abstract
OBJECTIVE Disease surveillance systems are expanding using electronic health records (EHRs). However, there are many challenges in this regard. In the present study, the solutions and challenges of implementing EHR-based disease surveillance systems (EHR-DS) have been reviewed. MATERIALS AND METHODS We searched the related keywords in ProQuest, PubMed, Web of Science, Cochrane Library, Embase, and Scopus. Then, we assessed and selected articles using the inclusion and exclusion criteria and, finally, classified the identified solutions and challenges. RESULTS Finally, 50 studies were included, and 52 unique solutions and 47 challenges were organized into 6 main themes (policy and regulatory, technical, management, standardization, financial, and data quality). The results indicate that due to the multifaceted nature of the challenges, the implementation of EHR-DS is not low cost and easy to implement and requires a variety of interventions. On the one hand, the most common challenges include the need to invest significant time and resources; the poor data quality in EHRs; difficulty in analyzing, cleaning, and accessing unstructured data; data privacy and security; and the lack of interoperability standards. On the other hand, the most common solutions are the use of natural language processing and machine learning algorithms for unstructured data; the use of appropriate technical solutions for data retrieval, extraction, identification, and visualization; the collaboration of health and clinical departments to access data; standardizing EHR content for public health; and using a unique health identifier for individuals. CONCLUSIONS EHR systems have an important role in modernizing disease surveillance systems. However, there are many problems and challenges facing the development and implementation of EHR-DS that need to be appropriately addressed.
Affiliation(s)
- Ali Aliabadi
- Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
- Abbas Sheikhtaheri
- Health Management and Economics Research Center, Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
- Hossein Ansari
- Department of Epidemiology and Biostatistics, Zahedan University of Medical Sciences, Zahedan, Iran
3
Tarabichi Y, Goyden J, Liu R, Lewis S, Sudano J, Kaelber DC. A step closer to nationwide electronic health record-based chronic disease surveillance: characterizing asthma prevalence and emergency department utilization from 100 million patient records through a novel multisite collaboration. J Am Med Inform Assoc 2021; 27:127-135. [PMID: 31592525 DOI: 10.1093/jamia/ocz172] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/24/2019] [Revised: 08/29/2019] [Accepted: 09/05/2019] [Indexed: 11/13/2022]
Abstract
OBJECTIVE The study sought to assess the feasibility of nationwide chronic disease surveillance using data aggregated through a multisite collaboration of customers of the same electronic health record (EHR) platform across the United States. MATERIALS AND METHODS An independent confederation of customers of the same EHR platform proposed and guided the development of a program that leverages native EHR features to allow customers to securely contribute de-identified data regarding the prevalence of asthma and rate of asthma-associated emergency department visits to a vendor-managed repository. Data were stratified by state, age, sex, race, and ethnicity. Results were qualitatively compared with national survey-based estimates. RESULTS The program accumulated information from 100 million health records from over 130 healthcare systems in the United States over its first 14 months. All states were represented, with a median coverage of 22.88% of an estimated state's population (interquartile range, 12.05%-42.24%). The mean monthly prevalence of asthma was 5.27 ± 0.11%. The rate of asthma-associated emergency department visits was 1.39 ± 0.08%. Both measures mirrored national survey-based estimates. DISCUSSION By organizing the program around native features of a shared EHR platform, we were able to rapidly accumulate population level measures from a sizeable cohort of health records, with representation from every state. The resulting data allowed estimates of asthma prevalence that were comparable to data from traditional epidemiologic surveys at both geographic and demographic levels. CONCLUSIONS Our initiative demonstrates the potential of intravendor customer collaboration and highlights an organizational approach that complements other data aggregation efforts seeking to achieve nationwide EHR-based chronic disease surveillance.
Affiliation(s)
- Yasir Tarabichi
- Center for Clinical Informatics Research and Education, The MetroHealth System, Cleveland, Ohio, USA; School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA; Department of Internal Medicine, The MetroHealth System, Cleveland, Ohio, USA; Division of Pulmonary and Critical Care Medicine, The MetroHealth System, Cleveland, Ohio, USA
- Jake Goyden
- Center for Clinical Informatics Research and Education, The MetroHealth System, Cleveland, Ohio, USA; School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Rujia Liu
- Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA; Center for Healthcare Research and Policy, The MetroHealth System, Cleveland, Ohio, USA
- Steven Lewis
- Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA; Center for Healthcare Research and Policy, The MetroHealth System, Cleveland, Ohio, USA
- Joseph Sudano
- Department of Internal Medicine, The MetroHealth System, Cleveland, Ohio, USA; Center for Healthcare Research and Policy, The MetroHealth System, Cleveland, Ohio, USA
- David C Kaelber
- Center for Clinical Informatics Research and Education, The MetroHealth System, Cleveland, Ohio, USA; School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA; Department of Internal Medicine, The MetroHealth System, Cleveland, Ohio, USA; Department of Pediatrics, The MetroHealth System, Cleveland, Ohio, USA; Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA; Center for Healthcare Research and Policy, The MetroHealth System, Cleveland, Ohio, USA
4
4
|
Kim RS, Shankar V. Prevalence estimation by joint use of big data and health survey: a demonstration study using electronic health records in New York city. BMC Med Res Methodol 2020; 20:77. [PMID: 32252642 PMCID: PMC7137316 DOI: 10.1186/s12874-020-00956-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/18/2019] [Accepted: 03/23/2020] [Indexed: 11/22/2022]
Abstract
Background Electronic Health Records (EHR) have been increasingly used as a tool to monitor population health. However, subject-level errors in the records can yield biased estimates of health indicators. There is an urgent need for methods to estimate the prevalence of health indicators using large and real-time EHR while correcting the potential bias. Methods We demonstrate joint analyses of EHR and a smaller gold-standard health survey. We first adopted Mosteller’s method that pools two estimators, among which one is potentially biased. It only requires knowing the prevalence estimates from two data sources and their standard errors. Then, we adopted the method of Schenker et al., which uses multiple imputations of subject-level health outcomes that are missing for the subjects in EHR. This procedure requires information to link some subjects between two sources and modeling the mechanism of misclassification in EHR as well as modeling inclusion probabilities to both sources. Results In a simulation study, both estimators yielded negligible bias even when EHR was biased. They performed as well as the health survey estimator when EHR bias was large and better than the health survey estimator when EHR bias was moderate. It may be challenging to model the misclassification mechanism in real data for the subject-level imputation estimator. We illustrated the methods by analyzing six health indicators from the 2013-14 NYC HANES and the 2013 NYC Macroscope, and a study that linked some subjects in both data sources. Conclusions When a small gold-standard health survey exists, it can serve as a safeguard against potential bias in EHR through the joint analysis of the two sources.
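The idea of pooling an unbiased survey estimate with a potentially biased EHR estimate, using only the two estimates and their standard errors, can be illustrated with a minimal Python sketch. The abstract does not give the exact form of the estimator, so this is an assumption-laden illustration rather than the authors' implementation: it uses the weight that minimizes mean squared error for two independent estimators, and it plugs in the squared difference between the two estimates as a stand-in for the unknown squared bias. The function name and all numbers are hypothetical.

```python
def pool_estimates(survey_est, survey_se, ehr_est, ehr_se):
    """Pool an unbiased survey estimate with a possibly biased EHR estimate.

    Assumptions (not from the abstract): the estimators are independent, and
    the squared EHR bias is approximated by the squared difference between
    the two estimates.
    """
    var_survey = survey_se ** 2
    var_ehr = ehr_se ** 2
    bias_sq = (ehr_est - survey_est) ** 2  # plug-in guess for squared bias

    denom = var_survey + var_ehr + bias_sq
    if denom == 0:
        raise ValueError("both standard errors and the difference are zero")

    # MSE-minimizing weight on the survey estimate: it grows as the EHR
    # estimate looks noisier or more biased.
    w_survey = (var_ehr + bias_sq) / denom
    pooled = w_survey * survey_est + (1 - w_survey) * ehr_est
    return pooled, w_survey


# Hypothetical inputs: a small survey (wide SE) and a large EHR sample (narrow SE).
pooled, w_survey = pool_estimates(survey_est=0.15, survey_se=0.02,
                                  ehr_est=0.12, ehr_se=0.002)
print(f"pooled prevalence = {pooled:.4f}, weight on survey = {w_survey:.3f}")
```

When the two sources agree, the weight reduces to ordinary inverse-variance weighting; as they diverge, the pooled value moves toward the survey estimate.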
Affiliation(s)
- Ryung S Kim
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, NY, 10461, USA.
- Viswanathan Shankar
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, NY, 10461, USA
5
Kantrow SP, Jolley SE, Price-Haywood EG, Wang X, Tseng TS, Arnold D, Brown LF, Leonardi C, Scribner RA, Trapido EJ, Lin HY. Using the emergency department to investigate smoking in young adults. Ann Epidemiol 2019; 30:44-49.e1. [PMID: 30555003 PMCID: PMC6510949 DOI: 10.1016/j.annepidem.2018.11.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/24/2018] [Revised: 10/10/2018] [Accepted: 11/18/2018] [Indexed: 10/27/2022]
Abstract
PURPOSE Smoking in young adults identifies the population at risk for future tobacco-related disease. We investigated smoking in a young adult population and within high-risk groups using emergency department (ED) data in a metropolitan area. METHODS Using the electronic health record, we performed a retrospective study of smoking in adults aged 18-30 years presenting to the ED. RESULTS Smoking status was available for 55,777 subjects (90.9% of the total ED cohort); 60.8% were women, 55.0% were black, 35.3% were white, and 8.1% were Hispanic; 34.4% were uninsured. Most smokers used cigarettes (95.1%). Prevalence of current smoking was 21.7% for women and 42.5% for men. The electronic health record contains data about diagnosis and social history that can be used to investigate smoking status for high-risk populations. Smoking prevalence was highest for substance use disorder (58.0%), psychiatric illness (41.3%) and alcohol use (39.1%), and lowest for pregnancy (13.5%). In multivariable analyses, male gender, white race, lack of health insurance, alcohol use, and illicit drug use were independently associated with smoking. Smoking risk among alcohol and drug users varied by gender, race, and/or age. CONCLUSIONS The ED provides access to a large, demographically diverse population, and supports investigation of smoking risk in young adults.
Affiliation(s)
- Stephen P Kantrow
- Section of Pulmonary/Critical Care and Allergy/Immunology, Department of Medicine, Louisiana State University Health Sciences Center, New Orleans, LA.
- Sarah E Jolley
- Section of Pulmonary/Critical Care and Allergy/Immunology, Department of Medicine, Louisiana State University Health Sciences Center, New Orleans, LA
- Eboni G Price-Haywood
- Center for Outcomes and Health Services and Department of Internal Medicine, Ochsner Health System, New Orleans, LA
- Xinnan Wang
- Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA
- Tung-Sung Tseng
- Behavioral and Community Health Sciences Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA
- Dodie Arnold
- Louisiana Public Health Institute, New Orleans, LA
- Claudia Leonardi
- Behavioral and Community Health Sciences Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA
- Richard A Scribner
- Epidemiology Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA
- Edward J Trapido
- Epidemiology Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA
- Hui-Yi Lin
- Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA
6
Chan PY, Zhao Y, Lim S, Perlman SE, McVeigh KH. Using Calibration to Reduce Measurement Error in Prevalence Estimates Based on Electronic Health Records. Prev Chronic Dis 2018; 15:E155. [PMID: 30576279 PMCID: PMC6307836 DOI: 10.5888/pcd15.180371] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
INTRODUCTION Increasing adoption of electronic health record (EHR) systems by health care providers presents an opportunity for EHR-based population health surveillance. EHR data, however, may be subject to measurement error because of factors such as data entry errors and lack of documentation by physicians. We investigated the use of a calibration model to reduce bias of prevalence estimates from the New York City (NYC) Macroscope, an EHR-based surveillance system. METHODS We calibrated 6 health indicators to the 2013-2014 NYC Health and Nutrition Examination Survey (NYC HANES) data: hypertension, diabetes, smoking, obesity, influenza vaccination, and depression. We classified indicators into having low measurement error or high measurement error on the basis of whether the proportion of misclassification (ie, false-negative or false-positive cases) was greater than 15% in 190 reviewed charts. We compared bias (ie, absolute difference between NYC Macroscope estimates and NYC HANES estimates) before and after calibration. RESULTS The health indicators with low measurement error had the same bias after calibration as before calibration (diabetes, 2.5 percentage points; smoking, 2.5 percentage points; obesity, 3.5 percentage points; hypertension, 1.1 percentage points). For indicators with high measurement error, bias decreased from 10.8 to 2.5 percentage points for depression, and from 26.7 to 8.4 percentage points for influenza vaccination. CONCLUSION The calibration model has the potential to reduce bias of prevalence estimates from EHR-based surveillance systems for indicators with high measurement errors. Further research is warranted to assess the utility of the current calibration model for other EHR data and additional indicators.
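The two definitions used in this abstract (the 15% misclassification threshold applied to 190 reviewed charts, and bias as the absolute difference between the two prevalence estimates) can be written as a short Python sketch. The function names and the example inputs are hypothetical; the calibration model itself is not described in the abstract and is not sketched here.

```python
def measurement_error_level(misclassified, charts_reviewed=190, threshold=0.15):
    """Label an indicator 'high' if more than 15% of reviewed charts were
    misclassified (false negatives plus false positives), else 'low'."""
    proportion = misclassified / charts_reviewed
    return "high" if proportion > threshold else "low"


def bias_percentage_points(ehr_prevalence_pct, survey_prevalence_pct):
    """Bias as defined in the abstract: absolute difference between the
    EHR-based estimate and the survey estimate, in percentage points."""
    return abs(ehr_prevalence_pct - survey_prevalence_pct)


# Hypothetical chart-review counts (illustration only).
level_a = measurement_error_level(misclassified=40)  # 40/190 is about 21%
level_b = measurement_error_level(misclassified=10)  # 10/190 is about 5%

# Hypothetical prevalence estimates, in percent.
bias = bias_percentage_points(ehr_prevalence_pct=20.0, survey_prevalence_pct=17.5)
print(level_a, level_b, bias)
```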
Affiliation(s)
- Pui Ying Chan
- Division of Epidemiology, New York City Department of Health and Mental Hygiene, Long Island City, New York; 42-09 28th St, CN# 07-099, Long Island City, NY 11101.
- Yihong Zhao
- Department of Health Policy and Health Services Research, Henry M. Goldman School of Dental Medicine, Boston University, Boston, Massachusetts
- Sungwoo Lim
- Division of Epidemiology, New York City Department of Health and Mental Hygiene, Long Island City, New York
- Sharon E Perlman
- Division of Epidemiology, New York City Department of Health and Mental Hygiene, Long Island City, New York
- Katharine H McVeigh
- Division of Family and Child Health, New York City Department of Health and Mental Hygiene, Long Island City, New York