1
van Werkhoven CH, de Gier B, McDonald SA, de Melker HE, Hahné SJM, van den Hof S, Knol MJ. Informed consent for national registration of COVID-19 vaccination caused information bias of vaccine effectiveness estimates mostly in older adults: a bias correction study. J Clin Epidemiol 2024; 174:111471. [PMID: 39032589 DOI: 10.1016/j.jclinepi.2024.111471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 07/10/2024] [Accepted: 07/15/2024] [Indexed: 07/23/2024]
Abstract
OBJECTIVES Registration in the Dutch national COVID-19 vaccination register requires consent from the vaccinee. This causes misclassification of nonconsenting vaccinated persons as being unvaccinated. We quantified and corrected the resulting information bias in vaccine effectiveness (VE) estimates. STUDY DESIGN AND SETTING National data were used for the period dominated by the SARS-CoV-2 Delta variant (July 11 to November 15, 2021). VE ((1 - relative risk) × 100%) against COVID-19 hospitalization and intensive care unit (ICU) admission was estimated for individuals 12 to 49, 50 to 69, and ≥70 years of age using negative binomial regression. Anonymous data on vaccinations administered by the Municipal Health Services were used to determine informed consent percentages and estimate corrected VEs by iteratively imputing corrected vaccination status. Absolute bias was calculated as the absolute change in VE; relative bias as uncorrected/corrected relative risk. RESULTS A total of 8804 COVID-19 hospitalizations and 1692 COVID-19 ICU admissions were observed. The bias was largest in the 70+ age group where the nonconsent proportion was 7.0% and observed vaccination coverage was 87%: VE of primary vaccination against hospitalization changed from 75.5% (95% CI 73.5-77.4) before to 85.9% (95% CI 84.7-87.1) after correction (absolute bias -10.4 percentage points, relative bias 1.74). VE against ICU admission in this group was 88.7% (95% CI 86.2-90.8) before and 93.7% (95% CI 92.2-94.9) after correction (absolute bias -5.0 percentage points, relative bias 1.79). CONCLUSION VE estimates can be substantially biased with modest nonconsent percentages for vaccination data registration. Data on covariate-specific nonconsent percentages should be available to correct this bias.
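The two bias measures defined in this abstract can be reproduced from the reported point estimates. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the sign convention for absolute bias (uncorrected minus corrected VE) is an assumption inferred from the reported values.

```python
def relative_risk(ve_percent):
    """Convert VE in percent back to a relative risk, using VE = (1 - RR) * 100."""
    return 1.0 - ve_percent / 100.0


def bias_summary(ve_uncorrected, ve_corrected):
    """Return (absolute bias in percentage points, relative bias).

    Assumed conventions (inferred from the abstract's numbers):
      absolute bias = uncorrected VE - corrected VE
      relative bias = uncorrected RR / corrected RR
    """
    absolute = ve_uncorrected - ve_corrected
    relative = relative_risk(ve_uncorrected) / relative_risk(ve_corrected)
    return round(absolute, 1), round(relative, 2)


# Point estimates for the 70+ age group, taken from the abstract.
hospitalization = bias_summary(75.5, 85.9)
icu = bias_summary(88.7, 93.7)

print(hospitalization)  # (-10.4, 1.74)
print(icu)              # (-5.0, 1.79)
```

Both results agree with the absolute and relative biases reported in the abstract, which shows that a change of a few percentage points in a high VE corresponds to a large ratio between the underlying relative risks.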
Affiliation(s)
- Cornelis H van Werkhoven
  - Center for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Brechje de Gier
  - Center for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Scott A McDonald
  - Center for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Hester E de Melker
  - Center for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Susan J M Hahné
  - Center for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Susan van den Hof
  - Center for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Mirjam J Knol
  - Center for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands

2
Lennox L, Lambe K, Hindocha CN, Coronini-Cronberg S. What health inequalities exist in access to, outcomes from and experience of treatment for lung cancer? A scoping review. BMJ Open 2023; 13:e077610. [PMID: 37918927 PMCID: PMC10626811 DOI: 10.1136/bmjopen-2023-077610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 10/18/2023] [Indexed: 11/04/2023] Open
Abstract
OBJECTIVES Lung cancer (LC) continues to be the leading cause of cancer-related deaths and while there have been significant improvements in overall survival, this gain is not equally distributed. To address health inequalities (HIs), it is vital to identify whether and where they exist. This paper reviews existing literature on what HIs impact LC care and where these manifest on the care pathway. DESIGN A systematic scoping review based on Arksey and O'Malley's five-stage framework. DATA SOURCES Multiple databases (EMBASE, HMIC, Medline, PsycINFO, PubMed) were used to retrieve articles. ELIGIBILITY CRITERIA Search limits were set to retrieve articles published between January 2012 and April 2022. Papers examining LC along with domains of HI were included. Two authors screened papers and independently assessed full texts. DATA EXTRACTION AND SYNTHESIS HIs were categorised according to: (a) HI domains: Protected Characteristics (PC); Socioeconomic and Deprivation Factors (SDF); Geographical Region (GR); Vulnerable or Socially Excluded Groups (VSG); and (b) where on the LC pathway (access to, outcomes from, experience of care) inequalities manifest. Data were extracted by two authors and collated in a spreadsheet for structured analysis and interpretation. RESULTS 41 papers were included. The most studied domain was PC (32/41), followed by SDF (19/41), GR (18/41) and VSG (13/41). Most studies investigated differences in access (31/41) or outcomes (27/41), with few (4/41) exploring experience inequalities. Evidence showed race, rural residence and being part of a VSG impacted the access to LC diagnosis, treatment and supportive care. Additionally, rural residence, older age or male sex negatively impacted survival and mortality. The relationship between outcomes and other factors (eg, race, deprivation) showed mixed results. 
CONCLUSIONS Findings offer an opportunity to reflect on the understanding of HIs in LC care and provide a platform to consider targeted efforts to improve equity of access, outcomes and experience for patients.
Affiliation(s)
- Laura Lennox
  - Primary Care and Public Health, Imperial College London, London, UK
  - NIHR Applied Research Collaboration Northwest London, London, UK
- Kate Lambe
  - Chelsea and Westminster Hospital NHS Foundation Trust, London, UK
- Chandni N Hindocha
  - Primary Care and Public Health, Imperial College London, London, UK
  - NIHR Applied Research Collaboration Northwest London, London, UK
- Sophie Coronini-Cronberg
  - Primary Care and Public Health, Imperial College London, London, UK
  - NIHR Applied Research Collaboration Northwest London, London, UK
  - Chelsea and Westminster Hospital NHS Foundation Trust, London, UK
  - West London NHS Trust, London, UK

3
Kaplan AD, Greene JD, Liu VX, Ray P. Unsupervised probabilistic models for sequential Electronic Health Records. J Biomed Inform 2022; 134:104163. [PMID: 36038064 PMCID: PMC10588733 DOI: 10.1016/j.jbi.2022.104163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Revised: 06/23/2022] [Accepted: 08/11/2022] [Indexed: 11/18/2022]
Abstract
We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Utilizing a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This allows for subgrouping and incorporation of the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables that encode underlying structure in the data. These variables represent subject subgroups at the top layer, and unobserved states for sequences in the second layer. We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The resulting properties of the trained model generate novel insight from these complex and multifaceted data. In addition, we show how the model can be used to analyze sequences that contribute to assessment of mortality likelihood.
Affiliation(s)
- Alan D Kaplan
  - Computational Engineering Division, Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550, United States of America
- John D Greene
  - Kaiser Permanente Division of Research, 2000 Broadway, Oakland, CA 94612, United States of America
- Vincent X Liu
  - Kaiser Permanente Division of Research, 2000 Broadway, Oakland, CA 94612, United States of America
- Priyadip Ray
  - Computational Engineering Division, Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550, United States of America

4
Xia M, Akakpo RM. A Bayesian approach to simultaneous adjustment of misclassification and missingness in categorical covariates. Stat Methods Med Res 2022; 31:1449-1469. [PMID: 35473473 DOI: 10.1177/09622802221094941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
This study considers concurrent adjustment of misclassification and missingness in categorical covariates in regression models. Under various misclassification and missingness mechanisms, we derive a general mixture regression structure for regression models that can incorporate multiple surrogates of categorical covariates that are subject to misclassification and missingness. In simulation studies, we demonstrate that including observations with missingness and/or multiple surrogates of the covariate helps alleviate the efficiency loss caused by misclassification. In addition, we study the efficacy of misclassification adjustment when the number of categories increases for the covariate of interest. Using data from the Longitudinal Studies of HIV-Associated Lung Infections and Complications, we perform simultaneous adjustment of misclassification and missingness in the self-reported cocaine and heroin use variable when assessing its association with lung density measures.
Affiliation(s)
- Michelle Xia
  - Department of Statistics and Actuarial Science, Northern Illinois University, Dekalb, IL 60115, USA
- Rexford M Akakpo
  - Department of Statistics and Actuarial Science, Northern Illinois University, Dekalb, IL 60115, USA

5
Reducing Bias Due to Outcome Misclassification for Epidemiologic Studies Using EHR-derived Probabilistic Phenotypes. Epidemiology 2020; 31:542-550. [DOI: 10.1097/ede.0000000000001193] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/26/2022]
6
Zheng Y, Corley DA, Doubeni C, Halm E, Shortreed SM, Barlow WE, Zauber A, Tosteson TD, Chubak J. ANALYSES OF PREVENTIVE CARE MEASURES WITH INCOMPLETE HISTORICAL DATA IN ELECTRONIC MEDICAL RECORDS: AN EXAMPLE FROM COLORECTAL CANCER SCREENING. Ann Appl Stat 2020; 14:1030-1044. [PMID: 34531936 PMCID: PMC8442666 DOI: 10.1214/20-aoas1342] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/22/2023]
Abstract
The calculation of quality of care measures based on electronic medical records (EMRs) may be inaccurate because of incomplete capture of past services. We evaluate the influence of different statistical approaches for calculating the proportion of patients who are up-to-date for a preventive service, using the example of colorectal cancer (CRC) screening. We propose an extension of traditional mixture models to account for the uncertainty in compliance, which is further complicated by the choice of various screening modalities with different recommended screening intervals. We conducted simulation studies to compare various statistical approaches and demonstrated that the proposed method can alleviate bias when individuals with complete prior medical history information were not representative of the targeted population. The method is motivated by and applied to data from the National Cancer Institute-funded consortium Population-Based Research Optimizing Screening through Personalized Regimens (PROSPR). Findings from the application are important for the evaluation of appropriate use of preventive care and provide a novel tool for dealing with similar analytical challenges with EMR data in broad settings.
Affiliation(s)
- Yingye Zheng
  - Department of Biostatistics, Fred Hutchinson Cancer Research Center, Seattle WA
- Douglas A. Corley
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA
- Chyke Doubeni
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Ethan Halm
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical School, Dallas TX
- Ann Zauber
  - Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY
- Jessica Chubak
  - Health Research Institute, Kaiser Permanente Washington, Seattle WA

7
Lobach I, Sheng Y, Lobach S, Zablotska L, Huang CY. Case-control versus case-only estimates of gene-environment interactions with common and misclassified clinical diagnosis. Genet Epidemiol 2019; 44:4-15. [PMID: 31667895 DOI: 10.1002/gepi.22266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2019] [Revised: 09/09/2019] [Accepted: 09/27/2019] [Indexed: 11/07/2022]
Abstract
Genetic studies provide valuable information to assess if the effect of genetic variants varies by the nongenetic "environmental" variables, what is traditionally defined to be gene-environment interaction (GxE). A common complication is that multiple disease states present with the same set of symptoms, and hence share the clinical diagnosis. Because (a) disease states might have distinct genetic bases; and (b) frequencies of the disease states within the clinical diagnosis vary by the environmental variables, analyses of association with the clinical diagnosis as an outcome variable might result in false positive or false negative findings. We develop estimates for this setting to be able to assess GxE in a case-only study and we compare the case-control and case-only estimates. We report extensive simulation studies that evaluate empirical properties of the estimates and show the application to a study of Alzheimer's disease.
Affiliation(s)
- Iryna Lobach
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, California
- Ying Sheng
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, California
- Siarhei Lobach
  - Applied Mathematics and Computer Science Department, Belarusian State University, Minsk, Belarus
- Lydia Zablotska
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, California
- Chiung-Yu Huang
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, California

8
Burnett-Hartman AN, Kamineni A, Corley DA, Singal AG, Halm EA, Rutter CM, Chubak J, Lee JK, Doubeni CA, Inadomi JM, Doria-Rose VP, Zheng Y. Colonoscopy Indication Algorithm Performance Across Diverse Health Care Systems in the PROSPR Consortium. EGEMS (WASHINGTON, DC) 2019; 7:37. [PMID: 31531383 PMCID: PMC6676916 DOI: 10.5334/egems.296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/06/2018] [Accepted: 06/21/2019] [Indexed: 11/20/2022]
Abstract
BACKGROUND Despite the importance of characterizing colonoscopy indication for quality monitoring and cancer screening program evaluation, there is no standard approach to documenting colonoscopy indication in medical records. METHODS We applied two algorithms in three health care systems to assign colonoscopy indication to persons 50-89 years old who received a colonoscopy during 2010-2013. Both algorithms used standard procedure, diagnostic, and laboratory codes. One algorithm, the KPNC algorithm, used a hierarchical approach to classify exam indication into: diagnostic, surveillance, or screening; whereas the other, the SEARCH algorithm, used a logistic regression-based algorithm to provide the probability that colonoscopy was performed for screening. Gold standard assessment of indication was from medical records abstraction. RESULTS There were 1,796 colonoscopy exams included in analyses; age and racial/ethnic distributions of participants differed across health care systems. The KPNC algorithm's sensitivities and specificities for screening indication ranged from 0.78-0.82 and 0.78-0.91, respectively; sensitivities and specificities for diagnostic indication ranged from 0.78-0.89 and 0.74-0.82, respectively. The KPNC algorithm had poor sensitivities (ranging from 0.11-0.67) and high specificities for surveillance exams. The Area Under the Curve (AUC) of the SEARCH algorithm for screening indication ranged from 0.76-0.84 across health care systems. For screening indication, the KPNC algorithm obtained higher specificities than the SEARCH algorithm at the same sensitivity. CONCLUSION Despite standardized implementation of these indication algorithms across three health care systems, the capture of colonoscopy indication data was imperfect. Thus, we recommend that standard, systematic documentation of colonoscopy indication should be added to medical records to ensure efficient and accurate data capture.
Affiliation(s)
- Andrea N. Burnett-Hartman
  - Institute for Health Research, Kaiser Permanente Colorado, Denver, CO, US
  - Fred Hutchinson Cancer Research Center, Seattle, WA, US
- Aruna Kamineni
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA, US
- Douglas A. Corley
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, US
- Amit G. Singal
  - Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX, US
- Ethan A. Halm
  - Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX, US
  - Harold C. Simmons Comprehensive Cancer Center, Dallas, TX, US
- Jessica Chubak
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA, US
- Jeffrey K. Lee
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, US
- Chyke A. Doubeni
  - Center for Health Equity and Community Engagement Research, Rochester, MN, US
  - Department of Family Medicine, Mayo Clinic, Rochester, MN, US
- John M. Inadomi
  - Division of Gastroenterology, University of Washington, School of Medicine, Seattle, WA, US
- V. Paul Doria-Rose
  - Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, Maryland, US
- Yingye Zheng
  - Fred Hutchinson Cancer Research Center, Seattle, WA, US

9
Flugelman AA, Stein N, Segol O, Lavi I, Keinan-Boker L. Delayed Colonoscopy Following a Positive Fecal Test Result and Cancer Mortality. JNCI Cancer Spectr 2019; 3:pkz024. [PMID: 31360901 PMCID: PMC6649710 DOI: 10.1093/jncics/pkz024] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 09/24/2018] [Revised: 01/12/2019] [Accepted: 03/22/2019] [Indexed: 12/15/2022] Open
Abstract
Background A fecal test followed by diagnostic colonoscopy for a positive result is a widely endorsed screening strategy for colorectal cancer (CRC). However, the relationship between the time delay from the positive test to the follow-up colonoscopy and CRC mortality has not been established. Methods From a population-based screening program, we identified CRC patients newly diagnosed from 2005 through 2015 by a positive fecal occult test followed by a colonoscopy. The primary outcome measure was CRC-specific mortality according to four categories for the time elapsed between the positive result and the subsequent colonoscopy. Results The 1749 patients underwent colonoscopies within 0–3 months (n = 981, 56.1%), 4–6 months (n = 307, 17.5%), 7–12 months (n = 157, 9.0%), and later than 12 months (n = 304, 17.4%). CRC-specific deaths according to exposure groups were: 13.8% (135 of 981) for 0–3 months, 10.7% (33 of 307) for 4–6 months (crude hazard ratio [HR] = 0.74, 95% confidence interval [CI] = 0.51 to 1.14), 11.5% (18 of 157) for 7–12 months (crude HR = 0.83, 95% CI = 0.51 to 1.42), and 22.7% (69 of 304) for longer than 12 months (crude HR = 1.40, 95% CI = 1.04 to 1.90). The only variable that was associated with mortality risk was the number of positive slides (P = .003). The proportion with high positivity in the 0–3 months group was twice that in the longer-than-12-months group (51.9% vs 25.0%) and was similar in the 4–6 and 7–12 months groups (38.1% and 36.5%, respectively). The adjusted HRs for CRC mortality were 0.81 (95% CI = 0.55 to 1.19); 0.83 (95% CI = 0.50 to 1.41), and 1.53 (95% CI = 1.13 to 2.12, P = .006) for the 4–6, 7–12, and longer-than-12-months groups, respectively, compared with the shortest delay group. Conclusions Among screen-diagnosed CRC patients, performance of colonoscopy more than 12 months after the initial positive fecal occult blood test was associated with more advanced disease and higher mortality due to CRC.
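The crude CRC-specific mortality percentages in this abstract follow directly from the reported death and patient counts. The sketch below recomputes them as a consistency check; the variable names are hypothetical, and only the counts are taken from the abstract.

```python
# (CRC-specific deaths, patients) per delay category, as reported in the abstract.
groups = {
    "0-3 months": (135, 981),
    "4-6 months": (33, 307),
    "7-12 months": (18, 157),
    ">12 months": (69, 304),
}

# Crude mortality in percent, rounded to one decimal as in the abstract.
crude_mortality = {
    label: round(100.0 * deaths / patients, 1)
    for label, (deaths, patients) in groups.items()
}

# The four categories should add up to the full cohort.
total_patients = sum(patients for _, patients in groups.values())

for label, percent in crude_mortality.items():
    print(f"{label}: {percent}%")
print("total patients:", total_patients)
```

The recomputed values (13.8%, 10.7%, 11.5%, 22.7%) and the total of 1749 patients match the abstract. These are unadjusted proportions; the hazard ratios in the abstract additionally account for follow-up time and, in the adjusted analysis, for covariates.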
Affiliation(s)
- Anath A Flugelman
  - Department of Community Medicine and Epidemiology, Lady Davis Carmel Medical Center, Haifa, Israel
  - Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel
  - Clalit National Cancer Control Center, Haifa, Israel
- Nili Stein
  - Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel
  - Clalit National Cancer Control Center, Haifa, Israel
- Ori Segol
  - Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel
  - Department of Gastroenterology, Lady Davis Carmel Medical Center, Haifa, Israel
- Idit Lavi
  - Department of Community Medicine and Epidemiology, Lady Davis Carmel Medical Center, Haifa, Israel
  - Clalit National Cancer Control Center, Haifa, Israel
- Lital Keinan-Boker
  - Israel National Cancer Registry, Israel Center for Disease Control, Ministry of Health, Ramat Gan, Israel
  - School of Public Health, University of Haifa, Haifa, Israel

10
Young JC, Conover MM, Jonsson Funk M. Measurement Error and Misclassification in Electronic Medical Records: Methods to Mitigate Bias. CURR EPIDEMIOL REP 2018; 5:343-356. [PMID: 35633879 PMCID: PMC9141310 DOI: 10.1007/s40471-018-0164-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/30/2022]
Abstract
PURPOSE OF REVIEW We sought to: 1) examine common sources of measurement error in research using data from electronic medical records (EMR), 2) discuss methods to assess the extent and type of measurement error, and 3) describe recent developments in methods to address this source of bias. RECENT FINDINGS We identified eight sources of measurement error frequently encountered in EMR studies, the most prominent being that EMR data usually reflect only the health services and medications delivered within the specific health facility/system contributing to the EMR data. Methods for assessing measurement error in EMR data usually require gold standard or validation data, which may be possible using data linkage. Recent methodological developments to address the impact of measurement error in EMR analyses were particularly rich in the multiple imputation literature. SUMMARY Presently, sources of measurement error impacting EMR studies are still being elucidated, as are methods for assessing and addressing them. Given the magnitude of measurement error that has been reported, investigators are urged to carefully evaluate and rigorously address this potential source of bias in studies based in EMR data.
11
Hubbard RA, Huang J, Harton J, Oganisian A, Choi G, Utidjian L, Eneli I, Bailey LC, Chen Y. A Bayesian latent class approach for EHR-based phenotyping. Stat Med 2018; 38:74-87. [PMID: 30252148 PMCID: PMC6519239 DOI: 10.1002/sim.7953] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 01/05/2018] [Revised: 06/29/2018] [Accepted: 08/05/2018] [Indexed: 01/09/2023]
Abstract
Phenotyping, ie, identification of patients possessing a characteristic of interest, is a fundamental task for research conducted using electronic health records. However, challenges to this task include imperfect sensitivity and specificity of clinical codes and inconsistent availability of more detailed data such as laboratory test results. Despite these challenges, most existing electronic health records-derived phenotypes are rule-based, consisting of a series of Boolean arguments informed by expert knowledge of the disease of interest and its coding. The objective of this paper is to introduce a Bayesian latent phenotyping approach that accounts for imperfect data elements and missing-not-at-random missingness patterns and that can be used when no gold-standard data are available. We conducted simulation studies to compare alternative phenotyping methods under different patterns of missingness and applied these approaches to a cohort of 68 265 children at elevated risk for type 2 diabetes mellitus (T2DM). In simulation studies, the latent class approach had similar sensitivity to a rule-based approach (95.9% vs 91.9%) while substantially improving specificity (99.7% vs 90.8%). In the PEDSnet cohort, we found that biomarkers and clinical codes were strongly associated with latent T2DM status. The latent T2DM class was also strongly predictive of missingness in biomarkers. Glucose was missing in 83.4% of patients (odds ratio for latent T2DM status = 0.52) while hemoglobin A1c was missing in 91.2% (odds ratio for latent T2DM status = 0.03), suggesting missing-not-at-random missingness. The latent phenotype approach may substantially improve on rule-based phenotyping.
Affiliation(s)
- Rebecca A Hubbard
  - Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Jing Huang
  - Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Joanna Harton
  - Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Arman Oganisian
  - Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Grace Choi
  - Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Levon Utidjian
  - Department of Pediatrics, University of Pennsylvania, Philadelphia, Pennsylvania
  - Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- L Charles Bailey
  - Department of Pediatrics, University of Pennsylvania, Philadelphia, Pennsylvania
  - Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- Yong Chen
  - Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, Pennsylvania