1. Li Z, Pang S, Qu H, Lian W. Logistic regression prediction models and key influencing factors analysis of diabetes based on algorithm design. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08447-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]

2. Vajravelu ME, Hitt TA, Amaral S, Levitt Katz LE, Lee JM, Kelly A. Real-world treatment escalation from metformin monotherapy in youth-onset Type 2 diabetes mellitus: A retrospective cohort study. Pediatr Diabetes 2021; 22:861-871. [PMID: 33978986 PMCID: PMC8373808 DOI: 10.1111/pedi.13232] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/28/2021] [Revised: 03/22/2021] [Accepted: 04/26/2021] [Indexed: 01/21/2023] [Open Access]
Abstract
BACKGROUND Due to high rates of comorbidities and rapid progression, youth with Type 2 diabetes may benefit from early and aggressive treatment. However, until 2019, the only approved medications for this population were metformin and insulin. OBJECTIVE To investigate patterns and predictors of treatment escalation within 5 years of metformin monotherapy initiation for youth with Type 2 diabetes in clinical practice. SUBJECTS Commercially-insured patients with incident youth-onset (10-18 years) Type 2 diabetes initially treated with metformin only. METHODS Retrospective cohort study using a patient-level medical claims database with data from 2000 to 2020. Frequency and order of treatment escalation to insulin and non-insulin antihyperglycemics were determined and categorized by age at diagnosis. Cox proportional hazards regression was used to evaluate potential predictors of treatment escalation, including age, sex, race/ethnicity, comorbidities, complications, and metformin adherence (medication possession ratio ≥ 0.8). RESULTS The cohort included 829 (66% female; median age at diagnosis 15 years; 19% Hispanic, 17% Black) patients, with median 2.9 year follow-up after metformin initiation. One-quarter underwent treatment escalation (n = 207; 88 to insulin, 164 to non-insulin antihyperglycemic). Younger patients were more likely to have insulin prescribed prior to other antihyperglycemics. Age at diagnosis (HR 1.14, 95% CI 1.07-1.21), medication adherence (HR 4.10, 95% CI 2.96-5.67), Hispanic ethnicity (HR 1.83, 95% CI 1.28-2.61), and diabetes-related complications (HR 1.78, 95% CI 1.15-2.74) were positively associated with treatment escalation. CONCLUSIONS In clinical practice, treatment escalation for pediatric Type 2 diabetes differs with age. Off-label use of non-insulin antihyperglycemics occurs, most commonly among older adolescents.
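Illustrative note: the abstract defines metformin adherence as a medication possession ratio (MPR) of at least 0.8. A minimal sketch of that calculation follows, assuming the common definition of MPR as total days' supply dispensed divided by the number of days in the observation period. The function names and the fill records are hypothetical and are not taken from the study.

```python
from datetime import date


def medication_possession_ratio(fills, period_start, period_end):
    """Return total days' supply dispensed in the period / days in the period.

    fills: list of (fill_date, days_supply) tuples (hypothetical record shape).
    """
    period_days = (period_end - period_start).days + 1  # inclusive period
    if period_days <= 0:
        raise ValueError("period_end must not precede period_start")
    supplied = sum(
        days for fill_date, days in fills
        if period_start <= fill_date <= period_end
    )
    return supplied / period_days


def is_adherent(mpr, threshold=0.8):
    """Adherence flag using the threshold quoted in the abstract."""
    return mpr >= threshold


# Hypothetical example: three 30-day fills over a 120-day window.
fills = [
    (date(2020, 1, 1), 30),
    (date(2020, 2, 5), 30),
    (date(2020, 3, 20), 30),
]
mpr = medication_possession_ratio(fills, date(2020, 1, 1), date(2020, 4, 29))
print(round(mpr, 2), is_adherent(mpr))
```

Here 90 days of supply over a 120-day window falls below the 0.8 threshold, so this hypothetical patient would be counted as non-adherent.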
Affiliation(s)
- Mary Ellen Vajravelu
  - Division of Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA; Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Talia A. Hitt
  - Division of Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Sandra Amaral
  - Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Division of Nephrology, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Lorraine E. Levitt Katz
  - Division of Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Joyce M. Lee
  - Susan B Meister Child Health Evaluation and Research Center, Division of Pediatric Endocrinology, University of Michigan, Ann Arbor, Michigan, USA
- Andrea Kelly
  - Division of Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA

3. Barrett CE, Park J, Kompaniyets L, Baggs J, Cheng YJ, Zhang P, Imperatore G, Pavkov ME. Intensive Care Unit Admission, Mechanical Ventilation, and Mortality Among Patients With Type 1 Diabetes Hospitalized for COVID-19 in the U.S. Diabetes Care 2021; 44:1788-1796. [PMID: 34158365 PMCID: PMC9109617 DOI: 10.2337/dc21-0604] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/18/2021] [Accepted: 05/16/2021] [Indexed: 02/03/2023]
Abstract
OBJECTIVE To assess whether risk of severe outcomes among patients with type 1 diabetes mellitus (T1DM) hospitalized for coronavirus disease 2019 (COVID-19) differs from that of patients without diabetes or with type 2 diabetes mellitus (T2DM). RESEARCH DESIGN AND METHODS Using the Premier Healthcare Database Special COVID-19 Release records of patients discharged after COVID-19 hospitalization from U.S. hospitals from March to November 2020 (N = 269,674 after exclusion), we estimated risk differences (RD) and risk ratios (RR) of intensive care unit admission or invasive mechanical ventilation (ICU/MV) and of death among patients with T1DM compared with patients without diabetes or with T2DM. Logistic models were adjusted for age, sex, and race or ethnicity. Models adjusted for additional demographic and clinical characteristics were used to examine whether other factors account for the associations between T1DM and severe COVID-19 outcomes. RESULTS Compared with patients without diabetes, T1DM was associated with a 21% higher absolute risk of ICU/MV (RD 0.21, 95% CI 0.19-0.24; RR 1.49, 95% CI 1.43-1.56) and a 5% higher absolute risk of mortality (RD 0.05, 95% CI 0.03-0.07; RR 1.40, 95% CI 1.24-1.57), with adjustment for age, sex, and race or ethnicity. Compared with T2DM, T1DM was associated with a 9% higher absolute risk of ICU/MV (RD 0.09, 95% CI 0.07-0.12; RR 1.17, 95% CI 1.12-1.22), but no difference in mortality (RD 0.00, 95% CI -0.02 to 0.02; RR 1.00, 95% CI 0.89-1.13). After adjustment for diabetic ketoacidosis (DKA) occurring before or at COVID-19 diagnosis, patients with T1DM no longer had increased risk of ICU/MV (RD 0.01, 95% CI -0.01 to 0.03) and had lower mortality (RD -0.03, 95% CI -0.05 to -0.01) in comparisons with patients with T2DM. CONCLUSIONS Patients with T1DM hospitalized for COVID-19 are at higher risk for severe outcomes than those without diabetes. 
Higher risk of ICU/MV in patients with T1DM than in patients with T2DM was largely accounted for by the presence of DKA. These findings might further guide recommendations related to diabetes management and the prevention of COVID-19.
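Illustrative note: the abstract reports both risk differences (RD) and risk ratios (RR). The sketch below shows how the two measures relate for a simple, unadjusted two-group comparison. It is not the study's method, which used adjusted logistic models; the function name and all counts are hypothetical.

```python
def risk_difference_and_ratio(exposed_events, exposed_total, ref_events, ref_total):
    """Return (RD, RR) comparing an exposed group with a reference group.

    RD = risk_exposed - risk_reference (absolute scale)
    RR = risk_exposed / risk_reference (relative scale)
    """
    if exposed_total <= 0 or ref_total <= 0:
        raise ValueError("group sizes must be positive")
    risk_exposed = exposed_events / exposed_total
    risk_ref = ref_events / ref_total
    if risk_ref == 0:
        raise ValueError("risk ratio is undefined when the reference risk is zero")
    return risk_exposed - risk_ref, risk_exposed / risk_ref


# Hypothetical counts: 60 of 100 exposed vs. 40 of 100 reference patients
# experience the outcome.
rd, rr = risk_difference_and_ratio(60, 100, 40, 100)
print(f"RD = {rd:.2f}, RR = {rr:.2f}")
```

The two measures answer different questions: RD expresses how many additional events occur per patient, while RR expresses how many times more likely the event is, which is why the abstract quotes both.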
Affiliation(s)
- Catherine E Barrett
  - Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA; COVID-19 Response Team, Centers for Disease Control and Prevention, Atlanta, GA
- Joohyun Park
  - Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA
- James Baggs
  - COVID-19 Response Team, Centers for Disease Control and Prevention, Atlanta, GA
- Yiling J Cheng
  - Office on Smoking and Health, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA
- Ping Zhang
  - Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA
- Giuseppina Imperatore
  - Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA
- Meda E Pavkov
  - Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA

4. Dabelea D, Sauder KA, Jensen ET, Mottl AK, Huang A, Pihoker C, Hamman RF, Lawrence J, Dolan LM, Agostino RD, Wagenknecht L, Mayer-Davis EJ, Marcovina SM. Twenty years of pediatric diabetes surveillance: what do we know and why it matters. Ann N Y Acad Sci 2021; 1495:99-120. [PMID: 33543783 PMCID: PMC8282684 DOI: 10.1111/nyas.14573] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/30/2020] [Revised: 01/14/2021] [Accepted: 01/20/2021] [Indexed: 12/23/2022]
Abstract
SEARCH for Diabetes in Youth (SEARCH) was initiated in 2000 as a multicenter study to address major gaps in the understanding of childhood diabetes in the United States. An active registry of youth diagnosed with diabetes at age <20 years since 2002 assessed prevalence, annual incidence, and trends by age, race/ethnicity, sex, and diabetes type. An observational cohort nested within the population-based registry was established to assess the natural history and risk factors for acute and chronic diabetes-related complications, as well as the quality of care and quality of life of children and adolescents with diabetes from diagnosis into young adulthood. SEARCH findings have contributed to a better understanding of the complex and heterogeneous nature of youth-onset diabetes. Continued surveillance of the burden and risk of type 1 and type 2 diabetes is important to track and monitor incidence and prevalence within the population. SEARCH reported evidence of early diabetes complications highlighting that continuing the long-term follow-up of youth with diabetes is necessary to further our understanding of its natural history and to develop the most appropriate approaches to primary, secondary, and tertiary prevention of diabetes and its complications. This review summarizes two decades of research and suggests avenues for further work.
Affiliation(s)
- Dana Dabelea
  - Lifecourse Epidemiology of Adiposity and Diabetes Center, Departments of Epidemiology and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO
- Katherine A. Sauder
  - Lifecourse Epidemiology of Adiposity and Diabetes Center, Departments of Epidemiology and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO
- Elizabeth T. Jensen
  - Department of Epidemiology and Prevention, Wake Forest School of Medicine, Winston-Salem, NC
- Amy K. Mottl
  - Division of Nephrology and Hypertension, University of North Carolina School of Medicine, Chapel Hill, NC
- Alyssa Huang
  - Department of Pediatrics, University of Washington, Seattle, WA
- Richard F. Hamman
  - Lifecourse Epidemiology of Adiposity and Diabetes Center, Departments of Epidemiology and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO
- Jean Lawrence
  - Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Lawrence M. Dolan
  - Division of Endocrinology, Cincinnati Children’s Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH
- Ralph D’Agostino
  - Division of Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, NC
- Lynne Wagenknecht
  - Division of Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, NC

5. Lee S, Doktorchik C, Martin EA, D'Souza AG, Eastwood C, Shaheen AA, Naugler C, Lee J, Quan H. Electronic Medical Record-Based Case Phenotyping for the Charlson Conditions: Scoping Review. JMIR Med Inform 2021; 9:e23934. [PMID: 33522976 PMCID: PMC7884219 DOI: 10.2196/23934] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/28/2020] [Revised: 11/20/2020] [Accepted: 12/05/2020] [Indexed: 12/16/2022] [Open Access]
Abstract
Background Electronic medical records (EMRs) contain large amounts of rich clinical information. Developing EMR-based case definitions, also known as EMR phenotyping, is an active area of research that has implications for epidemiology, clinical care, and health services research. Objective This review aims to describe and assess the present landscape of EMR-based case phenotyping for the Charlson conditions. Methods A scoping review of EMR-based algorithms for defining the Charlson comorbidity index conditions was completed. This study covered articles published between January 2000 and April 2020, both inclusive. Embase (Excerpta Medica database) and MEDLINE (Medical Literature Analysis and Retrieval System Online) were searched using keywords developed in the following 3 domains: terms related to EMR, terms related to case finding, and disease-specific terms. The manuscript follows the Preferred Reporting Items for Systematic reviews and Meta-analyses extension for Scoping Reviews (PRISMA) guidelines. Results A total of 274 articles representing 299 algorithms were assessed and summarized. Most studies were undertaken in the United States (181/299, 60.5%), followed by the United Kingdom (42/299, 14.0%) and Canada (15/299, 5.0%). These algorithms were mostly developed either in primary care (103/299, 34.4%) or inpatient (168/299, 56.2%) settings. Diabetes, congestive heart failure, myocardial infarction, and rheumatology had the highest number of developed algorithms. Data-driven and clinical rule–based approaches have been identified. EMR-based phenotype and algorithm development reflect the data access allowed by respective health systems, and algorithms vary in their performance. Conclusions Recognizing similarities and differences in health systems, data collection strategies, extraction, data release protocols, and existing clinical pathways is critical to algorithm development strategies. 
Several strategies to assist with phenotype-based case definitions have been proposed.
Affiliation(s)
- Seungwon Lee
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Chelsea Doktorchik
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Elliot Asher Martin
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
- Adam Giles D'Souza
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
- Cathy Eastwood
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Abdel Aziz Shaheen
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Christopher Naugler
  - Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Pathology and Laboratory Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Joon Lee
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Hude Quan
  - Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada

6. Crume TL, Hamman RF, Isom S, Divers J, Mayer-Davis EJ, Liese AD, Saydah S, Lawrence JM, Pihoker C, Dabelea D. The accuracy of provider diagnosed diabetes type in youth compared to an etiologic criteria in the SEARCH for Diabetes in Youth Study. Pediatr Diabetes 2020; 21:1403-1411. [PMID: 32981196 PMCID: PMC7819667 DOI: 10.1111/pedi.13126] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/29/2020] [Revised: 09/10/2020] [Accepted: 09/16/2020] [Indexed: 12/18/2022] [Open Access]
Abstract
BACKGROUND Although surveillance for diabetes in youth relies on provider-assigned diabetes type from medical records, its accuracy compared to an etiologic definition is unknown. METHODS Using the SEARCH for Diabetes in Youth Registry, we evaluated the validity and accuracy of provider-assigned diabetes type abstracted from medical records against etiologic criteria that included the presence of diabetes autoantibodies (DAA) and insulin sensitivity. Youth who were incident for diabetes in 2002-2006, 2008, or 2012 and had complete data on key analysis variables were included (n = 4001, 85% provider diagnosed type 1). The etiologic definition for type 1 diabetes was ≥1 positive DAA titer(s) or negative DAA titers in the presence of insulin sensitivity and for type 2 diabetes was negative DAA titers in the presence of insulin resistance. RESULTS Provider diagnosed diabetes type correctly agreed with the etiologic definition of type for 89.9% of cases. Provider diagnosed type 1 diabetes was 96.9% sensitive, 82.8% specific, had a positive predictive value (PPV) of 97.0% and a negative predictive value (NPV) of 82.7%. Provider diagnosed type 2 diabetes was 82.8% sensitive, 96.9% specific, had a PPV and NPV of 82.7% and 97.0%, respectively. CONCLUSION Provider diagnosis of diabetes type agreed with etiologic criteria for 90% of the cases. While the sensitivity and PPV were high for youth with type 1 diabetes, the lower sensitivity and PPV for type 2 diabetes highlights the value of DAA testing and assessment of insulin sensitivity status to ensure estimates are not biased by misclassification.
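Illustrative note: the reported figures mirror each other across the two diabetes types (type 1 sensitivity equals type 2 specificity, and type 1 PPV equals type 2 NPV). That is what one would expect when every case is labeled either type 1 or type 2, because relabeling the "positive" class simply swaps the cells of the 2×2 table. The sketch below demonstrates this with hypothetical counts; none of the numbers come from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table (reference standard vs. test)."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among reference positives
        "specificity": tn / (tn + fp),  # true negatives among reference negatives
        "ppv": tp / (tp + fp),          # reference positives among test positives
        "npv": tn / (tn + fn),          # reference negatives among test negatives
    }


# Hypothetical counts with type 1 diabetes treated as the positive class.
type1 = diagnostic_metrics(tp=90, fp=10, fn=5, tn=45)

# Treating type 2 as the positive class swaps the roles of the cells:
# the old true negatives become true positives, and the old false negatives
# become false positives (and vice versa).
type2 = diagnostic_metrics(tp=45, fp=5, fn=10, tn=90)

print(type1)
print(type2)
```

With only two classes, the sensitivity of one type is by construction the specificity of the other, and the PPV of one is the NPV of the other.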
Affiliation(s)
- Tessa L Crume
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado, Lifecourse Epidemiology of Adiposity and Diabetes (LEAD Center) Anschutz Medical Campus, Denver, Colorado, USA
- Richard F Hamman
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado, Lifecourse Epidemiology of Adiposity and Diabetes (LEAD Center) Anschutz Medical Campus, Denver, Colorado, USA
- Scott Isom
  - Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Jasmin Divers
  - Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Elizabeth J Mayer-Davis
  - School of Public Health and School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Angela D Liese
  - Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, South Carolina, USA
- Sharon Saydah
  - Division of Diabetes Translation, Centers for Disease Control and Prevention, Hyattsville, Maryland, USA
- Jean M Lawrence
  - Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
- Catherine Pihoker
  - Department of Pediatric Endocrinology, Children's Hospital & Regional Medical Center, Seattle, Washington, USA
- Dana Dabelea
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado, Lifecourse Epidemiology of Adiposity and Diabetes (LEAD Center) Anschutz Medical Campus, Denver, Colorado, USA

7. Knight GM, Spencer-Bonilla G, Maahs DM, Blum MR, Valencia A, Zuma BZ, Prahalad P, Sarraju A, Rodriguez F, Scheinker D. Multimethod, multidataset analysis reveals paradoxical relationships between sociodemographic factors, Hispanic ethnicity and diabetes. BMJ Open Diabetes Res Care 2020; 8:e001725. [PMID: 33229378 PMCID: PMC7684662 DOI: 10.1136/bmjdrc-2020-001725] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/25/2020] [Revised: 10/06/2020] [Accepted: 10/21/2020] [Indexed: 12/13/2022] [Open Access]
Abstract
INTRODUCTION Population-level and individual-level analyses have strengths and limitations as do 'blackbox' machine learning (ML) and traditional, interpretable models. Diabetes mellitus (DM) is a leading cause of morbidity and mortality with complex sociodemographic dynamics that have not been analyzed in a way that leverages population-level and individual-level data as well as traditional epidemiological and ML models. We analyzed complementary individual-level and county-level datasets with both regression and ML methods to study the association between sociodemographic factors and DM. RESEARCH DESIGN AND METHODS County-level DM prevalence, demographics, and socioeconomic status (SES) factors were extracted from the 2018 Robert Wood Johnson Foundation County Health Rankings and merged with US Census data. Analogous individual-level data were extracted from 2007 to 2016 National Health and Nutrition Examination Survey studies and corrected for oversampling with survey weights. We used multivariate linear (logistic) regression and ML regression (classification) models for county (individual) data. Regression and ML models were compared using measures of explained variation (area under the receiver operating characteristic curve (AUC) and R2). RESULTS Among the 3138 counties assessed, the mean DM prevalence was 11.4% (range: 3.0%-21.1%). Among the 12 824 individuals assessed, 1688 met DM criteria (13.2% unweighted; 10.2% weighted). Age, gender, race/ethnicity, income, and education were associated with DM at the county and individual levels. Higher county Hispanic ethnic density was negatively associated with county DM prevalence, while Hispanic ethnicity was positively associated with individual DM. ML outperformed regression in both datasets (mean R2 of 0.679 vs 0.610, respectively (p<0.001) for county-level data; mean AUC of 0.737 vs 0.727 (p<0.0427) for individual-level data). 
CONCLUSIONS Hispanic individuals are at higher risk of DM, while counties with larger Hispanic populations have lower DM prevalence. Analyses of population-level and individual-level data with multiple methods may afford more confidence in results and identify areas for further study.
Affiliation(s)
- Gabriel M Knight
  - Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- David M Maahs
  - Division of Pediatric Endocrinology, Stanford University School of Medicine, Stanford, California, USA
  - Stanford Diabetes Research Center, Stanford, California, USA
  - Department of Health Research and Policy, Stanford University School of Medicine, Stanford, California, USA
- Manuel R Blum
  - Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
  - Department of General Internal Medicine, Bern University Hospital, Bern, Switzerland
  - Institute of Primary Health Care, University of Bern, Bern, Switzerland
- Areli Valencia
  - Stanford University School of Medicine, Stanford, California, USA
- Bongeka Z Zuma
  - Stanford University School of Medicine, Stanford, California, USA
- Priya Prahalad
  - Division of Pediatric Endocrinology, Stanford University School of Medicine, Stanford, California, USA
- Ashish Sarraju
  - Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California, USA
- Fatima Rodriguez
  - Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California, USA
- David Scheinker
  - Division of Pediatric Endocrinology, Stanford University School of Medicine, Stanford, California, USA
  - Department of Management Science and Engineering, Stanford University School of Engineering, Stanford, California, USA
  - Clinical Excellence Research Center, Stanford University School of Medicine, Stanford, California, USA

8. Obeid JS, Davis M, Turner M, Meystre SM, Heider PM, O'Bryan EC, Lenert LA. An artificial intelligence approach to COVID-19 infection risk assessment in virtual visits: A case report. J Am Med Inform Assoc 2020; 27:1321-1325. [PMID: 32449766 PMCID: PMC7313981 DOI: 10.1093/jamia/ocaa105] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 04/28/2020] [Revised: 05/07/2020] [Accepted: 05/21/2020] [Indexed: 12/15/2022] [Open Access]
Abstract
Objective In an effort to improve the efficiency of computer algorithms applied to screening for coronavirus disease 2019 (COVID-19) testing, we used natural language processing and artificial intelligence–based methods with unstructured patient data collected through telehealth visits. Materials and Methods After segmenting and parsing documents, we conducted analysis of overrepresented words in patient symptoms. We then developed a word embedding–based convolutional neural network for predicting COVID-19 test results based on patients’ self-reported symptoms. Results Text analytics revealed that concepts such as smell and taste were more prevalent than expected in patients testing positive. As a result, screening algorithms were adapted to include these symptoms. The deep learning model yielded an area under the receiver-operating characteristic curve of 0.729 for predicting positive results and was subsequently applied to prioritize testing appointment scheduling. Conclusions Informatics tools such as natural language processing and artificial intelligence methods can have significant clinical impacts when applied to data streams early in the development of clinical systems for outbreak response.
Affiliation(s)
- Jihad S Obeid
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, USA; Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, USA
- Matthew Davis
  - Information Solutions, Medical University of South Carolina, Charleston, South Carolina, USA
- Matthew Turner
  - Information Solutions, Medical University of South Carolina, Charleston, South Carolina, USA
- Stephane M Meystre
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, USA; Department of Psychiatry and Behavioral Sciences, Medical University of South Carolina, Charleston, South Carolina, USA
- Paul M Heider
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, USA
- Edward C O'Bryan
  - Department of Emergency Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
- Leslie A Lenert
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, USA; Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA

9. Wells BJ, Lenoir KM, Wagenknecht LE, Mayer-Davis EJ, Lawrence JM, Dabelea D, Pihoker C, Saydah S, Casanova R, Turley C, Liese AD, Standiford D, Kahn MG, Hamman R, Divers J. Detection of Diabetes Status and Type in Youth Using Electronic Health Records: The SEARCH for Diabetes in Youth Study. Diabetes Care 2020; 43:2418-2425. [PMID: 32737140 PMCID: PMC7510036 DOI: 10.2337/dc20-0063] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/09/2020] [Accepted: 06/20/2020] [Indexed: 02/03/2023]
Abstract
OBJECTIVE Diabetes surveillance often requires manual medical chart reviews to confirm status and type. This project aimed to create an electronic health record (EHR)-based procedure for improving surveillance efficiency through automation of case identification. RESEARCH DESIGN AND METHODS Youth (<20 years old) with potential evidence of diabetes (N = 8,682) were identified from EHRs at three children's hospitals participating in the SEARCH for Diabetes in Youth Study. True diabetes status/type was determined by manual chart reviews. Multinomial regression was compared with an ICD-10 rule-based algorithm in the ability to correctly identify diabetes status and type. Subsequently, the investigators evaluated a scenario of combining the rule-based algorithm with targeted chart reviews where the algorithm performed poorly. RESULTS The sample included 5,308 true cases (89.2% type 1 diabetes). The rule-based algorithm outperformed regression for overall accuracy (0.955 vs. 0.936). Type 1 diabetes was classified well by both methods: sensitivity (Se) (>0.95), specificity (Sp) (>0.96), and positive predictive value (PPV) (>0.97). In contrast, the PPVs for type 2 diabetes were 0.642 and 0.778 for the rule-based algorithm and the multinomial regression, respectively. Combination of the rule-based method with chart reviews (n = 695, 7.9%) of persons predicted to have non-type 1 diabetes resulted in perfect PPV for the cases reviewed while increasing overall accuracy (0.983). The Se, Sp, and PPV for type 2 diabetes using the combined method were ≥0.91. CONCLUSIONS An ICD-10 algorithm combined with targeted chart reviews accurately identified diabetes status/type and could be an attractive option for diabetes surveillance in youth.
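Illustrative note: the abstract describes an ICD-10 rule-based classifier combined with targeted chart review of predicted non-type 1 cases, but it does not state the rule itself. The sketch below is a hypothetical stand-in, not the study's algorithm. It assumes a patient is labeled by the share of type 1 codes (ICD-10 category E10) among all type 1 and type 2 (E11) diabetes codes, and the 0.5 cut-off is an arbitrary choice for illustration.

```python
def classify_diabetes(icd10_codes, type1_share=0.5):
    """Hypothetical rule: label a patient from the mix of E10/E11 codes.

    icd10_codes: iterable of ICD-10 code strings from the patient's record.
    type1_share: assumed cut-off on the fraction of E10 codes (illustrative only).
    """
    codes = [code.upper() for code in icd10_codes]
    n_type1 = sum(code.startswith("E10") for code in codes)
    n_type2 = sum(code.startswith("E11") for code in codes)
    if n_type1 + n_type2 == 0:
        return "no diabetes"
    share = n_type1 / (n_type1 + n_type2)
    return "type 1" if share >= type1_share else "type 2"


def needs_chart_review(label):
    """Mirror the combined strategy: review predicted non-type 1 diabetes cases."""
    return label not in ("type 1", "no diabetes")


# Hypothetical patients and code lists.
patients = {
    "A": ["E10.9", "E10.65", "E11.9"],
    "B": ["E11.9", "E11.65"],
    "C": ["J45.909"],
}
labels = {pid: classify_diabetes(codes) for pid, codes in patients.items()}
review_queue = [pid for pid, label in labels.items() if needs_chart_review(label)]
print(labels, review_queue)
```

Routing only the predicted non-type 1 cases to manual review keeps the review workload small while targeting the class where, according to the abstract, the rule-based PPV was lowest.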
Collapse
Affiliation(s)
- Brian J Wells
  - Division of Public Health Sciences, Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC
- Kristin M Lenoir
  - Division of Public Health Sciences, Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC
- Lynne E Wagenknecht
  - Division of Public Health Sciences, Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC
- Elizabeth J Mayer-Davis
  - Departments of Nutrition and Medicine, The University of North Carolina at Chapel Hill, Chapel Hill, NC
- Jean M Lawrence
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Dana Dabelea
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado Denver, Aurora, CO
- Sharon Saydah
  - Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA
- Ramon Casanova
  - Division of Public Health Sciences, Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC
- Christine Turley
  - Department of Pediatrics, Medical University of South Carolina, Charleston, SC
- Angela D Liese
  - Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC
- Michael G Kahn
  - Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO
- Richard Hamman
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado Denver, Aurora, CO
- Jasmin Divers
  - Division of Health Services Research, NYU Winthrop Research Institute, NYU Long Island School of Medicine, Mineola, NY
|
10
|
Walters CE, Nitin R, Margulis K, Boorom O, Gustavson DE, Bush CT, Davis LK, Below JE, Cox NJ, Camarata SM, Gordon RL. Automated Phenotyping Tool for Identifying Developmental Language Disorder Cases in Health Systems Data (APT-DLD): A New Research Algorithm for Deployment in Large-Scale Electronic Health Record Systems. J Speech Lang Hear Res 2020; 63:3019-3035. [PMID: 32791019 PMCID: PMC7890229 DOI: 10.1044/2020_jslhr-19-00397] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/13/2019] [Revised: 04/23/2020] [Accepted: 05/19/2020] [Indexed: 05/13/2023]
Abstract
Purpose Data mining algorithms using electronic health records (EHRs) are useful in large-scale population-wide studies to classify etiology and comorbidities (Casey et al., 2016). Here, we apply this approach to developmental language disorder (DLD), a prevalent communication disorder whose risk factors and epidemiology remain largely undiscovered. Method We first created a reliable system for manually identifying DLD in EHRs based on speech-language pathologist (SLP) diagnostic expertise. We then developed and validated an automated algorithmic procedure, called the Automated Phenotyping Tool for identifying DLD cases in health systems data (APT-DLD), that classifies a DLD status for patients within EHRs on the basis of ICD (International Statistical Classification of Diseases and Related Health Problems) codes. APT-DLD was validated in a discovery sample (N = 973) using expert SLP manual phenotype coding as a gold-standard comparison and then applied and further validated in a replication sample of N = 13,652 EHRs. Results In the discovery sample, the APT-DLD algorithm classified DLD cases with 98% concordance with manually coded records in the training set, indicating that APT-DLD successfully mimics a comprehensive chart review. The output of APT-DLD was also validated in relation to independently conducted SLP clinician coding in a subset of records, with a positive predictive value of 95% of cases correctly classified as DLD. We also applied APT-DLD to the replication sample, where it achieved a positive predictive value of 90% in relation to SLP clinician classification of DLD. Conclusions APT-DLD is a reliable, valid, and scalable tool for identifying DLD cohorts in EHRs. This new method has promising public health implications for future large-scale epidemiological investigations of DLD and may inform EHR data mining algorithms for other communication disorders. Supplemental Material https://doi.org/10.23641/asha.12753578.
Affiliation(s)
- Courtney E. Walters
  - Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
  - Neuroscience Program, College of Arts and Science, Vanderbilt University, Nashville, TN
- Rachana Nitin
  - Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
  - Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN
- Katherine Margulis
  - Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
  - Kennedy Krieger Institute, Baltimore, MD
- Olivia Boorom
  - Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Daniel E. Gustavson
  - Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Catherine T. Bush
  - Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Lea K. Davis
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Jennifer E. Below
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Nancy J. Cox
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Stephen M. Camarata
  - Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Reyna L. Gordon
  - Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
  - Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
|
11
|
Weisman A, Tu K, Young J, Kumar M, Austin PC, Jaakkimainen L, Lipscombe L, Aronson R, Booth GL. Validation of a type 1 diabetes algorithm using electronic medical records and administrative healthcare data to study the population incidence and prevalence of type 1 diabetes in Ontario, Canada. BMJ Open Diabetes Res Care 2020; 8:e001224. [PMID: 32565422 PMCID: PMC7307536 DOI: 10.1136/bmjdrc-2020-001224] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 01/24/2020] [Revised: 05/12/2020] [Accepted: 05/19/2020] [Indexed: 12/19/2022] Open
Abstract
INTRODUCTION We aimed to develop algorithms distinguishing type 1 diabetes (T1D) from type 2 diabetes in adults ≥18 years old using primary care electronic medical record (EMRPC) and administrative healthcare data from Ontario, Canada, and to estimate T1D prevalence and incidence. RESEARCH DESIGN AND METHODS The reference population was a random sample of patients with diabetes in EMRPC whose charts were manually abstracted (n=5402). Algorithms were developed using classification trees, random forests, and rule-based methods, using electronic medical record (EMR) data, administrative data, or both. Algorithm performance was assessed in EMRPC. Administrative data algorithms were additionally evaluated using a diabetes clinic registry with endocrinologist-assigned diabetes type (n=29 371). Three algorithms were applied to the Ontario population to evaluate the minimum, moderate and maximum estimates of T1D prevalence and incidence rates between 2010 and 2017, and trends were analyzed using negative binomial regressions. RESULTS Of 5402 individuals with diabetes in EMRPC, 195 had T1D. Sensitivity, specificity, positive predictive value and negative predictive value for the best performing algorithms were 80.6% (75.9-87.2), 99.8% (99.7-100), 94.9% (92.3-98.7), and 99.3% (99.1-99.5) for EMR, 51.3% (44.0-58.5), 99.5% (99.3-99.7), 79.4% (71.2-86.1), and 98.2% (97.8-98.5) for administrative data, and 87.2% (81.7-91.5), 99.9% (99.7-100), 96.6% (92.7-98.7) and 99.5% (99.3-99.7) for combined EMR and administrative data. Administrative data algorithms had similar sensitivity and specificity in the diabetes clinic registry. Of 11 499 711 adults in Ontario in 2017, there were 24 789 (0.22%, minimum estimate) to 102 140 (0.89%, maximum estimate) with T1D. Between 2010 and 2017, the age-standardized and sex-standardized prevalence rates per 1000 person-years increased (minimum estimate 1.7 to 2.56, maximum estimate 7.48 to 9.86, p<0.0001). 
In contrast, incidence rates decreased (minimum estimate 0.1 to 0.04, maximum estimate 0.47 to 0.09, p<0.0001). CONCLUSIONS Primary care EMR and administrative data algorithms performed well in identifying T1D and demonstrated increasing T1D prevalence in Ontario. These algorithms may permit the development of large, population-based cohort studies of T1D.
Affiliation(s)
- Alanna Weisman
  - ICES, Toronto, Ontario, Canada
  - Division of Endocrinology & Metabolism, Department of Medicine, University of Toronto, Toronto, Ontario, Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Karen Tu
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
  - Department of Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada
  - Toronto Western Hospital Family Health Team, University Health Network, Toronto, Ontario, Canada
  - North York General Hospital, Toronto, Ontario, Canada
- Peter C Austin
  - ICES, Toronto, Ontario, Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Liisa Jaakkimainen
  - ICES, Toronto, Ontario, Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
  - North York General Hospital, Toronto, Ontario, Canada
- Lorraine Lipscombe
  - ICES, Toronto, Ontario, Canada
  - Division of Endocrinology & Metabolism, Department of Medicine, University of Toronto, Toronto, Ontario, Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
  - Women's College Research Institute, Women's College Hospital, Toronto, Ontario, Canada
- Gillian L Booth
  - ICES, Toronto, Ontario, Canada
  - Division of Endocrinology & Metabolism, Department of Medicine, University of Toronto, Toronto, Ontario, Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
  - Li Ka Shing Knowledge Institute of St. Michael's Hospital, Toronto, Ontario, Canada
|
12
|
Ke C, Stukel TA, Luk A, Shah BR, Jha P, Lau E, Ma RCW, So WY, Kong AP, Chow E, Chan JCN. Development and validation of algorithms to classify type 1 and 2 diabetes according to age at diagnosis using electronic health records. BMC Med Res Methodol 2020; 20:35. [PMID: 32093635 PMCID: PMC7038546 DOI: 10.1186/s12874-020-00921-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/22/2019] [Accepted: 02/10/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Validated algorithms to classify type 1 and 2 diabetes (T1D, T2D) are mostly limited to white pediatric populations. We conducted a large study in Hong Kong among children and adults with diabetes to develop and validate algorithms using electronic health records (EHRs) to classify diabetes type against clinical assessment as the reference standard, and to evaluate performance by age at diagnosis. METHODS We included all people with diabetes (age at diagnosis 1.5-100 years during 2002-15) in the Hong Kong Diabetes Register and randomized them to derivation and validation cohorts. We developed candidate algorithms to identify diabetes types using encounter codes, prescriptions, and combinations of these criteria ("combination algorithms"). We identified 3 algorithms with the highest sensitivity, positive predictive value (PPV), and kappa coefficient, and evaluated performance by age at diagnosis in the validation cohort. RESULTS There were 10,196 (T1D n = 60, T2D n = 10,136) and 5101 (T1D n = 43, T2D n = 5058) people in the derivation and validation cohorts (mean age at diagnosis 22.7, 55.9 years; 53.3, 43.9% female; for T1D and T2D respectively). Algorithms using codes or prescriptions classified T1D well for age at diagnosis < 20 years, but sensitivity and PPV dropped for older ages at diagnosis. Combination algorithms maximized sensitivity or PPV, but not both. 
The "high sensitivity for type 1" algorithm (ratio of type 1 to type 2 codes ≥ 4, or at least 1 insulin prescription within 90 days) had a sensitivity of 95.3% (95% confidence interval 84.2-99.4%; PPV 12.8%, 9.3-16.9%), while the "high PPV for type 1" algorithm (ratio of type 1 to type 2 codes ≥ 4, and multiple daily injections with no other glucose-lowering medication prescription) had a PPV of 100.0% (79.4-100.0%; sensitivity 37.2%, 23.0-53.3%), and the "optimized" algorithm (ratio of type 1 to type 2 codes ≥ 4, and at least 1 insulin prescription within 90 days) had a sensitivity of 65.1% (49.1-79.0%) and PPV of 75.7% (58.8-88.2%) across all ages. Accuracy of T2D classification was high for all algorithms. CONCLUSIONS Our validated set of algorithms accurately classifies T1D and T2D using EHRs for Hong Kong residents enrolled in a diabetes register. The choice of algorithm should be tailored to the unique requirements of each study question.
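The three rules quoted in this abstract can be written as simple predicates. The sketch below is a hedged illustration only: the `Patient` fields and function names are hypothetical, the abstract does not state what the 90-day insulin window is measured from, and the handling of patients with no type 2 codes is an assumption made here so that the ratio is defined.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical per-patient summary; field names are not from the study.
    type1_codes: int
    type2_codes: int
    insulin_within_90_days: bool
    multiple_daily_injections: bool
    other_glucose_lowering_rx: bool


def code_ratio_at_least_4(p: Patient) -> bool:
    """Ratio of type 1 to type 2 codes >= 4.

    Assumption: with zero type 2 codes, any type 1 code satisfies the rule.
    """
    if p.type2_codes == 0:
        return p.type1_codes > 0
    return p.type1_codes / p.type2_codes >= 4


def high_sensitivity_t1d(p: Patient) -> bool:
    # Code ratio OR early insulin prescription.
    return code_ratio_at_least_4(p) or p.insulin_within_90_days


def high_ppv_t1d(p: Patient) -> bool:
    # Code ratio AND multiple daily injections AND no other glucose-lowering drug.
    return (
        code_ratio_at_least_4(p)
        and p.multiple_daily_injections
        and not p.other_glucose_lowering_rx
    )


def optimized_t1d(p: Patient) -> bool:
    # Code ratio AND early insulin prescription.
    return code_ratio_at_least_4(p) and p.insulin_within_90_days


# Hypothetical patients for illustration.
mostly_t1_codes = Patient(8, 1, True, True, True)
insulin_only = Patient(0, 5, True, False, True)
```

The "or" in the first rule and the "and" in the other two are what trade sensitivity against PPV: `insulin_only` is flagged by the high-sensitivity rule alone, while `mostly_t1_codes` fails only the high-PPV rule because of the additional glucose-lowering prescription.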
Affiliation(s)
- Calvin Ke
  - Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
  - Department of Medicine, University of Toronto, Toronto, Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
- Thérèse A. Stukel
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
  - ICES, Toronto, Canada
- Andrea Luk
  - Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
  - Asia Diabetes Foundation, Prince of Wales Hospital, Shatin, Hong Kong
  - Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
  - Li Ka Shing Institute of Health Science, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
- Baiju R. Shah
  - Department of Medicine, University of Toronto, Toronto, Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
  - ICES, Toronto, Canada
  - Department of Medicine, Sunnybrook Health Sciences Centre, Toronto, Canada
- Prabhat Jha
  - Centre for Global Health Research, St. Michael’s Hospital, and Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Eric Lau
  - Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
  - Asia Diabetes Foundation, Prince of Wales Hospital, Shatin, Hong Kong
- Ronald C. W. Ma
  - Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
  - Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
  - Li Ka Shing Institute of Health Science, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
- Wing-Yee So
  - Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
- Alice P. Kong
  - Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
  - Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
  - Li Ka Shing Institute of Health Science, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
- Elaine Chow
  - Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
- Juliana C. N. Chan
  - Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
  - Asia Diabetes Foundation, Prince of Wales Hospital, Shatin, Hong Kong
  - Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
  - Li Ka Shing Institute of Health Science, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, Hong Kong
|
13
|
Pfaff ER, Crosskey M, Morton K, Krishnamurthy A. Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning. JMIR Med Inform 2020; 8:e16042. [PMID: 32012059 PMCID: PMC7007592 DOI: 10.2196/16042] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/28/2019] [Revised: 10/30/2019] [Accepted: 12/16/2019] [Indexed: 01/02/2023] Open
Abstract
Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop an open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning–based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning–based NLP with limited informatics involvement is a significant improvement over the status quo. 
We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
Affiliation(s)
- Emily R Pfaff
  - North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Ashok Krishnamurthy
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
|
14
|
Abstract
Electronic health records (EHRs) are a rich repository of valuable clinical information held in primary and secondary care databases. To use EHRs for medical observational research, a range of algorithms for automatically identifying individuals with a specific phenotype have been developed. This review summarizes and critically evaluates the literature on the development of EHR phenotyping systems, and describes phenotyping systems and techniques based on structured and unstructured EHR data. Articles published on PubMed and Google Scholar between 2013 and 2017 were reviewed, using search terms derived from Medical Subject Headings (MeSH). The use of natural language processing (NLP) techniques to extract features from narrative text has grown, owing to the availability of open-source NLP algorithms combined with improvements in accuracy. In this review, concept extraction was the most popular NLP technique, used by more than 50% of the reviewed papers to extract features from EHRs. High-throughput phenotyping systems using unsupervised machine learning techniques have gained popularity because they can extract a phenotype efficiently and automatically with minimal human effort.
|
15
|
Obeid JS, Weeda ER, Matuskowitz AJ, Gagnon K, Crawford T, Carr CM, Frey LJ. Automated detection of altered mental status in emergency department clinical notes: a deep learning approach. BMC Med Inform Decis Mak 2019; 19:164. [PMID: 31426779 PMCID: PMC6701023 DOI: 10.1186/s12911-019-0894-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/10/2019] [Accepted: 08/11/2019] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Machine learning has been used extensively in clinical text classification tasks. Deep learning approaches using word embeddings have been recently gaining momentum in biomedical applications. In an effort to automate the identification of altered mental status (AMS) in emergency department provider notes for the purpose of decision support, we compare the performance of classic bag-of-words-based machine learning classifiers and novel deep learning approaches. METHODS We used a case-control study design to extract an adequate number of clinical notes with AMS and non-AMS based on ICD codes. The notes were parsed to extract the history of present illness, which was used as the clinical text for the classifiers. The notes were manually labeled by clinicians. As a baseline for comparison, we tested several traditional bag-of-words based classifiers. We then tested several deep learning models using a convolutional neural network architecture with three different types of word embeddings, a pre-trained word2vec model and two models without pre-training but with different word embedding dimensions. RESULTS We evaluated the models on 1130 labeled notes from the emergency department. The deep learning models had the best overall performance with an area under the ROC curve of 98.5% and an accuracy of 94.5%. Pre-training word embeddings on the unlabeled corpus reduced training iterations and had performance that was statistically no different than the other deep learning models. CONCLUSION This supervised deep learning approach performs exceedingly well for the detection of AMS symptoms in clinical text in our environment. Further work is needed for the generalizability of these findings, including evaluation of these models in other types of clinical notes and other environments. 
The results seem promising for the ultimate use of these types of classifiers in combination with other information derived from the electronic health records as input for clinical decision support.
Affiliation(s)
- Jihad S Obeid
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
- Erin R Weeda
  - Department of Clinical Pharmacy and Outcome Sciences, Medical University of South Carolina, Charleston, SC, USA
- Andrew J Matuskowitz
  - Department of Emergency Medicine, Medical University of South Carolina, Charleston, SC, USA
- Kevin Gagnon
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA
- Tami Crawford
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
- Christine M Carr
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
  - Department of Emergency Medicine, Medical University of South Carolina, Charleston, SC, USA
- Lewis J Frey
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
|
16
|
Kosowan L, Wicklow B, Queenan J, Yeung R, Amed S, Singer A. Enhancing Health Surveillance: Validation of a Novel Electronic Medical Records-Based Definition of Cases of Pediatric Type 1 and Type 2 Diabetes Mellitus. Can J Diabetes 2019; 43:392-398. [PMID: 30956098 DOI: 10.1016/j.jcjd.2019.02.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/15/2018] [Revised: 12/18/2018] [Accepted: 02/13/2019] [Indexed: 12/20/2022]
Abstract
OBJECTIVES To compose and validate an electronic medical records-based case definition for pediatric diabetes in primary care. METHODS Data from the electronic medical records of 221 primary care providers participating in the Manitoba Primary Care Research Network were extracted from April 1, 1998, to March 31, 2015. We assessed agreement among the 3 case definitions of pediatric diabetes and compared the performance of each with the clinical database of the Manitoba Diabetes Education Resource for Children and Adolescents. RESULTS Our reference dataset included 41,055 pediatric patients. Electronic medical records-based case definitions, which included billing records, health conditions lists, prescription records and laboratory results, showed substantially higher sensitivity compared to the administration-based case definition that relied on billing and prescription records (96.9% and 94.9% vs 48.5%). Our study suggests a higher prevalence of pediatric diabetes in Manitoba than was previously reported through administration-based case definitions or in patients whose data were captured in the Manitoba Diabetes Education Resource for Children and Adolescents clinical database. CONCLUSIONS We describe a novel method of calculating the prevalence of pediatric diabetes in a primary care population. This case definition will improve the surveillance of pediatric diabetes and enhance service planning and the development of strategies to support prevention and management.
Affiliation(s)
- Leanne Kosowan
  - Department of Family Medicine, Max Rady College of Medicine, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Brandy Wicklow
  - Department of Pediatrics and Child Health, Max Rady College of Medicine, Rady Faculty of Health Sciences, University of Manitoba Children's Hospital Research Institute of Manitoba, Winnipeg, Manitoba, Canada
- John Queenan
  - Centre for Studies in Primary Care, Department of Family Medicine, Faculty of Health Sciences, Queen's University, Kingston, Ontario, Canada
- Roseanne Yeung
  - Division of Endocrinology & Metabolism, Department of Medicine, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, Alberta, Canada
- Shazhan Amed
  - Division of Endocrinology & Diabetes, Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Alexander Singer
  - Department of Family Medicine, Max Rady College of Medicine, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
|
17
|
Wiese AD, Roumie CL, Buse JB, Guzman H, Bradford R, Zalimeni E, Knoepp P, Morris HL, Donahoo WT, Fanous N, Epstein BF, Katalenich BL, Ayala SG, Cook MM, Worley KJ, Bachmann KN, Grijalva CG, Rothman RL, Chakkalakal RJ. Performance of a computable phenotype for identification of patients with diabetes within PCORnet: The Patient-Centered Clinical Research Network. Pharmacoepidemiol Drug Saf 2019; 28:632-639. [PMID: 30680840 DOI: 10.1002/pds.4718] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/04/2018] [Revised: 11/27/2018] [Accepted: 12/02/2018] [Indexed: 01/14/2023]
Abstract
PURPOSE PCORnet, the National Patient-Centered Clinical Research Network, represents an innovative system for the conduct of observational and pragmatic studies. We describe the identification and validation of a retrospective cohort of patients with type 2 diabetes (T2DM) from four PCORnet sites. METHODS We adapted existing computable phenotypes (CP) for the identification of patients with T2DM and evaluated their performance across four PCORnet sites (2012-2016). Patients entered the cohort on the earliest date (t0) they met one of three CP categories: (CP1) coded T2DM diagnosis (ICD-9/ICD-10) and an antidiabetic prescription, (CP2) diagnosis and glycosylated hemoglobin (HbA1c) ≥6.5%, or (CP3) an antidiabetic prescription and HbA1c ≥6.5%. We required evidence of health care utilization in each of the 2 prior years for each patient, as we also developed an incident T2DM CP to identify the subset of patients without documentation of T2DM in the 365 days before t0. Among a systematic sample of patients, we calculated the positive predictive value (PPV) for the T2DM CP and incident-T2DM CP using electronic health record (EHR) review as reference. RESULTS The CP identified 50 657 patients with T2DM. The PPV of patients randomly selected for validation was 96.2% (n = 1572; CI: 95.1-97.0) and was consistently high across sites. The PPV for the incident-T2DM CP was 5.8% (CI: 4.5-7.5). CONCLUSIONS The T2DM CP accurately and efficiently identified patients with T2DM across multiple sites that participate in PCORnet, although the incident T2DM CP requires further study. PCORnet is a valuable data source for future epidemiological and comparative effectiveness research among patients with T2DM.
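The cohort-entry logic in this abstract (earliest date on which any of the three categories is met) can be sketched as follows. This is an illustrative reading only: the function and argument names are hypothetical, each argument is taken to be the patient's earliest qualifying event of that kind, and the sketch assumes a category is met on the later of its two component dates. The abstract does not specify any time window between the components, so none is enforced here.

```python
from datetime import date
from typing import Optional, Tuple


def cohort_entry(
    dx_date: Optional[date],
    rx_date: Optional[date],
    a1c_date: Optional[date],
) -> Optional[Tuple[str, date]]:
    """Return (category, entry date) for the earliest category met, else None.

    dx_date:  earliest coded T2DM diagnosis
    rx_date:  earliest antidiabetic prescription
    a1c_date: earliest HbA1c >= 6.5%
    Assumption: a category is met once both of its components have occurred.
    """
    components = {
        "CP1": (dx_date, rx_date),
        "CP2": (dx_date, a1c_date),
        "CP3": (rx_date, a1c_date),
    }
    met = {
        name: max(first, second)
        for name, (first, second) in components.items()
        if first is not None and second is not None
    }
    if not met:
        return None
    earliest = min(met, key=met.get)  # ties resolve to the first category listed
    return earliest, met[earliest]


# Hypothetical patient: diagnosis in March, elevated HbA1c in April, drug in June.
entry = cohort_entry(date(2013, 3, 1), date(2013, 6, 1), date(2013, 4, 15))
no_entry = cohort_entry(date(2013, 3, 1), None, None)
```

For the hypothetical patient above, the diagnosis plus HbA1c pair is complete first, so entry is via CP2 on the HbA1c date; a patient with only a diagnosis code never enters the cohort.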
Affiliation(s)
- Andrew D Wiese
  - Department of Health Policy, Vanderbilt University Medical Center, Nashville, TN, USA
- Christianne L Roumie
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Veterans Health Administration-Tennessee Valley Healthcare System, Geriatric Research Education Clinical Center (GRECC), Nashville, TN, USA
- John B Buse
  - Department of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Herodes Guzman
  - Department of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Robert Bradford
  - Department of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Emily Zalimeni
  - Department of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Patricia Knoepp
  - Department of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Heather L Morris
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
- Nada Fanous
  - Department of Medicine, University of Florida, Gainesville, FL, USA
- Bonnie L Katalenich
  - LA CaTS Clinical Translational Unit, Tulane University School of Medicine, Tulane, LA, USA
- Sujata G Ayala
  - Institute for Medicine and Public Health, Vanderbilt University Medical Center, Nashville, TN, USA
- Megan M Cook
  - Institute for Medicine and Public Health, Vanderbilt University Medical Center, Nashville, TN, USA
- Katherine J Worley
  - Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA
- Katherine N Bachmann
  - Veterans Health Administration-Tennessee Valley Healthcare System, CSR&D, Nashville, TN, USA
  - Vanderbilt Translational and Clinical Cardiovascular Research Center, Vanderbilt University Medical Center, Nashville, TN, USA
  - Division of Diabetes, Endocrinology, and Metabolism, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Carlos G Grijalva
  - Department of Health Policy, Vanderbilt University Medical Center, Nashville, TN, USA
  - Veterans Health Administration-Tennessee Valley Healthcare System, Geriatric Research Education Clinical Center (GRECC), Nashville, TN, USA
- Russell L Rothman
  - Department of Health Policy, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
|
18
|
Chi GC, Li X, Tartof SY, Slezak JM, Koebnick C, Lawrence JM. Validity of ICD-10-CM codes for determination of diabetes type for persons with youth-onset type 1 and type 2 diabetes. BMJ Open Diabetes Res Care 2019; 7:e000547. [PMID: 30899525 PMCID: PMC6398816 DOI: 10.1136/bmjdrc-2018-000547] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/01/2018] [Revised: 11/16/2018] [Accepted: 12/08/2018] [Indexed: 01/18/2023] Open
Abstract
OBJECTIVE Diagnosis codes might be used for diabetes surveillance if they accurately distinguish diabetes type. We assessed the validity of International Classification of Disease, 10th Revision, Clinical Modification (ICD-10-CM) codes to discriminate between type 1 diabetes mellitus (T1DM) and type 2 diabetes mellitus (T2DM) among health plan members with youth-onset (diagnosis age <20 years) diabetes. RESEARCH DESIGN AND METHODS Diabetes case identification and abstraction of diabetes type was done as part of the SEARCH for Diabetes in Youth Study. The gold standard for diabetes type is the physician-assigned diabetes type documented in patients' medical records. Using all healthcare encounters with ICD-10-CM codes for diabetes, we summarized codes within each encounter and determined diabetes type using percent of encounters classified as T2DM. We chose 50% as the threshold from a receiver operating characteristic curve because this threshold yielded the largest Youden's index. Persons with ≥50% T2DM-coded encounters were classified as having T2DM. Otherwise, persons were classified as having T1DM. We calculated sensitivity, specificity, positive and negative predictive values, and accuracy overall and by demographic characteristics. RESULTS According to the gold standard, 1911 persons had T1DM and 652 persons had T2DM (mean age (SD): 19.1 (6.5) years). We obtained 90.6% (95% CI 88.4% to 92.9%) sensitivity, 96.3% (95% CI 95.4% to 97.1%) specificity, 89.3% (95% CI 86.9% to 91.6%) positive predictive value, 96.8% (95% CI 96.0% to 97.6%) negative predictive value, and 94.8% (95% CI 94.0% to 95.7%) accuracy for discriminating T2DM from T1DM. CONCLUSIONS ICD-10-CM codes can accurately classify diabetes type for persons with youth-onset diabetes, showing promise for rapid, cost-efficient diabetes surveillance.
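The classification rule and validity measures described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function and class names are hypothetical, and the confusion-matrix counts (591, 71, 1840, 61) are not reported in the abstract. They are back-calculated here from the reported totals (652 T2DM, 1911 T1DM) and percentages, and should be read as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # gold standard T2DM, classified T2DM
    fp: int  # gold standard T1DM, classified T2DM
    tn: int  # gold standard T1DM, classified T1DM
    fn: int  # gold standard T2DM, classified T1DM


def classify_as_t2dm(t2dm_coded: int, total_coded: int, threshold: float = 0.5) -> bool:
    """Apply the abstract's rule: T2DM if >= 50% of diabetes-coded encounters are T2DM."""
    if total_coded <= 0:
        raise ValueError("need at least one diabetes-coded encounter")
    return t2dm_coded / total_coded >= threshold


def validity_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard validity measures, plus Youden's index (sensitivity + specificity - 1)."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "youden": sensitivity + specificity - 1.0,
    }


# Assumed counts, back-calculated from the abstract's totals and percentages.
ILLUSTRATIVE = ConfusionCounts(tp=591, fp=71, tn=1840, fn=61)

if __name__ == "__main__":
    for name, value in validity_metrics(ILLUSTRATIVE).items():
        print(f"{name}: {value:.3f}")
```

With these assumed counts, the computed sensitivity, specificity, predictive values, and accuracy agree with the abstract's reported figures to one decimal place of a percent.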
Affiliation(s)
- Gloria C Chi
- Epidemic Intelligence Service, Division of Scientific Education and Professional Development, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
| | - Xia Li
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
| | - Sara Y Tartof
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
| | - Jeff M Slezak
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
| | - Corinna Koebnick
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
| | - Jean M. Lawrence
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
|
19
|
Abstract
PURPOSE OF REVIEW Surveillance of type 1 diabetes provides an opportunity to address public health needs, inform etiological research, and plan health care services. We present issues in type 1 diabetes surveillance, review previous and current methods, and present new initiatives. RECENT FINDINGS Few diabetes surveillance systems distinguish between type 1 and type 2 diabetes. Most worldwide efforts have focused on registries and ages < 15 years, resulting in limited information among adults. Recently, surveillance includes use of electronic health information and national health surveys. However, distinguishing by diabetes type remains a challenge. Enhancing and improving surveillance of type 1 diabetes across all age groups could include validating questions for use in national health surveys. In addition, validated algorithms for classifying diabetes type in electronic health records could further improve surveillance efforts and close current gaps in our understanding of the epidemiology of type 1 diabetes.
Affiliation(s)
- Sharon Saydah
- Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Diabetes Translation, 4770 Buford Highway, MS F-75, Atlanta, GA, 30341, USA.
| | - Giuseppina Imperatore
- Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Diabetes Translation, 4770 Buford Highway, MS F-75, Atlanta, GA, 30341, USA
|
20
|
Hoffman SR, Vines AI, Halladay JR, Pfaff E, Schiff L, Westreich D, Sundaresan A, Johnson LS, Nicholson WK. Optimizing research in symptomatic uterine fibroids with development of a computable phenotype for use with electronic health records. Am J Obstet Gynecol 2018; 218:610.e1-610.e7. [PMID: 29432754 DOI: 10.1016/j.ajog.2018.02.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/07/2017] [Revised: 01/12/2018] [Accepted: 02/05/2018] [Indexed: 01/27/2023]
Abstract
BACKGROUND Women with symptomatic uterine fibroids can report a myriad of symptoms, including pain, bleeding, infertility, and psychosocial sequelae. Optimizing fibroid research requires the ability to enroll populations of women with image-confirmed symptomatic uterine fibroids. OBJECTIVE Our objective was to develop an electronic health record-based algorithm to identify women with symptomatic uterine fibroids for a comparative effectiveness study of medical or surgical treatments on quality-of-life measures. Using an iterative process and text-mining techniques, an effective computable phenotype algorithm, composed of demographics, and clinical and laboratory characteristics, was developed with reasonable performance. Such algorithms provide a feasible, efficient way to identify populations of women with symptomatic uterine fibroids for the conduct of large traditional or pragmatic trials and observational comparative effectiveness studies. Symptomatic uterine fibroids, due to menorrhagia, pelvic pain, bulk symptoms, or infertility, are a source of substantial morbidity for reproductive-age women. Comparing Treatment Options for Uterine Fibroids is a multisite registry study to compare the effectiveness of hormonal or surgical fibroid treatments on women's perceptions of their quality of life. Electronic health record-based algorithms are able to identify large numbers of women with fibroids, but additional work is needed to develop electronic health record algorithms that can identify women with symptomatic fibroids to optimize fibroid research. We sought to develop an efficient electronic health record-based algorithm that can identify women with symptomatic uterine fibroids in a large health care system for recruitment into large-scale observational and interventional research in fibroid management. STUDY DESIGN We developed and assessed the accuracy of 3 algorithms to identify patients with symptomatic fibroids using an iterative approach. 
The data source was the Carolina Data Warehouse for Health, a repository for the health system's electronic health record data. In addition to International Classification of Diseases, Ninth Revision diagnosis and procedure codes and clinical characteristics, text data-mining software was used to derive information from imaging reports to confirm the presence of uterine fibroids. Results of each algorithm were compared with expert manual review to calculate the positive predictive values for each algorithm. RESULTS Algorithm 1 was composed of the following criteria: (1) age 18-54 years; (2) either ≥1 International Classification of Diseases, Ninth Revision diagnosis codes for uterine fibroids or mention of fibroids using text-mined key words in imaging records or documents; and (3) no International Classification of Diseases, Ninth Revision or Current Procedural Terminology codes for hysterectomy and no reported history of hysterectomy. The positive predictive value was 47% (95% confidence interval 39-56%). Algorithm 2 required ≥2 International Classification of Diseases, Ninth Revision diagnosis codes for fibroids and positive text-mined key words and had a positive predictive value of 65% (95% confidence interval 50-79%). In algorithm 3, further refinements included ≥2 International Classification of Diseases, Ninth Revision diagnosis codes for fibroids on separate outpatient visit dates, the exclusion of women who had a positive pregnancy test within 3 months of their fibroid-related visit, and exclusion of incidentally detected fibroids during prenatal or emergency department visits. Algorithm 3 achieved a positive predictive value of 76% (95% confidence interval 71-81%). CONCLUSION An electronic health record-based algorithm is capable of identifying cases of symptomatic uterine fibroids with moderate positive predictive value and may be an efficient approach for large-scale study recruitment.
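The Algorithm 3 criteria listed above can be read as a single eligibility predicate. The sketch below is a hypothetical rendering, not the study's implementation: the record fields are invented for illustration, "3 months" is approximated as 90 days, and it assumes that Algorithm 3 carries forward the age range, hysterectomy exclusion, and text-mining requirement of Algorithms 1 and 2, which the abstract implies ("further refinements") but does not state outright.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PatientRecord:
    # All field names are hypothetical; they are not taken from the study.
    age: int
    fibroid_outpatient_dx_dates: list[date]
    imaging_text_mentions_fibroids: bool
    hysterectomy_history: bool
    positive_pregnancy_test_dates: list[date] = field(default_factory=list)
    incidental_prenatal_or_ed_detection: bool = False


def meets_algorithm_3(r: PatientRecord, window_days: int = 90) -> bool:
    """Return True if the record satisfies the assumed Algorithm 3 criteria."""
    # Assumed to carry over from Algorithm 1: age 18-54, no hysterectomy.
    if not 18 <= r.age <= 54 or r.hysterectomy_history:
        return False
    # Assumed to carry over from Algorithm 2: positive text-mined key words.
    if not r.imaging_text_mentions_fibroids:
        return False
    # >= 2 fibroid diagnosis codes on separate outpatient visit dates.
    visit_dates = set(r.fibroid_outpatient_dx_dates)
    if len(visit_dates) < 2:
        return False
    # Exclude fibroids detected incidentally at prenatal or emergency visits.
    if r.incidental_prenatal_or_ed_detection:
        return False
    # Exclude a positive pregnancy test within ~3 months of a fibroid visit.
    for test_day in r.positive_pregnancy_test_dates:
        if any(abs((test_day - v).days) <= window_days for v in visit_dates):
            return False
    return True
```

Keeping each exclusion as its own early return makes it straightforward to count how many candidates each criterion removes, which is useful when comparing positive predictive values across algorithm versions.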
Affiliation(s)
- Sarah R Hoffman
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
| | - Anissa I Vines
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
| | | | - Emily Pfaff
- North Carolina Translational and Clinical Sciences Institute, University of North Carolina, Chapel Hill, NC
| | - Lauren Schiff
- Department of Obstetrics and Gynecology, University of North Carolina, Chapel Hill, NC
| | - Daniel Westreich
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
| | - Aditi Sundaresan
- Department of Obstetrics and Gynecology, University of North Carolina, Chapel Hill, NC; Center for Health Promotion and Disease Prevention, University of North Carolina, Chapel Hill, NC
| | - La-Shell Johnson
- Department of Obstetrics and Gynecology, University of North Carolina, Chapel Hill, NC; Center for Health Promotion and Disease Prevention, University of North Carolina, Chapel Hill, NC
| | - Wanda K Nicholson
- Department of Obstetrics and Gynecology, University of North Carolina, Chapel Hill, NC; Center for Women's Health Research, University of North Carolina, Chapel Hill, NC; Program on Women's Endocrine and Reproductive Health, School of Medicine, University of North Carolina, Chapel Hill, NC; Center for Health Promotion and Disease Prevention, University of North Carolina, Chapel Hill, NC.
|
21
|
Newcomer SR, Kulldorff M, Xu S, Daley MF, Fireman B, Lewis E, Glanz JM. Bias from outcome misclassification in immunization schedule safety research. Pharmacoepidemiol Drug Saf 2018; 27:221-228. [PMID: 29292551 DOI: 10.1002/pds.4374] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/15/2017] [Revised: 09/18/2017] [Accepted: 11/20/2017] [Indexed: 11/11/2022]
Abstract
PURPOSE The Institute of Medicine recommended conducting observational studies of childhood immunization schedule safety. Such studies could be biased by outcome misclassification, leading to incorrect inferences. Using simulations, we evaluated (1) outcome positive predictive values (PPVs) as indicators of bias of an exposure-outcome association, and (2) quantitative bias analyses (QBA) for bias correction. METHODS Simulations were conducted based on proposed or ongoing Vaccine Safety Datalink studies. We simulated 4 studies of 2 exposure groups (children with no vaccines or on alternative schedules) and 2 baseline outcome levels (100 and 1000/100 000 person-years), with 3 relative risk (RR) levels (RR = 0.50, 1.00, and 2.00), across 1000 replications using probabilistic modeling. We quantified bias from non-differential and differential outcome misclassification, based on levels previously measured in database research (sensitivity > 95%; specificity > 99%). We calculated median outcome PPVs, median observed RRs, Type 1 error, and bias-corrected RRs following QBA. RESULTS We observed PPVs from 34% to 98%. With non-differential misclassification and true RR = 2.00, median bias was toward the null, with severe bias (median observed RR = 1.33) with PPV = 34% and modest bias (median observed RR = 1.83) with PPV = 83%. With differential misclassification, PPVs did not reflect median bias, and there was Type 1 error of 100% with PPV = 90%. QBA was generally effective in correcting misclassification bias. CONCLUSIONS In immunization schedule studies, outcome misclassification may be non-differential or differential to exposure. Overall outcome PPVs do not reflect the distribution of false positives by exposure and are poor indicators of bias in individual studies. Our results support QBA for immunization schedule safety research.
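The direction of the bias reported here under non-differential misclassification can be illustrated with a closed-form calculation. The sketch below is not the authors' probabilistic simulation; it is a simplified deterministic version that treats the outcome rate as a per-person risk, and the function names and the specific sensitivity and specificity values in the usage example are assumptions chosen only to fall within the ranges the abstract mentions.

```python
def observed_risk(true_risk: float, sensitivity: float, specificity: float) -> float:
    """Expected proportion classified as cases: true positives plus false positives."""
    return sensitivity * true_risk + (1.0 - specificity) * (1.0 - true_risk)


def outcome_ppv(true_risk: float, sensitivity: float, specificity: float) -> float:
    """Share of classified cases that are true cases."""
    return sensitivity * true_risk / observed_risk(true_risk, sensitivity, specificity)


def observed_rr(baseline_risk: float, true_rr: float,
                sensitivity: float, specificity: float) -> float:
    """Relative risk seen when both exposure groups share the same error rates
    (non-differential misclassification)."""
    exposed = observed_risk(baseline_risk * true_rr, sensitivity, specificity)
    unexposed = observed_risk(baseline_risk, sensitivity, specificity)
    return exposed / unexposed


if __name__ == "__main__":
    # Illustrative inputs: baseline 1000 per 100 000, true RR = 2.00,
    # sensitivity 0.95 and specificity 0.995 (assumed values).
    baseline, rr, se, sp = 0.01, 2.0, 0.95, 0.995
    print(f"PPV in unexposed: {outcome_ppv(baseline, se, sp):.2f}")
    print(f"observed RR:      {observed_rr(baseline, rr, se, sp):.2f}")
```

Because false positives add the same absolute amount to both groups, the observed ratio is pulled toward 1 even when specificity is high, and the pull is stronger when the outcome is rare and the PPV is therefore low.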
Affiliation(s)
- Sophia R Newcomer
- Kaiser Permanente Colorado, Institute for Health Research, Denver, CO, USA.,Colorado School of Public Health, Anschutz Medical Campus, Department of Epidemiology, Denver, CO, USA
| | - Martin Kulldorff
- Brigham and Women's Hospital and Harvard Medical School, Division of Pharmacoepidemiology and Pharmacoeconomics, Boston, MA, USA
| | - Stan Xu
- Kaiser Permanente Colorado, Institute for Health Research, Denver, CO, USA
| | - Matthew F Daley
- Kaiser Permanente Colorado, Institute for Health Research, Denver, CO, USA.,University of Colorado Denver, School of Medicine, Department of Pediatrics, Denver, CO, USA
| | - Bruce Fireman
- Kaiser Permanente Northern California, Division of Research, Vaccine Study Center, Oakland, CA, USA
| | - Edwin Lewis
- Kaiser Permanente Northern California, Division of Research, Vaccine Study Center, Oakland, CA, USA
| | - Jason M Glanz
- Kaiser Permanente Colorado, Institute for Health Research, Denver, CO, USA.,Colorado School of Public Health, Anschutz Medical Campus, Department of Epidemiology, Denver, CO, USA
|
22
|
Kennell TI, Willig JH, Cimino JJ. Clinical Informatics Researcher's Desiderata for the Data Content of the Next Generation Electronic Health Record. Appl Clin Inform 2017; 8:1159-1172. [PMID: 29270955 DOI: 10.4338/aci-2017-06-r-0101] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 01/29/2023] Open
Abstract
OBJECTIVE Clinical informatics researchers depend on the availability of high-quality data from the electronic health record (EHR) to design and implement new methods and systems for clinical practice and research. However, these data are frequently unavailable or present in a format that requires substantial revision. This article reports the results of a review of informatics literature published from 2010 to 2016 that addresses these issues by identifying categories of data content that might be included or revised in the EHR. MATERIALS AND METHODS We used an iterative review process on 1,215 biomedical informatics research articles. We placed them into generic categories, reviewed and refined the categories, and then assigned additional articles, for a total of three iterations. RESULTS Our process identified eight categories of data content issues: Adverse Events, Clinician Cognitive Processes, Data Standards Creation and Data Communication, Genomics, Medication List Data Capture, Patient Preferences, Patient-reported Data, and Phenotyping. DISCUSSION These categories summarize discussions in biomedical informatics literature that concern data content issues restricting clinical informatics research. These barriers to research result from data that are either absent from the EHR or are inadequate (e.g., in narrative text form) for the downstream applications of the data. In light of these categories, we discuss changes to EHR data storage that should be considered in the redesign of EHRs, to promote continued innovation in clinical informatics. CONCLUSION Based on published literature of clinical informaticians' reuse of EHR data, we characterize eight types of data content that, if included in the next generation of EHRs, would find immediate application in advanced informatics tools and techniques.
Affiliation(s)
- Timothy I Kennell
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
| | - James H Willig
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States.,Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
| | - James J Cimino
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States.,Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
|