201
Beuther DA, Krishnan JA. Finding Asthma: Building a Foundation for Care and Discovery. Am J Respir Crit Care Med 2017; 196:401-402. [PMID: 28475356 DOI: 10.1164/rccm.201704-0840ed]
Affiliation(s)
- Jerry A Krishnan
- University of Illinois Hospital & Health Sciences System, Chicago, Illinois
202
Owusu Adjah ES, Montvida O, Agbeve J, Paul SK. Data Mining Approach to Identify Disease Cohorts from Primary Care Electronic Medical Records: A Case of Diabetes Mellitus. ACTA ACUST UNITED AC 2017. [DOI: 10.2174/1875036201710010016]
Abstract
Background: Identification of diseased patients from primary care-based electronic medical records (EMRs) has methodological challenges that may impact epidemiologic inferences. Objective: To compare deterministic, clinically guided selection algorithms with probabilistic machine learning (ML) methodologies for their ability to identify patients with type 2 diabetes mellitus (T2DM) from large population-based EMRs in a nationally representative primary care database. Methods: Four cohorts of patients with T2DM were defined by a deterministic approach based on disease codes. The database was mined for a set of best predictors of T2DM, and the performance of six ML algorithms was compared based on cross-validated true positive rate, true negative rate, and area under the receiver operating characteristic curve. Results: In the database of 11,018,025 research-suitable individuals, 379,657 (3.4%) were coded as having T2DM. A logistic regression classifier was selected as the best ML algorithm and resulted in a cohort of 383,330 patients with potential T2DM. Eighty-three percent (83%) of this cohort had a T2DM code, and 16% of the patients with a T2DM code were not included in this ML cohort. Of those in the ML cohort without a disease code, 52% had at least one measure of elevated glucose level and 22% had received at least one prescription for antidiabetic medication. Conclusion: Deterministic cohort selection based on disease coding potentially introduces a significant misclassification problem. ML techniques allow testing for potential disease predictors and, given meaningful data input, are able to identify diseased cohorts in a holistic way.
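The contrast this abstract draws between deterministic code-based selection and a probabilistic classifier can be sketched in a few lines. The record fields, feature weights, and bias below are illustrative assumptions, not values fitted on the study's data:

```python
import math

# Illustrative feature weights and bias; not from the study.
WEIGHTS = {"has_t2dm_code": 3.0, "elevated_glucose": 2.0, "antidiabetic_rx": 2.5}
BIAS = -4.0

def deterministic_t2dm(record):
    """Deterministic selection: include a record only if it carries a T2DM code."""
    return bool(record["has_t2dm_code"])

def ml_t2dm_probability(record):
    """Probabilistic selection: logistic score over coded and uncoded evidence."""
    z = BIAS + sum(w * float(record[k]) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

# A patient with no diagnosis code but elevated glucose and antidiabetic
# prescriptions is missed by the rule yet flagged by the classifier.
uncoded = {"has_t2dm_code": False, "elevated_glucose": True, "antidiabetic_rx": True}
print(deterministic_t2dm(uncoded))          # False
print(ml_t2dm_probability(uncoded) > 0.5)   # True
```

This is the mechanism behind the abstract's finding that the ML cohort recovered uncoded patients with elevated glucose or antidiabetic prescriptions.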
203
Wang Z, Li L, Glicksberg BS, Israel A, Dudley JT, Ma'ayan A. Predicting age by mining electronic medical records with deep learning characterizes differences between chronological and physiological age. J Biomed Inform 2017; 76:59-68. [PMID: 29113935 PMCID: PMC5716867 DOI: 10.1016/j.jbi.2017.11.003]
Abstract
Determining the discrepancy between the chronological and physiological age of patients is central to preventative and personalized care. Electronic medical records (EMRs) provide rich information about a patient's physiological state, but it is unclear whether such information can be predictive of chronological age. Here we present a deep learning model that uses vital signs and lab tests contained within the EMRs of the Mount Sinai Health System (MSHS) to predict chronological age. The model is trained on 377,686 EMRs from patients aged 18-85 years. The discrepancy between the predicted and real chronological age is then used as a proxy to estimate physiological age. Overall, the model can predict the chronological age of patients with a standard deviation error of ∼7 years. The ages of the youngest and oldest patients were predicted most accurately, while the ages of patients between 40 and 60 years were predicted least accurately. Patients with the largest discrepancy between their physiological and chronological age were further inspected. Patients predicted to be significantly older than their chronological age have higher systolic blood pressure, higher cholesterol, liver damage, and anemia. In contrast, patients predicted to be younger than their chronological age have lower blood pressure and shorter stature, among other indicators; both groups display lower weight than the population average. Using information from ∼10,000 patients from the entire cohort who were also profiled with SNP arrays, a genome-wide association study (GWAS) uncovered several novel genetic variants associated with aging. In particular, significant variants were mapped to genes known to be associated with inflammation, hypertension, lipid metabolism, height, and increased lifespan in mice. Several genes with missense mutations were identified as novel candidate aging genes.
In conclusion, we demonstrate how EMR data can be used to assess overall health via a scale based on deviation from the patient's predicted chronological age.
Affiliation(s)
- Zichen Wang
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Li Li
- Department of Genetics and Genomic Sciences, Institute of Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Benjamin S Glicksberg
- Department of Genetics and Genomic Sciences, Institute of Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Ariel Israel
- Department of Family Medicine, Clalit Health Services, Jerusalem 90258, Israel
- Joel T Dudley
- Department of Genetics and Genomic Sciences, Institute of Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Avi Ma'ayan
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
204
Chen J, Wei W, Guo C, Tang L, Sun L. Textual analysis and visualization of research trends in data mining for electronic health records. Health Policy Technol 2017. [DOI: 10.1016/j.hlpt.2017.10.003]
205
Mikalsen KØ, Soguero-Ruiz C, Jensen K, Hindberg K, Gran M, Revhaug A, Lindsetmo RO, Skrøvseth SO, Godtliebsen F, Jenssen R. Using anchors from free text in electronic health records to diagnose postoperative delirium. Comput Methods Programs Biomed 2017; 152:105-114. [PMID: 29054250 DOI: 10.1016/j.cmpb.2017.09.014]
Abstract
OBJECTIVES Postoperative delirium is a common complication after major surgery among the elderly. Despite its potentially serious consequences, the complication often goes undetected and undiagnosed. To provide diagnosis support, one could potentially exploit the information hidden in free-text documents from electronic health records using data-driven clinical decision support tools. However, these tools depend on labeled training data, which can be both time-consuming and expensive to create. METHODS The recent learning-with-anchors framework resolves this problem by transforming key observations (anchors) into labels. This is a promising framework, but it relies heavily on clinicians' knowledge to specify good anchor choices in order to perform well. In this paper we propose a novel method for specifying anchors from free-text documents, following an exploratory data analysis approach based on clustering and data visualization techniques. We investigate the use of the new framework as a way to detect postoperative delirium. RESULTS By applying the proposed method to medical data gathered from a Norwegian university hospital, we increase the area under the precision-recall curve from 0.51 to 0.96 compared to baselines. CONCLUSIONS The proposed approach can be used as a framework for clinical decision support for postoperative delirium.
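The headline metric here, area under the precision-recall curve, is commonly approximated by average precision: the mean of the precision values taken at the rank of each true positive. A self-contained sketch with toy labels and scores (not the paper's data):

```python
def average_precision(y_true, scores):
    """Approximate the area under the precision-recall curve as average
    precision: mean of precision@k at the rank k of each true positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, ap = 0, 0.0
    positives = sum(y_true)
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += hits / rank
    return ap / positives

# Toy example: two positives, one ranked first and one ranked third.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(y_true, scores))  # (1/1 + 2/3) / 2 = 0.8333...
```

A jump from 0.51 to 0.96 on this metric means the anchor-derived labels rank true delirium cases far higher than the baselines do.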
Affiliation(s)
- Karl Øyvind Mikalsen
- Department of Mathematics and Statistics, UiT The Arctic University of Norway, Tromsø, Norway; UiT Machine Learning Group, Norway
- Cristina Soguero-Ruiz
- UiT Machine Learning Group, Norway; Department of Signal Theory and Communications, Telematics and Computing, Universidad Rey Juan Carlos, Fuenlabrada, Spain
- Kasper Jensen
- Norwegian Centre for E-health Research, University Hospital of North Norway (UNN), Tromsø, Norway
- Kristian Hindberg
- Department of Mathematics and Statistics, UiT The Arctic University of Norway, Tromsø, Norway
- Mads Gran
- Department of Gastrointestinal Surgery, UNN, Tromsø, Norway
- Arthur Revhaug
- Department of Gastrointestinal Surgery, UNN, Tromsø, Norway; Clinic for Surgery, Cancer and Women's Health, UNN, Tromsø, Norway; Institute of Clinical Medicine, UiT, Tromsø, Norway
- Rolv-Ole Lindsetmo
- Department of Gastrointestinal Surgery, UNN, Tromsø, Norway; Institute of Clinical Medicine, UiT, Tromsø, Norway
- Stein Olav Skrøvseth
- Department of Mathematics and Statistics, UiT The Arctic University of Norway, Tromsø, Norway; Norwegian Centre for E-health Research, University Hospital of North Norway (UNN), Tromsø, Norway
- Fred Godtliebsen
- Department of Mathematics and Statistics, UiT The Arctic University of Norway, Tromsø, Norway
- Robert Jenssen
- Department of Physics and Technology, UiT, Tromsø, Norway; Norwegian Centre for E-health Research, University Hospital of North Norway (UNN), Tromsø, Norway; UiT Machine Learning Group, Norway
206
Esteban S, Rodríguez Tablado M, Peper FE, Mahumud YS, Ricci RI, Kopitowski KS, Terrasa SA. Development and validation of various phenotyping algorithms for Diabetes Mellitus using data from electronic health records. Comput Methods Programs Biomed 2017; 152:53-70. [PMID: 29054261 DOI: 10.1016/j.cmpb.2017.09.009]
Abstract
BACKGROUND AND OBJECTIVE Recent progress towards precision medicine has encouraged the use of electronic health records (EHRs) as a source for the large amounts of data required to study the effect of treatments or risk factors in more specific subpopulations. Phenotyping algorithms make it possible to automatically classify patients according to their particular electronic phenotype, thus facilitating the setup of retrospective cohorts. Our objective is to compare the performance of different classification strategies (using standardized problems only, rule-based algorithms, statistical learning algorithms (six learners), and stacked generalization (five versions)) for categorizing patients according to their diabetic status (diabetic, non-diabetic, and inconclusive; diabetes of any type) using information extracted from EHRs. METHODS Patient information was extracted from the EHR at Hospital Italiano de Buenos Aires, Buenos Aires, Argentina. For the derivation and validation datasets, two probabilistic samples of patients from different years (2005: n = 1663; 2015: n = 800) were extracted. The only inclusion criterion was age (≥40 and <80 years). Four researchers manually reviewed all records and classified patients according to their diabetic status (diabetic: diabetes registered as a health problem or fulfilling the ADA criteria; non-diabetic: not fulfilling the ADA criteria and having at least one fasting glycemia below 126 mg/dL; inconclusive: no data regarding their diabetic status or only one abnormal value). The best performing algorithms within each strategy were tested on the validation set. RESULTS The standardized codes algorithm achieved a Kappa coefficient of 0.59 (95% CI 0.49, 0.59) in the validation set. The Boolean logic algorithm reached 0.82 (95% CI 0.76, 0.88). A slightly higher value was achieved by the feedforward neural network (0.90, 95% CI 0.85, 0.94). The best performing learner was the stacked generalization meta-learner, which reached a Kappa coefficient of 0.95 (95% CI 0.91, 0.98). CONCLUSIONS The stacked generalization strategy and the feedforward neural network showed the best classification metrics in the validation set. Implementing these algorithms makes it possible to exploit the data of thousands of patients accurately.
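Stacked generalization, the best performer above, can be sketched minimally: base learners are fit first and a meta-learner is then fit on their outputs. Everything below (the synthetic data, the sigmoid base learners, the least-squares meta-step) is an illustrative assumption, not the paper's five-version setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic task: a latent signal s determines the label; each base learner
# sees only one noisy view of s, so neither alone is very accurate.
n = 2000
s = rng.normal(size=n)
y = (s > 0).astype(float)
view1 = s + rng.normal(size=n)
view2 = s + rng.normal(size=n)

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
score1, score2 = sigmoid(view1), sigmoid(view2)  # base learners' soft outputs
base1 = (score1 > 0.5).astype(float)             # hard predictions of base 1
base2 = (score2 > 0.5).astype(float)             # hard predictions of base 2

# Meta-learner (level one): least-squares fit on the base scores, thresholded.
Z = np.column_stack([np.ones(n), score1, score2])
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
stacked = ((Z @ w) > 0.5).astype(float)

acc = lambda pred: float((pred == y).mean())
print(acc(base1), acc(base2), acc(stacked))  # stacking should not underperform
```

Because each base learner carries independent noise, the meta-learner's combination recovers accuracy that neither view provides alone, which mirrors why the stacked meta-learner topped the Kappa table.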
Affiliation(s)
- Santiago Esteban
- Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina; Research Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Francisco E Peper
- Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Yamila S Mahumud
- Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Ricardo I Ricci
- Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Karin S Kopitowski
- Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina; Research Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Sergio A Terrasa
- Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina; Public Health Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
207
Escudié JB, Rance B, Malamut G, Khater S, Burgun A, Cellier C, Jannot AS. A novel data-driven workflow combining literature and electronic health records to estimate comorbidities burden for a specific disease: a case study on autoimmune comorbidities in patients with celiac disease. BMC Med Inform Decis Mak 2017; 17:140. [PMID: 28962565 PMCID: PMC5622531 DOI: 10.1186/s12911-017-0537-y]
Abstract
BACKGROUND Data collected in EHRs have been widely used to identify specific conditions; however, there is still a need for methods to define comorbidities and for sources to identify comorbidity burden. We propose an approach to assess the comorbidity burden of a specific disease using literature and EHR data sources, applied here to autoimmune diseases in celiac disease (CD). METHODS We generated a restricted set of comorbidities using the literature (via the MeSH® co-occurrence file) and extracted the 15 autoimmune diseases most frequently co-occurring with CD. We used mappings of the comorbidities to EHR terminologies: ICD-10 (billing codes), ATC (drugs), and UMLS (clinical reports). Finally, we extracted the concepts from the different data sources. We evaluated our approach using the correlation between prevalence estimates in our cohort and co-occurrence ranking in the literature. RESULTS We retrieved the comorbidities for 741 patients with CD; 18.1% of patients had at least one of the 15 studied autoimmune disorders. Overall, 79.3% of the mapped concepts were detected only in text, 5.3% only in ICD codes and/or drug prescriptions, and 15.4% could be found in both sources. Prevalence estimates in our cohort were correlated with the literature (Spearman's coefficient 0.789, p = 0.0005). The three most prevalent comorbidities were thyroiditis 12.6% (95% CI 10.1-14.9), type 1 diabetes 2.3% (95% CI 1.2-3.4), and dermatitis herpetiformis 2.0% (95% CI 1.0-3.0). CONCLUSION We introduced a process that leveraged the MeSH terminology to identify relevant autoimmune comorbidities of CD and several EHR data sources to phenotype a large population of CD patients. We achieved prevalence estimates comparable to the literature.
Affiliation(s)
- Jean-Baptiste Escudié
- Georges Pompidou European Hospital (HEGP), AP-HP, Paris, France
- INSERM UMRS 1138, Paris Descartes University, Paris, France
- Pôle Informatique Médicale et Santé Publique, Hôpital Européen Georges Pompidou, 20 rue Leblanc, 75015 Paris, France
| | - Bastien Rance
- Georges Pompidou European Hospital (HEGP), AP-HP, Paris, France
- INSERM UMRS 1138, Paris Descartes University, Paris, France
| | - Georgia Malamut
- Georges Pompidou European Hospital (HEGP), AP-HP, Paris, France
| | - Sherine Khater
- Georges Pompidou European Hospital (HEGP), AP-HP, Paris, France
| | - Anita Burgun
- Georges Pompidou European Hospital (HEGP), AP-HP, Paris, France
- INSERM UMRS 1138, Paris Descartes University, Paris, France
| | | | - Anne-Sophie Jannot
- Georges Pompidou European Hospital (HEGP), AP-HP, Paris, France
- INSERM UMRS 1138, Paris Descartes University, Paris, France
| |
Collapse
|
208
Gustafson E, Pacheco J, Wehbe F, Silverberg J, Thompson W. A Machine Learning Algorithm for Identifying Atopic Dermatitis in Adults from Electronic Health Records. IEEE Int Conf Healthc Inform 2017; 2017:83-90. [PMID: 29104964 DOI: 10.1109/ichi.2017.31]
Abstract
The current work aims to identify patients with atopic dermatitis for inclusion in genome-wide association studies (GWAS). Here we describe a machine learning-based phenotype algorithm. Using the electronic health record (EHR), we combined coded information with information extracted from encounter notes as features in a lasso logistic regression. Our algorithm achieves high positive predictive value (PPV) and sensitivity, improving on previous algorithms with low sensitivity. These results demonstrate the utility of natural language processing (NLP) and machine learning for EHR-based phenotyping.
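The two metrics this abstract optimizes, positive predictive value (PPV) and sensitivity, reduce to simple counts over the confusion matrix. A minimal sketch with made-up labels:

```python
def ppv_sensitivity(y_true, y_pred):
    """Positive predictive value and sensitivity from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    return ppv, sens

# 4 true cases; the algorithm flags 3 of them plus 1 false positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(ppv_sensitivity(y_true, y_pred))  # (0.75, 0.75)
```

For GWAS cohort selection, PPV matters most (mislabeled cases dilute the association signal), which is why the abstract stresses improving sensitivity without giving up PPV.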
Affiliation(s)
- Erin Gustafson
- Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60611
- Jennifer Pacheco
- Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60611
- Firas Wehbe
- Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60611
- Jonathan Silverberg
- Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60611
- William Thompson
- Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60611
209
Schlegel DR, Ficheur G. Secondary Use of Patient Data: Review of the Literature Published in 2016. Yearb Med Inform 2017; 26:68-71. [PMID: 29063536 DOI: 10.15265/iy-2017-032]
Abstract
Objectives: To summarize recent research and emerging trends in the area of secondary use of healthcare data, and to present the best papers published in this field, selected to appear in the 2017 edition of the IMIA Yearbook. Methods: A literature review of articles published in 2016 and related to secondary use of healthcare data was performed using two bibliographic databases. From this search, 941 papers were identified. The section editors independently reviewed the papers for relevance and impact, resulting in a consensus list of 14 candidate best papers. External reviewers examined each of the candidate best papers, and the final selection was made by the editorial board of the Yearbook. Results: From the 941 retrieved papers, the selection process resulted in four best papers. These papers discuss data quality concerns, issues in preserving the privacy of patients in shared datasets, and methods of decision support when consuming large amounts of raw electronic health record (EHR) data. Conclusion: In 2016, significant effort was put into the development of new systems that aim to avoid the need for extensive human interpretation and pre-processing of healthcare data, though this is still an emerging area of research. The value of temporal relationships between data received significant study, as did effective information sharing while preserving patient privacy.
210
Blecker S, Sontag D, Horwitz LI, Kuperman G, Park H, Reyentovich A, Katz SD. Early Identification of Patients With Acute Decompensated Heart Failure. J Card Fail 2017; 24:357-362. [PMID: 28887109 DOI: 10.1016/j.cardfail.2017.08.458]
Abstract
BACKGROUND Interventions to reduce readmissions after acute heart failure hospitalization require early identification of patients. The purpose of this study was to develop and test the accuracy of various approaches to identify patients with acute decompensated heart failure (ADHF) using data derived from the electronic health record. METHODS AND RESULTS We included 37,229 hospitalizations of adult patients at a single hospital during 2013-2015. We developed 4 algorithms to identify hospitalizations with a principal discharge diagnosis of ADHF: 1) presence of 1 of 3 clinical characteristics; 2) logistic regression of 31 structured data elements; 3) machine learning with unstructured data; and 4) machine learning using both structured and unstructured data. In data validation, algorithm 1 had a sensitivity of 0.98 and a positive predictive value (PPV) of 0.14 for ADHF. Algorithm 2 had an area under the receiver operating characteristic curve (AUC) of 0.96, and both machine learning algorithms had AUCs of 0.99. Based on a brief survey of 3 providers who perform chart review for ADHF, we estimated that providers spent 8.6 minutes per chart review; using this parameter, we estimated that providers would spend 61.4, 57.3, 28.7, and 25.3 minutes on secondary chart review for each case of ADHF if initial screening were done with algorithms 1, 2, 3, and 4, respectively. CONCLUSIONS Machine learning algorithms with unstructured notes had the best performance for identification of ADHF and can improve provider efficiency for delivery of quality improvement interventions.
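The review-burden figures quoted above are consistent with dividing the 8.6-minute chart-review time by each algorithm's positive predictive value, since 1/PPV charts must be reviewed per confirmed case. A quick check for algorithm 1, the only one whose PPV the abstract reports:

```python
# Charts reviewed per confirmed ADHF case = 1 / PPV, so the expected review
# time per case is minutes_per_chart / PPV.
def review_minutes_per_case(minutes_per_chart, ppv):
    return minutes_per_chart / ppv

# Algorithm 1: 8.6 minutes per chart and PPV 0.14, as reported.
print(round(review_minutes_per_case(8.6, 0.14), 1))  # 61.4, matching the abstract
```

The same formula back-solves implied PPVs of roughly 0.15, 0.30, and 0.34 for algorithms 2-4, though those values are inferred here, not reported.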
Affiliation(s)
- Saul Blecker
- Department of Population Health, New York University School of Medicine, New York, New York; Department of Medicine, New York University School of Medicine, New York, New York
- David Sontag
- Department of Computer Science, New York University, New York, New York
- Leora I Horwitz
- Department of Population Health, New York University School of Medicine, New York, New York; Department of Medicine, New York University School of Medicine, New York, New York
- Hannah Park
- Department of Population Health, New York University School of Medicine, New York, New York
- Alex Reyentovich
- Department of Medicine, New York University School of Medicine, New York, New York
- Stuart D Katz
- Department of Medicine, New York University School of Medicine, New York, New York
211
Chakrabarti S, Sen A, Huser V, Hruby GW, Rusanov A, Albers DJ, Weng C. An Interoperable Similarity-based Cohort Identification Method Using the OMOP Common Data Model version 5.0. J Healthc Inform Res 2017; 1:1-18. [PMID: 28776047 DOI: 10.1007/s41666-017-0005-6]
Abstract
Cohort identification for clinical studies tends to be laborious, time-consuming, and expensive. Developing automated or semi-automated methods for cohort identification is one of the "holy grails" of biomedical informatics. We propose a high-throughput similarity-based cohort identification algorithm that applies numerical abstractions to Electronic Health Record (EHR) data. We implement this algorithm using the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), which enables sites using this standardized EHR data representation to adopt the algorithm with minimal local implementation effort. We validate its performance on a retrospective cohort identification task for six clinical trials conducted at the Columbia University Medical Center. Our algorithm achieves an average Area Under the Curve (AUC) of 0.966 and an average Precision at 5 of 0.983. This interoperable method promises to enable efficient cohort identification in EHR databases. We discuss suitable applications of the method, note its limitations, and propose future work.
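The general shape of similarity-based cohort identification can be sketched as ranking candidates by similarity to known cohort members. The vectors, the centroid representation, and the use of cosine similarity below are illustrative assumptions, not the paper's exact numerical abstractions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Each patient is a vector of numerical abstractions (e.g., concept counts);
# candidates are ranked against the mean vector of known cohort members.
seed_members = [[5, 0, 2], [4, 1, 3]]
centroid = [sum(col) / len(seed_members) for col in zip(*seed_members)]

candidates = {"A": [5, 0, 3], "B": [0, 6, 0], "C": [3, 1, 2]}
ranked = sorted(candidates, key=lambda pid: cosine(candidates[pid], centroid),
                reverse=True)
print(ranked)  # most cohort-like candidates first
```

Metrics like Precision at 5 then simply ask how many of the top-5 ranked candidates truly belong to the cohort.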
Affiliation(s)
- Shreya Chakrabarti
- Department of Biomedical Informatics, Columbia University, New York, NY 10032
- Anando Sen
- Department of Biomedical Informatics, Columbia University, New York, NY 10032
- Vojtech Huser
- National Institutes of Health, National Library of Medicine, Bethesda, MD 20892
- Gregory W Hruby
- Department of Biomedical Informatics, Columbia University, New York, NY 10032
- Alexander Rusanov
- Department of Anesthesiology, Columbia University, New York, NY 10032
- David J Albers
- Department of Biomedical Informatics, Columbia University, New York, NY 10032
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032
212
El Naqa I, Kerns SL, Coates J, Luo Y, Speers C, West CML, Rosenstein BS, Ten Haken RK. Radiogenomics and radiotherapy response modeling. Phys Med Biol 2017; 62:R179-R206. [PMID: 28657906 PMCID: PMC5557376 DOI: 10.1088/1361-6560/aa7c55]
Abstract
Advances in patient-specific information and biotechnology have contributed to a new era of computational medicine. Radiogenomics has emerged as a new field that investigates the role of genetics in treatment response to radiation therapy. Radiation oncology is currently attempting to embrace these recent advances and add to its rich history by maintaining its prominent role as a quantitative leader in oncologic response modeling. Here, we provide an overview of radiogenomics starting with genotyping, data aggregation, and application of different modeling approaches based on modifying traditional radiobiological methods or application of advanced machine learning techniques. We highlight the current status and potential for this new field to reshape the landscape of outcome modeling in radiotherapy and drive future advances in computational oncology.
Affiliation(s)
- Issam El Naqa
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, United States of America
213
Abstract
OBJECTIVES To summarize significant developments in Clinical Research Informatics (CRI) over the past two years and discuss future directions. METHODS Survey of advances, open problems, and opportunities in this field, based on exploration of the current literature. RESULTS Recent advances are structured according to three use cases of clinical research: protocol feasibility, patient identification/recruitment, and clinical trial execution. DISCUSSION CRI is an evolving, dynamic field of research. Global collaboration, open metadata, content standards with semantics, and computable eligibility criteria are key success factors for future developments in CRI.
Affiliation(s)
- M Dugas
- Institute of Medical Informatics, University of Münster, Albert-Schweitzer-Campus 1, A11, D-48149 Münster, Germany
214
Clifton DA, Niehaus KE, Charlton P, Colopy GW. Health Informatics via Machine Learning for the Clinical Management of Patients. Yearb Med Inform 2015; 10:38-43. [PMID: 26293849 DOI: 10.15265/iy-2015-014]
Abstract
OBJECTIVES To review how health informatics systems based on machine learning methods have impacted the clinical management of patients by affecting clinical practice. METHODS We reviewed literature from 2010-2015 from databases such as PubMed, IEEE Xplore, and INSPEC, in which methods based on machine learning are likely to be reported. We bring together a broad body of literature, aiming to identify the leading examples of health informatics that have advanced the methodology of machine learning. While individual methods may have further examples that might be added, we have chosen some of the most representative, informative exemplars in each case. RESULTS Our survey highlights that, while much research is taking place in this high-profile field, examples that affect the clinical management of patients are seldom found. We show that substantial progress is being made in terms of methodology, often by data scientists working in close collaboration with clinical groups. CONCLUSIONS Health informatics systems based on machine learning are in their infancy, and the translation of such systems into clinical management has yet to be performed at scale.
Affiliation(s)
- David A Clifton
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
215
Wang M, Cyhaniuk A, Cooper DL, Iyer NN. Identification of people with acquired hemophilia in a large electronic health record database. J Blood Med 2017; 8:89-97. [PMID: 28769599 PMCID: PMC5529096 DOI: 10.2147/jbm.s136060]
Abstract
Background Electronic health records (EHRs) can provide insights into diagnoses, treatment patterns, and clinical outcomes. Acquired hemophilia (AH) is an ultrarare bleeding disorder characterized by factor VIII-inhibiting autoantibodies. Aim To identify patients with AH using an EHR database. Methods Records were accessed from a large EHR database (Humedica) between January 1, 2007 and July 31, 2013. Broad selection criteria were applied using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code for intrinsic circulating anticoagulants (286.5 and all subcodes) and confirmation of records 6 months before and 12 months after the first diagnosis. Additional selection criteria included mention of "bleeding" within physician notes identified via natural language processing output, together with a normal prothrombin time and a prolonged activated partial thromboplastin time. Results Of 6,348 patients with a diagnosis code of 286.5 or any of its subcodes, 16 males and 15 females met the selection criteria. The most commonly reported bleeding locations were gastrointestinal (23%), vaginal (16%), and endocrine (13%). A wide range of comorbidities was reported. Natural language processing identified chart-note mention of "hemophilia" in 3 patients (10%), "bruise" in 15 patients (48%), and "pain" in all 31 patients. No patients received a prescription for approved/recommended AH treatments. Four patient cases were reviewed to validate whether the identified cohort had AH; each patient had bleeding symptoms, a normal prothrombin time, and a prolonged activated partial thromboplastin time, although none received hemostatic treatments. Conclusion In ultrarare disorders, ICD-9-CM coding alone may be insufficient to identify patient cohorts; multimodal analysis combined with in-depth reviews of physician notes may be more effective.
Affiliation(s)
- Michael Wang
- Hemophilia and Thrombosis Center, University of Colorado School of Medicine, Aurora, CO
- David L Cooper
- Clinical Development, Medical and Regulatory Affairs, Novo Nordisk Inc., Plainsboro, NJ, USA
- Neeraj N Iyer
- Clinical Development, Medical and Regulatory Affairs, Novo Nordisk Inc., Plainsboro, NJ, USA
216
Clark C, Wellner B, Davis R, Aberdeen J, Hirschman L. Automatic classification of RDoC positive valence severity with a neural network. J Biomed Inform 2017; 75S:S120-S128. [PMID: 28694118 DOI: 10.1016/j.jbi.2017.07.005]
Abstract
OBJECTIVE Our objective was to develop a machine learning-based system to determine the severity of Positive Valence symptoms for a patient, based on information included in their initial psychiatric evaluation. Severity was rated by experts on an ordinal scale of 0-3 as follows: 0 (absent=no symptoms), 1 (mild=modest significance), 2 (moderate=requires treatment), and 3 (severe=causes substantial impairment). MATERIALS AND METHODS We treated the task of assigning Positive Valence severity as a text classification problem. During development, we experimented with regularized multinomial logistic regression classifiers, gradient boosted trees, and feedforward, fully connected neural networks. We found both regularization and feature selection via mutual information to be very important in preventing models from overfitting the data. Our best configuration was a neural network with three fully connected hidden layers and rectified linear unit activations. RESULTS Our best performing system achieved a score of 77.86%. The evaluation metric is an inverse normalization of the mean absolute error, presented as a percentage between 0 and 100, where 100 means the highest performance. Error analysis showed that 90% of the system errors involved neighboring severity categories. CONCLUSION Machine learning text classification techniques with feature selection can be trained to recognize broad differences in Positive Valence symptom severity with a modest amount of training data (in this case 600 documents, 167 of which were unannotated). An increase in the amount of annotated data can increase accuracy of symptom severity classification by several percentage points. Additional features and/or a larger training corpus may further improve accuracy.
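The evaluation metric mentioned above, an inverse normalization of mean absolute error onto a 0-100 scale, admits a simple reading. The sketch below assumes the score is 100·(1 − MAE/3) for the 0-3 ordinal scale; the exact shared-task formula may differ, so treat this as an interpretation rather than the published definition.

```python
def inverse_normalized_mae(y_true, y_pred, max_severity=3):
    """Map mean absolute error on a 0..max_severity ordinal scale to a
    0-100 score, where 100 means perfect prediction (MAE = 0) and 0
    means the worst possible error (MAE = max_severity)."""
    assert len(y_true) == len(y_pred) and len(y_true) > 0
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return 100.0 * (1.0 - mae / max_severity)
```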
217
Kagawa R, Kawazoe Y, Ida Y, Shinohara E, Tanaka K, Imai T, Ohe K. Development of Type 2 Diabetes Mellitus Phenotyping Framework Using Expert Knowledge and Machine Learning Approach. J Diabetes Sci Technol 2017; 11:791-799. [PMID: 27932531 PMCID: PMC5588819 DOI: 10.1177/1932296816681584]
Abstract
BACKGROUND Phenotyping is an automated technique that can be used to distinguish patients based on electronic health records. To improve the quality of medical care and advance type 2 diabetes mellitus (T2DM) research, the demand for T2DM phenotyping has been increasing. Some existing phenotyping algorithms are not sufficiently accurate for screening or identifying clinical research subjects. OBJECTIVE We propose a practical phenotyping framework using both expert knowledge and a machine learning approach to develop 2 phenotyping algorithms: one is for screening; the other is for identifying research subjects. METHODS We employ expert knowledge as rules to exclude obvious control patients and machine learning to increase accuracy for complicated patients. We developed phenotyping algorithms on the basis of our framework and performed binary classification to determine whether a patient has T2DM. To facilitate development of practical phenotyping algorithms, this study introduces new evaluation metrics: area under the precision-sensitivity curve (AUPS) with a high sensitivity and AUPS with a high positive predictive value. RESULTS The proposed phenotyping algorithms based on our framework show higher performance than baseline algorithms. Our proposed framework can be used to develop 2 types of phenotyping algorithms depending on the tuning approach: one for screening, the other for identifying research subjects. CONCLUSIONS We develop a novel phenotyping framework that can be easily implemented on the basis of proper evaluation metrics, which are in accordance with users' objectives. The phenotyping algorithms based on our framework are useful for extraction of T2DM patients in retrospective studies.
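The area under the precision-sensitivity curve (AUPS) introduced above is, in effect, the area under a precision-recall curve. A minimal trapezoidal sketch follows; it is a generic illustration, not the authors' implementation, and the tie-handling and anchoring choices are assumptions.

```python
def aups(scores, labels):
    """Trapezoidal area under the precision-sensitivity (recall) curve.

    Sweeps every distinct classifier score as a decision threshold; for
    tied sensitivity values the best precision is kept, and the curve is
    anchored at sensitivity 0 with its first precision value.
    """
    best = {}
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < thr and y == 1)
        if tp + fp == 0 or tp + fn == 0:
            continue  # precision or sensitivity undefined at this threshold
        sens, prec = tp / (tp + fn), tp / (tp + fp)
        best[sens] = max(best.get(sens, 0.0), prec)
    if not best:
        return 0.0
    pts = sorted(best.items())
    pts.insert(0, (0.0, pts[0][1]))  # anchor the curve at sensitivity 0
    return sum((s1 - s0) * (p0 + p1) / 2.0
               for (s0, p0), (s1, p1) in zip(pts, pts[1:]))
```

Tuning the operating region toward high sensitivity versus high precision is what yields the two algorithm variants (screening versus research-subject identification) the abstract describes.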
Affiliation(s)
- Rina Kagawa
- Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Yoshimasa Kawazoe
- Department of Healthcare Information Management, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Yusuke Ida
- Department of Healthcare Information Management, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Emiko Shinohara
- Department of Healthcare Information Management, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Katsuya Tanaka
- Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Takeshi Imai
- Center for Disease Biology and Integrative Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Kazuhiko Ohe
- Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Department of Healthcare Information Management, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
218
Al Sallakh MA, Vasileiou E, Rodgers SE, Lyons RA, Sheikh A, Davies GA. Defining asthma and assessing asthma outcomes using electronic health record data: a systematic scoping review. Eur Respir J 2017; 49:1700204. [DOI: 10.1183/13993003.00204-2017]
Abstract
There is currently no consensus on approaches to defining asthma or assessing asthma outcomes using electronic health record-derived data. We explored these approaches in the recent literature and examined the clarity of reporting. We systematically searched for asthma-related articles published between January 1, 2014 and December 31, 2015, extracted the algorithms used to identify asthma patients and assess severity, control and exacerbations, and examined how the validity of these outcomes was justified. From 113 eligible articles, we found significant heterogeneity in the algorithms used to define asthma (n=66 different algorithms), severity (n=18), control (n=9) and exacerbations (n=24). For the majority of algorithms (n=106), validity was not justified. In the remaining cases, approaches ranged from using algorithms validated in the same databases to using nonvalidated algorithms that were based on clinical judgement or clinical guidelines. The implementation of these algorithms was suboptimally described overall. Although electronic health record-derived data are now widely used to study asthma, the approaches being used are significantly varied and are often underdescribed, rendering it difficult to assess the validity of studies and compare their findings. Given the substantial growth in this body of literature, it is crucial that scientific consensus is reached on the underlying definitions and algorithms.
219
Jonnalagadda SR, Adupa AK, Garg RP, Corona-Cox J, Shah SJ. Text Mining of the Electronic Health Record: An Information Extraction Approach for Automated Identification and Subphenotyping of HFpEF Patients for Clinical Trials. J Cardiovasc Transl Res 2017; 10:313-321. [DOI: 10.1007/s12265-017-9752-2]
220
Williams R, Kontopantelis E, Buchan I, Peek N. Clinical code set engineering for reusing EHR data for research: A review. J Biomed Inform 2017; 70:1-13. [PMID: 28442434 DOI: 10.1016/j.jbi.2017.04.010]
Abstract
INTRODUCTION The construction of reliable, reusable clinical code sets is essential when re-using Electronic Health Record (EHR) data for research. Yet code set definitions are rarely transparent and their sharing is almost non-existent. There is a lack of methodological standards for the management (construction, sharing, revision and reuse) of clinical code sets which needs to be addressed to ensure the reliability and credibility of studies which use code sets. OBJECTIVE To review methodological literature on the management of sets of clinical codes used in research on clinical databases and to provide a list of best practice recommendations for future studies and software tools. METHODS We performed an exhaustive search for methodological papers about clinical code set engineering for re-using EHR data in research. This was supplemented with papers identified by snowball sampling. In addition, a list of e-phenotyping systems was constructed by merging references from several systematic reviews on this topic, and the processes adopted by those systems for code set management were reviewed. RESULTS Thirty methodological papers were reviewed. Common approaches included: creating an initial list of synonyms for the condition of interest (n=20); making use of the hierarchical nature of coding terminologies during searching (n=23); reviewing sets with clinician input (n=20); and reusing and updating an existing code set (n=20). Several open source software tools (n=3) were discovered. DISCUSSION There is a need for software tools that enable users to easily and quickly create, revise, extend, review and share code sets and we provide a list of recommendations for their design and implementation. CONCLUSION Research re-using EHR data could be improved through the further development, more widespread use and routine reporting of the methods by which clinical codes were selected.
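Two of the common approaches the review identifies, seeding a code set from a synonym list and then exploiting the terminology's hierarchy, can be sketched together. The terminology fragment below is invented for illustration; real code sets would be built against a full coding terminology and reviewed with clinician input.

```python
# Toy terminology: each code maps to (description, parent code or None).
# This fragment is invented for illustration only.
TERMINOLOGY = {
    "C10..": ("Diabetes mellitus", None),
    "C108.": ("Insulin dependent diabetes mellitus", "C10.."),
    "C109.": ("Non-insulin dependent diabetes mellitus", "C10.."),
    "C10E.": ("Type 1 diabetes", "C108."),
    "H33..": ("Asthma", None),
}

def match_synonyms(synonyms):
    """Seed step: codes whose description mentions any synonym."""
    return {code for code, (desc, _) in TERMINOLOGY.items()
            if any(s.lower() in desc.lower() for s in synonyms)}

def with_descendants(codes):
    """Hierarchy step: close the seed set under the child relation, so
    codes missed by the text search are still captured via their parents."""
    result = set(codes)
    changed = True
    while changed:
        changed = False
        for code, (_, parent) in TERMINOLOGY.items():
            if parent in result and code not in result:
                result.add(code)
                changed = True
    return result
```

The hierarchy step matters because child codes do not always repeat the parent's wording ("Type 1 diabetes" above never mentions "mellitus"), which is exactly why purely lexical seeding under-recalls.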
Affiliation(s)
- Richard Williams
- MRC Health eResearch Centre, University of Manchester, Manchester, UK; NIHR Greater Manchester Primary Care Patient Safety Translational Research Centre, University of Manchester, Manchester, UK
- Evangelos Kontopantelis
- MRC Health eResearch Centre, University of Manchester, Manchester, UK; NIHR School for Primary Care Research, University of Manchester, Manchester, UK
- Iain Buchan
- MRC Health eResearch Centre, University of Manchester, Manchester, UK; NIHR Greater Manchester Primary Care Patient Safety Translational Research Centre, University of Manchester, Manchester, UK; NIHR Manchester Biomedical Research Centre, University of Manchester, Manchester, UK
- Niels Peek
- MRC Health eResearch Centre, University of Manchester, Manchester, UK; NIHR Greater Manchester Primary Care Patient Safety Translational Research Centre, University of Manchester, Manchester, UK
221
Vaduganathan M, Patel RB, Butler J, Metra M. Integrating electronic health records into the study of heart failure: promises and pitfalls. Eur J Heart Fail 2017; 19:1128-1130. [PMID: 28544192 DOI: 10.1002/ejhf.878]
Affiliation(s)
- Muthiah Vaduganathan
- Brigham and Women's Hospital Heart and Vascular Center and Harvard Medical School, Boston, MA, USA
- Ravi B Patel
- Division of Cardiology, Bluhm Cardiovascular Institute at Northwestern Memorial Hospital, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Javed Butler
- Division of Cardiology, Department of Medicine, Stony Brook University, Stony Brook, NY, USA
- Marco Metra
- Division of Cardiology, Department of Experimental and Applied Medicine, University of Brescia, Brescia, Italy
222
Cox ZL, Lai P, Lewis CM, Lenihan DJ. Centers for Medicare and Medicaid Services' readmission reports inaccurately describe an institution's decompensated heart failure admissions. Clin Cardiol 2017; 40:620-625. [PMID: 28471510 DOI: 10.1002/clc.22711]
Abstract
Hospitals typically use the Centers for Medicare and Medicaid Services' (CMS) Hospital Readmission Reduction Program (HRRP) administrative reports as the standard quantification of heart failure (HF) admissions. We aimed to evaluate the HF admission population identified by the CMS HRRP definition of HF hospital admissions compared with a clinically based HF definition. We evaluated all hospital admissions at an academic medical center over 16 months in patients with Medicare fee-for-service benefits and age ≥65 years. We compared the CMS HRRP HF definition against an electronic HF identification algorithm. Admissions identified solely by the CMS HF definition were manually reviewed by HF providers. Admissions confirmed as having decompensated HF as the primary problem by manual review or by the HF ID algorithm were deemed "HF positive," whereas those refuted were "HF negative." Of the 1672 all-cause admissions evaluated, 708 (42%) were HF positive. The CMS HF definition identified 440 admissions: sensitivity (54%), specificity (94%), positive predictive value (87%), negative predictive value (74%). The CMS HF definition missed 324 HF admissions because of inclusion/exclusion criteria (15%) and decompensated HF being a secondary diagnosis (85%). The CMS HF definition falsely identified 56 admissions as HF. The most common admission reasons in this cohort included elective pacemaker or defibrillator implantations (n = 13), noncardiac dyspnea (n = 9), left ventricular assist device complications (n = 8), and acute coronary syndrome (n = 6). The CMS HRRP HF report is a poor representation of an institution's HF admissions because of limitations in administrative coding and the HRRP HF report inclusion/exclusion criteria.
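The performance figures above follow from the standard confusion-matrix definitions. Reconstructing the counts implied by the abstract (440 flagged admissions of which 56 were false positives, against 708 true HF admissions among 1672 total) reproduces the reported percentages:

```python
def classifier_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Counts implied by the abstract: 440 CMS-flagged admissions with 56
# false positives (tp = 440 - 56 = 384), 708 HF-positive admissions
# (fn = 708 - 384 = 324), and 1672 total (tn = 1672 - 384 - 56 - 324 = 908).
cms_hf = classifier_metrics(tp=384, fp=56, fn=324, tn=908)
```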
Affiliation(s)
- Zachary L Cox
- Department of Pharmacy Practice, Lipscomb University College of Pharmacy, Nashville, Tennessee; Department of Pharmacy, Vanderbilt University Medical Center, Nashville, Tennessee
- Pikki Lai
- Division of Cardiology, Vanderbilt University Medical Center, Nashville, Tennessee
- Connie M Lewis
- Division of Cardiology, Vanderbilt University Medical Center, Nashville, Tennessee
- Daniel J Lenihan
- Division of Cardiology, Vanderbilt University Medical Center, Nashville, Tennessee
223
Development and Prospective Validation of Tools to Accurately Identify Neurosurgical and Critical Care Events in Children With Traumatic Brain Injury. Pediatr Crit Care Med 2017; 18:442-451. [PMID: 28252524 PMCID: PMC5419849 DOI: 10.1097/pcc.0000000000001120]
Abstract
OBJECTIVE To develop and validate case definitions (computable phenotypes) to accurately identify neurosurgical and critical care events in children with traumatic brain injury. DESIGN Prospective observational cohort study, May 2013 to September 2015. SETTING Two large U.S. children's hospitals with level 1 Pediatric Trauma Centers. PATIENTS One hundred seventy-four children less than 18 years old admitted to an ICU after traumatic brain injury. MEASUREMENTS AND MAIN RESULTS Prospective data were linked to database codes for each patient. The outcomes were prospectively identified acute traumatic brain injury, intracranial pressure monitor placement, craniotomy or craniectomy, vascular catheter placement, invasive mechanical ventilation, and new gastrostomy tube or tracheostomy placement. Candidate predictors were database codes present in administrative, billing, or trauma registry data. For each clinical event, we developed and validated penalized regression and Boolean classifiers (models to identify clinical events that take database codes as predictors). We externally validated the best model for each clinical event. The primary model performance measure was accuracy, the percent of test patients correctly classified. The cohort included 174 children who required ICU admission after traumatic brain injury. Simple Boolean classifiers were greater than or equal to 94% accurate for seven of nine clinical diagnoses and events. For central venous catheter placement, no classifier achieved 90% accuracy. Classifier accuracy was dependent on available data fields. Five of nine classifiers were acceptably accurate using only administrative data but three required trauma registry fields and two required billing data. CONCLUSIONS In children with traumatic brain injury, computable phenotypes based on simple Boolean classifiers were highly accurate for most neurosurgical and critical care diagnoses and events. The computable phenotypes we developed and validated can be used in any observational study of children with traumatic brain injury and can reasonably be applied in studies of these interventions in other patient populations.
224
Upadhyaya SG, Murphree DH, Ngufor CG, Knight AM, Cronk DJ, Cima RR, Curry TB, Pathak J, Carter RE, Kor DJ. Automated Diabetes Case Identification Using Electronic Health Record Data at a Tertiary Care Facility. Mayo Clin Proc Innov Qual Outcomes 2017; 1:100-110. [PMID: 30225406 PMCID: PMC6135013 DOI: 10.1016/j.mayocpiqo.2017.04.005]
Abstract
Objective To develop and validate a phenotyping algorithm for the identification of patients with type 1 and type 2 diabetes mellitus (DM) preoperatively using routinely available clinical data from electronic health records. Patients and Methods We used first-order logic rules (if-then-else rules) to imply the presence or absence of DM types 1 and 2. The “if” clause of each rule is a conjunction of logical AND/OR predicates that provides evidence for or against the presence of DM. The rules include International Classification of Diseases, Ninth Revision, Clinical Modification diagnostic codes, outpatient prescription information, laboratory values, and positive annotation of DM in patients’ clinical notes. This study was conducted from March 2, 2015, through February 10, 2016. The performance of our rule-based approach and similar approaches proposed by other institutions was evaluated against a reference standard created by an expert reviewer and implemented for routine clinical care at an academic medical center. Results A total of 4208 surgical patients (mean age, 52 years; males, 48%) were analyzed to develop the phenotyping algorithm. Expert review identified 685 patients (16.28% of the full cohort) as having DM. Our proposed method identified 684 patients (16.25%) as having DM. The algorithm performed well (99.70% sensitivity, 99.97% specificity) and compared favorably with previous approaches. Conclusion Among patients undergoing surgery, determination of DM can be made with high accuracy using simple, computationally efficient rules. Knowledge of patients’ DM status before surgery may alter physicians’ care plans and reduce postsurgical complications. Nevertheless, future efforts are necessary to determine the effect of first-order logic rules on clinical processes and patient outcomes.
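A first-order if-then-else rule of the kind described, conjoining diagnosis codes, prescriptions, laboratory values, and note annotations, can be sketched as follows. The specific predicates, the two-source evidence threshold, and the HbA1c cutoff are illustrative assumptions, not the published rule set.

```python
# Illustrative DM inference rule; thresholds and predicates are assumptions.
HBA1C_DM_THRESHOLD = 6.5  # %, a common diagnostic cutoff (assumed here)

def evidence_for_dm(patient: dict) -> bool:
    """If two or more independent evidence sources agree, infer DM;
    if exactly one source fires, require a confirmatory HbA1c value."""
    code = any(c.startswith("250") for c in patient["icd9_codes"])  # ICD-9-CM DM codes
    rx = bool(patient["antidiabetic_rx"])                           # outpatient prescriptions
    note = patient["note_mentions_dm"]                              # positive note annotation
    lab = (patient["hba1c"] is not None
           and patient["hba1c"] >= HBA1C_DM_THRESHOLD)              # laboratory evidence
    n_sources = sum([code, rx, note])
    if n_sources >= 2:
        return True
    if n_sources == 1:
        return lab
    return False
```

Rules of this shape are cheap to evaluate per patient, which is why the abstract can claim computational efficiency alongside high sensitivity and specificity.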
Key Words
- CCW, Chronic Condition Data Warehouse
- DDC, Durham Diabetes Coalition
- DM, diabetes mellitus
- EHR, electronic health record
- HbA1c of NYC, Hemoglobin A1c of New York City
- HbA1c, hemoglobin A1c
- ICD-9-CM, International Classification of Diseases, Ninth Revision, Clinical Modification
- MICS, Mayo Integrated Clinical Systems
- NLP, natural language processing
- SUPREME-DM, Surveillance, Prevention, and Management of Diabetes Mellitus
- T1DM, type 1 diabetes mellitus
- T2DM, type 2 diabetes mellitus
- eMERGE, Electronic Medical Records and Genomics
Affiliation(s)
- Che G Ngufor
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Alison M Knight
- Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Rochester, MN
- Daniel J Cronk
- Department of Information Technology, Mayo Clinic, Rochester, MN
- Robert R Cima
- Division of Colon and Rectal Surgery, Mayo Clinic, Rochester, MN; Robert D. and Patricia E. Kern Center for Science of Health Care Delivery, Mayo Clinic, Rochester, MN
- Timothy B Curry
- Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Rochester, MN; Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN
- Rickey E Carter
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Daryl J Kor
- Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Rochester, MN
225
Marshall EA, Oates JC, Shoaibi A, Obeid JS, Habrat ML, Warren RW, Brady KT, Lenert LA. A population-based approach for implementing change from opt-out to opt-in research permissions. PLoS One 2017; 12:e0168223. [PMID: 28441388 PMCID: PMC5404843 DOI: 10.1371/journal.pone.0168223]
Abstract
Due to recently proposed changes in the Common Rule regarding the collection of research preferences, there is an increased need for efficient methods to document opt-in research preferences at a population level. Previously, our institution developed an opt-out paper-based workflow that could not be utilized for research in a scalable fashion. This project was designed to demonstrate the feasibility of implementing an electronic health record (EHR)-based active opt-in research preferences program. The first phase of implementation required creating and disseminating a patient questionnaire through the EHR portal to populate discrete fields within the EHR indicating patients' preferences for future research study contact (contact) and their willingness to allow anonymised use of excess tissue and fluid specimens (biobank). In the second phase, the questionnaire was presented within a clinic nurse intake workflow in an obstetrical clinic. These permissions were tabulated in registries for use by investigators for feasibility studies and recruitment. The registry was also used for research patient contact management using a new EHR encounter type to differentiate research from clinical encounters. The research permissions questionnaire was sent to 59,670 patients via the EHR portal. Within four months, 21,814 responses (75% willing to participate in biobanking, and 72% willing to be contacted for future research) were received. Each response was recorded within a patient portal encounter to enable longitudinal analysis of responses. We obtained a significantly lower positive response rate from the 264 females who completed the questionnaire in the obstetrical clinic (55% volunteering for the biobank and 52% for contact). We demonstrate that it is possible to establish a research permissions registry using the EHR portal and clinic-based workflows. This patient-centric, population-based, opt-in approach documents preferences in the EHR, allowing linkage of these preferences to health record information.
Affiliation(s)
- Elizabeth A. Marshall
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, United States of America
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, United States of America
- Jim C. Oates
- Department of Medicine, Division of Rheumatology and Immunology, Medical University of South Carolina, Charleston, South Carolina, United States of America
- Medical Service, Rheumatology Section, Ralph H. Johnson VA Medical Center, Charleston, South Carolina, United States of America
- Azza Shoaibi
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, United States of America
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, United States of America
- Jihad S. Obeid
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, United States of America
- Melissa L. Habrat
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, United States of America
- Robert W. Warren
- Department of Pediatrics, Division of Pediatric Rheumatology and Immunology, Medical University of South Carolina, Charleston, South Carolina, United States of America
- Kathleen T. Brady
- Department of Psychiatry and Behavioral Sciences, Medical University of South Carolina, Charleston, South Carolina, United States of America
- Leslie A. Lenert
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, United States of America
- Department of Medicine, Division of General Internal Medicine, Medical University of South Carolina, Charleston, South Carolina, United States of America
226
EHR-based phenotyping: Bulk learning and evaluation. J Biomed Inform 2017; 70:35-51. [PMID: 28410982 DOI: 10.1016/j.jbi.2017.04.009]
Abstract
In data-driven phenotyping, a core computational task is to identify medical concepts and their variations from sources of electronic health records (EHR) to stratify phenotypic cohorts. A conventional analytic framework for phenotyping largely uses a manual knowledge engineering approach or a supervised learning approach where clinical cases are represented by variables encompassing diagnoses, medicinal treatments and laboratory tests, among others. In such a framework, tasks associated with feature engineering and data annotation remain a tedious and expensive exercise, resulting in poor scalability. In addition, certain clinical conditions, such as those that are rare and acute in nature, may never accumulate sufficient data over time, which poses a challenge to establishing accurate and informative statistical models. In this paper, we use infectious diseases as the domain of study to demonstrate a hierarchical learning method based on ensemble learning that attempts to address these issues through feature abstraction. We use a sparse annotation set to train and evaluate many phenotypes at once, which we call bulk learning. In this batch-phenotyping framework, disease cohort definitions can be learned from within the abstract feature space established by using multiple diseases as a substrate and diagnostic codes as surrogates. In particular, using surrogate labels for model training renders possible its subsequent evaluation using only a sparse annotated sample. Moreover, statistical models can be trained and evaluated, using the same sparse annotation, from within the abstract feature space of low dimensionality that encapsulates the shared clinical traits of these target diseases, collectively referred to as the bulk learning set.
227
Shivade C, Hebert C, Regan K, Fosler-Lussier E, Lai AM. Automatic data source identification for clinical trial eligibility criteria resolution. AMIA Annu Symp Proc 2017; 2016:1149-1158. [PMID: 28269912 PMCID: PMC5333255]
Abstract
Clinical trial coordinators refer to both structured and unstructured sources of data when evaluating a subject for eligibility. While some eligibility criteria can be resolved using structured data, some require manual review of clinical notes. An important step in automating the trial screening process is to be able to identify the right data source for resolving each criterion. In this work, we discuss the creation of an eligibility criteria dataset for clinical trials for patients with two disparate diseases, annotated with the preferred data source for each criterion (i.e., structured or unstructured) by annotators with medical training. The dataset includes 50 heart-failure trials with a total of 766 eligibility criteria and 50 trials for chronic lymphocytic leukemia (CLL) with 677 criteria. Further, we developed machine learning models to predict the preferred data source: kernel methods outperform simpler learning models when used with a combination of lexical, syntactic, semantic, and surface features. Evaluation of these models indicates that the performance is consistent across data from both diagnoses, indicating generalizability of our method. Our findings are an important step towards ongoing efforts for automation of clinical trial screening.
Affiliation(s)
- Courtney Hebert
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH
- Kelly Regan
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH
- Albert M Lai
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH; National Institutes of Health, Rehabilitation Medicine Department, Mark O. Hatfield Clinical Research Center, Bethesda, MD
228
Duan R, Cao M, Wu Y, Huang J, Denny JC, Xu H, Chen Y. An Empirical Study for Impacts of Measurement Errors on EHR based Association Studies. AMIA Annu Symp Proc 2017; 2016:1764-1773. [PMID: 28269935 PMCID: PMC5333313]
Abstract
Over the last decade, Electronic Health Records (EHR) systems have been increasingly implemented at US hospitals. Despite their great potential, the complex and uneven nature of clinical documentation and data quality brings additional challenges for analyzing EHR data. A critical challenge is information bias due to measurement errors in the outcome and covariates. We conducted empirical studies to quantify the impacts of information bias on association studies. Specifically, we designed our simulation studies based on the characteristics of the Electronic Medical Records and Genomics (eMERGE) Network. Through simulation studies, we quantified the loss of power due to misclassifications in case ascertainment and measurement errors in covariate status extraction, with respect to different levels of misclassification rates, disease prevalence, and covariate frequencies. These empirical findings can inform investigators for better understanding of the potential power loss due to misclassification and measurement errors under a variety of conditions in EHR based association studies.
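The bias this study quantifies empirically can also be shown analytically for a 2x2 design: with nondifferential outcome misclassification (fixed sensitivity and specificity of case ascertainment), the expected observed odds ratio shrinks toward 1, which in turn reduces power. The sketch below is a generic illustration of that attenuation, not the authors' simulation design.

```python
def attenuated_odds_ratio(p_out_unexp, true_or, sens, spec):
    """Expected observed odds ratio when a binary outcome is recorded
    with the given sensitivity and specificity, nondifferentially with
    respect to a binary exposure.

    p_out_unexp: true outcome probability in the unexposed group.
    true_or: true exposure-outcome odds ratio.
    """
    # True outcome probability among the exposed, from the odds ratio.
    odds_unexp = p_out_unexp / (1 - p_out_unexp)
    odds_exp = true_or * odds_unexp
    p_out_exp = odds_exp / (1 + odds_exp)

    def observed(p):
        # P(recorded outcome = 1) given true prevalence p.
        return sens * p + (1 - spec) * (1 - p)

    q_exp, q_unexp = observed(p_out_exp), observed(p_out_unexp)
    return (q_exp / (1 - q_exp)) / (q_unexp / (1 - q_unexp))
```

With perfect ascertainment the true odds ratio is recovered; with, say, 80% sensitivity and 95% specificity the observed odds ratio falls strictly between 1 and the true value, illustrating the power loss the abstract describes.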
Affiliation(s)
- Rui Duan: Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, USA
- Ming Cao: School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Yonghui Wu: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Jing Huang: School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Joshua C Denny: Department of Medicine and Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Hua Xu: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Yong Chen: Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, USA
229
Goodwin TR, Harabagiu SM. Multi-modal Patient Cohort Identification from EEG Report and Signal Data. AMIA Annu Symp Proc 2017; 2016:1794-1803. [PMID: 28269938] [PMCID: PMC5333290]
Abstract
Clinical electroencephalography (EEG) is the most important investigation in the diagnosis and management of epilepsies. An EEG records the electrical activity along the scalp and measures spontaneous electrical activity of the brain. Because the EEG signal is complex, its interpretation is known to produce moderate inter-observer agreement among neurologists. This problem can be addressed by providing clinical experts with the ability to automatically retrieve similar EEG signals and EEG reports through a patient cohort retrieval system operating on a vast archive of EEG data. In this paper, we present a multi-modal EEG patient cohort retrieval system called MERCuRY which leverages the heterogeneous nature of EEG data by processing both the clinical narratives from EEG reports as well as the raw electrode potentials derived from the recorded EEG signal data. At the core of MERCuRY is a novel multimodal clinical indexing scheme which relies on EEG data representations obtained through deep learning. The index is used by two clinical relevance models that we have generated for identifying patient cohorts satisfying the inclusion and exclusion criteria expressed in natural language queries. Evaluations of the MERCuRY system measured the relevance of the patient cohorts, obtaining a MAP score of 69.87% and an NDCG of 83.21%.
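The idea of blending report-text and signal similarity for cohort ranking can be sketched abstractly. The toy ranker below assumes precomputed embedding vectors under invented keys (`text`, `signal`) and a fixed mixing weight `alpha`; MERCuRY's actual relevance models are learned, so this only illustrates the multimodal scoring idea:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    norm = lambda x: math.sqrt(sum(a * a for a in x))
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def rank_cohort(query, records, alpha=0.6):
    """Rank patient records by a fixed blend of report-text and
    EEG-signal embedding similarity (alpha is an invented weight)."""
    score = lambda r: (alpha * cosine(query["text"], r["text"])
                       + (1 - alpha) * cosine(query["signal"], r["signal"]))
    return sorted(records, key=score, reverse=True)

query = {"text": [1.0, 0.0], "signal": [0.0, 1.0]}
records = [
    {"id": "p1", "text": [0.9, 0.1], "signal": [0.1, 0.9]},  # similar on both modalities
    {"id": "p2", "text": [0.1, 0.9], "signal": [0.9, 0.1]},  # dissimilar on both
]
ranked = rank_cohort(query, records)
```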
230
Kuo TT, Rao P, Maehara C, Doan S, Chaparro JD, Day ME, Farcas C, Ohno-Machado L, Hsu CN. Ensembles of NLP Tools for Data Element Extraction from Clinical Notes. AMIA Annu Symp Proc 2017; 2016:1880-1889. [PMID: 28269947] [PMCID: PMC5333200]
Abstract
Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort.
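One common way to combine concept extractors into an ensemble is simple voting. A minimal sketch follows; the tool names and concepts are invented, and the paper evaluates seven ensemble methods, of which voting is only the simplest:

```python
from collections import Counter

def vote_ensemble(tool_outputs, min_votes=2):
    """Keep a concept if at least `min_votes` of the NLP tools
    extracted it from the note."""
    votes = Counter()
    for concepts in tool_outputs.values():
        votes.update(set(concepts))   # one vote per tool per concept
    return {concept for concept, n in votes.items() if n >= min_votes}

consensus = vote_ensemble({
    "tool_a": ["diabetes", "hypertension", "metformin"],
    "tool_b": ["diabetes", "metformin"],
    "tool_c": ["diabetes", "asthma"],
})
```

Here only concepts extracted by at least two of the three tools survive, trading a little recall for precision.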
Affiliation(s)
- Son Doan: University of California San Diego, La Jolla, CA
- Chun-Nan Hsu: University of California San Diego, La Jolla, CA
231
Jackson RG, Patel R, Jayatilleke N, Kolliakou A, Ball M, Gorrell G, Roberts A, Dobson RJ, Stewart R. Natural language processing to extract symptoms of severe mental illness from clinical text: the Clinical Record Interactive Search Comprehensive Data Extraction (CRIS-CODE) project. BMJ Open 2017; 7:e012012. [PMID: 28096249] [PMCID: PMC5253558] [DOI: 10.1136/bmjopen-2016-012012]
Abstract
OBJECTIVES We sought to use natural language processing to develop a suite of language models to capture key symptoms of severe mental illness (SMI) from clinical text, to facilitate the secondary use of mental healthcare data in research. DESIGN Development and validation of information extraction applications for ascertaining symptoms of SMI in routine mental health records using the Clinical Record Interactive Search (CRIS) data resource; description of their distribution in a corpus of discharge summaries. SETTING Electronic records from a large mental healthcare provider serving a geographic catchment of 1.2 million residents in four boroughs of south London, UK. PARTICIPANTS The distribution of derived symptoms was described in 23 128 discharge summaries from 7962 patients who had received an SMI diagnosis, and 13 496 discharge summaries from 7575 patients who had received a non-SMI diagnosis. OUTCOME MEASURES Fifty SMI symptoms were identified by a team of psychiatrists for extraction based on salience and linguistic consistency in records, broadly categorised under positive, negative, disorganisation, manic and catatonic subgroups. Text models for each symptom were generated using the TextHunter tool and the CRIS database. RESULTS We extracted data for 46 symptoms with a median F1 score of 0.88. Four symptom models performed poorly and were excluded. From the corpus of discharge summaries, it was possible to extract symptomatology in 87% of patients with SMI and 60% of patients with non-SMI diagnosis. CONCLUSIONS This work demonstrates the possibility of automatically extracting a broad range of SMI symptoms from English text discharge summaries for patients with an SMI diagnosis. Descriptive data also indicated that most symptoms cut across diagnoses, rather than being restricted to particular groups.
Affiliation(s)
- Richard G Jackson: Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Rashmi Patel: Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Nishamali Jayatilleke: Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Anna Kolliakou: Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Michael Ball: Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Genevieve Gorrell: Department of Computer Science, University of Sheffield, Sheffield, UK
- Angus Roberts: Department of Computer Science, University of Sheffield, Sheffield, UK
- Richard J Dobson: Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Robert Stewart: Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
232
Claveau V, Silva Oliveira LE, Bouzillé G, Cuggia M, Cabral Moro CM, Grabar N. Numerical Eligibility Criteria in Clinical Protocols: Annotation, Automatic Detection and Interpretation. Artif Intell Med 2017. [DOI: 10.1007/978-3-319-59758-4_22]
233
Cox ZL, Lewis CM, Lai P, Lenihan DJ. Validation of an automated electronic algorithm and "dashboard" to identify and characterize decompensated heart failure admissions across a medical center. Am Heart J 2017; 183:40-48. [PMID: 27979040] [DOI: 10.1016/j.ahj.2016.10.001]
Abstract
BACKGROUND We aim to validate the diagnostic performance of the first fully automatic, electronic heart failure (HF) identification algorithm and evaluate the implementation of an HF Dashboard system with 2 components: real-time identification of decompensated HF admissions and accurate characterization of disease characteristics and medical therapy. METHODS We constructed an HF identification algorithm requiring 3 of 4 identifiers: B-type natriuretic peptide >400 pg/mL; admitting HF diagnosis; history of HF International Classification of Disease, Ninth Revision, diagnosis codes; and intravenous diuretic administration. We validated the diagnostic accuracy of the components individually (n = 366) and combined in the HF algorithm (n = 150) compared with a blinded provider panel in 2 separate cohorts. We built an HF Dashboard within the electronic medical record characterizing the disease and medical therapies of HF admissions identified by the HF algorithm. We evaluated the HF Dashboard's performance over 26 months of clinical use. RESULTS Individually, the algorithm components displayed variable sensitivity and specificity, respectively: B-type natriuretic peptide >400 pg/mL (89% and 87%); diuretic (80% and 92%); and International Classification of Disease, Ninth Revision, code (56% and 95%). The HF algorithm achieved a high specificity (95%), positive predictive value (82%), and negative predictive value (85%) but achieved limited sensitivity (56%) secondary to missing provider-generated identification data. The HF Dashboard identified and characterized 3147 HF admissions over 26 months. CONCLUSIONS Automated identification and characterization systems can be developed and used with a substantial degree of specificity for the diagnosis of decompensated HF, although sensitivity is limited by clinical data input.
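The "3 of 4 identifiers" rule described above fits in a few lines. This is a sketch with invented field names; the deployed algorithm ran against live EHR data, not Python dictionaries:

```python
def identifier_flags(admission):
    """The four identifiers from the HF algorithm; dictionary keys
    are illustrative, not the medical center's actual schema."""
    return [
        admission.get("bnp_pg_ml", 0) > 400,            # BNP > 400 pg/mL
        admission.get("admitting_hf_diagnosis", False), # admitting HF diagnosis
        admission.get("hf_icd9_history", False),        # history of HF ICD-9 codes
        admission.get("iv_diuretic_given", False),      # IV diuretic administration
    ]

def is_decompensated_hf(admission, required=3):
    """Flag an admission when at least 3 of the 4 identifiers are present."""
    return sum(identifier_flags(admission)) >= required

case = {"bnp_pg_ml": 850, "admitting_hf_diagnosis": True, "iv_diuretic_given": True}
```

Requiring 3 of 4 signals rather than all 4 is what lets the algorithm tolerate one missing provider-generated data element per admission, the same data gap the authors cite as the main cause of lost sensitivity.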
Affiliation(s)
- Zachary L Cox: Department of Pharmacy Practice, Lipscomb University College of Pharmacy, Nashville, TN; Department of Pharmacy, Vanderbilt University Medical Center, Nashville, TN
- Connie M Lewis: Division of Cardiology, Vanderbilt University Medical Center, Nashville, TN
- Pikki Lai: Division of Cardiology, Vanderbilt University Medical Center, Nashville, TN
- Daniel J Lenihan: Division of Cardiology, Vanderbilt University Medical Center, Nashville, TN
234
Daniel C, Ouagne D, Sadou E, Paris N, Hussain S, Jaulent M, Kalra D. Cross border semantic interoperability for learning health systems: The EHR4CR semantic resources and services. Learn Health Syst 2017; 1:e10014. [PMID: 31245551] [PMCID: PMC6516724] [DOI: 10.1002/lrh2.10014]
Abstract
With the development of platforms enabling the integration and use of phenome, genome, and exposome data in the context of international research, data management challenges are increasing, and scalable solutions for cross border and cross domain semantic interoperability need to be developed. Reusing routinely collected clinical data, especially, requires computable portable phenotype algorithms running across different electronic health record (EHR) products and healthcare systems. We propose a framework for describing and comparing mediation platforms enabling cross border phenotype identification within federated EHRs. This framework was used to describe the experience gained during the EHR4CR project and the evaluation of the platform developed for accessing semantically equivalent data elements across 11 European participating EHR systems from 5 countries. Developers of semantic interoperability platforms are beginning to address a core set of requirements in order to reach the goal of developing cross border semantic integration of data.
Affiliation(s)
- Christel Daniel: Sorbonne Universités, UPMC Univ Paris 06, INSERM UMR_S 1142, LIMICS, F-75006 Paris, France; AP-HP, Paris, France
- David Ouagne: Sorbonne Universités, UPMC Univ Paris 06, INSERM UMR_S 1142, LIMICS, F-75006 Paris, France
- Eric Sadou: Sorbonne Universités, UPMC Univ Paris 06, INSERM UMR_S 1142, LIMICS, F-75006 Paris, France; AP-HP, Paris, France
- Sajjad Hussain: Sorbonne Universités, UPMC Univ Paris 06, INSERM UMR_S 1142, LIMICS, F-75006 Paris, France
235
Blecker S, Katz SD, Horwitz LI, Kuperman G, Park H, Gold A, Sontag D. Comparison of Approaches for Heart Failure Case Identification From Electronic Health Record Data. JAMA Cardiol 2016; 1:1014-1020. [PMID: 27706470] [DOI: 10.1001/jamacardio.2016.3236]
Abstract
Importance Accurate, real-time case identification is needed to target interventions to improve quality and outcomes for hospitalized patients with heart failure. Problem lists may be useful for case identification but are often inaccurate or incomplete. Machine-learning approaches may improve accuracy of identification but can be limited by complexity of implementation. Objective To develop algorithms that use readily available clinical data to identify patients with heart failure while in the hospital. Design, Setting, and Participants We performed a retrospective study of hospitalizations at an academic medical center. Hospitalizations for patients 18 years or older who were admitted after January 1, 2013, and discharged before February 28, 2015, were included. From a random 75% sample of hospitalizations, we developed 5 algorithms for heart failure identification using electronic health record data: (1) heart failure on problem list; (2) presence of at least 1 of 3 characteristics: heart failure on problem list, inpatient loop diuretic, or brain natriuretic peptide level of 500 pg/mL or higher; (3) logistic regression of 30 clinically relevant structured data elements; (4) machine-learning approach using unstructured notes; and (5) machine-learning approach using structured and unstructured data. Main Outcomes and Measures Heart failure diagnosis based on discharge diagnosis and physician review of sampled medical records. Results A total of 47 119 hospitalizations were included in this study (mean [SD] age, 60.9 [18.15] years; 23 952 female [50.8%], 5258 black/African American [11.2%], and 3667 Hispanic/Latino [7.8%] patients). Of these hospitalizations, 6549 (13.9%) had a discharge diagnosis of heart failure. Inclusion of heart failure on the problem list (algorithm 1) had a sensitivity of 0.40 and a positive predictive value (PPV) of 0.96 for heart failure identification. Algorithm 2 improved sensitivity to 0.77 at the expense of a PPV of 0.64. 
Algorithms 3, 4, and 5 had areas under the receiver operating characteristic curves of 0.953, 0.969, and 0.974, respectively. With a PPV of 0.9, these algorithms had associated sensitivities of 0.68, 0.77, and 0.83, respectively. Conclusions and Relevance The problem list is insufficient for real-time identification of hospitalized patients with heart failure. The high predictive accuracy of machine learning using free text demonstrates that support of such analytics in future electronic health record systems can improve cohort identification.
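The sensitivity/PPV trade-off reported for algorithms 1 and 2 follows directly from the confusion-matrix definitions. A small helper with toy counts is shown below; the counts are invented only to echo the reported figures, not the study's actual data:

```python
def sensitivity_ppv(tp, fp, fn):
    """Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP)."""
    return tp / (tp + fn), tp / (tp + fp)

# Toy confusion counts: a strict rule (problem list only) is precise but
# misses cases; a looser any-of-three rule recovers cases at the cost of PPV.
strict = sensitivity_ppv(tp=400, fp=17, fn=600)   # roughly (0.40, 0.96)
loose = sensitivity_ppv(tp=770, fp=433, fn=230)   # roughly (0.77, 0.64)
```

Loosening the rule moves cases from FN to TP but also admits more FP, which is exactly the movement between algorithm 1 and algorithm 2 in the abstract.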
Affiliation(s)
- Saul Blecker: Department of Population Health and Department of Medicine, New York University School of Medicine, New York
- Stuart D Katz: Department of Medicine, New York University School of Medicine, New York
- Leora I Horwitz: Department of Population Health and Department of Medicine, New York University School of Medicine, New York
- Gilad Kuperman: Department of Information Systems, NewYork-Presbyterian Hospital, New York
- Hannah Park: Department of Population Health, New York University School of Medicine, New York
- Alex Gold: Department of Medicine, New York University School of Medicine, New York
- David Sontag: Department of Computer Science, New York University, New York
236
Vande Loo SJ, North F. Patient question set proliferation: scope and informatics challenges of patient question set management in a large multispecialty practice with case examples pertaining to tobacco use, menopause, and Urology and Orthopedics specialties. BMC Med Inform Decis Mak 2016; 16:41. [PMID: 27066892] [PMCID: PMC4828833] [DOI: 10.1186/s12911-016-0279-2]
Abstract
Background Health care institutions have patient question sets that can expand over time. For a multispecialty group, each specialty might have multiple question sets. As a result, question set governance can be challenging. Knowledge of the counts, variability and repetition of questions in a multispecialty practice can help institutions understand the challenges of question set proliferation. Methods We analyzed patient-facing question sets that were subject to institutional governance and those that were not. We examined question variability and the number of repetitious questions for a simulated episode of care. In addition to examining general patient question sets, we used specific examples of tobacco questions, questions from two specialty areas, and questions to menopausal women. Results In our analysis, there were approximately 269 institutionally governed patient question sets with a mean of 74 questions per set, accounting for an estimated 20,000 governed questions. Sampling from selected specialties revealed that 50% of patient question sets were not institutionally governed. We found over 650 tobacco-related questions in use, many with only slight variations. A simulated use case for a menopausal woman revealed potentially over 200 repeated questions. Conclusions A group practice with multiple specialties can have a large volume of patient questions that are not centrally developed, stored or governed. This results in a lack of standardization and coordination. Patients may be given multiple repeated questions throughout the course of their care, and providers lack standardized question sets to help construct valid patient phenotypes. Even with the implementation of a single electronic health record, medical practices may still have a health information management gap in the ability to create, store and share patient-generated health information that is meaningful to both patients and physicians.
237
Dendrou CA, McVean G, Fugger L. Neuroinflammation - using big data to inform clinical practice. Nat Rev Neurol 2016; 12:685-698. [PMID: 27857124] [DOI: 10.1038/nrneurol.2016.171]
Abstract
Neuroinflammation is emerging as a central process in many neurological conditions, either as a causative factor or as a secondary response to nervous system insult. Understanding the causes and consequences of neuroinflammation could, therefore, provide insight that is needed to improve therapeutic interventions across many diseases. However, the complexity of the pathways involved necessitates the use of high-throughput approaches to extensively interrogate the process, and appropriate strategies to translate the data generated into clinical benefit. Use of 'big data' aims to generate, integrate and analyse large, heterogeneous datasets to provide in-depth insights into complex processes, and has the potential to unravel the complexities of neuroinflammation. Limitations in data analysis approaches currently prevent the full potential of big data being reached, but some aspects of big data are already yielding results. The implementation of 'omics' analyses in particular is becoming routine practice in biomedical research, and neuroimaging is producing large sets of complex data. In this Review, we evaluate the impact of the drive to collect and analyse big data on our understanding of neuroinflammation in disease. We describe the breadth of big data that are leading to an evolution in our understanding of this field, exemplify how these data are beginning to be of use in a clinical setting, and consider possible future directions.
Affiliation(s)
- Calliope A Dendrou: Oxford Centre for Neuroinflammation, Nuffield Department of Clinical Neurosciences, and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DS, UK
- Gil McVean: Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, and Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- Lars Fugger: Oxford Centre for Neuroinflammation, Nuffield Department of Clinical Neurosciences, and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DS, UK
238
Demner-Fushman D, Elhadad N. Aspiring to Unintended Consequences of Natural Language Processing: A Review of Recent Developments in Clinical and Consumer-Generated Text Processing. Yearb Med Inform 2016; 25:224-233. [PMID: 27830255] [PMCID: PMC5171557] [DOI: 10.15265/iy-2016-017]
Abstract
OBJECTIVES This paper reviews work over the past two years in Natural Language Processing (NLP) applied to clinical and consumer-generated texts. METHODS We included any application or methodological publication that leverages text to facilitate healthcare and address the health-related needs of consumers and populations. RESULTS Many important developments in clinical text processing, both foundational and task-oriented, were addressed in community- wide evaluations and discussed in corresponding special issues that are referenced in this review. These focused issues and in-depth reviews of several other active research areas, such as pharmacovigilance and summarization, allowed us to discuss in greater depth disease modeling and predictive analytics using clinical texts, and text analysis in social media for healthcare quality assessment, trends towards online interventions based on rapid analysis of health-related posts, and consumer health question answering, among other issues. CONCLUSIONS Our analysis shows that although clinical NLP continues to advance towards practical applications and more NLP methods are used in large-scale live health information applications, more needs to be done to make NLP use in clinical applications a routine widespread reality. Progress in clinical NLP is mirrored by developments in social media text analysis: the research is moving from capturing trends to addressing individual health-related posts, thus showing potential to become a tool for precision medicine and a valuable addition to the standard healthcare quality evaluation tools.
Affiliation(s)
- Dina Demner-Fushman: National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
239
Kirby JC, Speltz P, Rasmussen LV, Basford M, Gottesman O, Peissig PL, Pacheco JA, Tromp G, Pathak J, Carrell DS, Ellis SB, Lingren T, Thompson WK, Savova G, Haines J, Roden DM, Harris PA, Denny JC. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J Am Med Inform Assoc 2016; 23:1046-1052. [PMID: 27026615] [PMCID: PMC5070514] [DOI: 10.1093/jamia/ocv202]
Abstract
OBJECTIVE Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems. MATERIALS AND METHODS We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites. RESULTS As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%). DISCUSSION These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others. CONCLUSION By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.
Affiliation(s)
- Peter Speltz: Vanderbilt University Medical Center, Nashville, TN, USA
- Luke V Rasmussen: Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
- Omri Gottesman: Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Todd Lingren: Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Will K Thompson: Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
- Guergana Savova: Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Dan M Roden: Vanderbilt University Medical Center, Nashville, TN, USA
- Paul A Harris: Vanderbilt University Medical Center, Nashville, TN, USA
- Joshua C Denny: Vanderbilt University Medical Center, Nashville, TN, USA
240
Chiaramello E, Pinciroli F, Bonalumi A, Caroli A, Tognola G. Use of "off-the-shelf" information extraction algorithms in clinical informatics: A feasibility study of MetaMap annotation of Italian medical notes. J Biomed Inform 2016; 63:22-32. [DOI: 10.1016/j.jbi.2016.07.017]
241
Zheng T, Xie W, Xu L, He X, Zhang Y, You M, Yang G, Chen Y. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform 2016; 97:120-127. [PMID: 27919371] [DOI: 10.1016/j.ijmedinf.2016.09.014]
Abstract
OBJECTIVE To discover diverse genotype-phenotype associations affiliated with Type 2 Diabetes Mellitus (T2DM) via genome-wide association study (GWAS) and phenome-wide association study (PheWAS), more cases (T2DM subjects) and controls (subjects without T2DM) need to be identified, for example via Electronic Health Records (EHR). However, existing expert-based identification algorithms often suffer from low recall and can miss a large number of valuable samples under conservative filtering standards. The goal of this work is to develop a semi-automated framework based on machine learning, as a pilot study, to liberalize filtering criteria and improve recall while keeping the false positive rate low. MATERIALS AND METHODS We propose a data-informed framework for identifying subjects with and without T2DM from EHR via feature engineering and machine learning. We evaluate and contrast the identification performance of widely used machine learning models within our framework, including k-Nearest-Neighbors, Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine, and Logistic Regression. Our framework was evaluated on 300 patient samples (161 cases, 60 controls, and 79 unconfirmed subjects), randomly selected from a diabetes-related cohort of 23,281 patients retrieved from a regional distributed EHR repository covering 2012 to 2014. RESULTS We apply top-performing machine learning algorithms on the engineered features. We benchmark the accuracy, precision, AUC, sensitivity, and specificity of classification models against the state-of-the-art expert algorithm for identification of T2DM subjects. Our results indicate that the framework achieved high identification performance (∼0.98 average AUC), much higher than the state-of-the-art expert algorithm (0.71 AUC). DISCUSSION Expert-algorithm-based identification of T2DM subjects from EHR is often hampered by high miss rates due to conservative selection criteria. Our framework leverages machine learning and feature engineering to loosen such selection criteria and achieve a high identification rate of cases and controls. CONCLUSIONS Our proposed framework demonstrates a more accurate and efficient approach for identifying subjects with and without T2DM from EHR.
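The contrast between a conservative expert filter and a looser, score-based selection can be sketched as follows. The field names, weights, and HbA1c cutoff below are placeholders for illustration, not the paper's learned models or actual criteria:

```python
def expert_rule(p):
    """Conservative expert-style filter: every criterion required.
    Field names and the 6.5% HbA1c cutoff are illustrative."""
    return p["t2dm_code"] and p["hba1c"] >= 6.5 and p["on_antidiabetic_med"]

def score_based(p, weights=(2.0, 1.5, 1.0), threshold=2.0):
    """Looser, data-driven-style selection: a weighted evidence score
    with a cutoff. The weights are invented, standing in for learned ones."""
    evidence = (p["t2dm_code"], p["hba1c"] >= 6.5, p["on_antidiabetic_med"])
    return sum(w for w, present in zip(weights, evidence) if present) >= threshold

# A patient with diabetic-range HbA1c and medication but no diagnosis code:
# the strict conjunctive rule misses them; the score-based screen recovers them.
patient = {"t2dm_code": False, "hba1c": 7.2, "on_antidiabetic_med": True}
```

This is the recall argument in miniature: requiring every criterion discards partially documented cases, while a score over multiple evidence sources keeps them without admitting patients who show no evidence at all.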
Affiliation(s)
- Tao Zheng: Institute of Image Communication and Networking, Shanghai Jiao Tong University, Shanghai, China; Tongren Hospital, Shanghai Jiao Tong University, Shanghai, China
- Wei Xie: Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN, USA
- Liling Xu: Tongren Hospital, Shanghai Jiao Tong University, Shanghai, China
- Xiaoying He: Department of Endocrinology, the First Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Ya Zhang: Institute of Image Communication and Networking, Shanghai Jiao Tong University, Shanghai, China
- Mingrong You: Division of Epidemiology, Vanderbilt University, Nashville, TN, USA
- Gong Yang: Division of Epidemiology, Vanderbilt University, Nashville, TN, USA
- You Chen: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
Collapse
|
242
|
Lingren T, Thaker V, Brady C, Namjou B, Kennebeck S, Bickel J, Patibandla N, Ni Y, Van Driest SL, Chen L, Roach A, Cobb B, Kirby J, Denny J, Bailey-Davis L, Williams MS, Marsolo K, Solti I, Holm IA, Harley J, Kohane IS, Savova G, Crimmins N. Developing an Algorithm to Detect Early Childhood Obesity in Two Tertiary Pediatric Medical Centers. Appl Clin Inform 2016; 7:693-706. [PMID: 27452794 DOI: 10.4338/aci-2016-01-ra-0015] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2016] [Accepted: 06/15/2016] [Indexed: 01/12/2023] Open
Abstract
OBJECTIVE The objective of this study is to develop an algorithm to accurately identify children with severe early onset childhood obesity (ages 1-5.99 years) using structured and unstructured data from the electronic health record (EHR). INTRODUCTION Childhood obesity increases risk factors for cardiovascular morbidity and vascular disease. Accurate definition of a high-precision phenotype through a standardized tool is critical to the success of large-scale genomic studies and to validating rare monogenic variants causing severe early onset obesity. DATA AND METHODS Rule-based and machine learning based algorithms were developed using structured and unstructured data from two EHR databases, from Boston Children's Hospital (BCH) and Cincinnati Children's Hospital and Medical Center (CCHMC). Exclusion criteria, including medications and comorbid diagnoses, were defined. Machine learning algorithms were developed using cross-site training and testing, in addition to experimenting with natural language processing features. RESULTS Precision was emphasized to obtain a high-fidelity cohort. The rule-based algorithm performed best overall, with precision of 0.895 (CCHMC) and 0.770 (BCH). The best feature set for machine learning employed Unified Medical Language System (UMLS) concept unique identifiers (CUIs), ICD-9 codes, and RxNorm codes. CONCLUSIONS Detecting severe early childhood obesity is essential given the intervention potential in children at the highest long-term risk of developing obesity-related comorbidities, and excluding patients with underlying pathological and non-syndromic causes of obesity assists in developing a high-precision cohort for genetic study. Further, such phenotyping efforts inform future practical applications in health care environments utilizing clinical decision support.
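A minimal sketch of the rule-based arm described above, with an invented percentile threshold and invented exclusion code lists (the paper's actual inclusion/exclusion rules are not reproduced here):

```python
# Sketch of a rule-based screen in the spirit of the abstract: flag children
# aged 1-5.99 years whose BMI percentile exceeds a severity cutoff, then apply
# exclusion criteria (comorbid diagnoses, medications). The cutoff and code
# lists below are illustrative placeholders, not the study's rules.
EXCLUDED_DIAGNOSES = {"253.8", "759.81"}   # placeholder syndromic-obesity codes
EXCLUDED_MEDS = {"prednisone"}             # placeholder weight-promoting drugs

def severe_early_obesity(patient, bmi_percentile_cutoff=99.0):
    """Return True if the record passes all inclusion/exclusion rules."""
    if not (1.0 <= patient["age_years"] < 6.0):
        return False                        # outside the age window
    if patient["bmi_percentile"] < bmi_percentile_cutoff:
        return False                        # not severe
    if EXCLUDED_DIAGNOSES & set(patient["icd9"]):
        return False                        # comorbid diagnosis exclusion
    if EXCLUDED_MEDS & set(patient["meds"]):
        return False                        # medication exclusion
    return True

cohort = [
    {"age_years": 3.2, "bmi_percentile": 99.5, "icd9": ["278.01"], "meds": []},
    {"age_years": 3.2, "bmi_percentile": 99.5, "icd9": ["759.81"], "meds": []},
    {"age_years": 8.0, "bmi_percentile": 99.9, "icd9": [], "meds": []},
]
flags = [severe_early_obesity(p) for p in cohort]  # → [True, False, False]
```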
Collapse
Affiliation(s)
- Todd Lingren
- Todd Lingren, Cincinnati Children's Hospital Medical Center, Biomedical Informatics, 3333 Burnet Avenue, MLC 7024 Cincinnati, OH 45229-3039, Phone: 513-803-9032, Fax: 513-636-2056,
| |
Collapse
|
243
|
Daniel C, Ouagne D, Sadou E, Forsberg K, Gilchrist MM, Zapletal E, Paris N, Hussain S, Jaulent MC, Kalra D. Cross border semantic interoperability for clinical research: the EHR4CR semantic resources and services. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2016; 2016:51-9. [PMID: 27570649 PMCID: PMC5001763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 10/31/2022]
Abstract
With the development of platforms enabling the use of routinely collected clinical data in the context of international clinical research, scalable solutions for cross border semantic interoperability need to be developed. Within the context of the IMI EHR4CR project, we first defined the requirements and evaluation criteria of the EHR4CR semantic interoperability platform and then developed the semantic resources and supportive services and tooling to assist hospital sites in standardizing their data to allow the execution of the project use cases. The experience gained from the evaluation of the EHR4CR platform, accessing semantically equivalent data elements across 11 participating European EHR systems from 5 countries, demonstrated how far the mediation model and mapping efforts met the expected requirements of the project. Developers of semantic interoperability platforms are beginning to address a core set of requirements in order to reach the goal of developing cross border semantic integration of data.
Collapse
Affiliation(s)
- Christel Daniel
- INSERM, U1142, LIMICS, F-75006, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS, F-75006, Paris, France;; AP-HP, Paris, France
| | - David Ouagne
- INSERM, U1142, LIMICS, F-75006, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS, F-75006, Paris, France
| | - Eric Sadou
- INSERM, U1142, LIMICS, F-75006, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS, F-75006, Paris, France;; AP-HP, Paris, France
| | | | | | | | | | - Sajjad Hussain
- INSERM, U1142, LIMICS, F-75006, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS, F-75006, Paris, France
| | - Marie-Christine Jaulent
- INSERM, U1142, LIMICS, F-75006, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS, F-75006, Paris, France
| | | |
Collapse
|
244
|
Agarwal V, Podchiyska T, Banda JM, Goel V, Leung TI, Minty EP, Sweeney TE, Gyang E, Shah NH. Learning statistical models of phenotypes using noisy labeled training data. J Am Med Inform Assoc 2016; 23:1166-1173. [PMID: 27174893 DOI: 10.1093/jamia/ocw028] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2015] [Revised: 11/08/2015] [Accepted: 12/12/2015] [Indexed: 01/29/2023] Open
Abstract
OBJECTIVE Traditionally, patient groups with a phenotype are selected through rule-based definitions whose creation and validation are time-consuming. Machine learning approaches to electronic phenotyping are limited by the paucity of labeled training datasets. We demonstrate the feasibility of utilizing semi-automatically labeled training sets to create phenotype models via machine learning, using a comprehensive representation of the patient medical record. METHODS We use a list of keywords specific to the phenotype of interest to generate noisy labeled training data. We train L1-penalized logistic regression models for a chronic and an acute disease and evaluate the performance of the models against a gold standard. RESULTS Our models for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.90, 0.89, and 0.86, 0.89, respectively. Local implementations of the previously validated rule-based definitions for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.96, 0.92 and 0.84, 0.87, respectively. We have demonstrated the feasibility of learning phenotype models using imperfectly labeled data for a chronic and an acute phenotype. Further research in feature engineering and in specification of the keyword list can improve the performance of the models and the scalability of the approach. CONCLUSIONS Our method provides an alternative to manual labeling for creating training sets for statistical models of phenotypes. Such an approach can accelerate research with large observational healthcare datasets and may also be used to create local phenotype models.
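The keyword-based noisy labeling plus L1-penalized logistic regression pipeline can be sketched as follows; the notes and keyword list are invented for illustration and do not reflect the authors' actual lexicon:

```python
# Sketch: generate noisy labels from a keyword list, then fit an L1-penalized
# logistic regression on a bag-of-words representation, as the abstract
# describes. Notes and keywords here are toy stand-ins.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

notes = [
    "metformin started for type 2 diabetes, hba1c 8.1",
    "diabetes mellitus follow-up, insulin adjusted",
    "annual physical, no complaints",
    "knee pain after running, advised rest",
]
KEYWORDS = ("diabetes", "metformin", "insulin", "hba1c")

# Noisy labeling: a note is treated as positive if any keyword appears.
y_noisy = np.array([int(any(k in n for k in KEYWORDS)) for n in notes])

# L1 penalty drives uninformative word weights to zero.
X = CountVectorizer().fit_transform(notes)
clf = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y_noisy)
preds = clf.predict(X)
```

In the paper the trained model is then evaluated against a manually reviewed gold standard rather than the noisy labels themselves.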
Collapse
Affiliation(s)
- Vibhu Agarwal
- Biomedical Informatics Training Program, Stanford University, Stanford CA 94305-5479, USA
| | - Tanya Podchiyska
- Biomedical Informatics Training Program, Stanford University, Stanford CA 94305-5479, USA
| | - Juan M Banda
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford CA 94305-5479, USA
| | - Veena Goel
- Department of Pediatrics, Stanford University School of Medicine, Stanford CA 94305-5208, USA.,Department of Clinical Informatics, Stanford Children's Health, Stanford CA 94305-5474, USA
| | - Tiffany I Leung
- Division of General Medical Disciplines, Stanford University, Stanford CA 94305, USA
| | - Evan P Minty
- Biomedical Informatics Training Program, Stanford University, Stanford CA 94305-5479, USA.,Faculty of Medicine, University of Calgary, Calgary Alberta, T2N 4N1, Canada
| | - Timothy E Sweeney
- Biomedical Informatics Training Program, Stanford University, Stanford CA 94305-5479, USA.,Department of Surgery, Stanford Hospital & Clinics, Stanford CA 94305-2200, USA
| | - Elsie Gyang
- Division of Vascular Surgery, Stanford Hospital & Clinics, Stanford CA 94305-5642, USA
| | - Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford CA 94305-5479, USA
| |
Collapse
|
245
|
Mowery DL, Chapman BE, Conway M, South BR, Madden E, Keyhani S, Chapman WW. Extracting a stroke phenotype risk factor from Veteran Health Administration clinical reports: an information content analysis. J Biomed Semantics 2016; 7:26. [PMID: 27175226 PMCID: PMC4863379 DOI: 10.1186/s13326-016-0065-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2015] [Accepted: 04/19/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the United States, 795,000 people suffer strokes each year; 10-15 % of these strokes can be attributed to stenosis caused by plaque in the carotid artery, a major stroke phenotype risk factor. Studies comparing treatments for the management of asymptomatic carotid stenosis are challenging for at least two reasons: 1) administrative billing codes (i.e., Current Procedural Terminology (CPT) codes) that identify carotid images do not denote which neurovascular arteries are affected and 2) the majority of the image reports are negative for carotid stenosis. Studies that rely on manual chart abstraction can be labor-intensive, expensive, and time-consuming. Natural Language Processing (NLP) can expedite the process of manual chart abstraction by automatically filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings, thus potentially reducing effort, costs, and time. METHODS In this pilot study, we conducted an information content analysis of carotid stenosis mentions in terms of their report location (Sections), report formats (structures) and linguistic descriptions (expressions) in Veteran Health Administration free-text reports. We assessed the ability of an NLP algorithm, pyConText, to discern reports with significant carotid stenosis findings from reports with no/insignificant carotid stenosis findings given these three document composition factors for two report types: radiology (RAD) and text integration utility (TIU) notes. RESULTS We observed that most carotid mentions are recorded in prose using categorical expressions, within the Findings and Impression sections for RAD reports and within neither of these designated sections for TIU notes. For RAD reports, pyConText performed with high sensitivity (88 %), specificity (84 %), and negative predictive value (95 %) and reasonable positive predictive value (70 %).
For TIU notes, pyConText performed with high specificity (87 %) and negative predictive value (92 %), reasonable sensitivity (73 %), and moderate positive predictive value (58 %). pyConText performed with the highest sensitivity when processing the full report rather than the Findings or Impression sections independently. CONCLUSION We conclude that pyConText can reduce chart review effort by filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings in the Veteran Health Administration electronic health record, and hence has utility for expediting a comparative effectiveness study of treatment strategies for stroke prevention.
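A toy illustration of the underlying idea (this is not pyConText's actual API): treat a stenosis mention as significant only if it carries a severity qualifier and is not negated. The regular expressions below are placeholders, far simpler than a real context algorithm:

```python
# Sketch: keyword + negation + severity filtering of radiology report text.
# Both pattern lists are illustrative, not a validated lexicon.
import re

SEVERITY = re.compile(r"\b(moderate|severe|significant|high[- ]grade)\b")
NEGATION = re.compile(r"\b(no|without|negative for)\b[^.]*\bstenosis\b")

def significant_stenosis(report):
    """True if the report mentions stenosis with a non-negated severity cue."""
    report = report.lower()
    if "stenosis" not in report:
        return False
    if NEGATION.search(report):     # e.g. "no significant stenosis"
        return False
    return bool(SEVERITY.search(report))

reports = [
    "IMPRESSION: severe stenosis of the right internal carotid artery.",
    "IMPRESSION: no significant stenosis in either carotid.",
    "FINDINGS: mild calcified plaque, 20% stenosis.",
]
flags = [significant_stenosis(r) for r in reports]  # → [True, False, False]
```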
Collapse
Affiliation(s)
- Danielle L. Mowery
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
- IDEAS Center, Veteran Affair Health Care System, Salt Lake City, UT USA
| | - Brian E. Chapman
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
- IDEAS Center, Veteran Affair Health Care System, Salt Lake City, UT USA
| | - Mike Conway
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
| | - Brett R. South
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
- IDEAS Center, Veteran Affair Health Care System, Salt Lake City, UT USA
| | - Erin Madden
- San Francisco Veteran Affair Health Care System, San Francisco, CA USA
| | - Salomeh Keyhani
- San Francisco Veteran Affair Health Care System, San Francisco, CA USA
| | - Wendy W. Chapman
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
- IDEAS Center, Veteran Affair Health Care System, Salt Lake City, UT USA
| |
Collapse
|
246
|
Zhou SM, Fernandez-Gutierrez F, Kennedy J, Cooksey R, Atkinson M, Denaxas S, Siebert S, Dixon WG, O’Neill TW, Choy E, Sudlow C, Brophy S. Defining Disease Phenotypes in Primary Care Electronic Health Records by a Machine Learning Approach: A Case Study in Identifying Rheumatoid Arthritis. PLoS One 2016; 11:e0154515. [PMID: 27135409 PMCID: PMC4852928 DOI: 10.1371/journal.pone.0154515] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2016] [Accepted: 04/14/2016] [Indexed: 12/20/2022] Open
Abstract
OBJECTIVES 1) To use a data-driven method to examine clinical codes (risk factors) of a medical condition in primary care electronic health records (EHRs) that can accurately predict a diagnosis of the condition in secondary care EHRs. 2) To develop and validate a disease phenotyping algorithm for rheumatoid arthritis (RA) using primary care EHRs. METHODS This study linked routine primary and secondary care EHRs in Wales, UK. A machine learning based scheme was used to identify patients with rheumatoid arthritis from primary care EHRs via the following steps: i) selection of variables by comparing the relative frequencies of Read codes in the primary care dataset associated with disease cases versus non-disease controls (disease/non-disease based on the secondary care diagnosis); ii) reduction of predictors/associated variables using a Random Forest method; iii) induction of decision rules from a decision tree model. The proposed method was then extensively validated on an independent dataset and compared for performance with two existing deterministic algorithms for RA that had been developed using expert clinical knowledge. RESULTS Primary care EHRs were available for 2,238,360 patients over the age of 16, and of these 20,667 were also linked in the secondary care rheumatology clinical system. In the linked dataset, 900 predictors (out of a total of 43,100 variables) in the primary care record occurred more frequently in those with than in those without RA. These variables were reduced to 37 groups of related clinical codes, which were used to develop a decision tree model. The final algorithm identified 8 predictors related to diagnostic codes for RA, medication codes, such as those for disease modifying anti-rheumatic drugs, and absence of alternative diagnoses such as psoriatic arthritis. The proposed data-driven method performed as well as the expert clinical knowledge based methods.
CONCLUSION Data-driven schemes, such as ensemble machine learning methods, have the potential to identify the most informative predictors in a cost-effective and rapid way, and to accurately and reliably classify rheumatoid arthritis or other complex medical conditions in primary care EHRs.
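Steps ii) and iii) of the scheme above can be sketched with scikit-learn on synthetic stand-ins for Read-code frequency variables (an illustration, not the study's code; the feature names are invented):

```python
# Sketch: rank candidate code variables with a Random Forest, keep the top
# ones, then induce readable decision rules with a shallow decision tree.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for clinical-code frequency variables per patient.
X, y = make_classification(n_samples=500, n_features=40, n_informative=5,
                           random_state=0)

# Step ii): reduce predictors by Random Forest feature importance.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:8]

# Step iii): induce decision rules over the kept predictors.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[:, top], y)
rules = export_text(tree, feature_names=[f"code_group_{i}" for i in top])
```

The printable `rules` string is the analogue of the study's final set of decision rules over grouped clinical codes.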
Collapse
Affiliation(s)
- Shang-Ming Zhou
- Institute of Life Science, College of Medicine, Swansea University, Swansea, United Kingdom
| | | | - Jonathan Kennedy
- Institute of Life Science, College of Medicine, Swansea University, Swansea, United Kingdom
| | - Roxanne Cooksey
- Institute of Life Science, College of Medicine, Swansea University, Swansea, United Kingdom
| | - Mark Atkinson
- Institute of Life Science, College of Medicine, Swansea University, Swansea, United Kingdom
| | - Spiros Denaxas
- UCL Institute of Health Informatics and Farr Institute of Health Informatics Research, London, United Kingdom
| | - Stefan Siebert
- Institute of Infection, Immunity and Inflammation, University of Glasgow, Glasgow, United Kingdom
| | - William G. Dixon
- Arthritis Research UK Centre for Epidemiology, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester, Manchester, United Kingdom
| | - Terence W. O’Neill
- Arthritis Research UK Centre for Epidemiology, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester, Manchester, United Kingdom
| | - Ernest Choy
- Arthritis Research UK CREATE Centre and Welsh Arthritis Research Network, School of Medicine, Cardiff University, Cardiff, United Kingdom
| | - Cathie Sudlow
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | | | - Sinead Brophy
- Institute of Life Science, College of Medicine, Swansea University, Swansea, United Kingdom
| |
Collapse
|
247
|
Halpern Y, Horng S, Choi Y, Sontag D. Electronic medical record phenotyping using the anchor and learn framework. J Am Med Inform Assoc 2016; 23:731-40. [PMID: 27107443 PMCID: PMC4926745 DOI: 10.1093/jamia/ocw011] [Citation(s) in RCA: 84] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2015] [Accepted: 01/16/2016] [Indexed: 12/18/2022] Open
Abstract
Background Electronic medical records (EMRs) hold a tremendous amount of information about patients that is relevant to determining the optimal approach to patient care. As medicine becomes increasingly precise, a patient’s electronic medical record phenotype will play an important role in triggering clinical decision support systems that can deliver personalized recommendations in real time. Learning with anchors presents a method of efficiently learning statistically driven phenotypes with minimal manual intervention. Materials and Methods We developed a phenotype library that uses both structured and unstructured data from the EMR to represent patients for real-time clinical decision support. Eight of the phenotypes were evaluated using retrospective EMR data on emergency department patients using a set of prospectively gathered gold standard labels. Results We built a phenotype library with 42 publicly available phenotype definitions. Using information from triage time, the phenotype classifiers have an area under the ROC curve (AUC) of infection 0.89, cancer 0.88, immunosuppressed 0.85, septic shock 0.93, nursing home 0.87, anticoagulated 0.83, cardiac etiology 0.89, and pneumonia 0.90. Using information available at the time of disposition from the emergency department, the AUC values are infection 0.91, cancer 0.95, immunosuppressed 0.90, septic shock 0.97, nursing home 0.91, anticoagulated 0.94, cardiac etiology 0.92, and pneumonia 0.97. Discussion The resulting phenotypes are interpretable and fast to build, and perform comparably to statistically learned phenotypes developed with 5000 manually labeled patients. Conclusion Learning with anchors is an attractive option for building a large public repository of phenotype definitions that can be used for a range of health IT applications, including real-time decision support.
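A simplified sketch of the anchor idea on simulated data (not the authors' implementation): an "anchor" observation marks some positives with high confidence, a classifier is trained against those anchor-derived labels with the anchor itself censored from the features, and patients the anchor missed still receive elevated scores:

```python
# Sketch: learning with anchors on simulated data. The latent phenotype is
# unobserved; the anchor fires only for a subset of true positives.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
latent = rng.random(n) < 0.3               # true (unobserved) phenotype
anchor = latent & (rng.random(n) < 0.6)    # anchor fires only for positives
features = np.column_stack([
    latent + rng.normal(0, 0.5, n),        # noisy correlate of the phenotype
    rng.normal(0, 1, n),                   # irrelevant feature
])

# Train against anchor-derived labels; the anchor is not in the features.
clf = LogisticRegression().fit(features, anchor.astype(int))
scores = clf.predict_proba(features)[:, 1]

# Positives the anchor missed should still score above true negatives.
missed = scores[latent & ~anchor].mean()
negatives = scores[~latent].mean()
```

This captures why the approach needs minimal manual intervention: only the anchor definition is hand-specified, not patient-level labels.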
Collapse
Affiliation(s)
- Yoni Halpern
- Department of Computer Science, New York University, New York, NY, USA
| | - Steven Horng
- Department of Emergency Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Youngduck Choi
- Department of Computer Science, New York University, New York, NY, USA
| | - David Sontag
- Department of Computer Science, New York University, New York, NY, USA
| |
Collapse
|
248
|
Murphy SN, Herrick C, Wang Y, Wang TD, Sack D, Andriole KP, Wei J, Reynolds N, Plesniak W, Rosen BR, Pieper S, Gollub RL. High throughput tools to access images from clinical archives for research. J Digit Imaging 2016; 28:194-204. [PMID: 25316195 PMCID: PMC4359193 DOI: 10.1007/s10278-014-9733-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open
Abstract
Historically, medical images collected in the course of clinical care have been difficult to access for secondary research studies. While there is tremendous potential value in the large volume of studies contained in clinical image archives, Picture Archiving and Communication Systems (PACS) are designed to optimize clinical operations and workflow. Search capabilities in PACS are basic, limiting their use for population studies, and duplication of archives for research is costly. To address this need, we augment the Informatics for Integrating Biology and the Bedside (i2b2) open source software, providing investigators with the tools necessary to query and integrate medical record and clinical research data. Over 100 healthcare institutions have installed this suite of software tools, which allows investigators to search medical record metadata, including images, for specific types of patients. In this report, we describe a new Medical Imaging Informatics Bench to Bedside (mi2b2) module (www.mi2b2.org), available now as an open source addition to the i2b2 software platform, that allows medical imaging examinations collected during routine clinical care to be made available to translational investigators directly from their institution's clinical PACS for research and educational use in compliance with the Health Insurance Portability and Accountability Act (HIPAA) Omnibus Rule. Access governance within the mi2b2 module is customizable per institution and PACS, minimizing impact on clinical systems. Currently in active use at our institutions, this new technology has already been used to facilitate access to thousands of clinical MRI brain studies representing specific patient phenotypes for use in research.
Collapse
Affiliation(s)
- Shawn N Murphy
- Research IS and Computing, Partners HealthCare, Charlestown, MA, 02129, USA,
| |
Collapse
|
249
|
Bates J, Fodeh SJ, Brandt CA, Womack JA. Classification of radiology reports for falls in an HIV study cohort. J Am Med Inform Assoc 2016; 23:e113-7. [PMID: 26567329 PMCID: PMC4954638 DOI: 10.1093/jamia/ocv155] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2015] [Revised: 08/14/2015] [Accepted: 09/08/2015] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVE To identify patients in a human immunodeficiency virus (HIV) study cohort who have fallen by applying supervised machine learning methods to radiology reports of the cohort. METHODS We used the Veterans Aging Cohort Study Virtual Cohort (VACS-VC), an electronic health record-based cohort of 146 530 veterans for whom radiology reports were available (N=2 977 739). We created a reference standard of radiology reports, represented each report by a feature set of words and Unified Medical Language System concepts, and then developed several support vector machine (SVM) classifiers for falls. We compared mutual information (MI) ranking and embedded feature selection approaches. The SVM classifier with MI feature selection was chosen to classify all radiology reports in VACS-VC. RESULTS Our SVM classifier with MI feature selection achieved an area under the curve score of 97.04 on the test set. When applied to all the radiology reports in VACS-VC, 80 416 of these reports were classified as positive for a fall. Of these, 11 484 were associated with a fall-related external cause of injury code (E-code) and 68 932 were not, corresponding to 29 280 patients with potential fall-related injuries who could not have been found using E-codes. DISCUSSION Feature selection was crucial to improving the classifier's performance. Feature selection with MI allowed us to select the number of discriminative features to use for classification, in contrast to the embedded feature selection method, in which the number of features is chosen automatically. CONCLUSION Machine learning is an effective method of identifying patients who have suffered a fall. The development of this classifier supplements the clinical researcher's toolkit and reduces dependence on under-coded structured electronic health record data.
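The MI-ranked feature selection followed by an SVM can be sketched as below; the reports, labels, and the value of k are toy stand-ins for the study's data:

```python
# Sketch: rank word features by mutual information with the fall label, keep
# the top k, and train a linear SVM, mirroring the abstract's pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reports = [
    "patient fell from standing height, hip radiograph ordered",
    "fall down stairs, wrist fracture suspected",
    "routine chest x-ray, no acute findings",
    "ct abdomen for staging, unremarkable",
    "slipped and fell on ice, ankle films",
    "screening mammogram, bi-rads 1",
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = fall-related report

pipe = make_pipeline(
    CountVectorizer(),
    SelectKBest(mutual_info_classif, k=5),   # MI-ranked feature selection
    LinearSVC(),
)
pipe.fit(reports, labels)
preds = pipe.predict(reports).tolist()
```

Choosing k explicitly is the property the discussion highlights: MI ranking lets the analyst set how many discriminative features the SVM sees, unlike embedded selection.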
Collapse
Affiliation(s)
- Jonathan Bates
- Yale School of Medicine, New Haven, CT VA Connecticut Healthcare System, West Haven, CT
| | | | - Cynthia A Brandt
- Yale School of Medicine, New Haven, CT VA Connecticut Healthcare System, West Haven, CT
| | - Julie A Womack
- Yale School of Nursing, West Haven, CT VA Connecticut Healthcare System, West Haven, CT
| |
Collapse
|
250
|
Rumsfeld JS, Joynt KE, Maddox TM. Big data analytics to improve cardiovascular care: promise and challenges. Nat Rev Cardiol 2016; 13:350-9. [PMID: 27009423 DOI: 10.1038/nrcardio.2016.42] [Citation(s) in RCA: 177] [Impact Index Per Article: 22.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
Abstract
The potential for big data analytics to improve cardiovascular quality of care and patient outcomes is tremendous. However, the application of big data in health care is at a nascent stage, and the evidence to date demonstrating that big data analytics will improve care and outcomes is scant. This Review provides an overview of the data sources and methods that comprise big data analytics, and describes eight areas of application of big data analytics to improve cardiovascular care, including predictive modelling for risk and resource use, population management, drug and medical device safety surveillance, disease and treatment heterogeneity, precision medicine and clinical decision support, quality of care and performance measurement, and public health and research applications. We also delineate the important challenges for big data applications in cardiovascular care, including the need for evidence of effectiveness and safety, methodological issues such as data quality and validation, and the critical importance of clinical integration and proof of clinical utility. If big data analytics are shown to improve quality of care and patient outcomes, and can be successfully implemented in cardiovascular practice, big data will fulfil its potential as an important component of a learning health-care system.
Collapse
Affiliation(s)
- John S Rumsfeld
- University of Colorado School of Medicine, 13001 East 17th Place, Aurora, Colorado 80045, USA.,VA Eastern Colorado Health System, Cardiology (111B), 1055 Clermont Street, Denver, Colorado 80220, USA
| | - Karen E Joynt
- Brigham and Women's Hospital, 75 Francis Street, Boston, Massachusetts 02115, USA.,Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, USA
| | - Thomas M Maddox
- University of Colorado School of Medicine, 13001 East 17th Place, Aurora, Colorado 80045, USA.,VA Eastern Colorado Health System, Cardiology (111B), 1055 Clermont Street, Denver, Colorado 80220, USA
| |
Collapse
|