1
Iscoe M, Socrates V, Gilson A, Chi L, Li H, Huang T, Kearns T, Perkins R, Khandjian L, Taylor RA. Identifying signs and symptoms of urinary tract infection from emergency department clinical notes using large language models. Acad Emerg Med 2024. [PMID: 38567658 DOI: 10.1111/acem.14883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 01/24/2024] [Accepted: 01/24/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND Natural language processing (NLP) tools including recently developed large language models (LLMs) have myriad potential applications in medical care and research, including the efficient labeling and classification of unstructured text such as electronic health record (EHR) notes. This opens the door to large-scale projects that rely on variables that are not typically recorded in a structured form, such as patient signs and symptoms. OBJECTIVES This study is designed to acquaint the emergency medicine research community with the foundational elements of NLP, highlighting essential terminology, annotation methodologies, and the intricacies involved in training and evaluating NLP models. Symptom characterization is critical to urinary tract infection (UTI) diagnosis, but identification of symptoms from the EHR has historically been challenging, limiting large-scale research, public health surveillance, and EHR-based clinical decision support. We therefore developed and compared two NLP models to identify UTI symptoms from unstructured emergency department (ED) notes. METHODS The study population consisted of patients aged ≥ 18 who presented to an ED in a northeastern U.S. health system between June 2013 and August 2021 and had a urinalysis performed. We annotated a random subset of 1250 ED clinician notes from these visits for a list of 17 UTI symptoms. We then developed two task-specific LLMs to perform the task of named entity recognition: a convolutional neural network-based model (SpaCy) and a transformer-based model designed to process longer documents (Clinical Longformer). Models were trained on 1000 notes and tested on a holdout set of 250 notes. We compared model performance (precision, recall, F1 measure) at identifying the presence or absence of UTI symptoms at the note level. RESULTS A total of 8135 entities were identified in 1250 notes; 83.6% of notes included at least one entity. Overall F1 measure for note-level symptom identification weighted by entity frequency was 0.84 for the SpaCy model and 0.88 for the Longformer model. F1 measure for identifying presence or absence of any UTI symptom in a clinical note was 0.96 (232/250 correctly classified) for the SpaCy model and 0.98 (240/250 correctly classified) for the Longformer model. CONCLUSIONS The study demonstrated the utility of LLMs and transformer-based models in particular for extracting UTI symptoms from unstructured ED clinical notes; models were highly accurate for detecting the presence or absence of any UTI symptom on the note level, with variable performance for individual symptoms.
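For readers unfamiliar with the metrics named in this abstract, the following is a minimal sketch of how note-level precision, recall, and F1 might be computed per symptom and then averaged with frequency weights. It is not the authors' code: the function names, the toy notes, and the choice to weight each symptom by the number of gold-standard notes containing it are all assumptions made for illustration.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def note_level_scores(gold, pred):
    """gold, pred: lists of sets of symptom labels, one set per note.

    Returns per-symptom (precision, recall, F1) and an F1 averaged over
    symptoms, weighted by how many gold notes contain each symptom
    (an assumed reading of "weighted by entity frequency").
    """
    counts = {}
    for g, p in zip(gold, pred):
        for label in g | p:
            c = counts.setdefault(label, Counter())
            if label in g and label in p:
                c["tp"] += 1
            elif label in p:
                c["fp"] += 1
            else:
                c["fn"] += 1
    per_label = {l: prf(c["tp"], c["fp"], c["fn"]) for l, c in counts.items()}
    support = {l: c["tp"] + c["fn"] for l, c in counts.items()}
    total = sum(support.values())
    weighted_f1 = (
        sum(per_label[l][2] * support[l] for l in per_label) / total
        if total
        else 0.0
    )
    return per_label, weighted_f1


# Hypothetical toy data: four notes, two symptoms. Not from the study.
gold_notes = [{"dysuria", "fever"}, {"dysuria"}, set(), {"fever"}]
pred_notes = [{"dysuria"}, {"dysuria", "fever"}, set(), {"fever"}]
per_label, weighted_f1 = note_level_scores(gold_notes, pred_notes)
```

Note that F1 and the fraction of notes correctly classified are different quantities, which is why the abstract can report an F1 of 0.96 alongside 232/250 correct classifications.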
Affiliation(s)
- Mark Iscoe
  - Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
  - Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Vimig Socrates
  - Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Aidan Gilson
  - Yale School of Medicine, New Haven, Connecticut, USA
- Ling Chi
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Huan Li
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Thomas Huang
  - Yale School of Medicine, New Haven, Connecticut, USA
- Thomas Kearns
  - Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Rachelle Perkins
  - Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Laura Khandjian
  - Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- R Andrew Taylor
  - Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
  - Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
2
Safranek CW, Huang T, Wright DS, Wright CX, Socrates V, Sangal RB, Iscoe M, Chartash D, Taylor RA. Automated HEART score determination via ChatGPT: Honing a framework for iterative prompt development. J Am Coll Emerg Physicians Open 2024; 5:e13133. [PMID: 38481520 PMCID: PMC10936537 DOI: 10.1002/emp2.13133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/25/2024] [Accepted: 02/10/2024] [Indexed: 03/17/2024]
Abstract
Objectives This study presents a design framework to enhance the accuracy with which large language models (LLMs), like ChatGPT, can extract insights from clinical notes. We highlight this framework via prompt refinement for the automated determination of HEART (History, ECG, Age, Risk factors, Troponin risk algorithm) scores in chest pain evaluation. Methods We developed a pipeline for LLM prompt testing, employing stochastic repeat testing and quantifying response errors relative to physician assessment. We evaluated the pipeline for automated HEART score determination across a limited set of 24 synthetic clinical notes representing four simulated patients. To assess whether iterative prompt design could improve the LLMs' ability to extract complex clinical concepts and apply rule-based logic to translate them to HEART subscores, we monitored diagnostic performance during prompt iteration. Results Validation included three iterative rounds of prompt improvement for three HEART subscores with 25 repeat trials totaling 1200 queries each for GPT-3.5 and GPT-4. For both LLMs, from initial to final prompt design, there was a decrease in the rate of responses with erroneous, non-numerical subscore answers. Accuracy of numerical responses for HEART subscores (discrete 0-2 point scale) improved for GPT-4 from the initial to final prompt iteration, with mean error decreasing from 0.16 to 0.10 points (95% confidence interval: 0.07-0.14). Conclusion We established a framework for iterative prompt design in the clinical space. Although the results indicate potential for integrating LLMs in structured clinical note analysis, translation to real, large-scale clinical data with appropriate data privacy safeguards is needed.
Affiliation(s)
- Conrad W. Safranek
  - Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Thomas Huang
  - Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Donald S. Wright
  - Department of Emergency Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
- Catherine X. Wright
  - Department of Cardiovascular Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
- Vimig Socrates
  - Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Rohit B. Sangal
  - Department of Emergency Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
- Mark Iscoe
  - Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
  - Department of Emergency Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
- David Chartash
  - Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
  - School of Medicine, University College Dublin, National University of Ireland, Dublin, Republic of Ireland
- R. Andrew Taylor
  - Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
  - Department of Emergency Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
3
Hauser RG, Quine DB, Iscoe M, Arvisais-Anhalt S. Development and Implementation of a Standard Format for Clinical Laboratory Test Results. Am J Clin Pathol 2022; 158:409-415. [PMID: 35713605 DOI: 10.1093/ajcp/aqac067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Accepted: 05/04/2022] [Indexed: 11/13/2022]
Abstract
OBJECTIVES Surprisingly, laboratory results, the principal output of clinical laboratories, are not standardized. Thus, laboratories frequently report results with identical meaning in different formats. For example, laboratories report a positive pregnancy test as "+," "P," or "Positive." To assess the feasibility of a widespread implementation of a result standard, we (1) developed a standard result format for common laboratory tests and (2) implemented a feedback system for clinical laboratories to view their unstandardized results. METHODS In the largest integrated health care system in America, 130 facilities had the opportunity to collaboratively develop the standard. For 15 weeks, clinical laboratories received a weekly report of their unstandardized results. At the study's conclusion, laboratories were compared with themselves and their peers by metrics that reflected their unstandardized results. RESULTS We reviewed 156 million test results and observed a 51% decline in the rate of unstandardized results. The number of facilities with fewer than 23 unstandardized results per 100,000 (Six Sigma σ > 5) increased by 58% (52 to 82 facilities; β = 1.79; P < .001). CONCLUSIONS This study demonstrated significant improvement in the standardization of clinical laboratory results in a relatively short time. The laboratory community should create and promulgate a standardized result format.
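The threshold of 23 unstandardized results per 100,000 quoted with "Six Sigma σ > 5" can be checked with a short sketch. It assumes the conventional Six Sigma definition, in which the sigma level is the standard normal quantile of the non-defect rate plus a 1.5σ long-term shift; the abstract does not state which convention the authors used, and the function name below is hypothetical.

```python
from statistics import NormalDist


def sigma_level(defect_rate, shift=1.5):
    """Six Sigma level for a defect rate given as a fraction in (0, 1).

    Assumes the conventional 1.5-sigma long-term shift; pass shift=0.0
    for the unshifted normal quantile.
    """
    if not 0.0 < defect_rate < 1.0:
        raise ValueError("defect_rate must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - defect_rate) + shift


# 23 unstandardized results per 100,000, the cutoff quoted in the abstract.
level_at_cutoff = sigma_level(23 / 100_000)
```

Under that convention the cutoff sits just above 5σ, so facilities with fewer unstandardized results than the cutoff exceed a sigma level of 5.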
Affiliation(s)
- Ronald George Hauser
  - Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
  - Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, USA
- Douglas B Quine
  - Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
  - Main Laboratory, Bridgeport Hospital, Bridgeport, CT, USA
- Mark Iscoe
  - Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
  - Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, USA
- Simone Arvisais-Anhalt
  - Departments of Hospital Medicine and Laboratory Medicine, University of California, San Francisco, CA, USA
4
Fong A, Iscoe M, Sinsky CA, Haimovich A, Williams B, O'Connell RT, Goldstein R, Melnick E. Exploration of Primary Care Physician Phenotypes for Electronic Health Record Use. JMIR Med Inform 2022; 10:e34954. [PMID: 35275070 PMCID: PMC9055474 DOI: 10.2196/34954] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/14/2021] [Revised: 02/03/2022] [Accepted: 03/11/2022] [Indexed: 12/02/2022]
Abstract
Background Electronic health records (EHRs) have become ubiquitous in US office-based physician practices. However, the different ways in which users engage with EHRs remain poorly characterized. Objective The aim of this study is to explore EHR use phenotypes among ambulatory care physicians. Methods In this retrospective cohort analysis, we applied affinity propagation, an unsupervised clustering machine learning technique, to identify EHR user types among primary care physicians. Results We identified 4 distinct phenotype clusters generalized across internal medicine, family medicine, and pediatrics specialties. Total EHR use varied for physicians in 2 clusters with above-average ratios of work outside of scheduled hours. This finding suggested that one cluster of physicians may have worked outside of scheduled hours out of necessity, whereas the other preferred ad hoc work hours. The two remaining clusters represented physicians with below-average EHR time and physicians who spend the largest proportion of their EHR time on documentation. Conclusions These findings demonstrate the utility of cluster analysis for exploring EHR use phenotypes and may offer opportunities for interventions to improve interface design to better support users’ needs.
Affiliation(s)
- Allan Fong
  - MedStar Health, 3007 Tilden St NW, Washington, US
5
Abstract
BACKGROUND Rising and burdensome health care costs have driven interest in the practice of high-value care (HVC) and have inspired calls for increased HVC training across all levels of medical education, including among undergraduate medical students. CONTEXT Classroom-based HVC curricula targeted to medical students have not been previously described in the medical literature. INNOVATION We developed and evaluated a workshop comprising a lecture, a small-group exercise and a group discussion to instruct medical students on interpreting cost-effectiveness analyses (CEA), applying CEA to patient care and discussing the cost of care with patients. From January 2014 to September 2015 the workshop was administered to five cohorts, 120 students in total, in the internal medicine clerkships at two US medical schools. Pre- and post-intervention confidence in various domains was assessed with a Likert-type scale ranging from 1 to 4. The overall response rate was 87.9 per cent. The proportion of students reporting high confidence scores (3 or 4) rose significantly (p < 0.01) in each domain: from 16.2 to 76.9 per cent for calculating an incremental cost-effectiveness ratio (ICER); from 16.0 to 79.6 per cent for interpreting quality-adjusted life-years (QALYs); from 8.7 to 71.3 per cent for using CEA in patient management; and from 15.3 to 71.4 per cent for discussing costs with patients. Students rated the overall quality of the course as 3.82 out of 5. IMPLICATIONS Our experience of developing, evaluating and refining an HVC course targeted at medical students taught us that such a course is needed, can be educational and can be well-received. Future research is needed to assess the effects of curricula on clinical practice.
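As background to the ICER calculation that the workshop taught, the standard definition is the difference in cost between two strategies divided by the difference in their effectiveness, here measured in QALYs. The sketch below is illustrative only: the function name and all figures are hypothetical and do not come from the workshop materials.

```python
def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    delta_cost = cost_new - cost_old
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when effectiveness is equal")
    return delta_cost / delta_qaly


# Hypothetical example: a new treatment costs 12,000 and yields 5.5 QALYs,
# against 10,000 and 5.0 QALYs for the existing treatment.
example_icer = icer(12_000, 10_000, 5.5, 5.0)  # 4000.0 per QALY gained
```

The resulting figure is usually compared with a willingness-to-pay threshold to judge whether the new strategy is considered cost-effective.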
Affiliation(s)
- Mark Iscoe
  - Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Robert Lord
  - Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- John Schulz
  - Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- David Lee
  - Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Danelle Cayea
  - Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Amit Pahwa
  - Johns Hopkins School of Medicine, Baltimore, Maryland, USA
6
Calderon Y, Cowan E, Schramm C, Stern S, Brusalis C, Iscoe M, Rahman S, Verma R, Leider J. HCV and HBV testing acceptability and knowledge among urban emergency department patients and pharmacy clients. Prev Med 2014; 61:29-33. [PMID: 24382298 DOI: 10.1016/j.ypmed.2013.12.026] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 06/06/2013] [Revised: 12/16/2013] [Accepted: 12/21/2013] [Indexed: 12/19/2022]
Abstract
OBJECTIVE Hepatitis C and hepatitis B are public health problems in the United States and remain largely undiagnosed. In response to the availability of rapid, point-of-care hepatitis tests, we assessed hepatitis knowledge and acceptability of hepatitis testing during an emergency department (ED) or pharmacy visit. METHODS From June 2010 to May 2011, an anonymous prospective survey was administered to a convenience sample of New York City ED patients and pharmacy clients. RESULTS The study population (N=2078) was 54% female, 36% Hispanic and 41% black. Mean age was 39 years (SD ± 15). The majority (72%; 1480/2060) of the participants responded that they would get tested if free testing were offered, and 67% (1272/1912) of those responded that they would test for hepatitis B/C in conjunction with HIV. Participants who had previously tested for hepatitis had higher mean knowledge scores than those who had never tested. Pharmacy clients, those of black race, and those with higher mean knowledge scores would be more willing to accept hepatitis B/C testing if offered. CONCLUSIONS Urban ED patients and pharmacy clients were receptive to hepatitis testing. Most individuals would elect to be tested for hepatitis with HIV, which raises the possibility of integrated testing.
Affiliation(s)
- Yvette Calderon
  - Department of Emergency Medicine, Jacobi Medical Center, Bronx, NY, USA; Department of Emergency Medicine, Albert Einstein College of Medicine, Bronx, NY, USA
- Ethan Cowan
  - Department of Emergency Medicine, Jacobi Medical Center, Bronx, NY, USA; Department of Emergency Medicine, Albert Einstein College of Medicine, Bronx, NY, USA
- Sam Stern
  - Department of Emergency Medicine, Jacobi Medical Center, Bronx, NY, USA
- Mark Iscoe
  - Department of Emergency Medicine, Jacobi Medical Center, Bronx, NY, USA
- Sara Rahman
  - Department of Emergency Medicine, Jacobi Medical Center, Bronx, NY, USA
- Rajesh Verma
  - Department of Emergency Medicine, Jacobi Medical Center, Bronx, NY, USA
- Jason Leider
  - Internal Medicine, Jacobi Medical Center, Bronx, NY, USA; Internal Medicine, Albert Einstein College of Medicine, Bronx, NY, USA
7