Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F, Afzal N, Liu S, Zeng Y, Mehrabi S, Sohn S, Liu H. Clinical information extraction applications: A literature review. J Biomed Inform 2018;77:34-49. [PMID: 29162496 PMCID: PMC5771858 DOI: 10.1016/j.jbi.2017.11.011] [Citation(s) in RCA: 316] [Impact Index Per Article: 52.7] [Received: 06/30/2017] [Revised: 11/01/2017] [Accepted: 11/17/2017] [Indexed: 12/24/2022]

Cited by Other Article(s)

Shankar SV, Dhingra LS, Aminorroaya A, Adejumo P, Nadkarni GN, Xu H, Brandt C, Oikonomou EK, Pedroso AF, Khera R. Automated Transformation of Unstructured Cardiovascular Diagnostic Reports into Structured Datasets Using Sequentially Deployed Large Language Models. medRxiv 2024:2024.10.08.24315035. [PMID: 39417094 PMCID: PMC11482995 DOI: 10.1101/2024.10.08.24315035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2024]

Abstract

Background

Rich data in cardiovascular diagnostic testing are often sequestered in unstructured reports, with the necessity of manual abstraction limiting their use in real-time applications in patient care and research.

Methods

We developed a two-step process that sequentially deploys generative and interpretative large language models (LLMs; Llama2 70b and Llama2 13b). Using a Llama2 70b model, we generated varying formats of transthoracic echocardiogram (TTE) reports from 3,000 real-world echo reports with paired structured elements, leveraging temporal changes in reporting formats to define the variations. Subsequently, we fine-tuned Llama2 13b using sequentially larger batches of generated echo reports as inputs, to extract data from free-text narratives across 18 clinically relevant echocardiographic fields. This was set up as a prompt-based supervised training task. We evaluated the fine-tuned Llama2 13b model, HeartDx-LM, on several distinct echocardiographic datasets: (i) reports across the different time periods and formats at Yale New Haven Health System (YNHHS), (ii) the Medical Information Mart for Intensive Care (MIMIC) III dataset, and (iii) the MIMIC IV dataset. We used the accuracy of extracted fields and Cohen's Kappa as the metrics and have publicly released the HeartDx-LM model.
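The two metrics named above, field-level accuracy and Cohen's Kappa, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the example labels are hypothetical, and the sketch assumes a single categorical field compared between model output and a reference annotation.

```python
from collections import Counter


def accuracy(pred, gold):
    """Fraction of positions where the extracted value equals the reference."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def cohens_kappa(pred, gold):
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two label sequences."""
    n = len(gold)
    p_o = accuracy(pred, gold)  # observed agreement
    pred_counts, gold_counts = Counter(pred), Counter(gold)
    # Chance agreement: sum over labels of the product of marginal frequencies.
    p_e = sum(pred_counts[k] * gold_counts[k] for k in gold_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both sequences use one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for one categorical field (not data from the study).
gold = ["normal", "mild", "normal", "severe", "mild", "normal"]
pred = ["normal", "mild", "mild", "severe", "mild", "normal"]

print(f"accuracy = {accuracy(pred, gold):.3f}")
print(f"kappa    = {cohens_kappa(pred, gold):.3f}")
```

Kappa discounts the agreement expected by chance, so it is lower than raw accuracy whenever a few labels dominate the field.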

Results

The HeartDx-LM model was trained on 2,000 randomly selected synthetic echo reports with varying formats and paired structured labels, with a wide range of clinical findings. We identified a lower threshold of 500 annotated reports required for fine-tuning Llama2 13b to achieve stable and consistent performance. At YNHHS, the HeartDx-LM model accurately extracted 69,144 out of 70,032 values (98.7%) across 18 clinical fields from unstructured reports in the test set from contemporary records where paired structured data were also available. In older echo reports where only unstructured reports were available, the model achieved 87.1% accuracy against expert annotations for the same 18 fields for a random sample of 100 reports. Similarly, in expert-annotated external validation sets from MIMIC-IV and MIMIC-III, HeartDx-LM correctly extracted 201 out of 220 available values (91.3%) and 615 out of 707 available values (87.9%), respectively, from 100 randomly chosen and expert annotated echo reports from each set.

Conclusion

We developed a novel method using paired large and moderate-sized LLMs to automate the extraction of unstructured echocardiographic reports into tabular datasets. Our approach represents a scalable strategy that transforms unstructured reports into computable elements that can be leveraged to improve cardiovascular care quality and enable research.

Gao Y, Liu M. Application of machine learning based genome sequence analysis in pathogen identification. Front Microbiol 2024;15:1474078. [PMID: 39417073 PMCID: PMC11480060 DOI: 10.3389/fmicb.2024.1474078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 09/23/2024] [Indexed: 10/19/2024]

Gundler C, Gottfried K, Wiederhold AJ, Ataian M, Wurlitzer M, Gewehr JE, Ückert F. Unlocking the Potential of Secondary Data for Public Health Research: Retrospective Study With a Novel Clinical Platform. Interact J Med Res 2024;13:e51563. [PMID: 39353185 PMCID: PMC11480676 DOI: 10.2196/51563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 12/01/2023] [Accepted: 07/17/2024] [Indexed: 10/04/2024]

Abstract

BACKGROUND

Clinical routine data derived from university hospitals hold immense value for health-related research on large cohorts. However, using secondary data for hypothesis testing necessitates adherence to scientific, legal (such as the General Data Protection Regulation, federal and state protection legislations), technical, and administrative requirements. This process is intricate, time-consuming, and susceptible to errors.

OBJECTIVE

This study aims to develop a platform that enables clinicians to use current real-world data for testing research and evaluate advantages and limitations at a large university medical center (542,944 patients in 2022).

METHODS

We identified requirements from clinical practitioners, conceptualized and implemented a platform based on the existing components, and assessed its applicability in clinical reality quantitatively and qualitatively.

RESULTS

The proposed platform was established at the University Medical Center Hamburg-Eppendorf and made 639 forms encompassing 10,629 data elements accessible to all resident scientists and clinicians. Every day, the number of patients rises, and parts of their electronic health records are made accessible through the platform. Qualitatively, we were able to conduct a retrospective analysis of Parkinson disease over 777 patients, where we provide additional evidence for a significantly higher proportion of action tremors in patients with rest tremors (340/777, 43.8%) compared with those without rest tremors (255/777, 32.8%), as determined by a chi-square test (P<.001). Quantitatively, our findings demonstrate increased user engagement within the last 90 days, underscoring clinicians' increasing adoption of the platform in their regular research activities. Notably, the platform facilitated the retrieval of clinical data from 600,000 patients, emphasizing its substantial added value.

CONCLUSIONS

This study demonstrates the feasibility of simplifying the use of clinical data to enhance exploration and sustainability in scientific research. The proposed platform emerges as a potential technological and legal framework for other medical centers, providing them with the means to unlock untapped potential within their routine data.

Sivarajkumar S, Tam TYC, Mohammad HA, Viggiano S, Oniani D, Visweswaran S, Wang Y. Extraction of sleep information from clinical notes of Alzheimer's disease patients using natural language processing. J Am Med Inform Assoc 2024;31:2217-2227. [PMID: 39001795 DOI: 10.1093/jamia/ocae177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 06/19/2024] [Accepted: 07/01/2024] [Indexed: 07/15/2024]

Abstract

OBJECTIVES

Alzheimer's disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that has been shown critical for optimal cognitive function in old age. However, there is a lack of research studying the association between sleep and AD incidence. A major bottleneck for conducting such research is that the traditional way to acquire sleep information is time-consuming, inefficient, non-scalable, and limited to patients' subjective experience. We aim to automate the extraction of specific sleep-related patterns, such as snoring, napping, poor sleep quality, daytime sleepiness, night wakings, other sleep problems, and sleep duration, from clinical notes of AD patients. These sleep patterns are hypothesized to play a role in the incidence of AD, providing insight into the relationship between sleep and AD onset and progression.

MATERIALS AND METHODS

A gold standard dataset was created by manually annotating 570 randomly sampled clinical note documents from adSLEEP, a corpus of 192 000 de-identified clinical notes of 7266 AD patients retrieved from the University of Pittsburgh Medical Center (UPMC). We developed a rule-based natural language processing (NLP) algorithm, machine learning models, and large language model (LLM)-based NLP algorithms to automate the extraction of sleep-related concepts, including snoring, napping, sleep problem, bad sleep quality, daytime sleepiness, night wakings, and sleep duration, from the gold standard dataset.
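A rule-based extractor of the kind described can be sketched with regular expressions. The abstract does not give the study's actual rules, so every pattern, concept key, and the example note below are hypothetical; the sketch also ignores negation ("denies snoring"), which a real clinical system would need to handle.

```python
import re

# Hypothetical trigger patterns; the study's real rules are not in the abstract.
SLEEP_PATTERNS = {
    "snoring": re.compile(r"\bsnor(?:e|es|ed|ing)\b", re.IGNORECASE),
    "napping": re.compile(r"\bnap(?:s|ped|ping)?\b", re.IGNORECASE),
    "daytime_sleepiness": re.compile(
        r"\b(?:daytime sleepiness|sleepy during the day)\b", re.IGNORECASE
    ),
    "night_wakings": re.compile(
        r"\b(?:wakes?|waking|woke) up (?:at|during the) night\b", re.IGNORECASE
    ),
}

# Sleep duration is captured as a number of hours, e.g. "sleeps 6 hours".
DURATION = re.compile(
    r"\bsleeps?\s+(?:about\s+)?(\d+(?:\.\d+)?)\s*(?:hours?|hrs?)\b", re.IGNORECASE
)


def extract_sleep_concepts(note):
    """Return a flag per concept plus the sleep duration in hours, if stated."""
    found = {name: bool(p.search(note)) for name, p in SLEEP_PATTERNS.items()}
    match = DURATION.search(note)
    found["sleep_duration_hours"] = float(match.group(1)) if match else None
    return found


note = "Patient snores loudly, naps after lunch, and sleeps about 6 hours per night."
result = extract_sleep_concepts(note)
print(result)
```

Rules like these tend to give high precision on the phrasings they cover, which is consistent with a rule-based system doing well when the target concepts are rarely documented and training examples are scarce.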

RESULTS

The annotated dataset of 482 patients comprised a predominantly White (89.2%), older adult population with an average age of 84.7 years, where females represented 64.1%, and a vast majority were non-Hispanic or Latino (94.6%). The rule-based NLP algorithm achieved the best F1 score across all sleep-related concepts. In terms of positive predictive value (PPV), the rule-based NLP algorithm achieved the highest PPV scores for daytime sleepiness (1.00) and sleep duration (1.00), while the machine learning models had the highest PPV for napping (0.95) and bad sleep quality (0.86), and LLAMA2 with finetuning had the highest PPV for night wakings (0.93) and sleep problem (0.89).

DISCUSSION

Although sleep information is infrequently documented in the clinical notes, the proposed rule-based NLP algorithm and LLM-based NLP algorithms still achieved promising results. In comparison, the machine learning-based approaches did not achieve good results, which is due to the small size of sleep information in the training data.

CONCLUSION

The results show that the rule-based NLP algorithm consistently achieved the best performance for all sleep concepts. This study focused on the clinical notes of patients with AD but could be extended to general sleep information extraction for other diseases.

Wiest IC, Ferber D, Zhu J, van Treeck M, Meyer SK, Juglan R, Carrero ZI, Paech D, Kleesiek J, Ebert MP, Truhn D, Kather JN. Privacy-preserving large language models for structured medical information retrieval. NPJ Digit Med 2024;7:257. [PMID: 39304709 DOI: 10.1038/s41746-024-01233-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 08/19/2024] [Indexed: 09/22/2024]

Affiliation(s)

Isabella Catharina Wiest Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
Dyke Ferber Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany
Jiefu Zhu Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
Marko van Treeck Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
Sonja K Meyer Department of Surgery I, University Hospital Würzburg, Würzburg, Germany
Radhika Juglan Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
Zunamys I Carrero Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
Daniel Paech German Cancer Research Center, Division of Radiology, Heidelberg, Germany University Hospital Bonn, Clinic for Neuroradiology, Bonn, Germany
Jens Kleesiek Institut für KI in der Medizin (IKIM), Universitätsmedizin Essen, Girardetstr. 2, 45131, Essen, Germany Cancer Research Center Cologne Essen (CCCE), West German Cancer Center Essen (WTZ), 45122, Essen, Germany TU Dortmund University, Department of Physics, Otto-Hahn-Straße 4, 44227, Dortmund, Germany
Matthias P Ebert Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany DKFZ Hector Cancer Institute at the University Medical Center, Mannheim, Germany Molecular Medicine Partnership Unit, EMBL, Heidelberg, Germany
Daniel Truhn Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
Jakob Nikolas Kather Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany. Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany. Department of Medicine I, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307, Dresden, Germany.

Mason AJC, Bhavsar V, Botelle R, Chandran D, Li L, Mascio A, Sanyal J, Kadra-Scalzo G, Roberts A, Williams M, Stewart R. Applying neural network algorithms to ascertain reported experiences of violence in routine mental healthcare records and distributions of reports by diagnosis. Front Psychiatry 2024;15:1181739. [PMID: 39319350 PMCID: PMC11420987 DOI: 10.3389/fpsyt.2024.1181739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 08/14/2024] [Indexed: 09/26/2024]

Abstract

Introduction

Experiences of violence are important risk factors for worse outcomes in people with mental health conditions; however, they are not routinely collected by mental health services, so their ascertainment depends on extraction from text fields with natural language processing (NLP) algorithms.

Methods

Applying previously developed neural network algorithms to routine mental healthcare records, we sought to describe the distribution of recorded violence victimisation by demographic and diagnostic characteristics. We ascertained recorded violence victimisation from the records of 60,021 patients receiving care from a large south London NHS mental healthcare provider during 2019. Descriptive and regression analyses were conducted to investigate variation by age, sex, ethnic group, and diagnostic category (ICD-10 F chapter sub-headings plus post-traumatic stress disorder (PTSD) as a specific condition).

Results

Patients with a mood disorder (adjusted odds ratio 1.63, 1.55-1.72), personality disorder (4.03, 3.65-4.45), schizophrenia spectrum disorder (1.84, 1.74-1.95) or PTSD (2.36, 2.08-2.69) had a significantly increased likelihood of victimisation compared to those with other mental health diagnoses. Additionally, patients from minority ethnic groups (1.10 (1.02-1.20) for Black, 1.40 (1.31-1.49) for Asian compared to White groups) had significantly higher likelihood of recorded violence victimisation. Males were significantly less likely to have reported recorded violence victimisation (0.44, 0.42-0.45) than females.

Discussion

We thus demonstrate the successful deployment of machine learning based NLP algorithms to ascertain important entities for outcome prediction in mental healthcare. The observed distributions highlight which sex, ethnicity and diagnostic groups had more records of violence victimisation. Further development of these algorithms could usefully capture broader experiences, such as differentiating more efficiently between witnessed, perpetrated and experienced violence and broader violence experiences like emotional abuse.

Wen A, Wang L, He H, Fu S, Liu S, Hanauer DA, Harris DR, Kavuluru R, Zhang R, Natarajan K, Pavinkurve NP, Hajagos J, Rajupet S, Lingam V, Saltz M, Elowsky C, Moffitt RA, Koraishy FM, Palchuk MB, Donovan J, Lingrey L, Stone-DerHagopian G, Miller RT, Williams AE, Leese PJ, Kovach PI, Pfaff ER, Zemmel M, Pates RD, Guthe N, Haendel MA, Chute CG, Liu H. A Case Demonstration of the Open Health Natural Language Processing Toolkit From the National COVID-19 Cohort Collaborative and the Researching COVID to Enhance Recovery Programs for a Natural Language Processing System for COVID-19 or Postacute Sequelae of SARS CoV-2 Infection: Algorithm Development and Validation. JMIR Med Inform 2024;12:e49997. [PMID: 39250782 PMCID: PMC11420592 DOI: 10.2196/49997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 12/11/2023] [Accepted: 03/01/2024] [Indexed: 09/11/2024]

Abstract

BACKGROUND

A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations are exacerbated when the task is emergent, as is the case currently for NLP extraction of signs and symptoms of COVID-19 and postacute sequelae of SARS-CoV-2 infection (PASC).

OBJECTIVE

This study aims to highlight the current limitations of existing NLP algorithm development approaches that are exacerbated by NLP tasks surrounding emergent clinical concepts and to illustrate our approach to addressing these issues through the use case of developing an NLP system for the signs and symptoms of COVID-19 and PASC.

METHODS

We used 2 preexisting studies on PASC as a baseline to determine a set of concepts that should be extracted by NLP. This concept list was then used in conjunction with the Unified Medical Language System to autonomously generate an expanded lexicon to weakly annotate a training set, which was then reviewed by a human expert to generate a fine-tuned NLP algorithm. The annotations from a fully human-annotated test set were then compared with NLP results from the fine-tuned algorithm. The NLP algorithm was then deployed to 10 additional sites that were also running our NLP infrastructure. Of these 10 sites, 5 were used to conduct a federated evaluation of the NLP algorithm.

RESULTS

An NLP algorithm consisting of 12,234 unique normalized text strings corresponding to 2366 unique concepts was developed to extract COVID-19 or PASC signs and symptoms. An unweighted mean dictionary coverage of 77.8% was found for the 5 sites.
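The "unweighted mean" in the result above treats every site equally, regardless of how much text each site contributed. The sketch below contrasts it with a pooled (size-weighted) figure. The per-site counts are hypothetical and chosen only for illustration, and the definition of coverage as matched mentions divided by total mentions is an assumption, since the abstract does not define it.

```python
from statistics import mean

# Hypothetical (matched_mentions, total_mentions) per site; not study data.
site_counts = {
    "site_a": (80, 100),
    "site_b": (30, 50),
    "site_c": (450, 500),
}

# Coverage per site, assumed here to be matched / total.
per_site = {site: hit / total for site, (hit, total) in site_counts.items()}

# Unweighted mean: each site counts once, however large it is.
unweighted = mean(per_site.values())

# Pooled coverage: sites contribute in proportion to their volume.
pooled = sum(h for h, _ in site_counts.values()) / sum(
    t for _, t in site_counts.values()
)

print(f"unweighted mean coverage = {unweighted:.3f}")
print(f"pooled coverage          = {pooled:.3f}")
```

In this toy example the largest site has the best coverage, so the pooled figure is higher than the unweighted mean; reporting the unweighted mean keeps one high-volume site from dominating a federated evaluation.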

CONCLUSIONS

The evolutionary and time-critical nature of the PASC NLP task significantly complicates existing approaches to NLP algorithm development. In this work, we present a hybrid approach using the Open Health Natural Language Processing Toolkit aimed at addressing these needs with a dictionary-based weak labeling step that minimizes the need for additional expert annotation while still preserving the fine-tuning capabilities of expert involvement.

Affiliation(s)

Andrew Wen Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States McWilliams School of Biomedical Informatics, University of Texas Health Sciences Center at Houston, Houston, TX, United States
Liwei Wang Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States McWilliams School of Biomedical Informatics, University of Texas Health Sciences Center at Houston, Houston, TX, United States
Huan He Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
Sunyang Fu Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States McWilliams School of Biomedical Informatics, University of Texas Health Sciences Center at Houston, Houston, TX, United States
Sijia Liu Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
David A Hanauer Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, United States
Daniel R Harris Institute for Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Kentucky, Lexington, KY, United States
Ramakanth Kavuluru Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, United States
Rui Zhang Division of Health Data Science, University of Minnesota Medical School, Minneapolis, MN, United States
Karthik Natarajan Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, United States
Nishanth P Pavinkurve Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, United States
Janos Hajagos Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
Sritha Rajupet Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
Veena Lingam Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
Mary Saltz Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
Corey Elowsky Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
Richard A Moffitt Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
Farrukh M Koraishy Division of Nephrology, Stony Brook Medicine, Stony Brook, NY, United States
Matvey B Palchuk TriNetX LLC, Cambridge, MA, United States
Jordan Donovan TriNetX LLC, Cambridge, MA, United States
Lora Lingrey TriNetX LLC, Cambridge, MA, United States
Garo Stone-DerHagopian TriNetX LLC, Cambridge, MA, United States
Robert T Miller Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, United States
Andrew E Williams Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, United States Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States
Peter J Leese North Carolina Translational and Clinical Sciences Institute, University of North Carolina School of Medicine, Chapel Hill, NC, United States
Paul I Kovach North Carolina Translational and Clinical Sciences Institute, University of North Carolina School of Medicine, Chapel Hill, NC, United States
Emily R Pfaff North Carolina Translational and Clinical Sciences Institute, University of North Carolina School of Medicine, Chapel Hill, NC, United States
Mikhail Zemmel University of Virginia, Charlottesville, VA, United States
Robert D Pates University of Virginia, Charlottesville, VA, United States
Nick Guthe Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
Melissa A Haendel University of Colorado Anschutz Medical Campus, Denver, CO, United States
Christopher G Chute Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, United States
Hongfang Liu Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States McWilliams School of Biomedical Informatics, University of Texas Health Sciences Center at Houston, Houston, TX, United States

Wei R. Automated Medical Records Review for Mild Cognitive Impairment and Dementia. Research Square 2024:rs.3.rs-5046441. [PMID: 39315274 PMCID: PMC11419186 DOI: 10.21203/rs.3.rs-5046441/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]

Abstract

Objectives

Unstructured and structured data in electronic health records (EHR) are a rich source of information for research and quality improvement studies. However, extracting accurate information from EHR is labor-intensive. Here we introduce an automated EHR phenotyping model to identify patients with Alzheimer's Disease, related dementias (ADRD), or mild cognitive impairment (MCI).

Methods

We assembled medical notes and associated International Classification of Diseases (ICD) codes and medication prescriptions from 3,626 outpatient adults from two hospitals seen between February 2015 and June 2022. Ground truth annotations regarding the presence vs. absence of a diagnosis of MCI or ADRD were determined through manual chart review. Indicators extracted from notes included the presence of keywords and phrases in unstructured clinical notes, prescriptions of medications associated with MCI/ADRD, and ICD codes associated with MCI/ADRD. We trained a regularized logistic regression model to predict the ground truth annotations. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), accuracy, specificity, precision/positive predictive value, recall/sensitivity, and F1 score (harmonic mean of precision and recall).
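The threshold-based metrics listed above all derive from the four cells of a binary confusion matrix. The following is a minimal sketch with hypothetical counts (not the study's data); AUROC and AUPRC are omitted because they require the model's continuous scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from binary confusion-matrix counts.

    Assumes every denominator is non-zero (at least one predicted positive,
    one actual positive, and one actual negative).
    """
    precision = tp / (tp + fp)    # positive predictive value
    recall = tp / (tp + fn)       # sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name:12s} {value:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it is pulled toward the lower of the two and can never exceed their arithmetic mean.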

Results

Thirty percent of patients in the cohort carried diagnoses of MCI/ADRD based on manual review. When evaluated on a held-out test set, the best model, using clinical notes, ICDs, and medications, achieved an AUROC of 0.98, an AUPRC of 0.98, an accuracy of 0.93, a sensitivity (recall) of 0.91, a specificity of 0.96, a precision of 0.96, and an F1 score of 0.93. The estimated overall accuracy for patients randomly selected from EHRs was 99.88%.

Conclusion

Automated EHR phenotyping accurately identifies patients with MCI/ADRD based on clinical notes, ICD codes, and medication records. This approach holds potential for large-scale MCI/ADRD research utilizing EHR databases.

Fu YV, Ramachandran GK, Halwani A, McInnes BT, Xia F, Lybarger K, Yetisgen M, Uzuner Ö. CACER: Clinical concept Annotations for Cancer Events and Relations. J Am Med Inform Assoc 2024:ocae231. [PMID: 39225779 DOI: 10.1093/jamia/ocae231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 08/08/2024] [Accepted: 08/12/2024] [Indexed: 09/04/2024]

Abstract

OBJECTIVE

Clinical notes contain unstructured representations of patient histories, including the relationships between medical problems and prescription drugs. To investigate the relationship between cancer drugs and their associated symptom burden, we extract structured, semantic representations of medical problem and drug information from the clinical narratives of oncology notes.

MATERIALS AND METHODS

We present Clinical concept Annotations for Cancer Events and Relations (CACER), a novel corpus with fine-grained annotations for over 48 000 medical problems and drug events and 10 000 drug-problem and problem-problem relations. Leveraging CACER, we develop and evaluate transformer-based information extraction models such as Bidirectional Encoder Representations from Transformers (BERT), Fine-tuned Language Net Text-To-Text Transfer Transformer (Flan-T5), Large Language Model Meta AI (Llama3), and Generative Pre-trained Transformers-4 (GPT-4) using fine-tuning and in-context learning (ICL).

RESULTS

In event extraction, the fine-tuned BERT and Llama3 models achieved the highest performance at 88.2-88.0 F1, which is comparable to the inter-annotator agreement (IAA) of 88.4 F1. In relation extraction, the fine-tuned BERT, Flan-T5, and Llama3 achieved the highest performance at 61.8-65.3 F1. GPT-4 with ICL achieved the worst performance across both tasks.

DISCUSSION

The fine-tuned models significantly outperformed GPT-4 in ICL, highlighting the importance of annotated training data and model optimization. Furthermore, the BERT models performed similarly to Llama3. For our task, large language models offer no performance advantage over the smaller BERT models.

CONCLUSIONS

We introduce CACER, a novel corpus with fine-grained annotations for medical problems, drugs, and their relationships in clinical narratives of oncology notes. State-of-the-art transformer models achieved performance comparable to IAA for several extraction tasks.

Cho H, Yoo S, Kim B, Jang S, Sunwoo L, Kim S, Lee D, Kim S, Nam S, Chung JH. Extracting lung cancer staging descriptors from pathology reports: A generative language model approach. J Biomed Inform 2024;157:104720. [PMID: 39233209 DOI: 10.1016/j.jbi.2024.104720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 08/04/2024] [Accepted: 08/31/2024] [Indexed: 09/06/2024]

Abstract

BACKGROUND

In oncology, electronic health records contain textual key information for the diagnosis, staging, and treatment planning of patients with cancer. However, text data processing requires a lot of time and effort, which limits the utilization of these data. Recent advances in natural language processing (NLP) technology, including large language models, can be applied to cancer research. Particularly, extracting the information required for the pathological stage from surgical pathology reports can be utilized to update cancer staging according to the latest cancer staging guidelines.

OBJECTIVES

This study has two main objectives. The first objective is to evaluate the performance of extracting information from text-based surgical pathology reports and determining pathological stages based on the extracted information using fine-tuned generative language models (GLMs) for patients with lung cancer. The second objective is to determine the feasibility of utilizing relatively small GLMs for information extraction in a resource-constrained computing environment.

METHODS

Lung cancer surgical pathology reports were collected from the Common Data Model database of Seoul National University Bundang Hospital (SNUBH), a tertiary hospital in Korea. We selected 42 descriptors necessary for tumor-node (TN) classification based on these reports and created a gold standard with validation by two clinical experts. The pathology reports and gold standard were used to generate prompt-response pairs for training and evaluating GLMs which then were used to extract information required for staging from pathology reports.

RESULTS

We evaluated the information extraction performance of six trained models as well as their performance in TN classification using the extracted information. The Deductive Mistral-7B model, which was pre-trained with the deductive dataset, showed the best performance overall, with an exact match ratio of 92.24% in the information extraction problem and an accuracy of 0.9876 (predicting T and N classification concurrently) in classification.

CONCLUSION

This study demonstrated that training GLMs with deductive datasets can improve information extraction performance, and that GLMs with a relatively small number of parameters (approximately seven billion) can achieve high performance on this problem. The proposed GLM-based information extraction method is expected to be useful in clinical decision-making support, lung cancer staging, and research.

Hu Y, Chen Q, Du J, Peng X, Keloth VK, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, Roberts K, Xu H. Improving large language models for clinical named entity recognition via prompt engineering. J Am Med Inform Assoc 2024;31:1812-1820. [PMID: 38281112 PMCID: PMC11339492 DOI: 10.1093/jamia/ocad259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 12/15/2023] [Accepted: 12/26/2023] [Indexed: 01/29/2024]

Abstract

IMPORTANCE

The study highlights the potential of large language models, specifically GPT-3.5 and GPT-4, in processing complex clinical data and extracting meaningful information with minimal training data. By developing and refining prompt-based strategies, we can significantly enhance the models' performance, making them viable tools for clinical NER tasks and possibly reducing the reliance on extensive annotated datasets.

OBJECTIVES

This study quantifies the capabilities of GPT-3.5 and GPT-4 for clinical named entity recognition (NER) tasks and proposes task-specific prompts to improve their performance.

MATERIALS AND METHODS

We evaluated these models on 2 clinical NER tasks: (1) to extract medical problems, treatments, and tests from clinical notes in the MTSamples corpus, following the 2010 i2b2 concept extraction shared task, and (2) to identify nervous system disorder-related adverse events from safety reports in the vaccine adverse event reporting system (VAERS). To improve the GPT models' performance, we developed a clinical task-specific prompt framework that includes (1) baseline prompts with task description and format specification, (2) annotation guideline-based prompts, (3) error analysis-based instructions, and (4) annotated samples for few-shot learning. We assessed each prompt's effectiveness and compared the models to BioClinicalBERT.

RESULTS

Using baseline prompts, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.634 and 0.804 for MTSamples and 0.301 and 0.593 for VAERS, respectively. Additional prompt components consistently improved model performance. When all 4 components were used, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.794 and 0.861 for MTSamples and 0.676 and 0.736 for VAERS, respectively, demonstrating the effectiveness of our prompt framework. Although these results trail BioClinicalBERT (F1 of 0.901 for the MTSamples dataset and 0.802 for VAERS), they are very promising considering that few training samples are needed.

DISCUSSION

The study's findings suggest a promising direction in leveraging LLMs for clinical NER tasks. However, while the performance of GPT models improved with task-specific prompts, there's a need for further development and refinement. LLMs like GPT-4 show potential in achieving close performance to state-of-the-art models like BioClinicalBERT, but they still require careful prompt engineering and understanding of task-specific knowledge. The study also underscores the importance of evaluation schemas that accurately reflect the capabilities and performance of LLMs in clinical settings.

CONCLUSION

While direct application of GPT models to clinical NER tasks falls short of optimal performance, our task-specific prompt framework, incorporating medical knowledge and training samples, significantly enhances GPT models' feasibility for potential clinical applications.
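
The relaxed F1 scores reported above can be illustrated with a minimal sketch. The abstract does not define "relaxed" matching, so this assumes one common convention: a predicted entity counts as correct if it overlaps a gold entity of the same type. The `Span` layout and function names are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

# Hypothetical span layout: (start_offset, end_offset_exclusive, entity_type)
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True if two spans share an entity type and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def relaxed_f1(gold: List[Span], pred: List[Span]) -> float:
    """F1 under overlap-based (relaxed) matching; an assumed definition."""
    # Precision: share of predicted spans that overlap some gold span.
    matched_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
    # Recall: share of gold spans that overlap some predicted span.
    matched_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 5, "problem"), (10, 15, "test")]
    pred = [(2, 6, "problem"), (20, 25, "treatment")]
    print(relaxed_f1(gold, pred))
```

Under exact matching the first prediction would be scored as an error because its boundaries differ from the gold span; relaxed matching credits it, which is why relaxed scores are typically at least as high as exact ones.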

Fang Y, Ryan P, Weng C. Knowledge-guided generative artificial intelligence for automated taxonomy learning from drug labels. J Am Med Inform Assoc 2024;31:2065-2075. [PMID: 38787964 PMCID: PMC11339527 DOI: 10.1093/jamia/ocae105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/26/2024]

Zhou H, Li M, Xiao Y, Yang H, Zhang R. LEAP: LLM instruction-example adaptive prompting framework for biomedical relation extraction. J Am Med Inform Assoc 2024;31:2010-2018. [PMID: 38904416 PMCID: PMC11339510 DOI: 10.1093/jamia/ocae147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 05/26/2024] [Accepted: 06/03/2024] [Indexed: 06/22/2024]

Abstract

OBJECTIVE

To investigate the use of demonstrations in large language models (LLMs) for biomedical relation extraction. This study introduces a framework comprising three types of adaptive tuning methods to assess their impacts and effectiveness.

MATERIALS AND METHODS

Our study was conducted in two phases. Initially, we analyzed a range of demonstration components vital for LLMs' biomedical data capabilities, including task descriptions and examples, experimenting with various combinations. Subsequently, we introduced the LLM instruction-example adaptive prompting (LEAP) framework, including instruction adaptive tuning, example adaptive tuning, and instruction-example adaptive tuning methods. This framework aims to systematically investigate both adaptive task descriptions and adaptive examples within the demonstration. We assessed the performance of the LEAP framework on the DDI, ChemProt, and BioRED datasets, employing LLMs such as Llama2-7b, Llama2-13b, and MedLLaMA_13B.

RESULTS

Our findings indicated that Instruction + Options + Example and its expanded form substantially improved F1 scores over the standard Instruction + Options mode for zero-shot LLMs. The LEAP framework, particularly through its example adaptive prompting, demonstrated superior performance over conventional instruction tuning across all models. Notably, the MedLLaMA_13B model achieved an exceptional F1 score of 95.13 on the ChemProt dataset using this method. Significant improvements were also observed in the DDI 2013 and BioRED datasets, confirming the method's robustness in sophisticated data extraction scenarios.

CONCLUSION

The LEAP framework offers a compelling strategy for enhancing LLM training strategies, steering away from extensive fine-tuning towards more dynamic and contextually enriched prompting methodologies, as showcased in biomedical relation extraction.

Tran H, Yang Z, Yao Z, Yu H. BioInstruct: instruction tuning of large language models for biomedical natural language processing. J Am Med Inform Assoc 2024;31:1821-1832. [PMID: 38833265 PMCID: PMC11339494 DOI: 10.1093/jamia/ocae122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 05/10/2023] [Accepted: 05/14/2024] [Indexed: 06/06/2024]

Albashayreh A, Bandyopadhyay A, Zeinali N, Zhang M, Fan W, Gilbertson White S. Natural Language Processing Accurately Differentiates Cancer Symptom Information in Electronic Health Record Narratives. JCO Clin Cancer Inform 2024;8:e2300235. [PMID: 39116379 DOI: 10.1200/cci.23.00235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 04/29/2024] [Accepted: 05/30/2024] [Indexed: 08/10/2024]

Abstract

PURPOSE

Identifying cancer symptoms in electronic health record (EHR) narratives is feasible with natural language processing (NLP). However, more efficient NLP systems are needed to detect various symptoms and distinguish observed symptoms from negated symptoms and medication-related side effects. We evaluated the accuracy of NLP in (1) detecting 14 symptom groups (ie, pain, fatigue, swelling, depressed mood, anxiety, nausea/vomiting, pruritus, headache, shortness of breath, constipation, numbness/tingling, decreased appetite, impaired memory, disturbed sleep) and (2) distinguishing observed symptoms in EHR narratives among patients with cancer.

METHODS

We extracted 902,508 notes for 11,784 unique patients diagnosed with cancer and developed a gold standard corpus of 1,112 notes labeled for presence or absence of 14 symptom groups. We trained an embeddings-augmented NLP system integrating human and machine intelligence and conventional machine learning algorithms. NLP metrics were calculated on a gold standard corpus subset for testing.

RESULTS

The interannotator agreement for labeling the gold standard corpus was excellent at 92%. The embeddings-augmented NLP model achieved the best performance (F1 score = 0.877). The highest NLP accuracy was observed in pruritus (F1 score = 0.937) while the lowest accuracy was in swelling (F1 score = 0.787). After classifying the entire data set with embeddings-augmented NLP, we found that 41% of the notes included symptom documentation. Pain was the most documented symptom (29% of all notes) while impaired memory was the least documented (0.7% of all notes).

CONCLUSION

We illustrated the feasibility of detecting 14 symptom groups in EHR narratives and showed that an embeddings-augmented NLP system outperforms conventional machine learning algorithms in detecting symptom information and differentiating observed symptoms from negated symptoms and medication-related side effects.

Munzone E, Marra A, Comotto F, Guercio L, Sangalli CA, Lo Cascio M, Pagan E, Sangalli D, Bigoni I, Porta FM, D'Ercole M, Ritorti F, Bagnardi V, Fusco N, Curigliano G. Development and Validation of a Natural Language Processing Algorithm for Extracting Clinical and Pathological Features of Breast Cancer From Pathology Reports. JCO Clin Cancer Inform 2024;8:e2400034. [PMID: 39137368 DOI: 10.1200/cci.24.00034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Revised: 04/26/2024] [Accepted: 06/25/2024] [Indexed: 08/15/2024]

Abstract

PURPOSE

Electronic health records (EHRs) are valuable information repositories that offer insights for enhancing clinical research on breast cancer (BC) using real-world data. The objective of this study was to develop a natural language processing (NLP) model specifically designed to extract structured data from BC pathology reports written in natural language.

METHODS

During the initial phase, the algorithm's development cohort comprised 193 pathology reports from 116 patients with BC from 2012 to 2016. A rule-based NLP algorithm was applied to extract 26 variables for analysis and was compared with the manual extraction of data performed by both a data entry specialist and an oncologist. Following the first approach, the data set was expanded to include 513 reports, and a Named Entity Recognition (NER)-NLP model was trained and evaluated using K-fold cross-validation.

RESULTS

The first approach led to a concordance analysis, which revealed an 82.9% agreement between the algorithm and the oncologist, whereas the concordance between the data entry specialist and the oncologist was 90.8%. The second approach defined an NER-NLP model, whose accuracy showed remarkable potential (97.8%). Notably, the model performed especially well for parameters such as estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and Ki-67 (F1-score 1.0).

CONCLUSION

The present study aligns with the rapidly evolving field of artificial intelligence (AI) applications in oncology, seeking to expedite the development of complex cancer databases and registries. The results of the model are currently undergoing postprocessing procedures to organize the data into tabular structures, facilitating their utilization in real-world clinical and research endeavors.

Alkhalaf M, Yu P, Yin M, Deng C. Applying generative AI with retrieval augmented generation to summarize and extract key clinical information from electronic health records. J Biomed Inform 2024;156:104662. [PMID: 38880236 DOI: 10.1016/j.jbi.2024.104662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 05/25/2024] [Accepted: 05/28/2024] [Indexed: 06/18/2024]

Abstract

BACKGROUND

Malnutrition is a prevalent issue in residential aged care facilities (RACFs), leading to adverse health outcomes. The ability to efficiently extract key clinical information from a large volume of data in electronic health records (EHR) can improve understanding of the extent of the problem and support the development of effective interventions. This research aimed to test the efficacy of zero-shot prompt engineering applied to generative artificial intelligence (AI) models, on their own and in combination with retrieval augmented generation (RAG), for automating the tasks of summarizing both structured and unstructured data in EHR and extracting important malnutrition information.

METHODOLOGY

We utilized the Llama 2 13B model with zero-shot prompting. The dataset comprises unstructured and structured EHRs related to malnutrition management in 40 Australian RACFs. We first applied zero-shot learning to the model alone, then combined it with RAG to accomplish two tasks: generate structured summaries about the nutritional status of a client and extract key information about malnutrition risk factors. We utilized 25 notes in the first task and 1,399 in the second task. We evaluated the model's output for each task manually against a gold standard dataset.

RESULT

The evaluation outcomes indicated that zero-shot learning applied to a generative AI model is highly effective in summarizing and extracting information about the nutritional status of RACFs' clients. The generated summaries provided a concise and accurate representation of the original data, with an overall accuracy of 93.25%. The addition of RAG improved the summarization process, leading to a 6-percentage-point increase and achieving an accuracy of 99.25%. The model also proved its capability in extracting risk factors with an accuracy of 90%. However, adding RAG did not further improve accuracy in this task. Overall, the model showed robust performance when information was explicitly stated in the notes; however, it could encounter hallucination limitations, particularly when details were not explicitly provided.

CONCLUSION

This study demonstrates the high performance and limitations of applying zero-shot learning to generative AI models for automatically generating structured summaries of EHR data and extracting key clinical information. The inclusion of the RAG approach improved the model performance and mitigated the hallucination problem.

Kempf E, Priou S, Cohen A, Redjdal A, Guével E, Tannier X. The More, the Better? Modalities of Metastatic Status Extraction on Free Medical Reports Based on Natural Language Processing. JCO Clin Cancer Inform 2024;8:e2400026. [PMID: 39186702 DOI: 10.1200/cci.24.00026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 04/26/2024] [Accepted: 05/10/2024] [Indexed: 08/28/2024]

Affiliation(s)

Emmanuelle Kempf, MD, PhD, Sorbonne Université, Inserm, Université Sorbonne Paris-Nord, LIMICS, Paris, France, Department of Medical Oncology, Assistance Publique Hôpitaux de Paris, Henri Mondor Teaching Hospital, Paris, France
Sonia Priou, MSc, Université Paris Saclay, CentraleSupélec, Laboratoire Génie Industriel, Gif-sur-Yvette, France
Ariel Cohen, MSc, IT Department, Innovation and Data, Assistance Publique Hôpitaux de Paris, Paris, France
Akram Redjdal, PhD, Sorbonne Université, Inserm, Université Sorbonne Paris-Nord, LIMICS, Paris, France
Etienne Guével, MSc, IT Department, Innovation and Data, Assistance Publique Hôpitaux de Paris, Paris, France
Xavier Tannier, PhD, Sorbonne Université, Inserm, Université Sorbonne Paris-Nord, LIMICS, Paris, France

Burford KG, Itzkowitz NG, Ortega AG, Teitler JO, Rundle AG. Use of Generative AI to Identify Helmet Status Among Patients With Micromobility-Related Injuries From Unstructured Clinical Notes. JAMA Netw Open 2024;7:e2425981. [PMID: 39136946 PMCID: PMC11322845 DOI: 10.1001/jamanetworkopen.2024.25981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 05/15/2024] [Indexed: 08/16/2024]

Abstract

Importance

Large language models (LLMs) have potential to increase the efficiency of information extraction from unstructured clinical notes in electronic medical records.

Objective

To assess the utility and reliability of an LLM, ChatGPT-4 (OpenAI), to analyze clinical narratives and identify helmet use status of patients injured in micromobility-related accidents.

Design, Setting, and Participants

This cross-sectional study used publicly available, deidentified 2019 to 2022 data from the US Consumer Product Safety Commission's National Electronic Injury Surveillance System, a nationally representative stratified probability sample of 96 hospitals in the US. Unweighted estimates of e-bike, bicycle, hoverboard, and powered scooter-related injuries that resulted in an emergency department visit were used. Statistical analysis was performed from November 2023 to April 2024.

Main Outcomes and Measures

Patient helmet status (wearing vs not wearing vs unknown) was extracted from clinical narratives using (1) a text string search using researcher-generated text strings and (2) the LLM by prompting the system with low-, intermediate-, and high-detail prompts. The level of agreement between the 2 approaches across all 3 prompts was analyzed using Cohen κ test statistics. Fleiss κ was calculated to measure the test-retest reliability of the high-detail prompt across 5 new chat sessions and days. Performance statistics were calculated by comparing results from the high-detail prompt to classifications of helmet status generated by researchers reading the clinical notes (ie, a criterion standard review).

Results

Among 54 569 clinical notes, moderate (Cohen κ = 0.74 [95% CI, 0.73-0.75]) and weak (Cohen κ = 0.53 [95% CI, 0.52-0.54]) agreement were found between the text string-search approach and the LLM for the low- and intermediate-detail prompts, respectively. The high-detail prompt had almost perfect agreement (κ = 1.00 [95% CI, 1.00-1.00]) but required the greatest amount of time to complete. The LLM did not perfectly replicate its analyses across new sessions and days (Fleiss κ = 0.91 across 5 trials; P < .001). The LLM often hallucinated and was consistent in replicating its hallucinations. It also showed high validity compared with the criterion standard (n = 400; κ = 0.98 [95% CI, 0.96-1.00]).

Conclusions and Relevance

This study's findings suggest that although there are efficiency gains for using the LLM to extract information from clinical notes, the inadequate reliability compared with a text string-search approach, hallucinations, and inconsistent performance significantly hinder the potential of the currently available LLM.
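
The agreement statistic used above, Cohen κ, compares the observed agreement between two raters with the agreement expected by chance. The following is a minimal sketch of that calculation, assuming two equal-length lists of categorical labels; the function name and the example labels are illustrative and do not come from the study's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # and this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Illustrative helmet-status labels from two hypothetical approaches.
    string_search = ["wearing", "not wearing", "unknown", "wearing"]
    llm_output = ["wearing", "not wearing", "unknown", "not wearing"]
    print(round(cohen_kappa(string_search, llm_output), 3))
```

Because κ discounts chance agreement, two approaches that agree on 75% of notes can still score well below 0.75, as in the small example here.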

Guo Y, Huang C, Sheng Y, Zhang W, Ye X, Lian H, Xu J, Chen Y. Improve the efficiency and accuracy of ophthalmologists' clinical decision-making based on AI technology. BMC Med Inform Decis Mak 2024;24:192. [PMID: 38982465 PMCID: PMC11234671 DOI: 10.1186/s12911-024-02587-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 06/24/2024] [Indexed: 07/11/2024]

Abstract

BACKGROUND

As global aging intensifies, the prevalence of ocular fundus diseases continues to rise. In China, the strained doctor-patient ratio poses numerous challenges for the early diagnosis and treatment of ocular fundus diseases. To reduce the high risk of missed or misdiagnosed cases, avoid irreversible visual impairment for patients, and ensure good visual prognosis for patients with ocular fundus diseases, it is particularly important to enhance the growth and diagnostic capabilities of junior doctors. This study aims to leverage the value of electronic medical record data to develop a diagnostic intelligent decision support platform. This platform aims to assist junior doctors in diagnosing ocular fundus diseases quickly and accurately, expedite their professional growth, and prevent delays in patient treatment. An empirical evaluation will assess the platform's effectiveness in enhancing doctors' diagnostic efficiency and accuracy.

METHODS

In this study, eight Chinese Named Entity Recognition (NER) models were compared, and the SoftLexicon-Glove-Word2vec model, achieving a high F1 score of 93.02%, was selected as the optimal recognition tool. This model was then used to extract key information from electronic medical records (EMRs) and generate feature variables based on diagnostic rule templates. Subsequently, an XGBoost algorithm was employed to construct an intelligent decision support platform for diagnosing ocular fundus diseases. The effectiveness of the platform in improving diagnostic efficiency and accuracy was evaluated through a controlled experiment comparing experienced and junior doctors.

RESULTS

The use of the diagnostic intelligent decision support platform resulted in significant improvements in both diagnostic efficiency and accuracy for both experienced and junior doctors (P < 0.05). Notably, the gap in diagnostic speed and precision between junior doctors and experienced doctors narrowed considerably when the platform was used. Although the platform also provided some benefits to experienced doctors, the improvement was less pronounced compared to junior doctors.

CONCLUSION

The diagnostic intelligent decision support platform established in this study, based on the XGBoost algorithm and NER, effectively enhances the diagnostic efficiency and accuracy of junior doctors in ocular fundus diseases. This has significant implications for optimizing clinical diagnosis and treatment.

Dasaro CR, Sabra A, Jeon Y, Williams TA, Sloan NL, Todd AC, Teitelbaum SL. A comparison of two user-friendly methods to identify and support correction of misspelled medications. Prev Med Rep 2024;43:102765. [PMID: 38798907 PMCID: PMC11127154 DOI: 10.1016/j.pmedr.2024.102765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 04/29/2024] [Accepted: 05/14/2024] [Indexed: 05/29/2024]

Abstract

Objective

To identify and support correction of misspelled medication names recorded as free text, we compared the relative effectiveness of two user-friendly methods, used without reliance on clinical knowledge.

Methods

Leveraging the SAS® COMPGED function, fuzzy string search programs examined 1.8 million medication records from 183,600 World Trade Center General Responder Cohort monitoring visits conducted in New York and New Jersey between 7/16/2002 and 3/31/2021, producing replicable generalized edit distance scores between the reported and correct spelling. Scores < 120 were selected as optimal and compared to Stedman's 2020 Plus Medical/Pharmaceutical Spell Checker first suggested word, used as the comparative standard because it employs both spelling and phonetic similarities to suggest matching words. We coded each method's results as identifying or not identifying the medications within each visit.

Results

Most types of medications (94.4 % anxiety, 98.4 % asthma and 94.6 % ulcer/gastroesophageal reflux disease) were correctly spelled. Cross tabulations assessed the agreement (anxiety 99.9 %, asthma 99.6 % and 98.4 % ulcer/gastroesophageal reflux disease), false positive (respectively 0.02 %, 0.03 % and 2.0 %) and false negative (respectively 1.9 %, 0.5 % and 1.0 %) values. Scores < 120 occasionally correctly identified medications missed by the spell checker. We observed no difference in medication misspellings across socio-economically and culturally diverse patient characteristics.

Conclusions

Both methods efficiently identified most misspelled medications, greatly minimizing the review and rectification needed. The fuzzy method is more universally applicable for condition-specific medication identification, but requires more programming skills. The spell checker is inexpensive, but benefits from modest programming skills and is only available in some languages.
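
The fuzzy string search described above can be sketched in a few lines. This is an illustration only: it substitutes plain Levenshtein distance for the SAS COMPGED generalized edit distance, so the study's cutoff of 120 does not carry over, and `max_distance`, the function names, and the reference medication list are all hypothetical.

```python
from typing import Optional, Sequence


def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(t) + 1))
    for i, char_s in enumerate(s, start=1):
        current = [i]
        for j, char_t in enumerate(t, start=1):
            substitution_cost = 0 if char_s == char_t else 1
            current.append(min(
                previous[j] + 1,                      # delete from s
                current[j - 1] + 1,                   # insert into s
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def closest_medication(reported: str,
                       reference_names: Sequence[str],
                       max_distance: int = 2) -> Optional[str]:
    """Return the nearest reference name within max_distance edits, else None."""
    if not reference_names:
        return None
    cleaned = reported.strip().lower()
    best = min(reference_names, key=lambda name: levenshtein(cleaned, name.lower()))
    return best if levenshtein(cleaned, best.lower()) <= max_distance else None


if __name__ == "__main__":
    reference = ["albuterol", "omeprazole", "sertraline"]  # illustrative list
    print(closest_medication("albuterlo", reference))
    print(closest_medication("xyz", reference))
```

A fixed distance cutoff trades false positives against false negatives in the same way the reported score threshold does: loosening it recovers more misspellings but risks matching unrelated words.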

Darer JD, Pesa J, Choudhry Z, Batista AE, Parab P, Yang X, Govindarajan R. Characterizing Myasthenia Gravis Symptoms, Exacerbations, and Crises From Neurologist's Clinical Notes Using Natural Language Processing. Cureus 2024;16:e65792. [PMID: 39219871 PMCID: PMC11361825 DOI: 10.7759/cureus.65792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/29/2024] [Indexed: 09/04/2024]

Abstract

Background

Myasthenia gravis (MG) is a rare, autoantibody neuromuscular disorder characterized by fatigable weakness. Real-world evidence based on administrative and structured datasets regarding MG may miss important details related to the clinical encounter. Examination of free-text clinical progress notes has the potential to illuminate aspects of MG care.

Objective

The primary objective was to examine and characterize neurologist progress notes in the care of individuals with MG regarding the prevalence of documentation of clinical subtypes, antibody status, symptomatology, and MG deteriorations, including exacerbations and crises. The secondary objectives were to categorize MG deteriorations into practical, objective states as well as examine potential sources of clinical inertia in MG care.

Methods

We performed a retrospective, cross-sectional analysis of de-identified neurologist clinical notes from 2017 to 2022. A qualitative analysis of physician descriptions of MG deteriorations and a discussion of risks in MG care (risk for adverse effects, risk for clinical decompensation, etc.) was performed.

Results

Of the 3,085 individuals with MG, clinical subtypes and antibody status identified included gMG (n = 400; 13.0%), ocular MG (n = 253; 8.2%), MG unspecified (2,432; 78.8%), seropositivity for acetylcholine receptor antibody (n = 441; 14.3%), and MuSK antibody (n = 29; 0.9%). The most common gMG manifestations were dysphagia (n = 712; 23.0%), dyspnea (n = 626; 20.3%), and dysarthria (n = 514; 16.7%). In MG crisis patients, documentation of difficulties with MG standard therapies was common (n = 62; 45.2%). The qualitative analysis of MG deterioration types includes symptom fluctuation, symptom worsening with treatment intensification, MG deterioration with rescue therapy, and MG crisis. Qualitative analysis of MG-related risks included the toxicity of new therapies and concern for worsening MG because of changing therapies.

Conclusions

This study of neurologist progress notes demonstrates the potential for real-world evidence generation in the care of individuals with MG. MG patients suffer fluctuating symptomatology and a spectrum of clinical deteriorations. Adverse effects of MG therapies are common, highlighting the need for effective, less toxic treatments.

Csore J, Roy TL, Wright G, Karmonik C. Unsupervised classification of multi-contrast magnetic resonance histology of peripheral arterial disease lesions using a convolutional variational autoencoder with a Gaussian mixture model in latent space: A technical feasibility study. Comput Med Imaging Graph 2024;115:102372. [PMID: 38581959 DOI: 10.1016/j.compmedimag.2024.102372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 02/09/2024] [Accepted: 03/18/2024] [Indexed: 04/08/2024]

Fu S, Wang L, He H, Wen A, Zong N, Kumari A, Liu F, Zhou S, Zhang R, Li C, Wang Y, St Sauver J, Liu H, Sohn S. A taxonomy for advancing systematic error analysis in multi-site electronic health record-based clinical concept extraction. J Am Med Inform Assoc 2024;31:1493-1502. [PMID: 38742455 PMCID: PMC11187420 DOI: 10.1093/jamia/ocae101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 03/26/2024] [Accepted: 04/19/2024] [Indexed: 05/16/2024]

Abstract

BACKGROUND

Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, such as contextual and linguistic factors contributing to their occurrence, and the identification of underlying causes to refine the NLP model and improve its performance. Conducting error analysis can be complex, requiring a combination of NLP expertise and domain-specific knowledge. Due to the high heterogeneity of electronic health record (EHR) settings across different institutions, challenges may arise when attempting to standardize and reproduce the error analysis process.

OBJECTIVES

This study aims to facilitate a collaborative effort to establish common definitions and taxonomies for capturing diverse error types, fostering community consensus on error analysis for clinical concept extraction tasks.

MATERIALS AND METHODS

We iteratively developed and evaluated an error taxonomy based on existing literature, standards, real-world data, multisite case evaluations, and community feedback. The finalized taxonomy was released in both .dtd and .owl formats at the Open Health Natural Language Processing Consortium. The taxonomy is compatible with several different open-source annotation tools, including MAE, Brat, and MedTator.

RESULTS

The resulting error taxonomy comprises 43 distinct error classes, organized into 6 error dimensions and 4 properties, including model type (symbolic and statistical machine learning), evaluation subject (model and human), evaluation level (patient, document, sentence, and concept), and annotation examples. Internal and external evaluations revealed strong variations in error types across methodological approaches, tasks, and EHR settings. Key points emerged from community feedback, including the need to enhance the clarity, generalizability, and usability of the taxonomy, along with dissemination strategies.

CONCLUSION

The proposed taxonomy can facilitate the acceleration and standardization of the error analysis process in multi-site settings, thus improving the provenance, interpretability, and portability of NLP models. Future researchers could explore the potential direction of developing automated or semi-automated methods to assist in the classification and standardization of error analysis.

Affiliation(s)

Sunyang Fu: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States; Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Liwei Wang: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States; Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Huan He: Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
Andrew Wen: Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Nansu Zong: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
Anamika Kumari: Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Boston, MA 01655, United States
Feifan Liu: Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Boston, MA 01655, United States
Sicheng Zhou: Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455, United States
Rui Zhang: Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455, United States
Chenyu Li: Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
Yanshan Wang: Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
Jennifer St Sauver: Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55902, United States
Hongfang Liu: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States; Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Sunghwan Sohn: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States

Zhao T, He ZA, Shao J, Regmi A, Shi L, Cai Y. Decoding hotline's information with text-mining: A protocol for improving tobacco control in Shanghai. Tob Induc Dis 2024;22:TID-22-107. [PMID: 38887599 PMCID: PMC11181012 DOI: 10.18332/tid/187864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/13/2024] [Accepted: 04/23/2024] [Indexed: 06/20/2024]

Jakovljevic M, Timofeyev Y, Zhuravleva T. The Impact of Pandemic-Driven Care Redesign on Hospital Efficiency. Risk Manag Healthc Policy 2024;17:1477-1491. [PMID: 38855044 PMCID: PMC11162215 DOI: 10.2147/rmhp.s465167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Accepted: 05/26/2024] [Indexed: 06/11/2024]

Zuo X, Kumar A, Shen S, Li J, Cong G, Jin E, Chen Q, Warner JL, Yang P, Xu H. Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition. JCO Clin Cancer Inform 2024;8:e2300166. [PMID: 38885475 DOI: 10.1200/cci.23.00166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 02/27/2024] [Accepted: 03/11/2024] [Indexed: 06/20/2024]

Abstract

PURPOSE

The RECIST guidelines provide a standardized approach for evaluating the response of cancer to treatment, allowing for consistent comparison of treatment efficacy across different therapies and patients. However, collecting such information from electronic health records manually can be extremely labor-intensive and time-consuming because of the complexity and volume of clinical notes. The aim of this study is to apply natural language processing (NLP) techniques to automate this process, minimizing manual data collection efforts, and improving the consistency and reliability of the results.

METHODS

We proposed a complex, hybrid NLP system that automates the process of extracting, linking, and summarizing anticancer therapy and associated RECIST-like responses from narrative clinical text. The system consists of multiple machine learning-/deep learning-based and rule-based modules for diverse NLP tasks such as named entity recognition, assertion classification, relation extraction, and text normalization, to address different challenges associated with anticancer therapy and response information extraction. We then evaluated the system performances on two independent test sets from different institutions to demonstrate its effectiveness and generalizability.

RESULTS

The system used domain-specific language models, BioBERT and BioClinicalBERT, for high-performance therapy mentions identification and RECIST responses extraction and categorization. The best-performing model achieved a 0.66 score in linking therapy and RECIST response mentions, with end-to-end performance peaking at 0.74 after relation normalization, indicating substantial efficacy with room for improvement.

CONCLUSION

We developed, implemented, and tested an information extraction system from clinical notes for cancer treatment and efficacy assessment information. We expect this system will support future cancer research, particularly oncologic studies that focus on efficiently assessing the effectiveness and reliability of cancer therapeutics.

Swaminathan A, Ren AL, Wu JY, Bhargava-Shah A, Lopez I, Srivastava U, Alexopoulos V, Pizzitola R, Bui B, Alkhani L, Lee S, Mohit N, Seo N, Macedo N, Cheng W, Wang W, Tran E, Thomas R, Gevaert O. Extraction of Unstructured Electronic Health Records to Evaluate Glioblastoma Treatment Patterns. JCO Clin Cancer Inform 2024;8:e2300091. [PMID: 38857465 PMCID: PMC11371099 DOI: 10.1200/cci.23.00091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 11/15/2023] [Accepted: 03/12/2024] [Indexed: 06/12/2024]

Shyr C, Hu Y, Bastarache L, Cheng A, Hamid R, Harris P, Xu H. Identifying and Extracting Rare Diseases and Their Phenotypes with Large Language Models. J Healthc Inform Res 2024;8:438-461. [PMID: 38681753 PMCID: PMC11052982 DOI: 10.1007/s41666-023-00155-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 10/24/2023] [Accepted: 11/13/2023] [Indexed: 05/01/2024]

Sivarajkumar S, Mohammad HA, Oniani D, Roberts K, Hersh W, Liu H, He D, Visweswaran S, Wang Y. Clinical Information Retrieval: A Literature Review. J Healthc Inform Res 2024;8:313-352. [PMID: 38681755 PMCID: PMC11052968 DOI: 10.1007/s41666-024-00159-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 12/07/2023] [Accepted: 01/08/2024] [Indexed: 05/01/2024]

Petit-Jean T, Gérardin C, Berthelot E, Chatellier G, Frank M, Tannier X, Kempf E, Bey R. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. J Am Med Inform Assoc 2024;31:1280-1290. [PMID: 38573195 PMCID: PMC11105139 DOI: 10.1093/jamia/ocae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 02/28/2024] [Accepted: 03/13/2024] [Indexed: 04/05/2024]

Webb BD, Lau LY, Tsevdos D, Shewcraft RA, Corrigan D, Shi L, Lee S, Tyler J, Li S, Wang Z, Stolovitzky G, Edelmann L, Chen R, Schadt EE, Li L. An algorithm to identify patients aged 0-3 with rare genetic disorders. Orphanet J Rare Dis 2024;19:183. [PMID: 38698482 PMCID: PMC11064409 DOI: 10.1186/s13023-024-03188-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 04/17/2024] [Indexed: 05/05/2024]

Tavabi N, Pruneski J, Golchin S, Singh M, Sanborn R, Heyworth B, Landschaft A, Kimia A, Kiapour A. Building large-scale registries from unstructured clinical notes using a low-resource natural language processing pipeline. Artif Intell Med 2024;151:102847. [PMID: 38658131 DOI: 10.1016/j.artmed.2024.102847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 02/06/2024] [Accepted: 03/19/2024] [Indexed: 04/26/2024]

Abstract

Building clinical registries is an important step in clinical research and improvement of patient care quality. Natural Language Processing (NLP) methods have shown promising results in extracting valuable information from unstructured clinical notes. However, the structure and nature of clinical notes are very different from the regular text that state-of-the-art NLP models are trained and tested on, and they have their own set of challenges. In this study, we propose Sentence Extractor with Keywords (SE-K), an efficient and interpretable classification approach for extracting information from clinical notes, and show that it outperforms more computationally expensive methods in text classification. Following Institutional Review Board (IRB) approval, we used SE-K and two embedding-based NLP approaches (Sentence Extractor with Embeddings (SE-E) and Bidirectional Encoder Representations from Transformers (BERT)) to develop a comprehensive registry of anterior cruciate ligament surgeries from 20 years of unstructured clinical data at a multi-site tertiary-care regional children's hospital. The low-resource approach (SE-K) had better performance (average AUROC of 0.94 ± 0.04) than the embedding-based approaches (SE-E: 0.93 ± 0.04 and BERT: 0.87 ± 0.09) for out-of-sample validation, in addition to a minimal performance drop between test and out-of-sample validation. Moreover, the SE-K approach was at least six times faster (on CPU) than SE-E (on CPU) and BERT (on GPU) and provides interpretability. Our proposed approach, SE-K, can be effectively used to extract relevant variables from clinical notes to build large-scale registries, with consistently better performance compared to the more resource-intensive approaches (e.g., BERT). Such approaches can facilitate information extraction from unstructured notes for registry building, quality improvement and adverse event monitoring.
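
The abstract does not describe how SE-K selects sentences, so the following is only a minimal sketch of the general idea of keyword-driven sentence extraction, not the authors' implementation. The keyword list, the naive sentence splitter, and the function names are all assumptions made for illustration.

```python
import re
from typing import List, Sequence

# Hypothetical keyword list; the study's actual keywords are not given in the abstract.
KEYWORDS = ("acl", "anterior cruciate ligament", "graft")


def split_sentences(note: str) -> List[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", note.strip())
    return [p for p in parts if p]


def extract_keyword_sentences(note: str, keywords: Sequence[str] = KEYWORDS) -> List[str]:
    """Keep only sentences that mention at least one keyword as a whole word or phrase."""
    patterns = [re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords]
    return [s for s in split_sentences(note) if any(p.search(s) for p in patterns)]
```

The retained sentences could then be passed to any lightweight classifier; because the selection step is plain keyword matching, a reviewer can see exactly which sentences drove a decision.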

Sivarajkumar S, Kelley M, Samolyk-Mazzanti A, Visweswaran S, Wang Y. An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study. JMIR Med Inform 2024;12:e55318. [PMID: 38587879 PMCID: PMC11036183 DOI: 10.2196/55318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 02/20/2024] [Accepted: 02/24/2024] [Indexed: 04/09/2024]

Abstract

BACKGROUND

Large language models (LLMs) have shown remarkable capabilities in natural language processing (NLP), especially in domains where labeled data are scarce or expensive, such as the clinical domain. However, to unlock the clinical knowledge hidden in these LLMs, we need to design effective prompts that can guide them to perform specific clinical NLP tasks without any task-specific training data. This is known as in-context learning, which is an art and science that requires understanding the strengths and weaknesses of different LLMs and prompt engineering approaches.

OBJECTIVE

The objective of this study is to assess the effectiveness of various prompt engineering techniques, including 2 newly introduced types (heuristic and ensemble prompts), for zero-shot and few-shot clinical information extraction using pretrained language models.

METHODS

This comprehensive experimental study evaluated different prompt types (simple prefix, simple cloze, chain of thought, anticipatory, heuristic, and ensemble) across 5 clinical NLP tasks: clinical sense disambiguation, biomedical evidence extraction, coreference resolution, medication status extraction, and medication attribute extraction. The performance of these prompts was assessed using 3 state-of-the-art language models: GPT-3.5 (OpenAI), Gemini (Google), and LLaMA-2 (Meta). The study contrasted zero-shot with few-shot prompting and explored the effectiveness of ensemble approaches.

RESULTS

The study revealed that task-specific prompt tailoring is vital for the high performance of LLMs for zero-shot clinical NLP. In clinical sense disambiguation, GPT-3.5 achieved an accuracy of 0.96 with heuristic prompts and 0.94 in biomedical evidence extraction. Heuristic prompts, alongside chain of thought prompts, were highly effective across tasks. Few-shot prompting improved performance in complex scenarios, and ensemble approaches capitalized on multiple prompt strengths. GPT-3.5 consistently outperformed Gemini and LLaMA-2 across tasks and prompt types.

CONCLUSIONS

This study provides a rigorous evaluation of prompt engineering methodologies and introduces innovative techniques for clinical information extraction, demonstrating the potential of in-context learning in the clinical domain. These findings offer clear guidelines for future prompt-based clinical NLP research, facilitating engagement by non-NLP experts in clinical NLP advancements. To the best of our knowledge, this is one of the first works on the empirical evaluation of different prompt engineering approaches for clinical NLP in this era of generative artificial intelligence, and we hope that it will inspire and inform future research in this area.

Mashima Y, Tanigawa M, Yokoi H. Information heterogeneity between progress notes by physicians and nurses for inpatients with digestive system diseases. Sci Rep 2024;14:7656. [PMID: 38561333 PMCID: PMC10984979 DOI: 10.1038/s41598-024-56324-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 03/05/2024] [Indexed: 04/04/2024]

Wang L, Ma Y, Bi W, Lv H, Li Y. An Entity Extraction Pipeline for Medical Text Records Using Large Language Models: Analytical Study. J Med Internet Res 2024;26:e54580. [PMID: 38551633 PMCID: PMC11015372 DOI: 10.2196/54580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/23/2024] [Accepted: 02/14/2024] [Indexed: 04/02/2024]

Abstract

BACKGROUND

The study of disease progression relies on clinical data, including text data, and extracting valuable features from text data has been a research hot spot. With the rise of large language models (LLMs), semantic-based extraction pipelines are gaining acceptance in clinical research. However, the security and feature hallucination issues of LLMs require further attention.

OBJECTIVE

This study aimed to introduce a novel modular LLM pipeline, which could semantically extract features from textual patient admission records.

METHODS

The pipeline was designed to process a systematic succession of concept extraction, aggregation, question generation, corpus extraction, and question-and-answer scale extraction, which was tested via 2 low-parameter LLMs: Qwen-14B-Chat (QWEN) and Baichuan2-13B-Chat (BAICHUAN). A data set of 25,709 pregnancy cases from the People's Hospital of Guangxi Zhuang Autonomous Region, China, was used for evaluation with the help of a local expert's annotation. The pipeline was evaluated with the metrics of accuracy and precision, null ratio, and time consumption. Additionally, we evaluated its performance via a quantized version of Qwen-14B-Chat on a consumer-grade GPU.

RESULTS

The pipeline demonstrates a high level of precision in feature extraction, as evidenced by the accuracy and precision results of Qwen-14B-Chat (95.52% and 92.93%, respectively) and Baichuan2-13B-Chat (95.86% and 90.08%, respectively). Furthermore, the pipeline exhibited low null ratios and variable time consumption. The INT4-quantized version of QWEN delivered an enhanced performance with 97.28% accuracy and a 0% null ratio.

CONCLUSIONS

The pipeline exhibited consistent performance across different LLMs and efficiently extracted clinical features from textual data. It also showed reliable performance on consumer-grade hardware. This approach offers a viable and effective solution for mining clinical research data from textual records.

Huang MS, Han JC, Lin PY, You YT, Tsai RTH, Hsu WL. Surveying biomedical relation extraction: a critical examination of current datasets and the proposal of a new resource. Brief Bioinform 2024;25:bbae132. [PMID: 38609331 PMCID: PMC11014787 DOI: 10.1093/bib/bbae132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 11/06/2023] [Accepted: 03/02/2023] [Indexed: 04/14/2024]

Yusuf A, Boyne DJ, O'Sullivan DE, Brenner DR, Cheung WY, Mirza I, Jarada TN. Text analysis framework for identifying mutations among non-small cell lung cancer patients from laboratory data. BMC Med Res Methodol 2024;24:63. [PMID: 38468224 PMCID: PMC10926579 DOI: 10.1186/s12874-024-02192-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 02/25/2024] [Indexed: 03/13/2024]

Abstract

BACKGROUND

Laboratory data can provide great value to support research aimed at reducing the incidence, prolonging survival and enhancing outcomes of cancer. Data is characterized by the information it carries and the format it holds. Data captured in Alberta's biomarker laboratory repository is free text, cluttered and rouge. Such data format limits its utility and prohibits broader adoption and research development. Text analysis for information extraction of unstructured data can change this and lead to more complete analyses. Previous work on extracting relevant information from free text, unstructured data employed Natural Language Processing (NLP), Machine Learning (ML), rule-based Information Extraction (IE) methods, or a hybrid combination between them.

METHODS

In our study, text analysis was performed on Alberta Precision Laboratories data which consisted of 95,854 entries from the Southern Alberta Dataset (SAD) and 6944 entries from the Northern Alberta Dataset (NAD). The data covers all of Alberta and is completely population-based. Our proposed framework is built around rule-based IE methods. It incorporates topics such as Syntax and Lexical analyses to achieve deterministic extraction of data from biomarker laboratory data (i.e., Epidermal Growth Factor Receptor (EGFR) test results). Lexical analysis comprises data cleaning and pre-processing, conversion of Rich Text Format text into readable plain text, and normalization and tokenization of text. The framework then passes the text into the Syntax analysis stage, which includes the rule-based method of extracting relevant data. Rule-based patterns of the test result are identified, and a Context Free Grammar then generates the rules of information extraction. Finally, the results are linked with the Alberta Cancer Registry to support real-world cancer research studies.

RESULTS

Of the original 5512 entries in the SAD dataset and 5017 entries in the NAD dataset which were filtered for EGFR, the framework yielded 5129 and 3388 extracted EGFR test results from the SAD and NAD datasets, respectively. An accuracy of 97.5% was achieved on a random sample of 362 tests.

CONCLUSIONS

We presented a text analysis framework to extract specific information from unstructured clinical data. Our proposed framework has shown that it can successfully extract relevant information from EGFR test results.
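
To illustrate the two-stage idea described in the methods (a lexical clean-up step followed by rule-based extraction), here is a minimal sketch. It is not the authors' framework: it substitutes a single regular expression for their context-free grammar, and the rule, the result vocabulary, and the function names are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical rule: the gene name, then up to 40 characters within the same
# sentence, then a result keyword. A regex stands in for the grammar-based rules.
EGFR_RULE = re.compile(
    r"\begfr\b[^.]{0,40}?\b(not detected|detected|negative|positive)\b"
)


def normalize(text: str) -> str:
    """Lexical step: collapse runs of whitespace and lowercase the text."""
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_egfr_result(report: str) -> Optional[str]:
    """Rule-based step: return the matched result keyword, or None if no rule fires."""
    match = EGFR_RULE.search(normalize(report))
    return match.group(1) if match else None
```

Because the rule is deterministic, the same report always yields the same output, which makes spot-checking a random sample of extractions straightforward.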

Hu D, Liu B, Zhu X, Lu X, Wu N. Zero-shot information extraction from radiological reports using ChatGPT. Int J Med Inform 2024;183:105321. [PMID: 38157785 DOI: 10.1016/j.ijmedinf.2023.105321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 12/04/2023] [Accepted: 12/16/2023] [Indexed: 01/03/2024]

Abstract

INTRODUCTION

Electronic health records contain an enormous amount of valuable information recorded in free text. Information extraction is the strategy to transform free text into structured data, but some of its components require annotated data to tune, which has become a bottleneck. Large language models achieve good performances on various downstream NLP tasks without parameter tuning, becoming a possible way to extract information in a zero-shot manner.

METHODS

In this study, we aim to explore whether the most popular large language model, ChatGPT, can extract information from radiological reports. We first design the prompt template for the information of interest in the CT reports. Then, we generate the prompts by combining the prompt template with the CT reports as the inputs of ChatGPT to obtain the responses. A post-processing module is developed to transform the responses into structured extraction results. In addition, we add prior medical knowledge to the prompt template to reduce wrong extraction results. We also explore the consistency of the extraction results.

RESULTS

We conducted the experiments with 847 real CT reports. The experimental results indicate that ChatGPT can achieve competitive performances for some extraction tasks like tumor location, tumor long and short diameters compared with the baseline information extraction system. By adding some prior medical knowledge to the prompt template, extraction tasks about tumor spiculations and lobulations obtain significant improvements but tasks about tumor density and lymph node status do not achieve better performances.

CONCLUSION

ChatGPT can achieve competitive information extraction for radiological reports in a zero-shot manner. Adding prior medical knowledge as instructions can further improve performances for some extraction tasks but may lead to worse performances for some complex extraction tasks.
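
The workflow in the methods (fill a prompt template with a report, then post-process the model's reply into structured fields) can be sketched as follows. This is an illustrative assumption, not the study's code: the template wording, the field names, and the choice of JSON as the reply format are hypothetical, and no model API is called here.

```python
import json
from typing import Dict, Optional

# Hypothetical field names and template wording, chosen for illustration only.
FIELDS = ("tumor_location", "long_diameter_mm", "short_diameter_mm")
PROMPT_TEMPLATE = (
    "Extract the following fields from the CT report and answer in JSON "
    "with keys {keys}. Use null when a field is not mentioned.\n\n"
    "Report:\n{report}"
)


def build_prompt(report: str) -> str:
    """Combine the template with one report to form the model input."""
    return PROMPT_TEMPLATE.format(keys=", ".join(FIELDS), report=report)


def parse_response(response: str) -> Dict[str, Optional[object]]:
    """Post-processing: parse the text between the first '{' and the last '}'
    as JSON and keep only the expected fields; fall back to all-None on failure."""
    empty = {field: None for field in FIELDS}
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(response[start:end + 1])
    except json.JSONDecodeError:
        return empty
    return {field: data.get(field) for field in FIELDS}
```

Restricting the output to a fixed set of keys means an unexpected or verbose reply degrades to missing values rather than to malformed records.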

Gu S, Lee EW, Zhang W, Simpson RL, Hertzberg VS, Ho JC. Evaluating Natural Language Processing Packages for Predicting Hospital-Acquired Pressure Injuries From Clinical Notes. Comput Inform Nurs 2024;42:184-192. [PMID: 37607706 PMCID: PMC10884344 DOI: 10.1097/cin.0000000000001053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]

Sushil M, Butte AJ, Schuit E, van Smeden M, Leeuwenberg AM. Cross-institution natural language processing for reliable clinical association studies: a methodological exploration. J Clin Epidemiol 2024;167:111258. [PMID: 38219811 DOI: 10.1016/j.jclinepi.2024.111258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 12/21/2023] [Accepted: 01/08/2024] [Indexed: 01/16/2024]

Abstract

OBJECTIVES

Natural language processing (NLP) of clinical notes in electronic medical records is increasingly used to extract otherwise sparsely available patient characteristics, to assess their association with relevant health outcomes. Manual data curation is resource intensive and NLP methods make these studies more feasible. However, the methodology of using NLP methods reliably in clinical research is understudied. The objective of this study is to investigate how NLP models could be used to extract study variables (specifically exposures) to reliably conduct exposure-outcome association studies.

STUDY DESIGN AND SETTING

In a convenience sample of patients admitted to the intensive care unit of a US academic health system, multiple association studies were conducted, comparing the association estimates based on NLP-extracted vs. manually extracted exposure variables. The association studies varied in NLP model architecture (Bidirectional Encoder Representations from Transformers, Long Short-Term Memory), training paradigm (training a new model, fine-tuning an existing external model), extracted exposures (employment status, living status, and substance use), health outcomes (having a do-not-resuscitate/intubate code, length of stay, and in-hospital mortality), missing data handling (multiple imputation vs. complete case analysis), and the application of measurement error correction (via regression calibration).

RESULTS

The study was conducted on 1,174 participants (median [interquartile range] age, 61 [50, 73] years; 60.6% male). Additionally, up to 500 discharge reports of participants from the same health system and 2,528 reports of participants from an external health system were used to train the NLP models. Substantial differences were found between the associations based on NLP-extracted and manually extracted exposures under all settings. The error in association was only weakly correlated with the overall F1 score of the NLP models.

CONCLUSION

Associations estimated using NLP-extracted exposures should be interpreted with caution. Further research is needed to set conditions for reliable use of NLP in medical association studies.

Margetta J, Sale A. Distinguishing cardiac catheter ablation energy modalities by applying natural language processing to electronic health records. J Comp Eff Res 2024;13:e230053. [PMID: 38261335 PMCID: PMC10945417 DOI: 10.57264/cer-2023-0053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Accepted: 01/03/2024] [Indexed: 01/24/2024]

Mora S, Turrisi R, Chiarella L, Consales A, Tassi L, Mai R, Nobili L, Barla A, Arnulfo G. NLP-based tools for localization of the epileptogenic zone in patients with drug-resistant focal epilepsy. Sci Rep 2024;14:2349. [PMID: 38287042 PMCID: PMC10825198 DOI: 10.1038/s41598-024-51846-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 01/10/2024] [Indexed: 01/31/2024]

Lin WC, Chen A, Song X, Weiskopf NG, Chiang MF, Hribar MR. Prediction of multiclass surgical outcomes in glaucoma using multimodal deep learning based on free-text operative notes and structured EHR data. J Am Med Inform Assoc 2024;31:456-464. [PMID: 37964658 PMCID: PMC10797280 DOI: 10.1093/jamia/ocad213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 10/16/2023] [Accepted: 10/25/2023] [Indexed: 11/16/2023]

Abstract

OBJECTIVE

Surgical outcome prediction is challenging but necessary for postoperative management. Current machine learning models utilize pre- and post-op data, excluding intraoperative information in surgical notes. Current models also usually predict binary outcomes even when surgeries have multiple outcomes that require different postoperative management. This study addresses these gaps by incorporating intraoperative information into multimodal models for multiclass glaucoma surgery outcome prediction.

MATERIALS AND METHODS

We developed and evaluated multimodal deep learning models for multiclass glaucoma trabeculectomy surgery outcomes using both structured EHR data and free-text operative notes. We compared these with baseline models that use structured EHR data exclusively and with neural network models that leverage only operative notes.

RESULTS

The multimodal neural network had the highest performance with a macro AUROC of 0.750 and F1 score of 0.583. It outperformed the baseline machine learning model with structured EHR data alone (macro AUROC of 0.712 and F1 score of 0.486). Additionally, the multimodal model achieved the highest recall (0.692) for hypotony surgical failure, while the surgical success group had the highest precision (0.884) and F1 score (0.775).
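For readers unfamiliar with the macro-averaged F1 score reported above, the following is a minimal sketch of how it is computed for a multiclass task: F1 is calculated per class and then averaged with equal weight per class. The labels and predictions are invented for illustration (the class name `other_failure` is hypothetical) and do not reproduce any data from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)  # class never predicted correctly
    return sum(f1_scores) / len(f1_scores)

# Hypothetical three-class example (not study data).
y_true = ["success", "success", "success", "hypotony", "hypotony", "other_failure"]
y_pred = ["success", "success", "hypotony", "hypotony", "hypotony", "success"]

score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

Because every class contributes equally, a rare class that is never predicted correctly (here `other_failure`) pulls the macro average down sharply, which is why macro metrics are often preferred for imbalanced multiclass outcomes.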

DISCUSSION

This study shows that operative notes are an important source of predictive information. The multimodal predictive model combining perioperative notes and structured pre- and post-op EHR data outperformed other models. Multiclass surgical outcome prediction can provide valuable insights for clinical decision-making.

CONCLUSIONS

Our results show the potential of deep learning models to enhance clinical decision-making for postoperative management. They can be applied to other specialties to improve surgical outcome predictions.


Anell A, Arvidsson E, Dackehag M, Ellegård LM, Glenngård AH. Access to automated comparative feedback reports in primary care - a study of intensity of use and relationship with clinical performance among Swedish primary care practices. BMC Health Serv Res 2024;24:33. [PMID: 38178188 PMCID: PMC10768433 DOI: 10.1186/s12913-023-10407-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 11/30/2023] [Indexed: 01/06/2024]

Abstract

BACKGROUND

Digital applications that automatically extract information from electronic medical records and provide comparative visualizations of the data in the form of quality indicators to primary care practices may facilitate local quality improvement (QI). A necessary condition for such QI to work is that practices actively access the data. The purpose of this study was to explore the use of an application that visualizes quality indicators in Swedish primary care, developed by a profession-led QI initiative ("Primärvårdskvalitet"). We also describe the characteristics of practices that used the application more or less extensively, and the relationships between the intensity of use and changes in selected performance indicators.

METHODS

We studied longitudinal data on 122 primary care practices' visits to pages (page views) in the application over a period up to 5 years. We compared high and low users, classified by the average number of monthly page views, with respect to practice and patient characteristics as well as baseline measurements of a subset of the performance indicators. We estimated linear associations between visits to pages with diabetes-related indicators and the change in measurements of selected diabetes indicators over 1.5 years.

RESULTS

Less than half of all practices accessed the data in a given month, although most practices accessed the data during at least one third of the observed months. High and low users were similar in terms of most studied characteristics. We found statistically significant positive associations between use of the diabetes indicators and changes in measurements of three diabetes indicators.

CONCLUSIONS

Although most practices in this study indicated an interest in the automated feedback reports, the intensity of use can be described as varying and on average limited. The positive associations between the use and changes in performance suggest that policymakers should increase their support of practices' QI efforts. Such support may include providing a formalized structure for peer group discussions of data, facilitating both understanding of the data and possible action points to improve performance, while maintaining a profession-led use of applications.


Cui Z, Yu K, Yuan Z, Dong X, Luo W. Language inference-based learning for Low-Resource Chinese clinical named entity recognition using language model. J Biomed Inform 2024;149:104559. [PMID: 38056702 DOI: 10.1016/j.jbi.2023.104559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/24/2023] [Accepted: 11/30/2023] [Indexed: 12/08/2023]

Zhou H, Li M, Xiao Y, Yang H, Zhang R. LLM Instruction-Example Adaptive Prompting (LEAP) Framework for Clinical Relation Extraction. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.15.23300059. [PMID: 38168203 PMCID: PMC10760264 DOI: 10.1101/2023.12.15.23300059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]

Li R, Wang X, Yu H. Two Directions for Clinical Data Generation with Large Language Models: Data-to-Label and Label-to-Data. PROCEEDINGS OF THE CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING. CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING 2023;2023:7129-7143. [PMID: 38213944 PMCID: PMC10782150 DOI: 10.18653/v1/2023.findings-emnlp.474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2024]

Nuthalapati P, Thomas L, Donahue MA, Moura LMVR, DeStefano S, Simpson JR, Buchhalter J, Fureman BE, Pellinen J. Improving Seizure Frequency Documentation and Classification. Neurol Clin Pract 2023;13:e200212. [PMID: 37873534 PMCID: PMC10586801 DOI: 10.1212/cpj.0000000000200212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Accepted: 09/01/2023] [Indexed: 10/25/2023]

Abstract

Background and Objectives

Accurate and reliable seizure data are essential for evaluating treatment strategies and tracking the quality of care in epilepsy clinics. This quality improvement project aimed to increase seizure documentation (i.e., documentation of seizure frequency from 80% to 100%, date of last seizure from 35% to 50%, and International League Against Epilepsy (ILAE) seizure classification from 35% to at least 50%) over 6 months.

Methods

We surveyed 7 epileptologists to determine their perceived habits in documenting seizure frequency, ILAE classification, and date of last seizure. Baseline data were collected weekly from September to December 2021. Subsequently, we implemented a newly created flowsheet in our Electronic Health Record (EHR) based on the Epilepsy Learning Healthcare System (ELHS) Case Report Forms to increase seizure documentation in a standardized way. Two epileptologists tested this flowsheet tool in their epilepsy clinics between February 2022 and July 2022. Data were collected weekly and compared with documentation from other epileptologists within the same group.

Results

Epileptologists at our center believed they documented seizure frequency for 84%-87% of clinic visits, which aligned with baseline data collection, showing they recorded seizure frequency for 83% of clinic visits. Epileptologists believed they documented ILAE classification for 47%-52% of clinic visits, and baseline data showed this was documented in 33% of clinic visits. They also reported documenting the date of the last seizure for 52%-63% of clinic visits, but this occurred in only 35% of clinic visits. After implementing the new flowsheet, documentation increased to nearly 100% for all fields being completed by the providers who tested the flowsheet.

Discussion

We demonstrated that by implementing an easy-to-use standardized EHR documentation tool, our documentation of critical metrics, as defined by the ELHS, improved dramatically. This shows that simple and practical interventions can substantially improve clinically meaningful documentation.


Affiliation(s)

Poojith Nuthalapati, Lionel Thomas, Maria A Donahue, Lidia M V R Moura, Samuel DeStefano, Jennifer R Simpson, Jeffrey Buchhalter, Brandy E Fureman, Jacob Pellinen
Department of Neurology (PN, MAD, LMVRM), Massachusetts General Hospital, Harvard Medical School, Boston; Department of Neurology (LT, SD, JRS, JP), University of Colorado School of Medicine, Aurora; Department of Pediatrics (JB), Cumming School of Medicine, University of Calgary, AB, CA; and Mission Outcomes Team (BEF), Epilepsy Foundation, Landover, MD


Crema C, Buonocore TM, Fostinelli S, Parimbelli E, Verde F, Fundarò C, Manera M, Ramusino MC, Capelli M, Costa A, Binetti G, Bellazzi R, Redolfi A. Advancing Italian biomedical information extraction with transformers-based models: Methodological insights and multicenter practical application. J Biomed Inform 2023;148:104557. [PMID: 38012982 DOI: 10.1016/j.jbi.2023.104557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 10/26/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023]