1
Shankar SV, Dhingra LS, Aminorroaya A, Adejumo P, Nadkarni GN, Xu H, Brandt C, Oikonomou EK, Pedroso AF, Khera R. Automated Transformation of Unstructured Cardiovascular Diagnostic Reports into Structured Datasets Using Sequentially Deployed Large Language Models. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.10.08.24315035. [PMID: 39417094 PMCID: PMC11482995 DOI: 10.1101/2024.10.08.24315035] [Indexed: 10/19/2024]
Abstract
Background Rich data in cardiovascular diagnostic testing are often sequestered in unstructured reports, with the necessity of manual abstraction limiting their use in real-time applications in patient care and research. Methods We developed a two-step process that sequentially deploys generative and interpretative large language models (LLMs; Llama2 70b and Llama2 13b). Using a Llama2 70b model, we generated varying formats of transthoracic echocardiogram (TTE) reports from 3,000 real-world echo reports with paired structured elements, leveraging temporal changes in reporting formats to define the variations. Subsequently, we fine-tuned Llama2 13b using sequentially larger batches of generated echo reports as inputs, to extract data from free-text narratives across 18 clinically relevant echocardiographic fields. This was set up as a prompt-based supervised training task. We evaluated the fine-tuned Llama2 13b model, HeartDx-LM, on several distinct echocardiographic datasets: (i) reports across the different time periods and formats at Yale New Haven Health System (YNHHS), (ii) the Medical Information Mart for Intensive Care (MIMIC) III dataset, and (iii) the MIMIC IV dataset. We used the accuracy of extracted fields and Cohen's Kappa as the metrics and have publicly released the HeartDx-LM model. Results The HeartDx-LM model was trained on 2,000 randomly selected synthetic echo reports with varying formats and paired structured labels, covering a wide range of clinical findings. We identified a lower threshold of 500 annotated reports required for fine-tuning Llama2 13b to achieve stable and consistent performance. At YNHHS, the HeartDx-LM model accurately extracted 69,144 out of 70,032 values (98.7%) across 18 clinical fields from unstructured reports in the test set from contemporary records where paired structured data were also available.
In older echo reports where only unstructured reports were available, the model achieved 87.1% accuracy against expert annotations for the same 18 fields for a random sample of 100 reports. Similarly, in expert-annotated external validation sets from MIMIC-IV and MIMIC-III, HeartDx-LM correctly extracted 201 out of 220 available values (91.3%) and 615 out of 707 available values (87.9%), respectively, from 100 randomly chosen and expert annotated echo reports from each set. Conclusion We developed a novel method using paired large and moderate-sized LLMs to automate the extraction of unstructured echocardiographic reports into tabular datasets. Our approach represents a scalable strategy that transforms unstructured reports into computable elements that can be leveraged to improve cardiovascular care quality and enable research.
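The two metrics named in this abstract (accuracy of extracted fields and Cohen's Kappa) can be illustrated with a minimal sketch. The field values below are hypothetical and are not taken from the study; the function names are illustrative only, and the authors' actual evaluation code may differ.

```python
from collections import Counter


def accuracy(predicted, reference):
    """Fraction of positions where the extracted value equals the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def cohens_kappa(predicted, reference):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(reference)
    p_observed = accuracy(predicted, reference)
    pred_counts = Counter(predicted)
    ref_counts = Counter(reference)
    labels = set(pred_counts) | set(ref_counts)
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    p_expected = sum(pred_counts[lab] * ref_counts[lab] for lab in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical values for one extracted field across six reports.
reference = ["normal", "mild", "normal", "severe", "mild", "normal"]
predicted = ["normal", "mild", "normal", "mild", "mild", "normal"]
print(accuracy(predicted, reference), cohens_kappa(predicted, reference))

# The reported YNHHS count reproduces the stated percentage.
print(round(100 * 69144 / 70032, 1))  # 98.7
```

Kappa is lower than raw accuracy whenever some agreement would be expected by chance, which is why reporting both is informative for fields dominated by one value.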
2
Gao Y, Liu M. Application of machine learning based genome sequence analysis in pathogen identification. Front Microbiol 2024; 15:1474078. [PMID: 39417073 PMCID: PMC11480060 DOI: 10.3389/fmicb.2024.1474078] [Received: 08/01/2024] [Accepted: 09/23/2024] [Indexed: 10/19/2024]
Abstract
Infectious diseases caused by pathogenic microorganisms pose a serious threat to human health. Despite advances in molecular biology, genetics, computation, and medicinal chemistry, infectious diseases remain a significant public health concern. Addressing the challenges posed by pathogen outbreaks, pandemics, and antimicrobial resistance requires concerted interdisciplinary efforts. With the development of computer technology and the continuous exploration of artificial intelligence (AI) applications in the biomedical field, the automatic morphological recognition and image processing of microbial images under microscopes have advanced rapidly. A research team at the Institute of Microbiology, Chinese Academy of Sciences has developed a single-cell microbial identification technology combining Raman spectroscopy and artificial intelligence. Using a laser Raman acquisition system and convolutional neural network analysis, it achieved an average accuracy of 95.64%, and identification can be completed in only 5 min. These technologies have shown substantial advantages in the visible morphological detection of pathogenic microorganisms, expanding anti-infective drug discovery, enhancing our understanding of infection biology, and accelerating the development of diagnostics. In this review, we discuss the application of AI-based machine learning in image analysis, genome sequencing data analysis, and natural language processing (NLP) for pathogen identification, highlighting the significant role of artificial intelligence in pathogen diagnosis. AI can improve the accuracy and efficiency of diagnosis, promote early detection and personalized treatment, and enhance public health safety.
Affiliation(s)
- Yunqiu Gao
- Department of Dermatology, The First Hospital of China Medical University, Shenyang, China
- Key Laboratory of Immunodermatology, Ministry of Education and NHC, National Joint Engineering Research Center for Theranostics of Immunological Skin Diseases, Shenyang, China
- Min Liu
- Department of Dermatology, The First Hospital of China Medical University, Shenyang, China
- Institute of Respiratory Disease, China Medical University, Shenyang, China
3
Gundler C, Gottfried K, Wiederhold AJ, Ataian M, Wurlitzer M, Gewehr JE, Ückert F. Unlocking the Potential of Secondary Data for Public Health Research: Retrospective Study With a Novel Clinical Platform. Interact J Med Res 2024; 13:e51563. [PMID: 39353185 PMCID: PMC11480676 DOI: 10.2196/51563] [Received: 08/03/2023] [Revised: 12/01/2023] [Accepted: 07/17/2024] [Indexed: 10/04/2024]
Abstract
BACKGROUND Clinical routine data derived from university hospitals hold immense value for health-related research on large cohorts. However, using secondary data for hypothesis testing necessitates adherence to scientific, legal (such as the General Data Protection Regulation and federal and state protection legislation), technical, and administrative requirements. This process is intricate, time-consuming, and susceptible to errors. OBJECTIVE This study aims to develop a platform that enables clinicians to use current real-world data for testing research and evaluate advantages and limitations at a large university medical center (542,944 patients in 2022). METHODS We identified requirements from clinical practitioners, conceptualized and implemented a platform based on the existing components, and assessed its applicability in clinical reality quantitatively and qualitatively. RESULTS The proposed platform was established at the University Medical Center Hamburg-Eppendorf and made 639 forms encompassing 10,629 data elements accessible to all resident scientists and clinicians. Every day, the number of patients rises, and parts of their electronic health records are made accessible through the platform. Qualitatively, we were able to conduct a retrospective analysis of Parkinson disease across 777 patients, where we provide additional evidence for a significantly higher proportion of action tremors in patients with rest tremors (340/777, 43.8%) compared with those without rest tremors (255/777, 32.8%), as determined by a chi-square test (P<.001). Quantitatively, our findings demonstrate increased user engagement within the last 90 days, underscoring clinicians' increasing adoption of the platform in their regular research activities. Notably, the platform facilitated the retrieval of clinical data from 600,000 patients, emphasizing its substantial added value.
CONCLUSIONS This study demonstrates the feasibility of simplifying the use of clinical data to enhance exploration and sustainability in scientific research. The proposed platform emerges as a potential technological and legal framework for other medical centers, providing them with the means to unlock untapped potential within their routine data.
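The abstract reports a chi-square test on tremor proportions. As a minimal sketch of how such a test works on a 2x2 table, the code below computes Pearson's chi-square statistic (without continuity correction) and its P value for one degree of freedom. The cell counts are hypothetical: the abstract does not give the full contingency table, so no attempt is made to reproduce the study's result.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for a 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows = exposure present/absent, columns = outcome yes/no.
stat, p = chi_square_2x2(30, 20, 10, 40)
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```

In practice a statistics library would normally be used instead; the closed form above only holds for the 2x2 case.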
Affiliation(s)
- Christopher Gundler
- Institute for Applied Medical Informatics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Karl Gottfried
- Institute for Applied Medical Informatics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Maximilian Ataian
- Institute for Applied Medical Informatics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Marcus Wurlitzer
- Research Data Facility, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Jan Erik Gewehr
- Research Data Facility, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Frank Ückert
- Institute for Applied Medical Informatics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
4
Sivarajkumar S, Tam TYC, Mohammad HA, Viggiano S, Oniani D, Visweswaran S, Wang Y. Extraction of sleep information from clinical notes of Alzheimer's disease patients using natural language processing. J Am Med Inform Assoc 2024; 31:2217-2227. [PMID: 39001795 DOI: 10.1093/jamia/ocae177] [Received: 02/29/2024] [Revised: 06/19/2024] [Accepted: 07/01/2024] [Indexed: 07/15/2024]
Abstract
OBJECTIVES Alzheimer's disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that has been shown to be critical for optimal cognitive function in old age. However, there is a lack of research studying the association between sleep and AD incidence. A major bottleneck for conducting such research is that the traditional way to acquire sleep information is time-consuming, inefficient, non-scalable, and limited to patients' subjective experience. We aim to automate the extraction of specific sleep-related patterns, such as snoring, napping, poor sleep quality, daytime sleepiness, night wakings, other sleep problems, and sleep duration, from clinical notes of AD patients. These sleep patterns are hypothesized to play a role in the incidence of AD, providing insight into the relationship between sleep and AD onset and progression. MATERIALS AND METHODS A gold standard dataset is created from manual annotation of 570 randomly sampled clinical note documents from adSLEEP, a corpus of 192 000 de-identified clinical notes of 7266 AD patients retrieved from the University of Pittsburgh Medical Center (UPMC). We developed a rule-based natural language processing (NLP) algorithm, machine learning models, and large language model (LLM)-based NLP algorithms to automate the extraction of sleep-related concepts, including snoring, napping, sleep problem, bad sleep quality, daytime sleepiness, night wakings, and sleep duration, from the gold standard dataset. RESULTS The annotated dataset of 482 patients comprised a predominantly White (89.2%), older adult population with an average age of 84.7 years, where females represented 64.1%, and a vast majority were non-Hispanic or Latino (94.6%). The rule-based NLP algorithm achieved the best F1 performance across all sleep-related concepts.
In terms of positive predictive value (PPV), the rule-based NLP algorithm achieved the highest PPV scores for daytime sleepiness (1.00) and sleep duration (1.00), while the machine learning models had the highest PPV for napping (0.95) and bad sleep quality (0.86), and LLAMA2 with finetuning had the highest PPV for night wakings (0.93) and sleep problem (0.89). DISCUSSION Although sleep information is infrequently documented in the clinical notes, the proposed rule-based NLP algorithm and LLM-based NLP algorithms still achieved promising results. In comparison, the machine learning-based approaches did not achieve good results, which is due to the small size of sleep information in the training data. CONCLUSION The results show that the rule-based NLP algorithm consistently achieved the best performance for all sleep concepts. This study focused on the clinical notes of patients with AD but could be extended to general sleep information extraction for other diseases.
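To make the idea of a rule-based NLP algorithm for sleep concepts concrete, here is a minimal sketch using regular expressions with a crude negation check. The patterns, the negation cues, and the 40-character look-back window are all assumptions made for illustration; they are not the rules used in the study.

```python
import re

# Hypothetical patterns for three of the sleep concepts named in the abstract.
SLEEP_PATTERNS = {
    "snoring": re.compile(r"\bsnor(?:e|es|ed|ing)\b", re.IGNORECASE),
    "napping": re.compile(r"\bnap(?:s|ped|ping)?\b", re.IGNORECASE),
    "daytime_sleepiness": re.compile(
        r"\bdaytime (?:sleepiness|somnolence)\b", re.IGNORECASE
    ),
}

# A negation cue counts only if no sentence boundary separates it from the match.
NEGATION = re.compile(r"\b(?:no|not|denies|denied|without)\b[^.;]*$", re.IGNORECASE)


def extract_sleep_concepts(note):
    """Return the set of concept names mentioned, and not negated, in a note."""
    found = set()
    for concept, pattern in SLEEP_PATTERNS.items():
        for match in pattern.finditer(note):
            # Look back a short window for a negation cue in the same sentence.
            window = note[max(0, match.start() - 40):match.start()]
            if NEGATION.search(window):
                continue
            found.add(concept)
    return found


note = "Patient denies snoring. Naps daily after lunch and reports daytime sleepiness."
print(sorted(extract_sleep_concepts(note)))
```

Rules like these tend to have high precision on concepts with distinctive wording, which is one plausible reason a rule-based system can do well when positive training examples are scarce.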
Affiliation(s)
- Sonish Sivarajkumar
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Thomas Yu Chow Tam
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Haneef Ahamed Mohammad
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Samuel Viggiano
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- David Oniani
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Shyam Visweswaran
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Yanshan Wang
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA 15260, United States
5
Wiest IC, Ferber D, Zhu J, van Treeck M, Meyer SK, Juglan R, Carrero ZI, Paech D, Kleesiek J, Ebert MP, Truhn D, Kather JN. Privacy-preserving large language models for structured medical information retrieval. NPJ Digit Med 2024; 7:257. [PMID: 39304709 DOI: 10.1038/s41746-024-01233-2] [Received: 01/09/2024] [Accepted: 08/19/2024] [Indexed: 09/22/2024]
Abstract
Most clinical information is encoded as free text, not accessible for quantitative analysis. This study presents an open-source pipeline using the local large language model (LLM) "Llama 2" to extract quantitative information from clinical text and evaluates its performance in identifying features of decompensated liver cirrhosis. The LLM identified five key clinical features in a zero- and one-shot manner from 500 patient medical histories in the MIMIC IV dataset. We compared LLMs of three sizes and various prompt engineering approaches, with predictions compared against ground truth from three blinded medical experts. Our pipeline achieved high accuracy, detecting liver cirrhosis with 100% sensitivity and 96% specificity. High sensitivities and specificities were also yielded for detecting ascites (95%, 95%), confusion (76%, 94%), abdominal pain (84%, 97%), and shortness of breath (87%, 97%) using the 70 billion parameter model, which outperformed smaller versions. Our study successfully demonstrates the capability of locally deployed LLMs to extract clinical information from free text with low hardware requirements.
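Sensitivity and specificity, the metrics reported for each clinical feature here, can be computed from binary model outputs and expert ground truth as in the minimal sketch below. The labels are hypothetical and the function name is illustrative; how the study combined the three experts' ratings into a single ground truth is not stated in the abstract and is not modeled.

```python
def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) for paired boolean labels."""
    if len(predicted) != len(truth):
        raise ValueError("inputs must have equal length")
    tp = sum(p and t for p, t in zip(predicted, truth))
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))
    fp = sum(p and (not t) for p, t in zip(predicted, truth))
    fn = sum((not p) and t for p, t in zip(predicted, truth))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    # Sensitivity: share of true cases detected; specificity: share of non-cases cleared.
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical per-patient labels for one feature (e.g. presence of ascites).
truth = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, False, True, False, False, False]
sens, spec = sensitivity_specificity(predicted, truth)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```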
Affiliation(s)
- Isabella Catharina Wiest
- Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Dyke Ferber
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany
- Jiefu Zhu
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Marko van Treeck
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Sonja K Meyer
- Department of Surgery I, University Hospital Würzburg, Würzburg, Germany
- Radhika Juglan
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Zunamys I Carrero
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Daniel Paech
- German Cancer Research Center, Division of Radiology, Heidelberg, Germany
- University Hospital Bonn, Clinic for Neuroradiology, Bonn, Germany
- Jens Kleesiek
- Institut für KI in der Medizin (IKIM), Universitätsmedizin Essen, Girardetstr. 2, 45131, Essen, Germany
- Cancer Research Center Cologne Essen (CCCE), West German Cancer Center Essen (WTZ), 45122, Essen, Germany
- TU Dortmund University, Department of Physics, Otto-Hahn-Straße 4, 44227, Dortmund, Germany
- Matthias P Ebert
- Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- DKFZ Hector Cancer Institute at the University Medical Center, Mannheim, Germany
- Molecular Medicine Partnership Unit, EMBL, Heidelberg, Germany
- Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
- Jakob Nikolas Kather
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany.
- Department of Medicine I, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307, Dresden, Germany.
6
Mason AJC, Bhavsar V, Botelle R, Chandran D, Li L, Mascio A, Sanyal J, Kadra-Scalzo G, Roberts A, Williams M, Stewart R. Applying neural network algorithms to ascertain reported experiences of violence in routine mental healthcare records and distributions of reports by diagnosis. Front Psychiatry 2024; 15:1181739. [PMID: 39319350 PMCID: PMC11420987 DOI: 10.3389/fpsyt.2024.1181739] [Received: 03/07/2023] [Accepted: 08/14/2024] [Indexed: 09/26/2024]
Abstract
Introduction Experiences of violence are important risk factors for worse outcomes in people with mental health conditions; however, they are not routinely collected by mental health services, so their ascertainment depends on extraction from text fields with natural language processing (NLP) algorithms. Methods Applying previously developed neural network algorithms to routine mental healthcare records, we sought to describe the distribution of recorded violence victimisation by demographic and diagnostic characteristics. We ascertained recorded violence victimisation from the records of 60,021 patients receiving care from a large south London NHS mental healthcare provider during 2019. Descriptive and regression analyses were conducted to investigate variation by age, sex, ethnic group, and diagnostic category (ICD-10 F chapter sub-headings plus post-traumatic stress disorder (PTSD) as a specific condition). Results Patients with a mood disorder (adjusted odds ratio 1.63, 1.55-1.72), personality disorder (4.03, 3.65-4.45), schizophrenia spectrum disorder (1.84, 1.74-1.95) or PTSD (2.36, 2.08-2.69) had a significantly increased likelihood of victimisation compared to those with other mental health diagnoses. Additionally, patients from minority ethnic groups (1.10 (1.02-1.20) for Black, 1.40 (1.31-1.49) for Asian compared to White groups) had significantly higher likelihood of recorded violence victimisation. Males were significantly less likely to have reported recorded violence victimisation (0.44, 0.42-0.45) than females. Discussion We thus demonstrate the successful deployment of machine learning based NLP algorithms to ascertain important entities for outcome prediction in mental healthcare. The observed distributions highlight which sex, ethnicity and diagnostic groups had more records of violence victimisation.
Further development of these algorithms could usefully capture broader experiences, such as differentiating more efficiently between witnessed, perpetrated and experienced violence and broader violence experiences like emotional abuse.
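The results above are expressed as odds ratios with confidence intervals. The sketch below shows how an unadjusted odds ratio and a 95% Wald interval are obtained from a 2x2 table. This is a simplification: the study reports adjusted odds ratios from regression models, which account for other covariates, and the counts used here are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for [[a, b], [c, d]].

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_ci(40, 60, 20, 80)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1, as all of the reported intervals do, corresponds to a statistically significant association at the chosen level.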
Affiliation(s)
- Ava J C Mason
- King's College London Institute of Psychiatry, Psychology and Neuroscience, De Crespigny Park, London, United Kingdom
- Vishal Bhavsar
- King's College London Institute of Psychiatry, Psychology and Neuroscience, De Crespigny Park, London, United Kingdom
- Biomedical Research Centre, South London and Maudsley National Health Service (NHS) Foundation Trust, London, United Kingdom
- Riley Botelle
- King's College London Institute of Psychiatry, Psychology and Neuroscience, De Crespigny Park, London, United Kingdom
- David Chandran
- King's College London Institute of Psychiatry, Psychology and Neuroscience, De Crespigny Park, London, United Kingdom
- Lifang Li
- King's College London Institute of Psychiatry, Psychology and Neuroscience, De Crespigny Park, London, United Kingdom
- Aurelie Mascio
- King's College London Institute of Psychiatry, Psychology and Neuroscience, De Crespigny Park, London, United Kingdom
- Jyoti Sanyal
- Biomedical Research Centre, South London and Maudsley National Health Service (NHS) Foundation Trust, London, United Kingdom
- Gioulaina Kadra-Scalzo
- King's College London Institute of Psychiatry, Psychology and Neuroscience, De Crespigny Park, London, United Kingdom
- Angus Roberts
- King's College London Institute of Psychiatry, Psychology and Neuroscience, De Crespigny Park, London, United Kingdom
- Marcus Williams
- Biomedical Research Centre, South London and Maudsley National Health Service (NHS) Foundation Trust, London, United Kingdom
- Sandwell and West Birmingham Hospitals National Health Service (NHS) Trust, West Bromwich, United Kingdom
- Robert Stewart
- King's College London Institute of Psychiatry, Psychology and Neuroscience, De Crespigny Park, London, United Kingdom
- Biomedical Research Centre, South London and Maudsley National Health Service (NHS) Foundation Trust, London, United Kingdom
7
Wen A, Wang L, He H, Fu S, Liu S, Hanauer DA, Harris DR, Kavuluru R, Zhang R, Natarajan K, Pavinkurve NP, Hajagos J, Rajupet S, Lingam V, Saltz M, Elowsky C, Moffitt RA, Koraishy FM, Palchuk MB, Donovan J, Lingrey L, Stone-DerHagopian G, Miller RT, Williams AE, Leese PJ, Kovach PI, Pfaff ER, Zemmel M, Pates RD, Guthe N, Haendel MA, Chute CG, Liu H. A Case Demonstration of the Open Health Natural Language Processing Toolkit From the National COVID-19 Cohort Collaborative and the Researching COVID to Enhance Recovery Programs for a Natural Language Processing System for COVID-19 or Postacute Sequelae of SARS CoV-2 Infection: Algorithm Development and Validation. JMIR Med Inform 2024; 12:e49997. [PMID: 39250782 PMCID: PMC11420592 DOI: 10.2196/49997] [Received: 06/15/2023] [Revised: 12/11/2023] [Accepted: 03/01/2024] [Indexed: 09/11/2024]
Abstract
BACKGROUND A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations are exacerbated when the task is emergent, as is the case currently for NLP extraction of signs and symptoms of COVID-19 and postacute sequelae of SARS-CoV-2 infection (PASC). OBJECTIVE This study aims to highlight the current limitations of existing NLP algorithm development approaches that are exacerbated by NLP tasks surrounding emergent clinical concepts and to illustrate our approach to addressing these issues through the use case of developing an NLP system for the signs and symptoms of COVID-19 and PASC. METHODS We used 2 preexisting studies on PASC as a baseline to determine a set of concepts that should be extracted by NLP. This concept list was then used in conjunction with the Unified Medical Language System to autonomously generate an expanded lexicon to weakly annotate a training set, which was then reviewed by a human expert to generate a fine-tuned NLP algorithm. The annotations from a fully human-annotated test set were then compared with NLP results from the fine-tuned algorithm. The NLP algorithm was then deployed to 10 additional sites that were also running our NLP infrastructure. Of these 10 sites, 5 were used to conduct a federated evaluation of the NLP algorithm. RESULTS An NLP algorithm consisting of 12,234 unique normalized text strings corresponding to 2366 unique concepts was developed to extract COVID-19 or PASC signs and symptoms. An unweighted mean dictionary coverage of 77.8% was found for the 5 sites. CONCLUSIONS The evolutionary and time-critical nature of the PASC NLP task significantly complicates existing approaches to NLP algorithm development. 
In this work, we present a hybrid approach using the Open Health Natural Language Processing Toolkit aimed at addressing these needs with a dictionary-based weak labeling step that minimizes the need for additional expert annotation while still preserving the fine-tuning capabilities of expert involvement.
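The "unweighted mean dictionary coverage" reported across sites can be illustrated with a minimal sketch. The definition of coverage used here, the share of a site's annotated mention strings that appear in the dictionary after normalization, is an assumption for illustration, since the abstract does not define the measure. The dictionary entries and site data are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so string variants compare equal."""
    return " ".join(text.lower().split())


def site_coverage(dictionary, site_mentions):
    """Share of a site's mentions found in the dictionary (assumed definition)."""
    if not site_mentions:
        raise ValueError("site has no mentions")
    known = {normalize(term) for term in dictionary}
    covered = sum(normalize(m) in known for m in site_mentions)
    return covered / len(site_mentions)


def unweighted_mean_coverage(dictionary, mentions_by_site):
    """Average the per-site coverage, giving each site equal weight."""
    scores = [site_coverage(dictionary, m) for m in mentions_by_site.values()]
    return sum(scores) / len(scores)


# Hypothetical dictionary and annotated mentions from two sites.
dictionary = ["shortness of breath", "fatigue", "loss of smell", "brain fog"]
mentions_by_site = {
    "site_a": ["Fatigue", "brain  fog", "chest tightness", "fatigue"],
    "site_b": ["Loss of smell", "palpitations"],
}
print(unweighted_mean_coverage(dictionary, mentions_by_site))
```

An unweighted mean treats a small site the same as a large one; pooling all mentions before dividing would instead weight sites by volume and generally gives a different number.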
Affiliation(s)
- Andrew Wen
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- McWilliams School of Biomedical Informatics, University of Texas Health Sciences Center at Houston, Houston, TX, United States
- Liwei Wang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- McWilliams School of Biomedical Informatics, University of Texas Health Sciences Center at Houston, Houston, TX, United States
- Huan He
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Sunyang Fu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- McWilliams School of Biomedical Informatics, University of Texas Health Sciences Center at Houston, Houston, TX, United States
- Sijia Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- David A Hanauer
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, United States
- Daniel R Harris
- Institute for Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Kentucky, Lexington, KY, United States
- Ramakanth Kavuluru
- Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, United States
- Rui Zhang
- Division of Health Data Science, University of Minnesota Medical School, Minneapolis, MN, United States
- Karthik Natarajan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, United States
- Nishanth P Pavinkurve
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, United States
- Janos Hajagos
- Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
- Sritha Rajupet
- Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
- Veena Lingam
- Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
- Mary Saltz
- Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
- Corey Elowsky
- Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
- Richard A Moffitt
- Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
- Farrukh M Koraishy
- Division of Nephrology, Stony Brook Medicine, Stony Brook, NY, United States
- Robert T Miller
- Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, United States
- Andrew E Williams
- Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, United States
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States
- Peter J Leese
- North Carolina Translational and Clinical Sciences Institute, University of North Carolina School of Medicine, Chapel Hill, NC, United States
- Paul I Kovach
- North Carolina Translational and Clinical Sciences Institute, University of North Carolina School of Medicine, Chapel Hill, NC, United States
- Emily R Pfaff
- North Carolina Translational and Clinical Sciences Institute, University of North Carolina School of Medicine, Chapel Hill, NC, United States
- Mikhail Zemmel
- University of Virginia, Charlottesville, VA, United States
- Robert D Pates
- University of Virginia, Charlottesville, VA, United States
- Nick Guthe
- Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Melissa A Haendel
- University of Colorado Anschutz Medical Campus, Denver, CO, United States
- Christopher G Chute
- Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, United States
- Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- McWilliams School of Biomedical Informatics, University of Texas Health Sciences Center at Houston, Houston, TX, United States
8
Wei R. Automated Medical Records Review for Mild Cognitive Impairment and Dementia. RESEARCH SQUARE 2024:rs.3.rs-5046441. [PMID: 39315274 PMCID: PMC11419186 DOI: 10.21203/rs.3.rs-5046441/v1] [Indexed: 09/25/2024]
Abstract
Objectives Unstructured and structured data in electronic health records (EHR) are a rich source of information for research and quality improvement studies. However, extracting accurate information from EHR is labor-intensive. Here we introduce an automated EHR phenotyping model to identify patients with Alzheimer's Disease, related dementias (ADRD), or mild cognitive impairment (MCI). Methods We assembled medical notes and associated International Classification of Diseases (ICD) codes and medication prescriptions from 3,626 outpatient adults from two hospitals seen between February 2015 and June 2022. Ground truth annotations regarding the presence vs. absence of a diagnosis of MCI or ADRD were determined through manual chart review. Indicators extracted from notes included the presence of keywords and phrases in unstructured clinical notes, prescriptions of medications associated with MCI/ADRD, and ICD codes associated with MCI/ADRD. We trained a regularized logistic regression model to predict the ground truth annotations. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), accuracy, specificity, precision/positive predictive value, recall/sensitivity, and F1 score (harmonic mean of precision and recall). Results Thirty percent of patients in the cohort carried diagnoses of MCI/ADRD based on manual review. When evaluated on a held-out test set, the best model using clinical notes, ICDs, and medications achieved an AUROC of 0.98, an AUPRC of 0.98, an accuracy of 0.93, a sensitivity (recall) of 0.91, a specificity of 0.96, a precision of 0.96, and an F1 score of 0.93. The estimated overall accuracy for patients randomly selected from EHRs was 99.88%. Conclusion Automated EHR phenotyping accurately identifies patients with MCI/ADRD based on clinical notes, ICD codes, and medication records. This approach holds potential for large-scale MCI/ADRD research utilizing EHR databases.
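Since the abstract defines the F1 score as the harmonic mean of precision and recall, the reported figures can be checked for internal consistency. The short sketch below does only that arithmetic, using the precision (0.96) and recall (0.91) stated above; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall from the abstract.
print(round(f1_score(0.96, 0.91), 2))  # 0.93
```

The result matches the reported F1 score of 0.93.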
|
9
|
Fu YV, Ramachandran GK, Halwani A, McInnes BT, Xia F, Lybarger K, Yetisgen M, Uzuner Ö. CACER: Clinical concept Annotations for Cancer Events and Relations. J Am Med Inform Assoc 2024:ocae231. [PMID: 39225779 DOI: 10.1093/jamia/ocae231] [Received: 06/05/2024] [Revised: 08/08/2024] [Accepted: 08/12/2024] [Indexed: 09/04/2024] Open
Abstract
OBJECTIVE Clinical notes contain unstructured representations of patient histories, including the relationships between medical problems and prescription drugs. To investigate the relationship between cancer drugs and their associated symptom burden, we extract structured, semantic representations of medical problem and drug information from the clinical narratives of oncology notes. MATERIALS AND METHODS We present Clinical concept Annotations for Cancer Events and Relations (CACER), a novel corpus with fine-grained annotations for over 48 000 medical problems and drug events and 10 000 drug-problem and problem-problem relations. Leveraging CACER, we develop and evaluate transformer-based information extraction models such as Bidirectional Encoder Representations from Transformers (BERT), Fine-tuned Language Net Text-To-Text Transfer Transformer (Flan-T5), Large Language Model Meta AI (Llama3), and Generative Pre-trained Transformers-4 (GPT-4) using fine-tuning and in-context learning (ICL). RESULTS In event extraction, the fine-tuned BERT and Llama3 models achieved the highest performance at 88.2-88.0 F1, which is comparable to the inter-annotator agreement (IAA) of 88.4 F1. In relation extraction, the fine-tuned BERT, Flan-T5, and Llama3 achieved the highest performance at 61.8-65.3 F1. GPT-4 with ICL achieved the worst performance across both tasks. DISCUSSION The fine-tuned models significantly outperformed GPT-4 in ICL, highlighting the importance of annotated training data and model optimization. Furthermore, the BERT models performed similarly to Llama3. For our task, large language models offer no performance advantage over the smaller BERT models. CONCLUSIONS We introduce CACER, a novel corpus with fine-grained annotations for medical problems, drugs, and their relationships in clinical narratives of oncology notes. State-of-the-art transformer models achieved performance comparable to IAA for several extraction tasks.
Affiliation(s)
- Yujuan Velvin Fu
- Department of Biomedical Informatics & Medical Education, University of Washington, Seattle, WA 98195, United States
| | | | - Ahmad Halwani
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
| | - Bridget T McInnes
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
| | - Fei Xia
- Department of Linguistics, University of Washington, Seattle, WA 98195, United States
| | - Kevin Lybarger
- Department of Information Sciences and Technology, George Mason University, Fairfax, VA 22030, United States
| | - Meliha Yetisgen
- Department of Biomedical Informatics & Medical Education, University of Washington, Seattle, WA 98195, United States
| | - Özlem Uzuner
- Department of Information Sciences and Technology, George Mason University, Fairfax, VA 22030, United States
|
10
|
Cho H, Yoo S, Kim B, Jang S, Sunwoo L, Kim S, Lee D, Kim S, Nam S, Chung JH. Extracting lung cancer staging descriptors from pathology reports: A generative language model approach. J Biomed Inform 2024; 157:104720. [PMID: 39233209 DOI: 10.1016/j.jbi.2024.104720] [Received: 06/10/2024] [Revised: 08/04/2024] [Accepted: 08/31/2024] [Indexed: 09/06/2024]
Abstract
BACKGROUND In oncology, electronic health records contain textual key information for the diagnosis, staging, and treatment planning of patients with cancer. However, text data processing requires a lot of time and effort, which limits the utilization of these data. Recent advances in natural language processing (NLP) technology, including large language models, can be applied to cancer research. Particularly, extracting the information required for the pathological stage from surgical pathology reports can be utilized to update cancer staging according to the latest cancer staging guidelines. OBJECTIVES This study has two main objectives. The first objective is to evaluate the performance of extracting information from text-based surgical pathology reports and determining pathological stages based on the extracted information using fine-tuned generative language models (GLMs) for patients with lung cancer. The second objective is to determine the feasibility of utilizing relatively small GLMs for information extraction in a resource-constrained computing environment. METHODS Lung cancer surgical pathology reports were collected from the Common Data Model database of Seoul National University Bundang Hospital (SNUBH), a tertiary hospital in Korea. We selected 42 descriptors necessary for tumor-node (TN) classification based on these reports and created a gold standard with validation by two clinical experts. The pathology reports and gold standard were used to generate prompt-response pairs for training and evaluating GLMs which then were used to extract information required for staging from pathology reports. RESULTS We evaluated the information extraction performance of six trained models as well as their performance in TN classification using the extracted information. 
The Deductive Mistral-7B model, which was pre-trained with the deductive dataset, showed the best performance overall, with an exact match ratio of 92.24% in the information extraction problem and an accuracy of 0.9876 (predicting T and N classification concurrently) in classification. CONCLUSION This study demonstrated that training GLMs with deductive datasets can improve information extraction performance, and GLMs with a relatively small number of parameters at approximately seven billion can achieve high performance in this problem. The proposed GLM-based information extraction method is expected to be useful in clinical decision-making support, lung cancer staging and research.
Affiliation(s)
- Hyeongmin Cho
- ezCaretech Research & Development Center, Jung-gu, Seoul, Republic of Korea
| | - Sooyoung Yoo
- Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
| | - Borham Kim
- Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
| | - Sowon Jang
- Department of Radiology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
| | - Leonard Sunwoo
- Department of Radiology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
| | - Sanghwan Kim
- ezCaretech Research & Development Center, Jung-gu, Seoul, Republic of Korea
| | - Donghyoung Lee
- ezCaretech Research & Development Center, Jung-gu, Seoul, Republic of Korea
| | - Seok Kim
- Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
| | - Sejin Nam
- ezCaretech Research & Development Center, Jung-gu, Seoul, Republic of Korea.
| | - Jin-Haeng Chung
- Department of Pathology, Seoul National University College of Medicine, Seoul, Republic of Korea; Department of Pathology and Translational Medicine Seoul National University Bundang Hospital, Seongnam, Republic of Korea.
|
11
|
Hu Y, Chen Q, Du J, Peng X, Keloth VK, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, Roberts K, Xu H. Improving large language models for clinical named entity recognition via prompt engineering. J Am Med Inform Assoc 2024; 31:1812-1820. [PMID: 38281112 PMCID: PMC11339492 DOI: 10.1093/jamia/ocad259] [Received: 09/28/2023] [Revised: 12/15/2023] [Accepted: 12/26/2023] [Indexed: 01/29/2024] Open
Abstract
IMPORTANCE The study highlights the potential of large language models, specifically GPT-3.5 and GPT-4, in processing complex clinical data and extracting meaningful information with minimal training data. By developing and refining prompt-based strategies, we can significantly enhance the models' performance, making them viable tools for clinical NER tasks and possibly reducing the reliance on extensive annotated datasets. OBJECTIVES This study quantifies the capabilities of GPT-3.5 and GPT-4 for clinical named entity recognition (NER) tasks and proposes task-specific prompts to improve their performance. MATERIALS AND METHODS We evaluated these models on 2 clinical NER tasks: (1) to extract medical problems, treatments, and tests from clinical notes in the MTSamples corpus, following the 2010 i2b2 concept extraction shared task, and (2) to identify nervous system disorder-related adverse events from safety reports in the vaccine adverse event reporting system (VAERS). To improve the GPT models' performance, we developed a clinical task-specific prompt framework that includes (1) baseline prompts with task description and format specification, (2) annotation guideline-based prompts, (3) error analysis-based instructions, and (4) annotated samples for few-shot learning. We assessed each prompt's effectiveness and compared the models to BioClinicalBERT. RESULTS Using baseline prompts, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.634 and 0.804, respectively, for MTSamples and 0.301 and 0.593 for VAERS. Additional prompt components consistently improved model performance. When all 4 components were used, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.794 and 0.861 for MTSamples and 0.676 and 0.736 for VAERS, demonstrating the effectiveness of our prompt framework. Although these results trail BioClinicalBERT (F1 of 0.901 for the MTSamples dataset and 0.802 for VAERS), they are very promising considering how few training samples are needed. 
DISCUSSION The study's findings suggest a promising direction in leveraging LLMs for clinical NER tasks. However, while the performance of GPT models improved with task-specific prompts, there's a need for further development and refinement. LLMs like GPT-4 show potential in achieving close performance to state-of-the-art models like BioClinicalBERT, but they still require careful prompt engineering and understanding of task-specific knowledge. The study also underscores the importance of evaluation schemas that accurately reflect the capabilities and performance of LLMs in clinical settings. CONCLUSION While direct application of GPT models to clinical NER tasks falls short of optimal performance, our task-specific prompt framework, incorporating medical knowledge and training samples, significantly enhances GPT models' feasibility for potential clinical applications.
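The four-component prompt framework described in this abstract (baseline task and format description, annotation guidelines, error-analysis instructions, and few-shot samples) can be pictured as simple string assembly. The following is a minimal sketch under our own assumptions: the function name `build_ner_prompt`, the section labels, and all example text are hypothetical and are not the prompts used in the study.

```python
def build_ner_prompt(note, guidelines=None, error_instructions=None, examples=None):
    """Assemble a prompt from up to four components; only the baseline is mandatory."""
    # Component 1: baseline task description and output format (hypothetical wording).
    parts = [
        "Task: extract medical problems, treatments, and tests from the clinical note.",
        "Format: return one entity per line as <type>: <text>.",
    ]
    # Component 2: annotation guideline-based text.
    if guidelines:
        parts.append("Annotation guidelines:\n" + guidelines)
    # Component 3: instructions derived from error analysis.
    if error_instructions:
        parts.append("Common errors to avoid:\n" + error_instructions)
    # Component 4: annotated samples for few-shot learning.
    if examples:
        shots = "\n".join(f"Note: {n}\nEntities: {e}" for n, e in examples)
        parts.append("Examples:\n" + shots)
    # The note to annotate always comes last.
    parts.append("Note: " + note + "\nEntities:")
    return "\n\n".join(parts)


# Invented inputs, for illustration only.
prompt = build_ner_prompt(
    note="Patient started on metformin for type 2 diabetes.",
    guidelines="Include modifiers that are part of the problem name.",
    error_instructions="Do not label section headers as entities.",
    examples=[("Chest x-ray showed pneumonia.", "test: chest x-ray; problem: pneumonia")],
)
print(prompt)
```

Keeping each component optional makes it straightforward to compare a baseline prompt against prompts with components added one at a time, which is the kind of comparison the abstract reports.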
Affiliation(s)
- Yan Hu
- McWilliams School of Biomedical Informatics, Houston, TX, United States
| | - Qingyu Chen
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, United States
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
| | - Jingcheng Du
- McWilliams School of Biomedical Informatics, Houston, TX, United States
| | - Xueqing Peng
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, United States
| | - Vipina Kuttichi Keloth
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, United States
| | - Xu Zuo
- McWilliams School of Biomedical Informatics, Houston, TX, United States
| | - Yujia Zhou
- McWilliams School of Biomedical Informatics, Houston, TX, United States
| | - Zehan Li
- McWilliams School of Biomedical Informatics, Houston, TX, United States
| | - Xiaoqian Jiang
- McWilliams School of Biomedical Informatics, Houston, TX, United States
| | - Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
| | - Kirk Roberts
- McWilliams School of Biomedical Informatics, Houston, TX, United States
| | - Hua Xu
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, United States
|
12
|
Fang Y, Ryan P, Weng C. Knowledge-guided generative artificial intelligence for automated taxonomy learning from drug labels. J Am Med Inform Assoc 2024; 31:2065-2075. [PMID: 38787964 PMCID: PMC11339527 DOI: 10.1093/jamia/ocae105] [Received: 03/07/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/26/2024] Open
Abstract
OBJECTIVES To automatically construct a drug indication taxonomy from drug labels using generative Artificial Intelligence (AI) represented by the Large Language Model (LLM) GPT-4 and real-world evidence (RWE). MATERIALS AND METHODS We extracted indication terms from 46 421 free-text drug labels using GPT-4, iteratively and recursively generated indication concepts and inferred indication concept-to-concept and concept-to-term subsumption relations by integrating GPT-4 with RWE, and created a drug indication taxonomy. Quantitative and qualitative evaluations involving domain experts were performed for cardiovascular (CVD), Endocrine, and Genitourinary system diseases. RESULTS 2909 drug indication terms were extracted and assigned into 24 high-level indication categories (ie, initially generated concepts), each of which was expanded into a sub-taxonomy. For example, the CVD sub-taxonomy contains 242 concepts, spanning a depth of 11, with 170 being leaf nodes. It collectively covers a total of 234 indication terms associated with 189 distinct drugs. The accuracies of GPT-4 on determining the drug indication hierarchy exceeded 0.7 with "good to very good" inter-rater reliability. However, the accuracies of the concept-to-term subsumption relation checking varied greatly, with "fair to moderate" reliability. DISCUSSION AND CONCLUSION We successfully used generative AI and RWE to create a taxonomy, with drug indications adequately consistent with domain expert expectations. We show that LLMs are good at deriving their own concept hierarchies but still fall short in determining the subsumption relations between concepts and terms in unregulated language from free-text drug labels, which is the same hard task for human experts.
Affiliation(s)
- Yilu Fang
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
| | - Patrick Ryan
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- Observational Health Data Analytics, Janssen Research and Development, Titusville, NJ 08560, United States
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
|
13
|
Zhou H, Li M, Xiao Y, Yang H, Zhang R. LEAP: LLM instruction-example adaptive prompting framework for biomedical relation extraction. J Am Med Inform Assoc 2024; 31:2010-2018. [PMID: 38904416 PMCID: PMC11339510 DOI: 10.1093/jamia/ocae147] [Received: 12/15/2023] [Revised: 05/26/2024] [Accepted: 06/03/2024] [Indexed: 06/22/2024] Open
Abstract
OBJECTIVE To investigate the role of demonstrations in large language models (LLMs) for biomedical relation extraction. This study introduces a framework comprising three types of adaptive tuning methods to assess their impacts and effectiveness. MATERIALS AND METHODS Our study was conducted in two phases. Initially, we analyzed a range of demonstration components vital for LLMs' biomedical data capabilities, including task descriptions and examples, experimenting with various combinations. Subsequently, we introduced the LLM instruction-example adaptive prompting (LEAP) framework, including instruction adaptive tuning, example adaptive tuning, and instruction-example adaptive tuning methods. This framework aims to systematically investigate both adaptive task descriptions and adaptive examples within the demonstration. We assessed the performance of the LEAP framework on the DDI, ChemProt, and BioRED datasets, employing LLMs such as Llama2-7b, Llama2-13b, and MedLLaMA_13B. RESULTS Our findings indicated that Instruction + Options + Example and its expanded form substantially improved F1 scores over the standard Instruction + Options mode for zero-shot LLMs. The LEAP framework, particularly through its example adaptive prompting, demonstrated superior performance over conventional instruction tuning across all models. Notably, the MedLLaMA_13B model achieved an exceptional F1 score of 95.13 on the ChemProt dataset using this method. Significant improvements were also observed in the DDI 2013 and BioRED datasets, confirming the method's robustness in sophisticated data extraction scenarios. CONCLUSION The LEAP framework offers a compelling strategy for enhancing LLM training, steering away from extensive fine-tuning towards more dynamic and contextually enriched prompting methodologies, as showcased in biomedical relation extraction.
Affiliation(s)
- Huixue Zhou
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
| | - Mingchen Li
- Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
| | - Yongkang Xiao
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
| | - Han Yang
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
| | - Rui Zhang
- Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
|
14
|
Tran H, Yang Z, Yao Z, Yu H. BioInstruct: instruction tuning of large language models for biomedical natural language processing. J Am Med Inform Assoc 2024; 31:1821-1832. [PMID: 38833265 PMCID: PMC11339494 DOI: 10.1093/jamia/ocae122] [Received: 11/06/2023] [Revised: 05/10/2023] [Accepted: 05/14/2024] [Indexed: 06/06/2024] Open
Abstract
OBJECTIVES To enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles. MATERIALS AND METHODS We created BioInstruct, comprising 25 005 instructions, to instruction-tune LLMs (LLaMA 1 and 2, 7B and 13B versions). The instructions were created by prompting the GPT-4 language model with 3 seed samples randomly drawn from 80 human-curated instructions. We employed Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. We then evaluated these instruction-tuned LLMs on several BioNLP tasks, which can be grouped into 3 major categories: question answering (QA), information extraction (IE), and text generation (GEN). We also examined whether categories (eg, QA, IE, and generation) of instructions impact model performance. RESULTS AND DISCUSSION Compared with LLMs without instruction tuning, our instruction-tuned LLMs demonstrated marked performance gains: 17.3% in QA on average accuracy metric, 5.7% in IE on average F1 metric, and 96% in Generation tasks on average GPT-4 score metric. Our 7B-parameter instruction-tuned LLaMA 1 model was competitive with or even surpassed other LLMs in the biomedical domain that were also fine-tuned from LLaMA 1 with vast domain-specific data or a variety of tasks. Our results also show that the performance gain is significantly higher when instruction fine-tuning is conducted with closely related tasks. Our findings align with the observations of multi-task learning, suggesting synergies between 2 tasks. CONCLUSION The BioInstruct dataset serves as a valuable resource, and instruction-tuned LLMs lead to the best-performing BioNLP applications.
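To illustrate why Low-Rank Adaptation (LoRA) counts as parameter-efficient, the sketch below compares the number of trainable parameters for one weight matrix under full fine-tuning and under a rank-r update of the form W + BA. The matrix size (4096 x 4096) and the rank (8) are hypothetical values chosen for illustration; the abstract does not report the configuration used in the study.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a low-rank update B @ A,
    where B is d_out x rank and A is rank x d_in."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, not taken from the paper.
d_out = d_in = 4096
rank = 8

full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(full, lora, lora / full)
```

Under these assumed values, the low-rank update trains well under 1% of the parameters of the full matrix, which is the general motivation for using LoRA when tuning 7B and 13B models.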
Affiliation(s)
- Hieu Tran
- Manning College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, United States
| | - Zhichao Yang
- Manning College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, United States
| | - Zonghai Yao
- Manning College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, United States
| | - Hong Yu
- Manning College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, United States
- Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01655, United States
- Center for Biomedical and Health Research in Data Sciences, Miner School of Computer and Information Sciences, University of Massachusetts Lowell, Lowell, MA 01854, United States
- Center for Healthcare Organization and Implementation Research, VA Bedford Health Care, Bedford, MA 01730, United States
|
15
|
Albashayreh A, Bandyopadhyay A, Zeinali N, Zhang M, Fan W, Gilbertson White S. Natural Language Processing Accurately Differentiates Cancer Symptom Information in Electronic Health Record Narratives. JCO Clin Cancer Inform 2024; 8:e2300235. [PMID: 39116379 DOI: 10.1200/cci.23.00235] [Received: 11/10/2023] [Revised: 04/29/2024] [Accepted: 05/30/2024] [Indexed: 08/10/2024] Open
Abstract
PURPOSE Identifying cancer symptoms in electronic health record (EHR) narratives is feasible with natural language processing (NLP). However, more efficient NLP systems are needed to detect various symptoms and distinguish observed symptoms from negated symptoms and medication-related side effects. We evaluated the accuracy of NLP in (1) detecting 14 symptom groups (ie, pain, fatigue, swelling, depressed mood, anxiety, nausea/vomiting, pruritus, headache, shortness of breath, constipation, numbness/tingling, decreased appetite, impaired memory, disturbed sleep) and (2) distinguishing observed symptoms in EHR narratives among patients with cancer. METHODS We extracted 902,508 notes for 11,784 unique patients diagnosed with cancer and developed a gold standard corpus of 1,112 notes labeled for presence or absence of 14 symptom groups. We trained an embeddings-augmented NLP system integrating human and machine intelligence and conventional machine learning algorithms. NLP metrics were calculated on a gold standard corpus subset for testing. RESULTS The interannotator agreement for labeling the gold standard corpus was excellent at 92%. The embeddings-augmented NLP model achieved the best performance (F1 score = 0.877). The highest NLP accuracy was observed in pruritus (F1 score = 0.937) while the lowest accuracy was in swelling (F1 score = 0.787). After classifying the entire data set with embeddings-augmented NLP, we found that 41% of the notes included symptom documentation. Pain was the most documented symptom (29% of all notes) while impaired memory was the least documented (0.7% of all notes). CONCLUSION We illustrated the feasibility of detecting 14 symptom groups in EHR narratives and showed that an embeddings-augmented NLP system outperforms conventional machine learning algorithms in detecting symptom information and differentiating observed symptoms from negated symptoms and medication-related side effects.
Affiliation(s)
| | | | | | - Min Zhang
- School of Economics and Management, Communication University of China, Beijing, China
| | - Weiguo Fan
- Tippie College of Business, University of Iowa, Iowa City, IA
|
16
|
Munzone E, Marra A, Comotto F, Guercio L, Sangalli CA, Lo Cascio M, Pagan E, Sangalli D, Bigoni I, Porta FM, D'Ercole M, Ritorti F, Bagnardi V, Fusco N, Curigliano G. Development and Validation of a Natural Language Processing Algorithm for Extracting Clinical and Pathological Features of Breast Cancer From Pathology Reports. JCO Clin Cancer Inform 2024; 8:e2400034. [PMID: 39137368 DOI: 10.1200/cci.24.00034] [Received: 02/12/2024] [Revised: 04/26/2024] [Accepted: 06/25/2024] [Indexed: 08/15/2024] Open
Abstract
PURPOSE Electronic health records (EHRs) are valuable information repositories that offer insights for enhancing clinical research on breast cancer (BC) using real-world data. The objective of this study was to develop a natural language processing (NLP) model specifically designed to extract structured data from BC pathology reports written in natural language. METHODS During the initial phase, the algorithm's development cohort comprised 193 pathology reports from 116 patients with BC from 2012 to 2016. A rule-based NLP algorithm was applied to extract 26 variables for analysis and was compared with the manual extraction of data performed by both a data entry specialist and an oncologist. Following the first approach, the data set was expanded to include 513 reports, and a Named Entity Recognition (NER)-NLP model was trained and evaluated using K-fold cross-validation. RESULTS The first approach led to a concordance analysis, which revealed an 82.9% agreement between the algorithm and the oncologist, whereas the concordance between the data entry specialist and the oncologist was 90.8%. The second training approach introduced the definition of an NER-NLP model, in which the accuracy showed remarkable potential (97.8%). Notably, the model demonstrated remarkable performance, especially for parameters such as estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and Ki-67 (F1-score 1.0). CONCLUSION The present study aligns with the rapidly evolving field of artificial intelligence (AI) applications in oncology, seeking to expedite the development of complex cancer databases and registries. The results of the model are currently undergoing postprocessing procedures to organize the data into tabular structures, facilitating their utilization in real-world clinical and research endeavors.
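The K-fold cross-validation mentioned in this abstract can be sketched as a plain index split over the 513 reports. The abstract does not state the value of K or whether reports were shuffled, so K = 5 and the absence of shuffling below are assumptions made only for illustration, and `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_items: int, k: int):
    """Split indices 0..n_items-1 into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs; the first
    n_items % k folds receive one extra item so every item is tested once.
    """
    base, extra = divmod(n_items, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_items))
        folds.append((train_idx, test_idx))
        start += size
    return folds


# 513 reports as in the abstract; k = 5 is an assumed value.
folds = k_fold_indices(513, 5)
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Each report appears in exactly one test fold, so every report contributes to the evaluation once while the model is trained on the remaining folds.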
Affiliation(s)
- Elisabetta Munzone
- Division of Medical Senology, European Institute of Oncology IRCCS, Milan, Italy
| | - Antonio Marra
- Division of Early Drug Development for Innovative Therapies, European Institute of Oncology IRCCS, Milan, Italy
| | | | | | | | - Martina Lo Cascio
- Central Management of Information Systems and Technologies, European Institute of Oncology IRCCS, Milan, Italy
| | - Eleonora Pagan
- Department of Statistics and Quantitative Methods, University of Milan-Bicocca, Milan, Italy
| | - Davide Sangalli
- Central Management of Information Systems and Technologies, European Institute of Oncology IRCCS, Milan, Italy
| | | | | | - Marianna D'Ercole
- Division of Pathology, European Institute of Oncology IRCCS, Milan, Italy
| | | | - Vincenzo Bagnardi
- Department of Statistics and Quantitative Methods, University of Milan-Bicocca, Milan, Italy
| | - Nicola Fusco
- Division of Pathology, European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
| | - Giuseppe Curigliano
- Division of Early Drug Development for Innovative Therapies, European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
|
17
|
Alkhalaf M, Yu P, Yin M, Deng C. Applying generative AI with retrieval augmented generation to summarize and extract key clinical information from electronic health records. J Biomed Inform 2024; 156:104662. [PMID: 38880236 DOI: 10.1016/j.jbi.2024.104662] [Received: 02/03/2024] [Revised: 05/25/2024] [Accepted: 05/28/2024] [Indexed: 06/18/2024]
Abstract
BACKGROUND Malnutrition is a prevalent issue in residential aged care facilities (RACFs), leading to adverse health outcomes. The ability to efficiently extract key clinical information from a large volume of data in electronic health records (EHR) can improve understanding of the extent of the problem and support the development of effective interventions. This research aimed to test the efficacy of zero-shot prompt engineering applied to generative artificial intelligence (AI) models, on their own and in combination with retrieval augmented generation (RAG), for automating the tasks of summarizing both structured and unstructured data in EHR and extracting important malnutrition information. METHODOLOGY We utilized the Llama 2 13B model with zero-shot prompting. The dataset comprises unstructured and structured EHRs related to malnutrition management in 40 Australian RACFs. We first applied zero-shot learning to the model alone, then combined it with RAG to accomplish two tasks: generate structured summaries about the nutritional status of a client and extract key information about malnutrition risk factors. We utilized 25 notes in the first task and 1,399 in the second task. We evaluated the model's output of each task manually against a gold standard dataset. RESULT The evaluation outcomes indicated that zero-shot learning applied to a generative AI model is highly effective in summarizing and extracting information about the nutritional status of RACFs' clients. The generated summaries provided a concise and accurate representation of the original data with an overall accuracy of 93.25%. The addition of RAG improved the summarization process, leading to a 6-percentage-point increase and achieving an accuracy of 99.25%. The model also proved its capability in extracting risk factors with an accuracy of 90%. However, adding RAG did not further improve accuracy in this task. 
Overall, the model showed robust performance when information was explicitly stated in the notes; however, it was prone to hallucination, particularly when details were not explicitly provided. CONCLUSION This study demonstrates the high performance and limitations of applying zero-shot learning to generative AI models for automatically generating structured summaries of EHR data and extracting key clinical information. The inclusion of the RAG approach improved the model performance and mitigated the hallucination problem.
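The general idea of combining zero-shot prompting with retrieval augmented generation (RAG) can be sketched as two steps: retrieve the notes most relevant to a question, then place only those notes in the prompt. The sketch below is a toy illustration under our own assumptions: it ranks notes by word overlap as a stand-in for whatever retriever the study used, the prompt wording is hypothetical, and the example notes are invented rather than taken from the study data.

```python
def tokenize(text: str) -> set:
    """Lowercase whitespace tokenization; deliberately simplistic."""
    return set(text.lower().split())


def retrieve(query: str, notes: list, top_k: int = 2) -> list:
    """Rank notes by word overlap with the query (a stand-in for a real retriever)."""
    q = tokenize(query)
    ranked = sorted(notes, key=lambda n: len(q & tokenize(n)), reverse=True)
    return ranked[:top_k]


def build_rag_prompt(question: str, notes: list, top_k: int = 2) -> str:
    """Zero-shot prompt whose context is limited to the retrieved notes."""
    context = "\n".join(f"- {n}" for n in retrieve(question, notes, top_k))
    return (
        "Answer using only the notes below. "
        "If the notes do not state the answer, reply 'not documented'.\n"
        f"Notes:\n{context}\n"
        f"Question: {question}\n"
    )


# Invented notes, for illustration only.
notes = [
    "Resident reports poor appetite and weight loss over two months.",
    "Physiotherapy session completed without incident.",
    "Dietitian review: weight loss noted, supplements started.",
]
prompt = build_rag_prompt("weight loss and appetite", notes)
print(prompt)
```

Restricting the context to retrieved notes and instructing the model to decline when the notes are silent reflects the kind of grounding that, per the abstract, helped mitigate hallucination.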
Affiliation(s)
- Mohammad Alkhalaf
- School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia; School of Computer Science, Qassim University, Qassim 51452, Saudi Arabia
| | - Ping Yu
- School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia.
| | - Mengyang Yin
- Opal Healthcare, Level 11/420 George St, Sydney NSW 2000, Australia
| | - Chao Deng
- School of Medical, Indigenous and Health Sciences, University of Wollongong, Wollongong, NSW 2522, Australia
| |
|
18
|
Kempf E, Priou S, Cohen A, Redjdal A, Guével E, Tannier X. The More, the Better? Modalities of Metastatic Status Extraction on Free Medical Reports Based on Natural Language Processing. JCO Clin Cancer Inform 2024; 8:e2400026. [PMID: 39186702 DOI: 10.1200/cci.24.00026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 04/26/2024] [Accepted: 05/10/2024] [Indexed: 08/28/2024] Open
Affiliation(s)
- Emmanuelle Kempf, MD, PhD
- Sorbonne Université, Inserm, Université Sorbonne Paris-Nord, LIMICS, Paris, France; Department of Medical Oncology, Assistance Publique Hôpitaux de Paris, Henri Mondor Teaching Hospital, Paris, France
| | - Sonia Priou, MSc
- Université Paris Saclay, CentraleSupélec, Laboratoire Génie Industriel, Gif-sur-Yvette, France
| | - Ariel Cohen, MSc
- IT Department, Innovation and Data, Assistance Publique Hôpitaux de Paris, Paris, France
| | - Akram Redjdal, PhD
- Sorbonne Université, Inserm, Université Sorbonne Paris-Nord, LIMICS, Paris, France
| | - Etienne Guével, MSc
- IT Department, Innovation and Data, Assistance Publique Hôpitaux de Paris, Paris, France
| | - Xavier Tannier, PhD
- Sorbonne Université, Inserm, Université Sorbonne Paris-Nord, LIMICS, Paris, France
| |
|
19
|
Burford KG, Itzkowitz NG, Ortega AG, Teitler JO, Rundle AG. Use of Generative AI to Identify Helmet Status Among Patients With Micromobility-Related Injuries From Unstructured Clinical Notes. JAMA Netw Open 2024; 7:e2425981. [PMID: 39136946 PMCID: PMC11322845 DOI: 10.1001/jamanetworkopen.2024.25981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 05/15/2024] [Indexed: 08/16/2024] Open
Abstract
Importance Large language models (LLMs) have potential to increase the efficiency of information extraction from unstructured clinical notes in electronic medical records. Objective To assess the utility and reliability of an LLM, ChatGPT-4 (OpenAI), to analyze clinical narratives and identify helmet use status of patients injured in micromobility-related accidents. Design, Setting, and Participants This cross-sectional study used publicly available, deidentified 2019 to 2022 data from the US Consumer Product Safety Commission's National Electronic Injury Surveillance System, a nationally representative stratified probability sample of 96 hospitals in the US. Unweighted estimates of e-bike, bicycle, hoverboard, and powered scooter-related injuries that resulted in an emergency department visit were used. Statistical analysis was performed from November 2023 to April 2024. Main Outcomes and Measures Patient helmet status (wearing vs not wearing vs unknown) was extracted from clinical narratives using (1) a text string search using researcher-generated text strings and (2) the LLM by prompting the system with low-, intermediate-, and high-detail prompts. The level of agreement between the 2 approaches across all 3 prompts was analyzed using Cohen κ test statistics. Fleiss κ was calculated to measure the test-retest reliability of the high-detail prompt across 5 new chat sessions and days. Performance statistics were calculated by comparing results from the high-detail prompt to classifications of helmet status generated by researchers reading the clinical notes (ie, a criterion standard review). Results Among 54 569 clinical notes, moderate (Cohen κ = 0.74 [95% CI, 0.73-0.75]) and weak (Cohen κ = 0.53 [95% CI, 0.52-0.54]) agreement were found between the text string-search approach and the LLM for the low- and intermediate-detail prompts, respectively. 
The high-detail prompt had almost perfect agreement (κ = 1.00 [95% CI, 1.00-1.00]) but required the greatest amount of time to complete. The LLM did not perfectly replicate its analyses across new sessions and days (Fleiss κ = 0.91 across 5 trials; P < .001). The LLM often hallucinated and was consistent in replicating its hallucinations. It also showed high validity compared with the criterion standard (n = 400; κ = 0.98 [95% CI, 0.96-1.00]). Conclusions and Relevance This study's findings suggest that although there are efficiency gains for using the LLM to extract information from clinical notes, the inadequate reliability compared with a text string-search approach, hallucinations, and inconsistent performance significantly hinder the potential of the currently available LLM.
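The agreement statistic used in this abstract can be illustrated with a minimal sketch of Cohen κ for two raters, here a text string search and an LLM. The label lists below are hypothetical toy data, not values from the study, and the function name `cohen_kappa` is an illustrative choice.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined, return 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical helmet-status labels from the two approaches (toy data).
string_search = ["helmet", "helmet", "no_helmet", "unknown",
                 "unknown", "no_helmet", "helmet", "unknown"]
llm_output = ["helmet", "helmet", "no_helmet", "unknown",
              "helmet", "no_helmet", "helmet", "unknown"]

kappa = cohen_kappa(string_search, llm_output)
print(f"Cohen kappa = {kappa:.3f}")
```

With these toy labels the two approaches agree on 7 of 8 notes, and κ discounts the agreement expected by chance from the marginal label frequencies.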
Affiliation(s)
- Kathryn G. Burford
- Department of Environmental Health Sciences, Columbia University Mailman School of Public Health, New York
| | - Nicole G. Itzkowitz
- Department of Epidemiology, Columbia University Mailman School of Public Health, New York
| | - Ashley G. Ortega
- Columbia Population Research Center, Columbia University, New York
| | | | - Andrew G. Rundle
- Department of Epidemiology, Columbia University Mailman School of Public Health, New York
| |
|
20
|
Guo Y, Huang C, Sheng Y, Zhang W, Ye X, Lian H, Xu J, Chen Y. Improve the efficiency and accuracy of ophthalmologists' clinical decision-making based on AI technology. BMC Med Inform Decis Mak 2024; 24:192. [PMID: 38982465 PMCID: PMC11234671 DOI: 10.1186/s12911-024-02587-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 06/24/2024] [Indexed: 07/11/2024] Open
Abstract
BACKGROUND As global aging intensifies, the prevalence of ocular fundus diseases continues to rise. In China, the strained doctor-patient ratio poses numerous challenges for the early diagnosis and treatment of ocular fundus diseases. To reduce the high risk of missed or misdiagnosed cases, avoid irreversible visual impairment for patients, and ensure good visual prognosis for patients with ocular fundus diseases, it is particularly important to enhance the growth and diagnostic capabilities of junior doctors. This study aims to leverage the value of electronic medical record data to develop a diagnostic intelligent decision support platform. This platform aims to assist junior doctors in diagnosing ocular fundus diseases quickly and accurately, expedite their professional growth, and prevent delays in patient treatment. An empirical evaluation will assess the platform's effectiveness in enhancing doctors' diagnostic efficiency and accuracy. METHODS In this study, eight Chinese Named Entity Recognition (NER) models were compared, and the SoftLexicon-Glove-Word2vec model, achieving a high F1 score of 93.02%, was selected as the optimal recognition tool. This model was then used to extract key information from electronic medical records (EMRs) and generate feature variables based on diagnostic rule templates. Subsequently, an XGBoost algorithm was employed to construct an intelligent decision support platform for diagnosing ocular fundus diseases. The effectiveness of the platform in improving diagnostic efficiency and accuracy was evaluated through a controlled experiment comparing experienced and junior doctors. RESULTS The use of the diagnostic intelligent decision support platform resulted in significant improvements in both diagnostic efficiency and accuracy for both experienced and junior doctors (P < 0.05). Notably, the gap in diagnostic speed and precision between junior doctors and experienced doctors narrowed considerably when the platform was used. 
Although the platform also provided some benefits to experienced doctors, the improvement was less pronounced compared to junior doctors. CONCLUSION The diagnostic intelligent decision support platform established in this study, based on the XGBoost algorithm and NER, effectively enhances the diagnostic efficiency and accuracy of junior doctors in ocular fundus diseases. This has significant implications for optimizing clinical diagnosis and treatment.
Affiliation(s)
- Yingxuan Guo
- School of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Changke Huang
- School of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Yaying Sheng
- School of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Wenjie Zhang
- School of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Xin Ye
- Center for Rehabilitation Medicine, Department of Ophthalmology, Zhejiang Provincial People's Hospital (Affiliated People's Hospital, Hangzhou Medical College), Hangzhou, Zhejiang, China
| | - Hengli Lian
- School of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Jiahao Xu
- School of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Yiqi Chen
- School of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China.
- Center for Rehabilitation Medicine, Department of Ophthalmology, Zhejiang Provincial People's Hospital (Affiliated People's Hospital, Hangzhou Medical College), Hangzhou, Zhejiang, China.
| |
|
21
|
Dasaro CR, Sabra A, Jeon Y, Williams TA, Sloan NL, Todd AC, Teitelbaum SL. A comparison of two user-friendly methods to identify and support correction of misspelled medications. Prev Med Rep 2024; 43:102765. [PMID: 38798907 PMCID: PMC11127154 DOI: 10.1016/j.pmedr.2024.102765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 04/29/2024] [Accepted: 05/14/2024] [Indexed: 05/29/2024] Open
Abstract
Objective To identify and support correction of misspelled medication names recorded as free text, we compared the relative effectiveness of two user-friendly methods, used without reliance on clinical knowledge. Methods Leveraging the SAS® COMPGED function, fuzzy string search programs examined 1.8 million medication records from 183,600 World Trade Center General Responder Cohort monitoring visits conducted in New York and New Jersey between 7/16/2002 and 3/31/2021, producing replicable generalized edit distance scores between the reported and correct spelling. Scores < 120 were selected as optimal and compared to Stedman's 2020 Plus Medical/Pharmaceutical Spell Checker first suggested word, used as the comparative standard because it employs both spelling and phonetic similarities to suggest matching words. We coded each method's results as identifying or not identifying the medications within each visit. Results Most types of medications (94.4 % anxiety, 98.4 % asthma and 94.6 % ulcer/gastroesophageal reflux disease) were correctly spelled. Cross tabulations assessed the agreement (anxiety 99.9 %, asthma 99.6 % and ulcer/gastroesophageal reflux disease 98.4 %), false positive (respectively 0.02 %, 0.03 % and 2.0 %) and false negative (respectively 1.9 %, 0.5 % and 1.0 %) values. Scores < 120 occasionally correctly identified medications missed by the spell checker. We observed no difference in medication misspellings across socio-economically and culturally diverse patient characteristics. Conclusions Both methods efficiently identified most misspelled medications, greatly minimizing the review and rectification needed. The fuzzy method is more universally applicable for condition-specific medications identification, but requires more programming skills. The spell checker is inexpensive, but benefits from modest programming skills and is only available in some languages.
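The idea of matching a misspelled medication name to a reference spelling by edit distance can be sketched as follows. This is a plain unit-cost Levenshtein distance, not the weighted generalized edit distance computed by SAS COMPGED, so the study's cutoff of 120 does not carry over. The cutoff of 2 edits, the function names, and the reference list are assumptions made for illustration only.

```python
def levenshtein(a, b):
    """Unit-cost edit distance (insert, delete, substitute) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def match_medication(reported, reference_names, max_distance=2):
    """Return the closest reference name within max_distance edits, else None.

    max_distance=2 is an illustrative cutoff, not one taken from the study.
    """
    cleaned = reported.strip().lower()
    best = min(reference_names, key=lambda name: levenshtein(cleaned, name))
    return best if levenshtein(cleaned, best) <= max_distance else None


# Hypothetical reference list and free-text entries.
reference = ["albuterol", "alprazolam", "omeprazole"]
print(match_medication("Albuterl", reference))
print(match_medication("aspirin", reference))
```

A fixed edit-count cutoff penalizes short names more heavily than long ones, which is one reason weighted or length-normalized scores are often preferred in practice.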
Affiliation(s)
- Christopher R. Dasaro
- World Trade Center Health Program General Responder Data Center, Department of Environmental Medicine and Climate Science, Icahn School of Medicine at Mount Sinai, 17 East 102 Street, 2 Floor, New York, NY 10029, United States of America
| | - Ahmad Sabra
- World Trade Center Health Program General Responder Data Center, Department of Environmental Medicine and Climate Science, Icahn School of Medicine at Mount Sinai, 17 East 102 Street, 2 Floor, New York, NY 10029, United States of America
| | - Yunho Jeon
- World Trade Center Health Program General Responder Data Center, Department of Environmental Medicine and Climate Science, Icahn School of Medicine at Mount Sinai, 17 East 102 Street, 2 Floor, New York, NY 10029, United States of America
| | - Tankeesha A. Williams
- World Trade Center Health Program General Responder Data Center, Department of Environmental Medicine and Climate Science, Icahn School of Medicine at Mount Sinai, 17 East 102 Street, 2 Floor, New York, NY 10029, United States of America
| | - Nancy L. Sloan
- World Trade Center Health Program General Responder Data Center, Department of Environmental Medicine and Climate Science, Icahn School of Medicine at Mount Sinai, 17 East 102 Street, 2 Floor, New York, NY 10029, United States of America
| | - Andrew C. Todd
- World Trade Center Health Program General Responder Data Center, Department of Environmental Medicine and Climate Science, Icahn School of Medicine at Mount Sinai, 17 East 102 Street, 2 Floor, New York, NY 10029, United States of America
| | - Susan L. Teitelbaum
- World Trade Center Health Program General Responder Data Center, Department of Environmental Medicine and Climate Science, Icahn School of Medicine at Mount Sinai, 17 East 102 Street, 2 Floor, New York, NY 10029, United States of America
| |
|
22
|
Darer JD, Pesa J, Choudhry Z, Batista AE, Parab P, Yang X, Govindarajan R. Characterizing Myasthenia Gravis Symptoms, Exacerbations, and Crises From Neurologist's Clinical Notes Using Natural Language Processing. Cureus 2024; 16:e65792. [PMID: 39219871 PMCID: PMC11361825 DOI: 10.7759/cureus.65792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/29/2024] [Indexed: 09/04/2024] Open
Abstract
Background Myasthenia gravis (MG) is a rare, autoantibody neuromuscular disorder characterized by fatigable weakness. Real-world evidence based on administrative and structured datasets regarding MG may miss important details related to the clinical encounter. Examination of free-text clinical progress notes has the potential to illuminate aspects of MG care. Objective The primary objective was to examine and characterize neurologist progress notes in the care of individuals with MG regarding the prevalence of documentation of clinical subtypes, antibody status, symptomatology, and MG deteriorations, including exacerbations and crises. The secondary objectives were to categorize MG deteriorations into practical, objective states as well as examine potential sources of clinical inertia in MG care. Methods We performed a retrospective, cross-sectional analysis of de-identified neurologist clinical notes from 2017 to 2022. A qualitative analysis of physician descriptions of MG deteriorations and a discussion of risks in MG care (risk for adverse effects, risk for clinical decompensation, etc.) was performed. Results Of the 3,085 individuals with MG, clinical subtypes and antibody status identified included gMG (n = 400; 13.0%), ocular MG (n = 253; 8.2%), MG unspecified (n = 2,432; 78.8%), seropositivity for acetylcholine receptor antibody (n = 441; 14.3%), and MuSK antibody (n = 29; 0.9%). The most common gMG manifestations were dysphagia (n = 712; 23.0%), dyspnea (n = 626; 20.3%), and dysarthria (n = 514; 16.7%). In MG crisis patients, documentation of difficulties with MG standard therapies was common (n = 62; 45.2%). The qualitative analysis of MG deterioration types includes symptom fluctuation, symptom worsening with treatment intensification, MG deterioration with rescue therapy, and MG crisis. Qualitative analysis of MG-related risks included the toxicity of new therapies and concern for worsening MG because of changing therapies. 
Conclusions This study of neurologist progress notes demonstrates the potential for real-world evidence generation in the care of individuals with MG. MG patients suffer fluctuating symptomatology and a spectrum of clinical deteriorations. Adverse effects of MG therapies are common, highlighting the need for effective, less toxic treatments.
Affiliation(s)
| | - Jacqueline Pesa
- Real World Value and Evidence, Immunology, Janssen Scientific Affairs, Titusville, USA
| | - Zia Choudhry
- Rare Antibody Diseases, Janssen Scientific Affairs, Titusville, USA
| | | | - Purva Parab
- Biostatistics, Health Analytics, Clarksville, USA
| | - Xiaoyun Yang
- Biostatistics, Health Analytics, Clarksville, USA
| | | |
|
23
|
Csore J, Roy TL, Wright G, Karmonik C. Unsupervised classification of multi-contrast magnetic resonance histology of peripheral arterial disease lesions using a convolutional variational autoencoder with a Gaussian mixture model in latent space: A technical feasibility study. Comput Med Imaging Graph 2024; 115:102372. [PMID: 38581959 DOI: 10.1016/j.compmedimag.2024.102372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 02/09/2024] [Accepted: 03/18/2024] [Indexed: 04/08/2024]
Abstract
PURPOSE To investigate the feasibility of a deep learning algorithm combining variational autoencoder (VAE) and two-dimensional (2D) convolutional neural networks (CNN) for automatically quantifying hard tissue presence and morphology in multi-contrast magnetic resonance (MR) images of peripheral arterial disease (PAD) occlusive lesions. METHODS Multi-contrast MR images (T2-weighted and ultrashort echo time) were acquired from lesions harvested from six amputated legs with high isotropic spatial resolution (0.078 mm and 0.156 mm, respectively) at 9.4 T. A total of 4014 pseudo-color combined images were generated, with 75% used to train a VAE employing custom 2D CNN layers. A Gaussian mixture model (GMM) was employed to classify the latent space data into four tissue classes: I) concentric calcified (c), II) eccentric calcified (e), III) occluded with hard tissue (h) and IV) occluded with soft tissue (s). Test image probabilities, encoded by the trained VAE were used to evaluate model performance. RESULTS GMM component classification probabilities ranged from 0.92 to 0.97 for class (c), 1.00 for class (e), 0.82-0.95 for class (h) and 0.56-0.93 for the remaining class (s). Due to the complexity of soft-tissue lesions reflected in the heterogeneity of the pseudo-color images, more GMM components (n=17) were attributed to class (s), compared to the other three (c, e and h) (n=6). CONCLUSION Combination of 2D CNN VAE and GMM achieves high classification probabilities for hard tissue-containing lesions. Automatic recognition of these classes may aid therapeutic decision-making and identifying uncrossable lesions prior to endovascular intervention.
Affiliation(s)
- Judit Csore
- DeBakey Heart and Vascular Center, Houston Methodist Hospital, 6565 Fannin Street, Houston, TX 77030, USA; Heart and Vascular Center, Semmelweis University, 68 Városmajor Street, Budapest 1122, Hungary.
| | - Trisha L Roy
- DeBakey Heart and Vascular Center, Houston Methodist Hospital, 6565 Fannin Street, Houston, TX 77030, USA
| | - Graham Wright
- Sunnybrook Research Institute, 2075 Bayview Avenue, Toronto, Ontario M4N 3M5, Canada
| | - Christof Karmonik
- MRI Core, Translational Imaging Center, Houston Methodist Research Institute, 6670 Bertner Avenue, Houston, TX 77030, USA
| |
|
24
|
Fu S, Wang L, He H, Wen A, Zong N, Kumari A, Liu F, Zhou S, Zhang R, Li C, Wang Y, St Sauver J, Liu H, Sohn S. A taxonomy for advancing systematic error analysis in multi-site electronic health record-based clinical concept extraction. J Am Med Inform Assoc 2024; 31:1493-1502. [PMID: 38742455 PMCID: PMC11187420 DOI: 10.1093/jamia/ocae101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 03/26/2024] [Accepted: 04/19/2024] [Indexed: 05/16/2024] Open
Abstract
BACKGROUND Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, such as contextual and linguistic factors contributing to their occurrence, and the identification of underlying causes to refine the NLP model and improve its performance. Conducting error analysis can be complex, requiring a combination of NLP expertise and domain-specific knowledge. Due to the high heterogeneity of electronic health record (EHR) settings across different institutions, challenges may arise when attempting to standardize and reproduce the error analysis process. OBJECTIVES This study aims to facilitate a collaborative effort to establish common definitions and taxonomies for capturing diverse error types, fostering community consensus on error analysis for clinical concept extraction tasks. MATERIALS AND METHODS We iteratively developed and evaluated an error taxonomy based on existing literature, standards, real-world data, multisite case evaluations, and community feedback. The finalized taxonomy was released in both .dtd and .owl formats at the Open Health Natural Language Processing Consortium. The taxonomy is compatible with several different open-source annotation tools, including MAE, Brat, and MedTator. RESULTS The resulting error taxonomy comprises 43 distinct error classes, organized into 6 error dimensions and 4 properties, including model type (symbolic and statistical machine learning), evaluation subject (model and human), evaluation level (patient, document, sentence, and concept), and annotation examples. Internal and external evaluations revealed strong variations in error types across methodological approaches, tasks, and EHR settings. Key points emerged from community feedback, including the need to enhance the clarity, generalizability, and usability of the taxonomy, along with dissemination strategies. 
CONCLUSION The proposed taxonomy can facilitate the acceleration and standardization of the error analysis process in multi-site settings, thus improving the provenance, interpretability, and portability of NLP models. Future researchers could explore the potential direction of developing automated or semi-automated methods to assist in the classification and standardization of error analysis.
Affiliation(s)
- Sunyang Fu
- Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
- Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
| | - Liwei Wang
- Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
- Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
| | - Huan He
- Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
| | - Andrew Wen
- Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
| | - Nansu Zong
- Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
| | - Anamika Kumari
- Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Boston, MA 01655, United States
| | - Feifan Liu
- Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Boston, MA 01655, United States
| | - Sicheng Zhou
- Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455, United States
| | - Rui Zhang
- Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455, United States
| | - Chenyu Li
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
| | - Yanshan Wang
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
| | - Jennifer St Sauver
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55902, United States
| | - Hongfang Liu
- Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
- Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
| | - Sunghwan Sohn
- Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
| |
|
25
|
Zhao T, He ZA, Shao J, Regmi A, Shi L, Cai Y. Decoding hotline's information with text-mining: A protocol for improving tobacco control in Shanghai. Tob Induc Dis 2024; 22:TID-22-107. [PMID: 38887599 PMCID: PMC11181012 DOI: 10.18332/tid/187864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/13/2024] [Accepted: 04/23/2024] [Indexed: 06/20/2024] Open
Abstract
Tobacco consumption in China remains the primary cause of preventable mortality, with Shanghai being particularly affected by issues related to secondhand smoke exposure. This study explores the role of the public service hotline 12345, a grassroots initiative in Shanghai, in capturing public sentiment and assessing the effectiveness of anti-smoking regulations. Our research aims to gain an accurate, in-depth understanding of the implementation of, and feedback on, smoking control policies by identifying high-frequency points and prominent issues in smoking control work, based on the smoking control work order data received by the health hotline 12320. The results of this study will assist government enforcement agencies in improving smoking monitoring and clarify the direction for improving smoking control measures. Text-mining techniques were employed to analyze a dataset comprising 78,011 call sheets, all related to tobacco control and collected from the hotline between 1 January 2015 and 31 December 2019. This methodological approach aims to uncover prevalent themes and sentiments in the public discourse on smoking and its regulation, as reflected in the hotline interactions. Our study identified hotspots and the issues of greatest concern to citizens. Additionally, it provided recommendations to enforcement agencies to enhance their capabilities, optimize the allocation of human resources for smoking control monitoring, reduce enforcement costs, and support anti-smoking campaigns, thereby contributing to more effective tobacco control policies in the region.
Affiliation(s)
- Tong Zhao
  - School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zi-an He
  - School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Jiading District Center for Disease Control and Prevention, Shanghai, China
- Jiaqi Shao
  - Zhongshan Hospital, Fudan University, Shanghai, China
- Aksara Regmi
  - School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Lili Shi
  - Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yuyang Cai
  - School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
|
26
|
Jakovljevic M, Timofeyev Y, Zhuravleva T. The Impact of Pandemic-Driven Care Redesign on Hospital Efficiency. Risk Manag Healthc Policy 2024; 17:1477-1491. [PMID: 38855044 PMCID: PMC11162215 DOI: 10.2147/rmhp.s465167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Accepted: 05/26/2024] [Indexed: 06/11/2024] Open
Abstract
Purpose This study aims to identify medical care transformations during the COVID-19 pandemic and to assess the economic efficiency of these care transformations. Methods A systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The databases used in the search protocol included PubMed, RSCI, and Google Scholar. Results Ten eligible studies in English and one publication in Russian were identified. In general, the following changes in the organization of health care processes have been observed since 2020: hospital at home, telemedicine (physician-to-patient), and the adoption of new information communication technologies within physician-to-physician and physician-to-nurse communication. Earlier trends, such as (a) wider use of electronic devices, (b) adoption of Lean techniques, (c) the incorporation of patient and other customer experience feedback, and (d) the implementation of clinical decision support systems and automation of workflow, tend to be preserved. Conclusion The most common changes in hospital care organization and the respective impacts of workflow changes (ie, workflow interventions, redesign, and transformations) on the efficiency of hospital care were summarized, and avenues for future research and policy implications were discussed. The pandemic demonstrated a need for building more resilient and adaptive healthcare systems, enhancing crisis preparedness along with rapid and effective responses.
Affiliation(s)
- Mihajlo Jakovljevic
  - UNESCO-TWAS, The World Academy of Sciences, Trieste, Italy
  - Shaanxi University of Technology, Hanzhong, People’s Republic of China
  - Department of Global Health Economics and Policy, University of Kragujevac, Kragujevac, Serbia
- Tatyana Zhuravleva
  - International Laboratory for Experimental and Behavioural Economics, HSE University, Moscow, Russia
|
27
|
Zuo X, Kumar A, Shen S, Li J, Cong G, Jin E, Chen Q, Warner JL, Yang P, Xu H. Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition. JCO Clin Cancer Inform 2024; 8:e2300166. [PMID: 38885475 DOI: 10.1200/cci.23.00166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 02/27/2024] [Accepted: 03/11/2024] [Indexed: 06/20/2024] Open
Abstract
PURPOSE The RECIST guidelines provide a standardized approach for evaluating the response of cancer to treatment, allowing for consistent comparison of treatment efficacy across different therapies and patients. However, collecting such information from electronic health records manually can be extremely labor-intensive and time-consuming because of the complexity and volume of clinical notes. The aim of this study is to apply natural language processing (NLP) techniques to automate this process, minimizing manual data collection efforts and improving the consistency and reliability of the results. METHODS We proposed a complex, hybrid NLP system that automates the process of extracting, linking, and summarizing anticancer therapy and associated RECIST-like responses from narrative clinical text. The system consists of multiple machine learning-/deep learning-based and rule-based modules for diverse NLP tasks such as named entity recognition, assertion classification, relation extraction, and text normalization, to address different challenges associated with anticancer therapy and response information extraction. We then evaluated the system's performance on two independent test sets from different institutions to demonstrate its effectiveness and generalizability. RESULTS The system used domain-specific language models, BioBERT and BioClinicalBERT, for high-performance identification of therapy mentions and extraction and categorization of RECIST responses. The best-performing model achieved a 0.66 score in linking therapy and RECIST response mentions, with end-to-end performance peaking at 0.74 after relation normalization, indicating substantial efficacy with room for improvement. CONCLUSION We developed, implemented, and tested an information extraction system from clinical notes for cancer treatment and efficacy assessment information. We expect this system will support future cancer research, particularly oncologic studies that focus on efficiently assessing the effectiveness and reliability of cancer therapeutics.
Affiliation(s)
- Xu Zuo
  - University of Texas Health Science Center, Houston, TX
- Jianfu Li
  - University of Texas Health Science Center, Houston, TX
- Grace Cong
  - University of Maryland, College Park, College Park, MD
- Edward Jin
  - University of Southern California, Los Angeles, CA
- Qingxia Chen
  - Vanderbilt University Medical Center, Nashville, TN
- Jeremy L Warner
  - Vanderbilt University Medical Center, Nashville, TN
  - Legorreta Cancer Center at Brown University, Providence, RI
  - Lifespan Cancer Institute, Providence, RI
- Hua Xu
  - Yale University, New Haven, CT
|
28
|
Swaminathan A, Ren AL, Wu JY, Bhargava-Shah A, Lopez I, Srivastava U, Alexopoulos V, Pizzitola R, Bui B, Alkhani L, Lee S, Mohit N, Seo N, Macedo N, Cheng W, Wang W, Tran E, Thomas R, Gevaert O. Extraction of Unstructured Electronic Health Records to Evaluate Glioblastoma Treatment Patterns. JCO Clin Cancer Inform 2024; 8:e2300091. [PMID: 38857465 PMCID: PMC11371099 DOI: 10.1200/cci.23.00091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 11/15/2023] [Accepted: 03/12/2024] [Indexed: 06/12/2024] Open
Abstract
PURPOSE Data on lines of therapy (LOTs) for cancer treatment are important for clinical oncology research, but LOTs are not explicitly recorded in electronic health records (EHRs). We present an efficient approach for clinical data abstraction and a flexible algorithm to derive LOTs from EHR-based medication data on patients with glioblastoma multiforme (GBM). METHODS Nonclinicians were trained to abstract the diagnosis of GBM from EHRs, and their accuracy was compared with abstraction performed by clinicians. The resulting data were used to build a cohort of patients with confirmed GBM diagnosis. An algorithm was developed to derive LOTs using structured medication data, accounting for the addition and discontinuation of therapies and drug class. Descriptive statistics were calculated and time-to-next-treatment (TTNT) analysis was performed using the Kaplan-Meier method. RESULTS Treating clinicians as the gold standard, nonclinicians abstracted GBM diagnosis with a sensitivity of 0.98, specificity of 1.00, positive predictive value of 1.00, and negative predictive value of 0.90, suggesting that nonclinician abstraction of GBM diagnosis was comparable with clinician abstraction. Of 693 patients with a confirmed diagnosis of GBM, 246 had structured information about the types of medications received. Of them, 165 (67.1%) received a first-line therapy (1L) of temozolomide, and the median TTNT from the start of 1L was 179 days. CONCLUSION We described a workflow for extracting diagnosis of GBM and LOT from EHR data that combines nonclinician abstraction with algorithmic processing, demonstrating comparable accuracy with clinician abstraction and highlighting the potential for scalable and efficient EHR-based oncology research.
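As a reading aid for the four agreement metrics quoted in this abstract, the following minimal Python sketch shows how sensitivity, specificity, positive predictive value, and negative predictive value are computed from a 2x2 confusion table. The counts in `example` are illustrative values chosen only so that the formulas yield the reported figures; they are not the study's actual counts, which the abstract does not give. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # abstractor positive, reference standard positive
    fp: int  # abstractor positive, reference standard negative
    tn: int  # abstractor negative, reference standard negative
    fn: int  # abstractor negative, reference standard positive


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Standard agreement metrics against a reference standard."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Illustrative counts only (assumption, not data from the paper).
example = ConfusionCounts(tp=49, fp=0, tn=9, fn=1)
metrics = diagnostic_metrics(example)

# The first-line share quoted in the abstract can be checked directly:
# 165 of 246 patients with structured medication data.
first_line_share = round(100 * 165 / 246, 1)
```

With these illustrative counts, a single false negative among 50 true cases is enough to move the negative predictive value to 0.90 when only 10 negatives are predicted, which shows how sensitive that metric is to small denominators.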
Affiliation(s)
- Janet Y. Wu
  - Stanford University School of Medicine, Stanford, CA
- Ivan Lopez
  - Stanford University School of Medicine, Stanford, CA
- Ujwal Srivastava
  - Department of Computer Science, Stanford University, Stanford, CA
- Brandon Bui
  - Department of Human Biology, Stanford University, Stanford, CA
- Layth Alkhani
  - Department of Materials Science and Engineering, Stanford University, Stanford, CA
- Susan Lee
  - Department of Computer Science, Stanford University, Stanford, CA
  - Department of Psychology, Stanford University, Stanford, CA
- Nathan Mohit
  - Department of Computer Science, Stanford University, Stanford, CA
- Noel Seo
  - Department of Sociology, Stanford University, Stanford, CA
- Nicholas Macedo
  - Department of Biology, Stanford University, Stanford, CA
  - Department of Radiology, Stanford University School of Medicine, Stanford, CA
- Winson Cheng
  - Department of Computer Science, Stanford University, Stanford, CA
  - Department of Chemistry, Stanford University, Stanford, CA
- William Wang
  - Department of Biology, Stanford University, Stanford, CA
  - Department of Bioengineering, Stanford University, Stanford, CA
- Edward Tran
  - Department of Computer Science, Stanford University, Stanford, CA
- Reena Thomas
  - Stanford University School of Medicine, Stanford, CA
- Olivier Gevaert
  - Department of Medicine, Stanford Center for Biomedical Informatics Research (BMIR), Stanford, CA
  - Department of Biomedical Data Science, Stanford Center for Biomedical Informatics Research (BMIR), Stanford, CA
|
29
|
Shyr C, Hu Y, Bastarache L, Cheng A, Hamid R, Harris P, Xu H. Identifying and Extracting Rare Diseases and Their Phenotypes with Large Language Models. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2024; 8:438-461. [PMID: 38681753 PMCID: PMC11052982 DOI: 10.1007/s41666-023-00155-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 10/24/2023] [Accepted: 11/13/2023] [Indexed: 05/01/2024]
Abstract
Purpose Phenotyping is critical for informing rare disease diagnosis and treatment, but disease phenotypes are often embedded in unstructured text. While natural language processing (NLP) can automate extraction, a major bottleneck is developing annotated corpora. Recently, prompt learning with large language models (LLMs) has been shown to lead to generalizable results without any (zero-shot) or with few annotated samples (few-shot), but no study has explored this for rare diseases. Our work is the first to study prompt learning for identifying and extracting rare disease phenotypes in the zero- and few-shot settings. Methods We compared the performance of prompt learning with ChatGPT and fine-tuning with BioClinicalBERT. We engineered novel prompts for ChatGPT to identify and extract rare diseases and their phenotypes (e.g., diseases, symptoms, and signs), established a benchmark for evaluating its performance, and conducted an in-depth error analysis. Results Overall, fine-tuning BioClinicalBERT resulted in higher performance (F1 of 0.689) than ChatGPT (F1 of 0.472 and 0.610 in the zero- and few-shot settings, respectively). However, ChatGPT achieved higher accuracy for rare diseases and signs in the one-shot setting (F1 of 0.778 and 0.725, respectively). Conversational, sentence-based prompts generally achieved higher accuracy than structured lists. Conclusion Prompt learning using ChatGPT has the potential to match or outperform fine-tuning BioClinicalBERT at extracting rare diseases and signs with just one annotated sample. Given its accessibility, ChatGPT could be leveraged to extract these entities without relying on a large, annotated corpus. While LLMs can support rare disease phenotyping, researchers should critically evaluate model outputs to ensure phenotyping accuracy.
Affiliation(s)
- Cathy Shyr
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203 USA
- Yan Hu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77225 USA
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203 USA
- Alex Cheng
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203 USA
- Rizwan Hamid
  - Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, TN 37203 USA
- Paul Harris
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203 USA
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203 USA
  - Department of Biomedical Engineering, Vanderbilt University Medical Center, 2525 West End Avenue, Nashville, TN 37203 USA
- Hua Xu
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, 100 College Street, New Haven, CT 06510 USA
|
30
|
Sivarajkumar S, Mohammad HA, Oniani D, Roberts K, Hersh W, Liu H, He D, Visweswaran S, Wang Y. Clinical Information Retrieval: A Literature Review. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2024; 8:313-352. [PMID: 38681755 PMCID: PMC11052968 DOI: 10.1007/s41666-024-00159-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 12/07/2023] [Accepted: 01/08/2024] [Indexed: 05/01/2024]
Abstract
Clinical information retrieval (IR) plays a vital role in modern healthcare by facilitating efficient access and analysis of medical literature for clinicians and researchers. This scoping review aims to offer a comprehensive overview of the current state of clinical IR research and identify gaps and potential opportunities for future studies in this field. The main objective was to assess and analyze the existing literature on clinical IR, focusing on the methods, techniques, and tools employed for effective retrieval and analysis of medical information. Adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we conducted an extensive search across databases such as Ovid Embase, Ovid Medline, Scopus, ACM Digital Library, IEEE Xplore, and Web of Science, covering publications from January 1, 2010, to January 4, 2023. The rigorous screening process led to the inclusion of 184 papers in our review. Our findings provide a detailed analysis of the clinical IR research landscape, covering aspects like publication trends, data sources, methodologies, evaluation metrics, and applications. The review identifies key research gaps in clinical IR methods such as indexing, ranking, and query expansion, offering insights and opportunities for future studies in clinical IR, thus serving as a guiding framework for upcoming research efforts in this rapidly evolving field. The study also underscores an imperative for innovative research on advanced clinical IR systems capable of fast semantic vector search and adoption of neural IR techniques for effective retrieval of information from unstructured electronic health records (EHRs). Supplementary Information The online version contains supplementary material available at 10.1007/s41666-024-00159-4.
Affiliation(s)
- David Oniani
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA USA
- Kirk Roberts
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX USA
- William Hersh
  - Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR USA
- Hongfang Liu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX USA
- Daqing He
  - Department of Information Science, University of Pittsburgh, Pittsburgh, PA USA
- Shyam Visweswaran
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA USA
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA USA
  - Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA USA
- Yanshan Wang
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA USA
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA USA
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA USA
  - Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA USA
|
31
|
Petit-Jean T, Gérardin C, Berthelot E, Chatellier G, Frank M, Tannier X, Kempf E, Bey R. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. J Am Med Inform Assoc 2024; 31:1280-1290. [PMID: 38573195 PMCID: PMC11105139 DOI: 10.1093/jamia/ocae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 02/28/2024] [Accepted: 03/13/2024] [Indexed: 04/05/2024] Open
Abstract
OBJECTIVE To develop and validate a natural language processing (NLP) pipeline that detects 18 conditions in French clinical notes, including 16 comorbidities of the Charlson index, while exploring a collaborative and privacy-enhancing workflow. MATERIALS AND METHODS The detection pipeline relied on rule-based and machine learning algorithms for named entity recognition and entity qualification, respectively. We used a large language model pre-trained on millions of clinical notes along with annotated clinical notes in the context of 3 cohort studies related to oncology, cardiology, and rheumatology. The overall workflow was conceived to foster collaboration between studies while respecting the privacy constraints of the data warehouse. We estimated the added values of the advanced technologies and of the collaborative setting. RESULTS The pipeline reached macro-averaged F1-score, positive predictive value, sensitivity, and specificity of 95.7 (95%CI 94.5-96.3), 95.4 (95%CI 94.0-96.3), 96.0 (95%CI 94.0-96.7), and 99.2 (95%CI 99.0-99.4), respectively. F1-scores were superior to those observed using alternative technologies or non-collaborative settings. The models were shared through a secured registry. CONCLUSIONS We demonstrated that a community of investigators working on a common clinical data warehouse could efficiently and securely collaborate to develop, validate, and use sensitive artificial intelligence models. In particular, we provided an efficient and robust NLP pipeline that detects conditions mentioned in clinical notes.
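The abstract reports macro-averaged metrics across conditions. As a brief illustration of what macro-averaging means, the Python sketch below computes an F1-score per condition and then takes the unweighted mean, so that rare conditions count as much as common ones. The condition names and counts are hypothetical and are not taken from the study; the helper names are likewise assumptions.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts_by_condition: dict) -> float:
    """Unweighted mean of per-condition F1 scores."""
    scores = [f1_score(*counts) for counts in counts_by_condition.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts for two conditions of very different frequency.
counts = {
    "diabetes": (90, 10, 10),  # per-condition F1 = 0.90
    "dementia": (8, 2, 2),     # per-condition F1 = 0.80
}
macro = macro_f1(counts)
```

Pooling the same counts before computing F1 (micro-averaging) would give a value dominated by the frequent condition, which is why the two averages can differ noticeably on imbalanced data.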
Affiliation(s)
- Thomas Petit-Jean
  - Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France
- Christel Gérardin
  - Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France
  - Institut Pierre-Louis d’Epidémiologie et de Santé Publique, INSERM, Sorbonne Université, Paris, 75012, France
- Emmanuelle Berthelot
  - Department of Cardiology, Hôpital Bicêtre, Assistance Publique-Hôpitaux de Paris, Le Kremlin Bicêtre, 94270, France
- Gilles Chatellier
  - Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France
  - Department of Medical Informatics, Assistance Publique-Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), Université de Paris, Paris, 75015, France
- Marie Frank
  - Department of Medical Informatics, Hôpitaux Universitaires Paris-Saclay, Assistance Publique-Hôpitaux de Paris, Le Kremlin-Bicêtre, 94270, France
- Xavier Tannier
  - Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé (LIMICS), INSERM, Université Sorbonne Paris Nord, Sorbonne Université, Paris, 75005, France
- Emmanuelle Kempf
  - Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé (LIMICS), INSERM, Université Sorbonne Paris Nord, Sorbonne Université, Paris, 75005, France
  - Department of Medical Oncology, Henri Mondor and Albert Chenevier Teaching Hospital, Assistance Publique-Hôpitaux de Paris, Créteil, 94000, France
- Romain Bey
  - Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France
|
32
|
Webb BD, Lau LY, Tsevdos D, Shewcraft RA, Corrigan D, Shi L, Lee S, Tyler J, Li S, Wang Z, Stolovitzky G, Edelmann L, Chen R, Schadt EE, Li L. An algorithm to identify patients aged 0-3 with rare genetic disorders. Orphanet J Rare Dis 2024; 19:183. [PMID: 38698482 PMCID: PMC11064409 DOI: 10.1186/s13023-024-03188-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 04/17/2024] [Indexed: 05/05/2024] Open
Abstract
BACKGROUND With over 7000 Mendelian disorders, identifying children with a specific rare genetic disorder diagnosis through structured electronic medical record data is challenging given incompleteness of records, inaccurate medical diagnosis coding, as well as heterogeneity in clinical symptoms and procedures for specific disorders. We sought to develop a digital phenotyping algorithm (PheIndex) using electronic medical records to identify children aged 0-3 diagnosed with genetic disorders or who present with illness with an increased risk for genetic disorders. RESULTS Through expert opinion, we established 13 criteria for the algorithm and derived a score and a classification. The performance of each criterion and the classification were validated by chart review. PheIndex identified 1,088 children out of 93,154 live births who may be at an increased risk for genetic disorders. Chart review demonstrated that the algorithm achieved 90% sensitivity, 97% specificity, and 94% accuracy. CONCLUSIONS The PheIndex algorithm can help identify when a rare genetic disorder may be present, alerting providers to consider ordering a diagnostic genetic test and/or referring a patient to a medical geneticist.
Affiliation(s)
- Bryn D Webb
  - Department of Pediatrics, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA.
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA.
- Lisa Y Lau
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA
- Despina Tsevdos
  - Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ryan A Shewcraft
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA
- David Corrigan
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA
- Lisong Shi
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA
- Seungwoo Lee
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA
- Jonathan Tyler
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA
- Shilong Li
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA
- Zichen Wang
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA
- Gustavo Stolovitzky
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA
- Lisa Edelmann
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA
- Rong Chen
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA
- Eric E Schadt
  - Department of Genetics and Genomic Sciences, The Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Li Li
  - GeneDx Holdings Corp, (formerly known as Sema4 Holdings Corp.), Stamford, Connecticut, CT, USA.
|
33
|
Tavabi N, Pruneski J, Golchin S, Singh M, Sanborn R, Heyworth B, Landschaft A, Kimia A, Kiapour A. Building large-scale registries from unstructured clinical notes using a low-resource natural language processing pipeline. Artif Intell Med 2024; 151:102847. [PMID: 38658131 DOI: 10.1016/j.artmed.2024.102847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 02/06/2024] [Accepted: 03/19/2024] [Indexed: 04/26/2024]
Abstract
Building clinical registries is an important step in clinical research and improvement of patient care quality. Natural Language Processing (NLP) methods have shown promising results in extracting valuable information from unstructured clinical notes. However, the structure and nature of clinical notes are very different from the regular text that state-of-the-art NLP models are trained and tested on, and they pose their own set of challenges. In this study, we propose Sentence Extractor with Keywords (SE-K), an efficient and interpretable classification approach for extracting information from clinical notes, and show that it outperforms more computationally expensive methods in text classification. Following Institutional Review Board (IRB) approval, we used SE-K and two embedding-based NLP approaches (Sentence Extractor with Embeddings (SE-E) and Bidirectional Encoder Representations from Transformers (BERT)) to develop a comprehensive registry of anterior cruciate ligament surgeries from 20 years of unstructured clinical data at a multi-site tertiary-care regional children's hospital. The low-resource approach (SE-K) had better performance (average AUROC of 0.94 ± 0.04) than the embedding-based approaches (SE-E: 0.93 ± 0.04 and BERT: 0.87 ± 0.09) for out-of-sample validation, in addition to a minimal performance drop between test and out-of-sample validation. Moreover, the SE-K approach was at least six times faster (on CPU) than SE-E (on CPU) and BERT (on GPU) and provides interpretability. Our proposed approach, SE-K, can be effectively used to extract relevant variables from clinical notes to build large-scale registries, with consistently better performance compared to the more resource-intensive approaches (e.g., BERT). Such approaches can facilitate information extraction from unstructured notes for registry building, quality improvement, and adverse event monitoring.
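The abstract names the method but does not describe its internals. Purely to illustrate the general idea of keyword-driven sentence extraction, the Python sketch below splits a note into sentences and keeps those that mention any keyword; the kept sentences could then feed a lightweight classifier. This is an assumption-laden sketch and not the authors' SE-K implementation: the function name, the naive sentence splitter, and the example note and keywords are all hypothetical.

```python
import re


def extract_keyword_sentences(note: str, keywords: list) -> list:
    """Return the sentences of `note` that contain at least one keyword.

    Sentences are split naively on ., !, or ? followed by whitespace,
    and matching is a case-insensitive substring test.
    """
    sentences = re.split(r"(?<=[.!?])\s+", note.strip())
    lowered = [k.lower() for k in keywords]
    return [s for s in sentences if any(k in s.lower() for k in lowered)]


# Hypothetical note and keyword list, for illustration only.
note = (
    "Patient seen for follow-up. "
    "ACL reconstruction performed with hamstring autograft. "
    "No fever."
)
hits = extract_keyword_sentences(note, ["ACL", "autograft"])
```

Substring matching is deliberately simple here and would also match keywords inside longer words; a production pipeline would more plausibly use token boundaries and a clinical sentence splitter.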
Affiliation(s)
- Nazgol Tavabi
  - Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.
- James Pruneski
  - Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Shahriar Golchin
  - Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA
- Mallika Singh
  - Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA
- Ryan Sanborn
  - Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA
- Benton Heyworth
  - Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Assaf Landschaft
  - Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, USA
- Amir Kimia
  - Harvard Medical School, Boston, MA, USA; Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, USA
- Ata Kiapour
  - Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.
|
34
|
Sivarajkumar S, Kelley M, Samolyk-Mazzanti A, Visweswaran S, Wang Y. An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study. JMIR Med Inform 2024; 12:e55318. [PMID: 38587879 PMCID: PMC11036183 DOI: 10.2196/55318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 02/20/2024] [Accepted: 02/24/2024] [Indexed: 04/09/2024] Open
Abstract
BACKGROUND Large language models (LLMs) have shown remarkable capabilities in natural language processing (NLP), especially in domains where labeled data are scarce or expensive, such as the clinical domain. However, to unlock the clinical knowledge hidden in these LLMs, we need to design effective prompts that can guide them to perform specific clinical NLP tasks without any task-specific training data. This is known as in-context learning, which is an art and science that requires understanding the strengths and weaknesses of different LLMs and prompt engineering approaches. OBJECTIVE The objective of this study is to assess the effectiveness of various prompt engineering techniques, including 2 newly introduced types (heuristic and ensemble prompts), for zero-shot and few-shot clinical information extraction using pretrained language models. METHODS This comprehensive experimental study evaluated different prompt types (simple prefix, simple cloze, chain of thought, anticipatory, heuristic, and ensemble) across 5 clinical NLP tasks: clinical sense disambiguation, biomedical evidence extraction, coreference resolution, medication status extraction, and medication attribute extraction. The performance of these prompts was assessed using 3 state-of-the-art language models: GPT-3.5 (OpenAI), Gemini (Google), and LLaMA-2 (Meta). The study contrasted zero-shot with few-shot prompting and explored the effectiveness of ensemble approaches. RESULTS The study revealed that task-specific prompt tailoring is vital for the high performance of LLMs for zero-shot clinical NLP. In clinical sense disambiguation, GPT-3.5 achieved an accuracy of 0.96 with heuristic prompts and 0.94 in biomedical evidence extraction. Heuristic prompts, alongside chain of thought prompts, were highly effective across tasks. Few-shot prompting improved performance in complex scenarios, and ensemble approaches capitalized on multiple prompt strengths. GPT-3.5 consistently outperformed Gemini and LLaMA-2 across tasks and prompt types. CONCLUSIONS This study provides a rigorous evaluation of prompt engineering methodologies and introduces innovative techniques for clinical information extraction, demonstrating the potential of in-context learning in the clinical domain. These findings offer clear guidelines for future prompt-based clinical NLP research, facilitating engagement by non-NLP experts in clinical NLP advancements. To the best of our knowledge, this is one of the first works on the empirical evaluation of different prompt engineering approaches for clinical NLP in this era of generative artificial intelligence, and we hope that it will inspire and inform future research in this area.
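To make the prompt types more concrete, the Python sketch below builds three differently phrased prompts for one note and combines the answers by majority vote. The abstract does not state how the ensemble prompts were combined, so majority voting is an assumption here, chosen because it is a common way to merge outputs. The template wording, the function names, and the stand-in answers are hypothetical; no model is called.

```python
from collections import Counter

# Hypothetical templates; the wording is illustrative, not taken from the study.
PROMPT_TEMPLATES = {
    "simple_prefix": "Classify the medication status in this note.\nNote: {note}\nStatus:",
    "simple_cloze": "Note: {note}\nThe medication status is ____.",
    "chain_of_thought": "Note: {note}\nThink step by step, then state the medication status.",
}


def build_prompts(note: str) -> dict:
    """Fill every template with the same clinical note."""
    return {name: t.format(note=note) for name, t in PROMPT_TEMPLATES.items()}


def majority_vote(answers: list) -> str:
    """Return the most frequent answer after trimming and lowercasing.

    Ties resolve to the answer that appeared first.
    """
    normalized = [a.strip().lower() for a in answers]
    return Counter(normalized).most_common(1)[0][0]


prompts = build_prompts("Patient stopped taking metformin last month.")

# Stand-ins for the three model responses (assumption: one answer per prompt).
answers = ["Discontinued", "discontinued ", "active"]
ensemble_answer = majority_vote(answers)
```

Normalizing answers before voting matters in practice, because free-text model outputs often differ only in case or whitespace.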
Affiliation(s)
- Sonish Sivarajkumar
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, United States
| | - Mark Kelley
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, United States
| | - Alyssa Samolyk-Mazzanti
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, United States
| | - Shyam Visweswaran
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Yanshan Wang
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
|
35
|
Mashima Y, Tanigawa M, Yokoi H. Information heterogeneity between progress notes by physicians and nurses for inpatients with digestive system diseases. Sci Rep 2024; 14:7656. [PMID: 38561333 PMCID: PMC10984979 DOI: 10.1038/s41598-024-56324-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 03/05/2024] [Indexed: 04/04/2024] Open
Abstract
This study focused on the heterogeneity in progress notes written by physicians or nurses. A total of 806 days of progress notes written by physicians or nurses from 83 randomly selected patients hospitalized in the Gastroenterology Department at Kagawa University Hospital from January to December 2021 were analyzed. We extracted symptoms as the International Classification of Diseases (ICD) Chapter 18 (R00-R99, hereinafter R codes) from each progress note using MedNER-J natural language processing software and counted the days one or more symptoms were extracted to calculate the extraction rate. The R-code extraction rate was significantly higher from progress notes by nurses than by physicians (physicians 68.5% vs. nurses 75.2%; p = 0.00112), regardless of specialty. By contrast, the R-code subcategory R10-R19 for digestive system symptoms (44.2 vs. 37.5%, respectively; p = 0.00299) and many chapters of ICD codes for disease names, as represented by Chapter 11 K00-K93 (68.4 vs. 30.9%, respectively; p < 0.001), were frequently extracted from the progress notes by physicians, reflecting their specialty. We believe that understanding the information heterogeneity of medical documents, which can be the basis of medical artificial intelligence, is crucial, and this study is a pioneering step in that direction.
Affiliation(s)
- Yukinori Mashima
- Clinical Research Support Center, Kagawa University Hospital, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan.
- Department of Medical Informatics, Faculty of Medicine, Kagawa University, Kagawa, Japan.
| | - Masatoshi Tanigawa
- Clinical Research Support Center, Kagawa University Hospital, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan
| | - Hideto Yokoi
- Clinical Research Support Center, Kagawa University Hospital, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan
- Department of Medical Informatics, Faculty of Medicine, Kagawa University, Kagawa, Japan
|
36
|
Wang L, Ma Y, Bi W, Lv H, Li Y. An Entity Extraction Pipeline for Medical Text Records Using Large Language Models: Analytical Study. J Med Internet Res 2024; 26:e54580. [PMID: 38551633 PMCID: PMC11015372 DOI: 10.2196/54580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/23/2024] [Accepted: 02/14/2024] [Indexed: 04/02/2024] Open
Abstract
BACKGROUND The study of disease progression relies on clinical data, including text data, and extracting valuable features from text data has been a research hot spot. With the rise of large language models (LLMs), semantic-based extraction pipelines are gaining acceptance in clinical research. However, the security and feature hallucination issues of LLMs require further attention. OBJECTIVE This study aimed to introduce a novel modular LLM pipeline, which could semantically extract features from textual patient admission records. METHODS The pipeline was designed to process a systematic succession of concept extraction, aggregation, question generation, corpus extraction, and question-and-answer scale extraction, which was tested via 2 low-parameter LLMs: Qwen-14B-Chat (QWEN) and Baichuan2-13B-Chat (BAICHUAN). A data set of 25,709 pregnancy cases from the People's Hospital of Guangxi Zhuang Autonomous Region, China, was used for evaluation with the help of a local expert's annotation. The pipeline was evaluated with the metrics of accuracy and precision, null ratio, and time consumption. Additionally, we evaluated its performance via a quantized version of Qwen-14B-Chat on a consumer-grade GPU. RESULTS The pipeline demonstrates a high level of precision in feature extraction, as evidenced by the accuracy and precision results of Qwen-14B-Chat (95.52% and 92.93%, respectively) and Baichuan2-13B-Chat (95.86% and 90.08%, respectively). Furthermore, the pipeline exhibited low null ratios and variable time consumption. The INT4-quantized version of QWEN delivered an enhanced performance with 97.28% accuracy and a 0% null ratio. CONCLUSIONS The pipeline exhibited consistent performance across different LLMs and efficiently extracted clinical features from textual data. It also showed reliable performance on consumer-grade hardware. This approach offers a viable and effective solution for mining clinical research data from textual records.
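The abstract reports accuracy, precision, and null ratio but does not define them. The sketch below shows one plausible reading, stated as an assumption: the null ratio is the share of records for which the pipeline returned no answer, accuracy counts exact matches over all records, and precision counts exact matches over answered records only. The function name and example values are hypothetical.

```python
def evaluate_extraction(predicted, gold):
    """Score extracted values against expert annotations.

    predicted: list of extracted values, None where nothing was returned.
    gold: list of annotated reference values, same length.
    The metric definitions are assumptions, not taken from the paper.
    """
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and equally long")
    total = len(gold)
    answered = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct_all = sum(p == g for p, g in zip(predicted, gold))
    correct_answered = sum(p == g for p, g in answered)
    return {
        "accuracy": correct_all / total,
        "precision": correct_answered / len(answered) if answered else 0.0,
        "null_ratio": (total - len(answered)) / total,
    }


# Hypothetical yes/no feature extracted from four admission records
print(evaluate_extraction(["yes", None, "no", "yes"], ["yes", "no", "no", "no"]))
```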
Affiliation(s)
- Lei Wang
- BGI Research, Wuhan, China
- Guangdong Bigdata Engineering Technology Research Center for Life Sciences, BGI Research, Shenzhen, China
| | - Yinyao Ma
- Department of Obstetrics, People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, China
| | | | | | - Yuxiang Li
- BGI Research, Wuhan, China
- Guangdong Bigdata Engineering Technology Research Center for Life Sciences, BGI Research, Shenzhen, China
|
37
|
Huang MS, Han JC, Lin PY, You YT, Tsai RTH, Hsu WL. Surveying biomedical relation extraction: a critical examination of current datasets and the proposal of a new resource. Brief Bioinform 2024; 25:bbae132. [PMID: 38609331 PMCID: PMC11014787 DOI: 10.1093/bib/bbae132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 11/06/2023] [Accepted: 03/02/2023] [Indexed: 04/14/2024] Open
Abstract
Natural language processing (NLP) has become an essential technique in various fields, offering a wide range of possibilities for analyzing data and developing diverse NLP tasks. In the biomedical domain, understanding the complex relationships between compounds and proteins is critical, especially in the context of signal transduction and biochemical pathways. Among these relationships, protein-protein interactions (PPIs) are of particular interest, given their potential to trigger a variety of biological reactions. To improve the ability to predict PPI events, we propose the protein event detection dataset (PEDD), which comprises 6823 abstracts, 39 488 sentences and 182 937 gene pairs. Our PEDD dataset has been utilized in the AI CUP Biomedical Paper Analysis competition, where systems are challenged to predict 12 different relation types. In this paper, we review the state-of-the-art relation extraction research and provide an overview of the PEDD's compilation process. Furthermore, we present the results of the PPI extraction competition and evaluate several language models' performances on the PEDD. This paper's outcomes will provide a valuable roadmap for future studies on protein event detection in NLP. By addressing this critical challenge, we hope to enable breakthroughs in drug discovery and enhance our understanding of the molecular mechanisms underlying various diseases.
Affiliation(s)
- Ming-Siang Huang
- Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan
- National Institute of Cancer Research, National Health Research Institutes, Tainan, Taiwan
- Department of Computer Science and Information Engineering, College of Information and Electrical Engineering, Asia University, Taichung, Taiwan
| | - Jen-Chieh Han
- Intelligent Information Service Research Laboratory, Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
| | - Pei-Yen Lin
- Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan
| | - Yu-Ting You
- Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan
| | - Richard Tzong-Han Tsai
- Intelligent Information Service Research Laboratory, Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
- Center for Geographic Information Science, Research Center for Humanities and Social Sciences, Academia Sinica, Taipei, Taiwan
| | - Wen-Lian Hsu
- Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan
- Department of Computer Science and Information Engineering, College of Information and Electrical Engineering, Asia University, Taichung, Taiwan
|
38
|
Yusuf A, Boyne DJ, O'Sullivan DE, Brenner DR, Cheung WY, Mirza I, Jarada TN. Text analysis framework for identifying mutations among non-small cell lung cancer patients from laboratory data. BMC Med Res Methodol 2024; 24:63. [PMID: 38468224 PMCID: PMC10926579 DOI: 10.1186/s12874-024-02192-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 02/25/2024] [Indexed: 03/13/2024] Open
Abstract
BACKGROUND Laboratory data can provide great value to support research aimed at reducing the incidence, prolonging survival and enhancing outcomes of cancer. Data is characterized by the information it carries and the format it holds. Data captured in Alberta's biomarker laboratory repository is free text, cluttered and rogue. Such a data format limits its utility and prohibits broader adoption and research development. Text analysis for information extraction of unstructured data can change this and lead to more complete analyses. Previous work on extracting relevant information from free text, unstructured data employed Natural Language Processing (NLP), Machine Learning (ML), rule-based Information Extraction (IE) methods, or a hybrid combination of them. METHODS In our study, text analysis was performed on Alberta Precision Laboratories data which consisted of 95,854 entries from the Southern Alberta Dataset (SAD) and 6944 entries from the Northern Alberta Dataset (NAD). The data covers all of Alberta and is completely population-based. Our proposed framework is built around rule-based IE methods. It incorporates topics such as Syntax and Lexical analyses to achieve deterministic extraction of data from biomarker laboratory data (i.e., Epidermal Growth Factor Receptor (EGFR) test results). Lexical analysis comprises data cleaning and pre-processing, Rich Text Format text conversion into readable plain text format, and normalization and tokenization of text. The framework then passes the text into the Syntax analysis stage which includes the rule-based method of extracting relevant data. Rule-based patterns of the test result are identified, and a Context Free Grammar then generates the rules of information extraction. Finally, the results are linked with the Alberta Cancer Registry to support real-world cancer research studies. 
RESULTS Of the original 5512 entries in the SAD dataset and 5017 entries in the NAD dataset which were filtered for EGFR, the framework yielded 5129 and 3388 extracted EGFR test results from the SAD and NAD datasets, respectively. An accuracy of 97.5% was achieved on a random sample of 362 tests. CONCLUSIONS We presented a text analysis framework to extract specific information from unstructured clinical data. Our proposed framework has shown that it can successfully extract relevant information from EGFR test results.
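The two-stage design described above (lexical normalization, then rule-based syntax analysis) can be illustrated with a small sketch. It swaps the paper's Context Free Grammar for two hand-written regular expressions, which is a deliberate simplification; the patterns, function names, and sample strings are hypothetical and are not the authors' rules.

```python
import re

# Hypothetical rules; negative wording is tested first because
# "not detected" also contains the word "detected".
_NEGATIVE = re.compile(
    r"\b(no|not|negative for)\b[^.]*\begfr\b[^.]*\bmutations?\b"
    r"|\begfr\b[^.]*\b(not detected|negative|wild[- ]?type)\b"
)
_POSITIVE = re.compile(r"\begfr\b[^.]*\b(detected|positive)\b")


def normalize(text):
    """Lexical stage: lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def extract_egfr_result(text):
    """Syntax stage: map a free-text entry to a coarse EGFR result."""
    cleaned = normalize(text)
    if _NEGATIVE.search(cleaned):
        return "negative"
    if _POSITIVE.search(cleaned):
        return "positive"
    return "unknown"


print(extract_egfr_result("EGFR mutation:   NOT detected."))
```

Because every rule is explicit, the extraction is deterministic and each result can be traced back to the pattern that produced it, which is the property the abstract emphasizes.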
Affiliation(s)
- Amman Yusuf
- Department of Oncology, University of Calgary, Calgary, AB, T2N 4N2, Canada
| | - Devon J Boyne
- Department of Oncology, University of Calgary, Calgary, AB, T2N 4N2, Canada
- Department of Community Health Sciences, University of Calgary, Calgary, AB, T2N 4Z6, Canada
| | - Dylan E O'Sullivan
- Department of Oncology, University of Calgary, Calgary, AB, T2N 4N2, Canada
- Department of Community Health Sciences, University of Calgary, Calgary, AB, T2N 4Z6, Canada
| | - Darren R Brenner
- Department of Oncology, University of Calgary, Calgary, AB, T2N 4N2, Canada
- Department of Community Health Sciences, University of Calgary, Calgary, AB, T2N 4Z6, Canada
| | - Winson Y Cheung
- Department of Oncology, University of Calgary, Calgary, AB, T2N 4N2, Canada
- Department of Community Health Sciences, University of Calgary, Calgary, AB, T2N 4Z6, Canada
| | - Imran Mirza
- Alberta Precision Laboratories, Calgary, AB, T2L 2K8, Canada
| | - Tamer N Jarada
- Department of Oncology, University of Calgary, Calgary, AB, T2N 4N2, Canada.
- Department of Community Health Sciences, University of Calgary, Calgary, AB, T2N 4Z6, Canada.
|
39
|
Hu D, Liu B, Zhu X, Lu X, Wu N. Zero-shot information extraction from radiological reports using ChatGPT. Int J Med Inform 2024; 183:105321. [PMID: 38157785 DOI: 10.1016/j.ijmedinf.2023.105321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 12/04/2023] [Accepted: 12/16/2023] [Indexed: 01/03/2024]
Abstract
INTRODUCTION Electronic health records contain an enormous amount of valuable information recorded in free text. Information extraction is the strategy to transform free text into structured data, but some of its components require annotated data to tune, which has become a bottleneck. Large language models achieve good performances on various downstream NLP tasks without parameter tuning, becoming a possible way to extract information in a zero-shot manner. METHODS In this study, we aim to explore whether the most popular large language model, ChatGPT, can extract information from radiological reports. We first design the prompt template for the information of interest in the CT reports. Then, we generate the prompts by combining the prompt template with the CT reports as the inputs of ChatGPT to obtain the responses. A post-processing module is developed to transform the responses into structured extraction results. In addition, we add prior medical knowledge to the prompt template to reduce wrong extraction results. We also explore the consistency of the extraction results. RESULTS We conducted the experiments with 847 real CT reports. The experimental results indicate that ChatGPT can achieve competitive performances for some extraction tasks, such as tumor location and tumor long and short diameters, compared with the baseline information extraction system. By adding some prior medical knowledge to the prompt template, extraction tasks about tumor spiculations and lobulations obtain significant improvements, but tasks about tumor density and lymph node status do not achieve better performances. CONCLUSION ChatGPT can achieve competitive information extraction for radiological reports in a zero-shot manner. Adding prior medical knowledge as instructions can further improve performances for some extraction tasks but may lead to worse performances for some complex extraction tasks.
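The template-then-post-process workflow can be sketched without calling any model. Everything below is an assumption made for illustration: the template wording, the three field names, the millimeter unit, and the "key: value" response format are hypothetical, and the model call itself is left out.

```python
import re

# Hypothetical template; the study's actual wording is not given in the abstract.
PROMPT_TEMPLATE = (
    "Read the CT report and answer with one line per field, "
    "formatted as 'field: value'.\n"
    "Fields: tumor_location, long_diameter_mm, short_diameter_mm\n"
    "Report:\n{report}"
)


def build_prompt(report):
    """Combine the prompt template with one CT report."""
    return PROMPT_TEMPLATE.format(report=report.strip())


def parse_response(response):
    """Post-process a free-text response into a structured record."""
    fields = {"tumor_location": None, "long_diameter_mm": None,
              "short_diameter_mm": None}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower().replace(" ", "_")
        if not sep or key not in fields:
            continue  # ignore lines that are not expected fields
        value = value.strip()
        if key.endswith("_mm"):
            match = re.search(r"\d+(?:\.\d+)?", value)
            fields[key] = float(match.group()) if match else None
        else:
            fields[key] = value or None
    return fields


reply = "tumor_location: right upper lobe\nlong_diameter_mm: 23 mm\nshort_diameter_mm: unknown"
print(parse_response(reply))
```

Prior medical knowledge, as described in the abstract, would be added as extra instruction lines in the template; the parser would not need to change.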
Affiliation(s)
- Danqing Hu
- Zhejiang Lab, Hangzhou, 311121, Zhejiang, China.
| | - Bing Liu
- Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, 100142, China
| | - Xiaofeng Zhu
- Zhejiang Lab, Hangzhou, 311121, Zhejiang, China.
| | - Xudong Lu
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, 310027, Zhejiang, China
| | - Nan Wu
- Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, 100142, China.
|
40
|
Gu S, Lee EW, Zhang W, Simpson RL, Hertzberg VS, Ho JC. Evaluating Natural Language Processing Packages for Predicting Hospital-Acquired Pressure Injuries From Clinical Notes. Comput Inform Nurs 2024; 42:184-192. [PMID: 37607706 PMCID: PMC10884344 DOI: 10.1097/cin.0000000000001053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Incidence of hospital-acquired pressure injury, a key indicator of nursing quality, is directly proportional to adverse outcomes, increased hospital stays, and economic burdens on patients, caregivers, and society. Thus, predicting hospital-acquired pressure injury is important. Prediction models use structured data more often than unstructured notes, although the latter often contain useful patient information. We hypothesize that unstructured notes, such as nursing notes, can predict hospital-acquired pressure injury. We evaluate the impact of using various natural language processing packages to identify salient patient information from unstructured text. We use named entity recognition to identify keywords, which comprise the feature space of our classifier for hospital-acquired pressure injury prediction. We compare scispaCy and Stanza, two different named entity recognition models, using unstructured notes in Medical Information Mart for Intensive Care III, a publicly available ICU data set. To assess the impact of vocabulary size reduction, we compare the use of all clinical notes with only nursing notes. Our results suggest that named entity recognition extraction using nursing notes can yield accurate models. Moreover, the extracted keywords play a significant role in the prediction of hospital-acquired pressure injury.
Affiliation(s)
- Siyi Gu
- Author Affiliations: Department of Computer Science, Center for Data Science (Ms Gu, Mr Lee, and Dr Ho), and Nell Hodgson Woodruff School of Nursing (Drs Zhang, Simpson, and Hertzberg), Emory University, Atlanta, GA
|
41
|
Sushil M, Butte AJ, Schuit E, van Smeden M, Leeuwenberg AM. Cross-institution natural language processing for reliable clinical association studies: a methodological exploration. J Clin Epidemiol 2024; 167:111258. [PMID: 38219811 DOI: 10.1016/j.jclinepi.2024.111258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 12/21/2023] [Accepted: 01/08/2024] [Indexed: 01/16/2024]
Abstract
OBJECTIVES Natural language processing (NLP) of clinical notes in electronic medical records is increasingly used to extract otherwise sparsely available patient characteristics, to assess their association with relevant health outcomes. Manual data curation is resource intensive and NLP methods make these studies more feasible. However, the methodology of using NLP methods reliably in clinical research is understudied. The objective of this study is to investigate how NLP models could be used to extract study variables (specifically exposures) to reliably conduct exposure-outcome association studies. STUDY DESIGN AND SETTING In a convenience sample of patients admitted to the intensive care unit of a US academic health system, multiple association studies are conducted, comparing the association estimates based on NLP-extracted vs. manually extracted exposure variables. The association studies varied in NLP model architecture (Bidirectional Encoder Representations from Transformers, Long Short-Term Memory), training paradigm (training a new model, fine-tuning an existing external model), extracted exposures (employment status, living status, and substance use), health outcomes (having a do-not-resuscitate/intubate code, length of stay, and in-hospital mortality), missing data handling (multiple imputation vs. complete case analysis), and the application of measurement error correction (via regression calibration). RESULTS The study was conducted on 1,174 participants (median [interquartile range] age, 61 [50, 73] years; 60.6% male). Additionally, up to 500 discharge reports of participants from the same health system and 2,528 reports of participants from an external health system were used to train the NLP models. Substantial differences were found between the associations based on NLP-extracted and manually extracted exposures under all settings. The error in association was only weakly correlated with the overall F1 score of the NLP models. 
CONCLUSION Associations estimated using NLP-extracted exposures should be interpreted with caution. Further research is needed to set conditions for reliable use of NLP in medical association studies.
Affiliation(s)
- Madhumita Sushil
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, USA
| | - Atul J Butte
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, USA
| | - Ewoud Schuit
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Artuur M Leeuwenberg
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
|
42
|
Margetta J, Sale A. Distinguishing cardiac catheter ablation energy modalities by applying natural language processing to electronic health records. J Comp Eff Res 2024; 13:e230053. [PMID: 38261335 PMCID: PMC10945417 DOI: 10.57264/cer-2023-0053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Accepted: 01/03/2024] [Indexed: 01/24/2024] Open
Abstract
Aim: Catheter ablation is used to treat symptomatic atrial fibrillation (AF) and is performed using either cryoballoon (CB) or radiofrequency (RF) ablation. Real-world data on CB and RF in the US are limited because healthcare codes are agnostic of energy modality. An alternative method is to analyze patients' electronic health records (EHRs) using Optum's EHR database. Objective: To determine the feasibility of using patients' EHRs with natural language processing (NLP) to distinguish CB versus RF ablation procedures. Data Source: Optum® de-identified EHR dataset, Optum® Cardiac Ablation NLP Table. Methods: This was a retrospective analysis of existing de-identified EHR data. Medical codes were used to create an ablation validation table. Frequency analysis was used to assess ablation procedures and their associated note terms. Two cohorts were created: (1) index procedures and (2) multiple procedures. Possible note term combinations included (1) cryoablation, (2) radiofrequency, (3) ablation, or (4) both. Results: Of the 40,810 validated cardiac ablations, 3777 (9%) index ablation procedures had available and matching NLP note terms. Of these, 22% (n = 844) were classified as ablation, 27% (n = 1016) as cryoablation, 49% (n = 1855) as radiofrequency ablation, and 1.6% (n = 62) as both. In the multiple procedures analysis, 5691 (14%) procedures had matching note terms. 24% (n = 1362) were classified as ablation, 27% as cryoablation, 47% as radiofrequency ablation, and 2% as both. Conclusion: NLP has potential to evaluate the frequency of cardiac ablation by type; however, for this to be a reliable real-world data source, mandatory data entry by providers and standardized electronic health reporting must occur.
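The four note-term categories in the Methods can be read as a small decision rule. The sketch below is an illustration under assumptions: the substring matching, the precedence order (both, then a specific modality, then generic ablation), and the function name are hypothetical and do not reproduce the Optum NLP table logic.

```python
def classify_note_terms(terms):
    """Assign one procedure to a note-term category.

    terms: list of NLP note-term strings linked to the procedure.
    Returns "both", "cryoablation", "radiofrequency", "ablation",
    or None when no term mentions an ablation.
    """
    lowered = [t.lower().strip() for t in terms]
    has_cryo = any("cryo" in t for t in lowered)
    has_rf = any("radiofrequency" in t or t == "rf" for t in lowered)
    has_generic = any("ablation" in t for t in lowered)
    if has_cryo and has_rf:
        return "both"
    if has_cryo:
        return "cryoablation"
    if has_rf:
        return "radiofrequency"
    if has_generic:
        return "ablation"
    return None


print(classify_note_terms(["radiofrequency ablation", "cryoballoon"]))
```

The reported index-cohort counts are internally consistent: 844 + 1016 + 1855 + 62 = 3777, matching the number of index procedures with note terms.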
Affiliation(s)
- Jamie Margetta
- Department of Health Economics & Outcomes Research, Medtronic, Mounds View, MN 55112, USA
| | - Alicia Sale
- Department of Health Economics & Outcomes Research, Medtronic, Mounds View, MN 55112, USA
|
43
|
Mora S, Turrisi R, Chiarella L, Consales A, Tassi L, Mai R, Nobili L, Barla A, Arnulfo G. NLP-based tools for localization of the epileptogenic zone in patients with drug-resistant focal epilepsy. Sci Rep 2024; 14:2349. [PMID: 38287042 PMCID: PMC10825198 DOI: 10.1038/s41598-024-51846-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 01/10/2024] [Indexed: 01/31/2024] Open
Abstract
Epilepsy surgery is an option for people with focal onset drug-resistant (DR) seizures, but a delayed or incorrect diagnosis of epileptogenic zone (EZ) location limits its efficacy. Seizure semiological manifestations and their chronological appearance contain valuable information on the putative EZ location, but their interpretation relies on extensive experience. The aim of our work is to support the localization of the EZ in DR patients by automatically analyzing the semiological description of seizures contained in video-EEG reports. Our sample is composed of 536 descriptions of seizures extracted from Electronic Medical Records of 122 patients. We devised numerical representations of anamnestic records and seizure descriptions, exploiting Natural Language Processing (NLP) techniques, and used them to feed Machine Learning (ML) models. We performed three binary classification tasks: localizing the EZ in the right or left hemisphere, temporal or extra-temporal, and frontal or posterior regions. Our computational pipeline reached performances above 70% in all tasks. These results show that NLP-based numerical representation combined with ML-based classification models may help in localizing the origin of the seizures relying on seizure-related semiological text data alone. Accurate early recognition of the EZ could enable more appropriate patient management and faster access to epilepsy surgery for potential candidates.
Affiliation(s)
- Sara Mora
- Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, 16145, Genoa, Italy.
| | - Rosanna Turrisi
- Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, 16145, Genoa, Italy
- MaLGa Machine Learning Genoa Center, University of Genoa, 16146, Genoa, Italy
| | - Lorenzo Chiarella
- Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics, Child and Maternal Health (DINOGMI), University of Genoa, 16132, Genoa, Italy
- Child Neuropsychiatry Unit, IRCCS Istituto Giannina Gaslini, Member of the European Reference Network EpiCARE, 16147, Genoa, Italy
| | - Alessandro Consales
- Division of Neurosurgery, IRCCS Istituto Giannina Gaslini, 16147, Genoa, Italy
| | - Laura Tassi
- "Claudio Munari" Epilepsy Surgery Center, Niguarda Hospital, 20162, Milan, Italy
| | - Roberto Mai
- "Claudio Munari" Epilepsy Surgery Center, Niguarda Hospital, 20162, Milan, Italy
| | - Lino Nobili
- Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics, Child and Maternal Health (DINOGMI), University of Genoa, 16132, Genoa, Italy
- Child Neuropsychiatry Unit, IRCCS Istituto Giannina Gaslini, Member of the European Reference Network EpiCARE, 16147, Genoa, Italy
| | - Annalisa Barla
- Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, 16145, Genoa, Italy
- MaLGa Machine Learning Genoa Center, University of Genoa, 16146, Genoa, Italy
| | - Gabriele Arnulfo
- Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, 16145, Genoa, Italy
- Neuroscience Center, Helsinki Institute of Life Science (HiLife), University of Helsinki, 00014, Helsinki, Finland
|
44
|
Lin WC, Chen A, Song X, Weiskopf NG, Chiang MF, Hribar MR. Prediction of multiclass surgical outcomes in glaucoma using multimodal deep learning based on free-text operative notes and structured EHR data. J Am Med Inform Assoc 2024; 31:456-464. [PMID: 37964658 PMCID: PMC10797280 DOI: 10.1093/jamia/ocad213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 10/16/2023] [Accepted: 10/25/2023] [Indexed: 11/16/2023] Open
Abstract
OBJECTIVE Surgical outcome prediction is challenging but necessary for postoperative management. Current machine learning models utilize pre- and post-op data, excluding intraoperative information in surgical notes. Current models also usually predict binary outcomes even when surgeries have multiple outcomes that require different postoperative management. This study addresses these gaps by incorporating intraoperative information into multimodal models for multiclass glaucoma surgery outcome prediction. MATERIALS AND METHODS We developed and evaluated multimodal deep learning models for multiclass glaucoma trabeculectomy surgery outcomes using both structured EHR data and free-text operative notes. We compare those to baseline models that use structured EHR data exclusively, or neural network models that leverage only operative notes. RESULTS The multimodal neural network had the highest performance with a macro AUROC of 0.750 and F1 score of 0.583. It outperformed the baseline machine learning model with structured EHR data alone (macro AUROC of 0.712 and F1 score of 0.486). Additionally, the multimodal model achieved the highest recall (0.692) for hypotony surgical failure, while the surgical success group had the highest precision (0.884) and F1 score (0.775). DISCUSSION This study shows that operative notes are an important source of predictive information. The multimodal predictive model combining perioperative notes and structured pre- and post-op EHR data outperformed other models. Multiclass surgical outcome prediction can provide valuable insights for clinical decision-making. CONCLUSIONS Our results show the potential of deep learning models to enhance clinical decision-making for postoperative management. They can be applied to other specialties to improve surgical outcome predictions.
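For readers unfamiliar with macro-averaged scores in a multiclass setting, the sketch below computes a macro F1 score as the unweighted mean of per-class F1 values. Whether the reported F1 of 0.583 was macro-averaged is not stated in the abstract, so this is a generic illustration; the class names and example labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical outcome classes for four surgeries
truth = ["success", "success", "hypotony", "high_iop"]
preds = ["success", "hypotony", "hypotony", "success"]
print(round(macro_f1(truth, preds), 3))
```

Macro averaging gives each outcome class equal weight, so a rare failure class influences the score as much as the common success class.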
Affiliation(s)
- Wei-Chun Lin
  - Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Rd, Portland, OR, 97239, United States
  - Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, 545 SW Campus Dr, Portland, OR, 97239, United States
- Aiyin Chen
  - Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, 545 SW Campus Dr, Portland, OR, 97239, United States
- Xubo Song
  - Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Rd, Portland, OR, 97239, United States
- Nicole G Weiskopf
  - Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Rd, Portland, OR, 97239, United States
- Michael F Chiang
  - National Eye Institute, National Institutes of Health, 31 Center Dr MSC 2510, Bethesda, MD, 20892, United States
  - National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, United States
- Michelle R Hribar
  - Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Rd, Portland, OR, 97239, United States
  - Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, 545 SW Campus Dr, Portland, OR, 97239, United States
  - National Eye Institute, National Institutes of Health, 31 Center Dr MSC 2510, Bethesda, MD, 20892, United States
|
45
|
Anell A, Arvidsson E, Dackehag M, Ellegård LM, Glenngård AH. Access to automated comparative feedback reports in primary care - a study of intensity of use and relationship with clinical performance among Swedish primary care practices. BMC Health Serv Res 2024; 24:33. [PMID: 38178188 PMCID: PMC10768433 DOI: 10.1186/s12913-023-10407-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 11/30/2023] [Indexed: 01/06/2024]
Abstract
BACKGROUND Digital applications that automatically extract information from electronic medical records and provide comparative visualizations of the data in the form of quality indicators to primary care practices may facilitate local quality improvement (QI). A necessary condition for such QI to work is that practices actively access the data. The purpose of this study was to explore the use of an application that visualizes quality indicators in Swedish primary care, developed by a profession-led QI initiative ("Primärvårdskvalitet"). We also describe the characteristics of practices that used the application more or less extensively, and the relationships between the intensity of use and changes in selected performance indicators. METHODS We studied longitudinal data on 122 primary care practices' visits to pages (page views) in the application over a period up to 5 years. We compared high and low users, classified by the average number of monthly page views, with respect to practice and patient characteristics as well as baseline measurements of a subset of the performance indicators. We estimated linear associations between visits to pages with diabetes-related indicators and the change in measurements of selected diabetes indicators over 1.5 years. RESULTS Less than half of all practices accessed the data in a given month, although most practices accessed the data during at least one third of the observed months. High and low users were similar in terms of most studied characteristics. We found statistically significant positive associations between use of the diabetes indicators and changes in measurements of three diabetes indicators. CONCLUSIONS Although most practices in this study indicated an interest in the automated feedback reports, the intensity of use can be described as varying and on average limited. 
The positive associations between the use and changes in performance suggest that policymakers should increase their support of practices' QI efforts. Such support may include providing a formalized structure for peer group discussions of data, facilitating both understanding of the data and possible action points to improve performance, while maintaining a profession-led use of applications.
Affiliation(s)
- Anders Anell
  - Lund University School of Economics & Management, Lund, Sweden
- Eva Arvidsson
  - Futurum, Region Jönköping County, Jönköping, Sweden
  - School of Health and Welfare, Jönköping University, Jönköping, Sweden
- Lina Maria Ellegård
  - Lund University School of Economics & Management, Lund, Sweden
  - Faculty of Business, Kristianstad University, Kristianstad, Sweden
|
46
|
Cui Z, Yu K, Yuan Z, Dong X, Luo W. Language inference-based learning for Low-Resource Chinese clinical named entity recognition using language model. J Biomed Inform 2024; 149:104559. [PMID: 38056702 DOI: 10.1016/j.jbi.2023.104559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/24/2023] [Accepted: 11/30/2023] [Indexed: 12/08/2023]
Abstract
Electronic health records (EHRs) are widely used and are gradually replacing paper records, so extracting valuable information from EHRs has become a focus of current research. Clinical named entity recognition (CNER) is an important task in information extraction. Most current methods use standard supervised learning to fine-tune pre-trained language models (PLMs), which requires a large amount of annotated data for model training. However, annotated data are scarce in realistic medical scenarios, because annotating data in real clinical settings is time-consuming and labour-intensive. In this paper, a language inference-based learning method (LANGIL) is proposed for clinical NER with limited annotated samples, i.e., in low-resource clinical scenarios. A prompt learning-based method is designed to reformulate the entity recognition task as a language inference task. Unlike standard fine-tuning, the approach does not add network layers that must be trained from scratch. This narrows the gap between pre-training tasks and downstream tasks, allowing the comprehension capabilities of PLMs to be leveraged when training samples are limited. Experiments on four Chinese clinical named entity recognition datasets showed that LANGIL achieves significant improvements in F1-score compared with the standard fine-tuning method.
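Recasting entity recognition as language inference can be pictured as follows: each candidate text span is paired with a natural-language hypothesis about its entity type, and a language model judges whether the sentence entails that hypothesis. The sketch below shows one generic way to build such premise-hypothesis pairs. The label set, the hypothesis template, the English example sentence, and the `toy_entail_score` stand-in are all assumptions for illustration; the abstract does not specify LANGIL's actual templates or scoring model.

```python
from typing import Callable, Iterator, List, Tuple

ENTITY_TYPES = ["disease", "drug", "symptom"]   # hypothetical label set
TEMPLATE = '"{span}" is a {etype} entity.'       # hypothetical hypothesis template


def candidate_spans(tokens: List[str], max_len: int = 3) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs for every span of up to max_len tokens."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            yield start, end


def build_inference_pairs(tokens: List[str], max_len: int = 3):
    """Pair the sentence (premise) with one hypothesis per span and entity type."""
    premise = " ".join(tokens)
    pairs = []
    for start, end in candidate_spans(tokens, max_len):
        span = " ".join(tokens[start:end])
        for etype in ENTITY_TYPES:
            hypothesis = TEMPLATE.format(span=span, etype=etype)
            pairs.append((span, etype, premise, hypothesis))
    return pairs


def recognize(tokens: List[str],
              entail_score: Callable[[str, str], float],
              threshold: float = 0.5,
              max_len: int = 3) -> List[Tuple[str, str]]:
    """Keep the (span, type) pairs whose hypothesis the scorer judges as entailed."""
    return [(span, etype)
            for span, etype, premise, hypothesis in build_inference_pairs(tokens, max_len)
            if entail_score(premise, hypothesis) >= threshold]


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Stand-in for a pre-trained language model; accepts one fixed hypothesis."""
    return 1.0 if hypothesis == '"chest pain" is a symptom entity.' else 0.0


tokens = ["patient", "reports", "chest", "pain"]
pairs = build_inference_pairs(tokens)
entities = recognize(tokens, toy_entail_score)
```

In a real system the scorer would be the pre-trained model itself, which is what lets this formulation reuse the model's existing language understanding instead of training a new classification head.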
Affiliation(s)
- Zhaojian Cui
  - School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121 China
- Kai Yu
  - School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121 China
- Zhenming Yuan
  - School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121 China
- Xiaofeng Dong
  - School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121 China
- Weibin Luo
  - School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121 China
|
47
|
Zhou H, Li M, Xiao Y, Yang H, Zhang R. LLM Instruction-Example Adaptive Prompting (LEAP) Framework for Clinical Relation Extraction. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.15.23300059. [PMID: 38168203 PMCID: PMC10760264 DOI: 10.1101/2023.12.15.23300059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Objective To investigate the demonstration in Large Language Models (LLMs) for clinical relation extraction. We focus on examining two types of adaptive demonstration: instruction adaptive prompting, and example adaptive prompting to understand their impacts and effectiveness. Materials and Methods The study unfolds in two stages. Initially, we explored a range of demonstration components vital to LLMs' clinical data extraction, such as task descriptions and examples, and tested their combinations. Subsequently, we introduced the Instruction-Example Adaptive Prompting (LEAP) Framework, a system that integrates two types of adaptive prompts: one preceding instruction and another before examples. This framework is designed to systematically explore both adaptive task description and adaptive examples within the demonstration. We evaluated LEAP framework's performance on the DDI and BC5CDR chemical interaction datasets, applying it across LLMs such as Llama2-7b, Llama2-13b, and MedLLaMA_13B. Results The study revealed that Instruction + Options + Examples and its expanded form substantially raised F1-scores over the standard Instruction + Options mode. LEAP framework excelled, especially with example adaptive prompting that outdid traditional instruction tuning across models. Notably, the MedLLAMA-13b model scored an impressive 95.13 F1 on the BC5CDR dataset with this method. Significant improvements were also seen in the DDI 2013 dataset, confirming the method's robustness in sophisticated data extraction. Conclusion The LEAP framework presents a promising avenue for refining LLM training strategies, steering away from extensive finetuning towards more contextually rich and dynamic prompting methodologies.
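The difference between the "Instruction + Options" and "Instruction + Options + Examples" demonstration modes compared in the abstract above can be illustrated with a small prompt builder. This is only a sketch of how such text demonstrations might be assembled: the field labels, the option names, and the example sentences are hypothetical, and the learned adaptive prompts of the LEAP framework are not modeled here.

```python
from typing import Iterable, List, Tuple


def build_prompt(instruction: str,
                 options: List[str],
                 sentence: str,
                 examples: Iterable[Tuple[str, str]] = ()) -> str:
    """Assemble a demonstration: instruction, options, optional examples, then the query."""
    lines = [f"Instruction: {instruction}",
             "Options: " + ", ".join(options)]
    for example_sentence, example_label in examples:
        lines.append(f"Example: {example_sentence}")
        lines.append(f"Answer: {example_label}")
    lines.append(f"Input: {sentence}")
    lines.append("Answer:")
    return "\n".join(lines)


# Hypothetical task description, options, and sentences for illustration.
INSTRUCTION = "Classify the interaction between the two marked drugs."
OPTIONS = ["mechanism", "effect", "advice", "none"]
QUERY = "DrugA may increase the plasma concentration of DrugB."

instruction_options = build_prompt(INSTRUCTION, OPTIONS, QUERY)
instruction_options_examples = build_prompt(
    INSTRUCTION, OPTIONS, QUERY,
    examples=[("DrugC should not be taken with DrugD.", "advice")],
)
```

Keeping the examples as an optional argument makes it straightforward to run the same query under both modes and compare the resulting F1-scores.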
Affiliation(s)
- Huixue Zhou
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Mingchen Li
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA
- Yongkang Xiao
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Han Yang
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Rui Zhang
  - Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA
|
48
|
Li R, Wang X, Yu H. Two Directions for Clinical Data Generation with Large Language Models: Data-to-Label and Label-to-Data. PROCEEDINGS OF THE CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING. CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING 2023; 2023:7129-7143. [PMID: 38213944 PMCID: PMC10782150 DOI: 10.18653/v1/2023.findings-emnlp.474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2024]
Abstract
Large language models (LLMs) can generate natural language texts for various domains and tasks, but their potential for clinical text mining, a domain with scarce, sensitive, and imbalanced medical data, is under-explored. We investigate whether LLMs can augment clinical data for detecting Alzheimer's Disease (AD)-related signs and symptoms from electronic health records (EHRs), a challenging task that requires high expertise. We create a novel pragmatic taxonomy for AD sign and symptom progression based on expert knowledge and generate three datasets: (1) a gold dataset annotated by human experts on longitudinal EHRs of AD patients; (2) a silver dataset created by the data-to-label method, which labels sentences from a public EHR collection with AD-related signs and symptoms; and (3) a bronze dataset created by the label-to-data method, which generates sentences with AD-related signs and symptoms based on the label definition. We train a system to detect AD-related signs and symptoms from EHRs. We find that the silver and bronze datasets improve system performance, outperforming the system trained on the gold dataset alone. This shows that LLMs can generate synthetic clinical data for a complex task by incorporating expert knowledge, and that our label-to-data method can produce datasets that are free of sensitive information while maintaining acceptable quality.
Affiliation(s)
- Rumeng Li
  - UMass Amherst, Amherst, MA, USA
  - VA Bedford Healthcare System, Bedford, MA, USA
- Hong Yu
  - UMass Amherst, Amherst, MA, USA
  - VA Bedford Healthcare System, Bedford, MA, USA
  - UMass Lowell, Lowell, MA, USA
|
49
|
Nuthalapati P, Thomas L, Donahue MA, Moura LMVR, DeStefano S, Simpson JR, Buchhalter J, Fureman BE, Pellinen J. Improving Seizure Frequency Documentation and Classification. Neurol Clin Pract 2023; 13:e200212. [PMID: 37873534 PMCID: PMC10586801 DOI: 10.1212/cpj.0000000000200212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Accepted: 09/01/2023] [Indexed: 10/25/2023]
Abstract
Background and Objectives Accurate and reliable seizure data are essential for evaluating treatment strategies and tracking the quality of care in epilepsy clinics. This quality improvement project aimed to increase seizure documentation (i.e., documentation of seizure frequency from 80% to 100%, date of last seizure from 35% to 50%, and International League Against Epilepsy (ILAE) seizure classification from 35% to at least 50%) over 6 months. Methods We surveyed 7 epileptologists to determine their perceived seizure frequency, ILAE classification, and date of last seizure documentation habits. Baseline data were collected weekly from September to December 2021. Subsequently, we implemented a newly created flowsheet in our Electronic Health Record (EHR) based on the Epilepsy Learning Healthcare System (ELHS) Case Report Forms to increase seizure documentation in a standardized way. Two epileptologists tested this flowsheet tool in their epilepsy clinics between February 2022 and July 2022. Data were collected weekly and compared with documentation from other epileptologists within the same group. Results Epileptologists at our center believed they documented seizure frequency for 84%-87% of clinic visits, which aligned with baseline data collection, showing they recorded seizure frequency for 83% of clinic visits. Epileptologists believed they documented ILAE classification for 47%-52% of clinic visits, and baseline data showed this was documented in 33% of clinic visits. They also reported documenting the date of the last seizure for 52%-63% of clinic visits, but this occurred in only 35% of clinic visits. After implementing the new flowsheet, documentation increased to nearly 100% for all fields being completed by the providers who tested the flowsheet. Discussion We demonstrated that by implementing an easy-to-use standardized EHR documentation tool, our documentation of critical metrics, as defined by the ELHS, improved dramatically. 
This shows that simple and practical interventions can substantially improve clinically meaningful documentation.
Affiliation(s)
- Poojith Nuthalapati, Lionel Thomas, Maria A Donahue, Lidia M V R Moura, Samuel DeStefano, Jennifer R Simpson, Jeffrey Buchhalter, Brandy E Fureman, Jacob Pellinen
  - Department of Neurology (PN, MAD, LMVRM), Massachusetts General Hospital, Harvard Medical School, Boston; Department of Neurology (LT, SD, JRS, JP), University of Colorado School of Medicine, Aurora; Department of Pediatrics (JB), Cumming School of Medicine, University of Calgary, AB, CA; and Mission Outcomes Team (BEF), Epilepsy Foundation, Landover, MD
|
50
|
Crema C, Buonocore TM, Fostinelli S, Parimbelli E, Verde F, Fundarò C, Manera M, Ramusino MC, Capelli M, Costa A, Binetti G, Bellazzi R, Redolfi A. Advancing Italian biomedical information extraction with transformers-based models: Methodological insights and multicenter practical application. J Biomed Inform 2023; 148:104557. [PMID: 38012982 DOI: 10.1016/j.jbi.2023.104557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 10/26/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023]
Abstract
The introduction of computerized medical records in hospitals has reduced burdensome activities like manual writing and information fetching. However, the data contained in medical records are still far underutilized, primarily because extracting data from unstructured textual medical records takes time and effort. Information Extraction, a subfield of Natural Language Processing, can help clinical practitioners overcome this limitation by using automated text-mining pipelines. In this work, we created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Transformers-based model. Moreover, we collected and leveraged three external independent datasets to implement an effective multicenter model, with overall F1-score 84.77 %, Precision 83.16 %, Recall 86.44 %. The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a "low-resource" approach. This allowed us to establish methodological guidelines that pave the way for Natural Language Processing studies in less-resourced languages.
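As a quick consistency check on the figures reported in the abstract above, the F1-score is the harmonic mean of precision and recall. Assuming the reported overall values are computed over the same pooled set of predictions (an assumption, since the abstract does not state the averaging scheme), the arithmetic works out as:

```latex
F_1 = \frac{2 \cdot P \cdot R}{P + R}
    = \frac{2 \times 83.16 \times 86.44}{83.16 + 86.44}
    = \frac{14376.70}{169.60}
    \approx 84.77\,\%
```

The result matches the reported overall F1-score of 84.77 %.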
Affiliation(s)
- Claudio Crema
  - Laboratory of Neuroinformatics, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy.
- Tommaso Mario Buonocore
  - Dept. of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
- Silvia Fostinelli
  - Molecular Markers Laboratory, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy.
- Enea Parimbelli
  - Dept. of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
- Federico Verde
  - Department of Neurology and Laboratory of Neuroscience, IRCCS Istituto Auxologico Italiano, Milan, Italy; Department of Pathophysiology and Transplantation, Dino Ferrari Center, Università degli Studi di Milano, Milan, Italy.
- Cira Fundarò
  - Neurophysiopatology Unit, IRCCS Istituti Clinici Scientifici Maugeri, Pavia, Italy.
- Marina Manera
  - Psychology Unit, IRCCS Istituti Clinici Scientifici Maugeri, Pavia, Italy.
- Matteo Cotta Ramusino
  - Unit of Behavioral Neurology, IRCCS Mondino Foundation Pavia, and Dept. of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy.
- Marco Capelli
  - Unit of Behavioral Neurology, IRCCS Mondino Foundation Pavia, and Dept. of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy.
- Alfredo Costa
  - Unit of Behavioral Neurology, IRCCS Mondino Foundation Pavia, and Dept. of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy.
- Giuliano Binetti
  - Molecular Markers Laboratory, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy.
- Riccardo Bellazzi
  - Dept. of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
- Alberto Redolfi
  - Laboratory of Neuroinformatics, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy.
|