1. Liu S, Wang A, Xiu X, Zhong M, Wu S. Evaluating Medical Entity Recognition in Health Care: Entity Model Quantitative Study. JMIR Med Inform 2024; 12:e59782. [PMID: 39419501 PMCID: PMC11528166 DOI: 10.2196/59782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Revised: 08/09/2024] [Accepted: 09/15/2024] [Indexed: 10/19/2024] Open
Abstract
BACKGROUND: Named entity recognition (NER) models are essential for extracting structured information from unstructured medical texts by identifying entities such as diseases, treatments, and conditions, enhancing clinical decision-making and research. Innovations in machine learning, particularly those involving Bidirectional Encoder Representations From Transformers (BERT)-based deep learning and large language models, have significantly advanced NER capabilities. However, their performance varies across medical datasets due to the complexity and diversity of medical terminology. Previous studies have often focused on overall performance, neglecting specific challenges in medical contexts and the impact of macrofactors like lexical composition on prediction accuracy. These gaps hinder the development of optimized NER models for medical applications.
OBJECTIVE: This study aims to evaluate the performance of various NER models in the context of medical text analysis, focusing on how complex medical terminology affects entity recognition accuracy. Additionally, we explored the influence of macrofactors on model performance, seeking to provide insights for refining NER models and enhancing their reliability for medical applications.
METHODS: This study evaluated 7 NER models (hidden Markov models, conditional random fields, BERT for Biomedical Text Mining, Big Transformer Models for Efficient Long-Sequence Attention, Decoding-enhanced BERT with Disentangled Attention, Robustly Optimized BERT Pretraining Approach, and Gemma) across 3 medical datasets: Revised Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA), BioCreative V CDR, and Anatomical Entity Mention (AnatEM). The evaluation focused on prediction accuracy, resource use (eg, central processing unit and graphics processing unit use), and the impact of fine-tuning hyperparameters. The macrofactors affecting model performance were also screened using the multilevel factor elimination algorithm.
RESULTS: The fine-tuned BERT for Biomedical Text Mining, with balanced resource use, achieved the highest prediction accuracy on the Revised JNLPBA and AnatEM datasets, with microaverage (AVG_MICRO) scores of 0.932 and 0.8494, respectively, highlighting its proficiency in identifying medical entities. Gemma, fine-tuned using the low-rank adaptation technique, achieved the highest accuracy on the BioCreative V CDR dataset with an AVG_MICRO score of 0.9962 but exhibited variability across the other datasets (AVG_MICRO scores of 0.9088 on the Revised JNLPBA and 0.8029 on AnatEM), indicating a need for further optimization. In addition, our analysis revealed that 2 macrofactors, entity phrase length and the number of entity words in each entity phrase, significantly influenced model performance.
CONCLUSIONS: This study highlights the essential role of NER models in medical informatics, emphasizing the imperative for model optimization via precise data targeting and fine-tuning. The insights from this study will notably improve clinical decision-making and facilitate the creation of more sophisticated and effective medical NER models.
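Note: the abstract above reports microaveraged (AVG_MICRO) scores without saying whether they are computed over tokens or entity spans. As a minimal sketch, assuming exact-match entity spans and an F1-style score (both assumptions, as are all function names and the toy data), the difference between micro- and macroaveraging can be illustrated as follows:

```python
from collections import Counter


def entity_counts(gold, pred):
    """Count true positives, false positives, and false negatives per entity type.

    gold, pred: sets of (start, end, entity_type) spans; a prediction counts
    as correct only if the whole tuple matches a gold span exactly.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred:
        if span in gold:
            tp[span[2]] += 1
        else:
            fp[span[2]] += 1
    for span in gold - pred:
        fn[span[2]] += 1
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 expressed directly in terms of counts; 0.0 when nothing was counted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Micro: pool counts over all types. Macro: average the per-type F1 scores."""
    tp, fp, fn = entity_counts(gold, pred)
    types = sorted(set(tp) | set(fp) | set(fn))
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[t], fp[t], fn[t]) for t in types) / len(types) if types else 0.0
    return micro, macro


# Toy spans (hypothetical, not taken from any of the datasets named above).
gold = {(0, 2, "Disease"), (5, 6, "Chemical"), (9, 11, "Disease"), (14, 15, "Disease")}
pred = {(0, 2, "Disease"), (5, 6, "Chemical"), (9, 10, "Disease")}
micro, macro = micro_macro_f1(gold, pred)
print(f"micro F1 = {micro:.4f}, macro F1 = {macro:.4f}")
```

Because microaveraging pools counts, frequent entity types dominate the score, which is one reason a single overall figure can hide weak performance on rarer or longer entity phrases.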
Affiliation(s)
- Shengyu Liu
  - Department of Medical Data Sharing, Institute of Medical Information & Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Anran Wang
  - Department of Medical Data Sharing, Institute of Medical Information & Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Xiaolei Xiu
  - Department of Medical Data Sharing, Institute of Medical Information & Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Ming Zhong
  - Department of Medical Data Sharing, Institute of Medical Information & Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Sizhu Wu
  - Department of Medical Data Sharing, Institute of Medical Information & Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
2. Keloth VK, Hu Y, Xie Q, Peng X, Wang Y, Zheng A, Selek M, Raja K, Wei CH, Jin Q, Lu Z, Chen Q, Xu H. Advancing entity recognition in biomedicine via instruction tuning of large language models. Bioinformatics 2024; 40:btae163. [PMID: 38514400 PMCID: PMC11001490 DOI: 10.1093/bioinformatics/btae163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 02/18/2024] [Accepted: 03/19/2024] [Indexed: 03/23/2024] Open
Abstract
MOTIVATION: Large Language Models (LLMs) have the potential to revolutionize the field of Natural Language Processing, excelling not only in text generation and reasoning tasks but also in zero/few-shot learning, swiftly adapting to new tasks with minimal fine-tuning. LLMs have also demonstrated great promise in biomedical and healthcare applications. However, when it comes to Named Entity Recognition (NER), particularly within the biomedical domain, LLMs fall short of the effectiveness exhibited by fine-tuned domain-specific models. One key reason is that NER is typically conceptualized as a sequence labeling task, whereas LLMs are optimized for text generation and reasoning tasks.
RESULTS: We developed an instruction-based learning paradigm that transforms biomedical NER from a sequence labeling task into a generation task. This paradigm is end-to-end and streamlines the training and evaluation process by automatically repurposing pre-existing biomedical NER datasets. We further developed BioNER-LLaMA using the proposed paradigm with LLaMA-7B as the foundational LLM. We conducted extensive testing on BioNER-LLaMA across three widely recognized biomedical NER datasets, consisting of entities related to diseases, chemicals, and genes. The results revealed that BioNER-LLaMA consistently achieved F1-scores 5% to 30% higher than the few-shot learning capabilities of GPT-4 on datasets with different biomedical entities. We show that a general-domain LLM can match the performance of rigorously fine-tuned PubMedBERT models and PMC-LLaMA, a biomedical-specific language model. Our findings underscore the potential of our proposed paradigm in developing general-domain LLMs that can rival SOTA performances in multi-task, multi-domain scenarios in biomedical and health applications.
AVAILABILITY AND IMPLEMENTATION: Datasets and other resources are available at https://github.com/BIDS-Xu-Lab/BioNER-LLaMA.
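Note: the abstract above describes recasting NER from sequence labeling into text generation by repurposing existing annotated datasets, but does not give the instruction template. The following is only an illustrative sketch of that general idea: the BIO tag scheme, the instruction wording, the record fields, and both function names are assumptions, not the BioNER-LLaMA format.

```python
def bio_to_spans(tokens, tags):
    """Collect (entity_text, entity_type) pairs from BIO tags.

    "B-X" starts an entity of type X, "I-X" continues it, "O" is outside.
    An "I-" tag that does not continue an open entity of the same type is
    treated like "O" (the stray token is dropped).
    """
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans


def to_instruction_example(tokens, tags, entity_type):
    """Build one hypothetical (instruction, input, output) training record."""
    mentions = [text for text, etype in bio_to_spans(tokens, tags) if etype == entity_type]
    return {
        "instruction": (
            f"List every {entity_type} mention in the sentence, separated by '; '. "
            "Answer 'none' if there are no mentions."
        ),
        "input": " ".join(tokens),
        "output": "; ".join(mentions) if mentions else "none",
    }


# Toy sentence (hypothetical, not drawn from the datasets used in the paper).
tokens = ["Aspirin", "can", "cause", "gastric", "ulcers", "."]
tags = ["B-Chemical", "O", "O", "B-Disease", "I-Disease", "O"]
record = to_instruction_example(tokens, tags, "Disease")
print(record["output"])
```

Emitting one record per entity type keeps each target string short; a single record covering all types at once would be an equally plausible design under the same assumptions.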
Affiliation(s)
- Vipina K Keloth
  - Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT-06510, United States
- Yan Hu
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX-77030, United States
- Qianqian Xie
  - Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT-06510, United States
- Xueqing Peng
  - Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT-06510, United States
- Yan Wang
  - Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT-06510, United States
- Andrew Zheng
  - William P. Clements High School, Sugar Land, TX-77479, United States
- Melih Selek
  - Stephen F. Austin High School, Sugar Land, TX-77498, United States
- Kalpana Raja
  - Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT-06510, United States
- Chih Hsuan Wei
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD-20894, United States
- Qiao Jin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD-20894, United States
- Zhiyong Lu
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD-20894, United States
- Qingyu Chen
  - Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT-06510, United States
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD-20894, United States
- Hua Xu
  - Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT-06510, United States
3. Datta S, Lee K, Paek H, Manion FJ, Ofoegbu N, Du J, Li Y, Huang LC, Wang J, Lin B, Xu H, Wang X. AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models. J Am Med Inform Assoc 2024; 31:375-385. [PMID: 37952206 PMCID: PMC10797270 DOI: 10.1093/jamia/ocad218] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 06/26/2023] [Revised: 09/19/2023] [Accepted: 11/08/2023] [Indexed: 11/14/2023] Open
Abstract
OBJECTIVES: We aim to build a generalizable information extraction system leveraging large language models to extract granular eligibility criteria information for diverse diseases from free text clinical trial protocol documents. We investigate the model's capability to extract criteria entities along with contextual attributes including values, temporality, and modifiers and present the strengths and limitations of this system.
MATERIALS AND METHODS: The clinical trial data were acquired from https://ClinicalTrials.gov/. We developed a system, AutoCriteria, which comprises the following modules: preprocessing, knowledge ingestion, prompt modeling based on GPT, postprocessing, and interim evaluation. The final system evaluation was performed, both quantitatively and qualitatively, on 180 manually annotated trials encompassing 9 diseases.
RESULTS: AutoCriteria achieves an overall F1 score of 89.42 across all 9 diseases in extracting the criteria entities, with the highest being 95.44 for nonalcoholic steatohepatitis and the lowest of 84.10 for breast cancer. Its overall accuracy is 78.95% in identifying all contextual information across all diseases. Our thematic analysis indicated accurate logic interpretation of criteria as one of the strengths and overlooking/neglecting the main criteria as one of the weaknesses of AutoCriteria.
DISCUSSION: AutoCriteria demonstrates strong potential to extract granular eligibility criteria information from trial documents without requiring manual annotations. The prompts developed for AutoCriteria generalize well across different disease areas. Our evaluation suggests that the system handles complex scenarios including multiple arm conditions and logics.
CONCLUSION: AutoCriteria currently encompasses a diverse range of diseases and holds potential to extend to more in the future. This signifies a generalizable and scalable solution, poised to address the complexities of clinical trial application in real-world settings.
Affiliation(s)
- Surabhi Datta
  - Melax Technologies, Houston, TX 77030, United States
- Kyeryoung Lee
  - Melax Technologies, Houston, TX 77030, United States
- Hunki Paek
  - Melax Technologies, Houston, TX 77030, United States
- Nneka Ofoegbu
  - Melax Technologies, Houston, TX 77030, United States
- Jingcheng Du
  - Melax Technologies, Houston, TX 77030, United States
- Ying Li
  - Regeneron Pharmaceuticals, Tarrytown, NY 10591, United States
- Jingqi Wang
  - Melax Technologies, Houston, TX 77030, United States
- Bin Lin
  - Melax Technologies, Houston, TX 77030, United States
- Hua Xu
  - Yale School of Medicine, New Haven, CT 06511, United States
- Xiaoyan Wang
  - Melax Technologies, Houston, TX 77030, United States
4. Crema C, Buonocore TM, Fostinelli S, Parimbelli E, Verde F, Fundarò C, Manera M, Ramusino MC, Capelli M, Costa A, Binetti G, Bellazzi R, Redolfi A. Advancing Italian biomedical information extraction with transformers-based models: Methodological insights and multicenter practical application. J Biomed Inform 2023; 148:104557. [PMID: 38012982 DOI: 10.1016/j.jbi.2023.104557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 10/26/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023]
Abstract
The introduction of computerized medical records in hospitals has reduced burdensome activities like manual writing and information fetching. However, the data contained in medical records are still far underutilized, primarily because extracting data from unstructured textual medical records takes time and effort. Information Extraction, a subfield of Natural Language Processing, can help clinical practitioners overcome this limitation by using automated text-mining pipelines. In this work, we created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Transformers-based model. Moreover, we collected and leveraged three external independent datasets to implement an effective multicenter model, with overall F1-score 84.77%, Precision 83.16%, Recall 86.44%. The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a "low-resource" approach. This allowed us to establish methodological guidelines that pave the way for Natural Language Processing studies in less-resourced languages.
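Note: the three figures reported in the abstract above can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. A minimal sketch (the function and variable names are illustrative only):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Percentages as reported in the abstract.
reported_precision = 83.16
reported_recall = 86.44
computed_f1 = f1_score(reported_precision, reported_recall)
print(round(computed_f1, 2))  # → 84.77
```

The computed value agrees with the reported overall F1-score of 84.77%.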
Affiliation(s)
- Claudio Crema
  - Laboratory of Neuroinformatics, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy.
- Tommaso Mario Buonocore
  - Dept. of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
- Silvia Fostinelli
  - Molecular Markers Laboratory, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy.
- Enea Parimbelli
  - Dept. of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
- Federico Verde
  - Department of Neurology and Laboratory of Neuroscience, IRCCS Istituto Auxologico Italiano, Milan, Italy; Department of Pathophysiology and Transplantation, Dino Ferrari Center, Università degli Studi di Milano, Milan, Italy.
- Cira Fundarò
  - Neurophysiopatology Unit, IRCCS Istituti Clinici Scientifici Maugeri, Pavia, Italy.
- Marina Manera
  - Psychology Unit, IRCCS Istituti Clinici Scientifici Maugeri, Pavia, Italy.
- Matteo Cotta Ramusino
  - Unit of Behavioral Neurology, IRCCS Mondino Foundation Pavia, and Dept. of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy.
- Marco Capelli
  - Unit of Behavioral Neurology, IRCCS Mondino Foundation Pavia, and Dept. of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy.
- Alfredo Costa
  - Unit of Behavioral Neurology, IRCCS Mondino Foundation Pavia, and Dept. of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy.
- Giuliano Binetti
  - Molecular Markers Laboratory, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy.
- Riccardo Bellazzi
  - Dept. of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
- Alberto Redolfi
  - Laboratory of Neuroinformatics, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy.
5. Sun Z, Tao C. Named Entity Recognition and Normalization for Alzheimer's Disease Eligibility Criteria. IEEE International Conference on Healthcare Informatics 2023; 2023:558-564. [PMID: 38283164 PMCID: PMC10815931 DOI: 10.1109/ichi57859.2023.00100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024]
Abstract
Alzheimer's Disease (AD) is a complex neurodegenerative disorder that affects millions of people worldwide. Finding effective treatments for this disease is crucial. Clinical trials play an essential role in developing and testing new treatments for AD. However, identifying eligible participants can be challenging, time-consuming, and costly. In recent years, the development of natural language processing (NLP) techniques, specifically named entity recognition (NER) and named entity normalization (NEN), has helped to automate the identification and extraction of relevant information from the eligibility criteria (EC) more efficiently, in order to facilitate semi-automatic patient recruitment and enable data FAIRness for clinical trial data. Nevertheless, most current biomedical NER models only provide annotations for a restricted set of entity types that may not be applicable to the clinical trial data. Additionally, there are currently no established techniques for accurately performing NEN on entities that are negated using a negative prefix. In this paper, we introduce a pipeline designed for information extraction from AD clinical trial EC, which involves preprocessing of the EC data, clinical NER, and biomedical NEN to Unified Medical Language System (UMLS). Our NER model can identify named entities in seven pre-defined categories, while our NEN model employs a combination of exact match and partial match search strategies, as well as customized rules to accurately normalize entities with negative prefixes. To evaluate the performance of our pipeline, we measured the precision, recall, and F1 score for the NER component, and we manually reviewed the top five mapping results produced by the NEN component. Our evaluation revealed that the pipeline can successfully normalize named entities in clinical trial ECs with high accuracy. The NER component achieved an overall F1 of 0.816, demonstrating its ability to accurately identify seven types of named entities in clinical text. The NEN component of the pipeline also demonstrated impressive performance, with customized rules and a combination of exact and partial match strategies leading to an accuracy of 0.940 for normalized entities.
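Note: the abstract above names three normalization ingredients (exact match, partial match, and customized rules for negative prefixes) without describing how they are combined. The sketch below is one plausible arrangement under stated assumptions: the prefix list, the token-overlap (Jaccard) scoring, the order of the steps, and the placeholder concept identifiers are all hypothetical and are not real UMLS identifiers or the authors' rules.

```python
# Hypothetical prefix list; "non-" is tried before "non" so the hyphen is removed too.
NEGATIVE_PREFIXES = ("non-", "non", "un")

# Toy vocabulary mapping lowercase terms to placeholder concept IDs.
VOCABULARY = {
    "dementia": "C-DEMENTIA",
    "diabetic": "C-DIABETIC",
    "unstable angina": "C-UNSTABLE-ANGINA",
    "mild cognitive impairment": "C-MCI",
}


def normalize(mention, vocabulary):
    """Return (concept_id, match_kind, negated) for an entity mention.

    1. Exact match on the lowercased mention.
    2. Negative-prefix rule: strip a prefix and retry the exact match,
       flagging the result as negated.
    3. Partial match: pick the term with the highest token overlap.
    """
    text = mention.lower().strip()
    if text in vocabulary:
        return vocabulary[text], "exact", False

    for prefix in NEGATIVE_PREFIXES:
        if text.startswith(prefix):
            stem = text[len(prefix):].strip()
            if stem in vocabulary:
                return vocabulary[stem], "exact", True

    tokens = set(text.split())
    best_id, best_score = None, 0.0
    for term, concept_id in vocabulary.items():
        term_tokens = set(term.split())
        score = len(tokens & term_tokens) / len(tokens | term_tokens)
        if score > best_score:
            best_id, best_score = concept_id, score
    if best_id is not None:
        return best_id, "partial", False
    return None, "none", False


for mention in ["Non-diabetic", "Unstable angina", "cognitive impairment", "fracture"]:
    print(mention, "->", normalize(mention, VOCABULARY))
```

Trying the exact match first keeps terms such as "unstable angina" from being misread as a negated "stable angina"; the prefix rule only fires when the remaining stem is itself a known term.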
Affiliation(s)
- Zenan Sun
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas
- Cui Tao
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas
6. Ivanisenko TV, Demenkov PS, Kolchanov NA, Ivanisenko VA. The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition. Int J Mol Sci 2022; 23:ijms232314934. [PMID: 36499269 PMCID: PMC9738852 DOI: 10.3390/ijms232314934] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/14/2022] [Revised: 11/19/2022] [Accepted: 11/22/2022] [Indexed: 12/05/2022] Open
Abstract
The body of scientific literature continues to grow annually. Over 1.5 million abstracts of biomedical publications were added to the PubMed database in 2021. Therefore, developing cognitive systems that provide a specialized search for information in scientific publications based on subject area ontology and modern artificial intelligence methods is urgently needed. We previously developed a web-based information retrieval system, ANDDigest, designed to search and analyze information in the PubMed database using a customized domain ontology. This paper presents an improved ANDDigest version that uses fine-tuned PubMedBERT classifiers to enhance the quality of short name recognition for molecular-genetics entities in PubMed abstracts on eight biological object types: cell components, diseases, side effects, genes, proteins, pathways, drugs, and metabolites. This approach increased average short name recognition accuracy by 13%.
Affiliation(s)
- Timofey V. Ivanisenko
  - Kurchatov Genomics Center, Institute of Cytology & Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
  - Institute of Cytology & Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
  - Correspondence:
- Pavel S. Demenkov
  - Kurchatov Genomics Center, Institute of Cytology & Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
  - Institute of Cytology & Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
- Nikolay A. Kolchanov
  - Kurchatov Genomics Center, Institute of Cytology & Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
  - Institute of Cytology & Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
  - Faculty of Natural Sciences, Novosibirsk State University, St. Pirogova 1, Novosibirsk 630090, Russia
- Vladimir A. Ivanisenko
  - Kurchatov Genomics Center, Institute of Cytology & Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
  - Institute of Cytology & Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
  - Faculty of Natural Sciences, Novosibirsk State University, St. Pirogova 1, Novosibirsk 630090, Russia