1
Zitu MM, Gatti-Mays ME, Johnson KC, Zhang S, Shendre A, Elsaid MI, Li L. Detection of Patient-Level Immunotherapy-Related Adverse Events (irAEs) from Clinical Narratives of Electronic Health Records: A High-Sensitivity Artificial Intelligence Model. Pragmat Obs Res 2024; 15:243-252. [PMID: 39720010 PMCID: PMC11668329 DOI: 10.2147/por.s468253] [Received: 05/10/2024] [Accepted: 09/02/2024] [Indexed: 12/26/2024]
Abstract
Purpose We developed an artificial intelligence (AI) model to detect immunotherapy-related adverse events (irAEs) from clinical narratives of electronic health records (EHRs) at the patient level. Patients and Methods Training data, used for internal validation of the AI model, comprised 1230 clinical notes from 30 patients at The Ohio State University James Cancer Hospital: 20 patients who experienced irAEs and ten who did not. For external validation of the AI model, 3256 clinical notes of 50 patients were used. Results Use of a leave-one-out cross-validation technique for internal validation among those 30 patients yielded accurate identification of 19 of 20 with irAEs (positive patients; 95% sensitivity) and correct dissociation of eight of ten without (negative patients; 80% specificity). External validation on 3256 clinical notes of 50 patients yielded high sensitivity (95%) but moderate specificity (64%). If we improve the model's specificity to 100%, it could eliminate the need to manually review 2500 of those 3256 clinical notes (77%). Conclusion Combined use of this AI model with the manual review of clinical notes will improve both sensitivity and specificity in the detection of irAEs, decreasing workload and costs and facilitating the development of improved immunotherapies.
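The internal-validation percentages quoted in this abstract follow directly from the patient counts it reports. A minimal Python sketch of that arithmetic is given here for orientation only; the function and variable names are illustrative assumptions and are not taken from the paper.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly positive patients that the model flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly negative patients that the model clears."""
    return true_neg / (true_neg + false_pos)


# Counts reported for internal validation: 19 of 20 irAE patients detected,
# 8 of 10 non-irAE patients correctly classified.
internal_sensitivity = sensitivity(true_pos=19, false_neg=1)
internal_specificity = specificity(true_neg=8, false_pos=2)

# Share of notes that would no longer need manual review (2500 of 3256).
review_saved = 2500 / 3256

print(f"sensitivity: {internal_sensitivity:.0%}")
print(f"specificity: {internal_specificity:.0%}")
print(f"notes spared from manual review: {review_saved:.0%}")
```

The external-validation figures (95% sensitivity, 64% specificity) cannot be recomputed this way because the abstract does not give the underlying patient counts.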
Affiliation(s)
- Md Muntasir Zitu
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Margaret E Gatti-Mays
  - Division of Medical Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH, 43210, USA
- Kai C Johnson
  - Division of Medical Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH, 43210, USA
- Shijun Zhang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Aditi Shendre
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Mohamed I Elsaid
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Lang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
2
Dai X, Karimi S, Sarker A, Hachey B, Paris C. MultiADE: A Multi-domain benchmark for Adverse Drug Event extraction. J Biomed Inform 2024; 160:104744. [PMID: 39536999 DOI: 10.1016/j.jbi.2024.104744] [Received: 06/25/2024] [Revised: 10/23/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024]
Abstract
OBJECTIVE Active adverse event surveillance monitors Adverse Drug Events (ADE) from different data sources, such as electronic health records, medical literature, social media and search engine logs. Over the years, many datasets have been created, and shared tasks have been organised to facilitate active adverse event surveillance. However, most, if not all, datasets or shared tasks focus on extracting ADEs from a particular type of text. Domain generalisation, the ability of a machine learning model to perform well on new, unseen domains (text types), is under-explored. Given the rapid advancements in natural language processing, one unanswered question is how far we are from having a single ADE extraction model that is effective on various types of text, such as scientific literature and social media posts. METHODS We contribute to answering this question by building a multi-domain benchmark for adverse drug event extraction, which we named MultiADE. The new benchmark comprises several existing datasets sampled from different text types and our newly created dataset, CADECv2, which is an extension of CADEC (Karimi et al., 2015), covering online posts regarding more diverse drugs than CADEC. Our new dataset is carefully annotated by human annotators following detailed annotation guidelines. CONCLUSION Our benchmark results show that the generalisation of the trained models is far from perfect, making them infeasible to deploy for processing different types of text. In addition, although intermediate transfer learning is a promising approach to utilising existing resources, further investigation is needed on methods of domain adaptation, particularly cost-effective methods to select useful training instances. The newly created CADECv2 and the scripts for building the benchmark are publicly available at CSIRO's Data Portal (https://data.csiro.au/collection/csiro:62387). These resources enable the research community to further information extraction, leading to more effective active adverse drug event surveillance.
3
Yang Y, Lu Y, Zheng Z, Wu H, Lin Y, Qian F, Yan W. MKG-GC: A multi-task learning-based knowledge graph construction framework with personalized application to gastric cancer. Comput Struct Biotechnol J 2024; 23:1339-1347. [PMID: 38585647 PMCID: PMC10995799 DOI: 10.1016/j.csbj.2024.03.021] [Received: 01/16/2024] [Revised: 03/24/2024] [Accepted: 03/24/2024] [Indexed: 04/09/2024]
Abstract
Over the past decade, information for precision disease medicine has accumulated in the form of textual data. To effectively utilize this expanding medical text, we proposed a multi-task learning-based framework based on hard parameter sharing for knowledge graph construction (MKG), and then used it to automatically extract gastric cancer (GC)-related biomedical knowledge from the literature and identify GC drug candidates. In MKG, we designed three separate modules, MT-BGIPN, MT-SGTF and MT-ScBERT, for entity recognition, entity normalization, and relation classification, respectively. To address the challenges posed by the long and irregular naming of medical entities, the MT-BGIPN utilized bidirectional gated recurrent unit and interactive pointer network techniques, significantly improving entity recognition accuracy to an average F1 value of 84.5% across datasets. In MT-SGTF, we employed the term frequency-inverse document frequency and the gated attention unit. These combine both semantic and characteristic features of entities, resulting in an average Hits@1 score of 94.5% across five datasets. The MT-ScBERT integrated cross-text, entity, and context features, yielding an average F1 value of 86.9% across 11 relation classification datasets. Based on the MKG, we then developed a specific knowledge graph for GC (MKG-GC), which encompasses a total of 9129 entities and 88,482 triplets. Lastly, the MKG-GC was used to predict potential GC drugs using a pre-trained language model called BioKGE-BERT and a drug-disease discriminant model based on CNN-BiLSTM. Remarkably, nine out of the top ten predicted drugs have been previously reported as effective for gastric cancer treatment. Finally, an online platform was created for exploration and visualization of MKG-GC at https://www.yanglab-mi.org.cn/MKG-GC/.
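For readers unfamiliar with the Hits@1 score reported for entity normalization: Hits@k is the fraction of mentions whose correct concept appears among the top k ranked candidates. The Python sketch below illustrates the metric on invented toy data; the concept identifiers, rankings, and function name are hypothetical and do not come from the paper or its datasets.

```python
from typing import List, Sequence


def hits_at_k(ranked_candidates: Sequence[Sequence[str]],
              gold_ids: Sequence[str], k: int = 1) -> float:
    """Share of queries whose gold concept is within the top-k candidates."""
    if not gold_ids:
        raise ValueError("at least one query is required")
    hits = sum(gold in ranked[:k]
               for ranked, gold in zip(ranked_candidates, gold_ids))
    return hits / len(gold_ids)


# Hypothetical ranked candidate lists for four entity mentions (best first).
toy_rankings: List[List[str]] = [
    ["C01", "C07", "C03"],
    ["C05", "C02", "C09"],
    ["C04", "C08", "C06"],
    ["C02", "C01", "C05"],
]
# Hypothetical gold concept for each mention.
toy_gold = ["C01", "C02", "C04", "C05"]

print(hits_at_k(toy_rankings, toy_gold, k=1))
print(hits_at_k(toy_rankings, toy_gold, k=3))
```

Hits@1 is the strictest setting, since only the top-ranked candidate counts as a hit.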
Affiliation(s)
- Yang Yang
  - Computing Science and Artificial Intelligence College, Suzhou City University, Suzhou 215004, China
  - School of Computer Science & Technology, Soochow University, Suzhou 215000, China
- Yuwei Lu
  - School of Computer Science & Technology, Soochow University, Suzhou 215000, China
- Zixuan Zheng
  - School of Computer Science & Technology, Soochow University, Suzhou 215000, China
- Hao Wu
  - Department of Bioinformatics, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou 215123, China
- Yuxin Lin
  - Center for Systems Biology, Soochow University, Suzhou 215123, China
  - Department of Urology, the First Affiliated Hospital of Soochow University, Suzhou 215000, China
- Fuliang Qian
  - Center for Systems Biology, Soochow University, Suzhou 215123, China
  - Medical Center of Soochow University, Suzhou 215123, China
  - Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou 215123, China
- Wenying Yan
  - Department of Bioinformatics, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou 215123, China
  - Center for Systems Biology, Soochow University, Suzhou 215123, China
  - Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou 215123, China
4
Fu YV, Ramachandran GK, Halwani A, McInnes BT, Xia F, Lybarger K, Yetisgen M, Uzuner Ö. CACER: Clinical concept Annotations for Cancer Events and Relations. J Am Med Inform Assoc 2024; 31:2583-2594. [PMID: 39225779 PMCID: PMC11491616 DOI: 10.1093/jamia/ocae231] [Received: 06/05/2024] [Revised: 08/08/2024] [Accepted: 08/12/2024] [Indexed: 09/04/2024]
Abstract
OBJECTIVE Clinical notes contain unstructured representations of patient histories, including the relationships between medical problems and prescription drugs. To investigate the relationship between cancer drugs and their associated symptom burden, we extract structured, semantic representations of medical problem and drug information from the clinical narratives of oncology notes. MATERIALS AND METHODS We present Clinical concept Annotations for Cancer Events and Relations (CACER), a novel corpus with fine-grained annotations for over 48 000 medical problems and drug events and 10 000 drug-problem and problem-problem relations. Leveraging CACER, we develop and evaluate transformer-based information extraction models such as Bidirectional Encoder Representations from Transformers (BERT), Fine-tuned Language Net Text-To-Text Transfer Transformer (Flan-T5), Large Language Model Meta AI (Llama3), and Generative Pre-trained Transformers-4 (GPT-4) using fine-tuning and in-context learning (ICL). RESULTS In event extraction, the fine-tuned BERT and Llama3 models achieved the highest performance at 88.2-88.0 F1, which is comparable to the inter-annotator agreement (IAA) of 88.4 F1. In relation extraction, the fine-tuned BERT, Flan-T5, and Llama3 achieved the highest performance at 61.8-65.3 F1. GPT-4 with ICL achieved the worst performance across both tasks. DISCUSSION The fine-tuned models significantly outperformed GPT-4 in ICL, highlighting the importance of annotated training data and model optimization. Furthermore, the BERT models performed similarly to Llama3. For our task, large language models offer no performance advantage over the smaller BERT models. CONCLUSIONS We introduce CACER, a novel corpus with fine-grained annotations for medical problems, drugs, and their relationships in clinical narratives of oncology notes. State-of-the-art transformer models achieved performance comparable to IAA for several extraction tasks.
Affiliation(s)
- Yujuan Velvin Fu
  - Department of Biomedical Informatics & Medical Education, University of Washington, Seattle, WA 98195, United States
- Ahmad Halwani
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- Bridget T McInnes
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
- Fei Xia
  - Department of Linguistics, University of Washington, Seattle, WA 98195, United States
- Kevin Lybarger
  - Department of Information Sciences and Technology, George Mason University, Fairfax, VA 22030, United States
- Meliha Yetisgen
  - Department of Biomedical Informatics & Medical Education, University of Washington, Seattle, WA 98195, United States
- Özlem Uzuner
  - Department of Information Sciences and Technology, George Mason University, Fairfax, VA 22030, United States
5
Liu J, Wong ZSY. Utilizing active learning strategies in machine-assisted annotation for clinical named entity recognition: a comprehensive analysis considering annotation costs and target effectiveness. J Am Med Inform Assoc 2024; 31:2632-2640. [PMID: 39081233 PMCID: PMC11491619 DOI: 10.1093/jamia/ocae197] [Received: 02/21/2024] [Revised: 07/09/2024] [Accepted: 07/15/2024] [Indexed: 10/22/2024]
Abstract
OBJECTIVES Active learning (AL) has rarely integrated diversity-based and uncertainty-based strategies into a dynamic sampling framework for clinical named entity recognition (NER). Machine-assisted annotation is becoming popular for creating gold-standard labels. This study investigated the effectiveness of dynamic AL strategies under simulated machine-assisted annotation scenarios for clinical NER. MATERIALS AND METHODS We proposed 3 new AL strategies: a diversity-based strategy (CLUSTER) based on Sentence-BERT and 2 dynamic strategies (CLC and CNBSE) capable of switching from diversity-based to uncertainty-based strategies. Using BioClinicalBERT as the foundational NER model, we conducted simulation experiments on 3 medication-related clinical NER datasets independently: i2b2 2009, n2c2 2018 (Track 2), and MADE 1.0. We compared the proposed strategies with uncertainty-based (LC and NBSE) and passive-learning (RANDOM) strategies. Performance was primarily measured by the number of edits made by the annotators to achieve a desired target effectiveness evaluated on independent test sets. RESULTS When aiming for 98% overall target effectiveness, on average, CLUSTER required the fewest edits. When aiming for 99% overall target effectiveness, CNBSE required 20.4% fewer edits than NBSE did. CLUSTER and RANDOM could not achieve such a high target under the pool-based simulation experiment. For high-difficulty entities, CNBSE required 22.5% fewer edits than NBSE to achieve 99% target effectiveness, whereas neither CLUSTER nor RANDOM achieved 93% target effectiveness. DISCUSSION AND CONCLUSION When the target effectiveness was set high, the proposed dynamic strategy CNBSE exhibited both strong learning capabilities and low annotation costs in machine-assisted annotation. CLUSTER required the fewest edits when the target effectiveness was set low.
Affiliation(s)
- Jiaxing Liu
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei 430073, China
- Zoie S Y Wong
  - Graduate School of Public Health, St Luke’s International University, OMURA Susumu & Mieko Memorial St Luke’s Center for Clinical Academia, Chuo-ku, Tokyo 104-0045, Japan
  - The Kirby Institute, University of New South Wales, Sydney, NSW 2052, Australia
  - School of Medical Sciences, The University of Sydney, Camperdown, NSW 2050, Australia
6
Kim K, Park S, Min J, Park S, Kim JY, Eun J, Jung K, Park YE, Kim E, Lee EY, Lee J, Choi J. Multifaceted Natural Language Processing Task-Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation. JMIR Med Inform 2024; 12:e52897. [PMID: 39475725 PMCID: PMC11539635 DOI: 10.2196/52897] [Received: 09/19/2023] [Revised: 07/08/2024] [Accepted: 08/17/2024] [Indexed: 11/08/2024]
Abstract
Background The bidirectional encoder representations from transformers (BERT) model has attracted considerable attention in clinical applications, such as patient classification and disease prediction. However, current studies have typically progressed to application development without a thorough assessment of the model's comprehension of clinical context. Furthermore, limited comparative studies have been conducted on BERT models using medical documents from non-English-speaking countries. Therefore, the applicability of BERT models trained on English clinical notes to non-English contexts is yet to be confirmed. To address these gaps in literature, this study focused on identifying the most effective BERT model for non-English clinical notes. Objective In this study, we evaluated the contextual understanding abilities of various BERT models applied to mixed Korean and English clinical notes. The objective of this study was to identify the BERT model that excels in understanding the context of such documents. Methods Using data from 164,460 patients in a South Korean tertiary hospital, we pretrained BERT-base, BERT for Biomedical Text Mining (BioBERT), Korean BERT (KoBERT), and Multilingual BERT (M-BERT) to improve their contextual comprehension capabilities and subsequently compared their performances in 7 fine-tuning tasks. Results The model performance varied based on the task and token usage. First, BERT-base and BioBERT excelled in tasks using classification ([CLS]) token embeddings, such as document classification. BioBERT achieved the highest F1-score of 89.32. Both BERT-base and BioBERT demonstrated their effectiveness in document pattern recognition, even with limited Korean tokens in the dictionary. Second, M-BERT exhibited a superior performance in reading comprehension tasks, achieving an F1-score of 93.77. Better results were obtained when fewer words were replaced with unknown ([UNK]) tokens. 
Third, M-BERT excelled in the knowledge inference task in which correct disease names were inferred from 63 candidate disease names in a document with disease names replaced with [MASK] tokens. M-BERT achieved the highest hit@10 score of 95.41. Conclusions This study highlighted the effectiveness of various BERT models in a multilingual clinical domain. The findings can be used as a reference in clinical and language-based applications.
Affiliation(s)
- Kyungmo Kim
  - Interdisciplinary Program for Bioengineering, Seoul National University, Seoul, Republic of Korea
- Seongkeun Park
  - Seoul National University Medical Research Center, Seoul, Republic of Korea
- Jeongwon Min
  - Interdisciplinary Program for Bioengineering, Seoul National University, Seoul, Republic of Korea
- Sumin Park
  - Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University, Seoul, Republic of Korea
- Ju Yeon Kim
  - Division of Rheumatology, Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Jinsu Eun
  - Human Computer Interaction and Design Lab, Seoul National University, Seoul, Republic of Korea
- Kyuha Jung
  - Human Computer Interaction and Design Lab, Seoul National University, Seoul, Republic of Korea
- Yoobin Elyson Park
  - Human Computer Interaction and Design Lab, Seoul National University, Seoul, Republic of Korea
- Esther Kim
  - Human Computer Interaction and Design Lab, Seoul National University, Seoul, Republic of Korea
- Eun Young Lee
  - Division of Rheumatology, Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Joonhwan Lee
  - Human Computer Interaction and Design Lab, Seoul National University, Seoul, Republic of Korea
- Jinwook Choi
  - Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University, Seoul, Republic of Korea
  - Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea, 82 2-766-3421
7
Yada S, Nakamura Y, Wakamiya S, Aramaki E. Cross-lingual Natural Language Processing on Limited Annotated Case/Radiology Reports in English and Japanese: Insights from the Real-MedNLP Workshop. Methods Inf Med 2024. [PMID: 39209296 DOI: 10.1055/a-2405-2489] [Indexed: 09/04/2024]
Abstract
BACKGROUND Textual datasets (corpora) are crucial for the application of natural language processing (NLP) models. However, corpus creation in the medical field is challenging, primarily because of privacy issues with raw clinical data such as health records. Thus, the existing clinical corpora are generally small and scarce. Medical NLP (MedNLP) methodologies perform well with limited data availability. OBJECTIVES We present the outcomes of the Real-MedNLP workshop, which was conducted using limited and parallel medical corpora. Real-MedNLP exhibits three distinct characteristics: (1) limited annotated documents: the training data comprise only a small set (∼100) of case reports (CRs) and radiology reports (RRs) that have been annotated. (2) Bilingually parallel: the constructed corpora are parallel in Japanese and English. (3) Practical tasks: the workshop addresses fundamental tasks, such as named entity recognition (NER) and applied practical tasks. METHODS We propose three tasks: NER of ∼100 available documents (Task 1), NER based only on annotation guidelines for humans (Task 2), and clinical applications (Task 3) consisting of adverse drug effect (ADE) detection for CRs and identical case identification (CI) for RRs. RESULTS Nine teams participated in this study. The best systems achieved 0.65 and 0.89 F1-scores for CRs and RRs in Task 1, whereas the top scores in Task 2 decreased by 50 to 70%. In Task 3, ADE reports were detected by up to 0.64 F1-score, and CI scored up to 0.96 binary accuracy. CONCLUSION Most systems adopt medical-domain-specific pretrained language models using data augmentation methods. Despite the challenge of limited corpus size in Tasks 1 and 2, recent approaches are promising because the partial match scores reached ∼0.8-0.9 F1-scores. Task 3 applications revealed that the different availabilities of external language resources affected the performance per language.
Affiliation(s)
- Shuntaro Yada
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Yuta Nakamura
  - 22nd Century Medical and Research Center, The University of Tokyo Hospital, Tokyo, Japan
- Shoko Wakamiya
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Eiji Aramaki
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
8
Turki H, Dossou BFP, Emezue CC, Owodunni AT, Hadj Taieb MA, Ben Aouicha M, Ben Hassen H, Masmoudi A. MeSH2Matrix: combining MeSH keywords and machine learning for biomedical relation classification based on PubMed. J Biomed Semantics 2024; 15:18. [PMID: 39354632 PMCID: PMC11445994 DOI: 10.1186/s13326-024-00319-w] [Received: 11/14/2023] [Accepted: 08/31/2024] [Indexed: 10/03/2024]
Abstract
Biomedical relation classification has been significantly improved by the application of advanced machine learning techniques on the raw texts of scholarly publications. Despite this improvement, the reliance on large chunks of raw text makes these algorithms suffer in terms of generalization, precision, and reliability. The use of the distinctive characteristics of bibliographic metadata can prove effective in achieving better performance for this challenging task. In this research paper, we introduce an approach for biomedical relation classification using the qualifiers of co-occurring Medical Subject Headings (MeSH). First of all, we introduce MeSH2Matrix, our dataset consisting of 46,469 biomedical relations curated from PubMed publications using our approach. Our dataset includes a matrix that maps associations between the qualifiers of subject MeSH keywords and those of object MeSH keywords. It also specifies the corresponding Wikidata relation type and the superclass of semantic relations for each relation. Using MeSH2Matrix, we build and train three machine learning models (Support Vector Machine [SVM], a dense model [D-Model], and a convolutional neural network [C-Net]) to evaluate the efficiency of our approach for biomedical relation classification. Our best model achieves an accuracy of 70.78% for 195 classes and 83.09% for five superclasses. Finally, we provide confusion matrix and extensive feature analyses to better examine the relationship between the MeSH qualifiers and the biomedical relations being classified. Our results will hopefully shed light on developing better algorithms for biomedical ontology classification based on the MeSH keywords of PubMed publications. For reproducibility purposes, MeSH2Matrix and all our source code are publicly accessible at https://github.com/SisonkeBiotik-Africa/MeSH2Matrix.
Affiliation(s)
- Houcemeddine Turki
  - Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
- Chris Chinenye Emezue
  - Mila Quebec AI Institute, Montreal, Canada
  - Technical University of Munich, Munich, Germany
- Mohamed Ali Hadj Taieb
  - Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
- Mohamed Ben Aouicha
  - Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
- Hanen Ben Hassen
  - Laboratory of Probability and Statistics, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
- Afif Masmoudi
  - Laboratory of Probability and Statistics, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
9
Košprdić M, Prodanović N, Ljajić A, Bašaragin B, Milošević N. From zero to hero: Harnessing transformers for biomedical named entity recognition in zero- and few-shot contexts. Artif Intell Med 2024; 156:102970. [PMID: 39197375 DOI: 10.1016/j.artmed.2024.102970] [Received: 05/27/2023] [Revised: 08/23/2024] [Accepted: 08/23/2024] [Indexed: 09/01/2024]
Abstract
Supervised named entity recognition (NER) in the biomedical domain depends on large sets of annotated texts with the given named entities. The creation of such datasets can be time-consuming and expensive, while extraction of new entities requires additional annotation tasks and retraining the model. This paper proposes a method for zero- and few-shot NER in the biomedical domain to address these challenges. The method is based on transforming the task of multi-class token classification into binary token classification and pre-training on a large number of datasets and biomedical entities, which allows the model to learn semantic relations between the given and potentially novel named entity labels. We have achieved average F1 scores of 35.44% for zero-shot NER, 50.10% for one-shot NER, 69.94% for 10-shot NER, and 79.51% for 100-shot NER on 9 diverse evaluated biomedical entities with a fine-tuned PubMedBERT-based model. The results demonstrate the effectiveness of the proposed method for recognizing new biomedical entities with no or only a limited number of examples, outperforming previous transformer-based methods, and being comparable to GPT3-based models while using models with over 1000 times fewer parameters. We make models and developed code publicly available.
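The reformulation described in this abstract, turning multi-class token classification into binary token classification, can be pictured with a small data-preparation sketch in Python. This is an illustration under stated assumptions rather than the authors' released code: the function name, the `(entity_class, tokens, binary_labels)` instance layout, and the toy sentence are all hypothetical, and the abstract does not specify how the entity label is actually presented to the model.

```python
from typing import List, Tuple

# Assumed instance layout: (queried entity class, tokens, 0/1 label per token).
BinaryInstance = Tuple[str, List[str], List[int]]


def to_binary_instances(tokens: List[str], labels: List[str],
                        entity_classes: List[str]) -> List[BinaryInstance]:
    """Expand one multi-class example into one binary example per class."""
    instances: List[BinaryInstance] = []
    for entity_class in entity_classes:
        # A token is positive only if it carries the queried class.
        binary = [1 if label == entity_class else 0 for label in labels]
        instances.append((entity_class, tokens, binary))
    return instances


# Toy sentence with hypothetical multi-class token labels ("O" = no entity).
tokens = ["Aspirin", "reduces", "fever", "in", "adults"]
labels = ["Drug", "O", "Disease", "O", "O"]

instances = to_binary_instances(tokens, labels, ["Drug", "Disease", "Gene"])
for entity_class, _, binary in instances:
    print(entity_class, binary)
```

Because the queried class becomes part of the input rather than a fixed output dimension, a class absent from training (here the hypothetical "Gene") can still be queried, which is the property that zero-shot use relies on.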
Affiliation(s)
- Miloš Košprdić
  - Institute for Artificial Intelligence Research and Development of Serbia, Fruškogorska 1, Novi Sad, 21000, Serbia
- Nikola Prodanović
  - Institute for Artificial Intelligence Research and Development of Serbia, Fruškogorska 1, Novi Sad, 21000, Serbia
- Adela Ljajić
  - Institute for Artificial Intelligence Research and Development of Serbia, Fruškogorska 1, Novi Sad, 21000, Serbia
- Bojana Bašaragin
  - Institute for Artificial Intelligence Research and Development of Serbia, Fruškogorska 1, Novi Sad, 21000, Serbia
- Nikola Milošević
  - Institute for Artificial Intelligence Research and Development of Serbia, Fruškogorska 1, Novi Sad, 21000, Serbia
  - Bayer A.G., Research and Development, Mullerstrasse 173, Berlin, 13342, Germany
10
Yang S, Yang X, Lyu T, Huang JL, Chen A, He X, Braithwaite D, Mehta HJ, Wu Y, Guo Y, Bian J. Extracting Pulmonary Nodules and Nodule Characteristics from Radiology Reports of Lung Cancer Screening Patients Using Transformer Models. J Healthc Inform Res 2024; 8:463-477. [PMID: 39131104 PMCID: PMC11310180 DOI: 10.1007/s41666-024-00166-5] [Received: 10/14/2022] [Revised: 04/12/2024] [Accepted: 05/12/2024] [Indexed: 08/13/2024]
Abstract
Pulmonary nodules and nodule characteristics are important indicators of lung nodule malignancy. However, nodule information is often documented as free text in clinical narratives such as radiology reports in electronic health record systems. Natural language processing (NLP) is the key technology to extract and standardize patient information from radiology reports into structured data elements. This study aimed to develop an NLP system using state-of-the-art transformer models to extract pulmonary nodules and associated nodule characteristics from radiology reports. We identified a cohort of 3080 patients who underwent LDCT at the University of Florida health system and collected their radiology reports. We manually annotated 394 reports as the gold standard. We explored eight pretrained transformer models from three transformer architectures including bidirectional encoder representations from transformers (BERT), robustly optimized BERT approach (RoBERTa), and A Lite BERT (ALBERT), for clinical concept extraction, relation identification, and negation detection. We examined general transformer models pretrained using general English corpora, transformer models fine-tuned using a clinical corpus, and a large clinical transformer model, GatorTron, which was trained from scratch using 90 billion words of clinical text. We compared transformer models with two baseline models including a recurrent neural network implemented using bidirectional long short-term memory with a conditional random fields layer and support vector machines. RoBERTa-mimic achieved the best F1-score of 0.9279 for nodule concept and nodule characteristics extraction. ALBERT-base and GatorTron achieved the best F1-score of 0.9737 in linking nodule characteristics to pulmonary nodules. Seven out of eight transformers achieved the best F1-score of 1.0000 for negation detection. Our end-to-end system achieved an overall F1-score of 0.8869. 
This study demonstrated the advantage of state-of-the-art transformer models for pulmonary nodule information extraction from radiology reports. Supplementary Information The online version contains supplementary material available at 10.1007/s41666-024-00166-5.
Affiliation(s)
- Shuang Yang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
| | - Xi Yang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
| | - Tianchen Lyu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
| | - James L. Huang
- Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL USA
| | - Aokun Chen
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
| | - Xing He
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
| | - Dejana Braithwaite
- Departments of Surgery and Epidemiology, University of Florida, Gainesville, FL USA
| | - Hiren J. Mehta
- Division of Pulmonary, Critical Care, and Sleep Medicine, College of Medicine, University of Florida, Gainesville, FL USA
| | - Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
| | - Yi Guo
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
| |
|
11
|
Peng C, Yang X, Chen A, Yu Z, Smith KE, Costa AB, Flores MG, Bian J, Wu Y. Generative large language models are all-purpose text analytics engines: text-to-text learning is all your need. J Am Med Inform Assoc 2024; 31:1892-1903. [PMID: 38630580 PMCID: PMC11339507 DOI: 10.1093/jamia/ocae078] [Received: 12/12/2023] [Revised: 02/26/2024] [Accepted: 04/02/2024] [Indexed: 04/19/2024] Open
Abstract
OBJECTIVE To solve major clinical natural language processing (NLP) tasks using a unified text-to-text learning architecture based on a generative large language model (LLM) via prompt tuning. METHODS We formulated 7 key clinical NLP tasks as text-to-text learning and solved them using one unified generative clinical LLM, GatorTronGPT, developed using GPT-3 architecture and trained with up to 20 billion parameters. We adopted soft prompts (ie, trainable vectors) with frozen LLM, where the LLM parameters were not updated (ie, frozen) and only the vectors of soft prompts were updated, known as prompt tuning. We added additional soft prompts as a prefix to the input layer, which were optimized during the prompt tuning. We evaluated the proposed method using 7 clinical NLP tasks and compared them with previous task-specific solutions based on Transformer models. RESULTS AND CONCLUSION The proposed approach achieved state-of-the-art performance for 5 out of 7 major clinical NLP tasks using one unified generative LLM. Our approach outperformed previous task-specific transformer models by ∼3% for concept extraction and 7% for relation extraction applied to social determinants of health, 3.4% for clinical concept normalization, 3.4%-10% for clinical abbreviation disambiguation, and 5.5%-9% for natural language inference. Our approach also outperformed a previously developed prompt-based machine reading comprehension (MRC) model, GatorTron-MRC, for clinical concept and relation extraction. The proposed approach can deliver the "one model for all" promise from training to deployment using a unified generative LLM.
Affiliation(s)
- Cheng Peng
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
| | - Xi Yang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL 32610, United States
| | - Aokun Chen
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL 32610, United States
| | - Zehao Yu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
| | | | | | | | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL 32610, United States
| | - Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL 32610, United States
| |
|
12
|
Islamaj R, Lai PT, Wei CH, Luo L, Almeida T, Jonker RAA, Conceição SIR, Sousa DF, Phan CP, Chiang JH, Li J, Pan D, Meesawad W, Tsai RTH, Sarol MJ, Hong G, Valiev A, Tutubalina E, Lee SM, Hsu YY, Li M, Verspoor K, Lu Z. The overview of the BioRED (Biomedical Relation Extraction Dataset) track at BioCreative VIII. Database (Oxford) 2024; 2024:baae069. [PMID: 39114977 PMCID: PMC11306928 DOI: 10.1093/database/baae069] [Received: 03/19/2024] [Revised: 05/27/2024] [Accepted: 07/09/2024] [Indexed: 08/11/2024]
Abstract
The BioRED track at BioCreative VIII calls for a community effort to identify, semantically categorize, and highlight the novelty factor of the relationships between biomedical entities in unstructured text. Relation extraction is crucial for many biomedical natural language processing (NLP) applications, from drug discovery to custom medical solutions. The BioRED track simulates a real-world application of biomedical relationship extraction, and as such, considers multiple biomedical entity types, normalized to their specific corresponding database identifiers, as well as defines relationships between them in the documents. The challenge consisted of two subtasks: (i) in Subtask 1, participants were given the article text and human expert annotated entities, and were asked to extract the relation pairs, identify their semantic type and the novelty factor, and (ii) in Subtask 2, participants were given only the article text, and were asked to build an end-to-end system that could identify and categorize the relationships and their novelty. We received a total of 94 submissions from 14 teams worldwide. The highest F-score performances achieved for Subtask 1 were: 77.17% for relation pair identification, 58.95% for relation type identification, 59.22% for novelty identification, and 44.55% when evaluating all of the above aspects of the comprehensive relation extraction. The highest F-score performances achieved for Subtask 2 were: 55.84% for relation pair, 43.03% for relation type, 42.74% for novelty, and 32.75% for comprehensive relation extraction. The entire BioRED track dataset and other challenge materials are available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC8-BioRED-track/ and https://codalab.lisn.upsaclay.fr/competitions/13377 and https://codalab.lisn.upsaclay.fr/competitions/13378. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/BC8-BioRED-track/, https://codalab.lisn.upsaclay.fr/competitions/13377, https://codalab.lisn.upsaclay.fr/competitions/13378.
Affiliation(s)
- Rezarta Islamaj
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, United States
| | - Po-Ting Lai
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, United States
| | - Chih-Hsuan Wei
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, United States
| | - Ling Luo
- School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian 116024, China
| | - Tiago Almeida
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
| | - Richard A. A Jonker
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
| | - Sofia I. R Conceição
- Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa, Edifício C6 Campo Grande, Lisbon 1749-016, Portugal
| | - Diana F Sousa
- Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa, Edifício C6 Campo Grande, Lisbon 1749-016, Portugal
| | - Cong-Phuoc Phan
- Department of Computer Science and Information Engineering, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan, Republic of China
| | - Jung-Hsien Chiang
- Department of Computer Science and Information Engineering, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan, Republic of China
| | - Jiru Li
- School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian 116024, China
| | - Dinghao Pan
- School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian 116024, China
| | - Wilailack Meesawad
- Department of Computer Science and Information Engineering, National Central University, No. 300, Zhongda Rd., Zhongli District, Taoyuan City 32001, Taiwan, Republic of China
| | - Richard Tzong-Han Tsai
- Department of Computer Science and Information Engineering, National Central University, No. 300, Zhongda Rd., Zhongli District, Taoyuan City 32001, Taiwan, Republic of China
- Research Center for Humanities and Social Sciences, Academia Sinica, No. 128, Section 2, Academia Rd., Nangang District, Taoyuan City 115201, Taiwan, Republic of China
| | - M. Janina Sarol
- School of Information Sciences, University of Illinois at Urbana-Champaign, 614 E. Daniel St, Champaign, IL 61820, United States
| | - Gibong Hong
- School of Information Sciences, University of Illinois at Urbana-Champaign, 614 E. Daniel St, Champaign, IL 61820, United States
| | - Airat Valiev
- Higher School of Economics University, 20 Myasnitskaya St, Moscow 101000, Russia
| | - Elena Tutubalina
- Artificial Intelligence Research Institute (AIRI), 32 Kutuzovskiy St, Moscow 121170, Russia
- Kazan Federal University, 18 Kremlevskaya St, Kazan 420008, Russia
| | - Shao-Man Lee
- Miin Wu School of Computing, National Cheng Kung University, No. 1, University Road, Tainan 701, Taiwan, Republic of China
| | - Yi-Yu Hsu
- Miin Wu School of Computing, National Cheng Kung University, No. 1, University Road, Tainan 701, Taiwan, Republic of China
| | - Mingjie Li
- School of Computing Technologies, RMIT University, 124 La Trobe Street, Melbourne, Victoria 3000, Australia
| | - Karin Verspoor
- School of Computing Technologies, RMIT University, 124 La Trobe Street, Melbourne, Victoria 3000, Australia
| | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, United States
| |
|
13
|
Luo X, Deng Z, Yang B, Luo MY. Pre-trained language models in medicine: A survey. Artif Intell Med 2024; 154:102904. [PMID: 38917600 DOI: 10.1016/j.artmed.2024.102904] [Received: 12/15/2023] [Revised: 04/15/2024] [Accepted: 06/03/2024] [Indexed: 06/27/2024]
Abstract
With the rapid progress in Natural Language Processing (NLP), Pre-trained Language Models (PLMs) such as BERT, BioBERT, and ChatGPT have shown great potential in various medical NLP tasks. This paper surveys the cutting-edge achievements in applying PLMs to various medical NLP tasks. Specifically, we first briefly introduce PLMs and outline the research on PLMs in medicine. Next, we categorise and discuss the types of tasks in medical NLP, covering text summarisation, question-answering, machine translation, sentiment analysis, named entity recognition, information extraction, medical education, relation extraction, and text mining. For each type of task, we first provide an overview of the basic concepts, the main methodologies, the advantages of applying PLMs, the basic steps of applying PLMs, the datasets for training and testing, and the metrics for task evaluation. Subsequently, a summary of recent important research findings is presented, analysing their motivations, strengths vs weaknesses, similarities vs differences, and discussing potential limitations. Also, we assess the quality and influence of the research reviewed in this paper by comparing the citation count of the papers reviewed and the reputation and impact of the conferences and journals where they are published. Through these indicators, we further identify the research topics currently attracting the most attention. Finally, we look forward to future research directions, including enhancing models' reliability, explainability, and fairness, to promote the application of PLMs in clinical practice. In addition, this survey also collects download links for some model code and the relevant datasets, which are valuable references for researchers applying NLP techniques in medicine and medical professionals seeking to enhance their expertise and healthcare service through AI technology.
Affiliation(s)
- Xudong Luo
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
| | - Zhiqi Deng
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
| | - Binxia Yang
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
| | - Michael Y Luo
- Emmanuel College, Cambridge University, Cambridge, CB2 3AP, UK.
| |
|
14
|
Nerella S, Bandyopadhyay S, Zhang J, Contreras M, Siegel S, Bumin A, Silva B, Sena J, Shickel B, Bihorac A, Khezeli K, Rashidi P. Transformers and large language models in healthcare: A review. Artif Intell Med 2024; 154:102900. [PMID: 38878555 PMCID: PMC11638972 DOI: 10.1016/j.artmed.2024.102900] [Received: 09/01/2023] [Revised: 05/28/2024] [Accepted: 05/30/2024] [Indexed: 08/09/2024]
Abstract
With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks and has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. Furthermore, we also include articles that used the transformer architecture for generating surgical instructions and predicting adverse outcomes after surgeries under the umbrella of critical care. Under diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we also discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
Affiliation(s)
- Subhash Nerella
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | | | - Jiaqing Zhang
- Department of Electrical and Computer Engineering, University of Florida, Gainesville, United States
| | - Miguel Contreras
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | - Scott Siegel
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | - Aysegul Bumin
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
| | - Brandon Silva
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
| | - Jessica Sena
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Benjamin Shickel
- Department of Medicine, University of Florida, Gainesville, United States
| | - Azra Bihorac
- Department of Medicine, University of Florida, Gainesville, United States
| | - Kia Khezeli
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | - Parisa Rashidi
- Department of Biomedical Engineering, University of Florida, Gainesville, United States.
| |
|
15
|
Desai MK. Artificial intelligence in pharmacovigilance - Opportunities and challenges. Perspect Clin Res 2024; 15:116-121. [PMID: 39140015 PMCID: PMC11318788 DOI: 10.4103/picr.picr_290_23] [Received: 10/16/2023] [Revised: 12/04/2023] [Accepted: 12/09/2023] [Indexed: 08/15/2024] Open
Abstract
Pharmacovigilance (PV) is a data-driven process to identify medicine safety issues as early as possible by processing suspected adverse event (AE) reports and extracting health data. The PV case processing cycle starts with data collection and data entry, followed by initial checking of completeness and validity, coding, medical assessment for causality, expectedness, severity, and seriousness, report submission, quality checking, and finally data storage and maintenance. This requires a workforce and technical expertise and is therefore expensive and time-consuming. There has been exponential growth in the number of suspected AE reports in the PV database due to smart collection and reporting of individual case safety reports, widening the base by increased awareness and participation by health-care professionals and patients. Processing the enormous volume and variety of data, making sensible use of it, and separating "needles from the haystack" is a challenge for key stakeholders such as pharmaceutical firms, regulatory authorities, medical and PV experts, and National Pharmacovigilance Program managers. Artificial intelligence (AI) in health care has been very impressive in specialties that rely heavily on the interpretation of medical images. Similarly, there has been a growing interest in adopting AI tools to complement and automate the PV process. The advanced technology can certainly complement the routine, repetitive, manual task of case processing and boost efficiency; however, its implementation across the PV lifecycle and its practical impact raise several questions and challenges. Full automation of the PV system is a double-edged sword and needs to consider two aspects - people and processes. The focus should be a collaborative approach of technical expertise (people) combined with intelligent technology (processes) to augment human talent that meets the objective of the PV system and benefits all stakeholders.
AI technology should enhance human intelligence rather than replace human experts. What is important is to emphasize and ensure that AI brings more benefits to PV rather than challenges. This review describes the benefits and the outstanding scientific, technological, and policy issues, and the maturity of AI tools for full automation in the context of the Indian health-care system.
Affiliation(s)
- Mira Kirankumar Desai
- Department of Pharmacology, Dr. M. K. Shah Medical College and Research Centre, Ahmedabad, Gujarat, India
| |
|
16
|
Hsu E, Roberts K. Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing. Research Square 2024:rs.3.rs-4559971. [PMID: 38978609 PMCID: PMC11230489 DOI: 10.21203/rs.3.rs-4559971/v1] [Indexed: 07/10/2024]
Abstract
The performance of deep learning-based natural language processing systems is based on large amounts of labeled training data which, in the clinical domain, are not easily available or affordable. Weak supervision and in-context learning offer partial solutions to this issue, particularly using large language models (LLMs), but their performance still trails traditional supervised methods with moderate amounts of gold-standard data. In particular, inferencing with LLMs is computationally heavy. We propose an approach leveraging fine-tuning LLMs and weak supervision with virtually no domain knowledge that still achieves consistently dominant performance. Using a prompt-based approach, the LLM is used to generate weakly-labeled data for training a downstream BERT model. The weakly supervised model is then further fine-tuned on small amounts of gold standard data. We evaluate this approach using Llama2 on three different n2c2 datasets. With no more than 10 gold standard notes, our final BERT models weakly supervised by fine-tuned Llama2-13B consistently outperformed out-of-the-box PubMedBERT by 4.7-47.9% in F1 scores. With only 50 gold standard notes, our models achieved close performance to fully fine-tuned systems.
Affiliation(s)
- Enshuo Hsu
- University of Texas Health Science Center at Houston
| | - Kirk Roberts
- University of Texas Health Science Center at Houston
| |
|
17
|
Silverman AL, Sushil M, Bhasuran B, Ludwig D, Buchanan J, Racz R, Parakala M, El-Kamary S, Ahima O, Belov A, Choi L, Billings M, Li Y, Habal N, Liu Q, Tiwari J, Butte AJ, Rudrapatna VA. Algorithmic Identification of Treatment-Emergent Adverse Events From Clinical Notes Using Large Language Models: A Pilot Study in Inflammatory Bowel Disease. Clin Pharmacol Ther 2024; 115:1391-1399. [PMID: 38459719 PMCID: PMC11090709 DOI: 10.1002/cpt.3226] [Received: 09/12/2023] [Accepted: 02/13/2024] [Indexed: 03/10/2024]
Abstract
Outpatient clinical notes are a rich source of information regarding drug safety. However, data in these notes are currently underutilized for pharmacovigilance due to methodological limitations in text mining. Large language models (LLMs) like Bidirectional Encoder Representations from Transformers (BERT) have shown progress in a range of natural language processing tasks but have not yet been evaluated on adverse event (AE) detection. We adapted a new clinical LLM, University of California - San Francisco (UCSF)-BERT, to identify serious AEs (SAEs) occurring after treatment with a non-steroid immunosuppressant for inflammatory bowel disease (IBD). We compared this model to other language models that have previously been applied to AE detection. We annotated 928 outpatient IBD notes corresponding to 928 individual patients with IBD for all SAE-associated hospitalizations occurring after treatment with a non-steroid immunosuppressant. These notes contained 703 SAEs in total, the most common of which was failure of intended efficacy. Out of eight candidate models, UCSF-BERT achieved the highest numerical performance on identifying drug-SAE pairs from this corpus (accuracy 88-92%, macro F1 61-68%), with 5-10% greater accuracy than previously published models. UCSF-BERT was significantly superior at identifying hospitalization events emergent to medication use (P < 0.01). LLMs like UCSF-BERT achieve numerically superior accuracy on the challenging task of SAE detection from clinical notes compared with prior methods. Future work is needed to adapt this methodology to improve model performance and evaluation using multicenter data and newer architectures like Generative pre-trained transformer (GPT). Our findings support the potential value of using large language models to enhance pharmacovigilance.
Affiliation(s)
- Anna L. Silverman
- Division of Gastroenterology and Hepatology, Department of Medicine, Mayo Clinic, Phoenix, Arizona, USA
- Department of Medicine, University of California, San Diego, La Jolla, California, USA
| | - Madhumita Sushil
- Bakar Computational Health Sciences Institute, San Francisco, California, USA
| | - Balu Bhasuran
- Bakar Computational Health Sciences Institute, San Francisco, California, USA
| | - Dana Ludwig
- Bakar Computational Health Sciences Institute, San Francisco, California, USA
| | - James Buchanan
- Bakar Computational Health Sciences Institute, San Francisco, California, USA
| | - Rebecca Racz
- United States Food and Drug Administration, Silver Spring, Maryland, USA
| | - Mahalakshmi Parakala
- Department of Public Health, University of California, Berkeley, Berkeley, California, USA
| | - Samer El-Kamary
- United States Food and Drug Administration, Silver Spring, Maryland, USA
- Present address: University of Maryland School of Medicine, Baltimore, Maryland, USA
- Present address: Takeda Pharmaceuticals Inc, Boston, Massachusetts, USA
| | - Ohenewaa Ahima
- United States Food and Drug Administration, Silver Spring, Maryland, USA
| | - Artur Belov
- United States Food and Drug Administration, Silver Spring, Maryland, USA
| | - Lauren Choi
- United States Food and Drug Administration, Silver Spring, Maryland, USA
| | - Monisha Billings
- United States Food and Drug Administration, Silver Spring, Maryland, USA
| | - Yan Li
- United States Food and Drug Administration, Silver Spring, Maryland, USA
| | - Nadia Habal
- United States Food and Drug Administration, Silver Spring, Maryland, USA
| | - Qi Liu
- United States Food and Drug Administration, Silver Spring, Maryland, USA
| | - Jawahar Tiwari
- United States Food and Drug Administration, Silver Spring, Maryland, USA
| | - Atul J. Butte
- Bakar Computational Health Sciences Institute, San Francisco, California, USA
- Center for Data-Driven Insights and Innovation, University of California Health, Oakland, California, USA
| | - Vivek A. Rudrapatna
- Bakar Computational Health Sciences Institute, San Francisco, California, USA
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California, San Francisco, San Francisco, California, USA
| |
|
18
|
Molinet B, Marro S, Cabrio E, Villata S. Explanatory argumentation in natural language for correct and incorrect medical diagnoses. J Biomed Semantics 2024; 15:8. [PMID: 38816758 PMCID: PMC11138001 DOI: 10.1186/s13326-024-00306-1] [Received: 08/04/2023] [Accepted: 04/12/2024] [Indexed: 06/01/2024] Open
Abstract
BACKGROUND A huge amount of research is carried out nowadays in Artificial Intelligence to propose automated ways to analyse medical data with the aim of supporting doctors in delivering medical diagnoses. However, a main issue of these approaches is the lack of transparency and interpretability of the achieved results, making it hard to employ such methods for educational purposes. It is therefore necessary to develop new frameworks to enhance explainability in these solutions. RESULTS In this paper, we present a novel full pipeline to automatically generate natural language explanations for medical diagnoses. The proposed solution starts from a clinical case description associated with a list of correct and incorrect diagnoses and, through the extraction of the relevant symptoms and findings, enriches the information contained in the description with verified medical knowledge from an ontology. Finally, the system returns a pattern-based explanation in natural language which elucidates why the correct (incorrect) diagnosis is the correct (incorrect) one. The main contribution of the paper is twofold: first, we propose two novel linguistic resources for the medical domain (i.e., a dataset of 314 clinical cases annotated with the medical entities from UMLS, and a database of biological boundaries for common findings), and second, a full Information Extraction pipeline to extract symptoms and findings from the clinical cases and match them with the terms in a medical ontology and with the biological boundaries. An extensive evaluation of the proposed approach shows that our method outperforms comparable approaches. CONCLUSIONS Our goal is to offer an AI-assisted educational support framework to train clinical residents to formulate sound and exhaustive explanations for their diagnoses to patients.
Affiliation(s)
- Benjamin Molinet
- Université Côte d'Azur, CNRS, Inria, I3S, Rte des Lucioles, Sophia Antipolis, 06900, Alpes-Maritimes, France.
| | - Santiago Marro
- Université Côte d'Azur, CNRS, Inria, I3S, Rte des Lucioles, Sophia Antipolis, 06900, Alpes-Maritimes, France
| | - Elena Cabrio
- Université Côte d'Azur, CNRS, Inria, I3S, Rte des Lucioles, Sophia Antipolis, 06900, Alpes-Maritimes, France
| | - Serena Villata
- Université Côte d'Azur, CNRS, Inria, I3S, Rte des Lucioles, Sophia Antipolis, 06900, Alpes-Maritimes, France
| |
|
19
|
Peng L, Luo G, Zhou S, Chen J, Xu Z, Sun J, Zhang R. An in-depth evaluation of federated learning on biomedical natural language processing for information extraction. NPJ Digit Med 2024; 7:127. [PMID: 38750290 PMCID: PMC11096157 DOI: 10.1038/s41746-024-01126-4] [Received: 11/01/2023] [Accepted: 04/23/2024] [Indexed: 05/18/2024] Open
Abstract
Language models (LMs) such as BERT and GPT have revolutionized natural language processing (NLP). However, the medical field faces challenges in training LMs due to limited data access and privacy constraints imposed by regulations like the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR). Federated learning (FL) offers a decentralized solution that enables collaborative learning while ensuring data privacy. In this study, we evaluated FL on 2 biomedical NLP tasks encompassing 8 corpora using 6 LMs. Our results show that: (1) FL models consistently outperformed models trained on individual clients' data and sometimes performed comparably with models trained with pooled data; (2) with a fixed total amount of data, FL models trained with more clients produced inferior performance, but pre-trained transformer-based models exhibited great resilience; and (3) FL models significantly outperformed pre-trained LLMs with few-shot prompting.
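The abstract does not say which aggregation rule the study used. As a rough illustration of the general FL idea (clients train locally and share only model parameters), the sketch below shows size-weighted federated averaging, often called FedAvg. The function name, the flat parameter vectors, and the client sizes are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size.

    Illustrative only: real FL systems aggregate full model state, not a
    flat list of floats, and the study's own aggregation rule is not stated.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    averaged = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its share of the data.
            averaged[i] += w * size / total
    return averaged


# Two hypothetical clients holding 100 and 300 local examples.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300])
```

Raw notes never leave a client in this scheme; only the parameter vectors are exchanged, which is what makes the approach attractive under privacy regulations.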
Affiliation(s)
- Le Peng
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
| | - Gaoxiang Luo
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Sicheng Zhou
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
| | - Jiandong Chen
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
| | - Ziyue Xu
- Nvidia Corporation, Santa Clara, CA, USA
| | - Ju Sun
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA.
| | - Rui Zhang
- Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA.
|
20
|
Gallifant J, Celi LA, Sharon E, Bitterman DS. Navigating the Complexities of Artificial Intelligence-Enabled Real-World Data Collection for Oncology Pharmacovigilance. JCO Clin Cancer Inform 2024; 8:e2400051. [PMID: 38713889 PMCID: PMC11466373 DOI: 10.1200/cci.24.00051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 04/03/2024] [Indexed: 05/09/2024] Open
Abstract
This new editorial discusses the promise and challenges of successful integration of natural language processing methods into electronic health records for timely, robust, and fair oncology pharmacovigilance.
Affiliation(s)
- Jack Gallifant
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Critical Care, Guy’s & St Thomas’ NHS Trust, London, United Kingdom, SE1 7EH
| | - Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
| | - Elad Sharon
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Danielle S. Bitterman
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women’s Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
|
21
|
Peng C, Yang X, Smith KE, Yu Z, Chen A, Bian J, Wu Y. Model tuning or prompt Tuning? a study of large language models for clinical concept and relation extraction. J Biomed Inform 2024; 153:104630. [PMID: 38548007 PMCID: PMC11065560 DOI: 10.1016/j.jbi.2024.104630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 02/24/2024] [Accepted: 03/25/2024] [Indexed: 04/04/2024]
Abstract
OBJECTIVE To develop a soft prompt-based learning architecture for large language models (LLMs), examine prompt-tuning using frozen/unfrozen LLMs, and assess their abilities in transfer learning and few-shot learning. METHODS We developed a soft prompt-based learning architecture and compared 4 strategies including (1) fine-tuning without prompts; (2) hard-prompting with unfrozen LLMs; (3) soft-prompting with unfrozen LLMs; and (4) soft-prompting with frozen LLMs. We evaluated GatorTron, a clinical LLM with up to 8.9 billion parameters, and compared GatorTron with 4 existing transformer models for clinical concept and relation extraction on 2 benchmark datasets for adverse drug events and social determinants of health (SDoH). We evaluated the few-shot learning ability and generalizability for cross-institution applications. RESULTS AND CONCLUSION When LLMs are unfrozen, GatorTron-3.9B with soft prompting achieves the best strict F1-scores of 0.9118 and 0.8604 for concept extraction, outperforming the traditional fine-tuning and hard prompt-based models by 0.6∼3.1% and 1.2∼2.9%, respectively; GatorTron-345M with soft prompting achieves the best F1-scores of 0.8332 and 0.7488 for end-to-end relation extraction, outperforming the other two models by 0.2∼2% and 0.6∼11.7%, respectively. When LLMs are frozen, small LLMs fall well short of unfrozen models; scaling LLMs up to billions of parameters makes frozen LLMs competitive with unfrozen models. Soft prompting with a frozen GatorTron-8.9B model achieved the best performance for cross-institution evaluation. We demonstrate that (1) machines can learn soft prompts better than hard prompts composed by humans, (2) frozen LLMs have good few-shot learning ability and generalizability for cross-institution applications, (3) frozen LLMs reduce computing cost to 2.5∼6% of previous methods using unfrozen LLMs, and (4) frozen LLMs require large models (e.g., over several billions of parameters) for good performance.
Affiliation(s)
- Cheng Peng
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Xi Yang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
| | | | - Zehao Yu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Aokun Chen
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
| | - Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.
|
22
|
Yu Z, Peng C, Yang X, Dang C, Adekkanattu P, Gopal Patra B, Peng Y, Pathak J, Wilson DL, Chang CY, Lo-Ciganic WH, George TJ, Hogan WR, Guo Y, Bian J, Wu Y. Identifying social determinants of health from clinical narratives: A study of performance, documentation ratio, and potential bias. J Biomed Inform 2024; 153:104642. [PMID: 38621641 PMCID: PMC11141428 DOI: 10.1016/j.jbi.2024.104642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 04/09/2024] [Accepted: 04/12/2024] [Indexed: 04/17/2024]
Abstract
OBJECTIVE To develop a natural language processing (NLP) package to extract social determinants of health (SDoH) from clinical narratives, examine the bias among race and gender groups, test the generalizability of extracting SDoH for different disease groups, and examine the population-level extraction ratio. METHODS We developed SDoH corpora using clinical notes identified at the University of Florida (UF) Health. We systematically compared 7 transformer-based large language models (LLMs) and developed an open-source package - SODA (i.e., SOcial DeterminAnts) to facilitate SDoH extraction from clinical narratives. We examined the performance and potential bias of SODA for different race and gender groups, tested the generalizability of SODA using two disease domains including cancer and opioid use, and explored strategies for improvement. We applied SODA to extract 19 categories of SDoH from the breast (n = 7,971), lung (n = 11,804), and colorectal cancer (n = 6,240) cohorts to assess patient-level extraction ratio and examine the differences among race and gender groups. RESULTS We developed an SDoH corpus using 629 clinical notes of cancer patients with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH, and another cross-disease validation corpus using 200 notes from opioid use patients with 4,342 SDoH concepts/attributes. We compared 7 transformer models and the GatorTron model achieved the best mean average strict/lenient F1 scores of 0.9122 and 0.9367 for SDoH concept extraction and 0.9584 and 0.9593 for linking attributes to SDoH concepts. There is a small performance gap (∼4%) between males and females, but a large performance gap (>16%) among race groups. The performance dropped when we applied the cancer SDoH model to the opioid cohort; fine-tuning using a smaller opioid SDoH corpus improved the performance. The extraction ratio varied in the three cancer cohorts, in which 10 SDoH could be extracted from over 70% of cancer patients, but 9 SDoH could be extracted from less than 70% of cancer patients. Individuals from the White and Black groups have a higher extraction ratio than other minority race groups. CONCLUSIONS Our SODA package achieved good performance in extracting 19 categories of SDoH from clinical narratives. The SODA package with pre-trained transformer models is available at https://github.com/uf-hobi-informatics-lab/SODA_Docker.
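The abstract reports strict and lenient F1 scores without defining the matching rules. Under one common convention, assumed here, a strict match requires identical span boundaries and label, while a lenient match accepts any overlap with the same label. The sketch below shows how the two scores can differ; the span format, the function names, and the example labels are hypothetical and do not reproduce the SODA evaluation code.

```python
from typing import List, Tuple

# (start offset, end offset exclusive, label); a hypothetical span format.
Span = Tuple[int, int, str]


def _overlaps(a: Span, b: Span) -> bool:
    """True when two spans share a label and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f1_score(gold: List[Span], pred: List[Span], lenient: bool = False) -> float:
    """F1 over spans: exact match by default, overlap match if lenient."""
    if lenient:
        matched_pred = sum(any(_overlaps(p, g) for g in gold) for p in pred)
        matched_gold = sum(any(_overlaps(g, p) for p in pred) for g in gold)
    else:
        gold_set, pred_set = set(gold), set(pred)
        matched_pred = sum(p in gold_set for p in pred)
        matched_gold = sum(g in pred_set for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# The second prediction has the right label but a shifted start offset.
gold = [(0, 5, "Smoking"), (10, 20, "Employment")]
pred = [(0, 5, "Smoking"), (12, 20, "Employment")]
strict_f1 = f1_score(gold, pred)
lenient_f1 = f1_score(gold, pred, lenient=True)
```

Here the boundary error costs a strict match but not a lenient one, which is why lenient scores in such evaluations are never lower than strict scores.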
Affiliation(s)
- Zehao Yu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Cheng Peng
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
| | - Xi Yang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
| | - Chong Dang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Prakash Adekkanattu
- Information Technologies and Services, Weill Cornell Medicine, New York, NY, USA
| | - Braja Gopal Patra
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
| | - Yifan Peng
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
| | - Jyotishman Pathak
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
| | - Debbie L Wilson
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
| | - Ching-Yuan Chang
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
| | - Wei-Hsuan Lo-Ciganic
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
| | - Thomas J George
- Division of Hematology & Oncology, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA
| | - William R Hogan
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Yi Guo
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
| | - Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.
|
23
|
Li Y, Tao W, Li Z, Sun Z, Li F, Fenton S, Xu H, Tao C. Artificial intelligence-powered pharmacovigilance: A review of machine and deep learning in clinical text-based adverse drug event detection for benchmark datasets. J Biomed Inform 2024; 152:104621. [PMID: 38447600 DOI: 10.1016/j.jbi.2024.104621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 02/19/2024] [Accepted: 03/03/2024] [Indexed: 03/08/2024]
Abstract
OBJECTIVE The primary objective of this review is to investigate the effectiveness of machine learning and deep learning methodologies in the context of extracting adverse drug events (ADEs) from clinical benchmark datasets. We conduct an in-depth analysis, aiming to compare the merits and drawbacks of both machine learning and deep learning techniques, particularly within the framework of named-entity recognition (NER) and relation classification (RC) tasks related to ADE extraction. Additionally, our focus extends to the examination of specific features and their impact on the overall performance of these methodologies. In a broader perspective, our research extends to ADE extraction from various sources, including biomedical literature, social media data, and drug labels, removing the limitation to exclusively machine learning or deep learning methods. METHODS We conducted an extensive literature review on PubMed using the query "(((machine learning [Medical Subject Headings (MeSH) Terms]) OR (deep learning [MeSH Terms])) AND (adverse drug event [MeSH Terms])) AND (extraction)", and supplemented this with a snowballing approach to review 275 references sourced from retrieved articles. RESULTS In our analysis, we included twelve articles for review. For the NER task, deep learning models outperformed machine learning models. In the RC task, gradient boosting, multilayer perceptron, and random forest models excelled. The Bidirectional Encoder Representations from Transformers (BERT) model consistently achieved the best performance in the end-to-end task. Future efforts in the end-to-end task should prioritize improving NER accuracy, especially for 'ADE' and 'Reason'. CONCLUSION These findings hold significant implications for advancing the field of ADE extraction and pharmacovigilance, ultimately contributing to improved drug safety monitoring and healthcare outcomes.
Affiliation(s)
- Yiming Li
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Wei Tao
- Department of Biostatistics & Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Zehan Li
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Zenan Sun
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Fang Li
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, USA
| | - Susan Fenton
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Hua Xu
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, USA
| | - Cui Tao
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, USA.
|
24
|
Huang MS, Han JC, Lin PY, You YT, Tsai RTH, Hsu WL. Surveying biomedical relation extraction: a critical examination of current datasets and the proposal of a new resource. Brief Bioinform 2024; 25:bbae132. [PMID: 38609331 PMCID: PMC11014787 DOI: 10.1093/bib/bbae132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 11/06/2023] [Accepted: 03/02/2023] [Indexed: 04/14/2024] Open
Abstract
Natural language processing (NLP) has become an essential technique in various fields, offering a wide range of possibilities for analyzing data and developing diverse NLP tasks. In the biomedical domain, understanding the complex relationships between compounds and proteins is critical, especially in the context of signal transduction and biochemical pathways. Among these relationships, protein-protein interactions (PPIs) are of particular interest, given their potential to trigger a variety of biological reactions. To improve the ability to predict PPI events, we propose the protein event detection dataset (PEDD), which comprises 6823 abstracts, 39 488 sentences and 182 937 gene pairs. Our PEDD dataset has been utilized in the AI CUP Biomedical Paper Analysis competition, where systems are challenged to predict 12 different relation types. In this paper, we review the state-of-the-art relation extraction research and provide an overview of the PEDD's compilation process. Furthermore, we present the results of the PPI extraction competition and evaluate several language models' performances on the PEDD. This paper's outcomes will provide a valuable roadmap for future studies on protein event detection in NLP. By addressing this critical challenge, we hope to enable breakthroughs in drug discovery and enhance our understanding of the molecular mechanisms underlying various diseases.
Affiliation(s)
- Ming-Siang Huang
- Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan
- National Institute of Cancer Research, National Health Research Institutes, Tainan, Taiwan
- Department of Computer Science and Information Engineering, College of Information and Electrical Engineering, Asia University, Taichung, Taiwan
| | - Jen-Chieh Han
- Intelligent Information Service Research Laboratory, Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
| | - Pei-Yen Lin
- Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan
| | - Yu-Ting You
- Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan
| | - Richard Tzong-Han Tsai
- Intelligent Information Service Research Laboratory, Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
- Center for Geographic Information Science, Research Center for Humanities and Social Sciences, Academia Sinica, Taipei, Taiwan
| | - Wen-Lian Hsu
- Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan
- Department of Computer Science and Information Engineering, College of Information and Electrical Engineering, Asia University, Taichung, Taiwan
|
25
|
Modi S, Kasmiran KA, Mohd Sharef N, Sharum MY. Extracting adverse drug events from clinical Notes: A systematic review of approaches used. J Biomed Inform 2024; 151:104603. [PMID: 38331081 DOI: 10.1016/j.jbi.2024.104603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 01/31/2024] [Accepted: 02/01/2024] [Indexed: 02/10/2024]
Abstract
BACKGROUND An adverse drug event (ADE) is any unfavorable effect that occurs due to the use of a drug. Extracting ADEs from unstructured clinical notes is essential to biomedical text extraction research because it helps with pharmacovigilance and patient medication studies. OBJECTIVE From the considerable amount of clinical narrative text, natural language processing (NLP) researchers have developed methods for extracting ADEs and their related attributes. This work presents a systematic review of current methods. METHODOLOGY Two biomedical databases have been searched from June 2022 until December 2023 for relevant publications regarding this review, namely the databases PubMed and Medline. Similarly, we searched the multi-disciplinary databases IEEE Xplore, Scopus, ScienceDirect, and the ACL Anthology. We adopted the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement guidelines and recommendations for reporting systematic reviews in conducting this review. Initially, we obtained 5,537 articles from the search results from the various databases between 2015 and 2023. Based on predefined inclusion and exclusion criteria for article selection, 100 publications have undergone full-text review, of which we consider 82 for our analysis. RESULTS We determined the general pattern for extracting ADEs from clinical notes, with named entity recognition (NER) and relation extraction (RE) being the dual tasks considered. Researchers who tackled both NER and RE simultaneously have approached ADE extraction as a "pipeline extraction" problem (n = 22), as a "joint task extraction" problem (n = 7), and as a "multi-task learning" problem (n = 6), while others have tackled only NER (n = 27) or RE (n = 20). We further grouped the reviewed studies based on the approaches for data extraction, namely rule-based (n = 8), machine learning (n = 11), deep learning (n = 32), comparison of two or more approaches (n = 11), hybrid (n = 12) and large language models (n = 8). The most used datasets are MADE 1.0, TAC 2017 and n2c2 2018. CONCLUSION Extracting ADEs is crucial, especially for pharmacovigilance studies and patient medications. This survey showcases advances in ADE extraction research, approaches, datasets, and state-of-the-art performance in them. Challenges and future research directions are highlighted. We hope this review will guide researchers in gaining background knowledge and developing more innovative ways to address the challenges.
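The two breakdowns reported in the RESULTS (by task formulation and by approach) can be checked against the 82 studies retained for analysis. The short sketch below simply sums the counts quoted in the abstract; the dictionary keys are shorthand labels chosen for this example.

```python
# Counts quoted in the abstract, grouped by task formulation.
by_task = {
    "pipeline": 22,
    "joint_task": 7,
    "multi_task": 6,
    "ner_only": 27,
    "re_only": 20,
}

# Counts quoted in the abstract, grouped by extraction approach.
by_approach = {
    "rule_based": 8,
    "machine_learning": 11,
    "deep_learning": 32,
    "comparison": 11,
    "hybrid": 12,
    "large_language_models": 8,
}

included_studies = 82  # publications retained after full-text review

task_total = sum(by_task.values())
approach_total = sum(by_approach.values())
print(task_total, approach_total, included_studies)
```

Both groupings sum to 82, so each partitions the full set of included studies.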
Affiliation(s)
- Salisu Modi
- Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia; Department of Computer Science, Sokoto State University, Sokoto, Nigeria.
| | - Khairul Azhar Kasmiran
- Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia.
| | - Nurfadhlina Mohd Sharef
- Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia.
| | - Mohd Yunus Sharum
- Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia.
|
26
|
Han L, Gladkoff S, Erofeev G, Sorokina I, Galiano B, Nenadic G. Neural machine translation of clinical text: an empirical investigation into multilingual pre-trained language models and transfer-learning. Front Digit Health 2024; 6:1211564. [PMID: 38468693 PMCID: PMC10926203 DOI: 10.3389/fdgth.2024.1211564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 01/12/2024] [Indexed: 03/13/2024] Open
Abstract
Clinical text and documents contain very rich information and knowledge in healthcare, and their processing using state-of-the-art language technology becomes very important for building intelligent systems for supporting healthcare and social good. This processing includes creating language understanding models and translating resources into other natural languages to share domain-specific cross-lingual knowledge. In this work, we conduct investigations on clinical text machine translation by examining multilingual neural network models using deep learning such as Transformer-based structures. Furthermore, to address the language resource imbalance issue, we also carry out experiments using a transfer learning methodology based on massive multilingual pre-trained language models (MMPLMs). The experimental results on three sub-tasks including (1) clinical case (CC), (2) clinical terminology (CT), and (3) ontological concept (OC) show that our models achieved top-level performances in the ClinSpEn-2022 shared task on English-Spanish clinical domain data. Furthermore, our expert-based human evaluations demonstrate that the small-sized pre-trained language model (PLM) outperformed the other two extra-large language models by a large margin in the clinical domain fine-tuning, a finding not previously reported in the field. Finally, the transfer learning method works well in our experimental setting using the WMT21fb model to accommodate a new language space, Spanish, that was not seen at the pre-training stage within WMT21fb itself, which deserves more exploitation for clinical knowledge transformation, e.g. by investigating more languages. These research findings can shed some light on domain-specific machine translation development, especially in clinical and healthcare fields. Further research projects can be carried out based on our work to improve healthcare text analytics and knowledge transformation. Our data is openly available for research purposes at: https://github.com/HECTA-UoM/ClinicalNMT.
Affiliation(s)
- Lifeng Han
- Department of Computer Science, The University of Manchester, Manchester, United Kingdom
| | - Serge Gladkoff
- AI Lab, Logrus Global, Translation & Localization, Philadelphia, PA, United States
| | - Gleb Erofeev
- AI Lab, Logrus Global, Translation & Localization, Philadelphia, PA, United States
| | - Irina Sorokina
- AI Lab, Logrus Global, Translation & Localization, Philadelphia, PA, United States
| | - Betty Galiano
- Management Department, Ocean Translations, Rosario, Argentina
| | - Goran Nenadic
- Department of Computer Science, The University of Manchester, Manchester, United Kingdom
|
27
|
GE Y, AL-GARADI MA, SARKER A. Data Augmentation with Nearest Neighbor Classifier for Few-Shot Named Entity Recognition. Stud Health Technol Inform 2024; 310:690-694. [PMID: 38269897 PMCID: PMC11471308 DOI: 10.3233/shti231053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
Few-shot learning (FSL) is a category of machine learning models that are designed with the intent of solving problems that have small amounts of labeled data available for training. FSL research progress in natural language processing (NLP), particularly within the medical domain, has been notably slow, primarily due to greater difficulties posed by domain-specific characteristics and data sparsity problems. We explored the use of novel methods for text representation and encoding combined with distance-based measures for improving FSL entity detection. In this paper, we propose a data augmentation method to incorporate semantic information from medical texts into the learning process and combine it with a nearest-neighbor classification strategy for predicting entities. Experiments performed on five biomedical text datasets demonstrate that our proposed approach often outperforms other approaches.
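The abstract names a nearest-neighbor classification strategy built on distance-based measures but gives no implementation details. As a generic illustration only, the sketch below assigns a token the label of its closest labeled example under Euclidean distance. The two-dimensional vectors, the labels, and the function name are hypothetical; the paper's text encoder and data augmentation step are not modeled.

```python
import math
from typing import List, Sequence, Tuple


def nearest_label(query: Sequence[float],
                  support: List[Tuple[Sequence[float], str]]) -> str:
    """Return the label of the support vector closest to the query."""
    if not support:
        raise ValueError("support set must not be empty")
    best_label, best_dist = support[0][1], math.inf
    for vector, label in support:
        dist = math.dist(query, vector)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# A tiny hypothetical support set: one embedded token per entity label.
support_set = [([0.9, 0.1], "Drug"), ([0.1, 0.9], "O")]
predicted = nearest_label([0.8, 0.3], support_set)
```

With only a handful of labeled examples per class, a lookup of this kind avoids training a separate classification layer, which is one reason distance-based methods appear in few-shot settings.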
Affiliation(s)
- Yao GE
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia
| | - Mohammed Ali AL-GARADI
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
| | - Abeed SARKER
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia
|
28
|
Zhou H, Austin R, Lu SC, Silverman GM, Zhou Y, Kilicoglu H, Xu H, Zhang R. Complementary and Integrative Health Information in the literature: its lexicon and named entity recognition. J Am Med Inform Assoc 2024; 31:426-434. [PMID: 37952122 PMCID: PMC10797266 DOI: 10.1093/jamia/ocad216] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/15/2023] [Revised: 10/20/2023] [Accepted: 11/08/2023] [Indexed: 11/14/2023] Open
Abstract
OBJECTIVE To construct an exhaustive Complementary and Integrative Health (CIH) Lexicon (CIHLex) to help better represent the often underrepresented physical and psychological CIH approaches in standard terminologies, and to also apply state-of-the-art natural language processing (NLP) techniques to help recognize them in the biomedical literature. MATERIALS AND METHODS We constructed the CIHLex by integrating various resources, compiling and integrating data from biomedical literature and relevant sources of knowledge. The Lexicon encompasses 724 unique concepts with 885 corresponding unique terms. We matched these concepts to the Unified Medical Language System (UMLS), and we developed and utilized BERT models comparing their efficiency in CIH named entity recognition to well-established models including MetaMap and CLAMP, as well as the large language model GPT3.5-turbo. RESULTS Of the 724 unique concepts in CIHLex, 27.2% could be matched to at least one term in the UMLS. About 74.9% of the mapped UMLS Concept Unique Identifiers were categorized as "Therapeutic or Preventive Procedure." Among the models applied to CIH named entity recognition, BLUEBERT delivered the highest macro-average F1-score of 0.91, surpassing other models. CONCLUSION Our CIHLex significantly augments representation of CIH approaches in biomedical literature. Demonstrating the utility of advanced NLP models, BERT notably excelled in CIH entity recognition. These results highlight promising strategies for enhancing standardization and recognition of CIH terminology in biomedical contexts.
Affiliation(s)
- Huixue Zhou
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, United States
| | - Robin Austin
- School of Nursing, University of Minnesota, Minneapolis, MN, United States
| | - Sheng-Chieh Lu
- Department of Symptom Research, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
| | - Greg Marc Silverman
- Department of Surgery, University of Minnesota, Minneapolis, MN, United States
| | - Yuqi Zhou
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, United States
- Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, MN, United States
| | - Halil Kilicoglu
- School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, United States
| | - Hua Xu
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, United States
| | - Rui Zhang
- Department of Surgery, University of Minnesota, Minneapolis, MN, United States
| |
Collapse
|
29
|
Deimazar G, Sheikhtaheri A. Machine learning models to detect and predict patient safety events using electronic health records: A systematic review. Int J Med Inform 2023; 180:105246. [PMID: 37837710 DOI: 10.1016/j.ijmedinf.2023.105246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Revised: 10/02/2023] [Accepted: 10/08/2023] [Indexed: 10/16/2023]
Abstract
INTRODUCTION Identifying patient safety events using electronic health records (EHRs) and automated machine learning-based detection methods can help improve the efficiency and quality of healthcare service provision. OBJECTIVE This study aimed to systematically review machine learning-based methods and techniques, as well as their results for patient safety event management using EHRs. METHODS We reviewed the studies that focused on machine learning techniques, including automatic prediction and detection of patient safety events and medical errors through EHR analysis to manage patient safety events. The data were collected by searching Scopus, PubMed (Medline), Web of Science, EMBASE, and IEEE Xplore databases. RESULTS After screening, 41 papers were reviewed. Support vector machine (SVM), random forest, conditional random field (CRF), and bidirectional long short-term memory with conditional random field (BiLSTM-CRF) algorithms were mostly applied to predict, identify, and classify patient safety events using EHRs; however, they had different performances. BiLSTM-CRF was employed in most of the studies to extract and identify concepts, e.g., adverse drug events (ADEs) and adverse drug reactions (ADRs), as well as relationships between drug and severity, drug and ADEs, drug and ADRs. Recurrent neural networks (RNN) and BiLSTM-CRF had the best results in detecting ADEs compared to other patient safety events. Linear classifiers and Naive Bayes (NB) had the highest performance for ADR detection. Logistic regression had the best results in detecting surgical site infections. According to the findings, the quality of articles has non-significantly improved in recent years, but they had low average scores. CONCLUSIONS Machine learning can be useful in automatic detection and prediction of patient safety events. However, most of these algorithms have not yet been externally validated or prospectively tested. Therefore, further studies are required to improve the performance of these automated systems.
Affiliation(s)
- Ghasem Deimazar
  - Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
- Abbas Sheikhtaheri
  - Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
|
30
|
Frei J, Frei-Stuber L, Kramer F. GERNERMED++: Semantic annotation in German medical NLP through transfer-learning, translation and word alignment. J Biomed Inform 2023; 147:104513. [PMID: 37838290 DOI: 10.1016/j.jbi.2023.104513] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/14/2022] [Revised: 09/27/2023] [Accepted: 10/04/2023] [Indexed: 10/16/2023]
Abstract
We present a statistical model, GERNERMED++, for German medical natural language processing trained for named entity recognition (NER) as an open, publicly available model. We demonstrate the effectiveness of combining multiple techniques in order to achieve strong results in entity recognition performance by means of transfer-learning on pre-trained deep language models (LM), word-alignment and neural machine translation, outperforming a pre-existing baseline model on several datasets. Because open, public medical entity recognition models for German texts are scarce, this work offers benefits to the German research community on medical NLP as a baseline model. The work serves as a refined successor to our first GERNERMED model. Similar to our previous work, our trained model is publicly available to other researchers. The sample code and the statistical model are available at: https://github.com/frankkramer-lab/GERNERMED-pp.
Affiliation(s)
- Johann Frei
  - IT-Infrastructure for Translational Medical Research, University of Augsburg, Alter Postweg 101, 86159 Augsburg, Germany
- Ludwig Frei-Stuber
  - Institute and Outpatient Clinic for Occupational, Social and Environmental Medicine, 80336 Munich, Germany
- Frank Kramer
  - IT-Infrastructure for Translational Medical Research, University of Augsburg, Alter Postweg 101, 86159 Augsburg, Germany
|
31
|
Guo B, Liu H, Niu L. Integration of natural and deep artificial cognitive models in medical images: BERT-based NER and relation extraction for electronic medical records. Front Neurosci 2023; 17:1266771. [PMID: 37732304 PMCID: PMC10507183 DOI: 10.3389/fnins.2023.1266771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 08/14/2023] [Indexed: 09/22/2023] Open
Abstract
Introduction Medical images and signals are important data sources in the medical field, and they contain key information such as patients' physiology, pathology, and genetics. However, the complexity and diversity of medical images and signals make medical knowledge acquisition and decision support difficult. Methods To solve this problem, this paper proposes an end-to-end framework based on BERT for NER and RE tasks in electronic medical records. Our framework first integrates NER and RE tasks into a unified model, adopting an end-to-end processing manner, which removes the limitation and error propagation of multiple independent steps in traditional methods. Second, by pre-training and fine-tuning the BERT model on large-scale electronic medical record data, we enable the model to obtain rich semantic representation capabilities that adapt to the needs of medical fields and tasks. Finally, through multi-task learning, we enable the model to make full use of the correlation and complementarity between NER and RE tasks, and improve the generalization ability and effect of the model on different datasets. Results and discussion We conduct experimental evaluation on four electronic medical record datasets, and the model significantly outperforms other methods on different datasets in the NER task. In the RE task, the EMLB model also achieved advantages on different datasets; in the multi-task learning mode in particular, its performance improved significantly, and the ETE and MTL modules performed well in terms of comprehensive precision and recall. Our research provides an innovative solution for medical image and signal data.
Affiliation(s)
- Bo Guo
  - School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
  - Department of Computing, Faculty of Communication, Visual Art and Computing, Universiti Selangor, Bestari Jaya, Selangor, Malaysia
- Huaming Liu
  - School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
- Lei Niu
  - School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
|
32
|
Frei J, Kramer F. Annotated dataset creation through large language models for non-english medical NLP. J Biomed Inform 2023; 145:104478. [PMID: 37625508 DOI: 10.1016/j.jbi.2023.104478] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/05/2023] [Revised: 08/01/2023] [Accepted: 08/21/2023] [Indexed: 08/27/2023]
Abstract
Obtaining text datasets with semantic annotations is an effortful process, yet crucial for supervised training in natural language processing (NLP). In general, developing and applying new NLP pipelines in domain-specific contexts often requires custom-designed datasets to address NLP tasks in a supervised machine learning fashion. When operating in non-English languages for medical data processing, this exposes several minor and major, interconnected problems such as the lack of task-matching datasets as well as task-specific pre-trained models. In our work, we suggest leveraging pre-trained large language models for training data acquisition in order to retrieve sufficiently large datasets for training smaller and more efficient models for use-case-specific tasks. To demonstrate the effectiveness of our approach, we create a custom dataset that we use to train a medical NER model for German texts, GPTNERMED, yet our method remains language-independent in principle. Our obtained dataset as well as our pre-trained models are publicly available at https://github.com/frankkramer-lab/GPTNERMED.
Affiliation(s)
- Johann Frei
  - IT-Infrastructure for Translational Medical Research, University of Augsburg, Alter Postweg 101, 86159 Augsburg, Germany
- Frank Kramer
  - IT-Infrastructure for Translational Medical Research, University of Augsburg, Alter Postweg 101, 86159 Augsburg, Germany
|
33
|
Fraile Navarro D, Ijaz K, Rezazadegan D, Rahimi-Ardabili H, Dras M, Coiera E, Berkovsky S. Clinical named entity recognition and relation extraction using natural language processing of medical free text: A systematic review. Int J Med Inform 2023; 177:105122. [PMID: 37295138 DOI: 10.1016/j.ijmedinf.2023.105122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Revised: 04/14/2023] [Accepted: 06/03/2023] [Indexed: 06/12/2023]
Abstract
BACKGROUND Natural Language Processing (NLP) applications have developed over the past years in various fields, including their application to clinical free text for named entity recognition and relation extraction. However, developments in the last few years have been so rapid that there is currently no overview of them. Moreover, it is unclear how these models and tools have been translated into clinical practice. We aim to synthesize and review these developments. METHODS We reviewed literature from 2010 to date, searching PubMed, Scopus, the Association for Computational Linguistics (ACL), and Association for Computing Machinery (ACM) libraries for studies of NLP systems performing general-purpose (i.e., not disease- or treatment-specific) information extraction and relation extraction tasks in unstructured clinical text (e.g., discharge summaries). RESULTS We included in the review 94 studies, with 30 studies published in the last three years. Machine learning methods were used in 68 studies, rule-based in 5 studies, and both in 22 studies. 63 studies focused on Named Entity Recognition, 13 on Relation Extraction and 18 performed both. The most frequently extracted entities were "problem", "test" and "treatment". 72 studies used public datasets and 22 studies used proprietary datasets alone. Only 14 studies clearly defined a clinical or information task to be addressed by the system and just three studies reported its use outside the experimental setting. Only 7 studies shared a pre-trained model and only 8 an available software tool. DISCUSSION Machine learning-based methods have dominated the NLP field on information extraction tasks. More recently, Transformer-based language models are taking the lead and showing the strongest performance. However, these developments are mostly based on a few datasets and generic annotations, with very few real-world use cases. This may raise questions about the generalizability of findings and their translation into practice, and highlights the need for robust clinical evaluation.
Affiliation(s)
- David Fraile Navarro
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Kiran Ijaz
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Dana Rezazadegan
  - Department of Computer Science and Software Engineering, School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne, Australia
- Hania Rahimi-Ardabili
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Mark Dras
  - Department of Computing, Macquarie University, Sydney, Australia
- Enrico Coiera
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Shlomo Berkovsky
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
|
34
|
Zhou H, Silverman G, Niu Z, Silverman J, Evans R, Austin R, Zhang R. Extracting Complementary and Integrative Health Approaches in Electronic Health Records. J Healthc Inform Res 2023; 7:277-290. [PMID: 37637720 PMCID: PMC10449701 DOI: 10.1007/s41666-023-00137-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Revised: 04/12/2023] [Accepted: 07/03/2023] [Indexed: 08/29/2023]
Abstract
Complementary and Integrative Health (CIH) has gained increasing popularity in the past decades. While the evidence base supporting CIH approaches is growing, there is still a gap in understanding their effects and potential adverse events using real-world data. The overall goal of this study is to represent information pertinent to both psychological and physical CIH approaches (specifically, using examples of music therapy, chiropractic, and aquatic exercise in this study) in an electronic health record (EHR) system. We also aim to evaluate the ability of existing natural language processing (NLP) systems to identify CIH approaches. A total of 300 notes were randomly selected and manually annotated. Annotations were made for status, symptom, and frequency of each approach. This set of annotations was used as a gold standard to evaluate the performance of NLP systems used in this study (specifically BioMedICUS, MetaMap, and cTAKES) for extracting CIH concepts. A Venn diagram was used to investigate the consistency of medical record searches by Current Procedural Terminology (CPT) codes and by CIH approach keywords in SQL. Since CPT codes usually do not have specific mentions of CIH approaches, the records found by CPT codes overlapped little with those found in clinical notes for all three CIH therapies. The three NLP systems achieved an average lenient-match F1-score of 0.41 across the three CIH approaches. BioMedICUS achieved the best performance in aquatic exercise with an F1-score of 0.66. This study contributes to the overall representation of CIH in clinical notes and lays a foundation for using EHRs for clinical research on CIH approaches.
Affiliation(s)
- Huixue Zhou
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55414 USA
- Greg Silverman
  - Department of Surgery, University of Minnesota, Minneapolis, MN 55414 USA
- Zhongran Niu
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55414 USA
- Jenzi Silverman
  - Earl E. Bakken Center for Spirituality & Healing, University of Minnesota, Minneapolis, MN 55414 USA
- Roni Evans
  - Earl E. Bakken Center for Spirituality & Healing, University of Minnesota, Minneapolis, MN 55414 USA
- Robin Austin
  - School of Nursing, University of Minnesota, Minneapolis, MN 55414 USA
- Rui Zhang
  - Department of Surgery, University of Minnesota, Minneapolis, MN 55414 USA
|
35
|
Peng C, Yang X, Yu Z, Bian J, Hogan WR, Wu Y. Clinical concept and relation extraction using prompt-based machine reading comprehension. J Am Med Inform Assoc 2023; 30:1486-1493. [PMID: 37316988 PMCID: PMC10436141 DOI: 10.1093/jamia/ocad107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 05/08/2023] [Accepted: 06/05/2023] [Indexed: 06/16/2023] Open
Abstract
OBJECTIVE To develop a natural language processing system that solves both clinical concept extraction and relation extraction in a unified prompt-based machine reading comprehension (MRC) architecture with good generalizability for cross-institution applications. METHODS We formulate both clinical concept extraction and relation extraction using a unified prompt-based MRC architecture and explore state-of-the-art transformer models. We compare our MRC models with existing deep learning models for concept extraction and end-to-end relation extraction using 2 benchmark datasets developed by the 2018 National NLP Clinical Challenges (n2c2) challenge (medications and adverse drug events) and the 2022 n2c2 challenge (relations of social determinants of health [SDoH]). We also evaluate the transfer learning ability of the proposed MRC models in a cross-institution setting. We perform error analyses and examine how different prompting strategies affect the performance of MRC models. RESULTS AND CONCLUSION The proposed MRC models achieve state-of-the-art performance for clinical concept and relation extraction on the 2 benchmark datasets, outperforming previous non-MRC transformer models. GatorTron-MRC achieves the best strict and lenient F1-scores for concept extraction, outperforming previous deep learning models on the 2 datasets by 1%-3% and 0.7%-1.3%, respectively. For end-to-end relation extraction, GatorTron-MRC and BERT-MIMIC-MRC achieve the best F1-scores, outperforming previous deep learning models by 0.9%-2.4% and 10%-11%, respectively. For cross-institution evaluation, GatorTron-MRC outperforms traditional GatorTron by 6.4% and 16% for the 2 datasets, respectively. The proposed method is better at handling nested/overlapped concepts, extracting relations, and has good portability for cross-institute applications. Our clinical MRC package is publicly available at https://github.com/uf-hobi-informatics-lab/ClinicalTransformerMRC.
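The distinction between strict and lenient F1-scores mentioned above can be illustrated with a small span-matching sketch. It assumes spans are `(start, end, label)` tuples with an exclusive end offset, that a strict match needs identical boundaries and label, and that a lenient match needs any character overlap with the same label. These criteria and all names below are assumptions for illustration; the official n2c2 evaluation scripts may define matches differently.

```python
def f1_from_counts(matched_pred, n_pred, matched_gold, n_gold):
    """Combine precision (over predictions) and recall (over gold spans) into F1."""
    precision = matched_pred / n_pred if n_pred else 0.0
    recall = matched_gold / n_gold if n_gold else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def overlaps(a, b):
    """Lenient criterion: same label and at least one shared character offset."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def span_f1(gold, pred, lenient=False):
    """Span-level F1 under strict (exact) or lenient (overlap) matching."""
    match = overlaps if lenient else (lambda a, b: a == b)
    matched_pred = sum(any(match(g, p) for g in gold) for p in pred)
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)
    return f1_from_counts(matched_pred, len(pred), matched_gold, len(gold))


# Toy spans (hypothetical offsets and labels, not data from the paper)
gold = [(0, 5, "Drug"), (10, 20, "ADE")]
pred = [(0, 5, "Drug"), (12, 20, "ADE"), (30, 35, "Drug")]
strict_score = span_f1(gold, pred)
lenient_score = span_f1(gold, pred, lenient=True)
```

In this toy case the second prediction has the right label but a shifted start offset, so it counts only under lenient matching, which is why lenient scores are never lower than strict ones.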
Affiliation(s)
- Cheng Peng
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
- Xi Yang
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, Florida, USA
- Zehao Yu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, Florida, USA
- William R Hogan
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, Florida, USA
|
36
|
Kim S, Kang T, Chung TK, Choi Y, Hong Y, Jung K, Lee H. Automatic Extraction of Comprehensive Drug Safety Information from Adverse Drug Event Narratives in the Korea Adverse Event Reporting System Using Natural Language Processing Techniques. Drug Saf 2023; 46:781-795. [PMID: 37330415 PMCID: PMC10344995 DOI: 10.1007/s40264-023-01323-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/22/2023] [Indexed: 06/19/2023]
Abstract
INTRODUCTION Concerns have been raised over the quality of drug safety information, particularly data completeness, collected through spontaneous reporting systems (SRS), although regulatory agencies routinely use SRS data to guide their pharmacovigilance programs. We expected that collecting additional drug safety information from adverse event (ADE) narratives and incorporating it into the SRS database would improve data completeness. OBJECTIVE The aims of this study were to define the extraction of comprehensive drug safety information from ADE narratives reported through the Korea Adverse Event Reporting System (KAERS) as natural language processing (NLP) tasks and to provide baseline models for the defined tasks. METHODS This study used ADE narratives and structured drug safety information from individual case safety reports (ICSRs) reported through KAERS between 1 January 2015 and 31 December 2019. We developed the annotation guideline for the extraction of comprehensive drug safety information from ADE narratives based on the International Conference on Harmonisation (ICH) E2B(R3) guideline and manually annotated 3723 ADE narratives. Then, we developed a domain-specific Korean Bidirectional Encoder Representations from Transformers (KAERS-BERT) model using 1.2 million ADE narratives in KAERS and provided baseline models for the task we defined. In addition, we performed an ablation experiment to investigate whether named entity recognition (NER) models were improved when a training dataset contained more diverse ADE narratives. RESULTS We defined 21 types of word entities, six types of entity labels, and 49 types of relations to formulate the extraction of comprehensive drug safety information as NLP tasks. We obtained a total of 86,750 entities, 81,828 entity labels, and 45,107 relations from manually annotated ADE narratives. The KAERS-BERT model achieved F1-scores of 83.81 and 76.62% on the NER and sentence extraction tasks, respectively, while outperforming other baseline models on all the NLP tasks we defined except the sentence extraction task. Finally, utilizing the NER model for extracting drug safety information from ADE narratives resulted in an average increase of 3.24% in data completeness for KAERS structured data fields. CONCLUSIONS We formulated the extraction of comprehensive drug safety information from ADE narratives as NLP tasks and developed the annotated corpus and strong baseline models for the tasks. The annotated corpus and models for extracting comprehensive drug safety information can improve the data quality of an SRS database.
Affiliation(s)
- Siun Kim
  - Biomedical Research Institute, Seoul National University Hospital, Seoul, South Korea
- Taegwan Kang
  - Department of Electrical and Computer Engineering, Seoul National University, Room 1005 Building 301, 1 Gwanak-ro, Gwanak-gu, Seoul, 151-744, Republic of Korea
  - LG AI Research, 128, Yeoui-daero, Yeongdeungpo-gu, Seoul, South Korea
- Tae Kyu Chung
  - Department of Applied Bioengineering, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, South Korea
- Yoona Choi
  - Department of Applied Bioengineering, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, South Korea
- YeSol Hong
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, South Korea
- Kyomin Jung
  - Department of Electrical and Computer Engineering, Seoul National University, Room 1005 Building 301, 1 Gwanak-ro, Gwanak-gu, Seoul, 151-744, Republic of Korea
- Howard Lee
  - Department of Applied Bioengineering, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, South Korea
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, South Korea
  - Department of Clinical Pharmacology and Therapeutics, Seoul National University College of Medicine and Hospital, 103 Daehak-ro, Jongno-gu, Seoul, 110-799, South Korea
  - Advanced Institutes of Convergence Technology, Suwon, 16229, South Korea
|
37
|
Affiliation(s)
- Hanyin Wang
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N. Lake Shore Drive, 11-189, Chicago, IL, 60611, USA
- Yanyi Jenny Ding
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N. Lake Shore Drive, 11-189, Chicago, IL, 60611, USA
- Yuan Luo
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N. Lake Shore Drive, 11-189, Chicago, IL, 60611, USA
|
38
|
Dietrich J, Kazzer P. Provision and Characterization of a Corpus for Pharmaceutical, Biomedical Named Entity Recognition for Pharmacovigilance: Evaluation of Language Registers and Training Data Sufficiency. Drug Saf 2023; 46:765-779. [PMID: 37338799 PMCID: PMC10345043 DOI: 10.1007/s40264-023-01322-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/16/2023] [Indexed: 06/21/2023]
Abstract
INTRODUCTION AND OBJECTIVE Machine learning (ML) systems are widely used for automatic entity recognition in pharmacovigilance. Publicly available datasets do not allow the use of annotated entities independently, focusing on small entity subsets or on single language registers (informal or scientific language). The objective of the current study was to create a dataset that enables independent usage of entities, explores the performance of predictive ML models on different registers, and introduces a method to investigate entity cut-off performance. METHODS A dataset has been created combining different registers with 18 different entities. We applied this dataset to compare the performance of integrated models with models created with single language registers only. We introduced fractional stratified k-fold cross-validation to determine model performance on entity level by using training dataset fractions. We investigated the course of entity performance with fractions of training datasets and evaluated entity peak and cut-off performance. RESULTS The dataset combines 1400 records (scientific language: 790; informal language: 610) with 2622 sentences and 9989 entity occurrences and combines data from external (801 records) and internal sources (599 records). We demonstrated that single language register models underperform compared to integrated models trained with multiple language registers. CONCLUSIONS A manually annotated dataset with a variety of different pharmaceutical and biomedical entities was created and is made available to the research community. Our results show that models that combine different registers provide better maintainability, have higher robustness, and have similar or higher performance. Fractional stratified k-fold cross-validation allows the evaluation of training data sufficiency on the entity level.
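The abstract does not spell out how fractional stratified k-fold cross-validation is implemented. One plausible reading is that each fold's test set stays fixed while the model is retrained on progressively larger stratified fractions of the remaining training data, so that performance can be plotted against training-set size. The sketch below follows that assumption only; the function names, the stratification by a single per-record label, and the rounding rule are all hypothetical and should not be taken as the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split record indices into k folds while keeping label proportions similar."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_label.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)   # deal each label round-robin across folds
    return folds


def fractional_splits(labels, k, fractions, seed=0):
    """Yield (fold, fraction, train_idx, test_idx) with stratified training subsamples."""
    rng = random.Random(seed)
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train_by_label = defaultdict(list)
        for j, fold in enumerate(folds):
            if j != i:
                for idx in fold:
                    train_by_label[labels[idx]].append(idx)
        for frac in fractions:
            train = []
            for idxs in train_by_label.values():
                n = max(1, round(frac * len(idxs)))   # keep at least one record per label
                train.extend(rng.sample(idxs, n))
            yield i, frac, sorted(train), sorted(test)


# Toy usage: 20 records from two hypothetical registers, 5 folds, two fractions
labels = ["scientific"] * 10 + ["informal"] * 10
splits = list(fractional_splits(labels, k=5, fractions=[0.5, 1.0]))
```

Each yielded split would then be used to train and score a model, and the per-entity scores at each fraction show where extra training data stops helping.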
Affiliation(s)
- Jürgen Dietrich
  - Bayer AG, Pharmaceuticals, Medical Affairs & Pharmacovigilance, Data Science & Insights, Müllerstr. 170, 13353, Berlin, Germany
|
39
|
Mahajan D, Liang JJ, Tsou CH, Uzuner Ö. Overview of the 2022 n2c2 shared task on contextualized medication event extraction in clinical notes. J Biomed Inform 2023; 144:104432. [PMID: 37356640 PMCID: PMC10529825 DOI: 10.1016/j.jbi.2023.104432] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/06/2023] [Revised: 06/15/2023] [Accepted: 06/17/2023] [Indexed: 06/27/2023]
Abstract
BACKGROUND An accurate medication history, foundational for providing quality medical care, requires understanding of medication change events documented in clinical notes. However, extracting medication changes without the necessary clinical context is insufficient for real-world applications. METHODS To address this need, Track 1 of the 2022 National NLP Clinical Challenges focused on extracting the context for medication changes documented in clinical notes using the Contextualized Medication Event Dataset. Track 1 consisted of 3 subtasks: extracting medication mentions from clinical notes (NER), determining whether a medication change is being discussed (Event), and determining the action, negation, temporality, certainty, and actor for any change events (Context). Participants were allowed to participate in any one or more of the subtasks. RESULTS A total of 32 teams with participants from 19 countries submitted a total of 211 systems across all subtasks. Most teams formulated NER as a token classification task and Event and Context as multi-class classification tasks, using transformer-based large language models. Overall, performance for NER was high across submitted systems. However, performance for Event and Context was much lower, often due to indirectly stated change events with no clear action verb, events requiring more distant textual clues for understanding, and medication mentions with multiple change events. CONCLUSIONS This shared task showed that while NLP research on medication extraction is relatively mature, understanding of contextual information surrounding medication events in clinical notes is still an open problem requiring further research to achieve the end goal of supporting real-world clinical applications.
Affiliation(s)
- Diwakar Mahajan
  - IBM T.J. Watson Research Center, Yorktown Heights, NY, United States of America
- Jennifer J Liang
  - IBM T.J. Watson Research Center, Yorktown Heights, NY, United States of America
- Ching-Huei Tsou
  - IBM T.J. Watson Research Center, Yorktown Heights, NY, United States of America
- Özlem Uzuner
  - Department of Information Sciences & Technology, George Mason University, Fairfax, VA, United States of America
|
40
|
Mishra RK, Roy S, Palla SK, Patel N, Patel M, Jos S. Hybrid approach combining deep learning and a rule based expert system for concept extraction from prescriptions. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. [PMID: 38082624 DOI: 10.1109/embc40787.2023.10339977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
Concept extraction from prescriptions is an important task that provides a foundation for many downstream healthcare applications in decision making across the areas of pharmacovigilance, medication adherence, inventory management, and other matters of value-based care. Although short, prescription directions can sometimes be complex. As the complexity of a direction increases, it becomes harder to extract the various concepts using only a rule-based expert system. The proposed hybrid approach identifies major concepts such as frequency, dosage, and duration from the natural-text direction using a combination of rules and deep learning (DL) based methods on large real-world data from a pharmacy chain. The DL module includes a fine-tuned BERT transformer and a Gram CNN (Convolutional Neural Network) based NER (Named Entity Recognition) architecture. The proposed method utilizes domain heuristics along with intelligent labelling and bootstrapping to help DL models extract concepts with high evaluation scores, and thus provides a way of carrying out concept extraction using targeted methods instead of one single method. To the best of our knowledge, this is the best performance reported in the literature for concept extraction from doctors' prescriptions.
Collapse
|
41
|
Chen S, Guevara M, Ramirez N, Murray A, Warner JL, Aerts HJWL, Miller TA, Savova GK, Mak RH, Bitterman DS. Natural Language Processing to Automatically Extract the Presence and Severity of Esophagitis in Notes of Patients Undergoing Radiotherapy. JCO Clin Cancer Inform 2023; 7:e2300048. [PMID: 37506330 DOI: 10.1200/cci.23.00048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2023] [Revised: 05/05/2023] [Accepted: 05/26/2023] [Indexed: 07/30/2023] Open
Abstract
PURPOSE Radiotherapy (RT) toxicities can impair survival and quality of life, yet remain understudied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT. METHODS Our corpus consisted of a gold-labeled data set of 1,524 clinical notes from 124 patients with lung cancer treated with RT, manually annotated for Common Terminology Criteria for Adverse Events (CTCAE) v5.0 esophagitis grade, and a silver-labeled data set of 2,420 notes from 1,832 patients from whom toxicity grades had been collected as structured data during clinical care. We fine-tuned statistical and pretrained Bidirectional Encoder Representations from Transformers-based models for three esophagitis classification tasks: task 1, no esophagitis versus grade 1-3; task 2, grade ≤1 versus >1; and task 3, no esophagitis versus grade 1 versus grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT. RESULTS Fine-tuning of PubMedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for tasks 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by ≥2% for all tasks. Silver-labeled data improved the macro-F1 by ≥3% across all tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for tasks 1, 2, and 3, respectively, without additional fine-tuning. CONCLUSION To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinical notes. This provides proof of concept for NLP-based automated detailed toxicity monitoring in expanded domains.
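The macro-F1 figures reported in this abstract average the per-class F1 scores with equal weight, so a rare class (for example grade 2-3) counts as much as a common one. The following is a minimal sketch of that calculation in plain Python; the label names and toy predictions are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical three-way labels (illustration only, not study data).
y_true = ["none", "none", "grade1", "grade2-3", "grade1", "none"]
y_pred = ["none", "grade1", "grade1", "grade2-3", "none", "none"]
print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
```

Because every class contributes equally, macro-F1 tends to drop as tasks add finer-grained, less frequent classes, which is consistent with the ordering of the scores reported for tasks 1 to 3.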
Affiliation(s)
- Shan Chen
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
| | - Marco Guevara
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
| | - Nicolas Ramirez
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
| | - Arpi Murray
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
| | - Jeremy L Warner
- Population Sciences Program, Legorreta Cancer Center, Brown University, Providence, RI
- Lifespan Cancer Institute, Providence, RI
| | - Hugo J W L Aerts
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
- Radiology and Nuclear Medicine, GROW & CARIM, Maastricht University, Maastricht, the Netherlands
| | - Timothy A Miller
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
| | - Guergana K Savova
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
| | - Raymond H Mak
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
| | - Danielle S Bitterman
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
| |
|
42
|
Gan Q, Hu M, Peterson KS, Eyre H, Alba PR, Bowles AE, Stanley JC, DuVall SL, Shi J. A deep learning approach for medication disposition and corresponding attributes extraction. J Biomed Inform 2023; 143:104391. [PMID: 37196988 PMCID: PMC10527481 DOI: 10.1016/j.jbi.2023.104391] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/07/2023] [Revised: 05/05/2023] [Accepted: 05/10/2023] [Indexed: 05/19/2023]
Abstract
OBJECTIVE This article summarizes our approach to extracting medications and their corresponding attributes from clinical notes, which is the focus of track 1 of the 2022 National Natural Language Processing (NLP) Clinical Challenges (n2c2) shared task. METHODS The dataset was prepared using the Contextualized Medication Event Dataset (CMED), including 500 notes from 296 patients. Our system consisted of three components: medication named entity recognition (NER), event classification (EC), and context classification (CC). These three components were built using transformer models with slightly different architectures and input text engineering. A zero-shot learning solution for CC was also explored. RESULTS Our best-performing systems achieved micro-average F1 scores of 0.973, 0.911, and 0.909 for NER, EC, and CC, respectively. CONCLUSION In this study, we implemented a deep learning-based NLP system and demonstrated that (1) utilizing special tokens helps our model distinguish multiple medication mentions in the same context, and (2) aggregating multiple events of a single medication into multiple labels improves our model's performance.
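The special-token idea mentioned in the conclusion can be illustrated with a small input-preparation sketch: when one sentence mentions several medications, the target mention is wrapped in marker tokens so that a classifier sees one distinct input per mention. The marker strings `[MED]` and `[/MED]`, the helper names, and the example sentence are assumptions made for illustration; the abstract does not specify the tokens the authors used.

```python
def mark_medication(text, start, end, open_tok="[MED]", close_tok="[/MED]"):
    """Wrap text[start:end] in marker tokens (hypothetical token names)."""
    if not (0 <= start < end <= len(text)):
        raise ValueError("span out of range")
    return f"{text[:start]}{open_tok} {text[start:end]} {close_tok}{text[end:]}"


def span_of(text, mention):
    """Character span of the first occurrence of a mention in the text."""
    start = text.index(mention)
    return start, start + len(mention)


# Hypothetical sentence with two medication mentions in the same context.
sentence = "Stop lisinopril and start metoprolol tomorrow."
inputs = [
    mark_medication(sentence, *span_of(sentence, drug))
    for drug in ("lisinopril", "metoprolol")
]
for marked in inputs:
    print(marked)
```

Each marked copy would then be classified separately, so the event label predicted for one medication is not confused with that of the other. With a transformer tokenizer, such markers would usually be registered as additional special tokens so that they are not split into subwords.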
Affiliation(s)
- Qiwei Gan
- VA Salt Lake City Health Care System, 500, Foothill Boulevard, Salt Lake City 84148, USA; Division of Epidemiology, University of Utah, 295 Chipeta Way, Salt Lake City 84132, USA
| | - Mengke Hu
- VA Salt Lake City Health Care System, 500, Foothill Boulevard, Salt Lake City 84148, USA; Division of Epidemiology, University of Utah, 295 Chipeta Way, Salt Lake City 84132, USA
| | - Kelly S Peterson
- Division of Epidemiology, University of Utah, 295 Chipeta Way, Salt Lake City 84132, USA; Veterans Health Administration Office of Analytics and Performance Integration, 500, Foothill Boulevard, Salt Lake City 84148, USA
| | - Hannah Eyre
- VA Salt Lake City Health Care System, 500, Foothill Boulevard, Salt Lake City 84148, USA; Division of Epidemiology, University of Utah, 295 Chipeta Way, Salt Lake City 84132, USA
| | - Patrick R Alba
- VA Salt Lake City Health Care System, 500, Foothill Boulevard, Salt Lake City 84148, USA; Division of Epidemiology, University of Utah, 295 Chipeta Way, Salt Lake City 84132, USA
| | - Annie E Bowles
- VA Salt Lake City Health Care System, 500, Foothill Boulevard, Salt Lake City 84148, USA; Division of Epidemiology, University of Utah, 295 Chipeta Way, Salt Lake City 84132, USA
| | - Johnathan C Stanley
- VA Salt Lake City Health Care System, 500, Foothill Boulevard, Salt Lake City 84148, USA; Division of Epidemiology, University of Utah, 295 Chipeta Way, Salt Lake City 84132, USA
| | - Scott L DuVall
- VA Salt Lake City Health Care System, 500, Foothill Boulevard, Salt Lake City 84148, USA; Division of Epidemiology, University of Utah, 295 Chipeta Way, Salt Lake City 84132, USA
| | - Jianlin Shi
- VA Salt Lake City Health Care System, 500, Foothill Boulevard, Salt Lake City 84148, USA; Division of Epidemiology, University of Utah, 295 Chipeta Way, Salt Lake City 84132, USA.
| |
|
43
|
Botsis T, Kreimeyer K. Improving drug safety with adverse event detection using natural language processing. Expert Opin Drug Saf 2023; 22:659-668. [PMID: 37339273 DOI: 10.1080/14740338.2023.2228197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 06/19/2023] [Indexed: 06/22/2023]
Abstract
INTRODUCTION Pharmacovigilance (PV) involves monitoring and aggregating adverse event information from a variety of data sources, including health records, biomedical literature, spontaneous adverse event reports, product labels, and patient-generated content like social media posts, but the most pertinent details in these sources are typically available in narrative free-text formats. Natural language processing (NLP) techniques can be used to extract clinically relevant information from PV texts to inform decision-making. AREAS COVERED We conducted a non-systematic literature review by querying the PubMed database to examine the uses of NLP in drug safety and distilled the findings to present our expert opinion on the topic. EXPERT OPINION New NLP techniques and approaches continue to be applied for drug safety use cases; however, systems that are fully deployed and in use in a clinical environment remain vanishingly rare. To see high-performing NLP techniques implemented in the real setting will require long-term engagement with end users and other stakeholders and revised workflows in fully formulated business plans for the targeted use cases. Additionally, we found little to no evidence of extracted information placed into standardized data models, which should be a way to make implementations more portable and adaptable.
Affiliation(s)
- Taxiarchis Botsis
- Department of Oncology, the Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Kory Kreimeyer
- Department of Oncology, the Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| |
|
44
|
Moon S, He H, Jia H, Liu H, Fan JW. Extractive Clinical Question-Answering With Multianswer and Multifocus Questions: Data Set Development and Evaluation Study. JMIR AI 2023; 2:e41818. [PMID: 38875580 PMCID: PMC11041481 DOI: 10.2196/41818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Revised: 01/31/2023] [Accepted: 05/22/2023] [Indexed: 06/16/2024]
Abstract
BACKGROUND Extractive question-answering (EQA) is a useful natural language processing (NLP) application for answering patient-specific questions by locating answers in their clinical notes. Realistic clinical EQA can yield multiple answers to a single question and multiple focus points in 1 question, which are lacking in existing data sets for the development of artificial intelligence solutions. OBJECTIVE This study aimed to create a data set for developing and evaluating clinical EQA systems that can handle natural multianswer and multifocus questions. METHODS We leveraged the annotated relations from the 2018 National NLP Clinical Challenges corpus to generate an EQA data set. Specifically, the 1-to-N, M-to-1, and M-to-N drug-reason relations were included to form the multianswer and multifocus question-answering entries, which represent more complex and natural challenges in addition to the basic 1-drug-1-reason cases. A baseline solution was developed and tested on the data set. RESULTS The derived RxWhyQA data set contains 96,939 QA entries. Among the answerable questions, 25% of them require multiple answers, and 2% of them ask about multiple drugs within 1 question. Frequent cues were observed around the answers in the text, and 90% of the drug and reason terms occurred within the same or an adjacent sentence. The baseline EQA solution achieved a best F1-score of 0.72 on the entire data set, and on specific subsets, it was 0.93 for the unanswerable questions, 0.48 for single-drug questions versus 0.60 for multidrug questions, and 0.54 for the single-answer questions versus 0.43 for multianswer questions. CONCLUSIONS The RxWhyQA data set can be used to train and evaluate systems that need to handle multianswer and multifocus questions. Specifically, multianswer EQA appears to be challenging and therefore warrants more investment in research. 
We created and shared a clinical EQA data set with multianswer and multifocus questions that would channel future research efforts toward more realistic scenarios.
Affiliation(s)
- Sungrim Moon
- Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States
| | - Huan He
- Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States
| | - Heling Jia
- Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States
| | - Hongfang Liu
- Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States
| | - Jungwei Wilfred Fan
- Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States
| |
|
45
|
Yang L, Huang X, Wang J, Yang X, Ding L, Li Z, Li J. Identifying stroke-related quantified evidence from electronic health records in real-world studies. Artif Intell Med 2023; 140:102552. [PMID: 37210153 DOI: 10.1016/j.artmed.2023.102552] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/18/2021] [Revised: 02/28/2023] [Accepted: 04/11/2023] [Indexed: 05/22/2023]
Abstract
BACKGROUND Stroke is one of the leading causes of death and disability worldwide. The National Institutes of Health Stroke Scale (NIHSS) scores in electronic health records (EHRs), which quantitatively describe patients' neurological deficits in evidence-based treatment, are crucial in stroke-related clinical investigations. However, the free-text format and lack of standardization inhibit their effective use. Automatically extracting the scale scores from the clinical free text so that its potential value in real-world studies is realized has become an important goal. OBJECTIVE This study aims to develop an automated method to extract scale scores from the free text of EHRs. METHODS We propose a two-step pipeline method to identify NIHSS items and numerical scores and validate its feasibility using a freely accessible critical care database: MIMIC-III (Medical Information Mart for Intensive Care III). First, we utilize MIMIC-III to create an annotated corpus. Then, we investigate possible machine learning methods for two subtasks, NIHSS item and score recognition and item-score relation extraction. In the evaluation, we conduct both task-specific and end-to-end evaluations and compare our method with the rule-based method using precision, recall and F1 scores as evaluation metrics. RESULTS We use all available discharge summaries of stroke cases in MIMIC-III. The annotated NIHSS corpus contains 312 cases, 2929 scale items, 2774 scores and 2733 relations. The results show that the best F1-score of our method was 0.9006, which was attained by combining BERT-BiLSTM-CRF and Random Forest, and it outperformed the rule-based method (F1-score = 0.8098). In the end-to-end task, our method could successfully recognize the item "1b level of consciousness questions", the score "1" and their relation "('1b level of consciousness questions', '1', 'has value')" from the sentence "1b level of consciousness questions: said name = 1", while the rule-based method could not. 
CONCLUSIONS The two-step pipeline method we propose is an effective approach to identify NIHSS items, scores and their relations. With its help, clinical investigators can easily retrieve and access structured scale data, thereby supporting stroke-related real-world studies.
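The example sentence quoted in the results shows why fixed patterns can be brittle on free text: free-form words sit between the scale item and its score. The sketch below contrasts two hypothetical regular expressions and emits the same (item, score, relation) triple format described in the abstract. These patterns are assumptions for illustration only; they are not the rule-based baseline evaluated in the paper.

```python
import re

# Hypothetical strict rule: expects "<item>: <score>" with nothing in between.
STRICT = re.compile(r"^(?P<item>[^:=]+):\s*(?P<score>\d+)\s*$")
# Hypothetical looser rule: allows free text before a trailing "= <score>".
LOOSE = re.compile(r"^(?P<item>[^:=]+):.*?=\s*(?P<score>\d+)\s*$")


def extract(pattern, sentence):
    """Return an (item, score, relation) triple, or None if the rule misses."""
    match = pattern.match(sentence.strip())
    if match is None:
        return None
    return (match.group("item").strip(), match.group("score"), "has value")


sentence = "1b level of consciousness questions: said name = 1"
print(extract(STRICT, sentence))  # the strict rule misses this sentence
print(extract(LOOSE, sentence))   # the looser rule recovers the triple
```

Every new phrasing needs another hand-written pattern of this kind, which is one reason a learned sequence tagger followed by a relation classifier, as in the two-step pipeline described above, can generalize better than a fixed rule set.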
Affiliation(s)
- Lin Yang
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing 100020, China
| | - Xiaoshuo Huang
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; School of Health Care Technology, Dalian Neusoft University of Information, Dalian 116023, China
| | - Jiayang Wang
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China
| | - Xin Yang
- China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; National Center for Healthcare Quality Management in Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
| | - Lingling Ding
- China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
| | - Zixiao Li
- China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; National Center for Healthcare Quality Management in Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
| | - Jiao Li
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing 100020, China.
| |
|
46
|
Chen A, Yu Z, Yang X, Guo Y, Bian J, Wu Y. Contextualized medication information extraction using Transformer-based deep learning architectures. J Biomed Inform 2023; 142:104370. [PMID: 37100106 PMCID: PMC10980542 DOI: 10.1016/j.jbi.2023.104370] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/06/2023] [Revised: 03/14/2023] [Accepted: 04/19/2023] [Indexed: 04/28/2023]
Abstract
OBJECTIVE To develop a natural language processing (NLP) system to extract medications and contextual information that help understand drug changes. This project is part of the 2022 n2c2 challenge. MATERIALS AND METHODS We developed NLP systems for medication mention extraction, event classification (indicating medication changes discussed or not), and context classification to classify medication changes context into 5 orthogonal dimensions related to drug changes. We explored 6 state-of-the-art pretrained transformer models for the three subtasks, including GatorTron, a large language model pretrained using > 90 billion words of text (including > 80 billion words from > 290 million clinical notes identified at the University of Florida Health). We evaluated our NLP systems using annotated data and evaluation scripts provided by the 2022 n2c2 organizers. RESULTS Our GatorTron models achieved the best F1-scores of 0.9828 for medication extraction (ranked 3rd), 0.9379 for event classification (ranked 2nd), and the best micro-average accuracy of 0.9126 for context classification. GatorTron outperformed existing transformer models pretrained using smaller general English text and clinical text corpora, indicating the advantage of large language models. CONCLUSION This study demonstrated the advantage of using large transformer models for contextual medication information extraction from clinical narratives.
Affiliation(s)
- Aokun Chen
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
| | - Zehao Yu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Xi Yang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Yi Guo
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
| | - Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.
| |
|
47
|
Murali L, Gopakumar G, Viswanathan DM, Nedungadi P. Towards electronic health record-based medical knowledge graph construction, completion, and applications: A literature study. J Biomed Inform 2023:104403. [PMID: 37230406 DOI: 10.1016/j.jbi.2023.104403] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/13/2023] [Revised: 05/16/2023] [Accepted: 05/19/2023] [Indexed: 05/27/2023]
Abstract
With the growth of data and intelligent technologies, the healthcare sector has opened up numerous technology-enabled services for patients, clinicians, and researchers. One major hurdle in achieving state-of-the-art results in health informatics is domain-specific terminology and its semantic complexity. A knowledge graph built from medical concepts, events, and relationships acts as a medical semantic network from which new links and hidden patterns in health data sources can be extracted. Current studies on medical knowledge graph construction are limited to generic techniques and opportunities and focus less on exploiting real-world data sources. A knowledge graph constructed from Electronic Health Records (EHR) data draws real-world data from healthcare records, which supports better results in subsequent tasks such as knowledge extraction and inference, knowledge graph completion, and medical knowledge graph applications such as diagnosis prediction, clinical recommendation, and clinical decision support. This review critically analyses existing works on medical knowledge graphs that used EHR data as the data source at the (i) representation level, (ii) extraction level, and (iii) completion level. In this investigation, we found that EHR-based knowledge graph construction involves challenges such as the high complexity and dimensionality of the data, a lack of knowledge fusion, and the dynamic updating of the knowledge graph. In addition, the study presents possible ways to tackle the challenges identified. We conclude that future research should focus on knowledge graph integration and knowledge graph completion challenges.
Affiliation(s)
- Lino Murali
- Center for Research in Analytics and Technologies for Education (CREATE), Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, 690525, Kerala, India; Division of Information technology, School of Engineering, Cochin University of Science and Technology, Kochi, 682022, Kerala, India
| | - G Gopakumar
- Department of Computer Science and Engineering, School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, 690525, Kerala, India
| | - Daleesha M Viswanathan
- Division of Information technology, School of Engineering, Cochin University of Science and Technology, Kochi, 682022, Kerala, India
| | - Prema Nedungadi
- Center for Research in Analytics and Technologies for Education (CREATE), Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, 690525, Kerala, India; Department of Computer Science and Engineering, School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, 690525, Kerala, India.
| |
|
48
|
Socrates V, Gilson A, Lopez K, Chi L, Taylor RA, Chartash D. Predicting relations between SOAP note sections: The value of incorporating a clinical information model. J Biomed Inform 2023; 141:104360. [PMID: 37061014 PMCID: PMC10197152 DOI: 10.1016/j.jbi.2023.104360] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/06/2023] [Revised: 03/27/2023] [Accepted: 04/05/2023] [Indexed: 04/17/2023]
Abstract
Physician progress notes are frequently organized into Subjective, Objective, Assessment, and Plan (SOAP) sections. The Assessment section synthesizes information recorded in the Subjective and Objective sections, and the Plan section documents tests and treatments to narrow the differential diagnosis and manage symptoms. Classifying the relationship between the Assessment and Plan sections has been suggested to provide valuable insight into clinical reasoning. In this work, we use a novel human-in-the-loop pipeline to classify the relationships between the Assessment and Plan sections of SOAP notes as a part of the n2c2 2022 Track 3 Challenge. In particular, we use a clinical information model constructed from both the entailment logic expected from the aforementioned Challenge and the problem-oriented medical record. This information model is used to label named entities as primary and secondary problems/symptoms, events and complications in all four SOAP sections. We iteratively train separate Named Entity Recognition models and use them to annotate entities in all notes/sections. We fine-tune a downstream RoBERTa-large model to classify the Assessment-Plan relationship. We evaluate multiple language model architectures, preprocessing parameters, and methods of knowledge integration, achieving a maximum macro-F1 score of 82.31%. Our initial model achieves top-2 performance during the challenge (macro-F1: 81.52%, competitors' macro-F1 range: 74.54%-82.12%). We improved our model by incorporating post-challenge annotations (S&O sections), outperforming the top model from the Challenge. We also used Shapley additive explanations to investigate the extent of language model clinical logic, under the lens of our clinical information model. We find that the model often uses shallow heuristics and nonspecific attention when making predictions, suggesting language model knowledge integration requires further research.
Affiliation(s)
- Vimig Socrates
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, 300 George St, 06511, New Haven, USA; Department of Emergency Medicine, Yale University School of Medicine, 464 Congress Ave #260, New Haven, 06519, USA; Program of Computational Biology and Bioinformatics, Yale University, 300 George St, New Haven, 06511, USA.
| | - Aidan Gilson
- Department of Emergency Medicine, Yale University School of Medicine, 464 Congress Ave #260, New Haven, 06519, USA.
| | - Kevin Lopez
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, 300 George St, 06511, New Haven, USA; Department of Emergency Medicine, Yale University School of Medicine, 464 Congress Ave #260, New Haven, 06519, USA.
| | - Ling Chi
- Department of Emergency Medicine, Yale University School of Medicine, 464 Congress Ave #260, New Haven, 06519, USA.
| | - Richard Andrew Taylor
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, 300 George St, 06511, New Haven, USA; Department of Emergency Medicine, Yale University School of Medicine, 464 Congress Ave #260, New Haven, 06519, USA.
| | - David Chartash
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, 300 George St, 06511, New Haven, USA; School of Medicine, University College Dublin - National University of Ireland, Dublin, Health Sciences Centre, Belfield, Dublin 4, Ireland.
| |
|
49
|
Kaas-Hansen BS, Gentile S, Caioli A, Andersen SE. Exploratory pharmacovigilance with machine learning in big patient data: A focused scoping review. Basic Clin Pharmacol Toxicol 2023; 132:233-241. [PMID: 36541054 DOI: 10.1111/bcpt.13828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2022] [Revised: 12/15/2022] [Accepted: 12/15/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Machine learning can operationalize the rich and complex data in electronic patient records for exploratory pharmacovigilance endeavours. OBJECTIVE The objective of this review is to identify applications of machine learning and big patient data in exploratory pharmacovigilance. METHODS We searched PubMed and Embase and included original articles with an exploratory pharmacovigilance purpose, focusing on medicinal interventions and reporting the use of machine learning in electronic patient records with ≥1000 patients collected after market entry. FINDINGS Of 2557 studies screened, seven were included. These covered six countries and were published between 2015 and 2021. The most prominent machine learning methods were random forests, logistic regressions, and support vector machines. Two studies used artificial neural networks or naive Bayes classifiers. One study used formal concept analysis for association mining, and another used temporal difference learning. Five studies compared several methods against each other. The numbers of patients in most data sets were in the order of thousands; two studies used what can more reasonably be considered big data, with >1 000 000 patient records. CONCLUSION Despite years of great aspirations for combining machine learning and clinical data for exploratory pharmacovigilance, only a few studies so far seem to deliver, even partly, on these expectations.
Affiliation(s)
- Benjamin Skov Kaas-Hansen
- Department of Intensive Care, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark; Section for Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
| | - Simona Gentile
- Department of Radiology, Zealand University Hospital, Roskilde, Denmark
| | - Alessandro Caioli
- Department of Infectious Diseases - Hepatology, National Institute of Infectious Diseases Lazzaro Spallanzani, Rome, Italy
| | - Stig Ejdrup Andersen
- Clinical Pharmacology Unit, Zealand University Hospital Roskilde, Roskilde, Denmark
| |
|
50
|
Frei J, Kramer F. German Medical Named Entity Recognition Model and Data Set Creation Using Machine Translation and Word Alignment: Algorithm Development and Validation. JMIR Form Res 2023; 7:e39077. [PMID: 36853741 PMCID: PMC10015355 DOI: 10.2196/39077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/27/2022] [Revised: 09/11/2022] [Accepted: 11/03/2022] [Indexed: 11/06/2022] Open
Abstract
BACKGROUND Data mining in the field of medical data analysis often needs to rely solely on the processing of unstructured data to retrieve relevant data. For German natural language processing, few open medical neural named entity recognition (NER) models have been published before this work. A major issue can be attributed to the lack of German training data. OBJECTIVE We developed a synthetic data set and a novel German medical NER model for public access to demonstrate the feasibility of our approach. In order to bypass legal restrictions due to potential data leaks through model analysis, we did not make use of internal, proprietary data sets, which is a frequent veto factor for data set publication. METHODS The underlying German data set was retrieved by translation and word alignment of a public English data set. The data set served as a foundation for model training and evaluation. For demonstration purposes, our NER model follows a simple network architecture that is designed for low computational requirements. RESULTS The obtained data set consisted of 8599 sentences including 30,233 annotations. The model achieved a class frequency-averaged F1 score of 0.82 on the test set after training across 7 different NER types. Artifacts in the synthesized data set with regard to translation and alignment induced by the proposed method were exposed. The annotation performance was evaluated on an external data set and measured in comparison with an existing baseline model that has been trained on a dedicated German data set in a traditional fashion. We discussed the drop in annotation performance on an external data set for our simple NER model. Our model is publicly available. CONCLUSIONS We demonstrated the feasibility of obtaining a data set and training a German medical NER model by the exclusive use of public training data through our suggested method. 
The discussion on the limitations of our approach includes ways to further mitigate remaining problems in future work.
Affiliation(s)
- Johann Frei
- IT Infrastructure for Translational Medical Research, University of Augsburg, Augsburg, Germany
| | - Frank Kramer
- IT Infrastructure for Translational Medical Research, University of Augsburg, Augsburg, Germany
| |
|