Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Bose P, Srinivasan S, Sleeman WC, Palta J, Kapoor R, Ghosh P. A Survey on Recent Named Entity Recognition and Relationship Extraction Techniques on Clinical Texts. Applied Sciences 2021;11:8319. [DOI: 10.3390/app11188319] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/31/2022]

Cited by Other Article(s)

Fonferko-Shadrach B, Strafford H, Jones C, Khan RA, Brown S, Edwards J, Hawken J, Shrimpton LE, White CP, Powell R, Sawhney IMS, Pickrell WO, Lacey AS. Annotation of epilepsy clinic letters for natural language processing. J Biomed Semantics 2024;15:17. [PMID: 39277770 PMCID: PMC11402197 DOI: 10.1186/s13326-024-00316-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 07/22/2024] [Indexed: 09/17/2024]

Abstract

BACKGROUND

Natural language processing (NLP) is increasingly being used to extract structured information from unstructured text to assist clinical decision-making and aid healthcare research. The availability of expert-annotated documents for the development and validation of NLP applications is limited. We created synthetic clinical documents to address this, and to validate the Extraction of Epilepsy Clinical Text version 2 (ExECTv2) NLP pipeline.

METHODS

We created 200 synthetic clinic letters based on hospital outpatient consultations with epilepsy specialists. The letters were double annotated by trained clinicians and researchers according to agreed guidelines. We used the annotation tool, Markup, with an epilepsy concept list based on the Unified Medical Language System ontology. All annotations were reviewed, and a gold standard set of annotations was agreed and used to validate the performance of ExECTv2.

RESULTS

The overall inter-annotator agreement (IAA) between the two sets of annotations produced a per item F1 score of 0.73. Validating ExECTv2 using the gold standard gave an overall F1 score of 0.87 per item, and 0.90 per letter.

CONCLUSION

The synthetic letters, annotations, and annotation guidelines have been made freely available. To our knowledge, this is the first publicly available set of annotated epilepsy clinic letters and guidelines that can be used by NLP researchers with minimal epilepsy knowledge. The IAA results show that clinical text annotation tasks are difficult and require a gold standard to be agreed by researcher consensus. ExECTv2, our automated epilepsy NLP pipeline, extracted detailed epilepsy information from unstructured epilepsy letters with more accuracy than human annotators, further confirming the utility of NLP for clinical and research applications.
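
As an illustration of the per-item F1 used above to report inter-annotator agreement, the following is a minimal sketch, assuming each annotation can be reduced to a hashable tuple and that only exact matches count as agreement. The function name `pairwise_f1`, the tuple layout, and the example labels are hypothetical and are not taken from the ExECTv2 guidelines or the Markup tool.

```python
def pairwise_f1(annotations_a, annotations_b):
    """Per-item F1 between two annotation sets, treating set A as the reference.

    Assumption: an item counts as agreed only if the whole tuple matches exactly.
    """
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        return 1.0  # two empty sets agree trivially
    matched = len(a & b)
    precision = matched / len(b) if b else 0.0
    recall = matched / len(a) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (letter id, concept type, annotated text)
annotator_1 = {
    ("letter1", "Diagnosis", "focal epilepsy"),
    ("letter1", "Prescription", "lamotrigine"),
    ("letter1", "SeizureFrequency", "2 per month"),
}
annotator_2 = {
    ("letter1", "Diagnosis", "focal epilepsy"),
    ("letter1", "Prescription", "lamotrigine"),
    ("letter1", "Investigation", "EEG"),
}

iaa_f1 = pairwise_f1(annotator_1, annotator_2)
print(f"per-item F1: {iaa_f1:.2f}")
```

Because F1 is symmetric in precision and recall, swapping which annotator is treated as the reference gives the same score, which is one reason it is commonly used for agreement between two annotators.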

Li D, Yang Y, Cui J, Meng X, Qu J, Jiang Z, Zhao Y. Joint extraction of Chinese medical entities and relations based on RoBERTa and single-module global pointer. BMC Med Inform Decis Mak 2024;24:218. [PMID: 39085892 PMCID: PMC11293210 DOI: 10.1186/s12911-024-02577-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Accepted: 06/13/2024] [Indexed: 08/02/2024]

Abstract

BACKGROUND

Most Chinese joint entity and relation extraction tasks in medicine involve numerous nested entities, overlapping relations, and other challenging extraction issues. In response, some traditional methods decompose the joint extraction task into multiple steps or modules, which introduces local dependencies.

METHODS

To alleviate this issue, we propose a joint extraction model of Chinese medical entities and relations based on RoBERTa and single-module global pointer, namely RSGP, which formulates joint extraction as a global pointer linking problem. Considering the uniqueness of Chinese language structure, we introduce the RoBERTa-wwm pre-trained language model at the encoding layer to obtain a better embedding representation. Then, we represent the input sentence as a third-order tensor and score each position in the tensor to prepare for the subsequent process of decoding the triples. In the end, we design a novel single-module global pointer decoding approach to alleviate the generation of redundant information. Specifically, we analyze the decoding process of single character entities individually, improving the time and space performance of RSGP to some extent.

RESULTS

In order to verify the effectiveness of our model in extracting Chinese medical entities and relations, we carry out the experiments on the public dataset, CMeIE. Experimental results show that RSGP performs significantly better on the joint extraction of Chinese medical entities and relations, and achieves state-of-the-art results compared with baseline models.

CONCLUSION

The proposed RSGP can effectively extract entities and relations from Chinese medical texts and help to structure those texts, providing high-quality data support for the construction of Chinese medical knowledge graphs.

Mojibian A, Jaskolka J, Ching G, Lee B, Myers R, Devine C, Nicolaou S, Parker W. The Efficacy of a Named Entity Recognition AI Model for Identifying Incidental Pulmonary Nodules in CT Reports. Can Assoc Radiol J 2024:8465371241266785. [PMID: 39066637 DOI: 10.1177/08465371241266785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]

Kuo NIH, Perez-Concha O, Hanly M, Mnatzaganian E, Hao B, Di Sipio M, Yu G, Vanjara J, Valerie IC, de Oliveira Costa J, Churches T, Lujic S, Hegarty J, Jorm L, Barbieri S. Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project. JMIR MEDICAL EDUCATION 2024;10:e51388. [PMID: 38227356 PMCID: PMC10828942 DOI: 10.2196/51388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 10/20/2023] [Accepted: 11/08/2023] [Indexed: 01/17/2024]

Wibaek R, Andersen GS, Dahm CC, Witte DR, Hulman A. Large Language Models for Epidemiological Research via Automated Machine Learning: Case Study Using Data From the British National Child Development Study. JMIR Med Inform 2023;11:e43638. [PMID: 37787655 PMCID: PMC10547934 DOI: 10.2196/43638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 06/29/2023] [Accepted: 07/22/2023] [Indexed: 10/04/2023]

Abstract

Background

Large language models have had a huge impact on natural language processing (NLP) in recent years. However, their application in epidemiological research is still limited to the analysis of electronic health records and social media data.

Objectives

To demonstrate the potential of NLP beyond these domains, we aimed to develop prediction models based on texts collected from an epidemiological cohort and compare their performance to classical regression methods.

Methods

We used data from the British National Child Development Study, where 10,567 children aged 11 years wrote essays about how they imagined themselves as 25-year-olds. Overall, 15% of the data set was set aside as a test set for performance evaluation. Pretrained language models were fine-tuned using AutoTrain (Hugging Face) to predict current reading comprehension score (range: 0-35) and future BMI and physical activity (active vs inactive) at the age of 33 years. We then compared their predictive performance (accuracy or discrimination) with linear and logistic regression models, including demographic and lifestyle factors of the parents and children from birth to the age of 11 years as predictors.

Results

NLP clearly outperformed linear regression when predicting reading comprehension scores (root mean square error: 3.89, 95% CI 3.74-4.05 for NLP vs 4.14, 95% CI 3.98-4.30 and 5.41, 95% CI 5.23-5.58 for regression models with and without general ability score as a predictor, respectively). Predictive performance for physical activity was similarly poor for the 2 methods (area under the receiver operating characteristic curve: 0.55, 95% CI 0.52-0.60 for both) but was slightly better than random assignment, whereas linear regression clearly outperformed the NLP approach when predicting BMI (root mean square error: 4.38, 95% CI 4.02-4.74 for NLP vs 3.85, 95% CI 3.54-4.16 for regression). The NLP approach did not perform better than simply assigning the mean BMI from the training set as a predictor.

Conclusions

Our study demonstrated the potential of using large language models on text collected from epidemiological studies. The performance of the approach appeared to depend on how directly the topic of the text was related to the outcome. Open-ended questions specifically designed to capture certain health concepts and lived experiences in combination with NLP methods should receive more attention in future epidemiological studies.

Peng C, Yang X, Yu Z, Bian J, Hogan WR, Wu Y. Clinical concept and relation extraction using prompt-based machine reading comprehension. J Am Med Inform Assoc 2023;30:1486-1493. [PMID: 37316988 PMCID: PMC10436141 DOI: 10.1093/jamia/ocad107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 05/08/2023] [Accepted: 06/05/2023] [Indexed: 06/16/2023]

Abstract

OBJECTIVE

To develop a natural language processing system that solves both clinical concept extraction and relation extraction in a unified prompt-based machine reading comprehension (MRC) architecture with good generalizability for cross-institution applications.

METHODS

We formulate both clinical concept extraction and relation extraction using a unified prompt-based MRC architecture and explore state-of-the-art transformer models. We compare our MRC models with existing deep learning models for concept extraction and end-to-end relation extraction using 2 benchmark datasets developed by the 2018 National NLP Clinical Challenges (n2c2) challenge (medications and adverse drug events) and the 2022 n2c2 challenge (relations of social determinants of health [SDoH]). We also evaluate the transfer learning ability of the proposed MRC models in a cross-institution setting. We perform error analyses and examine how different prompting strategies affect the performance of MRC models.

RESULTS AND CONCLUSION

The proposed MRC models achieve state-of-the-art performance for clinical concept and relation extraction on the 2 benchmark datasets, outperforming previous non-MRC transformer models. GatorTron-MRC achieves the best strict and lenient F1-scores for concept extraction, outperforming previous deep learning models on the 2 datasets by 1%-3% and 0.7%-1.3%, respectively. For end-to-end relation extraction, GatorTron-MRC and BERT-MIMIC-MRC achieve the best F1-scores, outperforming previous deep learning models by 0.9%-2.4% and 10%-11%, respectively. For cross-institution evaluation, GatorTron-MRC outperforms traditional GatorTron by 6.4% and 16% for the 2 datasets, respectively. The proposed method is better at handling nested/overlapped concepts, extracting relations, and has good portability for cross-institute applications. Our clinical MRC package is publicly available at https://github.com/uf-hobi-informatics-lab/ClinicalTransformerMRC.

Ullah Miah MS, Sulaiman J, Sarwar TB, Islam SS, Rahman M, Haque MS. Medical Named Entity Recognition (MedNER): A Deep Learning Model for Recognizing Medical Entities (Drug, Disease) from Scientific Texts. IEEE EUROCON 2023 - 20TH INTERNATIONAL CONFERENCE ON SMART TECHNOLOGIES 2023. [DOI: 10.1109/eurocon56442.2023.10199075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]

Schäfer H, Idrissi-Yaghir A, Bewersdorff J, Frihat S, Friedrich CM, Zesch T. Medication event extraction in clinical notes: Contribution of the WisPerMed team to the n2c2 2022 challenge. J Biomed Inform 2023;143:104400. [PMID: 37211196 DOI: 10.1016/j.jbi.2023.104400] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/27/2023] [Revised: 04/21/2023] [Accepted: 05/15/2023] [Indexed: 05/23/2023]

Yang L, Huang X, Wang J, Yang X, Ding L, Li Z, Li J. Identifying stroke-related quantified evidence from electronic health records in real-world studies. Artif Intell Med 2023;140:102552. [PMID: 37210153 DOI: 10.1016/j.artmed.2023.102552] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/18/2021] [Revised: 02/28/2023] [Accepted: 04/11/2023] [Indexed: 05/22/2023]

Abstract

BACKGROUND

Stroke is one of the leading causes of death and disability worldwide. The National Institutes of Health Stroke Scale (NIHSS) scores in electronic health records (EHRs), which quantitatively describe patients' neurological deficits in evidence-based treatment, are crucial in stroke-related clinical investigations. However, the free-text format and lack of standardization inhibit their effective use. Automatically extracting the scale scores from the clinical free text so that its potential value in real-world studies is realized has become an important goal.

OBJECTIVE

This study aims to develop an automated method to extract scale scores from the free text of EHRs.

METHODS

We propose a two-step pipeline method to identify NIHSS items and numerical scores and validate its feasibility using a freely accessible critical care database: MIMIC-III (Medical Information Mart for Intensive Care III). First, we utilize MIMIC-III to create an annotated corpus. Then, we investigate possible machine learning methods for two subtasks, NIHSS item and score recognition and item-score relation extraction. In the evaluation, we conduct both task-specific and end-to-end evaluations and compare our method with the rule-based method using precision, recall and F1 scores as evaluation metrics.

RESULTS

We use all available discharge summaries of stroke cases in MIMIC-III. The annotated NIHSS corpus contains 312 cases, 2929 scale items, 2774 scores and 2733 relations. The results show that the best F1-score of our method was 0.9006, which was attained by combining BERT-BiLSTM-CRF and Random Forest, and it outperformed the rule-based method (F1-score = 0.8098). In the end-to-end task, our method could successfully recognize the item "1b level of consciousness questions", the score "1" and their relation "('1b level of consciousness questions', '1', 'has value')" from the sentence "1b level of consciousness questions: said name = 1", while the rule-based method could not.

CONCLUSIONS

The two-step pipeline method we propose is an effective approach to identify NIHSS items, scores and their relations. With its help, clinical investigators can easily retrieve and access structured scale data, thereby supporting stroke-related real-world studies.

Affiliation(s)

Lin Yang Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing 100020, China
Xiaoshuo Huang Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; School of Health Care Technology, Dalian Neusoft University of Information, Dalian 116023, China
Jiayang Wang Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China
Xin Yang China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; National Center for Healthcare Quality Management in Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
Lingling Ding China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
Zixiao Li China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; National Center for Healthcare Quality Management in Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
Jiao Li Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing 100020, China.

Shi B, Fan R, Zhang L, Huang J, Xiong N, Vasilakos A, Wan J, Zhang L. A Joint Extraction System Based on Conditional Layer Normalization for Health Monitoring. SENSORS (BASEL, SWITZERLAND) 2023;23:4812. [PMID: 37430725 DOI: 10.3390/s23104812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 05/10/2023] [Accepted: 05/11/2023] [Indexed: 07/12/2023]

Vasilakes J, Georgiadis P, Nguyen NT, Miwa M, Ananiadou S. Contextualized medication event extraction with levitated markers. J Biomed Inform 2023;141:104347. [PMID: 37030658 DOI: 10.1016/j.jbi.2023.104347] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/06/2023] [Accepted: 03/23/2023] [Indexed: 04/09/2023]

Kaplar A, Stošović M, Kaplar A, Brković V, Naumović R, Kovačević A. Evaluation of clinical named entity recognition methods for Serbian electronic health records. Int J Med Inform 2022;164:104805. [PMID: 35653828 DOI: 10.1016/j.ijmedinf.2022.104805] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/22/2022] [Revised: 05/06/2022] [Accepted: 05/22/2022] [Indexed: 11/25/2022]

Abstract

BACKGROUND AND OBJECTIVES

The importance of clinical natural language processing (NLP) has increased with the adoption of electronic health records (EHRs). One of the critical tasks in clinical NLP is named entity recognition (NER). Clinical NER in the Serbian language is a severely under-researched area. The few approaches that have been proposed so far are based on rules or machine-learning models with hand-crafted features, while current state-of-the-art models have not been explored. The objective of this paper is to assess the performance of state-of-the-art NER methods on clinical narratives in the Serbian language.

MATERIALS AND METHODS

We designed an experimental setup for a comprehensive evaluation of state-of-the-art NER models. The gold standard corpus we used for the evaluation comprises discharge summaries from the Clinic for Nephrology at the University Clinical Center of Serbia. The following models were evaluated: conditional random fields (CRF), multilingual transformers (BERT Multilingual and XLM RoBERTa), long short-term memory (LSTM) recurrent neural networks, and their ensembles. In addition, we investigated the necessity of the pretraining task for transformer-based models and the use of pretrained word embeddings with the LSTM model.

RESULTS

Our results show that, individually, CRF had the best precision, the pretrained BERT Multilingual model had the best recall, and the LSTM model had the best F1 score. The best performance was achieved by combining the existing models in a majority voting ensemble, with an F1 score of 0.892. The presented results are similar to the inter-annotator agreement on our gold standard corpus and are comparable to existing state-of-the-art results for clinical NER reported in the literature.

CONCLUSION

Existing state-of-the-art models can provide viable results for clinical named entity recognition when applied to languages with the complexity of the Serbian language without major modifications.
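
The majority voting ensemble mentioned in the results can be sketched as follows. This is only an illustration under stated assumptions: voting is done per token over BIO-style labels, a label must win a strict majority, and ties fall back to the outside label "O". The abstract does not specify any of these details, and the function name `majority_vote` and the example label sequences are hypothetical.

```python
from collections import Counter


def majority_vote(predictions, fallback="O"):
    """Combine per-token label sequences from several models by strict majority.

    predictions: list of label sequences, one per model, all of equal length.
    Assumption: when no label has more than half of the votes, use `fallback`.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all label sequences must have the same length")
    voted = []
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        voted.append(label if count > len(labels) / 2 else fallback)
    return voted


# Hypothetical token-level outputs from three models for one four-token sentence
crf_labels = ["B-DRUG", "O", "B-DISEASE", "O"]
bert_labels = ["B-DRUG", "O", "O", "B-DISEASE"]
lstm_labels = ["B-DRUG", "O", "B-DISEASE", "B-DISEASE"]

ensemble_labels = majority_vote([crf_labels, bert_labels, lstm_labels])
print(ensemble_labels)
```

Voting per token is simple but can produce inconsistent spans (for example an inside tag without a preceding begin tag), so a practical system would likely add a repair step or vote over whole entity spans instead.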

Current Approaches and Applications in Natural Language Processing. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12104859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Privacy-Preserving Mimic Models for clinical Named Entity Recognition in French. J Biomed Inform 2022;130:104073. [DOI: 10.1016/j.jbi.2022.104073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2021] [Revised: 02/09/2022] [Accepted: 04/07/2022] [Indexed: 11/18/2022]

Ebbehoj A, Thunbo MØ, Andersen OE, Glindtvad MV, Hulman A. Transfer learning for non-image data in clinical research: A scoping review. PLOS DIGITAL HEALTH 2022;1:e0000014. [PMID: 36812540 PMCID: PMC9931256 DOI: 10.1371/journal.pdig.0000014] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/05/2021] [Accepted: 12/15/2021] [Indexed: 01/14/2023]

Zhang X, Gao F, Zhou L, Jing S, Wang Z, Wang Y, Miao S, Zhang X, Guo J, Shan T, Liu Y. Fine-Grained Drug Interaction Extraction Based on Entity Pair Calibration and Pre-Training Model for Chinese Drug Instructions. INT J SEMANT WEB INF 2022. [DOI: 10.4018/ijswis.307908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]