1
Durango MC, Torres-Silva EA, Orozco-Duque A. Named Entity Recognition in Electronic Health Records: A Methodological Review. Healthc Inform Res 2023; 29:286-300. [PMID: 37964451 PMCID: PMC10651400 DOI: 10.4258/hir.2023.29.4.286] [Received: 04/21/2023] [Revised: 07/29/2023] [Accepted: 09/03/2023] [Indexed: 11/16/2023]
Abstract
OBJECTIVES: A substantial portion of the data contained in electronic health records (EHRs) is unstructured, often appearing as free text. This format restricts its potential utility in clinical decision-making. Named entity recognition (NER) methods address the challenge of extracting pertinent information from unstructured text. The aim of this study was to outline the current NER methods and trace their evolution from 2011 to 2022.
METHODS: We conducted a methodological literature review of NER methods, with a focus on distinguishing the classification models, the types of tagging systems, and the languages employed in various corpora.
RESULTS: Several methods have been documented for automatically extracting relevant information from EHRs using natural language processing techniques such as NER and relation extraction (RE). These methods can automatically extract concepts, events, attributes, and other data, as well as the relationships between them. Most NER studies conducted thus far have utilized corpora in English or Chinese. Additionally, the Bidirectional Encoder Representations from Transformers (BERT) architecture, combined with the BIO tagging system, is the most frequently reported classification scheme. We found only a limited number of papers on the implementation of NER or RE tasks in EHRs within a specific clinical domain.
CONCLUSIONS: EHRs play a pivotal role in gathering clinical information and could serve as the primary source for automated clinical decision support systems. However, the creation of new corpora from EHRs in specific clinical domains is essential to facilitate the swift development of NER and RE models applied to EHRs for use in clinical practice.
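The BIO tagging system named in the abstract above labels each token as the Beginning of an entity, Inside an entity, or Outside any entity. The following is a minimal sketch of that encoding and its inverse, not code from the reviewed paper. The sentence, the entity labels (`PROBLEM`, `DRUG`), and the function names are illustrative assumptions.

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, label). Labels are hypothetical.
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping entity spans as one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into spans; stray I- tags are ignored."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Toy example (invented sentence, not taken from any corpus in the review).
tokens = ["Patient", "denies", "chest", "pain", "on", "metformin"]
gold_spans = [(2, 4, "PROBLEM"), (5, 6, "DRUG")]
bio_tags = spans_to_bio(len(tokens), gold_spans)
for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

A token classifier such as a fine-tuned BERT model would predict one of these tags per token; the decoding step then recovers entity spans from the predicted tag sequence.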
Affiliation(s)
- María C. Durango
  - Grupo de Investigación e Innovación Biomédica, Instituto Tecnológico Metropolitano, Antioquia, Colombia
- Ever A. Torres-Silva
  - Grupo de Investigación e Innovación Biomédica, Instituto Tecnológico Metropolitano, Antioquia, Colombia
- Andrés Orozco-Duque
  - Grupo de Investigación e Innovación Biomédica, Instituto Tecnológico Metropolitano, Antioquia, Colombia
  - Facultad de Ingenierías, Universidad de Medellín, Antioquia, Colombia
2
Jantscher M, Gunzer F, Kern R, Hassler E, Tschauner S, Reishofer G. Information extraction from German radiological reports for general clinical text and language understanding. Sci Rep 2023; 13:2353. [PMID: 36759679 PMCID: PMC9911592 DOI: 10.1038/s41598-023-29323-3] [Received: 11/05/2022] [Accepted: 02/02/2023] [Indexed: 02/11/2023]
Abstract
Recent advances in deep learning and natural language processing (NLP) have opened many new opportunities for automatic text understanding and text processing in the medical field. This is of great benefit, as many clinical downstream tasks rely on information from unstructured clinical documents. However, for low-resource languages like German, the use of modern text processing applications that require a large amount of training data proves to be difficult, as only a few data sets are available, mainly due to legal restrictions. In this study, we present an information extraction framework that was initially pre-trained on real-world computed tomography (CT) reports of head examinations, followed by domain-adaptive fine-tuning on reports from different imaging examinations. We show that in the pre-training phase, the semantic and contextual meaning of one clinical reporting domain can be captured and effectively transferred to foreign clinical imaging examinations. Moreover, we introduce an active learning approach with an intrinsic strategic sampling method to generate highly informative training data with low human annotation cost. We see that the model performance can be significantly improved by an appropriate selection of the data to be annotated, without the need to train the model on a specific downstream task. With a general annotation scheme that can be used not only in the radiology field but also in a broader clinical setting, we contribute to a more consistent labeling and annotation process that also facilitates the verification and evaluation of language models in the German clinical setting.
Affiliation(s)
- Felix Gunzer
  - Division of Neuroradiology, Vascular and Interventional Radiology, Department of Radiology, Medical University Graz, 8036, Graz, Austria
- Eva Hassler
  - Division of Neuroradiology, Vascular and Interventional Radiology, Department of Radiology, Medical University Graz, 8036, Graz, Austria
- Sebastian Tschauner
  - Division of Pediatric Radiology, Department of Radiology, Medical University Graz, 8036, Graz, Austria
- Gernot Reishofer
  - Department of Radiology, Medical University Graz, 8036, Graz, Austria
  - BioTechMed-Graz, 8010, Graz, Austria
3
Wan C, Feng W, Ma R, Ma H, Wang J, Huang R, Zhang X, Jing M, Yang H, Yu H, Liu Y. Association between depressive symptoms and diagnosis of diabetes and its complications: A network analysis in electronic health records. Front Psychiatry 2022; 13:966758. [PMID: 36213916 PMCID: PMC9543719 DOI: 10.3389/fpsyt.2022.966758] [Received: 06/11/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022]
Abstract
OBJECTIVES: Diabetes and its complications are commonly associated with depressive symptoms, but few studies have investigated the effect of diagnosis on depressive symptoms in patients with diabetes. The present study used a network-based approach to explore the association between depressive symptoms, which are annotated from electronic health record (EHR) notes by a deep learning model, and the diagnosis of type 2 diabetes mellitus (T2DM) and its complications.
METHODS: In this study, we used anonymous admission notes of 52,139 inpatients diagnosed with T2DM at the First Affiliated Hospital of Nanjing Medical University from 2008 to 2016 as input for a transformer-based symptom annotation model named T5-depression, which annotates depressive symptoms from present illness. We measured the performance of the model using the F1 score and the area under the receiver operating characteristic curve (AUROC). We constructed networks of depressive symptoms to examine the connectivity of these networks in patients diagnosed with T2DM, including those with certain complications.
RESULTS: The T5-depression model achieved the best performance compared with the benchmark models, with an F1 score of 91.71 and an AUROC of 96.25. The connectivity of depressive symptoms in patients diagnosed with T2DM (p = 0.025) and hypertension (p = 0.013) showed a statistically significant increase 2 years after the diagnosis, which is consistent with the number of patients diagnosed with depression.
CONCLUSION: The T5-depression model proposed in this study can effectively annotate depressive symptoms in EHR notes. The connectivity of annotated depressive symptoms is associated with the diagnosis of T2DM and hypertension. The changes in the network of depressive symptoms generated by the T5-depression model could be used as an indicator for screening depression.
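The two evaluation metrics named in the abstract above, the F1 score and the AUROC, can be illustrated with a short sketch. This is not the authors' evaluation code. The counts and scores below are invented toy values, and the function names are assumptions. Both functions return values on a 0 to 1 scale, whereas the abstract appears to report them multiplied by 100.

```python
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is equivalent to the area
    under the ROC curve, but is O(n*m) and meant only for small examples.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy values (invented for illustration only).
toy_f1 = f1_score(tp=8, fp=2, fn=2)
toy_auc = auroc(pos_scores=[0.9, 0.8, 0.4], neg_scores=[0.7, 0.3, 0.2])
print(f"F1 = {toy_f1:.4f}, AUROC = {toy_auc:.4f}")
```

For real evaluations, a tested library implementation is usually preferable to hand-written metrics.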
Affiliation(s)
- Cheng Wan
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Wei Feng
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Renyi Ma
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Hui Ma
  - Department of Medical Psychology, Nanjing Brain Hospital, Nanjing Medical University, Nanjing, China
- Junjie Wang
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Ruochen Huang
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Xin Zhang
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
  - Department of Information, The First Affiliated Hospital, Nanjing Medical University, Nanjing, China
- Mang Jing
  - Department of Information, The First Affiliated Hospital, Nanjing Medical University, Nanjing, China
- Hao Yang
  - Department of Medical Psychology, Nanjing Brain Hospital, Nanjing Medical University, Nanjing, China
- Haoran Yu
  - Department of Medical Psychology, Nanjing Brain Hospital, Nanjing Medical University, Nanjing, China
- Yun Liu
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
  - Department of Information, The First Affiliated Hospital, Nanjing Medical University, Nanjing, China