1
Perez N, Cuadros M, Rigau G. Negation and speculation processing: A study on cue-scope labelling and assertion classification in Spanish clinical text. Artif Intell Med 2023; 145:102682. [PMID: 37925211 DOI: 10.1016/j.artmed.2023.102682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 08/25/2023] [Accepted: 10/06/2023] [Indexed: 11/06/2023]
Abstract
Natural Language Processing (NLP) based on recent deep learning technology is contributing to the emergence of powerful solutions that help healthcare providers and researchers discover valuable patterns within otherwise unmanageable volumes of health records and scientific literature. Fundamental to the success of such solutions is the processing of negation and speculation. The article addresses this problem with state-of-the-art deep learning approaches from two perspectives: cue and scope labelling, and assertion classification. Given how difficult it is to access annotated clinical data, the study (a) proposes a methodology to automatically convert cue-scope annotations to assertion annotations; and (b) includes a range of scenarios with varying amounts of training data and adversarial test examples. The results show a clear advantage for Transformer-based models, which surpass a series of baselines and the related work on NUBes, a public corpus of clinical Spanish text.
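As a rough illustration of the conversion idea mentioned in the abstract, the sketch below derives an assertion label for an entity from cue-scope annotations. The span representation, the label names, and the rule that an entity must lie fully inside a scope are assumptions made for this example; they are not the authors' published procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scope:
    start: int  # index of the first token covered by the scope (inclusive)
    end: int    # index one past the last covered token (exclusive)
    kind: str   # "negation" or "speculation" (hypothetical label names)


def assertion_label(entity_start: int, entity_end: int, scopes: List[Scope]) -> str:
    """Label the entity span [entity_start, entity_end) from the scopes that contain it."""
    kinds = {
        s.kind
        for s in scopes
        if s.start <= entity_start and entity_end <= s.end
    }
    # Assumed priority: negation wins over speculation when both scopes apply.
    if "negation" in kinds:
        return "negated"
    if "speculation" in kinds:
        return "speculated"
    return "affirmed"


# Tokens: ["No", "presenta", "fiebre", "ni", "tos", "."]
# The cue is token 0; its (assumed) scope covers tokens 1 to 4.
scopes = [Scope(start=1, end=5, kind="negation")]
fever_label = assertion_label(2, 3, scopes)   # "fiebre" lies inside the scope
period_label = assertion_label(5, 6, scopes)  # "." lies outside the scope
```

Requiring full containment is the strictest choice; a partial-overlap rule would be an equally plausible assumption and would label more entities as negated or speculated.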
Affiliation(s)
- Naiara Perez
  - SNLT group at Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Mikeletegi Pasealekua 57, Donostia/San Sebastián, 20009, Spain; HiTZ Basque Center for Language Technologies, University of the Basque Country (UPV-EHU), Manuel Lardizabal Ibilbidea 1, Donostia/San Sebastián, 20018, Spain
- Montse Cuadros
  - SNLT group at Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Mikeletegi Pasealekua 57, Donostia/San Sebastián, 20009, Spain
- German Rigau
  - HiTZ Basque Center for Language Technologies, University of the Basque Country (UPV-EHU), Manuel Lardizabal Ibilbidea 1, Donostia/San Sebastián, 20018, Spain
2
Argüello-González G, Aquino-Esperanza J, Salvador D, Bretón-Romero R, Del Río-Bermudez C, Tello J, Menke S. Negation recognition in clinical natural language processing using a combination of the NegEx algorithm and a convolutional neural network. BMC Med Inform Decis Mak 2023; 23:216. [PMID: 37833661 PMCID: PMC10576331 DOI: 10.1186/s12911-023-02301-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 09/18/2023] [Indexed: 10/15/2023] Open
Abstract
Background:
Important clinical information about patients is contained in unstructured free-text fields of Electronic Health Records (EHRs). While this information can be extracted using clinical Natural Language Processing (cNLP), the recognition of negation modifiers represents an important challenge. A wide range of cNLP applications have been developed to detect the negation of medical entities in clinical free text; however, effective solutions for languages other than English are scarce. This study aimed to develop a solution for negation recognition in Spanish EHRs based on a combination of a customized rule-based NegEx layer and a convolutional neural network (CNN).
Methods:
Based on our previous experience in real-world evidence (RWE) studies using information embedded in EHRs, negation recognition was simplified into a binary problem ('affirmative' vs. 'non-affirmative' class). For the NegEx layer, negation rules were obtained from a publicly available Spanish corpus and enriched with custom ones, while the CNN binary classifier was trained on EHRs annotated for clinical named entities (cNEs) and negation markers by medical doctors.
Results:
The proposed negation recognition pipeline obtained precision, recall, and F1-score of 0.93, 0.94, and 0.94 for the 'affirmative' class, and 0.86, 0.84, and 0.85 for the 'non-affirmative' class, respectively. To validate the generalization capabilities of our methodology, we applied the negation recognition pipeline to EHRs (6,710 cNEs) from a different data source distribution than the training corpus and obtained consistent performance metrics for the 'affirmative' and 'non-affirmative' classes (0.95, 0.97, and 0.96; and 0.90, 0.83, and 0.86 for precision, recall, and F1-score, respectively). Lastly, we evaluated the pipeline against two publicly available Spanish negation corpora, IULA and NUBes, obtaining state-of-the-art metrics (1.00, 0.99, and 0.99; and 1.00, 0.93, and 0.96 for precision, recall, and F1-score, respectively).
Conclusion:
Negation recognition is a source of low precision in the retrieval of cNEs from free text in EHRs. Combining a customized rule-based NegEx layer with a CNN binary classifier outperformed many other current approaches. RWE studies benefit greatly from the correct recognition of negation, as it reduces false-positive detections of cNEs that would otherwise reduce the credibility of cNLP systems.
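To make the rule-based half of such a pipeline concrete, here is a minimal NegEx-style sketch in Python. It is a simplification under stated assumptions: the trigger list, the five-token window, and the function name `classify_entity` are invented for illustration, only triggers that precede the entity are considered, and the CNN stage described in the abstract is not shown.

```python
import re
from typing import List, Sequence

# Illustrative Spanish negation triggers; the paper's actual rule set is not reproduced here.
NEGATION_TRIGGERS = ("no", "sin", "niega", "descarta")
WINDOW = 5  # assumed number of tokens before the entity to inspect


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def classify_entity(sentence: str, entity: str,
                    triggers: Sequence[str] = NEGATION_TRIGGERS,
                    window: int = WINDOW) -> str:
    """Return 'non-affirmative' if a trigger precedes the entity within the window."""
    tokens = tokenize(sentence)
    entity_tokens = tokenize(entity)
    n = len(entity_tokens)
    if n == 0:
        return "affirmative"
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == entity_tokens:
            preceding = tokens[max(0, i - window):i]
            if any(tok in triggers for tok in preceding):
                return "non-affirmative"
    return "affirmative"


negated = classify_entity("El paciente niega fiebre y tos", "fiebre")
affirmed = classify_entity("Paciente con fiebre alta", "fiebre")
```

A fixed window is the crudest possible scope rule; it will mislabel entities that follow a trigger but sit outside its real scope, which is one reason a learned classifier can add value on top of the rules.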
Affiliation(s)
- Guillermo Argüello-González
  - MedSavana SL, Madrid, 28004, Spain
  - Statistics and Operations Research, University of Oviedo, Oviedo, 33003, Spain
- José Aquino-Esperanza
  - MedSavana SL, Madrid, 28004, Spain
  - Faculty of Medicine and Health Sciences, University of Barcelona, Barcelona, 08007, Spain
3
Lu W, Jiang J, Shi Y, Zhong X, Gu J, Huangfu L, Gong M. Application of Entity-BERT model based on neuroscience and brain-like cognition in electronic medical record entity recognition. Front Neurosci 2023; 17:1259652. [PMID: 37799340 PMCID: PMC10547885 DOI: 10.3389/fnins.2023.1259652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Accepted: 08/14/2023] [Indexed: 10/07/2023] Open
Abstract
Introduction:
In the medical field, electronic medical records contain a large amount of textual information, and the unstructured nature of this information makes data extraction and analysis challenging. Therefore, automatic extraction of entity information from electronic medical records has become a significant issue in the healthcare domain.
Methods:
To address this problem, this paper proposes a deep learning-based entity information extraction model called Entity-BERT. The model aims to leverage the feature extraction capabilities of deep learning and the pre-trained language representations of BERT (Bidirectional Encoder Representations from Transformers), enabling it to automatically learn and recognize various entity types in electronic medical records, including medical terminology, disease names, drug information, and more, and thereby to provide more effective support for medical research and clinical practice. The Entity-BERT model uses a multi-layer neural network and a cross-attention mechanism to process and fuse information at different levels and of different types, resembling the hierarchical and distributed processing of the human brain. Additionally, the model employs pre-trained language and sequence models to process and learn from textual data, sharing similarities with the language processing and semantic understanding of the human brain. Furthermore, the Entity-BERT model can capture contextual information and long-term dependencies, which it combines with the cross-attention mechanism to handle the complex and diverse language expressions in electronic medical records, resembling the information processing of the human brain in many respects. It is also of interest, from the perspective of neuroscience and brain-like cognition, to explore how competitive learning, adaptive regulation, and synaptic plasticity can be used to optimize the model's predictions, automatically adjust its parameters, and achieve adaptive learning and dynamic adjustment.
Results and discussion:
Experimental results demonstrate that the Entity-BERT model achieves outstanding performance in entity recognition tasks within electronic medical records, surpassing other existing entity recognition models. This research not only provides more efficient and accurate natural language processing technology for the medical and health field but also introduces new ideas and directions for the design and optimization of deep learning models.
Affiliation(s)
- Weijia Lu
  - Science and Technology Department, Affiliated Hospital of Nantong University, Nantong, China
  - Jianghai Hospital of Nantong Sutong Science and Technology Park, Nantong, China
- Jiehui Jiang
  - Department of Biomedical Engineering, Shanghai University, Shanghai, China
- Yaxiang Shi
  - Network Information Center, Zhongda Hospital Southeast University, Nanjing, China
- Xiaowei Zhong
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Jun Gu
  - Department of Respiratory, Affiliated Hospital Nantong University, Nantong, China
- Lixia Huangfu
  - Information Center Department, Affiliated Hospital of Nantong University, Nantong, China
- Ming Gong
  - Information Center Department, Affiliated Hospital of Nantong University, Nantong, China
4
Poveda JL, Bretón-Romero R, Del Rio-Bermudez C, Taberna M, Medrano IH. How can artificial intelligence optimize value-based contracting? J Pharm Policy Pract 2022; 15:85. [PMID: 36401303 PMCID: PMC9673444 DOI: 10.1186/s40545-022-00475-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 10/26/2022] [Indexed: 11/19/2022] Open
Abstract
Efforts in the pharmaceutical market have been aimed at ensuring that the benefits obtained from the introduction of new therapies justify the associated costs. In recent years, drug payment models in healthcare have undergone a dramatic shift from focusing on volume (i.e., size of the target clinical population) to focusing on value (i.e., drug performance in real-world settings). In this context, value-based contracts (VBCs) were designed to align the payment for a drug with its clinical performance outside clinical trials by evaluating its effectiveness using real-world evidence (RWE). Despite their widespread implementation, several factors jeopardize the application of VBCs to most marketed drugs in the near future, including the need for easily measurable and relevant outcomes associated with clinical improvements, and access to a large patient population in which to assess said outcomes. Here, we argue that the extraction and analysis of massive amounts of RWE captured in patients' electronic health records (EHRs) will circumvent these issues and optimize negotiations in VBCs. In particular, the use of Natural Language Processing (NLP) has proven successful in the analysis of structured and unstructured clinical information in EHRs in multicenter research studies. Thus, applying NLP to analyze patient-centered information in EHRs in the context of innovative contracting can be highly beneficial, as it enables the real-time evaluation of treatment response and financial impact in real-world settings.
Affiliation(s)
- Jose Luis Poveda
  - Pharmacy Department, Drug Clinical Area, University and Polytechnic Hospital La Fe, Avda. Fernando Abril Martorell 106, 46026, Valencia, Spain
5
Zhao Y, Ren B, Yu W, Zhang H, Zhao D, Lv J, Xie Z, Jiang K, Shang L, Yao H, Xu Y, Zhao G. Construction of an Assisted Model Based on Natural Language Processing for Automatic Early Diagnosis of Autoimmune Encephalitis. Neurol Ther 2022; 11:1117-1134. [PMID: 35543808 PMCID: PMC9338198 DOI: 10.1007/s40120-022-00355-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Accepted: 04/07/2022] [Indexed: 11/25/2022] Open
Abstract
Introduction:
Early diagnosis and etiological treatment can effectively improve the prognosis of patients with autoimmune encephalitis (AE). However, anti-neuronal antibody tests, which provide the definitive diagnosis, require time and are not always abnormal. Using natural language processing (NLP) technology, our study proposes an assisted diagnostic method for early clinical diagnosis of AE and compares its sensitivity with that of previously established criteria.
Methods:
Our model is based on a text classification model trained on the history of present illness (HPI) in electronic medical records (EMRs) with a definite pathological diagnosis of AE or infectious encephalitis (IE). The definitive diagnosis of IE was based on the results of traditional etiological examinations. The definitive diagnosis of AE was based on the results of neuronal antibody tests, and the diagnostic criteria for definite autoimmune limbic encephalitis proposed by Graus et al. were used as the reference standard for antibody-negative AE. First, we automatically recognized and extracted symptoms from all HPI texts in EMRs by training on a dataset of 552 cases. Second, four text classification models trained on a dataset of 199 cases were established for the differential diagnosis of AE and IE, based on a post-structuring text dataset of every HPI, which was built from symptoms expressed in English after synonym normalization. The optimal model was identified by evaluating and comparing the performance of the four models. Finally, by combining three typical symptoms and the results of standard paraclinical tests proposed in the Graus criteria, such as cerebrospinal fluid (CSF), magnetic resonance imaging (MRI), or electroencephalogram (EEG), an assisted early diagnostic model for AE was established on the basis of the best-performing text classification model.
Results:
The comparison of the four models on the independent testing dataset showed that the naïve Bayesian classifier with bag of words achieved the best performance, with an area under the receiver operating characteristic curve of 0.85, accuracy of 84.5% (95% confidence interval [CI] 74.0–92.0%), sensitivity of 86.7% (95% CI 69.3–96.2%), and specificity of 82.9% (95% CI 67.9–92.8%). Compared with the previously proposed diagnostic criteria, the early diagnostic sensitivity for possible AE using the assisted diagnostic model on the independent testing dataset improved from 73.3% (95% CI 54.1–87.7%) to 86.7% (95% CI 69.3–96.2%).
Conclusions:
The assisted diagnostic model could effectively increase the early diagnostic sensitivity for AE compared with previous diagnostic criteria. It could assist physicians in establishing the diagnosis of AE automatically once they input the HPI, written according to their own narrative habits for describing symptoms, together with the results of standard paraclinical tests, thereby helping to avoid misdiagnosis and allowing prompt initiation of specific treatment.
Supplementary Information:
The online version contains supplementary material available at 10.1007/s40120-022-00355-7.
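For readers unfamiliar with the best-performing model type named above, the following is a minimal sketch of a multinomial naive Bayes classifier over bag-of-words features with add-one smoothing. The training examples are invented toy data with no clinical validity, and the function names `train_nb` and `predict_nb` are hypothetical; the study's actual features, data, and implementation are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Count documents per class and words per class from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy symptom lists; not taken from the study.
toy_docs = [
    (["seizure", "memory_loss", "psychiatric_symptoms"], "AE"),
    (["memory_loss", "psychiatric_symptoms"], "AE"),
    (["fever", "headache", "vomiting"], "IE"),
    (["fever", "headache", "seizure"], "IE"),
]
model = train_nb(toy_docs)
pred_a = predict_nb(model, ["memory_loss", "psychiatric_symptoms"])
pred_b = predict_nb(model, ["fever", "headache"])
```

Bag of words discards word order, so the classifier only sees which normalized symptoms occur and how often, which fits a setting where the HPI text has already been reduced to a list of symptoms.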
Affiliation(s)
- Yunsong Zhao
  - Department of Neurology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Bin Ren
  - Department of Information, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Wenjin Yu
  - Department of Neurology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Haijun Zhang
  - Department of Neurology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Di Zhao
  - Department of Neurology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Junchao Lv
  - Department of Neurology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Zhen Xie
  - College of Life Sciences and Medicine, Northwest University, Xi'an, China
- Kun Jiang
  - Department of Information, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Lei Shang
  - Department of Health Statistics, Fourth Military Medical University, Xi'an, China
- Han Yao
  - Department of Neurobiology, School of Basic Medicine, Fourth Military Medical University, Xi'an, China
- Yongyong Xu
  - College of Life Sciences and Medicine, Northwest University, Xi'an, China
- Gang Zhao
  - Department of Neurology, Xijing Hospital, Fourth Military Medical University, Xi'an, China
  - College of Life Sciences and Medicine, Northwest University, Xi'an, China
6
Solarte Pabón O, Montenegro O, Torrente M, Rodríguez González A, Provencio M, Menasalvas E. Negation and uncertainty detection in clinical texts written in Spanish: a deep learning-based approach. PeerJ Comput Sci 2022; 8:e913. [PMID: 35494817 PMCID: PMC9044225 DOI: 10.7717/peerj-cs.913] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/26/2021] [Accepted: 02/10/2022] [Indexed: 06/14/2023]
Abstract
Detecting negation and uncertainty is crucial for medical text mining applications; otherwise, extracted information can be incorrectly identified as real or factual events. Although several approaches have been proposed to detect negation and uncertainty in clinical texts, most efforts have focused on the English language. Most proposals developed for Spanish have focused mainly on negation detection and do not deal with uncertainty. In this paper, we propose a deep learning-based approach for both negation and uncertainty detection in clinical texts written in Spanish. The proposed approach explores two deep learning methods to achieve this goal: (i) Bidirectional Long Short-Term Memory with a Conditional Random Field layer (BiLSTM-CRF) and (ii) Bidirectional Encoder Representations from Transformers (BERT). The approach was evaluated using NUBes and IULA, two public corpora for the Spanish language. The results showed F-scores of 92% and 80% in the scope recognition task for negation and uncertainty, respectively. We also present the results of a validation process conducted using a real-life annotated dataset of clinical notes belonging to cancer patients. The proposed approach shows the feasibility of deep learning-based methods for detecting negation and uncertainty in Spanish clinical texts. Experiments also showed that this approach improves performance in the scope recognition task compared with other proposals in the biomedical domain.
Affiliation(s)
- Oswaldo Solarte Pabón
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Madrid, Spain
  - Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Cali, Colombia
- Orlando Montenegro
  - Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Cali, Colombia
- Ernestina Menasalvas
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Madrid, Spain
7
Grabar N, Grouin C. Year 2020 (with COVID): Observation of Scientific Literature on Clinical Natural Language Processing. Yearb Med Inform 2021; 30:257-263. [PMID: 34479397 PMCID: PMC8416212 DOI: 10.1055/s-0041-1726528] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/29/2022] Open
Abstract
Objectives:
To analyze the content of publications within the medical NLP domain in 2020.
Methods:
Automatic and manual preselection of publications to be reviewed, and selection of the best NLP papers of the year. Analysis of the important issues.
Results:
Three best papers were selected in 2020. We also provide an analysis of the content of the NLP publications in 2020, covering all topics.
Conclusion:
The two main issues addressed in 2020 relate to the investigation of COVID-related questions and to the further adaptation and use of transformer models. In addition, trends from past years continue, such as the diversification of the languages processed and the use of information from social networks.
Affiliation(s)
- Natalia Grabar
  - Université Paris Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, Orsay, France; STL, CNRS, Université de Lille, Domaine du Pont-de-bois, Villeneuve-d'Ascq cedex, France
- Cyril Grouin
  - Université Paris Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, Orsay, France