1
Jani M, Alfattni G, Belousov M, Laidlaw L, Zhang Y, Cheng M, Webb K, Hamilton R, Kanter AS, Dixon WG, Nenadic G. Development and evaluation of a text analytics algorithm for automated application of national COVID-19 shielding criteria in rheumatology patients. Ann Rheum Dis 2024:ard-2024-225544. [PMID: 38575324 DOI: 10.1136/ard-2024-225544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 03/26/2024] [Indexed: 04/06/2024]
Abstract
INTRODUCTION At the beginning of the COVID-19 pandemic, the UK's Scientific Committee issued extreme social distancing measures, termed 'shielding', aimed at a subpopulation deemed extremely clinically vulnerable to infection. National guidance for risk stratification was based on patients' age, comorbidities and immunosuppressive therapies, including biologics, which are not captured in primary care records. This process required considerable clinician time to review outpatient letters manually. Our aim was to develop and evaluate an automated shielding algorithm that text-mines the diagnoses and medications in outpatient letters, reducing the need for future manual review.
METHODS Rheumatology outpatient letters from a large UK foundation trust were retrieved. Free-text diagnoses were processed using Intelligent Medical Objects software (Concept Tagger), which used interface terminology for each condition mapped to Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) codes. We developed the Medication Concept Recognition tool (Named Entity Recognition) to retrieve each medication's type, dose, duration and status (active/past) at the time of the letter. Age, diagnosis and medication variables were then combined to calculate a shielding score based on the most recent letter. The algorithm's performance was evaluated using clinical review as the gold standard. The time taken to deploy the developed algorithm on a larger patient subset was also measured.
RESULTS In total, 5942 free-text diagnoses were extracted and mapped to SNOMED-CT, along with 13 665 free-text medications (n=803 patients). The automated algorithm demonstrated a sensitivity of 80% (95% CI: 75%, 85%) and a specificity of 92% (95% CI: 90%, 94%). The positive likelihood ratio was 10 (95% CI: 8, 14), the negative likelihood ratio was 0.21 (95% CI: 0.16, 0.28) and the F1 score was 0.81. Evaluation of mismatches revealed that the algorithm performed correctly against the gold standard in most cases. The developed algorithm was then deployed on records from an additional 15 865 patients, which took 18 hours for data extraction and 1 hour to deploy.
DISCUSSION An automated algorithm for risk stratification has several advantages, including reducing the clinician time spent on manual review to allow more time for direct care, improving efficiency and increasing transparency in communication with individual patients. It has the potential to be adapted for future public health initiatives that require prompt automated review of hospital outpatient letters.
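The likelihood ratios reported above follow from sensitivity and specificity by the standard formulas LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The minimal Python sketch here applies them (the function names are our own, not from the paper). Plugging in the rounded point estimates (0.80 and 0.92) gives LR+ of about 10 and LR- of about 0.22; the small gap from the reported 0.21 presumably reflects rounding of the published sensitivity and specificity. F1 needs confusion-matrix counts, which the abstract does not give, so any counts passed to `f1_score` here are hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_pos = sensitivity / (1.0 - specificity)   # true-positive rate / false-positive rate
    lr_neg = (1.0 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return lr_pos, lr_neg


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (recall is the same as sensitivity)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    lr_pos, lr_neg = likelihood_ratios(0.80, 0.92)
    print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # prints: LR+ = 10.0, LR- = 0.22
```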
Affiliation(s)
- Meghna Jani
- Centre for Epidemiology Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, Manchester, UK
- Department of Rheumatology, Northern Care Alliance NHS Foundation Trust Salford Care Organisation, Salford, UK
- NIHR Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
- Ghada Alfattni
- Department of Computer Science, The University of Manchester, Manchester, UK
- Department of Computer Science, Jamoum University College, Umm Al-Qura University, Makkah, Saudi Arabia
- Maksim Belousov
- Department of Computer Science, The University of Manchester, Manchester, UK
- Lynn Laidlaw
- Centre for Epidemiology Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, Manchester, UK
- Yuanyuan Zhang
- Centre for Epidemiology Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, Manchester, UK
- Michael Cheng
- Department of Business Intelligence, Northern Care Alliance NHS Foundation Trust, Salford Care Organisation, Salford, UK
- Karim Webb
- Department of Business Intelligence, Northern Care Alliance NHS Foundation Trust, Salford Care Organisation, Salford, UK
- Robyn Hamilton
- Department of Business Intelligence, Northern Care Alliance NHS Foundation Trust, Salford Care Organisation, Salford, UK
- Andrew S Kanter
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- William G Dixon
- Centre for Epidemiology Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, Manchester, UK
- Department of Rheumatology, Northern Care Alliance NHS Foundation Trust Salford Care Organisation, Salford, UK
- NIHR Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
- Goran Nenadic
- Department of Computer Science, The University of Manchester, Manchester, UK

2
Davies H, Nenadic G, Alfattni G, Arguello Casteleiro M, Al Moubayed N, Farrell SO, Radford AD, Noble PJM. Text mining for disease surveillance in veterinary clinical data: part one, the language of veterinary clinical records and searching for words. Front Vet Sci 2024; 11:1352239. [PMID: 38322169 PMCID: PMC10844486 DOI: 10.3389/fvets.2024.1352239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 01/09/2024] [Indexed: 02/08/2024]
Abstract
The development of natural language processing techniques for deriving useful information from unstructured clinical narratives is a fast-paced and rapidly evolving area of machine learning research. Large volumes of veterinary clinical narratives, curated by projects such as the Small Animal Veterinary Surveillance Network (SAVSNET) and VetCompass, now exist, and applying such techniques to these datasets is already improving (and will continue to improve) our understanding of disease and disease patterns within veterinary medicine. In part one of this two-part article series, we discuss the importance of understanding the lexical structure of clinical records and the use of basic tools for filtering records based on keywords, as well as more complex rule-based pattern-matching approaches. We discuss the strengths and weaknesses of these approaches, highlighting the ongoing potential value of these "traditional" approaches while ultimately recognizing that they constrain how effectively information retrieval can be automated. This sets the scene for the introduction of machine learning methodologies and the many opportunities they present for automating information extraction, which are discussed in part two of the series.
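As a rough illustration of the keyword-versus-pattern trade-off described above, here is a minimal Python sketch on three invented records (not taken from SAVSNET or VetCompass). A plain substring filter for "vomiting" wrongly keeps a negated mention and misses the variant "vomited"; a hand-written regular expression with a crude same-clause negation rule fixes both cases. The word list and negation cues are assumptions chosen for the example, not the rules used by the authors, and such hand-built rules illustrate exactly the maintenance burden that limits how far this approach can be automated.

```python
import re

# Invented example records for illustration only.
RECORDS = [
    "Owner reports vomiting x2 since yesterday, bright and alert.",
    "Booster vaccination given. No vomiting or diarrhoea reported.",
    "Presented with haemorrhagic diarrhoea; vomited once this morning.",
]


def keyword_filter(records, keyword):
    """Naive keyword search: case-insensitive substring match."""
    return [r for r in records if keyword.lower() in r.lower()]


# Word variants of "vomit" as whole words (vomit, vomits, vomited, vomiting).
VOMIT = re.compile(r"\bvomit(?:s|ed|ing)?\b", re.IGNORECASE)
# A negation cue with no clause-ending punctuation between it and the mention.
NEGATION = re.compile(r"\b(?:no|not|denies|without)\b[^.;,]*$", re.IGNORECASE)


def pattern_filter(records):
    """Keep records with at least one non-negated mention of vomiting."""
    hits = []
    for record in records:
        for match in VOMIT.finditer(record):
            preceding = record[: match.start()]
            if not NEGATION.search(preceding):
                hits.append(record)
                break
    return hits


if __name__ == "__main__":
    print(keyword_filter(RECORDS, "vomiting"))
    print(pattern_filter(RECORDS))
```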
Affiliation(s)
- Heather Davies
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, United Kingdom
- Ghada Alfattni
- Department of Computer Science, University of Manchester, Manchester, United Kingdom
- Department of Computer Science, Jamoum University College, Umm Al-Qura University, Makkah, Saudi Arabia
- Noura Al Moubayed
- Department of Computer Science, Durham University, Durham, United Kingdom
- Sean O. Farrell
- Department of Computer Science, Durham University, Durham, United Kingdom
- Alan D. Radford
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- Peter-John M. Noble
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom

3
Alfattni G, Peek N, Nenadic G. Attention-based bidirectional long short-term memory networks for extracting temporal relationships from clinical discharge summaries. J Biomed Inform 2021; 123:103915. [PMID: 34600144 DOI: 10.1016/j.jbi.2021.103915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2021] [Revised: 08/05/2021] [Accepted: 09/09/2021] [Indexed: 10/20/2022]
Abstract
Temporal relation extraction between health-related events is a widely studied task in clinical Natural Language Processing (NLP). Current state-of-the-art methods mostly rely on engineered features (i.e., rule-based modelling) and sequence modelling, which often encodes a source sentence into a single fixed-length context. An obvious disadvantage of this fixed-length design is its inability to model longer sentences, as important temporal information in clinical text may appear at different positions. To address this issue, we propose an Attention-based Bidirectional Long Short-Term Memory (Att-BiLSTM) model that learns the important semantic information in long source text segments and better determines which parts of the text are most important. We experimented with two embeddings and compared the performance with traditional state-of-the-art methods that require elaborate linguistic pre-processing and hand-engineered features. The experimental results on the i2b2 2012 temporal relation test corpus show that the proposed method achieves a significant improvement, with an F-score of 0.811, which is at least 10% better than the state of the art in the field. We show that the model can be remarkably effective at classifying temporal relations when provided with word embeddings trained on general-domain corpora. Finally, we perform an error analysis to gain insight into the common errors made by the model.
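To show how attention lets the model weight different positions instead of relying on one fixed-length summary, here is a minimal pure-Python sketch of a common attention-pooling formulation for BiLSTM outputs: each token's hidden state h_t is scored as w · tanh(h_t), the scores are softmax-normalised into weights, and the sentence vector is the weighted sum of the hidden states. The hidden states and the query vector `w` are hand-picked stand-ins for values a trained network would produce; the paper's exact architecture may differ.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[Vector], w: Vector) -> Tuple[Vector, Vector]:
    """Attention pooling over per-token hidden states.

    hidden_states: one vector per token (e.g. concatenated forward and
    backward BiLSTM outputs). w: query vector, learned in a real model and
    fixed here for illustration. Returns (attention weights, pooled vector).
    """
    # Score each token as w . tanh(h_t).
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h)) for h in hidden_states]
    alpha = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of hidden states, one output dimension at a time.
    pooled = [sum(a * h[d] for a, h in zip(alpha, hidden_states)) for d in range(dim)]
    return alpha, pooled


if __name__ == "__main__":
    hidden = [[0.1, 0.9], [0.8, -0.3], [0.2, 0.4]]  # three tokens, 2-dim states
    weights, sentence_vector = attention_pool(hidden, [0.5, 1.0])
    print([round(a, 3) for a in weights], [round(v, 3) for v in sentence_vector])
```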
Affiliation(s)
- Ghada Alfattni
- Department of Computer Science, University of Manchester, Manchester, UK; Department of Computer Science, Jamoum University College, Umm Al-Qura University, Makkah, Saudi Arabia.
- Niels Peek
- Centre for Health Informatics, Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, UK; National Institute of Health Research Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK; The Alan Turing Institute, UK
- Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, UK; The Alan Turing Institute, UK

4
Alfattni G, Belousov M, Peek N, Nenadic G. Extracting Drug Names and Associated Attributes From Discharge Summaries: Text Mining Study. JMIR Med Inform 2021; 9:e24678. [PMID: 33949962 PMCID: PMC8135022 DOI: 10.2196/24678] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/30/2020] [Revised: 02/15/2021] [Accepted: 02/20/2021] [Indexed: 11/13/2022]
Abstract
Background Drug prescriptions are often recorded in free-text clinical narratives; making this information available in a structured form is important to support many health-related tasks. Although several natural language processing (NLP) methods have been proposed to extract such information, many challenges remain.
Objective This study evaluates the feasibility of using NLP and deep learning approaches for extracting and linking drug names and associated attributes identified in clinical free-text notes, and presents an extensive error analysis of different methods. The study began with our participation in the 2018 National NLP Clinical Challenges (n2c2) shared task on adverse drug events and medication extraction.
Methods The proposed system (DrugEx) consists of a named entity recognizer (NER) to identify drugs and associated attributes and a relation extraction (RE) method to identify the relations between them. For NER, we explored deep learning-based approaches (ie, bidirectional long short-term memory with conditional random fields [BiLSTM-CRFs]) with various embeddings (ie, word embedding, character embedding [CE], and semantic-feature embedding) to investigate how different embeddings influence performance. A rule-based method was implemented for RE and compared with a context-aware long short-term memory (LSTM) model. The methods were trained and evaluated using the 2018 n2c2 shared task data.
Results The experiments showed that the best model (BiLSTM-CRFs with pretrained word embeddings [PWE] and CE) achieved lenient micro F-scores of 0.921 for NER, 0.927 for RE, and 0.855 for the end-to-end system. NER relying on the pretrained word and semantic embeddings performed better on most individual entity types, but NER with PWE and CE had the highest classification efficiency among the proposed approaches. Extracting relations using the rule-based method achieved higher accuracy than the context-aware LSTM for most relations. Interestingly, the LSTM model performed notably better on reason-drug relations, the most challenging relation type.
Conclusions The proposed end-to-end system achieved encouraging results and demonstrated the feasibility of using deep learning methods to extract medication information from free-text data.
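The abstract does not spell out DrugEx's relation extraction rules, so the following is only a hypothetical example of what a simple rule-based linker can look like once NER has produced labelled spans: each attribute is attached to the closest drug that precedes it, falling back to the closest following drug. The `Entity` class, the `make_entity` helper, the labels and the example sentence are all illustrative assumptions. A positional rule like this works well for dosage and frequency, which usually sit next to their drug, but reasons can precede, follow or span several drugs, which is consistent with reason-drug being the hardest relation type.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    label: str  # e.g. "Drug", "Dosage", "Frequency", "Reason" (illustrative labels)
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


def make_entity(label: str, sentence: str, text: str) -> Entity:
    """Hypothetical helper: locate the first occurrence of `text` in `sentence`."""
    start = sentence.find(text)
    if start < 0:
        raise ValueError(f"{text!r} not found in sentence")
    return Entity(label, start, start + len(text), text)


def link_attributes(entities: List[Entity]) -> List[Tuple[str, str, str]]:
    """Attach each non-drug entity to a drug using a positional rule.

    Assumed rule: prefer the closest drug ending before the attribute starts;
    otherwise use the closest drug starting after the attribute ends.
    """
    drugs = sorted((e for e in entities if e.label == "Drug"), key=lambda e: e.start)
    relations = []
    for attr in entities:
        if attr.label == "Drug" or not drugs:
            continue
        preceding = [d for d in drugs if d.end <= attr.start]
        if preceding:
            target = preceding[-1]  # drugs are sorted, so the last one is closest
        else:
            following = [d for d in drugs if d.start >= attr.end]
            if not following:
                continue  # overlapping spans: left unlinked in this sketch
            target = following[0]
        relations.append((attr.label, attr.text, target.text))
    return relations


if __name__ == "__main__":
    s = "Start metformin 500 mg twice daily and aspirin 75 mg for stroke prevention."
    spans = [("Drug", "metformin"), ("Dosage", "500 mg"), ("Frequency", "twice daily"),
             ("Drug", "aspirin"), ("Dosage", "75 mg"), ("Reason", "stroke prevention")]
    for relation in link_attributes([make_entity(lab, s, txt) for lab, txt in spans]):
        print(relation)
```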
Affiliation(s)
- Ghada Alfattni
- Department of Computer Science, University of Manchester, Manchester, United Kingdom; Department of Computer Science, Jamoum University College, Umm Al-Qura University, Makkah, Saudi Arabia
- Maksim Belousov
- Department of Computer Science, University of Manchester, Manchester, United Kingdom
- Niels Peek
- Centre for Health Informatics, Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, United Kingdom; National Institute of Health Research Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, University of Manchester, Manchester, United Kingdom; The Alan Turing Institute, Manchester, United Kingdom
- Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, United Kingdom; The Alan Turing Institute, Manchester, United Kingdom

5
Alfattni G, Peek N, Nenadic G. Corrigendum to "Extraction of temporal relations from clinical free text: A systematic review of current approaches" [J. Biomed. Inf. 108 (2020) 103488]. J Biomed Inform 2020; 113:103663. [PMID: 33341543 DOI: 10.1016/j.jbi.2020.103663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Ghada Alfattni
- Department of Computer Science, University of Manchester, Manchester, UK; Department of Computer Science, Jamoum University College, Umm Al-Qura University, Makkah, Saudi Arabia.
- Niels Peek
- Centre for Health Informatics, Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, UK; National Institute of Health Research Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK; The Alan Turing Institute, UK
- Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, UK; The Alan Turing Institute, UK

6
Alfattni G, Peek N, Nenadic G. Extraction of temporal relations from clinical free text: A systematic review of current approaches. J Biomed Inform 2020; 108:103488. [PMID: 32673788 DOI: 10.1016/j.jbi.2020.103488] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/27/2019] [Revised: 06/10/2020] [Accepted: 06/15/2020] [Indexed: 10/23/2022]
Abstract
BACKGROUND Temporal relations between clinical events play an important role in clinical assessment and decision making. Extracting such relations from free-text data is challenging because the task lies at the intersection of medical natural language processing, temporal representation and temporal reasoning.
OBJECTIVES To survey existing methods for extracting temporal relations (TLINKs) between events from clinical free text in English; to establish the state of the art in this field; and to identify outstanding methodological challenges.
METHODS A systematic search of PubMed and the DBLP computer science bibliography was conducted for studies published between January 2006 and December 2018. Relevant studies were identified by examining titles and abstracts. The full text of the selected studies was then analyzed in depth, and information was collected on TLINK tasks, TLINK types, data sources, feature selection, methods used, and reported performance.
RESULTS A total of 2834 publications were identified for title and abstract screening, of which 51 studies were selected. Thirty-two studies used machine learning approaches, 15 used hybrid approaches, and only four used a rule-based approach. The majority of studies used publicly available corpora: THYME (28 studies) and the i2b2 corpus (17 studies).
CONCLUSION The performance of TLINK extraction methods varies widely depending on relation types and events (e.g., from 32% to 87% F-score for identifying relations between clinical events and document creation time). A small set of TLINKs (before, after, overlap and contains) has been widely studied with relatively good performance, whereas other TLINK types (e.g., started by, finished by, precedes) are rarely studied and remain challenging. Machine learning classifiers (such as Support Vector Machines and Conditional Random Fields) and Deep Neural Networks were among the best-performing methods for extracting TLINKs, but nearly all of this work has been carried out and tested on only two publicly available corpora. The field would benefit from the availability of more publicly available, high-quality, annotated clinical text corpora.
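To make the four widely studied TLINK types concrete, this minimal Python sketch assumes two events have already been anchored to numeric time intervals (for example, days relative to admission) and assigns a coarse label to the first event relative to the second. It only illustrates what the labels mean; the reviewed methods must infer these relations from text rather than from known intervals. The mapping is a deliberate simplification of Allen's interval relations: touching intervals, reverse containment and identical intervals all fall into OVERLAP here.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: float
    end: float  # assumed start <= end


def tlink(a: Interval, b: Interval) -> str:
    """Coarse TLINK label for event a relative to event b."""
    if a.end < b.start:
        return "BEFORE"
    if b.end < a.start:
        return "AFTER"
    if a.start <= b.start and b.end <= a.end and a != b:
        return "CONTAINS"  # a's interval covers b's
    return "OVERLAP"  # partial overlap, touching, reverse containment or identical


if __name__ == "__main__":
    admission = Interval(0, 10)   # e.g. a hospital stay, in days
    antibiotics = Interval(3, 5)  # a treatment course within that stay
    print(tlink(admission, antibiotics))
```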
Affiliation(s)
- Ghada Alfattni
- Department of Computer Science, University of Manchester, Manchester, UK; Department of Computer Science, Jamoum University College, Umm Al-Qura University, Makkah, Saudi Arabia.
- Niels Peek
- Centre for Health Informatics, Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, UK; National Institute of Health Research Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK; The Alan Turing Institute, UK
- Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, UK; The Alan Turing Institute, UK

7
Alfattni G, Peek N, Nenadic G, Caskey F. Integrating text analytics and statistical modelling to analyse kidney transplant immune suppression medication in registry data. Int J Popul Data Sci 2017. [PMCID: PMC9351127 DOI: 10.23889/ijpds.v1i1.353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]