1
van de Burgt BWM, Wasylewicz ATM, Dullemond B, Jessurun NT, Grouls RJE, Bouwman RA, Korsten EHM, Egberts TCG. Development of a text mining algorithm for identifying adverse drug reactions in electronic health records. JAMIA Open 2024; 7:ooae070. [PMID: 39156048 PMCID: PMC11328534 DOI: 10.1093/jamiaopen/ooae070] [Received: 05/09/2024] [Revised: 07/03/2024] [Accepted: 08/13/2024] [Indexed: 08/20/2024]
Abstract
Objective Adverse drug reactions (ADRs) are a significant healthcare concern. They are often documented as free text in electronic health records (EHRs), making them challenging to use in clinical decision support systems (CDSS). The study aimed to develop a text mining algorithm to identify ADRs in free text of Dutch EHRs. Materials and Methods In Phase I, our previously developed CDSS algorithm was recoded and improved upon with the same relatively large dataset of 35 000 notes (Step A), using R to identify possible ADRs with Medical Dictionary for Regulatory Activities (MedDRA) terms and the related Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) (Step B). In Phase II, 6 existing text-mining R-scripts were used to detect and present unique ADRs, and positive predictive value (PPV) and sensitivity were observed. Results In Phase IA, the recoded algorithm performed better than the previously developed CDSS algorithm, resulting in a PPV of 13% and a sensitivity of 93%. The sensitivity for serious ADRs was 95%. The algorithm identified 58 additional possible ADRs. In Phase IB, the algorithm achieved a PPV of 10%, a sensitivity of 86%, and an F-measure of 0.18. In Phase II, 4 R-scripts enhanced the sensitivity and PPV of the algorithm, resulting in a PPV of 70%, a sensitivity of 73%, an F-measure of 0.71, and a 63% sensitivity for serious ADRs. Discussion and Conclusion The recoded Dutch algorithm effectively identifies ADRs from free-text Dutch EHRs using R-scripts and MedDRA/SNOMED-CT. The study details its limitations, highlighting the algorithm's potential and significant improvements.
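The F-measures quoted in this abstract are consistent with the balanced F1 score, the harmonic mean of PPV and sensitivity. The abstract does not state which formula the authors used, so the Python sketch below assumes the balanced form; the function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(ppv: float, sensitivity: float) -> float:
    """Balanced F-measure: harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# PPV and sensitivity as reported in the abstract, written as fractions.
phase_ib = f_measure(0.10, 0.86)   # Phase IB
phase_ii = f_measure(0.70, 0.73)   # Phase II

print(f"Phase IB: {phase_ib:.2f}")
print(f"Phase II: {phase_ii:.2f}")
```

Under this assumption both reported values (0.18 and 0.71) are reproduced to two decimals, which also shows how strongly the low PPV in Phase IB pulls the F-measure down despite the high sensitivity.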
Affiliation(s)
- Britt W M van de Burgt
  - Division of Clinical Pharmacy, Catharina Hospital Eindhoven, 5623 EJ Eindhoven, The Netherlands
  - Division Healthcare Intelligence, Catharina Hospital Eindhoven, 5623 EJ Eindhoven, The Netherlands
  - Department of Electrical Engineering, Signal Processing Group, Technical University Eindhoven, 5612 AP Eindhoven, The Netherlands
- Arthur T M Wasylewicz
  - Division Healthcare Intelligence, Catharina Hospital Eindhoven, 5623 EJ Eindhoven, The Netherlands
- Bjorn Dullemond
  - Department of Mathematics and Computer Science, Technical University Eindhoven, 5612 AP Eindhoven, The Netherlands
- Naomi T Jessurun
  - Netherlands Pharmacovigilance Centre LAREB, 5237 MH 's-Hertogenbosch, The Netherlands
- Rene J E Grouls
  - Division of Clinical Pharmacy, Catharina Hospital Eindhoven, 5623 EJ Eindhoven, The Netherlands
- R Arthur Bouwman
  - Department of Electrical Engineering, Signal Processing Group, Technical University Eindhoven, 5612 AP Eindhoven, The Netherlands
  - Department of Anesthesiology, Catharina Hospital Eindhoven, 5623 EJ Eindhoven, The Netherlands
- Erik H M Korsten
  - Division Healthcare Intelligence, Catharina Hospital Eindhoven, 5623 EJ Eindhoven, The Netherlands
  - Department of Electrical Engineering, Signal Processing Group, Technical University Eindhoven, 5612 AP Eindhoven, The Netherlands
- Toine C G Egberts
  - Department of Clinical Pharmacy, University Medical Centre Utrecht, 3584 CX Utrecht, The Netherlands
  - Department of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, 3584 CX Utrecht, The Netherlands
2
Modi S, Kasmiran KA, Mohd Sharef N, Sharum MY. Extracting adverse drug events from clinical Notes: A systematic review of approaches used. J Biomed Inform 2024; 151:104603. [PMID: 38331081 DOI: 10.1016/j.jbi.2024.104603] [Received: 08/18/2023] [Revised: 01/31/2024] [Accepted: 02/01/2024] [Indexed: 02/10/2024]
Abstract
BACKGROUND An adverse drug event (ADE) is any unfavorable effect that occurs due to the use of a drug. Extracting ADEs from unstructured clinical notes is essential to biomedical text extraction research because it helps with pharmacovigilance and patient medication studies. OBJECTIVE From the considerable amount of clinical narrative text, natural language processing (NLP) researchers have developed methods for extracting ADEs and their related attributes. This work presents a systematic review of current methods. METHODOLOGY We searched two biomedical databases, PubMed and Medline, from June 2022 until December 2023 for relevant publications. Similarly, we searched the multi-disciplinary databases IEEE Xplore, Scopus, ScienceDirect, and the ACL Anthology. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement guidelines and recommendations for reporting systematic reviews. Initially, we obtained 5,537 articles published between 2015 and 2023 from the search results across the databases. Based on predefined inclusion and exclusion criteria, 100 publications underwent full-text review, of which we considered 82 for our analysis. RESULTS We determined the general pattern for extracting ADEs from clinical notes, with named entity recognition (NER) and relation extraction (RE) being the dual tasks considered. Researchers who tackled both NER and RE simultaneously have approached ADE extraction as a "pipeline extraction" problem (n = 22), as a "joint task extraction" problem (n = 7), and as a "multi-task learning" problem (n = 6), while others have tackled only NER (n = 27) or RE (n = 20). We further grouped the reviewed publications by extraction approach, namely rule-based (n = 8), machine learning (n = 11), deep learning (n = 32), comparison of two or more approaches (n = 11), hybrid (n = 12) and large language models (n = 8). The most used datasets are MADE 1.0, TAC 2017 and n2c2 2018. CONCLUSION Extracting ADEs is crucial, especially for pharmacovigilance studies and patient medications. This survey showcases advances in ADE extraction research, approaches, datasets, and state-of-the-art performance on them. Challenges and future research directions are highlighted. We hope this review will guide researchers in gaining background knowledge and developing more innovative ways to address the challenges.
Affiliation(s)
- Salisu Modi
  - Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia; Department of Computer Science, Sokoto State University, Sokoto, Nigeria.
- Khairul Azhar Kasmiran
  - Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia.
- Nurfadhlina Mohd Sharef
  - Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia.
- Mohd Yunus Sharum
  - Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia.
3
Xie F, Chang J, Luong T, Wu B, Lustigova E, Shrader E, Chen W. Identifying Symptoms Prior to Pancreatic Ductal Adenocarcinoma Diagnosis in Real-World Care Settings: Natural Language Processing Approach. JMIR AI 2024; 3:e51240. [PMID: 38875566 PMCID: PMC11041417 DOI: 10.2196/51240] [Received: 07/26/2023] [Revised: 12/08/2023] [Accepted: 12/16/2023] [Indexed: 06/16/2024]
Abstract
BACKGROUND Pancreatic cancer is the third leading cause of cancer deaths in the United States. Pancreatic ductal adenocarcinoma (PDAC) is the most common form of pancreatic cancer, accounting for up to 90% of all cases. Patient-reported symptoms are often the triggers of cancer diagnosis; therefore, understanding the PDAC-associated symptoms and the timing of symptom onset could facilitate early detection of PDAC. OBJECTIVE This paper aims to develop a natural language processing (NLP) algorithm to capture symptoms associated with PDAC from clinical notes within a large integrated health care system. METHODS We used unstructured data within 2 years prior to PDAC diagnosis between 2010 and 2019 and among matched patients without PDAC to identify 17 PDAC-related symptoms. Related terms and phrases were first compiled from publicly available resources and then recursively reviewed and enriched with input from clinicians and chart review. A computerized NLP algorithm was iteratively developed and refined via multiple rounds of chart review followed by adjudication. Finally, the developed algorithm was applied to the validation data set to assess performance and to the study implementation notes. RESULTS A total of 408,147 and 709,789 notes were retrieved from 2611 patients with PDAC and 10,085 matched patients without PDAC, respectively. In descending order, the symptom distribution of the study implementation notes ranged from 4.98% for abdominal or epigastric pain to 0.05% for upper extremity deep vein thrombosis in the PDAC group, and from 1.75% for back pain to 0.01% for pale stool in the non-PDAC group. Validation of the NLP algorithm against adjudicated chart review results of 1000 notes showed that precision ranged from 98.9% (jaundice) to 84% (upper extremity deep vein thrombosis), recall ranged from 98.1% (weight loss) to 82.8% (epigastric bloating), and F1-scores ranged from 0.97 (jaundice) to 0.86 (depression). CONCLUSIONS The developed and validated NLP algorithm could be used for the early detection of PDAC.
Affiliation(s)
- Fagen Xie
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
- Jenny Chang
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
- Tiffany Luong
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
- Bechien Wu
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
- Eva Lustigova
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
- Eva Shrader
  - Pancreatic Cancer Action Network, Manhattan Beach, CA, United States
- Wansu Chen
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
4
Sim JA, Huang X, Horan MR, Stewart CM, Robison LL, Hudson MM, Baker JN, Huang IC. Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review. Artif Intell Med 2023; 146:102701. [PMID: 38042599 PMCID: PMC10693655 DOI: 10.1016/j.artmed.2023.102701] [Received: 04/06/2023] [Revised: 09/30/2023] [Accepted: 10/29/2023] [Indexed: 12/04/2023]
Abstract
OBJECTIVE Natural language processing (NLP) combined with machine learning (ML) techniques is increasingly used to process unstructured/free-text patient-reported outcome (PRO) data available in electronic health records (EHRs). This systematic review summarizes the literature reporting NLP/ML systems/toolkits for analyzing PROs in clinical narratives of EHRs and discusses the future directions for the application of this modality in clinical care. METHODS We searched PubMed, Scopus, and Web of Science for studies written in English between 1/1/2000 and 12/31/2020. Seventy-nine studies meeting the eligibility criteria were included. We abstracted and summarized information related to the study purpose, patient population, type/source/amount of unstructured PRO data, linguistic features, and NLP systems/toolkits for processing unstructured PROs in EHRs. RESULTS Most of the studies used NLP/ML techniques to extract PROs from clinical narratives (n = 74) and mapped the extracted PROs into specific PRO domains for phenotyping or clustering purposes (n = 26). Some studies used NLP/ML to process PROs for predicting disease progression or onset of adverse events (n = 22) or developing/validating NLP/ML pipelines for analyzing unstructured PROs (n = 19). Studies used different linguistic features, including lexical, syntactic, semantic, and contextual features, to process unstructured PROs. Among the 25 NLP systems/toolkits we identified, 15 used rule-based NLP, 6 used hybrid NLP, and 4 used non-neural ML algorithms embedded in NLP. CONCLUSIONS This study supports the potential utility of different NLP/ML techniques in processing unstructured PROs available in EHRs for clinical care. Though using annotation rules for NLP/ML to analyze unstructured PROs is dominant, deploying novel neural ML-based methods is warranted.
Affiliation(s)
- Jin-Ah Sim
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States; School of AI Convergence, Hallym University, Chuncheon, Republic of Korea
- Xiaolei Huang
  - Department of Computer Science, University of Memphis, Memphis, TN, United States
- Madeline R Horan
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
- Christopher M Stewart
  - Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States
- Leslie L Robison
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
- Melissa M Hudson
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States; Department of Oncology, St. Jude Children's Research Hospital, Memphis, TN, United States
- Justin N Baker
  - Department of Pediatrics, Stanford University, Stanford, CA, United States
- I-Chan Huang
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States.
5
Zolnour A, Eldredge CE, Faiola A, Yaghoobzadeh Y, Khani M, Foy D, Topaz M, Kharrazi H, Fung KW, Fontelo P, Davoudi A, Tabaie A, Breitinger SA, Oesterle TS, Rouhizadeh M, Zonnor Z, Moen H, Patrick TB, Zolnoori M. A risk identification model for detection of patients at risk of antidepressant discontinuation. Front Artif Intell 2023; 6:1229609. [PMID: 37693012 PMCID: PMC10484003 DOI: 10.3389/frai.2023.1229609] [Received: 05/26/2023] [Accepted: 08/04/2023] [Indexed: 09/12/2023]
Abstract
Purpose Between 30 and 68% of patients prematurely discontinue their antidepressant treatment, posing significant risks to patient safety and healthcare outcomes. Online healthcare forums have the potential to offer a rich and unique source of data, revealing dimensions of antidepressant discontinuation that may not be captured by conventional data sources. Methods We analyzed 891 patient narratives from the online healthcare forum "askapatient.com," using content analysis to create PsyRisk, a corpus highlighting the risk factors associated with antidepressant discontinuation. Leveraging PsyRisk, alongside PsyTAR [a publicly available corpus of adverse drug reactions (ADRs) related to antidepressants], we developed a machine learning-driven algorithm for proactive identification of patients at risk of abrupt antidepressant discontinuation. Results Of the 891 patients analyzed, 232 reported antidepressant discontinuation. Among these patients, 92% experienced ADRs, and 72% found these reactions distressing, negatively affecting their daily activities. Approximately 26% of patients perceived the antidepressants as ineffective. Most reported ADRs were physiological (61%, 411/673), followed by cognitive (30%, 197/673), and psychological (28%, 188/673) ADRs. We employed a nested cross-validation strategy with an outer 5-fold cross-validation for model selection and an inner 5-fold cross-validation for hyperparameter tuning. The performance of our risk identification algorithm, as assessed through this validation technique, yielded an AUC-ROC of 90.77 and an F1-score of 83.33. The most significant contributors to abrupt discontinuation were high perceived distress from ADRs and perceived ineffectiveness of the antidepressants. Conclusion The risk factors identified and the risk identification algorithm developed in this study have substantial potential for clinical application. They could assist healthcare professionals in identifying and managing patients with depression who are at risk of prematurely discontinuing their antidepressant treatment.
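The nested cross-validation described in this abstract can be illustrated with a minimal Python sketch of the index bookkeeping only. It assumes plain shuffled (non-stratified) folds, which the abstract does not specify, and it fits no model; the helper names `k_fold_indices` and `nested_cv_splits` are hypothetical and not taken from the paper.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle the given indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_val) pairs built only
    from the outer training data, so the outer test fold is never used
    while tuning.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [x for j, fold in enumerate(outer_folds) if j != i for x in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [x for n, fold in enumerate(inner_folds) if n != m for x in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_test, inner_splits


# 891 is the number of patient narratives reported in the abstract.
splits = list(nested_cv_splits(n_samples=891))
print(len(splits), "outer folds,", len(splits[0][1]), "inner folds each")
```

The point of the design is that each outer test fold stays untouched during the inner loop, so the outer score is not biased by the tuning performed inside it.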
Affiliation(s)
- Ali Zolnour
  - School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
- Anthony Faiola
  - College of Health Sciences, University of Kentucky, Lexington, KY, United States
- Masoud Khani
  - Biomedical and Health Informatics, University of Wisconsin-Milwaukee, Milwaukee, WI, United States
- Doreen Foy
  - School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, United States
- Maxim Topaz
  - School of Nursing and Data Science Institute, Columbia University, New York, NY, United States
  - Center for Home Care Policy and Research, VNS Health, New York, NY, United States
- Hadi Kharrazi
  - Department of Health Policy and Management, Johns Hopkins University, Baltimore, MD, United States
- Kin Wah Fung
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Paul Fontelo
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Anahita Davoudi
  - Center for Home Care Policy and Research, VNS Health, New York, NY, United States
- Azade Tabaie
  - Center of Biostatistics, Informatics, and Data Science, MedStar Health Research Institute, Washington, DC, United States
- Scott A. Breitinger
  - Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, United States
- Tyler S. Oesterle
  - Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, United States
- Masoud Rouhizadeh
  - College of Pharmacy, University of Florida, Gainesville, FL, United States
- Zahra Zonnor
  - Department of Biomechanics, Bu-Ali Sina University, Hamedan, Iran
- Hans Moen
  - Department of Computer Science, Aalto University, Otaniemi, Finland
- Timothy B. Patrick
  - Biomedical and Health Informatics, University of Wisconsin-Milwaukee, Milwaukee, WI, United States
- Maryam Zolnoori
  - School of Nursing and Data Science Institute, Columbia University, New York, NY, United States
  - Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, United States
6
Zhang Y, Li X, Yang Y, Wang T. Disease- and Drug-Related Knowledge Extraction for Health Management from Online Health Communities Based on BERT-BiGRU-ATT. Int J Environ Res Public Health 2022; 19:16590. [PMID: 36554472 PMCID: PMC9779596 DOI: 10.3390/ijerph192416590] [Received: 11/09/2022] [Revised: 12/01/2022] [Accepted: 12/06/2022] [Indexed: 06/17/2023]
Abstract
Knowledge extraction from rich text in online health communities can supplement and improve the existing knowledge base, supporting evidence-based medicine and clinical decision making. The extracted time series health management data of users can help users with similar conditions when managing their health. By annotating four relationships, this study constructed a deep learning model, BERT-BiGRU-ATT, to extract disease-medication relationships. A Chinese-pretrained BERT model was used to generate word embeddings for the question-and-answer data from online health communities in China. In addition, the bidirectional gated recurrent unit, combined with an attention mechanism, was employed to capture sequence context features and then to classify text related to diseases and drugs using a softmax classifier and to obtain the time series data provided by users. By using various word embedding training experiments and comparisons with classical models, the superiority of our model in relation extraction was verified. Based on the knowledge extraction, the evolution of a user's disease progression was analyzed according to the time series data provided by users. BERT word embedding, GRU, and attention mechanisms in our research play major roles in knowledge extraction. The knowledge extraction results obtained are expected to supplement and improve the existing knowledge base, assist doctors' diagnosis, and help users with dynamic lifecycle health management, such as user disease treatment management. In future studies, co-reference resolution can be introduced to further improve the extraction of relationships among diseases, drugs, and drug effects.
Affiliation(s)
- Yanli Zhang
  - College of Business Administration, Henan Finance University, Zhengzhou 451464, China
  - Business School, Henan University, Kaifeng 475004, China
- Xinmiao Li
  - School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
- Yu Yang
  - School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
  - China Banking and Insurance Regulatory Commission Neimengu Office, Hohhot 010019, China
- Tao Wang
  - College of Business Administration, Henan Finance University, Zhengzhou 451464, China
7
Han F, Zhang Z, Zhang H, Nakaya J, Kudo K, Ogasawara K. Extraction and Quantification of Words Representing Degrees of Diseases: Combining the Fuzzy C-Means Method and Gaussian Membership. JMIR Form Res 2022; 6:e38677. [DOI: 10.2196/38677] [Received: 04/13/2022] [Revised: 09/29/2022] [Accepted: 10/24/2022] [Indexed: 11/19/2022]
Abstract
Background
With the growth of medical data collection, a large amount of clinical data has been generated. These unstructured data contain substantial information. Extracting useful knowledge from this data and making scientific decisions for diagnosing and treating diseases have become increasingly necessary. Unstructured data, such as in the Medical Information Mart for Intensive Care III (MIMIC-III) data set, contain several ambiguous words that demonstrate the subjectivity of doctors, such as descriptions of patient symptoms. These data could be used to further improve the accuracy of medical diagnostic system assessments. To the best of our knowledge, there is currently no method for extracting subjective words that express the extent of these symptoms (hereinafter, “degree words”).
Objective
Therefore, we propose using the fuzzy c-means (FCM) method and Gaussian membership to quantify the degree words in the clinical medical data set MIMIC-III.
Methods
First, we preprocessed the 381,091 radiology reports collected in MIMIC-III, and then we used the FCM method to extract degree words from unstructured text. Thereafter, we used the Gaussian membership method to quantify the extracted degree words, which transforms the fuzzy words extracted from the medical text into computer-recognizable numbers.
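To make the quantification step more concrete, the Python sketch below evaluates the standard Gaussian membership function, exp(-(x - c)^2 / (2 sigma^2)), for a few degree words. The abstract does not give the authors' exact formulation or parameters, so the 1-to-5 scale, the centers, and the widths used here are hypothetical values chosen only for illustration.

```python
import math


def gaussian_membership(x: float, center: float, sigma: float) -> float:
    """Membership degree in [0, 1] of x in a fuzzy set centered at `center`."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


# Hypothetical fuzzy sets on an illustrative 1-5 severity scale:
# degree word -> (center, sigma). These numbers are NOT from the paper.
degree_sets = {"low": (1.0, 0.8), "moderate": (3.0, 0.8), "high": (5.0, 0.8)}


def memberships(x: float) -> dict:
    """Membership of a numeric value x in every degree-word fuzzy set."""
    return {word: gaussian_membership(x, c, s) for word, (c, s) in degree_sets.items()}


scores = memberships(3.0)
best = max(scores, key=scores.get)
print(best, {word: round(value, 3) for word, value in scores.items()})
```

A value sitting exactly on a set's center has membership 1, and membership falls off smoothly with distance, which is what allows an ambiguous word to be mapped to a numeric range rather than a single hard threshold.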
Results
The results showed that the digitization of ambiguous words in medical texts is feasible. The words representing each degree of each disease had a range of corresponding values. Examples of membership medians were 2.971 (atelectasis), 3.121 (pneumonia), 2.899 (pneumothorax), 3.051 (pulmonary edema), and 2.435 (pulmonary embolus). Additionally, all extracted words contained the same subjective words (low, high, etc), which allows for an objective evaluation method. Furthermore, we will verify the specific impact of the quantification results of ambiguous words such as symptom words and degree words on the use of medical texts in subsequent studies. These same ambiguous words may be used as a new set of feature values to represent the disorders.
Conclusions
This study proposes an innovative method for handling subjective words. We used the FCM method to extract the subjective degree words in the English-interpreted report of the MIMIC-III and then used the Gaussian functions to quantify the subjective degree words. In this method, words containing subjectivity in unstructured texts can be automatically processed and transformed into numerical ranges by digital processing. It was concluded that the digitization of ambiguous words in medical texts is feasible.
8
Lindvall C, Deng CY, Agaronnik ND, Kwok A, Samineni S, Umeton R, Mackie-Jenkins W, Kehl KL, Tulsky JA, Enzinger AC. Deep Learning for Cancer Symptoms Monitoring on the Basis of Electronic Health Record Unstructured Clinical Notes. JCO Clin Cancer Inform 2022; 6:e2100136. [PMID: 35714301 PMCID: PMC9232368 DOI: 10.1200/cci.21.00136] [Indexed: 11/20/2022]
Abstract
PURPOSE Symptoms are vital outcomes for cancer clinical trials, observational research, and population-level surveillance. Patient-reported outcomes (PROs) are valuable for monitoring symptoms, yet there are many challenges to collecting PROs at scale. We sought to develop, test, and externally validate a deep learning model to extract symptoms from unstructured clinical notes in the electronic health record. METHODS We randomly selected 1,225 outpatient progress notes from among patients treated at the Dana-Farber Cancer Institute between January 2016 and December 2019 and used 1,125 notes as our training/validation data set and 100 notes as our test data set. We evaluated the performance of 10 deep learning models for detecting 80 symptoms included in the National Cancer Institute's Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) framework. Model performance as compared with manual chart abstraction was assessed using standard metrics, and the highest performer was externally validated on a sample of 100 physician notes from a different clinical context. RESULTS In our training and test data sets, 75 of the 80 candidate symptoms were identified. The ELECTRA-small model had the highest performance for symptom identification at the token level (ie, at the individual symptom level), with an F1 of 0.87 and a processing time of 3.95 seconds per note. For the 10 most common symptoms in the test data set, the F1 score ranged from 0.98 for anxious to 0.86 for fatigue. For external validation of the same symptoms, the note-level performance ranged from F1 = 0.97 for diarrhea and dizziness to F1 = 0.73 for swelling. CONCLUSION Training a deep learning model to identify a wide range of electronic health record-documented symptoms relevant to cancer care is feasible. This approach could be used at the health system scale to complement electronic PROs.
Affiliation(s)
- Charlotta Lindvall
  - Dana-Farber Cancer Institute, Boston, MA
  - Harvard Medical School, Boston, MA
  - Brigham and Women's Hospital, Boston, MA
- Nicole D Agaronnik
  - Dana-Farber Cancer Institute, Boston, MA
  - Harvard Medical School, Boston, MA
- Anne Kwok
  - Dana-Farber Cancer Institute, Boston, MA
- Kenneth L Kehl
  - Dana-Farber Cancer Institute, Boston, MA
  - Harvard Medical School, Boston, MA
  - Brigham and Women's Hospital, Boston, MA
- James A Tulsky
  - Dana-Farber Cancer Institute, Boston, MA
  - Harvard Medical School, Boston, MA
  - Brigham and Women's Hospital, Boston, MA
- Andrea C Enzinger
  - Dana-Farber Cancer Institute, Boston, MA
  - Harvard Medical School, Boston, MA
  - Brigham and Women's Hospital, Boston, MA
9
Chopard D, Treder MS, Corcoran P, Ahmed N, Johnson C, Busse M, Spasic I. Text Mining of Adverse Events in Clinical Trials: Deep Learning Approach. JMIR Med Inform 2021; 9:e28632. [PMID: 34951601 PMCID: PMC8742206 DOI: 10.2196/28632] [Received: 03/09/2021] [Revised: 08/01/2021] [Accepted: 11/14/2021] [Indexed: 11/28/2022]
Abstract
BACKGROUND Pharmacovigilance and safety reporting, which involve processes for monitoring the use of medicines in clinical trials, play a critical role in the identification of previously unrecognized adverse events or changes in the patterns of adverse events. OBJECTIVE This study aims to demonstrate the feasibility of automating the coding of adverse events described in the narrative section of the serious adverse event report forms to enable statistical analysis of the aforementioned patterns. METHODS We used the Unified Medical Language System (UMLS) as the coding scheme, which integrates 217 source vocabularies, thus enabling coding against other relevant terminologies such as the International Classification of Diseases-10th Revision, Medical Dictionary for Regulatory Activities, and Systematized Nomenclature of Medicine. We used MetaMap, a highly configurable dictionary lookup software, to identify the mentions of the UMLS concepts. We trained a binary classifier using Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model that captures contextual relationships, to differentiate between mentions of the UMLS concepts that represented adverse events and those that did not. RESULTS The model achieved a high F1 score of 0.8080, despite the class imbalance. This is 10.15 percentage points lower than human-like performance but also 17.45 percentage points higher than that of the baseline approach. CONCLUSIONS These results confirmed that automated coding of adverse events described in the narrative section of serious adverse event reports is feasible. Once coded, adverse events can be statistically analyzed so that any correlations with the trialed medicines can be estimated in a timely fashion.
Affiliation(s)
- Daphne Chopard
  - School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Matthias S Treder
  - School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Padraig Corcoran
  - School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Nagheen Ahmed
  - Centre for Trials Research, Cardiff University, Cardiff, United Kingdom
- Claire Johnson
  - Centre for Trials Research, Cardiff University, Cardiff, United Kingdom
- Monica Busse
  - Centre for Trials Research, Cardiff University, Cardiff, United Kingdom
- Irena Spasic
  - School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
10
Segev A, Iqbal E, McDonagh TA, Casetta C, Oloyede E, Piper S, Plymen CM, MacCabe JH. Clozapine-induced myocarditis: electronic health register analysis of incidence, timing, clinical markers and diagnostic accuracy. Br J Psychiatry 2021; 219:644-651. [PMID: 35048875 PMCID: PMC8636612 DOI: 10.1192/bjp.2021.58] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/13/2020] [Revised: 04/07/2021] [Accepted: 04/13/2021] [Indexed: 01/08/2023]
Abstract
BACKGROUND Clozapine is associated with increased risk of myocarditis. However, many common side-effects of clozapine overlap with the clinical manifestations of myocarditis. As a result, there is uncertainty about which signs, symptoms and investigations are important in distinguishing myocarditis from benign adverse effects of clozapine. Clarity on this issue is important, since missing a diagnosis of myocarditis or discontinuing clozapine unnecessarily may both have devastating consequences. AIMS To examine the clinical characteristics of clozapine-induced myocarditis and to identify which signs and symptoms distinguish true myocarditis from other clozapine adverse effects. METHOD A retrospective analysis of the record database for 247 621 patients was performed. A natural language processing algorithm identified the instances of patients in which myocarditis was suspected. The anonymised case notes for the patients of each suspected instance were then manually examined, and those whose instances were ambiguous were referred for an independent assessment by up to three cardiologists. Patients with suspected instances were classified as having confirmed myocarditis, myocarditis ruled out or undetermined. RESULTS Of 254 instances in 228 patients with suspected myocarditis, 11.4% (n = 29 instances) were confirmed as probable myocarditis. Troponin and C-reactive protein (CRP) had excellent diagnostic value (area under the curve 0.975 and 0.896, respectively), whereas tachycardia was of little diagnostic value. All confirmed instances occurred within 42 days of clozapine initiation. CONCLUSIONS Suspicion of myocarditis can lead to unnecessary discontinuation of clozapine. The 'critical period' for myocarditis emergence is the first 6 weeks, and clinical signs including tachycardia are of low specificity. Elevated CRP and troponin are the best markers for the need for further evaluation.
Affiliation(s)
- Aviv Segev
- Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, UK; Shalvata Mental Health Centre, Israel; and Sackler Faculty of Medicine, Tel Aviv University, Israel
- Ehtesham Iqbal
- The Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, UK
- Theresa A. McDonagh
- Cardiology Department, King's College Hospital and King's College London, UK
- Cecilia Casetta
- Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, UK; and National Psychosis Service, South London and Maudsley NHS Foundation Trust, UK
- Ebenezer Oloyede
- Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, UK; and Pharmacy Department, South London and Maudsley NHS Foundation Trust, UK
- Susan Piper
- Cardiology Department, King's College Hospital and King's College London, UK
- Carla M. Plymen
- Cardiology Department, Hammersmith Hospital, Imperial College Healthcare NHS Trust, UK
- James H. MacCabe
- Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, UK; and National Psychosis Service, South London and Maudsley NHS Foundation Trust, UK
|
11
|
de Oliveira JM, da Costa CA, Antunes RS. Data structuring of electronic health records: a systematic review. Health and Technology 2021. [DOI: 10.1007/s12553-021-00607-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/10/2023]
|
12
|
Wasylewicz A, van de Burgt B, Weterings A, Jessurun N, Korsten E, Egberts T, Bouwman A, Kerskes M, Grouls R, van der Linden C. Identifying adverse drug reactions from free-text electronic hospital health record notes. Br J Clin Pharmacol 2021; 88:1235-1245. [PMID: 34468999 PMCID: PMC9292762 DOI: 10.1111/bcp.15068] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/20/2021] [Revised: 08/16/2021] [Accepted: 08/18/2021] [Indexed: 11/30/2022]
Abstract
Background Adverse drug reactions (ADRs) are estimated to be the fifth leading cause of hospital death. Up to 50% are potentially preventable and a significant number are recurrent (reADRs). Clinical decision support systems have been used to prevent reADRs using structured reporting concerning the patient's ADR experience, which in current clinical practice is poorly performed. Identifying ADRs directly from free text in electronic health records (EHRs) could circumvent this. Aim To develop strategies to identify ADRs from free‐text notes in electronic hospital health records. Methods In stage I, the EHRs of 10 patients were reviewed to establish strategies for identifying ADRs. In stage II, complete EHR histories of 45 patients were reviewed for ADRs and compared to the strategies programmed into a rule‐based model. ADRs were classified using MedDRA and included in the study if the Naranjo causality score was ≥1. Seriousness was assessed using the European Medicines Agency's important medical event list. Results In stage I, two main search strategies were identified: keywords indicating an ADR and specific prepositions followed by medication names. In stage II, the EHRs contained a median of 7.4 (range 0.01‐18) years of medical history covering over 35 000 notes. A total of 318 unique ADRs were identified of which 63 were potentially serious and 179 (sensitivity 57%) were identified by the rule. The method falsely identified 377 ADRs (positive predictive value 32%). However, it also identified an additional eight ADRs. Conclusion Two key strategies were developed to identify ADRs from hospital EHRs using free‐text notes. The results appear promising and warrant further study.
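The two search strategies named in this abstract (ADR keywords, and a preposition followed by a medication name) can be illustrated with a minimal Python sketch. It is not the authors' rule-based model: the term lists below are hypothetical English stand-ins (the study worked on Dutch notes with its own lists), and the function names are invented for illustration. The `ppv` helper assumes that the 179 rule-identified ADRs and the 377 falsely identified ADRs are the true and false positives behind the reported positive predictive value.

```python
import re

# Hypothetical English stand-ins; the study's actual Dutch term lists are not given here.
ADR_KEYWORDS = ["side effect", "adverse reaction", "allergic to", "intolerance"]
PREPOSITIONS = ["after", "due to", "on", "from"]
MEDICATIONS = ["amoxicillin", "metoprolol", "simvastatin"]


def _alternation(terms):
    # Longest terms first so multi-word terms are tried before shorter ones.
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


# Strategy 1: a keyword that indicates an ADR.
KEYWORD_RE = re.compile(r"\b(?:%s)\b" % _alternation(ADR_KEYWORDS), re.IGNORECASE)

# Strategy 2: a preposition immediately followed by a medication name.
PREP_DRUG_RE = re.compile(
    r"\b(?:%s)\s+(%s)\b" % (_alternation(PREPOSITIONS), _alternation(MEDICATIONS)),
    re.IGNORECASE,
)


def flag_note(note):
    """Report which of the two strategies fire on a free-text note."""
    return {
        "keyword": bool(KEYWORD_RE.search(note)),
        "preposition_drug": [m.group(1).lower() for m in PREP_DRUG_RE.finditer(note)],
    }


def ppv(true_pos, false_pos):
    """Positive predictive value: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    print(flag_note("Rash after amoxicillin, treatment stopped."))
    print(flag_note("Known side effect: dizziness."))
    print(round(100 * ppv(179, 377)))  # counts taken from the abstract
```

A rule set this loose flags any mention that matches, which is consistent with the high false-positive count the abstract reports; a real system would need negation handling and a curated drug dictionary.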
Affiliation(s)
- Arthur Wasylewicz
- Department of Healthcare Intelligence, Catharina Hospital, Eindhoven, the Netherlands
- Britt van de Burgt
- Department of Healthcare Intelligence, Catharina Hospital, Eindhoven, the Netherlands
- Aniek Weterings
- Department of Geriatrics, Catharina Hospital, Eindhoven, the Netherlands
- Naomi Jessurun
- Netherlands Pharmacovigilance Centre LAREB, 's-Hertogenbosch, the Netherlands
- Erik Korsten
- Department of Healthcare Intelligence, Catharina Hospital, Eindhoven, the Netherlands; Department of Signal Processing Systems, Faculty of Electronic Engineering, Eindhoven University of Technology, Eindhoven, the Netherlands
- Toine Egberts
- Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Department of Pharmaceutical Sciences, Faculty of Science, Utrecht University, Utrecht, the Netherlands; Department of Clinical Pharmacy, University Medical Centre Utrecht, Utrecht, the Netherlands
- Arthur Bouwman
- Department of Signal Processing Systems, Faculty of Electronic Engineering, Eindhoven University of Technology, Eindhoven, the Netherlands; Department of Anesthesiology, Catharina Hospital, Eindhoven, the Netherlands
- Marieke Kerskes
- Department of Clinical Pharmacy, Catharina Hospital, Eindhoven, the Netherlands
- René Grouls
- Department of Clinical Pharmacy, Catharina Hospital, Eindhoven, the Netherlands
|
13
|
Murugadoss K, Rajasekharan A, Malin B, Agarwal V, Bade S, Anderson JR, Ross JL, Faubion WA, Halamka JD, Soundararajan V, Ardhanari S. Building a best-in-class automated de-identification tool for electronic health records through ensemble learning. Patterns (New York, N.Y.) 2021; 2:100255. [PMID: 34179842 PMCID: PMC8212138 DOI: 10.1016/j.patter.2021.100255] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/06/2021] [Revised: 02/24/2021] [Accepted: 04/07/2021] [Indexed: 10/29/2022]
Abstract
The presence of personally identifiable information (PII) in natural language portions of electronic health records (EHRs) constrains their broad reuse. Despite continuous improvements in automated detection of PII, residual identifiers require manual validation and correction. Here, we describe an automated de-identification system that employs an ensemble architecture, incorporating attention-based deep-learning models and rule-based methods, supported by heuristics for detecting PII in EHR data. Detected identifiers are then transformed into plausible, though fictional, surrogates to further obfuscate any leaked identifier. Our approach outperforms existing tools, with a recall of 0.992 and precision of 0.979 on the i2b2 2014 dataset and a recall of 0.994 and precision of 0.967 on a dataset of 10,000 notes from the Mayo Clinic. The de-identification system presented here enables the generation of de-identified patient data at the scale required for modern machine-learning applications to help accelerate medical discoveries.
Affiliation(s)
- Bradley Malin
- Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Jeff R. Anderson
- Mayo Clinic, Rochester, MN 55905, USA
- Mayo Clinic Platform, Rochester, MN 55905, USA
- John D. Halamka
- Mayo Clinic, Rochester, MN 55905, USA
- Mayo Clinic Platform, Rochester, MN 55905, USA
|
14
|
Koleck TA, Tatonetti NP, Bakken S, Mitha S, Henderson MM, George M, Miaskowski C, Smaldone A, Topaz M. Identifying Symptom Information in Clinical Notes Using Natural Language Processing. Nurs Res 2021; 70:173-183. [PMID: 33196504 PMCID: PMC9109773 DOI: 10.1097/nnr.0000000000000488] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/22/2022]
Abstract
BACKGROUND Symptoms are a core concept of nursing interest. Large-scale secondary data reuse of notes in electronic health records (EHRs) has the potential to increase the quantity and quality of symptom research. However, the symptom language used in clinical notes is complex. A need exists for methods designed specifically to identify and study symptom information from EHR notes. OBJECTIVES We aim to describe a method that combines standardized vocabularies, clinical expertise, and natural language processing to generate comprehensive symptom vocabularies and identify symptom information in EHR notes. We piloted this method with five diverse symptom concepts: constipation, depressed mood, disturbed sleep, fatigue, and palpitations. METHODS First, we obtained synonym lists for each pilot symptom concept from the Unified Medical Language System. Then, we used two large bodies of text (clinical notes from Columbia University Irving Medical Center and PubMed abstracts containing Medical Subject Headings or key words related to the pilot symptoms) to further expand our initial vocabulary of synonyms for each pilot symptom concept. We used NimbleMiner, an open-source natural language processing tool, to accomplish these tasks and evaluated NimbleMiner symptom identification performance by comparison to a manually annotated set of nurse- and physician-authored common EHR note types. RESULTS Compared to the baseline Unified Medical Language System synonym lists, we identified up to 11 times more additional synonym words or expressions, including abbreviations, misspellings, and unique multiword combinations, for each symptom concept. Natural language processing system symptom identification performance was excellent. DISCUSSION Using our comprehensive symptom vocabularies and NimbleMiner to label symptoms in clinical notes produced excellent performance metrics. 
The ability to extract symptom information from EHR notes in an accurate and scalable manner has the potential to greatly facilitate symptom science research.
|
15
|
Al-Harrasi AM, Iqbal E, Tsamakis K, Lasek J, Gadelrab R, Soysal P, Kohlhoff E, Tsiptsios D, Rizos E, Perera G, Aarsland D, Stewart R, Mueller C. Motor signs in Alzheimer's disease and vascular dementia: Detection through natural language processing, co-morbid features and relationship to adverse outcomes. Exp Gerontol 2021; 146:111223. [PMID: 33450346 DOI: 10.1016/j.exger.2020.111223] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/24/2020] [Revised: 11/09/2020] [Accepted: 12/21/2020] [Indexed: 11/21/2022]
Abstract
BACKGROUND Motor signs in patients with dementia are associated with a higher risk of cognitive decline, institutionalisation, death and increased health care costs, but prevalences differ between studies. The aims of this study were to employ a natural language processing pipeline to detect motor signs in a patient cohort in routine care; to explore which other difficulties occur co-morbid to motor signs; and whether these, as a group and individually, predict adverse outcomes. METHODS A cohort of 11,106 patients with dementia in Alzheimer's disease, vascular dementia or a combination was assembled from a large dementia care health records database in Southeast London. A natural language processing algorithm was devised in order to establish the presence of motor signs (bradykinesia, Parkinsonian gait, rigidity, tremor) recorded around the time of dementia diagnosis. We examined the co-morbidity profile of patients with these symptoms and used Cox regression models to analyse associations with survival and hospitalisation, adjusting for twenty-four potential confounders. RESULTS Less than 10% of patients were recorded to display any motor sign, and tremor was most frequently detected. Presence of motor signs was associated with younger age at diagnosis, neuropsychiatric symptoms, poor physical health and higher prescribing of psychotropics. Rigidity was independently associated with a 23% increased mortality risk after adjustment for confounders (p = 0.014). A non-significant trend for a 15% higher risk of hospitalisation was detected in those with a recorded Parkinsonian gait (p = 0.094). CONCLUSIONS With the exception of tremor, motor signs appear to be under-recorded in routine care. They are part of a complex clinical picture and often accompanied by neuropsychiatric and functional difficulties, and thereby associated with adverse outcomes. This underlines the need to establish structured examinations in routine clinical practice via easy-to-use tools.
Affiliation(s)
- Ahmed M Al-Harrasi
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, London, UK; Sultan Qaboos University Hospital, Muscat, Oman
- Ehtesham Iqbal
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, London, UK
- Konstantinos Tsamakis
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, London, UK; National and Kapodistrian University of Athens, School of Medicine, Second Department of Psychiatry, University General Hospital 'ATTIKON', Athens, Greece
- Judista Lasek
- South London and Maudsley NHS Foundation Trust, London, UK
- Pinar Soysal
- Department of Geriatric Medicine, Faculty of Medicine, Bezmialem Vakif University, Istanbul, Turkey
- Enno Kohlhoff
- Aragon Institute for Health Research (IIS Aragón), Zaragoza, Spain
- Dimitrios Tsiptsios
- Neurophysiology Department, Sunderland Royal Hospital, South Tyneside & Sunderland NHS Foundation Trust, Sunderland, UK
- Emmanouil Rizos
- National and Kapodistrian University of Athens, School of Medicine, Second Department of Psychiatry, University General Hospital 'ATTIKON', Athens, Greece
- Gayan Perera
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, London, UK
- Dag Aarsland
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, London, UK; South London and Maudsley NHS Foundation Trust, London, UK; Centre for Age-Related Medicine (SESAM), Stavanger University Hospital, Stavanger, Norway
- Robert Stewart
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, London, UK; South London and Maudsley NHS Foundation Trust, London, UK
- Christoph Mueller
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, London, UK; South London and Maudsley NHS Foundation Trust, London, UK
|
16
|
Iqbal E, Govind R, Romero A, Dzahini O, Broadbent M, Stewart R, Smith T, Kim CH, Werbeloff N, MacCabe JH, Dobson RJB, Ibrahim ZM. The side effect profile of Clozapine in real world data of three large mental health hospitals. PLoS One 2020; 15:e0243437. [PMID: 33290433 PMCID: PMC7723266 DOI: 10.1371/journal.pone.0243437] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 02/18/2020] [Accepted: 11/22/2020] [Indexed: 12/30/2022]
Abstract
OBJECTIVE Mining the data contained within Electronic Health Records (EHRs) can potentially generate a greater understanding of medication effects in the real world, complementing what we know from randomised controlled trials (RCTs). We propose a text mining approach to detect adverse events and medication episodes from the clinical text to enhance our understanding of adverse effects related to Clozapine, the most effective antipsychotic drug for the management of treatment-resistant schizophrenia, but underutilised due to concerns over its side effects. MATERIAL AND METHODS We used data from de-identified EHRs of three mental health trusts in the UK (>50 million documents, over 500,000 patients, 2835 of whom were prescribed Clozapine). We explored the prevalence of 33 adverse effects by age, gender, ethnicity, smoking status and admission type three months before and after the patients started Clozapine treatment. Where possible, we compared the prevalence of adverse effects with those reported in the Side Effects Resource (SIDER). RESULTS Sedation, fatigue, agitation, dizziness, hypersalivation, weight gain, tachycardia, headache, constipation and confusion were amongst the most frequently recorded Clozapine adverse effects in the three months following the start of treatment. Higher percentages of all adverse effects were found in the first month of Clozapine therapy. Using a significance level of p < 0.05, our chi-square tests show a significant association between most of the ADRs and smoking status and hospital admission, and some in gender, ethnicity and age groups in all trust hospitals. Later we combined the data from the three trust hospitals to estimate the average effect of ADRs in each monthly interval. In gender and ethnicity, the results show significant association in 7 out of 33 ADRs, smoking status shows significant association in 21 out of 33 ADRs and hospital admission shows significant association in 30 out of 33 ADRs. CONCLUSION A better understanding of how drugs work in the real world can complement clinical trials.
Affiliation(s)
- Ehtesham Iqbal
- The Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Risha Govind
- The Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Alvin Romero
- SLAM BioResource for Mental Health, South London and Maudsley NHS Foundation Trust and King’s College London, London, United Kingdom
- Olubanke Dzahini
- Pharmacy Department, South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Matthew Broadbent
- NIHR Biomedical Research Centre for Mental Health, South London and Maudsley NHS Foundation, London, United Kingdom
- Biomedical Research Unit for Dementia, South London and Maudsley NHS Foundation, London, United Kingdom
- Robert Stewart
- NIHR Biomedical Research Centre for Mental Health, South London and Maudsley NHS Foundation, London, United Kingdom
- Biomedical Research Unit for Dementia, South London and Maudsley NHS Foundation, London, United Kingdom
- Department of Health Service & Population Research, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Tanya Smith
- Oxford Health NHS Foundation Trust, Oxford, United Kingdom
- NIHR Oxford Health Biomedical Research Centre, University of Oxford and Oxford Health NHS Foundation Trust, Oxford, United Kingdom
- Chi-Hun Kim
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- Nomi Werbeloff
- UCL Division of Psychiatry, University College London, London, United Kingdom
- Camden and Islington, NHS Foundation Trust, London, United Kingdom
- James H. MacCabe
- NIHR Biomedical Research Centre for Mental Health, South London and Maudsley NHS Foundation, London, United Kingdom
- Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, De Crespigny Park, London, United Kingdom
- Richard J. B. Dobson
- The Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- NIHR Biomedical Research Centre for Mental Health, South London and Maudsley NHS Foundation, London, United Kingdom
- Biomedical Research Unit for Dementia, South London and Maudsley NHS Foundation, London, United Kingdom
- The Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, United Kingdom
- NIHR Biomedical Research Centre, University College London Hospitals, London, United Kingdom
- Zina M. Ibrahim
- The Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- NIHR Biomedical Research Centre for Mental Health, South London and Maudsley NHS Foundation, London, United Kingdom
- Biomedical Research Unit for Dementia, South London and Maudsley NHS Foundation, London, United Kingdom
- The Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, United Kingdom
- NIHR Biomedical Research Centre, University College London Hospitals, London, United Kingdom
|
17
|
Jones KH, Ford EM, Lea N, Griffiths LJ, Hassan L, Heys S, Squires E, Nenadic G. Toward the Development of Data Governance Standards for Using Clinical Free-Text Data in Health Research: Position Paper. J Med Internet Res 2020; 22:e16760. [PMID: 32597785 PMCID: PMC7367542 DOI: 10.2196/16760] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/22/2019] [Revised: 03/06/2020] [Accepted: 03/23/2020] [Indexed: 01/17/2023]
Abstract
BACKGROUND Clinical free-text data (eg, outpatient letters or nursing notes) represent a vast, untapped source of rich information that, if more accessible for research, would clarify and supplement information coded in structured data fields. Data usually need to be deidentified or anonymized before they can be reused for research, but there is a lack of established guidelines to govern effective deidentification and use of free-text information and avoid damaging data utility as a by-product. OBJECTIVE This study aimed to develop recommendations for the creation of data governance standards to integrate with existing frameworks for personal data use, to enable free-text data to be used safely for research for patient and public benefit. METHODS We outlined data protection legislation and regulations relating to the United Kingdom for context and conducted a rapid literature review and UK-based case studies to explore data governance models used in working with free-text data. We also engaged with stakeholders, including text-mining researchers and the general public, to explore perceived barriers and solutions in working with clinical free-text. RESULTS We proposed a set of recommendations, including the need for authoritative guidance on data governance for the reuse of free-text data, to ensure public transparency in data flows and uses, to treat deidentified free-text data as potentially identifiable with use limited to accredited data safe havens, and to commit to a culture of continuous improvement to understand the relationships between the efficacy of deidentification and reidentification risks, so this can be communicated to all stakeholders. CONCLUSIONS By drawing together the findings of a combination of activities, we present a position paper to contribute to the development of data governance standards for the reuse of clinical free-text data for secondary purposes. 
While working in accordance with existing data governance frameworks, there is a need for further work to take forward the recommendations we have proposed, with commitment and investment, to assure and expand the safe reuse of clinical free-text data for public benefit.
Affiliation(s)
- Kerina H Jones
- Population Data Science, Medical School, Swansea University, Swansea, United Kingdom
- Nathan Lea
- Institute of Health Informatics, University College London, London, United Kingdom
- Lucy J Griffiths
- Population Data Science, Medical School, Swansea University, Swansea, United Kingdom
- Lamiece Hassan
- Division of Informatics, Imaging & Data Sciences, University of Manchester, Manchester, United Kingdom
- Sharon Heys
- Population Data Science, Medical School, Swansea University, Swansea, United Kingdom
- Emma Squires
- Population Data Science, Medical School, Swansea University, Swansea, United Kingdom
- Goran Nenadic
- Department of Computer Science, University of Manchester & The Alan Turing Institute, Manchester, United Kingdom
|
18
|
Norgeot B, Muenzen K, Peterson TA, Fan X, Glicksberg BS, Schenk G, Rutenberg E, Oskotsky B, Sirota M, Yazdany J, Schmajuk G, Ludwig D, Goldstein T, Butte AJ. Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes. NPJ Digit Med 2020; 3:57. [PMID: 32337372 PMCID: PMC7156708 DOI: 10.1038/s41746-020-0258-y] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 05/17/2019] [Accepted: 03/02/2020] [Indexed: 11/29/2022]
Abstract
There is a great and growing need to ascertain what exactly is the state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical record data. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see any substantial real-world use, primarily because they have been trained with too small medical text corpora. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed a customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods.
Affiliation(s)
- Beau Norgeot
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA USA
- Kathleen Muenzen
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA USA
- Thomas A. Peterson
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA USA
- Xuancheng Fan
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA USA
- Benjamin S. Glicksberg
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA USA
- Gundolf Schenk
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA USA
- Eugenia Rutenberg
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA USA
- Boris Oskotsky
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA USA
- Marina Sirota
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA USA
- Jinoos Yazdany
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, San Francisco, CA USA
- Gabriela Schmajuk
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, San Francisco, CA USA
- San Francisco Veterans Affairs Medical Center, San Francisco, CA USA
- Dana Ludwig
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA USA
- Theodore Goldstein
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA USA
- Atul J. Butte
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA USA
- Center for Data-Driven Insights and Innovation, University of California Health, Oakland, CA USA
|
19
|
Koleck TA, Dreisbach C, Bourne PE, Bakken S. Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J Am Med Inform Assoc 2020; 26:364-379. [PMID: 30726935 DOI: 10.1093/jamia/ocy173] [Citation(s) in RCA: 206] [Impact Index Per Article: 41.2] [Received: 06/29/2018] [Revised: 11/20/2018] [Accepted: 11/27/2018] [Indexed: 12/26/2022]
Abstract
OBJECTIVE Natural language processing (NLP) of symptoms from electronic health records (EHRs) could contribute to the advancement of symptom science. We aim to synthesize the literature on the use of NLP to process or analyze symptom information documented in EHR free-text narratives. MATERIALS AND METHODS Our search of 1964 records from PubMed and EMBASE was narrowed to 27 eligible articles. Data related to the purpose, free-text corpus, patients, symptoms, NLP methodology, evaluation metrics, and quality indicators were extracted for each study. RESULTS Symptom-related information was presented as a primary outcome in 14 studies. EHR narratives represented various inpatient and outpatient clinical specialties, with general, cardiology, and mental health occurring most frequently. Studies encompassed a wide variety of symptoms, including shortness of breath, pain, nausea, dizziness, disturbed sleep, constipation, and depressed mood. NLP approaches included previously developed NLP tools, classification methods, and manually curated rule-based processing. Only one-third (n = 9) of studies reported patient demographic characteristics. DISCUSSION NLP is used to extract information from EHR free-text narratives written by a variety of healthcare providers on an expansive range of symptoms across diverse clinical specialties. The current focus of this field is on the development of methods to extract symptom information and the use of symptom information for disease classification tasks rather than the examination of symptoms themselves. CONCLUSION Future NLP studies should concentrate on the investigation of symptoms and symptom documentation in EHR free-text narratives. Efforts should be undertaken to examine patient characteristics and make symptom-related NLP algorithms or pipelines and vocabularies openly available.
Affiliation(s)
- Caitlin Dreisbach
- School of Nursing, University of Virginia, Charlottesville, Virginia, USA
- Data Science Institute, University of Virginia, Charlottesville, Virginia, USA
- Philip E Bourne
- Data Science Institute, University of Virginia, Charlottesville, Virginia, USA
- Suzanne Bakken
- School of Nursing, Columbia University, New York, New York, USA
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Data Science Institute, Columbia University, New York, New York, USA

20
Ju M, Nguyen NTH, Miwa M, Ananiadou S. An ensemble of neural models for nested adverse drug events and medication extraction with subwords. J Am Med Inform Assoc 2020; 27:22-30. [PMID: 31197355 PMCID: PMC6913208 DOI: 10.1093/jamia/ocz075] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 01/30/2019] [Revised: 03/22/2019] [Accepted: 05/07/2019] [Indexed: 11/13/2022]
Abstract
OBJECTIVE This article describes an ensembling system to automatically extract adverse drug events and drug related entities from clinical narratives, which was developed for the 2018 n2c2 Shared Task Track 2. MATERIALS AND METHODS We designed a neural model to tackle both nested (entities embedded in other entities) and polysemous entities (entities annotated with multiple semantic types) based on MIMIC III discharge summaries. To better represent rare and unknown words in entities, we further tokenized the MIMIC III data set by splitting the words into finer-grained subwords. We finally combined all the models to boost the performance. Additionally, we implemented a featured-based conditional random field model and created an ensemble to combine its predictions with those of the neural model. RESULTS Our method achieved 92.78% lenient micro F1-score, with 95.99% lenient precision, and 89.79% lenient recall, respectively. Experimental results showed that combining the predictions of either multiple models, or of a single model with different settings can improve performance. DISCUSSION Analysis of the development set showed that our neural models can detect more informative text regions than feature-based conditional random field models. Furthermore, most entity types significantly benefit from subword representation, which also allows us to extract sparse entities, especially nested entities. CONCLUSION The overall results have demonstrated that the ensemble method can accurately recognize entities, including nested and polysemous entities. Additionally, our method can recognize sparse entities by reconsidering the clinical narratives at a finer-grained subword level, rather than at the word level.
Affiliation(s)
- Meizhi Ju
- National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
- Artificial Intelligence Research Centre (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Nhung T H Nguyen
- National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
- Artificial Intelligence Research Centre (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Makoto Miwa
- Toyota Technological Institute, Nagoya, Japan
- Artificial Intelligence Research Centre (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Sophia Ananiadou
- National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
- Artificial Intelligence Research Centre (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan

21
König M, Sander A, Demuth I, Diekmann D, Steinhagen-Thiessen E. Knowledge-based best of breed approach for automated detection of clinical events based on German free text digital hospital discharge letters. PLoS One 2019; 14:e0224916. [PMID: 31774830 PMCID: PMC6881027 DOI: 10.1371/journal.pone.0224916] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/16/2019] [Accepted: 10/24/2019] [Indexed: 12/26/2022]
Abstract
Objectives The secondary use of medical data contained in electronic medical records, such as hospital discharge letters, is a valuable resource for the improvement of clinical care (e.g. in terms of medication safety) or for research purposes. However, the automated processing and analysis of medical free text still poses a huge challenge to available natural language processing (NLP) systems. The aim of this study was to implement a knowledge-based best of breed approach, combining a terminology server with integrated ontology, an NLP pipeline and a rules engine. Methods We tested the performance of this approach in a use case. The clinical event of interest was the particular drug-disease interaction “proton-pump inhibitor [PPI] use and osteoporosis”. Cases were to be identified based on free text digital discharge letters as source of information. Automated detection was validated against a gold standard. Results Precision of recognition of osteoporosis was 94.19%, and recall was 97.45%. PPIs were detected with 100% precision and 97.97% recall. The F-score for the detection of the given drug-disease interaction was 96.13%. Conclusion We could show that our approach of combining an NLP pipeline, a terminology server, and a rules engine for the purpose of automated detection of clinical events such as drug-disease interactions from free text digital hospital discharge letters was effective. There is huge potential for the implementation in clinical and research contexts, as this approach enables analyses of very high numbers of medical free text documents within a short time period.
Affiliation(s)
- Maximilian König
- Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Lipid Clinic at Interdisciplinary Metabolism Center, Berlin, Germany
- Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Department of Nephrology and Internal Intensive Care Medicine, Berlin, Germany
- André Sander
- ID Information und Dokumentation im Gesundheitswesen GmbH, Berlin, Germany
- Ilja Demuth
- Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Lipid Clinic at Interdisciplinary Metabolism Center, Berlin, Germany
- Charité - Universitätsmedizin Berlin, BCRT—Berlin Institute of Health Center for Regenerative Therapies, Berlin, Germany
- Daniel Diekmann
- ID Information und Dokumentation im Gesundheitswesen GmbH, Berlin, Germany
- Elisabeth Steinhagen-Thiessen
- Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Lipid Clinic at Interdisciplinary Metabolism Center, Berlin, Germany

22
Natsiavas P, Malousi A, Bousquet C, Jaulent MC, Koutkias V. Computational Advances in Drug Safety: Systematic and Mapping Review of Knowledge Engineering Based Approaches. Front Pharmacol 2019; 10:415. [PMID: 31156424 PMCID: PMC6533857 DOI: 10.3389/fphar.2019.00415] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/27/2018] [Accepted: 04/02/2019] [Indexed: 12/12/2022]
Abstract
Drug Safety (DS) is a domain with significant public health and social impact. Knowledge Engineering (KE) is the Computer Science discipline elaborating on methods and tools for developing “knowledge-intensive” systems, depending on a conceptual “knowledge” schema and some kind of “reasoning” process. The present systematic and mapping review aims to investigate KE-based approaches employed for DS and highlight the introduced added value as well as trends and possible gaps in the domain. Journal articles published between 2006 and 2017 were retrieved from PubMed/MEDLINE and Web of Science® (873 in total) and filtered based on a comprehensive set of inclusion/exclusion criteria. The 80 finally selected articles were reviewed on full-text, while the mapping process relied on a set of concrete criteria (concerning specific KE and DS core activities, special DS topics, employed data sources, reference ontologies/terminologies, and computational methods, etc.). The analysis results are publicly available as online interactive analytics graphs. The review clearly depicted increased use of KE approaches for DS. The collected data illustrate the use of KE for various DS aspects, such as Adverse Drug Event (ADE) information collection, detection, and assessment. Moreover, the quantified analysis of using KE for the respective DS core activities highlighted room for intensifying research on KE for ADE monitoring, prevention and reporting. Finally, the assessed use of the various data sources for DS special topics demonstrated extensive use of dominant data sources for DS surveillance, i.e., Spontaneous Reporting Systems, but also increasing interest in the use of emerging data sources, e.g., observational healthcare databases, biochemical/genetic databases, and social media. 
Various exemplar applications were identified with promising results, e.g., improvement in Adverse Drug Reaction (ADR) prediction, detection of drug interactions, and novel ADE profiles related with specific mechanisms of action, etc. Nevertheless, since the reviewed studies mostly concerned proof-of-concept implementations, more intense research is required to increase the maturity level that is necessary for KE approaches to reach routine DS practice. In conclusion, we argue that efficiently addressing DS data analytics and management challenges requires the introduction of high-throughput KE-based methods for effective knowledge discovery and management, resulting ultimately, in the establishment of a continuous learning DS system.
Affiliation(s)
- Pantelis Natsiavas
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Andigoni Malousi
- Laboratory of Biological Chemistry, Department of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Cédric Bousquet
- Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Public Health and Medical Information Unit, University Hospital of Saint-Etienne, Saint-Étienne, France
- Marie-Christine Jaulent
- Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Vassilis Koutkias
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece

23
Fonferko-Shadrach B, Lacey AS, Roberts A, Akbari A, Thompson S, Ford DV, Lyons RA, Rees MI, Pickrell WO. Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the ExECT (extraction of epilepsy clinical text) system. BMJ Open 2019; 9:e023232. [PMID: 30940752 PMCID: PMC6500195 DOI: 10.1136/bmjopen-2018-023232] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 04/03/2018] [Revised: 01/23/2019] [Accepted: 02/13/2019] [Indexed: 11/24/2022]
Abstract
OBJECTIVE Routinely collected healthcare data are a powerful research resource but often lack detailed disease-specific information that is collected in clinical free text, for example, clinic letters. We aim to use natural language processing techniques to extract detailed clinical information from epilepsy clinic letters to enrich routinely collected data. DESIGN We used the general architecture for text engineering (GATE) framework to build an information extraction system, ExECT (extraction of epilepsy clinical text), combining rule-based and statistical techniques. We extracted nine categories of epilepsy information in addition to clinic date and date of birth across 200 clinic letters. We compared the results of our algorithm with a manual review of the letters by an epilepsy clinician. SETTING De-identified and pseudonymised epilepsy clinic letters from a Health Board serving half a million residents in Wales, UK. RESULTS We identified 1925 items of information with overall precision, recall and F1 score of 91.4%, 81.4% and 86.1%, respectively. Precision and recall for epilepsy-specific categories were: epilepsy diagnosis (88.1%, 89.0%), epilepsy type (89.8%, 79.8%), focal seizures (96.2%, 69.7%), generalised seizures (88.8%, 52.3%), seizure frequency (86.3%, 53.6%), medication (96.1%, 94.0%), CT (55.6%, 58.8%), MRI (82.4%, 68.8%) and electroencephalogram (81.5%, 75.3%). CONCLUSIONS We have built an automated clinical text extraction system that can accurately extract epilepsy information from free text in clinic letters. This can enhance routinely collected data for research in the UK. The information extracted with ExECT, such as epilepsy type, seizure frequency and neurological investigations, is often missing from routinely collected data. We propose that our algorithm can bridge this data gap, enabling further epilepsy research opportunities. 
While many of the rules in our pipeline were tailored to extract epilepsy specific information, our methods can be applied to other diseases and also can be used in clinical practice to record patient information in a structured manner.
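The overall figures reported in this abstract can be cross-checked with the standard definition of the F1 score as the harmonic mean of precision and recall. The following is a minimal sketch, not the authors' evaluation code; the function name `f1_score` is chosen here for illustration only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Inputs may be fractions or percentages; the result is in the same unit.
    """
    if precision + recall == 0:
        # Convention: F1 is taken as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the abstract (percent).
overall_f1 = f1_score(91.4, 81.4)
print(round(overall_f1, 1))  # 86.1, matching the reported F1 score
```

Because the harmonic mean is pulled toward the smaller input, categories with low recall, such as generalised seizures, would score well below their precision.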
Affiliation(s)
- Beata Fonferko-Shadrach
- Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, UK
- Arron S Lacey
- Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, UK
- Health Data Research UK, Data Science Building, Swansea University Medical School, Swansea University, Swansea, UK
- Angus Roberts
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Ashley Akbari
- Health Data Research UK, Data Science Building, Swansea University Medical School, Swansea University, Swansea, UK
- Simon Thompson
- Health Data Research UK, Data Science Building, Swansea University Medical School, Swansea University, Swansea, UK
- David V Ford
- Health Data Research UK, Data Science Building, Swansea University Medical School, Swansea University, Swansea, UK
- Ronan A Lyons
- Health Data Research UK, Data Science Building, Swansea University Medical School, Swansea University, Swansea, UK
- Mark I Rees
- Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, UK
- Faculty of Medicine and Health, University of Sydney, Sydney, Australia
- William Owen Pickrell
- Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, UK

24
Kadra G, Spiros A, Shetty H, Iqbal E, Hayes RD, Stewart R, Geerts H. Predicting parkinsonism side-effects of antipsychotic polypharmacy prescribed in secondary mental healthcare. J Psychopharmacol 2018; 32:1191-1196. [PMID: 30232932 PMCID: PMC6238161 DOI: 10.1177/0269881118796809] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/28/2022]
Abstract
BACKGROUND Computer-modelling approaches have the potential to predict the interactions between different antipsychotics and provide guidance for polypharmacy. AIMS To evaluate the accuracy of the quantitative systems pharmacology platform to predict parkinsonism side-effects in patients prescribed antipsychotic polypharmacy. METHODS Using anonymized data from South London and Maudsley NHS Foundation Trust electronic health records, we applied quantitative systems pharmacology, a neurophysiology-based computer model of humanized neuronal circuits, to predict the risk for parkinsonism symptoms in patients with schizophrenia prescribed two concomitant antipsychotics. The performance of the quantitative systems pharmacology model was compared with the performance of simple parameters: combination of affinity constants (1/Ksum), sum of D2R occupancies (D2R), and chlorpromazine equivalent dose. RESULTS We identified 832 patients with schizophrenia who were receiving two antipsychotics for six or more months, between 1 January 2007 and 31 December 2014. The area under the receiver operating characteristic (AUROC) curve for the quantitative systems pharmacology model was 0.66 (p = 0.01), while AUROCs for D2R, 1/Ksum and chlorpromazine equivalent dose were 0.52 (p = 0.350), 0.53 (p = 0.347) and 0.52 (p = 0.330), respectively. CONCLUSION Our results indicate that quantitative systems pharmacology has the potential to predict the risk of parkinsonism associated with antipsychotic polypharmacy from minimal source information, and thus might have potential decision-support applicability in clinical settings.
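For readers comparing the AUROC values above, it may help to recall that AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below illustrates that definition on made-up scores; it is not the study's analysis, and the function name `auroc` and the toy data are assumptions for illustration.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positive and negative cases.

    scores: predicted risk values (higher means more likely positive).
    labels: 1 for cases with the outcome, 0 for cases without it.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): two positives, two negatives.
print(auroc([0.9, 0.7, 0.4, 0.2], [1, 0, 1, 0]))  # 0.75
```

On this reading, a value near 0.5 (as reported for the simple parameters) means the score ranks cases no better than chance, while 0.66 indicates modest discrimination.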
Affiliation(s)
- Giouliana Kadra
- King’s College London, Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, London, UK
- Correspondence: Giouliana Kadra, BRC Nucleus, Mapother House, De Crespigny Park, IOPPN, King’s College London, London, SE5 8AF, UK
- Hitesh Shetty
- South London and Maudsley NHS Trust, BRC Nucleus, London, UK
- Ehtesham Iqbal
- King’s College London, SGDP, Institute of Psychiatry, Psychology and Neuroscience, London, UK
- Richard D Hayes
- King’s College London, Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, London, UK
- Robert Stewart
- King’s College London, Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, London, UK
- South London and Maudsley NHS Trust, BRC Nucleus, London, UK

25
Névéol A, Zweigenbaum P. Expanding the Diversity of Texts and Applications: Findings from the Section on Clinical Natural Language Processing of the International Medical Informatics Association Yearbook. Yearb Med Inform 2018; 27:193-198. [PMID: 30157523 PMCID: PMC6115241 DOI: 10.1055/s-0038-1667080] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022]
Abstract
Objectives:
To summarize recent research and present a selection of the best papers published in 2017 in the field of clinical Natural Language Processing (NLP).
Methods:
A survey of the literature was performed by the two editors of the NLP section of the International Medical Informatics Association (IMIA) Yearbook. Bibliographic databases PubMed and Association of Computational Linguistics (ACL) Anthology were searched for papers with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. A total of 709 papers were automatically ranked and then manually reviewed based on title and abstract. A shortlist of 15 candidate best papers was selected by the section editors and peer-reviewed by independent external reviewers to come to the three best clinical NLP papers for 2017.
Results:
Clinical NLP best papers provide a contribution that ranges from methodological studies to the application of research results to practical clinical settings. They draw from text genres as diverse as clinical narratives across hospitals and languages or social media.
Conclusions:
Clinical NLP continued to thrive in 2017, with an increasing number of contributions towards applications compared to fundamental methods. Methodological work explores deep learning and system adaptation across language variants. Research results continue to translate into freely available tools and corpora, mainly for the English language.

26
Bean DM, Wu H, Iqbal E, Dzahini O, Ibrahim ZM, Broadbent M, Stewart R, Dobson RJB. Knowledge graph prediction of unknown adverse drug reactions and validation in electronic health records. Sci Rep 2017; 7:16416. [PMID: 29180758 PMCID: PMC5703951 DOI: 10.1038/s41598-017-16674-x] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 07/25/2017] [Accepted: 11/16/2017] [Indexed: 01/31/2023]
Abstract
Unknown adverse reactions to drugs available on the market present a significant health risk and limit accurate judgement of the cost/benefit trade-off for medications. Machine learning has the potential to predict unknown adverse reactions from current knowledge. We constructed a knowledge graph containing four types of node: drugs, protein targets, indications and adverse reactions. Using this graph, we developed a machine learning algorithm based on a simple enrichment test and first demonstrated this method performs extremely well at classifying known causes of adverse reactions (AUC 0.92). A cross validation scheme in which 10% of drug-adverse reaction edges were systematically deleted per fold showed that the method correctly predicts 68% of the deleted edges on average. Next, a subset of adverse reactions that could be reliably detected in anonymised electronic health records from South London and Maudsley NHS Foundation Trust were used to validate predictions from the model that are not currently known in public databases. High-confidence predictions were validated in electronic records significantly more frequently than random models, and outperformed standard methods (logistic regression, decision trees and support vector machines). This approach has the potential to improve patient safety by predicting adverse reactions that were not observed during randomised trials.
Affiliation(s)
- Daniel M Bean
- Department of Biostatistics and Health Informatics, Institute of Psychiatry Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom
- Honghan Wu
- Department of Biostatistics and Health Informatics, Institute of Psychiatry Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom
- Ehtesham Iqbal
- Department of Biostatistics and Health Informatics, Institute of Psychiatry Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom
- Olubanke Dzahini
- South London and Maudsley NHS Foundation Trust, Denmark Hill, London, SE5 8AZ, United Kingdom
- Institute of Pharmaceutical Science, King's College, London, 5th Floor, Franklin-Wilkins Building, 150 Stamford Street, London, SE1 9NH, United Kingdom
- Zina M Ibrahim
- Department of Biostatistics and Health Informatics, Institute of Psychiatry Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, WC1E 6BT, United Kingdom
- Matthew Broadbent
- South London and Maudsley NHS Foundation Trust, Denmark Hill, London, SE5 8AZ, United Kingdom
- Robert Stewart
- South London and Maudsley NHS Foundation Trust, Denmark Hill, London, SE5 8AZ, United Kingdom
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom
- Richard J B Dobson
- Department of Biostatistics and Health Informatics, Institute of Psychiatry Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, WC1E 6BT, United Kingdom