Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wang X, Hripcsak G, Markatou M, Friedman C. Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study. J Am Med Inform Assoc 2009;16:328-37. [PMID: 19261932 PMCID: PMC2732239 DOI: 10.1197/jamia.m3028] [Citation(s) in RCA: 162] [Impact Index Per Article: 10.8] [Received: 10/09/2008] [Accepted: 01/31/2009] [Indexed: 11/10/2022]

Cited by Other Article(s)

Sivarajkumar S, Tam TYC, Mohammad HA, Viggiano S, Oniani D, Visweswaran S, Wang Y. Extraction of sleep information from clinical notes of Alzheimer's disease patients using natural language processing. J Am Med Inform Assoc 2024:ocae177. [PMID: 39001795 DOI: 10.1093/jamia/ocae177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 06/19/2024] [Accepted: 07/01/2024] [Indexed: 07/15/2024]

Abstract

OBJECTIVES

Alzheimer's disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that has been shown to be critical for optimal cognitive function in old age. However, there is a lack of research studying the association between sleep and AD incidence. A major bottleneck for conducting such research is that the traditional way to acquire sleep information is time-consuming, inefficient, non-scalable, and limited to patients' subjective experience. We aim to automate the extraction of specific sleep-related patterns, such as snoring, napping, poor sleep quality, daytime sleepiness, night wakings, other sleep problems, and sleep duration, from clinical notes of AD patients. These sleep patterns are hypothesized to play a role in the incidence of AD, providing insight into the relationship between sleep and AD onset and progression.

MATERIALS AND METHODS

A gold standard dataset was created from manual annotation of 570 randomly sampled clinical note documents from adSLEEP, a corpus of 192 000 de-identified clinical notes of 7266 AD patients retrieved from the University of Pittsburgh Medical Center (UPMC). We developed a rule-based natural language processing (NLP) algorithm, machine learning models, and large language model (LLM)-based NLP algorithms to automate the extraction of sleep-related concepts, including snoring, napping, sleep problem, bad sleep quality, daytime sleepiness, night wakings, and sleep duration, from the gold standard dataset.

RESULTS

The annotated dataset of 482 patients comprised a predominantly White (89.2%), older adult population with an average age of 84.7 years, where females represented 64.1%, and a vast majority were non-Hispanic or Latino (94.6%). The rule-based NLP algorithm achieved the best F1 score across all sleep-related concepts. In terms of positive predictive value (PPV), the rule-based NLP algorithm achieved the highest PPV scores for daytime sleepiness (1.00) and sleep duration (1.00), while the machine learning models had the highest PPV for napping (0.95) and bad sleep quality (0.86), and LLAMA2 with finetuning had the highest PPV for night wakings (0.93) and sleep problem (0.89).

DISCUSSION

Although sleep information is infrequently documented in the clinical notes, the proposed rule-based NLP algorithm and LLM-based NLP algorithms still achieved promising results. In comparison, the machine learning-based approaches did not achieve good results, which is attributable to the small amount of sleep information in the training data.

CONCLUSION

The results show that the rule-based NLP algorithm consistently achieved the best performance for all sleep concepts. This study focused on the clinical notes of patients with AD but could be extended to general sleep information extraction for other diseases.

Bazoge A, Morin E, Daille B, Gourraud PA. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review. JMIR Med Inform 2023;11:e42477. [PMID: 38100200 PMCID: PMC10757232 DOI: 10.2196/42477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 01/16/2023] [Accepted: 09/07/2023] [Indexed: 12/17/2023]

Abstract

BACKGROUND

In recent years, health data collected during the clinical care process have often been repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible.

OBJECTIVE

The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. It focuses on identifying the (1) NLP tasks applied to data from CDWs and (2) NLP methods used to tackle these tasks.

METHODS

This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles according to the following inclusion criteria: (1) focus on NLP applied to textual data from CDWs, (2) articles published between 1995 and 2021, and (3) written in English.

RESULTS

We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all identified NLP tasks in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent tasks. To address the various tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques, such as regular expressions or pattern matching that exploit specialized lexica, such as drug lists and terminologies. Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. NLP methods were mostly applied to English language data (153/194, 78.9%).
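The symbolic methods mentioned in this abstract (regular expressions and pattern matching over specialized lexica such as drug lists) can be illustrated with a minimal sketch. The lexicon entries, function names, and sample note below are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies, and a real system would load a curated terminology.

```python
import re

# Hypothetical mini-lexicon; a real system would load a curated drug list.
DRUG_LEXICON = ["warfarin", "metformin", "acetylsalicylic acid"]

def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern from a term list."""
    # Longest terms first so a multiword entry wins over a shorter overlapping one.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def find_mentions(text, pattern):
    """Return (start, end, matched_text) for every lexicon hit in the text."""
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

PATTERN = build_pattern(DRUG_LEXICON)
note = "Patient started on Warfarin; metformin was continued."
mentions = find_mentions(note, PATTERN)
```

A matcher of this kind covers only exact surface forms; misspellings, abbreviations, and negation would need additional rules.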

CONCLUSIONS

CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice.

Boxley C, Fujimoto M, Ratwani RM, Fong A. A text mining approach to categorize patient safety event reports by medication error type. Sci Rep 2023;13:18354. [PMID: 37884577 PMCID: PMC10603175 DOI: 10.1038/s41598-023-45152-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Accepted: 10/17/2023] [Indexed: 10/28/2023]

Davis SE, Zabotka L, Desai RJ, Wang SV, Maro JC, Coughlin K, Hernández-Muñoz JJ, Stojanovic D, Shah NH, Smith JC. Use of Electronic Health Record Data for Drug Safety Signal Identification: A Scoping Review. Drug Saf 2023;46:725-742. [PMID: 37340238 DOI: 10.1007/s40264-023-01325-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Accepted: 05/31/2023] [Indexed: 06/22/2023]

Lee K, Liu Z, Chandran U, Kalsekar I, Laxmanan B, Higashi MK, Jun T, Ma M, Li M, Mai Y, Gilman C, Wang T, Ai L, Aggarwal P, Pan Q, Oh W, Stolovitzky G, Schadt E, Wang X. Detecting Ground Glass Opacity Features in Patients With Lung Cancer: Automated Extraction and Longitudinal Analysis via Deep Learning-Based Natural Language Processing. JMIR AI 2023;2:e44537. [PMID: 38875565 PMCID: PMC11041451 DOI: 10.2196/44537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 01/30/2023] [Accepted: 03/31/2023] [Indexed: 06/16/2024]

Abstract

BACKGROUND

Ground-glass opacities (GGOs) appearing in computed tomography (CT) scans may indicate potential lung malignancy. Proper management of GGOs based on their features can prevent the development of lung cancer. Electronic health records are rich sources of information on GGO nodules and their granular features, but most of the valuable information is embedded in unstructured clinical notes.

OBJECTIVE

We aimed to develop, test, and validate a deep learning-based natural language processing (NLP) tool that automatically extracts GGO features to inform the longitudinal trajectory of GGO status from large-scale radiology notes.

METHODS

We developed a bidirectional long short-term memory with a conditional random field-based deep-learning NLP pipeline to extract GGO and granular features of GGO retrospectively from radiology notes of 13,216 lung cancer patients. We evaluated the pipeline with quality assessments and analyzed cohort characterization of the distribution of nodule features longitudinally to assess changes in size and solidity over time.

RESULTS

Our NLP pipeline built on the GGO ontology we developed achieved between 95% and 100% precision, 89% and 100% recall, and 92% and 100% F1-scores on different GGO features. We deployed this GGO NLP model to extract and structure comprehensive characteristics of GGOs from 29,496 radiology notes of 4521 lung cancer patients. Longitudinal analysis revealed that size increased in 16.8% (240/1424) of patients, decreased in 14.6% (208/1424), and remained unchanged in 68.5% (976/1424) in their last note compared to the first note. Among 1127 patients who had longitudinal radiology notes of GGO status, 815 (72.3%) were reported to have stable status, and 259 (23%) had increased/progressed status in the subsequent notes.

CONCLUSIONS

Our deep learning-based NLP pipeline can automatically extract granular GGO features at scale from electronic health records when this information is documented in radiology notes and help inform the natural history of GGO. This will open the way for a new paradigm in lung cancer prevention and early detection.

Keloth VK, Zhou S, Lindemann L, Zheng L, Elhanan G, Einstein AJ, Geller J, Perl Y. Mining of EHR for interface terminology concepts for annotating EHRs of COVID patients. BMC Med Inform Decis Mak 2023;23:40. [PMID: 36829139 PMCID: PMC9951157 DOI: 10.1186/s12911-023-02136-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/22/2022] [Accepted: 02/09/2023] [Indexed: 02/26/2023]

Abstract

BACKGROUND

Two years into the COVID-19 pandemic and with more than five million deaths worldwide, the healthcare establishment continues to struggle with every new wave of the pandemic resulting from a new coronavirus variant. Research has demonstrated that there are variations in the symptoms, and even in the order of symptom presentations, in COVID-19 patients infected by different SARS-CoV-2 variants (e.g., Alpha and Omicron). Textual data in the form of admission notes and physician notes in the Electronic Health Records (EHRs) is rich in information regarding the symptoms and their orders of presentation. Unstructured EHR data is often underutilized in research due to the lack of annotations that enable automatic extraction of useful information from the available extensive volumes of textual data.

METHODS

We present the design of a COVID Interface Terminology (CIT), not just a generic COVID-19 terminology, but one serving a specific purpose of enabling automatic annotation of EHRs of COVID-19 patients. CIT was constructed by integrating existing COVID-related ontologies and mining additional fine granularity concepts from clinical notes. The iterative mining approach utilized the techniques of 'anchoring' and 'concatenation' to identify potential fine granularity concepts to be added to the CIT. We also tested the generalizability of our approach on a hold-out dataset and compared the annotation coverage to the coverage obtained for the dataset used to build the CIT.

RESULTS

Our experiments demonstrate that this approach results in higher annotation coverage compared to existing ontologies such as SNOMED CT and Coronavirus Infectious Disease Ontology (CIDO). The final version of CIT achieved about 20% more coverage than SNOMED CT and 50% more coverage than CIDO. In the future, the concepts mined and added into CIT could be used as training data for machine learning models for mining even more concepts into CIT and further increasing the annotation coverage.

CONCLUSION

In this paper, we demonstrated the construction of a COVID interface terminology that can be utilized for automatically annotating EHRs of COVID-19 patients. The techniques presented can identify frequently documented fine granularity concepts that are missing in other ontologies, thereby increasing the annotation coverage.

Whitaker B, Pizarro J, Deady M, Williams A, Ezzeldin H, Belov A, Kanderian S, Billings D, Cook K, Hettinger AZ, Anderson S. Detection of allergic transfusion-related adverse events from electronic medical records. Transfusion 2022;62:2029-2038. [PMID: 36004803 DOI: 10.1111/trf.17069] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/21/2021] [Revised: 07/18/2022] [Accepted: 07/19/2022] [Indexed: 11/29/2022]

Explainable detection of adverse drug reaction with imbalanced data distribution. PLoS Comput Biol 2022;18:e1010144. [PMID: 35704662 PMCID: PMC9239481 DOI: 10.1371/journal.pcbi.1010144] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/11/2021] [Revised: 06/28/2022] [Accepted: 04/26/2022] [Indexed: 11/18/2022]

Lindvall C, Deng CY, Agaronnik ND, Kwok A, Samineni S, Umeton R, Mackie-Jenkins W, Kehl KL, Tulsky JA, Enzinger AC. Deep Learning for Cancer Symptoms Monitoring on the Basis of Electronic Health Record Unstructured Clinical Notes. JCO Clin Cancer Inform 2022;6:e2100136. [PMID: 35714301 PMCID: PMC9232368 DOI: 10.1200/cci.21.00136] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/20/2022]

Abstract

PURPOSE

Symptoms are vital outcomes for cancer clinical trials, observational research, and population-level surveillance. Patient-reported outcomes (PROs) are valuable for monitoring symptoms, yet there are many challenges to collecting PROs at scale. We sought to develop, test, and externally validate a deep learning model to extract symptoms from unstructured clinical notes in the electronic health record.

METHODS

We randomly selected 1,225 outpatient progress notes from among patients treated at the Dana-Farber Cancer Institute between January 2016 and December 2019 and used 1,125 notes as our training/validation data set and 100 notes as our test data set. We evaluated the performance of 10 deep learning models for detecting 80 symptoms included in the National Cancer Institute's Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) framework. Model performance as compared with manual chart abstraction was assessed using standard metrics, and the highest performer was externally validated on a sample of 100 physician notes from a different clinical context.

RESULTS

In our training and test data sets, 75 of the 80 candidate symptoms were identified. The ELECTRA-small model had the highest performance for symptom identification at the token level (ie, at the individual symptom level), with an F1 of 0.87 and a processing time of 3.95 seconds per note. For the 10 most common symptoms in the test data set, the F1 score ranged from 0.98 for anxious to 0.86 for fatigue. For external validation of the same symptoms, the note-level performance ranged from F1 = 0.97 for diarrhea and dizziness to F1 = 0.73 for swelling.

CONCLUSION

Training a deep learning model to identify a wide range of electronic health record-documented symptoms relevant to cancer care is feasible. This approach could be used at the health system scale to complement electronic PROs.

Using Machine Learning for Pharmacovigilance: A Systematic Review. Pharmaceutics 2022;14:266. [PMID: 35213998 PMCID: PMC8924891 DOI: 10.3390/pharmaceutics14020266] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/17/2021] [Revised: 01/13/2022] [Accepted: 01/21/2022] [Indexed: 02/04/2023]

Deady M, Ezzeldin H, Cook K, Billings D, Pizarro J, Plotogea AA, Saunders-Hastings P, Belov A, Whitaker BI, Anderson SA. The Food and Drug Administration Biologics Effectiveness and Safety Initiative Facilitates Detection of Vaccine Administrations From Unstructured Data in Medical Records Through Natural Language Processing. Front Digit Health 2022;3:777905. [PMID: 35005697 PMCID: PMC8727347 DOI: 10.3389/fdgth.2021.777905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2021] [Accepted: 12/03/2021] [Indexed: 12/03/2022]

Abstract

Introduction: The Food and Drug Administration Center for Biologics Evaluation and Research conducts post-market surveillance of biologic products to ensure their safety and effectiveness. Studies have found that common vaccine exposures may be missing from structured data elements of electronic health records (EHRs), instead being captured in clinical notes. This impacts monitoring of adverse events following immunizations (AEFIs). For example, COVID-19 vaccines have been regularly administered outside of traditional medical settings. We developed a natural language processing (NLP) algorithm to mine unstructured clinical notes for vaccinations not captured in structured EHR data.

Methods: A random sample of 1,000 influenza vaccine administrations, representing 995 unique patients, was extracted from a large U.S. EHR database. NLP techniques were used to detect administrations from the clinical notes in the training dataset [80% (N = 797) of patients]. The algorithm was applied to the validation dataset [20% (N = 198) of patients] to assess performance. Full medical charts for 28 randomly selected administration events in the validation dataset were reviewed by clinicians. The NLP algorithm was then applied across the entire dataset (N = 995) to quantify the number of additional events identified.

Results: A total of 3,199 administrations were identified in the structured data and clinical notes combined. Of these, 2,740 (85.7%) were identified in the structured data, while the NLP algorithm identified 1,183 (37.0%) administrations in clinical notes; 459 were not also captured in the structured data. This represents a 16.8% increase in the identification of vaccine administrations compared to using structured data alone. The validation of 28 vaccine administrations confirmed 27 (96.4%) as “definite” vaccine administrations; 18 (64.3%) had evidence of a vaccination event in the structured data, while 10 (35.7%) were found solely in the unstructured notes.
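The percentages reported in these results follow directly from the stated counts. The short sketch below recomputes them as a consistency check; the variable and function names are illustrative, and only the counts come from the abstract.

```python
# Counts as reported in the abstract.
structured = 2740   # administrations identified in structured data
nlp_total = 1183    # administrations the NLP algorithm identified in notes
nlp_only = 459      # identified in notes but not captured in structured data

def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

combined = structured + nlp_only                 # all distinct administrations
share_structured = pct(structured, combined)     # share found in structured data
share_nlp = pct(nlp_total, combined)             # share found by NLP in notes
relative_increase = pct(nlp_only, structured)    # gain over structured data alone
```

Note that the 16.8% increase is relative to the structured-data count (459 of 2,740), not to the combined total of 3,199.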

Discussion: We demonstrated the utility of an NLP algorithm to identify vaccine administrations not captured in structured EHR data. NLP techniques have the potential to improve detection of vaccine administrations not otherwise reported without increasing the analysis burden on physicians or practitioners. Future applications could include refining estimates of vaccine coverage and detecting other exposures, population characteristics, and outcomes not reliably captured in structured EHR data.

Piscitelli A, Bevilacqua L, Labella B, Parravicini E, Auxilia F. A Keyword Approach to Identify Adverse Events Within Narrative Documents From 4 Italian Institutions. J Patient Saf 2022;18:e362-e367. [PMID: 32910039 DOI: 10.1097/pts.0000000000000783] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]

Edrees H, Song W, Syrowatka A, Simona A, Amato MG, Bates DW. Intelligent Telehealth in Pharmacovigilance: A Future Perspective. Drug Saf 2022;45:449-458. [PMID: 35579810 PMCID: PMC9112241 DOI: 10.1007/s40264-022-01172-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/02/2022] [Indexed: 01/28/2023]

Chopard D, Treder MS, Corcoran P, Ahmed N, Johnson C, Busse M, Spasic I. Text Mining of Adverse Events in Clinical Trials: Deep Learning Approach. JMIR Med Inform 2021;9:e28632. [PMID: 34951601 PMCID: PMC8742206 DOI: 10.2196/28632] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/09/2021] [Revised: 08/01/2021] [Accepted: 11/14/2021] [Indexed: 11/28/2022]

Mao J, Sedrakyan A, Sun T, Guiahi M, Chudnoff S, Kinard M, Johnson SB. Assessing adverse event reports of hysteroscopic sterilization device removal using natural language processing. Pharmacoepidemiol Drug Saf 2021;31:442-451. [PMID: 34919294 DOI: 10.1002/pds.5402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Revised: 12/09/2021] [Accepted: 12/13/2021] [Indexed: 11/07/2022]

Abstract

OBJECTIVE

To develop an annotation model to apply natural language processing (NLP) to device adverse event reports and implement the model to evaluate the most frequently experienced events among women reporting a sterilization device removal.

METHODS

We included adverse event reports from the Manufacturer and User Facility Device Experience database from January 2005 to June 2018 related to device removal following hysteroscopic sterilization. We used an iterative process to develop an annotation model that extracts six categories of desired information and applied the annotation model to train an NLP algorithm. We assessed the model performance using positive predictive value (PPV, also known as precision), sensitivity (also known as recall), and F₁ score (a combined measure of PPV and sensitivity). Using extracted variables, we summarized the reporting source, the presence of prespecified and other patient and device events, additional sterilizations and other procedures performed, and time from implantation to removal.
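The three metrics named in this paragraph have standard definitions, sketched below. The counts in the example are hypothetical and are not taken from the study; F₁ is computed as the harmonic mean of PPV and sensitivity, which is the usual reading of "a combined measure".

```python
def ppv(tp, fp):
    """Positive predictive value (precision): TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def sensitivity(tp, fn):
    """Sensitivity (recall): TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Hypothetical counts for one annotation category (not from the study).
tp, fp, fn = 90, 10, 5
p = ppv(tp, fp)
r = sensitivity(tp, fn)
f1 = f1_score(p, r)
```

Because F₁ is a harmonic mean, it is pulled toward the lower of the two component values, so a model cannot score well by maximizing only one of them.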

RESULTS

The overall F₁ score was 91.5% for labeled items and 93.9% for distinct events after excluding duplicates. A total of 16 535 reports of device removal were analyzed. The most frequently reported patient and device events were abdominal/pelvic/genital pain (N = 13 166, 79.6%) and device dislocation/migration (N = 3180, 19.2%), respectively. Of those reporting an additional sterilization procedure, the majority had a hysterectomy or salpingectomy (N = 7932). One-fifth of the cases that had device removal timing specified reported a removal after 7 years following implantation (N = 2444/11 293).

CONCLUSIONS

We present a roadmap to develop an annotation model for NLP to analyze device adverse event reports. The extracted information is informative and complements findings from previous research using administrative data.

Natural Language Processing to Identify Pulmonary Nodules and Extract Nodule Characteristics From Radiology Reports. Chest 2021;160:1902-1914. [PMID: 34089738 DOI: 10.1016/j.chest.2021.05.048] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/13/2020] [Revised: 03/20/2021] [Accepted: 05/11/2021] [Indexed: 12/17/2022]

Abstract

BACKGROUND

There is an urgent need for population-based studies on managing patients with pulmonary nodules.

RESEARCH QUESTION

Is it possible to identify pulmonary nodules and associated characteristics using an automated method?

STUDY DESIGN AND METHODS

We revised and refined an existing natural language processing (NLP) algorithm to identify radiology transcripts with pulmonary nodules and greatly expanded its functionality to identify the characteristics of the largest nodule, when present, including size, lobe, laterality, attenuation, calcification, and edge. We compared NLP results with a reference standard of manual transcript review in a random test sample of 200 radiology transcripts. We applied the final automated method to a larger cohort of patients who underwent chest CT scan in an integrated health care system from 2006 to 2016, and described their demographic and clinical characteristics.

RESULTS

In the test sample, the NLP algorithm had very high sensitivity (98.6%; 95% CI, 95.0%-99.8%) and specificity (100%; 95% CI, 93.9%-100%) for identifying pulmonary nodules. For attenuation, edge, and calcification, the NLP algorithm achieved similar accuracies, and it correctly identified the diameter of the largest nodule in 135 of 141 cases (95.7%; 95% CI, 91.0%-98.4%). In the larger cohort, the NLP algorithm found 217,771 reports with nodules among 717,304 chest CT reports (30.4%). From 2006 to 2016, the number of reports with nodules increased by 150%, and the mean size of the largest nodule gradually decreased from 11 to 8.9 mm. Radiologists documented the laterality and lobe (90%-95%) more often than the attenuation, calcification, and edge characteristics (11%-14%).

INTERPRETATION

The NLP algorithm identified pulmonary nodules and associated characteristics with high accuracy. In our community practice settings, the documentation of nodule characteristics is incomplete. Our results call for better documentation of nodule findings. The NLP algorithm can be used in population-based studies to identify pulmonary nodules, avoiding labor-intensive chart review.

Koleck TA, Tatonetti NP, Bakken S, Mitha S, Henderson MM, George M, Miaskowski C, Smaldone A, Topaz M. Identifying Symptom Information in Clinical Notes Using Natural Language Processing. Nurs Res 2021;70:173-183. [PMID: 33196504 PMCID: PMC9109773 DOI: 10.1097/nnr.0000000000000488] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/22/2022]

Abstract

BACKGROUND

Symptoms are a core concept of nursing interest. Large-scale secondary data reuse of notes in electronic health records (EHRs) has the potential to increase the quantity and quality of symptom research. However, the symptom language used in clinical notes is complex. A need exists for methods designed specifically to identify and study symptom information from EHR notes.

OBJECTIVES

We aim to describe a method that combines standardized vocabularies, clinical expertise, and natural language processing to generate comprehensive symptom vocabularies and identify symptom information in EHR notes. We piloted this method with five diverse symptom concepts: constipation, depressed mood, disturbed sleep, fatigue, and palpitations.

METHODS

First, we obtained synonym lists for each pilot symptom concept from the Unified Medical Language System. Then, we used two large bodies of text (clinical notes from Columbia University Irving Medical Center and PubMed abstracts containing Medical Subject Headings or key words related to the pilot symptoms) to further expand our initial vocabulary of synonyms for each pilot symptom concept. We used NimbleMiner, an open-source natural language processing tool, to accomplish these tasks and evaluated NimbleMiner symptom identification performance by comparison to a manually annotated set of nurse- and physician-authored common EHR note types.

RESULTS

Compared to the baseline Unified Medical Language System synonym lists, we identified up to 11 times more additional synonym words or expressions, including abbreviations, misspellings, and unique multiword combinations, for each symptom concept. Natural language processing system symptom identification performance was excellent.

DISCUSSION

Using our comprehensive symptom vocabularies and NimbleMiner to label symptoms in clinical notes produced excellent performance metrics. The ability to extract symptom information from EHR notes in an accurate and scalable manner has the potential to greatly facilitate symptom science research.

Malec SA, Wei P, Bernstam EV, Boyce RD, Cohen T. Using computable knowledge mined from the literature to elucidate confounders for EHR-based pharmacovigilance. J Biomed Inform 2021;117:103719. [PMID: 33716168 PMCID: PMC8559730 DOI: 10.1016/j.jbi.2021.103719] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/15/2020] [Revised: 12/31/2020] [Accepted: 01/04/2021] [Indexed: 10/21/2022]

Abstract

INTRODUCTION

Drug safety research asks causal questions but relies on observational data. Confounding bias threatens the reliability of studies using such data. The successful control of confounding requires knowledge of variables called confounders affecting both the exposure and outcome of interest. However, causal knowledge of dynamic biological systems is complex and challenging. Fortunately, computable knowledge mined from the literature may hold clues about confounders. In this paper, we tested the hypothesis that incorporating literature-derived confounders can improve causal inference from observational data.

METHODS

We introduce two methods (semantic vector-based and string-based confounder search) that query literature-derived information for confounder candidates to control, using SemMedDB, a database of computable knowledge mined from the biomedical literature. These methods search SemMedDB for confounders by applying semantic constraint search for indications treated by the drug (exposure) and that are also known to cause the adverse event (outcome). We then include the literature-derived confounder candidates in statistical and causal models derived from free-text clinical notes. For evaluation, we use a reference dataset widely used in drug safety containing labeled pairwise relationships between drugs and adverse events and attempt to rediscover these relationships from a corpus of 2.2 M NLP-processed free-text clinical notes. We employ standard adjustment and causal inference procedures to predict and estimate causal effects by informing the models with varying numbers of literature-derived confounders and instantiating the exposure, outcome, and confounder variables in the models with dichotomous EHR-derived data. Finally, we compare the results from applying these procedures with naive measures of association (χ2 and reporting odds ratio) and with each other.
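The two naive measures of association named at the end of this paragraph (χ2 and the reporting odds ratio) can both be computed from a 2×2 drug-by-event contingency table. The sketch below uses hypothetical counts and assumes the Pearson χ2 statistic without continuity correction; the abstract does not say which variant the authors used.

```python
def reporting_odds_ratio(a, b, c, d):
    """ROR = (a / b) / (c / d) for a 2x2 table.

    a: exposed, event       b: exposed, no event
    c: unexposed, event     d: unexposed, no event
    """
    return (a * d) / (b * c)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Hypothetical dichotomous counts (not from the study).
a, b, c, d = 20, 80, 10, 90
ror = reporting_odds_ratio(a, b, c, d)
chi2 = chi_square_2x2(a, b, c, d)
```

Neither measure adjusts for confounders, which is why the abstract treats them as baselines against which the adjusted and causal models are compared.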

RESULTS AND CONCLUSIONS

We found semantic vector-based search to be superior to string-based search at reducing confounding bias. However, the effect of including more rather than fewer literature-derived confounders was inconclusive. We recommend using targeted learning estimation methods that can address treatment-confounder feedback, where confounders also behave as intermediate variables, and engaging subject-matter experts to adjudicate the handling of problematic covariates.

Wei Q, Ji Z, Li Z, Du J, Wang J, Xu J, Xiang Y, Tiryaki F, Wu S, Zhang Y, Tao C, Xu H. A study of deep learning approaches for medication and adverse drug event extraction from clinical text. J Am Med Inform Assoc 2021;27:13-21. [PMID: 31135882 DOI: 10.1093/jamia/ocz063] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 01/31/2019] [Revised: 03/23/2019] [Accepted: 04/17/2019] [Indexed: 11/13/2022]

Abstract

OBJECTIVE

This article presents our approaches to extraction of medications and associated adverse drug events (ADEs) from clinical documents, which is the second track of the 2018 National NLP Clinical Challenges (n2c2) shared task.

MATERIALS AND METHODS

The clinical corpus used in this study was from the MIMIC-III database and the organizers annotated 303 documents for training and 202 for testing. Our system consists of 2 components: a named entity recognition (NER) and a relation classification (RC) component. For each component, we implemented deep learning-based approaches (eg, BI-LSTM-CRF) and compared them with traditional machine learning approaches, namely, conditional random fields for NER and support vector machines for RC, respectively. In addition, we developed a deep learning-based joint model that recognizes ADEs and their relations to medications in 1 step using a sequence labeling approach. To further improve the performance, we also investigated different ensemble approaches to generating optimal performance by combining outputs from multiple approaches.

RESULTS

Our best-performing systems achieved F1 scores of 93.45% for NER, 96.30% for RC, and 89.05% for end-to-end evaluation, which ranked #2, #1, and #1 among all participants, respectively. Additional evaluations show that the deep learning-based approaches did outperform traditional machine learning algorithms in both NER and RC. The joint model that simultaneously recognizes ADEs and their relations to medications also achieved the best performance on RC, indicating its promise for relation extraction.

CONCLUSION

In this study, we developed deep learning approaches for extracting medications and their attributes such as ADEs, and demonstrated their superior performance compared with traditional machine learning algorithms, indicating their applicability to broader NER and RC tasks in the medical domain.

Christopoulou F, Tran TT, Sahu SK, Miwa M, Ananiadou S. Adverse drug events and medication relation extraction in electronic health records with ensemble deep learning methods. J Am Med Inform Assoc 2021;27:39-46. [PMID: 31390003 PMCID: PMC6913215 DOI: 10.1093/jamia/ocz101] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 01/30/2019] [Revised: 03/21/2019] [Accepted: 05/24/2019] [Indexed: 01/21/2023]

Abstract

Objective

Identification of drugs, associated medication entities, and the interactions among them is crucial to preventing unwanted effects of drug therapy, known as adverse drug events. This article describes our participation in the n2c2 shared task on extracting relations between medication-related entities in electronic health records.

Materials and Methods

We proposed an ensemble approach for relation extraction and classification between drugs and medication-related entities. We incorporated state-of-the-art named-entity recognition (NER) models based on bidirectional long short-term memory (BiLSTM) networks and conditional random fields (CRF) for end-to-end extraction. We additionally developed separate models for intra- and inter-sentence relation extraction and combined them using an ensemble method. The intra-sentence models rely on bidirectional long short-term memory networks and attention mechanisms and are able to capture dependencies between multiple related pairs in the same sentence. For the inter-sentence relations, we adopted a neural architecture that utilizes the Transformer network to improve performance in longer sequences.

Results

Our team ranked third, with micro-averaged F1 scores of 94.72% and 87.65% for relation and end-to-end relation extraction, respectively (Tracks 2 and 3). Our ensemble effectively takes advantage of our proposed models. Analysis of the reported results indicated that our proposed approach is more generalizable than the top-performing system, which employs additional training data- and corpus-driven processing techniques.
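The micro-averaged F1 score reported above pools true positives, false positives, and false negatives across all relation classes before computing precision and recall. A minimal sketch follows; the class names and counts are hypothetical and are not the shared-task data.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there are no positives at all."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_f1(counts):
    """Pool counts over classes, then compute a single F1.

    counts maps class name -> (tp, fp, fn).
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1_from_counts(tp, fp, fn)


def macro_f1(counts):
    """Unweighted mean of per-class F1, shown only for contrast."""
    scores = [f1_from_counts(*c) for c in counts.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical relation classes and counts, for illustration only.
    example = {
        "Drug-Strength": (90, 10, 10),
        "Drug-ADE": (5, 5, 15),
    }
    print(micro_f1(example))
    print(macro_f1(example))
```

Because the counts are pooled, frequent classes dominate the micro average, whereas the macro average weights every class equally.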

Conclusions

We proposed a relation extraction system to identify relations between drugs and medication-related entities. The proposed approach is independent of external syntactic tools. Analysis showed that by using latent Drug-Drug interactions we were able to significantly improve the performance of non–Drug-Drug pairs in EHRs.

Nguyen T, Zhang T, Fox G, Zeng S, Cao N, Pan C, Chen JY. Linking clinotypes to phenotypes and genotypes from laboratory test results in comprehensive physical exams. BMC Med Inform Decis Mak 2021;21:51. [PMID: 33627109 PMCID: PMC7903607 DOI: 10.1186/s12911-021-01387-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2020] [Accepted: 01/06/2021] [Indexed: 11/10/2022]

Abstract

BACKGROUND

In this work, we aimed to demonstrate how to use lab test results and other clinical information to support precision medicine research and clinical decisions on complex diseases, with the support of electronic medical record facilities. We defined "clinotypes" as clinical information that could be observed and measured objectively using biomedical instruments. From well-known 'omic' problem definitions, we defined problems using clinotype information, including stratifying patients (identifying sub-cohorts of interest for future studies), mining significant associations between clinotypes and specific phenotypes (diseases), and discovering potential linkages between clinotype and genomic information. We solved these problems by integrating public omic databases and applying advanced machine learning and visual analytic techniques to two-year health exam records from a large population of healthy southern Chinese individuals (size n = 91,354). When developing the solution, we carefully addressed missing information, imbalance, and non-uniform data annotation.

RESULTS

We organized the techniques and solutions that address the problems and issues above into the CPA (Clinotype Prediction and Association-finding) framework. At the data preprocessing step, we handled the missing value issue with a prediction accuracy of 0.760. We curated 12,635 clinotype-gene associations. We found 147 associations between 147 chronic disease phenotypes and clinotypes, which improved the disease predictive performance to an average AUC of 0.967. We mined 182 significant clinotype-clinotype associations among 69 clinotypes.

CONCLUSIONS

Our results showed strong potential connectivity between the omics information and the clinical lab test information. The results further emphasized the need to utilize and integrate clinical information, especially lab test results, in future PheWAS and omic studies. Furthermore, they showed that clinotype information could initiate an alternative research direction and serve as an independent field of data to support the well-known 'phenome' and 'genome' research.

Pandey B, Kumar Pandey D, Pratap Mishra B, Rhmann W. A comprehensive survey of deep learning in the field of medical imaging and medical natural language processing: Challenges and research directions. Journal of King Saud University - Computer and Information Sciences 2021. [DOI: 10.1016/j.jksuci.2021.01.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/22/2022]

Derington CG, Mueller SR, Glanz JM, Binswanger IA. Identifying naloxone administrations in electronic health record data using a text-mining tool. Subst Abus 2020;42:806-812. [PMID: 33320803 PMCID: PMC8203755 DOI: 10.1080/08897077.2020.1856288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]

Lee EK, Uppal K. CERC: an interactive content extraction, recognition, and construction tool for clinical and biomedical text. BMC Med Inform Decis Mak 2020;20:306. [PMID: 33323109 PMCID: PMC7739454 DOI: 10.1186/s12911-020-01330-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 11/11/2020] [Indexed: 12/24/2022]

Abstract

BACKGROUND

Automated summarization of scientific literature and patient records is essential for enhancing clinical decision-making and facilitating precision medicine. Most existing summarization methods are based on single indicators of relevance, offer limited capabilities for information visualization, and do not account for user-specific interests. In this work, we develop an interactive content extraction, recognition, and construction system (CERC) that combines machine learning and visualization techniques with domain knowledge for highlighting and extracting salient information from clinical and biomedical text.

METHODS

A novel sentence-ranking framework, multi-indicator text summarization (MINTS), is developed for extractive summarization. MINTS uses random forests and multiple indicators of importance for relevance evaluation and ranking of sentences. Indicative summarization is performed using weighted term frequency-inverse document frequency scores of over-represented domain-specific terms. A controlled vocabulary dictionary generated using MeSH, SNOMED-CT, and PubTator is used for determining relevant terms. Thirty-five full-text CRAFT articles were used as the training set. The performance of the MINTS algorithm is evaluated on a test set consisting of the remaining 32 full-text CRAFT articles and 30 clinical case reports using the ROUGE toolkit.
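To make the term frequency-inverse document frequency scoring mentioned above concrete, here is a minimal sketch of plain, unweighted TF-IDF over tokenized documents. The abstract does not specify the weighting scheme used in MINTS, so this is the textbook formulation rather than the authors' implementation, and the toy documents are invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs is a list of token lists. Term frequency is the relative
    frequency within the document; inverse document frequency is
    log(N / df), so a term found in every document scores 0.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    scores = []
    for doc in docs:
        term_counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return scores


if __name__ == "__main__":
    # Toy token lists, for illustration only.
    corpus = [
        ["drug", "rash"],
        ["drug", "fever"],
        ["drug", "rash", "rash"],
    ]
    for doc_scores in tf_idf(corpus):
        print(doc_scores)
```

Restricting the vocabulary to terms from a controlled dictionary, as the abstract describes, would amount to filtering the tokens before they reach this function.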

RESULTS

The random forests model classified sentences as "good" or "bad" with 87.5% accuracy on the test set. Summarization results from the MINTS algorithm achieved higher ROUGE-1, ROUGE-2, and ROUGE-SU4 scores when compared to methods based on single indicators such as term frequency distribution, position, eigenvector centrality (LexRank), and random selection, p < 0.01. The automatic language translator and the customizable information extraction and pre-processing pipeline for EHR demonstrate that CERC can readily be incorporated within clinical decision support systems to improve quality of care and assist in data-driven and evidence-based informed decision making for direct patient care.

CONCLUSIONS

We have developed a web-based summarization and visualization tool, CERC (https://newton.isye.gatech.edu/CERC1/), for extracting salient information from clinical and biomedical text. The system ranks sentences by relevance and includes features that can facilitate early detection of medical risks in a clinical setting. The interactive interface allows users to filter content and edit/save summaries. The evaluation results on two test corpuses show that the newly developed MINTS algorithm outperforms methods based on single characteristics of importance.

Routray R, Tetarenko N, Abu-Assal C, Mockute R, Assuncao B, Chen H, Bao S, Danysz K, Desai S, Cicirello S, Willis V, Alford SH, Krishnamurthy V, Mingle E. Application of Augmented Intelligence for Pharmacovigilance Case Seriousness Determination. Drug Saf 2020;43:57-66. [PMID: 31605285 PMCID: PMC6965337 DOI: 10.1007/s40264-019-00869-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/17/2023]

Abstract

INTRODUCTION

Identification of adverse events and determination of their seriousness ensures timely detection of potential patient safety concerns. Adverse event seriousness is a key factor in defining reporting timelines, and its determination is often performed manually by pharmacovigilance experts. The dramatic increase in the volume of safety reports necessitates exploration of scalable solutions that also meet reporting timeline requirements.

OBJECTIVE

The aim of this study was to develop an augmented intelligence methodology for automatically identifying adverse event seriousness in spontaneous, solicited, and medical literature safety reports. Deep learning models were evaluated for accuracy and/or the F1 score against a ground truth labeled by pharmacovigilance experts.

METHODS

Using a stratified random sample of safety reports received by Celgene, we developed three neural networks for addressing identification of adverse event seriousness: (1) a binary adverse-event level seriousness classifier; (2) a classifier for determining seriousness categorization at the adverse-event level; and (3) an annotator for identifying seriousness criteria terms to provide supporting evidence at the document level.

RESULTS

The seriousness classifier achieved an accuracy of 83.0% in post-marketing reports, 92.9% in solicited reports, and 86.3% in medical literature reports. F1 scores for seriousness categorization were 77.7 for death, 78.9 for hospitalization, and 75.5 for important medical events. The seriousness annotator achieved an F1 score of 89.9 in solicited reports, and 75.2 in medical literature reports.

CONCLUSIONS

The results of this study indicate that a neural network approach can provide an accurate and scalable solution for potentially augmenting pharmacovigilance practitioner determination of adverse event seriousness in spontaneous, solicited, and medical literature reports.

Crowson MG, Hamour A, Lin V, Chen JM, Chan TCY. Machine learning for pattern detection in cochlear implant FDA adverse event reports. Cochlear Implants Int 2020;21:313-322. [DOI: 10.1080/14670100.2020.1784569] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/06/2023]

Eskildsen NK, Eriksson R, Christensen SB, Aghassipour TS, Bygsø MJ, Brunak S, Hansen SL. Implementation and comparison of two text mining methods with a standard pharmacovigilance method for signal detection of medication errors. BMC Med Inform Decis Mak 2020;20:94. [PMID: 32448248 PMCID: PMC7245808 DOI: 10.1186/s12911-020-1097-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/01/2019] [Accepted: 04/21/2020] [Indexed: 11/16/2022]

Abstract

Background

Medication errors have been identified as the most common preventable cause of adverse events. The lack of granularity in medication error terminology has led pharmacovigilance experts to rely on information in individual case safety reports’ (ICSRs) codes and narratives for signal detection, which is both time consuming and labour intensive. Thus, there is a need for complementary methods for the detection of medication errors from ICSRs. The aim of this study is to evaluate the utility of two natural language processing text mining methods as complementary tools to the traditional approach followed by pharmacovigilance experts for medication error signal detection.

Methods

The safety surveillance advisor (SSA) method, I2E text mining, and University of Copenhagen Center for Protein Research (CPR) text mining were evaluated for their ability to extract cases containing a type of medication error in which patients extracted insulin from a prefilled pen or cartridge with a syringe. A total of 154,209 ICSRs were retrieved from Novo Nordisk’s safety database from January 1987 to February 2018. Each method was evaluated by recall (sensitivity) and precision (positive predictive value).

Results

We manually annotated 2533 ICSRs to investigate whether these contained the sought medication error. All these ICSRs were then analysed using the three methods. The recall was 90.4%, 88.1%, and 78.5% for the CPR text mining, the SSA method, and the I2E text mining, respectively. Precision was low for all three methods, ranging from 3.4% for the SSA method to 1.9% and 1.6% for the CPR and I2E text mining methods, respectively.
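The combination of high recall and very low precision reported above is typical when the target event is rare. The sketch below shows how the two metrics are computed from confusion-matrix counts; the counts are hypothetical values chosen to illustrate the pattern and are not the study's data.

```python
def recall(tp, fn):
    """Sensitivity: share of true cases that the method retrieved."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Positive predictive value: share of retrieved cases that are true."""
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    # Hypothetical counts for a rare event: few true cases, many flagged reports.
    true_positives = 19
    false_negatives = 2
    false_positives = 981

    print(f"recall    = {recall(true_positives, false_negatives):.3f}")
    print(f"precision = {precision(true_positives, false_positives):.3f}")
```

With counts like these, a method finds almost every true case yet still leaves reviewers with a large majority of false alarms, which is the trade-off the abstract describes.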

Conclusions

Text mining methods can be used to advantage for detecting complex signals that rely on information found in unstructured text (e.g., ICSR narratives), as standardised methods that are less labour-intensive and time-consuming than traditional pharmacovigilance methods. The employment of text mining in pharmacovigilance need not be limited to the surveillance of potential medication errors but can also be used for ongoing regulatory requests, e.g., obligations in risk management plans, and may thus be utilised broadly for signal detection and ongoing surveillance activities.

Zhang Y, Cui S, Gao H. Adverse drug reaction detection on social media with deep linguistic features. J Biomed Inform 2020;106:103437. [PMID: 32360987 DOI: 10.1016/j.jbi.2020.103437] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/16/2019] [Revised: 04/02/2020] [Accepted: 04/26/2020] [Indexed: 11/26/2022]

Koleck TA, Dreisbach C, Bourne PE, Bakken S. Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J Am Med Inform Assoc 2020;26:364-379. [PMID: 30726935 DOI: 10.1093/jamia/ocy173] [Citation(s) in RCA: 200] [Impact Index Per Article: 50.0] [Received: 06/29/2018] [Revised: 11/20/2018] [Accepted: 11/27/2018] [Indexed: 12/26/2022]

Abstract

OBJECTIVE

Natural language processing (NLP) of symptoms from electronic health records (EHRs) could contribute to the advancement of symptom science. We aim to synthesize the literature on the use of NLP to process or analyze symptom information documented in EHR free-text narratives.

MATERIALS AND METHODS

Our search of 1964 records from PubMed and EMBASE was narrowed to 27 eligible articles. Data related to the purpose, free-text corpus, patients, symptoms, NLP methodology, evaluation metrics, and quality indicators were extracted for each study.

RESULTS

Symptom-related information was presented as a primary outcome in 14 studies. EHR narratives represented various inpatient and outpatient clinical specialties, with general, cardiology, and mental health occurring most frequently. Studies encompassed a wide variety of symptoms, including shortness of breath, pain, nausea, dizziness, disturbed sleep, constipation, and depressed mood. NLP approaches included previously developed NLP tools, classification methods, and manually curated rule-based processing. Only one-third (n = 9) of studies reported patient demographic characteristics.

DISCUSSION

NLP is used to extract information from EHR free-text narratives written by a variety of healthcare providers on an expansive range of symptoms across diverse clinical specialties. The current focus of this field is on the development of methods to extract symptom information and the use of symptom information for disease classification tasks rather than the examination of symptoms themselves.

CONCLUSION

Future NLP studies should concentrate on the investigation of symptoms and symptom documentation in EHR free-text narratives. Efforts should be undertaken to examine patient characteristics and make symptom-related NLP algorithms or pipelines and vocabularies openly available.

Chung AE, Shoenbill K, Mitchell SA, Dueck AC, Schrag D, Bruner DW, Minasian LM, St Germain D, O'Mara AM, Baumgartner P, Rogak LJ, Abernethy AP, Griffin AC, Basch EM. Patient free text reporting of symptomatic adverse events in cancer clinical research using the National Cancer Institute's Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE). J Am Med Inform Assoc 2020;26:276-285. [PMID: 30840079 DOI: 10.1093/jamia/ocy169] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 05/03/2018] [Revised: 10/17/2018] [Accepted: 11/26/2018] [Indexed: 11/14/2022]

Abstract

OBJECTIVE

The study sought to describe patient-entered supplemental information on symptomatic adverse events (AEs) in cancer clinical research reported via a National Cancer Institute software system and examine the feasibility of mapping these entries to established terminologies.

MATERIALS AND METHODS

Patients in 3 multicenter trials electronically completed surveys during cancer treatment. Each survey included a prespecified subset of items from the National Cancer Institute's Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE). Upon completion of the survey items, patients could add supplemental symptomatic AE information in a free text box. As patients typed into the box, structured dropdown terms could be selected from the PRO-CTCAE item library or Medical Dictionary for Regulatory Activities (MedDRA), or patients could type unstructured free text for submission.

RESULTS

Data were pooled from 1760 participants (48% women; 78% White) who completed 8892 surveys, of which 2387 (26.8%) included supplemental symptomatic AE information. Overall, 1024 (58%) patients entered supplemental information at least once, with an average of 2.3 per patient per study. This encompassed 1474 of 8892 (16.6%) dropdowns and 913 of 8892 (10.3%) unstructured free text entries. One-third of the unstructured free text entries (32%) could be mapped post hoc to a PRO-CTCAE term and 68% to a MedDRA term.

DISCUSSION

Participants frequently added supplemental information beyond study-specific survey items. Almost half selected a structured dropdown term, although many opted to submit unstructured free text entries. Most free text entries could be mapped post hoc to PRO-CTCAE or MedDRA terms, suggesting opportunities to enhance the system to perform real-time mapping for AE reporting.

CONCLUSIONS

Patient reporting of symptomatic AEs using a text box functionality with mapping to existing terminologies is both feasible and informative.

Affiliation(s)

Arlene E Chung Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA; Program on Health and Clinical Informatics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA; Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Kimberly Shoenbill Program on Health and Clinical Informatics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA; Department of Family Medicine, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
Sandra A Mitchell National Cancer Institute, Rockville, Maryland, USA
Amylou C Dueck Alliance Statistics and Data Center, Mayo Clinic, Scottsdale, Arizona, USA
Deborah Schrag Division of Population Sciences, Department of Medical Oncology, Dana-Farber/Harvard Cancer Center, Brookline, Massachusetts, USA
Deborah W Bruner Nell Hodgson Woodruff School of Nursing, Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
Lori M Minasian National Cancer Institute, Rockville, Maryland, USA
Diane St Germain National Cancer Institute, Rockville, Maryland, USA
Ann M O'Mara National Cancer Institute, Rockville, Maryland, USA
Paul Baumgartner Semantic Bits, LLC, Herndon, Virginia, USA
Lauren J Rogak Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, USA
Amy P Abernethy Department of Medicine, Duke Cancer Institute, Durham, North Carolina, USA; Flatiron Health, New York, New York, USA
Ashley C Griffin Program on Health and Clinical Informatics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
Ethan M Basch Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA; Program on Health and Clinical Informatics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA; Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA; Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, USA

Dang TT, Nguyen TH, Ho TB. Causality Assessment of Adverse Drug Reaction: Controlling Confounding Induced by Polypharmacy. Curr Pharm Des 2020;25:1134-1143. [PMID: 31038058 DOI: 10.2174/1381612825666190416115714] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/04/2019] [Accepted: 04/01/2019] [Indexed: 11/22/2022]

Bielinski SJ, St Sauver JL, Olson JE, Larson NB, Black JL, Scherer SE, Bernard ME, Boerwinkle E, Borah BJ, Caraballo PJ, Curry TB, Doddapaneni H, Formea CM, Freimuth RR, Gibbs RA, Giri J, Hathcock MA, Hu J, Jacobson DJ, Jones LA, Kalla S, Koep TH, Korchina V, Kovar CL, Lee S, Liu H, Matey ET, McGree ME, McAllister TM, Moyer AM, Muzny DM, Nicholson WT, Oyen LJ, Qin X, Raj R, Roger VL, Rohrer Vitek CR, Ross JL, Sharp RR, Takahashi PY, Venner E, Walker K, Wang L, Wang Q, Wright JA, Wu TJ, Wang L, Weinshilboum RM. Cohort Profile: The Right Drug, Right Dose, Right Time: Using Genomic Data to Individualize Treatment Protocol (RIGHT Protocol). Int J Epidemiol 2020;49:23-24k. [PMID: 31378813 PMCID: PMC7124480 DOI: 10.1093/ije/dyz123] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Accepted: 05/31/2019] [Indexed: 12/29/2022]

Affiliation(s)

Suzette J Bielinski Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Jennifer L St Sauver Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Robert D and Patricia E Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
Janet E Olson Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
Nicholas B Larson Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
John L Black Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA; Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
Steven E Scherer Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
Matthew E Bernard Department of Family Medicine, Mayo Clinic, Rochester, MN, USA
Eric Boerwinkle Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
Bijan J Borah Robert D and Patricia E Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA; Division of Health Care Policy and Research, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Pedro J Caraballo Division of General Internal Medicine, Department of Medicine, Mayo Clinic, Rochester, MN, USA
Timothy B Curry Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA; Department of Anesthesia and Perioperative Medicine, Mayo Clinic, Rochester, MN, USA
HarshaVardhan Doddapaneni Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Christine M Formea Department of Pharmacy, Mayo Clinic, Rochester, MN, USA
Robert R Freimuth Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Richard A Gibbs Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
Jyothsna Giri Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
Matthew A Hathcock Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Jianhong Hu Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Debra J Jacobson Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Leila A Jones Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
Sara Kalla Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Tyler H Koep OneOme, LLC, Minneapolis, MN, USA
Viktoriya Korchina Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Christie L Kovar Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Sandra Lee Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Hongfang Liu Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Eric T Matey Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA; Department of Pharmacy, Mayo Clinic, Rochester, MN, USA
Michaela E McGree Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Tammy M McAllister Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
Ann M Moyer Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
Donna M Muzny Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Wayne T Nicholson Department of Anesthesia and Perioperative Medicine, Mayo Clinic, Rochester, MN, USA
Lance J Oyen Department of Pharmacy, Mayo Clinic, Rochester, MN, USA
Xiang Qin Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Ritika Raj Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Véronique L Roger Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Division of Cardiovascular Diseases, Department of Internal Medicine, Mayo Clinic, Rochester, MN, USA
Carolyn R Rohrer Vitek Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
Jason L Ross OneOme, LLC, Minneapolis, MN, USA
Richard R Sharp Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA; Division of Health Care Policy and Research, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Paul Y Takahashi Division of Community Internal Medicine, Department of Medicine, Mayo Clinic, Rochester, MN, USA
Eric Venner Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Kimberly Walker Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Liwei Wang Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Qiaoyan Wang Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Jessica A Wright Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA Department of Pharmacy, Mayo Clinic, Rochester, MN, USA
Tsung-Jung Wu Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Liewei Wang Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA Division of Clinical Pharmacology, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA
Richard M Weinshilboum Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA Division of Clinical Pharmacology, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA

Mohammadhassanzadeh H, Sketris I, Traynor R, Alexander S, Winquist B, Stewart SA. Using Natural Language Processing to Examine the Uptake, Content, and Readability of Media Coverage of a Pan-Canadian Drug Safety Research Project: Cross-Sectional Observational Study. JMIR Form Res 2020;4:e13296. [PMID: 31934872 PMCID: PMC6996767 DOI: 10.2196/13296] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/03/2019] [Revised: 07/11/2019] [Accepted: 09/26/2019] [Indexed: 11/18/2022]

Abstract

Background

Isotretinoin, for treating cystic acne, increases the risk of miscarriage and fetal abnormalities when taken during pregnancy. The Health Canada–approved product monograph for isotretinoin includes pregnancy prevention guidelines. A recent study by the Canadian Network for Observational Drug Effect Studies (CNODES) on the occurrence of pregnancy and pregnancy outcomes during isotretinoin therapy estimated poor adherence to these guidelines. Media uptake of this study was unknown; awareness of this uptake could help improve drug safety communication.

Objective

The aim of this study was to understand how the media present pharmacoepidemiological research using the CNODES isotretinoin study as a case study.

Methods

Google News was searched (April 25-May 6, 2016), using a predefined set of terms, for mention of the CNODES study. In total, 26 articles and 3 CNODES publications (original article, press release, and podcast) were identified. The article texts were cleaned (eg, advertisements and links removed), and the podcast was transcribed. A dictionary of 1295 unique words was created using natural language processing (NLP) techniques (term frequency-inverse document frequency, Porter stemming, and stop-word filtering) to identify common words and phrases. Similarity between the articles and reference publications was calculated using Euclidean distance; articles were grouped using hierarchical agglomerative clustering. Nine readability scales were applied to measure text readability based on factors such as number of words, difficult words, syllables, sentence counts, and other textual metrics.
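The pipeline described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the stop-word list, the toy documents, the function names, and the choice of average linkage are assumptions made for illustration, and Porter stemming is omitted to keep the example dependency-free.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (assumption; the study's list is not given).
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}

def tokenize(text):
    """Lowercase, strip surrounding punctuation, and drop stop words."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append([
            (counts[w] / total) * math.log(len(docs) / doc_freq[w])
            for w in vocab
        ])
    return vocab, vectors

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering over document indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical toy documents, not taken from the study corpus.
docs = [
    "isotretinoin pregnancy study",
    "pregnancy study of isotretinoin",
    "acne drug warning",
    "drug warning acne",
]
vocab, vectors = tfidf_vectors(docs)
clusters = agglomerate(vectors, 2)
```

With these toy inputs, documents that share the same words after stop-word filtering have a distance of zero and are merged first. A real analysis would more likely rely on an established library for stemming and clustering.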

Results

The top 5 dictionary words were pregnancy (250 appearances), isotretinoin (220), study (209), drug (201), and women (185). Three distinct clusters were identified: Clusters 2 (5 articles) and 3 (4 articles) were from health-related websites and media, respectively; Cluster 1 (18 articles) contained largely media sources; 2 articles fell outside these clusters. Use of the term isotretinoin versus Accutane (a brand name of isotretinoin), discussion of pregnancy complications, and assignment of responsibility for guideline adherence varied between clusters. For example, the term pregnanc appeared most often in Clusters 1 (14.6 average times per article) and 2 (11.4) and relatively infrequently in Cluster 3 (1.8). The average reading level for all articles was high (eg, Flesch-Kincaid, 13; Gunning Fog, 15; SMOG Index, 10; Coleman Liau Index, 15; Linsear Write Index, 13; and Text Standard, 13). Readability increased from Cluster 2 (Gunning Fog of 16.9) to 3 (12.2). It varied between clusters (average 13th-15th grade) but exceeded the recommended health information reading level (6th to 8th grade), overall.
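For readers unfamiliar with the grade-level scales reported above, the following sketch shows the standard published formulas for two of them, Flesch-Kincaid Grade Level and Gunning Fog. The function names and the example counts are illustrative assumptions; how words, sentences, syllables, and complex words are counted depends on the tool used, and the study's tooling is not stated here.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words, sentences, complex_words):
    """Gunning Fog index; complex words are those with three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

# Hypothetical counts for a short passage (not from the study).
fk = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
fog = gunning_fog(words=100, sentences=5, complex_words=10)
```

Both scores approximate a US school grade, so higher values indicate text that is harder to read.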

Conclusions

Media interpretation of the CNODES study varied, with differences in synonym usage and areas of focus. All articles were written above the recommended health information reading level. Analyzing media using NLP techniques can help determine drug safety communication effectiveness. This project is important for understanding how drug safety studies are taken up and redistributed in the media.

Krzhizhanovskaya VV, Závodszky G, Lees MH, Dongarra JJ, Sloot PMA, Brissos S, Teixeira J. Applicability of Machine Learning Methods to Multi-label Medical Text Classification. Lecture Notes in Computer Science 2020. [PMCID: PMC7303696 DOI: 10.1007/978-3-030-50423-6_38] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]

Natural Language Processing Combined with ICD-9-CM Codes as a Novel Method to Study the Epidemiology of Allergic Drug Reactions. J Allergy Clin Immunol Pract 2019;8:1032-1038.e1. [PMID: 31857264 DOI: 10.1016/j.jaip.2019.12.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 05/24/2019] [Revised: 11/25/2019] [Accepted: 12/02/2019] [Indexed: 11/20/2022]

Gefen D, Ben-Assuli O, Shlomo N, Robertson N, Klempfner R. A case study of applying text analysis to identify possible adverse drug interactions: The case of Adalat (Nifedipine). Health Informatics J 2019;26:1455-1464. [PMID: 31635509 DOI: 10.1177/1460458219882269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Wang Y, Fan X, Chen L, Chang EIC, Ananiadou S, Tsujii J, Xu Y. Mapping anatomical related entities to human body parts based on wikipedia in discharge summaries. BMC Bioinformatics 2019;20:430. [PMID: 31419946 PMCID: PMC6697955 DOI: 10.1186/s12859-019-3005-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2019] [Accepted: 07/23/2019] [Indexed: 11/16/2022]

Fan DF, Yu YC, Ding XS, Nie XL, Wei R, Feng XY, Peng XX, Gao MM, Jia LL, Wang XL. Exploring the drug-induced anemia signals in children using electronic medical records. Expert Opin Drug Saf 2019;18:993-999. [PMID: 31315002 DOI: 10.1080/14740338.2019.1645832] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]

Machine Learning for Feature Selection and Cluster Analysis in Drug Utilisation Research. Curr Epidemiol Rep 2019. [DOI: 10.1007/s40471-019-00211-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/27/2022]

Zheng C, Yu W, Xie F, Chen W, Mercado C, Sy LS, Qian L, Glenn S, Lee G, Tseng HF, Duffy J, Jackson LA, Daley MF, Crane B, McLean HQ, Jacobsen SJ. The use of natural language processing to identify Tdap-related local reactions at five health care systems in the Vaccine Safety Datalink. Int J Med Inform 2019;127:27-34. [PMID: 31128829 PMCID: PMC6645678 DOI: 10.1016/j.ijmedinf.2019.04.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/11/2018] [Revised: 01/31/2019] [Accepted: 04/12/2019] [Indexed: 01/28/2023]

Thompson J, Hu J, Mudaranthakam DP, Streeter D, Neums L, Park M, Koestler DC, Gajewski B, Jensen R, Mayo MS. Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Health Records. Sci Rep 2019;9:9253. [PMID: 31239489 PMCID: PMC6592944 DOI: 10.1038/s41598-019-45705-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/16/2019] [Accepted: 06/11/2019] [Indexed: 12/14/2022]

Beck EM, Hatton ND, Ryan JJ. Novel techniques for advancing our understanding of pulmonary arterial hypertension. Eur Respir J 2019;53:1900556. [DOI: 10.1183/13993003.00556-2019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/19/2019] [Accepted: 03/20/2019] [Indexed: 01/18/2023]

Automatic Disease Annotation From Radiology Reports Using Artificial Intelligence Implemented by a Recurrent Neural Network. AJR Am J Roentgenol 2019;212:734-740. [DOI: 10.2214/ajr.18.19869] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 01/20/2023]

Zeng Z, Espino S, Roy A, Li X, Khan SA, Clare SE, Jiang X, Neapolitan R, Luo Y. Using natural language processing and machine learning to identify breast cancer local recurrence. BMC Bioinformatics 2018;19:498. [PMID: 30591037 PMCID: PMC6309052 DOI: 10.1186/s12859-018-2466-x] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Indexed: 12/15/2022]

Ta CN, Dumontier M, Hripcsak G, Tatonetti NP, Weng C. Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records. Sci Data 2018;5:180273. [PMID: 30480666 PMCID: PMC6257042 DOI: 10.1038/sdata.2018.273] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 07/09/2018] [Accepted: 10/16/2018] [Indexed: 12/11/2022]

Wang L, Rastegar-Mojarad M, Ji Z, Liu S, Liu K, Moon S, Shen F, Wang Y, Yao L, Davis III JM, Liu H. Detecting Pharmacovigilance Signals Combining Electronic Medical Records With Spontaneous Reports: A Case Study of Conventional Disease-Modifying Antirheumatic Drugs for Rheumatoid Arthritis. Front Pharmacol 2018;9:875. [PMID: 30131701 PMCID: PMC6090179 DOI: 10.3389/fphar.2018.00875] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 12/16/2017] [Accepted: 07/19/2018] [Indexed: 12/24/2022]

Combi C, Zorzi M, Pozzani G, Arzenton E, Moretti U. Normalizing Spontaneous Reports Into MedDRA: Some Experiments With MagiCoder. IEEE J Biomed Health Inform 2018;23:95-102. [PMID: 30059326 DOI: 10.1109/jbhi.2018.2861213] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/09/2022]

Bhasuran B, Natarajan J. Automatic extraction of gene-disease associations from literature using joint ensemble learning. PLoS One 2018;13:e0200699. [PMID: 30048465 PMCID: PMC6061985 DOI: 10.1371/journal.pone.0200699] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 01/30/2018] [Accepted: 07/02/2018] [Indexed: 12/26/2022]

Chen X, Faviez C, Schuck S, Lillo-Le-Louët A, Texier N, Dahamna B, Huot C, Foulquié P, Pereira S, Leroux V, Karapetiantz P, Guenegou-Arnoux A, Katsahian S, Bousquet C, Burgun A. Mining Patients' Narratives in Social Media for Pharmacovigilance: Adverse Effects and Misuse of Methylphenidate. Front Pharmacol 2018;9:541. [PMID: 29881351 PMCID: PMC5978246 DOI: 10.3389/fphar.2018.00541] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 12/15/2017] [Accepted: 05/04/2018] [Indexed: 12/29/2022]

Abstract

Background: The Food and Drug Administration (FDA) in the United States and the European Medicines Agency (EMA) have recognized social media as a new data source to strengthen their activities regarding drug safety.

Objective: Our objective in the ADR-PRISM project was to provide text mining and visualization tools to explore a corpus of posts extracted from social media. We evaluated this approach on a corpus of 21 million posts from five patient forums, and conducted a qualitative analysis of the data available on methylphenidate in this corpus.

Methods: We applied text mining methods based on named entity recognition and relation extraction in the corpus, followed by signal detection using proportional reporting ratio (PRR). We also used topic modeling based on the Correlated Topic Model to obtain the list of thematics in the corpus and classify the messages based on their topics.
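As a brief illustration of the signal detection step, the sketch below computes the proportional reporting ratio from a 2x2 contingency table using its standard definition. The function name, argument names, and example counts are assumptions for illustration only; the abstract does not state which threshold or additional criteria the authors applied to call a PRR a signal.

```python
def proportional_reporting_ratio(a, b, c, d):
    """Standard PRR from a 2x2 table of report (or post) counts.

    a: drug of interest, event of interest
    b: drug of interest, all other events
    c: all other drugs, event of interest
    d: all other drugs, all other events
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    # Share of the drug's reports mentioning the event, divided by the
    # same share among reports for all other drugs.
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts, not taken from the study.
prr = proportional_reporting_ratio(a=20, b=80, c=100, d=9900)
```

In this made-up example the event is mentioned in 20% of posts about the drug and 1% of posts about other drugs, giving a PRR of about 20.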

Results: We automatically identified 3443 posts about methylphenidate published between 2007 and 2016, among which 61 adverse drug reactions (ADR) were automatically detected. Two pharmacovigilance experts manually evaluated the quality of automatic identification, and an F-measure of 0.57 was reached. Patients' reports were mainly neuro-psychiatric effects. Applying PRR, 67% of the ADRs were signals, including most of the neuro-psychiatric symptoms but also palpitations. Topic modeling showed that the most represented topics were related to Childhood and Treatment initiation, but also Side effects. Cases of misuse were also identified in this corpus, including recreational use and abuse.

Conclusion: Named entity recognition, signal detection, and topic modeling have demonstrated their complementarity in mining social media data. An in-depth analysis focused on methylphenidate showed that this approach was able to detect potential signals and to provide better understanding of patients' behaviors regarding drugs, including misuse.

Affiliation(s)

Xiaoyi Chen UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France
Carole Faviez Kappa Santé, Paris, France
Stéphane Schuck Kappa Santé, Paris, France
Agnès Lillo-Le-Louët Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, AP-HP, Paris, France
Nathalie Texier Kappa Santé, Paris, France
Badisse Dahamna Service d'Informatique Biomédicale, Centre Hospitalier Universitaire de Rouen, Rouen, France; Laboratoire d'Informatique, du Traitement de l'Information et des Systèmes-TIBS EA 4108, Rouen, France
Charles Huot Expert System, Paris, France
Pierre Foulquié Kappa Santé, Paris, France
Suzanne Pereira Vidal, Issy Les Moulineaux, France
Vincent Leroux Institut de Santé Urbaine, Saint-Maurice, France
Pierre Karapetiantz UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France
Armelle Guenegou-Arnoux UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France
Sandrine Katsahian UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France; Département d'Informatique Médicale, Hôpital Européen Georges Pompidou, Paris, France
Cédric Bousquet Sorbonne Université, Inserm, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, Paris, France
Anita Burgun UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France; Département d'Informatique Médicale, Hôpital Européen Georges Pompidou, Paris, France

Smith JC, Chen Q, Denny JC, Roden DM, Johnson KB, Miller RA. Evaluation of a Novel System to Enhance Clinicians' Recognition of Preadmission Adverse Drug Reactions. Appl Clin Inform 2018;9:313-325. [PMID: 29742757 DOI: 10.1055/s-0038-1646963] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/17/2022]

Abstract

BACKGROUND

Often unrecognized by providers, adverse drug reactions (ADRs) diminish patients' quality of life, cause preventable admissions and emergency department visits, and increase health care costs.

OBJECTIVE

This article evaluates whether an automated system, the Adverse Drug Effect Recognizer (ADER), could assist clinicians in detecting and addressing inpatients' ongoing preadmission ADRs.

METHODS

ADER uses natural language processing to extract patients' medications, findings, and past diagnoses from admission notes. It compares excerpted information to a database of known medication adverse effects and promptly warns clinicians about potential ongoing ADRs and potential confounders via alerts placed in patients' electronic health records (EHRs). A 3-month intervention trial evaluated ADER's impact on antihypertensive medication ordering behaviors. At the time of patient admission, ADER warned providers on the Internal Medicine wards of Vanderbilt University Hospital about potential ongoing preadmission antihypertensive medication ADRs. A retrospective control group, comprising similar physicians from a period prior to the intervention, received no alerts. The evaluation compared ordering behaviors for each group to determine if preadmission medications changed during hospitalization or at discharge. The study also analyzed intervention group participants' survey responses and user comments.
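The matching step described above, comparing extracted medications and findings against a database of known adverse effects, can be pictured with a toy sketch. This is not ADER's implementation: the lookup table, function name, and inputs are hypothetical, the natural language processing step is assumed to have already produced the two lists, and confounder handling is left out.

```python
# Hypothetical toy knowledge base mapping a medication to known adverse effects.
KNOWN_ADVERSE_EFFECTS = {
    "lisinopril": {"cough", "hyperkalemia"},
    "amlodipine": {"peripheral edema"},
}

def potential_adrs(medications, findings):
    """Return (medication, finding) pairs where a finding is a known effect."""
    found = {f.lower() for f in findings}
    alerts = []
    for med in medications:
        known = KNOWN_ADVERSE_EFFECTS.get(med.lower(), set())
        # Sort so the alert order is deterministic.
        for effect in sorted(known & found):
            alerts.append((med.lower(), effect))
    return alerts

# Inputs as they might look after extraction from an admission note.
alerts = potential_adrs(["Lisinopril", "Metformin"], ["Cough", "fever"])
```

Here the only alert raised pairs lisinopril with cough; a production system would also need term normalization and a curated knowledge source.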

RESULTS

ADER identified potential preadmission ADRs for 30% of both groups. Compared with controls, intervention providers more often withheld or discontinued suspected ADR-causing medications during the inpatient stay (p < 0.001). Intervention providers who responded to alert-related surveys held or discontinued suspected ADR-causing medications more often at discharge (p < 0.001).

CONCLUSION

Results indicate that ADER helped physicians recognize ADRs and reduced ordering of suspected ADR-causing medications. In hospitals using EHRs, ADER-like systems could improve clinicians' recognition and elimination of ongoing ADRs.
