1
Karapetiantz P, Audeh B, Redjdal A, Tiffet T, Bousquet C, Jaulent MC. Monitoring Adverse Drug Events in Web Forums: Evaluation of a Pipeline and Use Case Study. J Med Internet Res 2024; 26:e46176. [PMID: 38888956 PMCID: PMC11220433 DOI: 10.2196/46176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 10/20/2023] [Accepted: 03/12/2024] [Indexed: 06/20/2024]
Abstract
BACKGROUND To mitigate safety concerns, regulatory agencies must make informed decisions regarding drug usage and adverse drug events (ADEs). The primary pharmacovigilance data stem from spontaneous reports by health care professionals. However, underreporting poses a notable challenge within the current system. Explorations into alternative sources, including electronic patient records and social media, have been undertaken. Nevertheless, social media's potential remains largely untapped in real-world scenarios. OBJECTIVE The challenge faced by regulatory agencies in using social media is primarily attributed to the absence of suitable tools to support decision makers. An effective tool should enable access to information via a graphical user interface, presenting data in a user-friendly manner rather than in their raw form. This interface should offer various visualization options, empowering users to choose representations that best convey the data and facilitate informed decision-making. Thus, this study aims to assess the potential of integrating social media into pharmacovigilance and enhancing decision-making with this novel data source. To achieve this, our objective was to develop and assess a pipeline that processes data from the extraction of web forum posts to the generation of indicators and alerts within a visual and interactive environment. The goal was to create a user-friendly tool that enables regulatory authorities to make better-informed decisions effectively. METHODS To enhance pharmacovigilance efforts, we have devised a pipeline comprising 4 distinct modules, each independently editable, aimed at efficiently analyzing health-related French web forums. These modules were (1) web forums' posts extraction, (2) web forums' posts annotation, (3) statistics and signal detection algorithm, and (4) a graphical user interface (GUI). 
We showcase the efficacy of the GUI through an illustrative case study involving the introduction of the new formula of Levothyrox in France. This event led to a surge in reports to the French regulatory authority. RESULTS Between January 1, 2017, and February 28, 2021, a total of 2,081,296 posts were extracted from 23 French web forums. These posts contained 437,192 normalized drug-ADE couples, annotated with the Anatomical Therapeutic Chemical (ATC) Classification and Medical Dictionary for Regulatory Activities (MedDRA). The analysis of the Levothyrox new formula revealed a notable pattern. In August 2017, there was a sharp increase in posts related to this medication on social media platforms, which coincided with a substantial uptick in reports submitted by patients to the national regulatory authority during the same period. CONCLUSIONS We demonstrated that conducting quantitative analysis using the GUI is straightforward and requires no coding. The results aligned with prior research and also offered potential insights into drug-related matters. Our hypothesis received partial confirmation because the final users were not involved in the evaluation process. Further studies, concentrating on ergonomics and the impact on professionals within regulatory agencies, are imperative for future research endeavors. We emphasized the versatility of our approach and the seamless interoperability between different modules over the performance of individual modules. Specifically, the annotation module was integrated early in the development process and could undergo substantial enhancement by leveraging contemporary techniques rooted in the Transformers architecture. Our pipeline holds potential applications in health surveillance by regulatory agencies or pharmaceutical companies, aiding in the identification of safety concerns. Moreover, it could be used by research teams for retrospective analysis of events.
Affiliation(s)
- Pierre Karapetiantz
  - Inserm, Sorbonne Université, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, F-75006, Paris, France
- Bissan Audeh
  - Inserm, Sorbonne Université, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, F-75006, Paris, France
- Akram Redjdal
  - Inserm, Sorbonne Université, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, F-75006, Paris, France
- Théophile Tiffet
  - Service de santé publique et information médicale, CHU de Saint Etienne, 42000 Saint-Etienne, France
  - Institut National de la Santé et de la Recherche Médicale, Université Jean Monnet, SAnté INgéniérie BIOlogie St-Etienne, SAINBIOSE, 42270 Saint-Priest-en-Jarez, France
- Cédric Bousquet
  - Inserm, Sorbonne Université, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, F-75006, Paris, France
  - Service de santé publique et information médicale, CHU de Saint Etienne, 42000 Saint-Etienne, France
- Marie-Christine Jaulent
  - Inserm, Sorbonne Université, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, F-75006, Paris, France
2
Lee S, Woo H, Lee CC, Kim G, Kim JY, Lee S. Drug_SNSMiner: standard pharmacovigilance pipeline for detection of adverse drug reaction using SNS data. Sci Rep 2023; 13:3779. [PMID: 36882478 PMCID: PMC9992476 DOI: 10.1038/s41598-023-28912-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2022] [Accepted: 01/27/2023] [Indexed: 03/09/2023]
Abstract
As society continues to age, it is becoming increasingly important to monitor drug use in the elderly. Social media data have been used for monitoring adverse drug reactions. The aim of this study was to determine whether social network studies (SNS) are useful sources of drug side effects information. We propose a method for utilizing SNS data to plot the known side effects of geriatric drugs in a dosing map. We developed a lexicon of drug terms associated with side effects and mapped patterns from social media data. We confirmed that well-known side effects may be obtained by utilizing SNS data. Based on these results, we propose a pharmacovigilance pipeline that can be extended to unknown side effects. We propose the standard analysis pipeline Drug_SNSMiner for monitoring side effects using SNS data and evaluated it as a drug prescription platform for the elderly. We confirmed that side effects may be monitored from the consumer's perspective based on SNS data using only drug information. SNS data were deemed good sources of information to determine ADRs and obtain other complementary data. We established that these learning data are invaluable for AI requiring the acquisition of ADR posts on efficacious drugs.
Affiliation(s)
- Seunghee Lee
  - Healthcare Data Science Center, Konyang University Hospital, Daejeon, 35365, Republic of Korea
- Hyekyung Woo
  - Department of Health Administration, Kongju National University, Gongju, 32588, Republic of Korea
  - Institute of Health and Environment, Kongju National University, Gongju, 32588, Republic of Korea
- Chung Chun Lee
  - Department of Biomedical Informatics, College of Medicine, Konyang University, Daejeon, 35365, Republic of Korea
- Gyeongmin Kim
  - Department of Biomedical Engineering, Konyang University, Daejeon, 35365, Republic of Korea
- Jong-Yeup Kim
  - Healthcare Data Science Center, Konyang University Hospital, Daejeon, 35365, Republic of Korea
  - Department of Biomedical Informatics, College of Medicine, Konyang University, Daejeon, 35365, Republic of Korea
- Suehyun Lee
  - College of IT Convergence, Gachon University, Seongnam, 13120, Republic of Korea
3
Martenot V, Masdeu V, Cupe J, Gehin F, Blanchon M, Dauriat J, Horst A, Renaudin M, Girard P, Zucker JD. LiSA: an assisted literature search pipeline for detecting serious adverse drug events with deep learning. BMC Med Inform Decis Mak 2022; 22:338. [PMID: 36550485 PMCID: PMC9773506 DOI: 10.1186/s12911-022-02085-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022]
Abstract
INTRODUCTION Detecting safety signals attributed to a drug in scientific literature is a fundamental issue in pharmacovigilance. The constant increase in the volume of publications requires the automation of this tedious task, in order to find and extract relevant articles from the pack. This task is critical, as serious Adverse Drug Reactions (ADRs) still account for a large number of hospital admissions each year. OBJECTIVES The aim of this study is to develop an augmented intelligence methodology for automatically identifying relevant publications mentioning an established link between a Drug and a Serious Adverse Event, according to the European Medicines Agency (EMA) definition of seriousness. METHODS The proposed pipeline, called LiSA (for Literature Search Application), is based on three independent deep learning models supporting a precise detection of safety signals in the biomedical literature. By combining Bidirectional Encoder Representations from Transformers (BERT) models with a modular architecture, the pipeline achieves a precision of 0.81 and a recall of 0.89 at the sentence level in articles extracted from PubMed (either abstract or full text). We also measured that, by using LiSA, a medical reviewer increases by a factor of 2.5 the number of relevant documents they can collect and evaluate compared to a simple keyword search. In the interest of reusability, emphasis was placed on building a modular pipeline allowing the insertion of other NLP modules to enrich the results provided by the system and extend it to other use cases. In addition, a lightweight visualization tool was developed to analyze and monitor safety signal results. CONCLUSIONS Overall, the generic pipeline and the visualization tool proposed in this article allow for efficient and accurate monitoring of serious adverse drug reactions from the literature and can easily be adapted to similar pharmacovigilance use cases. To facilitate reproducibility and benefit other research studies, we also shared a first benchmark dataset for Serious Adverse Drug Events detection.
Affiliation(s)
- Jean Cupe
  - Quinten, 8 rue Vernier, 75017 Paris, France
- Alexander Horst
  - Swiss Agency for Therapeutic Products, Swissmedic, Hallerstrasse 7, 3012 Bern, Switzerland (GRID grid.483664.b; ISNI 0000 0001 0683 3095)
- Michael Renaudin
  - Swiss Agency for Therapeutic Products, Swissmedic, Hallerstrasse 7, 3012 Bern, Switzerland (GRID grid.483664.b; ISNI 0000 0001 0683 3095)
- Philippe Girard
  - Swiss Agency for Therapeutic Products, Swissmedic, Hallerstrasse 7, 3012 Bern, Switzerland (GRID grid.483664.b; ISNI 0000 0001 0683 3095)
4
Gonzalez-Hernandez G, Krallinger M, Muñoz M, Rodriguez-Esteban R, Uzuner Ö, Hirschman L. Challenges and opportunities for mining adverse drug reactions: perspectives from pharma, regulatory agencies, healthcare providers and consumers. Database (Oxford) 2022; 2022:baac071. [PMID: 36050787 PMCID: PMC9436770 DOI: 10.1093/database/baac071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 07/08/2022] [Accepted: 08/25/2022] [Indexed: 11/17/2022]
Abstract
Monitoring drug safety is a central concern throughout the drug life cycle. Information about toxicity and adverse events is generated at every stage of this life cycle, and stakeholders have a strong interest in applying text mining and artificial intelligence (AI) methods to manage the ever-increasing volume of this information. Recognizing the importance of these applications and the role of challenge evaluations to drive progress in text mining, the organizers of BioCreative VII (Critical Assessment of Information Extraction in Biology) convened a panel of experts to explore 'Challenges in Mining Drug Adverse Reactions'. This article is an outgrowth of the panel; each panelist has highlighted specific text mining application(s), based on their research and their experiences in organizing text mining challenge evaluations. While these highlighted applications only sample the complexity of this problem space, they reveal both opportunities and challenges for text mining to aid in the complex process of drug discovery, testing, marketing and post-market surveillance. Stakeholders are eager to embrace natural language processing and AI tools to help in this process, provided that these tools can be demonstrated to add value to stakeholder workflows. This creates an opportunity for the BioCreative community to work in partnership with regulatory agencies, pharma and the text mining community to identify next steps for future challenge evaluations.
Affiliation(s)
- Graciela Gonzalez-Hernandez
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., West Hollywood, CA 90069, USA
- Martin Krallinger
  - Life Sciences—Text Mining, Barcelona Supercomputing Center, Plaça Eusebi Güell, 1-3, Barcelona 08034, Spain
- Monica Muñoz
  - Division of Pharmacovigilance, Office of Surveillance and Epidemiology, Center of Drug Evaluation and Research, FDA, 10903 New Hampshire Ave, Silver Spring, MD 20993, USA
- Raul Rodriguez-Esteban
  - Roche Innovation Center Basel, Roche Pharmaceuticals, Grenzacherstrasse 124, Basel 4070, Switzerland
- Özlem Uzuner
  - Information Sciences and Technology, George Mason University, 4400 University Dr, Fairfax, VA 22030, USA
- Lynette Hirschman
  - MITRE Labs, The MITRE Corporation, 202 Burlington Rd., Bedford, MA 01730, USA
5
Binkheder S, Wu HY, Quinney SK, Zhang S, Zitu MM, Chiang CW, Wang L, Jones J, Li L. PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature. J Biomed Semantics 2022; 13:17. [PMID: 35690873 PMCID: PMC9188713 DOI: 10.1186/s13326-022-00272-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/07/2019] [Accepted: 05/18/2022] [Indexed: 12/28/2022]
Abstract
BACKGROUND Adverse events induced by drug-drug interactions are a major concern in the United States. Current research is moving toward using electronic health record (EHR) data, including for adverse drug events discovery. One of the first steps in EHR-based studies is to define a phenotype for establishing a cohort of patients. However, phenotype definitions are not readily available for all phenotypes. One of the first steps of developing automated text mining tools is building a corpus. Therefore, this study aimed to develop annotation guidelines and a gold standard corpus to facilitate building future automated approaches for mining phenotype definitions contained in the literature. Furthermore, our aim is to improve the understanding of how these published phenotype definitions are presented in the literature and how we annotate them for future text mining tasks. RESULTS Two annotators manually annotated the corpus at the sentence level for the presence of evidence for phenotype definitions. Three major categories (inclusion, intermediate, and exclusion) with a total of ten dimensions were proposed, characterizing major contextual patterns and cues for presenting phenotype definitions in published literature. The developed annotation guidelines were used to annotate the corpus, which contained 3971 sentences: 1923 out of 3971 (48.4%) for the inclusion category, 1851 out of 3971 (46.6%) for the intermediate category, and 2273 out of 3971 (57.2%) for the exclusion category. The highest number of annotated sentences was 1449 out of 3971 (36.5%) for the "Biomedical & Procedure" dimension. The lowest number of annotated sentences was 49 out of 3971 (1.2%) for "The use of NLP". The overall percent inter-annotator agreement was 97.8%. Percent and Kappa statistics also showed high inter-annotator agreement across all dimensions.
CONCLUSIONS The corpus and annotation guidelines can serve as a foundational informatics approach for annotating and mining phenotype definitions in literature, and can be used later for text mining applications.
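The two agreement statistics named in this abstract, percent agreement and kappa, can be illustrated with a minimal Python sketch. The label lists are hypothetical toy data, not the PhenoDEF annotations, and the abstract does not say which kappa variant was used, so Cohen's kappa for two annotators is assumed here.

```python
from collections import Counter

def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators always used one identical label
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical sentence-level labels for one dimension (1 = present, 0 = absent).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]

print(percent_agreement(annotator_1, annotator_2))
print(cohen_kappa(annotator_1, annotator_2))
```

Kappa comes out lower than raw percent agreement because part of the raw agreement is expected by chance alone, which is why corpus papers usually report both.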
Affiliation(s)
- Samar Binkheder
  - Department of Biohealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, IN, USA
  - Medical Informatics Unit, Department of Medical Education, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Heng-Yi Wu
  - Development Science Informatics, Genentech, South San Francisco, CA, USA
- Sara K Quinney
  - Department of Obstetrics and Gynecology, Indiana University School of Medicine, Indianapolis, IN, USA
- Shijun Zhang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Md Muntasir Zitu
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Chien-Wei Chiang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Lei Wang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Josette Jones
  - Department of Biohealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, IN, USA
- Lang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
  - 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH, 43210, USA
6
Gao Y, Duan W, Rui H. Does Social Media Accelerate Product Recalls? Evidence from the Pharmaceutical Industry. Information Systems Research 2021. [DOI: 10.1287/isre.2021.1092] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
Abstract
Social media has become a vital platform for voicing product-related experiences that may not only reveal product defects, but also impose pressure on firms to act more promptly than before. This study scrutinizes the rarely studied relationship between these voices and the speed of product recalls in the context of the pharmaceutical industry, in which social media pharmacovigilance is becoming increasingly important for the detection of drug safety signals. Using Food and Drug Administration drug enforcement reports and social media data crawled from online forums and Twitter, we investigate whether social media can accelerate the product recall process in the context of drug recalls. Results based on discrete-time survival analyses suggest that more adverse drug reaction discussions on social media lead to a higher hazard rate of the drug being recalled and, thus, a shorter time to recall. To better understand the underlying mechanism, we propose the information effect, which captures how extracting information from social media helps detect more signals and mine signals faster to accelerate product recalls, and the publicity effect, which captures how firms and government agencies are pressured by public concerns to initiate speedy recalls. Estimation results from two mechanism tests support the existence of these conceptualized channels underlying the acceleration hypothesis of social media. This study offers new insights for firms and policymakers concerning the power of social media and its influence on product recalls.
Affiliation(s)
- Yang Gao
  - School of Computing and Information Systems, Singapore Management University, Singapore 178902
- Wenjing Duan
  - School of Business, George Washington University, Washington, District of Columbia 20037
- Huaxia Rui
  - Simon Business School, University of Rochester, Rochester, New York 14627
7
Tandan M, Acharya Y, Pokharel S, Timilsina M. Discovering symptom patterns of COVID-19 patients using association rule mining. Comput Biol Med 2021; 131:104249. [PMID: 33561673 PMCID: PMC7966840 DOI: 10.1016/j.compbiomed.2021.104249] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 12/21/2020] [Revised: 01/25/2021] [Accepted: 01/25/2021] [Indexed: 12/16/2022]
Abstract
BACKGROUND The COVID-19 pandemic is a significant public health crisis that is hitting hard on people's health, well-being, and freedom of movement, and affecting the global economy. Scientists worldwide are competing to develop therapeutics and vaccines; currently, three drugs and two vaccine candidates have been given emergency use authorization. However, there are still questions of efficacy with regard to specific subgroups of patients and the vaccine's scalability to the general public. Under such circumstances, understanding COVID-19 symptoms is vital in initial triage; it is crucial to distinguish the severity of cases for effective management and treatment. This study aimed to discover symptom patterns and overall symptom rules, including rules disaggregated by age, sex, chronic condition, and mortality status, among COVID-19 patients. METHODS This study was a retrospective analysis of COVID-19 patient data made available online by the Wolfram Data Repository through May 27, 2020. We applied a widely used rule-based machine learning technique called association rule mining to identify frequent symptoms and define patterns in the rules discovered. RESULT In total, 1,560 patients with COVID-19 were included in the study, with a median age of 52 years. The most frequently occurring symptom was fever (67%), followed by cough (37%), malaise/body soreness (11%), pneumonia (11%), and sore throat (8%). Myocardial infarction, heart failure, and renal disease were present in less than 1% of patients. The top ten significant symptom rules (out of 71 generated) showed cough, septic shock, and respiratory distress syndrome as frequent consequents. If a patient had a breathing problem and sputum production, then there was a higher confidence of that patient having a cough; if cardiac disease, renal disease, or pneumonia was present, then there was a higher confidence of septic shock or respiratory distress syndrome.
Symptom rules differed between younger and older patients and between male and female patients. Patients who had chronic conditions or died of COVID-19 had more severe symptom rules than those patients who did not have chronic conditions or who survived COVID-19. Concerning chronic condition rules among 147 patients, if a patient had diabetes, prerenal azotemia, and coronary bypass surgery, there was a certainty of hypertension. CONCLUSION The most frequently reported symptoms in patients with COVID-19 were fever, cough, pneumonia, and sore throat, while 1% had severe symptoms, such as septic shock, respiratory distress syndrome, and respiratory failure. Symptom rules differed by age and sex. Patients with chronic disease and patients who died of COVID-19 had severe symptom rules, more specifically cardiovascular-related symptoms accompanied by pneumonia, fever, and cough as consequents.
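The notion of a rule's "confidence" used in this abstract can be made concrete with a small Python sketch that scores a single association rule by support, confidence, and lift. The patient records are invented toy data, not the Wolfram Data Repository records, and the helper names are hypothetical. A full mining algorithm (Apriori, for example) would also enumerate candidate rules; this sketch only scores one rule that is given to it.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = frozenset(items)
    return sum(items <= record for record in records) / len(records)

def rule_metrics(records, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    s_antecedent = support(records, antecedent)
    s_consequent = support(records, consequent)
    s_both = support(records, antecedent | consequent)
    # Confidence estimates P(consequent | antecedent).
    confidence = s_both / s_antecedent if s_antecedent else 0.0
    # Lift > 1 means the consequent is more likely when the antecedent holds.
    lift = confidence / s_consequent if s_consequent else 0.0
    return {"support": s_both, "confidence": confidence, "lift": lift}

# Hypothetical symptom sets, one frozenset per patient.
patients = [
    frozenset({"fever", "cough"}),
    frozenset({"breathing problem", "sputum", "cough"}),
    frozenset({"breathing problem", "sputum", "cough", "fever"}),
    frozenset({"fever"}),
    frozenset({"breathing problem", "sputum"}),
    frozenset({"cough", "sore throat"}),
    frozenset({"fever", "malaise"}),
    frozenset({"fever", "cough"}),
]

metrics = rule_metrics(patients, {"breathing problem", "sputum"}, {"cough"})
print(metrics)
```

In this toy data the rule {breathing problem, sputum} -> {cough} holds for two of the three patients who match the antecedent, which is the quantity the abstract refers to as confidence.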
Affiliation(s)
- Meera Tandan
  - Cecil G Sheps Center for Health Service Research, University of North Carolina, Chapel Hill, USA
- Yogesh Acharya
  - Western Vascular Institute, Galway University Hospital, Galway, Ireland
- Suresh Pokharel
  - The University of Queensland, St Lucia, Queensland, Australia
- Mohan Timilsina
  - Data Science Institute, Insight Centre for Data Analytics, National University of Ireland Galway, Ireland
8
Cheerkoot-Jalim S, Khedo KK. A systematic review of text mining approaches applied to various application areas in the biomedical domain. Journal of Knowledge Management 2020. [DOI: 10.1108/jkm-09-2019-0524] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/11/2022]
Abstract
Purpose
This work shows the results of a systematic literature review on biomedical text mining. The purpose of this study is to identify the different text mining approaches used in different application areas of the biomedical domain, the common tools used and the challenges of biomedical text mining as compared to generic text mining algorithms. This study will be of value to biomedical researchers by allowing them to correlate text mining approaches to specific biomedical application areas. Implications for future research are also discussed.
Design/methodology/approach
The review was conducted following the principles of the Kitchenham method. A number of research questions were first formulated, followed by the definition of the search strategy. The papers were then selected based on a list of assessment criteria. Each of the papers was analyzed, and information relevant to the research questions was extracted.
Findings
It was found that researchers have mostly harnessed data sources such as electronic health records, biomedical literature, social media and health-related forums. The most common text mining technique was natural language processing using tools such as MetaMap and Unstructured Information Management Architecture, alongside the use of medical terminologies such as Unified Medical Language System. The main application area was the detection of adverse drug events. Challenges identified included the need to deal with huge amounts of text, the heterogeneity of the different data sources, the duality of meaning of words in biomedical text and the amount of noise introduced mainly from social media and health-related forums.
Originality/value
To the best of the authors’ knowledge, other reviews in this area have focused on either specific techniques, specific application areas or specific data sources. The results of this review will help researchers to correlate most relevant and recent advances in text mining approaches to specific biomedical application areas by providing an up-to-date and holistic view of work done in this research area. The use of emerging text mining techniques has great potential to spur the development of innovative applications, thus considerably impacting on the advancement of biomedical research.
9
Yu Y, Ruddy K, Mansfield A, Zong N, Wen A, Tsuji S, Huang M, Liu H, Shah N, Jiang G. Detecting and Filtering Immune-Related Adverse Events Signal Based on Text Mining and Observational Health Data Sciences and Informatics Common Data Model: Framework Development Study. JMIR Med Inform 2020; 8:e17353. [PMID: 32530430 PMCID: PMC7320306 DOI: 10.2196/17353] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/09/2019] [Revised: 04/13/2020] [Accepted: 04/15/2020] [Indexed: 01/30/2023]
Abstract
Background Immune checkpoint inhibitors are associated with unique immune-related adverse events (irAEs). As most of the immune checkpoint inhibitors are new to the market, it is important to conduct studies using real-world data sources to investigate their safety profiles. Objective The aim of the study was to develop a framework for signal detection and filtration of novel irAEs for 6 Food and Drug Administration–approved immune checkpoint inhibitors. Methods In our framework, we first used the Food and Drug Administration’s Adverse Event Reporting System (FAERS) standardized in an Observational Health Data Sciences and Informatics (OHDSI) common data model (CDM) to collect immune checkpoint inhibitor-related event data and conducted irAE signal detection. OHDSI CDM is a standard-driven data model that focuses on transforming different databases into a common format and standardizing medical terms to a common representation. We then filtered out the already known irAEs found in drug labels and literature by using a customized text-mining pipeline based on the clinical Text Analysis and Knowledge Extraction System (cTAKES), with the Medical Dictionary for Regulatory Activities (MedDRA) as a dictionary. Finally, we classified the irAE detection results into three different categories to discover potentially new irAE signals. Results By our text-mining pipeline, 490 irAE terms were identified from drug labels, and 918 terms were identified from the literature. In addition, of the 94 positive signals detected using CDM-based FAERS, 53 signals (56%) were labeled signals, 10 (11%) were unlabeled published signals, and 31 (33%) were potentially new signals. Conclusions We demonstrated that our approach is effective for irAE signal detection and filtration. Moreover, our CDM-based framework could facilitate adverse drug events detection and filtration toward the goal of next-generation pharmacovigilance that seamlessly integrates electronic health record data for improved signal detection.
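The three-way classification of detected signals described in this abstract (labeled, unlabeled but published, potentially new) amounts to set-membership filtering, sketched below in Python. The drug and event names are hypothetical placeholders, the function name is an assumption, and the precedence rule (label first, then literature) is one plausible reading of the abstract rather than the authors' documented logic. The final lines recompute the reported percentages from the counts given in the abstract.

```python
def triage_signals(detected, labeled_terms, literature_terms):
    """Sort detected (drug, event) signals into three buckets.

    A signal found on the drug label is 'labeled'; otherwise, if it appears
    in the literature, it is 'unlabeled_published'; otherwise it is
    'potentially_new'. This precedence is an assumption of the sketch.
    """
    buckets = {"labeled": [], "unlabeled_published": [], "potentially_new": []}
    for signal in detected:
        if signal in labeled_terms:
            buckets["labeled"].append(signal)
        elif signal in literature_terms:
            buckets["unlabeled_published"].append(signal)
        else:
            buckets["potentially_new"].append(signal)
    return buckets

# Hypothetical inputs: detected signals and the two known-term sets.
detected = [("drug_A", "colitis"), ("drug_A", "myocarditis"), ("drug_A", "event_X")]
labeled = {("drug_A", "colitis")}
published = {("drug_A", "colitis"), ("drug_A", "myocarditis")}

buckets = triage_signals(detected, labeled, published)
print(buckets)

# Counts reported in the abstract: 53 + 10 + 31 = 94 positive signals.
reported = {"labeled": 53, "unlabeled_published": 10, "potentially_new": 31}
total = sum(reported.values())
shares = {name: round(100 * count / total) for name, count in reported.items()}
print(total, shares)
```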
Affiliation(s)
- Yue Yu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Kathryn Ruddy
  - Division of Medical Oncology, Department of Oncology, Mayo Clinic, Rochester, MN, United States
- Aaron Mansfield
  - Division of Medical Oncology, Department of Oncology, Mayo Clinic, Rochester, MN, United States
- Nansu Zong
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Andrew Wen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Shintaro Tsuji
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Ming Huang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Nilay Shah
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Guoqian Jiang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
10
Ju M, Nguyen NTH, Miwa M, Ananiadou S. An ensemble of neural models for nested adverse drug events and medication extraction with subwords. J Am Med Inform Assoc 2020; 27:22-30. [PMID: 31197355 PMCID: PMC6913208 DOI: 10.1093/jamia/ocz075] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 01/30/2019] [Revised: 03/22/2019] [Accepted: 05/07/2019] [Indexed: 11/13/2022]
Abstract
OBJECTIVE This article describes an ensembling system to automatically extract adverse drug events and drug-related entities from clinical narratives, which was developed for the 2018 n2c2 Shared Task Track 2. MATERIALS AND METHODS We designed a neural model to tackle both nested (entities embedded in other entities) and polysemous entities (entities annotated with multiple semantic types) based on MIMIC III discharge summaries. To better represent rare and unknown words in entities, we further tokenized the MIMIC III data set by splitting the words into finer-grained subwords. We finally combined all the models to boost the performance. Additionally, we implemented a feature-based conditional random field model and created an ensemble to combine its predictions with those of the neural model. RESULTS Our method achieved a 92.78% lenient micro F1-score, with 95.99% lenient precision and 89.79% lenient recall. Experimental results showed that combining the predictions of either multiple models or of a single model with different settings can improve performance. DISCUSSION Analysis of the development set showed that our neural models can detect more informative text regions than feature-based conditional random field models. Furthermore, most entity types significantly benefit from subword representation, which also allows us to extract sparse entities, especially nested entities. CONCLUSION The overall results have demonstrated that the ensemble method can accurately recognize entities, including nested and polysemous entities. Additionally, our method can recognize sparse entities by reconsidering the clinical narratives at a finer-grained subword level, rather than at the word level.
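As a consistency check on the reported scores, the F1-score is the harmonic mean of precision and recall. The Python sketch below recomputes it from the two percentages given in the abstract; the function name is an assumption, and the small gap to the reported 92.78% is attributed here to the published precision and recall already being rounded, which is an inference rather than something the abstract states.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (any consistent scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Lenient micro precision and recall as reported in the abstract (percent).
reported_precision = 95.99
reported_recall = 89.79
reported_f1 = 92.78

recomputed_f1 = f1_score(reported_precision, reported_recall)
print(f"recomputed F1: {recomputed_f1:.4f}, reported F1: {reported_f1}")
print(f"absolute difference: {abs(recomputed_f1 - reported_f1):.4f}")
```

The recomputed value agrees with the reported one to within about 0.01 percentage points, so the three published figures are mutually consistent.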
Affiliation(s)
- Meizhi Ju
- National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
- Artificial Intelligence Research Centre (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
| | - Nhung T H Nguyen
- National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
- Artificial Intelligence Research Centre (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
| | - Makoto Miwa
- Toyota Technological Institute, Nagoya, Japan
- Artificial Intelligence Research Centre (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
| | - Sophia Ananiadou
- National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
- Artificial Intelligence Research Centre (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
|
11
|
Arnoux-Guenegou A, Girardeau Y, Chen X, Deldossi M, Aboukhamis R, Faviez C, Dahamna B, Karapetiantz P, Guillemin-Lanne S, Lillo-Le Louët A, Texier N, Burgun A, Katsahian S. The Adverse Drug Reactions From Patient Reports in Social Media Project: Protocol for an Evaluation Against a Gold Standard. JMIR Res Protoc 2019; 8:e11448. [PMID: 31066711 PMCID: PMC6528435 DOI: 10.2196/11448] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/29/2018] [Revised: 11/16/2018] [Accepted: 12/21/2018] [Indexed: 12/30/2022] Open
Abstract
Background Social media is a potential source of information on postmarketing drug safety surveillance that still remains unexploited nowadays. Information technology solutions aiming at extracting adverse reactions (ADRs) from posts on health forums require a rigorous evaluation methodology if their results are to be used to make decisions. First, a gold standard, consisting of manual annotations of the ADR by human experts from the corpus extracted from social media, must be implemented and its quality must be assessed. Second, as for clinical research protocols, the sample size must rely on statistical arguments. Finally, the extraction methods must target the relation between the drug and the disease (which might be either treated or caused by the drug) rather than simple co-occurrences in the posts. Objective We propose a standardized protocol for the evaluation of a software extracting ADRs from the messages on health forums. The study is conducted as part of the Adverse Drug Reactions from Patient Reports in Social Media project. Methods Messages from French health forums were extracted. Entity recognition was based on Racine Pharma lexicon for drugs and Medical Dictionary for Regulatory Activities terminology for potential adverse events (AEs). Natural language processing–based techniques automated the ADR information extraction (relation between the drug and AE entities). The corpus of evaluation was a random sample of the messages containing drugs and/or AE concepts corresponding to recent pharmacovigilance alerts. A total of 2 persons experienced in medical terminology manually annotated the corpus, thus creating the gold standard, according to an annotator guideline. We will evaluate our tool against the gold standard with recall, precision, and f-measure. Interannotator agreement, reflecting gold standard quality, will be evaluated with hierarchical kappa. Granularities in the terminologies will be further explored. 
Results Necessary and sufficient sample size was calculated to ensure statistical confidence in the assessed results. As we expected a global recall of 0.5, we needed at least 384 identified ADR concepts to obtain a 95% CI with a total width of 0.10 around 0.5. The automated ADR information extraction in the corpus for evaluation is already finished. The 2 annotators already completed the annotation process. The analysis of the performance of the ADR information extraction module as compared with gold standard is ongoing. Conclusions This protocol is based on the standardized statistical methods from clinical research to create the corpus, thus ensuring the necessary statistical power of the assessed results. Such evaluation methodology is required to make the ADR information extraction software useful for postmarketing drug safety surveillance. International Registered Report Identifier (IRRID) RR1-10.2196/11448
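The sample size quoted above can be reproduced with the usual normal-approximation formula for a proportion, n = z² · p(1 − p) / e², where e is the half-width of the confidence interval. This is a minimal sketch under that assumption; the protocol abstract does not state which formula was used, and the function name is hypothetical.

```python
from statistics import NormalDist


def sample_size_for_proportion(p: float, total_width: float,
                               confidence: float = 0.95) -> float:
    """Observations needed so that a normal-approximation CI around an
    expected proportion p has the requested total width."""
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = total_width / 2
    return z ** 2 * p * (1 - p) / half_width ** 2


# Expected recall of 0.5 and a 95% CI of total width 0.10, as in the abstract.
n = sample_size_for_proportion(p=0.5, total_width=0.10)
print(f"required ADR concepts: about {n:.2f}")
```

With these inputs the formula gives a value just above 384, in line with the "at least 384 identified ADR concepts" stated in the Results.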
Affiliation(s)
- Armelle Arnoux-Guenegou
- INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France
| | - Yannick Girardeau
- INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.,Département d'Informatique Médicale, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
| | - Xiaoyi Chen
- INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France
| | | | - Rim Aboukhamis
- Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
| | | | - Badisse Dahamna
- Service d'Informatique Biomédicale, D2IM, Centre Hospitalier Universitaire de Rouen, Rouen, France
| | - Pierre Karapetiantz
- INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France
| | | | - Agnès Lillo-Le Louët
- Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
| | | | - Anita Burgun
- INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.,Département d'Informatique Médicale, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France.,INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Paris Descartes University, Sorbonne Paris Cité, Paris, France
| | - Sandrine Katsahian
- INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.,INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Paris Descartes University, Sorbonne Paris Cité, Paris, France.,Clinical Research Unit Hôpitaux Universitaires Paris Ouest, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France.,INSERM CIC1418, Clinical Epidemiology, Hôpital Européen Georges-Pompidou, Paris, France
|
12
|
Zhang M, Zhang M, Ge C, Liu Q, Wang J, Wei J, Zhu KQ. Automatic discovery of adverse reactions through Chinese social media. Data Min Knowl Discov 2019. [DOI: 10.1007/s10618-018-00610-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/04/2023]
|
13
|
Harnessing social media data for pharmacovigilance: a review of current state of the art, challenges and future directions. International Journal of Data Science and Analytics 2019. [DOI: 10.1007/s41060-019-00175-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 01/07/2023]
|
14
|
Negi K, Pavuri A, Patel L, Jain C. A novel method for drug-adverse event extraction using machine learning. Informatics in Medicine Unlocked 2019. [DOI: 10.1016/j.imu.2019.100190] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/26/2022] Open
|
15
|
Convertino I, Ferraro S, Blandizzi C, Tuccori M. The usefulness of listening social media for pharmacovigilance purposes: a systematic review. Expert Opin Drug Saf 2018; 17:1081-1093. [DOI: 10.1080/14740338.2018.1531847] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 10/28/2022]
Affiliation(s)
- Irma Convertino
- Unit of Pharmacology and Pharmacovigilance, University of Pisa, Pisa, Italy
| | - Sara Ferraro
- Unit of Pharmacology and Pharmacovigilance, University of Pisa, Pisa, Italy
| | - Corrado Blandizzi
- Unit of Pharmacology and Pharmacovigilance, University of Pisa, Pisa, Italy
- Division of Pharmacology and Pharmacovigilance, Department of Clinical and Experimental Medicine, University Hospital of Pisa, Pisa, Italy
| | - Marco Tuccori
- Unit of Pharmacology and Pharmacovigilance, University of Pisa, Pisa, Italy
- Division of Pharmacology and Pharmacovigilance, Department of Clinical and Experimental Medicine, University Hospital of Pisa, Pisa, Italy
|
16
|
Jang G, Lee T, Hwang S, Park C, Ahn J, Seo S, Hwang Y, Yoon Y. PISTON: Predicting drug indications and side effects using topic modeling and natural language processing. J Biomed Inform 2018; 87:96-107. [PMID: 30268842 DOI: 10.1016/j.jbi.2018.09.015] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/08/2018] [Revised: 08/23/2018] [Accepted: 09/27/2018] [Indexed: 02/07/2023]
Abstract
The process of discovering novel drugs to treat diseases requires a long time and high cost. It is important to understand side effects of drugs as well as their therapeutic effects, because these can seriously damage the patients due to unexpected actions of the derived candidate drugs. In order to overcome these limitations, computational methods for predicting the therapeutic effects and side effects have been proposed. In particular, text mining is a widely used technique in the field of systems biology, because it can discover hidden relationships between drugs, genes and diseases from a large amount of literature data. Compared with in vivo/in vitro experiments, text mining derives meaningful results with less time and cost. In this study, we propose an algorithm for predicting novel drug-phenotype associations and drug-side effect associations using topic modeling and natural language processing (NLP). We extract sentences in which drugs and genes co-occur from the abstracts of the literature and identify words that describe the relationship between them using NLP. Considering the characteristics of the identified words, we determine if the drug has an up-regulation effect or a down-regulation effect on the gene. Based on genes that affect drugs and their regulatory relationships, we group the frequently occurring genes and regulatory relationships into topics, and build a drug-topic probability matrix by calculating the score that the drug will have a topic using topic modeling. Using the matrix, a classifier is constructed for predicting the novel indications and side effects of drugs considering the characteristics of known drug-phenotype associations or drug-side effect associations. The proposed method predicts both indications and side effects with a single algorithm, and it can exclude drugs with serious side effects or side effects that patients do not want to experience from among the candidate drugs provided for the treatment of the phenotype. 
Furthermore, lists of novel candidate drugs for phenotypes and side effects can be continuously updated with our algorithm every time a document is added. More than a thousand documents are produced per day, and it is possible for our algorithm to efficiently derive candidate drugs because it requires less cost than the existing drug repositioning methods. The resource of PISTON is available at databio.gachon.ac.kr/tools/PISTON.
Affiliation(s)
- Giup Jang
- Department of IT Convergence Engineering, Gachon University, Seongnam, Republic of Korea
| | - Taekeon Lee
- Department of Computer Engineering, Gachon University, Seongnam, Republic of Korea
| | - Soyoun Hwang
- Department of IT Convergence Engineering, Gachon University, Seongnam, Republic of Korea
| | - Chihyun Park
- Department of Computer Science, Yonsei University, Seoul, Republic of Korea
| | - Jaegyoon Ahn
- Department of Computer Science and Engineering, Incheon University, Incheon, Republic of Korea
| | - Sukyung Seo
- Department of Computer Engineering, Gachon University, Seongnam, Republic of Korea
| | - Youhyeon Hwang
- Department of Computer Science, University of Southern California, Los Angeles, USA
| | - Youngmi Yoon
- Department of Computer Engineering, Gachon University, Seongnam, Republic of Korea.
|
17
|
Liu J, Jiang X, Chen Q, Song M, Li J. Adverse Drug Reaction Related Post Detecting Using Sentiment Feature. Iranian Journal of Public Health 2018; 47:861-867. [PMID: 30087872 PMCID: PMC6077625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
BACKGROUND Posts related to adverse drug reactions (ADRs) on social websites are believed to be a valuable resource for post-marketing drug surveillance. The aim of this study was to find a method that goes beyond domain features to detect ADR-related posts more effectively. METHODS We conducted an experiment on posts using sentiment features from March 8 to May 20, 2016, in Shanghai, China. First, diabetes posts were collected, and 1814 posts were annotated by hand. Second, sentiment feature sets were generated, and the χ2 (CHI) statistic was used to select features. Finally, we evaluated the effectiveness of our method using the different feature sets. RESULTS Comparing the detection performance of the different feature sets showed that sentiment features selected with the CHI statistic improve the detection of ADR-related posts. Comparing the ADR-related group with the non-ADR group, detection of ADR-related posts performed better than detection of non-ADR posts. The highest performance was obtained by introducing sentiment features and applying CHI feature selection, and the method proved effective for detecting ADR-related posts. CONCLUSION Sentiment features combined with CHI feature selection provide an effective method for detecting ADR-related posts.
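The χ2 (CHI) feature selection mentioned in the Methods can be illustrated with the standard 2x2 contingency form used in text classification, which scores how strongly a term's presence is associated with a class. This is a sketch under assumptions: the abstract gives no formula, the function name is hypothetical, and the counts below are invented purely for illustration.

```python
def chi_square_term_score(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a term/class 2x2 contingency table.

    a: ADR posts containing the term      b: non-ADR posts containing the term
    c: ADR posts without the term         d: non-ADR posts without the term
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # degenerate table: the term or the class never varies
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical counts for one sentiment term over 200 annotated posts.
score = chi_square_term_score(a=30, b=10, c=70, d=90)
print(f"chi-square score: {score:.2f}")
```

In a feature-selection setting, terms would typically be ranked by this score and only the highest-scoring ones kept; how many were retained in the study is not stated in the abstract.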
Affiliation(s)
- Jingfang LIU
- Dept. of Information Management, School of Management, Shanghai University, Shanghai 200444, China
| | - Xiaoyan JIANG
- Dept. of Information Management, School of Management, Shanghai University, Shanghai 200444, China
| | - Qiangyuan CHEN
- Dept. of Economics, School of Economics, Shanghai University, Shanghai 200444, China
| | - Mei SONG
- Dept. of Software Engineering, School of Smart Education, Jiangsu Normal University, Xuzhou 221116, China
| | - Jia LI
- Dept. of Management Science and Engineering, School of Business, East China University of Science and Technology, Shanghai 200237, China
|
18
|
Chen X, Faviez C, Schuck S, Lillo-Le-Louët A, Texier N, Dahamna B, Huot C, Foulquié P, Pereira S, Leroux V, Karapetiantz P, Guenegou-Arnoux A, Katsahian S, Bousquet C, Burgun A. Mining Patients' Narratives in Social Media for Pharmacovigilance: Adverse Effects and Misuse of Methylphenidate. Front Pharmacol 2018; 9:541. [PMID: 29881351 PMCID: PMC5978246 DOI: 10.3389/fphar.2018.00541] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 12/15/2017] [Accepted: 05/04/2018] [Indexed: 12/29/2022] Open
Abstract
Background: The Food and Drug Administration (FDA) in the United States and the European Medicines Agency (EMA) have recognized social media as a new data source to strengthen their activities regarding drug safety. Objective: Our objective in the ADR-PRISM project was to provide text mining and visualization tools to explore a corpus of posts extracted from social media. We evaluated this approach on a corpus of 21 million posts from five patient forums, and conducted a qualitative analysis of the data available on methylphenidate in this corpus. Methods: We applied text mining methods based on named entity recognition and relation extraction in the corpus, followed by signal detection using the proportional reporting ratio (PRR). We also used topic modeling based on the Correlated Topic Model to obtain the list of thematics in the corpus and classify the messages based on their topics. Results: We automatically identified 3443 posts about methylphenidate published between 2007 and 2016, among which 61 adverse drug reactions (ADRs) were automatically detected. Two pharmacovigilance experts manually evaluated the quality of automatic identification, and an f-measure of 0.57 was reached. Patients' reports mainly concerned neuro-psychiatric effects. Applying PRR, 67% of the ADRs were signals, including most of the neuro-psychiatric symptoms but also palpitations. Topic modeling showed that the most represented topics were related to Childhood and Treatment initiation, but also Side effects. Cases of misuse were also identified in this corpus, including recreational use and abuse. Conclusion: Named entity recognition, signal detection, and topic modeling have demonstrated their complementarity in mining social media data. An in-depth analysis focused on methylphenidate showed that this approach was able to detect potential signals and to provide better understanding of patients' behaviors regarding drugs, including misuse.
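For readers unfamiliar with the proportional reporting ratio used in the Methods, it compares how often an event is reported with the drug of interest against how often it is reported with all other drugs. The sketch below uses the standard 2x2 definition with a log-scale 95% confidence interval; the counts are hypothetical, the function name is illustrative, and the abstract does not state which signal threshold the authors applied.

```python
import math


def prr_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Proportional reporting ratio and its approximate confidence interval.

    a: posts with the drug and the event    b: posts with the drug, other events
    c: posts with other drugs and the event d: posts with other drugs, other events
    """
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(PRR), the usual large-sample approximation.
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - z * se_log)
    upper = math.exp(math.log(prr) + z * se_log)
    return prr, lower, upper


# Hypothetical counts, not taken from the study.
prr, lower, upper = prr_with_ci(a=20, b=80, c=100, d=900)
print(f"PRR = {prr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

A drug-event pair is commonly flagged when the PRR and the lower confidence bound exceed chosen cut-offs; those cut-offs are a design decision and should be taken from the full paper rather than from this sketch.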
Affiliation(s)
- Xiaoyi Chen
- UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France
| | | | | | - Agnès Lillo-Le-Louët
- Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, AP-HP, Paris, France
| | | | - Badisse Dahamna
- Service d'Informatique Biomédicale, Centre Hospitalier Universitaire de Rouen, Rouen, France.,Laboratoire d'Informatique, du Traitement de l'Information et des Systèmes-TIBS EA 4108, Rouen, France
| | | | | | | | | | - Pierre Karapetiantz
- UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France
| | - Armelle Guenegou-Arnoux
- UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France
| | - Sandrine Katsahian
- UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France.,Département d'Informatique Médicale, Hôpital Européen Georges Pompidou, Paris, France
| | - Cédric Bousquet
- Sorbonne Université, Inserm, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, Paris, France
| | - Anita Burgun
- UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France.,Département d'Informatique Médicale, Hôpital Européen Georges Pompidou, Paris, France
|
19
|
Harpaz R, DuMouchel W, Schuemie M, Bodenreider O, Friedman C, Horvitz E, Ripple A, Sorbello A, White RW, Winnenburg R, Shah NH. Toward multimodal signal detection of adverse drug reactions. J Biomed Inform 2017; 76:41-49. [PMID: 29081385 PMCID: PMC8502488 DOI: 10.1016/j.jbi.2017.10.013] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 01/14/2017] [Revised: 10/14/2017] [Accepted: 10/24/2017] [Indexed: 11/27/2022]
Abstract
OBJECTIVE Improving mechanisms to detect adverse drug reactions (ADRs) is key to strengthening post-marketing drug safety surveillance. Signal detection is presently unimodal, relying on a single information source. Multimodal signal detection is based on jointly analyzing multiple information sources. Building on and expanding the work done in prior studies, the aim of the article is to further research on multimodal signal detection, explore its potential benefits, and propose methods for its construction and evaluation. MATERIAL AND METHODS Four data sources are investigated: FDA's adverse event reporting system, insurance claims, the MEDLINE citation database, and the logs of major Web search engines. Published methods are used to generate and combine signals from each data source. Two distinct reference benchmarks, corresponding to well-established and recently labeled ADRs respectively, are used to evaluate the performance of multimodal signal detection in terms of area under the ROC curve (AUC) and lead-time-to-detection, with the latter relative to labeling revision dates. RESULTS Limited to our reference benchmarks, multimodal signal detection provides AUC improvements ranging from 0.04 to 0.09 based on a widely used evaluation benchmark, and a comparative added lead-time of 7-22 months relative to labeling revision dates from a time-indexed benchmark. CONCLUSIONS The results support the notion that utilizing and jointly analyzing multiple data sources may lead to improved signal detection. Given certain data and benchmark limitations, the early stage of development, and the complexity of ADRs, it is currently not possible to make definitive statements about the ultimate utility of the concept. Continued development of multimodal signal detection requires a deeper understanding of the data sources used, additional benchmarks, and further research on methods to generate and synthesize signals.
Affiliation(s)
- Rave Harpaz
- Oracle Health Sciences, Bedford, MA, United States.
| | | | | | | | | | | | - Anna Ripple
- National Library of Medicine, NIH, Bethesda, MD, United States
| | | | | | | | - Nigam H Shah
- Stanford University, Stanford, CA, United States
|
20
|
Bousquet C, Dahamna B, Guillemin-Lanne S, Darmoni SJ, Faviez C, Huot C, Katsahian S, Leroux V, Pereira S, Richard C, Schück S, Souvignet J, Lillo-Le Louët A, Texier N. The Adverse Drug Reactions from Patient Reports in Social Media Project: Five Major Challenges to Overcome to Operationalize Analysis and Efficiently Support Pharmacovigilance Process. JMIR Res Protoc 2017; 6:e179. [PMID: 28935617 PMCID: PMC5629348 DOI: 10.2196/resprot.6463] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 11/14/2016] [Revised: 06/19/2017] [Accepted: 07/12/2017] [Indexed: 11/13/2022] Open
Abstract
Background Adverse drug reactions (ADRs) are an important cause of morbidity and mortality. Classical Pharmacovigilance process is limited by underreporting which justifies the current interest in new knowledge sources such as social media. The Adverse Drug Reactions from Patient Reports in Social Media (ADR-PRISM) project aims to extract ADRs reported by patients in these media. We identified 5 major challenges to overcome to operationalize the analysis of patient posts: (1) variable quality of information on social media, (2) guarantee of data privacy, (3) response to pharmacovigilance expert expectations, (4) identification of relevant information within Web pages, and (5) robust and evolutive architecture. Objective This article aims to describe the current state of advancement of the ADR-PRISM project by focusing on the solutions we have chosen to address these 5 major challenges. Methods In this article, we propose methods and describe the advancement of this project on several aspects: (1) a quality driven approach for selecting relevant social media for the extraction of knowledge on potential ADRs, (2) an assessment of ethical issues and French regulation for the analysis of data on social media, (3) an analysis of pharmacovigilance expert requirements when reviewing patient posts on the Internet, (4) an extraction method based on natural language processing, pattern based matching, and selection of relevant medical concepts in reference terminologies, and (5) specifications of a component-based architecture for the monitoring system. 
Results Considering the 5 major challenges, we (1) selected a set of 21 validated criteria for selecting social media to support the extraction of potential ADRs, (2) proposed solutions to guarantee data privacy of patients posting on the Internet, (3) took into account pharmacovigilance expert requirements with use case diagrams and scenarios, (4) built domain-specific knowledge resources embedding a lexicon, morphological rules, context rules, semantic rules, syntactic rules, and post-analysis processing, and (5) proposed a component-based architecture that allows storage of big data and accessibility to third-party applications through Web services. Conclusions We demonstrated the feasibility of implementing a component-based architecture that allows collection of patient posts on the Internet, near real-time processing of those posts including annotation, and storage in big data structures. In the next steps, we will evaluate the posts identified by the system in social media to clarify the interest and relevance of such an approach to improve conventional pharmacovigilance processes based on spontaneous reporting.
Affiliation(s)
- Cedric Bousquet
- Laboratoire d'Informatique Médicale et d'Ingénieurie des Connaissances en e-Santé, U1142, Institut National de la Santé et de la Recherche Médicale, Paris, France.,Service de Santé Publique et de l'Information Médicale, Centre Hospitalier Universitaire de Saint Etienne, Saint-Etienne, France
| | - Badisse Dahamna
- Department of Biomedical Informatics, Rouen University Hospital, Rouen, France
| | | | - Stefan J Darmoni
- Laboratoire d'Informatique Médicale et d'Ingénieurie des Connaissances en e-Santé, U1142, Institut National de la Santé et de la Recherche Médicale, Paris, France.,Department of Biomedical Informatics, Rouen University Hospital, Rouen, France
| | | | | | - Sandrine Katsahian
- Unité mixte de recherche 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Paris, France
| | | | | | | | | | - Julien Souvignet
- Laboratoire d'Informatique Médicale et d'Ingénieurie des Connaissances en e-Santé, U1142, Institut National de la Santé et de la Recherche Médicale, Paris, France
| | - Agnès Lillo-Le Louët
- Assistance Publique-Hôpitaux de Paris, Hôpital Européen Georges Pompidou, Centre Régional de Pharmacovigilance, Paris, France
|
21
|
Abdellaoui R, Schück S, Texier N, Burgun A. Filtering Entities to Optimize Identification of Adverse Drug Reaction From Social Media: How Can the Number of Words Between Entities in the Messages Help? JMIR Public Health Surveill 2017. [PMID: 28642212 PMCID: PMC5500778 DOI: 10.2196/publichealth.6577] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND With the increasing popularity of Web 2.0 applications, social media has made it possible for individuals to post messages on adverse drug reactions. In such online conversations, patients discuss their symptoms, medical history, and diseases. These disorders may correspond to adverse drug reactions (ADRs) or any other medical condition. Therefore, methods must be developed to distinguish between false positives and true ADR declarations. OBJECTIVE The aim of this study was to investigate a method for filtering out disorder terms that did not correspond to adverse events by using the distance (as number of words) between the drug term and the disorder or symptom term in the post. We hypothesized that the shorter the distance between the disorder name and the drug, the higher the probability that the disorder is an ADR. METHODS We analyzed a corpus of 648 messages corresponding to a total of 1654 (drug and disorder) pairs from 5 French forums using Gaussian mixture models and an expectation-maximization (EM) algorithm. RESULTS The distribution of the distances between the drug term and the disorder term enabled the filtering of 50.03% (733/1465) of the disorders that were not ADRs. Our filtering strategy achieved a precision of 95.8% and a recall of 50.0%. CONCLUSIONS This study suggests that such distance between terms can be used for identifying false positives, thereby improving ADR detection in social media.
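The Methods mention Gaussian mixture models fitted with an EM algorithm on word distances. The following is a minimal, dependency-free sketch of that general technique for a one-dimensional, two-component mixture. It is not the authors' implementation: the number of components, the initialization, the function names, and the synthetic distances are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(xs, iterations: int = 200):
    """Fit a 1-D, two-component Gaussian mixture with plain EM."""
    xs = sorted(xs)
    half = len(xs) // 2
    # Initialization (an assumption): lower and upper halves of the data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    overall = sum(xs) / len(xs)
    var0 = sum((x - overall) ** 2 for x in xs) / len(xs)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300  # guard against underflow to zero
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, 1e-6)  # keep the variance strictly positive
    return weights, means, variances


# Synthetic drug-to-disorder distances in words (not data from the study):
# a "close" group around 6 words and a "far" group around 40 words.
random.seed(0)
sample = [random.gauss(6, 1.5) for _ in range(300)]
sample += [random.gauss(40, 8) for _ in range(300)]

weights, means, variances = fit_two_gaussians(sample)
print("weights:", [round(w, 2) for w in weights])
print("means:", [round(m, 1) for m in means])
```

Once fitted, each pair can be assigned to the component with the higher posterior probability, so that pairs falling in the long-distance component are treated as unlikely ADRs; the actual decision rule used in the paper should be taken from its full text.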
Affiliation(s)
- Redhouane Abdellaoui
- INSERM, UMRS 1138 Team 22, Université Pierre et Marie Curie, Paris, France.,Kappa Santé, Innovation, Paris, France
| | | | | | - Anita Burgun
- INSERM, UMRS 1138 Team 22, Université Pierre et Marie Curie, Paris, France.,Assistance Publique-Hôpitaux de Paris (AP-HP), Hôpital Européen Georges-Pompidou (HEGP), Medical Informatics, Paris, France
|
22
|
Koutkias VG, Lillo-Le Louët A, Jaulent MC. Exploiting heterogeneous publicly available data sources for drug safety surveillance: computational framework and case studies. Expert Opin Drug Saf 2016; 16:113-124. [PMID: 27813420 DOI: 10.1080/14740338.2017.1257604] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 01/17/2023]
Abstract
OBJECTIVE Driven by the need of pharmacovigilance centres and companies to routinely collect and review all available data about adverse drug reactions (ADRs) and adverse events of interest, we introduce and validate a computational framework exploiting dominant as well as emerging publicly available data sources for drug safety surveillance. METHODS Our approach relies on appropriate query formulation for data acquisition and subsequent filtering, transformation and joint visualization of the obtained data. We acquired data from the FDA Adverse Event Reporting System (FAERS), PubMed and Twitter. In order to assess the validity and the robustness of the approach, we elaborated on two important case studies, namely, clozapine-induced cardiomyopathy/myocarditis versus haloperidol-induced cardiomyopathy/myocarditis, and apixaban-induced cerebral hemorrhage. RESULTS The analysis of the obtained data provided interesting insights (identification of potential patient and health-care professional experiences regarding ADRs in Twitter, information/arguments against an ADR existence across all sources), while illustrating the benefits (complementing data from multiple sources to strengthen/confirm evidence) and the underlying challenges (selecting search terms, data presentation) of exploiting heterogeneous information sources, thereby advocating the need for the proposed framework. CONCLUSIONS This work contributes to establishing a continuous learning system for drug safety surveillance by exploiting heterogeneous publicly available data sources via appropriate support tools.
Affiliation(s)
- Vassilis G Koutkias
- Institute of Applied Biosciences, Centre for Research & Technology Hellas, Thermi, Thessaloniki, Greece.,INSERM, U1142, LIMICS, F-75006, Paris, France.,Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS 1142, LIMICS, F-75006, Paris, France.,Université Paris 13, Sorbonne Paris Cité, LIMICS, (UMR_S 1142), F-93430, Villetaneuse, France
| | - Agnès Lillo-Le Louët
- Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, AP-HP, F-75015, Paris, France
| | - Marie-Christine Jaulent
- INSERM, U1142, LIMICS, F-75006, Paris, France.,Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS 1142, LIMICS, F-75006, Paris, France.,Université Paris 13, Sorbonne Paris Cité, LIMICS, (UMR_S 1142), F-93430, Villetaneuse, France
|
23
|
Pechsiri C, Piriyakul R. Extraction of a group-pair relation: problem-solving relation from web-board documents. Springerplus 2016; 5:1265. [PMID: 27540498 PMCID: PMC4975736 DOI: 10.1186/s40064-016-2864-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2015] [Accepted: 07/19/2016] [Indexed: 11/10/2022]
Abstract
This paper aims to extract a group-pair relation as a Problem-Solving relation, for example a DiseaseSymptom-Treatment relation or a CarProblem-Repair relation, between two event-explanation groups: a problem-concept group (symptom or car-problem concepts) and a solving-concept group (treatment or repair concepts), from hospital web-board and car-repair-guru web-board documents. The Problem-Solving relation (particularly the Symptom-Treatment relation), together with its graphical representation, benefits non-professionals by providing knowledge for preliminary problem solving. The research addresses three problems: how to identify an EDU (an Elementary Discourse Unit, which is a simple sentence) with the event concept of either a problem or a solution; how to determine a problem-concept EDU boundary and a solving-concept EDU boundary as two event-explanation groups; and how to determine the Problem-Solving relation between these two event-explanation groups. We therefore apply word co-occurrence to identify a problem-concept EDU and a solving-concept EDU, and machine-learning techniques to determine the problem-concept and solving-concept EDU boundaries. We propose using k-means and Naïve Bayes, with clustering features, to determine the Problem-Solving relation between the two event-explanation groups. In contrast to previous work, the proposed approach enables group-pair relation extraction with high accuracy.
Affiliation(s)
- Chaveevan Pechsiri
- Department of Information Technology, DhurakijPundit University, Bangkok, Thailand
- Rapepun Piriyakul
- Department of Computer Science, Ramkhamhaeng University, Bangkok, Thailand
|
24
|
Zheng Y, Lan C, Peng H, Li J. Using constrained information entropy to detect rare adverse drug reactions from medical forums. Annu Int Conf IEEE Eng Med Biol Soc 2016; 2016:2460-2463. [PMID: 28268822 DOI: 10.1109/embc.2016.7591228] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
Adverse drug reaction (ADR) detection is critical to avoiding malpractice, yet it is challenging because of uncertainty in pre-marketing review and underreporting in post-marketing surveillance. To address this problem, social media-based ADR detection methods have recently been proposed. However, existing methods are mostly co-occurrence based and face several issues; in particular, they leave out rare ADRs and cannot distinguish irrelevant ADRs. In this work, we introduce a constrained information entropy (CIE) method to solve these problems. CIE first recognizes drug-related adverse reactions using a predefined keyword dictionary and then captures high- and low-frequency (rare) ADRs by information entropy. Extensive experiments on a medical forum dataset demonstrate that CIE outperforms state-of-the-art co-occurrence-based methods, especially in rare ADR detection.
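As a rough illustration of the two steps named in this abstract (dictionary-based recognition, then an entropy score), the sketch below matches posts against a small keyword dictionary and computes the Shannon entropy of the ADR mentions found. The dictionary, the sample posts, the drug name `drugX`, and the function names are all hypothetical, and plain Shannon entropy is used as a stand-in: the abstract does not spell out the constrained formulation, so this should not be read as the authors' CIE method.

```python
import math
from collections import Counter

# Hypothetical keyword dictionary; the paper's actual dictionary is not given in the abstract.
ADR_KEYWORDS = {"nausea", "headache", "rash", "insomnia"}

def find_adrs(post: str) -> list[str]:
    """Return the dictionary ADR terms mentioned in a post (simple token match)."""
    tokens = {tok.strip(".,;:!?").lower() for tok in post.split()}
    return sorted(tokens & ADR_KEYWORDS)

def adr_entropy(posts: list[str]) -> tuple[Counter, float]:
    """Count ADR mentions across posts and return the Shannon entropy (in bits)
    of the resulting mention distribution."""
    counts = Counter(adr for post in posts for adr in find_adrs(post))
    total = sum(counts.values())
    if total == 0:
        return counts, 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return counts, entropy

# Hypothetical forum posts about one drug.
posts = [
    "Started drugX last week, bad nausea ever since.",
    "drugX gave me nausea and a headache.",
    "Anyone else get a rash on drugX?",
    "Nausea again today.",
]
counts, entropy_bits = adr_entropy(posts)
```

In this toy data, "nausea" dominates while "headache" and "rash" each appear once; a pure co-occurrence ranking would tend to surface only the frequent term, which is the limitation the abstract attributes to earlier methods.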
|
25
|
Kuhn M, Letunic I, Jensen LJ, Bork P. The SIDER database of drugs and side effects. Nucleic Acids Res 2015; 44:D1075-9. [PMID: 26481350 PMCID: PMC4702794 DOI: 10.1093/nar/gkv1075] [Citation(s) in RCA: 631] [Impact Index Per Article: 70.1] [Received: 08/14/2015] [Accepted: 10/06/2015] [Indexed: 01/10/2023] Open
Abstract
Unwanted side effects of drugs are a burden on patients and a severe impediment in the development of new drugs. At the same time, adverse drug reactions (ADRs) recorded during clinical trials are an important source of human phenotypic data. It is therefore essential to combine data on drugs, targets and side effects into a more complete picture of the therapeutic mechanism of actions of drugs and the ways in which they cause adverse reactions. To this end, we have created the SIDER (‘Side Effect Resource’, http://sideeffects.embl.de) database of drugs and ADRs. The current release, SIDER 4, contains data on 1430 drugs, 5880 ADRs and 140 064 drug–ADR pairs, which is an increase of 40% compared to the previous version. For more fine-grained analyses, we extracted the frequency with which side effects occur from the package inserts. This information is available for 39% of drug–ADR pairs, 19% of which can be compared to the frequency under placebo treatment. SIDER furthermore contains a data set of drug indications, extracted from the package inserts using Natural Language Processing. These drug indications are used to reduce the rate of false positives by identifying medical terms that do not correspond to ADRs.
Affiliation(s)
- Michael Kuhn
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
- Ivica Letunic
- Biobyte solutions GmbH, Bothestr. 142, 69117 Heidelberg, Germany
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Molecular Medicine Partnership Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany; Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
|
26
|
Golder S, Norman G, Loke YK. Systematic review on the prevalence, frequency and comparative value of adverse events data in social media. Br J Clin Pharmacol 2015; 80:878-88. [PMID: 26271492 PMCID: PMC4594731 DOI: 10.1111/bcp.12746] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 06/12/2015] [Revised: 07/16/2015] [Accepted: 08/03/2015] [Indexed: 11/27/2022] Open
Abstract
AIM The aim of this review was to summarize the prevalence, frequency and comparative value of information on the adverse events of healthcare interventions from user comments and videos in social media. METHODS A systematic review of assessments of the prevalence or type of information on adverse events in social media was undertaken. Sixteen databases and two internet search engines were searched in addition to handsearching, reference checking and contacting experts. The results were sifted independently by two researchers. Data extraction and quality assessment were carried out by one researcher and checked by a second. The quality assessment tool was devised in-house and a narrative synthesis of the results followed. RESULTS From 3064 records, 51 studies met the inclusion criteria. The studies assessed over 174 social media sites with discussion forums (71%) being the most popular. The overall prevalence of adverse events reports in social media varied from 0.2% to 8% of posts. Twenty-nine studies compared the results from searching social media with using other data sources to identify adverse events. There was general agreement that a higher frequency of adverse events was found in social media and that this was particularly true for 'symptom' related and 'mild' adverse events. Those adverse events that were under-represented in social media were laboratory-based and serious adverse events. CONCLUSIONS Reports of adverse events are identifiable within social media. However, there is considerable heterogeneity in the frequency and type of events reported, and the reliability or validity of the data has not been thoroughly evaluated.
Affiliation(s)
- Su Golder
- Department of Health Sciences, University of York, York, YO10 5DD, UK
- Gill Norman
- School of Nursing, Midwifery & Social Work, University of Manchester, Room 5.328, Jean McFarlane Building, Oxford Road, Manchester, M13 9PL, UK
- Yoon K Loke
- Norwich Medical School, University of East Anglia, Norwich, NR4 7TJ, UK
|
27
|
Sloane R, Osanlou O, Lewis D, Bollegala D, Maskell S, Pirmohamed M. Social media and pharmacovigilance: A review of the opportunities and challenges. Br J Clin Pharmacol 2015; 80:910-20. [PMID: 26147850 PMCID: PMC4594734 DOI: 10.1111/bcp.12717] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 06/15/2015] [Revised: 06/29/2015] [Accepted: 07/03/2015] [Indexed: 01/23/2023] Open
Abstract
Adverse drug reactions come at a considerable cost to society. Social media are a potentially invaluable reservoir of information for pharmacovigilance, yet their true value remains to be fully understood. In order to realize the benefits social media hold, a number of technical, regulatory and ethical challenges remain to be addressed. We outline these key challenges, identify relevant current research, and present possible solutions.
Affiliation(s)
- Richard Sloane
- Department of Electrical Engineering and Electronics, University of Liverpool, L69 3GJ, UK
- Department of Molecular and Clinical Pharmacology, University of Liverpool, L69 3GL, UK
- Department of Computer Science, University of Liverpool, L69 3BX, UK
- Orod Osanlou
- Department of Molecular and Clinical Pharmacology, University of Liverpool, L69 3GL, UK
- Royal Liverpool and Broadgreen University Hospital NHS Trust, Liverpool, L7 8XP, UK
- David Lewis
- Drug Safety & Epidemiology, Novartis Pharma AG, Postfach, CH-4002, Basel, Switzerland
- Simon Maskell
- Department of Electrical Engineering and Electronics, University of Liverpool, L69 3GJ, UK
- Department of Computer Science, University of Liverpool, L69 3BX, UK
- Munir Pirmohamed
- Department of Molecular and Clinical Pharmacology, University of Liverpool, L69 3GL, UK
- Royal Liverpool and Broadgreen University Hospital NHS Trust, Liverpool, L7 8XP, UK
|
28
|
Leveraging MEDLINE indexing for pharmacovigilance - Inherent limitations and mitigation strategies. J Biomed Inform 2015; 57:425-35. [PMID: 26342964 DOI: 10.1016/j.jbi.2015.08.022] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 11/03/2014] [Revised: 07/30/2015] [Accepted: 08/25/2015] [Indexed: 11/24/2022]
Abstract
BACKGROUND Traditional approaches to pharmacovigilance center on the signal detection from spontaneous reports, e.g., the U.S. Food and Drug Administration (FDA) adverse event reporting system (FAERS). In order to enrich the scientific evidence and enhance the detection of emerging adverse drug events that can lead to unintended harmful outcomes, pharmacovigilance activities need to evolve to encompass novel complementary data streams, for example the biomedical literature available through MEDLINE. OBJECTIVES (1) To review how the characteristics of MEDLINE indexing influence the identification of adverse drug events (ADEs); (2) to leverage this knowledge to inform the design of a system for extracting ADEs from MEDLINE indexing; and (3) to assess the specific contribution of some characteristics of MEDLINE indexing to the performance of this system. METHODS We analyze the characteristics of MEDLINE indexing. We integrate three specific characteristics into the design of a system for extracting ADEs from MEDLINE indexing. We experimentally assess the specific contribution of these characteristics over a baseline system based on co-occurrence between drug descriptors qualified by adverse effects and disease descriptors qualified by chemically induced. RESULTS Our system extracted 405,300 ADEs from 366,120 MEDLINE articles. The baseline system accounts for 297,093 ADEs (73%). 85,318 ADEs (21%) can be extracted only after integrating specific pre-coordinated MeSH descriptors and additional qualifiers. 22,889 ADEs (6%) can be extracted only after considering indirect links between the drug of interest and the descriptor that bears the ADE context. CONCLUSIONS In this paper, we demonstrate significant improvement over a baseline approach to identifying ADEs from MEDLINE indexing, which mitigates some of the inherent limitations of MEDLINE indexing for pharmacovigilance. ADEs extracted from MEDLINE indexing are complementary to, not a replacement for, other sources.
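The baseline described in this abstract (co-occurrence between drug descriptors qualified by "adverse effects" and disease descriptors qualified by "chemically induced") can be sketched roughly as follows. The `Heading` type, the function name, and the sample article indexing are hypothetical and are not taken from the paper; the sketch covers only the baseline, not the pre-coordinated descriptors or indirect links the authors add. The last lines simply recompute the reported shares from the counts stated in the abstract.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Heading:
    """One MeSH index term on an article: a descriptor plus its qualifiers."""
    descriptor: str
    qualifiers: frozenset[str]

def baseline_ade_pairs(headings: list[Heading]) -> set[tuple[str, str]]:
    """Pair every descriptor qualified 'adverse effects' with every
    descriptor qualified 'chemically induced' on the same article."""
    drugs = [h.descriptor for h in headings if "adverse effects" in h.qualifiers]
    events = [h.descriptor for h in headings if "chemically induced" in h.qualifiers]
    return set(product(drugs, events))

# Hypothetical indexing for a single article (illustrative only).
article = [
    Heading("Clozapine", frozenset({"adverse effects", "therapeutic use"})),
    Heading("Myocarditis", frozenset({"chemically induced"})),
    Heading("Schizophrenia", frozenset({"drug therapy"})),
]
pairs = baseline_ade_pairs(article)

# Shares of the 405,300 extracted ADEs, recomputed from the counts in the abstract.
counts = {
    "baseline": 297_093,
    "pre-coordinated descriptors and qualifiers": 85_318,
    "indirect links": 22_889,
}
total = sum(counts.values())
shares = {name: round(100 * n / total) for name, n in counts.items()}
```

Here `pairs` contains the single pair `("Clozapine", "Myocarditis")`, and the three counts sum to the reported total, giving the rounded shares of 73%, 21%, and 6% quoted in the abstract.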
|
29
|
Lardon J, Abdellaoui R, Bellet F, Asfari H, Souvignet J, Texier N, Jaulent MC, Beyens MN, Burgun A, Bousquet C. Adverse Drug Reaction Identification and Extraction in Social Media: A Scoping Review. J Med Internet Res 2015; 17:e171. [PMID: 26163365 PMCID: PMC4526988 DOI: 10.2196/jmir.4304] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 01/30/2015] [Revised: 04/09/2015] [Accepted: 04/22/2015] [Indexed: 02/06/2023] Open
Abstract
Background The underreporting of adverse drug reactions (ADRs) through traditional reporting channels is a limitation in the efficiency of the current pharmacovigilance system. Patients’ experiences with drugs that they report on social media represent a new source of data that may have some value in postmarketing safety surveillance. Objective A scoping review was undertaken to explore the breadth of evidence about the use of social media as a new source of knowledge for pharmacovigilance. Methods Daudt et al’s recommendations for scoping reviews were followed. The research questions were as follows: How can social media be used as a data source for postmarketing drug surveillance? What are the available methods for extracting data? What are the different ways to use these data? We queried PubMed, Embase, and Google Scholar to extract relevant articles that were published before June 2014 and with no lower date limit. Two pairs of reviewers independently screened the selected studies and proposed two themes of review: manual ADR identification (theme 1) and automated ADR extraction from social media (theme 2). Descriptive characteristics were collected from the publications to create a database for themes 1 and 2. Results Of the 1032 citations from PubMed and Embase, 11 were relevant to the research question. An additional 13 citations were added after further research on the Internet and in reference lists. Themes 1 and 2 explored 11 and 13 articles, respectively. Ways of approaching the use of social media as a pharmacovigilance data source were identified. Conclusions This scoping review noted multiple methods for identifying target data, extracting them, and evaluating the quality of medical information from social media. It also showed some remaining gaps in the field. Studies related to the identification theme usually failed to accurately assess the completeness, quality, and reliability of the data that were analyzed from social media. Regarding extraction, no study proposed a generic approach to easily adding a new site or data source. Additional studies are required to precisely determine the role of social media in the pharmacovigilance system.
Affiliation(s)
- Jérémy Lardon
- Université Paris 13, Sorbonne Paris Cité, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé (LIMICS), (Unité Mixte de Recherche en Santé, UMR_S 1142), F-93430, Villetaneuse, France; Sorbonne Universités, University of Pierre and Marie Curie (UPMC) Université Paris 06, Unité Mixte de Recherche en Santé (UMR_S) 1142, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé (LIMICS), F-75006; Institut National de la Santé et de la Recherche Médicale (INSERM), U1142, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé (LIMICS), F-75006, Paris, France.
|
30
|
Bridging islands of information to establish an integrated knowledge base of drugs and health outcomes of interest. Drug Saf 2015; 37:557-67. [PMID: 24985530 PMCID: PMC4134480 DOI: 10.1007/s40264-014-0189-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Indexed: 11/29/2022]
Abstract
The entire drug safety enterprise has a need to search, retrieve, evaluate, and synthesize scientific evidence more efficiently. This discovery and synthesis process would be greatly accelerated through access to a common framework that brings all relevant information sources together within a standardized structure. This presents an opportunity to establish an open-source community effort to develop a global knowledge base, one that brings together and standardizes all available information for all drugs and all health outcomes of interest (HOIs) from all electronic sources pertinent to drug safety. To make this vision a reality, we have established a workgroup within the Observational Health Data Sciences and Informatics (OHDSI, http://ohdsi.org) collaborative. The workgroup’s mission is to develop an open-source standardized knowledge base for the effects of medical products and an efficient procedure for maintaining and expanding it. The knowledge base will make it simpler for practitioners to access, retrieve, and synthesize evidence so that they can reach a rigorous and accurate assessment of causal relationships between a given drug and HOI. Development of the knowledge base will proceed with the measurable goal of supporting an efficient and thorough evidence-based assessment of the effects of 1,000 active ingredients across 100 HOIs. This non-trivial task will result in a high-quality and generally applicable drug safety knowledge base. It will also yield a reference standard of drug–HOI pairs that will enable more advanced methodological research that empirically evaluates the performance of drug safety analysis methods.
|
31
|
Sarker A, Ginn R, Nikfarjam A, O'Connor K, Smith K, Jayaraman S, Upadhaya T, Gonzalez G. Utilizing social media data for pharmacovigilance: A review. J Biomed Inform 2015; 54:202-12. [PMID: 25720841 DOI: 10.1016/j.jbi.2015.02.004] [Citation(s) in RCA: 238] [Impact Index Per Article: 26.4] [Received: 10/23/2014] [Revised: 01/02/2015] [Accepted: 02/15/2015] [Indexed: 10/23/2022]
Abstract
OBJECTIVE Automatic monitoring of Adverse Drug Reactions (ADRs), defined as adverse patient outcomes caused by medications, is a challenging research problem that is currently receiving significant attention from the medical informatics community. In recent years, user-posted data on social media, primarily due to its sheer volume, has become a useful resource for ADR monitoring. Research using social media data has progressed using various data sources and techniques, making it difficult to compare distinct systems and their performances. In this paper, we perform a methodical review to characterize the different approaches to ADR detection/extraction from social media, and their applicability to pharmacovigilance. In addition, we present a potential systematic pathway to ADR monitoring from social media. METHODS We identified studies describing approaches for ADR detection from social media from the Medline, Embase, Scopus and Web of Science databases, and the Google Scholar search engine. Studies that met our inclusion criteria were those that attempted to extract ADR information posted by users on any publicly available social media platform. We categorized the studies according to different characteristics such as primary ADR detection approach, size of corpus, data source(s), availability, and evaluation criteria. RESULTS Twenty-two studies met our inclusion criteria, with fifteen (68%) published within the last two years. However, publicly available annotated data is still scarce, and we found only six studies that made the annotations used publicly available, making system performance comparisons difficult. In terms of algorithms, supervised classification techniques to detect posts containing ADR mentions, and lexicon-based approaches for extraction of ADR mentions from texts have been the most popular. CONCLUSION Our review suggests that interest in the utilization of the vast amounts of available social media data for ADR monitoring is increasing. 
In terms of sources, both health-related and general social media data have been used for ADR detection; while health-related sources tend to contain higher proportions of relevant data, the volume of data from general social media websites is significantly higher. There is still a very limited amount of annotated data publicly available, and, as indicated by the promising results obtained by recent supervised learning approaches, there is a strong need to make such data available to the research community.
Affiliation(s)
- Abeed Sarker
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, United States.
- Rachel Ginn
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, United States
- Azadeh Nikfarjam
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, United States
- Karen O'Connor
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, United States
- Karen Smith
- Rueckert-Hartman College for Health Professions, Regis University, Denver, CO, United States
- Swetha Jayaraman
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, United States
- Tejaswi Upadhaya
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, United States
- Graciela Gonzalez
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, United States
|
32
|
Sarker A, Gonzalez G. Portable automatic text classification for adverse drug reaction detection via multi-corpus training. J Biomed Inform 2014; 53:196-207. [PMID: 25451103 DOI: 10.1016/j.jbi.2014.11.002] [Citation(s) in RCA: 145] [Impact Index Per Article: 14.5] [Received: 07/07/2014] [Revised: 10/24/2014] [Accepted: 11/02/2014] [Indexed: 10/24/2022]
Abstract
OBJECTIVE Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media, where enormous amounts of user-posted data are available and have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user-posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. METHODS One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. RESULTS Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. 
CONCLUSIONS Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future.
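For readers unfamiliar with the metric reported above, the following minimal sketch shows how a per-class F-score is typically computed, assuming the abstract's "ADR class F-score" is the usual F1 (harmonic mean of precision and recall for the ADR class). The labels and function name are hypothetical, and no classifier from the paper is reproduced; the final lines only recompute the stated improvements from the F-scores quoted in the abstract.

```python
def precision_recall_f1(gold: list[int], predicted: list[int],
                        positive: int = 1) -> tuple[float, float, float]:
    """Precision, recall, and F1 for one class (here: posts labelled as ADR)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical gold and predicted labels (1 = ADR mention, 0 = no ADR).
gold      = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(gold, predicted)

# Improvements quoted in the abstract, in F-score points (x100).
gain_first = round((0.597 - 0.538) * 100, 1)
gain_second = round((0.704 - 0.678) * 100, 1)
```

With these toy labels there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3; the recomputed gains match the 5.9 and 2.6 units stated in the abstract.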
Affiliation(s)
- Abeed Sarker
- Department of Biomedical Informatics, Arizona State University, 13212 East Shea Blvd., Scottsdale, AZ 85259, USA.
- Graciela Gonzalez
- Department of Biomedical Informatics, Arizona State University, 13212 East Shea Blvd., Scottsdale, AZ 85259, USA.
|