Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Yeleswarapu S, Rao A, Joseph T, Saipradeep VG, Srinivasan R. A pipeline to extract drug-adverse event pairs from multiple data sources. BMC Med Inform Decis Mak 2014;14:13. [PMID: 24559132 PMCID: PMC3936866 DOI: 10.1186/1472-6947-14-13] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2013] [Accepted: 02/14/2014] [Indexed: 11/10/2022] Open

For:	Yeleswarapu S, Rao A, Joseph T, Saipradeep VG, Srinivasan R. A pipeline to extract drug-adverse event pairs from multiple data sources. BMC Med Inform Decis Mak 2014;14:13. [PMID: 24559132 PMCID: PMC3936866 DOI: 10.1186/1472-6947-14-13] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2013] [Accepted: 02/14/2014] [Indexed: 11/10/2022] Open

Number

Cited by Other Article(s)

Karapetiantz P, Audeh B, Redjdal A, Tiffet T, Bousquet C, Jaulent MC. Monitoring Adverse Drug Events in Web Forums: Evaluation of a Pipeline and Use Case Study. J Med Internet Res 2024;26:e46176. [PMID: 38888956 PMCID: PMC11220433 DOI: 10.2196/46176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2023] [Revised: 10/20/2023] [Accepted: 03/12/2024] [Indexed: 06/20/2024] Open

Abstract

BACKGROUND

To mitigate safety concerns, regulatory agencies must make informed decisions regarding drug usage and adverse drug events (ADEs). The primary pharmacovigilance data stem from spontaneous reports by health care professionals. However, underreporting poses a notable challenge within the current system. Explorations into alternative sources, including electronic patient records and social media, have been undertaken. Nevertheless, social media's potential remains largely untapped in real-world scenarios.

OBJECTIVE

The challenge faced by regulatory agencies in using social media is primarily attributed to the absence of suitable tools to support decision makers. An effective tool should enable access to information via a graphical user interface, presenting data in a user-friendly manner rather than in their raw form. This interface should offer various visualization options, empowering users to choose representations that best convey the data and facilitate informed decision-making. Thus, this study aims to assess the potential of integrating social media into pharmacovigilance and enhancing decision-making with this novel data source. To achieve this, our objective was to develop and assess a pipeline that processes data from the extraction of web forum posts to the generation of indicators and alerts within a visual and interactive environment. The goal was to create a user-friendly tool that enables regulatory authorities to make better-informed decisions effectively.

METHODS

To enhance pharmacovigilance efforts, we have devised a pipeline comprising 4 distinct modules, each independently editable, aimed at efficiently analyzing health-related French web forums. These modules were (1) web forums' posts extraction, (2) web forums' posts annotation, (3) statistics and signal detection algorithm, and (4) a graphical user interface (GUI). We showcase the efficacy of the GUI through an illustrative case study involving the introduction of the new formula of Levothyrox in France. This event led to a surge in reports to the French regulatory authority.

RESULTS

Between January 1, 2017, and February 28, 2021, a total of 2,081,296 posts were extracted from 23 French web forums. These posts contained 437,192 normalized drug-ADE couples, annotated with the Anatomical Therapeutic Chemical (ATC) Classification and Medical Dictionary for Regulatory Activities (MedDRA). The analysis of the Levothyrox new formula revealed a notable pattern. In August 2017, there was a sharp increase in posts related to this medication on social media platforms, which coincided with a substantial uptick in reports submitted by patients to the national regulatory authority during the same period.

CONCLUSIONS

We demonstrated that conducting quantitative analysis using the GUI is straightforward and requires no coding. The results aligned with prior research and also offered potential insights into drug-related matters. Our hypothesis received partial confirmation because the final users were not involved in the evaluation process. Further studies, concentrating on ergonomics and the impact on professionals within regulatory agencies, are imperative for future research endeavors. We emphasized the versatility of our approach and the seamless interoperability between different modules over the performance of individual modules. Specifically, the annotation module was integrated early in the development process and could undergo substantial enhancement by leveraging contemporary techniques rooted in the Transformers architecture. Our pipeline holds potential applications in health surveillance by regulatory agencies or pharmaceutical companies, aiding in the identification of safety concerns. Moreover, it could be used by research teams for retrospective analysis of events.

Collapse

Lee S, Woo H, Lee CC, Kim G, Kim JY, Lee S. Drug_SNSMiner: standard pharmacovigilance pipeline for detection of adverse drug reaction using SNS data. Sci Rep 2023;13:3779. [PMID: 36882478 PMCID: PMC9992476 DOI: 10.1038/s41598-023-28912-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2022] [Accepted: 01/27/2023] [Indexed: 03/09/2023] Open

Martenot V, Masdeu V, Cupe J, Gehin F, Blanchon M, Dauriat J, Horst A, Renaudin M, Girard P, Zucker JD. LiSA: an assisted literature search pipeline for detecting serious adverse drug events with deep learning. BMC Med Inform Decis Mak 2022;22:338. [PMID: 36550485 PMCID: PMC9773506 DOI: 10.1186/s12911-022-02085-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022] Open

Abstract

INTRODUCTION

Detecting safety signals attributed to a drug in scientific literature is a fundamental issue in pharmacovigilance. The constant increase in the volume of publications requires the automation of this tedious task, in order to find and extract relevant articles from the pack. This task is critical, as serious Adverse Drug Reactions (ADRs) still account for a large number of hospital admissions each year.

OBJECTIVES

The aim of this study is to develop an augmented intelligence methodology for automatically identifying relevant publications mentioning an established link between a Drug and a Serious Adverse Event, according to the European Medicines Agency (EMA) definition of seriousness.

METHODS

The proposed pipeline, called LiSA (for Literature Search Application), is based on three independent deep learning models supporting a precise detection of safety signals in the biomedical literature. By combining a Bidirectional Encoder Representations from Transformers (BERT) algorithms and a modular architecture, the pipeline achieves a precision of 0.81 and a recall of 0.89 at sentences level in articles extracted from PubMed (either abstract or full-text). We also measured that by using LiSA, a medical reviewer increases by a factor of 2.5 the number of relevant documents it can collect and evaluate compared to a simple keyword search. In the interest of re-usability, emphasis was placed on building a modular pipeline allowing the insertion of other NLP modules to enrich the results provided by the system, and extend it to other use cases. In addition, a lightweight visualization tool was developed to analyze and monitor safety signal results.

CONCLUSIONS

Overall, the generic pipeline and the visualization tool proposed in this article allows for efficient and accurate monitoring of serious adverse drug reactions from the literature and can easily be adapted to similar pharmacovigilance use cases. To facilitate reproducibility and benefit other research studies, we also shared a first benchmark dataset for Serious Adverse Drug Events detection.

Collapse

Gonzalez-Hernandez G, Krallinger M, Muñoz M, Rodriguez-Esteban R, Uzuner Ö, Hirschman L. Challenges and opportunities for mining adverse drug reactions: perspectives from pharma, regulatory agencies, healthcare providers and consumers. Database (Oxford) 2022;2022:baac071. [PMID: 36050787 PMCID: PMC9436770 DOI: 10.1093/database/baac071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2022] [Revised: 07/08/2022] [Accepted: 08/25/2022] [Indexed: 11/17/2022]

Binkheder S, Wu HY, Quinney SK, Zhang S, Zitu MM, Chiang CW, Wang L, Jones J, Li L. PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature. J Biomed Semantics 2022;13:17. [PMID: 35690873 PMCID: PMC9188713 DOI: 10.1186/s13326-022-00272-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2019] [Accepted: 05/18/2022] [Indexed: 12/28/2022] Open

Abstract

BACKGROUND

Adverse events induced by drug-drug interactions are a major concern in the United States. Current research is moving toward using electronic health record (EHR) data, including for adverse drug events discovery. One of the first steps in EHR-based studies is to define a phenotype for establishing a cohort of patients. However, phenotype definitions are not readily available for all phenotypes. One of the first steps of developing automated text mining tools is building a corpus. Therefore, this study aimed to develop annotation guidelines and a gold standard corpus to facilitate building future automated approaches for mining phenotype definitions contained in the literature. Furthermore, our aim is to improve the understanding of how these published phenotype definitions are presented in the literature and how we annotate them for future text mining tasks.

RESULTS

Two annotators manually annotated the corpus on a sentence-level for the presence of evidence for phenotype definitions. Three major categories (inclusion, intermediate, and exclusion) with a total of ten dimensions were proposed characterizing major contextual patterns and cues for presenting phenotype definitions in published literature. The developed annotation guidelines were used to annotate the corpus that contained 3971 sentences: 1923 out of 3971 (48.4%) for the inclusion category, 1851 out of 3971 (46.6%) for the intermediate category, and 2273 out of 3971 (57.2%) for exclusion category. The highest number of annotated sentences was 1449 out of 3971 (36.5%) for the "Biomedical & Procedure" dimension. The lowest number of annotated sentences was 49 out of 3971 (1.2%) for "The use of NLP". The overall percent inter-annotator agreement was 97.8%. Percent and Kappa statistics also showed high inter-annotator agreement across all dimensions.

CONCLUSIONS

The corpus and annotation guidelines can serve as a foundational informatics approach for annotating and mining phenotype definitions in literature, and can be used later for text mining applications.

Collapse

Gao Y, Duan W, Rui H. Does Social Media Accelerate Product Recalls? Evidence from the Pharmaceutical Industry. INFORMATION SYSTEMS RESEARCH 2021. [DOI: 10.1287/isre.2021.1092] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

Tandan M, Acharya Y, Pokharel S, Timilsina M. Discovering symptom patterns of COVID-19 patients using association rule mining. Comput Biol Med 2021;131:104249. [PMID: 33561673 PMCID: PMC7966840 DOI: 10.1016/j.compbiomed.2021.104249] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Revised: 01/25/2021] [Accepted: 01/25/2021] [Indexed: 12/16/2022]

Abstract

BACKGROUND

The COVID-19 pandemic is a significant public health crisis that is hitting hard on people's health, well-being, and freedom of movement, and affecting the global economy. Scientists worldwide are competing to develop therapeutics and vaccines; currently, three drugs and two vaccine candidates have been given emergency authorization use. However, there are still questions of efficacy with regard to specific subgroups of patients and the vaccine's scalability to the general public. Under such circumstances, understanding COVID-19 symptoms is vital in initial triage; it is crucial to distinguish the severity of cases for effective management and treatment. This study aimed to discover symptom patterns and overall symptom rules, including rules disaggregated by age, sex, chronic condition, and mortality status, among COVID-19 patients.

METHODS

This study was a retrospective analysis of COVID-19 patient data made available online by the Wolfram Data Repository through May 27, 2020. We applied a widely used rule-based machine learning technique called association rule mining to identify frequent symptoms and define patterns in the rules discovered.

RESULT

In total, 1,560 patients with COVID-19 were included in the study, with a median age of 52 years. The most frequently occurring symptom was fever (67%), followed by cough (37%), malaise/body soreness (11%), pneumonia (11%), and sore throat (8%). Myocardial infarction, heart failure, and renal disease were present in less than 1% of patients. The top ten significant symptom rules (out of 71 generated) showed cough, septic shock, and respiratory distress syndrome as frequent consequents. If a patient had a breathing problem and sputum production, then, there was higher confidence of that patient having a cough; if cardiac disease, renal disease, or pneumonia was present, then there was a higher confidence of septic shock or respiratory distress syndrome. Symptom rules differed between younger and older patients and between male and female patients. Patients who had chronic conditions or died of COVID-19 had more severe symptom rules than those patients who did not have chronic conditions or survived of COVID-19. Concerning chronic condition rules among 147 patients, if a patient had diabetes, prerenal azotemia, and coronary bypass surgery, there was a certainty of hypertension.

CONCLUSION

The most frequently reported symptoms in patients with COVID-19 were fever, cough, pneumonia, and sore throat; while 1% had severe symptoms, such as septic shock, respiratory distress syndrome, and respiratory failure. Symptom rules differed by age and sex. Patients with chronic disease and patients who died of COVID-19 had severe symptom rules more specifically, cardiovascular-related symptoms accompanied by pneumonia, fever, and cough as consequents.

Collapse

Cheerkoot-Jalim S, Khedo KK. A systematic review of text mining approaches applied to various application areas in the biomedical domain. JOURNAL OF KNOWLEDGE MANAGEMENT 2020. [DOI: 10.1108/jkm-09-2019-0524] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]

Abstract Purpose This work shows the results of a systematic literature review on biomedical text mining. The purpose of this study is to identify the different text mining approaches used in different application areas of the biomedical domain, the common tools used and the challenges of biomedical text mining as compared to generic text mining algorithms. This study will be of value to biomedical researchers by allowing them to correlate text mining approaches to specific biomedical application areas. Implications for future research are also discussed. Design/methodology/approach The review was conducted following the principles of the Kitchenham method. A number of research questions were first formulated, followed by the definition of the search strategy. The papers were then selected based on a list of assessment criteria. Each of the papers were analyzed and information relevant to the research questions were extracted. Findings It was found that researchers have mostly harnessed data sources such as electronic health records, biomedical literature, social media and health-related forums. The most common text mining technique was natural language processing using tools such as MetaMap and Unstructured Information Management Architecture, alongside the use of medical terminologies such as Unified Medical Language System. The main application area was the detection of adverse drug events. Challenges identified included the need to deal with huge amounts of text, the heterogeneity of the different data sources, the duality of meaning of words in biomedical text and the amount of noise introduced mainly from social media and health-related forums. Originality/value To the best of the authors’ knowledge, other reviews in this area have focused on either specific techniques, specific application areas or specific data sources. The results of this review will help researchers to correlate most relevant and recent advances in text mining approaches to specific biomedical application areas by providing an up-to-date and holistic view of work done in this research area. The use of emerging text mining techniques has great potential to spur the development of innovative applications, thus considerably impacting on the advancement of biomedical research. Collapse

Yu Y, Ruddy K, Mansfield A, Zong N, Wen A, Tsuji S, Huang M, Liu H, Shah N, Jiang G. Detecting and Filtering Immune-Related Adverse Events Signal Based on Text Mining and Observational Health Data Sciences and Informatics Common Data Model: Framework Development Study. JMIR Med Inform 2020;8:e17353. [PMID: 32530430 PMCID: PMC7320306 DOI: 10.2196/17353] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2019] [Revised: 04/13/2020] [Accepted: 04/15/2020] [Indexed: 01/30/2023] Open

Abstract

Background

Immune checkpoint inhibitors are associated with unique immune-related adverse events (irAEs). As most of the immune checkpoint inhibitors are new to the market, it is important to conduct studies using real-world data sources to investigate their safety profiles.

Objective

The aim of the study was to develop a framework for signal detection and filtration of novel irAEs for 6 Food and Drug Administration–approved immune checkpoint inhibitors.

Methods

In our framework, we first used the Food and Drug Administration’s Adverse Event Reporting System (FAERS) standardized in an Observational Health Data Sciences and Informatics (OHDSI) common data model (CDM) to collect immune checkpoint inhibitor-related event data and conducted irAE signal detection. OHDSI CDM is a standard-driven data model that focuses on transforming different databases into a common format and standardizing medical terms to a common representation. We then filtered those already known irAEs from drug labels and literature by using a customized text-mining pipeline based on clinical text analysis and knowledge extraction system with Medical Dictionary for Regulatory Activities (MedDRA) as a dictionary. Finally, we classified the irAE detection results into three different categories to discover potentially new irAE signals.

Results

By our text-mining pipeline, 490 irAE terms were identified from drug labels, and 918 terms were identified from the literature. In addition, of the 94 positive signals detected using CDM-based FAERS, 53 signals (56%) were labeled signals, 10 (11%) were unlabeled published signals, and 31 (33%) were potentially new signals.

Conclusions

We demonstrated that our approach is effective for irAE signal detection and filtration. Moreover, our CDM-based framework could facilitate adverse drug events detection and filtration toward the goal of next-generation pharmacovigilance that seamlessly integrates electronic health record data for improved signal detection.

Collapse

Ju M, Nguyen NTH, Miwa M, Ananiadou S. An ensemble of neural models for nested adverse drug events and medication extraction with subwords. J Am Med Inform Assoc 2020;27:22-30. [PMID: 31197355 PMCID: PMC6913208 DOI: 10.1093/jamia/ocz075] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2019] [Revised: 03/22/2019] [Accepted: 05/07/2019] [Indexed: 11/13/2022] Open

Arnoux-Guenegou A, Girardeau Y, Chen X, Deldossi M, Aboukhamis R, Faviez C, Dahamna B, Karapetiantz P, Guillemin-Lanne S, Lillo-Le Louët A, Texier N, Burgun A, Katsahian S. The Adverse Drug Reactions From Patient Reports in Social Media Project: Protocol for an Evaluation Against a Gold Standard. JMIR Res Protoc 2019;8:e11448. [PMID: 31066711 PMCID: PMC6528435 DOI: 10.2196/11448] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2018] [Revised: 11/16/2018] [Accepted: 12/21/2018] [Indexed: 12/30/2022] Open

Abstract

Background

Social media is a potential source of information on postmarketing drug safety surveillance that still remains unexploited nowadays. Information technology solutions aiming at extracting adverse reactions (ADRs) from posts on health forums require a rigorous evaluation methodology if their results are to be used to make decisions. First, a gold standard, consisting of manual annotations of the ADR by human experts from the corpus extracted from social media, must be implemented and its quality must be assessed. Second, as for clinical research protocols, the sample size must rely on statistical arguments. Finally, the extraction methods must target the relation between the drug and the disease (which might be either treated or caused by the drug) rather than simple co-occurrences in the posts.

Objective

We propose a standardized protocol for the evaluation of a software extracting ADRs from the messages on health forums. The study is conducted as part of the Adverse Drug Reactions from Patient Reports in Social Media project.

Methods

Messages from French health forums were extracted. Entity recognition was based on Racine Pharma lexicon for drugs and Medical Dictionary for Regulatory Activities terminology for potential adverse events (AEs). Natural language processing–based techniques automated the ADR information extraction (relation between the drug and AE entities). The corpus of evaluation was a random sample of the messages containing drugs and/or AE concepts corresponding to recent pharmacovigilance alerts. A total of 2 persons experienced in medical terminology manually annotated the corpus, thus creating the gold standard, according to an annotator guideline. We will evaluate our tool against the gold standard with recall, precision, and f-measure. Interannotator agreement, reflecting gold standard quality, will be evaluated with hierarchical kappa. Granularities in the terminologies will be further explored.

Results

Necessary and sufficient sample size was calculated to ensure statistical confidence in the assessed results. As we expected a global recall of 0.5, we needed at least 384 identified ADR concepts to obtain a 95% CI with a total width of 0.10 around 0.5. The automated ADR information extraction in the corpus for evaluation is already finished. The 2 annotators already completed the annotation process. The analysis of the performance of the ADR information extraction module as compared with gold standard is ongoing.

Conclusions

This protocol is based on the standardized statistical methods from clinical research to create the corpus, thus ensuring the necessary statistical power of the assessed results. Such evaluation methodology is required to make the ADR information extraction software useful for postmarketing drug safety surveillance.

International Registered Report Identifier (IRRID)

RR1-10.2196/11448

Collapse

Affiliation(s)

Armelle Arnoux-Guenegou INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France
Yannick Girardeau INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.,Département d'Informatique Médicale, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
Xiaoyi Chen INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France
Myrtille Deldossi Innovative Projects - Text Mining, Expert System, Paris, France
Rim Aboukhamis Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
Carole Faviez Kappa Santé, Paris, France
Badisse Dahamna Service d'Informatique Biomédicale, D2IM, Centre Hospitalier Universitaire de Rouen, Rouen, France
Pierre Karapetiantz INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France
Sylvie Guillemin-Lanne Innovative Projects - Text Mining, Expert System, Paris, France
Agnès Lillo-Le Louët Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
Nathalie Texier Kappa Santé, Paris, France
Anita Burgun INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.,Département d'Informatique Médicale, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France.,INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Paris Descartes University, Sorbonne Paris Cité, Paris, France
Sandrine Katsahian INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.,INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Paris Descartes University, Sorbonne Paris Cité, Paris, France.,Clinical Research Unit Hôpitaux Universitaires Paris Ouest, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France.,INSERM CIC1418, Clinical Epidemiology, Hôpital Européen Georges-Pompidou, Paris, France

Collapse

Zhang M, Zhang M, Ge C, Liu Q, Wang J, Wei J, Zhu KQ. Automatic discovery of adverse reactions through Chinese social media. Data Min Knowl Discov 2019. [DOI: 10.1007/s10618-018-00610-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]

Harnessing social media data for pharmacovigilance: a review of current state of the art, challenges and future directions. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2019. [DOI: 10.1007/s41060-019-00175-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]

Negi K, Pavuri A, Patel L, Jain C. A novel method for drug-adverse event extraction using machine learning. INFORMATICS IN MEDICINE UNLOCKED 2019. [DOI: 10.1016/j.imu.2019.100190] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022] Open

Convertino I, Ferraro S, Blandizzi C, Tuccori M. The usefulness of listening social media for pharmacovigilance purposes: a systematic review. Expert Opin Drug Saf 2018;17:1081-1093. [DOI: 10.1080/14740338.2018.1531847] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

Jang G, Lee T, Hwang S, Park C, Ahn J, Seo S, Hwang Y, Yoon Y. PISTON: Predicting drug indications and side effects using topic modeling and natural language processing. J Biomed Inform 2018;87:96-107. [PMID: 30268842 DOI: 10.1016/j.jbi.2018.09.015] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2018] [Revised: 08/23/2018] [Accepted: 09/27/2018] [Indexed: 02/07/2023]

Abstract

The process of discovering novel drugs to treat diseases requires a long time and high cost. It is important to understand side effects of drugs as well as their therapeutic effects, because these can seriously damage the patients due to unexpected actions of the derived candidate drugs. In order to overcome these limitations, computational methods for predicting the therapeutic effects and side effects have been proposed. In particular, text mining is a widely used technique in the field of systems biology, because it can discover hidden relationships between drugs, genes and diseases from a large amount of literature data. Compared with in vivo/in vitro experiments, text mining derives meaningful results with less time and cost. In this study, we propose an algorithm for predicting novel drug-phenotype associations and drug-side effect associations using topic modeling and natural language processing (NLP). We extract sentences in which drugs and genes co-occur from the abstracts of the literature and identify words that describe the relationship between them using NLP. Considering the characteristics of the identified words, we determine if the drug has an up-regulation effect or a down-regulation effect on the gene. Based on genes that affect drugs and their regulatory relationships, we group the frequently occurring genes and regulatory relationships into topics, and build a drug-topic probability matrix by calculating the score that the drug will have a topic using topic modeling. Using the matrix, a classifier is constructed for predicting the novel indications and side effects of drugs considering the characteristics of known drug-phenotype associations or drug-side effect associations. The proposed method predicts both indications and side effects with a single algorithm, and it can exclude drugs with serious side effects or side effects that patients do not want to experience from among the candidate drugs provided for the treatment of the phenotype. Furthermore, lists of novel candidate drugs for phenotypes and side effects can be continuously updated with our algorithm every time a document is added. More than a thousand documents are produced per day, and it is possible for our algorithm to efficiently derive candidate drugs because it requires less cost than the existing drug repositioning methods. The resource of PISTON is available at databio.gachon.ac.kr/tools/PISTON.

Collapse

LIU J, JIANG X, CHEN Q, SONG M, LI J. Adverse Drug Reaction Related Post Detecting Using Sentiment Feature. IRANIAN JOURNAL OF PUBLIC HEALTH 2018;47:861-867. [PMID: 30087872 PMCID: PMC6077625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/02/2022]

Chen X, Faviez C, Schuck S, Lillo-Le-Louët A, Texier N, Dahamna B, Huot C, Foulquié P, Pereira S, Leroux V, Karapetiantz P, Guenegou-Arnoux A, Katsahian S, Bousquet C, Burgun A. Mining Patients' Narratives in Social Media for Pharmacovigilance: Adverse Effects and Misuse of Methylphenidate. Front Pharmacol 2018;9:541. [PMID: 29881351 PMCID: PMC5978246 DOI: 10.3389/fphar.2018.00541] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2017] [Accepted: 05/04/2018] [Indexed: 12/29/2022] Open

Abstract

Background: The Food and Drug Administration (FDA) in the United States and the European Medicines Agency (EMA) have recognized social media as a new data source to strengthen their activities regarding drug safety.

Objective: Our objective in the ADR-PRISM project was to provide text mining and visualization tools to explore a corpus of posts extracted from social media. We evaluated this approach on a corpus of 21 million posts from five patient forums, and conducted a qualitative analysis of the data available on methylphenidate in this corpus.

Methods: We applied text mining methods based on named entity recognition and relation extraction in the corpus, followed by signal detection using proportional reporting ratio (PRR). We also used topic modeling based on the Correlated Topic Model to obtain the list of the matics in the corpus and classify the messages based on their topics.

Results: We automatically identified 3443 posts about methylphenidate published between 2007 and 2016, among which 61 adverse drug reactions (ADR) were automatically detected. Two pharmacovigilance experts evaluated manually the quality of automatic identification, and a f-measure of 0.57 was reached. Patient's reports were mainly neuro-psychiatric effects. Applying PRR, 67% of the ADRs were signals, including most of the neuro-psychiatric symptoms but also palpitations. Topic modeling showed that the most represented topics were related to Childhood and Treatment initiation, but also Side effects. Cases of misuse were also identified in this corpus, including recreational use and abuse.

Conclusion: Named entity recognition combined with signal detection and topic modeling have demonstrated their complementarity in mining social media data. An in-depth analysis focused on methylphenidate showed that this approach was able to detect potential signals and to provide better understanding of patients' behaviors regarding drugs, including misuse.

Collapse

Affiliation(s)

Xiaoyi Chen UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France
Carole Faviez Kappa Santé, Paris, France
Stéphane Schuck Kappa Santé, Paris, France
Agnès Lillo-Le-Louët Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, AP-HP, Paris, France
Nathalie Texier Kappa Santé, Paris, France
Badisse Dahamna Service d'Informatique Biomédicale, Centre Hospitalier Universitaire de Rouen, Rouen, France.,Laboratoire d'Informatique, du Traitement de l'Information et des Systèmes-TIBS EA 4108, Rouen, France
Charles Huot Expert System, Paris, France
Pierre Foulquié Kappa Santé, Paris, France
Suzanne Pereira Vidal, Issy Les Moulineaux, France
Vincent Leroux Institut de Santé Urbaine, Saint-Maurice, France
Pierre Karapetiantz UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France
Armelle Guenegou-Arnoux UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France
Sandrine Katsahian UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France.,Département d'Informatique Médicale, Hôpital Européen Georges Pompidou, Paris, France
Cédric Bousquet Sorbonne Université, Inserm, université Paris 13, Laboratoire d'informatique médicale et d'ingénierie des connaissances en e-santé, LIMICS, Paris, France
Anita Burgun UMRS 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Université Paris Descartes, Paris, France.,Département d'Informatique Médicale, Hôpital Européen Georges Pompidou, Paris, France

Collapse

Harpaz R, DuMouchel W, Schuemie M, Bodenreider O, Friedman C, Horvitz E, Ripple A, Sorbello A, White RW, Winnenburg R, Shah NH. Toward multimodal signal detection of adverse drug reactions. J Biomed Inform 2017;76:41-49. [PMID: 29081385 PMCID: PMC8502488 DOI: 10.1016/j.jbi.2017.10.013] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2017] [Revised: 10/14/2017] [Accepted: 10/24/2017] [Indexed: 11/27/2022]

Abstract

OBJECTIVE

Improving mechanisms to detect adverse drug reactions (ADRs) is key to strengthening post-marketing drug safety surveillance. Signal detection is presently unimodal, relying on a single information source. Multimodal signal detection is based on jointly analyzing multiple information sources. Building on, and expanding the work done in prior studies, the aim of the article is to further research on multimodal signal detection, explore its potential benefits, and propose methods for its construction and evaluation.

MATERIAL AND METHODS

Four data sources are investigated; FDA's adverse event reporting system, insurance claims, the MEDLINE citation database, and the logs of major Web search engines. Published methods are used to generate and combine signals from each data source. Two distinct reference benchmarks corresponding to well-established and recently labeled ADRs respectively are used to evaluate the performance of multimodal signal detection in terms of area under the ROC curve (AUC) and lead-time-to-detection, with the latter relative to labeling revision dates.

RESULTS

Limited to our reference benchmarks, multimodal signal detection provides AUC improvements ranging from 0.04 to 0.09 based on a widely used evaluation benchmark, and a comparative added lead-time of 7-22 months relative to labeling revision dates from a time-indexed benchmark.

CONCLUSIONS

The results support the notion that utilizing and jointly analyzing multiple data sources may lead to improved signal detection. Given certain data and benchmark limitations, the early stage of development, and the complexity of ADRs, it is currently not possible to make definitive statements about the ultimate utility of the concept. Continued development of multimodal signal detection requires a deeper understanding the data sources used, additional benchmarks, and further research on methods to generate and synthesize signals.

Collapse

Bousquet C, Dahamna B, Guillemin-Lanne S, Darmoni SJ, Faviez C, Huot C, Katsahian S, Leroux V, Pereira S, Richard C, Schück S, Souvignet J, Lillo-Le Louët A, Texier N. The Adverse Drug Reactions from Patient Reports in Social Media Project: Five Major Challenges to Overcome to Operationalize Analysis and Efficiently Support Pharmacovigilance Process. JMIR Res Protoc 2017;6:e179. [PMID: 28935617 PMCID: PMC5629348 DOI: 10.2196/resprot.6463] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2016] [Revised: 06/19/2017] [Accepted: 07/12/2017] [Indexed: 11/13/2022] Open

Abstract

Background

Adverse drug reactions (ADRs) are an important cause of morbidity and mortality. Classical Pharmacovigilance process is limited by underreporting which justifies the current interest in new knowledge sources such as social media. The Adverse Drug Reactions from Patient Reports in Social Media (ADR-PRISM) project aims to extract ADRs reported by patients in these media. We identified 5 major challenges to overcome to operationalize the analysis of patient posts: (1) variable quality of information on social media, (2) guarantee of data privacy, (3) response to pharmacovigilance expert expectations, (4) identification of relevant information within Web pages, and (5) robust and evolutive architecture.

Objective

This article aims to describe the current state of advancement of the ADR-PRISM project by focusing on the solutions we have chosen to address these 5 major challenges.

Methods

In this article, we propose methods and describe the advancement of this project on several aspects: (1) a quality driven approach for selecting relevant social media for the extraction of knowledge on potential ADRs, (2) an assessment of ethical issues and French regulation for the analysis of data on social media, (3) an analysis of pharmacovigilance expert requirements when reviewing patient posts on the Internet, (4) an extraction method based on natural language processing, pattern based matching, and selection of relevant medical concepts in reference terminologies, and (5) specifications of a component-based architecture for the monitoring system.

Results

Considering the 5 major challenges, we (1) selected a set of 21 validated criteria for selecting social media to support the extraction of potential ADRs, (2) proposed solutions to guarantee data privacy of patients posting on Internet, (3) took into account pharmacovigilance expert requirements with use case diagrams and scenarios, (4) built domain-specific knowledge resources embeding a lexicon, morphological rules, context rules, semantic rules, syntactic rules, and post-analysis processing, and (5) proposed a component-based architecture that allows storage of big data and accessibility to third-party applications through Web services.

Conclusions

We demonstrated the feasibility of implementing a component-based architecture that allows collection of patient posts on the Internet, near real-time processing of those posts including annotation, and storage in big data structures. In the next steps, we will evaluate the posts identified by the system in social media to clarify the interest and relevance of such approach to improve conventional pharmacovigilance processes based on spontaneous reporting.

Collapse

Abdellaoui R, Schück S, Texier N, Burgun A. Filtering Entities to Optimize Identification of Adverse Drug Reaction From Social Media: How Can the Number of Words Between Entities in the Messages Help? JMIR Public Health Surveill 2017. [PMID: 28642212 PMCID: PMC5500778 DOI: 10.2196/publichealth.6577] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Koutkias VG, Lillo-Le Louët A, Jaulent MC. Exploiting heterogeneous publicly available data sources for drug safety surveillance: computational framework and case studies. Expert Opin Drug Saf 2016;16:113-124. [PMID: 27813420 DOI: 10.1080/14740338.2017.1257604] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]

Pechsiri C, Piriyakul R. Extraction of a group-pair relation: problem-solving relation from web-board documents. SPRINGERPLUS 2016;5:1265. [PMID: 27540498 PMCID: PMC4975736 DOI: 10.1186/s40064-016-2864-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/06/2015] [Accepted: 07/19/2016] [Indexed: 11/10/2022]

Zheng Y, Lan C, Peng H, Li J. Using constrained information entropy to detect rare adverse drug reactions from medical forums. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2016;2016:2460-2463. [PMID: 28268822 DOI: 10.1109/embc.2016.7591228] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]

Kuhn M, Letunic I, Jensen LJ, Bork P. The SIDER database of drugs and side effects. Nucleic Acids Res 2015;44:D1075-9. [PMID: 26481350 PMCID: PMC4702794 DOI: 10.1093/nar/gkv1075] [Citation(s) in RCA: 631] [Impact Index Per Article: 70.1] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2015] [Accepted: 10/06/2015] [Indexed: 01/10/2023] Open

Golder S, Norman G, Loke YK. Systematic review on the prevalence, frequency and comparative value of adverse events data in social media. Br J Clin Pharmacol 2015;80:878-88. [PMID: 26271492 PMCID: PMC4594731 DOI: 10.1111/bcp.12746] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2015] [Revised: 07/16/2015] [Accepted: 08/03/2015] [Indexed: 11/27/2022] Open

Sloane R, Osanlou O, Lewis D, Bollegala D, Maskell S, Pirmohamed M. Social media and pharmacovigilance: A review of the opportunities and challenges. Br J Clin Pharmacol 2015;80:910-20. [PMID: 26147850 PMCID: PMC4594734 DOI: 10.1111/bcp.12717] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2015] [Revised: 06/29/2015] [Accepted: 07/03/2015] [Indexed: 01/23/2023] Open

Leveraging MEDLINE indexing for pharmacovigilance - Inherent limitations and mitigation strategies. J Biomed Inform 2015;57:425-35. [PMID: 26342964 DOI: 10.1016/j.jbi.2015.08.022] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2014] [Revised: 07/30/2015] [Accepted: 08/25/2015] [Indexed: 11/24/2022]

Abstract

BACKGROUND

Traditional approaches to pharmacovigilance center on the signal detection from spontaneous reports, e.g., the U.S. Food and Drug Administration (FDA) adverse event reporting system (FAERS). In order to enrich the scientific evidence and enhance the detection of emerging adverse drug events that can lead to unintended harmful outcomes, pharmacovigilance activities need to evolve to encompass novel complementary data streams, for example the biomedical literature available through MEDLINE.

OBJECTIVES

(1) To review how the characteristics of MEDLINE indexing influence the identification of adverse drug events (ADEs); (2) to leverage this knowledge to inform the design of a system for extracting ADEs from MEDLINE indexing; and (3) to assess the specific contribution of some characteristics of MEDLINE indexing to the performance of this system.

METHODS

We analyze the characteristics of MEDLINE indexing. We integrate three specific characteristics into the design of a system for extracting ADEs from MEDLINE indexing. We experimentally assess the specific contribution of these characteristics over a baseline system based on co-occurrence between drug descriptors qualified by adverse effects and disease descriptors qualified by chemically induced.

RESULTS

Our system extracted 405,300 ADEs from 366,120 MEDLINE articles. The baseline system accounts for 297,093 ADEs (73%). 85,318 ADEs (21%) can be extracted only after integrating specific pre-coordinated MeSH descriptors and additional qualifiers. 22,889 ADEs (6%) can be extracted only after considering indirect links between the drug of interest and the descriptor that bears the ADE context.

CONCLUSIONS

In this paper, we demonstrate significant improvement over a baseline approach to identifying ADEs from MEDLINE indexing, which mitigates some of the inherent limitations of MEDLINE indexing for pharmacovigilance. ADEs extracted from MEDLINE indexing are complementary to, not a replacement for, other sources.

Collapse

Lardon J, Abdellaoui R, Bellet F, Asfari H, Souvignet J, Texier N, Jaulent MC, Beyens MN, Burgun A, Bousquet C. Adverse Drug Reaction Identification and Extraction in Social Media: A Scoping Review. J Med Internet Res 2015;17:e171. [PMID: 26163365 PMCID: PMC4526988 DOI: 10.2196/jmir.4304] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2015] [Revised: 04/09/2015] [Accepted: 04/22/2015] [Indexed: 02/06/2023] Open

Abstract

Background

The underreporting of adverse drug reactions (ADRs) through traditional reporting channels is a limitation in the efficiency of the current pharmacovigilance system. Patients’ experiences with drugs that they report on social media represent a new source of data that may have some value in postmarketing safety surveillance.

Objective

A scoping review was undertaken to explore the breadth of evidence about the use of social media as a new source of knowledge for pharmacovigilance.

Methods

Daubt et al’s recommendations for scoping reviews were followed. The research questions were as follows: How can social media be used as a data source for postmarketing drug surveillance? What are the available methods for extracting data? What are the different ways to use these data? We queried PubMed, Embase, and Google Scholar to extract relevant articles that were published before June 2014 and with no lower date limit. Two pairs of reviewers independently screened the selected studies and proposed two themes of review: manual ADR identification (theme 1) and automated ADR extraction from social media (theme 2). Descriptive characteristics were collected from the publications to create a database for themes 1 and 2.

Results

Of the 1032 citations from PubMed and Embase, 11 were relevant to the research question. An additional 13 citations were added after further research on the Internet and in reference lists. Themes 1 and 2 explored 11 and 13 articles, respectively. Ways of approaching the use of social media as a pharmacovigilance data source were identified.

Conclusions

This scoping review noted multiple methods for identifying target data, extracting them, and evaluating the quality of medical information from social media. It also showed some remaining gaps in the field. Studies related to the identification theme usually failed to accurately assess the completeness, quality, and reliability of the data that were analyzed from social media. Regarding extraction, no study proposed a generic approach to easily adding a new site or data source. Additional studies are required to precisely determine the role of social media in the pharmacovigilance system.

Collapse

Bridging islands of information to establish an integrated knowledge base of drugs and health outcomes of interest. Drug Saf 2015;37:557-67. [PMID: 24985530 PMCID: PMC4134480 DOI: 10.1007/s40264-014-0189-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Sarker A, Ginn R, Nikfarjam A, O'Connor K, Smith K, Jayaraman S, Upadhaya T, Gonzalez G. Utilizing social media data for pharmacovigilance: A review. J Biomed Inform 2015;54:202-12. [PMID: 25720841 DOI: 10.1016/j.jbi.2015.02.004] [Citation(s) in RCA: 238] [Impact Index Per Article: 26.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2014] [Revised: 01/02/2015] [Accepted: 02/15/2015] [Indexed: 10/23/2022]

Abstract

OBJECTIVE

Automatic monitoring of Adverse Drug Reactions (ADRs), defined as adverse patient outcomes caused by medications, is a challenging research problem that is currently receiving significant attention from the medical informatics community. In recent years, user-posted data on social media, primarily due to its sheer volume, has become a useful resource for ADR monitoring. Research using social media data has progressed using various data sources and techniques, making it difficult to compare distinct systems and their performances. In this paper, we perform a methodical review to characterize the different approaches to ADR detection/extraction from social media, and their applicability to pharmacovigilance. In addition, we present a potential systematic pathway to ADR monitoring from social media.

METHODS

We identified studies describing approaches for ADR detection from social media from the Medline, Embase, Scopus and Web of Science databases, and the Google Scholar search engine. Studies that met our inclusion criteria were those that attempted to extract ADR information posted by users on any publicly available social media platform. We categorized the studies according to different characteristics such as primary ADR detection approach, size of corpus, data source(s), availability, and evaluation criteria.

RESULTS

Twenty-two studies met our inclusion criteria, with fifteen (68%) published within the last two years. However, publicly available annotated data is still scarce, and we found only six studies that made the annotations used publicly available, making system performance comparisons difficult. In terms of algorithms, supervised classification techniques to detect posts containing ADR mentions, and lexicon-based approaches for extraction of ADR mentions from texts have been the most popular.

CONCLUSION

Our review suggests that interest in the utilization of the vast amounts of available social media data for ADR monitoring is increasing. In terms of sources, both health-related and general social media data have been used for ADR detection-while health-related sources tend to contain higher proportions of relevant data, the volume of data from general social media websites is significantly higher. There is still very limited amount of annotated data publicly available , and, as indicated by the promising results obtained by recent supervised learning approaches, there is a strong need to make such data available to the research community.

Collapse

Sarker A, Gonzalez G. Portable automatic text classification for adverse drug reaction detection via multi-corpus training. J Biomed Inform 2014;53:196-207. [PMID: 25451103 DOI: 10.1016/j.jbi.2014.11.002] [Citation(s) in RCA: 145] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2014] [Revised: 10/24/2014] [Accepted: 11/02/2014] [Indexed: 10/24/2022]

Abstract

OBJECTIVE

Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media-where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies.

METHODS

One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies.

RESULTS

Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively.

CONCLUSIONS

Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future.

Collapse