1
Zhang BX, Lin WY, Huang TK. Stacking Ensemble of Disproportionality Indicators for Adverse Vaccine Reactions Detection-An Empirical Study on Predicting Adverse Reactions of COVID-19 Vaccines. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. [PMID: 38082660 DOI: 10.1109/embc40787.2023.10340698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
Vaccine safety is a critical issue for public health, which has recently become more crucial than ever since COVID-19 started to spread worldwide in 2020. Many COVID-19 vaccines have been developed and used without following the traditional three clinical trial stages. Instead, most COVID-19 vaccines were approved through emergency use authorization (EUA) within one year, significantly raising the risk of rare and severe adverse events. Reporting systems like the Vaccine Adverse Event Reporting System (VAERS) have been established worldwide to detect unknown and severe adverse reactions as early as possible. Although experts and researchers have been working hard to find ways to detect adverse vaccine event (AVE) signals from VAERS data, most contemporary methods are statistical methods based on measuring the disproportionality between vaccine-induced events and non-vaccine-induced events. This paper proposes a novel ensemble AVE detection method, which adopts a stacking ensemble of various disproportionality indicators, fuses dual-scale contingency values measured in single and cumulative yearly duration, and embraces the concept of feature concatenation. Experiments conducted on US VAERS data to predict AVEs caused by COVID-19 vaccines show that our proposed method is effective. We observed that: (1) a stacking ensemble of various disproportionality indicators is superior to any single disproportionality indicator and to the voting ensemble method; (2) fusing dual-scale contingency values and feature concatenation brings synergy to our proposed stacking ensemble AVE detection. Compared to the best disproportionality metric in this study, our top-performing ensemble version exhibited a 34% improvement in accuracy, 71% in precision, 29% in recall, and 77% in F-measure, with a slight decrease (8%) in specificity.
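For orientation, disproportionality indicators of the kind this abstract stacks are usually computed from a 2x2 contingency table of report counts. The sketch below assumes two widely used indicators, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR); the abstract does not name the indicators the authors used, and the function names and counts here are purely illustrative.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio from a 2x2 table of report counts.

    a: reports with the vaccine of interest and the event of interest
    b: reports with the vaccine of interest and any other event
    c: reports with other vaccines and the event of interest
    d: reports with other vaccines and any other event
    Assumes a + b > 0 and c > 0 (no zero-cell correction is applied).
    """
    return (a / (a + b)) / (c / (c + d))


def ror(a: int, b: int, c: int, d: int) -> float:
    """Reporting odds ratio for the same table; assumes b > 0 and c > 0."""
    return (a * d) / (b * c)


# Hypothetical counts, not taken from VAERS.
counts = (20, 80, 10, 890)

# In a stacking setup, values like these would become the input features
# of a meta-classifier trained on labelled vaccine-event pairs.
features = [prr(*counts), ror(*counts)]
print(features)
```

Both indicators exceed 1 when the event is reported disproportionately often for the vaccine of interest, which is why they can serve as complementary features rather than as competing detectors.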
2
Giovanni RD, Cochrane A, Parker J, Lewis DJ. Adverse events in the digital age and where to find them. Pharmacoepidemiol Drug Saf 2022; 31:1131-1139. [PMID: 35996833 DOI: 10.1002/pds.5532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 07/02/2022] [Accepted: 08/19/2022] [Indexed: 11/12/2022]
Abstract
Exponential growth of health-related data collected by digital tools is a reality within pharmaceutical and medical device research and development. Data generated through digital tools may be categorized as relevant to efficacy and/or safety. The enormity of these data requires the adoption of new approaches for processing and evaluation. Recognition of patterns within the safety data is vital for sponsors seeking regulatory approval for their new products. Non-traditional data sources may contain relevant safety information; early evaluation of these data will help to determine the product safety profile. Advanced technologies have allowed the development of digital tools to screen these data, which in some situations are classified as software as a medical device and subject to clinical evaluation and post-marketing surveillance. Artificial intelligence may help to reduce or even eliminate noise from within these data, allowing safety experts to focus on the most pertinent evidence. We propose a data typology and provide considerations on how to define adverse events within different types of data, even where no human reporter exists. Proposals are made for the automation of screening processes. We consider validation aspects to support solutions that are proven to produce reliable results, and to deliver trusted outputs to stakeholders.
Affiliation(s)
- Robert Di Giovanni
  - Chief Medical Office and Patient Safety, Global Drug Development, Novartis Pharma AG, Basel, Switzerland
- Andrew Cochrane
  - Chief Medical Office and Patient Safety, Global Drug Development, Novartis Pharma AG, Basel, Switzerland
- Jeremy Parker
  - Enterprise Risk Management, Research and Development, ERC, Novartis Pharma AG, Basel, Switzerland
- David J Lewis
  - Chief Medical Office and Patient Safety, Global Drug Development, Novartis Pharma GmbH, Oeflinger Strasse 44, Wehr, Germany
  - Department of Pharmacy, Pharmacology and Postgraduate Medicine, University of Hertfordshire, Hatfield, Hertfordshire, UK
3
Kaas-Hansen BS, Placido D, Rodríguez CL, Thorsen-Meyer HC, Gentile S, Nielsen AP, Brunak S, Jürgens G, Andersen SE. Language-agnostic pharmacovigilant text mining to elicit side effects from clinical notes and hospital medication records. Basic Clin Pharmacol Toxicol 2022; 131:282-293. [PMID: 35834334 PMCID: PMC9541191 DOI: 10.1111/bcpt.13773] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/19/2022] [Revised: 06/10/2022] [Accepted: 07/09/2022] [Indexed: 11/26/2022]
Abstract
We sought to craft a drug safety signalling pipeline associating latent information in clinical free text with exposures to single drugs and drug pairs. Data arose from 12 secondary and tertiary public hospitals in two Danish regions, comprising approximately half the Danish population. Notes were operationalised with a fastText embedding, based on which we trained 10,720 neural-network models (one for each distinct single-drug/drug-pair exposure) predicting the risk of exposure given an embedding vector. We included 2,905,251 admissions between May 2008 and June 2016, with 13,740,564 distinct drug prescriptions; the median number of prescriptions was 5 (IQR: 3-9) and in 1,184,340 (41%) admissions patients used ≥5 drugs concomitantly. 10,788,259 clinical notes were included, with 179,441,739 tokens retained after pruning. Of 345 single-drug signals reviewed, 28 (8.1%) represented possibly undescribed relationships; 186 (54%) signals were clinically meaningful. 16 (14%) of the 115 drug-pair signals were possible interactions and 2 (1.7%) were known. In conclusion, we built a language-agnostic pipeline for mining associations between free-text information and medication exposure without manual curation, predicting not the likely outcome of a range of exposures, but the likely exposures for outcomes of interest. Our approach may help overcome limitations of text mining methods relying on curated data in English and can help leverage non-English free text for pharmacovigilance.
Affiliation(s)
- Benjamin Skov Kaas-Hansen
  - Clinical Pharmacology Unit, Zealand University Hospital, Denmark
  - NNF Center for Protein Research, University of Copenhagen, Denmark
  - Section of Biostatistics, Department of Public Health, University of Copenhagen, Denmark
- Davide Placido
  - NNF Center for Protein Research, University of Copenhagen, Denmark
- Søren Brunak
  - NNF Center for Protein Research, University of Copenhagen, Denmark
- Gesche Jürgens
  - Clinical Pharmacology Unit, Zealand University Hospital, Denmark
4
Yu L, Cheng M, Qiu W, Xiao X, Lin W. idse-HE: Hybrid embedding graph neural network for drug side effects prediction. J Biomed Inform 2022; 131:104098. [PMID: 35636720 DOI: 10.1016/j.jbi.2022.104098] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/23/2021] [Revised: 04/29/2022] [Accepted: 05/24/2022] [Indexed: 10/18/2022]
Abstract
In drug development, unexpected side effects are the main reason for the failure of candidate drug trials. Discovering potential side effects of drugs in silico can improve the success rate of drug screening. However, most previous works extracted and utilized a representation of drugs from a single perspective. These methods merely considered the topological information of a drug in the biological entity network, or combined the association information (e.g. a knowledge graph, KG) between a drug and other biomarkers, or only used the chemical structure or sequence information of a drug. Consequently, to jointly learn drug features from both the macroscopic biological network and the microscopic drug molecules, we propose a hybrid embedding graph neural network model named idse-HE, which integrates a graph embedding module and a node embedding module. idse-HE can fuse the drug chemical structure information, the drug substructure sequence information and the drug network topology information. Our model treats the final representations of drugs and side effects as two implicit factors to reconstruct the original matrix and predicts the potential side effects of drugs. In the robustness experiment, idse-HE shows stable performance in all indicators. We reproduce the baselines under the same conditions, and the experimental results indicate that idse-HE is superior to other advanced methods. Finally, we also collect evidence to confirm several real drug side effect pairs in the predicted results, which were previously regarded as negative samples. For more detailed information, researchers can access the user-friendly web server of idse-HE at http://bioinfo.jcu.edu.cn/idse-HE. On this server, users can obtain the original data and source code, and will be guided to reproduce the model results.
Affiliation(s)
- Liyi Yu
  - School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Meiling Cheng
  - School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Wangren Qiu
  - School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Xuan Xiao
  - School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Weizhong Lin
  - School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
5
Huang JY, Lee WP, Lee KD. Predicting Adverse Drug Reactions from Social Media Posts: Data Balance, Feature Selection and Deep Learning. Healthcare (Basel) 2022; 10:healthcare10040618. [PMID: 35455795 PMCID: PMC9024774 DOI: 10.3390/healthcare10040618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Revised: 03/22/2022] [Accepted: 03/23/2022] [Indexed: 11/16/2022]
Abstract
Social forums offer many new channels for collecting patients’ opinions to construct predictive models of adverse drug reactions (ADRs) for post-marketing surveillance. However, due to the characteristics of social posts, many challenges remain when deriving such models, mainly data sparseness, high-dimensional data features, and term diversity in the data. To tackle these issues in identifying ADRs from social posts, we perform data analytics from the perspectives of data balance, feature selection, and feature learning. We also design a comprehensive experimental analysis to investigate the performance of different data processing techniques and data modeling methods. Most importantly, we present a deep learning-based approach that adopts the BERT (Bidirectional Encoder Representations from Transformers) model with a new batch-wise adaptive strategy to enhance the predictive performance. A series of experiments have been conducted to evaluate the machine learning methods with both manual and automated feature engineering processes. The results show that both types of methods, each with its own advantages, are effective in ADR prediction. In contrast to the traditional machine learning methods, our feature learning approach performs the required task automatically, saving the manual effort otherwise needed for a large number of experiments.
Affiliation(s)
- Jhih-Yuan Huang
  - Department of Information Management, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- Wei-Po Lee
  - Department of Information Management, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- King-Der Lee
  - Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
6
Chopard D, Treder MS, Corcoran P, Ahmed N, Johnson C, Busse M, Spasic I. Text Mining of Adverse Events in Clinical Trials: Deep Learning Approach. JMIR Med Inform 2021; 9:e28632. [PMID: 34951601 PMCID: PMC8742206 DOI: 10.2196/28632] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/09/2021] [Revised: 08/01/2021] [Accepted: 11/14/2021] [Indexed: 11/28/2022]
Abstract
BACKGROUND: Pharmacovigilance and safety reporting, which involve processes for monitoring the use of medicines in clinical trials, play a critical role in the identification of previously unrecognized adverse events or changes in the patterns of adverse events. OBJECTIVE: This study aims to demonstrate the feasibility of automating the coding of adverse events described in the narrative section of the serious adverse event report forms to enable statistical analysis of the aforementioned patterns. METHODS: We used the Unified Medical Language System (UMLS) as the coding scheme, which integrates 217 source vocabularies, thus enabling coding against other relevant terminologies such as the International Classification of Diseases-10th Revision, Medical Dictionary for Regulatory Activities, and Systematized Nomenclature of Medicine. We used MetaMap, a highly configurable dictionary lookup software, to identify the mentions of the UMLS concepts. We trained a binary classifier using Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model that captures contextual relationships, to differentiate between mentions of the UMLS concepts that represented adverse events and those that did not. RESULTS: The model achieved a high F1 score of 0.8080, despite the class imbalance. This is 10.15 percentage points lower than human-like performance but also 17.45 percentage points higher than that of the baseline approach. CONCLUSIONS: These results confirmed that automated coding of adverse events described in the narrative section of serious adverse event reports is feasible. Once coded, adverse events can be statistically analyzed so that any correlations with the trialed medicines can be estimated in a timely fashion.
Affiliation(s)
- Daphne Chopard
  - School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Matthias S Treder
  - School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Padraig Corcoran
  - School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Nagheen Ahmed
  - Centre for Trials Research, Cardiff University, Cardiff, United Kingdom
- Claire Johnson
  - Centre for Trials Research, Cardiff University, Cardiff, United Kingdom
- Monica Busse
  - Centre for Trials Research, Cardiff University, Cardiff, United Kingdom
- Irena Spasic
  - School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
7
Miscommunication in the age of communication: A crowdsourcing framework for symptom surveillance at the time of pandemics. Int J Med Inform 2021; 151:104486. [PMID: 33991885 PMCID: PMC8111883 DOI: 10.1016/j.ijmedinf.2021.104486] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/30/2020] [Revised: 04/22/2021] [Accepted: 05/07/2021] [Indexed: 11/20/2022]
Abstract
OBJECTIVE: There was a significant delay in compiling a complete list of the symptoms of COVID-19 during the 2020 outbreak of the disease. When there is little information about the symptoms of a novel disease, interventions to contain the spread of the disease are suboptimal because people experiencing symptoms that are not yet known to be related to the disease may not limit their social activities. Our goal was to understand whether users' social media postings about the symptoms of novel diseases could be used to develop a complete list of the disease symptoms in a shorter time. MATERIALS AND METHODS: We used the Twitter API to download tweets that contained 'coronavirus', 'COVID-19', and 'symptom'. After data cleaning, the resulting dataset consisted of over 95,000 unique English tweets posted between January 17, 2020 and March 15, 2020 that contained references to the symptoms of COVID-19. We analyzed these data using network and time series methods. RESULTS: We found that a complete list of the symptoms of COVID-19 could have been compiled by mid-March 2020, before most states in the U.S. announced a lockdown and about 75 days earlier than the list was completed on CDC's website. DISCUSSION & CONCLUSION: We conclude that national and international health agencies should use the crowd-sourced intelligence obtained from social media to develop effective symptom surveillance systems in the early stages of pandemics. We propose a high-level framework that facilitates the collection, analysis, and dissemination of information that is posted in various languages and on different social media platforms about the symptoms of novel diseases.
8
Schotland P, Racz R, Jackson DB, Soldatos TG, Levin R, Strauss DG, Burkhart K. Target Adverse Event Profiles for Predictive Safety in the Postmarket Setting. Clin Pharmacol Ther 2021; 109:1232-1243. [PMID: 33090463 PMCID: PMC8246740 DOI: 10.1002/cpt.2074] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/20/2019] [Accepted: 08/31/2020] [Indexed: 12/21/2022]
Abstract
We improved a previous pharmacological target adverse-event (TAE) profile model to predict adverse events (AEs) on US Food and Drug Administration (FDA) drug labels at the time of approval. The new model uses more drugs and features for learning as well as a new algorithm. Comparator drugs sharing similar target activities to a drug of interest were evaluated by aggregating AEs from the FDA Adverse Event Reporting System (FAERS), FDA drug labels, and medical literature. An ensemble machine learning model was used to evaluate FAERS case count, disproportionality scores, percent of comparator drug labels with a specific AE, and percent of comparator drugs with the reports of the event in the literature. Overall classifier performance was F1 of 0.71, area under the precision-recall curve of 0.78, and area under the receiver operating characteristic curve of 0.87. TAE analysis continues to show promise as a method to predict adverse events at the time of approval.
Affiliation(s)
- Peter Schotland
  - Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
  - Present address: Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
- Rebecca Racz
  - Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
- Robert Levin
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
- David G. Strauss
  - Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
- Keith Burkhart
  - Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
9
Chamikara MAP, Chen YPP. MedFused: A framework to discover the relationships between drug chemical functional group impacts and side effects. Comput Biol Med 2021; 133:104361. [PMID: 33872968 DOI: 10.1016/j.compbiomed.2021.104361] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/07/2020] [Revised: 03/12/2021] [Accepted: 03/25/2021] [Indexed: 11/16/2022]
Abstract
It is a well-known fact that there are often side effects to the long-term use of certain medications. These side effects can vary from mild dizziness to, at its most serious, death. The main factors that cause these side effects are the chemical composition, the mode of treatment, and the dose. The dynamics that govern the reaction of a drug heavily depend on its structural composition. The structural composition of a drug is defined by the structural arrangement of the corresponding basic chemical functional groups. Hence, it is essential to investigate the effect of chemical functional groups on the side effects to synthesize drugs with minimal side effects. To support this process, we developed a framework named MedFused (Medical Functional Group Side Effects Database), which is composed of drugs (International Union of Pure and Applied Chemistry: IUPAC nomenclature), functional groups, and the side effects along with other valuable information such as STITCH (search tool for interactions of chemicals) compound ID, and the Unified Medical Language System (UMLS) concept ID. We develop a web framework that functions on the MedFused system database on top of the Django web framework. Our web server supports functionalities such as exploring the database and descriptive graph tools, which provide additional exploration capabilities to the framework. These descriptive tools include histograms, pie charts, and association charts, which further explore the system. Above these basic tools, MedFused includes functionality to discover the drug's "chemical functional group" impact on "side effects". The method conducts an association rule analysis on the relationships by considering the MedFused database as a collection of transactions. A specific transaction has a list of the functional groups of a drug and one side effect. Hence, a drug that has more than one side effect forms multiple transactions. 
Next, we generate a binary feature matrix based on the transactions and introduce a pruning mechanism to consider only the potential functional groups and side effects based on their support (frequencies), subjected to a predefined threshold (which can be changed accordingly). As the current version of the MedFused database has a limited number of side effects (hence low support), we restricted the analysis to identify the functional groups which have the most potential of causing a particular side effect, based on a confidence value of 1. Our framework can be further extended with more functions and tools as it supports the model view controller (MVC) architecture, which is inherited from the Django Python web framework.
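The transaction-based analysis described in this abstract can be illustrated with a small sketch. It assumes the simplest case, rules with a single functional group as antecedent and one side effect as consequent, and uses the standard definitions of support and confidence; the function names, threshold, and data are hypothetical and are not taken from MedFused.

```python
from collections import Counter


def rule_stats(transactions):
    """Return {(group, effect): (support, confidence)} for single-group rules.

    Each transaction is (functional_groups, side_effect), mirroring the
    abstract: a drug with several side effects yields several transactions.
    """
    n = len(transactions)
    group_count = Counter()  # transactions containing each functional group
    pair_count = Counter()   # transactions containing group and side effect
    for groups, effect in transactions:
        for group in set(groups):
            group_count[group] += 1
            pair_count[(group, effect)] += 1
    return {
        (group, effect): (count / n, count / group_count[group])
        for (group, effect), count in pair_count.items()
    }


def confident_rules(transactions, min_support=0.5, min_confidence=1.0):
    """Prune by support, then keep rules that reach the confidence threshold."""
    return sorted(
        rule
        for rule, (support, confidence) in rule_stats(transactions).items()
        if support >= min_support and confidence >= min_confidence
    )


# Hypothetical transactions for illustration only.
transactions = [
    ({"hydroxyl", "amine"}, "nausea"),
    ({"hydroxyl"}, "nausea"),
    ({"amine", "carboxyl"}, "dizziness"),
    ({"hydroxyl", "carboxyl"}, "nausea"),
]
print(confident_rules(transactions))
```

With a confidence threshold of 1, a rule survives only if every transaction containing the functional group carries the same side effect, which matches the restriction the abstract describes for its small database.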
Affiliation(s)
- Yi-Ping Phoebe Chen
  - College of Science, Health and Engineering, La Trobe University, Melbourne, Australia
10
Spiro A, Fernández García J, Yanover C. Inferring new relations between medical entities using literature curated term co-occurrences. JAMIA Open 2020; 2:378-385. [PMID: 31984370 PMCID: PMC6951958 DOI: 10.1093/jamiaopen/ooz022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2019] [Revised: 06/05/2019] [Accepted: 06/08/2019] [Indexed: 11/17/2022]
Abstract
Objectives: Identifying new relations between medical entities, such as drugs, diseases, and side effects, is typically a resource-intensive task, involving experimentation and clinical trials. The increased availability of related data and curated knowledge enables a computational approach to this task, notably by training models to predict likely relations. Such models rely on meaningful representations of the medical entities being studied. We propose a generic feature vector representation that leverages co-occurrences of medical terms, linked with PubMed citations. Materials and Methods: We demonstrate the usefulness of the proposed representation by inferring two types of relations: a drug causes a side effect and a drug treats an indication. To predict these relations and assess their effectiveness, we applied 2 modeling approaches: multi-task modeling using neural networks and single-task modeling based on gradient boosting machines and logistic regression. Results: These trained models, which predict either side effects or indications, obtained significantly better results than baseline models that use a single direct co-occurrence feature. The results demonstrate the advantage of a comprehensive representation. Discussion: Selecting the appropriate representation has an immense impact on the predictive performance of machine learning models. Our proposed representation is powerful, as it spans multiple medical domains and can be used to predict a wide range of relation types. Conclusion: The discovery of new relations between various medical entities can be translated into meaningful insights, for example, related to drug development or disease understanding. Our representation of medical entities can be used to train models that predict such relations, thus accelerating healthcare-related discoveries.
Affiliation(s)
- Adam Spiro
  - Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel
- Jonatan Fernández García
  - Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel
- Chen Yanover
  - Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel
11
Davazdahemami B, Delen D. A chronological pharmacovigilance network analytics approach for predicting adverse drug events. J Am Med Inform Assoc 2019; 25:1311-1321. [PMID: 30085102 DOI: 10.1093/jamia/ocy097] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 02/13/2018] [Accepted: 06/29/2018] [Indexed: 12/31/2022]
Abstract
Objectives: This study extends prior research by combining a chronological pharmacovigilance network approach with machine-learning (ML) techniques to predict adverse drug events (ADEs) based on the drugs' similarities in terms of the proteins they target in the human body. The focus of this research, though, is on predicting the drug-ADE associations for a set of 8 common and high-risk ADEs. Materials and methods: A large collection of annotated MEDLINE biomedical articles was used to construct a drug-ADE network, and the network was further equipped with information about drugs' target proteins. Several network metrics were extracted and used as predictors in ML algorithms to predict the existence of network edges (ie, associations or relationships). Results: Gradient boosted trees (GBTs) as an ensemble ML algorithm outperformed other prediction methods in identifying the drug-ADE associations with an overall accuracy of 92.8% on the validation sample. The prediction model was able to predict drug-ADE associations, on average, 3.84 years earlier than they were actually mentioned in the biomedical literature. Conclusion: While network analysis and ML techniques were used separately in prior ADE studies, our results showed that, in combination, they reinforce one another and predict better. Moreover, our results highlight the superior capability of ensemble-type ML methods in capturing drug-ADE patterns compared to the regular (ie, singular) ML algorithms.
Affiliation(s)
- Behrooz Davazdahemami
  - Department of Management Science and Information Systems, Oklahoma State University, Stillwater, Oklahoma, USA
- Dursun Delen
  - Department of Management Science and Information Systems, Center for Health Systems Innovation, Oklahoma State University, Stillwater, Oklahoma, USA
12
Zhang T, Lin H, Ren Y, Yang L, Xu B, Yang Z, Wang J, Zhang Y. Adverse drug reaction detection via a multihop self-attention mechanism. BMC Bioinformatics 2019; 20:479. [PMID: 31533622 PMCID: PMC6751590 DOI: 10.1186/s12859-019-3053-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/21/2019] [Accepted: 08/26/2019] [Indexed: 12/17/2022]
Abstract
Background: The adverse reactions that are caused by drugs are potentially life-threatening problems. Comprehensive knowledge of adverse drug reactions (ADRs) can reduce their detrimental impacts on patients. Detecting ADRs through clinical trials takes a large number of experiments and a long period of time. With the growing amount of unstructured textual data, such as biomedical literature and electronic records, detecting ADRs in the available unstructured data has important implications for ADR research. Most neural network-based methods focus on the simple semantic information of sentence sequences; however, the relationship between the two entities depends on more complex semantic information. Methods: In this paper, we propose a multihop self-attention mechanism (MSAM) model that aims to learn multi-aspect semantic information for the ADR detection task. First, the contextual information of the sentence is captured by using the bidirectional long short-term memory (Bi-LSTM) model. Then, by applying multiple steps of an attention mechanism, multiple semantic representations of a sentence are generated. Each attention step obtains a different attention distribution focusing on different segments of the sentence. Meanwhile, our model locates and enhances various keywords from the multiple representations of a sentence. Results: Our model was evaluated by using two ADR corpora. It is shown that the method has a stable generalization ability. In extensive experiments, our model achieved F-measures of 0.853, 0.799 and 0.851 for ADR detection on TwiMed-PubMed, TwiMed-Twitter, and ADE, respectively. The experimental results showed that our model significantly outperforms the other compared models for ADR detection. Conclusions: In this paper, we propose a modification of the multihop self-attention mechanism (MSAM) model for an ADR detection task. The proposed method significantly improved the learning of the complex semantic information of sentences.
Collapse
Affiliation(s)
- Tongxuan Zhang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Hongfei Lin
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Yuqi Ren
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Liang Yang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Bo Xu
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Zhihao Yang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Jian Wang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Yijia Zhang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
|
13
|
Gavrielov-Yusim N, Kürzinger ML, Nishikawa C, Pan C, Pouget J, Epstein LB, Golant Y, Tcherny-Lessenot S, Lin S, Hamelin B, Juhaeri J. Comparison of text processing methods in social media-based signal detection. Pharmacoepidemiol Drug Saf 2019; 28:1309-1317. [PMID: 31392844 DOI: 10.1002/pds.4857] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/18/2018] [Revised: 06/12/2019] [Accepted: 06/14/2019] [Indexed: 11/08/2022]
Abstract
PURPOSE Adverse event (AE) identification in social media (SM) can be performed using various types of natural language processing (NLP) and machine learning (ML). These methods can be categorized by complexity and precision level. Co-occurrence-based ML methods are rather basic, as they identify simultaneous appearance of drugs and clinical events in a single post. In contrast, statistical learning methods involve more complex NLP and identify drugs, events, and associations between them. We aimed to compare the ability of co-occurrence and NLP to identify AEs and signals of disproportionate reporting (SDR) in patient-generated SM. We also examined the performance of lift in SM-based signal detection (SD). METHODS Our examination was performed in a corpus of SM posts crawled from open online patient forums and communities, using the spontaneously reported VigiBase data as reference data set. RESULTS We found that co-occurrence and NLP produce AEs, which are 57% and 93% consistent with VigiBase AEs, respectively. Among the SDRs identified both in SM and in VigiBase, up to 55.3% were identified earlier in co-occurrence, and up to 32.1% were identified earlier in NLP-processed SM. Using lift in SM SD provided performance similar to frequentist methods, both in co-occurrence and in NLP-processed AEs. CONCLUSION Our results indicate that using SM as a data source complementary to traditional pharmacovigilance sources should be considered further. Various levels of SM processing may be considered, depending on the preferred policies and tolerance for false-positive to false-negative balance in routine pharmacovigilance processes.
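The abstract above does not define lift, so the following is a sketch under the usual association-rule definition, lift = P(drug, event) / (P(drug) · P(event)), computed from post counts. The function name, parameter names, and the counts in the example are hypothetical and are not taken from the study.

```python
def lift(n_drug_event, n_drug, n_event, n_total):
    """Lift of a drug-event pair from post counts.

    Algebraically equal to P(drug, event) / (P(drug) * P(event));
    values above 1 mean the pair co-occurs more often than expected
    if drug and event mentions were independent.
    """
    if n_drug <= 0 or n_event <= 0 or n_total <= 0:
        raise ValueError("counts must be positive")
    return (n_drug_event * n_total) / (n_drug * n_event)


# Hypothetical counts: 10,000 posts, 200 mention the drug, 500 mention
# the event, and 30 mention both.
example_lift = lift(n_drug_event=30, n_drug=200, n_event=500, n_total=10_000)
```

Here the pair appears in 30 posts where independence would predict 10, so the lift is 3. Any threshold used to flag a signal would be a separate policy choice that the abstract does not state.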
Affiliation(s)
- Chihiro Nishikawa
- Epidemiology and Benefit Risk Evaluation, Sanofi, Chilly-Mazarin, France
- Chunshen Pan
- Epidemiology and Benefit Risk Evaluation, Sanofi, Bridgewater, NJ, USA
- Julie Pouget
- Information Technology and Solutions, R&D CMO - SC Real World Evidence, Sanofi, Lyon, France
- Stephen Lin
- Global Pharmacovigilance, Sanofi, Bridgewater, NJ, USA
- Juhaeri Juhaeri
- Epidemiology and Benefit Risk Evaluation, Sanofi, Bridgewater, NJ, USA
|
14
|
Natsiavas P, Malousi A, Bousquet C, Jaulent MC, Koutkias V. Computational Advances in Drug Safety: Systematic and Mapping Review of Knowledge Engineering Based Approaches. Front Pharmacol 2019; 10:415. [PMID: 31156424 PMCID: PMC6533857 DOI: 10.3389/fphar.2019.00415] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/27/2018] [Accepted: 04/02/2019] [Indexed: 12/12/2022] Open
Abstract
Drug Safety (DS) is a domain with significant public health and social impact. Knowledge Engineering (KE) is the Computer Science discipline elaborating on methods and tools for developing “knowledge-intensive” systems, depending on a conceptual “knowledge” schema and some kind of “reasoning” process. The present systematic and mapping review aims to investigate KE-based approaches employed for DS and highlight the introduced added value as well as trends and possible gaps in the domain. Journal articles published between 2006 and 2017 were retrieved from PubMed/MEDLINE and Web of Science® (873 in total) and filtered based on a comprehensive set of inclusion/exclusion criteria. The 80 finally selected articles were reviewed on full-text, while the mapping process relied on a set of concrete criteria (concerning specific KE and DS core activities, special DS topics, employed data sources, reference ontologies/terminologies, and computational methods, etc.). The analysis results are publicly available as online interactive analytics graphs. The review clearly depicted increased use of KE approaches for DS. The collected data illustrate the use of KE for various DS aspects, such as Adverse Drug Event (ADE) information collection, detection, and assessment. Moreover, the quantified analysis of using KE for the respective DS core activities highlighted room for intensifying research on KE for ADE monitoring, prevention and reporting. Finally, the assessed use of the various data sources for DS special topics demonstrated extensive use of dominant data sources for DS surveillance, i.e., Spontaneous Reporting Systems, but also increasing interest in the use of emerging data sources, e.g., observational healthcare databases, biochemical/genetic databases, and social media. 
Various exemplar applications were identified with promising results, e.g., improvement in Adverse Drug Reaction (ADR) prediction, detection of drug interactions, and novel ADE profiles related to specific mechanisms of action. Nevertheless, since the reviewed studies mostly concerned proof-of-concept implementations, more intense research is required to reach the maturity level that KE approaches need for routine DS practice. In conclusion, we argue that efficiently addressing DS data analytics and management challenges requires the introduction of high-throughput KE-based methods for effective knowledge discovery and management, ultimately resulting in the establishment of a continuous learning DS system.
Affiliation(s)
- Pantelis Natsiavas
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece; Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Andigoni Malousi
- Laboratory of Biological Chemistry, Department of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Cédric Bousquet
- Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France; Public Health and Medical Information Unit, University Hospital of Saint-Etienne, Saint-Étienne, France
- Marie-Christine Jaulent
- Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Vassilis Koutkias
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
|
15
|
Zheng Y, Peng H, Ghosh S, Lan C, Li J. Inverse similarity and reliable negative samples for drug side-effect prediction. BMC Bioinformatics 2019; 19:554. [PMID: 30717666 PMCID: PMC7402513 DOI: 10.1186/s12859-018-2563-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/24/2018] [Accepted: 12/07/2018] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND In silico prediction of potential drug side-effects is of crucial importance for drug development, since wet experimental identification of drug side-effects is expensive and time-consuming. Existing computational methods mainly focus on leveraging validated drug side-effect relations for the prediction. The performance is severely impeded by the lack of reliable negative training data. Thus, a method to select reliable negative samples becomes vital in the performance improvement. METHODS Most of the existing computational prediction methods are essentially based on the assumption that similar drugs are inclined to share the same side-effects, which has given rise to remarkable performance. It is also rational to assume an inverse proposition that dissimilar drugs are less likely to share the same side-effects. Based on this inverse similarity hypothesis, we proposed a novel method to select highly-reliable negative samples for side-effect prediction. The first step of our method is to build a drug similarity integration framework to measure the similarity between drugs from different perspectives. This step integrates drug chemical structures, drug target proteins, drug substituents, and drug therapeutic information as features into a unified framework. Then, a similarity score between each candidate negative drug and validated positive drugs is calculated using the similarity integration framework. Those candidate negative drugs with lower similarity scores are preferentially selected as negative samples. Finally, both the validated positive drugs and the selected highly-reliable negative samples are used for predictions. RESULTS The performance of the proposed method was evaluated on simulative side-effect prediction of 917 DrugBank drugs, comparing with four machine-learning algorithms. 
Extensive experiments show that the drug similarity integration framework has superior capability in capturing drug features, achieving much better performance than those based on a single type of drug property. Besides, the four machine-learning algorithms achieved significant improvement in macro-averaging F1-score (e.g., SVM from 0.655 to 0.898), macro-averaging precision (e.g., RBF from 0.592 to 0.828) and macro-averaging recall (e.g., KNN from 0.651 to 0.772), attributed to the highly-reliable negative samples selected by the proposed method. CONCLUSIONS The results suggest that the inverse similarity hypothesis and the integration of different drug properties are valuable for side-effect prediction. The selection of highly-reliable negative samples can also make significant contributions to the performance improvement.
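The negative-sample selection step described in the abstract above can be sketched as follows. This is an illustration only: it assumes Jaccard similarity over sets of drug features and scores each candidate by its highest similarity to any validated positive drug, whereas the paper uses an integrated similarity framework whose details the abstract does not give. All drug and feature names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_negatives(candidates, positives, k):
    """Return the k candidate drugs least similar to the validated positives.

    Each candidate is scored by its maximum similarity to any positive drug
    (an assumed aggregation); lower scores are preferred as negatives.
    """
    def max_similarity(features):
        return max(jaccard(features, p) for p in positives.values())

    # Ties are broken by name so the ranking is deterministic.
    ranked = sorted(candidates, key=lambda name: (max_similarity(candidates[name]), name))
    return ranked[:k]


# Hypothetical feature sets (e.g., substructures, targets, substituents).
positives = {"drugA": {"f1", "f2", "f3"}, "drugB": {"f2", "f4"}}
candidates = {"drugX": {"f1", "f2"}, "drugY": {"f9"}, "drugZ": {"f4", "f8"}}
negatives = select_negatives(candidates, positives, k=2)
```

In this toy example `drugY` shares no features with either positive drug and `drugZ` overlaps only slightly with `drugB`, so those two are chosen ahead of `drugX`.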
Affiliation(s)
- Yi Zheng
- Advanced Analytics Institute, FEIT, University of Technology Sydney, 15 Broadway, Ultimo, NSW 2007, Australia
- Hui Peng
- Advanced Analytics Institute, FEIT, University of Technology Sydney, 15 Broadway, Ultimo, NSW 2007, Australia
- Shameek Ghosh
- Advanced Analytics Institute, FEIT, University of Technology Sydney, 15 Broadway, Ultimo, NSW 2007, Australia
- Chaowang Lan
- Advanced Analytics Institute, FEIT, University of Technology Sydney, 15 Broadway, Ultimo, NSW 2007, Australia
- Jinyan Li
- Advanced Analytics Institute, FEIT, University of Technology Sydney, 15 Broadway, Ultimo, NSW 2007, Australia
|
16
|
McDonald L, Malcolm B, Ramagopalan S, Syrad H. Real-world data and the patient perspective: the PROmise of social media? BMC Med 2019; 17:11. [PMID: 30646913 PMCID: PMC6334434 DOI: 10.1186/s12916-018-1247-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 07/11/2018] [Accepted: 12/21/2018] [Indexed: 12/30/2022] Open
Abstract
Understanding the patient perspective is fundamental to delivering patient-centred care. In most healthcare systems, however, patient-reported outcomes are not regularly collected or recorded as part of routine clinical care, despite evidence that doing so can have tangible clinical benefit. In the absence of the routine collection of these data, research is beginning to turn to social media as a novel means to capture the patient voice. Publicly available social media data can now be analysed with relative ease, bypassing many logistical hurdles associated with traditional approaches and allowing for accelerated and cost-effective data collection. Existing work has shown these data can offer credible insight into the patient experience, although more work is needed to understand limitations with respect to patient representativeness and nuances of captured experience. Nevertheless, linking social media to electronic medical records offers a significant opportunity for patient views to be systematically collected for health services research and ultimately to improve patient care.
Affiliation(s)
- Laura McDonald
- Centre for Observational Research and Data Sciences, Bristol-Myers Squibb, Uxbridge, UK
- Sreeram Ramagopalan
- Centre for Observational Research and Data Sciences, Bristol-Myers Squibb, Uxbridge, UK
|
17
|
Zheng Y, Peng H, Zhang X, Zhao Z, Yin J, Li J. Predicting adverse drug reactions of combined medication from heterogeneous pharmacologic databases. BMC Bioinformatics 2018; 19:517. [PMID: 30598065 PMCID: PMC6311930 DOI: 10.1186/s12859-018-2520-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Early and accurate identification of potential adverse drug reactions (ADRs) for combined medication is vital for public health. Existing methods either rely on expensive wet-lab experiments or detect existing associations from related records. Thus, they inevitably suffer from under-reporting, delays in reporting, and an inability to detect ADRs for new and rare drugs. The current application of machine learning methods is severely impeded by the lack of proper drug representation and credible negative samples. Therefore, a method to represent drugs properly and to select credible negative samples becomes vital in applying machine learning methods to this problem. RESULTS In this work, we propose a machine learning method to predict ADRs of combined medication from pharmacologic databases by building up highly-credible negative samples (HCNS-ADR). Specifically, we first fuse heterogeneous information from different databases and represent each drug as a multi-dimensional vector according to its chemical substructures, target proteins, substituents, and related pathways. Then, a drug-pair vector is obtained by appending the vector of one drug to the other. Next, we construct a drug-disease-gene network and devise a scoring method to measure the interaction probability of every drug pair via network analysis. Drug pairs with lower interaction probability are preferentially selected as negative samples. Following that, the validated positive samples and the selected credible negative samples are projected into a lower-dimensional space using principal component analysis. Finally, a classifier is built for each ADR using its positive and negative samples with reduced dimensions. The performance of the proposed method is evaluated on simulative prediction for 1276 ADRs and 1048 drugs, using four machine learning algorithms and comparing against two baseline approaches. 
Extensive experiments show that the proposed way to represent drugs characterizes drugs accurately. With highly-credible negative samples selected by HCNS-ADR, the four machine learning algorithms achieve significant performance improvements. HCNS-ADR is also shown to be able to predict both known and novel drug-drug-ADR associations, outperforming two other baseline approaches significantly. CONCLUSIONS The results demonstrate that integrating different drug properties to represent drugs is valuable for ADR prediction of combined medication and that the selection of highly-credible negative samples can significantly improve the prediction performance.
Affiliation(s)
- Yi Zheng
- Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, 15 Broadway Ultimo, Sydney, 2007 Australia
- Hui Peng
- Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, 15 Broadway Ultimo, Sydney, 2007 Australia
- Xiaocai Zhang
- Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, 15 Broadway Ultimo, Sydney, 2007 Australia
- Zhixun Zhao
- Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, 15 Broadway Ultimo, Sydney, 2007 Australia
- Jie Yin
- Discipline of Business Analytics, The University of Sydney, Darlington, Sydney, 2006 Australia
- Jinyan Li
- Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, 15 Broadway Ultimo, Sydney, 2007 Australia
|
18
|
Mining heterogeneous networks with topological features constructed from patient-contributed content for pharmacovigilance. Artif Intell Med 2018; 90:42-52. [PMID: 30093253 DOI: 10.1016/j.artmed.2018.07.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/24/2017] [Revised: 07/12/2018] [Accepted: 07/18/2018] [Indexed: 11/21/2022]
Abstract
Drug safety, also called pharmacovigilance, represents a serious health problem all over the world. Adverse drug reactions (ADRs) and drug-drug interactions (DDIs) are two important issues in pharmacovigilance, and how to detect drug safety signals has drawn many researchers' attention and efforts. Currently, methods proposed for ADR and DDI detection are mainly based on traditional data sources such as spontaneous reporting data, electronic health records, pharmaceutical databases, and biomedical literature. However, these data sources are limited by under-reporting, privacy issues, high cost, or long publication cycles. In this study, we propose a framework for drug safety signal detection by harnessing online health community data, a timely, informative, and publicly available data source. Concretely, we used MedHelp as the data source to collect patient-contributed content, based on which a weighted heterogeneous network was constructed. We extracted topological features from the network, quantified them with different weighting methods, and used a supervised learning method for both ADR and DDI signal detection. In addition, after identifying DDI signals, we proposed a new metric, named Interaction Ratio, to identify associated ADRs due to suspected interactions. The experimental results showed that our proposed techniques outperform baseline methods.
|
19
|
Wang J, Zhao L, Ye Y, Zhang Y. Adverse event detection by integrating twitter data and VAERS. J Biomed Semantics 2018; 9:19. [PMID: 29925405 PMCID: PMC6011255 DOI: 10.1186/s13326-018-0184-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/02/2018] [Accepted: 05/10/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Vaccines have been one of the most successful public health interventions to date. However, vaccines are pharmaceutical products that carry risks, and many adverse events (AEs) are reported after vaccination. Traditional adverse event reporting systems suffer from several crucial challenges, including poor timeliness. This has motivated a growing number of social media-based detection systems, which have demonstrated the capability to capture timely and prevalent disease information. Despite these advantages, social media-based AE detection suffers from serious challenges such as labor-intensive labeling and class imbalance of the training data. RESULTS To tackle the challenges of both traditional reporting systems and social media, we exploit their complementary strengths and develop a combinatorial classification approach that integrates Twitter data and Vaccine Adverse Event Reporting System (VAERS) information, aiming to identify potential AEs after influenza vaccination. Specifically, we combine formal reports, which have accurately predefined labels, with social media data to reduce the cost of manual labeling; to combat the class imbalance problem, a max-rule based multi-instance learning method is proposed to bias toward positive users. Various experiments were conducted to validate our model compared with other baselines. We observed that (1) multi-instance learning methods outperformed baselines when only Twitter data were used; (2) formal reports helped improve the performance metrics of our multi-instance learning methods consistently while affecting the performance of other baselines negatively; (3) the effect of formal reports was more obvious when the training size was smaller. Case studies show that our model labeled users and tweets accurately. CONCLUSIONS We have developed a framework to detect vaccine AEs by combining formal reports with social media data. 
We demonstrate the power of formal reports on the performance improvement of AE detection when the amount of social media data was small. Various experiments and case studies show the effectiveness of our model.
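As a rough illustration of the max rule mentioned in the abstract above, the sketch below treats each user as a bag of tweets and assigns the user the highest tweet-level probability. It assumes tweet-level probabilities are already available from some classifier; the function names, the 0.5 threshold, and the scores are hypothetical, and the paper's actual training procedure is not reproduced here.

```python
def user_probability(tweet_probs):
    """Max rule: a user (bag) is as positive as their most positive tweet (instance)."""
    return max(tweet_probs, default=0.0)


def label_users(users, threshold=0.5):
    """Label each user positive if any single tweet exceeds the threshold."""
    return {user: user_probability(probs) >= threshold for user, probs in users.items()}


# Hypothetical tweet-level AE probabilities per user.
users = {
    "user1": [0.10, 0.80, 0.20],  # one strongly AE-like tweet
    "user2": [0.05, 0.30],        # no tweet crosses the threshold
    "user3": [],                  # no tweets at all
}
labels = label_users(users)
```

Because a single high-scoring tweet is enough to mark a user positive, this rule leans toward the positive class, which is consistent with the abstract's stated aim of countering class imbalance.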
Affiliation(s)
- Junxiang Wang
- Department of Information Science and Technology, George Mason University, Fairfax, VA, USA
- Liang Zhao
- Department of Information Science and Technology, George Mason University, Fairfax, VA, USA
- Yanfang Ye
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA; Benjamin M. Statler College of Engineering and Mineral Resources, West Virginia University, Morgantown, WV, USA
- Yuji Zhang
- Department of Epidemiology & Public Health, University of Maryland School of Medicine, Baltimore, MD, USA; Division of Biostatistics and Bioinformatics, University of Maryland Marlene and Stewart Greenebaum Comprehensive Cancer Center, Baltimore, MD, USA
|
20
|
Liu J, Wang G. Pharmacovigilance from social media: An improved random subspace method for identifying adverse drug events. Int J Med Inform 2018; 117:33-43. [PMID: 30032963 DOI: 10.1016/j.ijmedinf.2018.06.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/18/2018] [Revised: 05/10/2018] [Accepted: 06/12/2018] [Indexed: 11/17/2022]
Abstract
OBJECTIVE Recent advances in Web 2.0 technologies have seen significant strides towards utilizing patient-generated content for pharmacovigilance. Social media-based pharmacovigilance has great potential to augment current efforts and provide regulatory authorities with valuable decision aids. Among various pharmacovigilance activities, identifying adverse drug events (ADEs) is very important for patient safety. However, in health-related discussion forums, ADEs may be confounded with drug indications, beneficial effects, etc. Therefore, the focus of this study is to develop a strategy to distinguish ADEs from other semantic types, and meanwhile to determine the drug that an ADE is associated with. MATERIALS AND METHODS In this study, two groups of features, i.e., shallow linguistic features and semantic features, are explored. Moreover, motivated by the characteristics of the two explored feature categories for social media-based ADE identification, an improved random subspace method, called Stratified Sampling-based Random Subspace (SSRS), is proposed. Unlike the conventional random subspace method, which applies random sampling for subspace selection, SSRS adopts a stratified sampling-based subspace selection strategy. RESULTS A case study on heart disease discussion forums is performed to evaluate the effectiveness of the SSRS method. Experimental results reveal that the proposed SSRS method significantly outperforms other compared ensemble methods and existing approaches for ADE identification. DISCUSSION AND CONCLUSION Our proposed method is easy to implement since it is based on two feature sets that can be naturally derived and, therefore, can omit artificial stratum generation efforts. Moreover, SSRS has great potential to be applied to other high-dimensional problems in which the original data can be represented from two different aspects.
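The difference between plain random subspace selection and the stratified variant described above can be sketched as follows. The sketch assumes proportional allocation between the two feature groups with at least one feature drawn from each; the abstract does not state the allocation rule, so this is an assumption, and the function and feature names are hypothetical.

```python
import random


def stratified_subspace(shallow, semantic, size, rng):
    """Draw one feature subspace that contains features from both groups.

    Features are allocated to each stratum in proportion to its size
    (assumed rule), then sampled without replacement within the stratum.
    """
    total = len(shallow) + len(semantic)
    if not shallow or not semantic or not 2 <= size <= total:
        raise ValueError("need two non-empty groups and 2 <= size <= total features")
    n_shallow = round(size * len(shallow) / total)
    n_shallow = min(max(n_shallow, 1), size - 1)  # keep both strata represented
    n_semantic = size - n_shallow
    return rng.sample(shallow, n_shallow) + rng.sample(semantic, n_semantic)


# Hypothetical feature names for the two groups.
shallow = ["s1", "s2", "s3", "s4", "s5", "s6"]
semantic = ["m1", "m2", "m3", "m4"]
rng = random.Random(0)  # fixed seed so the draw is repeatable
subspace = stratified_subspace(shallow, semantic, size=5, rng=rng)
```

Each base learner in the ensemble would be trained on a different subspace drawn this way; a conventional random subspace draw over the pooled features could, by chance, leave one feature group out entirely.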
Affiliation(s)
- Jing Liu
- School of Management Science and Engineering, Tianjin University of Finance and Economics, Tianjin 300222, PR China
- Gang Wang
- School of Management, Hefei University of Technology, Hefei, Anhui 230009, PR China
|
21
|
Sinha MS, Freifeld CC, Brownstein JS, Donneyong MM, Rausch P, Lappin BM, Zhou EH, Dal Pan GJ, Pawar AM, Hwang TJ, Avorn J, Kesselheim AS. Social Media Impact of the Food and Drug Administration's Drug Safety Communication Messaging About Zolpidem: Mixed-Methods Analysis. JMIR Public Health Surveill 2018; 4:e1. [PMID: 29305342 PMCID: PMC5775485 DOI: 10.2196/publichealth.7823] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 04/05/2017] [Revised: 09/29/2017] [Accepted: 10/30/2017] [Indexed: 11/28/2022] Open
Abstract
Background The Food and Drug Administration (FDA) issues drug safety communications (DSCs) to health care professionals, patients, and the public when safety issues emerge related to FDA-approved drug products. These safety messages are disseminated through social media to ensure broad uptake. Objective The objective of this study was to assess the social media dissemination of 2 DSCs released in 2013 for the sleep aid zolpidem. Methods We used the MedWatcher Social program and the DataSift historic query tool to aggregate Twitter and Facebook posts from October 1, 2012 through August 31, 2013, a period beginning approximately 3 months before the first DSC and ending 3 months after the second. Posts were categorized as (1) junk, (2) mention, or (3) adverse event (AE) based on a score ranging from –0.2 (completely unrelated) to 1 (perfectly related). We also looked at Google Trends data and Wikipedia edits for the same time period. Google Trends search volume is scaled on a range of 0 to 100 and includes “Related queries” during the relevant time periods. An interrupted time series (ITS) analysis assessed the impact of DSCs on the counts of posts with specific mention of zolpidem-containing products. Chow tests for known structural breaks were conducted on data from Twitter, Facebook, and Google Trends. Finally, Wikipedia edits were pulled from the website’s editorial history, which lists all revisions to a given page and the editor’s identity. Results In total, 174,286 Twitter posts and 59,641 Facebook posts met entry criteria. Of those, 16.63% (28,989/174,286) of Twitter posts and 25.91% (15,453/59,641) of Facebook posts were labeled as junk and excluded. AEs and mentions represented 9.21% (16,051/174,286) and 74.16% (129,246/174,286) of Twitter posts and 5.11% (3,050/59,641) and 68.98% (41,138/59,641) of Facebook posts, respectively. 
Total daily counts of posts about zolpidem-containing products increased on Twitter and Facebook on the day of the first DSC; Google searches increased on the week of the first DSC. ITS analyses demonstrated variability but pointed to an increase in interest around the first DSC. Chow tests were significant (P<.0001) for both DSCs on Facebook and Twitter, but only the first DSC on Google Trends. Wikipedia edits occurred soon after each DSC release, citing news articles rather than the DSC itself and presenting content that needed subsequent revisions for accuracy. Conclusions Social media offers challenges and opportunities for dissemination of the DSC messages. The FDA could consider strategies for more actively disseminating DSC safety information through social media platforms, particularly when announcements require updating. The FDA may also benefit from directly contributing content to websites like Wikipedia that are frequently accessed for drug-related information.
Affiliation(s)
- Michael S Sinha
- Program On Regulation, Therapeutics, And Law, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Clark C Freifeld
- College of Computer and Information Science, Northeastern University, Boston, MA, United States
- John S Brownstein
- Computational Epidemiology Group, Boston Children's Hospital, Boston, MA, United States
- Macarius M Donneyong
- Health Services Management and Policy, College of Public Health, The Ohio State University, Columbus, OH, United States
- Paula Rausch
- Food and Drug Administration, Silver Spring, MD, United States
- Brian M Lappin
- Food and Drug Administration, Silver Spring, MD, United States
- Esther H Zhou
- Food and Drug Administration, Silver Spring, MD, United States
- Ajinkya M Pawar
- Program On Regulation, Therapeutics, And Law, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Thomas J Hwang
- Program On Regulation, Therapeutics, And Law, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Jerry Avorn
- Program On Regulation, Therapeutics, And Law, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Aaron S Kesselheim
- Program On Regulation, Therapeutics, And Law, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
|
22
|
P Tafti A, Badger J, LaRose E, Shirzadi E, Mahnke A, Mayer J, Ye Z, Page D, Peissig P. Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure. JMIR Med Inform 2017; 5:e51. [PMID: 29222076 PMCID: PMC5741828 DOI: 10.2196/medinform.9170] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 10/13/2017] [Revised: 11/07/2017] [Accepted: 11/08/2017] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND The study of adverse drug events (ADEs) is a tenured topic in medical literature. In recent years, increasing numbers of scientific articles and health-related social media posts have been generated and shared daily, albeit with very limited use for ADE study and with little known about the content with respect to ADEs. OBJECTIVE The aim of this study was to develop a big data analytics strategy that mines the content of scientific articles and health-related Web-based social media to detect and identify ADEs. METHODS We analyzed the following two data sources: (1) biomedical articles and (2) health-related social media blog posts. We developed an intelligent and scalable text mining solution on big data infrastructures composed of Apache Spark, natural language processing, and machine learning. This was combined with an Elasticsearch No-SQL distributed database to explore and visualize ADEs. RESULTS The accuracy, precision, recall, and area under receiver operating characteristic of the system were 92.7%, 93.6%, 93.0%, and 0.905, respectively, and showed better results in comparison with traditional approaches in the literature. This work not only detected and classified ADE sentences from big data biomedical literature but also scientifically visualized ADE interactions. CONCLUSIONS To the best of our knowledge, this work is the first to investigate a big data machine learning strategy for ADE discovery on massive datasets downloaded from PubMed Central and social media. This contribution illustrates possible capacities in big data biomedical text analysis using advanced computational methods with real-time update from new data published on a daily basis.
Affiliation(s)
- Ahmad P Tafti
  - Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
- Jonathan Badger
  - Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
- Eric LaRose
  - Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
- Ehsan Shirzadi
  - Institute of Electrical and Electronics Engineers, Dublin, Ireland
- Andrea Mahnke
  - Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
- John Mayer
  - Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
- Zhan Ye
  - Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
- David Page
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
- Peggy Peissig
  - Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
|
23
|
Esteban S, Rodríguez Tablado M, Peper FE, Mahumud YS, Ricci RI, Kopitowski KS, Terrasa SA. Development and validation of various phenotyping algorithms for Diabetes Mellitus using data from electronic health records. Computer Methods and Programs in Biomedicine 2017; 152:53-70. [PMID: 29054261 DOI: 10.1016/j.cmpb.2017.09.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/30/2017] [Revised: 08/19/2017] [Accepted: 09/13/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND AND OBJECTIVE: Recent progression towards precision medicine has encouraged the use of electronic health records (EHRs) as a source for large amounts of data, which is required for studying the effect of treatments or risk factors in more specific subpopulations. Phenotyping algorithms automatically classify patients according to their electronic phenotype, thus facilitating the setup of retrospective cohorts. Our objective is to compare the performance of different classification strategies (only using standardized problems, rule-based algorithms, statistical learning algorithms (six learners) and stacked generalization (five versions)), for the categorization of patients according to their diabetic status (diabetics, not diabetics and inconclusive; Diabetes of any type) using information extracted from EHRs. METHODS: Patient information was extracted from the EHR at Hospital Italiano de Buenos Aires, Buenos Aires, Argentina. For the derivation and validation datasets, two probabilistic samples of patients from different years (2005: n = 1663; 2015: n = 800) were extracted. The only inclusion criterion was age (≥40 and <80 years). Four researchers manually reviewed all records and classified patients according to their diabetic status (diabetic: diabetes registered as a health problem or fulfilling the ADA criteria; non-diabetic: not fulfilling the ADA criteria and having at least one fasting glycemia below 126 mg/dL; inconclusive: no data regarding their diabetic status or only one abnormal value). The best performing algorithms within each strategy were tested on the validation set. RESULTS: The standardized codes algorithm achieved a Kappa coefficient value of 0.59 (95% CI 0.49, 0.59) in the validation set. The Boolean logic algorithm reached 0.82 (95% CI 0.76, 0.88). A slightly higher value was achieved by the Feedforward Neural Network (0.9, 95% CI 0.85, 0.94). The best performing learner was the stacked generalization meta-learner that reached a Kappa coefficient value of 0.95 (95% CI 0.91, 0.98). CONCLUSIONS: The stacked generalization strategy and the feedforward neural network showed the best classification metrics in the validation set. The implementation of these algorithms enables the exploitation of the data of thousands of patients accurately.
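The Kappa coefficient reported in this abstract measures agreement between an algorithm's labels and the manual review, corrected for agreement expected by chance. As a minimal sketch, assuming the unweighted form of Cohen's kappa (the abstract does not say which variant was used), it can be computed as below. The function name `cohen_kappa` and the ten toy labels are illustrative only and are not data from the study.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(reference)
    # Observed agreement: fraction of items where both label sources agree.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[label] * pred_counts[label] for label in ref_counts) / (n * n)
    if expected == 1.0:
        # Both sources use a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (not study data): manual review vs. algorithm output.
manual = ["diabetic"] * 4 + ["non-diabetic"] * 4 + ["inconclusive"] * 2
algorithm = ["diabetic"] * 3 + ["non-diabetic"] * 5 + ["inconclusive"] * 2

kappa = cohen_kappa(manual, algorithm)
print(f"kappa = {kappa:.3f}")
```

In this toy case the raw agreement is 9 out of 10, but kappa is lower because some of that agreement would be expected by chance given the label frequencies.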
Affiliation(s)
- Santiago Esteban
  - Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
  - Research Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Francisco E Peper
  - Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Yamila S Mahumud
  - Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Ricardo I Ricci
  - Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Karin S Kopitowski
  - Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
  - Research Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Sergio A Terrasa
  - Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
  - Public Health Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
|
24
|
SSEL-ADE: A semi-supervised ensemble learning framework for extracting adverse drug events from social media. Artif Intell Med 2017; 84:34-49. [PMID: 29111222 DOI: 10.1016/j.artmed.2017.10.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/13/2017] [Revised: 08/28/2017] [Accepted: 10/15/2017] [Indexed: 11/21/2022]
Abstract
With the development of Web 2.0 technology, social media websites have become lucrative but under-explored data sources for extracting adverse drug events (ADEs), which are a serious health problem. Besides ADE, other semantic relation types (e.g., drug indication and beneficial effect) could hold between the drug and adverse event mentions, making ADE relation extraction, that is, distinguishing the ADE relationship from other relation types, necessary. However, conducting ADE relation extraction in a social media environment is not a trivial task because of the expertise-dependent, time-consuming and costly annotation process, and the high dimensionality of the feature space attributed to intrinsic characteristics of social media data. This study aims to develop a framework for ADE relation extraction using patient-generated content in social media with better performance than that delivered by previous efforts. To achieve the objective, a general semi-supervised ensemble learning framework, SSEL-ADE, was developed. The framework exploited various lexical, semantic, and syntactic features, and integrated ensemble learning and semi-supervised learning. A series of experiments were conducted to verify the effectiveness of the proposed framework. Empirical results demonstrate the effectiveness of each component of SSEL-ADE and reveal that our proposed framework outperforms most existing ADE relation extraction methods. SSEL-ADE can facilitate enhanced ADE relation extraction performance, thereby providing more reliable support for pharmacovigilance. Moreover, the proposed semi-supervised ensemble methods have the potential of being applied to effectively deal with other social media-based problems.
|
25
|
Taewijit S, Theeramunkong T, Ikeda M. Distant Supervision with Transductive Learning for Adverse Drug Reaction Identification from Electronic Medical Records. Journal of Healthcare Engineering 2017; 2017:7575280. [PMID: 29090077 PMCID: PMC5635478 DOI: 10.1155/2017/7575280] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/05/2017] [Accepted: 07/19/2017] [Indexed: 11/17/2022]
Abstract
Information extraction and knowledge discovery regarding adverse drug reactions (ADRs) from large-scale clinical texts are useful and much-needed processes. Two major difficulties of this task are the lack of domain experts for labeling examples and the intractable processing of unstructured clinical texts. Even though most previous works have addressed these issues by applying semisupervised learning for the former and a word-based approach for the latter, they face complexity in acquiring initial labeled data and ignore the structured sequence of natural language. In this study, we propose automatic data labeling by distant supervision, where knowledge bases are exploited to assign an entity-level relation label for each drug-event pair in texts, and then we use patterns for characterizing the ADR relation. Multiple-instance learning with the expectation-maximization method is employed to estimate model parameters. The method applies transductive learning to iteratively reassign a probability to each unknown drug-event pair at training time. In experiments with 50,998 discharge summaries, we evaluate our method by varying a large number of parameters, that is, pattern types, pattern-weighting models, and initial and iterative weightings of relations for unlabeled data. Based on these evaluations, our proposed method outperforms the word-based feature for NB-EM (iEM), MILR, and TSVM, with F1 score improvements of 11.3%, 9.3%, and 6.5%, respectively.
Affiliation(s)
- Siriwon Taewijit
  - The School of Information, Communication and Computer Technologies, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani 12120, Thailand
  - The School of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi 923-1292, Japan
- Thanaruk Theeramunkong
  - The School of Information, Communication and Computer Technologies, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani 12120, Thailand
- Mitsuru Ikeda
  - The School of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi 923-1292, Japan
|
26
|
Névéol A, Zweigenbaum P. Making Sense of Big Textual Data for Health Care: Findings from the Section on Clinical Natural Language Processing. Yearb Med Inform 2017; 26:228-234. [PMID: 29063569 PMCID: PMC6239234 DOI: 10.15265/iy-2017-027] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/18/2017] [Indexed: 02/01/2023]
Abstract
Objectives: To summarize recent research and present a selection of the best papers published in 2016 in the field of clinical Natural Language Processing (NLP). Method: A survey of the literature was performed by the two section editors of the IMIA Yearbook NLP section. Bibliographic databases were searched for papers with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. Papers were automatically ranked and then manually reviewed based on titles and abstracts. A shortlist of candidate best papers was first selected by the section editors before being peer-reviewed by independent external reviewers. Results: The five clinical NLP best papers provide a contribution that ranges from emerging original foundational methods to transitioning solid established research results to a practical clinical setting. They offer a framework for abbreviation disambiguation and coreference resolution, a classification method to identify clinically useful sentences, an analysis of counseling conversations to improve support to patients with mental disorder and grounding of gradable adjectives. Conclusions: Clinical NLP continued to thrive in 2016, with an increasing number of contributions towards applications compared to fundamental methods. Fundamental work addresses increasingly complex problems such as lexical semantics, coreference resolution, and discourse analysis. Research results translate into freely available tools, mainly for English.
Affiliation(s)
- A. Névéol
  - LIMSI, CNRS, Université Paris Saclay, Orsay, France
|
27
|
Price J. What Can Big Data Offer the Pharmacovigilance of Orphan Drugs? Clin Ther 2016; 38:2533-2545. [PMID: 27914633 DOI: 10.1016/j.clinthera.2016.11.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 10/03/2016] [Accepted: 11/07/2016] [Indexed: 12/18/2022]
Abstract
The pharmacovigilance of drugs for orphan diseases presents problems related to the small patient population. Obtaining high-quality information on individual reports of suspected adverse reactions is of particular importance for the pharmacovigilance of orphan drugs. The possibility of mining "big data" to detect suspected adverse reactions is being explored in pharmacovigilance generally but may have limited application to orphan drugs. Sources of big data such as social media may be infrequently used as communication channels by patients with rare disease or their caregivers or by health care providers; any adverse reactions identified are likely to reflect what is already known about the safety of the drug from the network of support that grows up around these patients. Opportunities related to potential future big data sources are discussed.
Affiliation(s)
- John Price
  - Alexion Pharmaceuticals, Inc, New Haven, Connecticut
|