51
|
Tafti AP, Badger J, LaRose E, Shirzadi E, Mahnke A, Mayer J, Ye Z, Page D, Peissig P. Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure. JMIR Med Inform 2017; 5:e51. [PMID: 29222076 PMCID: PMC5741828 DOI: 10.2196/medinform.9170] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 10/13/2017] [Revised: 11/07/2017] [Accepted: 11/08/2017] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND The study of adverse drug events (ADEs) is a tenured topic in medical literature. In recent years, increasing numbers of scientific articles and health-related social media posts have been generated and shared daily, albeit with very limited use for ADE study and with little known about the content with respect to ADEs. OBJECTIVE The aim of this study was to develop a big data analytics strategy that mines the content of scientific articles and health-related Web-based social media to detect and identify ADEs. METHODS We analyzed the following two data sources: (1) biomedical articles and (2) health-related social media blog posts. We developed an intelligent and scalable text mining solution on big data infrastructures composed of Apache Spark, natural language processing, and machine learning. This was combined with an Elasticsearch No-SQL distributed database to explore and visualize ADEs. RESULTS The accuracy, precision, recall, and area under receiver operating characteristic of the system were 92.7%, 93.6%, 93.0%, and 0.905, respectively, and showed better results in comparison with traditional approaches in the literature. This work not only detected and classified ADE sentences from big data biomedical literature but also scientifically visualized ADE interactions. CONCLUSIONS To the best of our knowledge, this work is the first to investigate a big data machine learning strategy for ADE discovery on massive datasets downloaded from PubMed Central and social media. This contribution illustrates possible capacities in big data biomedical text analysis using advanced computational methods with real-time update from new data published on a daily basis.
Affiliation(s)
- Ahmad P Tafti
- Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
- Jonathan Badger
- Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
- Eric LaRose
- Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
- Ehsan Shirzadi
- Institute of Electrical and Electronics Engineers, Dublin, Ireland
- Andrea Mahnke
- Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
- John Mayer
- Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
- Zhan Ye
- Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
- David Page
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
- Peggy Peissig
- Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, United States
|
52
|
VanDam C, Kanthawala S, Pratt W, Chai J, Huh J. Detecting clinically related content in online patient posts. J Biomed Inform 2017; 75:96-106. [PMID: 28986329 PMCID: PMC5685920 DOI: 10.1016/j.jbi.2017.09.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/06/2017] [Revised: 09/14/2017] [Accepted: 09/30/2017] [Indexed: 10/18/2022]
Abstract
Patients with chronic health conditions use online health communities to seek support and information to help manage their condition. For clinically related topics, patients can benefit from getting opinions from clinical experts, and many are concerned about misinformation and biased information being spread online. However, a large volume of community posts makes it challenging for moderators and clinical experts, if there are any, to provide necessary information. Automatically identifying forum posts that need validated clinical resources can help online health communities efficiently manage content exchange. This automation can also assist patients in need of clinical expertise by getting proper help. We present our results on testing text classification models that efficiently and accurately identify community posts containing clinical topics. We annotated 1817 posts comprising 4966 sentences from an existing online diabetes community. We found that our classifier performed the best (F-measure: 0.83, Precision: 0.79, Recall: 0.86) when using the Naïve Bayes algorithm, unigrams, bigrams, trigrams, and MetaMap Semantic Types. Training took 5 s. The classification process took a fraction of 1 s. We applied our classifier to another online diabetes community, and the results were: F-measure: 0.63, Precision: 0.57, Recall: 0.71. Our results show our model is feasible to scale to other forums on identifying posts containing clinical topics with common errors properly addressed.
Affiliation(s)
- Wanda Pratt
- University of Washington, Seattle, United States.
- Joyce Chai
- Michigan State University, United States.
- Jina Huh
- University of California San Diego, United States.
|
53
|
SSEL-ADE: A semi-supervised ensemble learning framework for extracting adverse drug events from social media. Artif Intell Med 2017; 84:34-49. [PMID: 29111222 DOI: 10.1016/j.artmed.2017.10.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/13/2017] [Revised: 08/28/2017] [Accepted: 10/15/2017] [Indexed: 11/21/2022]
Abstract
With the development of Web 2.0 technology, social media websites have become lucrative but under-explored data sources for extracting adverse drug events (ADEs), which are a serious health problem. Besides ADE, other semantic relation types (e.g., drug indication and beneficial effect) could hold between the drug and adverse event mentions, making ADE relation extraction (distinguishing the ADE relationship from other relation types) necessary. However, conducting ADE relation extraction in a social media environment is not a trivial task because of the expertise-dependent, time-consuming and costly annotation process, and the feature space's high dimensionality attributed to intrinsic characteristics of social media data. This study aims to develop a framework for ADE relation extraction using patient-generated content in social media with better performance than that delivered by previous efforts. To achieve the objective, a general semi-supervised ensemble learning framework, SSEL-ADE, was developed. The framework exploited various lexical, semantic, and syntactic features, and integrated ensemble learning and semi-supervised learning. A series of experiments were conducted to verify the effectiveness of the proposed framework. Empirical results demonstrate the effectiveness of each component of SSEL-ADE and reveal that our proposed framework outperforms most existing ADE relation extraction methods. The SSEL-ADE can facilitate enhanced ADE relation extraction performance, thereby providing more reliable support for pharmacovigilance. Moreover, the proposed semi-supervised ensemble methods have the potential of being applied to effectively deal with other social media-based problems.
|
54
|
Cronin RM, Fabbri D, Denny JC, Rosenbloom ST, Jackson GP. A comparison of rule-based and machine learning approaches for classifying patient portal messages. Int J Med Inform 2017; 105:110-120. [PMID: 28750904 DOI: 10.1016/j.ijmedinf.2017.06.004] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 02/15/2017] [Revised: 06/13/2017] [Accepted: 06/20/2017] [Indexed: 12/28/2022]
Abstract
OBJECTIVE Secure messaging through patient portals is an increasingly popular way that consumers interact with healthcare providers. The increasing burden of secure messaging can affect clinic staffing and workflows. Manual management of portal messages is costly and time consuming. Automated classification of portal messages could potentially expedite message triage and delivery of care. MATERIALS AND METHODS We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag of words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated our classifiers' accuracies in identifying individual communication types within portal messages with area under the receiver-operator curve (AUC). Portal messages often contain more than one type of communication. To predict all communication types within single messages, we used the Jaccard Index. We extracted the variables of importance for the random forest classifiers. RESULTS The best performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). The best performing classification approach of classifiers for individual communication subtypes was random forests for Logistical-Contact Information (AUC: 0.963). 
The Jaccard Indices by approach were: basic classifier, Jaccard Index: 0.674; Naïve Bayes, Jaccard Index: 0.799; random forests, Jaccard Index: 0.859; and logistic regression, Jaccard Index: 0.861. For medical communications, the most predictive variables were NLP concepts (e.g., Temporal_Concept, which maps to 'morning', 'evening' and Idea_or_Concept which maps to 'appointment' and 'refill'). For logistical communications, the most predictive variables contained similar numbers of NLP variables and words (e.g., Telephone mapping to 'phone', 'insurance'). For social and informational communications, the most predictive variables were words (e.g., social: 'thanks', 'much', informational: 'question', 'mean'). CONCLUSIONS This study applies automated classification methods to the content of patient portal messages and evaluates the application of NLP techniques on consumer communications in patient portal messages. We demonstrated that random forest and logistic regression approaches accurately classified the content of portal messages, although the best approach to classification varied by communication type. Words were the most predictive variables for classification of most communication types, although NLP variables were most predictive for medical communication types. As adoption of patient portals increases, automated techniques could assist in understanding and managing growing volumes of messages. Further work is needed to improve classification performance to potentially support message triage and answering.
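Note: the abstract above scores multi-label predictions with the Jaccard Index, i.e. the size of the intersection of the predicted and true label sets divided by the size of their union. The following is a minimal sketch of that measure, not the authors' code. The example messages, the function names, the choice to average per message, and the convention for two empty label sets are all assumptions made for illustration.

```python
def jaccard_index(predicted, actual):
    """Jaccard index |A & B| / |A | B| between two sets of labels."""
    predicted, actual = set(predicted), set(actual)
    union = predicted | actual
    if not union:
        # Assumed convention: two empty label sets count as full agreement.
        return 1.0
    return len(predicted & actual) / len(union)


def mean_jaccard(predictions, gold):
    """Average the per-message Jaccard index over paired label sets."""
    scores = [jaccard_index(p, g) for p, g in zip(predictions, gold)]
    return sum(scores) / len(scores)


# Hypothetical portal messages: the first carries two communication types,
# but the classifier only finds one of them.
gold = [{"medical", "logistical"}, {"social"}]
predictions = [{"medical"}, {"social"}]
print(mean_jaccard(predictions, gold))  # (0.5 + 1.0) / 2 = 0.75
```

A message whose labels are only partly recovered earns partial credit, which is why the measure suits messages that contain more than one communication type.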
Affiliation(s)
- Robert M Cronin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA.
- Daniel Fabbri
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- S Trent Rosenbloom
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
- Gretchen Purcell Jackson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
|
55
|
Abdellaoui R, Schück S, Texier N, Burgun A. Filtering Entities to Optimize Identification of Adverse Drug Reaction From Social Media: How Can the Number of Words Between Entities in the Messages Help? JMIR Public Health Surveill 2017. [PMID: 28642212 PMCID: PMC5500778 DOI: 10.2196/publichealth.6577] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND With the increasing popularity of Web 2.0 applications, social media has made it possible for individuals to post messages on adverse drug reactions. In such online conversations, patients discuss their symptoms, medical history, and diseases. These disorders may correspond to adverse drug reactions (ADRs) or any other medical condition. Therefore, methods must be developed to distinguish between false positives and true ADR declarations. OBJECTIVE The aim of this study was to investigate a method for filtering out disorder terms that did not correspond to adverse events by using the distance (as number of words) between the drug term and the disorder or symptom term in the post. We hypothesized that the shorter the distance between the disorder name and the drug, the higher the probability that the disorder is an ADR. METHODS We analyzed a corpus of 648 messages corresponding to a total of 1654 (drug and disorder) pairs from 5 French forums using Gaussian mixture models and an expectation-maximization (EM) algorithm. RESULTS The distribution of the distances between the drug term and the disorder term enabled the filtering of 50.03% (733/1465) of the disorders that were not ADRs. Our filtering strategy achieved a precision of 95.8% and a recall of 50.0%. CONCLUSIONS This study suggests that such distance between terms can be used for identifying false positives, thereby improving ADR detection in social media.
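Note: the feature used in the abstract above is the number of words between a drug term and a disorder term. The sketch below only illustrates how such a distance could be computed and used as a filter. It is not the authors' method: they modeled the distance distribution with Gaussian mixture models fitted by EM, whereas this example substitutes a fixed, hypothetical cutoff (`MAX_WORDS`). The whitespace tokenization, single-token terms, and the English example post are also assumptions.

```python
def word_distance(tokens, drug, disorder):
    """Smallest number of words between any drug mention and any disorder mention.

    Assumes single-token terms that differ from each other.
    Returns None when either term is absent from the post.
    """
    drug_positions = [i for i, tok in enumerate(tokens) if tok == drug]
    disorder_positions = [i for i, tok in enumerate(tokens) if tok == disorder]
    if not drug_positions or not disorder_positions:
        return None
    return min(abs(i - j) - 1 for i in drug_positions for j in disorder_positions)


# Hypothetical cutoff, chosen only for illustration.
MAX_WORDS = 10


def keep_pair(tokens, drug, disorder, max_words=MAX_WORDS):
    """Keep a (drug, disorder) pair as a candidate ADR if the terms are close."""
    distance = word_distance(tokens, drug, disorder)
    return distance is not None and distance <= max_words


post = "i took aspirin and got a headache".split()
print(word_distance(post, "aspirin", "headache"))  # 3 words: "and got a"
print(keep_pair(post, "aspirin", "headache"))      # True
```

A fixed cutoff is the simplest stand-in; a mixture model would instead estimate, for each distance, how likely the pair is to belong to the ADR component.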
Affiliation(s)
- Redhouane Abdellaoui
- INSERM, UMRS 1138 Team 22, Université Pierre et Marie Curie, Paris, France; Kappa Santé, Innovation, Paris, France
- Anita Burgun
- INSERM, UMRS 1138 Team 22, Université Pierre et Marie Curie, Paris, France; Assistance Publique-Hôpitaux de Paris (AP-HP), Hôpital Européen Georges-Pompidou (HEGP), Medical Informatics, Paris, France
|
56
|
Integrating Personalized Technology in Toxicology: Sensors, Smart Glass, and Social Media Applications in Toxicology Research. J Med Toxicol 2017; 13:166-172. [PMID: 28405896 DOI: 10.1007/s13181-017-0611-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/03/2017] [Revised: 03/07/2017] [Accepted: 03/15/2017] [Indexed: 02/04/2023] Open
Abstract
Rapid proliferation of mobile technologies in social and healthcare spaces creates an opportunity for advancement in research and clinical practice. The application of mobile, personalized technology in healthcare, referred to as mHealth, has not yet become routine in toxicology. However, key features of our practice environment, such as frequent need for remote evaluation, unreliable historical data from patients, and sensitive subject matter, make mHealth tools appealing solutions in comparison to traditional methods that collect retrospective or indirect data. This manuscript describes the features, uses, and costs associated with several common sectors of mHealth research including wearable biosensors, ingestible biosensors, head-mounted devices, and social media applications. The benefits and novel challenges associated with the study and use of these applications are then discussed. Finally, opportunities for further research and integration are explored with a particular focus on toxicology-based applications.
|
57
|
Adams DZ, Gruss R, Abrahams AS. Automated discovery of safety and efficacy concerns for joint & muscle pain relief treatments from online reviews. Int J Med Inform 2017; 100:108-120. [DOI: 10.1016/j.ijmedinf.2017.01.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 08/08/2016] [Revised: 12/20/2016] [Accepted: 01/07/2017] [Indexed: 02/07/2023]
|
58
|
Pierce CE, Bouri K, Pamer C, Proestel S, Rodriguez HW, Van Le H, Freifeld CC, Brownstein JS, Walderhaug M, Edwards IR, Dasgupta N. Evaluation of Facebook and Twitter Monitoring to Detect Safety Signals for Medical Products: An Analysis of Recent FDA Safety Alerts. Drug Saf 2017; 40:317-331. [PMID: 28044249 PMCID: PMC5362648 DOI: 10.1007/s40264-016-0491-0] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Indexed: 11/28/2022]
Abstract
INTRODUCTION The rapid expansion of the Internet and computing power in recent years has opened up the possibility of using social media for pharmacovigilance. While this general concept has been proposed by many, central questions remain as to whether social media can provide earlier warnings for rare and serious events than traditional signal detection from spontaneous report data. OBJECTIVE Our objective was to examine whether specific product-adverse event pairs were reported via social media before being reported to the US FDA Adverse Event Reporting System (FAERS). METHODS A retrospective analysis of public Facebook and Twitter data was conducted for 10 recent FDA postmarketing safety signals at the drug-event pair level with six negative controls. Social media data corresponding to two years prior to signal detection of each product-event pair were compiled. Automated classifiers were used to identify each 'post with resemblance to an adverse event' (Proto-AE), among English language posts. A custom dictionary was used to translate Internet vernacular into Medical Dictionary for Regulatory Activities (MedDRA®) Preferred Terms. Drug safety physicians conducted a manual review to determine causality using World Health Organization-Uppsala Monitoring Centre (WHO-UMC) assessment criteria. Cases were also compared with those reported in FAERS. FINDINGS A total of 935,246 posts were harvested from Facebook and Twitter, from March 2009 through October 2014. The automated classifier identified 98,252 Proto-AEs. Of these, 13 posts were selected for causality assessment of product-event pairs. Clinical assessment revealed that posts had sufficient information to warrant further investigation for two possible product-event associations: dronedarone-vasculitis and Banana Boat Sunscreen--skin burns. No product-event associations were found among the negative controls. 
In one of the positive cases, the first report occurred in social media prior to signal detection from FAERS, whereas the other case occurred first in FAERS. CONCLUSIONS An efficient semi-automated approach to social media monitoring may provide earlier insights into certain adverse events. More work is needed to elaborate additional uses for social media data in pharmacovigilance and to determine how they can be applied by regulatory agencies.
Affiliation(s)
- Khaled Bouri
- US Food and Drug Administration, Silver Spring, MD, USA
- Carol Pamer
- US Food and Drug Administration, Silver Spring, MD, USA
- Clark C Freifeld
- Epidemico, Inc., Boston, MA, USA
- Northeastern University College of Computer and Information Science, Boston, MA, USA
- I Ralph Edwards
- Uppsala Monitoring Centre, WHO Collaborating Centre for International Drug Monitoring, Uppsala, Sweden
|
59
|
Large-scale adverse effects related to treatment evidence standardization (LAERTES): an open scalable system for linking pharmacovigilance evidence sources with clinical data. J Biomed Semantics 2017; 8:11. [PMID: 28270198 PMCID: PMC5341176 DOI: 10.1186/s13326-017-0115-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 11/23/2016] [Accepted: 01/13/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Integrating multiple sources of pharmacovigilance evidence has the potential to advance the science of safety signal detection and evaluation. In this regard, there is a need for more research on how to integrate multiple disparate evidence sources while making the evidence computable from a knowledge representation perspective (i.e., semantic enrichment). Existing frameworks suggest well-promising outcomes for such integration but employ a rather limited number of sources. In particular, none have been specifically designed to support both regulatory and clinical use cases, nor have any been designed to add new resources and use cases through an open architecture. This paper discusses the architecture and functionality of a system called Large-scale Adverse Effects Related to Treatment Evidence Standardization (LAERTES) that aims to address these shortcomings. RESULTS LAERTES provides a standardized, open, and scalable architecture for linking evidence sources relevant to the association of drugs with health outcomes of interest (HOIs). Standard terminologies are used to represent different entities. For example, drugs and HOIs are represented in RxNorm and Systematized Nomenclature of Medicine -- Clinical Terms respectively. At the time of this writing, six evidence sources have been loaded into the LAERTES evidence base and are accessible through prototype evidence exploration user interface and a set of Web application programming interface services. This system operates within a larger software stack provided by the Observational Health Data Sciences and Informatics clinical research framework, including the relational Common Data Model for observational patient data created by the Observational Medical Outcomes Partnership. Elements of the Linked Data paradigm facilitate the systematic and scalable integration of relevant evidence sources. 
CONCLUSIONS The prototype LAERTES system provides useful functionality while creating opportunities for further research. Future work will involve improving the method for normalizing drug and HOI concepts across the integrated sources, aggregating evidence at different levels of a hierarchy of HOI concepts, and developing a more advanced user interface for drug-HOI investigations.
|
60
|
Validation of New Signal Detection Methods for Web Query Log Data Compared to Signal Detection Algorithms Used With FAERS. Drug Saf 2017; 40:399-408. [DOI: 10.1007/s40264-017-0507-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 10/20/2022]
|
61
|
Anderson LS, Bell HG, Gilbert M, Davidson JE, Winter C, Barratt MJ, Win B, Painter JL, Menone C, Sayegh J, Dasgupta N. Using Social Listening Data to Monitor Misuse and Nonmedical Use of Bupropion: A Content Analysis. JMIR Public Health Surveill 2017; 3:e6. [PMID: 28148472 PMCID: PMC5311422 DOI: 10.2196/publichealth.6174] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 06/10/2016] [Revised: 11/02/2016] [Accepted: 01/07/2017] [Indexed: 12/17/2022] Open
Abstract
Background The nonmedical use of pharmaceutical products has become a significant public health concern. Traditionally, the evaluation of nonmedical use has focused on controlled substances with addiction risk. Currently, there is no effective means of evaluating the nonmedical use of noncontrolled antidepressants. Objective Social listening, in the context of public health sometimes called infodemiology or infoveillance, is the process of identifying and assessing what is being said about a company, product, brand, or individual, within forms of electronic interactive media. The objectives of this study were (1) to determine whether content analysis of social listening data could be utilized to identify posts discussing potential misuse or nonmedical use of bupropion and two comparators, amitriptyline and venlafaxine, and (2) to describe and characterize these posts. Methods Social listening was performed on all publicly available posts cumulative through July 29, 2015, from two harm-reduction Web forums, Bluelight and Opiophile, which mentioned the study drugs. The acquired data were stripped of personally identifiable information (PII). A set of generic, brand, and vernacular product names was used to identify product references in posts. Posts were obtained using natural language processing tools to identify vernacular references to drug misuse-related Preferred Terms from the English Medical Dictionary for Regulatory Activities (MedDRA) version 18 terminology. Posts were reviewed manually by coders, who extracted relevant details. Results A total of 7756 references to at least one of the study antidepressants were identified within posts gathered for this study. Of these posts, 668 (8.61%, 668/7756) referenced misuse or nonmedical use of the drug, with bupropion accounting for 438 (65.6%, 438/668). 
Of the 668 posts, nonmedical use was discouraged by 40.6% (178/438), 22% (22/100), and 18.5% (24/130) and encouraged by 12.3% (54/438), 10% (10/100), and 10.8% (14/130) for bupropion, amitriptyline, and venlafaxine, respectively. The most commonly reported desired effects were similar to stimulants with bupropion, sedatives with amitriptyline, and dissociatives with venlafaxine. The nasal route of administration was most frequently reported for bupropion, whereas the oral route was most frequently reported for amitriptyline and venlafaxine. Bupropion and venlafaxine were most commonly procured from health care providers, whereas amitriptyline was most commonly obtained or stolen from a third party. The Fleiss kappa for interrater agreement among 20 items with 7 categorical response options evaluated by all 11 raters was 0.448 (95% CI 0.421-0.457). Conclusions Social listening, conducted in collaboration with harm-reduction Web forums, offers a valuable new data source that can be used for monitoring nonmedical use of antidepressants. Additional work on the capabilities of social listening will help further delineate the benefits and limitations of this rapidly evolving data source.
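Note: the abstract above reports interrater agreement as a Fleiss kappa (20 items, 7 response categories, 11 raters). As an illustration of how that statistic is defined, the sketch below implements the standard Fleiss formula on a count table in which each row is an item and each column holds the number of raters who chose that category. The function name and the toy table are assumptions; this is not the study's analysis code and does not reproduce its confidence interval.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table counts[item][category] of rater counts.

    Every item must be rated by the same number of raters (at least two).
    Undefined (division by zero) when all ratings fall in a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that went to each category.
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement among rater pairs, computed per item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_category)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy table: 2 items, 2 categories, 3 raters who agree completely on each item.
print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
```

Values near 1 indicate agreement well above chance, values near 0 indicate agreement no better than chance, and negative values indicate less agreement than chance would give.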
Affiliation(s)
- Heidi G Bell
- Gyra MediPharm Consulting, Research Triangle Park, NC, United States
- Monica J Barratt
- National Drug and Alcohol Research Centre, UNSW Australia, Randwick, Australia; Bluelight.org, Dover, DE, United States; Kadiant Analytics, Boston, MA, United States
- Beta Win
- GlaxoSmithKline, Stockley Park, Middlesex, United Kingdom
|
62
|
Lucini FR, Fogliatto FS, da Silveira GJC, Neyeloff JL, Anzanello MJ, Kuchenbecker RS, Schaan BD. Text mining approach to predict hospital admissions using early medical records from the emergency department. Int J Med Inform 2017; 100:1-8. [PMID: 28241931 DOI: 10.1016/j.ijmedinf.2017.01.001] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 07/15/2016] [Revised: 10/31/2016] [Accepted: 01/03/2017] [Indexed: 11/25/2022]
Abstract
OBJECTIVE Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce the overcrowding problem, and optimize the use of hospital resources. In this study, we use text mining methods to process data from early ED patient records using the SOAP framework, and predict future hospitalizations and discharges. DESIGN We try different approaches for pre-processing text records and for predicting hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ2 and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (Kernel linear) and Nu-Support Vector Machine (Kernel linear). MEASUREMENTS Prediction performance is evaluated by F1-scores. Precision and Recall values are also reported for all text mining methods tested. RESULTS Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%. CONCLUSIONS The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams.
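Note: the abstract above builds features from unigrams, bigrams and trigrams and weights them by binary presence, term frequency, or term frequency-inverse document frequency. The sketch below shows one way those three representations could be produced. It is illustrative only: the two example notes are invented, the function names are hypothetical, and the TF-IDF variant used here (raw count times the natural log of N/df) is just one of several common definitions, not necessarily the one the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """All contiguous n-grams for n = 1..n_max, each joined with spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def binary_features(tokens):
    """Binary representation: 1 for every n-gram present in the record."""
    return {term: 1 for term in set(ngrams(tokens))}


def term_frequency(tokens):
    """Term-frequency representation: raw count of each n-gram."""
    return Counter(ngrams(tokens))


def tf_idf(documents):
    """TF-IDF per document, using count * ln(N / document frequency)."""
    frequencies = [term_frequency(doc) for doc in documents]
    document_frequency = Counter(term for tf in frequencies for term in tf)
    n_docs = len(documents)
    return [
        {term: count * math.log(n_docs / document_frequency[term])
         for term, count in tf.items()}
        for tf in frequencies
    ]


# Two invented triage notes, whitespace-tokenized.
notes = ["chest pain radiating".split(), "chest pain resolved".split()]
print(ngrams(notes[0]))
weights = tf_idf(notes)
print(weights[0]["chest"])      # 0.0, because the term occurs in every note
print(weights[0]["radiating"])  # ln(2), because the term occurs in one of two notes
```

Terms shared by every record receive zero weight under this variant, so the representation emphasizes n-grams that distinguish one record from another before any feature selection is applied.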
Affiliation(s)
- Filipe R Lucini
- Industrial Engineering Department, Federal University of Rio Grande do Sul. Av. Osvaldo Aranha, 99, 5° Andar, 90035-190 Porto Alegre, RS, Brazil.
- Flavio S Fogliatto
- Industrial Engineering Department, Federal University of Rio Grande do Sul. Av. Osvaldo Aranha, 99, 5° Andar, 90035-190 Porto Alegre, RS, Brazil
- Giovani J C da Silveira
- Haskayne School of Business, University of Calgary, 2500 University Dr NW, T2N 1N4 Calgary, AB, Canada
- Jeruza L Neyeloff
- Hospital de Clínicas de Porto Alegre, Federal University of Rio Grande do Sul. Rua Ramiro Barcelos, 2350, 90035-903 Porto Alegre, RS, Brazil
- Michel J Anzanello
- Industrial Engineering Department, Federal University of Rio Grande do Sul. Av. Osvaldo Aranha, 99, 5° Andar, 90035-190 Porto Alegre, RS, Brazil
- Ricardo S Kuchenbecker
- Hospital de Clínicas de Porto Alegre, Federal University of Rio Grande do Sul. Rua Ramiro Barcelos, 2350, 90035-903 Porto Alegre, RS, Brazil
- Beatriz D Schaan
- Hospital de Clínicas de Porto Alegre, Federal University of Rio Grande do Sul. Rua Ramiro Barcelos, 2350, 90035-903 Porto Alegre, RS, Brazil
|
63
|
Powell GE, Seifert HA, Reblin T, Burstein PJ, Blowers J, Menius JA, Painter JL, Thomas M, Pierce CE, Rodriguez HW, Brownstein JS, Freifeld CC, Bell HG, Dasgupta N. Social Media Listening for Routine Post-Marketing Safety Surveillance. Drug Saf 2016; 39:443-54. [PMID: 26798054 DOI: 10.1007/s40264-015-0385-6] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Indexed: 11/24/2022]
Abstract
INTRODUCTION Post-marketing safety surveillance primarily relies on data from spontaneous adverse event reports, medical literature, and observational databases. Limitations of these data sources include potential under-reporting, lack of geographic diversity, and time lag between event occurrence and discovery. There is growing interest in exploring the use of social media ('social listening') to supplement established approaches for pharmacovigilance. Although social listening is commonly used for commercial purposes, there are only anecdotal reports of its use in pharmacovigilance. Health information posted online by patients is often publicly available, representing an untapped source of post-marketing safety data that could supplement data from existing sources. OBJECTIVES The objective of this paper is to describe one methodology that could help unlock the potential of social media for safety surveillance. METHODS A third-party vendor acquired 24 months of publicly available Facebook and Twitter data, then processed the data by standardizing drug names and vernacular symptoms, removing duplicates and noise, masking personally identifiable information, and adding supplemental data to facilitate the review process. The resulting dataset was analyzed for safety and benefit information. RESULTS In Twitter, a total of 6,441,679 Medical Dictionary for Regulatory Activities (MedDRA®) Preferred Terms (PTs) representing 702 individual PTs were discussed in the same post as a drug compared with 15,650,108 total PTs representing 946 individual PTs in Facebook. Further analysis revealed that 26% of posts also contained benefit information. CONCLUSION Social media listening is an important tool to augment post-marketing safety surveillance. Much work remains to determine best practices for using this rapidly evolving data source.
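The processing steps listed in METHODS (standardizing drug names, masking personally identifiable information, removing duplicates) can be sketched roughly as follows. This is not the vendor's pipeline, which the abstract does not describe in detail: the synonym table, the regular expressions, and all function names are hypothetical and chosen only to make the steps concrete.

```python
import re

# Hypothetical brand-to-generic lookup; a real system would use a curated drug dictionary.
DRUG_SYNONYMS = {"tylenol": "acetaminophen", "advil": "ibuprofen"}
DRUG_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, DRUG_SYNONYMS)) + r")\b", re.IGNORECASE
)

# Simplified patterns for two kinds of identifiers (assumptions, far from exhaustive).
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
HANDLE = re.compile(r"@\w+")

def mask_pii(text):
    """Replace e-mail addresses and @handles with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return HANDLE.sub("[USER]", text)

def standardize_drugs(text):
    """Map known brand names to a single standard name."""
    return DRUG_PATTERN.sub(lambda m: DRUG_SYNONYMS[m.group(0).lower()], text)

def preprocess(posts):
    """Mask PII, standardize drug names, then drop case-insensitive duplicates."""
    seen, cleaned_posts = set(), []
    for post in posts:
        cleaned = standardize_drugs(mask_pii(post)).strip()
        key = cleaned.lower()
        if key not in seen:
            seen.add(key)
            cleaned_posts.append(cleaned)
    return cleaned_posts

posts = [
    "@bob Tylenol gave me a rash",
    "@amy tylenol gave me a rash",
    "Advil helped, mail me at a@b.com",
]
result = preprocess(posts)
```

Masking before deduplication means that posts differing only in the masked identifier collapse into one record, which may or may not be desirable depending on whether repeated reports from different users should be counted.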
Affiliation(s)
- Gregory E Powell
  - GlaxoSmithKline, 5 Moore Dr., Research Triangle Park, NC, 27709, USA
- James Blowers
  - GlaxoSmithKline, 5 Moore Dr., Research Triangle Park, NC, 27709, USA
- J Alan Menius
  - GlaxoSmithKline, 5 Moore Dr., Research Triangle Park, NC, 27709, USA
- Jeffery L Painter
  - GlaxoSmithKline, 5 Moore Dr., Research Triangle Park, NC, 27709, USA
|
64
|
Lim S, Tucker CS, Kumara S. An unsupervised machine learning model for discovering latent infectious diseases using social media data. J Biomed Inform 2016; 66:82-94. [PMID: 28034788 DOI: 10.1016/j.jbi.2016.12.007] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 08/04/2016] [Revised: 12/03/2016] [Accepted: 12/14/2016] [Indexed: 10/20/2022]
Abstract
INTRODUCTION The authors of this work propose an unsupervised machine learning model that has the ability to identify real-world latent infectious diseases by mining social media data. In this study, a latent infectious disease is defined as a communicable disease that has not yet been formalized by national public health institutes and explicitly communicated to the general public. Most existing approaches to modeling infectious-disease-related knowledge discovery through social media networks are top-down approaches that are based on already known information, such as the names of diseases and their symptoms. In existing top-down approaches, necessary but unknown information, such as disease names and symptoms, is mostly unidentified in social media data until national public health institutes have formalized that disease. Most of the formalizing processes for latent infectious diseases are time consuming. Therefore, this study presents a bottom-up approach for latent infectious disease discovery in a given location without prior information, such as disease names and related symptoms. METHODS Social media messages with user and temporal information are extracted during the data preprocessing stage. An unsupervised sentiment analysis model is then presented. Users' expressions about symptoms, body parts, and pain locations are also identified from social media data. Then, symptom weighting vectors for each individual and time period are created, based on their sentiment and social media expressions. Finally, latent-infectious-disease-related information is retrieved from individuals' symptom weighting vectors. DATASETS AND RESULTS Twitter data from August 2012 to May 2013 are used to validate this study. Real electronic medical records for 104 individuals, who were diagnosed with influenza in the same period, are used to serve as ground truth validation. 
The results are promising, with the highest precision, recall, and F1 score values of 0.773, 0.680, and 0.724, respectively. CONCLUSION This work uses individuals' social media messages to identify latent infectious diseases, without prior information, sooner than the disease(s) are formalized by national public health institutes. In particular, the unsupervised machine learning model using user, textual, and temporal information in social media data, along with sentiment analysis, identifies latent infectious diseases in a given location.
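For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The short check below, with a function name chosen here for illustration, shows that the reported values are mutually consistent, assuming the three figures come from the same configuration (the abstract reports each as a "highest" value and does not say so explicitly).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values reported in the abstract above.
print(round(f1_score(0.773, 0.680), 3))  # prints 0.724
```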
Affiliation(s)
- Sunghoon Lim
  - Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Conrad S Tucker
  - School of Engineering Design, Technology, and Professional Programs, The Pennsylvania State University, University Park, PA 16802, USA; Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Soundar Kumara
  - Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
|
65
|
Herrero-Zazo M, Segura-Bedmar I, Martínez P. Conceptual models of drug-drug interactions: A summary of recent efforts. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2016.10.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
|
66
|
Affiliation(s)
- John D Seeger
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Harvard Medical School/Brigham and Women's Hospital, 1620 Tremont, Suite 3030, Boston, MA, 02120, USA
|
67
|
An ensemble method for extracting adverse drug events from social media. Artif Intell Med 2016; 70:62-76. [PMID: 27431037 DOI: 10.1016/j.artmed.2016.05.004] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 10/07/2015] [Revised: 05/20/2016] [Accepted: 05/27/2016] [Indexed: 11/24/2022]
Abstract
OBJECTIVE Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. METHODS AND MATERIALS We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). RESULTS When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristics curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. CONCLUSIONS Our experimental results indicate that ADE extraction from social media can benefit from feature selection. 
With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which avoid the feature sparsity issue, are well suited to the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance ADE extraction effectiveness.
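Two of the combination methods named in the abstract, majority voting and weighted averaging, can be sketched in a few lines. The labels, scores, and weights below are invented for illustration, and the sketch does not reproduce the study's classifiers or its stacked generalization setup.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]

def weighted_average(scores, weights):
    """Combine per-classifier scores using non-negative weights."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total

# Hypothetical outputs of three base classifiers for one candidate drug-event pair.
votes = ["ADE", "non-ADE", "ADE"]
scores = [0.9, 0.4, 0.6]   # assumed probability that the pair is an ADE
weights = [2, 1, 1]        # e.g. proportional to each classifier's validation AUC

label = majority_vote(votes)
combined = weighted_average(scores, weights)
```

Majority voting needs only hard labels, whereas weighted averaging needs comparable scores from every classifier, so the choice between them often depends on what the base classifiers can output.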
|
68
|
Martínez P, Martínez JL, Segura-Bedmar I, Moreno-Schneider J, Luna A, Revert R. Turning user generated health-related content into actionable knowledge through text analytics services. Comput Ind 2016. [DOI: 10.1016/j.compind.2015.10.006] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Indexed: 11/30/2022]
|
69
|
Segura-Bedmar I, Martínez P. Pharmacovigilance through the development of text mining and natural language processing techniques. J Biomed Inform 2015; 58:288-291. [PMID: 26547007 DOI: 10.1016/j.jbi.2015.11.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
Affiliation(s)
- Paloma Martínez
  - Computer Science Department, Universidad Carlos III de Madrid, Spain
|
70
|
Ruzich E, Allison C, Chakrabarti B, Smith P, Musto H, Ring H, Baron-Cohen S. Sex and STEM Occupation Predict Autism-Spectrum Quotient (AQ) Scores in Half a Million People. PLoS One 2015; 10:e0141229. [PMID: 26488477 PMCID: PMC4619566 DOI: 10.1371/journal.pone.0141229] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 07/22/2015] [Accepted: 10/06/2015] [Indexed: 02/01/2023] Open
Abstract
This study assesses Autism-Spectrum Quotient (AQ) scores in a 'big data' sample collected through the UK Channel 4 television website, following the broadcasting of a medical education program. We examine correlations between the AQ and age, sex, occupation, and UK geographic region in 450,394 individuals. We predicted that age and geography would not be correlated with AQ, whilst sex and occupation would have a correlation. The mean AQ score for the total sample was m = 19.83 (SD = 8.71), slightly higher than in a previous systematic review of 6,900 individuals in a non-clinical sample (mean of means = 16.94). This likely reflects that this big-data sample includes individuals with autism, who in the systematic review scored much higher (mean of means = 35.19). As predicted, sex and occupation differences were observed: on average, males (m = 21.55, SD = 8.82) scored higher than females (m = 18.95; SD = 8.52), and individuals working in a STEM career (m = 21.92, SD = 8.92) scored higher than individuals in non-STEM careers (m = 18.92, SD = 8.48). Also as predicted, age and geographic region were not meaningfully correlated with AQ. These results support previous findings relating to sex and STEM careers in the largest set of individuals for which AQ scores have been reported and suggest the AQ is a useful self-report measure of autistic traits.
Affiliation(s)
- Emily Ruzich
  - Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
  - Cambridge Intellectual and Developmental Disabilities Research Group, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
- Carrie Allison
  - Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
  - NIHR CLAHRC-EoE for Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, United Kingdom
- Bhismadev Chakrabarti
  - Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
  - Centre for Integrative Neuroscience and Neurodynamics, School of Psychology and Clinical Language Sciences, University of Reading, Reading, United Kingdom
- Paula Smith
  - Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
- Henry Musto
  - Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
- Howard Ring
  - Cambridge Intellectual and Developmental Disabilities Research Group, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
  - NIHR CLAHRC-EoE for Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, United Kingdom
- Simon Baron-Cohen
  - Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
  - NIHR CLAHRC-EoE for Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, United Kingdom
  - CLASS Clinic, Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, United Kingdom
|
71
|
Sloane R, Osanlou O, Lewis D, Bollegala D, Maskell S, Pirmohamed M. Social media and pharmacovigilance: A review of the opportunities and challenges. Br J Clin Pharmacol 2015; 80:910-20. [PMID: 26147850 PMCID: PMC4594734 DOI: 10.1111/bcp.12717] [Citation(s) in RCA: 72] [Impact Index Per Article: 7.2] [Received: 06/15/2015] [Revised: 06/29/2015] [Accepted: 07/03/2015] [Indexed: 01/23/2023] Open
Abstract
Adverse drug reactions come at a considerable cost to society. Social media are a potentially invaluable reservoir of information for pharmacovigilance, yet their true value remains to be fully understood. In order to realize the benefits social media hold, a number of technical, regulatory and ethical challenges remain to be addressed. We outline these key challenges, identify relevant current research, and present possible solutions.
Affiliation(s)
- Richard Sloane
  - Department of Electrical Engineering and Electronics, University of Liverpool, L69 3GJ, UK
  - Department of Molecular and Clinical Pharmacology, University of Liverpool, L69 3GL, UK
  - Department of Computer Science, University of Liverpool, L69 3BX, UK
- Orod Osanlou
  - Department of Molecular and Clinical Pharmacology, University of Liverpool, L69 3GL, UK
  - Royal Liverpool and Broadgreen University Hospital NHS Trust, Liverpool, L7 8XP, UK
- David Lewis
  - Drug Safety & Epidemiology, Novartis Pharma AG, Postfach, CH-4002, Basel, Switzerland
- Simon Maskell
  - Department of Electrical Engineering and Electronics, University of Liverpool, L69 3GJ, UK
  - Department of Computer Science, University of Liverpool, L69 3BX, UK
- Munir Pirmohamed
  - Department of Molecular and Clinical Pharmacology, University of Liverpool, L69 3GL, UK
  - Royal Liverpool and Broadgreen University Hospital NHS Trust, Liverpool, L7 8XP, UK
|
72
|
Chen LS, Lin ZC, Chang JR. FIR: An Effective Scheme for Extracting Useful Metadata from Social Media. J Med Syst 2015; 39:139. [PMID: 26330225 DOI: 10.1007/s10916-015-0333-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/31/2015] [Accepted: 08/21/2015] [Indexed: 11/27/2022]
Abstract
Recently, the use of social media for health information exchange has been expanding among patients, physicians, and other health care professionals. In medical areas, social media allows non-experts to access, interpret, and generate medical information for their own care and the care of others. Researchers have paid much attention to social media in medical education, patient-pharmacist communication, adverse drug reaction detection, the impact of social media on medicine and healthcare, and so on. However, relatively few papers discuss how to effectively extract useful knowledge from the huge volume of textual comments in social media. Therefore, this study proposes a Fuzzy adaptive resonance theory network based Information Retrieval (FIR) scheme that combines a Fuzzy adaptive resonance theory (ART) network, Latent Semantic Indexing (LSI), and association rules (AR) discovery to extract knowledge from social media. In our FIR scheme, the Fuzzy ART network is first employed to segment comments. Next, for each customer segment, we use the LSI technique to retrieve important keywords. Then, to make the extracted keywords understandable, association rule mining is used to organize them into metadata. These extracted voices of customers are then transformed into design needs using Quality Function Deployment (QFD) for further decision making. Unlike conventional information retrieval techniques, which acquire too many keywords to convey key points, our FIR scheme can extract understandable metadata from social media.
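The association rule step can be made concrete with the two standard measures, support and confidence, computed over keyword sets. This is a generic sketch of those measures only, not the FIR scheme itself; the keyword sets and function names are hypothetical, and the Fuzzy ART and LSI stages are not shown.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base

# Hypothetical keyword sets, one per comment, as might come out of a keyword extraction step.
comments = [
    {"battery", "life", "short"},
    {"battery", "charger"},
    {"battery", "life"},
    {"screen", "bright"},
]

rule_support = support(comments, {"battery", "life"})           # 2 of 4 comments
rule_confidence = confidence(comments, {"battery"}, {"life"})   # 2 of 3 "battery" comments
```

A rule such as "battery -> life" would typically be kept only if both measures exceed chosen thresholds, which is what turns a flat keyword list into more readable keyword groupings.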
|