1. Martenot V, Masdeu V, Cupe J, Gehin F, Blanchon M, Dauriat J, Horst A, Renaudin M, Girard P, Zucker JD. LiSA: an assisted literature search pipeline for detecting serious adverse drug events with deep learning. BMC Med Inform Decis Mak 2022; 22:338. [PMID: 36550485 PMCID: PMC9773506 DOI: 10.1186/s12911-022-02085-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022] Open
Abstract
INTRODUCTION Detecting safety signals attributed to a drug in scientific literature is a fundamental issue in pharmacovigilance. The constant increase in the volume of publications requires the automation of this tedious task, in order to find and extract relevant articles from the pack. This task is critical, as serious Adverse Drug Reactions (ADRs) still account for a large number of hospital admissions each year. OBJECTIVES The aim of this study is to develop an augmented intelligence methodology for automatically identifying relevant publications mentioning an established link between a Drug and a Serious Adverse Event, according to the European Medicines Agency (EMA) definition of seriousness. METHODS The proposed pipeline, called LiSA (for Literature Search Application), is based on three independent deep learning models supporting a precise detection of safety signals in the biomedical literature. By combining Bidirectional Encoder Representations from Transformers (BERT) algorithms with a modular architecture, the pipeline achieves a precision of 0.81 and a recall of 0.89 at the sentence level in articles extracted from PubMed (either abstract or full-text). We also measured that by using LiSA, a medical reviewer increases by a factor of 2.5 the number of relevant documents they can collect and evaluate compared to a simple keyword search. In the interest of re-usability, emphasis was placed on building a modular pipeline allowing the insertion of other NLP modules to enrich the results provided by the system, and extend it to other use cases. In addition, a lightweight visualization tool was developed to analyze and monitor safety signal results. CONCLUSIONS Overall, the generic pipeline and the visualization tool proposed in this article allow for efficient and accurate monitoring of serious adverse drug reactions from the literature and can easily be adapted to similar pharmacovigilance use cases.
To facilitate reproducibility and benefit other research studies, we also shared a first benchmark dataset for Serious Adverse Drug Events detection.
Affiliation(s)
- Jean Cupe
- Quinten, 8 rue Vernier, 75017 Paris, France
- Alexander Horst
- Swiss Agency for Therapeutic Products, Swissmedic, Hallerstrasse 7, 3012 Bern, Switzerland (GRID grid.483664.b; ISNI 0000 0001 0683 3095)
- Michael Renaudin
- Swiss Agency for Therapeutic Products, Swissmedic, Hallerstrasse 7, 3012 Bern, Switzerland (GRID grid.483664.b; ISNI 0000 0001 0683 3095)
- Philippe Girard
- Swiss Agency for Therapeutic Products, Swissmedic, Hallerstrasse 7, 3012 Bern, Switzerland (GRID grid.483664.b; ISNI 0000 0001 0683 3095)
2. Quazi S. Artificial intelligence and machine learning in precision and genomic medicine. Med Oncol 2022; 39:120. [PMID: 35704152 PMCID: PMC9198206 DOI: 10.1007/s12032-022-01711-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 03/06/2022] [Accepted: 03/14/2022] [Indexed: 10/28/2022]
Abstract
The advancement of precision medicine in medical care has left behind the conventional symptom-driven treatment process by allowing early risk prediction of disease through improved diagnostics and customization of more effective treatments. It is necessary to scrutinize overall patient data alongside broad factors to observe and differentiate between ill and relatively healthy people to take the most appropriate path toward precision medicine, resulting in an improved vision of biological indicators that can signal health changes. Precision and genomic medicine combined with artificial intelligence have the potential to improve patient healthcare. Patients with less common therapeutic responses or unique healthcare demands are using genomic medicine technologies. AI provides insights through advanced computation and inference, enabling the system to reason and learn while enhancing physician decision making. Many cell characteristics, including gene up-regulation, proteins binding to nucleic acids, and splicing, can be measured at high throughput and used as training objectives for predictive models. Researchers can create a new era of effective genomic medicine with the improved availability of a broad range of datasets and modern computer techniques such as machine learning. This review article elucidates the contributions of ML algorithms in precision and genomic medicine.
Affiliation(s)
- Sameer Quazi
- GenLab Biosolutions Private Limited, Bangalore, Karnataka, 560043, India.
- Department of Biomedical Sciences, School of Life Sciences, Anglia Ruskin University, Cambridge, UK.
4. Huang JY, Lee WP, Lee KD. Predicting Adverse Drug Reactions from Social Media Posts: Data Balance, Feature Selection and Deep Learning. Healthcare (Basel) 2022; 10:618. [PMID: 35455795 PMCID: PMC9024774 DOI: 10.3390/healthcare10040618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Revised: 03/22/2022] [Accepted: 03/23/2022] [Indexed: 11/16/2022] Open
Abstract
Social forums offer many new channels for collecting patients’ opinions to construct predictive models of adverse drug reactions (ADRs) for post-marketing surveillance. However, due to the characteristics of social posts, many challenges remain to be solved when deriving such models, mainly problems caused by data sparseness, high-dimensional data features, and term diversity in the data. To tackle these crucial issues related to identifying ADRs from social posts, we perform data analytics from the perspectives of data balance, feature selection, and feature learning. Meanwhile, we design a comprehensive experimental analysis to investigate the performance of different data processing techniques and data modeling methods. Most importantly, we present a deep learning-based approach that adopts the BERT (Bidirectional Encoder Representations from Transformers) model with a new batch-wise adaptive strategy to enhance the predictive performance. A series of experiments have been conducted to evaluate the machine learning methods with both manual and automated feature engineering processes. The results show that both types of methods, each with its own advantages, are effective in ADR prediction. In contrast to the traditional machine learning methods, our feature learning approach can accomplish the required task automatically, saving the manual effort otherwise needed for the large number of experiments.
Affiliation(s)
- Jhih-Yuan Huang
- Department of Information Management, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- Wei-Po Lee (corresponding author)
- Department of Information Management, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- King-Der Lee
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
5. Using Machine Learning for Pharmacovigilance: A Systematic Review. Pharmaceutics 2022; 14:266. [PMID: 35213998 PMCID: PMC8924891 DOI: 10.3390/pharmaceutics14020266] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/17/2021] [Revised: 01/13/2022] [Accepted: 01/21/2022] [Indexed: 02/04/2023] Open
Abstract
Pharmacovigilance is a science that involves the ongoing monitoring of adverse drug reactions to existing medicines. Traditional approaches in this field can be expensive and time-consuming. The application of natural language processing (NLP) to analyze user-generated content is hypothesized as an effective supplemental source of evidence. In this systematic review, a broad and multi-disciplinary literature search was conducted involving four databases. A total of 5318 publications were initially found. Studies were considered relevant if they reported on the application of NLP to understand user-generated text for pharmacovigilance. A total of 16 relevant publications were included in this systematic review. All studies were evaluated to have medium reliability and validity. For all types of drugs, 14 publications reported positive findings with respect to the identification of adverse drug reactions, providing consistent evidence that natural language processing can be used effectively and accurately on user-generated textual content published on the Internet to identify adverse drug reactions for the purpose of pharmacovigilance. The evidence presented in this review suggests that the analysis of textual data has the potential to complement the traditional system of pharmacovigilance.
6. Alarifi M, Jabour A, Foy DM, Zolnoori M. Identifying the underlying factors associated with antidepressant drug discontinuation: content analysis of patients' drug reviews. Inform Health Soc Care 2022; 47:414-423. [PMID: 35050827 DOI: 10.1080/17538157.2021.2024835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
The rate of antidepressant prescriptions is increasing globally. A large portion of patients stop their medications, which could lead to many side effects, including relapse and anxiety. The aim of this study was to develop a drug-continuity prediction model and identify the factors associated with drug continuity using online patient forums. We retrieved 982 antidepressant drug reviews from the online patient forum AskaPatient.com. We followed the Analytical Framework Method to extract structured data from unstructured data. Using the structured data, we examined the factors associated with antidepressant discontinuity and developed a predictive model using multiple machine learning techniques. The techniques we tested differed in performance, with accuracy ranging from 65% to 82%. We found that the Random Forest algorithm performed best, with 82% accuracy, 78% precision, 88.03% recall, and an 84.2% F1-score. The factors most associated with drug discontinuity were: withdrawal symptoms, effectiveness-ineffectiveness, perceived-distress-adverse drug reaction, rating, and perceived distress related to withdrawal symptoms. Although the nature of data available in online forums differs from data collected through surveys, we found that online patient forums can be a valuable source of data for drug continuity prediction and for understanding patients' experience. The factors identified through our techniques were consistent with the findings of prior studies that used surveys.
Affiliation(s)
- Mohammad Alarifi
- Department of Radiological Sciences, College of Applied Medical Sciences, King Saud University, Riyadh, Saudi Arabia
- Department of Health Informatics & Administration, College of Health Sciences, University of Wisconsin Milwaukee, Milwaukee, Wisconsin, USA
- Abdulrahman Jabour
- Health Informatics Department, Faculty of Public Health and Tropical Medicine, Jazan University, Jazan, Saudi Arabia
- Doreen M Foy
- Department of Pharmacy and Therapeutics, Pharmacy College, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Maryam Zolnoori
- Section of Medical Informatics, Department of Health Science Research, Mayo Clinic, Rochester, Minnesota, USA
7. Gattepaille LM, Hedfors Vidlin S, Bergvall T, Pierce CE, Ellenius J. Prospective Evaluation of Adverse Event Recognition Systems in Twitter: Results from the Web-RADR Project. Drug Saf 2021; 43:797-808. [PMID: 32410156 PMCID: PMC7395913 DOI: 10.1007/s40264-020-00942-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023]
Abstract
Introduction A large number of studies on systems to detect and sometimes normalize adverse events (AEs) in social media have been published, but evidence of their practical utility is scarce. This raises the question of the transferability of such systems to new settings. Objectives The aims of this study were to develop an AE recognition system, prospectively evaluate its performance on an external benchmark dataset and identify potential factors influencing the transferability of AE recognition systems. Methods A pipeline based on dictionary lookups and logistic regression classifiers was developed using a proprietary dataset of 196,533 Tweets manually annotated for AE relations, and the system was prospectively evaluated on the publicly available WEB-RADR reference dataset, exploring different aspects affecting transferability. Results Our system achieved 0.53 precision, 0.52 recall and 0.52 F1-score on the development test set; however, when applied to the WEB-RADR reference dataset, system performance dropped to 0.38 precision, 0.20 recall and 0.26 F1-score. Similarly, a previously published method aiming at automatically detecting adverse event posts reported 0.5 precision, 0.92 recall and 0.65 F1-score on yet another dataset, while performance on the WEB-RADR reference dataset was reduced to 0.37 precision, 0.63 recall and 0.46 F1-score. We identified four potential factors leading to poor transferability: overfitting, selection bias, label bias and prevalence. Conclusion We warn the community about a potentially large discrepancy between the expected performance of automated AE recognition systems based on published results and the actual observed performance on independent data. This study highlights the difficulty of implementing an all-purpose system for automatic adverse event recognition in Twitter, which could explain the lack of such systems in practical pharmacovigilance settings.
Our recommendation is to use benchmark independent datasets, such as the WEB-RADR reference, to investigate the transferability of the adverse event recognition systems and ultimately enforce rigorous comparisons across studies on the task. Electronic supplementary material The online version of this article (10.1007/s40264-020-00942-3) contains supplementary material, which is available to authorized users.
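The F1-scores quoted in the abstract above can be sanity-checked from the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch; the function name `f1_score` and the dictionary labels are illustrative choices, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract; labels are ours.
reported = {
    "authors' system, development test set": (0.53, 0.52),
    "authors' system, WEB-RADR reference": (0.38, 0.20),
    "published method, its own dataset": (0.50, 0.92),
    "published method, WEB-RADR reference": (0.37, 0.63),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

Because the inputs are already rounded to two decimals, the recomputed values can differ from the published F1-scores in the last digit; they should agree to within about 0.01.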
Affiliation(s)
- Tomas Bergvall
- Uppsala Monitoring Centre, Box 1051, 75140, Uppsala, Sweden
- Johan Ellenius
- Uppsala Monitoring Centre, Box 1051, 75140, Uppsala, Sweden
8. Cheerkoot-Jalim S, Khedo KK. A systematic review of text mining approaches applied to various application areas in the biomedical domain. Journal of Knowledge Management 2020. [DOI: 10.1108/jkm-09-2019-0524] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/11/2022]
Abstract
Purpose
This work shows the results of a systematic literature review on biomedical text mining. The purpose of this study is to identify the different text mining approaches used in different application areas of the biomedical domain, the common tools used and the challenges of biomedical text mining as compared to generic text mining algorithms. This study will be of value to biomedical researchers by allowing them to correlate text mining approaches to specific biomedical application areas. Implications for future research are also discussed.
Design/methodology/approach
The review was conducted following the principles of the Kitchenham method. A number of research questions were first formulated, followed by the definition of the search strategy. The papers were then selected based on a list of assessment criteria. Each of the papers was analyzed and information relevant to the research questions was extracted.
Findings
It was found that researchers have mostly harnessed data sources such as electronic health records, biomedical literature, social media and health-related forums. The most common text mining technique was natural language processing using tools such as MetaMap and Unstructured Information Management Architecture, alongside the use of medical terminologies such as Unified Medical Language System. The main application area was the detection of adverse drug events. Challenges identified included the need to deal with huge amounts of text, the heterogeneity of the different data sources, the duality of meaning of words in biomedical text and the amount of noise introduced mainly from social media and health-related forums.
Originality/value
To the best of the authors’ knowledge, other reviews in this area have focused on either specific techniques, specific application areas or specific data sources. The results of this review will help researchers to correlate most relevant and recent advances in text mining approaches to specific biomedical application areas by providing an up-to-date and holistic view of work done in this research area. The use of emerging text mining techniques has great potential to spur the development of innovative applications, thus considerably impacting on the advancement of biomedical research.
9. Gujral H, Kushwaha AK, Khurana S. Utilization of Time Series Tools in Life-sciences and Neuroscience. Neurosci Insights 2020; 15:2633105520963045. [PMID: 33345189 PMCID: PMC7727047 DOI: 10.1177/2633105520963045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/10/2020] [Accepted: 09/11/2020] [Indexed: 01/18/2023] Open
Abstract
Time series tools are part and parcel of modern-day research. Their usage in the biomedical field, specifically in neuroscience, has not previously been quantified. A quantification of trends can reveal lacunae in current uses and point towards future uses. We evaluated the principles and applications of a few classical time series tools, such as Principal Component Analysis, Neural Networks, common Auto-regression Models, Markov Models, Hidden Markov Models, Fourier Analysis, and Spectral Analysis, in addition to diverse work generically lumped under the time series category. We quantified the usage from two perspectives: that of information technology professionals, and that of researchers utilizing these tools for biomedical and neuroscience research. To understand trends from the information technology perspective, we evaluated two of the largest open source question and answer databases, Stack Overflow and Cross Validated. We quantified the trends in their application in the biomedical domain, and specifically neuroscience, by searching literature and application usage on PubMed. While the use of all the time series tools continues to gain popularity in general biomedical and life science research, and also in neuroscience, and the total number of questions asked on Stack Overflow and Cross Validated has grown likewise, the total views of questions on these sites have decreased in recent years, indicating well-established texts, algorithms, and libraries, with the result that engineers no longer look up what used to be common questions a few years ago. The use of these tools in neuroscience clearly leaves room for improvement.
Affiliation(s)
- Harshit Gujral
- Department of Computer Science and Information Technology, Jaypee Institute of Information Technology, Noida, India
- Ajay Kumar Kushwaha
- Department of Computer Science and Information Technology, Jaypee Institute of Information Technology, Noida, India
- Sukant Khurana
- CSIR-Central Drug Research Institute, Lucknow, Uttar Pradesh, India
- CSIR-Institute of Genomics and Integrative Biology, India
10. Learning structured medical information from social media. J Biomed Inform 2020; 110:103568. [PMID: 32942027 DOI: 10.1016/j.jbi.2020.103568] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/20/2019] [Revised: 08/21/2020] [Accepted: 09/12/2020] [Indexed: 11/21/2022]
Abstract
Our goal is to summarise and aggregate information from social media regarding the symptoms of a disease, the drugs used and the treatment effects, both positive and negative. To achieve this we first apply a supervised machine learning method to automatically extract medical concepts from natural language text. In an environment such as social media, where new data is continuously streamed, we need a methodology that will allow us to continuously train with the new data. To attain such incremental re-training, a semi-supervised methodology is developed, which is capable of learning new concepts from a small set of labelled data together with the much larger set of unlabelled data. The semi-supervised methodology deploys a conditional random field (CRF) as the base-line training algorithm for extracting medical concepts. The methodology iteratively augments the training set with sentences labelled with high confidence, and adds terms to existing dictionaries to be used as features with the base-line model for further classification. Our empirical results show that the base-line CRF performs strongly across a range of different dictionary and training sizes; when the base-line is built with the full training data the F1 score reaches the range 84%-90%. Moreover, we show that the semi-supervised method produces a mild but significant improvement over the base-line. We also discuss the significance of the potential improvement of the semi-supervised methodology and find that it is significantly more accurate in most cases than the underlying base-line model.
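The iterative augmentation described in this abstract follows the general shape of a self-training loop: fit on the labelled data, label the unlabelled pool, keep only high-confidence predictions, and refit. The sketch below illustrates that loop only. It deliberately swaps the paper's CRF for a toy one-dimensional nearest-centroid classifier, omits the dictionary-expansion step, and uses hypothetical names (`self_train`, `threshold`, the labels `"drug"` and `"other"`) and made-up numbers.

```python
from statistics import mean


def fit_centroids(labelled):
    """Return {label: centroid} from (value, label) pairs."""
    by_label = {}
    for x, y in labelled:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict_with_confidence(centroids, x):
    """Nearest-centroid label and a crude confidence in [0.5, 1.0].

    Assumes at least two classes are present in `centroids`.
    """
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    (d0, label), (d1, _) = dists[0], dists[1]
    confidence = 1.0 if d0 + d1 == 0 else d1 / (d0 + d1)
    return label, confidence


def self_train(labelled, unlabelled, threshold=0.8, max_rounds=10):
    """Grow the labelled set with confident predictions, then refit."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labelled)
        confident, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(centroids, x)
            (confident if conf >= threshold else remaining).append((x, label))
        if not confident:  # nothing new to learn from; stop early
            break
        labelled.extend(confident)
        pool = [x for x, _ in remaining]
    return fit_centroids(labelled), labelled


# Toy data: two seed examples and four unlabelled values (illustrative only).
centroids, grown = self_train(
    labelled=[(0.0, "other"), (10.0, "drug")],
    unlabelled=[1.0, 9.0, 4.0, 6.5],
)
print(centroids)
print(len(grown), "labelled examples after self-training")
```

The confidence threshold controls the trade-off the abstract alludes to: a high threshold adds fewer but cleaner pseudo-labels, while a low one grows the training set faster at the risk of reinforcing early mistakes.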
11. Nguyen VH, Sugiyama K, Kan MY, Halder K. Neural side effect discovery from user credibility and experience-assessed online health discussions. J Biomed Semantics 2020; 11:5. [PMID: 32641159 PMCID: PMC7341623 DOI: 10.1186/s13326-020-00221-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/02/2019] [Accepted: 06/07/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Health 2.0 allows patients and caregivers to conveniently seek medical information and advice via e-portals and online discussion forums, especially regarding potential drug side effects. Although online health communities are helpful platforms for obtaining non-professional opinions, they pose risks in communicating unreliable and insufficient information in terms of quality and quantity. Existing methods for extracting user-reported adverse drug reactions (ADRs) in online health forums are not only insufficiently accurate, as they disregard user credibility and drug experience, but are also expensive, as they rely on supervised ground truth annotation of individual statements. We propose a NEural ArchiTecture for Drug side effect prediction (NEAT), which is optimized on the task of drug side effect discovery based on a complete discussion while being attentive to user credibility and experience, thus addressing the mentioned shortcomings. We train our neural model in a self-supervised fashion using ground truth drug side effects from mayoclinic.org. NEAT learns to assign each user a score that is descriptive of their credibility and highlights the critical textual segments of their post. RESULTS Experiments show that NEAT improves drug side effect discovery from online health discussion by 3.04% over user-credibility agnostic baselines, and by 9.94% over non-neural baselines in terms of F1. Additionally, the latent credibility scores learned by the model correlate well with trustworthiness signals, such as the number of "thanks" received from other forum members, and improve on credibility heuristics such as number of posts by 0.113 in terms of Spearman's rank correlation coefficient. Experience-based self-supervised attention highlights critical phrases such as mentioned side effects, and enhances fully supervised ADR extraction models based on sequence labelling by 5.502% in terms of precision.
CONCLUSIONS NEAT considers both user credibility and experience in online health forums, making feasible a self-supervised approach to side effect prediction for mentioned drugs. The derived user credibility and attention mechanism are transferable and improve downstream ADR extraction models. Our approach enhances automatic drug side effect discovery and fosters research in several domains including pharmacovigilance and clinical studies.
Affiliation(s)
- Van-Hoang Nguyen
- School of Computing, National University of Singapore, 13 Computing Drive, Singapore, 117417 Singapore
- Kazunari Sugiyama
- School of Computing, National University of Singapore, 13 Computing Drive, Singapore, 117417 Singapore
- Min-Yen Kan
- School of Computing, National University of Singapore, 13 Computing Drive, Singapore, 117417 Singapore
- Kishaloy Halder
- School of Computing, National University of Singapore, 13 Computing Drive, Singapore, 117417 Singapore
12. Spiro A, Fernández García J, Yanover C. Inferring new relations between medical entities using literature curated term co-occurrences. JAMIA Open 2020; 2:378-385. [PMID: 31984370 PMCID: PMC6951958 DOI: 10.1093/jamiaopen/ooz022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2019] [Revised: 06/05/2019] [Accepted: 06/08/2019] [Indexed: 11/17/2022] Open
Abstract
Objectives Identifying new relations between medical entities, such as drugs, diseases, and side effects, is typically a resource-intensive task, involving experimentation and clinical trials. The increased availability of related data and curated knowledge enables a computational approach to this task, notably by training models to predict likely relations. Such models rely on meaningful representations of the medical entities being studied. We propose a generic features vector representation that leverages co-occurrences of medical terms, linked with PubMed citations. Materials and Methods We demonstrate the usefulness of the proposed representation by inferring two types of relations: a drug causes a side effect and a drug treats an indication. To predict these relations and assess their effectiveness, we applied 2 modeling approaches: multi-task modeling using neural networks and single-task modeling based on gradient boosting machines and logistic regression. Results These trained models, which predict either side effects or indications, obtained significantly better results than baseline models that use a single direct co-occurrence feature. The results demonstrate the advantage of a comprehensive representation. Discussion Selecting the appropriate representation has an immense impact on the predictive performance of machine learning models. Our proposed representation is powerful, as it spans multiple medical domains and can be used to predict a wide range of relation types. Conclusion The discovery of new relations between various medical entities can be translated into meaningful insights, for example, related to drug development or disease understanding. Our representation of medical entities can be used to train models that predict such relations, thus accelerating healthcare-related discoveries.
Affiliation(s)
- Adam Spiro
- Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel
- Jonatan Fernández García
- Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel
- Chen Yanover
- Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel
13. Ahmed Z, Mohamed K, Zeeshan S, Dong X. Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine. Database (Oxford) 2020; 2020:baaa010. [PMID: 32185396 PMCID: PMC7078068 DOI: 10.1093/database/baaa010] [Citation(s) in RCA: 151] [Impact Index Per Article: 37.8] [Received: 11/02/2019] [Revised: 01/05/2020] [Accepted: 01/21/2020] [Indexed: 02/06/2023]
Abstract
Precision medicine is one of the recent and powerful developments in medical care, which has the potential to improve the traditional symptom-driven practice of medicine, allowing earlier interventions using advanced diagnostics and tailoring better and more economical personalized treatments. Identifying the best pathway to personalized and population medicine involves the ability to analyze comprehensive patient information together with broader aspects to monitor and distinguish between sick and relatively healthy people, which will lead to a better understanding of biological indicators that can signal shifts in health. While the complexities of disease at the individual level have made it difficult to utilize healthcare information in clinical decision-making, some of the existing constraints have been greatly minimized by technological advancements. To implement effective precision medicine with enhanced ability to positively impact patient outcomes and provide real-time decision support, it is important to harness the power of electronic health records by integrating disparate data sources and discovering patient-specific patterns of disease progression. Useful analytic tools, technologies, databases, and approaches are required to augment networking and interoperability of clinical, laboratory and public health systems, as well as to address, with effective balance, ethical and social issues related to the privacy and protection of healthcare data. Developing multifunctional machine learning platforms for clinical data extraction, aggregation, management and analysis can support clinicians by efficiently stratifying subjects to understand specific scenarios and optimize decision-making. Implementation of artificial intelligence in healthcare is a compelling vision that has the potential to lead to significant improvements toward the goal of providing real-time, better personalized and population medicine at lower costs.
In this study, we focused on analyzing and discussing various published artificial intelligence and machine learning solutions, approaches and perspectives, aiming to advance academic solutions in paving the way for a new data-centric era of discovery in healthcare.
Affiliation(s)
- Zeeshan Ahmed
  - Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, 112 Paterson Street, New Brunswick, NJ, USA
  - Department of Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers Biomedical and Health Sciences, 125 Paterson Street, New Brunswick, NJ, USA
  - Department of Genetics and Genome Sciences, School of Medicine, University of Connecticut Health Center, 263 Farmington Ave., Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, 67 North Eagleville Road, Storrs, CT, USA
- Khalid Mohamed
  - Department of Genetics and Genome Sciences, School of Medicine, University of Connecticut Health Center, 263 Farmington Ave., Farmington, CT, USA
- Saman Zeeshan
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- XinQi Dong
  - Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, 112 Paterson Street, New Brunswick, NJ, USA
  - Department of Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers Biomedical and Health Sciences, 125 Paterson Street, New Brunswick, NJ, USA
14
Wunnava S, Qin X, Kakar T, Sen C, Rundensteiner EA, Kong X. Adverse Drug Event Detection from Electronic Health Records Using Hierarchical Recurrent Neural Networks with Dual-Level Embedding. Drug Saf 2019; 42:113-122. [PMID: 30649736 DOI: 10.1007/s40264-018-0765-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 02/02/2023]
Abstract
INTRODUCTION Adverse drug event (ADE) detection is a vital step towards effective pharmacovigilance and prevention of future incidents caused by potentially harmful ADEs. The electronic health records (EHRs) of patients in hospitals contain valuable information regarding ADEs and hence are an important source for detecting ADE signals. However, EHR texts tend to be noisy. Yet applying off-the-shelf tools for EHR text preprocessing jeopardizes the subsequent ADE detection performance, which depends on a well tokenized text input. OBJECTIVE In this paper, we report our experience with the NLP Challenges for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE1.0), which aims to promote deep innovations on this subject. In particular, we have developed rule-based sentence and word tokenization techniques to deal with the noise in the EHR text. METHODS We propose a detection methodology by adapting a three-layered, deep learning architecture of (1) recurrent neural network [bi-directional long short-term memory (Bi-LSTM)] for character-level word representation to encode the morphological features of the medical terminology, (2) Bi-LSTM for capturing the contextual information of each word within a sentence, and (3) conditional random fields for the final label prediction by also considering the surrounding words. We experiment with different word embedding methods commonly used in word-level classification tasks and demonstrate the impact of an integrated usage of both domain-specific and general-purpose pre-trained word embedding for detecting ADEs from EHRs. RESULTS Our system was ranked first for the named entity recognition task in the MADE1.0 challenge, with a micro-averaged F1-score of 0.8290 (official score). 
CONCLUSION Our results indicate that the integration of two widely used sequence labeling techniques that complement each other along with dual-level embedding (character level and word level) to represent words in the input layer results in a deep learning architecture that achieves excellent information extraction accuracy for EHR notes.
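Note on the metric: the score quoted above is a micro-averaged F1. The following is a minimal sketch of micro-averaging, in which true positives, false positives and false negatives are pooled across entity types before precision and recall are computed. The entity-type names and counts are invented for illustration and are not MADE1.0 data; the function name `micro_f1` is likewise an assumption.

```python
from typing import Dict, Tuple


def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-label (tp, fp, fn) counts."""
    # Pool the counts over all labels first; this is what makes it "micro".
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of pooled precision and pooled recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: label -> (true positives, false positives, false negatives)
example_counts = {"Drug": (90, 10, 5), "ADE": (30, 20, 25)}
example_score = micro_f1(example_counts)
```

Because counts are pooled, frequent entity types weigh more heavily in the result than rare ones, which is the usual reason to report a macro-averaged score alongside it.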
Affiliation(s)
- Susmitha Wunnava
  - Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA, 01609, USA
- Xiao Qin
  - Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA, 01609, USA
- Tabassum Kakar
  - Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA, 01609, USA
- Cansu Sen
  - Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA, 01609, USA
- Xiangnan Kong
  - Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA, 01609, USA
15
Arnoux-Guenegou A, Girardeau Y, Chen X, Deldossi M, Aboukhamis R, Faviez C, Dahamna B, Karapetiantz P, Guillemin-Lanne S, Lillo-Le Louët A, Texier N, Burgun A, Katsahian S. The Adverse Drug Reactions From Patient Reports in Social Media Project: Protocol for an Evaluation Against a Gold Standard. JMIR Res Protoc 2019; 8:e11448. [PMID: 31066711 PMCID: PMC6528435 DOI: 10.2196/11448] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/29/2018] [Revised: 11/16/2018] [Accepted: 12/21/2018] [Indexed: 12/30/2022] Open
Abstract
Background Social media is a potential source of information on postmarketing drug safety surveillance that still remains unexploited nowadays. Information technology solutions aiming at extracting adverse reactions (ADRs) from posts on health forums require a rigorous evaluation methodology if their results are to be used to make decisions. First, a gold standard, consisting of manual annotations of the ADR by human experts from the corpus extracted from social media, must be implemented and its quality must be assessed. Second, as for clinical research protocols, the sample size must rely on statistical arguments. Finally, the extraction methods must target the relation between the drug and the disease (which might be either treated or caused by the drug) rather than simple co-occurrences in the posts. Objective We propose a standardized protocol for the evaluation of a software extracting ADRs from the messages on health forums. The study is conducted as part of the Adverse Drug Reactions from Patient Reports in Social Media project. Methods Messages from French health forums were extracted. Entity recognition was based on Racine Pharma lexicon for drugs and Medical Dictionary for Regulatory Activities terminology for potential adverse events (AEs). Natural language processing–based techniques automated the ADR information extraction (relation between the drug and AE entities). The corpus of evaluation was a random sample of the messages containing drugs and/or AE concepts corresponding to recent pharmacovigilance alerts. A total of 2 persons experienced in medical terminology manually annotated the corpus, thus creating the gold standard, according to an annotator guideline. We will evaluate our tool against the gold standard with recall, precision, and f-measure. Interannotator agreement, reflecting gold standard quality, will be evaluated with hierarchical kappa. Granularities in the terminologies will be further explored. 
Results Necessary and sufficient sample size was calculated to ensure statistical confidence in the assessed results. As we expected a global recall of 0.5, we needed at least 384 identified ADR concepts to obtain a 95% CI with a total width of 0.10 around 0.5. The automated ADR information extraction in the corpus for evaluation is already finished. The 2 annotators already completed the annotation process. The analysis of the performance of the ADR information extraction module as compared with gold standard is ongoing. Conclusions This protocol is based on the standardized statistical methods from clinical research to create the corpus, thus ensuring the necessary statistical power of the assessed results. Such evaluation methodology is required to make the ADR information extraction software useful for postmarketing drug safety surveillance. International Registered Report Identifier (IRRID) RR1-10.2196/11448
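Note on the sample size: the figure of 384 is consistent with the standard normal-approximation (Wald) formula for estimating a proportion, n = z² · p(1 − p) / h², where h is the half-width of the interval. The sketch below assumes that formula; the abstract does not state which method the authors used, and the function name is hypothetical.

```python
from statistics import NormalDist


def sample_size_for_proportion(p: float, total_width: float,
                               confidence: float = 0.95) -> float:
    """Unrounded sample size for a Wald confidence interval of a proportion.

    Assumes the normal-approximation formula n = z^2 * p * (1 - p) / h^2,
    with h equal to half of the requested total interval width.
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = total_width / 2
    return z ** 2 * p * (1 - p) / half_width ** 2


# Expected recall of 0.5, 95% CI with a total width of 0.10 (0.45 to 0.55).
n_required = sample_size_for_proportion(p=0.5, total_width=0.10)
```

With these inputs the unrounded value falls just above 384, matching the figure quoted in the abstract; p = 0.5 is also the worst case, since p(1 − p) is largest there.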
Affiliation(s)
- Armelle Arnoux-Guenegou
  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France
- Yannick Girardeau
  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France
  - Département d'Informatique Médicale, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
- Xiaoyi Chen
  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France
- Rim Aboukhamis
  - Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
- Badisse Dahamna
  - Service d'Informatique Biomédicale, D2IM, Centre Hospitalier Universitaire de Rouen, Rouen, France
- Pierre Karapetiantz
  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France
- Agnès Lillo-Le Louët
  - Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
- Anita Burgun
  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France
  - Département d'Informatique Médicale, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Paris Descartes University, Sorbonne Paris Cité, Paris, France
- Sandrine Katsahian
  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France
  - INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Paris Descartes University, Sorbonne Paris Cité, Paris, France
  - Clinical Research Unit Hôpitaux Universitaires Paris Ouest, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
  - INSERM CIC1418, Clinical Epidemiology, Hôpital Européen Georges-Pompidou, Paris, France
16
Sotoodeh M, Ho JC. Improving length of stay prediction using a hidden Markov model. AMIA Joint Summits on Translational Science Proceedings 2019; 2019:425-434. [PMID: 31258996 PMCID: PMC6568102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Estimating length of stay of intensive care unit patients is crucial to reducing health care costs. This can help physicians intervene at the right time to prevent adverse outcomes for the patients. Moreover, resource allocation can be optimized to ensure appropriate hospital staff levels. Yet the length of stay prediction is very hard, as physicians can only accurately estimate half of their patient population. As electronic health records have become more prevalent, researchers can harness the power of machine learning to accurately predict the length of stay. We propose a hidden Markov model-based framework to predict the length of stay using some of patients' physiological measurements during the first 48 hours of their admission to the intensive care unit. We show that this model can succinctly capture temporal patient representations. We demonstrate the potential of our framework on real ICU data in consistently outperforming most of the existing baselines.
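Note on the model class: as background on what a hidden Markov model computes, the sketch below implements the generic forward algorithm, which gives the likelihood of an observation sequence under given parameters. It is not the authors' framework. The two hidden states, the two observation symbols and every probability are hypothetical values chosen for illustration.

```python
from itertools import product
from typing import Dict, Sequence


def forward_likelihood(obs: Sequence[str],
                       start: Dict[str, float],
                       trans: Dict[str, Dict[str, float]],
                       emit: Dict[str, Dict[str, float]]) -> float:
    """Likelihood of a non-empty observation sequence under a discrete HMM."""
    states = list(start)
    # Initialisation: probability of starting in s and emitting obs[0].
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    # Recursion: propagate through the transitions, then emit the next symbol.
    for symbol in obs[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    # Termination: sum over the possible final hidden states.
    return sum(alpha.values())


# Hypothetical toy parameters (not estimated from any ICU data).
start = {"stable": 0.7, "critical": 0.3}
trans = {"stable": {"stable": 0.9, "critical": 0.1},
         "critical": {"stable": 0.2, "critical": 0.8}}
emit = {"stable": {"normal_hr": 0.8, "high_hr": 0.2},
        "critical": {"normal_hr": 0.3, "high_hr": 0.7}}

likelihood = forward_likelihood(["normal_hr", "high_hr", "high_hr"],
                                start, trans, emit)

# Sanity check: likelihoods of all length-3 sequences should sum to 1.
total = sum(forward_likelihood(seq, start, trans, emit)
            for seq in product(["normal_hr", "high_hr"], repeat=3))
```

The forward pass costs time proportional to the sequence length times the square of the number of states, which is what makes HMMs practical for long series of measurements.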
Affiliation(s)
- Mani Sotoodeh
  - Department of Computer Science, Emory University, Atlanta, GA, US
- Joyce C Ho
  - Department of Computer Science, Emory University, Atlanta, GA, US
17
Zhang M, Zhang M, Ge C, Liu Q, Wang J, Wei J, Zhu KQ. Automatic discovery of adverse reactions through Chinese social media. Data Min Knowl Discov 2019. [DOI: 10.1007/s10618-018-00610-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/04/2023]
18
Harnessing social media data for pharmacovigilance: a review of current state of the art, challenges and future directions. International Journal of Data Science and Analytics 2019. [DOI: 10.1007/s41060-019-00175-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 01/07/2023]
19
Hoang T, Liu J, Pratt N, Zheng VW, Chang KC, Roughead E, Li J. Authenticity and credibility aware detection of adverse drug events from social media. Int J Med Inform 2018; 120:157-171. [PMID: 30409341 DOI: 10.1016/j.ijmedinf.2018.10.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/20/2017] [Revised: 09/11/2018] [Accepted: 10/09/2018] [Indexed: 11/16/2022]
Abstract
OBJECTIVES Adverse drug events (ADEs) are among the top causes of hospitalization and death. Social media is a promising open data source for the timely detection of potential ADEs. In this paper, we study the problem of detecting signals of ADEs from social media. METHODS Detecting ADEs whose drug and AE may be reported in different posts of a user leads to major concerns regarding the content authenticity and user credibility, which have not been addressed in previous studies. Content authenticity concerns whether a post mentions drugs or adverse events that are actually consumed or experienced by the writer. User credibility indicates the degree to which chronological evidence from a user's sequence of posts should be trusted in the ADE detection. We propose AC-SPASM, a Bayesian model for the authenticity and credibility aware detection of ADEs from social media. The model exploits the interaction between content authenticity, user credibility and ADE signal quality. In particular, we argue that the credibility of a user correlates with the user's consistency in reporting authentic content. RESULTS We conduct experiments on a real-world Twitter dataset containing 1.2 million posts from 13,178 users. Our benchmark set contains 22 drugs and 8089 AEs. AC-SPASM recognizes authentic posts with F1 - the harmonic mean of precision and recall of 80%, and estimates user credibility with precision@10 = 90% and NDCG@10 - a measure for top-10 ranking quality of 96%. Upon validation against known ADEs, AC-SPASM achieves F1 = 91%, outperforming state-of-the-art baseline models by 32% (p < 0.05). Also, AC-SPASM obtains precision@456 = 73% and NDCG@456 = 94% in detecting and prioritizing unknown potential ADE signals for further investigation. Furthermore, the results show that AC-SPASM is scalable to large datasets. CONCLUSIONS Our study demonstrates that taking into account the content authenticity and user credibility improves the detection of ADEs from social media. 
Our work generates hypotheses to reduce experts' guesswork in identifying unknown potential ADEs.
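Note on the metrics: the abstract reports F1, precision@k and NDCG@k. The sketch below shows one common way to compute them from a ranked list of relevance labels. It assumes linear gain, a log2 position discount, and an ideal ranking built from the same list; other NDCG variants exist and the abstract does not say which one was used. The function names are hypothetical.

```python
import math
from typing import Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_at_k(relevance: Sequence[float], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant (label > 0)."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def dcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and log2 discount."""
    # The rank is i + 1, so the discount is log2(rank + 1) = log2(i + 2).
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevance[:k]))


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """DCG normalised by the DCG of the best possible ordering of the list."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Hypothetical ranking: 1 = relevant, 0 = not relevant, best-ranked first.
ranked_labels = [1, 0, 1, 1, 0]
p_at_5 = precision_at_k(ranked_labels, 5)
ndcg_at_5 = ndcg_at_k(ranked_labels, 5)
```

Precision@k ignores the order within the top k, whereas NDCG@k rewards placing relevant items nearer the top, which is why the two are often reported together.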
Affiliation(s)
- Tao Hoang
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, South Australia 5095, Australia
- Jixue Liu
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, South Australia 5095, Australia
- Nicole Pratt
  - School of Pharmacy and Medical Sciences, University of South Australia, City East Campus, North Terrace, South Australia 5000, Australia
- Vincent W Zheng
  - Advanced Digital Sciences Center, 1 Fusionopolis Way, #08-10 Connexis North Tower, Singapore 138632, Singapore
- Kevin C Chang
  - Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N Goodwin Ave, Urbana, IL 61801, United States
- Elizabeth Roughead
  - School of Pharmacy and Medical Sciences, University of South Australia, City East Campus, North Terrace, South Australia 5000, Australia
- Jiuyong Li
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, South Australia 5095, Australia
20
Hoang T, Liu J, Pratt N, Zheng VW, Chang KC, Roughead E, Li J. Authenticity and credibility aware detection of adverse drug events from social media. Int J Med Inform 2018; 120:101-115. [PMID: 30409335 DOI: 10.1016/j.ijmedinf.2018.09.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/20/2017] [Accepted: 09/03/2018] [Indexed: 11/29/2022]
Abstract
OBJECTIVES Adverse drug events (ADEs) are among the top causes of hospitalization and death. Social media is a promising open data source for the timely detection of potential ADEs. In this paper, we study the problem of detecting signals of ADEs from social media. METHODS Detecting ADEs whose drug and AE may be reported in different posts of a user leads to major concerns regarding the content authenticity and user credibility, which have not been addressed in previous studies. Content authenticity concerns whether a post mentions drugs or adverse events that are actually consumed or experienced by the writer. User credibility indicates the degree to which chronological evidence from a user's sequence of posts should be trusted in the ADE detection. We propose AC-SPASM, a Bayesian model for the authenticity and credibility aware detection of ADEs from social media. The model exploits the interaction between content authenticity, user credibility and ADE signal quality. In particular, we argue that the credibility of a user correlates with the user's consistency in reporting authentic content. RESULTS We conduct experiments on a real-world Twitter dataset containing 1.2 million posts from 13,178 users. Our benchmark set contains 22 drugs and 8089 AEs. AC-SPASM recognizes authentic posts with F1 - the harmonic mean of precision and recall of 80%, and estimates user credibility with precision@10 = 90% and NDCG@10 - a measure for top-10 ranking quality of 96%. Upon validation against known ADEs, AC-SPASM achieves F1 = 91%, outperforming state-of-the-art baseline models by 32% (p < 0.05). Also, AC-SPASM obtains precision@456 = 73% and NDCG@456 = 94% in detecting and prioritizing unknown potential ADE signals for further investigation. Furthermore, the results show that AC-SPASM is scalable to large datasets. CONCLUSIONS Our study demonstrates that taking into account the content authenticity and user credibility improves the detection of ADEs from social media. 
Our work generates hypotheses to reduce experts' guesswork in identifying unknown potential ADEs.
Affiliation(s)
- Tao Hoang
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Adelaide, South Australia 5095, Australia
- Jixue Liu
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Adelaide, South Australia 5095, Australia
- Nicole Pratt
  - School of Pharmacy and Medical Sciences, University of South Australia, City East Campus, North Terrace, Adelaide, South Australia 5000, Australia
- Vincent W Zheng
  - Advanced Digital Sciences Center, 1 Fusionopolis Way, #08-10 Connexis North Tower, Singapore, 138632, Singapore
- Kevin C Chang
  - Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N Goodwin Ave, Urbana, IL 61801, United States
- Elizabeth Roughead
  - School of Pharmacy and Medical Sciences, University of South Australia, City East Campus, North Terrace, Adelaide, South Australia 5000, Australia
- Jiuyong Li
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Adelaide, South Australia 5095, Australia
21
Convertino I, Ferraro S, Blandizzi C, Tuccori M. The usefulness of listening social media for pharmacovigilance purposes: a systematic review. Expert Opin Drug Saf 2018; 17:1081-1093. [DOI: 10.1080/14740338.2018.1531847] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 10/28/2022]
Affiliation(s)
- Irma Convertino
  - Unit of Pharmacology and Pharmacovigilance, University of Pisa, Pisa, Italy
- Sara Ferraro
  - Unit of Pharmacology and Pharmacovigilance, University of Pisa, Pisa, Italy
- Corrado Blandizzi
  - Unit of Pharmacology and Pharmacovigilance, University of Pisa, Pisa, Italy
  - Division of Pharmacology and Pharmacovigilance, Department of Clinical and Experimental Medicine, University Hospital of Pisa, Pisa, Italy
- Marco Tuccori
  - Unit of Pharmacology and Pharmacovigilance, University of Pisa, Pisa, Italy
  - Division of Pharmacology and Pharmacovigilance, Department of Clinical and Experimental Medicine, University Hospital of Pisa, Pisa, Italy
22
Tricco AC, Zarin W, Lillie E, Jeblee S, Warren R, Khan PA, Robson R, Pham B, Hirst G, Straus SE. Utility of social media and crowd-intelligence data for pharmacovigilance: a scoping review. BMC Med Inform Decis Mak 2018; 18:38. [PMID: 29898743 PMCID: PMC6001022 DOI: 10.1186/s12911-018-0621-y] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 10/19/2017] [Accepted: 05/31/2018] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND A scoping review to characterize the literature on the use of conversations in social media as a potential source of data for detecting adverse events (AEs) related to health products. METHODS Our specific research questions were (1) What social media listening platforms exist to detect adverse events related to health products, and what are their capabilities and characteristics? (2) What is the validity and reliability of data from social media for detecting these adverse events? MEDLINE, EMBASE, Cochrane Library, and relevant websites were searched from inception to May 2016. Any type of document (e.g., manuscripts, reports) that described the use of social media data for detecting health product AEs was included. Two reviewers independently screened citations and full-texts, and one reviewer and one verifier performed data abstraction. Descriptive synthesis was conducted. RESULTS After screening 3631 citations and 321 full-texts, 70 unique documents with 7 companion reports available from 2001 to 2016 were included. Forty-six documents (66%) described an automated or semi-automated information extraction system to detect health product AEs from social media conversations (in the developmental phase). Seven pre-existing information extraction systems to mine social media data were identified in eight documents. Nineteen documents compared AEs reported in social media data with validated data and found consistent AE discovery in all except two documents. None of the documents reported the validity and reliability of the overall system, but some reported on the performance of individual steps in processing the data. The validity and reliability results were found for the following steps in the data processing pipeline: data de-identification (n = 1), concept identification (n = 3), concept normalization (n = 2), and relation extraction (n = 8). The methods varied widely, and some approaches yielded better results than others. 
CONCLUSIONS Our results suggest that the use of social media conversations for pharmacovigilance is in its infancy. Although social media data has the potential to supplement data from regulatory agency databases; is able to capture less frequently reported AEs; and can identify AEs earlier than official alerts or regulatory changes, the utility and validity of the data source remains under-studied. TRIAL REGISTRATION Open Science Framework ( https://osf.io/kv9hu/ ).
Affiliation(s)
- Andrea C. Tricco
  - Li Ka Shing Knowledge Institute of St. Michael’s Hospital, 209 Victoria Street, East Building, Toronto, ON M5B 1W8 Canada
  - Epidemiology Division, Dalla Lana School of Public Health, University of Toronto, 6th Floor, 155 College St, Toronto, ON M5T 3M7 Canada
- Wasifa Zarin
  - Li Ka Shing Knowledge Institute of St. Michael’s Hospital, 209 Victoria Street, East Building, Toronto, ON M5B 1W8 Canada
- Erin Lillie
  - Li Ka Shing Knowledge Institute of St. Michael’s Hospital, 209 Victoria Street, East Building, Toronto, ON M5B 1W8 Canada
- Serena Jeblee
  - Department of Computer Science, University of Toronto, 10 King’s College Road, Toronto, ON M5S 3G4 Canada
- Rachel Warren
  - Li Ka Shing Knowledge Institute of St. Michael’s Hospital, 209 Victoria Street, East Building, Toronto, ON M5B 1W8 Canada
- Paul A. Khan
  - Li Ka Shing Knowledge Institute of St. Michael’s Hospital, 209 Victoria Street, East Building, Toronto, ON M5B 1W8 Canada
- Reid Robson
  - Li Ka Shing Knowledge Institute of St. Michael’s Hospital, 209 Victoria Street, East Building, Toronto, ON M5B 1W8 Canada
- Ba’ Pham
  - Li Ka Shing Knowledge Institute of St. Michael’s Hospital, 209 Victoria Street, East Building, Toronto, ON M5B 1W8 Canada
- Graeme Hirst
  - Department of Computer Science, University of Toronto, 10 King’s College Road, Toronto, ON M5S 3G4 Canada
- Sharon E. Straus
  - Li Ka Shing Knowledge Institute of St. Michael’s Hospital, 209 Victoria Street, East Building, Toronto, ON M5B 1W8 Canada
  - Department of Geriatric Medicine, Faculty of Medicine, University of Toronto, 27 Kings College Circle, Toronto, ON M5S 1A1 Canada
23
Karapetiantz P, Bellet F, Audeh B, Lardon J, Leprovost D, Aboukhamis R, Morlane-Hondère F, Grouin C, Burgun A, Katsahian S, Jaulent MC, Beyens MN, Lillo-Le Louët A, Bousquet C. Descriptions of Adverse Drug Reactions Are Less Informative in Forums Than in the French Pharmacovigilance Database but Provide More Unexpected Reactions. Front Pharmacol 2018; 9:439. [PMID: 29765326 PMCID: PMC5938397 DOI: 10.3389/fphar.2018.00439] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/31/2018] [Accepted: 04/13/2018] [Indexed: 01/28/2023] Open
Abstract
Background: Social media have drawn attention for their potential use in Pharmacovigilance. Recent work showed that it is possible to extract information concerning adverse drug reactions (ADRs) from posts in social media. The main objective of the Vigi4MED project was to evaluate the relevance and quality of the information shared by patients on web forums about drug safety and its potential utility for pharmacovigilance. Methods: After selecting websites of interest, we manually evaluated the relevance of the content of posts for pharmacovigilance related to six drugs (agomelatine, baclofen, duloxetine, exenatide, strontium ranelate, and tetrazepam). We compared forums to the French Pharmacovigilance Database (FPVD) to (1) evaluate whether they contained relevant information to characterize a pharmacovigilance case report (patient’s age and sex; treatment indication, dose and duration; time-to-onset (TTO) and outcome of the ADR, and drug dechallenge and rechallenge) and (2) perform impact analysis (nature, seriousness, unexpectedness, and outcome of the ADR). Results: The cases in the FPVD were significantly more informative than posts in forums for patient description (age, sex), treatment description (dose, duration, TTO), and outcome of the ADR, but the indication for the treatment was more often found in forums. Cases were more often serious in the FPVD than in forums (46% vs. 4%), but forums more often contained an unexpected ADR than the FPVD (24% vs. 17%). Moreover, 197 unexpected ADRs identified in forums were absent from the FPVD and the distribution of the MedDRA System Organ Classes (SOCs) was different between the two data sources. Discussion: This study is the first to evaluate if patients’ posts may qualify as potential and informative case reports that should be stored in a pharmacovigilance database in the same way as case reports submitted by health professionals. 
The posts were less informative (except for the indication) and focused on less serious ADRs than the FPVD cases, but more unexpected ADRs were presented in forums than in the FPVD and their SOCs were different. Thus, web forums should be considered as a secondary, but complementary source for pharmacovigilance.
Affiliation(s)
- Pierre Karapetiantz
  - Sorbonne Université, INSERM, Université Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, Paris, France
- Florelle Bellet
  - Centre Régional de Pharmacovigilance, Centre Hospitalier Universitaire de Saint-Étienne, Hôpital Nord, Saint-Étienne, France
- Bissan Audeh
  - Université de Lyon, IMT Mines Saint-Etienne, Institut Henri Fayol, Département ISI, Université Jean Monnet, Institut d'Optique Graduate School, Centre National de la Recherche Scientifique, Laboratoire Hubert Curien, Saint-Étienne, France
- Jérémy Lardon
  - Sorbonne Université, INSERM, Université Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, Paris, France
- Damien Leprovost
  - Sorbonne Université, INSERM, Université Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, Paris, France
- Rim Aboukhamis
  - Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
- Cyril Grouin
  - LIMSI, CNRS, Université Paris-Saclay, Orsay, France
- Anita Burgun
  - INSERM UMRS1138 Centre de Recherche des Cordeliers, Paris, France
  - Département d'Informatique Médicale, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
- Sandrine Katsahian
  - INSERM UMRS1138 Centre de Recherche des Cordeliers, Paris, France
  - Département d'Informatique Médicale, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
- Marie-Christine Jaulent
  - Sorbonne Université, INSERM, Université Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, Paris, France
- Marie-Noëlle Beyens
  - Centre Régional de Pharmacovigilance, Centre Hospitalier Universitaire de Saint-Étienne, Hôpital Nord, Saint-Étienne, France
- Agnès Lillo-Le Louët
  - Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France
- Cédric Bousquet
  - Sorbonne Université, INSERM, Université Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, Paris, France
24
Beyond vector space model for hierarchical Arabic text classification: A Markov chain approach. Inf Process Manag 2018. [DOI: 10.1016/j.ipm.2017.10.003] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 11/20/2022]
25
SSEL-ADE: A semi-supervised ensemble learning framework for extracting adverse drug events from social media. Artif Intell Med 2017; 84:34-49. [PMID: 29111222 DOI: 10.1016/j.artmed.2017.10.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/13/2017] [Revised: 08/28/2017] [Accepted: 10/15/2017] [Indexed: 11/21/2022]
Abstract
With the development of Web 2.0 technology, social media websites have become lucrative but under-explored data sources for extracting adverse drug events (ADEs), which is a serious health problem. Besides ADE, other semantic relation types (e.g., drug indication and beneficial effect) could hold between the drug and adverse event mentions, making ADE relation extraction - distinguishing ADE relationship from other relation types - necessary. However, conducting ADE relation extraction in social media environment is not a trivial task because of the expertise-dependent, time-consuming and costly annotation process, and the feature space's high-dimensionality attributed to intrinsic characteristics of social media data. This study aims to develop a framework for ADE relation extraction using patient-generated content in social media with better performance than that delivered by previous efforts. To achieve the objective, a general semi-supervised ensemble learning framework, SSEL-ADE, was developed. The framework exploited various lexical, semantic, and syntactic features, and integrated ensemble learning and semi-supervised learning. A series of experiments were conducted to verify the effectiveness of the proposed framework. Empirical results demonstrate the effectiveness of each component of SSEL-ADE and reveal that our proposed framework outperforms most of existing ADE relation extraction methods The SSEL-ADE can facilitate enhanced ADE relation extraction performance, thereby providing more reliable support for pharmacovigilance. Moreover, the proposed semi-supervised ensemble methods have the potential of being applied to effectively deal with other social media-based problems.
Collapse
|
26
|
Al-Thuhli A, Al-Badawi M, Baghdadi Y, Al-Hamdani A. A Framework for Interfacing Unstructured Data Into Business Process From Enterprise Social Networks. INTERNATIONAL JOURNAL OF ENTERPRISE INFORMATION SYSTEMS 2017. [DOI: 10.4018/ijeis.2017100102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
The increased number of Enterprise Social Networks (ESN) business applications has had a major impact on improving organizations' business processes by allowing human interactions to be involved in these processes. However, these applications generate unstructured data, which creates barriers and challenges to offering the data in the form of web services in a SOA environment, which in turn negatively impacts the business process. In this context, the authors propose a framework to interface ESN unstructured data into business processes (BP) using text mining techniques. Term frequency-inverse document frequency (TF-IDF) is used as the weighting scheme in this framework. Cosine similarity and k-means are then used, respectively, to find similar values across different documents and to cluster documents into groups. The evaluation of the framework shows promising results for retrieving unstructured social data. These results can be published into the SOA enterprise service bus using RESTful web services.
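The three techniques named in this abstract (TF-IDF weighting, cosine similarity, and k-means clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, the unsmoothed IDF, the choice of the first k documents as seeds, and the use of cosine similarity for cluster assignment are all assumptions made for illustration, and the sample documents are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kmeans(vectors, k, iterations=10):
    """Tiny k-means variant: seeds are the first k vectors (assumption),
    and each vector joins the centroid it is most cosine-similar to."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                total = Counter()
                for member in members:
                    for term, w in member.items():
                        total[term] += w
                centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels


# Invented sample posts, ordered so the two seeds cover different topics.
docs = [
    "invoice payment overdue",
    "holiday party friday",
    "payment invoice reminder",
    "friday party invitation",
]
vectors = tfidf_vectors(docs)
print(kmeans(vectors, k=2))  # prints [0, 1, 0, 1]
```

A production pipeline would more likely rely on an established text-mining library and a better seeding strategy; the sketch only shows how the three steps fit together.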
Affiliation(s)
- Amjed Al-Thuhli
- Department of Computer Science, Sultan Qaboos University, Muscat, Oman
- Youcef Baghdadi
- Department of Computer Science, Sultan Qaboos University, Muscat, Oman
|
27
|
Liu Y, Shi J, Chen Y. Patient-centered and experience-aware mining for effective adverse drug reaction discovery in online health forums. J Assoc Inf Sci Technol 2017. [DOI: 10.1002/asi.23929] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/11/2022]
Affiliation(s)
- Yunzhong Liu
- Computer Science & Engineering; Arizona State University
- Jinhe Shi
- Ying Wu College of Computing Sciences; New Jersey Institute of Technology
- Yi Chen
- Martin Tuchman School of Management; New Jersey Institute of Technology
|
28
|
Bousquet C, Dahamna B, Guillemin-Lanne S, Darmoni SJ, Faviez C, Huot C, Katsahian S, Leroux V, Pereira S, Richard C, Schück S, Souvignet J, Lillo-Le Louët A, Texier N. The Adverse Drug Reactions from Patient Reports in Social Media Project: Five Major Challenges to Overcome to Operationalize Analysis and Efficiently Support Pharmacovigilance Process. JMIR Res Protoc 2017; 6:e179. [PMID: 28935617 PMCID: PMC5629348 DOI: 10.2196/resprot.6463] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 11/14/2016] [Revised: 06/19/2017] [Accepted: 07/12/2017] [Indexed: 11/13/2022] Open
Abstract
Background Adverse drug reactions (ADRs) are an important cause of morbidity and mortality. The classical pharmacovigilance process is limited by underreporting, which justifies the current interest in new knowledge sources such as social media. The Adverse Drug Reactions from Patient Reports in Social Media (ADR-PRISM) project aims to extract ADRs reported by patients in these media. We identified 5 major challenges to overcome to operationalize the analysis of patient posts: (1) variable quality of information on social media, (2) guarantee of data privacy, (3) response to pharmacovigilance expert expectations, (4) identification of relevant information within Web pages, and (5) robust and evolutive architecture. Objective This article aims to describe the current state of advancement of the ADR-PRISM project by focusing on the solutions we have chosen to address these 5 major challenges. Methods In this article, we propose methods and describe the advancement of this project on several aspects: (1) a quality driven approach for selecting relevant social media for the extraction of knowledge on potential ADRs, (2) an assessment of ethical issues and French regulation for the analysis of data on social media, (3) an analysis of pharmacovigilance expert requirements when reviewing patient posts on the Internet, (4) an extraction method based on natural language processing, pattern based matching, and selection of relevant medical concepts in reference terminologies, and (5) specifications of a component-based architecture for the monitoring system.
Results Considering the 5 major challenges, we (1) selected a set of 21 validated criteria for selecting social media to support the extraction of potential ADRs, (2) proposed solutions to guarantee data privacy of patients posting on the Internet, (3) took into account pharmacovigilance expert requirements with use case diagrams and scenarios, (4) built domain-specific knowledge resources embedding a lexicon, morphological rules, context rules, semantic rules, syntactic rules, and post-analysis processing, and (5) proposed a component-based architecture that allows storage of big data and accessibility to third-party applications through Web services. Conclusions We demonstrated the feasibility of implementing a component-based architecture that allows collection of patient posts on the Internet, near real-time processing of those posts including annotation, and storage in big data structures. In the next steps, we will evaluate the posts identified by the system in social media to clarify the interest and relevance of such an approach to improve conventional pharmacovigilance processes based on spontaneous reporting.
Affiliation(s)
- Cedric Bousquet
- Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, U1142, Institut National de la Santé et de la Recherche Médicale, Paris, France; Service de Santé Publique et de l'Information Médicale, Centre Hospitalier Universitaire de Saint Etienne, Saint-Etienne, France
- Badisse Dahamna
- Department of Biomedical Informatics, Rouen University Hospital, Rouen, France
- Stefan J Darmoni
- Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, U1142, Institut National de la Santé et de la Recherche Médicale, Paris, France; Department of Biomedical Informatics, Rouen University Hospital, Rouen, France
- Sandrine Katsahian
- Unité mixte de recherche 1138, équipe 22, Institut National de la Santé et de la Recherche Médicale, Centre de Recherche des Cordeliers, Paris, France
- Julien Souvignet
- Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, U1142, Institut National de la Santé et de la Recherche Médicale, Paris, France
- Agnès Lillo-Le Louët
- Assistance Publique-Hôpitaux de Paris, Hôpital Européen Georges Pompidou, Centre Régional de Pharmacovigilance, Paris, France
|
29
|
Golder S, Ahmed S, Norman G, Booth A. Attitudes Toward the Ethics of Research Using Social Media: A Systematic Review. J Med Internet Res 2017; 19:e195. [PMID: 28588006 PMCID: PMC5478799 DOI: 10.2196/jmir.7082] [Citation(s) in RCA: 110] [Impact Index Per Article: 15.7] [Received: 12/01/2016] [Revised: 03/13/2017] [Accepted: 03/30/2017] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Although primarily used for social networking and often used for social support and dissemination, data on social media platforms are increasingly being used to facilitate research. However, the ethical challenges in conducting social media research remain of great concern. Although much debated in the literature, it is the views of the public that are most pertinent to inform future practice. OBJECTIVE The aim of our study was to ascertain attitudes on the ethical considerations of using social media as a data source for research as expressed by social media users and researchers. METHODS A systematic review was conducted, wherein 16 databases and 2 Internet search engines were searched in addition to handsearching, reference checking, citation searching, and contacting authors and experts. Studies that conducted any qualitative methods to collect data on attitudes on the ethical implications of research using social media were included. Quality assessment was conducted using the quality of reporting tool (QuaRT) and findings analyzed using inductive thematic synthesis. RESULTS In total, 17 studies met the inclusion criteria. Attitudes varied from overly positive with people expressing the views about the essential nature of such research for the public good, to very concerned with views that social media research should not happen. Underlying reasons for this variation related to issues such as the purpose and quality of the research, the researcher affiliation, and the potential harms. The methods used to conduct the research were also important. Many respondents were positive about social media research while adding caveats such as the need for informed consent or use restricted to public platforms only. CONCLUSIONS Many conflicting issues contribute to the complexity of good ethical practice in social media research. However, this should not deter researchers from conducting social media research. 
Each Internet research project requires an individual assessment of its own ethical issues. Guidelines on ethical conduct should be based on current evidence and standardized to avoid discrepancies between, and duplication across, different institutions, taking into consideration different jurisdictions.
Affiliation(s)
- Su Golder
- Department of Health Sciences, University of York, York, United Kingdom
- Shahd Ahmed
- Department of Health Sciences, University of York, York, United Kingdom
- Gill Norman
- School of Nursing, Midwifery & Social Work, University of Manchester, Manchester, United Kingdom
- Andrew Booth
- School of Health and Related Research (ScHARR), University of Sheffield, Sheffield, United Kingdom
|
30
|
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Affiliation(s)
- Martin Krallinger
- Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
- Obdulia Rabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Anália Lourenço
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain; Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain; CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Julen Oyarzabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Alfonso Valencia
- Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain; Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
|
31
|
Alvaro N, Miyao Y, Collier N. TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations. JMIR Public Health Surveill 2017; 3:e24. [PMID: 28468748 PMCID: PMC5438461 DOI: 10.2196/publichealth.6396] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 07/28/2016] [Revised: 11/24/2016] [Accepted: 03/20/2017] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Work on pharmacovigilance systems using texts from PubMed and Twitter typically targets different elements and uses different annotation guidelines, resulting in a scenario where there is no comparable set of documents from both Twitter and PubMed annotated in the same manner. OBJECTIVE This study aimed to provide a comparable corpus of texts from PubMed and Twitter that can be used to study drug reports from these two sources of information, allowing researchers in the area of pharmacovigilance using natural language processing (NLP) to perform experiments to better understand the similarities and differences between drug reports in Twitter and PubMed. METHODS We produced a corpus comprising 1000 tweets and 1000 PubMed sentences selected using the same strategy and annotated at entity level by the same experts (pharmacists) using the same set of guidelines. RESULTS The resulting corpus, annotated by two pharmacists, comprises semantically correct annotations for a set of drugs, diseases, and symptoms. This corpus contains the annotations for 3144 entities, 2749 relations, and 5003 attributes. CONCLUSIONS We present a corpus that is unique in its characteristics as this is the first corpus for pharmacovigilance curated from Twitter messages and PubMed sentences using the same data selection and annotation strategies. We believe this corpus will be of particular interest for researchers willing to compare results from pharmacovigilance systems (eg, classifiers and named entity recognition systems) when using data from Twitter and from PubMed. We hope that given the comprehensive set of drug names and the annotated entities and relations, this corpus becomes a standard resource to compare results from different pharmacovigilance studies in the area of NLP.
Affiliation(s)
- Nestor Alvaro
- National Institute of Informatics, Department of Informatics, Tokyo, Japan
- The Graduate University for Advanced Studies (SOKENDAI), Kanagawa, Japan
- Yusuke Miyao
- National Institute of Informatics, Department of Informatics, Tokyo, Japan
- The Graduate University for Advanced Studies (SOKENDAI), Kanagawa, Japan
- Nigel Collier
- Faculty of Modern & Medieval Languages, Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, United Kingdom
|
32
|
Clinicians' Reports in Electronic Health Records Versus Patients' Concerns in Social Media: A Pilot Study of Adverse Drug Reactions of Aspirin and Atorvastatin. Drug Saf 2016; 39:241-50. [PMID: 26715498 DOI: 10.1007/s40264-015-0381-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 10/22/2022]
Abstract
INTRODUCTION Large databases of clinician reported (e.g., allergy repositories) and patient reported (e.g., social media) adverse drug reactions (ADRs) exist; however, whether patients and clinicians report the same concerns is not clear. OBJECTIVES Our objective was to compare electronic health record data and social media data to better understand differences and similarities between clinician-reported ADRs and patients' concerns regarding aspirin and atorvastatin. METHODS This pilot study explored a large repository of electronic health record data and social media data for clinician-reported ADRs and patients' concerns for two common medications: aspirin (n = 31,817 ADRs accessible in clinical data; n = 19,186 potential ADRs accessible in social media data) and atorvastatin (n = 15,047 ADRs accessible in clinical data; n = 23,408 potential ADRs accessible in social media data). RESULTS We found that the most frequently reported ADRs matched the most frequent patients' concerns. However, several less frequently reported reactions were more prevalent on social media (i.e., aspirin-induced hypoglycemia was discussed only on social media). Overall, we found a relatively strong positive and statistically significant correlation between the frequency ranking of reactions and patients' concerns for atorvastatin (Pearson's r = 0.61, p < 0.001) but not for aspirin (Pearson's r = 0.1, p = 0.69). CONCLUSION Future studies should develop further natural language methods for a more detailed data analysis (i.e., identifying causality and temporal aspects in the social media data).
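For readers unfamiliar with the statistic reported in this abstract, Pearson's r between two frequency rankings can be computed as in the short Python sketch below. The function name and the two example rankings are hypothetical and are not taken from the study; the sketch also does not compute the p-values that the abstract reports.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical frequency ranks of five reactions in two sources.
clinical_rank = [1, 2, 3, 4, 5]
social_media_rank = [2, 1, 4, 3, 5]
print(pearson_r(clinical_rank, social_media_rank))  # prints 0.8
```

A value near 1 means the two sources rank the reactions in nearly the same order, while a value near 0 (as reported for aspirin) means the rankings are essentially unrelated.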
|
33
|
A New Data Representation Based on Training Data Characteristics to Extract Drug Name Entity in Medical Text. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2016; 2016:3483528. [PMID: 27843447 PMCID: PMC5098107 DOI: 10.1155/2016/3483528] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/27/2016] [Revised: 08/08/2016] [Accepted: 09/18/2016] [Indexed: 11/18/2022]
Abstract
One essential task in information extraction from the medical corpus is drug name recognition. Compared with text from other domains, medical text mining poses more challenges, for example, more unstructured text, the rapid addition of new terms, a wide range of name variations for the same drug, the lack of labeled datasets and external knowledge, and multiple token representations for a single drug name. Although many approaches have been proposed to tackle the task, some problems remain, with poor F-score performance (less than 0.75). This paper presents a new treatment in data representation techniques to overcome some of those challenges. We propose three data representation techniques based on the characteristics of word distribution and word similarities as a result of word embedding training. The first technique is evaluated with the standard NN model, that is, MLP. The second technique involves two deep network classifiers, that is, DBN and SAE. The third technique represents the sentence as a sequence that is evaluated with a recurrent NN model, that is, LSTM. In extracting the drug name entities, the third technique gives the best F-score performance compared to the state of the art, with its average F-score being 0.8645.
|
34
|
Korkontzelos I, Nikfarjam A, Shardlow M, Sarker A, Ananiadou S, Gonzalez GH. Analysis of the effect of sentiment analysis on extracting adverse drug reactions from tweets and forum posts. J Biomed Inform 2016; 62:148-58. [PMID: 27363901 PMCID: PMC4981644 DOI: 10.1016/j.jbi.2016.06.007] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 11/19/2015] [Revised: 06/03/2016] [Accepted: 06/22/2016] [Indexed: 12/03/2022]
Abstract
Sentiment analysis features are useful in spotting adverse drug reactions in text. Sentiment analysis features help to distinguish adverse drug reactions and indications. Posts about adverse drug reactions are associated with negative feelings.
Objective The abundance of text available in social media and health related forums along with the rich expression of public opinion have recently attracted the interest of the public health community to use these sources for pharmacovigilance. Based on the intuition that patients post about Adverse Drug Reactions (ADRs) expressing negative sentiments, we investigate the effect of sentiment analysis features in locating ADR mentions. Methods We enrich the feature space of a state-of-the-art ADR identification method with sentiment analysis features. Using a corpus of posts from the DailyStrength forum and tweets annotated for ADR and indication mentions, we evaluate the extent to which sentiment analysis features help in locating ADR mentions and distinguishing them from indication mentions. Results Evaluation results show that sentiment analysis features marginally improve ADR identification in tweets and health related forum posts. Adding sentiment analysis features achieved a statistically significant F-measure increase from 72.14% to 73.22% in the Twitter part of an existing corpus using its original train/test split. Using stratified 10 × 10-fold cross-validation, statistically significant F-measure increases were shown in the DailyStrength part of the corpus, from 79.57% to 80.14%, and in the Twitter part of the corpus, from 66.91% to 69.16%. Moreover, sentiment analysis features are shown to reduce the number of ADRs being recognized as indications. Conclusion This study shows that adding sentiment analysis features can marginally improve the performance of even a state-of-the-art ADR identification method. This improvement can be of use to pharmacovigilance practice, due to the rapidly increasing popularity of social media and health forums.
Affiliation(s)
- Ioannis Korkontzelos
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, M1 7DN Manchester, United Kingdom.
- Azadeh Nikfarjam
- Department of Biomedical Informatics, Arizona State University, Mayo Clinic, Samuel C. Johnson Research Building, 13212 East Shea Boulevard, Scottsdale, AZ 85259, United States.
- Matthew Shardlow
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, M1 7DN Manchester, United Kingdom.
- Abeed Sarker
- Department of Biomedical Informatics, Arizona State University, Mayo Clinic, Samuel C. Johnson Research Building, 13212 East Shea Boulevard, Scottsdale, AZ 85259, United States.
- Sophia Ananiadou
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, M1 7DN Manchester, United Kingdom.
- Graciela H Gonzalez
- Department of Biomedical Informatics, Arizona State University, Mayo Clinic, Samuel C. Johnson Research Building, 13212 East Shea Boulevard, Scottsdale, AZ 85259, United States.
|
35
|
Bravo À, Li TS, Su AI, Good BM, Furlong LI. Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw094. [PMID: 27307137 PMCID: PMC4908671 DOI: 10.1093/database/baw094] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 12/04/2015] [Accepted: 05/10/2016] [Indexed: 01/13/2023]
Abstract
Drug toxicity is a major concern for both regulatory agencies and the pharmaceutical industry. In this context, text-mining methods for the identification of drug side effects from free text are key for the development of up-to-date knowledge sources on drug adverse reactions. We present a new system for identification of drug side effects from the literature that combines three approaches: machine learning, rule- and knowledge-based approaches. This system has been developed to address Task 3.B of the BioCreative V challenge (BC5) dealing with Chemical-induced Disease (CID) relations. The first two approaches focus on identifying relations at the sentence level, while the knowledge-based approach is applied both at sentence and abstract levels. The machine learning method is based on the BeFree system using two corpora as training data: the annotated data provided by the CID task organizers and a new CID corpus developed by crowdsourcing. Different combinations of results from the three strategies were selected for each run of the challenge. In the final evaluation setting, the system achieved the highest Recall of the challenge (63%). By performing an error analysis, we identified the main causes of misclassifications and areas for improving our system, and highlighted the need for consistent gold standard data sets for advancing the state of the art in text mining of drug side effects. Database URL: https://zenodo.org/record/29887?ln=en#.VsL3yDLWR_V
Affiliation(s)
- Àlex Bravo
- Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain
- Tong Shu Li
- Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA, USA
- Andrew I Su
- Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA, USA
- Benjamin M Good
- Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA, USA
- Laura I Furlong
- Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain
|
36
|
Golder S, Norman G, Loke YK. Systematic review on the prevalence, frequency and comparative value of adverse events data in social media. Br J Clin Pharmacol 2015; 80:878-88. [PMID: 26271492 PMCID: PMC4594731 DOI: 10.1111/bcp.12746] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 06/12/2015] [Revised: 07/16/2015] [Accepted: 08/03/2015] [Indexed: 11/27/2022] Open
Abstract
AIM The aim of this review was to summarize the prevalence, frequency and comparative value of information on the adverse events of healthcare interventions from user comments and videos in social media. METHODS A systematic review of assessments of the prevalence or type of information on adverse events in social media was undertaken. Sixteen databases and two internet search engines were searched in addition to handsearching, reference checking and contacting experts. The results were sifted independently by two researchers. Data extraction and quality assessment were carried out by one researcher and checked by a second. The quality assessment tool was devised in-house and a narrative synthesis of the results followed. RESULTS From 3064 records, 51 studies met the inclusion criteria. The studies assessed over 174 social media sites with discussion forums (71%) being the most popular. The overall prevalence of adverse events reports in social media varied from 0.2% to 8% of posts. Twenty-nine studies compared the results from searching social media with using other data sources to identify adverse events. There was general agreement that a higher frequency of adverse events was found in social media and that this was particularly true for 'symptom' related and 'mild' adverse events. Those adverse events that were under-represented in social media were laboratory-based and serious adverse events. CONCLUSIONS Reports of adverse events are identifiable within social media. However, there is considerable heterogeneity in the frequency and type of events reported, and the reliability or validity of the data has not been thoroughly evaluated.
Affiliation(s)
- Su Golder
- Department of Health Sciences, University of York, York, YO10 5DD, UK
- Gill Norman
- School of Nursing, Midwifery & Social Work, University of Manchester, Room 5.328, Jean McFarlane Building, Oxford Road, Manchester, M13 9PL, UK
- Yoon K Loke
- Norwich Medical School, University of East Anglia, Norwich, NR4 7TJ, UK
|
37
|
Sloane R, Osanlou O, Lewis D, Bollegala D, Maskell S, Pirmohamed M. Social media and pharmacovigilance: A review of the opportunities and challenges. Br J Clin Pharmacol 2015; 80:910-20. [PMID: 26147850 PMCID: PMC4594734 DOI: 10.1111/bcp.12717] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 06/15/2015] [Revised: 06/29/2015] [Accepted: 07/03/2015] [Indexed: 01/23/2023] Open
Abstract
Adverse drug reactions come at a considerable cost to society. Social media are a potentially invaluable reservoir of information for pharmacovigilance, yet their true value remains to be fully understood. In order to realize the benefits social media hold, a number of technical, regulatory and ethical challenges remain to be addressed. We outline these key challenges, identify relevant current research, and present possible solutions.
Affiliation(s)
- Richard Sloane
- Department of Electrical Engineering and Electronics, University of Liverpool, L69 3GJ, UK
- Department of Molecular and Clinical Pharmacology, University of Liverpool, L69 3GL, UK
- Department of Computer Science, University of Liverpool, L69 3BX, UK
- Orod Osanlou
- Department of Molecular and Clinical Pharmacology, University of Liverpool, L69 3GL, UK
- Royal Liverpool and Broadgreen University Hospital NHS Trust, Liverpool, L7 8XP, UK
- David Lewis
- Drug Safety & Epidemiology, Novartis Pharma AG, Postfach, CH-4002, Basel, Switzerland
- Simon Maskell
- Department of Electrical Engineering and Electronics, University of Liverpool, L69 3GJ, UK
- Department of Computer Science, University of Liverpool, L69 3BX, UK
- Munir Pirmohamed
- Department of Molecular and Clinical Pharmacology, University of Liverpool, L69 3GL, UK
- Royal Liverpool and Broadgreen University Hospital NHS Trust, Liverpool, L7 8XP, UK
|
38
|
Lardon J, Abdellaoui R, Bellet F, Asfari H, Souvignet J, Texier N, Jaulent MC, Beyens MN, Burgun A, Bousquet C. Adverse Drug Reaction Identification and Extraction in Social Media: A Scoping Review. J Med Internet Res 2015; 17:e171. [PMID: 26163365 PMCID: PMC4526988 DOI: 10.2196/jmir.4304] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 01/30/2015] [Revised: 04/09/2015] [Accepted: 04/22/2015] [Indexed: 02/06/2023] Open
Abstract
Background The underreporting of adverse drug reactions (ADRs) through traditional reporting channels is a limitation in the efficiency of the current pharmacovigilance system. Patients’ experiences with drugs that they report on social media represent a new source of data that may have some value in postmarketing safety surveillance. Objective A scoping review was undertaken to explore the breadth of evidence about the use of social media as a new source of knowledge for pharmacovigilance. Methods Daubt et al’s recommendations for scoping reviews were followed. The research questions were as follows: How can social media be used as a data source for postmarketing drug surveillance? What are the available methods for extracting data? What are the different ways to use these data? We queried PubMed, Embase, and Google Scholar to extract relevant articles that were published before June 2014 and with no lower date limit. Two pairs of reviewers independently screened the selected studies and proposed two themes of review: manual ADR identification (theme 1) and automated ADR extraction from social media (theme 2). Descriptive characteristics were collected from the publications to create a database for themes 1 and 2. Results Of the 1032 citations from PubMed and Embase, 11 were relevant to the research question. An additional 13 citations were added after further research on the Internet and in reference lists. Themes 1 and 2 explored 11 and 13 articles, respectively. Ways of approaching the use of social media as a pharmacovigilance data source were identified. Conclusions This scoping review noted multiple methods for identifying target data, extracting them, and evaluating the quality of medical information from social media. It also showed some remaining gaps in the field. Studies related to the identification theme usually failed to accurately assess the completeness, quality, and reliability of the data that were analyzed from social media. Regarding extraction, no study proposed a generic approach to easily adding a new site or data source. Additional studies are required to precisely determine the role of social media in the pharmacovigilance system.
Affiliation(s)
- Jérémy Lardon
- Université Paris 13, Sorbonne Paris Cité, Laboratoire d'Informatique Médicale et d'Ingénieurie des Connaissances en e-Santé (LIMICS), (Unité Mixte de Recherche en Santé, UMR_S 1142), F-93430, Villetaneuse, France, Sorbonne Universités, University of Pierre and Marie Curie (UPMC) Université Paris 06, Unité Mixte de Recherche en Santé (UMR_S) 1142, Laboratoire d'Informatique Médicale et d'Ingénieurie des Connaissances en e-Santé (LIMICS), F-75006, Institut National de la Santé et de la Recherche Médicale (INSERM), U1142, Laboratoire d'Informatique Médicale et d'Ingénieurie des Connaissances en e-Santé (LIMICS), F-75006, Paris, France.
|
39
|
Sarker A, Ginn R, Nikfarjam A, O'Connor K, Smith K, Jayaraman S, Upadhaya T, Gonzalez G. Utilizing social media data for pharmacovigilance: A review. J Biomed Inform 2015; 54:202-12. [PMID: 25720841 DOI: 10.1016/j.jbi.2015.02.004] [Citation(s) in RCA: 238] [Impact Index Per Article: 26.4] [Received: 10/23/2014] [Revised: 01/02/2015] [Accepted: 02/15/2015] [Indexed: 10/23/2022]
Abstract
OBJECTIVE Automatic monitoring of Adverse Drug Reactions (ADRs), defined as adverse patient outcomes caused by medications, is a challenging research problem that is currently receiving significant attention from the medical informatics community. In recent years, user-posted data on social media, primarily due to its sheer volume, has become a useful resource for ADR monitoring. Research using social media data has progressed using various data sources and techniques, making it difficult to compare distinct systems and their performances. In this paper, we perform a methodical review to characterize the different approaches to ADR detection/extraction from social media, and their applicability to pharmacovigilance. In addition, we present a potential systematic pathway to ADR monitoring from social media. METHODS We identified studies describing approaches for ADR detection from social media from the Medline, Embase, Scopus and Web of Science databases, and the Google Scholar search engine. Studies that met our inclusion criteria were those that attempted to extract ADR information posted by users on any publicly available social media platform. We categorized the studies according to different characteristics such as primary ADR detection approach, size of corpus, data source(s), availability, and evaluation criteria. RESULTS Twenty-two studies met our inclusion criteria, with fifteen (68%) published within the last two years. However, publicly available annotated data is still scarce, and we found only six studies that made the annotations used publicly available, making system performance comparisons difficult. In terms of algorithms, supervised classification techniques to detect posts containing ADR mentions, and lexicon-based approaches for extraction of ADR mentions from texts have been the most popular. CONCLUSION Our review suggests that interest in the utilization of the vast amounts of available social media data for ADR monitoring is increasing. In terms of sources, both health-related and general social media data have been used for ADR detection: while health-related sources tend to contain higher proportions of relevant data, the volume of data from general social media websites is significantly higher. There is still a very limited amount of annotated data publicly available, and, as indicated by the promising results obtained by recent supervised learning approaches, there is a strong need to make such data available to the research community.
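As an illustration of the lexicon-based extraction approach named in the abstract above, the following is a minimal sketch, not a reconstruction of any reviewed system. The lexicon entries, function names, and sample post are all hypothetical; real systems draw on curated vocabularies and handle misspellings and colloquial phrasing, which this sketch does not attempt.

```python
import re

# Hypothetical toy lexicon of ADR terms (an assumption for illustration only).
ADR_LEXICON = {"nausea", "headache", "weight gain", "dry mouth"}


def build_pattern(lexicon):
    """Compile one case-insensitive regex matching any lexicon term as a whole phrase."""
    # Longest terms first, so multi-word terms take priority over shorter ones.
    terms = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def extract_adr_mentions(post, pattern):
    """Return (normalized term, start offset, end offset) for each lexicon hit in a post."""
    return [(m.group(0).lower(), m.start(), m.end()) for m in pattern.finditer(post)]


PATTERN = build_pattern(ADR_LEXICON)
post = "Started the new pills last week, now constant Headache and dry mouth."
mentions = extract_adr_mentions(post, PATTERN)
for term, start, end in mentions:
    print(term, start, end)
```

Exact matching of this kind misses inflected or misspelled forms (for example, "headaches" is not matched by the entry "headache"), which is one reason such lexicons are often combined with the supervised classifiers the abstract also mentions.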
Affiliation(s)
- Abeed Sarker
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, United States
- Rachel Ginn
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, United States
- Azadeh Nikfarjam
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, United States
- Karen O'Connor
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, United States
- Karen Smith
- Rueckert-Hartman College for Health Professions, Regis University, Denver, CO, United States
- Swetha Jayaraman
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, United States
- Tejaswi Upadhaya
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, United States
- Graciela Gonzalez
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, United States
|