51
|
Sampathkumar H, Chen XW, Luo B. Mining adverse drug reactions from online healthcare forums using hidden Markov model. BMC Med Inform Decis Mak 2014; 14:91. [PMID: 25341686 PMCID: PMC4283122 DOI: 10.1186/1472-6947-14-91] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Received: 12/24/2013] [Accepted: 08/18/2014] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Adverse Drug Reactions are one of the leading causes of injury or death among patients undergoing medical treatments. Not all Adverse Drug Reactions are identified before a drug is made available in the market. Current post-marketing drug surveillance methods, which are based purely on voluntary spontaneous reports, are unable to provide the early indications necessary to prevent the occurrence of such injuries or fatalities. The objective of this research is to extract reports of adverse drug side-effects from messages in online healthcare forums and use them as early indicators to assist in post-marketing drug surveillance. METHODS We treat the task of extracting adverse side-effects of drugs from healthcare forum messages as a sequence labeling problem and present a Hidden Markov Model (HMM) based Text Mining system that can be used to classify a message as containing drug side-effect information and then extract the adverse side-effect mentions from it. A manually annotated dataset from http://www.medications.com is used in the training and validation of the HMM based Text Mining system. RESULTS A 10-fold cross-validation on the manually annotated dataset yielded on average an F-Score of 0.76 from the HMM Classifier, in comparison to 0.575 from the Baseline classifier. Without the Plain Text Filter component as a part of the Text Processing module, the F-Score of the HMM Classifier was reduced to 0.378 on average, while absence of the HTML Filter component was found to have no impact. Reducing the Drug names dictionary size by half reduced the F-Score of the HMM Classifier to 0.359 on average, while a similar reduction to the side-effects dictionary yielded an F-Score of 0.651 on average. Adverse side-effects mined from http://www.medications.com and http://www.steadyhealth.com were found to match the Adverse Drug Reactions on the Drug Package Labels of several drugs. In addition, some novel adverse side-effects, which can be potential Adverse Drug Reactions, were also identified. CONCLUSIONS The results from the HMM based Text Miner encourage further enhancements to this approach. The mined novel side-effects can act as early indicators for health authorities to help focus their efforts in post-marketing drug surveillance.
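To illustrate what treating extraction as a sequence labeling problem with an HMM can look like, here is a minimal sketch of Viterbi decoding, the standard way to pick the most likely label sequence under an HMM. The abstract does not say which decoding procedure the authors used, and the label set, probabilities and example sentence below are all hypothetical values chosen for illustration, not figures from the paper.

```python
import math


def viterbi(tokens, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely state sequence for `tokens` under an HMM.

    Unseen (state, token) pairs get the small probability `floor`
    so that log() is always defined.
    """
    def log_emit(state, token):
        return math.log(emit_p[state].get(token, floor))

    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               previous state on that path)
    best = [{s: (math.log(start_p[s]) + log_emit(s, tokens[0]), None)
             for s in states}]
    for token in tokens[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, back = max(
                (prev[p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + log_emit(s, token), back)
        best.append(column)

    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: O = other, DRUG = drug name, EFFECT = side-effect.
states = ["O", "DRUG", "EFFECT"]
start_p = {"O": 0.8, "DRUG": 0.1, "EFFECT": 0.1}
trans_p = {
    "O": {"O": 0.7, "DRUG": 0.15, "EFFECT": 0.15},
    "DRUG": {"O": 0.8, "DRUG": 0.1, "EFFECT": 0.1},
    "EFFECT": {"O": 0.7, "DRUG": 0.1, "EFFECT": 0.2},
}
emit_p = {
    "O": {"i": 0.2, "took": 0.2, "and": 0.2, "got": 0.2, "a": 0.2},
    "DRUG": {"aspirin": 1.0},
    "EFFECT": {"headache": 0.5, "nausea": 0.5},
}

tokens = "i took aspirin and got nausea".split()
labels = viterbi(tokens, states, start_p, trans_p, emit_p)
print(list(zip(tokens, labels)))
```

In this toy setup the emission tables play the role of the drug-name and side-effect dictionaries, which is one plausible reading of why halving those dictionaries lowered the reported F-Score.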
Affiliation(s)
- Xue-wen Chen
- Dept. of Computer Science, Wayne State University, 48202 Detroit, USA
- Bo Luo
- EECS, University of Kansas, 66045 Lawrence, USA
|
52
|
Baldo P, De Paoli P. Pharmacovigilance in oncology: evaluation of current practice and future perspectives. J Eval Clin Pract 2014; 20:559-69. [PMID: 24909067 DOI: 10.1111/jep.12184] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Accepted: 04/23/2014] [Indexed: 12/11/2022]
Abstract
RATIONALE, AIMS AND OBJECTIVES Pharmacovigilance (PV), or drug safety monitoring, aims to improve patient safety through the detection and management of drug-related adverse reactions. It is implemented both by spontaneous reporting of adverse drug reactions (ADRs) and by careful detection of signals suggestive of drug toxicity. PV is an important clinical topic in clinical practice and pharmacotherapy, assuring the maintenance of a safe risk/benefit ratio throughout the commercial life cycle of a drug. METHODS We conducted a structured literature search on PubMed, Scopus, Cinahl and the Cochrane Library. We also performed manual searches in international databases of ADR individual reports to outline a structured profile on the topic. Our goal was to review key elements that affect safety monitoring of cancer drugs and their appropriate use, highlighting the strengths and weaknesses of PV in oncology. RESULTS This paper provides an understanding of the methodologies used by PV in current clinical practice and particularly in cancer drug therapy; a focus upon reporting of ADRs by health professionals and patients; and a focus upon methods used by PV to detect new signals of risk/harm related to medicines utilization. CONCLUSION To our knowledge, few articles focus upon the importance of PV and post-marketing surveillance of cancer drug therapies. Structured management of spontaneous reports of ADRs and data collection is essential to monitoring the safe use of drugs in this field in which pharmacotherapy is affected by high incidence of drug-related complications and by a narrow benefit/risk ratio.
Affiliation(s)
- Paolo Baldo
- Division of Pharmacy, Centro Di Riferimento Oncologico (CRO), Aviano, Italy
|
53
|
Abstract
While pharmacovigilance systems have made substantial progress in the past several decades, all pharmacovigilance systems face a common set of ongoing challenges in drug safety surveillance in five principal interrelated areas: engaging the public, collaboration and partnerships, incorporating informatics, adopting a global approach, and assessing the impact of efforts. In broad terms, these challenges are not new. Rather, advances in science and technology, along with more demanding societal expectations, have changed the nature of these challenges and provided new opportunities to move the field forward. Differences in organization and levels of development, as well as regional differences, necessarily imply that a single approach is not suitable for all regions, though sharing of best practices can help each region.
Affiliation(s)
- Gerald J Dal Pan
- US Food and Drug Administration, 10903 New Hampshire Ave., Building 22, Room 4304, Silver Spring, MD, USA
|
54
|
Abstract
OBJECTIVES Implementation of Electronic Health Record (EHR) systems continues to expand. The massive number of patient encounters results in high amounts of stored data. Transforming clinical data into knowledge to improve patient care has been the goal of biomedical informatics professionals for many decades, and this work is now increasingly recognized outside our field. In reviewing the literature for the past three years, we focus on "big data" in the context of EHR systems and we report on some examples of how secondary use of data has been put into practice. METHODS We searched PubMed database for articles from January 1, 2011 to November 1, 2013. We initiated the search with keywords related to "big data" and EHR. We identified relevant articles and additional keywords from the retrieved articles were added. Based on the new keywords, more articles were retrieved and we manually narrowed down the set utilizing predefined inclusion and exclusion criteria. RESULTS Our final review includes articles categorized into the themes of data mining (pharmacovigilance, phenotyping, natural language processing), data application and integration (clinical decision support, personal monitoring, social media), and privacy and security. CONCLUSION The increasing adoption of EHR systems worldwide makes it possible to capture large amounts of clinical data. There is an increasing number of articles addressing the theme of "big data", and the concepts associated with these articles vary. The next step is to transform healthcare big data into actionable knowledge.
Affiliation(s)
- M K Ross
- Lucila Ohno-Machado, Division of Biomedical Informatics, 9500 Gilman Drive, MC 0505, La Jolla, California, 92037-0505, USA, Tel: +1 858 822 4931, E-mail:
|
55
|
Yildirim P, Majnarić L, Ekmekci OI, Holzinger A. Knowledge discovery of drug data on the example of adverse reaction prediction. BMC Bioinformatics 2014; 15 Suppl 6:S7. [PMID: 25079450 PMCID: PMC4158658 DOI: 10.1186/1471-2105-15-s6-s7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Antibiotics are widely prescribed drugs for children and are the most likely to be associated with adverse reactions. Records of adverse reactions and allergies from antibiotics considerably affect prescription choices. We consider this a biomedical decision-making problem and explore hidden knowledge in survey results on data extracted from a big data pool of health records of children, from the Health Center of Osijek, Eastern Croatia. RESULTS We applied a k-means algorithm to the dataset and evaluated it, generating clusters with similar features. Our results highlight that some types of antibiotics form different clusters, an insight that can help clinicians make better decisions. CONCLUSIONS Medical professionals can investigate the clusters which our study revealed, thus gaining useful knowledge and insight into this data for their clinical studies.
Affiliation(s)
- Pinar Yildirim
- Department of Computer Engineering, Faculty of Engineering & Architecture, Okan University, Istanbul, Turkey
- Ozgur Ilyas Ekmekci
- Department of Computer Engineering, Faculty of Engineering & Architecture, Okan University, Istanbul, Turkey
- Andreas Holzinger
- Institute for Medical Informatics, Statistics & Documentation, Medical University of Graz, Graz, Austria
|
56
|
Herrero-Zazo M, Segura-Bedmar I, Martínez P, Declerck T. The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions. J Biomed Inform 2013; 46:914-20. [PMID: 23906817 DOI: 10.1016/j.jbi.2013.07.011] [Citation(s) in RCA: 174] [Impact Index Per Article: 15.8] [Received: 05/09/2013] [Revised: 07/10/2013] [Accepted: 07/18/2013] [Indexed: 12/20/2022]
Abstract
The management of drug-drug interactions (DDIs) is a critical issue resulting from the overwhelming amount of information available on them. Natural Language Processing (NLP) techniques can provide an interesting way to reduce the time spent by healthcare professionals on reviewing biomedical literature. However, NLP techniques rely mostly on the availability of annotated corpora. While there are several annotated corpora with biological entities and their relationships, there is a lack of corpora annotated with pharmacological substances and DDIs. Moreover, other works in this field have focused only on pharmacokinetic (PK) DDIs, not on pharmacodynamic (PD) DDIs. To address this problem, we have created a manually annotated corpus consisting of 792 texts selected from the DrugBank database and another 233 Medline abstracts. This fine-grained corpus has been annotated with a total of 18,502 pharmacological substances and 5028 DDIs, including both PK and PD interactions. The quality and consistency of the annotation process have been ensured through the creation of annotation guidelines and evaluated by measuring the inter-annotator agreement between two annotators. The agreement was almost perfect (Kappa up to 0.96 and generally over 0.80), except for the DDIs in the Medline database (0.55-0.72). The DDI corpus has been used in the SemEval 2013 DDIExtraction challenge as a gold standard for the evaluation of information extraction techniques applied to the recognition of pharmacological substances and the detection of DDIs from biomedical texts. DDIExtraction 2013 has attracted wide attention with a total of 14 teams from 7 different countries. For the task of recognition and classification of pharmacological names, the best system achieved an F1 of 71.5%, while, for the detection and classification of DDIs, the best result was an F1 of 65.1%. These results show that the corpus has enough quality to be used for training and testing NLP techniques applied to the field of Pharmacovigilance. The DDI corpus and the annotation guidelines are free for use for academic research and are available at http://labda.inf.uc3m.es/ddicorpus.
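For readers unfamiliar with the Kappa figures quoted above, the following is a minimal sketch of Cohen's kappa for two annotators. The abstract says only "Kappa" for an agreement between two annotators, so the choice of Cohen's variant is an assumption, and the two label lists are made-up examples rather than data from the DDI corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled independently
    # according to their own label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of ten candidate drug pairs.
annotator_1 = ["DDI", "DDI", "none", "none", "DDI",
               "none", "none", "none", "DDI", "none"]
annotator_2 = ["DDI", "DDI", "none", "none", "none",
               "none", "none", "none", "DDI", "none"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Here the annotators agree on 9 of 10 items (observed 0.90) against a chance agreement of 0.54, which gives a kappa of about 0.78.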
Affiliation(s)
- María Herrero-Zazo
- Computer Science Department, Universidad Carlos III de Madrid, Leganés 28911, Madrid, Spain.
|
57
|
Coloma PM, Valkhoff VE, Mazzaglia G, Nielsson MS, Pedersen L, Molokhia M, Mosseveld M, Morabito P, Schuemie MJ, van der Lei J, Sturkenboom M, Trifirò G. Identification of acute myocardial infarction from electronic healthcare records using different disease coding systems: a validation study in three European countries. BMJ Open 2013; 3:bmjopen-2013-002862. [PMID: 23794587 PMCID: PMC3686251 DOI: 10.1136/bmjopen-2013-002862] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE To evaluate positive predictive value (PPV) of different disease codes and free text in identifying acute myocardial infarction (AMI) from electronic healthcare records (EHRs). DESIGN Validation study of cases of AMI identified from general practitioner records and hospital discharge diagnoses using free text and codes from the International Classification of Primary Care (ICPC), International Classification of Diseases 9th revision-clinical modification (ICD9-CM) and ICD-10th revision (ICD-10). SETTING Population-based databases comprising routinely collected data from primary care in Italy and the Netherlands and from secondary care in Denmark from 1996 to 2009. PARTICIPANTS A total of 4 034 232 individuals with 22 428 883 person-years of follow-up contributed to the data, from which 42 774 potential AMI cases were identified. A random sample of 800 cases was subsequently obtained for validation. MAIN OUTCOME MEASURES PPVs were calculated overall and for each code/free text. 'Best-case scenario' and 'worst-case scenario' PPVs were calculated, the latter taking into account non-retrievable/non-assessable cases. We further assessed the effects of AMI misclassification on estimates of risk during drug exposure. RESULTS Records of 748 cases (93.5% of sample) were retrieved. ICD-10 codes had a 'best-case scenario' PPV of 100% while ICD9-CM codes had a PPV of 96.6% (95% CI 93.2% to 99.9%). ICPC codes had a 'best-case scenario' PPV of 75% (95% CI 67.4% to 82.6%) and free text had PPV ranging from 20% to 60%. Corresponding PPVs in the 'worst-case scenario' all decreased. Use of codes with lower PPV generally resulted in small changes in AMI risk during drug exposure, but codes with higher PPV resulted in attenuation of risk for positive associations. CONCLUSIONS ICD9-CM and ICD-10 codes have good PPV in identifying AMI from EHRs; strategies are necessary to further optimise utility of ICPC codes and free-text search. 
Use of specific AMI disease codes in estimation of risk during drug exposure may lead to small but significant changes and at the expense of decreased precision.
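As a reminder of how a positive predictive value and its confidence interval are obtained, here is a minimal sketch using the normal-approximation (Wald) interval. The abstract does not state which interval method the authors used, so the Wald formula is an assumption, and the counts in the example are hypothetical rather than taken from the study.

```python
import math


def ppv_with_ci(true_positives, false_positives, z=1.96):
    """Return (ppv, lower, upper) using a Wald interval, clipped to [0, 1]."""
    n = true_positives + false_positives
    if n == 0:
        raise ValueError("no flagged cases to validate")
    ppv = true_positives / n
    half_width = z * math.sqrt(ppv * (1 - ppv) / n)
    return ppv, max(0.0, ppv - half_width), min(1.0, ppv + half_width)


# Hypothetical validation sample: 100 flagged cases, 75 confirmed on review.
ppv, lower, upper = ppv_with_ci(true_positives=75, false_positives=25)
print(f"PPV = {ppv:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The Wald interval degenerates to zero width when the PPV is exactly 0% or 100%, so other interval methods are often preferred near those extremes.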
Affiliation(s)
- Preciosa M Coloma
- Department of Medical Informatics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Vera E Valkhoff
- Department of Medical Informatics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Department of Gastroenterology and Hepatology, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Giampiero Mazzaglia
- Department of Research, Health Search, Italian College of General Practitioners, Florence, Italy
- Lars Pedersen
- Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark
- Mariam Molokhia
- Primary Care and Population Sciences, Kings College, London, UK
- Mees Mosseveld
- Department of Medical Informatics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Paolo Morabito
- Department of Clinical and Experimental Medicine and Pharmacology, University of Messina, Messina, Italy
- Martijn J Schuemie
- Department of Medical Informatics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Johan van der Lei
- Department of Medical Informatics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Miriam Sturkenboom
- Department of Medical Informatics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Department of Epidemiology, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Gianluca Trifirò
- Department of Medical Informatics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Department of Clinical and Experimental Medicine and Pharmacology, University of Messina, Messina, Italy
|
58
|
Ballerio S, Cerizza D. Using Text Mining to Validate Diagnoses of Acute Myocardial Infarction. CONTRIBUTIONS TO STATISTICS 2013. [DOI: 10.1007/978-88-470-5379-3_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/04/2023]
|
59
|
Tomlin A, Reith D, Dovey S, Tilyard M. Methods for retrospective detection of drug safety signals and adverse events in electronic general practice records. Drug Saf 2012; 35:733-43. [PMID: 22861670 DOI: 10.1007/bf03261970] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 05/26/2023]
Abstract
BACKGROUND Examination of clinical data routinely recorded in general practice provides significant opportunities for identifying and quantifying medicine-related adverse events not captured by spontaneous adverse reaction reporting systems. Robust pharmacovigilance methods for detecting and monitoring adverse events due to treatment with new and existing medicines are required to estimate the true extent of adverse events experienced by primary care patients. OBJECTIVES The aim of the study was to examine evidence of adverse events contained in general practice electronic records and to study observed events related to selective serotonin reuptake inhibitors (SSRIs) as an example of drug-specific pharmaceutical surveillance achievable with these data. METHODS Electronic clinical records for a cohort of 338 931 patients consulting from 2002 to 2007 were extracted from the patient management systems of 30 primary care clinics in New Zealand. Medical warnings files, prescription records and free text consultation notes were used to identify physician-recorded treatment cautions, including adverse events and medicines they were associated with. A structured chronological analysis of prescriptions, consultation notes and adverse events relating to patients prescribed the SSRI citalopram was undertaken, and included investigating reasons for switching treatment to another SSRI (fluoxetine or paroxetine) as a method for detecting evidence of drug safety signals. We compared the number of adverse events identified for patients at one practice with the number spontaneously reported to New Zealand's Centre for Adverse Reactions Monitoring (CARM). RESULTS During the 6-year study period, 173 478 patients received 4 811 561 prescriptions. There were 37 397 allergies, adverse events and other warnings recorded for 24 994 patients (7.4%); adverse events relating to 65 different types of drug were reported. 
Medicines most frequently implicated in adverse event reports were antibacterials, analgesics, antihypertensive medicines, lipid-modifying agents and skin preparations. Citalopram was prescribed for 5612 patients, and 701 adverse events relating to citalopram were identified in the electronic health records of 473 (8.4%) patients. A total of 713 (12.7%) patients changed treatment from citalopram to another SSRI, and 164 reasons for the switch were identified: suspected adverse drug effects for 129 (78.7%), lack of effect for 29 (17.7%) and patient preference for 6 (3.7%). The most common adverse events preceding the switch were anxiety, nausea and headaches. Of the 725 adverse events and medical warnings recorded at one practice, 21 (2.9%) were spontaneously reported to the CARM. CONCLUSIONS Routinely recorded general practice data provide a wealth of opportunities for monitoring drug safety signals and for other patient safety issues. Medical warning records and consultation notes contain a wealth of information on adverse events but structured search methodologies are often required to identify these.
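The percentages in this abstract can be reproduced from the counts it reports. The short check below recomputes each one; the counts and stated percentages come from the abstract, while the pairing of each percentage with its denominator is inferred from the wording (for example, that the 7.4% is taken over the full cohort of 338 931 patients).

```python
# (description, numerator, denominator, percentage stated in the abstract)
reported = [
    ("patients with a recorded warning", 24_994, 338_931, 7.4),
    ("citalopram patients with an adverse event", 473, 5_612, 8.4),
    ("citalopram patients who switched SSRI", 713, 5_612, 12.7),
    ("switches due to suspected adverse effects", 129, 164, 78.7),
    ("switches due to lack of effect", 29, 164, 17.7),
    ("switches due to patient preference", 6, 164, 3.7),
    ("events at one practice reported to CARM", 21, 725, 2.9),
]


def percent(numerator, denominator):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * numerator / denominator, 1)


for label, num, den, stated in reported:
    print(f"{label}: {num}/{den} = {percent(num, den)}% (abstract: {stated}%)")
```

Under these denominators every recomputed value matches the figure given in the abstract.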
Affiliation(s)
- Andrew Tomlin
- Best Practice Advocacy Centre, Dunedin, New Zealand.
|
60
|
Hahn U, Cohen KB, Garten Y, Shah NH. Mining the pharmacogenomics literature--a survey of the state of the art. Brief Bioinform 2012; 13:460-94. [PMID: 22833496 PMCID: PMC3404399 DOI: 10.1093/bib/bbs018] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 11/18/2011] [Accepted: 03/23/2012] [Indexed: 01/05/2023] Open
Abstract
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.
Affiliation(s)
- Udo Hahn
- Jena University Language and Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.
|