101
|
LIU J, ZHANG P, LU Y. Automatic Identification of Messages Related to Adverse Drug Reactions from Online User Reviews using Feature-based Classification. IRANIAN JOURNAL OF PUBLIC HEALTH 2014; 43:1519-27. [PMID: 26060719 PMCID: PMC4449501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2014] [Accepted: 10/04/2014] [Indexed: 10/26/2022]
Abstract
BACKGROUND User-generated medical messages on the Internet contain extensive information related to adverse drug reactions (ADRs) and are a valuable resource for post-marketing drug surveillance. The aim of this study was to find an effective method for automatically identifying ADR-related messages in online user reviews. METHODS We conducted experiments on online user reviews using different feature sets and classification techniques. First, messages were collected from three communities (allergy, schizophrenia, and pain management), and 3000 messages were annotated. Second, an n-gram-based feature set and a medical domain-specific feature set were generated. Third, three classification techniques (SVM, C4.5, and Naïve Bayes) were each used to perform the classification task. Finally, we evaluated each combination of feature set and classification technique by comparing accuracy and F-measure. RESULTS The accuracy of the SVM classifier was higher than 0.8, whereas the accuracy of the C4.5 and Naïve Bayes classifiers was lower than 0.8; the combined feature set (n-gram-based plus domain-specific features) consistently outperformed either feature set alone. The highest F-measure, 0.895, was achieved with the combined feature set and an SVM classifier. Overall, the combined feature set with an SVM classifier gave the best classification performance. CONCLUSION Combining the n-gram-based and domain-specific feature sets with an SVM classifier provides an effective method for automatically identifying ADR-related messages in online user reviews.
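The feature construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the `ADR_LEXICON` word list, and the `__adr_term_count__` feature name are all assumptions made for illustration, standing in for the n-gram-based and domain-specific feature sets the abstract mentions.

```python
from collections import Counter

# Hypothetical domain lexicon; the abstract does not list the actual terms used.
ADR_LEXICON = {"rash", "nausea", "dizziness"}


def ngram_features(text, n_values=(1, 2)):
    """Count word n-grams in a message (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def combined_features(text, lexicon=ADR_LEXICON):
    """Merge n-gram counts with one illustrative domain-specific feature."""
    features = ngram_features(text)
    tokens = set(text.lower().split())
    # Number of distinct lexicon terms that appear in the message.
    features["__adr_term_count__"] = len(tokens & lexicon)
    return features


features = combined_features("This drug gave me a rash")
```

A feature dictionary of this kind could then be vectorized and passed to any of the classifiers named in the abstract; that step is omitted here.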
Affiliation(s)
- Jingfang LIU
- 1. Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai, China (corresponding author)
| | - Pengzhu ZHANG
- 1. Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai, China
| | - Yingjie LU
- 2. School of Economics and Management, Beijing University of Chemical Technology, Beijing, China
102
|
Syed-Abdul S, Nguyen A, Huang F, Jian WS, Iqbal U, Yang V, Hsu MH, Li YC. A smart medication recommendation model for the electronic prescription. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2014; 117:218-224. [PMID: 25092226 DOI: 10.1016/j.cmpb.2014.06.019] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/13/2014] [Revised: 06/04/2014] [Accepted: 06/27/2014] [Indexed: 06/03/2023]
Abstract
BACKGROUND The 1999 Institute of Medicine report, To Err Is Human: Building a Safer Health System, drew special attention to preventable medical errors and patient safety. The American Recovery and Reinvestment Act of 2009 and the federal 'Meaningful Use' stage 1 criteria required eligible providers to use e-prescribing in order to access Medicaid and Medicare incentive payments. Inappropriate prescribing has been identified as a preventable cause of at least 20% of drug-related adverse events. A few studies have reported system-related errors and offered targeted recommendations for improving e-prescribing systems. OBJECTIVE This study aims to enhance the efficiency of the e-prescribing system by shortening the medication list, reducing the risk of inappropriate medication selection, and reducing physicians' prescribing time. METHOD A total of 103.48 million prescriptions from Taiwan's national health insurance claims data were used to compute diagnosis-medication associations. Furthermore, 100,000 prescriptions were randomly selected to develop a smart medication recommendation model using association rule mining. RESULTS AND CONCLUSION The main contribution of this model is the introduction of two new concepts, the Mean Prescription Rank (MPR) and the Coverage Rate (CR) of prescriptions. A proactive medication list (PML) was computed using MPR and CR. With this model the medication drop-down menu is significantly shortened, thereby reducing medication selection errors and prescription times. Physicians will still select relevant medications even in the case of an inappropriate (unintentional) selection.
Affiliation(s)
- Shabbir Syed-Abdul
- Taipei Medical University, College of Medical Science and Technology, Graduate Institute of Biomedical Informatics, Taiwan.
| | - Alex Nguyen
- Taipei Medical University, College of Medical Science and Technology, Graduate Institute of Biomedical Informatics, Taiwan.
| | - Frank Huang
- Taipei Medical University, College of Medical Science and Technology, Graduate Institute of Biomedical Informatics, Taiwan.
| | - Wen-Shan Jian
- Taipei Medical University, School of Health Care Administration, Taiwan.
| | - Usman Iqbal
- Taipei Medical University, College of Medical Science and Technology, Graduate Institute of Biomedical Informatics, Taiwan.
| | - Vivian Yang
- College of Medical Science and Technology, Institute of Biomedical Informatics, Taiwan.
| | - Min-Huei Hsu
- Taipei Medical University, College of Medical Science and Technology, Graduate Institute of Biomedical Informatics, Taiwan.
| | - Yu-Chuan Li
- Taipei Medical University, College of Medical Science and Technology, Graduate Institute of Biomedical Informatics, Taiwan.
103
|
Sampathkumar H, Chen XW, Luo B. Mining adverse drug reactions from online healthcare forums using hidden Markov model. BMC Med Inform Decis Mak 2014; 14:91. [PMID: 25341686 PMCID: PMC4283122 DOI: 10.1186/1472-6947-14-91] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Received: 12/24/2013] [Accepted: 08/18/2014] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Adverse Drug Reactions are one of the leading causes of injury or death among patients undergoing medical treatments. Not all Adverse Drug Reactions are identified before a drug is made available in the market. Current post-marketing drug surveillance methods, which are based purely on voluntary spontaneous reports, are unable to provide the early indications necessary to prevent the occurrence of such injuries or fatalities. The objective of this research is to extract reports of adverse drug side-effects from messages in online healthcare forums and use them as early indicators to assist in post-marketing drug surveillance. METHODS We treat the task of extracting adverse side-effects of drugs from healthcare forum messages as a sequence labeling problem and present a Hidden Markov Model (HMM) based Text Mining system that can be used to classify a message as containing drug side-effect information and then extract the adverse side-effect mentions from it. A manually annotated dataset from http://www.medications.com is used in the training and validation of the HMM based Text Mining system. RESULTS A 10-fold cross-validation on the manually annotated dataset yielded on average an F-Score of 0.76 from the HMM Classifier, in comparison to 0.575 from the Baseline classifier. Without the Plain Text Filter component as a part of the Text Processing module, the F-Score of the HMM Classifier was reduced to 0.378 on average, while the absence of the HTML Filter component was found to have no impact. Reducing the Drug names dictionary size by half reduced the F-Score of the HMM Classifier to 0.359 on average, while a similar reduction to the side-effects dictionary yielded an F-Score of 0.651 on average. Adverse side-effects mined from http://www.medications.com and http://www.steadyhealth.com were found to match the Adverse Drug Reactions on the Drug Package Labels of several drugs. In addition, some novel adverse side-effects, which can be potential Adverse Drug Reactions, were also identified. CONCLUSIONS The results from the HMM based Text Miner are encouraging enough to pursue further enhancements to this approach. The mined novel side-effects can act as early indicators for health authorities to help focus their efforts in post-marketing drug surveillance.
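To make the sequence-labeling framing concrete, here is a minimal sketch of Viterbi decoding, a standard way to find the most probable label sequence under an HMM. The abstract does not describe the model's states or parameters, so the label set (`O`, `DRUG`, `SE`), all probabilities, and the probability floor for unseen tokens below are hypothetical toy values, not the authors' system.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable hidden-state path for a token sequence."""
    if not observations:
        return []

    def log_p(p):
        # Floor tiny or zero probabilities so unseen tokens do not give log(0).
        return math.log(max(p, floor))

    # Each row maps state -> (best log score ending in that state, previous state).
    rows = [{s: (log_p(start_p[s]) + log_p(emit_p[s].get(observations[0], 0.0)), None)
             for s in states}]
    for token in observations[1:]:
        row = {}
        for s in states:
            score, prev = max((rows[-1][p][0] + log_p(trans_p[p][s]), p) for p in states)
            row[s] = (score + log_p(emit_p[s].get(token, 0.0)), prev)
        rows.append(row)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: rows[-1][s][0])
    path = [state]
    for row in reversed(rows[1:]):
        state = row[state][1]
        path.append(state)
    return path[::-1]


# Toy parameters (assumed): O = other token, DRUG = drug name, SE = side effect.
STATES = ["O", "DRUG", "SE"]
START = {"O": 0.8, "DRUG": 0.1, "SE": 0.1}
TRANS = {s: {"O": 0.6, "DRUG": 0.2, "SE": 0.2} for s in STATES}
EMIT = {
    "O": {"gave": 0.3, "me": 0.3, "and": 0.3},
    "DRUG": {"aspirin": 0.9},
    "SE": {"nausea": 0.5, "headache": 0.5},
}

labels = viterbi(["aspirin", "gave", "me", "nausea"], STATES, START, TRANS, EMIT)
```

In a dictionary-driven setup like the one the abstract describes, the emission tables would plausibly be derived from drug-name and side-effect dictionaries, which may explain why shrinking either dictionary lowers the F-Score.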
Affiliation(s)
| | - Xue-wen Chen
- Dept. of Computer Science, Wayne State University, 48202 Detroit, USA
| | - Bo Luo
- EECS, University of Kansas, 66045 Lawrence, USA
104
|
Jung K, LePendu P, Iyer S, Bauer-Mehren A, Percha B, Shah NH. Functional evaluation of out-of-the-box text-mining tools for data-mining tasks. J Am Med Inform Assoc 2014; 22:121-31. [PMID: 25336595 PMCID: PMC4433377 DOI: 10.1136/amiajnl-2014-002902] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 11/18/2022] Open
Abstract
Objective The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off among text processing systems that balance speed and linguistic understanding differently. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug–drug interactions, and learning used-to-treat relationships between drugs and indications. Materials We first benchmarked the accuracy of the NCBO Annotator and REVEAL in a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for three research tasks. Results There is no significant difference between using the NCBO Annotator and REVEAL in the results of the three research tasks when using large datasets. In one subtask, REVEAL achieved higher sensitivity with smaller datasets. Conclusions For a variety of tasks, employing simple term recognition methods instead of advanced NLP methods results in little or no impact on accuracy when using large datasets. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population level analyses can advance adoption of NLP in practice.
Affiliation(s)
- Kenneth Jung
- Program in Biomedical Informatics, Stanford University, Stanford, California, USA
| | - Paea LePendu
- Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
| | - Srinivasan Iyer
- Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
| | - Anna Bauer-Mehren
- Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
| | - Bethany Percha
- Program in Biomedical Informatics, Stanford University, Stanford, California, USA
| | - Nigam H Shah
- Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
105
|
Xu R, Wang Q. Combining automatic table classification and relationship extraction in extracting anticancer drug-side effect pairs from full-text articles. J Biomed Inform 2014; 53:128-35. [PMID: 25445920 DOI: 10.1016/j.jbi.2014.10.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 05/30/2014] [Revised: 08/30/2014] [Accepted: 10/03/2014] [Indexed: 01/09/2023]
Abstract
Anticancer drug-associated side effect knowledge often exists in multiple heterogeneous and complementary data sources. A comprehensive anticancer drug-side effect (drug-SE) relationship knowledge base is important for computation-based drug target discovery, drug toxicity prediction and drug repositioning. In this study, we present a two-step approach by combining table classification and relationship extraction to extract drug-SE pairs from a large number of high-profile oncological full-text articles. The data consists of 31,255 tables downloaded from the Journal of Clinical Oncology (JCO). We first trained a statistical classifier to classify tables into SE-related and -unrelated categories. We then extracted drug-SE pairs from SE-related tables. We compared drug side effect knowledge extracted from JCO tables to that derived from FDA drug labels. Finally, we systematically analyzed relationships between anti-cancer drug-associated side effects and drug-associated gene targets, metabolism genes, and disease indications. The statistical table classifier is effective in classifying tables into SE-related and -unrelated (precision: 0.711; recall: 0.941; F1: 0.810). We extracted a total of 26,918 drug-SE pairs from SE-related tables with a precision of 0.605, a recall of 0.460, and an F1 of 0.520. Drug-SE pairs extracted from JCO tables are largely complementary to those derived from FDA drug labels; as many as 84.7% of the pairs extracted from JCO tables have not been included in a side effect database constructed from FDA drug labels. Side effects associated with anticancer drugs positively correlate with drug target genes, drug metabolism genes, and disease indications.
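The F1 figures in this abstract follow from the usual definition of F1 as the harmonic mean of precision and recall. As a small worked check, the sketch below recomputes the table classifier's F1 from the reported precision and recall; the function name is an illustrative choice, and only the three reported numbers come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Table classifier figures reported in the abstract.
table_f1 = f1_score(0.711, 0.941)
print(round(table_f1, 3))
```

Rounded to three decimals, this agrees with the reported F1 of 0.810.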
Affiliation(s)
- Rong Xu
- Medical Informatics Program, Center for Clinical Investigation, Case Western Reserve University, Cleveland, OH 44106, United States.
| | - QuanQiu Wang
- ThinTek, LLC, Palo Alto, CA 94306, United States.
106
|
Vilar S, Ryan PB, Madigan D, Stang PE, Schuemie MJ, Friedman C, Tatonetti NP, Hripcsak G. Similarity-based modeling applied to signal detection in pharmacovigilance. CPT-PHARMACOMETRICS & SYSTEMS PHARMACOLOGY 2014; 3:e137. [PMID: 25250527 PMCID: PMC4211266 DOI: 10.1038/psp.2014.35] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 04/10/2014] [Accepted: 07/06/2014] [Indexed: 12/31/2022]
Abstract
One of the main objectives in pharmacovigilance is the detection of adverse drug events (ADEs) through mining of healthcare databases, such as electronic health records or administrative claims data. Although different approaches have been shown to be of great value, research is still focusing on the enhancement of signal detection to gain efficiency in further assessment and follow-up. We applied similarity-based modeling techniques, using 2D and 3D molecular structure, ADE, target, and ATC (anatomical therapeutic chemical) similarity measures, to the candidate associations selected previously in a medication-wide association study for four ADE outcomes. Our results showed an improvement in the precision when we ranked the subset of ADE candidates using similarity scorings. This method is simple, useful to strengthen or prioritize signals generated from healthcare databases, and facilitates ADE detection through the identification of the most similar drugs for which ADE information is available.
Affiliation(s)
- S Vilar
- [1] Department of Biomedical Informatics, Columbia University, New York, New York, USA [2] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA
| | - P B Ryan
- [1] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA [2] Janssen Research and Development, Titusville, New Jersey, USA
| | - D Madigan
- [1] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA [2] Department of Statistics, Columbia University, New York, New York, USA
| | - P E Stang
- [1] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA [2] Janssen Research and Development, Titusville, New Jersey, USA
| | - M J Schuemie
- [1] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA [2] Janssen Research and Development, Titusville, New Jersey, USA
| | - C Friedman
- [1] Department of Biomedical Informatics, Columbia University, New York, New York, USA [2] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA
| | - N P Tatonetti
- [1] Department of Biomedical Informatics, Columbia University, New York, New York, USA [2] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA [3] Department of Systems Biology, Columbia University Medical Center, New York, New York, USA [4] Department of Medicine, Columbia University Medical Center, New York, New York, USA
| | - G Hripcsak
- [1] Department of Biomedical Informatics, Columbia University, New York, New York, USA [2] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA
107
|
Wang L, Jiang G, Li D, Liu H. Standardizing adverse drug event reporting data. J Biomed Semantics 2014; 5:36. [PMID: 25157320 PMCID: PMC4142531 DOI: 10.1186/2041-1480-5-36] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 06/13/2013] [Accepted: 07/23/2014] [Indexed: 11/16/2022] Open
Abstract
Background The Adverse Event Reporting System (AERS) is an FDA database providing rich information on voluntary reports of adverse drug events (ADEs). Normalizing data in the AERS would improve the mining capacity of the AERS for drug safety signal detection and promote semantic interoperability between the AERS and other data sources. In this study, we normalize the AERS and build a publicly available normalized ADE data source. The drug information in the AERS is normalized to RxNorm, a standard terminology source for medication, using a natural language processing medication extraction tool, MedEx. Drug class information is then obtained from the National Drug File-Reference Terminology (NDF-RT) using a greedy algorithm. Adverse events are aggregated through mapping with the Preferred Term (PT) and System Organ Class (SOC) codes of Medical Dictionary for Regulatory Activities (MedDRA). The performance of MedEx-based annotation was evaluated and case studies were performed to demonstrate the usefulness of our approaches. Results Our study yields an aggregated knowledge-enhanced AERS data mining set (AERS-DM). In total, the AERS-DM contains 37,029,228 Drug-ADE records. Seventy-one percent (10,221/14,490) of normalized drug concepts in the AERS were classified to 9 classes in NDF-RT. The number of unique pairs is 4,639,613 between RxNorm concepts and MedDRA Preferred Term (PT) codes and 205,725 between RxNorm concepts and SOC codes after ADE aggregation. Conclusions We have built an open-source Drug-ADE knowledge resource with data being normalized and aggregated using standard biomedical ontologies. The data resource has the potential to assist the mining of ADE from AERS for the data mining research community.
Affiliation(s)
- Liwei Wang
- Department of Medical Informatics, School of Public Health, Jilin University, Jilin, China; Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Dingcheng Li
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
108
|
Shang N, Xu H, Rindflesch TC, Cohen T. Identifying plausible adverse drug reactions using knowledge extracted from the literature. J Biomed Inform 2014; 52:293-310. [PMID: 25046831 DOI: 10.1016/j.jbi.2014.07.011] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 02/24/2014] [Revised: 06/06/2014] [Accepted: 07/10/2014] [Indexed: 01/08/2023]
Abstract
Pharmacovigilance involves continually monitoring drug safety after drugs are put to market. To aid this process, algorithms for the identification of strongly correlated drug/adverse drug reaction (ADR) pairs from data sources such as adverse event reporting systems or Electronic Health Records have been developed. These methods are generally statistical in nature, and do not draw upon the large volumes of knowledge embedded in the biomedical literature. In this paper, we investigate the ability of scalable Literature Based Discovery (LBD) methods to identify side effects of pharmaceutical agents. The advantage of LBD methods is that they can provide evidence from the literature to support the plausibility of a drug/ADR association, thereby assisting human review to validate the signal, which is an essential component of pharmacovigilance. To do so, we draw upon vast repositories of knowledge that have been extracted from the biomedical literature by two Natural Language Processing tools, MetaMap and SemRep. We evaluate two LBD methods that scale comfortably to the volume of knowledge available in these repositories. Specifically, we evaluate Reflective Random Indexing (RRI), a model based on concept-level co-occurrence, and Predication-based Semantic Indexing (PSI), a model that encodes the nature of the relationship between concepts to support reasoning analogically about drug-effect relationships. An evaluation set was constructed from the Side Effect Resource 2 (SIDER2), which contains known drug/ADR relations, and models were evaluated for their ability to "rediscover" these relations. In this paper, we demonstrate that both RRI and PSI can recover known drug-adverse event associations. However, PSI performed better overall, and has the additional advantage of being able to recover the literature underlying the reasoning pathways it used to make its predictions.
Affiliation(s)
- Ning Shang
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, United States.
| | - Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, United States
| | - Trevor Cohen
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, United States
109
|
Polepalli Ramesh B, Belknap SM, Li Z, Frid N, West DP, Yu H. Automatically Recognizing Medication and Adverse Event Information From Food and Drug Administration's Adverse Event Reporting System Narratives. JMIR Med Inform 2014; 2:e10. [PMID: 25600332 PMCID: PMC4288072 DOI: 10.2196/medinform.3022] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 10/17/2013] [Revised: 12/10/2013] [Accepted: 12/10/2013] [Indexed: 12/14/2022] Open
Abstract
Background The Food and Drug Administration’s (FDA) Adverse Event Reporting System (FAERS) is a repository of spontaneously-reported adverse drug events (ADEs) for FDA-approved prescription drugs. FAERS reports include both structured reports and unstructured narratives. The narratives often include essential information for evaluation of the severity, causality, and description of ADEs that is not present in the structured data. The timely identification of unknown toxicities of prescription drugs is an important, unsolved problem. Objective The objective of this study was to develop an annotated corpus of FAERS narratives and a biomedical named entity tagger to automatically identify ADE-related information in the FAERS narratives. Methods We developed an annotation guideline and annotated medication information and adverse event related entities in 122 FAERS narratives comprising approximately 23,000 word tokens. A named entity tagger using supervised machine learning approaches was built for detecting medication information and adverse event entities using various categories of features. Results The annotated corpus had an agreement of over 0.9 (Cohen’s kappa) for medication and adverse event entities. The best performing tagger achieved an overall F1 score of 0.73 for detection of medication, adverse event and other named entities. Conclusions In this study, we developed an annotated corpus of FAERS narratives and machine learning based models for automatically extracting medication and adverse event information from the FAERS narratives. Our study is an important step towards enriching the FAERS data for postmarketing pharmacovigilance.
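The agreement statistic cited here, Cohen's kappa, corrects raw agreement between two annotators for the agreement expected by chance. The following is a minimal sketch of the standard formula; the label names and the two toy annotation sequences are invented for illustration and are not data from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical token-level annotations from two annotators.
annotator_1 = ["MED", "ADE", "O", "O", "MED", "O"]
annotator_2 = ["MED", "ADE", "O", "O", "O", "O"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on five of six tokens, but kappa is lower than the raw agreement because much of that agreement would be expected by chance given how often both use the `O` label.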
110
|
Raghavan P, Chen JL, Fosler-Lussier E, Lai AM. How essential are unstructured clinical narratives and information fusion to clinical trial recruitment? AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2014; 2014:218-23. [PMID: 25717416 PMCID: PMC4333685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
Electronic health records capture patient information using structured controlled vocabularies and unstructured narrative text. While structured data typically encodes lab values, encounters and medication lists, unstructured data captures the physician's interpretation of the patient's condition, prognosis, and response to therapeutic intervention. In this paper, we demonstrate that information extraction from unstructured clinical narratives is essential to most clinical applications. We perform an empirical study to validate the argument and show that structured data alone is insufficient in resolving eligibility criteria for recruiting patients onto clinical trials for chronic lymphocytic leukemia (CLL) and prostate cancer. Unstructured data is essential to solving 59% of the CLL trial criteria and 77% of the prostate cancer trial criteria. More specifically, for resolving eligibility criteria with temporal constraints, we show the need for temporal reasoning and information integration with medical events within and across unstructured clinical narratives and structured data.
111
|
Zhang R, Pakhomov S, Melton GB. Longitudinal analysis of new information types in clinical notes. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2014; 2014:232-7. [PMID: 25717418 PMCID: PMC4333708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
It is increasingly recognized that redundant information in clinical notes within electronic health record (EHR) systems is ubiquitous, significant, and may negatively impact the secondary use of these notes for research and patient care. We investigated several automated methods to identify redundant versus relevant new information in clinical reports. These methods may provide a valuable approach to extract clinically pertinent information and further improve the accuracy of clinical information extraction systems. In this study, we used UMLS semantic types to extract several types of new information, including problems, medications, and laboratory information. Automatically identified new information correlated highly with manual reference standard annotations. Methods to identify different types of new information can potentially help to build more robust information extraction systems for clinical researchers, as well as aid clinicians and researchers in navigating clinical notes more effectively and quickly identifying information pertaining to changes in health states.
Affiliation(s)
- Rui Zhang
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN
| | - Serguei Pakhomov
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN; College of Pharmacy, University of Minnesota, Minneapolis, MN
| | - Genevieve B. Melton
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN; Departments of Surgery, University of Minnesota, Minneapolis, MN
112
|
Yang CC, Yang H, Jiang L. Postmarketing Drug Safety Surveillance Using Publicly Available Health-Consumer-Contributed Content in Social Media. ACM TRANSACTIONS ON MANAGEMENT INFORMATION SYSTEMS 2014. [DOI: 10.1145/2576233] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/08/2023]
Abstract
Postmarketing drug safety surveillance is important because many potential adverse drug reactions cannot be identified in the premarketing review process. It is reported that about 5% of hospital admissions are attributed to adverse drug reactions, which also cause many deaths, making them a serious public health concern. Currently, drug safety detection relies heavily on voluntary reporting systems, electronic health records, or relevant databases. There is often a time delay before reports are filed, and only a small portion of the adverse drug reactions experienced by health consumers are reported. Given the popularity of social media, many health social media sites are now available for health consumers to discuss health-related issues, including adverse drug reactions they encounter. A large volume of health-consumer-contributed content is available, but little effort has been made to harness this information for postmarketing drug safety surveillance to supplement the traditional approach. In this work, we propose an association rule mining approach to identify associations between a drug and an adverse drug reaction. We use the alerts posted by the Food and Drug Administration as the gold standard to evaluate the effectiveness of our approach. The results show that harnessing health-related social media content to detect adverse drug reactions is promising.
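Association rule mining typically scores a candidate rule such as "drug implies reaction" with support, confidence, and lift. The sketch below computes these three standard measures over a toy collection of posts; the abstract does not say which measures or thresholds the authors used, and the post contents and term names here are invented for illustration.

```python
def rule_metrics(posts, drug, reaction):
    """Support, confidence and lift for the rule drug -> reaction.

    `posts` is a list of sets, each holding the terms mentioned in one post.
    """
    if not posts:
        raise ValueError("posts must not be empty")
    n = len(posts)
    with_drug = sum(1 for terms in posts if drug in terms)
    with_reaction = sum(1 for terms in posts if reaction in terms)
    with_both = sum(1 for terms in posts if drug in terms and reaction in terms)

    support = with_both / n                                   # P(drug and reaction)
    confidence = with_both / with_drug if with_drug else 0.0  # P(reaction | drug)
    # Lift compares confidence with the reaction's overall frequency.
    lift = confidence / (with_reaction / n) if with_reaction else 0.0
    return support, confidence, lift


# Hypothetical posts, each reduced to the set of terms it mentions.
posts = [
    {"drugA", "nausea"},
    {"drugA", "nausea"},
    {"drugA"},
    {"drugB", "nausea"},
    {"drugB"},
]
support, confidence, lift = rule_metrics(posts, "drugA", "nausea")
```

A lift above 1 suggests the reaction is mentioned with the drug more often than its overall frequency would predict, which is the kind of signal that could then be compared against FDA alerts.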
113
|
Yeleswarapu S, Rao A, Joseph T, Saipradeep VG, Srinivasan R. A pipeline to extract drug-adverse event pairs from multiple data sources. BMC Med Inform Decis Mak 2014; 14:13. [PMID: 24559132 PMCID: PMC3936866 DOI: 10.1186/1472-6947-14-13] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 06/11/2013] [Accepted: 02/14/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Pharmacovigilance aims to uncover and understand harmful side-effects of drugs, termed adverse events (AEs). Although the current process of pharmacovigilance is very systematic, the increasing amount of information available in specialized health-related websites as well as the exponential growth in medical literature presents a unique opportunity to supplement traditional adverse event gathering mechanisms with new-age ones. METHOD We present a semi-automated pipeline to extract associations between drugs and side effects from traditional structured adverse event databases, enhanced by potential drug-adverse event pairs mined from user-comments from health-related websites and MEDLINE abstracts. The pipeline was tested using a set of 12 drugs representative of two previous studies of adverse event extraction from health-related websites and MEDLINE abstracts. RESULTS Testing the pipeline shows that mining non-traditional sources helps substantiate the adverse event databases. The non-traditional sources not only contain the known AEs, but also suggest some unreported AEs for drugs which can then be analyzed further. CONCLUSION A semi-automated pipeline to extract the AE pairs from adverse event databases as well as potential AE pairs from non-traditional sources such as text from MEDLINE abstracts and user-comments from health-related websites is presented.
Affiliation(s)
- Aditya Rao
- TCS Innovation Labs, Tata Consultancy Services Ltd, Deccan Park, 1, Software Units Layout, Madhapur, Hyderabad 500081, Andhra Pradesh, India.
|
114
|
Xu R, Wang Q. Large-scale combining signals from both biomedical literature and the FDA Adverse Event Reporting System (FAERS) to improve post-marketing drug safety signal detection. BMC Bioinformatics 2014; 15:17. [PMID: 24428898 PMCID: PMC3906761 DOI: 10.1186/1471-2105-15-17] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Received: 07/03/2013] [Accepted: 01/13/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Independent data sources can be used to augment post-marketing drug safety signal detection. The vast amount of publicly available biomedical literature contains rich side effect information for drugs at all clinical stages. In this study, we present a large-scale signal boosting approach that combines over 4 million records in the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) and over 21 million biomedical articles. RESULTS The datasets are comprised of 4,285,097 records from FAERS and 21,354,075 MEDLINE articles. We first extracted all drug-side effect (SE) pairs from FAERS. Our study implemented a total of seven signal ranking algorithms. We then compared these different ranking algorithms before and after they were boosted with signals from MEDLINE sentences or abstracts. Finally, we manually curated all drug-cardiovascular (CV) pairs that appeared in both data sources and investigated whether our approach can detect many true signals that have not been included in FDA drug labels. We extracted a total of 2,787,797 drug-SE pairs from FAERS with a low initial precision of 0.025. The ranking algorithm combined signals from both FAERS and MEDLINE, significantly improving the precision from 0.025 to 0.371 for top-ranked pairs, representing a 13.8 fold elevation in precision. We showed by manual curation that drug-SE pairs that appeared in both data sources were highly enriched with true signals, many of which have not yet been included in FDA drug labels. CONCLUSIONS We have developed an efficient and effective drug safety signal ranking and strengthening approach. We demonstrate that large-scale combining information from FAERS and biomedical literature can significantly contribute to drug safety surveillance.
Affiliation(s)
- Rong Xu
- Medical Informatics Division, Case Western Reserve, Cleveland, Ohio, USA
|
115
|
Abstract
The growing amount and availability of electronic health record (EHR) data present enhanced opportunities for discovering new knowledge about diseases. In the past decade, there has been an increasing number of data and text mining studies focused on the identification of disease associations (e.g., disease-disease, disease-drug, and disease-gene) in structured and unstructured EHR data. This chapter presents a knowledge discovery framework for mining the EHR for disease knowledge and describes each step for data selection, preprocessing, transformation, data mining, and interpretation/validation. Topics including natural language processing, standards, and data privacy and security are also discussed in the context of this framework.
Affiliation(s)
- Elizabeth S Chen
- Center for Clinical and Translational Science, University of Vermont, Burlington, VT, USA.
|
116
|
Smith J, Denny J, Chen Q, Nian H, Spickard III A, Rosenbloom ST, Miller RA. Lessons learned from developing a drug evidence base to support pharmacovigilance. Appl Clin Inform 2013; 4:596-617. [PMID: 24454585 PMCID: PMC3885918 DOI: 10.4338/aci-2013-08-ra-0062] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/23/2013] [Accepted: 11/06/2013] [Indexed: 12/14/2022]
Abstract
OBJECTIVE This work identified challenges associated with extraction and representation of medication-related information from publicly available electronic sources. METHODS We gained direct observational experience through creating and evaluating the Drug Evidence Base (DEB), a repository of drug indications and adverse effects (ADEs), and supplemented this through literature review. We extracted DEB content from the National Drug File Reference Terminology, from aggregated MEDLINE co-occurrence data, and from the National Library of Medicine's DailyMed. To understand better the similarities, differences and problems with the content of DEB, the SIDER Side Effect Resource, and Vanderbilt's MEDI Indication Resource, we carried out statistical evaluations and human expert reviews. RESULTS While DEB, SIDER, and MEDI often agreed on medication indications and side effects, cross-system shortcomings limit their current utility. The drug information resources we evaluated frequently employed multiple, disparate, vaguely related UMLS concepts to represent a single specific clinical drug indication or adverse effect. Thus, evaluations comparing drug-indication and drug-ADE coverage for such resources will encounter substantial numbers of false negative and false positive matches. Furthermore, our review found that many indication and ADE relationships are too complex - logically and temporally - to represent within existing systems. CONCLUSION To enhance applicability and utility, future drug information systems deriving indications and ADEs from public resources must represent clinical concepts uniformly and as precisely as possible. Future systems must also better represent the inherent complexity of indications and ADEs.
Affiliation(s)
- J.C. Smith
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- H. Nian
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; School of Nursing, Vanderbilt University, Nashville, Tennessee, USA
|
117
|
Gobbel GT, Reeves R, Jayaramaraja S, Giuse D, Speroff T, Brown SH, Elkin PL, Matheny ME. Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives. J Biomed Inform 2013; 48:54-65. [PMID: 24316051 DOI: 10.1016/j.jbi.2013.11.008] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 01/23/2013] [Revised: 08/16/2013] [Accepted: 11/17/2013] [Indexed: 11/16/2022]
Abstract
Rapid, automated determination of the mapping of free text phrases to pre-defined concepts could assist in the annotation of clinical notes and increase the speed of natural language processing systems. The aim of this study was to design and evaluate a token-order-specific naïve Bayes-based machine learning system (RapTAT) to predict associations between phrases and concepts. Performance was assessed using a reference standard generated from 2860 VA discharge summaries containing 567,520 phrases that had been mapped to 12,056 distinct Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) concepts by the MCVS natural language processing system. It was also assessed on the manually annotated, 2010 i2b2 challenge data. Performance was established with regard to precision, recall, and F-measure for each of the concepts within the VA documents using bootstrapping. Within that corpus, concepts identified by MCVS were broadly distributed throughout SNOMED CT, and the token-order-specific language model achieved better performance based on precision, recall, and F-measure (0.95±0.15, 0.96±0.16, and 0.95±0.16, respectively; mean±SD) than the bag-of-words based, naïve Bayes model (0.64±0.45, 0.61±0.46, and 0.60±0.45, respectively) that has previously been used for concept mapping. Precision, recall, and F-measure on the i2b2 test set were 92.9%, 85.9%, and 89.2% respectively, using the token-order-specific model. RapTAT required just 7.2ms to map all phrases within a single discharge summary, and mapping rate did not decrease as the number of processed documents increased. The high performance attained by the tool in terms of both accuracy and speed was encouraging, and the mapping rate should be sufficient to support near-real-time, interactive annotation of medical narratives. 
These results demonstrate the feasibility of rapidly and accurately mapping phrases to a wide range of medical concepts based on a token-order-specific naïve Bayes model and machine learning.
Affiliation(s)
- Glenn T Gobbel
- Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Division of General Internal Medicine & Public Health, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA.
- Ruth Reeves
- Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.
- Shrimalini Jayaramaraja
- Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Division of General Internal Medicine & Public Health, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA.
- Dario Giuse
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.
- Theodore Speroff
- Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Division of General Internal Medicine & Public Health, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA.
- Steven H Brown
- Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.
- Peter L Elkin
- Department of Biomedical Informatics, University at Buffalo, SUNY, Buffalo, NY, USA.
- Michael E Matheny
- Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Division of General Internal Medicine & Public Health, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA.
|
118
|
Henao R, Murray J, Ginsburg G, Carin L, Lucas JE. Patient clustering with uncoded text in electronic medical records. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013; 2013:592-599. [PMID: 24551361 PMCID: PMC3900202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
We propose a mixture model for text data designed to capture underlying structure in the history of present illness section of electronic medical records data. Additionally, we propose a method to induce bias that leads to more homogeneous sets of diagnoses for patients in each cluster. We apply our model to a collection of electronic records from an emergency department and compare our results to three other relevant models in order to assess performance. Results using standard metrics demonstrate that patient clusters from our model are more homogeneous when compared to others, and qualitative analyses suggest that our approach leads to interpretable patient sub-populations when applied to real data. Finally, we demonstrate an example of our patient clustering model to identify adverse drug events.
|
119
|
Jiang X, Tse K, Wang S, Doan S, Kim H, Ohno-Machado L. Recent trends in biomedical informatics: a study based on JAMIA articles. J Am Med Inform Assoc 2013; 20:e198-205. [PMID: 24214018 PMCID: PMC3861936 DOI: 10.1136/amiajnl-2013-002429] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/24/2022]
Abstract
In a growing interdisciplinary field like biomedical informatics, information dissemination and citation trends are changing rapidly due to many factors. To understand these factors better, we analyzed the evolution of the number of articles per major biomedical informatics topic, download/online view frequencies, and citation patterns (using Web of Science) for articles published from 2009 to 2012 in JAMIA. The number of articles published in JAMIA increased significantly from 2009 to 2012, and there were some topic differences in the last 4 years. Medical Record Systems, Algorithms, and Methods are topic categories that are growing fast in several publications. We observed a significant correlation between download frequencies and the number of citations per month since publication for a given article. Earlier free availability of articles to non-subscribers was associated with a higher number of downloads and showed a trend towards a higher number of citations. This trend will need to be verified as more data accumulate in coming years.
Affiliation(s)
- Xiaoqian Jiang
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, California, USA
|
120
|
Coloma PM, Trifirò G, Patadia V, Sturkenboom M. Postmarketing safety surveillance: where does signal detection using electronic healthcare records fit into the big picture? Drug Saf 2013; 36:183-97. [PMID: 23377696 DOI: 10.1007/s40264-013-0018-x] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Indexed: 12/15/2022]
Abstract
The safety profile of a drug evolves over its lifetime on the market; there are bound to be changes in the circumstances of a drug's clinical use which may give rise to previously unobserved adverse effects, hence necessitating surveillance postmarketing. Postmarketing surveillance has traditionally been carried out by systematic manual review of spontaneous reports of adverse drug reactions. Vast improvements in computing capabilities have provided opportunities to automate signal detection, and several worldwide initiatives are exploring new approaches to facilitate earlier detection, primarily through mining of routinely-collected data from electronic healthcare records (EHR). This paper provides an overview of ongoing initiatives exploring data from EHR for signal detection vis-à-vis established spontaneous reporting systems (SRS). We describe the role SRS has played in regulatory decision making with respect to safety issues, and evaluate the potential added value of EHR-based signal detection systems to the current practice of drug surveillance. Safety signal detection is both an iterative and dynamic process. It is in the best interest of public health to integrate and understand evidence from all possibly relevant information sources on drug safety. Proper evaluation and communication of potential signals identified remains an imperative and should accompany any signal detection activity.
Affiliation(s)
- Preciosa M Coloma
- Ee-2116, Department of Medical Informatics, Erasmus Medical Centre, PO Box 2040, 3000 CA, Rotterdam, The Netherlands.
|
121
|
Hanauer DA, Ramakrishnan N, Seyfried LS. Describing the relationship between cat bites and human depression using data from an electronic health record. PLoS One 2013; 8:e70585. [PMID: 23936453 PMCID: PMC3731284 DOI: 10.1371/journal.pone.0070585] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 02/14/2013] [Accepted: 06/20/2013] [Indexed: 01/09/2023]
Abstract
Data mining approaches have been increasingly applied to the electronic health record and have led to the discovery of numerous clinical associations. Recent data mining studies have suggested a potential association between cat bites and human depression. To explore this possible association in more detail we first used administrative diagnosis codes to identify patients with either depression or bites, drawn from a population of 1.3 million patients. We then conducted a manual chart review in the electronic health record of all patients with a code for a bite to accurately determine which were from cats or dogs. Overall there were 750 patients with cat bites, 1,108 with dog bites, and approximately 117,000 patients with depression. Depression was found in 41.3% of patients with cat bites and 28.7% of those with dog bites. Furthermore, 85.5% of those with both cat bites and depression were women, compared to 64.5% of those with dog bites and depression. The probability of a woman being diagnosed with depression at some point in her life if she presented to our health system with a cat bite was 47.0%, compared to 24.2% of men presenting with a similar bite. The high proportion of depression in patients who had cat bites, especially among women, suggests that screening for depression could be appropriate in patients who present to a clinical provider with a cat bite. Additionally, while no causative link is known to explain this association, there is growing evidence to suggest that the relationship between cats and human mental illness, such as depression, warrants further investigation.
Affiliation(s)
- David A Hanauer
- Department of Pediatrics, University of Michigan, Ann Arbor, Michigan, USA.
|
122
|
Li Y, Salmasian H, Vilar S, Chase H, Friedman C, Wei Y. A method for controlling complex confounding effects in the detection of adverse drug reactions using electronic health records. J Am Med Inform Assoc 2013; 21:308-14. [PMID: 23907285 DOI: 10.1136/amiajnl-2013-001718] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/04/2022]
Abstract
OBJECTIVE Electronic health records (EHRs) contain information to detect adverse drug reactions (ADRs), as they contain comprehensive clinical information. A major challenge of using comprehensive information involves confounding. We propose a novel data-driven method to identify ADR signals accurately by adjusting for confounders. MATERIALS AND METHODS We focused on two serious ADRs, rhabdomyolysis and pancreatitis, and used information in 264,155 unique patient records. We identified an ADR using established criteria, selected potential confounders, and then used penalized logistic regressions to estimate confounder-adjusted ADR associations. A reference standard was created to evaluate and compare the precision of the proposed method and four others. RESULTS Precision was 83.3% for rhabdomyolysis and 60.8% for pancreatitis when using the proposed method, and we identified several drug safety signals that are interesting for further clinical review. DISCUSSION The proposed method effectively estimated ADR associations after adjusting for confounders. A main cause of error was probably due to the nature of the dataset in that a substantial number of patients had a single visit only and, therefore, it was not possible to determine correctly the appropriate sequence of events for them. It is likely that performance will be improved with use of EHR data that contain more longitudinal records. CONCLUSIONS This data-driven method is effective in controlling for confounding, resulting in either a higher or similar precision when compared with four comparators, has the unique ability to provide insight into confounders for each specific medication-ADR pair, and can be easily adapted to other EHR systems.
Affiliation(s)
- Ying Li
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
|
123
|
LePendu P, Iyer SV, Bauer-Mehren A, Harpaz R, Mortensen JM, Podchiyska T, Ferris TA, Shah NH. Pharmacovigilance using clinical notes. Clin Pharmacol Ther 2013; 93:547-55. [PMID: 23571773 PMCID: PMC3846296 DOI: 10.1038/clpt.2013.47] [Citation(s) in RCA: 101] [Impact Index Per Article: 8.4] [Indexed: 01/07/2023]
Abstract
With increasing adoption of electronic health records (EHRs), there is an opportunity to use the free-text portion of EHRs for pharmacovigilance. We present novel methods that annotate the unstructured clinical notes and transform them into a deidentified patient-feature matrix encoded using medical terminologies. We demonstrate the use of the resulting high-throughput data for detecting drug-adverse event associations and adverse events associated with drug-drug interactions. We show that these methods flag adverse events early (in most cases before an official alert), allow filtering of spurious signals by adjusting for potential confounding, and compile prevalence information. We argue that analyzing large volumes of free-text clinical notes enables drug safety surveillance using a yet untapped data source. Such data mining can be used for hypothesis generation and for rapid analysis of suspected adverse event risk.
Affiliation(s)
- P LePendu
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA.
|
124
|
Luo G. Open issues in intelligent personal health record--an updated status report for 2012. J Med Syst 2013; 37:9943. [PMID: 23584758 DOI: 10.1007/s10916-013-9943-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/21/2012] [Accepted: 03/20/2013] [Indexed: 12/16/2022]
Abstract
To improve the capability and usability of the personal health record (PHR) as a tool to empower consumers in the management of their own health, we have proposed the concept of an intelligent PHR (iPHR) and built a prototype iPHR system with four functions. These four functions use various health knowledge and computer science techniques to automatically provide users with personalized healthcare information to facilitate their well-being. This paper discusses several open issues in iPHR, including two enhancements to an existing function and two potential new functions. The two enhancements are for automatically compiling relevant self-care activities for each health issue and automatically identifying contraindicated self-care activities, respectively. One potential new function is personalized search for individual healthcare providers. Another potential new function is personalized local search for health-related services to help maintain patients in their homes. We include some preliminary thoughts on how to address these open issues with the hope to stimulate future research work on iPHR.
Affiliation(s)
- Gang Luo
- Department of Biomedical Informatics, University of Utah, HSEB Room 5725B, 26 South 2000 East, Salt Lake City, UT, 84112, USA.
|
125
|
Vilar S, Uriarte E, Santana L, Tatonetti NP, Friedman C. Detection of drug-drug interactions by modeling interaction profile fingerprints. PLoS One 2013; 8:e58321. [PMID: 23520498 PMCID: PMC3592896 DOI: 10.1371/journal.pone.0058321] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Received: 10/25/2012] [Accepted: 02/01/2013] [Indexed: 11/19/2022]
Abstract
Drug-drug interactions (DDIs) constitute an important problem in postmarketing pharmacovigilance and in the development of new drugs. The effectiveness or toxicity of a medication could be affected by the co-administration of other drugs that share pharmacokinetic or pharmacodynamic pathways. For this reason, a great effort is being made to develop new methodologies to detect and assess DDIs. In this article, we present a novel method based on drug interaction profile fingerprints (IPFs) with successful application to DDI detection. IPFs were generated based on the DrugBank database, which provided 9,454 well-established DDIs as a primary source of interaction data. The model uses IPFs to measure the similarity of pairs of drugs and generates new putative DDIs from the non-intersecting interactions of a pair. We described as part of our analysis the pharmacological and biological effects associated with the putative interactions; for example, the interaction between haloperidol and dicyclomine can cause increased risk of psychosis and tardive dyskinesia. First, we evaluated the method through hold-out validation and then by using four independent test sets that did not overlap with DrugBank. Precision for the test sets ranged from 0.4–0.5 with more than two fold enrichment factor enhancement. In conclusion, we demonstrated the usefulness of the method in pharmacovigilance as a DDI predictor, and created a dataset of potential DDIs, highlighting the etiology or pharmacological effect of the DDI, and providing an exploratory tool to facilitate decision support in DDI detection and patient safety.
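The idea of comparing interaction profiles and proposing the non-shared interactions as candidates can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each profile is modelled as a plain set of known interaction partners, the Tanimoto coefficient is assumed as the similarity measure (the abstract does not name one), and the drug names, the 0.5 threshold, and the function names `tanimoto` and `putative_interactions` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of interaction partners."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def putative_interactions(
    profiles: Dict[str, Set[str]], threshold: float
) -> List[Tuple[str, str, float]]:
    """For each sufficiently similar drug pair, propose the partners that one
    drug has and the other lacks as candidate interactions for the other."""
    candidates = []
    drugs = sorted(profiles)
    for i, d1 in enumerate(drugs):
        for d2 in drugs[i + 1:]:
            score = tanimoto(profiles[d1], profiles[d2])
            if score < threshold:
                continue
            # Partners of d1 that d2 lacks become candidates for d2, and vice versa.
            for partner in sorted(profiles[d1] - profiles[d2] - {d2}):
                candidates.append((d2, partner, score))
            for partner in sorted(profiles[d2] - profiles[d1] - {d1}):
                candidates.append((d1, partner, score))
    return candidates


# Invented toy profiles: drug -> set of drugs it is known to interact with.
profiles = {
    "drugA": {"drugX", "drugY", "drugZ"},
    "drugB": {"drugX", "drugY", "drugW"},
    "drugC": {"drugQ"},
}
candidates = putative_interactions(profiles, threshold=0.5)
for drug, partner, score in candidates:
    print(f"{drug} + {partner} (profile similarity {score:.2f})")
```

Each candidate carries the similarity score of the pair that produced it, so the list can be ranked before being checked against an independent test set.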
Affiliation(s)
- Santiago Vilar
- Department of Biomedical Informatics, Columbia University Medical Center, New York, New York, United States of America.
|
126
|
Cohen R, Elhadad M, Elhadad N. Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies. BMC Bioinformatics 2013; 14:10. [PMID: 23323800 PMCID: PMC3599108 DOI: 10.1186/1471-2105-14-10] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 11/23/2011] [Accepted: 12/24/2012] [Indexed: 11/13/2022]
Abstract
Background The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entities mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom is that larger corpora yield better results in text mining. But how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does the redundancy introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining? Results We analyze a large-scale EHR corpus and quantify redundancy both in terms of word and semantic concept repetition. We observe redundancy levels of about 30% and non-standard distribution of both words and concepts. We measure the impact of redundancy on two standard text-mining applications: collocation identification and topic modeling. We compare the results of these methods on synthetic data with controlled levels of redundancy and observe significant performance variation. Finally, we compare two mitigation strategies to avoid redundancy-induced bias: (i) a baseline strategy, keeping only the last note for each patient in the corpus; (ii) removing redundant notes with an efficient fingerprinting-based algorithm. 
For text mining, preprocessing the EHR corpus with fingerprinting yields significantly better results. Conclusions Before applying text-mining techniques, one must pay careful attention to the structure of the analyzed corpora. While the importance of data cleaning has been known for low-level text characteristics (e.g., encoding and spelling), high-level and difficult-to-quantify corpus characteristics, such as naturally occurring redundancy, can also hurt text mining. Fingerprinting enables text-mining techniques to leverage available data in the EHR corpus, while avoiding the bias introduced by redundancy.
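To make the fingerprinting idea concrete, the sketch below hashes overlapping word windows of each note and drops a note when most of its fingerprints were already seen in earlier kept notes. It is a simplified illustration and not the algorithm evaluated in the paper: the three-word window, the 0.5 overlap cutoff, the example notes, and the function names `fingerprints` and `drop_redundant` are all assumptions.

```python
import hashlib
from typing import List, Set


def fingerprints(text: str, k: int = 3) -> Set[str]:
    """Hash every window of k consecutive words into a fingerprint."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        windows = [" ".join(words)]
    else:
        windows = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {hashlib.sha256(w.encode("utf-8")).hexdigest() for w in windows}


def drop_redundant(notes: List[str], max_overlap: float = 0.5, k: int = 3) -> List[str]:
    """Keep a note only if at most max_overlap of its fingerprints were already seen."""
    kept: List[str] = []
    seen: Set[str] = set()
    for note in notes:
        fp = fingerprints(note, k)
        # Empty notes carry no information, so treat them as fully redundant.
        overlap = len(fp & seen) / len(fp) if fp else 1.0
        if overlap <= max_overlap:
            kept.append(note)
            seen |= fp
    return kept


# Invented notes: the second one copies the first and adds two words.
notes = [
    "patient reports chest pain since monday",
    "patient reports chest pain since monday no fever",
    "follow up visit shows normal blood pressure",
]
kept = drop_redundant(notes)
print(kept)
```

Unlike the baseline of keeping only the last note per patient, a filter of this kind retains every note that contributes mostly new text, which is how fingerprinting can preserve more usable data while limiting copy-and-paste bias.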
Affiliation(s)
- Raphael Cohen
- Department of Computer Science, Ben-Gurion University in the Negev, Beer-Sheva, Israel.
|
127
|
Abstract
The combination of improved genomic analysis methods, decreasing genotyping costs, and increasing computing resources has led to an explosion of clinical genomic knowledge in the last decade. Similarly, healthcare systems are increasingly adopting robust electronic health record (EHR) systems that not only can improve health care, but also contain a vast repository of disease and treatment data that could be mined for genomic research. Indeed, institutions are creating EHR-linked DNA biobanks to enable genomic and pharmacogenomic research, using EHR data for phenotypic information. However, EHRs are designed primarily for clinical care, not research, so reuse of clinical EHR data for research purposes can be challenging. Difficulties in use of EHR data include: data availability, missing data, incorrect data, and vast quantities of unstructured narrative text data. Structured information includes billing codes, most laboratory reports, and other variables such as physiologic measurements and demographic information. Significant information, however, remains locked within EHR narrative text documents, including clinical notes and certain categories of test results, such as pathology and radiology reports. For relatively rare observations, combinations of simple free-text searches and billing codes may prove adequate when followed by manual chart review. However, to extract the large cohorts necessary for genome-wide association studies, natural language processing methods to process narrative text data may be needed. Combinations of structured and unstructured textual data can be mined to generate high-validity collections of cases and controls for a given condition. Once high-quality cases and controls are identified, EHR-derived cases can be used for genomic discovery and validation. 
Since EHR data includes a broad sampling of clinically-relevant phenotypic information, it may enable multiple genomic investigations upon a single set of genotyped individuals. This chapter reviews several examples of phenotype extraction and their application to genetic research, demonstrating a viable future for genomic discovery using EHR-linked data.
Affiliation(s)
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America.
|
128
|
Gurulingappa H, Mateen‐Rajput A, Toldo L. Extraction of potential adverse drug events from medical case reports. J Biomed Semantics 2012; 3:15. [PMID: 23256479 PMCID: PMC3599676 DOI: 10.1186/2041-1480-3-15] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2012] [Accepted: 11/22/2012] [Indexed: 11/23/2022] Open
Abstract
The sheer amount of information about potential adverse drug events published in medical case reports poses major challenges for drug safety experts to perform timely monitoring. Efficient strategies for identification and extraction of information about potential adverse drug events from free-text resources are needed to support pharmacovigilance research and pharmaceutical decision making. Therefore, this work focusses on the adaptation of a machine learning-based system for the identification and extraction of potential adverse drug event relations from MEDLINE case reports. It relies on a high quality corpus that was manually annotated using an ontology-driven methodology. Qualitative evaluation of the system showed robust results. An experiment with large scale relation extraction from MEDLINE delivered under-identified potential adverse drug events not reported in drug monographs. Overall, this approach provides a scalable auto-assistance platform for drug safety professionals to automatically collect potential adverse drug events communicated as free-text data.
Affiliation(s)
- Luca Toldo
- Merck KGaA, Frankfurterstraße 250, Darmstadt 64293, Germany
|
129
|
Abstract
Medicines are designed to cure, treat, or prevent diseases; however, there are also risks in taking any medicine. In particular, short-term or long-term adverse drug reactions (ADRs) can cause serious harm to patients. Adverse drug events have been estimated to cause over 700,000 emergency department visits each year in the United States. Thus, for medication safety, ADR monitoring is required for each drug throughout its life cycle, including early stages of drug design, different phases of clinical trials, and postmarketing surveillance. Pharmacovigilance (PhV) is the science concerned with the detection, assessment, understanding and prevention of ADRs. In the pre-marketing stages of a drug, PhV primarily focuses on predicting potential ADRs using preclinical characteristics of the compounds (e.g., drug targets, chemical structure) or screening data (e.g., bioassay data). In the postmarketing stage, PhV has traditionally involved mining spontaneous reports submitted to national surveillance systems. The research focus is currently shifting toward the use of data generated from platforms outside the conventional framework such as electronic medical records (EMRs), biomedical literature, and patient-reported data in online health forums. The emerging trend of PhV is to link preclinical data from the experimental platform with human safety information observed in the postmarketing phase. This article provides a general overview of the current computational methodologies applied for PhV at different stages of drug development and concludes with future directions and challenges.
Affiliation(s)
- Mei Liu
- NJ Institute of Technology, Newark, NJ, USA
- Yong Hu
- Sun Yat-sen University, Guangzhou, China
- Hua Xu
- Vanderbilt University, Nashville, TN, USA
|
130
|
Liu M, McPeek Hinz ER, Matheny ME, Denny JC, Schildcrout JS, Miller RA, Xu H. Comparative analysis of pharmacovigilance methods in the detection of adverse drug reactions using electronic medical records. J Am Med Inform Assoc 2012; 20:420-6. [PMID: 23161894 DOI: 10.1136/amiajnl-2012-001119] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE Medication safety requires that each drug be monitored throughout its market life as early detection of adverse drug reactions (ADRs) can lead to alerts that prevent patient harm. Recently, electronic medical records (EMRs) have emerged as a valuable resource for pharmacovigilance. This study examines the use of retrospective medication orders and inpatient laboratory results documented in the EMR to identify ADRs. METHODS Using 12 years of EMR data from Vanderbilt University Medical Center (VUMC), we designed a study to correlate abnormal laboratory results with specific drug administrations by comparing the outcomes of a drug-exposed group and a matched unexposed group. We assessed the relative merits of six pharmacovigilance measures used in spontaneous reporting systems (SRSs): proportional reporting ratio (PRR), reporting odds ratio (ROR), Yule's Q (YULE), the χ² test (CHI), Bayesian confidence propagation neural networks (BCPNN), and a gamma Poisson shrinker (GPS). RESULTS We systematically evaluated the methods on two independently constructed reference standard datasets of drug-event pairs. The dataset of Yoon et al contained 470 drug-event pairs (10 drugs and 47 laboratory abnormalities). Using VUMC's EMR, we created another dataset of 378 drug-event pairs (nine drugs and 42 laboratory abnormalities). Evaluation on our reference standard showed that CHI, ROR, PRR, and YULE all had the same F score (62%). When the reference standard of Yoon et al was used, ROR had the best F score of 68%, with 77% precision and 61% recall. CONCLUSIONS Results suggest that EMR-derived laboratory measurements and medication orders can help to validate previously reported ADRs, and detect new ADRs.
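Four of the six measures named in this abstract (PRR, ROR, Yule's Q, and the chi-square statistic) can be computed from a single 2x2 contingency table. The following is a minimal sketch using the standard textbook definitions, not the authors' implementation; the function name, the cell layout, and the example counts are assumptions made for illustration, and BCPNN and GPS are omitted because they need additional modeling choices.

```python
def disproportionality(a, b, c, d):
    """Disproportionality measures for one drug-event pair.

    Hypothetical cell layout (an assumption of this sketch):
      a: exposed,   event present    b: exposed,   event absent
      c: unexposed, event present    d: unexposed, event absent
    All marginal totals and the products b*c and a*d + b*c must be non-zero.
    """
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))          # proportional reporting ratio
    ror = (a * d) / (b * c)                      # reporting odds ratio
    yule_q = (a * d - b * c) / (a * d + b * c)   # Yule's Q, equals (ROR-1)/(ROR+1)
    # Pearson chi-square for a 2x2 table, without continuity correction
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return {"PRR": prr, "ROR": ror, "YULE": yule_q, "CHI": chi2}


# Illustrative counts only; these are not data from the study.
measures = disproportionality(a=20, b=80, c=10, d=90)
```

Each measure is then compared against a signal threshold chosen by the analyst; the abstract does not state which thresholds were used.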
Affiliation(s)
- Mei Liu
- Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, USA
|
131
|
Yang C, Srinivasan P, Polgreen PM. Automatic adverse drug events detection using letters to the editor. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2012; 2012:1030-1039. [PMID: 23304379 PMCID: PMC3540506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
We present and test the intuition that letters to the editor in journals carry early signals of adverse drug events (ADEs). Surprisingly, these letters have not yet been exploited for automatic ADE detection, unlike, for example, clinical records and PubMed. Part of the challenge is that it is not easy to access the full text of letters (for the most part these do not appear in PubMed). Also, letters are likely underrated in comparison with full articles. Besides demonstrating that this intuition holds, we contribute techniques for post-market drug surveillance. Specifically, we test an automatic approach for ADE detection from letters using off-the-shelf machine learning tools. We also involve natural language processing for feature definitions. Overall, we achieve high accuracy in our experiments, and our method also works well on a second new test set. Our results encourage us to further pursue this line of research.
Affiliation(s)
- Chao Yang
- Department of Computer Science, The University of Iowa, Iowa City, IA, USA
|
132
|
Hanauer DA, Ramakrishnan N. Modeling temporal relationships in large scale clinical associations. J Am Med Inform Assoc 2012; 20:332-41. [PMID: 23019240 DOI: 10.1136/amiajnl-2012-001117] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open
Abstract
OBJECTIVE We describe an approach for modeling temporal relationships in a large scale association analysis of electronic health record data. The addition of temporal information can inform hypothesis generation and help to explain the relationships. We applied this approach to a dataset containing 41.2 million time-stamped International Classification of Diseases, Ninth Revision (ICD-9) codes from 1.6 million patients. METHODS We performed two independent analyses including a pairwise association analysis using a χ² test and a temporal analysis using a binomial test. Data were visualized using network diagrams and reviewed for clinical significance. RESULTS We found nearly 400 000 highly associated pairs of ICD-9 codes with varying numbers of strong temporal associations ranging from ≥1 day to ≥10 years apart. Most of the findings were not considered clinically novel, although some, such as an association between Helicobacter pylori infection and diabetes, have recently been reported in the literature. The temporal analysis in our large cohort, however, revealed that diabetes usually preceded the diagnoses of H pylori, raising questions about possible cause and effect. DISCUSSION Such analyses have significant limitations, some of which are due to known problems with ICD-9 codes and others to potentially incomplete data even at a health system level. Nevertheless, large scale association analyses with temporal modeling can help provide a mechanism for novel discovery in support of hypothesis generation. CONCLUSIONS Temporal relationships can provide an additional layer of meaning in identifying and interpreting clinical associations.
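The temporal step mentioned in this abstract can be illustrated with an exact binomial test on the ordering of two codes. The sketch below assumes, for illustration only, that the null hypothesis is "code A precedes code B in half of the patients who have both"; the abstract does not state the null proportion or the exact test variant, and the function name and counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    k: number of patients in whom code A was recorded before code B
    n: number of patients with both codes on different dates
    p: null probability that A precedes B (0.5 is an assumption here)

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Intended for small n; for large cohorts a library
    routine would be the safer choice.
    """
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


# Illustrative counts only: A precedes B in 9 of 10 patients.
p_value = binomial_two_sided_p(9, 10)
```

A small p-value suggests a consistent ordering between the two codes, which by itself does not establish cause and effect.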
Affiliation(s)
- David A Hanauer
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI 48109-5940, USA.
|
133
|
Kim HE, Jiang X, Kim J, Ohno-Machado L. Trends in biomedical informatics: most cited topics from recent years. J Am Med Inform Assoc 2012; 18 Suppl 1:i166-70. [PMID: 22180873 DOI: 10.1136/amiajnl-2011-000706] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open
Abstract
Biomedical informatics is a young, highly interdisciplinary field that is evolving quickly. It is important to know which published topics in generalist biomedical informatics journals elicit the most interest from the scientific community, and whether this interest changes over time, so that journals can better serve their readers. It is also important to understand whether free access to biomedical informatics articles impacts their citation rates in a significant way, so authors can make informed decisions about unlock fees, and journal owners and publishers understand the implications of open access. The topics and JAMIA articles from years 2009 and 2010 that have been most cited according to the Web of Science are described. To better understand the effects of free access in article dissemination, the number of citations per month after publication for articles published in 2009 versus 2010 was compared, since there was a significant change in free access to JAMIA articles between those years. Results suggest that there is a positive association between free access and citation rate for JAMIA articles.
Affiliation(s)
- Hyeon-Eui Kim
- Division of Biomedical Informatics, Department of Medicine, University of California-San Diego, La Jolla, California 92093, USA
|
134
|
Warrer P, Hansen EH, Juhl-Jensen L, Aagaard L. Using text-mining techniques in electronic patient records to identify ADRs from medicine use. Br J Clin Pharmacol 2012; 73:674-84. [PMID: 22122057 DOI: 10.1111/j.1365-2125.2011.04153.x] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs.
Affiliation(s)
- Pernille Warrer
- Department of Pharmacology and Pharmacotherapy, Section for Social Pharmacy, Faculty of Pharmaceutical Sciences, University of Copenhagen, Copenhagen, Denmark.
|
135
|
Enhancing adverse drug event detection in electronic health records using molecular structure similarity: application to pancreatitis. PLoS One 2012; 7:e41471. [PMID: 22911794 PMCID: PMC3404072 DOI: 10.1371/journal.pone.0041471] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2012] [Accepted: 06/21/2012] [Indexed: 12/14/2022] Open
Abstract
Background Adverse drug event (ADE) detection and assessment is at the center of pharmacovigilance. Data mining of systems, such as FDA’s Adverse Event Reporting System (AERS) and more recently, Electronic Health Records (EHRs), can aid in the automatic detection and analysis of ADEs. Although different data mining approaches have been shown to be valuable, it is still crucial to improve the quality of the generated signals. Objective To leverage structural similarity by developing molecular fingerprint-based models (MFBMs) to strengthen ADE signals generated from EHR data. Methods A reference standard of drugs known to be causally associated with the adverse event pancreatitis was used to create an MFBM. Electronic Health Records (EHRs) from the New York Presbyterian Hospital were mined to generate structured data. Disproportionality Analysis (DPA) was applied to the data, and 278 possible signals related to the ADE pancreatitis were detected. Candidate drugs associated with these signals were then assessed using the MFBM to find the most promising candidates based on structural similarity. Results The use of MFBM as a means to strengthen or prioritize signals generated from the EHR significantly improved the detection accuracy of ADEs related to pancreatitis. MFBM also highlights the etiology of the ADE by identifying structurally similar drugs, which could follow a similar mechanism of action. Conclusion The method proposed in this paper provides evidence of being a promising adjunct to existing automated ADE detection and analysis approaches.
|
136
|
Hahn U, Cohen KB, Garten Y, Shah NH. Mining the pharmacogenomics literature--a survey of the state of the art. Brief Bioinform 2012; 13:460-94. [PMID: 22833496 PMCID: PMC3404399 DOI: 10.1093/bib/bbs018] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2011] [Accepted: 03/23/2012] [Indexed: 01/05/2023] Open
Abstract
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.
Affiliation(s)
- Udo Hahn
- Jena University Language and Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.
|
137
|
Haerian K, Varn D, Vaidya S, Ena L, Chase HS, Friedman C. Detection of pharmacovigilance-related adverse events using electronic health records and automated methods. Clin Pharmacol Ther 2012; 92:228-34. [PMID: 22713699 DOI: 10.1038/clpt.2012.54] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
Electronic health records (EHRs) are an important source of data for detection of adverse drug reactions (ADRs). However, adverse events are frequently due not to medications but to the patients' underlying conditions. Mining to detect ADRs from EHR data must account for confounders. We developed an automated method using natural-language processing (NLP) and a knowledge source to differentiate cases in which the patient's disease is responsible for the event rather than a drug. Our method was applied to 199,920 hospitalization records, concentrating on two serious ADRs: rhabdomyolysis (n = 687) and agranulocytosis (n = 772). Our method automatically identified 75% of the cases, those with disease etiology. The sensitivity and specificity were 93.8% (confidence interval: 88.9-96.7%) and 91.8% (confidence interval: 84.0-96.2%), respectively. The method resulted in considerable saving of time: for every 1 h spent in development, there was a saving of at least 20 h in manual review. The review of the remaining 25% of the cases therefore became more feasible, allowing us to identify the medications that had caused the ADRs.
Affiliation(s)
- K Haerian
- Department of Biomedical Informatics, Columbia University, New York, NY, USA.
|
138
|
Vilar S, Harpaz R, Uriarte E, Santana L, Rabadan R, Friedman C. Drug-drug interaction through molecular structure similarity analysis. J Am Med Inform Assoc 2012; 19:1066-74. [PMID: 22647690 DOI: 10.1136/amiajnl-2012-000935] [Citation(s) in RCA: 140] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open
Abstract
BACKGROUND Drug-drug interactions (DDIs) are responsible for many serious adverse events; their detection is crucial for patient safety but is very challenging. Currently, the US Food and Drug Administration and pharmaceutical companies are showing great interest in the development of improved tools for identifying DDIs. METHODS We present a new methodology applicable on a large scale that identifies novel DDIs based on molecular structural similarity to drugs involved in established DDIs. The underlying assumption is that if drug A and drug B interact to produce a specific biological effect, then drugs similar to drug A (or drug B) are likely to interact with drug B (or drug A) to produce the same effect. DrugBank was used as a resource for collecting 9454 established DDIs. The structural similarity of all pairs of drugs in DrugBank was computed to identify DDI candidates. RESULTS The methodology was evaluated using as a gold standard the interactions retrieved from the initial DrugBank database. Results demonstrated an overall sensitivity of 0.68, specificity of 0.96, and precision of 0.26. Additionally, the methodology was also evaluated in an independent test using the Micromedex/Drugdex database. CONCLUSION The proposed methodology is simple, efficient, allows the investigation of large numbers of drugs, and helps highlight the etiology of DDI. A database of 58 403 predicted DDIs with structural evidence is provided as an open resource for investigators seeking to analyze DDIs.
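The inference rule described in this abstract (drugs similar to drug A are likely to interact with drug B when A and B are known to interact) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the Tanimoto coefficient is one common similarity measure and is assumed here because the abstract does not name the measure; fingerprints are represented as sets of on-bit indices; and the drug names, the threshold, and both function names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def candidate_ddis(known_ddis, fingerprints, threshold=0.7):
    """Propose new interacting pairs from known ones via structural similarity.

    known_ddis: iterable of (drug, drug) pairs with an established interaction
    fingerprints: dict mapping drug name -> set of on-bit indices
    Returns {frozenset({new_drug, partner}): best similarity score}.
    """
    known = {frozenset(pair) for pair in known_ddis}
    candidates = {}
    for a, b in known_ddis:
        # Apply the rule in both directions: neighbours of a pair with b,
        # and neighbours of b pair with a.
        for anchor, partner in ((a, b), (b, a)):
            for drug, fp in fingerprints.items():
                if drug in (anchor, partner):
                    continue
                score = tanimoto(fingerprints[anchor], fp)
                pair = frozenset((drug, partner))
                if score >= threshold and pair not in known:
                    candidates[pair] = max(score, candidates.get(pair, 0.0))
    return candidates


# Toy fingerprints, invented for illustration.
toy_fps = {"A": {1, 2, 3, 4}, "A2": {1, 2, 3, 5}, "B": {10, 11}, "C": {20}}
toy_candidates = candidate_ddis([("A", "B")], toy_fps, threshold=0.5)
```

In the toy data, "A2" shares three of five distinct bits with "A", so the pair ("A2", "B") is proposed, while "C" shares no bits with either drug and produces no candidate.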
Affiliation(s)
- Santiago Vilar
- Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA.
|
139
|
Mining electronic health records: towards better research applications and clinical care. Nat Rev Genet 2012; 13:395-405. [PMID: 22549152 DOI: 10.1038/nrg3208] [Citation(s) in RCA: 733] [Impact Index Per Article: 56.4] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
Clinical data describing the phenotypes and treatment of patients represents an underused data source that has much greater research potential than is currently realized. Mining of electronic health records (EHRs) has the potential for establishing new patient-stratification principles and for revealing unknown disease correlations. Integrating EHR data with genetic data will also give a finer understanding of genotype-phenotype relationships. However, a broad range of ethical, legal and technical reasons currently hinder the systematic deposition of these data in EHRs and their mining. Here, we consider the potential for furthering medical research and clinical care using EHR data and the challenges that must be overcome before this is a reality.
|
140
|
Lependu P, Iyer SV, Fairon C, Shah NH. Annotation Analysis for Testing Drug Safety Signals using Unstructured Clinical Notes. J Biomed Semantics 2012; 3 Suppl 1:S5. [PMID: 22541596 PMCID: PMC3337270 DOI: 10.1186/2041-1480-3-s1-s5] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Abstract
Background The electronic surveillance for adverse drug events is largely based upon the analysis of coded data from reporting systems. Yet, the vast majority of electronic health data lies embedded within the free text of clinical notes and is not gathered into centralized repositories. With the increasing access to large volumes of electronic medical data—in particular the clinical notes—it may be possible to computationally encode and to test drug safety signals in an active manner. Results We describe the application of simple annotation tools on clinical text and the mining of the resulting annotations to compute the risk of getting a myocardial infarction for patients with rheumatoid arthritis that take Vioxx. Our analysis clearly reveals elevated risks for myocardial infarction in rheumatoid arthritis patients taking Vioxx (odds ratio 2.06) before 2005. Conclusions Our results show that it is possible to apply annotation analysis methods for testing hypotheses about drug safety using electronic medical records.
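The odds ratio reported in this abstract is a statistic of a 2x2 table of exposure against outcome. A minimal sketch of the calculation, with a Wald (Woolf) confidence interval on the log scale, is given below; the interval method, the function name, and the counts are assumptions for illustration, since the abstract reports only the point estimate and does not give the underlying counts.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four cells must be non-zero.
    """
    estimate = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, not taken from the study.
estimate, lower, upper = odds_ratio_ci(30, 70, 15, 85)
```

An interval whose lower bound exceeds 1 is the usual informal reading of an elevated risk; with annotation-derived counts the estimate also inherits any errors made by the annotation step.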
Affiliation(s)
- Paea Lependu
- Stanford Center for Biomedical Informatics Research, Stanford University, USA.
|
141
|
LePendu P, Liu Y, Iyer S, Udell MR, Shah NH. Analyzing patterns of drug use in clinical notes for patient safety. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2012; 2012:63-70. [PMID: 22779054 PMCID: PMC3392046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
Abstract
Doctors prescribe drugs for indications that are not FDA approved. Research indicates that 21% of prescriptions filled are for off-label indications. Of those, more than 73% lack supporting scientific evidence. Traditional drug safety alerts may not cover usages that are not FDA approved. Therefore, analyzing patterns of off-label drug usage in the clinical setting is an important step toward reducing the incidence of adverse events and for improving patient safety. We applied term extraction tools on the clinical notes of a million patients to compile a database of statistically significant patterns of drug use. We validated some of the usage patterns learned from the data against sources of known on-label and off-label use. Given our ability to quantify adverse event risks using the clinical notes, this will enable us to address patient safety because we can now rank-order off-label drug use and prioritize the search for their adverse event profiles.
|
142
|
Liu Y, LePendu P, Iyer S, Shah NH. Using temporal patterns in medical records to discern adverse drug events from indications. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2012; 2012:47-56. [PMID: 22779050 PMCID: PMC3392062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/01/2022]
Abstract
Researchers estimate that electronic health record systems record roughly 2 million ambulatory adverse drug events and that patients suffer from adverse drug events in roughly 30% of hospital stays. Some have recently used structured databases of patient medical records and health insurance claims, going beyond the current paradigm of using spontaneous reporting systems like AERS, to detect drug-safety signals. However, most efforts do not use the free text from clinical notes in monitoring for drug-safety signals. We hypothesize that drug-disease co-occurrences, extracted from ontology-based annotations of the clinical notes, can be examined for statistical enrichment and used for drug safety surveillance. When analyzing such co-occurrences of drugs and diseases, one major challenge is to differentiate whether the disease in a drug-disease pair represents an indication or an adverse event. We demonstrate that it is possible to make this distinction by combining the frequency distribution of the drug, the disease, and the drug-disease pair as well as the temporal ordering of the drugs and diseases in each pair across more than one million patients.
|
143
|
Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc 2012; 18:544-51. [PMID: 21846786 DOI: 10.1136/amiajnl-2011-000464] [Citation(s) in RCA: 483] [Impact Index Per Article: 37.2] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Open
Abstract
OBJECTIVES To provide an overview and tutorial of natural language processing (NLP) and modern NLP-system design. TARGET AUDIENCE This tutorial targets the medical informatics generalist who has limited acquaintance with the principles behind NLP and/or limited knowledge of the current state of the art. SCOPE We describe the historical evolution of NLP, and summarize common NLP sub-problems in this extensive field. We then provide a synopsis of selected highlights of medical NLP efforts. After providing a brief description of common machine-learning approaches that are being used for diverse NLP sub-problems, we discuss how modern NLP architectures are designed, with a summary of the Apache Foundation's Unstructured Information Management Architecture. We finally consider possible future directions for NLP, and reflect on the possible impact of IBM Watson on the medical field.
|
144
|
Detection of Adverse Drug Reaction Signals Using an Electronic Health Records Database: Comparison of the Laboratory Extreme Abnormality Ratio (CLEAR) Algorithm. Clin Pharmacol Ther 2012; 91:467-74. [PMID: 22237257 DOI: 10.1038/clpt.2011.248] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
145
|
Natural Language Processing, Electronic Health Records, and Clinical Research. HEALTH INFORMATICS 2012. [DOI: 10.1007/978-1-84882-448-5_16] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/09/2023]
|
146
|
Harkema H, Chapman WW, Saul M, Dellon ES, Schoen RE, Mehrotra A. Developing a natural language processing application for measuring the quality of colonoscopy procedures. J Am Med Inform Assoc 2011; 18 Suppl 1:i150-6. [PMID: 21946240 PMCID: PMC3241178 DOI: 10.1136/amiajnl-2011-000431] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2011] [Accepted: 08/18/2011] [Indexed: 12/24/2022] Open
Abstract
OBJECTIVE The quality of colonoscopy procedures for colorectal cancer screening is often inadequate and varies widely among physicians. Routine measurement of quality is limited by the costs of manual review of free-text patient charts. Our goal was to develop a natural language processing (NLP) application to measure colonoscopy quality. MATERIALS AND METHODS Using a set of quality measures published by physician specialty societies, we implemented an NLP engine that extracts 21 variables for 19 quality measures from free-text colonoscopy and pathology reports. We evaluated the performance of the NLP engine on a test set of 453 colonoscopy reports and 226 pathology reports, considering accuracy in extracting the values of the target variables from text, and the reliability of the outcomes of the quality measures as computed from the NLP-extracted information. RESULTS The average accuracy of the NLP engine over all variables was 0.89 (range: 0.62-1.0) and the average F measure over all variables was 0.74 (range: 0.49-0.89). The average agreement score, measured as Cohen's κ, between the manually established and NLP-derived outcomes of the quality measures was 0.62 (range: 0.09-0.86). DISCUSSION For nine of the 19 colonoscopy quality measures, the agreement score was 0.70 or above, which we consider a sufficient score for the NLP-derived outcomes of these measures to be practically useful for quality measurement. CONCLUSION The use of NLP for information extraction from free-text colonoscopy and pathology reports creates opportunities for large scale, routine quality measurement, which can support quality improvement in colonoscopy care.
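The agreement score used in this abstract, Cohen's κ, corrects the observed agreement between two raters for the agreement expected by chance. A minimal sketch of the standard formula follows; the function name and the example labels are hypothetical, with one label list standing in for manual review and the other for NLP-derived outcomes.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the chance agreement implied by each rater's label frequencies.
    Undefined (division by zero) when p_expected equals 1.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented labels: 1 = quality measure met, 0 = not met.
manual = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
nlp = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
kappa = cohens_kappa(manual, nlp)
```

In the invented example the two label lists agree on 8 of 10 items and chance agreement is 0.5, giving κ = 0.6.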
Affiliation(s)
- Henk Harkema
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA.
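The agreement score reported in the abstract above is Cohen's κ between manually established and NLP-derived outcomes. As a minimal sketch of how such a score is computed, assuming each quality-measure outcome is a simple categorical label (the function name and the label encoding are illustrative and do not come from the paper):

```python
from collections import Counter


def cohens_kappa(manual, automated):
    """Cohen's kappa for two equally long sequences of categorical labels.

    `manual` and `automated` are hypothetical names for the reference
    outcomes and the NLP-derived outcomes, respectively.
    """
    if len(manual) != len(automated) or not manual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(manual)
    # Observed agreement: fraction of items on which both sources agree.
    observed = sum(a == b for a, b in zip(manual, automated)) / n
    # Chance agreement: product of the marginal label frequencies.
    manual_counts = Counter(manual)
    auto_counts = Counter(automated)
    labels = set(manual_counts) | set(auto_counts)
    expected = sum(manual_counts[l] * auto_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        # Both sources use a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For ten hypothetical outcomes on which the two sources agree eight times and each marks half of the items as positive, the function returns approximately 0.6, since observed agreement is 0.8 and chance agreement is 0.5.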
|
147
|
The Growing Role of Clinical and Genomic Databases in the Development of Antifungal Strategies. CURRENT FUNGAL INFECTION REPORTS 2011. [DOI: 10.1007/s12281-011-0071-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|
148
|
Hripcsak G, Albers DJ, Perotte A. Exploiting time in electronic health record correlations. J Am Med Inform Assoc 2011; 18 Suppl 1:i109-15. [PMID: 22116643 DOI: 10.1136/amiajnl-2011-000463] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Indexed: 11/04/2022]
Abstract
OBJECTIVE To demonstrate that a large, heterogeneous clinical database can reveal fine temporal patterns in clinical associations; to illustrate several types of associations; and to ascertain the value of exploiting time. MATERIALS AND METHODS Lagged linear correlation was calculated between seven clinical laboratory values and 30 clinical concepts extracted from resident signout notes from a 22-year, 3-million-patient database of electronic health records. Time points were interpolated, and patients were normalized to reduce inter-patient effects. RESULTS The method revealed several types of associations with detailed temporal patterns. Definitional associations included low blood potassium preceding 'hypokalemia.' Low potassium preceding the drug spironolactone with high potassium following spironolactone exemplified intentional and physiologic associations, respectively. Counterintuitive results such as the fact that diseases appeared to follow their effects may be due to the workflow of healthcare, in which clinical findings precede the clinician's diagnosis of a disease even though the disease actually preceded the findings. Fully exploiting time by interpolating time points produced less noisy results. DISCUSSION Electronic health records are not direct reflections of the patient state, but rather reflections of the healthcare process and the recording process. With proper techniques and understanding, and with proper incorporation of time, interpretable associations can be derived from a large clinical database. CONCLUSION A large, heterogeneous clinical database can reveal clinical associations, time is an important feature, and care must be taken to interpret the results.
Affiliation(s)
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, New York 10032, USA.
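The abstract above rests on lagged linear correlation between two time series. The sketch below shows only that core idea, assuming two equally long series already sampled on a common, evenly spaced time grid; it omits the interpolation and per-patient normalization the abstract mentions, and the function names are illustrative rather than taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either series is constant
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag]; a positive lag means y follows x."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if abs(lag) > len(x) - 2:
        raise ValueError("lag leaves fewer than two overlapping points")
    if lag >= 0:
        xs, ys = x[:len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:len(y) + lag]
    return pearson(xs, ys)


def best_lag(x, y, max_lag):
    """Return the lag in [-max_lag, max_lag] with the highest correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: lagged_correlation(x, y, lag))
```

Scanning a range of lags with `best_lag` indicates whether one signal tends to precede or follow the other, which is the kind of temporal pattern the abstract describes (for example, a laboratory value preceding a concept recorded in a note).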
|
149
|
Nikfarjam A, Gonzalez GH. Pattern mining for extraction of mentions of Adverse Drug Reactions from user comments. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2011; 2011:1019-1026. [PMID: 22195162 PMCID: PMC3243273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Rapid growth of online health social networks has enabled patients to communicate more easily with each other. This exchange of opinions and experiences has provided a rich source of information about drugs, their effectiveness and, more importantly, their possible adverse reactions. We developed a system to automatically extract mentions of Adverse Drug Reactions (ADRs) from user reviews about drugs on social network websites by mining a set of language patterns. The system applied association rule mining to a set of annotated comments to extract the underlying patterns of colloquial expressions about adverse effects. The patterns were tested on a set of unseen comments to evaluate their performance. We reached a precision of 70.01%, a recall of 66.32%, and an F-measure of 67.96%.
Affiliation(s)
- Azadeh Nikfarjam
- Biomedical Informatics Department, Arizona State University, Phoenix, AZ, USA
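Precision, recall, and F-measure, as reported in the abstract above, are conventionally derived from counts of true positives, false positives, and false negatives. The sketch below uses the standard balanced F-measure (harmonic mean of precision and recall); the abstract does not say how its scores were aggregated, so this is an assumption, and the function name and counts are illustrative only.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from raw extraction counts.

    Each ratio is reported as 0.0 when its denominator is zero, a common
    convention rather than anything stated in the abstract.
    """
    predicted = true_pos + false_pos   # mentions the system extracted
    actual = true_pos + false_neg      # mentions in the gold annotations
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Calling `precision_recall_f1(7, 3, 3)` on these hypothetical counts gives a precision and recall of 0.7 each, and therefore an F-measure of about 0.7.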
|
150
|
Sohn S, Kocher JPA, Chute CG, Savova GK. Drug side effect extraction from clinical narratives of psychiatry and psychology patients. J Am Med Inform Assoc 2011; 18 Suppl 1:i144-9. [PMID: 21946242 DOI: 10.1136/amiajnl-2011-000351] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Indexed: 12/31/2022]
Abstract
OBJECTIVE To extract physician-asserted drug side effects from electronic medical record clinical narratives. MATERIALS AND METHODS Pattern matching rules were manually developed through examining keywords and expression patterns of side effects to discover an individual side effect and causative drug relationship. A combination of machine learning (C4.5) using side effect keyword features and pattern matching rules was used to extract sentences that contain side effect and causative drug pairs, enabling the system to discover most side effect occurrences. Our system was implemented as a module within the clinical Text Analysis and Knowledge Extraction System. RESULTS The system was tested in the domain of psychiatry and psychology. The rule-based system extracting side effects and causative drugs produced an F score of 0.80 (0.55 excluding allergy section). The hybrid system identifying side effect sentences had an F score of 0.75 (0.56 excluding allergy section) but covered more side effect and causative drug pairs than individual side effect extraction. DISCUSSION The rule-based system was able to identify most side effects expressed by clear indication words. More sophisticated semantic processing is required to handle complex side effect descriptions in the narrative. We demonstrated that our system can be trained to identify sentences with complex side effect descriptions that can be submitted to a human expert for further abstraction. CONCLUSION Our system was able to extract most physician-asserted drug side effects. It can be used in either an automated mode for side effect extraction or semi-automated mode to identify side effect sentences that can significantly simplify abstraction by a human expert.
Affiliation(s)
- Sunghwan Sohn
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota 55905, USA.
|