151
|
Linder JA, Haas JS, Iyer A, Labuzetta MA, Ibara M, Celeste M, Getty G, Bates DW. Secondary use of electronic health record data: spontaneous triggered adverse drug event reporting. Pharmacoepidemiol Drug Saf 2011; 19:1211-5. [PMID: 21155192 DOI: 10.1002/pds.2027] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 11/08/2022]
Abstract
PURPOSE Physicians in the United States report fewer than 1% of adverse drug events (ADEs) to the Food and Drug Administration (FDA), but frequently document ADEs within electronic health records (EHRs). We developed and implemented a generalizable, scalable EHR-based system to automatically send electronic ADE reports to the FDA in real-time. METHODS Proof-of-concept study involving 26 clinicians given access to EHR-based ADE reporting functionality from December 2008 to May 2009. MEASUREMENTS Number and content of ADE reports; severity of adverse reactions (clinician and computer algorithm defined); clinician survey. RESULTS During the study period, 26 clinicians submitted 217 reports to the FDA. The clinicians defined 23% of the ADEs as serious and a computer algorithm defined 4% of the ADEs as serious. The most common drug classes were cardiovascular drugs (40%), central nervous system drugs (19%), analgesics (13%), and endocrine drugs (7%). The reports contained information, pre-filled from the EHR, about comorbid conditions (207 reports [95%] listed 1899 comorbid conditions), concurrent medications (193 reports [89%] listed 1687 concurrent medications), weight (209 reports [96%]), and laboratory data (215 reports [99%]). It took clinicians a mean of 53 seconds to complete and send the form. In the clinician survey, 21 of 23 respondents (91%) said they had submitted zero ADE reports to the FDA in the prior 12 months. CONCLUSIONS EHR-based, triggered ADE reporting is efficient and acceptable to clinicians, provides detailed clinical information, and has the potential to greatly increase the number and quality of spontaneous reports submitted to the FDA.
Affiliation(s)
- Jeffrey A Linder
- Division of General Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA 02120, USA.
|
152
|
Oprea TI, Nielsen SK, Ursu O, Yang JJ, Taboureau O, Mathias SL, Kouskoumvekaki I, Sklar LA, Bologa CG. Associating Drugs, Targets and Clinical Outcomes into an Integrated Network Affords a New Platform for Computer-Aided Drug Repurposing. Mol Inform 2011; 30:100-111. [PMID: 22287994 PMCID: PMC3266123 DOI: 10.1002/minf.201100023] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.8] [Indexed: 11/07/2022]
Abstract
Finding new uses for old drugs is a strategy embraced by the pharmaceutical industry, with increasing participation from the academic sector. Drug repurposing efforts focus on identifying novel modes of action, but not in a systematic manner. With intensive data mining and curation, we aim to apply bio- and cheminformatics tools using the DRUGS database, containing 3,837 unique small molecules annotated on 1,750 proteins. These are likely to serve as drug targets and antitargets (i.e., associated with side effects, SE). The academic community, the pharmaceutical sector and clinicians alike could benefit from an integrated, semantic-web compliant computer-aided drug repurposing (CADR) effort, one that would enable deep data mining of associations between approved drugs (D), targets (T), clinical outcomes (CO) and SE. We report preliminary results from text mining and multivariate statistics, based on 7,684 approved drug labels (ADL) from DailyMed. From the ADL corresponding to 988 unique drugs, the "adverse reactions" section was mapped onto 174 SE, then clustered via principal component analysis into a 5x5 self-organizing map that was integrated into a Cytoscape network of SE-D-T-CO. This type of data can be used to streamline drug repurposing and may result in novel insights that can lead to the identification of novel drug actions.
Affiliation(s)
- Tudor I. Oprea
- Division of Biocomputing, Department of Biochemistry and Molecular Biology, University of New Mexico School of Medicine, MSC11 6145, Albuquerque, NM 87131, USA
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet 8, Kgs. Lyngby, 2800, Denmark
- UNM Center for Molecular Discovery, University of New Mexico School of Medicine, MSC11 6145, Albuquerque, NM 87131, USA
- Sonny Kim Nielsen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet 8, Kgs. Lyngby, 2800, Denmark
- Oleg Ursu
- Division of Biocomputing, Department of Biochemistry and Molecular Biology, University of New Mexico School of Medicine, MSC11 6145, Albuquerque, NM 87131, USA
- UNM Center for Molecular Discovery, University of New Mexico School of Medicine, MSC11 6145, Albuquerque, NM 87131, USA
- Jeremy J. Yang
- Division of Biocomputing, Department of Biochemistry and Molecular Biology, University of New Mexico School of Medicine, MSC11 6145, Albuquerque, NM 87131, USA
- UNM Center for Molecular Discovery, University of New Mexico School of Medicine, MSC11 6145, Albuquerque, NM 87131, USA
- Olivier Taboureau
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet 8, Kgs. Lyngby, 2800, Denmark
- Stephen L. Mathias
- Division of Biocomputing, Department of Biochemistry and Molecular Biology, University of New Mexico School of Medicine, MSC11 6145, Albuquerque, NM 87131, USA
- UNM Center for Molecular Discovery, University of New Mexico School of Medicine, MSC11 6145, Albuquerque, NM 87131, USA
- Irene Kouskoumvekaki
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet 8, Kgs. Lyngby, 2800, Denmark
- Larry A. Sklar
- UNM Center for Molecular Discovery, University of New Mexico School of Medicine, MSC11 6145, Albuquerque, NM 87131, USA
- Cristian G. Bologa
- Division of Biocomputing, Department of Biochemistry and Molecular Biology, University of New Mexico School of Medicine, MSC11 6145, Albuquerque, NM 87131, USA
- UNM Center for Molecular Discovery, University of New Mexico School of Medicine, MSC11 6145, Albuquerque, NM 87131, USA
|
153
|
Nadkarni PM. Drug safety surveillance using de-identified EMR and claims data: issues and challenges. J Am Med Inform Assoc 2011; 17:671-4. [PMID: 20962129 DOI: 10.1136/jamia.2010.008607] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Indexed: 11/03/2022]
Abstract
The author discusses the challenges of pharmacovigilance using electronic medical record and claims data. Use of ICD-9 encoded data has low sensitivity for detection of adverse drug events (ADEs), because it requires that an ADE escalate to major-complaint level before it can be identified, and because clinical symptomatology is relatively under-represented in ICD-9. A more appropriate vocabulary for ADE identification, SNOMED CT, awaits wider deployment. The narrative-text record of progress notes can potentially be used for more sensitive ADE detection. More effective surveillance will require the ability to grade ADEs by severity. Finally, access to online drug information that includes both a reliable hierarchy of drug families as well as structured information on existing ADEs can improve the focus and predictive ability of surveillance efforts.
Affiliation(s)
- Prakash M Nadkarni
- Center for Medical Informatics, Yale University School of Medicine, New Haven, CT 06511, USA.
|
154
|
Nadkarni PM, Darer JD. Determining correspondences between high-frequency MedDRA concepts and SNOMED: a case study. BMC Med Inform Decis Mak 2010; 10:66. [PMID: 21029418 PMCID: PMC2988705 DOI: 10.1186/1472-6947-10-66] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 11/20/2009] [Accepted: 10/28/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is being advocated as the foundation for encoding clinical documentation. While the electronic medical record is likely to play a critical role in pharmacovigilance - the detection of adverse events due to medications - classification and reporting of Adverse Events is currently based on the Medical Dictionary for Regulatory Activities (MedDRA). Complete and high-quality MedDRA-to-SNOMED CT mappings can therefore facilitate pharmacovigilance. The existing mappings, as determined through the Unified Medical Language System (UMLS), are partial, and record only one-to-one correspondences even though SNOMED CT can be used compositionally. Efforts to map previously unmapped MedDRA concepts would be most productive if focused on concepts that occur frequently in actual adverse event data. We aimed to identify aspects of MedDRA that complicate mapping to SNOMED CT, determine patterns in unmapped high-frequency MedDRA concepts, and identify types of integration errors in the mapping of MedDRA to UMLS. METHODS Using one year's data from the US Food and Drug Administration's Adverse Event Reporting System, we identified MedDRA preferred terms that collectively accounted for 95% of both Adverse Events and Therapeutic Indications records. After eliminating those already mapping to SNOMED CT, we attempted to map the remaining 645 Adverse-Event and 141 Therapeutic-Indications preferred terms with software assistance. RESULTS All but 46 Adverse-Event and 7 Therapeutic-Indications preferred terms could be composed using SNOMED CT concepts: none of these required more than 3 SNOMED CT concepts to compose. We describe the common composition patterns in the paper. About 30% of both Adverse-Event and Therapeutic-Indications Preferred Terms corresponded to single SNOMED CT concepts: the correspondence was detectable by human inspection but had been missed during the integration process, which had created duplicated concepts in UMLS. CONCLUSIONS Identification of composite mapping patterns, and the types of errors that occur in the MedDRA content within UMLS, can focus larger-scale efforts on improving the quality of such mappings, which may assist in the creation of an adverse-events ontology.
Affiliation(s)
- Prakash M Nadkarni
- Geisinger Health Systems, Danville, PA, USA
- Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, USA
|
155
|
Buczak AL, Babin S, Moniz L. Data-driven approach for creating synthetic electronic medical records. BMC Med Inform Decis Mak 2010; 10:59. [PMID: 20946670 PMCID: PMC2972239 DOI: 10.1186/1472-6947-10-59] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 01/14/2010] [Accepted: 10/14/2010] [Indexed: 12/03/2022]
Abstract
BACKGROUND New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed. METHODS This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population. RESULTS We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4-11 yr age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs and the errors were subsequently rectified. CONCLUSIONS A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, prescription orders). The pilot synthetic outbreak records were for tularemia but our approach may be adapted to other infectious diseases. The pilot synthetic background records were in the 4-11 year old age group. The adaptations that must be made to the algorithms to produce synthetic background EMRs for other age groups are indicated.
Affiliation(s)
- Anna L Buczak
- Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Rd, Laurel, MD 20723-6099, USA
- Steven Babin
- Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Rd, Laurel, MD 20723-6099, USA
- Linda Moniz
- Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Rd, Laurel, MD 20723-6099, USA
|
156
|
Iezzoni LI. Multiple chronic conditions and disabilities: implications for health services research and data demands. Health Serv Res 2010; 45:1523-40. [PMID: 21054370 PMCID: PMC2965890 DOI: 10.1111/j.1475-6773.2010.01145.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 01/09/2023]
Abstract
Increasing numbers of Americans are living with multiple chronic conditions (MCCs) and disabilities. Addressing health care needs of persons with MCCs or disabilities presents challenges on many levels. For health services researchers, priorities include (1) considering MCCs and disabilities in comparative effectiveness research (CER) and assessing quality of care; and (2) identifying and evaluating the data needed to conduct CER, performance measure development, and other research to inform health policy and public health decisions concerning persons with MCCs or disabilities. Little information is available to guide CER or treatment choices for persons with MCCs or disabilities, however, because they are typically excluded from clinical trials that produce the scientific evidence base. Furthermore, most research funding flows through public and private agencies oriented around single organ systems or diseases. Likely changes in the data landscape, notably wider dissemination of electronic health records (EHRs) and moving toward updated coding nomenclatures, may increase the information available to monitor health care service delivery and quality for persons with MCCs and disabilities. Generating this information will require new methods to extract and code information about MCCs and functional status from EHRs, especially narrative texts, and incorporating coding nomenclatures that capture critical dimensions of functional status and disability.
Affiliation(s)
- Lisa I Iezzoni
- Mongan Institute for Health Policy, Massachusetts General Hospital, 50 Staniford Street, Room 901B, Boston, MA 02114, USA.
|
157
|
Wright A, Chen ES, Maloney FL. An automated technique for identifying associations between medications, laboratory results and problems. J Biomed Inform 2010; 43:891-901. [PMID: 20884377 DOI: 10.1016/j.jbi.2010.09.009] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.9] [Received: 10/27/2009] [Revised: 09/14/2010] [Accepted: 09/15/2010] [Indexed: 11/30/2022]
Abstract
BACKGROUND The patient problem list is an important component of clinical medicine. The problem list enables decision support and quality measurement, and evidence suggests that patients with accurate and complete problem lists may have better outcomes. However, the problem list is often incomplete. OBJECTIVE To determine whether association rule mining, a data mining technique, has utility for identifying associations between medications, laboratory results and problems. Such associations may be useful for identifying probable gaps in the problem list. DESIGN Association rule mining was performed on structured electronic health record data for a sample of 100,000 patients receiving care at the Brigham and Women's Hospital, Boston, MA. The dataset included 272,749 coded problems, 442,658 medications and 11,801,068 laboratory results. MEASUREMENTS Candidate medication-problem and laboratory-problem associations were generated using support, confidence, chi square, interest, and conviction statistics. High-scoring candidate pairs were compared to a gold standard: the Lexi-Comp drug reference database for medications and Mosby's Diagnostic and Laboratory Test Reference for laboratory results. RESULTS We were able to successfully identify a large number of clinically accurate associations. A high proportion of high-scoring associations were adjudged clinically accurate when evaluated against the gold standard (89.2% for medications with the best-performing statistic, chi square, and 55.6% for laboratory results using interest). CONCLUSION Association rule mining appears to be a useful tool for identifying clinically accurate associations between medications, laboratory results and problems and has several important advantages over alternative knowledge-based approaches.
Affiliation(s)
- Adam Wright
- Division of General Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02120, USA.
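The five statistics named in the Wright et al. abstract above (support, confidence, chi square, interest, and conviction) can all be computed from four co-occurrence counts. The following is a minimal sketch using the standard textbook definitions; it is not the authors' implementation, "interest" is assumed here to mean the ratio usually called lift, and the function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    support: float
    confidence: float
    interest: float
    conviction: float
    chi_square: float


def rule_stats(n: int, n_a: int, n_b: int, n_ab: int) -> RuleStats:
    """Score a candidate rule A -> B (e.g. medication -> problem).

    n    : total number of patients
    n_a  : patients with the antecedent A
    n_b  : patients with the consequent B
    n_ab : patients with both A and B
    """
    valid = (
        0 < n_a < n
        and 0 < n_b < n
        and 0 <= n_ab <= min(n_a, n_b)
        and n_a + n_b - n_ab <= n
    )
    if not valid:
        raise ValueError("counts must describe a non-degenerate 2x2 table")

    support = n_ab / n                      # P(A and B)
    confidence = n_ab / n_a                 # P(B | A)
    interest = confidence / (n_b / n)       # P(B | A) / P(B), assumed = lift
    # Conviction is unbounded when the rule never fails.
    if n_ab == n_a:
        conviction = float("inf")
    else:
        conviction = (1 - n_b / n) / (1 - confidence)

    # Pearson chi-square over the 2x2 contingency table of A versus B.
    observed = (n_ab, n_a - n_ab, n_b - n_ab, n - n_a - n_b + n_ab)
    expected = (
        n_a * n_b / n,
        n_a * (n - n_b) / n,
        (n - n_a) * n_b / n,
        (n - n_a) * (n - n_b) / n,
    )
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return RuleStats(support, confidence, interest, conviction, chi_square)


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    print(rule_stats(n=1000, n_a=100, n_b=200, n_ab=80))
```

Ranking candidate pairs by one of these fields and comparing the top of the list against a reference source mirrors the evaluation the abstract describes, although the thresholds used in the study are not given here.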
|
158
|
Chiang JH, Lin JW, Yang CW. Automated evaluation of electronic discharge notes to assess quality of care for cardiovascular diseases using Medical Language Extraction and Encoding System (MedLEE). J Am Med Inform Assoc 2010; 17:245-52. [PMID: 20442141 DOI: 10.1136/jamia.2009.000182] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/03/2022]
Abstract
The objective of this study was to develop and validate an automated acquisition system to assess quality of care (QC) measures for cardiovascular diseases. The system, which combines search and retrieval algorithms, was designed to extract QC measures from electronic discharge notes and to estimate attainment rates against current standards of care. It was developed on patients with ST-segment elevation myocardial infarction and tested on patients with unstable angina/non-ST-segment elevation myocardial infarction, as the two diseases share almost the same QC measures. In the test set, agreement with medical experts (kappa value) ranged from 0.65 (early reperfusion rate) to 0.97 (beta-blockers and lipid-lowering agents before discharge) across QC measures. The system was then applied to evaluate QC in patients who underwent coronary artery bypass grafting surgery. The results validate a new tool to reliably extract QC measures for cardiovascular diseases.
Affiliation(s)
- Jung-Hsien Chiang
- Institute of Medical Informatics and Department of Computer Science, National Cheng Kung University, Tainan, Taiwan.
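The Chiang et al. abstract above reports system-versus-expert agreement as a kappa value. As an illustration of how such a figure is obtained, here is a minimal sketch of Cohen's kappa for two raters; the abstract does not say which kappa variant was used, so Cohen's is an assumption, and the function name and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters over the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement if each rater labelled independently at
    # their own marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels: 1 = QC measure attained, 0 = not attained.
    system = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
    expert = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
    print(cohens_kappa(system, expert))
```

A kappa near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than chance, which is why it is preferred over raw percent agreement when one label dominates.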
|
159
|
Orme M, Sjöqvist F, Birkett D, Brøsen K, Cascorbi I, Gustafsson LL, Maxwell S, Rago L, Rawlins M, Reidenberg M, Sjöqvist F, Smith T, Thuerman P, Walubo A. Clinical Pharmacology in Research, Teaching and Health Care. Basic Clin Pharmacol Toxicol 2010; 107:531-59. [DOI: 10.1111/j.1742-7843.2010.00602.x] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Indexed: 11/30/2022]
|
160
|
Wang X, Chase H, Markatou M, Hripcsak G, Friedman C. Selecting information in electronic health records for knowledge acquisition. J Biomed Inform 2010; 43:595-601. [PMID: 20362071 DOI: 10.1016/j.jbi.2010.03.011] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 08/11/2009] [Revised: 03/13/2010] [Accepted: 03/28/2010] [Indexed: 11/18/2022]
Abstract
Knowledge acquisition of relations between biomedical entities is critical for many automated biomedical applications, including pharmacovigilance and decision support. Automated acquisition of statistical associations from biomedical and clinical documents has shown some promise. However, acquisition of clinically meaningful relations (i.e. specific associations) remains challenging because textual information is noisy and co-occurrence does not typically determine specific relations. In this work, we focus on acquisition of two types of relations from clinical reports: disease-manifestation related symptom (MRS) and drug-adverse drug event (ADE), and explore the use of filtering by sections of the reports to improve performance. Evaluation indicated that applying the filters improved recall (disease-MRS: from 0.85 to 0.90; drug-ADE: from 0.43 to 0.75) and precision (disease-MRS: from 0.82 to 0.92; drug-ADE: from 0.16 to 0.31). This preliminary study demonstrates that selecting information in narrative electronic reports based on the sections improves the detection of disease-MRS and drug-ADE types of relations. Further investigation of complementary methods, such as more sophisticated statistical methods, more complex temporal models and use of information from other knowledge sources, is needed.
Affiliation(s)
- Xiaoyan Wang
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States.
|
161
|
Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 2010; 26:1205-10. [PMID: 20335276 PMCID: PMC2859132 DOI: 10.1093/bioinformatics/btq126] [Citation(s) in RCA: 803] [Impact Index Per Article: 57.4] [Indexed: 12/13/2022]
Abstract
MOTIVATION Emergence of genetic data coupled to longitudinal electronic medical records (EMRs) offers the possibility of phenome-wide association scans (PheWAS) for disease-gene associations. We propose a novel method to scan phenomic data for genetic associations using International Classification of Diseases (ICD9) billing codes, which are available in most EMR systems. We have developed a code translation table to automatically define 776 different disease populations and their controls using prevalent ICD9 codes derived from EMR data. As a proof of concept of this algorithm, we genotyped the first 6005 European-Americans accrued into BioVU, Vanderbilt's DNA biobank, at five single nucleotide polymorphisms (SNPs) with previously reported disease associations: atrial fibrillation, Crohn's disease, carotid artery stenosis, coronary artery disease, multiple sclerosis, systemic lupus erythematosus and rheumatoid arthritis. The PheWAS software generated case and control populations across all ICD9 code groups for each of these five SNPs, and disease-SNP associations were analyzed. The primary outcome of this study was replication of seven previously known SNP-disease associations for these SNPs. RESULTS Four of seven known SNP-disease associations using the PheWAS algorithm were replicated with P-values between 2.8 × 10^-6 and 0.011. The PheWAS algorithm also identified 19 previously unknown statistical associations between these SNPs and diseases at P < 0.01. This study indicates that PheWAS analysis is a feasible method to investigate SNP-disease associations. Further evaluation is needed to determine the validity of these associations and the appropriate statistical thresholds for clinical significance. AVAILABILITY The PheWAS software and code translation table are freely available at http://knowledgemap.mc.vanderbilt.edu/research.
Affiliation(s)
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.
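The scan described in the Denny et al. abstract above (define cases and controls per ICD9 code group, then test each group against a SNP) can be sketched as a loop over 2x2 tables. This is an illustrative simplification, not the PheWAS software: the `Patient` record, the `carrier` flag, the code-group mapping, and the choice of a Pearson chi-square test with one degree of freedom are all assumptions, and controls are taken here to be every patient without a code in the group.

```python
import math
from typing import Dict, List, NamedTuple, Set, Tuple


class Patient(NamedTuple):
    codes: Set[str]   # ICD9 billing codes found in the record
    carrier: bool     # hypothetical flag: carries the allele of interest


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (1 degree of freedom) and its p-value."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # An empty row or column means the test is undefined; report no signal.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For one degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def phewas_scan(
    patients: List[Patient], code_groups: Dict[str, Set[str]]
) -> Dict[str, Tuple[float, float]]:
    """Test one SNP against every phenotype defined by a code group."""
    results = {}
    for phenotype, group_codes in code_groups.items():
        a = b = c = d = 0
        for patient in patients:
            is_case = bool(patient.codes & group_codes)
            if is_case and patient.carrier:
                a += 1
            elif is_case:
                b += 1
            elif patient.carrier:
                c += 1
            else:
                d += 1
        results[phenotype] = chi_square_2x2(a, b, c, d)
    return results


if __name__ == "__main__":
    # Toy cohort; counts are invented for illustration.
    cohort = (
        [Patient({"427.31"}, True)] * 15
        + [Patient({"427.31"}, False)] * 5
        + [Patient({"401.9"}, True)] * 5
        + [Patient({"401.9"}, False)] * 15
    )
    groups = {"atrial fibrillation": {"427.31"}}
    for name, (chi2, p) in phewas_scan(cohort, groups).items():
        print(name, round(chi2, 2), p)
```

Because hundreds of code groups are tested per SNP, any threshold applied to these p-values needs a multiple-testing correction, which is the open question about statistical thresholds that the abstract raises.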
|
162
|
Abstract
There is a current and pressing need for a test bed of electronic medical records (EMRs) to ensure consistent development, validation and verification of public health-related algorithms that operate on EMRs. However, access to full EMRs is limited and not generally available to the academic algorithm developers who support the public health community. This paper describes a set of algorithms that produce synthetic EMRs using real EMRs as a model. The algorithms were used to generate a pilot set of over 3000 synthetic EMRs that are currently available on CDC’s Public Health grid. The properties of the synthetic EMRs were validated, both in the entire aggregate data set and for individual (synthetic) patients. We describe how the algorithms can be extended to produce records beyond the initial pilot data set.
|
163
|
Wang X, Hripcsak G, Friedman C. Characterizing environmental and phenotypic associations using information theory and electronic health records. BMC Bioinformatics 2009; 10 Suppl 9:S13. [PMID: 19761567 PMCID: PMC2745684 DOI: 10.1186/1471-2105-10-s9-s13] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/24/2022]
Abstract
BACKGROUND The availability of up-to-date, executable, evidence-based medical knowledge is essential for many clinical applications, such as pharmacovigilance, but executable knowledge is costly to obtain and update. Automated acquisition of environmental and phenotypic associations in biomedical and clinical documents using text mining has shown some success. The usefulness of the association knowledge is limited, however, because the specific relationships between clinical entities remain unknown. In particular, some associations are indirect relations due to interdependencies among the data. RESULTS In this work, we develop methods using mutual information (MI) and its property, the data processing inequality (DPI), to help characterize associations that were generated based on use of natural language processing to encode clinical information in narrative patient records followed by statistical methods. Evaluation based on a random sample consisting of two drugs and two diseases indicates an overall precision of 81%. CONCLUSION This preliminary study demonstrates that the proposed method is effective for helping to characterize phenotypic and environmental associations obtained from clinical reports.
Affiliation(s)
- Xiaoyan Wang
- Dept of Biomedical Informatics, Columbia University, New York 10032, USA.
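The Wang et al. abstract above uses mutual information (MI) and the data processing inequality (DPI) to separate direct from indirect associations. The DPI states that if X influences Z only through Y, then I(X;Z) cannot exceed either I(X;Y) or I(Y;Z). One common way to apply it, sketched below, is to flag the weakest edge in every fully connected triple as likely indirect. This is an illustration under that assumption, not the authors' method; the function names and the toy records are hypothetical.

```python
import math
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set


def mutual_information(records: Sequence[Set[str]], x: str, y: str) -> float:
    """MI in bits between the presence/absence of two concepts."""
    n = len(records)
    joint = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for record in records:
        joint[(int(x in record), int(y in record))] += 1

    mi = 0.0
    for (i, j), count in joint.items():
        if count == 0:
            continue  # a zero cell contributes nothing to the sum
        p_xy = count / n
        p_x = (joint[(i, 0)] + joint[(i, 1)]) / n
        p_y = (joint[(0, j)] + joint[(1, j)]) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def indirect_edges(
    records: Sequence[Set[str]], concepts: List[str], tol: float = 1e-12
) -> Set[FrozenSet[str]]:
    """Flag, in each concept triple, the edge whose MI is strictly lowest."""
    mi = {
        frozenset(pair): mutual_information(records, *pair)
        for pair in combinations(concepts, 2)
    }
    flagged = set()
    for triple in combinations(concepts, 3):
        edges = sorted(
            (frozenset(pair) for pair in combinations(triple, 2)),
            key=lambda edge: mi[edge],
        )
        weakest, runner_up = edges[0], edges[1]
        if mi[weakest] + tol < mi[runner_up]:
            flagged.add(weakest)
    return flagged


if __name__ == "__main__":
    # Toy records: the drug and the symptom each occur only with the disease.
    records = (
        [{"drug", "disease", "symptom"}] * 4
        + [{"drug", "disease"}] * 2
        + [{"disease", "symptom"}] * 2
        + [set()] * 8
    )
    print(indirect_edges(records, ["drug", "disease", "symptom"]))
```

In the toy data the drug-symptom edge is flagged because both concepts are tied to the disease more strongly than to each other, which is the kind of indirect relation the abstract attributes to interdependencies among the data.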
|
164
|
Friedman C. Discovering Novel Adverse Drug Events Using Natural Language Processing and Mining of the Electronic Health Record. Artif Intell Med 2009. [DOI: 10.1007/978-3-642-02976-9_1] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 12/14/2022]
|