1
Sun VH, Heemelaar JC, Hadzic I, Raghu VK, Wu CY, Zubiri L, Ghamari A, LeBoeuf NR, Abu-Shawer O, Kehl KL, Grover S, Singh P, Suero-Abreu GA, Wu J, Falade AS, Grealish K, Thomas MF, Hathaway N, Medoff BD, Gilman HK, Villani AC, Ho JS, Mooradian MJ, Sise ME, Zlotoff DA, Blum SM, Dougan M, Sullivan RJ, Neilan TG, Reynolds KL. Enhancing Precision in Detecting Severe Immune-Related Adverse Events: Comparative Analysis of Large Language Models and International Classification of Disease Codes in Patient Records. J Clin Oncol 2024:JCO2400326. [PMID: 39226489 DOI: 10.1200/jco.24.00326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 05/20/2024] [Accepted: 06/24/2024] [Indexed: 09/05/2024] Open
Abstract
PURPOSE Current approaches to accurately identify immune-related adverse events (irAEs) in large retrospective studies are limited. Large language models (LLMs) offer a potential solution to this challenge, given their high performance in natural language comprehension tasks. Therefore, we investigated the use of an LLM to identify irAEs among hospitalized patients, comparing its performance with manual adjudication and International Classification of Disease (ICD) codes. METHODS Hospital admissions of patients receiving immune checkpoint inhibitor (ICI) therapy at a single institution from February 5, 2011, to September 5, 2023, were individually reviewed and adjudicated for the presence of irAEs. ICD codes and an LLM with retrieval-augmented generation were applied to detect frequent irAEs (ICI-induced colitis, hepatitis, and pneumonitis) and the most fatal irAE (ICI-myocarditis) from electronic health records. The performance between ICD codes and LLM was compared via sensitivity and specificity with an α = .05, relative to the gold standard of manual adjudication. External validation was performed using a data set of hospital admissions from June 1, 2018, to May 31, 2019, from a second institution. RESULTS Of the 7,555 admissions for patients on ICI therapy in the initial cohort, 2.0% were adjudicated to be due to ICI-colitis, 1.1% ICI-hepatitis, 0.7% ICI-pneumonitis, and 0.8% ICI-myocarditis. The LLM demonstrated higher sensitivity than ICD codes (94.7% v 68.7%), achieving significance for ICI-hepatitis (P < .001), myocarditis (P < .001), and pneumonitis (P = .003) while yielding similar specificities (93.7% v 92.4%). The LLM spent an average of 9.53 seconds/chart in comparison with an estimated 15 minutes for adjudication. In the validation cohort (N = 1,270), the mean LLM sensitivity and specificity were 98.1% and 95.7%, respectively. 
CONCLUSION LLMs are a useful tool for the detection of irAEs, outperforming ICD codes in sensitivity and adjudication in efficiency.
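The two comparisons in this abstract (accuracy against manual adjudication, and time per chart) can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: the confusion-matrix counts are hypothetical, and the workload figures simply multiply the per-chart times quoted above by the 7,555 admissions, assuming every admission takes the average time.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) relative to a reference standard."""
    sensitivity = tp / (tp + fn)   # share of true cases that were detected
    specificity = tn / (tn + fp)   # share of non-cases correctly left unflagged
    return sensitivity, specificity

# Hypothetical counts for illustration only; not taken from the study.
sens, spec = sensitivity_specificity(tp=18, fn=2, tn=930, fp=50)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")

# Workload estimate from figures quoted in the abstract
# (assumes the per-chart averages apply to every admission).
admissions = 7555
llm_hours = admissions * 9.53 / 3600      # 9.53 seconds per chart
manual_hours = admissions * 15 / 60       # estimated 15 minutes per chart
print(f"LLM: {llm_hours:.1f} h, manual adjudication: {manual_hours:.1f} h")
```

Under these assumptions the full cohort corresponds to roughly 20 hours of LLM processing versus close to 1,900 hours of manual review.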
Affiliation(s)
- Virginia H Sun
  - Harvard Medical School, Boston, MA
  - Cardiovascular Imaging Research Center, Massachusetts General Hospital, Boston, MA
- Julius C Heemelaar
  - Harvard Medical School, Boston, MA
  - Cardiovascular Imaging Research Center, Massachusetts General Hospital, Boston, MA
  - Leiden University Medical Center, Leiden, the Netherlands
- Ibrahim Hadzic
  - Harvard Medical School, Boston, MA
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Boston, MA
  - Brigham and Women's Hospital, Boston, MA
  - Maastricht University, Maastricht, the Netherlands
- Vineet K Raghu
  - Harvard Medical School, Boston, MA
  - Cardiovascular Imaging Research Center, Massachusetts General Hospital, Boston, MA
- Chia-Yun Wu
  - Division of Hematology and Oncology, Department of Medicine, Massachusetts General Hospital, Boston, MA
  - Far Eastern Memorial Hospital, New Taipei City, Taiwan
- Leyre Zubiri
  - Harvard Medical School, Boston, MA
  - Division of Hematology and Oncology, Department of Medicine, Massachusetts General Hospital, Boston, MA
- Azin Ghamari
  - Harvard Medical School, Boston, MA
  - Cardiovascular Imaging Research Center, Massachusetts General Hospital, Boston, MA
- Nicole R LeBoeuf
  - Harvard Medical School, Boston, MA
  - Department of Dermatology, Brigham and Women's Hospital, Boston, MA
  - Center for Cutaneous Oncology, Dana-Farber Cancer Institute, Boston, MA
- Osama Abu-Shawer
  - Department of Internal Medicine, Cleveland Clinic, Cleveland, OH
- Kenneth L Kehl
  - Harvard Medical School, Boston, MA
  - Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Shilpa Grover
  - Harvard Medical School, Boston, MA
  - Division of Gastroenterology, Hepatology, and Endoscopy, Brigham and Women's Hospital, Boston, MA
- Prabhsimranjot Singh
  - Harvard Medical School, Boston, MA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Giselle A Suero-Abreu
  - Harvard Medical School, Boston, MA
  - Cardiovascular Imaging Research Center, Massachusetts General Hospital, Boston, MA
  - Division of Cardiology, Massachusetts General Hospital, Boston, MA
- Jessica Wu
  - Harvard Medical School, Boston, MA
  - Cardiovascular Imaging Research Center, Massachusetts General Hospital, Boston, MA
- Ayo S Falade
  - Internal Medicine Department, Massachusetts General Brigham Salem Hospital, Salem, MA
- Kelley Grealish
  - Division of Hematology and Oncology, Department of Medicine, Massachusetts General Hospital, Boston, MA
- Molly F Thomas
  - Division of Gastroenterology, Oregon Health and Science University, Portland, OR
  - Department of Medicine, Oregon Health and Science University, Portland, OR
  - Department of Cell, Developmental, and Cancer Biology, Oregon Health and Science University, Portland, OR
- Nora Hathaway
  - Division of Hematology and Oncology, Department of Medicine, Massachusetts General Hospital, Boston, MA
- Benjamin D Medoff
  - Harvard Medical School, Boston, MA
  - Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA
- Hannah K Gilman
  - Harvard Medical School, Boston, MA
  - Cardiovascular Imaging Research Center, Massachusetts General Hospital, Boston, MA
- Alexandra-Chloe Villani
  - Harvard Medical School, Boston, MA
  - Center for Immunology and Inflammatory Diseases (CIID), Massachusetts General Hospital Krantz Family Center for Cancer Research, Boston, MA
  - Broad Institute of MIT and Harvard, Cambridge, MA
- Jor Sam Ho
  - Harvard Medical School, Boston, MA
  - Cardiovascular Imaging Research Center, Massachusetts General Hospital, Boston, MA
- Meghan J Mooradian
  - Harvard Medical School, Boston, MA
  - Division of Hematology and Oncology, Department of Medicine, Massachusetts General Hospital, Boston, MA
- Meghan E Sise
  - Harvard Medical School, Boston, MA
  - Division of Nephrology, Massachusetts General Hospital, Boston, MA
- Daniel A Zlotoff
  - Harvard Medical School, Boston, MA
  - Division of Cardiology, Massachusetts General Hospital, Boston, MA
- Steven M Blum
  - Harvard Medical School, Boston, MA
  - Division of Hematology and Oncology, Department of Medicine, Massachusetts General Hospital, Boston, MA
  - Center for Immunology and Inflammatory Diseases (CIID), Massachusetts General Hospital Krantz Family Center for Cancer Research, Boston, MA
  - Broad Institute of MIT and Harvard, Cambridge, MA
- Michael Dougan
  - Harvard Medical School, Boston, MA
  - Division of Gastroenterology, Massachusetts General Hospital, Boston, MA
- Ryan J Sullivan
  - Harvard Medical School, Boston, MA
  - Division of Hematology and Oncology, Department of Medicine, Massachusetts General Hospital, Boston, MA
- Tomas G Neilan
  - Harvard Medical School, Boston, MA
  - Cardiovascular Imaging Research Center, Massachusetts General Hospital, Boston, MA
  - Division of Cardiology, Massachusetts General Hospital, Boston, MA
- Kerry L Reynolds
  - Harvard Medical School, Boston, MA
  - Division of Hematology and Oncology, Department of Medicine, Massachusetts General Hospital, Boston, MA
2
Ferreira-da-Silva R, Reis-Pardal J, Pinto M, Monteiro-Soares M, Sousa-Pinto B, Morato M, Polónia JJ, Ribeiro-Vaz I. A Comparison of Active Pharmacovigilance Strategies Used to Monitor Adverse Events to Antiviral Agents: A Systematic Review. Drug Saf 2024:10.1007/s40264-024-01470-0. [PMID: 39160354 DOI: 10.1007/s40264-024-01470-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/17/2024] [Indexed: 08/21/2024]
Abstract
INTRODUCTION The safety of antiviral agents in real-world clinical settings is crucial, as pre-marketing studies often do not capture all adverse events (AE). Active pharmacovigilance strategies are essential for detecting and characterising these AE comprehensively. OBJECTIVE The aim of this study was to identify and characterise active pharmacovigilance strategies used in real-world clinical settings for patients under systemic antiviral agents, focusing on the frequency of AE and the clinical data sources used. METHODS We conducted a systematic review by searching three electronic bibliographic databases targeting observational prospective active pharmacovigilance studies, phase IV clinical trials for post-marketing safety surveillance, and interventional studies assessing active pharmacovigilance strategies, focusing on individuals exposed to systemic antiviral agents. RESULTS We included 36 primary studies, predominantly using Drug Event Monitoring (DEM), with a minority employing sentinel sites and registries. Human immunodeficiency virus (HIV) was the most common condition, with the majority using DEM. Within the DEM, there was a wide range of incidences of patients experiencing at least one AE, and most of these studies used one or two data sources. Sentinel site studies were less common, with two on hepatitis C virus (HCV) and one on HIV, each relying on one or two data sources. The single study using a registry focusing on HIV therapy reported using just one data source. Patient interviews were the most common data source, followed by medical records and laboratory tests. The quality of the studies was considered 'good' in 18/36, 'fair' in 1/36, and 'poor' in 17/36 studies. CONCLUSION DEM was the predominant pharmacovigilance strategy, employing multiple data sources, and appears to increase the likelihood of detecting higher AE incidence. 
Establishing such a framework would facilitate a more detailed and consistent approach across different studies and settings.
Affiliation(s)
- Renato Ferreira-da-Silva
  - Porto Pharmacovigilance Centre, Faculty of Medicine of the University of Porto, Porto, Portugal
  - Center for Health Technology and Services Research, Associate Laboratory RISE-Health Research Network (CINTESIS@RISE), Porto, Portugal
  - Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine of the University of Porto (FMUP), Porto, Portugal
- Joana Reis-Pardal
  - Center for Health Technology and Services Research, Associate Laboratory RISE-Health Research Network (CINTESIS@RISE), Porto, Portugal
  - Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine of the University of Porto (FMUP), Porto, Portugal
- Manuela Pinto
  - São João University Hospital Centre, Porto, Portugal
- Matilde Monteiro-Soares
  - Center for Health Technology and Services Research, Associate Laboratory RISE-Health Research Network (CINTESIS@RISE), Porto, Portugal
  - Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine of the University of Porto (FMUP), Porto, Portugal
  - Portuguese Red Cross Health School-Lisbon, Lisbon, Portugal
  - Cross I&D, Lisbon, Portugal
- Bernardo Sousa-Pinto
  - Center for Health Technology and Services Research, Associate Laboratory RISE-Health Research Network (CINTESIS@RISE), Porto, Portugal
  - Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine of the University of Porto (FMUP), Porto, Portugal
- Manuela Morato
  - Laboratory of Pharmacology, Department of Drug Sciences, Faculty of Pharmacy of the University of Porto, Porto, Portugal
  - LAQV@REQUIMTE, Faculty of Pharmacy of the University of Porto, Porto, Portugal
- Jorge Junqueira Polónia
  - Porto Pharmacovigilance Centre, Faculty of Medicine of the University of Porto, Porto, Portugal
  - Center for Health Technology and Services Research, Associate Laboratory RISE-Health Research Network (CINTESIS@RISE), Porto, Portugal
  - Department of Medicine, Faculty of Medicine of the University of Porto, Porto, Portugal
- Inês Ribeiro-Vaz
  - Porto Pharmacovigilance Centre, Faculty of Medicine of the University of Porto, Porto, Portugal
  - Center for Health Technology and Services Research, Associate Laboratory RISE-Health Research Network (CINTESIS@RISE), Porto, Portugal
  - Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine of the University of Porto (FMUP), Porto, Portugal
3
Ascenção R, Nogueira P, Sampaio F, Henriques A, Costa A. Adverse drug reactions in hospitals: population estimates for Portugal and the ICD-9-CM to ICD-10-CM crosswalk. BMC Health Serv Res 2023; 23:1222. [PMID: 37940971 PMCID: PMC10634004 DOI: 10.1186/s12913-023-10225-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 10/27/2023] [Indexed: 11/10/2023] Open
Abstract
BACKGROUND Adverse drug reactions (ADR), both preventable and non-preventable, are frequent and pose a significant burden. This study aimed to produce up-to-date estimates for ADR rates in hospitals in Portugal from 2010 to 2018. In addition, it explores possible pitfalls when crosswalking between ICD-9-CM and ICD-10-CM code sets for ADR identification. METHODS The Portuguese Hospital Morbidity Database was used to identify hospital episodes (outpatient or inpatient) with at least one ICD code of ADR. Since the study period spanned from 2010 to 2018, both ICD-9-CM and ICD-10-CM codes based on previously published studies were used to define episodes. This was an exploratory study, and descriptive statistics were used to provide ADR rates and summarise episode features for the full period (2010-2018) as well as for the ICD-9-CM (2010-2016) and ICD-10-CM (2017-2018) eras. RESULTS Between 2010 and 2018, ADR occurred in 162,985 hospital episodes, corresponding to 1.00% of the total number of episodes during the same period. Higher rates were seen in the oldest age groups. In the same period, the mean annual rate of episodes related to ADR was 174.2/100,000 population. The episode rate (per 100,000 population) was generally higher in males, except in young adults (aged '15-20', '25-30' and '30-35' years), although the overall frequency of ADR in hospital episodes was higher in females. CONCLUSIONS Despite the ICD-10-CM transition, administrative health data in Portugal remain a feasible source for producing up-to-date estimates on ADR in hospitals. There is a need for future research to identify target recipients for preventive interventions and improve medication safety practices in Portugal.
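The rates reported above can be cross-checked with simple arithmetic. The sketch below is a back-of-envelope consistency check, not the authors' method: it assumes a constant population over the nine study years, whereas the published mean annual rate was presumably computed from year-specific figures.

```python
# Figures quoted in the abstract.
episodes_with_adr = 162_985
share_of_all_episodes = 0.0100        # 1.00% of all hospital episodes
years = 9                             # 2010 to 2018 inclusive
mean_annual_rate_per_100k = 174.2

# Implied denominators (assumption: constant population across the period).
implied_total_episodes = episodes_with_adr / share_of_all_episodes
mean_annual_adr_episodes = episodes_with_adr / years
implied_population = mean_annual_adr_episodes / mean_annual_rate_per_100k * 100_000

print(f"implied total episodes: {implied_total_episodes:,.0f}")
print(f"mean ADR episodes per year: {mean_annual_adr_episodes:,.0f}")
print(f"implied population: {implied_population / 1e6:.1f} million")
```

The implied population comes out at roughly 10.4 million, which is in line with the size of Portugal's resident population and suggests the two reported figures are mutually consistent.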
Affiliation(s)
- Raquel Ascenção
  - Laboratório de Farmacologia Clínica e Terapêutica, Faculdade de Medicina, Universidade de Lisboa, Avenida Professor Egas Moniz, 1649-028, Lisboa, Portugal
- Paulo Nogueira
  - Escola Nacional de Saúde Pública - Universidade Nova de Lisboa, Lisboa, Portugal
- Filipa Sampaio
  - Department of Public Health and Caring Sciences, Uppsala University, Uppsala, Sweden
- Adriana Henriques
  - Nursing Research, Innovation and Development Centre of Lisbon (CIDNUR), Nursing School of Lisbon, Lisboa, Portugal
- Andreia Costa
  - Instituto de Saúde Ambiental (ISAMB), Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
4
Trinkley KE, Wright G, Allen LA, Bennett TD, Glasgow RE, Hale G, Heckman S, Huebschmann AG, Kahn MG, Kao DP, Lin CT, Malone DC, Matlock DD, Wells L, Wysocki V, Zhang S, Suresh K. Sustained Effect of Clinical Decision Support for Heart Failure: A Natural Experiment Using Implementation Science. Appl Clin Inform 2023; 14:822-832. [PMID: 37852249 PMCID: PMC10584394 DOI: 10.1055/s-0043-1775566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 08/02/2023] [Indexed: 10/20/2023] Open
Abstract
OBJECTIVES In a randomized controlled trial, we found that applying implementation science (IS) methods and best practices in clinical decision support (CDS) design to create a locally customized, "enhanced" CDS significantly improved evidence-based prescribing of β blockers (BB) for heart failure compared with an unmodified commercially available CDS. At trial conclusion, the enhanced CDS was expanded to all sites. The purpose of this study was to evaluate the real-world sustained effect of the enhanced CDS compared with the commercial CDS. METHODS In this natural experiment of 28 primary care clinics, we compared clinics exposed to the commercial CDS (preperiod) to clinics exposed to the enhanced CDS (both periods). The primary effectiveness outcome was the proportion of alerts resulting in a BB prescription. Secondary outcomes included patient reach and clinician adoption (dismissals). RESULTS There were 367 alerts for 183 unique patients and 171 unique clinicians (pre: March 2019-August 2019; post: October 2019-March 2020). The enhanced CDS increased prescribing by 26.1% compared with the commercial (95% confidence interval [CI]: 17.0-35.1%), which is consistent with the 24% increase in the previous study. The odds of adopting the enhanced CDS was 81% compared with 29% with the commercial (odds ratio: 4.17, 95% CI: 1.96-8.85). The enhanced CDS adoption and effectiveness rates were 62 and 14% in the preperiod and 92 and 10% in the postperiod. CONCLUSION Applying IS methods with CDS best practices was associated with improved and sustained clinician adoption and effectiveness compared with a commercially available CDS tool.
Affiliation(s)
- Katy E. Trinkley
  - Department of Family Medicine, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, Colorado, United States
  - UCHealth, Aurora, Colorado, United States
  - Department of Clinical Pharmacy, University of Colorado Anschutz Medical Campus Skaggs School of Pharmacy and Pharmaceutical Sciences, Aurora, Colorado, United States
- Garth Wright
  - Department of Clinical Pharmacy, University of Colorado Anschutz Medical Campus Skaggs School of Pharmacy and Pharmaceutical Sciences, Aurora, Colorado, United States
- Larry A. Allen
  - Adult and Child Center for Outcomes Research and Delivery Science, Aurora, Colorado, United States
  - Division of Cardiology, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, Colorado, United States
- Tellen D. Bennett
  - Adult and Child Center for Outcomes Research and Delivery Science, Aurora, Colorado, United States
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, Colorado, United States
- Russell E. Glasgow
  - Department of Family Medicine, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, Colorado, United States
  - Adult and Child Center for Outcomes Research and Delivery Science, Aurora, Colorado, United States
  - Veterans Affairs Eastern Colorado Geriatric Research Education and Clinical Center, Aurora, Colorado, United States
- Gary Hale
  - UCHealth, Aurora, Colorado, United States
- Amy G. Huebschmann
  - Adult and Child Center for Outcomes Research and Delivery Science, Aurora, Colorado, United States
  - Division of Internal Medicine, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, Colorado, United States
  - University of Colorado Anschutz Medical Campus Ludeman Family Center for Women's Health Research, Aurora, Colorado, United States
- Michael G. Kahn
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, Colorado, United States
- David P. Kao
  - UCHealth, Aurora, Colorado, United States
  - Department of Clinical Pharmacy, University of Colorado Anschutz Medical Campus Skaggs School of Pharmacy and Pharmaceutical Sciences, Aurora, Colorado, United States
- Chen-Tan Lin
  - UCHealth, Aurora, Colorado, United States
  - Division of Internal Medicine, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, Colorado, United States
- Daniel C. Malone
  - Department of Pharmacotherapy, University of Utah Skaggs College of Pharmacy, Salt Lake City, Utah, United States
- Daniel D. Matlock
  - Adult and Child Center for Outcomes Research and Delivery Science, Aurora, Colorado, United States
  - Veterans Affairs Eastern Colorado Geriatric Research Education and Clinical Center, Aurora, Colorado, United States
  - Division of Internal Medicine, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, Colorado, United States
  - Division of Geriatrics, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, Colorado, United States
- Lauren Wells
  - Department of Clinical Pharmacy, University of Colorado Anschutz Medical Campus Skaggs School of Pharmacy and Pharmaceutical Sciences, Aurora, Colorado, United States
- Vincent Wysocki
  - Department of Clinical Pharmacy, University of Colorado Anschutz Medical Campus Skaggs School of Pharmacy and Pharmaceutical Sciences, Aurora, Colorado, United States
- Shelley Zhang
  - Department of Family Medicine, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, Colorado, United States
- Krithika Suresh
  - Adult and Child Center for Outcomes Research and Delivery Science, Aurora, Colorado, United States
  - Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, Colorado, United States
5
Explainable detection of adverse drug reaction with imbalanced data distribution. PLoS Comput Biol 2022; 18:e1010144. [PMID: 35704662 PMCID: PMC9239481 DOI: 10.1371/journal.pcbi.1010144] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/11/2021] [Revised: 06/28/2022] [Accepted: 04/26/2022] [Indexed: 11/18/2022] Open
Abstract
Analysis of health-related texts can be used to detect adverse drug reactions (ADR). The greatest challenge for ADR detection lies in imbalanced data distributions where words related to ADR symptoms are often minority classes. As a result, trained models tend to converge to a point that strongly biases towards the majority class and then ignores the minority class. Since the most used cross-entropy criteria is an approximation to accuracy, the model focuses more readily on the majority class to achieve high accuracy. To address this issue, existing methods apply either oversampling or down-sampling strategies to balance the data distribution and exploit the most difficult samples of the minority class. However, increasing or reducing the number of individual tokens alone in sequence labeling tasks will result in the loss of the syntactic relations of the sentence. This paper proposes a weighted variant of conditional random field (CRF) for data-imbalanced sequence labeling tasks. Such a weighting strategy can alleviate data distribution imbalances between majority and minority classes. Instead of using softmax in the output layer, the CRF can capture the relationship of labels between tokens. The locally interpretable model-agnostic explanations (LIME) algorithm was applied to investigate performance differences between models with and without the weighted loss function. Experimental results on two different ADR tasks show that the proposed model outperforms previously proposed sequence labeling methods.
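The general idea of weighting a loss by class frequency can be illustrated with a token-level cross-entropy. This is only a sketch of class weighting and is not the weighted CRF proposed in the paper: the inverse-frequency weighting scheme, the label names, and the toy predictions are all assumptions made for illustration.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_tokens / (n_classes * class_count)."""
    counts = Counter(labels)
    n_tokens, n_classes = len(labels), len(counts)
    return {label: n_tokens / (n_classes * count) for label, count in counts.items()}

def weighted_cross_entropy(predicted, labels, weights):
    """Weighted mean of -log p(true label), normalised by the total weight."""
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(predicted, labels)]
    return sum(losses) / sum(weights[y] for y in labels)

# Toy sequence: 8 majority-class tokens ("O") and 2 minority-class tokens ("ADR").
labels = ["O"] * 8 + ["ADR"] * 2
weights = inverse_frequency_weights(labels)

# A hypothetical model that always favours the majority class.
predicted = [{"O": 0.9, "ADR": 0.1}] * len(labels)

plain = weighted_cross_entropy(predicted, labels, {"O": 1.0, "ADR": 1.0})
weighted = weighted_cross_entropy(predicted, labels, weights)
print(weights)
print(f"unweighted loss={plain:.3f}, weighted loss={weighted:.3f}")
```

With the weights applied, a model that ignores the minority class is penalised more heavily, which is the behaviour the abstract attributes to its weighting strategy.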
6
A weakly supervised model for the automated detection of adverse events using clinical notes. J Biomed Inform 2021; 126:103969. [PMID: 34864210 DOI: 10.1016/j.jbi.2021.103969] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/20/2021] [Revised: 10/26/2021] [Accepted: 11/27/2021] [Indexed: 11/21/2022]
Abstract
With clinical trials unable to detect all potential adverse reactions to drugs and medical devices prior to their release into the market, accurate post-market surveillance is critical to ensure their safety and efficacy. Electronic health records (EHR) contain rich observational patient data, making them a valuable source to actively monitor the safety of drugs and devices. While structured EHR data and spontaneous reporting systems often underreport the complexities of patient encounters and outcomes, free-text clinical notes offer greater detail about a patient's status. Previous studies have proposed machine learning methods to detect adverse events from clinical notes, but suffer from manually extracted features, reliance on costly hand-labeled data, and lack of validation on external datasets. To address these challenges, we develop a weakly-supervised machine learning framework for adverse event detection from unstructured clinical notes and evaluate it on insulin pump failure as a test case. Our model accurately detected cases of pump failure with 0.842 PR AUC on the holdout test set and 0.815 PR AUC when validated on an external dataset. Our approach allowed us to leverage a large dataset with far less hand-labeled data and can be easily transferred to additional adverse events for scalable post-market surveillance.
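PR AUC summarises how well a ranked list of predictions separates positives from negatives when positives are rare. One common estimator is average precision, sketched below in plain Python. The abstract does not state which estimator the authors used, and the scores and labels here are hypothetical.

```python
def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)   # precision at this cut-off
    if not precisions:
        raise ValueError("at least one positive label is required")
    return sum(precisions) / len(precisions)

# Hypothetical model scores for four notes; 1 marks a true adverse event.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
ap = average_precision(scores, labels)   # (1/1 + 2/3) / 2
print(f"average precision = {ap:.3f}")
```

Tied scores are ordered arbitrarily in this sketch; a production implementation would need an explicit tie-handling rule.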
7
Nashed A, Zhang S, Chiang CW, Zitu M, Otterson GA, Presley CJ, Kendra K, Patel SH, Johns A, Li M, Grogan M, Lopez G, Owen DH, Li L. Comparative assessment of manual chart review and ICD claims data in evaluating immunotherapy-related adverse events. Cancer Immunol Immunother 2021; 70:2761-2769. [PMID: 33625533 PMCID: PMC10992210 DOI: 10.1007/s00262-021-02880-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/13/2020] [Accepted: 02/01/2021] [Indexed: 12/19/2022]
Abstract
BACKGROUND The aim of this retrospective study was to demonstrate that examining irAEs, specifically gastrointestinal and pulmonary irAEs, through International Classification of Disease (ICD) data leads to underrepresentation of true irAEs and overrepresentation of false irAEs, and therefore that ICD claims data are a poor approach to electronic health record (EHR) data mining for irAEs in immunotherapy clinical research. METHODS This retrospective analysis was conducted in 1,063 cancer patients who received ICIs between 2011 and 2017. We identified irAEs by manual review of medical records to determine the incidence of each of our endpoints, namely colitis, hepatitis, pneumonitis, other irAE, or no irAE. We then performed a secondary analysis utilizing ICD claims data alone using a broad range of symptom and disease-specific ICD codes representative of irAEs. RESULTS On manual review, 16% (n = 174/1,063) of the total study population were initially found to have pneumonitis (3%, n = 37), colitis (7%, n = 81), or hepatitis (5%, n = 56). Of these patients, 46% (n = 80/174) did not have ICD code evidence in the EHR reflecting their irAE. Of the patients not found to have any irAE during manual review, 61% (n = 459/748) had ICD codes suggestive of a possible irAE. DISCUSSION Examining gastrointestinal and pulmonary irAEs through International Classification of Disease (ICD) data leads to underrepresentation of true irAEs and overrepresentation of false irAEs.
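The counts in this abstract can be turned into approximate performance figures for ICD codes. The following sketch treats manual review as the reference standard; the "implied" sensitivity and specificity are derived here by taking complements of the reported fractions and are not values stated by the authors.

```python
# Counts quoted in the abstract.
irae_total = 174           # pneumonitis, colitis, or hepatitis on manual review
irae_without_code = 80     # of those, no ICD code reflecting the irAE
no_irae_total = 748        # no irAE on manual review
no_irae_with_code = 459    # of those, ICD codes suggestive of an irAE

missed_fraction = irae_without_code / irae_total        # reported as 46%
flagged_fraction = no_irae_with_code / no_irae_total    # reported as 61%

# Derived under the assumption that manual review is the reference standard.
implied_sensitivity = 1 - missed_fraction
implied_specificity = 1 - flagged_fraction

print(f"missed by ICD codes: {missed_fraction:.1%}")
print(f"falsely flagged by ICD codes: {flagged_fraction:.1%}")
print(f"implied sensitivity: {implied_sensitivity:.1%}")
print(f"implied specificity: {implied_specificity:.1%}")
```

On these numbers, ICD codes would capture only a little over half of the manually confirmed cases while flagging a majority of patients without an irAE.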
Affiliation(s)
- Andrew Nashed
  - Department of Internal Medicine, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- Shijun Zhang
  - Department of Biomedical Informatics and Center for Biostatistics, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- Chien-Wei Chiang
  - Department of Biomedical Informatics and Center for Biostatistics, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- M Zitu
  - Department of Biomedical Informatics and Center for Biostatistics, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- Gregory A Otterson
  - Division of Medical Oncology, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- Carolyn J Presley
  - Division of Medical Oncology, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- Kari Kendra
  - Division of Medical Oncology, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- Sandip H Patel
  - Division of Medical Oncology, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- Andrew Johns
  - Department of Internal Medicine, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- Mingjia Li
  - Department of Internal Medicine, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- Madison Grogan
  - Division of Medical Oncology, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- Gabrielle Lopez
  - Division of Medical Oncology, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- Dwight H Owen
  - Division of Medical Oncology, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
- Lang Li
  - Department of Biomedical Informatics and Center for Biostatistics, The Ohio State University, A450B Starling Loving Hall Columbus, Columbus, OH, 43210, USA
8
Shin H, Cha J, Lee Y, Kim JY, Lee S. Real-world data-based adverse drug reactions detection from the Korea Adverse Event Reporting System databases with electronic health records-based detection algorithm. Health Informatics J 2021; 27:14604582211033014. [PMID: 34289723 DOI: 10.1177/14604582211033014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
Pharmacovigilance involves monitoring of drugs and their adverse drug reactions (ADRs) and is essential for their safety post-marketing. Because of the different types and structures of medical databases, several previous surveillance studies have analyzed only one database. In the present study, we extracted potential drug-ADR pairs from electronic health record (EHR) data using the MetaNurse algorithm and analyzed them using the Korean Adverse Event Reporting System (KAERS) database for systematic validation. The Medical Dictionary for Regulatory Activities (MedDRA) and World Health Organization (WHO) Adverse Reactions Terminology (WHO-ART) were mapped for signal detection. We used the Side Effect Resource (SIDER) database to select 2663 drug-ADR pairs to investigate unknown drug-induced ADRs. The reporting odds ratio (ROR) value was calculated for the drug-exposed and non-exposed groups of drug-ADR pairs, and 19 potential pairs showed significant signals. Appropriate terminology systems and criteria are needed to handle diverse medical databases.
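The reporting odds ratio mentioned above is computed from a 2x2 table of spontaneous reports. A minimal sketch follows, using the standard formula with a log-scale 95% confidence interval. The counts are hypothetical, and the signal rule (lower confidence bound above 1) is a common convention assumed here, since the abstract does not state the criterion the authors applied.

```python
import math

def reporting_odds_ratio(a, b, c, d, z=1.96):
    """ROR with a confidence interval from a 2x2 report table.

    a: reports with the drug and the ADR
    b: reports with the drug, without the ADR
    c: reports with other drugs and the ADR
    d: reports with other drugs, without the ADR
    """
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of ln(ROR)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper

# Hypothetical counts for one drug-ADR pair; not taken from KAERS.
ror, lower, upper = reporting_odds_ratio(a=20, b=80, c=100, d=1800)
is_signal = lower > 1   # assumed signal criterion
print(f"ROR={ror:.2f} (95% CI {lower:.2f} to {upper:.2f}), signal={is_signal}")
```

All four cells must be non-zero for this formula; pairs with empty cells need a continuity correction or a different disproportionality measure.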
Affiliation(s)
- Hyunah Shin
- Konyang University Hospital, Republic of Korea
| | - Jaehun Cha
- Konyang University Hospital, Republic of Korea
| | - Youngho Lee
- Gachon University College of IT, Republic of Korea
| | - Jong-Yeup Kim
- Konyang University Hospital, Republic of Korea; Konyang University College of Medicine, Republic of Korea
| | - Suehyun Lee
- Konyang University Hospital, Republic of Korea; Konyang University College of Medicine, Republic of Korea
| |
|
9
|
Geva A, Abman SH, Manzi SF, Ivy DD, Mullen MP, Griffin J, Lin C, Savova GK, Mandl KD. Adverse drug event rates in pediatric pulmonary hypertension: a comparison of real-world data sources. J Am Med Inform Assoc 2021; 27:294-300. [PMID: 31769835 PMCID: PMC7025334 DOI: 10.1093/jamia/ocz194] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/13/2019] [Revised: 10/08/2019] [Accepted: 10/21/2019] [Indexed: 11/14/2022]
Abstract
Objective Real-world data (RWD) are increasingly used for pharmacoepidemiology and regulatory innovation. Our objective was to compare adverse drug event (ADE) rates determined from two RWD sources, electronic health records and administrative claims data, among children treated with drugs for pulmonary hypertension. Materials and Methods Textual mentions of medications and signs/symptoms that may represent ADEs were identified in clinical notes using natural language processing. Diagnostic codes for the same signs/symptoms were identified in our electronic data warehouse for the patients with textual evidence of taking pulmonary hypertension-targeted drugs. We compared rates of ADEs identified in clinical notes to those identified from diagnostic code data. In addition, we compared putative ADE rates from clinical notes to those from a healthcare claims dataset from a large, national insurer. Results Analysis of clinical notes identified up to 7-fold higher ADE rates than those ascertained from diagnostic codes. However, certain ADEs (eg, hearing loss) were more often identified in diagnostic code data. Similar results were found when ADE rates ascertained from clinical notes and national claims data were compared. Discussion While administrative claims and clinical notes are both increasingly used for RWD-based pharmacovigilance, ADE rates substantially differ depending on data source. Conclusion Pharmacovigilance based on RWD may lead to discrepant results depending on the data source analyzed. Further work is needed to confirm the validity of identified ADEs, to distinguish them from disease effects, and to understand tradeoffs in sensitivity and specificity between data sources.
Affiliation(s)
- Alon Geva
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Anaesthesia, Harvard Medical School, Boston, Massachusetts, USA
| | - Steven H Abman
- Division of Pediatric Pulmonary Medicine, Children's Hospital Colorado, Aurora, Colorado, USA; Department of Pediatrics, University of Colorado School of Medicine, Aurora, Colorado, USA
| | - Shannon F Manzi
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Division of Genetics & Genomics, Clinical Pharmacogenomics Service, Department of Pharmacy, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
| | - Dunbar D Ivy
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, Colorado, USA; Division of Cardiology, Heart Institute, Children's Hospital Colorado, Aurora, Colorado, USA
| | - Mary P Mullen
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA; Department of Cardiology, Boston Children's Hospital, Boston, Massachusetts, USA
| | - John Griffin
- Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts, USA
| | - Chen Lin
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
| | - Guergana K Savova
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
| | - Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| |
|
10
|
Wei Q, Ji Z, Li Z, Du J, Wang J, Xu J, Xiang Y, Tiryaki F, Wu S, Zhang Y, Tao C, Xu H. A study of deep learning approaches for medication and adverse drug event extraction from clinical text. J Am Med Inform Assoc 2021; 27:13-21. [PMID: 31135882 DOI: 10.1093/jamia/ocz063] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 01/31/2019] [Revised: 03/23/2019] [Accepted: 04/17/2019] [Indexed: 11/13/2022]
Abstract
OBJECTIVE This article presents our approaches to extraction of medications and associated adverse drug events (ADEs) from clinical documents, which is the second track of the 2018 National NLP Clinical Challenges (n2c2) shared task. MATERIALS AND METHODS The clinical corpus used in this study was from the MIMIC-III database and the organizers annotated 303 documents for training and 202 for testing. Our system consists of 2 components: a named entity recognition (NER) and a relation classification (RC) component. For each component, we implemented deep learning-based approaches (eg, BI-LSTM-CRF) and compared them with traditional machine learning approaches, namely, conditional random fields for NER and support vector machines for RC, respectively. In addition, we developed a deep learning-based joint model that recognizes ADEs and their relations to medications in 1 step using a sequence labeling approach. To further improve the performance, we also investigated different ensemble approaches to generating optimal performance by combining outputs from multiple approaches. RESULTS Our best-performing systems achieved F1 scores of 93.45% for NER, 96.30% for RC, and 89.05% for end-to-end evaluation, which ranked #2, #1, and #1 among all participants, respectively. Additional evaluations show that the deep learning-based approaches did outperform traditional machine learning algorithms in both NER and RC. The joint model that simultaneously recognizes ADEs and their relations to medications also achieved the best performance on RC, indicating its promise for relation extraction. CONCLUSION In this study, we developed deep learning approaches for extracting medications and their attributes such as ADEs, and demonstrated its superior performance compared with traditional machine learning algorithms, indicating its uses in broader NER and RC tasks in the medical domain.
Affiliation(s)
- Qiang Wei
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Zongcheng Ji
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Zhiheng Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian, China
| | - Jingcheng Du
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Jingqi Wang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Jun Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Yang Xiang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Firat Tiryaki
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Stephen Wu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Yaoyun Zhang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Cui Tao
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| |
|
11
|
Ko S, Kim H, Shinn J, Byeon SJ, Choi JH, Kim HS. Estimation of sodium-glucose cotransporter 2 inhibitor-related genital and urinary tract infections via electronic medical record-based common data model. J Clin Pharm Ther 2021; 46:975-983. [PMID: 33565150 DOI: 10.1111/jcpt.13381] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/10/2020] [Revised: 01/20/2021] [Accepted: 01/22/2021] [Indexed: 11/27/2022]
Abstract
WHAT IS KNOWN AND OBJECTIVES In Korea, the side effects of sodium-glucose cotransporter 2 inhibitors (SGLT2i) have not been clearly reported, aside from voluntary reporting. We aimed to develop detection algorithms for SGLT2i-related genital tract infections (GTIs) and urinary tract infections (UTIs) via a common data model (CDM), an electronic medical record-based database for supporting multi-hospital clinical research. We estimated the occurrence of GTIs and UTIs and, by assessing the status of each step of the algorithm, we also aimed to determine how clinicians responded to the SGLT2i-related GTIs and UTIs. METHODS We targeted all patients who were prescribed SGLT2i at Catholic University Seoul St. Mary's Hospital and Hallym University Dongtan Sacred Heart Hospital from January 2014 to August 2018. We developed algorithms for detection of SGLT2i-related GTIs or UTIs that divided patients into "most likely," "possibly" or "less likely" categories of GTIs or UTIs. The numbers of patients at each step were extracted. RESULTS AND DISCUSSION A total of 4253 patients received their first prescription of SGLT2i. According to the algorithm used in this study, the proportions of "most likely GTI" and "possibly GTI" were 0.9% (37 out of 4253) and 19.4% (826 out of 4253 patients), respectively. Similarly, the proportions of "most likely UTI" and "possibly UTI" were 0.9% (38 out of 4253) and 20.2% (858 out of 4253 patients), respectively. Compared to the various existing prospective studies, both GTIs and UTIs showed lower occurrence among patients who met "most likely" criteria and higher occurrence among those who met "possibly" criteria. When a GTI or UTI occurred or was suspected, the overall rate of discontinuing SGLT2i was 51.8% (1721 out of 3323). Despite a confirmed or suspected GTI or UTI, 62.8% (1460 out of 2323) and 14.2% (142 out of 1000) of patients, respectively, continued to take SGLT2i. The discontinuation rate for suspected GTIs was significantly lower than that for suspected UTIs (37.2% vs. 85.8%, p < 0.001). WHAT IS NEW AND CONCLUSION In this study, although GTIs appeared to occur about as often as UTIs, the discontinuation rate of SGLT2i for suspected GTIs was lower. Our study is novel in that we identified how the physicians approached SGLT2i-related GTIs or UTIs at each step in a real-world clinical practice setting. Although we could estimate SGLT2i-related GTIs and UTIs via CDM, we were limited in our ability to accurately detect mild drug side effects via CDM, which lacked data for operational definition.
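The continuation and discontinuation percentages quoted in this abstract can be recomputed from the reported counts. The short check below uses only the figures given above; the variable names and the `pct` helper are illustrative.

```python
# Counts as reported in the abstract.
gti_total, gti_continued = 2323, 1460
uti_total, uti_continued = 1000, 142

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Patients who stopped SGLT2i are those who did not continue.
gti_stopped = gti_total - gti_continued
uti_stopped = uti_total - uti_continued
overall_stopped = gti_stopped + uti_stopped
overall_total = gti_total + uti_total

print("GTI continued / stopped:", pct(gti_continued, gti_total), pct(gti_stopped, gti_total))
print("UTI continued / stopped:", pct(uti_continued, uti_total), pct(uti_stopped, uti_total))
print("Overall stopped:", overall_stopped, "of", overall_total, "=", pct(overall_stopped, overall_total))
```

The recomputed values agree with the reported 62.8% and 14.2% continuation rates, the 37.2% and 85.8% discontinuation rates, and the overall 51.8% (1721 out of 3323).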
Affiliation(s)
- SooJeong Ko
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - HyungMin Kim
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Jiwon Shinn
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Sun-Ju Byeon
- Department of Pathology, Hallym University Dongtan Sacred Heart Hospital, Hwaseong, Korea
| | - Jeong-Hee Choi
- Department of Pulmonology and Allergy, Hallym University Dongtan Sacred Heart Hospital, Hwaseong, Korea
| | - Hun-Sung Kim
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea; Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
| |
|
12
|
Hoopes M, Angier H, Raynor LA, Suchocki A, Muench J, Marino M, Rivera P, Huguet N. Development of an algorithm to link electronic health record prescriptions with pharmacy dispense claims. J Am Med Inform Assoc 2019; 25:1322-1330. [PMID: 30113681 DOI: 10.1093/jamia/ocy095] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/26/2018] [Accepted: 06/27/2018] [Indexed: 11/14/2022]
Abstract
Objective Medication adherence is an important aspect of chronic disease management. Electronic health record (EHR) data are often not linked to dispensing data, limiting clinicians' understanding of which of their patients fill their medications, and how to tailor care appropriately. We aimed to develop an algorithm to link EHR prescribing to claims-based dispensing data and use the results to quantify how often patients with diabetes filled prescribed chronic disease medications. Materials and Methods We developed an algorithm linking EHR prescribing data (RxNorm terminology) to claims-based dispensing data (NDC terminology), within a sample of adult (19-64) community health center (CHC) patients with diabetes from a network of CHCs across 12 states. We demonstrate an application of the method by calculating dispense rates for a set of commonly prescribed diabetes and cardio-protective medications. To further inform clinical care, we computed adjusted odds ratios of dispense by patient-, encounter-, and clinic-level characteristics. Results Seventy-six percent of cardio-protective medication prescriptions and 74% of diabetes medications were linked to a dispensing record. Age, income, ethnicity, insurance, assigned primary care provider, comorbidity, time on EHR, and clinic size were significantly associated with odds of dispensing. Discussion EHR prescriptions and pharmacy dispense data can be linked at the record level across different terminologies. Dispensing rates in this low-income population with diabetes were similar to other populations. Conclusion Record linkage resulted in the finding that CHC patients with diabetes largely had their chronic disease medications dispensed. Understanding factors associated with dispensing rates highlights barriers and opportunities for optimal disease management.
Affiliation(s)
| | - Heather Angier
- Department of Family Medicine, Oregon Health & Science University, Portland, Oregon, USA
| | | | - Andrew Suchocki
- Department of Family Medicine, Oregon Health & Science University, Portland, Oregon, USA
| | - John Muench
- Department of Family Medicine, Oregon Health & Science University, Portland, Oregon, USA
| | - Miguel Marino
- Department of Family Medicine, Oregon Health & Science University, Portland, Oregon, USA; School of Public Health, Oregon Health & Science University - Portland State University, Portland, Oregon, USA
| | | | - Nathalie Huguet
- Department of Family Medicine, Oregon Health & Science University, Portland, Oregon, USA
| |
|
13
|
Extracting Adverse Drug Event Information with Minimal Engineering. PROCEEDINGS OF THE CONFERENCE. ASSOCIATION FOR COMPUTATIONAL LINGUISTICS. NORTH AMERICAN CHAPTER. MEETING 2019; 2019:22-27. [PMID: 34027520 DOI: 10.18653/v1/w19-1903] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
Abstract
In this paper we describe an evaluation of the potential of classical information extraction methods to extract drug-related attributes, including adverse drug events, and compare to more recently developed neural methods. We use the 2018 N2C2 shared task data as our gold standard data set for training. We train support vector machine classifiers to detect drug and drug attribute spans, and pair these detected entities as training instances for an SVM relation classifier, with both systems using standard features. We compare to baseline neural methods that use standard contextualized embedding representations for entity and relation extraction. The SVM-based system and a neural system obtain comparable results, with the SVM system doing better on concepts and the neural system performing better on relation extraction tasks. The neural system obtains surprisingly strong results compared to the system based on years of research in developing features for information extraction.
|
14
|
Kuang Z, Bao Y, Thomson J, Caldwell M, Peissig P, Stewart R, Willett R, Page D. A Machine-Learning-Based Drug Repurposing Approach Using Baseline Regularization. Methods Mol Biol 2019; 1903:255-267. [PMID: 30547447 PMCID: PMC6296259 DOI: 10.1007/978-1-4939-8955-3_15] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 01/23/2023]
Abstract
We present the baseline regularization model for computational drug repurposing using electronic health records (EHRs). In EHRs, prescriptions of various drugs are recorded over time for various patients. At the same time, numeric physical measurements (e.g., fasting blood glucose level) are also recorded. Baseline regularization uses statistical relationships between the occurrences of prescriptions of particular drugs and the increase or decrease in the values of particular numeric physical measurements to identify potential repurposing opportunities.
Affiliation(s)
| | - Yujia Bao
- The Massachusetts Institute of Technology
| | | | | | | | | | | | | |
|
15
|
Chapman AB, Peterson KS, Alba PR, DuVall SL, Patterson OV. Detecting Adverse Drug Events with Rapidly Trained Classification Models. Drug Saf 2019; 42:147-156. [PMID: 30649737 PMCID: PMC6373386 DOI: 10.1007/s40264-018-0763-y] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Indexed: 02/06/2023]
Abstract
INTRODUCTION Identifying occurrences of medication side effects and adverse drug events (ADEs) is an important and challenging task because they are frequently only mentioned in clinical narrative and are not formally reported. METHODS We developed a natural language processing (NLP) system that aims to identify mentions of symptoms and drugs in clinical notes and label the relationship between the mentions as indications or ADEs. The system leverages an existing word embeddings model with induced word clusters for dimensionality reduction. It employs a conditional random field (CRF) model for named entity recognition (NER) and a random forest model for relation extraction (RE). RESULTS Final performance of each model was evaluated separately and then combined on a manually annotated evaluation set. The micro-averaged F1 score was 80.9% for NER, 88.1% for RE, and 61.2% for the integrated systems. Outputs from our systems were submitted to the NLP Challenges for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE 1.0) competition (Yu et al. in http://bio-nlp.org/index.php/projects/39-nlp-challenges , 2018). System performance was evaluated in three tasks (NER, RE, and complete system) with multiple teams submitting output from their systems for each task. Our RE system placed first in Task 2 of the challenge and our integrated system achieved third place in Task 3. CONCLUSION Adding to the growing number of publications that utilize NLP to detect occurrences of ADEs, our study illustrates the benefits of employing innovative feature engineering.
Affiliation(s)
| | - Kelly S Peterson
- VA Salt Lake City Health Care System, University of Utah, Salt Lake City, UT, USA
- Division of Epidemiology, University of Utah, Salt Lake City, UT, USA
| | - Patrick R Alba
- VA Salt Lake City Health Care System, University of Utah, Salt Lake City, UT, USA
- Division of Epidemiology, University of Utah, Salt Lake City, UT, USA
| | - Scott L DuVall
- VA Salt Lake City Health Care System, University of Utah, Salt Lake City, UT, USA
- Division of Epidemiology, University of Utah, Salt Lake City, UT, USA
| | - Olga V Patterson
- VA Salt Lake City Health Care System, University of Utah, Salt Lake City, UT, USA
- Division of Epidemiology, University of Utah, Salt Lake City, UT, USA
| |
|
16
|
Cocos A, Fiks AG, Masino AJ. Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in Twitter posts. J Am Med Inform Assoc 2018; 24:813-821. [PMID: 28339747 DOI: 10.1093/jamia/ocw180] [Citation(s) in RCA: 115] [Impact Index Per Article: 19.2] [Received: 08/05/2016] [Accepted: 12/17/2016] [Indexed: 01/07/2023]
Abstract
Objective Social media is an important pharmacovigilance data source for adverse drug reaction (ADR) identification. Human review of social media data is infeasible due to data quantity, thus natural language processing techniques are necessary. Social media includes informal vocabulary and irregular grammar, which challenge natural language processing methods. Our objective is to develop a scalable, deep-learning approach that exceeds state-of-the-art ADR detection performance in social media. Materials and Methods We developed a recurrent neural network (RNN) model that labels words in an input sequence with ADR membership tags. The only input features are word-embedding vectors, which can be formed through task-independent pretraining or during ADR detection training. Results Our best-performing RNN model used pretrained word embeddings created from a large, non-domain-specific Twitter dataset. It achieved an approximate match F-measure of 0.755 for ADR identification on the dataset, compared to 0.631 for a baseline lexicon system and 0.65 for the state-of-the-art conditional random field model. Feature analysis indicated that semantic information in pretrained word embeddings boosted sensitivity and, combined with contextual awareness captured in the RNN, precision. Discussion Our model required no task-specific feature engineering, suggesting generalizability to additional sequence-labeling tasks. Learning curve analysis showed that our model reached optimal performance with fewer training examples than the other models. Conclusion ADR detection performance in social media is significantly improved by using a contextually aware model and word embeddings formed from large, unlabeled datasets. The approach reduces manual data-labeling requirements and is scalable to large social media datasets.
Affiliation(s)
- Anne Cocos
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Alexander G Fiks
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Aaron J Masino
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
| |
|
17
|
Vasiljeva I, Arandjelović O. Diagnosis Prediction from Electronic Health Records Using the Binary Diagnosis History Vector Representation. J Comput Biol 2017; 24:767-786. [DOI: 10.1089/cmb.2017.0023] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/15/2023]
Affiliation(s)
- Ieva Vasiljeva
- School of Computer Science, University of St Andrews, St Andrews, Fife, Scotland, United Kingdom
| | - Ognjen Arandjelović
- School of Computer Science, University of St Andrews, St Andrews, Fife, Scotland, United Kingdom
| |
|
18
|
Montvida O, Arandjelović O, Reiner E, Paul SK. Data Mining Approach to Estimate the Duration of Drug Therapy from Longitudinal Electronic Medical Records. ACTA ACUST UNITED AC 2017. [DOI: 10.2174/1875036201709010001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 01/19/2023]
Abstract
Background:
Electronic Medical Records (EMRs) from primary/ambulatory care systems present a new and promising source of information for conducting clinical and translational research.
Objectives:
To address the methodological and computational challenges in order to extract reliable medication information from raw data which is often complex, incomplete and erroneous. To assess whether the use of specific chaining fields of medication information may additionally improve the data quality.
Methods:
Guided by a range of challenges associated with missing and internally inconsistent data, we introduce two methods for the robust extraction of patient-level medication data. The first method relies on chaining fields to estimate the duration of treatment (“chaining”), while the second disregards chaining fields and relies on the chronology of records (“continuous”). The Centricity EMR database was used to estimate treatment duration with both methods for two widely prescribed drugs among type 2 diabetes patients: insulin and glucagon-like peptide-1 receptor agonists.
Results:
At the individual patient level, the “chaining” approach could identify treatment alterations longitudinally and produced more robust estimates of treatment duration for individual drugs, while the “continuous” method was unable to capture those dynamics. At the population level, both methods produced similar estimates of average treatment duration; however, notable differences were observed at the individual-patient level.
Conclusion:
The proposed algorithms explicitly identify and handle erroneous or missing longitudinal entries and estimate treatment duration with specific drug(s) of interest, which makes them a valuable tool for future EMR-based clinical and pharmaco-epidemiological studies. To improve the accuracy of real-world studies, implementing chaining fields of medication information is recommended.
|
19
|
|
20
|
Abstract
Background and Objective Several studies have demonstrated the ability to detect adverse events potentially related to multiple drug exposure via data mining. However, the number of putative associations produced by such computational approaches is typically large, making experimental validation difficult. We theorized that those potential associations for which there is evidence from multiple complementary sources are more likely to be true, and explored this idea using a published database of drug–drug-adverse event associations derived from electronic health records (EHRs). Methods We prioritized drug–drug-event associations derived from EHRs using four sources of information: (1) public databases, (2) sources of spontaneous reports, (3) literature, and (4) non-EHR drug–drug interaction (DDI) prediction methods. After pre-filtering the associations by removing those found in public databases, we devised a ranking for associations based on the support from the remaining sources, and evaluated the results of this rank-based prioritization. Results We collected information for 5983 putative EHR-derived drug–drug-event associations involving 345 drugs and ten adverse events from four data sources and four prediction methods. Only seven drug–drug-event associations (<0.5 %) had support from the majority of evidence sources, and about one third (1777) had support from at least one of the evidence sources. Conclusions Our proof-of-concept method for scoring putative drug–drug-event associations from EHRs offers a systematic and reproducible way of prioritizing associations for further study. Our findings also quantify the agreement (or lack thereof) among complementary sources of evidence for drug–drug-event associations and highlight the challenges of developing a robust approach for prioritizing signals of these associations. 
Electronic supplementary material The online version of this article (doi:10.1007/s40264-015-0352-2) contains supplementary material, which is available to authorized users.
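The prioritization described in this abstract (pre-filter associations already present in public databases, then rank the rest by how many remaining evidence sources support them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scoring method: the function name, the source labels, and the example associations are all hypothetical.

```python
def prioritize(associations, known_in_public_db):
    """Rank novel drug-drug-event associations by evidence support.

    associations: dict mapping (drug_a, drug_b, event) to the set of
        evidence-source names that support it (hypothetical labels).
    known_in_public_db: set of associations to drop in the pre-filter.
    Returns a list of (association, support_count), most supported first.
    """
    novel = {
        assoc: sources
        for assoc, sources in associations.items()
        if assoc not in known_in_public_db
    }
    ranked = sorted(novel.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(assoc, len(sources)) for assoc, sources in ranked]

# Hypothetical example data, not taken from the study.
candidates = {
    ("drugA", "drugB", "bradycardia"): {"spontaneous_reports", "literature"},
    ("drugC", "drugD", "neutropenia"): {"literature"},
    ("drugE", "drugF", "hyperkalemia"): set(),
    ("drugG", "drugH", "pancreatitis"): {"spontaneous_reports"},
}
already_known = {("drugG", "drugH", "pancreatitis")}

ranking = prioritize(candidates, already_known)
for assoc, support in ranking:
    print(support, assoc)
```

Ties are broken alphabetically here only to make the ordering deterministic; any secondary criterion could be substituted.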
|
21
|
Andrei V, Arandjelović O. Complex temporal topic evolution modelling using the Kullback-Leibler divergence and the Bhattacharyya distance. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2016; 2016:16. [PMID: 27746813 PMCID: PMC5042987 DOI: 10.1186/s13637-016-0050-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/06/2016] [Accepted: 09/12/2016] [Indexed: 11/10/2022]
Abstract
The rapidly expanding corpus of medical research literature presents major challenges in the understanding of previous work, the extraction of maximum information from collected data, and the identification of promising research directions. We present a case for the use of advanced machine learning techniques as an aide in this task and introduce a novel methodology that is shown to be capable of extracting meaningful information from large longitudinal corpora and of tracking complex temporal changes within it. Our framework is based on (i) the discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes. More specifically, this is the first work that discusses and distinguishes between two groups of particularly challenging topic evolution phenomena: topic splitting and speciation and topic convergence and merging, in addition to the more widely recognized emergence and disappearance and gradual evolution. The proposed framework is evaluated on a public medical literature corpus.
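The two measures named in this paper's title have standard definitions for discrete probability distributions, such as the word distributions of two topics. The sketch below computes both. The example distributions are made up, and how the authors combine these measures in their temporal similarity graph is not described in the abstract, so none of that is reproduced here.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions.

    Asymmetric in p and q. Assumes q[i] > 0 wherever p[i] > 0;
    terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance: -ln of the Bhattacharyya coefficient.

    Symmetric in p and q. Assumes the distributions overlap somewhere,
    so the coefficient is positive.
    """
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(coefficient)

# Hypothetical word distributions of one topic in two consecutive epochs.
topic_epoch_1 = [0.5, 0.5]
topic_epoch_2 = [0.9, 0.1]

kl = kl_divergence(topic_epoch_1, topic_epoch_2)
bd = bhattacharyya_distance(topic_epoch_1, topic_epoch_2)
print(f"KL divergence: {kl:.4f}")
print(f"Bhattacharyya distance: {bd:.4f}")
```

Both measures are zero for identical distributions and grow as the distributions diverge, which is what makes them usable as edge weights between topics in adjacent epochs.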
Affiliation(s)
- Victor Andrei
- School of Computer Science, University of St Andrews, St Andrews KY16 9SX, Fife, Scotland, UK
| | - Ognjen Arandjelović
- School of Computer Science, University of St Andrews, St Andrews KY16 9SX, Fife, Scotland, UK
| |
|
22
|
Kuang Z, Thomson J, Caldwell M, Peissig P, Stewart R, Page D. Computational Drug Repositioning Using Continuous Self-Controlled Case Series. KDD : PROCEEDINGS. INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING 2016; 2016:491-500. [PMID: 28316874 DOI: 10.1145/2939672.2939715] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]
Abstract
Computational Drug Repositioning (CDR) is the task of discovering potential new indications for existing drugs by mining large-scale heterogeneous drug-related data sources. Leveraging the patient-level temporal ordering information between numeric physiological measurements and various drug prescriptions provided in Electronic Health Records (EHRs), we propose a Continuous Self-controlled Case Series (CSCCS) model for CDR. As an initial evaluation, we look for drugs that can control Fasting Blood Glucose (FBG) level in our experiments. Applying CSCCS to the Marshfield Clinic EHR, well-known drugs that are indicated for controlling blood glucose level are rediscovered. Furthermore, some drugs with recent literature support for the potential effect of blood glucose level control are also identified.
|
23
|
Generalized enrichment analysis improves the detection of adverse drug events from the biomedical literature. BMC Bioinformatics 2016; 17:250. [PMID: 27333889 PMCID: PMC4918084 DOI: 10.1186/s12859-016-1080-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 10/30/2015] [Accepted: 05/11/2016] [Indexed: 01/12/2023] Open
Abstract
Background Identification of associations between marketed drugs and adverse events from the biomedical literature assists drug safety monitoring efforts. Assessing the significance of such literature-derived associations and determining the granularity at which they should be captured remains a challenge. Here, we assess how defining a selection of adverse event terms from MeSH, based on information content, can improve the detection of adverse events for drugs and drug classes. Results We analyze a set of 105,354 candidate drug adverse event pairs extracted from article indexes in MEDLINE. First, we harmonize extracted adverse event terms by aggregating them into higher-level MeSH terms based on the terms’ information content. Then, we determine statistical enrichment of adverse events associated with drug and drug classes using a conditional hypergeometric test that adjusts for dependencies among associated terms. We compare our results with methods based on disproportionality analysis (proportional reporting ratio, PRR) and quantify the improvement in signal detection with our generalized enrichment analysis (GEA) approach using a gold standard of drug-adverse event associations spanning 174 drugs and four events. For single drugs, the best GEA method (Precision: .92/Recall: .71/F1-measure: .80) outperforms the best PRR based method (.69/.69/.69) on all four adverse event outcomes in our gold standard. For drug classes, our GEA performs similarly (.85/.69/.74) when increasing the level of abstraction for adverse event terms. Finally, on examining the 1609 individual drugs in our MEDLINE set, which map to chemical substances in ATC, we find signals for 1379 drugs (10,122 unique adverse event associations) on applying GEA with p < 0.005. 
Conclusions We present an approach based on generalized enrichment analysis that can be used to detect associations between drugs, drug classes and adverse events at a given level of granularity, at the same time correcting for known dependencies among events. Our study demonstrates the use of GEA, and the importance of choosing appropriate abstraction levels to complement current drug safety methods. We provide an R package for exploration of alternative abstraction levels of adverse event terms based on information content.
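As a quick arithmetic check on the single-drug figures quoted in this abstract, the sketch below recomputes the F1-measure from the reported precision and recall. It assumes F1 is the usual harmonic mean of precision and recall; the function name `f1_score` is illustrative and is not taken from the authors' R package.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract for single drugs.
gea_f1 = f1_score(0.92, 0.71)  # best GEA method
prr_f1 = f1_score(0.69, 0.69)  # best PRR-based method

print(f"GEA F1 = {gea_f1:.2f}")
print(f"PRR F1 = {prr_f1:.2f}")
```

Rounded to two decimals, both values agree with the reported .80 and .69.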
|
24
|
Harpaz R, Callahan A, Tamang S, Low Y, Odgers D, Finlayson S, Jung K, LePendu P, Shah NH. Text mining for adverse drug events: the promise, challenges, and state of the art. Drug Saf 2015; 37:777-90. [PMID: 25151493 DOI: 10.1007/s40264-014-0218-z] [Citation(s) in RCA: 107] [Impact Index Per Article: 11.9] [Indexed: 01/11/2023]
Abstract
Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources (such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs) that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.
Affiliation(s)
- Rave Harpaz
- Center for Biomedical Informatics Research, Stanford University, 1265 Welch Road, Stanford, CA, 94305-5479, USA
|
25
|
Shah NH, Tenenbaum JD. The coming age of data-driven medicine: translational bioinformatics' next frontier. J Am Med Inform Assoc 2013; 19:e2-4. [PMID: 22718035 PMCID: PMC3392866 DOI: 10.1136/amiajnl-2012-000969] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Indexed: 11/04/2022] Open
Affiliation(s)
- Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California, USA
- Jessica D Tenenbaum
- Duke Translational Medicine Institute, Duke University, Durham, North Carolina, USA
|
26
|
LePendu P, Iyer SV, Bauer-Mehren A, Harpaz R, Mortensen JM, Podchiyska T, Ferris TA, Shah NH. Pharmacovigilance using clinical notes. Clin Pharmacol Ther 2013; 93:547-55. [PMID: 23571773 PMCID: PMC3846296 DOI: 10.1038/clpt.2013.47] [Citation(s) in RCA: 101] [Impact Index Per Article: 9.2] [Indexed: 01/07/2023]
Abstract
With increasing adoption of electronic health records (EHRs), there is an opportunity to use the free-text portion of EHRs for pharmacovigilance. We present novel methods that annotate the unstructured clinical notes and transform them into a deidentified patient-feature matrix encoded using medical terminologies. We demonstrate the use of the resulting high-throughput data for detecting drug-adverse event associations and adverse events associated with drug-drug interactions. We show that these methods flag adverse events early (in most cases before an official alert), allow filtering of spurious signals by adjusting for potential confounding, and compile prevalence information. We argue that analyzing large volumes of free-text clinical notes enables drug safety surveillance using a yet untapped data source. Such data mining can be used for hypothesis generation and for rapid analysis of suspected adverse event risk.
Affiliation(s)
- P LePendu
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA.
|
27
|
Sai K, Hanatani T, Azuma Y, Segawa K, Tohkin M, Omatsu H, Makimoto H, Hirai M, Saito Y. Development of a detection algorithm for statin-induced myopathy using electronic medical records. J Clin Pharm Ther 2013; 38:230-5. [DOI: 10.1111/jcpt.12063] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 02/27/2013] [Accepted: 03/07/2013] [Indexed: 11/29/2022]
Affiliation(s)
- K. Sai
- Division of Medicinal Safety Science; National Institute of Health Sciences; Tokyo Japan
- T. Hanatani
- Division of Medicinal Safety Science; National Institute of Health Sciences; Tokyo Japan
- Department of Regulatory Science; Graduate School of Pharmaceutical Sciences; Nagoya City University; Nagoya Japan
- Y. Azuma
- Division of Medicinal Safety Science; National Institute of Health Sciences; Tokyo Japan
- K. Segawa
- Division of Medicinal Safety Science; National Institute of Health Sciences; Tokyo Japan
- M. Tohkin
- Division of Medicinal Safety Science; National Institute of Health Sciences; Tokyo Japan
- Department of Regulatory Science; Graduate School of Pharmaceutical Sciences; Nagoya City University; Nagoya Japan
- H. Omatsu
- Department of Hospital Pharmacy; Kobe University Hospital; Kobe Japan
- H. Makimoto
- Department of Hospital Pharmacy; Kobe University Hospital; Kobe Japan
- M. Hirai
- Department of Hospital Pharmacy; Kobe University Hospital; Kobe Japan
- Y. Saito
- Division of Medicinal Safety Science; National Institute of Health Sciences; Tokyo Japan
|
28
|
Jiang G, Liu H, Solbrig HR, Chute CG. ADEpedia 2.0: Integration of Normalized Adverse Drug Events (ADEs) Knowledge from the UMLS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2013; 2013:100-4. [PMID: 24303245 PMCID: PMC3845793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
A standardized Adverse Drug Events (ADEs) knowledge base that encodes known ADE knowledge can be very useful in improving ADE detection for drug safety surveillance. In our previous study, we developed the ADEpedia that is a standardized knowledge base of ADEs based on drug product labels. The objectives of the present study are 1) to integrate normalized ADE knowledge from the Unified Medical Language System (UMLS) into the ADEpedia; and 2) to enrich the knowledge base with the drug-disorder co-occurrence data from a 51-million-document electronic medical records (EMRs) system. We extracted 266,832 drug-disorder concept pairs from the UMLS, covering 14,256 (1.69%) distinct drug concepts and 19,006 (3.53%) distinct disorder concepts. Of them, 71,626 (26.8%) concept pairs from UMLS co-occurred in the EMRs. We performed a preliminary evaluation on the utility of the UMLS ADE data. In conclusion, we have built an ADEpedia 2.0 framework that intends to integrate known ADE knowledge from disparate sources. The UMLS is a useful source for providing standardized ADE knowledge relevant to indications, contraindications and adverse effects, and complementary to the ADE data from drug product labels. The statistics from EMRs would enable the meaningful use of ADE data for drug safety surveillance.
Affiliation(s)
- Guoqian Jiang
- Department of Health Sciences Research, Division of Biomedical Statistics & Informatics, Mayo Clinic College of Medicine, Rochester, MN
- Hongfang Liu
- Department of Health Sciences Research, Division of Biomedical Statistics & Informatics, Mayo Clinic College of Medicine, Rochester, MN
- Harold R. Solbrig
- Department of Health Sciences Research, Division of Biomedical Statistics & Informatics, Mayo Clinic College of Medicine, Rochester, MN
- Christopher G. Chute
- Department of Health Sciences Research, Division of Biomedical Statistics & Informatics, Mayo Clinic College of Medicine, Rochester, MN
|
29
|
Avillach P, Coloma PM, Gini R, Schuemie M, Mougin F, Dufour JC, Mazzaglia G, Giaquinto C, Fornari C, Herings R, Molokhia M, Pedersen L, Fourrier-Réglat A, Fieschi M, Sturkenboom M, van der Lei J, Pariente A, Trifirò G. Harmonization process for the identification of medical events in eight European healthcare databases: the experience from the EU-ADR project. J Am Med Inform Assoc 2012; 20:184-92. [PMID: 22955495 DOI: 10.1136/amiajnl-2012-000933] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Indexed: 01/06/2023] Open
Abstract
OBJECTIVE Data from electronic healthcare records (EHR) can be used to monitor drug safety, but in order to compare and pool data from different EHR databases, the extraction of potential adverse events must be harmonized. In this paper, we describe the procedure used for harmonizing the extraction from eight European EHR databases of five events of interest deemed to be important in pharmacovigilance: acute myocardial infarction (AMI); acute renal failure (ARF); anaphylactic shock (AS); bullous eruption (BE); and rhabdomyolysis (RHABD). DESIGN The participating databases comprise general practitioners' medical records and claims for hospitalization and other healthcare services. Clinical information is collected using four different disease terminologies and free text in two different languages. The Unified Medical Language System was used to identify concepts and corresponding codes in each terminology. A common database model was used to share and pool data and verify the semantic basis of the event extraction queries. Feedback from the database holders was obtained at various stages to refine the extraction queries. MEASUREMENTS Standardized and age specific incidence rates (IRs) were calculated to facilitate benchmarking and harmonization of event data extraction across the databases. This was an iterative process. RESULTS The study population comprised overall 19 647 445 individuals with a follow-up of 59 929 690 person-years (PYs). Age adjusted IRs for the five events of interest across the databases were as follows: (1) AMI: 60-148/100 000 PYs; (2) ARF: 3-49/100 000 PYs; (3) AS: 2-12/100 000 PYs; (4) BE: 2-17/100 000 PYs; and (5) RHABD: 0.1-8/100 000 PYs. CONCLUSIONS The iterative harmonization process enabled a more homogeneous identification of events across differently structured databases using different coding based algorithms. This workflow can facilitate transparent and reproducible event extractions and understanding of differences between databases.
Affiliation(s)
- Paul Avillach
- LESIM, ISPED, University Bordeaux Segalen, Bordeaux, France.
|
30
|
Cross-terminology mapping challenges: a demonstration using medication terminological systems. J Biomed Inform 2012; 45:613-25. [PMID: 22750536 DOI: 10.1016/j.jbi.2012.06.005] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 07/11/2011] [Revised: 06/12/2012] [Accepted: 06/14/2012] [Indexed: 11/23/2022]
Abstract
Standardized terminological systems for biomedical information have provided considerable benefits to biomedical applications and research. However, practical use of this information often requires mapping across terminological systems, a complex and time-consuming process. This paper demonstrates the complexity and challenges of mapping across terminological systems in the context of medication information. It provides a review of medication terminological systems and their linkages, then describes a case study in which we mapped proprietary medication codes from an electronic health record to SNOMED CT and the UMLS Metathesaurus. The goal was to create a polyhierarchical classification system for querying an i2b2 clinical data warehouse. We found that three methods were required to accurately map the majority of actively prescribed medications. Only 62.5% of source medication codes could be mapped automatically. The remaining codes were mapped using a combination of semi-automated string comparison with expert selection, and a completely manual approach. Compound drugs were especially difficult to map: only 7.5% could be mapped using the automatic method. General challenges to mapping across terminological systems include (1) the availability of up-to-date information to assess the suitability of a given terminological system for a particular use case, and to assess the quality and completeness of cross-terminology links; (2) the difficulty of correctly using complex, rapidly evolving, modern terminologies; (3) the time and effort required to complete and evaluate the mapping; (4) the need to address differences in granularity between the source and target terminologies; and (5) the need to continuously update the mapping as terminological systems evolve.
|
31
|
Lependu P, Iyer SV, Fairon C, Shah NH. Annotation Analysis for Testing Drug Safety Signals using Unstructured Clinical Notes. J Biomed Semantics 2012; 3 Suppl 1:S5. [PMID: 22541596 PMCID: PMC3337270 DOI: 10.1186/2041-1480-3-s1-s5] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Indexed: 01/05/2023] Open
Abstract
Background The electronic surveillance for adverse drug events is largely based upon the analysis of coded data from reporting systems. Yet, the vast majority of electronic health data lies embedded within the free text of clinical notes and is not gathered into centralized repositories. With the increasing access to large volumes of electronic medical data—in particular the clinical notes—it may be possible to computationally encode and to test drug safety signals in an active manner. Results We describe the application of simple annotation tools on clinical text and the mining of the resulting annotations to compute the risk of getting a myocardial infarction for patients with rheumatoid arthritis that take Vioxx. Our analysis clearly reveals elevated risks for myocardial infarction in rheumatoid arthritis patients taking Vioxx (odds ratio 2.06) before 2005. Conclusions Our results show that it is possible to apply annotation analysis methods for testing hypotheses about drug safety using electronic medical records.
Affiliation(s)
- Paea Lependu
- Stanford Center for Biomedical Informatics Research, Stanford University, USA.
|
32
|
Liu Y, LePendu P, Iyer S, Shah NH. Using temporal patterns in medical records to discern adverse drug events from indications. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2012; 2012:47-56. [PMID: 22779050 PMCID: PMC3392062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
Researchers estimate that electronic health record systems record roughly 2 million ambulatory adverse drug events and that patients suffer from adverse drug events in roughly 30% of hospital stays. Some have recently used structured databases of patient medical records and health insurance claims, going beyond the current paradigm of using spontaneous reporting systems like AERS, to detect drug-safety signals. However, most efforts do not use the free text from clinical notes in monitoring for drug-safety signals. We hypothesize that drug-disease co-occurrences, extracted from ontology-based annotations of the clinical notes, can be examined for statistical enrichment and used for drug safety surveillance. When analyzing such co-occurrences of drugs and diseases, one major challenge is to differentiate whether the disease in a drug-disease pair represents an indication or an adverse event. We demonstrate that it is possible to make this distinction by combining the frequency distribution of the drug, the disease, and the drug-disease pair as well as the temporal ordering of the drugs and diseases in each pair across more than one million patients.
|
33
|
Detection of Adverse Drug Reaction Signals Using an Electronic Health Records Database: Comparison of the Laboratory Extreme Abnormality Ratio (CLEAR) Algorithm. Clin Pharmacol Ther 2012; 91:467-74. [PMID: 22237257 DOI: 10.1038/clpt.2011.248] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 11/08/2022]
|
34
|
Shah NH. Translational bioinformatics embraces big data. Yearb Med Inform 2012; 7:130-134. [PMID: 22890354 PMCID: PMC4370941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023] Open
Abstract
We review the latest trends and major developments in translational bioinformatics in the year 2011-2012. Our emphasis is on highlighting the key events in the field and pointing at promising research areas for the future. The key take-home points are: • Translational informatics is ready to revolutionize human health and healthcare using large-scale measurements on individuals. • Data-centric approaches that compute on massive amounts of data (often called "Big Data") to discover patterns and to make clinically relevant predictions will gain adoption. • Research that bridges the latest multimodal measurement technologies with large amounts of electronic healthcare data is increasing; and is where new breakthroughs will occur.
Affiliation(s)
- N H Shah
- Stanford University School of Medicine, 1265 Welch Road, Room X-229, Stanford, CA 94305, USA.
|
35
|
Anderson HD, Pace WD, Libby AM, West DR, Valuck RJ. Rates of 5 common antidepressant side effects among new adult and adolescent cases of depression: a retrospective US claims study. Clin Ther 2011; 34:113-23. [PMID: 22177545 DOI: 10.1016/j.clinthera.2011.11.024] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Accepted: 11/16/2011] [Indexed: 10/14/2022]
Abstract
BACKGROUND Antidepressants are the first-line treatment for depression, yet medication-related side effects may be associated with antidepressant discontinuation before reaching a period of exposure believed to result in effectiveness. There is a gap in knowledge of the prevalence of side effects across commonly prescribed antidepressants and the effect of the type of antidepressant on the likelihood of side effects in real-world clinical practice. OBJECTIVE The aim of this study was to estimate and compare the prevalence of headaches, nausea or vomiting, agitation, sedation, and sexual dysfunction among patients diagnosed with depression who initiated monotherapy across different classes of antidepressants and to estimate the effect of the type of antidepressant on the likelihood of each of the 5 side effects. METHODS A retrospective cohort of patients aged ≥13 who were newly diagnosed with depression and began antidepressant monotherapy was created using LifeLink managed care claims from 1998 to 2008. Antidepressant groups included selective serotonin reuptake inhibitors (SSRIs), serotonin-norepinephrine reuptake inhibitors (SNRIs), tricyclic antidepressants (TCAs), monoamine oxidase inhibitors (MAOIs), bupropion, phenylpiperazine, and tetracyclic antidepressants. Prevalence of headache, nausea or vomiting, agitation, sedation, and sexual dysfunction were compared across antidepressant groups. Propensity-adjusted Cox proportional hazards regression was used to estimate the likelihood of each of the 5 side effects for each antidepressant group compared with SSRIs, adjusted for demographic, clinical, and treatment characteristics. RESULTS The study cohort included 40,017 patients (3617 adolescents, aged 13-18 years, and 36,400 adults, aged ≥19 years; mean age = 45 years; 67% female) with a new episode of depression who were initiated on antidepressant monotherapy within 30 days of diagnosis (SSRI [66%], bupropion [14%], SNRI [12%], other [8%]). 
The most common side effects were headache (up to 17/1000 person-months of therapy in adults and adolescents) and nausea (up to 7.2/1000 in adults, 9.3/1000 in adolescents). Relative to adults receiving SSRIs, adults receiving SNRIs had a higher risk of nausea (hazard ratio [HR] = 1.26; 95% CI, 1.05-1.51). Adults (HR = 0.78; 95% CI, 0.62-0.96) and adolescents (HR = 0.43; 95% CI, 0.21-0.87) taking bupropion were less likely to experience headaches compared with adults and adolescents, respectively, taking an SSRI. Adolescents receiving a tetracyclic were more likely to experience headaches than adolescents receiving an SSRI (HR = 3.16; 95% CI, 1.13-8.84). CONCLUSIONS Prevalence and risk of the 5 side effects varied across types of antidepressants for both adults and adolescents. Results from this study were consistent with prior clinical trials, suggesting that variation in side effect profiles exists in a more generalized managed care population.
Affiliation(s)
- Heather D Anderson
- University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences, Denver, CO, USA
|
36
|
Lau EC, Mowat FS, Kelsh MA, Legg JC, Engel-Nitz NM, Watson HN, Collins HL, Nordyke RJ, Whyte JL. Use of electronic medical records (EMR) for oncology outcomes research: assessing the comparability of EMR information to patient registry and health claims data. Clin Epidemiol 2011; 3:259-72. [PMID: 22135501 PMCID: PMC3224632 DOI: 10.2147/clep.s23690] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 11/27/2022] Open
Abstract
Electronic medical records (EMRs) are used increasingly for research in clinical oncology, epidemiology, and comparative effectiveness research (CER). OBJECTIVE To assess the utility of using EMR data in population-based cancer research by comparing a database of EMRs from community oncology clinics against Surveillance Epidemiology and End Results (SEER) cancer registry data and two claims databases (Medicare and commercial claims). STUDY DESIGN AND SETTING Demographic, clinical, and treatment patterns in the EMR, SEER, Medicare, and commercial claims data were compared using six tumor sites: breast, lung/bronchus, head/neck, colorectal, prostate, and non-Hodgkin's lymphoma (NHL). We identified various challenges in data standardization and selection of appropriate statistical procedures. We describe the patient and clinic inclusion criteria, treatment definitions, and consideration of the administrative and clinical purposes of the EMR, registry, and claims data to address these challenges. RESULTS Sex and 10-year age distributions of patient populations for each tumor site were generally similar across the data sets. We observed several differences in racial composition and treatment patterns, and modest differences in distribution of tumor site. CONCLUSION Our experience with an oncology EMR database identified several factors that must be considered when using EMRs for research purposes or generalizing results to the US cancer population. These factors were related primarily to evaluation of treatment patterns, including evaluation of stage, geographic location, race, and specialization of the medical facilities. While many specialty EMRs may not provide the breadth of data on medical care, as found in comprehensive claims databases and EMR systems, they can provide detailed clinical data not found in claims that are extremely important in conducting epidemiologic and outcomes research.
Affiliation(s)
- Robert J Nordyke
- PriceSpective LLC, El Segundo, CA, USA
- Department of Health Services, UCLA School of Public Health, Los Angeles, CA, USA
|