1
Neyarapally GA, Wu L, Xu J, Zhou EH, Dang O, Lee J, Mehta D, Vaughn RD, Pinnow E, Fang H. Description and Validation of a Novel AI Tool, LabelComp, for the Identification of Adverse Event Changes in FDA Labeling. Drug Saf 2024:10.1007/s40264-024-01468-8. [PMID: 39085589 DOI: 10.1007/s40264-024-01468-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/15/2024] [Indexed: 08/02/2024]
Abstract
INTRODUCTION: The accurate identification and timely updating of adverse reactions in drug labeling are crucial for patient safety and effective drug use. Postmarketing surveillance plays a pivotal role in identifying previously undetected adverse events (AEs) that emerge when a drug is used in broader and more diverse patient populations. However, traditional methods of updating drug labeling with new AE information have been manual, time-consuming, and error-prone. This paper introduces the LabelComp tool, an innovative artificial intelligence (AI) tool designed to enhance the efficiency and accuracy of postmarketing drug safety surveillance. Utilizing a combination of text analytics and a trained Bidirectional Encoder Representations from Transformers (BERT) model, the LabelComp tool automatically identifies changes in AE terms from updated drug labeling documents. OBJECTIVE: Our objective was to create and validate an AI tool with high accuracy that could enable researchers and FDA reviewers to efficiently identify safety-related drug labeling changes. RESULTS: Our validation study of 87 drug labeling PDF pairs demonstrates the tool's high accuracy, with F1 scores of overall performance ranging from 0.795 to 0.936 across different evaluation tiers and a recall of at least 0.997 with only one missed AE out of 483 total AEs detected, indicating the tool's efficacy in identifying new AEs. CONCLUSION: The LabelComp tool can support drug safety surveillance and inform regulatory decision-making. The publication of this tool also aims to encourage further community-driven enhancements, aligning with broader interests in applying AI to advance regulatory science and public health.
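The recall figure quoted in the results can be checked with simple arithmetic. The sketch below assumes the standard definition, recall = TP / (TP + FN), and covers two possible readings of the abstract's wording; the function name and both readings are illustrative assumptions, not details taken from the paper.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Standard recall: the fraction of true items that were found."""
    return true_positives / (true_positives + false_negatives)


# Reading A (assumption): 483 true AEs in total, one of which was missed.
recall_a = recall(true_positives=482, false_negatives=1)

# Reading B (assumption): 483 AEs detected, plus one additional missed AE.
recall_b = recall(true_positives=483, false_negatives=1)

# Print both values so they can be compared with the reported lower bound.
print(f"reading A: {recall_a:.4f}, reading B: {recall_b:.4f}")
```

Under either reading the computed value stays above the reported lower bound of 0.997, so the ambiguity in the wording does not change the conclusion.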
Affiliation(s)
- George A Neyarapally
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research (CDER), FDA, Silver Spring, MD, USA.
- Leihong Wu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research (NCTR), US Food and Drug Administration (FDA), Jefferson, AR, USA
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research (NCTR), US Food and Drug Administration (FDA), Jefferson, AR, USA
- Esther H Zhou
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research (CDER), FDA, Silver Spring, MD, USA
- Oanh Dang
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research (CDER), FDA, Silver Spring, MD, USA
- Joann Lee
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research (CDER), FDA, Silver Spring, MD, USA
- Dharmang Mehta
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research (CDER), FDA, Silver Spring, MD, USA
- Rochelle D Vaughn
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research (CDER), FDA, Silver Spring, MD, USA
- Ellen Pinnow
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research (CDER), FDA, Silver Spring, MD, USA
- Hong Fang
- Office of Scientific Coordination, National Center for Toxicological Research (NCTR), FDA, Jefferson, AR, USA
2
Wu L, Gray M, Dang O, Xu J, Fang H, Tong W. RxBERT: Enhancing drug labeling text mining and analysis with AI language modeling. Exp Biol Med (Maywood) 2023; 248:1937-1943. [PMID: 38166420 PMCID: PMC10798181 DOI: 10.1177/15353702231220669] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/30/2023] [Accepted: 11/02/2023] [Indexed: 01/04/2024]
Abstract
The US drug labeling document contains essential information on drug efficacy and safety, making it a crucial regulatory resource for Food and Drug Administration (FDA) drug reviewers. Due to its extensive volume and the presence of free text, conventional text mining analyses have encountered challenges in processing these data. Recent advances in artificial intelligence (AI) for natural language processing (NLP) have provided an unprecedented opportunity to identify key information from drug labeling, thereby enhancing safety reviews and support for regulatory decisions. We developed RxBERT, a Bidirectional Encoder Representations from Transformers (BERT) model pretrained on FDA human prescription drug labeling documents for an enhanced application of drug labeling documents in both research and drug review. RxBERT was derived from BioBERT with further training on human prescription drug labeling documents. RxBERT was demonstrated in several tasks using regulatory datasets, including the National Institute of Standards and Technology Text Analysis Conference dataset (NIST TAC dataset), the FDA Adverse Drug Event Evaluation Dataset (ADE Eval dataset), and the classification of texts from submission packages into labeling sections (US Drug Labeling dataset). RxBERT reached an F1-score of 86.5 in both the TAC and ADE Eval classification tasks and a prediction accuracy of 87% for the US Drug Labeling dataset. Overall, RxBERT was shown to be competitive with, or to outperform, other NLP approaches such as BERT and BioBERT. In summary, we developed RxBERT, a transformer-based model specific to drug labeling that outperformed the original BERT model. RxBERT has the potential to assist research scientists and FDA reviewers in better processing and utilizing drug labeling information toward the advancement of drug effectiveness and safety for public health. This proof-of-concept study also demonstrated a potential pathway to customized large language models (LLMs) tailored to sensitive regulatory documents for internal application.
Affiliation(s)
- Leihong Wu
- Division of Bioinformatics and Biostatistics, FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
- Magnus Gray
- Division of Bioinformatics and Biostatistics, FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
- Oanh Dang
- Office of Surveillance and Epidemiology, FDA Center for Drug Evaluation and Research, Silver Spring, MD 20993, USA
- Joshua Xu
- Division of Bioinformatics and Biostatistics, FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
- Hong Fang
- Office of Scientific Coordination, FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
- Weida Tong
- Division of Bioinformatics and Biostatistics, FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
3
Niazi SK. The Coming of Age of AI/ML in Drug Discovery, Development, Clinical Testing, and Manufacturing: The FDA Perspectives. Drug Des Devel Ther 2023; 17:2691-2725. [PMID: 37701048 PMCID: PMC10493153 DOI: 10.2147/dddt.s424991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 08/24/2023] [Indexed: 09/14/2023]
Abstract
Artificial intelligence (AI) and machine learning (ML) represent significant advancements in computing, building on technologies that humanity has developed over millions of years, from the abacus to quantum computers. These tools have reached a pivotal moment in their development. In 2021 alone, the U.S. Food and Drug Administration (FDA) received over 100 product registration submissions that heavily relied on AI/ML for applications such as monitoring and improving human performance in compiling dossiers. To ensure the safe and effective use of AI/ML in drug discovery and manufacturing, the FDA and numerous other U.S. federal agencies have issued continuously updated, stringent guidelines. Intriguingly, these guidelines are often generated or updated with the aid of AI/ML tools themselves. The overarching goal is to expedite drug discovery, enhance the safety profiles of existing drugs, introduce novel treatment modalities, and improve manufacturing compliance and robustness. Recent FDA publications offer an encouraging outlook on the potential of these tools, emphasizing the need for their careful deployment. This has expanded market opportunities for retraining personnel handling these technologies and enabled innovative applications in emerging therapies such as gene editing, CRISPR-Cas9, CAR-T cells, mRNA-based treatments, and personalized medicine. In summary, the maturation of AI/ML technologies is a testament to human ingenuity. Far from being autonomous entities, these are tools created by and for humans designed to solve complex problems now and in the future. This paper aims to present the status of these technologies, along with examples of their present and future applications.
4
Humbert-Droz M, Corley J, Tamang S, Gevaert O. Development and validation of MedDRA Tagger: a tool for extraction and structuring medical information from clinical notes. medRxiv 2022:2022.12.14.22283470. [PMID: 36561189 PMCID: PMC9774225 DOI: 10.1101/2022.12.14.22283470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
Rapid and automated extraction of clinical information from patients' notes is a desirable though difficult task. Natural language processing (NLP) and machine learning have great potential to automate and accelerate such applications, but developing such models can require a large amount of labeled clinical text, which can be a slow and laborious process. To address this gap, we propose the MedDRA tagger, a fast annotation tool that makes use of industrial level libraries such as spaCy, biomedical ontologies and weak supervision to annotate and extract clinical concepts at scale. The tool can be used to annotate clinical text and obtain labels for training machine learning models and further refine the clinical concept extraction performance, or to extract clinical concepts for observational study purposes. To demonstrate the usability and versatility of our tool, we present three different use cases: we use the tagger to determine patients with a primary brain cancer diagnosis, we show evidence of rising mental health symptoms at the population level and our last use case shows the evolution of COVID-19 symptomatology throughout three waves between February 2020 and October 2021. The validation of our tool showed good performance on both specific annotations from our development set (F1 score 0.81) and open source annotated data set (F1 score 0.79). We successfully demonstrate the versatility of our pipeline with three different use cases. Finally, we note that the modular nature of our tool allows for a straightforward adaptation to another biomedical ontology. We also show that our tool is independent of EHR system, and as such generalizable.
Affiliation(s)
- Marie Humbert-Droz
- Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA
- Suzanne Tamang
- Department of Biomedical Data Science, Stanford University, Stanford, CA
- Olivier Gevaert
- Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA
- Department of Biomedical Data Science, Stanford University, Stanford, CA
5
Gonzalez-Hernandez G, Krallinger M, Muñoz M, Rodriguez-Esteban R, Uzuner Ö, Hirschman L. Challenges and opportunities for mining adverse drug reactions: perspectives from pharma, regulatory agencies, healthcare providers and consumers. Database (Oxford) 2022; 2022:baac071. [PMID: 36050787 PMCID: PMC9436770 DOI: 10.1093/database/baac071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 07/08/2022] [Accepted: 08/25/2022] [Indexed: 11/17/2022]
Abstract
Monitoring drug safety is a central concern throughout the drug life cycle. Information about toxicity and adverse events is generated at every stage of this life cycle, and stakeholders have a strong interest in applying text mining and artificial intelligence (AI) methods to manage the ever-increasing volume of this information. Recognizing the importance of these applications and the role of challenge evaluations to drive progress in text mining, the organizers of BioCreative VII (Critical Assessment of Information Extraction in Biology) convened a panel of experts to explore 'Challenges in Mining Drug Adverse Reactions'. This article is an outgrowth of the panel; each panelist has highlighted specific text mining application(s), based on their research and their experiences in organizing text mining challenge evaluations. While these highlighted applications only sample the complexity of this problem space, they reveal both opportunities and challenges for text mining to aid in the complex process of drug discovery, testing, marketing and post-market surveillance. Stakeholders are eager to embrace natural language processing and AI tools to help in this process, provided that these tools can be demonstrated to add value to stakeholder workflows. This creates an opportunity for the BioCreative community to work in partnership with regulatory agencies, pharma and the text mining community to identify next steps for future challenge evaluations.
Affiliation(s)
- Graciela Gonzalez-Hernandez
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., West Hollywood, CA 90069, USA
- Martin Krallinger
- Life Sciences—Text Mining, Barcelona Supercomputing Center, Plaça Eusebi Güell, 1-3, Barcelona 08034, Spain
- Monica Muñoz
- Division of Pharmacovigilance, Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, FDA, 10903 New Hampshire Ave, Silver Spring, MD 20993, USA
- Raul Rodriguez-Esteban
- Roche Innovation Center Basel, Roche Pharmaceuticals, Grenzacherstrasse 124, Basel 4070, Switzerland
- Özlem Uzuner
- Information Sciences and Technology, George Mason University, 4400 University Dr, Fairfax, VA 22030, USA
- Lynette Hirschman
- MITRE Labs, The MITRE Corporation, 202 Burlington Rd., Bedford, MA 01730, USA
6
Utilizing Deep Learning for Detecting Adverse Drug Events in Structured and Unstructured Regulatory Drug Data Sets. Pharmaceut Med 2022; 36:307-317. [DOI: 10.1007/s40290-022-00434-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/01/2022] [Indexed: 10/16/2022]
7
Multilabel classification of medical concepts for patient clinical profile identification. Artif Intell Med 2022; 128:102311. [DOI: 10.1016/j.artmed.2022.102311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/09/2021] [Revised: 04/21/2022] [Accepted: 04/22/2022] [Indexed: 11/18/2022]
8
Ball R, Dal Pan G. "Artificial Intelligence" for Pharmacovigilance: Ready for Prime Time? Drug Saf 2022; 45:429-438. [PMID: 35579808 PMCID: PMC9112277 DOI: 10.1007/s40264-022-01157-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Accepted: 02/10/2022] [Indexed: 01/28/2023]
Abstract
There is great interest in the application of 'artificial intelligence' (AI) to pharmacovigilance (PV). Although US FDA is broadly exploring the use of AI for PV, we focus on the application of AI to the processing and evaluation of Individual Case Safety Reports (ICSRs) submitted to the FDA Adverse Event Reporting System (FAERS). We describe a general framework for considering the readiness of AI for PV, followed by some examples of the application of AI to ICSR processing and evaluation in industry and FDA. We conclude that AI can usefully be applied to some aspects of ICSR processing and evaluation, but the performance of current AI algorithms requires a 'human-in-the-loop' to ensure good quality. We identify outstanding scientific and policy issues to be addressed before the full potential of AI can be exploited for ICSR processing and evaluation, including approaches to quality assurance of 'human-in-the-loop' AI systems, large-scale, publicly available training datasets, a well-defined and computable 'cognitive framework', a formal sociotechnical framework for applying AI to PV, and development of best practices for applying AI to PV. Practical experience with stepwise implementation of AI for ICSR processing and evaluation will likely provide important lessons that will inform the necessary policy and regulatory framework to facilitate widespread adoption and provide a foundation for further development of AI approaches to other aspects of PV.
Affiliation(s)
- Robert Ball
- US Food and Drug Administration, Center for Drug Evaluation and Research, Office of Surveillance and Epidemiology, Silver Spring, MD, USA
- Gerald Dal Pan
- US Food and Drug Administration, Center for Drug Evaluation and Research, Office of Surveillance and Epidemiology, Silver Spring, MD, USA
9
Wang J, Ren Y, Zhang Z, Xu H, Zhang Y. From Tokenization to Self-Supervision: Building a High-Performance Information Extraction System for Chemical Reactions in Patents. Front Res Metr Anal 2021; 6:691105. [PMID: 35005421 PMCID: PMC8727901 DOI: 10.3389/frma.2021.691105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2021] [Accepted: 11/02/2021] [Indexed: 11/28/2022]
Abstract
Chemical reactions and experimental conditions are fundamental information for chemical research and pharmaceutical applications. However, the latest information on chemical reactions is usually embedded in the free text of patents. The rapidly accumulating body of chemical patents calls for automatic tools based on natural language processing (NLP) techniques for efficient and accurate information extraction. This work describes the participation of the Melax Tech team in the CLEF 2020-ChEMU Task of Chemical Reaction Extraction from Patent. The task consisted of two subtasks: (1) named entity recognition to identify compounds and different semantic roles in the chemical reaction and (2) event extraction to identify event triggers of chemical reaction and their relations with the semantic roles recognized in subtask 1. To build an end-to-end system with high performance, multiple strategies tailored to chemical patents were applied and evaluated, ranging from optimizing the tokenization and pre-training patent language models based on self-supervision to domain knowledge-based rules. Our hybrid approaches combining different strategies achieved state-of-the-art results in both subtasks, with the top-ranked F1 of 0.957 for entity recognition and the top-ranked F1 of 0.9536 for event extraction, indicating that the proposed approaches are promising.
Affiliation(s)
- Jingqi Wang
- Melax Technologies, Inc., Houston, TX, United States
- Yuankai Ren
- School of Medicine, Nantong University, Nantong, China
- Zhi Zhang
- School of Medicine, Nantong University, Nantong, China
- Hua Xu
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Yaoyun Zhang
- Melax Technologies, Inc., Houston, TX, United States
10
An attentive joint model with transformer-based weighted graph convolutional network for extracting adverse drug event relation. J Biomed Inform 2021; 125:103968. [PMID: 34871807 DOI: 10.1016/j.jbi.2021.103968] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/29/2021] [Revised: 11/25/2021] [Accepted: 11/27/2021] [Indexed: 11/21/2022]
Abstract
Adverse drug event (ADE) relation extraction is a crucial task for drug safety surveillance which aims to discover potential relations between ADE mentions from unstructured medical texts. To date, graph convolutional networks (GCNs) have been the state-of-the-art solution for improving relation extraction performance. However, there are many challenging issues that should be addressed. Among these, the syntactic information is not fully exploited by GCN-based methods, especially the diversified dependency edges. Moreover, these methods fail to effectively extract complex relations that include nested, discontinuous and overlapping mentions. In addition, the task is primarily regarded as a classification problem where each candidate relation is treated independently, which neglects the interactions with other relations. To deal with these issues, in this paper, we propose an attentive joint model with transformer-based weighted GCN for extracting ADE relations, called ADERel. Firstly, the ADERel system formulates the ADE relation extraction task as an N-level sequence labelling problem so as to model the complex relations at different levels and capture greater interaction between relations. Then, it exploits our neural joint model to process the N-level sequences jointly. The joint model leverages the contextual and structural information by adopting a shared representation that combines a bidirectional encoder representation from transformers (BERT) and our proposed weighted GCN (WGCN). The latter assigns a score to each dependency edge within a sentence so as to capture rich syntactic features and determine the most influential edges for extracting ADE relations. Finally, the system employs multi-head attention to exchange boundary knowledge across levels. We evaluate ADERel on two benchmark datasets from the TAC 2017 and n2c2 2018 shared tasks. The experimental results show that ADERel is superior in performance compared with several state-of-the-art methods. The results also demonstrate that incorporating a transformer model with WGCN makes the proposed system more effective for extracting various types of ADE relations. The evaluations further highlight that ADERel takes advantage of joint learning, showing its effectiveness in recognizing complex relations.