1
He T, Belouali A, Patricoski J, Lehmann H, Ball R, Anagnostou V, Kreimeyer K, Botsis T. Trends and opportunities in computable clinical phenotyping: A scoping review. J Biomed Inform 2023; 140:104335. [PMID: 36933631 DOI: 10.1016/j.jbi.2023.104335] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/09/2022] [Revised: 03/07/2023] [Accepted: 03/09/2023] [Indexed: 03/18/2023]
Abstract
Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1% (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4% (N = 77) of all studies; however, only 25.9% (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant method, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in the real setting are essential opportunities for future work. There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.
Affiliation(s)
- Ting He
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Anas Belouali
- Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jessica Patricoski
- Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Harold Lehmann
- Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Robert Ball
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US FDA, Silver Spring, MD, USA
- Valsamo Anagnostou
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kory Kreimeyer
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Taxiarchis Botsis
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
2
Eldredge CE, Pracht E, Gallagher J, Tsalatsanis A. Direct Versus Indirect Query Performance of ICD-9/-10 Coding to Identify Anaphylaxis. J Allergy Clin Immunol Pract 2023; 11:1190-1197.e2. [PMID: 36621609 DOI: 10.1016/j.jaip.2022.12.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 12/16/2022] [Accepted: 12/19/2022] [Indexed: 01/07/2023]
Abstract
BACKGROUND Anaphylaxis is an often underdiagnosed, severe allergic event for which epidemiological data are sporadic. Researchers have leveraged administrative and claims data algorithms to study large databases of anaphylactic events; however, little longitudinal data analysis is available after transition to the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM). OBJECTIVE Study longitudinal trends in anaphylaxis incidence using direct and indirect query methods. METHODS Emergency department (ED) and inpatient data were analyzed from a large state health care administration database from 2011 to 2020. Incidence was calculated using direct queries of anaphylaxis ICD-9-CM and ICD-10-CM codes and indirect queries using a symptom-based ICD-9-CM algorithm and forward-mapped ICD-10-CM version to identify undiagnosed anaphylaxis episodes and to assess algorithm performance at the population level. RESULTS An average of 2.4 million inpatient and 7.5 million ED observations/y were analyzed. Using the direct query method, annual ED anaphylaxis cases increased steadily from 1,454 (2011) to 4,029 (2019) then declined to 3,341 in 2020 during the coronavirus disease 2019 (COVID-19) pandemic. In contrast, inpatient cases remained relatively steady, with a slight decline after 2015 during the ICD version transition, until a significant drop occurred in 2020. Using the indirect queries, anaphylaxis cases increased markedly after the ICD transition year, especially involving drug-related anaphylaxis. CONCLUSIONS Nontypical drug associations with anaphylaxis episodes using the ICD-10-CM version of the algorithm suggest poor performance with drug-related codes. Further, the increased granularity of ICD-10-CM identified potential limitations of a previously validated symptom-based ICD-9-CM algorithm used to detect undiagnosed cases.
Affiliation(s)
- Etienne Pracht
- College of Public Health, University of South Florida, Tampa, Fla
- Joel Gallagher
- Cone Health, University of North Carolina-Chapel Hill, Chapel Hill, NC
3
Carrell DS, Gruber S, Floyd JS, Bann MA, Cushing-Haugen KL, Johnson RL, Graham V, Cronkite DJ, Hazlehurst BL, Felcher AH, Bejan CA, Kennedy A, Shinde MU, Karami S, Ma Y, Stojanovic D, Zhao Y, Ball R, Nelson JC. Improving Methods of Identifying Anaphylaxis for Medical Product Safety Surveillance Using Natural Language Processing and Machine Learning. Am J Epidemiol 2022; 192:283-295. [PMID: 36331289 PMCID: PMC9896464 DOI: 10.1093/aje/kwac182] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/11/2021] [Revised: 07/06/2022] [Accepted: 10/11/2022] [Indexed: 11/06/2022] Open
Abstract
We sought to determine whether machine learning and natural language processing (NLP) applied to electronic medical records could improve performance of automated health-care claims-based algorithms to identify anaphylaxis events using data on 516 patients with outpatient, emergency department, or inpatient anaphylaxis diagnosis codes during 2015-2019 in 2 integrated health-care institutions in the Northwest United States. We used one site's manually reviewed gold-standard outcomes data for model development and the other's for external validation based on cross-validated area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and sensitivity. In the development site 154 (64%) of 239 potential events met adjudication criteria for anaphylaxis compared with 180 (65%) of 277 in the validation site. Logistic regression models using only structured claims data achieved a cross-validated AUC of 0.58 (95% CI: 0.54, 0.63). Machine learning improved cross-validated AUC to 0.62 (0.58, 0.66); incorporating NLP-derived covariates further increased cross-validated AUCs to 0.70 (0.66, 0.75) in development and 0.67 (0.63, 0.71) in external validation data. A classification threshold with cross-validated PPV of 79% and cross-validated sensitivity of 66% in development data had cross-validated PPV of 78% and cross-validated sensitivity of 56% in external data. Machine learning and NLP-derived data improved identification of validated anaphylaxis events.
Affiliation(s)
- David S Carrell
- Correspondence to Dr. David Carrell, Kaiser Permanente Washington Health Research Institute, 1730 Minor Avenue, Suite 1600, Seattle, WA 98101
4
Binkheder S, Wu HY, Quinney SK, Zhang S, Zitu MM, Chiang CW, Wang L, Jones J, Li L. PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature. J Biomed Semantics 2022; 13:17. [PMID: 35690873 PMCID: PMC9188713 DOI: 10.1186/s13326-022-00272-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/07/2019] [Accepted: 05/18/2022] [Indexed: 12/28/2022] Open
Abstract
Background Adverse events induced by drug-drug interactions are a major concern in the United States. Current research is moving toward using electronic health record (EHR) data, including for adverse drug events discovery. One of the first steps in EHR-based studies is to define a phenotype for establishing a cohort of patients. However, phenotype definitions are not readily available for all phenotypes. One of the first steps of developing automated text mining tools is building a corpus. Therefore, this study aimed to develop annotation guidelines and a gold standard corpus to facilitate building future automated approaches for mining phenotype definitions contained in the literature. Furthermore, our aim is to improve the understanding of how these published phenotype definitions are presented in the literature and how we annotate them for future text mining tasks. Results Two annotators manually annotated the corpus on a sentence-level for the presence of evidence for phenotype definitions. Three major categories (inclusion, intermediate, and exclusion) with a total of ten dimensions were proposed characterizing major contextual patterns and cues for presenting phenotype definitions in published literature. The developed annotation guidelines were used to annotate the corpus that contained 3971 sentences: 1923 out of 3971 (48.4%) for the inclusion category, 1851 out of 3971 (46.6%) for the intermediate category, and 2273 out of 3971 (57.2%) for exclusion category. The highest number of annotated sentences was 1449 out of 3971 (36.5%) for the “Biomedical & Procedure” dimension. The lowest number of annotated sentences was 49 out of 3971 (1.2%) for “The use of NLP”. The overall percent inter-annotator agreement was 97.8%. Percent and Kappa statistics also showed high inter-annotator agreement across all dimensions. 
Conclusions The corpus and annotation guidelines can serve as a foundational informatics approach for annotating and mining phenotype definitions in literature, and can be used later for text mining applications. Supplementary Information The online version contains supplementary material available at 10.1186/s13326-022-00272-6.
Affiliation(s)
- Samar Binkheder
- Department of Biohealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, IN, USA; Medical Informatics Unit, Department of Medical Education, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Heng-Yi Wu
- Development Science Informatics, Genentech, South San Francisco, CA, USA
- Sara K Quinney
- Department of Obstetrics and Gynecology, Indiana University School of Medicine, Indianapolis, IN, USA
- Shijun Zhang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Md Muntasir Zitu
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Chien-Wei Chiang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Lei Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Josette Jones
- Department of Biohealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, IN, USA
- Lang Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH 43210, USA
5
Ball R, Dal Pan G. "Artificial Intelligence" for Pharmacovigilance: Ready for Prime Time? Drug Saf 2022; 45:429-438. [PMID: 35579808 PMCID: PMC9112277 DOI: 10.1007/s40264-022-01157-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Accepted: 02/10/2022] [Indexed: 01/28/2023]
Abstract
There is great interest in the application of 'artificial intelligence' (AI) to pharmacovigilance (PV). Although US FDA is broadly exploring the use of AI for PV, we focus on the application of AI to the processing and evaluation of Individual Case Safety Reports (ICSRs) submitted to the FDA Adverse Event Reporting System (FAERS). We describe a general framework for considering the readiness of AI for PV, followed by some examples of the application of AI to ICSR processing and evaluation in industry and FDA. We conclude that AI can usefully be applied to some aspects of ICSR processing and evaluation, but the performance of current AI algorithms requires a 'human-in-the-loop' to ensure good quality. We identify outstanding scientific and policy issues to be addressed before the full potential of AI can be exploited for ICSR processing and evaluation, including approaches to quality assurance of 'human-in-the-loop' AI systems, large-scale, publicly available training datasets, a well-defined and computable 'cognitive framework', a formal sociotechnical framework for applying AI to PV, and development of best practices for applying AI to PV. Practical experience with stepwise implementation of AI for ICSR processing and evaluation will likely provide important lessons that will inform the necessary policy and regulatory framework to facilitate widespread adoption and provide a foundation for further development of AI approaches to other aspects of PV.
Affiliation(s)
- Robert Ball
- US Food and Drug Administration, Center for Drug Evaluation and Research, Office of Surveillance and Epidemiology, Silver Spring, MD, USA
- Gerald Dal Pan
- US Food and Drug Administration, Center for Drug Evaluation and Research, Office of Surveillance and Epidemiology, Silver Spring, MD, USA
6
Celli BR, Fabbri LM, Aaron SD, Agusti A, Brook R, Criner GJ, Franssen FME, Humbert M, Hurst JR, O'Donnell D, Pantoni L, Papi A, Rodriguez-Roisin R, Sethi S, Torres A, Vogelmeier CF, Wedzicha JA. An Updated Definition and Severity Classification of Chronic Obstructive Pulmonary Disease Exacerbations: The Rome Proposal. Am J Respir Crit Care Med 2021; 204:1251-1258. [PMID: 34570991 DOI: 10.1164/rccm.202108-1819pp] [Citation(s) in RCA: 105] [Impact Index Per Article: 35.0] [Indexed: 12/15/2022] Open
Affiliation(s)
- Bartolome R Celli
- Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Leonardo M Fabbri
- Section of Respiratory Medicine, Translational Medicine and for Romagna, University of Ferrara, Ferrara, Italy
- Shawn D Aaron
- The Ottawa Hospital Research Institute, University of Ottawa, Ottawa, Ontario, Canada
- Alvar Agusti
- Universitat de Barcelona, Barcelona, Spain; Institut Clínic Respiratori, Hospital Clínic de Barcelona, Barcelona, Spain; Instituto de Investigaciones Biomédicas August Pi i Sunyer, Barcelona, Spain; Centro de Investigación Biomédica en Red Enfermedades Respiratorias, Madrid, Spain
- Gerard J Criner
- Department of Thoracic Medicine and Surgery, Lewis Katz School of Medicine, Temple University, Philadelphia, Pennsylvania
- Frits M E Franssen
- Department of Research and Education, CIRO, Horn, the Netherlands; Department of Respiratory Medicine, Maastricht University Medical Center, Maastricht, the Netherlands
- Marc Humbert
- Service de Pneumologie et Soins Intensifs Respiratoires, Hôpital Bicêtre, Assistance Publique-Hôpitaux de Paris, Le Kremlin-Bicêtre, France; Université Paris-Saclay and Institut National de la Santé et de la Recherche Médicale, Unité Mixte de Recherche 999, Le Kremlin-Bicêtre, France
- John R Hurst
- UCL Respiratory, University College London, London, United Kingdom
- Denis O'Donnell
- Respiratory Investigation Unit, Queens University and Kingston Health Sciences Centre, Kingston, Ontario, Canada
- Leonardo Pantoni
- "Luigi Sacco" Department of Biomedical and Clinical Sciences, University of Milan, Milan, Italy
- Alberto Papi
- Section of Respiratory Medicine, University of Ferrara, Ferrara, Italy; Emergency Department, St. Anna University Hospital, Ferrara, Italy
- Roberto Rodriguez-Roisin
- Universitat de Barcelona, Barcelona, Spain; Institut Clínic Respiratori, Hospital Clínic de Barcelona, Barcelona, Spain
- Sanjay Sethi
- Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, New York
- Antoni Torres
- Universitat de Barcelona, Barcelona, Spain; Institut Clínic Respiratori, Hospital Clínic de Barcelona, Barcelona, Spain; Instituto de Investigaciones Biomédicas August Pi i Sunyer, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats Acadèmia, Centre d'Investigació Biomèdica en Xarxa de Malalties Respiratòries, Barcelona, Spain
- Claus F Vogelmeier
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, University Medical Center of Giessen and Marburg, Philipps University of Marburg, Member of the German Center for Lung Research (DZL), Marburg, Germany
- Jadwiga A Wedzicha
- Respiratory Division, National Heart and Lung Institute, Imperial College London, London, United Kingdom
7
Feature engineering and machine learning for causality assessment in pharmacovigilance: Lessons learned from application to the FDA Adverse Event Reporting System. Comput Biol Med 2021; 135:104517. [PMID: 34130003 DOI: 10.1016/j.compbiomed.2021.104517] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/05/2021] [Revised: 05/20/2021] [Accepted: 05/21/2021] [Indexed: 10/21/2022]
Abstract
BACKGROUND Our objective was to support the automated classification of Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) reports for their usefulness in assessing the possibility of a causal relationship between a drug product and an adverse event. METHOD We used a data set of 326 redacted FAERS reports that was previously annotated using a modified version of the World Health Organization-Uppsala Monitoring Centre criteria for drug causality assessment by a group of SEs at the FDA and supported a similar study on the classification of reports using supervised machine learning and text engineering methods. We explored many potential features, including the incorporation of natural language processing on report text and information from external data sources, for supervised learning and developed models for predicting the classification status of reports. We then evaluated the models on a larger data set of previously unseen reports. RESULTS The best-performing models achieved recall and F1 scores on both data sets above 0.80 for the identification of assessable reports (i.e. those containing enough information to make an informed causality assessment) and above 0.75 for the identification of reports meeting at least a Possible causality threshold. CONCLUSIONS Causal inference from FAERS reports depends on many components with complex logical relationships that are yet to be made fully computable. Efforts focused on readily addressable tasks, such as quickly eliminating unassessable reports, fit naturally in SE's thought processes to provide real enhancements for FDA workflows.
8
Botsis T, Foster M, Kreimeyer K, Pandey A, Forshee R. Monitoring biomedical literature for post-market safety purposes by analyzing networks of text-based coded information. AMIA Jt Summits Transl Sci Proc 2017; 2017:66-75. [PMID: 28815108 PMCID: PMC5543357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Literature review is critical but time-consuming in the post-market surveillance of medical products. We focused on the safety signal of intussusception after the vaccination of infants with the Rotashield Vaccine in 1999 and retrieved all PubMed abstracts for rotavirus vaccines published after January 1, 1998. We used the Event-based Text-mining of Health Electronic Records system, the MetaMap tool, and the National Center for Biomedical Ontologies Annotator to process the abstracts and generate coded terms stamped with the date of publication. Data were analyzed in the Pattern-based and Advanced Network Analyzer for Clinical Evaluation and Assessment to evaluate the intussusception-related findings before and after the release of the new rotavirus vaccines in 2006. The tight connection of intussusception with the historical signal in the first period and the absence of any safety concern for the new vaccines in the second period were verified. We demonstrated the feasibility for semi-automated solutions that may assist medical reviewers in monitoring biomedical literature.
Affiliation(s)
- Taxiarchis Botsis
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD
- Matthew Foster
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD
- Kory Kreimeyer
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD
- Abhishek Pandey
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD
- Richard Forshee
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD
9
Decision support environment for medical product safety surveillance. J Biomed Inform 2016; 64:354-362. [DOI: 10.1016/j.jbi.2016.07.023] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 02/04/2016] [Revised: 07/22/2016] [Accepted: 07/27/2016] [Indexed: 02/04/2023]
10
Baer B, Nguyen M, Woo EJ, Winiecki S, Scott J, Martin D, Botsis T, Ball R. Can Natural Language Processing Improve the Efficiency of Vaccine Adverse Event Report Review? Methods Inf Med 2015; 55:144-50. [PMID: 26394725 DOI: 10.3414/me14-01-0066] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 06/27/2014] [Accepted: 06/30/2015] [Indexed: 11/09/2022]
Abstract
BACKGROUND Individual case review of spontaneous adverse event (AE) reports remains a cornerstone of medical product safety surveillance for industry and regulators. Previously we developed the Vaccine Adverse Event Text Miner (VaeTM) to offer automated information extraction and potentially accelerate the evaluation of large volumes of unstructured data and facilitate signal detection. OBJECTIVE To assess how the information extraction performed by VaeTM impacts the accuracy of a medical expert's review of the vaccine adverse event report. METHODS The "outcome of interest" (diagnosis, cause of death, second level diagnosis), "onset time," and "alternative explanations" (drug, medical and family history) for the adverse event were extracted from 1000 reports from the Vaccine Adverse Event Reporting System (VAERS) using the VaeTM system. We compared the human interpretation, by medical experts, of the VaeTM extracted data with their interpretation of the traditional full text reports for these three variables. Two experienced clinicians alternately reviewed text miner output and full text. A third clinician scored the match rate using a predefined algorithm; the proportion of matches and 95% confidence intervals (CI) were calculated. Review time per report was analyzed. RESULTS Proportion of matches between the interpretation of the VaeTM extracted data, compared to the interpretation of the full text: 93% for outcome of interest (95% CI: 91-94%) and 78% for alternative explanation (95% CI: 75-81%). Extracted data on the time to onset was used in 14% of cases and was a match in 54% (95% CI: 46-63%) of those cases. When supported by structured time data from reports, the match for time to onset was 79% (95% CI: 76-81%). The extracted text averaged 136 (74%) fewer words, resulting in a mean reduction in review time of 50 (58%) seconds per report. 
CONCLUSION Despite a 74% reduction in words, the clinical conclusion from VaeTM extracted data agreed with the full text in 93% and 78% of reports for the outcome of interest and alternative explanation, respectively. The limited amount of extracted time interval data indicates the need for further development of this feature. VaeTM may improve review efficiency, but further study is needed to determine if this level of agreement is sufficient for routine use.
Affiliation(s)
- B Baer
- Bethany Baer, FDA Center for Biologics Evaluation and Research, 10903 New Hampshire Ave, WO71-1323, Silver Spring, MD 20993-0002, 240-402-8584, USA
11
Duggirala HJ, Tonning JM, Smith E, Bright RA, Baker JD, Ball R, Bell C, Bright-Ponte SJ, Botsis T, Bouri K, Boyer M, Burkhart K, Steven Condrey G, Chen JJ, Chirtel S, Filice RW, Francis H, Jiang H, Levine J, Martin D, Oladipo T, O’Neill R, Palmer LAM, Paredes A, Rochester G, Sholtes D, Szarfman A, Wong HL, Xu Z, Kass-Hout T. Use of data mining at the Food and Drug Administration. J Am Med Inform Assoc 2015. [DOI: 10.1093/jamia/ocv063] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Indexed: 12/13/2022] Open
Abstract
Objectives This article summarizes past and current data mining activities at the United States Food and Drug Administration (FDA).
Target audience We address data miners in all sectors, anyone interested in the safety of products regulated by the FDA (predominantly medical products, food, veterinary products and nutrition, and tobacco products), and those interested in FDA activities.
Scope Topics include routine and developmental data mining activities, short descriptions of mined FDA data, advantages and challenges of data mining at the FDA, and future directions of data mining at the FDA.
Affiliation(s)
- Ella Smith
- Center for Food Safety and Applied Nutrition, FDA
- Robert Ball
- Center for Biologics Evaluation and Research, FDA
- Carlos Bell
- Center for Drug Evaluation and Research, FDA
- Marc Boyer
- Center for Food Safety and Applied Nutrition, FDA
- David Martin
- Center for Biologics Evaluation and Research, FDA
- Zhiheng Xu
- Center for Devices and Radiological Health, FDA
12
Ball R. Perspectives on the future of postmarket vaccine safety surveillance and evaluation. Expert Rev Vaccines 2014; 13:455-62. [DOI: 10.1586/14760584.2014.891941] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/06/2023]