1
Li R, Romano JD, Chen Y, Moore JH. Centralized and Federated Models for the Analysis of Clinical Data. Annu Rev Biomed Data Sci 2024; 7:179-199. [PMID: 38723657 DOI: 10.1146/annurev-biodatasci-122220-115746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/25/2024]
Abstract
The progress of precision medicine research hinges on the gathering and analysis of extensive and diverse clinical datasets. With the continued expansion of modalities, scales, and sources of clinical datasets, it becomes imperative to devise methods for aggregating information from these varied sources to achieve a comprehensive understanding of diseases. In this review, we describe two important approaches for the analysis of diverse clinical datasets, namely the centralized model and federated model. We compare and contrast the strengths and weaknesses inherent in each model and present recent progress in methodologies and their associated challenges. Finally, we present an outlook on the opportunities that both models hold for the future analysis of clinical data.
Affiliation(s)
- Ruowang Li
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Joseph D Romano
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yong Chen
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jason H Moore
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
2
Carrell DS, Floyd JS, Gruber S, Hazlehurst BL, Heagerty PJ, Nelson JC, Williamson BD, Ball R. A general framework for developing computable clinical phenotype algorithms. J Am Med Inform Assoc 2024; 31:1785-1796. [PMID: 38748991 PMCID: PMC11258420 DOI: 10.1093/jamia/ocae121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 05/07/2024] [Accepted: 05/14/2024] [Indexed: 07/20/2024] Open
Abstract
OBJECTIVE To present a general framework providing high-level guidance to developers of computable algorithms for identifying patients with specific clinical conditions (phenotypes) through a variety of approaches, including but not limited to machine learning and natural language processing methods to incorporate rich electronic health record data. MATERIALS AND METHODS Drawing on extensive prior phenotyping experiences and insights derived from 3 algorithm development projects conducted specifically for this purpose, our team with expertise in clinical medicine, statistics, informatics, pharmacoepidemiology, and healthcare data science methods conceptualized stages of development and corresponding sets of principles, strategies, and practical guidelines for improving the algorithm development process. RESULTS We propose 5 stages of algorithm development and corresponding principles, strategies, and guidelines: (1) assessing fitness-for-purpose, (2) creating gold standard data, (3) feature engineering, (4) model development, and (5) model evaluation. DISCUSSION AND CONCLUSION This framework is intended to provide practical guidance and serve as a basis for future elaboration and extension.
Affiliation(s)
- David S Carrell
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
- James S Floyd
  - Department of Medicine, School of Medicine, University of Washington, Seattle, WA 98195, United States
  - Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA 98195, United States
- Susan Gruber
  - Putnam Data Sciences, LLC, Cambridge, MA 02139, United States
- Brian L Hazlehurst
  - Center for Health Research, Kaiser Permanente Northwest, Portland, OR 97227, United States
- Patrick J Heagerty
  - Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA 98195, United States
- Jennifer C Nelson
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
- Brian D Williamson
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
- Robert Ball
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD 20993, United States
3
Bakken S. Standards and frameworks. J Am Med Inform Assoc 2024; 31:1629-1630. [PMID: 39026503 PMCID: PMC11258492 DOI: 10.1093/jamia/ocae163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Indexed: 07/20/2024] Open
Affiliation(s)
- Suzanne Bakken
  - Department of Biomedical Informatics, Data Science Institute, School of Nursing, Columbia University, New York, NY, United States
4
Smith JC, Williamson BD, Cronkite DJ, Park D, Whitaker JM, McLemore MF, Osmanski JT, Winter R, Ramaprasan A, Kelley A, Shea M, Wittayanukorn S, Stojanovic D, Zhao Y, Toh S, Johnson KB, Aronoff DM, Carrell DS. Data-driven automated classification algorithms for acute health conditions: applying PheNorm to COVID-19 disease. J Am Med Inform Assoc 2024; 31:574-582. [PMID: 38109888 PMCID: PMC10873852 DOI: 10.1093/jamia/ocad241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 10/19/2023] [Accepted: 11/27/2023] [Indexed: 12/20/2023] Open
Abstract
OBJECTIVES Automated phenotyping algorithms can reduce development time and operator dependence compared to manually developed algorithms. One such approach, PheNorm, has performed well for identifying chronic health conditions, but its performance for acute conditions is largely unknown. Herein, we implement and evaluate PheNorm applied to symptomatic COVID-19 disease to investigate its potential feasibility for rapid phenotyping of acute health conditions. MATERIALS AND METHODS PheNorm is a general-purpose automated approach to creating computable phenotype algorithms based on natural language processing, machine learning, and (low cost) silver-standard training labels. We applied PheNorm to cohorts of potential COVID-19 patients from 2 institutions and used gold-standard manual chart review data to investigate the impact on performance of alternative feature engineering options and implementing externally trained models without local retraining. RESULTS Models at each institution achieved AUC, sensitivity, and positive predictive value of 0.853, 0.879, 0.851 and 0.804, 0.976, and 0.885, respectively, at quantiles of model-predicted risk that maximize F1. We report performance metrics for all combinations of silver labels, feature engineering options, and models trained internally versus externally. DISCUSSION Phenotyping algorithms developed using PheNorm performed well at both institutions. Performance varied with different silver-standard labels and feature engineering options. Models developed locally at one site also worked well when implemented externally at the other site. CONCLUSION PheNorm models successfully identified an acute health condition, symptomatic COVID-19. The simplicity of the PheNorm approach allows it to be applied at multiple study sites with substantially reduced overhead compared to traditional approaches.
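Note: the abstract above reports sensitivity and positive predictive value at the cutoff of model-predicted risk that maximizes F1. A minimal sketch of that selection step follows, assuming binary gold-standard labels and scanning each observed score as a candidate cutoff. The function names and the toy scores are hypothetical illustrations and are not taken from the PheNorm study.

```python
from typing import Dict, List


def metrics_at_threshold(scores: List[float], labels: List[int], threshold: float) -> Dict[str, float]:
    """Sensitivity, PPV, and F1 when patients with score >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity) if (ppv + sensitivity) else 0.0
    return {"threshold": threshold, "sensitivity": sensitivity, "ppv": ppv, "f1": f1}


def best_f1_threshold(scores: List[float], labels: List[int]) -> Dict[str, float]:
    """Scan every observed score as a cutoff and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    return max((metrics_at_threshold(scores, labels, t) for t in candidates),
               key=lambda m: m["f1"])


# Toy data (hypothetical): predicted risks and chart-review labels for 8 patients.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
best = best_f1_threshold(scores, labels)
print(best)
```

On this toy data the selected cutoff is 0.4, where all four true cases are captured at the cost of one false positive. A real evaluation would also report AUC and use held-out gold-standard data.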
Affiliation(s)
- Joshua C Smith
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Brian D Williamson
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
- David J Cronkite
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
- Daniel Park
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Jill M Whitaker
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Michael F McLemore
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Joshua T Osmanski
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Robert Winter
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Arvind Ramaprasan
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
- Ann Kelley
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
- Mary Shea
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
- Saranrat Wittayanukorn
  - Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD 20903, United States
- Danijela Stojanovic
  - Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD 20903, United States
- Yueqin Zhao
  - Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD 20903, United States
- Sengwee Toh
  - Harvard Pilgrim Health Care Institute, Boston, MA 02215, United States
- Kevin B Johnson
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States
- David M Aronoff
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, United States
- David S Carrell
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
5
Ailawadhi S, Romanus D, Shah S, Fraeman K, Saragoussi D, Buus RM, Nguyen B, Cherepanov D, Lamerato L, Berger A. Development and validation of algorithms for identifying lines of therapy in multiple myeloma using real-world data. Future Oncol 2024. [PMID: 38231002 DOI: 10.2217/fon-2023-0696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2024] Open
Abstract
Aim: To validate algorithms based on electronic health data to identify composition of lines of therapy (LOT) in multiple myeloma (MM). Materials & methods: This study used available electronic health data for selected adults within Henry Ford Health (Michigan, USA) newly diagnosed with MM in 2006-2017. Algorithm performance in this population was verified via chart review. As with prior oncology studies, good performance was defined as positive predictive value (PPV) ≥75%. Results: Accuracy for identifying LOT1 (N = 133) was 85.0%. For the most frequent regimens, accuracy was 92.5-97.7%, PPV 80.6-93.8%, sensitivity 88.2-89.3% and specificity 94.3-99.1%. Algorithm performance decreased in subsequent LOTs, with decreasing sample sizes. Only 19.5% of patients received maintenance therapy during LOT1. Accuracy for identifying maintenance therapy was 85.7%; PPV for the most common maintenance therapy was 73.3%. Conclusion: Algorithms performed well in identifying LOT1 - especially more commonly used regimens - and slightly less well in identifying maintenance therapy therein.
Affiliation(s)
- Sikander Ailawadhi
  - Division of Hematology/Oncology, Department of Medicine, Mayo Clinic, Jacksonville, FL 32224, USA
- Dorothy Romanus
  - Global Evidence & Outcomes, Takeda Development Center Americas, Inc. (TDCA), Lexington, MA 02421, USA
- Surbhi Shah
  - Real-World Evidence, Evidera/PPD (part of Thermo Fisher Scientific), Bethesda, MD 20814, USA
- Kathy Fraeman
  - Real-World Evidence, Evidera/PPD (part of Thermo Fisher Scientific), Bethesda, MD 20814, USA
- Delphine Saragoussi
  - Real-World Evidence, Evidera/PPD (part of Thermo Fisher Scientific), London, W6 8BJ, UK
- Rebecca Morris Buus
  - Epidemiology and Scientific Affairs, Clinical Development Services Division, Evidera/PPD (part of Thermo Fisher Scientific), Bethesda, MD 20814, USA
- Binh Nguyen
  - Medical Science and Strategy, Oncology, PPD (part of Thermo Fisher Scientific), Bethesda, MD 20814, USA
- Dasha Cherepanov
  - Global Evidence & Outcomes, Takeda Development Center Americas, Inc. (TDCA), Lexington, MA 02421, USA
- Ariel Berger
  - Real-World Evidence, Evidera/PPD (part of Thermo Fisher Scientific), Bethesda, MD 20814, USA
6
Mollalo A, Hamidi B, Lenert L, Alekseyenko AV. Application of Spatial Analysis for Electronic Health Records: Characterizing Patient Phenotypes and Emerging Trends. RESEARCH SQUARE 2024:rs.3.rs-3443865. [PMID: 37886509 PMCID: PMC10602163 DOI: 10.21203/rs.3.rs-3443865/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]
Abstract
Background Electronic health records (EHR) commonly contain patient addresses that provide valuable data for geocoding and spatial analysis, enabling more comprehensive descriptions of individual patients for clinical purposes. Despite the widespread use of EHR in clinical decision support and interventions, no systematic review has examined the extent to which spatial analysis is used to characterize patient phenotypes. Objective This study reviews advanced spatial analyses that employed individual-level health data from EHR within the US to characterize patient phenotypes. Methods We systematically evaluated English-language peer-reviewed articles from PubMed/MEDLINE, Scopus, Web of Science, and Google Scholar databases from inception to August 20, 2023, without imposing constraints on time, study design, or specific health domains. Results Only 49 articles met the eligibility criteria. These articles utilized diverse spatial methods, with a predominant focus on clustering techniques, while spatiotemporal analysis (frequentist and Bayesian) and modeling were relatively underexplored. A noteworthy surge (n = 42, 85.7%) in publications was observed post-2017. The publications investigated a variety of adult and pediatric clinical areas, including infectious disease, endocrinology, and cardiology, using phenotypes defined over a range of data domains, such as demographics, diagnoses, and visits. The primary health outcomes investigated were asthma, hypertension, and diabetes. Notably, patient phenotypes involving genomics, imaging, and notes were rarely utilized. Conclusions This review underscores the growing interest in spatial analysis of EHR-derived data and highlights knowledge gaps in clinical health, phenotype domains, and spatial methodologies. Additionally, this review proposes guidelines for harnessing the potential of spatial analysis to enhance the context of individual patients for future clinical decision support.
7
Weberpals J, Wang SV. The FAIRification of research in real-world evidence: A practical introduction to reproducible analytic workflows using Git and R. Pharmacoepidemiol Drug Saf 2024; 33:e5740. [PMID: 38173166 DOI: 10.1002/pds.5740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024]
Abstract
Transparency and reproducibility are major prerequisites for conducting meaningful real-world evidence (RWE) studies that are fit for decision-making. Many advances have been made in the documentation and reporting of study protocols and results, but the principles for version control and sharing of analytic code in RWE are not yet as established as in other quantitative disciplines like computational biology and health informatics. In this practical tutorial, we aim to give an introduction to distributed version control systems (VCS) tailored toward the FAIR (Findable, Accessible, Interoperable, and Reproducible) implementation of RWE studies. To ease adoption, we provide detailed step-by-step instructions with practical examples on how the Git VCS and R programming language can be implemented into RWE study workflows to facilitate reproducible analyses. We further discuss and showcase how these tools can be used to track changes, collaborate, disseminate, and archive RWE studies through dedicated project repositories that maintain a complete audit trail of all relevant study documents.
Affiliation(s)
- Janick Weberpals
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Shirley V Wang
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
8
Maro JC, Nguyen MD, Kolonoski J, Schoeplein R, Huang TY, Dutcher SK, Dal Pan GJ, Ball R. Six Years of the US Food and Drug Administration's Postmarket Active Risk Identification and Analysis System in the Sentinel Initiative: Implications for Real World Evidence Generation. Clin Pharmacol Ther 2023; 114:815-824. [PMID: 37391385 DOI: 10.1002/cpt.2979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 05/25/2023] [Indexed: 07/02/2023]
Abstract
Congress mandated the creation of a postmarket Active Risk Identification and Analysis (ARIA) system containing data on 100 million individuals for monitoring risks associated with drug and biologic products using data from disparate sources to complement the US Food and Drug Administration's (FDA's) existing postmarket capabilities. We report on the first 6 years of ARIA utilization in the Sentinel System (2016-2021). The FDA has used the ARIA system to evaluate 133 safety concerns; 54 of these evaluations have closed with regulatory determinations, whereas the rest remain in progress. If the ARIA system and the FDA's Adverse Event Reporting System are deemed insufficient to address a safety concern, then the FDA may issue a postmarket requirement to a product's manufacturer. One hundred ninety-seven ARIA insufficiency determinations have been made. The most common situation for which ARIA was found to be insufficient is the evaluation of adverse pregnancy and fetal outcomes following in utero drug exposure, followed by neoplasms and death. ARIA was most likely to be sufficient for thromboembolic events, which have high positive predictive value in claims data alone and do not require supplemental clinical data. The lessons learned from this experience illustrate the continued challenges using administrative claims data, especially to define novel clinical outcomes. This analysis can help to identify where more granular clinical data are needed to fill gaps to improve the use of real-world data for drug safety analyses and provide insights into what is needed to efficiently generate high-quality real-world evidence for efficacy.
Affiliation(s)
- Judith C Maro
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
- Michael D Nguyen
  - Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
- Joy Kolonoski
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
- Ryan Schoeplein
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
- Ting-Ying Huang
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
- Sarah K Dutcher
  - Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
- Gerald J Dal Pan
  - Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
- Robert Ball
  - Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
9
Williamson BD, Wyss R, Stuart EA, Dang LE, Mertens AN, Neugebauer RS, Wilson A, Gruber S. An application of the Causal Roadmap in two safety monitoring case studies: Causal inference and outcome prediction using electronic health record data. J Clin Transl Sci 2023; 7:e208. [PMID: 37900347 PMCID: PMC10603358 DOI: 10.1017/cts.2023.632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 09/12/2023] [Accepted: 09/13/2023] [Indexed: 10/31/2023] Open
Abstract
Background Real-world data, such as administrative claims and electronic health records, are increasingly used for safety monitoring and to help guide regulatory decision-making. In these settings, it is important to document analytic decisions transparently and objectively to assess and ensure that analyses meet their intended goals. Methods The Causal Roadmap is an established framework that can guide and document analytic decisions through each step of the analytic pipeline, which will help investigators generate high-quality real-world evidence. Results In this paper, we illustrate the utility of the Causal Roadmap using two case studies previously led by workgroups sponsored by the Sentinel Initiative - a program for actively monitoring the safety of regulated medical products. Each case example focuses on different aspects of the analytic pipeline for drug safety monitoring. The first case study shows how the Causal Roadmap encourages transparency, reproducibility, and objective decision-making for causal analyses. The second case study highlights how this framework can guide analytic decisions beyond inference on causal parameters, improving outcome ascertainment in clinical phenotyping. Conclusion These examples provide a structured framework for implementing the Causal Roadmap in safety surveillance and guide transparent, reproducible, and objective analysis.
Affiliation(s)
- Brian D. Williamson
  - Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Richard Wyss
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Elizabeth A. Stuart
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Lauren E. Dang
  - Department of Biostatistics, University of California, Berkeley, CA, USA
- Andrew N. Mertens
  - Department of Biostatistics, University of California, Berkeley, CA, USA
10
He T, Belouali A, Patricoski J, Lehmann H, Ball R, Anagnostou V, Kreimeyer K, Botsis T. Trends and opportunities in computable clinical phenotyping: A scoping review. J Biomed Inform 2023; 140:104335. [PMID: 36933631 DOI: 10.1016/j.jbi.2023.104335] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/09/2022] [Revised: 03/07/2023] [Accepted: 03/09/2023] [Indexed: 03/18/2023]
Abstract
Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1% (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4% (N = 77) of all studies; however, only 25.9% (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant method, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in the real setting are essential opportunities for future work. There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.
Affiliation(s)
- Ting He
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Anas Belouali
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jessica Patricoski
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Harold Lehmann
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Robert Ball
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US FDA, Silver Spring, MD, USA
- Valsamo Anagnostou
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kory Kreimeyer
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Taxiarchis Botsis
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
11
van Stekelenborg J, Kara V, Haack R, Vogel U, Garg A, Krupp M, Gofman K, Dreyfus B, Hauben M, Bate A. Individual Case Safety Report Replication: An Analysis of Case Reporting Transmission Networks. Drug Saf 2023; 46:39-52. [PMID: 36565374 PMCID: PMC9870831 DOI: 10.1007/s40264-022-01251-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Accepted: 10/18/2022] [Indexed: 12/25/2022]
Abstract
INTRODUCTION The basis of pharmacovigilance is provided by the exchange of Individual Case Safety Reports (ICSRs) between the recipient of the original report and other interested parties, which include Marketing Authorization Holders (MAHs) and Health Authorities (HAs). Different regulators have different reporting requirements for report transmission. This results in replication of each ICSR that will exist in multiple locations. Adding in the fact that each case will go through multiple versions, different recipients may receive different case versions at different times, potentially influencing patient safety decisions and potentially amplifying or obscuring safety signals inappropriately. OBJECTIVE The present study aimed to investigate the magnitude of replication, the variability among recipients, and the subsequent divergence across recipients of ICSRs. METHODS Seven participating TransCelerate Member Companies (MCs) queried their respective safety databases covering a 3-year period and provided aggregate ICSR submission statistics for expedited safety reports to an independent project manager. As measured in the US Food and Drug Administration (FDA)'s Adverse Event Reporting System (FAERS), ICSR volume for these seven MCs makes up approximately 20% of the total case volume. Aggregate metrics were calculated from the company data, specifically: (i) number of ICSR transmissions, (ii) average number of recipients (ANR) per case version transmitted, (iii) a submission selectivity metric, which measures the percentage of recipients not having received all sequential case version numbers, and (iv) percent of common ICSRs residing in two or more MAH databases. RESULTS The analysis reflects 2,539,802 case versions, distributed through 7,602,678 submissions. The overall mean replication rate is 3.0 submissions per case version. The distribution of the ANR replication measure was observed to be very long-tailed, with a significant fraction of case versions (~12.4% of all transmissions) being sent to ten or more HA recipients. Replication is higher than average for serious, unlisted, and literature cases, ranging from 3.5 to 6.1 submissions per version. Within the subset of ICSR versions sent to three recipients, a significant degree of variability in the actual recipients (i.e., HAs) was observed, indicating that there is not one single combination of the same three HAs predominantly receiving an ICSR. Submission selectivity increases with the case version. For case version 6, the submission selectivity across the MAHs ranges from ~10% to over 50%, with a median of 30.2%. Within the participating MAHs, the percentage of cases that reside within at least two safety databases is approximately 2% across five databases. Further analysis of the data from three MAHs showed percentages of 13.4%, 15.6%, and 27.9% of ICSRs originating from HAs and any other partners such as other MAHs and other institutions. CONCLUSION Replication of ICSRs and the variation of available safety information in recipient databases were quantified and shown to be substantial. Our work shows that multiple processors and medical reviewers will likely handle the same original ICSR as a result of replication. Aside from the obvious duplicate work, this phenomenon could conceivably lead to differing clinical assessments and decisions. If replication could be reduced or even eliminated, this would enable more focus on activities with a benefit for patient safety.
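Note: the headline replication figure in the abstract above can be reproduced from the two totals it reports. The sketch below does that arithmetic and adds one possible reading of the submission selectivity metric. The selectivity function, its input layout, and the recipient names are assumptions made for illustration; the study's exact computation is not given in the abstract.

```python
from typing import Dict, Iterable


def replication_rate(total_submissions: int, total_case_versions: int) -> float:
    """Mean number of submissions per case version."""
    return total_submissions / total_case_versions


def submission_selectivity(received_versions: Dict[str, Iterable[int]], latest_version: int) -> float:
    """Percentage of recipients that did NOT receive every version 1..latest_version.

    Assumed interpretation of the abstract's definition; recipient keys are hypothetical.
    """
    expected = set(range(1, latest_version + 1))
    incomplete = [r for r, versions in received_versions.items()
                  if not expected <= set(versions)]
    return 100.0 * len(incomplete) / len(received_versions)


# Totals reported in the abstract: 7,602,678 submissions for 2,539,802 case versions.
print(round(replication_rate(7_602_678, 2_539_802), 1))  # 3.0

# Hypothetical case at version 3 sent to four health authorities.
received = {"HA_A": [1, 2, 3], "HA_B": [1, 3], "HA_C": [3], "HA_D": [1, 2, 3]}
print(submission_selectivity(received, latest_version=3))  # 50.0
```

The unrounded ratio is about 2.99, consistent with the reported 3.0 submissions per case version.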
Affiliation(s)
- John van Stekelenborg
- Global Medical Safety, Janssen, the Pharmaceutical Companies of Johnson & Johnson, Horsham, PA USA
| | | | - Roman Haack
- Bayer AG, Pharmacovigilance Analytics, Berlin, Germany
| | - Ulrich Vogel
- Global Patient Safety and Pharmacovigilance, Boehringer Ingelheim International GmbH, Ingelheim am Rhein, Germany
| | - Anju Garg
- Safety Analyses Innovation and Submission Readiness Lead, Global Pharmacovigilance, Sanofi, Bridgewater, NJ USA
| | - Markus Krupp
- Global Patient Safety, Merck Healthcare KGaA, Darmstadt, Germany
| | - Kate Gofman
- CVRM Unit, Global Patient Safety Physician, CMO Patient Safety, AstraZeneca, Gaithersburg, MD USA
| | - Brian Dreyfus
- Epidemiology Lead for Integrated Oncology, Bristol-Myers Squibb Company, Princeton, NJ USA
| | - Manfred Hauben
- Pfizer Inc, New York, NY USA; Department of Medicine, NYU Langone Health, New York, NY USA
| | - Andrew Bate
- Global Safety, GSK, Brentford, UK; Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK; Department of Medicine at NYU Grossman School of Medicine, New York, NY USA
|
12
|
Brown JS, Mendelsohn AB, Nam YH, Maro JC, Cocoros NM, Rodriguez-Watson C, Lockhart CM, Platt R, Ball R, Dal Pan GJ, Toh S. The US Food and Drug Administration Sentinel System: a national resource for a learning health system. J Am Med Inform Assoc 2022; 29:2191-2200. [PMID: 36094070 PMCID: PMC9667154 DOI: 10.1093/jamia/ocac153] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/10/2022] [Revised: 06/18/2022] [Accepted: 08/18/2022] [Indexed: 07/23/2023] Open
Abstract
The US Food and Drug Administration (FDA) created the Sentinel System in response to a requirement in the FDA Amendments Act of 2007 that the agency establish a system for monitoring risks associated with drug and biologic products using data from disparate sources. The Sentinel System has completed hundreds of analyses, including many that have directly informed regulatory decisions. The Sentinel System also was designed to support a national infrastructure for a learning health system. Sentinel governance and guiding principles were designed to facilitate Sentinel's role as a national resource. The Sentinel System infrastructure now supports multiple non-FDA projects for stakeholders ranging from regulated industry to other federal agencies, international regulators, and academics. The Sentinel System is a working example of a learning health system that is expanding with the potential to create a global learning health system that can support medical product safety assessments and other research.
Affiliation(s)
- Jeffrey S Brown
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
| | - Aaron B Mendelsohn
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
| | - Young Hee Nam
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
| | - Judith C Maro
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
| | - Noelle M Cocoros
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
| | - Carla Rodriguez-Watson
- Reagan-Udall Foundation for the Food and Drug Administration, Washington, District of Columbia, USA
| | - Catherine M Lockhart
- Biologics and Biosimilars Collective Intelligence Consortium, Alexandria, Virginia, USA
| | - Richard Platt
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
| | - Robert Ball
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA
| | - Gerald J Dal Pan
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA
| | - Sengwee Toh
- Corresponding Author: Sengwee Toh, ScD, Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, 401 Park Drive, Suite 401 East, Boston, MA 02215, USA
|
13
|
Carrell DS, Gruber S, Floyd JS, Bann MA, Cushing-Haugen KL, Johnson RL, Graham V, Cronkite DJ, Hazlehurst BL, Felcher AH, Bejan CA, Kennedy A, Shinde MU, Karami S, Ma Y, Stojanovic D, Zhao Y, Ball R, Nelson JC. Improving Methods of Identifying Anaphylaxis for Medical Product Safety Surveillance Using Natural Language Processing and Machine Learning. Am J Epidemiol 2022; 192:283-295. [PMID: 36331289 PMCID: PMC9896464 DOI: 10.1093/aje/kwac182] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/11/2021] [Revised: 07/06/2022] [Accepted: 10/11/2022] [Indexed: 11/06/2022] Open
Abstract
We sought to determine whether machine learning and natural language processing (NLP) applied to electronic medical records could improve performance of automated health-care claims-based algorithms to identify anaphylaxis events using data on 516 patients with outpatient, emergency department, or inpatient anaphylaxis diagnosis codes during 2015-2019 in 2 integrated health-care institutions in the Northwest United States. We used one site's manually reviewed gold-standard outcomes data for model development and the other's for external validation based on cross-validated area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and sensitivity. In the development site 154 (64%) of 239 potential events met adjudication criteria for anaphylaxis compared with 180 (65%) of 277 in the validation site. Logistic regression models using only structured claims data achieved a cross-validated AUC of 0.58 (95% CI: 0.54, 0.63). Machine learning improved cross-validated AUC to 0.62 (0.58, 0.66); incorporating NLP-derived covariates further increased cross-validated AUCs to 0.70 (0.66, 0.75) in development and 0.67 (0.63, 0.71) in external validation data. A classification threshold with cross-validated PPV of 79% and cross-validated sensitivity of 66% in development data had cross-validated PPV of 78% and cross-validated sensitivity of 56% in external data. Machine learning and NLP-derived data improved identification of validated anaphylaxis events.
Affiliation(s)
- David S Carrell
- Correspondence to Dr. David Carrell, Kaiser Permanente Washington Health Research Institute, 1730 Minor Avenue, Suite 1600, Seattle, WA 98101
|
14
|
Gonzalez-Hernandez G, Krallinger M, Muñoz M, Rodriguez-Esteban R, Uzuner Ö, Hirschman L. Challenges and opportunities for mining adverse drug reactions: perspectives from pharma, regulatory agencies, healthcare providers and consumers. Database (Oxford) 2022; 2022:baac071. [PMID: 36050787 PMCID: PMC9436770 DOI: 10.1093/database/baac071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 07/08/2022] [Accepted: 08/25/2022] [Indexed: 11/17/2022]
Abstract
Monitoring drug safety is a central concern throughout the drug life cycle. Information about toxicity and adverse events is generated at every stage of this life cycle, and stakeholders have a strong interest in applying text mining and artificial intelligence (AI) methods to manage the ever-increasing volume of this information. Recognizing the importance of these applications and the role of challenge evaluations to drive progress in text mining, the organizers of BioCreative VII (Critical Assessment of Information Extraction in Biology) convened a panel of experts to explore 'Challenges in Mining Drug Adverse Reactions'. This article is an outgrowth of the panel; each panelist has highlighted specific text mining application(s), based on their research and their experiences in organizing text mining challenge evaluations. While these highlighted applications only sample the complexity of this problem space, they reveal both opportunities and challenges for text mining to aid in the complex process of drug discovery, testing, marketing and post-market surveillance. Stakeholders are eager to embrace natural language processing and AI tools to help in this process, provided that these tools can be demonstrated to add value to stakeholder workflows. This creates an opportunity for the BioCreative community to work in partnership with regulatory agencies, pharma and the text mining community to identify next steps for future challenge evaluations.
Affiliation(s)
- Graciela Gonzalez-Hernandez
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., West Hollywood, CA 90069, USA
| | - Martin Krallinger
- Life Sciences—Text Mining, Barcelona Supercomputing Center, Plaça Eusebi Güell, 1-3, Barcelona 08034, Spain
| | - Monica Muñoz
- Division of Pharmacovigilance, Office of Surveillance and Epidemiology, Center of Drug Evaluation and Research, FDA, 10903 New Hampshire Ave, Silver Spring, MD 20993, USA
| | - Raul Rodriguez-Esteban
- Roche Innovation Center Basel, Roche Pharmaceuticals, Grenzacherstrasse 124, Basel 4070, Switzerland
| | - Özlem Uzuner
- Information Sciences and Technology, George Mason University, 4400 University Dr, Fairfax, VA 22030, USA
| | - Lynette Hirschman
- MITRE Labs, The MITRE Corporation, 202 Burlington Rd., Bedford, MA 01730, USA
|
15
|
Penberthy LT, Rivera DR, Lund JL, Bruno MA, Meyer AM. An overview of real-world data sources for oncology and considerations for research. CA Cancer J Clin 2022; 72:287-300. [PMID: 34964981 DOI: 10.3322/caac.21714] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 03/09/2021] [Revised: 11/12/2021] [Accepted: 11/18/2021] [Indexed: 12/11/2022] Open
Abstract
Generating evidence on the use, effectiveness, and safety of new cancer therapies is a priority for researchers, health care providers, payers, and regulators given the rapid pace of change in cancer diagnosis and treatments. The use of real-world data (RWD) is integral to understanding the utilization patterns and outcomes of these new treatments among patients with cancer who are treated in clinical practice and community settings. An initial step in the use of RWD is careful study design to assess the suitability of an RWD source. This pivotal process can be guided by using a conceptual model that encourages predesign conceptualization. The primary types of RWD included are electronic health records, administrative claims data, cancer registries, and specialty data providers and networks. Careful consideration of each data type is necessary because they are collected for a specific purpose, capturing a set of data elements within a certain population for that purpose, and they vary by population coverage and longitudinality. In this review, the authors provide a high-level assessment of the strengths and limitations of each data category to inform data source selection appropriate to the study question. Overall, the development and accessibility of RWD sources for cancer research are rapidly increasing, and the use of these data requires careful consideration of composition and utility to assess important questions in understanding the use and effectiveness of new therapies.
Affiliation(s)
- Lynne T Penberthy
- Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, Maryland
| | - Donna R Rivera
- Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, Maryland
| | - Jennifer L Lund
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Melissa A Bruno
- Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, Maryland
| | - Anne-Marie Meyer
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
|
16
|
Lee P, Abernethy A, Shaywitz D, Gundlapalli AV, Weinstein J, Doraiswamy PM, Schulman K, Madhavan S. Digital Health COVID-19 Impact Assessment: Lessons Learned and Compelling Needs. NAM Perspect 2022; 2022:202201c. [PMID: 35402858 PMCID: PMC8970223 DOI: 10.31478/202201c] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 08/17/2023]
|
17
|
Anklam E, Bahl MI, Ball R, Beger RD, Cohen J, Fitzpatrick S, Girard P, Halamoda-Kenzaoui B, Hinton D, Hirose A, Hoeveler A, Honma M, Hugas M, Ishida S, Kass GEN, Kojima H, Krefting I, Liachenko S, Liu Y, Masters S, Marx U, McCarthy T, Mercer T, Patri A, Pelaez C, Pirmohamed M, Platz S, Ribeiro AJS, Rodricks JV, Rusyn I, Salek RM, Schoonjans R, Silva P, Svendsen CN, Sumner S, Sung K, Tagle D, Tong L, Tong W, van den Eijnden-van-Raaij J, Vary N, Wang T, Waterton J, Wang M, Wen H, Wishart D, Yuan Y, Slikker Jr. W. Emerging technologies and their impact on regulatory science. Exp Biol Med (Maywood) 2022; 247:1-75. [PMID: 34783606 PMCID: PMC8749227 DOI: 10.1177/15353702211052280] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 12/12/2022] Open
Abstract
There is an evolving and increasing need to use emerging cellular, molecular and in silico technologies and novel approaches for safety assessment of food, drugs, and personal care products. Convergence of these emerging technologies is also enabling rapid advances and approaches that may impact regulatory decisions and approvals. Although the development of emerging technologies may allow rapid advances in regulatory decision making, there is concern that these new technologies have not been thoroughly evaluated to determine if they are ready for regulatory application, singly or in combination. The magnitude of these combined technical advances may outpace the ability to assess fit for purpose and to allow routine application of these new methods for regulatory purposes. There is a need to develop strategies to evaluate the new technologies to determine which ones are ready for regulatory use. The opportunity to apply these potentially faster, more accurate, and cost-effective approaches remains an important goal to facilitate their incorporation into regulatory use. However, without a clear strategy to evaluate emerging technologies rapidly and appropriately, the value of these efforts may go unrecognized or may take longer. It is important for the regulatory science field to keep up with the research in these technically advanced areas and to understand the science behind these new approaches. The regulatory field must understand the critical quality attributes of these novel approaches and learn from each other's experience so that workforces can be trained to prepare for emerging global regulatory challenges. Moreover, it is essential that the regulatory community work with the technology developers to harness collective capabilities towards developing a strategy for evaluation of these novel assessment tools.
Affiliation(s)
- Reza M Salek
- International Agency for Research on Cancer, France
| | - Li Tong
- Universities of Georgia Tech and Emory, USA
| | - Neil Vary
- Canadian Food Inspection Agency, Canada
| | - Tao Wang
- National Medical Products Administration, China
| | - May Wang
- Universities of Georgia Tech and Emory, USA
| | - Hairuo Wen
- National Institutes for Food and Drug Control, China
|
18
|
Ball R, Dal Pan G. "Artificial Intelligence" for Pharmacovigilance: Ready for Prime Time? Drug Saf 2022; 45:429-438. [PMID: 35579808 PMCID: PMC9112277 DOI: 10.1007/s40264-022-01157-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Accepted: 02/10/2022] [Indexed: 01/28/2023]
Abstract
There is great interest in the application of 'artificial intelligence' (AI) to pharmacovigilance (PV). Although US FDA is broadly exploring the use of AI for PV, we focus on the application of AI to the processing and evaluation of Individual Case Safety Reports (ICSRs) submitted to the FDA Adverse Event Reporting System (FAERS). We describe a general framework for considering the readiness of AI for PV, followed by some examples of the application of AI to ICSR processing and evaluation in industry and FDA. We conclude that AI can usefully be applied to some aspects of ICSR processing and evaluation, but the performance of current AI algorithms requires a 'human-in-the-loop' to ensure good quality. We identify outstanding scientific and policy issues to be addressed before the full potential of AI can be exploited for ICSR processing and evaluation, including approaches to quality assurance of 'human-in-the-loop' AI systems, large-scale, publicly available training datasets, a well-defined and computable 'cognitive framework', a formal sociotechnical framework for applying AI to PV, and development of best practices for applying AI to PV. Practical experience with stepwise implementation of AI for ICSR processing and evaluation will likely provide important lessons that will inform the necessary policy and regulatory framework to facilitate widespread adoption and provide a foundation for further development of AI approaches to other aspects of PV.
Affiliation(s)
- Robert Ball
- US Food and Drug Administration, Center for Drug Evaluation and Research, Office of Surveillance and Epidemiology, Silver Spring, MD USA
| | - Gerald Dal Pan
- US Food and Drug Administration, Center for Drug Evaluation and Research, Office of Surveillance and Epidemiology, Silver Spring, MD USA
|
19
|
Desai RJ, Matheny ME, Johnson K, Marsolo K, Curtis LH, Nelson JC, Heagerty PJ, Maro J, Brown J, Toh S, Nguyen M, Ball R, Pan GD, Wang SV, Gagne JJ, Schneeweiss S. Broadening the reach of the FDA Sentinel system: A roadmap for integrating electronic health record data in a causal analysis framework. NPJ Digit Med 2021; 4:170. [PMID: 34931012 PMCID: PMC8688411 DOI: 10.1038/s41746-021-00542-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 09/23/2021] [Accepted: 11/28/2021] [Indexed: 11/09/2022] Open
Abstract
The Sentinel System is a major component of the United States Food and Drug Administration's (FDA) approach to active medical product safety surveillance. While Sentinel has historically relied on large quantities of health insurance claims data, leveraging longitudinal electronic health records (EHRs) that contain more detailed clinical information, as structured and unstructured features, may address some of the current gaps in capabilities. We identify key challenges when using EHR data to investigate medical product safety in a scalable and accelerated way, outline potential solutions, and describe the Sentinel Innovation Center's initiatives to put solutions into practice by expanding and strengthening the existing system with a query-ready, large-scale data infrastructure of linked EHR and claims data. We describe our initiatives in four strategic priority areas: (1) data infrastructure, (2) feature engineering, (3) causal inference, and (4) detection analytics, with the goal of incorporating emerging data science innovations to maximize the utility of EHR data for medical product safety surveillance.
Affiliation(s)
- Rishi J Desai
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
| | - Michael E Matheny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Kevin Johnson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Keith Marsolo
- Department of Population Health Sciences, Duke University, Durham, NC, USA
| | - Lesley H Curtis
- Department of Population Health Sciences, Duke University, Durham, NC, USA
| | - Jennifer C Nelson
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
| | | | - Judith Maro
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA, USA
| | - Jeffery Brown
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA, USA
| | - Sengwee Toh
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA, USA
| | - Michael Nguyen
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
| | - Robert Ball
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
| | - Gerald Dal Pan
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
| | - Shirley V Wang
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Joshua J Gagne
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Johnson & Johnson, New Brunswick, NJ, USA
| | - Sebastian Schneeweiss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
|
20
|
An efficient and accurate distributed learning algorithm for modeling multi-site zero-inflated count outcomes. Sci Rep 2021; 11:19647. [PMID: 34608222 PMCID: PMC8490431 DOI: 10.1038/s41598-021-99078-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/11/2021] [Accepted: 09/13/2021] [Indexed: 12/03/2022] Open
Abstract
Clinical research networks (CRNs), made up of multiple healthcare systems each with patient data from several care sites, are beneficial for studying rare outcomes and increasing generalizability of results. While CRNs encourage sharing aggregate data across healthcare systems, individual systems within CRNs often cannot share patient-level data due to privacy regulations, prohibiting multi-site regression which requires an analyst to access all individual patient data pooled together. Meta-analysis is commonly used to model data stored at multiple institutions within a CRN but can result in biased estimation, most notably in rare-event contexts. We present a communication-efficient, privacy-preserving algorithm for modeling multi-site zero-inflated count outcomes within a CRN. Our method, a one-shot distributed algorithm for performing hurdle regression (ODAH), models zero-inflated count data stored in multiple sites without sharing patient-level data across sites, resulting in estimates closely approximating those that would be obtained in a pooled patient-level data analysis. We evaluate our method through extensive simulations and two real-world data applications using electronic health records: examining risk factors associated with pediatric avoidable hospitalization and modeling serious adverse event frequency associated with a colorectal cancer therapy. In simulations, ODAH produced bias less than 0.1% across all settings explored while meta-analysis estimates exhibited bias up to 12.7%, with meta-analysis performing worst in settings with high zero-inflation or low event rates. Across both applied analyses, ODAH estimates had less than 10% bias for 18 of 20 coefficients estimated, while meta-analysis estimates exhibited substantially higher bias. Relative to existing methods for distributed data analysis, ODAH offers a highly accurate, computationally efficient method for modeling multi-site zero-inflated count data.
|
21
|
Csoke E, Landes S, Francis MJ, Ma L, Teotico Pohlhaus D, Anquez-Traxler C. How can real-world evidence aid decision making during the life cycle of nonprescription medicines? Clin Transl Sci 2021; 15:43-54. [PMID: 34405554 PMCID: PMC8742642 DOI: 10.1111/cts.13129] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/03/2021] [Revised: 06/21/2021] [Accepted: 07/06/2021] [Indexed: 11/30/2022] Open
Abstract
Real-world evidence (RWE) is an emerging scientific discipline which is being increasingly utilized for decision making on prescription-only medicines. However, there has been little focus to date on the application of RWE within the nonprescription sector. This paper reviews the existing and potential applications of RWE for nonprescription medicines, using the nonprescription medicine life cycle as a framework for discussion. Relevant sources of real-world data (RWD) are reviewed and compared with those available for prescribed medicines. Existing life-cycle data gaps are identified where RWE is required or where use of RWE can complement data from randomized controlled trials. Published RWE examples relating to nonprescription medicines are summarized, and potential relevant future sources of RWD discussed. Challenges and limitations to the use of RWE on nonprescription medicines are discussed, and recommendations made to promote optimal and appropriate use of RWE in this sector. Overall, RWE currently plays a key role in specific phases of the nonprescription medicine life cycle, including reclassification and postmarketing safety surveillance. The increasing availability of patient-generated health data is likely to further increase the utilization of RWE to aid decision making on nonprescription medicines.
Affiliation(s)
- Emese Csoke
- Regulatory, Medical, Safety and Compliance, Bayer Consumer Healthcare, Basel, Switzerland
| | - Sabine Landes
- Consumer Health Care Medical Affairs, Sanofi-Aventis Germany, Frankfurt, Germany
| | - Matthew J Francis
- Global Safety Surveillance & Analysis, The Procter & Gamble Company, Cincinnati, Ohio, USA
| | - Larry Ma
- Office of Consumer Medical Safety, Johnson & Johnson, New Brunswick, New Jersey, USA
| | - Denise Teotico Pohlhaus
- Consumer and Sensory Product Understanding, GSK Consumer Health, Collegeville, Pennsylvania, USA
| | - Christelle Anquez-Traxler
- Regulatory and Scientific Affairs, AESGP, The Association of the European Self-Care Industry, Brussels, Belgium
|
22
|
Haynes K. Preparing for COVID-19 vaccine safety surveillance: A United States perspective. Pharmacoepidemiol Drug Saf 2020; 29:1529-1531. [PMID: 32978861 PMCID: PMC7537525 DOI: 10.1002/pds.5142] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/16/2020] [Revised: 07/11/2020] [Accepted: 09/21/2020] [Indexed: 11/23/2022]
|