1
Schuler A, Walsh D, Hall D, Walsh J, Fisher C. Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score. Int J Biostat 2022; 18:329-356. [PMID: 34957728] [DOI: 10.1515/ijb-2021-0072]
Abstract
Estimating causal effects from randomized experiments is central to clinical research. Reducing the statistical uncertainty in these analyses is an important objective for statisticians. Registries, prior trials, and health records constitute a growing compendium of historical data on patients under standard-of-care that may be exploitable to this end. However, most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control. Here, we propose a use of historical data that exploits linear covariate adjustment to improve the efficiency of trial analyses without incurring bias. Specifically, we train a prognostic model on the historical data, then estimate the treatment effect using a linear regression while adjusting for the trial subjects' predicted outcomes (their prognostic scores). We prove that, under certain conditions, this prognostic covariate adjustment procedure attains the minimum variance possible among a large class of estimators. When those conditions are not met, prognostic covariate adjustment is still more efficient than raw covariate adjustment and the gain in efficiency is proportional to a measure of the predictive accuracy of the prognostic model above and beyond the linear relationship with the raw covariates. We demonstrate the approach using simulations and a reanalysis of an Alzheimer's disease clinical trial and observe meaningful reductions in mean-squared error and the estimated variance. Lastly, we provide a simplified formula for asymptotic variance that enables power calculations that account for these gains. Sample size reductions between 10% and 30% are attainable when using prognostic models that explain a clinically realistic percentage of the outcome variance.
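The adjustment procedure this abstract describes can be illustrated with a small simulation. This is a hedged sketch, not the authors' code: the data-generating process, sample sizes, and the use of a plain linear fit as the "prognostic model" are all invented for illustration (the method permits any prediction model trained on historical data).

```python
import numpy as np

rng = np.random.default_rng(0)

def ols(X, y):
    """Least-squares coefficients for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Historical standard-of-care data: outcome driven by a baseline covariate.
n_hist = 5000
x_hist = rng.normal(size=n_hist)
y_hist = 2.0 * x_hist + rng.normal(size=n_hist)

# "Prognostic model" trained on the historical data (here simply linear).
coef = ols(np.column_stack([np.ones(n_hist), x_hist]), y_hist)

# Trial data with a true treatment effect of 1.0.
n = 1000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n).astype(float)
y = 1.0 * t + 2.0 * x + rng.normal(size=n)

# Prognostic score: each trial subject's predicted outcome under standard of care.
score = coef[0] + coef[1] * x

# Linear adjustment: regress the trial outcome on treatment and the score.
beta = ols(np.column_stack([np.ones(n), t, score]), y)
treatment_effect = beta[1]
```

Because randomization makes treatment independent of the score, the adjusted estimate stays unbiased while the score soaks up outcome variance, shrinking the standard error.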
Affiliation(s)
- Diana Hall
- Unlearn.AI, Inc., San Francisco, CA, USA
- Jon Walsh
- Unlearn.AI, Inc., San Francisco, CA, USA
- UC Berkeley Center for Targeted Learning, Berkeley, CA, USA
- UC Berkeley Center for Targeted Learning, Berkeley, CA, USA
- UC Berkeley Center for Targeted Learning, Berkeley, CA, USA
2
Schuler A. Mixed Models for Repeated Measures Should Include Time-by-Covariate Interactions to Assure Power Gains and Robustness Against Dropout Bias Relative to Complete-Case ANCOVA. Ther Innov Regul Sci 2021; 56:145-154. [PMID: 34674187] [DOI: 10.1007/s43441-021-00348-y]
Abstract
In randomized trials with continuous-valued outcomes, the goal is often to estimate the difference in average outcomes between two treatment groups. However, the outcome in some trials is longitudinal, meaning that multiple measurements of the same outcome are taken over time for each subject. The target of inference in this case is often still the difference in averages at a given timepoint. One way to analyze these data is to ignore the measurements at intermediate timepoints and proceed with a standard covariate-adjusted analysis (e.g., ANCOVA) with the complete cases. However, it is generally thought that exploiting information from intermediate timepoints using mixed models for repeated measures (MMRM) (a) increases power and (b) more naturally "handles" missing data. Here, we prove that neither of these conclusions is entirely correct when baseline covariates are adjusted for without including time-by-covariate interactions. We back these claims up with simulations. MMRM provides benefits over complete-case ANCOVA in many cases, but covariate-time interaction terms should always be included to guarantee the best results.
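The design-matrix point can be sketched as follows. The data are invented, and ordinary least squares on long-format data stands in for a full MMRM fit, which would additionally model within-subject correlation; the sketch only shows why the covariate needs a separate slope per visit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=n)                       # baseline covariate
t = rng.integers(0, 2, size=n).astype(float) # treatment indicator

# Outcomes at two post-baseline visits; the covariate effect differs by visit,
# so a single shared covariate slope would be misspecified.
y1 = 0.5 * t + 1.0 * x + rng.normal(size=n)
y2 = 1.0 * t + 2.0 * x + rng.normal(size=n)

# Long format: one row per subject-visit, with a visit-2 indicator v.
y = np.concatenate([y1, y2])
v = np.concatenate([np.zeros(n), np.ones(n)])
T = np.concatenate([t, t])
X = np.concatenate([x, x])

# Design with time-by-treatment AND time-by-covariate interactions,
# giving each visit its own covariate slope.
D = np.column_stack([np.ones(2 * n), v, T, T * v, X, X * v])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
effect_visit2 = beta[2] + beta[3]  # treatment effect at the final visit
```

With the interactions included, the visit-2 treatment contrast recovers the true final-timepoint effect (1.0 here) regardless of how the covariate relationship changes over time.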
3
Schuler A. Designing efficient randomized trials: power and sample size calculation when using semiparametric efficient estimators. Int J Biostat 2021; 18:151-171. [PMID: 34364314] [DOI: 10.1515/ijb-2021-0039]
Abstract
Trials enroll a large number of subjects in order to attain power, making them expensive and time-consuming. Sample size calculations are often performed with the assumption of an unadjusted analysis, even if the trial analysis plan specifies a more efficient estimator (e.g. ANCOVA). This leads to conservative estimates of required sample sizes and an opportunity for savings. Here we show that a relatively simple formula can be used to estimate the power of any two-arm, single-timepoint trial analyzed with a semiparametric efficient estimator, regardless of the domain of the outcome or kind of treatment effect (e.g. odds ratio, mean difference). Since an efficient estimator attains the minimum possible asymptotic variance, this allows for the design of trials that are as small as possible while still attaining design power and control of type I error. The required sample size calculation is parsimonious and requires the analyst to provide only a small number of population parameters. We verify in simulation that the large-sample properties of trials designed this way attain their nominal values. Lastly, we demonstrate how to use this formula in the "design" (and subsequent reanalysis) of a real randomized trial and show that fewer subjects are required to attain the same design power when a semiparametric efficient estimator is accounted for at the design stage.
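The flavor of such a calculation can be sketched with the textbook two-arm formula, with the unadjusted variance swapped for the (smaller) asymptotic variance of the efficient estimator. The numbers below are illustrative, not taken from the paper.

```python
import math
from statistics import NormalDist

def required_n(delta, avar, alpha=0.05, power=0.8):
    """Total sample size so that an estimator with asymptotic variance avar
    (i.e. sqrt(n) * (estimate - truth) -> N(0, avar)) detects a true effect
    delta with the given power at two-sided level alpha."""
    z = NormalDist().inv_cdf
    return math.ceil(avar * (z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2)

# A more efficient estimator (smaller avar) needs fewer subjects:
n_unadjusted = required_n(0.3, 4.0)
n_efficient = required_n(0.3, 3.0)  # e.g. 25% variance reduction from adjustment
```

Here a 25% reduction in asymptotic variance translates directly into a 25% smaller required sample size at the same design power, which is the savings the abstract describes.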
4
Schuemie MJ, Cepeda MS, Suchard MA, Yang J, Tian Y, Schuler A, Ryan PB, Madigan D, Hripcsak G. How Confident Are We about Observational Findings in Healthcare: A Benchmark Study. Harvard Data Science Review 2020; 2. [PMID: 33367288] [DOI: 10.1162/99608f92.147cc28e]
Abstract
Healthcare professionals increasingly rely on observational healthcare data, such as administrative claims and electronic health records, to estimate the causal effects of interventions. However, the limited prior evaluations that exist raise concerns about the real-world performance of the statistical and epidemiological methods that are used. We present the "OHDSI Methods Benchmark", which aims to evaluate the performance of effect estimation methods on real data. The benchmark comprises a gold standard, a set of metrics, and a set of open source software tools. The gold standard is a collection of real negative controls (drug-outcome pairs where no causal effect appears to exist) and synthetic positive controls (drug-outcome pairs that augment negative controls with simulated causal effects). We apply the benchmark using four large healthcare databases to evaluate methods commonly used in practice: the new-user cohort, self-controlled cohort, case-control, case-crossover, and self-controlled case series designs. The results confirm the concerns about these methods, showing that for most methods the operating characteristics deviate considerably from nominal levels. For example, in most contexts, only half of the 95% confidence intervals we calculated contain the corresponding true effect size. We previously developed an "empirical calibration" procedure to restore these characteristics, and we evaluate this procedure as well. While no one method dominates, self-controlled methods such as the empirically calibrated self-controlled case series perform well across a wide range of scenarios.
Affiliation(s)
- Martijn J Schuemie
- Observational Health Data Sciences and Informatics; Epidemiology Analytics, Janssen Research and Development; Department of Biostatistics, University of California, Los Angeles
- M Soledad Cepeda
- Observational Health Data Sciences and Informatics; Epidemiology Analytics, Janssen Research and Development
- Marc A Suchard
- Observational Health Data Sciences and Informatics; Department of Biostatistics, University of California, Los Angeles; Department of Biomathematics, University of California, Los Angeles; Department of Human Genetics, University of California, Los Angeles
- Jianxiao Yang
- Observational Health Data Sciences and Informatics; Department of Biomathematics, University of California, Los Angeles
- Yuxi Tian
- Observational Health Data Sciences and Informatics; Department of Biomathematics, University of California, Los Angeles
- Alejandro Schuler
- Observational Health Data Sciences and Informatics; Center for Biomedical Informatics Research, Stanford University
- Patrick B Ryan
- Observational Health Data Sciences and Informatics; Epidemiology Analytics, Janssen Research and Development; Department of Biomedical Informatics, Columbia University
- David Madigan
- Observational Health Data Sciences and Informatics; Department of Statistics, Columbia University
- George Hripcsak
- Observational Health Data Sciences and Informatics; Department of Biomedical Informatics, Columbia University; Medical Informatics Services, New York-Presbyterian Hospital
5
Empirical confidence interval calibration for population-level effect estimation studies in observational healthcare data. Proc Natl Acad Sci U S A 2018. [PMID: 29531023] [DOI: 10.1073/pnas.1708282114]
Abstract
Observational healthcare data, such as electronic health records and administrative claims, offer potential to estimate effects of medical products at scale. Observational studies have often been found to be nonreproducible, however, generating conflicting results even when using the same database to answer the same question. One source of discrepancies is error, both random (caused by sampling variability) and systematic (caused, for example, by confounding, selection bias, and measurement error). Only random error is typically quantified, but it converges to zero as databases become larger, whereas systematic error persists independent of sample size and therefore increases in relative importance. Negative controls are exposure-outcome pairs where one believes no causal effect exists; they can be used to detect multiple sources of systematic error, but interpreting their results is not always straightforward. Previously, we have shown that an empirical null distribution can be derived from a sample of negative controls and used to calibrate P values, accounting for both random and systematic error. Here, we extend this work to calibration of confidence intervals (CIs). CIs require positive controls, which we synthesize by modifying negative controls. We show that our CI calibration restores nominal characteristics, such as 95% coverage of the true effect size by the 95% CI. We furthermore show that CI calibration reduces disagreement in replications of two pairs of conflicting observational studies: one related to dabigatran, warfarin, and gastrointestinal bleeding, and one related to selective serotonin reuptake inhibitors and upper gastrointestinal bleeding. We recommend CI calibration to improve reproducibility of observational studies.
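The calibration idea can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: a method-of-moments fit stands in for their maximum-likelihood estimate of the empirical null, and all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated negative controls: the true log-RR is 0, but each estimate
# carries systematic error (mean mu, spread tau) on top of sampling error.
mu, tau = 0.2, 0.1
se = np.full(50, 0.15)                          # per-control standard errors
est = rng.normal(mu, np.sqrt(tau**2 + se**2))   # observed log-RR estimates

# Method-of-moments fit of the empirical null distribution.
mu_hat = est.mean()
tau2_hat = max(est.var(ddof=1) - np.mean(se**2), 0.0)

def calibrated_ci(theta, theta_se, z=1.96):
    """Shift a new log-RR estimate by the estimated systematic error and
    widen its CI to reflect that error's spread as well as random error."""
    half = z * np.sqrt(theta_se**2 + tau2_hat)
    return theta - mu_hat - half, theta - mu_hat + half

lo, hi = calibrated_ci(0.9, 0.2)  # calibrate a new drug-outcome estimate
```

The calibrated interval is both re-centered and wider than the conventional one, which is how coverage is restored toward the nominal 95%.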
6
Brauer R, Douglas I, Garcia Rodriguez LA, Downey G, Huerta C, de Abajo F, Bate A, Feudjo Tepie M, de Groot MCH, Schlienger R, Reynolds R, Smeeth L, Klungel O, Ruigómez A. Risk of acute liver injury associated with use of antibiotics. Comparative cohort and nested case-control studies using two primary care databases in Europe. Pharmacoepidemiol Drug Saf 2017; 25 Suppl 1:29-38. [PMID: 27038354] [DOI: 10.1002/pds.3861]
Abstract
PURPOSE To assess the impact of varying study designs and exposure and outcome definitions on the risk of acute liver injury (ALI) associated with antibiotic use. METHODS The source population comprised patients registered in two primary care databases: CPRD in the UK and BIFAP in Spain. We identified a cohort consisting of new users of antibiotics during the study period (2004-2009) and non-users during the study period or in the previous year. Cases with ALI were identified within this cohort and classified as definite or probable, based on recorded medical information. The relative risk (RR) of ALI associated with antibiotic use was computed using Poisson regression. For the nested case-control analyses, up to five controls were matched to each case by age, sex, date and practice (in CPRD), and odds ratios (OR) were computed with conditional logistic regression. RESULTS The age-, sex- and year-adjusted RRs of definite ALI in the current antibiotic use periods were 10.04 (95% CI: 6.97-14.47) in CPRD and 5.76 (95% CI: 3.46-9.59) in BIFAP. In the case-control analyses, adjusting for lifestyle, comorbidities and use of medications, the OR of ALI for current users of antibiotics was 5.7 (95% CI: 3.46-9.36) in CPRD and 2.6 (95% CI: 1.26-5.37) in BIFAP. CONCLUSION Guided by a common protocol, both cohort and case-control study designs found an increased risk of ALI associated with the use of antibiotics in both databases, independent of the exposure and case definitions used. However, the magnitude of the risk was higher in CPRD than in BIFAP.
Collapse
Affiliation(s)
- Ruth Brauer
- London School of Hygiene and Tropical Medicine, Faculty of Epidemiology and Population Health, London, UK; Amgen Limited, London, UK
- Ian Douglas
- London School of Hygiene and Tropical Medicine, Faculty of Epidemiology and Population Health, London, UK
- Consuelo Huerta
- Agencia Española de Medicamentos y Productos Sanitarios (AEMPS), Medicines for Human Use Department, Division of Pharmacoepidemiology and Pharmacovigilance, Madrid, Spain
- Francisco de Abajo
- Clinical Pharmacology Unit, University Hospital Príncipe de Asturias, Department of Biomedical Sciences, University of Alcala, Alcalá de Henares, Spain
- Mark C H de Groot
- Utrecht University, Faculty of Science, Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht, The Netherlands
- Robert Reynolds
- Epidemiology, Pfizer Research and Development, New York, NY, USA
- Liam Smeeth
- London School of Hygiene and Tropical Medicine, Faculty of Epidemiology and Population Health, London, UK
- Olaf Klungel
- Utrecht University, Faculty of Science, Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht, The Netherlands
- Ana Ruigómez
- Fundación Centro Español de Investigación Farmacoepidemiológica (CEIFE), Madrid, Spain
7
Samwald M, Xu H, Blagec K, Empey PE, Malone DC, Ahmed SM, Ryan P, Hofer S, Boyce RD. Incidence of Exposure of Patients in the United States to Multiple Drugs for Which Pharmacogenomic Guidelines Are Available. PLoS One 2016; 11:e0164972. [PMID: 27764192] [PMCID: PMC5072717] [DOI: 10.1371/journal.pone.0164972]
Abstract
Pre-emptive pharmacogenomic (PGx) testing of a panel of genes may be easier to implement and more cost-effective than reactive pharmacogenomic testing if a sufficient number of medications are covered by a single test and future medication exposure can be anticipated. We analysed the incidence of exposure of individual patients in the United States to multiple drugs for which pharmacogenomic guidelines are available (PGx drugs) within a selected four-year period (2009-2012), in order to identify and quantify the incidence of pharmacotherapy in a nation-wide patient population that could be impacted by pre-emptive PGx testing based on currently available clinical guidelines. In total, 73 024 095 patient records from private insurance, Medicare Supplemental and Medicaid were included. Patients enrolled in Medicare Supplemental aged ≥ 65 or Medicaid aged 40-64 had the highest incidence of PGx drug use, with approximately half of the patients receiving at least one PGx drug during the four-year period and one fourth to one third of patients receiving two or more PGx drugs. These data suggest that exposure to multiple PGx drugs is common and that it may be beneficial to implement wide-scale pre-emptive genomic testing. Future work should therefore concentrate on investigating the cost-effectiveness of multiplexed pre-emptive testing strategies.
Affiliation(s)
- Matthias Samwald
- Section for Artificial Intelligence and Decision Support; Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
- Hong Xu
- Section for Artificial Intelligence and Decision Support; Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
- Kathrin Blagec
- Section for Artificial Intelligence and Decision Support; Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
- Philip E. Empey
- Department of Pharmacy and Therapeutics, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Daniel C. Malone
- College of Pharmacy, University of Arizona, Tucson, Arizona, United States of America
- Seid Mussa Ahmed
- Department of Pharmacy, College of Public Health and Medical Sciences, Jimma University, Jimma, Ethiopia
- Patrick Ryan
- Janssen Research and Development, Titusville, New Jersey, United States of America
- Observational Health Data Sciences and Informatics, New York, New York, United States of America
- Sebastian Hofer
- Section for Artificial Intelligence and Decision Support; Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
- Richard D. Boyce
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
8
Ainsworth J, Buchan I. Combining Health Data Uses to Ignite Health System Learning. Methods Inf Med 2015; 54:479-87. [PMID: 26395036] [DOI: 10.3414/me15-01-0064]
Abstract
OBJECTIVES In this paper we aim to characterise the critical mass of linked data, methods and expertise required for health systems to adapt to the needs of the populations they serve - more recently known as learning health systems. The objectives are to: 1) identify opportunities to combine separate uses of common data sources in order to reduce duplication of data processing and improve information quality; 2) identify challenges in scaling-up the reuse of health data sufficiently to support health system learning. METHODS The challenges and opportunities were identified through a series of e-health stakeholder consultations and workshops in Northern England from 2011 to 2014. From 2013 the concepts presented here have been refined through feedback to collaborators, including patient/citizen representatives, in a regional health informatics research network (www.herc.ac.uk). RESULTS Health systems typically have separate information pipelines for: 1) commissioning services; 2) auditing service performance; 3) managing finances; 4) monitoring public health; and 5) research. These pipelines share common data sources but usually duplicate data extraction, aggregation, cleaning/preparation and analytics. Suboptimal analyses may be performed due to a lack of expertise, which may exist elsewhere in the health system but is fully committed to a different pipeline. Contextual knowledge that is essential for proper data analysis and interpretation may be needed in one pipeline but accessible only in another. The lack of capable health and care intelligence systems for populations can be attributed to a legacy of three flawed assumptions: 1) universality: the generalizability of evidence across populations; 2) time-invariance: the stability of evidence over time; and 3) reducibility: the reduction of evidence into specialised sub-systems that may be recombined. 
CONCLUSIONS We conceptualize a population health and care intelligence system capable of supporting health system learning and we put forward a set of maturity tests of progress toward such a system. A factor common to each test is data-action latency; a mature system spawns timely actions proportionate to the information that can be derived from the data, and in doing so creates meaningful measurement about system learning. We illustrate, using future scenarios, some major opportunities to improve health systems by exchanging conventional intelligence pipelines for networked critical masses of data, methods and expertise that minimise data-action latency and ignite system-learning.
Affiliation(s)
- John Ainsworth
- Centre for Health Informatics, University of Manchester, Manchester, M13 9PL, UK
9

10
Madigan D, Schuemie MJ, Ryan PB. Empirical performance of the case-control method: lessons for developing a risk identification and analysis system. Drug Saf 2014; 36 Suppl 1:S73-82. [PMID: 24166225] [DOI: 10.1007/s40264-013-0105-z]
Abstract
BACKGROUND Considerable attention now focuses on the use of large-scale observational healthcare data for understanding drug safety. In this context, analysts utilize a variety of statistical and epidemiological approaches such as case-control, cohort, and self-controlled methods, yet the operating characteristics of these methods are poorly understood. OBJECTIVE To establish the operating characteristics of the case-control method for large-scale observational analysis in drug safety. RESEARCH DESIGN We empirically evaluated the case-control approach in 5 real observational healthcare databases and 6 simulated datasets. We retrospectively studied the predictive accuracy of the method when applied to a collection of 165 positive controls and 234 negative controls across 4 outcomes: acute liver injury, acute myocardial infarction, acute kidney injury, and upper gastrointestinal bleeding. RESULTS In our experiment, the case-control method provided weak discrimination between positive and negative controls. Furthermore, the method yielded positively biased estimates and confidence intervals that had poor coverage properties. CONCLUSIONS For the four outcomes we examined, the case-control method may not be the method of choice for estimating potentially harmful effects of drugs.
Affiliation(s)
- David Madigan
- Department of Statistics, Columbia University, 1255 Amsterdam Avenue, New York, NY, 10027, USA
11
Reich CG, Ryan PB, Schuemie MJ. Alternative outcome definitions and their effect on the performance of methods for observational outcome studies. Drug Saf 2014; 36 Suppl 1:S181-93. [PMID: 24166234] [DOI: 10.1007/s40264-013-0111-1]
Abstract
BACKGROUND A systematic risk identification system has the potential to test marketed drugs for important Health Outcomes of Interest (HOI). For each HOI, multiple definitions are used in the literature, and some of them are validated for certain databases. However, little is known about the effect of different definitions on the ability of methods to estimate their association with medical products. OBJECTIVES Alternative definitions of HOI were studied for their effect on the performance of analytical methods in observational outcome studies. METHODS A set of alternative definitions for three HOI were defined based on literature review and clinical diagnosis guidelines: acute kidney injury, acute liver injury and acute myocardial infarction. The definitions varied by the choice of diagnostic codes and the inclusion of procedure codes and lab values. They were then used to empirically study an array of analytical methods with various analytical choices in four observational healthcare databases. The methods were executed against predefined drug-HOI pairs to generate an effect estimate and standard error for each pair. These test cases included positive controls (active ingredients with evidence to suspect a positive association with the outcome) and negative controls (active ingredients with no evidence to expect an effect on the outcome). Three different performance metrics were used: (i) area under the receiver operating characteristic (ROC) curve (AUC), as a measure of a method's ability to distinguish between positive and negative test cases; (ii) bias, estimated from the distribution of observed effect estimates for the negative test pairs, where the true relative risk can be assumed to be one; and (iii) minimal detectable relative risk (MDRR), as a measure of whether there is sufficient power to generate effect estimates.
RESULTS In the three outcomes studied, different definitions of outcomes show comparable ability to differentiate true from false control cases (AUC) and similar estimated bias. However, broader definitions generating larger outcome cohorts allowed more drugs to be studied with sufficient statistical power. CONCLUSIONS Broader definitions are preferred since they allow studying drugs with lower outcome prevalence than the more precise or narrow definitions, while showing comparable performance in differentiating signal from no signal as well as in effect size estimation.
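The AUC metric used throughout these benchmark papers reduces to a rank comparison of effect estimates for known positive versus known negative controls. A minimal sketch (the control values below are invented):

```python
import numpy as np

def auc(pos, neg):
    """Probability that a randomly chosen positive control receives a larger
    effect estimate than a randomly chosen negative control (ties count
    half) -- the rank-sum form of the ROC AUC."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

# Estimated log-RRs for positive and negative drug-outcome control pairs:
score = auc([0.8, 1.2, 0.3, 0.9], [0.1, -0.2, 0.4, 0.0])
```

An AUC of 0.5 means the method cannot separate real effects from nulls at all; 1.0 means perfect discrimination.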
12
Ryan PB, Schuemie MJ, Welebob E, Duke J, Valentine S, Hartzema AG. Defining a reference set to support methodological research in drug safety. Drug Saf 2014; 36 Suppl 1:S33-47. [PMID: 24166222] [DOI: 10.1007/s40264-013-0097-8]
Abstract
BACKGROUND Methodological research to evaluate the performance of methods requires a benchmark to serve as a referent comparison. In drug safety, the performance of analyses of spontaneous adverse event reporting databases and observational healthcare data, such as administrative claims and electronic health records, has been limited by the lack of such standards. OBJECTIVES To establish a reference set of test cases that contain both positive and negative controls, which can serve as the basis for methodological research in evaluating method performance in identifying drug safety issues. RESEARCH DESIGN Systematic literature review and natural language processing of structured product labeling were performed to identify evidence to support the classification of drugs as either positive or negative controls for four outcomes: acute liver injury, acute kidney injury, acute myocardial infarction, and upper gastrointestinal bleeding. RESULTS Three hundred and ninety-nine test cases, comprising 165 positive controls and 234 negative controls, were identified across the four outcomes. The majority of positive controls for acute kidney injury and upper gastrointestinal bleeding were supported by randomized clinical trial evidence, while the majority of positive controls for acute liver injury and acute myocardial infarction were supported only by published case reports. Literature estimates for the positive controls show substantial variability that limits the ability to establish a reference set with known effect sizes. CONCLUSIONS A reference set of test cases can be established to facilitate methodological research in drug safety. Creating a sufficient sample of drug-outcome pairs with binary classification of having no effect (negative controls) or having an increased effect (positive controls) is possible and can enable estimation of predictive accuracy through discrimination.
Since the magnitude of the positive effects cannot be reliably obtained and the quality of evidence may vary across outcomes, assumptions are required to use the test cases in real data for purposes of measuring bias, mean squared error, or coverage probability.
Affiliation(s)
- Patrick B Ryan
- Janssen Research and Development LLC, 1125 Trenton-Harbourton Road, Room K30205, PO Box 200, Titusville, NJ, 08560, USA
13
Reich CG, Ryan PB, Suchard MA. The impact of drug and outcome prevalence on the feasibility and performance of analytical methods for a risk identification and analysis system. Drug Saf 2014; 36 Suppl 1:S195-204. [PMID: 24166235] [DOI: 10.1007/s40264-013-0112-0]
Abstract
BACKGROUND: A systematic risk identification system has the potential to study all marketed drugs. However, the rates of drug exposure and outcome occurrence in observational databases, the database size, and the desired risk detection threshold determine statistical power and therefore limit the feasibility of applying appropriate analytical methods. Drugs vary dramatically on these parameters because of prevalence of indication, cost, time on the market, payer formularies, market pressures, and clinical guidelines.
OBJECTIVES: To evaluate (i) the feasibility of a risk identification system based on commercially available observational databases; (ii) the range of drugs that can be studied for certain outcomes; (iii) the influence of underpowered drug-outcome pairs on the performance of analytical methods estimating the strength of their association; and (iv) the time required, from the introduction of a new drug, to accumulate sufficient data for signal detection.
METHODS: As part of the Observational Medical Outcomes Partnership experiment, we used data from commercially available observational databases and calculated the minimal detectable relative risk for all pairs of marketed drugs and eight health outcomes of interest. We then studied an array of analytical methods for their ability to distinguish between predetermined positive and negative drug-outcome test pairs. The positive controls contained active ingredients with evidence of a positive association with the outcome; the negative controls had no such evidence. As a performance measure we used the area under the receiver operating characteristic curve (AUC). We compared the AUC of methods using all test pairs with the AUC using only pairs sufficiently powered to detect a relative risk of 1.25. Finally, we studied all drugs introduced to the market in 2003-2008 and determined the time required to reach the same minimal detectable relative risk threshold.
RESULTS: The performance of the methods improved after restricting them to fully powered drug-outcome pairs. The availability of drug-outcome pairs with sufficient power to detect a relative risk of 1.25 varies enormously among outcomes. Depending on market uptake, a drug can generate relevant signals within the first month after approval or never reach sufficient power.
CONCLUSION: The prevalence of drug exposure and the incidence of important outcomes determine sample size, and with it method performance, in estimating drug-outcome associations. Careful consideration is therefore necessary when choosing databases and outcome definitions, particularly for newly introduced drugs.
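The minimal detectable relative risk (MDRR) calculation described in this abstract can be sketched with the standard normal approximation for a log rate ratio, iterating because the expected exposed event count itself scales with the relative risk. This is an illustrative simplification under assumed inputs (the function name and iteration scheme are mine), not the OMOP implementation.

```python
import math
from statistics import NormalDist


def minimal_detectable_rr(exposed_events_null, comparator_events,
                          alpha=0.05, power=0.80, iters=50):
    """Approximate the minimal detectable relative risk for a two-group
    rate comparison, using var(log RR) ~ 1/e1 + 1/e0.

    exposed_events_null -- expected outcome count in the exposed group
                           if the drug had no effect (RR = 1)
    comparator_events   -- expected outcome count in the comparator group
    """
    # Combined critical value for a two-sided test at the given power.
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    rr = 1.0
    for _ in range(iters):
        # Under the alternative, the exposed count grows with RR,
        # so solve for the fixed point.
        e1 = exposed_events_null * rr
        se = math.sqrt(1 / e1 + 1 / comparator_events)
        rr = math.exp(z * se)
    return rr
```

With 100 expected exposed events under the null and a large comparator pool, this yields an MDRR close to the 1.25 threshold used in the study, which is why moderately common drug-outcome pairs sit right at the feasibility boundary.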
14
Ryan PB, Schuemie MJ, Madigan D. Empirical performance of a self-controlled cohort method: lessons for developing a risk identification and analysis system. Drug Saf 2014; 36 Suppl 1:S95-106. [PMID: 24166227 DOI: 10.1007/s40264-013-0101-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
BACKGROUND: Observational healthcare data offer the potential to identify risks of medical products, but appropriate methodology has not yet been defined. The self-controlled cohort method, which compares the post-exposure outcome rate with the pre-exposure rate within an exposed cohort, has been proposed as a potential approach for risk identification, but its performance has not been fully assessed.
OBJECTIVES: To evaluate the performance of the self-controlled cohort method as a tool for risk identification in observational healthcare data.
RESEARCH DESIGN: The method was applied to 399 drug-outcome scenarios (165 positive controls and 234 negative controls across 4 health outcomes of interest) in 5 real observational databases (4 administrative claims and 1 electronic health record) and in 6 simulated datasets (no effect and injected relative risks of 1.25, 1.5, 2, 4, and 10).
MEASURES: Method performance was evaluated through area under the ROC curve (AUC), bias, and coverage probability.
RESULTS: The self-controlled cohort design achieved strong predictive accuracy across the outcomes and databases under study, with the top-performing settings reaching AUC > 0.76 in all scenarios. However, the estimates generated were highly biased, with low coverage probability.
CONCLUSIONS: If the objective of a risk identification system is discrimination, the self-controlled cohort method shows promise. However, if the system is intended to generate effect estimates that quantify the magnitude of potential risks, the method may not be suitable: its estimates require substantial calibration before their confidence intervals can be interpreted at their nominal properties.
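The self-controlled cohort estimate described above, a post- versus pre-exposure incidence rate ratio, can be sketched as follows. This is a minimal crude version with a Wald interval on the log scale (function and argument names are illustrative), not the OMOP implementation, which involves additional design choices such as time-at-risk windows.

```python
import math


def scc_estimate(pre_events, pre_time, post_events, post_time, z=1.96):
    """Crude self-controlled cohort estimate: the incidence rate ratio of
    the post-exposure outcome rate to the pre-exposure rate in the same
    exposed cohort, with a 95% Wald CI on the log scale.

    *_events -- outcome counts in each window
    *_time   -- person-time accumulated in each window
    """
    irr = (post_events / post_time) / (pre_events / pre_time)
    # Poisson variance of log(IRR) ~ 1/post_events + 1/pre_events.
    half_width = z * math.sqrt(1 / post_events + 1 / pre_events)
    lo = irr * math.exp(-half_width)
    hi = irr * math.exp(half_width)
    return irr, lo, hi
```

Because the same patients serve as their own controls, time-invariant confounders cancel, but time-varying factors (e.g. the indication that triggered exposure) do not, which is one source of the bias the abstract reports.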
Affiliation(s)
- Patrick B Ryan
- Janssen Research and Development LLC, 1125 Trenton-Harbourton Road, Room K30205, PO Box 200, Titusville, NJ, 08560, USA
15
DuMouchel W, Ryan PB, Schuemie MJ, Madigan D. Evaluation of disproportionality safety signaling applied to healthcare databases. Drug Saf 2014; 36 Suppl 1:S123-32. [PMID: 24166229 DOI: 10.1007/s40264-013-0106-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
OBJECTIVE: To evaluate the performance of a disproportionality design, commonly used for the analysis of spontaneous-report data such as the FDA Adverse Event Reporting System database, as a potential analytical method for an adverse drug reaction risk identification system using healthcare data.
RESEARCH DESIGN: We tested the disproportionality design in 5 real observational healthcare databases and 6 simulated datasets, retrospectively studying the predictive accuracy of the method when applied to a collection of 165 positive controls and 234 negative controls across 4 outcomes: acute liver injury, acute myocardial infarction, acute kidney injury, and upper gastrointestinal bleeding.
MEASURES: We estimated how well the method can be expected to identify true effects and discriminate them from false findings, and explored the statistical properties of the estimates the design generates. The primary measure was the area under the receiver operating characteristic (ROC) curve (AUC).
RESULTS: For each combination of the 4 outcomes and 5 databases, 48 versions of disproportionality analysis (DPA) were carried out and the AUC computed. The majority of the AUC values fell in the range 0.35 < AUC < 0.6, which indicates poor predictive accuracy, since AUC = 0.5 would be expected from mere random assignment. Several DPA versions achieved an AUC of about 0.7 for the outcome acute renal failure in the GE database. The best-performing DPA version across all 20 outcome-database combinations was the Bayesian information component method with no stratification by age and gender, using the first occurrence of the outcome and an assumed time at risk equal to the duration of exposure plus 30 days, but no version was uniformly optimal. The relative risk estimates for the negative-control drug-event combinations were very often biased upward or downward by a factor of 2 or more, and coverage probabilities of the confidence intervals from all methods were far below nominal.
CONCLUSIONS: The disproportionality methods we evaluated did not discriminate true positives from true negatives using healthcare data as they appear to do using spontaneous-report data.
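The two quantities at the heart of this abstract, a disproportionality statistic and the AUC over labeled controls, can be sketched as below. The information component shown is the crude (non-Bayesian) observed-over-expected version; the Bayesian method named in the abstract additionally shrinks sparse counts toward zero, which is omitted here, and all names are illustrative.

```python
import math


def information_component(n_drug_event, n_drug, n_event, n_total):
    """Crude information component: log2 of the observed drug-event
    co-occurrence count over the count expected if drug and event
    occurred independently in the database."""
    expected = n_drug * n_event / n_total
    return math.log2(n_drug_event / expected)


def auc(positive_scores, negative_scores):
    """Mann-Whitney AUC: the probability that a randomly chosen positive
    control receives a higher score than a randomly chosen negative
    control, with ties counted as half."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in positive_scores for n in negative_scores)
    return wins / (len(positive_scores) * len(negative_scores))
```

Scoring every positive and negative control pair with the first function and feeding the scores to the second is, in outline, how a method version earns the per-outcome, per-database AUC values reported above.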
16
Ball R. Perspectives on the future of postmarket vaccine safety surveillance and evaluation. Expert Rev Vaccines 2014; 13:455-62. [DOI: 10.1586/14760584.2014.891941] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
17
Affiliation(s)
- Stephen J W Evans
- Department of Medical Statistics, Room 37b, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
18
Schuemie MJ, Madigan D, Ryan PB. Empirical Performance of LGPS and LEOPARD: Lessons for Developing a Risk Identification and Analysis System. Drug Saf 2013; 36 Suppl 1:S133-42. [DOI: 10.1007/s40264-013-0107-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]