1
|
Webster-Clark M, Stürmer T, Wang T, Man K, Marinac-Dabic D, Rothman KJ, Ellis AR, Gokhale M, Lunt M, Girman C, Glynn RJ. Using propensity scores to estimate effects of treatment initiation decisions: State of the science. Stat Med 2020; 40:1718-1735. [PMID: 33377193 DOI: 10.1002/sim.8866] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 02/20/2020] [Revised: 12/11/2020] [Accepted: 12/14/2020] [Indexed: 02/02/2023]
Abstract
Confounding can cause substantial bias in nonexperimental studies that aim to estimate causal effects. Propensity score methods allow researchers to reduce bias from measured confounding by summarizing the distributions of many measured confounders in a single score based on the probability of receiving treatment. This score can then be used to mitigate imbalances in the distributions of these measured confounders between those who received the treatment of interest and those in the comparator population, resulting in less biased treatment effect estimates. This methodology was formalized by Rosenbaum and Rubin in 1983 and, since then, has been used increasingly often across a wide variety of scientific disciplines. In this review article, we provide an overview of propensity scores in the context of real-world evidence generation with a focus on their use in the setting of single treatment decisions, that is, choosing between two therapeutic options. We describe five aspects of propensity score analysis: alignment with the potential outcomes framework, implications for study design, estimation procedures, implementation options, and reporting. We add context to these concepts by highlighting how the types of comparator used, the implementation method, and balance assessment techniques have changed over time. Finally, we discuss evolving applications of propensity scores.
|
Review |
5 |
58 |
2
|
Stürmer T, Webster-Clark M, Lund JL, Wyss R, Ellis AR, Lunt M, Rothman KJ, Glynn RJ. Propensity Score Weighting and Trimming Strategies for Reducing Variance and Bias of Treatment Effect Estimates: A Simulation Study. Am J Epidemiol 2021; 190:1659-1670. [PMID: 33615349 DOI: 10.1093/aje/kwab041] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 07/06/2020] [Revised: 02/05/2021] [Accepted: 02/15/2021] [Indexed: 12/30/2022] Open
Abstract
To extend previous simulations on the performance of propensity score (PS) weighting and trimming methods to settings without and with unmeasured confounding, Poisson outcomes, and various strengths of treatment prediction (PS c statistic), we simulated studies with a binary intended treatment T as a function of 4 measured covariates. We mimicked treatment withheld and last-resort treatment by adding 2 "unmeasured" dichotomous factors that directed treatment to change for some patients in both tails of the PS distribution. The number of outcomes Y was simulated as a Poisson function of T and confounders. We estimated the PS as a function of measured covariates and trimmed the tails of the PS distribution using 3 strategies ("Crump," "Stürmer," and "Walker"). After trimming and reestimation, we used alternative PS weights to estimate the treatment effect (rate ratio): inverse probability of treatment weighting, standardized mortality ratio (SMR)-treated, SMR-untreated, the average treatment effect in the overlap population (ATO), matching, and entropy. With no unmeasured confounding, the ATO (123%) and "Crump" trimming (112%) improved relative efficiency compared with untrimmed inverse probability of treatment weighting. With unmeasured confounding, untrimmed estimates were biased irrespective of weighting method, and only Stürmer and Walker trimming consistently reduced bias. In settings where unmeasured confounding (e.g., frailty) may lead physicians to withhold treatment, Stürmer and Walker trimming should be considered before primary analysis.
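The weighting schemes named in this abstract can be sketched from a single estimated propensity score. The block below is only an illustration using the textbook forms of each weight; the function name `ps_weights` is hypothetical, and the exact definitions used in the simulation study are an assumption here, not something the abstract spells out.

```python
import math


def ps_weights(treated, ps):
    """Return common propensity score weights for one person.

    treated: True if the person received the treatment of interest.
    ps: estimated probability of treatment given measured covariates.
    The formulas are the standard textbook forms (an assumption here).
    """
    if not 0.0 < ps < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    # Probability of the treatment the person actually received.
    p_received = ps if treated else 1.0 - ps
    # Entropy tilting function, largest where ps is close to 0.5.
    entropy = -(ps * math.log(ps) + (1.0 - ps) * math.log(1.0 - ps))
    return {
        "iptw": 1.0 / p_received,
        "smr_treated": 1.0 if treated else ps / (1.0 - ps),
        "smr_untreated": (1.0 - ps) / ps if treated else 1.0,
        "ato": 1.0 - p_received,
        "matching": min(ps, 1.0 - ps) / p_received,
        "entropy": entropy / p_received,
    }


# Illustrative values only: a treated and an untreated person with ps = 0.25.
w_treated = ps_weights(True, 0.25)
w_untreated = ps_weights(False, 0.25)
```

Weights such as ATO, matching, and entropy shrink toward zero in the tails of the propensity score distribution, which is why they behave somewhat like a smooth form of trimming.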
|
Research Support, N.I.H., Extramural |
4 |
45 |
3
|
Hong JL, Jonsson Funk M, LoCasale R, Dempster SE, Cole SR, Webster-Clark M, Edwards JK, Stürmer T. Generalizing Randomized Clinical Trial Results: Implementation and Challenges Related to Missing Data in the Target Population. Am J Epidemiol 2018; 187:817-827. [PMID: 29020193 DOI: 10.1093/aje/kwx287] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/14/2016] [Accepted: 07/25/2017] [Indexed: 01/02/2023] Open
Abstract
Statins are indicated in patients with elevated levels of high-sensitivity C-reactive protein and normal low-density lipoprotein cholesterol based on results of the multicountry trial, Justification for the Use of Statins in Prevention: an Intervention Trial Evaluating Rosuvastatin (JUPITER) (2003-2008), but the benefit in real-world populations remains unknown. We sought to generalize JUPITER results to a trial-eligible population using data from the UK Clinical Practice Research Datalink (CPRD), 2001-2014. We multiply imputed missing baseline characteristics for the CPRD population and selected the trial-eligible participants as the target population based on observed and imputed values. Trial participants were weighted to be representative of the CPRD population (n = 383,418) based on individual predicted probability of selection into the trial. Trial participants were also standardized to the CPRD population without missing values (n = 2,677). In JUPITER, rosuvastatin reduced cardiovascular risk with a 3-year risk difference of -2.0% (95% confidence interval (CI): -2.9, -1.1). The rosuvastatin effect was muted in the first 2 years but remained strong at 3 years after standardizing to the imputed CPRD population (3-year risk difference = -2.7%; 95% CI: -5.8, 0.4) and the CPRD population without missing data (3-year risk difference = -1.7%; 95% CI: -3.5, 0.1). The study serves as an illustration of possible approaches to understanding generalizability of trials using real-world databases given limitations due to missing data on inclusion/exclusion criteria.
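Weighting trial participants by their predicted probability of selection into the trial can be sketched as follows. This is a minimal illustration that assumes inverse odds of sampling weights, a common choice when the target population is separate from the trial; the abstract does not state the exact weight form, and the function names are hypothetical.

```python
def inverse_odds_weight(p_trial):
    """Inverse odds of trial selection for one trial participant.

    p_trial: predicted probability of being in the trial (rather than the
    target population) given covariates, from a model fit to stacked data.
    """
    if not 0.0 < p_trial < 1.0:
        raise ValueError("selection probability must lie strictly between 0 and 1")
    return (1.0 - p_trial) / p_trial


def weighted_risk(outcomes, weights):
    """Weighted proportion of participants with the outcome (1 = event)."""
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Illustrative values only: two participants in one trial arm.
example_weight = inverse_odds_weight(0.2)
example_risk = weighted_risk([1, 0], [3.0, 1.0])
```

A weighted risk difference would then be the difference between `weighted_risk` computed separately in each randomized arm.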
|
Randomized Controlled Trial |
7 |
20 |
4
|
Webster-Clark M, Jaeger B, Zhong Y, Filler G, Alvarez-Elias A, Franceschini N, Díaz-González de Ferris ME. Low agreement between modified-Schwartz and CKD-EPI eGFR in young adults: a retrospective longitudinal cohort study. BMC Nephrol 2018; 19:194. [PMID: 30081844 PMCID: PMC6080537 DOI: 10.1186/s12882-018-0995-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/25/2018] [Accepted: 07/26/2018] [Indexed: 01/22/2023] Open
Abstract
Background While there is a great deal of research updating methods for estimating renal function, many of these methods are being developed in either adults with CKD or younger children. Currently, there is limited understanding of the agreement between the modified new bedside Schwartz estimated glomerular filtration rate (eGFR) formula and the adult CKD-EPI formula in adolescents and young adults (AYAs) with chronic kidney disease (CKD) measured longitudinally. Methods Longitudinal cohort study of 242 patients (10–30 years) with CKD, followed retrospectively in a single tertiary centre as they transitioned from the paediatric- to adult-focused settings. The study population came from a longitudinal cohort of AYAs undergoing healthcare transition at the STARx Program at the University of North Carolina, in the South-Eastern USA, from 2006 to 2015. We calculated and compared the eGFR using the new bedside Schwartz formula and the CKD-EPI eGFR. Measurements were repeated for each age in years. Agreement was tested using Bland & Altman analysis. Subgroup analysis was performed using the following age groups 10–15, 15–20, 20–25 and 25–30 years, glomerular and non-glomerular causes of CKD and height z-score. Results Using repeated measures, concordance between the new Schwartz and CKD-EPI eGFR was low at 0.74 (95% C.I. 0.67, 0.79) at the lowest age range of 10–15, 0.78 (95% C.I. 0.71, 0.84) at age 15–20, 0.80 (0.70, 0.87) at ages 20–25, and 0.82 (95% C.I. 0.70, 0.90) at age 25–30. Discordance was worse in males and largest in the 10–15 year-old age group, and in patients with stunted growth. Conclusions The Schwartz and CKD-EPI equations exhibit poor agreement in patients before and during the transition period with CKD-EPI consistently yielding higher eGFRs, especially in males. Further studies are required to determine the appropriate age for switching to the CKD-EPI equation after age 18. 
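The Bland and Altman agreement analysis mentioned in the Methods reduces to a mean difference and limits of agreement. The sketch below assumes one paired measurement per person and hypothetical eGFR values; it does not handle the repeated measures per patient that the study describes, which would need an adjusted variance.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (mean difference, lower limit, upper limit) of agreement.

    method_a, method_b: paired measurements of the same quantity,
    e.g. eGFR from two formulas for the same patients.
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have the same length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired values, not data from the study.
bias, lower, upper = bland_altman([10.0, 12.0, 14.0], [9.0, 10.0, 11.0])
```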
|
Observational Study |
7 |
17 |
5
|
Hong JL, Webster-Clark M, Jonsson Funk M, Stürmer T, Dempster SE, Cole SR, Herr I, LoCasale R. Comparison of Methods to Generalize Randomized Clinical Trial Results Without Individual-Level Data for the Target Population. Am J Epidemiol 2019; 188:426-437. [PMID: 30312378 DOI: 10.1093/aje/kwy233] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 01/04/2018] [Accepted: 10/05/2018] [Indexed: 01/24/2023] Open
Abstract
Our study explored the application of methods to generalize randomized controlled trial results to a target population without individual-level data. We compared 4 methods using aggregate data for the target population to generalize results from the international trial, Justification for the Use of Statins in Prevention: an Intervention Trial Evaluating Rosuvastatin (JUPITER), to a target population of trial-eligible patients in the UK Clinical Practice Research Datalink (CPRD). The gold-standard method used individual data from both the trial and CPRD to predict probabilities of being sampled in the trial and to reweight trial participants to reflect CPRD patient characteristics. Methods 1 and 2 used weighting methods based on simulated individual data or the method of moments, respectively. Method 3 weighted the trial's subgroup-specific treatment effects to match the distribution of an effect modifier in CPRD. Method 4 calculated the expected absolute benefits in CPRD assuming homogeneous relative treatment effect. Methods based on aggregate data for the target population generally yielded results between the trial and gold-standard estimates. Methods 1 and 2 yielded estimates closest to the gold-standard estimates when continuous effect modifiers were represented as categorical variables. Although individual data or data on joint distributions remains the best approach to generalize trial results, these methods using aggregate data might be useful tools for timely assessment of randomized trial generalizability.
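Method 3, which weights subgroup-specific trial effects to the distribution of an effect modifier in the target population, amounts to a weighted average. The sketch below is an illustration under that reading; the stratum labels, effect values, and function name are hypothetical and are not taken from the article.

```python
def standardized_effect(subgroup_effects, target_proportions):
    """Average subgroup-specific effects over the target distribution.

    subgroup_effects: dict mapping stratum label -> trial effect estimate
    (for example a risk difference) within that stratum.
    target_proportions: dict mapping the same labels -> proportion of the
    target population in that stratum; proportions must sum to 1.
    """
    if set(subgroup_effects) != set(target_proportions):
        raise ValueError("strata must match between the two inputs")
    if abs(sum(target_proportions.values()) - 1.0) > 1e-9:
        raise ValueError("target proportions must sum to 1")
    return sum(
        subgroup_effects[s] * target_proportions[s] for s in subgroup_effects
    )


# Hypothetical numbers for two age strata.
generalized_rd = standardized_effect(
    {"under_65": -0.01, "65_plus": -0.03},
    {"under_65": 0.4, "65_plus": 0.6},
)
```

This needs only the aggregate distribution of the modifier in the target population, which is what makes it usable without individual-level data.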
|
Multicenter Study |
6 |
15 |
6
|
Webster-Clark M, Ross RK, Lund JL. Initiator Types and the Causal Question of the Prevalent New-User Design: A Simulation Study. Am J Epidemiol 2021; 190:1341-1348. [PMID: 33350433 DOI: 10.1093/aje/kwaa283] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/10/2020] [Revised: 10/30/2020] [Accepted: 11/02/2020] [Indexed: 12/12/2022] Open
Abstract
New-user designs restricting to treatment initiators have become the preferred design for studying drug comparative safety and effectiveness using nonexperimental data. This design reduces confounding by indication and healthy-adherer bias at the cost of smaller study sizes and reduced external validity, particularly when assessing a newly approved treatment compared with standard treatment. The prevalent new-user design includes adopters of a new treatment who switched from or previously used standard treatment (i.e., the comparator), expanding study sample size and potentially broadening the study population for inference. Previous work has suggested the use of time-conditional propensity-score matching to mitigate prevalent user bias. In this study, we describe 3 "types" of initiators of a treatment: new users, direct switchers, and delayed switchers. Using these initiator types, we articulate the causal questions answered by the prevalent new-user design and compare them with those answered by the new-user design. We then show, using simulation, how conditioning on time since initiating the comparator (rather than full treatment history) can still result in a biased estimate of the treatment effect. When implemented properly, the prevalent new-user design estimates new and important causal effects distinct from the new-user design.
|
Research Support, N.I.H., Extramural |
4 |
13 |
7
|
Lesko CR, Ackerman B, Webster-Clark M, Edwards JK. Target validity: Bringing treatment of external validity in line with internal validity. Curr Epidemiol Rep 2021; 7:117-124. [PMID: 33585162 DOI: 10.1007/s40471-020-00239-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 02/02/2023]
Abstract
Purpose of Review "Target bias" is the difference between an estimate of association from a study sample and the causal effect in the target population of interest. It is the sum of internal and external bias. Given the extensive literature on internal validity, here, we review threats and methods to improve external validity. Recent findings External bias may arise when the distribution of modifiers of the effect of treatment differs between the study sample and the target population. Methods including those based on modeling the outcome, modeling sample membership, and doubly robust methods are available, assuming data on the target population is available. Summary The relevance of information for making policy decisions is dependent on both the actions that were studied and the sample in which they were evaluated. Combining methods for addressing internal and external validity can improve the policy relevance of study results.
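The statement that target bias is the sum of internal and external bias can be written out explicitly. The notation below is assumed for illustration and is not taken from the article: $\hat{\theta}$ is the study estimate, $\theta_{\text{sample}}$ the causal effect in the study sample, and $\theta_{\text{target}}$ the causal effect in the target population.

```latex
\begin{align*}
\underbrace{E[\hat{\theta}] - \theta_{\text{target}}}_{\text{target bias}}
  &= \underbrace{\left(E[\hat{\theta}] - \theta_{\text{sample}}\right)}_{\text{internal bias}}
   + \underbrace{\left(\theta_{\text{sample}} - \theta_{\text{target}}\right)}_{\text{external bias}}
\end{align*}
```

Under this reading, the external term is zero whenever the effect in the sample equals the effect in the target, for example when effect modifiers are distributed identically in both.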
|
|
4 |
10 |
8
|
Webster-Clark M, Jonsson Funk M, Stürmer T. Single-arm Trials With External Comparators and Confounder Misclassification: How Adjustment Can Fail. Med Care 2020; 58:1116-1121. [PMID: 32925456 PMCID: PMC7665993 DOI: 10.1097/mlr.0000000000001400] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]
Abstract
BACKGROUND "Single-arm trials" with external comparators that contrast outcomes in those on experimental therapy to real-world patients have been used to evaluate efficacy and safety of experimental drugs in rare and severe diseases. Regulatory agencies are considering expanding the role these studies can play; guidance thus far has explicitly considered outcome misclassification with little discussion of misclassification of confounding variables. OBJECTIVES This work uses causal diagrams to illustrate how adjustment for a misclassified confounder can result in estimates farther from the truth than ignoring it completely. This theory is augmented with quantitative examples using plausible values for misclassification of smoking in real-world pharmaceutical claims data. A tool is also provided for calculating bias of adjusted estimates with specific input parameters. RESULTS When confounder misclassification is similar in both data sources, adjustment generally brings estimates closer to the truth. When it is not, adjustment can generate estimates that are considerably farther from the truth than the crude. While all nonrandomized studies are subject to this potential bias, single-arm studies are particularly vulnerable due to perfect alignment of confounder measurement and treatment group. This is most problematic when the prevalence of the confounder does not differ between data sources and misclassification does, but can occur even with strong confounder-data source associations. DISCUSSION Researchers should consider differential confounder misclassification when designing protocols for these types of studies. Subsample validation of confounders, followed by imputation or other bias correction methods, may be a key tool for combining trial and real-world data going forward.
|
Research Support, N.I.H., Extramural |
5 |
9 |
9
|
Webster-Clark M, Huang TY, Hou L, Toh S. Translating claims-based CHA2DS2-VaSc and HAS-BLED to ICD-10-CM: Impacts of mapping strategies. Pharmacoepidemiol Drug Saf 2020; 29:409-418. [PMID: 32067286 DOI: 10.1002/pds.4973] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/04/2019] [Revised: 01/27/2020] [Accepted: 02/02/2020] [Indexed: 11/06/2022]
Abstract
PURPOSE The CHA2DS2-VaSc and HAS-BLED risk scores are commonly used in studies of oral anticoagulants (OACs). The best way to map these scores to International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes is unclear, as is how they perform in various types of OAC users. We aimed to assess the distributions of CHA2DS2-VaSc and HAS-BLED scores and C-statistics for outcome prediction in the ICD-10-CM era using different mapping strategies. METHODS We compared the distributions of CHA2DS2-VaSc and HAS-BLED scores from various mapping strategies in atrial fibrillation patients before, during, and after ICD-10-CM transition. We estimated the C-statistics predicting the 90-day risk of hospitalized stroke (for CHA2DS2-VaSc) or hospitalized bleeding (for HAS-BLED) in patients identified at least 6 months after the ICD-10-CM transition, overall and by anticoagulant type. RESULTS Forward-backward mapping produced higher CHA2DS2-VaSc and HAS-BLED scores in the ICD-10-CM era compared to the ICD-9-CM era: the mean difference was 0.074 (95% confidence interval 0.064-0.085) for CHA2DS2-VaSc and 0.055 (0.048-0.062) for HAS-BLED. Both scores had higher C-statistics in patients taking no OACs (0.697 [0.677-0.717] for CHA2DS2-VaSc; 0.719 [0.702-0.737] for HAS-BLED) or direct OACs (0.695 [0.654-0.735] for CHA2DS2-VaSc; 0.700 [0.673-0.728] for HAS-BLED) than those taking warfarin (0.655 [0.613-0.697] for CHA2DS2-VaSc; 0.663 [0.632-0.695] for HAS-BLED). CONCLUSIONS Existing mapping strategies generally preserved the distributions of CHA2DS2-VaSc and HAS-BLED scores after ICD-10-CM transition. Both scores performed better in patients on no OACs or direct OACs than patients on warfarin.
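The C-statistic reported here for a risk score and a binary outcome is the probability that a randomly chosen patient with the event has a higher score than one without it. The sketch below computes it by direct pairwise comparison, counting ties as one half; the function name and example values are hypothetical, and the study's own estimation procedure is not described in the abstract.

```python
def c_statistic(scores, outcomes):
    """Concordance between a risk score and a binary outcome.

    scores: numeric risk scores, one per patient.
    outcomes: 1 if the patient had the event within the window, else 0.
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    event_scores = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevent_scores = [s for s, y in zip(scores, outcomes) if y == 0]
    if not event_scores or not nonevent_scores:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in event_scores:
        for n in nonevent_scores:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5  # tied scores count as half
    return concordant / (len(event_scores) * len(nonevent_scores))


# Hypothetical integer scores for four patients, two with the event.
example_c = c_statistic([1, 3, 2, 4], [0, 0, 1, 1])
```

Because these scores take few integer values, ties are frequent, so the tie handling matters more here than with a continuous predictor.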
|
|
5 |
9 |
10
|
Duchesneau ED, Jackson BE, Webster-Clark M, Lund JL, Reeder-Hayes KE, Nápoles AM, Strassle PD. The Timing, the Treatment, the Question: Comparison of Epidemiologic Approaches to Minimize Immortal Time Bias in Real-World Data Using a Surgical Oncology Example. Cancer Epidemiol Biomarkers Prev 2022; 31:2079-2086. [PMID: 35984990 PMCID: PMC9627261 DOI: 10.1158/1055-9965.epi-22-0495] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/02/2022] [Revised: 07/01/2022] [Accepted: 08/17/2022] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Studies evaluating the effects of cancer treatments are prone to immortal time bias that, if unaddressed, can lead to treatments appearing more beneficial than they are. METHODS To demonstrate the impact of immortal time bias, we compared results across several analytic approaches (dichotomous exposure, dichotomous exposure excluding immortal time, time-varying exposure, landmark analysis, clone-censor-weight method), using surgical resection among women with metastatic breast cancer as an example. All adult women diagnosed with incident metastatic breast cancer from 2013-2016 in the National Cancer Database were included. To quantify immortal time bias, we also conducted a simulation study where the "true" relationship between surgical resection and mortality was known. RESULTS 24,329 women (median age 61, IQR 51-71) were included, and 24% underwent surgical resection. The largest association between resection and mortality was observed when using a dichotomized exposure [HR, 0.54; 95% confidence interval (CI), 0.51-0.57], followed by dichotomous with exclusion of immortal time (HR, 0.62; 95% CI, 0.59-0.65). Results from the time-varying exposure, landmark, and clone-censor-weight method analyses were closer to the null (HR, 0.67-0.84). Results from the plasmode simulation found that the time-varying exposure, landmark, and clone-censor-weight method models all produced unbiased HRs (bias -0.003 to 0.016). Both standard dichotomous exposure (HR, 0.84; bias, -0.177) and dichotomous with exclusion of immortal time (HR, 0.93; bias, -0.074) produced meaningfully biased estimates. CONCLUSIONS Researchers should use time-varying exposures with a treatment assessment window or the clone-censor-weight method when immortal time is present. IMPACT Using methods that appropriately account for immortal time will improve evidence and decision-making from research using real-world data.
|
Research Support, N.I.H., Extramural |
3 |
9 |
11
|
Webster-Clark M, Breskin A. Directed Acyclic Graphs, Effect Measure Modification, and Generalizability. Am J Epidemiol 2021; 190:322-327. [PMID: 32840557 DOI: 10.1093/aje/kwaa185] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/24/2020] [Revised: 08/11/2020] [Accepted: 08/21/2020] [Indexed: 11/13/2022] Open
Abstract
Directed acyclic graphs (DAGs) have had a major impact on the field of epidemiology by providing straightforward graphical rules for determining when estimates are expected to lack causally interpretable internal validity. Much less attention has been paid, however, to what DAGs can tell researchers about effect measure modification and external validity. In this work, we describe 2 rules based on DAGs related to effect measure modification. Rule 1 states that if a variable, $P$, is conditionally independent of an outcome, $Y$, within levels of a treatment, $X$, then $P$ is not an effect measure modifier for the effect of $X$ on $Y$ on any scale. Rule 2 states that if $P$ is not conditionally independent of $Y$ within levels of $X$, and there are open causal paths from $X$ to $Y$ within levels of $P$, then $P$ is an effect measure modifier for the effect of $X$ on $Y$ on at least 1 scale (given no exact cancelation of associations). We then show how Rule 1 can be used to identify sufficient adjustment sets to generalize nested trials studying the effect of $X$ on $Y$ to the total source population or to those who did not participate in the trial.
|
Journal Article |
4 |
9 |
12
|
Webster-Clark M, Keil AP, Robert N, Frytak JR, Boyd M, Stürmer T, Sanoff H, Westreich D, Lund JL. Comparing Trial and Real-world Adjuvant Oxaliplatin Delivery in Patients With Stage III Colon Cancer Using a Longitudinal Cumulative Dose. JAMA Oncol 2022; 8:2797492. [PMID: 36227604 PMCID: PMC9562097 DOI: 10.1001/jamaoncol.2022.4445] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/29/2022] [Accepted: 07/13/2022] [Indexed: 11/14/2022]
Abstract
Importance Delivery of adjuvant chemotherapy can differ substantially between trial and real-world populations. Adherence metrics like relative dose intensity (RDI) cannot capture the timing of modifications and mask differences in the total amount of chemotherapy received. Objective To compare oxaliplatin delivery between MOSAIC trial participants and patients treated in the US Oncology Network with stage III colon cancer using a longitudinal cumulative dose (LCD). Design, Setting, and Participants This cohort study used secondary data from the MOSAIC trial, an international randomized clinical trial (concluded in 2004), and electronic health records from US Oncology (2009-2018), a network of community oncology practices in the US. It included participants in MOSAIC with stage III colon cancer who were randomized to receive treatment with oxaliplatin and fluorouracil/leucovorin (n = 663) and US Oncology patients with stage III colon cancer who were treated with a modified FOLFOX-6 regimen (n = 2523). Exposures Oxaliplatin and fluorouracil/leucovorin. Outcomes and Measures We evaluated RDI and LCD over time and at the end of treatment in the MOSAIC and US Oncology populations. We used bootstrapping to estimate 95% confidence bands for LCD differences between the populations. Results The 663 MOSAIC participants (296 women [44.7%]) and 2523 US Oncology patients (1245 women [49.4%]) were generally similar with respect to demographic characteristics. Median RDI was lower in US Oncology (80% in MOSAIC vs 70% in US Oncology). The LCD also suggested differences in the total amount of oxaliplatin received between populations; the final median LCD in US Oncology was 10.2% lower than in MOSAIC, equivalent to receiving 1.2 fewer treatment cycles of oxaliplatin. This difference only began 133 days into treatment and persisted after accounting for covariates, likely reflecting more frequent oxaliplatin treatment discontinuation in US Oncology patients than their MOSAIC counterparts. Conclusions and Relevance The study results suggest that real-world patients in community practice in the US treated with modified FOLFOX-6 received less oxaliplatin than their historical counterparts in the MOSAIC trial, with differences manifesting late in the treatment course. The LCD allowed us to identify the amount and extent of these differences, the timing of which was unclear when using RDI alone. Trial Registration ClinicalTrials.gov identifier: NCT00275210.
|
brief-report |
3 |
5 |
13
|
Webster-Clark M, Stürmer T, Edwards JK, Poole C, Simpson RJ, Lund JL. Real-world on-treatment and initial treatment absolute risk differences for dabigatran vs warfarin in older US adults. Pharmacoepidemiol Drug Saf 2020; 29:832-841. [PMID: 32666678 DOI: 10.1002/pds.5069] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/15/2019] [Revised: 05/05/2020] [Accepted: 06/01/2020] [Indexed: 11/07/2022]
Abstract
PURPOSE Trials and past observational work compared dabigatran and warfarin in patients with atrial fibrillation, but few reported estimates of absolute harm and benefit under real-world adherence patterns, particularly in older adults that may have differing benefit-harm profiles. We aimed to estimate risk differences for ischemic stroke, death, and gastrointestinal bleeding after initiating dabigatran and warfarin in older adults (a) when patients adhere to treatment and (b) under real-world adherence patterns. METHODS In a 20% sample of nationwide Medicare claims from 2010 to 2015, we identified beneficiaries aged 66 years and older initiating warfarin and dabigatran. We followed individuals from initiation until death or October 2015 (initial treatment, IT) and separately censored individuals' follow-up after drug switches and gaps in supply (on-treatment, OT). We applied inverse probability of treatment and standardized morbidity ratio weights, as well as inverse probability of censoring weights, to estimate two-year risk differences (RDs) for dabigatran vs warfarin. RESULTS We identified 10,717 dabigatran and 74,891 warfarin initiators. Weighted OT RDs suggested decreased ischemic stroke risk for dabigatran vs warfarin; IT RDs indicated increased or no change in ischemic stroke risk. Regardless of follow-up approach and weighting strategy, risk of death appeared lower and risk of gastrointestinal bleeding appeared higher when comparing dabigatran vs warfarin. CONCLUSIONS Dabigatran use was associated with lower risks of mortality and ischemic stroke in routine care when older adults stayed on treatment. IT analyses suggested that these benefits may be diminished under real-world patterns of switching and discontinuation.
|
Research Support, N.I.H., Extramural |
5 |
5 |
14
|
Webster-Clark M, Keil AP, Sanoff HK, Stürmer T, Westreich D, Lund JL. Introducing longitudinal cumulative dose to describe chemotherapy patterns over time: Case study of a colon cancer trial. Int J Cancer 2021; 149:394-402. [PMID: 33729546 DOI: 10.1002/ijc.33565] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/27/2020] [Revised: 02/09/2021] [Accepted: 02/22/2021] [Indexed: 01/07/2023]
Abstract
Adjuvant chemotherapy regimens take months to complete. Despite this, studies evaluate chemotherapy adherence via measures assessed at the end of treatment (eg, number of patients missing any dose, relative dose intensity [RDI]). This approach ignores information like the timing of treatment delays. We propose longitudinal cumulative dose (LCD) to integrate impacts of dose reductions, missed doses and dose delays over time. We obtained data from the 2246 participants in the MOSAIC trial randomized to FOLFOX (all three agents) or 5-FU/LV (only 5-fluorouracil and leucovorin). We evaluated proportions of patients stopping treatment early and reducing, missing or delaying a dose in each arm for each chemotherapy agent at each cycle. We calculated LCD, the fraction of the final standard dose a participant reached by a given day, for each participant and each agent and compared it over time and at 24 weeks between treatment arms. Participants randomized to FOLFOX were more likely to stop treatment, reduce doses, miss doses or delay cycles; these differences increased over time. Median LCD for oxaliplatin in the FOLFOX arm at 24 weeks was 77%. The LCD for 5-fluorouracil differed between arms (FOLFOX arm median: 81%; 5-FU/LV arm median: 96%). Visualizing LCD highlighted the timing of deviations from standard administration in a way RDI could not, with major differences in 5-fluorouracil LCD across treatment arms beginning after the sixth dose. Further evaluation of LCD and its impacts on clinical outcomes may clarify mechanisms for heterogeneous patient outcomes.
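The definition of LCD given above, the fraction of the final standard dose a participant has reached by a given day, can be sketched directly. The function name, the input layout, and the dose values below are assumptions made for illustration, not details from the article.

```python
def longitudinal_cumulative_dose(administrations, standard_total_dose, day):
    """Fraction of the full standard dose received on or before `day`.

    administrations: list of (day_given, dose) pairs for one agent and
    one participant, with days counted from the start of treatment.
    standard_total_dose: total dose if every cycle were given in full and
    on schedule, in the same units as the administered doses.
    """
    if standard_total_dose <= 0:
        raise ValueError("standard total dose must be positive")
    received = sum(dose for day_given, dose in administrations if day_given <= day)
    return received / standard_total_dose


# Hypothetical participant: two full doses, then a delayed, reduced third dose.
doses = [(0, 85.0), (14, 85.0), (35, 64.0)]
standard_total = 85.0 * 12  # assumed schedule of 12 cycles at 85 per cycle
lcd_day_14 = longitudinal_cumulative_dose(doses, standard_total, 14)
lcd_day_40 = longitudinal_cumulative_dose(doses, standard_total, 40)
```

Evaluating the function over a grid of days gives a curve per participant, which is what lets delays and reductions show up at the time they occur rather than only at the end of treatment.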
|
Research Support, N.I.H., Extramural |
4 |
5 |
15
|
Webster-Clark M, Mavros P, Garry EM, Stürmer T, Shmuel S, Young J, Girman C. Alternative analytic and matching approaches for the prevalent new-user design: A simulation study. Pharmacoepidemiol Drug Saf 2022; 31:796-803. [PMID: 35505471 DOI: 10.1002/pds.5446] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/15/2021] [Revised: 04/21/2022] [Accepted: 04/22/2022] [Indexed: 12/15/2022]
Abstract
PURPOSE To describe the creation of prevalent new user (PNU) cohorts and compare the relative bias and computational efficiency of several alternative analytic and matching approaches in PNU studies. METHODS In a simulated cohort, we estimated the effect of a treatment of interest vs a comparator among those who switched to the treatment of interest using the originally proposed time-conditional propensity score (TCPS) matching, standardized morbidity ratio weighting (SMRW), disease risk scores (DRS), and several alternative propensity score matching approaches. For each analytic method, we compared the average RR (across 2000 replicates) to the known risk ratio (RR) of 1.00. RESULTS SMRW and DRS yielded unbiased results (RR = 0.998 and 0.997, respectively). TCPS matching with replacement was also unbiased (RR = 0.999). TCPS matching without replacement was unbiased when matches were identified starting with patients with the shortest treatment history as initially proposed (RR = 0.999), but it resulted in very slight bias (RR = 0.983) when starting with patients with the longest treatment history. Similarly, creating a match pool without replacement starting with patients with the shortest treatment history yielded an unbiased estimate (RR = 0.997), but matching with the longest treatment history first resulted in substantial bias (RR = 0.903). The most biased strategy was matching after selecting one random comparator observation per individual that continued on the comparator (RR = 0.802). CONCLUSIONS Multiple analytic methods can estimate treatment effects without bias in a PNU cohort. Still, researchers should be wary of introducing bias when selecting controls for complex matching strategies beyond the initially proposed TCPS.
|
Research Support, N.I.H., Extramural |
3 |
4 |
16
|
Ross RK, Su IH, Webster-Clark M, Jonsson Funk M. Nondifferential Treatment Misclassification Biases Toward the Null? Not a Safe Bet for Active Comparator Studies. Am J Epidemiol 2022; 191:1917-1925. [PMID: 35882378 PMCID: PMC10144712 DOI: 10.1093/aje/kwac131] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/27/2021] [Revised: 05/04/2022] [Accepted: 07/21/2022] [Indexed: 02/01/2023]
Abstract
Active comparator studies are increasingly common, particularly in pharmacoepidemiology. In such studies, the parameter of interest is a contrast (difference or ratio) in the outcome risks between the treatment of interest and the selected active comparator. While treatment may appear dichotomous, it is actually polytomous because there are at least 3 levels: no treatment, the treatment of interest, and the active comparator. Because misclassification may occur between any of these groups, independent nondifferential treatment misclassification may not be toward the null (as expected with a dichotomous treatment). In this work, we describe bias from independent nondifferential treatment misclassification in active comparator studies with a focus on misclassification that occurs between each active treatment and no treatment. We derive equations for bias in the estimated outcome risks, risk difference, and risk ratio, and we provide bias correction equations that produce unbiased estimates, in expectation. Using US insurance claims data, we present a hypothetical comparative safety study of antibiotic treatment to illustrate factors that influence bias and provide an example probabilistic bias analysis using our derived bias correction equations.
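To make the three-level argument concrete, the following is a minimal sketch and not the authors' derived bias equations. It assumes that misclassification is unrelated to the outcome and occurs only between each active treatment and no treatment, so the observed risk in each classified group is a mixture of the true treated risk and the untreated risk. The function name `observed_risk`, the use of positive predictive values (`ppv`), and all numeric inputs are hypothetical.

```python
def observed_risk(true_risk_treated, risk_untreated, ppv):
    """Risk among people classified as treated when a fraction (1 - ppv)
    of them were actually untreated, with misclassification unrelated
    to the outcome."""
    return ppv * true_risk_treated + (1.0 - ppv) * risk_untreated


# Hypothetical inputs (not from the paper): both drugs truly carry the
# same risk, so the true risk ratio is exactly 1.
risk_interest = 0.10
risk_comparator = 0.10
risk_untreated = 0.30

# Assumed: classification is better for the treatment of interest
# than for the comparator.
obs_interest = observed_risk(risk_interest, risk_untreated, ppv=0.9)
obs_comparator = observed_risk(risk_comparator, risk_untreated, ppv=0.6)

true_rr = risk_interest / risk_comparator
observed_rr = obs_interest / obs_comparator
print(f"true RR = {true_rr:.2f}, observed RR = {observed_rr:.2f}")
```

With these assumed inputs the true risk ratio is null, yet the observed ratio is about 0.67, which illustrates how misclassification that is independent of the outcome can still move an active comparator estimate away from the null.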
|
Research Support, N.I.H., Extramural |
3 |
3 |
17
|
Webster-Clark M. Ways COVID-19 may impact unrelated pharmacoepidemiologic research using routinely collected data. Pharmacoepidemiol Drug Saf 2020; 30:400-401. [PMID: 33314441 DOI: 10.1002/pds.5182] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/25/2020] [Accepted: 12/07/2020] [Indexed: 11/07/2022]
|
Comment |
5 |
2 |
18
|
Webster-Clark M, Keil AP. How Effect Measure Choice Influences Minimally Sufficient Adjustment Sets for External Validity. Am J Epidemiol 2023:7051039. [PMID: 36813295 DOI: 10.1093/aje/kwad041] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/03/2022] [Revised: 12/01/2022] [Accepted: 02/21/2023] [Indexed: 02/24/2023]
Abstract
Epidemiologic researchers generalizing or transporting effect estimates from a study to a target must account for effect measure modifiers (EMMs) on the scale of interest. However, little attention is paid to how the EMMs required may vary depending on the mathematical nuances of each effect measure. We defined two types of EMM: marginal EMM, where the effect on the scale of interest differs across levels of a variable; and conditional EMM, where the effect differs conditional on other variables associated with the outcome. These types define three classes of variables: Class 1 (conditional EMM), Class 2 (marginal, but not conditional, EMM), or Class 3 (neither marginal nor conditional EMM). Class 1 variables are necessary to achieve a valid estimate of the RD in a target, while a RR requires Class 1 and Class 2 and an OR requires Class 1, Class 2, and Class 3 (i.e., all variables associated with the outcome). This does not mean that fewer variables are required for an externally valid RD (because variables may not modify effects on all scales) but does suggest researchers should consider the scale of the effect measure when identifying EMM necessary for an externally valid treatment effect estimate.
|
|
2 |
1 |
19
|
Ross RK, Kuo TM, Webster-Clark M, Lewis CL, Kistler CE, Jonsson Funk M, Lund JL. Validation of a 5-Year Mortality Prediction Model among U.S. Medicare Beneficiaries. J Am Geriatr Soc 2020; 68:2898-2902. [PMID: 32889756 DOI: 10.1111/jgs.16816] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/18/2020] [Revised: 07/21/2020] [Accepted: 08/11/2020] [Indexed: 12/23/2022]
Abstract
BACKGROUND/OBJECTIVES A claims-based model predicting 5-year mortality (Lund-Lewis) was developed in a 2008 cohort of North Carolina (NC) Medicare beneficiaries and included indicators of comorbid conditions, frailty, disability, and functional impairment. The objective of this study was to validate the Lund-Lewis model externally within a nationwide sample of Medicare beneficiaries. DESIGN Retrospective validation study. SETTING U.S. Medicare population. PARTICIPANTS From a random sample of Medicare beneficiaries, we created four annual cohorts from 2008 to 2011 of individuals aged 66 and older with an office visit in that year. The annual cohorts ranged from 1.13 to 1.18 million beneficiaries. MEASUREMENTS The outcome was 5-year all-cause mortality. We assessed clinical indicators in the 12 months before the qualifying office visit and estimated predicted 5-year mortality for each beneficiary in the nationwide sample by applying estimates derived in the original NC cohort. Model performance was assessed by quantifying discrimination, calibration, and reclassification metrics compared with a model fit on a comorbidity score. RESULTS Across the annual cohorts, 5-year mortality ranged from 24.4% to 25.5%. The model had strong discrimination (C-statistics ranged across cohorts from .823 to .826). Reclassification measures showed improvement over a comorbidity score model for beneficiaries who died but reduced performance among beneficiaries who survived. The calibration slope ranged from .83 to .86; the model generally predicted a higher risk than observed. CONCLUSION The Lund-Lewis model showed strong and consistent discrimination in a national U.S. Medicare sample, although calibration indicated slight overfitting. Future work should investigate methods for improving model calibration and evaluating performance within specific disease settings.
|
Research Support, N.I.H., Extramural |
5 |
1 |
20
|
Webster-Clark M, Sanoff HK, Keil AP, Sturmer T, Westreich D, Lund JL. Comparing FOLFOX delivery in trial and real-world populations using longitudinal cumulative dose. J Clin Oncol 2021. [DOI: 10.1200/jco.2021.39.15_suppl.1521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
1521 Background: Patterns of chemotherapy delivery are likely to differ between trial and real-world populations. Typical measures used to compare these patterns are calculated at treatment completion, potentially missing key differences in the timing and trajectory of delays and dose reductions. We used a new measure, longitudinal cumulative dose (LCD), to compare treatment delivery over time in trial and real-world populations. Methods: We compared chemotherapy delivery in patients with stage II-III colon cancer enrolled in the MOSAIC trial of 5-fluorouracil (5FU) vs oxaliplatin + 5FU (FOLFOX4) to patients treated from 2008-2019 in the US Oncology Network with FOLFOX4, FOLFOX6, or mFOLFOX6. For each patient, we computed oxaliplatin LCD as the cumulative oxaliplatin dose received at a given timepoint (t) divided by the final standard oxaliplatin dose. We then estimated the median and 25th and 75th percentiles for oxaliplatin LCD within each regimen at day 68 (before the standard timing of the 7th dose), 168 (two weeks after the standard end of treatment), and 250. Results: The table shows the number of patients receiving each treatment regimen and the median and interquartile range for oxaliplatin LCD at each time. Higher LCDs in the trial show delivery closer to standard treatment, meaning fewer delays, dose reductions, and discontinuations. Differences between the medians, 25th percentiles, and 75th percentiles of LCD in each regimen were small at day 68 but grew considerably by days 168 and 250. Conclusions: Divergence from the standard dosing schedule was larger in real-world versus trial settings and varied by oxaliplatin regimen. LCD, as a longitudinal measure, showed that differences in delivery between trial and real-world populations grew substantially over time (even after 168 days and the standard end of treatment) possibly as real-world patients experienced more side effects and barriers to treatment than trial participants. 
These discrepancies in LCD may cause poorer outcomes in real-world settings than expected based on randomized trials. [Table: see text]
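The LCD definition in the Methods (cumulative dose received by time t divided by the final standard dose) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `longitudinal_cumulative_dose`, the four-dose schedule, and the dose values are hypothetical and do not correspond to any FOLFOX regimen.

```python
def longitudinal_cumulative_dose(dose_days, doses, t, final_standard_dose):
    """Cumulative dose received on or before day t, divided by the total
    dose a patient would have received under the standard schedule."""
    received = sum(dose for day, dose in zip(dose_days, doses) if day <= t)
    return received / final_standard_dose


# Hypothetical standard schedule: 4 doses of 100 units, 14 days apart.
final_standard_dose = 4 * 100.0

# Patient treated exactly on schedule.
on_schedule = longitudinal_cumulative_dose(
    dose_days=[0, 14, 28, 42], doses=[100.0] * 4, t=30,
    final_standard_dose=final_standard_dose)

# Patient with one-week delays, a dose reduction, and early discontinuation.
delayed = longitudinal_cumulative_dose(
    dose_days=[0, 21, 42], doses=[100.0, 80.0, 80.0], t=30,
    final_standard_dose=final_standard_dose)

print(f"LCD at day 30: on schedule = {on_schedule:.2f}, delayed = {delayed:.2f}")
```

Evaluating the same function at several values of t is what lets the measure separate early delays from late discontinuation, which a single summary computed at treatment completion would miss.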
|
|
4 |
|
21
|
Webster-Clark M, Matthews AA, Ellis AR, Kinlaw AC, Platt RW. Using methods to extend inferences to specific target populations to improve the precision of subgroup analyses. J Clin Epidemiol 2025; 181:111716. [PMID: 39924128 DOI: 10.1016/j.jclinepi.2025.111716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 01/30/2025] [Accepted: 02/01/2025] [Indexed: 02/11/2025]
Abstract
OBJECTIVES While subgroup analyses are common in epidemiologic research, restriction to subgroup members can yield imprecise estimates. We aimed to demonstrate how methods extending inferences to external targets improve precision of subgroup estimates under the major assumption that effects differ between subgroup members and nonmembers due to measured effect measure modifiers (EMMs) and that membership is independent of the effect after conditioning on EMMs. STUDY DESIGN AND SETTING We applied this approach in the Panitumumab Randomized Trial in Combination with Chemotherapy for Metastatic Colorectal Cancer to Determine Efficacy. Assuming Hispanic vs non-Hispanic ethnicity was independent of the effect conditional on measured EMMs, we weighted non-Hispanic White participants to resemble Hispanic participants in EMMs, assigned Hispanic participants weights of 1, and estimated weighted 9-month progression-free survival differences (PFSDs) with 95% confidence limits from 2000 bootstraps. We also explored outcome-based approaches. Finally, we examined a situation where the method generates biased estimates (targeting participants with mutant-type Kirsten rat sarcoma virus (KRAS), which determines efficacy). RESULTS While the Hispanic participant-only analysis estimated a 9-month panitumumab PFSD of -7.1% (95% CI: -32%, 19%), the weighted combined estimate targeting Hispanic participants was much more precise (-3.7%, 95% CI: -16%, 9.2%). Other analytic approaches yielded similar results. Meanwhile, the weighted combined estimate targeting mutant-type KRAS participants appeared biased (-2.2%, 95% CI: -7.5%, 3.3%) vs the subgroup-only estimate (-11%, 95% CI: -18%, -2.3%). CONCLUSION While extending inferences from study populations to specific targets can improve the precision of estimates in small subgroups, violating key assumptions creates bias for many subgroups of interest.
PLAIN LANGUAGE SUMMARY Understanding the benefits and harms in specific subgroups of patients is an important part of epidemiologic and public health research. Unfortunately, commonly used methods to do subgroup analyses can result in estimates with lots of uncertainty. Repurposing methods that have traditionally been used to "generalize" or "transport" effect estimates from specific studies to the types of patients more likely to be encountered in the real world could be used to obtain more informative estimates in subgroups without ignoring differences between different types of patients. In this project, we applied this strategy to the Panitumumab Randomized Trial in Combination with Chemotherapy for Metastatic Colorectal Cancer to Determine Efficacy (PRIME) to create much less variable estimates of the treatment effect in Hispanic participants without ignoring the fact that there were more Hispanic participants with a tumor variation that changed the effect of treatment. On the other hand, when we tried to apply this strategy to improve estimates in patients with that tumor variation, we ended up with a misleading effect estimate. While these methods can reduce uncertainty about the benefits of treatment in specific subgroups interesting to researchers, they can result in incorrect subgroup estimates when their assumptions are violated.
|
Clinical Trial, Phase III |
1 |
|
22
|
Pak J, Lund JL, Keil A, Westreich D, Stürmer T, Wohl D, Farel C, Drummond MB, Webster-Clark M. A systematic review of whether COVID-19 randomized controlled trials reported on demographic and clinical characteristics. Pharmacoepidemiol Drug Saf 2022; 31:1219-1227. [PMID: 35996832 PMCID: PMC9538362 DOI: 10.1002/pds.5533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 06/18/2022] [Accepted: 08/19/2022] [Indexed: 12/15/2022]
Abstract
PURPOSE We aim to assess the reporting of key patient-level demographic and clinical characteristics among COVID-19 related randomized controlled trials (RCTs). METHODS We queried English-language articles from PubMed, Web of Science, clinicaltrials.gov, and the CDC library of gray literature databases using keywords of "coronavirus," "covid," "clinical trial" and "randomized controlled trial" from January 2020 to June 2021. From the search, we conducted an initial review to rule out duplicate entries, identify those that met inclusion criteria (i.e., had results), and exclude those that did not meet the definition of an RCT. Lastly, we abstracted the demographic and clinical characteristics reported on within each RCT. RESULTS From the initial 43 627 manuscripts, our final eligible manuscripts consisted of 149 RCTs described in 137 articles. Most of the RCTs (113/149) studied potential treatments, while fewer studied vaccines (29), prophylaxis strategies (5), and interventions to prevent transmission among those infected (2). Study populations ranged from 10 to 38 206 participants (median = 100, IQR: 60-300). All 149 RCTs reported on age, 147 on sex, 50 on race, and 110 on the prevalence of at least one comorbidity. No RCTs reported on income, urban versus rural residence, or other indicators of socioeconomic status (SES). CONCLUSIONS Limited reporting on race and other markers of SES makes it difficult to draw conclusions about specific external target populations without making strong assumptions that treatment effects are homogeneous. These findings highlight the need for more robust reporting on the clinical and demographic profiles of patients enrolled in COVID-19 related RCTs.
|
Systematic Review |
3 |
|
23
|
Shu D, Webster-Clark M, Platt RW, Toh S. Meta-analysis with sample-standardization in multi-site studies. Pharmacoepidemiol Drug Saf 2023; 32:56-59. [PMID: 35976190 DOI: 10.1002/pds.5527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 07/13/2022] [Accepted: 08/13/2022] [Indexed: 02/06/2023]
Abstract
PURPOSE To conceptualize a particular target population and estimand for multi-site pharmacoepidemiologic studies within data networks and to analytically examine sample-standardization as a meta-analytic method compared with inverse-variance weighted meta-analyses. METHODS The target population of interest is all and only all individuals from the data-contributing sites. Standardization, a general conditioning technique frequently employed for confounding control, was adopted to estimate the network-wide causal treatment effect. Specifically, the proposed sample-standardization yields a meta-analysis estimator, that is, a weighted summation of site-specific results, where the weight for a site is the proportion of its size in the entire network. This sample-standardization estimator was evaluated analytically in comparison to estimators from inverse-variance weighted fixed-effect and random-effects meta-analyses in terms of statistical consistency. RESULTS A proof is reported to justify the consistency of the sample-standardization estimator with and without treatment effect heterogeneity by site. Both inverse-variance weighted fixed-effect and random-effects meta-analyses were found to generally result in inconsistent estimators in the presence of treatment effect heterogeneity by site for this particular target population and estimand. CONCLUSIONS Sample-standardization is a valid approach to generate causal inference in multi-site studies when the target population comprises all and only all individuals within the network, even in the presence of heterogeneity of treatment effect by site. Multi-site studies should clearly specify the target population and estimand to help select the most appropriate meta-analytic methods.
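The contrast between the two pooling rules described above can be sketched numerically. This is a minimal illustration, assuming two sites with hypothetical risk differences, sizes, and variances; the function names `sample_standardized` and `inverse_variance_fixed` are made up for this example, and the random-effects estimator is omitted.

```python
def sample_standardized(estimates, sizes):
    """Weighted sum of site-specific estimates, where each site's weight
    is its share of the total network sample size."""
    total = sum(sizes)
    return sum(est * n / total for est, n in zip(estimates, sizes))


def inverse_variance_fixed(estimates, variances):
    """Fixed-effect meta-analysis: weights proportional to 1 / variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * est for w, est in zip(weights, estimates)) / sum(weights)


# Hypothetical two-site network with heterogeneous risk differences.
site_estimates = [0.02, 0.06]
site_sizes = [1000, 3000]
site_variances = [0.0001, 0.0004]  # assumed: the larger site is noisier

pooled_by_size = sample_standardized(site_estimates, site_sizes)
pooled_by_variance = inverse_variance_fixed(site_estimates, site_variances)
print(f"sample-standardized = {pooled_by_size:.3f}, "
      f"inverse-variance = {pooled_by_variance:.3f}")
```

When site-specific effects differ, the two rules answer different questions: the size-weighted sum targets everyone in the network, while the inverse-variance weights favor whichever site is most precise, regardless of how much of the target population it represents.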
|
Meta-Analysis |
2 |
|
24
|
Webster-Clark M, Filion KB, Platt RW. Standardizing to specific target populations in distributed networks and multisite pharmacoepidemiologic studies. Am J Epidemiol 2024; 193:1031-1039. [PMID: 38412261 PMCID: PMC11520739 DOI: 10.1093/aje/kwae015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 01/20/2024] [Accepted: 02/22/2024] [Indexed: 02/29/2024]
Abstract
Distributed network studies and multisite studies assess drug safety and effectiveness in diverse populations by pooling information. Targeting groups of clinical or policy interest (including specific sites or site combinations) and applying weights based on effect measure modifiers (EMMs) prior to pooling estimates within multisite studies may increase interpretability and improve precision. We simulated a 4-site study, standardized each site using inverse odds weights (IOWs) to resemble the 3 smallest sites or the smallest site, estimated IOW-weighted risk differences (RDs), and combined estimates with inverse variance weights (IVWs). We also created an artificial distributed network in the Clinical Practice Research Datalink (CPRD) Aurum consisting of 1 site for each geographic region. We compared metformin and sulfonylurea initiators with respect to mortality, targeting the smallest region. In the simulation, IOWs reduced differences between estimates and increased precision when targeting the 3 smallest sites or the smallest site. In the CPRD Aurum study, the IOW + IVW estimate was also more precise (smallest region: RD = 5.41% [95% CI, 1.03-9.79]; IOW + IVW estimate: RD = 3.25% [95% CI, 3.07-3.43]). When performing pharmacoepidemiologic research in distributed networks or multisite studies in the presence of EMMs, designation of target populations has the potential to improve estimate precision and interpretability. This article is part of a Special Collection on Pharmacoepidemiology.
|
research-article |
1 |
|
25
|
Lund JL, Webster-Clark M, Keil AP, Westreich D, Sturmer T, Sanoff HK. Effectiveness of adjuvant FOLFOX versus 5FU for colon cancer treatment in community oncology practice using a hybrid study approach. J Clin Oncol 2020. [DOI: 10.1200/jco.2020.38.15_suppl.7067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
7067 Background: Treatment effects may differ between trials and community settings, in part due to underrepresentation of certain patient subgroups in trials. We used a hybrid approach combining clinical trial and real-world data to compare the effectiveness of adjuvant FOLFOX vs 5FU for stage II-III colon cancer in community oncology practice. Methods: We used the Multicenter International Study of Oxaliplatin/5FU-LV in the Adjuvant Treatment of Colon Cancer (MOSAIC) combined with patients who met trial eligibility criteria within US Oncology from 1/1/2008-5/31/2019. In the combined data, we used logistic regression to estimate the probability of trial enrollment as a function of age, sex, substage, body mass index (BMI), and performance status. We estimated inverse odds of sampling weights and weighted MOSAIC participants to reflect three US Oncology populations: 1) patients meeting trial eligibility, 2) stage III patients, and 3) stage III patients initiating FOLFOX. Within the weighted trial populations, we estimated mortality hazard ratios (HRs) and bootstrapped 95% confidence intervals (CIs) comparing FOLFOX with 5FU. Results: There were 2246 MOSAIC participants and 9335 US Oncology patients. MOSAIC participants were younger, had more stage II cancer, lower BMI, and worse performance status compared with US Oncology patients. After weighting MOSAIC participants to reflect the US Oncology populations, the HRs were attenuated (Table) compared with the original MOSAIC estimate (HR = 0.84; 0.71, 1.00). Conclusions: When differences between trial and clinical populations exist and response to therapy varies across subgroups, treatment efficacy can differ from clinical effectiveness. Compared with trial results, we found that effectiveness of FOLFOX versus 5FU was attenuated in community oncology practice. [Table: see text]
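The inverse odds of sampling weights mentioned in the Methods can be sketched with a single categorical covariate. The abstract estimates the probability of trial enrollment with logistic regression; the sketch below swaps in simple within-stratum proportions so that it runs without a modeling library. The function name `inverse_odds_weights`, the stratum labels, and the counts are all hypothetical.

```python
from collections import Counter


def inverse_odds_weights(trial_strata, target_strata):
    """Weight each trial participant by (1 - p) / p, where p is the
    proportion of people in their stratum who are in the trial
    (a nonparametric stand-in for a fitted enrollment model)."""
    n_trial = Counter(trial_strata)
    n_target = Counter(target_strata)
    weights = []
    for stratum in trial_strata:
        p_trial = n_trial[stratum] / (n_trial[stratum] + n_target[stratum])
        weights.append((1.0 - p_trial) / p_trial)
    return weights


# Hypothetical data: the trial skews young, the target population skews old.
trial = ["young", "young", "young", "old"]
target = ["young", "old", "old", "old"]

weights = inverse_odds_weights(trial, target)

# Weighted stratum totals in the trial should reproduce the target counts.
weighted_totals = Counter()
for stratum, w in zip(trial, weights):
    weighted_totals[stratum] += w
print(dict(weighted_totals))
```

After weighting, the trial's covariate distribution matches the target's, so an effect estimate computed in the weighted trial sample refers to the target population, provided the covariates used capture the relevant effect modifiers.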
|
|
5 |
|