1
Hudson A, Shojaie A. Statistical inference on qualitative differences in the magnitude of an effect. Stat Med 2024; 43:1419-1440. [PMID: 38305667 PMCID: PMC10947912 DOI: 10.1002/sim.10025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/15/2023] [Indexed: 02/03/2024]
Abstract
Qualitative interactions occur when a treatment effect or measure of association varies in sign by sub-population. Of particular interest in many biomedical settings are absence/presence qualitative interactions, which occur when an effect is present in one sub-population but absent in another. Absence/presence interactions arise in emerging applications in precision medicine, where the objective is to identify a set of predictive biomarkers that have prognostic value for clinical outcomes in some sub-population but not others. They also arise naturally in gene regulatory network inference, where the goal is to identify differences in networks corresponding to diseased and healthy individuals, or to different subtypes of disease; such differences lead to identification of network-based biomarkers for diseases. In this paper, we argue that while the absence/presence hypothesis is important, developing a statistical test for this hypothesis is an intractable problem. To overcome this challenge, we approximate the problem in a novel inference framework. In particular, we propose to make inferences about absence/presence interactions by quantifying the relative difference in effect size, reasoning that when the relative difference is large, an absence/presence interaction occurs. The proposed methodology is illustrated through a simulation study as well as an analysis of breast cancer data from the Cancer Genome Atlas.
Affiliation(s)
- Aaron Hudson
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Washington, United States
- Ali Shojaie
- Department of Biostatistics, University of Washington, Washington, United States
2
Abstract
A common reason given for assessing interaction is to evaluate “whether the effect is larger in one group versus another”. It has long been known that the answer to this question is scale dependent: the “effect” may be larger for one subgroup on the difference scale, but smaller on the ratio scale. In this article, we show that if the relative magnitude of effects across subgroups is of interest then there exists an “interaction continuum” that characterizes the nature of these relations. When both main effects are positive, the placement on the continuum depends on the relative magnitude of the probability of the outcome in the doubly exposed group. For high probabilities of the outcome in the doubly exposed group, the interaction may be positive-multiplicative positive-additive, the strongest form of positive interaction on the “interaction continuum”. As the probability of the outcome in the doubly exposed group goes down, the form of interaction descends through ranks, which we will refer to as the following: positive-multiplicative positive-additive, no-multiplicative positive-additive, negative-multiplicative positive-additive, negative-multiplicative zero-additive, negative-multiplicative negative-additive, single pure interaction, single qualitative interaction, single-qualitative single-pure interaction, double qualitative interaction, perfect antagonism, inverted interaction. One can thus place a particular set of outcome probabilities into one of these eleven states on the interaction continuum. Analogous results are also given when both exposures are protective, or when one is protective and one causative. The “interaction continuum” can allow for inquiries as to relative effect sizes, while also acknowledging the scale dependence of the notion of interaction itself.
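The scale dependence described in this abstract can be illustrated with a minimal Python sketch. It assumes the usual risk-based contrasts: additive interaction as p11 - p10 - p01 + p00 and multiplicative interaction as (p11 × p00) / (p10 × p01), where p00, p10, p01 and p11 are the outcome probabilities in the unexposed, the two singly exposed, and the doubly exposed groups. The function and variable names are illustrative and do not come from the article, and the sketch only distinguishes the sign of each contrast (the first five states of the continuum), not the full set of eleven states.

```python
def interaction_contrasts(p00, p10, p01, p11):
    """Return (additive, multiplicative) interaction measures for four risks.

    Assumes p10 and p01 are strictly positive so the ratio is defined.
    """
    additive = p11 - p10 - p01 + p00
    multiplicative = (p11 * p00) / (p10 * p01)
    return additive, multiplicative


def interaction_signs(p00, p10, p01, p11, tol=1e-9):
    """Label the sign of the multiplicative and additive interaction.

    The multiplicative contrast is compared with its null value 1,
    the additive contrast with its null value 0.
    """
    additive, multiplicative = interaction_contrasts(p00, p10, p01, p11)

    def label(value, null):
        if abs(value - null) <= tol:
            return "none"
        return "positive" if value > null else "negative"

    return label(multiplicative, 1.0), label(additive, 0.0)


# Lowering the risk in the doubly exposed group (p11) while holding the
# other three risks fixed moves the labels down the continuum.
for p11 in (0.7, 0.6, 0.5, 0.4, 0.35):
    print(p11, interaction_signs(0.1, 0.2, 0.3, p11))
```

With the other risks held fixed, the same data can show a positive interaction on the additive scale and a negative one on the multiplicative scale, which is the ambiguity the continuum is meant to organize.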
3
Lienert J, Patel M. Patient Phenotypes Help Explain Variation in Response to a Social Gamification Weight Loss Intervention. Am J Health Promot 2019; 34:277-284. [PMID: 31876175 DOI: 10.1177/0890117119892776] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
PURPOSE This study aims to determine latent classes of study participants using baseline characteristics, explore the patterns within the groups, and determine whether the intervention had differential effects on weight loss across the groups. DESIGN Secondary analysis of a completed randomized clinical trial. SETTING Participants in a gamification intervention with social incentives who were recruited as pairs and given an intervention for 24 weeks. Participants were randomized to control, gamification, or gamification with primary care physician sharing arms. PARTICIPANTS All 196 participants in the Lose It trial (recruited as 98 pairs). MEASURES Outcome variable-participants' weight change after 24 and 36 weeks. Factors-intervention arm and latent class. ANALYSIS Latent class analysis on both participants' and teams' characteristics. This was followed by 1-sample t tests of weight at 24 and 36 weeks, stratified by latent class. RESULTS Three groups of participants were identified: "Kin teams," "Distant teams," and "Married teams." "Kin teams" lost more weight after the intervention in the gamification and gamification with PCP sharing arms. The "Distant teams" lost similar amounts of weight in all 3 arms but did not keep it off during maintenance. The "Married teams" lost the most weight across all 3 arms and kept it off following the intervention. CONCLUSIONS Patient phenotypes can identify variations in response to a gamification weight loss intervention. Future intervention studies may benefit from leveraging this during participant recruitment and allocation.
Affiliation(s)
- Jeffrey Lienert
- Center for Health Equity Research and Promotion, Crescenz VA Medical Center, Philadelphia, PA, USA; Nudge Unit, Perelman School of Medicine, Philadelphia, PA, USA
- Mitesh Patel
- Center for Health Equity Research and Promotion, Crescenz VA Medical Center, Philadelphia, PA, USA; Nudge Unit, Perelman School of Medicine, Philadelphia, PA, USA; Department of Health Care Management, The Wharton School, Philadelphia, PA, USA
4
Kracht CL, Webster EK, Staiano AE. Sociodemographic Differences in Young Children Meeting 24-Hour Movement Guidelines. J Phys Act Health 2019; 16:908-915. [PMID: 31491748 PMCID: PMC7058481 DOI: 10.1123/jpah.2019-0018] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 01/11/2019] [Revised: 06/13/2019] [Accepted: 07/14/2019] [Indexed: 12/16/2022]
Abstract
BACKGROUND Little is known about variation in meeting the 24-Hour Movement Guidelines (including physical activity [PA], sleep, and screen time [ST]) in early childhood. The aim was to evaluate sociodemographic differences in meeting the 24-Hour Movement Guidelines. METHODS Parents of 3-4 year old children reported sociodemographic information and ST. Sleep and PA were measured using accelerometry, and height and weight were objectively measured. The 24-Hour Movement Guidelines include daily PA (total PA: ≥3 h; including ≥1 h of moderate to vigorous), sleep (10-13 h), and ST (≤1 h). Meeting guidelines by age, sex, race, poverty level, and weight status were assessed using chi-square and linear regression models. RESULTS Of 107 children, 57% were white and 26% lived in households at or below the poverty level. Most children met the PA (91.5%) and sleep (86.9%) guidelines, but few met ST (14.0%) or all 3 (11.3%) guidelines. African American children and children who lived at or below the poverty level were less likely to meet the sleep, ST, and all 3 guidelines compared with others (P < .01 for all). There were no other differences. CONCLUSION These results suggest future interventions should focus on reducing differences in movement, namely in sleep and ST.
Affiliation(s)
- Chelsea L. Kracht
- Pennington Biomedical Research Center, 6400 Perkins Road, Baton Rouge, LA, 70808
- E. Kipling Webster
- Louisiana State University’s School of Kinesiology, 112 Long Fieldhouse, Baton Rouge, LA, 70803
- Amanda E. Staiano
- Pennington Biomedical Research Center, 6400 Perkins Road, Baton Rouge, LA, 70808
5
Abstract
We consider the problem of selecting the optimal subgroup to treat when data on covariates are available from a randomized trial or observational study. We distinguish between four different settings including: (1) treatment selection when resources are constrained; (2) treatment selection when resources are not constrained; (3) treatment selection in the presence of side effects and costs; and (4) treatment selection to maximize effect heterogeneity. We show that, in each of these cases, the optimal treatment selection rule involves treating those for whom the predicted mean difference in outcomes comparing those with versus without treatment, conditional on covariates, exceeds a certain threshold. The threshold varies across these four scenarios, but the form of the optimal treatment selection rule does not. The results suggest a move away from the traditional subgroup analysis for personalized medicine. New randomized trial designs are proposed so as to implement and make use of optimal treatment selection rules in healthcare practice.
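The common form of the rule described in this abstract (treat when the predicted mean difference exceeds a threshold) can be sketched in a few lines of Python. The predicted differences and the two threshold values below are made-up illustrations, not values from the article, and the sketch assumes that larger outcomes are better and that predictions have already been obtained from some model.

```python
def treatment_rule(predicted_differences, threshold):
    """Return a treat / do-not-treat decision for each individual.

    predicted_differences: predicted mean outcome with treatment minus
    predicted mean outcome without treatment, conditional on covariates.
    threshold: scenario-specific cutoff (hypothetical values used below).
    """
    return [difference > threshold for difference in predicted_differences]


# Hypothetical predicted differences for four individuals.
predicted = [0.30, -0.05, 0.12, 0.02]

# Same rule, two illustrative thresholds: only the cutoff changes.
print(treatment_rule(predicted, threshold=0.0))
print(treatment_rule(predicted, threshold=0.10))
```

The design point is that the scenario (constraints, costs, side effects) enters only through the threshold argument; the decision function itself stays the same.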
6
Taft L, Shen C. A non-parametric statistical test of null treatment effect in sub-populations. J Biopharm Stat 2019; 30:277-293. [PMID: 31304862 DOI: 10.1080/10543406.2019.1636810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Randomized clinical trials are designed to estimate the average treatment effect (ATE). If heterogeneity of treatment effect exists, then it is possible that there may be subjects who derive a treatment effect different from the ATE. We propose a method to test the hypothesis that there exist subjects who derive benefit (or harm) against the null hypothesis that the treatment has no benefit (or harm) on each of the smallest sub-populations defined by discrete baseline covariates. Our approach is nonparametric, which generates the null distribution of the test statistic by the permutation principle. A key innovation of our method is that stochastic simulation is built into the test statistic to detect signals that may not be linearly related to the multiple covariates. This is important because, in many real clinical problems, the treatment effect is not linearly correlated with relevant baseline characteristics. We applied the method to a real randomized study that compared the Implantable Cardioverter Defibrillator (ICD) with conventional medical therapy in reducing total mortality in a low ejection fraction population. Simulations and power calculations were performed to compare the proposed test with existing methods.
Affiliation(s)
- Lin Taft
- Clinical Statistics, GlaxoSmithKline, Collegeville, PA, USA
- Changyu Shen
- Department of Medicine, Smith Center for Outcomes Research in Cardiology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
7
Lesko CR, Henderson NC, Varadhan R. Considerations when assessing heterogeneity of treatment effect in patient-centered outcomes research. J Clin Epidemiol 2018; 100:22-31. [PMID: 29654822 DOI: 10.1016/j.jclinepi.2018.04.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 01/26/2017] [Revised: 02/06/2018] [Accepted: 04/01/2018] [Indexed: 01/23/2023]
Abstract
When baseline risk of an outcome varies within a population, the effect of a treatment on that outcome will vary on at least one scale (e.g., additive, multiplicative). This treatment effect heterogeneity is of interest in patient-centered outcomes research. Based on a literature review and solicited expert opinion, we assert the following: (1) Treatment effect heterogeneity on the additive scale is most interpretable to health-care providers and patients using effect estimates to guide treatment decision-making; heterogeneity reported on the multiplicative scale may be misleading as to the magnitude or direction of a substantively important interaction. (2) The additive scale may give clues about sufficient-cause interaction, although such interaction is typically not relevant to patients' treatment choices. (3) Statistical modeling need not be conducted on the same scale as results are communicated. (4) Statistical testing is one tool for investigations, provided important subgroups are identified a priori, but test results should be interpreted cautiously given nonequivalence of statistical and clinical significance. (5) Qualitative interactions should be evaluated in a prespecified manner for important subgroups. Principled analytic plans that take into account the purpose of investigation of treatment effect heterogeneity are likely to yield more useful results for guiding treatment decisions.
Affiliation(s)
- Catherine R Lesko
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St., Baltimore, MD 21205, USA
- Nicholas C Henderson
- Division of Biostatistics and Bioinformatics, Sidney Kimmel Cancer Care Center, Johns Hopkins School of Medicine, 550 N. Broadway, suite 1111-E, Baltimore, MD 21205, USA
- Ravi Varadhan
- Division of Biostatistics and Bioinformatics, Sidney Kimmel Cancer Care Center, Johns Hopkins School of Medicine, 550 N. Broadway, suite 1111-E, Baltimore, MD 21205, USA; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St., Baltimore, MD 21205, USA.
8
Petkova E, Ogden RT, Tarpey T, Ciarleglio A, Jiang B, Su Z, Carmody T, Adams P, Kraemer HC, Grannemann BD, Oquendo MA, Parsey R, Weissman M, McGrath PJ, Fava M, Trivedi MH. Statistical Analysis Plan for Stage 1 EMBARC (Establishing Moderators and Biosignatures of Antidepressant Response for Clinical Care) Study. Contemp Clin Trials Commun 2017; 6:22-30. [PMID: 28670629 PMCID: PMC5485858 DOI: 10.1016/j.conctc.2017.02.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 05/25/2016] [Revised: 02/08/2017] [Accepted: 02/13/2017] [Indexed: 12/28/2022] Open
Abstract
Antidepressant medications are commonly used to treat depression, but only about 30% of patients reach remission with any single first-step antidepressant. If the first-step treatment fails, response and remission rates at subsequent steps are even more limited. The literature on biomarkers for treatment response is largely based on secondary analyses of studies designed to answer primary questions of efficacy, rather than on a planned systematic evaluation of biomarkers for treatment decision. The lack of evidence-based knowledge to guide treatment decisions for patients with depression has led to the recognition that specially designed studies, whose primary objective is to discover biosignatures for optimizing treatment decisions, are necessary. Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) is one such discovery study. Stage 1 of EMBARC is a randomized placebo-controlled clinical trial of 8 weeks' duration. A wide array of patient characteristics is collected at baseline, including assessments of brain structure, function and connectivity along with electrophysiological, biological, behavioral and clinical features. This paper reports on the data analytic strategy for discovering biosignatures for treatment response based on Stage 1 of EMBARC.
Affiliation(s)
- Eva Petkova
- New York University, New York, NY, USA
- Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
- Thaddeus Tarpey
- New York University, New York, NY, USA
- Wright State University, Dayton, OH, USA
- Adam Ciarleglio
- New York University, New York, NY, USA
- Columbia University, New York, NY, USA
- Bei Jiang
- University of Alberta, Edmonton, Alberta, Canada
- Zhe Su
- New York University, New York, NY, USA
- Thomas Carmody
- University of Texas, Southwestern Medical Center, Dallas, TX, USA
- Philip Adams
- New York State Psychiatric Institute, New York, NY, USA
- Department of Psychiatry, College of Physicians and Surgeons of Columbia University, New York, NY, USA
- Maria A. Oquendo
- New York State Psychiatric Institute, New York, NY, USA
- Department of Psychiatry, College of Physicians and Surgeons of Columbia University, New York, NY, USA
- Myrna Weissman
- New York State Psychiatric Institute, New York, NY, USA
- Department of Psychiatry, College of Physicians and Surgeons of Columbia University, New York, NY, USA
- Patrick J. McGrath
- New York State Psychiatric Institute, New York, NY, USA
- Department of Psychiatry, College of Physicians and Surgeons of Columbia University, New York, NY, USA
9
[Biostatistical support for decision making in drug licensing and reimbursement exemplified by implications of heterogeneous findings in subgroups of the study population]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2017; 58:274-82. [PMID: 25566838 DOI: 10.1007/s00103-014-2105-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
Abstract
In the context of both drug licensing and reimbursement, the target population is sometimes restricted to a specific subgroup. In the setting of drug licensing the discussion concerns a negative benefit/risk assessment in a relevant subgroup. For reimbursement the debate involves the detection of an additional benefit compared with standard treatment, which can in some situations not be accepted for the overall study population. In their Methods Paper, the Institute for Quality and Efficiency in Health Care (IQWiG) refers to published articles that name criteria for the evaluation of credibility to claim a therapeutic effect on the basis of results in the subgroups of a study population (BMJ 340:850-854, 2010). A number of these criteria have found their way into the regulatory debate, which was recently published in a draft guideline of the European Medicines Agency (EMA). However, the significance of the interaction/heterogeneity test has been mentioned as one criterion for the credibility of a finding in a subgroup of the study population. This aspect is critically challenged in our paper. In our estimation, the application of this criterion hinders the critical discussion of whether a global treatment effect is applicable to relevant subgroups of a study population and the potential implications of this. We feel that biostatistical support for decision-making strategies should be the same in both worlds, even though in some instances the outcomes in a specific situation may be different, depending on the objective to be demonstrated.
10
Tanniou J, van der Tweel I, Teerenstra S, Roes KCB. Subgroup analyses in confirmatory clinical trials: time to be specific about their purposes. BMC Med Res Methodol 2016; 16:20. [PMID: 26891992 PMCID: PMC4757983 DOI: 10.1186/s12874-016-0122-6] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 01/30/2016] [Accepted: 02/09/2016] [Indexed: 11/26/2022] Open
Abstract
Background It is well recognized that treatment effects may not be homogeneous across the study population. Subgroup analyses constitute a fundamental step in the assessment of evidence from confirmatory (Phase III) clinical trials, where conclusions for the overall study population might not hold. Subgroup analyses can have different and distinct purposes, requiring specific design and analysis solutions. It is relevant to evaluate methodological developments in subgroup analyses against these purposes to guide health care professionals and regulators as well as to identify gaps in current methodology. Methods We defined four purposes for subgroup analyses: (1) Investigate the consistency of treatment effects across subgroups of clinical importance, (2) Explore the treatment effect across different subgroups within an overall non-significant trial, (3) Evaluate safety profiles limited to one or a few subgroup(s), (4) Establish efficacy in the targeted subgroup when included in a confirmatory testing strategy of a single trial. We reviewed the methodology in line with this “purpose-based” framework. The review covered papers published between January 2005 and April 2015 and aimed to classify them in none, one or more of the aforementioned purposes. Results In total 1857 potentially eligible papers were identified. Forty-eight papers were selected and 20 additional relevant papers were identified from their references, leading to 68 papers in total. Nineteen were dedicated to purpose 1, 16 to purpose 4, one to purpose 2 and none to purpose 3. Seven papers were dedicated to more than one purpose, the 25 remaining could not be classified unambiguously. Purposes of the methods were often not specifically indicated, methods for subgroup analysis for safety purposes were almost absent and a multitude of diverse methods were developed for purpose (1). 
Conclusions It is important that researchers developing methodology for subgroup analysis explicitly clarify the objectives of their methods in terms that can be understood from a patient’s, health care provider’s and/or regulator’s perspective. A clear operational definition for consistency of treatment effects across subgroups is lacking, but is needed to improve the usability of subgroup analyses in this setting. Finally, methods to particularly explore benefit-risk systematically across subgroups need more research.
Affiliation(s)
- Julien Tanniou
- Julius Center for Health Sciences and Primary Care, UMC Utrecht, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands; College ter Beoordeling van Geneesmiddelen, Dutch Medicines Evaluation Board, Graadt van Roggenweg 500, 3531 AH, Utrecht, The Netherlands.
- Ingeborg van der Tweel
- Julius Center for Health Sciences and Primary Care, UMC Utrecht, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands.
- Steven Teerenstra
- College ter Beoordeling van Geneesmiddelen, Dutch Medicines Evaluation Board, Graadt van Roggenweg 500, 3531 AH, Utrecht, The Netherlands; Department of Health Evidence, Section Biostatistics, Radboud University Medical Centre, Geert Grooteplein 21, 6525 GA, Nijmegen, The Netherlands.
- Kit C B Roes
- Julius Center for Health Sciences and Primary Care, UMC Utrecht, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands; College ter Beoordeling van Geneesmiddelen, Dutch Medicines Evaluation Board, Graadt van Roggenweg 500, 3531 AH, Utrecht, The Netherlands.
11
Alosh M, Huque MF, Koch GG. Statistical Perspectives on Subgroup Analysis: Testing for Heterogeneity and Evaluating Error Rate for the Complementary Subgroup. J Biopharm Stat 2014; 25:1161-78. [DOI: 10.1080/10543406.2014.971169] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 10/24/2022]
12
Kitsche A. Detecting qualitative interactions in clinical trials with binary responses. Pharm Stat 2014; 13:309-15. [PMID: 25049176 DOI: 10.1002/pst.1632] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/09/2013] [Revised: 03/21/2014] [Accepted: 06/23/2014] [Indexed: 11/07/2022]
Abstract
This study considers the detection of treatment-by-subset interactions in a stratified, randomised clinical trial with a binary-response variable. The focus lies on the detection of qualitative interactions. In addition, the presented method is useful more generally, as it can assess the inconsistency of the treatment effects among strata by using an a priori-defined inconsistency margin. The methodology presented is based on the construction of ratios of treatment effects. In addition to multiplicity-adjusted p-values, simultaneous confidence intervals are recommended for detecting the source and the amount of a potential qualitative interaction. The proposed method is demonstrated on a multi-regional trial using the open-source statistical software R.
Affiliation(s)
- Andreas Kitsche
- Institut für Biostatistik, Leibniz Universität Hannover, Herrenhäuser Straße 2, Hannover, Germany
13
Abstract
In this tutorial, we provide a broad introduction to the topic of interaction between the effects of exposures. We discuss interaction on both additive and multiplicative scales using risks, and we discuss their relation to statistical models (e.g. linear, log-linear, and logistic models). We discuss and evaluate arguments that have been made for using additive or multiplicative scales to assess interaction. We further discuss approaches to presenting interaction analyses, different mechanistic forms of interaction, when interaction is robust to unmeasured confounding, interaction for continuous outcomes, qualitative or “crossover” interactions, methods for attributing effects to interactions, case-only estimators of interaction, and power and sample size calculations for additive and multiplicative interaction.
14
Kitsche A, Hothorn LA. Testing for qualitative interaction using ratios of treatment differences. Stat Med 2013; 33:1477-89. [DOI: 10.1002/sim.6048] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 02/08/2013] [Revised: 08/30/2013] [Accepted: 11/03/2013] [Indexed: 11/07/2022]
Affiliation(s)
- Andreas Kitsche
- Institut für Biostatistik; Leibniz Universität Hannover; Herrenhäuser Straße 2 30419 Hannover Germany
- Ludwig A. Hothorn
- Institut für Biostatistik; Leibniz Universität Hannover; Herrenhäuser Straße 2 30419 Hannover Germany
15
Detecting moderator effects using subgroup analyses. Prevention Science: The Official Journal of the Society for Prevention Research 2013; 14:111-20. [PMID: 21562742 DOI: 10.1007/s11121-011-0221-x] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Indexed: 01/16/2023]
Abstract
In the analysis of prevention and intervention studies, it is often important to investigate whether treatment effects vary among subgroups of patients defined by individual characteristics. These "subgroup analyses" can provide information about how best to use a new prevention or intervention program. However, subgroup analyses can be misleading if they test data-driven hypotheses, employ inappropriate statistical methods, or fail to account for multiple testing. These problems have led to a general suspicion of findings from subgroup analyses. This article discusses sound methods for conducting subgroup analyses to detect moderators. Multiple authors have argued that, to assess whether a treatment effect varies across subgroups defined by patient characteristics, analyses should be based on tests for interaction rather than treatment comparisons within the subgroups. We discuss the concept of heterogeneity and its dependence on the metric used to describe treatment effects. We discuss issues of multiple comparisons related to subgroup analyses and the importance of considering multiplicity in the interpretation of results. We also discuss the types of questions that would lead to subgroup analyses and how different scientific goals may affect the study at the design stage. Finally, we discuss subgroup analyses based on post-baseline factors and the complexity associated with this type of subgroup analysis.
16
Abstract
Plausibility of high variability in treatment effects across individuals has been recognized as an important consideration in clinical studies. Surprisingly, little attention has been given to evaluating this variability in the design of clinical trials or analyses of resulting data. High variation in a treatment's efficacy or safety across individuals (referred to herein as treatment heterogeneity) may have important consequences because the optimal treatment choice for an individual may be different from that suggested by a study of average effects. We call this an individual qualitative interaction (IQI), borrowing terminology from earlier work, in which a qualitative interaction (QI) is present when the optimal treatment varies across "groups" of individuals. At least three techniques have been proposed to investigate treatment heterogeneity: techniques to detect a QI, use of measures such as the density overlap of two outcome variables under different treatments, and use of cross-over designs to observe "individual effects." We elucidate underlying connections among them, their limitations and some assumptions that may be required. We do so under a potential outcomes framework that can add insights to results from usual data analyses and to study design features that improve the capability to more directly assess treatment heterogeneity.
Affiliation(s)
- Robert S Poulson
- Statistical Methods Group, Edwards Air Force Base Edwards, CA 93524
17
Parker RA. Testing for qualitative interactions between stages in an adaptive study. Stat Med 2010; 29:210-8. [PMID: 19908261 DOI: 10.1002/sim.3757] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]
Abstract
I consider the underlying structure for a test of qualitative interaction of a treatment when assessing heterogeneity between stages in an adaptive trial. Since decisions about the clinical utility of a drug are based on the balance of risks and benefits, a quantitative interaction in treatment efficacy across different groups could lead to qualitatively different decisions. Thus, the difference between quantitative and qualitative interactions is not a true dichotomy. I show that the standard tests for qualitative interactions (Gail and Simon, Biometrics 1985; 41:361-372; Piantadosi and Gail, Statist. Med. 1993; 12:1239-1248) are very conservative in this application. Theoretical calculations in a simpler situation confirm that the published criteria are very conservative, which may help explain why the tests are known to have very low power to detect interaction. I introduce the concept of 'minimum detectable effect', which is the smallest effect that a study could identify as statistically significant. I propose that important heterogeneity between stages in an adaptive trial be identified when two criteria are met. First, at least one individual stage must be below the overall study mean by at least the minimum detectable effect. Second, using an appropriate critical value based on simulations, there must be statistically significant heterogeneity between the stages.
Affiliation(s)
- Robert A Parker
- Truth, Ltd., 3311 Blue Ridge Court, Westlake Village, CA 91362, USA.
18
Bayman EO, Chaloner K, Cowles MK. Detecting qualitative interaction: a Bayesian approach. Stat Med 2010; 29:455-63. [PMID: 19950107 DOI: 10.1002/sim.3787] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
Differences in treatment effects between centers in a multi-center trial may be important. These differences represent treatment by subgroup interaction. Peto defines qualitative interaction (QI) to occur when the simple treatment effect in one subgroup has a different sign than in another subgroup: this interaction is important. Interaction where the treatment effects are of the same sign in all subgroups is called quantitative and is often not important because the treatment recommendation is identical in all cases. A hierarchical model is used here with exchangeable mean responses to each treatment between subgroups. The posterior probability of QI and the corresponding Bayes factor are proposed as a diagnostic and as a test statistic. The model is motivated by two multi-center trials with binary responses. The frequentist power and size of the test using the Bayes factor are examined and compared with two other commonly used tests. The impact of imbalance between the sample sizes in each subgroup on power is examined, and the test based on the Bayes factor typically has better power for unbalanced designs, especially for small sample sizes. An exact test based on the Bayes factor is also suggested assuming the hierarchical model. The Bayes factor provides a concise summary of the evidence for or against QI. It is shown by example that it is easily adapted to summarize the evidence for 'clinically meaningful QI,' defined as the simple effects being of opposite signs and larger in absolute value than a minimal clinically meaningful effect.
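The two definitions in this abstract (qualitative interaction as a sign difference, and 'clinically meaningful QI' as opposite signs that each exceed a minimal meaningful effect in absolute value) can be written down as a small Python check. This is only a sketch of the definitions applied to fixed effect values; it is not the hierarchical model or the Bayes factor proposed in the article, it ignores estimation uncertainty entirely, and the function name and numbers are hypothetical.

```python
def has_qualitative_interaction(effects, min_meaningful_effect=0.0):
    """Check whether subgroup effects show a (meaningful) sign reversal.

    effects: one treatment effect per subgroup, treated as known values.
    min_meaningful_effect: non-negative margin; with the default of 0 this
    is the plain sign-difference definition, with a positive margin both
    the beneficial and the harmful effect must exceed it in absolute value.
    """
    if min_meaningful_effect < 0:
        raise ValueError("min_meaningful_effect must be non-negative")
    any_positive = any(e > min_meaningful_effect for e in effects)
    any_negative = any(e < -min_meaningful_effect for e in effects)
    return any_positive and any_negative


# Hypothetical subgroup effects (e.g. risk differences by center).
print(has_qualitative_interaction([0.40, -0.30]))
print(has_qualitative_interaction([0.40, 0.10]))
print(has_qualitative_interaction([0.40, -0.05], min_meaningful_effect=0.10))
```

In an actual analysis the inputs would be uncertain estimates, so a check like this would be applied to posterior draws or combined with a formal test rather than to point estimates alone.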