1. Simulation-based sample size calculations of marginal proportional means models for recurrent events with competing risks. Pharm Stat 2024. [PMID: 38509020] [DOI: 10.1002/pst.2382]
Abstract
In randomised controlled trials, the outcome of interest may be recurrent events, such as hospitalisations for heart failure. If mortality rates are non-negligible, both recurrent events and competing terminal events need to be addressed when formulating the estimand, and the statistical analysis is no longer trivial. In order to design future trials with primary recurrent event endpoints subject to competing risks, it is necessary to perform power calculations to determine sample sizes. This paper introduces a simulation-based approach for power estimation based on a proportional means model for recurrent events and a proportional hazards model for terminal events. The simulation procedure is presented along with a discussion of what the user needs to specify in order to apply the approach. The method is flexible and based on marginal quantities that are easy to specify. However, the simulation procedure omits a certain type of dependence between the recurrent and terminal event processes; a sensitivity analysis suggests that the power estimates are robust to this omission. Data from a randomised controlled trial, LEADER, are used as the basis for generating data for a future trial. Finally, potential power gains of recurrent event methods over first-event methods are discussed.
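The general idea of simulation-based power estimation can be sketched in a few lines. The following Python fragment is an illustrative simplification only: it draws each patient's recurrent-event count as a Poisson variable over a fixed follow-up and tests the log rate ratio with a Wald z-test, whereas the paper's procedure additionally models terminal events, censoring, and a proportional means structure. All function and parameter names are assumptions for illustration.

```python
import numpy as np

def simulated_power(n_per_arm, control_rate, rate_ratio,
                    n_sim=2000, seed=12345):
    """Monte Carlo power: simulate many trials and count rejections.

    Sketch under simplifying assumptions (Poisson counts, equal fixed
    follow-up, no terminal events), not the paper's full procedure.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        s0 = rng.poisson(control_rate, n_per_arm).sum()
        s1 = rng.poisson(control_rate * rate_ratio, n_per_arm).sum()
        if s0 == 0 or s1 == 0:
            continue  # no events observed: cannot reject
        log_rr = np.log(s1) - np.log(s0)  # equal follow-up cancels
        se = np.sqrt(1.0 / s0 + 1.0 / s1)  # delta-method standard error
        if abs(log_rr / se) > 1.96:  # two-sided 5% level
            rejections += 1
    return rejections / n_sim
```

In practice one would wrap such a function in a search over `n_per_arm` until the estimated power reaches the target.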
2. A statistical framework for planning and analysing test-retest studies of repeatability. Stat Methods Med Res 2024; 33:295-308. [PMID: 38298010] [DOI: 10.1177/09622802241227959]
Abstract
There is an increasing number of potential quantitative biomarkers that could allow for early assessment of treatment response or disease progression. However, measurements of such biomarkers are subject to random variability. Hence, differences of a biomarker in longitudinal measurements do not necessarily represent real change but might be caused by this random measurement variability. Before utilizing a quantitative biomarker in longitudinal studies, it is therefore essential to assess the measurement repeatability. Measurement repeatability obtained from test-retest studies can be quantified by the repeatability coefficient, which is then used in the subsequent longitudinal study to determine whether a measured difference represents real change or lies within the range of expected random measurement variability. The quality of the point estimate of the repeatability coefficient therefore directly governs the assessment quality of the longitudinal study. The accuracy of the repeatability coefficient estimate depends on the number of subjects in the test-retest study, but despite its pivotal role, no comprehensive framework for sample size calculation of test-retest studies exists. To address this issue, we have established such a framework, which allows for flexible sample size calculation of test-retest studies based on newly introduced criteria concerning assessment quality in the longitudinal study. This also permits retrospective assessment of prior test-retest studies.
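The repeatability coefficient mentioned above is conventionally defined as RC = 1.96·√2·s_w, where s_w is the within-subject standard deviation. As a minimal sketch (with two replicates per subject and assuming no systematic bias between sessions, s_w² can be estimated as half the mean squared test-retest difference; this is a textbook estimator, not the paper's framework):

```python
import numpy as np

def repeatability_coefficient(test, retest):
    """RC = 1.96 * sqrt(2) * within-subject SD.

    With two replicates per subject, the within-subject variance is
    estimated as mean(d^2) / 2, where d are test-retest differences
    (assumes no systematic between-session bias).
    """
    d = np.asarray(retest, dtype=float) - np.asarray(test, dtype=float)
    s_w = np.sqrt(np.mean(d ** 2) / 2.0)
    return 1.96 * np.sqrt(2.0) * s_w
```

A longitudinal change larger than the RC is then taken to exceed expected measurement variability.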
3. Designing individually randomized group treatment trials with repeated outcome measurements using generalized estimating equations. Stat Med 2024; 43:358-378. [PMID: 38009329] [PMCID: PMC10939061] [DOI: 10.1002/sim.9966]
Abstract
Individually randomized group treatment (IRGT) trials, in which the clustering of outcomes is induced by group-based treatment delivery, are increasingly popular in public health research. IRGT trials frequently incorporate longitudinal measurements, and proper sample size calculations should account for correlation structures reflecting both the treatment-induced clustering and the repeated outcome measurements. Given the relatively sparse literature on designing longitudinal IRGT trials, we propose sample size procedures for continuous and binary outcomes based on the generalized estimating equations approach, employing block exchangeable correlation structures with different correlation parameters for the treatment and control arms, and considering five marginal mean models with different assumptions about the time effect: no-time constant treatment effect, linear-time constant treatment effect, categorical-time constant treatment effect, linear time by treatment interaction, and categorical time by treatment interaction. Closed-form sample size formulas are derived for continuous outcomes, which depend on the eigenvalues of the correlation matrices; detailed numerical sample size procedures are proposed for binary outcomes. Through simulations, we demonstrate that the empirical power agrees well with the predicted power, for as few as eight groups formed in the treatment arm, when data are analyzed using the matrix-adjusted estimating equations for the correlation parameters with a bias-corrected sandwich variance estimator.
4. Sample size calculation in clinical trials with two co-primary endpoints including overdispersed count and continuous outcomes. Pharm Stat 2024; 23:46-59. [PMID: 38267827] [DOI: 10.1002/pst.2337]
Abstract
Count outcomes are collected in clinical trials for new drug development in several therapeutic areas, and the event rate is commonly used as a single primary endpoint. Count outcomes whose variance exceeds their mean are said to be overdispersed; such outcomes are therefore often assumed to follow a negative binomial distribution. However, in clinical trials for treating asthma and chronic obstructive pulmonary disease (COPD), a regulatory agency has suggested that a continuous endpoint related to lung function must be evaluated as a primary endpoint in addition to the event rate. The two co-primary endpoints to be evaluated thus comprise an overdispersed count outcome and a continuous outcome. Some researchers have proposed sample size calculation methods in the context of co-primary endpoints for various outcome types. However, methodologies for sample size calculation in trials with two co-primary endpoints including overdispersed count and continuous outcomes, as required when planning clinical trials for treating asthma and COPD, have yet to be proposed. In this study, we aimed to develop a hypothesis-testing method and a corresponding sample size calculation method for two co-primary endpoints including overdispersed count and continuous outcomes. In a simulation, we demonstrate that the proposed sample size calculation method provides adequate power accuracy. In addition, we illustrate an application of the proposed method to a placebo-controlled Phase 3 trial for patients with COPD.
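The defining feature of co-primary endpoints is that *both* tests must reject, so the study-level power is lower than either marginal power. A toy illustration (assuming two independent normal-theory endpoints and equal arms; the paper's actual method handles the negative binomial count endpoint and the correlation between endpoints, which this sketch does not):

```python
from math import sqrt
from statistics import NormalDist

def coprimary_power(n_per_arm, delta1, sd1, delta2, sd2, alpha=0.025):
    """Power to show BOTH endpoints significant at one-sided alpha each.

    Assumes two independent normal endpoints compared between two equal
    arms; with co-primary endpoints no multiplicity adjustment is needed,
    but overall power is the product of the marginal powers.
    """
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha)
    pow1 = nd.cdf(delta1 / (sd1 * sqrt(2.0 / n_per_arm)) - z_a)
    pow2 = nd.cdf(delta2 / (sd2 * sqrt(2.0 / n_per_arm)) - z_a)
    return pow1 * pow2
```

Because the powers multiply, a sample size adequate for each endpoint alone can still be underpowered for the co-primary requirement.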
5. Study design for restricted mean time analysis of recurrent events and death. Biometrics 2023; 79:3701-3714. [PMID: 37612246] [PMCID: PMC10841174] [DOI: 10.1111/biom.13923]
Abstract
The restricted mean time in favor (RMT-IF) of treatment has recently been added to the analytic toolbox for composite endpoints of recurrent events and death. To help practitioners design new trials based on this method, we develop tools to calculate the sample size and power. Specifically, we formulate the outcomes as a multistate Markov process with a sequence of transient states for recurrent events and an absorbing state for death. The transition intensities, in this case the instantaneous risks of another nonfatal event or of death, are assumed to be time-homogeneous but are nonetheless allowed to depend on the number of past events. Using the properties of Coxian distributions, we derive the RMT-IF effect size under the alternative hypothesis as a function of the treatment-to-control intensity ratios along with the baseline intensities, the latter of which can be easily estimated from historical data. We also reduce the variance of the nonparametric RMT-IF estimator to calculable terms under a standard set-up for censoring. Simulation studies show that the resulting formulas provide accurate approximations to the sample size and power in realistic settings. For illustration, a past cardiovascular trial with recurrent-hospitalization and mortality outcomes is analyzed to generate the parameters needed to design a future trial. The procedures are incorporated, along with the original methodology, into the rmt package on the Comprehensive R Archive Network (CRAN).
6. Sample sizes for estimating the sensitivity of a monitoring system that generates repeated binary outcomes with autocorrelation. Stat Methods Med Res 2023; 32:2347-2364. [PMID: 37915238] [DOI: 10.1177/09622802231208058]
Abstract
Sample size formulas are provided to determine how many events and how many patient care units are needed to estimate the sensitivity of a monitoring system. The monitoring systems we consider generate time series binary data that are autocorrelated and clustered by patient care units. Our application of interest is an automated hand hygiene monitoring system that assesses whether healthcare workers perform hand hygiene when they should. We apply an autoregressive order 1 mixed effects logistic regression model to determine sample sizes that allow the sensitivity of the monitoring system to be estimated at a specified confidence level and margin of error. This model overcomes a major limitation of simpler approaches that fail to provide confidence intervals with the specified levels of confidence when the sensitivity of the monitoring system is above 90%.
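For contrast with the model-based approach above, the naive precision-based calculation it improves upon looks like the following (a hypothetical Wald-interval sketch with a simple design-effect inflation for clustering; the paper shows this style of interval is inadequate when sensitivity exceeds 90%, which is why it uses an AR(1) mixed-effects logistic model instead):

```python
from math import ceil
from statistics import NormalDist

def events_needed(sens, margin, icc=0.0, events_per_unit=1, conf=0.95):
    """Wald-style count of monitored events to estimate sensitivity
    within +/- margin, inflated by the design effect 1 + (m - 1) * ICC.

    Illustrative only; ignores autocorrelation within patient care units.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n0 = (z ** 2) * sens * (1 - sens) / margin ** 2  # simple-random-sample size
    deff = 1 + (events_per_unit - 1) * icc           # clustering inflation
    return ceil(n0 * deff)
```

The gap between this naive count and the model-based answer widens as the sensitivity approaches 1, where the normal approximation underlying the Wald interval breaks down.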
7. Sample size calculation for one-armed clinical trials with clustered data and binary outcome. Biom J 2023; 65:e2300123. [PMID: 37377083] [DOI: 10.1002/bimj.202300123]
Abstract
The formula of Fleiss and Cuzick (1979) for estimating the intraclass correlation coefficient is applied to simplify sample size calculation for clustered data with a binary outcome. It is demonstrated that this approach reduces the complexity of the sample size calculation to specifying the null and alternative hypotheses and formulating the quantitative influence of membership in the same cluster on the probability of therapy success.
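A common way such a calculation is operationalised is to compute the unclustered one-sample size and inflate it by the design effect 1 + (m − 1)·ICC for clusters of size m. The sketch below is a hypothetical stand-in for the paper's Fleiss-Cuzick-based derivation, using the standard one-sample proportion formula:

```python
from math import ceil, sqrt
from statistics import NormalDist

def one_arm_clustered_n(p0, p1, m, icc, alpha=0.05, power=0.8):
    """Subjects needed to test H0: p = p0 against alternative p1 in a
    one-armed trial, inflated for clustering via 1 + (m - 1) * ICC.

    Illustrative formula, not the paper's exact derivation.
    """
    nd = NormalDist()
    z_a, z_b = nd.inv_cdf(1 - alpha / 2), nd.inv_cdf(power)
    n = ((z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1)))
         / (p1 - p0)) ** 2
    return ceil(n * (1 + (m - 1) * icc))
```

With m = 1 (no clustering) the design effect vanishes and the formula reduces to the familiar unclustered case.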
8. Re-evaluating the role of pilot trials in informing effect and sample size estimates for full-scale trials: a meta-epidemiological study. BMJ Evid Based Med 2023; 28:383-391. [PMID: 37491141] [DOI: 10.1136/bmjebm-2023-112358]
Abstract
BACKGROUND Some have argued that pilot trials have little value for informing the expected effect size of a subsequent large trial. This study aims to empirically evaluate the roles of pilot trials in informing the effect and sample size estimates of a full-scale trial. METHODS We conducted a search in PubMed on 19 February 2022, for all pilot trials published between 2005 and 2018 and their subsequent full-scale trials. We analysed the agreement in results by comparing the direction and magnitude of the effect size in the pilot trial and full-scale trial. Logistic regression was used to explore whether a significant pilot trial and other characteristics were associated with a significant full-scale trial. RESULTS A total of 248 pairs of pilot and full-scale trials were analysed. Full-scale trials with a significant pilot trial were 2.72 times more likely to find a significant result for the primary efficacy outcome than those with a non-significant pilot trial (95% CI 1.52 to 4.86, p=0.001). The association remained significant irrespective of changes made to the trial design. In 73% of the pairs, the pilot trial produced a larger point estimate than the subsequent full-scale trial, but 87% of pairs had a 95% CI estimated by the pilot trial that covered the full-scale trial point estimate. Full-scale trials with a sample size estimated using the SD from the pilot trial were less likely to yield a significant result (OR=0.26, 95% CI 0.10 to 0.65, p=0.004). CONCLUSION Pilot trials can provide strong signals on intervention efficacy. When determining the sample size for full-scale trials, using the CI bounds from the pilot trials instead of the point estimate may improve power estimation.
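The conclusion above, using a CI bound from the pilot rather than the point estimate, can be made concrete for a continuous outcome: replace the pilot SD with its upper confidence limit before sizing the full trial. The sketch below uses the Wilson-Hilferty approximation to the chi-square quantile so it stays dependency-free; function names and the 80% default confidence level are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * sqrt(2 / (9 * df))) ** 3

def n_per_group(sd, delta, alpha=0.05, power=0.8):
    """Normal-approximation two-group size for mean difference delta."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return ceil(2 * (z * sd / delta) ** 2)

def conservative_sd(pilot_sd, pilot_n, level=0.8):
    """Upper confidence limit of the SD from a pilot of pilot_n subjects."""
    df = pilot_n - 1
    return pilot_sd * sqrt(df / chi2_quantile(1 - level, df))
```

Sizing on `conservative_sd(...)` instead of `pilot_sd` yields a larger trial, which is exactly the buffer against the pilot's noisy variance estimate that the study's findings motivate.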
9. Does it decay? Obtaining decaying correlation parameter values from previously analysed cluster randomised trials. Stat Methods Med Res 2023; 32:2123-2134. [PMID: 37589088] [PMCID: PMC10683336] [DOI: 10.1177/09622802231194753]
Abstract
A frequently applied assumption in the analysis of data from cluster randomised trials is that the outcomes from all participants within a cluster are equally correlated. That is, the intracluster correlation, which describes the degree of dependence between outcomes from participants in the same cluster, is the same for each pair of participants in a cluster. However, recent work has discussed the importance of allowing for this correlation to decay as the time between the measurement of participants in a cluster increases. Incorrect omission of such a decay can lead to under-powered studies, and confidence intervals for estimated treatment effects can be too narrow or too wide, depending on the characteristics of the design. When planning studies, researchers often rely on previously reported analyses of trials to inform their choice of intracluster correlation. However, most reported analyses of clustered data do not incorporate a correlation decay. Thus, often all that is available are estimates of intracluster correlations obtained under the potentially incorrect assumption of no decay. In this article, we show that it is possible to use intracluster correlation values obtained from models that incorrectly omit a decay to inform plausible choices of decaying correlations. Our focus is on intracluster correlation estimates for continuous outcomes obtained by fitting linear mixed models with exchangeable or block-exchangeable correlation structures. We describe how plausible values for decaying correlations may be obtained given these estimated intracluster correlations. An online app is presented that allows users to obtain plausible values of the decay, which can be used at the trial planning stage to assess the sensitivity of sample size and power calculations to decaying correlation structures.
10. Using the Delphi process to determine the minimum clinically important effect size for the Balanced-2 randomised controlled trial. Clin Trials 2023; 20:473-478. [PMID: 37144615] [DOI: 10.1177/17407745231173058]
Abstract
BACKGROUND The sample size calculation is an important step in designing randomised controlled trials. For a trial comparing a control and an intervention group where the outcome is binary, the sample size calculation requires choosing values for the anticipated event rates in the control and intervention groups (which together determine the effect size), and for the error rates. The Difference ELicitation in TriAls guidance recommends that the effect size should be both realistic and clinically important to stakeholder groups. Overestimating the effect size leads to sample sizes that are too small to reliably detect the true population effect size, which in turn results in low achieved power. In this study, we use the Delphi approach to gain consensus on the minimum clinically important effect size for Balanced-2, a randomised controlled trial comparing processed electroencephalogram-guided 'light' versus 'deep' general anaesthesia on the incidence of postoperative delirium in older adults undergoing major surgery. METHODS Delphi rounds were conducted using electronic surveys. Surveys were administered to two stakeholder groups: specialist anaesthetists from a general adult department in Auckland City Hospital, New Zealand (Group 1), and specialist anaesthetists with expertise in clinical research, identified from the Australian and New Zealand College of Anaesthetists' Clinical Trials Network (Group 2). A total of 187 anaesthetists were invited to participate (81 from Group 1 and 106 from Group 2). Results from each Delphi round were summarised and presented in subsequent rounds until consensus was reached (>70% agreement). RESULTS The overall response rate for the first Delphi survey was 47% (88/187). The median minimum clinically important effect size was 5.0% (interquartile range: 5.0-10.0) for both stakeholder groups. The overall response rate for the second Delphi survey was 51% (95/187). Consensus was reached after the second round, as 74% of respondents in Group 1 and 82% of respondents in Group 2 agreed with the median effect size. The combined minimum clinically important effect size across both groups was 5.0% (interquartile range: 3.0-6.5). CONCLUSIONS This study demonstrates that surveying stakeholder groups using a Delphi process is a simple way to define a minimum clinically important effect size, which aids the sample size calculation and determines whether a randomised study is feasible.
11. Blinded sample size recalculation in multiple composite population designs with normal data and baseline adjustments. Biom J 2023; 65:e2000326. [PMID: 37309256] [DOI: 10.1002/bimj.202000326]
Abstract
The increasing interest in subpopulation analysis has led to the development of various new trial designs and analysis methods in the fields of personalized medicine and targeted therapies. In this paper, subpopulations are defined in terms of an accumulation of disjoint population subsets and will therefore be called composite populations. The proposed trial design is applicable to any set of composite populations, considering normally distributed endpoints and random baseline covariates. Treatment effects for composite populations are tested by combining p-values, calculated on the subset level, using the inverse normal combination function to generate test statistics for those composite populations, while the closed testing procedure accounts for multiple testing. Critical boundaries for intersection hypothesis tests are derived using multivariate normal distributions reflecting the joint distribution of composite population test statistics under the assumption of no treatment effect. For sample size calculation and sample size recalculation, multivariate normal distributions are derived that describe the joint distribution of composite population test statistics under an assumed alternative hypothesis. Simulations demonstrate the absence of any practically relevant inflation of the type I error rate. The target power after sample size recalculation is typically met or close to being met.
12. How to calculate sample size in animal and human studies. Front Med (Lausanne) 2023; 10:1215927. [PMID: 37663663] [PMCID: PMC10469945] [DOI: 10.3389/fmed.2023.1215927]
Abstract
One of the most important statistical steps when designing animal and human studies is the calculation of the required sample size. In this review, we define central terms in the context of sample size determination, including mean, standard deviation, statistical hypothesis testing, type I/II error, power, direction of effect, effect size, expected attrition, corrected sample size, and allocation ratio. We also provide practical examples of sample size calculations for animal and human studies based on pilot studies, on larger studies similar to the proposed study, or, if no previous studies are available, on estimated magnitudes of the effect size per Cohen and Sawilowsky.
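Several of the terms listed above (effect size, allocation ratio, expected attrition, corrected sample size) come together in the textbook normal-approximation formula for comparing two means. The sketch below is a generic illustration, not the review's worked examples; the function name and defaults are assumptions.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_means(d, alpha=0.05, power=0.8, ratio=1.0, attrition=0.0):
    """Per-group sizes for a standardized effect size d (Cohen's d),
    allocation ratio n2/n1, and expected attrition.

    Returns (n1, n2), each inflated so that the post-attrition sample
    still meets the target power.
    """
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    n1 = (1 + 1 / ratio) * (z / d) ** 2        # group 1, pre-attrition
    n1_corr = ceil(n1 / (1 - attrition))       # corrected sample size
    n2_corr = ceil(ratio * n1 / (1 - attrition))
    return n1_corr, n2_corr
```

For a "medium" effect (d = 0.5) with equal allocation and no attrition this gives the familiar 63 per group at 80% power and two-sided 5% alpha.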
13. Sample size considerations for micro-randomized trials with binary proximal outcomes. Stat Med 2023; 42:2777-2796. [PMID: 37094566] [PMCID: PMC10314739] [DOI: 10.1002/sim.9748]
Abstract
Micro-randomized trials (MRTs) are a novel experimental design for developing mobile health interventions. Participants are repeatedly randomized in an MRT, resulting in longitudinal data with time-varying treatments. Causal excursion effects are the main quantities of interest in MRT primary and secondary analyses. We consider MRTs where the proximal outcome is binary and the randomization probability is constant or time-varying but not data-dependent. We develop a sample size formula for detecting a nonzero marginal excursion effect. We prove that the formula guarantees power under a set of working assumptions. We demonstrate via simulation that violations of certain working assumptions do not affect the power, and for those that do, we point out the direction in which the power changes. We then propose practical guidelines for using the sample size formula. As an illustration, the formula is used to size an MRT on interventions for excessive drinking. The sample size calculator is implemented in R package MRTSampleSizeBinary and an interactive R Shiny app. This work can be used in trial planning for a wide range of MRTs with binary proximal outcomes.
14. Best (but oft forgotten) practices: Efficient sample sizes for commonly used trial designs. Am J Clin Nutr 2023; 117:1063-1085. [PMID: 37270287] [DOI: 10.1016/j.ajcnut.2023.02.013]
Abstract
Designing studies such that they have a high level of power to detect an effect or association of interest is an important tool to improve the quality and reproducibility of findings from such studies. Since resources (research subjects, time, and money) are scarce, it is important to obtain sufficient power with minimum use of such resources. For commonly used randomized trials of the treatment effect on a continuous outcome, designs are presented that minimize the number of subjects or the amount of research budget when aiming for a desired power level. This concerns the optimal allocation of subjects to treatments and, in case of nested designs such as cluster-randomized trials and multicenter trials, also the optimal number of centers versus the number of persons per center. Since such optimal designs require knowledge of parameters of the analysis model that are not known in the design stage, in particular outcome variances, maximin designs are presented. These designs guarantee a prespecified power level for plausible ranges of the unknown parameters and minimize research costs for the worst-case values of these parameters. The focus is on a 2-group parallel design, the AB/BA crossover design, and cluster-randomized and multicenter trials with a continuous outcome. How to calculate sample sizes for maximin designs is illustrated for examples from nutrition. Several computer programs that are helpful in calculating sample sizes for optimal and maximin designs are discussed as well as some results on optimal designs for other types of outcomes.
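The trade-off described above between the number of clusters (or centers) and the number of persons per cluster has a well-known cost-optimal solution in the optimal-design literature: the cluster size minimizing cost at fixed power is m* = √(c_cluster·(1 − ICC) / (c_subject·ICC)). The snippet below is a sketch of that standard result; the cost parameter names are illustrative, and the review's maximin designs additionally guard against uncertainty in the ICC, which this does not.

```python
from math import sqrt

def optimal_cluster_size(cost_cluster, cost_subject, icc):
    """Cost-optimal number of subjects per cluster:
    m* = sqrt(cost_cluster * (1 - icc) / (cost_subject * icc)).

    cost_cluster: cost of recruiting one extra cluster/center.
    cost_subject: cost of one extra subject within a cluster.
    """
    return sqrt(cost_cluster * (1 - icc) / (cost_subject * icc))
```

Note how a larger ICC pushes the optimum toward more, smaller clusters, which matches the qualitative advice in the review.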
15. Combining evidence from clinical trials in conditional or accelerated approval. Pharm Stat 2023. [PMID: 37114714] [DOI: 10.1002/pst.2302]
Abstract
Conditional (European Medicines Agency) or accelerated (U.S. Food and Drug Administration) approval of drugs allows earlier access to promising new treatments that address unmet medical needs. Certain post-marketing requirements must typically be met in order to obtain full approval, such as conducting a new post-market clinical trial. We study the applicability of the recently developed harmonic mean χ²-test to this conditional or accelerated approval framework. The proposed approach can be used both to support the design of the post-market trial and to analyse the combined evidence provided by both trials. Other methods considered are the two-trials rule, Fisher's criterion and Stouffer's method. In contrast to some of the traditional methods, the harmonic mean χ²-test always requires a post-market clinical trial. If the p-value from the pre-market clinical trial is ≪ 0.025, a smaller sample size is needed for the post-market clinical trial than with the two-trials rule. For illustration, we apply the harmonic mean χ²-test to a drug which received conditional (and later full) market licensing by the EMA. A simulation study is conducted to examine the operating characteristics of the harmonic mean χ²-test and the two-trials rule in more detail. We finally investigate the applicability of these two methods to compute the power at interim of an ongoing post-market trial. These results are expected to aid in the design and assessment of the required post-market studies in terms of the level of evidence required for full approval.
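The three comparator methods named in the abstract are simple enough to sketch directly (the harmonic mean χ²-test itself is omitted here). These are the textbook definitions for two one-sided p-values; the closed-form chi-square survival function with 4 degrees of freedom, e^(−x/2)·(1 + x/2), keeps Fisher's method dependency-free.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def two_trials_rule(p1, p2, alpha=0.025):
    """Both one-sided p-values must fall below alpha."""
    return p1 < alpha and p2 < alpha

def stouffer(p1, p2):
    """Stouffer's combined p-value (equal weights)."""
    nd = NormalDist()
    z = (nd.inv_cdf(1 - p1) + nd.inv_cdf(1 - p2)) / sqrt(2.0)
    return 1 - nd.cdf(z)

def fisher(p1, p2):
    """Fisher's combined p-value: -2*(log p1 + log p2) ~ chi-square(4)."""
    x = -2.0 * (log(p1) + log(p2))
    return exp(-x / 2) * (1 + x / 2)  # chi-square(4) survival function
```

Note the contrast the paper exploits: two borderline results (e.g. p = 0.02 each) pass the two-trials rule, while one very strong pre-market result cannot compensate for a weak post-market trial under that rule but can under a combination test.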
16. Most Placebo-Controlled Trials in Inflammatory Bowel Disease were Underpowered Because of Overestimated Drug Efficacy Rates: Results from a Systematic Review of Induction Studies. J Crohns Colitis 2023; 17:404-417. [PMID: 36219564] [DOI: 10.1093/ecco-jcc/jjac150]
Abstract
BACKGROUND AND AIMS Most pharmaceutical clinical trials for inflammatory bowel disease [IBD] are placebo-controlled and require effect size estimation for a drug relative to placebo. We compared expected effect sizes in sample size calculations [SSCs] to actual effect sizes in IBD clinical trials. METHODS MEDLINE, EMBASE, CENTRAL and the Cochrane library were searched from inception to March 26, 2021, to identify placebo-controlled induction studies for luminal Crohn's disease [CD] and ulcerative colitis [UC] that reported an SSC and a primary endpoint of clinical remission/response. Expected effects were subtracted from actual effects, and interquartile ranges [IQRs] for each corresponding median difference were calculated. Linear regression was used to assess whether placebo or drug event rate misspecifications were responsible for these differences. RESULTS Of eligible studies, 36.9% [55/149] were excluded because of incomplete SSC reporting, yielding 94 studies [46 CD, 48 UC]. Treatment effects were overestimated in CD for remission (-12.6% [IQR: -16.3 to -1.6%]), in UC for remission (-10.2% [IQR: -16.5 to -5.6%]) and in CD for response (-15.3% [IQR: -27.1 to -5.8%]). Differences observed were due to overestimated drug event rates, whereas expected and actual placebo event rates were similar. A meta-regression demonstrated associations between overestimated treatment effect sizes and several trial characteristics: isolated ileal disease, longer CD duration, extensive colitis [UC], single-centre, phase 2 and no endoscopic endpoint component [UC]. CONCLUSION Overestimation of IBD therapy efficacy rates resulted in smaller-than-expected treatment effects. These results should be used to inform SSCs and trial design for IBD drug development.
17. JPEN Journal Club 72. The devil in the details. JPEN J Parenter Enteral Nutr 2023; 47:442-444. [PMID: 35975333] [DOI: 10.1002/jpen.2439]
18. Group sequential multi-arm multi-stage trial design with treatment selection. Stat Med 2023; 42:1480-1491. [PMID: 36808736] [DOI: 10.1002/sim.9682]
Abstract
A multi-arm trial allows simultaneous comparison of multiple experimental treatments with a common control and provides a substantial efficiency advantage over the traditional two-arm randomized controlled trial. Many novel multi-arm multi-stage (MAMS) clinical trial designs have been proposed. However, a major hurdle to routinely adopting group sequential MAMS designs is the computational effort of obtaining the total sample size and the sequential stopping boundaries. In this paper, we develop a group sequential MAMS trial design based on the sequential conditional probability ratio test. The proposed method provides analytical solutions for futility and efficacy boundaries for an arbitrary number of stages and arms, and thus avoids the complicated computations required by the methods of Magirr et al. Simulation results show that the proposed method has several advantages over the methods implemented in the R package MAMS by Magirr et al.
|
19
|
Sample size calculation for clinical trials analyzed with the meta-analytic-predictive approach. Res Synth Methods 2023; 14:396-413. [PMID: 36625478 DOI: 10.1002/jrsm.1618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2022] [Revised: 01/03/2023] [Accepted: 01/04/2023] [Indexed: 01/11/2023]
Abstract
The meta-analytic-predictive (MAP) approach is a Bayesian method to incorporate historical controls in new trials that aims to increase the statistical power and reduce the required sample size. Here we investigate how to calculate the sample size of the new trial when historical data is available, and the MAP approach is used in the analysis. In previous applications of the MAP approach, the prior effective sample size (ESS) acted as a metric to quantify the number of subjects the historical information is worth. However, the validity of using the prior ESS in sample size calculation (i.e., reducing the number of randomized controls by the derived prior ESS) is questionable, because different approaches may yield different values for prior ESS. In this work, we propose a straightforward Monte Carlo approach to calculate the sample size that achieves the desired power in the new trial given available historical controls. To make full use of the available historical information to simulate the new trial data, the control parameters are not taken as a point estimate but sampled from the MAP prior. These sampled control parameters and the MAP prior based on the historical data are then used to derive the statistical power for the treatment effect and the resulting required sample size. The proposed sample size calculation approach is illustrated with real-life data sets with different outcomes from three studies. The results show that this approach to calculating the required sample size for the MAP analysis is straightforward and generic.
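The Monte Carlo idea described above can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation: the control rate is drawn from a Beta prior standing in for a fitted MAP prior, the additive effect `delta` and the analysis (a plain pooled z-test rather than a MAP-based Bayesian analysis) are assumptions, and all function names are hypothetical.

```python
import math
import random

Z_CRIT = 1.959964  # z quantile for one-sided alpha = 0.025

def trial_succeeds(x_t, n_t, x_c, n_c):
    # Pooled two-proportion z-test, one-sided, for a higher response
    # rate on treatment.
    p_t, p_c = x_t / n_t, x_c / n_c
    p = (x_t + x_c) / (n_t + n_c)
    se = math.sqrt(p * (1 - p) * (1 / n_t + 1 / n_c))
    return se > 0 and (p_t - p_c) / se > Z_CRIT

def mc_power(n_per_arm, delta, prior_a, prior_b, n_sims=2000, seed=1):
    # Monte Carlo power when the control rate is *sampled* each iteration
    # from a Beta(prior_a, prior_b) prior (a stand-in for a real MAP prior
    # fitted to historical controls) rather than fixed at a point estimate.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p_c = rng.betavariate(prior_a, prior_b)
        p_t = min(p_c + delta, 0.999)
        x_c = sum(rng.random() < p_c for _ in range(n_per_arm))
        x_t = sum(rng.random() < p_t for _ in range(n_per_arm))
        hits += trial_succeeds(x_t, n_per_arm, x_c, n_per_arm)
    return hits / n_sims
```

In use, one would increase `n_per_arm` until `mc_power` reaches the target, e.g. 0.80, and subtract any credit taken for the historical controls from the randomized control arm separately.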
|
20
|
Design and analysis of cluster randomized trials with time-to-event outcomes under the additive hazards mixed model. Stat Med 2022; 41:4860-4885. [PMID: 35908796 PMCID: PMC9588628 DOI: 10.1002/sim.9541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2021] [Revised: 05/04/2022] [Accepted: 07/19/2022] [Indexed: 11/12/2022]
Abstract
A primary focus of current methods for cluster randomized trials (CRTs) has been for continuous, binary, and count outcomes, with relatively less attention given to right-censored, time-to-event outcomes. In this article, we detail considerations for sample size requirement and statistical inference in CRTs with time-to-event outcomes when the intervention effect parameter is specified through the additive hazards mixed model (AHMM), which includes a frailty term to explicitly account for the dependency between the failure times. First, we discuss improved inference for the treatment effect parameter via bias-corrected sandwich variance estimators and randomization-based test under AHMM, addressing potential small-sample biases in CRTs. Next, we derive a new sample size formula for AHMM analysis of CRTs accommodating both equal and unequal cluster sizes. When the cluster sizes vary, our sample size formula depends on the mean and coefficient of variation of cluster sizes, based on which we articulate the impact of cluster size variation in CRTs with time-to-event outcomes. Furthermore, we obtain the insight that the classical variance inflation factor for CRTs with a non-censored outcome can in fact apply to CRTs with a time-to-event outcome, providing that an appropriate definition of the intraclass correlation coefficient is considered under AHMM. Simulation studies are carried out to illustrate key design and analysis considerations in CRTs with a small to moderate number of clusters. The proposed sample size procedure and analytical methods are further illustrated using the context of the STrategies to Reduce Injuries and Develop Confidence in Elders CRT.
|
21
|
Sample size calculation for randomized selection trials with a time-to-event endpoint and a margin of practical equivalence. Stat Med 2022; 41:4022-4033. [PMID: 35688463 PMCID: PMC9544500 DOI: 10.1002/sim.9490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2021] [Revised: 05/16/2022] [Accepted: 05/23/2022] [Indexed: 11/30/2022]
Abstract
Selection trials are used to compare potentially active experimental treatments without a control arm. While sample size calculation methods exist for binary endpoints, no such methods are available for time-to-event endpoints, even though these are ubiquitous in clinical trials. Recent selection trials have begun using progression-free survival as their primary endpoint, but have dichotomized it at a specific time point for sample size calculation and analysis. This changes the clinical question and may reduce power to detect a difference between the arms. In this article, we develop the theory for sample size calculation in selection trials where the time-to-event endpoint is assumed to follow an exponential or Weibull distribution. We provide a free web application for sample size calculation, as well as an R package, that researchers can use in the design of their studies.
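As a rough illustration of the quantity such a calculation targets, the sketch below simulates the probability of correctly selecting the better of two arms when picking the arm with the larger observed mean survival. It assumes exponential event times, no censoring, and a simple pick-the-winner rule; it is not the paper's method, web application, or R package, and all names and numbers are hypothetical.

```python
import math
import random

def select_prob(n_per_arm, median_a, median_b, n_sims=4000, seed=7):
    # Probability that the truly better arm (arm B, with the longer median)
    # shows the larger observed mean survival. Exponential event times with
    # rate log(2) / median; censoring is deliberately ignored here.
    rng = random.Random(seed)
    rate_a = math.log(2) / median_a
    rate_b = math.log(2) / median_b
    wins = 0
    for _ in range(n_sims):
        mean_a = sum(rng.expovariate(rate_a) for _ in range(n_per_arm)) / n_per_arm
        mean_b = sum(rng.expovariate(rate_b) for _ in range(n_per_arm)) / n_per_arm
        wins += mean_b > mean_a
    return wins / n_sims
```

The design question is then the smallest `n_per_arm` for which this selection probability exceeds a pre-specified level, with a margin of practical equivalence handled on top of this basic comparison.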
|
22
|
Optimal sample size determination for single-arm trials in pediatric and rare populations with Bayesian borrowing. J Biopharm Stat 2022; 32:529-546. [PMID: 35604836 DOI: 10.1080/10543406.2022.2058529] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
In many therapeutic areas with unmet medical needs, such as pediatric oncology and rare diseases, one of the deterrent factors for clinical trial interpretability is the limited sample size with less-than-ideal operating characteristics. A single arm is usually the only viable design due to feasibility and ethical concerns. For the trial results to be more interpretable and conclusive, the evaluation of operating characteristics, such as type I error rate and power, and the appropriate utilization of prior information for study design should be prespecified and fully investigated during the trial planning phase. So far, little of the existing literature has addressed optimal sample size determination for the planning of pediatric and rare-population trials, with the majority of research focusing on the analysis perspective, particularly Bayesian borrowing. In practice, when a single-arm trial is designed for a rare population, it is not uncommon that the only information available is from an earlier trial and/or a few clinical publications based on observational studies, often constituting mixed or uncertain conclusions. In light of this, an optimal Bayesian sample size determination method for single-arm trials with a binary or continuous endpoint is proposed, in which conflicting prior beliefs can be readily incorporated. The prior effective sample size can be calculated to assess the robustness of, and the amount of, the prior information borrowed. Moreover, due to the lack of closed-form posterior distributions in general, an alternative approach for calculating Bayesian power is described. Simulation studies are provided to demonstrate the utility of the proposed methods. In addition, a case study in pediatric patients with leukemia is included to illustrate the proposed method alongside existing approaches.
|
23
|
Optimal unplanned design modification in adaptive two-stage trials. Pharm Stat 2022; 21:1121-1137. [PMID: 35604767 DOI: 10.1002/pst.2228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2021] [Revised: 02/01/2022] [Accepted: 04/24/2022] [Indexed: 11/08/2022]
Abstract
Adaptive planning of clinical trials allows modifying the entire trial design at any time point mid-course. In this paper, we consider the case when a trial-external update of the planning assumptions during the ongoing trial makes an unforeseen design adaptation necessary. We take up the idea of constructing adaptive designs with defined features by solving an optimization problem and apply it to the situation of unplanned design reassessment. By using the conditional error principle, we present an approach on how to optimally modify the trial design at an unplanned interim analysis while at the same time strictly protecting the type I error rate. This linking of optimal design planning and the conditional error principle allows sound reactions to unforeseen events that make a design reassessment necessary.
|
24
|
The batched stepped wedge design: A design robust to delays in cluster recruitment. Stat Med 2022; 41:3627-3641. [PMID: 35596691 PMCID: PMC9541502 DOI: 10.1002/sim.9438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2021] [Revised: 04/13/2022] [Accepted: 05/05/2022] [Indexed: 11/08/2022]
Abstract
Stepped wedge designs are an increasingly popular variant of longitudinal cluster randomized trial designs, and roll out interventions across clusters in a randomized, but step-wise fashion. In the standard stepped wedge design, assumptions regarding the effect of time on outcomes may require that all clusters start and end trial participation at the same time. This would require ethics approvals and data collection procedures to be in place in all clusters before a stepped wedge trial can start in any cluster. Hence, although stepped wedge designs are useful for testing the impacts of many cluster-based interventions on outcomes, there can be lengthy delays before a trial can commence. In this article, we introduce "batched" stepped wedge designs. Batched stepped wedge designs allow clusters to commence the study in batches, instead of all at once, allowing for staggered cluster recruitment. Like the stepped wedge, the batched stepped wedge rolls out the intervention to all clusters in a randomized and step-wise fashion: a series of self-contained stepped wedge designs. Provided that separate period effects are included for each batch, software for standard stepped wedge sample size calculations can be used. With this time parameterization, in many situations including when linear models are assumed, sample size calculations reduce to the setting of a single stepped wedge design with multiple clusters per sequence. In these situations, sample size calculations will not depend on the delays between the commencement of batches. Hence, the power of batched stepped wedge designs is robust to unexpected delays between batches.
|
25
|
Optimization of adaptive designs with respect to a performance score. Biom J 2022; 64:989-1006. [PMID: 35426460 DOI: 10.1002/bimj.202100166] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2021] [Revised: 02/09/2022] [Accepted: 02/12/2022] [Indexed: 11/08/2022]
Abstract
Adaptive designs are an increasingly popular method for the adaptation of design aspects in clinical trials, such as the sample size. Scoring different adaptive designs helps to make an appropriate choice among the numerous existing adaptive design methods. Several scores have been proposed to evaluate adaptive designs. Moreover, it is possible to determine optimal two-stage adaptive designs with respect to a customized objective score by solving a constrained optimization problem. In this paper, we use the conditional performance score by Herrmann et al. (2020) as the optimization criterion to derive optimal adaptive two-stage designs. We investigate variations of the original performance score, for example, by assigning different weights to the score components and by incorporating prior assumptions on the effect size. We further investigate a setting where the optimization framework is extended by a global power constraint, and additional optimization of the critical value function next to the stage-two sample size is performed. These evaluations of the sample size curves and the resulting design's performance can help facilitate the score's use in practice.
|
26
|
Sample size calculation in randomised clinical trials. Comment on Br J Anaesth 2020; 125: 802-10. Br J Anaesth 2022; 128:e288-e289. [PMID: 35144803 DOI: 10.1016/j.bja.2022.01.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2022] [Accepted: 01/10/2022] [Indexed: 11/02/2022] Open
|
27
|
Power analysis for stepped wedge trials with multiple interventions. Stat Med 2022; 41:1498-1512. [PMID: 35014710 DOI: 10.1002/sim.9301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2021] [Revised: 11/02/2021] [Accepted: 12/09/2021] [Indexed: 11/06/2022]
Abstract
Stepped wedge design (SWD) trials are cluster randomized trials that feature staggered, unidirectional cross-over between treatment conditions. Existing literature on power for SWDs focuses primarily on designs with two conditions, typically a control and an intervention condition. However, SWDs with more than one treatment condition are being proposed and conducted. We present a linear mixed model for SWDs with two or more interventions, including both multiarm and factorial designs. We derive standard errors of the intervention effect coefficients, and present power calculation methods. We consider both repeated cross-sectional and cohort designs. Design features, with a focus on treatment allocations, are examined to determine their impact on power.
|
28
|
Accounting for unequal cluster sizes in designing cluster randomized trials to detect treatment effect heterogeneity. Stat Med 2021; 41:1376-1396. [PMID: 34923655 DOI: 10.1002/sim.9283] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2021] [Revised: 11/14/2021] [Accepted: 11/24/2021] [Indexed: 12/26/2022]
Abstract
Unequal cluster sizes are common in cluster randomized trials (CRTs). While there are a number of previous investigations studying the impact of unequal cluster sizes on the power for testing the average treatment effect in CRTs, little is known about the impact of unequal cluster sizes on the power for testing the heterogeneous treatment effect (HTE) in CRTs. In this work, we expand the sample size procedures for studying HTE in CRTs to accommodate cluster size variation under the linear mixed model framework. Through analytical derivation and graphical exploration, we show that the sample size for the HTE with an individual-level effect modifier is less affected by unequal cluster sizes than with a cluster-level effect modifier. The impact of cluster size variability jointly depends on the mean and coefficient of variation of cluster sizes, covariate intraclass correlation coefficient (ICC) and the conditional outcome ICC. In addition, we demonstrate that the HTE-motivated analysis of covariance framework can be used for analyzing the average treatment effect, and offer a more efficient sample size procedure for studying the average treatment effect adjusting for the effect modifier. We use simulations to confirm the accuracy of the proposed sample size procedures for both the average treatment effect and HTE in CRTs. Extensions to multivariate effect modifiers are provided and our procedure is illustrated in the context of the Strategies to Reduce Injuries and Develop Confidence in Elders trial.
|
29
|
Review of pragmatic trials found that multiple primary outcomes are common but so too are discrepancies between protocols and final reports. J Clin Epidemiol 2021; 143:149-158. [PMID: 34896234 DOI: 10.1016/j.jclinepi.2021.12.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2021] [Revised: 11/14/2021] [Accepted: 12/02/2021] [Indexed: 11/21/2022]
Abstract
OBJECTIVES To describe prevalence of multiple primary outcomes, changes in primary outcomes and target sample sizes between protocols and final reports, and how issues of multiplicity are addressed in pragmatic trials. STUDY DESIGN AND SETTING Individually randomised trials labelled as pragmatic, published 2014-2019 in MEDLINE and registered with ClinicalTrials.gov. RESULTS We identified 262 final reports and located protocols for 159 (61%); primary outcomes were clearly reported in 145 (91%) protocols and 256 (98%) final reports. Thirty (19%) protocols and 38 (15%) final reports had multiple primary outcomes. Primary outcomes were present and identical in 128 (81%) matched protocol-final reports. Among 140 pairs with target sample sizes reported, 28 (20.0%) reduced their target sample size (mean 543 fewer participants per trial) and 16 (11.4%) increased it (mean 192 more participants per trial). Thirteen (29.5%) provided an explanation. Only 2/30 (7%) protocols and 4/38 (11%) final reports with co-primary outcomes explained how results would be interpreted in light of multiplicity; 21/30 (70%) protocols and 20/38 (53%) final reports accounted for co-primary outcomes in power calculations. CONCLUSION Co-primary outcomes are common in pragmatic trials; improved transparency around design and analysis decisions involving co-primary outcomes is required.
|
30
|
Design and Analysis Methods for Trials with AI-Based Diagnostic Devices for Breast Cancer. J Pers Med 2021; 11:jpm11111150. [PMID: 34834502 PMCID: PMC8617855 DOI: 10.3390/jpm11111150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2021] [Revised: 11/02/2021] [Accepted: 11/02/2021] [Indexed: 11/24/2022] Open
Abstract
Imaging is important in cancer diagnostics. It takes a long period of medical training and clinical experience for radiologists to be able to accurately interpret diagnostic images. With the advance of big data analysis, machine learning and AI-based devices are currently under development and taking a role in imaging diagnostics. If an AI-based imaging device can read the image as accurately as experienced radiologists, it may be able to help radiologists increase the accuracy of their reading and manage their workloads. In this paper, we consider two potential study objectives of a clinical trial to evaluate an AI-based device for breast cancer diagnosis by comparing its concordance with human radiologists. We propose statistical design and analysis methods for each study objective. Extensive numerical studies are conducted to show that the proposed statistical testing methods control the type I error rate accurately and the design methods provide required sample sizes with statistical powers close to pre-specified nominal levels. The proposed methods were successfully used to design and analyze a real device trial.
|
31
|
Choosing and changing the analysis scale in non-inferiority trials with a binary outcome. Clin Trials 2021; 19:14-21. [PMID: 34693789 PMCID: PMC8847766 DOI: 10.1177/17407745211053790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Background The size of the margin strongly influences the required sample size in non-inferiority and equivalence trials. What is sometimes ignored, however, is that for trials with binary outcomes, the scale of the margin – risk difference, risk ratio or odds ratio – also has a large impact on power and thus on sample size requirement. When considering several scales at the design stage of a trial, these sample size consequences should be taken into account. Sometimes, changing the scale may be needed at a later stage of a trial, for example, when the event proportion in the control arm turns out to be different from expected. Also after completion of a trial, a switch to another scale is sometimes made, for example, when using a regression model in a secondary analysis or when combining study results in a meta-analysis that requires unifying scales. The exact consequences of such switches are currently unknown. Methods and Results This article first outlines sample size consequences for different choices of analysis scale at the design stage of a trial. We add a new result on sample size requirement comparing the risk difference scale with the risk ratio scale. Then, we study two different approaches to changing the analysis scale after the trial has commenced: (1) mapping the original non-inferiority margin using the event proportion in the control arm that was anticipated at the design stage or (2) mapping the original non-inferiority margin using the observed event proportion in the control arm. We use simulations to illustrate consequences on type I and type II error rates. Methods are illustrated on the INES trial, a non-inferiority trial that compared single birth rates in subfertile couples after different fertility treatments. Our results demonstrate large differences in required sample size when choosing between risk difference, risk ratio and odds ratio scales at the design stage of non-inferiority trials. In some cases, the sample size requirement is twice as large on one scale compared with another. Changing the scale after commencing the trial using anticipated proportions mainly impacts the type II error rate, whereas switching using observed proportions is not advised because it does not maintain the type I error rate. Differences were more pronounced with larger margins. Conclusions Trialists should be aware that the analysis scale can have a large impact on type I and type II error rates in non-inferiority trials.
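A minimal sketch of the margin-mapping step discussed above, assuming the mapped margins are defined to pass through the largest acceptable event proportion at a given control proportion; the function name and the numbers in the usage note are hypothetical.

```python
def map_ni_margin(p_control, margin_rd):
    # Translate a non-inferiority margin stated as a risk difference into
    # the risk-ratio and odds-ratio margins that pass through the same
    # point: the largest acceptable event proportion p_control + margin_rd.
    p_max = p_control + margin_rd
    margin_rr = p_max / p_control
    odds = lambda p: p / (1 - p)
    margin_or = odds(p_max) / odds(p_control)
    return margin_rr, margin_or
```

For example, at an anticipated control proportion of 0.10 a risk-difference margin of 0.05 maps to a risk-ratio margin of 1.50 and an odds-ratio margin of about 1.59; if the observed control proportion is 0.20 instead, the same risk-difference margin maps to 1.25 and about 1.33. The implied relative margins tighten as the control proportion grows, which is one way to see why switching scales mid-trial can disturb error rates.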
|
32
|
Designing efficient randomized trials: power and sample size calculation when using semiparametric efficient estimators. Int J Biostat 2021; 18:151-171. [PMID: 34364314 DOI: 10.1515/ijb-2021-0039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2021] [Accepted: 07/12/2021] [Indexed: 11/15/2022]
Abstract
Trials enroll a large number of subjects in order to attain power, making them expensive and time-consuming. Sample size calculations are often performed with the assumption of an unadjusted analysis, even if the trial analysis plan specifies a more efficient estimator (e.g. ANCOVA). This leads to conservative estimates of required sample sizes and an opportunity for savings. Here we show that a relatively simple formula can be used to estimate the power of any two-arm, single-timepoint trial analyzed with a semiparametric efficient estimator, regardless of the domain of the outcome or kind of treatment effect (e.g. odds ratio, mean difference). Since an efficient estimator attains the minimum possible asymptotic variance, this allows for the design of trials that are as small as possible while still attaining design power and control of type I error. The required sample size calculation is parsimonious and requires the analyst to provide only a small number of population parameters. We verify in simulation that the large-sample properties of trials designed this way attain their nominal values. Lastly, we demonstrate how to use this formula in the "design" (and subsequent reanalysis) of a real randomized trial and show that fewer subjects are required to attain the same design power when a semiparametric efficient estimator is accounted for at the design stage.
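The abstract does not reproduce the paper's formula, but the standard Wald-type form it builds on can be sketched as follows; the variance figures in the usage note (unadjusted `var_eff = 4`, covariate R² = 0.3) are illustrative assumptions, not values from the paper.

```python
import math
from statistics import NormalDist

def n_total(var_eff, delta, alpha=0.05, power=0.90):
    # Wald-type sample size: if the effect estimator is asymptotically
    # normal with variance var_eff / n (n = total enrolment), then
    #   n = var_eff * (z_{1-alpha/2} + z_{power})^2 / delta^2.
    z = NormalDist().inv_cdf
    return math.ceil(var_eff * (z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2)
```

For a difference in means with unit outcome variance and 1:1 allocation, the unadjusted estimator has `var_eff = 4`; a semiparametric efficient (ANCOVA-type) estimator whose covariates explain 30% of the outcome variance has `var_eff = 4 * (1 - 0.3) = 2.8`. For `delta = 0.4` at 90% power this drops the required total enrolment from 263 to 184, which is the kind of saving the paper argues should be claimed at the design stage.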
|
33
|
Sample size calculation for recurrent event data with additive rates models. Pharm Stat 2021; 21:89-102. [PMID: 34309179 DOI: 10.1002/pst.2154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2021] [Revised: 05/20/2021] [Accepted: 06/28/2021] [Indexed: 11/06/2022]
Abstract
This paper discusses the design of clinical trials where the primary endpoint is a recurrent event, with a focus on sample size calculation. For this problem, a few methods have been proposed, but most of them assume a multiplicative treatment effect on the rate or mean number of recurrent events. In practice, an additive treatment effect may sometimes be preferred or more appealing because of its intuitive clinical meaning and straightforward interpretation compared to a multiplicative relationship. In this paper, new methods are presented and investigated for sample size calculation based on the additive rates model for superiority, non-inferiority, and equivalence trials. They allow for a flexible baseline rate function, staggered entry, random dropout, and overdispersion in event numbers, and simulation studies show that the proposed methods perform well in a variety of settings. We also illustrate how to use the proposed methods to design a clinical trial based on real data.
|
34
|
Bayesian single-arm phase II trial designs with time-to-event endpoints. Pharm Stat 2021; 20:1235-1248. [PMID: 34085764 DOI: 10.1002/pst.2143] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2020] [Revised: 04/27/2021] [Accepted: 05/24/2021] [Indexed: 11/12/2022]
Abstract
For cancer clinical trials of immunotherapy and molecularly targeted therapy, a time-to-event endpoint is often desired. In this paper, we present an event-driven approach for Bayesian one-stage and two-stage single-arm phase II trial designs. Two versions of Bayesian one-stage designs are proposed with executable algorithms, and we also develop theoretical relationships between the frequentist and Bayesian designs. These findings help investigators who want to design a trial using a Bayesian approach gain an explicit understanding of how the frequentist properties can be achieved. Moreover, the proposed Bayesian designs using the exact posterior distributions accommodate single-arm phase II trials with small sample sizes. We also propose an optimal two-stage approach, which can be regarded as an extension of Simon's two-stage design to the time-to-event endpoint. Comprehensive simulations were conducted to explore the frequentist properties of the proposed Bayesian designs, and an R package, BayesDesign, can be accessed via CRAN for convenient use of the proposed methods.
|
35
|
Sample size, sample size planning, and the impact of study context: systematic review and recommendations by the example of psychological depression treatment. Psychol Med 2021; 51:902-908. [PMID: 33879275 PMCID: PMC8161431 DOI: 10.1017/s003329172100129x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/08/2020] [Revised: 01/27/2021] [Accepted: 03/23/2021] [Indexed: 11/07/2022]
Abstract
BACKGROUND Sample size planning (SSP) is vital for efficient studies that yield reliable outcomes. Hence, guidelines emphasize the importance of SSP. The present study investigates the practice of SSP in current trials for depression. METHODS Seventy-eight randomized controlled trials published between 2013 and 2017 were examined. The impact of study design (e.g. number of randomized conditions) and study context (e.g. funding) on sample size was analyzed using multiple regression. RESULTS Overall, sample size during pre-registration, during SSP, and in published articles was highly correlated (r's ≥ 0.887). Simultaneously, only 7-18% of explained variance related to study design (p = 0.055-0.155). This proportion increased to 30-42% by adding study context (p = 0.002-0.005). The median sample size was N = 106, with higher numbers for internet interventions (N = 181; p = 0.021) compared to face-to-face therapy. In total, 59% of studies included SSP, with 28% providing basic determinants and 8-10% providing information for comprehensible SSP. Expected effect sizes exhibited a sharp peak at d = 0.5. Depending on the definition, 10.2-20.4% implemented intense assessment to improve statistical power. CONCLUSIONS Findings suggest that investigators achieve their determined sample size and that pre-registration rates are increasing. During study planning, however, study context appears more important than study design. Study context, therefore, needs to be emphasized in the present discussion, as it can help explain the relatively stable trial numbers of the past decades. Acknowledging this situation, indications exist that digital psychiatry (e.g. internet interventions or intense assessment) can help to mitigate the challenge of underpowered studies. The article includes a short guide for efficient study planning.
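The reported peak at d = 0.5 lines up with conventional planning arithmetic, which can be sketched with a normal-approximation formula (an assumption here; the exact t-based requirement is slightly larger, about 64 per arm at 80% power).

```python
import math
from statistics import NormalDist

def n_per_arm(d, alpha=0.05, power=0.80):
    # Normal-approximation per-arm size for a two-arm comparison of means
    # at standardized effect size d (Cohen's d):
    #   n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)
```

At d = 0.5 and 80% power this gives 63 per arm, roughly 126 in total, which is consistent in magnitude with the reported median total sample size of N = 106.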
|
36
|
Two Questions About the Design of Cluster Randomized Trials: A Tutorial. J Pain Symptom Manage 2021; 61:858-863. [PMID: 33246075 PMCID: PMC8009809 DOI: 10.1016/j.jpainsymman.2020.11.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/23/2020] [Revised: 11/12/2020] [Accepted: 11/16/2020] [Indexed: 11/16/2022]
Abstract
This is a short tutorial on two key questions that pertain to cluster randomized trials (CRTs): 1) Should I perform a CRT? and 2) If so, how do I derive the sample size? In summary, a CRT is the best option when you "must" (e.g., the intervention can only be administered to a group) or you "should" (e.g., because of issues such as feasibility and contamination). CRTs are less statistically efficient and usually more logistically complex than individually randomized trials, and so reviewing the rationale for their use is critical. The most straightforward approach to the sample size calculation is to first perform the calculation as if the design were randomized at the level of the patient and then to inflate this sample size by multiplying by the "design effect", which quantifies the degree to which responses within a cluster are similar to one another. Although trials with large numbers of small clusters are more statistically efficient than those with a few large clusters, trials with large clusters can be more feasible. Also, if results are to be compared across individual sites, then sufficient sample size will be required to attain adequate precision within each site. Sample size calculations should include sensitivity analyses, as inputs from the literature can lack precision. Collaborating with a statistician is essential. To illustrate these points, we describe an ongoing CRT testing a mobile-based app to systematically engage families of intensive care unit patients and help intensive care unit clinicians deliver needs-targeted palliative care.
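The inflation step described above can be sketched as follows, assuming equal cluster sizes; the function name and the inputs in the usage note are hypothetical.

```python
import math

def crt_inflation(n_individual, m, icc):
    # Inflate an individually-randomized sample size by the design effect
    # DE = 1 + (m - 1) * icc for equal clusters of size m, then round up
    # to whole participants and a whole number of clusters.
    de = 1 + (m - 1) * icc
    # round() guards against floating-point noise before taking the ceiling
    n_total = math.ceil(round(n_individual * de, 6))
    n_clusters = math.ceil(n_total / m)
    return de, n_total, n_clusters
```

For example, if 300 participants would suffice under individual randomization, clusters of size 20 with ICC 0.05 give a design effect of 1.95, i.e. 585 participants in 30 clusters, whereas clusters of size 5 give a design effect of only 1.20, i.e. 360 participants in 72 clusters, illustrating the tutorial's point that many small clusters are more statistically efficient than a few large ones.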
Collapse
|
37
|
Optimal planning of adaptive two-stage designs. Stat Med 2021; 40:3196-3213. [PMID: 33738842 DOI: 10.1002/sim.8953] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2019] [Revised: 01/31/2021] [Accepted: 03/02/2021] [Indexed: 12/12/2022]
Abstract
Adaptive designs are playing an increasingly important role in the planning of clinical trials. Although there is a substantial body of research on the optimal determination of two-stage designs, non-optimal versions are still frequently applied in clinical research. In this article, we strive to motivate the application of optimal adaptive designs and give guidance on how to determine them. It is demonstrated that optimizing a trial design with respect to particular objective criteria can have a substantial benefit over the application of conventional adaptive sample size recalculation rules. Furthermore, we show that in many practical situations, optimal group-sequential designs show an almost negligible performance loss compared to optimal adaptive designs. Finally, we illustrate how optimal designs can be tailored to specific operational requirements by customizing the underlying optimization problem.
Collapse
|
38
|
Some design considerations incorporating early futility for single-arm clinical trials with time-to-event primary endpoints using Weibull distribution. Pharm Stat 2021; 20:610-644. [PMID: 33565236 DOI: 10.1002/pst.2097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2020] [Revised: 01/04/2021] [Accepted: 01/05/2021] [Indexed: 11/07/2022]
Abstract
Sample size calculation is an essential component of the planning phase of a clinical trial. In the context of single-arm clinical trials with time-to-event (TTE) endpoints, only a few options with limited design features are available. Motivated by ethical or practical considerations, two-stage designs are implemented for single-arm studies to obtain early evidence of futility. A major drawback of such designs is that early stopping may only occur at the conclusion of the first stage, even if lack of efficacy becomes apparent at any other time point over the course of the clinical trial. In this manuscript, we attempt to fill some existing gaps in the literature related to single-arm clinical trials with TTE endpoints. We propose a parametric maximum likelihood estimate-based test whose variance component accounts for the expected proportion of loss to follow-up and different accrual patterns (early, late, or uniform accrual). For the proposed method, we present three stochastic curtailment methods (conditional power, predictive power, Bayesian predictive probability) which can be employed for efficacy or futility testing purposes. Finally, we discuss the implementation of group sequential designs for obtaining early evidence of efficacy or futility at pre-planned timings of interim analyses. Through extensive simulations, it is shown that our proposed method performs well for designing these studies with moderate to large sample sizes. Some examples are presented to demonstrate various aspects of the stochastic curtailment and repeated significance testing methods presented in this manuscript.
Collapse
|
39
|
Abstract
The stepped wedge cluster randomized design has received increasing attention in pragmatic clinical trials and implementation science research. The key feature of the design is the unidirectional crossover of clusters from the control to intervention conditions on a staggered schedule, which induces confounding of the intervention effect by time. The stepped wedge design first appeared in the Gambia hepatitis study in the 1980s. However, the statistical model used for the design and analysis was not formally introduced until 2007 in an article by Hussey and Hughes. Since then, a variety of mixed-effects model extensions have been proposed for the design and analysis of these trials. In this article, we explore these extensions under a unified perspective. We provide a general model representation and regard various model extensions as alternative ways to characterize the secular trend, intervention effect, as well as sources of heterogeneity. We review the key model ingredients and clarify their implications for the design and analysis. The article serves as an entry point to the evolving statistical literature on stepped wedge designs.
Collapse
|
40
|
Sample size calculation for active-arm trial with counterfactual incidence based on recency assay. STATISTICAL COMMUNICATIONS IN INFECTIOUS DISEASES 2021; 13:20200009. [PMID: 35880999 PMCID: PMC8865397 DOI: 10.1515/scid-2020-0009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/26/2021] [Revised: 09/27/2021] [Accepted: 09/30/2021] [Indexed: 06/15/2023]
Abstract
Objectives The past decade has seen tremendous progress in the development of biomedical agents that are effective as pre-exposure prophylaxis (PrEP) for HIV prevention. To expand the choice of products, new medications and delivery methods are under development. Given the high efficacy of ARV-based PrEP products as they become the current or future standard of care, future non-inferiority trials would require a large number of participants and long follow-up times that may not be feasible. This motivates the construction of a counterfactual estimate that approximates incidence for a randomized concurrent control group receiving no PrEP. Methods We propose enrolling a cohort of prospective PrEP users and augmenting screening for HIV with laboratory markers of the duration of HIV infection that indicate recent infections. We discuss the assumptions under which these data would yield an estimate of the counterfactual HIV incidence and develop sample size and power calculations for comparisons to the incidence observed on an investigational PrEP agent. Results We consider two hypothetical trials for men who have sex with men (MSM) and transgender women (TGW) from different regions and young women in sub-Saharan Africa. The calculated sample sizes are reasonable and yield desirable power in simulation studies. Conclusions Future one-arm trials with counterfactual placebo incidence based on a recency assay can be conducted with reasonable total screening sample sizes and adequate power to determine treatment efficacy.
Collapse
|
41
|
Abstract
The bootstrap, introduced in Efron (1979. Bootstrap methods: another look at the jackknife. The Annals of Statistics 7, 1-26), is a landmark method for quantifying variability. It uses sampling with replacement with a sample size equal to that of the original data. We propose the upstrap, which samples with replacement either more or fewer samples than the original sample size. We illustrate the upstrap by solving a hard, but common, sample size calculation problem. The data and code used for the analysis in this article are available on GitHub (2018. https://github.com/ccrainic/upstrap).
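The upstrap idea above can be sketched in a few lines: resample with replacement at a hypothetical sample size different from the original n, and see how an estimator's variability would change. This is a purely illustrative sketch (the data, function names, and settings are assumptions); the authors' own code is in the linked GitHub repository.

```python
import random
import statistics

def upstrap_se_of_mean(data, new_n, n_rep=2000, seed=0):
    """Estimate the standard error of the mean at a hypothetical sample size new_n.

    Like the bootstrap, but each resample draws new_n observations with
    replacement instead of len(data); new_n may be larger or smaller than n.
    """
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(data, k=new_n)) for _ in range(n_rep)]
    return statistics.stdev(means)

data = [float(x) for x in range(30)]
se_at_n = upstrap_se_of_mean(data, new_n=30)    # ordinary bootstrap SE
se_at_4n = upstrap_se_of_mean(data, new_n=120)  # upstrap: what if we had 4x the data?
```

Quadrupling the resample size roughly halves the estimated standard error, which is the kind of "how much data do I need" question the abstract uses the upstrap to answer.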
Collapse
|
42
|
Optimal two-stage sampling for mean estimation in multilevel populations when cluster size is informative. Stat Methods Med Res 2020; 30:357-375. [PMID: 32940135 PMCID: PMC8172256 DOI: 10.1177/0962280220952833] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
To estimate the mean of a quantitative variable in a hierarchical population, it is logistically convenient to sample in two stages (two-stage sampling), i.e. first selecting clusters, and then individuals from the sampled clusters. Allowing cluster size to vary in the population and to be related to the mean of the outcome variable of interest (informative cluster size), the following competing sampling designs are considered: sampling clusters with probability proportional to cluster size, and then the same number of individuals per cluster; drawing clusters with equal probability, and then the same percentage of individuals per cluster; and selecting clusters with equal probability, and then the same number of individuals per cluster. For each design, optimal sample sizes are derived under a budget constraint. The three optimal two-stage sampling designs are compared, in terms of efficiency, with each other and with simple random sampling of individuals. Sampling clusters with probability proportional to size is recommended. To overcome the dependency of the optimal design on unknown nuisance parameters, maximin designs are derived. The results are illustrated, assuming probability proportional to size sampling of clusters, with the planning of a hypothetical survey to compare adolescent alcohol consumption between France and Italy.
Collapse
|
43
|
Cancer immunotherapy trial design with long-term survivors. Pharm Stat 2020; 20:117-128. [PMID: 32869945 DOI: 10.1002/pst.2060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2019] [Revised: 04/25/2020] [Accepted: 07/20/2020] [Indexed: 11/06/2022]
Abstract
Cancer immunotherapy often reflects the improvement in both short-term risk reduction and long-term survival. In this scenario, a mixture cure model can be used for the trial design. However, the hazard functions between two groups based on the mixture cure model will ultimately cross over. Thus, the conventional assumption of proportional hazards may be violated, and a study design using the standard log-rank test (LRT) could lose power if the main interest is to detect the improvement of long-term survival. In this paper, we propose a change sign weighted LRT for the trial design. We derived a sample size formula for the weighted LRT, which can be used for designing cancer immunotherapy trials to detect both short-term risk reduction and long-term survival. Simulation studies are conducted to compare the efficiency between the standard LRT and the change sign weighted LRT.
Collapse
|
44
|
A method for sample size calculation via E-value in the planning of observational studies. Pharm Stat 2020; 20:163-174. [PMID: 32816399 DOI: 10.1002/pst.2064] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2019] [Revised: 06/23/2020] [Accepted: 07/30/2020] [Indexed: 12/28/2022]
Abstract
Confounding adjustment plays a key role in designing observational studies such as cross-sectional studies, case-control studies, and cohort studies. In this article, we propose a simple method for sample size calculation in observational research in the presence of confounding. The method is motivated by the notion of the E-value, using a bounding factor to quantify the impact of confounders on the effect size. The method can be applied to calculate the needed sample size in observational research when the outcome variable is binary, continuous, or time-to-event. The method can be implemented straightforwardly using existing commercial software such as the PASS software. We demonstrate the performance of the proposed method through numerical examples, simulation studies, and a real application, which show that the proposed method is conservative, providing a slightly larger sample size than what is needed to achieve a given power.
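The E-value the abstract builds on has a closed form (VanderWeele and Ding): for an observed risk ratio RR, it is the minimum strength of association an unmeasured confounder would need with both treatment and outcome to explain the estimate away. A minimal sketch of that formula only (the paper's sample size procedure itself is not reproduced here):

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio: RR + sqrt(RR * (RR - 1)).

    For protective effects (RR < 1), take the reciprocal first so the
    ratio points away from the null.
    """
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1))

# An observed RR of 2.0 gives an E-value of 2 + sqrt(2), about 3.41:
# confounding weaker than that on both arms cannot fully explain the effect.
ev = e_value(2.0)
```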
Collapse
|
45
|
JPEN Journal Club 55. Best- and Worst-Case Scenarios. JPEN J Parenter Enteral Nutr 2020; 45:212-214. [PMID: 32441785 DOI: 10.1002/jpen.1927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2020] [Accepted: 05/11/2020] [Indexed: 11/07/2022]
|
46
|
Design, analysis, power, and sample size calculation for three-phase interrupted time series analysis in evaluation of health policy interventions. J Eval Clin Pract 2020; 26:826-841. [PMID: 31429175 PMCID: PMC7028460 DOI: 10.1111/jep.13266] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/28/2019] [Revised: 08/01/2019] [Accepted: 08/06/2019] [Indexed: 01/07/2023]
Abstract
OBJECTIVE To discuss the study design and data analysis for three-phase interrupted time series (ITS) studies to evaluate the impact of health policy, systems, or environmental interventions. Simulation methods are used to conduct power and sample size calculations for these studies. METHODS We consider the design and analysis of three-phase ITS studies using a study funded by the National Institutes of Health as an exemplar. The design and analysis of both one-arm and two-arm three-phase ITS studies are introduced. RESULTS A simulation-based approach, with ready-to-use computer programs, was developed to determine the power for two types of three-phase ITS studies. Simulations were conducted to estimate the power of segmented autoregressive (AR) error models when autocorrelation ranged from -0.9 to 0.9 with various effect sizes. The power increased as the sample size or the effect size increased. The power to detect the same effect sizes varied widely, depending on whether level change, trend changes, or both were tested. CONCLUSION This article provides a convenient tool for investigators to generate sample sizes to ensure sufficient statistical power when a three-phase ITS study design is implemented.
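The simulate-analyze-count loop behind simulation-based ITS power can be sketched in a deliberately simplified form. The paper fits segmented autoregressive models; the toy version below only simulates AR(1) errors around a level change at the interruption and uses a naive comparison of segment means, so all names, settings, and the test itself are illustrative assumptions.

```python
import random
import statistics

def simulate_power(n_pre=24, n_post=24, level_change=1.0, sigma=1.0,
                   rho=0.3, n_sim=500, alpha_z=1.96, seed=1):
    """Estimate power for detecting a level change in a toy ITS.

    Simulates AR(1) noise e_t = rho * e_{t-1} + innovation, adds a step of
    size level_change after the interruption, applies a naive two-sample
    z-test on the segment means, and returns the rejection rate.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        e, series = 0.0, []
        for t in range(n_pre + n_post):
            e = rho * e + rng.gauss(0, sigma)
            series.append(e + (level_change if t >= n_pre else 0.0))
        pre, post = series[:n_pre], series[n_pre:]
        se = ((statistics.variance(pre) / n_pre)
              + (statistics.variance(post) / n_post)) ** 0.5
        z = (statistics.fmean(post) - statistics.fmean(pre)) / se
        if abs(z) > alpha_z:
            rejections += 1
    return rejections / n_sim

power = simulate_power(level_change=1.0)   # power under the assumed effect
size = simulate_power(level_change=0.0)    # empirical type I error (inflated by AR noise)
```

Note that ignoring the autocorrelation, as this naive test does, inflates the type I error; that is exactly why the paper fits segmented AR models rather than simple mean comparisons.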
Collapse
|
47
|
Abstract
The one-sample log-rank test allows one to compare the survival of a single sample with a prespecified reference survival curve. It naturally applies in single-arm phase IIa trials with a time-to-event endpoint. Several authors have described that the original one-sample log-rank test is conservative when the sample size is small and have proposed strategies to correct the conservativeness. Here, we propose an alternative approach to improve the one-sample log-rank test. Our new one-sample log-rank statistic is based on the unique transformation of the underlying counting process martingale such that the moments of the limiting normal distribution have no shared parameters. Simulation results show that the new one-sample log-rank test yields a type I error rate and power close to the nominal levels even when the sample size is small, while substantially reducing the sample size required to achieve the desired power, as compared to current approaches to designing studies that compare the survival outcome of a sample with a reference.
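For context, the conventional one-sample log-rank statistic that the abstract describes as conservative in small samples compares the observed number of events O with the expected number E under the reference cumulative hazard. A minimal sketch, assuming an exponential reference hazard for illustration (the paper's corrected statistic is not reproduced here):

```python
import math

def one_sample_logrank(times, events, lam):
    """Classic one-sample log-rank: Z = (O - E) / sqrt(E).

    times  : follow-up time for each subject
    events : 1 if the subject had the event, 0 if censored
    lam    : rate of the exponential reference, so Lambda0(t) = lam * t
             (an illustrative assumption; any reference cumulative
             hazard evaluated at each follow-up time would do)
    """
    O = sum(events)
    E = sum(lam * t for t in times)
    return (O - E) / math.sqrt(E)

# 3 subjects, 2 events, reference rate 0.5: O = 2, E = 0.5*(1+2+3) = 3
z = one_sample_logrank([1.0, 2.0, 3.0], [1, 1, 0], 0.5)
```

A large negative Z indicates fewer events than the reference predicts, i.e. better-than-reference survival.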
Collapse
|
48
|
Determination of hazard ratio for progression-free survival considering the tumor assessment schedule in sample size calculation. Pharm Stat 2020; 19:126-136. [PMID: 32067336 DOI: 10.1002/pst.1973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2018] [Revised: 08/21/2019] [Accepted: 09/09/2019] [Indexed: 11/12/2022]
Abstract
Progression-free survival is recognized as an important endpoint in oncology clinical trials. In clinical trials aimed at new drug development, the target population often comprises patients that are refractory to standard therapy with a tumor that shows rapid progression. This situation would increase the bias of the hazard ratio calculated for progression-free survival, resulting in decreased power in such trials. Therefore, new measures are needed to prevent this loss of power in advance when estimating the sample size. Here, I propose a novel calculation procedure to derive the assumed hazard ratio for progression-free survival using the Cox proportional hazards model, which can be applied in sample size calculation. The hazard ratios derived by the proposed procedure were almost identical to those obtained by simulation. The hazard ratio calculated by the proposed procedure is applicable to sample size calculation and coincides with the nominal power. Methods that compensate for the lack of power due to biases in the hazard ratio are also discussed from a practical point of view.
Collapse
|
49
|
SIMON: Simple methods for analyzing DNA methylation by targeted bisulfite next-generation sequencing. PLANT BIOTECHNOLOGY (TOKYO, JAPAN) 2019; 36:213-222. [PMID: 31983875 PMCID: PMC6978500 DOI: 10.5511/plantbiotechnology.19.0822a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/28/2019] [Accepted: 08/22/2019] [Indexed: 06/10/2023]
Abstract
DNA methylation in higher organisms has become an expanding field of study, as it often involves the regulation of gene expression. Although Whole Genome Bisulfite Sequencing (WG-BS) based on next-generation sequencing (NGS) is the most versatile method, it is costly and lacks in-depth analytic power. There are no conventional NGS-based methods that enable researchers to easily compare the level of DNA methylation across the practical number of samples handled in the laboratory. The targeted BS method based on Sanger sequencing is generally used in this case, but it also lacks in-depth analytic power. Therefore, we propose a new method that combines the high-throughput analytic power of NGS and bioinformatics with the specificity and focus offered by PCR-amplification-based bisulfite sequencing methods. We use in silico size sieving of DNA fragments and primer matching instead of whole-fragment alignment in our bioinformatics analyses, and named our method SIMON (Simple Inference for Methylome based On NGS). The results of our targeted NGS-based BS method (the SIMON method) show that small variations in DNA methylation patterns can be precisely and efficiently measured at single-nucleotide resolution. The SIMON method combines pre-existing techniques to provide a cost-effective approach for in-depth studies that focus on pre-identified loci. It offers significant improvements with regard to workflow and the quality of the acquired DNA methylation information. Because of the high accuracy of the analysis, small variations in DNA methylation levels can be precisely determined even with large numbers of samples and loci.
Collapse
|
50
|
Design and analysis considerations for cohort stepped wedge cluster randomized trials with a decay correlation structure. Stat Med 2019; 39:438-455. [PMID: 31797438 DOI: 10.1002/sim.8415] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2019] [Revised: 08/21/2019] [Accepted: 10/07/2019] [Indexed: 01/08/2023]
Abstract
A stepped wedge cluster randomized trial is a type of longitudinal cluster design that sequentially switches clusters to intervention over time until all clusters are treated. While the traditional posttest-only parallel design requires adjustment for a single intraclass correlation coefficient, the stepped wedge design allows multiple outcome measurements from the same cluster and so additional correlation parameters are necessary to characterize the within-cluster correlation structure. Although a number of studies have differentiated between the concepts of within-period and between-period correlations, only a few studies have allowed the between-period correlation to decay over time. In this article, we consider the proportional decay correlation structure for a cohort stepped wedge design, and provide a matrix-adjusted quasi-least squares approach to accurately estimate the correlation parameters along with the marginal intervention effect. We further develop the sample size and power procedures accounting for the correlation decay, and investigate the accuracy of the power procedure with continuous outcomes in a simulation study. We show that the empirical power agrees well with the prediction even with as few as nine clusters, when data are analyzed with matrix-adjusted quasi-least squares concurrently with a suitable bias-corrected sandwich variance. Two trial examples are provided to illustrate the new sample size procedure.
Collapse
|