How large should the next study be? Predictive power and sample size requirements for replication studies.
Stat Med 2022; 41:3090-3101. [PMID: 35396714 PMCID: PMC9325423 DOI: 10.1002/sim.9406]
[Received: 10/24/2021] [Revised: 03/21/2022] [Accepted: 03/24/2022] [Indexed: 12/13/2022]
Abstract
We use information derived from over 40K trials in the Cochrane Collaboration database of systematic reviews (CDSR) to compute the replication probability, or predictive power, of an experiment given its observed (two‐sided) P‐value. We find that an exact replication of a marginally significant result with P=.05 has a less than 30% chance of again reaching significance. Moreover, the replication of a result with P=.005 still has only a 50% chance of significance. We also compute the probability that the direction (sign) of the estimated effect is correct, which is closely related to the type S error of Gelman and Tuerlinckx. We find that if an estimated effect has P=.05, there is a 93% probability that its sign is correct. If P=.005, then that probability is 99%. Finally, we compute the required sample size for a replication study to achieve some specified power conditional on the P‐value of the original study. We find that the replication of a result with P=.05 requires a sample size more than 16 times larger than the original study to achieve 80% power, while P=.005 requires a sample size at least 3.5 times larger. These findings confirm that failure to replicate the statistical significance of a trial does not necessarily indicate that the original result was a fluke.
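The quantities named in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' method: the paper estimates a prior from the Cochrane data, whereas this sketch assumes a single zero-mean normal prior on the signal-to-noise ratio with a hypothetical standard deviation `tau` (with `tau` infinite giving a flat prior). It also counts a replication as significant in either direction, which is an assumption. All function and parameter names are illustrative.

```python
from math import inf, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def z_from_p(p_two_sided):
    """Absolute z-statistic corresponding to a two-sided P-value."""
    return STD_NORMAL.inv_cdf(1 - p_two_sided / 2)


def shrinkage(tau):
    """Shrinkage factor for an assumed N(0, tau^2) prior on the
    signal-to-noise ratio (SNR); tau = inf is the flat-prior limit."""
    if tau == inf:
        return 1.0
    return tau**2 / (tau**2 + 1)


def replication_power(p, tau=inf, k=1.0, alpha=0.05):
    """Probability that a replication with k times the original sample
    size is significant at level alpha (either direction), given the
    original two-sided P-value p.

    Assumed model: SNR ~ N(0, tau^2) and z | SNR ~ N(SNR, 1), so that
    SNR | z ~ N(s*z, s) and z_rep | z ~ N(sqrt(k)*s*z, k*s + 1).
    """
    z = z_from_p(p)
    s = shrinkage(tau)
    mean = sqrt(k) * s * z
    sd = sqrt(k * s + 1)
    crit = z_from_p(alpha)
    upper = STD_NORMAL.cdf((mean - crit) / sd)   # significant, same sign
    lower = STD_NORMAL.cdf((-mean - crit) / sd)  # significant, opposite sign
    return upper + lower


def prob_sign_correct(p, tau=inf):
    """Posterior probability that the estimated effect has the right sign."""
    z = z_from_p(p)
    s = shrinkage(tau)
    return STD_NORMAL.cdf(sqrt(s) * z)


if __name__ == "__main__":
    for p in (0.05, 0.005):
        print(p, replication_power(p), prob_sign_correct(p))
```

Under the flat prior, a result with P=.05 yields a replication probability of roughly 50% and a sign probability of about 97.5%. Both exceed the abstract's figures (less than 30% and 93%), which is what one would expect if the empirically estimated prior shrinks the observed effect toward zero. Setting a finite `tau` in the sketch lowers both quantities in the same direction, although the specific values depend entirely on that hypothetical choice.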