Kulinskaya E, Hoaglin DC. On the Q statistic with constant weights in meta-analysis of binary outcomes.
BMC Med Res Methodol 2023;23:146. [PMID: 37344771 DOI: 10.1186/s12874-023-01939-z]
Received: 09/30/2022; Accepted: 05/05/2023; Indexed: 06/23/2023
Abstract
BACKGROUND
Cochran's Q statistic is routinely used for testing heterogeneity in meta-analysis. Its expected value (under an incorrect null distribution) is part of several popular estimators of the between-study variance, $\tau^2$. Those applications generally do not account for the use of the studies' estimated variances in the inverse-variance weights that define Q (more explicitly, $Q_{IV}$). Importantly, those weights make approximating the distribution of $Q_{IV}$ rather complicated.
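To make the background concrete, here is a minimal sketch of $Q_{IV}$ for the log-odds-ratio and of one popular moment estimator of $\tau^2$ that plugs in the nominal expected value $K - 1$ of Q (the DerSimonian-Laird estimator). The helper names, the 0.5 continuity correction applied only when a cell is empty, and the 2x2 tables are hypothetical choices for illustration. Note that the weights are computed from the estimated variances, which is exactly the feature the abstract flags as complicating the distribution of $Q_{IV}$.

```python
import math

def log_odds_ratio(a, b, c, d, cc=0.5):
    """LOR and its estimated variance from one 2x2 table.

    a, b: events / non-events in the treatment arm
    c, d: events / non-events in the control arm
    cc:   continuity correction, added to every cell only if some cell is 0
          (an assumed convention, not taken from the paper)
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    y = math.log(a * d / (b * c))
    v = 1 / a + 1 / b + 1 / c + 1 / d
    return y, v

def cochran_q(y, w):
    """Q = sum_i w_i (y_i - ybar_w)^2, with ybar_w the w-weighted mean."""
    W = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / W
    return sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))

def dersimonian_laird_tau2(y, v):
    """Moment estimator: equate Q_IV to its nominal mean K - 1, truncate at 0."""
    w = [1 / vi for vi in v]          # inverse of the *estimated* variances
    q = cochran_q(y, w)
    W1, W2 = sum(w), sum(wi ** 2 for wi in w)
    return max(0.0, (q - (len(y) - 1)) / (W1 - W2 / W1))

# Hypothetical tables (a, b, c, d), for illustration only
tables = [(12, 38, 8, 42), (5, 20, 9, 16), (30, 70, 22, 78)]
stats = [log_odds_ratio(*t) for t in tables]
y = [s[0] for s in stats]
v = [s[1] for s in stats]
q_iv = cochran_q(y, [1 / vi for vi in v])
tau2 = dersimonian_laird_tau2(y, v)
print(f"Q_IV = {q_iv:.3f}, tau^2 (DL) = {tau2:.4f}")
```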
METHODS
As an alternative, we investigate a Q statistic, $Q_F$, whose constant weights use only the studies' arm-level sample sizes. For the log-odds-ratio (LOR), log-relative-risk (LRR), and risk difference (RD) as measures of effect, we use simulation to study approximations to the distributions of $Q_F$ and $Q_{IV}$ as the basis for tests of heterogeneity.
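The abstract states only that the constant weights of $Q_F$ depend on arm-level sample sizes. The sketch below assumes the effective sample size $\tilde n_i = n_{iT} n_{iC} / (n_{iT} + n_{iC})$ as the weight; that choice, the 0.5 continuity correction, and the function names are our assumptions, not the paper's exact definitions. Each study is given as `(xT, nT, xC, nC)`: events and arm size for treatment, then control.

```python
import math

def effect(measure, xT, nT, xC, nC):
    """Point estimate for one study; Q_F needs no variance estimates."""
    if measure == "RD":
        return xT / nT - xC / nC
    # Assumed convention: add 0.5 to every cell only when some cell is empty
    if 0 in (xT, nT - xT, xC, nC - xC):
        xT, xC, nT, nC = xT + 0.5, xC + 0.5, nT + 1.0, nC + 1.0
    pT, pC = xT / nT, xC / nC
    if measure == "LOR":
        return math.log(pT / (1 - pT)) - math.log(pC / (1 - pC))
    if measure == "LRR":
        return math.log(pT / pC)
    raise ValueError(f"unknown measure: {measure}")

def q_constant_weights(studies, measure):
    """Q_F = sum_i w_i (y_i - ybar_w)^2 with assumed w_i = nT*nC/(nT+nC)."""
    y = [effect(measure, *s) for s in studies]
    # Weights use the original arm sizes, so they are fixed by the design
    w = [nT * nC / (nT + nC) for _, nT, _, nC in studies]
    W = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / W
    return sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))

studies = [(10, 50, 5, 50), (20, 50, 5, 50), (3, 25, 0, 25)]
for m in ("LOR", "LRR", "RD"):
    print(m, round(q_constant_weights(studies, m), 4))
```

Because the weights depend only on $n_{iT}$ and $n_{iC}$, $Q_F$ is a quadratic form in the $y_i$ with fixed coefficients, whereas the weights in $Q_{IV}$ are themselves random.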
RESULTS
The results show that, for LOR and LRR, a two-moment gamma approximation to the distribution of $Q_F$ works well for small sample sizes, and an approximation based on an algorithm of Farebrother is recommended for larger sample sizes. For RD, the Farebrother approximation works very well, even for small sample sizes. For $Q_{IV}$, the standard chi-square approximation provides levels that are much too low for LOR and LRR and too high for RD. The Kulinskaya et al. (Res Synth Methods 2:254-70, 2011) approximation for RD and the Kulinskaya and Dollinger (BMC Med Res Methodol 15:49, 2015) approximation for LOR work well for $Q_{IV}$ but have some convergence issues for very small sample sizes combined with small probabilities.
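The two simpler approximations named above can be sketched under a simplifying assumption of our own: treat each $y_i$ as normal with mean $\theta$ and variance $\sigma_i^2$ (plugged in from estimates). Then $Q_F = y^\top A y$ with $A = \mathrm{diag}(w) - w w^\top / W$, and since $A\mathbf{1} = 0$, its mean is $\mathrm{tr}(A\Sigma)$ and its variance is $2\,\mathrm{tr}(A\Sigma A\Sigma)$. The two-moment gamma approximation matches these; the standard approximation refers $Q_{IV}$ to $\chi^2_{K-1}$. The paper's own moment expressions for LOR, LRR, and RD may include corrections not shown here, and the Farebrother algorithm is not reproduced. The incomplete-gamma routine follows the usual series / continued-fraction scheme.

```python
import math

def upper_reg_gamma(a, x):
    """Regularized upper incomplete gamma Q(a, x) = P(Gamma(shape=a, scale=1) > x)."""
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); return its complement
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz continued fraction for Q(a, x)
    tiny = 1e-300
    b = x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h

def chisq_pvalue(q, df):
    """Standard approximation: refer Q to a chi-square with df = K - 1."""
    return upper_reg_gamma(df / 2, q / 2)

def gamma_pvalue(q, w, var):
    """Two-moment gamma approximation for Q = y'Ay, assuming y_i ~ N(theta, var_i)."""
    k, W = len(w), sum(w)
    # B = A @ Sigma with A = diag(w) - w w'/W and Sigma = diag(var)
    B = [[((w[i] if i == j else 0.0) - w[i] * w[j] / W) * var[j] for j in range(k)]
         for i in range(k)]
    mean = sum(B[i][i] for i in range(k))
    variance = 2 * sum(B[i][j] * B[j][i] for i in range(k) for j in range(k))
    shape, scale = mean ** 2 / variance, variance / mean
    return upper_reg_gamma(shape, q / scale)
```

As a check on the design: with $w_i = 1/\sigma_i^2$ (inverse-variance weights with *known* variances), $A\Sigma$ is a projection of rank $K - 1$, so the gamma moments reduce to shape $(K-1)/2$ and scale 2 and the two p-values coincide. The approximations differ when the weights are constants unrelated to $1/\sigma_i^2$, as for $Q_F$.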
CONCLUSIONS
The performance of the standard $\chi^2$ approximation to the distribution of $Q_{IV}$ is inadequate for all three binary effect measures. Instead, we recommend a test of heterogeneity based on $Q_F$ and provide practical guidelines for choosing an appropriate test at the .05 level for all three effect measures.
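The conclusion about the standard $\chi^2$ approximation rests on simulated levels. As a rough sketch of how one such check can be set up (our own simplified design, not the paper's simulation protocol), the code below draws K studies with a common event probability p in both arms, so there is no heterogeneity, computes $Q_{IV}$ for LOR, and records how often the $\chi^2_{K-1}$ test rejects at $\alpha = .05$. To stay within the standard library it uses the closed-form chi-square tail for even degrees of freedom, so K must be odd; the sample size, probability, seed, and 0.5 continuity correction are illustrative assumptions.

```python
import math
import random

def chisq_sf_even_df(x, df):
    """P(chi2_df > x); this closed form is valid only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even df")
    h = x / 2
    term = total = 1.0
    for j in range(1, df // 2):
        term *= h / j
        total += term
    return math.exp(-h) * total

def q_iv_lor(tables):
    """Q_IV for LOR; 0.5 is added to all cells of a table only if a cell is 0."""
    y, w = [], []
    for a, b, c, d in tables:
        if 0 in (a, b, c, d):
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        y.append(math.log(a * d / (b * c)))
        w.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))
    W = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / W
    return sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))

def empirical_level(k, n, p, reps=2000, alpha=0.05, seed=1):
    """Share of null replicates in which the chi-square test for Q_IV rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        tables = []
        for _ in range(k):
            xT = sum(rng.random() < p for _ in range(n))  # Binomial(n, p) draw
            xC = sum(rng.random() < p for _ in range(n))
            tables.append((xT, n - xT, xC, n - xC))
        if chisq_sf_even_df(q_iv_lor(tables), k - 1) < alpha:
            rejections += 1
    return rejections / reps

print(empirical_level(k=5, n=20, p=0.1))
```

Comparing the printed rejection rate with the nominal 0.05 over a grid of n and p is the kind of evidence the level statements summarize.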