Wiley K, Berger P, Friehs MA, Mandryk RL. Measuring the Reliability of a Gamified Stroop Task: Quantitative Experiment. JMIR Serious Games 2024;12:e50315. PMID: 38598265; PMCID: PMC11043929; DOI: 10.2196/50315
Abstract
BACKGROUND
Few gamified cognitive tasks are subjected to rigorous examination of psychometric properties, despite their use in experimental and clinical settings. Even small manipulations to cognitive tasks require extensive research to understand their effects.
OBJECTIVE
This study aims to investigate how game elements can affect the reliability of scores on a Stroop task, specifically performance consistency within and across sessions.
METHODS
We created 2 versions of the Stroop task, with and without game elements, and then tested each task with participants at 2 time points. The gamified task used points and feedback as game elements. In this paper, we report on the reliability of the gamified Stroop task in terms of internal consistency and test-retest reliability, compared with the control task. We used a permutation approach to evaluate internal consistency. For test-retest reliability, we calculated the Pearson correlation and intraclass correlation coefficients between each time point. We also descriptively compared the reliability of scores on a trial-by-trial basis, considering the different trial types.
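The abstract does not include analysis code; the following is a minimal Python sketch of one way a permutation-based split-half estimate with Spearman-Brown correction and a test-retest Pearson correlation could be computed. The function names, the simulated data, and the 5,000-permutation setting are illustrative assumptions, not the authors' implementation.

import numpy as np

def permutation_split_half(scores, n_permutations=5000, seed=0):
    # Permutation-based split-half reliability with Spearman-Brown correction.
    # scores: participants x trials array of trial-level values
    # (e.g., reaction times or 0/1 error indicators).
    rng = np.random.default_rng(seed)
    n_trials = scores.shape[1]
    estimates = np.empty(n_permutations)
    for i in range(n_permutations):
        order = rng.permutation(n_trials)
        half_a = scores[:, order[:n_trials // 2]].mean(axis=1)
        half_b = scores[:, order[n_trials // 2:]].mean(axis=1)
        r = np.corrcoef(half_a, half_b)[0, 1]
        estimates[i] = 2 * r / (1 + r)  # Spearman-Brown step-up
    return estimates.mean(), np.percentile(estimates, [2.5, 97.5])

def test_retest_pearson(time1_scores, time2_scores):
    # Pearson correlation between participants' summary scores at the 2 sessions.
    return np.corrcoef(time1_scores, time2_scores)[0, 1]

# Usage with simulated data: 40 participants x 120 trials of reaction times,
# with a participant-level offset so the split halves share systematic variance.
rng = np.random.default_rng(1)
person_means = rng.normal(600, 60, size=(40, 1))
rts = person_means + rng.normal(0, 80, size=(40, 120))
sb_mean, sb_ci = permutation_split_half(rts)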
RESULTS
At the first time point, the Stroop effect was reduced in the game condition, indicating improved performance: participants in the game condition had faster reaction times (P=.005) and lower error rates (P=.04) than those in the basic task condition. The game condition also yielded higher internal consistency at both time points for reaction times and error rates, indicating a more consistent response pattern. For reaction times, the Spearman-Brown coefficient in the basic task condition was r=0.78 (95% CI 0.64-0.89) at time 1 and r=0.64 (95% CI 0.40-0.81) at time 2; in the game condition, it was r=0.83 (95% CI 0.71-0.91) at time 1 and r=0.76 (95% CI 0.60-0.88) at time 2. For error rates, the basic task condition yielded r=0.76 (95% CI 0.62-0.87) at time 1 and r=0.74 (95% CI 0.58-0.86) at time 2; the game condition yielded r=0.76 (95% CI 0.62-0.87) at time 1 and r=0.74 (95% CI 0.58-0.86) at time 2. Test-retest reliability analysis revealed a distinctive performance pattern depending on trial type, which may reflect motivational differences between task versions. In particular, on incongruent trials, where cognitive conflict occurs, performance consistency in the game condition peaks after 100 trials, whereas consistency in the basic version drops after 50 trials and only catches up to the game version after 250 trials.
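For reference (the standard definition, not stated in the abstract), the reported Spearman-Brown coefficients are split-half correlations r_half stepped up as

\[ r_{\text{Spearman-Brown}} = \frac{2\,r_{\text{half}}}{1 + r_{\text{half}}} \]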
CONCLUSIONS
Even subtle gamification can affect task performance, and not only as a direct difference in performance between conditions: people playing the game reach peak performance sooner, and their performance is more consistent within and across sessions. We advocate for a closer examination of the impact of game elements on performance.