1
Brockbank E, Vul E. Repeated rock, paper, scissors play reveals limits in adaptive sequential behavior. Cogn Psychol 2024; 151:101654. [PMID: 38657419 DOI: 10.1016/j.cogpsych.2024.101654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 03/30/2024] [Accepted: 04/08/2024] [Indexed: 04/26/2024]
Abstract
How do people adapt to others in adversarial settings? Prior work has shown that people often violate rational models of adversarial decision-making in repeated interactions. In particular, in mixed strategy equilibrium (MSE) games, where optimal action selection entails choosing moves randomly, people often do not play randomly, but instead try to outwit their opponents. However, little is known about the adaptive reasoning that underlies these deviations from random behavior. Here, we examine strategic decision-making across repeated rounds of rock, paper, scissors, a well-known MSE game. In experiment 1, participants were paired with bot opponents that exhibited distinct stable move patterns, allowing us to identify the bounds of the complexity of opponent behavior that people can detect and adapt to. In experiment 2, bot opponents instead exploited stable patterns in the human participants' moves, providing a symmetrical bound on the complexity of patterns people can revise in their own behavior. Across both experiments, people exhibited a robust and flexible attention to transition patterns from one move to the next, exploiting these patterns in opponents and modifying them strategically in their own moves. However, their adaptive reasoning showed strong limitations with respect to more sophisticated patterns. Together, results provide a precise and consistent account of the surprisingly limited scope of people's adaptive decision-making in this setting.
Affiliation(s)
- Edward Vul
- University of California San Diego, United States of America
2
Zhang Y, Huynh TKT, Dyson BJ. Deliberately making miskates: Behavioural consistency under win maximization and loss maximization conditions. NPJ SCIENCE OF LEARNING 2023; 8:55. [PMID: 38057350 DOI: 10.1038/s41539-023-00206-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Accepted: 11/17/2023] [Indexed: 12/08/2023]
Abstract
We argue that the feedback traditionally used to indicate negative outcomes causes future detrimental performance because of the default goal of win maximization. In gaming paradigms where participants intentionally performed as well (win maximization) and as poorly (loss maximization) as possible, we showed a double dissociation where actions following wins were more consistent during win maximization, but actions following losses were more consistent during loss maximization. This broader distinction between goal-congruent and goal-incongruent feedback suggests that individuals are able to flexibly redefine their definition of 'success', and provide a reconsideration of the way we think about 'losing'.
Affiliation(s)
- Benjamin James Dyson
- University of Alberta, Edmonton, Canada.
- Toronto Metropolitan University, Toronto, Canada.
3
Chen Z, Doekemeijer RA, Noël X, Verbruggen F. Winning and losing in online gambling: Effects on within-session chasing. PLoS One 2022; 17:e0273359. [PMID: 35981088 PMCID: PMC9387854 DOI: 10.1371/journal.pone.0273359] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/09/2021] [Accepted: 08/08/2022] [Indexed: 11/21/2022] Open
Abstract
The tendency to continue or intensify gambling after losing (loss-chasing) is widely regarded as a defining feature of gambling disorder. However, loss-chasing in real gambling contexts is multifaceted, and some aspects are better understood than others. Gamblers may chase losses between multiple sessions or within a single session. Furthermore, within a session, loss-chasing can be expressed in the decision of (1) when to stop, (2) how much stake to bet, and (3) the speed of play after winning and losing. Using a large player-tracking data set (>2500 players, >10 million rounds) collected from the online commercial game Mystery Arena, we examined these three behavioral expressions of within-session loss-chasing. While the first two aspects (when to stop and how much stake to bet) have been examined previously, the current research is the first large-scale study to examine the effects of wins and losses on the speed of play in real gambling. The players were additionally assigned different involvement levels by the operator based on their gambling behavior on the operator’s own platform, which further allowed us to examine group differences in loss-chasing. We found that after winning, both the high- and low-involvement groups were less likely to stop, and increased the stake amount, thus showing win-chasing instead of loss-chasing in these two facets. After losing, both groups played more quickly though, which may reflect an urge to continue gambling (as an expression of loss-chasing). Wins and losses had a smaller influence on the speed of play for the high-involvement players, suggesting that they might have reduced sensitivity to wins and/or losses. Future work can further examine chasing in different gambling products and in people with gambling problems to assess the generalizability of these findings.
Affiliation(s)
- Zhang Chen
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Xavier Noël
- Laboratoire de Psychologie Médicale et d’Addictologie, Faculté de Médecine, Université Libre de Bruxelles, Brussels, Belgium
4
Dahal R, MacLellan K, Vavrek D, Dyson BJ. Assessing behavioural profiles following neutral, positive and negative feedback. PLoS One 2022; 17:e0270475. [PMID: 35788745 PMCID: PMC9255737 DOI: 10.1371/journal.pone.0270475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Accepted: 06/10/2022] [Indexed: 12/02/2022] Open
Abstract
Previous data suggest zero-value, neutral outcomes (draw) are subjectively assigned negative rather than positive valence. The combined observations of faster rather than slower reaction times, subsequent actions defined by shift rather than stay behaviour, reduced flexibility, and, larger rather than smaller deviations from optimal performance following draws all align with the consequences of explicitly negative outcomes such as losses. We further tested the relationships between neutral, positive and negative outcomes by manipulating value salience and observing their behavioural profiles. Despite speeded reaction times and a non-significant bias towards shift behaviour similar to losses when draws were assigned the value of 0 (Experiment 1), the degree of shift behaviour approached an approximation of optimal performance when the draw value was explicitly positive (+1). This was in contrast to when the draw value was explicitly negative (-1), which led to a significant increase in the degree of shift behaviour (Experiment 2). Similar modifications were absent when the same value manipulations were applied to win or lose trials (Experiment 3). Rather than viewing draws as neutral and valence-free outcomes, the processing cascade generated by draws produces a complex behavioural profile containing elements found in response to both explicitly positive and explicitly negative results.
Affiliation(s)
- Benjamin James Dyson
- University of Alberta, Edmonton, Canada
- University of Sussex, Brighton, United Kingdom
- Toronto Metropolitan University, Toronto, Canada
5
Sundvall J, Dyson BJ. Breaking the bonds of reinforcement: Effects of trial outcome, rule consistency and rule complexity against exploitable and unexploitable opponents. PLoS One 2022; 17:e0262249. [PMID: 35108279 PMCID: PMC8809577 DOI: 10.1371/journal.pone.0262249] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/11/2020] [Accepted: 12/21/2021] [Indexed: 11/18/2022] Open
Abstract
In two experiments, we used the simple zero-sum game Rock, Paper and Scissors to study the common reinforcement-based rules of repeating choices after winning (win-stay) and shifting from previous choice options after losing (lose-shift). Participants played the game against both computer opponents who could not be exploited and computer opponents who could be exploited by making choices that would at times conflict with reinforcement. Against unexploitable opponents, participants achieved an approximation of random behavior, contrary to previous research commonly finding reinforcement biases. Against exploitable opponents, the participants learned to exploit the opponent regardless of whether optimal choices conflicted with reinforcement or not. The data suggest that learning a rule that allows one to exploit was largely determined by the outcome of the previous trial.
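As a rough illustration of the win-stay and lose-shift rules described in this abstract, the sketch below computes how often a player repeats a move after a win and changes move after a loss. The function names (`outcome`, `stay_shift_rates`) and the single-letter move coding are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence, Tuple

# Hypothetical move coding: each key beats its value.
BEATS = {"R": "S", "P": "R", "S": "P"}


def outcome(player: str, opponent: str) -> str:
    """Return 'win', 'loss', or 'draw' from the player's point of view."""
    if player == opponent:
        return "draw"
    return "win" if BEATS[player] == opponent else "loss"


def stay_shift_rates(
    player_moves: Sequence[str], opponent_moves: Sequence[str]
) -> Tuple[float, float]:
    """Proportion of win-stay and lose-shift responses across a session."""
    win_trials = win_stays = loss_trials = loss_shifts = 0
    # The final trial has no following move, so it is not scored.
    for t in range(len(player_moves) - 1):
        result = outcome(player_moves[t], opponent_moves[t])
        stayed = player_moves[t + 1] == player_moves[t]
        if result == "win":
            win_trials += 1
            win_stays += int(stayed)
        elif result == "loss":
            loss_trials += 1
            loss_shifts += int(not stayed)
    # NaN marks a rate that is undefined because no such trials occurred.
    win_stay = win_stays / win_trials if win_trials else float("nan")
    lose_shift = loss_shifts / loss_trials if loss_trials else float("nan")
    return win_stay, lose_shift
```

Under uniformly random play with three moves, a player would be expected to stay about one third of the time and shift about two thirds of the time regardless of outcome, so rates well away from those values would suggest a reinforcement bias.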
Affiliation(s)
- Benjamin James Dyson
- University of Alberta, Alberta, Canada
- University of Sussex, Sussex, United Kingdom
- Ryerson University, Toronto, Canada
6
Abstract
In simple dyadic games such as rock, paper, scissors (RPS), people exhibit peculiar sequential dependencies across repeated interactions with a stable opponent. These regularities seem to arise from a mutually adversarial process of trying to outwit their opponent. What underlies this process, and what are its limits? Here, we offer a novel framework for formally describing and quantifying human adversarial reasoning in the rock, paper, scissors game. We first show that this framework enables a precise characterization of the complexity of patterned behaviors that people exhibit themselves, and appear to exploit in others. This combination allows for a quantitative understanding of human opponent modeling abilities. We apply these tools to an experiment in which people played 300 rounds of RPS in stable dyads. We find that although people exhibit very complex move dependencies, they cannot exploit these dependencies in their opponents, indicating a fundamental limitation in people’s capacity for adversarial reasoning. Taken together, the results presented here show how the rock, paper, scissors game allows for precise formalization of human adaptive reasoning abilities.
7
Champ versus Chump: Viewing an Opponent’s Face Engages Attention but Not Reward Systems. GAMES 2021. [DOI: 10.3390/g12030062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
When we play competitive games, the opponents that we face act as predictors of the outcome of the game. For instance, if you are an average chess player and you face a Grandmaster, you anticipate a loss. Framed in a reinforcement learning perspective, our opponents can be thought of as predictors of rewards and punishments. The present study investigates whether facing an opponent would be processed as a reward or punishment depending on the level of difficulty the opponent poses. Participants played Rock, Paper, Scissors against three computer opponents while electroencephalographic (EEG) data was recorded. In a key manipulation, one opponent (HARD) was programmed to win most often, another (EASY) was made to lose most often, and the third (AVERAGE) had equiprobable outcomes of wins, losses, and ties. Through practice, participants learned to anticipate the relative challenge of a game based on the opponent they were facing that round. An analysis of our EEG data revealed that winning outcomes elicited a reward positivity relative to losing outcomes. Interestingly, our analysis of the predictive cues (i.e., the opponents’ faces) demonstrated that attentional engagement (P3a) was contextually sensitive to anticipated game difficulty. As such, our results for the predictive cue are contrary to what one might expect for a reinforcement model associated with predicted reward, but rather demonstrate that the neural response to the predictive cue was encoding the level of engagement with the opponent as opposed to value relative to the anticipated outcome.
8
Abstract
This research studied the strategies that players use in sequential adversarial games. We took the Rock-Paper-Scissors (RPS) game as an example and ran players in two experiments. The first experiment involved two humans, who played RPS together 100 times. Importantly, our payoff design in the RPS allowed us to differentiate participants who used a random strategy from those who used a Nash strategy. We found that participants did not play in agreement with the Nash strategy, but rather, their behavior was closer to random. Moreover, the analyses of the participants’ sequential actions indicated heterogeneous cycle-based behaviors: some participants’ actions were independent of their past outcomes, some followed a well-known win-stay/lose-change strategy, and others exhibited the win-change/lose-stay behavior. To understand the sequential patterns of outcome-dependent actions, we designed probabilistic computer algorithms involving specific change actions (i.e., to downgrade or upgrade according to the immediate past outcome): the Win-Downgrade/Lose-Stay (WDLS) or Win-Stay/Lose-Upgrade (WSLU) strategies. Experiment 2 used these strategies against a human player. Our findings show that participants followed a win-stay strategy against the WDLS algorithm and a lose-change strategy against the WSLU algorithm, while they had difficulty in using an upgrade/downgrade direction, suggesting humans’ limited ability to detect and counter the actions of the algorithm. Taken together, our two experiments showed a large diversity of sequential strategies, where the win-stay/lose-change strategy did not describe the majority of human players’ dynamic behaviors in this adversarial situation.
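The WDLS and WSLU opponents described in this abstract could be sketched roughly as follows. Several details are assumptions made for illustration and are not stated in the abstract: "upgrade" is read as switching to the move that beats the previous move, "downgrade" as switching to the move that loses to it, draws are handled by a uniformly random choice, and the hypothetical `p_follow` parameter stands in for the probabilistic nature of the algorithms.

```python
import random
from typing import Optional

# Ordered so that each move is beaten by the next one (R < P < S < R).
MOVES = ["R", "P", "S"]


def upgrade(move: str) -> str:
    """Assumed meaning: switch to the move that beats `move`."""
    return MOVES[(MOVES.index(move) + 1) % 3]


def downgrade(move: str) -> str:
    """Assumed meaning: switch to the move that loses to `move`."""
    return MOVES[(MOVES.index(move) - 1) % 3]


def stay(move: str) -> str:
    return move


# Outcome-dependent rules for the two strategies named in the abstract.
RULES = {
    "WDLS": {"win": downgrade, "loss": stay},
    "WSLU": {"win": stay, "loss": upgrade},
}


def next_move(
    strategy: str,
    last_move: str,
    last_outcome: str,
    p_follow: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the bot's next move from its own previous move and outcome."""
    if rng is None:
        rng = random.Random()
    rule = RULES[strategy].get(last_outcome)
    # Assumption: after a draw, or with probability 1 - p_follow,
    # the bot falls back to a uniformly random move.
    if rule is None or rng.random() >= p_follow:
        return rng.choice(MOVES)
    return rule(last_move)
```

With `p_follow=1.0` the bot is fully deterministic after wins and losses, which makes it maximally exploitable; lowering `p_follow` mixes in random moves and makes the pattern harder to detect.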
9
Dyson BJ. Variability in competitive decision-making speed and quality against exploiting and exploitative opponents. Sci Rep 2021; 11:2859. [PMID: 33536472 PMCID: PMC7859242 DOI: 10.1038/s41598-021-82269-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/14/2020] [Accepted: 01/18/2021] [Indexed: 12/02/2022] Open
Abstract
A presumption in previous work has been that sub-optimality in competitive performance following loss is the result of a reduction in decision-making time (i.e., post-error speeding). The main goal of this paper is to test the relationship between decision-making speed and quality, with the hypothesis that slowing down decision-making should increase the likelihood of successful performance in cases where a model of opponent domination can be implemented. Across Experiments 1–3, the speed and quality of competitive decision-making was examined in a zero-sum game as a function of the nature of the opponent (unexploitable, exploiting, exploitable). Performance was also examined against the nature of a credit (or token) system used as a within-experimental manipulation (no credit, fixed credit, variable credit). To complement reaction time variation as a function of outcome, both the fixed credit and variable credit conditions were designed to slow down decision-making, relative to a no credit condition where the game could be played in quick succession and without interruption. The data confirmed that (a) self-imposed reductions in processing time following losses (post-error speeding) were causal factors in determining poorer-quality behaviour, (b) the expression of lose-shift was less flexible than the expression of win-stay, and, (c) the use of a variable credit system may enhance the perceived control participants have against exploitable opponents. Future work should seek to disentangle temporal delay and response interruption as determinants of decision-making quality against numerous styles of opponency.
Affiliation(s)
- Benjamin James Dyson
- Department of Psychology, University of Alberta, P-217 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada; Ryerson University, Toronto, Canada; University of Sussex, Brighton, UK.
10
Switching Competitors Reduces Win-Stay but Not Lose-Shift Behaviour: The Role of Outcome-Action Association Strength on Reinforcement Learning. GAMES 2020. [DOI: 10.3390/g11030025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
Predictability is a hallmark of poor-quality decision-making during competition. One source of predictability is the strong association between current outcome and future action, as dictated by the reinforcement learning principles of win–stay and lose–shift. We tested the idea that predictability could be reduced during competition by weakening the associations between outcome and action. To do this, participants completed a competitive zero-sum game in which the opponent from the current trial was either replayed (opponent repeat) thereby strengthening the association, or, replaced (opponent change) by a different competitor thereby weakening the association. We observed that win–stay behavior was reduced during opponent change trials but lose–shift behavior remained reliably predictable. Consistent with the group data, the number of individuals who exhibited predictable behavior following wins decreased for opponent change relative to opponent repeat trials. Our data show that future actions are more under internal control following positive relative to negative outcomes, and that externally breaking the bonds between outcome and action via opponent association also allows us to become less prone to exploitation.
11
Dyson BJ, Musgrave C, Rowe C, Sandhur R. Behavioural and neural interactions between objective and subjective performance in a Matching Pennies game. Int J Psychophysiol 2019; 147:128-136. [PMID: 31730790 DOI: 10.1016/j.ijpsycho.2019.11.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/29/2019] [Revised: 11/05/2019] [Accepted: 11/07/2019] [Indexed: 02/06/2023]
Abstract
To examine the behavioural and neural interactions between objective and subjective performance during competitive decision-making, participants completed a Matching Pennies game where win-rates were fixed within three conditions (win > lose, win = lose, win < lose) and outcomes were predicted at each trial. Using random behaviour as the hallmark of optimal performance, we observed item (heads), contingency (win-stay, lose-shift) and combinatorial (HH, HT, TH, TT) biases across all conditions. Higher-quality behaviour represented by a reduction in combinatorial bias was observed during high win-rate exposure. In contrast, over-optimism biases were observed only in conditions where win rates were equal to, or less than, loss rates. At a group level, a neural measure of outcome evaluation (feedback-related negativity; FRN) indexed the binary distinction between positive and negative outcome. At an individual level, increased belief in successful performance accentuated FRN amplitude differences between wins and losses. Taken together, the data suggest that objective experiences of, or, subjective beliefs in, the predominance of positive outcomes may be mutual attempts to self-regulate performance during competition. In this way, increased exposure to positive outcomes (real or imagined) may help to weight the output of the more diligent and analytic System 2, relative to the impulsive and intuitive System 1.
Affiliation(s)
- Benjamin James Dyson
- University of Alberta, Canada; University of Sussex, UK; Ryerson University, Canada.
12
Dyson BJ, Steward BA, Meneghetti T, Forder L. Behavioural and neural limits in competitive decision making: The roles of outcome, opponency and observation. Biol Psychol 2019; 149:107778. [PMID: 31593749 DOI: 10.1016/j.biopsycho.2019.107778] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/20/2019] [Revised: 07/31/2019] [Accepted: 09/24/2019] [Indexed: 11/25/2022]
Abstract
To understand the boundaries we set for ourselves in terms of environmental responsibility during competition, we examined a neural index of outcome valence (feedback-related negativity; FRN) in relation to an early index of visual attention (N1), a later index of motivational significance (P3), and, eventual behaviour. In Experiment 1 (n = 36), participants either were (play) or were not (observe) responsible for action selection. In Experiment 2 (n = 36), opponents additionally either could (exploitable) or could not (unexploitable) be beaten. Various failures in reinforcement learning expression were revealed including large-scale approximations of random behaviour. Against unexploitable opponents, N1 determined the extent to which negative and positive outcomes were perceived as distinct categories by FRN. Against exploitable opponents, FRN determined the extent to which P3 generated neural gain for future events. Differential activation of the N1 - FRN - P3 processing chain provides a framework for understanding the behavioural dynamism observed during competitive decision making.
Affiliation(s)
- Benjamin James Dyson
- University of Alberta, Canada; University of Sussex, UK; Ryerson University, Canada.
13
Behavioural Isomorphism, Cognitive Economy and Recursive Thought in Non-Transitive Game Strategy. GAMES 2019. [DOI: 10.3390/g10030032] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022]
Abstract
Game spaces in which an organism must repeatedly compete with an opponent for mutually exclusive outcomes are critical methodologies for understanding decision-making under pressure. In the non-transitive game rock, paper, scissors (RPS), the only technique that guarantees the lack of exploitation is to perform randomly in accordance with mixed-strategy. However, such behavior is thought to be outside bounded rationality and so decision-making can become deterministic, predictable, and ultimately exploitable. This review identifies similarities across economics, neuroscience, nonlinear dynamics, human, and animal cognition literatures, and provides a taxonomy of RPS strategy. RPS strategies are discussed in terms of (a) whether the relevant computations require sensitivity to item frequency, the cyclic relationships between responses, or the outcome of the previous trial, and (b) whether the strategy is framed around the self or other. The negative implication of this taxonomy is that despite the differences in cognitive economy and recursive thought, many of the identified strategies are behaviorally isomorphic. This makes it difficult to infer strategy from behavior. The positive implication is that this isomorphism can be used as a novel design feature in furthering our understanding of the attribution, agency, and acquisition of strategy in RPS and other game spaces.
14
Forder L, Dyson BJ. Behavioural and neural modulation of win-stay but not lose-shift strategies as a function of outcome value in Rock, Paper, Scissors. Sci Rep 2016; 6:33809. [PMID: 27658703 PMCID: PMC5034336 DOI: 10.1038/srep33809] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 07/21/2016] [Accepted: 09/01/2016] [Indexed: 11/08/2022] Open
Abstract
Competitive environments in which individuals compete for mutually-exclusive outcomes require rational decision making in order to maximize gains but often result in poor quality heuristics. Reasons for the greater reliance on lose-shift relative to win-stay behaviour shown in previous studies were explored using the game of Rock, Paper, Scissors and by manipulating the value of winning and losing. Decision-making following a loss was characterized as relatively fast and relatively inflexible both in terms of the failure to modulate the magnitude of lose-shift strategy and the lack of significant neural modulation. In contrast, decision-making following a win was characterized as relatively slow and relatively flexible both in terms of a behavioural increase in the magnitude of win-stay strategy and a neural modulation of feedback-related negativity (FRN) and stimulus-preceding negativity (SPN) following outcome value modulation. The win-stay/lose-shift heuristic appears not to be a unified mechanism, with the former relying on System 2 processes and the latter relying on System 1 processes. Our ability to play rationally appears more likely when the outcome is positive and when the value of wins is low, highlighting how vulnerable we can be when trying to succeed during competition.