1
Philippe R, Janet R, Khalvati K, Rao RPN, Lee D, Dreher JC. Neurocomputational mechanisms involved in adaptation to fluctuating intentions of others. Nat Commun 2024; 15:3189. [PMID: 38609372 PMCID: PMC11014977 DOI: 10.1038/s41467-024-47491-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Accepted: 03/12/2024] [Indexed: 04/14/2024] Open
Abstract
Humans frequently interact with agents whose intentions can fluctuate between competition and cooperation over time. It is unclear how the brain adapts to fluctuating intentions of others when the nature of the interactions (to cooperate or compete) is not explicitly and truthfully signaled. Here, we use model-based fMRI and a task in which participants thought they were playing with another player. In fact, they played with an algorithm that alternated without signaling between cooperative and competitive strategies. We show that a neurocomputational mechanism with arbitration between competitive and cooperative experts outperforms other learning models in predicting choice behavior. At the brain level, the fMRI results show that the ventral striatum and ventromedial prefrontal cortex track the difference of reliability between these experts. When attributing competitive intentions, we find increased coupling between these regions and a network that distinguishes prediction errors related to competition and cooperation. These findings provide a neurocomputational account of how the brain arbitrates dynamically between cooperative and competitive intentions when making adaptive social decisions.
Affiliation(s)
- Rémi Philippe
  - CNRS-Institut des Sciences Cognitives Marc Jeannerod, UMR5229, Neuroeconomics, reward, and decision making laboratory, Lyon, France
  - Université Claude Bernard Lyon 1, Lyon, France
- Rémi Janet
  - CNRS-Institut des Sciences Cognitives Marc Jeannerod, UMR5229, Neuroeconomics, reward, and decision making laboratory, Lyon, France
  - Université Claude Bernard Lyon 1, Lyon, France
- Koosha Khalvati
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Rajesh P N Rao
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
  - Center for Neurotechnology, University of Washington, Seattle, WA, USA
- Daeyeol Lee
  - Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD, USA
  - Kavli Discovery Neuroscience Institute, Johns Hopkins University, Baltimore, MD, USA
  - Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
- Jean-Claude Dreher
  - CNRS-Institut des Sciences Cognitives Marc Jeannerod, UMR5229, Neuroeconomics, reward, and decision making laboratory, Lyon, France
  - Université Claude Bernard Lyon 1, Lyon, France
2
Paunov A, L'Hôtellier M, Guo D, He Z, Yu A, Meyniel F. Multiple and subject-specific roles of uncertainty in reward-guided decision-making. bioRxiv 2024:2024.03.27.587016. [PMID: 38585958 PMCID: PMC10996615 DOI: 10.1101/2024.03.27.587016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Decision-making in noisy, changing, and partially observable environments entails a basic tradeoff between immediate reward and longer-term information gain, known as the exploration-exploitation dilemma. Computationally, an effective way to balance this tradeoff is by leveraging uncertainty to guide exploration. Yet, in humans, empirical findings are mixed, from suggesting uncertainty-seeking to indifference and avoidance. In a novel bandit task that better captures uncertainty-driven behavior, we find multiple roles for uncertainty in human choices. First, stable and psychologically meaningful individual differences in uncertainty preferences actually range from seeking to avoidance, which can manifest as null group-level effects. Second, uncertainty modulates the use of basic decision heuristics that imperfectly exploit immediate rewards: a repetition bias and win-stay-lose-shift heuristic. These heuristics interact with uncertainty, favoring heuristic choices under higher uncertainty. These results, highlighting the rich and varied structure of reward-based choice, are a step to understanding its functional basis and dysfunction in psychopathology.
Affiliation(s)
- Alexander Paunov
  - INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France
  - Institut de Neuromodulation, GHU Paris, Psychiatrie et Neurosciences, Centre Hospitalier Sainte-Anne, Pôle Hospitalo-universitaire 15, Université Paris Cité, Paris, France
- Maëva L'Hôtellier
  - INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France
- Dalin Guo
  - Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
- Zoe He
  - Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
- Angela Yu
  - Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
  - Centre for Cognitive Science & Hessian AI Center, Technical University of Darmstadt, Germany
- Florent Meyniel
  - INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France
  - Institut de Neuromodulation, GHU Paris, Psychiatrie et Neurosciences, Centre Hospitalier Sainte-Anne, Pôle Hospitalo-universitaire 15, Université Paris Cité, Paris, France
3
Ota K, Charles L, Haggard P. Autonomous behaviour and the limits of human volition. Cognition 2024; 244:105684. [PMID: 38101173 DOI: 10.1016/j.cognition.2023.105684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 11/22/2023] [Accepted: 11/28/2023] [Indexed: 12/17/2023]
Abstract
Humans and some other animals can autonomously generate action choices that contribute to solving complex problems. However, experimental investigations of the cognitive bases of human autonomy are challenging, because experimental paradigms typically constrain behaviour using controlled contexts, and elicit behaviour by external triggers. In contrast, autonomy and freedom imply unconstrained behaviour initiated by endogenous triggers. Here we propose a new theoretical construct of adaptive autonomy, meaning the capacity to make behavioural choices that are free from constraints of both immediate external triggers and of routine response patterns, but nevertheless show appropriate coordination with the environment. Participants (N = 152) played a competitive game in which they had to choose the right time to act, in the face of an opponent who punished (in separate blocks) either choice biases (such as always responding early), sequential patterns of action timing across trials (such as early, late, early, late…), or predictable action-outcome dependence (such as win-stay, lose-shift). Adaptive autonomy was quantified as the ability to maintain performance when each of these influences on action selection was punished. We found that participants could become free from habitual choices regarding when to act and could also become free from sequential action patterns. However, they were not able to free themselves from influences of action-outcome dependence, even when these resulted in poor performance. These results point to a new concept of autonomous behaviour as flexible adaptation of voluntary action choices in a way that avoids stereotypy. In a sequential analysis, we also demonstrated that participants increased their reliance on belief learning in which they attempt to understand the competitor's beliefs and intentions, when transition bias and reinforcement bias were punished. Taken together, our study points to a cognitive mechanism of adaptive autonomy in which competitive interactions with other agents could promote both social cognition and volition in the form of non-stereotyped action choices.
Affiliation(s)
- Keiji Ota
  - Institute of Cognitive Neuroscience, University College London, London, United Kingdom; Department of Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Lucie Charles
  - Institute of Cognitive Neuroscience, University College London, London, United Kingdom; Department of Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Patrick Haggard
  - Institute of Cognitive Neuroscience, University College London, London, United Kingdom
4
Wang H, Kwan AC. Competitive and cooperative games for probing the neural basis of social decision-making in animals. Neurosci Biobehav Rev 2023; 149:105158. [PMID: 37019249 PMCID: PMC10175234 DOI: 10.1016/j.neubiorev.2023.105158] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/08/2023] [Revised: 03/29/2023] [Accepted: 04/02/2023] [Indexed: 04/07/2023]
Abstract
In a social environment, it is essential for animals to consider the behavior of others when making decisions. To quantitatively assess such social decisions, games offer unique advantages. Games may have competitive and cooperative components, modeling situations with antagonistic and shared objectives between players. Games can be analyzed by mathematical frameworks, including game theory and reinforcement learning, such that an animal's choice behavior can be compared against the optimal strategy. However, so far games have been underappreciated in neuroscience research, particularly for rodent studies. In this review, we survey the varieties of competitive and cooperative games that have been tested, contrasting strategies employed by non-human primates and birds with rodents. We provide examples of how games can be used to uncover neural mechanisms and explore species-specific behavioral differences. We assess critically the limitations of current paradigms and propose improvements. Together, the synthesis of current literature highlights the advantages of using games to probe the neural basis of social decisions for neuroscience studies.
Affiliation(s)
- Hongli Wang
  - Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, CT, USA
- Alex C Kwan
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; Department of Neuroscience, Yale University School of Medicine, New Haven, CT, USA; Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA; Department of Psychiatry, Weill Cornell Medicine, New York, NY 10065, USA
5
Sundvall J, Dyson BJ. Breaking the bonds of reinforcement: Effects of trial outcome, rule consistency and rule complexity against exploitable and unexploitable opponents. PLoS One 2022; 17:e0262249. [PMID: 35108279 PMCID: PMC8809577 DOI: 10.1371/journal.pone.0262249] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/11/2020] [Accepted: 12/21/2021] [Indexed: 11/18/2022] Open
Abstract
In two experiments, we used the simple zero-sum game Rock, Paper and Scissors to study the common reinforcement-based rules of repeating choices after winning (win-stay) and shifting from previous choice options after losing (lose-shift). Participants played the game against both computer opponents who could not be exploited and computer opponents who could be exploited by making choices that would at times conflict with reinforcement. Against unexploitable opponents, participants achieved an approximation of random behavior, contrary to previous research commonly finding reinforcement biases. Against exploitable opponents, the participants learned to exploit the opponent regardless of whether optimal choices conflicted with reinforcement or not. The data suggest that learning a rule that allows one to exploit was largely determined by the outcome of the previous trial.
Affiliation(s)
- Benjamin James Dyson
  - University of Alberta, Alberta, Canada
  - University of Sussex, Sussex, United Kingdom
  - Ryerson University, Toronto, Canada
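The win-stay and lose-shift rules named in the abstract above can be made concrete with a short sketch. It assumes choices are recorded as the letters "R", "P" and "S", and that draws are left out of both rates; the function names `outcome` and `wsls_rates` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

# Key beats value: rock beats scissors, paper beats rock, scissors beats paper.
BEATS = {"R": "S", "P": "R", "S": "P"}


def outcome(me: str, opponent: str) -> str:
    """Classify one Rock-Paper-Scissors trial from the player's perspective."""
    if me == opponent:
        return "draw"
    return "win" if BEATS[me] == opponent else "loss"


def wsls_rates(mine: Sequence[str], theirs: Sequence[str]) -> Tuple[float, float]:
    """Return (win-stay rate, lose-shift rate) for a sequence of trials.

    Win-stay: proportion of wins followed by repeating the same choice.
    Lose-shift: proportion of losses followed by a different choice.
    Draws are ignored here (an assumption of this sketch).
    """
    wins = stays = losses = shifts = 0
    # The last trial has no following choice, so it is not scored.
    for t in range(len(mine) - 1):
        result = outcome(mine[t], theirs[t])
        if result == "win":
            wins += 1
            stays += mine[t + 1] == mine[t]
        elif result == "loss":
            losses += 1
            shifts += mine[t + 1] != mine[t]
    win_stay = stays / wins if wins else math.nan
    lose_shift = shifts / losses if losses else math.nan
    return win_stay, lose_shift
```

For a player choosing uniformly at random among three options, the expected win-stay rate is 1/3 and the expected lose-shift rate is 2/3, so rates well above those values indicate a reinforcement bias.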
6
Lindig-León C, Schmid G, Braun DA. Nash equilibria in human sensorimotor interactions explained by Q-learning with intrinsic costs. Sci Rep 2021; 11:20779. [PMID: 34675336 PMCID: PMC8531365 DOI: 10.1038/s41598-021-99428-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/07/2021] [Accepted: 09/01/2021] [Indexed: 11/09/2022] Open
Abstract
The Nash equilibrium concept has previously been shown to be an important tool to understand human sensorimotor interactions, where different actors vie for minimizing their respective effort while engaging in a multi-agent motor task. However, it is not clear how such equilibria are reached. Here, we compare different reinforcement learning models to human behavior engaged in sensorimotor interactions with haptic feedback based on three classic games, including the prisoner's dilemma, and the symmetric and asymmetric matching pennies games. We find that a discrete analysis that reduces the continuous sensorimotor interaction to binary choices, as in classical matrix games, does not allow one to distinguish between the different learning algorithms, but that a more detailed continuous analysis with continuous formulations of the learning algorithms and the game-theoretic solutions affords different predictions. In particular, we find that Q-learning with intrinsic costs that disfavor deviations from average behavior explains the observed data best, even though all learning algorithms equally converge to admissible Nash equilibrium solutions. We therefore conclude that it is important to study different learning algorithms for understanding sensorimotor interactions, because such behavior cannot be inferred from a game-theoretic analysis that focuses solely on the Nash equilibrium concept: different learning algorithms impose preferences on the set of possible equilibrium solutions due to their inherent learning dynamics.
Affiliation(s)
- Cecilia Lindig-León
  - Institute of Neural Information Processing, Faculty of Engineering, Computer Science and Psychology, Ulm University, Ulm, Germany
- Gerrit Schmid
  - Institute of Neural Information Processing, Faculty of Engineering, Computer Science and Psychology, Ulm University, Ulm, Germany
- Daniel A Braun
  - Institute of Neural Information Processing, Faculty of Engineering, Computer Science and Psychology, Ulm University, Ulm, Germany
7
Abstract
Consciousness has evolved and is a feature of all animals with sufficiently complex nervous systems. It is, therefore, primarily a problem for biology, rather than physics. In this review, I will consider three aspects of consciousness: level of consciousness, whether we are awake or in a coma; the contents of consciousness, what determines how a small amount of sensory information is associated with subjective experience, while the rest is not; and meta-consciousness, the ability to reflect upon our subjective experiences and, importantly, to share them with others. I will discuss and compare current theories of the neural and cognitive mechanisms involved in producing these three aspects of consciousness and conclude that the research in this area is flourishing and has already succeeded in delineating these mechanisms in surprising detail.
Affiliation(s)
- Chris D Frith
  - Wellcome Centre for Human Neuroimaging at University College London, UK
  - Institute of Philosophy, Institute of Advanced Study, University of London, UK
8
Dyson BJ. Variability in competitive decision-making speed and quality against exploiting and exploitative opponents. Sci Rep 2021; 11:2859. [PMID: 33536472 PMCID: PMC7859242 DOI: 10.1038/s41598-021-82269-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/14/2020] [Accepted: 01/18/2021] [Indexed: 12/02/2022] Open
Abstract
A presumption in previous work has been that sub-optimality in competitive performance following loss is the result of a reduction in decision-making time (i.e., post-error speeding). The main goal of this paper is to test the relationship between decision-making speed and quality, with the hypothesis that slowing down decision-making should increase the likelihood of successful performance in cases where a model of opponent domination can be implemented. Across Experiments 1–3, the speed and quality of competitive decision-making were examined in a zero-sum game as a function of the nature of the opponent (unexploitable, exploiting, exploitable). Performance was also examined against the nature of a credit (or token) system used as a within-experimental manipulation (no credit, fixed credit, variable credit). To complement reaction time variation as a function of outcome, both the fixed credit and variable credit conditions were designed to slow down decision-making, relative to a no credit condition where the game could be played in quick succession and without interruption. The data confirmed that (a) self-imposed reductions in processing time following losses (post-error speeding) were causal factors in determining poorer-quality behaviour, (b) the expression of lose-shift was less flexible than the expression of win-stay, and (c) the use of a variable credit system may enhance the perceived control participants have against exploitable opponents. Future work should seek to disentangle temporal delay and response interruption as determinants of decision-making quality against numerous styles of opponency.
Affiliation(s)
- Benjamin James Dyson
  - Department of Psychology, University of Alberta, P-217 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada
  - Ryerson University, Toronto, Canada
  - University of Sussex, Brighton, UK
9
Capuchin and rhesus monkeys show sunk cost effects in a psychomotor task. Sci Rep 2020; 10:20396. [PMID: 33230238 PMCID: PMC7683735 DOI: 10.1038/s41598-020-77301-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/30/2020] [Accepted: 11/03/2020] [Indexed: 11/10/2022] Open
Abstract
Human decision-making is often swayed by irrecoverable investments even though it should only be based on future—and not past—costs and benefits. Although this sunk cost effect is widely documented and can lead to devastating losses, the underlying psychological mechanisms are unclear. To tease apart possible explanations through a comparative approach, we assessed capuchin and rhesus monkeys’ susceptibility to sunk costs in a psychomotor task. Monkeys needed to track a moving target with a joystick-controlled cursor for variable durations. They could stop at any time, ending the trial without reward. To minimize the work required for a reward, monkeys should have always persisted for at least 1 s, but should have abandoned the trial if that did not yield a reward. Capuchin monkeys and especially rhesus macaques persisted to trial completion even when it was suboptimal, and were more likely to complete the trial the longer they had already tracked the target. These effects were less pronounced, although still present, when the change in expected tracking duration was signalled visually. These results show that sunk cost effects can arise in the absence of human-unique factors and may emerge, in part, because persisting can resolve uncertainty.
10
Wang L, Huang W, Li Y, Evans J, He S. Multi-AI competing and winning against humans in iterated Rock-Paper-Scissors game. Sci Rep 2020; 10:13873. [PMID: 32807813 PMCID: PMC7431549 DOI: 10.1038/s41598-020-70544-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/06/2020] [Accepted: 07/28/2020] [Indexed: 11/09/2022] Open
Abstract
Predicting and modeling human behavior and finding trends within human decision-making processes is a major problem of social science. Rock Paper Scissors (RPS) is the fundamental strategic question in many game theory problems and real-world competitions. Finding the right approach to beat a particular human opponent is challenging. Here we use an AI (artificial intelligence) algorithm based on Markov Models of one fixed memory length (abbreviated as "single AI") to compete against humans in an iterated RPS game. We model and predict human competition behavior by combining many Markov Models with different fixed memory lengths (abbreviated as "multi-AI"), and develop an architecture of multi-AI with changeable parameters to adapt to different competition strategies. We introduce a parameter called "focus length" (a positive number such as 5 or 10) to control the speed and sensitivity with which our multi-AI adapts to a change in the opponent's strategy. The focus length is the number of previous rounds that the multi-AI looks at when determining which single AI has performed best and should therefore be chosen to play the next round. We experimented with 52 different people, each playing 300 rounds continuously against one specific multi-AI model, and demonstrated that our strategy could win against more than 95% of human opponents.
Affiliation(s)
- Lei Wang
  - National Engineering Research Center for Optical Instruments, Centre for Optical and Electromagnetic Research, Zhejiang University, Hangzhou, 310058, China
- Wenbin Huang
  - National Engineering Research Center for Optical Instruments, Centre for Optical and Electromagnetic Research, Zhejiang University, Hangzhou, 310058, China
  - Ningbo Research Institute, Zhejiang University, Ningbo, 315100, China
- Yuanpeng Li
  - National Engineering Research Center for Optical Instruments, Centre for Optical and Electromagnetic Research, Zhejiang University, Hangzhou, 310058, China
  - Ningbo Research Institute, Zhejiang University, Ningbo, 315100, China
- Julian Evans
  - National Engineering Research Center for Optical Instruments, Centre for Optical and Electromagnetic Research, Zhejiang University, Hangzhou, 310058, China
- Sailing He
  - National Engineering Research Center for Optical Instruments, Centre for Optical and Electromagnetic Research, Zhejiang University, Hangzhou, 310058, China
  - Ningbo Research Institute, Zhejiang University, Ningbo, 315100, China
  - Department of Electromagnetic Engineering, School of Electrical Engineering, Royal Institute of Technology, 100 44, Stockholm, Sweden
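The multi-AI idea summarized in the abstract above (several fixed-memory Markov predictors, with the one that scored best over the last "focus length" rounds chosen to play) can be illustrated with a minimal sketch. This is not the authors' implementation. The class names `SingleAI` and `MultiAI`, the choice to condition only on the human's own past moves, the deterministic fallback move, and the tie-breaking order are all assumptions made for illustration.

```python
from collections import defaultdict, deque

MOVES = ("R", "P", "S")
BEATS = {"R": "S", "P": "R", "S": "P"}  # key beats value
COUNTER = {loser: winner for winner, loser in BEATS.items()}  # COUNTER[x] beats x


def score(ai_move: str, human_move: str) -> int:
    """+1 if the AI wins the round, -1 if it loses, 0 for a draw."""
    if ai_move == human_move:
        return 0
    return 1 if BEATS[ai_move] == human_move else -1


class SingleAI:
    """Markov predictor with one fixed memory length (must be >= 1)."""

    def __init__(self, memory: int):
        self.memory = memory
        # Maps the last `memory` human moves to counts of the move that followed.
        self.counts = defaultdict(lambda: {m: 0 for m in MOVES})

    def choose(self, history):
        if len(history) < self.memory:
            return MOVES[0]  # assumed fallback while there is too little history
        context = tuple(history[-self.memory:])
        counts = self.counts[context]
        predicted = max(MOVES, key=lambda m: counts[m])  # ties go to the first move
        return COUNTER[predicted]

    def learn(self, history, human_move):
        if len(history) >= self.memory:
            self.counts[tuple(history[-self.memory:])][human_move] += 1


class MultiAI:
    """Plays the proposal of the SingleAI that scored best recently."""

    def __init__(self, memories=(1, 2, 3), focus_length=5):
        self.models = [SingleAI(k) for k in memories]
        # Each deque keeps only the last `focus_length` round scores of one model.
        self.recent = [deque(maxlen=focus_length) for _ in self.models]
        self.history = []
        self._proposals = []

    def play(self) -> str:
        self._proposals = [m.choose(self.history) for m in self.models]
        totals = [sum(r) for r in self.recent]
        best = max(range(len(self.models)), key=lambda i: totals[i])
        return self._proposals[best]

    def observe(self, human_move: str) -> None:
        """Score every model's proposal for this round, then update all models."""
        for model, recent, proposal in zip(self.models, self.recent, self._proposals):
            recent.append(score(proposal, human_move))
            model.learn(self.history, human_move)
        self.history.append(human_move)
```

Call `play()` to get the AI's move for a round and then `observe()` with the human's move for that round. A smaller `focus_length` makes the selection react faster to a change in the opponent's strategy, at the cost of noisier comparisons between the single models.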
11
Switching Competitors Reduces Win-Stay but Not Lose-Shift Behaviour: The Role of Outcome-Action Association Strength on Reinforcement Learning. Games 2020. [DOI: 10.3390/g11030025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
Predictability is a hallmark of poor-quality decision-making during competition. One source of predictability is the strong association between current outcome and future action, as dictated by the reinforcement learning principles of win–stay and lose–shift. We tested the idea that predictability could be reduced during competition by weakening the associations between outcome and action. To do this, participants completed a competitive zero-sum game in which the opponent from the current trial was either replayed (opponent repeat), thereby strengthening the association, or replaced (opponent change) by a different competitor, thereby weakening the association. We observed that win–stay behavior was reduced during opponent change trials but lose–shift behavior remained reliably predictable. Consistent with the group data, the number of individuals who exhibited predictable behavior following wins decreased for opponent change relative to opponent repeat trials. Our data show that future actions are more under internal control following positive relative to negative outcomes, and that externally breaking the bonds between outcome and action via opponent association also allows us to become less prone to exploitation.
12
Cai CR, Wu ZX. Analytical treatment for cyclic three-state dynamics on static networks. Phys Rev E 2020; 101:012305. [PMID: 32069571 DOI: 10.1103/physreve.101.012305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2019] [Indexed: 06/10/2023]
Abstract
Whenever a dynamical process unfolds on static networks, the dynamical state of any focal individual is influenced exclusively by directly connected neighbors, rather than by unconnected ones. This gives rise to the dynamical correlation problem, which mean-field-based methods fail to capture. The dynamic correlation coupling problem has long been an important and difficult problem in theoretical physics. Explicit analytical expressions and decoupling methods often play a key role in the development of the corresponding field. In this paper, we study the cyclic three-state dynamics on static networks, which include a wide class of dynamical processes, for example, the cyclic Lotka-Volterra model, the directed migration model, the susceptible-infected-recovered-susceptible epidemic model, and the predator-prey with empty sites model. We derive the explicit analytical solutions of the propagating size and the threshold curve surface for the four different dynamics. We compare the results on static networks with those on annealed networks and make an interesting discovery: for the symmetrical dynamical models (the cyclic Lotka-Volterra model and the directed migration model, where the three states are of rotational symmetry), the macroscopic behaviors of the dynamical processes on static networks are the same as those on annealed networks, while for the asymmetric dynamical models (the susceptible-infected-recovered-susceptible epidemic model and the predator-prey with empty sites model) the outcomes on static networks differ from, and are more complicated than, those on annealed networks. We also compare the results predicted by our theoretical method with those of Monte Carlo simulations and find good agreement between the two.
Affiliation(s)
- Chao-Ran Cai
  - School of Physics, Northwest University, Xi'an 710069, China
  - Shaanxi Key Laboratory for Theoretical Physics Frontiers, Xi'an 710069, China
- Zhi-Xi Wu
  - Institute of Computational Physics and Complex Systems, Lanzhou University, Lanzhou, Gansu 730000, China
13
Dyson BJ, Musgrave C, Rowe C, Sandhur R. Behavioural and neural interactions between objective and subjective performance in a Matching Pennies game. Int J Psychophysiol 2019; 147:128-136. [PMID: 31730790 DOI: 10.1016/j.ijpsycho.2019.11.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/29/2019] [Revised: 11/05/2019] [Accepted: 11/07/2019] [Indexed: 02/06/2023]
Abstract
To examine the behavioural and neural interactions between objective and subjective performance during competitive decision-making, participants completed a Matching Pennies game where win-rates were fixed within three conditions (win > lose, win = lose, win < lose) and outcomes were predicted at each trial. Using random behaviour as the hallmark of optimal performance, we observed item (heads), contingency (win-stay, lose-shift) and combinatorial (HH, HT, TH, TT) biases across all conditions. Higher-quality behaviour represented by a reduction in combinatorial bias was observed during high win-rate exposure. In contrast, over-optimism biases were observed only in conditions where win rates were equal to, or less than, loss rates. At a group level, a neural measure of outcome evaluation (feedback-related negativity; FRN) indexed the binary distinction between positive and negative outcome. At an individual level, increased belief in successful performance accentuated FRN amplitude differences between wins and losses. Taken together, the data suggest that objective experiences of, or, subjective beliefs in, the predominance of positive outcomes may be mutual attempts to self-regulate performance during competition. In this way, increased exposure to positive outcomes (real or imagined) may help to weight the output of the more diligent and analytic System 2, relative to the impulsive and intuitive System 1.
Affiliation(s)
- Benjamin James Dyson
  - University of Alberta, Canada; University of Sussex, UK; Ryerson University, Canada
14
Dyson BJ, Steward BA, Meneghetti T, Forder L. Behavioural and neural limits in competitive decision making: The roles of outcome, opponency and observation. Biol Psychol 2019; 149:107778. [PMID: 31593749 DOI: 10.1016/j.biopsycho.2019.107778] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/20/2019] [Revised: 07/31/2019] [Accepted: 09/24/2019] [Indexed: 11/25/2022]
Abstract
To understand the boundaries we set for ourselves in terms of environmental responsibility during competition, we examined a neural index of outcome valence (feedback-related negativity; FRN) in relation to an early index of visual attention (N1), a later index of motivational significance (P3), and, eventual behaviour. In Experiment 1 (n = 36), participants either were (play) or were not (observe) responsible for action selection. In Experiment 2 (n = 36), opponents additionally either could (exploitable) or could not (unexploitable) be beaten. Various failures in reinforcement learning expression were revealed including large-scale approximations of random behaviour. Against unexploitable opponents, N1 determined the extent to which negative and positive outcomes were perceived as distinct categories by FRN. Against exploitable opponents, FRN determined the extent to which P3 generated neural gain for future events. Differential activation of the N1 - FRN - P3 processing chain provides a framework for understanding the behavioural dynamism observed during competitive decision making.
Affiliation(s)
- Benjamin James Dyson
  - University of Alberta, Canada; University of Sussex, UK; Ryerson University, Canada
15
Behavioural Isomorphism, Cognitive Economy and Recursive Thought in Non-Transitive Game Strategy. Games 2019. [DOI: 10.3390/g10030032] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022]
Abstract
Game spaces in which an organism must repeatedly compete with an opponent for mutually exclusive outcomes are critical methodologies for understanding decision-making under pressure. In the non-transitive game rock, paper, scissors (RPS), the only technique that guarantees the lack of exploitation is to perform randomly in accordance with mixed-strategy. However, such behavior is thought to be outside bounded rationality and so decision-making can become deterministic, predictable, and ultimately exploitable. This review identifies similarities across economics, neuroscience, nonlinear dynamics, human, and animal cognition literatures, and provides a taxonomy of RPS strategy. RPS strategies are discussed in terms of (a) whether the relevant computations require sensitivity to item frequency, the cyclic relationships between responses, or the outcome of the previous trial, and (b) whether the strategy is framed around the self or other. The negative implication of this taxonomy is that despite the differences in cognitive economy and recursive thought, many of the identified strategies are behaviorally isomorphic. This makes it difficult to infer strategy from behavior. The positive implication is that this isomorphism can be used as a novel design feature in furthering our understanding of the attribution, agency, and acquisition of strategy in RPS and other game spaces.
|
16
|
Groman SM, Keistler C, Keip AJ, Hammarlund E, DiLeone RJ, Pittenger C, Lee D, Taylor JR. Orbitofrontal Circuits Control Multiple Reinforcement-Learning Processes. Neuron 2019; 103:734-746.e3. [PMID: 31253468 DOI: 10.1016/j.neuron.2019.05.042] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Received: 01/07/2019] [Revised: 04/18/2019] [Accepted: 05/24/2019] [Indexed: 12/18/2022]
Abstract
Adaptive decision making in dynamic environments requires multiple reinforcement-learning steps that may be implemented by dissociable neural circuits. Here, we used a novel directionally specific viral ablation approach to investigate the function of several anatomically defined orbitofrontal cortex (OFC) circuits during adaptive, flexible decision making in rats trained on a probabilistic reversal learning task. Ablation of OFC neurons projecting to the nucleus accumbens selectively disrupted performance following a reversal, by disrupting the use of negative outcomes to guide subsequent choices. Ablation of amygdala neurons projecting to the OFC also impaired reversal performance, but due to disruptions in the use of positive outcomes to guide subsequent choices. Ablation of OFC neurons projecting to the amygdala, by contrast, enhanced reversal performance by destabilizing action values. Our data are inconsistent with a unitary function of the OFC in decision making. Rather, distinct OFC-amygdala-striatal circuits mediate distinct components of the action-value updating and maintenance necessary for decision making.
Affiliation(s)
- Colby Keistler
- Department of Psychiatry, Yale University, New Haven, CT 06515, USA
- Alex J Keip
- Department of Psychiatry, Yale University, New Haven, CT 06515, USA
- Emma Hammarlund
- Department of Psychiatry, Yale University, New Haven, CT 06515, USA
- Ralph J DiLeone
- Department of Psychiatry, Yale University, New Haven, CT 06515, USA; Department of Neuroscience, Yale University, New Haven, CT 06515, USA
- Christopher Pittenger
- Department of Psychiatry, Yale University, New Haven, CT 06515, USA; Child Study Center, Yale University, New Haven, CT 06515, USA
- Daeyeol Lee
- Department of Psychiatry, Yale University, New Haven, CT 06515, USA; Department of Neuroscience, Yale University, New Haven, CT 06515, USA; Department of Psychology, Yale University, New Haven, CT 06515, USA
- Jane R Taylor
- Department of Psychiatry, Yale University, New Haven, CT 06515, USA; Department of Neuroscience, Yale University, New Haven, CT 06515, USA; Department of Psychology, Yale University, New Haven, CT 06515, USA.
|
17
|
Abstract
Habits form a crucial component of behavior. In recent years, key computational models have conceptualized habits as arising from model-free reinforcement learning mechanisms, which typically select between available actions based on the future value expected to result from each. Traditionally, however, habits have been understood as behaviors that can be triggered directly by a stimulus, without requiring the animal to evaluate expected outcomes. Here, we develop a computational model instantiating this traditional view, in which habits develop through the direct strengthening of recently taken actions rather than through the encoding of outcomes. We demonstrate that this model accounts for key behavioral manifestations of habits, including insensitivity to outcome devaluation and contingency degradation, as well as the effects of reinforcement schedule on the rate of habit formation. The model also explains the prevalent observation of perseveration in repeated-choice tasks as an additional behavioral manifestation of the habit system. We suggest that mapping habitual behaviors onto value-free mechanisms provides a parsimonious account of existing behavioral and neural data. This mapping may provide a new foundation for building robust and comprehensive models of the interaction of habits with other, more goal-directed types of behaviors and help to better guide research into the neural mechanisms underlying control of instrumental behavior more generally. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
Affiliation(s)
- Amitai Shenhav
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown Institute for Brain Science, Brown University
|
18
|
Out of sight, out of mind: Occlusion and eye closure destabilize moving bistable structure-from-motion displays. Atten Percept Psychophys 2018; 80:1193-1204. [PMID: 29560607 DOI: 10.3758/s13414-018-1505-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
Our brain constantly tries to anticipate the future by using a variety of memory mechanisms. Interestingly, studies using the intermittent presentation of multistable displays have shown little perceptual persistence for interruptions longer than a few hundred milliseconds. Here we examined whether we can facilitate the perceptual stability of bistable displays following a period of invisibility by employing a physically plausible and ecologically valid occlusion event sequence, as opposed to the typical intermittent presentation, with sudden onsets and offsets. To this end, we presented a bistable rotating structure-from-motion display that was moving along a linear horizontal trajectory on the screen and either was temporarily occluded by another object (a cardboard strip in Exp. 1, a computer-generated image in Exp. 2) or became invisible due to eye closure (Exp. 3). We report that a bistable rotation direction reliably persisted following occlusion or interruption only (1) if the pre- and postinterruption locations overlapped spatially (an occluder with apertures in Exp. 2 or brief, spontaneous blinks in Exp. 3) or (2) if an object's size allowed for the efficient grouping of dots on both sides of the occluding object (large objects in Exp. 1). In contrast, we observed no persistence whenever the pre- and postinterruption locations were nonoverlapping (large solid occluding objects in Exps. 1 and 2 and long, prompted blinks in Exp. 3). We report that the bistable rotation direction of a moving object persisted only for spatially overlapping neural representations, and that persistence was not facilitated by a physically plausible and ecologically valid occlusion event.
|
19
|
Grabowska MJ, Steeves J, Alpay J, van de Poll M, Ertekin D, van Swinderen B. Innate visual preferences and behavioral flexibility in Drosophila. J Exp Biol 2018; 221:jeb.185918. [DOI: 10.1242/jeb.185918] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/01/2018] [Accepted: 10/10/2018] [Indexed: 01/02/2023]
Abstract
Visual decision-making in animals is influenced by innate preferences as well as experience. Interaction between hard-wired responses and changing motivational states determines whether a visual stimulus is attractive, aversive, or neutral. It is, however, difficult to separate the relative contribution of nature versus nurture in experimental paradigms, especially for more complex visual parameters such as the shape of objects. We used a closed-loop virtual reality paradigm for walking Drosophila flies to uncover innate visual preferences for the shape and size of objects, in a recursive choice scenario allowing the flies to reveal their visual preferences over time. We found that Drosophila flies display a robust attraction/repulsion profile for a range of object sizes in this paradigm, and that this visual preference profile remains evident under a variety of conditions and persists into old age. We also demonstrate a level of flexibility in this behavior: innate repulsion to certain objects could be transiently overridden if these were novel, although this effect was only evident in younger flies. Finally, we show that a neuromodulatory circuit in the fly brain, Drosophila neuropeptide F (dNPF), can be recruited to guide visual decision-making. Optogenetic activation of dNPF-expressing neurons converted a visually repulsive object into a more attractive object. This suggests that dNPF activity in the Drosophila brain guides ongoing visual choices, to override innate preferences and thereby provide a necessary level of behavioral flexibility in visual decision-making.
Affiliation(s)
- Martyna J. Grabowska
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia
- James Steeves
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia
- Julius Alpay
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia
- Matthew van de Poll
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia
- Deniz Ertekin
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia
- Bruno van Swinderen
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia
|
20
|
Forder L, Dyson BJ. Behavioural and neural modulation of win-stay but not lose-shift strategies as a function of outcome value in Rock, Paper, Scissors. Sci Rep 2016; 6:33809. [PMID: 27658703 PMCID: PMC5034336 DOI: 10.1038/srep33809] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 07/21/2016] [Accepted: 09/01/2016] [Indexed: 11/08/2022] Open
Abstract
Competitive environments in which individuals compete for mutually-exclusive outcomes require rational decision making in order to maximize gains but often result in poor quality heuristics. Reasons for the greater reliance on lose-shift relative to win-stay behaviour shown in previous studies were explored using the game of Rock, Paper, Scissors and by manipulating the value of winning and losing. Decision-making following a loss was characterized as relatively fast and relatively inflexible both in terms of the failure to modulate the magnitude of lose-shift strategy and the lack of significant neural modulation. In contrast, decision-making following a win was characterized as relatively slow and relatively flexible both in terms of a behavioural increase in the magnitude of win-stay strategy and a neural modulation of feedback-related negativity (FRN) and stimulus-preceding negativity (SPN) following outcome value modulation. The win-stay/lose-shift heuristic appears not to be a unified mechanism, with the former relying on System 2 processes and the latter relying on System 1 processes. Our ability to play rationally appears more likely when the outcome is positive and when the value of wins is low, highlighting how vulnerable we can be when trying to succeed during competition.
|
21
|
Neural Basis of Strategic Decision Making. Trends Neurosci 2015; 39:40-48. [PMID: 26688301 DOI: 10.1016/j.tins.2015.11.002] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 09/11/2015] [Revised: 11/03/2015] [Accepted: 11/10/2015] [Indexed: 11/23/2022]
Abstract
Human choice behaviors during social interactions often deviate from the predictions of game theory. This might arise partly from the limitations in the cognitive abilities necessary for recursive reasoning about the behaviors of others. In addition, during iterative social interactions, choices might change dynamically as knowledge about the intentions of others and estimates for choice outcomes are incrementally updated via reinforcement learning. Some of the brain circuits utilized during social decision making might be general-purpose and contribute to isomorphic individual and social decision making. By contrast, regions in the medial prefrontal cortex (mPFC) and temporal parietal junction (TPJ) might be recruited for cognitive processes unique to social decision making.
|
22
|
Verma G, Chan K, Swami A. Zealotry promotes coexistence in the rock-paper-scissors model of cyclic dominance. Phys Rev E Stat Nonlin Soft Matter Phys 2015; 92:052807. [PMID: 26651744 DOI: 10.1103/physreve.92.052807] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/13/2015] [Indexed: 06/05/2023]
Abstract
Cyclic dominance models, such as the classic rock-paper-scissors (RPS) game, have found real-world applications in biology, ecology, and sociology. A key quantity of interest in such models is the coexistence time, i.e., the time until at least one population type goes extinct. Much recent research has considered conditions that lengthen coexistence times in an RPS model. A general finding is that coexistence is promoted by localized spatial interactions (low mobility), while extinction is fostered by global interactions (high mobility). That is, there exists a mobility threshold which separates a regime of long coexistence from a regime of rapid collapse of coexistence. The key finding of our paper is that if zealots (i.e., nodes able to defeat others while themselves being immune to defeat) of even a single type exist, then system coexistence time can be significantly prolonged, even in the presence of global interactions. This work thus highlights a crucial determinant of system survival time in cyclic dominance models.
Affiliation(s)
- Gunjan Verma
- Computational and Information Sciences Directorate, Army Research Laboratory, Adelphi, Maryland 20783, USA
- Kevin Chan
- Computational and Information Sciences Directorate, Army Research Laboratory, Adelphi, Maryland 20783, USA
- Ananthram Swami
- Computational and Information Sciences Directorate, Army Research Laboratory, Adelphi, Maryland 20783, USA
|
23
|
Haroush K, Williams ZM. Neuronal prediction of opponent's behavior during cooperative social interchange in primates. Cell 2015; 160:1233-45. [PMID: 25728667 DOI: 10.1016/j.cell.2015.01.045] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Received: 04/13/2014] [Revised: 10/25/2014] [Accepted: 01/05/2015] [Indexed: 10/23/2022]
Abstract
A cornerstone of successful social interchange is the ability to anticipate each other's intentions or actions. While generating these internal predictions is essential for constructive social behavior, their single neuronal basis and causal underpinnings are unknown. Here, we discover specific neurons in the primate dorsal anterior cingulate that selectively predict an opponent's yet unknown decision to invest in their common good or defect and distinct neurons that encode the monkey's own current decision based on prior outcomes. Mixed population predictions of the other were remarkably near optimal compared to behavioral decoders. Moreover, disrupting cingulate activity selectively biased mutually beneficial interactions between the monkeys but, surprisingly, had no influence on their decisions when no net-positive outcome was possible. These findings identify a group of other-predictive neurons in the primate anterior cingulate essential for enacting cooperative interactions and may pave a way toward the targeted treatment of social behavioral disorders.
Affiliation(s)
- Keren Haroush
- Harvard-MIT Health Sciences and Technology, Harvard Medical School, Boston, MA 02114, USA; Department of Neurosurgery, MGH-HMS Center for Nervous System Repair, Harvard Medical School, Boston, MA 02114, USA.
- Ziv M Williams
- Harvard-MIT Health Sciences and Technology, Harvard Medical School, Boston, MA 02114, USA; Department of Neurosurgery, MGH-HMS Center for Nervous System Repair, Harvard Medical School, Boston, MA 02114, USA.
|
24
|
Abstract
Humans exhibit a suite of biases when making economic decisions. We review recent research on the origins of human decision making by examining whether similar choice biases are seen in nonhuman primates, our closest phylogenetic relatives. We propose that comparative studies can provide insight into four major questions about the nature of human choice biases that cannot be addressed by studies of our species alone. First, research with other primates can address the evolution of human choice biases and identify shared versus human-unique tendencies in decision making. Second, primate studies can constrain hypotheses about the psychological mechanisms underlying such biases. Third, comparisons of closely related species can identify when distinct mechanisms underlie related biases by examining evolutionary dissociations in choice strategies. Finally, comparative work can provide insight into the biological rationality of economically irrational preferences.
Affiliation(s)
- Laurie R Santos
- Department of Psychology, Yale University, New Haven, Connecticut 06511
|
25
|
Behavioral Variability through Stochastic Choice and Its Gating by Anterior Cingulate Cortex. Cell 2014; 159:21-32. [DOI: 10.1016/j.cell.2014.08.037] [Citation(s) in RCA: 123] [Impact Index Per Article: 12.3] [Received: 06/30/2014] [Revised: 08/22/2014] [Accepted: 08/25/2014] [Indexed: 10/24/2022]
|
26
|
Social cycling and conditional responses in the Rock-Paper-Scissors game. Sci Rep 2014; 4:5830. [PMID: 25060115 PMCID: PMC5376050 DOI: 10.1038/srep05830] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 05/06/2014] [Accepted: 07/07/2014] [Indexed: 11/20/2022] Open
Abstract
How humans make decisions in non-cooperative strategic interactions is a big question. For the fundamental Rock-Paper-Scissors (RPS) model game system, classic Nash equilibrium (NE) theory predicts that players completely randomize their action choices to avoid being exploited, while evolutionary game theory of bounded rationality in general predicts persistent cyclic motions, especially in finite populations. However, as empirical studies have been relatively sparse, it is still a controversial issue as to which theoretical framework is more appropriate to describe decision-making of human subjects. Here we observe population-level persistent cyclic motions in a laboratory experiment of the discrete-time iterated RPS game under the traditional random pairwise-matching protocol. This collective behavior contradicts the NE theory but is quantitatively explained, without any adjustable parameter, by a microscopic model of win-lose-tie conditional response. Theoretical calculations suggest that if all players adopt the same optimized conditional response strategy, their accumulated payoff will be much higher than the reference value of the NE mixed strategy. Our work demonstrates the feasibility of understanding human competition behaviors from the angle of non-equilibrium statistical physics.
|
27
|
Mochizuki K, Funahashi S. Opposing history effect of preceding decision and action in the free choice of saccade direction. J Neurophysiol 2014; 112:923-32. [PMID: 24848475 DOI: 10.1152/jn.00846.2013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022] Open
Abstract
When we act voluntarily, we make a decision to do so prior to the actual execution. However, because of the strong tie between decision and action, it has been difficult to dissociate these two processes in an animal's free behavior. In the present study, we tried to characterize the differences in these processes on the basis of their unique history effect. Using simple eye movement tasks in which the direction of a saccade was either instructed by a computer or freely chosen by the subject, we found that the preceding decision and action had different effects on the animal's subsequent behavior. While choosing a direction (previous decision) produced a positive history effect that prompted the choice of the same saccade direction, making a saccadic response to a direction (previous action) produced a negative history effect that discouraged the monkey from choosing the same direction. This result suggests that the history effect in sequential behavior reported in previous studies was a mixture of these two different components. Future studies on decision-making need to consider the importance of the distinction between decision and action in animal behavior.
Affiliation(s)
- Kei Mochizuki
- Laboratory of Cognitive Brain Science, Department of Cognitive and Behavioral Sciences, Graduate School of Human and Environmental Studies, Kyoto University, Kyoto, Japan
- Shintaro Funahashi
- Laboratory of Cognitive Brain Science, Department of Cognitive and Behavioral Sciences, Graduate School of Human and Environmental Studies, Kyoto University, Kyoto, Japan; Kokoro Research Center, Kyoto University, Kyoto, Japan
|
28
|
Sticking with the nice guy: trait warmth information impairs learning and modulates person perception brain network activity. Cogn Affect Behav Neurosci 2014; 14:1420-37. [PMID: 24820264 DOI: 10.3758/s13415-014-0284-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]
Abstract
Social learning requires inferring social information about another person, as well as evaluating outcomes. Previous research shows that prior social information biases decision making and reduces reliance on striatal activity during learning (Delgado, Frank, & Phelps, Nature Neuroscience 8 (11): 1611-1618, 2005). A rich literature in social psychology on person perception demonstrates that people spontaneously infer social information when viewing another person (Fiske & Taylor, 2013) and engage a network of brain regions, including the medial prefrontal cortex, temporal parietal junction, superior temporal sulcus, and precuneus (Amodio & Frith, Nature Reviews Neuroscience, 7(4), 268-277, 2006; Haxby, Gobbini, & Montgomery, 2004; van Overwalle, Human Brain Mapping, 30, 829-858, 2009). We investigate the role of these brain regions during social learning about well-established dimensions of person perception: trait warmth and trait competence. We test the hypothesis that activity in person perception brain regions interacts with learning structures during social learning. Participants play an investment game where they must choose an agent to invest on their behalf. This choice is guided by cues signaling trait warmth or trait competence based on framing of monetary returns. Trait warmth information impairs learning about human but not computer agents, while trait competence information produces similar learning rates for human and computer agents. We see increased activation to warmth information about human agents in person perception brain regions. Interestingly, activity in person perception brain regions during the decision phase negatively predicts activity in the striatum during feedback for trait competence inferences about humans. These results suggest that social learning may engage additional processing within person perception brain regions that hampers learning in economic contexts.
|
29
|
Lee VK, Harris LT. How social cognition can inform social decision making. Front Neurosci 2013; 7:259. [PMID: 24399928 PMCID: PMC3872305 DOI: 10.3389/fnins.2013.00259] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 10/11/2013] [Accepted: 12/10/2013] [Indexed: 11/13/2022] Open
Abstract
Social decision-making is often complex, requiring the decision-maker to make inferences of others' mental states in addition to engaging traditional decision-making processes like valuation and reward processing. A growing body of research in neuroeconomics has examined decision-making involving social and non-social stimuli to explore activity in brain regions such as the striatum and prefrontal cortex, largely ignoring the power of the social context. Perhaps more complex processes may influence decision-making in social vs. non-social contexts. Years of social psychology and social neuroscience research have documented a multitude of processes (e.g., mental state inferences, impression formation, spontaneous trait inferences) that occur upon viewing another person. These processes rely on a network of brain regions including medial prefrontal cortex (MPFC), superior temporal sulcus (STS), temporal parietal junction, and precuneus among others. Undoubtedly, these social cognition processes affect social decision-making since mental state inferences occur spontaneously and automatically. Few studies have looked at how these social inference processes affect decision-making in a social context despite the capability of these inferences to serve as predictions that can guide future decision-making. Here we review and integrate the person perception and decision-making literatures to understand how social cognition can inform the study of social decision-making in a way that is consistent with both literatures. We identify gaps in both literatures (while behavioral economics largely ignores social processes that spontaneously occur upon viewing another person, social psychology has largely failed to talk about the implications of social cognition processes in an economic decision-making context) and examine the benefits of integrating social psychological theory with behavioral economic theory.
Affiliation(s)
- Victoria K Lee
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
- Lasana T Harris
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA; Center for Cognitive Neuroscience, Duke University, Durham, NC, USA
|
30
|
Response randomization of one- and two-person rock-paper-scissors games in individuals with schizophrenia. Psychiatry Res 2013; 207:158-63. [PMID: 23017652 DOI: 10.1016/j.psychres.2012.09.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 04/13/2012] [Revised: 09/04/2012] [Accepted: 09/05/2012] [Indexed: 11/22/2022]
Abstract
Randomization among successive choices is important in adaptive decision-making, particularly for strategic interactions in which the optimal strategy is a mixed strategy. Patients with schizophrenia have been reported to have deficits in random sequential behaviors arising from impaired executive function. However, whether schizophrenic patients exhibit distinct behaviors for response randomization in one- and two-person games requiring different behavioral strategies is not known. The aim of this study was to examine the response randomization of 48 schizophrenic patients and 50 healthy subjects in one- and two-person rock-paper-scissors games. Here we found that the schizophrenic patients exhibited non-random biases distinct from those of the healthy subjects (i.e., stereotypic switching in the one-person game and the tendency to choose the best response against the opponent's previous choice in the two-person game). The entropy of the choice sequences was prominently decreased in the schizophrenic patients for both games, thereby indicating an overall disturbance in the behavioral randomization in adaptive decision-making. These results suggest that the impairment of response randomization in schizophrenic patients manifests differently in interactive and non-interactive situations, which may be useful for the diagnosis and quantification of the severity of the disease.
|
31
|
Gasser B, Cartmill EA, Arbib MA. Ontogenetic Ritualization of Primate Gesture as a Case Study in Dyadic Brain Modeling. Neuroinformatics 2013; 12:93-109. [DOI: 10.1007/s12021-013-9182-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 10/26/2022]
|
32
|
Heilbronner SR, Hayden BY. Contextual factors explain risk-seeking preferences in rhesus monkeys. Front Neurosci 2013; 7:7. [PMID: 23378827 PMCID: PMC3561601 DOI: 10.3389/fnins.2013.00007] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 09/12/2012] [Accepted: 01/10/2013] [Indexed: 11/13/2022] Open
Abstract
In contrast to humans and most other animals, rhesus macaques strongly prefer risky rewards to safe ones with similar expected value. Why macaques prefer risk while other animals typically avoid it remains puzzling and challenges the idea that monkeys provide a model for human economic behavior. Here we argue that monkeys’ risk-seeking preferences are neither mysterious nor unique. Risk-seeking in macaques is possibly induced by specific elements of the tasks that have been used to measure their risk preferences. The most important of these elements are (1) very small stakes, (2) serially repeated gambles with short delays between trials, and (3) task parameters that are learned through experience, not described verbally. Together, we hypothesize that these features will readily induce risk-seeking in monkeys, humans, and rats. Thus, elements of task design that are often ignored when comparing studies of risk attitudes can easily overwhelm basal risk preferences. More broadly, these results highlight the fundamental importance of understanding the psychological basis of economic decisions in interpreting preference data and corresponding neural measures.
Affiliation(s)
- Sarah R Heilbronner
- Department of Pharmacology and Physiology, University of Rochester Medical Center Rochester, NY, USA
|
33
|
Abstract
Metacognition concerns the processes by which we monitor and control our own cognitive processes. It can also be applied to others, in which case it is known as mentalizing. Both kinds of metacognition have implicit and explicit forms, where implicit means automatic and without awareness. Implicit metacognition enables us to adopt a we-mode, through which we automatically take account of the knowledge and intentions of others. Adoption of this mode enhances joint action. Explicit metacognition enables us to reflect on and justify our behaviour to others. However, access to the underlying processes is very limited for both self and others and our reports on our own and others' intentions can be very inaccurate. On the other hand, recent experiments have shown that, through discussions of our perceptual experiences with others, we can detect sensory signals more accurately, even in the absence of objective feedback. Through our willingness to discuss with others the reasons for our actions and perceptions, we overcome our lack of direct access to the underlying cognitive processes. This creates the potential for us to build more accurate accounts of the world and of ourselves. I suggest, therefore, that explicit metacognition is a uniquely human ability that has evolved through its enhancement of collaborative decision-making.
|
34
|
Seo H, Lee D. Neural basis of learning and preference during social decision-making. Curr Opin Neurobiol 2012; 22:990-5. [PMID: 22704796 DOI: 10.1016/j.conb.2012.05.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 03/02/2012] [Revised: 05/15/2012] [Accepted: 05/28/2012] [Indexed: 10/28/2022]
Abstract
Social decision-making is arguably the most complex cognitive function performed by the human brain. This is due to two unique features of social decision-making. First, predicting the behaviors of others is extremely difficult. Second, humans often take into consideration the well-being of others during decision-making, but this is influenced by many contextual factors. Despite such complexity, studies on the neural basis of social decision-making have made substantial progress in the last several years. They demonstrated that the core brain areas involved in reinforcement learning and valuation, such as the ventral striatum and orbitofrontal cortex, make important contributions to social decision-making. Furthermore, the contribution of brain systems implicated in theory of mind during decision-making is being elucidated. Future studies are expected to provide additional details about the nature of information channeled through these brain areas.
Affiliation(s)
- Hyojung Seo
- Department of Neurobiology, Yale University School of Medicine, 333 Cedar Street, SHM B404, New Haven, CT 06510, USA
|
35
|
Abstract
Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal's knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain.
Affiliation(s)
- Daeyeol Lee
- Department of Neurobiology, Kavli Institute for Neuroscience, Yale University School of Medicine, New Haven, Connecticut 06510, USA.
|
36
|
Abstract
Behavioral changes driven by reinforcement and punishment are referred to as simple or model-free reinforcement learning. Animals can also change their behaviors by observing events that are neither appetitive nor aversive when these events provide new information about payoffs available from alternative actions. This is an example of model-based reinforcement learning and can be accomplished by incorporating hypothetical reward signals into the value functions for specific actions. Recent neuroimaging and single-neuron recording studies showed that the prefrontal cortex and the striatum are involved not only in reinforcement and punishment, but also in model-based reinforcement learning. We found evidence for both types of learning, and hence hybrid learning, in monkeys during simulated competitive games. In addition, in both the dorsolateral prefrontal cortex and orbitofrontal cortex, individual neurons heterogeneously encoded signals related to actual and hypothetical outcomes from specific actions, suggesting that both areas might contribute to hybrid learning.
Affiliation(s)
- Hiroshi Abe
- Laboratory of Neurobiology, The Rockefeller University, New York, New York, USA
|
37
|
Danckert J, Stöttinger E, Quehl N, Anderson B. Right Hemisphere Brain Damage Impairs Strategy Updating. Cereb Cortex 2011; 22:2745-60. [PMID: 22178711 DOI: 10.1093/cercor/bhr351] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 11/14/2022] Open
Affiliation(s)
- James Danckert
- Department of Psychology, University of Waterloo, Waterloo, N2L 3G1 Ontario, Canada.
|
38
|
Vickery T, Chun M, Lee D. Ubiquity and Specificity of Reinforcement Signals throughout the Human Brain. Neuron 2011; 72:166-77. [DOI: 10.1016/j.neuron.2011.08.011] [Citation(s) in RCA: 155] [Impact Index Per Article: 11.9] [Accepted: 08/15/2011] [Indexed: 11/28/2022]
|
39
|
Perry A, Stein L, Bentin S. Motor and attentional mechanisms involved in social interaction—Evidence from mu and alpha EEG suppression. Neuroimage 2011; 58:895-904. [DOI: 10.1016/j.neuroimage.2011.06.060] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Received: 03/22/2011] [Revised: 06/20/2011] [Accepted: 06/21/2011] [Indexed: 10/18/2022] Open
|
40
|
Abe H, Lee D. Distributed coding of actual and hypothetical outcomes in the orbital and dorsolateral prefrontal cortex. Neuron 2011; 70:731-41. [PMID: 21609828 DOI: 10.1016/j.neuron.2011.03.026] [Citation(s) in RCA: 129] [Impact Index Per Article: 9.9] [Accepted: 03/30/2011] [Indexed: 10/18/2022]
Abstract
Knowledge about hypothetical outcomes from unchosen actions is beneficial only when such outcomes can be correctly attributed to specific actions. Here we show that during a simulated rock-paper-scissors game, rhesus monkeys can adjust their choice behaviors according to both actual and hypothetical outcomes from their chosen and unchosen actions, respectively. In addition, neurons in both dorsolateral prefrontal cortex and orbitofrontal cortex encoded the signals related to actual and hypothetical outcomes immediately after they were revealed to the animal. Moreover, compared to the neurons in the orbitofrontal cortex, those in the dorsolateral prefrontal cortex were more likely to change their activity according to the hypothetical outcomes from specific actions. Conjunctive and parallel coding of multiple actions and their outcomes in the prefrontal cortex might enhance the efficiency of reinforcement learning and also contribute to their context-dependent memory.
Affiliation(s)
- Hiroshi Abe
- Department of Neurobiology, Yale University, New Haven, CT 06510, USA
|
41
|
|
42
|
Cohen MX. Individual differences and the neural representations of reward expectation and reward prediction error. Soc Cogn Affect Neurosci 2010; 2:20-30. [PMID: 17710118 PMCID: PMC1945222 DOI: 10.1093/scan/nsl021] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 05/11/2006] [Accepted: 08/08/2006] [Indexed: 11/14/2022] Open
Abstract
Reward expectation and reward prediction errors are thought to be critical for dynamic adjustments in decision-making and reward-seeking behavior, but little is known about their representation in the brain during uncertainty and risk-taking. Furthermore, little is known about what role individual differences might play in such reinforcement processes. In this study, it is shown that behavioral and neural responses during a decision-making task can be characterized by a computational reinforcement learning model and that individual differences in learning parameters in the model are critical for elucidating these processes. In the fMRI experiment, subjects chose between high- and low-risk rewards. A computational reinforcement learning model computed expected values and prediction errors that each subject might experience on each trial. These outputs predicted subjects' trial-to-trial choice strategies and neural activity in several limbic and prefrontal regions during the task. Individual differences in estimated reinforcement learning parameters proved critical for characterizing these processes, because models that incorporated individual learning parameters explained significantly more variance in the fMRI data than did a model using fixed learning parameters. These findings suggest that the brain engages a reinforcement learning process during risk-taking and that individual differences play a crucial role in modeling this process.
Affiliation(s)
- Michael X Cohen
- Department of Epilepsy, University of Bonn, Sigmund-Freud-Strasse 25, Bonn, Germany.
|
43
|
Rowe JB, Hughes L, Nimmo-Smith I. Action selection: a race model for selected and non-selected actions distinguishes the contribution of premotor and prefrontal areas. Neuroimage 2010; 51:888-96. [PMID: 20188184 PMCID: PMC2877799 DOI: 10.1016/j.neuroimage.2010.02.045] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Received: 10/01/2009] [Revised: 02/06/2010] [Accepted: 02/12/2010] [Indexed: 11/26/2022] Open
Abstract
Race models have been used to explain perceptual, motor and oculomotor decisions. Here we developed a race model to explain how human subjects select actions when there are no overt rewards and no external cues to specify which action to make. Critically, we were able to estimate the cumulative activity of neuronal decision-units for selected and non-selected actions. We used functional magnetic resonance imaging (fMRI) to test for regional brain activity that correlated with the predictions of this race model. Activity in the pre-SMA, cingulate motor and premotor areas correlated with prospective selection between responses according to the race model. Activity in the lateral prefrontal cortex did not correlate with the race model, even though this area was active during action selection. This activity related to the degree to which individuals switched between alternative actions. Crucially, a follow-up experiment showed that it was not present on the first trial. Taken together, these results suggest that the lateral prefrontal cortex is not the source for the generation of action. It is more likely that it is involved in switching to alternatives or monitoring previous actions. Thus, our experiment shows the power of the race model in distinguishing the contribution of different areas in the selection of action.
Affiliation(s)
- J B Rowe
- Cambridge University Department of Clinical Neurosciences, CB2 2QQ, UK.
|
44
|
Sanabria F, Thrailkill E. Pigeons (Columba livia) approach Nash equilibrium in experimental Matching Pennies competitions. J Exp Anal Behav 2009; 91:169-83. [PMID: 19794832 DOI: 10.1901/jeab.2009.91-169] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 10/06/2008] [Accepted: 11/17/2008] [Indexed: 11/22/2022]
Abstract
The game of Matching Pennies (MP), a simplified version of the more popular Rock, Paper, Scissors, schematically represents competitions between organisms with incentives to predict each other's behavior. Optimal performance in iterated MP competitions involves the production of random choice patterns and the detection of nonrandomness in the opponent's choices. The purpose of this study was to replicate systematic deviations from optimal choice observed in humans when playing MP, and to establish whether suboptimal performance was better described by a modified linear learning model or by a more cognitively sophisticated reinforcement-tracking model. Two pairs of pigeons played iterated MP competitions; payoffs for successful choices (e.g., "Rock" vs. "Scissors") varied within experimental sessions and across experimental conditions, and were signaled by visual stimuli. Pigeons' behavior adjusted to payoff matrices; divergences from optimal play were analogous to those usually demonstrated by humans, except for the tendency of pigeons to persist on prior choices. Suboptimal play was well characterized by a linear learning model of the kind widely used to describe human performance. This linear learning model may thus serve as a default account of competitive performance against which the imputation of cognitively sophisticated processes can be evaluated.
Affiliation(s)
- Federico Sanabria
- Department of Psychology, Arizona State University, Tempe, Arizona 85287-1104, USA.
|
45
|
Braun DA, Ortega PA, Wolpert DM. Nash equilibria in multi-agent motor interactions. PLoS Comput Biol 2009; 5:e1000468. [PMID: 19680426 PMCID: PMC2714462 DOI: 10.1371/journal.pcbi.1000468] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Received: 03/27/2009] [Accepted: 07/14/2009] [Indexed: 11/18/2022] Open
Abstract
Social interactions in classic cognitive games like the ultimatum game or the prisoner's dilemma typically lead to Nash equilibria when multiple competitive decision makers with perfect knowledge select optimal strategies. However, in evolutionary game theory it has been shown that Nash equilibria can also arise as attractors in dynamical systems that can describe, for example, the population dynamics of microorganisms. Similar to such evolutionary dynamics, we find that Nash equilibria arise naturally in motor interactions in which players vie for control and try to minimize effort. When confronted with sensorimotor interaction tasks that correspond to the classical prisoner's dilemma and the rope-pulling game, two-player motor interactions led predominantly to Nash solutions. In contrast, when a single player took both roles, playing the sensorimotor game bimanually, cooperative solutions were found. Our methodology opens up a new avenue for the study of human motor interactions within a game theoretic framework, suggesting that the coupling of motor systems can lead to game theoretic solutions.
Human motor interactions range from adversarial activities like judo and arm wrestling to more cooperative activities like tandem riding and tango dancing. In this study, we design a new methodology to study human sensorimotor interactions quantitatively based on game theory. We develop two motor tasks based on the prisoner's dilemma and the rope-pulling game in which we introduce an intrinsic cost related to effort rather than the typical monetary outcome used in cognitive game theory. We find that continuous motor interactions converged to game theoretic outcomes similar to the interaction dynamics reported for other dynamical systems in biology ranging in scale from microorganisms to population dynamics.
Affiliation(s)
- Daniel A Braun
- Computational and Biological Learning Laboratory, Department of Engineering, University of Cambridge, Cambridge, UK.
|
46
|
Abstract
The neural mechanisms supporting the ability to recognize and respond to fictive outcomes, outcomes of actions that one has not taken, remain obscure. We hypothesized that neurons in the anterior cingulate cortex (ACC), which monitors the consequences of actions and mediates subsequent changes in behavior, would respond to fictive reward information. We recorded responses of single neurons during performance of a choice task that provided information about the reward values of options that were not chosen. We found that ACC neurons signal fictive reward information and use a coding scheme similar to that used to signal experienced outcomes. Thus, individual ACC neurons process both experienced and fictive rewards.
Affiliation(s)
- Benjamin Y Hayden
- Department of Neurobiology, Duke University School of Medicine, Center for Neuroeconomic Studies, Center for Cognitive Neuroscience, Duke University, Durham, NC 27701, USA.
|
47
|
Abstract
Human behaviors can be more powerfully influenced by conditioned reinforcers, such as money, than by primary reinforcers. Moreover, people often change their behaviors to avoid monetary losses. However, the effect of removing conditioned reinforcers on choices has not been explored in animals, and the neural mechanisms mediating the behavioral effects of gains and losses are not well understood. To investigate the behavioral and neural effects of gaining and losing a conditioned reinforcer, we trained rhesus monkeys for a matching pennies task in which the positive and negative values of its payoff matrix were realized by the delivery and removal of a conditioned reinforcer. Consistent with the findings previously obtained with non-negative payoffs and primary rewards, the animal's choice behavior during this task was nearly optimal. Nevertheless, the gain and loss of a conditioned reinforcer significantly increased and decreased, respectively, the tendency for the animal to choose the same target in subsequent trials. We also found that the neurons in the dorsomedial frontal cortex, dorsal anterior cingulate cortex, and dorsolateral prefrontal cortex often changed their activity according to whether the animal earned or lost a conditioned reinforcer in the current or previous trial. Moreover, many neurons in the dorsomedial frontal cortex also signaled the gain or loss occurring as a result of choosing a particular action as well as changes in the animal's behaviors resulting from such gains or losses. Thus, primate medial frontal cortex might mediate the behavioral effects of conditioned reinforcers and their losses.
|
48
|
Kim S, Hwang J, Seo H, Lee D. Valuation of uncertain and delayed rewards in primate prefrontal cortex. Neural Netw 2009; 22:294-304. [PMID: 19375276 DOI: 10.1016/j.neunet.2009.03.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 12/30/2008] [Revised: 03/09/2009] [Accepted: 03/21/2009] [Indexed: 11/26/2022]
Abstract
Humans and animals often must choose between rewards that differ in their qualities, magnitudes, immediacy, and likelihood, and must estimate these multiple reward parameters from their experience. However, the neural basis for such complex decision making is not well understood. To understand the role of the primate prefrontal cortex in determining the subjective value of delayed or uncertain reward, we examined the activity of individual prefrontal neurons during an inter-temporal choice task and a computer-simulated competitive game. Consistent with the findings from previous studies in humans and other animals, the monkey's behaviors during inter-temporal choice were well accounted for by a hyperbolic discount function. In addition, the activity of many neurons in the lateral prefrontal cortex reflected the signals related to the magnitude and delay of the reward expected from a particular action, and often encoded the difference in temporally discounted values that predicted the animal's choice. During a computerized matching pennies game, the animals approximated the optimal strategy, known as Nash equilibrium, using a reinforcement learning algorithm. We also found that many neurons in the lateral prefrontal cortex conveyed the signals related to the animal's previous choices and their outcomes, suggesting that this cortical area might play an important role in forming associations between actions and their outcomes. These results show that the primate lateral prefrontal cortex plays a central role in estimating the values of alternative actions based on multiple sources of information.
Affiliation(s)
- Soyoun Kim
- Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06510, USA
|
49
|
Seo H, Lee D. Cortical mechanisms for reinforcement learning in competitive games. Philos Trans R Soc Lond B Biol Sci 2008; 363:3845-57. [PMID: 18829430 DOI: 10.1098/rstb.2008.0158] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Indexed: 11/12/2022] Open
Abstract
Game theory analyses optimal strategies for multiple decision makers interacting in a social group. However, the behaviours of individual humans and animals often deviate systematically from the optimal strategies described by game theory. The behaviours of rhesus monkeys (Macaca mulatta) in simple zero-sum games showed similar patterns, but their departures from the optimal strategies were well accounted for by a simple reinforcement-learning algorithm. During a computer-simulated zero-sum game, neurons in the dorsolateral prefrontal cortex often encoded the previous choices of the animal and its opponent as well as the animal's reward history. By contrast, the neurons in the anterior cingulate cortex predominantly encoded the animal's reward history. Using simple competitive games, therefore, we have demonstrated functional specialization between different areas of the primate frontal cortex involved in outcome monitoring and action selection. Temporally extended signals related to the animal's previous choices might facilitate the association between choices and their delayed outcomes, whereas information about the choices of the opponent might be used to estimate the reward expected from a particular action. Finally, signals related to the reward history might be used to monitor the overall success of the animal's current decision-making strategy.
Affiliation(s)
- Hyojung Seo
- Department of Neurobiology, Yale University School of Medicine, 333 Cedar Street, SHM B404, New Haven, CT 06510, USA
|
50
|
Cohen MX, Frank MJ. Neurocomputational models of basal ganglia function in learning, memory and choice. Behav Brain Res 2008; 199:141-56. [PMID: 18950662 DOI: 10.1016/j.bbr.2008.09.029] [Citation(s) in RCA: 138] [Impact Index Per Article: 8.6] [Received: 06/02/2008] [Revised: 09/24/2008] [Accepted: 09/24/2008] [Indexed: 11/24/2022]
Abstract
The basal ganglia (BG) are critical for the coordination of several motor, cognitive, and emotional functions and become dysfunctional in several pathological states ranging from Parkinson's disease to schizophrenia. Here we review principles developed within a neurocomputational framework of BG and related circuitry which provide insights into their functional roles in behavior. We focus on two classes of models: those that incorporate aspects of biological realism and are constrained by functional principles, and more abstract mathematical models focusing on the higher level computational goals of the BG. While the former are arguably more "realistic", the latter have a complementary advantage in being able to describe functional principles of how the system works in a relatively simple set of equations, but are less suited to making specific hypotheses about the roles of specific nuclei and neurophysiological processes. We review the basic architecture and assumptions of these models and their relevance to our understanding of the neurobiological and cognitive functions of the BG, and provide an update on the potential roles of biological details not explicitly incorporated in existing models. Empirical studies ranging from those in transgenic mice to dopaminergic manipulation, deep brain stimulation, and genetics in humans largely support model predictions and provide the basis for further refinement. Finally, we discuss possible future directions and possible ways to integrate different types of models.
Affiliation(s)
- Michael X Cohen
- Department of Psychology, Program in Neuroscience, University of Arizona, 1503 E University Blvd, Tucson, AZ 85721, United States
|