1. Selbing I, Skewes J. The expression of decision and learning variables in movement patterns related to decision actions. Exp Brain Res 2024; 242:1311-1325. [PMID: 38551690 PMCID: PMC11108959 DOI: 10.1007/s00221-024-06805-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 02/09/2024] [Indexed: 05/23/2024]
Abstract
Decisions are not necessarily easy to separate into a planning and an execution phase and the decision-making process can often be reflected in the movement associated with the decision. Here, we used formalized definitions of concepts relevant in decision-making and learning to explore if and how these concepts correlate with decision-related movement paths, both during and after a choice is made. To this end, we let 120 participants (46 males, mean age = 24.5 years) undergo a repeated probabilistic two-choice task with changing probabilities where we used mouse-tracking, a simple non-invasive technique, to study the movements related to decisions. The decisions of the participants were modelled using Bayesian inference which enabled the computation of variables related to decision-making and learning. Analyses of the movement during the decision showed effects of relevant decision variables, such as confidence, on aspects related to, for instance, timing and pausing, range of movement and deviation from the shortest distance. For the movements after a decision there were some effects of relevant learning variables, mainly related to timing and speed. We believe our findings can be of interest for researchers within several fields, spanning from social learning to experimental methods and human-machine/robot interaction.
Affiliation(s)
- Ida Selbing
  - Division of Psychology, Karolinska Institutet, Nobels väg 9, Solna, Stockholm, Sweden
  - Interacting Minds Centre, Aarhus University, Aarhus, Denmark
- Joshua Skewes
  - Department for Linguistics, Cognitive Science, and Semiotics, Aarhus University, Aarhus, Denmark
  - Interacting Minds Centre, Aarhus University, Aarhus, Denmark
2. Philippe R, Janet R, Khalvati K, Rao RPN, Lee D, Dreher JC. Neurocomputational mechanisms involved in adaptation to fluctuating intentions of others. Nat Commun 2024; 15:3189. [PMID: 38609372 PMCID: PMC11014977 DOI: 10.1038/s41467-024-47491-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Accepted: 03/12/2024] [Indexed: 04/14/2024] Open
Abstract
Humans frequently interact with agents whose intentions can fluctuate between competition and cooperation over time. It is unclear how the brain adapts to fluctuating intentions of others when the nature of the interactions (to cooperate or compete) is not explicitly and truthfully signaled. Here, we use model-based fMRI and a task in which participants thought they were playing with another player. In fact, they played with an algorithm that alternated without signaling between cooperative and competitive strategies. We show that a neurocomputational mechanism with arbitration between competitive and cooperative experts outperforms other learning models in predicting choice behavior. At the brain level, the fMRI results show that the ventral striatum and ventromedial prefrontal cortex track the difference of reliability between these experts. When attributing competitive intentions, we find increased coupling between these regions and a network that distinguishes prediction errors related to competition and cooperation. These findings provide a neurocomputational account of how the brain arbitrates dynamically between cooperative and competitive intentions when making adaptive social decisions.
Affiliation(s)
- Rémi Philippe
  - CNRS-Institut des Sciences Cognitives Marc Jeannerod, UMR5229, Neuroeconomics, reward, and decision making laboratory, Lyon, France
  - Université Claude Bernard Lyon 1, Lyon, France
- Rémi Janet
  - CNRS-Institut des Sciences Cognitives Marc Jeannerod, UMR5229, Neuroeconomics, reward, and decision making laboratory, Lyon, France
  - Université Claude Bernard Lyon 1, Lyon, France
- Koosha Khalvati
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Rajesh P N Rao
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
  - Center for Neurotechnology, University of Washington, Seattle, WA, USA
- Daeyeol Lee
  - Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD, USA
  - Kavli Discovery Neuroscience Institute, Johns Hopkins University, Baltimore, MD, USA
  - Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
- Jean-Claude Dreher
  - CNRS-Institut des Sciences Cognitives Marc Jeannerod, UMR5229, Neuroeconomics, reward, and decision making laboratory, Lyon, France
  - Université Claude Bernard Lyon 1, Lyon, France
3. Ota K, Charles L, Haggard P. Autonomous behaviour and the limits of human volition. Cognition 2024; 244:105684. [PMID: 38101173 DOI: 10.1016/j.cognition.2023.105684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 11/22/2023] [Accepted: 11/28/2023] [Indexed: 12/17/2023]
Abstract
Humans and some other animals can autonomously generate action choices that contribute to solving complex problems. However, experimental investigations of the cognitive bases of human autonomy are challenging, because experimental paradigms typically constrain behaviour using controlled contexts, and elicit behaviour by external triggers. In contrast, autonomy and freedom imply unconstrained behaviour initiated by endogenous triggers. Here we propose a new theoretical construct of adaptive autonomy, meaning the capacity to make behavioural choices that are free from constraints of both immediate external triggers and of routine response patterns, but nevertheless show appropriate coordination with the environment. Participants (N = 152) played a competitive game in which they had to choose the right time to act, in the face of an opponent who punished (in separate blocks) either choice biases (such as always responding early), sequential patterns of action timing across trials (such as early, late, early, late…), or predictable action-outcome dependence (such as win-stay, lose-shift). Adaptive autonomy was quantified as the ability to maintain performance when each of these influences on action selection was punished. We found that participants could become free from habitual choices regarding when to act and could also become free from sequential action patterns. However, they were not able to free themselves from influences of action-outcome dependence, even when these resulted in poor performance. These results point to a new concept of autonomous behaviour as flexible adaptation of voluntary action choices in a way that avoids stereotypy. In a sequential analysis, we also demonstrated that participants increased their reliance on belief learning in which they attempt to understand the competitor's beliefs and intentions, when transition bias and reinforcement bias were punished. 
Taken together, our study points to a cognitive mechanism of adaptive autonomy in which competitive interactions with other agents could promote both social cognition and volition in the form of non-stereotyped action choices.
Affiliation(s)
- Keiji Ota
  - Institute of Cognitive Neuroscience, University College London, London, United Kingdom
  - Department of Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Lucie Charles
  - Institute of Cognitive Neuroscience, University College London, London, United Kingdom
  - Department of Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Patrick Haggard
  - Institute of Cognitive Neuroscience, University College London, London, United Kingdom
4. Wang H, Ortega HK, Kelly EB, Indajang J, Feng J, Li Y, Kwan AC. Frontal noradrenergic and cholinergic transients exhibit distinct spatiotemporal dynamics during competitive decision-making. bioRxiv 2024:2024.01.23.576893. [PMID: 38328186 PMCID: PMC10849696 DOI: 10.1101/2024.01.23.576893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Norepinephrine (NE) and acetylcholine (ACh) are neuromodulators that are crucial for learning and decision-making. In the cortex, NE and ACh are released at specific sites along neuromodulatory axons, which would constrain their spatiotemporal dynamics at the subcellular scale. However, how the fluctuating patterns of NE and ACh signaling may be linked to behavioral events is unknown. Here, leveraging genetically encoded NE and ACh indicators, we use two-photon microscopy to visualize neuromodulatory signals in the superficial layer of the mouse medial frontal cortex during decision-making. Head-fixed mice engage in a competitive game called matching pennies against a computer opponent. We show that both NE and ACh transients carry information about decision-related variables including choice, outcome, and reinforcer. However, the two neuromodulators differ in their spatiotemporal pattern of task-related activation. Spatially, NE signals are more segregated with choice and outcome encoded at distinct locations, whereas ACh signals can multiplex and reflect different behavioral correlates at the same site. Temporally, task-driven NE transients were more synchronized and peaked earlier than ACh transients. To test functional relevance, using optogenetics we found that evoked elevation of NE, but not ACh, in the medial frontal cortex increases the propensity of the animals to switch and explore alternate options. Taken together, the results reveal distinct spatiotemporal patterns of rapid ACh and NE transients at the subcellular scale during decision-making in mice, which may endow these neuromodulators with different ways to impact neural plasticity to mediate learning and adaptive behavior.
Affiliation(s)
- Hongli Wang
  - Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, Connecticut, 06511, USA
- Heather K. Ortega
  - Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, Connecticut, 06511, USA
- Emma B. Kelly
  - Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut, 06511, USA
  - Meinig School of Biomedical Engineering, Cornell University, Ithaca, New York, 14853, USA
- Jonathan Indajang
  - Meinig School of Biomedical Engineering, Cornell University, Ithaca, New York, 14853, USA
- Jiesi Feng
  - State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing, China
- Yulong Li
  - State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing, China
  - PKU-IDG/McGovern Institute for Brain Research, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - Chinese Institute for Brain Research, Beijing, China
- Alex C. Kwan
  - Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut, 06511, USA
  - Meinig School of Biomedical Engineering, Cornell University, Ithaca, New York, 14853, USA
  - Department of Psychiatry, Weill Cornell Medicine, New York, New York, 10065, USA
5. Leib R, Howard IS, Millard M, Franklin DW. Behavioral Motor Performance. Compr Physiol 2023; 14:5179-5224. [PMID: 38158372 DOI: 10.1002/cphy.c220032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2024]
Abstract
The human sensorimotor control system has exceptional abilities to perform skillful actions. We easily switch between strenuous tasks that involve brute force, such as lifting a heavy sewing machine, and delicate movements such as threading a needle in the same machine. Using a structure with different control architectures, the motor system is capable of updating its ability to perform through our daily interaction with the fluctuating environment. However, there are issues that make this a difficult computational problem for the brain to solve. The brain needs to control a nonlinear, nonstationary neuromuscular system, with redundant and occasionally undesired degrees of freedom, in an uncertain environment using a body in which information transmission is subject to delays and noise. To gain insight into the mechanisms of motor control, here we survey movement laws and invariances that shape our everyday motion. We then examine the major solutions to each of these problems in the three parts of the sensorimotor control system, sensing, planning, and acting. We focus on how the sensory system, the control architectures, and the structure and operation of the muscles serve as complementary mechanisms to overcome deviations and disturbances to motor behavior and give rise to skillful motor performance. We conclude with possible future research directions based on suggested links between the operation of the sensorimotor system across the movement stages. © 2024 American Physiological Society. Compr Physiol 14:5179-5224, 2024.
Affiliation(s)
- Raz Leib
  - Neuromuscular Diagnostics, TUM School of Medicine and Health, Department of Health and Sport Sciences, Technical University of Munich, Munich, Germany
- Ian S Howard
  - School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth, UK
- Matthew Millard
  - Institute of Sport and Movement Science, University of Stuttgart, Stuttgart, Germany
  - Institute of Engineering and Computational Mechanics, University of Stuttgart, Stuttgart, Germany
- David W Franklin
  - Neuromuscular Diagnostics, TUM School of Medicine and Health, Department of Health and Sport Sciences, Technical University of Munich, Munich, Germany
  - Munich Institute of Robotics and Machine Intelligence (MIRMI), Technical University of Munich, Munich, Germany
  - Munich Data Science Institute (MDSI), Technical University of Munich, Munich, Germany
6. Parr AC, Riek HC, Coe BC, Pari G, Masellis M, Marras C, Munoz DP. Genetic variation in the dopamine system is associated with mixed-strategy decision-making in patients with Parkinson's disease. Eur J Neurosci 2023; 58:4523-4544. [PMID: 36453013 DOI: 10.1111/ejn.15875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 11/16/2022] [Accepted: 11/25/2022] [Indexed: 12/02/2022]
Abstract
Decision-making during mixed-strategy games requires flexibly adapting choice strategies in response to others' actions and dynamically tracking outcomes. Such decisions involve diverse cognitive processes, including reinforcement learning, which are affected by disruptions to the striatal dopamine system. We therefore investigated how genetic variation in dopamine function affected mixed-strategy decision-making in Parkinson's disease (PD), which involves striatal dopamine pathology. Sixty-six PD patients (ages 49-85, Hoehn and Yahr Stages 1-3) and 22 healthy controls (ages 54-75) competed in a mixed-strategy game where successful performance depended on minimizing choice biases (i.e., flexibly adapting choices trial by trial). Participants also completed a fixed-strategy task that was matched for sensory input, motor outputs and overall reward rate. Factor analyses were used to disentangle cognitive from motor aspects within both tasks. Using a within-subject, multi-centre design, patients were examined on and off dopaminergic therapy, and genetic variation was examined via a multilocus genetic profile score representing the additive effects of three single nucleotide polymorphisms (SNPs) that influence dopamine transmission: rs4680 (COMT Val158Met), rs6277 (C957T) and rs907094 (encoding DARPP-32). PD and control participants displayed comparable mixed-strategy choice behaviour (overall); however, PD patients with genetic profile scores indicating higher dopamine transmission showed improved performance relative to those with low scores. Exploratory follow-up tests across individual SNPs revealed better performance in individuals with the C957T polymorphism, reflecting higher striatal D2/D3 receptor density. Importantly, genetic variation modulated cognitive aspects of performance, above and beyond motor function, suggesting that genetic variation in dopamine signalling may underlie individual differences in cognitive function in PD.
Affiliation(s)
- Ashley C Parr
  - Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Heidi C Riek
  - Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Brian C Coe
  - Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Giovanna Pari
  - Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
  - Movement Disorder Clinic, Kingston General Hospital, Kingston, Ontario, Canada
- Mario Masellis
  - Cognitive Neurology Research Unit, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Connie Marras
  - Movement Disorders Clinic, Krembil Neuroscience Centre, University Health Network, Toronto, Ontario, Canada
- Douglas P Munoz
  - Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
7. Wang H, Kwan AC. Competitive and cooperative games for probing the neural basis of social decision-making in animals. Neurosci Biobehav Rev 2023; 149:105158. [PMID: 37019249 PMCID: PMC10175234 DOI: 10.1016/j.neubiorev.2023.105158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 03/29/2023] [Accepted: 04/02/2023] [Indexed: 04/07/2023]
Abstract
In a social environment, it is essential for animals to consider the behavior of others when making decisions. To quantitatively assess such social decisions, games offer unique advantages. Games may have competitive and cooperative components, modeling situations with antagonistic and shared objectives between players. Games can be analyzed by mathematical frameworks, including game theory and reinforcement learning, such that an animal's choice behavior can be compared against the optimal strategy. However, so far games have been underappreciated in neuroscience research, particularly for rodent studies. In this review, we survey the varieties of competitive and cooperative games that have been tested, contrasting strategies employed by non-human primates and birds with rodents. We provide examples of how games can be used to uncover neural mechanisms and explore species-specific behavioral differences. We assess critically the limitations of current paradigms and propose improvements. Together, the synthesis of current literature highlights the advantages of using games to probe the neural basis of social decisions for neuroscience studies.
Affiliation(s)
- Hongli Wang
  - Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, CT, USA
- Alex C Kwan
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
  - Department of Neuroscience, Yale University School of Medicine, New Haven, CT, USA
  - Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
  - Department of Psychiatry, Weill Cornell Medicine, New York, NY 10065, USA
8. Cristín J, Méndez V, Campos D. Informational Entropy Threshold as a Physical Mechanism for Explaining Tree-like Decision Making in Humans. Entropy (Basel) 2022; 24:1819. [PMID: 36554223 PMCID: PMC9778513 DOI: 10.3390/e24121819] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/09/2022] [Revised: 12/09/2022] [Accepted: 12/11/2022] [Indexed: 06/17/2023]
Abstract
While approaches based on physical grounds (such as the drift-diffusion model, DDM) have been exhaustively used in psychology and neuroscience to describe perceptual decision making in humans, similar approaches to complex situations, such as sequential (tree-like) decisions, are still scarce. For such scenarios that involve a reflective prospection of future options, we offer a plausible mechanism based on the idea that subjects can carry out an internal computation of the uncertainty about the different options available, which is computed through the corresponding Shannon entropy. When the amount of information gathered through sensory evidence is enough to reach a given threshold in the entropy, this will trigger the decision. Experimental evidence in favor of this entropy-based mechanism was provided by exploring human performance during navigation through a maze on a computer screen monitored with the help of eye trackers. In particular, our analysis allows us to prove that (i) prospection is effectively used by humans during such navigation tasks, and an indirect quantification of the level of prospection used is attainable; in addition, (ii) the distribution of decision times during the task exhibits power-law tails, a feature that our entropy-based mechanism is able to explain, unlike traditional (DDM-like) frameworks.
Affiliation(s)
- Javier Cristín
  - Istituto Sistemi Complessi, Consiglio Nazionale delle Ricerche, UOS Sapienza, 00185 Rome, Italy
  - Dipartimento di Fisica, Università Sapienza, 00185 Rome, Italy
- Vicenç Méndez
  - Grup de Física Estadística, Departament de Física, Facultat de Ciències, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Daniel Campos
  - Grup de Física Estadística, Departament de Física, Facultat de Ciències, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
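The entropy-threshold idea in the Cristín et al. abstract above (a decision is triggered once the Shannon entropy of the belief over options reaches a threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Bayesian update rule, the function names `shannon_entropy` and `decide_by_entropy`, the evidence values, and the 1-bit threshold.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def decide_by_entropy(prior, likelihood_steps, threshold_bits):
    """Update a belief over options until its entropy drops to the threshold.

    Returns (chosen_option_index, steps_used), or (None, steps_used) if the
    evidence runs out before the threshold is reached.
    """
    belief = list(prior)
    for step, likelihood in enumerate(likelihood_steps, start=1):
        # Bayes' rule: the posterior is proportional to prior times likelihood.
        unnorm = [b * l for b, l in zip(belief, likelihood)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        if shannon_entropy(belief) <= threshold_bits:
            return belief.index(max(belief)), step
    return None, len(likelihood_steps)

# Hypothetical evidence: every sample favours option 0 by a factor of 2.
evidence = [[0.5, 0.25, 0.25]] * 10
choice, steps = decide_by_entropy([1 / 3, 1 / 3, 1 / 3], evidence, threshold_bits=1.0)
```

With these made-up numbers the belief entropy falls from about 1.58 bits to below 1 bit on the third sample, so the sketch commits to option 0 after three updates. A stricter (lower) threshold would delay the decision.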
9. Jin Y, Jensen G, Gottlieb J, Ferrera V. Superstitious learning of abstract order from random reinforcement. Proc Natl Acad Sci U S A 2022; 119:e2202789119. [PMID: 35998221 PMCID: PMC9436361 DOI: 10.1073/pnas.2202789119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/16/2022] [Accepted: 07/01/2022] [Indexed: 11/18/2022] Open
Abstract
Humans and other animals often infer spurious associations among unrelated events. However, such superstitious learning is usually accounted for by conditioned associations, raising the question of whether an animal could develop more complex cognitive structures independent of reinforcement. Here, we tasked monkeys with discovering the serial order of two pictorial sets: a "learnable" set in which the stimuli were implicitly ordered and monkeys were rewarded for choosing the higher-rank stimulus and an "unlearnable" set in which stimuli were unordered and feedback was random regardless of the choice. We replicated prior results that monkeys reliably learned the implicit order of the learnable set. Surprisingly, the monkeys behaved as though some ordering also existed in the unlearnable set, showing consistent choice preference that transferred to novel untrained pairs in this set, even under a preference-discouraging reward schedule that gave rewards more frequently to the stimulus that was selected less often. In simulations, a model-free reinforcement learning algorithm (Q-learning) displayed a degree of consistent ordering among the unlearnable set but, unlike the monkeys, failed to do so under the preference-discouraging reward schedule. Our results suggest that monkeys infer abstract structures from objectively random events using heuristics that extend beyond stimulus-outcome conditional learning to more cognitive model-based learning mechanisms.
Affiliation(s)
- Yuhao Jin
  - Department of Biological Sciences, Columbia University, New York, NY 10027
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027
- Greg Jensen
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027
  - Department of Psychology, Reed College, Portland, OR 97202
  - Department of Neuroscience, Columbia University, New York, NY 10027
- Jacqueline Gottlieb
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027
  - Department of Neuroscience, Columbia University, New York, NY 10027
  - Kavli Institute for Brain Science, Columbia University, New York, NY 10027
- Vincent Ferrera
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027
  - Department of Neuroscience, Columbia University, New York, NY 10027
  - Kavli Institute for Brain Science, Columbia University, New York, NY 10027
10. Pupil Correlates of Decision Variables in Mice Playing a Competitive Mixed-Strategy Game. eNeuro 2022; 9:ENEURO.0457-21.2022. [PMID: 35168951 PMCID: PMC8925722 DOI: 10.1523/eneuro.0457-21.2022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/28/2021] [Revised: 12/21/2021] [Accepted: 01/02/2022] [Indexed: 01/29/2023] Open
Abstract
In a competitive game involving an animal and an opponent, the outcome is contingent on the choices of both players. To succeed, the animal must continually adapt to competitive pressure, or else risk being exploited and lose out on rewards. In this study, we demonstrate that head-fixed male mice can be trained to play the iterative competitive game "matching pennies" against a virtual computer opponent. We find that the animals' performance is well described by a hybrid computational model that includes Q-learning and choice kernels. Comparing between matching pennies and a non-competitive two-armed bandit task, we show that the tasks encourage animals to operate at different regimes of reinforcement learning. To understand the involvement of neuromodulatory mechanisms, we measure fluctuations in pupil size and use multiple linear regression to relate the trial-by-trial transient pupil responses to decision-related variables. The analysis reveals that pupil responses are modulated by observable variables, including choice and outcome, as well as latent variables for value updating, but not action selection. Collectively, these results establish a paradigm for studying competitive decision-making in head-fixed mice and provide insights into the role of arousal-linked neuromodulation in the decision process.
11. Parr AC, Calancie OG, Coe BC, Khalid-Khan S, Munoz DP. Impulsivity and Emotional Dysregulation Predict Choice Behavior During a Mixed-Strategy Game in Adolescents With Borderline Personality Disorder. Front Neurosci 2022; 15:667399. [PMID: 35237117 PMCID: PMC8882924 DOI: 10.3389/fnins.2021.667399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2021] [Accepted: 12/28/2021] [Indexed: 11/13/2022] Open
Abstract
Impulsivity and emotional dysregulation are two core features of borderline personality disorder (BPD), and the neural mechanisms recruited during mixed-strategy interactions overlap with frontolimbic networks that have been implicated in BPD. We investigated strategic choice patterns during the classic two-player game, Matching Pennies, where the most efficient strategy is to choose each option randomly from trial-to-trial to avoid exploitation by one’s opponent. Twenty-seven female adolescents with BPD (mean age: 16 years) and twenty-seven age-matched female controls (mean age: 16 years) participated in an experiment that explored the relationship between strategic choice behavior and impulsivity in both groups and emotional dysregulation in BPD. Relative to controls, BPD participants showed marginally fewer reinforcement learning biases, particularly decreased lose-shift biases, increased variability in reaction times (coefficient of variation; CV), and a greater percentage of anticipatory decisions. A subset of BPD participants with high levels of impulsivity showed higher overall reward rates, and greater modulation of reaction times by outcome, particularly following loss trials, relative to control and BPD participants with lower levels of impulsivity. Additionally, BPD participants with higher levels of emotional dysregulation showed marginally increased reward rate and increased entropy in choice patterns. Together, our preliminary results suggest that impulsivity and emotional dysregulation may contribute to variability in mixed-strategy decision-making in female adolescents with BPD.
Affiliation(s)
- Ashley C. Parr
  - Centre for Neuroscience Studies, Queen’s University, Kingston, ON, Canada
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, United States
  - Division of Child and Youth Mental Health, Kingston Health Sciences Centre, Kingston, ON, Canada
  - *Correspondence: Ashley C. Parr
- Olivia G. Calancie
  - Centre for Neuroscience Studies, Queen’s University, Kingston, ON, Canada
  - Division of Child and Youth Mental Health, Kingston Health Sciences Centre, Kingston, ON, Canada
- Brian C. Coe
  - Centre for Neuroscience Studies, Queen’s University, Kingston, ON, Canada
- Sarosh Khalid-Khan
  - Centre for Neuroscience Studies, Queen’s University, Kingston, ON, Canada
  - Division of Child and Youth Mental Health, Kingston Health Sciences Centre, Kingston, ON, Canada
- Douglas P. Munoz
  - Centre for Neuroscience Studies, Queen’s University, Kingston, ON, Canada
  - Department of Biomedical and Molecular Sciences, Queen’s University, Kingston, ON, Canada
12. Sundvall J, Dyson BJ. Breaking the bonds of reinforcement: Effects of trial outcome, rule consistency and rule complexity against exploitable and unexploitable opponents. PLoS One 2022; 17:e0262249. [PMID: 35108279 PMCID: PMC8809577 DOI: 10.1371/journal.pone.0262249] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/11/2020] [Accepted: 12/21/2021] [Indexed: 11/18/2022] Open
Abstract
In two experiments, we used the simple zero-sum game Rock, Paper and Scissors to study the common reinforcement-based rules of repeating choices after winning (win-stay) and shifting from previous choice options after losing (lose-shift). Participants played the game against both computer opponents who could not be exploited and computer opponents who could be exploited by making choices that would at times conflict with reinforcement. Against unexploitable opponents, participants achieved an approximation of random behavior, contrary to previous research commonly finding reinforcement biases. Against exploitable opponents, the participants learned to exploit the opponent regardless of whether optimal choices conflicted with reinforcement or not. The data suggest that learning a rule that allows one to exploit was largely determined by the outcome of the previous trial.
Affiliation(s)
| | - Benjamin James Dyson
- University of Alberta, Alberta, Canada
- University of Sussex, Sussex, United Kingdom
- Ryerson University, Toronto, Canada
| |
|
13
|
Traner MR, Bromberg-Martin ES, Monosov IE. How the value of the environment controls persistence in visual search. PLoS Comput Biol 2021; 17:e1009662. [PMID: 34905548 PMCID: PMC8714092 DOI: 10.1371/journal.pcbi.1009662] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/09/2021] [Revised: 12/28/2021] [Accepted: 11/21/2021] [Indexed: 11/18/2022] Open
Abstract
Classic foraging theory predicts that humans and animals aim to gain maximum reward per unit time. However, in standard instrumental conditioning tasks individuals adopt an apparently suboptimal strategy: they respond slowly when the expected value is low. This reward-related bias is often explained as reduced motivation in response to low rewards. Here we present evidence this behavior is associated with a complementary increased motivation to search the environment for alternatives. We trained monkeys to search for reward-related visual targets in environments with different values. We found that the reward-related bias scaled with environment value, was consistent with persistent searching after the target was already found, and was associated with increased exploratory gaze to objects in the environment. A novel computational model of foraging suggests that this search strategy could be adaptive in naturalistic settings where both environments and the objects within them provide partial information about hidden, uncertain rewards.
Affiliation(s)
- Michael R. Traner
- Department of Biomedical Engineering, Washington University, St. Louis, Missouri, United States of America
| | - Ethan S. Bromberg-Martin
- Department of Neuroscience, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Ilya E. Monosov
- Department of Biomedical Engineering, Washington University, St. Louis, Missouri, United States of America
- Department of Neuroscience, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Neurosurgery, Washington University, St. Louis, Missouri, United States of America
- Pain Center, Washington University, St. Louis, Missouri, United States of America
- Department of Electrical Engineering, Washington University, St. Louis, Missouri, United States of America
| |
|
14
|
Konovalov A, Hill C, Daunizeau J, Ruff CC. Dissecting functional contributions of the social brain to strategic behavior. Neuron 2021; 109:3323-3337.e5. [PMID: 34407389 DOI: 10.1016/j.neuron.2021.07.025] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/11/2021] [Revised: 06/21/2021] [Accepted: 07/27/2021] [Indexed: 10/20/2022]
Abstract
Social interactions routinely lead to neural activity in a "social brain network" comprising, among other regions, the temporoparietal junction (TPJ) and the dorsomedial prefrontal cortex (dmPFC). But what is the function of these areas? Are they specialized for behavior in social contexts or do they implement computations required for dealing with any reactive process, even non-living entities? Here, we use fMRI and a game paradigm separating the need for these two aspects of cognition. We find that most social-brain areas respond to both social and non-social reactivity rather than just to human opponents. However, the TPJ shows a dissociation from the dmPFC: its activity and connectivity primarily reflect context-dependent outcome processing and reactivity detection, while dmPFC engagement is linked to implementation of a behavioral strategy. Our results characterize an overarching computational property of the social brain but also suggest specialized roles for subregions of this network.
Affiliation(s)
- Arkady Konovalov
- Zurich Center for Neuroeconomics (ZNE), Department of Economics, University of Zurich, Zurich 8006, Switzerland.
| | - Christopher Hill
- Zurich Center for Neuroeconomics (ZNE), Department of Economics, University of Zurich, Zurich 8006, Switzerland
| | - Jean Daunizeau
- Université Pierre et Marie Curie, Paris, France; Institut du Cerveau et de la Moelle épinière, Paris, France; INSERM UMR S975, Paris, France
| | - Christian C Ruff
- Zurich Center for Neuroeconomics (ZNE), Department of Economics, University of Zurich, Zurich 8006, Switzerland.
| |
|
15
|
Ohta H, Satori K, Takarada Y, Arake M, Ishizuka T, Morimoto Y, Takahashi T. The asymmetric learning rates of murine exploratory behavior in sparse reward environments. Neural Netw 2021; 143:218-229. [PMID: 34157646 DOI: 10.1016/j.neunet.2021.05.030] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/09/2021] [Revised: 04/16/2021] [Accepted: 05/26/2021] [Indexed: 11/29/2022]
Abstract
Goal-oriented behaviors of animals can be modeled by reinforcement learning algorithms. Such algorithms predict future outcomes of selected actions utilizing action values and updating those values in response to the positive and negative outcomes. In many models of animal behavior, the action values are updated symmetrically based on a common learning rate, that is, in the same way for both positive and negative outcomes. However, animals in environments with scarce rewards may have uneven learning rates. To investigate the asymmetry in learning rates in reward and non-reward, we analyzed the exploration behavior of mice in five-armed bandit tasks using a Q-learning model with differential learning rates for positive and negative outcomes. The positive learning rate was significantly higher in a scarce reward environment than in a rich reward environment, and conversely, the negative learning rate was significantly lower in the scarce environment. The positive to negative learning rate ratio was about 10 in the scarce environment and about 2 in the rich environment. This result suggests that when the reward probability was low, the mice tended to ignore failures and exploit the rare rewards. Computational modeling analysis revealed that the increased learning-rate ratio could cause an overestimation of and perseveration on rare-rewarding events, increasing total reward acquisition in the scarce environment but disadvantaging impartial exploration.
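To make the idea of differential learning rates concrete, here is a minimal Python sketch of a Q-learning update with separate rates for positive and negative outcomes on a five-armed bandit. It is an illustration under stated assumptions, not the model fitted in the paper: the function names, the softmax choice rule, the use of the prediction-error sign to select the rate, and the example values `alpha_pos=0.5` and `alpha_neg=0.05` (chosen only to give a ratio of 10) are all hypothetical.

```python
import math
import random
from typing import List


def update_q(q: List[float], action: int, reward: float,
             alpha_pos: float, alpha_neg: float) -> None:
    """Update one action value in place with an outcome-dependent learning rate.

    Assumption: the rate is picked by the sign of the prediction error, which
    for 0/1 rewards matches rewarded versus unrewarded trials.
    """
    delta = reward - q[action]                     # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg  # asymmetric learning rate
    q[action] += alpha * delta


def softmax_choice(q: List[float], beta: float, rng: random.Random) -> int:
    """Sample an action with probability proportional to exp(beta * Q)."""
    q_max = max(q)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (value - q_max)) for value in q]
    return rng.choices(range(len(q)), weights=weights, k=1)[0]


def simulate(reward_probs: List[float], n_trials: int, alpha_pos: float,
             alpha_neg: float, beta: float, seed: int = 0) -> List[float]:
    """Run a simple bandit simulation and return the final action values."""
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)
    for _ in range(n_trials):
        action = softmax_choice(q, beta, rng)
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        update_q(q, action, reward, alpha_pos, alpha_neg)
    return q


if __name__ == "__main__":
    # Scarce-reward example: every arm pays off rarely (illustrative values).
    print(simulate([0.1, 0.05, 0.05, 0.05, 0.05], n_trials=500,
                   alpha_pos=0.5, alpha_neg=0.05, beta=3.0))
```

With `alpha_pos` much larger than `alpha_neg`, a single reward raises an action value far more than a single omission lowers it, which is one way to see how a high ratio could produce the overestimation of rarely rewarded options that the abstract describes.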
Affiliation(s)
- Hiroyuki Ohta
- Department of Pharmacology, National Defense Medical College, Saitama, 359-8513, Japan.
| | | | - Yu Takarada
- Tokyo Denki University, Saitama, 350-0394, Japan
| | - Masashi Arake
- Department of Physiology, National Defense Medical College, Saitama, 359-8513, Japan
| | - Toshiaki Ishizuka
- Department of Pharmacology, National Defense Medical College, Saitama, 359-8513, Japan
| | - Yuji Morimoto
- Department of Physiology, National Defense Medical College, Saitama, 359-8513, Japan
| | | |
|
16
|
Belkaid M, Bousseyrol E, Durand-de Cuttoli R, Dongelmans M, Duranté EK, Ahmed Yahia T, Didienne S, Hanesse B, Come M, Mourot A, Naudé J, Sigaud O, Faure P. Mice adaptively generate choice variability in a deterministic task. Commun Biol 2020; 3:34. [PMID: 31965053 PMCID: PMC6972896 DOI: 10.1038/s42003-020-0759-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/01/2019] [Accepted: 12/20/2019] [Indexed: 01/30/2023] Open
Abstract
Can decisions be made solely by chance? Can variability be intrinsic to the decision-maker or is it inherited from environmental conditions? To investigate these questions, we designed a deterministic setting in which mice are rewarded for non-repetitive choice sequences, and modeled the experiment using reinforcement learning. We found that mice progressively increased their choice variability. Although an optimal strategy based on sequence learning was theoretically possible and would be more rewarding, animals used a pseudo-random selection which ensured a high success rate. This was not the case when animals were exposed to a uniform probabilistic reward delivery. We also show that mice were blind to changes in the temporal structure of reward delivery once they learned to choose at random. Overall, our results demonstrate that a decision-making process can self-generate variability and randomness, even when the rules governing reward delivery are neither stochastic nor volatile.
Affiliation(s)
- Marwen Belkaid
- Sorbonne Université, CNRS, Institut des Systèmes Intelligents et de Robotique (ISIR), 75005, Paris, France
| | - Elise Bousseyrol
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine - Institut de Biologie Paris Seine (NPS - IBPS), 75005, Paris, France
| | - Romain Durand-de Cuttoli
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine - Institut de Biologie Paris Seine (NPS - IBPS), 75005, Paris, France
| | - Malou Dongelmans
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine - Institut de Biologie Paris Seine (NPS - IBPS), 75005, Paris, France
| | - Etienne K Duranté
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine - Institut de Biologie Paris Seine (NPS - IBPS), 75005, Paris, France
| | - Tarek Ahmed Yahia
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine - Institut de Biologie Paris Seine (NPS - IBPS), 75005, Paris, France
| | - Steve Didienne
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine - Institut de Biologie Paris Seine (NPS - IBPS), 75005, Paris, France
| | - Bernadette Hanesse
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine - Institut de Biologie Paris Seine (NPS - IBPS), 75005, Paris, France
| | - Maxime Come
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine - Institut de Biologie Paris Seine (NPS - IBPS), 75005, Paris, France
| | - Alexandre Mourot
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine - Institut de Biologie Paris Seine (NPS - IBPS), 75005, Paris, France
| | - Jérémie Naudé
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine - Institut de Biologie Paris Seine (NPS - IBPS), 75005, Paris, France
| | - Olivier Sigaud
- Sorbonne Université, CNRS, Institut des Systèmes Intelligents et de Robotique (ISIR), 75005, Paris, France
| | - Philippe Faure
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine - Institut de Biologie Paris Seine (NPS - IBPS), 75005, Paris, France.
| |
|
17
|
Cook JL, Swart JC, Froböse MI, Diaconescu AO, Geurts DEM, den Ouden HEM, Cools R. Catecholaminergic modulation of meta-learning. eLife 2019; 8:e51439. [PMID: 31850844 PMCID: PMC6974360 DOI: 10.7554/elife.51439] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/28/2019] [Accepted: 12/18/2019] [Indexed: 01/03/2023] Open
Abstract
The remarkable expedience of human learning is thought to be underpinned by meta-learning, whereby slow accumulative learning processes are rapidly adjusted to the current learning environment. To date, the neurobiological implementation of meta-learning remains unclear. A burgeoning literature argues for an important role for the catecholamines dopamine and noradrenaline in meta-learning. Here, we tested the hypothesis that enhancing catecholamine function modulates the ability to optimise a meta-learning parameter (learning rate) as a function of environmental volatility. 102 participants completed a task which required learning in stable phases, where the probability of reinforcement was constant, and volatile phases, where probabilities changed every 10-30 trials. The catecholamine transporter blocker methylphenidate enhanced participants' ability to adapt learning rate: under methylphenidate, compared with placebo, participants exhibited higher learning rates in volatile relative to stable phases. Furthermore, this effect was significant only with respect to direct learning based on the participants' own experience; there was no significant effect on inferred-value learning, where stimulus values had to be inferred. These data demonstrate a causal link between catecholaminergic modulation and the adjustment of the meta-learning parameter learning rate.
Affiliation(s)
- Jennifer L Cook
- School of Psychology, University of Birmingham, Birmingham, United Kingdom
| | - Jennifer C Swart
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands
| | - Monja I Froböse
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands
| | - Andreea O Diaconescu
- Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich and ETH Zurich, Zurich, Switzerland
- Department of Psychiatry, University of Basel, Basel, Switzerland
- Krembil Centre for Neuroinformatics, CAMH, University of Toronto, Toronto, Canada
| | - Dirk EM Geurts
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands
- Department of Psychiatry, Radboud University Medical Centre, Nijmegen, Netherlands
| | - Hanneke EM den Ouden
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands
| | - Roshan Cools
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands
- Department of Psychiatry, Radboud University Medical Centre, Nijmegen, Netherlands
| |
|
18
|
Parr AC, Coe BC, Munoz DP, Dorris MC. A novel fMRI paradigm to dissociate the behavioral and neural components of mixed-strategy decision making from non-strategic decisions in humans. Eur J Neurosci 2019; 51:1914-1927. [PMID: 31596980 DOI: 10.1111/ejn.14586] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/09/2019] [Revised: 08/22/2019] [Accepted: 09/18/2019] [Indexed: 11/30/2022]
Abstract
During competitive interactions, such as predator-prey or team sports, the outcome of one's actions is dependent on both their own choices and those of their opponents. Success in these rivalries requires that individuals choose dynamically and unpredictably, often adopting a mixed strategy. Understanding the neural basis of strategic decision making is complicated by the fact that it recruits various cognitive processes that are often shared with non-strategic forms of decision making, such as value estimation, working memory, response inhibition, response selection, and reward processes. Although researchers have explored neural activity within key brain regions during mixed-strategy games, how brain activity differs in the context of strategic interactions versus non-strategic choices is not well understood. We developed a novel behavioral paradigm to dissociate choice behavior during mixed-strategy interactions from non-strategic choices, and we used task-based functional magnetic resonance imaging (fMRI) to contrast brain activation. In a block design, participants competed in the classic mixed-strategy game, "matching pennies," against a dynamic computer opponent designed to exploit predictability in players' response patterns. Results were contrasted with a non-strategic task that had comparable sensory input, motor output, and reward rate; thus, differences in behavior and brain activation reflect strategic processes. The mixed-strategy game was associated with activation of a distributed cortico-striatal network compared to the non-strategic task. We propose that choosing in mixed-strategy contexts requires additional cognitive demands present to a lesser degree during the control task, illustrating the strength of this design in probing function of cognitive systems beyond core sensory, motor, and reward processes.
Affiliation(s)
- Ashley C Parr
- Centre for Neuroscience Studies, Queen's University, Kingston, ON, Canada; Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Brian C Coe
- Centre for Neuroscience Studies, Queen's University, Kingston, ON, Canada
| | - Douglas P Munoz
- Centre for Neuroscience Studies, Queen's University, Kingston, ON, Canada; Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
| | - Michael C Dorris
- Institute of Neuroscience, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| |
|
19
|
Behavioural Isomorphism, Cognitive Economy and Recursive Thought in Non-Transitive Game Strategy. GAMES 2019. [DOI: 10.3390/g10030032] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022]
Abstract
Game spaces in which an organism must repeatedly compete with an opponent for mutually exclusive outcomes are critical methodologies for understanding decision-making under pressure. In the non-transitive game rock, paper, scissors (RPS), the only technique that guarantees the lack of exploitation is to perform randomly in accordance with mixed-strategy. However, such behavior is thought to be outside bounded rationality and so decision-making can become deterministic, predictable, and ultimately exploitable. This review identifies similarities across economics, neuroscience, nonlinear dynamics, human, and animal cognition literatures, and provides a taxonomy of RPS strategy. RPS strategies are discussed in terms of (a) whether the relevant computations require sensitivity to item frequency, the cyclic relationships between responses, or the outcome of the previous trial, and (b) whether the strategy is framed around the self or other. The negative implication of this taxonomy is that despite the differences in cognitive economy and recursive thought, many of the identified strategies are behaviorally isomorphic. This makes it difficult to infer strategy from behavior. The positive implication is that this isomorphism can be used as a novel design feature in furthering our understanding of the attribution, agency, and acquisition of strategy in RPS and other game spaces.
|
20
|
Thapa R, Donovan CH, Wong SA, Sutherland RJ, Gruber AJ. Lesions of lateral habenula attenuate win-stay but not lose-shift responses in a competitive choice task. Neurosci Lett 2019; 692:159-166. [PMID: 30389419 DOI: 10.1016/j.neulet.2018.10.056] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/28/2018] [Revised: 10/04/2018] [Accepted: 10/29/2018] [Indexed: 11/18/2022]
Abstract
Multiple neural systems contribute to choice adaptation following reinforcement. Recent evidence suggests that the lateral habenula (LHb) plays a key role in such adaptations, particularly when reinforcements are worse than expected. Here, we investigated the effects of bilateral LHb lesions on responding in a binary choice task with no discriminatory cues. LHb lesions in rats decreased win-stay responses but surprisingly left lose-shift responses intact. This same dissociated effect was also observed after systemic administration of d-amphetamine in a separate cohort of animals. These results suggest that at least some behavioural responses triggered by reward omission do not depend on an intact LHb.
Affiliation(s)
- Rajat Thapa
- Department of Neuroscience, Canadian Centre for Behavioural Neuroscience, University of Lethbridge, 4401 University Dr. W., T1K 3M4, Lethbridge, AB, Canada
| | - Clifford H Donovan
- Department of Neuroscience, Canadian Centre for Behavioural Neuroscience, University of Lethbridge, 4401 University Dr. W., T1K 3M4, Lethbridge, AB, Canada
| | - Scott A Wong
- Department of Neuroscience, Canadian Centre for Behavioural Neuroscience, University of Lethbridge, 4401 University Dr. W., T1K 3M4, Lethbridge, AB, Canada
| | - Robert J Sutherland
- Department of Neuroscience, Canadian Centre for Behavioural Neuroscience, University of Lethbridge, 4401 University Dr. W., T1K 3M4, Lethbridge, AB, Canada
| | - Aaron J Gruber
- Department of Neuroscience, Canadian Centre for Behavioural Neuroscience, University of Lethbridge, 4401 University Dr. W., T1K 3M4, Lethbridge, AB, Canada.
| |
|
21
|
Thapa R, Gruber AJ. Lesions of ventrolateral striatum eliminate lose-shift but not win-stay behaviour in rats. Neurobiol Learn Mem 2018; 155:446-451. [PMID: 30179660 DOI: 10.1016/j.nlm.2018.08.022] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/11/2018] [Revised: 08/27/2018] [Accepted: 08/31/2018] [Indexed: 11/19/2022]
Abstract
Animals tend to repeat actions that are associated with reward delivery, whereas they tend to shift responses to alternate choices following reward omission. These so-called win-stay and lose-shift responses are employed by a wide range of animals in a variety of decision-making scenarios, and depend on dissociated regions of the striatum. Specifically, lose-shift responding is impaired by extensive excitotoxic lesions of the lateral striatum. Here we used focal lesions to assess whether dorsal and ventral regions of the lateral striatum contribute differently to this effect. We found that damage to ventrolateral striatum reduced lose-shift responding without impairing win-stay, motoric, or motivational aspects of behaviour in the task, whereas lesions confined to the dorsolateral striatum significantly impaired the ability of rats to complete trials of the task. Moreover, lesions to the dorsomedial striatum had no effect on either lose-shift or win-stay responding. Together, these data suggest a novel role of the ventral portion of the lateral striatum in driving lose-shift decisions.
Affiliation(s)
- Rajat Thapa
- Department of Neuroscience, Canadian Centre for Behavioural Neuroscience, University of Lethbridge, 4401 University Dr. W., T1K 3M4 Lethbridge, AB, Canada
| | - Aaron J Gruber
- Department of Neuroscience, Canadian Centre for Behavioural Neuroscience, University of Lethbridge, 4401 University Dr. W., T1K 3M4 Lethbridge, AB, Canada.
| |
|
22
|
Iigaya K, Fonseca MS, Murakami M, Mainen ZF, Dayan P. An effect of serotonergic stimulation on learning rates for rewards apparent after long intertrial intervals. Nat Commun 2018; 9:2477. [PMID: 29946069 PMCID: PMC6018802 DOI: 10.1038/s41467-018-04840-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 12/04/2017] [Accepted: 05/22/2018] [Indexed: 12/02/2022] Open
Abstract
Serotonin has widespread, but computationally obscure, modulatory effects on learning and cognition. Here, we studied the impact of optogenetic stimulation of dorsal raphe serotonin neurons in mice performing a non-stationary, reward-driven decision-making task. Animals showed two distinct choice strategies. Choices after short inter-trial-intervals (ITIs) depended only on the last trial outcome and followed a win-stay-lose-switch pattern. In contrast, choices after long ITIs reflected outcome history over multiple trials, as described by reinforcement learning models. We found that optogenetic stimulation during a trial significantly boosted the rate of learning that occurred due to the outcome of that trial, but these effects were only exhibited on choices after long ITIs. This suggests that serotonin neurons modulate reinforcement learning rates, and that this influence is masked by alternate, unaffected, decision mechanisms. These results provide insight into the role of serotonin in treating psychiatric disorders, particularly its modulation of neural plasticity and learning. Serotonin (5-HT) plays many important roles in reward, punishment, patience and beyond, and optogenetic stimulation of 5-HT neurons has not crisply parsed them. The authors report a novel analysis of a reward-based decision-making experiment, and show that 5-HT stimulation increases the learning rate, but only on a select subset of choices.
Affiliation(s)
- Kiyohito Iigaya
- Gatsby Computational Neuroscience Unit, University College London, 25 Howland Street, London, W1T 4JG, UK; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Russell Square House, 10-12 Russell Square, London, WC1B 5EH, UK; Division of Humanities and Social Sciences, California Institute of Technology, 1200 E California Blvd, Pasadena, CA, 91125, USA.
| | - Madalena S Fonseca
- Champalimaud Research, Champalimaud Centre for the Unknown, Avenida Brasília, 1400-038, Lisbon, Portugal
| | - Masayoshi Murakami
- Champalimaud Research, Champalimaud Centre for the Unknown, Avenida Brasília, 1400-038, Lisbon, Portugal
| | - Zachary F Mainen
- Champalimaud Research, Champalimaud Centre for the Unknown, Avenida Brasília, 1400-038, Lisbon, Portugal
| | - Peter Dayan
- Gatsby Computational Neuroscience Unit, University College London, 25 Howland Street, London, W1T 4JG, UK; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Russell Square House, 10-12 Russell Square, London, WC1B 5EH, UK
| |
|
23
|
Donovan CH, Wong SA, Randolph SH, Stark RA, Gibb RL, Gruber AJ. Sex differences in rat decision-making: The confounding role of extraneous feeder sampling between trials. Behav Brain Res 2018; 342:62-69. [PMID: 29355674 DOI: 10.1016/j.bbr.2018.01.018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/16/2017] [Revised: 01/08/2018] [Accepted: 01/16/2018] [Indexed: 01/19/2023]
Abstract
Although male and female rats appear to perform differently in some tasks, a clear picture of sex differences in decision-making has yet to develop. This is in part due to significant variability arising from differences in strains and tasks. The aim of this study was to characterize the effects of sex on specific response elements in a reinforcement learning task so as to help identify potential explanations for this variability. We found that the primary difference between sexes was the propensity to approach feeders out of the task context. This extraneous feeder sampling affects choice on subsequent trials in both sexes by promoting a lose-shift response away from the last feeder sampled. Female rats, however, were more likely to engage in this extraneous feeder sampling, and therefore exhibited a greater rate of this effect. Once trials following extraneous sampling were removed, there were no significant sex differences in any of the tested measures. These data suggest that feeder approach outside of the task context, which is often not recorded, could produce a confound in sex-based differences of reinforcement sensitivity in some tasks.
Affiliation(s)
- Clifford H Donovan
- Canadian Centre for Behavioural Neuroscience, Department of Neuroscience, University of Lethbridge, 4401 University Drive West, Lethbridge, AB, T1K 6T5, Canada
| | - Scott A Wong
- Canadian Centre for Behavioural Neuroscience, Department of Neuroscience, University of Lethbridge, 4401 University Drive West, Lethbridge, AB, T1K 6T5, Canada
| | - Sienna H Randolph
- Canadian Centre for Behavioural Neuroscience, Department of Neuroscience, University of Lethbridge, 4401 University Drive West, Lethbridge, AB, T1K 6T5, Canada
| | - Rachel A Stark
- Canadian Centre for Behavioural Neuroscience, Department of Neuroscience, University of Lethbridge, 4401 University Drive West, Lethbridge, AB, T1K 6T5, Canada
| | - Robbin L Gibb
- Canadian Centre for Behavioural Neuroscience, Department of Neuroscience, University of Lethbridge, 4401 University Drive West, Lethbridge, AB, T1K 6T5, Canada
| | - Aaron J Gruber
- Canadian Centre for Behavioural Neuroscience, Department of Neuroscience, University of Lethbridge, 4401 University Drive West, Lethbridge, AB, T1K 6T5, Canada.
| |
|
24
|
Out of sight, out of mind: Occlusion and eye closure destabilize moving bistable structure-from-motion displays. Atten Percept Psychophys 2018; 80:1193-1204. [PMID: 29560607 DOI: 10.3758/s13414-018-1505-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
Our brain constantly tries to anticipate the future by using a variety of memory mechanisms. Interestingly, studies using the intermittent presentation of multistable displays have shown little perceptual persistence for interruptions longer than a few hundred milliseconds. Here we examined whether we can facilitate the perceptual stability of bistable displays following a period of invisibility by employing a physically plausible and ecologically valid occlusion event sequence, as opposed to the typical intermittent presentation, with sudden onsets and offsets. To this end, we presented a bistable rotating structure-from-motion display that was moving along a linear horizontal trajectory on the screen and either was temporarily occluded by another object (a cardboard strip in Exp. 1, a computer-generated image in Exp. 2) or became invisible due to eye closure (Exp. 3). We report that a bistable rotation direction reliably persisted following occlusion or interruption only (1) if the pre- and postinterruption locations overlapped spatially (an occluder with apertures in Exp. 2 or brief, spontaneous blinks in Exp. 3) or (2) if an object's size allowed for the efficient grouping of dots on both sides of the occluding object (large objects in Exp. 1). In contrast, we observed no persistence whenever the pre- and postinterruption locations were nonoverlapping (large solid occluding objects in Exps. 1 and 2 and long, prompted blinks in Exp. 3). We report that the bistable rotation direction of a moving object persisted only for spatially overlapping neural representations, and that persistence was not facilitated by a physically plausible and ecologically valid occlusion event.
|
25
|
Ivan VE, Banks PJ, Goodfellow K, Gruber AJ. Lose-Shift Responding in Humans Is Promoted by Increased Cognitive Load. Front Integr Neurosci 2018; 12:9. [PMID: 29568264 PMCID: PMC5852382 DOI: 10.3389/fnint.2018.00009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/12/2017] [Accepted: 02/22/2018] [Indexed: 01/20/2023] Open
Abstract
The propensity of animals to shift choices immediately after unexpectedly poor reinforcement outcomes is a pervasive strategy across species and tasks. We report here on the memory supporting such lose-shift responding in humans, assessed using a binary choice task in which random responding is the optimal strategy. Participants exhibited little lose-shift responding when fully attending to the task, but this increased by 30%–40% in participants that performed with additional cognitive load that is known to tax executive systems. Lose-shift responding in the cognitively loaded adults persisted throughout the testing session, despite being a sub-optimal strategy, but was less likely as the time increased between reinforcement and the subsequent choice. Furthermore, children (5–9 years old) without load performed similarly to the cognitively loaded adults. This effect disappeared in older children aged 11–13 years old. These data provide evidence supporting our hypothesis that lose-shift responding is a default and reflexive strategy in the mammalian brain, likely mediated by a decaying memory trace, and is normally suppressed by executive systems. Reducing the efficacy of executive control by cognitive load (adults) or underdevelopment (children) increases its prevalence. It may therefore be an important component to consider when interpreting choice data, and may serve as an objective behavioral assay of executive function in humans that is easy to measure.
Affiliation(s)
- Victorita E Ivan
- Canadian Centre for Behavioral Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
- Parker J Banks
- Canadian Centre for Behavioral Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
- Kris Goodfellow
- Canadian Centre for Behavioral Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
- Aaron J Gruber
- Canadian Centre for Behavioral Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
26
Feeder Approach between Trials Is Increased by Uncertainty and Affects Subsequent Choices. eNeuro 2018; 4:eN-NWR-0437-17. [PMID: 29313000 PMCID: PMC5757189 DOI: 10.1523/eneuro.0437-17.2017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/14/2017] [Accepted: 12/14/2017] [Indexed: 01/16/2023] Open
Abstract
Animals quickly learn to approach sources of food. Here, we report on a form of approach in which rats made volitional orofacial contact with inactive feeders between trials of a self-paced operant task. This extraneous feeder sampling (EFS) was never reinforced and therefore imposed an opportunity and effort cost. EFS decreased during initial training but persisted thereafter. The relative rate of EFS to operant responding increased with novel changes to the operant chamber, reward devaluation by prefeeding, or lesions to the dorsolateral striatum. We speculate that this may function to increase exploration when the task is uncertain (early in learning or introduction of novel apparatus components), when the opportunity cost is low, or when the learned sensorimotor solution is compromised. Moreover, EFS strongly affected subsequent choices by triggering a lose-shift response away from the sampled feeder, even though it occurred outside of the trial context. This indicates that at least some behaviors occurring between trials impact future behaviors and should be considered in decision-making studies.
27
Harré MS. Strategic Information Processing from Behavioural Data in Iterated Games. Entropy 2018; 20:e20010027. [PMID: 33265117 PMCID: PMC7512235 DOI: 10.3390/e20010027] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/29/2017] [Revised: 12/24/2017] [Accepted: 12/28/2017] [Indexed: 11/25/2022]
Abstract
Iterated games are an important framework of economic theory and application, at least since the original work of Axelrod’s computational tournaments of the early 80’s. Recent theoretical results have shown that games (the economic context) and game theory (the decision-making process) are both formally equivalent to computational logic gates. Here these results are extended to behavioural data obtained from an experiment in which rhesus monkeys sequentially played thousands of the “matching pennies” game, an empirical example similar to Axelrod’s tournaments in which algorithms played against one another. The results show that the monkeys exhibit a rich variety of behaviours, both between and within subjects when playing opponents of varying complexity. Despite earlier suggestions, there is no clear evidence that the win-stay, lose-switch strategy is used, however there is evidence of non-linear strategy-based interactions between the predictors of future choices. It is also shown that there is consistent evidence across protocols and across individuals that the monkeys extract non-markovian information, i.e., information from more than just the most recent state of the game. This work shows that the use of information theory in game theory can test important hypotheses that would otherwise be more difficult to extract using traditional statistical methods.
Affiliation(s)
- Michael S Harré
- Complex Systems Research Group, Faculty of Engineering and IT, The University of Sydney, Sydney 2006, Australia
28
Wong SA, Randolph SH, Ivan VE, Gruber AJ. Acute Δ-9-tetrahydrocannabinol administration in female rats attenuates immediate responses following losses but not multi-trial reinforcement learning from wins. Behav Brain Res 2017; 335:136-144. [PMID: 28811178 DOI: 10.1016/j.bbr.2017.08.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/12/2017] [Revised: 07/28/2017] [Accepted: 08/05/2017] [Indexed: 10/19/2022]
Abstract
Δ-9-Tetrahydrocannabinol (THC) is the main psychoactive component of marijuana and has potent effects on decision-making, including a proposed reduction in cognitive flexibility. We demonstrate here that acute THC administration differentially affects some of the processes that contribute to cognitive flexibility. Specifically, THC reduces lose-shift responding in which female rats tend to immediately shift choice responses away from options that result in reward omission on the previous trial. THC, however, did not impair the ability of rats to flexibly bias responses toward feeders with higher probability of reward in a reversal task. This response adaptation developed over several trials, suggesting that THC did not impair slower forms of reinforcement learning needed to choose among options with unequal utility. This dissociation of THC's effects on innate/rapid and learned/gradual decision-making processes was unexpected, but is supported by emerging evidence that lose-shift responding is mediated by neural mechanisms distinct from those involved in other forms of reinforcement learning. The present data suggest that, at least in some tasks, the apparent reductions in cognitive flexibility by THC may be explained by the immediate effects on loss sensitivity, rather than impairments of all processes used for choice adaptation.
Affiliation(s)
- Scott A Wong
- Canadian Centre for Behavioural Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
- Sienna H Randolph
- Canadian Centre for Behavioural Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
- Victorita E Ivan
- Canadian Centre for Behavioural Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
- Aaron J Gruber
- Canadian Centre for Behavioural Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
29
Utility, Revealed Preferences Theory, and Strategic Ambiguity in Iterated Games. Entropy 2017. [DOI: 10.3390/e19050201] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
30
The Memory Trace Supporting Lose-Shift Responding Decays Rapidly after Reward Omission and Is Distinct from Other Learning Mechanisms in Rats. eNeuro 2016; 3:eN-NWR-0167-16. [PMID: 27896312 PMCID: PMC5112541 DOI: 10.1523/eneuro.0167-16.2016] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/17/2016] [Revised: 10/27/2016] [Accepted: 11/01/2016] [Indexed: 11/21/2022] Open
Abstract
The propensity of animals to shift choices immediately after unexpectedly poor reinforcement outcomes is a pervasive strategy across species and tasks. We report here that the memory supporting such lose-shift responding in rats rapidly decays during the intertrial interval and persists throughout training and testing on a binary choice task, despite being a suboptimal strategy. Lose-shift responding is not positively correlated with the prevalence and temporal dependence of win-stay responding, and it is inconsistent with predictions of reinforcement learning on the task. These data provide further evidence that win-stay and lose-shift are mediated by dissociated neural mechanisms and indicate that lose-shift responding presents a potential confound for the study of choice in the many operant choice tasks with short intertrial intervals. We propose that this immediate lose-shift responding is an intrinsic feature of the brain’s choice mechanisms that is engaged as a choice reflex and works in parallel with reinforcement learning and other control mechanisms to guide action selection.
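As an illustration of the two quantities contrasted in this abstract, the sketch below estimates win-stay and lose-shift probabilities from a trial sequence. It is a minimal example under stated assumptions, not the authors' analysis: the function name `win_stay_lose_shift`, the input encoding (one choice label and one reward flag per trial), and the use of simple conditional frequencies are all hypothetical choices made here.

```python
import math
from typing import Hashable, Sequence, Tuple


def win_stay_lose_shift(
    choices: Sequence[Hashable], rewards: Sequence[bool]
) -> Tuple[float, float]:
    """Return (P(stay | win), P(shift | loss)) over consecutive trial pairs.

    Hypothetical helper: a probability is NaN when its conditioning
    event (a win or a loss followed by another trial) never occurs.
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")

    wins = stays = losses = shifts = 0
    for t in range(len(choices) - 1):
        repeated = choices[t + 1] == choices[t]
        if rewards[t]:
            wins += 1
            stays += repeated  # count repeats after a rewarded trial
        else:
            losses += 1
            shifts += not repeated  # count switches after an unrewarded trial

    win_stay = stays / wins if wins else math.nan
    lose_shift = shifts / losses if losses else math.nan
    return win_stay, lose_shift
```

Because the two probabilities are computed from disjoint sets of trials, they can vary independently, which is what allows a dissociation between win-stay and lose-shift to be tested at all.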
31
Wong SA, Thapa R, Badenhorst CA, Briggs AR, Sawada JA, Gruber AJ. Opposing effects of acute and chronic d-amphetamine on decision-making in rats. Neuroscience 2016; 345:218-228. [PMID: 27113327 DOI: 10.1016/j.neuroscience.2016.04.021] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 12/14/2015] [Revised: 03/15/2016] [Accepted: 04/15/2016] [Indexed: 11/17/2022]
Abstract
Amphetamine and other drugs of abuse have both short-term and long-lasting effects on brain function, and drug sensitization paradigms often result in chronic impairments in behavioral flexibility. Here we show that acute amphetamine administration temporarily renders rats less sensitive to reward omission, as revealed by a decrease in lose-shift responding during a binary choice task. Intracerebral infusions of amphetamine into the ventral striatum did not affect lose-shift responding but did increase impulsive behavior in which rats chose to check both reward feeders before beginning the next trial. In contrast to acute systemic and intracerebral infusions, sensitization through repeated exposure induced long-lasting increased sensitivity to reward omission. These treatments did not affect choices on trials following reward delivery (i.e. win-stay responding), and sensitization increased spine density in the sensorimotor striatum. The dichotomous effects of amphetamine on short-term and long-term loss sensitivity, and the null effect on win-stay responding, are consistent with a shift of behavioral control to the sensorimotor striatum after drug sensitization. These data provide a new demonstration of such a shift in a novel task unrelated to drug administration, and suggests that the dominance of sensorimotor control persists over many hundreds of trials after sensitization.
Affiliation(s)
- Scott A Wong
- Canadian Centre for Behavioral Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
- Raj Thapa
- Canadian Centre for Behavioral Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
- Cecilia A Badenhorst
- Canadian Centre for Behavioral Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
- Alicia R Briggs
- Canadian Centre for Behavioral Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
- Justan A Sawada
- Canadian Centre for Behavioral Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
- Aaron J Gruber
- Canadian Centre for Behavioral Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada
32
Negative outcomes evoke cyclic irrational decisions in Rock, Paper, Scissors. Sci Rep 2016; 6:20479. [PMID: 26843423 PMCID: PMC4740902 DOI: 10.1038/srep20479] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 05/20/2015] [Accepted: 10/26/2015] [Indexed: 11/08/2022] Open
Abstract
Rock, Paper, Scissors (RPS) represents a unique gaming space in which the predictions of human rational decision-making can be compared with actual performance. Playing a computerized opponent adopting a mixed-strategy equilibrium, participants revealed a non-significant tendency to over-select Rock. Further violations of rational decision-making were observed using an inter-trial analysis where participants were more likely to switch their item selection at trial n + 1 following a loss or draw at trial n, revealing the strategic vulnerability of individuals following the experience of negative rather than positive outcome. Unique switch strategies related to each of these trial n outcomes were also identified: after losing participants were more likely to 'downgrade' their item (e.g., Rock followed by Scissors) but after drawing participants were more likely to 'upgrade' their item (e.g., Rock followed by Paper). Further repetition analysis revealed that participants were more likely to continue their specific cyclic item change strategy into trial n + 2. The data reveal the strategic vulnerability of individuals following the experience of negative rather than positive outcome, the tensions between behavioural and cognitive influences on decision making, and underline the dangers of increased behavioural predictability in other recursive, non-cooperative environments such as economics and politics.
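The 'upgrade' and 'downgrade' switches named in this abstract follow the cyclic dominance of the three items, which the sketch below makes explicit. It is a minimal illustration only: the function name `classify_transition`, the lowercase item labels, and the "stay" label for a repeated item are assumptions introduced here, not terms or code from the study.

```python
# Cyclic order in which each item is beaten by the one after it:
# rock < paper < scissors < rock.
ITEMS = ("rock", "paper", "scissors")


def classify_transition(previous: str, current: str) -> str:
    """Label the change from one trial's item to the next trial's item.

    "upgrade"   : move to the item that beats the previous one (rock -> paper)
    "downgrade" : move to the item the previous one beats (rock -> scissors)
    "stay"      : the same item is repeated
    """
    step = (ITEMS.index(current) - ITEMS.index(previous)) % 3
    return ("stay", "upgrade", "downgrade")[step]
```

Applying this label to each pair of consecutive trials, split by the outcome of the first trial (win, loss, or draw), gives the kind of inter-trial tabulation the abstract describes.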
33
Yu G, Xu B, Zhao Y, Zhang B, Yang M, Kan JYY, Milstein DM, Thevarajah D, Dorris MC. Microsaccade direction reflects the economic value of potential saccade goals and predicts saccade choice. J Neurophysiol 2016; 115:741-51. [PMID: 26609118 DOI: 10.1152/jn.00987.2015] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 10/29/2015] [Accepted: 11/20/2015] [Indexed: 11/22/2022] Open
Abstract
Microsaccades are small-amplitude (typically <1°), ballistic eye movements that occur when attempting to fixate gaze. Initially thought to be generated randomly, it has recently been established that microsaccades are influenced by sensory stimuli, attentional processes, and certain cognitive states. Whether decision processes influence microsaccades, however, is unknown. Here, we adapted two classic economic tasks to examine whether microsaccades reflect evolving saccade decisions. Volitional saccade choices of monkey and human subjects provided a measure of the subjective value of targets. Importantly, analyses occurred during a period of complete darkness to minimize the known influence of sensory and attentional processes on microsaccades. As the time of saccadic choice approached, microsaccade direction became the following: 1) biased toward targets as a function of their subjective value and 2) predictive of upcoming, voluntary choice. Our results indicate that microsaccade direction is influenced by and is a reliable tell of evolving saccade decisions. Our results are consistent with dynamic decision processes within the midbrain superior colliculus; that is, microsaccade direction is influenced by the transition of activity toward caudal saccade regions associated with high saccade value and/or future saccade choice.
Affiliation(s)
- Gongchen Yu
- Institute of Neuroscience and Key Laboratory of Primate Neurobiology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Baijie Xu
- Institute of Neuroscience and Key Laboratory of Primate Neurobiology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Yuchen Zhao
- Institute of Neuroscience and Key Laboratory of Primate Neurobiology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Beizhen Zhang
- Institute of Neuroscience and Key Laboratory of Primate Neurobiology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Mingpo Yang
- Institute of Neuroscience and Key Laboratory of Primate Neurobiology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Janis Ying Ying Kan
- Department of Biomedical and Molecular Sciences, Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- David Martin Milstein
- Department of Biomedical and Molecular Sciences, Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Dhushan Thevarajah
- Department of Biomedical and Molecular Sciences, Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Michael Christopher Dorris
- Institute of Neuroscience and Key Laboratory of Primate Neurobiology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
34
Neural Basis of Strategic Decision Making. Trends Neurosci 2015; 39:40-48. [PMID: 26688301 DOI: 10.1016/j.tins.2015.11.002] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 09/11/2015] [Revised: 11/03/2015] [Accepted: 11/10/2015] [Indexed: 11/23/2022]
Abstract
Human choice behaviors during social interactions often deviate from the predictions of game theory. This might arise partly from the limitations in the cognitive abilities necessary for recursive reasoning about the behaviors of others. In addition, during iterative social interactions, choices might change dynamically as knowledge about the intentions of others and estimates for choice outcomes are incrementally updated via reinforcement learning. Some of the brain circuits utilized during social decision making might be general-purpose and contribute to isomorphic individual and social decision making. By contrast, regions in the medial prefrontal cortex (mPFC) and temporal parietal junction (TPJ) might be recruited for cognitive processes unique to social decision making.
35
Abstract
Context plays a pivotal role in many decision-making scenarios, including social interactions wherein the identities and strategies of other decision makers often shape our behaviors. However, the neural mechanisms for tracking such contextual information are poorly understood. Here, we investigated how opponent identity affects human reinforcement learning during a simulated competitive game against two independent computerized opponents. We found that strategies of participants were affected preferentially by the outcomes of the previous interactions with the same opponent. In addition, reinforcement signals from the previous trial were less discriminable throughout the brain after the opponent changed, compared with when the same opponent was repeated. These opponent-selective reinforcement signals were particularly robust in right rostral anterior cingulate and right lingual regions, where opponent-selective reinforcement signals correlated with a behavioral measure of opponent-selective reinforcement learning. Therefore, when choices involve multiple contextual frames, such as different opponents in a game, decision making and its neural correlates are influenced by multithreaded histories of reinforcement. Overall, our findings are consistent with the availability of temporally overlapping, context-specific reinforcement signals.
SIGNIFICANCE STATEMENT In real-world decision making, context plays a strong role in determining the value of an action. Similar choices take on different values depending on setting. We examined the contextual dependence of reward-based learning and reinforcement signals using a simple two-choice matching-pennies game played by humans against two independent computer opponents that were randomly interleaved. We found that human subjects' strategies were highly dependent on opponent context in this game, a fact that was reflected in select brain regions' activity (rostral anterior cingulate and lingual cortex). These results indicate that human reinforcement histories are highly dependent on contextual factors, a fact that is reflected in neural correlates of reinforcement signals.
36
Ohira H, Ichikawa N, Kimura K, Fukuyama S, Shinoda J, Yamada J. Neural and sympathetic activity associated with exploration in decision-making: further evidence for involvement of insula. Front Behav Neurosci 2014; 8:381. [PMID: 25426038 PMCID: PMC4226165 DOI: 10.3389/fnbeh.2014.00381] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 07/10/2014] [Accepted: 10/16/2014] [Indexed: 11/16/2022] Open
Abstract
We previously reported that sympathetic activity was associated with exploration in decision-making indexed by entropy, which is a concept in information theory and indexes randomness of choices or the degree of deviation from sticking to recent experiences of gains and losses, and that activation of the anterior insula mediated this association. The current study aims to replicate and to expand these findings in a situation where contingency between options and outcomes is manipulated. Sixteen participants performed a stochastic decision-making task in which we manipulated a condition with low uncertainty of gain/loss (contingent-reward condition) and a condition with high uncertainty of gain/loss (random-reward condition). Regional cerebral blood flow was measured by (15)O-water positron emission tomography (PET), and cardiovascular parameters and catecholamine in the peripheral blood were measured, during the task. In the contingent-reward condition, norepinephrine as an index of sympathetic activity was positively correlated with entropy indicating exploration in decision-making. Norepinephrine was negatively correlated with neural activity in the right posterior insula, rostral anterior cingulate cortex, and dorsal pons, suggesting neural bases for detecting changes of bodily states. Furthermore, right anterior insular activity was negatively correlated with entropy, suggesting influences on exploration in decision-making. By contrast, in the random-reward condition, entropy correlated with activity in the dorsolateral prefrontal and parietal cortices but not with sympathetic activity. These findings suggest that influences of sympathetic activity on exploration in decision-making and its underlying neural mechanisms might be dependent on the degree of uncertainty of situations.
Affiliation(s)
- Hideki Ohira
- Department of Psychology, Nagoya University, Nagoya, Japan
- Naho Ichikawa
- Department of Psychiatry and Neurosciences, Hiroshima University, Hiroshima, Japan
- Kenta Kimura
- Human Technology Research Institute, National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
- Jun Shinoda
- Chubu Ryogo Center, Kizawa Memorial Hospital, Minokamo, Japan
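The abstract for this entry uses entropy as an index of how random a participant's choices are. As a rough illustration of that idea, the sketch below computes the Shannon entropy of the empirical choice distribution. This is an assumption made here for clarity: the study may condition entropy on recent gains and losses or define it differently, and the function name `choice_entropy` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def choice_entropy(choices: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of choices.

    0 bits means the same option was always chosen; log2(k) bits means
    k options were chosen equally often.
    """
    n = len(choices)
    if n == 0:
        raise ValueError("at least one choice is required")
    counts = Counter(choices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Under this simplified definition, a participant who always repeats one option scores 0 bits, and one who splits evenly between two options scores 1 bit, so higher values correspond to more exploratory or less predictable choice.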
37
Behavioral Variability through Stochastic Choice and Its Gating by Anterior Cingulate Cortex. Cell 2014; 159:21-32. [DOI: 10.1016/j.cell.2014.08.037] [Citation(s) in RCA: 123] [Impact Index Per Article: 12.3] [Received: 06/30/2014] [Revised: 08/22/2014] [Accepted: 08/25/2014] [Indexed: 10/24/2022]
38
Misirlisoy E, Haggard P. Asymmetric predictability and cognitive competition in football penalty shootouts. Curr Biol 2014; 24:1918-22. [PMID: 25088554 DOI: 10.1016/j.cub.2014.07.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/18/2014] [Revised: 06/05/2014] [Accepted: 07/04/2014] [Indexed: 10/25/2022]
Abstract
Sports provide powerful demonstrations of cognitive strategies underlying competitive behavior. Penalty shootouts in football (soccer) involve direct competition between elite players and absorb the attention of millions. The penalty shootout between Germany and England in the 1990 World Cup semifinal was viewed by an estimated 46.49% of the UK population. In a penalty shootout, a goalkeeper must defend their goal without teammate assistance while an opposing series of kickers aim to kick the ball past them into the net. As in many sports, the ball during a penalty kick often approaches too quickly for the goalkeeper to react to its direction of motion; instead, the goalkeeper must guess the likely direction of the kick, and dive in anticipation, if they are to have a chance of saving the shot. We examined all 361 kicks from the 37 penalty shootouts that occurred in World Cup and Euro Cup matches over a 36-year period from 1976 to 2012 and show that goalkeepers displayed a clear sequential bias. Following repeated kicks in the same direction, goalkeepers became increasingly likely to dive in the opposite direction on the next kick. Surprisingly, kickers failed to exploit these goalkeeper biases. Our findings highlight the importance of monitoring and predicting sequential behavior in real-world competition. Penalty shootouts pit one goalkeeper against several kickers in rapid succession. Asymmetries in the cognitive capacities of an individual versus a group could produce significant advantages over opponents.
Affiliation(s)
- Erman Misirlisoy
- Institute of Cognitive Neuroscience, University College London, 17 Queen Square, London WC1N 3AR, UK
- Patrick Haggard
- Institute of Cognitive Neuroscience, University College London, 17 Queen Square, London WC1N 3AR, UK
39
Skelin I, Hakstol R, VanOyen J, Mudiayi D, Molina LA, Holec V, Hong NS, Euston DR, McDonald RJ, Gruber AJ. Lesions of dorsal striatum eliminate lose-switch responding but not mixed-response strategies in rats. Eur J Neurosci 2014; 39:1655-63. [DOI: 10.1111/ejn.12518] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 07/03/2013] [Revised: 12/07/2013] [Accepted: 01/18/2014] [Indexed: 11/30/2022]
Affiliation(s)
- Ivan Skelin
- Department of Neuroscience; Canadian Centre for Behavioural Neuroscience; University of Lethbridge; 4401 University Dr. W. T1K 3M4 Lethbridge AB Canada
- Rhys Hakstol
- Department of Neuroscience; Canadian Centre for Behavioural Neuroscience; University of Lethbridge; 4401 University Dr. W. T1K 3M4 Lethbridge AB Canada
- Jenn VanOyen
- Department of Neuroscience; Canadian Centre for Behavioural Neuroscience; University of Lethbridge; 4401 University Dr. W. T1K 3M4 Lethbridge AB Canada
- Dominic Mudiayi
- Department of Neuroscience; Canadian Centre for Behavioural Neuroscience; University of Lethbridge; 4401 University Dr. W. T1K 3M4 Lethbridge AB Canada
- Leonardo A. Molina
- Department of Neuroscience; Canadian Centre for Behavioural Neuroscience; University of Lethbridge; 4401 University Dr. W. T1K 3M4 Lethbridge AB Canada
- Victoria Holec
- Department of Neuroscience; Canadian Centre for Behavioural Neuroscience; University of Lethbridge; 4401 University Dr. W. T1K 3M4 Lethbridge AB Canada
- Nancy S. Hong
- Department of Neuroscience; Canadian Centre for Behavioural Neuroscience; University of Lethbridge; 4401 University Dr. W. T1K 3M4 Lethbridge AB Canada
- David R. Euston
- Department of Neuroscience; Canadian Centre for Behavioural Neuroscience; University of Lethbridge; 4401 University Dr. W. T1K 3M4 Lethbridge AB Canada
- Robert J. McDonald
- Department of Neuroscience; Canadian Centre for Behavioural Neuroscience; University of Lethbridge; 4401 University Dr. W. T1K 3M4 Lethbridge AB Canada
- Aaron J. Gruber
- Department of Neuroscience; Canadian Centre for Behavioural Neuroscience; University of Lethbridge; 4401 University Dr. W. T1K 3M4 Lethbridge AB Canada
40
Iigaya K, Fusi S. Dynamical regimes in neural network models of matching behavior. Neural Comput 2013; 25:3093-112. [PMID: 24047324 DOI: 10.1162/neco_a_00522] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/04/2022]
Abstract
The matching law constitutes a quantitative description of choice behavior that is often observed in foraging tasks. According to the matching law, organisms distribute their behavior across available response alternatives in the same proportion that reinforcers are distributed across those alternatives. Recently a few biophysically plausible neural network models have been proposed to explain the matching behavior observed in the experiments. Here we study systematically the learning dynamics of these networks while performing a matching task on the concurrent variable interval (VI) schedule. We found that the model neural network can operate in one of three qualitatively different regimes depending on the parameters that characterize the synaptic dynamics and the reward schedule: (1) a matching behavior regime, in which the probability of choosing an option is roughly proportional to the baiting fractional probability of that option; (2) a perseverative regime, in which the network tends to make always the same decision; and (3) a tristable regime, in which the network can either perseverate or choose the two targets randomly approximately with the same probability. Different parameters of the synaptic dynamics lead to different types of deviations from the matching law, some of which have been observed experimentally. We show that the performance of the network depends on the number of stable states of each synapse and that bistable synapses perform close to optimal when the proper learning rate is chosen. Because our model provides a link between synaptic dynamics and qualitatively different behaviors, this work provides us with insight into the effects of neuromodulators on adaptive behaviors and psychiatric disorders.
Affiliation(s)
- Kiyohito Iigaya
- Center for Theoretical Neuroscience, Department of Neuroscience, Columbia University Medical Center, New York, NY 10032, and Department of Physics, Columbia University, New York, NY 10027, U.S.A.
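The matching law stated in the abstract for this entry says that the share of responses given to each alternative equals the share of reinforcers obtained from it. The sketch below checks that relation on raw counts. It is only an illustration of the law itself, not of the neural network model in the paper; the function names `fractions` and `matching_deviation` and the count-based inputs are assumptions made here.

```python
from typing import List, Sequence


def fractions(counts: Sequence[float]) -> List[float]:
    """Normalize non-negative counts so that they sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def matching_deviation(
    choice_counts: Sequence[float], reinforcer_counts: Sequence[float]
) -> List[float]:
    """Choice fraction minus reinforcer fraction for each alternative.

    All zeros means strict matching; a negative value for the richer
    alternative indicates undermatching.
    """
    if len(choice_counts) != len(reinforcer_counts):
        raise ValueError("both inputs need one entry per alternative")
    choice_frac = fractions(choice_counts)
    reinforcer_frac = fractions(reinforcer_counts)
    return [c - r for c, r in zip(choice_frac, reinforcer_frac)]
```

For example, 30 and 10 responses paired with 15 and 5 reinforcers gives zero deviation for both alternatives, whereas an even 50/50 split of responses against an 80/20 split of reinforcers gives a deviation of about -0.3 on the richer side.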
41
Abstract
In stable environments, decision makers can exploit their previously learned strategies for optimal outcomes, while exploration might lead to better options in unstable environments. Here, to investigate the cortical contributions to exploratory behavior, we analyzed single-neuron activity recorded from four different cortical areas of monkeys performing a matching-pennies task and a visual search task, which encouraged and discouraged exploration, respectively. We found that neurons in multiple regions in the frontal and parietal cortex tended to encode signals related to previously rewarded actions more reliably than unrewarded actions. In addition, signals for rewarded choices in the supplementary eye field were attenuated during the visual search task and were correlated with the tendency to switch choices during the matching-pennies task. These results suggest that the supplementary eye field might play a unique role in encouraging animals to explore alternative decision-making strategies.
42
Ohira H, Matsunaga M, Murakami H, Osumi T, Fukuyama S, Shinoda J, Yamada J. Neural mechanisms mediating association of sympathetic activity and exploration in decision-making. Neuroscience 2013; 246:362-74. [PMID: 23643977 DOI: 10.1016/j.neuroscience.2013.04.050] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 01/24/2013] [Revised: 04/02/2013] [Accepted: 04/25/2013] [Indexed: 11/19/2022]
Abstract
The somatic marker hypothesis asserts that decision-making can be guided by feedback of bodily states to the brain. In line with this hypothesis, the present study tested whether sympathetic activity is associated with a tonic dimension of decision-making, namely exploratory tendency as represented by entropy in information theory, and further examined the neural mechanisms of this association. Twenty participants performed a stochastic reversal learning task that required decision-making in an unstable and uncertain situation. Regional cerebral blood flow was evaluated using ¹⁵O-water positron emission tomography (PET), and cardiovascular indices and concentrations of catecholamine in peripheral blood were also measured during the task. In reversal learning, increased epinephrine during the task correlated positively with larger entropy, indicating a greater tendency for exploration in decision-making. The increase in epinephrine also correlated with brain activity revealed by PET in the somatosensory cortices, anterior insula, dorsal anterior cingulate cortex, and the dorsal pons. This result is consistent with previously reported brain matrices for the representation of bodily states and interoception. In addition, activity of the anterior insula specifically correlated with entropy, suggesting that this brain region may mediate between peripheral sympathetic arousal and exploration in decision-making. These findings shed new light on the role of bodily states in decision-making and its underlying neural mechanisms.
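The abstract above quantifies exploratory tendency with entropy but does not give the formula. A minimal sketch, assuming plain Shannon entropy over the empirical choice frequencies (the study may well have used a different or conditional definition), could look like this. The name `choice_entropy` and the example sequences are hypothetical.

```python
import math
from collections import Counter


def choice_entropy(choices):
    """Shannon entropy (in bits) of the empirical choice distribution.

    Assumption: entropy is computed over raw choice frequencies; the
    abstract does not state the exact definition used in the study.
    """
    n = len(choices)
    if n == 0:
        raise ValueError("need at least one choice")
    counts = Counter(choices)
    # p * log2(1 / p) summed over the options that were actually chosen.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(choice_entropy(["A", "A", "A", "A"]))  # 0.0, always the same option
print(choice_entropy(["A", "B", "A", "B"]))  # 1.0, both options equally often
```

Under this definition, higher values indicate choices spread more evenly across options, which is one way to read "a greater tendency for exploration".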
Affiliation(s)
- H Ohira
- Department of Psychology, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan.
|
43
|
Stamps JA, Briffa M, Biro PA. Unpredictable animals: individual differences in intraindividual variability (IIV). Anim Behav 2012. [DOI: 10.1016/j.anbehav.2012.02.017] [Citation(s) in RCA: 208] [Impact Index Per Article: 17.3] [Indexed: 12/01/2022]
|
44
|
Kianercy A, Galstyan A. Dynamics of Boltzmann Q learning in two-player two-action games. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 85:041145. [PMID: 22680455 DOI: 10.1103/physreve.85.041145] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 09/05/2011] [Revised: 02/28/2012] [Indexed: 06/01/2023]
Abstract
We consider the dynamics of Q learning in two-player two-action games with a Boltzmann exploration mechanism. For any nonzero exploration rate the dynamics is dissipative, which guarantees that agent strategies converge to rest points that are generally different from the game's Nash equilibria (NEs). We provide a comprehensive characterization of the rest point structure for different games and examine the sensitivity of this structure with respect to the noise due to exploration. Our results indicate that for a class of games with multiple NEs the asymptotic behavior of learning dynamics can undergo drastic changes at critical exploration rates. Furthermore, we demonstrate that, for certain games with a single NE, it is possible to have additional rest points (not corresponding to any NE) that persist for a finite range of the exploration rates and disappear when the exploration rates of both players tend to zero.
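The setting described in this abstract can be sketched in a few lines. The code below is an illustrative stochastic simulation, assuming a stateless Q update of the form Q ← Q + α(r − Q) and softmax (Boltzmann) action selection with a temperature parameter. The paper analyzes the corresponding continuous-time dynamics, which this sketch does not reproduce. All function names, parameter values, and the payoff matrices are hypothetical.

```python
import math
import random


def boltzmann_policy(q_values, temperature):
    """Softmax action probabilities; higher temperature means more exploration."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    q_max = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def q_update(q, reward, learning_rate):
    """Stateless Q update (assumed form): move Q toward the received reward."""
    return q + learning_rate * (reward - q)


def play(payoffs_a, payoffs_b, steps=1000, temperature=0.5,
         learning_rate=0.1, seed=0):
    """Two Boltzmann Q learners repeatedly play a 2x2 game.

    payoffs_x[i][j] is the payoff to player x when A plays i and B plays j.
    Returns the final mixed strategies of both players.
    """
    rng = random.Random(seed)
    q_a, q_b = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        a = rng.choices([0, 1], weights=boltzmann_policy(q_a, temperature))[0]
        b = rng.choices([0, 1], weights=boltzmann_policy(q_b, temperature))[0]
        q_a[a] = q_update(q_a[a], payoffs_a[a][b], learning_rate)
        q_b[b] = q_update(q_b[b], payoffs_b[a][b], learning_rate)
    return boltzmann_policy(q_a, temperature), boltzmann_policy(q_b, temperature)


# A coordination game with two pure Nash equilibria (illustrative payoffs).
coordination = [[1.0, 0.0], [0.0, 1.0]]
strategy_a, strategy_b = play(coordination, coordination)
print(strategy_a, strategy_b)
```

Varying `temperature` in this sketch is one way to explore informally how the learned strategies depend on the exploration rate, which is the question the paper treats analytically.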
Affiliation(s)
- Ardeshir Kianercy
- USC Information Sciences Institute, Marina del Rey, California 90292, USA
|
45
|
Abstract
Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal's knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain.
Affiliation(s)
- Daeyeol Lee
- Department of Neurobiology, Kavli Institute for Neuroscience, Yale University School of Medicine, New Haven, Connecticut 06510, USA.
|
46
|
Abstract
Behavioral changes driven by reinforcement and punishment are referred to as simple or model-free reinforcement learning. Animals can also change their behaviors by observing events that are neither appetitive nor aversive when these events provide new information about payoffs available from alternative actions. This is an example of model-based reinforcement learning and can be accomplished by incorporating hypothetical reward signals into the value functions for specific actions. Recent neuroimaging and single-neuron recording studies showed that the prefrontal cortex and the striatum are involved not only in reinforcement and punishment, but also in model-based reinforcement learning. We found evidence for both types of learning, and hence hybrid learning, in monkeys during simulated competitive games. In addition, in both the dorsolateral prefrontal cortex and orbitofrontal cortex, individual neurons heterogeneously encoded signals related to actual and hypothetical outcomes from specific actions, suggesting that both areas might contribute to hybrid learning.
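One way to picture the hybrid learning mentioned in this abstract is a value update that uses the actual reward for the chosen action and hypothetical rewards for the unchosen ones. The sketch below is only an assumption about what such an update could look like; the function `hybrid_update`, the separate learning rates, and the rock-paper-scissors payoffs are hypothetical and are not the authors' fitted model.

```python
def hybrid_update(values, chosen, actual_reward, hypothetical_rewards,
                  lr_actual=0.5, lr_hypothetical=0.25):
    """Return updated action values after one trial.

    values               -- current value estimate for each action
    chosen               -- index of the action actually taken
    actual_reward        -- reward received for the chosen action
    hypothetical_rewards -- dict mapping unchosen action index to the
                            payoff it would have produced (if revealed)
    """
    updated = list(values)
    for action, value in enumerate(values):
        if action == chosen:
            # Model-free part: learn from the experienced outcome.
            updated[action] = value + lr_actual * (actual_reward - value)
        elif action in hypothetical_rewards:
            # Model-based part: learn from the outcome that was not experienced.
            target = hypothetical_rewards[action]
            updated[action] = value + lr_hypothetical * (target - value)
    return updated


# Rock (0) was chosen and won; paper (1) would have tied, scissors (2) lost.
print(hybrid_update([0.0, 0.0, 0.0], chosen=0, actual_reward=1.0,
                    hypothetical_rewards={1: 0.0, 2: -1.0}))
# [0.5, 0.0, -0.25]
```

Setting `lr_hypothetical` to zero reduces this sketch to a purely model-free update, which makes the contribution of hypothetical outcomes easy to isolate.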
Affiliation(s)
- Hiroshi Abe
- Laboratory of Neurobiology, The Rockefeller University, New York, New York, USA
|
47
|
Danckert J, Stöttinger E, Quehl N, Anderson B. Right Hemisphere Brain Damage Impairs Strategy Updating. Cereb Cortex 2011; 22:2745-60. [PMID: 22178711 DOI: 10.1093/cercor/bhr351] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 11/14/2022] Open
Affiliation(s)
- James Danckert
- Department of Psychology, University of Waterloo, Waterloo, N2L 3G1 Ontario, Canada.
|
48
|
Neiman T, Loewenstein Y. Reinforcement learning in professional basketball players. Nat Commun 2011; 2:569. [PMID: 22146388 PMCID: PMC3247813 DOI: 10.1038/ncomms1580] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 06/15/2011] [Accepted: 11/02/2011] [Indexed: 11/24/2022] Open
Abstract
Reinforcement learning in complex natural environments is a challenging task because the agent should generalize from the outcomes of actions taken in one state of the world to future actions in different states of the world. The extent to which human experts find the proper level of generalization is unclear. Here we show, using the sequences of field goal attempts made by professional basketball players, that the outcome of even a single field goal attempt has a considerable effect on the rate of subsequent 3-point shot attempts, in line with standard models of reinforcement learning. However, this change in behaviour is associated with negative correlations between the outcomes of successive field goal attempts. These results indicate that despite years of experience and high motivation, professional players overgeneralize from the outcomes of their most recent actions, which leads to decreased performance. Reinforcement learning quantifies the change in behaviour in response to past experience. Using field goal attempt data from basketball, Neiman and Loewenstein demonstrate that even one failed or made attempt has an impact on subsequent attempts, showing that players overgeneralize from their most recent actions.
Affiliation(s)
- Tal Neiman
- Department of Neurobiology, The Interdisciplinary Center for Neural Computation and Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel.
|
49
|
Vickery T, Chun M, Lee D. Ubiquity and Specificity of Reinforcement Signals throughout the Human Brain. Neuron 2011; 72:166-77. [DOI: 10.1016/j.neuron.2011.08.011] [Citation(s) in RCA: 155] [Impact Index Per Article: 11.9] [Accepted: 08/15/2011] [Indexed: 11/28/2022]
|
50
|
Abe H, Lee D. Distributed coding of actual and hypothetical outcomes in the orbital and dorsolateral prefrontal cortex. Neuron 2011; 70:731-41. [PMID: 21609828 DOI: 10.1016/j.neuron.2011.03.026] [Citation(s) in RCA: 129] [Impact Index Per Article: 9.9] [Accepted: 03/30/2011] [Indexed: 10/18/2022]
Abstract
Knowledge about hypothetical outcomes from unchosen actions is beneficial only when such outcomes can be correctly attributed to specific actions. Here we show that during a simulated rock-paper-scissors game, rhesus monkeys can adjust their choice behaviors according to both actual and hypothetical outcomes from their chosen and unchosen actions, respectively. In addition, neurons in both dorsolateral prefrontal cortex and orbitofrontal cortex encoded the signals related to actual and hypothetical outcomes immediately after they were revealed to the animal. Moreover, compared to the neurons in the orbitofrontal cortex, those in the dorsolateral prefrontal cortex were more likely to change their activity according to the hypothetical outcomes from specific actions. Conjunctive and parallel coding of multiple actions and their outcomes in the prefrontal cortex might enhance the efficiency of reinforcement learning and also contribute to their context-dependent memory.
Affiliation(s)
- Hiroshi Abe
- Department of Neurobiology, Yale University, New Haven, CT 06510, USA
|