51
Abstract
Although reinforcement learning (RL) theories have been influential in characterizing the mechanisms for reward-guided choice in the brain, the predominant temporal difference (TD) algorithm cannot explain many flexible or goal-directed actions that have been demonstrated behaviorally. We investigate such actions by contrasting an RL algorithm that is model based, in that it relies on learning a map or model of the task and planning within it, to traditional model-free TD learning. To distinguish these approaches in humans, we used functional magnetic resonance imaging in a continuous spatial navigation task, in which frequent changes to the layout of the maze forced subjects continually to relearn their favored routes, thereby exposing the RL mechanisms used. We sought evidence for the neural substrates of such mechanisms by comparing choice behavior and blood oxygen level-dependent (BOLD) signals to decision variables extracted from simulations of either algorithm. Both choices and value-related BOLD signals in striatum, although most often associated with TD learning, were better explained by the model-based theory. Furthermore, predecessor quantities for the model-based value computation were correlated with BOLD signals in the medial temporal lobe and frontal cortex. These results point to a significant extension of both the computational and anatomical substrates for RL in the brain.
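As a point of reference for the contrast drawn in this abstract, here is a minimal Python sketch of the two classes of algorithm; the toy state space, reward coding, and parameter values are hypothetical and are not the authors' task or implementation.

    import numpy as np

    # Hypothetical toy maze: 3 states, 2 actions; parameters are placeholders.
    n_states, n_actions, gamma, alpha = 3, 2, 0.9, 0.1
    T = np.ones((n_states, n_actions, n_states)) / n_states  # learned transition model
    R = np.zeros(n_states)                                   # learned reward per state
    Q_mf = np.zeros((n_states, n_actions))                   # model-free action values

    def td_update(s, a, r, s_next):
        """Model-free TD learning: only the experienced state-action pair changes."""
        delta = r + gamma * Q_mf[s_next].max() - Q_mf[s, a]
        Q_mf[s, a] += alpha * delta

    def plan_values(n_iter=50):
        """Model-based evaluation: values are recomputed from the learned map, so a
        change to T or R (e.g., a new maze layout) immediately alters all values."""
        Q_mb = np.zeros((n_states, n_actions))
        for _ in range(n_iter):
            V = Q_mb.max(axis=1)
            Q_mb = T @ (R + gamma * V)  # Q(s,a) = sum_s' T[s,a,s'] (R(s') + gamma V(s'))
        return Q_mb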
52
Bernacchia A, Seo H, Lee D, Wang XJ. A reservoir of time constants for memory traces in cortical neurons. Nat Neurosci 2011; 14:366-72. [PMID: 21317906] [PMCID: PMC3079398] [DOI: 10.1038/nn.2752]
Abstract
According to the reinforcement learning theory of decision making, reward expectation is computed by integrating past rewards with a fixed timescale. By contrast, we found that a wide range of time constants is available across cortical neurons recorded from monkeys performing a competitive game task. By recognizing that reward modulates neural activity multiplicatively, we found that one or two time constants of reward memory can be extracted for each neuron in prefrontal, cingulate, and parietal cortex. These timescales ranged from hundreds of milliseconds to tens of seconds, according to a power-law distribution, which is consistent across areas and reproduced by a “reservoir” neural network model. These neuronal memory timescales were weakly but significantly correlated with those of the monkeys' decisions. Our findings suggest a flexible memory system in which neural subpopulations with distinct sets of long or short memory timescales may be selectively deployed according to task demands.
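A minimal sketch of the kind of reward-memory trace this abstract concerns: the same reward history integrated with many different time constants. The reward sequence, the time constants, and their distribution below are placeholders, not the recorded data or the authors' reservoir network model.

    import numpy as np

    def reward_trace(rewards, tau):
        """Leaky integration of past rewards with time constant tau (in trials):
        the trace decays by exp(-1/tau) each trial and increments with each reward."""
        decay, trace, out = np.exp(-1.0 / tau), 0.0, []
        for r in rewards:
            trace = decay * trace + r
            out.append(trace)
        return np.array(out)

    rewards = np.random.binomial(1, 0.5, size=200)  # hypothetical reward history
    # A "reservoir" of units, each integrating the same history with its own timescale.
    taus = np.exp(np.random.uniform(np.log(0.5), np.log(50.0), size=20))
    traces = np.stack([reward_trace(rewards, tau) for tau in taus])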
53
Gläscher J, Daw N, Dayan P, O'Doherty JP. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 2010; 66:585-95. [PMID: 20510862] [DOI: 10.1016/j.neuron.2010.04.016]
Abstract
Reinforcement learning (RL) uses sequential experience with situations ("states") and outcomes to assess actions. Whereas model-free RL uses this experience directly, in the form of a reward prediction error (RPE), model-based RL uses it indirectly, building a model of the state transition and outcome structure of the environment, and evaluating actions by searching this model. A state prediction error (SPE) plays a central role, reporting discrepancies between the current model and the observed state transitions. Using functional magnetic resonance imaging in humans solving a probabilistic Markov decision task, we found the neural signature of an SPE in the intraparietal sulcus and lateral prefrontal cortex, in addition to the previously well-characterized RPE in the ventral striatum. This finding supports the existence of two unique forms of learning signal in humans, which may form the basis of distinct computational strategies for guiding behavior.
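The two teaching signals contrasted in this abstract can be sketched as follows; the state and action coding, learning rates, and the exact form of the transition-model update are illustrative assumptions rather than the paper's fitted model.

    import numpy as np

    n_states, n_actions = 4, 2
    V = np.zeros(n_states)                                        # state values
    T = np.full((n_states, n_actions, n_states), 1.0 / n_states)  # transition model
    alpha_v, alpha_t, gamma = 0.1, 0.2, 0.95                      # placeholder parameters

    def reward_prediction_error(s, r, s_next):
        """RPE: discrepancy between obtained and predicted reward value (model-free)."""
        delta = r + gamma * V[s_next] - V[s]
        V[s] += alpha_v * delta
        return delta

    def state_prediction_error(s, a, s_next):
        """SPE: surprise about the observed state transition; the transition model is
        nudged toward the observed successor state and renormalized (model-based)."""
        spe = 1.0 - T[s, a, s_next]
        T[s, a] *= (1.0 - alpha_t)
        T[s, a, s_next] += alpha_t
        return spe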
Affiliation(s)
- Jan Gläscher
- Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91101, USA.
54
Suzuki S, Niki K, Fujisaki S, Akiyama E. Neural basis of conditional cooperation. Soc Cogn Affect Neurosci 2010; 6:338-47. [PMID: 20501484] [DOI: 10.1093/scan/nsq042]
Abstract
Cooperation among genetically unrelated individuals is a fundamental aspect of society, but it has been a longstanding puzzle in the biological and social sciences. Recently, theoretical studies in biology and economics showed that conditional cooperation (cooperating only with those who have exhibited cooperative behavior) can spread over a society. Furthermore, experimental studies in psychology demonstrated that people are actually conditional cooperators. In this study, we used functional magnetic resonance imaging to investigate the neural system underlying conditional cooperation by scanning participants during interaction with cooperative, neutral and non-cooperative opponents in prisoner's dilemma games. The results showed that: (i) participants cooperated more frequently with both cooperative and neutral opponents than with non-cooperative opponents; and (ii) a brain area related to cognitive inhibition of prepotent responses (right dorsolateral prefrontal cortex) showed greater activation, especially when participants confronted non-cooperative opponents. Consequently, we suggest that cognitive inhibition of the motivation to cooperate with non-cooperators drives the conditional behavior.
Affiliation(s)
- Shinsuke Suzuki
- Laboratory of Integrated Theoretical Neuroscience, RIKEN Brain Science Institute, Saitama 351-0198, Japan.
55
Curtis CE, Lee D. Beyond working memory: the role of persistent activity in decision making. Trends Cogn Sci 2010; 14:216-22. [PMID: 20381406] [DOI: 10.1016/j.tics.2010.03.006]
Abstract
Since its first discovery in the prefrontal cortex, persistent activity during the interval between a transient sensory stimulus and a subsequent behavioral response has been identified in many cortical and subcortical areas. Such persistent activity is thought to reflect the maintenance of working memory representations that bridge past events with future contingent plans. Indeed, the term persistent activity is sometimes used interchangeably with working memory. In this review, we argue that persistent activity observed broadly across many cortical and subcortical areas reflects not only working memory maintenance, but also a variety of other cognitive processes, including perceptual and reward-based decision making.
Affiliation(s)
- Clayton E Curtis
- Department of Psychology and Center for Neural Science, New York University, 6 Washington Place, New York, NY 10003, USA
56
Thevarajah D, Webb R, Ferrall C, Dorris MC. Modeling the value of strategic actions in the superior colliculus. Front Behav Neurosci 2010; 3:57. [PMID: 20161807] [PMCID: PMC2821176] [DOI: 10.3389/neuro.08.057.2009]
Abstract
In learning models of strategic game play, an agent constructs a valuation (action value) over possible future choices as a function of past actions and rewards. Choices are then stochastic functions of these action values. Our goal is to uncover a neural signal that correlates with the action value posited by behavioral learning models. We measured activity from neurons in the superior colliculus (SC), a midbrain region involved in planning saccadic eye movements, while monkeys performed two saccade tasks. In the strategic task, monkeys competed against a computer in a saccade version of the mixed-strategy game "matching pennies". In the instructed task, saccades were elicited through explicit instruction rather than free choices. In both tasks neuronal activity and behavior were shaped by past actions and rewards, with more recent events exerting a larger influence. Further, SC activity predicted upcoming choices during the strategic task and upcoming reaction times during the instructed task. Finally, we found that neuronal activity in both tasks correlated with an established learning model, the Experience Weighted Attraction model of action valuation (Camerer and Ho, 1999). Collectively, our results provide evidence that action values hypothesized by learning models are represented in the motor planning regions of the brain in a manner that could be used to select strategic actions.
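For reference, a simplified sketch of the Experience Weighted Attraction rule cited in this abstract (Camerer and Ho, 1999), reduced to a two-target matching-pennies setting; the parameter values and payoff coding are placeholders, not the fitted model.

    import numpy as np

    phi, rho, delta, lam = 0.9, 0.9, 0.5, 2.0  # decay rates, forgone-payoff weight, softmax slope
    A = np.zeros(2)                            # attractions (action values) for the two targets
    N = 1.0                                    # experience weight

    def ewa_update(choice, payoffs):
        """Chosen strategies are reinforced by their realized payoff, unchosen ones by
        delta times the payoff they would have earned against the opponent's action."""
        global N
        N_new = rho * N + 1.0
        for j in range(2):
            weight = 1.0 if j == choice else delta
            A[j] = (phi * N * A[j] + weight * payoffs[j]) / N_new
        N = N_new

    def choice_probabilities():
        """Stochastic (logit) choice over the attractions."""
        z = np.exp(lam * A)
        return z / z.sum()

    ewa_update(choice=0, payoffs=np.array([1.0, 0.0]))  # example trial
    print(choice_probabilities())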
Affiliation(s)
- Dhushan Thevarajah
- Department of Physiology, Centre for Neuroscience Studies and Canadian Institutes of Health Research Group in Sensory-Motor Systems, Queen's University Kingston, ON, Canada
57
Abstract
The striatum is thought to play a crucial role in value-based decision making. Although a large body of evidence suggests its involvement in action selection as well as action evaluation, underlying neural processes for these functions of the striatum are largely unknown. To obtain insights on this matter, we simultaneously recorded neuronal activity in the dorsal and ventral striatum of rats performing a dynamic two-armed bandit task, and examined temporal profiles of neural signals related to animal's choice, its outcome, and action value. Whereas significant neural signals for action value were found in both structures before animal's choice of action, signals related to the upcoming choice were relatively weak and began to emerge only in the dorsal striatum approximately 200 ms before the behavioral manifestation of the animal's choice. In contrast, once the animal revealed its choice, signals related to choice and its value increased steeply and persisted until the outcome of animal's choice was revealed, so that some neurons in both structures concurrently conveyed signals related to animal's choice, its outcome, and the value of chosen action. Thus, all the components necessary for updating values of chosen actions were available in the striatum. These results suggest that the striatum not only represents values associated with potential choices before animal's choice of action, but might also update the value of chosen action once its outcome is revealed. In contrast, action selection might take place elsewhere or in the dorsal striatum only immediately before its behavioral manifestation.
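A minimal sketch of the chosen-value update this abstract argues the striatum could support, written as a simple two-armed-bandit learner; the learning rate, inverse temperature, and reward coding are placeholders.

    import math, random

    alpha, beta = 0.2, 3.0  # learning rate and inverse temperature (placeholders)
    Q = [0.5, 0.5]          # action values for the two targets

    def choose():
        """Softmax choice based on the current action values (pre-choice value signals)."""
        p_first = 1.0 / (1.0 + math.exp(-beta * (Q[0] - Q[1])))
        return 0 if random.random() < p_first else 1

    def update_chosen_value(choice, reward):
        """After the outcome is revealed, only the chosen action's value moves toward the
        obtained reward; this requires exactly the signals described as co-occurring in
        striatum: the choice, its outcome, and the value of the chosen action."""
        Q[choice] += alpha * (reward - Q[choice])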
58
Sanabria F, Thrailkill E. Pigeons (Columba livia) approach Nash equilibrium in experimental Matching Pennies competitions. J Exp Anal Behav 2009; 91:169-83. [PMID: 19794832] [DOI: 10.1901/jeab.2009.91-169]
Abstract
The game of Matching Pennies (MP), a simplified version of the more popular Rock, Paper, Scissors, schematically represents competitions between organisms with incentives to predict each other's behavior. Optimal performance in iterated MP competitions involves the production of random choice patterns and the detection of nonrandomness in the opponent's choices. The purpose of this study was to replicate systematic deviations from optimal choice observed in humans when playing MP, and to establish whether suboptimal performance was better described by a modified linear learning model or by a more cognitively sophisticated reinforcement-tracking model. Two pairs of pigeons played iterated MP competitions; payoffs for successful choices (e.g., "Rock" vs. "Scissors") varied within experimental sessions and across experimental conditions, and were signaled by visual stimuli. Pigeons' behavior adjusted to payoff matrices; divergences from optimal play were analogous to those usually demonstrated by humans, except for the tendency of pigeons to persist on prior choices. Suboptimal play was well characterized by a linear learning model of the kind widely used to describe human performance. This linear learning model may thus serve as a default account of competitive performance against which the imputation of cognitively sophisticated processes can be evaluated.
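A hedged sketch of a linear (Bush-Mosteller-style) learning rule of the general kind referred to in this abstract; the learning rate and the specific win/loss targets are illustrative assumptions, not the authors' fitted model.

    alpha = 0.1    # learning rate (placeholder)
    p_first = 0.5  # probability of choosing the first option

    def linear_operator_update(chose_first, rewarded):
        """Linear learning: the choice probability moves a fixed fraction of the way
        toward the chosen option after reinforcement and away from it otherwise."""
        global p_first
        if chose_first:
            target = 1.0 if rewarded else 0.0
        else:
            target = 0.0 if rewarded else 1.0
        p_first = (1.0 - alpha) * p_first + alpha * target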
Affiliation(s)
- Federico Sanabria
- Department of Psychology, Arizona State University, Tempe, Arizona 85287-1104, USA.
59
Abstract
Activity of neurons in the lateral intraparietal cortex (LIP) displays a mixture of sensory, motor, and memory signals. Moreover, these neurons often encode signals reflecting the accumulation of sensory evidence that certain eye movements might lead to a desirable outcome. However, when the environment changes dynamically, animals are also required to combine the information about their previously chosen actions and their outcomes appropriately to continually update the desirabilities of alternative actions. Here, we investigated whether LIP neurons encoded signals necessary to update an animal's decision-making strategies adaptively during a computer-simulated matching-pennies game. Using a reinforcement learning algorithm, we estimated the value functions that best predicted the animal's choices on a trial-by-trial basis. We found that, immediately before the animal revealed its choice, approximately 18% of LIP neurons changed their activity according to the difference in the value functions for the two targets. In addition, a somewhat higher fraction of LIP neurons displayed signals related to the sum of the value functions, which might correspond to the state value function or an average rate of reward used as a reference point. Similar to neurons in the prefrontal cortex, many LIP neurons also encoded signals related to the animal's previous choices. Thus, the posterior parietal cortex might be a part of the network that provides the substrate for forming appropriate associations between actions and outcomes.
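A minimal sketch of the trial-by-trial quantities this abstract relates to LIP activity: value functions updated by a simple reinforcement learning rule, their difference (which drives choice) and their sum (a state-value-like quantity); the parameters are placeholders.

    import math

    alpha, beta = 0.2, 5.0  # learning rate and softmax slope (placeholders)
    Q = [0.5, 0.5]          # value functions for the two targets

    def value_difference_and_sum():
        """The two regressors discussed above: Q difference and Q sum."""
        return Q[0] - Q[1], Q[0] + Q[1]

    def p_choose_first():
        diff, _ = value_difference_and_sum()
        return 1.0 / (1.0 + math.exp(-beta * diff))

    def update(choice, reward):
        """Trial-by-trial update of the chosen target's value function."""
        Q[choice] += alpha * (reward - Q[choice])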
60
Huh N, Jo S, Kim H, Sul JH, Jung MW. Model-based reinforcement learning under concurrent schedules of reinforcement in rodents. Learn Mem 2009; 16:315-23. [PMID: 19403794] [DOI: 10.1101/lm.1295509]
Abstract
Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's knowledge or model of the environment in model-based reinforcement learning algorithms. To investigate how animals update value functions, we trained rats under two different free-choice tasks. The reward probability of the unchosen target remained unchanged in one task, whereas it increased over time since the target was last chosen in the other task. The results show that goal choice probability increased as a function of the number of consecutive alternative choices in the latter, but not the former task, indicating that the animals were aware of time-dependent increases in arming probability and used this information in choosing goals. In addition, the choice behavior in the latter task was better accounted for by a model-based reinforcement learning algorithm. Our results show that rats adopt a decision-making process that cannot be accounted for by simple reinforcement learning models even in a relatively simple binary choice task, suggesting that rats can readily improve their decision-making strategy through the knowledge of their environments.
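The time-dependent arming described in this abstract can be captured with a one-line model-based computation; the arming probability used here is a placeholder, not the task's actual schedule parameter.

    def reward_availability(p_arm, trials_unchosen):
        """Probability that the unchosen target has been armed at least once, given a
        per-trial arming probability and the number of consecutive trials it has gone
        unchosen; a model-based learner can use this as the value of switching."""
        return 1.0 - (1.0 - p_arm) ** trials_unchosen

    for n in range(1, 6):  # value of switching grows with consecutive alternative choices
        print(n, round(reward_availability(0.25, n), 3))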
Affiliation(s)
- Namjung Huh
- Neuroscience Laboratory, Institute for Medical Sciences and Division of Cell Transformation and Restoration, Ajou University School of Medicine, Suwon, Korea
61
Abstract
Game theory outlines optimal response strategies during mixed-strategy competitions. The neural processes involved in choosing individual strategic actions, however, remain poorly understood. Here, we tested whether the superior colliculus (SC), a brain region critical for generating sensory-guided saccades, is also involved in choosing saccades under strategic conditions. Monkeys were free to choose either of two saccade targets as they competed against a computer opponent during the mixed-strategy game "matching pennies." The accuracy with which presaccadic SC activity predicted upcoming choice gradually increased in the time leading up to the saccade. Probing the SC with suprathreshold stimulation demonstrated that these evolving signals were functionally involved in preparing strategic saccades. Finally, subthreshold stimulation of the SC increased the likelihood that contralateral saccades were selected. Together, our results suggest that motor regions of the brain play an active role in choosing strategic actions rather than passively executing those prespecified by upstream executive regions.
62
Abstract
Human behaviors can be more powerfully influenced by conditioned reinforcers, such as money, than by primary reinforcers. Moreover, people often change their behaviors to avoid monetary losses. However, the effect of removing conditioned reinforcers on choices has not been explored in animals, and the neural mechanisms mediating the behavioral effects of gains and losses are not well understood. To investigate the behavioral and neural effects of gaining and losing a conditioned reinforcer, we trained rhesus monkeys for a matching pennies task in which the positive and negative values of its payoff matrix were realized by the delivery and removal of a conditioned reinforcer. Consistent with the findings previously obtained with non-negative payoffs and primary rewards, the animal's choice behavior during this task was nearly optimal. Nevertheless, the gain and loss of a conditioned reinforcer significantly increased and decreased, respectively, the tendency for the animal to choose the same target in subsequent trials. We also found that the neurons in the dorsomedial frontal cortex, dorsal anterior cingulate cortex, and dorsolateral prefrontal cortex often changed their activity according to whether the animal earned or lost a conditioned reinforcer in the current or previous trial. Moreover, many neurons in the dorsomedial frontal cortex also signaled the gain or loss occurring as a result of choosing a particular action as well as changes in the animal's behaviors resulting from such gains or losses. Thus, primate medial frontal cortex might mediate the behavioral effects of conditioned reinforcers and their losses.
63
Kim S, Hwang J, Seo H, Lee D. Valuation of uncertain and delayed rewards in primate prefrontal cortex. Neural Netw 2009; 22:294-304. [PMID: 19375276] [DOI: 10.1016/j.neunet.2009.03.010]
Abstract
Humans and animals often must choose between rewards that differ in their qualities, magnitudes, immediacy, and likelihood, and must estimate these multiple reward parameters from their experience. However, the neural basis for such complex decision making is not well understood. To understand the role of the primate prefrontal cortex in determining the subjective value of delayed or uncertain reward, we examined the activity of individual prefrontal neurons during an inter-temporal choice task and a computer-simulated competitive game. Consistent with the findings from previous studies in humans and other animals, the monkey's behaviors during inter-temporal choice were well accounted for by a hyperbolic discount function. In addition, the activity of many neurons in the lateral prefrontal cortex reflected the signals related to the magnitude and delay of the reward expected from a particular action, and often encoded the difference in temporally discounted values that predicted the animal's choice. During a computerized matching pennies game, the animals approximated the optimal strategy, known as Nash equilibrium, using a reinforcement learning algorithm. We also found that many neurons in the lateral prefrontal cortex conveyed the signals related to the animal's previous choices and their outcomes, suggesting that this cortical area might play an important role in forming associations between actions and their outcomes. These results show that the primate lateral prefrontal cortex plays a central role in estimating the values of alternative actions based on multiple sources of information.
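The hyperbolic discount function referred to in this abstract, in minimal form; the discount parameter k and the example magnitudes and delays are hypothetical.

    def discounted_value(magnitude, delay, k=0.1):
        """Hyperbolic temporal discounting: subjective value falls off as 1/(1 + kD)."""
        return magnitude / (1.0 + k * delay)

    # Example: a larger reward at a 6 s delay versus a smaller reward at a 1 s delay.
    print(discounted_value(2.0, 6.0), discounted_value(1.0, 1.0))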
Affiliation(s)
- Soyoun Kim
- Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06510, USA
64
Ishida F, Sasaki T, Sakaguchi Y, Shimai H. Reinforcement-learning agents with different temperature parameters explain the variety of human action-selection behavior in a Markov decision process task. Neurocomputing 2009. [DOI: 10.1016/j.neucom.2008.04.009]
65
Jensen G, Neuringer A. Choice as a function of reinforcer "hold": from probability learning to concurrent reinforcement. J Exp Psychol Anim Behav Process 2008; 34:437-60. [PMID: 18954229] [DOI: 10.1037/0097-7403.34.4.437]
Abstract
Two procedures commonly used to study choice are concurrent reinforcement and probability learning. Under concurrent-reinforcement procedures, once a reinforcer is scheduled, it remains available indefinitely until collected. Therefore reinforcement becomes increasingly likely with passage of time or responses on other operanda. Under probability learning, reinforcer probabilities are constant and independent of passage of time or responses. Therefore a particular reinforcer is gained or not, on the basis of a single response, and potential reinforcers are not retained, as when betting at a roulette wheel. In the "real" world, continued availability of reinforcers often lies between these two extremes, with potential reinforcers being lost owing to competition, maturation, decay, and random scatter. The authors parametrically manipulated the likelihood of continued reinforcer availability, defined as hold, and examined the effects on pigeons' choices. Choices varied as power functions of obtained reinforcers under all values of hold. Stochastic models provided generally good descriptions of choice emissions with deviations from stochasticity systematically related to hold. Thus, a single set of principles accounted for choices across hold values that represent a wide range of real-world conditions.
Affiliation(s)
- Greg Jensen
- Psychology Department, Reed College, Portland, OR 97202, USA
66
Krueger F, Grafman J, McCabe K. Neural correlates of economic game playing. Philos Trans R Soc Lond B Biol Sci 2008; 363:3859-74. [PMID: 18829425] [PMCID: PMC2581786] [DOI: 10.1098/rstb.2008.0165]
Abstract
The theory of games provides a mathematical formalization of strategic choices, which have been studied in both economics and neuroscience, and have more recently become the focus of neuroeconomics experiments with human and non-human actors. This paper reviews the results from a number of game experiments that establish a unitary system for forming subjective expected utility maps in the brain, and acting on these maps to produce choices. Social situations require the brain to build an understanding of the other person using neuronal mechanisms that share affective and intentional mental states. These systems allow subjects to better predict other players' choices, and allow them to modify their subjective utility maps to value pro-social strategies. New results for a trust game are presented, which show that the trust relationship includes systems common to both trusting and trustworthy behaviour, but they also show that the relative temporal positions of the first and second players require computations unique to each role.
Affiliation(s)
- Frank Krueger
- Cognitive Neuroscience Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892-1440, USA
67
Seo H, Lee D. Cortical mechanisms for reinforcement learning in competitive games. Philos Trans R Soc Lond B Biol Sci 2008; 363:3845-57. [PMID: 18829430] [DOI: 10.1098/rstb.2008.0158]
Abstract
Game theory analyses optimal strategies for multiple decision makers interacting in a social group. However, the behaviours of individual humans and animals often deviate systematically from the optimal strategies described by game theory. The behaviours of rhesus monkeys (Macaca mulatta) in simple zero-sum games showed similar patterns, but their departures from the optimal strategies were well accounted for by a simple reinforcement-learning algorithm. During a computer-simulated zero-sum game, neurons in the dorsolateral prefrontal cortex often encoded the previous choices of the animal and its opponent as well as the animal's reward history. By contrast, the neurons in the anterior cingulate cortex predominantly encoded the animal's reward history. Using simple competitive games, therefore, we have demonstrated functional specialization between different areas of the primate frontal cortex involved in outcome monitoring and action selection. Temporally extended signals related to the animal's previous choices might facilitate the association between choices and their delayed outcomes, whereas information about the choices of the opponent might be used to estimate the reward expected from a particular action. Finally, signals related to the reward history might be used to monitor the overall success of the animal's current decision-making strategy.
Affiliation(s)
- Hyojung Seo
- Department of Neurobiology, Yale University School of Medicine, 333 Cedar Street, SHM B404, New Haven, CT 06510, USA
68
Gold JI, Law CT, Connolly P, Bennur S. The relative influences of priors and sensory evidence on an oculomotor decision variable during perceptual learning. J Neurophysiol 2008; 100:2653-68. [PMID: 18753326] [PMCID: PMC2585410] [DOI: 10.1152/jn.90629.2008]
Abstract
Choice behavior on simple sensory-motor tasks can exhibit trial-to-trial dependencies. For perceptual tasks, these dependencies reflect the influence of prior trials on choices that are also guided by sensory evidence, which is often independent across trials. Here we show that the relative influences of prior trials and sensory evidence on choice behavior can be shaped by training, such that prior influences are strongest when perceptual sensitivity to the relevant sensory evidence is weakest and then decline steadily as sensitivity improves. We trained monkeys to decide the direction of random-dot motion and indicate their decision with an eye movement. We characterized sequential dependencies by relating current choices to weighted averages of prior choices. We then modeled behavior as a drift-diffusion process, in which the weighted average of prior choices provided an additive offset to a decision variable that integrated incoming motion evidence to govern choice. The average magnitude of offset within individual training sessions declined steadily as the quality of the integrated motion evidence increased over many months of training. The trial-by-trial magnitude of offset was correlated with signals related to developing commands that generate the oculomotor response but not with neural activity in either the middle temporal area, which represents information about the motion stimulus, or the lateral intraparietal area, which represents the sensory-motor conversion. The results suggest that training can shape the relative contributions of expectations based on prior trends and incoming sensory evidence to select and prepare visually guided actions.
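A minimal sketch of the behavioral model described here: a drift-diffusion decision variable whose starting point is offset by a weighted average of prior choices. All parameter values, the weighting scheme, and the coherence coding are illustrative assumptions, not the fitted model.

    import numpy as np

    def weighted_prior_choices(choices, w=0.5):
        """Exponentially weighted average of prior choices (+1/-1, oldest first),
        with the most recent choice weighted most heavily."""
        weights = w ** np.arange(len(choices) - 1, -1, -1)
        return float(np.dot(weights, choices) / weights.sum())

    def drift_diffusion_choice(coherence, prior_offset, drift_gain=1.0, bound=1.0,
                               noise_sd=1.0, dt=0.01, rng=np.random.default_rng()):
        """Accumulate noisy motion evidence to a bound; the prior-based offset shifts
        the starting point of the decision variable."""
        dv = prior_offset
        while abs(dv) < bound:
            dv += drift_gain * coherence * dt + noise_sd * np.sqrt(dt) * rng.normal()
        return 1 if dv >= bound else -1

    past = [1, 1, -1, 1]  # hypothetical recent choices
    print(drift_diffusion_choice(0.1, 0.2 * weighted_prior_choices(past)))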
Affiliation(s)
- Joshua I Gold
- University of Pennsylvania, Department of Neuroscience, 116 Johnson Pavilion, 3610 Hamilton Walk, Philadelphia, PA 19104-6074, USA.
69
Cohen MX, Frank MJ. Neurocomputational models of basal ganglia function in learning, memory and choice. Behav Brain Res 2008; 199:141-56. [PMID: 18950662] [DOI: 10.1016/j.bbr.2008.09.029]
Abstract
The basal ganglia (BG) are critical for the coordination of several motor, cognitive, and emotional functions and become dysfunctional in several pathological states ranging from Parkinson's disease to schizophrenia. Here we review principles developed within a neurocomputational framework of the BG and related circuitry which provide insights into their functional roles in behavior. We focus on two classes of models: those that incorporate aspects of biological realism and are constrained by functional principles, and more abstract mathematical models focusing on the higher-level computational goals of the BG. While the former are arguably more "realistic", the latter have a complementary advantage in being able to describe functional principles of how the system works in a relatively simple set of equations, but are less suited to making specific hypotheses about the roles of specific nuclei and neurophysiological processes. We review the basic architecture and assumptions of these models, their relevance to our understanding of the neurobiological and cognitive functions of the BG, and provide an update on the potential roles of biological details not explicitly incorporated in existing models. Empirical studies ranging from those in transgenic mice to dopaminergic manipulation, deep brain stimulation, and genetics in humans largely support model predictions and provide the basis for further refinement. Finally, we discuss possible future directions and ways to integrate different types of models.
Affiliation(s)
- Michael X Cohen
- Department of Psychology, Program in Neuroscience, University of Arizona, 1503 E University Blvd, Tucson, AZ 85721, United States
70
Vickery TJ, Jiang YV. Inferior parietal lobule supports decision making under uncertainty in humans. Cereb Cortex 2008; 19:916-25. [DOI: 10.1093/cercor/bhn140]
71
Mikulić A, Dorris MC. Temporal and spatial allocation of motor preparation during a mixed-strategy game. J Neurophysiol 2008; 100:2101-8. [PMID: 18667538] [DOI: 10.1152/jn.90703.2008]
Abstract
Adopting a mixed response strategy in competitive situations can prevent opponents from exploiting predictable play. What drives stochastic action selection is unclear given that choice patterns suggest that, on average, players are indifferent to available options during mixed-strategy equilibria. To gain insight into this stochastic selection process, we examined how motor preparation was allocated during a mixed-strategy game. If selection processes on each trial reflect a global indifference between options, then there should be no bias in motor preparation (unbiased preparation hypothesis). If, however, differences exist in the desirability of options on each trial then motor preparation should be biased toward the preferred option (biased preparation hypothesis). We tested between these alternatives by examining how saccade preparation was allocated as human subjects competed against an adaptive computer opponent in an oculomotor version of the game "matching pennies." Subjects were free to choose between two visual targets using a saccadic eye movement. Saccade preparation was probed by occasionally flashing a visual distractor at a range of times preceding target presentation. The probability that a distractor would evoke a saccade error, and when it failed to do so, the probability of choosing each of the subsequent targets quantified the temporal and spatial evolution of saccade preparation, respectively. Our results show that saccade preparation became increasingly biased as the time of target presentation approached. Specifically, the spatial locus to which saccade preparation was directed varied from trial to trial, and its time course depended on task timing.
Affiliation(s)
- Areh Mikulić
- Department of Physiology, Centre for Neuroscience Studies, Canadian Institutes of Health Research Group in Sensory Motor Systems, Queen's University, Botterell Hall, Rm. 440, Kingston K7L 3N6, ON, Canada
72
Abstract
Decision making in a social group has two distinguishing features. First, humans and other animals routinely alter their behavior in response to changes in their physical and social environment. As a result, the outcomes of decisions that depend on the behavior of multiple decision makers are difficult to predict and require highly adaptive decision-making strategies. Second, decision makers may have preferences regarding consequences to other individuals and therefore choose their actions to improve or reduce the well-being of others. Many neurobiological studies have exploited game theory to probe the neural basis of decision making and suggested that these features of social decision making might be reflected in the functions of brain areas involved in reward evaluation and reinforcement learning. Molecular genetic studies have also begun to identify genetic mechanisms for personal traits related to reinforcement learning and complex social decision making, further illuminating the biological basis of social behavior.
Affiliation(s)
- Daeyeol Lee
- Yale University School of Medicine, Department of Neurobiology, 333 Cedar Street, SHM B404, New Haven, Connecticut 06510, USA.
73
Kim H, Lee D, Shin YM, Chey J. Impaired strategic decision making in schizophrenia. Brain Res 2007; 1180:90-100. [PMID: 17905200] [DOI: 10.1016/j.brainres.2007.08.049]
Abstract
Adaptive decision making in dynamic social settings requires frequent re-evaluation of choice outcomes and revision of strategies. This requires an array of multiple cognitive abilities, such as working memory and response inhibition. Thus, the disruption of such abilities in schizophrenia can have significant implications for social dysfunctions in affected patients. In the present study, 20 schizophrenia patients and 20 control subjects completed two computerized binary decision-making tasks. In the first task, the participants played a competitive zero-sum game against a computer in which the predictable choice behavior was penalized and the optimal strategy was to choose the two targets stochastically. In the second task, the expected payoffs of the two targets were fixed and unaffected by the subject's choices, so the optimal strategy was to choose the target with the higher expected payoff exclusively. The schizophrenia patients earned significantly less money during the first task, even though their overall choice probabilities were not significantly different from the control subjects. This was mostly because patients were impaired in integrating the outcomes of their previous choices appropriately in order to maintain the optimal strategy. During the second task, the choices of patients and control subjects displayed more similar patterns. This study elucidated the specific components in strategic decision making that are impaired in schizophrenia. The deficit, which can be characterized as strategic stiffness, may have implications for the poor social adjustment in schizophrenia patients.
Affiliation(s)
- Hyojin Kim
- Department of Psychology, Seoul National University, San 56-1 Shillim-dong Kwanak-gu, Seoul 151-742, Republic of Korea
74
Abstract
By combining the models and tasks of Game Theory with modern psychological and neuroscientific methods, the neuroeconomic approach to the study of social decision-making has the potential to extend our knowledge of brain mechanisms involved in social decisions and to advance theoretical models of how we make decisions in a rich, interactive environment. Research has already begun to illustrate how social exchange can act directly on the brain's reward system, how affective factors play an important role in bargaining and competitive games, and how the ability to assess another's intentions is related to strategic play. These findings provide a fruitful starting point for improved models of social decision-making, informed by the formal mathematical approach of economics and constrained by known neural mechanisms.
Affiliation(s)
- Alan G Sanfey
- Department of Psychology, University of Arizona, 1503 East University Boulevard, Tucson, AZ 85721, USA.
75
Seo H, Lee D. Temporal filtering of reward signals in the dorsal anterior cingulate cortex during a mixed-strategy game. J Neurosci 2007; 27:8366-77. [PMID: 17670983] [PMCID: PMC2413179] [DOI: 10.1523/jneurosci.2369-07.2007]
Abstract
The process of decision making in humans and other animals is adaptive and can be tuned through experience so as to optimize the outcomes of their choices in a dynamic environment. Previous studies have demonstrated that the anterior cingulate cortex plays an important role in updating the animal's behavioral strategies when the action outcome contingencies change. Moreover, neurons in the anterior cingulate cortex often encode the signals related to expected or actual reward. We investigated whether reward-related activity in the anterior cingulate cortex is affected by the animal's previous reward history. This was tested in rhesus monkeys trained to make binary choices in a computer-simulated competitive zero-sum game. The animal's choice behavior was relatively close to the optimal strategy but also revealed small systematic biases that are consistent with the use of a reinforcement learning algorithm. In addition, the activity of neurons in the dorsal anterior cingulate cortex that was related to the reward received by the animal in a given trial often was modulated by the rewards in the previous trials. Some of these neurons encoded the rate of rewards in previous trials, whereas others displayed activity modulations more closely related to the reward prediction errors. In contrast, signals related to the animal's choices were represented only weakly in this cortical area. These results suggest that neurons in the dorsal anterior cingulate cortex might be involved in the subjective evaluation of choice outcomes based on the animal's reward history.
Affiliation(s)
- Hyojung Seo
- Department of Neurobiology, Yale University School of Medicine, New Haven, Connecticut 06510
- Daeyeol Lee
- Department of Neurobiology, Yale University School of Medicine, New Haven, Connecticut 06510
76
Lee D, Rushworth MFS, Walton ME, Watanabe M, Sakagami M. Functional specialization of the primate frontal cortex during decision making. J Neurosci 2007; 27:8170-3. [PMID: 17670961] [PMCID: PMC2413178] [DOI: 10.1523/jneurosci.1561-07.2007]
Abstract
Economic theories of decision making are based on the principle of utility maximization, and reinforcement-learning theory provides computational algorithms that can be used to estimate the overall reward expected from alternative choices. These formal models not only account for a large range of behavioral observations in human and animal decision makers, but also provide useful tools for investigating the neural basis of decision making. Nevertheless, in reality, decision makers must combine different types of information about the costs and benefits associated with each available option, such as the quality and quantity of expected reward and required work. In this article, we put forward the hypothesis that different subdivisions of the primate frontal cortex may be specialized to focus on different aspects of dynamic decision-making processes. In this hypothesis, the lateral prefrontal cortex is primarily involved in maintaining the state representation necessary to identify optimal actions in a given environment. In contrast, the orbitofrontal cortex and the anterior cingulate cortex might be primarily involved in encoding and updating the utilities associated with different sensory stimuli and alternative actions, respectively. These cortical areas are also likely to contribute to decision making in a social context.
Affiliation(s)
- Daeyeol Lee
- Department of Neurobiology, Yale University School of Medicine, New Haven, Connecticut 06510, USA.
77
Seo H, Barraclough DJ, Lee D. Dynamic signals related to choices and outcomes in the dorsolateral prefrontal cortex. Cereb Cortex 2007; 17 Suppl 1:i110-7. [PMID: 17548802] [DOI: 10.1093/cercor/bhm064]
Abstract
Although economic theories based on utility maximization account for a range of choice behaviors, utilities must be estimated through experience. Dynamics of this learning process may account for certain discrepancies between the predictions of economic theories and real choice behaviors of humans and other animals. To understand the neural mechanisms responsible for such adaptive decision making, we trained rhesus monkeys to play a simulated matching pennies game. Small but systematic deviations of the animal's behavior from the optimal strategy were consistent with the predictions of reinforcement learning theory. In addition, individual neurons in the dorsolateral prefrontal cortex (DLPFC) encoded 3 different types of signals that can potentially influence the animal's future choices. First, activity modulated by the animal's previous choices might provide the eligibility trace that can be used to attribute a particular outcome to its causative action. Second, activity related to the animal's rewards in the previous trials might be used to compute an average reward rate. Finally, activity of some neurons was modulated by the computer's choices in the previous trials and may reflect the process of updating the value functions. These results suggest that the DLPFC might be an important node in the cortical network of decision making.
Affiliation(s)
- Hyojung Seo
- Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06510, USA
78
Doya K. Reinforcement learning: computational theory and biological mechanisms. HFSP J 2007; 1:30-40. [PMID: 19404458] [DOI: 10.2976/1.2732246]
Abstract
Reinforcement learning is a computational framework for an active agent to learn behaviors on the basis of a scalar reward signal. The agent can be an animal, a human, or an artificial system such as a robot or a computer program. The reward can be food, water, money, or any other measure of the agent's performance. The theory of reinforcement learning, which was developed in the artificial intelligence community with intuitions from animal learning theory, now gives a coherent account of the function of the basal ganglia. It also serves as the "common language" in which biologists, engineers, and social scientists can exchange their problems and findings. This article reviews the basic theoretical framework of reinforcement learning and discusses its recent and future contributions toward the understanding of animal behaviors and human decision making.
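For reference, the temporal-difference update at the core of the framework reviewed here, in a minimal form; the discount factor, learning rate, and state coding are placeholders.

    gamma, alpha = 0.95, 0.1  # discount factor and learning rate (placeholders)
    V = {}                    # state-value estimates

    def td0_update(state, reward, next_state):
        """TD(0) learning: move V(state) toward the reward plus the discounted value of
        the next state; the error term delta is the reward prediction error thought to
        be reflected in basal ganglia dopamine signals."""
        delta = reward + gamma * V.get(next_state, 0.0) - V.get(state, 0.0)
        V[state] = V.get(state, 0.0) + alpha * delta
        return delta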
Affiliation(s)
- Kenji Doya
- Neural Computation Unit, Okinawa Institute of Science and Technology, 12-22 Suzaki, Uruma, Okinawa 904-2234, Japan
79
Doya K. Reinforcement learning: computational theory and biological mechanisms. HFSP J 2007. [PMID: 19404458] [DOI: 10.2976/1.2732246]
80
Kawato M, Samejima K. Efficient reinforcement learning: computational theories, neuroscience and robotics. Curr Opin Neurobiol 2007; 17:205-12. [PMID: 17374483] [DOI: 10.1016/j.conb.2007.03.004]
Abstract
Reinforcement learning algorithms have provided some of the most influential computational theories for behavioral learning that depends on reward and penalty. After briefly reviewing supporting experimental data, this paper tackles three difficult theoretical issues that remain to be explored. First, plain reinforcement learning is much too slow to be considered a plausible brain model. Second, although the temporal-difference error has an important role both in theory and in experiments, how to compute it remains an enigma. Third, the function of all brain areas, including the cerebral cortex, cerebellum, brainstem and basal ganglia, seems to necessitate a new computational framework. Computational studies that emphasize meta-parameters, hierarchy, modularity and supervised learning to resolve these issues are reviewed here, together with the related experimental data.
Affiliation(s)
- Mitsuo Kawato
- ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan.
81
Lee D, Seo H. Mechanisms of reinforcement learning and decision making in the primate dorsolateral prefrontal cortex. Ann N Y Acad Sci 2007; 1104:108-22. [PMID: 17347332] [DOI: 10.1196/annals.1390.007]
Abstract
To a first approximation, decision making is a process of optimization in which the decision maker tries to maximize the desirability of the outcomes resulting from chosen actions. Estimates of desirability are referred to as utilities or value functions, and they must be continually revised through experience according to the discrepancies between the predicted and obtained rewards. Reinforcement learning theory prescribes various algorithms for updating value functions and can parsimoniously account for the results of numerous behavioral, neurophysiological, and imaging studies in humans and other primates. In this article, we first discuss relative merits of various decision-making tasks used in neurophysiological studies of decision making in nonhuman primates. We then focus on how reinforcement learning theory can shed new light on the function of the primate dorsolateral prefrontal cortex. Similar to the findings from other brain areas, such as cingulate cortex and basal ganglia, activity in the dorsolateral prefrontal cortex often signals the value of expected reward and actual outcome. Thus, the dorsolateral prefrontal cortex is likely to be a part of the broader network involved in adaptive decision making. In addition, reward-related activity in the dorsolateral prefrontal cortex is influenced by the animal's choices and other contextual information, and therefore may provide a neural substrate by which the animals can flexibly modify their decision-making strategies according to the demands of specific tasks.
Affiliation(s)
- Daeyeol Lee
- Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06510, USA.
82
Soltani A, Lee D, Wang XJ. Neural mechanism for stochastic behaviour during a competitive game. Neural Netw 2006; 19:1075-90. [PMID: 17015181] [PMCID: PMC1752206] [DOI: 10.1016/j.neunet.2006.05.044]
Abstract
Previous studies have shown that non-human primates can generate highly stochastic choice behaviour, especially when this is required during a competitive interaction with another agent. To understand the neural mechanism of such dynamic choice behaviour, we propose a biologically plausible model of decision making endowed with synaptic plasticity that follows a reward-dependent stochastic Hebbian learning rule. This model constitutes a biophysical implementation of reinforcement learning, and it reproduces salient features of behavioural data from an experiment with monkeys playing a matching pennies game. Due to interaction with an opponent and learning dynamics, the model generates quasi-random behaviour robustly in spite of intrinsic biases. Furthermore, non-random choice behaviour can also emerge when the model plays against a non-interactive opponent, as observed in the monkey experiment. Finally, when combined with a meta-learning algorithm, our model accounts for the slow drift in the animal's strategy based on a process of reward maximization.
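A heavily simplified sketch of the kind of reward-dependent plasticity rule this abstract describes; the published model is a spiking network with binary stochastic synapses, whereas here the rule is reduced to two scalar synaptic strengths with placeholder parameters.

    import math, random

    alpha = 0.1    # plasticity rate (placeholder)
    c = [0.5, 0.5] # mean synaptic strengths onto the two choice populations

    def choose():
        """Choice probability follows a soft comparison of the two synaptic strengths."""
        p_first = 1.0 / (1.0 + math.exp(-10.0 * (c[0] - c[1])))
        return 0 if random.random() < p_first else 1

    def reward_dependent_plasticity(choice, rewarded):
        """Synapses onto the chosen population are potentiated after reward and
        depressed after non-reward (a reward-gated, Hebbian-like stochastic rule)."""
        if rewarded:
            c[choice] += alpha * (1.0 - c[choice])
        else:
            c[choice] -= alpha * c[choice]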
Affiliation(s)
- Alireza Soltani
- Department of Physics and Volen Center for Complex Systems, Brandeis University, Waltham, MA 02454, USA.
83
Abstract
We studied the choice behavior of 2 monkeys in a discrete-trial task with reinforcement contingencies similar to those Herrnstein (1961) used when he described the matching law. In each session, the monkeys experienced blocks of discrete trials at different relative-reinforcer frequencies or magnitudes with unsignalled transitions between the blocks. Steady-state data following adjustment to each transition were well characterized by the generalized matching law; response ratios undermatched reinforcer frequency ratios but matched reinforcer magnitude ratios. We modelled response-by-response behavior with linear models that used past reinforcers as well as past choices to predict the monkeys' choices on each trial. We found that more recently obtained reinforcers more strongly influenced choice behavior. Perhaps surprisingly, we also found that the monkeys' actions were influenced by the pattern of their own past choices. It was necessary to incorporate both past reinforcers and past choices in order to accurately capture steady-state behavior as well as the fluctuations during block transitions and the response-by-response patterns of behavior. Our results suggest that simple reinforcement learning models must account for the effects of past choices to accurately characterize behavior in this task, and that models with these properties provide a conceptual tool for studying how both past reinforcers and past choices are integrated by the neural systems that generate behavior.
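The generalized matching law used to characterize steady-state behavior in this entry, in minimal form; the sensitivity and bias values below are placeholders rather than the fitted estimates.

    def generalized_matching(R1, R2, sensitivity=0.8, bias=1.0):
        """Generalized matching law: log(B1/B2) = a*log(R1/R2) + log(b), i.e. the
        response ratio is b*(R1/R2)**a; a < 1 corresponds to the undermatching of
        reinforcer frequency ratios described above."""
        return bias * (R1 / R2) ** sensitivity

    print(generalized_matching(3.0, 1.0))  # predicted response ratio for a 3:1 reinforcer ratio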
Affiliation(s)
- Brian Lau
- Center for Neural Science, New York University, New York, New York 10003, USA.
84
Sohn JW, Lee D. Effects of reward expectancy on sequential eye movements in monkeys. Neural Netw 2006; 19:1181-91. [PMID: 16935467] [DOI: 10.1016/j.neunet.2006.04.005]
Abstract
Desirability of an action, often referred to as utility or value, is determined by various factors, such as the probability and timing of expected reward. We investigated how performance of monkeys in an oculomotor serial reaction time task is influenced by multiple motivational factors. The animals produced a series of visually-guided eye movements, while the sequence of target locations and the location of the rewarded target were systematically manipulated. The results show that error rates as well as saccade latencies were consistently influenced by the number of remaining movements necessary to obtain a reward. In addition, when the animal produced multiple saccades before fixating a given target, the first saccade tended to be directed towards the rewarded location, suggesting that saccades to rewarded location and visual target might be programmed concurrently. These results show that monkeys can utilize information about the required sequence of movements to update their subjective values.
Affiliation(s)
- Jeong-woo Sohn
- Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY 14627, USA.
|
85
|
Kennerley SW, Walton ME, Behrens TEJ, Buckley MJ, Rushworth MFS. Optimal decision making and the anterior cingulate cortex. Nat Neurosci 2006; 9:940-7. [PMID: 16783368 DOI: 10.1038/nn1724] [Citation(s) in RCA: 633] [Impact Index Per Article: 35.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2006] [Accepted: 05/23/2006] [Indexed: 11/08/2022]
Abstract
Learning the value of options in an uncertain environment is central to optimal decision making. The anterior cingulate cortex (ACC) has been implicated in using reinforcement information to control behavior. Here we demonstrate that the ACC's critical role in reinforcement-guided behavior is neither in detecting nor in correcting errors, but in guiding voluntary choices based on the history of actions and outcomes. ACC lesions did not impair the performance of monkeys (Macaca mulatta) immediately after errors, but made them unable to sustain rewarded responses in a reinforcement-guided choice task and to integrate risk and payoff in a dynamic foraging task. These data suggest that the ACC is essential for learning the value of actions.
Affiliation(s)
- Steven W Kennerley
- Department of Experimental Psychology, South Parks Road, Oxford OX1 3UD, UK.
|
86
|
Lee D, Schieber MH. Serial correlation in lateralized choices of hand and target. Exp Brain Res 2006; 174:499-509. [PMID: 16715180 DOI: 10.1007/s00221-006-0481-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2006] [Accepted: 03/29/2006] [Indexed: 10/24/2022]
Abstract
We investigated how lateralized choices of hand and target are influenced by previous behavior. Three monkeys retrieved food pellets following cues indicating the location of available food pellet targets and the hand that could be used to acquire a target. In pseudo-randomized trials, the monkeys could retrieve food pellet targets only on their right side, only on their left side, or their choice of either side, using only their right hand, only their left hand, or their choice of either hand. We examined separately the patterns of serial correlation in target choices and hand choices. Although individual monkeys showed overall laterality preferences, we found that, rather than repeatedly using the preferred hand, they tended to switch hands in successive trials. This serial correlation in hand choice was stronger and more robust than the serial correlation in target choice. Furthermore, the pattern of serial correlation for target choice closely resembled that for hand choice when the animal was allowed to choose both target and hand, but only when the target cue was presented before the hand cue. These results suggest that when cued to choose a hand first, the monkeys tended to make a separate decision about whether or not to switch their target choice, whereas their decisions to switch hands and targets were linked more tightly when the animals were cued to choose a target first.
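The switching tendencies reported here can be summarized by the lag-1 serial correlation of a signed choice sequence; the sketch below (with simulation parameters assumed, not taken from the study) shows how a bias toward alternating hands appears as a negative lag-1 correlation.

```python
import numpy as np

def lag1_serial_correlation(choices):
    """Lag-1 autocorrelation of a sequence of +1/-1 choices."""
    x = np.asarray(choices, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return float(np.dot(x[:-1], x[1:]) / denom) if denom > 0 else 0.0

rng = np.random.default_rng(1)
# Simulated hand choices with a switching bias: repeat the previous hand
# with probability 0.3, otherwise switch (probabilities assumed).
hand = [1]
for _ in range(499):
    hand.append(hand[-1] if rng.random() < 0.3 else -hand[-1])
print(lag1_serial_correlation(hand))   # negative value: alternation tendency
```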
Affiliation(s)
- Daeyeol Lee
- Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY 14642, USA.
|
87
|
Lee D. Neural basis of quasi-rational decision making. Curr Opin Neurobiol 2006; 16:191-8. [PMID: 16531040 DOI: 10.1016/j.conb.2006.02.001] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2005] [Accepted: 02/27/2006] [Indexed: 01/22/2023]
Abstract
Standard economic theories conceive of homo economicus as a rational decision maker capable of maximizing utility. In reality, however, people tend to approximate optimal decision-making strategies through a collection of heuristic routines. Some of these routines are driven by emotional processes, and others are adjusted iteratively through experience. In addition, routines specialized for social decision making, such as inference about the mental states of other decision makers, might share their origins and neural mechanisms with the ability to simulate or imagine outcomes expected from alternative actions that an individual can take. A recent surge of collaborations across economics, psychology and neuroscience has provided new insights into how such multiple elements of decision making interact in the brain.
Affiliation(s)
- Daeyeol Lee
- Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY 14627, USA.
|
88
|
|
89
|
Lee D, McGreevy BP, Barraclough DJ. Learning and decision making in monkeys during a rock-paper-scissors game. Brain Res Cogn Brain Res 2005; 25:416-30. [PMID: 16095886 DOI: 10.1016/j.cogbrainres.2005.07.003] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2005] [Revised: 05/27/2005] [Accepted: 07/12/2005] [Indexed: 10/25/2022]
Abstract
Game theory provides a solution to the problem of finding a set of optimal decision-making strategies in a group. However, people seldom play such optimal strategies and instead adjust their strategies based on their experience. Accordingly, many theories postulate a set of variables related to the probabilities of choosing various strategies and describe how such variables are dynamically updated. In reinforcement learning, these value functions are updated based on the outcome of the player's choice, whereas belief learning allows the value functions of all available choices to be updated according to the choices of other players. We investigated the nature of the learning process in monkeys playing a competitive game with ternary choices, a rock-paper-scissors game. During the baseline condition, in which the computer selected its targets randomly, each animal displayed biases towards some targets. When the computer exploited the pattern of the animal's choice sequence but not its reward history, the animal's choice was still systematically biased by the previous choice of the computer. This bias was reduced when the computer exploited both the choice and reward histories of the animal. Compared to simple models of reinforcement learning or belief learning, these adaptive processes were better described by a model that incorporated the features of both. These results suggest that stochastic decision-making strategies in primates during social interactions might be adjusted according to both actual and hypothetical payoffs.
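A minimal sketch of the contrast drawn in this abstract, with the payoff matrix written out but the update form, learning rates, and softmax choice rule assumed rather than taken from the paper: reinforcement learning updates only the chosen action with its actual payoff, belief learning also updates unchosen actions with the hypothetical payoffs they would have earned against the opponent's observed choice, and a hybrid weights the two.

```python
import numpy as np

ACTIONS = ["rock", "paper", "scissors"]

def payoff(own, opp):
    """Rock-paper-scissors payoff: +1 win, 0 tie, -1 loss."""
    if own == opp:
        return 0.0
    wins = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}
    return 1.0 if (own, opp) in wins else -1.0

def softmax(values, beta=3.0):
    z = np.exp(beta * (values - np.max(values)))
    return z / z.sum()

def hybrid_update(values, chosen, opp_choice, alpha=0.2, kappa=0.5):
    """kappa = 0 -> pure reinforcement learning (actual payoff only);
    kappa = 1 -> pure belief learning (hypothetical payoffs weighted equally)."""
    for i, a in enumerate(ACTIONS):
        hypothetical = payoff(a, opp_choice)
        if a == chosen:
            values[i] += alpha * (hypothetical - values[i])          # actual payoff of the chosen action
        else:
            values[i] += kappa * alpha * (hypothetical - values[i])  # hypothetical payoff of an unchosen action
    return values

# One illustrative trial against an opponent that happened to play rock.
rng = np.random.default_rng(2)
values = np.zeros(3)
choice = ACTIONS[int(rng.choice(3, p=softmax(values)))]
values = hybrid_update(values, choice, "rock")
print(choice, values)
```

Intermediate values of kappa correspond to the kind of hybrid model that, according to the abstract, described the monkeys' adaptive behavior better than either pure strategy.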
Affiliation(s)
- Daeyeol Lee
- Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY 14627, USA.
|