51
Abstract
Unidirectional connections from the cortex to the matrix of the corpus striatum initiate the cortico-basal ganglia (BG)-thalamocortical loop, thought to be important in momentary action selection and in longer-term fine tuning of behavioural repertoire; a discrete set of striatal compartments, striosomes, has the complementary role of registering or anticipating reward that shapes corticostriatal plasticity. Re-entrant signals traversing the cortico-BG loop impact predominantly frontal cortices, conveyed through topographically ordered output channels; by contrast, striatal input signals originate from a far broader span of cortex, and are far more divergent in their termination. The term 'disclosed loop' is introduced to describe this organisation: a closed circuit that is open to outside influence at the initial stage of cortical input. The closed circuit component of corticostriatal afferents is newly dubbed 'operative', as it is proposed to establish the bid for action selection on the part of an incipient cortical action plan; the broader set of converging corticostriatal afferents is described as contextual. A corollary of this proposal is that every unit of the striatal volume, including the long, C-shaped tail of the caudate nucleus, should receive a mandatory component of operative input, and hence include at least one area of BG-recipient cortex amongst the sources of its corticostriatal afferents. Individual operative afferents contact twin classes of GABAergic striatal projection neuron (SPN), distinguished by their neurochemical character, and onward circuitry. This is the basis of the classic direct and indirect pathway model of the cortico-BG loop. Each pathway utilises a serial chain of inhibition, with two such links, or three, providing positive and negative feedback, respectively. Operative co-activation of direct and indirect SPNs is, therefore, pictured to simultaneously promote action, and to restrain it. 
The balance of this rival activity is determined by the contextual inputs, which summarise the external and internal sensory environment, and the state of ongoing behavioural priorities. Notably, the distributed sources of contextual convergence upon a striatal locus mirror the transcortical network harnessed by the origin of the operative input to that locus, thereby capturing a similar set of contingencies relevant to determining action. The disclosed loop formulation of corticostriatal and subsequent BG loop circuitry, as advanced here, refines the operating rationale of the classic model and allows the integration of more recent anatomical and physiological data, some of which can appear at variance with the classic model. Equally, it provides a lucid functional context for continuing cellular studies of SPN biophysics and mechanisms of synaptic plasticity.
52
Pedroarena-Leal N, Ruge D. Cerebellar neurophysiology in Gilles de la Tourette syndrome and its role as a target for therapeutic intervention. J Neuropsychol 2015;11:327-346. doi:10.1111/jnp.12091
Affiliation(s)
- Nicole Pedroarena-Leal
- Sobell Department of Motor Neuroscience and Movement Disorders; UCL-Institute of Neurology; University College London; UK
- Diane Ruge
- Sobell Department of Motor Neuroscience and Movement Disorders; UCL-Institute of Neurology; University College London; UK
53
Ito M, Doya K. Parallel Representation of Value-Based and Finite State-Based Strategies in the Ventral and Dorsal Striatum. PLoS Comput Biol 2015;11:e1004540. PMID: 26529522; doi:10.1371/journal.pcbi.1004540
Abstract
Previous theoretical studies of animal and human behavioral learning have focused on the dichotomy of the value-based strategy, which uses action value functions to predict rewards, and the model-based strategy, which uses internal models to predict environmental states. However, animals and humans often adopt simple procedural behaviors, such as the “win-stay, lose-switch” strategy, without explicit prediction of rewards or states. Here we consider another strategy, the finite state-based strategy, in which a subject selects an action depending on its discrete internal state and updates the state depending on the action chosen and the reward outcome. By analyzing choice behavior of rats in a free-choice task, we found that the finite state-based strategy fitted their behavioral choices more accurately than value-based and model-based strategies did. When fitted models were run autonomously on the same task, only the finite state-based strategy could reproduce the key feature of the choice sequences. Analyses of neural activity recorded from the dorsolateral striatum (DLS), the dorsomedial striatum (DMS), and the ventral striatum (VS) identified significant fractions of neurons in all three subareas whose activities were correlated with individual states of the finite state-based strategy. The signal of individual internal states at the time of choice was found in DMS, and that for clusters of states in VS. In addition, action values and state values of the value-based strategy were encoded in DMS and VS, respectively. These results suggest that both the value-based strategy and the finite state-based strategy are implemented in the striatum.
The neural mechanism of decision-making, a cognitive process that selects one action among multiple possibilities, is a fundamental issue in neuroscience. Previous studies have revealed the roles of the cerebral cortex and the basal ganglia in decision-making by assuming that subjects take a value-based reinforcement learning strategy, in which the expected reward for each action candidate is updated. However, animals and humans often use simple procedural strategies, such as “win-stay, lose-switch.” In this study, we consider a finite state-based strategy, in which a subject acts depending on its discrete internal state and updates the state based on reward feedback. We found that the finite state-based strategy could reproduce the choice behavior of rats in a binary choice task with higher accuracy than the value-based strategy. Interestingly, neuronal activity in the striatum, a crucial brain region for reward-based learning, encoded information regarding both strategies.
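The finite state-based strategy can be illustrated by its simplest instance, win-stay lose-switch, in which the discrete internal state is just the currently favoured option. The sketch below uses illustrative reward probabilities, not the paper's actual task schedule:

```python
import random

class WinStayLoseSwitch:
    """Finite state-based strategy: the discrete internal state is the
    currently favoured option; no reward prediction is computed anywhere."""

    def __init__(self, start='L'):
        self.state = start            # internal state

    def choose(self):
        return self.state             # action depends only on the state

    def update(self, action, rewarded):
        # State update from the action taken and the reward outcome:
        # win -> stay in the same state; lose -> switch to the other option.
        if not rewarded:
            self.state = 'R' if action == 'L' else 'L'

# Toy two-choice task: left rewarded with p=0.8, right with p=0.2
# (illustrative probabilities only).
random.seed(0)
agent = WinStayLoseSwitch()
n_left = 0
for _ in range(1000):
    a = agent.choose()
    rewarded = random.random() < (0.8 if a == 'L' else 0.2)
    agent.update(a, rewarded)
    n_left += (a == 'L')
```

With these probabilities the stationary probability of choosing the better option is 0.8, even though the agent never estimates a value or a state-transition model.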
Affiliation(s)
- Makoto Ito
- Okinawa Institute of Science and Technology Graduate University, Onna-son, Okinawa, Japan
- Kenji Doya
- Okinawa Institute of Science and Technology Graduate University, Onna-son, Okinawa, Japan
54
Balleine BW, Dezfouli A, Ito M, Doya K. Hierarchical control of goal-directed action in the cortical–basal ganglia network. Curr Opin Behav Sci 2015. doi:10.1016/j.cobeha.2015.06.001
55
Roemmich RT, Bastian AJ. Two ways to save a newly learned motor pattern. J Neurophysiol 2015;113:3519-3530. PMID: 25855699; doi:10.1152/jn.00965.2014
Abstract
Savings, or faster relearning after initial learning, demonstrates humans' remarkable ability to retain learned movements amid changing environments. This is important within the context of locomotion, as the ability of the nervous system to "remember" how to walk in specific environments enables us to navigate changing terrains and progressively improve gait patterns with rehabilitation. Here, we used a split-belt treadmill to study precisely how people save newly learned walking patterns. In Experiment 1, we investigated savings by systematically varying the learning and unlearning environments. Savings was predominantly influenced by 1) previous exposure to similar abrupt changes in the environment and 2) the amount of exposure to the new environment. Relearning was fastest when these two factors coincided, and we did not observe savings after the environment was introduced gradually during initial learning. In Experiment 2, we then studied whether people store explicit information about different walking environments that mirrors savings of a new walking pattern. Like savings, we found that previous exposure to abrupt changes in the environment also drove the ability to recall a previously experienced walking environment accurately. Crucially, the information recalled was extrinsic information about the learning environment (i.e., treadmill speeds) and not intrinsic information about the walking pattern itself. We conclude that simply learning a new walking pattern is not enough for long-term savings; rather, savings of a learned walking pattern involves recall of the environment or extended training at the learned state.
Affiliation(s)
- Ryan T Roemmich
- Department of Neuroscience, The Johns Hopkins University School of Medicine, Baltimore, Maryland; Motion Analysis Laboratory, The Kennedy Krieger Institute, Baltimore, Maryland
- Amy J Bastian
- Department of Neuroscience, The Johns Hopkins University School of Medicine, Baltimore, Maryland; Motion Analysis Laboratory, The Kennedy Krieger Institute, Baltimore, Maryland
56
Ito M, Doya K. Distinct neural representation in the dorsolateral, dorsomedial, and ventral parts of the striatum during fixed- and free-choice tasks. J Neurosci 2015;35:3499-3514. PMID: 25716849; doi:10.1523/JNEUROSCI.1962-14.2015
Abstract
The striatum is a major input site of the basal ganglia, which play an essential role in decision making. Previous studies have suggested that subareas of the striatum have distinct roles: the dorsolateral striatum (DLS) functions in habitual action, the dorsomedial striatum (DMS) in goal-directed actions, and the ventral striatum (VS) in motivation. To elucidate distinctive functions of subregions of the striatum in decision making, we systematically investigated information represented by phasically active neurons in DLS, DMS, and VS. Rats performed two types of choice tasks: fixed- and free-choice tasks. In both tasks, rats were required to perform nose poking to either the left or right hole after cue-tone presentation. A food pellet was delivered probabilistically depending on the presented cue and the selected action. The reward probability was fixed in fixed-choice task and varied in a block-wise manner in free-choice task. We found the following: (1) when rats began the tasks, a majority of VS neurons increased their firing rates and information regarding task type and state value was most strongly represented in VS; (2) during action selection, information of action and action values was most strongly represented in DMS; (3) action-command information (action representation before action selection) was stronger in the fixed-choice task than in the free-choice task in both DLS and DMS; and (4) action-command information was strongest in DLS, particularly when the same choice was repeated. We propose a hypothesis of hierarchical reinforcement learning in the basal ganglia to coherently explain these results.
Affiliation(s)
- Makoto Ito
- Okinawa Institute of Science and Technology Graduate University, Onna-son, Okinawa 904-0412, Japan
- Kenji Doya
- Okinawa Institute of Science and Technology Graduate University, Onna-son, Okinawa 904-0412, Japan
57
Funamizu A, Ito M, Doya K, Kanzaki R, Takahashi H. Condition interference in rats performing a choice task with switched variable- and fixed-reward conditions. Front Neurosci 2015;9:27. PMID: 25741231; doi:10.3389/fnins.2015.00027
Abstract
Because humans and animals encounter various situations, the ability to decide adaptively upon responses to any situation is essential. To date, however, decision processes and their underlying neural substrates have been investigated under specific conditions; thus, little is known about how various conditions influence one another in these processes. In this study, we designed a binary choice task with variable- and fixed-reward conditions and investigated neural activity in the prelimbic cortex and dorsomedial striatum of rats. Variable- and fixed-reward conditions induced flexible and inflexible behaviors, respectively; one of the two conditions was randomly assigned in each trial to test the possibility of condition interference. Rats were successfully conditioned such that they could find the better reward holes in variable-reward-condition and fixed-reward-condition trials. A learning interference model, which updated expected rewards (i.e., values) used in variable-reward-condition trials on the basis of combined experiences of both conditions, fit choice behaviors better than conventional models that updated values in each condition independently. Thus, although rats distinguished the trial condition, they updated values in a condition-interfering manner. Our electrophysiological study suggests that this interfering value-updating is mediated by the prelimbic cortex and dorsomedial striatum. First, some prelimbic cortical and striatal neurons represented the action-reward associations irrespective of trial condition. Second, the striatal neurons kept tracking the values of the variable-reward condition even in fixed-reward-condition trials, such that values were possibly updated in an interfering manner even in the fixed-reward condition.
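The contrast between interfering and condition-independent value-updating can be sketched as follows. This is a toy simulation with made-up reward probabilities and a fixed policy, not the paper's fitted model:

```python
import random

def simulate(n_trials=6000, alpha=0.1, seed=1):
    """Compare a shared ('interference') value table, updated after trials of
    both conditions, with a control table updated only on variable-reward
    trials (illustrative parameters, not fitted to data)."""
    random.seed(seed)
    q_shared = {'L': 0.0, 'R': 0.0}   # one value set used across both conditions
    q_indep = {'L': 0.0, 'R': 0.0}    # updated on variable-reward trials only
    for _ in range(n_trials):
        variable = random.random() < 0.5      # condition assigned per trial
        if variable:
            a = random.choice(['L', 'R'])     # sample both holes
            p = 0.7 if a == 'L' else 0.3      # block-wise in the real task
        else:
            a = 'R'                           # fixed condition: always choose R,
            p = 1.0                           # which is always rewarded here
        r = 1.0 if random.random() < p else 0.0
        q_shared[a] += alpha * (r - q_shared[a])   # updated on every trial
        if variable:
            q_indep[a] += alpha * (r - q_indep[a])
    return q_shared, q_indep

q_shared, q_indep = simulate()
```

In this sketch the shared value of R is dragged well above its variable-condition reward probability (0.3) by fixed-condition experience; that gap between the shared and independent estimates is the signature of interfering value-updating.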
Affiliation(s)
- Akihiro Funamizu
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan; Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
- Makoto Ito
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Kenji Doya
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Ryohei Kanzaki
- Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan; Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
- Hirokazu Takahashi
- Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan; Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
58
Seo H, Cai X, Donahue CH, Lee D. Neural correlates of strategic reasoning during competitive games. Science 2014;346:340-343. PMID: 25236468; doi:10.1126/science.1256254
Abstract
Although human and animal behaviors are largely shaped by reinforcement and punishment, choices in social settings are also influenced by information about the knowledge and experience of other decision-makers. During competitive games, monkeys increased their payoffs by systematically deviating from a simple heuristic learning algorithm and thereby countering the predictable exploitation by their computer opponent. Neurons in the dorsomedial prefrontal cortex (dmPFC) signaled the animal's recent choice and reward history that reflected the computer's exploitative strategy. The strength of switching signals in the dmPFC also correlated with the animal's tendency to deviate from the heuristic learning algorithm. Therefore, the dmPFC might provide control signals for overriding simple heuristic learning algorithms based on the inferred strategies of the opponent.
Affiliation(s)
- Hyojung Seo
- Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06510, USA
- Xinying Cai
- Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06510, USA
- Christopher H Donahue
- Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06510, USA
- Daeyeol Lee
- Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06510, USA; Kavli Institute for Neuroscience, Yale University School of Medicine, New Haven, CT 06510, USA; Department of Psychology, Yale University, New Haven, CT 06510, USA
59
Krauzlis RJ, Bollimunta A, Arcizet F, Wang L. Attention as an effect not a cause. Trends Cogn Sci 2014;18:457-464. PMID: 24953964; doi:10.1016/j.tics.2014.05.008
Abstract
Attention is commonly thought to be important for managing the limited resources available in sensory areas of the neocortex. Here we present an alternative view that attention arises as a byproduct of circuits centered on the basal ganglia involved in value-based decision making. The central idea is that decision making depends on properly estimating the current state of the animal and its environment and that the weighted inputs to the currently prevailing estimate give rise to the filter-like properties of attention. After outlining this new framework, we describe findings from physiological, anatomical, computational, and clinical work that support this point of view. We conclude that the brain mechanisms responsible for attention employ a conserved circuit motif that predates the emergence of the neocortex.
Affiliation(s)
- Richard J Krauzlis
- Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, MD 20892, USA
- Anil Bollimunta
- Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, MD 20892, USA
- Fabrice Arcizet
- Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, MD 20892, USA
- Lupeng Wang
- Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, MD 20892, USA
60
Morita K, Kato A. Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. Front Neural Circuits 2014;8:36. PMID: 24782717; doi:10.3389/fncir.2014.00036
Abstract
It has been suggested that the midbrain dopamine (DA) neurons, receiving inputs from the cortico-basal ganglia (CBG) circuits and the brainstem, compute reward prediction error (RPE), the difference between reward obtained or expected to be obtained and reward that had been expected to be obtained. These reward expectations are suggested to be stored in the CBG synapses and updated according to RPE through synaptic plasticity, which is induced by released DA. These together constitute the "DA=RPE" hypothesis, which describes the mutual interaction between DA and the CBG circuits and serves as the primary working hypothesis in studying reward learning and value-based decision-making. However, recent work has revealed a new type of DA signal that appears not to represent RPE. Specifically, it has been found in a reward-associated maze task that striatal DA concentration primarily shows a gradual increase toward the goal. We explored whether such ramping DA could be explained by extending the "DA=RPE" hypothesis by taking into account biological properties of the CBG circuits. In particular, we examined effects of possible time-dependent decay of DA-dependent plastic changes of synaptic strengths by incorporating decay of learned values into the RPE-based reinforcement learning model and simulating reward learning tasks. We then found that incorporation of such a decay dramatically changes the model's behavior, causing gradual ramping of RPE. Moreover, we further incorporated magnitude-dependence of the rate of decay, which could potentially be in accord with some past observations, and found that near-sigmoidal ramping of RPE, resembling the observed DA ramping, could then occur. Given that synaptic decay can be useful for flexibly reversing and updating the learned reward associations, especially in case the baseline DA is low and encoding of negative RPE by DA is limited, the observed DA ramping would be indicative of the operation of such flexible reward learning.
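The core mechanism, decay of learned values turning the post-learning RPE into a ramp toward the goal, can be sketched with TD(0) learning on a linear track. This is a simplified stand-in for the authors' model, with hypothetical parameters:

```python
def td_with_forgetting(n_states=8, alpha=0.2, decay=0.05, n_episodes=3000):
    """TD(0) on a linear track with reward only at the goal; after each
    episode every learned value decays toward zero, modelling time-dependent
    weakening of DA-dependent plastic changes (illustrative parameters)."""
    V = [0.0] * (n_states + 1)               # V[n_states]: terminal state, stays 0
    for _ in range(n_episodes):
        for s in range(n_states):            # walk from start to goal
            r = 1.0 if s == n_states - 1 else 0.0
            delta = r + V[s + 1] - V[s]      # RPE (discount factor 1 for clarity)
            V[s] += alpha * delta
        for i in range(n_states):
            V[i] *= 1.0 - decay              # forgetting of learned values
    # RPE profile along the track after learning, without further updates
    deltas = [(1.0 if s == n_states - 1 else 0.0) + V[s + 1] - V[s]
              for s in range(n_states)]
    return V[:n_states], deltas

V, deltas = td_with_forgetting()
```

Without the decay term the values would all converge to 1 and the post-learning RPE would vanish; with decay the values settle below their targets, so a positive RPE persists on every step and grows toward the goal, qualitatively reproducing the ramping profile.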
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Ayaka Kato
- Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan
61
Stephan KE, Mathys C. Computational approaches to psychiatry. Curr Opin Neurobiol 2014;25:85-92. doi:10.1016/j.conb.2013.12.007
62
Abstract
The brain contains multiple yet distinct systems involved in reward prediction. To understand the nature of these processes, we recorded single-unit activity from the lateral prefrontal cortex (LPFC) and the striatum in monkeys performing a reward inference task using an asymmetric reward schedule. We found that neurons both in the LPFC and in the striatum predicted reward values for stimuli that had been previously well experienced with set reward quantities in the asymmetric reward task. Importantly, these LPFC neurons could predict the reward value of a stimulus using transitive inference even when the monkeys had not yet learned the stimulus-reward association directly; whereas these striatal neurons did not show such an ability. Nevertheless, because there were two set amounts of reward (large and small), the selected striatal neurons were able to exclusively infer the reward value (e.g., large) of one novel stimulus from a pair after directly experiencing the alternative stimulus with the other reward value (e.g., small). Our results suggest that although neurons that predict reward value for old stimuli in the LPFC could also do so for new stimuli via transitive inference, those in the striatum could only predict reward for new stimuli via exclusive inference. Moreover, the striatum showed more complex functions than was surmised previously for model-free learning.
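The two kinds of inference can be made concrete: with only two set reward amounts, experiencing one stimulus of a pair determines the amount of its partner by exclusion, whereas transitive inference chains a learned ordering across stimuli that were never paired directly. A toy illustration (not the task code):

```python
# Exclusive inference: with two set amounts, the unobserved partner stimulus
# must carry the remaining amount once one member of the pair is experienced.
AMOUNTS = {'large', 'small'}

def exclusive_infer(pair, observed_stim, observed_amount):
    """Return the amounts for both stimuli, inferring the partner's by exclusion."""
    other = next(s for s in pair if s != observed_stim)
    (inferred,) = AMOUNTS - {observed_amount}
    return {observed_stim: observed_amount, other: inferred}

# Transitive inference: chain learned order relations, e.g. A > B and B > C
# imply A > C, without A and C ever being paired directly.
def transitive_greater(relations, x, y):
    """True if x > y follows from the learned pairwise relations."""
    frontier, seen = [x], set()
    while frontier:
        cur = frontier.pop()
        if cur == y:
            return True
        if cur in seen:
            continue
        seen.add(cur)
        frontier.extend(b for a, b in relations if a == cur)
    return False
```

On this reading, the striatal neurons in the study behave like `exclusive_infer` (value of a novel stimulus fixed by its experienced partner), while the LPFC neurons additionally support the chained lookup of `transitive_greater`.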
63
Díaz E, Vargas JP, Quintero E, Gonzalo de la Casa L, O'Donnell P, Lopez JC. Differential implication of dorsolateral and dorsomedial striatum in encoding and recovery processes of latent inhibition. Neurobiol Learn Mem 2014;111:19-25. PMID: 24607505; doi:10.1016/j.nlm.2014.02.008
Abstract
The dorsal striatum has been ascribed different behavioral roles. While the lateral area (dls) is implicated in habitual actions, its medial part (dms) is linked to goal expectancy. According to this model, dls function includes representation of stimulus-response associations, but not of goals. Dls function has typically been analyzed with regard to movement, and there are no data indicating whether this region could process specific stimulus-outcome associations. To test this possibility, we analyzed the effects of dls and dms inactivation on the retrieval phase, and of dms lesion on the acquisition phase, of a latent inhibition procedure using two conditions, long and short presentations of the future conditioned stimulus. Contrary to current theories of basal ganglia function, we report evidence in favor of dls involvement in cognitive processes of learning and retrieval. Moreover, we provide data on the sequential relationship between dms and dls, in which the dms could be involved in, but would not be critical for, new learning, and the dls could subsequently be involved in consolidating cognitive routines.
Affiliation(s)
- Estrella Díaz
- Animal Behavior and Neuroscience Lab, Dpt. Psicología Experimental, Universidad de Sevilla, c/Camilo Jose Cela s/n, 41018 Seville, Spain
- Juan Pedro Vargas
- Animal Behavior and Neuroscience Lab, Dpt. Psicología Experimental, Universidad de Sevilla, c/Camilo Jose Cela s/n, 41018 Seville, Spain
- Esperanza Quintero
- Animal Behavior and Neuroscience Lab, Dpt. Psicología Experimental, Universidad de Sevilla, c/Camilo Jose Cela s/n, 41018 Seville, Spain
- Luis Gonzalo de la Casa
- Animal Behavior and Neuroscience Lab, Dpt. Psicología Experimental, Universidad de Sevilla, c/Camilo Jose Cela s/n, 41018 Seville, Spain
- Patricio O'Donnell
- Dpt. of Anatomy and Neurobiology, University of Maryland School of Medicine, 20 Penn Street, Baltimore, MD 21201, United States
- Juan Carlos Lopez
- Animal Behavior and Neuroscience Lab, Dpt. Psicología Experimental, Universidad de Sevilla, c/Camilo Jose Cela s/n, 41018 Seville, Spain
64
Fee MS. The role of efference copy in striatal learning. Curr Opin Neurobiol 2014;25:194-200. PMID: 24566242; doi:10.1016/j.conb.2014.01.012
Abstract
Reinforcement learning requires the convergence of signals representing context, action, and reward. While models of basal ganglia function have well-founded hypotheses about the neural origin of signals representing context and reward, the function and origin of signals representing action are less clear. Recent findings suggest that exploratory or variable behaviors are initiated by a wide array of 'action-generating' circuits in the midbrain, brainstem, and cortex. Thus, in order to learn, the striatum must incorporate an efference copy of action decisions made in these action-generating circuits. Here we review several recent neural models of reinforcement learning that emphasize the role of efference copy signals. Also described are ideas about how these signals might be integrated with inputs signaling context and reward.
Affiliation(s)
- Michale S Fee
- McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, United States
65
Multiplexing signals in reinforcement learning with internal models and dopamine. Curr Opin Neurobiol 2014;25:123-129. PMID: 24463329; doi:10.1016/j.conb.2014.01.001
Abstract
A fundamental challenge for computational and cognitive neuroscience is to understand how reward-based learning and decision-making operate and how accrued knowledge and internal models of the environment are incorporated. Remarkable progress has been made in the field, guided by the midbrain dopamine reward prediction error hypothesis and the underlying reinforcement learning framework, which does not involve internal models ('model-free'). Recent studies, however, have begun not only to address more complex decision-making processes that are integrated with model-free decision-making, but also to include internal models of environmental reward structures and of the minds of other agents, using model-based reinforcement learning and generalized prediction errors. Even dopamine, a classic model-free signal, may work as a multiplexed signal that uses model-based information and contributes to representational learning of reward structure.
66
Gutierrez-Garralda JM, Moreno-Briseño P, Boll MC, Morgado-Valle C, Campos-Romo A, Diaz R, Fernandez-Ruiz J. The effect of Parkinson's disease and Huntington's disease on human visuomotor learning. Eur J Neurosci 2013;38:2933-2940. PMID: 23802680; doi:10.1111/ejn.12288
Abstract
Visuomotor adaptation is often driven by error-based (EB) learning in which signed errors update motor commands. There are, however, visuomotor tasks where signed errors are unavailable or cannot be mapped onto appropriate motor command changes, rendering EB learning ineffective; and yet, healthy subjects can learn in these EB learning-free conditions. While EB learning depends on cerebellar integrity, the neural bases of EB-independent learning are poorly understood. As basal ganglia are involved in learning mechanisms that are independent of signed error feedback, here we tested whether patients with basal ganglia lesions, including those with Huntington's disease and Parkinson's disease, would show impairments in a visuomotor learning task that prevents the use of EB learning. We employed two visuomotor throwing tasks that were similar, but were profoundly different in the resulting visual feedback. This difference was implemented through the introduction of either a lateral displacement of the visual field via a wedge prism (EB learning) or a horizontal reversal of the visual field via a dove prism (non-EB learning). Our results show that patients with basal ganglia degeneration had normal EB learning in the wedge prism task, but were profoundly impaired in the reversing prism task that does not depend on the signed error signal feedback. These results represent the first evidence that human visuomotor learning in the absence of EB feedback depends on the integrity of the basal ganglia.
Affiliation(s)
- Juan Manuel Gutierrez-Garralda
- Departamento de Fisiología, Facultad de Medicina, Universidad Nacional Autónoma de México, Edificio antiguo de investigación, 5º piso, Circuito Exterior, Coyoacan, C.P. 04510, D.F., México
67
Peterson EJ, Seger CA. Many hats: intratrial and reward level-dependent BOLD activity in the striatum and premotor cortex. J Neurophysiol 2013;110:1689-1702. PMID: 23741040; doi:10.1152/jn.00164.2012
Abstract
Human functional magnetic resonance imaging (fMRI) studies, as well as lesion, drug, and single-cell recording studies in animals, suggest that the striatum plays a key role in associating sensory events with rewarding actions, both by facilitating reward processing and prediction (i.e., reinforcement learning) and by biasing and later updating action selection. Previous human neuroimaging research has failed to dissociate striatal activity associated with reward, stimulus, and response processing, and previous electrophysiological research in nonhuman animals has typically only examined single striatal subregions. Overcoming both these limitations, we isolated blood oxygen level-dependent (BOLD) signal associated with four intratrial processes (stimulus, preparation of response, response, and feedback) in a visuomotor learning task and examined activity associated with each within four striatal subregions (ventral striatum, putamen, head of the caudate nucleus, and body of the caudate) and the lateral premotor cortex. Overall, the striatum and lateral premotor cortex were recruited during all trial components, confirming their importance in all aspects of visuomotor learning. However, the caudate was most active at stimulus and feedback, whereas the putamen peaked in activity at response. Activation in the lateral premotor cortex was, surprisingly, strongest during stimulus and following response as feedback approached. Activity was additionally examined at three reward magnitudes. Reward magnitude affected neural activity only during stimulus in the caudate, putamen, and premotor cortex, whereas the ventral striatum showed reward sensitivity during both stimulus and feedback. Collectively, these results indicate that each striatal region makes a unique contribution to visuomotor learning through functions performed at different points within single trials.
Affiliation(s)
- Erik J Peterson
- Department of Psychology, Colorado State University, Fort Collins, Colorado
68
Pezzulo G, Rigoli F, Chersi F. The mixed instrumental controller: using value of information to combine habitual choice and mental simulation. Front Psychol 2013; 4:92. [PMID: 23459512 PMCID: PMC3586710 DOI: 10.3389/fpsyg.2013.00092] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2012] [Accepted: 02/08/2013] [Indexed: 11/13/2022] Open
Abstract
Instrumental behavior depends on both goal-directed and habitual mechanisms of choice. Normative views cast these mechanisms in terms of model-based and model-free methods of reinforcement learning, respectively. An influential proposal hypothesizes that model-free and model-based mechanisms coexist and compete in the brain according to their relative uncertainty. In this paper we propose a novel view in which a single Mixed Instrumental Controller produces both goal-directed and habitual behavior by flexibly balancing and combining model-based and model-free computations. The Mixed Instrumental Controller performs a cost-benefit analysis to decide whether to choose an action immediately based on the available "cached" value of actions (linked to model-free mechanisms) or to improve value estimation by mentally simulating the expected outcome values (linked to model-based mechanisms). Since mental simulation entails cognitive effort and increases the reward delay, it is activated only when the associated "Value of Information" exceeds its costs. The model proposes a method to compute the Value of Information, based on the uncertainty of action values and on the distance of alternative cached action values. Overall, the model by default chooses on the basis of lighter model-free estimates, and integrates them with costly model-based predictions only when useful. Mental simulation uses a sampling method to produce reward expectancies, which are used to update the cached value of one or more actions; in turn, this updated value is used for the choice. The key predictions of the model are tested in different settings of a double T-maze scenario. Results are discussed in relation to neurobiological evidence on the hippocampus - ventral striatum circuit in rodents, which has been linked to goal-directed spatial navigation.
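The cost-benefit arbitration summarised in this abstract lends itself to a compact sketch. The following is an illustrative reading, not the authors' implementation: the Value of Information proxy (mean uncertainty of the two leading actions minus their value gap), the function names, and the fixed simulation cost are all assumptions.

```python
def mixed_instrumental_choice(cached_values, cached_uncertainty, simulate, sim_cost=0.1):
    """Choose from cached (model-free) action values, or refine them by
    mental simulation when the Value of Information exceeds its cost.

    VoI is taken to be high when value estimates are uncertain and the
    two best actions are close together, so simulation could plausibly
    change the choice."""
    actions = sorted(cached_values, key=cached_values.get, reverse=True)
    best, runner_up = actions[0], actions[1]
    value_gap = cached_values[best] - cached_values[runner_up]
    # Hypothetical VoI proxy: uncertainty of the leading actions minus their gap.
    voi = (cached_uncertainty[best] + cached_uncertainty[runner_up]) / 2 - value_gap
    if voi <= sim_cost:
        return best, False  # habitual route: act on the cached values
    # Goal-directed route: simulate outcomes and choose on the updated values.
    updated = {a: simulate(a) for a in actions}
    return max(updated, key=updated.get), True
```

With certain (zero-uncertainty) cached values the gate stays closed and the cached best action is emitted at once; raising the uncertainty opens the gate and triggers the costly simulation step.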
Affiliation(s)
- Giovanni Pezzulo
- Istituto di Linguistica Computazionale, "Antonio Zampolli," Consiglio Nazionale delle Ricerche, Pisa, Italy; Istituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche, Roma, Italy
69
Kim H, Lee D, Jung MW. Signals for previous goal choice persist in the dorsomedial, but not dorsolateral striatum of rats. J Neurosci 2013; 33:52-63. [PMID: 23283321 PMCID: PMC6618644 DOI: 10.1523/jneurosci.2422-12.2013] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2012] [Revised: 10/05/2012] [Accepted: 10/10/2012] [Indexed: 11/21/2022] Open
Abstract
The cortico-basal ganglia network has been proposed to consist of parallel loops serving distinct functions. However, it is still uncertain how the content of processed information varies across different loops and how it is related to the functions of each loop. We investigated this issue by comparing neuronal activity in the dorsolateral (sensorimotor) and dorsomedial (associative) striatum, which have been linked to habitual and goal-directed action selection, respectively, in rats performing a dynamic foraging task. Both regions conveyed significant neural signals for the animal's goal choice and its outcome. Moreover, both regions conveyed similar levels of neural signals for action value before the animal's goal choice and chosen value after the outcome of the animal's choice was revealed. However, a striking difference was found in the persistence of neural signals for the animal's chosen action. Signals for the animal's goal choice persisted in the dorsomedial striatum until the outcome of the animal's next goal choice was revealed, whereas they dissipated rapidly in the dorsolateral striatum. These persistent choice signals might be used for causally linking temporally discontiguous responses and their outcomes in the dorsomedial striatum, thereby contributing to its role in goal-directed action selection.
Affiliation(s)
- Hoseok Kim
- Neuroscience Laboratory, Institute for Medical Sciences and Neuroscience Graduate Program, Ajou University School of Medicine, Suwon 443-721, Korea
- Daeyeol Lee
- Department of Neurobiology, Yale University School of Medicine, New Haven, Connecticut 06510
- Min Whan Jung
- Neuroscience Laboratory, Institute for Medical Sciences and Neuroscience Graduate Program, Ajou University School of Medicine, Suwon 443-721, Korea
70
Seidler RD, Kwak Y, Fling BW, Bernard JA. Neurocognitive mechanisms of error-based motor learning. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2013; 782:39-60. [PMID: 23296480 PMCID: PMC3817858 DOI: 10.1007/978-1-4614-5465-6_3] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Affiliation(s)
- Rachael D. Seidler
- Department of Psychology and School of Kinesiology, University of Michigan, 401 Washtenaw Avenue, Ann Arbor, MI 48109-2214, USA
- Youngbin Kwak
- Neuroscience Program, University of Michigan, 401 Washtenaw Avenue, Ann Arbor, MI 48109-2214, USA; Center for Cognitive Neuroscience, Duke University, Durham, NC 27708, USA
- Brett W. Fling
- School of Kinesiology, University of Michigan, 401 Washtenaw Avenue, Ann Arbor, MI 48109-2214, USA
- Jessica A. Bernard
- Department of Psychology, University of Michigan, 401 Washtenaw Avenue, Ann Arbor, MI 48109-2214, USA
71
Braunlich K, Seger C. The basal ganglia. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2012; 4:135-148. [PMID: 26304191 DOI: 10.1002/wcs.1217] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Through their connections with widespread cortical areas and with dopaminergic midbrain areas, the basal ganglia are well situated to integrate patterns of cortical input with the dopaminergic reward signal originating in the midbrain. In this review, we consider the functions of the basal ganglia in relation to their gross and cellular anatomy, and discuss how these mechanisms subserve the thresholding and selection of motor and cognitive processes. We also discuss how the dopaminergic reward signal enables flexible task learning through modulation of striatal plasticity, and how reinforcement learning models have been used to account for various aspects of basal ganglia activity. Specifically, we discuss the important role of the basal ganglia in instrumental learning, cognitive control, sequence learning, and categorization tasks. Finally, we discuss the neurobiological and cognitive characteristics of Parkinson's disease, Huntington's disease and addiction to illustrate the relationship between the basal ganglia and cognitive function.
Affiliation(s)
- Kurt Braunlich
- Departments of Psychology and Molecular, Cellular and Integrative Neurosciences, Colorado State University, Fort Collins, CO, USA
- Carol Seger
- Departments of Psychology and Molecular, Cellular and Integrative Neurosciences, Colorado State University, Fort Collins, CO, USA
72
Khamassi M, Humphries MD. Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies. Front Behav Neurosci 2012. [PMID: 23205006 PMCID: PMC3506961 DOI: 10.3389/fnbeh.2012.00079] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
Behavior in spatial navigation is often organized into map-based (place-driven) vs. map-free (cue-driven) strategies; behavior in operant conditioning research is often organized into goal-directed vs. habitual strategies. Here we attempt to unify the two. We review one powerful theory for distinct forms of learning during instrumental conditioning, namely model-based (maintaining a representation of the world) and model-free (reacting to immediate stimuli) learning algorithms. We extend these lines of argument to propose an alternative taxonomy for spatial navigation, showing how various previously identified strategies can be distinguished as “model-based” or “model-free” depending on the usage of information and not on the type of information (e.g., cue vs. place). We argue that identifying “model-free” learning with dorsolateral striatum and “model-based” learning with dorsomedial striatum could reconcile numerous conflicting results in the spatial navigation literature. From this perspective, we further propose that the ventral striatum plays key roles in the model-building process. We propose that the core of the ventral striatum is positioned to learn the probability of action selection for every transition between states of the world. We further review suggestions that the ventral striatal core and shell are positioned to act as “critics” contributing to the computation of a reward prediction error for model-free and model-based systems, respectively.
Affiliation(s)
- Mehdi Khamassi
- Institut des Systèmes Intelligents et de Robotique, Université Pierre et Marie Curie, Paris, France; Centre National de la Recherche Scientifique, UMR7222, Paris, France
73
Beeler JA. Thorndike's Law 2.0: Dopamine and the Regulation of Thrift. Front Neurosci 2012; 6:116. [PMID: 22905023 PMCID: PMC3415691 DOI: 10.3389/fnins.2012.00116] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2012] [Accepted: 07/19/2012] [Indexed: 12/03/2022] Open
Abstract
Dopamine is widely associated with reward, motivation, and reinforcement learning. Research on dopamine has emphasized its contribution to compulsive behaviors, such as addiction and overeating, with less examination of its potential role in behavioral flexibility in normal, non-pathological states. In the study reviewed here, we investigated the effect of increased tonic dopamine in a two-lever homecage operant paradigm where the relative value of the levers was dynamic, requiring the mice to constantly monitor reward outcome and adapt their behavior. The data were fit to a temporal difference learning model that showed that mice with elevated dopamine exhibited less coupling between reward history and behavioral choice. This work suggests a way to integrate motivational and learning theories of dopamine into a single formal model where tonic dopamine regulates the expression of prior reward learning by controlling the degree to which learned reward values bias behavioral choice. Here I place these results in a broader context of dopamine's role in instrumental learning and suggest a novel hypothesis that tonic dopamine regulates thrift, the degree to which an animal needs to exploit its prior reward learning to maximize return on energy expenditure. Our data suggest that increased dopamine decreases thriftiness, facilitating energy expenditure, and permitting greater exploration. Conversely, this implies that decreased dopamine increases thriftiness, favoring the exploitation of prior reward learning, and diminishing exploration. This perspective provides a different window onto the role dopamine may play in behavioral flexibility and its failure, compulsive behavior.
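One compact way to express the hypothesis above, that tonic dopamine regulates the coupling between learned reward values and behavioural choice, is to let dopamine set the inverse temperature of a softmax policy. The mapping beta = 1/dopamine below is a hypothetical simplification for illustration, not the temporal difference model fitted in the study.

```python
import math
import random

def softmax_choice(q_values, tonic_dopamine, rng=random.random):
    """Sample an action from a softmax over learned values, with tonic
    dopamine flattening the policy: high dopamine -> weak coupling to
    prior reward learning (exploration), low dopamine -> strong coupling
    (thrifty exploitation)."""
    beta = 1.0 / tonic_dopamine          # illustrative dopamine -> temperature mapping
    exps = [math.exp(beta * q) for q in q_values]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, cum = rng(), 0.0
    for action, p in enumerate(probs):   # inverse-CDF sampling over the policy
        cum += p
        if r < cum:
            return action, probs
    return len(probs) - 1, probs
```

Under this sketch, raising tonic dopamine drives the choice probabilities toward uniform, reproducing the reported decoupling between reward history and choice.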
Affiliation(s)
- Jeff A Beeler
- Department of Neurobiology, University of Chicago, Chicago, IL, USA
74
Fee MS. Oculomotor learning revisited: a model of reinforcement learning in the basal ganglia incorporating an efference copy of motor actions. Front Neural Circuits 2012; 6:38. [PMID: 22754501 PMCID: PMC3385561 DOI: 10.3389/fncir.2012.00038] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2012] [Accepted: 06/01/2012] [Indexed: 11/13/2022] Open
Abstract
In its simplest formulation, reinforcement learning is based on the idea that if an action taken in a particular context is followed by a favorable outcome, then, in the same context, the tendency to produce that action should be strengthened, or reinforced. While reinforcement learning forms the basis of many current theories of basal ganglia (BG) function, these models do not incorporate distinct computational roles for signals that convey context, and those that convey what action an animal takes. Recent experiments in the songbird suggest that vocal-related BG circuitry receives two functionally distinct excitatory inputs. One input is from a cortical region that carries context information about the current “time” in the motor sequence. The other is an efference copy of motor commands from a separate cortical brain region that generates vocal variability during learning. Based on these findings, I propose here a general model of vertebrate BG function that combines context information with a distinct motor efference copy signal. The signals are integrated by a learning rule in which efference copy inputs gate the potentiation of context inputs (but not efference copy inputs) onto medium spiny neurons in response to a rewarded action. The hypothesis is described in terms of a circuit that implements the learning of visually guided saccades. The model makes testable predictions about the anatomical and functional properties of hypothesized context and efference copy inputs to the striatum from both thalamic and cortical sources.
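The gated learning rule proposed in this abstract can be sketched as a three-factor update: weights from context inputs onto a medium spiny neuron pool are potentiated only when that action's efference copy is present and the outcome was better than expected. The names and the tabular weight layout are illustrative assumptions, not the paper's formal model.

```python
def update_context_weights(w_context, context, efference_copy, reward_prediction_error, lr=0.1):
    """w_context[a][c] is the weight from context input c onto the medium
    spiny neurons driving action a. Context weights are potentiated only
    where the efference copy marks the action actually taken; efference
    copy inputs themselves are not plastic under this rule."""
    n_actions, n_context = len(w_context), len(w_context[0])
    for a in range(n_actions):
        if not efference_copy[a]:
            continue  # gate: no efference copy for this action, no update
        for c in range(n_context):
            w_context[a][c] += lr * reward_prediction_error * context[c]
    return w_context
```

Only the context-to-action association for the performed, rewarded action is strengthened; unperformed actions are untouched even when their context inputs are active.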
Affiliation(s)
- Michale S Fee
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
75
Summerfield C, Tsetsos K. Building Bridges between Perceptual and Economic Decision-Making: Neural and Computational Mechanisms. Front Neurosci 2012; 6:70. [PMID: 22654730 PMCID: PMC3359443 DOI: 10.3389/fnins.2012.00070] [Citation(s) in RCA: 102] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2012] [Accepted: 04/26/2012] [Indexed: 11/13/2022] Open
Abstract
Investigation into the neural and computational bases of decision-making has proceeded in two parallel but distinct streams. Perceptual decision-making (PDM) is concerned with how observers detect, discriminate, and categorize noisy sensory information. Economic decision-making (EDM) explores how options are selected on the basis of their reinforcement history. Traditionally, the sub-fields of PDM and EDM have employed different paradigms, proposed different mechanistic models, explored different brain regions, and disagreed about whether decisions approach optimality. Nevertheless, we argue that there is a common framework for understanding decisions made in both tasks, under which an agent has to combine sensory information (what is the stimulus) with value information (what is it worth). We review computational models of the decision process typically used in PDM, based around the idea that decisions involve a serial integration of evidence, and assess their applicability to decisions between goods and gambles. Subsequently, we consider the contribution of three key brain regions - the parietal cortex, the basal ganglia, and the orbitofrontal cortex (OFC) - to PDM and EDM, with a focus on the mechanisms by which sensory and reward information are integrated during choice. We find that although the parietal cortex is often implicated in the integration of sensory evidence, there is also evidence for its role in encoding the expected value of a decision. Similarly, although much research has emphasized the role of the striatum and OFC in value-guided choices, they may also play an important role in categorization of perceptual information. In conclusion, we consider how findings from the two fields might be brought together, in order to move toward a general framework for understanding decision-making in humans and other primates.
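The serial integration of evidence that this review takes as its common currency is commonly instantiated as a drift-diffusion process. The sketch below adds expected value as a starting-point bias, one candidate mechanism for combining sensory and value information in a single decision variable; the parameter names and the bias-as-offset choice are assumptions, not a model the review endorses.

```python
import random

def accumulate_to_bound(drift, value_bias=0.0, threshold=1.0, noise=0.1,
                        max_steps=10000, rng=None):
    """Integrate noisy momentary evidence until a bound is crossed.
    drift encodes sensory evidence for option A over option B; value_bias
    shifts the starting point toward the option with higher expected value."""
    rng = rng or random.Random(0)
    x = value_bias
    for step in range(1, max_steps + 1):
        x += drift + rng.gauss(0.0, noise)
        if x >= threshold:
            return +1, step   # choose option A, with decision time in steps
        if x <= -threshold:
            return -1, step   # choose option B
    return 0, max_steps       # no bound reached within the step budget
```

Shifting value_bias toward one bound shortens decision times for that option and lengthens them for the other, which is one way value information could shape nominally perceptual choices.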
76
Abstract
Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal's knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain.
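The value functions described in this abstract are typically adjusted by temporal-difference learning. A minimal Q-learning step, with illustrative names and a dictionary state-action table, might look like:

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Move Q(state, action) toward the reward plus the discounted value
    of the best action available in the next state."""
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error  # adjust the value function by the TD error
    return td_error
```

The returned temporal-difference error is the quantity that reward and penalty, as well as knowledge of the current environment (here, the entries for next_state), jointly determine.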
Affiliation(s)
- Daeyeol Lee
- Department of Neurobiology, Kavli Institute for Neuroscience, Yale University School of Medicine, New Haven, Connecticut 06510, USA.
77
Penhune VB, Steele CJ. Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behav Brain Res 2011; 226:579-91. [PMID: 22004979 DOI: 10.1016/j.bbr.2011.09.044] [Citation(s) in RCA: 258] [Impact Index Per Article: 19.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2011] [Revised: 09/27/2011] [Accepted: 09/30/2011] [Indexed: 10/17/2022]
Abstract
When learning a new motor sequence, we must execute the correct order of movements while simultaneously optimizing sensorimotor parameters such as trajectory, timing, velocity and force. Neurophysiological studies in animals and humans have identified the major brain regions involved in sequence learning, including the motor cortex (M1), basal ganglia (BG) and cerebellum. Current models link these regions to different stages of learning (early vs. late) or different components of performance (spatial vs. sensorimotor). At the same time, research in motor control has given rise to the concept that internal models at different levels of the motor system may contribute to learning. The goal of this review is to develop a new framework for motor sequence learning that combines stage and component models within the context of internal models. To do this, we review behavioral and neuroimaging studies in humans and neurophysiological studies in animals. Based on this evidence, we present a model proposing that sequence learning is underwritten by parallel, interacting processes, including internal model formation and sequence representation, that are instantiated in specific cerebellar, BG or M1 mechanisms depending on task demands and the stage of learning. The striatal system learns predictive stimulus-response associations and is critical for motor chunking. The role of the cerebellum is to acquire the optimal internal model for sequence performance in a particular context, and to contribute to error correction and control of on-going movement. M1 acts to store the representation of a learned sequence, likely as part of a distributed network including the parietal lobe and premotor cortex.
Affiliation(s)
- Virginia B Penhune
- Laboratory for Motor Learning and Neural Plasticity, Department of Psychology, Concordia University, Canada.