1. Jürgensen AM, Sakagiannis P, Schleyer M, Gerber B, Nawrot MP. Prediction error drives associative learning and conditioned behavior in a spiking model of Drosophila larva. iScience 2024; 27:108640. PMID: 38292165; PMCID: PMC10824792; DOI: 10.1016/j.isci.2023.108640.
Abstract
Predicting reinforcement from sensory cues is beneficial for goal-directed behavior. In insect brains, underlying associations between cues and reinforcement, encoded by dopaminergic neurons, are formed in the mushroom body. We propose a spiking model of the Drosophila larva mushroom body. It includes a feedback motif conveying learned reinforcement expectation to dopaminergic neurons, which can compute prediction error as the difference between expected and present reinforcement. We demonstrate that this can serve as a driving force in learning. When combined with synaptic homeostasis, our model accounts for theoretically derived features of acquisition and loss of associations that depend on the intensity of the reinforcement and its temporal proximity to the cue. From modeling olfactory learning over the time course of behavioral experiments and simulating the locomotion of individual larvae toward or away from odor sources in a virtual environment, we conclude that learning driven by prediction errors can explain larval behavior.
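The core computation the abstract describes, a dopaminergic signal equal to present minus expected reinforcement that drives learning until the expectation matches, can be condensed into a few lines. This is a minimal rate-based delta-rule sketch, not the authors' spiking model; the function name and parameters are illustrative assumptions.

```python
# Minimal rate-based sketch of prediction-error-driven learning
# (illustrative delta rule, NOT the authors' spiking mushroom body model).
# The teaching signal is present reinforcement minus the learned expectation.

def learn_expectation(reinforcement, alpha=0.2, trials=50):
    """Acquire a reinforcement expectation for one cue by error-driven updates."""
    w = 0.0  # learned reinforcement expectation fed back to the dopaminergic neuron
    for _ in range(trials):
        prediction_error = reinforcement - w  # expected vs. present reinforcement
        w += alpha * prediction_error         # error-driven synaptic update
    return w
```

Once the expectation matches the reinforcement, the prediction error, and with it further learning, vanishes, which is what makes the error a self-limiting driving force.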
Affiliation(s)
- Anna-Maria Jürgensen
- Computational Systems Neuroscience, Institute of Zoology, University of Cologne, 50674 Cologne, Germany
- Panagiotis Sakagiannis
- Computational Systems Neuroscience, Institute of Zoology, University of Cologne, 50674 Cologne, Germany
- Michael Schleyer
- Leibniz Institute for Neurobiology (LIN), Department of Genetics, 39118 Magdeburg, Germany
- Institute for the Advancement of Higher Education, Faculty of Science, Hokkaido University, Sapporo 060-0808, Japan
- Bertram Gerber
- Leibniz Institute for Neurobiology (LIN), Department of Genetics, 39118 Magdeburg, Germany
- Institute for Biology, Otto-von-Guericke University, 39120 Magdeburg, Germany
- Center for Brain and Behavioral Sciences (CBBS), Otto-von-Guericke University, 39118 Magdeburg, Germany
- Martin Paul Nawrot
- Computational Systems Neuroscience, Institute of Zoology, University of Cologne, 50674 Cologne, Germany
2. Lee H, Hikosaka O. Lateral habenula neurons signal step-by-step changes of reward prediction. iScience 2022; 25:105440. DOI: 10.1016/j.isci.2022.105440.
3. Traner MR, Bromberg-Martin ES, Monosov IE. How the value of the environment controls persistence in visual search. PLoS Comput Biol 2021; 17:e1009662. PMID: 34905548; PMCID: PMC8714092; DOI: 10.1371/journal.pcbi.1009662.
Abstract
Classic foraging theory predicts that humans and animals aim to gain maximum reward per unit time. However, in standard instrumental conditioning tasks individuals adopt an apparently suboptimal strategy: they respond slowly when the expected value is low. This reward-related bias is often explained as reduced motivation in response to low rewards. Here we present evidence that this behavior is associated with a complementary increased motivation to search the environment for alternatives. We trained monkeys to search for reward-related visual targets in environments with different values. We found that the reward-related bias scaled with environment value, was consistent with persistent searching after the target was already found, and was associated with increased exploratory gaze to objects in the environment. A novel computational model of foraging suggests that this search strategy could be adaptive in naturalistic settings where both environments and the objects within them provide partial information about hidden, uncertain rewards.
Affiliation(s)
- Michael R. Traner
- Department of Biomedical Engineering, Washington University, St. Louis, Missouri, United States of America
- Ethan S. Bromberg-Martin
- Department of Neuroscience, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Ilya E. Monosov
- Department of Biomedical Engineering, Washington University, St. Louis, Missouri, United States of America
- Department of Neuroscience, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Neurosurgery, Washington University, St. Louis, Missouri, United States of America
- Pain Center, Washington University, St. Louis, Missouri, United States of America
- Department of Electrical Engineering, Washington University, St. Louis, Missouri, United States of America
4. Woolrych A, Vautrelle N, Reynolds JNJ, Parr-Brownlie LC. Throwing open the doors of perception: The role of dopamine in visual processing. Eur J Neurosci 2021; 54:6135-6146. PMID: 34340265; DOI: 10.1111/ejn.15408.
Abstract
Animals form associations between visual cues and behaviours. Although dopamine is known to be critical in many areas of the brain for binding sensory information with appropriate responses, dopamine's role in the visual system is less well understood. Visual signals, which indicate the likely occurrence of a rewarding or aversive stimulus or indicate the context within which such stimuli may arrive, modulate activity in the superior colliculus and alter behaviour. However, such signals primarily originate in cortical and basal ganglia circuits, and evidence of direct signalling from midbrain dopamine neurons to the superior colliculus is lacking. Instead, hypothalamic A13 dopamine neurons innervate the superior colliculus, and dopamine receptors are differentially expressed there, with D1 receptors in superficial layers and D2 receptors in deep layers. However, it remains unknown whether A13 dopamine neurons control behaviours through their effect on afferents within the superior colliculus. We propose that A13 dopamine neurons may play a critical role in processing information in the superior colliculus and in modifying behavioural responses to visual cues, and we offer some testable hypotheses regarding dopamine's effect on visual perception.
Affiliation(s)
- Alexander Woolrych
- Department of Anatomy, School of Biomedical Sciences, Brain Health Research Centre, University of Otago, Dunedin, New Zealand
- Nicolas Vautrelle
- Department of Anatomy, School of Biomedical Sciences, Brain Health Research Centre, University of Otago, Dunedin, New Zealand
- John N J Reynolds
- Department of Anatomy, School of Biomedical Sciences, Brain Health Research Centre, University of Otago, Dunedin, New Zealand
- Louise C Parr-Brownlie
- Department of Anatomy, School of Biomedical Sciences, Brain Health Research Centre, University of Otago, Dunedin, New Zealand
5. What Do You Want to Eat? Influence of Menu Description and Design on Consumer's Mind: An fMRI Study. Foods 2021; 10:919. PMID: 33922036; PMCID: PMC8170898; DOI: 10.3390/foods10050919.
Abstract
The main objective of this research was to analyse the brain regions that are active when processing dishes with a pleasant (vs. unpleasant) design, and the effect of a previously read rational (vs. emotional) description on visualising the dish. Functional magnetic resonance imaging was used for the study. The results showed that participants who visualised pleasant (vs. unpleasant) dishes activated regions across several domains (e.g., attention, cognition and reward). Conversely, visualisation of unpleasant dishes more strongly activated regions linked to inhibition, rejection and ambiguity. We found that subjects who read rational descriptions when visualising pleasant dishes activated regions related to congruence integration, while subjects who read emotional descriptions showed an increased neuronal response to pleasant dishes in regions related to memory, emotion and congruence.
6. Tanaka S, O'Doherty JP, Sakagami M. The cost of obtaining rewards enhances the reward prediction error signal of midbrain dopamine neurons. Nat Commun 2019; 10:3674. PMID: 31417077; PMCID: PMC6695452; DOI: 10.1038/s41467-019-11334-2.
Abstract
Midbrain dopamine neurons are known to encode reward prediction errors (RPE) used to update value predictions. Here, we examine whether RPE signals coded by midbrain dopamine neurons are modulated by the cost paid to obtain rewards, by recording from dopamine neurons in awake behaving monkeys during performance of an effortful saccade task. Dopamine neuron responses to cues predicting reward and to the delivery of rewards were increased after the performance of a costly action compared to a less costly action, suggesting that RPEs are enhanced following the performance of a costly action. At the behavioral level, stimulus-reward associations are learned faster after performing a costly action compared to a less costly action. Thus, information about action cost is processed in the dopamine reward system in a manner that amplifies the following dopamine RPE signal, which in turn promotes more rapid learning under situations of high cost. Rewards that require high effort tend to be preferred over those that require low effort. Here, the authors show how the effort of obtaining rewards affects reward-related activity of dopamine neurons, and in turn the speed of learning stimulus-reward associations.
Affiliation(s)
- Shingo Tanaka
- Brain Science Institute, Tamagawa University, 6-1-1 Tamagawagakuen, Machida, Tokyo 194-8610, Japan
- John P O'Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA
- Computation and Neural Systems, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA
- Masamichi Sakagami
- Brain Science Institute, Tamagawa University, 6-1-1 Tamagawagakuen, Machida, Tokyo 194-8610, Japan
7. When the simplest voluntary decisions appear patently suboptimal. Behav Brain Sci 2019; 41:e240. PMID: 30767836; DOI: 10.1017/s0140525x18001474.
Abstract
Rahnev & Denison (R&D) catalog numerous experiments in which performance deviates, often in subtle ways, from the theoretical ideal. We discuss an extreme case, an elementary behavior (reactive saccades to single targets) for which a simple contextual manipulation results in responses that are dramatically different from those expected based on reward maximization - and yet are highly informative and amenable to mechanistic examination.
8. Selective reward affects the rate of saccade adaptation. Neuroscience 2017; 355:113-125. PMID: 28499971; DOI: 10.1016/j.neuroscience.2017.04.048.
Abstract
In this study we tested whether a selective reward could affect the adaptation of saccadic eye movements in monkeys. We induced the adaptation of saccades by displacing the target of a horizontal saccade vertically as the eye moved toward it, thereby creating an apparent vertical dysmetria. The repeated upward target displacement caused the originally horizontal saccade to gradually deviate upward over the course of several hundred trials. We induced this directional adaptation in both right- and leftward saccades in every experiment (n=20). In half of the experiments (n=10), we rewarded monkeys only when they made leftward saccades and in the other half (n=10) only for rightward saccades. The reaction time of saccades in the rewarded direction was shorter and we, like others, interpreted this change as a sign of the reward's preferential effect in that direction. Saccades in the rewarded direction showed more rapid adaptation of their directions than did saccades in the non-rewarded direction, indicating that the selective reward increased the speed of saccade adaptation. The differences in adaptation speed were reflected in changes in saccade metrics, which were usually more noticeable in the deceleration phases of saccades than in their acceleration phases. Because previous studies have shown that the oculomotor cerebellum is involved with saccade deceleration and also participates in saccade adaptation, it is possible that selective reward could influence cerebellar plasticity.
9. Hikosaka O, Ghazizadeh A, Griggs W, Amita H. Parallel basal ganglia circuits for decision making. J Neural Transm (Vienna) 2017; 125:515-529. PMID: 28155134; DOI: 10.1007/s00702-017-1691-1.
Abstract
The basal ganglia control body movements based mainly on their values. Critical for this mechanism are dopamine neurons, which send unpredicted value signals mainly to the striatum. This mechanism enables animals to change their behaviors flexibly, eventually choosing a valuable behavior. However, this may not be the best behavior, because the flexible choice is focused on recent, and therefore limited, experiences (i.e., short-term memories). Our old and recent studies suggest that the basal ganglia contain separate circuits that process value signals in a completely different manner. These circuits are insensitive to recent changes in value, yet gradually accumulate the value of each behavior (i.e., movement or object choice). The stable circuits eventually encode the values of many behaviors and then retain the value signals for a long time (i.e., long-term memories). They are innervated by a separate group of dopamine neurons that retain value signals even when no reward is predicted. Importantly, the stable circuits can control motor behaviors (e.g., of the hand or eye) quickly and precisely, which allows animals to automatically acquire valuable outcomes based on historical life experiences. Such behaviors would be called 'skills', which are crucial for survival. The stable circuits are localized in the posterior part of the basal ganglia, separately from the flexible circuits located in the anterior part. To summarize, the flexible and stable circuits in the basal ganglia, working together but independently, enable animals (and humans) to reach valuable goals in various contexts.
Affiliation(s)
- Okihide Hikosaka
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
- National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, USA
- Ali Ghazizadeh
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
- Whitney Griggs
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
- Hidetoshi Amita
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
10. Collins AL, Greenfield VY, Bye JK, Linker KE, Wang AS, Wassum KM. Dynamic mesolimbic dopamine signaling during action sequence learning and expectation violation. Sci Rep 2016; 6:20231. PMID: 26869075; PMCID: PMC4751524; DOI: 10.1038/srep20231.
Abstract
Prolonged mesolimbic dopamine concentration changes have been detected during spatial navigation, but little is known about the conditions that engender this signaling profile or how it develops with learning. To address this, we monitored dopamine concentration changes in the nucleus accumbens core of rats throughout acquisition and performance of an instrumental action sequence task. Prolonged dopamine concentration changes were detected that ramped up as rats executed each action sequence and declined after earned reward collection. With learning, dopamine concentration began to rise increasingly earlier in the execution of the sequence and ultimately backpropagated away from stereotyped sequence actions, becoming only transiently elevated by the most distal and unexpected reward predictor. Action sequence-related dopamine signaling was reactivated in well-trained rats if they became disengaged in the task and in response to an unexpected change in the value, but not identity of the earned reward. Throughout training and test, dopamine signaling correlated with sequence performance. These results suggest that action sequences can engender a prolonged mode of dopamine signaling in the nucleus accumbens core and that such signaling relates to elements of the motivation underlying sequence execution and is dynamic with learning, overtraining and violations in reward expectation.
Affiliation(s)
- Kay E. Linker
- Dept. of Psychology, UCLA, Los Angeles, CA 90095, USA
- Alice S. Wang
- Dept. of Psychology, UCLA, Los Angeles, CA 90095, USA
- Kate M. Wassum
- Dept. of Psychology, UCLA, Los Angeles, CA 90095, USA
- Brain Research Institute, UCLA, Los Angeles, CA 90095, USA
11.
Abstract
Besides their fundamental movement function evidenced by Parkinsonian deficits, the basal ganglia are involved in processing closely linked non-motor, cognitive and reward information. This review describes the reward functions of three brain structures that are major components of the basal ganglia or are closely associated with the basal ganglia, namely midbrain dopamine neurons, pedunculopontine nucleus, and striatum (caudate nucleus, putamen, nucleus accumbens). Rewards are involved in learning (positive reinforcement), approach behavior, economic choices and positive emotions. The response of dopamine neurons to rewards consists of an early detection component and a subsequent reward component that reflects a prediction error in economic utility, but is unrelated to movement. Dopamine activations to non-rewarded or aversive stimuli reflect physical impact, but not punishment. Neurons in pedunculopontine nucleus project their axons to dopamine neurons and process sensory stimuli, movements and rewards and reward-predicting stimuli without coding outright reward prediction errors. Neurons in striatum, besides their pronounced movement relationships, process rewards irrespective of sensory and motor aspects, integrate reward information into movement activity, code the reward value of individual actions, change their reward-related activity during learning, and code own reward in social situations depending on whose action produces the reward. These data demonstrate a variety of well-characterized reward processes in specific basal ganglia nuclei consistent with an important function in non-motor aspects of motivated behavior.
Affiliation(s)
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, UK
12.
Abstract
Rewards are crucial objects that induce learning, approach behavior, choices, and emotions. Whereas emotions are difficult to investigate in animals, the learning function is mediated by neuronal reward prediction error signals which implement basic constructs of reinforcement learning theory. These signals are found in dopamine neurons, which emit a global reward signal to striatum and frontal cortex, and in specific neurons in striatum, amygdala, and frontal cortex projecting to select neuronal populations. The approach and choice functions involve subjective value, which is objectively assessed by behavioral choices eliciting internal, subjective reward preferences. Utility is the formal mathematical characterization of subjective value and a prime decision variable in economic choice theory. It is coded as utility prediction error by phasic dopamine responses. Utility can incorporate various influences, including risk, delay, effort, and social interaction. Appropriate for formal decision mechanisms, rewards are coded as object value, action value, difference value, and chosen value by specific neurons. Although all reward, reinforcement, and decision variables are theoretical constructs, their neuronal signals constitute measurable physical implementations and as such confirm the validity of these concepts. The neuronal reward signals provide guidance for behavior while constraining the free will to act.
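The utility prediction error named in the abstract can be illustrated with a standard textbook construction (not code from this review): a concave utility function models risk-averse subjective value, and the dopamine-like signal is the utility of the received outcome minus the expected utility of the gamble. All function names and parameters here are illustrative assumptions.

```python
# Toy utility prediction error (standard economic-choice construction,
# NOT code from the review). Concave utility => diminishing marginal value.

def utility(x, alpha=0.5):
    """Concave (risk-averse) utility of a non-negative reward magnitude."""
    return x ** alpha

def utility_prediction_error(outcomes, probs, received, alpha=0.5):
    """Utility of the received outcome minus the gamble's expected utility."""
    expected_utility = sum(p * utility(x, alpha) for x, p in zip(outcomes, probs))
    return utility(received, alpha) - expected_utility

# 50/50 gamble between 0 and 4 units of juice, the large outcome is received:
delta = utility_prediction_error(outcomes=[0, 4], probs=[0.5, 0.5], received=4)
```

A positive delta (better than expected) corresponds to a phasic dopamine activation, a negative delta to a depression, in the scheme the abstract summarizes.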
Affiliation(s)
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom
13. Yoshimi K, Kumada S, Weitemier A, Jo T, Inoue M. Reward-Induced Phasic Dopamine Release in the Monkey Ventral Striatum and Putamen. PLoS One 2015; 10:e0130443. PMID: 26110516; PMCID: PMC4482386; DOI: 10.1371/journal.pone.0130443.
Abstract
In-vivo voltammetry has successfully been used to detect dopamine release in rodent brains, but its application to monkeys has been limited. We previously detected dopamine release in the caudate of behaving Japanese monkeys using diamond microelectrodes (Yoshimi 2011); however, it was not known whether the release pattern is the same across forebrain areas. Recent studies have suggested variations in the dopaminergic projections to forebrain areas. In the present study, we attempted simultaneous recording at two locations in the striatum using fast-scan cyclic voltammetry (FSCV) on carbon fibers, which has been widely used in rodents. Responses to unpredicted food and liquid rewards were detected repeatedly. The response to the liquid reward after conditioned stimuli was enhanced after switching the prediction cue. These characteristics were generally similar between the ventral striatum and the putamen. Overall, FSCV recording at multiple locations was technically successful in behaving primates, and further voltammetric recordings at multiple locations will expand our knowledge of dopamine reward responses.
Affiliation(s)
- Kenji Yoshimi
- Department of Neurophysiology, Juntendo University School of Medicine, Bunkyo-ku, Tokyo, Japan
- Shiori Kumada
- Department of Psychology, Japan Women's University, Kawasaki, Kanagawa, Japan
- Takayuki Jo
- Department of Neurology, Juntendo University School of Medicine, Bunkyo-ku, Tokyo, Japan
- Masato Inoue
- Department of Neurophysiology, Juntendo University School of Medicine, Bunkyo-ku, Tokyo, Japan
14. Dasgupta S, Wörgötter F, Manoonpong P. Neuromodulatory adaptive combination of correlation-based learning in cerebellum and reward-based learning in basal ganglia for goal-directed behavior control. Front Neural Circuits 2014; 8:126. PMID: 25389391; PMCID: PMC4211401; DOI: 10.3389/fncir.2014.00126.
Abstract
Goal-directed decision making in biological systems is broadly based on associations between conditional and unconditional stimuli. This can be further classified as classical conditioning (correlation-based learning) and operant conditioning (reward-based learning). A number of computational and experimental studies have established the role of the basal ganglia in reward-based learning, whereas the cerebellum plays an important role in developing specific conditioned responses. Although viewed as distinct learning systems, recent animal experiments point toward their complementary role in behavioral learning and show the existence of substantial two-way communication between these two brain structures. Based on this notion of cooperative learning, in this paper we hypothesize that the basal ganglia and cerebellar learning systems work in parallel and interact with each other. We envision that such an interaction is influenced by a reward-modulated heterosynaptic plasticity (RMHP) rule at the thalamus, guiding the overall goal-directed behavior. Using a recurrent neural network actor-critic model of the basal ganglia and a feed-forward correlation-based learning model of the cerebellum, we demonstrate that the RMHP rule can effectively balance the outcomes of the two learning systems. This is tested using simulated environments of increasing complexity with a four-wheeled robot in a foraging task in both static and dynamic configurations. Although modeled with a simplified level of biological abstraction, we clearly demonstrate that such an RMHP-induced combinatorial learning mechanism leads to more stable and faster learning of goal-directed behaviors than either system alone. Thus, in this paper we provide a computational model for the adaptive combination of the basal ganglia and cerebellum learning systems by way of neuromodulated plasticity for goal-directed decision making in biological and biomimetic organisms.
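As a rough illustration of the kind of combination rule the abstract describes, the sketch below blends the outputs of two toy learning systems with weights updated by a reward-modulated product of presynaptic and combined output activity, then normalized. This is a hypothetical simplification, not the paper's RMHP implementation; the periodic toy signals and all names are assumptions made for the example.

```python
# Toy reward-modulated combination of two learning systems (hypothetical
# simplification inspired by the RMHP idea, NOT the paper's model).
# The gating weight shifts toward the system whose output tracks reward.

def combine(trials=200, eta=0.05):
    w_bg, w_cb = 0.5, 0.5                 # weights for the two systems' outputs
    for t in range(trials):
        o_bg = (t % 10) / 10.0            # "basal ganglia" (reward-based) output
        o_cb = ((t * 7) % 10) / 10.0      # "cerebellum" (correlation-based) output
        out = w_bg * o_bg + w_cb * o_cb   # combined command at the "thalamus"
        reward = o_bg                     # in this toy task, reward tracks o_bg
        # reward-modulated heterosynaptic update, then weight normalization
        w_bg += eta * reward * o_bg * out
        w_cb += eta * reward * o_cb * out
        total = w_bg + w_cb
        w_bg, w_cb = w_bg / total, w_cb / total
    return w_bg, w_cb
```

Because reward correlates with the first system's output here, its weight grows at the other's expense, which is the balancing behavior the rule is meant to produce.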
Affiliation(s)
- Sakyasingha Dasgupta
- Institute for Physics - Biophysics, Georg-August-University Göttingen, Germany
- Bernstein Center for Computational Neuroscience, Georg-August-University Göttingen, Germany
- Florentin Wörgötter
- Institute for Physics - Biophysics, Georg-August-University Göttingen, Germany
- Bernstein Center for Computational Neuroscience, Georg-August-University Göttingen, Germany
- Poramate Manoonpong
- Bernstein Center for Computational Neuroscience, Georg-August-University Göttingen, Germany
- Center for Biorobotics, Maersk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark
15. Overton PG, Vautrelle N, Redgrave P. Sensory regulation of dopaminergic cell activity: Phenomenology, circuitry and function. Neuroscience 2014; 282:1-12. PMID: 24462607; DOI: 10.1016/j.neuroscience.2014.01.023.
Abstract
Dopaminergic neurons in a range of species are responsive to sensory stimuli. In the anesthetized preparation, responses to non-noxious and noxious sensory stimuli are usually tonic in nature, although long-duration changes in activity have been reported in the awake preparation as well. However, in the awake preparation, short-latency, phasic changes in activity are most common. These phasic responses can occur to unconditioned aversive and non-aversive stimuli, as well as to the stimuli which predict them. In both the anesthetized and awake preparations, not all dopaminergic neurons are responsive to sensory stimuli; however, responsive neurons tend to respond to more than a single stimulus modality. Evidence suggests that short-latency sensory information is provided to dopaminergic neurons by relatively primitive subcortical structures, including the midbrain superior colliculus for vision and the mesopontine parabrachial nucleus for pain and possibly gustation. Although short-latency visual information is provided to dopaminergic neurons by the relatively primitive colliculus, dopaminergic neurons can discriminate between complex visual stimuli, an apparent paradox which can be resolved by the recently discovered route of information flow to dopaminergic neurons from the cerebral cortex, via a relay in the colliculus. Given that projections from the cortex to the colliculus are extensive, such a relay potentially allows the activity of dopaminergic neurons to report the results of complex stimulus processing from widespread areas of the cortex. Furthermore, dopaminergic neurons could acquire their ability to reflect stimulus value by virtue of reward-related modification of sensory processing in the cortex. At the forebrain level, sensory-related changes in the tonic activity of dopaminergic neurons may regulate the impact of the cortex on forebrain structures such as the nucleus accumbens.
In contrast, the short latency of the phasic responses to sensory stimuli in dopaminergic neurons, coupled with the activation of these neurons by non-rewarding stimuli, suggests that phasic responses of dopaminergic neurons may provide a signal to the forebrain which indicates that a salient event has occurred (and possibly an estimate of how salient that event is). A stimulus-related salience signal could be used by downstream systems to reinforce behavioral choices.
Affiliation(s)
- P G Overton
- Department of Psychology, University of Sheffield, Western Bank, Sheffield S10 2TN, UK
- N Vautrelle
- Department of Psychology, University of Sheffield, Western Bank, Sheffield S10 2TN, UK
- P Redgrave
- Department of Psychology, University of Sheffield, Western Bank, Sheffield S10 2TN, UK
16. Huys QJ, Tobler PN, Hasler G, Flagel SB. The role of learning-related dopamine signals in addiction vulnerability. Prog Brain Res 2014; 211:31-77. DOI: 10.1016/b978-0-444-63425-2.00003-9.
17. Matsumoto M, Takada M. Distinct representations of cognitive and motivational signals in midbrain dopamine neurons. Neuron 2013; 79:1011-1024. PMID: 23932490; DOI: 10.1016/j.neuron.2013.07.002.
Abstract
Dopamine is essential to cognitive functions. However, despite abundant studies demonstrating that dopamine neuron activity is related to reinforcement and motivation, little is known about what signals dopamine neurons convey to promote cognitive processing. We therefore examined dopamine neuron activity in monkeys performing a delayed matching-to-sample task that required working memory and visual search. We found that dopamine neurons responded to task events associated with cognitive operations. A subset of dopamine neurons were activated by visual stimuli if the monkey had to store the stimuli in working memory. These neurons were located dorsolaterally in the substantia nigra pars compacta, whereas ventromedial dopamine neurons, some in the ventral tegmental area, represented reward prediction signals. Furthermore, dopamine neurons monitored visual search performance, becoming active when the monkey made an internal judgment that the search was successfully completed. Our findings suggest an anatomical gradient of dopamine signals along the dorsolateral-ventromedial axis of the ventral midbrain.
Collapse
Affiliation(s)
- Masayuki Matsumoto
- Systems Neuroscience Section, Department of Cellular and Molecular Biology, Primate Research Institute, Kyoto University, Inuyama, Aichi 484-8506, Japan; Division of Biomedical Science, Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan; Graduate School of Comprehensive Human Sciences, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan.
| | | |
Collapse
|
18
|
Dopaminergic control of motivation and reinforcement learning: a closed-circuit account for reward-oriented behavior. J Neurosci 2013; 33:8866-90. [PMID: 23678129 DOI: 10.1523/jneurosci.4614-12.2013] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Humans and animals take actions quickly when they expect that the actions lead to reward, reflecting their motivation. Injection of dopamine receptor antagonists into the striatum has been shown to slow such reward-seeking behavior, suggesting that dopamine is involved in the control of motivational processes. Meanwhile, neurophysiological studies have revealed that phasic response of dopamine neurons appears to represent reward prediction error, indicating that dopamine plays central roles in reinforcement learning. However, previous attempts to elucidate the mechanisms of these dopaminergic controls have not fully explained how the motivational and learning aspects are related and whether they can be understood by the way the activity of dopamine neurons itself is controlled by their upstream circuitries. To address this issue, we constructed a closed-circuit model of the corticobasal ganglia system based on recent findings regarding intracortical and corticostriatal circuit architectures. Simulations show that the model could reproduce the observed distinct motivational effects of D1- and D2-type dopamine receptor antagonists. Simultaneously, our model successfully explains the dopaminergic representation of reward prediction error as observed in behaving animals during learning tasks and could also explain distinct choice biases induced by optogenetic stimulation of the D1 and D2 receptor-expressing striatal neurons. These results indicate that the suggested roles of dopamine in motivational control and reinforcement learning can be understood in a unified manner through a notion that the indirect pathway of the basal ganglia represents the value of states/actions at a previous time point, an empirically driven key assumption of our model.
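The dopaminergic reward prediction error discussed in this abstract is conventionally formalized as a temporal-difference (TD) error. A minimal sketch of that idea, not the cited closed-circuit model itself; state names, learning rate, and discount factor are illustrative:

```python
# Minimal temporal-difference (TD) sketch of a dopamine-like reward
# prediction error: delta = r + gamma * V(s') - V(s).
# All names and parameter values here are illustrative, not from the paper.

def td_update(V, s, s_next, r, alpha=0.1, gamma=0.9):
    """Update the value of state s from one observed transition; return the error."""
    delta = r + gamma * V[s_next] - V[s]  # prediction error ("dopamine signal")
    V[s] += alpha * delta
    return delta

# A cue that reliably precedes reward gradually acquires value, so the
# error at reward delivery shrinks and shifts back to the cue.
V = {"cue": 0.0, "outcome": 0.0, "end": 0.0}
for _ in range(200):
    td_update(V, "outcome", "end", r=1.0)   # reward delivered at the outcome
    td_update(V, "cue", "outcome", r=0.0)   # cue itself is unrewarded
```

After training, `V["outcome"]` approaches the reward magnitude and the error at reward delivery approaches zero, mirroring the learning-related shift of phasic dopamine responses described above.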
Collapse
|
19
|
Walton T, Thirkettle M, Redgrave P, Gurney KN, Stafford T. The discovery of novel actions is affected by very brief reinforcement delays and reinforcement modality. J Mot Behav 2013; 45:351-60. [PMID: 23796130 DOI: 10.1080/00222895.2013.806108] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
Abstract
The authors investigated the ability of human participants to discover novel actions under conditions of delayed reinforcement. Participants used a joystick to search for a target indicated by visual or auditory reinforcement. Reinforcement delays of 75-150 ms were found to significantly impair action acquisition. They also found an effect of modality, with acquisition superior with auditory feedback. The duration at which delay was found to impede action discovery is, to the authors' knowledge, shorter than that previously reported from work with operant and causal learning paradigms. The sensitivity to delay reported, and the difference between modalities, is consistent with accounts of action discovery that emphasize the importance of a time stamp in the motor record for solving the credit assignment problem.
Collapse
Affiliation(s)
- Tom Walton
- Department of Psychology, University of Sheffield, Western Bank, Sheffield, S10 2TP, England
| | | | | | | | | |
Collapse
|
20
|
Silvetti M, Wiersema JR, Sonuga-Barke E, Verguts T. Deficient reinforcement learning in medial frontal cortex as a model of dopamine-related motivational deficits in ADHD. Neural Netw 2013; 46:199-209. [PMID: 23811383 DOI: 10.1016/j.neunet.2013.05.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2012] [Revised: 05/13/2013] [Accepted: 05/13/2013] [Indexed: 11/20/2022]
Abstract
Attention Deficit/Hyperactivity Disorder (ADHD) is a pathophysiologically complex and heterogeneous condition with both cognitive and motivational components. We propose a novel computational hypothesis of motivational deficits in ADHD, drawing together recent evidence on the role of anterior cingulate cortex (ACC) and associated mesolimbic dopamine circuits in both reinforcement learning and ADHD. Based on findings of dopamine dysregulation and ACC involvement in ADHD we simulated a lesion in a previously validated computational model of ACC (Reward Value and Prediction Model, RVPM). We explored the effects of the lesion on the processing of reinforcement signals. We tested specific behavioral predictions about the profile of reinforcement-related deficits in ADHD in three experimental contexts; probability tracking task, partial and continuous reward schedules, and immediate versus delayed rewards. In addition, predictions were made at the neurophysiological level. Behavioral and neurophysiological predictions from the RVPM-based lesion-model of motivational dysfunction in ADHD were confirmed by data from previously published studies. RVPM represents a promising model of ADHD reinforcement learning suggesting that ACC dysregulation might play a role in the pathogenesis of motivational deficits in ADHD. However, more behavioral and neurophysiological studies are required to test core predictions of the model. In addition, the interaction with different brain networks underpinning other aspects of ADHD neuropathology (i.e., executive function) needs to be better understood.
Collapse
Affiliation(s)
- Massimo Silvetti
- Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, 9000 Ghent, Belgium.
| | | | | | | |
Collapse
|
21
|
Wassum KM, Ostlund SB, Maidment NT. Phasic mesolimbic dopamine signaling precedes and predicts performance of a self-initiated action sequence task. Biol Psychiatry 2012; 71:846-54. [PMID: 22305286 PMCID: PMC3471807 DOI: 10.1016/j.biopsych.2011.12.019] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/18/2011] [Revised: 11/19/2011] [Accepted: 12/20/2011] [Indexed: 11/20/2022]
Abstract
BACKGROUND Sequential reward-seeking actions are readily learned despite the temporal gap between the earliest (distal) action in the sequence and the reward delivery. Fast dopamine signaling is hypothesized to mediate this form of learning by reporting errors in reward prediction. However, such a role for dopamine release in voluntarily initiated action sequences remains to be demonstrated. METHODS Using fast-scan cyclic voltammetry, we monitored phasic mesolimbic dopamine release, in real time, as rats performed a self-initiated sequence of lever presses to earn sucrose rewards. Before testing, rats received either 0 (n = 11), 5 (n = 11), or 10 (n = 8) days of action sequence training. RESULTS For rats acquiring the action sequence task at test, dopamine release was strongly elicited by response-contingent (but unexpected) rewards. With learning, a significant elevation in dopamine release preceded performance of the proximal action and subsequently came to precede the distal action. This predistal dopamine release response was also observed in rats previously trained on the action sequence task, and the amplitude of this signal predicted the latency with which rats completed the action sequence. Importantly, the dopamine response to contingent reward delivery was not observed in rats given extensive pretraining. Pharmacological analysis confirmed that task performance was dopamine-dependent. CONCLUSIONS These data suggest that phasic mesolimbic dopamine release mediates the influence that rewards exert over the performance of self-paced, sequentially-organized behavior and sheds light on how dopamine signaling abnormalities may contribute to disorders of behavioral control.
Collapse
Affiliation(s)
- Kate M Wassum
- University of California Los Angeles, Department of Psychology, Los Angeles, CA 90095, USA.
| | | | | |
Collapse
|
22
|
Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 2012; 482:85-8. [PMID: 22258508 PMCID: PMC3271183 DOI: 10.1038/nature10754] [Citation(s) in RCA: 892] [Impact Index Per Article: 74.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2011] [Accepted: 12/02/2011] [Indexed: 12/12/2022]
Abstract
Dopamine has a central role in motivation and reward. Dopaminergic neurons in the ventral tegmental area (VTA) signal the discrepancy between expected and actual rewards (that is, reward prediction error), but how they compute such signals is unknown. We recorded the activity of VTA neurons while mice associated different odour cues with appetitive and aversive outcomes. We found three types of neuron based on responses to odours and outcomes: approximately half of the neurons (type I, 52%) showed phasic excitation after reward-predicting odours and rewards in a manner consistent with reward prediction error coding; the other half of neurons showed persistent activity during the delay between odour and outcome that was modulated positively (type II, 31%) or negatively (type III, 18%) by the value of outcomes. Whereas the activity of type I neurons was sensitive to actual outcomes (that is, when the reward was delivered as expected compared to when it was unexpectedly omitted), the activity of type II and type III neurons was determined predominantly by reward-predicting odours. We 'tagged' dopaminergic and GABAergic neurons with the light-sensitive protein channelrhodopsin-2 and identified them based on their responses to optical stimulation while recording. All identified dopaminergic neurons were of type I and all GABAergic neurons were of type II. These results show that VTA GABAergic neurons signal expected reward, a key variable for dopaminergic neurons to calculate reward prediction error.
Collapse
|
23
|
Redgrave P, Vautrelle N, Reynolds J. Functional properties of the basal ganglia's re-entrant loop architecture: selection and reinforcement. Neuroscience 2011; 198:138-51. [DOI: 10.1016/j.neuroscience.2011.07.060] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2011] [Revised: 07/19/2011] [Accepted: 07/22/2011] [Indexed: 12/31/2022]
|
24
|
A neural correlate of predicted and actual reward-value information in monkey pedunculopontine tegmental and dorsal raphe nucleus during saccade tasks. Neural Plast 2011; 2011:579840. [PMID: 22013541 PMCID: PMC3195531 DOI: 10.1155/2011/579840] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2011] [Revised: 07/13/2011] [Accepted: 08/04/2011] [Indexed: 11/28/2022] Open
Abstract
Dopamine, acetylcholine, and serotonin, the main modulators of the central nervous system, have been proposed to play important roles in the execution of movement, control of several forms of attentional behavior, and reinforcement learning. While the response pattern of midbrain dopaminergic neurons and its specific role in reinforcement learning have been revealed, the role of the other neuromodulators remains rather elusive. Here, we review our recent studies using extracellular recording from neurons in the pedunculopontine tegmental nucleus, where many cholinergic neurons exist, and the dorsal raphe nucleus, where many serotonergic neurons exist, while monkeys performed eye movement tasks to obtain different reward values. The firing patterns of these neurons are often tonic throughout the task period, while dopaminergic neurons exhibited a phasic activity pattern to the task event. The different modulation patterns, together with the activity of dopaminergic neurons, reveal dynamic information processing between these different neuromodulator systems.
Collapse
|
25
|
Yoshimi K, Naya Y, Mitani N, Kato T, Inoue M, Natori S, Takahashi T, Weitemier A, Nishikawa N, McHugh T, Einaga Y, Kitazawa S. Phasic reward responses in the monkey striatum as detected by voltammetry with diamond microelectrodes. Neurosci Res 2011; 71:49-62. [PMID: 21645558 DOI: 10.1016/j.neures.2011.05.013] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2010] [Revised: 04/16/2011] [Accepted: 05/18/2011] [Indexed: 10/18/2022]
Abstract
Reward-induced burst firing of dopaminergic neurons has mainly been studied in the primate midbrain. Voltammetry allows high-speed detection of dopamine release in the projection area. Although voltammetry has revealed presynaptic modulation of dopamine release in the striatum, to date, reward-induced release in awakened brains has been recorded only in rodents. To make such recordings, it is possible to use conventional carbon fibres in monkey brains but the use of these fibres is limited by their physical fragility. In this study, constant-potential amperometry was applied to novel diamond microelectrodes for high-speed detection of dopamine. In primate brains during Pavlovian cue-reward trials, a sharp response to a reward cue was detected in the caudate of Japanese monkeys. Overall, this method allows measurements of monoamine release in specific target areas of large brains, the findings from which will expand the knowledge of reward responses obtained by unit recordings.
Collapse
Affiliation(s)
- Kenji Yoshimi
- Department of Neurophysiology, Juntendo University School of Medicine, Hongo 2-1-1, Bunkyo-ku, Tokyo 113-8421, Japan.
| | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
26
|
Hong S, Hikosaka O. Dopamine-mediated learning and switching in cortico-striatal circuit explain behavioral changes in reinforcement learning. Front Behav Neurosci 2011; 5:15. [PMID: 21472026 PMCID: PMC3065164 DOI: 10.3389/fnbeh.2011.00015] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2010] [Accepted: 03/09/2011] [Indexed: 11/13/2022] Open
Abstract
The basal ganglia are thought to play a crucial role in reinforcement learning. Central to the learning mechanism are dopamine (DA) D1 and D2 receptors located in the cortico-striatal synapses. However, it is still unclear how this DA-mediated synaptic plasticity is deployed and coordinated during reward-contingent behavioral changes. Here we propose a computational model of reinforcement learning that uses different thresholds of D1- and D2-mediated synaptic plasticity which are antagonized by DA-independent synaptic plasticity. A phasic increase in DA release caused by a larger-than-expected reward induces long-term potentiation (LTP) in the direct pathway, whereas a phasic decrease in DA release caused by a smaller-than-expected reward induces a cessation of long-term depression, leading to LTP in the indirect pathway. This learning mechanism can explain the robust behavioral adaptation observed in a location-reward-value-association task where the animal makes shorter latency saccades to reward locations. The changes in saccade latency become quicker as the monkey becomes more experienced. This behavior can be explained by a switching mechanism which activates the cortico-striatal circuit selectively. Our model also shows how D1- or D2-receptor blocking experiments affect selectively either reward or no-reward trials. The proposed mechanisms also explain the behavioral changes in Parkinson's disease.
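The opponent plasticity rule summarized in this abstract, bursts of dopamine driving potentiation in the direct (D1) pathway and dips driving potentiation in the indirect (D2) pathway, can be caricatured in a few lines. This is a hedged sketch of the general idea, not the cited model's implementation; the baseline, thresholds, and learning rate are arbitrary:

```python
# Illustrative opponent D1/D2 plasticity driven by phasic dopamine
# deviations from baseline. Not the cited model; parameters are arbitrary.

def update_pathways(w_direct, w_indirect, dopamine,
                    baseline=0.5, theta_d1=0.2, theta_d2=0.2, lr=0.05):
    """Return updated (direct, indirect) weights for one phasic dopamine event."""
    deviation = dopamine - baseline
    if deviation > theta_d1:         # dopamine burst: LTP in direct (D1) pathway
        w_direct += lr * (deviation - theta_d1)
    elif deviation < -theta_d2:      # dopamine dip: LTP in indirect (D2) pathway
        w_indirect += lr * (-deviation - theta_d2)
    return w_direct, w_indirect      # small deviations change neither pathway

w_d, w_i = 1.0, 1.0
w_d, w_i = update_pathways(w_d, w_i, dopamine=0.9)  # larger-than-expected reward
w_d, w_i = update_pathways(w_d, w_i, dopamine=0.1)  # smaller-than-expected reward
```

The separate thresholds on either side of baseline capture the abstract's key point: bursts and dips engage different receptor populations, so reward and no-reward trials can be manipulated independently (as in the D1/D2 blocking experiments mentioned above).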
Collapse
Affiliation(s)
- Simon Hong
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health Bethesda, MD, USA
| | | |
Collapse
|
27
|
Bromberg-Martin ES, Matsumoto M, Hikosaka O. Dopamine in motivational control: rewarding, aversive, and alerting. Neuron 2011; 68:815-34. [PMID: 21144997 DOI: 10.1016/j.neuron.2010.11.022] [Citation(s) in RCA: 1438] [Impact Index Per Article: 110.6] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/29/2010] [Indexed: 01/18/2023]
Abstract
Midbrain dopamine neurons are well known for their strong responses to rewards and their critical role in positive motivation. It has become increasingly clear, however, that dopamine neurons also transmit signals related to salient but nonrewarding experiences such as aversive and alerting events. Here we review recent advances in understanding the reward and nonreward functions of dopamine. Based on this data, we propose that dopamine neurons come in multiple types that are connected with distinct brain networks and have distinct roles in motivational control. Some dopamine neurons encode motivational value, supporting brain networks for seeking, evaluation, and value learning. Others encode motivational salience, supporting brain networks for orienting, cognition, and general motivation. Both types of dopamine neurons are augmented by an alerting signal involved in rapid detection of potentially important sensory cues. We hypothesize that these dopaminergic pathways for value, salience, and alerting cooperate to support adaptive behavior.
Collapse
Affiliation(s)
- Ethan S Bromberg-Martin
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | | | | |
Collapse
|
28
|
Wanat MJ, Kuhnen CM, Phillips PEM. Delays conferred by escalating costs modulate dopamine release to rewards but not their predictors. J Neurosci 2010; 30:12020-7. [PMID: 20826665 PMCID: PMC2946195 DOI: 10.1523/jneurosci.2691-10.2010] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2010] [Revised: 07/06/2010] [Accepted: 07/07/2010] [Indexed: 11/21/2022] Open
Abstract
Efficient reward seeking is essential for survival and invariably requires overcoming costs, such as physical effort and delay, which are constantly changing in natural settings. Dopamine transmission has been implicated in decisions weighing the benefits and costs of obtaining a reward, but it is still unclear how dynamically changing effort and delay costs affect dopamine signaling to rewards and related stimuli. Using fast-scan cyclic voltammetry, we examined phasic dopamine release in the nucleus accumbens (NAcc) core and shell during reward-seeking behavior in rats. To manipulate the effort and time needed to earn a reward, we used instrumental tasks in which the response requirements (number of lever presses) were either fixed throughout a behavioral session [fixed ratio (FR)] or systematically increased from trial to trial [progressive ratio (PR)]. Dopamine release evoked by cues denoting reward availability was no different between these conditions, indicating insensitivity to escalating effort or delay costs. In contrast, dopamine release to reward delivery in both the NAcc core and shell increased in PR, but not in FR, sessions. This enhancement of reward-evoked dopamine signaling was also observed in sessions in which the response requirement was fixed but the delay to reward delivery increased, yoked to corresponding trials in PR sessions. These findings suggest that delay, and not effort, was principally responsible for the increased reward-evoked dopamine release in PR sessions. Together, these data demonstrate that NAcc dopamine release to rewards and their predictors are dissociable and differentially regulated by the delays conferred under escalating costs.
Collapse
Affiliation(s)
- Matthew J. Wanat
- Departments of Psychiatry and Behavioral Sciences, and Pharmacology, University of Washington, Seattle, Washington 98195, and
| | - Camelia M. Kuhnen
- Kellogg School of Management, Northwestern University, Evanston, Illinois 60208
| | - Paul E. M. Phillips
- Departments of Psychiatry and Behavioral Sciences, and Pharmacology, University of Washington, Seattle, Washington 98195, and
| |
Collapse
|
29
|
Bromberg-Martin ES, Matsumoto M, Nakahara H, Hikosaka O. Multiple timescales of memory in lateral habenula and dopamine neurons. Neuron 2010; 67:499-510. [PMID: 20696385 DOI: 10.1016/j.neuron.2010.06.031] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/08/2010] [Indexed: 01/10/2023]
Abstract
Midbrain dopamine neurons are thought to signal predictions about future rewards based on the memory of past rewarding experience. Little is known about the source of their reward memory and the factors that control its timescale. Here we recorded from dopamine neurons, as well as one of their sources of input, the lateral habenula, while animals predicted upcoming rewards based on the past reward history. We found that lateral habenula and dopamine neurons accessed two distinct reward memories: a short-timescale memory expressed at the start of the task and a near-optimal long-timescale memory expressed when a future reward outcome was revealed. The short- and long-timescale memories were expressed in different forms of reward-oriented eye movements. Our data show that the habenula-dopamine pathway contains multiple timescales of memory and provide evidence for their role in motivated behavior.
Collapse
|
30
|
Bromberg-Martin ES, Matsumoto M, Hikosaka O. Distinct tonic and phasic anticipatory activity in lateral habenula and dopamine neurons. Neuron 2010; 67:144-55. [PMID: 20624598 DOI: 10.1016/j.neuron.2010.06.016] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/01/2010] [Indexed: 10/19/2022]
Abstract
Dopamine has a crucial role in anticipation of motivational events. To investigate the underlying mechanisms of this process, we analyzed the activity of dopamine neurons and one of their major sources of input, neurons in the lateral habenula, while animals anticipated upcoming behavioral tasks. We found that lateral habenula and dopamine neurons anticipated tasks in two distinct manners. First, neurons encoded the timing distribution of upcoming tasks through gradual changes in their tonic activity. This tonic signal encoded rewarding tasks in preference to punishing tasks and was correlated with classic phasic coding of motivational value. Second, neurons transmitted a phasic signal marking the time when a task began. This phasic signal encoded rewarding and punishing tasks in similar manners, as though reflecting motivational salience. Our data suggest that the habenula-dopamine pathway motivates anticipation through a combination of tonic reward-related and phasic salience-related signals.
Collapse
Affiliation(s)
- Ethan S Bromberg-Martin
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA.
| | | | | |
Collapse
|
31
|
Bromberg-Martin ES, Matsumoto M, Hong S, Hikosaka O. A pallidus-habenula-dopamine pathway signals inferred stimulus values. J Neurophysiol 2010; 104:1068-76. [PMID: 20538770 DOI: 10.1152/jn.00158.2010] [Citation(s) in RCA: 115] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
The reward value of a stimulus can be learned through two distinct mechanisms: reinforcement learning through repeated stimulus-reward pairings and abstract inference based on knowledge of the task at hand. The reinforcement mechanism is often identified with midbrain dopamine neurons. Here we show that a neural pathway controlling the dopamine system does not rely exclusively on either stimulus-reward pairings or abstract inference but instead uses a combination of the two. We trained monkeys to perform a reward-biased saccade task in which the reward values of two saccade targets were related in a systematic manner. Animals used each trial's reward outcome to learn the values of both targets: the target that had been presented and whose reward outcome had been experienced (experienced value) and the target that had not been presented but whose value could be inferred from the reward statistics of the task (inferred value). We then recorded from three populations of reward-coding neurons: substantia nigra dopamine neurons; a major input to dopamine neurons, the lateral habenula; and neurons that project to the lateral habenula, located in the globus pallidus. All three populations encoded both experienced values and inferred values. In some animals, neurons encoded experienced values more strongly than inferred values, and the animals showed behavioral evidence of learning faster from experience than from inference. Our data indicate that the pallidus-habenula-dopamine pathway signals reward values estimated through both experience and inference.
Collapse
Affiliation(s)
- Ethan S Bromberg-Martin
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bldg. 49, Rm. 2A50, Bethesda, Maryland 20892-4435, USA.
| | | | | | | |
Collapse
|
32
|
Schultz W. Dopamine signals for reward value and risk: basic and recent data. Behav Brain Funct 2010; 6:24. [PMID: 20416052 PMCID: PMC2876988 DOI: 10.1186/1744-9081-6-24] [Citation(s) in RCA: 413] [Impact Index Per Article: 29.5] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2010] [Accepted: 04/23/2010] [Indexed: 01/20/2023] Open
Abstract
Background Previous lesion, electrical self-stimulation and drug addiction studies suggest that the midbrain dopamine systems are parts of the reward system of the brain. This review provides an updated overview about the basic signals of dopamine neurons to environmental stimuli. Methods The described experiments used standard behavioral and neurophysiological methods to record the activity of single dopamine neurons in awake monkeys during specific behavioral tasks. Results Dopamine neurons show phasic activations to external stimuli. The signal reflects reward, physical salience, risk and punishment, in descending order of fractions of responding neurons. Expected reward value is a key decision variable for economic choices. The reward response codes reward value, probability and their summed product, expected value. The neurons code reward value as it differs from prediction, thus fulfilling the basic requirement for a bidirectional prediction error teaching signal postulated by learning theory. This response is scaled in units of standard deviation. By contrast, relatively few dopamine neurons show the phasic activation following punishers and conditioned aversive stimuli, suggesting a lack of relationship of the reward response to general attention and arousal. Large proportions of dopamine neurons are also activated by intense, physically salient stimuli. This response is enhanced when the stimuli are novel; it appears to be distinct from the reward value signal. Dopamine neurons show also unspecific activations to non-rewarding stimuli that are possibly due to generalization by similar stimuli and pseudoconditioning by primary rewards. These activations are shorter than reward responses and are often followed by depression of activity. A separate, slower dopamine signal informs about risk, another important decision variable. The prediction error response occurs only with reward; it is scaled by the risk of predicted reward. 
Conclusions Neurophysiological studies reveal phasic dopamine signals that transmit information related predominantly but not exclusively to reward. Although not being entirely homogeneous, the dopamine signal is more restricted and stereotyped than neuronal activity in most other brain structures involved in goal directed behavior.
Collapse
Affiliation(s)
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge, UK.
| |
Collapse
|
33
|
Schultz W. Dopamine signals for reward value and risk: basic and recent data. BEHAVIORAL AND BRAIN FUNCTIONS : BBF 2010; 6:24. [PMID: 20416052 DOI: 10.1186/1744-9081-1186-1124] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Received: 03/16/2010] [Accepted: 04/23/2010] [Indexed: 05/27/2023]
Abstract
BACKGROUND Previous lesion, electrical self-stimulation and drug addiction studies suggest that the midbrain dopamine systems are parts of the reward system of the brain. This review provides an updated overview about the basic signals of dopamine neurons to environmental stimuli. METHODS The described experiments used standard behavioral and neurophysiological methods to record the activity of single dopamine neurons in awake monkeys during specific behavioral tasks. RESULTS Dopamine neurons show phasic activations to external stimuli. The signal reflects reward, physical salience, risk and punishment, in descending order of fractions of responding neurons. Expected reward value is a key decision variable for economic choices. The reward response codes reward value, probability and their summed product, expected value. The neurons code reward value as it differs from prediction, thus fulfilling the basic requirement for a bidirectional prediction error teaching signal postulated by learning theory. This response is scaled in units of standard deviation. By contrast, relatively few dopamine neurons show the phasic activation following punishers and conditioned aversive stimuli, suggesting a lack of relationship of the reward response to general attention and arousal. Large proportions of dopamine neurons are also activated by intense, physically salient stimuli. This response is enhanced when the stimuli are novel; it appears to be distinct from the reward value signal. Dopamine neurons show also unspecific activations to non-rewarding stimuli that are possibly due to generalization by similar stimuli and pseudoconditioning by primary rewards. These activations are shorter than reward responses and are often followed by depression of activity. A separate, slower dopamine signal informs about risk, another important decision variable. The prediction error response occurs only with reward; it is scaled by the risk of predicted reward. 
CONCLUSIONS Neurophysiological studies reveal phasic dopamine signals that transmit information related predominantly but not exclusively to reward. Although not being entirely homogeneous, the dopamine signal is more restricted and stereotyped than neuronal activity in most other brain structures involved in goal directed behavior.
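The bidirectional, risk-scaled prediction error described in this abstract can be made concrete with a small sketch. This is an illustrative reading of the abstract, not code from the paper; the function name and the numeric values are hypothetical.

```python
# Hedged sketch (not from the paper): a bidirectional reward prediction
# error, optionally expressed in units of the standard deviation (risk)
# of the predicted reward distribution, as the abstract describes.

def prediction_error(received, expected, risk_sd=1.0):
    """Positive when reward exceeds prediction, negative when it falls short.
    Dividing by risk_sd scales the error in units of standard deviation."""
    return (received - expected) / risk_sd

# Unexpected reward -> positive error; omitted expected reward -> negative.
assert prediction_error(1.0, 0.0) > 0
assert prediction_error(0.0, 1.0) < 0
# The same outcome deviation yields a smaller scaled error when the
# predicted reward is riskier (larger standard deviation).
assert abs(prediction_error(1.0, 0.5, risk_sd=2.0)) < abs(prediction_error(1.0, 0.5, risk_sd=1.0))
```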
Affiliation(s)
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge, UK.
34
Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature 2009; 459:837-41. [PMID: 19448610 DOI: 10.1038/nature08028] [Citation(s) in RCA: 931] [Impact Index Per Article: 62.1] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2009] [Accepted: 03/27/2009] [Indexed: 11/08/2022]
Abstract
Midbrain dopamine neurons are activated by reward or sensory stimuli predicting reward. These excitatory responses increase as the reward value increases. This response property has led to a hypothesis that dopamine neurons encode value-related signals and are inhibited by aversive events. Here we show that this is true only for a subset of dopamine neurons. We recorded the activity of dopamine neurons in monkeys (Macaca mulatta) during a Pavlovian procedure with appetitive and aversive outcomes (liquid rewards and airpuffs directed at the face, respectively). We found that some dopamine neurons were excited by reward-predicting stimuli and inhibited by airpuff-predicting stimuli, as the value hypothesis predicts. However, a greater number of dopamine neurons were excited by both of these stimuli, inconsistent with the hypothesis. Some dopamine neurons were also excited by both rewards and airpuffs themselves, especially when they were unpredictable. Neurons excited by the airpuff-predicting stimuli were located more dorsolaterally in the substantia nigra pars compacta, whereas neurons inhibited by the stimuli were located more ventromedially, some in the ventral tegmental area. A similar anatomical difference was observed for their responses to actual airpuffs. These findings suggest that different groups of dopamine neurons convey motivational signals in distinct manners.
35
Abstract
Reward presentation is known to induce transient bursts of midbrain dopamine neurons in monkeys and rats, and the reward-induced dopamine overflow has been detected in the rat ventral striatum. To detect reward-related dopamine release in the dorsal striatum of behaving mice (C57BL/6), we used voltammetry with carbon-fiber microelectrodes implanted into the dorsal striatum. Dopamine signals increased transiently after food delivery with a peak at 0.6 s after the delivery onset. The success in detecting the transient reward response of dopamine in behaving mice opens a wide range of applications to studies in mutant mice.
36
May PJ, McHaffie JG, Stanford TR, Jiang H, Costello MG, Coizet V, Hayes LM, Haber SN, Redgrave P. Tectonigral projections in the primate: a pathway for pre-attentive sensory input to midbrain dopaminergic neurons. Eur J Neurosci 2009; 29:575-87. [PMID: 19175405 DOI: 10.1111/j.1460-9568.2008.06596.x] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Much of the evidence linking the short-latency phasic signaling of midbrain dopaminergic neurons with reward-prediction errors used in learning and habit formation comes from recording the visual responses of monkey dopaminergic neurons. However, the information encoded by dopaminergic neuron activity is constrained by the qualities of the afferent visual signals made available to these cells. Recent evidence from rats and cats indicates the primary source of this visual input originates subcortically, via a direct tectonigral projection. The present anatomical study sought to establish whether a direct tectonigral projection is a significant feature of the primate brain. Injections of anterograde tracers into the superior colliculus of macaque monkeys labelled terminal arbors throughout the substantia nigra, with the densest terminations in the dorsal tier. Labelled boutons were found in close association (possibly indicative of synaptic contact) with ventral midbrain neurons staining positively for the dopaminergic marker tyrosine hydroxylase. Injections of retrograde tracer confined to the macaque substantia nigra retrogradely labelled small- to medium-sized neurons in the intermediate and deep layers of the superior colliculus. Together, these data indicate that a direct tectonigral projection is also a feature of the monkey brain, and therefore likely to have been conserved throughout mammalian evolution. Insofar as the superior colliculus is configured to detect unpredicted, biologically salient, sensory events, it may be safer to regard the phasic responses of midbrain dopaminergic neurons as 'sensory prediction errors' rather than 'reward prediction errors', in which case dopamine-based theories of reinforcement learning will require revision.
Affiliation(s)
- Paul J May
- Department of Anatomy, Ophthalmology & Neurology, University of Mississippi Medical Center, Jackson, MS, USA.
37
Abstract
The dopamine system has been thought to play a central role in guiding behavior based on rewards. Recent pharmacological studies suggest that another monoamine neurotransmitter, serotonin, is also involved in reward processing. To elucidate the functional relationship between serotonin neurons and dopamine neurons, we performed single-unit recording in the dorsal raphe nucleus (DRN), a major source of serotonin, and the substantia nigra pars compacta, a major source of dopamine, while monkeys performed saccade tasks in which the position of the target indicated the size of an upcoming reward. After target onset, but before reward delivery, the activity of many DRN neurons was modulated tonically by the expected reward size with either large- or small-reward preference, whereas putative dopamine neurons had phasic responses and only preferred large rewards. After reward delivery, the activity of DRN neurons was modulated tonically by the received reward size with either large- or small-reward preference, whereas the activity of dopamine neurons was not modulated except after the unexpected reversal of the position-reward contingency. Thus, DRN neurons encode the expected and received rewards, whereas dopamine neurons encode the difference between the expected and received rewards. These results suggest that the DRN, probably including serotonin neurons, signals the reward value associated with the current behavior.
38
What is reinforced by phasic dopamine signals? Brain Res Rev 2007; 58:322-39. [PMID: 18055018 DOI: 10.1016/j.brainresrev.2007.10.007] [Citation(s) in RCA: 166] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2007] [Revised: 10/08/2007] [Accepted: 10/10/2007] [Indexed: 11/23/2022]
Abstract
The basal ganglia have been associated with processes of reinforcement learning. A strong line of supporting evidence comes from the recording of dopamine (DA) neurones in behaving monkeys. Unpredicted, biologically salient events, including rewards, cause a stereotypic short-latency (70-100 ms), short-duration (100-200 ms) burst of DA activity - the phasic response. This response is widely considered to represent reward prediction errors used as teaching signals in appetitive learning to promote actions that will maximise future reward acquisition. For DA signalling to perform this function, sensory processing afferent to DA neurones should discriminate unpredicted reward-related events. However, the comparative response latencies of DA neurones and orienting gaze-shifts indicate that phasic DA responses are triggered by pre-attentive sensory processing. Consequently, in circumstances where biologically salient events are both spatially and temporally unpredictable, it is unlikely their identity will be known at the time of DA signalling. The limited quality of afferent sensory processing and the precise timing of phasic DA signals suggest that they may play a less direct role in 'Law of Effect' appetitive learning. Rather, the 'time-stamp' nature of the phasic response, in conjunction with the other signals likely to be present in the basal ganglia at the time of phasic DA input, suggests it may reinforce the discovery of unpredicted sensory events for which the organism is responsible. Furthermore, DA-promoted repetition of preceding actions/movements should enable the system to converge on those aspects of context and behavioural output that lead to the discovery of novel actions.
39
Fields HL, Hjelmstad GO, Margolis EB, Nicola SM. Ventral tegmental area neurons in learned appetitive behavior and positive reinforcement. Annu Rev Neurosci 2007; 30:289-316. [PMID: 17376009 DOI: 10.1146/annurev.neuro.30.051606.094341] [Citation(s) in RCA: 414] [Impact Index Per Article: 24.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Ventral tegmental area (VTA) neuron firing precedes behaviors elicited by reward-predictive sensory cues and scales with the magnitude and unpredictability of received rewards. These patterns are consistent with roles in the performance of learned appetitive behaviors and in positive reinforcement, respectively. The VTA includes subpopulations of neurons with different afferent connections, neurotransmitter content, and projection targets. Because the VTA and substantia nigra pars compacta are the sole sources of striatal and limbic forebrain dopamine, measurements of dopamine release and manipulations of dopamine function have provided critical evidence supporting a VTA contribution to these functions. However, the VTA also sends GABAergic and glutamatergic projections to the nucleus accumbens and prefrontal cortex. Furthermore, VTA-mediated but dopamine-independent positive reinforcement has been demonstrated. Consequently, identifying the neurotransmitter content and projection target of VTA neurons recorded in vivo will be critical for determining their contribution to learned appetitive behaviors.
Affiliation(s)
- Howard L Fields
- Ernest Gallo Clinic and Research Center and Wheeler Center for the Neurobiology of Addiction, University of California, San Francisco, Emeryville, California 94608, USA.
40
Matsumoto M, Hikosaka O. Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 2007; 447:1111-5. [PMID: 17522629 DOI: 10.1038/nature05860] [Citation(s) in RCA: 912] [Impact Index Per Article: 53.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2007] [Accepted: 04/18/2007] [Indexed: 11/08/2022]
Abstract
Midbrain dopamine neurons are key components of the brain's reward system, which is thought to guide reward-seeking behaviours. Although recent studies have shown how dopamine neurons respond to rewards and sensory stimuli predicting reward, it is unclear which parts of the brain provide dopamine neurons with signals necessary for these actions. Here we show that the primate lateral habenula, part of the structure called the epithalamus, is a major candidate for a source of negative reward-related signals in dopamine neurons. We recorded the activity of habenula neurons and dopamine neurons while rhesus monkeys were performing a visually guided saccade task with positionally biased reward outcomes. Many habenula neurons were excited by a no-reward-predicting target and inhibited by a reward-predicting target. In contrast, dopamine neurons were excited and inhibited by reward-predicting and no-reward-predicting targets, respectively. Each time the rewarded and unrewarded positions were reversed, both habenula and dopamine neurons reversed their responses as the bias in saccade latency reversed. In unrewarded trials, the excitation of habenula neurons started earlier than the inhibition of dopamine neurons. Furthermore, weak electrical stimulation of the lateral habenula elicited strong inhibitions in dopamine neurons. These results suggest that the inhibitory input from the lateral habenula plays an important role in determining the reward-related activity of dopamine neurons.
Affiliation(s)
- Masayuki Matsumoto
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, Maryland 20892-4435, USA
41
Gdowski MJ, Miller LE, Bastianen CA, Nenonene EK, Houk JC. Signaling patterns of globus pallidus internal segment neurons during forearm rotation. Brain Res 2007; 1155:56-69. [PMID: 17499221 PMCID: PMC1989114 DOI: 10.1016/j.brainres.2007.04.028] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2007] [Revised: 04/10/2007] [Accepted: 04/11/2007] [Indexed: 01/07/2023]
Abstract
We recorded extracellular single unit discharges of globus pallidus internal segment (GPi) neurons in monkeys performing a visually driven forearm rotation movement task in order to quantify how discharge patterns changed in relation to kinematic parameters. Subjects grasped a handle that rotated about its axis while facing a video screen displaying visual targets. Continuous visual feedback of handle rotation position was provided. Monkeys generated forearm rotation movements of +/-35 degrees and +/-70 degrees amplitude in order to align the cursor and targets. Trial records were aligned to forearm rotation onset in order to compare the discharge patterns that were associated with movements of different amplitudes, velocities, and directions. In addition, we quantified the depth of modulation of neuronal discharge associated with movements generated in two different task phases. Comparisons of discharge patterns were made between the visually guided, rewarded phase ("cued movements") and the self-paced, unrewarded phase that returned the monkey to the task start position ("return movements") by quantifying the goodness of fit between neuronal discharge during cued and return movements. Our analyses revealed no systematic relationship between the depth of modulation of GPi neurons and forearm rotation amplitude, direction, or velocity. Furthermore, comparisons between the two behavioral contexts revealed a systematic attenuation of modulation that could not be attributed to differences in movement velocity. Collectively, these findings suggest that the GPi neurons that we studied were not significantly involved in mediating movement kinematics, but may have instead been instrumental in the processing of information about the behavioral context during which movements were generated.
Affiliation(s)
- Martha Johnson Gdowski
- Department of Neurobiology and Anatomy, University of Rochester, Rochester, NY 14642, USA.
42
Abstract
Expectation of reward facilitates motor behaviors that enable the animal to approach a location in space where the reward is expected. It is now known that the same expectation of reward profoundly modifies sensory, motor, and cognitive information processing in the brain. However, it is still unclear which brain regions are responsible for causing the reward-approaching behavior. One candidate is the dorsal striatum where cortical and dopaminergic inputs converge. We tested this hypothesis by injecting dopamine antagonists into the caudate nucleus (CD) while the monkey was performing a saccade task with a position-dependent asymmetric reward schedule. We previously had shown that: (1) serial GABAergic connections from the CD to the superior colliculus (SC) via the substantia nigra pars reticulata (SNr) exert powerful control over the initiation of saccadic eye movement and (2) these GABAergic neurons encode target position and are strongly influenced by expected reward, while dopaminergic neurons in the substantia nigra pars compacta (SNc) encode only reward-related information. Before injections of dopamine antagonists the latencies of saccades to a given target were shorter when the saccades were followed by a large reward than when they were followed by a small reward. After injections of dopamine D1 receptor antagonist the reward-dependent latency bias became smaller. This was due to an increase in saccade latency on large-reward trials. After injections of D2 antagonist the latency bias became larger, largely due to an increase in saccade latency on small-reward trials. These results indicate that: (1) dopamine-dependent information processing in the CD is necessary for the reward-dependent modulation of saccadic eye movement and (2) D1 and D2 receptors play differential roles depending on the positive and negative reward outcomes.
Affiliation(s)
- Okihide Hikosaka
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD 20892-4435, USA.
43
Nicola SM. The nucleus accumbens as part of a basal ganglia action selection circuit. Psychopharmacology (Berl) 2007; 191:521-50. [PMID: 16983543 DOI: 10.1007/s00213-006-0510-4] [Citation(s) in RCA: 244] [Impact Index Per Article: 14.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/04/2006] [Accepted: 07/03/2006] [Indexed: 02/03/2023]
Abstract
BACKGROUND The nucleus accumbens is the ventral extent of the striatum, the main input nucleus of the basal ganglia. Recent hypotheses propose that the accumbens and its dopamine projection from the midbrain contribute to appetitive behaviors required to obtain reward. However, the specific nature of this contribution is unclear. In contrast, significant advances have been made in understanding the role of the dorsal striatum in action selection and decision making. OBJECTIVE In order to develop a hypothesis of the role of nucleus accumbens dopamine in action selection, the physiology and behavioral pharmacology of the nucleus accumbens are compared to those of the dorsal striatum. HYPOTHESES Three hypotheses concerning the role of dopamine in these structures are proposed: (1) that dopamine release in the dorsal striatum serves to facilitate the ability to respond appropriately to temporally predictable stimuli (that is, stimuli that are so predictable that animals engage in anticipatory behavior just prior to the stimulus); (2) that dopamine in the nucleus accumbens facilitates the ability to respond to temporally unpredictable stimuli (which require interruption of ongoing behavior); and (3) that accumbens neurons participate in action selection in response to such stimuli by virtue of their direct (monosynaptic inhibitory) and indirect (polysynaptic excitatory) projections to basal ganglia output nuclei.
Affiliation(s)
- Saleem M Nicola
- Ernest Gallo Clinic and Research Center, University of California, San Francisco, 5858 Horton St., Ste. 200, Emeryville, CA 94608, USA.
44
Kawato M, Samejima K. Efficient reinforcement learning: computational theories, neuroscience and robotics. Curr Opin Neurobiol 2007; 17:205-12. [PMID: 17374483 DOI: 10.1016/j.conb.2007.03.004] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2007] [Accepted: 03/08/2007] [Indexed: 11/22/2022]
Abstract
Reinforcement learning algorithms have provided some of the most influential computational theories for behavioral learning that depends on reward and penalty. After briefly reviewing supporting experimental data, this paper tackles three difficult theoretical issues that remain to be explored. First, plain reinforcement learning is much too slow to be considered a plausible brain model. Second, although the temporal-difference error has an important role both in theory and in experiments, how to compute it remains an enigma. Third, the function of all brain areas, including the cerebral cortex, cerebellum, brainstem and basal ganglia, seems to necessitate a new computational framework. Computational studies that emphasize meta-parameters, hierarchy, modularity and supervised learning to resolve these issues are reviewed here, together with the related experimental data.
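The temporal-difference error mentioned in this abstract is conventionally written as δ = r + γV(s') − V(s). A minimal TD(0) sketch follows; the two-state chain and the numeric values are illustrative assumptions, not taken from the paper.

```python
# Hedged TD(0) sketch of the temporal-difference error (illustrative
# example; states, rewards, and parameters are hypothetical).

def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) update of the value table V; returns the TD error delta."""
    delta = r + gamma * V[s_next] - V[s]  # temporal-difference error
    V[s] += alpha * delta                 # move V[s] toward its target
    return delta

# Three states: 0 -> 1 -> 2 (terminal), with reward 1.0 on entering state 2.
V = [0.0, 0.0, 0.0]
for _ in range(500):
    td_update(V, 0, 0.0, 1)
    td_update(V, 1, 1.0, 2)

# V[1] converges toward 1.0 and V[0] toward gamma * V[1] = 0.9; once the
# predictions are accurate, the TD error at each step approaches zero.
```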
Affiliation(s)
- Mitsuo Kawato
- ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan.
45
Horvitz JC, Choi WY, Morvan C, Eyny Y, Balsam PD. A "good parent" function of dopamine: transient modulation of learning and performance during early stages of training. Ann N Y Acad Sci 2007; 1104:270-88. [PMID: 17360799 PMCID: PMC2827849 DOI: 10.1196/annals.1390.017] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
While extracellular dopamine (DA) concentrations are increased by a wide category of salient stimuli, there is evidence to suggest that DA responses to primary and conditioned rewards may be distinct from those elicited by other types of salient events. A reward-specific mode of neuronal responding would be necessary if DA acts to strengthen behavioral response tendencies under particular environmental conditions or to set current environmental inputs as goals that direct approach responses. As described in this review, DA critically mediates both the acquisition and expression of learned behaviors during early stages of training; however, during later stages, at least some forms of learned behavior become independent of (or less dependent upon) DA transmission for their expression.
Affiliation(s)
- Jon C Horvitz
- Department of Psychology, Boston College, Chestnut Hill, MA 02467, USA.
46
Redgrave P, Gurney K. The short-latency dopamine signal: a role in discovering novel actions? Nat Rev Neurosci 2006; 7:967-75. [PMID: 17115078 DOI: 10.1038/nrn2022] [Citation(s) in RCA: 434] [Impact Index Per Article: 24.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
Abstract
An influential concept in contemporary computational neuroscience is the reward prediction error hypothesis of phasic dopaminergic function. It maintains that midbrain dopaminergic neurons signal the occurrence of unpredicted reward, which is used in appetitive learning to reinforce existing actions that most often lead to reward. However, the availability of limited afferent sensory processing and the precise timing of dopaminergic signals suggest that they might instead have a central role in identifying which aspects of context and behavioural output are crucial in causing unpredicted events.
Affiliation(s)
- Peter Redgrave
- Neuroscience Research Unit, Department of Psychology, University of Sheffield, Sheffield, S10 2TP, UK.
47
Haruno M, Kawato M. Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. Neural Netw 2006; 19:1242-54. [PMID: 16987637 DOI: 10.1016/j.neunet.2006.06.007] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2005] [Accepted: 06/01/2006] [Indexed: 11/28/2022]
Abstract
The brain's most difficult computation in decision-making learning is searching for essential information related to rewards among vast multimodal inputs and then integrating it into beneficial behaviors. Contextual cues consisting of limbic, cognitive, visual, auditory, somatosensory, and motor signals need to be associated with both rewards and actions by utilizing an internal representation such as reward prediction and reward prediction error. Previous studies have suggested that a suitable brain structure for such integration is the neural circuitry associated with multiple cortico-striatal loops. However, how the information in and around these multiple closed loops is shared and transferred remains to be explored computationally. Here, we propose a "heterarchical reinforcement learning" model, where reward prediction made by more limbic and cognitive loops is propagated to motor loops by spiral projections between the striatum and substantia nigra, assisted by cortical projections to the pedunculopontine tegmental nucleus, which sends excitatory input to the substantia nigra. The model makes several fMRI-testable predictions of brain activity during stimulus-action-reward association learning. The caudate nucleus and the cognitive cortical areas are correlated with reward prediction error, while the putamen and motor-related areas are correlated with stimulus-action-dependent reward prediction. Furthermore, a heterogeneous activity pattern within the striatum is predicted depending on learning difficulty, i.e., the anterior medial caudate nucleus will be correlated more with reward prediction error when learning becomes difficult, while the posterior putamen will be correlated more with stimulus-action-dependent reward prediction in easy learning. Our fMRI results revealed that different cortico-striatal loops are operating, as suggested by the proposed model.
Affiliation(s)
- Masahiko Haruno
- ATR Computational Neuroscience Laboratories, Department of Computational Neurobiology, 2-2-2 Hikaridai, Soraku-gun, Kyoto, Japan.
48
Ravel S, Richmond BJ. Dopamine neuronal responses in monkeys performing visually cued reward schedules. Eur J Neurosci 2006; 24:277-90. [PMID: 16882024 DOI: 10.1111/j.1460-9568.2006.04905.x] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Dopamine neurons are important for reward-related behaviours. They have been recorded during classical conditioning and operant tasks with stochastic reward delivery. However, daily behaviour, although frequently complex in the number of steps, is often very predictable. We studied the responses of 75 dopamine neurons during schedules of trials in which the events and related reward contingencies could be well-predicted, within and across trials. In this visually cued reward schedule task, a visual cue tells the monkeys exactly how many trials, 1, 2, 3, or 4, must be performed to obtain a reward. The number of errors became larger as the number of trials remaining before the reward increased. Dopamine neurons frequently responded to the cues at the beginning and end of the schedules. Approximately 75% of the first-cue responsive neurons did not distinguish among the schedules that were beginning even though the cues were different. Approximately half of the last-cue responsive neurons depended on which schedule was ending, even though the cue signalling the last trial was the same in all schedules. Thus, the responses were related to what the monkey knew about the relation between the cues and the schedules, not the identity of the cues. These neurons also frequently responded to the go signal and/or to the OK signal indicating the end of a correctly performed trial whether a reward was forthcoming or not, and to the reward itself. Thus, dopamine neurons seem to respond to behaviourally important, i.e. salient, events even when the events have been well-predicted.
Affiliation(s)
- Sabrina Ravel
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Department of Health and Human Services, Bldg 49, Rm 1B80, Bethesda, MD 20892, USA
49
Abstract
Expectation of reward motivates our behaviors and influences our decisions. Indeed, neuronal activity in many brain areas is modulated by expected reward. However, it is still unclear where and how the reward-dependent modulation of neuronal activity occurs and how the reward-modulated signal is transformed into motor outputs. Recent studies suggest an important role of the basal ganglia. Sensorimotor/cognitive activities of neurons in the basal ganglia are strongly modulated by expected reward. Through their abundant outputs to the brain stem motor areas and the thalamocortical circuits, the basal ganglia appear capable of producing body movements based on expected reward. A good behavioral measure to test this hypothesis is saccadic eye movement because its brain stem mechanism has been extensively studied. Studies from our laboratory suggest that the basal ganglia play a key role in guiding the gaze to the location where reward is available. Neurons in the caudate nucleus and the substantia nigra pars reticulata are extremely sensitive to the positional difference in expected reward, which leads to a bias in excitability between the superior colliculi such that the saccade to the to-be-rewarded position occurs more quickly. It is suggested that the reward modulation occurs in the caudate where cortical inputs carrying spatial signals and dopaminergic inputs carrying reward-related signals are integrated. These data support a specific form of reinforcement learning theories, but also suggest further refinement of the theory.
Affiliation(s)
- Okihide Hikosaka
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA.
50
Haruno M, Kawato M. Different Neural Correlates of Reward Expectation and Reward Expectation Error in the Putamen and Caudate Nucleus During Stimulus-Action-Reward Association Learning. J Neurophysiol 2006; 95:948-59. [PMID: 16192338 DOI: 10.1152/jn.00382.2005] [Citation(s) in RCA: 309] [Impact Index Per Article: 17.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
To select appropriate behaviors leading to rewards, the brain needs to learn associations among sensory stimuli, selected behaviors, and rewards. Recent imaging and neural-recording studies have revealed that the dorsal striatum plays an important role in learning such stimulus-action-reward associations. However, the putamen and caudate nucleus are embedded in distinct cortico-striatal loop circuits, predominantly connected to motor-related cerebral cortical areas and frontal association areas, respectively. This difference in their cortical connections suggests that the putamen and caudate nucleus are engaged in different functional aspects of stimulus-action-reward association learning. To determine whether this is the case, we conducted an event-related and computational model–based functional MRI (fMRI) study with a stochastic decision-making task in which a stimulus-action-reward association must be learned. A simple reinforcement learning model not only reproduced the subject's action selections reasonably well but also allowed us to quantitatively estimate each subject's temporal profiles of stimulus-action-reward association and reward-prediction error during learning trials. These two internal representations were used in the fMRI correlation analysis. The results revealed that neural correlates of the stimulus-action-reward association reside in the putamen, whereas a correlation with reward-prediction error was found largely in the caudate nucleus and ventral striatum. These nonuniform spatiotemporal distributions of neural correlates within the dorsal striatum were maintained consistently at various levels of task difficulty, suggesting a functional difference in the dorsal striatum between the putamen and caudate nucleus during stimulus-action-reward association learning.
Affiliation(s)
- Masahiko Haruno
- Department of Cognitive Neuroscience Computational Neuroscience Labs, Advanced Telecommunication Research Institute, Sorakugun, Kyoto 619-0288, Japan.