201
Kaveri S, Nakahara H. Dual reward prediction components yield Pavlovian sign- and goal-tracking. PLoS One 2014; 9:e108142. PMID: 25310184; PMCID: PMC4195585; DOI: 10.1371/journal.pone.0108142.
Abstract
Reinforcement learning (RL) has become a dominant paradigm for understanding animal behaviors and neural correlates of decision-making, in part because of its ability to explain Pavlovian conditioned behaviors and the role of midbrain dopamine activity as reward prediction error (RPE). However, recent experimental findings indicate that dopamine activity, contrary to the RL hypothesis, may not signal RPE and differs based on the type of Pavlovian response (e.g. sign- and goal-tracking responses). In this study, we address this discrepancy by introducing a new neural correlate for learning reward predictions, called "cue-evoked reward": a recall of reward evoked by the cue that is learned through simple cue-reward associations. We introduce a temporal difference learning model in which neural correlates of the cue itself and of cue-evoked reward underlie learning of reward predictions. The animal's reward prediction supported by these two correlates is divided into sign and goal components, which we relate to approach responses towards the cue (i.e. sign-tracking) and the food tray (i.e. goal-tracking), respectively. We found a number of correspondences between the simulated models and the experimental findings (i.e. behavior and neural responses). First, the development of the modeled responses is consistent with that observed in the experimental task. Second, the models' RPEs were similar to dopamine activity in the respective response groups. Finally, goal-tracking, but not sign-tracking, responses rapidly re-emerged when RPE was restored in the simulated models, similar to experiments on recovery from a dopamine antagonist. These results suggest that two complementary neural correlates, corresponding to the cue and its evoked reward, form the basis for learning reward predictions in sign- and goal-tracking rats.
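The two-component prediction described in this abstract can be sketched in a few lines. This is an illustrative reconstruction, not the authors' published code: the single terminal reward, the shared RPE, and the learning rates alpha_sign/alpha_goal are all assumptions; biasing learning toward one component shifts the prediction toward cue-directed (sign) or reward-recall-directed (goal) value.

```python
# Illustrative sketch of a two-component temporal difference (TD) model.
# The reward prediction at cue onset is the sum of a "sign" component tied
# to the cue itself and a "goal" component tied to cue-evoked reward recall;
# both components are trained by one shared reward prediction error (RPE).

def simulate(alpha_sign, alpha_goal, n_trials=200, reward=1.0):
    w_sign = 0.0  # prediction carried by the cue representation
    w_goal = 0.0  # prediction carried by the recalled reward
    rpes = []
    for _ in range(n_trials):
        v_cue = w_sign + w_goal      # total prediction at cue onset
        rpe = reward - v_cue         # reward delivery ends the trial
        w_sign += alpha_sign * rpe   # both components share the same RPE
        w_goal += alpha_goal * rpe
        rpes.append(rpe)
    return w_sign, w_goal, rpes

# Fast cue-driven learning yields a "sign-tracker"-like split of the
# prediction; fast reward-recall learning yields a "goal-tracker"-like split.
s_sign, s_goal, s_rpes = simulate(alpha_sign=0.2, alpha_goal=0.02)
g_sign, g_goal, g_rpes = simulate(alpha_sign=0.02, alpha_goal=0.2)
```

In both cases the total prediction converges to the reward magnitude while the RPE decays, so the two simulated phenotypes differ in where the prediction is carried rather than in what is predicted.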
Affiliation(s)
- Sivaramakrishnan Kaveri
- Lab for Integrated Theoretical Neuroscience, RIKEN BSI, Wako, Japan
- Dept. of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, Japan
202
Gruber MJ, Gelman BD, Ranganath C. States of curiosity modulate hippocampus-dependent learning via the dopaminergic circuit. Neuron 2014; 84:486-96. PMID: 25284006; DOI: 10.1016/j.neuron.2014.08.060.
Abstract
People find it easier to learn about topics that interest them, but little is known about the mechanisms by which intrinsic motivational states affect learning. We used functional magnetic resonance imaging to investigate how curiosity (intrinsic motivation to learn) influences memory. In both immediate and one-day-delayed memory tests, participants showed improved memory for information that they were curious about and for incidental material learned during states of high curiosity. Functional magnetic resonance imaging results revealed that activity in the midbrain and the nucleus accumbens was enhanced during states of high curiosity. Importantly, individual variability in curiosity-driven memory benefits for incidental material was supported by anticipatory activity in the midbrain and hippocampus and by functional connectivity between these regions. These findings suggest a link between the mechanisms supporting extrinsic reward motivation and intrinsic curiosity and highlight the importance of stimulating curiosity to create more effective learning experiences.
Affiliation(s)
- Matthias J Gruber
- Center for Neuroscience, University of California at Davis, Davis, CA 95618, USA.
- Bernard D Gelman
- Center for Neuroscience, University of California at Davis, Davis, CA 95618, USA
- Charan Ranganath
- Center for Neuroscience, University of California at Davis, Davis, CA 95618, USA; Department of Psychology, University of California at Davis, Davis, CA 95616, USA
203
Albrecht K, Abeler J, Weber B, Falk A. The brain correlates of the effects of monetary and verbal rewards on intrinsic motivation. Front Neurosci 2014; 8:303. PMID: 25278834; PMCID: PMC4166960; DOI: 10.3389/fnins.2014.00303.
Abstract
Apart from everyday duties, such as doing the laundry or cleaning the house, there are tasks we do for pleasure and enjoyment. We do such tasks, like solving crossword puzzles or reading novels, without any external pressure or force; instead, we are intrinsically motivated: we do the tasks because we enjoy doing them. Previous studies suggest that external rewards, i.e., rewards from the outside, affect the intrinsic motivation to engage in a task: while performance-based monetary rewards are perceived as controlling and induce a business-contract framing, verbal rewards praising one's competence can enhance perceived self-determination. Accordingly, the former have been shown to decrease intrinsic motivation, whereas the latter have been shown to increase it. The present study investigated the neural processes underlying the effects of monetary and verbal rewards on intrinsic motivation using functional magnetic resonance imaging (fMRI) in a group of 64 subjects. We found that, when participants received positive performance feedback, activation in the anterior striatum and midbrain was affected by the nature of the reward: compared to a non-rewarded control group, activation was higher while monetary rewards were administered. However, we did not find a decrease in activation after reward withdrawal. In contrast, we found an increase in activation for verbal rewards: after verbal rewards had been withdrawn, participants showed higher activation in the aforementioned brain areas when they received success compared to failure feedback. We further found that, while participants worked on the task, activation in the lateral prefrontal cortex was enhanced after the verbal rewards were administered and withdrawn.
Affiliation(s)
- Konstanze Albrecht
- Department of Education, Cognition, and Communication, Institute of Psychology, RWTH Aachen University, Aachen, Germany
- Bernd Weber
- Center for Economics and Neuroscience, University of Bonn, Bonn, Germany; Department of Epileptology, University Hospital of Bonn, Bonn, Germany
- Armin Falk
- Center for Economics and Neuroscience, University of Bonn, Bonn, Germany; Department of Economics, University of Bonn, Bonn, Germany
204
Harb MR, Sousa N, Zihl J, Almeida OFX. Reward components of feeding behavior are preserved during mouse aging. Front Aging Neurosci 2014; 6:242. PMID: 25278876; PMCID: PMC4165288; DOI: 10.3389/fnagi.2014.00242.
Abstract
Eating behavior depends on associations between the sensory and energetic properties of foods. Healthful balance of these factors is a challenge for industrialized societies that have an abundance of food, food choices and food-related cues. Here, we were interested in whether appetitive conditioning changes as a function of age. Operant and pavlovian conditioning experiments (rewarding stimulus was a palatable food) in male mice (aged 3, 6, and 15 months) showed that implicit (non-declarative) memory remains intact during aging. Two other essential components of eating behavior, motivation and hedonic preference for rewarding foods, were also found not to be altered in aging mice. Specifically, hedonic responding by satiated mice to isocaloric foods of differing sensory properties (sucrose, milk) was similar in all age groups; importantly, however, this paradigm disclosed that older animals adjust their energy intake according to energetic need. Based on the assumption that the mechanisms that control feeding are conserved across species, it would appear that overeating and obesity in humans reflect a mismatch between ancient physiological mechanisms and today's cue-laden environment. The implication of the present results showing that aging does not impair the ability to learn stimulus-food associations is that the risk of overeating in response to food cues is maintained through to old age.
Affiliation(s)
- Mazen R Harb
- Max Planck Institute of Psychiatry, Munich, Germany; ICVS/3B's-PT Government Associate Laboratory and Institute of Life and Health Sciences (ICVS), University of Minho, Braga, Portugal
- Nuno Sousa
- ICVS/3B's-PT Government Associate Laboratory and Institute of Life and Health Sciences (ICVS), University of Minho, Braga, Portugal
- Joseph Zihl
- Department of Neuropsychology, Ludwig Maximilian University, Munich, Germany
205
Turi Z, Mittner M, Opitz A, Popkes M, Paulus W, Antal A. Transcranial direct current stimulation over the left prefrontal cortex increases randomness of choice in instrumental learning. Cortex 2014; 63:145-54. PMID: 25282053; DOI: 10.1016/j.cortex.2014.08.026.
Abstract
INTRODUCTION: There is growing evidence from neuro-computational studies that instrumental learning involves the dynamic interaction of a computationally rigid, low-level striatal component and a more flexible, high-level prefrontal component.
METHODS: To evaluate the role of the prefrontal cortex in instrumental learning, we applied anodal transcranial direct current stimulation (tDCS) optimized for the left dorsolateral prefrontal cortex, using realistic MR-derived finite element model-based electric field simulations. In a study with a double-blind, sham-controlled, repeated-measures design, sixteen male participants performed a probabilistic learning task while receiving anodal and sham tDCS in a counterbalanced order.
RESULTS: Compared to sham tDCS, anodal tDCS significantly increased the amount of maladaptive shifting behavior after optimal outcomes during learning when reward probabilities were highly dissociable. Parameters derived from a Q-learning computational model further revealed a significant increase, in the anodal relative to the sham tDCS session, in the model parameter sensitive to random action selection, whereas the learning rate parameter was not significantly influenced by tDCS.
CONCLUSION: These results indicate that prefrontal tDCS during instrumental learning increased randomness of choice, possibly reflecting the influence of the cognitive prefrontal component.
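The two Q-learning parameters contrasted in this abstract can be illustrated with a small simulation. This is a sketch, not the study's fitted model: the random-selection parameter is modeled here as a softmax inverse temperature (beta), and the 80%/20% reward probabilities and all parameter values are assumptions.

```python
import math
import random

# Q-learning on a two-option probabilistic task. alpha is the learning rate;
# beta is the softmax inverse temperature -- lower beta means more random
# action selection, which produces more shifting away from an option even
# after it just paid off.

def q_learning_run(alpha, beta, p_reward=(0.8, 0.2), n_trials=500, seed=0):
    rng = random.Random(seed)
    q = [0.0, 0.0]
    shifts_after_win = 0   # switches of action right after a rewarded trial
    wins = 0
    prev_choice, prev_win = None, False
    for _ in range(n_trials):
        # Two-option softmax: P(choose 0) = 1 / (1 + exp(-beta * (q0 - q1)))
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p0 else 1
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        if prev_win:
            wins += 1
            if choice != prev_choice:
                shifts_after_win += 1  # "maladaptive" shift after a good outcome
        q[choice] += alpha * (reward - q[choice])
        prev_choice, prev_win = choice, reward > 0
    return shifts_after_win / max(wins, 1)

low_noise = q_learning_run(alpha=0.3, beta=8.0)   # near-deterministic chooser
high_noise = q_learning_run(alpha=0.3, beta=1.0)  # more random action selection
```

With the same learning rate, lowering beta alone raises the rate of shifting after rewarded choices, mirroring the dissociation the authors report between the randomness parameter and the learning rate.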
Affiliation(s)
- Zsolt Turi
- Department of Clinical Neurophysiology, University Medical Center, Georg-August University, Göttingen, Germany
- Alexander Opitz
- Department of Clinical Neurophysiology, University Medical Center, Georg-August University, Göttingen, Germany
- Miriam Popkes
- Department of Clinical Neurophysiology, University Medical Center, Georg-August University, Göttingen, Germany
- Walter Paulus
- Department of Clinical Neurophysiology, University Medical Center, Georg-August University, Göttingen, Germany
- Andrea Antal
- Department of Clinical Neurophysiology, University Medical Center, Georg-August University, Göttingen, Germany
206
Abstract
Wilson et al. draw our attention to the problem of a science of intentional change. We stress the connection between their approach and existing paradigms for learning and goal generation that have been developed in machine learning, artificial intelligence, and psychology. These paradigms outline the structural principles of a domain-general and teleologically open agent.
207
Myers CE, Smith IM, Servatius RJ, Beck KD. Absence of "Warm-Up" during Active Avoidance Learning in a Rat Model of Anxiety Vulnerability: Insights from Computational Modeling. Front Behav Neurosci 2014; 8:283. PMID: 25183956; PMCID: PMC4135546; DOI: 10.3389/fnbeh.2014.00283.
Abstract
Avoidance behaviors, in which a learned response causes omission of an upcoming punisher, are a core feature of many psychiatric disorders. While reinforcement learning (RL) models have been widely used to study the development of appetitive behaviors, less attention has been paid to avoidance. Here, we present a RL model of lever-press avoidance learning in Sprague-Dawley (SD) rats and in the inbred Wistar Kyoto (WKY) rat, which has been proposed as a model of anxiety vulnerability. We focus on “warm-up,” transiently decreased avoidance responding at the start of a testing session, which is shown by SD but not WKY rats. We first show that a RL model can correctly simulate key aspects of acquisition, extinction, and warm-up in SD rats; we then show that WKY behavior can be simulated by altering three model parameters, which respectively govern the tendency to explore new behaviors vs. exploit previously reinforced ones, the tendency to repeat previous behaviors regardless of reinforcement, and the learning rate for predicting future outcomes. This suggests that several, dissociable mechanisms may contribute independently to strain differences in behavior. The model predicts that, if the “standard” inter-session interval is shortened from 48 to 24 h, SD rats (but not WKY) will continue to show warm-up; we confirm this prediction in an empirical study with SD and WKY rats. The model further predicts that SD rats will continue to show warm-up with inter-session intervals as short as a few minutes, while WKY rats will not show warm-up, even with inter-session intervals as long as a month. Together, the modeling and empirical data indicate that strain differences in warm-up are qualitative rather than just the result of differential sensitivity to task variables. Understanding the mechanisms that govern expression of warm-up behavior in avoidance may lead to better understanding of pathological avoidance, and potential pathways to modify these processes.
Affiliation(s)
- Catherine E Myers
- Department of Veterans Affairs, VA New Jersey Health Care System, East Orange, NJ, USA; Stress and Motivated Behavior Institute, Department of Neurology and Neurosciences, New Jersey Medical School, Rutgers, The State University of New Jersey, Newark, NJ, USA
- Ian M Smith
- Department of Veterans Affairs, VA New Jersey Health Care System, East Orange, NJ, USA
- Richard J Servatius
- Department of Veterans Affairs, VA New Jersey Health Care System, East Orange, NJ, USA; Stress and Motivated Behavior Institute, Department of Neurology and Neurosciences, New Jersey Medical School, Rutgers, The State University of New Jersey, Newark, NJ, USA
- Kevin D Beck
- Department of Veterans Affairs, VA New Jersey Health Care System, East Orange, NJ, USA; Stress and Motivated Behavior Institute, Department of Neurology and Neurosciences, New Jersey Medical School, Rutgers, The State University of New Jersey, Newark, NJ, USA
208
Jung K, Jang H, Kralik JD, Jeong J. Bursts and heavy tails in temporal and sequential dynamics of foraging decisions. PLoS Comput Biol 2014; 10:e1003759. PMID: 25122498; PMCID: PMC4133158; DOI: 10.1371/journal.pcbi.1003759.
Abstract
A fundamental understanding of behavior requires predicting when and what an individual will choose. However, the actual temporal and sequential dynamics of successive choices made among multiple alternatives remain unclear. In the current study, we tested the hypothesis that there is a general bursting property in both the timing and sequential patterns of foraging decisions. We conducted a foraging experiment in which rats chose among four different foods over a continuous two-week period. Regarding when choices were made, we found bursts of rapidly occurring actions, separated by time-varying inactive periods, partially based on a circadian rhythm. Regarding what was chosen, we found sequential dynamics in affective choices characterized by two key features: (a) a highly biased choice distribution, skewed toward the favorite items; and (b) preferential attachment, in which the animals were more likely to choose what they had previously chosen. To capture the temporal dynamics, we propose a dual-state model consisting of active and inactive states, with a satiation-attainment process for bursty activity and a non-homogeneous Poisson process for the longer inactivity between bursts. For the sequential dynamics, we propose a dual-control model consisting of goal-directed and habit systems, based on outcome valuation and choice history, respectively. This study provides insights into how the bursty nature of behavior emerges from the interaction of different underlying systems, leading to heavy tails in the distribution of behavior over time and choices, and offers an integrated framework for describing the temporal and sequential structure of everyday choices among, for example, food, music, books, brands, web-browsing and social interaction.
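The sequential half of this account, biased choices plus preferential attachment, can be sketched with an assumed functional form (not the authors' fitted model): choice probability mixes a goal-directed term based on fixed outcome values with a habit term proportional to past choice counts. The mixing weight w_habit and the food values are illustrative assumptions.

```python
import random

# Dual-control choice sketch: goal-directed term (outcome values) mixed with
# a habit term (past choice counts, i.e. preferential attachment). The
# rich-get-richer dynamics of the habit term skew the choice distribution
# well beyond what the values alone would produce.

def forage(values, w_habit=0.6, n_trials=2000, seed=1):
    rng = random.Random(seed)
    counts = [1] * len(values)  # choice history drives the habit system
    v_sum = float(sum(values))
    for _ in range(n_trials):
        c_sum = float(sum(counts))
        probs = [(1 - w_habit) * v / v_sum + w_habit * c / c_sum
                 for v, c in zip(values, counts)]
        r, acc, choice = rng.random(), 0.0, 0
        for i, p in enumerate(probs):
            acc += p
            if r < acc:
                choice = i
                break
        counts[choice] += 1  # past choices beget future ones
    return sorted(counts, reverse=True)

# Four foods with unequal values yield a heavily biased choice distribution.
choice_counts = forage(values=[4.0, 2.0, 1.0, 1.0])
```

Because each choice feeds back into the habit term, early preferences are amplified over the session, which is one simple route to the heavy-tailed choice distributions the abstract describes.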
Affiliation(s)
- Kanghoon Jung
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Hyeran Jang
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
- Jerald D. Kralik
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Jaeseung Jeong
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
209
Impulse control disorders in Parkinson's disease are associated with dysfunction in stimulus valuation but not action valuation. J Neurosci 2014; 34:7814-24. PMID: 24899705; DOI: 10.1523/jneurosci.4063-13.2014.
Abstract
A substantial subset of Parkinson's disease (PD) patients suffers from impulse control disorders (ICDs), which are side effects of dopaminergic medication. Dopamine plays a key role in reinforcement learning processes. One class of reinforcement learning models, known as the actor-critic model, suggests that two components are involved in these reinforcement learning processes: a critic, which estimates values of stimuli and calculates prediction errors, and an actor, which estimates values of potential actions. To understand the information processing mechanism underlying impulsive behavior, we investigated stimulus and action value learning from reward and punishment in four groups of participants: on-medication PD patients with ICD, on-medication PD patients without ICD, off-medication PD patients without ICD, and healthy controls. Analysis of responses suggested that participants used an actor-critic learning strategy and computed prediction errors based on stimulus values rather than action values. Quantitative model fits also revealed that an actor-critic model of the basal ganglia with different learning rates for positive and negative prediction errors best matched the choice data. Moreover, whereas ICDs were associated with model parameters related to stimulus valuation (critic), PD was associated with parameters related to action valuation (actor). Specifically, PD patients with ICD exhibited lower learning from negative prediction errors in the critic, resulting in an underestimation of adverse consequences associated with stimuli. These findings offer a specific neurocomputational account of the nature of compulsive behaviors induced by dopaminergic drugs.
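The critic update with separate learning rates for positive and negative prediction errors can be sketched directly. This is a minimal illustration, not the paper's fitted actor-critic model: the alternating reward/punishment stream and all parameter values are assumptions; a low alpha_neg (as reported for PD patients with ICD) makes the learned stimulus value underweight adverse outcomes.

```python
# Critic with asymmetric learning rates: positive prediction errors are
# learned with alpha_pos, negative prediction errors with alpha_neg.

def critic_value(alpha_pos, alpha_neg, n_trials=1000):
    v = 0.0  # learned stimulus value
    for t in range(n_trials):
        outcome = 1.0 if t % 2 == 0 else -1.0  # alternating reward/punishment
        delta = outcome - v                    # prediction error
        alpha = alpha_pos if delta > 0 else alpha_neg
        v += alpha * delta
    return v

balanced = critic_value(alpha_pos=0.3, alpha_neg=0.3)   # symmetric learning
icd_like = critic_value(alpha_pos=0.3, alpha_neg=0.05)  # blunted negative PE
```

With symmetric rates the value tracks the near-zero mean of the outcome stream, while the blunted-negative-PE critic settles on a clearly positive value for the same stimulus, i.e. it underestimates the adverse consequences.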
210
Laboratory studies of imitation/field studies of tradition: towards a synthesis in animal social learning. Behav Processes 2014; 112:114-9. PMID: 25058622; DOI: 10.1016/j.beproc.2014.07.008.
Abstract
Here I discuss: (1) historical precedents that have resulted in comparative psychologists accepting the two-action method as the "gold standard" in laboratory investigations of imitation learning, (2) evidence suggesting that the two-action procedure may not be adequate to answer questions concerning the role of imitation in the development of traditional behaviors of animals living in natural habitat, and (3) an alternative approach to the laboratory study of imitation that might increase the relevance of laboratory studies of imitation to the work of behavioral ecologists/primatologists interested in animal traditions and their relationship to human cumulative culture. This article is part of a Special Issue entitled: Tribute to Tom Zentall.
211
Apps MAJ, Tsakiris M. Predictive codes of familiarity and context during the perceptual learning of facial identities. Nat Commun 2014; 4:2698. PMID: 24220539; DOI: 10.1038/ncomms3698.
Abstract
Face recognition is a key component of successful social behaviour. However, the computational processes that underpin perceptual learning and recognition as faces transition from unfamiliar to familiar are poorly understood. In predictive coding, learning occurs through prediction errors that update stimulus familiarity, but recognition is a function of both stimulus and contextual familiarity. Here we show that behavioural responses on a two-option face recognition task can be predicted by the level of contextual and facial familiarity in a computational model derived from predictive-coding principles. Using fMRI, we show that activity in the superior temporal sulcus varies with the contextual familiarity in the model, whereas activity in the fusiform face area covaries with the prediction error parameter that updated facial familiarity. Our results characterize the key computations underpinning the perceptual learning of faces, highlighting that the functional properties of face-processing areas conform to the principles of predictive coding.
Collapse
Affiliation(s)
- Matthew A J Apps
- Nuffield Department of Clinical Neuroscience, University of Oxford, Oxford OX3 9DU, UK; Department of Experimental Psychology, University of Oxford, Oxford OX1 3UD, UK; Laboratory of Action and Body, Department of Psychology, Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK
212
Selbing I, Lindström B, Olsson A. Demonstrator skill modulates observational aversive learning. Cognition 2014; 133:128-39. PMID: 25016187; DOI: 10.1016/j.cognition.2014.06.010.
Abstract
Learning to avoid danger by observing others can be relatively safe, because it does not incur the potential costs of individual trial and error. However, information gained through social observation might be less reliable than information gained through individual experiences, underscoring the need to apply observational learning critically. In order for observational learning to be adaptive it should be modulated by the skill of the observed person, the demonstrator. To address this issue, we used a probabilistic two-choice task where participants learned to minimize the number of electric shocks through individual learning and by observing a demonstrator performing the same task. By manipulating the demonstrator's skill we varied how useful the observable information was; the demonstrator either learned the task quickly or did not learn it at all (random choices). To investigate the modulatory effect in detail, the task was performed under three conditions of available observable information; no observable information, observation of choices only, and observation of both the choices and their consequences. As predicted, our results showed that observable information can improve performance compared to individual learning, both when the demonstrator is skilled and unskilled; observation of consequences improved performance for both groups while observation of choices only improved performance for the group observing the skilled demonstrator. Reinforcement learning modeling showed that demonstrator skill modulated observational learning from the demonstrator's choices, but not their consequences, by increasing the degree of imitation over time for the group that observed a fast learner. Our results show that humans can adaptively modulate observational learning in response to the usefulness of observable information.
Affiliation(s)
- Ida Selbing
- Karolinska Institute, Division of Psychology, Nobels väg 9, 171 65 Solna, Sweden; Stockholm Brain Institute, Retzius väg 8, 171 65 Solna, Sweden.
- Björn Lindström
- Karolinska Institute, Division of Psychology, Nobels väg 9, 171 65 Solna, Sweden; Stockholm Brain Institute, Retzius väg 8, 171 65 Solna, Sweden.
- Andreas Olsson
- Karolinska Institute, Division of Psychology, Nobels väg 9, 171 65 Solna, Sweden; Stockholm Brain Institute, Retzius väg 8, 171 65 Solna, Sweden.
213
Jung K, Jang H, Kralik JD, Jeong J. Computational modeling of temporal and sequential dynamics of foraging decisions. BMC Neurosci 2014. PMCID: PMC4125035; DOI: 10.1186/1471-2202-15-s1-p137.
214
Shine JM, Shine R. Delegation to automaticity: the driving force for cognitive evolution? Front Neurosci 2014; 8:90. PMID: 24808820; PMCID: PMC4010745; DOI: 10.3389/fnins.2014.00090.
Abstract
The ability to delegate control over repetitive tasks from higher to lower neural centers may be a fundamental innovation in human cognition. Plausibly, the massive neurocomputational challenges associated with the mastery of balance during the evolution of bipedality in proto-humans provided a strong selective advantage to individuals with brains capable of efficiently transferring tasks in this way. Thus, the shift from quadrupedal to bipedal locomotion may have driven the rapid evolution of distinctive features of human neuronal functioning. We review recent studies of functional neuroanatomy that bear upon this hypothesis, and identify ways to test our ideas.
Affiliation(s)
- J. M. Shine
- Brain and Mind Research Institute, The University of Sydney, Sydney, NSW, Australia
- R. Shine
- School of Biological Sciences, The University of Sydney, Sydney, NSW, Australia
215
Aquili L. The causal role between phasic midbrain dopamine signals and learning. Front Behav Neurosci 2014; 8:139. PMID: 24795588; PMCID: PMC4007013; DOI: 10.3389/fnbeh.2014.00139.
Affiliation(s)
- Luca Aquili
- Department of Psychology, Sunway University, Bandar Sunway, Petaling Jaya, Malaysia
216
Alvares GA, Balleine BW, Guastella AJ. Impairments in goal-directed actions predict treatment response to cognitive-behavioral therapy in social anxiety disorder. PLoS One 2014; 9:e94778. PMID: 24728288; PMCID: PMC3984205; DOI: 10.1371/journal.pone.0094778.
Abstract
Social anxiety disorder is characterized by excessive fear and habitual avoidance of social situations. Decision-making models suggest that patients with anxiety disorders may fail to exhibit goal-directed control over actions. We therefore investigated whether such biases are also associated with social anxiety and examined how this behavior relates to outcomes from cognitive-behavioral therapy. Patients diagnosed with social anxiety and controls completed an instrumental learning task in which two actions were performed to earn food outcomes. After outcome devaluation, in which one outcome was consumed to satiety, participants were re-tested in extinction. As expected, controls were goal-directed, selectively reducing responding on the action that had previously delivered the devalued outcome. Patients with social anxiety, however, exhibited no difference in responding on either action. This loss of the devaluation effect was associated with greater symptom severity and poorer response to therapy. These findings indicate that variation in goal-directed control in social anxiety may represent a behavioral endophenotype and may predict which individuals will respond to learning-based therapies.
Affiliation(s)
- Gail A. Alvares
- Brain & Mind Research Institute, The University of Sydney, Sydney, New South Wales, Australia
- Bernard W. Balleine
- Brain & Mind Research Institute, The University of Sydney, Sydney, New South Wales, Australia
- Adam J. Guastella
- Brain & Mind Research Institute, The University of Sydney, Sydney, New South Wales, Australia
|
217
|
Chumbley JR, Burke CJ, Stephan KE, Friston KJ, Tobler PN, Fehr E. Surprise beyond prediction error. Hum Brain Mapp 2014; 35:4805-14. [PMID: 24700400 PMCID: PMC4312927 DOI: 10.1002/hbm.22513] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2013] [Revised: 02/04/2014] [Accepted: 03/18/2014] [Indexed: 11/18/2022] Open
Abstract
Surprise drives learning. Various neural "prediction error" signals are believed to underpin surprise-based reinforcement learning. Here, we report a surprise signal that reflects reinforcement learning but is neither a signed or unsigned reward prediction error (RPE) nor a signed or unsigned state prediction error (SPE). To exclude these alternatives, we measured surprise responses in the absence of RPE and accounted for a host of potential SPE confounds. This new surprise signal was evident in the ventral striatum, primary sensory cortex, frontal poles, and amygdala. We interpret these findings via a normative model of surprise.
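Normative models of the kind invoked here commonly quantify surprise as Shannon surprise, the negative log-probability of an observation. As a minimal illustration (not the authors' specific model), the quantity can be computed as:

```python
# Shannon surprise: a standard normative measure of how surprising an
# observation is, distinct from reward or state prediction errors.
from math import log2

def shannon_surprise(p):
    """Surprise (in bits) of observing an event assigned probability p."""
    return -log2(p)

# Rarer events are more surprising:
# an event with probability 0.5 yields 1 bit; probability 0.125 yields 3 bits.
```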
Affiliation(s)
- Justin R Chumbley
- Laboratory for Social and Neural Systems Research, University of Zurich, Switzerland
|
218
|
Exploration and learning in capuchin monkeys (Sapajus spp.): the role of action-outcome contingencies. Anim Cogn 2014; 17:1081-8. [PMID: 24638875 DOI: 10.1007/s10071-014-0740-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2013] [Revised: 02/21/2014] [Accepted: 03/03/2014] [Indexed: 10/25/2022]
Abstract
Animals have a strong propensity to explore the environment. Spontaneous exploration has great biological significance, since it allows animals to discover and learn the relation between specific behaviours and their consequences. The role of action-outcome contingency in learning has mainly been investigated in instrumental learning settings and much less in free-exploration contexts. We tested 16 capuchin monkeys (Sapajus spp.) with a mechatronic platform whose complex modules could be manipulated to produce different outcomes. Experimental subjects could manipulate the modules and discover the contingencies between their own specific actions and the outcomes produced (i.e., the opening and lighting of a box). By contrast, Control subjects could operate on the modules, but the outcomes they experienced were those produced by the actions of their paired Experimental subjects (a "yoked-control" paradigm). In the exploration phase, in which no food reward was present, Experimental subjects spent more time on the board and manipulated the modules more than Yoked subjects. Experimental subjects also outperformed Yoked subjects in the subsequent test phase, where success required recalling the effective action so as to open the box, now baited with food. These findings demonstrate that the opportunity to experience action-outcome contingencies in the absence of extrinsic rewards promotes capuchins' exploration and facilitates learning. Thus, this intrinsically motivated learning represents a powerful mechanism for acquiring skills and cognitive competence that the individual can later exploit for adaptive purposes.
|
219
|
Burt SA, Klump KL. Prosocial peer affiliation suppresses genetic influences on non-aggressive antisocial behaviors during childhood. Psychol Med 2014; 44:821-830. [PMID: 23659437 PMCID: PMC3749251 DOI: 10.1017/s0033291713000974] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
BACKGROUND Available research has suggested that affiliation with prosocial peers reduces child and adolescent antisocial behavior. However, the etiologic mechanisms driving this association remain unclear. The current study sought to evaluate whether this association takes the form of a gene-environment interaction (G × E) in which prosocial peer affiliation acts to reduce the consequences of genetic risk for non-aggressive antisocial behavior during childhood. METHOD Our sample consisted of 500 twin pairs aged 6-10 years from the Michigan State University Twin Registry (MSUTR). RESULTS The results robustly support moderation by prosocial peer affiliation. Genetic influences on non-aggressive antisocial behavior were observed to be several times larger in those with lower levels of prosocial peer affiliation than in those with higher levels of prosocial peer affiliation. This pattern of results persisted even after controlling for gene-environment correlations and deviant peer affiliation, and when restricting our analyses to those twins who shared all or nearly all of their friends. CONCLUSIONS Such findings not only suggest that prosocial peer affiliation moderates genetic influences on non-aggressive antisocial behaviors during childhood but also provide support for the theoretical notion that protective environmental experiences may exert their influence by promoting resilience to genetic risk.
Affiliation(s)
- S A Burt
- Department of Psychology, Michigan State University, East Lansing, MI, USA
- K L Klump
- Department of Psychology, Michigan State University, East Lansing, MI, USA
|
220
|
Lesaint F, Sigaud O, Flagel SB, Robinson TE, Khamassi M. Modelling individual differences in the form of Pavlovian conditioned approach responses: a dual learning systems approach with factored representations. PLoS Comput Biol 2014; 10:e1003466. [PMID: 24550719 PMCID: PMC3923662 DOI: 10.1371/journal.pcbi.1003466] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2013] [Accepted: 12/19/2013] [Indexed: 12/04/2022] Open
Abstract
Reinforcement Learning has greatly influenced models of conditioning, providing powerful explanations of acquired behaviour and underlying physiological observations. However, in recent autoshaping experiments in rats, variations in the form of Pavlovian conditioned responses (CRs) and in the associated dopamine activity have called into question the classical hypothesis that phasic dopamine activity corresponds to a reward prediction error-like signal arising from a classical Model-Free system and is necessary for Pavlovian conditioning. Over the course of Pavlovian conditioning using food as the unconditioned stimulus (US), some rats (sign-trackers) come to approach and engage the conditioned stimulus (CS) itself, a lever, more and more avidly, whereas other rats (goal-trackers) learn to approach the location of food delivery upon CS presentation. Importantly, although both sign-trackers and goal-trackers learn the CS-US association equally well, only in sign-trackers does phasic dopamine activity show classical reward prediction error-like bursts. Furthermore, neither the acquisition nor the expression of a goal-tracking CR is dopamine-dependent. Here we present a computational model that can account for such individual variations. We show that a combination of a Model-Based system and a revised Model-Free system can account for the development of distinct CRs in rats. Moreover, we show that revising a classical Model-Free system to process stimuli individually, using factored representations, can explain why classical dopaminergic patterns may be observed in some rats but not others, depending on the CR they develop. In addition, the model can account for other behavioural and pharmacological results obtained using the same, or similar, autoshaping procedures. Finally, the model makes it possible to draw a set of experimental predictions that may be verified in a modified experimental protocol. We suggest that further investigation of factored representations in computational neuroscience studies may be useful.
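The Model-Free component such accounts build on is the temporal-difference reward prediction error. As a rough, hypothetical sketch (not the authors' factored dual-system model), a minimal TD(0) update over a two-step cue-reward trial looks like:

```python
# Minimal TD(0) sketch of a reward prediction error (RPE) signal.
# Illustrative only: the paper's model adds a Model-Based system and
# factored stimulus representations; state names here are hypothetical.
GAMMA = 1.0   # no discounting within a trial
ALPHA = 0.1   # learning rate

def run_trials(n_trials):
    V = {"cue": 0.0, "food_tray": 0.0, "end": 0.0}
    rpes = []
    for _ in range(n_trials):
        # One Pavlovian trial: cue -> food tray (reward delivered) -> end.
        transitions = [("cue", "food_tray", 0.0), ("food_tray", "end", 1.0)]
        for s, s_next, r in transitions:
            delta = r + GAMMA * V[s_next] - V[s]  # reward prediction error
            V[s] += ALPHA * delta
            rpes.append((s, delta))
    return V, rpes

V, rpes = run_trials(500)
# With training, value propagates back to the cue and the RPE at reward
# delivery shrinks, mirroring the classic dopamine burst transfer.
```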
Affiliation(s)
- Florian Lesaint
- Institut des Systèmes Intelligents et de Robotique, UMR 7222, UPMC Univ Paris 06, Paris, France
- Institut des Systèmes Intelligents et de Robotique, UMR 7222, CNRS, Paris, France
- Olivier Sigaud
- Institut des Systèmes Intelligents et de Robotique, UMR 7222, UPMC Univ Paris 06, Paris, France
- Institut des Systèmes Intelligents et de Robotique, UMR 7222, CNRS, Paris, France
- Shelly B. Flagel
- Department of Psychiatry, University of Michigan, Ann Arbor, Michigan, United States of America
- Molecular and Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, United States of America
- Terry E. Robinson
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, United States of America
- Mehdi Khamassi
- Institut des Systèmes Intelligents et de Robotique, UMR 7222, UPMC Univ Paris 06, Paris, France
- Institut des Systèmes Intelligents et de Robotique, UMR 7222, CNRS, Paris, France
|
221
|
Lindström B, Selbing I, Molapour T, Olsson A. Racial bias shapes social reinforcement learning. Psychol Sci 2014; 25:711-9. [PMID: 24458270 DOI: 10.1177/0956797613514093] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Both emotional facial expressions and markers of racial-group membership are ubiquitous signals in social interaction, but little is known about how these signals jointly affect future behavior through learning. To address this issue, we investigated how emotional (threatening or friendly) in-group and out-group faces reinforced behavior in a reinforcement-learning task. We asked whether reinforcement learning would be modulated by intergroup attitudes (i.e., racial bias). The results showed that individual differences in racial bias critically modulated reinforcement learning. As predicted, racial bias was associated with more efficiently learned avoidance of threatening out-group individuals. We used computational modeling to quantitatively delimit the underlying processes affected by social reinforcement. These analyses showed that racial bias modulates the rate at which exposure to threatening out-group individuals is transformed into future avoidance behavior. In concert, these results shed new light on the learning processes underlying social interaction with racial in-group and out-group individuals.
|
222
|
A new perspective on human reward research: How consciously and unconsciously perceived reward information influences performance. Cogn Affect Behav Neurosci 2014; 14:493-508. [DOI: 10.3758/s13415-013-0241-z] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
|
223
|
Chen JF. Adenosine receptor control of cognition in normal and disease. Int Rev Neurobiol 2014; 119:257-307. [PMID: 25175970 DOI: 10.1016/b978-0-12-801022-8.00012-x] [Citation(s) in RCA: 100] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
Adenosine and adenosine receptors (ARs) are increasingly recognized as important therapeutic targets for controlling cognition under normal and disease conditions, given adenosine's dual roles in neuromodulation and in homeostatic function in the brain. This chapter first presents the unique ability of adenosine, by acting on the inhibitory A1 and facilitating A2A receptors, to integrate dopamine, glutamate, and BDNF signaling and to modulate synaptic plasticity (e.g., long-term potentiation and long-term depression) in brain regions relevant to learning and memory, providing the molecular and cellular bases for AR control of cognition. This has led to the demonstration of AR modulation of social recognition memory, working memory, reference memory, reversal learning, goal-directed behavior/habit formation, Pavlovian fear conditioning, and effort-related behavior. Furthermore, human and animal studies indicate that AR activity can also, through cognitive enhancement and neuroprotection, reverse cognitive impairments in animal models of Alzheimer's disease (AD), Parkinson's disease (PD), Huntington's disease, and schizophrenia. Lastly, epidemiological evidence indicates that regular human consumption of caffeine, the most widely used psychoactive drug and a nonselective AR antagonist, is associated with reduced cognitive decline in aging and AD patients, and with a reduced risk of developing PD. Thus, molecular studies revealing ARs as targets for integrating neurotransmitter signaling and controlling synaptic plasticity converge with animal studies demonstrating the strong procognitive impact of AR antagonism in normal and diseased brains, and with epidemiological and clinical evidence supporting caffeine and AR drugs for therapeutic modulation of cognition. Since some adenosine A2A receptor antagonists are already in phase III clinical trials for motor benefits in PD patients, with remarkable safety profiles, additional animal and human studies to better understand the mechanisms underlying AR-mediated control of cognition under normal and disease conditions will provide the rationale needed to stimulate clinical investigation and to rapidly translate adenosine and AR drugs into a novel strategy for controlling memory impairment in neuropsychiatric disorders.
Affiliation(s)
- Jiang-Fan Chen
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, USA; The Molecular Medicine Institute, Wenzhou Medical University, Wenzhou, Zhejiang, PR China.
|
224
|
Abstract
Exposure to an uncontrollable stressor elicits a constellation of physiological and behavioral sequelae in laboratory rats that often reflect aspects of anxiety and other emotional disruptions. We review evidence suggesting that plasticity within the serotonergic dorsal raphe nucleus (DRN) is critical to the expression of uncontrollable stressor-induced anxiety. Specifically, after uncontrollable stressor exposure, subsequent anxiogenic stimuli evoke greater 5-HT release in DRN terminal regions including the amygdala and striatum, and pharmacological blockade of postsynaptic 5-HT(2C) receptors in these regions prevents expression of stressor-induced anxiety. Importantly, the controllability of stress, the presence of safety signals, and a history of exercise mitigate the expression of stressor-induced anxiety. These stress-protective factors appear to involve distinct neural substrates: stressor controllability requires the medial prefrontal cortex, safety signals require the insular cortex, and exercise affects the 5-HT system directly. Knowledge of the distinct yet converging mechanisms underlying these stress-protective factors could provide insight into novel strategies for the treatment and prevention of stress-related psychiatric disorders.
|
225
|
Fatal attraction: ventral striatum predicts costly choice errors in humans. Neuroimage 2013; 89:1-9. [PMID: 24291504 DOI: 10.1016/j.neuroimage.2013.11.039] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2013] [Revised: 11/07/2013] [Accepted: 11/18/2013] [Indexed: 11/23/2022] Open
Abstract
Animals approach rewards and cues associated with reward, even when this behavior is irrelevant or detrimental to the attainment of those rewards. Motivated by these findings, we study the biology of financially costly approach behavior in humans. Our subjects passively learned to predict the occurrence of erotic rewards. We show that neuronal responses in the ventral striatum during this Pavlovian learning task stably predict an individual's general tendency toward financially costly approach behavior in an active choice task several months later. Our data suggest that approach behavior may prevent some individuals from acting in their own interests.
|
226
|
Dopamine and extinction: a convergence of theory with fear and reward circuitry. Neurobiol Learn Mem 2013; 108:65-77. [PMID: 24269353 DOI: 10.1016/j.nlm.2013.11.007] [Citation(s) in RCA: 147] [Impact Index Per Article: 13.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2013] [Revised: 11/01/2013] [Accepted: 11/08/2013] [Indexed: 01/11/2023]
Abstract
Research on dopamine lies at the intersection of sophisticated theoretical and neurobiological approaches to learning and memory. Dopamine has been shown to be critical for many processes that drive learning and memory, including motivation, prediction error, incentive salience, memory consolidation, and response output. Theories of dopamine's function in these processes have, for the most part, been developed from behavioral approaches that examine learning mechanisms in reward-related tasks. A parallel and growing literature indicates that dopamine is involved in fear conditioning and extinction. These studies are consistent with long-standing ideas about appetitive-aversive interactions in learning theory and they speak to the general nature of cellular and molecular processes that underlie behavior. We review the behavioral and neurobiological literature showing a role for dopamine in fear conditioning and extinction. At a cellular level, we review dopamine signaling and receptor pharmacology, cellular and molecular events that follow dopamine receptor activation, and brain systems in which dopamine functions. At a behavioral level, we describe theories of learning and dopamine function that could describe the fundamental rules underlying how dopamine modulates different aspects of learning and memory processes.
|
227
|
Zahedi K, Martius G, Ay N. Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis. Front Psychol 2013; 4:801. [PMID: 24204351 PMCID: PMC3816314 DOI: 10.3389/fpsyg.2013.00801] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2013] [Accepted: 10/10/2013] [Indexed: 11/23/2022] Open
Abstract
One of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviors. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information between the past and future of the sensor stream) as an intrinsic drive, ideally supporting any kind of task acquisition. Previous experiments have shown that predictive information (PI) is a good candidate to support autonomous, open-ended learning of complex behaviors, because maximizing the PI corresponds to exploring morphology- and environment-dependent behavioral regularities. The idea is that these regularities can then be exploited in order to solve any given task. Three experiments are presented, and their results lead to the conclusion that linearly combining the one-step PI with an external reward function is not generally advisable in an episodic policy gradient setting. Only for hard tasks can a large speed-up be achieved, at the cost of a loss in asymptotic performance.
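The quantity at the heart of this abstract, one-step predictive information, can be estimated for a discrete stream as the mutual information between successive values. A minimal empirical estimator over a hypothetical toy stream (not the authors' robot setup) might look like:

```python
# Sketch: one-step predictive information, estimated as the mutual
# information I(X_t; X_{t+1}) of a discretized sensor stream.
from collections import Counter
from math import log2

def predictive_information(stream):
    """Empirical I(X_t; X_{t+1}) in bits over a discrete sequence."""
    pairs = list(zip(stream, stream[1:]))
    n = len(pairs)
    p_xy = Counter(pairs)                      # joint counts
    p_x = Counter(x for x, _ in pairs)         # marginal over X_t
    p_y = Counter(y for _, y in pairs)         # marginal over X_{t+1}
    mi = 0.0
    for (x, y), c in p_xy.items():
        pxy = c / n
        mi += pxy * log2(pxy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi

# A perfectly predictable alternating stream carries about 1 bit of
# one-step predictive information; a constant stream carries 0 bits.
alternating = [0, 1] * 50
constant = [0] * 100
```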
Affiliation(s)
- Keyan Zahedi
- Max Planck Institute for Mathematics in the Sciences Leipzig, Germany
|
228
|
Galef BG. Imitation and local enhancement: Detrimental effects of consensus definitions on analyses of social learning in animals. Behav Processes 2013; 100:123-30. [DOI: 10.1016/j.beproc.2013.07.026] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2013] [Revised: 07/15/2013] [Accepted: 07/27/2013] [Indexed: 02/05/2023]
|
229
|
Abstract
Conduct disorder is a childhood behaviour disorder that is characterized by persistent aggressive or antisocial behaviour that disrupts the child's environment and impairs his or her functioning. A proportion of children with conduct disorder have psychopathic traits. Psychopathic traits consist of a callous-unemotional component and an impulsive-antisocial component, which are associated with two core impairments. The first is a reduced empathic response to the distress of other individuals, which primarily reflects reduced amygdala responsiveness to distress cues; the second is deficits in decision making and in reinforcement learning, which reflects dysfunction in the ventromedial prefrontal cortex and striatum. Genetic and prenatal factors contribute to the abnormal development of these neural systems, and social-environmental variables that affect motivation influence the probability that antisocial behaviour will be subsequently displayed.
|
230
|
Ogino M, Nishikawa A, Asada M. A motivation model for interaction between parent and child based on the need for relatedness. Front Psychol 2013; 4:618. [PMID: 24062710 PMCID: PMC3770941 DOI: 10.3389/fpsyg.2013.00618] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2013] [Accepted: 08/22/2013] [Indexed: 11/16/2022] Open
Abstract
In parent-child communication, emotions are evoked by various types of intrinsic and extrinsic motivation. Those emotions encourage actions that promote further interaction. We present a motivation model of infant-caregiver interactions, in which relatedness, one of the most important basic psychological needs, is a variable that increases with experiences of emotion sharing. Besides being an important source of pleasure, relatedness is a meta-factor that affects other factors such as stress and emotional mirroring. The proposed model is implemented in an artificial agent equipped with a system to recognize gestures and facial expressions. The baby-like agent successfully interacts with an actual human and reacts adversely when the caregiver suddenly ceases facial expressions, similar to the "still-face paradigm" demonstrated by infants in psychological experiments. In the simulation experiment, two agents, each controlled by the proposed motivation model, show relatedness-dependent emotional communication that mimics actual human communication.
Affiliation(s)
- Masaki Ogino
- Department of Informatics, Kansai University Osaka, Japan
|
231
|
In vivo two-photon Ca2+ imaging reveals selective reward effects on stimulus-specific assemblies in mouse visual cortex. J Neurosci 2013; 33:11540-55. [PMID: 23843524 DOI: 10.1523/jneurosci.1341-12.2013] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
Abstract
Experiences can alter the functional properties of neurons in primary sensory neocortex, but it is poorly understood how stimulus-reward associations contribute to these changes. Using in vivo two-photon calcium imaging in mouse primary visual cortex (V1), we show that association of a directional visual stimulus with reward results in broadened orientation tuning and sharpened direction tuning in a stimulus-selective subpopulation of V1 neurons. Neurons with preferred orientations similar, but not identical, to the CS+ selectively increased their tuning-curve bandwidth and thereby exhibited an increased response amplitude at the CS+ orientation. The increase in response amplitude was observed for a small range of orientations around the CS+ orientation. A nonuniform spatial distribution of reward effects across the cortical surface was also observed: the spatial distance between pairs of CS+-tuned neurons was reduced compared with pairs of CS--tuned neurons and with pairs tuned to control directions or orientations. These data show that, in primary visual cortex, formation of a stimulus-reward association results in selective alterations in stimulus-specific assemblies rather than population-wide effects.
|
232
|
Hikosaka O, Yamamoto S, Yasuda M, Kim HF. Why skill matters. Trends Cogn Sci 2013; 17:434-41. [PMID: 23911579 PMCID: PMC3756891 DOI: 10.1016/j.tics.2013.07.001] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2013] [Revised: 07/01/2013] [Accepted: 07/01/2013] [Indexed: 11/16/2022]
Abstract
Maximizing rewards per unit time is ideal for success and survival in humans and animals. This goal can be approached by speeding up reward-directed behavior, which is done most efficiently by acquiring skills. Importantly, reward-directed skills consist of two components that occur sequentially: finding a good object (i.e., object skill) and acting on the object (i.e., action skill). Recent studies suggest that object skill is based on high-capacity memory for object-value associations. When a learned object is encountered, the corresponding memory is quickly expressed as a value-based gaze bias, leading to the automatic acquisition or avoidance of the object. Object skill thus plays a crucial role in increasing rewards per unit time.
Affiliation(s)
- Okihide Hikosaka
- Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA.
|
233
|
Distinct basal ganglia circuits controlling behaviors guided by flexible and stable values. Neuron 2013; 79:1001-10. [PMID: 23954031 DOI: 10.1016/j.neuron.2013.06.044] [Citation(s) in RCA: 143] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/26/2013] [Indexed: 11/20/2022]
Abstract
Choosing valuable objects is critical for survival, but their values may change flexibly or remain stable. Therefore, animals should be able to update the object values flexibly by recent experiences and retain them stably by long-term experiences. However, it is unclear how the brain encodes the two conflicting forms of values and controls behavior accordingly. We found that distinct circuits of the primate caudate nucleus control behavior selectively in the flexible and stable value conditions. Single caudate neurons encoded the values of visual objects in a regionally distinct manner: flexible value coding in the caudate head and stable value coding in the caudate tail. Monkeys adapted in both conditions by looking at objects with higher values. Importantly, inactivation of each caudate subregion disrupted the high-low value discrimination selectively in the flexible or stable context. This parallel complementary mechanism enables animals to choose valuable objects in both flexible and stable conditions.
|
234
|
Abstract
Pavlovian biases influence learning and decision making by intricately coupling reward seeking with action invigoration and punishment avoidance with action suppression. This bias is not always adaptive: it can often interfere with instrumental requirements. The prefrontal cortex is thought to help resolve such conflict between motivational systems, but the nature of this control process remains unknown. EEG recordings of midfrontal theta-band power are sensitive to conflict and predictive of adaptive control over behavior, but it is not clear whether this signal reflects control over conflict between motivational systems. Here we used a task that orthogonalized action requirements and outcome valence while recording concurrent EEG in human participants. By applying a computational model of task performance, we derived parameters reflecting the latent influence of the Pavlovian bias and how it was modulated by midfrontal theta power during motivational conflict. Between subjects, those who performed better under Pavlovian conflict exhibited higher midfrontal theta power. Within subjects, trial-to-trial variance in theta power was predictive of the ability to overcome the influence of the Pavlovian bias, and this effect was most pronounced in subjects with higher midfrontal theta responses to conflict. These findings demonstrate that midfrontal theta is not only a sensitive index of prefrontal control but can also reflect the application of top-down control over instrumental processes.
|
235
|
Dopaminergic control of motivation and reinforcement learning: a closed-circuit account for reward-oriented behavior. J Neurosci 2013; 33:8866-90. [PMID: 23678129 DOI: 10.1523/jneurosci.4614-12.2013] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Humans and animals take actions quickly when they expect those actions to lead to reward, reflecting their motivation. Injection of dopamine receptor antagonists into the striatum has been shown to slow such reward-seeking behavior, suggesting that dopamine is involved in the control of motivational processes. Meanwhile, neurophysiological studies have revealed that the phasic response of dopamine neurons appears to represent reward prediction error, indicating that dopamine plays central roles in reinforcement learning. However, previous attempts to elucidate the mechanisms of these dopaminergic controls have not fully explained how the motivational and learning aspects are related, or whether they can be understood in terms of how the activity of dopamine neurons is itself controlled by upstream circuitry. To address this issue, we constructed a closed-circuit model of the corticobasal ganglia system based on recent findings regarding intracortical and corticostriatal circuit architectures. Simulations show that the model could reproduce the observed distinct motivational effects of D1- and D2-type dopamine receptor antagonists. Simultaneously, our model successfully explains the dopaminergic representation of reward prediction error as observed in behaving animals during learning tasks, and could also explain the distinct choice biases induced by optogenetic stimulation of D1 and D2 receptor-expressing striatal neurons. These results indicate that the suggested roles of dopamine in motivational control and reinforcement learning can be understood in a unified manner through the notion that the indirect pathway of the basal ganglia represents the value of states/actions at a previous time point, an empirically driven key assumption of our model.
|
236
|
Herd SA, Krueger KA, Kriete TE, Huang TR, Hazy TE, O'Reilly RC. Strategic cognitive sequencing: a computational cognitive neuroscience approach. Comput Intell Neurosci 2013; 2013:149329. [PMID: 23935605 PMCID: PMC3722785 DOI: 10.1155/2013/149329] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/22/2012] [Revised: 05/07/2013] [Accepted: 05/28/2013] [Indexed: 11/20/2022]
Abstract
We address strategic cognitive sequencing, the "outer loop" of human cognition: how the brain decides what cognitive process to apply at a given moment to solve complex, multistep cognitive tasks. We argue that this topic has been neglected relative to its importance for systematic reasons but that recent work on how individual brain systems accomplish their computations has set the stage for productively addressing how brain regions coordinate over time to accomplish our most impressive thinking. We present four preliminary neural network models. The first addresses how the prefrontal cortex (PFC) and basal ganglia (BG) cooperate to perform trial-and-error learning of short sequences; the next, how several areas of PFC learn to make predictions of likely reward, and how this contributes to the BG making decisions at the level of strategies. The third model addresses how PFC, BG, parietal cortex, and hippocampus can work together to memorize sequences of cognitive actions from instruction (or "self-instruction"). The last shows how a constraint satisfaction process can find useful plans. The PFC maintains current and goal states and associates from both of these to find a "bridging" state, an abstract plan. We discuss how these processes could work together to produce strategic cognitive sequencing and discuss future directions in this area.
Affiliation(s)
- Seth A Herd
- Department of Psychology, University of Colorado Boulder, Boulder, CO 80309, USA.
|
237
|
Smart K, Desmond RC, Poulos CX, Zack M. Modafinil increases reward salience in a slot machine game in low and high impulsivity pathological gamblers. Neuropharmacology 2013; 73:66-74. [PMID: 23711549 DOI: 10.1016/j.neuropharm.2013.05.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2012] [Revised: 05/06/2013] [Accepted: 05/09/2013] [Indexed: 10/26/2022]
Abstract
This study examined the effects of modafinil (200 mg) on slot machine betting profiles from a previous sample of low and high impulsivity (LI/HI) pathological gamblers (10/Group; Zack and Poulos, 2009). Hierarchical regression assessed the prospective relationship between Payoff and Bet Size on consecutive trials, along with moderating effects of Group, Cumulative Winnings (low/high) and Phase of game (early/late) under drug and placebo. Y intercepts for the simple regressions of Bet Size on Payoff indexed overall motivation to bet. Under placebo, both groups gauged their bets less closely to the preceding Payoff as trials continued when Winnings were low but not high. Under modafinil, both groups gauged their bets more closely to the preceding Payoff when Winnings were low but gauged their bets less closely to the previous Payoff when Winnings were high. The tendency to gauge bets closely to the previous Payoff coincided with a bias toward low overall Bet Size, and modafinil accentuated this relationship, in LI but not HI subjects. Results suggest that modafinil increases the salience of environmental rewards, leading to more tightly calibrated responses to individual rewards when resources are low, but progressively loosens reward-response calibration when resources are high. Increased relative impact of phasic vs. tonic dopamine signals may account for patterns seen at low vs. high Winnings, respectively, under the drug. Clinically, modafinil may deter pathological gamblers from chasing losses but also encourage them to continue betting rather than quit while they are ahead. Whether low-dose modafinil confers more uniform benefits deserves investigation.
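The betting analysis rests on simple regressions of Bet Size on the preceding Payoff, where the slope indexes how tightly bets are gauged to payoffs and the Y intercept indexes overall motivation to bet. A minimal sketch of that regression logic, with hypothetical data rather than the study's:

```python
# Least-squares fit of Bet Size (trial t+1) on Payoff (trial t): the
# slope measures how closely bets track the preceding payoff, the Y
# intercept indexes overall bet size. Data below are hypothetical.
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx     # (slope, intercept)

payoffs = [0, 10, 0, 25, 5]           # payoff on trial t
bets    = [2, 6, 2, 10, 3]            # bet size on trial t+1 (tracks payoff)
slope, intercept = fit_line(payoffs, bets)
```

A large positive slope corresponds to tightly calibrated betting; a high intercept with a flat slope corresponds to large bets placed regardless of the previous payoff.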
Affiliation(s)
- Kelly Smart
- Centre for Addiction and Mental Health, 33 Russell Street, Toronto, Ontario M5S 2S1, Canada
|
238
|
Functional circuits and anatomical distribution of response properties in the primate amygdala. J Neurosci 2013; 33:722-33. [PMID: 23303950 DOI: 10.1523/jneurosci.2970-12.2013] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Recent electrophysiological studies on the primate amygdala have advanced our understanding of how individual neurons encode information relevant to emotional processes, but it remains unclear how these neurons are functionally and anatomically organized. To address this, we analyzed cross-correlograms of amygdala spike trains recorded during a task in which monkeys learned to associate novel images with rewarding and aversive outcomes. Using this task, we have recently described two populations of amygdala neurons: one that responds more strongly to images predicting reward (positive value-coding), and another that responds more strongly to images predicting an aversive stimulus (negative value-coding). Here, we report that these neural populations are organized into distinct, but anatomically intermingled, appetitive and aversive functional circuits, which are dynamically modulated as animals used the images to predict outcomes. Furthermore, we report that responses to sensory stimuli are prevalent in the lateral amygdala, and are also prevalent in the medial amygdala for sensory stimuli that are emotionally significant. The circuits identified here could potentially mediate valence-specific emotional behaviors thought to involve the amygdala.
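Cross-correlogram analysis of this kind counts, for each spike in one train, the spikes in a second train at each relative lag; a peak at a nonzero lag suggests a directed functional coupling. A toy sketch (spike times, lags, and the 1 ms resolution are illustrative, not recorded data):

```python
# Toy cross-correlogram: for each spike in train A, count spikes in
# train B at each relative lag (in ms). A peak at lag +1 means B tends
# to fire 1 ms after A.
def cross_correlogram(train_a, train_b, max_lag=5):
    counts = {lag: 0 for lag in range(-max_lag, max_lag + 1)}
    spikes_b = set(train_b)
    for t in train_a:
        for lag in counts:
            if t + lag in spikes_b:
                counts[lag] += 1
    return counts

a = [10, 20, 30, 40]
b = [11, 21, 31, 41]                  # B consistently follows A by 1 ms
ccg = cross_correlogram(a, b)         # peak at lag +1
```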
|
239
|
van der Vegt JPM, Hulme OJ, Zittel S, Madsen KH, Weiss MM, Buhmann C, Bloem BR, Münchau A, Siebner HR. Attenuated neural response to gamble outcomes in drug-naive patients with Parkinson's disease. Brain 2013; 136:1192-203. [PMID: 23442226 DOI: 10.1093/brain/awt027] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Parkinson's disease results from the degeneration of dopaminergic neurons in the substantia nigra, manifesting as a spectrum of motor, cognitive and affective deficits. Parkinson's disease also affects reward processing, but disease-related deficits in reinforcement learning are thought to emerge at a slower pace than motor symptoms as the degeneration progresses from dorsal to ventral striatum. Dysfunctions in reward processing are difficult to study in Parkinson's disease as most patients have been treated with dopaminergic drugs, which sensitize reward responses in the ventral striatum, commonly resulting in impulse control disorders. To circumvent this treatment confound, we assayed the neural basis of reward processing in a group of newly diagnosed patients with Parkinson's disease that had never been treated with dopaminergic drugs. Thirteen drug-naive patients with Parkinson's disease and 12 healthy age-matched control subjects underwent whole-brain functional magnetic resonance imaging while they performed a simple two-choice gambling task resulting in stochastic and parametrically variable monetary gains and losses. In patients with Parkinson's disease, the neural response to reward outcome (as reflected by the blood oxygen level-dependent signal) was attenuated in a large group of mesolimbic and mesocortical regions, comprising the ventral putamen, ventral tegmental area, thalamus and hippocampus. Although these regions showed a linear response to reward outcome in healthy individuals, this response was either markedly reduced or undetectable in drug-naive patients with Parkinson's disease. The results show that the core regions of the meso-cortico-limbic dopaminergic system, including the ventral tegmental area, ventral striatum, and medial orbitofrontal cortex, are already significantly compromised in the early stages of the disease and that these deficits cannot be attributed to the contaminating effect of dopaminergic treatment.
Affiliation(s)
- Joyce P M van der Vegt
- Danish Research Centre for Magnetic Resonance, Copenhagen University Hospital Hvidovre, Kettegaard Allé 30, DK-2650 Hvidovre, Denmark
|
240
|
Abstract
It is now widely accepted that instrumental actions can be either goal-directed or habitual; whereas the former are rapidly acquired and regulated by their outcome, the latter are reflexive, elicited by antecedent stimuli rather than their consequences. Model-based reinforcement learning (RL) provides an elegant description of goal-directed action. Through exposure to states, actions and rewards, the agent rapidly constructs a model of the world and can choose an appropriate action based on quite abstract changes in environmental and evaluative demands. This model is powerful but has a problem explaining the development of habitual actions. To account for habits, theorists have argued that another action controller is required, called model-free RL, that does not form a model of the world but rather caches action values within states allowing a state to select an action based on its reward history rather than its consequences. Nevertheless, there are persistent problems with important predictions from the model; most notably the failure of model-free RL correctly to predict the insensitivity of habitual actions to changes in the action-reward contingency. Here, we suggest that introducing model-free RL in instrumental conditioning is unnecessary, and demonstrate that reconceptualizing habits as action sequences allows model-based RL to be applied to both goal-directed and habitual actions in a manner consistent with what real animals do. This approach has significant implications for the way habits are currently investigated and generates new experimental predictions.
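The "caching" idea the authors critique can be made concrete: a model-free learner stores a running value per state-action pair, updated only by reward history, so a devalued outcome leaves the cached value, and hence the habit, intact until new experience accrues. A minimal sketch with invented state and action names:

```python
# Model-free "caching": Q(s, a) is nudged toward each received reward
# and consulted later with no reference to the action's consequences.
def q_update(q, state, action, reward, alpha=0.5):
    key = (state, action)
    old = q.get(key, 0.0)
    q[key] = old + alpha * (reward - old)   # value from reward history only
    return q[key]

q = {}
for _ in range(10):                         # a history of rewarded presses
    q_update(q, "lever_cue", "press", 1.0)
cached = q[("lever_cue", "press")]          # stays high after devaluation,
                                            # until new experience updates it
```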
Affiliation(s)
- Amir Dezfouli
- Brain & Mind Research Institute, University of Sydney, Camperdown, NSW 2050, Australia
|
241
|
Berridge KC. From prediction error to incentive salience: mesolimbic computation of reward motivation. Eur J Neurosci 2013; 35:1124-43. [PMID: 22487042 DOI: 10.1111/j.1460-9568.2012.07990.x] [Citation(s) in RCA: 369] [Impact Index Per Article: 33.5] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
Abstract
Reward contains separable psychological components of learning, incentive motivation and pleasure. Most computational models have focused only on the learning component of reward, but the motivational component is equally important in reward circuitry, and even more directly controls behavior. Modeling the motivational component requires recognition of additional control factors besides learning. Here I discuss how mesocorticolimbic mechanisms generate the motivation component of incentive salience. Incentive salience takes Pavlovian learning and memory as one input and as an equally important input takes neurobiological state factors (e.g. drug states, appetite states, satiety states) that can vary independently of learning. Neurobiological state changes can produce unlearned fluctuations or even reversals in the ability of a previously learned reward cue to trigger motivation. Such fluctuations in cue-triggered motivation can dramatically depart from all previously learned values about the associated reward outcome. Thus, one consequence of the difference between incentive salience and learning can be to decouple cue-triggered motivation of the moment from previously learned values of how good the associated reward has been in the past. Another consequence can be to produce irrationally strong motivation urges that are not justified by any memories of previous reward values (and without distorting associative predictions of future reward value). Such irrationally strong motivation may be especially problematic in addiction. To understand these phenomena, future models of mesocorticolimbic reward function should address the neurobiological state factors that participate in controlling the generation of incentive salience.
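The core computational claim, that cue-triggered motivation is the learned cue value modulated on the fly by the current physiological state, can be sketched as a simple state-dependent gain. The multiplicative kappa form here is one possibility discussed in this literature, and all numbers are illustrative:

```python
# State-dependent gain sketch of incentive salience: the cached
# Pavlovian value of a cue is rescaled by a physiological factor kappa
# at the moment the cue is encountered, with no new learning required.
def incentive_salience(learned_value, kappa):
    """kappa > 1: amplifying state (e.g. salt appetite); kappa < 1: satiety."""
    return learned_value * kappa

learned = 0.2                                  # value learned in a neutral state
amplified = incentive_salience(learned, 5.0)   # sudden appetite: strong 'wanting'
suppressed = incentive_salience(learned, 0.5)  # satiety: weak 'wanting'
```

The same cached value yields very different momentary motivation in different states, which is the decoupling from learned values that the review emphasizes.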
Affiliation(s)
- Kent C Berridge
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109-1043, USA.
|
242
|
Abstract
A traditional view of cognition is that it involves an internal process that represents, tracks or predicts an external process. This is not a general characteristic of all complex neural processing or feedback control, but rather implies specific forms of processing giving rise to specific behavioural capabilities. In this paper, I will review the evidence for such capabilities in insect navigation and learning. Do insects know where they are, or do they only know what to do? Do they learn what stimuli mean, or do they only learn how to behave?
Affiliation(s)
- Barbara Webb
- School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK.
|
243
|
Girardi G, Antonucci G, Nico D. Cueing spatial attention through timing and probability. Cortex 2013; 49:211-21. [DOI: 10.1016/j.cortex.2011.08.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2011] [Revised: 08/12/2011] [Accepted: 08/24/2011] [Indexed: 11/26/2022]
|
244
|
Seidler RD, Kwak Y, Fling BW, Bernard JA. Neurocognitive mechanisms of error-based motor learning. Adv Exp Med Biol 2013; 782:39-60. [PMID: 23296480 PMCID: PMC3817858 DOI: 10.1007/978-1-4614-5465-6_3] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Affiliation(s)
- Rachael D. Seidler
- Department of Psychology and School of Kinesiology, University of Michigan, 401 Washtenaw Avenue, Ann Arbor, MI 48109-2214, USA.
- Youngbin Kwak
- Neuroscience Program, University of Michigan, 401 Washtenaw Avenue, Ann Arbor, MI 48109-2214, USA; Center for Cognitive Neuroscience, Duke University, Durham, NC 27708, USA.
- Brett W. Fling
- School of Kinesiology, University of Michigan, 401 Washtenaw Avenue, Ann Arbor, MI 48109-2214, USA.
- Jessica A. Bernard
- Department of Psychology, University of Michigan, 401 Washtenaw Avenue, Ann Arbor, MI 48109-2214, USA.
|
245
|
Foerde K, Braun EK, Shohamy D. A trade-off between feedback-based learning and episodic memory for feedback events: evidence from Parkinson's disease. Neurodegener Dis 2013; 11:93-101. [DOI: 10.1159/000342000] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
|
246
|
Abstract
The dorsal striatum, with its functional microcircuits galore, serves as the primary gateway of the basal ganglia and is known to play a key role in implicit learning. Initially, excitatory inputs from the cortex and thalamus arrive on the direct and indirect pathways, where the precise flow of information is then regulated by local GABAergic interneurons. The balance of excitatory and inhibitory transmission in the dorsal striatum is modulated by neuromodulators such as dopamine and acetylcholine. Under pathophysiological states in the dorsal striatum, an alteration in excitatory and inhibitory transmission may underlie dysfunctional motor control. Here, we review the cellular connections and modulation of striatal microcircuits and propose that modulating the excitatory and inhibitory balance in synaptic transmission of the dorsal striatum is important for regulating locomotion.
|
247
|
Khamassi M, Humphries MD. Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies. Front Behav Neurosci 2012. [PMID: 23205006 PMCID: PMC3506961 DOI: 10.3389/fnbeh.2012.00079] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
Behavior in spatial navigation is often organized into map-based (place-driven) vs. map-free (cue-driven) strategies; behavior in operant conditioning research is often organized into goal-directed vs. habitual strategies. Here we attempt to unify the two. We review one powerful theory for distinct forms of learning during instrumental conditioning, namely model-based (maintaining a representation of the world) and model-free (reacting to immediate stimuli) learning algorithms. We extend these lines of argument to propose an alternative taxonomy for spatial navigation, showing how various previously identified strategies can be distinguished as “model-based” or “model-free” depending on the usage of information and not on the type of information (e.g., cue vs. place). We argue that identifying “model-free” learning with dorsolateral striatum and “model-based” learning with dorsomedial striatum could reconcile numerous conflicting results in the spatial navigation literature. From this perspective, we further propose that the ventral striatum plays key roles in the model-building process. We propose that the core of the ventral striatum is positioned to learn the probability of action selection for every transition between states of the world. We further review suggestions that the ventral striatal core and shell are positioned to act as “critics” contributing to the computation of a reward prediction error for model-free and model-based systems, respectively.
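The review's central contrast, reacting from cached values versus planning through a learned world model, can be sketched in a toy two-arm maze. State names, cached values, and rewards are all invented for illustration:

```python
# Both agents face the same choice at "start", but the model-based agent
# plans through a learned transition model while the model-free agent
# consults cached values that may lag behind the world.
transitions = {("start", "left"): "food_arm", ("start", "right"): "empty_arm"}
reward_at = {"food_arm": 1.0, "empty_arm": 0.0}

def model_based_choice(state, actions):
    # Plan: simulate each action through the world model, pick the best outcome.
    return max(actions, key=lambda a: reward_at[transitions[(state, a)]])

cached_q = {("start", "left"): 0.3, ("start", "right"): 0.6}  # stale history

def model_free_choice(state, actions):
    # React: use cached values only; ignore where actions actually lead.
    return max(actions, key=lambda a: cached_q[(state, a)])

mb = model_based_choice("start", ["left", "right"])   # plans its way to food
mf = model_free_choice("start", ["left", "right"])    # repeats the old habit
```

On this taxonomy, what distinguishes the strategies is the usage of information (planning vs caching), not whether the inputs are cues or places.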
Affiliation(s)
- Mehdi Khamassi
- Institut des Systèmes Intelligents et de Robotique, Université Pierre et Marie Curie Paris, France ; Centre National de la Recherche Scientifique, UMR7222 Paris, France
|
248
|
The effect of ratio and interval training on Pavlovian-instrumental transfer in mice. PLoS One 2012; 7:e48227. [PMID: 23144742 PMCID: PMC3483270 DOI: 10.1371/journal.pone.0048227] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2012] [Accepted: 09/21/2012] [Indexed: 11/19/2022] Open
Abstract
Conditional stimuli (CS) that are paired with reward can be used to motivate instrumental responses. This process is called Pavlovian-instrumental transfer (PIT). A recent study in rats suggested that habitual responses are particularly sensitive to the motivational effects of reward cues. The current experiments examined this idea using ratio and interval training in mice. Two groups of animals were trained to lever press for food pellets that were delivered on random ratio or random interval schedules. Devaluation tests revealed that interval training led to habitual responding while ratio training produced goal-directed actions. The presentation of CSs paired with reward led to positive transfer in both groups; however, the size of this effect was much larger in mice that were trained on interval schedules. This result suggests that habitual responses are more sensitive to the motivational influence of reward cues than goal-directed actions. The implications for neurobiological models of motivation and drug-seeking behaviors are discussed.
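The two schedules differ in how reward is delivered: random ratio (RR) pays off each press with a fixed probability, while random interval (RI) arms a reward after a random delay and the next press collects it, largely decoupling reward rate from press rate. A simulation sketch, with parameters chosen for illustration rather than taken from the paper:

```python
import random

# Random ratio: each press is rewarded with probability p, so reward
# rate scales with press rate.
def random_ratio(presses, p=0.1, seed=0):
    rng = random.Random(seed)
    return sum(1 for _ in range(presses) if rng.random() < p)

# Random interval: reward becomes available after an exponentially
# distributed delay; the first press afterwards collects it, so pressing
# faster adds little beyond collecting each armed reward sooner.
def random_interval(press_times, mean_interval=30.0, seed=0):
    rng = random.Random(seed)
    rewards = 0
    armed_at = rng.expovariate(1.0 / mean_interval)
    for t in sorted(press_times):
        if t >= armed_at:
            rewards += 1
            armed_at = t + rng.expovariate(1.0 / mean_interval)
    return rewards
```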
|
249
|
Burke CJ, Huetteroth W, Owald D, Perisse E, Krashes MJ, Das G, Gohl D, Silies M, Certel S, Waddell S. Layered reward signalling through octopamine and dopamine in Drosophila. Nature 2012; 492:433-7. [PMID: 23103875 PMCID: PMC3528794 DOI: 10.1038/nature11614] [Citation(s) in RCA: 380] [Impact Index Per Article: 31.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2012] [Accepted: 09/24/2012] [Indexed: 11/11/2022]
Abstract
Dopamine (DA) is synonymous with reward and motivation in mammals [1, 2]. However, only recently has dopamine been linked to motivated behavior and rewarding reinforcement in fruit flies [3, 4]. Instead, octopamine (OA) has historically been considered the signal for reward in insects [5-7]. Here we show, using temporal control of neural function in Drosophila, that only short-term appetitive memory is reinforced by OA. Moreover, OA-dependent memory formation requires signaling through DA neurons. Part of the OA signal requires the α-adrenergic-like OAMB receptor in an identified subset of mushroom body (MB)-targeted DA neurons. OA triggers an increase in intracellular calcium in these DA neurons, and their direct activation can substitute for sugar to form appetitive memory, even in flies lacking OA. Analysis of the β-adrenergic-like Octβ2R receptor reveals that OA-dependent reinforcement also requires an interaction with DA neurons that control appetitive motivation. These data suggest that sweet taste engages a distributed OA signal that reinforces memory through discrete subsets of MB-targeted DA neurons. In addition, they reconcile prior findings with OA and DA and suggest that reinforcement systems in flies are more similar to mammals than previously envisaged.
Affiliation(s)
- Christopher J Burke
- Department of Neurobiology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, Massachusetts 01605, USA
|
250
|
Schulz JM, Reynolds JNJ. Pause and rebound: sensory control of cholinergic signaling in the striatum. Trends Neurosci 2012; 36:41-50. [PMID: 23073210 DOI: 10.1016/j.tins.2012.09.006] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2012] [Revised: 08/07/2012] [Accepted: 09/19/2012] [Indexed: 11/29/2022]
Abstract
Cholinergic interneurons have emerged as one of the key players controlling network functions in the striatum. Extracellularly recorded cholinergic interneurons acquire characteristic responses to sensory stimuli during reward-related learning, including a pause and subsequent rebound in spiking. However, the precise underlying cellular mechanisms have remained elusive. Here, we review recent advances in our understanding of the regulation of cholinergic interneuron activity. We discuss evidence of mechanisms that have been proposed to underlie sensory responses, including antagonistic actions by dopamine, recurrent inhibition via local interneurons, and an intrinsically generated membrane hyperpolarization in response to excitatory inputs. The review highlights outstanding questions and concludes with a model of the sensory responses and their downstream effects through dynamic acetylcholine receptor activation.
Affiliation(s)
- Jan M Schulz
- Department of Biomedicine, Physiological Institute, University of Basel, Pestalozzistr. 20, 4056 Basel, Switzerland.
|