1. Mah A, Golden CEM, Constantinople CM. Dopamine transients encode reward prediction errors independent of learning rates. Cell Rep 2024; 43:114840. [PMID: 39395170] [DOI: 10.1016/j.celrep.2024.114840]
Abstract
Biological accounts of reinforcement learning posit that dopamine encodes reward prediction errors (RPEs), which are multiplied by a learning rate to update state or action values. These values are thought to be represented by corticostriatal synaptic weights, which are updated by dopamine-dependent plasticity. This suggests that dopamine release reflects the product of the learning rate and RPE. Here, we characterize dopamine encoding of learning rates in the nucleus accumbens core (NAcc) in a volatile environment. Using a task with semi-observable states offering different rewards, we find that rats adjust how quickly they initiate trials across states using RPEs. Computational modeling and behavioral analyses show that learning rates are higher following state transitions and scale with trial-by-trial changes in beliefs about hidden states, approximating normative Bayesian strategies. Notably, dopamine release in the NAcc encodes RPEs independent of learning rates, suggesting that dopamine-independent mechanisms instantiate dynamic learning rates.
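The delta-rule arithmetic this abstract builds on can be stated in a few lines. The Python sketch below is a generic illustration rather than the authors' model; the reward statistics, learning-rate values, and transition point are invented for demonstration.

```python
import numpy as np

def update_value(v, reward, alpha):
    """One delta-rule update: v <- v + alpha * RPE."""
    rpe = reward - v                 # reward prediction error
    return v + alpha * rpe, rpe

# Toy volatile environment: the mean reward jumps at trial 100.
rng = np.random.default_rng(0)
rewards = np.concatenate([rng.normal(1.0, 0.1, 100),
                          rng.normal(3.0, 0.1, 100)])

v = 0.0
for t, r in enumerate(rewards):
    # Illustrative dynamic learning rate: boosted just after the state
    # transition (known here only for demonstration purposes).
    alpha = 0.5 if 100 <= t < 110 else 0.1
    v, rpe = update_value(v, r, alpha)
print(round(v, 2))   # the value estimate has tracked the new reward mean
```

The abstract's central claim is then that NAcc dopamine tracks `rpe` alone, not the product `alpha * rpe` that actually drives the value update.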
Affiliation(s)
- Andrew Mah
- Center for Neural Science, New York University, New York, NY, USA
- Carla E M Golden
- Center for Neural Science, New York University, New York, NY, USA
2. Zhang Z, Takahashi YK, Montesinos-Cartegena M, Kahnt T, Langdon AJ, Schoenbaum G. Expectancy-related changes in firing of dopamine neurons depend on hippocampus. Nat Commun 2024; 15:8911. [PMID: 39414794] [PMCID: PMC11484966] [DOI: 10.1038/s41467-024-53308-z]
Abstract
The orbitofrontal cortex (OFC) and hippocampus (HC) both contribute to the cognitive maps that support flexible behavior. Previously, we used recordings of midbrain dopamine neurons to probe the functional role of the OFC: we recorded dopamine neurons as rats performed an odor-based choice task in which expected rewards were manipulated across blocks, and found that ipsilateral OFC lesions degraded dopaminergic prediction errors, consistent with reduced resolution of the task states. Here we have repeated this experiment in male rats with ipsilateral HC lesions. The results show that the HC also shapes the task states; however, unlike the OFC, which provides information local to the trial, the HC is necessary for estimating the upper-level hidden states that distinguish blocks. These results contrast the roles of the OFC and HC in cognitive mapping and suggest that dopamine neurons access rich information from distributed regions regarding the environment's structure, potentially enabling this teaching signal to support complex behaviors.
Affiliation(s)
- Zhewei Zhang
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
- Yuji K Takahashi
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
- Thorsten Kahnt
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
- Angela J Langdon
- Intramural Research Program, National Institute on Mental Health, Bethesda, MD, USA
- Geoffrey Schoenbaum
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
3. Kim MJ, Gibson DJ, Hu D, Yoshida T, Hueske E, Matsushima A, Mahar A, Schofield CJ, Sompolpong P, Tran KT, Tian L, Graybiel AM. Dopamine release plateau and outcome signals in dorsal striatum contrast with classic reinforcement learning formulations. Nat Commun 2024; 15:8856. [PMID: 39402067] [PMCID: PMC11473536] [DOI: 10.1038/s41467-024-53176-7]
Abstract
We recorded dopamine release signals in centromedial and centrolateral sectors of the striatum as mice learned consecutive versions of visual cue-outcome conditioning tasks. Dopamine release responses differed for the centromedial and centrolateral sites. In neither sector could these responses be accounted for by reinforcement learning alone as classically applied to the activity of nigral dopamine-containing neurons. Medially, cue responses ranged from initial sharp peaks to modulated plateau responses; outcome (reward) responses during cue conditioning were minimal or, initially, negative. At centrolateral sites, by contrast, strong, transient dopamine release responses occurred at both cue and outcome. Prolonged, plateau release responses to cues emerged in both regions when discriminative behavioral responses became required. At most sites, we found no evidence for a transition from outcome signaling to cue signaling, a hallmark of temporal difference reinforcement learning as applied to midbrain dopaminergic neuronal activity. These findings delineate a reshaping of striatal dopamine release activity during learning and suggest that current views of reward prediction error encoding need review to accommodate distinct learning-related spatial and temporal patterns of striatal dopamine release in the dorsal striatum.
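The "transition from outcome signaling to cue signaling" tested here is the textbook signature of temporal difference (TD) learning, sketched below in generic form; the state count, learning rate, and reward size are arbitrary illustration choices, not parameters from the study.

```python
import numpy as np

T = 5                        # within-trial states: cue at t=0, reward after t=T-1
V = np.zeros(T)              # learned state values
alpha, gamma, reward = 0.1, 1.0, 1.0

cue_rpe, outcome_rpe = [], []
for episode in range(200):
    # RPE at cue onset, entering the trial from a zero-value ITI state.
    cue_rpe.append(gamma * V[0])
    for t in range(T):
        v_next = V[t + 1] if t + 1 < T else 0.0
        r = reward if t == T - 1 else 0.0
        delta = r + gamma * v_next - V[t]   # TD error
        V[t] += alpha * delta
        if t == T - 1:
            outcome_rpe.append(delta)

# With training, outcome_rpe decays toward zero while cue_rpe grows toward
# the reward value: the cue/outcome transfer that most sites here lacked.
print(round(cue_rpe[0], 2), round(cue_rpe[-1], 2))
print(round(outcome_rpe[0], 2), round(outcome_rpe[-1], 2))
```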
Affiliation(s)
- Min Jung Kim
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Advanced Imaging Research Center, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Daniel J Gibson
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Dan Hu
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Tomoko Yoshida
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Emily Hueske
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Ayano Matsushima
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Ara Mahar
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Cynthia J Schofield
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Patlapa Sompolpong
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Kathy T Tran
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Lin Tian
- Max Planck Florida Institute for Neuroscience, Jupiter, FL, 33458, USA
- Ann M Graybiel
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
4. Liu Z, Reiner R, Loewenstein Y, Lottem E. Value modulation of self-defeating impulsivity. Biol Psychiatry 2024:S0006-3223(24)01622-6. [PMID: 39349156] [DOI: 10.1016/j.biopsych.2024.09.017]
Abstract
BACKGROUND: Impulse control is a critical aspect of cognitive functioning. Intuitively, whether an action is executed prematurely depends on its associated reward, yet the link between value and impulsivity remains poorly understood. Three frameworks for impulsivity offer contrasting views: impulsive behavior may be valuable because it is associated with hidden internal reward (e.g., reduction of mental effort); alternatively, it can emerge from exploration, which is disadvantageous in the short term but can yield long-term benefits; finally, impulsivity may reflect a Pavlovian bias, an inherent tendency to act that persists even when its outcome is negative.
METHODS: To test these hypotheses, we trained seventeen male mice to withhold licking while anticipating variable rewards. We then measured and optogenetically manipulated dopamine release in the ventral striatum.
RESULTS: We found that higher reward magnitudes correlated with increased impulsivity. This behavior was well explained by a Pavlovian-bias model. Furthermore, we observed negative dopamine signals during premature licking, suggesting that in this task impulsivity is not merely an unsuccessful attempt at obtaining a reward; rather, it is a failure to overcome the urge to act prematurely despite knowledge of the negative consequences of such impulsive action.
CONCLUSIONS: Our findings underscore the integral role value plays in regulating impulsivity and suggest that the dopaminergic system influences impulsivity through the mediation of value learning.
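As a rough illustration of the Pavlovian-bias account favored here, the sketch below lets anticipated reward value leak into the propensity to act even though acting prematurely is instrumentally costly. All weights and values are invented for demonstration, not fitted parameters from the paper.

```python
import numpy as np

def p_premature(q_go, q_nogo, stim_value, pav_weight):
    """Sigmoid choice between licking ('go') and withholding ('no-go'),
    with a Pavlovian bias adding stimulus value to the 'go' propensity."""
    w_go = q_go + pav_weight * stim_value
    return 1.0 / (1.0 + np.exp(-(w_go - q_nogo)))

# Premature licking aborts the trial, so its instrumental value is negative,
# yet larger anticipated rewards still pull P(premature lick) upward.
for v in [0.5, 1.0, 2.0, 4.0]:
    p = p_premature(q_go=-1.0, q_nogo=0.0, stim_value=v, pav_weight=0.5)
    print(f"anticipated reward {v}: P(premature lick) = {p:.2f}")
```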
Affiliation(s)
- Zhe Liu
- The Edmond and Lily Safra Center for Brain Sciences
- Yonatan Loewenstein
- The Edmond and Lily Safra Center for Brain Sciences; The Alexander Silberman Institute of Life Sciences, Dept. of Cognitive and Brain Sciences and The Federmann Center for the Study of Rationality, The Hebrew University of Jerusalem, Jerusalem, Israel
- Eran Lottem
- The Edmond and Lily Safra Center for Brain Sciences
5. Hill DF, Hickman RW, Al-Mohammad A, Stasiak A, Schultz W. Dopamine neurons encode trial-by-trial subjective reward value in an auction-like task. Nat Commun 2024; 15:8138. [PMID: 39289338] [PMCID: PMC11408490] [DOI: 10.1038/s41467-024-52311-8]
Abstract
The dopamine reward prediction error signal is known to be subjective but has so far been assessed only in aggregate choices. However, personal choices fluctuate across trials and thus reflect the instantaneous subjective reward value. In the well-established Becker-DeGroot-Marschak (BDM) auction-like mechanism, participants are encouraged to place bids that accurately reveal their instantaneous subjective reward value; inaccurate bidding results in suboptimal reward ("incentive compatibility"). In our experiment, male rhesus monkeys gained experience over several years in placing accurate BDM bids for juice rewards without specific external constraints. Their bids for physically identical rewards varied trial by trial and increased overall for larger rewards. In these highly experienced animals, responses of midbrain dopamine neurons followed the trial-by-trial variations of bids despite constant, explicitly predicted reward amounts. Conversely, dopamine responses were similar for similar bids on different physical reward amounts. Support vector regression demonstrated accurate prediction of the animals' bids from as few as twenty dopamine neurons. Thus, the phasic dopamine reward signal reflects instantaneous subjective reward value.
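The incentive compatibility of the BDM mechanism can be checked numerically: because the bidder pays the randomly drawn price rather than the bid, expected utility peaks at a bid equal to the subjective value. The sketch below assumes a uniform price distribution on [0, 1] purely for illustration.

```python
import numpy as np

def expected_utility(bid, value, n=200_000, seed=1):
    """BDM auction: a price ~ U(0,1) is drawn; if bid >= price, the bidder
    pays the price (not the bid) and receives the good worth `value`."""
    price = np.random.default_rng(seed).uniform(0.0, 1.0, n)
    return np.mean(np.where(bid >= price, value - price, 0.0))

value = 0.6
bids = np.linspace(0.0, 1.0, 21)
best_bid = bids[np.argmax([expected_utility(b, value) for b in bids])]
print(best_bid)   # ~0.6: truthful bidding is optimal
```

Analytically, EU(b) = v*b - b^2/2 under a uniform price, which is maximized at b = v; this is what licenses reading trial-by-trial bids as instantaneous subjective value.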
Affiliation(s)
- Daniel F Hill
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Robert W Hickman
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Alaa Al-Mohammad
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Arkadiusz Stasiak
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
6. Kocharian A, Redish AD, Rothwell PE. Individual differences in decision-making shape how mesolimbic dopamine regulates choice confidence and change-of-mind. bioRxiv [Preprint] 2024:2024.09.16.613237. [PMID: 39345599] [PMCID: PMC11429702] [DOI: 10.1101/2024.09.16.613237]
Abstract
Nucleus accumbens dopamine signaling is an important neural substrate for decision-making. Dominant theories generally discretize and homogenize decision-making, when it is in fact a continuous process, with evaluation and re-evaluation components that extend beyond simple outcome prediction into consideration of past and future value. Extensive work has examined mesolimbic dopamine in the context of reward prediction error, but major gaps persist in our understanding of how dopamine regulates volitional and self-guided decision-making. Moreover, there is little consideration of individual differences in value processing that may shape how dopamine regulates decision-making. Here, using an economic foraging task in mice, we found that dopamine dynamics in the nucleus accumbens core reflected decision confidence during evaluation of decisions, as well as both past and future value during re-evaluation and change-of-mind. Optogenetic manipulations of mesolimbic dopamine release selectively altered evaluation and re-evaluation of decisions in mice whose dopamine dynamics and behavior reflected future value.
Affiliation(s)
- Adrina Kocharian
- Graduate Program in Neuroscience, University of Minnesota Medical School, Minneapolis, MN
- Medical Scientist Training Program, University of Minnesota Medical School, Minneapolis, MN
- A. David Redish
- Department of Neuroscience, University of Minnesota Medical School, Minneapolis, MN
- Patrick E. Rothwell
- Department of Neuroscience, University of Minnesota Medical School, Minneapolis, MN
7. Qü AJ, Tai LH, Hall CD, Tu EM, Eckstein MK, Mishchanchuk K, Lin WC, Chase JB, MacAskill AF, Collins AGE, Gershman SJ, Wilbrecht L. Nucleus accumbens dopamine release reflects Bayesian inference during instrumental learning. bioRxiv [Preprint] 2024:2023.11.10.566306. [PMID: 38014354] [PMCID: PMC10680647] [DOI: 10.1101/2023.11.10.566306]
Abstract
Dopamine release in the nucleus accumbens has been hypothesized to signal reward prediction error, the difference between observed and predicted reward, suggesting a biological implementation for reinforcement learning. Rigorous tests of this hypothesis require assumptions about how the brain maps sensory signals to reward predictions, yet this mapping is still poorly understood. In particular, the mapping is non-trivial when sensory signals provide ambiguous information about the hidden state of the environment. Previous work using classical conditioning tasks has suggested that reward predictions are generated conditional on probabilistic beliefs about the hidden state, such that dopamine implicitly reflects these beliefs. Here we test this hypothesis in the context of an instrumental task (a two-armed bandit), where the hidden state switches repeatedly. We measured choice behavior and recorded dLight signals reflecting dopamine release in the nucleus accumbens core. Model comparison among a wide set of cognitive models based on the behavioral data favored models that used Bayesian updating of probabilistic beliefs. These same models also quantitatively matched the dopamine measurements better than non-Bayesian alternatives. We conclude that probabilistic belief computation contributes to instrumental task performance in mice and is reflected in mesolimbic dopamine signaling.
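A minimal form of the Bayesian belief update such models rely on is sketched below for a two-armed bandit whose "good" side switches with some hazard rate; the reward probabilities and hazard rate are invented for illustration and are not the task's parameters.

```python
import numpy as np

def update_belief(belief, choice, reward, p=(0.8, 0.2), hazard=0.05):
    """Posterior over which arm is currently 'good', then mixed with the
    probability that the hidden state switched before the next trial."""
    p_good, p_bad = p
    lik = np.empty(2)
    for state in (0, 1):                     # state s means 'arm s is good'
        p_r = p_good if choice == state else p_bad
        lik[state] = p_r if reward else 1.0 - p_r
    post = lik * belief
    post /= post.sum()
    return (1.0 - hazard) * post + hazard * post[::-1]

belief = np.array([0.5, 0.5])
for choice, reward in [(0, 1), (0, 1), (0, 0), (1, 1)]:
    belief = update_belief(belief, choice, reward)
    print(np.round(belief, 3))
```

In belief-state accounts of dopamine, the reward prediction on each trial is then a belief-weighted average of the values of the candidate hidden states.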
Affiliation(s)
- Albert J. Qü
- Department of Psychology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- Lung-Hao Tai
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, 94720, USA
- Christopher D. Hall
- Sainsbury Wellcome Centre for Neural Circuits and Behaviour, University College London, London, W1T 4JG, UK
- Emilie M. Tu
- Department of Psychology, University of California, Berkeley, CA, 94720, USA
- Karyna Mishchanchuk
- Department of Neuroscience, Physiology and Pharmacology, University College London, UK
- Wan Chen Lin
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, 94720, USA
- Juliana B. Chase
- Department of Psychology, University of California, Berkeley, CA, 94720, USA
- Andrew F. MacAskill
- Department of Neuroscience, Physiology and Pharmacology, University College London, UK
- Anne G. E. Collins
- Department of Psychology, University of California, Berkeley, CA, 94720, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, 94720, USA
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, 02138, USA
- Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Linda Wilbrecht
- Department of Psychology, University of California, Berkeley, CA, 94720, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, 94720, USA
8. Isaac J, Karkare SC, Balasubramanian H, Schappaugh N, Javier JL, Rashid M, Murugan M. Sex differences in neural representations of social and nonsocial reward in the medial prefrontal cortex. Nat Commun 2024; 15:8018. [PMID: 39271723] [PMCID: PMC11399386] [DOI: 10.1038/s41467-024-52294-6]
Abstract
The reinforcing nature of social interactions is necessary for the maintenance of appropriate social behavior. However, the neural substrates underlying social reward processing, and how they might differ based on the sex and internal state of the animal, remain unknown. It is also unclear whether these neural substrates are shared with those involved in nonsocial reward processing. We developed a fully automated, two-choice (social-sucrose) operant assay in which mice choose between social and nonsocial rewards to directly compare the reward-related behaviors associated with two competing stimuli. We performed cellular-resolution calcium imaging of medial prefrontal cortex (mPFC) neurons in male and female mice across varying states of water restriction and social isolation. We found that mPFC neurons maintain largely non-overlapping, flexible representations of social and nonsocial reward that vary with internal state in a sex-dependent manner. Additionally, optogenetic manipulation of mPFC activity during the reward period of the assay disrupted reward-seeking behavior across male and female mice. Thus, using a two-choice operant assay, we have identified sex-dependent, non-overlapping neural representations of social and nonsocial reward in the mPFC that vary with internal state and that are essential for appropriate reward-seeking behavior.
Affiliation(s)
- Jennifer Isaac
- Neuroscience Graduate Program, Emory University, Atlanta, GA, 30322, USA
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Sonia Corbett Karkare
- Neuroscience Graduate Program, Emory University, Atlanta, GA, 30322, USA
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Hymavathy Balasubramanian
- Neuroscience Graduate Program, Emory University, Atlanta, GA, 30322, USA
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Jarildy Larimar Javier
- Neuroscience Graduate Program, Emory University, Atlanta, GA, 30322, USA
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Maha Rashid
- Neuroscience Graduate Program, Emory University, Atlanta, GA, 30322, USA
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Malavika Murugan
- Neuroscience Graduate Program, Emory University, Atlanta, GA, 30322, USA
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
9. Furutachi S, Franklin AD, Aldea AM, Mrsic-Flogel TD, Hofer SB. Cooperative thalamocortical circuit mechanism for sensory prediction errors. Nature 2024; 633:398-406. [PMID: 39198646] [PMCID: PMC11390482] [DOI: 10.1038/s41586-024-07851-w]
Abstract
The brain functions as a prediction machine, utilizing an internal model of the world to anticipate sensations and the outcomes of our actions. Discrepancies between expected and actual events, referred to as prediction errors, are leveraged to update the internal model and guide our attention towards unexpected events. Despite the importance of prediction-error signals for various neural computations across the brain, surprisingly little is known about the neural circuit mechanisms responsible for their implementation. Here we describe a thalamocortical disinhibitory circuit that is required for generating sensory prediction-error signals in mouse primary visual cortex (V1). We show that violating animals' predictions by an unexpected visual stimulus preferentially boosts responses of the layer 2/3 V1 neurons that are most selective for that stimulus. Prediction errors specifically amplify the unexpected visual input, rather than representing non-specific surprise or difference signals about how the visual input deviates from the animal's predictions. This selective amplification is implemented by a cooperative mechanism requiring thalamic input from the pulvinar and cortical vasoactive-intestinal-peptide-expressing (VIP) inhibitory interneurons. In response to prediction errors, VIP neurons inhibit a specific subpopulation of somatostatin-expressing inhibitory interneurons that gate excitatory pulvinar input to V1, resulting in specific pulvinar-driven response amplification of the most stimulus-selective neurons in V1. Therefore, the brain prioritizes unpredicted sensory information by selectively increasing the salience of unpredicted sensory features through the synergistic interaction of thalamic input and neocortical disinhibitory circuits.
Affiliation(s)
- Shohei Furutachi
- Sainsbury Wellcome Centre, University College London, London, UK
- Andreea M Aldea
- Sainsbury Wellcome Centre, University College London, London, UK
- Sonja B Hofer
- Sainsbury Wellcome Centre, University College London, London, UK
10. Mah A, Golden CE, Constantinople CM. Dopamine transients encode reward prediction errors independent of learning rates. bioRxiv [Preprint] 2024:2024.04.18.590090. [PMID: 38659861] [PMCID: PMC11042285] [DOI: 10.1101/2024.04.18.590090]
Abstract
Biological accounts of reinforcement learning posit that dopamine encodes reward prediction errors (RPEs), which are multiplied by a learning rate to update state or action values. These values are thought to be represented in synaptic weights in the striatum, and updated by dopamine-dependent plasticity, suggesting that dopamine release might reflect the product of the learning rate and RPE. Here, we leveraged the fact that animals learn faster in volatile environments to characterize dopamine encoding of learning rates in the nucleus accumbens core (NAcc). We trained rats on a task with semi-observable states offering different rewards, and rats adjusted how quickly they initiated trials across states using RPEs. Computational modeling and behavioral analyses showed that learning rates were higher following state transitions, and scaled with trial-by-trial changes in beliefs about hidden states, approximating normative Bayesian strategies. Notably, dopamine release in the NAcc encoded RPEs independent of learning rates, suggesting that dopamine-independent mechanisms instantiate dynamic learning rates.
Affiliation(s)
- Andrew Mah
- Center for Neural Science, New York University
11. Basu A, Yang JH, Yu A, Glaeser-Khan S, Rondeau JA, Feng J, Krystal JH, Li Y, Kaye AP. Frontal Norepinephrine Represents a Threat Prediction Error Under Uncertainty. Biol Psychiatry 2024; 96:256-267. [PMID: 38316333] [PMCID: PMC11269024] [DOI: 10.1016/j.biopsych.2024.01.025]
Abstract
BACKGROUND: To adapt to threats in the environment, animals must predict them and engage in defensive behavior. While the representation of a prediction error signal for reward has been linked to dopamine, a neuromodulatory prediction error for aversive learning has not been identified.
METHODS: We measured and manipulated norepinephrine release during threat learning using optogenetics and a novel fluorescent norepinephrine sensor.
RESULTS: We found that the norepinephrine response to conditioned stimuli reflects aversive memory strength. When delays between auditory stimuli and footshock are introduced, norepinephrine acts as a prediction error signal. However, temporal difference prediction errors do not fully explain norepinephrine dynamics. To explain noradrenergic signaling, we used an updated reinforcement learning model with uncertainty about time and found that it explained norepinephrine dynamics across learning and variations in temporal and auditory task structure.
CONCLUSIONS: Norepinephrine thus combines cognitive and affective information into a predictive signal and links time with the anticipation of danger.
Affiliation(s)
- Aakash Basu
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut; Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, Connecticut
- Jen-Hau Yang
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut
- Abigail Yu
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut
- Jocelyne A Rondeau
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut
- Jiesi Feng
- State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing, China
- John H Krystal
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut; Clinical Neuroscience Division, Veterans Administration National Center for PTSD, West Haven, Connecticut
- Yulong Li
- State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing, China; Peking University-IDG/McGovern Institute for Brain Research, Beijing, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China; Chinese Institute for Brain Research, Beijing, China
- Alfred P Kaye
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut; Clinical Neuroscience Division, Veterans Administration National Center for PTSD, West Haven, Connecticut; Wu Tsai Institute, Yale University, New Haven, Connecticut
12. Cone I, Clopath C, Shouval HZ. Learning to express reward prediction error-like dopaminergic activity requires plastic representations of time. Nat Commun 2024; 15:5856. [PMID: 38997276] [PMCID: PMC11245539] [DOI: 10.1038/s41467-024-50205-3]
Abstract
The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) learning, whereby certain units signal reward prediction errors (RPEs). The TD algorithm has traditionally been mapped onto the dopaminergic system, as the firing properties of dopamine neurons can resemble RPEs. However, certain predictions of TD learning are inconsistent with experimental results, and previous implementations of the algorithm have made unscalable assumptions regarding stimulus-specific fixed temporal bases. We propose an alternate framework to describe dopamine signaling in the brain, FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, dopamine release is similar, but not identical, to RPE, leading to predictions that contrast with those of TD. While FLEX itself is a general theoretical framework, we describe a specific, biophysically plausible implementation whose results are consistent with a preponderance of both existing and reanalyzed experimental data.
Affiliation(s)
- Ian Cone
- Department of Bioengineering, Imperial College London, London, UK
- Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX, USA
- Applied Physics Program, Rice University, Houston, TX, USA
- Claudia Clopath
- Department of Bioengineering, Imperial College London, London, UK
- Harel Z Shouval
- Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
13. Robke R, Arbab T, Smith R, Willuhn I. Value-Driven Adaptations of Mesolimbic Dopamine Release Are Governed by Both Model-Based and Model-Free Mechanisms. eNeuro 2024; 11:ENEURO.0223-24.2024. [PMID: 38918053] [PMCID: PMC11223458] [DOI: 10.1523/eneuro.0223-24.2024]
Abstract
The magnitude of dopamine signals elicited by rewarding events and their predictors is updated when reward value changes. It is actively debated how readily these dopamine signals adapt and whether adaptation aligns with model-free or model-based reinforcement-learning principles. To investigate this, we trained male rats in a Pavlovian-conditioning paradigm and measured dopamine release in the nucleus accumbens core in response to food reward (unconditioned stimulus) and reward-predictive conditioned stimuli (CS), both before and after reward devaluation, induced via either sensory-specific or nonspecific satiety. We demonstrate that (1) such devaluation reduces CS-induced dopamine release rapidly, without additional pairing of the CS with the devalued reward and irrespective of whether the devaluation was sensory-specific or nonspecific. In contrast, (2) reward devaluation did not decrease food reward-induced dopamine release. Surprisingly, (3) postdevaluation reconditioning, by additional pairing of the CS with the devalued reward, rapidly reinstated CS-induced dopamine signals to predevaluation levels. Taken together, we identify distinct, divergent adaptations in dopamine-signal magnitude when reward value is decreased: CS dopamine diminishes but reinstates quickly, whereas reward dopamine is resistant to change. With respect to these findings, this implies that (1) CS dopamine may be governed by a model-based mechanism and (2) reward dopamine by a model-free one, where (3) the latter may contribute to swift reinstatement of the former. However, changes in CS dopamine were not selective for the sensory specificity of reward devaluation, which is inconsistent with model-based processes. Thus, mesolimbic dopamine signaling incorporates both model-free and model-based mechanisms and is not exclusively governed by either.
Affiliation(s)
- Rhiannon Robke
- The Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, Amsterdam 1105BA, The Netherlands
- Department of Psychiatry, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam 1105AZ, The Netherlands
- Tara Arbab
- The Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, Amsterdam 1105BA, The Netherlands
- Department of Psychiatry, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam 1105AZ, The Netherlands
- Rachel Smith
- The Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, Amsterdam 1105BA, The Netherlands
- Ingo Willuhn
- The Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, Amsterdam 1105BA, The Netherlands
- Department of Psychiatry, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam 1105AZ, The Netherlands
14. Schütt HH, Kim D, Ma WJ. Reward prediction error neurons implement an efficient code for reward. Nat Neurosci 2024; 27:1333-1339. [PMID: 38898182] [DOI: 10.1038/s41593-024-01671-x]
Abstract
We use efficient coding principles borrowed from sensory neuroscience to derive the optimal neural population to encode a reward distribution. We show that the responses of dopaminergic reward prediction error neurons in mouse and macaque are similar to those of the efficient code in the following ways: the neurons have a broad distribution of midpoints covering the reward distribution; neurons with higher thresholds have higher gains, more convex tuning functions and lower slopes; and their slope is higher when the reward distribution is narrower. Furthermore, we derive learning rules that converge to the efficient code. The learning rule for the position of the neuron on the reward axis closely resembles distributional reinforcement learning. Thus, reward prediction error neuron responses may be optimized to broadcast an efficient reward signal, forming a connection between efficient coding and reinforcement learning, two of the most successful theories in computational neuroscience.
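The learning rule this abstract likens to distributional reinforcement learning can be illustrated with a quantile-regression update that moves each unit's threshold until the population tiles the reward distribution; the distribution, unit count, and learning rate below are arbitrary demonstration choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = rng.lognormal(mean=0.0, sigma=0.5, size=20_000)  # toy reward distribution

n_units = 5
taus = (np.arange(n_units) + 0.5) / n_units    # target quantile levels
theta = np.ones(n_units)                       # unit thresholds/midpoints
lr = 0.01

for r in rewards:
    # Quantile-regression step: move up with weight tau when the reward
    # exceeds the threshold, down with weight (1 - tau) otherwise.
    theta += lr * np.where(r > theta, taus, taus - 1.0)

print(np.round(theta, 2))                        # thresholds spread over the distribution
print(np.round(np.quantile(rewards, taus), 2))   # close to the empirical quantiles
```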
Affiliation(s)
- Heiko H Schütt
- Center for Neural Science and Department of Psychology, New York University, New York, NY, USA
- Department of Behavioural and Cognitive Sciences, Université du Luxembourg, Esch-Belval, Luxembourg
- Dongjae Kim
- Center for Neural Science and Department of Psychology, New York University, New York, NY, USA
- Department of AI-Based Convergence, Dankook University, Yongin, Republic of Korea
- Wei Ji Ma
- Center for Neural Science and Department of Psychology, New York University, New York, NY, USA
15. Chai M, Holroyd CB, Brass M, Braem S. Dynamic changes in task preparation in a multi-task environment: The task transformation paradigm. Cognition 2024; 247:105784. [PMID: 38599142] [DOI: 10.1016/j.cognition.2024.105784]
Abstract
A key element of human flexible behavior concerns the ability to continuously predict and prepare for sudden changes in tasks or actions. Here, we tested whether people can dynamically modulate task preparation processes and decision-making strategies when the identity of a to-be-performed task becomes uncertain. To this end, we developed a new paradigm where participants need to prepare for one of nine tasks on each trial. Crucially, in some blocks, the task being prepared could suddenly shift to a different task after a longer cue-target interval, by changing either the stimulus category or the categorization rule that defined the initial task. We found that participants were able to dynamically modulate task preparation in the face of this task uncertainty. A second experiment shows that these changes in behavior were not simply a function of decreasing task expectancy, but rather of increasing switch expectancy. Finally, in the third and fourth experiments, we demonstrate that these dynamic modulations can be applied in a compositional manner, depending on whether only the stimulus category or only the categorization rule would be expected to change.
Affiliation(s)
- Mengqiao Chai
- Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, 9000 Ghent, Belgium
- Clay B Holroyd
- Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, 9000 Ghent, Belgium
- Marcel Brass
- Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, 9000 Ghent, Belgium; Berlin School of Mind and Brain, Department of Psychology, Humboldt-Universität zu Berlin, Luisenstraße 56, Haus 1, 10117 Berlin, Germany
- Senne Braem
- Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, 9000 Ghent, Belgium
16. Jang HJ, Ward RM, Golden CEM, Constantinople CM. Acetylcholine demixes heterogeneous dopamine signals for learning and moving. bioRxiv [Preprint] 2024:2024.05.03.592444. [PMID: 38746300] [PMCID: PMC11092744] [DOI: 10.1101/2024.05.03.592444]
Abstract
Midbrain dopamine neurons promote reinforcement learning and movement vigor. A major outstanding question is how dopamine-recipient neurons in the striatum parse these heterogeneous signals. Here we characterized dopamine and acetylcholine release in the dorsomedial striatum (DMS) of rats performing a decision-making task. We found that dopamine acted as a reward prediction error (RPE), modulating behavior and DMS spiking on subsequent trials when coincident with pauses in cholinergic release. In contrast, at task events that elicited coincident bursts of acetylcholine and dopamine, dopamine preceded contralateral movements and predicted movement vigor without inducing plastic changes in DMS firing rates. Our findings provide a circuit-level mechanism by which cholinergic modulation allows the same dopamine signals to be used for either movement or learning depending on instantaneous behavioral context.
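The proposal can be caricatured as a simple gate: the same dopamine transient either drives plasticity or is read out as vigor, depending on concurrent cholinergic tone. The function below is a hypothetical sketch of that logic, with invented names and thresholds; it is not the authors' analysis code.

```python
def route_dopamine(da_transient, ach_level, weights, state,
                   alpha=0.1, pause_threshold=0.2):
    """Hypothetical gate: dopamine coinciding with an ACh pause updates
    synaptic weights (learning); dopamine coinciding with an ACh burst is
    read out as movement vigor and leaves the weights unchanged."""
    if ach_level < pause_threshold:            # ACh pause: learning mode
        weights[state] += alpha * da_transient
        return weights, None
    vigor = max(0.0, da_transient)             # ACh burst: movement mode
    return weights, vigor

w = [0.0, 0.0]
w, _ = route_dopamine(1.0, ach_level=0.05, weights=w, state=0)      # learn
_, vigor = route_dopamine(1.0, ach_level=0.9, weights=w, state=0)   # move
print(w, vigor)
```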
17. Pereira-Obilinovic U, Hou H, Svoboda K, Wang XJ. Brain mechanism of foraging: Reward-dependent synaptic plasticity versus neural integration of values. Proc Natl Acad Sci U S A 2024; 121:e2318521121. [PMID: 38551832] [PMCID: PMC10998608] [DOI: 10.1073/pnas.2318521121]
Abstract
During foraging behavior, action values are persistently encoded in neural activity and updated depending on the history of choice outcomes. What is the neural mechanism for action value maintenance and updating? Here, we explore two contrasting network models: synaptic learning of action value versus neural integration. We show that both models can reproduce extant experimental data, but they yield distinct predictions about the underlying biological neural circuits. In particular, the neural integrator model but not the synaptic model requires that reward signals are mediated by neural pools selective for action alternatives and their projections are aligned with linear attractor axes in the valuation system. We demonstrate experimentally observable neural dynamical signatures and feasible perturbations to differentiate the two contrasting scenarios, suggesting that the synaptic model is a more robust candidate mechanism. Overall, this work provides a modeling framework to guide future experimental research on probabilistic foraging.
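The two candidate mechanisms can be contrasted in a few lines: a synaptic model updates a stored weight only at feedback, whereas an integrator model holds value as a leaky-integrated activity trace. Both functions below are deliberately minimal caricatures of the two model classes, with arbitrary parameters.

```python
import numpy as np

def synaptic_update(w, choice, reward, alpha=0.2):
    """Action value stored in a synaptic weight, moved toward each outcome."""
    w = w.copy()
    w[choice] += alpha * (reward - w[choice])
    return w

def integrator_update(x, choice, reward, leak=0.05, gain=0.2):
    """Action value held as persistent activity: reward input is integrated,
    and the trace decays through a slow leak."""
    x = (1.0 - leak) * x
    x[choice] += gain * reward
    return x

w, x = np.zeros(2), np.zeros(2)
for choice, reward in [(0, 1), (0, 1), (1, 0), (0, 1)]:
    w = synaptic_update(w, choice, reward)
    x = integrator_update(x, choice, reward)
print(np.round(w, 2), np.round(x, 2))
```

As the abstract notes, the two schemes can produce similar behavior while predicting different neural dynamics and different responses to perturbation.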
Affiliation(s)
- Ulises Pereira-Obilinovic
- Center for Neural Science, New York University, New York, NY 10003
- Allen Institute for Neural Dynamics, Seattle, WA 98109
- Han Hou
- Allen Institute for Neural Dynamics, Seattle, WA 98109
- Karel Svoboda
- Allen Institute for Neural Dynamics, Seattle, WA 98109
- Xiao-Jing Wang
- Center for Neural Science, New York University, New York, NY 10003
18. Sukumar S, Shadmehr R, Ahmed AA. Effects of reward and effort history on decision making and movement vigor during foraging. J Neurophysiol 2024; 131:638-651. [PMID: 38056423] [PMCID: PMC11305639] [DOI: 10.1152/jn.00092.2023]
Abstract
During foraging, animals explore a site and harvest reward and then abandon that site and travel to the next opportunity. One aspect of this behavior involves decision making, and the other involves movement control. These two aspects of behavior may be linked via an underlying desire to maximize a single normative utility: the sum of all rewards acquired, minus all efforts expended, divided by time. According to this theory, the history of rewards, and not just its immediate availability, should dictate how long one should stay and harvest reward and how vigorously one should travel to the next opportunity. We tested this theory in a series of experiments in which humans used their hand to harvest tokens at a reward patch and then used their arm to reach toward another patch. After a history of high rewards, the subjects not only shortened their harvest duration but also moved more vigorously toward the next reward opportunity. In contrast, after a history of high effort they lengthened their harvest duration but reduced their movement vigor, reaching more slowly to the next reward site. Thus, a history of high reward or low effort biased decisions by promoting early abandonment of the reward site and biased movements by promoting vigor.
NEW & NOTEWORTHY Much of life is spent foraging. Whereas previous work has focused on the decision regarding time spent harvesting from a reward patch, here we test the idea that both decision making and movement control are tuned to optimize the net rate of reward in an environment. Our results show that movement patterns reflect not just immediate expectations but also past experiences in the environment, providing fundamental insight into the factors governing volitional control of arm movements.
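For the patch-leaving component, the normative utility described above reduces to the marginal value theorem: stay while the instantaneous reward rate exceeds the environment's average rate. The sketch below assumes an exponentially depleting patch purely for illustration; the rates and decay constant are invented.

```python
import numpy as np

def optimal_stay(initial_rate, decay, avg_rate):
    """Leave when the in-patch rate r(t) = initial_rate * exp(-decay * t)
    falls to the environment's average reward rate."""
    if initial_rate <= avg_rate:
        return 0.0
    return np.log(initial_rate / avg_rate) / decay

# A history of high reward raises the experienced average rate, which
# shortens the optimal stay (the direction in which subjects shifted).
for avg in [0.2, 0.5, 1.0]:
    print(f"average rate {avg}: stay {optimal_stay(2.0, 0.5, avg):.2f} s")
```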
Affiliation(s)
- Shruthi Sukumar
- Department of Computer Science, University of Colorado Boulder, Boulder, Colorado, United States
- Reza Shadmehr
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States
- Alaa A Ahmed
- Department of Mechanical Engineering, University of Colorado Boulder, Boulder, Colorado, United States
19. Wilbrecht L, Lin WC, Callahan K, Bateson M, Myers K, Ross R. Experimental biology can inform our understanding of food insecurity. J Exp Biol 2024; 227:jeb246215. [PMID: 38449329] [PMCID: PMC10949070] [DOI: 10.1242/jeb.246215]
Abstract
Food insecurity is a major public health issue. Millions of households worldwide have intermittent and unpredictable access to food and this experience is associated with greater risk for a host of negative health outcomes. While food insecurity is a contemporary concern, we can understand its effects better if we acknowledge that there are ancient biological programs that evolved to respond to the experience of food scarcity and uncertainty, and they may be particularly sensitive to food insecurity during development. Support for this conjecture comes from common findings in several recent animal studies that have modeled insecurity by manipulating predictability of food access in various ways. Using different experimental paradigms in different species, these studies have shown that experience of insecure access to food can lead to changes in weight, motivation and cognition. Some of these studies account for changes in weight through changes in metabolism, while others observe increases in feeding and motivation to work for food. It has been proposed that weight gain is an adaptive response to the experience of food insecurity as 'insurance' in an uncertain future, while changes in motivation and cognition may reflect strategic adjustments in foraging behavior. Animal studies also offer the opportunity to make in-depth controlled studies of mechanisms and behavior. So far, there is evidence that the experience of food insecurity can impact metabolic efficiency, reproductive capacity and dopamine neuron synapses. Further work on behavior, the central and peripheral nervous system, the gut and liver, along with variation in age of exposure, will be needed to better understand the full body impacts of food insecurity at different stages of development.
Affiliation(s)
- Linda Wilbrecht
- Department of Psychology, University of California, Berkeley, Berkeley, CA 94720-1650, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA
- Wan Chen Lin
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA
- Kathryn Callahan
- Psychiatric Research Institute of Montefiore and Einstein, Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York, NY 10461, USA
- Melissa Bateson
- Bioscience Institute, University of Newcastle, Newcastle upon Tyne, NE2 4HH, UK
- Kevin Myers
- Department of Psychology and Programs in Animal Behavior and Neuroscience, Bucknell University, Lewisburg, PA 17837, USA
- Rachel Ross
- Psychiatric Research Institute of Montefiore and Einstein, Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York, NY 10461, USA
- Department of Psychiatry, Montefiore Medical Center, Bronx, New York, NY 10467, USA
20. Qian L, Burrell M, Hennig JA, Matias S, Murthy VN, Gershman SJ, Uchida N. The role of prospective contingency in the control of behavior and dopamine signals during associative learning. bioRxiv [Preprint] 2024:2024.02.05.578961. [PMID: 38370735] [PMCID: PMC10871210] [DOI: 10.1101/2024.02.05.578961]
Abstract
Associative learning depends on contingency, the degree to which a stimulus predicts an outcome. Despite its importance, the neural mechanisms linking contingency to behavior remain elusive. Here we examined dopamine activity in the ventral striatum, a signal implicated in associative learning, in a Pavlovian contingency degradation task in mice. We show that both anticipatory licking and dopamine responses to a conditioned stimulus decreased when additional rewards were delivered uncued, but remained unchanged if additional rewards were cued. These results conflict with contingency-based accounts using a traditional definition of contingency or a novel causal learning model (ANCCR), but can be explained by temporal difference (TD) learning models equipped with an appropriate inter-trial-interval (ITI) state representation. Recurrent neural networks trained within a TD framework develop state representations like our best 'handcrafted' model. Our findings suggest that the TD error can be a measure that describes both contingency and dopaminergic activity.
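Under the traditional definition referenced above, contingency is the difference between outcome probabilities with and without the cue, so degradation works by raising the uncued outcome rate; a two-line check makes the arithmetic explicit (the probabilities are invented for illustration).

```python
def delta_p(p_outcome_given_cue, p_outcome_given_no_cue):
    """Traditional contingency: how much the cue raises outcome probability."""
    return p_outcome_given_cue - p_outcome_given_no_cue

# Uncued 'free' rewards raise P(reward | no cue) and so lower contingency,
# even though P(reward | cue) itself is unchanged.
print(delta_p(0.9, 0.0))   # 0.9 before degradation
print(delta_p(0.9, 0.5))   # 0.4 after uncued rewards are added
```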
Affiliation(s)
- Lechen Qian
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- These authors contributed equally
- Mark Burrell
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- These authors contributed equally
- Jay A. Hennig
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Sara Matias
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Venkatesh N. Murthy
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Samuel J. Gershman
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Center for Brain Science, Harvard University, Cambridge, MA, USA
21. Amo R. Prediction error in dopamine neurons during associative learning. Neurosci Res 2024; 199:12-20. [PMID: 37451506] [DOI: 10.1016/j.neures.2023.07.003]
Abstract
Dopamine neurons have long been thought to facilitate learning by broadcasting reward prediction error (RPE), a teaching signal used in machine learning, but more recent work has advanced alternative models of dopamine's computational role. Here, I revisit this critical issue and review new experimental evidence that tightens the link between dopamine activity and RPE. First, I introduce the recent observation of a gradual backward shift of dopamine activity that had eluded researchers for over a decade. I also discuss several other findings, such as dopamine ramping, that were initially interpreted as conflicting with RPE but were later found to be consistent with it. These findings improve our understanding of neural computation in dopamine neurons.
Affiliation(s)
- Ryunosuke Amo
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
22. de Jong JW, Liang Y, Verharen JPH, Fraser KM, Lammel S. State and rate-of-change encoding in parallel mesoaccumbal dopamine pathways. Nat Neurosci 2024; 27:309-318. [PMID: 38212586] [DOI: 10.1038/s41593-023-01547-6]
Abstract
The nervous system uses fast- and slow-adapting sensory detectors in parallel to enable neuronal representations of external states and their temporal dynamics. It is unknown whether this dichotomy also applies to internal representations that have no direct correlate in the physical world. Here we find that two distinct dopamine (DA) neuron subtypes encode either a state or its rate-of-change. In mice performing a reward-seeking task, the animal's behavioral state and its rate-of-change were encoded by sustained activity of DA neurons in the medial ventral tegmental area (VTA) and by transient activity of DA neurons in the lateral VTA, respectively. The neural activity patterns of VTA DA cell bodies matched DA release patterns within anatomically defined mesoaccumbal pathways. Based on these results, we propose a model in which the DA system uses two parallel lines for proportional-differential encoding of a state variable and its temporal dynamics.
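The proportional-differential scheme has a compact numerical analogue: one channel tracks a state variable and the other its time derivative. The sketch below uses an arbitrary sigmoidal state time course for illustration only; it is not a model fitted to the recordings.

```python
import numpy as np

dt = 0.1
t = np.arange(0.0, 10.0, dt)
state = 1.0 / (1.0 + np.exp(-(t - 5.0)))    # toy behavioral state ramping up

proportional = state                         # sustained channel: tracks s(t)
differential = np.gradient(state, dt)        # transient channel: tracks ds/dt

print(np.round(proportional[::25], 2))       # slowly saturating
print(np.round(differential[::25], 2))       # peaks at the state change
```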
Affiliation(s)
- Johannes W de Jong
- Department of Molecular and Cell Biology and Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
- Yilan Liang
- Department of Molecular and Cell Biology and Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
- Jeroen P H Verharen
- Department of Molecular and Cell Biology and Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
- Kurt M Fraser
- Department of Molecular and Cell Biology and Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
- Stephan Lammel
- Department of Molecular and Cell Biology and Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
23. Blanco-Pozo M, Akam T, Walton ME. Dopamine-independent effect of rewards on choices through hidden-state inference. Nat Neurosci 2024; 27:286-297. [PMID: 38216649] [PMCID: PMC10849965] [DOI: 10.1038/s41593-023-01542-x]
Abstract
Dopamine is implicated in adaptive behavior through reward prediction error (RPE) signals that update value estimates. There is also accumulating evidence that animals in structured environments can use inference processes to facilitate behavioral flexibility. However, it is unclear how these two accounts of reward-guided decision-making should be integrated. Using a two-step task for mice, we show that dopamine reports RPEs using value information inferred from task structure knowledge, alongside information about reward rate and movement. Nonetheless, although rewards strongly influenced choices and dopamine activity, neither activating nor inhibiting dopamine neurons at trial outcome affected future choice. These data were recapitulated by a neural network model where cortex learned to track hidden task states by predicting observations, while basal ganglia learned values and actions via RPEs. This shows that the influence of rewards on choices can stem from dopamine-independent information they convey about the world's state, not the dopaminergic RPEs they produce.
Affiliation(s)
- Marta Blanco-Pozo
- Department of Experimental Psychology, Oxford University, Oxford, UK
- Wellcome Centre for Integrative Neuroimaging, Oxford University, Oxford, UK
- Thomas Akam
- Department of Experimental Psychology, Oxford University, Oxford, UK
- Wellcome Centre for Integrative Neuroimaging, Oxford University, Oxford, UK
- Mark E Walton
- Department of Experimental Psychology, Oxford University, Oxford, UK
- Wellcome Centre for Integrative Neuroimaging, Oxford University, Oxford, UK
24
|
Harhen NC, Bornstein AM. Interval Timing as a Computational Pathway From Early Life Adversity to Affective Disorders. Top Cogn Sci 2024; 16:92-112. [PMID: 37824831 PMCID: PMC10842617 DOI: 10.1111/tops.12701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Revised: 09/26/2023] [Accepted: 09/27/2023] [Indexed: 10/14/2023]
Abstract
Adverse early life experiences can have remarkably enduring negative consequences on mental health, with numerous, varied psychiatric conditions sharing this developmental origin. Yet, the mechanisms linking adverse experiences to these conditions remain poorly understood. Here, we draw on a principled model of interval timing to propose that statistically optimal adaptation of temporal representations to an unpredictable early life environment can produce key characteristics of anhedonia, a transdiagnostic symptom associated with affective disorders like depression and anxiety. The core observation is that early temporal unpredictability produces broader, more imprecise temporal expectations. As a result, reward anticipation is diminished, and associative learning is slowed. When agents with such representations are later introduced to more stable environments, they demonstrate a negativity bias, responding more to the omission of reward than its receipt. Increased encoding of negative events has been proposed to contribute to disorders with anhedonia as a symptom. We then examined how unpredictability interacts with another form of adversity, low reward availability. We found that unpredictability's effect was most strongly felt in richer environments, potentially leading to categorically different phenotypic expressions. In sum, our formalization suggests a single mechanism can help to link early life adversity to a range of behaviors associated with anhedonia, and offers novel insights into the interactive impacts of multiple adversities.
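The core observation, that broader temporal expectations dilute reward anticipation, can be illustrated with a toy calculation. This is a sketch under an assumed Gaussian temporal expectation; the sigma values standing in for predictable versus unpredictable early environments are arbitrary.

```python
import numpy as np

def anticipation(sigma, t_reward=5.0):
    """Expected-reward mass placed on the true reward time under a Gaussian
    temporal expectation of width sigma (an illustrative stand-in)."""
    t = np.linspace(0.0, 10.0, 1001)
    density = np.exp(-0.5 * ((t - t_reward) / sigma) ** 2)
    density /= density.sum()                     # normalize to probability mass
    return density[np.argmin(np.abs(t - t_reward))]

# Broader (more imprecise) expectations put less mass on the reward's actual
# time, so reward receipt is less anticipated and learning from it is slower.
print(f"precise timing   (sigma=0.5): anticipation = {anticipation(0.5):.4f}")
print(f"imprecise timing (sigma=2.0): anticipation = {anticipation(2.0):.4f}")
```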
Collapse
Affiliation(s)
- Nora C. Harhen
- Department of Cognitive Sciences, University of California, Irvine
| | - Aaron M. Bornstein
- Department of Cognitive Sciences, University of California, Irvine
- Center for the Neurobiology of Learning and Memory, University of California, Irvine
| |
Collapse
|
25
|
Antony JW, Van Dam J, Massey JR, Barnett AJ, Bennion KA. Long-term, multi-event surprise correlates with enhanced autobiographical memory. Nat Hum Behav 2023; 7:2152-2168. [PMID: 37322234 DOI: 10.1038/s41562-023-01631-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2022] [Accepted: 05/16/2023] [Indexed: 06/17/2023]
Abstract
Neurobiological and psychological models of learning emphasize the importance of prediction errors (surprises) for memory formation. This relationship has been shown for individual momentary surprising events; however, it is less clear whether surprise that unfolds across multiple events and timescales is also linked with better memory of those events. We asked basketball fans about their most positive and negative autobiographical memories of individual plays, games and seasons, allowing surprise measurements spanning seconds, hours and months. We used advanced analytics on National Basketball Association play-by-play data and betting odds spanning 17 seasons, more than 22,000 games and more than 5.6 million plays to compute and align the estimated surprise value of each memory. We found that surprising events were associated with better recall of positive memories on the scale of seconds and months and negative memories across all three timescales. Game and season memories could not be explained by surprise at shorter timescales, suggesting that long-term, multi-event surprise correlates with memory. These results expand notions of surprise in models of learning and reinforce its relevance in real-world domains.
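As a minimal illustration of the quantity at stake (the paper's play-by-play analytics are considerably richer), surprise can be computed as the Shannon information content of an outcome given a pre-game win probability, such as one derived from betting odds:

```python
import numpy as np

def surprise(p_outcome):
    """Shannon surprise, in bits, of an outcome that had probability p_outcome."""
    return -np.log2(p_outcome)

# A heavy favorite (90% win probability) losing is far more surprising,
# and by this account more memorable, than the favorite winning.
print(f"favorite wins:  {surprise(0.90):.2f} bits")
print(f"favorite loses: {surprise(0.10):.2f} bits")
```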
Collapse
Affiliation(s)
- James W Antony
- Department of Psychology and Child Development, California Polytechnic State University, San Luis Obispo, CA, USA.
| | - Jacob Van Dam
- Department of Psychology and Child Development, California Polytechnic State University, San Luis Obispo, CA, USA
| | - Jarett R Massey
- Department of Psychology and Child Development, California Polytechnic State University, San Luis Obispo, CA, USA
| | | | - Kelly A Bennion
- Department of Psychology and Child Development, California Polytechnic State University, San Luis Obispo, CA, USA
| |
Collapse
|
26
|
Mah A, Schiereck SS, Bossio V, Constantinople CM. Distinct value computations support rapid sequential decisions. Nat Commun 2023; 14:7573. [PMID: 37989741 PMCID: PMC10663503 DOI: 10.1038/s41467-023-43250-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2023] [Accepted: 11/03/2023] [Indexed: 11/23/2023] Open
Abstract
The value of the environment determines animals' motivational states and sets expectations for error-based learning [1-3]. How are values computed? Reinforcement learning systems can store or cache values of states or actions that are learned from experience, or they can compute values using a model of the environment to simulate possible futures [3]. These value computations have distinct trade-offs, and a central question is how neural systems decide which computations to use or whether/how to combine them [4-8]. Here we show that rats use distinct value computations for sequential decisions within single trials. We used high-throughput training to collect statistically powerful datasets from 291 rats performing a temporal wagering task with hidden reward states. Rats adjusted how quickly they initiated trials and how long they waited for rewards across states, balancing effort and time costs against expected rewards. Statistical modeling revealed that animals computed the value of the environment differently when initiating trials versus when deciding how long to wait for rewards, even though these decisions were only seconds apart. Moreover, value estimates interacted via a dynamic learning rate. Our results reveal how distinct value computations interact on rapid timescales, and demonstrate the power of using high-throughput training to understand rich, cognitive behaviors.
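The interaction described here, value updating coupled to a dynamic learning rate, can be sketched with a delta rule whose learning rate is itself driven by recent surprise. This is a Pearce-Hall-style modulation used purely for illustration; none of the constants are the paper's fitted values.

```python
def update_value(value, reward, alpha):
    """One delta-rule step: move the value estimate toward the reward."""
    rpe = reward - value               # reward prediction error
    return value + alpha * rpe, rpe

value, alpha = 0.0, 0.5
for reward in [1.0, 1.0, 0.0, 1.0]:
    value, rpe = update_value(value, reward, alpha)
    # Dynamic learning rate: learn faster after surprising outcomes.
    alpha = 0.9 * alpha + 0.1 * abs(rpe)
    print(f"reward={reward:.0f}  rpe={rpe:+.2f}  value={value:.2f}  alpha={alpha:.2f}")
```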
Collapse
Affiliation(s)
- Andrew Mah
- Center for Neural Science, New York University, New York, NY, 10003, USA
| | | | - Veronica Bossio
- Center for Neural Science, New York University, New York, NY, 10003, USA
- Zuckerman Institute, Columbia University, New York, NY, 10027, USA
| | | |
Collapse
|
27
|
Stetsenko A, Koos T. Neuronal implementation of the temporal difference learning algorithm in the midbrain dopaminergic system. Proc Natl Acad Sci U S A 2023; 120:e2309015120. [PMID: 37903252 PMCID: PMC10636325 DOI: 10.1073/pnas.2309015120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2023] [Accepted: 09/29/2023] [Indexed: 11/01/2023] Open
Abstract
The temporal difference learning (TDL) algorithm has been essential to conceptualizing the role of dopamine in reinforcement learning (RL). Despite its theoretical importance, it remains unknown whether a neuronal implementation of this algorithm exists in the brain. Here, we provide an interpretation of the recently described signaling properties of ventral tegmental area (VTA) GABAergic neurons and show that a circuitry of these neurons implements the TDL algorithm. Specifically, we identified the neuronal mechanism of three key components of the TDL model: a sustained state value signal encoded by an afferent input to the VTA, a temporal differentiation circuit formed by two types of VTA GABAergic neurons, the combined output of which computes momentary reward prediction (RP) as the derivative of the state value, and the computation of reward prediction errors (RPEs) in dopamine neurons utilizing the output of the differentiation circuit. Using computational methods, we also show that this mechanism is optimally adapted to the biophysics of RPE signaling in dopamine neurons, mechanistically links the emergence of conditioned reinforcement to RP, and can naturally account for the temporal discounting of reinforcement. Elucidating the implementation of the TDL algorithm may further the investigation of RL in biological and artificial systems.
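The computation being mapped onto VTA circuitry can be written down compactly: reward prediction is a discrete temporal derivative of the state value, gamma*V(t+1) - V(t), and the dopamine-like error adds delivered reward to that derivative. The sketch below uses illustrative cue and reward times, not the paper's biophysical model.

```python
import numpy as np

gamma, T, t_cue, t_rew = 0.98, 60, 10, 40
value = np.zeros(T)
# Learned value rises from an unpredictable cue toward reward: V(t) = gamma^(t_rew - t).
value[t_cue:t_rew + 1] = gamma ** np.arange(t_rew - t_cue, -1, -1)

def td_error(reward_time=None):
    """delta(t) = r(t) + gamma*V(t+1) - V(t); the last two terms form the
    derivative-like quantity the proposed GABAergic circuit would compute."""
    r = np.zeros(T)
    if reward_time is not None:
        r[reward_time] = 1.0
    v_next = np.append(value[1:], 0.0)
    return r + gamma * v_next - value

print(f"burst at the unpredicted cue: t = {np.argmax(td_error(t_rew))}")
print(f"dip when reward is omitted:  t = {np.argmin(td_error(None))}")
```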
Collapse
Affiliation(s)
- Anya Stetsenko
- Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, NJ 07102
| | - Tibor Koos
- Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, NJ 07102
| |
Collapse
|
28
|
Fraser KM, Collins VL, Wolff AR, Ottenheimer DJ, Bornhoft KN, Pat F, Chen BJ, Janak PH, Saunders BT. Contexts facilitate dynamic value encoding in the mesolimbic dopamine system. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.05.565687. [PMID: 37961363 PMCID: PMC10635154 DOI: 10.1101/2023.11.05.565687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]
Abstract
Adaptive behavior in a dynamic environment often requires rapid revaluation of stimuli that deviate from well-learned associations. The divergence between stable value-encoding and appropriate behavioral output remains a critical test of theories of dopamine's function in learning, motivation, and motor control. Yet how dopamine neurons are involved in the revaluation of cues when the world changes to alter our behavior remains unclear. Here we make use of pharmacology, in vivo electrophysiology, fiber photometry, and optogenetics to resolve the contributions of the mesolimbic dopamine system to the dynamic reorganization of reward-seeking. Male and female rats were trained to discriminate when a conditioned stimulus would be followed by sucrose reward by exploiting the prior, non-overlapping presentation of a separate discrete cue - an occasion setter. Only when the occasion setter's presentation preceded the conditioned stimulus did the conditioned stimulus predict sucrose delivery. As a result, in this task we were able to dissociate the average value of the conditioned stimulus from its immediate expected value on a trial-to-trial basis. Both the activity of ventral tegmental area dopamine neurons and dopamine signaling in the nucleus accumbens were essential for rats to successfully update behavioral responding in response to the occasion setter. Moreover, dopamine release in the nucleus accumbens following the conditioned stimulus only occurred when the occasion setter indicated it would predict reward. Downstream of dopamine release, we found that single neurons in the nucleus accumbens dynamically tracked the value of the conditioned stimulus. Together these results reveal a novel mechanism within the mesolimbic dopamine system for the rapid revaluation of motivation.
Collapse
Affiliation(s)
- Kurt M Fraser
- Department of Psychological and Brain Sciences, Johns Hopkins University
| | | | - Amy R Wolff
- Department of Neuroscience, University of Minnesota
| | | | | | - Fiona Pat
- Department of Psychological and Brain Sciences, Johns Hopkins University
| | - Bridget J Chen
- Department of Psychological and Brain Sciences, Johns Hopkins University
| | - Patricia H Janak
- Department of Psychological and Brain Sciences, Johns Hopkins University
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins University
| | - Benjamin T Saunders
- Department of Neuroscience, University of Minnesota
- Medical Discovery Team on Addiction, University of Minnesota
| |
Collapse
|
29
|
Cone I, Clopath C, Shouval HZ. Learning to Express Reward Prediction Error-like Dopaminergic Activity Requires Plastic Representations of Time. RESEARCH SQUARE 2023:rs.3.rs-3289985. [PMID: 37790466 PMCID: PMC10543312 DOI: 10.21203/rs.3.rs-3289985/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/05/2023]
Abstract
The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) reinforcement learning. The TD framework predicts that some neuronal elements should represent the reward prediction error (RPE), which means they signal the difference between the expected future rewards and the actual rewards. The prominence of the TD theory arises from the observation that firing properties of dopaminergic neurons in the ventral tegmental area appear similar to those of RPE model-neurons in TD learning. Previous implementations of TD learning assume a fixed temporal basis for each stimulus that might eventually predict a reward. Here we show that such a fixed temporal basis is implausible and that certain predictions of TD learning are inconsistent with experiments. We propose instead an alternative theoretical framework, coined FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, feature-specific representations of time are learned, allowing neural representations of stimuli to adjust their timing and relation to rewards in an online manner. In FLEX, dopamine acts as an instructive signal that helps build temporal models of the environment. FLEX is a general theoretical framework that has many possible biophysical implementations. In order to show that FLEX is a feasible approach, we present a specific biophysically plausible model which implements the principles of FLEX. We show that this implementation can account for various reinforcement learning paradigms, and that its results and predictions are consistent with a preponderance of both existing and reanalyzed experimental data.
Collapse
Affiliation(s)
- Ian Cone
- Department of Bioengineering, Imperial College London, London, United Kingdom
- Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX
- Applied Physics Program, Rice University, Houston, TX
| | - Claudia Clopath
- Department of Bioengineering, Imperial College London, London, United Kingdom
| | - Harel Z Shouval
- Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX
- Department of Electrical and Computer Engineering, Rice University, Houston, TX
| |
Collapse
|
30
|
Hennig JA, Romero Pinto SA, Yamaguchi T, Linderman SW, Uchida N, Gershman SJ. Emergence of belief-like representations through reinforcement learning. PLoS Comput Biol 2023; 19:e1011067. [PMID: 37695776 PMCID: PMC10513382 DOI: 10.1371/journal.pcbi.1011067] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2023] [Revised: 09/21/2023] [Accepted: 08/27/2023] [Indexed: 09/13/2023] Open
Abstract
To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs"-optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
Collapse
Affiliation(s)
- Jay A. Hennig
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
| | - Sandra A. Romero Pinto
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, USA
| | - Takahiro Yamaguchi
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Future Research Department, Toyota Research Institute of North America, Toyota Motor North America, Ann Arbor, Michigan, United States of America
| | - Scott W. Linderman
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- Department of Statistics, Stanford University, Stanford, California, United States of America
| | - Naoshige Uchida
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Samuel J. Gershman
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
| |
Collapse
|
31
|
Li WR, Nakano T, Mizutani K, Matsubara T, Kawatani M, Mukai Y, Danjo T, Ito H, Aizawa H, Yamanaka A, Petersen CCH, Yoshimoto J, Yamashita T. Neural mechanisms underlying uninstructed orofacial movements during reward-based learning behaviors. Curr Biol 2023; 33:3436-3451.e7. [PMID: 37536343 DOI: 10.1016/j.cub.2023.07.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2023] [Revised: 07/06/2023] [Accepted: 07/10/2023] [Indexed: 08/05/2023]
Abstract
During reward-based learning tasks, animals make orofacial movements that globally influence brain activity at the timings of reward expectation and acquisition. These orofacial movements are not explicitly instructed and typically appear along with goal-directed behaviors. Here, we show that reinforcing optogenetic stimulation of dopamine neurons in the ventral tegmental area (oDAS) in mice is sufficient to induce orofacial movements in the whiskers and nose without accompanying goal-directed behaviors. Pavlovian conditioning with a sensory cue and oDAS elicited cue-locked and oDAS-aligned orofacial movements, which were distinguishable by a machine-learning model. Inhibition or knockout of dopamine D1 receptors in the nucleus accumbens inhibited oDAS-induced motion but spared cue-locked motion, suggesting differential regulation of these two types of orofacial motions. In contrast, inactivation of the whisker primary motor cortex (wM1) abolished both types of orofacial movements. We found specific neuronal populations in wM1 representing either oDAS-aligned or cue-locked whisker movements. Notably, optogenetic stimulation of wM1 neurons successfully replicated these two types of movements. Our results thus suggest that accumbal D1-receptor-dependent and -independent neuronal signals converge in the wM1 for facilitating distinct uninstructed orofacial movements during a reward-based learning task.
Collapse
Affiliation(s)
- Wan-Ru Li
- Department of Physiology, Fujita Health University School of Medicine, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake 470-1192, Japan; Department of Neuroscience II, Research Institute of Environmental Medicine, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan; Department of Functional Anatomy & Neuroscience, Graduate School of Medicine, Nagoya University, 65 Tsurumai-cho, Showa-ku, Nagoya 466-8550, Japan
| | - Takashi Nakano
- Department of Computational Biology, Fujita Health University School of Medicine, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake 470-1192, Japan; Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma 630-0192, Japan; International Center for Brain Science (ICBS), Fujita Health University, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake 470-1192, Japan
| | - Kohta Mizutani
- Department of Neuroscience II, Research Institute of Environmental Medicine, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan; Laboratory for Advanced Brain Functions, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita 565-0871, Japan
| | - Takanori Matsubara
- Department of Physiology, Fujita Health University School of Medicine, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake 470-1192, Japan; Department of Neuroscience II, Research Institute of Environmental Medicine, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
| | - Masahiro Kawatani
- Department of Physiology, Fujita Health University School of Medicine, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake 470-1192, Japan; Department of Neuroscience II, Research Institute of Environmental Medicine, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan; Department of Functional Anatomy & Neuroscience, Graduate School of Medicine, Nagoya University, 65 Tsurumai-cho, Showa-ku, Nagoya 466-8550, Japan
| | - Yasutaka Mukai
- Department of Neuroscience II, Research Institute of Environmental Medicine, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
| | - Teruko Danjo
- Department of Physiology, Fujita Health University School of Medicine, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake 470-1192, Japan
| | - Hikaru Ito
- Department of Neurobiology, Graduate School of Biomedical and Health Sciences, Hiroshima University, 1-2-3 Kasumi, Minami-ku, Hiroshima 734-8553, Japan; Research Facility Center for Science and Technology, Kagawa University, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa 761-0793, Japan
| | - Hidenori Aizawa
- Department of Neurobiology, Graduate School of Biomedical and Health Sciences, Hiroshima University, 1-2-3 Kasumi, Minami-ku, Hiroshima 734-8553, Japan
| | - Akihiro Yamanaka
- Department of Neuroscience II, Research Institute of Environmental Medicine, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
| | - Carl C H Petersen
- Laboratory of Sensory Processing, Brain Mind Institute, Faculty of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
| | - Junichiro Yoshimoto
- Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma 630-0192, Japan; International Center for Brain Science (ICBS), Fujita Health University, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake 470-1192, Japan; Department of Biomedical Data Science, Fujita Health University School of Medicine, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake 470-1192, Japan.
| | - Takayuki Yamashita
- Department of Physiology, Fujita Health University School of Medicine, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake 470-1192, Japan; Department of Neuroscience II, Research Institute of Environmental Medicine, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan; International Center for Brain Science (ICBS), Fujita Health University, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake 470-1192, Japan.
| |
Collapse
|
32
|
Kim MJ, Gibson DJ, Hu D, Mahar A, Schofield CJ, Sompolpong P, Yoshida T, Tran KT, Graybiel AM. Dopamine Release Plateau and Outcome Signals in Dorsal Striatum Contrast with Classic Reinforcement Learning Formulations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.15.553421. [PMID: 37645888 PMCID: PMC10462077 DOI: 10.1101/2023.08.15.553421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/31/2023]
Abstract
We recorded dopamine release signals in medial and lateral sectors of the striatum as mice learned consecutive visual cue-outcome conditioning tasks including cue association, cue discrimination, reversal, and probabilistic discrimination task versions. Dopamine release responses in medial and lateral sites exhibited learning-related changes within and across phases of acquisition, and these changes differed between the medial and lateral sites. In neither sector could they be accounted for by classic reinforcement learning as applied to dopamine-containing neuron activity. Cue responses ranged from initial sharp peaks to modulated plateau responses. In the medial sector, outcome (reward) responses during cue conditioning were minimal or, initially, negative. By contrast, in lateral sites, strong, transient dopamine release responses occurred at both cue and outcome. Prolonged, plateau release responses to cues emerged in both regions when discriminative behavioral responses became required. In most sites, we found no evidence for a transition from outcome to cue signaling, a hallmark of temporal difference reinforcement learning as applied to midbrain dopamine activity. These findings delineate reshaping of dopamine release activity during learning and suggest that current views of reward prediction error encoding need review to accommodate distinct learning-related spatial and temporal patterns of striatal dopamine release in the dorsal striatum.
Collapse
|
33
|
Takahashi YK, Zhang Z, Montesinos-Cartegena M, Kahnt T, Langdon AJ, Schoenbaum G. Expectancy-related changes in firing of dopamine neurons depend on hippocampus. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.19.549728. [PMID: 37781610 PMCID: PMC10541105 DOI: 10.1101/2023.07.19.549728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/03/2023]
Abstract
The orbitofrontal cortex (OFC) and hippocampus (HC) are both implicated in forming the cognitive or task maps that support flexible behavior. Previously, we used the dopamine neurons as a sensor or tool to measure the functional effects of OFC lesions (Takahashi et al., 2011). We recorded midbrain dopamine neurons as rats performed an odor-based choice task, in which errors in the prediction of reward were induced by manipulating the number or timing of the expected rewards across blocks of trials. We found that OFC lesions ipsilateral to the recording electrodes caused prediction errors to be degraded, consistent with a loss in the resolution of the task states, particularly under conditions where hidden information was critical to sharpening the predictions. Here we have repeated this experiment, along with computational modeling of the results, in rats with ipsilateral HC lesions. The results show that the HC also shapes the map of our task; however, unlike the OFC, which provides information local to the trial, the HC appears to be necessary for estimating upper-level hidden states based on information that is discontinuous or separated by longer timescales. The results contrast the respective roles of the OFC and HC in cognitive mapping and add to evidence that the dopamine neurons access a rich information set from distributed regions regarding the predictive structure of the environment, potentially enabling this powerful teaching signal to support complex learning and behavior.
Collapse
Affiliation(s)
- Yuji K Takahashi
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD
| | - Zhewei Zhang
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD
| | | | - Thorsten Kahnt
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD
| | - Angela J Langdon
- Intramural Research Program, National Institute on Mental Health, Bethesda, MD
| | - Geoffrey Schoenbaum
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD
| |
Collapse
|
34
|
Takahashi YK, Stalnaker TA, Mueller LE, Harootonian SK, Langdon AJ, Schoenbaum G. Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model. Nat Neurosci 2023; 26:830-839. [PMID: 37081296 PMCID: PMC10646487 DOI: 10.1038/s41593-023-01310-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2021] [Accepted: 03/16/2023] [Indexed: 04/22/2023]
Abstract
Dopamine neuron activity is tied to the prediction error in temporal difference reinforcement learning models. These models make significant simplifying assumptions, particularly with regard to the structure of the predictions fed into the dopamine neurons, which consist of a single chain of timepoint states. Although this predictive structure can explain error signals observed in many studies, it cannot cope with settings where subjects might infer multiple independent events and outcomes. In the present study, we recorded dopamine neurons in the ventral tegmental area in such a setting to test the validity of the single-stream assumption. Rats were trained in an odor-based choice task, in which the timing and identity of one of several rewards delivered in each trial changed across trial blocks. This design revealed an error signaling pattern that requires the dopamine neurons to access and update multiple independent predictive streams reflecting the subject's belief about timing and potentially unique identities of expected rewards.
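The bookkeeping implied by "multiple independent predictive streams" can be sketched as separate TD errors, one per predicted reward, summed into a single recorded signal. This illustrates only the idea of parallel streams, not the paper's model; times and the discount factor are assumptions, and the toy deliberately does not cancel a stream's prediction after early delivery.

```python
import numpy as np

def stream_rpe(expected_t, actual_t=None, T=60, gamma=0.97):
    """TD error of one predictive stream expecting reward at expected_t."""
    V = np.zeros(T + 1)
    V[:expected_t + 1] = gamma ** (expected_t - np.arange(expected_t + 1))
    r = np.zeros(T)
    if actual_t is not None:
        r[actual_t] = 1.0
    return r + gamma * V[1:] - V[:-1]

# Reward 1 arrives earlier than its stream predicted; reward 2 is on time.
signal = stream_rpe(expected_t=40, actual_t=30) + stream_rpe(expected_t=50, actual_t=50)
print(f"peak summed error at t = {np.argmax(signal)} (early reward at t = 30)")
```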
Collapse
Affiliation(s)
- Yuji K Takahashi
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA.
| | - Thomas A Stalnaker
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
| | - Lauren E Mueller
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
| | | | - Angela J Langdon
- Intramural Research Program, National Institute of Mental Health, Bethesda, MD, USA.
| | - Geoffrey Schoenbaum
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA.
| |
Collapse
|
35
|
Engineered highs: Reward variability and frequency as potential prerequisites of behavioural addiction. Addict Behav 2023; 140:107626. [PMID: 36701907 DOI: 10.1016/j.addbeh.2023.107626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2022] [Revised: 12/20/2022] [Accepted: 01/18/2023] [Indexed: 01/22/2023]
Abstract
Influential learning-based accounts of substance addictions posit the attribution of incentive salience to drug-associated cues, and its escalation by the direct dopaminergic effects of drugs. In translating this account to disordered gambling, we have noted how the intermittent nature of monetary rewards in gambling (i.e. the variable ratio) may allow for analogous learning processes, via effects on dopaminergic signalling. The aim of the present article is to consider how multiple sources of reward variability operate within modern gambling products, and how similar sources of variability, as well as some novel sources of variability, also apply to other digital products implicated in behavioural addictions, including gaming, shopping, social media and online pornography. Online access to these activities facilitates not only unparalleled accessibility but also introduces novel forms of reward variability, as seen in the effects of infinite scrolls and personalized recommendations. We use the term uncertainty to refer to the subjective experience of reward variability. We further highlight two psychological factors that appear to moderate the effects of uncertainty: 1) the time course of uncertainty, especially with regard to its resolution, and 2) the frequency of exposure, allowing temporal compression. Collectively, the evidence illustrates how qualitative and quantitative variability of reward can confer addictive potential to non-drug reinforcers by exploiting the psychological and neural processes that rely on predictability to guide reward seeking behaviour.
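The reward variability at issue can be quantified with a toy variable-ratio schedule (a sketch; the mean ratio and sample size are arbitrary). Relative to a fixed-ratio schedule with the same mean payout, the variable-ratio schedule delivers an identical overall reward rate with maximal outcome-to-outcome unpredictability.

```python
import numpy as np

rng = np.random.default_rng(1)

def responses_to_reward(mean_ratio, n=10_000):
    """Variable-ratio schedule: each response pays off with p = 1/mean_ratio,
    so the number of responses per reward is geometrically distributed."""
    return rng.geometric(1.0 / mean_ratio, size=n)

vr5 = responses_to_reward(5)
print(f"VR5: mean = {vr5.mean():.2f} responses/reward, sd = {vr5.std():.2f}")
# A fixed-ratio FR5 schedule has the same mean but sd = 0: same reward rate,
# none of the unpredictability that this account links to addictive potential.
```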
Collapse
|
36
|
Sandhu TR, Xiao B, Lawson RP. Transdiagnostic computations of uncertainty: towards a new lens on intolerance of uncertainty. Neurosci Biobehav Rev 2023; 148:105123. [PMID: 36914079 DOI: 10.1016/j.neubiorev.2023.105123] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2022] [Revised: 02/21/2023] [Accepted: 03/08/2023] [Indexed: 03/13/2023]
Abstract
People radically differ in how they cope with uncertainty. Clinical researchers describe a dispositional characteristic known as "intolerance of uncertainty", a tendency to find uncertainty aversive, reported to be elevated across psychiatric and neurodevelopmental conditions. Concurrently, recent research in computational psychiatry has leveraged theoretical work to characterise individual differences in uncertainty processing. Under this framework, differences in how people estimate different forms of uncertainty can contribute to mental health difficulties. In this review, we briefly outline the concept of intolerance of uncertainty within its clinical context, and we argue that the mechanisms underlying this construct may be further elucidated through modelling how individuals make inferences about uncertainty. We will review the evidence linking psychopathology to different computationally specified forms of uncertainty and consider how these findings might suggest distinct mechanistic routes towards intolerance of uncertainty. We also discuss the implications of this computational approach for behavioural and pharmacological interventions, as well as the importance of different cognitive domains and subjective experiences in studying uncertainty processing.
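One way to make "computationally specified forms of uncertainty" concrete is a toy estimator that tracks a running surprise level as a proxy for volatility and lets it scale the learning rate. This is an illustrative scheme in the spirit of the models reviewed, not any specific published algorithm; all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Outcomes from a stable block followed by a covert reversal: outcome noise is
# irreducible ("expected" uncertainty), while the reversal creates volatility
# ("unexpected" uncertainty) that an adaptive learner should respond to.
p_true = 0.8
outcomes = np.concatenate([rng.random(100) < p_true,
                           rng.random(100) < 1.0 - p_true])

p_hat, vol = 0.5, 0.0
for t, o in enumerate(outcomes):
    err = float(o) - p_hat
    vol += 0.05 * (abs(err) - vol)   # slow running estimate of surprise level
    lr = 0.1 + 0.4 * vol             # volatility transiently boosts learning
    p_hat += lr * err
    if t in (99, 120):
        print(f"t={t}: p_hat = {p_hat:.2f}, volatility proxy = {vol:.2f}")
```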
Collapse
Affiliation(s)
- Timothy R Sandhu
- Department of Psychology, Downing Place, University of Cambridge, CB2 3EB, UK; MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, CB2 7EF, UK.
| | - Bowen Xiao
- Department of Psychology, Downing Place, University of Cambridge, CB2 3EB, UK
| | - Rebecca P Lawson
- Department of Psychology, Downing Place, University of Cambridge, CB2 3EB, UK; MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, CB2 7EF, UK
| |
Collapse
|
37
|
Hennig JA, Pinto SAR, Yamaguchi T, Linderman SW, Uchida N, Gershman SJ. Emergence of belief-like representations through reinforcement learning. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.04.535512. [PMID: 37066383 PMCID: PMC10104054 DOI: 10.1101/2023.04.04.535512] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/18/2023]
Abstract
To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs"-optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
Author Summary: Natural environments are full of uncertainty. For example, just because my fridge had food in it yesterday does not mean it will have food today. Despite such uncertainty, animals can estimate which states and actions are the most valuable. Previous work suggests that animals estimate value using a brain area called the basal ganglia, using a process resembling a reinforcement learning algorithm called TD learning. However, traditional reinforcement learning algorithms cannot accurately estimate value in environments with state uncertainty (e.g., when my fridge's contents are unknown). One way around this problem is if agents form "beliefs," a probabilistic estimate of how likely each state is, given any observations so far. However, estimating beliefs is a demanding process that may not be possible for animals in more complex environments. Here we show that an artificial recurrent neural network (RNN) trained with TD learning can estimate value from observations, without explicitly estimating beliefs. The trained RNN's error signals resembled the neural activity of dopamine neurons measured during the same task. Importantly, the RNN's activity resembled beliefs, but only when the RNN had enough capacity. This work illustrates how animals could estimate value in uncertain environments without needing to first form beliefs, which may be useful in environments where computing the true beliefs is too costly.
Collapse
Affiliation(s)
- Jay A. Hennig
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Center for Brain Science, Harvard University, Cambridge, MA, USA
| | - Sandra A. Romero Pinto
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
| | - Takahiro Yamaguchi
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Future Vehicle Research Department, Toyota Research Institute North America, Toyota Motor North America Inc., Ann Arbor, MI, USA
| | - Scott W. Linderman
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Department of Statistics, Stanford University, Stanford, CA, USA
| | - Naoshige Uchida
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
| | - Samuel J. Gershman
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Center for Brain Science, Harvard University, Cambridge, MA, USA
| |
Collapse
|
38
|
Banerjee A, Wang BA, Teutsch J, Helmchen F, Pleger B. Analogous cognitive strategies for tactile learning in the rodent and human brain. Prog Neurobiol 2023; 222:102401. [PMID: 36608783 DOI: 10.1016/j.pneurobio.2023.102401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2022] [Revised: 12/21/2022] [Accepted: 01/02/2023] [Indexed: 01/05/2023]
Abstract
Evolution has molded individual species' sensory capacities and abilities. In rodents, which mostly inhabit dark tunnels and burrows, the whisker-based somatosensory system has developed as the dominant sensory modality, essential for environmental exploration and spatial navigation. In contrast, humans rely more on visual and auditory inputs when collecting information from their surrounding sensory space in everyday life. As a result of such species-specific differences in sensory dominance, cognitive relevance and capacities, the evidence for analogous sensory-cognitive mechanisms across species remains sparse. However, recent research in rodents and humans yielded surprisingly comparable processing rules for detecting tactile stimuli, integrating touch information into percepts, and goal-directed rule learning. Here, we review how the brain, across species, harnesses such processing rules to establish decision-making during tactile learning, following canonical circuits from the thalamus and the primary somatosensory cortex up to the frontal cortex. We discuss concordances between empirical and computational evidence from micro- and mesoscopic circuit studies in rodents and findings from macroscopic imaging in humans. Furthermore, we discuss the relevance and challenges for future cross-species research in addressing mutual context-dependent evaluation processes underpinning perceptual learning.
Collapse
Affiliation(s)
- Abhishek Banerjee
- Adaptive Decisions Lab, Biosciences Institute, Newcastle University, United Kingdom.
| | - Bin A Wang
- Department of Neurology, BG University Hospital Bergmannsheil, Ruhr University Bochum, Germany; Collaborative Research Centre 874 "Integration and Representation of Sensory Processes", Ruhr University Bochum, Germany.
| | - Jasper Teutsch
- Adaptive Decisions Lab, Biosciences Institute, Newcastle University, United Kingdom
| | - Fritjof Helmchen
- Laboratory of Neural Circuit Dynamics, Brain Research Institute, University of Zürich, Switzerland
| | - Burkhard Pleger
- Department of Neurology, BG University Hospital Bergmannsheil, Ruhr University Bochum, Germany; Collaborative Research Centre 874 "Integration and Representation of Sensory Processes", Ruhr University Bochum, Germany
| |
Collapse
|
39
|
Elias LJ, Succi IK, Schaffler MD, Foster W, Gradwell MA, Bohic M, Fushiki A, Upadhyay A, Ejoh LL, Schwark R, Frazer R, Bistis B, Burke JE, Saltz V, Boyce JE, Jhumka A, Costa RM, Abraira VE, Abdus-Saboor I. Touch neurons underlying dopaminergic pleasurable touch and sexual receptivity. Cell 2023; 186:577-590.e16. [PMID: 36693373 PMCID: PMC9898224 DOI: 10.1016/j.cell.2022.12.034] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2021] [Revised: 10/21/2022] [Accepted: 12/20/2022] [Indexed: 01/24/2023]
Abstract
Pleasurable touch is paramount during social behavior, including sexual encounters. However, the identity and precise role of sensory neurons that transduce sexual touch remain unknown. A population of sensory neurons labeled by developmental expression of the G protein-coupled receptor Mrgprb4 detects mechanical stimulation in mice. Here, we study the social relevance of Mrgprb4-lineage neurons and reveal that these neurons are required for sexual receptivity and sufficient to induce dopamine release in the brain. Even in social isolation, optogenetic stimulation of Mrgprb4-lineage neurons through the back skin is sufficient to induce a conditioned place preference and a striking dorsiflexion resembling the lordotic copulatory posture. In the absence of Mrgprb4-lineage neurons, female mice no longer find male mounts rewarding: sexual receptivity is supplanted by aggression and a coincident decline in dopamine release in the nucleus accumbens. Together, these findings establish that Mrgprb4-lineage neurons initiate a skin-to-brain circuit encoding the rewarding quality of social touch.
Collapse
Affiliation(s)
- Leah J Elias
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
| | - Isabella K Succi
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Melanie D Schaffler
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
| | - William Foster
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Mark A Gradwell
- Cell Biology and Neuroscience Department, Rutgers University, The State University of New Jersey, New Brunswick, NJ, USA; W.M. Keck Center for Collaborative Neuroscience, Rutgers University, The State University of New Jersey, New Brunswick, NJ, USA
| | - Manon Bohic
- Cell Biology and Neuroscience Department, Rutgers University, The State University of New Jersey, New Brunswick, NJ, USA; W.M. Keck Center for Collaborative Neuroscience, Rutgers University, The State University of New Jersey, New Brunswick, NJ, USA
| | - Akira Fushiki
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Aman Upadhyay
- Cell Biology and Neuroscience Department, Rutgers University, The State University of New Jersey, New Brunswick, NJ, USA; W.M. Keck Center for Collaborative Neuroscience, Rutgers University, The State University of New Jersey, New Brunswick, NJ, USA
| | - Lindsay L Ejoh
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
| | - Ryan Schwark
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Rachel Frazer
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Brittany Bistis
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Jessica E Burke
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Victoria Saltz
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Jared E Boyce
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Anissa Jhumka
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Rui M Costa
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Victoria E Abraira
- Cell Biology and Neuroscience Department, Rutgers University, The State University of New Jersey, New Brunswick, NJ, USA; W.M. Keck Center for Collaborative Neuroscience, Rutgers University, The State University of New Jersey, New Brunswick, NJ, USA
| | - Ishmail Abdus-Saboor
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Biological Sciences, Columbia University, New York, NY, USA.
| |
Collapse
|
40
|
Sajad A, Errington SP, Schall JD. Functional architecture of executive control and associated event-related potentials in macaques. Nat Commun 2022; 13:6270. [PMID: 36271051 PMCID: PMC9586948 DOI: 10.1038/s41467-022-33942-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2021] [Accepted: 10/07/2022] [Indexed: 12/25/2022] Open
Abstract
The medial frontal cortex (MFC) enables executive control by monitoring relevant information and using it to adapt behavior. In macaques performing a saccade countermanding (stop-signal) task, we simultaneously recorded electrical potentials over MFC and neural spiking across all layers of the supplementary eye field (SEF). We report the laminar organization of neurons enabling executive control by monitoring the conflict between incompatible responses, the timing of events, and sustaining goal maintenance. These neurons were a mix of narrow-spiking and broad-spiking types found in all layers, but those predicting the duration of control and sustaining the task goal until the release of operant control were more commonly narrow-spiking neurons confined to layers 2 and 3 (L2/3). We complement these results with evidence for a monkey homolog of the N2/P3 event-related potential (ERP) complex associated with response inhibition. N2 polarization varied with error likelihood, and P3 polarization varied with the duration of expected control. The amplitude of the N2 and P3 were predicted by the spike rate of different classes of neurons located in L2/3 but not L5/6. These findings reveal features of the cortical microcircuitry supporting executive control and producing associated ERPs.
Collapse
Affiliation(s)
- Amirsaman Sajad
- Department of Psychology, Vanderbilt Vision Research Center, Center for Integrative & Cognitive Neuroscience, Vanderbilt University, Nashville, TN, USA
| | - Steven P Errington
- Department of Psychology, Vanderbilt Vision Research Center, Center for Integrative & Cognitive Neuroscience, Vanderbilt University, Nashville, TN, USA
| | - Jeffrey D Schall
- Department of Psychology, Vanderbilt Vision Research Center, Center for Integrative & Cognitive Neuroscience, Vanderbilt University, Nashville, TN, USA.
- Department of Biology, Centre for Vision Research, Vision Science to Application, York University, Toronto, ON, Canada.
| |
Collapse
|
41
|
Midbrain dopamine neurons signal phasic and ramping reward prediction error during goal-directed navigation. Cell Rep 2022; 41:111470. [PMID: 36223748 PMCID: PMC9631116 DOI: 10.1016/j.celrep.2022.111470] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2021] [Revised: 08/17/2022] [Accepted: 09/19/2022] [Indexed: 01/06/2023] Open
Abstract
Goal-directed navigation requires learning to accurately estimate location and select optimal actions in each location. Midbrain dopamine neurons are involved in reward value learning and have been linked to reward location learning. They are therefore ideally placed to provide teaching signals for goal-directed navigation. By imaging dopamine neural activity as mice learned to actively navigate a closed-loop virtual reality corridor to obtain reward, we observe phasic and pre-reward ramping dopamine activity, which are modulated by learning stage and task engagement. A Q-learning model incorporating position inference recapitulates our results, displaying prediction errors resembling phasic and ramping dopamine neural activity. The model predicts that ramping is followed by improved task performance, which we confirm in our experimental data, indicating that the dopamine ramp may have a teaching effect. Our results suggest that midbrain dopamine neurons encode phasic and ramping reward prediction error signals to improve goal-directed navigation.
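The learning-stage modulation of the phasic signal has a standard TD illustration (a sketch on a toy linear track, not the paper's Q-learning-with-position-inference model; all parameters are assumptions): early in learning the prediction error is concentrated at reward, and it shrinks as value propagates backward along the track.

```python
import numpy as np

n_states, alpha, gamma = 20, 0.2, 0.95
V = np.zeros(n_states + 1)               # value per track position, plus terminal

for episode in range(200):
    deltas = []
    for s in range(n_states):
        r = 1.0 if s == n_states - 1 else 0.0   # reward at the end of the track
        delta = r + gamma * V[s + 1] - V[s]     # TD error at this position
        V[s] += alpha * delta
        deltas.append(delta)
    if episode in (0, 199):
        print(f"episode {episode:3d}: RPE at reward = {deltas[-1]:+.3f}, "
              f"at track start = {deltas[0]:+.3f}")
```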
Collapse
|
42
|
Jakob AMV, Mikhael JG, Hamilos AE, Assad JA, Gershman SJ. Dopamine mediates the bidirectional update of interval timing. Behav Neurosci 2022; 136:445-452. [PMID: 36222637 PMCID: PMC9725808 DOI: 10.1037/bne0000529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/03/2023]
Abstract
The role of dopamine (DA) as a reward prediction error (RPE) signal in reinforcement learning (RL) tasks has been well-established over the past decades. Recent work has shown that the RPE interpretation can also account for the effects of DA on interval timing by controlling the speed of subjective time. According to this theory, the timing of the dopamine signal relative to reward delivery dictates whether subjective time speeds up or slows down: Early DA signals speed up subjective time and late signals slow it down. To test this bidirectional prediction, we reanalyzed measurements of dopaminergic neurons in the substantia nigra pars compacta of mice performing a self-timed movement task. Using the slope of ramping dopamine activity as a readout of subjective time speed, we found that trial-by-trial changes in the slope could be predicted from the timing of dopamine activity on the previous trial. This result provides a key piece of evidence supporting a unified computational theory of RL and interval timing.
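The bidirectional rule under test can be stated as a one-line update (a deliberately coarse sketch of the prediction, not the fitted model from the paper; the gain eta is an assumption): dopamine signals arriving early relative to reward speed up the subjective clock, and late signals slow it down.

```python
def update_clock_speed(speed, da_time, reward_time, eta=0.1):
    """Early DA (before reward) speeds subjective time; late DA slows it."""
    if da_time < reward_time:
        return speed * (1.0 + eta)
    return speed * (1.0 - eta)

speed = 1.0
for da_t, rew_t in [(0.8, 1.0), (0.9, 1.0), (1.2, 1.0)]:
    speed = update_clock_speed(speed, da_t, rew_t)
    print(f"DA at {da_t:.1f} s vs reward at {rew_t:.1f} s -> clock speed {speed:.3f}")
```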
Collapse
Affiliation(s)
- Anthony M V Jakob
- Section of Life Sciences Engineering, École Polytechnique Fédérale de Lausanne
| | | | | | - John A Assad
- Department of Neurobiology, Harvard Medical School
| | - Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University
| |
Collapse
|
43
|
Xue C, Kramer LE, Cohen MR. Dynamic task-belief is an integral part of decision-making. Neuron 2022; 110:2503-2511.e3. [PMID: 35700735 PMCID: PMC9357195 DOI: 10.1016/j.neuron.2022.05.010] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2021] [Revised: 02/10/2022] [Accepted: 05/11/2022] [Indexed: 11/20/2022]
Abstract
Natural decisions involve two seemingly separable processes: inferring the relevant task (task-belief) and performing the believed-relevant task. The assumed separability has led to the traditional practice of studying task-switching and perceptual decision-making individually. Here, we used a novel paradigm to manipulate and measure macaque monkeys' task-belief and demonstrated inextricable neuronal links between flexible task-belief and perceptual decision-making. We showed that in animals, but not in artificial networks that performed as well or better than the animals, stronger task-belief is associated with better perception. Correspondingly, recordings from neuronal populations in cortical areas 7a and V1 revealed that stronger task-belief is associated with better discriminability of the believed-relevant, but not the believed-irrelevant, feature. Perception also impacts belief updating; noise fluctuations in V1 help explain how task-belief is updated. Our results demonstrate that complex tasks and multi-area recordings can reveal fundamentally new principles of how biology affects behavior in health and disease.
Collapse
Affiliation(s)
- Cheng Xue
- Department of Neuroscience and Center for Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15260, USA.
| | - Lily E Kramer
- Department of Neuroscience and Center for Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15260, USA
| | - Marlene R Cohen
- Department of Neuroscience and Center for Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15260, USA.
| |
Collapse
|
44
|
Efficient coding of cognitive variables underlies dopamine response and choice behavior. Nat Neurosci 2022; 25:738-748. [PMID: 35668173 DOI: 10.1038/s41593-022-01085-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2020] [Accepted: 04/26/2022] [Indexed: 11/26/2022]
Abstract
Reward expectations based on internal knowledge of the external environment are a core component of adaptive behavior. However, internal knowledge may be inaccurate or incomplete due to errors in sensory measurements. Some features of the environment may also be encoded inaccurately to minimize representational costs associated with their processing. In this study, we investigated how reward expectations are affected by features of internal representations by studying behavior and dopaminergic activity while mice make time-based decisions. We show that several possible representations allow a reinforcement learning agent to model animals' overall performance during the task. However, only a small subset of highly compressed representations simultaneously reproduced the co-variability in animals' choice behavior and dopaminergic activity. Strikingly, these representations predict an unusual distribution of response times that closely match animals' behavior. These results inform how constraints of representational efficiency may be expressed in encoding representations of dynamic cognitive variables used for reward-based computations.
45
The role of state uncertainty in the dynamics of dopamine. Curr Biol 2022; 32:1077-1087.e9. [PMID: 35114098 PMCID: PMC8930519 DOI: 10.1016/j.cub.2022.01.025] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2021] [Revised: 11/22/2021] [Accepted: 01/10/2022] [Indexed: 11/22/2022]
Abstract
Reinforcement learning models of the basal ganglia map the phasic dopamine signal to reward prediction errors (RPEs). Conventional models assert that, when a stimulus predicts a reward with fixed delay, dopamine activity during the delay should converge to baseline through learning. However, recent studies have found that dopamine ramps up before reward in certain conditions even after learning, thus challenging the conventional models. In this work, we show that sensory feedback causes an unbiased learner to produce RPE ramps. Our model predicts that when feedback gradually decreases during a trial, dopamine activity should resemble a "bump," whose ramp-up phase should, furthermore, be greater than that of conditions where the feedback stays high. We trained mice on a virtual navigation task with varying brightness, and both predictions were empirically observed. In sum, our theoretical and experimental results reconcile the seemingly conflicting data on dopamine behaviors under the RPE hypothesis.
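The core mechanism, that a value estimate held near a convex function of elapsed time by sensory feedback yields ramping RPEs even with no change in reward, follows directly from the TD error definition delta(t) = r(t+1) + gamma*V(t+1) - V(t). A minimal numerical sketch (not the paper's belief-state model; the quadratic value curve is purely illustrative):

```python
import numpy as np

gamma, T = 0.98, 20
t = np.arange(T)

def td_errors(v):
    # delta(t) = r(t+1) + gamma*V(t+1) - V(t), with reward 1 on entering T.
    v_next = np.append(v[1:], 0.0)  # V(T) = 0 once reward is consumed
    r = np.zeros(T)
    r[-1] = 1.0
    return r + gamma * v_next - v

v_converged = gamma ** (T - 1 - t)  # standard converged value: no ramp
v_feedback = (t / T) ** 2           # value pinned to a convex function of
                                    # elapsed time by ongoing sensory feedback
print(np.round(td_errors(v_converged), 3))  # ~0 everywhere
print(np.round(td_errors(v_feedback), 3))   # positive and increasing: a ramp
```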
46
Ryan TJ, Frankland PW. Forgetting as a form of adaptive engram cell plasticity. Nat Rev Neurosci 2022; 23:173-186. [PMID: 35027710 DOI: 10.1038/s41583-021-00548-3] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/25/2021] [Indexed: 12/30/2022]
Abstract
One leading hypothesis suggests that memories are stored in ensembles of neurons (or 'engram cells') and that successful recall involves reactivation of these ensembles. A logical extension of this idea is that forgetting occurs when engram cells cannot be reactivated. Forms of 'natural forgetting' vary considerably in terms of their underlying mechanisms, time course and reversibility. However, we suggest that all forms of forgetting involve circuit remodelling that switches engram cells from an accessible state (where they can be reactivated by natural recall cues) to an inaccessible state (where they cannot). In many cases, forgetting rates are modulated by environmental conditions and we therefore propose that forgetting is a form of neuroplasticity that alters engram cell accessibility in a manner that is sensitive to mismatches between expectations and the environment. Moreover, we hypothesize that disease states associated with forgetting may hijack natural forgetting mechanisms, resulting in reduced engram cell accessibility and memory loss.
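Purely as a toy formalization (none of it is from the review), the accessibility hypothesis can be reduced to a single state variable that recall cues restore and that expectation-environment mismatch erodes, leaving the engram intact but unreachable:

```python
access = 1.0                  # engram accessibility in [0, 1]
decay, recover = 0.15, 0.4    # illustrative rates, not from the review
trace = []
for day in range(60):
    env_matches = day < 20 or day >= 45    # environment matches the memory
    if env_matches:
        access += recover * (1.0 - access)  # recall cues restore access
    else:
        access -= decay * access            # expectation-environment mismatch
                                            # makes the ensemble harder to reactivate
    trace.append(access)

print("end of training  :", round(trace[19], 2))
print("after mismatch   :", round(trace[44], 2))  # 'forgotten', not erased
print("after re-exposure:", round(trace[-1], 2))  # rapidly restored
```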
Affiliation(s)
- Tomás J Ryan: School of Biochemistry and Immunology, Trinity College Dublin, Dublin, Ireland; Trinity College Institute for Neuroscience, Trinity College Dublin, Dublin, Ireland; Florey Institute of Neuroscience and Mental Health, Melbourne Brain Centre, University of Melbourne, Melbourne, Victoria, Australia; Child & Brain Development Program, Canadian Institute for Advanced Research (CIFAR), Toronto, Ontario, Canada
- Paul W Frankland: Child & Brain Development Program, Canadian Institute for Advanced Research (CIFAR), Toronto, Ontario, Canada; Program in Neurosciences & Mental Health, Hospital for Sick Children, Toronto, Ontario, Canada; Department of Psychology, University of Toronto, Toronto, Ontario, Canada; Department of Physiology, University of Toronto, Toronto, Ontario, Canada; Institute of Medical Sciences, University of Toronto, Toronto, Ontario, Canada
47
Dopamine firing plays a dual role in coding reward prediction errors and signaling motivation in a working memory task. Proc Natl Acad Sci U S A 2022; 119:e2113311119. [PMID: 34992139 PMCID: PMC8764687 DOI: 10.1073/pnas.2113311119] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/29/2021] [Indexed: 11/21/2022] Open
Abstract
Little is known about how dopamine (DA) neuron firing rates behave in cognitively demanding decision-making tasks. Here, we investigated midbrain DA activity in monkeys performing a discrimination task in which the animal had to use working memory (WM) to report which of two sequentially applied vibrotactile stimuli had the higher frequency. We found that perception was altered by an internal bias, likely generated by deterioration of the representation of the first frequency during the WM period. This bias greatly controlled the DA phasic response during the two stimulation periods, confirming that DA reward prediction errors reflected stimulus perception. In contrast, tonic dopamine activity during WM was not affected by the bias and did not encode the stored frequency. More interestingly, both delay-period activity and phasic responses before the second stimulus negatively correlated with reaction times of the animals after the trial start cue and thus represented motivated behavior on a trial-by-trial basis. During WM, this motivation signal underwent a ramp-like increase. At the same time, motivation positively correlated with accuracy, especially in difficult trials, probably by decreasing the effect of the bias. Overall, our results indicate that DA activity, in addition to encoding reward prediction errors, could at the same time be involved in motivation and WM. In particular, the ramping activity during the delay period suggests a possible DA role in stabilizing sustained cortical activity, hypothetically by increasing the gain communicated to prefrontal neurons in a motivation-dependent way.
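The contraction bias invoked here, in which the remembered first frequency drifts toward the mean of the stimulus set during the delay, is easy to simulate. The sketch below is illustrative only: the frequency range, contraction strength, and memory noise are invented, and it models just the perceptual bias, not the recordings. The phasic dopamine response the authors describe would then reflect an RPE computed from the biased percept rather than the true stimulus.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
f1 = rng.uniform(10, 34, n)               # first vibrotactile frequency (Hz)
f2 = f1 + rng.choice([-8.0, 8.0], n)      # comparison frequency
prior_mean, lam, noise = 22.0, 0.4, 4.0   # contraction strength: invented

# During the working-memory delay the remembered f1 drifts toward the
# mean of the stimulus set and picks up noise.
f1_mem = (1 - lam) * f1 + lam * prior_mean + rng.normal(0, noise, n)
says_higher = f2 > f1_mem
correct = says_higher == (f2 > f1)

# The bias helps when contraction pushes the memory away from f2 and
# hurts when it pushes the memory toward f2.
helps = (f1 > prior_mean) == (f2 > f1)
print(f"accuracy when the bias helps: {correct[helps].mean():.2f}")
print(f"accuracy when the bias hurts: {correct[~helps].mean():.2f}")
```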
48
Hamilos AE, Spedicato G, Hong Y, Sun F, Li Y, Assad J. Slowly evolving dopaminergic activity modulates the moment-to-moment probability of reward-related self-timed movements. eLife 2021; 10:e62583. [PMID: 34939925 PMCID: PMC8860451 DOI: 10.7554/elife.62583] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2020] [Accepted: 12/21/2021] [Indexed: 11/13/2022] Open
Abstract
Clues from human movement disorders have long suggested that the neurotransmitter dopamine plays a role in motor control, but how the endogenous dopaminergic system influences movement is unknown. Here we examined the relationship between dopaminergic signaling and the timing of reward-related movements in mice. Animals were trained to initiate licking after a self-timed interval following a start-timing cue; reward was delivered in response to movements initiated after a criterion time. The movement time varied from trial to trial, as expected from previous studies. Surprisingly, dopaminergic signals ramped up over seconds between the start-timing cue and the self-timed movement, with variable dynamics that predicted the movement/reward time on single trials. Steeply rising signals preceded early lick initiation, whereas slowly rising signals preceded later initiation. Higher baseline signals also predicted earlier self-timed movements. Optogenetic activation of dopamine neurons during self-timing did not trigger immediate movements but rather caused systematic early-shifting of movement initiation, whereas inhibition caused late-shifting, as if modulating the probability of movement. Consistent with this view, the dynamics of the endogenous dopaminergic signals quantitatively predicted the moment-by-moment probability of movement initiation on single trials. We propose that ramping dopaminergic signals, likely encoding dynamic reward expectation, can modulate the decision of when to move.
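The paper's interpretation, that the dopaminergic signal sets the moment-to-moment probability of movement rather than triggering it outright, maps naturally onto a point process whose hazard tracks the ramping signal. In the hedged sketch below, all scales are invented and median_first_lick is a hypothetical helper, not anything from the paper; steeper ramps and higher baselines should both shift the median initiation time earlier, as reported.

```python
import numpy as np

rng = np.random.default_rng(4)
dt, t_max = 0.01, 10.0
times = np.arange(0.0, t_max, dt)

def median_first_lick(slope, baseline, n_trials=2000):
    """Movement initiation as a point process whose instantaneous hazard
    follows a ramping dopaminergic proxy signal (all scales invented)."""
    lick_times = np.full(n_trials, t_max)  # censored at trial end
    hazard = dt * np.exp(baseline + slope * times - 4.0)
    for i in range(n_trials):
        moved = rng.random(times.size) < hazard
        if moved.any():
            lick_times[i] = times[moved.argmax()]
    return np.median(lick_times)

print("steep ramp   :", median_first_lick(slope=0.6, baseline=0.0))
print("shallow ramp :", median_first_lick(slope=0.3, baseline=0.0))
print("high baseline:", median_first_lick(slope=0.3, baseline=1.0))
```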
Affiliation(s)
- Allison E Hamilos: Department of Neurobiology, Harvard Medical School, Boston, United States
- Giulia Spedicato: Department of Neurobiology, Harvard Medical School, Boston, United States
- Ye Hong: Department of Neurobiology, Harvard Medical School, Boston, United States
- Fangmiao Sun: State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing, China
- Yulong Li: State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing, China
- John Assad: Department of Neurobiology, Harvard Medical School, Boston, United States
49
Deserno L, Moran R, Michely J, Lee Y, Dayan P, Dolan RJ. Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference. eLife 2021; 10:e67778. [PMID: 34882092 PMCID: PMC8758138 DOI: 10.7554/elife.67778] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Accepted: 12/08/2021] [Indexed: 11/13/2022] Open
Abstract
Dopamine is implicated in representing model-free (MF) reward prediction errors as well as influencing model-based (MB) credit assignment and choice. Putative cooperative interactions between MB and MF systems include a guidance of MF credit assignment by MB inference. Here, we used a double-blind, placebo-controlled, within-subjects design to test the hypothesis that enhancing dopamine levels boosts the guidance of MF credit assignment by MB inference. In line with this, we found that levodopa enhanced guidance of MF credit assignment by MB inference, without directly impacting MF and MB influences. This drug effect correlated negatively with a dopamine-dependent change in purely MB credit assignment, possibly reflecting a trade-off between these two MB components of behavioural control. Our finding of a dopamine-driven boost in MB guidance of MF learning highlights a novel DA influence on MB-MF cooperative interactions.
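One way to make "MB inference guiding MF credit assignment" concrete is to let a retrospective posterior over which action plausibly produced the observed state gate the MF update, with a gain parameter standing in for the dopamine manipulation. The toy below is an interpretation, not the authors' task or model; kappa, the transition structure, and all rates are invented.

```python
import numpy as np

rng = np.random.default_rng(5)
n_trials, alpha, beta = 4000, 0.1, 3.0
kappa = 0.8     # strength of MB guidance of MF credit assignment; the
                # abstract suggests dopamine boosts this (try kappa = 0.0)
p_common = 0.7  # action 0 usually reaches state 0; action 1, state 1
reward_prob = np.array([0.8, 0.2])  # state 0 is currently the good state
q_mf = np.zeros(2)                  # MF values of the two actions

for _ in range(n_trials):
    probs = np.exp(beta * q_mf) / np.exp(beta * q_mf).sum()
    a = rng.choice(2, p=probs)
    s = a if rng.random() < p_common else 1 - a
    r = float(rng.random() < reward_prob[s])
    # Retrospective MB inference: posterior that each action would have
    # produced the state actually visited.
    lik = np.where(np.arange(2) == s, p_common, 1 - p_common)
    post = lik / lik.sum()
    # MF credit assignment, gated toward the MB-inferred cause by kappa.
    weight = (1 - kappa) * np.eye(2)[a] + kappa * post
    q_mf += alpha * weight * (r - q_mf)

print(np.round(q_mf, 2))  # with high kappa, credit concentrates on the
                          # action that reliably reaches the good state
```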
Affiliation(s)
- Lorenz Deserno: Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom; Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, University of Würzburg, Würzburg, Germany; Department of Psychiatry and Psychotherapy, Technische Universität Dresden, Dresden, Germany
- Rani Moran: Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
- Jochen Michely: Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom; Department of Psychiatry and Psychotherapy, Charité Universitätsmedizin Berlin, Berlin, Germany
- Ying Lee: Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom; Department of Psychiatry and Psychotherapy, Technische Universität Dresden, Dresden, Germany
- Peter Dayan: Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; Max Planck Institute for Biological Cybernetics, Tübingen, Germany; University of Tübingen, Tübingen, Germany
- Raymond J Dolan: Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
50
Gao Z, Wang H, Lu C, Lu T, Froudist-Walsh S, Chen M, Wang XJ, Hu J, Sun W. The neural basis of delayed gratification. Sci Adv 2021; 7:eabg6611. [PMID: 34851665 PMCID: PMC8635439 DOI: 10.1126/sciadv.abg6611] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/20/2021] [Accepted: 10/12/2021] [Indexed: 06/13/2023]
Abstract
Balancing instant gratification against delayed but better gratification is important for optimizing survival and reproductive success. Although delayed gratification has been studied through human psychology, brain-activity monitoring, and animal research, little is known about its neural basis. We trained mice to perform a waiting-for-water-reward delayed gratification task and combined physiological recording with optical manipulation of neuronal activity during the task to explore its neural basis. Our results show that the activity of dopaminergic (DAergic) neurons in the ventral tegmental area increases steadily during the waiting period. Optical activation or silencing of these neurons, respectively, extends or reduces the duration of waiting. To interpret these data, we developed a reinforcement learning model that reproduces our experimental observations. The steady increase in DAergic activity signals the value of waiting and supports the hypothesis that delayed gratification involves real-time deliberation.
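The modeling claim, that ramping DAergic activity tracks the growing value of continuing to wait, has a simple normative counterpart: computed by backward induction, the value of holding out rises steadily as reward delivery approaches. A minimal sketch under invented payoffs:

```python
import numpy as np

gamma, T = 0.95, 10         # reward after waiting T steps
r_wait, r_quit = 10.0, 1.0  # invented payoffs

# Value of still waiting at each elapsed time, by backward induction: at
# every step the agent may quit for a small immediate reward or hold on.
v = np.zeros(T + 1)
v[T] = r_wait
for t in range(T - 1, -1, -1):
    v[t] = max(r_quit, gamma * v[t + 1])

print(np.round(v, 2))  # rises steadily toward reward delivery, echoing
                       # the steady ramp in DAergic activity reported here
```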
Affiliation(s)
- Zilong Gao: Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; Chinese Institute for Brain Research, Beijing 102206, China
- Hanqing Wang: Center for Neural Science, New York University, New York, NY 10003, USA
- Chen Lu: School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Tiezhan Lu: Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; Chinese Institute for Brain Research, Beijing 102206, China
- Froudist-Walsh S (no affiliation listed)
- Ming Chen: School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Xiao-Jing Wang: Center for Neural Science, New York University, New York, NY 10003, USA
- Ji Hu: School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China; Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai 200030, China
- Wenzhi Sun: Chinese Institute for Brain Research, Beijing 102206, China; School of Basic Medical Sciences, Capital Medical University, Beijing 100069, China