101
Gao Z, Wang H, Lu C, Lu T, Froudist-Walsh S, Chen M, Wang XJ, Hu J, Sun W. The neural basis of delayed gratification. Sci Adv 2021; 7:eabg6611. [PMID: 34851665] [PMCID: PMC8635439] [DOI: 10.1126/sciadv.abg6611]
Abstract
Balancing instant gratification against delayed but larger gratification is important for optimizing survival and reproductive success. Although delayed gratification has been studied through human psychological and brain-activity-monitoring studies and animal research, little is known about its neural basis. We trained mice to perform a waiting-for-water-reward delayed gratification task and performed physiological recording and optical manipulation of neuronal activity during the task to explore its neural basis. Our results show that the activity of dopaminergic (DAergic) neurons in the ventral tegmental area increases steadily during the waiting period. Optical activation or silencing of these neurons, respectively, extends or reduces the duration of waiting. To interpret these data, we developed a reinforcement learning model that reproduces our experimental observations. Steady increases in DAergic activity signal the value of waiting and support the hypothesis that delayed gratification involves real-time deliberation.
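The steadily rising value signal described above can be illustrated with a minimal TD(0) sketch. This is an illustration of the general principle only, not the authors' actual model; all parameter values are arbitrary assumptions.

```python
def td_values(n_steps=10, reward=1.0, alpha=0.1, gamma=0.9, episodes=500):
    """Learn state values for a waiting period that ends in reward, via TD(0)."""
    v = [0.0] * (n_steps + 1)  # v[n_steps] is the terminal state after reward
    for _ in range(episodes):
        for t in range(n_steps):
            r = reward if t == n_steps - 1 else 0.0
            # TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
            v[t] += alpha * (r + gamma * v[t + 1] - v[t])
    return v[:n_steps]

values = td_values()
# The learned value of waiting ramps up toward the moment of reward delivery.
assert all(values[t] < values[t + 1] for t in range(len(values) - 1))
```

With the defaults, the learned values increase monotonically across the waiting period, loosely analogous to the ramping DAergic activity the authors report.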
Affiliation(s)
- Zilong Gao
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Chinese Institute for Brain Research, Beijing 102206, China
- Hanqing Wang
- Center for Neural Science, New York University, New York, NY 10003, USA
- Chen Lu
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Tiezhan Lu
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Chinese Institute for Brain Research, Beijing 102206, China
- Ming Chen
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Xiao-Jing Wang
- Center for Neural Science, New York University, New York, NY 10003, USA
- Ji Hu
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai 200030, China
- Wenzhi Sun
- Chinese Institute for Brain Research, Beijing 102206, China
- School of Basic Medical Sciences, Capital Medical University, Beijing 100069, China
102
Namboodiri VMK, Stuber GD. The learning of prospective and retrospective cognitive maps within neural circuits. Neuron 2021; 109:3552-3575. [PMID: 34678148] [PMCID: PMC8809184] [DOI: 10.1016/j.neuron.2021.09.034]
Abstract
Brain circuits are thought to form a "cognitive map" to process and store statistical relationships in the environment. A cognitive map is commonly defined as a mental representation that describes environmental states (i.e., variables or events) and the relationship between these states. This process is commonly conceptualized as a prospective process, as it is based on the relationships between states in chronological order (e.g., does reward follow a given state?). In this perspective, we expand this concept on the basis of recent findings to postulate that in addition to a prospective map, the brain forms and uses a retrospective cognitive map (e.g., does a given state precede reward?). In doing so, we demonstrate that many neural signals and behaviors (e.g., habits) that seem inflexible and non-cognitive can result from retrospective cognitive maps. Together, we present a significant conceptual reframing of the neurobiological study of associative learning, memory, and decision making.
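The prospective/retrospective distinction in this abstract can be made concrete with a toy frequency count. This is my illustration, not the authors' formalism; the event sequence is a made-up example.

```python
# Contrast a prospective association P(reward | state) with a retrospective
# one P(state | reward), estimated from the same sequence of events.
events = ["A", "reward", "B", "A", "A"]

n_A = events.count("A")
n_reward = events.count("reward")
# Transitions where state A is immediately followed by reward.
n_A_then_reward = sum(1 for i in range(len(events) - 1)
                      if events[i] == "A" and events[i + 1] == "reward")

prospective = n_A_then_reward / n_A        # does reward follow state A?
retrospective = n_A_then_reward / n_reward # did state A precede reward?

# A rarely leads to reward (weak prospective map), yet every reward was
# preceded by A (strong retrospective map) -- the dissociation the
# Perspective builds on.
assert prospective < retrospective
```

Here `prospective` is 1/3 while `retrospective` is 1.0, showing how the two maps can disagree even on identical experience.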
Affiliation(s)
- Vijay Mohan K Namboodiri
- Department of Neurology, Center for Integrative Neuroscience, Kavli Institute for Fundamental Neuroscience, Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA.
- Garret D Stuber
- Center for the Neurobiology of Addiction, Pain, and Emotion, Department of Anesthesiology and Pain Medicine, Department of Pharmacology, Neuroscience Graduate Program, University of Washington, Seattle, WA 98195, USA.
103
Chen J, Bruchas M. Neuromodulation: A model for dopamine in salience encoding. Curr Biol 2021; 31:R1426-R1429. [PMID: 34752767] [DOI: 10.1016/j.cub.2021.09.038]
Abstract
The neurotransmitter dopamine has well-known roles in reward-seeking behaviors: a new study with mice has now revealed that dopamine signaling in the nucleus accumbens core, a region of the basal forebrain, encodes saliency during reinforcement learning.
Affiliation(s)
- Jingyi Chen
- Center for the Neurobiology of Addiction, Pain, and Emotion, University of Washington, Seattle, WA 98195, USA; Departments of Anesthesiology and Pain Medicine, University of Washington, Seattle, WA 98195, USA
- Michael Bruchas
- Center for the Neurobiology of Addiction, Pain, and Emotion, University of Washington, Seattle, WA 98195, USA; Departments of Anesthesiology and Pain Medicine, University of Washington, Seattle, WA 98195, USA; Department of Pharmacology, University of Washington, Seattle, WA 98195, USA; Department of Bioengineering, University of Washington, Seattle, WA 98195, USA.
104
Vandaele Y, Ottenheimer DJ, Janak PH. Dorsomedial Striatal Activity Tracks Completion of Behavioral Sequences in Rats. eNeuro 2021; 8:ENEURO.0279-21.2021. [PMID: 34725103] [PMCID: PMC8607909] [DOI: 10.1523/eneuro.0279-21.2021]
Abstract
For proper execution of goal-directed behaviors, individuals require both a general representation of the goal and an ability to monitor their own progress toward that goal. Here, we examine how dorsomedial striatum (DMS), a region pivotal for forming associations among stimuli, actions, and outcomes, encodes the execution of goal-directed action sequences that require self-monitoring of behavior. We trained rats to complete a sequence of at least five consecutive lever presses (without visiting the reward port) to obtain a reward and recorded the activity of individual cells in DMS while rats performed the task. We found that the pattern of DMS activity gradually changed during the execution of the sequence, permitting accurate decoding of sequence progress from neural activity at a population level. Moreover, this sequence-related activity was blunted on trials where rats did not complete a sufficient number of presses. Overall, these data suggest a link between DMS activity and the execution of behavioral sequences that require monitoring of ongoing behavior.
Affiliation(s)
- Youna Vandaele
- Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD 21218
- David J Ottenheimer
- Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD 21218
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Johns Hopkins University, Baltimore, MD 21205
- Patricia H Janak
- Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD 21218
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Johns Hopkins University, Baltimore, MD 21205
105
Hamid AA. Dopaminergic specializations for flexible behavioral control: linking levels of analysis and functional architectures. Curr Opin Behav Sci 2021. [DOI: 10.1016/j.cobeha.2021.07.005]
106
Feng Z, Nagase AM, Morita K. A Reinforcement Learning Approach to Understanding Procrastination: Does Inaccurate Value Approximation Cause Irrational Postponing of a Task? Front Neurosci 2021; 15:660595. [PMID: 34602962] [PMCID: PMC8481628] [DOI: 10.3389/fnins.2021.660595]
Abstract
Procrastination is the voluntary but irrational postponing of a task despite awareness that the delay can lead to worse consequences. It has been studied extensively in psychology, from contributing factors to theoretical models. From the perspective of value-based decision making and reinforcement learning (RL), procrastination has been suggested to arise from non-optimal choices resulting from cognitive limitations. Exactly what sort of cognitive limitations are involved, however, remains elusive. In the current study, we examined whether a particular type of cognitive limitation, namely inaccurate valuation resulting from inadequate state representation, could cause procrastination. Recent work has suggested that humans may adopt a particular type of state representation called the successor representation (SR) and that humans can learn to represent states by relatively low-dimensional features. Combining these suggestions, we assumed a dimension-reduced version of the SR. We modeled the behavior of a "student" doing assignments during the school term, when putting off the assignments (i.e., procrastination) is not allowed, and during the vacation, when whether to procrastinate can be freely chosen. We assumed that the "student" had acquired a rigid reduced SR of each state, corresponding to each step in completing an assignment, under the no-procrastination policy. The "student" then learned an approximated value of each state, computed as a linear function of the features of the states in the rigid reduced SR, through temporal-difference (TD) learning. During the vacation, the "student" decided at each time step whether to procrastinate based on these approximated values. Simulation results showed that the reduced SR-based RL model generated procrastination behavior, which worsened across episodes.
According to the values approximated by the "student," procrastinating was the better choice, whereas according to the true values, not procrastinating was mostly better. Thus, the current model generated procrastination behavior caused by inaccurate value approximation, which resulted from adopting the reduced SR as the state representation. These findings indicate that the reduced SR, or more generally dimension reduction in state representation, is a potential form of cognitive limitation that leads to procrastination.
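The core mechanism, value misestimation from a coarse, rigid SR, can be condensed into a toy computation. This is a simplification I constructed, not the authors' code; the chain length, discount factor, and state grouping are arbitrary assumptions.

```python
def sr_chain(n, gamma):
    # SR of a deterministic chain s0 -> s1 -> ... -> s(n-1) under the
    # no-procrastination policy: M[s][s'] = discounted future occupancy of s'.
    return [[gamma ** (sp - s) if sp >= s else 0.0 for sp in range(n)]
            for s in range(n)]

n, gamma = 5, 0.9
M = sr_chain(n, gamma)
r = [0.0] * (n - 1) + [1.0]  # reward only on finishing the assignment

# Exact SR values: V(s) = sum_s' M[s][s'] * r(s')
v_true = [sum(M[s][sp] * r[sp] for sp in range(n)) for s in range(n)]

# Rigid reduced SR: collapse states into two coarse groups ("early"/"late")
# by averaging their SR rows; values become identical within a group.
groups = [[0, 1, 2], [3, 4]]
v_hat = [0.0] * n
for g in groups:
    row = [sum(M[s][sp] for s in g) / len(g) for sp in range(n)]
    v = sum(row[sp] * r[sp] for sp in range(n))
    for s in g:
        v_hat[s] = v

# The reduced representation over-values the earliest state and under-values
# the goal state -- an approximation error of the kind the model links to
# irrational postponing.
assert v_hat[0] > v_true[0] and v_hat[n - 1] < v_true[n - 1]
```

The grouping deliberately blurs progress toward the goal, which is exactly the information a procrastinating agent needs to value correctly.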
Affiliation(s)
- Zheyu Feng
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Asako Mitsuto Nagase
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Division of Neurology, Department of Brain and Neurosciences, Faculty of Medicine, Tottori University, Yonago, Japan
- Research Fellowship for Young Scientists, Japan Society for the Promotion of Science, Tokyo, Japan
- Department of Neurology, Faculty of Medicine, Shimane University, Izumo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo, Japan
107
Chen Y. Neural Representation of Costs and Rewards in Decision Making. Brain Sci 2021; 11:1096. [PMID: 34439715] [PMCID: PMC8391424] [DOI: 10.3390/brainsci11081096]
Abstract
Decision making is crucial for animal survival because choices made in a given situation influence future rewards and carry potential costs. This review summarises recent developments in decision making, discusses how rewards and costs may be encoded in the brain, and considers how different options are compared so that the optimal one is chosen. Reward and cost are mainly encoded by forebrain structures (e.g., anterior cingulate cortex, orbitofrontal cortex), and their values are updated through learning. Recent findings on the roles of dopamine and the lateral habenula in reporting prediction errors and instructing learning are emphasised, as is the importance of dopamine in energizing choice and accounting for internal state. While the orbitofrontal cortex is where state values are stored, the anterior cingulate cortex becomes more important when the environment is volatile. These structures compare different attributes of the task simultaneously, and local competition among neuronal networks allows selection of the most appropriate option. The total value of the task is therefore not encoded as a scalar quantity in the brain but instead emerges from computations across different brain regions.
Affiliation(s)
- Yixuan Chen
- Queens' College, University of Cambridge, Cambridgeshire CB3 9ET, UK
108
Abstract
An organism's survival can depend on its ability to recall and navigate to spatial locations associated with rewards, such as food or a home. Accumulating research has revealed that computations of reward and its prediction occur on multiple levels across a complex set of interacting brain regions, including those that support memory and navigation. However, how the brain coordinates the encoding, recall and use of reward information to guide navigation remains incompletely understood. In this Review, we propose that the brain's classical navigation centres - the hippocampus and the entorhinal cortex - are ideally suited to coordinate this larger network by representing both physical and mental space as a series of states. These states may be linked to reward via neuromodulatory inputs to the hippocampus-entorhinal cortex system. Hippocampal outputs can then broadcast sequences of states to the rest of the brain to store reward associations or to facilitate decision-making, potentially engaging additional value signals downstream. This proposal is supported by recent advances in both experimental and theoretical neuroscience. By discussing the neural systems traditionally tied to navigation and reward at their intersection, we aim to offer an integrated framework for understanding navigation to reward as a fundamental feature of many cognitive processes.
109
Abstract
Heterogeneity is an increasingly appreciated feature of dopamine signaling in the striatum. Hamid et al. (2021) leverage a variety of imaging techniques to reveal striking spatiotemporal patterns of dopamine signals in mouse dorsal striatum. Time will tell what this means for reinforcement learning in the brain.
Affiliation(s)
- Bruno F Cruz
- Champalimaud Research, Champalimaud Centre for the Unknown, Lisbon 1400-038, Portugal
- Joseph J Paton
- Champalimaud Research, Champalimaud Centre for the Unknown, Lisbon 1400-038, Portugal.
110
Fleming W, Jewell S, Engelhard B, Witten DM, Witten IB. Inferring spikes from calcium imaging in dopamine neurons. PLoS One 2021; 16:e0252345. [PMID: 34086726] [PMCID: PMC8177503] [DOI: 10.1371/journal.pone.0252345]
Abstract
Calcium imaging has led to discoveries about neural correlates of behavior in subcortical neurons, including dopamine (DA) neurons. However, spike inference methods have not been tested in most populations of subcortical neurons. To address this gap, we simultaneously performed calcium imaging and electrophysiology in DA neurons in brain slices and applied a recently developed spike inference algorithm to the GCaMP fluorescence. This revealed that individual spikes can be inferred accurately in this population. Next, we inferred spikes in vivo from calcium imaging from these neurons during Pavlovian conditioning, as well as during navigation in virtual reality. In both cases, we quantitatively recapitulated previous in vivo electrophysiological observations. Our work provides a validated approach to infer spikes from calcium imaging in DA neurons and implies that aspects of both tonic and phasic spike patterns can be recovered.
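The generative assumption behind spike inference can be caricatured in a few lines. This is a deliberately naive thresholding sketch of my own, not the algorithm the authors applied; `decay` and `threshold` are arbitrary assumptions.

```python
def naive_spike_inference(trace, decay=0.9, threshold=0.2):
    """Calcium fluorescence is modeled as spikes convolved with an
    exponentially decaying kernel (AR(1) dynamics), so a crude spike
    estimate is the positive residual beyond the predicted decay."""
    spikes = []
    for t in range(1, len(trace)):
        residual = trace[t] - decay * trace[t - 1]  # jump beyond passive decay
        spikes.append(1 if residual > threshold else 0)
    return spikes

# Synthetic trace: one spike at t=2 followed by exponential decay.
trace = [0.0, 0.0, 1.0, 0.9, 0.81, 0.729]
inferred = naive_spike_inference(trace)  # flags only the upward transient
```

Principled methods such as the one validated in the paper solve this deconvolution problem with proper penalized optimization rather than a fixed threshold, but the underlying calcium model is the same flavor.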
Affiliation(s)
- Weston Fleming
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- Sean Jewell
- Department of Statistics & Biostatistics, University of Washington, Seattle, Washington, United States of America
- Ben Engelhard
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- Daniela M. Witten
- Department of Statistics & Biostatistics, University of Washington, Seattle, Washington, United States of America
- Ilana B. Witten
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
111
Liu C, Goel P, Kaeser PS. Spatial and temporal scales of dopamine transmission. Nat Rev Neurosci 2021; 22:345-358. [PMID: 33837376] [PMCID: PMC8220193] [DOI: 10.1038/s41583-021-00455-7]
Abstract
Dopamine is a prototypical neuromodulator that controls circuit function through G protein-coupled receptor signalling. Neuromodulators are volume transmitters, with release followed by diffusion for widespread receptor activation on many target cells. Yet, we are only beginning to understand the specific organization of dopamine transmission in space and time. Although some roles of dopamine are mediated by slow and diffuse signalling, recent studies suggest that certain dopamine functions necessitate spatiotemporal precision. Here, we review the literature describing dopamine signalling in the striatum, including its release mechanisms and receptor organization. We then propose the domain-overlap model, in which release and receptors are arranged relative to one another in micrometre-scale structures. This architecture is different from both point-to-point synaptic transmission and the widespread organization that is often proposed for neuromodulation. It enables the activation of receptor subsets that are within micrometre-scale domains of release sites during baseline activity and broader receptor activation with domain overlap when firing is synchronized across dopamine neuron populations. This signalling structure, together with the properties of dopamine release, may explain how switches in firing modes support broad and dynamic roles for dopamine and may lead to distinct pathway modulation.
Affiliation(s)
- Changliang Liu
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
- Pragya Goel
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
- Pascal S Kaeser
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA.
112
Shimomura K, Kato A, Morita K. Rigid reduced successor representation as a potential mechanism for addiction. Eur J Neurosci 2021; 53:3768-3790. [PMID: 33840120] [PMCID: PMC8252639] [DOI: 10.1111/ejn.15227]
Abstract
Difficulty in ceasing drinking, smoking, or gambling is widely recognized. Conventional theories propose a relative dominance of habitual over goal-directed control, but human studies have not convincingly supported them. Drawing on the recently proposed "successor representation (SR)" of states, which enables partially goal-directed control, we propose a dopamine-related mechanism that makes resistance to habitual reward-obtaining particularly difficult. We reasoned that long-standing behavior toward a certain reward without resisting temptation can (though does not always) lead to the formation of a rigid, dimension-reduced SR based on the goal state that cannot be updated. In our model assuming such a rigid reduced SR, no reward prediction error (RPE) is generated at the goal while no resistance is made, but a sustained large positive RPE is generated upon goal reaching once the person starts resisting temptation. Such a sustained RPE is somewhat similar to the hypothesized sustained fictitious RPE caused by drug-induced dopamine. In contrast, if a rigid reduced SR is not formed and states are represented individually, as in simple reinforcement learning models, no sustained RPE is generated at the goal. Formation of a rigid reduced SR also attenuates the resistance-dependent decrease in the value of the cue for the behavior, renders subsequent introduction of punishment after the goal ineffective, and potentially enhances the propensity for non-resistance through the influence of RPEs via the spiraling striatum-midbrain circuit. These results suggest that formation of a rigid reduced SR makes cessation of habitual reward-obtaining particularly difficult and can thus be a mechanism for addiction, common to substance and non-substance rewards.
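The "cannot be updated" ingredient can be distilled into a few lines. This toy contrasts an ordinary, updatable value with a frozen one; it illustrates only the persistence of RPEs under a rigid representation, not the paper's full SR model, and all parameters are arbitrary assumptions.

```python
def rpes_at_goal(frozen, reward=1.0, alpha=0.2, visits=50):
    """RPE at the goal across repeated visits, with or without value updating."""
    v_goal = 0.0
    rpes = []
    for _ in range(visits):
        delta = reward - v_goal      # terminal RPE: r - V(goal)
        rpes.append(delta)
        if not frozen:
            v_goal += alpha * delta  # ordinary tabular case: value is learned
    return rpes

# With an updatable value the RPE decays to ~0; a representation whose
# learned value cannot change keeps emitting the same large positive RPE,
# analogous to the sustained RPE in the rigid-reduced-SR regime.
decaying = rpes_at_goal(frozen=False)
sustained = rpes_at_goal(frozen=True)
assert decaying[-1] < 0.01 and sustained[-1] == 1.0
```

In simple RL models with individually represented states (the paper's contrast case) the RPE at the goal behaves like `decaying`, which is why no sustained signal appears there.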
Affiliation(s)
- Kanji Shimomura
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Department of Behavioral Medicine, National Institute of Mental Health, National Center of Neurology and Psychiatry, Kodaira, Japan
- Ayaka Kato
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
- Laboratory for Circuit Mechanisms of Sensory Perception, RIKEN Center for Brain Science, Wako, Japan
- Research Fellowship for Young Scientists, Japan Society for the Promotion of Science, Tokyo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo, Japan
113
Xu HA, Modirshanechi A, Lehmann MP, Gerstner W, Herzog MH. Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making. PLoS Comput Biol 2021; 17:e1009070. [PMID: 34081705] [PMCID: PMC8205159] [DOI: 10.1371/journal.pcbi.1009070]
Abstract
Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.
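One ingredient the authors distinguish, a novelty signal that drives exploration before any reward is seen, is often formalized as a visit-count bonus. The specific functional form below is my assumption, not the paper's model.

```python
import math

def novelty(counts, state):
    """Count-based novelty: unvisited states are maximally novel, and
    novelty decays as a state accumulates visits."""
    return 1.0 / math.sqrt(1 + counts.get(state, 0))

# Tally visits along a short trajectory.
counts = {}
for s in ["A", "A", "B"]:
    counts[s] = counts.get(s, 0) + 1

# A never-visited state carries the largest novelty bonus, so a
# novelty-driven agent is pulled toward it even with zero reward signal.
assert novelty(counts, "C") == 1.0
assert novelty(counts, "A") < novelty(counts, "B")
```

Surprise, by contrast, is a mismatch between a world-model's prediction and the observed outcome; in the authors' account it modulates learning rates rather than providing an exploration bonus, which is why the two signals are dissociable.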
Affiliation(s)
- He A. Xu
- Laboratory of Psychophysics, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Alireza Modirshanechi
- Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Marco P. Lehmann
- Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Wulfram Gerstner
- Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Michael H. Herzog
- Laboratory of Psychophysics, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
114
Leventhal DK, Albin RL. Interviewing Mice and the Functions of Striatal Dopamine. Mov Disord 2021; 36:1330-1331. [PMID: 33983666] [DOI: 10.1002/mds.28646]
Affiliation(s)
- Daniel K Leventhal
- Department of Neurology, University of Michigan, Ann Arbor, Michigan, USA; Department of Biomedical Engineering, University of Michigan, Ann Arbor, Michigan, USA; Parkinson Disease Foundation Research Center of Excellence, University of Michigan, Ann Arbor, Michigan, USA; Department of Neurology Service and GRECC, VA Ann Arbor Health System, Ann Arbor, Michigan, USA
- Roger L Albin
- Department of Neurology, University of Michigan, Ann Arbor, Michigan, USA; Parkinson Disease Foundation Research Center of Excellence, University of Michigan, Ann Arbor, Michigan, USA; Department of Neurology Service and GRECC, VA Ann Arbor Health System, Ann Arbor, Michigan, USA
115
Hamid AA, Frank MJ, Moore CI. Wave-like dopamine dynamics as a mechanism for spatiotemporal credit assignment. Cell 2021; 184:2733-2749.e16. [PMID: 33861952] [PMCID: PMC8122079] [DOI: 10.1016/j.cell.2021.03.046]
Abstract
Significant evidence supports the view that dopamine shapes learning by encoding reward prediction errors. However, it is unknown whether striatal targets receive tailored dopamine dynamics based on regional functional specialization. Here, we report wave-like spatiotemporal activity patterns in dopamine axons and release across the dorsal striatum. These waves switch between activational motifs and organize dopamine transients into localized clusters within functionally related striatal subregions. Notably, wave trajectories were tailored to task demands, propagating from dorsomedial to dorsolateral striatum when rewards are contingent on animal behavior and in the opponent direction when rewards are independent of behavioral responses. We propose a computational architecture in which striatal dopamine waves are sculpted by inference about agency and provide a mechanism to direct credit assignment to specialized striatal subregions. Supporting model predictions, dorsomedial dopamine activity during reward-pursuit signaled the extent of instrumental control and interacted with reward waves to predict future behavioral adjustments.
Affiliation(s)
- Arif A Hamid
- Department of Neuroscience, Brown University, Providence, RI 02912, USA; Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA.
- Michael J Frank
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, Providence, RI 02912, USA; Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA.
- Christopher I Moore
- Department of Neuroscience, Brown University, Providence, RI 02912, USA; Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA.
116
Liu Y, Xin Y, Xu NL. A cortical circuit mechanism for structural knowledge-based flexible sensorimotor decision-making. Neuron 2021; 109:2009-2024.e6. [PMID: 33957065] [DOI: 10.1016/j.neuron.2021.04.014]
Abstract
Making flexible decisions based on prior knowledge about causal environmental structures is a hallmark of goal-directed cognition in mammalian brains. Although several association brain regions, including the orbitofrontal cortex (OFC), have been implicated, the precise neuronal circuit mechanisms underlying knowledge-based decision-making remain elusive. Here, we established an inference-based auditory categorization task where mice performed within-session flexible stimulus re-categorization by inferring the changing task rules. We constructed a reinforcement learning model to recapitulate the inference-based flexible behavior and quantify the hidden variables associated with task structural knowledge. Combining two-photon population imaging and projection-specific optogenetics, we found that auditory cortex (ACx) neurons encoded the hidden task rule variable, which requires feedback input from the OFC. Silencing OFC-ACx input specifically disrupted re-categorization behavior. Direct imaging from OFC axons in the ACx revealed task state-related feedback signals, supporting the knowledge-based updating mechanism. Our data reveal a cortical circuit mechanism underlying structural knowledge-based flexible decision-making.
Affiliation(s)
- Yanhe Liu
- Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; University of the Chinese Academy of Sciences, Beijing 100049, China
| | - Yu Xin
- Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; University of the Chinese Academy of Sciences, Beijing 100049, China
| | - Ning-Long Xu
- Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; University of the Chinese Academy of Sciences, Beijing 100049, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai 201210, China.
| |
Collapse
|
117
|
Hegedüs P, Heckenast J, Hangya B. Differential recruitment of ventral pallidal e-types by behaviorally salient stimuli during Pavlovian conditioning. iScience 2021; 24:102377. [PMID: 33912818 PMCID: PMC8066429 DOI: 10.1016/j.isci.2021.102377] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2020] [Revised: 02/22/2021] [Accepted: 03/26/2021] [Indexed: 10/25/2022] Open
Abstract
The ventral pallidum (VP) interfaces striatopallidal and limbic circuits, conveying information about salience and valence crucial to adjusting behavior. However, how VP neuron populations with distinct electrophysiological properties (e-types) represent these variables is not fully understood. Therefore, we trained mice on probabilistic Pavlovian conditioning while recording the activity of VP neurons. Many VP neurons responded to punishment (54%), reward (48%), and outcome-predicting auditory stimuli (32%), increasingly differentiating distinct outcome probabilities through learning. We identified e-types based on the presence of bursts or fast rhythmic discharges and found that non-bursting, non-rhythmic neurons were the most sensitive to reward and punishment. Some neurons exhibited distinct responses in their bursts and single spikes, suggesting a multiplexed coding scheme in the VP. Finally, we demonstrate synchronously firing neuron assemblies that were particularly responsive to reinforcing stimuli. These results suggest that electrophysiologically defined e-types of the VP differentially participate in transmitting reinforcement signals during learning.
Collapse
Affiliation(s)
- Panna Hegedüs
- Lendület Laboratory of Systems Neuroscience, Institute of Experimental Medicine, Budapest 1083, Hungary
- János Szentágothai Doctoral School of Neurosciences, Semmelweis University, Budapest 1085, Hungary
| | - Julia Heckenast
- Lendület Laboratory of Systems Neuroscience, Institute of Experimental Medicine, Budapest 1083, Hungary
| | - Balázs Hangya
- Lendület Laboratory of Systems Neuroscience, Institute of Experimental Medicine, Budapest 1083, Hungary
| |
Collapse
|
118
|
Lerner TN, Holloway AL, Seiler JL. Dopamine, Updated: Reward Prediction Error and Beyond. Curr Opin Neurobiol 2021; 67:123-130. [PMID: 33197709 PMCID: PMC8116345 DOI: 10.1016/j.conb.2020.10.012] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Revised: 10/12/2020] [Accepted: 10/14/2020] [Indexed: 01/10/2023]
Abstract
Dopamine neurons have been intensely studied for their roles in reinforcement learning. A dominant theory of how these neurons contribute to learning is through the encoding of a reward prediction error (RPE) signal. Recent advances in dopamine research have added nuance to RPE theory by incorporating the ideas of sensory prediction error, distributional encoding, and belief states. Further nuance is likely to be added shortly by convergent lines of research on dopamine neuron diversity. Finally, a major challenge is to reconcile RPE theory with other current theories of dopamine function to account for dopamine's role in movement, motivation, and goal-directed planning.
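The RPE at the center of this theory is the temporal-difference error of standard reinforcement learning. A minimal tabular sketch (generic textbook form, with illustrative parameter values):

```python
def td_update(V, s, s_next, r, alpha=0.1, gamma=0.9):
    """One TD(0) step: the RPE is delta = r + gamma*V(s') - V(s)."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta  # value moves a fraction alpha toward the target
    return delta
```

In the dopamine analogy, `delta` is the quantity classically mapped onto phasic dopamine firing: positive when reward exceeds prediction, and near zero once a cue fully predicts the reward.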
Collapse
Affiliation(s)
- Talia N Lerner
- Feinberg School of Medicine and Department of Physiology, Northwestern University, Chicago, IL, USA; Northwestern University Interdepartmental Neuroscience Program, Chicago, IL, USA.
| | - Ashley L Holloway
- Feinberg School of Medicine and Department of Physiology, Northwestern University, Chicago, IL, USA; Northwestern University Interdepartmental Neuroscience Program, Chicago, IL, USA
| | - Jillian L Seiler
- Feinberg School of Medicine and Department of Physiology, Northwestern University, Chicago, IL, USA; Department of Psychology, University of Illinois at Chicago, Chicago, IL, USA
| |
Collapse
|
119
|
Mikhael JG, Lai L, Gershman SJ. Rational inattention and tonic dopamine. PLoS Comput Biol 2021; 17:e1008659. [PMID: 33760806 PMCID: PMC7990190 DOI: 10.1371/journal.pcbi.1008659] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2019] [Accepted: 12/28/2020] [Indexed: 11/27/2022] Open
Abstract
Slow-timescale (tonic) changes in dopamine (DA) contribute to a wide variety of processes in reinforcement learning, interval timing, and other domains. Furthermore, changes in tonic DA exert distinct effects depending on when they occur (e.g., during learning vs. performance) and what task the subject is performing (e.g., operant vs. classical conditioning). Two influential theories of tonic DA, the average reward theory and the Bayesian theory in which DA controls precision, have each been successful at explaining a subset of empirical findings. But how the same DA signal performs two seemingly distinct functions without creating crosstalk is not well understood. Here we reconcile the two theories under the unifying framework of 'rational inattention,' which (1) conceptually links average reward and precision, (2) outlines how DA manipulations affect this relationship, and in so doing, (3) captures new empirical phenomena. In brief, rational inattention asserts that agents can increase their precision in a task (and thus improve their performance) by paying a cognitive cost. Crucially, whether this cost is worth paying depends on average reward availability, reported by DA. The monotonic relationship between average reward and precision means that the DA signal contains the information necessary to retrieve the precision. When this information is needed after the task is performed, as presumed by Bayesian inference, acute manipulations of DA will bias behavior in predictable ways. We show how this framework reconciles a remarkably large collection of experimental findings. In reinforcement learning, the rational inattention framework predicts that learning from positive and negative feedback should be enhanced in high and low DA states, respectively, and that DA should tip the exploration-exploitation balance toward exploitation. In interval timing, this framework predicts that DA should increase the speed of the internal clock and decrease the extent of interference by other temporal stimuli during temporal reproduction (the central tendency effect). Finally, rational inattention makes the new predictions that these effects should be critically dependent on the controllability of rewards, that post-reward delays in intertemporal choice tasks should be underestimated, and that average reward manipulations should affect the speed of the clock, thus capturing empirical findings that are unexplained by either theory alone. Our results suggest that a common computational repertoire may underlie the seemingly heterogeneous roles of DA.
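The core cost-benefit argument, that optimal precision rises monotonically with average reward availability, can be sketched in toy form (the saturating performance curve and linear cost are illustrative assumptions, not the authors' model):

```python
import math

def optimal_precision(avg_reward, cost_per_unit=1.0):
    """Pick precision p maximizing avg_reward * perf(p) - cost(p) on a grid.

    perf(p) = 1 - exp(-p) saturates with precision while cost is linear in p,
    so the optimum grows with average reward (analytically, p* = ln(avg_reward)).
    """
    grid = [i / 100 for i in range(1, 500)]
    return max(grid, key=lambda p: avg_reward * (1 - math.exp(-p)) - cost_per_unit * p)
```

Under this toy objective, richer environments (higher `avg_reward`, the quantity DA is proposed to report) make higher precision worth its cognitive cost, which is the monotonic relationship the framework relies on.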
Collapse
Affiliation(s)
- John G. Mikhael
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- MD-PhD Program, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Lucy Lai
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
| |
Collapse
|
120
|
Birtalan E, Bánhidi A, Sanders JI, Balázsfi D, Hangya B. Efficient training of mice on the 5-choice serial reaction time task in an automated rodent training system. Sci Rep 2020; 10:22362. [PMID: 33349672 PMCID: PMC7752912 DOI: 10.1038/s41598-020-79290-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2020] [Accepted: 12/07/2020] [Indexed: 11/24/2022] Open
Abstract
Experiments aiming to understand sensory-motor systems, cognition and behavior necessitate training animals to perform complex tasks. Traditional training protocols require lab personnel to move the animals between home cages and training chambers, to start and end training sessions, and in some cases, to hand-control each training trial. Human labor not only limits the amount of training per day, but also introduces several sources of variability and may increase animal stress. Here we present an automated training system for the 5-choice serial reaction time task (5CSRTT), a classic rodent task often used to test sensory detection, sustained attention and impulsivity. We found that full automation without human intervention allowed rapid, cost-efficient training, and decreased stress as measured by corticosterone levels. Training breaks introduced only a transient drop in performance, and mice readily generalized across training systems when transferred from automated to manual protocols. We further validated our automated training system with wireless optogenetics and pharmacology experiments, expanding the breadth of experimental needs our system may fulfill. Our automated 5CSRTT system can serve as a prototype for fully automated behavioral training, with methods and principles transferrable to a range of rodent tasks.
Collapse
Affiliation(s)
- Eszter Birtalan
- Lendület Laboratory of Systems Neuroscience, Institute of Experimental Medicine, Budapest, Hungary
| | - Anita Bánhidi
- Lendület Laboratory of Systems Neuroscience, Institute of Experimental Medicine, Budapest, Hungary
| | | | - Diána Balázsfi
- Lendület Laboratory of Systems Neuroscience, Institute of Experimental Medicine, Budapest, Hungary.
| | - Balázs Hangya
- Lendület Laboratory of Systems Neuroscience, Institute of Experimental Medicine, Budapest, Hungary.
| |
Collapse
|
121
|
Lowet AS, Zheng Q, Matias S, Drugowitsch J, Uchida N. Distributional Reinforcement Learning in the Brain. Trends Neurosci 2020; 43:980-997. [PMID: 33092893 PMCID: PMC8073212 DOI: 10.1016/j.tins.2020.09.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2020] [Revised: 08/14/2020] [Accepted: 09/08/2020] [Indexed: 12/11/2022]
Abstract
Learning about rewards and punishments is critical for survival. Classical studies have demonstrated an impressive correspondence between the firing of dopamine neurons in the mammalian midbrain and the reward prediction errors of reinforcement learning algorithms, which express the difference between actual reward and predicted mean reward. However, it may be advantageous to learn not only the mean but also the complete distribution of potential rewards. Recent advances in machine learning have revealed a biologically plausible set of algorithms for reconstructing this reward distribution from experience. Here, we review the mathematical foundations of these algorithms as well as initial evidence for their neurobiological implementation. We conclude by highlighting outstanding questions regarding the circuit computation and behavioral readout of these distributional codes.
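The key algorithmic move reviewed here can be sketched as expectile-style distributional TD: a population of value estimators with asymmetric learning rates for positive versus negative prediction errors, so that "optimistic" and "pessimistic" units settle at different points of the reward distribution (a generic sketch with illustrative rates, not a specific model from the review):

```python
def distributional_step(values, reward, alpha_pos, alpha_neg):
    """Update unit i with rate alpha_pos[i] on positive prediction errors
    and alpha_neg[i] on negative ones."""
    new = []
    for v, ap, an in zip(values, alpha_pos, alpha_neg):
        delta = reward - v
        new.append(v + (ap if delta > 0 else an) * delta)
    return new

# Two units facing rewards of 1 or 0: the optimistic unit (larger alpha_pos)
# converges above the mean reward of 0.5, the pessimistic one below it.
values = [0.5, 0.5]
for _ in range(200):
    for r in (1.0, 0.0):
        values = distributional_step(values, r,
                                     alpha_pos=[0.3, 0.1],
                                     alpha_neg=[0.1, 0.3])
```

The spread of the converged values across such a population is what lets the distribution of rewards, not just its mean, be read out.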
Collapse
Affiliation(s)
- Adam S Lowet
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
| | - Qiao Zheng
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Sara Matias
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
| | - Jan Drugowitsch
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA.
| | - Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA.
| |
Collapse
|