1
Chen W, Liang J, Wu Q, Han Y. Anterior cingulate cortex provides the neural substrates for feedback-driven iteration of decision and value representation. Nat Commun 2024; 15:6020. PMID: 39019943; PMCID: PMC11255269; DOI: 10.1038/s41467-024-50388-9.
Abstract
Adjusting decision-making under uncertain and dynamic conditions is a hallmark of intelligence. It requires a system capable of converting feedback information into renewed internal values. The anterior cingulate cortex (ACC) is involved in error and reward events that prompt switching or maintenance of current decision strategies. However, it is unclear whether and how changes in stimulus-action mapping during behavioral adaptation are encoded, or how such computation drives decision adaptation. Here, we tracked ACC activity in male mice performing go/no-go auditory discrimination tasks with manipulated stimulus-reward contingencies. Individual ACC neurons integrate outcome information into the value representation of subsequent trials. Their dynamic recruitment determines the learning rate of error-guided value iteration and decision adaptation, forming a non-linear feedback-driven updating system that secures the appropriate decision switch. Optogenetically suppressing the ACC significantly slowed feedback-driven decision switching without interfering with execution of the established strategy.
Affiliation(s)
- Wenqi Chen
- Department of Neurobiology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Jiejunyi Liang
- State Key Laboratory of Intelligent Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
- Qiyun Wu
- State Key Laboratory of Intelligent Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
- Yunyun Han
- Department of Neurobiology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
2
Mendonça MD, da Silva JA, Hernandez LF, Castela I, Obeso J, Costa RM. Dopamine neuron activity encodes the length of upcoming contralateral movement sequences. Curr Biol 2024; 34:1034-1047.e4. PMID: 38377999; PMCID: PMC10931818; DOI: 10.1016/j.cub.2024.01.067.
Abstract
Dopaminergic neurons (DANs) in the substantia nigra pars compacta (SNc) have been related to movement speed, and loss of these neurons leads to bradykinesia in Parkinson's disease (PD). However, other aspects of movement vigor are also affected in PD; for example, movement sequences are typically shorter. Yet the relationship between the activity of DANs and the length of movement sequences is unknown. We imaged activity of SNc DANs in mice trained in a freely moving operant task that relies on individual forelimb sequences. We uncovered a similar proportion of SNc DANs increasing their activity before either ipsilateral or contralateral sequences. However, the magnitude of this activity was higher for contralateral actions and was related to contralateral, but not ipsilateral, sequence length. In contrast, the activity of reward-modulated DANs, largely distinct from those modulated by movement, was not lateralized. Finally, unilateral dopamine depletion reduced contralateral, but not ipsilateral, sequence length. These results indicate that movement-initiation DANs encode more than a general motivation signal and invigorate aspects of contralateral movements.
Affiliation(s)
- Marcelo D Mendonça
- Champalimaud Research, Champalimaud Foundation, 1400-038 Lisbon, Portugal; Champalimaud Clinical Centre, Champalimaud Foundation, 1400-038 Lisbon, Portugal; NOVA Medical School, Faculdade de Ciências Médicas, Universidade Nova de Lisboa, Lisbon 1169-056, Portugal
- Joaquim Alves da Silva
- Champalimaud Research, Champalimaud Foundation, 1400-038 Lisbon, Portugal; NOVA Medical School, Faculdade de Ciências Médicas, Universidade Nova de Lisboa, Lisbon 1169-056, Portugal
- Ledia F Hernandez
- HM CINAC, Centro Integral de Neurociencias AC, Fundación de Investigación HM Hospitales, Madrid 28938, Spain; Center for Networked Biomedical Research on Neurodegenerative Diseases (CIBERNED), Carlos III Institute of Health, Madrid 28029, Spain; Universidad CEU San Pablo, Madrid 28003, Spain
- Ivan Castela
- HM CINAC, Centro Integral de Neurociencias AC, Fundación de Investigación HM Hospitales, Madrid 28938, Spain; Center for Networked Biomedical Research on Neurodegenerative Diseases (CIBERNED), Carlos III Institute of Health, Madrid 28029, Spain; PhD Program in Neuroscience, Autonoma de Madrid University, Madrid 28029, Spain
- José Obeso
- HM CINAC, Centro Integral de Neurociencias AC, Fundación de Investigación HM Hospitales, Madrid 28938, Spain; Center for Networked Biomedical Research on Neurodegenerative Diseases (CIBERNED), Carlos III Institute of Health, Madrid 28029, Spain; Universidad CEU San Pablo, Madrid 28003, Spain; Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
- Rui M Costa
- Champalimaud Research, Champalimaud Foundation, 1400-038 Lisbon, Portugal; Departments of Neuroscience and Neurology, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA; Allen Institute, Seattle, WA 98109, USA; Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
3
Stetsenko A, Koos T. Neuronal implementation of the temporal difference learning algorithm in the midbrain dopaminergic system. Proc Natl Acad Sci U S A 2023; 120:e2309015120. PMID: 37903252; PMCID: PMC10636325; DOI: 10.1073/pnas.2309015120.
Abstract
The temporal difference learning (TDL) algorithm has been essential to conceptualizing the role of dopamine in reinforcement learning (RL). Despite its theoretical importance, it remains unknown whether a neuronal implementation of this algorithm exists in the brain. Here, we provide an interpretation of the recently described signaling properties of ventral tegmental area (VTA) GABAergic neurons and show that a circuitry of these neurons implements the TDL algorithm. Specifically, we identified the neuronal mechanism of three key components of the TDL model: a sustained state value signal encoded by an afferent input to the VTA; a temporal differentiation circuit formed by two types of VTA GABAergic neurons, whose combined output computes momentary reward prediction (RP) as the derivative of the state value; and the computation of reward prediction errors (RPEs) in dopamine neurons utilizing the output of the differentiation circuit. Using computational methods, we also show that this mechanism is optimally adapted to the biophysics of RPE signaling in dopamine neurons, mechanistically links the emergence of conditioned reinforcement to RP, and can naturally account for the temporal discounting of reinforcement. Elucidating the implementation of the TDL algorithm may further the investigation of RL in biological and artificial systems.
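The three TDL components named in this abstract map onto the textbook TD(0) update, which can be sketched in a few lines. This is a generic illustration, not the authors' circuit model; the chain length, learning rate (alpha), and discount factor (gamma) are arbitrary assumptions.

```python
import numpy as np

# Generic TD(0) sketch of the components named above: state values V,
# predictions formed from successive values, and the prediction error delta.
# Chain length, alpha, and gamma are illustrative assumptions.
def run_td(n_states=5, n_trials=200, alpha=0.1, gamma=0.95):
    V = np.zeros(n_states + 1)  # V[n_states] is the terminal state (value 0)
    rpe_first = rpe_last = None
    for trial in range(n_trials):
        rpes = []
        for s in range(n_states):
            r = 1.0 if s == n_states - 1 else 0.0  # reward on the final step
            delta = r + gamma * V[s + 1] - V[s]    # reward prediction error
            V[s] += alpha * delta                  # TD(0) value update
            rpes.append(delta)
        if trial == 0:
            rpe_first = list(rpes)
        rpe_last = list(rpes)
    return rpe_first, rpe_last

first, last = run_td()
print("first-trial RPEs:", np.round(first, 3))  # error sits at the reward
print("last-trial RPEs:", np.round(last, 3))    # reward is now predicted
```

Early in training the prediction error occurs at reward delivery; as the state values converge, the reward becomes predicted and the error there shrinks toward zero — the signature that TD-style models use to explain dopamine responses.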
Affiliation(s)
- Anya Stetsenko
- Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, NJ 07102
- Tibor Koos
- Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, NJ 07102
4
Ianni AM, Eisenberg DP, Boorman ED, Constantino SM, Hegarty CE, Gregory MD, Masdeu JC, Kohn PD, Behrens TE, Berman KF. PET-measured human dopamine synthesis capacity and receptor availability predict trading rewards and time-costs during foraging. Nat Commun 2023; 14:6122. PMID: 37777515; PMCID: PMC10542376; DOI: 10.1038/s41467-023-41897-0.
Abstract
Foraging behavior requires weighing costs of time to decide when to leave one reward patch to search for another. Computational and animal studies suggest that striatal dopamine is key to this process; however, the specific role of dopamine in human foraging behavior is not well characterized. We use positron emission tomography (PET) imaging to directly measure dopamine synthesis capacity and D1 and D2/3 receptor availability in 57 healthy adults who complete a computerized foraging task. Using voxelwise data and principal component analysis to identify patterns of variation across PET measures, we show that striatal D1 and D2/3 receptor availability and a pattern of mesolimbic and anterior cingulate cortex dopamine function are important for adjusting the threshold for leaving a patch to explore, with specific sensitivity to changes in travel time. These findings suggest a key role for dopamine in trading reward benefits against temporal costs to modulate behavioral adaptations to changes in the reward environment critical for foraging.
Affiliation(s)
- Angela M Ianni
- Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- Daniel P Eisenberg
- Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
- Erie D Boorman
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
- Sara M Constantino
- Department of Psychology, New York University, New York, NY, USA
- School of Public Policy and Urban Affairs, Northeastern University, Boston, MA, USA
- Department of Psychology, Northeastern University, Boston, MA, USA
- School of Public and International Affairs, Princeton University, Princeton, NJ, USA
- Catherine E Hegarty
- Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
- Michael D Gregory
- Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
- Joseph C Masdeu
- Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
- Houston Methodist Institute for Academic Medicine, Houston, TX, USA
- Weill Cornell Medicine, New York, NY, USA
- Philip D Kohn
- Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
- Timothy E Behrens
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
- Karen F Berman
- Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
5
Takahashi YK, Zhang Z, Montesinos-Cartegena M, Kahnt T, Langdon AJ, Schoenbaum G. Expectancy-related changes in firing of dopamine neurons depend on hippocampus. bioRxiv 2023:2023.07.19.549728. PMID: 37781610; PMCID: PMC10541105; DOI: 10.1101/2023.07.19.549728.
Abstract
The orbitofrontal cortex (OFC) and hippocampus (HC) are both implicated in forming the cognitive or task maps that support flexible behavior. Previously, we used the dopamine neurons as a sensor or tool to measure the functional effects of OFC lesions (Takahashi et al., 2011). We recorded midbrain dopamine neurons as rats performed an odor-based choice task, in which errors in the prediction of reward were induced by manipulating the number or timing of the expected rewards across blocks of trials. We found that OFC lesions ipsilateral to the recording electrodes degraded prediction errors, consistent with a loss in the resolution of the task states, particularly under conditions where hidden information was critical to sharpening the predictions. Here we have repeated this experiment, along with computational modeling of the results, in rats with ipsilateral HC lesions. The results show that the HC also shapes the task map; however, unlike the OFC, which provides information local to the trial, the HC appears necessary for estimating upper-level hidden states from information that is discontinuous or separated by longer timescales. The results contrast the respective roles of the OFC and HC in cognitive mapping and add to evidence that the dopamine neurons access a rich information set from distributed regions regarding the predictive structure of the environment, potentially enabling this powerful teaching signal to support complex learning and behavior.
Affiliation(s)
- Yuji K Takahashi
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD
- Zhewei Zhang
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD
- Thorsten Kahnt
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD
- Angela J Langdon
- Intramural Research Program, National Institute of Mental Health, Bethesda, MD
- Geoffrey Schoenbaum
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD
6
Odland AU, Sandahl R, Andreasen JT. Chronic corticosterone improves perseverative behavior in mice during sequential reversal learning. Behav Brain Res 2023; 450:114479. PMID: 37169127; DOI: 10.1016/j.bbr.2023.114479.
Abstract
BACKGROUND: Stressful life events can both trigger development of psychiatric disorders and promote positive behavioral changes in response to adversities. The relationship between stress and cognitive flexibility is complex, and conflicting effects of stress manifest in both humans and laboratory animals.
OBJECTIVE: To mirror the clinical situation, where stressful life events impair mental health or promote behavioral change, we examined the post-exposure effects of stress on cognitive flexibility in mice.
METHODS: We tested female C57BL/6JOlaHsd mice in the touchscreen-based sequential reversal learning test. Corticosterone (CORT) was used as a model of stress and was administered in the drinking water for two weeks before reversal learning. Control animals received drinking water without CORT. Supplementary behavioral tests were included to exclude non-specific confounding effects of CORT and improve interpretation of the results.
RESULTS: CORT-treated mice were similar to controls on all touchscreen parameters before reversal. During the low-accuracy phase of reversal learning, CORT reduced the perseveration index, a measure of perseverative responding, but did not affect acquisition of the new reward contingency. This effect was not related to non-specific deficits in chamber activity. CORT increased anxiety-like behavior in the elevated zero maze test and repetitive digging in the marble burying test, and reduced locomotor activity, but did not affect spontaneous alternation behavior.
CONCLUSION: CORT improved cognitive flexibility in the reversal learning test by extinguishing prepotent responses that were no longer rewarded, an effect possibly related to a stress-mediated increase in sensitivity to negative feedback that should be confirmed in a larger study.
Affiliation(s)
- Anna U Odland
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, DK-2100, Copenhagen, Denmark
- Rune Sandahl
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, DK-2100, Copenhagen, Denmark
- Jesper T Andreasen
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, DK-2100, Copenhagen, Denmark
7
Takahashi YK, Stalnaker TA, Mueller LE, Harootonian SK, Langdon AJ, Schoenbaum G. Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model. Nat Neurosci 2023; 26:830-839. PMID: 37081296; PMCID: PMC10646487; DOI: 10.1038/s41593-023-01310-x.
Abstract
Dopamine neuron activity is tied to the prediction error in temporal difference reinforcement learning models. These models make significant simplifying assumptions, particularly with regard to the structure of the predictions fed into the dopamine neurons, which consist of a single chain of timepoint states. Although this predictive structure can explain error signals observed in many studies, it cannot cope with settings where subjects might infer multiple independent events and outcomes. In the present study, we recorded dopamine neurons in the ventral tegmental area in such a setting to test the validity of the single-stream assumption. Rats were trained in an odor-based choice task, in which the timing and identity of one of several rewards delivered in each trial changed across trial blocks. This design revealed an error signaling pattern that requires the dopamine neurons to access and update multiple independent predictive streams reflecting the subject's belief about timing and potentially unique identities of expected rewards.
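As a toy contrast with the single-stream assumption discussed in this abstract, one can train a separate value chain per expected reward and sum their errors. This is purely illustrative and is not the authors' analysis; the trial length, reward times, learning rate, and undiscounted values are all assumptions.

```python
import numpy as np

# Toy contrast with the single-stream assumption (illustrative only): one
# TD value chain per expected reward, each trained independently.
T, ALPHA = 10, 0.2
reward_times = {"A": 3, "B": 7}  # two rewards with distinct timings

def train_streams(streams, n_trials=300):
    V = {k: np.zeros(T + 1) for k in streams}  # one value chain per stream
    for _ in range(n_trials):
        for k, t_r in streams.items():
            for t in range(T):
                r = 1.0 if t == t_r else 0.0
                delta = r + V[k][t + 1] - V[k][t]  # per-stream TD error
                V[k][t] += ALPHA * delta
    return V

V = train_streams(reward_times)
# Each stream's value is elevated only up to its own reward time, so a
# change in one reward's timing perturbs only that stream's predictions.
print(np.round(V["A"][:T], 2))
print(np.round(V["B"][:T], 2))
```

Because each stream learns only its own reward's timing, shifting one reward's delivery time perturbs that stream's predictions and errors while leaving the other untouched — the kind of independence a single pooled chain of timepoint states cannot express.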
Affiliation(s)
- Yuji K Takahashi
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
- Thomas A Stalnaker
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
- Lauren E Mueller
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
- Angela J Langdon
- Intramural Research Program, National Institute of Mental Health, Bethesda, MD, USA
- Geoffrey Schoenbaum
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
8
Banerjee A, Wang BA, Teutsch J, Helmchen F, Pleger B. Analogous cognitive strategies for tactile learning in the rodent and human brain. Prog Neurobiol 2023; 222:102401. PMID: 36608783; DOI: 10.1016/j.pneurobio.2023.102401.
Abstract
Evolution has molded individual species' sensory capacities and abilities. In rodents, which mostly inhabit dark tunnels and burrows, the whisker-based somatosensory system has developed as the dominant sensory modality, essential for environmental exploration and spatial navigation. In contrast, humans rely more on visual and auditory inputs when collecting information from their surrounding sensory space in everyday life. As a result of such species-specific differences in sensory dominance, cognitive relevance, and capacities, the evidence for analogous sensory-cognitive mechanisms across species remains sparse. However, recent research in rodents and humans has yielded surprisingly comparable processing rules for detecting tactile stimuli, integrating touch information into percepts, and goal-directed rule learning. Here, we review how the brain, across species, harnesses such processing rules to establish decision-making during tactile learning, following canonical circuits from the thalamus and the primary somatosensory cortex up to the frontal cortex. We discuss concordances between empirical and computational evidence from micro- and mesoscopic circuit studies in rodents and findings from macroscopic imaging in humans. Furthermore, we discuss the relevance and challenges for future cross-species research in addressing mutual context-dependent evaluation processes underpinning perceptual learning.
Affiliation(s)
- Abhishek Banerjee
- Adaptive Decisions Lab, Biosciences Institute, Newcastle University, United Kingdom
- Bin A Wang
- Department of Neurology, BG University Hospital Bergmannsheil, Ruhr University Bochum, Germany; Collaborative Research Centre 874 "Integration and Representation of Sensory Processes", Ruhr University Bochum, Germany
- Jasper Teutsch
- Adaptive Decisions Lab, Biosciences Institute, Newcastle University, United Kingdom
- Fritjof Helmchen
- Laboratory of Neural Circuit Dynamics, Brain Research Institute, University of Zürich, Switzerland
- Burkhard Pleger
- Department of Neurology, BG University Hospital Bergmannsheil, Ruhr University Bochum, Germany; Collaborative Research Centre 874 "Integration and Representation of Sensory Processes", Ruhr University Bochum, Germany
9
Bosulu J, Allaire MA, Tremblay-Grénier L, Luo Y, Eickhoff S, Hétu S. "Wanting" versus "needing" related value: An fMRI meta-analysis. Brain Behav 2022; 12:e32713. PMID: 36000558; PMCID: PMC9480935; DOI: 10.1002/brb3.2713.
Abstract
Consumption and its excesses are sometimes explained by an imbalance of need or a lack of control over "wanting." "Wanting" assigns value to cues that predict rewards, whereas "needing" assigns value to biologically significant stimuli that one is deprived of. Here we aimed to study how brain activation patterns related to the value of "wanted" stimuli differ from those of "needed" stimuli, using activation likelihood estimation neuroimaging meta-analysis approaches. We used the perception of a cue predicting a reward as a model for "wanting"-related value and the perception of food stimuli in a hungry state as a model for "needing"-related value. We carried out separate, contrast, and conjunction meta-analyses to identify differences and similarities between "wanting" and "needing" values. Our overall results for "wanting"-related value show consistent activation of the ventral tegmental area, striatum, and pallidum, regions that both activate behavior and direct choice, whereas for "needing"-related value we found overall consistent activation of the middle insula and, to some extent, the caudal-ventral putamen, regions that only direct choice. Our study suggests that wanting exerts more control over consumption and behavioral activation.
Affiliation(s)
- Juvenal Bosulu
- Faculté Des Arts et des Sciences, Université de Montréal, Montréal, Canada
- Yi Luo
- School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
- Simon Eickhoff
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany; Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- Sébastien Hétu
- Faculté Des Arts et des Sciences, Université de Montréal, Montréal, Canada
10
de Jong JW, Fraser KM, Lammel S. Mesoaccumbal Dopamine Heterogeneity: What Do Dopamine Firing and Release Have to Do with It? Annu Rev Neurosci 2022; 45:109-129. PMID: 35226827; PMCID: PMC9271543; DOI: 10.1146/annurev-neuro-110920-011929.
Abstract
Ventral tegmental area (VTA) dopamine (DA) neurons are often thought to uniformly encode reward prediction errors. Conversely, DA release in the nucleus accumbens (NAc), the prominent projection target of these neurons, has been implicated in reinforcement learning, motivation, aversion, and incentive salience. This contrast between heterogeneous functions of DA release versus a homogeneous role for DA neuron activity raises numerous questions regarding how VTA DA activity translates into NAc DA release. Further complicating this issue is increasing evidence that distinct VTA DA projections into defined NAc subregions mediate diverse behavioral functions. Here, we evaluate evidence for heterogeneity within the mesoaccumbal DA system and argue that frameworks of DA function must incorporate the precise topographic organization of VTA DA neurons to clarify their contribution to health and disease.
Affiliation(s)
- Johannes W de Jong
- Department of Molecular and Cell Biology and Helen Wills Neuroscience Institute, University of California, Berkeley, California, USA
- Kurt M Fraser
- Department of Molecular and Cell Biology and Helen Wills Neuroscience Institute, University of California, Berkeley, California, USA
- Stephan Lammel
- Department of Molecular and Cell Biology and Helen Wills Neuroscience Institute, University of California, Berkeley, California, USA
11
Kaushik P, Naudé J, Raju SB, Alexandre F. A VTA GABAergic computational model of dissociated reward prediction error computation in classical conditioning. Neurobiol Learn Mem 2022; 193:107653. PMID: 35772681; DOI: 10.1016/j.nlm.2022.107653.
Abstract
Classical conditioning is a fundamental learning mechanism in which the ventral striatum is generally thought to be the source of inhibition to ventral tegmental area (VTA) dopamine neurons when a reward is expected. However, recent evidence points to a new candidate, VTA GABA neurons, for encoding the expectation used to compute the reward prediction error in the VTA. In this system-level computational model, the VTA GABA signal is hypothesised to be a combination of magnitude and timing signals, computed in the pedunculopontine nucleus and the ventral striatum, respectively. This dissociation enables the model to explain recent results in which ventral striatum lesions affected the temporal expectation of the reward while the magnitude of the reward remained intact. The model also exhibits other features of classical conditioning, namely progressively decreasing firing for early rewards closer to the actual reward, twin peaks of VTA dopamine during training, and cancellation of the US dopamine response after training.
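The dissociation described in this abstract — an expectation built from separate magnitude and timing components — can be caricatured in a few lines. This toy is not the published model; the trial length, reward time, and the uniform "diffused timing" lesion are invented purely for illustration.

```python
import numpy as np

# Toy dissociation (illustrative only, not the published model): the
# expectation subtracted from the reward is a magnitude signal multiplied
# by a timing signal; "lesioning" timing diffuses it uniformly over the trial.
T = 10
reward_time, reward_mag = 6, 1.0

def expectation(timing_intact):
    mag = np.full(T, reward_mag)       # magnitude component (PPN-like)
    timing = np.zeros(T)
    timing[reward_time] = 1.0          # timing component (VStr-like)
    if not timing_intact:
        timing = np.full(T, 1.0 / T)   # lesion: timing information diffused
    return mag * timing

def rpe(timing_intact):
    r = np.zeros(T)
    r[reward_time] = reward_mag        # actual reward delivery
    return r - expectation(timing_intact)

intact, lesion = rpe(True), rpe(False)
print(intact[reward_time], lesion[reward_time])
```

With the timing component intact, the expectation cancels the reward exactly at its expected time; with it lesioned, cancellation is partial and smeared across the trial while the total expected magnitude is preserved — mirroring the lesion result the abstract describes.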
Affiliation(s)
- Pramod Kaushik
- International Institute of Information Technology, Hyderabad, India; Inria Bordeaux Sud-Ouest, Talence, France
- Jérémie Naudé
- Institut de Génomique Fonctionnelle, Université Montpellier, Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Montpellier, France
- Frédéric Alexandre
- Inria Bordeaux Sud-Ouest, Talence, France; LaBRI, University of Bordeaux, Bordeaux INP, CNRS, UMR 5800, Talence, France; Institute of Neurodegenerative Diseases, University of Bordeaux, CNRS, UMR 5293, Bordeaux, France
12
Winkelmeier L, Filosa C, Hartig R, Scheller M, Sack M, Reinwald JR, Becker R, Wolf D, Gerchen MF, Sartorius A, Meyer-Lindenberg A, Weber-Fahr W, Clemm von Hohenberg C, Russo E, Kelsch W. Striatal hub of dynamic and stabilized prediction coding in forebrain networks for olfactory reinforcement learning. Nat Commun 2022; 13:3305. PMID: 35676281; PMCID: PMC9177857; DOI: 10.1038/s41467-022-30978-1.
Abstract
Identifying the circuits responsible for cognition and understanding their embedded computations is a challenge for neuroscience. We establish here a hierarchical cross-scale approach, from behavioral modeling and fMRI in task-performing mice to cellular recordings, in order to disentangle local network contributions to olfactory reinforcement learning. At mesoscale, fMRI identifies a functional olfactory-striatal network interacting dynamically with higher-order cortices. While primary olfactory cortices respectively contribute only some value components, the downstream olfactory tubercle of the ventral striatum expresses comprehensively reward prediction, its dynamic updating, and prediction error components. In the tubercle, recordings reveal two underlying neuronal populations with non-redundant reward prediction coding schemes. One population collectively produces stabilized predictions as distributed activity across neurons; in the other, neurons encode value individually and dynamically integrate the recent history of uncertain outcomes. These findings validate a cross-scale approach to mechanistic investigations of higher cognitive functions in rodents. Where and how the brain learns from experience is not fully understood. Here the authors use a hierarchical approach, from behavioural modelling and systems fMRI to cellular coding, to reveal brain mechanisms for history-informed updating of future predictions.
Affiliation(s)
- Laurens Winkelmeier
- Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Carla Filosa
- Department of Psychiatry and Psychotherapy, University Medical Center, Johannes Gutenberg University, 55131, Mainz, Germany
- Renée Hartig
- Department of Psychiatry and Psychotherapy, University Medical Center, Johannes Gutenberg University, 55131, Mainz, Germany
- Max Scheller
- Department of Psychiatry and Psychotherapy, University Medical Center, Johannes Gutenberg University, 55131, Mainz, Germany
- Markus Sack
- Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Jonathan R Reinwald
- Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Robert Becker
- Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- David Wolf
- Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Martin Fungisai Gerchen
- Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Alexander Sartorius
- Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Andreas Meyer-Lindenberg
- Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Wolfgang Weber-Fahr
- Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Eleonora Russo
- Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany; Department of Psychiatry and Psychotherapy, University Medical Center, Johannes Gutenberg University, 55131, Mainz, Germany
- Wolfgang Kelsch
- Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany; Department of Psychiatry and Psychotherapy, University Medical Center, Johannes Gutenberg University, 55131, Mainz, Germany
13
Song M, Takahashi YK, Burton AC, Roesch MR, Schoenbaum G, Niv Y, Langdon AJ. Minimal cross-trial generalization in learning the representation of an odor-guided choice task. PLoS Comput Biol 2022; 18:e1009897. [PMID: 35333867 PMCID: PMC8986096 DOI: 10.1371/journal.pcbi.1009897]
Abstract
There is no single way to represent a task. Indeed, despite experiencing the same task events and contingencies, different subjects may form distinct task representations. As experimenters, we often assume that subjects represent the task as we envision it. However, such a representation cannot be taken for granted, especially in animal experiments where we cannot deliver explicit instruction regarding the structure of the task. Here, we tested how rats represent an odor-guided choice task in which two odor cues indicated which of two responses would lead to reward, whereas a third odor indicated free choice between the two responses. A parsimonious task representation would allow animals to learn from the forced trials which option is better to choose on the free-choice trials. However, animals may not necessarily generalize across odors in this way. We fit reinforcement-learning models that use different task representations to the trial-by-trial choice behavior of individual rats performing this task, and quantified the degree to which each animal used the more parsimonious representation, generalizing across trial types. Model comparison revealed that most rats did not acquire this representation despite extensive experience. Our results demonstrate the importance of formally testing possible task representations that can afford the observed behavior, rather than assuming that animals' task representations abide by the generative task structure that governs the experimental design.

To study how animals learn and make decisions, scientists design experiments, train animals to perform them, and observe how they behave. During this process, an important but rarely asked question is how animals understand the experiment. Merely by observing animals' behavior in a task, it is often hard to determine whether they understand the task in the way the experimenter expects. Assuming that animals represent tasks differently than they actually do may lead to incorrect interpretations of behavioral or neural results. Here, we compared different possible representations for a simple reward-learning task in terms of how well these alternative models explain animals' choice behavior. We found that rats did not represent the task in the most parsimonious way, thereby failing to learn from forced-choice trials what rewards are available on free-choice trials, despite extensive training on the task. These results caution against simply assuming that animals' understanding of a task corresponds to the way the task was designed.
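The model-comparison logic described here can be sketched with a toy softmax Q-learner fit under two candidate task representations: one sharing a single value table across forced and free trials (the generalizing representation), one keeping a separate table per trial type. This is an illustrative sketch only; the function name, fixed hyperparameters, and trial encoding are assumptions, not the authors' actual models.

```python
import numpy as np

def fit_loglik(choices, rewards, trial_types, shared, alpha=0.2, beta=3.0):
    """Log-likelihood of observed choices under a softmax Q-learner.

    shared=True  -> one value table reused across trial types
                    (the parsimonious, generalizing representation)
    shared=False -> an independent value table per trial type
                    (no cross-trial generalization)
    """
    q = {}   # representation id -> action values for two responses
    ll = 0.0
    for c, r, t in zip(choices, rewards, trial_types):
        qv = q.setdefault("all" if shared else t, np.zeros(2))
        p = np.exp(beta * qv) / np.exp(beta * qv).sum()  # softmax policy
        ll += np.log(p[c])
        qv[c] += alpha * (r - qv[c])  # delta-rule value update
    return ll
```

Fitting both variants to each rat's trial-by-trial choices and comparing (penalized) log-likelihoods implements the representation test at the heart of the study.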
Affiliation(s)
- Mingyu Song
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- * E-mail: (MS); (AJL)
- Yuji K. Takahashi
- National Institute on Drug Abuse Intramural Research Program, NIH, Baltimore, Maryland, United States of America
- Amanda C. Burton
- Department of Psychology, University of Maryland, College Park, Maryland, United States of America
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, Maryland, United States of America
- Matthew R. Roesch
- Department of Psychology, University of Maryland, College Park, Maryland, United States of America
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, Maryland, United States of America
- Geoffrey Schoenbaum
- National Institute on Drug Abuse Intramural Research Program, NIH, Baltimore, Maryland, United States of America
- Yael Niv
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- Department of Psychology, Princeton University, Princeton, New Jersey, United States of America
- Angela J. Langdon
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- Department of Psychology, Princeton University, Princeton, New Jersey, United States of America
- * E-mail: (MS); (AJL)
14
Hollon NG, Williams EW, Howard CD, Li H, Traut TI, Jin X. Nigrostriatal dopamine signals sequence-specific action-outcome prediction errors. Curr Biol 2021; 31:5350-5363.e5. [PMID: 34637751 DOI: 10.1016/j.cub.2021.09.040]
Abstract
Dopamine has been suggested to encode cue-reward prediction errors during Pavlovian conditioning, signaling discrepancies between the actual versus expected reward predicted by the cues.1-5 While this theory has been widely applied to reinforcement learning concerning instrumental actions, whether dopamine represents action-outcome prediction errors, and how it controls sequential behavior, remain largely unknown. The vast majority of previous studies examining dopamine responses have used discrete reward-predictive stimuli,1-15 whether Pavlovian conditioned stimuli for which no action is required to earn reward or explicit discriminative stimuli that essentially instruct an animal how and when to respond for reward. Here, by training mice to perform optogenetic intracranial self-stimulation, we examined how self-initiated goal-directed behavior influences nigrostriatal dopamine transmission during single and sequential instrumental actions, in behavioral contexts with minimal overt changes in the animal's external environment. We found that dopamine release evoked by direct optogenetic stimulation was dramatically reduced when delivered as the consequence of the animal's own action, relative to non-contingent passive stimulation. This dopamine suppression generalized to food rewards, was specific to the reinforced action, was temporally restricted to counteract the expected outcome, and exhibited sequence-selectivity consistent with hierarchical control of sequential behavior. These findings demonstrate that nigrostriatal dopamine signals sequence-specific prediction errors in action-outcome associations, with fundamental implications for reinforcement learning and instrumental behavior in health and disease.
Affiliation(s)
- Nick G Hollon
- Molecular Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Elora W Williams
- Molecular Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Christopher D Howard
- Molecular Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Hao Li
- Molecular Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Tavish I Traut
- Molecular Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Xin Jin
- Molecular Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA; Center for Motor Control and Disease, Key Laboratory of Brain Functional Genomics, East China Normal University, Shanghai 200062, China; NYU-ECNU Institute of Brain and Cognitive Science, New York University Shanghai, Shanghai 200062, China
15
Kutlu MG, Zachry JE, Melugin PR, Cajigas SA, Chevee MF, Kelly SJ, Kutlu B, Tian L, Siciliano CA, Calipari ES. Dopamine release in the nucleus accumbens core signals perceived saliency. Curr Biol 2021; 31:4748-4761.e8. [PMID: 34529938 PMCID: PMC9084920 DOI: 10.1016/j.cub.2021.08.052]
Abstract
A large body of work has aimed to define the precise information encoded by dopaminergic projections innervating the nucleus accumbens (NAc). Prevailing models are based on reward prediction error (RPE) theory, in which dopamine updates associations between rewards and predictive cues by encoding perceived errors between predictions and outcomes. However, RPE cannot describe multiple phenomena to which dopamine is inextricably linked, such as behavior driven by aversive and neutral stimuli. We combined a series of behavioral tasks with direct, subsecond dopamine monitoring in the NAc of mice, machine learning, computational modeling, and optogenetic manipulations to describe behavior and related dopamine release patterns across multiple contingencies reinforced by differentially valenced outcomes. We show that dopamine release only conforms to RPE predictions in a subset of learning scenarios but fits valence-independent perceived saliency encoding across conditions. Here, we provide an extended, comprehensive framework for accumbal dopamine release in behavioral control.
Affiliation(s)
- Munir Gunes Kutlu
- Department of Pharmacology, Vanderbilt University, Nashville, TN 37232, USA
- Jennifer E Zachry
- Department of Pharmacology, Vanderbilt University, Nashville, TN 37232, USA
- Patrick R Melugin
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN 37232, USA
- Stephanie A Cajigas
- Department of Pharmacology, Vanderbilt University, Nashville, TN 37232, USA; Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN 37232, USA
- Maxime F Chevee
- Department of Pharmacology, Vanderbilt University, Nashville, TN 37232, USA
- Shannon J Kelly
- Department of Pharmacology, Vanderbilt University, Nashville, TN 37232, USA
- Banu Kutlu
- Department of Pharmacology, Vanderbilt University, Nashville, TN 37232, USA; Libraries Strategic Technologies, Penn State University Libraries, University Park, PA 16802, USA
- Lin Tian
- Department of Biochemistry and Molecular Medicine, University of California, Davis, Sacramento, CA 95817, USA
- Cody A Siciliano
- Department of Pharmacology, Vanderbilt University, Nashville, TN 37232, USA; Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN 37232, USA; Vanderbilt Center for Addiction Research, Vanderbilt University, Nashville, TN 37232, USA
- Erin S Calipari
- Department of Pharmacology, Vanderbilt University, Nashville, TN 37232, USA; Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN 37232, USA; Vanderbilt Center for Addiction Research, Vanderbilt University, Nashville, TN 37232, USA; Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232, USA; Department of Psychiatry and Behavioral Sciences, Vanderbilt University, Nashville, TN 37232, USA
16
Langdon A, Botvinick M, Nakahara H, Tanaka K, Matsumoto M, Kanai R. Meta-learning, social cognition and consciousness in brains and machines. Neural Netw 2021; 145:80-89. [PMID: 34735893 DOI: 10.1016/j.neunet.2021.10.004]
Abstract
The intersection between neuroscience and artificial intelligence (AI) research has created synergistic effects in both fields. While neuroscientific discoveries have inspired the development of AI architectures, new ideas and algorithms from AI research have produced new ways to study brain mechanisms. A well-known example is reinforcement learning (RL), which has stimulated neuroscience research on how animals learn to adjust their behavior to maximize reward. In this review article, we cover recent collaborative work between the two fields in the context of meta-learning and its extension to social cognition and consciousness. Meta-learning refers to the ability to learn how to learn: for example, learning to adjust the hyperparameters of existing learning algorithms, or to use existing models and knowledge to efficiently solve new tasks. This capability is important for making existing AI systems more adaptive and flexible, and since it is one of the areas where a gap remains between human performance and current AI systems, successful collaboration should produce new ideas and progress. Starting from the role of RL algorithms in driving neuroscience, we discuss recent developments in deep RL applied to modeling prefrontal cortex functions. From a broader perspective, we discuss the similarities and differences between social cognition and meta-learning, and conclude with speculations on the potential links between intelligence as endowed by model-based RL and consciousness. For future work, we highlight data efficiency, autonomy and intrinsic motivation as key research areas for advancing both fields.
Affiliation(s)
- Angela Langdon
- Princeton Neuroscience Institute, Princeton University, USA
- Matthew Botvinick
- DeepMind, London, UK; Gatsby Computational Neuroscience Unit, University College London, London, UK
- Keiji Tanaka
- RIKEN Center for Brain Science, Wako, Saitama, Japan
- Masayuki Matsumoto
- Division of Biomedical Science, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan; Graduate School of Comprehensive Human Sciences, University of Tsukuba, Ibaraki, Japan; Transborder Medical Research Center, University of Tsukuba, Ibaraki, Japan
17
Langdon AJ, Chaudhuri R. An evolving perspective on the dynamic brain: Notes from the Brain Conference on Dynamics of the brain: Temporal aspects of computation. Eur J Neurosci 2021; 53:3511-3524. [PMID: 32896026 PMCID: PMC7946155 DOI: 10.1111/ejn.14963]
Affiliation(s)
- Angela J. Langdon
- Princeton Neuroscience Institute & Department of Psychology, Princeton University, Princeton, NJ, USA
- Rishidev Chaudhuri
- Center for Neuroscience, Department of Mathematics and Department of Neurobiology, Physiology & Behavior, University of California, Davis, Davis, CA, USA
18
Iordanova MD, Yau JOY, McDannald MA, Corbit LH. Neural substrates of appetitive and aversive prediction error. Neurosci Biobehav Rev 2021; 123:337-351. [PMID: 33453307 PMCID: PMC7933120 DOI: 10.1016/j.neubiorev.2020.10.029]
Abstract
Prediction error, defined by the discrepancy between real and expected outcomes, lies at the core of associative learning. Behavioural investigations have provided evidence that prediction error up- and down-regulates associative relationships, and allocates attention to stimuli to enable learning. These behavioural advances have recently been followed by investigations into the neurobiological substrates of prediction error. In the present paper, we review neuroscience data obtained using causal and recording neural methods from a variety of key behavioural designs. We explore the neurobiology of both appetitive (reward) and aversive (fear) prediction error with a focus on the mesolimbic dopamine system, the amygdala, ventrolateral periaqueductal gray, hippocampus, cortex and locus coeruleus noradrenaline. New questions and avenues for research are considered.
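The core quantity reviewed here — prediction error as the discrepancy between received and expected outcomes, up- or down-regulating an association — is the delta rule of associative learning. A minimal sketch (function name, initial value, and learning rate are illustrative):

```python
def rescorla_wagner(trials, v0=0.0, alpha=0.5):
    """Update the associative strength V of a cue from prediction errors.

    delta = r - V: positive when the outcome is better than expected
    (strengthens the association), negative when worse (weakens it).
    """
    V, history = v0, []
    for r in trials:
        delta = r - V        # prediction error
        V += alpha * delta   # error-proportional update
        history.append(V)
    return history
```

With alpha = 0.5 and three rewarded trials, V climbs 0.5 → 0.75 → 0.875, approaching the outcome asymptotically as the error shrinks.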
Affiliation(s)
- Mihaela D Iordanova
- Department of Psychology/Centre for Studies in Behavioral Neurobiology, Concordia University, 7141 Sherbrooke St, Montreal, QC, H4B 1R6, Canada.
- Joanna Oi-Yue Yau
- School of Psychology, The University of New South Wales, UNSW Sydney, NSW, 2052, Australia.
- Michael A McDannald
- Department of Psychology & Neuroscience, Boston College, 140 Commonwealth Avenue, 514 McGuinn Hall, Chestnut Hill, MA, 02467, USA.
- Laura H Corbit
- Departments of Psychology and Cell and Systems Biology, University of Toronto, 100 St. George Street, Toronto, ON, M5S 3G3, Canada.
19
Sosa JLR, Buonomano D, Izquierdo A. The orbitofrontal cortex in temporal cognition. Behav Neurosci 2021; 135:154-164. [PMID: 34060872 DOI: 10.1037/bne0000430]
Abstract
One of the most important factors in decision-making is estimating the value of available options. Subregions of the prefrontal cortex, including the orbitofrontal cortex (OFC), have been deemed essential for this process. Value computations require a complex integration across numerous dimensions, including reward magnitude, effort, internal state, and time. The importance of the temporal dimension is well illustrated by temporal discounting tasks, in which subjects select between smaller-sooner and larger-later rewards. The specific role of the OFC in telling time and integrating temporal information into decision-making remains unclear. Based on the current literature, in this review we reevaluate current theories of OFC function, accounting for the influence of time. Incorporating temporal information into value estimation and decision-making requires distinct, yet interrelated, forms of temporal information, including the ability to tell time, represent time, create temporal expectations, and use this information for optimal decision-making in a wide range of tasks, including temporal discounting and wagering. We use the term "temporal cognition" to refer to the integrated use of these different aspects of temporal information. We suggest that the OFC may be a critical site for the integration of reward magnitude and delay, and thus important for temporal cognition.
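The smaller-sooner versus larger-later choice described above is standardly modeled with hyperbolic discounting, V = A / (1 + kD), where A is amount, D is delay, and k a subject-specific discount rate. A sketch under those standard assumptions (the k values and option encoding below are illustrative):

```python
def discounted_value(amount, delay, k=0.1):
    """Hyperbolic discounting: V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

def choose(smaller_sooner, larger_later, k=0.1):
    """Pick the option with the higher discounted value.

    Each option is an (amount, delay) pair; ties go to the sooner reward.
    """
    v_ss = discounted_value(*smaller_sooner, k)
    v_ll = discounted_value(*larger_later, k)
    return "sooner" if v_ss >= v_ll else "later"
```

A steep discounter (large k) takes the immediate reward; a shallow discounter waits for the larger one — the behavioral readout such tasks use to probe temporal cognition.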
Affiliation(s)
- Dean Buonomano
- Department of Psychology, University of California-Los Angeles
20
Inglis JB, Valentin VV, Ashby FG. Modulation of Dopamine for Adaptive Learning: A Neurocomputational Model. Comput Brain Behav 2021; 4:34-52. [PMID: 34151186 PMCID: PMC8210637 DOI: 10.1007/s42113-020-00083-x]
Abstract
There have been many proposals that learning rates in the brain are adaptive, in the sense that they increase or decrease depending on environmental conditions. The majority of these models are abstract and make no attempt to describe the neural circuitry that implements the proposed computations. This article describes a biologically detailed computational model that overcomes this shortcoming. Specifically, we propose a neural circuit that implements adaptive learning rates by modulating the gain on the dopamine response to reward prediction errors, and we model activity within this circuit at the level of spiking neurons. The model generates a dopamine signal that depends on the size of the tonically active dopamine neuron population and the phasic spike rate. The model was tested successfully against results from two single-neuron recording studies and a fast-scan cyclic voltammetry study. We conclude by discussing the general applicability of the model to dopamine mediated tasks that transcend the experimental phenomena it was initially designed to address.
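The adaptive-learning-rate idea — a gain on the prediction-error signal that rises after surprising outcomes and decays under stable ones — can be sketched with a Pearce-Hall-style associability term. This is a generic textbook scheme for illustration, not the spiking circuit model the paper describes, and all parameter names are assumptions:

```python
def adaptive_delta_rule(rewards, eta=0.3, gamma=0.5):
    """Delta rule with an adaptive gain on the prediction error.

    The effective learning rate is eta * assoc, where assoc tracks the
    recent unsigned prediction error: large surprises raise the gain,
    stable outcomes let it decay toward recent error magnitudes.
    """
    V, assoc = 0.0, 1.0
    trace = []
    for r in rewards:
        delta = r - V
        V += eta * assoc * delta                          # gain-modulated update
        assoc = (1 - gamma) * assoc + gamma * abs(delta)  # adapt the gain
        trace.append((round(V, 6), round(assoc, 6)))
    return trace
```

In a volatile environment the persistent surprises keep the gain high (fast learning); in a stable one the gain shrinks, stabilizing the value estimate — the behavior the dopamine-gain mechanism is proposed to implement.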
Affiliation(s)
- Jeffrey B Inglis
- Interdepartmental Graduate Program in Dynamical Neuroscience, University of California, Santa Barbara
- Vivian V Valentin
- Department of Psychological & Brain Sciences, University of California, Santa Barbara
- F Gregory Ashby
- Department of Psychological & Brain Sciences, University of California, Santa Barbara
21
A salience misattribution model for addictive-like behaviors. Neurosci Biobehav Rev 2021; 125:466-477. [PMID: 33657434 DOI: 10.1016/j.neubiorev.2021.02.039]
Abstract
Adapting to the changing environment is a key component of optimal decision-making. Internal models that accurately represent, and selectively update from, behaviorally relevant (salient) stimuli may facilitate adaptive behaviors. The anterior cingulate cortex (ACC) and dopaminergic systems may produce these adaptive internal models through selective updates from behaviorally relevant stimuli. Dysfunction of the ACC and dopaminergic systems could therefore produce misaligned internal models in which updates are disproportionate to the salience of the cues. Reduced adaptation is an aspect of addictive-like behaviors, and the ACC and dopaminergic systems typically exhibit dysfunction in drug-dependent individuals. We argue that ACC and dopaminergic dysfunction in dependent individuals may produce misaligned internal models such that drug-related stimuli are misattributed with a higher salience than non-drug-related stimuli. Hence, drug-related rewarding stimuli generate over-weighted updates to the internal model, while negative feedback and non-drug-related rewarding stimuli generate down-weighted updates. This misaligned internal model may therefore incorrectly reinforce maladaptive drug-related behaviors. We use the proposed framework to discuss ways behavior may be made more adaptive and how the framework may be supported or falsified experimentally.
22
Cannon JJ, Patel AD. How Beat Perception Co-opts Motor Neurophysiology. Trends Cogn Sci 2020; 25:137-150. [PMID: 33353800 DOI: 10.1016/j.tics.2020.11.002]
Abstract
Beat perception offers cognitive scientists an exciting opportunity to explore how cognition and action are intertwined in the brain even in the absence of movement. Many believe the motor system predicts the timing of beats, yet current models of beat perception do not specify how this is neurally implemented. Drawing on recent insights into the neurocomputational properties of the motor system, we propose that beat anticipation relies on action-like processes consisting of precisely patterned neural time-keeping activity in the supplementary motor area (SMA), orchestrated and sequenced by activity in the dorsal striatum. In addition to synthesizing recent advances in cognitive science and motor neuroscience, our framework provides testable predictions to guide future work.
Affiliation(s)
- Jonathan J Cannon
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Aniruddh D Patel
- Department of Psychology, Tufts University, Medford, MA, USA; Program in Brain, Mind, and Consciousness, Canadian Institute for Advanced Research (CIFAR), Toronto, Canada.
23
Wang D, Si W, Luo Y. A Biologically Inspired Behavior Control for the Unexpected Uncertainty With Motivated Developmental Network. IEEE Trans Cogn Dev Syst 2020. [DOI: 10.1109/tcds.2019.2953944]
24
A quantitative reward prediction error signal in the ventral pallidum. Nat Neurosci 2020; 23:1267-1276. [PMID: 32778791 DOI: 10.1038/s41593-020-0688-5]
Abstract
The nervous system is hypothesized to compute reward prediction errors (RPEs) to promote adaptive behavior. Correlates of RPEs have been observed in the midbrain dopamine system, but the extent to which RPE signals exist in other reward-processing regions is less well understood. In the present study, we quantified outcome history-based RPE signals in the ventral pallidum (VP), a basal ganglia region functionally linked to reward-seeking behavior. We trained rats to respond to reward-predicting cues, and we fit computational models to predict the firing rates of individual neurons at the time of reward delivery. We found that a subset of VP neurons encoded RPEs and did so more robustly than the nucleus accumbens, an input to the VP. VP RPEs predicted changes in task engagement, and optogenetic manipulation of the VP during reward delivery bidirectionally altered rats' subsequent reward-seeking behavior. Our data suggest a pivotal role for the VP in computing teaching signals that influence adaptive reward seeking.
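The "outcome history-based RPE" the authors quantify can be sketched as reward minus a running, exponentially weighted average of past outcomes; firing at reward delivery would then be regressed on this quantity. This is an illustrative stand-in for the fitted models, with an assumed prior expectation and learning rate:

```python
def history_rpes(outcomes, alpha=0.3, prior=0.5):
    """RPE at each reward delivery: outcome minus an expectation formed
    as an exponentially weighted average of previous outcomes.

    outcomes: sequence of 1 (reward) / 0 (omission) per trial.
    """
    expectation, rpes = prior, []
    for r in outcomes:
        rpes.append(r - expectation)                # signed prediction error
        expectation += alpha * (r - expectation)    # update for the next trial
    return rpes
```

Repeated rewards shrink the positive RPE trial by trial, and an omission after a run of rewards yields a negative RPE — the trial-history signature one would look for in reward-time firing rates.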
25
Oettl LL, Scheller M, Filosa C, Wieland S, Haag F, Loeb C, Durstewitz D, Shusterman R, Russo E, Kelsch W. Phasic dopamine reinforces distinct striatal stimulus encoding in the olfactory tubercle driving dopaminergic reward prediction. Nat Commun 2020; 11:3460. [PMID: 32651365 PMCID: PMC7351739 DOI: 10.1038/s41467-020-17257-7]
Abstract
The learning of stimulus-outcome associations allows for predictions about the environment. Ventral striatum and dopaminergic midbrain neurons form a larger network for generating reward prediction signals from sensory cues. Yet the network plasticity mechanisms that generate predictive signals in these distributed circuits have not been entirely clarified, and direct evidence of the underlying interregional assembly formation and information transfer is still missing. Here we show that phasic dopamine is sufficient to reinforce the distinctness of stimulus representations in the ventral striatum even in the absence of reward. Upon such reinforcement, striatal stimulus encoding gives rise to interregional assemblies that drive dopaminergic neurons during stimulus-outcome learning. These assemblies dynamically encode the predicted reward value of conditioned stimuli. Together, our data reveal that ventral striatal and midbrain reward networks form a reinforcing loop to generate reward prediction coding.

It is not entirely understood how network plasticity produces the coding of predicted value during stimulus-outcome learning. Here, the authors reveal a reinforcing loop in distributed limbic circuits, transforming sensory stimuli into reward prediction coding broadcast by dopamine neurons to the brain.
Affiliation(s)
- Lars-Lennart Oettl
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany; Sainsbury Wellcome Centre for Neural Circuits and Behaviour, London, W1T 4JG, UK
- Max Scheller
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Carla Filosa
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany; Department of Psychiatry and Psychotherapy, University Medical Center, Johannes Gutenberg University, 55131, Mainz, Germany
- Sebastian Wieland
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Franziska Haag
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Cathrin Loeb
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Daniel Durstewitz
- Department of Theoretical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Roman Shusterman
- Institute of Neuroscience, University of Oregon, Eugene, OR, 97403, USA
- Eleonora Russo
- Department of Theoretical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany
- Wolfgang Kelsch
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159, Mannheim, Germany; Department of Psychiatry and Psychotherapy, University Medical Center, Johannes Gutenberg University, 55131, Mainz, Germany
26
Széll A, Martínez-Bellver S, Hegedüs P, Hangya B. OPETH: Open Source Solution for Real-Time Peri-Event Time Histogram Based on Open Ephys. Front Neuroinform 2020; 14:21. [PMID: 32508613 PMCID: PMC7251067 DOI: 10.3389/fninf.2020.00021]
Abstract
Single cell electrophysiology remains one of the most widely used approaches of systems neuroscience. Decisions made by the experimenter during electrophysiology recording largely determine recording quality, duration of the project and value of the collected data. Therefore, online feedback aiding these decisions can lower monetary and time investment, and substantially speed up projects as well as allow novel studies otherwise not possible due to prohibitively low throughput. Real-time feedback is especially important in studies that involve optogenetic cell type identification by enabling a systematic search for neurons of interest. However, such tools are scarce and limited to costly commercial systems with high degree of specialization, which hitherto prevented wide-ranging benefits for the community. To address this, we present an open-source tool that enables online feedback during electrophysiology experiments and provides a Python interface for the widely used Open Ephys open source data acquisition system. Specifically, our software allows flexible online visualization of spike alignment to external events, called the online peri-event time histogram (OPETH). These external events, conveyed by digital logic signals, may indicate photostimulation time stamps for in vivo optogenetic cell type identification or the times of behaviorally relevant events during in vivo behavioral neurophysiology experiments. Therefore, OPETH allows real-time identification of genetically defined neuron types or behaviorally responsive populations. By allowing "hunting" for neurons of interest, OPETH significantly reduces experiment time and thus increases the efficiency of experiments that combine in vivo electrophysiology with behavior or optogenetic tagging of neurons.
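The peri-event time histogram that OPETH displays online is, at its core, spike times re-referenced to each event and binned into an event-averaged rate. A minimal offline sketch of that computation (NumPy-based; parameter names are illustrative and this is not OPETH's actual code):

```python
import numpy as np

def peth(spike_times, event_times, window=(-0.5, 1.0), bin_size=0.05):
    """Peri-event time histogram: average firing rate (Hz) around events.

    spike_times, event_times: 1-D arrays of times in seconds.
    Returns the left bin edges and the event-averaged rate per bin.
    """
    edges = np.arange(window[0], window[1] + bin_size, bin_size)
    counts = np.zeros(len(edges) - 1)
    for ev in event_times:
        # align spikes to this event and accumulate per-bin counts
        counts += np.histogram(spike_times - ev, bins=edges)[0]
    rate = counts / (len(event_times) * bin_size)  # spikes/s, averaged over events
    return edges[:-1], rate
```

With photostimulation pulse times as the events, a sharp short-latency peak in this histogram is the signature used for optogenetic identification of tagged neurons.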
Affiliation(s)
- András Széll
  - Lendület Laboratory of Systems Neuroscience, Institute of Experimental Medicine, Budapest, Hungary
- Sergio Martínez-Bellver
  - Lendület Laboratory of Systems Neuroscience, Institute of Experimental Medicine, Budapest, Hungary
  - Laboratory of Neural Circuitry, Faculty of Medicine and Dentistry, University of Valencia, Valencia, Spain
- Panna Hegedüs
  - Lendület Laboratory of Systems Neuroscience, Institute of Experimental Medicine, Budapest, Hungary
  - János Szentágothai Doctoral School of Neurosciences, Semmelweis University, Budapest, Hungary
- Balázs Hangya
  - Lendület Laboratory of Systems Neuroscience, Institute of Experimental Medicine, Budapest, Hungary

27
Ergo K, De Loof E, Verguts T. Reward Prediction Error and Declarative Memory. Trends Cogn Sci 2020; 24:388-397. [PMID: 32298624 DOI: 10.1016/j.tics.2020.02.009]
Abstract
Learning based on reward prediction error (RPE) was originally proposed in the context of nondeclarative memory. We postulate that RPE may support declarative memory as well. Indeed, recent years have witnessed a number of independent empirical studies reporting effects of RPE on declarative memory. We provide a brief overview of these studies, identify emerging patterns, and discuss open issues such as the role of signed versus unsigned RPEs in declarative learning.
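The RPE at issue is the classic delta rule: the difference between the obtained outcome and the current value estimate, a fraction of which (the learning rate) updates that estimate. A minimal sketch (illustrative; the names are ours, not the authors'):

```python
def rpe_updates(rewards, alpha=0.1, v0=0.0):
    """Delta-rule value learning: V <- V + alpha * (r - V).

    Returns the per-trial reward prediction errors and value estimates.
    """
    v, rpes, values = v0, [], []
    for r in rewards:
        delta = r - v          # signed RPE: outcome minus current expectation
        v += alpha * delta     # expectation moves a fraction alpha toward r
        rpes.append(delta)
        values.append(v)
    return rpes, values

# With a repeatedly delivered reward, the RPE shrinks trial by trial as the
# outcome becomes expected
rpes, values = rpe_updates([1.0] * 20)
```

Here `delta` is the signed RPE; the unsigned RPE discussed in the review (surprise regardless of direction) would simply be `abs(delta)`.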
Affiliation(s)
- Kate Ergo
  - Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, B-9000 Ghent, Belgium
- Esther De Loof
  - Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, B-9000 Ghent, Belgium
- Tom Verguts
  - Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, B-9000 Ghent, Belgium

28
Cheng Z, Cui R, Ge T, Yang W, Li B. Optogenetics: What it has uncovered in potential pathways of depression. Pharmacol Res 2020; 152:104596. [DOI: 10.1016/j.phrs.2019.104596]
29
Grabenhorst M, Michalareas G, Maloney LT, Poeppel D. The anticipation of events in time. Nat Commun 2019; 10:5802. [PMID: 31862912 PMCID: PMC6925136 DOI: 10.1038/s41467-019-13849-0]
Abstract
Humans anticipate events signaled by sensory cues. It is commonly assumed that two uncertainty parameters modulate the brain's capacity to predict: the hazard rate (HR) of event probability and the uncertainty in time estimation which increases with elapsed time. We investigate both assumptions by presenting event probability density functions (PDFs) in each of three sensory modalities. We show that perceptual systems use the reciprocal PDF and not the HR to model event probability density. We also demonstrate that temporal uncertainty does not necessarily grow with elapsed time but can also diminish, depending on the event PDF. Previous research identified neuronal activity related to event probability in multiple levels of the cortical hierarchy (sensory (V4), association (LIP), motor and other areas) proposing the HR as an elementary neuronal computation. Our results-consistent across vision, audition, and somatosensation-suggest that the neurobiological implementation of event anticipation is based on a different, simpler and more stable computation than HR: the reciprocal PDF of events in time.
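The two candidate computations compared here can be made concrete with a discretized worked example (an illustrative sketch using a uniform event-time PDF, not the authors' analysis code):

```python
import numpy as np

# Discretized uniform event-time PDF on [0, 1): 100 bins of width 0.01
t = np.linspace(0.0, 1.0, 101)[:-1]
dt = 0.01
pdf = np.ones_like(t)                    # f(t) = 1 everywhere on [0, 1)

# Hazard rate HR(t) = f(t) / (1 - F(t)): the probability the event occurs
# now given that it has not occurred yet. Even for a flat PDF it grows
# without bound as time elapses...
cdf = np.cumsum(pdf) * dt
hazard = pdf / (1.0 - cdf + 1e-12)       # small constant avoids 0-division

# ...whereas the reciprocal PDF, 1/f(t), the quantity the authors argue
# perceptual systems actually track, stays flat for this uniform PDF.
reciprocal_pdf = 1.0 / pdf
```

The divergence of the hazard near the end of the interval illustrates why the reciprocal PDF is the simpler and numerically more stable of the two computations.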
Affiliation(s)
- Matthias Grabenhorst
  - Neuroscience Department, Max-Planck-Institute for Empirical Aesthetics, Grüneburgweg 14, 60322, Frankfurt, Germany
- Georgios Michalareas
  - Neuroscience Department, Max-Planck-Institute for Empirical Aesthetics, Grüneburgweg 14, 60322, Frankfurt, Germany
- Laurence T Maloney
  - Department of Psychology, Center for Neural Science, 6 Washington Place, New York, NY, 10003, USA
- David Poeppel
  - Neuroscience Department, Max-Planck-Institute for Empirical Aesthetics, Grüneburgweg 14, 60322, Frankfurt, Germany
  - Department of Psychology, Center for Neural Science, 6 Washington Place, New York, NY, 10003, USA

30
Abstract
Midbrain dopamine signals are widely thought to report reward prediction errors that drive learning in the basal ganglia. However, dopamine has also been implicated in various probabilistic computations, such as encoding uncertainty and controlling exploration. Here, we show how these different facets of dopamine signalling can be brought together under a common reinforcement learning framework. The key idea is that multiple sources of uncertainty impinge on reinforcement learning computations: uncertainty about the state of the environment, the parameters of the value function and the optimal action policy. Each of these sources plays a distinct role in the prefrontal cortex-basal ganglia circuit for reinforcement learning and is ultimately reflected in dopamine activity. The view that dopamine plays a central role in the encoding and updating of beliefs brings the classical prediction error theory into alignment with more recent theories of Bayesian reinforcement learning.
Affiliation(s)
- Samuel J Gershman
  - Department of Psychology, Center for Brain Science, Harvard University, Cambridge, MA, USA
- Naoshige Uchida
  - Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA

31
Rusu SI, Pennartz CMA. Learning, memory and consolidation mechanisms for behavioral control in hierarchically organized cortico-basal ganglia systems. Hippocampus 2019; 30:73-98. [PMID: 31617622 PMCID: PMC6972576 DOI: 10.1002/hipo.23167]
Abstract
This article aims to provide a synthesis on the question of how brain structures cooperate to accomplish hierarchically organized behaviors, characterized by low‐level, habitual routines nested in larger sequences of planned, goal‐directed behavior. The functioning of a connected set of brain structures—prefrontal cortex, hippocampus, striatum, and dopaminergic mesencephalon—is reviewed in relation to two important distinctions: (a) goal‐directed as opposed to habitual behavior and (b) model‐based and model‐free learning. Recent evidence indicates that the orbitomedial prefrontal cortices not only subserve goal‐directed behavior and model‐based learning, but also code the "landscape" (task space) of behaviorally relevant variables. While the hippocampus stands out for its role in coding and memorizing world state representations, it is argued to function in model‐based learning but is not required for coding of action–outcome contingencies, illustrating that goal‐directed behavior is not congruent with model‐based learning. While the dorsolateral and dorsomedial striatum largely conform to the dichotomy between habitual versus goal‐directed behavior, ventral striatal functions go beyond this distinction. Next, we contextualize findings on coding of reward‐prediction errors by ventral tegmental dopamine neurons to suggest a broader role of mesencephalic dopamine cells, viz. in behavioral reactivity and signaling unexpected sensory changes. We hypothesize that goal‐directed behavior is hierarchically organized in interconnected cortico‐basal ganglia loops, where a limbic‐affective prefrontal‐ventral striatal loop controls action selection in a dorsomedial prefrontal–striatal loop, which in turn regulates activity in sensorimotor‐dorsolateral striatal circuits. This structure for behavioral organization requires alignment with mechanisms for memory formation and consolidation. We propose that frontal corticothalamic circuits form a high‐level loop for memory processing that initiates and temporally organizes nested activities in lower‐level loops, including the hippocampus and the ripple‐associated replay it generates. The evidence on hierarchically organized behavior converges with that on consolidation mechanisms in suggesting a frontal‐to‐caudal directionality in processing control.
Affiliation(s)
- Silviu I Rusu
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
  - Research Priority Program Brain and Cognition, University of Amsterdam, Amsterdam, The Netherlands
- Cyriel M A Pennartz
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
  - Research Priority Program Brain and Cognition, University of Amsterdam, Amsterdam, The Netherlands

32
Abstract
Arguably, the most difficult part of learning is deciding what to learn about. Should I associate the positive outcome of safely completing a street-crossing with the situation 'the car approaching the crosswalk was red' or with 'the approaching car was slowing down'? In this Perspective, we summarize our recent research into the computational and neural underpinnings of 'representation learning'-how humans (and other animals) construct task representations that allow efficient learning and decision-making. We first discuss the problem of learning what to ignore when confronted with too much information, so that experience can properly generalize across situations. We then turn to the problem of augmenting perceptual information with inferred latent causes that embody unobservable task-relevant information, such as contextual knowledge. Finally, we discuss recent findings regarding the neural substrates of task representations that suggest the orbitofrontal cortex represents 'task states', deploying them for decision-making and learning elsewhere in the brain.
33
Takahashi YK, Stalnaker TA, Marrero-Garcia Y, Rada RM, Schoenbaum G. Expectancy-Related Changes in Dopaminergic Error Signals Are Impaired by Cocaine Self-Administration. Neuron 2019; 101:294-306.e3. [PMID: 30653935 DOI: 10.1016/j.neuron.2018.11.025]
Abstract
Addiction is a disorder of behavioral control and learning. While this may reflect pre-existing propensities, drug use also clearly contributes by causing changes in outcome processing in prefrontal and striatal regions. This altered processing is associated with behavioral deficits, including changes in learning. These areas provide critical input to midbrain dopamine neurons regarding expected outcomes, suggesting that effects on learning may result from changes in dopaminergic error signaling. Here, we show that dopamine neurons recorded in rats that had self-administered cocaine failed to suppress firing on omission of an expected reward and exhibited lower amplitude and imprecisely timed increases in firing to an unexpected reward. Learning also appeared to have less of an effect on reward-evoked and cue-evoked firing in the cocaine-experienced rats. Overall, the changes are consistent with reduced fidelity of input regarding the expected outcomes, such as their size, timing, and overall value, because of cocaine use.
Affiliation(s)
- Yuji K Takahashi
  - Intramural Research Program of the National Institute on Drug Abuse, NIH, Baltimore, MD 21224, USA
- Thomas A Stalnaker
  - Intramural Research Program of the National Institute on Drug Abuse, NIH, Baltimore, MD 21224, USA
- Yasmin Marrero-Garcia
  - Intramural Research Program of the National Institute on Drug Abuse, NIH, Baltimore, MD 21224, USA
- Ray M Rada
  - Intramural Research Program of the National Institute on Drug Abuse, NIH, Baltimore, MD 21224, USA
- Geoffrey Schoenbaum
  - Intramural Research Program of the National Institute on Drug Abuse, NIH, Baltimore, MD 21224, USA
  - Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
  - Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA

34
34
|
Langdon AJ, Song M, Niv Y. Uncovering the 'state': Tracing the hidden state representations that structure learning and decision-making. Behav Processes 2019; 167:103891. [PMID: 31381985 DOI: 10.1016/j.beproc.2019.103891]
Abstract
We review the abstract concept of a 'state' - an internal representation posited by reinforcement learning theories to be used by an agent, whether animal, human or artificial, to summarize the features of the external and internal environment that are relevant for future behavior on a particular task. Armed with this summary representation, an agent can make decisions and perform actions to interact effectively with the world. Here, we review recent findings from the neurobiological and behavioral literature to ask: 'what is a state?' with respect to the internal representations that organize learning and decision making across a range of tasks. We find that state representations include information beyond a straightforward summary of the immediate cues in the environment, providing timing or contextual information from the recent or more distant past, which allows these additional factors to influence decision making and other goal-directed behaviors in complex and perhaps unexpected ways.
Affiliation(s)
- Angela J Langdon
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ, 08544, United States
- Mingyu Song
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ, 08544, United States
- Yael Niv
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ, 08544, United States

35
Paton JJ, Buonomano DV. The Neural Basis of Timing: Distributed Mechanisms for Diverse Functions. Neuron 2019; 98:687-705. [PMID: 29772201 DOI: 10.1016/j.neuron.2018.03.045]
Abstract
Timing is critical to most forms of learning, behavior, and sensory-motor processing. Converging evidence supports the notion that, precisely because of its importance across a wide range of brain functions, timing relies on intrinsic and general properties of neurons and neural circuits; that is, the brain uses its natural cellular and network dynamics to solve a diversity of temporal computations. Many circuits have been shown to encode elapsed time in dynamically changing patterns of neural activity-so-called population clocks. But temporal processing encompasses a wide range of different computations, and just as there are different circuits and mechanisms underlying computations about space, there are a multitude of circuits and mechanisms underlying the ability to tell time and generate temporal patterns.
Affiliation(s)
- Joseph J Paton
  - Champalimaud Research, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Dean V Buonomano
  - Departments of Neurobiology and Psychology and Brain Research Institute, Integrative Center for Learning and Memory, University of California, Los Angeles, Los Angeles, CA, USA

36
Lim TV, Cardinal RN, Savulich G, Jones PS, Moustafa AA, Robbins TW, Ersche KD. Impairments in reinforcement learning do not explain enhanced habit formation in cocaine use disorder. Psychopharmacology (Berl) 2019; 236:2359-2371. [PMID: 31372665 PMCID: PMC6695345 DOI: 10.1007/s00213-019-05330-z]
Abstract
RATIONALE Drug addiction has been suggested to develop through drug-induced changes in learning and memory processes. Whilst the initiation of drug use is typically goal-directed and hedonically motivated, over time, drug-taking may develop into a stimulus-driven habit, characterised by persistent use of the drug irrespective of the consequences. Converging lines of evidence suggest that stimulant drugs facilitate the transition of goal-directed into habitual drug-taking, but their contribution to goal-directed learning is less clear. Computational modelling may provide an elegant means for elucidating changes during instrumental learning that may explain enhanced habit formation. OBJECTIVES We used formal reinforcement learning algorithms to deconstruct the process of appetitive instrumental learning and to explore potential associations between goal-directed and habitual actions in patients with cocaine use disorder (CUD). METHODS We re-analysed appetitive instrumental learning data in 55 healthy control volunteers and 70 CUD patients by applying a reinforcement learning model within a hierarchical Bayesian framework. We used a regression model to determine the influence of learning parameters and variations in brain structure on subsequent habit formation. RESULTS Poor instrumental learning performance in CUD patients was largely determined by difficulties with learning from feedback, as reflected by a significantly reduced learning rate. Subsequent formation of habitual response patterns was partly explained by group status and individual variation in reinforcement sensitivity. White matter integrity within goal-directed networks was only associated with performance parameters in controls but not in CUD patients. CONCLUSIONS Our data indicate that impairments in reinforcement learning are insufficient to account for enhanced habitual responding in CUD.
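The kind of model fitted here can be sketched in miniature: a delta-rule learner whose choices are governed by a softmax. The snippet below is a deliberately simplified, non-hierarchical stand-in for the authors' hierarchical Bayesian model (all names and parameter values are ours), showing the two parameters the study interprets, a learning rate and a reinforcement-sensitivity (inverse temperature) term.

```python
import math
import random

def simulate(alpha=0.3, beta=3.0, p_reward=(0.8, 0.2), n_trials=200, seed=0):
    """Two-armed bandit agent: delta-rule values + softmax choice.

    alpha = learning rate (feedback-driven updating); beta = reinforcement
    sensitivity. Returns the fraction of choices of the better arm (arm 0).
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]
    chose_best = 0
    for _ in range(n_trials):
        # softmax over the two action values: higher beta -> choices track
        # value differences more sharply
        p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        a = 1 if rng.random() < p1 else 0
        r = 1.0 if rng.random() < p_reward[a] else 0.0
        q[a] += alpha * (r - q[a])   # delta-rule update of the chosen value
        chose_best += (a == 0)
    return chose_best / n_trials

performance = simulate()
```

Estimating such parameters per participant is what allows performance deficits to be attributed specifically to a reduced learning rate rather than to reduced reinforcement sensitivity, as the abstract reports for the CUD group.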
Affiliation(s)
- T V Lim
  - Departments of Psychiatry, Psychology and Clinical Neurosciences, University of Cambridge, Herchel Smith Building for Brain & Mind Sciences, Cambridge Biomedical Campus, Cambridge, CB2 0SZ, UK
- R N Cardinal
  - Departments of Psychiatry, Psychology and Clinical Neurosciences, University of Cambridge, Herchel Smith Building for Brain & Mind Sciences, Cambridge Biomedical Campus, Cambridge, CB2 0SZ, UK
  - Behavioural and Clinical Neurosciences Institute, University of Cambridge, Cambridge, UK
  - Liaison Psychiatry Service, Cambridgeshire & Peterborough NHS Foundation Trust, Box 190, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
- G Savulich
  - Departments of Psychiatry, Psychology and Clinical Neurosciences, University of Cambridge, Herchel Smith Building for Brain & Mind Sciences, Cambridge Biomedical Campus, Cambridge, CB2 0SZ, UK
  - Behavioural and Clinical Neurosciences Institute, University of Cambridge, Cambridge, UK
- P S Jones
  - Departments of Psychiatry, Psychology and Clinical Neurosciences, University of Cambridge, Herchel Smith Building for Brain & Mind Sciences, Cambridge Biomedical Campus, Cambridge, CB2 0SZ, UK
- A A Moustafa
  - School of Social Sciences and Psychology, MARCS Institute for Brain and Behaviour, Western Sydney University, Sydney, NSW, Australia
- T W Robbins
  - Departments of Psychiatry, Psychology and Clinical Neurosciences, University of Cambridge, Herchel Smith Building for Brain & Mind Sciences, Cambridge Biomedical Campus, Cambridge, CB2 0SZ, UK
  - Behavioural and Clinical Neurosciences Institute, University of Cambridge, Cambridge, UK
- K D Ersche
  - Departments of Psychiatry, Psychology and Clinical Neurosciences, University of Cambridge, Herchel Smith Building for Brain & Mind Sciences, Cambridge Biomedical Campus, Cambridge, CB2 0SZ, UK
  - Behavioural and Clinical Neurosciences Institute, University of Cambridge, Cambridge, UK

37
Drieu C, Zugaro M. Hippocampal Sequences During Exploration: Mechanisms and Functions. Front Cell Neurosci 2019; 13:232. [PMID: 31263399 PMCID: PMC6584963 DOI: 10.3389/fncel.2019.00232]
Abstract
Although the hippocampus plays a critical role in spatial and episodic memories, the mechanisms underlying memory formation, stabilization, and recall for adaptive behavior remain relatively unknown. During exploration, within single cycles of the ongoing theta rhythm that dominates hippocampal local field potentials, place cells form precisely ordered sequences of activity. These neural sequences result from the integration of both external inputs conveying sensory-motor information, and intrinsic network dynamics possibly related to memory processes. Their endogenous replay during subsequent sleep is critical for memory consolidation. The present review discusses possible mechanisms and functions of hippocampal theta sequences during exploration. We present several lines of evidence suggesting that these neural sequences play a key role in information processing and support the formation of initial memory traces, and discuss potential functional distinctions between neural sequences emerging during theta vs. awake sharp-wave ripples.
Affiliation(s)
- Céline Drieu
  - Center for Interdisciplinary Research in Biology, Collège de France, CNRS UMR 7241, INSERM U 1050, PSL Research University, Paris, France
- Michaël Zugaro
  - Center for Interdisciplinary Research in Biology, Collège de France, CNRS UMR 7241, INSERM U 1050, PSL Research University, Paris, France

38
Chen R, Puzerey PA, Roeser AC, Riccelli TE, Podury A, Maher K, Farhang AR, Goldberg JH. Songbird Ventral Pallidum Sends Diverse Performance Error Signals to Dopaminergic Midbrain. Neuron 2019; 103:266-276.e4. [PMID: 31153647 DOI: 10.1016/j.neuron.2019.04.038]
Abstract
Motor skills improve with practice, requiring outcomes to be evaluated against ever-changing performance benchmarks, yet it remains unclear how performance error signals are computed. Here, we show that the songbird ventral pallidum (VP) is required for song learning and sends diverse song timing and performance error signals to the ventral tegmental area (VTA). Viral tracing revealed inputs to VP from auditory and vocal motor thalamus, auditory and vocal motor cortex, and VTA. Our findings show that VP circuits, commonly associated with hedonic functions, signal performance error during motor sequence learning.
Affiliation(s)
- Ruidong Chen
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA
- Pavel A Puzerey
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA
- Andrea C Roeser
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA
- Tori E Riccelli
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA
- Archana Podury
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA
- Kamal Maher
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA
- Alexander R Farhang
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA
- Jesse H Goldberg
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA

39
Synchronicity: The Role of Midbrain Dopamine in Whole-Brain Coordination. eNeuro 2019; 6:ENEURO.0345-18.2019. [PMID: 31053604 PMCID: PMC6500793 DOI: 10.1523/eneuro.0345-18.2019]
Abstract
Midbrain dopamine seems to play an outsized role in motivated behavior and learning. Widely associated with mediating reward-related behavior, decision making, and learning, dopamine continues to generate controversies in the field. While many studies and theories focus on what dopamine cells encode, the question of how the midbrain derives the information it encodes is poorly understood and comparatively less addressed. Recent anatomical studies suggest greater diversity and complexity of afferent inputs than previously appreciated, requiring rethinking of prior models. Here, we elaborate a hypothesis that construes midbrain dopamine as implementing a Bayesian selector in which individual dopamine cells sample afferent activity across distributed brain substrates, comprising evidence to be evaluated on the extent to which stimuli in the on-going sensorimotor stream organizes distributed, parallel processing, reflecting implicit value. To effectively generate a temporally resolved phasic signal, a population of dopamine cells must exhibit synchronous activity. We argue that synchronous activity across a population of dopamine cells signals consensus across distributed afferent substrates, invigorating responding to recognized opportunities and facilitating further learning. In framing our hypothesis, we shift from the question of how value is computed to the broader question of how the brain achieves coordination across distributed, parallel processing. We posit the midbrain is part of an “axis of agency” in which the prefrontal cortex (PFC), basal ganglia (BGS), and midbrain form an axis mediating control, coordination, and consensus, respectively.
40
Byrne JEM, Tremain H, Leitan ND, Keating C, Johnson SL, Murray G. Circadian modulation of human reward function: Is there an evidentiary signal in existing neuroimaging studies? Neurosci Biobehav Rev 2019; 99:251-274. [PMID: 30721729 DOI: 10.1016/j.neubiorev.2019.01.025]
Abstract
Reward functioning in animals is modulated by the circadian system, but such effects are poorly understood in the human case. The aim of this study was to address this deficit via a systematic review of human fMRI studies measuring one or more proxies for circadian function and a neural reward outcome. A narrative synthesis of 15 studies meeting inclusion criteria identified 13 studies that show a circadian impact on the human reward system, with four types of proxy (circadian system biology, downstream circadian rhythms, circadian challenge, and time of day) associated with neural reward activation. Specific reward-related regions/networks subserving this effect included the medial prefrontal cortex, ventral striatum, putamen and default mode network. The circadian effect was observed in measures of both reward anticipation and reward receipt, with more consistent evidence for the latter. Findings are limited by marked heterogeneity across study designs. We encourage a systematic program of research investigating circadian-reward interactions as an adapted biobehavioural feature and as an aetiological mechanism in reward-related pathologies.
Affiliation(s)
- Jamie E M Byrne
  - Centre for Mental Health, Swinburne University of Technology, PO Box 312 John St Hawthorn, VIC, 3122, Australia
- Hailey Tremain
  - Centre for Mental Health, Swinburne University of Technology, PO Box 312 John St Hawthorn, VIC, 3122, Australia
- Nuwan D Leitan
  - Centre for Mental Health, Swinburne University of Technology, PO Box 312 John St Hawthorn, VIC, 3122, Australia
- Charlotte Keating
  - Centre for Mental Health, Swinburne University of Technology, PO Box 312 John St Hawthorn, VIC, 3122, Australia
- Sheri L Johnson
  - Department of Psychology, University of California, Berkeley, 3210, Tolman Hall, Berkeley, CA, 94720-1650, USA
- Greg Murray
  - Centre for Mental Health, Swinburne University of Technology, PO Box 312 John St Hawthorn, VIC, 3122, Australia

41
Oemisch M, Westendorff S, Azimi M, Hassani SA, Ardid S, Tiesinga P, Womelsdorf T. Feature-specific prediction errors and surprise across macaque fronto-striatal circuits. Nat Commun 2019; 10:176. [PMID: 30635579 PMCID: PMC6329800 DOI: 10.1038/s41467-018-08184-9]
Abstract
To adjust expectations efficiently, prediction errors need to be associated with the precise features that gave rise to the unexpected outcome, but this credit assignment may be problematic if stimuli differ on multiple dimensions and it is ambiguous which feature dimension caused the outcome. Here, we report a potential solution: neurons in four recorded areas of the anterior fronto-striatal networks encode prediction errors that are specific to feature values of different dimensions of attended multidimensional stimuli. The most ubiquitous prediction error occurred for the reward-relevant dimension. Feature-specific prediction error signals a) emerge on average shortly after non-specific prediction error signals, b) arise earliest in the anterior cingulate cortex and later in dorsolateral prefrontal cortex, caudate and ventral striatum, and c) contribute to feature-based stimulus selection after learning. Thus, a widely-distributed feature-specific eligibility trace may be used to update synaptic weights for improved feature-based attention.
Collapse
Affiliation(s)
- Mariann Oemisch
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada. .,Department of Neuroscience, Yale University School of Medicine, New Haven, CT, 06510, USA.
| | - Stephanie Westendorff
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada.,Institute of Neurobiology, University of Tübingen, Tübingen, 72076, Germany
| | - Marzyeh Azimi
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada
| | - Seyed Alireza Hassani
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada
- Department of Psychology, Vanderbilt University, Nashville, TN, 37240, USA
| | - Salva Ardid
- Department of Mathematics and Statistics, Boston University, Boston, MA, 02215, USA
| | - Paul Tiesinga
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, 6525 EN, Netherlands
| | - Thilo Womelsdorf
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada
- Department of Psychology, Vanderbilt University, Nashville, TN, 37240, USA
| |
Collapse
|
42
|
Marković D, Reiter AMF, Kiebel SJ. Predicting change: Approximate inference under explicit representation of temporal structure in changing environments. PLoS Comput Biol 2019; 15:e1006707. [PMID: 30703108 PMCID: PMC6372216 DOI: 10.1371/journal.pcbi.1006707] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2018] [Revised: 02/12/2019] [Accepted: 12/11/2018] [Indexed: 11/18/2022] Open
Abstract
In our daily lives, the timing of our actions plays an essential role as we navigate a complex everyday environment. It remains an open question, though, how representations of the temporal structure of the world influence our behavior. Here we propose a probabilistic model with an explicit representation of state durations, which may provide novel insights into how the brain predicts upcoming changes. We illustrate several properties of the behavioral model using a standard reversal learning design and compare its task performance to standard reinforcement learning models. Furthermore, using experimental data, we demonstrate how the model can be applied to identify participants' beliefs about the latent temporal task structure. We found that roughly one quarter of participants seem to have learned the latent temporal structure and used it to anticipate changes, whereas the remaining participants' behavior showed no signs of anticipatory responses, suggesting a lack of precise temporal expectations. We expect that the introduced behavioral model will allow, in future studies, for a systematic investigation of how participants learn the underlying temporal structure of task environments and how these representations shape behavior.
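As a rough illustration of why an explicit duration representation helps, compare a plain delta-rule learner with one that flips its prediction once the known block length has elapsed. This is a deliberately simplified sketch under assumed parameters (deterministic 20-trial blocks, binary rewards), not the paper's approximate-inference model.

```python
def run(rewards, alpha=0.3, block_len=None):
    """Per-trial reward predictions. With block_len set, the agent holds an
    explicit belief about state duration and pre-emptively flips its
    prediction at the expected change point; with block_len=None it is a
    plain delta-rule learner that can only react after the change."""
    v, t_since, preds = 0.5, 0, []
    for r in rewards:
        if block_len is not None and t_since >= block_len:
            v, t_since = 1.0 - v, 0   # anticipatory switch at the expected reversal
        preds.append(v)               # prediction made before seeing the outcome
        v += alpha * (r - v)          # standard reward-prediction-error update
        t_since += 1
    return preds

rewards = [1.0] * 20 + [0.0] * 20     # one reversal after 20 trials
reactive = run(rewards)
anticipatory = run(rewards, block_len=20)
```

On the first trial after the reversal, the reactive learner still predicts reward with high confidence, while the duration-aware learner has already switched — the kind of anticipatory behavior the authors report in roughly a quarter of participants.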
Collapse
Affiliation(s)
- Dimitrije Marković
- Department of Psychology, Technische Universität Dresden, Dresden, Germany
| | | | - Stefan J. Kiebel
- Department of Psychology, Technische Universität Dresden, Dresden, Germany
| |
Collapse
|
43
|
Zhang K, Chen CD, Monosov IE. Novelty, Salience, and Surprise Timing Are Signaled by Neurons in the Basal Forebrain. Curr Biol 2018; 29:134-142.e3. [PMID: 30581022 DOI: 10.1016/j.cub.2018.11.012] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2018] [Revised: 10/12/2018] [Accepted: 11/02/2018] [Indexed: 10/27/2022]
Abstract
The basal forebrain (BF) is a principal source of modulation of the neocortex [1-6] and is thought to regulate cognitive functions such as attention, motivation, and learning by broadcasting information about salience [2, 3, 5, 7-19]. However, events can be salient for multiple reasons-such as novelty, surprise, or reward prediction errors [20-24]-and to date, precisely which salience-related information the BF broadcasts is unclear. Here, we report that the primate BF contains at least two types of neurons that often process salient events in distinct manners: one with phasic burst responses to cues predicting salient events and one with ramping activity anticipating such events. Bursting neurons respond to cues that convey predictions about the magnitude, probability, and timing of primary reinforcements. They also burst to the reinforcement itself, particularly when it is unexpected. However, they do not have a selective response to reinforcement omission (the unexpected absence of an event). Thus, bursting neurons do not convey value-prediction errors but do signal surprise associated with external events. Indeed, they are not limited to processing primary reinforcement: they discriminate fully expected novel visual objects from familiar objects and respond to object-sequence violations. In contrast, ramping neurons predict the timing of many salient, novel, and surprising events. Their ramping activity is highly sensitive to the subjects' confidence in event timing and on average encodes the subjects' surprise after unexpected events occur. These data suggest that the primate BF contains mechanisms to anticipate the timing of a diverse set of important external events (via ramping activity) and to rapidly deploy cognitive resources when these events occur (via short latency bursting).
Collapse
Affiliation(s)
- Kaining Zhang
- Department of Neuroscience, Washington University in St. Louis, St. Louis, MO 63110, USA
- Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO 63110, USA
| | - Charles D Chen
- Department of Neuroscience, Washington University in St. Louis, St. Louis, MO 63110, USA
| | - Ilya E Monosov
- Department of Neuroscience, Washington University in St. Louis, St. Louis, MO 63110, USA
- Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO 63110, USA.
| |
Collapse
|
44
|
Abstract
Adaptive behavior requires animals to learn from experience. Ideally, learning should both promote choices that lead to rewards and reduce choices that lead to losses. Because the ventral striatum (VS) contains neurons that respond to aversive stimuli and aversive stimuli can drive dopamine release in the VS, it is possible that the VS contributes to learning about aversive outcomes, including losses. However, other work suggests that the VS may play a specific role in learning to choose among rewards, with other systems mediating learning from aversive outcomes. To examine the role of the VS in learning from gains and losses, we compared the performance of macaque monkeys with VS lesions and unoperated controls on a reinforcement learning task. In the task, the monkeys gained or lost tokens, which were periodically cashed out for juice, as outcomes for choices. They learned over trials to choose cues associated with gains, and not choose cues associated with losses. We found that monkeys with VS lesions had a deficit in learning to choose between cues that differed in reward magnitude. By contrast, monkeys with VS lesions performed as well as controls when choices involved a potential loss. We also fit reinforcement learning models to the behavior and compared learning rates between groups. Relative to controls, the monkeys with VS lesions had reduced learning rates for gain cues. Therefore, in this task, the VS plays a specific role in learning to choose between rewarding options.
Collapse
|
45
|
Gardner MPH, Schoenbaum G, Gershman SJ. Rethinking dopamine as generalized prediction error. Proc Biol Sci 2018; 285:20181645. [PMID: 30464063 PMCID: PMC6253385 DOI: 10.1098/rspb.2018.1645] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2018] [Accepted: 11/02/2018] [Indexed: 01/10/2023] Open
Abstract
Midbrain dopamine neurons are commonly thought to report a reward prediction error (RPE), as hypothesized by reinforcement learning (RL) theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here, we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signalling errors in both sensory and reward predictions, dopamine supports a form of RL that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and RPEs, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.
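The core idea — dopamine as a vector-valued error over both sensory and reward predictions — can be sketched in a few lines. The feature encoding ([reward, banana, grape]) and the plain delta-rule update are illustrative assumptions, not the authors' full model.

```python
def update(w, obs, alpha=0.5):
    """Delta-rule update of a vector prediction over outcome features.
    Channel 0 is reward value; the remaining channels are sensory identity
    features. The returned error generalizes the scalar RPE."""
    pe = [o - x for o, x in zip(obs, w)]
    return [x + alpha * e for x, e in zip(w, pe)], pe

w = [0.0, 0.0, 0.0]                 # cue prediction: [reward, banana, grape]
for _ in range(50):                 # training: cue -> banana worth 1
    w, _ = update(w, [1.0, 1.0, 0.0])

# Identity switch: same value, different flavour ("identity unblocking").
_, pe = update(w, [1.0, 0.0, 1.0])
```

The reward channel of the error is essentially zero — a classic RPE account predicts no dopamine response — yet the sensory channels carry a large error, so a generalized signal can still drive learning about outcome identity.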
Collapse
Affiliation(s)
- Matthew P H Gardner
- Intramural Research Program of the National Institute on Drug Abuse, NIH, Bethesda, MD, USA
| | - Geoffrey Schoenbaum
- Intramural Research Program of the National Institute on Drug Abuse, NIH, Bethesda, MD, USA
- Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA
| |
Collapse
|
46
|
Burton AC, Bissonette GB, Vazquez D, Blume EM, Donnelly M, Heatley KC, Hinduja A, Roesch MR. Previous cocaine self-administration disrupts reward expectancy encoding in ventral striatum. Neuropsychopharmacology 2018; 43:2350-2360. [PMID: 29728645 PMCID: PMC6180050 DOI: 10.1038/s41386-018-0058-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/08/2017] [Revised: 03/06/2018] [Accepted: 03/27/2018] [Indexed: 01/16/2023]
Abstract
The nucleus accumbens core (NAc) is important for integrating and providing information to downstream areas about the timing and value of anticipated reward. Although NAc is one of the first brain regions to be affected by drugs of abuse, we still do not know how neural correlates related to reward expectancy are affected by previous cocaine self-administration. To address this issue, we recorded from single neurons in the NAc of rats that had previously self-administered cocaine or sucrose (control). Neural recordings were then taken while rats performed an odor-guided decision-making task in which we independently manipulated value of expected reward by changing the delay to or size of reward across a series of trial blocks. We found that previous cocaine self-administration made rats more impulsive, biasing choice behavior toward more immediate reward. Further, compared to controls, cocaine-exposed rats showed significantly fewer neurons in the NAc that were responsive during odor cues and reward delivery, and in the reward-responsive neurons that remained, diminished directional and value encoding was observed. Lastly, we found that after cocaine exposure, reward-related firing during longer delays was reduced compared to controls. These results demonstrate that prior cocaine self-administration alters reward-expectancy encoding in NAc, which could contribute to poor decision making observed after chronic cocaine use.
Collapse
Affiliation(s)
- Amanda C Burton
- Department of Psychology, 1147 Biology-Psychology Building, University of Maryland, College Park, MD, 20742, USA
- Program in Neuroscience and Cognitive Science, 1147 Biology-Psychology Building, University of Maryland, College Park, MD, 20742, USA
| | - Gregory B Bissonette
- Department of Psychology, 1147 Biology-Psychology Building, University of Maryland, College Park, MD, 20742, USA
- Program in Neuroscience and Cognitive Science, 1147 Biology-Psychology Building, University of Maryland, College Park, MD, 20742, USA
| | - Daniela Vazquez
- Department of Psychology, 1147 Biology-Psychology Building, University of Maryland, College Park, MD, 20742, USA
| | - Elyse M Blume
- Department of Psychology, 1147 Biology-Psychology Building, University of Maryland, College Park, MD, 20742, USA
| | - Maria Donnelly
- Department of Psychology, 1147 Biology-Psychology Building, University of Maryland, College Park, MD, 20742, USA
| | - Kendall C Heatley
- Department of Psychology, 1147 Biology-Psychology Building, University of Maryland, College Park, MD, 20742, USA
| | - Abhishek Hinduja
- Department of Psychology, 1147 Biology-Psychology Building, University of Maryland, College Park, MD, 20742, USA
| | - Matthew R Roesch
- Department of Psychology, 1147 Biology-Psychology Building, University of Maryland, College Park, MD, 20742, USA.
- Program in Neuroscience and Cognitive Science, 1147 Biology-Psychology Building, University of Maryland, College Park, MD, 20742, USA.
| |
Collapse
|
47
|
Gmaz JM, Carmichael JE, van der Meer MA. Persistent coding of outcome-predictive cue features in the rat nucleus accumbens. eLife 2018; 7:e37275. [PMID: 30234485 PMCID: PMC6195350 DOI: 10.7554/elife.37275] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Accepted: 09/15/2018] [Indexed: 01/09/2023] Open
Abstract
The nucleus accumbens (NAc) is important for learning from feedback, and for biasing and invigorating behaviour in response to cues that predict motivationally relevant outcomes. NAc encodes outcome-related cue features such as the magnitude and identity of reward. However, little is known about how features of cues themselves are encoded. We designed a decision making task where rats learned multiple sets of outcome-predictive cues, and recorded single-unit activity in the NAc during performance. We found that coding of cue identity and location occurred alongside coding of expected outcome. Furthermore, this coding persisted both during a delay period, after the rat made a decision and was waiting for an outcome, and after the outcome was revealed. Encoding of cue features in the NAc may enable contextual modulation of on-going behaviour, and provide an eligibility trace of outcome-predictive stimuli for updating stimulus-outcome associations to inform future behaviour.
Collapse
Affiliation(s)
- Jimmie M Gmaz
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, United States
| | - James E Carmichael
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, United States
| | | |
Collapse
|
48
|
Babayan BM, Uchida N, Gershman SJ. Belief state representation in the dopamine system. Nat Commun 2018; 9:1891. [PMID: 29760401 PMCID: PMC5951832 DOI: 10.1038/s41467-018-04397-0] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2017] [Accepted: 04/26/2018] [Indexed: 12/19/2022] Open
Abstract
Learning to predict future outcomes is critical for driving appropriate behaviors. Reinforcement learning (RL) models have successfully accounted for such learning, relying on reward prediction errors (RPEs) signaled by midbrain dopamine neurons. It has been proposed that when sensory data provide only ambiguous information about which state an animal is in, it can predict reward based on a set of probabilities assigned to hypothetical states (called the belief state). Here we examine how dopamine RPEs and subsequent learning are regulated under state uncertainty. Mice are first trained in a task with two potential states defined by different reward amounts. During testing, intermediate-sized rewards are given in rare trials. Dopamine activity is a non-monotonic function of reward size, consistent with RL models operating on belief states. Furthermore, the magnitude of dopamine responses quantitatively predicts changes in behavior. These results establish the critical role of state inference in RL.
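The belief-state account can be caricatured in a few lines: infer which latent state produced an ambiguous reward (Bayes' rule with Gaussian likelihoods), compute the RPE against the belief-weighted expectation, and apportion the update by posterior belief. The state means, likelihood width, and learning rate below are assumptions for illustration; this is not the paper's fitted model.

```python
import math

def belief_state_step(values, r, sigma=1.0, alpha=0.5):
    """One trial of a toy belief-state RL model with two hidden states.
    Returns updated state values, the posterior belief, and the RPE."""
    like = [math.exp(-(r - v) ** 2 / (2 * sigma ** 2)) for v in values]
    z = sum(like)
    belief = [l / z for l in like]                        # posterior over states
    rpe = r - sum(b * v for b, v in zip(belief, values))  # belief-weighted RPE
    values = [v + alpha * b * rpe for v, b in zip(values, belief)]
    return values, belief, rpe

states = [1.0, 9.0]                # learned reward amounts of the two states
_, _, rpe_big = belief_state_step(states, 9.0)   # fully expected big reward
_, _, rpe_mid = belief_state_step(states, 4.0)   # rare intermediate reward
```

A fully expected big reward yields a near-zero RPE, while an intermediate reward — attributed almost entirely to the small-reward state — yields a large positive RPE despite its smaller magnitude. This dependence of the dopamine-like error on the inferred state, rather than on reward size alone, is one way to see the non-monotonicity the authors report.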
Collapse
Affiliation(s)
- Benedicte M Babayan
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, 16 Divinity Avenue, Cambridge, MA, 02138, USA
- Department of Psychology, Center for Brain Science, Harvard University, 52 Oxford Street, Cambridge, MA, 02138, USA
| | - Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, 16 Divinity Avenue, Cambridge, MA, 02138, USA.
| | - Samuel J Gershman
- Department of Psychology, Center for Brain Science, Harvard University, 52 Oxford Street, Cambridge, MA, 02138, USA.
| |
Collapse
|
49
|
Firing of Putative Dopamine Neurons in Ventral Tegmental Area Is Modulated by Probability of Success during Performance of a Stop-Change Task. eNeuro 2018; 5:ENEURO.0007-18.2018. [PMID: 29687078 PMCID: PMC5909181 DOI: 10.1523/eneuro.0007-18.2018] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2018] [Revised: 03/19/2018] [Accepted: 04/03/2018] [Indexed: 11/29/2022] Open
Abstract
Response inhibition, the ability to refrain from unwanted actions, is an essential component of complex behavior and is often impaired across numerous neuropsychiatric disorders such as addiction, attention-deficit hyperactivity disorder (ADHD), schizophrenia, and obsessive-compulsive disorder. Accordingly, much research has been devoted to characterizing brain regions responsible for the regulation of response inhibition. The stop-signal task, a task in which animals are required to inhibit a prepotent response in the presence of a STOP cue, is one of the most well-studied tasks of response inhibition. While pharmacological evidence suggests that dopamine (DA) contributes to the regulation of response inhibition, exactly what DA neurons encode during performance of response inhibition tasks is unknown. To address this issue, we recorded from single units in the ventral tegmental area (VTA) while rats performed a stop-change task. We found that putative DA neurons fired less to cues and more strongly to reward on STOP trials relative to GO trials, and that firing was reduced during errors. These results suggest that DA neurons in VTA encode the uncertainty associated with the probability of obtaining reward on difficult trials instead of the saliency associated with STOP cues or the need to resolve conflict between competing responses during response inhibition.
Collapse
|
50
|
Starkweather CK, Gershman SJ, Uchida N. The Medial Prefrontal Cortex Shapes Dopamine Reward Prediction Errors under State Uncertainty. Neuron 2018; 98:616-629.e6. [PMID: 29656872 DOI: 10.1016/j.neuron.2018.03.036] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2017] [Revised: 01/31/2018] [Accepted: 03/20/2018] [Indexed: 11/16/2022]
Abstract
Animals make predictions based on currently available information. In natural settings, sensory cues may not reveal complete information, requiring the animal to infer the "hidden state" of the environment. The brain structures important in hidden state inference remain unknown. A previous study showed that midbrain dopamine neurons exhibit distinct response patterns depending on whether reward is delivered in 100% (task 1) or 90% of trials (task 2) in a classical conditioning task. Here we found that inactivation of the medial prefrontal cortex (mPFC) affected dopaminergic signaling in task 2, in which the hidden state must be inferred ("will reward come or not?"), but not in task 1, where the state was known with certainty. Computational modeling suggests that the effects of inactivation are best explained by a circuit in which the mPFC conveys inference over hidden states to the dopamine system. VIDEO ABSTRACT.
Collapse
Affiliation(s)
- Clara Kwon Starkweather
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
| | - Samuel J Gershman
- Center for Brain Science, Department of Psychology, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA.
| | - Naoshige Uchida
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA.
| |
Collapse
|