1. Mishchanchuk K, Gregoriou G, Qü A, Kastler A, Huys QJM, Wilbrecht L, MacAskill AF. Hidden state inference requires abstract contextual representations in the ventral hippocampus. Science 2024; 386:926-932. PMID: 39571013; DOI: 10.1126/science.adq5874.
Abstract
The ability to use subjective, latent contextual representations to influence decision-making is crucial for everyday life. The hippocampus is hypothesized to bind together otherwise abstract combinations of stimuli to represent such latent contexts, to support the process of hidden state inference. Yet evidence for a role of the hippocampus in hidden state inference remains limited. We found that the ventral hippocampus is required for mice to perform hidden state inference during a two-armed bandit task. Hippocampal neurons differentiate the two abstract contexts required for this strategy in a manner similar to the differentiation of spatial locations, and their activity is essential for appropriate dopamine dynamics. These findings offer insight into how latent contextual information is used to optimize decisions, and they emphasize a key role for the hippocampus in hidden state inference.
Affiliation(s)
- Karyna Mishchanchuk
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, UK
- Gabrielle Gregoriou
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, UK
- Albert Qü
- Helen Wills Neuroscience Institute, Department of Psychology, University of California, Berkeley, CA, USA
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Alizée Kastler
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, UK
- Quentin J M Huys
- Applied Computational Psychiatry Lab, Mental Health Neuroscience Department, Division of Psychiatry and Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Queen Square Institute of Neurology, University College London, London, UK
- Linda Wilbrecht
- Helen Wills Neuroscience Institute, Department of Psychology, University of California, Berkeley, CA, USA
- Andrew F MacAskill
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, UK
2. Qü AJ, Tai LH, Hall CD, Tu EM, Eckstein MK, Mishchanchuk K, Lin WC, Chase JB, MacAskill AF, Collins AGE, Gershman SJ, Wilbrecht L. Nucleus accumbens dopamine release reflects Bayesian inference during instrumental learning. bioRxiv 2024:2023.11.10.566306. PMID: 38014354; PMCID: PMC10680647; DOI: 10.1101/2023.11.10.566306.
Abstract
Dopamine release in the nucleus accumbens has been hypothesized to signal reward prediction error, the difference between observed and predicted reward, suggesting a biological implementation for reinforcement learning. Rigorous tests of this hypothesis require assumptions about how the brain maps sensory signals to reward predictions, yet this mapping is still poorly understood. In particular, the mapping is non-trivial when sensory signals provide ambiguous information about the hidden state of the environment. Previous work using classical conditioning tasks has suggested that reward predictions are generated conditional on probabilistic beliefs about the hidden state, such that dopamine implicitly reflects these beliefs. Here we test this hypothesis in the context of an instrumental task (a two-armed bandit), where the hidden state switches repeatedly. We measured choice behavior and recorded dLight signals reflecting dopamine release in the nucleus accumbens core. Model comparison among a wide set of cognitive models based on the behavioral data favored models that used Bayesian updating of probabilistic beliefs. These same models also quantitatively matched the dopamine measurements better than non-Bayesian alternatives. We conclude that probabilistic belief computation contributes to instrumental task performance in mice and is reflected in mesolimbic dopamine signaling.
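The belief-updating computation favored by this model comparison can be sketched in a few lines. This is an illustrative sketch, not the authors' model code: it assumes two hidden states ("left arm pays at the high rate" vs. "right arm pays at the high rate"), known reward probabilities, and a fixed per-trial switch (hazard) rate, all hypothetical values.

```python
def update_belief(belief, choice, reward, p_high=0.8, p_low=0.2, hazard=0.05):
    """One Bayesian update of P(state = 'left arm is better').

    belief: prior probability that the left arm currently pays at the
    high rate. choice: 'left' or 'right'. reward: 0 or 1.
    """
    # Likelihood of the observed outcome under each hidden state.
    p_r_left_best = p_high if choice == "left" else p_low
    p_r_right_best = p_low if choice == "left" else p_high
    lik_left = p_r_left_best if reward else 1 - p_r_left_best
    lik_right = p_r_right_best if reward else 1 - p_r_right_best

    # Bayes' rule.
    posterior = belief * lik_left / (belief * lik_left + (1 - belief) * lik_right)

    # Account for a possible unsignalled state switch before the next trial.
    return (1 - hazard) * posterior + hazard * (1 - posterior)
```

A rewarded left choice pushes the belief toward the left-better state, and an unrewarded one pulls it away; the hazard term keeps the belief from saturating, which is what lets the model track repeated hidden-state switches.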
Affiliation(s)
- Albert J. Qü
- Department of Psychology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- Lung-Hao Tai
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, 94720, USA
- Christopher D. Hall
- Sainsbury Wellcome Centre for Neural Circuits and Behaviour, University College London, London, W1T 4JG, UK
- Emilie M. Tu
- Department of Psychology, University of California, Berkeley, CA, 94720, USA
- Karyna Mishchanchuk
- Department of Neuroscience, Physiology and Pharmacology, University College London, UK
- Wan Chen Lin
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, 94720, USA
- Juliana B. Chase
- Department of Psychology, University of California, Berkeley, CA, 94720, USA
- Andrew F. MacAskill
- Department of Neuroscience, Physiology and Pharmacology, University College London, UK
- Anne G. E. Collins
- Department of Psychology, University of California, Berkeley, CA, 94720, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, 94720, USA
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, 02138, USA
- Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Linda Wilbrecht
- Department of Psychology, University of California, Berkeley, CA, 94720, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, 94720, USA
3. Mah A, Golden CE, Constantinople CM. Dopamine transients encode reward prediction errors independent of learning rates. bioRxiv 2024:2024.04.18.590090. PMID: 38659861; PMCID: PMC11042285; DOI: 10.1101/2024.04.18.590090.
Abstract
Biological accounts of reinforcement learning posit that dopamine encodes reward prediction errors (RPEs), which are multiplied by a learning rate to update state or action values. These values are thought to be represented in synaptic weights in the striatum, and updated by dopamine-dependent plasticity, suggesting that dopamine release might reflect the product of the learning rate and RPE. Here, we leveraged the fact that animals learn faster in volatile environments to characterize dopamine encoding of learning rates in the nucleus accumbens core (NAcc). We trained rats on a task with semi-observable states offering different rewards, and rats adjusted how quickly they initiated trials across states using RPEs. Computational modeling and behavioral analyses showed that learning rates were higher following state transitions, and scaled with trial-by-trial changes in beliefs about hidden states, approximating normative Bayesian strategies. Notably, dopamine release in the NAcc encoded RPEs independent of learning rates, suggesting that dopamine-independent mechanisms instantiate dynamic learning rates.
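The dissociation reported here, RPE encoding that is independent of a dynamic learning rate, can be made concrete with a toy delta-rule update. This is a hedged sketch, not the paper's model: the learning rate is simply made to grow with the trial-by-trial change in hidden-state belief (standing in for the normative Bayesian quantity), while the RPE computation itself is untouched.

```python
def rpe(reward, value):
    """Reward prediction error: the quantity dopamine is reported to encode."""
    return reward - value

def update_value(value, reward, belief_change, base_lr=0.1, belief_gain=0.9):
    """Value update with a dynamic learning rate.

    The learning rate grows with |belief_change| (large just after an
    inferred state transition), but the RPE it multiplies is unaffected.
    """
    alpha = min(base_lr + belief_gain * abs(belief_change), 1.0)
    return value + alpha * rpe(reward, value)
```

In this scheme the same outcome drives a larger value update in volatile periods than in stable ones, yet the RPE, and hence the modeled dopamine transient, is identical in both.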
Affiliation(s)
- Andrew Mah
- Center for Neural Science, New York University
4. Bernklau TW, Righetti B, Mehrke LS, Jacob SN. Striatal dopamine signals reflect perceived cue-action-outcome associations in mice. Nat Neurosci 2024; 27:747-757. PMID: 38291283; PMCID: PMC11001585; DOI: 10.1038/s41593-023-01567-2.
Abstract
Striatal dopamine drives associative learning by acting as a teaching signal. Much work has focused on simple learning paradigms, including Pavlovian and instrumental learning. However, higher cognition requires that animals generate internal concepts of their environment, where sensory stimuli, actions and outcomes become flexibly associated. Here, we performed fiber photometry dopamine measurements across the striatum of male mice as they learned cue-action-outcome associations based on implicit and changing task rules. Reinforcement learning models of the behavioral and dopamine data showed that rule changes lead to adjustments of learned cue-action-outcome associations. After rule changes, mice discarded learned associations and reset outcome expectations. Cue- and outcome-triggered dopamine signals became uncoupled and dependent on the adopted behavioral strategy. As mice learned the new association, coupling between cue- and outcome-triggered dopamine signals and task performance re-emerged. Our results suggest that dopaminergic reward prediction errors reflect an agent's perceived locus of control.
Affiliation(s)
- Tobias W Bernklau
- Translational Neurotechnology Laboratory, Department of Neurosurgery, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Graduate School of Systemic Neurosciences, Ludwig-Maximilians-University Munich, Munich, Germany
- Beatrice Righetti
- Translational Neurotechnology Laboratory, Department of Neurosurgery, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Leonie S Mehrke
- Translational Neurotechnology Laboratory, Department of Neurosurgery, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Simon N Jacob
- Translational Neurotechnology Laboratory, Department of Neurosurgery, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
5. Zhou T, Ho YY, Lee RX, Fath AB, He K, Scott J, Bajwa N, Hartley ND, Wilde J, Gao X, Li C, Hong E, Nassar MR, Wimmer RD, Singh T, Halassa MM, Feng G. Enhancement of mediodorsal thalamus rescues aberrant belief dynamics in a mouse model with schizophrenia-associated mutation. bioRxiv 2024:2024.01.08.574745. PMID: 38260581; PMCID: PMC10802391; DOI: 10.1101/2024.01.08.574745.
Abstract
Optimizing behavioral strategy requires belief updating based on new evidence, a process that engages higher cognition. In schizophrenia, aberrant belief dynamics may lead to psychosis, but the mechanisms underlying this process are unknown, in part due to a lack of appropriate animal models and behavioral readouts. Here, we address this challenge by taking two synergistic approaches. First, we generate a mouse model bearing a patient-derived point mutation in Grin2a (Grin2aY700X+/-), a gene that confers high risk for schizophrenia and was recently identified by large-scale exome sequencing. Second, we develop a computationally trackable foraging task, in which mice form and update belief-driven strategies in a dynamic environment. We found that Grin2aY700X+/- mice perform less optimally than their wild-type (WT) littermates, showing unstable behavioral states and a slower belief update rate. Using functional ultrasound imaging, we identified the mediodorsal (MD) thalamus as hypofunctional in Grin2aY700X+/- mice, and in vivo task recordings showed that MD neurons encoded dynamic values and behavioral states in WT mice. Optogenetic inhibition of MD neurons in WT mice phenocopied Grin2aY700X+/- mice, and enhancing MD activity rescued task deficits in Grin2aY700X+/- mice. Together, our study identifies the MD thalamus as a key node for schizophrenia-relevant cognitive dysfunction and a potential target for future therapeutics.
Affiliation(s)
- Tingting Zhou
- Yang Tan Collection and McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Yi-Yun Ho
- Yang Tan Collection and McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Ray X Lee
- Yang Tan Collection and McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Amanda B Fath
- Yang Tan Collection and McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Kathleen He
- Yang Tan Collection and McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Jonathan Scott
- Department of Neuroscience, Tufts University School of Medicine
- Navdeep Bajwa
- Department of Neuroscience, Tufts University School of Medicine
- Nolan D Hartley
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard
- Jonathan Wilde
- Yang Tan Collection and McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Xian Gao
- Yang Tan Collection and McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Cui Li
- Yang Tan Collection and McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Evan Hong
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Ralf D Wimmer
- Department of Neuroscience, Tufts University School of Medicine
- Tarjinder Singh
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard
- Guoping Feng
- Yang Tan Collection and McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard
6. Amo R. Prediction error in dopamine neurons during associative learning. Neurosci Res 2024; 199:12-20. PMID: 37451506; DOI: 10.1016/j.neures.2023.07.003.
Abstract
Dopamine neurons have long been thought to facilitate learning by broadcasting reward prediction error (RPE), a teaching signal used in machine learning, but more recent work has advanced alternative models of dopamine's computational role. Here, I revisit this critical issue and review new experimental evidence that tightens the link between dopamine activity and RPE. First, I introduce the recent observation of a gradual backward shift of dopamine activity that had eluded researchers for over a decade. I also discuss several other findings, such as dopamine ramping, that were initially interpreted to conflict with RPE but were later found to be consistent with it. These findings improve our understanding of neural computation in dopamine neurons.
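The RPE account reviewed here is the temporal-difference (TD) error of reinforcement learning. A toy tabular TD(0) simulation (an illustrative sketch with arbitrary parameters, not any specific experiment) reproduces both classic signatures: the error at reward time shrinks with training, while a response at cue time emerges, so the signal shifts backward toward the cue.

```python
def td_learning(n_trials=200, n_steps=5, alpha=0.1, gamma=1.0):
    """Tabular TD(0) on a fixed cue -> delay -> reward sequence.

    The cue arrives unpredictably out of a zero-value baseline, so the
    dopamine-like response at cue time is the TD error gamma * V[0] - 0.
    Returns a list of (cue_response, reward_response) pairs, one per trial.
    """
    V = [0.0] * (n_steps + 1)  # V[n_steps] is the terminal state, value 0
    history = []
    for _ in range(n_trials):
        cue_delta = gamma * V[0]  # surprise at cue onset
        reward_delta = 0.0
        for s in range(n_steps):
            r = 1.0 if s == n_steps - 1 else 0.0  # reward at sequence end
            delta = r + gamma * V[s + 1] - V[s]   # TD error at this step
            V[s] += alpha * delta                 # value update
            if s == n_steps - 1:
                reward_delta = delta
        history.append((cue_delta, reward_delta))
    return history
```

Early in training the error sits entirely at reward delivery; late in training it has migrated to the cue, the pattern the gradual backward shift described above fills in at intermediate timepoints.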
Affiliation(s)
- Ryunosuke Amo
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
7. Blanco-Pozo M, Akam T, Walton ME. Dopamine-independent effect of rewards on choices through hidden-state inference. Nat Neurosci 2024; 27:286-297. PMID: 38216649; PMCID: PMC10849965; DOI: 10.1038/s41593-023-01542-x.
Abstract
Dopamine is implicated in adaptive behavior through reward prediction error (RPE) signals that update value estimates. There is also accumulating evidence that animals in structured environments can use inference processes to facilitate behavioral flexibility. However, it is unclear how these two accounts of reward-guided decision-making should be integrated. Using a two-step task for mice, we show that dopamine reports RPEs using value information inferred from task structure knowledge, alongside information about reward rate and movement. Nonetheless, although rewards strongly influenced choices and dopamine activity, neither activating nor inhibiting dopamine neurons at trial outcome affected future choice. These data were recapitulated by a neural network model where cortex learned to track hidden task states by predicting observations, while basal ganglia learned values and actions via RPEs. This shows that the influence of rewards on choices can stem from dopamine-independent information they convey about the world's state, not the dopaminergic RPEs they produce.
Affiliation(s)
- Marta Blanco-Pozo
- Department of Experimental Psychology, Oxford University, Oxford, UK
- Wellcome Centre for Integrative Neuroimaging, Oxford University, Oxford, UK
- Thomas Akam
- Department of Experimental Psychology, Oxford University, Oxford, UK
- Wellcome Centre for Integrative Neuroimaging, Oxford University, Oxford, UK
- Mark E Walton
- Department of Experimental Psychology, Oxford University, Oxford, UK
- Wellcome Centre for Integrative Neuroimaging, Oxford University, Oxford, UK
8. Hennig JA, Romero Pinto SA, Yamaguchi T, Linderman SW, Uchida N, Gershman SJ. Emergence of belief-like representations through reinforcement learning. PLoS Comput Biol 2023; 19:e1011067. PMID: 37695776; PMCID: PMC10513382; DOI: 10.1371/journal.pcbi.1011067.
Abstract
To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs", optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
Affiliation(s)
- Jay A. Hennig
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Sandra A. Romero Pinto
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, United States of America
- Takahiro Yamaguchi
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Future Research Department, Toyota Research Institute of North America, Toyota Motor North America, Ann Arbor, Michigan, United States of America
- Scott W. Linderman
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- Department of Statistics, Stanford University, Stanford, California, United States of America
- Naoshige Uchida
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Samuel J. Gershman
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
9. Mikus N, Eisenegger C, Mathys C, Clark L, Müller U, Robbins TW, Lamm C, Naef M. Blocking D2/D3 dopamine receptors in male participants increases volatility of beliefs when learning to trust others. Nat Commun 2023; 14:4049. PMID: 37422466; PMCID: PMC10329681; DOI: 10.1038/s41467-023-39823-5.
Abstract
The ability to learn about other people is crucial for human social functioning. Dopamine has been proposed to regulate the precision of beliefs, but direct behavioural evidence of this is lacking. In this study, we investigate how a high dose of the D2/D3 dopamine receptor antagonist sulpiride impacts learning about other people's prosocial attitudes in a repeated Trust game. Using a Bayesian model of belief updating, we show that, in a sample of 76 male participants, sulpiride increases the volatility of beliefs, which leads to higher precision weights on prediction errors. This effect is driven by participants with genetically conferred higher dopamine availability (Taq1a polymorphism) and remains even after controlling for working memory performance. Higher precision weights are reflected in higher reciprocal behaviour in the repeated Trust game but not in single-round Trust games. Our data provide evidence that D2 receptors are pivotal in regulating prediction error-driven belief updating in a social context.
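The relationship between belief volatility and precision weighting can be illustrated with a one-dimensional Kalman-filter-style update. This is a generic sketch (not the hierarchical model used in the study): volatility inflates the prior variance between trials, which raises the gain, i.e. the precision weight, applied to each prediction error.

```python
def kalman_update(mean, variance, observation, obs_noise=1.0, volatility=0.0):
    """One precision-weighted update of a Gaussian belief.

    volatility: variance added between trials (the quantity the drug is
    argued to modulate). The Kalman gain is the precision weight placed
    on the prediction error.
    """
    prior_var = variance + volatility           # beliefs diffuse over time
    gain = prior_var / (prior_var + obs_noise)  # precision weight
    prediction_error = observation - mean
    new_mean = mean + gain * prediction_error
    new_var = (1 - gain) * prior_var
    return new_mean, new_var, gain
```

With zero volatility the gain shrinks as evidence accumulates; adding volatility keeps the prior uncertain, so the same prediction error moves the belief further, the signature attributed here to sulpiride.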
Affiliation(s)
- Nace Mikus
- Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
- Interacting Minds Centre, Aarhus University, Aarhus, Denmark
- Christoph Eisenegger
- Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
- Behavioural and Clinical Neuroscience Institute and Department of Psychology, University of Cambridge, Cambridge, UK
- Christoph Mathys
- Interacting Minds Centre, Aarhus University, Aarhus, Denmark
- Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich and ETH Zurich, Zurich, Switzerland
- Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
- Luke Clark
- Centre for Gambling Research at UBC, Department of Psychology, University of British Columbia, Vancouver, BC, Canada
- Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, BC, Canada
- Ulrich Müller
- Behavioural and Clinical Neuroscience Institute and Department of Psychology, University of Cambridge, Cambridge, UK
- Adult Neurodevelopmental Services, Health & Community Services, Government of Jersey, St Helier, Jersey
- Trevor W Robbins
- Behavioural and Clinical Neuroscience Institute and Department of Psychology, University of Cambridge, Cambridge, UK
- Claus Lamm
- Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
- Michael Naef
- Department of Economics, University of Durham, Durham, UK
10. Lamberti M, Tripathi S, van Putten MJAM, Marzen S, le Feber J. Prediction in cultured cortical neural networks. PNAS Nexus 2023; 2:pgad188. PMID: 37383023; PMCID: PMC10299080; DOI: 10.1093/pnasnexus/pgad188.
Abstract
Theory suggests that networks of neurons may predict their input. Prediction may underlie most aspects of information processing and is believed to be involved in motor and cognitive control and decision-making. Retinal cells have been shown to be capable of predicting visual stimuli, and there is some evidence for prediction of input in the visual cortex and hippocampus. However, there is no proof that the ability to predict is a generic feature of neural networks. We investigated whether random in vitro neuronal networks can predict stimulation, and how prediction is related to short- and long-term memory. To answer these questions, we applied two different stimulation modalities: focal electrical stimulation, which has been shown to induce long-term memory traces, and global optogenetic stimulation, which has not. We used mutual information to quantify how much activity recorded from these networks reduces the uncertainty of upcoming stimuli (prediction) or recent past stimuli (short-term memory). Cortical neural networks did predict future stimuli, with the majority of all predictive information provided by the immediate network response to the stimulus. Interestingly, prediction strongly depended on short-term memory of recent sensory inputs during focal as well as global stimulation. However, prediction required less short-term memory during focal stimulation. Furthermore, the dependency on short-term memory decreased during 20 h of focal stimulation, when long-term connectivity changes were induced. These changes are fundamental for long-term memory formation, suggesting that besides short-term memory, the formation of long-term memory traces may play a role in efficient prediction.
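Mutual information, used here to quantify how much network activity reduces uncertainty about upcoming or past stimuli, can be computed from discrete paired samples with a simple plug-in estimator. This is a minimal illustration; the study's actual estimator and binning of activity are more involved.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired samples.

    For prediction: X = current network activity, Y = the upcoming
    stimulus. For short-term memory: Y = a recent past stimulus instead.
    """
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )
```

Perfectly predictive activity yields the full entropy of the stimulus (1 bit for a fair binary stimulus), while statistically independent activity yields zero, which is the scale on which the prediction and memory measures above are read.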
Affiliation(s)
- Martina Lamberti
- Department of Clinical Neurophysiology, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands
- Shiven Tripathi
- Department of Electrical Engineering, Indian Institute of Technology, Kanpur 208016, India
- Michel J A M van Putten
- Department of Clinical Neurophysiology, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands
- Sarah Marzen
- W. M. Keck Science Department, Pitzer, Scripps, and Claremont McKenna College, Claremont, CA 91711, USA
- Joost le Feber
- Department of Clinical Neurophysiology, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands
11. Alexander WH, Deraeve J, Vassena E. Dissociation and integration of outcome and state uncertainty signals in cognitive control. Cogn Affect Behav Neurosci 2023. PMID: 37058212; PMCID: PMC10390360; DOI: 10.3758/s13415-023-01091-7.
Abstract
Signals related to uncertainty are frequently observed in regions of the cognitive control network, including anterior cingulate/medial prefrontal cortex (ACC/mPFC), dorsolateral prefrontal cortex (dlPFC), and anterior insular cortex. Uncertainty generally refers to conditions in which decision variables may assume multiple possible values and can arise at multiple points in the perception-action cycle, including sensory input, inferred states of the environment, and the consequences of actions. These sources of uncertainty are frequently correlated: noisy input can lead to unreliable estimates of the state of the environment, with consequential influences on action selection. Given this correlation amongst various sources of uncertainty, dissociating the neural structures underlying their estimation presents an ongoing issue: a region associated with uncertainty related to outcomes may estimate outcome uncertainty itself, or it may reflect a cascade effect of state uncertainty on outcome estimates. In this study, we derive signals of state and outcome uncertainty from mathematical models of risk and observe regions in the cognitive control network whose activity is best explained by signals related to state uncertainty (anterior insula), outcome uncertainty (dlPFC), as well as regions that appear to integrate the two (ACC/mPFC).
Affiliation(s)
- William H Alexander
- Center for Complex Systems & Brain Sciences, Florida Atlantic University, Boca Raton, FL, USA
- Department of Psychology, Florida Atlantic University, Boca Raton, FL, USA
- The Brain Institute, Florida Atlantic University, Boca Raton, FL, USA
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- James Deraeve
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Eliana Vassena
- Experimental Psychopathology and Treatment, Behavioural Science Institute, Radboud University, Nijmegen, Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, Netherlands
12. Banerjee A, Wang BA, Teutsch J, Helmchen F, Pleger B. Analogous cognitive strategies for tactile learning in the rodent and human brain. Prog Neurobiol 2023; 222:102401. PMID: 36608783; DOI: 10.1016/j.pneurobio.2023.102401.
Abstract
Evolution has molded individual species' sensory capacities and abilities. In rodents, who mostly inhabit dark tunnels and burrows, the whisker-based somatosensory system has developed as the dominant sensory modality, essential for environmental exploration and spatial navigation. In contrast, humans rely more on visual and auditory inputs when collecting information from their surrounding sensory space in everyday life. As a result of such species-specific differences in sensory dominance, cognitive relevance and capacities, the evidence for analogous sensory-cognitive mechanisms across species remains sparse. However, recent research in rodents and humans yielded surprisingly comparable processing rules for detecting tactile stimuli, integrating touch information into percepts, and goal-directed rule learning. Here, we review how the brain, across species, harnesses such processing rules to establish decision-making during tactile learning, following canonical circuits from the thalamus and the primary somatosensory cortex up to the frontal cortex. We discuss concordances between empirical and computational evidence from micro- and mesoscopic circuit studies in rodents to findings from macroscopic imaging in humans. Furthermore, we discuss the relevance and challenges for future cross-species research in addressing mutual context-dependent evaluation processes underpinning perceptual learning.
Affiliation(s)
- Abhishek Banerjee
- Adaptive Decisions Lab, Biosciences Institute, Newcastle University, United Kingdom.
- Bin A Wang
- Department of Neurology, BG University Hospital Bergmannsheil, Ruhr University Bochum, Germany; Collaborative Research Centre 874 "Integration and Representation of Sensory Processes", Ruhr University Bochum, Germany.
- Jasper Teutsch
- Adaptive Decisions Lab, Biosciences Institute, Newcastle University, United Kingdom
- Fritjof Helmchen
- Laboratory of Neural Circuit Dynamics, Brain Research Institute, University of Zürich, Switzerland
- Burkhard Pleger
- Department of Neurology, BG University Hospital Bergmannsheil, Ruhr University Bochum, Germany; Collaborative Research Centre 874 "Integration and Representation of Sensory Processes", Ruhr University Bochum, Germany
13
Jacobs DS, Allen MC, Park J, Moghaddam B. Learning of probabilistic punishment as a model of anxiety produces changes in action but not punisher encoding in the dmPFC and VTA. eLife 2022; 11:e78912. [PMID: 36102386 PMCID: PMC9525102 DOI: 10.7554/elife.78912]
Abstract
Previously, we developed a novel model for anxiety during motivated behavior by training rats to perform a task in which actions executed to obtain a reward were probabilistically punished; after learning, neuronal activity in the ventral tegmental area (VTA) and dorsomedial prefrontal cortex (dmPFC) represented the relationship between action and punishment risk (Park and Moghaddam, 2017). Here, we used male and female rats to expand on the previous work by focusing on neural changes in the dmPFC and VTA that were associated with the learning of probabilistic punishment, and with anxiolytic treatment with diazepam after learning. We find that adaptive neural responses of the dmPFC and VTA during the learning of anxiogenic contingencies are independent of the punisher experience and occur primarily during the peri-action and reward period. Our results also identify peri-action ramping of VTA neural calcium activity, and VTA-dmPFC correlated activity, as potential markers for the anxiolytic properties of diazepam.
Affiliation(s)
- David S Jacobs
- Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, United States
- Madeleine C Allen
- Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, United States
- Department of Psychiatry, Oregon Health & Science University, Portland, United States
- Junchol Park
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, United States
- Bita Moghaddam
- Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, United States
- Department of Psychiatry, Oregon Health & Science University, Portland, United States
14
Nour MM, Liu Y, Dolan RJ. Functional neuroimaging in psychiatry and the case for failing better. Neuron 2022; 110:2524-2544. [PMID: 35981525 DOI: 10.1016/j.neuron.2022.07.005]
Abstract
Psychiatric disorders encompass complex aberrations of cognition and affect and are among the most debilitating and poorly understood of any medical condition. Current treatments rely primarily on interventions that target brain function (drugs) or learning processes (psychotherapy). A mechanistic understanding of how these interventions mediate their therapeutic effects remains elusive. From the early 1990s, non-invasive functional neuroimaging, coupled with parallel developments in the cognitive neurosciences, seemed to signal a new era of neurobiologically grounded diagnosis and treatment in psychiatry. Yet, despite three decades of intense neuroimaging research, we still lack a neurobiological account for any psychiatric condition. Likewise, functional neuroimaging plays no role in clinical decision making. Here, we offer a critical commentary on this impasse and suggest how the field might fare better and deliver impactful neurobiological insights.
Affiliation(s)
- Matthew M Nour
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, UK; Wellcome Trust Centre for Human Neuroimaging, University College London, London WC1N 3AR, UK; Department of Psychiatry, University of Oxford, Oxford OX3 7JX, UK.
- Yunzhe Liu
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, UK; State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China; Chinese Institute for Brain Research, Beijing 102206, China
- Raymond J Dolan
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, UK; Wellcome Trust Centre for Human Neuroimaging, University College London, London WC1N 3AR, UK; State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China.
15
Amo R, Matias S, Yamanaka A, Tanaka KF, Uchida N, Watabe-Uchida M. A gradual temporal shift of dopamine responses mirrors the progression of temporal difference error in machine learning. Nat Neurosci 2022; 25:1082-1092. [PMID: 35798979 PMCID: PMC9624460 DOI: 10.1038/s41593-022-01109-2]
Abstract
A large body of evidence has indicated that the phasic responses of midbrain dopamine neurons show a remarkable similarity to a type of teaching signal (temporal difference (TD) error) used in machine learning. However, previous studies failed to observe a key prediction of this algorithm: that when an agent associates a cue and a reward that are separated in time, the timing of dopamine signals should gradually move backward in time from the time of the reward to the time of the cue over multiple trials. Here we demonstrate that such a gradual shift occurs both at the level of dopaminergic cellular activity and dopamine release in the ventral striatum in mice. Our results establish a long-sought link between dopaminergic activity and the TD learning algorithm, providing fundamental insights into how the brain associates cues and rewards that are separated in time.
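The backward shift this abstract describes falls out of standard temporal-difference learning. As a minimal sketch (not the authors' model; the 10-step delay, learning rate, and discount factor are arbitrary illustrative choices), a tabular TD(0) learner trained on a cue-delay-reward trial shows its largest TD error at the reward time early in training and at the cue after learning:

```python
import numpy as np

def td_learning(n_trials=2000, T=10, alpha=0.1, gamma=0.98):
    """Tabular TD(0) on a cue -> delay -> reward trial.

    Delay states are 0..T-1; reward arrives on leaving state T-1.
    Returns the TD error (the model's RPE) at every timestep of every
    trial: column 0 is the cue response, columns 1..T the within-trial
    transitions.
    """
    V = np.zeros(T + 1)                  # V[T] is the terminal state
    rpe = np.zeros((n_trials, T + 1))
    for trial in range(n_trials):
        # The cue itself is unpredicted, so the cue-evoked RPE is just
        # the (discounted) learned value of the first delay state.
        rpe[trial, 0] = gamma * V[0]
        for t in range(T):
            r = 1.0 if t == T - 1 else 0.0
            delta = r + gamma * V[t + 1] - V[t]   # TD error
            V[t] += alpha * delta
            rpe[trial, t + 1] = delta
    return rpe

rpe = td_learning()
print(np.argmax(rpe[0]))    # first trial: peak RPE at the reward time
print(np.argmax(rpe[-1]))   # after learning: peak RPE at the cue
```

Plotting `rpe` trial by trial shows the error bump travelling gradually from the reward time back toward the cue, which is the signature the study reports in dopamine activity.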
Affiliation(s)
- Ryunosuke Amo
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA
- Sara Matias
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA
- Akihiro Yamanaka
- Department of Neuroscience II, Research Institute of Environmental Medicine, Nagoya University, Nagoya, Japan
- Kenji F Tanaka
- Division of Brain Sciences, Institute for Advanced Medical Research, Keio University School of Medicine, Tokyo, Japan
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA
- Mitsuko Watabe-Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA.
16
Tietz S, Wagner-Skacel J, Angel HF, Ratzenhofer M, Fellendorf FT, Fleischmann E, Körner C, Reininghaus EZ, Seitz RJ, Dalkner N. Believing processes during the COVID-19 pandemic in individuals with bipolar disorder: An exploratory study. World J Psychiatry 2022; 12:929-943. [PMID: 36051599 PMCID: PMC9331453 DOI: 10.5498/wjp.v12.i7.929]
Abstract
BACKGROUND Believing or “credition” refers to psychological processes that integrate the cognitions and emotions that influence our behavior. In the credition model by Angel and Seitz, four parameters are postulated: proposition, certainty, emotion and mightiness. It is assumed that believing processes are influenced by both the individual as well as socio-cultural factors and external circumstances. External or environmental circumstances can include threatening situations such as the ongoing pandemic. It has been hypothesized that believing processes related to the pandemic differ between individuals with bipolar disorder (BD) and healthy controls (HC).
AIM To investigate credition in individuals with BD during the coronavirus disease 2019 (COVID-19) pandemic.
METHODS Psychiatrically stable individuals with BD (n = 52) and age- and sex-matched HC (n = 52) participated in an online survey during the first lockdown of the COVID-19 pandemic. The survey took place between April 9th and June 4th, 2020, in Austria. Participants completed the Brief Symptom Inventory-18, the Beck Depression Inventory-II, the Altman Self-Rating Mania Scale, the Pittsburgh Sleep Quality Index and a dedicated Believing Questionnaire assessing four parameters of credition (proposition, certainty, emotion and mightiness). The MAXQDA software was used to analyze the qualitative data. Statistical analyses included analyses of variance, a multivariate analysis of variance and a multivariate analysis of covariance.
RESULTS Individuals with BD reported significantly more negative propositions [F (1,102) = 8.89, P = 0.004, ηp² = 0.08] and negative emotions [Welch's F (1,82.46) = 18.23, P < 0.001, ηp² = 0.18], while HC showed significantly more positive propositions [F (1,102) = 7.78, P = 0.006, ηp² = 0.07] and emotions [F (1,102) = 14.31, P < 0.001, ηp² = 0.12]. In addition, individuals with BD showed a higher incongruence between their propositions and their emotions [F (1,102) = 9.42, P = 0.003, ηp² = 0.08] and showed strong correlations between the parameters of the Believing Questionnaire and their psychiatric symptoms (r = 0.51-0.77, all P < 0.001). Positive as well as negative emotions and propositions were associated with scores measuring symptoms of depression, anxiety and sleep quality.
CONCLUSION Believing parameters were associated with psychiatric symptoms in BD during the pandemic. These findings broaden knowledge about the susceptibility of believing processes to ambient challenges in individuals with BD.
Affiliation(s)
- Sophie Tietz
- Institute of Psychology, University of Graz, Graz 8010, Austria
- Department of Psychiatry and Psychotherapeutic Medicine, Medical University of Graz, Graz 8036, Austria
- Jolana Wagner-Skacel
- Department of Medical Psychology and Psychotherapy, Medical University of Graz, Graz 8036, Austria
- Hans-Ferdinand Angel
- Department of Catechetics and Religious Education, University of Graz, Graz 8010, Austria
- Michaela Ratzenhofer
- Department of Psychiatry and Psychotherapeutic Medicine, Medical University of Graz, Graz 8036, Austria
- Frederike T Fellendorf
- Department of Psychiatry and Psychotherapeutic Medicine, Medical University of Graz, Graz 8036, Austria
- Eva Fleischmann
- Department of Psychiatry and Psychotherapeutic Medicine, Medical University of Graz, Graz 8036, Austria
- Christof Körner
- Institute of Psychology, University of Graz, Graz 8010, Austria
- Eva Z Reininghaus
- Department of Psychiatry and Psychotherapeutic Medicine, Medical University of Graz, Graz 8036, Austria
- Rüdiger J Seitz
- Department of Neurology, Centre of Neurology and Neuropsychiatry, Heinrich-Heine-University Düsseldorf, Medical Faculty, Düsseldorf D-40629, Germany
- Nina Dalkner
- Department of Psychiatry and Psychotherapeutic Medicine, Medical University of Graz, Graz 8036, Austria
17
Kumar JNA, Francis JT. Improved Grip Force Prediction Using a Loss Function that Penalizes Reward Related Neural Information. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:2336-2339. [PMID: 36085700 DOI: 10.1109/embc48229.2022.9871920]
Abstract
Neural activity in the sensorimotor cortices has previously been shown to correlate with kinematics, kinetics, and non-sensorimotor variables such as reward. In this work, we compare the offline grip force prediction performance of a simple artificial neural network (ANN) Brain Machine Interface (BMI) under two loss functions: the standard mean squared error (MSE) and a modified reward-penalized mean squared error (RP_MSE), which penalizes correlation between reward and grip force. Our results show that the ANN performs significantly better, by approximately 6%, under the RP_MSE loss function in three brain regions: the dorsal premotor cortex (PMd), the primary motor cortex (M1), and the primary somatosensory cortex (S1).
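The penalized loss this abstract describes can be sketched directly. The paper's exact penalty term is not reproduced here, so this is only a guess at the general shape: standard MSE plus a weighted squared Pearson correlation between the prediction and the reward signal, where `lam` and the toy data are our own hypothetical choices:

```python
import numpy as np

def rp_mse(y_pred, y_true, reward, lam=1.0):
    """Sketch of a reward-penalized MSE: the usual squared error plus a
    penalty on the squared correlation between the predicted grip force
    and the reward signal, discouraging the decoder from leaking reward
    information into its output."""
    mse = np.mean((y_pred - y_true) ** 2)
    r = np.corrcoef(y_pred, reward)[0, 1]   # Pearson correlation
    return mse + lam * r ** 2

rng = np.random.default_rng(0)
force = rng.normal(size=200)
reward = rng.normal(size=200)
clean = force.copy()              # accurate, reward-independent prediction
leaky = force + 0.8 * reward      # prediction contaminated by reward
print(rp_mse(clean, force, reward) < rp_mse(leaky, force, reward))
```

Used as a training objective (e.g. with automatic differentiation), such a loss trades a small amount of raw accuracy for outputs that are decorrelated from reward, which is the trade-off the abstract reports.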
18
The role of state uncertainty in the dynamics of dopamine. Curr Biol 2022; 32:1077-1087.e9. [PMID: 35114098 PMCID: PMC8930519 DOI: 10.1016/j.cub.2022.01.025]
Abstract
Reinforcement learning models of the basal ganglia map the phasic dopamine signal to reward prediction errors (RPEs). Conventional models assert that, when a stimulus predicts a reward with fixed delay, dopamine activity during the delay should converge to baseline through learning. However, recent studies have found that dopamine ramps up before reward in certain conditions even after learning, thus challenging the conventional models. In this work, we show that sensory feedback causes an unbiased learner to produce RPE ramps. Our model predicts that when feedback gradually decreases during a trial, dopamine activity should resemble a "bump," whose ramp-up phase should, furthermore, be greater than that of conditions where the feedback stays high. We trained mice on a virtual navigation task with varying brightness, and both predictions were empirically observed. In sum, our theoretical and experimental results reconcile the seemingly conflicting data on dopamine behaviors under the RPE hypothesis.
19
Dopamine firing plays a dual role in coding reward prediction errors and signaling motivation in a working memory task. Proc Natl Acad Sci U S A 2022; 119:2113311119. [PMID: 34992139 PMCID: PMC8764687 DOI: 10.1073/pnas.2113311119]
Abstract
Little is known about how dopamine (DA) neuron firing rates behave in cognitively demanding decision-making tasks. Here, we investigated midbrain DA activity in monkeys performing a discrimination task in which the animal had to use working memory (WM) to report which of two sequentially applied vibrotactile stimuli had the higher frequency. We found that perception was altered by an internal bias, likely generated by deterioration of the representation of the first frequency during the WM period. This bias greatly controlled the DA phasic response during the two stimulation periods, confirming that DA reward prediction errors reflected stimulus perception. In contrast, tonic dopamine activity during WM was not affected by the bias and did not encode the stored frequency. More interestingly, both delay-period activity and phasic responses before the second stimulus negatively correlated with reaction times of the animals after the trial start cue and thus represented motivated behavior on a trial-by-trial basis. During WM, this motivation signal underwent a ramp-like increase. At the same time, motivation positively correlated with accuracy, especially in difficult trials, probably by decreasing the effect of the bias. Overall, our results indicate that DA activity, in addition to encoding reward prediction errors, could at the same time be involved in motivation and WM. In particular, the ramping activity during the delay period suggests a possible DA role in stabilizing sustained cortical activity, hypothetically by increasing the gain communicated to prefrontal neurons in a motivation-dependent way.
20
Cools R, Arnsten AFT. Neuromodulation of prefrontal cortex cognitive function in primates: the powerful roles of monoamines and acetylcholine. Neuropsychopharmacology 2022; 47:309-328. [PMID: 34312496 PMCID: PMC8617291 DOI: 10.1038/s41386-021-01100-8]
Abstract
The primate prefrontal cortex (PFC) subserves our highest order cognitive operations, and yet is tremendously dependent on a precise neurochemical environment for proper functioning. Depletion of noradrenaline and dopamine, or of acetylcholine from the dorsolateral PFC (dlPFC), is as devastating as removing the cortex itself, and serotonergic influences are also critical to proper functioning of the orbital and medial PFC. Most neuromodulators have a narrow inverted U dose response, which coordinates arousal state with cognitive state, and contributes to cognitive deficits with fatigue or uncontrollable stress. Studies in monkeys have revealed the molecular signaling mechanisms that govern the generation and modulation of mental representations by the dlPFC, allowing dynamic regulation of network strength, a process that requires tight regulation to prevent toxic actions, e.g., as occurs with advanced age. Brain imaging studies in humans have observed drug and genotype influences on a range of cognitive tasks and on PFC circuit functional connectivity, e.g., showing that catecholamines stabilize representations in a baseline-dependent manner. Research in monkeys has already led to new treatments for cognitive disorders in humans, encouraging future research in this important field.
Affiliation(s)
- Roshan Cools
- Department of Psychiatry, Radboud University Medical Center, Nijmegen, the Netherlands
- Amy F T Arnsten
- Department of Neuroscience, Yale University School of Medicine, New Haven, CT, USA.
21
Collins AGE, Shenhav A. Advances in modeling learning and decision-making in neuroscience. Neuropsychopharmacology 2022; 47:104-118. [PMID: 34453117 PMCID: PMC8617262 DOI: 10.1038/s41386-021-01126-y]
Abstract
An organism's survival depends on its ability to learn about its environment and to make adaptive decisions in the service of achieving the best possible outcomes in that environment. To study the neural circuits that support these functions, researchers have increasingly relied on models that formalize the computations required to carry them out. Here, we review the recent history of computational modeling of learning and decision-making, and how these models have been used to advance understanding of prefrontal cortex function. We discuss how such models have advanced from their origins in basic algorithms of updating and action selection to increasingly account for complexities in the cognitive processes required for learning and decision-making, and the representations over which they operate. We further discuss how a deeper understanding of the real-world complexities in these computations has shed light on the fundamental constraints on optimal behavior, and on the complex interactions between corticostriatal pathways to determine such behavior. The continuing and rapid development of these models holds great promise for understanding the mechanisms by which animals adapt to their environments, and what leads to maladaptive forms of learning and decision-making within clinical populations.
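The basic algorithms of updating and action selection that this review takes as its starting point can be written in a few lines. A minimal sketch (parameter values are illustrative, not drawn from any study reviewed here): delta-rule value updates with softmax action selection on a two-armed bandit:

```python
import numpy as np

def simulate_bandit(p_reward=(0.8, 0.2), alpha=0.3, beta=5.0,
                    n_trials=1000, seed=0):
    """Delta-rule learning with softmax choice on a two-armed bandit.

    alpha is the learning rate, beta the inverse temperature; both are
    illustrative values. Returns the sequence of chosen arms."""
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                          # learned action values
    choices = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        p = np.exp(beta * q)
        p /= p.sum()                         # softmax action selection
        a = rng.choice(2, p=p)
        r = float(rng.random() < p_reward[a])
        q[a] += alpha * (r - q[a])           # prediction-error update
        choices[t] = a
    return choices

choices = simulate_bandit()
print((choices[500:] == 0).mean())           # mostly the richer arm
```

Fitting `alpha` and `beta` to trial-by-trial choices is the standard route by which such models are linked to behavior and to neural signals, and the complexities the review surveys are elaborations of exactly this update-and-select loop.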
Affiliation(s)
- Anne G E Collins
- Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA.
- Amitai Shenhav
- Department of Cognitive, Linguistic, & Psychological Sciences and Carney Institute for Brain Science, Brown University, Providence, RI, USA.
22
Ogasawara T, Sogukpinar F, Zhang K, Feng YY, Pai J, Jezzini A, Monosov IE. A primate temporal cortex-zona incerta pathway for novelty seeking. Nat Neurosci 2022; 25:50-60. [PMID: 34903880 DOI: 10.1038/s41593-021-00950-1]
Abstract
Primates interact with the world by exploring visual objects; they seek opportunities to view novel objects even when these have no extrinsic reward value. How the brain controls this novelty seeking is unknown. Here we show that novelty seeking in monkeys is regulated by the zona incerta (ZI). As monkeys made eye movements to familiar objects to trigger an opportunity to view novel objects, many ZI neurons were preferentially activated by predictions of novel objects before the gaze shift. Low-intensity ZI stimulation facilitated gaze shifts, whereas ZI inactivation reduced novelty seeking. ZI-dependent novelty seeking was not regulated by neurons in the lateral habenula or by many dopamine neurons in the substantia nigra, traditionally associated with reward seeking. But the anterior ventral medial temporal cortex, an area important for object vision and memory, was a prominent source of novelty predictions. These data uncover a functional pathway in the primate brain that regulates novelty seeking.
Affiliation(s)
- Takaya Ogasawara
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO, USA.
- Fatih Sogukpinar
- Department of Electrical Engineering, Washington University, St. Louis, MO, USA
- Kaining Zhang
- Department of Biomedical Engineering, Washington University, St. Louis, MO, USA
- Yang-Yang Feng
- Department of Biomedical Engineering, Washington University, St. Louis, MO, USA
- Julia Pai
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO, USA
- Ahmad Jezzini
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO, USA
- Ilya E Monosov
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO, USA.
- Department of Electrical Engineering, Washington University, St. Louis, MO, USA.
- Department of Biomedical Engineering, Washington University, St. Louis, MO, USA.
- Department of Neurosurgery School of Medicine, Washington University, St. Louis, MO, USA.
- Pain Center, Washington University School of Medicine, St. Louis, MO, USA.
23
Gao Z, Wang H, Lu C, Lu T, Froudist-Walsh S, Chen M, Wang XJ, Hu J, Sun W. The neural basis of delayed gratification. Sci Adv 2021; 7:eabg6611. [PMID: 34851665 PMCID: PMC8635439 DOI: 10.1126/sciadv.abg6611]
Abstract
Balancing instant gratification versus delayed but better gratification is important for optimizing survival and reproductive success. Although delayed gratification has been studied through human psychological and brain activity monitoring and animal research, little is known about its neural basis. We successfully trained mice to perform a waiting-for-water-reward delayed gratification task and used these animals in physiological recording and optical manipulation of neuronal activity during the task to explore its neural basis. Our results showed that the activity of dopaminergic (DAergic) neurons in the ventral tegmental area increases steadily during the waiting period. Optical activation or silencing of these neurons, respectively, extends or reduces the duration of waiting. To interpret these data, we developed a reinforcement learning model that reproduces our experimental observations. Steady increases in DAergic activity signal the value of waiting and support the hypothesis that delayed gratification involves real-time deliberation.
Affiliation(s)
- Zilong Gao
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Chinese Institute for Brain Research, Beijing 102206, China
- Hanqing Wang
- Center for Neural Science, New York University, New York, NY 10003, USA
- Chen Lu
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Tiezhan Lu
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Chinese Institute for Brain Research, Beijing 102206, China
- Ming Chen
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Xiao-Jing Wang
- Center for Neural Science, New York University, New York, NY 10003, USA
- Ji Hu
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai 200030, China
- Wenzhi Sun
- Chinese Institute for Brain Research, Beijing 102206, China
- School of Basic Medical Sciences, Capital Medical University, Beijing 100069, China
24
McDougle SD, Ballard IC, Baribault B, Bishop SJ, Collins AGE. Executive Function Assigns Value to Novel Goal-Congruent Outcomes. Cereb Cortex 2021; 32:231-247. [PMID: 34231854 PMCID: PMC8634563 DOI: 10.1093/cercor/bhab205]
Abstract
People often learn from the outcomes of their actions, even when these outcomes do not involve material rewards or punishments. How does our brain provide this flexibility? We combined behavior, computational modeling, and functional neuroimaging to probe whether learning from abstract novel outcomes harnesses the same circuitry that supports learning from familiar secondary reinforcers. Behavior and neuroimaging revealed that novel images can act as a substitute for rewards during instrumental learning, producing reliable reward-like signals in dopaminergic circuits. Moreover, we found evidence that prefrontal correlates of executive control may play a role in shaping flexible responses in reward circuits. These results suggest that learning from novel outcomes is supported by an interplay between high-level representations in prefrontal cortex and low-level responses in subcortical reward circuits. This interaction may allow for human reinforcement learning over arbitrarily abstract reward functions.
Affiliation(s)
- Ian C Ballard
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA
- Beth Baribault
- Department of Psychology, University of California, Berkeley, CA 94704, USA
- Sonia J Bishop
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA
- Department of Psychology, University of California, Berkeley, CA 94704, USA
- Anne G E Collins
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA
- Department of Psychology, University of California, Berkeley, CA 94704, USA
25
Amaro D, Ferreiro DN, Grothe B, Pecka M. Source identity shapes spatial preference in primary auditory cortex during active navigation. Curr Biol 2021; 31:3875-3883.e5. [PMID: 34192513 DOI: 10.1016/j.cub.2021.06.025]
Abstract
Information about the position of sensory objects and identifying their concurrent behavioral relevance is vital to navigate the environment. In the auditory system, spatial information is computed in the brain based on the position of the sound source relative to the observer and thus assumed to be egocentric throughout the auditory pathway. This assumption is largely based on studies conducted in either anesthetized or head-fixed and passively listening animals, thus lacking self-motion and selective listening. Yet these factors are fundamental components of natural sensing [1] that may crucially impact the nature of spatial coding and sensory object representation [2]. How individual objects are neuronally represented during unrestricted self-motion and active sensing remains mostly unexplored. Here, we trained gerbils on a behavioral foraging paradigm that required localization and identification of sound sources during free navigation. Chronic tetrode recordings in primary auditory cortex during task performance revealed previously unreported sensory object representations. Strikingly, the egocentric angle preference of the majority of spatially sensitive neurons changed significantly depending on the task-specific identity (outcome association) of the sound source. Spatial tuning also exhibited large temporal complexity. Moreover, we encountered egocentrically untuned neurons whose response magnitude differed between source identities. Using a neural network decoder, we show that, together, these neuronal response ensembles provide spatiotemporally co-existent information about both the egocentric location and the identity of individual sensory objects during self-motion, revealing a novel cortical computation principle for naturalistic sensing.
Affiliation(s)
- Diana Amaro
- Division of Neurobiology, Department Biology II, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany; Graduate School of Systemic Neurosciences, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Dardo N Ferreiro
- Division of Neurobiology, Department Biology II, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany; Department of General Psychology and Education, Ludwig-Maximilians-Universität München, Germany
- Benedikt Grothe
- Division of Neurobiology, Department Biology II, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany; Graduate School of Systemic Neurosciences, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany; Max Planck Institute of Neurobiology, Planegg-Martinsried, Germany
- Michael Pecka
- Division of Neurobiology, Department Biology II, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany.
| |
Collapse
|
26
|
Moran R, Dayan P, Dolan RJ. Efficiency and prioritization of inference-based credit assignment. Curr Biol 2021; 31:2747-2756.e6. [PMID: 33887181 PMCID: PMC8279739 DOI: 10.1016/j.cub.2021.03.091] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Revised: 02/11/2021] [Accepted: 03/29/2021] [Indexed: 11/16/2022]
Abstract
Organisms adapt to their environments by learning to approach states that predict rewards and avoid states associated with punishments. Knowledge about the affective value of states often relies on credit assignment (CA), whereby state values are updated on the basis of reward feedback. Remarkably, humans assign credit to states that are not observed but are instead inferred based on a cognitive map that represents structural knowledge of an environment. A pertinent example is authors attempting to infer the identity of anonymous reviewers to assign them credit or blame and, on this basis, inform future referee recommendations. Although inference is cognitively costly, it is unknown how it influences CA or how it is apportioned between hidden and observable states (for example, both anonymous and revealed reviewers). We addressed these questions in a task that provided choices between lotteries where each led to a unique pair of occasionally rewarding outcome states. On some trials, both states were observable (rendering inference nugatory), whereas on others, the identity of one of the states was concealed. Importantly, by exploiting knowledge of choice-state associations, subjects could infer the identity of this hidden state. We show that having to perform inference reduces state-value updates. Strikingly, and in violation of normative theories, this reduction in CA was selective for the observed outcome alone. These findings have implications for the operation of putative cognitive maps.
Affiliation(s)
- Rani Moran
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London WC1B 5EH, UK; Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK.
- Peter Dayan
- Max Planck Institute for Biological Cybernetics, Max Planck-Ring 8, 72076 Tübingen, Germany; University of Tübingen, 72074 Tübingen, Germany
- Raymond J Dolan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London WC1B 5EH, UK; Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK
|
27
|
Westbrook A, Frank MJ, Cools R. A mosaic of cost-benefit control over cortico-striatal circuitry. Trends Cogn Sci 2021; 25:710-721. [PMID: 34120845 DOI: 10.1016/j.tics.2021.04.007] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2020] [Revised: 04/12/2021] [Accepted: 04/15/2021] [Indexed: 12/22/2022]
Abstract
Dopamine contributes to cognitive control through well-established effects in both the striatum and cortex. Although earlier work suggests that dopamine affects cognitive control capacity, more recent work suggests that striatal dopamine may also impact on cognitive motivation. We consider the emerging perspective that striatal dopamine boosts control by making people more sensitive to the benefits versus the costs of cognitive effort, and we discuss how this sensitivity shapes competition between controlled and prepotent actions. We propose that dopamine signaling in distinct cortico-striatal subregions mediates different types of cost-benefit tradeoffs, and also discuss mechanisms for the local control of dopamine release, enabling selectivity among cortico-striatal circuits. In so doing, we show how this cost-benefit mosaic can reconcile seemingly conflicting findings about the impact of dopamine signaling on cognitive control.
Affiliation(s)
- Andrew Westbrook
- Donders Centre for Cognitive Neuroimaging, Nijmegen, The Netherlands; Department of Psychiatry, Radboud University Medical Center, Nijmegen, The Netherlands; Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA.
- Michael J Frank
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA; Carney Institute for Brain Science, Brown University, Providence, RI, USA
- Roshan Cools
- Donders Centre for Cognitive Neuroimaging, Nijmegen, The Netherlands; Department of Psychiatry, Radboud University Medical Center, Nijmegen, The Netherlands
|
28
|
Palidis DJ, McGregor HR, Vo A, MacDonald PA, Gribble PL. Null effects of levodopa on reward- and error-based motor adaptation, savings, and anterograde interference. J Neurophysiol 2021; 126:47-67. [PMID: 34038228 DOI: 10.1152/jn.00696.2020] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Dopamine signaling is thought to mediate reward-based learning. We tested for a role of dopamine in motor adaptation by administering the dopamine precursor levodopa to healthy participants in two experiments involving reaching movements. Levodopa has been shown to impair reward-based learning in cognitive tasks. Thus, we hypothesized that levodopa would selectively impair aspects of motor adaptation that depend on the reinforcement of rewarding actions. In the first experiment, participants performed two separate tasks in which adaptation was driven either by visual error-based feedback of the hand position or by binary reward feedback. We used EEG to measure event-related potentials evoked by task feedback. We hypothesized that levodopa would specifically diminish adaptation and the neural responses to feedback in the reward learning task. However, levodopa did not affect motor adaptation in either task, nor did it diminish event-related potentials elicited by reward outcomes. In the second experiment, participants learned to compensate for mechanical force field perturbations applied to the hand during reaching. Previous exposure to a particular force field can result in savings during subsequent adaptation to the same force field, or interference during adaptation to an opposite force field. We hypothesized that levodopa would diminish savings and anterograde interference, as previous work suggests that these phenomena result from a reinforcement learning process. However, we found no reliable effects of levodopa. These results suggest that reward-based motor adaptation, savings, and interference may not depend on the same dopaminergic mechanisms that have been shown to be disrupted by levodopa during various cognitive tasks.
NEW & NOTEWORTHY Motor adaptation relies on multiple processes, including reinforcement of successful actions. Cognitive reinforcement learning is impaired by levodopa-induced disruption of dopamine function. We administered levodopa to healthy adults who participated in multiple motor adaptation tasks. We found no effects of levodopa on any component of motor adaptation. This suggests that motor adaptation may not depend on the same dopaminergic mechanisms as cognitive forms of reinforcement learning that have been shown to be impaired by levodopa.
Affiliation(s)
- Dimitrios J Palidis
- Brain and Mind Institute, Western University, London, Ontario, Canada; Department of Psychology, Western University, London, Ontario, Canada; Graduate Program in Neuroscience, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Heather R McGregor
- Department of Applied Physiology and Kinesiology, University of Florida, Gainesville, Florida
- Andrew Vo
- Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, Quebec, Canada
- Penny A MacDonald
- Brain and Mind Institute, Western University, London, Ontario, Canada; Department of Psychology, Western University, London, Ontario, Canada; Department of Physiology and Pharmacology, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada; Department of Clinical Neurological Sciences, University of Western Ontario, London, Ontario, Canada
- Paul L Gribble
- Brain and Mind Institute, Western University, London, Ontario, Canada; Department of Psychology, Western University, London, Ontario, Canada; Department of Physiology and Pharmacology, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada; Haskins Laboratories, New Haven, Connecticut
|
29
|
Liu Y, Xin Y, Xu NL. A cortical circuit mechanism for structural knowledge-based flexible sensorimotor decision-making. Neuron 2021; 109:2009-2024.e6. [PMID: 33957065 DOI: 10.1016/j.neuron.2021.04.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2020] [Revised: 03/01/2021] [Accepted: 04/14/2021] [Indexed: 10/21/2022]
Abstract
Making flexible decisions based on prior knowledge about causal environmental structures is a hallmark of goal-directed cognition in mammalian brains. Although several association brain regions, including the orbitofrontal cortex (OFC), have been implicated, the precise neuronal circuit mechanisms underlying knowledge-based decision-making remain elusive. Here, we established an inference-based auditory categorization task where mice performed within-session flexible stimulus re-categorization by inferring the changing task rules. We constructed a reinforcement learning model to recapitulate the inference-based flexible behavior and quantify the hidden variables associated with task structural knowledge. Combining two-photon population imaging and projection-specific optogenetics, we found that auditory cortex (ACx) neurons encoded the hidden task rule variable, which requires feedback input from the OFC. Silencing OFC-ACx input specifically disrupted re-categorization behavior. Direct imaging from OFC axons in the ACx revealed task state-related feedback signals, supporting the knowledge-based updating mechanism. Our data reveal a cortical circuit mechanism underlying structural knowledge-based flexible decision-making.
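The hidden task-rule variable quantified by the authors' reinforcement learning model can be illustrated with a generic Bayesian belief update. The sketch below is our own toy, not the paper's model; the function name, outcome likelihoods, and switch probability are all assumptions for illustration.

```python
def update_rule_belief(p_rule_a, lik_a, lik_b, p_switch=0.05):
    """Bayesian update of the belief that hidden rule A is in force.

    p_rule_a : current belief that rule A applies
    lik_a    : likelihood of the observed outcome if rule A applies
    lik_b    : likelihood of the same outcome if rule B applies
    p_switch : assumed per-trial chance that the rule silently switched
    """
    # Predictive prior: the rule may have switched since the last trial.
    prior = p_rule_a * (1.0 - p_switch) + (1.0 - p_rule_a) * p_switch
    # Bayes' rule over the two candidate rules.
    evidence = prior * lik_a + (1.0 - prior) * lik_b
    return prior * lik_a / evidence

belief = 0.5
# A run of outcomes four times more likely under rule A drives the belief up,
# supporting within-session re-categorization once it crosses a threshold.
for _ in range(5):
    belief = update_rule_belief(belief, lik_a=0.8, lik_b=0.2)
```

Under this assumed generative model, a few consistent outcomes suffice to push the belief close to certainty while the switch probability keeps it from saturating at 1.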
Affiliation(s)
- Yanhe Liu
- Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; University of the Chinese Academy of Sciences, Beijing 100049, China
- Yu Xin
- Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; University of the Chinese Academy of Sciences, Beijing 100049, China
- Ning-Long Xu
- Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; University of the Chinese Academy of Sciences, Beijing 100049, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai 201210, China
|
30
|
Abstract
Experiments have implicated dopamine in model-based reinforcement learning (RL). These findings are unexpected as dopamine is thought to encode a reward prediction error (RPE), which is the key teaching signal in model-free RL. Here we examine two possible accounts for dopamine's involvement in model-based RL: the first that dopamine neurons carry a prediction error used to update a type of predictive state representation called a successor representation, the second that two well established aspects of dopaminergic activity, RPEs and surprise signals, can together explain dopamine's involvement in model-based RL.
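The first account above, a prediction error that updates a successor representation (SR), can be sketched in a few lines. This is a generic textbook-style toy, not the article's own code; the state indices, learning rate, and discount factor are illustrative assumptions.

```python
def sr_update(M, s, s_next, alpha=0.1, gamma=0.9):
    """One TD-style update of the successor representation row for state s.

    M[s][j] estimates the discounted expected future occupancy of state j
    when starting from state s. The update is driven by a vector-valued
    prediction error, one element per successor state.
    """
    n = len(M)
    for j in range(n):
        indicator = 1.0 if j == s else 0.0           # occupancy observed now
        sr_error = indicator + gamma * M[s_next][j] - M[s][j]
        M[s][j] += alpha * sr_error
    return M

# Three-state toy environment; after one transition 0 -> 1, state 0's
# estimate of its own occupancy moves toward 1.
M = [[0.0] * 3 for _ in range(3)]
M = sr_update(M, s=0, s_next=1)
```

The contrast with model-free RL is that the error here is over *state occupancies* rather than scalar reward, which is what lets SR agents display some model-based behavior.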
|
31
|
Rothenhoefer KM, Hong T, Alikaya A, Stauffer WR. Rare rewards amplify dopamine responses. Nat Neurosci 2021; 24:465-469. [PMID: 33686298 PMCID: PMC9373731 DOI: 10.1038/s41593-021-00807-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2019] [Accepted: 01/20/2021] [Indexed: 01/02/2023]
Abstract
Dopamine prediction error responses are essential components of universal learning mechanisms. However, it is unknown whether individual dopamine neurons reflect the shape of reward distributions. Here, we used symmetrical distributions with differently weighted tails to investigate how the frequency of rewards and reward prediction errors influence dopamine signals. Rare rewards amplified dopamine responses, even when conventional prediction errors were identical, indicating a mechanism for learning the complexities of real-world incentives.
Affiliation(s)
- Kathryn M Rothenhoefer
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
- Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Systems Neuroscience Center, University of Pittsburgh, Pittsburgh, PA, USA
- The Brain Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Tao Hong
- Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Systems Neuroscience Center, University of Pittsburgh, Pittsburgh, PA, USA
- The Brain Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Program in Neural Computation, Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Aydin Alikaya
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
- Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Systems Neuroscience Center, University of Pittsburgh, Pittsburgh, PA, USA
- The Brain Institute, University of Pittsburgh, Pittsburgh, PA, USA
- William R Stauffer
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
- Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Systems Neuroscience Center, University of Pittsburgh, Pittsburgh, PA, USA
- The Brain Institute, University of Pittsburgh, Pittsburgh, PA, USA
|
32
|
Starkweather CK, Uchida N. Dopamine signals as temporal difference errors: recent advances. Curr Opin Neurobiol 2021; 67:95-105. [PMID: 33186815 PMCID: PMC8107188 DOI: 10.1016/j.conb.2020.08.014] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2020] [Revised: 08/24/2020] [Accepted: 08/26/2020] [Indexed: 11/28/2022]
Abstract
In the brain, dopamine is thought to drive reward-based learning by signaling temporal difference reward prediction errors (TD errors), a 'teaching signal' used to train computers. Recent studies using optogenetic manipulations have provided multiple pieces of evidence supporting that phasic dopamine signals function as TD errors. Furthermore, novel experimental results have indicated that when the current state of the environment is uncertain, dopamine neurons compute TD errors using 'belief states' or a probability distribution over potential states. It remains unclear how belief states are computed but emerging evidence suggests involvement of the prefrontal cortex and the hippocampus. These results refine our understanding of the role of dopamine in learning and the algorithms by which dopamine functions in the brain.
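The belief-state computation described above can be made concrete with a minimal sketch: when the current state is uncertain, the TD error is computed over a probability distribution across hidden states rather than a single observed state. The state values, beliefs, and discount factor below are arbitrary assumptions for illustration, not quantities from any of the reviewed studies.

```python
def td_error(belief_t, belief_t1, reward, values, gamma=0.9):
    """TD error over belief states.

    belief_t, belief_t1 : probability vectors over hidden states at t, t+1
    values              : learned value of each hidden state
    """
    v_t = sum(b * v for b, v in zip(belief_t, values))    # E[V] under belief at t
    v_t1 = sum(b * v for b, v in zip(belief_t1, values))  # E[V] under belief at t+1
    return reward + gamma * v_t1 - v_t

# Two hidden states with values 0 and 1. Between t and t+1 the belief
# sharpens toward the rewarded state, yielding a positive TD error even
# though no reward has been delivered yet.
values = [0.0, 1.0]
delta = td_error([0.5, 0.5], [0.1, 0.9], reward=0.0, values=values)
```

This captures why sensory evidence that merely resolves state uncertainty can drive dopamine responses under the belief-state account.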
Affiliation(s)
- Clara Kwon Starkweather
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Naoshige Uchida
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
|
33
|
|
34
|
Rmus M, McDougle SD, Collins AGE. The role of executive function in shaping reinforcement learning. Curr Opin Behav Sci 2021; 38:66-73. [DOI: 10.1016/j.cobeha.2020.10.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
|
35
|
Lerner TN, Holloway AL, Seiler JL. Dopamine, Updated: Reward Prediction Error and Beyond. Curr Opin Neurobiol 2021; 67:123-130. [PMID: 33197709 PMCID: PMC8116345 DOI: 10.1016/j.conb.2020.10.012] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Revised: 10/12/2020] [Accepted: 10/14/2020] [Indexed: 01/10/2023]
Abstract
Dopamine neurons have been intensely studied for their roles in reinforcement learning. A dominant theory of how these neurons contribute to learning is through the encoding of a reward prediction error (RPE) signal. Recent advances in dopamine research have added nuance to RPE theory by incorporating the ideas of sensory prediction error, distributional encoding, and belief states. Further nuance is likely to be added shortly by convergent lines of research on dopamine neuron diversity. Finally, a major challenge is to reconcile RPE theory with other current theories of dopamine function to account for dopamine's role in movement, motivation, and goal-directed planning.
Affiliation(s)
- Talia N Lerner
- Feinberg School of Medicine and Department of Physiology, Northwestern University, Chicago, IL, USA; Northwestern University Interdepartmental Neuroscience Program, Chicago, IL, USA.
- Ashley L Holloway
- Feinberg School of Medicine and Department of Physiology, Northwestern University, Chicago, IL, USA; Northwestern University Interdepartmental Neuroscience Program, Chicago, IL, USA
- Jillian L Seiler
- Feinberg School of Medicine and Department of Physiology, Northwestern University, Chicago, IL, USA; Department of Psychology, University of Illinois at Chicago, Chicago, IL, USA
|
36
|
Coordinated Prefrontal State Transition Leads Extinction of Reward-Seeking Behaviors. J Neurosci 2021; 41:2406-2419. [PMID: 33531416 DOI: 10.1523/jneurosci.2588-20.2021] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2020] [Revised: 12/16/2020] [Accepted: 01/17/2021] [Indexed: 11/21/2022] Open
Abstract
Extinction learning suppresses conditioned reward responses and is thus fundamental to adapt to changing environmental demands and to control excessive reward seeking. The medial prefrontal cortex (mPFC) monitors and controls conditioned reward responses. Abrupt transitions in mPFC activity anticipate changes in conditioned responses to altered contingencies. It remains, however, unknown whether such transitions are driven by the extinction of old behavioral strategies or by the acquisition of new competing ones. Using in vivo multiple single-unit recordings of mPFC in male rats, we studied the relationship between single-unit and population dynamics during extinction learning, using alcohol as a positive reinforcer in an operant conditioning paradigm. To examine the fine temporal relation between neural activity and behavior, we developed a novel behavioral model that allowed us to identify the number, onset, and duration of extinction-learning episodes in the behavior of each animal. We found that single-unit responses to conditioned stimuli changed even under stable experimental conditions and behavior. However, when behavioral responses to task contingencies had to be updated, unit-specific modulations became coordinated across the whole population, pushing the network into a new stable attractor state. Thus, extinction learning is not associated with suppressed mPFC responses to conditioned stimuli, but is anticipated by single-unit coordination into population-wide transitions of the internal state of the animal.
SIGNIFICANCE STATEMENT The ability to suppress conditioned behaviors when no longer beneficial is fundamental for the survival of any organism. While pharmacological and optogenetic interventions have shown a critical involvement of the mPFC in the suppression of conditioned responses, the neural dynamics underlying such a process are still largely unknown. Combining novel analysis tools to describe behavior, single-neuron responses, and population activity, we found that widespread changes in neuronal firing temporally coordinate across the whole mPFC population in anticipation of behavioral extinction. This coordination leads to a global transition in the internal state of the network, driving extinction of conditioned behavior.
|
37
|
Pisupati S, Chartarifsky-Lynn L, Khanal A, Churchland AK. Lapses in perceptual decisions reflect exploration. eLife 2021; 10:55490. [PMID: 33427198 PMCID: PMC7846276 DOI: 10.7554/elife.55490] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2020] [Accepted: 01/10/2021] [Indexed: 12/17/2022] Open
Abstract
Perceptual decision-makers often display a constant rate of errors independent of evidence strength. These ‘lapses’ are treated as a nuisance arising from noise tangential to the decision, e.g. inattention or motor errors. Here, we use a multisensory decision task in rats to demonstrate that these explanations cannot account for lapses’ stimulus dependence. We propose a novel explanation: lapses reflect a strategic trade-off between exploiting known rewarding actions and exploring uncertain ones. We tested this model’s predictions by selectively manipulating one action’s reward magnitude or probability. As uniquely predicted by this model, changes were restricted to lapses associated with that action. Finally, we show that lapses are a powerful tool for assigning decision-related computations to neural structures based on disruption experiments (here, posterior striatum and secondary motor cortex). These results suggest that lapses reflect an integral component of decision-making and are informative about action values in normal and disrupted brain states.
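One way to see how exploration yields value-sensitive lapses is a softmax choice rule over expected action values. The toy below is our own assumption-laden illustration, not the authors' fitted model; the function name, reward magnitudes, and inverse temperature are hypothetical.

```python
import math

def p_correct(evidence, r_correct, r_other, beta=1.0):
    """Probability of the 'correct' choice under softmax exploration.

    evidence  : perceived probability (0..1) that the correct action pays off
    r_correct : reward magnitude of the correct action
    r_other   : reward magnitude of the competing action
    """
    q_correct = evidence * r_correct          # expected value of correct choice
    q_other = (1.0 - evidence) * r_other      # expected value of the alternative
    return 1.0 / (1.0 + math.exp(-beta * (q_correct - q_other)))

# Even with strong evidence the choice is not deterministic, and the residual
# error rate ("lapse") grows when the competing action's reward is raised --
# qualitatively mirroring the selective lapse changes described above.
lapse_small = 1.0 - p_correct(evidence=0.95, r_correct=3.0, r_other=3.0)
lapse_big = 1.0 - p_correct(evidence=0.95, r_correct=3.0, r_other=6.0)
```

Under this assumed rule, lapses are not fixed nuisance parameters: they track action values, which is the signature the study exploits.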
Affiliation(s)
- Sashank Pisupati
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States; CSHL School of Biological Sciences, Cold Spring Harbor, New York, United States
- Lital Chartarifsky-Lynn
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States; CSHL School of Biological Sciences, Cold Spring Harbor, New York, United States
- Anup Khanal
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States
|
38
|
Kim HR, Malik AN, Mikhael JG, Bech P, Tsutsui-Kimura I, Sun F, Zhang Y, Li Y, Watabe-Uchida M, Gershman SJ, Uchida N. A Unified Framework for Dopamine Signals across Timescales. Cell 2020; 183:1600-1616.e25. [PMID: 33248024 PMCID: PMC7736562 DOI: 10.1016/j.cell.2020.11.013] [Citation(s) in RCA: 139] [Impact Index Per Article: 34.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2019] [Revised: 08/20/2020] [Accepted: 11/09/2020] [Indexed: 01/06/2023]
Abstract
Rapid phasic activity of midbrain dopamine neurons is thought to signal reward prediction errors (RPEs), resembling temporal difference errors used in machine learning. However, recent studies describing slowly increasing dopamine signals have instead proposed that they represent state values and arise independent from somatic spiking activity. Here we developed experimental paradigms using virtual reality that disambiguate RPEs from values. We examined dopamine circuit activity at various stages, including somatic spiking, calcium signals at somata and axons, and striatal dopamine concentrations. Our results demonstrate that ramping dopamine signals are consistent with RPEs rather than value, and this ramping is observed at all stages examined. Ramping dopamine signals can be driven by a dynamic stimulus that indicates a gradual approach to a reward. We provide a unified computational understanding of rapid phasic and slowly ramping dopamine signals: dopamine neurons perform a derivative-like computation over values on a moment-by-moment basis.
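The derivative-like computation proposed above can be illustrated on a toy value trace: a smoothly rising value function yields a TD-like signal that ramps as reward approaches. The exponential value shape and discount factor below are assumptions for illustration only, not the paper's data.

```python
import math

gamma = 0.99
# Toy value function rising smoothly toward a reward at t = 99.
V = [math.exp(0.05 * (t - 99)) for t in range(100)]

# Moment-by-moment, derivative-like signal over the value trace:
# delta_t = gamma * V(t+1) - V(t).
delta = [gamma * V[t + 1] - V[t] for t in range(99)]

# With a convex (accelerating) value trace, the signal itself ramps upward,
# resembling slowly increasing dopamine signals during reward approach.
ramping = all(later > earlier for earlier, later in zip(delta, delta[1:]))
```

The point of the sketch is that ramping is compatible with a prediction-error code: no separate "value signal" is needed if the value trace accelerates toward reward.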
Affiliation(s)
- HyungGoo R Kim
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA.
- Athar N Malik
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA; Department of Neurosurgery, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA
- John G Mikhael
- Program in Neuroscience, Harvard Medical School, 220 Longwood Avenue, Boston, MA 02115, USA; MD-PhD Program, Harvard Medical School, 260 Longwood Avenue, Boston, MA 02115, USA
- Pol Bech
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Iku Tsutsui-Kimura
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Fangmiao Sun
- State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Beijing 100871, China; PKU-IDG/McGovern Institute for Brain Research, Beijing 100871, China
- Yajun Zhang
- State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Beijing 100871, China; PKU-IDG/McGovern Institute for Brain Research, Beijing 100871, China
- Yulong Li
- State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Beijing 100871, China; PKU-IDG/McGovern Institute for Brain Research, Beijing 100871, China
- Mitsuko Watabe-Uchida
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Samuel J Gershman
- Department of Psychology, Center for Brain Science, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA
- Naoshige Uchida
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
|
39
|
Lowet AS, Zheng Q, Matias S, Drugowitsch J, Uchida N. Distributional Reinforcement Learning in the Brain. Trends Neurosci 2020; 43:980-997. [PMID: 33092893 PMCID: PMC8073212 DOI: 10.1016/j.tins.2020.09.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2020] [Revised: 08/14/2020] [Accepted: 09/08/2020] [Indexed: 12/11/2022]
Abstract
Learning about rewards and punishments is critical for survival. Classical studies have demonstrated an impressive correspondence between the firing of dopamine neurons in the mammalian midbrain and the reward prediction errors of reinforcement learning algorithms, which express the difference between actual reward and predicted mean reward. However, it may be advantageous to learn not only the mean but also the complete distribution of potential rewards. Recent advances in machine learning have revealed a biologically plausible set of algorithms for reconstructing this reward distribution from experience. Here, we review the mathematical foundations of these algorithms as well as initial evidence for their neurobiological implementation. We conclude by highlighting outstanding questions regarding the circuit computation and behavioral readout of these distributional codes.
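One biologically plausible algorithm of the kind reviewed here uses asymmetric learning rates for positive versus negative prediction errors, so that different units converge to different quantiles of the reward distribution. The code below is a hedged toy under that assumption, not the authors' implementation; all names and numbers are illustrative.

```python
import random

def train_unit(rewards, alpha_plus, alpha_minus, n_steps=20000, seed=0):
    """Value predictor with asymmetric learning rates.

    A unit that steps up by alpha_plus on positive prediction errors and
    down by alpha_minus on negative ones settles near a quantile of the
    reward distribution determined by the ratio of the two rates.
    """
    rng = random.Random(seed)
    v = 0.0
    for _ in range(n_steps):
        r = rng.choice(rewards)
        if r > v:
            v += alpha_plus      # optimistic step on positive errors
        else:
            v -= alpha_minus     # pessimistic step on negative errors
    return v

rewards = [0.0, 10.0]            # bimodal 50/50 reward distribution
optimist = train_unit(rewards, alpha_plus=0.03, alpha_minus=0.01)
pessimist = train_unit(rewards, alpha_plus=0.01, alpha_minus=0.03)
```

With a bimodal reward, the "optimistic" unit settles near the upper mode and the "pessimistic" unit near the lower mode, so the population as a whole encodes the shape of the distribution rather than only its mean.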
Affiliation(s)
- Adam S Lowet
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Qiao Zheng
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Sara Matias
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Jan Drugowitsch
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
|
40
|
Mendoza JA, Lafferty CK, Yang AK, Britt JP. Cue-Evoked Dopamine Neuron Activity Helps Maintain but Does Not Encode Expected Value. Cell Rep 2020; 29:1429-1437.e3. [PMID: 31693885 DOI: 10.1016/j.celrep.2019.09.077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2019] [Revised: 08/21/2019] [Accepted: 09/26/2019] [Indexed: 11/16/2022] Open
Abstract
Cue-evoked midbrain dopamine (DA) neuron activity reflects expected value, but its influence on reward assessment is unclear. In mice performing a trial-based operant task, we test whether bidirectional manipulations of cue- or operant-associated DA neuron activity drive learning as a result of under- or overexpectation of reward value. We target optogenetic manipulations to different components of forced trials, when only one lever is presented, and assess lever biases on choice trials in the absence of photomanipulation. Although lever biases are demonstrated to be flexible and sensitive to changes in expected value, augmentation of cue- or operant-associated DA signaling does not significantly alter choice behavior, and blunting DA signaling during any component of the forced trials reduces choice trial responses on the associated lever. These data suggest that cue-evoked DA helps maintain cue-value associations but does not encode expected value so as to set the benchmark against which received reward is judged.
Affiliation(s)
- Jesse A Mendoza
- Department of Psychology, McGill University, Montreal, QC H3A 1B1, Canada; Center for Studies in Behavioral Neurobiology, Concordia University, Montreal, QC H4B 1R6, Canada
- Christopher K Lafferty
- Department of Psychology, McGill University, Montreal, QC H3A 1B1, Canada; Center for Studies in Behavioral Neurobiology, Concordia University, Montreal, QC H4B 1R6, Canada
- Angela K Yang
- Integrated Program in Neuroscience, McGill University, Montreal, QC H3A 2B4, Canada; Center for Studies in Behavioral Neurobiology, Concordia University, Montreal, QC H4B 1R6, Canada
- Jonathan P Britt
- Department of Psychology, McGill University, Montreal, QC H3A 1B1, Canada; Integrated Program in Neuroscience, McGill University, Montreal, QC H3A 2B4, Canada; Center for Studies in Behavioral Neurobiology, Concordia University, Montreal, QC H4B 1R6, Canada
|
41
|
Jacobs DS, Moghaddam B. Prefrontal Cortex Representation of Learning of Punishment Probability During Reward-Motivated Actions. J Neurosci 2020; 40:5063-5077. [PMID: 32409619 PMCID: PMC7314405 DOI: 10.1523/jneurosci.0310-20.2020] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2020] [Revised: 04/14/2020] [Accepted: 05/10/2020] [Indexed: 01/14/2023] Open
Abstract
Actions executed toward obtaining a reward are frequently associated with the probability of harm occurring during action execution. Learning this probability allows for appropriate computation of future harm to guide action selection. Impaired learning of this probability may be critical for the pathogenesis of anxiety or reckless and impulsive behavior. Here we designed a task for punishment probability learning during reward-guided actions to begin to understand the neuronal basis of this form of learning, and the biological or environmental variables that influence action selection after learning. Male and female Long-Evans rats were trained in a seek-take behavioral paradigm where the seek action was associated with varying probability of punishment. The take action remained safe and was followed by reward delivery. Learning was evident as subjects selectively adapted seek action behavior as a function of punishment probability. Recording of neural activity in the mPFC during learning revealed changes in phasic mPFC neuronal activity during risky-seek actions but not during the safe take actions or reward delivery, revealing that this region is involved in learning of probabilistic punishment. After learning, the variables that influenced behavior included reinforcer and punisher value, pretreatment with the anxiolytic diazepam, and biological sex. In particular, females were more sensitive to probabilistic punishment than males. These data demonstrate that flexible encoding of risky actions by mPFC is involved in probabilistic punishment learning and provide a novel behavioral approach for studying the pathogenesis of anxiety and impulsivity with inclusion of sex as a biological variable.
SIGNIFICANCE STATEMENT Actions we choose to execute toward obtaining a reward are often associated with the probability of harm occurring. Impaired learning of this probability may be critical for the pathogenesis of anxiety or reckless behavior and impulsivity. We developed a behavioral model to assess this mode of learning. This procedure allowed us to determine biological and environmental factors that influence the resistance of reward seeking to probabilistic punishment and to identify the mPFC as a region that flexibly adapts its response to risky actions as contingencies are learned.
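The core computation this task probes, weighing expected reward against a learned punishment probability, can be sketched as a simple expected-value calculation. The function name, the specific values, and the linear reward-minus-risk form below are illustrative assumptions for this sketch, not the paper's model.

```python
def seek_action_value(reward_value, punishment_value, p_punish):
    """Net value of the risky 'seek' action: expected reward minus the
    probability-weighted cost of punishment."""
    return reward_value - p_punish * punishment_value

# As the learned punishment probability rises, the seek action loses value,
# so an agent that has learned p_punish selectively withholds seeking.
safe_value = seek_action_value(reward_value=1.0, punishment_value=2.0, p_punish=0.0)
risky_value = seek_action_value(reward_value=1.0, punishment_value=2.0, p_punish=0.5)
```

Under this sketch, raising `p_punish` from 0.0 to 0.5 drops the seek value from 1.0 to 0.0, mirroring how the rats selectively adapted seeking as a function of punishment probability.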
Collapse
Affiliation(s)
- David S Jacobs
- Behavioral and Systems Neuroscience Program and the Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, Oregon 97239
| | - Bita Moghaddam
- Behavioral and Systems Neuroscience Program and the Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, Oregon 97239
| |
Collapse
|
42
|
Seitz RJ, Angel HF. Belief formation - A driving force for brain evolution. Brain Cogn 2020; 140:105548. [PMID: 32062327 DOI: 10.1016/j.bandc.2020.105548] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2019] [Revised: 02/06/2020] [Accepted: 02/06/2020] [Indexed: 01/10/2023]
Abstract
The topic of belief has long been neglected in the natural sciences. Recent neuroscience research in non-human primates and humans, however, has shown that beliefs are the neuropsychic product of fundamental brain processes that attribute affective meaning to concrete objects and events, enabling individual goal setting, decision making, and maneuvering in the environment. With regard to the neural processes involved, beliefs can be categorized as empirical, relational, and conceptual. Empirical beliefs concern objects, whereas relational beliefs concern events, as in tool use and in interactions between subjects; both develop below the level of awareness and are updated dynamically. Conceptual beliefs are more complex, being based on narratives and participation in ritual acts. As neural processes are known to require computational space in the brain, the formation of increasingly complex beliefs demands extra neural resources. Here, we argue that the evolution of human beliefs is related to the phylogenetic enlargement of the brain, including the parietal and medial frontal cortex in humans.
Collapse
Affiliation(s)
- Rüdiger J Seitz
- Department of Neurology, Centre of Neurology and Neuropsychiatry, LVR-Klinikum Düsseldorf, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany; Florey Neuroscience Institutes, Melbourne, Australia.
| | - Hans-Ferdinand Angel
- Karl Franzens University Graz, Institute of Catechetic and Pedagogic of Religion, Graz, Austria
| |
Collapse
|
43
|
Anderson EC, Carleton RN, Diefenbach M, Han PKJ. The Relationship Between Uncertainty and Affect. Front Psychol 2019; 10:2504. [PMID: 31781003 PMCID: PMC6861361 DOI: 10.3389/fpsyg.2019.02504] [Citation(s) in RCA: 99] [Impact Index Per Article: 19.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2019] [Accepted: 10/22/2019] [Indexed: 11/23/2022] Open
Abstract
Uncertainty and affect are fundamental and interrelated aspects of the human condition. Uncertainty is often associated with negative affect, but in some circumstances, it is associated with positive affect. In this article, we review different explanations for the varying relationship between uncertainty and affect. We identify "mental simulation" as a key process that links uncertainty to affective states. We suggest that people have a propensity to simulate negative outcomes, which results in a tendency toward negative affective responses to uncertainty. We also propose the existence of several important moderators of this process, including context and individual differences such as uncertainty tolerance, as well as emotion regulation strategies. Finally, we highlight important knowledge gaps and promising areas for future research, both empirical and conceptual, to further elucidate the relationship between uncertainty and affect.
Collapse
Affiliation(s)
- Eric C. Anderson
- Center for Outcomes Research and Evaluation, Maine Medical Center Research Institute, Portland, ME, United States
- Department of Medicine, Tufts University Medical Center, Boston, MA, United States
| | | | - Michael Diefenbach
- Departments of Medicine, Urology, and Psychiatry, Northwell Health, New York, NY, United States
| | - Paul K. J. Han
- Center for Outcomes Research and Evaluation, Maine Medical Center Research Institute, Portland, ME, United States
- Department of Medicine, Tufts University Medical Center, Boston, MA, United States
| |
Collapse
|
44
|
Gershman SJ, Uchida N. Believing in dopamine. Nat Rev Neurosci 2019.
Abstract
Midbrain dopamine signals are widely thought to report reward prediction errors that drive learning in the basal ganglia. However, dopamine has also been implicated in various probabilistic computations, such as encoding uncertainty and controlling exploration. Here, we show how these different facets of dopamine signalling can be brought together under a common reinforcement learning framework. The key idea is that multiple sources of uncertainty impinge on reinforcement learning computations: uncertainty about the state of the environment, the parameters of the value function and the optimal action policy. Each of these sources plays a distinct role in the prefrontal cortex-basal ganglia circuit for reinforcement learning and is ultimately reflected in dopamine activity. The view that dopamine plays a central role in the encoding and updating of beliefs brings the classical prediction error theory into alignment with more recent theories of Bayesian reinforcement learning.
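The review's central idea, that prediction errors are computed over beliefs about the hidden state rather than a single observed state, can be sketched as a temporal-difference error whose value terms are expectations under the agent's posterior over states. The two-state setup, the values, the 0.95 discount, and the function name are illustrative assumptions for this sketch, not quantities from the paper.

```python
import numpy as np

def belief_state_rpe(values, belief, reward, next_values, next_belief, gamma=0.95):
    """Temporal-difference prediction error over belief states: value is an
    expectation under the agent's posterior over hidden states, so state
    uncertainty is reflected directly in the dopamine-like error signal."""
    v = np.dot(belief, values)              # E[V(s)] under the current belief
    v_next = np.dot(next_belief, next_values)
    return reward + gamma * v_next - v      # the error that drives learning

# Two hidden states worth 1.0 and 0.0; the agent is only 70% sure it is in
# the rewarding state, so a delivered reward still produces a sizable error.
values = np.array([1.0, 0.0])
delta = belief_state_rpe(values,
                         belief=np.array([0.7, 0.3]),
                         reward=1.0,
                         next_values=values,
                         next_belief=np.array([0.5, 0.5]))
```

With a fully certain belief the same reward would be better predicted and the error smaller, which is one way state uncertainty is "ultimately reflected in dopamine activity" on this account.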
Collapse
Affiliation(s)
- Samuel J Gershman
- Department of Psychology, Center for Brain Science, Harvard University, Cambridge, MA, USA.
| | - Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA
| |
Collapse
|
45
|
Schultz W. Recent advances in understanding the role of phasic dopamine activity. F1000Res 2019.
Abstract
The latest animal neurophysiology has revealed that the dopamine reward prediction error signal drives neuronal learning in addition to behavioral learning and reflects subjective reward representations beyond explicit contingency. The signal complies with formal economic concepts and functions in real-world consumer choice and social interaction. An early response component is influenced by physical impact, reward environment, and novelty but does not fully code prediction error. Some dopamine neurons are activated by aversive stimuli, which may reflect physical stimulus impact or true aversiveness, but they do not seem to code general negative value or aversive prediction error. The reward prediction error signal is complemented by distinct, heterogeneous, smaller and slower changes reflecting sensory and motor contributors to behavioral activation, such as substantial movement (as opposed to precise motor control), reward expectation, spatial choice, vigor, and motivation. The different dopamine signals seem to defy a simple unifying concept and should be distinguished to better understand phasic dopamine functions.
Collapse
Affiliation(s)
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY, UK
| |
Collapse
|
46
|
Morel C, Montgomery S, Han MH. Nicotine and alcohol: the role of midbrain dopaminergic neurons in drug reinforcement. Eur J Neurosci 2019; 50:2180-2200. [PMID: 30251377 PMCID: PMC6431587 DOI: 10.1111/ejn.14160] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2018] [Revised: 07/31/2018] [Accepted: 08/20/2018] [Indexed: 12/11/2022]
Abstract
Nicotine and alcohol addiction are leading causes of preventable death worldwide and continue to constitute a huge socio-economic burden. Both nicotine and alcohol perturb the brain's mesocorticolimbic system. Dopamine (DA) neurons projecting from the ventral tegmental area (VTA) to multiple downstream structures, including the nucleus accumbens, prefrontal cortex, and amygdala, are highly involved in the maintenance of healthy brain function. VTA DA neurons play a crucial role in associative learning and reinforcement. Nicotine and alcohol usurp these functions, promoting reinforcement of drug-taking behaviors. In this review, we will first describe how nicotine and alcohol individually affect VTA DA neurons by examining how drug exposure alters the heterogeneous VTA microcircuit and network-wide projections. We will also examine how coadministration or previous exposure to nicotine or alcohol may augment the reinforcing effects of the other. Additionally, this review briefly summarizes the role of VTA DA neurons in nicotine, alcohol, and their synergistic effects in reinforcement and also addresses the remaining questions related to the circuit-function specificity of the dopaminergic system in mediating nicotine/alcohol reinforcement and comorbidity.
Collapse
Affiliation(s)
- Carole Morel
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Sarah Montgomery
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ming-Hu Han
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
Collapse
|
47
|
Watabe-Uchida M, Uchida N. Multiple Dopamine Systems: Weal and Woe of Dopamine. COLD SPRING HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY 2019; 83:83-95. [PMID: 30787046 DOI: 10.1101/sqb.2018.83.037648] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Abstract
The ability to predict future outcomes increases the fitness of the animal. Decades of research have shown that dopamine neurons broadcast reward prediction error (RPE) signals-the discrepancy between actual and predicted reward-to drive learning to predict future outcomes. Recent studies have begun to show, however, that dopamine neurons are more diverse than previously thought. In this review, we will summarize a series of our studies that have shown unique properties of dopamine neurons projecting to the posterior "tail" of the striatum (TS) in terms of anatomy, activity, and function. Specifically, TS-projecting dopamine neurons are activated by a subset of negative events including threats from a novel object, send prediction errors for external threats, and reinforce avoidance behaviors. These results indicate that there are at least two axes of dopamine-mediated reinforcement learning in the brain-one learning from canonical RPEs and another learning from threat prediction errors. We argue that the existence of multiple learning systems is an adaptive strategy that allows each system to be optimized for its own needs. The compartmental organization in the mammalian striatum resembles that of a dopamine-recipient area in insects (mushroom body), pointing to a principle of dopamine function conserved across phyla.
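The "two axes" idea, one dopamine population learning from reward prediction errors and a TS-projecting population learning from threat prediction errors, can be sketched as two independent delta-rule learners driven by the same stream of events. The 0.2 learning rate and the binary event coding are illustrative assumptions, not values from the studies summarized.

```python
def delta_update(estimate, observed, alpha=0.2):
    """Rescorla-Wagner style update: move the prediction toward the outcome
    by a fraction alpha of the prediction error."""
    return estimate + alpha * (observed - estimate)

reward_pred, threat_pred = 0.0, 0.0

# Each trial carries a (reward, threat) pair; each axis learns only from
# its own prediction error, so the two systems can be tuned independently.
for reward, threat in [(1, 0), (1, 0), (0, 1)]:
    reward_pred = delta_update(reward_pred, reward)
    threat_pred = delta_update(threat_pred, threat)
```

After two rewarded trials and one threatening trial, the reward axis has built up a substantial prediction while the threat axis has only just begun learning, illustrating why separate learners can each be optimized for their own statistics (e.g. different learning rates for threats versus rewards).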
Collapse
Affiliation(s)
- Mitsuko Watabe-Uchida
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Naoshige Uchida
- Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
| |
Collapse
|
48
|
Moran R, Keramati M, Dayan P, Dolan RJ. Retrospective model-based inference guides model-free credit assignment. Nat Commun 2019; 10:750. [PMID: 30765718 PMCID: PMC6375980 DOI: 10.1038/s41467-019-08662-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2018] [Accepted: 01/17/2019] [Indexed: 11/09/2022] Open
Abstract
An extensive reinforcement learning literature shows that organisms assign credit efficiently, even under conditions of state uncertainty. However, little is known about credit assignment when state uncertainty is subsequently resolved. Here, we address this problem within the framework of an interaction between model-free (MF) and model-based (MB) control systems. We present and support experimentally a theory of MB retrospective inference. Within this framework, a MB system resolves uncertainty that prevailed when actions were taken, thus guiding MF credit assignment. Using a task in which there was initial uncertainty about the lotteries that were chosen, we found that when participants' momentary uncertainty about which lottery had generated an outcome was resolved by provision of subsequent information, participants preferentially assigned credit within a MF system to the lottery they retrospectively inferred was responsible for this outcome. These findings extend our knowledge about the range of MB functions and the scope of system interactions.
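The interaction described here, a model-based inference resolving which option produced an outcome so that model-free credit lands on the responsible option, can be sketched as a delta-rule update apportioned by a retrospective posterior. The function name, the 0.1 learning rate, and the two-lottery example are illustrative assumptions for this sketch, not the paper's fitted model.

```python
def assign_credit(q_values, outcome, posterior, alpha=0.1):
    """Apportion a model-free value update across options according to the
    retrospectively inferred probability that each option generated the
    outcome (the model-based posterior)."""
    return [q + alpha * p * (outcome - q) for q, p in zip(q_values, posterior)]

# While the generating lottery is ambiguous (50/50), credit is smeared
# across both options.
q_ambiguous = assign_credit([0.0, 0.0], outcome=1.0, posterior=[0.5, 0.5])

# Once model-based inference resolves that lottery 0 was responsible,
# the model-free update is assigned preferentially to it.
q_resolved = assign_credit([0.0, 0.0], outcome=1.0, posterior=[1.0, 0.0])
```

Comparing the two calls shows the behavioral signature the study reports: resolving uncertainty after the fact concentrates model-free credit on the retrospectively inferred cause rather than spreading it evenly.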
Collapse
Affiliation(s)
- Rani Moran
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London, WC1B 5EH, UK; Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK.
| | - Mehdi Keramati
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London, WC1B 5EH, UK; Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK; Department of Psychology, City, University of London, London, EC1R 0JD, UK
| | - Peter Dayan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London, WC1B 5EH, UK; Gatsby Computational Neuroscience Unit, University College London, London, W1T 4JG, UK; Max Planck Institute for Biological Cybernetics, Max-Planck-Ring 8, 72076 Tuebingen, Germany
| | - Raymond J Dolan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London, WC1B 5EH, UK; Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, UK
| |
Collapse
|
49
|
Seitz RJ, Paloutzian RF, Angel HF. Believing is representation mediated by the dopamine brain system. Eur J Neurosci 2018; 49:1212-1214. [PMID: 30586210 DOI: 10.1111/ejn.14317] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2018] [Revised: 11/15/2018] [Accepted: 12/18/2018] [Indexed: 12/12/2022]
Affiliation(s)
- Rüdiger J Seitz
- Medical Faculty, Heinrich-Heine-University Düsseldorf, LVR-Klinikum Düsseldorf, Düsseldorf, Germany
| | | | - Hans-Ferdinand Angel
- Institute of Catechetic and Pedagogic of Religion, Karl-Franzens University Graz, Graz, Austria
| |
Collapse
|
50
|
Zhang K, Chen CD, Monosov IE. Novelty, Salience, and Surprise Timing Are Signaled by Neurons in the Basal Forebrain. Curr Biol 2018; 29:134-142.e3. [PMID: 30581022 DOI: 10.1016/j.cub.2018.11.012] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2018] [Revised: 10/12/2018] [Accepted: 11/02/2018] [Indexed: 10/27/2022]
Abstract
The basal forebrain (BF) is a principal source of modulation of the neocortex [1-6] and is thought to regulate cognitive functions such as attention, motivation, and learning by broadcasting information about salience [2, 3, 5, 7-19]. However, events can be salient for multiple reasons-such as novelty, surprise, or reward prediction errors [20-24]-and to date, precisely which salience-related information the BF broadcasts is unclear. Here, we report that the primate BF contains at least two types of neurons that often process salient events in distinct manners: one with phasic burst responses to cues predicting salient events and one with ramping activity anticipating such events. Bursting neurons respond to cues that convey predictions about the magnitude, probability, and timing of primary reinforcements. They also burst to the reinforcement itself, particularly when it is unexpected. However, they do not have a selective response to reinforcement omission (the unexpected absence of an event). Thus, bursting neurons do not convey value-prediction errors but do signal surprise associated with external events. Indeed, they are not limited to processing primary reinforcement: they discriminate fully expected novel visual objects from familiar objects and respond to object-sequence violations. In contrast, ramping neurons predict the timing of many salient, novel, and surprising events. Their ramping activity is highly sensitive to the subjects' confidence in event timing and on average encodes the subjects' surprise after unexpected events occur. These data suggest that the primate BF contains mechanisms to anticipate the timing of a diverse set of important external events (via ramping activity) and to rapidly deploy cognitive resources when these events occur (via short latency bursting).
Collapse
Affiliation(s)
- Kaining Zhang
- Department of Neuroscience, Washington University in St. Louis, St. Louis, MO 63110; Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO 63110, USA
| | - Charles D Chen
- Department of Neuroscience, Washington University in St. Louis, St. Louis, MO 63110, USA
| | - Ilya E Monosov
- Department of Neuroscience, Washington University in St. Louis, St. Louis, MO 63110; Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO 63110, USA.
| |
Collapse
|