1. Sosis B, Rubin JE. Distinct dopaminergic spike-timing-dependent plasticity rules are suited to different functional roles. bioRxiv 2024:2024.06.24.600372. PMID: 38979377; PMCID: PMC11230239; DOI: 10.1101/2024.06.24.600372.
Abstract
Various mathematical models have been formulated to describe the changes in synaptic strengths resulting from spike-timing-dependent plasticity (STDP). A subset of these models include a third factor, dopamine, which interacts with spike timing to contribute to plasticity at specific synapses, notably those from cortex to striatum at the input layer of the basal ganglia. Theoretical work to analyze these plasticity models has largely focused on abstract issues, such as the conditions under which they may promote synchronization and the weight distributions induced by inputs with simple correlation structures, rather than on scenarios associated with specific tasks, and has generally not considered dopamine-dependent forms of STDP. In this paper we introduce three forms of dopamine-modulated STDP adapted from previously proposed plasticity rules. We then analyze, mathematically and with simulations, their performance in three biologically relevant scenarios. We test the ability of each of the three models to maintain its weights in the face of noise and to complete simple reward prediction and action selection tasks, studying the learned weight distributions and corresponding task performance in each setting. Interestingly, we find that each plasticity rule is well suited to a subset of the scenarios studied but falls short in others. Different tasks may therefore require different forms of synaptic plasticity, yielding the prediction that the precise form of the STDP mechanism present may vary across regions of the striatum, and other brain areas impacted by dopamine, that are involved in distinct computational functions.
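The paper specifies its three dopamine-modulated rules mathematically; as a generic illustration of the three-factor scheme such rules share, here is a minimal sketch in which pre/post spike pairings build an eligibility trace that dopamine converts into a weight change. All function names and parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def stdp_window(dt, a_plus=0.10, a_minus=0.12, tau=20.0):
    """Exponential STDP window; dt = t_post - t_pre in ms."""
    if dt > 0:
        return a_plus * np.exp(-dt / tau)
    return -a_minus * np.exp(dt / tau)

def run_da_stdp(events, dt_ms=1.0, tau_e=200.0, lr=0.5, w0=0.5):
    """events: sequence of (pair_dt, dopamine) per time step; pair_dt is the
    post-minus-pre spike timing in ms for a pairing at that step, or None.
    Pairings feed an eligibility trace e; dopamine gates e into the weight."""
    w, e = w0, 0.0
    decay = np.exp(-dt_ms / tau_e)
    for pair_dt, da in events:
        e *= decay                          # eligibility trace decays over time
        if pair_dt is not None:
            e += stdp_window(pair_dt)       # spike pairing adds to the trace
        w = float(np.clip(w + lr * da * e, 0.0, 1.0))  # dopamine gates learning
    return w
```

With dopamine present, causal (pre-before-post) pairings strengthen the synapse; with dopamine absent, the same pairings leave the weight unchanged.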
2. Banuelos C, Creswell K, Walsh C, Manuck SB, Gianaros PJ, Verstynen T. D2 dopamine receptor expression, reactivity to rewards, and reinforcement learning in a complex value-based decision-making task. Soc Cogn Affect Neurosci 2024;19:nsae050. PMID: 38988197; PMCID: PMC11281849; DOI: 10.1093/scan/nsae050.
Abstract
Different dopamine (DA) subtypes have opposing dynamics at postsynaptic receptors, with the ratio of D1 to D2 receptors determining the relative sensitivity to gains and losses, respectively, during value-based learning. This effective sensitivity to different reward feedback interacts with phasic DA levels to determine the effectiveness of learning, particularly in dynamic feedback situations where the frequency and magnitude of rewards need to be integrated over time to make optimal decisions. We modeled this effect in simulations of the underlying basal ganglia pathways and then tested the predictions in individuals with a variant of the human dopamine receptor D2 (DRD2; -141C Ins/Del and Del/Del) gene that associates with lower levels of D2 receptor expression (N = 119) and compared their performance in the Iowa Gambling Task to noncarrier controls (N = 319). Ventral striatal (VS) reactivity to rewards was measured in the Cards task with fMRI. DRD2 variant carriers made less effective decisions than noncarriers, but this effect was not moderated by VS reward reactivity as is hypothesized by our model. These results suggest that the interaction between DA receptor subtypes and reactivity to rewards during learning may be more complex than originally thought.
Affiliation(s)
- Cristina Banuelos
  - Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213, United States
  - Carnegie Mellon Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, United States
  - Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Kasey Creswell
  - Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Catherine Walsh
  - Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Stephen B Manuck
  - Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Peter J Gianaros
  - Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, United States
  - Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Timothy Verstynen
  - Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213, United States
  - Carnegie Mellon Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, United States
  - Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, United States
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, United States
3. Schütt HH, Kim D, Ma WJ. Reward prediction error neurons implement an efficient code for reward. Nat Neurosci 2024;27:1333-1339. PMID: 38898182; DOI: 10.1038/s41593-024-01671-x.
Abstract
We use efficient coding principles borrowed from sensory neuroscience to derive the optimal neural population to encode a reward distribution. We show that the responses of dopaminergic reward prediction error neurons in mouse and macaque are similar to those of the efficient code in the following ways: the neurons have a broad distribution of midpoints covering the reward distribution; neurons with higher thresholds have higher gains, more convex tuning functions and lower slopes; and their slope is higher when the reward distribution is narrower. Furthermore, we derive learning rules that converge to the efficient code. The learning rule for the position of the neuron on the reward axis closely resembles distributional reinforcement learning. Thus, reward prediction error neuron responses may be optimized to broadcast an efficient reward signal, forming a connection between efficient coding and reinforcement learning, two of the most successful theories in computational neuroscience.
Affiliation(s)
- Heiko H Schütt
  - Center for Neural Science and Department of Psychology, New York University, New York, NY, USA
  - Department of Behavioural and Cognitive Sciences, Université du Luxembourg, Esch-Belval, Luxembourg
- Dongjae Kim
  - Center for Neural Science and Department of Psychology, New York University, New York, NY, USA
  - Department of AI-Based Convergence, Dankook University, Yongin, Republic of Korea
- Wei Ji Ma
  - Center for Neural Science and Department of Psychology, New York University, New York, NY, USA
4. Augustat N, Endres D, Mueller EM. Uncertainty of treatment efficacy moderates placebo effects on reinforcement learning. Sci Rep 2024;14:14421. PMID: 38909105; PMCID: PMC11193823; DOI: 10.1038/s41598-024-64240-z.
Abstract
The placebo-reward hypothesis postulates that positive effects of treatment expectations on health (i.e., placebo effects) and reward processing share common neural underpinnings. Moreover, experiments in humans and animals indicate that reward uncertainty increases striatal dopamine, which is presumably involved in placebo responses and reward learning. Therefore, treatment uncertainty, analogously to reward uncertainty, may affect updating from rewards after placebo treatment. Here, we address whether different degrees of uncertainty regarding the efficacy of a sham treatment affect reward sensitivity. In an online between-subjects experiment with N = 141 participants, we systematically varied the provided efficacy instructions before participants first received a sham treatment that consisted of listening to binaural beats and then performed a probabilistic reinforcement learning task. We fitted a Q-learning model including two different learning rates for positive (gain) and negative (loss) reward prediction errors and an inverse gain parameter to behavioral decision data in the reinforcement learning task. Our results yielded an inverted-U relationship between provided treatment efficacy probability and the learning rate for gains, such that higher levels of treatment uncertainty, rather than of expected net efficacy, affect presumably dopamine-related reward learning. These findings support the placebo-reward hypothesis and suggest harnessing uncertainty in placebo treatment for recovering reward learning capabilities.
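The fitted model belongs to a standard family of asymmetric Q-learners; a minimal sketch of that family (separate learning rates for positive and negative prediction errors, softmax choice governed by an inverse temperature) is below. Names and parameter values are illustrative, not the authors' fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(q, beta):
    z = beta * (q - q.max())       # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def run_dual_lr_q(p_reward, alpha_gain, alpha_loss, beta, n_trials=500):
    """Asymmetric Q-learning: separate learning rates for positive (gain)
    and negative (loss) reward prediction errors; softmax action selection
    with inverse temperature beta."""
    q = np.zeros(len(p_reward))
    choices = []
    for _ in range(n_trials):
        a = rng.choice(len(q), p=softmax(q, beta))
        r = float(rng.random() < p_reward[a])          # Bernoulli reward
        rpe = r - q[a]                                 # reward prediction error
        q[a] += (alpha_gain if rpe > 0 else alpha_loss) * rpe
        choices.append(int(a))
    return q, choices
```

With distinct reward probabilities and a moderately high beta, the learner comes to prefer the richer option; fitting alpha_gain and alpha_loss separately is what lets such a model capture asymmetric updating from gains versus losses.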
Affiliation(s)
- Nick Augustat
  - Department of Psychology, University of Marburg, Marburg, Germany
- Dominik Endres
  - Department of Psychology, University of Marburg, Marburg, Germany
- Erik M Mueller
  - Department of Psychology, University of Marburg, Marburg, Germany
5. Giossi C, Bahuguna J, Rubin JE, Verstynen T, Vich C. Arkypallidal neurons in the external globus pallidus can mediate inhibitory control by altering competition in the striatum. bioRxiv 2024:2024.05.03.592321. PMID: 38746308; PMCID: PMC11092778; DOI: 10.1101/2024.05.03.592321.
Abstract
Reactive inhibitory control is crucial for survival. Traditionally, this control in mammals was attributed solely to the hyperdirect pathway, with cortical control signals flowing unidirectionally from the subthalamic nucleus (STN) to basal ganglia output regions. Yet recent findings have put this model into question, suggesting that the STN is assisted in stopping actions through ascending control signals to the striatum mediated by the external globus pallidus (GPe). Here we investigate this suggestion by harnessing a biologically-constrained spiking model of the corticobasal ganglia-thalamic (CBGT) circuit that includes pallidostriatal pathways originating from arkypallidal neurons. Through a series of experiments probing the interaction between three critical inhibitory nodes (the STN, arkypallidal cells, and indirect pathway spiny projection neurons), we find that the GPe acts as a critical mediator of both ascending and descending inhibitory signals in the CBGT circuit. In particular, pallidostriatal pathways regulate this process by weakening the direct pathway dominance of the evidence accumulation process driving decisions, which increases the relative suppressive influence of the indirect pathway on basal ganglia output. These findings delineate how pallidostriatal pathways can facilitate action cancellation by managing the bidirectional flow of information within CBGT circuits.
6. Du Y, Forrence AD, Metcalf DM, Haith AM. Action initiation and action inhibition follow the same time course when compared under matched experimental conditions. J Neurophysiol 2024;131:757-767. PMID: 38478894; DOI: 10.1152/jn.00434.2023.
Abstract
The ability to initiate an action quickly when needed and the ability to cancel an impending action are both fundamental to action control. It is often presumed that they are qualitatively distinct processes, yet they have largely been studied in isolation and little is known about how they relate to one another. Comparing previous experimental results shows a similar time course for response initiation and response inhibition. However, the exact time course varies widely depending on experimental conditions, including the frequency of different trial types and the urgency to respond. For example, in the stop-signal task, where both action initiation and action inhibition are involved and could be compared, action inhibition is typically found to be much faster. However, this apparent difference is likely due to there being much greater urgency to inhibit an action than to initiate one in order to avoid failing at the task. This asymmetry in urgency between action initiation and action inhibition makes it impossible to compare their relative time courses in a single task. Here, we demonstrate that when action initiation and action inhibition are measured separately under conditions that are matched as closely as possible, their speeds are not distinguishable and are positively correlated across participants. Our results raise the possibility that action initiation and action inhibition may not necessarily be qualitatively distinct processes but may instead reflect complementary outcomes of a single decision process determining whether or not to act. NEW & NOTEWORTHY The time courses of initiating an action and canceling an action have largely been studied in isolation, and little is known about their relationship. Here, we show that when measured under comparable conditions the speeds of action initiation and action inhibition are the same. This finding raises the possibility that these two functions may be more closely related than previously assumed, with potentially important implications for their underlying neural basis.
Affiliation(s)
- Yue Du
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland, United States
- Delaney M Metcalf
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland, United States
- Adrian M Haith
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland, United States
7. Wang Y, Lak A, Manohar SG, Bogacz R. Dopamine encoding of novelty facilitates efficient uncertainty-driven exploration. PLoS Comput Biol 2024;20:e1011516. PMID: 38626219; PMCID: PMC11051659; DOI: 10.1371/journal.pcbi.1011516.
Abstract
When facing an unfamiliar environment, animals need to explore to gain new knowledge about which actions provide reward, but also to put the newly acquired knowledge to use as quickly as possible. Optimal reinforcement learning strategies should therefore assess the uncertainties of these action-reward associations and utilise them to inform decision making. We propose a novel model whereby direct and indirect striatal pathways act together to estimate both the mean and variance of reward distributions, and mesolimbic dopaminergic neurons provide transient novelty signals, facilitating effective uncertainty-driven exploration. We utilised electrophysiological recording data to verify our model of the basal ganglia, and we fitted exploration strategies derived from the neural model to data from behavioural experiments. We also compared the performance of directed exploration strategies inspired by our basal ganglia model with other exploration algorithms, including classic variants of the upper confidence bound (UCB) strategy, in simulation. The exploration strategies inspired by the basal ganglia model achieve overall superior performance in simulation, and when fitting models to behavioural data we found qualitatively similar results for the neural model and for more idealised normative models with less implementation-level detail. Overall, our results suggest that transient dopamine levels in the basal ganglia that encode novelty could contribute to an uncertainty representation which efficiently drives exploration in reinforcement learning.
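As a point of reference for the classic upper confidence bound (UCB) strategies the authors compare against, here is a minimal UCB1 bandit sketch; the arm probabilities and constants are illustrative, not from the study.

```python
import math
import random

random.seed(1)

def ucb1(p_reward, n_trials=2000, c=2.0):
    """UCB1 bandit: pick the arm maximising the empirical mean plus an
    uncertainty bonus sqrt(c * log t / n_i) that shrinks with sampling."""
    k = len(p_reward)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, n_trials + 1):
        if t <= k:
            a = t - 1                                  # play each arm once first
        else:
            a = max(range(k),
                    key=lambda i: means[i] + math.sqrt(c * math.log(t) / counts[i]))
        r = 1.0 if random.random() < p_reward[a] else 0.0
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]         # incremental mean update
    return counts, means
```

The bonus term directs exploration toward rarely sampled (high-uncertainty) arms, the same role the abstract assigns to striatal variance estimates and dopaminergic novelty signals.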
Affiliation(s)
- Yuhao Wang
  - MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
- Armin Lak
  - Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Sanjay G. Manohar
  - Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Rafal Bogacz
  - MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
8. Houston AI, Rosenström TH. A critical review of risk-sensitive foraging. Biol Rev Camb Philos Soc 2024;99:478-495. PMID: 37987237; DOI: 10.1111/brv.13031.
Abstract
Foraging is risk sensitive if choices depend on the variability of returns from the options as well as their mean return. Risk-sensitive foraging is important in behavioural ecology, psychology and neurophysiology. It has been explained both in terms of mechanisms and in terms of evolutionary advantage. We provide a critical review, evaluating both mechanistic and evolutionary accounts. Some derivations of risk sensitivity from mechanistic models based on psychophysics are not convincing because they depend on an inappropriate use of Jensen's inequality. Attempts have been made to link risk sensitivity to the ecology of a species, but again these are not convincing. The field of risk-sensitive foraging has provided a focus for theoretical and empirical work and has yielded important insights, but we lack a simple and empirically defendable general account of it in either mechanistic or evolutionary terms. However, empirical analysis of choice sequences under theoretically motivated experimental designs and environmental settings appears a promising avenue for mapping the scope and relative merits of existing theories. Simply put, the devil is in the sequence.
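The Jensen's-inequality argument at issue can be stated concretely: for a concave utility function, an option with variable returns has lower expected utility than a fixed option with the same mean, so risk aversion falls out of concavity alone. A small numerical illustration (the square-root utility and payoff values are arbitrary choices, not from the review):

```python
import numpy as np

rng = np.random.default_rng(0)

u = np.sqrt   # a concave utility function (diminishing returns)

# Two options with the same mean payoff of 4: a fixed option, and a
# variable option paying 0 or 8 with equal probability.
fixed = np.full(100_000, 4.0)
variable = rng.choice([0.0, 8.0], size=100_000)

# Jensen's inequality for concave u: E[u(X)] <= u(E[X]), with equality
# only when X is constant. The variable option therefore has lower
# expected utility, i.e. risk aversion from concavity alone.
eu_fixed = u(fixed).mean()        # exactly sqrt(4) = 2.0
eu_variable = u(variable).mean()  # ~ (sqrt(0) + sqrt(8)) / 2 ≈ 1.414
```

Whether such psychophysical (utility-curvature) derivations are applied appropriately is precisely what the review scrutinises.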
Affiliation(s)
- Alasdair I Houston
  - School of Biological Sciences, University of Bristol, 24 Tyndall Avenue, Bristol, BS8 1TQ, UK
- Tom H Rosenström
  - Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, PL 21 (Haartmaninkatu 3), 00014, Helsinki, Finland
9. Jin F, Yang L, Yang L, Li J, Li M, Shang Z. Dynamics Learning Rate Bias in Pigeons: Insights from Reinforcement Learning and Neural Correlates. Animals (Basel) 2024;14:489. PMID: 38338131; PMCID: PMC10854969; DOI: 10.3390/ani14030489.
Abstract
Research in reinforcement learning indicates that animals respond differently to positive and negative reward prediction errors, which can be modeled by assuming a learning rate bias. Many studies have shown that humans and other animals exhibit learning rate biases during learning, but it is unclear whether and how the bias changes throughout the entire learning process. Here, we recorded behavioral data and local field potentials (LFPs) in the striatum of five pigeons performing a probabilistic learning task. Reinforcement learning models with and without learning rate biases were used to dynamically fit the pigeons' choice behavior and estimate the option values. Furthermore, the correlation between striatal LFP power and the model-estimated option values was explored. We found that the pigeons' learning rate bias shifted from negative to positive during the learning process, and that striatal gamma-band (31 to 80 Hz) power correlated with the option values modulated by the dynamic learning rate bias. In conclusion, our results support the hypothesis that pigeons employ a dynamic learning strategy in the learning process from both behavioral and neural aspects, providing valuable insights into the reinforcement learning mechanisms of non-human animals.
Affiliation(s)
- Fuli Jin
  - School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China
  - Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
- Lifang Yang
  - School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China
  - Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
- Long Yang
  - School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China
  - Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
- Jiajia Li
  - School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China
  - Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
- Mengmeng Li
  - School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China
  - Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
- Zhigang Shang
  - School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China
  - Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
  - Institute of Medical Engineering Technology and Data Mining, Zhengzhou University, Zhengzhou 450001, China
10. Lasaponara S, Scozia G, Lozito S, Pinto M, Conversi D, Costanzi M, Vriens T, Silvetti M, Doricchi F. Temperament and probabilistic predictive coding in visual-spatial attention. Cortex 2024;171:60-74. PMID: 37979232; DOI: 10.1016/j.cortex.2023.10.004.
Abstract
Cholinergic (ACh), noradrenergic (NE), and dopaminergic (DA) pathways play an important role in the regulation of spatial attention. The same neurotransmitters are also responsible for inter-individual differences in temperamental traits. Here we explored whether biologically defined temperamental traits determine differences in the ability to orient spatial attention as a function of the probabilistic association between cues and targets. To this aim, we administered the Structure of Temperament Questionnaire (STQ-77) to a sample of 151 participants who also performed a Posner task with central endogenous predictive (80% valid/20% invalid) or non-predictive cues (50% valid/50% invalid). We found that only participants with high scores in Plasticity and Intellectual Endurance showed a selective abatement of attentional costs with non-predictive cues. In addition, stepwise regression showed that costs in the non-predictive condition were negatively predicted by scores in Plasticity and positively predicted by scores in Probabilistic Thinking. These results show that stable temperamental characteristics play an important role in defining inter-individual differences in attentional behaviour, especially in the presence of different probabilistic organisations of the sensory environment. These findings emphasize the importance of considering temperamental and personality traits in social and professional environments where the ability to control one's attention is a crucial functional skill.
Affiliation(s)
- Stefano Lasaponara
  - Department of Psychology, "Sapienza" University of Rome, Italy
  - IRCCS Fondazione Santa Lucia, Rome, Italy
- Gabriele Scozia
  - Department of Psychology, "Sapienza" University of Rome, Italy
  - IRCCS Fondazione Santa Lucia, Rome, Italy
  - PhD Programme in Behavioural Neuroscience, "Sapienza" University of Rome, Italy
- Silvana Lozito
  - Department of Psychology, "Sapienza" University of Rome, Italy
  - IRCCS Fondazione Santa Lucia, Rome, Italy
  - PhD Programme in Behavioural Neuroscience, "Sapienza" University of Rome, Italy
- Mario Pinto
  - Department of Psychology, "Sapienza" University of Rome, Italy
  - IRCCS Fondazione Santa Lucia, Rome, Italy
- David Conversi
  - Department of Psychology, "Sapienza" University of Rome, Italy
- Marco Costanzi
  - Department of Human Science, LUMSA University, Rome, Italy
- Tim Vriens
  - Computational and Translational Neuroscience Laboratory (CTNLab), Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy
- Massimo Silvetti
  - Computational and Translational Neuroscience Laboratory (CTNLab), Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy
- Fabrizio Doricchi
  - Department of Psychology, "Sapienza" University of Rome, Italy
  - IRCCS Fondazione Santa Lucia, Rome, Italy
11. Lowet AS, Zheng Q, Meng M, Matias S, Drugowitsch J, Uchida N. An opponent striatal circuit for distributional reinforcement learning. bioRxiv 2024:2024.01.02.573966. PMID: 38260354; PMCID: PMC10802299; DOI: 10.1101/2024.01.02.573966.
Abstract
Machine learning research has achieved large performance gains on a wide range of tasks by expanding the learning target from mean rewards to entire probability distributions of rewards - an approach known as distributional reinforcement learning (RL) [1]. The mesolimbic dopamine system is thought to underlie RL in the mammalian brain by updating a representation of mean value in the striatum [2,3], but little is known about whether, where, and how neurons in this circuit encode information about higher-order moments of reward distributions [4]. To fill this gap, we used high-density probes (Neuropixels) to acutely record striatal activity from well-trained, water-restricted mice performing a classical conditioning task in which reward mean, reward variance, and stimulus identity were independently manipulated. In contrast to traditional RL accounts, we found robust evidence for abstract encoding of variance in the striatum. Remarkably, chronic ablation of dopamine inputs disorganized these distributional representations in the striatum without interfering with mean value coding. Two-photon calcium imaging and optogenetics revealed that the two major classes of striatal medium spiny neurons - D1 and D2 MSNs - contributed to this code by preferentially encoding the right and left tails of the reward distribution, respectively. We synthesize these findings into a new model of the striatum and mesolimbic dopamine that harnesses the opponency between D1 and D2 MSNs [5-15] to reap the computational benefits of distributional RL.
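The tail-encoding idea invoked here can be illustrated with an expectile-style rule from the distributional RL literature: units that scale positive and negative prediction errors asymmetrically converge to different expectiles of the reward distribution, with high-tau ("optimistic") units tracking the right tail and low-tau ("pessimistic") units the left. This is a generic sketch with invented parameters, not the study's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_expectiles(samples, taus, lr=0.01):
    """Expectile-style distributional code: each unit scales positive
    prediction errors by tau and negative ones by (1 - tau), so each
    converges to a different expectile of the reward distribution."""
    v = np.zeros(len(taus))
    for r in samples:
        delta = r - v                                   # per-unit prediction error
        v += lr * np.where(delta > 0, taus, 1.0 - taus) * delta
    return v

rewards = rng.choice([1.0, 10.0], size=20_000)          # bimodal reward distribution
v = learn_expectiles(rewards, taus=np.array([0.1, 0.5, 0.9]))
# v increases with tau: pessimistic < mean-like < optimistic estimates
```

A population of such units jointly encodes the shape of the reward distribution, not just its mean, which is the representational claim the recordings are testing.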
Affiliation(s)
- Adam S Lowet
  - Center for Brain Science, Harvard University, Cambridge, MA, USA
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
  - Program in Neuroscience, Harvard University, Boston, MA, USA
- Qiao Zheng
  - Center for Brain Science, Harvard University, Cambridge, MA, USA
  - Department of Neurobiology, Harvard Medical School, Boston, MA, USA
- Melissa Meng
  - Center for Brain Science, Harvard University, Cambridge, MA, USA
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Sara Matias
  - Center for Brain Science, Harvard University, Cambridge, MA, USA
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Jan Drugowitsch
  - Center for Brain Science, Harvard University, Cambridge, MA, USA
  - Department of Neurobiology, Harvard Medical School, Boston, MA, USA
- Naoshige Uchida
  - Center for Brain Science, Harvard University, Cambridge, MA, USA
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
12. Bond K, Rasero J, Madan R, Bahuguna J, Rubin J, Verstynen T. Competing neural representations of choice shape evidence accumulation in humans. eLife 2023;12:e85223. PMID: 37818943; PMCID: PMC10624421; DOI: 10.7554/elife.85223.
Abstract
Making adaptive choices in dynamic environments requires flexible decision policies. Previously, we showed how shifts in outcome contingency change the evidence accumulation process that determines decision policies. Using in silico experiments to generate predictions, here we show how the cortico-basal ganglia-thalamic (CBGT) circuits can feasibly implement shifts in decision policies. When action contingencies change, dopaminergic plasticity redirects the balance of power, both within and between action representations, to divert the flow of evidence from one option to another. When competition between action representations is highest, the rate of evidence accumulation is the lowest. This prediction was validated in in vivo experiments on human participants, using fMRI, which showed that (1) evoked hemodynamic responses can reliably predict trial-wise choices and (2) competition between action representations, measured using a classifier model, tracked with changes in the rate of evidence accumulation. These results paint a holistic picture of how CBGT circuits manage and adapt the evidence accumulation process in mammals.
Affiliation(s)
- Krista Bond
  - Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
  - Center for the Neural Basis of Cognition, Pittsburgh, United States
  - Carnegie Mellon Neuroscience Institute, Pittsburgh, United States
- Javier Rasero
  - Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
- Raghav Madan
  - Department of Biomedical and Health Informatics, University of Washington, Seattle, United States
- Jyotika Bahuguna
  - Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
- Jonathan Rubin
  - Center for the Neural Basis of Cognition, Pittsburgh, United States
  - Department of Mathematics, University of Pittsburgh, Pittsburgh, United States
- Timothy Verstynen
  - Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
  - Center for the Neural Basis of Cognition, Pittsburgh, United States
  - Carnegie Mellon Neuroscience Institute, Pittsburgh, United States
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, United States
13. Blackwell KT, Doya K. Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks. PLoS Comput Biol 2023;19:e1011385. PMID: 37594982; PMCID: PMC10479916; DOI: 10.1371/journal.pcbi.1011385.
Abstract
A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which corresponds more closely to the basal ganglia by using two Q matrices, one representing direct-pathway neurons (G) and the other representing indirect-pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is that both the G and N matrices are updated using the temporal-difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and differences are then resolved using a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks, including extinction, renewal, and discrimination; switching reward-probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to those of rodents in choice and sequence-learning tasks, and that use of the temporal-difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence-learning task is dramatically improved with two matrices.
These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight as to the role of direct- and indirect-pathway striatal neurons.
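The two-matrix idea described in this abstract can be sketched in a few lines. This is an illustrative reconstruction, not the authors' TD2Q code: the net preference `Q_G - Q_N`, the single softmax (TD2Q uses separate selections for G and N followed by a second resolution step), and all parameter values are simplifying assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
alpha, gamma, beta = 0.1, 0.9, 2.0   # illustrative parameter values

# Two value matrices: G (direct pathway) and N (indirect pathway).
Q_G = np.zeros((n_states, n_actions))
Q_N = np.zeros((n_states, n_actions))

def preference(s):
    # Net action preference: G promotes an action, N opposes it.
    return Q_G[s] - Q_N[s]

def select_action(s):
    # Single softmax over the net preference (simplified relative to
    # TD2Q's two-step selection over separate G and N probabilities).
    p = np.exp(beta * preference(s))
    p /= p.sum()
    return rng.choice(n_actions, p=p)

def update(s, a, r, s_next):
    # One shared temporal-difference RPE drives both matrices with
    # opposite signs, mimicking dopamine's dichotomous effect.
    delta = r + gamma * preference(s_next).max() - preference(s)[a]
    Q_G[s, a] += alpha * delta   # direct pathway strengthened by positive RPE
    Q_N[s, a] -= alpha * delta   # indirect pathway strengthened by negative RPE
```

Running this on a one-state bandit where only action 0 pays off drives the net preference toward the rewarded action, with G and N carrying opposite-signed traces of the same RPE history.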
Affiliation(s)
- Kim T Blackwell
- Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia, United States of America
- Kenji Doya
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan

14
Mikus N, Eisenegger C, Mathys C, Clark L, Müller U, Robbins TW, Lamm C, Naef M. Blocking D2/D3 dopamine receptors in male participants increases volatility of beliefs when learning to trust others. Nat Commun 2023; 14:4049. [PMID: 37422466 PMCID: PMC10329681 DOI: 10.1038/s41467-023-39823-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Accepted: 06/29/2023] [Indexed: 07/10/2023] Open
Abstract
The ability to learn about other people is crucial for human social functioning. Dopamine has been proposed to regulate the precision of beliefs, but direct behavioural evidence for this is lacking. In this study, we investigate how a high dose of the D2/D3 dopamine receptor antagonist sulpiride impacts learning about other people's prosocial attitudes in a repeated Trust game. Using a Bayesian model of belief updating, we show that, in a sample of 76 male participants, sulpiride increases the volatility of beliefs, which leads to higher precision weights on prediction errors. This effect is driven by participants with genetically conferred higher dopamine availability (Taq1a polymorphism) and remains even after controlling for working memory performance. Higher precision weights are reflected in higher reciprocal behaviour in the repeated Trust game but not in single-round Trust games. Our data provide evidence that D2 receptors are pivotal in regulating prediction error-driven belief updating in a social context.
Affiliation(s)
- Nace Mikus
- Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
- Interacting Minds Centre, Aarhus University, Aarhus, Denmark
- Christoph Eisenegger
- Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
- Behavioural and Clinical Neuroscience Institute and Department of Psychology, University of Cambridge, Cambridge, UK
- Christoph Mathys
- Interacting Minds Centre, Aarhus University, Aarhus, Denmark
- Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich and ETH Zurich, Zurich, Switzerland
- Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
- Luke Clark
- Centre for Gambling Research at UBC, Department of Psychology, University of British Columbia, Vancouver, BC, Canada
- Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, BC, Canada
- Ulrich Müller
- Behavioural and Clinical Neuroscience Institute and Department of Psychology, University of Cambridge, Cambridge, UK
- Adult Neurodevelopmental Services, Health & Community Services, Government of Jersey, St Helier, Jersey
- Trevor W Robbins
- Behavioural and Clinical Neuroscience Institute and Department of Psychology, University of Cambridge, Cambridge, UK
- Claus Lamm
- Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
- Michael Naef
- Department of Economics, University of Durham, Durham, UK

15
Sato R, Shimomura K, Morita K. Opponent learning with different representations in the cortico-basal ganglia pathways can develop obsession-compulsion cycle. PLoS Comput Biol 2023; 19:e1011206. [PMID: 37319256 PMCID: PMC10306209 DOI: 10.1371/journal.pcbi.1011206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2023] [Accepted: 05/23/2023] [Indexed: 06/17/2023] Open
Abstract
Obsessive-compulsive disorder (OCD) has been suggested to be associated with impairment of model-based behavioral control. Meanwhile, recent work suggested a shorter memory trace for negative than for positive prediction errors (PEs) in OCD. We explored the relation between these two suggestions through computational modeling. Based on the properties of cortico-basal ganglia pathways, we modeled the human as an agent with a combination of a successor representation (SR)-based system that enables model-based-like control and an individual representation (IR)-based system that hosts only model-free control, with the two systems potentially learning from positive and negative PEs at different rates. We simulated the agent's behavior in the environmental model used in the recent work that describes potential development of the obsession-compulsion cycle. We found that the dual-system agent could develop an enhanced obsession-compulsion cycle, similarly to the agent with memory trace imbalance in the recent work, if the SR- and IR-based systems learned mainly from positive and negative PEs, respectively. We then simulated the behavior of such an opponent SR+IR agent in the two-stage decision task, in comparison with an agent having only SR-based control. Fitting the agents' behavior with the model weighing model-based and model-free control developed in the original two-stage task study yielded smaller weights of model-based control for the opponent SR+IR agent than for the SR-only agent. These results reconcile the previous suggestions about OCD, i.e., impaired model-based control and memory trace imbalance, raising the novel possibility that opponent learning in model(SR)-based and model-free controllers underlies obsession-compulsion.
Our model cannot explain the behavior of OCD patients in punishment, rather than reward, contexts, but this limitation could be resolved if opponent SR+IR learning also operates in the recently revealed non-canonical cortico-basal ganglia-dopamine circuit for threat/aversiveness, rather than reward, reinforcement learning; an aversive-SR + appetitive-IR agent could indeed develop obsession-compulsion if the environment is modeled differently.
Affiliation(s)
- Reo Sato
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Kanji Shimomura
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo, Japan

16
van Swieten MMH, Bogacz R, Manohar SG. Gambling on an empty stomach: Hunger modulates preferences for learned but not described risks. Brain Behav 2023; 13:e2978. [PMID: 37016956 PMCID: PMC10176009 DOI: 10.1002/brb3.2978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/22/2022] [Revised: 03/10/2023] [Accepted: 03/14/2023] [Indexed: 04/06/2023] Open
Abstract
INTRODUCTION We assess risks differently when they are explicitly described, compared to when we learn directly from experience, suggesting dissociable decision-making systems. Our needs, such as hunger, could globally affect our risk preferences, but do they affect described and learned risks equally? On one hand, decision-making from descriptions is often considered flexible and context sensitive, and might therefore be modulated by metabolic needs. On the other hand, preferences learned through reinforcement might be more strongly coupled to biological drives. METHOD Thirty-two healthy participants (females: 20, mean age: 25.6 ± 6.5 years) with a normal weight (Body Mass Index: 22.9 ± 3.2 kg/m²) were tested in a within-subjects, counterbalanced, randomized crossover design for the effects of hunger on two separate risk-taking tasks. We asked participants to choose between two options with different risks to obtain monetary outcomes. In one task, the outcome probabilities were described numerically, whereas in a second task, they were learned. RESULT In agreement with previous studies, we found that rewarding contexts induced risk-aversion when risks were explicitly described (F(1,31) = 55.01, p < .0001, ηp² = .64), but risk-seeking when they were learned through experience (F(1,31) = 10.28, p < .003, ηp² = .25). Crucially, hunger attenuated these contextual biases, but only for learned risks (F(1,31) = 8.38, p < .007, ηp² = .21). CONCLUSION The results suggest that our metabolic state determines risk-taking biases when we lack explicit descriptions.
Affiliation(s)
- Rafal Bogacz
- Nuffield Department of Clinical Neuroscience, University of Oxford, Oxford, UK
- Sanjay G. Manohar
- Nuffield Department of Clinical Neuroscience, University of Oxford, Oxford, UK

17
Tangmose K, Rostrup E, Bojesen KB, Sigvard A, Jessen K, Johansen LB, Glenthøj BY, Nielsen MØ. Reward disturbances in antipsychotic-naïve patients with first-episode psychosis and their association to glutamate levels. Psychol Med 2023; 53:1629-1638. [PMID: 37010221 DOI: 10.1017/s0033291721003305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
BACKGROUND Aberrant anticipation of motivationally salient events and altered processing of outcome evaluation in striatal and prefrontal regions have been suggested to underlie psychosis. Altered glutamate levels have likewise been linked to schizophrenia. Glutamatergic abnormalities may affect the processing of motivational salience and outcome evaluation. It remains unresolved whether glutamatergic dysfunction is associated with the coding of motivational salience and outcome evaluation in antipsychotic-naïve patients with first-episode psychosis. METHODS Fifty-one antipsychotic-naïve patients with first-episode psychosis (22 ± 5.2 years, female/male: 31/20) and 52 healthy controls (HC) matched on age, sex, and parental education underwent functional magnetic resonance imaging and magnetic resonance spectroscopy (3T) in one session. Brain responses to motivational salience and negative outcome evaluation (NOE) were examined using a monetary incentive delay task. Glutamate levels were estimated in the left thalamus and anterior cingulate cortex using LCModel. RESULTS Patients displayed a positive signal change to NOE in the caudate (p = 0.001) and dorsolateral prefrontal cortex (DLPFC; p = 0.003) compared to HC. No group difference was observed in motivational salience or in glutamate levels. The association between the NOE signal in the caudate and DLPFC and thalamic glutamate levels differed between patients and HC, owing to a negative correlation in patients (caudate: p = 0.004; DLPFC: p = 0.005) that was not seen in HC. CONCLUSIONS Our findings confirm prior reports of abnormal outcome evaluation as part of the pathophysiology of schizophrenia. The results also suggest a possible link between thalamic glutamate and NOE signaling in patients with first-episode psychosis.
Affiliation(s)
- Karen Tangmose
- Center for Neuropsychiatric Schizophrenia Research (CNSR) and Center for Clinical Intervention and Neuropsychiatric Schizophrenia Research (CINS), Mental Health Center Glostrup, Glostrup, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Egill Rostrup
- Center for Neuropsychiatric Schizophrenia Research (CNSR) and Center for Clinical Intervention and Neuropsychiatric Schizophrenia Research (CINS), Mental Health Center Glostrup, Glostrup, Denmark
- Functional Imaging Unit, Department of Clinical Physiology, Nuclear Medicine and PET, Rigshospitalet Glostrup, University of Copenhagen, Glostrup, Denmark
- Kirsten B Bojesen
- Center for Neuropsychiatric Schizophrenia Research (CNSR) and Center for Clinical Intervention and Neuropsychiatric Schizophrenia Research (CINS), Mental Health Center Glostrup, Glostrup, Denmark
- Anne Sigvard
- Center for Neuropsychiatric Schizophrenia Research (CNSR) and Center for Clinical Intervention and Neuropsychiatric Schizophrenia Research (CINS), Mental Health Center Glostrup, Glostrup, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Kasper Jessen
- Center for Neuropsychiatric Schizophrenia Research (CNSR) and Center for Clinical Intervention and Neuropsychiatric Schizophrenia Research (CINS), Mental Health Center Glostrup, Glostrup, Denmark
- Louise Baruël Johansen
- Center for Neuropsychiatric Schizophrenia Research (CNSR) and Center for Clinical Intervention and Neuropsychiatric Schizophrenia Research (CINS), Mental Health Center Glostrup, Glostrup, Denmark
- Birte Y Glenthøj
- Center for Neuropsychiatric Schizophrenia Research (CNSR) and Center for Clinical Intervention and Neuropsychiatric Schizophrenia Research (CINS), Mental Health Center Glostrup, Glostrup, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Mette Ødegaard Nielsen
- Center for Neuropsychiatric Schizophrenia Research (CNSR) and Center for Clinical Intervention and Neuropsychiatric Schizophrenia Research (CINS), Mental Health Center Glostrup, Glostrup, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark

18
Weiss AR, Korzeniewska A, Chrabaszcz A, Bush A, Fiez JA, Crone NE, Richardson RM. Lexicality-Modulated Influence of Auditory Cortex on Subthalamic Nucleus During Motor Planning for Speech. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2023; 4:53-80. [PMID: 37229140 PMCID: PMC10205077 DOI: 10.1162/nol_a_00086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Accepted: 10/18/2022] [Indexed: 05/27/2023]
Abstract
Speech requires successful information transfer within cortical-basal ganglia loop circuits to produce the desired acoustic output. For this reason, up to 90% of Parkinson's disease patients experience impairments of speech articulation. Deep brain stimulation (DBS) is highly effective in controlling the symptoms of Parkinson's disease, sometimes alongside speech improvement, but subthalamic nucleus (STN) DBS can also lead to decreases in semantic and phonological fluency. This paradox demands better understanding of the interactions between the cortical speech network and the STN, which can be investigated with intracranial EEG recordings collected during DBS implantation surgery. We analyzed the propagation of high-gamma activity between STN, superior temporal gyrus (STG), and ventral sensorimotor cortices during reading aloud via event-related causality, a method that estimates strengths and directionalities of neural activity propagation. We employed a newly developed bivariate smoothing model based on a two-dimensional moving average, which is optimal for reducing random noise while retaining a sharp step response, to ensure precise embedding of statistical significance in the time-frequency space. Sustained and reciprocal neural interactions between STN and ventral sensorimotor cortex were observed. Moreover, high-gamma activity propagated from the STG to the STN prior to speech onset. The strength of this influence was affected by the lexical status of the utterance, with increased activity propagation during word versus pseudoword reading. These unique data suggest a potential role for the STN in the feedforward control of speech.
Affiliation(s)
- Alexander R. Weiss
- JHU Cognitive Neurophysiology and BMI Lab, Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Anna Korzeniewska
- JHU Cognitive Neurophysiology and BMI Lab, Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Anna Chrabaszcz
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
- Alan Bush
- Brain Modulation Lab, Department of Neurosurgery, Massachusetts General Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Julie A. Fiez
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA
- University of Pittsburgh Brain Institute, Pittsburgh, PA, USA
- Nathan E. Crone
- JHU Cognitive Neurophysiology and BMI Lab, Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Robert M. Richardson
- Brain Modulation Lab, Department of Neurosurgery, Massachusetts General Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA

19
Morita K, Shimomura K, Kawaguchi Y. Opponent Learning with Different Representations in the Cortico-Basal Ganglia Circuits. eNeuro 2023; 10:ENEURO.0422-22.2023. [PMID: 36653187 PMCID: PMC9884109 DOI: 10.1523/eneuro.0422-22.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2022] [Revised: 12/06/2022] [Accepted: 01/03/2023] [Indexed: 01/20/2023] Open
Abstract
The direct and indirect pathways of the basal ganglia (BG) have been suggested to learn mainly from positive and negative feedback, respectively. Since these pathways unevenly receive inputs from different cortical neuron types and/or regions, they may preferentially use different state/action representations. We explored whether such a combined use of different representations, coupled with different learning rates from positive and negative reward prediction errors (RPEs), has computational benefits. We modeled an animal as an agent equipped with two learning systems, each adopting either an individual representation (IR) or a successor representation (SR) of states. Varying the combination of IR or SR and the learning rates from positive and negative RPEs in each system, we examined how the agent performed in a dynamic reward navigation task. We found that a combination of an SR-based system learning mainly from positive RPEs and an IR-based system learning mainly from negative RPEs achieved good performance in the task, as compared with other combinations. In such a combination of appetitive SR-based and aversive IR-based systems, both systems show activities of comparable magnitudes with opposite signs, consistent with the suggested profiles of the two BG pathways. Moreover, the architecture of such a combination provides a novel coherent explanation for the functional significance and underlying mechanisms of diverse findings about the cortico-BG circuits. These results suggest that combining different representations with appetitive and aversive learning can be an effective learning strategy in certain dynamic environments, and that it might actually be implemented in the cortico-BG circuits.
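The core mechanism in this abstract — two value-learning systems that share one TD error but differ in state representation and in learning-rate asymmetry — can be sketched minimally. This is an illustrative sketch, not the paper's agent (which also includes action selection and the navigation task); the additive combination of the two systems, all parameter values, and the two-state test environment are assumptions of the sketch.

```python
import numpy as np

n_states = 2
gamma = 0.5
alpha_pos, alpha_neg = 0.3, 0.05   # SR system: learns mainly from positive RPEs
beta_pos, beta_neg = 0.05, 0.3     # IR system: learns mainly from negative RPEs

M = np.eye(n_states)        # successor representation (SR) feature matrix
w_sr = np.zeros(n_states)   # reward weights of the SR-based system
v_ir = np.zeros(n_states)   # state values of the IR-based system

def value(s):
    # Total value: sum of the SR-based and IR-based systems' estimates.
    return M[s] @ w_sr + v_ir[s]

def update(s, r, s_next):
    global w_sr
    delta = r + gamma * value(s_next) - value(s)   # shared TD RPE
    lr_sr = alpha_pos if delta > 0 else alpha_neg
    lr_ir = beta_pos if delta > 0 else beta_neg
    w_sr = w_sr + lr_sr * delta * M[s]  # SR: credit spread over predicted successors
    v_ir[s] += lr_ir * delta            # IR: credit assigned only to current state
    # The SR matrix itself is learned by a TD rule on state occupancy.
    M[s] += 0.1 * (np.eye(n_states)[s] + gamma * M[s_next] - M[s])
```

The asymmetric learning rates make the SR system predominantly appetitive and the IR system predominantly aversive, while both read out into a single value estimate, mirroring the opposite-signed, comparable-magnitude activities described above.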
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo 113-0033, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo 113-0033, Japan
- Kanji Shimomura
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo 113-0033, Japan
- Department of Behavioral Medicine, National Institute of Mental Health, National Center of Neurology and Psychiatry, Kodaira 187-8551, Japan
- Yasuo Kawaguchi
- Brain Science Institute, Tamagawa University, Machida 194-8610, Japan
- National Institute for Physiological Sciences (NIPS), Okazaki 444-8787, Japan

20
Liebenow B, Jones R, DiMarco E, Trattner JD, Humphries J, Sands LP, Spry KP, Johnson CK, Farkas EB, Jiang A, Kishida KT. Computational reinforcement learning, reward (and punishment), and dopamine in psychiatric disorders. Front Psychiatry 2022; 13:886297. [PMID: 36339844 PMCID: PMC9630918 DOI: 10.3389/fpsyt.2022.886297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/28/2022] [Accepted: 09/23/2022] [Indexed: 11/13/2022] Open
Abstract
In the DSM-5, psychiatric diagnoses are made based on self-reported symptoms and clinician-identified signs. Though helpful in choosing potential interventions based on the available regimens, this conceptualization of psychiatric diseases can limit basic science investigation into their underlying causes. The reward prediction error (RPE) hypothesis of dopamine neuron function posits that phasic dopamine signals encode the difference between the rewards a person expects and experiences. The computational framework from which this hypothesis was derived, temporal difference reinforcement learning (TDRL), is largely focused on reward processing rather than punishment learning. Many psychiatric disorders are characterized by aberrant behaviors, expectations, reward processing, and hypothesized dopaminergic signaling, but also characterized by suffering and the inability to change one's behavior despite negative consequences. In this review, we provide an overview of the RPE theory of phasic dopamine neuron activity and review the gains that have been made through the use of computational reinforcement learning theory as a framework for understanding changes in reward processing. The relative dearth of explicit accounts of punishment learning in computational reinforcement learning theory and its application in neuroscience is highlighted as a significant gap in current computational psychiatric research. Four disorders comprise the main focus of this review: two disorders of traditionally hypothesized hyperdopaminergic function, addiction and schizophrenia, followed by two disorders of traditionally hypothesized hypodopaminergic function, depression and post-traumatic stress disorder (PTSD). Insights gained from a reward processing based reinforcement learning framework about underlying dopaminergic mechanisms and the role of punishment learning (when available) are explored in each disorder. 
Concluding remarks focus on the future directions required to characterize neuropsychiatric disorders whose hypothesized causes lie in underlying dopaminergic transmission.
Affiliation(s)
- Brittany Liebenow
- Neuroscience Graduate Program, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Department of Physiology and Pharmacology, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Rachel Jones
- Neuroscience Graduate Program, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Department of Physiology and Pharmacology, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Emily DiMarco
- Neuroscience Graduate Program, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Department of Physiology and Pharmacology, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Jonathan D. Trattner
- Neuroscience Graduate Program, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Department of Physiology and Pharmacology, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Joseph Humphries
- Neuroscience Graduate Program, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Department of Physiology and Pharmacology, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- L. Paul Sands
- Neuroscience Graduate Program, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Department of Physiology and Pharmacology, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Kasey P. Spry
- Neuroscience Graduate Program, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Department of Physiology and Pharmacology, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Christina K. Johnson
- Department of Physiology and Pharmacology, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Evelyn B. Farkas
- Georgia State University Undergraduate Neuroscience Institute, Atlanta, GA, United States
- Angela Jiang
- Department of Physiology and Pharmacology, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Kenneth T. Kishida
- Neuroscience Graduate Program, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Department of Physiology and Pharmacology, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Department of Neurosurgery, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Department of Biomedical Engineering, Wake Forest University School of Medicine, Winston-Salem, NC, United States

21
Identifying control ensembles for information processing within the cortico-basal ganglia-thalamic circuit. PLoS Comput Biol 2022; 18:e1010255. [PMID: 35737720 PMCID: PMC9258830 DOI: 10.1371/journal.pcbi.1010255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2022] [Revised: 07/06/2022] [Accepted: 05/27/2022] [Indexed: 11/20/2022] Open
Abstract
In situations featuring uncertainty about action-reward contingencies, mammals can flexibly adopt strategies for decision-making that are tuned in response to environmental changes. Although the cortico-basal ganglia thalamic (CBGT) network has been identified as contributing to the decision-making process, it features a complex synaptic architecture, comprised of multiple feed-forward, reciprocal, and feedback pathways, that complicate efforts to elucidate the roles of specific CBGT populations in the process by which evidence is accumulated and influences behavior. In this paper we apply a strategic sampling approach, based on Latin hypercube sampling, to explore how variations in CBGT network properties, including subpopulation firing rates and synaptic weights, map to variability of parameters in a normative drift diffusion model (DDM), representing algorithmic aspects of information processing during decision-making. Through the application of canonical correlation analysis, we find that this relationship can be characterized in terms of three low-dimensional control ensembles within the CBGT network that impact specific qualities of the emergent decision policy: responsiveness (a measure of how quickly evidence evaluation gets underway, associated with overall activity in corticothalamic and direct pathways), pliancy (a measure of the standard of evidence needed to commit to a decision, associated largely with overall activity in components of the indirect pathway of the basal ganglia), and choice (a measure of commitment toward one available option, associated with differences in direct and indirect pathways across action channels). These analyses provide mechanistic predictions about the roles of specific CBGT network elements in tuning the way that information is accumulated and translated into decision-related behavior.
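The mapping from control ensembles to decision-policy qualities can be made concrete with a toy drift-diffusion simulation. The function below is a generic Euler-Maruyama DDM sketch; pairing `onset`, `boundary`, and `drift` with responsiveness, pliancy, and choice follows this abstract's description, but the specific parameterization is an assumption of the sketch, not the paper's fitted model.

```python
import numpy as np

rng = np.random.default_rng(7)

def ddm_trial(drift, boundary, onset, dt=0.001, sigma=1.0, t_max=10.0):
    """Simulate one two-choice drift-diffusion trial.

    drift    -- bias of evidence toward option 1 ('choice')
    boundary -- evidence required to commit ('pliancy')
    onset    -- delay before accumulation begins ('responsiveness')
    """
    x, t = 0.0, onset
    while abs(x) < boundary and t < t_max:
        # Euler-Maruyama step of the diffusion process.
        x += drift * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
    return (1 if x > 0 else 0), t

def summarize(drift, boundary, onset, n=200):
    # Mean choice probability and mean response time over n trials.
    choices, rts = zip(*(ddm_trial(drift, boundary, onset) for _ in range(n)))
    return np.mean(choices), np.mean(rts)
```

Raising `boundary` (pliancy) slows commitment, raising `onset` delays it, and raising `drift` biases which option wins, which is the qualitative behavior the three control ensembles are proposed to tune.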
22
Möller M, Manohar S, Bogacz R. Uncertainty-guided learning with scaled prediction errors in the basal ganglia. PLoS Comput Biol 2022; 18:e1009816. [PMID: 35622863 PMCID: PMC9182698 DOI: 10.1371/journal.pcbi.1009816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2022] [Revised: 06/09/2022] [Accepted: 05/05/2022] [Indexed: 11/19/2022] Open
Abstract
To accurately predict rewards associated with states or actions, the variability of observations has to be taken into account. In particular, when the observations are noisy, individual rewards should have less influence on the tracking of the average reward, and the estimate of the mean reward should be updated to a smaller extent after each observation. However, it is not known how the magnitude of observation noise might be tracked and used to control prediction updates in the brain's reward system. Here, we introduce a new model that uses simple, tractable learning rules to track the mean and standard deviation of reward, and leverages prediction errors scaled by uncertainty as the central feedback signal. We show that the new model has an advantage over conventional reinforcement learning models in a value-tracking task and approaches the theoretical limit of performance provided by the Kalman filter. Further, we propose a possible biological implementation of the model in the basal ganglia circuit. In the proposed network, dopaminergic neurons encode reward prediction errors scaled by the standard deviation of rewards. We show that such scaling may arise if striatal neurons learn the standard deviation of rewards and modulate the activity of dopaminergic neurons. The model is consistent with experimental findings concerning the scaling of dopamine prediction errors relative to reward magnitude, and with many features of striatal plasticity. Our results span the levels of implementation, algorithm, and computation, and might have important implications for understanding the dopaminergic prediction error signal and its relation to adaptive and effective learning.
Affiliation(s)
- Moritz Möller
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Sanjay Manohar
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Rafal Bogacz
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
23
The role of state uncertainty in the dynamics of dopamine. Curr Biol 2022; 32:1077-1087.e9. [PMID: 35114098 PMCID: PMC8930519 DOI: 10.1016/j.cub.2022.01.025] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2021] [Revised: 11/22/2021] [Accepted: 01/10/2022] [Indexed: 11/22/2022]
Abstract
Reinforcement learning models of the basal ganglia map the phasic dopamine signal to reward prediction errors (RPEs). Conventional models assert that, when a stimulus predicts a reward with fixed delay, dopamine activity during the delay should converge to baseline through learning. However, recent studies have found that dopamine ramps up before reward in certain conditions even after learning, thus challenging the conventional models. In this work, we show that sensory feedback causes an unbiased learner to produce RPE ramps. Our model predicts that when feedback gradually decreases during a trial, dopamine activity should resemble a "bump," whose ramp-up phase should, furthermore, be greater than that of conditions where the feedback stays high. We trained mice on a virtual navigation task with varying brightness, and both predictions were empirically observed. In sum, our theoretical and experimental results reconcile the seemingly conflicting data on dopamine behaviors under the RPE hypothesis.
24
Lefebvre G, Summerfield C, Bogacz R. A Normative Account of Confirmation Bias During Reinforcement Learning. Neural Comput 2022; 34:307-337. [PMID: 34758486 PMCID: PMC7612695 DOI: 10.1162/neco_a_01455] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2021] [Accepted: 07/26/2021] [Indexed: 11/04/2022]
Abstract
Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that in humans, reinforcement learning exhibits a confirmatory bias: when the value of a chosen option is being updated, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the unchosen option value estimate. Here, we simulate performance on a multi-arm bandit task to examine the consequences of a confirmatory bias for reward harvesting. We report a paradoxical finding: that confirmatory biases allow the agent to maximize reward relative to an unbiased updating rule. This principle holds over a wide range of experimental settings and is most influential when decisions are corrupted by noise. We show that this occurs because on average, confirmatory biases lead to overestimating the value of more valuable bandits and underestimating the value of less valuable bandits, rendering decisions overall more robust in the face of noise. Our results show how apparently suboptimal learning rules can in fact be reward maximizing if decisions are made with finite computational precision.
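The asymmetric-update scheme for the chosen option can be sketched as below; the learning rates, exploration level, payoff probabilities, and trial count are illustrative assumptions rather than the paper's settings:

```python
import random

def run_bandit(lr_pos, lr_neg, trials=5000, p=(0.7, 0.3), explore=0.1, seed=0):
    """Two-armed Bernoulli bandit with a confirmatory update rule for the
    chosen option: positive prediction errors use lr_pos, negative ones
    lr_neg (lr_pos > lr_neg gives a confirmation bias). Returns the mean
    reward harvested. All parameter values are illustrative."""
    rng = random.Random(seed)
    q = [0.5, 0.5]                      # value estimates for the two arms
    total = 0.0
    for _ in range(trials):
        if rng.random() < explore:      # decision noise: random choice
            a = rng.randrange(2)
        else:                           # otherwise choose greedily
            a = 0 if q[0] >= q[1] else 1
        r = 1.0 if rng.random() < p[a] else 0.0
        total += r
        delta = r - q[a]                # prediction error
        q[a] += (lr_pos if delta > 0 else lr_neg) * delta
    return total / trials
```

Comparing a confirmatory agent such as `run_bandit(0.2, 0.05)` against the unbiased `run_bandit(0.1, 0.1)` across many seeds reproduces the paper's qualitative point: the bias inflates the value estimate of the better-sampled arm, which can stabilise choice against decision noise.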
Affiliation(s)
- Germain Lefebvre
- MRC Brain Network Dynamics Unit, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, U.K.
- Rafal Bogacz
- MRC Brain Network Dynamics Unit, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, U.K.
25
Hirschbichler ST, Rothwell JC, Manohar SG. Dopamine increases risky choice while D2 blockade shortens decision time. Exp Brain Res 2022; 240:3351-3360. [PMID: 36350356 PMCID: PMC9678996 DOI: 10.1007/s00221-022-06501-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2022] [Accepted: 10/27/2022] [Indexed: 11/11/2022]
Abstract
Dopamine is crucially involved in decision-making and overstimulation within dopaminergic pathways can lead to impulsive behaviour, including a desire to take risks and reduced deliberation before acting. These behavioural changes are side effects of treatment with dopaminergic drugs in Parkinson disease, but their likelihood of occurrence is difficult to predict and may be influenced by the individual's baseline endogenous dopamine state, and indeed correlate with sensation-seeking personality traits. We here collected data on a standard gambling task in healthy volunteers given either placebo, 2.5 mg of the dopamine antagonist haloperidol or 100/25 mg of the dopamine precursor levodopa in a within-subject design. We found an increase in risky choices on levodopa. Choices were, however, made faster on haloperidol with no effect of levodopa on deliberation time. Shortened deliberation times on haloperidol occurred in low sensation-seekers only, suggesting a correlation between sensation-seeking personality trait and baseline dopamine levels. We hypothesise that levodopa increases risk-taking behaviour via overstimulation at both D1 and D2 receptor level, while a single low dose of haloperidol, as previously reported (Frank and O'Reilly 2006), may block D2 receptors pre- and post-synaptically and may paradoxically lead to higher striatal dopamine acting on remaining striatal D1 receptors, causing speedier decision without influencing risk tolerance. These effects could also fit with a recently proposed computational model of the basal ganglia (Moeller and Bogacz 2019; Moeller et al. 2021). Furthermore, our data suggest that the actual dopaminergic drug effect may be dependent on the individual's baseline dopamine state, which may influence our therapeutic decision as clinicians in the future.
Affiliation(s)
- Stephanie T. Hirschbichler
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, WC1N 3BG, UK; Department of Neurology, University Hospital St. Pölten, Dunant-Platz 1, 3100 St. Pölten, Austria; Karl Landsteiner University of Health Sciences, Dr. Karl-Dorrek-Straße 30, 3500 Krems, Austria
- John C. Rothwell
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, WC1N 3BG, UK
- Sanjay G. Manohar
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, WC1N 3BG, UK; Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, Oxford, OX3 9DU, UK
26
Bond K, Dunovan K, Porter A, Rubin JE, Verstynen T. Dynamic decision policy reconfiguration under outcome uncertainty. eLife 2021; 10:e65540. [PMID: 34951589 PMCID: PMC8806193 DOI: 10.7554/elife.65540] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2020] [Accepted: 12/23/2021] [Indexed: 11/18/2022] Open
Abstract
In uncertain or unstable environments, sometimes the best decision is to change your mind. To shed light on this flexibility, we evaluated how the underlying decision policy adapts when the most rewarding action changes. Human participants performed a dynamic two-armed bandit task that manipulated the certainty in relative reward (conflict) and the reliability of action-outcomes (volatility). Continuous estimates of conflict and volatility contributed to shifts in exploratory states by changing both the rate of evidence accumulation (drift rate) and the amount of evidence needed to make a decision (boundary height), respectively. At the trialwise level, following a switch in the optimal choice, the drift rate plummets and the boundary height weakly spikes, leading to a slow exploratory state. We find that the drift rate drives most of this response, with an unreliable contribution of boundary height across experiments. Surprisingly, we find no evidence that pupillary responses are associated with decision policy changes. We conclude that humans show a stereotypical shift in their decision policies in response to environmental changes.
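The two decision-policy parameters named in the abstract map onto a standard drift-diffusion process. A minimal simulation (step size, noise level, and parameter values are assumptions, not the fitted values from the study) looks like:

```python
import random

def ddm_trial(drift, boundary, dt=0.001, noise=1.0, rng=random):
    """Single drift-diffusion trial: evidence x drifts at rate `drift`
    with Gaussian noise until it crosses +boundary (choice 1) or
    -boundary (choice 0). Returns (choice, reaction_time)."""
    x, t = 0.0, 0.0
    while abs(x) < boundary:
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0, 1)
        t += dt
    return (1 if x > 0 else 0), t
```

Dropping the drift rate (as after a change point) lengthens reaction times and makes choices more random, i.e. a slow exploratory state; raising the boundary slows decisions in a similar way while demanding more evidence per choice.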
Affiliation(s)
- Krista Bond
- Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
- Center for the Neural Basis of Cognition, Pittsburgh, United States
- Carnegie Mellon Neuroscience Institute, Pittsburgh, United States
- Kyle Dunovan
- Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
- Alexis Porter
- Department of Psychology, Northwestern University, Evanston, United States
- Jonathan E Rubin
- Center for the Neural Basis of Cognition, Pittsburgh, United States
- Department of Mathematics, University of Pittsburgh, Pittsburgh, United States
- Timothy Verstynen
- Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
- Center for the Neural Basis of Cognition, Pittsburgh, United States
- Carnegie Mellon Neuroscience Institute, Pittsburgh, United States
- Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, United States
27
Feng Z, Nagase AM, Morita K. A Reinforcement Learning Approach to Understanding Procrastination: Does Inaccurate Value Approximation Cause Irrational Postponing of a Task? Front Neurosci 2021; 15:660595. [PMID: 34602962 PMCID: PMC8481628 DOI: 10.3389/fnins.2021.660595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2021] [Accepted: 08/16/2021] [Indexed: 11/27/2022] Open
Abstract
Procrastination is the voluntary but irrational postponing of a task despite being aware that the delay can lead to worse consequences. It has been extensively studied in the field of psychology, from contributing factors to theoretical models. From a value-based decision-making and reinforcement learning (RL) perspective, procrastination has been suggested to be caused by non-optimal choice resulting from cognitive limitations. Exactly what sort of cognitive limitations are involved, however, remains elusive. In the current study, we examined whether a particular type of cognitive limitation, namely, inaccurate valuation resulting from inadequate state representation, would cause procrastination. Recent work has suggested that humans may adopt a particular type of state representation called the successor representation (SR) and that humans can learn to represent states by relatively low-dimensional features. Combining these suggestions, we assumed a dimension-reduced version of SR. We modeled a series of behaviors of a "student" doing assignments during the school term, when putting off doing the assignments (i.e., procrastination) is not allowed, and during the vacation, when whether to procrastinate or not can be freely chosen. We assumed that the "student" had acquired a rigid reduced SR of each state, corresponding to each step in completing an assignment, under the policy without procrastination. The "student" learned the approximated value of each state, computed as a linear function of the features of the states in the rigid reduced SR, through temporal-difference (TD) learning. During the vacation, the "student" made decisions at each time step about whether to procrastinate based on these approximated values. Simulation results showed that the reduced SR-based RL model generated procrastination behavior, which worsened across episodes. According to the values approximated by the "student," procrastinating was the better choice, whereas not procrastinating was mostly better according to the true values. Thus, the current model generated procrastination behavior caused by inaccurate value approximation, which resulted from the adoption of the reduced SR as the state representation. These findings indicate that the reduced SR, or more generally, dimension reduction in state representation, can be a potential form of cognitive limitation that leads to procrastination.
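The "rigid reduced SR with a learned linear read-out" mechanism corresponds to TD(0) with a fixed feature map. The tiny two-state example below illustrates that mechanism (the state names, features, and reward scheme are invented for illustration, not the paper's task):

```python
import numpy as np

def td_linear(episodes, features, alpha=0.1, gamma=0.9):
    """TD(0) over a fixed feature map (a stand-in for a rigid reduced SR):
    only the linear read-out weights w are learned, so value estimates
    V(s) = w . features[s] inherit whatever the features can express."""
    n_feat = len(next(iter(features.values())))
    w = np.zeros(n_feat)
    for episode in episodes:
        for s, r, s_next in episode:
            v = w @ features[s]
            v_next = 0.0 if s_next is None else w @ features[s_next]
            delta = r + gamma * v_next - v    # TD error
            w += alpha * delta * features[s]  # update read-out weights only
    return w
```

With one-hot features this reduces to ordinary tabular TD; with a genuinely reduced (lower-dimensional, overlapping) feature map, the learned values can be systematically biased, which is the failure mode the paper links to procrastination.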
Affiliation(s)
- Zheyu Feng
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Asako Mitsuto Nagase
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Division of Neurology, Department of Brain and Neurosciences, Faculty of Medicine, Tottori University, Yonago, Japan
- Research Fellowship for Young Scientists, Japan Society for the Promotion of Science, Tokyo, Japan
- Department of Neurology, Faculty of Medicine, Shimane University, Izumo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo, Japan
28
Moeller M, Grohn J, Manohar S, Bogacz R. An association between prediction errors and risk-seeking: Theory and behavioral evidence. PLoS Comput Biol 2021; 17:e1009213. [PMID: 34270552 PMCID: PMC8318232 DOI: 10.1371/journal.pcbi.1009213] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2021] [Revised: 07/28/2021] [Accepted: 06/23/2021] [Indexed: 11/19/2022] Open
Abstract
Reward prediction errors (RPEs) and risk preferences have two things in common: both can shape decision making behavior, and both are commonly associated with dopamine. RPEs drive value learning and are thought to be represented in the phasic release of striatal dopamine. Risk preferences bias choices towards or away from uncertainty; they can be manipulated with drugs that target the dopaminergic system. Based on the common neural substrate, we hypothesize that RPEs and risk preferences are linked on the level of behavior as well. Here, we develop this hypothesis theoretically and test it empirically. First, we apply a recent theory of learning in the basal ganglia to predict how RPEs influence risk preferences. We find that positive RPEs should cause increased risk-seeking, while negative RPEs should cause risk-aversion. We then test our behavioral predictions using a novel bandit task in which value and risk vary independently across options. Critically, conditions are included where options vary in risk but are matched for value. We find that our prediction was correct: participants become more risk-seeking if choices are preceded by positive RPEs, and more risk-averse if choices are preceded by negative RPEs. These findings cannot be explained by other known effects, such as nonlinear utility curves or dynamic learning rates.
Affiliation(s)
- Moritz Moeller
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Jan Grohn
- Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Sanjay Manohar
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Rafal Bogacz
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
29
Kasai H, Ziv NE, Okazaki H, Yagishita S, Toyoizumi T. Spine dynamics in the brain, mental disorders and artificial neural networks. Nat Rev Neurosci 2021; 22:407-422. [PMID: 34050339 DOI: 10.1038/s41583-021-00467-3] [Citation(s) in RCA: 77] [Impact Index Per Article: 25.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/14/2021] [Indexed: 12/15/2022]
Abstract
In the brain, most synapses are formed on minute protrusions known as dendritic spines. Unlike their artificial intelligence counterparts, spines are not merely tuneable memory elements: they also embody algorithms that implement the brain's ability to learn from experience and cope with new challenges. Importantly, they exhibit structural dynamics that depend on activity, excitatory input and inhibitory input (synaptic plasticity or 'extrinsic' dynamics) and dynamics independent of activity ('intrinsic' dynamics), both of which are subject to neuromodulatory influences and reinforcers such as dopamine. Here we succinctly review extrinsic and intrinsic dynamics, compare these with parallels in machine learning where they exist, describe the importance of intrinsic dynamics for memory management and adaptation, and speculate on how disruption of extrinsic and intrinsic dynamics may give rise to mental disorders. Throughout, we also highlight algorithmic features of spine dynamics that may be relevant to future artificial intelligence developments.
Affiliation(s)
- Haruo Kasai
- Laboratory of Structural Physiology, Center for Disease Biology and Integrative Medicine, Faculty of Medicine, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), UTIAS, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Noam E Ziv
- Technion Faculty of Medicine and Network Biology Research Labs, Technion City, Haifa, Israel
- Hitoshi Okazaki
- Laboratory of Structural Physiology, Center for Disease Biology and Integrative Medicine, Faculty of Medicine, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), UTIAS, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Sho Yagishita
- Laboratory of Structural Physiology, Center for Disease Biology and Integrative Medicine, Faculty of Medicine, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), UTIAS, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Taro Toyoizumi
- Laboratory for Neural Computation and Adaptation, RIKEN Center for Brain Science, Saitama, Japan
- Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
30
Gilbertson T, Steele D. Tonic dopamine, uncertainty and basal ganglia action selection. Neuroscience 2021; 466:109-124. [PMID: 34015370 DOI: 10.1016/j.neuroscience.2021.05.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2020] [Revised: 05/04/2021] [Accepted: 05/08/2021] [Indexed: 11/29/2022]
Abstract
To make optimal decisions in uncertain circumstances, flexible adaptation of behaviour is required: exploring alternatives when the best choice is unknown, and exploiting what is known when that is best. Using a computational model of the basal ganglia, we propose that switches between exploratory and exploitative decisions are mediated by the interaction between tonic dopamine and cortical input to the basal ganglia. We show that a biologically detailed action selection circuit model, endowed with dopamine-dependent striatal plasticity, can optimally solve the explore-exploit problem, estimating the true underlying state of a noisy Gaussian diffusion process. Critical to the model's performance was a fluctuating level of tonic dopamine which increased under conditions of uncertainty. Within an optimal range of tonic dopamine, explore-exploit decisions were mediated by the effects of tonic dopamine on the precision of the model's action selection mechanism. Under conditions of uncertain reward pay-out, the model's reduced selectivity allowed disinhibition of multiple alternative actions to be explored at random. Conversely, when uncertainty about reward pay-out was low, enhanced selectivity of the action selection circuit facilitated exploitation of the high-value choice. Model performance was at the level of a Kalman filter, which provides an optimal solution for the task. These simulations support the idea that this subcortical neural circuit may have evolved to facilitate decision making in non-stationary reward environments. The model generates several experimental predictions with relevance to abnormal decision making in neuropsychiatric and neurological disease.
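The Kalman-filter benchmark used here, for a scalar Gaussian diffusion (random walk) observed in noise, is just a two-step predict/correct update; the variance values below are illustrative assumptions:

```python
def kalman_step(m, p, obs, q=0.01, r=1.0):
    """One scalar Kalman-filter update for tracking a Gaussian diffusion
    (random-walk) process: m is the state estimate, p its variance,
    q the diffusion variance, r the observation-noise variance."""
    p = p + q                # predict: uncertainty grows by drift variance
    k = p / (p + r)          # Kalman gain: how much to trust the observation
    m = m + k * (obs - m)    # correct with a gain-weighted prediction error
    p = (1 - k) * p          # posterior variance shrinks after observing
    return m, p
```

The gain `k` rises when state uncertainty `p` is high relative to observation noise `r`; that is, the filter, like the model's uncertainty-elevated tonic dopamine, drives larger updates under uncertainty.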
Affiliation(s)
- Tom Gilbertson
- Department of Neurology, Level 6, South Block, Ninewells Hospital & Medical School, Dundee DD2 4BF, UK
- Division of Imaging Science and Technology, Medical School, University of Dundee, DD2 4BF, UK
- Douglas Steele
- Division of Imaging Science and Technology, Medical School, University of Dundee, DD2 4BF, UK
31
Chai Y, Bian Y, Liu H, Li J, Xu J. Glaucoma diagnosis in the Chinese context: An uncertainty information-centric Bayesian deep learning model. Inf Process Manag 2021. [DOI: 10.1016/j.ipm.2020.102454] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
32
Lowet AS, Zheng Q, Matias S, Drugowitsch J, Uchida N. Distributional Reinforcement Learning in the Brain. Trends Neurosci 2020; 43:980-997. [PMID: 33092893 PMCID: PMC8073212 DOI: 10.1016/j.tins.2020.09.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2020] [Revised: 08/14/2020] [Accepted: 09/08/2020] [Indexed: 12/11/2022]
Abstract
Learning about rewards and punishments is critical for survival. Classical studies have demonstrated an impressive correspondence between the firing of dopamine neurons in the mammalian midbrain and the reward prediction errors of reinforcement learning algorithms, which express the difference between actual reward and predicted mean reward. However, it may be advantageous to learn not only the mean but also the complete distribution of potential rewards. Recent advances in machine learning have revealed a biologically plausible set of algorithms for reconstructing this reward distribution from experience. Here, we review the mathematical foundations of these algorithms as well as initial evidence for their neurobiological implementation. We conclude by highlighting outstanding questions regarding the circuit computation and behavioral readout of these distributional codes.
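A minimal version of the distributional idea reviewed here is a population of value estimates with asymmetric learning rates (an expectile-style code); the tau values and learning rate below are assumptions for illustration:

```python
def learn_expectiles(rewards, taus, alpha=0.02):
    """Distributional TD sketch: each estimate v_i uses asymmetric learning
    rates set by tau_i, so optimistic units (tau near 1) settle above the
    mean reward and pessimistic units (tau near 0) settle below it."""
    vs = [0.0 for _ in taus]
    for r in rewards:
        for i, tau in enumerate(taus):
            delta = r - vs[i]                      # prediction error
            scale = tau if delta > 0 else (1 - tau)  # asymmetric weighting
            vs[i] += alpha * scale * delta
    return vs
```

After training on a 50/50 mixture of rewards 0 and 1, the tau = 0.5 unit sits at the mean while optimistic and pessimistic units bracket it, so the population jointly encodes a coarse picture of the whole reward distribution rather than just its mean.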
Affiliation(s)
- Adam S Lowet
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Qiao Zheng
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Sara Matias
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Jan Drugowitsch
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
33
Verstynen T, Dunovan K, Walsh C, Kuan CH, Manuck SB, Gianaros PJ. Adiposity covaries with signatures of asymmetric feedback learning during adaptive decisions. Soc Cogn Affect Neurosci 2020; 15:1145-1156. [PMID: 32608485 PMCID: PMC7657458 DOI: 10.1093/scan/nsaa088] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2019] [Revised: 06/03/2020] [Accepted: 06/15/2020] [Indexed: 12/19/2022] Open
Abstract
Unhealthy weight gain relates, in part, to how people make decisions based on prior experience. Here we conducted post hoc analysis on an archival data set to evaluate whether individual differences in adiposity, an anthropometric construct encompassing a spectrum of body types, from lean to obese, associate with signatures of asymmetric feedback learning during value-based decision-making. In a sample of neurologically healthy adults (N = 433), ventral striatal responses to rewards, measured using fMRI, were not directly associated with adiposity, but rather moderated its relationship with feedback-driven learning in the Iowa gambling task, tested outside the scanner. Using a biologically inspired model of basal ganglia-dependent decision processes, we found this moderating effect of reward reactivity to be explained by an asymmetrical use of feedback to drive learning; that is, with more plasticity for gains than for losses, stronger reward reactivity leads to decisions that minimize exploration for maximizing long-term outcomes. Follow-up analysis confirmed that individual differences in adiposity correlated with signatures of asymmetric use of feedback cues during learning, suggesting that reward reactivity may especially relate to adiposity, and possibly obesity risk, when gains impact future decisions more than losses.
Affiliation(s)
- Timothy Verstynen
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Carnegie Mellon Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Kyle Dunovan
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Catherine Walsh
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Chieh-Hsin Kuan
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Stephen B Manuck
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Peter J Gianaros
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15260, USA
34
Deep Reinforcement Learning and Its Neuroscientific Implications. Neuron 2020; 107:603-616. [DOI: 10.1016/j.neuron.2020.06.014] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2020] [Revised: 06/08/2020] [Accepted: 06/12/2020] [Indexed: 11/23/2022]
35
Fujita Y, Yagishita S, Kasai H, Ishii S. Computational Characteristics of the Striatal Dopamine System Described by Reinforcement Learning With Fast Generalization. Front Comput Neurosci 2020; 14:66. [PMID: 32774245 PMCID: PMC7388898 DOI: 10.3389/fncom.2020.00066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2020] [Accepted: 06/08/2020] [Indexed: 11/13/2022] Open
Abstract
Generalization is the ability to apply past experience to similar but non-identical situations. It not only affects stimulus-outcome relationships, as observed in conditioning experiments, but may also be essential for adaptive behaviors, which involve the interaction between individuals and their environment. Computational modeling could potentially clarify the effect of generalization on adaptive behaviors and how this effect emerges from the underlying computation. Recent neurobiological observation indicated that the striatal dopamine system achieves generalization and subsequent discrimination by updating the corticostriatal synaptic connections in differential response to reward and punishment. In this study, we analyzed how computational characteristics in this neurobiological system affects adaptive behaviors. We proposed a novel reinforcement learning model with multilayer neural networks in which the synaptic weights of only the last layer are updated according to the prediction error. We set fixed connections between the input and hidden layers to maintain the similarity of inputs in the hidden-layer representation. This network enabled fast generalization of reward and punishment learning, and thereby facilitated safe and efficient exploration of spatial navigation tasks. Notably, it demonstrated a quick reward approach and efficient punishment aversion in the early learning phase, compared to algorithms that do not show generalization. However, disturbance of the network that causes noisy generalization and impaired discrimination induced maladaptive valuation. These results suggested the advantage and potential drawback of computation by the striatal dopamine system with regard to adaptive behaviors.
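The architecture described, in which a fixed input-to-hidden mapping preserves input similarity and only the last layer learns from the prediction error, can be sketched with a random projection. Sizes, the tanh nonlinearity, and the learning rate below are all assumptions for illustration:

```python
import numpy as np

def make_net(n_in, n_hidden, seed=0):
    """Random fixed projection (untrained hidden layer) plus a learnable
    linear read-out; only the read-out is ever updated."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_hidden, n_in))  # fixed: never updated
    w_out = np.zeros(n_hidden)             # learned read-out weights

    def features(x):
        return np.tanh(W @ x)              # similarity-preserving features

    return features, W, w_out

def train_value(features, w_out, samples, alpha=0.01, epochs=200):
    """Fit target values by updating the last-layer weights with the
    prediction error, leaving the hidden representation untouched."""
    for _ in range(epochs):
        for x, target in samples:
            phi = features(x)
            delta = target - w_out @ phi         # prediction error
            w_out = w_out + alpha * delta * phi  # last layer only
    return w_out
```

Because nearby inputs map to overlapping hidden features, fitting the value of one state immediately shifts the values of similar states, which is the fast generalization at issue, and also the source of maladaptive valuation when the projection is disturbed.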
Affiliation(s)
- Yoshihisa Fujita
- Integrated Systems Biology Laboratory, Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan
- Sho Yagishita
- Laboratory of Structural Physiology, Center for Disease Biology and Integrative Medicine, Faculty of Medicine, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence, The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo, Japan
- Haruo Kasai
- Laboratory of Structural Physiology, Center for Disease Biology and Integrative Medicine, Faculty of Medicine, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence, The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo, Japan
- Shin Ishii
- Integrated Systems Biology Laboratory, Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan
- International Research Center for Neurointelligence, The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo, Japan
- Neural Information Processing Laboratories, Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan
36
Abstract
This paper describes a framework for modelling dopamine function in the mammalian brain. It proposes that both learning and action planning involve processes minimizing prediction errors encoded by dopaminergic neurons. In this framework, dopaminergic neurons projecting to different parts of the striatum encode errors in predictions made by the corresponding systems within the basal ganglia. The dopaminergic neurons encode differences between rewards and expectations in the goal-directed system, and differences between the chosen and habitual actions in the habit system. These prediction errors trigger learning about rewards and habit formation, respectively. Additionally, dopaminergic neurons in the goal-directed system play a key role in action planning: They compute the difference between a desired reward and the reward expected from the current motor plan, and they facilitate action planning until this difference diminishes. Presented models account for dopaminergic responses during movements, effects of dopamine depletion on behaviour, and make several experimental predictions. In the brain, chemicals such as dopamine allow nerve cells to ‘talk’ to each other and to relay information from and to the environment. Dopamine, in particular, is released when pleasant surprises are experienced: this helps the organism to learn about the consequences of certain actions. If a new flavour of ice-cream tastes better than expected, for example, the release of dopamine tells the brain that this flavour is worth choosing again. However, dopamine has an additional role in controlling movement. When the cells that produce dopamine die, for instance in Parkinson’s disease, individuals may find it difficult to initiate deliberate movements. Here, Rafal Bogacz aimed to develop a comprehensive framework that could reconcile the two seemingly unrelated roles played by dopamine. 
The new theory proposes that dopamine is released when an outcome differs from expectations, which helps the organism to adjust and minimise these differences. In the ice-cream example, the difference is between how good the treat is expected to taste, and how tasty it really is. By learning to select the same flavour repeatedly, the brain aligns expectation and the result of the choice. This ability would also apply when movements are planned. In this case, the brain compares the desired reward with the predicted results of the planned actions. For example, while planning to get a spoonful of ice-cream, the brain compares the pleasure expected from the movement that is currently planned, and the pleasure of eating a full spoon of the treat. If the two differ, for example because no movement has been planned yet, the brain releases dopamine to form a better version of the action plan. The theory was then tested using a computer simulation of nerve cells that release dopamine; this showed that the behaviour of the virtual cells closely matched that of their real-life counterparts. This work offers a comprehensive description of the fundamental role of dopamine in the brain. The model now needs to be verified through experiments on living nerve cells; ultimately, it could help doctors and researchers to develop better treatments for conditions such as Parkinson’s disease or ADHD, which are linked to a lack of dopamine.
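The planning mechanism this abstract describes can be sketched as a simple loop: a dopamine-like error (desired reward minus the reward predicted for the current plan) gates how long planning continues, while the plan itself improves by hill-climbing on predicted reward. Everything below is an illustrative assumption, not the paper's model: a scalar motor plan, a differentiable reward predictor, and a finite-difference gradient step.

```python
def plan_action(desired_reward, predict_reward, plan, lr=0.1, tol=1e-3, max_iter=1000):
    """Refine a motor plan until the dopamine-like error
    (desired reward minus reward predicted for the plan) is negligible."""
    error = desired_reward - predict_reward(plan)
    for _ in range(max_iter):
        if abs(error) < tol:          # desired and predicted reward now match
            break
        # improve the plan with a finite-difference gradient step on predicted reward
        eps = 1e-4
        grad = (predict_reward(plan + eps) - predict_reward(plan)) / eps
        plan += lr * grad
        error = desired_reward - predict_reward(plan)
    return plan, error
```

With a hypothetical reward predictor `1 - (a - 2)**2` and a desired reward of 1, the loop settles near `plan = 2`, where the error has shrunk below tolerance.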
Affiliation(s)
- Rafal Bogacz
- MRC Brain Networks Dynamics Unit, University of Oxford, Oxford, United Kingdom
37
Rubin JE, Vich C, Clapp M, Noneman K, Verstynen T. The credit assignment problem in cortico‐basal ganglia‐thalamic networks: A review, a problem and a possible solution. Eur J Neurosci 2020; 53:2234-2253. [DOI: 10.1111/ejn.14745] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2020] [Revised: 03/23/2020] [Accepted: 03/25/2020] [Indexed: 12/21/2022]
Affiliation(s)
- Jonathan E. Rubin
- Department of Mathematics, Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Catalina Vich
- Departament de Matemàtiques i Informàtica, Institute of Applied Computing and Community Code, Universitat de les Illes Balears, Palma, Spain
- Matthew Clapp
- Carnegie Mellon Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Kendra Noneman
- Micron School of Materials Science and Engineering, Boise State University, Boise, ID, USA
- Timothy Verstynen
- Carnegie Mellon Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA
38
van Swieten MMH, Bogacz R. Modeling the effects of motivation on choice and learning in the basal ganglia. PLoS Comput Biol 2020; 16:e1007465. [PMID: 32453725 PMCID: PMC7274475 DOI: 10.1371/journal.pcbi.1007465] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2019] [Revised: 06/05/2020] [Accepted: 04/03/2020] [Indexed: 01/08/2023] Open
Abstract
Decision making relies on adequately evaluating the consequences of actions on the basis of past experience and the current physiological state. A key role in this process is played by the basal ganglia, where neural activity and plasticity are modulated by dopaminergic input from the midbrain. Internal physiological factors, such as hunger, scale signals encoded by dopaminergic neurons and thus they alter the motivation for taking actions and learning. However, to our knowledge, no formal mathematical formulation exists for how a physiological state affects learning and action selection in the basal ganglia. We developed a framework for modelling the effect of motivation on choice and learning. The framework defines the motivation to obtain a particular resource as the difference between the desired and the current level of this resource, and proposes how the utility of reinforcements depends on the motivation. To account for dopaminergic activity previously recorded in different physiological states, the paper argues that the prediction error encoded in the dopaminergic activity needs to be redefined as the difference between utility and expected utility, which depends on both the objective reinforcement and the motivation. We also demonstrate a possible mechanism by which the evaluation and learning of utility of actions can be implemented in the basal ganglia network. The presented theory brings together models of learning in the basal ganglia with the incentive salience theory in a single simple framework, and it provides a mechanistic insight into how decision processes and learning in the basal ganglia are modulated by the motivation. Moreover, this theory is also consistent with data on neural underpinnings of overeating and obesity, and makes further experimental predictions.
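The core quantities of this framework can be sketched in a few lines, under assumed functional forms (motivation as the desired-minus-current resource level, and utility as motivation-scaled reinforcement; the paper's full model is richer than this):

```python
def motivation(desired_level, current_level):
    """Motivation for a resource: gap between desired and current levels."""
    return desired_level - current_level

def update_value(value, reward, motiv, alpha=0.1):
    """Value update with the redefined prediction error:
    utility minus expected utility (utility assumed = motivation * reward)."""
    utility = motiv * reward
    delta = utility - value      # dopaminergic prediction error in this framework
    return value + alpha * delta, delta
```

The same objective reward then drives a larger teaching signal in a deprived state (large motivation) than in a sated one, which is the sense in which physiological state scales dopaminergic signals here.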
Affiliation(s)
- Rafal Bogacz
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
39
A distributional code for value in dopamine-based reinforcement learning. Nature 2020; 577:671-675. [PMID: 31942076 DOI: 10.1038/s41586-019-1924-6] [Citation(s) in RCA: 174] [Impact Index Per Article: 43.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2019] [Accepted: 11/19/2019] [Indexed: 12/12/2022]
Abstract
Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain1-3. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning4-6. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
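The distributional hypothesis can be illustrated with an expectile-style update in which each simulated cell scales positive and negative prediction errors asymmetrically, so the population spreads out across the reward distribution instead of converging on its mean. This is an assumed simplification for illustration; the `taus` below are hypothetical asymmetry parameters, not fitted neural data.

```python
import numpy as np

def expectile_update(values, taus, reward, lr=0.05):
    """One distributional update: cell i scales positive prediction errors
    by tau_i and negative ones by (1 - tau_i)."""
    delta = reward - values
    gain = np.where(delta > 0, taus, 1.0 - taus)   # optimistic vs pessimistic cells
    return values + lr * gain * delta

rng = np.random.default_rng(0)
taus = np.array([0.1, 0.5, 0.9])        # hypothetical asymmetries across the population
values = np.zeros(3)
for _ in range(5000):
    reward = float(rng.choice([0.0, 1.0]))   # bimodal outcome; the mean alone is 0.5
    values = expectile_update(values, taus, reward)
```

The learned values end up below and above the mean, so together they carry information about both reward peaks that a single scalar prediction would lose.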
40
Abstract
Modern decision neuroscience offers a powerful and broad account of human behaviour using computational techniques that link psychological and neuroscientific approaches to the ways that individuals can generate near-optimal choices in complex controlled environments. However, until recently, relatively little attention has been paid to the extent to which the structure of experimental environments relates to natural scenarios, and the survival problems that individuals have evolved to solve. This situation not only risks leaving decision-theoretic accounts ungrounded but also makes various aspects of the solutions, such as hard-wired or Pavlovian policies, difficult to interpret in the natural world. Here, we suggest importing concepts, paradigms and approaches from the fields of ethology and behavioural ecology, which concentrate on the contextual and functional correlates of decisions made about foraging and escape and address these lacunae.
41
Dunovan K, Vich C, Clapp M, Verstynen T, Rubin J. Reward-driven changes in striatal pathway competition shape evidence evaluation in decision-making. PLoS Comput Biol 2019; 15:e1006998. [PMID: 31060045 PMCID: PMC6534331 DOI: 10.1371/journal.pcbi.1006998] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2018] [Revised: 05/24/2019] [Accepted: 04/01/2019] [Indexed: 01/25/2023] Open
Abstract
Cortico-basal-ganglia-thalamic (CBGT) networks are critical for adaptive decision-making, yet how changes to circuit-level properties impact cognitive algorithms remains unclear. Here we explore how dopaminergic plasticity at corticostriatal synapses alters competition between striatal pathways, impacting the evidence accumulation process during decision-making. Spike-timing dependent plasticity simulations showed that dopaminergic feedback based on rewards modified the ratio of direct and indirect corticostriatal weights within opposing action channels. Using the learned weight ratios in a full spiking CBGT network model, we simulated neural dynamics and decision outcomes in a reward-driven decision task and fit them with a drift diffusion model. Fits revealed that the rate of evidence accumulation varied with inter-channel differences in direct pathway activity while boundary height varied with overall indirect pathway activity. This multi-level modeling approach demonstrates how complementary learning and decision computations can emerge from corticostriatal plasticity. Cognitive process models such as reinforcement learning (RL) and the drift diffusion model (DDM) have helped to elucidate the basic algorithms underlying error-corrective learning and the evaluation of accumulating decision evidence leading up to a choice. While these relatively abstract models help to guide experimental and theoretical probes into associated phenomena, they remain uninformative about the actual physical mechanics by which learning and decision algorithms are carried out in a neurobiological substrate during adaptive choice behavior. Here we present an “upwards mapping” approach to bridging neural and cognitive models of value-based decision-making, showing how dopaminergic feedback alters the network-level dynamics of cortico-basal-ganglia-thalamic (CBGT) pathways during learning to bias behavioral choice towards more rewarding actions. 
By mapping “up” the levels of analysis, this approach yields specific predictions about aspects of neuronal activity that map to the quantities appearing in the cognitive decision-making framework.
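The fitted mapping reported above can be illustrated with a drift diffusion sketch in which the drift rate comes from the inter-channel difference in direct-pathway activity and the boundary height from summed indirect-pathway activity. The gains `k_v` and `k_a` and the activity values are hypothetical placeholders, not the paper's fitted parameters.

```python
import numpy as np

def simulate_ddm(direct_left, direct_right, indirect_total, k_v=1.0, k_a=1.0,
                 sigma=1.0, dt=0.001, max_t=5.0, rng=None):
    """One decision: accumulate noisy evidence at a rate set by direct-pathway
    asymmetry until a boundary set by indirect-pathway activity is crossed."""
    rng = rng or np.random.default_rng()
    v = k_v * (direct_left - direct_right)   # drift rate
    a = k_a * indirect_total                 # boundary height (+/- a)
    x, t = 0.0, 0.0
    while abs(x) < a and t < max_t:
        x += v * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    choice = 'left' if x >= a else 'right' if x <= -a else 'timeout'
    return choice, t
```

Raising `indirect_total` slows decisions without redirecting them, mirroring the boundary-height role of overall indirect-pathway activity, while the direct-pathway difference controls which boundary tends to be hit.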
Affiliation(s)
- Kyle Dunovan
- Dept. of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Center for the Neural Basis of Cognition, Pittsburgh, Pennsylvania, United States of America
- Catalina Vich
- Dept. de Matemàtiques i Informàtica, Universitat de les Illes Balears, Palma, Illes Balears, Spain
- Institute of Applied Computing and Community Code, Palma, Illes Balears, Spain
- Matthew Clapp
- Dept. of Biomedical Engineering, University of South Carolina, Columbia, South Carolina, United States of America
- Timothy Verstynen
- Dept. of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Center for the Neural Basis of Cognition, Pittsburgh, Pennsylvania, United States of America
- Jonathan Rubin
- Center for the Neural Basis of Cognition, Pittsburgh, Pennsylvania, United States of America
- Dept. of Mathematics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
42
Möller M, Bogacz R. Learning the payoffs and costs of actions. PLoS Comput Biol 2019; 15:e1006285. [PMID: 30818357 PMCID: PMC6413954 DOI: 10.1371/journal.pcbi.1006285] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2018] [Revised: 03/12/2019] [Accepted: 01/15/2019] [Indexed: 11/19/2022] Open
Abstract
A set of sub-cortical nuclei called basal ganglia is critical for learning the values of actions. The basal ganglia include two pathways, which have been associated with approach and avoid behavior respectively and are differentially modulated by dopamine projections from the midbrain. Inspired by the influential opponent actor learning model, we demonstrate that, under certain circumstances, these pathways may represent learned estimates of the positive and negative consequences (payoffs and costs) of individual actions. In the model, the level of dopamine activity encodes the motivational state and controls to what extent payoffs and costs enter the overall evaluation of actions. We show that a set of previously proposed plasticity rules is suitable to extract payoffs and costs from a prediction error signal if they occur at different moments in time. For those plasticity rules, successful learning requires differential effects of positive and negative outcome prediction errors on the two pathways and a weak decay of synaptic weights over trials. We also confirm through simulations that the model reproduces drug-induced changes of willingness to work, as observed in classical experiments with the D2-antagonist haloperidol. The basal ganglia are structures underneath the surface of the vertebrate brain, associated with error-driven learning. Much is known about the anatomical and biological features of the basal ganglia; scientists now try to understand the algorithms implemented by these structures. Numerous models aspire to capture the learning functionality, but many of them only cover some specific aspect of the algorithm. Instead of further adding to that pool of partial models, we unify two existing ones—one which captures what the basal ganglia learn, and one that describes the learning mechanism itself. The first model suggests that the basal ganglia weigh positive against negative consequences of actions according to the motivational state. 
It hints at how payoff and cost might be represented, but does not explain how those representations arise. The other model consists of biologically plausible plasticity rules, which describe how learning takes place, but not how the brain makes use of what is learned. We show that the two theories are compatible. Together, they form a model of learning and decision making that integrates the motivational state as well as the learned payoffs and costs of opportunities.
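An assumed-form sketch of such plasticity rules: positive prediction errors mainly strengthen a Go (payoff) weight, negative ones a NoGo (cost) weight, and both decay weakly each event. When payoff and cost arrive at different moments within a trial, the ratio of the learned weights recovers the payoff-to-cost ratio. The exact rule and parameters in the paper differ; this is only an illustration of the extraction idea.

```python
def update_pathways(G, N, delta, alpha=0.1, decay=0.01):
    """Differential two-pathway update with weak weight decay."""
    G = G + alpha * max(delta, 0.0) - decay * G   # Go learns from positive errors
    N = N - alpha * min(delta, 0.0) - decay * N   # NoGo learns from negative errors
    return G, N

G, N = 0.0, 0.0
payoff, cost = 1.0, 0.5
for _ in range(2000):
    G, N = update_pathways(G, N, +payoff)   # payoff event: positive prediction error
    G, N = update_pathways(G, N, -cost)     # later cost event: negative prediction error
```

At steady state `G / N` approaches `payoff / cost`, so the two pathways carry separate estimates of the positive and negative consequences of the action.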
Affiliation(s)
- Moritz Möller
- MRC Brain Network Dynamics Unit, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Rafal Bogacz
- MRC Brain Network Dynamics Unit, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
43
Alabi OO, Fortunato MP, Fuccillo MV. Behavioral Paradigms to Probe Individual Mouse Differences in Value-Based Decision Making. Front Neurosci 2019; 13:50. [PMID: 30792620 PMCID: PMC6374631 DOI: 10.3389/fnins.2019.00050] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2018] [Accepted: 01/18/2019] [Indexed: 01/08/2023] Open
Abstract
Value-based decision making relies on distributed neural systems that weigh the benefits of actions against the cost required to obtain a given outcome. Perturbations of these systems are thought to underlie abnormalities in action selection seen across many neuropsychiatric disorders. Genetic tools in mice provide a promising opportunity to explore the cellular components of these systems and their molecular foundations. However, few tasks have been designed that robustly characterize how individual mice integrate differential reward benefits and cost in their selection of actions. Here we present a forced-choice, two-alternative task in which each option is associated with a specific reward outcome, and unique operant contingency. We employed global and individual trial measures to assess the choice patterns and behavioral flexibility of mice in response to differing "choice benefits" (modeled as varying reward magnitude ratios) and different modalities of "choice cost" (modeled as either increasing repetitive motor output to obtain reward or increased delay to reward delivery). We demonstrate that (1) mouse choice is highly sensitive to the relative benefit of outcomes; (2) choice costs are heavily discounted in environments with large discrepancies in relative reward; (3) divergent cost modalities are differentially integrated into action selection; (4) individual mouse sensitivity to reward benefit is correlated with sensitivity to reward costs. These paradigms reveal stable individual animal differences in value-based action selection, thereby providing a foundation for interrogating the neural circuit and molecular pathophysiology of goal-directed dysfunction.
Affiliation(s)
- Opeyemi O Alabi
- Department of Neuroscience, University of Pennsylvania, Philadelphia, PA, United States
- Neuroscience Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Michael P Fortunato
- Department of Neuroscience, University of Pennsylvania, Philadelphia, PA, United States
- Marc V Fuccillo
- Department of Neuroscience, University of Pennsylvania, Philadelphia, PA, United States
44
Cabessa J, Villa AEP. Attractor dynamics of a Boolean model of a brain circuit controlled by multiple parameters. CHAOS (WOODBURY, N.Y.) 2018; 28:106318. [PMID: 30384642 DOI: 10.1063/1.5042312] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/31/2018] [Accepted: 08/29/2018] [Indexed: 06/08/2023]
Abstract
Studies of Boolean recurrent neural networks are briefly introduced with an emphasis on the attractor dynamics determined by the sequence of distinct attractors observed in the limit cycles. We apply this framework to a simplified model of the basal ganglia-thalamocortical circuit where each brain area is represented by a "neuronal" node in a directed graph. Control parameters ranging from neuronal excitability that affects all cells to targeted local connections modified by a new adaptive plasticity rule, and the regulation of the interactive feedback affecting the external input stream of information, allow the network dynamics to switch between stable domains delimited by highly discontinuous boundaries and reach very high levels of complexity with specific configurations. The significance of this approach with regard to brain circuit studies is briefly discussed.
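The attractor analysis can be reproduced in miniature: for a small deterministic Boolean network, iterate every initial state until the trajectory revisits a state, then collect the resulting fixed points and limit cycles. The three-node rotation network below is an arbitrary example for illustration, not the paper's basal ganglia-thalamocortical circuit.

```python
from itertools import product

def find_attractors(update, n):
    """Enumerate attractors (fixed points and limit cycles) of a
    deterministic Boolean network on n nodes by exhaustive iteration."""
    attractors = set()
    for state in product((0, 1), repeat=n):
        seen = set()
        s = state
        while s not in seen:      # iterate until a state recurs; that state is on the cycle
            seen.add(s)
            s = update(s)
        cycle = {s}               # walk once around the limit cycle from s
        t = update(s)
        while t != s:
            cycle.add(t)
            t = update(t)
        attractors.add(frozenset(cycle))
    return attractors

rotate = lambda s: (s[1], s[2], s[0])   # toy rule: each node copies its neighbour
```

For `rotate` on three nodes this yields four attractors: the two uniform fixed points and two three-state limit cycles, illustrating how the sequence of distinct attractors characterizes the dynamics.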
Affiliation(s)
- Jérémie Cabessa
- Laboratory of Mathematical Economics (LEMMA), Université Paris 2-Panthéon-Assas, 75005 Paris, France
- Alessandro E P Villa
- Neuroheuristic Research Group, University of Lausanne, CH-1015 Lausanne, Switzerland
45
Burke CJ, Soutschek A, Weber S, Raja Beharelle A, Fehr E, Haker H, Tobler PN. Dopamine Receptor-Specific Contributions to the Computation of Value. Neuropsychopharmacology 2018; 43:1415-1424. [PMID: 29251282 PMCID: PMC5916370 DOI: 10.1038/npp.2017.302] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/21/2017] [Revised: 11/07/2017] [Accepted: 12/08/2017] [Indexed: 11/09/2022]
Abstract
Dopamine is thought to play a crucial role in value-based decision making. However, the specific contributions of different dopamine receptor subtypes to the computation of subjective value remain unknown. Here we demonstrate how the balance between D1 and D2 dopamine receptor subtypes shapes subjective value computation during risky decision making. We administered the D2 receptor antagonist amisulpride or placebo before participants made choices between risky options. Compared with placebo, D2 receptor blockade resulted in more frequent choice of higher risk and higher expected value options. Using a novel model fitting procedure, we concurrently estimated the three parameters that define individual risk attitude according to an influential theoretical account of risky decision making (prospect theory). This analysis revealed that the observed reduction in risk aversion under amisulpride was driven by increased sensitivity to reward magnitude and decreased distortion of outcome probability, resulting in more linear value coding. Our data suggest that different components that govern individual risk attitude are under dopaminergic control, such that D2 receptor blockade facilitates risk taking and expected value processing.
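The three-parameter account can be sketched with standard prospect-theory forms for gains: a power utility capturing magnitude sensitivity and a one-parameter probability-weighting function capturing distortion. These specific functional forms are assumed for illustration; the paper's concurrent model-fitting procedure is more involved.

```python
def weight(p, gamma):
    """Tversky-Kahneman probability weighting: gamma < 1 overweights rare outcomes."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def subjective_value(magnitude, p, rho, gamma):
    """Value of a risky gain: distorted probability times curved utility."""
    return weight(p, gamma) * magnitude ** rho
```

With `rho = gamma = 1` value coding is linear, the direction in which amisulpride pushed behaviour here; `rho < 1` produces risk aversion for gains, and `gamma < 1` distorts outcome probabilities.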
Affiliation(s)
- Christopher J Burke
- Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland
- Alexander Soutschek
- Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland
- Susanna Weber
- Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland
- Anjali Raja Beharelle
- Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland
- Ernst Fehr
- Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland
- Helene Haker
- Translational Neuromodeling Unit, Institute for Biomedical Engineering, ETH Zurich, Zurich, Switzerland
- Philippe N Tobler
- Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Zurich, Switzerland
46
Chronic nicotine exposure impairs uncertainty modulation on reinforcement learning in anterior cingulate cortex and serotonin system. Neuroimage 2018; 169:323-333. [DOI: 10.1016/j.neuroimage.2017.11.048] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2017] [Revised: 11/04/2017] [Accepted: 11/21/2017] [Indexed: 11/18/2022] Open
47
Abstract
The hypothesis that the phasic dopamine response reports a reward prediction error has become deeply entrenched. However, dopamine neurons exhibit several notable deviations from this hypothesis. A coherent explanation for these deviations can be obtained by analyzing the dopamine response in terms of Bayesian reinforcement learning. The key idea is that prediction errors are modulated by probabilistic beliefs about the relationship between cues and outcomes, updated through Bayesian inference. This account can explain dopamine responses to inferred value in sensory preconditioning, the effects of cue preexposure (latent inhibition), and adaptive coding of prediction errors when rewards vary across orders of magnitude. We further postulate that orbitofrontal cortex transforms the stimulus representation through recurrent dynamics, such that a simple error-driven learning rule operating on the transformed representation can implement the Bayesian reinforcement learning update.
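The belief-modulated prediction error can be illustrated with a single-cue Kalman filter, in which the learning rate is a Kalman gain set by posterior uncertainty rather than a fixed constant. Parameter values below are arbitrary, and this scalar filter is only a caricature of the Bayesian reinforcement learning account.

```python
def kalman_delta_rule(rewards, w0=0.0, var0=1.0, q=0.01, r_var=1.0):
    """Kalman-filter delta rule: the weight w and its posterior variance
    evolve together; uncertain beliefs give large learning rates,
    confident beliefs small ones."""
    w, var = w0, var0
    gains = []
    for r in rewards:
        var += q                      # latent weight may drift between trials
        k = var / (var + r_var)       # Kalman gain = belief-dependent learning rate
        w += k * (r - w)              # prediction error scaled by current beliefs
        var *= (1.0 - k)
        gains.append(k)
    return w, gains
```

Repeated identical outcomes shrink the gain over trials, giving a simple flavour of why an already-predictable (preexposed) cue supports only slow further learning, as in latent inhibition.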
Affiliation(s)
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA 02138, U.S.A
48
Grogan JP, Tsivos D, Smith L, Knight BE, Bogacz R, Whone A, Coulthard EJ. Effects of dopamine on reinforcement learning and consolidation in Parkinson's disease. eLife 2017; 6. [PMID: 28691905 PMCID: PMC5531832 DOI: 10.7554/elife.26801] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2017] [Accepted: 07/07/2017] [Indexed: 01/24/2023] Open
Abstract
Emerging evidence suggests that dopamine may modulate learning and memory with important implications for understanding the neurobiology of memory and future therapeutic targeting. An influential hypothesis posits that dopamine biases reinforcement learning. More recent data also suggest an influence during both consolidation and retrieval. Eighteen Parkinson's disease patients learned through feedback ON or OFF medication, with memory tested 24 hr later ON or OFF medication (4 conditions, within-subjects design with matched healthy control group). Patients OFF medication during learning decreased in memory accuracy over the following 24 hr. In contrast to previous studies, however, dopaminergic medication during learning and testing did not affect expression of positive or negative reinforcement. Two further experiments were run without the 24 hr delay, but they too failed to reproduce effects of dopaminergic medication on reinforcement learning. While supportive of a dopaminergic role in consolidation, this study failed to replicate previous findings on reinforcement learning.
Affiliation(s)
- John P Grogan
- Institute of Clinical Neurosciences, School of Clinical Sciences, University of Bristol, Bristol, United Kingdom
- Demitra Tsivos
- Clinical Neurosciences, North Bristol NHS Trust, Bristol, United Kingdom
- Laura Smith
- Institute of Clinical Neurosciences, School of Clinical Sciences, University of Bristol, Bristol, United Kingdom
- Brogan E Knight
- Clinical Neurosciences, North Bristol NHS Trust, Bristol, United Kingdom
- Rafal Bogacz
- MRC Brain Network Dynamics Unit, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Alan Whone
- Institute of Clinical Neurosciences, School of Clinical Sciences, University of Bristol, Bristol, United Kingdom
- Elizabeth J Coulthard
- Institute of Clinical Neurosciences, School of Clinical Sciences, University of Bristol, Bristol, United Kingdom
- Clinical Neurosciences, North Bristol NHS Trust, Bristol, United Kingdom
49
Gershman SJ, Monfils MH, Norman KA, Niv Y. The computational nature of memory modification. eLife 2017; 6:e23763. [PMID: 28294944 PMCID: PMC5391211 DOI: 10.7554/elife.23763] [Citation(s) in RCA: 61] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2016] [Accepted: 03/13/2017] [Indexed: 11/25/2022] Open
Abstract
Retrieving a memory can modify its influence on subsequent behavior. We develop a computational theory of memory modification, according to which modification of a memory trace occurs through classical associative learning, but which memory trace is eligible for modification depends on a structure learning mechanism that discovers the units of association by segmenting the stream of experience into statistically distinct clusters (latent causes). New memories are formed when the structure learning mechanism infers that a new latent cause underlies current sensory observations. By the same token, old memories are modified when old and new sensory observations are inferred to have been generated by the same latent cause. We derive this framework from probabilistic principles, and present a computational implementation. Simulations demonstrate that our model can reproduce the major experimental findings from studies of memory modification in the Pavlovian conditioning literature.
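The structure-learning step can be caricatured with a local MAP version of a Chinese-restaurant-process clustering of one-dimensional observations: reuse (and thereby modify) an old latent cause when it explains the new observation, otherwise form a new one. All distributions and parameters here are assumed for illustration; the paper derives the full probabilistic treatment.

```python
import math

def gauss(x, mean, sd):
    """Unnormalized-apart-from-width Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / sd

def assign_latent_cause(x, causes, alpha=1.0, sd=1.0, sd_new=3.0):
    """Score each old cause by popularity x fit (CRP prior x likelihood) and a
    potential new cause by alpha x a broad likelihood; update the winner."""
    scores = [n * gauss(x, mu, sd) for mu, n in causes]
    scores.append(alpha * gauss(x, 0.0, sd_new))        # brand-new latent cause
    k = max(range(len(scores)), key=scores.__getitem__)
    if k == len(causes):
        causes.append([x, 1])                           # new memory trace formed
    else:
        mu, n = causes[k]
        causes[k] = [(mu * n + x) / (n + 1), n + 1]     # old trace modified
    return k
```

Observations near zero keep updating a single memory trace, while an outlying observation spawns a second latent cause instead of overwriting the first, which is the qualitative behaviour the theory uses to explain when memories are modified versus newly formed.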
Affiliation(s)
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, United States
- Marie-H Monfils
- Department of Psychology, University of Texas, Austin, United States
- Kenneth A Norman
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, United States
- Yael Niv
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, United States
50
Kurzawa N, Summerfield C, Bogacz R. Neural Circuits Trained with Standard Reinforcement Learning Can Accumulate Probabilistic Information during Decision Making. Neural Comput 2016; 29:368-393. [PMID: 27870610 DOI: 10.1162/neco_a_00917] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
Much experimental evidence suggests that during decision making, neural circuits accumulate evidence supporting alternative options. A computational model well describing this accumulation for choices between two options assumes that the brain integrates the log ratios of the likelihoods of the sensory inputs given the two options. Several models have been proposed for how neural circuits can learn these log-likelihood ratios from experience, but all of these models introduced novel and specially dedicated synaptic plasticity rules. Here we show that for a certain wide class of tasks, the log-likelihood ratios are approximately linearly proportional to the expected rewards for selecting actions. Therefore, a simple model based on standard reinforcement learning rules is able to estimate the log-likelihood ratios from experience and on each trial accumulate the log-likelihood ratios associated with presented stimuli while selecting an action. The simulations of the model replicate experimental data on both behavior and neural activity in tasks requiring accumulation of probabilistic cues. Our results suggest that there is no need for the brain to support dedicated plasticity rules, as the standard mechanisms proposed to describe reinforcement learning can enable the neural circuits to perform efficient probabilistic inference.
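A minimal simulation of the idea, with assumed task statistics: learn cue values with a standard delta rule (using full feedback on both options, a simplification), then accumulate the learned value differences over presented cues as decision evidence. Cue reliabilities and parameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p_a = np.array([0.9, 0.7, 0.6])   # hypothetical P(outcome A | cue i)
Q = np.zeros((3, 2))              # Q[cue, action]; action 0 = "choose A", 1 = "choose B"
alpha = 0.02
for _ in range(4000):             # standard reinforcement learning, one cue per trial
    cue = rng.integers(3)
    outcome = 0 if rng.random() < p_a[cue] else 1
    for action in (0, 1):
        reward = 1.0 if action == outcome else 0.0
        Q[cue, action] += alpha * (reward - Q[cue, action])

# at decision time, learned value differences act like log-likelihood ratios:
evidence = float(np.sum(Q[:, 0] - Q[:, 1]))   # all three cues presented together
choice = 'A' if evidence > 0 else 'B'
```

Each `Q[cue, action]` settles near the probability that the action is correct given the cue, so the summed differences grow with the combined probabilistic support for A; no dedicated plasticity rule beyond the ordinary delta rule is needed.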
Affiliation(s)
- Nils Kurzawa
- Medical Research Council Brain Network Dynamics Unit, University of Oxford, Oxford, OX1 3QT, U.K., and Institute of Pharmacy and Molecular Biotechnology, University of Heidelberg, D-69120 Heidelberg, Germany
- Rafal Bogacz
- Medical Research Council Brain Network Dynamics Unit, University of Oxford, Oxford OX1 3UD, U.K., and Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX1 3UD, U.K.