1. Seo I, Lee H. Investigating Transfer Learning in Noisy Environments: A Study of Predecessor and Successor Features in Spatial Learning Using a T-Maze. Sensors (Basel) 2024; 24:6419. [PMID: 39409459; PMCID: PMC11479366; DOI: 10.3390/s24196419]
Abstract
In this study, we investigate the adaptability of artificial agents that use Markov decision processes (MDPs) with successor feature (SF) and predecessor feature (PF) learning algorithms in a noisy T-maze. Our focus is on quantifying how varying the hyperparameters, specifically the reward learning rate (αr) and the eligibility trace decay rate (λ), can enhance their adaptability. Adaptation is evaluated by analyzing cumulative reward, step length, adaptation rate, and adaptation step length, and the relationships between them, using Spearman's correlation tests and linear regression. Our findings reveal that an αr of 0.9 consistently yields superior adaptation across all metrics at a noise level of 0.05. However, the optimal setting for λ varies by metric and context. In discussing these results, we emphasize the critical role of hyperparameter optimization in refining the performance and transfer learning efficacy of learning algorithms. This research advances our understanding of the functionality of PF and SF algorithms, particularly in navigating the inherent uncertainty of transfer learning tasks. By offering insights into the optimal hyperparameter configurations, this study contributes to the development of more adaptive and robust learning algorithms, paving the way for future explorations in artificial intelligence and neuroscience.
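For readers who want a concrete handle on the setup described above, the following is a minimal sketch of tabular successor-feature TD learning with an eligibility trace decay λ and a separate reward learning rate αr. The state space, one-hot features, noise level, and parameter values are illustrative assumptions, not the authors' implementation (the predecessor-feature variant is omitted).

```python
import numpy as np

# Illustrative sketch (not the authors' code): tabular successor-feature (SF)
# learning with an eligibility trace (decay rate lam) and a separate reward
# learning rate (alpha_r). States, noise level, and parameters are placeholders.

n_states = 8                                  # e.g., discretized maze positions
gamma, alpha_sf, alpha_r, lam = 0.95, 0.1, 0.9, 0.5

psi = np.zeros((n_states, n_states))          # SF matrix: discounted future feature occupancy
w = np.zeros(n_states)                        # reward weights, so V(s) = psi[s] @ w
trace = np.zeros(n_states)                    # eligibility trace over state features

def phi(s):
    """One-hot feature vector for a tabular state."""
    f = np.zeros(n_states)
    f[s] = 1.0
    return f

def sf_td_step(s, r, s_next):
    """One transition of SF(lambda) learning plus a reward-weight update."""
    global trace
    trace = gamma * lam * trace + phi(s)
    sf_error = phi(s) + gamma * psi[s_next] - psi[s]   # vector-valued SF prediction error
    psi[:] += alpha_sf * np.outer(trace, sf_error)
    w[:] += alpha_r * (r - phi(s_next) @ w) * phi(s_next)

rng = np.random.default_rng(0)
s = 0
for _ in range(300):                          # noisy walk toward a rewarded end state
    s_next = min(s + 1, n_states - 1) if rng.random() > 0.05 else s
    r = 1.0 if s_next == n_states - 1 else 0.0
    sf_td_step(s, r, s_next)
    if s_next == n_states - 1:                # episode ends: reset position and trace
        s, trace = 0, np.zeros(n_states)
    else:
        s = s_next

print("Value estimates V(s) = psi @ w:", np.round(psi @ w, 2))
```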
Affiliation(s)
- Incheol Seo
- Department of Immunology, Kyungpook National University School of Medicine, Daegu 41944, Republic of Korea
- Hyunsu Lee
- Department of Physiology, Pusan National University School of Medicine, Yangsan 50612, Republic of Korea
- Research Institute for Convergence of Biomedical Science and Technology, Pusan National University Yangsan Hospital, Yangsan 50612, Republic of Korea
2. Furutachi S, Franklin AD, Aldea AM, Mrsic-Flogel TD, Hofer SB. Cooperative thalamocortical circuit mechanism for sensory prediction errors. Nature 2024; 633:398-406. [PMID: 39198646; PMCID: PMC11390482; DOI: 10.1038/s41586-024-07851-w]
Abstract
The brain functions as a prediction machine, utilizing an internal model of the world to anticipate sensations and the outcomes of our actions. Discrepancies between expected and actual events, referred to as prediction errors, are leveraged to update the internal model and guide our attention towards unexpected events1-10. Despite the importance of prediction-error signals for various neural computations across the brain, surprisingly little is known about the neural circuit mechanisms responsible for their implementation. Here we describe a thalamocortical disinhibitory circuit that is required for generating sensory prediction-error signals in mouse primary visual cortex (V1). We show that violating animals' predictions by an unexpected visual stimulus preferentially boosts responses of the layer 2/3 V1 neurons that are most selective for that stimulus. Prediction errors specifically amplify the unexpected visual input, rather than representing non-specific surprise or difference signals about how the visual input deviates from the animal's predictions. This selective amplification is implemented by a cooperative mechanism requiring thalamic input from the pulvinar and cortical vasoactive-intestinal-peptide-expressing (VIP) inhibitory interneurons. In response to prediction errors, VIP neurons inhibit a specific subpopulation of somatostatin-expressing inhibitory interneurons that gate excitatory pulvinar input to V1, resulting in specific pulvinar-driven response amplification of the most stimulus-selective neurons in V1. Therefore, the brain prioritizes unpredicted sensory information by selectively increasing the salience of unpredicted sensory features through the synergistic interaction of thalamic input and neocortical disinhibitory circuits.
Affiliation(s)
- Shohei Furutachi
- Sainsbury Wellcome Centre, University College London, London, UK.
- Andreea M Aldea
- Sainsbury Wellcome Centre, University College London, London, UK
- Sonja B Hofer
- Sainsbury Wellcome Centre, University College London, London, UK.
3. Lin W, Dolan RJ. Decision-Making, Pro-variance Biases and Mood-Related Traits. Comput Psychiatry 2024; 8:142-158. [PMID: 39184228; PMCID: PMC11342847; DOI: 10.5334/cpsy.114]
Abstract
In value-based decision-making there is wide behavioural variability in how individuals respond to uncertainty. Maladaptive responses to uncertainty have been linked to vulnerability to mental illness, for example, the association between risk aversion and affective disorders. Here, we examine individual differences in risk sensitivity when subjects confront options drawn from different value distributions, where these embody the same or different means and variances. In simulations, we show that a model that learns a distribution using Bayes' rule and reads out different parts of the distribution under the influence of a risk-sensitive parameter (Conditional Value at Risk, CVaR) predicts how likely an agent is to prefer a broad over a narrow distribution (a pro-variance bias, or risk-seeking) when the overall means are the same. Using empirical data, we show that CVaR estimates correlate with participants' pro-variance biases better than a range of alternative parameters derived from other models. Importantly, across two independent samples, CVaR estimates and participants' pro-variance bias negatively correlated with trait rumination, a trait common in depression and anxiety. We conclude that a Bayesian-CVaR model captures individual differences in sensitivity to variance in value distributions and task-independent trait dispositions linked to affective disorders.
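As a concrete illustration of the risk-sensitive readout described above, here is a small sketch of computing the Conditional Value at Risk (CVaR) of a discrete outcome distribution; the example distributions and risk level are made up for illustration and are not the study's fitted model.

```python
import numpy as np

# Minimal sketch (not the authors' code): read out the Conditional Value at
# Risk (CVaR) of a discrete belief over outcomes. For risk level q in (0, 1],
# CVaR_q is the mean of the worst q-fraction of the distribution; q < 1 is
# risk-averse and q = 1 recovers the ordinary mean. An analogous readout of
# the upper tail would give risk-seeking (pro-variance) preferences.

def cvar(values, probs, q):
    """Expected value of the lower q-tail of a discrete distribution."""
    order = np.argsort(values)                          # worst outcomes first
    v, p = np.asarray(values)[order], np.asarray(probs)[order]
    cum = np.cumsum(p)
    mass = np.clip(np.minimum(cum, q) - np.concatenate(([0.0], cum[:-1])), 0.0, None)
    return float((v * mass).sum() / q)

# Two options with the same mean (5) but different variance
narrow = ([4.9, 5.1], [0.5, 0.5])
broad = ([1.0, 9.0], [0.5, 0.5])
for q in (0.3, 1.0):
    print(f"q={q}: narrow={cvar(*narrow, q):.2f}, broad={cvar(*broad, q):.2f}")
# With q = 0.3 the broad option looks worse (risk aversion); with q = 1.0 both equal 5.
```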
Affiliation(s)
- Wanjun Lin
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London WC1B 5EH, UK
- Raymond J. Dolan
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London WC1B 5EH, UK
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK
4. Negm A, Ma X, Aggidis G. Deep reinforcement learning challenges and opportunities for urban water systems. Water Res 2024; 253:121145. [PMID: 38330870; DOI: 10.1016/j.watres.2024.121145]
Abstract
The efficient and sustainable supply and transport of water is a key component of any functioning civilisation, making the role of urban water systems (UWS) crucial to the wellbeing of their customers. However, managing water is not a simple task. Whether through ageing infrastructure, transient flows, air cavities or low pressures, water can be lost as a result of many issues facing UWSs. The complexity of these networks grows with rapid urbanisation and climate change, leaving water companies and regulatory bodies in need of new solutions. It therefore comes as no surprise that many researchers are working to innovate within the water industry to ensure that the future of our water is safe. Deep reinforcement learning (DRL) has the potential to tackle complexities that were previously very challenging, as it relies on deep neural networks for function approximation and representation. This technology has conquered many fields due to its impressive results and can effectively revolutionise UWS. In this article, we explain the background of DRL and the milestones of the field using a novel taxonomy of DRL algorithms. This is followed by a novel review of DRL applications in UWS, focusing on water distribution networks and stormwater systems. The review concludes with critical insights on how DRL can benefit different aspects of urban water systems.
Affiliation(s)
- Ahmed Negm
- Lancaster University Energy Group, School of Engineering, Lancaster LA1 4YW, UK
- Xiandong Ma
- Lancaster University Energy Group, School of Engineering, Lancaster LA1 4YW, UK
- George Aggidis
- Lancaster University Energy Group, School of Engineering, Lancaster LA1 4YW, UK.
5. Amo R, Uchida N, Watabe-Uchida M. Glutamate inputs send prediction error of reward, but not negative value of aversive stimuli, to dopamine neurons. Neuron 2024; 112:1001-1019.e6. [PMID: 38278147; PMCID: PMC10957320; DOI: 10.1016/j.neuron.2023.12.019]
Abstract
Midbrain dopamine neurons are thought to signal reward prediction errors (RPEs), but the mechanisms underlying RPE computation, particularly the contributions of different neurotransmitters, remain poorly understood. Here, we used a genetically encoded glutamate sensor to examine the pattern of glutamate inputs to dopamine neurons in mice. We found that glutamate inputs exhibit virtually all of the characteristics of RPE rather than conveying a specific component of RPE computation, such as reward or expectation. Notably, whereas glutamate inputs were transiently inhibited by reward omission, they were excited by aversive stimuli. Opioid analgesics altered dopamine negative responses to aversive stimuli into more positive responses, whereas excitatory responses of glutamate inputs remained unchanged. Our findings uncover previously unknown synaptic mechanisms underlying RPE computations; dopamine responses are shaped by both synergistic and competitive interactions between glutamatergic and GABAergic inputs to dopamine neurons depending on valences, with competitive interactions playing a role in responses to aversive stimuli.
Affiliation(s)
- Ryunosuke Amo
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Mitsuko Watabe-Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA.
6. Li J, Liu Q, Chi G. Distributed deep reinforcement learning based on bi-objective framework for multi-robot formation. Neural Netw 2024; 171:61-72. [PMID: 38091765; DOI: 10.1016/j.neunet.2023.11.063]
Abstract
Improving generalization ability in multi-robot formation can reduce repetitive training and calculation. In this paper, we study the multi-robot formation problem with the ability to generalize the target position. Since the generalization ability of a neural network is directly proportional to its spatial dimension, we adopt the strategy of using different networks for different objectives, so that each network can focus on learning one objective and obtain better performance. In addition, this paper presents a distributed deep reinforcement learning method based on the soft actor-critic algorithm for solving the multi-robot formation problem. At the same time, a formation evaluation assignment function is designed to adapt to distributed training. Compared with the original algorithm, the improved algorithm achieves higher cumulative reward. The experimental results show that the proposed algorithm can better maintain the desired formation during movement, and the rotation design in the reward function gives the multi-robot system better flexibility in formation. The comparison of control signal curves shows that the proposed algorithm is more stable. Finally, the universality of the proposed algorithm in formation maintenance and formation variation is demonstrated.
Affiliation(s)
- Jinming Li
- School of Mathematics, Southeast University, Nanjing 210096, China.
- Qingshan Liu
- School of Mathematics, Southeast University, Nanjing 210096, China; Purple Mountain Laboratories, Nanjing 211111, China.
- Guoyi Chi
- Tencent Robotics X Lab, Tencent Technology (Shenzhen) Co., Ltd., Shenzhen 518057, China.
7. Amo R. Prediction error in dopamine neurons during associative learning. Neurosci Res 2024; 199:12-20. [PMID: 37451506; DOI: 10.1016/j.neures.2023.07.003]
Abstract
Dopamine neurons have long been thought to facilitate learning by broadcasting reward prediction error (RPE), a teaching signal used in machine learning, but more recent work has advanced alternative models of dopamine's computational role. Here, I revisit this critical issue and review new experimental evidence that tightens the link between dopamine activity and RPE. First, I introduce the recent observation of a gradual backward shift of dopamine activity that had eluded researchers for over a decade. I also discuss several other findings, such as dopamine ramping, that were initially interpreted as conflicting with RPE but were later found to be consistent with it. These findings improve our understanding of neural computation in dopamine neurons.
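The backward shift mentioned above is easy to reproduce in a toy simulation. The sketch below runs tabular TD(0) on a single Pavlovian trial structure and reports where the prediction error peaks early versus late in training; trial length, learning rate, and discount factor are arbitrary assumptions, not values from the review.

```python
import numpy as np

# Toy sketch (assumptions: 10 time steps per trial, reward at the last step,
# learning rate 0.1, gamma = 1) of tabular TD(0) in a Pavlovian trial. Early
# in training the prediction error (delta) peaks at reward time; with
# learning it shifts backward toward the cue, the signature discussed for
# dopamine activity.

T, alpha, gamma = 10, 0.1, 1.0
V = np.zeros(T + 1)                      # value of each within-trial time step

def run_trial():
    deltas = np.zeros(T)
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0
        delta = r + gamma * V[t + 1] - V[t]
        V[t] += alpha * delta
        deltas[t] = delta
    return deltas

early = run_trial()
for _ in range(500):
    late = run_trial()

print("peak delta early in training at t =", int(np.argmax(early)))
print("peak delta late in training at t =", int(np.argmax(late)))
```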
Affiliation(s)
- Ryunosuke Amo
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA.
8. Lowet AS, Zheng Q, Meng M, Matias S, Drugowitsch J, Uchida N. An opponent striatal circuit for distributional reinforcement learning. bioRxiv 2024:2024.01.02.573966. [PMID: 38260354; PMCID: PMC10802299; DOI: 10.1101/2024.01.02.573966]
Abstract
Machine learning research has achieved large performance gains on a wide range of tasks by expanding the learning target from mean rewards to entire probability distributions of rewards - an approach known as distributional reinforcement learning (RL)1. The mesolimbic dopamine system is thought to underlie RL in the mammalian brain by updating a representation of mean value in the striatum2,3, but little is known about whether, where, and how neurons in this circuit encode information about higher-order moments of reward distributions4. To fill this gap, we used high-density probes (Neuropixels) to acutely record striatal activity from well-trained, water-restricted mice performing a classical conditioning task in which reward mean, reward variance, and stimulus identity were independently manipulated. In contrast to traditional RL accounts, we found robust evidence for abstract encoding of variance in the striatum. Remarkably, chronic ablation of dopamine inputs disorganized these distributional representations in the striatum without interfering with mean value coding. Two-photon calcium imaging and optogenetics revealed that the two major classes of striatal medium spiny neurons - D1 and D2 MSNs - contributed to this code by preferentially encoding the right and left tails of the reward distribution, respectively. We synthesize these findings into a new model of the striatum and mesolimbic dopamine that harnesses the opponency between D1 and D2 MSNs5-15 to reap the computational benefits of distributional RL.
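A minimal sketch of the asymmetric-update idea underlying distributional RL accounts like this one is shown below: two value estimators update with different learning rates for positive versus negative prediction errors, so they settle on the upper and lower parts of the reward distribution. The labels, learning rates, and reward distribution are illustrative assumptions, not the paper's model of D1/D2 MSNs.

```python
import numpy as np

# Generic sketch of asymmetric ("expectile-like") value learning: one
# estimator weights positive prediction errors more (optimistic) and another
# weights negative errors more (pessimistic), so together they bracket the
# reward distribution. All parameter values are illustrative assumptions.

rng = np.random.default_rng(1)
alpha_pos_opt, alpha_neg_opt = 0.08, 0.02     # optimistic learner
alpha_pos_pes, alpha_neg_pes = 0.02, 0.08     # pessimistic learner
v_opt = v_pes = 0.0

def update(v, r, a_pos, a_neg):
    delta = r - v
    return v + (a_pos if delta > 0 else a_neg) * delta

for _ in range(20000):
    r = rng.choice([0.0, 10.0])               # bimodal reward, mean 5
    v_opt = update(v_opt, r, alpha_pos_opt, alpha_neg_opt)
    v_pes = update(v_pes, r, alpha_pos_pes, alpha_neg_pes)

print(f"optimistic estimate ~{v_opt:.1f}, pessimistic estimate ~{v_pes:.1f}")
# Expect roughly 8 and 2: the pair jointly encodes the spread of rewards,
# not just the mean.
```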
Affiliation(s)
- Adam S. Lowet
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Program in Neuroscience, Harvard University, Boston, MA, USA
- Qiao Zheng
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
- Melissa Meng
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Sara Matias
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Jan Drugowitsch
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
- Naoshige Uchida
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
9. Hoy CW, Quiroga-Martinez DR, Sandoval E, King-Stephens D, Laxer KD, Weber P, Lin JJ, Knight RT. Asymmetric coding of reward prediction errors in human insula and dorsomedial prefrontal cortex. Nat Commun 2023; 14:8520. [PMID: 38129440; PMCID: PMC10739882; DOI: 10.1038/s41467-023-44248-1]
Abstract
The signed value and unsigned salience of reward prediction errors (RPEs) are critical to understanding reinforcement learning (RL) and cognitive control. Dorsomedial prefrontal cortex (dMPFC) and insula (INS) are key regions for integrating reward and surprise information, but conflicting evidence for both signed and unsigned activity has led to multiple proposals for the nature of RPE representations in these brain areas. Recently developed RL models allow neurons to respond differently to positive and negative RPEs. Here, we use intracranially recorded high frequency activity (HFA) to test whether this flexible asymmetric coding strategy captures RPE coding diversity in human INS and dMPFC. At the region level, we found a bias towards positive RPEs in both areas which paralleled behavioral adaptation. At the local level, we found spatially interleaved neural populations responding to unsigned RPE salience and valence-specific positive and negative RPEs. Furthermore, directional connectivity estimates revealed a leading role of INS in communicating positive and unsigned RPEs to dMPFC. These findings support asymmetric coding across distinct but intermingled neural populations as a core principle of RPE processing and inform theories of the role of dMPFC and INS in RL and cognitive control.
Affiliation(s)
- Colin W Hoy
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA.
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA.
- David R Quiroga-Martinez
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA
- Center for Music in the Brain, Aarhus University & The Royal Academy of Music, Aarhus, Denmark
- Eduardo Sandoval
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA
- David King-Stephens
- Department of Neurology and Neurosurgery, California Pacific Medical Center, San Francisco, CA, USA
- Department of Neurology, Yale School of Medicine, New Haven, CT, USA
- Kenneth D Laxer
- Department of Neurology and Neurosurgery, California Pacific Medical Center, San Francisco, CA, USA
- Peter Weber
- Department of Neurology and Neurosurgery, California Pacific Medical Center, San Francisco, CA, USA
- Jack J Lin
- Department of Neurology, University of California, Davis, Davis, CA, USA
- Center for Mind and Brain, University of California, Davis, Davis, CA, USA
- Robert T Knight
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA
- Department of Psychology, University of California, Berkeley, Berkeley, CA, USA
10. Danskin BP, Hattori R, Zhang YE, Babic Z, Aoi M, Komiyama T. Exponential history integration with diverse temporal scales in retrosplenial cortex supports hyperbolic behavior. Sci Adv 2023; 9:eadj4897. [PMID: 38019904; PMCID: PMC10686558; DOI: 10.1126/sciadv.adj4897]
Abstract
Animals use past experience to guide future choices. The integration of experiences typically follows a hyperbolic, rather than exponential, decay pattern with a heavy tail for distant history. Hyperbolic integration affords sensitivity to both recent environmental dynamics and long-term trends. However, it is unknown how the brain implements hyperbolic integration. We found that mouse behavior in a foraging task showed hyperbolic decay of past experience, but the activity of cortical neurons showed exponential decay. We resolved this apparent mismatch by observing that cortical neurons encode history information with heterogeneous exponential time constants that vary across neurons. A model combining these diverse timescales recreated the heavy-tailed, hyperbolic history integration observed in behavior. In particular, the time constants of retrosplenial cortex (RSC) neurons best matched the behavior, and optogenetic inactivation of RSC uniquely reduced behavioral history dependence. These results indicate that behavior-relevant history information is maintained across multiple timescales in parallel and that RSC is a critical reservoir of information guiding decision-making.
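The core idea that a mixture of exponentials with diverse time constants yields an approximately hyperbolic, heavy-tailed decay can be checked numerically; the sketch below uses arbitrary placeholder time constants and is not the paper's analysis.

```python
import numpy as np

# Simple numerical illustration: averaging many exponential decays with
# diverse time constants produces a heavy-tailed curve close to a hyperbola,
# whereas a single exponential falls off much faster at long lags.
# Time constants and the reference hyperbola are arbitrary placeholders.

lags = np.arange(0, 50)
taus = np.array([1, 2, 4, 8, 16, 32], dtype=float)     # heterogeneous timescales
mixture = np.mean(np.exp(-lags[:, None] / taus[None, :]), axis=1)
single = np.exp(-lags / taus.mean())
hyperbolic = 1.0 / (1.0 + lags / 3.0)                  # reference heavy-tailed decay

for lag in (1, 5, 20, 40):
    print(lag, round(mixture[lag], 3), round(single[lag], 3), round(hyperbolic[lag], 3))
# At long lags the mixture stays well above the single exponential,
# mirroring the heavy-tailed history dependence seen in behavior.
```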
Affiliation(s)
- Bethanny P. Danskin
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Ryoma Hattori
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Yu E. Zhang
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Zeljana Babic
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Mikio Aoi
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Takaki Komiyama
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
11. Blaess S, Krabbe S. Cell type specificity for circuit output in the midbrain dopaminergic system. Curr Opin Neurobiol 2023; 83:102811. [PMID: 37972537; DOI: 10.1016/j.conb.2023.102811]
Abstract
Midbrain dopaminergic neurons are a relatively small group of neurons in the mammalian brain controlling a wide range of behaviors. In recent years, increasingly sophisticated tracing, imaging, transcriptomic, and machine learning approaches have provided substantial insights into the anatomical, molecular, and functional heterogeneity of dopaminergic neurons. Despite this wealth of new knowledge, it remains unclear whether and how the diverse features defining dopaminergic subclasses converge to delineate functional ensembles within the dopaminergic system. Here, we review recent studies investigating various aspects of dopaminergic heterogeneity and discuss how development, behavior, and disease influence subtype characteristics. We then outline what further approaches could be pursued to gain a more inclusive picture of dopaminergic diversity, which could be crucial to understanding the functional architecture of this system.
Affiliation(s)
- Sandra Blaess
- Neurodevelopmental Genetics, Institute of Reconstructive Neurobiology, Medical Faculty, University of Bonn, 53127 Bonn, Germany.
- Sabine Krabbe
- German Center for Neurodegenerative Diseases (DZNE), 53127 Bonn, Germany.
12. Pinto SR, Uchida N. Tonic dopamine and biases in value learning linked through a biologically inspired reinforcement learning model. bioRxiv 2023:2023.11.10.566580. [PMID: 38014087; PMCID: PMC10680794; DOI: 10.1101/2023.11.10.566580]
Abstract
A hallmark of various psychiatric disorders is biased future predictions. Here we examined the mechanisms of biased value learning using reinforcement learning models incorporating recent findings on synaptic plasticity and opponent circuit mechanisms in the basal ganglia. We show that variations in tonic dopamine can alter the balance between learning from positive and negative reward prediction errors, leading to biased value predictions. This bias arises from the sigmoidal shapes of the dose-occupancy curves and the distinct affinities of D1- and D2-type dopamine receptors: changes in tonic dopamine differentially alter the slopes of the dose-occupancy curves of these receptors, and thus their sensitivities, at baseline dopamine concentrations. We show that this mechanism can explain biased value learning in both mice and humans and may also contribute to symptoms observed in psychiatric disorders. Our model provides a foundation for understanding the basal ganglia circuit and underscores the significance of tonic dopamine in modulating learning processes.
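The receptor-occupancy logic described above can be illustrated with a toy calculation: occupancy follows d / (d + Kd), so a shift in tonic dopamine changes the local slope (sensitivity) of a high-affinity and a low-affinity receptor by different amounts. The Kd values and baseline levels below are illustrative assumptions, not measured constants.

```python
# Toy sketch of the dose-occupancy logic (illustrative Kd values, arbitrary
# units; not measurements): occupancy = d / (d + Kd). A high-affinity
# receptor (small Kd, D2-like) sits near saturation at baseline dopamine,
# while a low-affinity receptor (large Kd, D1-like) sits on the rising part
# of its curve, so the same shift in tonic dopamine changes their local
# slopes, and hence their sensitivities to phasic signals, differently.

kd_high_affinity, kd_low_affinity = 0.01, 1.0

def occupancy(d, kd):
    return d / (d + kd)

def slope(d, kd):
    return kd / (d + kd) ** 2                 # derivative of occupancy w.r.t. d

for tonic in (0.05, 0.2):                      # two hypothetical baseline levels
    print(f"tonic dopamine {tonic}: "
          f"D2-like occ {occupancy(tonic, kd_high_affinity):.2f} "
          f"slope {slope(tonic, kd_high_affinity):.2f} | "
          f"D1-like occ {occupancy(tonic, kd_low_affinity):.2f} "
          f"slope {slope(tonic, kd_low_affinity):.2f}")
```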
Affiliation(s)
- Sandra Romero Pinto
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Program in Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard Medical School, Boston, MA 02115, USA
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
13. Masset P, Tano P, Kim HR, Malik AN, Pouget A, Uchida N. Multi-timescale reinforcement learning in the brain. bioRxiv 2023:2023.11.12.566754. [PMID: 38014166; PMCID: PMC10680596; DOI: 10.1101/2023.11.12.566754]
Abstract
To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behavior can be learned through reinforcement learning1, a class of algorithms that has been successful at training artificial agents2-6 and at characterizing the firing of dopamine neurons in the midbrain7-9. In classical reinforcement learning, agents discount future rewards exponentially according to a single time scale, controlled by the discount factor. Here, we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement agents learning at a multitude of timescales possess distinct computational benefits. Next, we report that dopamine neurons in mice performing two behavioral tasks encode reward prediction error with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks suggesting that it is a cell-specific property. Together, our results provide a new paradigm to understand functional heterogeneity in dopamine neurons, a mechanistic basis for the empirical observation that humans and animals use non-exponential discounts in many situations10-14, and open new avenues for the design of more efficient reinforcement learning algorithms.
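As a toy illustration of why a bank of discount factors is informative, the sketch below computes the value a set of exponentially discounting units would assign to the same delayed reward; the γ values, reward sizes, and delays are placeholders, not fitted quantities from the paper.

```python
import numpy as np

# Sketch (illustrative gammas, not fitted values): a population of value
# learners, each discounting exponentially with its own gamma, assigns a
# different value to the same delayed reward. The pattern across the
# population constrains both the delay and the magnitude of the upcoming
# reward in a way no single discount factor can.

gammas = np.array([0.6, 0.8, 0.9, 0.95, 0.99])

def values_for(reward, delay):
    """Exponentially discounted value of `reward` arriving after `delay` steps."""
    return reward * gammas ** delay

print("small early reward :", np.round(values_for(reward=1.0, delay=2), 3))
print("large late reward  :", np.round(values_for(reward=4.0, delay=10), 3))
# Each reward produces a distinct profile across the gammas, so the
# population jointly encodes reward size and timing.
```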
Affiliation(s)
- Paul Masset
- Department of Molecular and Cellular Biology, Harvard University, USA
- Center for Brain Science, Harvard University, USA
- Pablo Tano
- Department of Basic Neuroscience, University of Geneva, Switzerland
- HyungGoo R Kim
- Department of Molecular and Cellular Biology, Harvard University, USA
- Center for Brain Science, Harvard University, USA
- Department of Biomedical Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Center for Neuroscience Imaging Research, Institute for Basic Science (IBS), Suwon 16419, Republic of Korea
- Athar N Malik
- Department of Molecular and Cellular Biology, Harvard University, USA
- Center for Brain Science, Harvard University, USA
- Department of Neurosurgery, Warren Alpert Medical School of Brown University, USA
- Norman Prince Neurosciences Institute, Rhode Island Hospital, USA
- Alexandre Pouget
- Department of Basic Neuroscience, University of Geneva, Switzerland
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Harvard University, USA
- Center for Brain Science, Harvard University, USA
14. Amo R, Uchida N, Watabe-Uchida M. Glutamate inputs send prediction error of reward but not negative value of aversive stimuli to dopamine neurons. bioRxiv 2023:2023.11.09.566472. [PMID: 37986868; PMCID: PMC10659341; DOI: 10.1101/2023.11.09.566472]
Abstract
Midbrain dopamine neurons are thought to signal reward prediction errors (RPEs) but the mechanisms underlying RPE computation, particularly contributions of different neurotransmitters, remain poorly understood. Here we used a genetically-encoded glutamate sensor to examine the pattern of glutamate inputs to dopamine neurons. We found that glutamate inputs exhibit virtually all of the characteristics of RPE, rather than conveying a specific component of RPE computation such as reward or expectation. Notably, while glutamate inputs were transiently inhibited by reward omission, they were excited by aversive stimuli. Opioid analgesics altered dopamine negative responses to aversive stimuli toward more positive responses, while excitatory responses of glutamate inputs remained unchanged. Our findings uncover previously unknown synaptic mechanisms underlying RPE computations; dopamine responses are shaped by both synergistic and competitive interactions between glutamatergic and GABAergic inputs to dopamine neurons depending on valences, with competitive interactions playing a role in responses to aversive stimuli.
Affiliation(s)
- Ryunosuke Amo
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Mitsuko Watabe-Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
15. Tymula A, Wang X, Imaizumi Y, Kawai T, Kunimatsu J, Matsumoto M, Yamada H. Dynamic prospect theory: Two core decision theories coexist in the gambling behavior of monkeys and humans. Sci Adv 2023; 9:eade7972. [PMID: 37205752; DOI: 10.1126/sciadv.ade7972]
Abstract
Research in the multidisciplinary field of neuroeconomics has mainly been driven by two influential theories regarding human economic choice: prospect theory, which describes decision-making under risk, and reinforcement learning theory, which describes learning for decision-making. We hypothesized that these two distinct theories guide decision-making in a comprehensive manner. Here, we propose and test a decision-making theory under uncertainty that combines these highly influential theories. Collecting many gambling decisions from laboratory monkeys allowed for reliable testing of our model and revealed a systematic violation of prospect theory's assumption that probability weighting is static. Using the same experimental paradigm in humans, substantial similarities between these species were uncovered by various econometric analyses of our dynamic prospect theory model, which incorporates decision-by-decision learning dynamics of prediction errors into static prospect theory. Our model provides a unified theoretical framework for exploring a neurobiological model of economic choice in human and nonhuman primates.
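A rough sketch of the ingredients combined here, using textbook functional forms and made-up parameters rather than the authors' fitted model, is shown below: a gamble is valued with a power utility and a one-parameter probability-weighting function, and the weighting parameter drifts trial by trial under a prediction-error-like update.

```python
import numpy as np

# Generic sketch of "prospect theory with learning dynamics" (textbook
# functional forms, made-up parameters; not the authors' fitted model):
# the probability-weighting parameter is nudged after each outcome by a
# prediction-error-like term, so probability distortion drifts with experience.

alpha_util = 0.8          # curvature of the power utility function
delta_w = 0.6             # probability-weighting parameter (1 = no distortion)
lr = 0.0001               # learning rate for the dynamic update
rng = np.random.default_rng(2)

def weight(p, d):
    """Tversky-Kahneman-style one-parameter probability weighting."""
    return p ** d / (p ** d + (1 - p) ** d) ** (1 / d)

def gamble_value(amount, p, d):
    return weight(p, d) * amount ** alpha_util

p_win, amount = 0.5, 10.0
for trial in range(500):
    value = gamble_value(amount, p_win, delta_w)
    outcome = amount if rng.random() < p_win else 0.0
    prediction_error = outcome ** alpha_util - value
    delta_w = float(np.clip(delta_w + lr * prediction_error, 0.3, 1.5))

print(f"probability weighting parameter after learning: {delta_w:.3f}")
```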
Affiliation(s)
- Agnieszka Tymula
- School of Economics, University of Sydney, Sydney, NSW 2006, Australia
- Xueting Wang
- School of Economics, Finance and Marketing, College of Business and Law, RMIT University, Melbourne, VIC 2476, Australia
- Yuri Imaizumi
- Medical Sciences, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
- Takashi Kawai
- Division of Biomedical Science, Institute of Medicine, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
- Jun Kunimatsu
- Division of Biomedical Science, Institute of Medicine, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
- Graduate School of Comprehensive Human Sciences, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
- Transborder Medical Research Center, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
- Masayuki Matsumoto
- Division of Biomedical Science, Institute of Medicine, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
- Graduate School of Comprehensive Human Sciences, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
- Transborder Medical Research Center, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
- Hiroshi Yamada
- Division of Biomedical Science, Institute of Medicine, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
- Graduate School of Comprehensive Human Sciences, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
- Transborder Medical Research Center, University of Tsukuba, 1-1-1 Tenno-dai, Tsukuba, Ibaraki 305-8577, Japan
16. Edwin Thanarajah S, DiFeliceantonio AG, Albus K, Kuzmanovic B, Rigoux L, Iglesias S, Hanßen R, Schlamann M, Cornely OA, Brüning JC, Tittgemeyer M, Small DM. Habitual daily intake of a sweet and fatty snack modulates reward processing in humans. Cell Metab 2023; 35:571-584.e6. [PMID: 36958330; DOI: 10.1016/j.cmet.2023.02.015]
Abstract
Western diets rich in fat and sugar promote excess calorie intake and weight gain; however, the underlying mechanisms are unclear. Despite a well-documented association between obesity and altered brain dopamine function, it remains elusive whether these alterations are (1) pre-existing, increasing the individual susceptibility to weight gain, (2) secondary to obesity, or (3) directly attributable to repeated exposure to western diet. To close this gap, we performed a randomized, controlled study (NCT05574660) with normal-weight participants exposed to a high-fat/high-sugar snack or a low-fat/low-sugar snack for 8 weeks in addition to their regular diet. The high-fat/high-sugar intervention decreased the preference for low-fat food while increasing brain response to food and associative learning independent of food cues or reward. These alterations were independent of changes in body weight and metabolic parameters, indicating a direct effect of high-fat, high-sugar foods on neurobehavioral adaptations that may increase the risk for overeating and weight gain.
Affiliation(s)
- Sharmili Edwin Thanarajah
- Max Planck Institute for Metabolism Research, Cologne, Germany; Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital, Goethe University, Frankfurt, Germany
- Alexandra G DiFeliceantonio
- Fralin Biomedical Research Institute at Virginia Tech Carilion & Department of Human Nutrition, Foods, and Exercise, College of Agriculture and Life Sciences, Roanoke, VA, USA
- Kerstin Albus
- Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany; Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf (CIO ABCD) & Excellence Center for Medical Mycology (ECMM), Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Lionel Rigoux
- Max Planck Institute for Metabolism Research, Cologne, Germany
- Sandra Iglesias
- Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich and Swiss Federal Institute of Technology, Zurich, Switzerland
- Ruth Hanßen
- Max Planck Institute for Metabolism Research, Cologne, Germany; Policlinic for Endocrinology, Diabetes and Preventive Medicine (PEPD), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Marc Schlamann
- Department of Neuroradiology, University Hospital of Cologne, Kerpener Str. 62, 50937 Cologne, Germany
- Oliver A Cornely
- Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany; Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf (CIO ABCD) & Excellence Center for Medical Mycology (ECMM), Faculty of Medicine and University Hospital Cologne, Cologne, Germany; German Centre for Infection Research (DZIF), Partner Site Bonn-Cologne, Cologne, Germany; Clinical Trials Centre Cologne (ZKS Köln), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Jens C Brüning
- Max Planck Institute for Metabolism Research, Cologne, Germany; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany; Policlinic for Endocrinology, Diabetes and Preventive Medicine (PEPD), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Marc Tittgemeyer
- Max Planck Institute for Metabolism Research, Cologne, Germany; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany.
- Dana M Small
- Modern Diet and Physiology Research Center, New Haven, CT, USA; Yale University School of Medicine, Department of Psychiatry, New Haven, CT, USA.
17. Barnby JM, Dayan P, Bell V. Formalising social representation to explain psychiatric symptoms. Trends Cogn Sci 2023; 27:317-332. [PMID: 36609016; DOI: 10.1016/j.tics.2022.12.004]
Abstract
Recent work in social cognition has moved beyond a focus on how people process social rewards to examine how healthy people represent other agents and how this is altered in psychiatric disorders. However, formal modelling of social representation has not kept pace with these changes, impeding our understanding of how core aspects of social cognition function, and fail, in psychopathology. Here, we suggest that belief-based computational models provide a basis for an integrated sociocognitive approach to psychiatry, with the potential to address important but unexamined pathologies of social representation, such as maladaptive schemas and illusory social agents.
Affiliation(s)
- Joseph M Barnby
- Social Computation and Cognitive Representation Lab, Department of Psychology, Royal Holloway, University of London, Egham TW20 0EX, UK.
- Peter Dayan
- Max Planck Institute for Biological Cybernetics, Tübingen, 72076, Germany; University of Tübingen, Tübingen, 72074, Germany
- Vaughan Bell
- Clinical, Educational, and Health Psychology, University College London, London WC1E 7HB, UK; South London and Maudsley NHS Foundation Trust, London SE5 8AZ, UK
18. Xu T, Zhou X, Kanen JW, Wang L, Li J, Chen Z, Zhang R, Jiao G, Zhou F, Zhao W, Yao S, Becker B. Angiotensin blockade enhances motivational reward learning via enhancing striatal prediction error signaling and frontostriatal communication. Mol Psychiatry 2023; 28:1692-1702. [PMID: 36810437; DOI: 10.1038/s41380-023-02001-6]
Abstract
Adaptive human learning utilizes reward prediction errors (RPEs) that scale the differences between expected and actual outcomes to optimize future choices. Depression has been linked with biased RPE signaling and an exaggerated impact of negative outcomes on learning which may promote amotivation and anhedonia. The present proof-of-concept study combined computational modeling and multivariate decoding with neuroimaging to determine the influence of the selective competitive angiotensin II type 1 receptor antagonist losartan on learning from positive or negative outcomes and the underlying neural mechanisms in healthy humans. In a double-blind, between-subjects, placebo-controlled pharmaco-fMRI experiment, 61 healthy male participants (losartan, n = 30; placebo, n = 31) underwent a probabilistic selection reinforcement learning task incorporating a learning and transfer phase. Losartan improved choice accuracy for the hardest stimulus pair via increasing expected value sensitivity towards the rewarding stimulus relative to the placebo group during learning. Computational modeling revealed that losartan reduced the learning rate for negative outcomes and increased exploitatory choice behaviors while preserving learning for positive outcomes. These behavioral patterns were paralleled on the neural level by increased RPE signaling in orbitofrontal-striatal regions and enhanced positive outcome representations in the ventral striatum (VS) following losartan. In the transfer phase, losartan accelerated response times and enhanced VS functional connectivity with left dorsolateral prefrontal cortex when approaching maximum rewards. These findings elucidate the potential of losartan to reduce the impact of negative outcomes during learning and subsequently facilitate motivational approach towards maximum rewards in the transfer of learning. This may indicate a promising therapeutic mechanism to normalize distorted reward learning and fronto-striatal functioning in depression.
Affiliation(s)
- Ting Xu
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, China.,MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Xinqi Zhou
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, China.,MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Jonathan W Kanen
- Department of Psychology, University of Cambridge, Cambridge, UK.,Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, UK
- Lan Wang
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, China.,MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Jialin Li
- Max Planck School of Cognition, Leipzig, Germany
- Zhiyi Chen
- Faculty of Psychology, Southwest University, Chongqing, China.,Key Laboratory of Cognition and Personality, Ministry of Education, Chongqing, China
- Ran Zhang
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, China.,MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Guojuan Jiao
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, China.,MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Feng Zhou
- Faculty of Psychology, Southwest University, Chongqing, China.,Key Laboratory of Cognition and Personality, Ministry of Education, Chongqing, China
- Weihua Zhao
- MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Shuxia Yao
- MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Benjamin Becker
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, China. .,MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China.
19. Codol O, Kashefi M, Forgaard CJ, Galea JM, Pruszynski JA, Gribble PL. Sensorimotor feedback loops are selectively sensitive to reward. eLife 2023; 12:81325. [PMID: 36637162; PMCID: PMC9910828; DOI: 10.7554/elife.81325]
Abstract
Although it is well established that motivational factors such as earning more money for performing well improve motor performance, how the motor system implements this improvement remains unclear. For instance, feedback-based control, which uses sensory feedback from the body to correct for errors in movement, improves with greater reward. But feedback control encompasses many feedback loops with diverse characteristics such as the brain regions involved and their response time. Which specific loops drive these performance improvements with reward is unknown, even though their diversity makes it unlikely that they are contributing uniformly. We systematically tested the effect of reward on the latency (how long for a corrective response to arise?) and gain (how large is the corrective response?) of seven distinct sensorimotor feedback loops in humans. Only the fastest feedback loops were insensitive to reward, and the earliest reward-driven changes were consistently an increase in feedback gains, not a reduction in latency. Rather, a reduction of response latencies only tended to occur in slower feedback loops. These observations were similar across sensory modalities (vision and proprioception). Our results may have implications regarding feedback control performance in athletic coaching. For instance, coaching methodologies that rely on reinforcement or 'reward shaping' may need to specifically target aspects of movement that rely on reward-sensitive feedback responses.
Affiliation(s)
- Olivier Codol
- Brain and Mind Institute, University of Western OntarioLondonCanada
- Department of Psychology, University of Western OntarioLondonCanada
- School of Psychology, University of BirminghamBirminghamUnited Kingdom
- Mehrdad Kashefi
- Brain and Mind Institute, University of Western OntarioLondonCanada
- Department of Psychology, University of Western OntarioLondonCanada
- Department of Physiology & Pharmacology, Schulich School of Medicine & Dentistry, University of Western OntarioOntarioCanada
- Robarts Research Institute, University of Western OntarioLondonCanada
- Christopher J Forgaard
- Brain and Mind Institute, University of Western OntarioLondonCanada
- Department of Psychology, University of Western OntarioLondonCanada
- Joseph M Galea
- School of Psychology, University of BirminghamBirminghamUnited Kingdom
- J Andrew Pruszynski
- Brain and Mind Institute, University of Western OntarioLondonCanada
- Department of Psychology, University of Western OntarioLondonCanada
- Department of Physiology & Pharmacology, Schulich School of Medicine & Dentistry, University of Western OntarioOntarioCanada
- Robarts Research Institute, University of Western OntarioLondonCanada
- Paul L Gribble
- Brain and Mind Institute, University of Western OntarioLondonCanada
- Department of Psychology, University of Western OntarioLondonCanada
- Department of Physiology & Pharmacology, Schulich School of Medicine & Dentistry, University of Western OntarioOntarioCanada
- Haskins LaboratoriesNew HavenUnited States
20. Zihl J, Reppermund S. The aging mind: A complex challenge for research and practice. Aging Brain 2022; 3:100060. [PMID: 36911259; PMCID: PMC9997127; DOI: 10.1016/j.nbas.2022.100060]
Abstract
Cognitive decline as part of mental ageing is typically assessed with standardized tests; below-average performance in such tests is used as an indicator of pathological cognitive aging. In addition, morphological and functional changes in the brain are used as parameters for age-related pathological decline in cognitive abilities. However, there is no simple link between the trajectories of changes in cognition and morphological or functional changes in the brain. Furthermore, below-average test performance does not necessarily mean a significant impairment in everyday activities. It therefore appears crucial to record individual everyday tasks and their cognitive (and other) requirements in functional terms. This would also allow reliable assessment of the ecological validity of existing and insufficient cognitive skills. Understanding and dealing with the phenomena and consequences of mental aging does not, of course, depend on cognition alone. Motivation and emotions, as well as personal meaning of life and life satisfaction, play an equally important role. This means, however, that cognition represents only one, albeit important, aspect of mental aging. Furthermore, creating and developing proper assessment tools for functional cognition is important. In this contribution we discuss aspects that we consider relevant for a holistic view of the aging mind and advocate a strengthened multidisciplinary approach, with close cooperation between all basic and applied sciences involved in aging research, rapid translation of research results into practice, and close cooperation between all disciplines and professions that advise and support older people.
Affiliation(s)
- Josef Zihl
- Ludwig-Maximilians-University, Department of Psychology, Munich, Germany
- Simone Reppermund
- Centre for Healthy Brain Ageing, Discipline of Psychiatry and Mental Health, School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney, Australia
- Department of Developmental Disability Neuropsychiatry, Discipline of Psychiatry and Mental Health, School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney, Australia
21. Akiti K, Tsutsui-Kimura I, Xie Y, Mathis A, Markowitz JE, Anyoha R, Datta SR, Mathis MW, Uchida N, Watabe-Uchida M. Striatal dopamine explains novelty-induced behavioral dynamics and individual variability in threat prediction. Neuron 2022; 110:3789-3804.e9. [PMID: 36130595; PMCID: PMC9671833; DOI: 10.1016/j.neuron.2022.08.022]
Abstract
Animals both explore and avoid novel objects in the environment, but the neural mechanisms that underlie these behaviors and their dynamics remain uncharacterized. Here, we used multi-point tracking (DeepLabCut) and behavioral segmentation (MoSeq) to characterize the behavior of mice freely interacting with a novel object. Novelty elicits a characteristic sequence of behavior, starting with investigatory approach and culminating in object engagement or avoidance. Dopamine in the tail of the striatum (TS) suppresses engagement, and dopamine responses were predictive of individual variability in behavior. Behavioral dynamics and individual variability are explained by a reinforcement-learning (RL) model of threat prediction in which behavior arises from a novelty-induced initial threat prediction (akin to "shaping bonus") and a threat prediction that is learned through dopamine-mediated threat prediction errors. These results uncover an algorithmic similarity between reward- and threat-related dopamine sub-systems.
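A minimal sketch of this kind of threat-prediction account is given below, with all parameters and the approach/avoid rule chosen purely for illustration: the threat estimate starts at a novelty-induced value (a "shaping bonus") and is updated by threat prediction errors, which gradually shifts behavior from avoidance to engagement.

```python
# Minimal sketch of a novelty-driven threat-prediction learner (illustrative
# parameters and threshold rule; not the paper's fitted RL model): threat
# starts at a novelty-induced prior and decays via threat prediction errors
# as harmless encounters accumulate.

initial_threat = 1.0        # novelty-induced prior threat ("shaping bonus")
alpha = 0.1                 # learning rate for threat prediction errors
threat = initial_threat

for encounter in range(30):
    outcome = 0.0                                   # the novel object is in fact harmless
    threat += alpha * (outcome - threat)            # threat prediction error update
    action = "avoid" if threat > 0.5 else "engage"  # simple threshold policy
    if encounter in (0, 5, 15, 29):
        print(f"encounter {encounter:2d}: threat={threat:.2f} -> {action}")

# Individual variability could enter through the initial threat value or the
# learning rate, shifting how quickly avoidance gives way to engagement.
```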
Collapse
Affiliation(s)
- Korleki Akiti
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
| | - Iku Tsutsui-Kimura
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
| | - Yudi Xie
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Alexander Mathis
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA; The Rowland Institute at Harvard, Harvard University, Cambridge, MA 02138, USA; Swiss Federal Institute of Technology Lausanne, Geneve 1202, Switzerland
| | - Jeffrey E Markowitz
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA; Wallace H. Coulter Department of Biomedical Engineering, Emory School of Medicine, Georgia Institute of Technology, Atlanta, GA 30322, USA
| | - Rockwell Anyoha
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
| | | | - Mackenzie Weygandt Mathis
- The Rowland Institute at Harvard, Harvard University, Cambridge, MA 02138, USA; Swiss Federal Institute of Technology Lausanne, Geneve 1202, Switzerland
| | - Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
| | - Mitsuko Watabe-Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA.
| |
Collapse
|
22
|
Chen W. Neural circuits provide insights into reward and aversion. Front Neural Circuits 2022; 16:1002485. [PMID: 36389177 PMCID: PMC9650032 DOI: 10.3389/fncir.2022.1002485] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2022] [Accepted: 10/12/2022] [Indexed: 01/07/2023] Open
Abstract
Maladaptive changes in the neural circuits associated with reward and aversion contribute to common conditions such as drug addiction, anxiety, and depression. Historically, the study of these circuits was hampered by technical limitations. In recent years, however, much progress has been made in understanding the neural mechanisms of reward and aversion owing to the development of technologies such as cell type-specific electrophysiology, neuronal tracing, and optogenetics-based behavioral manipulation. The aim of this paper is to review previous studies and summarize the latest findings on the neural circuits associated with reward and aversion, with a focus on the ventral tegmental area (VTA), nucleus accumbens (NAc), and basal forebrain (BF). These findings may inform efforts to prevent and treat mental illnesses associated with dysfunction of the brain's reward and aversion systems.
Collapse
|
23
|
de Jong JW, Fraser KM, Lammel S. Mesoaccumbal Dopamine Heterogeneity: What Do Dopamine Firing and Release Have to Do with It? Annu Rev Neurosci 2022; 45:109-129. [PMID: 35226827 PMCID: PMC9271543 DOI: 10.1146/annurev-neuro-110920-011929] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Ventral tegmental area (VTA) dopamine (DA) neurons are often thought to uniformly encode reward prediction errors. Conversely, DA release in the nucleus accumbens (NAc), the prominent projection target of these neurons, has been implicated in reinforcement learning, motivation, aversion, and incentive salience. This contrast between heterogeneous functions of DA release versus a homogeneous role for DA neuron activity raises numerous questions regarding how VTA DA activity translates into NAc DA release. Further complicating this issue is increasing evidence that distinct VTA DA projections into defined NAc subregions mediate diverse behavioral functions. Here, we evaluate evidence for heterogeneity within the mesoaccumbal DA system and argue that frameworks of DA function must incorporate the precise topographic organization of VTA DA neurons to clarify their contribution to health and disease.
Collapse
Affiliation(s)
- Johannes W de Jong
- Department of Molecular and Cell Biology and Helen Wills Neuroscience Institute, University of California, Berkeley, California, USA;
| | - Kurt M Fraser
- Department of Molecular and Cell Biology and Helen Wills Neuroscience Institute, University of California, Berkeley, California, USA;
| | - Stephan Lammel
- Department of Molecular and Cell Biology and Helen Wills Neuroscience Institute, University of California, Berkeley, California, USA;
| |
Collapse
|
24
|
Codol O, Gribble PL, Gurney KN. Differential Dopamine Receptor-Dependent Sensitivity Improves the Switch Between Hard and Soft Selection in a Model of the Basal Ganglia. Neural Comput 2022; 34:1588-1615. [PMID: 35671472 DOI: 10.1162/neco_a_01517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2021] [Accepted: 04/01/2022] [Indexed: 11/04/2022]
Abstract
The problem of selecting one action from a set of different possible actions, simply referred to as the problem of action selection, is a ubiquitous challenge in the animal world. For vertebrates, the basal ganglia (BG) are widely thought to implement the core computation that solves this problem, as their anatomy and physiology are well suited to this end. However, the BG display physiological features whose role in achieving efficient action selection remains unclear. In particular, the two types of dopaminergic receptors (D1 and D2) present in the BG are known to give rise to mechanistically different responses. The overall effect is a difference in sensitivity to dopamine, which may have ramifications for action selection. However, which receptor type leads to the stronger response is unclear because of the complexity of the intracellular mechanisms involved. In this study, we use an existing, high-level computational model of the BG, which assumes that dopamine contributes to action selection by enabling a switch between different selection regimes, to predict whether D1 or D2 has the greater sensitivity. Thus, we ask: assuming dopamine enables a switch between action selection regimes in the BG, what functional sensitivity values would result in improved action selection? To answer this, we quantitatively assessed the model's capacity to perform action selection while parametrically manipulating the sensitivity weights of D1 and D2. We show that differential (rather than equal) D1 and D2 sensitivity to dopaminergic input improves the switch between selection regimes during the action selection computation in our model. Specifically, greater sensitivity of D2 relative to D1 led to these improvements.
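A rough illustration of the idea, under our own simplifying assumptions (this is not the published model): dopamine acting through unequal D1 and D2 sensitivity weights changes the effective gain of selection, moving it between a soft, graded regime and a hard, winner-take-all regime. The mapping from the pathway balance to a softmax inverse temperature below is an assumption made for the sketch.

import numpy as np

def selection_probs(saliences, dopamine, d1_sensitivity, d2_sensitivity):
    saliences = np.asarray(saliences, dtype=float)
    go = 1.0 + d1_sensitivity * dopamine                 # D1 (direct) pathway facilitated by DA
    nogo = max(0.0, 1.0 - d2_sensitivity * dopamine)     # D2 (indirect) pathway suppressed by DA
    inverse_temperature = 1.0 + 5.0 * (go - nogo)        # assumed selection gain
    logits = inverse_temperature * saliences
    p = np.exp(logits - logits.max())
    return p / p.sum()

saliences = [0.40, 0.55, 0.45]
print("equal sensitivity:     ", selection_probs(saliences, dopamine=0.5, d1_sensitivity=1.0, d2_sensitivity=1.0))
print("greater D2 sensitivity:", selection_probs(saliences, dopamine=0.5, d1_sensitivity=1.0, d2_sensitivity=1.5))

At the same moderate dopamine level, the version with greater D2 sensitivity produces a sharper (harder) selection of the highest-salience action, which is the qualitative effect the study reports.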
Collapse
Affiliation(s)
- Olivier Codol
- Department of Psychology and Department of Physiology and Pharmacology, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON N6A 3K7, Canada
| | - Paul L Gribble
- Department of Psychology and Department of Physiology and Pharmacology, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON N6A 3K7, Canada; Haskins Laboratories, New Haven, CT 06511, USA
| | - Kevin N Gurney
- Department of Psychology, University of Sheffield, Sheffield S10 2TN, U.K.
| |
Collapse
|
25
|
Lin CHS, Garrido MI. Towards a cross-level understanding of Bayesian inference in the brain. Neurosci Biobehav Rev 2022; 137:104649. [PMID: 35395333 DOI: 10.1016/j.neubiorev.2022.104649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2021] [Revised: 02/28/2022] [Accepted: 03/29/2022] [Indexed: 10/18/2022]
Abstract
Perception emerges from unconscious probabilistic inference, which guides behaviour in our ubiquitously uncertain environment. Bayesian decision theory is a prominent computational model that describes how people make rational decisions using noisy and ambiguous sensory observations. However, critical questions have been raised about the validity of the Bayesian framework in explaining the mental process of inference. First, some natural behaviours deviate from the Bayesian optimum. Second, the neural mechanisms that support Bayesian computations in the brain are yet to be understood. Taking Marr's cross-level approach, we review recent progress in addressing these challenges. We first review studies that combined behavioural paradigms and modelling approaches to explain both optimal and suboptimal behaviours. Next, we evaluate the theoretical advances and the current evidence for ecologically feasible algorithms and neural implementations in the brain that may enable probabilistic inference. We argue that this cross-level approach is necessary for the worthwhile pursuit of mechanistic accounts of human behaviour.
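As a concrete example of the kind of computation discussed in the review (our illustration, not taken from the paper), Bayesian combination of a Gaussian prior with a noisy Gaussian observation reduces to precision weighting of the two sources of information.

import numpy as np

def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance for a Gaussian prior and a Gaussian likelihood."""
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    post_var = 1.0 / (prior_precision + obs_precision)
    post_mean = post_var * (prior_precision * prior_mean + obs_precision * obs)
    return post_mean, post_var

# A reliable prior pulls the estimate away from a noisy observation.
print(gaussian_posterior(prior_mean=0.0, prior_var=1.0, obs=2.0, obs_var=4.0))

Deviations from this precision-weighted average are one behavioural signature used to argue that observers are not always Bayes-optimal, which is the first challenge the review takes up.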
Collapse
Affiliation(s)
- Chin-Hsuan Sophie Lin
- Melbourne School of Psychological Sciences, The University of Melbourne, Australia; Australian Research Council for Integrative Brain Function, Australia.
| | - Marta I Garrido
- Melbourne School of Psychological Sciences, The University of Melbourne, Australia; Australian Research Council for Integrative Brain Function, Australia
| |
Collapse
|
26
|
Grujic N, Brus J, Burdakov D, Polania R. Rational inattention in mice. SCIENCE ADVANCES 2022; 8:eabj8935. [PMID: 35245128 PMCID: PMC8896787 DOI: 10.1126/sciadv.abj8935] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2023]
Abstract
Behavior exhibited by humans and other organisms is generally inconsistent and biased and, thus, is often labeled irrational. However, the origins of this seemingly suboptimal behavior remain elusive. We developed a behavioral task and normative framework to reveal how organisms should allocate their limited processing resources such that sensory precision and its related metabolic investment are balanced to guarantee maximal utility. We found that mice act as rational inattentive agents by adaptively allocating their sensory resources in a way that maximizes reward consumption in previously unexperienced stimulus-reward association environments. Unexpectedly, perception of commonly occurring stimuli was relatively imprecise; however, this apparent statistical fallacy implies "awareness" and efficient adaptation to their neurocognitive limitations. Arousal systems carry reward distribution information of sensory signals, and distributional reinforcement learning mechanisms regulate sensory precision via top-down normalization. These findings reveal how organisms efficiently perceive and adapt to previously unexperienced environmental contexts within the constraints imposed by neurobiology.
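The resource-allocation logic described above can be caricatured as follows (an illustrative sketch under our own assumptions, not the authors' normative framework): the agent chooses a sensory precision that maximizes expected reward from correct discrimination minus a metabolic cost that grows with precision.

import numpy as np

def expected_utility(precision, reward_per_correct=1.0, cost_per_unit=0.2):
    # Higher precision raises the probability of identifying the reward-predicting
    # stimulus (saturating benefit) but carries a linearly increasing cost.
    p_correct = 1.0 - 0.5 * np.exp(-precision)
    return reward_per_correct * p_correct - cost_per_unit * precision

precisions = np.linspace(0.0, 5.0, 501)
best = precisions[np.argmax([expected_utility(p) for p in precisions])]
print(f"utility-maximising precision: {best:.2f}")

Because the benefit saturates while the cost does not, the optimum sits at an intermediate precision, which is why "rationally inattentive" perception of common stimuli can remain relatively imprecise.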
Collapse
Affiliation(s)
- Nikola Grujic
- Institute for Neuroscience, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
- Neuroscience Center Zürich, Zurich, Switzerland
| | - Jeroen Brus
- Neuroscience Center Zürich, Zurich, Switzerland
- Decision Neuroscience Lab, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
| | - Denis Burdakov
- Institute for Neuroscience, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
- Neuroscience Center Zürich, Zurich, Switzerland
- Corresponding author. (R.P.); (D.B.)
| | - Rafael Polania
- Neuroscience Center Zürich, Zurich, Switzerland
- Decision Neuroscience Lab, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
- Corresponding author. (R.P.); (D.B.)
| |
Collapse
|
27
|
Sennesh E, Theriault J, Brooks D, van de Meent JW, Barrett LF, Quigley KS. Interoception as modeling, allostasis as control. Biol Psychol 2022; 167:108242. [PMID: 34942287 PMCID: PMC9270659 DOI: 10.1016/j.biopsycho.2021.108242] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2021] [Revised: 12/13/2021] [Accepted: 12/14/2021] [Indexed: 01/09/2023]
Abstract
The brain regulates the body by anticipating its needs and attempting to meet them before they arise - a process called allostasis. Allostasis requires a model of the changing sensory conditions within the body, a process called interoception. In this paper, we examine how interoception may provide performance feedback for allostasis. We suggest studying allostasis in terms of control theory, reviewing control theory's applications to related issues in physiology, motor control, and decision making. We synthesize these by relating them to the important properties of allostatic regulation as a control problem. We then sketch a novel formalism for how the brain might perform allostatic control of the viscera by analogy to skeletomotor control, including a mathematical view on how interoception acts as performance feedback for allostasis. Finally, we suggest ways to test implications of our hypotheses.
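To make the control-theoretic framing concrete, the following toy sketch (our illustration, not the paper's formalism) regulates an internal variable with an anticipatory, allostatic term driven by a learned demand prediction plus a reactive, homeostatic term driven by interoceptive error; the dynamics and gains are assumptions.

import numpy as np

def regulate(n_steps=100, setpoint=1.0, gain_fb=0.3, gain_ff=0.8):
    state = setpoint
    predicted_demand = 0.0
    trace = []
    for t in range(n_steps):
        demand = 0.5 if 40 <= t < 60 else 0.0          # a predictable metabolic load
        interoceptive_error = setpoint - state         # interoception as performance feedback
        # Allostatic (feedforward) action from the predicted demand, plus
        # homeostatic (feedback) correction from the interoceptive error.
        action = gain_ff * predicted_demand + gain_fb * interoceptive_error
        state += action - demand
        # "Interoception as modeling": update the demand prediction from feedback.
        predicted_demand += 0.2 * (demand - predicted_demand)
        trace.append(state)
    return np.array(trace)

print(f"largest deviation from setpoint: {np.abs(regulate() - 1.0).max():.3f}")

As the demand prediction improves, the feedforward term absorbs most of the load before a large interoceptive error ever develops, which is the anticipatory character that distinguishes allostasis from purely reactive homeostasis.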
Collapse
Affiliation(s)
- Eli Sennesh
- Northeastern University, Boston, MA, United States.
| | | | - Dana Brooks
- Northeastern University, Boston, MA, United States
| | | | | | | |
Collapse
|
28
|
Zhang Y, Pan X, Wang Y. Category learning in a recurrent neural network with reinforcement learning. Front Psychiatry 2022; 13:1008011. [PMID: 36387007 PMCID: PMC9640766 DOI: 10.3389/fpsyt.2022.1008011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/31/2022] [Accepted: 10/10/2022] [Indexed: 11/13/2022] Open
Abstract
Humans and animals can learn and use category information quickly and efficiently to adapt to changing environments, and several brain areas are involved in learning and encoding category information. However, it remains unclear how the brain learns and forms categorical representations at the level of neural circuits. To investigate this issue at the network level, we combine a recurrent neural network with reinforcement learning to construct a deep reinforcement learning model that demonstrates how categories are learned and represented in the network. The model consists of a policy network and a value network. The policy network is responsible for updating the policy to choose actions, while the value network is responsible for evaluating actions to predict rewards. The agent learns dynamically through the interaction between the policy network and the value network. The model was trained to learn six stimulus-stimulus associative chains in a sequential paired-association task that had previously been learned by a monkey. The simulation results demonstrate that the model learned the stimulus-stimulus associative chains and successfully reproduced behavior similar to that of the monkey performing the same task. Two types of neurons were found in the model: one type primarily encoded identity information about individual stimuli; the other mainly encoded category information about the associated stimuli within a chain. The same two types of activity patterns were also observed in the primate prefrontal cortex after the monkey learned the task. Furthermore, the ability of these two types of neurons to encode stimulus or category information was enhanced while the model was learning the task. Our results suggest that neurons in a recurrent neural network can form categorical representations through deep reinforcement learning while learning stimulus-stimulus associations. This may provide a new approach for understanding the neuronal mechanisms by which the prefrontal cortex learns and encodes category information.
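The policy-network/value-network interaction described above follows the standard actor-critic pattern. A minimal sketch of that pattern on a toy two-armed bandit is shown below (tabular "networks" rather than the paper's recurrent model; all parameters are illustrative).

import numpy as np

rng = np.random.default_rng(0)
n_actions = 2
policy_logits = np.zeros(n_actions)        # "policy network" (actor)
value_estimate = 0.0                       # "value network" (critic)
alpha_pi, alpha_v = 0.1, 0.2
true_reward_probs = np.array([0.2, 0.8])

for trial in range(2000):
    p = np.exp(policy_logits - policy_logits.max())
    p /= p.sum()
    a = rng.choice(n_actions, p=p)                     # actor chooses an action
    r = float(rng.random() < true_reward_probs[a])
    td_error = r - value_estimate                      # critic evaluates the outcome
    value_estimate += alpha_v * td_error               # value-network update
    grad = -p
    grad[a] += 1.0                                     # d log pi(a) / d logits
    policy_logits += alpha_pi * td_error * grad        # policy-network update

print("learned action probabilities:", np.round(p, 3))

The same evaluation signal (the critic's prediction error) that trains the policy is what, in the full recurrent model, shapes the hidden representations in which the identity- and category-coding units emerge.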
Collapse
Affiliation(s)
- Ying Zhang
- Institute for Cognitive Neurodynamics, East China University of Science and Technology, Shanghai, China
| | - Xiaochuan Pan
- Institute for Cognitive Neurodynamics, East China University of Science and Technology, Shanghai, China
| | - Yihong Wang
- Institute for Cognitive Neurodynamics, East China University of Science and Technology, Shanghai, China
| |
Collapse
|
29
|
Mei J, Muller E, Ramaswamy S. Informing deep neural networks by multiscale principles of neuromodulatory systems. Trends Neurosci 2022; 45:237-250. [DOI: 10.1016/j.tins.2021.12.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2021] [Revised: 12/04/2021] [Accepted: 12/21/2021] [Indexed: 01/19/2023]
|
30
|
Tanaka S, Taylor JE, Sakagami M. The effect of effort on reward prediction error signals in midbrain dopamine neurons. Curr Opin Behav Sci 2021. [DOI: 10.1016/j.cobeha.2021.07.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
|
31
|
Tsutsui-Kimura I, Matsumoto H, Akiti K, Yamada MM, Uchida N, Watabe-Uchida M. Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task. eLife 2020; 9:e62390. [PMID: 33345774 PMCID: PMC7771962 DOI: 10.7554/elife.62390] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2020] [Accepted: 12/18/2020] [Indexed: 12/24/2022] Open
Abstract
Different regions of the striatum regulate different types of behavior. However, how dopamine signals differ across striatal regions and how dopamine regulates different behaviors remain unclear. Here, we compared dopamine axon activity in the ventral, dorsomedial, and dorsolateral striatum while mice performed a perceptual and value-based decision task. Surprisingly, dopamine axon activity was similar across all three areas. At a glance, the activity multiplexed different variables, such as stimulus-associated values, confidence, and reward feedback, at different phases of the task. Our modeling demonstrates, however, that these modulations can be inclusively explained by moment-by-moment changes in the expected reward, that is, the temporal difference error. A major difference between areas was the overall activity level of reward responses: reward responses in the dorsolateral striatum were positively shifted, lacking inhibitory responses to negative prediction errors. These differences in dopamine signals put specific constraints on the properties of behaviors controlled by dopamine in these regions.
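For readers unfamiliar with the quantity referred to above, the temporal difference error for a single trial can be written down in a few lines. The sketch below is ours; the value trajectory, the positive offset, and the rectification used to mimic the dorsolateral-striatum-like signal are illustrative assumptions, not fitted values from the paper.

import numpy as np

def td_errors(values, rewards, gamma=0.95):
    """delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), with the terminal value assumed 0."""
    v = np.asarray(values, dtype=float)
    r = np.asarray(rewards, dtype=float)
    v_next = np.append(v[1:], 0.0)
    return r + gamma * v_next - v

values = [0.0, 0.2, 0.5, 0.8, 0.0]     # expected reward rising as the outcome approaches
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
delta = td_errors(values, rewards)
dls_like = np.maximum(delta + 0.3, 0.0)   # positively shifted signal with no negative dips
print("TD errors:", np.round(delta, 3))
print("DLS-like :", np.round(dls_like, 3))

Moment-by-moment changes in the value trajectory reproduce phasic dips and bursts without any additional variables, which is the sense in which a single TD error can "inclusively explain" the multiplexed modulations.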
Collapse
Affiliation(s)
- Iku Tsutsui-Kimura
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, United States
| | - Hideyuki Matsumoto
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, United States
- Department of Physiology, Osaka City University Graduate School of Medicine, Osaka, Japan
| | - Korleki Akiti
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, United States
| | - Melissa M Yamada
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, United States
| | - Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, United States
| | - Mitsuko Watabe-Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, United States
| |
Collapse
|