1. Ohta H, Nozawa T, Nakano T, Morimoto Y, Ishizuka T. Nonlinear age-related differences in probabilistic learning in mice: A 5-armed bandit task study. Neurobiol Aging 2024; 142:8-16. PMID: 39029360. DOI: 10.1016/j.neurobiolaging.2024.06.004.
Abstract
This study explores the impact of aging on reinforcement learning in mice, focusing on changes in learning rates and behavioral strategies. A 5-armed bandit task (5-ABT) and a computational Q-learning model were used to evaluate the positive and negative learning rates and the inverse temperature across three age groups (3, 12, and 18 months). Results showed a significant decline in the negative learning rate of 18-month-old mice, which was not observed for the positive learning rate. This suggests that older mice maintain the ability to learn from successful experiences while decreasing the ability to learn from negative outcomes. We also observed a significant age-dependent variation in inverse temperature, reflecting a shift in action selection policy. Middle-aged mice (12 months) exhibited higher inverse temperature, indicating a higher reliance on previous rewarding experiences and reduced exploratory behaviors, when compared to both younger and older mice. This study provides new insights into aging research by demonstrating that there are age-related differences in specific components of reinforcement learning, which exhibit a non-linear pattern.
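The model class used here, Q-learning with separate learning rates for positive and negative outcomes and a softmax policy governed by an inverse temperature, can be sketched as follows. This is a minimal illustration with parameter names and values of our own choosing, not the authors' implementation.

```python
import numpy as np

def softmax_policy(q, beta):
    """Choice probabilities over arms; beta is the inverse temperature.
    Higher beta concentrates choices on high-valued arms; lower beta explores."""
    z = beta * (q - q.max())          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def asymmetric_update(q, arm, reward, alpha_pos, alpha_neg):
    """Update the chosen arm's value with a learning rate that depends on
    the sign of the reward prediction error (RPE)."""
    rpe = reward - q[arm]
    alpha = alpha_pos if rpe >= 0 else alpha_neg
    q = q.copy()
    q[arm] += alpha * rpe
    return q
```

In this formulation, a reduced alpha_neg (as reported for the 18-month-old mice) slows unlearning of arms that have stopped paying out, while a raised beta (as in the 12-month group) concentrates choices on previously rewarded arms and reduces exploration.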
Affiliation(s)
- Hiroyuki Ohta
  - Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan
- Takashi Nozawa
  - Mejiro University, 4-31-1 Naka-Ochiai, Shinjuku, Tokyo 161-8539, Japan
- Takashi Nakano
  - Department of Computational Biology, School of Medicine, Fujita Health University, 1-98 Dengakugakubo, Kutsukake, Toyoake, Aichi 470-1192, Japan
  - International Center for Brain Science (ICBS), Fujita Health University, 1-98 Dengakugakubo, Kutsukake, Toyoake, Aichi 470-1192, Japan
- Yuji Morimoto
  - Department of Physiology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan
- Toshiaki Ishizuka
  - Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan
2. Gong L, Pasqualetti F, Papouin T, Ching S. Astrocytes as a mechanism for contextually-guided network dynamics and function. PLoS Comput Biol 2024; 20:e1012186. PMID: 38820533. PMCID: PMC11168681. DOI: 10.1371/journal.pcbi.1012186.
Abstract
Astrocytes are a ubiquitous and enigmatic type of non-neuronal cell and are found in the brain of all vertebrates. While traditionally viewed as being supportive of neurons, it is increasingly recognized that astrocytes play a more direct and active role in brain function and neural computation. On account of their sensitivity to a host of physiological covariates and ability to modulate neuronal activity and connectivity on slower time scales, astrocytes may be particularly well poised to modulate the dynamics of neural circuits in functionally salient ways. In the current paper, we seek to capture these features via actionable abstractions within computational models of neuron-astrocyte interaction. Specifically, we engage how nested feedback loops of neuron-astrocyte interaction, acting over separated time-scales, may endow astrocytes with the capability to enable learning in context-dependent settings, where fluctuations in task parameters may occur much more slowly than within-task requirements. We pose a general model of neuron-synapse-astrocyte interaction and use formal analysis to characterize how astrocytic modulation may constitute a form of meta-plasticity, altering the ways in which synapses and neurons adapt as a function of time. We then embed this model in a bandit-based reinforcement learning task environment, and show how the presence of time-scale separated astrocytic modulation enables learning over multiple fluctuating contexts. Indeed, these networks learn far more reliably compared to dynamically homogeneous networks and conventional non-network-based bandit algorithms. Our results fuel the notion that neuron-astrocyte interactions in the brain benefit learning over different time-scales and the conveyance of task-relevant contextual information onto circuit dynamics.
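The time-scale separation at the heart of this account can be illustrated with a pair of leaky integrators driven by the same signal: a fast one standing in for neuronal/synaptic adaptation and a slow one standing in for astrocytic modulation. This is only a toy sketch of the separation-of-time-scales idea, not the paper's neuron-synapse-astrocyte model; the names and rates are our assumptions.

```python
import numpy as np

def two_timescale_traces(signal, alpha_fast=0.5, alpha_slow=0.02):
    """Integrate one input stream at two rates. The fast trace tracks
    within-task fluctuations; the slow trace averages over them and so
    tracks the slowly drifting context."""
    fast, slow = 0.0, 0.0
    fast_hist, slow_hist = [], []
    for x in signal:
        fast += alpha_fast * (x - fast)   # fast "neuronal" variable
        slow += alpha_slow * (x - slow)   # slow "astrocytic" variable
        fast_hist.append(fast)
        slow_hist.append(slow)
    return np.array(fast_hist), np.array(slow_hist)
```

Feeding the slow trace back as a gain or learning-rate modulator on the fast learner is one way such meta-plasticity could let a single network re-specialize as contexts fluctuate more slowly than within-task demands.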
Affiliation(s)
- Lulu Gong
  - Department of Electrical and Systems Engineering, Washington University, St. Louis, Missouri, United States of America
- Fabio Pasqualetti
  - Department of Mechanical Engineering, University of California, Riverside, California, United States of America
- Thomas Papouin
  - Department of Neuroscience, Washington University School of Medicine, St. Louis, Missouri, United States of America
- ShiNung Ching
  - Department of Electrical and Systems Engineering, Washington University, St. Louis, Missouri, United States of America
3. Sazhin D, Dachs A, Smith DV. Meta-Analysis Reveals That Explore-Exploit Decisions are Dissociable by Activation in the Dorsal Lateral Prefrontal Cortex and the Anterior Cingulate Cortex. bioRxiv 2024:2023.10.21.563317. PMID: 37961286. PMCID: PMC10634720. DOI: 10.1101/2023.10.21.563317.
Abstract
Explore-exploit research faces challenges in generalizability because of the limited theoretical grounding of exploration and exploitation. Neuroimaging can help address this issue by identifying whether explore-exploit decisions rely on an opponent processing system. We therefore conducted a coordinate-based meta-analysis (N=23 studies) and found activation in the dorsal lateral prefrontal cortex and anterior cingulate cortex during exploration versus exploitation, providing some evidence for opponent processing. However, the conjunction of explore-exploit decisions was associated with activation in the dorsal anterior cingulate cortex, dorsal medial prefrontal cortex, and anterior insula, suggesting that these brain regions do not engage in opponent processing. Further, exploratory analyses revealed heterogeneity in brain responses between task types during exploration and exploitation, respectively. Coupled with results suggesting that activation during exploration and exploitation is generally more similar than it is different, these findings indicate that significant challenges remain in characterizing explore-exploit decision making. Nonetheless, dlPFC and ACC activation differentiates explore and exploit decisions, and identifying these responses can inform targeted interventions aimed at manipulating these decisions.
4. Jin F, Yang L, Yang L, Li J, Li M, Shang Z. Dynamics Learning Rate Bias in Pigeons: Insights from Reinforcement Learning and Neural Correlates. Animals (Basel) 2024; 14:489. PMID: 38338131. PMCID: PMC10854969. DOI: 10.3390/ani14030489.
Abstract
Research in reinforcement learning indicates that animals respond differently to positive and negative reward prediction errors, which can be calculated by assuming learning rate bias. Many studies have shown that humans and other animals have learning rate bias during learning, but it is unclear whether and how the bias changes throughout the entire learning process. Here, we recorded the behavior data and the local field potentials (LFPs) in the striatum of five pigeons performing a probabilistic learning task. Reinforcement learning models with and without learning rate biases were used to dynamically fit the pigeons' choice behavior and estimate the option values. Furthermore, the correlation between the striatal LFPs power and the model-estimated option values was explored. We found that the pigeons' learning rate bias shifted from negative to positive during the learning process, and striatal gamma-band (31-80 Hz) power correlated with the option values modulated by dynamic learning rate bias. In conclusion, our results support the hypothesis that pigeons employ a dynamic learning strategy in the learning process from both behavioral and neural aspects, providing valuable insights into reinforcement learning mechanisms of non-human animals.
Collapse
Affiliation(s)
- Fuli Jin
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; (F.J.); (L.Y.); (L.Y.); (J.L.)
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
| | - Lifang Yang
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; (F.J.); (L.Y.); (L.Y.); (J.L.)
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
| | - Long Yang
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; (F.J.); (L.Y.); (L.Y.); (J.L.)
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
| | - Jiajia Li
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; (F.J.); (L.Y.); (L.Y.); (J.L.)
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
| | - Mengmeng Li
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; (F.J.); (L.Y.); (L.Y.); (J.L.)
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
| | - Zhigang Shang
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; (F.J.); (L.Y.); (L.Y.); (J.L.)
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
- Institute of Medical Engineering Technology and Data Mining, Zhengzhou University, Zhengzhou 450001, China
| |
Collapse
|
5
|
Ihara K, Shikano Y, Kato S, Yagishita S, Tanaka KF, Takata N. A reinforcement learning model with choice traces for a progressive ratio schedule. Front Behav Neurosci 2024; 17:1302842. [PMID: 38268795 PMCID: PMC10806202 DOI: 10.3389/fnbeh.2023.1302842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2023] [Accepted: 12/13/2023] [Indexed: 01/26/2024] Open
Abstract
The progressive ratio (PR) lever-press task serves as a benchmark for assessing goal-oriented motivation. However, a well-recognized limitation of the PR task is that only a single data point, known as the breakpoint, is obtained from an entire session as a barometer of motivation. Because the breakpoint is defined as the final ratio of responses achieved in a PR session, variations in choice behavior during the PR task cannot be captured. We addressed this limitation by constructing four reinforcement learning models: a simple Q-learning model, an asymmetric model with two learning rates, a perseverance model with choice traces, and a perseverance model without learning. These models incorporated three behavioral choices: reinforced and non-reinforced lever presses and void magazine nosepokes, because we noticed that male mice performed frequent magazine nosepokes during PR tasks. The best model was the perseverance model, which predicted a gradual reduction in amplitudes of reward prediction errors (RPEs) upon void magazine nosepokes. We confirmed the prediction experimentally with fiber photometry of extracellular dopamine (DA) dynamics in the ventral striatum of male mice using a fluorescent protein (genetically encoded GPCR activation-based DA sensor: GRABDA2m). We verified application of the model by acute intraperitoneal injection of low-dose methamphetamine (METH) before a PR task, which increased the frequency of magazine nosepokes during the PR session without changing the breakpoint. The perseverance model captured behavioral modulation as a result of increased initial action values, which are customarily set to zero and disregarded in reinforcement learning analysis. Our findings suggest that the perseverance model reveals the effects of psychoactive drugs on choice behaviors during PR tasks.
Collapse
Affiliation(s)
- Keiko Ihara
- Division of Brain Sciences, Institute for Advanced Medical Research, Keio University School of Medicine, Tokyo, Japan
| | - Yu Shikano
- Division of Brain Sciences, Institute for Advanced Medical Research, Keio University School of Medicine, Tokyo, Japan
- Department of Biology, Stanford University, Stanford, CA, United States
| | - Sae Kato
- Division of Brain Sciences, Institute for Advanced Medical Research, Keio University School of Medicine, Tokyo, Japan
| | - Sho Yagishita
- Center for Disease Biology and Integrative Medicine, Faculty of Medicine, The University of Tokyo, Tokyo, Japan
| | - Kenji F. Tanaka
- Division of Brain Sciences, Institute for Advanced Medical Research, Keio University School of Medicine, Tokyo, Japan
| | - Norio Takata
- Division of Brain Sciences, Institute for Advanced Medical Research, Keio University School of Medicine, Tokyo, Japan
| |
Collapse
|
6
|
Chierchia G, Soukupová M, Kilford EJ, Griffin C, Leung J, Palminteri S, Blakemore SJ. Confirmatory reinforcement learning changes with age during adolescence. Dev Sci 2023; 26:e13330. [PMID: 36194156 PMCID: PMC7615280 DOI: 10.1111/desc.13330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2021] [Revised: 07/26/2022] [Accepted: 09/20/2022] [Indexed: 11/26/2022]
Abstract
Understanding how learning changes during human development has been one of the long-standing objectives of developmental science. Recently, advances in computational biology have demonstrated that humans display a bias when learning to navigate novel environments through rewards and punishments: they learn more from outcomes that confirm their expectations than from outcomes that disconfirm them. Here, we ask whether confirmatory learning is stable across development, or whether it might be attenuated in developmental stages in which exploration is beneficial, such as in adolescence. In a reinforcement learning (RL) task, 77 participants aged 11-32 years (four men, mean age = 16.26) attempted to maximize monetary rewards by repeatedly sampling different pairs of novel options, which varied in their reward/punishment probabilities. Mixed-effect models showed an age-related increase in accuracy as long as learning contingencies remained stable across trials, but less so when they reversed halfway through the trials. Age was also associated with a greater tendency to stay with an option that had just delivered a reward, more than to switch away from an option that had just delivered a punishment. At the computational level, a confirmation model provided increasingly better fit with age. This model showed that age differences are captured by decreases in noise or exploration, rather than in the magnitude of the confirmation bias. These findings provide new insights into how learning changes during development and could help better tailor learning environments to people of different ages. RESEARCH HIGHLIGHTS: Reinforcement learning shows age-related improvement during adolescence, but more in stable learning environments compared with volatile learning environments. People tend to stay with an option after a win more than they shift from an option after a loss, and this asymmetry increases with age during adolescence. 
Computationally, these changes are captured by a developing confirmatory learning style, in which people learn more from outcomes that confirm rather than disconfirm their choices. Age-related differences in confirmatory learning are explained by decreases in stochasticity, rather than changes in the magnitude of the confirmation bias.
Collapse
Affiliation(s)
- Gabriele Chierchia
- Department of Psychology, University of Cambridge, UK
- Institute of Cognitive Neuroscience, University College London, UK
| | | | - Emma J. Kilford
- Institute of Cognitive Neuroscience, University College London, UK
- Department of Clinical, Educational and Health Psychology, University College London, UK
| | - Cait Griffin
- Institute of Cognitive Neuroscience, University College London, UK
| | - Jovita Leung
- Institute of Cognitive Neuroscience, University College London, UK
| | - Stefano Palminteri
- Institute of Cognitive Neuroscience, University College London, UK
- Department of Cognitive Science, École Normale Supérieure, FR
- Institute of Cognitive Neuroscience, HSE, Moscow, Federation of Russia
| | - Sarah-Jayne Blakemore
- Department of Psychology, University of Cambridge, UK
- Institute of Cognitive Neuroscience, University College London, UK
| |
Collapse
|
7
|
Doya K, Friston K, Sugiyama M, Tenenbaum J. Neural Networks special issue on Artificial Intelligence and Brain Science. Neural Netw 2022; 155:328-329. [PMID: 36099665 DOI: 10.1016/j.neunet.2022.08.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Kenji Doya
- Okinawa Institute of Science and Technology Graduate University, Japan.
| | | | | | - Josh Tenenbaum
- Massachusetts Institute of Technology, United States of America
| |
Collapse
|
8
|
Palminteri S, Lebreton M. The computational roots of positivity and confirmation biases in reinforcement learning. Trends Cogn Sci 2022; 26:607-621. [PMID: 35662490 DOI: 10.1016/j.tics.2022.04.005] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2021] [Revised: 04/13/2022] [Accepted: 04/18/2022] [Indexed: 12/16/2022]
Abstract
Humans do not integrate new information objectively: outcomes carrying a positive affective value and evidence confirming one's own prior belief are overweighed. Until recently, theoretical and empirical accounts of the positivity and confirmation biases assumed them to be specific to 'high-level' belief updates. We present evidence against this account. Learning rates in reinforcement learning (RL) tasks, estimated across different contexts and species, generally present the same characteristic asymmetry, suggesting that belief and value updating processes share key computational principles and distortions. This bias generates over-optimistic expectations about the probability of making the right choices and, consequently, generates over-optimistic reward expectations. We discuss the normative and neurobiological roots of these RL biases and their position within the greater picture of behavioral decision-making theories.
Collapse
Affiliation(s)
- Stefano Palminteri
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et Recherche Médicale, Paris, France; Département d'Études Cognitives, Ecole Normale Supérieure, Paris, France; Université de Recherche Paris Sciences et Lettres, Paris, France.
| | - Maël Lebreton
- Paris School of Economics, Paris, France; LabNIC, Department of Fundamental Neurosciences, University of Geneva, Geneva, Switzerland; Swiss Center for Affective Science, Geneva, Switzerland.
| |
Collapse
|