1
Ohta H, Nozawa T, Nakano T, Morimoto Y, Ishizuka T. Nonlinear age-related differences in probabilistic learning in mice: A 5-armed bandit task study. Neurobiol Aging 2024; 142:8-16. [PMID: 39029360] [DOI: 10.1016/j.neurobiolaging.2024.06.004]
Abstract
This study explores the impact of aging on reinforcement learning in mice, focusing on changes in learning rates and behavioral strategies. A 5-armed bandit task (5-ABT) and a computational Q-learning model were used to evaluate the positive and negative learning rates and the inverse temperature across three age groups (3, 12, and 18 months). Results showed a significant decline in the negative learning rate of 18-month-old mice that was not observed for the positive learning rate, suggesting that older mice retain the ability to learn from successful experiences while their ability to learn from negative outcomes declines. We also observed a significant age-dependent variation in inverse temperature, reflecting a shift in action selection policy. Middle-aged mice (12 months) exhibited a higher inverse temperature than both younger and older mice, indicating a greater reliance on previously rewarding experiences and reduced exploratory behavior. This study provides new insights into aging research by demonstrating that age-related differences in specific components of reinforcement learning follow a non-linear pattern.
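The dual-learning-rate Q-learning model described here can be sketched as follows. This is a generic illustration of the model class (separate learning rates for positive and negative prediction errors, softmax action selection governed by an inverse temperature), not the paper's fitted implementation; all parameter values and reward probabilities in the toy run are made up.

```python
import numpy as np

def softmax(q, beta):
    """Action probabilities under an inverse-temperature (beta) softmax."""
    e = np.exp(beta * (q - q.max()))  # subtract max for numerical stability
    return e / e.sum()

def q_update(q, action, reward, alpha_pos, alpha_neg):
    """Update one Q-value with separate learning rates for positive and
    negative reward prediction errors (the dual-learning-rate model)."""
    delta = reward - q[action]                  # reward prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    q = q.copy()
    q[action] += alpha * delta
    return q

# Toy run on a 5-armed bandit (arm 2 is best); all values are illustrative.
rng = np.random.default_rng(0)
p_reward = np.array([0.2, 0.2, 0.8, 0.2, 0.2])
q = np.zeros(5)
for _ in range(500):
    a = rng.choice(5, p=softmax(q, beta=3.0))
    r = float(rng.random() < p_reward[a])
    q = q_update(q, a, r, alpha_pos=0.3, alpha_neg=0.1)
```

A lower `alpha_neg`, as reported for the 18-month-old mice, makes the learner discount unrewarded outcomes while still updating normally after rewards.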
Affiliation(s)
- Hiroyuki Ohta
- Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan.
- Takashi Nozawa
- Mejiro University, 4-31-1 Naka-Ochiai, Shinjuku, Tokyo 161-8539, Japan
- Takashi Nakano
- Department of Computational Biology, School of Medicine, Fujita Health University, 1-98 Dengakugakubo, Kutsukake, Toyoake, Aichi 470-1192, Japan; International Center for Brain Science (ICBS), Fujita Health University, 1-98 Dengakugakubo, Kutsukake, Toyoake, Aichi 470-1192, Japan
- Yuji Morimoto
- Department of Physiology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan
- Toshiaki Ishizuka
- Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan
2
Neville V, Finnegan E, Paul ES, Davidson M, Dayan P, Mendl M. You are How You Eat: Foraging Behavior as a Potential Novel Marker of Rat Affective State. Affective Science 2024; 5:232-245. [PMID: 39391344] [PMCID: PMC11461729] [DOI: 10.1007/s42761-024-00242-4]
Abstract
Effective and safe foraging requires animals to behave according to the expectations they have about the rewards, threats, and costs in their environment. Since these factors are thought to be reflected in an animal's affective state, foraging behavior offers a window onto that state. In this study, rats completed a foraging task in which they repeatedly had to decide whether to continue harvesting a food source despite increasing time costs, or to forgo food and switch to a different food source. Rats completed this task across two experiments using manipulations designed to induce positive and negative, and shorter- and longer-term, changes in affective state: removal and return of enrichment (Experiment 1), implementation and reversal of an unpredictable housing treatment (Experiment 1), and delivery of rewards (tickling or sucrose) and punishers (air-puff or back-handling) immediately prior to testing (Experiment 2). In Experiment 1, rats completed fewer trials and were more prone to switching between troughs when housed in standard, compared to enriched, housing conditions. In Experiment 2, rats completed more trials following pre-test tickling compared to pre-test sucrose delivery. However, rats were also prone to disengaging from the task, suggesting they were in effect choosing between three options: 'harvest', 'switch', or 'not work'. This limits a straightforward interpretation of the results. At present, foraging behavior within the context of this task cannot reliably be used as an indicator of affective state in animals.
Affiliation(s)
- Vikki Neville
- Bristol Veterinary School, University of Bristol, Langford, UK
- Emily Finnegan
- Bristol Veterinary School, University of Bristol, Langford, UK
- Molly Davidson
- Bristol Veterinary School, University of Bristol, Langford, UK
- Peter Dayan
- Max Planck Institute for Biological Cybernetics & University of Tübingen, Tübingen, Germany
- Michael Mendl
- Bristol Veterinary School, University of Bristol, Langford, UK
3
Cinotti F, Coutureau E, Khamassi M, Marchand AR, Girard B. Regulation of reinforcement learning parameters captures long-term changes in rat behaviour. Eur J Neurosci 2024; 60:4469-4490. [PMID: 38923238] [DOI: 10.1111/ejn.16449]
Abstract
In uncertain environments in which resources fluctuate continuously, animals must continually decide whether to stabilise learning and exploit what they currently believe to be their best option, or instead explore potential alternatives and learn quickly from new observations. While this trade-off has been extensively studied in pretrained animals facing non-stationary decision-making tasks, it remains unknown how animals progressively tune it while learning the task structure during pretraining. Here, we compared the ability of different computational models to account for long-term changes in the behaviour of 24 rats as they learned to choose a rewarded lever in a three-armed bandit task across 24 days of pretraining. The day-by-day evolution of rat performance and win-shift tendency revealed a progressive stabilisation of the way the animals regulated their reinforcement learning parameters. We successfully captured these behavioural adaptations using a meta-learning model in which either the learning rate or the inverse temperature was controlled by the average reward rate.
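The closing sentence describes a meta-learning scheme in which the average reward rate regulates a reinforcement learning parameter. A minimal sketch of that idea, assuming a simple linear coupling between a running reward-rate estimate and the inverse temperature (the paper's actual regulation rule may differ, and all parameter values are illustrative):

```python
import numpy as np

def run_meta_bandit(p_reward, n_trials, alpha=0.1, alpha_rbar=0.02,
                    beta_scale=10.0, seed=0):
    """Q-learning on a bandit where the softmax inverse temperature is
    tied to a running average reward rate: as performance improves and
    stabilises, choices become more exploitative."""
    rng = np.random.default_rng(seed)
    n_arms = len(p_reward)
    q = np.zeros(n_arms)
    rbar = 0.0                       # running average reward rate
    betas = []
    for _ in range(n_trials):
        beta = beta_scale * rbar     # reward rate regulates exploration
        p = np.exp(beta * (q - q.max()))
        p /= p.sum()
        a = rng.choice(n_arms, p=p)
        r = float(rng.random() < p_reward[a])
        q[a] += alpha * (r - q[a])             # value update
        rbar += alpha_rbar * (r - rbar)        # reward-rate update
        betas.append(beta)
    return q, betas

q, betas = run_meta_bandit([0.1, 0.1, 0.9], n_trials=1000)
```

Early trials are maximally exploratory (beta starts at zero); as the reward rate climbs across "days" of training, the policy hardens, mirroring the progressive stabilisation described above.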
Affiliation(s)
- François Cinotti
- Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, CNRS, Paris, France
- University of Reading, School of Psychology and Clinical Language Sciences, Whiteknights, Reading, UK
- Mehdi Khamassi
- Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, CNRS, Paris, France
- Benoît Girard
- Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, CNRS, Paris, France
4
Lazebnik T, Golov Y, Gurka R, Harari A, Liberzon A. Exploration-exploitation model of moth-inspired olfactory navigation. J R Soc Interface 2024; 21:20230746. [PMID: 39013419] [PMCID: PMC11251768] [DOI: 10.1098/rsif.2023.0746]
Abstract
Navigation of male moths towards females during the mating search offers a unique perspective on the exploration-exploitation (EE) model of decision-making. This study uses the EE model to explain male moth pheromone-driven flight paths. Wind tunnel measurements and three-dimensional tracking using infrared cameras were leveraged to gain insights into male moth behaviour. During the wind tunnel experiments, a disturbance was added to the airflow, and the effect of the increased fluctuations on moth flights was analysed in the context of the proposed EE model. The exploration and exploitation phases were separated by applying a genetic algorithm to the experimentally obtained dataset of three-dimensional moth trajectories. We first demonstrate that the exploration-to-exploitation rate (EER) increases with distance from the source of the female pheromone, which can be explained in the context of the EE model. Furthermore, our findings reveal a compelling relationship between the EER and increased flow fluctuations near the pheromone source. Using an olfactory navigation simulation and our moth-inspired navigation model, we explain the phenomenon whereby male moths exhibit an enhanced EER as turbulence levels increase. This research extends our understanding of optimal navigation strategies based on general biological EE models and supports the development of bioinspired navigation algorithms.
Affiliation(s)
- Teddy Lazebnik
- Department of Mathematics, Ariel University, Ariel, Israel
- Department of Cancer Biology, Cancer Institute, University College London, London, UK
- Yiftach Golov
- Department of Entomology, The Volcani Center, Israel
- Roi Gurka
- Department of Physics and Engineering Science, Coastal Carolina University, Conway, SC, USA
- Ally Harari
- Department of Entomology, The Volcani Center, Israel
- Alex Liberzon
- Turbulence Structure Laboratory, School of Mechanical Engineering, Tel Aviv University, Tel Aviv, Israel
5
Kobayashi K, Kable JW. Neural mechanisms of information seeking. Neuron 2024; 112:1741-1756. [PMID: 38703774] [DOI: 10.1016/j.neuron.2024.04.008]
Abstract
We ubiquitously seek information to make better decisions. Particularly in the modern age, when more information is available at our fingertips than ever, the information we choose to collect determines the quality of our decisions. Decision neuroscience has long adopted empirical approaches in which the information available to decision-makers is fully controlled by the researchers, leaving the neural mechanisms of information seeking less understood. Although information seeking has long been studied in the context of the exploration-exploitation trade-off, recent studies have widened the scope to investigate more overt information seeking in a way distinct from other decision processes. Insights gained from these studies, accumulated over the last few years, raise the possibility that information seeking is driven by the reward system signaling the subjective value of information. Here, we review findings from these recent studies, highlighting the conceptual and empirical relationships between distinct literatures, and discuss future research directions necessary to establish a more comprehensive understanding of how individuals seek information as part of value-based decision-making.
Affiliation(s)
- Kenji Kobayashi
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Joseph W Kable
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA.
6
Montgomery SE, Li L, Russo SJ, Calipari ES, Nestler EJ, Morel C, Han MH. Mesolimbic Neural Response Dynamics Predict Future Individual Alcohol Drinking in Mice. Biol Psychiatry 2024; 95:951-962. [PMID: 38061466] [DOI: 10.1016/j.biopsych.2023.11.019]
Abstract
BACKGROUND: Individual variability in response to rewarding stimuli is a striking but understudied phenomenon. The mesolimbic dopamine system is critical in encoding the reinforcing properties of both natural reward and alcohol; however, how innate or baseline differences in the response dynamics of this circuit define individual behavior and shape future vulnerability to alcohol remains unknown.

METHODS: Using naturalistic behavioral assays, a voluntary alcohol drinking paradigm, in vivo fiber photometry, in vivo electrophysiology, and chemogenetics, we investigated how differences in mesolimbic neural circuit activity contribute to the individual variability seen in reward processing and, by proxy, alcohol drinking.

RESULTS: We first characterized heterogeneous behavioral and neural responses to natural reward and defined how these baseline responses predicted future individual alcohol-drinking phenotypes in male mice. We then determined spontaneous ventral tegmental area dopamine neuron firing profiles associated with responses to natural reward that predicted alcohol drinking. Using a dual chemogenetic approach, we mimicked specific mesolimbic dopamine neuron firing activity before or during voluntary alcohol drinking to link unique neurophysiological profiles to individual phenotype. We show that hyperdopaminergic individuals exhibit a lower neuronal response to both natural reward and alcohol that predicts lower levels of alcohol consumption in the future.

CONCLUSIONS: These findings reveal unique, circuit-specific neural signatures that predict future individual vulnerability or resistance to alcohol and expand the current knowledge base on how some individuals are able to titrate their alcohol consumption whereas others go on to engage in unhealthy alcohol-drinking behaviors.
Affiliation(s)
- Sarah E Montgomery
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Friedman Brain Institute and the Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
- Long Li
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Friedman Brain Institute and the Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York
- Scott J Russo
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Friedman Brain Institute and the Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York
- Erin S Calipari
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Friedman Brain Institute and the Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Departments of Pharmacology, Molecular Physiology and Biophysics, and Psychiatry and Behavioral Sciences, Vanderbilt Center for Addiction Research, Vanderbilt Brain Institute, Vanderbilt University, Nashville, Tennessee
- Eric J Nestler
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Friedman Brain Institute and the Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
- Carole Morel
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York.
- Ming-Hu Han
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Friedman Brain Institute and the Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Mental Health and Public Health, Faculty of Life and Health Sciences, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China.
7
Wang Y, Lak A, Manohar SG, Bogacz R. Dopamine encoding of novelty facilitates efficient uncertainty-driven exploration. PLoS Comput Biol 2024; 20:e1011516. [PMID: 38626219] [PMCID: PMC11051659] [DOI: 10.1371/journal.pcbi.1011516]
Abstract
When facing an unfamiliar environment, animals need to explore to gain new knowledge about which actions provide reward, but also to put the newly acquired knowledge to use as quickly as possible. Optimal reinforcement learning strategies should therefore assess the uncertainties of these action-reward associations and utilise them to inform decision making. We propose a novel model whereby the direct and indirect striatal pathways act together to estimate both the mean and variance of reward distributions, and mesolimbic dopaminergic neurons provide transient novelty signals, facilitating effective uncertainty-driven exploration. We utilised electrophysiological recording data to verify our model of the basal ganglia, and we fitted exploration strategies derived from the neural model to data from behavioural experiments. We also compared, in simulation, the performance of directed exploration strategies inspired by our basal ganglia model with other exploration algorithms, including classic variants of the upper confidence bound (UCB) strategy. The exploration strategies inspired by the basal ganglia model achieved overall superior performance in simulation, and fitting them to behavioural data gave qualitatively similar results to fitting more idealised normative models with less implementation-level detail. Overall, our results suggest that transient dopamine levels in the basal ganglia that encode novelty could contribute to an uncertainty representation that efficiently drives exploration in reinforcement learning.
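For reference, the classic upper confidence bound (UCB) strategy used here as a comparison baseline can be sketched as follows; the bonus constant `c` and the toy bandit are illustrative, not taken from the paper.

```python
import numpy as np

def ucb_choice(counts, means, t, c=1.0):
    """UCB1-style rule: value estimate plus an uncertainty bonus that
    shrinks as an arm is sampled more often."""
    bonus = c * np.sqrt(np.log(max(t, 1)) / np.maximum(counts, 1e-9))
    bonus[counts == 0] = np.inf   # force one pull of every untried arm
    return int(np.argmax(means + bonus))

# Toy run; reward probabilities are illustrative.
rng = np.random.default_rng(1)
p_reward = np.array([0.3, 0.7])
counts = np.zeros(2)
means = np.zeros(2)
for t in range(1, 201):
    a = ucb_choice(counts, means, t)
    r = float(rng.random() < p_reward[a])
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]   # incremental mean estimate
```

Directed exploration of this kind targets the less-sampled (more uncertain) arm, which is the role the model above assigns to transient dopamine novelty signals.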
Affiliation(s)
- Yuhao Wang
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
- Armin Lak
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Sanjay G. Manohar
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Rafal Bogacz
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
8
Venditto SJC, Miller KJ, Brody CD, Daw ND. Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning. bioRxiv 2024:2024.02.28.582617. [PMID: 38464244] [PMCID: PMC10925334] [DOI: 10.1101/2024.02.28.582617]
Abstract
Different brain systems have been hypothesized to subserve multiple "experts" that compete to generate behavior. In reinforcement learning, two general processes, one model-free (MF) and one model-based (MB), are often modeled as a mixture of agents (MoA) and hypothesized to capture differences between automaticity vs. deliberation. However, shifts in strategy cannot be captured by a static MoA. To investigate such dynamics, we present the mixture-of-agents hidden Markov model (MoA-HMM), which simultaneously learns inferred action values from a set of agents and the temporal dynamics of underlying "hidden" states that capture shifts in agent contributions over time. Applying this model to a multi-step, reward-guided task in rats reveals a progression of within-session strategies: a shift from initial MB exploration to MB exploitation, and finally to reduced engagement. The inferred states predict changes in both response time and OFC neural encoding during the task, suggesting that these states are capturing real shifts in dynamics.
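The core computation of a mixture-of-agents HMM can be sketched in a few lines: each hidden state mixes the agents' action values with its own weight vector before a softmax, and a standard forward-algorithm step updates the belief over states from observed choices. This is a schematic of the model class only, not the authors' implementation; the agents, weights, and transition matrix below are illustrative.

```python
import numpy as np

def moa_choice_probs(agent_values, weights):
    """Choice probabilities for a mixture of agents: each hidden state
    mixes the agents' action values with its own weights, then applies a
    softmax. agent_values: (n_agents, n_actions); weights: (n_states, n_agents)."""
    mixed = weights @ agent_values                       # (n_states, n_actions)
    e = np.exp(mixed - mixed.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def hmm_forward_step(alpha_prev, transition, obs_lik):
    """One forward step: propagate the state belief through the transition
    matrix, then reweight by the likelihood of the observed choice."""
    alpha = (alpha_prev @ transition) * obs_lik
    return alpha / alpha.sum()

# Two agents (e.g. MB-like and MF-like values) and two hidden states that
# weight them differently; all numbers are made up for illustration.
agent_values = np.array([[1.0, 0.0],    # agent 0 prefers action 0
                         [0.0, 1.0]])   # agent 1 prefers action 1
weights = np.array([[2.0, 0.0],         # state 0: rely on agent 0
                    [0.0, 2.0]])        # state 1: rely on agent 1
probs = moa_choice_probs(agent_values, weights)

transition = np.array([[0.95, 0.05],
                       [0.05, 0.95]])
belief = np.array([0.5, 0.5])
belief = hmm_forward_step(belief, transition, obs_lik=probs[:, 0])
```

Observing action 0 shifts the belief toward the state whose weights favor the agent that prefers action 0, which is how the model tracks within-session shifts in strategy.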
9
Lloyd A, Viding E, McKay R, Furl N. Understanding patch foraging strategies across development. Trends Cogn Sci 2023; 27:1085-1098. [PMID: 37500422] [DOI: 10.1016/j.tics.2023.07.004]
Abstract
Patch foraging is a near-ubiquitous behaviour across the animal kingdom and characterises many decision-making domains encountered by humans. We review how a disposition to explore in adolescence may reflect the evolutionary conditions under which hunter-gatherers foraged for resources. We propose that neurocomputational mechanisms responsible for reward processing, learning, and cognitive control facilitate the transition from exploratory strategies in adolescence to exploitative strategies in adulthood - where individuals capitalise on known resources. This developmental transition may be disrupted by psychopathology, as there is emerging evidence of biases in explore/exploit choices in mental health problems. Explore/exploit choices may be an informative marker for mental health across development and future research should consider this feature of decision-making as a target for clinical intervention.
Affiliation(s)
- Alex Lloyd
- Clinical, Educational, and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK.
- Essi Viding
- Clinical, Educational, and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK
- Ryan McKay
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX, UK
- Nicholas Furl
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX, UK
10
Lorents A, Colin ME, Bjerke IE, Nougaret S, Montelisciani L, Diaz M, Verschure P, Vezoli J. Human Brain Project Partnering Projects Meeting: Status Quo and Outlook. eNeuro 2023; 10:ENEURO.0091-23.2023. [PMID: 37669867] [PMCID: PMC10481639] [DOI: 10.1523/eneuro.0091-23.2023]
Abstract
As the European Flagship Human Brain Project (HBP) ends in September 2023, a meeting dedicated to the Partnering Projects (PPs), a collective of independent research groups that partnered with the HBP, was held on September 4-7, 2022. The purpose of this meeting was to allow these groups to present their results, reflect on their collaboration with the HBP, and discuss future interactions with the European Research Infrastructure (RI) EBRAINS that has emerged from the HBP. In this report, we share the advances that the Partnering Projects present at the meeting have made, together with the HBP, in furthering knowledge across various aspects of brain research. We briefly describe major achievements of the HBP Partnering Projects in terms of a systems-level understanding of the functional architecture of the brain and its possible emulation in artificial systems. We then recapitulate open discussions with EBRAINS representatives about the evolution of EBRAINS as a sustainable Research Infrastructure for the Partnering Projects after the HBP, and for the wider scientific community.
Affiliation(s)
- Ingvild Elise Bjerke
- Neural Systems Laboratory, Institute of Basic Medical Sciences, University of Oslo, Oslo 0372, Norway
- Simon Nougaret
- Institut de Neurosciences de la Timone, Unité Mixte de Recherche 7289, Aix Marseille Université, Centre National de la Recherche Scientifique, Marseille 13005, France
- Luca Montelisciani
- Cognitive and Systems Neuroscience Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam 1098XH, The Netherlands
- Marissa Diaz
- Institute for Advanced Simulation (IAS), Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, Jülich 52428, Germany
- Paul Verschure
- Donders Center for Neuroscience (DCN-FNWI), Radboud University, Nijmegen 6500HD, The Netherlands
- Julien Vezoli
- Ernst Strügmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main 60528, Germany
- Institut National de la Santé et de la Recherche Médicale Unité 1208, Stem Cell and Brain Research Institute, Université Claude Bernard Lyon 1, Bron 69500, France
11
Blackwell KT, Doya K. Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks. PLoS Comput Biol 2023; 19:e1011385. [PMID: 37594982] [PMCID: PMC10479916] [DOI: 10.1371/journal.pcbi.1011385]
Abstract
A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia, with two Q matrices, one representing direct pathway neurons (G) and another representing indirect pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is that the G and N matrices are updated using the temporal difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and differences are then resolved using a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks including extinction, renewal, and discrimination; switching reward probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices. These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight into the roles of direct- and indirect-pathway striatal neurons.
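The two-matrix idea can be sketched roughly as below. This is a loose illustration of the architecture the abstract describes (G and N matrices updated with a shared TD reward prediction error, a softmax per pathway, and a second selection step to resolve disagreements); the actual TD2Q update rules, adaptive exploration parameter, and selection details differ, and all numbers are illustrative.

```python
import numpy as np

def softmax(x, beta):
    e = np.exp(beta * (x - x.max()))
    return e / e.sum()

def td2q_step(G, N, state, action, reward, q_next, alpha=0.1, gamma=0.9):
    """One update of the two-matrix learner: the temporal-difference
    reward prediction error strengthens the chosen action in the
    direct-pathway matrix G and weakens it in the indirect-pathway
    matrix N (dopamine dips potentiate indirect-pathway neurons)."""
    delta = reward + gamma * q_next - (G[state, action] - N[state, action])
    G[state, action] += alpha * delta        # direct pathway: "Go"
    N[state, action] -= alpha * delta        # indirect pathway: "NoGo"
    return delta

def td2q_choose(G, N, state, beta, rng):
    """Pick a candidate action from each pathway, then resolve any
    disagreement with a second softmax over the candidates' net values."""
    a_g = rng.choice(G.shape[1], p=softmax(G[state], beta))
    a_n = rng.choice(N.shape[1], p=softmax(-N[state], beta))
    if a_g == a_n:
        return a_g
    vals = np.array([G[state, a] - N[state, a] for a in (a_g, a_n)])
    return (a_g, a_n)[rng.choice(2, p=softmax(vals, beta))]

# Toy single-state bandit run with illustrative parameters.
rng = np.random.default_rng(2)
G = np.zeros((1, 2))
N = np.zeros((1, 2))
p_reward = [0.2, 0.8]
for _ in range(300):
    a = td2q_choose(G, N, 0, beta=2.0, rng=rng)
    r = float(rng.random() < p_reward[a])
    td2q_step(G, N, 0, a, r, q_next=0.0)
```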
Affiliation(s)
- Kim T Blackwell
- Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia, United States of America
- Kenji Doya
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
12
Tranter MM, Aggarwal S, Young JW, Dillon DG, Barnes SA. Reinforcement learning deficits exhibited by postnatal PCP-treated rats enable deep neural network classification. Neuropsychopharmacology 2023; 48:1377-1385. [PMID: 36509858] [PMCID: PMC10354061] [DOI: 10.1038/s41386-022-01514-y]
Abstract
The ability to appropriately update the value of a given action is a critical component of flexible decision making. Several psychiatric disorders, including schizophrenia, are associated with impairments in flexible decision making that can be evaluated using the probabilistic reversal learning (PRL) task. The PRL task has been reverse-translated for use in rodents. Disrupting glutamate neurotransmission during early postnatal neurodevelopment in rodents has induced behavioral, cognitive, and neuropathophysiological abnormalities relevant to schizophrenia. Here, we tested the hypothesis that using the NMDA receptor antagonist phencyclidine (PCP) to disrupt postnatal glutamatergic transmission in rats would lead to impaired decision making in the PRL. Consistent with this hypothesis, compared to controls, the postnatal PCP-treated rats completed fewer reversals and exhibited disruptions in reward and punishment sensitivity (i.e., win-stay and lose-shift responding, respectively). Moreover, computational analysis of behavior revealed that postnatal PCP treatment resulted in a pronounced impairment in the learning rate throughout PRL testing. Finally, a deep neural network (DNN) trained on the rodent behavior could accurately predict the treatment group of subjects. These data demonstrate that disrupting early postnatal glutamatergic neurotransmission impairs flexible decision making and provide evidence that DNNs can be trained on behavioral datasets to accurately predict the treatment group of new subjects, highlighting the potential for DNNs to aid in the diagnosis of schizophrenia.
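The win-stay and lose-shift measures used here to quantify reward and punishment sensitivity are straightforward to compute from a choice/outcome sequence; a minimal sketch (the example sequences are illustrative, not data from the study):

```python
def win_stay_lose_shift(choices, rewards):
    """Proportion of trials on which the subject repeats a rewarded
    choice (win-stay) or abandons an unrewarded one (lose-shift)."""
    wins = shifts = win_n = lose_n = 0
    for prev_c, prev_r, cur_c in zip(choices, rewards, choices[1:]):
        if prev_r:
            win_n += 1
            wins += int(cur_c == prev_c)
        else:
            lose_n += 1
            shifts += int(cur_c != prev_c)
    win_stay = wins / win_n if win_n else float("nan")
    lose_shift = shifts / lose_n if lose_n else float("nan")
    return win_stay, lose_shift

# Illustrative sequence: 1 = left lever, 0 = right lever.
choices = [1, 1, 0, 1, 1, 0]
rewards = [1, 0, 1, 1, 0, 1]
ws, ls = win_stay_lose_shift(choices, rewards)  # win-stay 2/3, lose-shift 1.0
```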
Affiliation(s)
- Michael M Tranter
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Mental Health, VA San Diego Healthcare System, La Jolla, CA, 92093, USA
- Samarth Aggarwal
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
- Jared W Young
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Mental Health, VA San Diego Healthcare System, La Jolla, CA, 92093, USA
- Daniel G Dillon
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont, MA, 02478, USA
- Harvard Medical School, Boston, MA, 02115, USA
- Samuel A Barnes
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA.
- Department of Mental Health, VA San Diego Healthcare System, La Jolla, CA, 92093, USA.
13
Chen CS, Mueller D, Knep E, Ebitz RB, Grissom NM. Dopamine and norepinephrine differentially mediate the exploration-exploitation tradeoff. bioRxiv 2023:2023.01.09.523322. [PMID: 36711959] [PMCID: PMC9881999] [DOI: 10.1101/2023.01.09.523322]
Abstract
The catecholamines dopamine (DA) and norepinephrine (NE) have been repeatedly implicated in neuropsychiatric vulnerability, in part via their roles in mediating decision-making processes. Although the two neuromodulators share a synthesis pathway and are co-activated under states of arousal, they engage distinct circuits and play distinct roles in modulating neural activity across the brain. In the computational neuroscience literature, however, they have been assigned similar roles in modulating the latent cognitive processes of decision making, in particular the exploration-exploitation tradeoff. Revealing how each neuromodulator contributes to this explore-exploit process will be important in guiding mechanistic hypotheses emerging from computational psychiatric approaches. To understand how the roles of these two catecholamine systems in regulating exploration and exploitation differ and overlap, a direct comparison using the same dynamic decision-making task is needed. Here, we ran mice in a restless two-armed bandit task, which encourages both exploration and exploitation. We systemically administered a nonselective DA receptor antagonist (flupenthixol), a nonselective DA receptor agonist (apomorphine), a NE beta-receptor antagonist (propranolol), and a NE beta-receptor agonist (isoproterenol), and examined changes in exploration within subjects across sessions. We found a bidirectional modulatory effect of dopamine receptor activity on the level of exploration: increasing dopamine activity decreased exploration, and decreasing dopamine activity increased exploration. Beta-noradrenergic receptor activity also modulated exploration, but this effect depended on sex. Reinforcement learning model parameters suggested that dopamine modulation affected exploration via decision noise, whereas norepinephrine modulation affected exploration via outcome sensitivity. Together, these findings suggest that the mechanisms governing the transition between exploration and exploitation are sensitive to changes in both catecholamine systems, and they reveal differential roles for NE and DA in mediating exploration.
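The dissociation the abstract describes, decision noise versus outcome sensitivity, can be illustrated with a minimal softmax Q-learner. This is our own toy sketch, not the authors' fitted model; the parameter names (`beta`, `reward_scale`) and the static two-armed bandit are illustrative assumptions:

```python
import math
import random

def run_bandit(beta, reward_scale, n_trials=2000, alpha=0.3, seed=0):
    """Two-armed bandit with a softmax Q-learner (illustrative toy).

    beta         -- inverse temperature; lower beta means more decision noise,
                    the route proposed above for dopaminergic modulation
    reward_scale -- multiplier on received rewards; smaller values mimic
                    reduced outcome sensitivity, the proposed noradrenergic route
    Returns the fraction of non-greedy ("exploratory") choices.
    """
    rng = random.Random(seed)
    p_reward = [0.8, 0.2]   # static reward probabilities, for simplicity
    q = [0.0, 0.0]
    n_explore = 0
    for _ in range(n_trials):
        # softmax choice between the two arms
        p0 = math.exp(beta * q[0]) / (math.exp(beta * q[0]) + math.exp(beta * q[1]))
        choice = 0 if rng.random() < p0 else 1
        if q[choice] < max(q):          # chose the currently lower-valued arm
            n_explore += 1
        r = reward_scale * (1.0 if rng.random() < p_reward[choice] else 0.0)
        q[choice] += alpha * (r - q[choice])
    return n_explore / n_trials

# More decision noise (lower beta) and blunted outcomes (lower reward_scale)
# both raise exploration, but through different parameters of the model.
print(run_bandit(beta=5.0, reward_scale=1.0))   # baseline
print(run_bandit(beta=1.0, reward_scale=1.0))   # noisier decisions
print(run_bandit(beta=5.0, reward_scale=0.3))   # reduced outcome sensitivity
```

In a fit to choice data, these two parameters make distinguishable predictions, which is what lets the model attribute dopaminergic and noradrenergic effects to different latent processes.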
14
Wang S, Gerken B, Wieland JR, Wilson RC, Fellous JM. The effects of time horizon and guided choices on explore-exploit decisions in rodents. Behav Neurosci 2023; 137:127-142. [PMID: 36633987 PMCID: PMC10787949 DOI: 10.1037/bne0000549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Humans and animals have to balance the need to explore new options with exploiting known options that yield good outcomes. This tradeoff is known as the explore-exploit dilemma. To better understand the neural mechanisms underlying how humans and animals address the explore-exploit dilemma, a good animal behavioral model is critical. Most previous rodent explore-exploit studies used ethologically unrealistic operant boxes and reversal-learning paradigms, in which the decision to abandon a bad option is confounded with the need to explore a novel option to collect information, making it difficult to separate different drives and heuristics for exploration. In this study, we investigated how rodents make explore-exploit decisions using a spatial navigation horizon task (Wilson et al., 2014) adapted to rats to address the above limitations. We compared the rats' performance to that of humans using identical measures. We showed that rats use prior information to effectively guide exploration. In addition, rats use information-driven directed exploration like humans, but the extent to which they explore shows the opposite dependence on time horizon to that of humans. Moreover, we found that free choices and guided choices have different influences on exploration in rodents, a finding that has not yet been tested in humans. This study reveals that the explore-exploit spatial behavior of rats is more complex than previously thought. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
15
Speers LJ, Bilkey DK. Maladaptive explore/exploit trade-offs in schizophrenia. Trends Neurosci 2023; 46:341-354. [PMID: 36878821 DOI: 10.1016/j.tins.2023.02.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2022] [Revised: 01/30/2023] [Accepted: 02/08/2023] [Indexed: 03/07/2023]
Abstract
Schizophrenia is a complex disorder that remains poorly understood, particularly at the systems level. In this opinion article we argue that the explore/exploit trade-off concept provides a holistic and ecologically valid framework to resolve some of the apparent paradoxes that have emerged within schizophrenia research. We review recent evidence suggesting that fundamental explore/exploit behaviors may be maladaptive in schizophrenia during physical, visual, and cognitive foraging. We also describe how theories from the broader optimal foraging literature, such as the marginal value theorem (MVT), could provide valuable insight into how aberrant processing of reward, context, and cost/effort evaluations interact to produce maladaptive responses.
Affiliation(s)
- Lucinda J Speers: Department of Psychology, University of Otago, Dunedin 9016, New Zealand
- David K Bilkey: Department of Psychology, University of Otago, Dunedin 9016, New Zealand
16
Rojas GR, Curry-Pochy LS, Chen CS, Heller AT, Grissom NM. Sequential delay and probability discounting tasks in mice reveal anchoring effects partially attributable to decision noise. Behav Brain Res 2022; 431:113951. [PMID: 35661751 PMCID: PMC9844124 DOI: 10.1016/j.bbr.2022.113951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2021] [Revised: 05/20/2022] [Accepted: 05/29/2022] [Indexed: 01/19/2023]
Abstract
Delay discounting and probability discounting decision making tasks in rodent models have high translational potential. However, it is unclear whether the discounted value of the large reward option is the main contributor to variability in animals' choices in either task, which may limit translation to humans. Male and female mice underwent sessions of delay and probability discounting in sequence to assess how choice behavior adapts over experience with each task. To control for "anchoring" (persistent choices based on the initial delay or probability), mice experienced "Worsening" schedules where the large reward was offered under initially favorable conditions that became less favorable during testing, followed by "Improving" schedules where the large reward was offered under initially unfavorable conditions that improved over a session. During delay discounting, both male and female mice showed elimination of anchoring effects over training. In probability discounting, both sexes of mice continued to show some anchoring even after months of training. One possibility is that "noisy", exploratory choices could contribute to these persistent anchoring effects, rather than constant fluctuations in value discounting. We fit choice behavior in individual animals using models that included both a value-based discounting parameter and a decision noise parameter that captured variability in choices deviating from value maximization. Changes in anchoring behavior over time were tracked by changes in both the value and decision noise parameters in delay discounting, but by the decision noise parameter in probability discounting. Exploratory decision making was also reflected in choice response times that tracked the degree of conflict caused by both uncertainty and temporal cost, but was not linked with differences in locomotor activity reflecting chamber exploration. 
Thus, variable discounting behavior in mice can result from changes in exploration of the decision options rather than changes in reward valuation.
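The point that "indifferent-looking" choice can arise either from value discounting or from decision noise can be made concrete with standard hyperbolic discounting plus a softmax. This is a generic sketch with illustrative parameter values, not the authors' fitted model:

```python
import math

def p_choose_large(amount_large, delay, k, beta, amount_small=1.0):
    """Probability of picking the large-but-delayed option under hyperbolic
    discounting, V = A / (1 + k*delay), passed through a softmax.

    k    -- discounting parameter (the value-based component)
    beta -- inverse temperature; low beta = noisy, exploratory choices
    """
    v_large = amount_large / (1.0 + k * delay)
    v_small = amount_small
    return 1.0 / (1.0 + math.exp(-beta * (v_large - v_small)))

# Two very different parameter regimes can yield similar mid-range choice
# probabilities: value-driven indifference (discounting equates the options)
# versus noise-driven indifference (low beta flattens a real value gap).
print(p_choose_large(4.0, delay=30, k=0.1, beta=5.0))   # ~0.5, value-driven
print(p_choose_large(4.0, delay=30, k=0.01, beta=0.1))  # ~0.5, noise-driven
```

Fitting both parameters per animal, as the study does, is what allows persistent anchoring to be attributed to the noise parameter rather than to fluctuating discount rates.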
17
Karin O, Alon U. The dopamine circuit as a reward-taxis navigation system. PLoS Comput Biol 2022; 18:e1010340. [PMID: 35877694 PMCID: PMC9352198 DOI: 10.1371/journal.pcbi.1010340] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Revised: 08/04/2022] [Accepted: 06/29/2022] [Indexed: 01/29/2023] Open
Abstract
Studying the brain circuits that control behavior is challenging, since in addition to their structural complexity there are continuous feedback interactions between actions and sensed inputs from the environment. It is therefore important to identify mathematical principles that can be used to develop testable hypotheses. In this study, we use ideas and concepts from systems biology to study the dopamine system, which controls learning, motivation, and movement. Using data from neuronal recordings in behavioral experiments, we developed a mathematical model for dopamine responses and the effect of dopamine on movement. We show that the dopamine system shares core functional analogies with bacterial chemotaxis. Just as chemotaxis robustly climbs chemical attractant gradients, the dopamine circuit performs ‘reward-taxis’ where the attractant is the expected value of reward. The reward-taxis mechanism provides a simple explanation for scale-invariant dopaminergic responses and for matching in free operant settings, and makes testable quantitative predictions. We propose that reward-taxis is a simple and robust navigation strategy that complements other, more goal-directed navigation mechanisms.
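The chemotaxis analogy in this abstract can be sketched as a one-dimensional run-and-tumble agent that suppresses tumbling while the logarithm of reward is rising. This is our own toy inspired by the analogy, not the paper's model; the reward field and tumble probabilities are arbitrary illustrative choices. Using the log derivative is what makes the policy scale-invariant: multiplying the reward field by a constant leaves behavior unchanged.

```python
import math
import random

def reward_taxis(n_steps=5000, seed=1):
    """Run-and-tumble 'reward-taxis' toy: keep heading while log-reward is
    rising, tumble (re-randomize heading) more often otherwise."""
    rng = random.Random(seed)
    reward = lambda x: math.exp(-abs(x - 50.0) / 20.0)  # reward peak at x = 50
    x, heading = 0.0, 1.0
    prev = math.log(reward(x))
    for _ in range(n_steps):
        x += heading
        cur = math.log(reward(x))
        p_tumble = 0.1 if cur > prev else 0.5   # suppress tumbling when climbing
        if rng.random() < p_tumble:
            heading = rng.choice([-1.0, 1.0])
        prev = cur
    return x

print(reward_taxis())  # the agent drifts toward and hovers near the peak
```

As in bacterial chemotaxis, the agent never needs the absolute reward level, only its temporal trend, which is the robustness property the abstract highlights.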
Affiliation(s)
- Omer Karin: Dept. of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel; Dept. of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, University of Cambridge, Cambridge, United Kingdom; Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, United Kingdom
- Uri Alon: Dept. of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
18
Grzywacz NM, Aleem H. Does Amount of Information Support Aesthetic Values? Front Neurosci 2022; 16:805658. [PMID: 35392414 PMCID: PMC8982361 DOI: 10.3389/fnins.2022.805658] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2021] [Accepted: 02/16/2022] [Indexed: 11/24/2022] Open
Abstract
Obtaining information from the world is important for survival. The brain, therefore, has special mechanisms to extract as much information as possible from sensory stimuli. Hence, given its importance, the amount of available information may underlie aesthetic values. Such information-based aesthetic values would be significant because they would compete with others to drive decision-making. In this article, we ask, "What is the evidence that the amount of information supports aesthetic values?" An important concept in the measurement of informational volume is entropy. Research on aesthetic values has thus used Shannon entropy to evaluate the contribution of quantity of information. We review here the concepts of information and aesthetic values, and research on the visual and auditory systems, to probe whether the brain uses entropy or other relevant measures, especially Fisher information, in aesthetic decisions. We conclude that information measures contribute to these decisions in two ways. First, the absolute quantity of information can modulate aesthetic preferences for certain sensory patterns. However, the preference for volume of information is highly individualized, with information measures competing with organizing principles such as rhythm and symmetry. In addition, people tend to be resistant to too much entropy, but not necessarily to high amounts of Fisher information. We show that this resistance may stem in part from the distribution of amount of information in natural sensory stimuli. Second, the measurement of entropic-like quantities over time reveals that they can modulate aesthetic decisions by varying degrees of surprise given temporally integrated expectations. We propose that amount of information underpins complex aesthetic values, possibly informing the brain on the allocation of resources or the situational appropriateness of some cognitive models.
Affiliation(s)
- Norberto M. Grzywacz: Department of Psychology, Loyola University Chicago, Chicago, IL, United States; Department of Molecular Pharmacology and Neuroscience, Loyola University Chicago, Chicago, IL, United States; Interdisciplinary Program in Neuroscience, Georgetown University, Washington, DC, United States
- Hassan Aleem: Interdisciplinary Program in Neuroscience, Georgetown University, Washington, DC, United States
19
Faure P, Fayad SL, Solié C, Reynolds LM. Social Determinants of Inter-Individual Variability and Vulnerability: The Role of Dopamine. Front Behav Neurosci 2022; 16:836343. [PMID: 35386723 PMCID: PMC8979673 DOI: 10.3389/fnbeh.2022.836343] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2021] [Accepted: 02/14/2022] [Indexed: 11/13/2022] Open
Abstract
Individuals differ in their traits and preferences, which shape their interactions, their prospects for survival, and their susceptibility to diseases. These correlations are well documented, yet the neurophysiological mechanisms underlying the emergence of distinct personalities and their relation to vulnerability to diseases are poorly understood. Social ties, in particular, are thought to be major modulators of personality traits and psychiatric vulnerability, yet the majority of neuroscience studies are performed on rodents in socially impoverished conditions. Rodent micro-society paradigms are therefore key experimental tools for understanding how social life generates diversity by shaping individual traits. Dopamine circuitry is implicated at the interface between social life experiences, the expression of essential traits, and the emergence of pathologies, thus providing a possible mechanism linking these three concepts at a neuromodulatory level. Evaluating inter-individual variability in automated social testing environments shows great promise for improving our understanding of the link between social life, personality, and precision psychiatry, as well as for elucidating the underlying neurophysiological mechanisms.
20
The role of state uncertainty in the dynamics of dopamine. Curr Biol 2022; 32:1077-1087.e9. [PMID: 35114098 PMCID: PMC8930519 DOI: 10.1016/j.cub.2022.01.025] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2021] [Revised: 11/22/2021] [Accepted: 01/10/2022] [Indexed: 11/22/2022]
Abstract
Reinforcement learning models of the basal ganglia map the phasic dopamine signal to reward prediction errors (RPEs). Conventional models assert that, when a stimulus predicts a reward with fixed delay, dopamine activity during the delay should converge to baseline through learning. However, recent studies have found that dopamine ramps up before reward in certain conditions even after learning, thus challenging the conventional models. In this work, we show that sensory feedback causes an unbiased learner to produce RPE ramps. Our model predicts that when feedback gradually decreases during a trial, dopamine activity should resemble a "bump," whose ramp-up phase should, furthermore, be greater than that of conditions where the feedback stays high. We trained mice on a virtual navigation task with varying brightness, and both predictions were empirically observed. In sum, our theoretical and experimental results reconcile the seemingly conflicting data on dopamine behaviors under the RPE hypothesis.
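The "conventional" prediction the abstract starts from, that with a fixed cue-to-reward delay, prediction errors during the delay converge to baseline after learning, can be reproduced with tabular TD(0) on a simple state chain. This is a minimal textbook sketch of that baseline account only; it has no sensory feedback, so it produces no ramp or bump:

```python
def td_learning(n_episodes=500, n_states=10, alpha=0.1, gamma=1.0):
    """Tabular TD(0) on a fixed cue -> delay -> reward chain.
    Returns the per-state reward prediction errors (RPEs) of the final episode."""
    v = [0.0] * (n_states + 1)          # v[n_states] is terminal, stays 0
    deltas = []
    for _ in range(n_episodes):
        deltas = []
        for s in range(n_states):
            r = 1.0 if s == n_states - 1 else 0.0   # reward at chain end
            delta = r + gamma * v[s + 1] - v[s]     # TD error (the RPE)
            deltas.append(delta)
            v[s] += alpha * delta
    return deltas

final_rpes = td_learning()
print([round(d, 3) for d in final_rpes])  # all delay-period RPEs near zero
```

The paper's contribution is what happens when this picture is extended with state uncertainty and sensory feedback; in that case the learner's RPEs ramp or form a "bump" instead of flattening out.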
21
Mikhael JG, Gershman SJ. Impulsivity and risk-seeking as Bayesian inference under dopaminergic control. Neuropsychopharmacology 2022; 47:465-476. [PMID: 34376813 PMCID: PMC8674258 DOI: 10.1038/s41386-021-01125-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/01/2020] [Revised: 07/17/2021] [Accepted: 07/21/2021] [Indexed: 02/07/2023]
Abstract
Bayesian models successfully account for several of dopamine (DA)'s effects on contextual calibration in interval timing and reward estimation. In these models, tonic levels of DA control the precision of stimulus encoding, which is weighed against contextual information when making decisions. When DA levels are high, the animal relies more heavily on the (highly precise) stimulus encoding, whereas when DA levels are low, the context affects decisions more strongly. Here, we extend this idea to intertemporal choice and probability discounting tasks. In intertemporal choice tasks, agents must choose between a small reward delivered soon and a large reward delivered later, whereas in probability discounting tasks, agents must choose between a small reward that is always delivered and a large reward that may be omitted with some probability. Beginning with the principle that animals will seek to maximize their reward rates, we show that the Bayesian model predicts a number of curious empirical findings in both tasks. First, the model predicts that higher DA levels should normally promote selection of the larger/later option, which is often taken to imply that DA decreases 'impulsivity,' and promote selection of the large/risky option, often taken to imply that DA increases 'risk-seeking.' However, if the temporal precision is sufficiently decreased, higher DA levels should have the opposite effect-promoting selection of the smaller/sooner option (higher impulsivity) and the small/safe option (lower risk-seeking). Second, high enough levels of DA can result in preference reversals. Third, selectively decreasing the temporal precision, without manipulating DA, should promote selection of the larger/later and large/risky options. Fourth, when a different post-reward delay is associated with each option, animals will not learn the option-delay contingencies, but this learning can be salvaged when the post-reward delays are made more salient. 
Finally, the Bayesian model predicts correlations among behavioral phenotypes: Animals that are better timers will also appear less impulsive.
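The core computation in this Bayesian account is a precision-weighted combination of a noisy stimulus estimate with a contextual prior. The conjugate-Gaussian formula below is standard; mapping `obs_precision` to tonic dopamine is the paper's proposal, and the numbers are illustrative:

```python
def posterior_mean(obs, obs_precision, prior_mean, prior_precision):
    """Precision-weighted (conjugate Gaussian) combination of a noisy
    observation with a contextual prior. In the model above, higher tonic
    dopamine raises obs_precision, pulling the estimate toward the stimulus
    and away from the context."""
    w = obs_precision / (obs_precision + prior_precision)
    return w * obs + (1.0 - w) * prior_mean

# Timing a 10 s delay in a context whose mean delay is 20 s:
print(posterior_mean(10.0, 4.0, 20.0, 1.0))   # high DA: ~12, stimulus-dominated
print(posterior_mean(10.0, 0.25, 20.0, 1.0))  # low DA: ~18, context-dominated
```

Because choices in intertemporal and risky tasks depend on these estimates, shifting the weight `w` is enough to flip apparent "impulsivity" or "risk-seeking" without any change in reward preferences, which is the paper's central point.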
Affiliation(s)
- John G. Mikhael: Program in Neuroscience, Harvard Medical School, Boston, MA, USA; MD-PhD Program, Harvard Medical School, Boston, MA, USA
- Samuel J. Gershman: Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA; Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
22
Spreng RN, Turner GR. From exploration to exploitation: a shifting mental mode in late life development. Trends Cogn Sci 2021; 25:1058-1071. [PMID: 34593321 PMCID: PMC8844884 DOI: 10.1016/j.tics.2021.09.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2021] [Revised: 08/30/2021] [Accepted: 09/01/2021] [Indexed: 12/31/2022]
Abstract
Changes in cognition, affect, and brain function combine to promote a shift in the nature of mentation in older adulthood, favoring exploitation of prior knowledge over exploratory search as the starting point for thought and action. Age-related exploitation biases result from the accumulation of prior knowledge, reduced cognitive control, and a shift toward affective goals. These are accompanied by changes in cortical networks, as well as attention and reward circuits. By incorporating these factors into a unified account, the exploration-to-exploitation shift offers an integrative model of cognitive, affective, and brain aging. Here, we review evidence for this model, identify determinants and consequences, and survey the challenges and opportunities posed by an exploitation-biased mental mode in later life.
Affiliation(s)
- R Nathan Spreng: Laboratory of Brain and Cognition, Montreal Neurological Institute, Department of Neurology and Neurosurgery, McGill University, Montreal, QC H3A 2B4, Canada; McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada; Departments of Psychiatry and Psychology, McGill University, Montreal, QC H3A 0G4, Canada
- Gary R Turner: Department of Psychology, York University, Toronto, ON M3J 1P3, Canada
23
Adaptive exploration policy for exploration–exploitation tradeoff in continuous action control optimization. Int J Mach Learn Cybern 2021. [DOI: 10.1007/s13042-021-01387-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
24
Chronic nicotine increases midbrain dopamine neuron activity and biases individual strategies towards reduced exploration in mice. Nat Commun 2021; 12:6945. [PMID: 34836948 PMCID: PMC8635406 DOI: 10.1038/s41467-021-27268-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2021] [Accepted: 11/04/2021] [Indexed: 11/09/2022] Open
Abstract
Long-term exposure to nicotine alters brain circuits and induces profound changes in decision-making strategies, affecting behaviors both related and unrelated to drug seeking and consumption. Using an intracranial self-stimulation reward-based foraging task, we investigated in mice the impact of chronic nicotine on midbrain dopamine neuron activity and its consequence on the trade-off between exploitation and exploration. Model-based and archetypal analysis revealed substantial inter-individual variability in decision-making strategies, with mice passively exposed to nicotine shifting toward a more exploitative profile compared to non-exposed animals. We then mimicked the effect of chronic nicotine on the tonic activity of dopamine neurons using optogenetics, and found that photo-stimulated mice adopted a behavioral phenotype similar to that of mice exposed to chronic nicotine. Our results reveal a key role of tonic midbrain dopamine in the exploration/exploitation trade-off and highlight a potential mechanism by which nicotine affects the exploration/exploitation balance and decision-making.
25
Chen CS, Knep E, Han A, Ebitz RB, Grissom N. Sex differences in learning from exploration. eLife 2021; 10:e69748. [PMID: 34796870 PMCID: PMC8794469 DOI: 10.7554/eLife.69748] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2021] [Accepted: 11/18/2021] [Indexed: 11/13/2022] Open
Abstract
Sex-based modulation of cognitive processes could set the stage for individual differences in vulnerability to neuropsychiatric disorders. While value-based decision making processes in particular have been proposed to be influenced by sex differences, overall correct performance in decision making tasks often shows variable or minimal differences across sexes. Computational tools allow us to uncover latent variables that define different decision making approaches, even in animals with similar correct performance. Here, we quantify sex differences in mice in the latent variables underlying behavior in a classic value-based decision making task: a restless 2-armed bandit. While male and female mice had similar accuracy, they achieved this performance via different patterns of exploration. Male mice tended to make more exploratory choices overall, largely because they appeared to get 'stuck' in exploration once they had started. Female mice tended to explore less but learned more quickly during exploration. Together, these results suggest that sex exerts stronger influences on decision making during periods of learning and exploration than during stable choices. Exploration during decision making is altered in people diagnosed with addictions, depression, and neurodevelopmental disabilities, pinpointing the neural mechanisms of exploration as a highly translational avenue for studying sex-modulated vulnerability to neuropsychiatric diagnoses.
Affiliation(s)
- Cathy S Chen: University of Minnesota, Minneapolis, United States
- Evan Knep: University of Minnesota, Minneapolis, United States
- Autumn Han: University of Minnesota, Minneapolis, United States
- R Becket Ebitz: Department of Neurosciences, Princeton University, Princeton, United States
26
Hamelin H, Poizat G, Florian C, Kursa MB, Pittaras E, Callebert J, Rampon C, Taouis M, Hamed A, Granon S. Prolonged Consumption of Sweetened Beverages Lastingly Deteriorates Cognitive Functions and Reward Processing in Mice. Cereb Cortex 2021; 32:1365-1378. [PMID: 34491298 DOI: 10.1093/cercor/bhab274] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2021] [Revised: 07/10/2021] [Accepted: 07/12/2021] [Indexed: 12/25/2022] Open
Abstract
We investigated the detrimental effects of chronic consumption of sweet or sweetened beverages in mice. We report that consumption of beverages containing small amounts of sucrose during several weeks impaired reward systems. This is evidenced by robust changes in the activation pattern of prefrontal brain regions associated with abnormal risk-taking and delayed establishment of decision-making strategy. Supporting these findings, we find that chronic consumption of low doses of artificial sweeteners such as saccharin disrupts brain regions' activity engaged in decision-making and reward processes. Consequently, this leads to the rapid development of inflexible decisions, particularly in a subset of vulnerable individuals. Our data also reveal that regular consumption, even at low doses, of sweet or sweeteners dramatically alters brain neurochemistry, i.e., dopamine content and turnover, and high cognitive functions, while sparing metabolic regulations. Our findings suggest that it would be relevant to focus on long-term consequences on the brain of sweet or sweetened beverages in humans, especially as they may go metabolically unnoticed.
Affiliation(s)
- Héloïse Hamelin: Université Paris-Saclay, CNRS, Institut des Neurosciences Paris-Saclay, 91190 Gif-sur-Yvette, France
- Ghislaine Poizat: Université Paris-Saclay, CNRS, Institut des Neurosciences Paris-Saclay, 91190 Gif-sur-Yvette, France
- Cédrick Florian: Research Center on Animal Cognition (CRCA), Center for Integrative Biology, CNRS UMR 5169, Toulouse 31062, France
- Miron Bartosz Kursa: Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, 02-106 Warsaw, Poland
- Elsa Pittaras: Stanford University, Heller Laboratory, Stanford, CA 94305-5020, USA
- Jacques Callebert: Service of Biochemistry and Molecular Biology, INSERM U942, Hospital Lariboisière, APHP, Paris 75010, France
- Claire Rampon: Research Center on Animal Cognition (CRCA), Center for Integrative Biology, CNRS UMR 5169, Toulouse 31062, France
- Mohammed Taouis: Université Paris-Saclay, CNRS, Institut des Neurosciences Paris-Saclay, 91190 Gif-sur-Yvette, France
- Adam Hamed: Laboratory of Spatial Memory, Nencki Institute of Experimental Biology, Polish Academy of Sciences, 02-093 Warsaw, Poland
- Sylvie Granon: Université Paris-Saclay, CNRS, Institut des Neurosciences Paris-Saclay, 91190 Gif-sur-Yvette, France
27
Foo C, Lozada A, Aljadeff J, Li Y, Wang JW, Slesinger PA, Kleinfeld D. Reinforcement learning links spontaneous cortical dopamine impulses to reward. Curr Biol 2021; 31:4111-4119.e4. [PMID: 34302743 DOI: 10.1016/j.cub.2021.06.069] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2021] [Revised: 05/28/2021] [Accepted: 06/24/2021] [Indexed: 11/15/2022]
Abstract
In their pioneering study on dopamine release, Romo and Schultz speculated "...that the amount of dopamine released by unmodulated spontaneous impulse activity exerts a tonic, permissive influence on neuronal processes more actively engaged in preparation of self-initiated movements...."1 Motivated by the suggestion of "spontaneous impulses," as well as by the "ramp up" of dopaminergic neuronal activity that occurs when rodents navigate to a reward,2-5 we asked two questions. First, are there spontaneous impulses of dopamine that are released in cortex? Using cell-based optical sensors of extrasynaptic dopamine, [DA]ex,6 we found that spontaneous dopamine impulses in cortex of naive mice occur at a rate of ∼0.01 per second. Next, can mice be trained to change the amplitude and/or timing of dopamine events triggered by internal brain dynamics, much as they can change the amplitude and timing of dopamine impulses based on an external cue?7-9 Using a reinforcement learning paradigm based solely on rewards that were gated by feedback from real-time measurements of [DA]ex, we found that mice can volitionally modulate their spontaneous [DA]ex. In particular, by only the second session of daily, hour-long training, mice increased the rate of impulses of [DA]ex, increased the amplitude of the impulses, and increased their tonic level of [DA]ex for a reward. Critically, mice learned to reliably elicit [DA]ex impulses prior to receiving a reward. These effects reversed when the reward was removed. We posit that spontaneous dopamine impulses may serve as a salient cognitive event in behavioral planning.
Affiliation(s)
- Conrad Foo: Department of Physics, University of California at San Diego, La Jolla, CA 92093, USA
- Adrian Lozada: Department of Physics, University of California at San Diego, La Jolla, CA 92093, USA
- Johnatan Aljadeff: Section of Neurobiology, University of California at San Diego, La Jolla, CA 92093, USA
- Yulong Li: School of Life Sciences, Peking University, Beijing 100871, P.R. China
- Jing W Wang: Section of Neurobiology, University of California at San Diego, La Jolla, CA 92093, USA
- Paul A Slesinger: Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- David Kleinfeld: Department of Physics, University of California at San Diego, La Jolla, CA 92093, USA; Section of Neurobiology, University of California at San Diego, La Jolla, CA 92093, USA
28
Koralek AC, Costa RM. Dichotomous dopaminergic and noradrenergic neural states mediate distinct aspects of exploitative behavioral states. Sci Adv 2021; 7(30):eabh2059. [PMID: 34301604 PMCID: PMC8302134 DOI: 10.1126/sciadv.abh2059] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Accepted: 06/07/2021] [Indexed: 06/13/2023]
Abstract
The balance between exploiting known actions and exploring alternatives is critical for survival and hypothesized to rely on shifts in neuromodulation. We developed a behavioral paradigm to capture exploitative and exploratory states and imaged calcium dynamics in genetically identified dopaminergic and noradrenergic neurons. During exploitative states, characterized by motivated repetition of the same action choice, dopamine neurons in SNc encoding movement vigor showed sustained elevation of basal activity that lasted many seconds. This sustained activity emerged from longer positive responses, which accumulated during exploitative action-reward bouts, and hysteretic dynamics. Conversely, noradrenergic neurons in LC showed sustained inhibition of basal activity due to the accumulation of longer negative responses in LC. Chemogenetic manipulation of these sustained dynamics revealed that dopaminergic activity mediates action drive, whereas noradrenergic activity modulates choice diversity. These data uncover the emergence of sustained neural states in dopaminergic and noradrenergic networks that mediate dissociable aspects of exploitative bouts.
Affiliation(s)
- Aaron C Koralek
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
| | - Rui M Costa
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA.
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
29
Guo D, Yu AJ. Revisiting the Role of Uncertainty-Driven Exploration in a (Perceived) Non-Stationary World. CogSci ... Annual Conference of the Cognitive Science Society 2021; 43:2045-2051. [PMID: 34368809 PMCID: PMC8341546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Humans are often faced with an exploration-versus-exploitation trade-off. A commonly used paradigm, the multi-armed bandit task, has shown that humans exhibit an "uncertainty bonus", which combines with estimated reward to drive exploration. However, previous studies often modeled belief updating using either a Bayesian model that assumed the reward contingency to remain stationary, or a reinforcement learning model. Separately, we previously showed that human learning in the bandit task is best captured by a dynamic-belief Bayesian model. We hypothesize that the estimated uncertainty bonus may depend on which learning model is employed. Here, we re-analyze a bandit dataset using all three learning models. We find that the dynamic-belief model captures human choice behavior best, while also uncovering a much larger uncertainty bonus than the other models. More broadly, our results emphasize the importance of choosing an appropriate learning model, as it is crucial for correctly characterizing the processes underlying human decision making.
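The dynamic-belief idea, that the reward contingency may change between trials so old evidence should decay toward the prior, can be sketched for a single Bernoulli arm on a discretized probability grid. This is a schematic of the model class rather than the authors' exact model; the grid size and the stay-probability gamma are illustrative choices.

```python
# Dynamic-belief update for one bandit arm: with probability 1 - gamma the
# reward rate is assumed to reset to the prior between trials, so old
# evidence gradually decays instead of accumulating forever.
N = 101
grid = [i / (N - 1) for i in range(N)]        # candidate reward rates in [0, 1]
prior = [1.0 / N] * N                         # uniform prior over the grid
belief = prior[:]

def observe(belief, reward, gamma=0.9):
    # Mix the current belief with the prior to model possible change points
    predicted = [gamma * b + (1 - gamma) * p for b, p in zip(belief, prior)]
    # Bernoulli likelihood of the observed outcome under each candidate rate
    like = [g if reward else (1 - g) for g in grid]
    post = [l * q for l, q in zip(like, predicted)]
    z = sum(post)
    return [q / z for q in post]

# One rewarded trial shifts the posterior mean toward higher reward rates
belief = observe(belief, reward=1)
mean_rate = sum(g * b for g, b in zip(grid, belief))
```

Under a stationary Bayesian model (gamma = 1) the same code reduces to ordinary conjugate updating; lowering gamma is what keeps the posterior, and hence the uncertainty bonus, from collapsing over trials.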
Affiliation(s)
- Dalin Guo
- Department of Cognitive Science, University of California, San Diego, La Jolla, CA 92093, USA
- Angela J Yu
- Department of Cognitive Science & Halıcıoglu Data Science Institute, University of California, San Diego, La Jolla, CA 92093, USA
30
Ohta H, Satori K, Takarada Y, Arake M, Ishizuka T, Morimoto Y, Takahashi T. The asymmetric learning rates of murine exploratory behavior in sparse reward environments. Neural Netw 2021; 143:218-229. [PMID: 34157646 DOI: 10.1016/j.neunet.2021.05.030] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/09/2021] [Revised: 04/16/2021] [Accepted: 05/26/2021] [Indexed: 11/29/2022]
Abstract
Goal-oriented behaviors of animals can be modeled by reinforcement learning algorithms. Such algorithms predict future outcomes of selected actions utilizing action values and update those values in response to positive and negative outcomes. In many models of animal behavior, the action values are updated symmetrically based on a common learning rate, that is, in the same way for both positive and negative outcomes. However, animals in environments with scarce rewards may have uneven learning rates. To investigate the asymmetry in learning rates between reward and non-reward, we analyzed the exploration behavior of mice in five-armed bandit tasks using a Q-learning model with differential learning rates for positive and negative outcomes. The positive learning rate was significantly higher in a scarce reward environment than in a rich reward environment, and conversely, the negative learning rate was significantly lower in the scarce environment. The ratio of positive to negative learning rates was about 10 in the scarce environment and about 2 in the rich environment. This result suggests that when the reward probability was low, the mice tended to ignore failures and exploit the rare rewards. Computational modeling analysis revealed that the increased learning-rate ratio could cause overestimation of, and perseveration on, rarely rewarding events, increasing total reward acquisition in the scarce environment but disadvantaging impartial exploration.
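The core update rule described in this abstract, one learning rate for positive prediction errors and another for negative ones combined with softmax action selection, can be sketched as follows. This is a minimal illustration of the model class, not the authors' code; the arm count matches the five-armed task, but the learning rates, inverse temperature, and reward probabilities below are hypothetical.

```python
import math
import random

def softmax_choice(q, beta):
    """Pick an arm with probability proportional to exp(beta * Q)."""
    weights = [math.exp(beta * v) for v in q]
    r = random.random() * sum(weights)
    for arm, w in enumerate(weights):
        r -= w
        if r <= 0:
            return arm
    return len(q) - 1

def update(q, arm, reward, alpha_pos, alpha_neg):
    """Asymmetric Q-learning: separate learning rates for positive
    and negative reward prediction errors."""
    delta = reward - q[arm]
    alpha = alpha_pos if delta > 0 else alpha_neg
    q[arm] += alpha * delta

# Hypothetical scarce-reward setting: alpha_pos >> alpha_neg (ratio ~10),
# so failures are largely ignored and rare rewards dominate the values.
q = [0.0] * 5
for _ in range(1000):
    arm = softmax_choice(q, beta=5.0)
    reward = 1.0 if random.random() < (0.3 if arm == 0 else 0.05) else 0.0
    update(q, arm, reward, alpha_pos=0.4, alpha_neg=0.04)
```

Setting alpha_pos equal to alpha_neg recovers the symmetric model the abstract contrasts against.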
Affiliation(s)
- Hiroyuki Ohta
- Department of Pharmacology, National Defense Medical College, Saitama, 359-8513, Japan.
- Yu Takarada
- Tokyo Denki University, Saitama, 350-0394, Japan
- Masashi Arake
- Department of Physiology, National Defense Medical College, Saitama, 359-8513, Japan
- Toshiaki Ishizuka
- Department of Pharmacology, National Defense Medical College, Saitama, 359-8513, Japan
- Yuji Morimoto
- Department of Physiology, National Defense Medical College, Saitama, 359-8513, Japan
31
Gilbertson T, Steele D. Tonic dopamine, uncertainty and basal ganglia action selection. Neuroscience 2021; 466:109-124. [PMID: 34015370 DOI: 10.1016/j.neuroscience.2021.05.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/18/2020] [Revised: 05/04/2021] [Accepted: 05/08/2021] [Indexed: 11/29/2022]
Abstract
To make optimal decisions in uncertain circumstances, flexible adaptation of behaviour is required: exploring alternatives when the best choice is unknown, and exploiting what is known when that is best. Using a computational model of the basal ganglia, we propose that switches between exploratory and exploitative decisions are mediated by the interaction between tonic dopamine and cortical input to the basal ganglia. We show that a biologically detailed action selection circuit model, endowed with dopamine-dependent striatal plasticity, can optimally solve the explore-exploit problem, estimating the true underlying state of a noisy Gaussian diffusion process. Critical to the model's performance was a fluctuating level of tonic dopamine which increased under conditions of uncertainty. With an optimal range of tonic dopamine, explore-exploit decisions were mediated by the effects of tonic dopamine on the precision of the model's action selection mechanism. Under conditions of uncertain reward pay-out, the model's reduced selectivity allowed disinhibition of multiple alternative actions to be explored at random. Conversely, when uncertainty about reward pay-out was low, enhanced selectivity of the action selection circuit facilitated exploitation of the high-value choice. Model performance was at the level of a Kalman filter, which provides an optimal solution for the task. These simulations support the idea that this subcortical neural circuit may have evolved to facilitate decision making in non-stationary reward environments. The model generates several experimental predictions with relevance to abnormal decision making in neuropsychiatric and neurological disease.
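The Kalman filter that serves as the optimality benchmark here has a compact scalar form for tracking a Gaussian diffusion process. This is a generic textbook formulation, not the authors' basal ganglia model; the drift and observation-noise variances below are illustrative.

```python
import random

def kalman_step(mean, var, obs, q_drift, r_obs):
    """One predict-update cycle of a scalar Kalman filter tracking a
    latent mean that drifts as a Gaussian random walk."""
    # Predict: the latent state diffuses, so uncertainty grows
    var += q_drift
    # Update: blend prediction and observation by the Kalman gain
    gain = var / (var + r_obs)
    mean += gain * (obs - mean)
    var *= (1 - gain)
    return mean, var

# Track a hidden random walk from noisy reward observations
# (all variances here are made-up illustrative values)
random.seed(1)
truth, mean, var = 0.0, 0.0, 1.0
for _ in range(500):
    truth += random.gauss(0.0, 0.1)        # Gaussian diffusion of the true state
    obs = truth + random.gauss(0.0, 0.5)   # noisy observation of reward pay-out
    mean, var = kalman_step(mean, var, obs, q_drift=0.01, r_obs=0.25)
```

The posterior variance tracked here plays the role of the uncertainty signal that, in the circuit model, is proposed to drive tonic dopamine up and selectivity down.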
Affiliation(s)
- Tom Gilbertson
- Department of Neurology, Level 6, South Block, Ninewells Hospital & Medical School, Dundee DD2 4BF, UK; Division of Imaging Science and Technology, Medical School, University of Dundee, DD2 4BF, UK.
- Douglas Steele
- Division of Imaging Science and Technology, Medical School, University of Dundee, DD2 4BF, UK
32
Marzecová A, Kaiser LF, Maddah A. Neuromodulation of Foraging Decisions: The Role of Dopamine. Front Behav Neurosci 2021; 15:660667. [PMID: 33927602 PMCID: PMC8076528 DOI: 10.3389/fnbeh.2021.660667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2021] [Accepted: 03/15/2021] [Indexed: 11/22/2022]
Affiliation(s)
- Anna Marzecová
- Department of Experimental Psychology, Ghent University, Ghent, Belgium; Institute of Experimental Psychology, Heinrich-Heine University, Düsseldorf, Germany
- Luca F Kaiser
- Institute of Experimental Psychology, Heinrich-Heine University, Düsseldorf, Germany
- Armin Maddah
- Institute of Experimental Psychology, Heinrich-Heine University, Düsseldorf, Germany
33
Wilson RC, Bonawitz E, Costa VD, Ebitz RB. Balancing exploration and exploitation with information and randomization. Curr Opin Behav Sci 2021; 38:49-56. [PMID: 33184605 PMCID: PMC7654823 DOI: 10.1016/j.cobeha.2020.10.001] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Indexed: 12/28/2022]
Abstract
Explore-exploit decisions require us to trade off the benefits of exploring unknown options to learn more about them against exploiting known options for immediate reward. Such decisions are ubiquitous in nature, but from a computational perspective they are notoriously hard. There is therefore much interest in how humans and animals make these decisions, and recently there has been an explosion of research in this area. Here we provide a biased and incomplete snapshot of this field, focusing on the major finding that many organisms use two distinct strategies to solve the explore-exploit dilemma: a bias for information ('directed exploration') and the randomization of choice ('random exploration'). We review evidence for the existence of these strategies, their computational properties, their neural implementations, and how directed and random exploration vary over the lifespan. We conclude by highlighting open questions in this field that are ripe to both explore and exploit.
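The two strategies can be sketched in a single choice rule: an uncertainty-dependent information bonus implements directed exploration, while a softmax temperature implements random exploration. This is an illustrative composite, not any specific model from the reviewed literature; the UCB-style bonus, the function name, and all parameter values are assumptions.

```python
import math
import random

def choose_arm(values, counts, bonus_weight, temperature):
    """Combine directed exploration (an information bonus on rarely
    sampled options) with random exploration (softmax choice noise)."""
    total_n = sum(counts) + 1
    # Directed: the bonus shrinks as an option is sampled more often
    scored = [v + bonus_weight * math.sqrt(math.log(total_n) / (c + 1))
              for v, c in zip(values, counts)]
    # Random: softmax over scores; higher temperature means noisier choices.
    # Subtracting the max keeps the exponentials numerically stable.
    top = max(scored)
    weights = [math.exp((s - top) / temperature) for s in scored]
    r = random.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(values) - 1

# Illustrative call: arm values, how often each has been sampled
arm = choose_arm([0.4, 0.1, 0.0], [30, 5, 1], bonus_weight=0.3, temperature=0.2)
```

Setting bonus_weight to zero leaves pure random exploration; sending temperature toward zero leaves pure directed exploration, which is how the two strategies are dissociated experimentally.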
Affiliation(s)
- Robert C. Wilson
- Department of Psychology, University of Arizona, Tucson, AZ, USA
- Cognitive Science Program, University of Arizona, Tucson, AZ, USA
- Evelyn F. McKnight Brain Institute, University of Arizona, Tucson, AZ, USA
- Vincent D. Costa
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
- R. Becket Ebitz
- Department of Neuroscience, University of Montréal, Montréal, Québec, Canada
34
Wiehler A, Chakroun K, Peters J. Attenuated Directed Exploration during Reinforcement Learning in Gambling Disorder. J Neurosci 2021; 41:2512-2522. [PMID: 33531415 PMCID: PMC7984586 DOI: 10.1523/jneurosci.1607-20.2021] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/26/2020] [Revised: 01/18/2021] [Accepted: 01/22/2021] [Indexed: 12/30/2022]
Abstract
Gambling disorder (GD) is a behavioral addiction associated with impairments in value-based decision-making and behavioral flexibility and might be linked to changes in the dopamine system. Maximizing long-term rewards requires a flexible trade-off between the exploitation of known options and the exploration of novel options for information gain. This exploration-exploitation trade-off is thought to depend on dopamine neurotransmission. We hypothesized that human gamblers would show a reduction in directed (uncertainty-based) exploration, accompanied by changes in brain activity in a fronto-parietal exploration-related network. Twenty-three frequent, non-treatment-seeking gamblers and twenty-three healthy matched controls (all male) performed a four-armed bandit task during functional magnetic resonance imaging (fMRI). Computational modeling using hierarchical Bayesian parameter estimation revealed signatures of directed exploration, random exploration, and perseveration in both groups. Gamblers showed a reduction in directed exploration, whereas random exploration and perseveration were similar between groups. Neuroimaging revealed no evidence for group differences in neural representations of basic task variables (expected value, prediction errors). Our hypothesis of reduced frontal pole (FP) recruitment in gamblers was not supported. Exploratory analyses showed that during directed exploration, gamblers showed reduced parietal cortex and substantia-nigra/ventral-tegmental-area activity. Cross-validated classification analyses revealed that connectivity in an exploration-related network was predictive of group status, suggesting that connectivity patterns might be more predictive of problem gambling than univariate effects. Findings reveal specific reductions of strategic exploration in gamblers that might be linked to altered processing in a fronto-parietal network and/or changes in dopamine neurotransmission implicated in GD.
SIGNIFICANCE STATEMENT Wiehler et al. (2021) report that gamblers rely less on the strategic exploration of unknown, but potentially better, rewards during reward learning. This is reflected in a related network of brain activity. Parameters of this network can be used to predict the presence of problem gambling behavior in participants.
Affiliation(s)
- A Wiehler
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Université de Paris, Paris F-75006, France
- Department of Psychiatry, Service Hospitalo-Universitaire, Groupe Hospitalier Universitaire Paris Psychiatrie & Neurosciences, Paris F-75014, France
- K Chakroun
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- J Peters
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Department of Psychology, Biological Psychology, University of Cologne, Cologne 50923, Germany
35
Mikhael JG, Lai L, Gershman SJ. Rational inattention and tonic dopamine. PLoS Comput Biol 2021; 17:e1008659. [PMID: 33760806 PMCID: PMC7990190 DOI: 10.1371/journal.pcbi.1008659] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/15/2019] [Accepted: 12/28/2020] [Indexed: 11/27/2022]
Abstract
Slow-timescale (tonic) changes in dopamine (DA) contribute to a wide variety of processes in reinforcement learning, interval timing, and other domains. Furthermore, changes in tonic DA exert distinct effects depending on when they occur (e.g., during learning vs. performance) and what task the subject is performing (e.g., operant vs. classical conditioning). Two influential theories of tonic DA-the average reward theory and the Bayesian theory in which DA controls precision-have each been successful at explaining a subset of empirical findings. But how the same DA signal performs two seemingly distinct functions without creating crosstalk is not well understood. Here we reconcile the two theories under the unifying framework of 'rational inattention,' which (1) conceptually links average reward and precision, (2) outlines how DA manipulations affect this relationship, and in so doing, (3) captures new empirical phenomena. In brief, rational inattention asserts that agents can increase their precision in a task (and thus improve their performance) by paying a cognitive cost. Crucially, whether this cost is worth paying depends on average reward availability, reported by DA. The monotonic relationship between average reward and precision means that the DA signal contains the information necessary to retrieve the precision. When this information is needed after the task is performed, as presumed by Bayesian inference, acute manipulations of DA will bias behavior in predictable ways. We show how this framework reconciles a remarkably large collection of experimental findings. In reinforcement learning, the rational inattention framework predicts that learning from positive and negative feedback should be enhanced in high and low DA states, respectively, and that DA should tip the exploration-exploitation balance toward exploitation. 
In interval timing, this framework predicts that DA should increase the speed of the internal clock and decrease the extent of interference by other temporal stimuli during temporal reproduction (the central tendency effect). Finally, rational inattention makes the new predictions that these effects should be critically dependent on the controllability of rewards, that post-reward delays in intertemporal choice tasks should be underestimated, and that average reward manipulations should affect the speed of the clock-thus capturing empirical findings that are unexplained by either theory alone. Our results suggest that a common computational repertoire may underlie the seemingly heterogeneous roles of DA.
Affiliation(s)
- John G. Mikhael
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- MD-PhD Program, Harvard Medical School, Boston, Massachusetts, United States of America
- Lucy Lai
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
36
Dubois M, Habicht J, Michely J, Moran R, Dolan RJ, Hauser TU. Human complex exploration strategies are enriched by noradrenaline-modulated heuristics. eLife 2021; 10:e59907. [PMID: 33393461 PMCID: PMC7815309 DOI: 10.7554/elife.59907] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/12/2020] [Accepted: 01/03/2021] [Indexed: 01/15/2023]
Abstract
An exploration-exploitation trade-off, the arbitration between sampling a lesser-known option and a known rich option, is thought to be solved using computationally demanding exploration algorithms. Given known limitations in human cognitive resources, we hypothesised the presence of additional, cheaper strategies. Examining choice behaviour for such heuristics, we show that it involves a value-free random exploration, which ignores all prior knowledge, and a novelty exploration that targets novel options alone. In a double-blind, placebo-controlled drug study assessing the contributions of dopamine (400 mg amisulpride) and noradrenaline (40 mg propranolol), we show that value-free random exploration is attenuated under the influence of propranolol, but not under amisulpride. Our findings demonstrate that humans deploy distinct, computationally cheap exploration strategies and that value-free random exploration is under noradrenergic control.
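The two heuristics can be read as a simple mixture policy: with some probability choose uniformly while ignoring all learned values (value-free random exploration), with some probability jump to a novel option (novelty exploration), and otherwise exploit. This is a schematic reading, not the study's fitted computational model; the function name and the epsilon parameters are hypothetical.

```python
import random

def choose_heuristic(values, is_novel, eps_random=0.1, eps_novel=0.1):
    """Mixture policy: value-free random exploration, novelty exploration,
    or greedy exploitation of the highest-valued option."""
    u = random.random()
    if u < eps_random:
        # Value-free: uniform over all options, ignoring every learned value
        return random.randrange(len(values))
    novel = [i for i, n in enumerate(is_novel) if n]
    if novel and u < eps_random + eps_novel:
        # Novelty exploration: target a never-seen option directly
        return random.choice(novel)
    # Exploit: pick the option with the highest learned value
    return max(range(len(values)), key=values.__getitem__)
```

Both heuristics are computationally cheap because neither requires estimating uncertainty; in this reading, the propranolol effect would correspond to shrinking eps_random.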
Affiliation(s)
- Magda Dubois
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Johanna Habicht
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Jochen Michely
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Department of Psychiatry and Psychotherapy, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Rani Moran
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Ray J Dolan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Tobias U Hauser
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
37
Lengersdorff LL, Wagner IC, Lockwood PL, Lamm C. When Implicit Prosociality Trumps Selfishness: The Neural Valuation System Underpins More Optimal Choices When Learning to Avoid Harm to Others Than to Oneself. J Neurosci 2020; 40:7286-7299. [PMID: 32839234 PMCID: PMC7534918 DOI: 10.1523/jneurosci.0842-20.2020] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/08/2020] [Revised: 06/09/2020] [Accepted: 07/07/2020] [Indexed: 12/12/2022]
Abstract
Humans learn quickly which actions cause them harm. As social beings, we also need to learn to avoid actions that hurt others. It is currently unknown whether humans are as good at learning to avoid others' harm (prosocial learning) as they are at learning to avoid self-harm (self-relevant learning). Moreover, it remains unclear how the neural mechanisms of prosocial learning differ from those of self-relevant learning. In this fMRI study, 96 male human participants learned to avoid painful stimuli either for themselves or for another individual. We found that participants performed more optimally when learning for the other than for themselves. Computational modeling revealed that this could be explained by an increased sensitivity to subjective values of choice alternatives during prosocial learning. Increased value sensitivity was further associated with empathic traits. On the neural level, higher value sensitivity during prosocial learning was associated with stronger engagement of the ventromedial PFC during valuation. Moreover, the ventromedial PFC exhibited higher connectivity with the right temporoparietal junction during prosocial, compared with self-relevant, choices. Our results suggest that humans are particularly adept at learning to protect others from harm. This ability appears implemented by neural mechanisms overlapping with those supporting self-relevant learning, but with the additional recruitment of structures associated to the social brain. Our findings contrast with recent proposals that humans are egocentrically biased when learning to obtain monetary rewards for self or others. Prosocial tendencies may thus trump egocentric biases in learning when another person's physical integrity is at stake.
SIGNIFICANCE STATEMENT We quickly learn to avoid actions that cause us harm. As "social animals," we also need to learn and consider the harmful consequences our actions might have for others. Here, we investigated how learning to protect others from pain (prosocial learning) differs from learning to protect oneself (self-relevant learning). We found that human participants performed better during prosocial learning than during self-relevant learning, as they were more sensitive toward the information they collected when making choices for the other. Prosocial learning recruited similar brain areas as self-relevant learning, but additionally involved parts of the "social brain" that underpin perspective-taking and self-other distinction. Our findings suggest that people show an inherent tendency toward "intuitive" prosociality.
Affiliation(s)
- Lukas L Lengersdorff
- Social, Cognitive and Affective Neuroscience Unit, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, 1010, Austria
- Isabella C Wagner
- Social, Cognitive and Affective Neuroscience Unit, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, 1010, Austria
- Patricia L Lockwood
- Department of Experimental Psychology, University of Oxford, Oxford, OX1 3PH, United Kingdom
- Centre for Human Brain Health, University of Birmingham, Birmingham, B15 2TT, United Kingdom
- Claus Lamm
- Social, Cognitive and Affective Neuroscience Unit, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, 1010, Austria
38
Sablotny-Wackershauser V, Betts MJ, Brunnlieb C, Apostolova I, Buchert R, Düzel E, Gruendler TOJ, Vogt B. Older adults show a reduced tendency to engage in context-dependent decision biases. Neuropsychologia 2020; 142:107445. [PMID: 32275966 DOI: 10.1016/j.neuropsychologia.2020.107445] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/30/2019] [Revised: 02/19/2020] [Accepted: 03/25/2020] [Indexed: 11/16/2022]
Abstract
When we make decisions, we usually consider the context. This can sometimes lead to suboptimal choices or choice abnormalities. One such abnormality is the compromise effect, according to which deciders tend to favour options positioned as a compromise in an available set of extreme options. Theoretical accounts consider that these effects relate to available cognitive resources, which, in turn, have been found to depend on an individual's dopaminergic innervation. Referring to a correlative triad between cognition, dopamine and aging, the present study demonstrates that the compromise effect is replicable in a group of younger adults (n = 27, 20-32 years of age) yet is attenuated in older adults (n = 27, 62-80 years of age). Results from an [18F]-FDOPA-PET analysis in older adults indicate a positive association between older adults' inclination to engage in compromise effects and their striatal dopamine synthesis capacity. These results demonstrate altered context-dependent decision biases in older adults and suggest a neuromodulatory mechanism underlying this irregular choice.
Affiliation(s)
- Verena Sablotny-Wackershauser
- Faculty of Economics and Management, Otto-von-Guericke-University Magdeburg, Germany; Harz University of Applied Sciences Wernigerode, Germany.
- Matthew J Betts
- Institute of Cognitive Neurology and Dementia Research, Otto-von-Guericke-University Magdeburg, Germany; German Centre for Neurodegenerative Diseases (DZNE), Magdeburg, Germany
- Ivayla Apostolova
- Department of Radiology and Nuclear Medicine, University Hospital Hamburg-Eppendorf, Germany
- Ralph Buchert
- Department of Radiology and Nuclear Medicine, University Hospital Hamburg-Eppendorf, Germany
- Emrah Düzel
- Institute of Cognitive Neurology and Dementia Research, Otto-von-Guericke-University Magdeburg, Germany; German Centre for Neurodegenerative Diseases (DZNE), Magdeburg, Germany; Institute of Cognitive Neuroscience, University College London, UK
- Theo O J Gruendler
- Faculty of Economics and Management, Otto-von-Guericke-University Magdeburg, Germany; Center for Military Mental Health, Military Hospital Berlin, Germany
- Bodo Vogt
- Faculty of Economics and Management, Otto-von-Guericke-University Magdeburg, Germany; Institute of Social Medicine and Health Economics, Otto-von-Guericke-University Magdeburg, Germany
39
Van Slooten JC, Jahfari S, Theeuwes J. Spontaneous eye blink rate predicts individual differences in exploration and exploitation during reinforcement learning. Sci Rep 2019; 9:17436. [PMID: 31758031 PMCID: PMC6874684 DOI: 10.1038/s41598-019-53805-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/10/2019] [Accepted: 10/31/2019] [Indexed: 12/13/2022]
Abstract
Spontaneous eye blink rate (sEBR) has been linked to striatal dopamine function and to how individuals make value-based choices after a period of reinforcement learning (RL). While sEBR is thought to reflect how individuals learn from the negative outcomes of their choices, this idea has not been tested explicitly. This study assessed how individual differences in sEBR relate to learning by focusing on the cognitive processes that drive RL. Using Bayesian latent mixture modelling to quantify the mapping between RL behaviour and its underlying cognitive processes, we were able to differentiate low and high sEBR individuals at the level of these cognitive processes. Further inspection of these cognitive processes indicated that sEBR uniquely indexed explore-exploit tendencies during RL: lower sEBR predicted exploitative choices for high valued options, whereas higher sEBR predicted exploration of lower value options. This relationship was additionally supported by a network analysis where, notably, no link was observed between sEBR and how individuals learned from negative outcomes. Our findings challenge the notion that sEBR predicts learning from negative outcomes during RL, and suggest that sEBR predicts individual explore-exploit tendencies. These then influence value sensitivity during choices to support successful performance when facing uncertain reward.
Affiliation(s)
- Joanne C Van Slooten
- Department of Experimental and Applied Psychology, Vrije Universiteit, Amsterdam, The Netherlands.
- Sara Jahfari
- Spinoza Centre for Neuroimaging, Royal Academy of Sciences, Amsterdam, The Netherlands
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Jan Theeuwes
- Department of Experimental and Applied Psychology, Vrije Universiteit, Amsterdam, The Netherlands
40
Sebold M, Garbusow M, Jetzschmann P, Schad DJ, Nebe S, Schlagenhauf F, Heinz A, Rapp M, Romanczuk-Seiferth N. Reward and avoidance learning in the context of aversive environments and possible implications for depressive symptoms. Psychopharmacology (Berl) 2019; 236:2437-2449. [PMID: 31254091 PMCID: PMC6695365 DOI: 10.1007/s00213-019-05299-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/11/2019] [Accepted: 06/05/2019] [Indexed: 01/22/2023]
Abstract
BACKGROUND: Aversive stimuli in the environment influence human actions. This includes valence-dependent influences on action selection, e.g., increased avoidance but decreased approach behavior. However, it is yet unclear how aversive stimuli interact with complex learning and decision-making in the reward and avoidance domain. Moreover, the underlying computational mechanisms of these decision-making biases are unknown.
METHODS: To elucidate these mechanisms, 54 healthy young male subjects performed a two-step sequential decision-making task, which allows different aspects of learning to be modeled computationally, e.g., model-free, habitual, and model-based, goal-directed learning. We used a within-subject design, crossing task valence (reward vs. punishment learning) with emotional context (aversive vs. neutral background stimuli). We analyzed choice data, applied a computational model, and performed simulations.
RESULTS: Whereas model-based learning was not affected, aversive stimuli interacted with model-free learning in a way that depended on task valence. Thus, aversive stimuli increased model-free avoidance learning but decreased model-free reward learning. The computational model confirmed this effect: the parameter lambda, which indexes the influence of reward prediction errors on decision values, was increased in the punishment condition but decreased in the reward condition when aversive stimuli were present. Simulating choice data from the inferred computational parameters captured these effects. Exploratory analyses revealed that the observed biases were associated with subclinical depressive symptoms.
CONCLUSION: Our data show that aversive environmental stimuli affect complex learning and decision-making in a valence-dependent manner, and we provide a model of the underlying computations of this affective modulation. Finally, our finding of increased decision-making biases in subjects reporting subclinical depressive symptoms matches recent reports of amplified Pavlovian influences on action selection in depression and suggests a potential vulnerability factor for mood disorders. We discuss our findings in the light of the involvement of the neuromodulators serotonin and dopamine.
Collapse
Affiliation(s)
- Miriam Sebold
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany.
- Department for Social and Preventive Medicine, University of Potsdam, Potsdam, Germany.
| | - M Garbusow
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| | - P Jetzschmann
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| | - D J Schad
- Cognitive Science, University of Potsdam, Potsdam, Germany
| | - S Nebe
- Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, Zurich, Switzerland
| | - F Schlagenhauf
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
- Max Planck Institute for Human Cognitive and Brain Sciences, 04303, Leipzig, Germany
| | - A Heinz
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| | - M Rapp
- Department for Social and Preventive Medicine, University of Potsdam, Potsdam, Germany
| | - N Romanczuk-Seiferth
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| |
Collapse
|
41
|
Impacts of inter-trial interval duration on a computational model of sign-tracking vs. goal-tracking behaviour. Psychopharmacology (Berl) 2019; 236:2373-2388. [PMID: 31367850 PMCID: PMC6695359 DOI: 10.1007/s00213-019-05323-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/20/2019] [Accepted: 07/01/2019] [Indexed: 01/15/2023]
Abstract
In the context of Pavlovian conditioning, two types of behaviour may emerge within a population (Flagel et al. Nature, 469(7328): 53-57, 2011). Animals may engage either with the conditioned stimulus (CS), a behaviour known as sign-tracking (ST), whose acquisition is sensitive to dopamine inhibition, or with the food cup in which the reward or unconditioned stimulus (US) will eventually be delivered, a behaviour known as goal-tracking (GT), which depends on dopamine only for its expression. Previous work by Lesaint et al. (PLoS Comput Biol, 10(2), 2014) offered a computational explanation for these phenomena and predicted that varying the duration of the inter-trial interval (ITI) would change both the relative ST-GT proportion in the population and phasic dopamine responses. A recent study verified this prediction but also found rich variation in ST and GT behaviours within trials that goes beyond the original computational model. In this paper, we provide a computational perspective on these novel results.
Collapse
|