1
Wang Y, Lak A, Manohar SG, Bogacz R. Dopamine encoding of novelty facilitates efficient uncertainty-driven exploration. PLoS Comput Biol 2024; 20:e1011516. [PMID: 38626219] [PMCID: PMC11051659] [DOI: 10.1371/journal.pcbi.1011516]
Abstract
When facing an unfamiliar environment, animals need to explore to gain new knowledge about which actions provide reward, but they also need to put the newly acquired knowledge to use as quickly as possible. Optimal reinforcement learning strategies should therefore assess the uncertainties of these action-reward associations and utilise them to inform decision making. We propose a novel model whereby the direct and indirect striatal pathways act together to estimate both the mean and variance of reward distributions, and mesolimbic dopaminergic neurons provide transient novelty signals, facilitating effective uncertainty-driven exploration. We utilised electrophysiological recording data to verify our model of the basal ganglia, and we fitted exploration strategies derived from the neural model to data from behavioural experiments. We also compared, in simulation, the performance of directed exploration strategies inspired by our basal ganglia model with that of other exploration algorithms, including classic variants of the upper confidence bound (UCB) strategy. The exploration strategies inspired by the basal ganglia model achieved overall superior performance in simulation, and fitting the model to behavioural data gave qualitatively similar results to fitting more idealised normative models with less implementation-level detail. Overall, our results suggest that transient dopamine levels in the basal ganglia that encode novelty could contribute to an uncertainty representation which efficiently drives exploration in reinforcement learning.
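For readers unfamiliar with the baseline the authors compare against, a classic UCB1 bandit agent adds an uncertainty bonus to each arm's estimated mean reward. The sketch below is a minimal, hypothetical illustration; the arm probabilities, step count, and bonus constant are illustrative and not taken from the paper:

```python
import math
import random

def ucb1_choose(counts, means, t, c=2.0):
    """Pick the arm maximizing estimated mean + exploration bonus (UCB1-style).
    Unvisited arms are tried first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(c * math.log(t) / n)
              for m, n in zip(means, counts)]
    return scores.index(max(scores))

def run_bandit(probs, steps=2000, seed=0):
    """Run a Bernoulli bandit; returns how often each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    for t in range(1, steps + 1):
        arm = ucb1_choose(counts, means, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts
```

After a few hundred steps the bonus on rarely-pulled arms shrinks and the agent concentrates its pulls on the best arm, which is the behaviour the directed strategies above are benchmarked against.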
Affiliation(s)
- Yuhao Wang
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
- Armin Lak
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Sanjay G. Manohar
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Rafal Bogacz
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
2
Blackwell KT, Doya K. Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks. PLoS Comput Biol 2023; 19:e1011385. [PMID: 37594982] [PMCID: PMC10479916] [DOI: 10.1371/journal.pcbi.1011385]
Abstract
A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia by using two Q matrices, one representing direct pathway neurons (G) and another representing indirect pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is that the G and N matrices are updated using the temporal difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and differences are then resolved using a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks, including extinction, renewal, discrimination, switching reward probability learning, and sequence learning. Simulations show that TD2Q produces behaviors similar to rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices. These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight into the roles of direct- and indirect-pathway striatal neurons.
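As a rough illustration of the general two-matrix idea, the hedged sketch below trains separate "Go" (G) and "NoGo" (N) value tables that learn asymmetrically from positive and negative prediction errors, and averages their softmax policies as a stand-in for a second selection step. This is not the paper's TD2Q update rule (which uses a temporal-difference error and a reward-dependent adaptive exploration parameter); all learning rates, the softmax temperature, and the bandit itself are illustrative:

```python
import math
import random

def softmax(values, beta=3.0):
    """Convert values into choice probabilities with inverse temperature beta."""
    exps = [math.exp(beta * v) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

def run(n_trials=3000, seed=1):
    """Two-armed Bernoulli bandit solved by two value tables:
    G learns faster from positive prediction errors (optimistic),
    N learns faster from negative ones (pessimistic)."""
    rng = random.Random(seed)
    p_reward = [0.8, 0.2]
    G = [0.0, 0.0]                       # 'Go' / direct-pathway values
    N = [0.0, 0.0]                       # 'NoGo' / indirect-pathway values
    choices = [0, 0]
    for _ in range(n_trials):
        # stand-in for the second selection step: average the two policies
        probs = [(pg + pn) / 2 for pg, pn in zip(softmax(G), softmax(N))]
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if rng.random() < p_reward[a] else 0.0
        dG = r - G[a]
        G[a] += (0.2 if dG > 0 else 0.05) * dG   # optimistic learner
        dN = r - N[a]
        N[a] += (0.05 if dN > 0 else 0.2) * dN   # pessimistic learner
        choices[a] += 1
    return G, N, choices
```

The asymmetry makes G settle above N for the same arm, loosely mirroring the opposing roles ascribed to the two pathways, while the combined policy still converges on the richer arm.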
Affiliation(s)
- Kim T Blackwell
- Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia, United States of America
- Kenji Doya
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
3
Speers LJ, Bilkey DK. Maladaptive explore/exploit trade-offs in schizophrenia. Trends Neurosci 2023; 46:341-354. [PMID: 36878821] [DOI: 10.1016/j.tins.2023.02.001]
Abstract
Schizophrenia is a complex disorder that remains poorly understood, particularly at the systems level. In this opinion article we argue that the explore/exploit trade-off concept provides a holistic and ecologically valid framework to resolve some of the apparent paradoxes that have emerged within schizophrenia research. We review recent evidence suggesting that fundamental explore/exploit behaviors may be maladaptive in schizophrenia during physical, visual, and cognitive foraging. We also describe how theories from the broader optimal foraging literature, such as the marginal value theorem (MVT), could provide valuable insight into how aberrant processing of reward, context, and cost/effort evaluations interact to produce maladaptive responses.
Affiliation(s)
- Lucinda J Speers
- Department of Psychology, University of Otago, Dunedin 9016, New Zealand
- David K Bilkey
- Department of Psychology, University of Otago, Dunedin 9016, New Zealand
4
Trait somatic anxiety is associated with reduced directed exploration and underestimation of uncertainty. Nat Hum Behav 2023; 7:102-113. [PMID: 36192493] [DOI: 10.1038/s41562-022-01455-y]
Abstract
Anxiety has been related to decreased physical exploration, but past findings on the interaction between anxiety and exploration during decision making were inconclusive. Here we examined how latent factors of trait anxiety relate to different exploration strategies when facing volatility-induced uncertainty. Across two studies (total N = 985), we demonstrated that people used a hybrid of directed, random and undirected exploration strategies, which were respectively sensitive to relative uncertainty, total uncertainty and value difference. Trait somatic anxiety, that is, the propensity to experience physical symptoms of anxiety, was inversely correlated with both directed and undirected exploration, manifesting as a lower likelihood of choosing the uncertain option and reduced choice stochasticity regardless of uncertainty. Somatic anxiety was also associated with underestimation of relative uncertainty. Together, these results reveal the selective role of trait somatic anxiety in modulating both uncertainty-driven and value-driven exploration strategies.
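One common way the directed/random/undirected decomposition is parameterized in this literature is a choice rule combining the value difference (V), a bonus on relative uncertainty (RU, driving directed exploration), and value scaled by total uncertainty (V/TU, driving random exploration). The sketch below is an illustrative, unfitted version of such a rule, not necessarily the exact model used in this study; the weights are arbitrary:

```python
import math

def p_choose_first(v_diff, ru, tu, w_v=1.0, w_dir=0.5, w_rand=1.0):
    """Probability of choosing option 1 under a hybrid exploration rule.

    v_diff : estimated value of option 1 minus option 2 (V)
    ru     : relative uncertainty, option 1 minus option 2 (RU)
    tu     : total uncertainty across both options (TU)
    Directed exploration enters via the RU bonus; random exploration via
    the V/TU term (more total uncertainty -> choices closer to 50/50).
    """
    z = w_v * v_diff + w_dir * ru + w_rand * v_diff / tu
    return 1.0 / (1.0 + math.exp(-z))   # logistic choice probability
```

With this form, increasing RU pushes choice toward the more uncertain option (a directed bonus), while increasing TU flattens the choice curve toward chance, which is exactly the stochasticity that somatic anxiety is reported to reduce.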
5
Jepma M, Roy M, Ramlakhan K, van Velzen M, Dahan A. Different brain systems support learning from received and avoided pain during human pain-avoidance learning. eLife 2022; 11:e74149. [PMID: 35731646] [PMCID: PMC9217130] [DOI: 10.7554/elife.74149]
Abstract
Both unexpected pain and unexpected pain absence can drive avoidance learning, but whether they do so via shared or separate neural and neurochemical systems is largely unknown. To address this issue, we combined an instrumental pain-avoidance learning task with computational modeling, functional magnetic resonance imaging (fMRI), and pharmacological manipulations of the dopaminergic (100 mg levodopa) and opioidergic (50 mg naltrexone) systems (N = 83). Computational modeling provided evidence that untreated participants learned more from received than avoided pain. Our dopamine and opioid manipulations negated this learning asymmetry by selectively increasing learning rates for avoided pain. Furthermore, our fMRI analyses revealed that pain prediction errors were encoded in subcortical and limbic brain regions, whereas no-pain prediction errors were encoded in frontal and parietal cortical regions. However, we found no effects of our pharmacological manipulations on the neural encoding of prediction errors. Together, our results suggest that human pain-avoidance learning is supported by separate threat- and safety-learning systems, and that dopamine and endogenous opioids specifically regulate learning from successfully avoided pain.
Affiliation(s)
- Marieke Jepma
- Department of Psychology, University of Amsterdam, Amsterdam, Netherlands; Department of Psychology, Leiden University, Leiden, Netherlands; Leiden Institute for Brain and Cognition, Leiden, Netherlands
- Mathieu Roy
- Department of Psychology, McGill University, Montreal, Canada; Alan Edwards Centre for Research on Pain, McGill University, Montreal, Canada
- Kiran Ramlakhan
- Department of Psychology, Leiden University, Leiden, Netherlands; Department of Research and Statistics, Municipality of Amsterdam, Amsterdam, Netherlands
- Monique van Velzen
- Department of Anesthesiology, Leiden University Medical Center, Leiden, Netherlands
- Albert Dahan
- Department of Anesthesiology, Leiden University Medical Center, Leiden, Netherlands
6
A neural and behavioral trade-off between value and uncertainty underlies exploratory decisions in normative anxiety. Mol Psychiatry 2022; 27:1573-1587. [PMID: 34725456] [DOI: 10.1038/s41380-021-01363-z]
Abstract
Exploration reduces uncertainty about the environment and improves the quality of future decisions, but at the cost of accepting uncertain and suboptimal outcomes in the interim. Although anxiety promotes intolerance to uncertainty, it remains unclear whether, and by which mechanisms, anxiety relates to exploratory decision-making. We use a dynamic three-armed-bandit task and find that higher trait anxiety is associated with increased exploration, which in turn harms overall performance. We identify two distinct behavioral sources: first, decisions made by anxious individuals are guided toward reduction of uncertainty; and second, decisions are less guided by immediate value gains. These findings are similar in both loss and gain domains, and further demonstrate that an affective trait relates to exploration and results in an inverse-U-shaped relationship between anxiety and overall performance. Additional imaging data (fMRI) suggest that normative anxiety correlates negatively with the representation of expected value in the dorsal anterior cingulate cortex and, in contrast, positively with the representation of uncertainty in the anterior insula. We conclude that a trade-off between value gains and uncertainty reduction underlies maladaptive decision-making in individuals with higher normal-range anxiety.
7
Mikhael JG, Gershman SJ. Impulsivity and risk-seeking as Bayesian inference under dopaminergic control. Neuropsychopharmacology 2022; 47:465-476. [PMID: 34376813] [PMCID: PMC8674258] [DOI: 10.1038/s41386-021-01125-z]
Abstract
Bayesian models successfully account for several of dopamine (DA)'s effects on contextual calibration in interval timing and reward estimation. In these models, tonic levels of DA control the precision of stimulus encoding, which is weighed against contextual information when making decisions. When DA levels are high, the animal relies more heavily on the (highly precise) stimulus encoding, whereas when DA levels are low, the context affects decisions more strongly. Here, we extend this idea to intertemporal choice and probability discounting tasks. In intertemporal choice tasks, agents must choose between a small reward delivered soon and a large reward delivered later, whereas in probability discounting tasks, agents must choose between a small reward that is always delivered and a large reward that may be omitted with some probability. Beginning with the principle that animals will seek to maximize their reward rates, we show that the Bayesian model predicts a number of curious empirical findings in both tasks. First, the model predicts that higher DA levels should normally promote selection of the larger/later option, which is often taken to imply that DA decreases 'impulsivity,' and promote selection of the large/risky option, often taken to imply that DA increases 'risk-seeking.' However, if the temporal precision is sufficiently decreased, higher DA levels should have the opposite effect: promoting selection of the smaller/sooner option (higher impulsivity) and the small/safe option (lower risk-seeking). Second, high enough levels of DA can result in preference reversals. Third, selectively decreasing the temporal precision, without manipulating DA, should promote selection of the larger/later and large/risky options. Fourth, when a different post-reward delay is associated with each option, animals will not learn the option-delay contingencies, but this learning can be salvaged when the post-reward delays are made more salient. Finally, the Bayesian model predicts correlations among behavioral phenotypes: animals that are better timers will also appear less impulsive.
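The core Bayesian mechanism described above, precision-weighted fusion of a noisy stimulus estimate with a contextual prior, can be sketched in a few lines. The assumption carried over from the model is that tonic DA scales the stimulus precision; the numbers in the usage below are illustrative:

```python
def bayes_estimate(stimulus, context_mean, lambda_stim, lambda_context):
    """Posterior mean from precision-weighted averaging of a noisy stimulus
    estimate and a contextual prior. In the model sketched here, tonic DA
    is assumed to scale lambda_stim (the stimulus precision)."""
    w = lambda_stim / (lambda_stim + lambda_context)
    return w * stimulus + (1 - w) * context_mean
```

With high simulated DA (e.g., `lambda_stim=10`) the estimate hugs the stimulus; with low DA (`lambda_stim=1`) it regresses toward the contextual mean, which is the central-tendency-like bias the theory builds on.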
Affiliation(s)
- John G. Mikhael
- Program in Neuroscience, Harvard Medical School, Boston, MA, USA
- MD-PhD Program, Harvard Medical School, Boston, MA, USA
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
8
Bond K, Dunovan K, Porter A, Rubin JE, Verstynen T. Dynamic decision policy reconfiguration under outcome uncertainty. eLife 2021; 10:e65540. [PMID: 34951589] [PMCID: PMC8806193] [DOI: 10.7554/elife.65540]
Abstract
In uncertain or unstable environments, sometimes the best decision is to change your mind. To shed light on this flexibility, we evaluated how the underlying decision policy adapts when the most rewarding action changes. Human participants performed a dynamic two-armed bandit task that manipulated the certainty in relative reward (conflict) and the reliability of action-outcomes (volatility). Continuous estimates of conflict and volatility contributed to shifts in exploratory states by changing the rate of evidence accumulation (drift rate) and the amount of evidence needed to make a decision (boundary height), respectively. At the trialwise level, following a switch in the optimal choice, the drift rate plummets and the boundary height weakly spikes, leading to a slow exploratory state. We find that the drift rate drives most of this response, with an unreliable contribution of boundary height across experiments. Surprisingly, we find no evidence that pupillary responses were associated with decision policy changes. We conclude that humans show a stereotypical shift in their decision policies in response to environmental changes.
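The drift-diffusion quantities manipulated here can be illustrated with a toy simulation: lowering the drift rate or raising the boundary height slows decisions, consistent with the slow exploratory state described above. All parameters below are arbitrary and for illustration only:

```python
import random

def simulate_ddm(drift, boundary, dt=0.001, noise=1.0, seed=0, max_t=10.0):
    """One drift-diffusion decision: evidence starts at 0 and accumulates
    until it crosses +boundary (choice A) or -boundary (choice B),
    or until max_t elapses."""
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    while abs(x) < boundary and t < max_t:
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    return x >= boundary, t

def mean_rt(drift, boundary, n=200):
    """Average decision time over n simulated trials (fixed seeds)."""
    return sum(simulate_ddm(drift, boundary, seed=i)[1] for i in range(n)) / n
```

Comparing `mean_rt` across parameter settings reproduces the qualitative pattern: a plummeting drift rate (or a spiking boundary) lengthens decision times.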
Affiliation(s)
- Krista Bond
- Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
- Center for the Neural Basis of Cognition, Pittsburgh, United States
- Carnegie Mellon Neuroscience Institute, Pittsburgh, United States
- Kyle Dunovan
- Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
- Alexis Porter
- Department of Psychology, Northwestern University, Evanston, United States
- Jonathan E Rubin
- Center for the Neural Basis of Cognition, Pittsburgh, United States
- Department of Mathematics, University of Pittsburgh, Pittsburgh, United States
- Timothy Verstynen
- Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
- Center for the Neural Basis of Cognition, Pittsburgh, United States
- Carnegie Mellon Neuroscience Institute, Pittsburgh, United States
- Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, United States
9
Barnes K, Rottman BM, Colagiuri B. The placebo effect: To explore or to exploit? Cognition 2021; 214:104753. [PMID: 34023671] [DOI: 10.1016/j.cognition.2021.104753]
Abstract
How people choose between options with differing outcomes (explore-exploit) is a central question in understanding human behaviour. However, the standard explore-exploit paradigm relies on gamified tasks with low-stakes outcomes. Consequently, little is known about decision making for biologically relevant stimuli. Here, we combined placebo and explore-exploit paradigms to examine detection and selection of the most effective treatment in a pain model. During conditioning, where 'optimal' and 'suboptimal' sham treatments were paired with a reduction in electrical pain stimulation, participants learnt which treatment most successfully reduced pain. Modelling participant responses revealed three important findings. First, participants' choices reflected both directed and random exploration. Second, expectancy modulated pain, indicative of recursive placebo effects. Third, individual differences in expectancy during conditioning predicted placebo effects during a subsequent test phase. These findings reveal directed and random exploration when the outcome is biologically relevant. Moreover, this research shows how the placebo and explore-exploit literatures can be unified.
10
Gilbertson T, Steele D. Tonic dopamine, uncertainty and basal ganglia action selection. Neuroscience 2021; 466:109-124. [PMID: 34015370] [DOI: 10.1016/j.neuroscience.2021.05.010]
Abstract
To make optimal decisions in uncertain circumstances, flexible adaptation of behaviour is required: exploring alternatives when the best choice is unknown, and exploiting what is known when that is best. Using a computational model of the basal ganglia, we propose that switches between exploratory and exploitative decisions are mediated by the interaction between tonic dopamine and cortical input to the basal ganglia. We show that a biologically detailed action selection circuit model, endowed with dopamine-dependent striatal plasticity, can optimally solve the explore-exploit problem, estimating the true underlying state of a noisy Gaussian diffusion process. Critical to the model's performance was a fluctuating level of tonic dopamine, which increased under conditions of uncertainty. Within an optimal range of tonic dopamine, explore-exploit decisions were mediated by the effects of tonic dopamine on the precision of the model's action selection mechanism. Under conditions of uncertain reward pay-out, the model's reduced selectivity allowed disinhibition of multiple alternative actions, which could be explored at random. Conversely, when uncertainty about reward pay-out was low, enhanced selectivity of the action selection circuit facilitated exploitation of the high-value choice. Model performance was at the level of a Kalman filter, which provides an optimal solution for the task. These simulations support the idea that this subcortical neural circuit may have evolved to facilitate decision making in non-stationary reward environments. The model generates several experimental predictions with relevance to abnormal decision making in neuropsychiatric and neurological disease.
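For reference, the Kalman filter benchmark that the circuit model is compared against has a compact one-dimensional form for a Gaussian diffusion (random-walk) process. The process and observation noise values below are illustrative, not taken from the paper:

```python
def kalman_step(mu, var, observation, q=0.1, r=1.0):
    """One Kalman update for a 1-D Gaussian random walk.

    mu, var     : current posterior mean and variance of the latent state
    observation : new noisy observation of the state
    q           : diffusion (process) variance per step
    r           : observation noise variance
    """
    var_pred = var + q                  # predict: diffusion inflates variance
    k = var_pred / (var_pred + r)       # Kalman gain
    mu_new = mu + k * (observation - mu)
    var_new = (1 - k) * var_pred
    return mu_new, var_new
```

Iterating `kalman_step` drives the posterior mean toward the observed state while the posterior variance settles at a fixed point set by q and r; matching this tracking performance is the optimality criterion cited in the abstract.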
Affiliation(s)
- Tom Gilbertson
- Department of Neurology, Level 6, South Block, Ninewells Hospital & Medical School, Dundee DD2 4BF, UK; Division of Imaging Science and Technology, Medical School, University of Dundee, DD2 4BF, UK
- Douglas Steele
- Division of Imaging Science and Technology, Medical School, University of Dundee, DD2 4BF, UK
11
Wilson RC, Bonawitz E, Costa VD, Ebitz RB. Balancing exploration and exploitation with information and randomization. Curr Opin Behav Sci 2021; 38:49-56. [PMID: 33184605] [PMCID: PMC7654823] [DOI: 10.1016/j.cobeha.2020.10.001]
Abstract
Explore-exploit decisions require us to trade off the benefits of exploring unknown options to learn more about them, with exploiting known options, for immediate reward. Such decisions are ubiquitous in nature, but from a computational perspective, they are notoriously hard. There is therefore much interest in how humans and animals make these decisions and recently there has been an explosion of research in this area. Here we provide a biased and incomplete snapshot of this field focusing on the major finding that many organisms use two distinct strategies to solve the explore-exploit dilemma: a bias for information ('directed exploration') and the randomization of choice ('random exploration'). We review evidence for the existence of these strategies, their computational properties, their neural implementations, as well as how directed and random exploration vary over the lifespan. We conclude by highlighting open questions in this field that are ripe to both explore and exploit.
Affiliation(s)
- Robert C. Wilson
- Department of Psychology, University of Arizona, Tucson, AZ, USA
- Cognitive Science Program, University of Arizona, Tucson, AZ, USA
- Evelyn F. McKnight Brain Institute, University of Arizona, Tucson, AZ, USA
- Vincent D. Costa
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
- R. Becket Ebitz
- Department of Neuroscience, University of Montréal, Montréal, Québec, Canada
12
Wiehler A, Chakroun K, Peters J. Attenuated Directed Exploration during Reinforcement Learning in Gambling Disorder. J Neurosci 2021; 41:2512-2522. [PMID: 33531415] [PMCID: PMC7984586] [DOI: 10.1523/jneurosci.1607-20.2021]
Abstract
Gambling disorder (GD) is a behavioral addiction associated with impairments in value-based decision-making and behavioral flexibility and might be linked to changes in the dopamine system. Maximizing long-term rewards requires a flexible trade-off between the exploitation of known options and the exploration of novel options for information gain. This exploration-exploitation trade-off is thought to depend on dopamine neurotransmission. We hypothesized that human gamblers would show a reduction in directed (uncertainty-based) exploration, accompanied by changes in brain activity in a fronto-parietal exploration-related network. Twenty-three frequent, non-treatment-seeking gamblers and twenty-three healthy matched controls (all male) performed a four-armed bandit task during functional magnetic resonance imaging (fMRI). Computational modeling using hierarchical Bayesian parameter estimation revealed signatures of directed exploration, random exploration, and perseveration in both groups. Gamblers showed a reduction in directed exploration, whereas random exploration and perseveration were similar between groups. Neuroimaging revealed no evidence for group differences in neural representations of basic task variables (expected value, prediction errors). Our hypothesis of reduced frontal pole (FP) recruitment in gamblers was not supported. Exploratory analyses showed that during directed exploration, gamblers showed reduced parietal cortex and substantia-nigra/ventral-tegmental-area activity. Cross-validated classification analyses revealed that connectivity in an exploration-related network was predictive of group status, suggesting that connectivity patterns might be more predictive of problem gambling than univariate effects. Findings reveal specific reductions of strategic exploration in gamblers that might be linked to altered processing in a fronto-parietal network and/or changes in dopamine neurotransmission implicated in GD.
SIGNIFICANCE STATEMENT: Wiehler et al. (2021) report that gamblers rely less on the strategic exploration of unknown, but potentially better, rewards during reward learning. This is reflected in a related network of brain activity. Parameters of this network can be used to predict the presence of problem gambling behavior in participants.
Affiliation(s)
- A Wiehler
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Université de Paris, Paris F-75006, France
- Department of Psychiatry, Service Hospitalo-Universitaire, Groupe Hospitalier Universitaire Paris Psychiatrie & Neurosciences, Paris F-75014, France
- K Chakroun
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- J Peters
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Department of Psychology, Biological Psychology, University of Cologne, Cologne 50923, Germany
13
Mikhael JG, Lai L, Gershman SJ. Rational inattention and tonic dopamine. PLoS Comput Biol 2021; 17:e1008659. [PMID: 33760806] [PMCID: PMC7990190] [DOI: 10.1371/journal.pcbi.1008659]
Abstract
Slow-timescale (tonic) changes in dopamine (DA) contribute to a wide variety of processes in reinforcement learning, interval timing, and other domains. Furthermore, changes in tonic DA exert distinct effects depending on when they occur (e.g., during learning vs. performance) and what task the subject is performing (e.g., operant vs. classical conditioning). Two influential theories of tonic DA, the average reward theory and the Bayesian theory in which DA controls precision, have each been successful at explaining a subset of empirical findings. But how the same DA signal performs two seemingly distinct functions without creating crosstalk is not well understood. Here we reconcile the two theories under the unifying framework of 'rational inattention,' which (1) conceptually links average reward and precision, (2) outlines how DA manipulations affect this relationship, and in so doing, (3) captures new empirical phenomena. In brief, rational inattention asserts that agents can increase their precision in a task (and thus improve their performance) by paying a cognitive cost. Crucially, whether this cost is worth paying depends on average reward availability, reported by DA. The monotonic relationship between average reward and precision means that the DA signal contains the information necessary to retrieve the precision. When this information is needed after the task is performed, as presumed by Bayesian inference, acute manipulations of DA will bias behavior in predictable ways. We show how this framework reconciles a remarkably large collection of experimental findings. In reinforcement learning, the rational inattention framework predicts that learning from positive and negative feedback should be enhanced in high and low DA states, respectively, and that DA should tip the exploration-exploitation balance toward exploitation. In interval timing, this framework predicts that DA should increase the speed of the internal clock and decrease the extent of interference by other temporal stimuli during temporal reproduction (the central tendency effect). Finally, rational inattention makes the new predictions that these effects should be critically dependent on the controllability of rewards, that post-reward delays in intertemporal choice tasks should be underestimated, and that average reward manipulations should affect the speed of the clock, thus capturing empirical findings that are unexplained by either theory alone. Our results suggest that a common computational repertoire may underlie the seemingly heterogeneous roles of DA.
Affiliation(s)
- John G. Mikhael
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- MD-PhD Program, Harvard Medical School, Boston, Massachusetts, United States of America
- Lucy Lai
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
14
Cathomas F, Klaus F, Guetter K, Chung HK, Raja Beharelle A, Spiller TR, Schlegel R, Seifritz E, Hartmann-Riemer MN, Tobler PN, Kaiser S. Increased random exploration in schizophrenia is associated with inflammation. NPJ Schizophrenia 2021; 7:6. [PMID: 33536449] [PMCID: PMC7859392] [DOI: 10.1038/s41537-020-00133-0]
Abstract
One aspect of goal-directed behavior, which is known to be impaired in patients with schizophrenia (SZ), is balancing between exploiting a familiar choice with known reward value and exploring a lesser known, but potentially more rewarding option. Despite its relevance to several symptom domains of SZ, this has received little attention in SZ research. In addition, while there is increasing evidence that SZ is associated with chronic low-grade inflammation, few studies have investigated how this relates to specific behaviors, such as balancing exploration and exploitation. We therefore assessed behaviors underlying the exploration-exploitation trade-off using a three-armed bandit task in 45 patients with SZ and 19 healthy controls (HC). This task allowed us to dissociate goal-unrelated (random) from goal-related (directed) exploration and correlate them with psychopathological symptoms. Moreover, we assessed a broad range of inflammatory proteins in the blood and related them to bandit task behavior. We found that, compared to HC, patients with SZ showed reduced task performance. This impairment was due to a shift from exploitation to random exploration, which was associated with symptoms of disorganization. Relative to HC, patients with SZ showed a pro-inflammatory blood profile. Furthermore, high-sensitivity C-reactive protein (hsCRP) positively correlated with random exploration, but not with directed exploration or exploitation. In conclusion, we show that low-grade inflammation in patients with SZ is associated with random exploration, which can be considered a behavioral marker for disorganization. hsCRP may constitute a marker for severity of, and a potential treatment target for maladaptive exploratory behaviors.
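The dissociation this bandit task exploits, directed exploration as an uncertainty bonus and random exploration as choice stochasticity, can be sketched as a softmax policy. The 1/sqrt(n) bonus and all parameter values below are illustrative stand-ins, not the study's fitted model:

```python
import numpy as np

def choice_probs(values, counts, info_bonus=1.0, temperature=1.0):
    """Softmax policy separating directed from random exploration.

    Directed exploration: an uncertainty bonus (here 1/sqrt(n_pulls),
    an illustrative choice) added to each arm's estimated value.
    Random exploration: the softmax temperature, which injects
    goal-unrelated variability into choice.
    """
    bonus = info_bonus / np.sqrt(np.maximum(counts, 1))
    logits = (np.asarray(values, dtype=float) + bonus) / temperature
    logits -= logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Arm 0 has the best point estimate, but the rarely sampled arm 1 wins
# once its uncertainty bonus is added: a directed exploration choice.
p = choice_probs(values=[0.5, 0.4, 0.1], counts=[10, 2, 2], temperature=0.2)
```

In this framing, a shift from exploitation toward random exploration (as reported in the patients) corresponds to a higher temperature rather than a larger information bonus.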
Collapse
Affiliation(s)
- Flurin Cathomas
- Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric Hospital, University of Zurich, 8032 Zurich, Switzerland; Fishberg Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Federica Klaus
- Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric Hospital, University of Zurich, 8032 Zurich, Switzerland; Department of Psychiatry, University of California San Diego, San Diego, USA
| | - Karoline Guetter
- Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric Hospital, University of Zurich, 8032 Zurich, Switzerland
| | - Hui-Kuan Chung
- Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, 8006 Zurich, Switzerland
| | - Anjali Raja Beharelle
- Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, 8006 Zurich, Switzerland; Neuroscience Center Zurich, ETH Zurich and University of Zurich, 8057 Zurich, Switzerland
| | - Tobias R. Spiller
- University of Zurich, University Hospital Zurich, Department of Consultation-Liaison Psychiatry and Psychosomatic Medicine, Ramistrasse 100, 8091 Zurich, Switzerland
| | - Rebecca Schlegel
- Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric Hospital, University of Zurich, 8032 Zurich, Switzerland
| | - Erich Seifritz
- Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric Hospital, University of Zurich, 8032 Zurich, Switzerland; Neuroscience Center Zurich, ETH Zurich and University of Zurich, 8057 Zurich, Switzerland; Zurich Center for Integrative Human Physiology, University of Zurich, 8057 Zurich, Switzerland
| | - Matthias N. Hartmann-Riemer
- Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric Hospital, University of Zurich, 8032 Zurich, Switzerland
| | - Philippe N. Tobler
- Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, 8006 Zurich, Switzerland; Neuroscience Center Zurich, ETH Zurich and University of Zurich, 8057 Zurich, Switzerland; Zurich Center for Integrative Human Physiology, University of Zurich, 8057 Zurich, Switzerland
| | - Stefan Kaiser
- Division of Adult Psychiatry, Department of Psychiatry, Geneva University Hospitals, Chemin du Petit-Bel-Air, 1225 Chêne-Bourg, Switzerland
| |
Collapse
|
15
|
|
16
|
Chakroun K, Mathar D, Wiehler A, Ganzer F, Peters J. Dopaminergic modulation of the exploration/exploitation trade-off in human decision-making. eLife 2020; 9:e51260. [PMID: 32484779 PMCID: PMC7266623 DOI: 10.7554/elife.51260] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2019] [Accepted: 05/01/2020] [Indexed: 01/15/2023] Open
Abstract
Involvement of dopamine in regulating exploration during decision-making has long been hypothesized, but direct causal evidence in humans is still lacking. Here, we use a combination of computational modeling, pharmacological intervention and functional magnetic resonance imaging to address this issue. Thirty-one healthy male participants performed a restless four-armed bandit task in a within-subjects design under three drug conditions: 150 mg of the dopamine precursor L-dopa, 2 mg of the D2 receptor antagonist haloperidol, and placebo. Choices were best explained by an extension of an established Bayesian learning model accounting for perseveration, directed exploration and random exploration. Modeling revealed attenuated directed exploration under L-dopa, while neural signatures of exploration, exploitation and prediction error were unaffected. Instead, L-dopa attenuated neural representations of overall uncertainty in insula and dorsal anterior cingulate cortex. Our results highlight the computational role of these regions in exploration and suggest that dopamine modulates how this circuit tracks accumulating uncertainty during decision-making.
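In restless-bandit studies of this kind, the Bayesian learning model is typically a Kalman filter that tracks each arm's drifting mean payout. A minimal sketch of one learning step, with illustrative placeholder noise parameters rather than the study's fitted values:

```python
def kalman_update(mean, var, reward, obs_noise=4.0):
    """One Bayesian update of an arm's value estimate (a Kalman filter step)."""
    gain = var / (var + obs_noise)   # how much to trust the new observation
    mean = mean + gain * (reward - mean)
    var = (1.0 - gain) * var         # observing the arm shrinks uncertainty
    return mean, var

def diffuse(var, drift_var=7.84):
    """Between trials each arm's payout drifts, so uncertainty grows again."""
    return var + drift_var

mean, var = 0.0, 100.0               # diffuse prior over the arm's payout
mean, var = kalman_update(mean, var, reward=10.0)
var = diffuse(var)
```

The posterior variances produced this way are what uncertainty-dependent exploration terms (directed bonuses, uncertainty-scaled noise) operate on, and what the "overall uncertainty" signal in insula and dACC is proposed to track.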
Collapse
Affiliation(s)
- Karima Chakroun
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - David Mathar
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
| | - Antonius Wiehler
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Institut du Cerveau et de la Moelle épinière - ICM, Centre de NeuroImagerie de Recherche - CENIR, Sorbonne Universités, Groupe Hospitalier Pitié-Salpêtrière, Paris, France
| | - Florian Ganzer
- German Center for Addiction Research in Childhood and Adolescence, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Jan Peters
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
| |
Collapse
|
17
|
Tomov MS, Truong VQ, Hundia RA, Gershman SJ. Dissociable neural correlates of uncertainty underlie different exploration strategies. Nat Commun 2020; 11:2371. [PMID: 32398675 PMCID: PMC7217879 DOI: 10.1038/s41467-020-15766-z] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2019] [Accepted: 03/12/2020] [Indexed: 01/27/2023] Open
Abstract
Most real-world decisions involve a delicate balance between exploring unfamiliar alternatives and committing to the best known option. Previous work has shown that humans rely on different forms of uncertainty to negotiate this "explore-exploit" trade-off, yet the neural basis of the underlying computations remains unclear. Using fMRI (n = 31), we find that relative uncertainty is represented in right rostrolateral prefrontal cortex and drives directed exploration, while total uncertainty is represented in right dorsolateral prefrontal cortex and drives random exploration. The decision value signal combining relative and total uncertainty to compute choice is reflected in motor cortex activity. The variance of this signal scales with total uncertainty, consistent with a sampling mechanism for random exploration. Overall, these results are consistent with a hybrid computational architecture in which different uncertainty computations are performed separately and then combined by downstream decision circuits to compute choice.
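The hybrid architecture described here, relative uncertainty feeding a UCB-like directed bonus and total uncertainty scaling choice noise, can be written as a single decision value for a two-armed choice. The functional form follows the general UCB/Thompson hybrid family; the weights below are placeholders, not fitted parameters:

```python
import numpy as np

def hybrid_decision_value(mu, sigma, w_directed=1.0, w_random=1.0):
    """Two-arm decision value combining both uncertainty computations.

    relative uncertainty (sigma[0] - sigma[1]) -> directed exploration bonus
    total uncertainty sqrt(sigma[0]^2 + sigma[1]^2) -> scales random noise
    Choice probability for arm 0 would be a sigmoid of this value.
    """
    relative = sigma[0] - sigma[1]
    total = np.hypot(sigma[0], sigma[1])
    return (mu[0] - mu[1] + w_directed * relative) / (w_random * total)

# Equal means, but arm 0 is more uncertain: the relative-uncertainty bonus
# pushes the decision value toward the less-known arm.
dv = hybrid_decision_value(mu=np.array([1.0, 1.0]), sigma=np.array([2.0, 1.0]))
```

Separating the two terms like this is what makes the dissociation testable: relative uncertainty shifts the mean of the decision value (RLPFC, directed), while total uncertainty scales its variance (DLPFC, random).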
Collapse
Affiliation(s)
- Momchil S Tomov
- Program in Neuroscience, Harvard Medical School, Boston, MA, 02115, USA.
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, 02138, USA.
| | - Van Q Truong
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, 02138, USA
| | - Rohan A Hundia
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, 02138, USA
| | - Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, 02138, USA
| |
Collapse
|
18
|
Sadeghiyeh H, Wang S, Alberhasky MR, Kyllo HM, Shenhav A, Wilson RC. Temporal discounting correlates with directed exploration but not with random exploration. Sci Rep 2020; 10:4020. [PMID: 32132573 PMCID: PMC7055215 DOI: 10.1038/s41598-020-60576-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2019] [Accepted: 02/12/2020] [Indexed: 11/09/2022] Open
Abstract
The explore-exploit dilemma describes the trade-off that occurs any time we must choose between exploring unknown options and exploiting options we know well. Implicit in this trade-off is how we value future rewards: exploiting is usually better in the short term, but in the longer term the benefits of exploration can be huge. Thus, in theory there should be a tight connection between how much people value future rewards, i.e. how much they discount future rewards relative to immediate rewards, and how likely they are to explore, with less 'temporal discounting' associated with more exploration. By measuring individual differences in temporal discounting and correlating them with explore-exploit behavior, we tested whether this theoretical prediction holds in practice. We used the 27-item Delay-Discounting Questionnaire to estimate temporal discounting and the Horizon Task to quantify two strategies of explore-exploit behavior: directed exploration, where information drives exploration by choice, and random exploration, where behavioral variability drives exploration by chance. We find a clear correlation between temporal discounting and directed exploration, with more temporal discounting leading to less directed exploration. Conversely, we find no relationship between temporal discounting and random exploration. Unexpectedly, we find that the relationship with directed exploration appears to be driven by a correlation between temporal discounting and uncertainty seeking at short time horizons, rather than information seeking at long horizons. Taken together, our results suggest a nuanced relationship between temporal discounting and explore-exploit behavior that may be mediated by multiple factors.
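The discounting measure used here is conventionally fit with a hyperbolic model, in which a single rate parameter k captures how steeply delayed rewards lose value. A minimal sketch (the amounts and delays are illustrative, not items from the questionnaire):

```python
def hyperbolic_value(amount, delay, k):
    """Subjective value of a delayed reward under hyperbolic discounting,
    the model commonly fit to the 27-item Delay-Discounting Questionnaire."""
    return amount / (1.0 + k * delay)

# A steeper discounter (larger k) devalues the same delayed reward more,
# and per the study's result should also show less directed exploration.
patient = hyperbolic_value(100.0, delay=30, k=0.01)
impulsive = hyperbolic_value(100.0, delay=30, k=0.10)
```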
Collapse
Affiliation(s)
- Hashem Sadeghiyeh
- Department of Psychology, University of Arizona, Tucson, USA; Department of Psychological Science, Missouri University of Science and Technology, Rolla, USA
| | - Siyu Wang
- Department of Psychology, University of Arizona, Tucson, USA
| | | | - Hannah M Kyllo
- Department of Psychology, University of Arizona, Tucson, USA
| | - Amitai Shenhav
- Department of Cognitive, Linguistic, & Psychological Sciences, Brown University, Providence, USA
| | - Robert C Wilson
- Department of Psychology, University of Arizona, Tucson, USA; Cognitive Science Program, University of Arizona, Tucson, USA
| |
Collapse
|
19
|
Adams RA, Moutoussis M, Nour MM, Dahoun T, Lewis D, Illingworth B, Veronese M, Mathys C, de Boer L, Guitart-Masip M, Friston KJ, Howes OD, Roiser JP. Variability in Action Selection Relates to Striatal Dopamine 2/3 Receptor Availability in Humans: A PET Neuroimaging Study Using Reinforcement Learning and Active Inference Models. Cereb Cortex 2020; 30:3573-3589. [PMID: 32083297 PMCID: PMC7233027 DOI: 10.1093/cercor/bhz327] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2019] [Revised: 11/18/2019] [Accepted: 12/05/2019] [Indexed: 12/17/2022] Open
Abstract
Choosing actions that result in advantageous outcomes is a fundamental function of nervous systems. All computational decision-making models contain a mechanism that controls the variability of (or confidence in) action selection, but its neural implementation is unclear, especially in humans. We investigated this mechanism using two influential decision-making frameworks: active inference (AI) and reinforcement learning (RL). In AI, the precision (inverse variance) of beliefs about policies controls action selection variability (similar to decision 'noise' parameters in RL) and is thought to be encoded by striatal dopamine signaling. We tested this hypothesis by administering a 'go/no-go' task to 75 healthy participants, and measuring striatal dopamine 2/3 receptor (D2/3R) availability in a subset (n = 25) using [11C]-(+)-PHNO positron emission tomography. In behavioral model comparison, RL performed best across the whole group but AI performed best in participants performing above chance levels. Limbic striatal D2/3R availability had linear relationships with AI policy precision (P = 0.029) as well as with RL irreducible decision 'noise' (P = 0.020), and this relationship with D2/3R availability was confirmed with a 'decision stochasticity' factor that aggregated across both models (P = 0.0006). These findings are consistent with occupancy of inhibitory striatal D2/3Rs decreasing the variability of action selection in humans.
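The shared quantity both frameworks estimate, policy precision in AI and (inverse) decision noise in RL, reduces in the simplest case to the inverse temperature of a softmax over action values. A minimal sketch with illustrative values:

```python
import numpy as np

def softmax_policy(q, beta):
    """Action probabilities under a softmax with precision (inverse
    temperature) beta. Higher beta means less variable action selection;
    1/beta plays the role of decision 'noise' in RL formulations."""
    z = beta * np.asarray(q, dtype=float)
    z -= z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

# The same value difference yields near-deterministic choice at high
# precision and near-random choice at low precision.
noisy = softmax_policy([1.0, 0.5], beta=0.5)
precise = softmax_policy([1.0, 0.5], beta=10.0)
```

On this reading, the reported D2/3R correlations concern where beta (or an equivalent stochasticity factor) sits across individuals, not the values being compared.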
Collapse
Affiliation(s)
- Rick A Adams
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK; Division of Psychiatry, University College London, London W1T 7NF, UK; Psychiatric Imaging Group, Robert Steiner MRI Unit, MRC London Institute of Medical Sciences, Hammersmith Hospital, London W12 0NN, UK; Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, UK
| | - Michael Moutoussis
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK; Max Planck-UCL Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, UK
| | - Matthew M Nour
- Psychiatric Imaging Group, Robert Steiner MRI Unit, MRC London Institute of Medical Sciences, Hammersmith Hospital, London W12 0NN, UK; Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, UK; Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), King's College London, London SE5 8AF, UK
| | - Tarik Dahoun
- Psychiatric Imaging Group, Robert Steiner MRI Unit, MRC London Institute of Medical Sciences, Hammersmith Hospital, London W12 0NN, UK; Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, UK; Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford OX3 7JX, UK
| | - Declan Lewis
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK
| | - Benjamin Illingworth
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK
| | - Mattia Veronese
- Centre for Neuroimaging Sciences, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), King's College London, London SE5 8AF, UK
| | - Christoph Mathys
- Max Planck-UCL Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, UK; Scuola Internazionale Superiore di Studi Avanzati (SISSA), 34136 Trieste, Italy; Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich and ETH Zurich, 8032 Zurich, Switzerland
| | - Lieke de Boer
- Aging Research Center, Karolinska Institute, 171 65 Stockholm, Sweden
| | - Marc Guitart-Masip
- Max Planck-UCL Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, UK; Aging Research Center, Karolinska Institute, 171 65 Stockholm, Sweden
| | - Karl J Friston
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK
| | - Oliver D Howes
- Psychiatric Imaging Group, Robert Steiner MRI Unit, MRC London Institute of Medical Sciences, Hammersmith Hospital, London W12 0NN, UK; Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, UK; Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), King's College London, London SE5 8AF, UK
| | - Jonathan P Roiser
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK
| |
Collapse
|
20
|
Verharen JPH, Adan RAH, Vanderschuren LJMJ. Differential contributions of striatal dopamine D1 and D2 receptors to component processes of value-based decision making. Neuropsychopharmacology 2019; 44:2195-2204. [PMID: 31254972 PMCID: PMC6897916 DOI: 10.1038/s41386-019-0454-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/18/2019] [Revised: 06/17/2019] [Accepted: 06/21/2019] [Indexed: 01/26/2023]
Abstract
Dopamine has been implicated in value-based learning and decision making by signaling reward prediction errors and facilitating cognitive flexibility, incentive motivation, and voluntary movement. Dopamine receptors can roughly be divided into the D1 and D2 subtypes, and it has been hypothesized that these two types of receptors have an opposite function in facilitating reward-related and aversion-related behaviors, respectively. Here, we tested the contribution of striatal dopamine D1 and D2 receptors to processes underlying value-based learning and decision making in rats, employing a probabilistic reversal learning paradigm. Using computational trial-by-trial analysis of task behavior after systemic or intracranial treatment with dopamine D1 and D2 receptor agonists and antagonists, we show that negative feedback learning can be modulated through D2 receptor signaling and positive feedback learning through D1 receptor signaling in the ventral striatum. Furthermore, stimulation of D2 receptors in the ventral or dorsolateral (but not dorsomedial) striatum promoted explorative choice behavior, suggesting an additional function of dopamine in these areas in value-based decision making. Finally, treatment with most dopaminergic drugs affected response latencies and number of trials completed, which was also seen after infusion of D2, but not D1 receptor-acting drugs into the striatum. Together, our data support the idea that dopamine D1 and D2 receptors have complementary functions in learning on the basis of emotionally valenced feedback, and provide evidence that dopamine facilitates value-based and motivated behaviors through distinct striatal regions.
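The proposed division of labor, D1 signaling modulating positive-feedback learning and D2 signaling negative-feedback learning, is commonly formalized as a value update with separate learning rates for positive and negative prediction errors. A minimal sketch; the parameter names and values are ours, not the study's fitted model:

```python
def update_value(q, reward, alpha_pos, alpha_neg):
    """Value update with asymmetric learning rates: alpha_pos applies to
    positive prediction errors (the D1-linked pathway in this account),
    alpha_neg to negative ones (the D2-linked pathway)."""
    delta = reward - q                       # reward prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return q + alpha * delta

q = 0.5
q_after_win = update_value(q, reward=1.0, alpha_pos=0.3, alpha_neg=0.1)
q_after_loss = update_value(q, reward=0.0, alpha_pos=0.3, alpha_neg=0.1)
```

Drug effects on feedback learning then appear as selective shifts in one of the two rates, which is how the trial-by-trial analyses in studies of this kind attribute effects to D1 vs. D2 signaling.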
Collapse
Affiliation(s)
- Jeroen P. H. Verharen
- Department of Translational Neuroscience, Brain Center Rudolf Magnus, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht, the Netherlands; Department of Animals in Science and Society, Division of Behavioural Neuroscience, Faculty of Veterinary Medicine, Utrecht University, Yalelaan 2, 3584 CM Utrecht, the Netherlands
| | - Roger A. H. Adan
- Department of Translational Neuroscience, Brain Center Rudolf Magnus, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht, the Netherlands
| | - Louk J. M. J. Vanderschuren
- Department of Animals in Science and Society, Division of Behavioural Neuroscience, Faculty of Veterinary Medicine, Utrecht University, Yalelaan 2, 3584 CM Utrecht, the Netherlands
| |
Collapse
|
21
|
Dopamine blockade impairs the exploration-exploitation trade-off in rats. Sci Rep 2019; 9:6770. [PMID: 31043685 PMCID: PMC6494917 DOI: 10.1038/s41598-019-43245-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2018] [Accepted: 04/18/2019] [Indexed: 01/30/2023] Open
Abstract
In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has theoretically been proposed that dopamine contributes to the control of this exploration-exploitation trade-off, specifically that the higher the level of tonic dopamine, the more exploitation is favored. We demonstrate here that there is a formal relationship between the rescaling of dopamine positive reward prediction errors and the exploration-exploitation trade-off in simple non-stationary multi-armed bandit tasks. We further show in rats performing such a task that systemically antagonizing dopamine receptors greatly increases the number of random choices without affecting learning capacities. Simulations and comparison of a set of different computational models (an extended Q-learning model, a directed exploration model, and a meta-learning model) fitted on each individual confirm that, independently of the model, decreasing dopaminergic activity does not affect learning rate but is equivalent to an increase in random exploration rate. This study shows that dopamine could adapt the exploration-exploitation trade-off in decision-making when facing changing environmental contingencies.
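The formal relationship this abstract invokes can be illustrated in its simplest limiting case: if learned values are uniformly rescaled (as when prediction errors shrink), choice behavior under a softmax is indistinguishable from lowering the inverse temperature, i.e. more random exploration. The paper's result concerns rescaling of positive RPEs specifically, so this uniform-scaling demo is a deliberate simplification:

```python
import numpy as np

def softmax(q, beta):
    """Softmax choice probabilities with inverse temperature beta."""
    z = beta * np.asarray(q, dtype=float)
    z -= z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

q = np.array([1.0, 0.4, 0.2])   # learned action values (illustrative)
kappa = 0.5                      # uniform shrinkage of learned values

# Shrinking all values by kappa is exactly equivalent to multiplying the
# inverse temperature by kappa: both make choices more random.
p_scaled_values = softmax(kappa * q, beta=3.0)
p_scaled_beta = softmax(q, beta=kappa * 3.0)
```

This equivalence is why, at the behavioral level, a dopamine manipulation that rescales value signals can present as a pure change in random exploration rate with learning capacity intact.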
Collapse
|