1
Ohta H, Nozawa T, Nakano T, Morimoto Y, Ishizuka T. Nonlinear age-related differences in probabilistic learning in mice: A 5-armed bandit task study. Neurobiol Aging 2024; 142:8-16. [PMID: 39029360] [DOI: 10.1016/j.neurobiolaging.2024.06.004]
Abstract
This study explores the impact of aging on reinforcement learning in mice, focusing on changes in learning rates and behavioral strategies. A 5-armed bandit task (5-ABT) and a computational Q-learning model were used to evaluate the positive and negative learning rates and the inverse temperature across three age groups (3, 12, and 18 months). Results showed a significant decline in the negative learning rate of 18-month-old mice that was not observed for the positive learning rate, suggesting that older mice retain the ability to learn from successful experiences while their ability to learn from negative outcomes declines. We also observed a significant age-dependent variation in inverse temperature, reflecting a shift in action selection policy. Middle-aged mice (12 months) exhibited a higher inverse temperature than both younger and older mice, indicating a greater reliance on previously rewarding experiences and reduced exploratory behavior. This study provides new insights into aging research by demonstrating that age-related differences in specific components of reinforcement learning follow a non-linear pattern.
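The dual-learning-rate Q-learning model described here can be sketched as follows. This is a generic illustration of the model class (separate learning rates for positive and negative prediction errors, softmax action selection governed by an inverse temperature), not the paper's fitted implementation; all parameter values and reward probabilities in the toy run are made up.

```python
import numpy as np

def softmax(q, beta):
    """Action probabilities under an inverse-temperature (beta) softmax."""
    e = np.exp(beta * (q - q.max()))  # subtract max for numerical stability
    return e / e.sum()

def q_update(q, action, reward, alpha_pos, alpha_neg):
    """Update one Q-value with separate learning rates for positive and
    negative reward prediction errors (the dual-learning-rate model)."""
    delta = reward - q[action]                  # reward prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    q = q.copy()
    q[action] += alpha * delta
    return q

# Toy run on a 5-armed bandit (arm 2 is best); all values are illustrative.
rng = np.random.default_rng(0)
p_reward = np.array([0.2, 0.2, 0.8, 0.2, 0.2])
q = np.zeros(5)
for _ in range(500):
    a = rng.choice(5, p=softmax(q, beta=3.0))
    r = float(rng.random() < p_reward[a])
    q = q_update(q, a, r, alpha_pos=0.3, alpha_neg=0.1)
```

A lower `alpha_neg`, as reported for the 18-month-old mice, makes the learner discount unrewarded outcomes while still updating normally after rewards.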
Affiliation(s)
- Hiroyuki Ohta
- Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan.
- Takashi Nozawa
- Mejiro University, 4-31-1 Naka-Ochiai, Shinjuku, Tokyo 161-8539, Japan
- Takashi Nakano
- Department of Computational Biology, School of Medicine, Fujita Health University, 1-98 Dengakugakubo, Kutsukake, Toyoake, Aichi 470-1192, Japan; International Center for Brain Science (ICBS), Fujita Health University, 1-98 Dengakugakubo, Kutsukake, Toyoake, Aichi 470-1192, Japan
- Yuji Morimoto
- Department of Physiology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan
- Toshiaki Ishizuka
- Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan
2
Neville V, Finnegan E, Paul ES, Davidson M, Dayan P, Mendl M. You are How You Eat: Foraging Behavior as a Potential Novel Marker of Rat Affective State. Affective Science 2024; 5:232-245. [PMID: 39391344] [PMCID: PMC11461729] [DOI: 10.1007/s42761-024-00242-4]
Abstract
Effective and safe foraging requires animals to behave according to the expectations they have about the rewards, threats, and costs in their environment. Since these factors are thought to be reflected in an animal's affective state, foraging behavior offers a window onto that state. In this study, rats completed a foraging task in which they repeatedly had to decide whether to continue harvesting a food source despite increasing time costs, or to forgo food and switch to a different food source. Rats completed this task across two experiments using manipulations designed to induce positive and negative, and shorter- and longer-term, changes in affective state: removal and return of enrichment (Experiment 1), implementation and reversal of an unpredictable housing treatment (Experiment 1), and delivery of rewards (tickling or sucrose) and punishers (air-puff or back-handling) immediately prior to testing (Experiment 2). In Experiment 1, rats completed fewer trials and were more prone to switching between troughs when housed in standard, compared to enriched, housing conditions. In Experiment 2, rats completed more trials following pre-test tickling compared to pre-test sucrose delivery. However, rats were also prone to disengaging from the task, suggesting they were in effect choosing between three options: 'harvest', 'switch', or 'not work'. This limits a straightforward interpretation of the results. At present, foraging behavior within the context of this task cannot reliably be used as an indicator of affective state in animals.
Affiliation(s)
- Vikki Neville
- Bristol Veterinary School, University of Bristol, Langford, UK
- Emily Finnegan
- Bristol Veterinary School, University of Bristol, Langford, UK
- Molly Davidson
- Bristol Veterinary School, University of Bristol, Langford, UK
- Peter Dayan
- Max Planck Institute for Biological Cybernetics & University of Tübingen, Tübingen, Germany
- Michael Mendl
- Bristol Veterinary School, University of Bristol, Langford, UK
3
Cinotti F, Coutureau E, Khamassi M, Marchand AR, Girard B. Regulation of reinforcement learning parameters captures long-term changes in rat behaviour. Eur J Neurosci 2024; 60:4469-4490. [PMID: 38923238] [DOI: 10.1111/ejn.16449]
Abstract
In uncertain environments in which resources fluctuate continuously, animals must continually decide whether to stabilise learning and exploit what they currently believe to be their best option, or instead explore potential alternatives and learn quickly from new observations. While this trade-off has been extensively studied in pretrained animals facing non-stationary decision-making tasks, it remains unknown how animals progressively tune it while learning the task structure during pretraining. Here, we compared the ability of different computational models to account for long-term changes in the behaviour of 24 rats as they learned to choose a rewarded lever in a three-armed bandit task across 24 days of pretraining. The day-by-day evolution of rat performance and win-shift tendency revealed a progressive stabilisation of the way the animals regulated their reinforcement learning parameters. We successfully captured these behavioural adaptations using a meta-learning model in which either the learning rate or the inverse temperature was controlled by the average reward rate.
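The closing sentence describes a meta-learning scheme in which the average reward rate regulates a reinforcement learning parameter. A minimal sketch of that idea, assuming a simple linear coupling between a running reward-rate estimate and the inverse temperature (the paper's actual regulation rule may differ, and all parameter values are illustrative):

```python
import numpy as np

def run_meta_bandit(p_reward, n_trials, alpha=0.1, alpha_rbar=0.02,
                    beta_scale=10.0, seed=0):
    """Q-learning on a bandit where the softmax inverse temperature is
    tied to a running average reward rate: as performance improves and
    stabilises, choices become more exploitative."""
    rng = np.random.default_rng(seed)
    n_arms = len(p_reward)
    q = np.zeros(n_arms)
    rbar = 0.0                       # running average reward rate
    betas = []
    for _ in range(n_trials):
        beta = beta_scale * rbar     # reward rate regulates exploration
        p = np.exp(beta * (q - q.max()))
        p /= p.sum()
        a = rng.choice(n_arms, p=p)
        r = float(rng.random() < p_reward[a])
        q[a] += alpha * (r - q[a])             # value update
        rbar += alpha_rbar * (r - rbar)        # reward-rate update
        betas.append(beta)
    return q, betas

q, betas = run_meta_bandit([0.1, 0.1, 0.9], n_trials=1000)
```

Early trials are maximally exploratory (beta starts at zero); as the reward rate climbs across "days" of training, the policy hardens, mirroring the progressive stabilisation described above.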
Affiliation(s)
- François Cinotti
- Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, CNRS, Paris, France
- University of Reading, School of Psychology and Clinical Language Sciences, Whiteknights, Reading, UK
- Mehdi Khamassi
- Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, CNRS, Paris, France
- Benoît Girard
- Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, CNRS, Paris, France
4
Lazebnik T, Golov Y, Gurka R, Harari A, Liberzon A. Exploration-exploitation model of moth-inspired olfactory navigation. J R Soc Interface 2024; 21:20230746. [PMID: 39013419] [PMCID: PMC11251768] [DOI: 10.1098/rsif.2023.0746]
Abstract
Navigation of male moths towards females during the mating search offers a unique perspective on the exploration-exploitation (EE) model of decision-making. This study uses the EE model to explain male moth pheromone-driven flight paths. Wind tunnel measurements and three-dimensional tracking using infrared cameras were leveraged to gain insights into male moth behaviour. During the wind tunnel experiments, a disturbance was added to the airflow, and the effect of the increased fluctuations on moth flights was analysed in the context of the proposed EE model. The exploration and exploitation phases were separated by applying a genetic algorithm to the experimentally obtained dataset of three-dimensional moth trajectories. We first demonstrate that the exploration-to-exploitation rate (EER) increases with distance from the source of the female pheromone, which can be explained in the context of the EE model. Furthermore, our findings reveal a compelling relationship between the EER and increased flow fluctuations near the pheromone source. Using an olfactory navigation simulation and our moth-inspired navigation model, we explain the phenomenon whereby male moths exhibit an enhanced EER as turbulence levels increase. This research extends our understanding of optimal navigation strategies based on general biological EE models and supports the development of bioinspired navigation algorithms.
Affiliation(s)
- Teddy Lazebnik
- Department of Mathematics, Ariel University, Ariel, Israel
- Department of Cancer Biology, Cancer Institute, University College London, London, UK
- Yiftach Golov
- Department of Entomology, The Volcani Center, Israel
- Roi Gurka
- Department of Physics and Engineering Science, Coastal Carolina University, Conway, SC, USA
- Ally Harari
- Department of Entomology, The Volcani Center, Israel
- Alex Liberzon
- Turbulence Structure Laboratory, School of Mechanical Engineering, Tel Aviv University, Tel Aviv, Israel
5
Kobayashi K, Kable JW. Neural mechanisms of information seeking. Neuron 2024; 112:1741-1756. [PMID: 38703774] [DOI: 10.1016/j.neuron.2024.04.008]
Abstract
We ubiquitously seek information to make better decisions. Particularly in the modern age, when more information is available at our fingertips than ever, the information we choose to collect determines the quality of our decisions. Decision neuroscience has long adopted empirical approaches in which the information available to decision-makers is fully controlled by the researchers, leaving the neural mechanisms of information seeking less understood. Although information seeking has long been studied in the context of the exploration-exploitation trade-off, recent studies have widened the scope to investigate more overt information seeking in a way distinct from other decision processes. Insights gained from these studies, accumulated over the last few years, raise the possibility that information seeking is driven by the reward system signaling the subjective value of information. Here, we review findings from these recent studies, highlighting the conceptual and empirical relationships between distinct literatures, and discuss future research directions necessary to establish a more comprehensive understanding of how individuals seek information as part of value-based decision-making.
Affiliation(s)
- Kenji Kobayashi
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Joseph W Kable
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA.
6
Montgomery SE, Li L, Russo SJ, Calipari ES, Nestler EJ, Morel C, Han MH. Mesolimbic Neural Response Dynamics Predict Future Individual Alcohol Drinking in Mice. Biol Psychiatry 2024; 95:951-962. [PMID: 38061466] [DOI: 10.1016/j.biopsych.2023.11.019]
Abstract
BACKGROUND: Individual variability in response to rewarding stimuli is a striking but understudied phenomenon. The mesolimbic dopamine system is critical in encoding the reinforcing properties of both natural reward and alcohol; however, how innate or baseline differences in the response dynamics of this circuit define individual behavior and shape future vulnerability to alcohol remains unknown.

METHODS: Using naturalistic behavioral assays, a voluntary alcohol drinking paradigm, in vivo fiber photometry, in vivo electrophysiology, and chemogenetics, we investigated how differences in mesolimbic neural circuit activity contribute to the individual variability seen in reward processing and, by proxy, alcohol drinking.

RESULTS: We first characterized heterogeneous behavioral and neural responses to natural reward and defined how these baseline responses predicted future individual alcohol-drinking phenotypes in male mice. We then determined spontaneous ventral tegmental area dopamine neuron firing profiles associated with responses to natural reward that predicted alcohol drinking. Using a dual chemogenetic approach, we mimicked specific mesolimbic dopamine neuron firing activity before or during voluntary alcohol drinking to link unique neurophysiological profiles to individual phenotype. We show that hyperdopaminergic individuals exhibit a lower neuronal response to both natural reward and alcohol that predicts lower levels of alcohol consumption in the future.

CONCLUSIONS: These findings reveal unique, circuit-specific neural signatures that predict future individual vulnerability or resistance to alcohol and expand the current knowledge base on how some individuals are able to titrate their alcohol consumption whereas others go on to engage in unhealthy alcohol-drinking behaviors.
Affiliation(s)
- Sarah E Montgomery
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Friedman Brain Institute and the Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
- Long Li
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Friedman Brain Institute and the Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York
- Scott J Russo
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Friedman Brain Institute and the Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York
- Erin S Calipari
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Friedman Brain Institute and the Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Departments of Pharmacology, Molecular Physiology and Biophysics, and Psychiatry and Behavioral Sciences, Vanderbilt Center for Addiction Research, Vanderbilt Brain Institute, Vanderbilt University, Nashville, Tennessee
- Eric J Nestler
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Friedman Brain Institute and the Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
- Carole Morel
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York.
- Ming-Hu Han
- Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Friedman Brain Institute and the Center for Affective Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Mental Health and Public Health, Faculty of Life and Health Sciences, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China.
7
Wang Y, Lak A, Manohar SG, Bogacz R. Dopamine encoding of novelty facilitates efficient uncertainty-driven exploration. PLoS Comput Biol 2024; 20:e1011516. [PMID: 38626219] [PMCID: PMC11051659] [DOI: 10.1371/journal.pcbi.1011516]
Abstract
When facing an unfamiliar environment, animals need to explore to gain new knowledge about which actions provide reward, but also to put the newly acquired knowledge to use as quickly as possible. Optimal reinforcement learning strategies should therefore assess the uncertainties of these action-reward associations and utilise them to inform decision making. We propose a novel model whereby the direct and indirect striatal pathways act together to estimate both the mean and variance of reward distributions, and mesolimbic dopaminergic neurons provide transient novelty signals, facilitating effective uncertainty-driven exploration. We utilised electrophysiological recording data to verify our model of the basal ganglia, and we fitted exploration strategies derived from the neural model to data from behavioural experiments. We also compared, in simulation, the performance of directed exploration strategies inspired by our basal ganglia model with other exploration algorithms, including classic variants of the upper confidence bound (UCB) strategy. The exploration strategies inspired by the basal ganglia model achieved overall superior performance in simulation, and fitting them to behavioural data gave qualitatively similar results to fitting more idealised normative models with less implementation-level detail. Overall, our results suggest that transient dopamine levels in the basal ganglia that encode novelty could contribute to an uncertainty representation that efficiently drives exploration in reinforcement learning.
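For reference, the classic upper confidence bound (UCB) strategy used here as a comparison baseline can be sketched as follows; the bonus constant `c` and the toy bandit are illustrative, not taken from the paper.

```python
import numpy as np

def ucb_choice(counts, means, t, c=1.0):
    """UCB1-style rule: value estimate plus an uncertainty bonus that
    shrinks as an arm is sampled more often."""
    bonus = c * np.sqrt(np.log(max(t, 1)) / np.maximum(counts, 1e-9))
    bonus[counts == 0] = np.inf   # force one pull of every untried arm
    return int(np.argmax(means + bonus))

# Toy run; reward probabilities are illustrative.
rng = np.random.default_rng(1)
p_reward = np.array([0.3, 0.7])
counts = np.zeros(2)
means = np.zeros(2)
for t in range(1, 201):
    a = ucb_choice(counts, means, t)
    r = float(rng.random() < p_reward[a])
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]   # incremental mean estimate
```

Directed exploration of this kind targets the less-sampled (more uncertain) arm, which is the role the model above assigns to transient dopamine novelty signals.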
Affiliation(s)
- Yuhao Wang
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
- Armin Lak
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Sanjay G. Manohar
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Rafal Bogacz
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
8
Venditto SJC, Miller KJ, Brody CD, Daw ND. Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning. bioRxiv 2024:2024.02.28.582617. [PMID: 38464244] [PMCID: PMC10925334] [DOI: 10.1101/2024.02.28.582617]
Abstract
Different brain systems have been hypothesized to subserve multiple "experts" that compete to generate behavior. In reinforcement learning, two general processes, one model-free (MF) and one model-based (MB), are often modeled as a mixture of agents (MoA) and hypothesized to capture differences between automaticity vs. deliberation. However, shifts in strategy cannot be captured by a static MoA. To investigate such dynamics, we present the mixture-of-agents hidden Markov model (MoA-HMM), which simultaneously learns inferred action values from a set of agents and the temporal dynamics of underlying "hidden" states that capture shifts in agent contributions over time. Applying this model to a multi-step, reward-guided task in rats reveals a progression of within-session strategies: a shift from initial MB exploration to MB exploitation, and finally to reduced engagement. The inferred states predict changes in both response time and OFC neural encoding during the task, suggesting that these states are capturing real shifts in dynamics.
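The core computation of a mixture-of-agents HMM can be sketched in a few lines: each hidden state mixes the agents' action values with its own weight vector before a softmax, and a standard forward-algorithm step updates the belief over states from observed choices. This is a schematic of the model class only, not the authors' implementation; the agents, weights, and transition matrix below are illustrative.

```python
import numpy as np

def moa_choice_probs(agent_values, weights):
    """Choice probabilities for a mixture of agents: each hidden state
    mixes the agents' action values with its own weights, then applies a
    softmax. agent_values: (n_agents, n_actions); weights: (n_states, n_agents)."""
    mixed = weights @ agent_values                       # (n_states, n_actions)
    e = np.exp(mixed - mixed.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def hmm_forward_step(alpha_prev, transition, obs_lik):
    """One forward step: propagate the state belief through the transition
    matrix, then reweight by the likelihood of the observed choice."""
    alpha = (alpha_prev @ transition) * obs_lik
    return alpha / alpha.sum()

# Two agents (e.g. MB-like and MF-like values) and two hidden states that
# weight them differently; all numbers are made up for illustration.
agent_values = np.array([[1.0, 0.0],    # agent 0 prefers action 0
                         [0.0, 1.0]])   # agent 1 prefers action 1
weights = np.array([[2.0, 0.0],         # state 0: rely on agent 0
                    [0.0, 2.0]])        # state 1: rely on agent 1
probs = moa_choice_probs(agent_values, weights)

transition = np.array([[0.95, 0.05],
                       [0.05, 0.95]])
belief = np.array([0.5, 0.5])
belief = hmm_forward_step(belief, transition, obs_lik=probs[:, 0])
```

Observing action 0 shifts the belief toward the state whose weights favor the agent that prefers action 0, which is how the model tracks within-session shifts in strategy.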
9
Lloyd A, Viding E, McKay R, Furl N. Understanding patch foraging strategies across development. Trends Cogn Sci 2023; 27:1085-1098. [PMID: 37500422] [DOI: 10.1016/j.tics.2023.07.004]
Abstract
Patch foraging is a near-ubiquitous behaviour across the animal kingdom and characterises many decision-making domains encountered by humans. We review how a disposition to explore in adolescence may reflect the evolutionary conditions under which hunter-gatherers foraged for resources. We propose that neurocomputational mechanisms responsible for reward processing, learning, and cognitive control facilitate the transition from exploratory strategies in adolescence to exploitative strategies in adulthood - where individuals capitalise on known resources. This developmental transition may be disrupted by psychopathology, as there is emerging evidence of biases in explore/exploit choices in mental health problems. Explore/exploit choices may be an informative marker for mental health across development and future research should consider this feature of decision-making as a target for clinical intervention.
Affiliation(s)
- Alex Lloyd
- Clinical, Educational, and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK.
- Essi Viding
- Clinical, Educational, and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK
- Ryan McKay
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX, UK
- Nicholas Furl
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX, UK
10
Lorents A, Colin ME, Bjerke IE, Nougaret S, Montelisciani L, Diaz M, Verschure P, Vezoli J. Human Brain Project Partnering Projects Meeting: Status Quo and Outlook. eNeuro 2023; 10:ENEURO.0091-23.2023. [PMID: 37669867] [PMCID: PMC10481639] [DOI: 10.1523/eneuro.0091-23.2023]
Abstract
As the European Flagship Human Brain Project (HBP) ends in September 2023, a meeting dedicated to the Partnering Projects (PPs), a collective of independent research groups that partnered with the HBP, was held on September 4-7, 2022. The purpose of this meeting was to allow these groups to present their results, reflect on their collaboration with the HBP, and discuss future interactions with the European Research Infrastructure (RI) EBRAINS that has emerged from the HBP. In this report, we share the advances that the Partnering Projects present at the meeting have made, together with the HBP, in furthering knowledge across various aspects of brain research. We briefly describe major achievements of the HBP Partnering Projects in terms of a systems-level understanding of the functional architecture of the brain and its possible emulation in artificial systems. We then recapitulate open discussions with EBRAINS representatives about the evolution of EBRAINS as a sustainable Research Infrastructure for the Partnering Projects after the HBP, and for the wider scientific community.
Affiliation(s)
- Ingvild Elise Bjerke
- Neural Systems Laboratory, Institute of Basic Medical Sciences, University of Oslo, Oslo 0372, Norway
- Simon Nougaret
- Institut de Neurosciences de la Timone, Unité Mixte de Recherche 7289, Aix Marseille Université, Centre National de la Recherche Scientifique, Marseille 13005, France
- Luca Montelisciani
- Cognitive and Systems Neuroscience Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam 1098XH, The Netherlands
- Marissa Diaz
- Institute for Advanced Simulation (IAS), Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, Jülich 52428, Germany
- Paul Verschure
- Donders Center for Neuroscience (DCN-FNWI), Radboud University, Nijmegen 6500HD, The Netherlands
- Julien Vezoli
- Ernst Strügmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main 60528, Germany
- Institut National de la Santé et de la Recherche Médicale Unité 1208, Stem Cell and Brain Research Institute, Université Claude Bernard Lyon 1, Bron 69500, France
11
Blackwell KT, Doya K. Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks. PLoS Comput Biol 2023; 19:e1011385. [PMID: 37594982] [PMCID: PMC10479916] [DOI: 10.1371/journal.pcbi.1011385]
Abstract
A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia, with two Q matrices, one representing direct pathway neurons (G) and another representing indirect pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is that the G and N matrices are updated using the temporal difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and differences are then resolved using a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks including extinction, renewal, and discrimination; switching reward probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices. These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight into the roles of direct- and indirect-pathway striatal neurons.
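The two-matrix idea can be sketched roughly as below. This is a loose illustration of the architecture the abstract describes (G and N matrices updated with a shared TD reward prediction error, a softmax per pathway, and a second selection step to resolve disagreements); the actual TD2Q update rules, adaptive exploration parameter, and selection details differ, and all numbers are illustrative.

```python
import numpy as np

def softmax(x, beta):
    e = np.exp(beta * (x - x.max()))
    return e / e.sum()

def td2q_step(G, N, state, action, reward, q_next, alpha=0.1, gamma=0.9):
    """One update of the two-matrix learner: the temporal-difference
    reward prediction error strengthens the chosen action in the
    direct-pathway matrix G and weakens it in the indirect-pathway
    matrix N (dopamine dips potentiate indirect-pathway neurons)."""
    delta = reward + gamma * q_next - (G[state, action] - N[state, action])
    G[state, action] += alpha * delta        # direct pathway: "Go"
    N[state, action] -= alpha * delta        # indirect pathway: "NoGo"
    return delta

def td2q_choose(G, N, state, beta, rng):
    """Pick a candidate action from each pathway, then resolve any
    disagreement with a second softmax over the candidates' net values."""
    a_g = rng.choice(G.shape[1], p=softmax(G[state], beta))
    a_n = rng.choice(N.shape[1], p=softmax(-N[state], beta))
    if a_g == a_n:
        return a_g
    vals = np.array([G[state, a] - N[state, a] for a in (a_g, a_n)])
    return (a_g, a_n)[rng.choice(2, p=softmax(vals, beta))]

# Toy single-state bandit run with illustrative parameters.
rng = np.random.default_rng(2)
G = np.zeros((1, 2))
N = np.zeros((1, 2))
p_reward = [0.2, 0.8]
for _ in range(300):
    a = td2q_choose(G, N, 0, beta=2.0, rng=rng)
    r = float(rng.random() < p_reward[a])
    td2q_step(G, N, 0, a, r, q_next=0.0)
```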
Affiliation(s)
- Kim T Blackwell
- Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia, United States of America
- Kenji Doya
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
12
Tranter MM, Aggarwal S, Young JW, Dillon DG, Barnes SA. Reinforcement learning deficits exhibited by postnatal PCP-treated rats enable deep neural network classification. Neuropsychopharmacology 2023; 48:1377-1385. [PMID: 36509858] [PMCID: PMC10354061] [DOI: 10.1038/s41386-022-01514-y]
Abstract
The ability to appropriately update the value of a given action is a critical component of flexible decision making. Several psychiatric disorders, including schizophrenia, are associated with impairments in flexible decision making that can be evaluated using the probabilistic reversal learning (PRL) task. The PRL task has been reverse-translated for use in rodents. Disrupting glutamate neurotransmission during early postnatal neurodevelopment in rodents has induced behavioral, cognitive, and neuropathophysiological abnormalities relevant to schizophrenia. Here, we tested the hypothesis that using the NMDA receptor antagonist phencyclidine (PCP) to disrupt postnatal glutamatergic transmission in rats would lead to impaired decision making in the PRL. Consistent with this hypothesis, compared to controls, the postnatal PCP-treated rats completed fewer reversals and exhibited disruptions in reward and punishment sensitivity (i.e., win-stay and lose-shift responding, respectively). Moreover, computational analysis of behavior revealed that postnatal PCP treatment resulted in a pronounced impairment in the learning rate throughout PRL testing. Finally, a deep neural network (DNN) trained on the rodent behavior could accurately predict the treatment group of subjects. These data demonstrate that disrupting early postnatal glutamatergic neurotransmission impairs flexible decision making and provide evidence that DNNs can be trained on behavioral datasets to accurately predict the treatment group of new subjects, highlighting the potential for DNNs to aid in the diagnosis of schizophrenia.
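The win-stay and lose-shift measures used here to quantify reward and punishment sensitivity are straightforward to compute from a choice/outcome sequence; a minimal sketch (the example sequences are illustrative, not data from the study):

```python
def win_stay_lose_shift(choices, rewards):
    """Proportion of trials on which the subject repeats a rewarded
    choice (win-stay) or abandons an unrewarded one (lose-shift)."""
    wins = shifts = win_n = lose_n = 0
    for prev_c, prev_r, cur_c in zip(choices, rewards, choices[1:]):
        if prev_r:
            win_n += 1
            wins += int(cur_c == prev_c)
        else:
            lose_n += 1
            shifts += int(cur_c != prev_c)
    win_stay = wins / win_n if win_n else float("nan")
    lose_shift = shifts / lose_n if lose_n else float("nan")
    return win_stay, lose_shift

# Illustrative sequence: 1 = left lever, 0 = right lever.
choices = [1, 1, 0, 1, 1, 0]
rewards = [1, 0, 1, 1, 0, 1]
ws, ls = win_stay_lose_shift(choices, rewards)  # win-stay 2/3, lose-shift 1.0
```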
Affiliation(s)
- Michael M Tranter
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Mental Health, VA San Diego Healthcare System, La Jolla, CA, 92093, USA
- Samarth Aggarwal
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
- Jared W Young
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Mental Health, VA San Diego Healthcare System, La Jolla, CA, 92093, USA
- Daniel G Dillon
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont, MA, 02478, USA
- Harvard Medical School, Boston, MA, 02115, USA
- Samuel A Barnes
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA.
- Department of Mental Health, VA San Diego Healthcare System, La Jolla, CA, 92093, USA.
13
Chen CS, Mueller D, Knep E, Ebitz RB, Grissom NM. Dopamine and norepinephrine differentially mediate the exploration-exploitation tradeoff. bioRxiv 2023:2023.01.09.523322. [PMID: 36711959] [PMCID: PMC9881999] [DOI: 10.1101/2023.01.09.523322]
Abstract
The catecholamines dopamine (DA) and norepinephrine (NE) have been repeatedly implicated in neuropsychiatric vulnerability, in part via their roles in mediating decision-making processes. Although the two neuromodulators share a synthesis pathway and are co-activated under states of arousal, they engage distinct circuits and play distinct roles in modulating neural activity across the brain. In the computational neuroscience literature, however, they have been assigned similar roles in modulating the latent cognitive processes of decision making, in particular the exploration-exploitation tradeoff. Revealing how each neuromodulator contributes to this explore-exploit process will be important in guiding mechanistic hypotheses emerging from computational psychiatric approaches. To understand how the roles of these two catecholamine systems in regulating exploration and exploitation differ and overlap, a direct comparison using the same dynamic decision-making task is needed. Here, we ran mice in a restless two-armed bandit task, which encourages both exploration and exploitation. We systemically administered a nonselective DA receptor antagonist (flupenthixol), a nonselective DA receptor agonist (apomorphine), a NE beta-receptor antagonist (propranolol), and a NE beta-receptor agonist (isoproterenol), and examined changes in exploration within subjects across sessions. We found a bidirectional modulatory effect of dopamine receptor activity on the level of exploration: increasing dopamine activity decreased exploration, and decreasing dopamine activity increased exploration. Beta-noradrenergic receptor activity also modulated exploration, but this effect depended on sex. Reinforcement learning model parameters suggested that dopamine modulation affected exploration via decision noise, whereas norepinephrine modulation affected exploration via outcome sensitivity. Together, these findings suggest that the mechanisms governing the transition between exploration and exploitation are sensitive to changes in both catecholamine systems, and they reveal differential roles for NE and DA in mediating exploration.
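The dissociation the abstract describes, decision noise versus outcome sensitivity, can be illustrated with a minimal softmax Q-learner. This is our own toy sketch, not the authors' fitted model; the parameter names (`beta`, `reward_scale`) and the static two-armed bandit are illustrative assumptions:

```python
import math
import random

def run_bandit(beta, reward_scale, n_trials=2000, alpha=0.3, seed=0):
    """Two-armed bandit with a softmax Q-learner (illustrative toy).

    beta         -- inverse temperature; lower beta means more decision noise,
                    the route proposed above for dopaminergic modulation
    reward_scale -- multiplier on received rewards; smaller values mimic
                    reduced outcome sensitivity, the proposed noradrenergic route
    Returns the fraction of non-greedy ("exploratory") choices.
    """
    rng = random.Random(seed)
    p_reward = [0.8, 0.2]   # static reward probabilities, for simplicity
    q = [0.0, 0.0]
    n_explore = 0
    for _ in range(n_trials):
        # softmax choice between the two arms
        p0 = math.exp(beta * q[0]) / (math.exp(beta * q[0]) + math.exp(beta * q[1]))
        choice = 0 if rng.random() < p0 else 1
        if q[choice] < max(q):          # chose the currently lower-valued arm
            n_explore += 1
        r = reward_scale * (1.0 if rng.random() < p_reward[choice] else 0.0)
        q[choice] += alpha * (r - q[choice])
    return n_explore / n_trials

# More decision noise (lower beta) and blunted outcomes (lower reward_scale)
# both raise exploration, but through different parameters of the model.
print(run_bandit(beta=5.0, reward_scale=1.0))   # baseline
print(run_bandit(beta=1.0, reward_scale=1.0))   # noisier decisions
print(run_bandit(beta=5.0, reward_scale=0.3))   # reduced outcome sensitivity
```

In a fit to choice data, these two parameters make distinguishable predictions, which is what lets the model attribute dopaminergic and noradrenergic effects to different latent processes.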
14
Wang S, Gerken B, Wieland JR, Wilson RC, Fellous JM. The effects of time horizon and guided choices on explore-exploit decisions in rodents. Behav Neurosci 2023; 137:127-142. [PMID: 36633987 PMCID: PMC10787949 DOI: 10.1037/bne0000549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Humans and animals have to balance the need to explore new options with exploiting known options that yield good outcomes. This tradeoff is known as the explore-exploit dilemma. To better understand the neural mechanisms underlying how humans and animals address the explore-exploit dilemma, a good animal behavioral model is critical. Most previous rodent explore-exploit studies used ethologically unrealistic operant boxes and reversal-learning paradigms, in which the decision to abandon a bad option is confounded with the need to explore a novel option to collect information, making it difficult to separate different drives and heuristics for exploration. In this study, we investigated how rodents make explore-exploit decisions using a spatial navigation horizon task (Wilson et al., 2014) adapted to rats to address the above limitations. We compared the rats' performance to that of humans using identical measures. We showed that rats use prior information to effectively guide exploration. In addition, rats use information-driven directed exploration like humans, but the extent to which they explore shows the opposite dependence on time horizon to that of humans. Moreover, we found that free choices and guided choices have different influences on exploration in rodents, a finding that has not yet been tested in humans. This study reveals that the explore-exploit spatial behavior of rats is more complex than previously thought. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
15
Speers LJ, Bilkey DK. Maladaptive explore/exploit trade-offs in schizophrenia. Trends Neurosci 2023; 46:341-354. [PMID: 36878821 DOI: 10.1016/j.tins.2023.02.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2022] [Revised: 01/30/2023] [Accepted: 02/08/2023] [Indexed: 03/07/2023]
Abstract
Schizophrenia is a complex disorder that remains poorly understood, particularly at the systems level. In this opinion article we argue that the explore/exploit trade-off concept provides a holistic and ecologically valid framework to resolve some of the apparent paradoxes that have emerged within schizophrenia research. We review recent evidence suggesting that fundamental explore/exploit behaviors may be maladaptive in schizophrenia during physical, visual, and cognitive foraging. We also describe how theories from the broader optimal foraging literature, such as the marginal value theorem (MVT), could provide valuable insight into how aberrant processing of reward, context, and cost/effort evaluations interact to produce maladaptive responses.
Affiliation(s)
- Lucinda J Speers: Department of Psychology, University of Otago, Dunedin 9016, New Zealand
- David K Bilkey: Department of Psychology, University of Otago, Dunedin 9016, New Zealand
16
Rojas GR, Curry-Pochy LS, Chen CS, Heller AT, Grissom NM. Sequential delay and probability discounting tasks in mice reveal anchoring effects partially attributable to decision noise. Behav Brain Res 2022; 431:113951. [PMID: 35661751 PMCID: PMC9844124 DOI: 10.1016/j.bbr.2022.113951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2021] [Revised: 05/20/2022] [Accepted: 05/29/2022] [Indexed: 01/19/2023]
Abstract
Delay discounting and probability discounting decision making tasks in rodent models have high translational potential. However, it is unclear whether the discounted value of the large reward option is the main contributor to variability in animals' choices in either task, which may limit translation to humans. Male and female mice underwent sessions of delay and probability discounting in sequence to assess how choice behavior adapts over experience with each task. To control for "anchoring" (persistent choices based on the initial delay or probability), mice experienced "Worsening" schedules where the large reward was offered under initially favorable conditions that became less favorable during testing, followed by "Improving" schedules where the large reward was offered under initially unfavorable conditions that improved over a session. During delay discounting, both male and female mice showed elimination of anchoring effects over training. In probability discounting, both sexes of mice continued to show some anchoring even after months of training. One possibility is that "noisy", exploratory choices could contribute to these persistent anchoring effects, rather than constant fluctuations in value discounting. We fit choice behavior in individual animals using models that included both a value-based discounting parameter and a decision noise parameter that captured variability in choices deviating from value maximization. Changes in anchoring behavior over time were tracked by changes in both the value and decision noise parameters in delay discounting, but by the decision noise parameter in probability discounting. Exploratory decision making was also reflected in choice response times that tracked the degree of conflict caused by both uncertainty and temporal cost, but was not linked with differences in locomotor activity reflecting chamber exploration. 
Thus, variable discounting behavior in mice can result from changes in exploration of the decision options rather than changes in reward valuation.
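The point that "indifferent-looking" choice can arise either from value discounting or from decision noise can be made concrete with standard hyperbolic discounting plus a softmax. This is a generic sketch with illustrative parameter values, not the authors' fitted model:

```python
import math

def p_choose_large(amount_large, delay, k, beta, amount_small=1.0):
    """Probability of picking the large-but-delayed option under hyperbolic
    discounting, V = A / (1 + k*delay), passed through a softmax.

    k    -- discounting parameter (the value-based component)
    beta -- inverse temperature; low beta = noisy, exploratory choices
    """
    v_large = amount_large / (1.0 + k * delay)
    v_small = amount_small
    return 1.0 / (1.0 + math.exp(-beta * (v_large - v_small)))

# Two very different parameter regimes can yield similar mid-range choice
# probabilities: value-driven indifference (discounting equates the options)
# versus noise-driven indifference (low beta flattens a real value gap).
print(p_choose_large(4.0, delay=30, k=0.1, beta=5.0))   # ~0.5, value-driven
print(p_choose_large(4.0, delay=30, k=0.01, beta=0.1))  # ~0.5, noise-driven
```

Fitting both parameters per animal, as the study does, is what allows persistent anchoring to be attributed to the noise parameter rather than to fluctuating discount rates.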
17
Karin O, Alon U. The dopamine circuit as a reward-taxis navigation system. PLoS Comput Biol 2022; 18:e1010340. [PMID: 35877694 PMCID: PMC9352198 DOI: 10.1371/journal.pcbi.1010340] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Revised: 08/04/2022] [Accepted: 06/29/2022] [Indexed: 01/29/2023] Open
Abstract
Studying the brain circuits that control behavior is challenging, since in addition to their structural complexity there are continuous feedback interactions between actions and sensed inputs from the environment. It is therefore important to identify mathematical principles that can be used to develop testable hypotheses. In this study, we use ideas and concepts from systems biology to study the dopamine system, which controls learning, motivation, and movement. Using data from neuronal recordings in behavioral experiments, we developed a mathematical model for dopamine responses and the effect of dopamine on movement. We show that the dopamine system shares core functional analogies with bacterial chemotaxis. Just as chemotaxis robustly climbs chemical attractant gradients, the dopamine circuit performs ‘reward-taxis’ where the attractant is the expected value of reward. The reward-taxis mechanism provides a simple explanation for scale-invariant dopaminergic responses and for matching in free operant settings, and makes testable quantitative predictions. We propose that reward-taxis is a simple and robust navigation strategy that complements other, more goal-directed navigation mechanisms.
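The chemotaxis analogy in this abstract can be sketched as a one-dimensional run-and-tumble agent that suppresses tumbling while the logarithm of reward is rising. This is our own toy inspired by the analogy, not the paper's model; the reward field and tumble probabilities are arbitrary illustrative choices. Using the log derivative is what makes the policy scale-invariant: multiplying the reward field by a constant leaves behavior unchanged.

```python
import math
import random

def reward_taxis(n_steps=5000, seed=1):
    """Run-and-tumble 'reward-taxis' toy: keep heading while log-reward is
    rising, tumble (re-randomize heading) more often otherwise."""
    rng = random.Random(seed)
    reward = lambda x: math.exp(-abs(x - 50.0) / 20.0)  # reward peak at x = 50
    x, heading = 0.0, 1.0
    prev = math.log(reward(x))
    for _ in range(n_steps):
        x += heading
        cur = math.log(reward(x))
        p_tumble = 0.1 if cur > prev else 0.5   # suppress tumbling when climbing
        if rng.random() < p_tumble:
            heading = rng.choice([-1.0, 1.0])
        prev = cur
    return x

print(reward_taxis())  # the agent drifts toward and hovers near the peak
```

As in bacterial chemotaxis, the agent never needs the absolute reward level, only its temporal trend, which is the robustness property the abstract highlights.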
Affiliation(s)
- Omer Karin: Dept. of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel; Dept. of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, University of Cambridge, Cambridge, United Kingdom; Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, United Kingdom
- Uri Alon: Dept. of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
18
Grzywacz NM, Aleem H. Does Amount of Information Support Aesthetic Values? Front Neurosci 2022; 16:805658. [PMID: 35392414 PMCID: PMC8982361 DOI: 10.3389/fnins.2022.805658] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2021] [Accepted: 02/16/2022] [Indexed: 11/24/2022] Open
Abstract
Obtaining information from the world is important for survival. The brain, therefore, has special mechanisms to extract as much information as possible from sensory stimuli. Hence, given its importance, the amount of available information may underlie aesthetic values. Such information-based aesthetic values would be significant because they would compete with others to drive decision-making. In this article, we ask, "What is the evidence that the amount of information supports aesthetic values?" An important concept in the measurement of informational volume is entropy. Research on aesthetic values has thus used Shannon entropy to evaluate the contribution of quantity of information. We review here the concepts of information and aesthetic values, and research on the visual and auditory systems, to probe whether the brain uses entropy or other relevant measures, especially Fisher information, in aesthetic decisions. We conclude that information measures contribute to these decisions in two ways. First, the absolute quantity of information can modulate aesthetic preferences for certain sensory patterns. However, the preference for volume of information is highly individualized, with information measures competing with organizing principles such as rhythm and symmetry. In addition, people tend to be resistant to too much entropy, but not necessarily to high amounts of Fisher information. We show that this resistance may stem in part from the distribution of amount of information in natural sensory stimuli. Second, the measurement of entropic-like quantities over time reveals that they can modulate aesthetic decisions by varying degrees of surprise given temporally integrated expectations. We propose that amount of information underpins complex aesthetic values, possibly informing the brain on the allocation of resources or the situational appropriateness of some cognitive models.
Affiliation(s)
- Norberto M. Grzywacz: Department of Psychology, Loyola University Chicago, Chicago, IL, United States; Department of Molecular Pharmacology and Neuroscience, Loyola University Chicago, Chicago, IL, United States; Interdisciplinary Program in Neuroscience, Georgetown University, Washington, DC, United States
- Hassan Aleem: Interdisciplinary Program in Neuroscience, Georgetown University, Washington, DC, United States
19
Faure P, Fayad SL, Solié C, Reynolds LM. Social Determinants of Inter-Individual Variability and Vulnerability: The Role of Dopamine. Front Behav Neurosci 2022; 16:836343. [PMID: 35386723 PMCID: PMC8979673 DOI: 10.3389/fnbeh.2022.836343] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2021] [Accepted: 02/14/2022] [Indexed: 11/13/2022] Open
Abstract
Individuals differ in their traits and preferences, which shape their interactions, their prospects for survival, and their susceptibility to diseases. These correlations are well documented, yet the neurophysiological mechanisms underlying the emergence of distinct personalities and their relation to vulnerability to diseases are poorly understood. Social ties, in particular, are thought to be major modulators of personality traits and psychiatric vulnerability, yet the majority of neuroscience studies are performed on rodents in socially impoverished conditions. Rodent micro-society paradigms are therefore key experimental tools for understanding how social life generates diversity by shaping individual traits. Dopamine circuitry is implicated at the interface between social life experiences, the expression of essential traits, and the emergence of pathologies, thus providing a possible mechanism linking these three concepts at a neuromodulatory level. Evaluating inter-individual variability in automated social testing environments shows great promise for improving our understanding of the link between social life, personality, and precision psychiatry, as well as for elucidating the underlying neurophysiological mechanisms.
20
The role of state uncertainty in the dynamics of dopamine. Curr Biol 2022; 32:1077-1087.e9. [PMID: 35114098 PMCID: PMC8930519 DOI: 10.1016/j.cub.2022.01.025] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2021] [Revised: 11/22/2021] [Accepted: 01/10/2022] [Indexed: 11/22/2022]
Abstract
Reinforcement learning models of the basal ganglia map the phasic dopamine signal to reward prediction errors (RPEs). Conventional models assert that, when a stimulus predicts a reward with fixed delay, dopamine activity during the delay should converge to baseline through learning. However, recent studies have found that dopamine ramps up before reward in certain conditions even after learning, thus challenging the conventional models. In this work, we show that sensory feedback causes an unbiased learner to produce RPE ramps. Our model predicts that when feedback gradually decreases during a trial, dopamine activity should resemble a "bump," whose ramp-up phase should, furthermore, be greater than that of conditions where the feedback stays high. We trained mice on a virtual navigation task with varying brightness, and both predictions were empirically observed. In sum, our theoretical and experimental results reconcile the seemingly conflicting data on dopamine behaviors under the RPE hypothesis.
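The "conventional" prediction the abstract starts from, that with a fixed cue-to-reward delay, prediction errors during the delay converge to baseline after learning, can be reproduced with tabular TD(0) on a simple state chain. This is a minimal textbook sketch of that baseline account only; it has no sensory feedback, so it produces no ramp or bump:

```python
def td_learning(n_episodes=500, n_states=10, alpha=0.1, gamma=1.0):
    """Tabular TD(0) on a fixed cue -> delay -> reward chain.
    Returns the per-state reward prediction errors (RPEs) of the final episode."""
    v = [0.0] * (n_states + 1)          # v[n_states] is terminal, stays 0
    deltas = []
    for _ in range(n_episodes):
        deltas = []
        for s in range(n_states):
            r = 1.0 if s == n_states - 1 else 0.0   # reward at chain end
            delta = r + gamma * v[s + 1] - v[s]     # TD error (the RPE)
            deltas.append(delta)
            v[s] += alpha * delta
    return deltas

final_rpes = td_learning()
print([round(d, 3) for d in final_rpes])  # all delay-period RPEs near zero
```

The paper's contribution is what happens when this picture is extended with state uncertainty and sensory feedback; in that case the learner's RPEs ramp or form a "bump" instead of flattening out.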
21
Mikhael JG, Gershman SJ. Impulsivity and risk-seeking as Bayesian inference under dopaminergic control. Neuropsychopharmacology 2022; 47:465-476. [PMID: 34376813 PMCID: PMC8674258 DOI: 10.1038/s41386-021-01125-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/01/2020] [Revised: 07/17/2021] [Accepted: 07/21/2021] [Indexed: 02/07/2023]
Abstract
Bayesian models successfully account for several of dopamine (DA)'s effects on contextual calibration in interval timing and reward estimation. In these models, tonic levels of DA control the precision of stimulus encoding, which is weighed against contextual information when making decisions. When DA levels are high, the animal relies more heavily on the (highly precise) stimulus encoding, whereas when DA levels are low, the context affects decisions more strongly. Here, we extend this idea to intertemporal choice and probability discounting tasks. In intertemporal choice tasks, agents must choose between a small reward delivered soon and a large reward delivered later, whereas in probability discounting tasks, agents must choose between a small reward that is always delivered and a large reward that may be omitted with some probability. Beginning with the principle that animals will seek to maximize their reward rates, we show that the Bayesian model predicts a number of curious empirical findings in both tasks. First, the model predicts that higher DA levels should normally promote selection of the larger/later option, which is often taken to imply that DA decreases 'impulsivity,' and promote selection of the large/risky option, often taken to imply that DA increases 'risk-seeking.' However, if the temporal precision is sufficiently decreased, higher DA levels should have the opposite effect-promoting selection of the smaller/sooner option (higher impulsivity) and the small/safe option (lower risk-seeking). Second, high enough levels of DA can result in preference reversals. Third, selectively decreasing the temporal precision, without manipulating DA, should promote selection of the larger/later and large/risky options. Fourth, when a different post-reward delay is associated with each option, animals will not learn the option-delay contingencies, but this learning can be salvaged when the post-reward delays are made more salient. 
Finally, the Bayesian model predicts correlations among behavioral phenotypes: Animals that are better timers will also appear less impulsive.
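The core computation in this Bayesian account is a precision-weighted combination of a noisy stimulus estimate with a contextual prior. The conjugate-Gaussian formula below is standard; mapping `obs_precision` to tonic dopamine is the paper's proposal, and the numbers are illustrative:

```python
def posterior_mean(obs, obs_precision, prior_mean, prior_precision):
    """Precision-weighted (conjugate Gaussian) combination of a noisy
    observation with a contextual prior. In the model above, higher tonic
    dopamine raises obs_precision, pulling the estimate toward the stimulus
    and away from the context."""
    w = obs_precision / (obs_precision + prior_precision)
    return w * obs + (1.0 - w) * prior_mean

# Timing a 10 s delay in a context whose mean delay is 20 s:
print(posterior_mean(10.0, 4.0, 20.0, 1.0))   # high DA: ~12, stimulus-dominated
print(posterior_mean(10.0, 0.25, 20.0, 1.0))  # low DA: ~18, context-dominated
```

Because choices in intertemporal and risky tasks depend on these estimates, shifting the weight `w` is enough to flip apparent "impulsivity" or "risk-seeking" without any change in reward preferences, which is the paper's central point.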
Affiliation(s)
- John G. Mikhael: Program in Neuroscience, Harvard Medical School, Boston, MA, USA; MD-PhD Program, Harvard Medical School, Boston, MA, USA
- Samuel J. Gershman: Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA; Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
22
Spreng RN, Turner GR. From exploration to exploitation: a shifting mental mode in late life development. Trends Cogn Sci 2021; 25:1058-1071. [PMID: 34593321 PMCID: PMC8844884 DOI: 10.1016/j.tics.2021.09.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2021] [Revised: 08/30/2021] [Accepted: 09/01/2021] [Indexed: 12/31/2022]
Abstract
Changes in cognition, affect, and brain function combine to promote a shift in the nature of mentation in older adulthood, favoring exploitation of prior knowledge over exploratory search as the starting point for thought and action. Age-related exploitation biases result from the accumulation of prior knowledge, reduced cognitive control, and a shift toward affective goals. These are accompanied by changes in cortical networks, as well as attention and reward circuits. By incorporating these factors into a unified account, the exploration-to-exploitation shift offers an integrative model of cognitive, affective, and brain aging. Here, we review evidence for this model, identify determinants and consequences, and survey the challenges and opportunities posed by an exploitation-biased mental mode in later life.
Affiliation(s)
- R Nathan Spreng: Laboratory of Brain and Cognition, Montreal Neurological Institute, Department of Neurology and Neurosurgery, McGill University, Montreal, QC H3A 2B4, Canada; McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada; Departments of Psychiatry and Psychology, McGill University, Montreal, QC H3A 0G4, Canada
- Gary R Turner: Department of Psychology, York University, Toronto, ON M3J 1P3, Canada
23
Adaptive exploration policy for exploration–exploitation tradeoff in continuous action control optimization. Int J Mach Learn Cybern 2021. [DOI: 10.1007/s13042-021-01387-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
24
Chronic nicotine increases midbrain dopamine neuron activity and biases individual strategies towards reduced exploration in mice. Nat Commun 2021; 12:6945. [PMID: 34836948 PMCID: PMC8635406 DOI: 10.1038/s41467-021-27268-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2021] [Accepted: 11/04/2021] [Indexed: 11/09/2022] Open
Abstract
Long-term exposure to nicotine alters brain circuits and induces profound changes in decision-making strategies, affecting behaviors both related and unrelated to drug seeking and consumption. Using an intracranial self-stimulation reward-based foraging task, we investigated in mice the impact of chronic nicotine on midbrain dopamine neuron activity and its consequence on the trade-off between exploitation and exploration. Model-based and archetypal analysis revealed substantial inter-individual variability in decision-making strategies, with mice passively exposed to nicotine shifting toward a more exploitative profile compared to non-exposed animals. We then mimicked the effect of chronic nicotine on the tonic activity of dopamine neurons using optogenetics, and found that photo-stimulated mice adopted a behavioral phenotype similar to that of mice exposed to chronic nicotine. Our results reveal a key role of tonic midbrain dopamine in the exploration/exploitation trade-off and highlight a potential mechanism by which nicotine affects the exploration/exploitation balance and decision-making.
25
Chen CS, Knep E, Han A, Ebitz RB, Grissom N. Sex differences in learning from exploration. eLife 2021; 10:e69748. [PMID: 34796870 PMCID: PMC8794469 DOI: 10.7554/eLife.69748] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2021] [Accepted: 11/18/2021] [Indexed: 11/13/2022] Open
Abstract
Sex-based modulation of cognitive processes could set the stage for individual differences in vulnerability to neuropsychiatric disorders. While value-based decision making processes in particular have been proposed to be influenced by sex differences, overall correct performance in decision making tasks often shows variable or minimal differences across sexes. Computational tools allow us to uncover latent variables that define different decision making approaches, even in animals with similar correct performance. Here, we quantify sex differences in mice in the latent variables underlying behavior in a classic value-based decision making task: a restless 2-armed bandit. While male and female mice had similar accuracy, they achieved this performance via different patterns of exploration. Male mice tended to make more exploratory choices overall, largely because they appeared to get 'stuck' in exploration once they had started. Female mice tended to explore less but learned more quickly during exploration. Together, these results suggest that sex exerts stronger influences on decision making during periods of learning and exploration than during stable choices. Exploration during decision making is altered in people diagnosed with addictions, depression, and neurodevelopmental disabilities, pinpointing the neural mechanisms of exploration as a highly translational avenue for studying sex-modulated vulnerability to neuropsychiatric diagnoses.
Affiliation(s)
- Cathy S Chen: University of Minnesota, Minneapolis, United States
- Evan Knep: University of Minnesota, Minneapolis, United States
- Autumn Han: University of Minnesota, Minneapolis, United States
- R Becket Ebitz: Department of Neurosciences, Princeton University, Princeton, United States
26
Hamelin H, Poizat G, Florian C, Kursa MB, Pittaras E, Callebert J, Rampon C, Taouis M, Hamed A, Granon S. Prolonged Consumption of Sweetened Beverages Lastingly Deteriorates Cognitive Functions and Reward Processing in Mice. Cereb Cortex 2021; 32:1365-1378. [PMID: 34491298 DOI: 10.1093/cercor/bhab274] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2021] [Revised: 07/10/2021] [Accepted: 07/12/2021] [Indexed: 12/25/2022] Open
Abstract
We investigated the detrimental effects of chronic consumption of sweet or sweetened beverages in mice. We report that consumption of beverages containing small amounts of sucrose during several weeks impaired reward systems. This is evidenced by robust changes in the activation pattern of prefrontal brain regions associated with abnormal risk-taking and delayed establishment of decision-making strategy. Supporting these findings, we find that chronic consumption of low doses of artificial sweeteners such as saccharin disrupts brain regions' activity engaged in decision-making and reward processes. Consequently, this leads to the rapid development of inflexible decisions, particularly in a subset of vulnerable individuals. Our data also reveal that regular consumption, even at low doses, of sweet or sweeteners dramatically alters brain neurochemistry, i.e., dopamine content and turnover, and high cognitive functions, while sparing metabolic regulations. Our findings suggest that it would be relevant to focus on long-term consequences on the brain of sweet or sweetened beverages in humans, especially as they may go metabolically unnoticed.
Affiliation(s)
- Héloïse Hamelin: Université Paris-Saclay, CNRS, Institut des Neurosciences Paris-Saclay, 91190 Gif-sur-Yvette, France
- Ghislaine Poizat: Université Paris-Saclay, CNRS, Institut des Neurosciences Paris-Saclay, 91190 Gif-sur-Yvette, France
- Cédrick Florian: Research Center on Animal Cognition (CRCA), Center for Integrative Biology, CNRS UMR 5169, Toulouse 31062, France
- Miron Bartosz Kursa: Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, 02-106 Warsaw, Poland
- Elsa Pittaras: Stanford University, Heller Laboratory, Stanford, CA 94305-5020, USA
- Jacques Callebert: Service of Biochemistry and Molecular Biology, INSERM U942, Hospital Lariboisière, APHP, Paris 75010, France
- Claire Rampon: Research Center on Animal Cognition (CRCA), Center for Integrative Biology, CNRS UMR 5169, Toulouse 31062, France
- Mohammed Taouis: Université Paris-Saclay, CNRS, Institut des Neurosciences Paris-Saclay, 91190 Gif-sur-Yvette, France
- Adam Hamed: Laboratory of Spatial Memory, Nencki Institute of Experimental Biology, Polish Academy of Sciences, 02-093 Warsaw, Poland
- Sylvie Granon: Université Paris-Saclay, CNRS, Institut des Neurosciences Paris-Saclay, 91190 Gif-sur-Yvette, France
27
Foo C, Lozada A, Aljadeff J, Li Y, Wang JW, Slesinger PA, Kleinfeld D. Reinforcement learning links spontaneous cortical dopamine impulses to reward. Curr Biol 2021; 31:4111-4119.e4. [PMID: 34302743 DOI: 10.1016/j.cub.2021.06.069] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2021] [Revised: 05/28/2021] [Accepted: 06/24/2021] [Indexed: 11/15/2022]
Abstract
In their pioneering study on dopamine release, Romo and Schultz speculated "...that the amount of dopamine released by unmodulated spontaneous impulse activity exerts a tonic, permissive influence on neuronal processes more actively engaged in preparation of self-initiated movements...."1 Motivated by the suggestion of "spontaneous impulses," as well as by the "ramp up" of dopaminergic neuronal activity that occurs when rodents navigate to a reward,2-5 we asked two questions. First, are there spontaneous impulses of dopamine that are released in cortex? Using cell-based optical sensors of extrasynaptic dopamine, [DA]ex,6 we found that spontaneous dopamine impulses in cortex of naive mice occur at a rate of ∼0.01 per second. Next, can mice be trained to change the amplitude and/or timing of dopamine events triggered by internal brain dynamics, much as they can change the amplitude and timing of dopamine impulses based on an external cue?7-9 Using a reinforcement learning paradigm based solely on rewards that were gated by feedback from real-time measurements of [DA]ex, we found that mice can volitionally modulate their spontaneous [DA]ex. In particular, by only the second session of daily, hour-long training, mice increased the rate of impulses of [DA]ex, increased the amplitude of the impulses, and increased their tonic level of [DA]ex for a reward. Critically, mice learned to reliably elicit [DA]ex impulses prior to receiving a reward. These effects reversed when the reward was removed. We posit that spontaneous dopamine impulses may serve as a salient cognitive event in behavioral planning.
Affiliation(s)
- Conrad Foo: Department of Physics, University of California at San Diego, La Jolla, CA 92093, USA
- Adrian Lozada: Department of Physics, University of California at San Diego, La Jolla, CA 92093, USA
- Johnatan Aljadeff: Section of Neurobiology, University of California at San Diego, La Jolla, CA 92093, USA
- Yulong Li: School of Life Sciences, Peking University, Beijing 100871, P.R. China
- Jing W Wang: Section of Neurobiology, University of California at San Diego, La Jolla, CA 92093, USA
- Paul A Slesinger: Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- David Kleinfeld: Department of Physics, University of California at San Diego, La Jolla, CA 92093, USA; Section of Neurobiology, University of California at San Diego, La Jolla, CA 92093, USA
28
Koralek AC, Costa RM. Dichotomous dopaminergic and noradrenergic neural states mediate distinct aspects of exploitative behavioral states. Sci Adv 2021; 7(30):eabh2059. [PMID: 34301604 PMCID: PMC8302134 DOI: 10.1126/sciadv.abh2059] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Accepted: 06/07/2021] [Indexed: 06/13/2023]
Abstract
The balance between exploiting known actions and exploring alternatives is critical for survival and hypothesized to rely on shifts in neuromodulation. We developed a behavioral paradigm to capture exploitative and exploratory states and imaged calcium dynamics in genetically identified dopaminergic and noradrenergic neurons. During exploitative states, characterized by motivated repetition of the same action choice, dopamine neurons in SNc encoding movement vigor showed sustained elevation of basal activity that lasted many seconds. This sustained activity emerged from longer positive responses, which accumulated during exploitative action-reward bouts, and hysteretic dynamics. Conversely, noradrenergic neurons in LC showed sustained inhibition of basal activity due to the accumulation of longer negative responses in LC. Chemogenetic manipulation of these sustained dynamics revealed that dopaminergic activity mediates action drive, whereas noradrenergic activity modulates choice diversity. These data uncover the emergence of sustained neural states in dopaminergic and noradrenergic networks that mediate dissociable aspects of exploitative bouts.
Affiliation(s)
- Aaron C Koralek
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
| | - Rui M Costa
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA.
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
29
Guo D, Yu AJ. Revisiting the Role of Uncertainty-Driven Exploration in a (Perceived) Non-Stationary World. CogSci ... Annual Conference of the Cognitive Science Society 2021; 43:2045-2051. [PMID: 34368809 PMCID: PMC8341546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Humans are often faced with an exploration-versus-exploitation trade-off. A commonly used paradigm, the multi-armed bandit task, has shown that humans exhibit an "uncertainty bonus", which combines with estimated reward to drive exploration. However, previous studies often modeled belief updating using either a Bayesian model that assumed the reward contingency to remain stationary, or a reinforcement learning model. Separately, we previously showed that human learning in the bandit task is best captured by a dynamic-belief Bayesian model. We hypothesize that the estimated uncertainty bonus may depend on which learning model is employed. Here, we re-analyze a bandit dataset using all three learning models. We find that the dynamic-belief model captures human choice behavior best, while also uncovering a much larger uncertainty bonus than the other models. More broadly, our results emphasize the importance of choosing an appropriate learning model, as it is crucial for correctly characterizing the processes underlying human decision making.
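The dynamic-belief idea, that the reward contingency may change between trials so old evidence should decay toward the prior, can be sketched for a single Bernoulli arm on a discretized probability grid. This is a schematic of the model class rather than the authors' exact model; the grid size and the stay-probability gamma are illustrative choices.

```python
# Dynamic-belief update for one bandit arm: with probability 1 - gamma the
# reward rate is assumed to reset to the prior between trials, so old
# evidence gradually decays instead of accumulating forever.
N = 101
grid = [i / (N - 1) for i in range(N)]        # candidate reward rates in [0, 1]
prior = [1.0 / N] * N                         # uniform prior over the grid
belief = prior[:]

def observe(belief, reward, gamma=0.9):
    # Mix the current belief with the prior to model possible change points
    predicted = [gamma * b + (1 - gamma) * p for b, p in zip(belief, prior)]
    # Bernoulli likelihood of the observed outcome under each candidate rate
    like = [g if reward else (1 - g) for g in grid]
    post = [l * q for l, q in zip(like, predicted)]
    z = sum(post)
    return [q / z for q in post]

# One rewarded trial shifts the posterior mean toward higher reward rates
belief = observe(belief, reward=1)
mean_rate = sum(g * b for g, b in zip(grid, belief))
```

Under a stationary Bayesian model (gamma = 1) the same code reduces to ordinary conjugate updating; lowering gamma is what keeps the posterior, and hence the uncertainty bonus, from collapsing over trials.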
Affiliation(s)
- Dalin Guo
- Department of Cognitive Science, University of California, San Diego, La Jolla, CA 92093, USA
- Angela J Yu
- Department of Cognitive Science & Halıcıoglu Data Science Institute, University of California, San Diego, La Jolla, CA 92093, USA
30
Ohta H, Satori K, Takarada Y, Arake M, Ishizuka T, Morimoto Y, Takahashi T. The asymmetric learning rates of murine exploratory behavior in sparse reward environments. Neural Netw 2021; 143:218-229. [PMID: 34157646 DOI: 10.1016/j.neunet.2021.05.030] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/09/2021] [Revised: 04/16/2021] [Accepted: 05/26/2021] [Indexed: 11/29/2022]
Abstract
Goal-oriented behaviors of animals can be modeled by reinforcement learning algorithms. Such algorithms predict future outcomes of selected actions utilizing action values and update those values in response to positive and negative outcomes. In many models of animal behavior, the action values are updated symmetrically based on a common learning rate, that is, in the same way for both positive and negative outcomes. However, animals in environments with scarce rewards may have uneven learning rates. To investigate the asymmetry in learning rates between reward and non-reward, we analyzed the exploration behavior of mice in five-armed bandit tasks using a Q-learning model with differential learning rates for positive and negative outcomes. The positive learning rate was significantly higher in a scarce reward environment than in a rich reward environment, and conversely, the negative learning rate was significantly lower in the scarce environment. The ratio of positive to negative learning rates was about 10 in the scarce environment and about 2 in the rich environment. This result suggests that when the reward probability was low, the mice tended to ignore failures and exploit the rare rewards. Computational modeling analysis revealed that the increased learning-rate ratio could cause overestimation of, and perseveration on, rarely rewarding events, increasing total reward acquisition in the scarce environment but disadvantaging impartial exploration.
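The core update rule described in this abstract, one learning rate for positive prediction errors and another for negative ones combined with softmax action selection, can be sketched as follows. This is a minimal illustration of the model class, not the authors' code; the arm count matches the five-armed task, but the learning rates, inverse temperature, and reward probabilities below are hypothetical.

```python
import math
import random

def softmax_choice(q, beta):
    """Pick an arm with probability proportional to exp(beta * Q)."""
    weights = [math.exp(beta * v) for v in q]
    r = random.random() * sum(weights)
    for arm, w in enumerate(weights):
        r -= w
        if r <= 0:
            return arm
    return len(q) - 1

def update(q, arm, reward, alpha_pos, alpha_neg):
    """Asymmetric Q-learning: separate learning rates for positive
    and negative reward prediction errors."""
    delta = reward - q[arm]
    alpha = alpha_pos if delta > 0 else alpha_neg
    q[arm] += alpha * delta

# Hypothetical scarce-reward setting: alpha_pos >> alpha_neg (ratio ~10),
# so failures are largely ignored and rare rewards dominate the values.
q = [0.0] * 5
for _ in range(1000):
    arm = softmax_choice(q, beta=5.0)
    reward = 1.0 if random.random() < (0.3 if arm == 0 else 0.05) else 0.0
    update(q, arm, reward, alpha_pos=0.4, alpha_neg=0.04)
```

Setting alpha_pos equal to alpha_neg recovers the symmetric model the abstract contrasts against.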
Affiliation(s)
- Hiroyuki Ohta
- Department of Pharmacology, National Defense Medical College, Saitama, 359-8513, Japan.
- Yu Takarada
- Tokyo Denki University, Saitama, 350-0394, Japan
- Masashi Arake
- Department of Physiology, National Defense Medical College, Saitama, 359-8513, Japan
- Toshiaki Ishizuka
- Department of Pharmacology, National Defense Medical College, Saitama, 359-8513, Japan
- Yuji Morimoto
- Department of Physiology, National Defense Medical College, Saitama, 359-8513, Japan
31
Gilbertson T, Steele D. Tonic dopamine, uncertainty and basal ganglia action selection. Neuroscience 2021; 466:109-124. [PMID: 34015370 DOI: 10.1016/j.neuroscience.2021.05.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/18/2020] [Revised: 05/04/2021] [Accepted: 05/08/2021] [Indexed: 11/29/2022]
Abstract
To make optimal decisions in uncertain circumstances, flexible adaptation of behaviour is required: exploring alternatives when the best choice is unknown, and exploiting what is known when that is best. Using a computational model of the basal ganglia, we propose that switches between exploratory and exploitative decisions are mediated by the interaction between tonic dopamine and cortical input to the basal ganglia. We show that a biologically detailed action selection circuit model, endowed with dopamine-dependent striatal plasticity, can optimally solve the explore-exploit problem, estimating the true underlying state of a noisy Gaussian diffusion process. Critical to the model's performance was a fluctuating level of tonic dopamine which increased under conditions of uncertainty. With an optimal range of tonic dopamine, explore-exploit decisions were mediated by the effects of tonic dopamine on the precision of the model's action selection mechanism. Under conditions of uncertain reward pay-out, the model's reduced selectivity allowed disinhibition of multiple alternative actions to be explored at random. Conversely, when uncertainty about reward pay-out was low, enhanced selectivity of the action selection circuit facilitated exploitation of the high-value choice. Model performance was at the level of a Kalman filter, which provides an optimal solution for the task. These simulations support the idea that this subcortical neural circuit may have evolved to facilitate decision making in non-stationary reward environments. The model generates several experimental predictions with relevance to abnormal decision making in neuropsychiatric and neurological disease.
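The Kalman filter that serves as the optimality benchmark here has a compact scalar form for tracking a Gaussian diffusion process. This is a generic textbook formulation, not the authors' basal ganglia model; the drift and observation-noise variances below are illustrative.

```python
import random

def kalman_step(mean, var, obs, q_drift, r_obs):
    """One predict-update cycle of a scalar Kalman filter tracking a
    latent mean that drifts as a Gaussian random walk."""
    # Predict: the latent state diffuses, so uncertainty grows
    var += q_drift
    # Update: blend prediction and observation by the Kalman gain
    gain = var / (var + r_obs)
    mean += gain * (obs - mean)
    var *= (1 - gain)
    return mean, var

# Track a hidden random walk from noisy reward observations
# (all variances here are made-up illustrative values)
random.seed(1)
truth, mean, var = 0.0, 0.0, 1.0
for _ in range(500):
    truth += random.gauss(0.0, 0.1)        # Gaussian diffusion of the true state
    obs = truth + random.gauss(0.0, 0.5)   # noisy observation of reward pay-out
    mean, var = kalman_step(mean, var, obs, q_drift=0.01, r_obs=0.25)
```

The posterior variance tracked here plays the role of the uncertainty signal that, in the circuit model, is proposed to drive tonic dopamine up and selectivity down.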
Affiliation(s)
- Tom Gilbertson
- Department of Neurology, Level 6, South Block, Ninewells Hospital & Medical School, Dundee DD2 4BF, UK; Division of Imaging Science and Technology, Medical School, University of Dundee, DD2 4BF, UK.
- Douglas Steele
- Division of Imaging Science and Technology, Medical School, University of Dundee, DD2 4BF, UK
32
Marzecová A, Kaiser LF, Maddah A. Neuromodulation of Foraging Decisions: The Role of Dopamine. Front Behav Neurosci 2021; 15:660667. [PMID: 33927602 PMCID: PMC8076528 DOI: 10.3389/fnbeh.2021.660667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2021] [Accepted: 03/15/2021] [Indexed: 11/22/2022]
Affiliation(s)
- Anna Marzecová
- Department of Experimental Psychology, Ghent University, Ghent, Belgium; Institute of Experimental Psychology, Heinrich-Heine University, Düsseldorf, Germany
- Luca F Kaiser
- Institute of Experimental Psychology, Heinrich-Heine University, Düsseldorf, Germany
- Armin Maddah
- Institute of Experimental Psychology, Heinrich-Heine University, Düsseldorf, Germany
33
Wilson RC, Bonawitz E, Costa VD, Ebitz RB. Balancing exploration and exploitation with information and randomization. Curr Opin Behav Sci 2021; 38:49-56. [PMID: 33184605 PMCID: PMC7654823 DOI: 10.1016/j.cobeha.2020.10.001] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Indexed: 12/28/2022]
Abstract
Explore-exploit decisions require us to trade off the benefits of exploring unknown options to learn more about them against exploiting known options for immediate reward. Such decisions are ubiquitous in nature, but from a computational perspective they are notoriously hard. There is therefore much interest in how humans and animals make these decisions, and recently there has been an explosion of research in this area. Here we provide a biased and incomplete snapshot of this field, focusing on the major finding that many organisms use two distinct strategies to solve the explore-exploit dilemma: a bias for information ('directed exploration') and the randomization of choice ('random exploration'). We review evidence for the existence of these strategies, their computational properties, their neural implementations, and how directed and random exploration vary over the lifespan. We conclude by highlighting open questions in this field that are ripe to both explore and exploit.
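The two strategies can be sketched in a single choice rule: an uncertainty-dependent information bonus implements directed exploration, while a softmax temperature implements random exploration. This is an illustrative composite, not any specific model from the reviewed literature; the UCB-style bonus, the function name, and all parameter values are assumptions.

```python
import math
import random

def choose_arm(values, counts, bonus_weight, temperature):
    """Combine directed exploration (an information bonus on rarely
    sampled options) with random exploration (softmax choice noise)."""
    total_n = sum(counts) + 1
    # Directed: the bonus shrinks as an option is sampled more often
    scored = [v + bonus_weight * math.sqrt(math.log(total_n) / (c + 1))
              for v, c in zip(values, counts)]
    # Random: softmax over scores; higher temperature means noisier choices.
    # Subtracting the max keeps the exponentials numerically stable.
    top = max(scored)
    weights = [math.exp((s - top) / temperature) for s in scored]
    r = random.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(values) - 1

# Illustrative call: arm values, how often each has been sampled
arm = choose_arm([0.4, 0.1, 0.0], [30, 5, 1], bonus_weight=0.3, temperature=0.2)
```

Setting bonus_weight to zero leaves pure random exploration; sending temperature toward zero leaves pure directed exploration, which is how the two strategies are dissociated experimentally.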
Affiliation(s)
- Robert C. Wilson
- Department of Psychology, University of Arizona, Tucson, AZ, USA
- Cognitive Science Program, University of Arizona, Tucson, AZ, USA
- Evelyn F. McKnight Brain Institute, University of Arizona, Tucson, AZ, USA
- Vincent D. Costa
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
- R. Becket Ebitz
- Department of Neuroscience, University of Montréal, Montréal, Québec, Canada
34
Wiehler A, Chakroun K, Peters J. Attenuated Directed Exploration during Reinforcement Learning in Gambling Disorder. J Neurosci 2021; 41:2512-2522. [PMID: 33531415 PMCID: PMC7984586 DOI: 10.1523/jneurosci.1607-20.2021] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/26/2020] [Revised: 01/18/2021] [Accepted: 01/22/2021] [Indexed: 12/30/2022]
Abstract
Gambling disorder (GD) is a behavioral addiction associated with impairments in value-based decision-making and behavioral flexibility and might be linked to changes in the dopamine system. Maximizing long-term rewards requires a flexible trade-off between the exploitation of known options and the exploration of novel options for information gain. This exploration-exploitation trade-off is thought to depend on dopamine neurotransmission. We hypothesized that human gamblers would show a reduction in directed (uncertainty-based) exploration, accompanied by changes in brain activity in a fronto-parietal exploration-related network. Twenty-three frequent, non-treatment-seeking gamblers and twenty-three healthy matched controls (all male) performed a four-armed bandit task during functional magnetic resonance imaging (fMRI). Computational modeling using hierarchical Bayesian parameter estimation revealed signatures of directed exploration, random exploration, and perseveration in both groups. Gamblers showed a reduction in directed exploration, whereas random exploration and perseveration were similar between groups. Neuroimaging revealed no evidence for group differences in neural representations of basic task variables (expected value, prediction errors). Our hypothesis of reduced frontal pole (FP) recruitment in gamblers was not supported. Exploratory analyses showed that during directed exploration, gamblers showed reduced parietal cortex and substantia-nigra/ventral-tegmental-area activity. Cross-validated classification analyses revealed that connectivity in an exploration-related network was predictive of group status, suggesting that connectivity patterns might be more predictive of problem gambling than univariate effects. Findings reveal specific reductions of strategic exploration in gamblers that might be linked to altered processing in a fronto-parietal network and/or changes in dopamine neurotransmission implicated in GD.
SIGNIFICANCE STATEMENT Wiehler et al. (2021) report that gamblers rely less on the strategic exploration of unknown, but potentially better, rewards during reward learning. This is reflected in a related network of brain activity. Parameters of this network can be used to predict the presence of problem gambling behavior in participants.
Affiliation(s)
- A Wiehler
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Université de Paris, Paris F-75006, France
- Department of Psychiatry, Service Hospitalo-Universitaire, Groupe Hospitalier Universitaire Paris Psychiatrie & Neurosciences, Paris F-75014, France
- K Chakroun
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- J Peters
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Department of Psychology, Biological Psychology, University of Cologne, Cologne 50923, Germany
35
Mikhael JG, Lai L, Gershman SJ. Rational inattention and tonic dopamine. PLoS Comput Biol 2021; 17:e1008659. [PMID: 33760806 PMCID: PMC7990190 DOI: 10.1371/journal.pcbi.1008659] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/15/2019] [Accepted: 12/28/2020] [Indexed: 11/27/2022]
Abstract
Slow-timescale (tonic) changes in dopamine (DA) contribute to a wide variety of processes in reinforcement learning, interval timing, and other domains. Furthermore, changes in tonic DA exert distinct effects depending on when they occur (e.g., during learning vs. performance) and what task the subject is performing (e.g., operant vs. classical conditioning). Two influential theories of tonic DA-the average reward theory and the Bayesian theory in which DA controls precision-have each been successful at explaining a subset of empirical findings. But how the same DA signal performs two seemingly distinct functions without creating crosstalk is not well understood. Here we reconcile the two theories under the unifying framework of 'rational inattention,' which (1) conceptually links average reward and precision, (2) outlines how DA manipulations affect this relationship, and in so doing, (3) captures new empirical phenomena. In brief, rational inattention asserts that agents can increase their precision in a task (and thus improve their performance) by paying a cognitive cost. Crucially, whether this cost is worth paying depends on average reward availability, reported by DA. The monotonic relationship between average reward and precision means that the DA signal contains the information necessary to retrieve the precision. When this information is needed after the task is performed, as presumed by Bayesian inference, acute manipulations of DA will bias behavior in predictable ways. We show how this framework reconciles a remarkably large collection of experimental findings. In reinforcement learning, the rational inattention framework predicts that learning from positive and negative feedback should be enhanced in high and low DA states, respectively, and that DA should tip the exploration-exploitation balance toward exploitation. 
In interval timing, this framework predicts that DA should increase the speed of the internal clock and decrease the extent of interference by other temporal stimuli during temporal reproduction (the central tendency effect). Finally, rational inattention makes the new predictions that these effects should be critically dependent on the controllability of rewards, that post-reward delays in intertemporal choice tasks should be underestimated, and that average reward manipulations should affect the speed of the clock-thus capturing empirical findings that are unexplained by either theory alone. Our results suggest that a common computational repertoire may underlie the seemingly heterogeneous roles of DA.
Affiliation(s)
- John G. Mikhael
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- MD-PhD Program, Harvard Medical School, Boston, Massachusetts, United States of America
- Lucy Lai
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
36
Dubois M, Habicht J, Michely J, Moran R, Dolan RJ, Hauser TU. Human complex exploration strategies are enriched by noradrenaline-modulated heuristics. eLife 2021; 10:e59907. [PMID: 33393461 PMCID: PMC7815309 DOI: 10.7554/elife.59907] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/12/2020] [Accepted: 01/03/2021] [Indexed: 01/15/2023]
Abstract
An exploration-exploitation trade-off, the arbitration between sampling a lesser-known option and a known rich option, is thought to be solved using computationally demanding exploration algorithms. Given known limitations in human cognitive resources, we hypothesised the presence of additional, cheaper strategies. Examining choice behaviour for such heuristics, we show that it involves a value-free random exploration, which ignores all prior knowledge, and a novelty exploration that targets novel options alone. In a double-blind, placebo-controlled drug study assessing the contributions of dopamine (400 mg amisulpride) and noradrenaline (40 mg propranolol), we show that value-free random exploration is attenuated under the influence of propranolol, but not under amisulpride. Our findings demonstrate that humans deploy distinct, computationally cheap exploration strategies and that value-free random exploration is under noradrenergic control.
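The two heuristics can be read as a simple mixture policy: with some probability choose uniformly while ignoring all learned values (value-free random exploration), with some probability jump to a novel option (novelty exploration), and otherwise exploit. This is a schematic reading, not the study's fitted computational model; the function name and the epsilon parameters are hypothetical.

```python
import random

def choose_heuristic(values, is_novel, eps_random=0.1, eps_novel=0.1):
    """Mixture policy: value-free random exploration, novelty exploration,
    or greedy exploitation of the highest-valued option."""
    u = random.random()
    if u < eps_random:
        # Value-free: uniform over all options, ignoring every learned value
        return random.randrange(len(values))
    novel = [i for i, n in enumerate(is_novel) if n]
    if novel and u < eps_random + eps_novel:
        # Novelty exploration: target a never-seen option directly
        return random.choice(novel)
    # Exploit: pick the option with the highest learned value
    return max(range(len(values)), key=values.__getitem__)
```

Both heuristics are computationally cheap because neither requires estimating uncertainty; in this reading, the propranolol effect would correspond to shrinking eps_random.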
Affiliation(s)
- Magda Dubois
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Johanna Habicht
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Jochen Michely
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Department of Psychiatry and Psychotherapy, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Rani Moran
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Ray J Dolan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Tobias U Hauser
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
37
Lengersdorff LL, Wagner IC, Lockwood PL, Lamm C. When Implicit Prosociality Trumps Selfishness: The Neural Valuation System Underpins More Optimal Choices When Learning to Avoid Harm to Others Than to Oneself. J Neurosci 2020; 40:7286-7299. [PMID: 32839234 PMCID: PMC7534918 DOI: 10.1523/jneurosci.0842-20.2020] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/08/2020] [Revised: 06/09/2020] [Accepted: 07/07/2020] [Indexed: 12/12/2022]
Abstract
Humans learn quickly which actions cause them harm. As social beings, we also need to learn to avoid actions that hurt others. It is currently unknown whether humans are as good at learning to avoid others' harm (prosocial learning) as they are at learning to avoid self-harm (self-relevant learning). Moreover, it remains unclear how the neural mechanisms of prosocial learning differ from those of self-relevant learning. In this fMRI study, 96 male human participants learned to avoid painful stimuli either for themselves or for another individual. We found that participants performed more optimally when learning for the other than for themselves. Computational modeling revealed that this could be explained by an increased sensitivity to subjective values of choice alternatives during prosocial learning. Increased value sensitivity was further associated with empathic traits. On the neural level, higher value sensitivity during prosocial learning was associated with stronger engagement of the ventromedial PFC during valuation. Moreover, the ventromedial PFC exhibited higher connectivity with the right temporoparietal junction during prosocial, compared with self-relevant, choices. Our results suggest that humans are particularly adept at learning to protect others from harm. This ability appears implemented by neural mechanisms overlapping with those supporting self-relevant learning, but with the additional recruitment of structures associated to the social brain. Our findings contrast with recent proposals that humans are egocentrically biased when learning to obtain monetary rewards for self or others. Prosocial tendencies may thus trump egocentric biases in learning when another person's physical integrity is at stake.
SIGNIFICANCE STATEMENT We quickly learn to avoid actions that cause us harm. As "social animals," we also need to learn and consider the harmful consequences our actions might have for others. Here, we investigated how learning to protect others from pain (prosocial learning) differs from learning to protect oneself (self-relevant learning). We found that human participants performed better during prosocial learning than during self-relevant learning, as they were more sensitive toward the information they collected when making choices for the other. Prosocial learning recruited similar brain areas as self-relevant learning, but additionally involved parts of the "social brain" that underpin perspective-taking and self-other distinction. Our findings suggest that people show an inherent tendency toward "intuitive" prosociality.
Affiliation(s)
- Lukas L Lengersdorff
- Social, Cognitive and Affective Neuroscience Unit, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, 1010, Austria
- Isabella C Wagner
- Social, Cognitive and Affective Neuroscience Unit, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, 1010, Austria
- Patricia L Lockwood
- Department of Experimental Psychology, University of Oxford, Oxford, OX1 3PH, United Kingdom
- Centre for Human Brain Health, University of Birmingham, Birmingham, B15 2TT, United Kingdom
- Claus Lamm
- Social, Cognitive and Affective Neuroscience Unit, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, 1010, Austria
38
Sablotny-Wackershauser V, Betts MJ, Brunnlieb C, Apostolova I, Buchert R, Düzel E, Gruendler TOJ, Vogt B. Older adults show a reduced tendency to engage in context-dependent decision biases. Neuropsychologia 2020; 142:107445. [PMID: 32275966 DOI: 10.1016/j.neuropsychologia.2020.107445] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/30/2019] [Revised: 02/19/2020] [Accepted: 03/25/2020] [Indexed: 11/16/2022]
Abstract
When we make decisions, we usually consider the context. This can sometimes lead to suboptimal choices or choice abnormalities. One such abnormality is the compromise effect, according to which deciders tend to favour options positioned as a compromise in an available set of extreme options. Theoretical accounts consider that these effects relate to available cognitive resources, which, in turn, have been found to depend on an individual's dopaminergic innervation. Referring to a correlative triad between cognition, dopamine and aging, the present study demonstrates that the compromise effect is replicable in a group of younger adults (n = 27, 20-32 years of age) yet is attenuated in older adults (n = 27, 62-80 years of age). Results from an [18F]-FDOPA-PET analysis in older adults indicate a positive association between older adults' inclination to engage in compromise effects and their striatal dopamine synthesis capacity. These results demonstrate altered context-dependent decision biases in older adults and suggest a neuromodulatory mechanism underlying this irregular choice.
Affiliation(s)
- Verena Sablotny-Wackershauser
- Faculty of Economics and Management, Otto-von-Guericke-University Magdeburg, Germany; Harz University of Applied Sciences Wernigerode, Germany.
- Matthew J Betts
- Institute of Cognitive Neurology and Dementia Research, Otto-von-Guericke-University Magdeburg, Germany; German Centre for Neurodegenerative Diseases (DZNE), Magdeburg, Germany
- Ivayla Apostolova
- Department of Radiology and Nuclear Medicine, University Hospital Hamburg-Eppendorf, Germany
- Ralph Buchert
- Department of Radiology and Nuclear Medicine, University Hospital Hamburg-Eppendorf, Germany
- Emrah Düzel
- Institute of Cognitive Neurology and Dementia Research, Otto-von-Guericke-University Magdeburg, Germany; German Centre for Neurodegenerative Diseases (DZNE), Magdeburg, Germany; Institute of Cognitive Neuroscience, University College London, UK
- Theo O J Gruendler
- Faculty of Economics and Management, Otto-von-Guericke-University Magdeburg, Germany; Center for Military Mental Health, Military Hospital Berlin, Germany
- Bodo Vogt
- Faculty of Economics and Management, Otto-von-Guericke-University Magdeburg, Germany; Institute of Social Medicine and Health Economics, Otto-von-Guericke-University Magdeburg, Germany
39
Van Slooten JC, Jahfari S, Theeuwes J. Spontaneous eye blink rate predicts individual differences in exploration and exploitation during reinforcement learning. Sci Rep 2019; 9:17436. [PMID: 31758031 PMCID: PMC6874684 DOI: 10.1038/s41598-019-53805-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/10/2019] [Accepted: 10/31/2019] [Indexed: 12/13/2022]
Abstract
Spontaneous eye blink rate (sEBR) has been linked to striatal dopamine function and to how individuals make value-based choices after a period of reinforcement learning (RL). While sEBR is thought to reflect how individuals learn from the negative outcomes of their choices, this idea has not been tested explicitly. This study assessed how individual differences in sEBR relate to learning by focusing on the cognitive processes that drive RL. Using Bayesian latent mixture modelling to quantify the mapping between RL behaviour and its underlying cognitive processes, we were able to differentiate low and high sEBR individuals at the level of these cognitive processes. Further inspection of these cognitive processes indicated that sEBR uniquely indexed explore-exploit tendencies during RL: lower sEBR predicted exploitative choices for high valued options, whereas higher sEBR predicted exploration of lower value options. This relationship was additionally supported by a network analysis where, notably, no link was observed between sEBR and how individuals learned from negative outcomes. Our findings challenge the notion that sEBR predicts learning from negative outcomes during RL, and suggest that sEBR predicts individual explore-exploit tendencies. These then influence value sensitivity during choices to support successful performance when facing uncertain reward.
Affiliation(s)
- Joanne C Van Slooten
- Department of Experimental and Applied Psychology, Vrije Universiteit, Amsterdam, The Netherlands.
- Sara Jahfari
- Spinoza Centre for Neuroimaging, Royal Academy of Sciences, Amsterdam, The Netherlands
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Jan Theeuwes
- Department of Experimental and Applied Psychology, Vrije Universiteit, Amsterdam, The Netherlands
40
Sebold M, Garbusow M, Jetzschmann P, Schad DJ, Nebe S, Schlagenhauf F, Heinz A, Rapp M, Romanczuk-Seiferth N. Reward and avoidance learning in the context of aversive environments and possible implications for depressive symptoms. Psychopharmacology (Berl) 2019; 236:2437-2449. [PMID: 31254091 PMCID: PMC6695365 DOI: 10.1007/s00213-019-05299-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/11/2019] [Accepted: 06/05/2019] [Indexed: 01/22/2023]
Abstract
BACKGROUND: Aversive stimuli in the environment influence human actions. This includes valence-dependent influences on action selection, e.g., increased avoidance but decreased approach behavior. However, it is yet unclear how aversive stimuli interact with complex learning and decision-making in the reward and avoidance domain. Moreover, the underlying computational mechanisms of these decision-making biases are unknown.
METHODS: To elucidate these mechanisms, 54 healthy young male subjects performed a two-step sequential decision-making task, which allows different aspects of learning to be modeled computationally, e.g., model-free, habitual, and model-based, goal-directed learning. We used a within-subject design, crossing task valence (reward vs. punishment learning) with emotional context (aversive vs. neutral background stimuli). We analyzed choice data, applied a computational model, and performed simulations.
RESULTS: Whereas model-based learning was not affected, aversive stimuli interacted with model-free learning in a way that depended on task valence. Thus, aversive stimuli increased model-free avoidance learning but decreased model-free reward learning. The computational model confirmed this effect: the parameter lambda, which indexes the influence of reward prediction errors on decision values, was increased in the punishment condition but decreased in the reward condition when aversive stimuli were present. Simulating choice data from the inferred computational parameters captured these effects. Exploratory analyses revealed that the observed biases were associated with subclinical depressive symptoms.
CONCLUSION: Our data show that aversive environmental stimuli affect complex learning and decision-making in a valence-dependent manner, and we provide a model of the underlying computations of this affective modulation. Finally, our finding of increased decision-making biases in subjects reporting subclinical depressive symptoms matches recent reports of amplified Pavlovian influences on action selection in depression and suggests a potential vulnerability factor for mood disorders. We discuss our findings in the light of the involvement of the neuromodulators serotonin and dopamine.
Collapse
Affiliation(s)
- Miriam Sebold
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany.
- Department for Social and Preventive Medicine, University of Potsdam, Potsdam, Germany.
| | - M Garbusow
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| | - P Jetzschmann
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| | - D J Schad
- Cognitive Science, University of Potsdam, Potsdam, Germany
| | - S Nebe
- Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, Zurich, Switzerland
| | - F Schlagenhauf
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
- Max Planck Institute for Human Cognitive and Brain Sciences, 04303, Leipzig, Germany
| | - A Heinz
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| | - M Rapp
- Department for Social and Preventive Medicine, University of Potsdam, Potsdam, Germany
| | - N Romanczuk-Seiferth
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| |
Collapse
|
41
|
Impacts of inter-trial interval duration on a computational model of sign-tracking vs. goal-tracking behaviour. Psychopharmacology (Berl) 2019; 236:2373-2388. [PMID: 31367850 PMCID: PMC6695359 DOI: 10.1007/s00213-019-05323-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/20/2019] [Accepted: 07/01/2019] [Indexed: 01/15/2023]
Abstract
In the context of Pavlovian conditioning, two types of behaviour may emerge within a population (Flagel et al. Nature, 469(7328): 53-57, 2011). Animals may engage either with the conditioned stimulus (CS), a behaviour known as sign-tracking (ST), whose acquisition is sensitive to dopamine inhibition, or with the food cup in which the reward or unconditioned stimulus (US) will eventually be delivered, a behaviour known as goal-tracking (GT), which depends on dopamine only for its expression. Previous work by Lesaint et al. (PLoS Comput Biol, 10(2), 2014) offered a computational explanation for these phenomena and predicted that varying the duration of the inter-trial interval (ITI) would change both the relative ST-GT proportion in the population and phasic dopamine responses. A recent study verified this prediction but also found rich variation in ST and GT behaviours within trials that goes beyond the original computational model. In this paper, we provide a computational perspective on these novel results.
Collapse
|