1
Garner KG, Leow LA, Uchida A, Nolan C, Jensen O, Garrido MI, Dux PE. Assessing the influence of dopamine and mindfulness on the formation of routines in visual search. Psychophysiology 2024; 61:e14571. [PMID: 38679809 DOI: 10.1111/psyp.14571] [Received: 12/04/2023] [Revised: 02/08/2024] [Accepted: 03/06/2024] [Indexed: 05/01/2024]
Abstract
Given experience in cluttered but stable visual environments, our eye movements form stereotyped routines that sample task-relevant locations, without mixing up routines between similar task settings. Both dopamine signaling and mindfulness have been posited as factors that influence the formation of such routines, yet their impact remains to be quantified in healthy humans. Over two sessions, participants searched through grids of doors to find hidden targets, using a gaze-contingent display. Within each session, door scenes appeared in one of two colors, with each color signaling a different set of likely target locations. We derived measures of how well target locations were learned (target-accuracy), how routine the sets of eye movements were (stereotypy), and the extent of interference between the two scenes (setting-accuracy). Across the two sessions, participants were administered either levodopa (a dopamine precursor) or placebo (vitamin C), under double-blind, counterbalanced conditions. Dopamine and trait mindfulness (assessed by questionnaire) interacted to influence both target-accuracy and stereotypy. Increasing dopamine improved accuracy and reduced stereotypy for high mindfulness scorers, but induced the opposite pattern for low mindfulness scorers. Dopamine also disrupted setting-accuracy, regardless of mindfulness. Our findings show that mindfulness modulates the impact of dopamine on the target-accuracy and stereotypy of eye-movement routines, whereas increasing dopamine promotes interference between task settings, regardless of mindfulness. These findings provide a link between non-human and human models regarding the influence of dopamine on the formation of task-relevant eye-movement routines, and provide novel insights into behavior-trait factors that modulate the use of experience when building adaptive repertoires.
Affiliation(s)
- Kelly G Garner
- School of Psychology, University of New South Wales, Sydney, New South Wales, Australia
- School of Psychology, University of Queensland, Saint Lucia, Queensland, Australia
- Li-Ann Leow
- School of Psychology, University of Queensland, Saint Lucia, Queensland, Australia
- Aya Uchida
- School of Psychology, University of Queensland, Saint Lucia, Queensland, Australia
- Christopher Nolan
- School of Psychology, University of New South Wales, Sydney, New South Wales, Australia
- Ole Jensen
- Centre for Human Brain Health, University of Birmingham, Birmingham, UK
- Marta I Garrido
- Melbourne School of Psychological Sciences and Graeme Clark Institute for Biomedical Engineering, University of Melbourne, Melbourne, Victoria, Australia
- Paul E Dux
- School of Psychology, University of Queensland, Saint Lucia, Queensland, Australia
2
Yan X, Ebitz RB, Grissom N, Darrow DP, Herman AB. Individual differences in uncertainty evaluation explain opposing exploratory behaviors in anxiety and apathy. bioRxiv 2024:2024.06.04.597412. [PMID: 38895240 PMCID: PMC11185698 DOI: 10.1101/2024.06.04.597412] [Indexed: 06/21/2024]
Abstract
Navigating uncertain environments is a fundamental challenge for adaptive behavior, and affective states such as anxiety and apathy can profoundly influence an individual's response to uncertainty. Uncertainty encompasses both volatility and stochasticity, where volatility refers to how rapidly the environment changes and stochasticity describes outcomes resulting from random chance. This study investigates how anxiety and apathy modulate perceptions of environmental volatility and stochasticity and how these perceptions impact exploratory behavior. In a large online sample (N = 1001), participants completed a restless three-armed bandit task, and their choices were analyzed using latent state models to quantify the computational processes. We found that anxious individuals attributed uncertainty more to environmental volatility than stochasticity, leading to increased exploration, particularly after reward omission. Conversely, apathetic individuals perceived uncertainty as more stochastic than volatile, resulting in decreased exploration. The ratio of perceived volatility to stochasticity mediated the relationship between anxiety and exploratory behavior following adverse outcomes. These findings reveal distinct computational mechanisms underlying anxiety and apathy in uncertain environments. Our results provide a novel framework for understanding the cognitive and affective processes driving adaptive and potentially maladaptive behaviors under uncertainty, with implications for the characterization and treatment of neuropsychiatric disorders.
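One common way to formalize the split between volatility and stochasticity is a scalar Kalman filter tracking a drifting reward mean. The sketch below assumes that formalization; it is not the latent state models the authors fitted. The process-noise variance stands in for volatility and the observation-noise variance for stochasticity. Under these assumptions, the steady-state learning rate (the Kalman gain) depends only on their ratio. The function names and parameter values are hypothetical.

```python
def kalman_update(mean, var, reward, volatility, stochasticity):
    """One predict/update step of a scalar Kalman filter for a random-walk reward mean.

    volatility: assumed variance of the trial-to-trial drift in the true mean.
    stochasticity: assumed variance of outcome noise around the true mean.
    """
    # Predict: the hidden mean may have drifted, so uncertainty grows by the volatility.
    prior_var = var + volatility
    # Kalman gain acts as an adaptive learning rate.
    gain = prior_var / (prior_var + stochasticity)
    # Update: move the estimate toward the outcome by a gain-weighted prediction error.
    new_mean = mean + gain * (reward - mean)
    new_var = (1.0 - gain) * prior_var
    return new_mean, new_var, gain


def steady_state_gain(volatility, stochasticity, steps=500):
    """Iterate the variance recursion until the gain settles."""
    var = 1.0
    gain = 0.0
    for _ in range(steps):
        prior_var = var + volatility
        gain = prior_var / (prior_var + stochasticity)
        var = (1.0 - gain) * prior_var
    return gain


if __name__ == "__main__":
    for q, r in [(1.0, 4.0), (2.0, 8.0), (4.0, 1.0)]:
        ratio = q / r
        print(f"volatility={q}, stochasticity={r}, ratio={ratio:.2f}, "
              f"gain={steady_state_gain(q, r):.3f}")
```

In this sketch, a learner who attributes more uncertainty to volatility (a higher ratio) updates more strongly after each outcome, including a reward omission. The first two parameter pairs share a ratio and so settle on the same gain.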
Affiliation(s)
- Xinyuan Yan
- Department of Psychiatry and Behavioral Sciences, University of Minnesota, Minneapolis, MN 55455, USA
- R. Becket Ebitz
- Department of Neuroscience, Universite de Montreal, 2900 Edouard Montpetit Blvd, Montreal, Quebec H3T 1J4, Canada
- Nicola Grissom
- Department of Psychology, University of Minnesota, 75 E River Rd, Minneapolis, MN 55455, USA
- David P. Darrow
- Department of Neurosurgery, University of Minnesota, Minneapolis, MN 55455, USA
- Alexander B. Herman
- Department of Psychiatry and Behavioral Sciences, University of Minnesota, Minneapolis, MN 55455, USA
3
Kobayashi K, Kable JW. Neural mechanisms of information seeking. Neuron 2024; 112:1741-1756. [PMID: 38703774 DOI: 10.1016/j.neuron.2024.04.008] [Received: 11/06/2023] [Revised: 01/30/2024] [Accepted: 04/08/2024] [Indexed: 05/06/2024]
Abstract
We ubiquitously seek information to make better decisions. Particularly in the modern age, when more information is available at our fingertips than ever, the information we choose to collect determines the quality of our decisions. Decision neuroscience has long adopted empirical approaches in which the information available to decision-makers is fully controlled by the researchers, leaving the neural mechanisms of information seeking less well understood. Although information seeking has long been studied in the context of the exploration-exploitation trade-off, recent studies have widened the scope to investigate more overt information seeking in a way distinct from other decision processes. Insights gained from these studies, accumulated over the last few years, raise the possibility that information seeking is driven by the reward system signaling the subjective value of information. In this piece, we review findings from these recent studies, highlighting the conceptual and empirical relationships between distinct literatures, and discuss future research directions necessary to establish a more comprehensive understanding of how individuals seek information as part of value-based decision-making.
Affiliation(s)
- Kenji Kobayashi
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Joseph W Kable
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA.
4
Güldener L, Pollmann S. Behavioral Bias for Exploration Is Associated with Enhanced Signaling in the Lateral and Medial Frontopolar Cortex. J Cogn Neurosci 2024; 36:1156-1171. [PMID: 38437186 DOI: 10.1162/jocn_a_02132] [Indexed: 03/06/2024]
Abstract
Should we keep doing what we know works for us, or should we risk trying something new as it could work even better? The exploration-exploitation dilemma is ubiquitous in everyday decision-making, and balancing between the two is crucial for adaptive behavior. Yet we have only started to unravel the neurocognitive mechanisms that help us find this balance in practice. Analyzing BOLD signals of healthy young adults during virtual foraging, we showed that a behavioral tendency toward prolonged exploitation was associated with weakened signaling during exploration in central nodes of the frontoparietal attention network, as well as in the frontopolar cortex. These results provide an important link between the behavioral heuristics we use to balance exploitation and exploration and the brain function that supports shifts from one tendency to the other. Importantly, they stress that interindividual differences in behavioral strategies are reflected in differences in brain activity during exploration, and should therefore receive more attention in basic research that aims to delineate general laws governing visual attention.
5
Kang P, Tobler PN, Dayan P. Bayesian reinforcement learning: A basic overview. Neurobiol Learn Mem 2024; 211:107924. [PMID: 38579896 DOI: 10.1016/j.nlm.2024.107924] [Received: 11/20/2023] [Revised: 03/21/2024] [Accepted: 04/02/2024] [Indexed: 04/07/2024]
Abstract
We and other animals learn because there is some aspect of the world about which we are uncertain. This uncertainty arises from initial ignorance, and from changes in the world that we do not perfectly know; the uncertainty often becomes evident when our predictions about the world are found to be erroneous. The Rescorla-Wagner learning rule, which specifies one way that prediction errors can occasion learning, has been hugely influential as a characterization of Pavlovian conditioning and, through its equivalence to the delta rule in engineering, in a much wider class of learning problems. Here, we review the embedding of the Rescorla-Wagner rule in a Bayesian context that is precise about the link between uncertainty and learning, and thereby discuss extensions to such suggestions as the Kalman filter, structure learning, and beyond, that collectively encompass a wider range of uncertainties and accommodate a wider assortment of phenomena in conditioning.
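For reference, here are the Rescorla-Wagner rule and its Kalman-filter generalization in a standard scalar form. The notation is ours, not necessarily the review's. The equations assume a single cue, an outcome $r_t$ on trial $t$, and a Gaussian random-walk drift in the true association with variance $\sigma_\eta^2$. They also assume Gaussian outcome noise with variance $\sigma_\varepsilon^2$ and a belief about the association with mean $V_t$ and variance $\sigma_t^2$. The review itself covers richer, multi-cue cases.

```latex
\begin{aligned}
\delta_t &= r_t - V_t && \text{(prediction error)}\\
V_{t+1} &= V_t + \alpha\,\delta_t && \text{(Rescorla-Wagner, fixed learning rate } \alpha)\\[4pt]
k_t &= \frac{\sigma_t^2}{\sigma_t^2 + \sigma_\varepsilon^2} && \text{(Kalman gain)}\\
V_{t+1} &= V_t + k_t\,\delta_t && \text{(Kalman mean update)}\\
\sigma_{t+1}^2 &= (1 - k_t)\,\sigma_t^2 + \sigma_\eta^2 && \text{(uncertainty after update and drift)}
\end{aligned}
```

Both updates share the same prediction-error form. The Kalman filter replaces the fixed $\alpha$ with a gain $k_t$. That gain grows with the learner's current uncertainty $\sigma_t^2$ and shrinks as outcome noise $\sigma_\varepsilon^2$ increases. This makes the link between uncertainty and learning rate explicit.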
Affiliation(s)
- Pyungwon Kang
- University of Zurich, Department of Economics, Laboratory for Social and Neural Systems Research, Zurich, Switzerland.
- Philippe N Tobler
- University of Zurich, Department of Economics, Laboratory for Social and Neural Systems Research, Zurich, Switzerland.
- Peter Dayan
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany; University of Tübingen, Tübingen, Germany.
6
Lloyd A, Roiser JP, Skeen S, Freeman Z, Badalova A, Agunbiade A, Busakhwe C, DeFlorio C, Marcu A, Pirie H, Saleh R, Snyder T, Fearon P, Viding E. Reviewing explore/exploit decision-making as a transdiagnostic target for psychosis, depression, and anxiety. Cogn Affect Behav Neurosci 2024:10.3758/s13415-024-01186-9. [PMID: 38653937 DOI: 10.3758/s13415-024-01186-9] [Accepted: 03/27/2024] [Indexed: 04/25/2024]
Abstract
In many everyday decisions, individuals choose between trialling something novel and something they know well. Deciding when to try a new option or stick with one that is already known, known as the "explore/exploit" dilemma, is an important feature of cognition that characterises a range of decision-making contexts encountered by humans. Recent evidence suggests that explore/exploit biases are associated with psychopathology, although this has typically been examined within individual disorders. The current review examined whether explore/exploit decision-making represents a promising transdiagnostic target for psychosis, depression, and anxiety. A systematic search of academic databases was conducted, yielding a total of 29 studies. Studies examining psychosis were mostly consistent in showing that individuals with psychosis explored more than individuals without psychosis. The literature on anxiety and depression was more heterogeneous; some studies found that anxiety and depression were associated with more exploration, whereas other studies demonstrated reduced exploration in anxiety and depression. However, in a subset of studies that employed case-control methods, there was some evidence that anxiety and depression were also associated with increased exploration. Owing to the heterogeneity across the literature, we suggest that there is insufficient evidence to conclude whether explore/exploit decision-making is a transdiagnostic target for psychosis, depression, and anxiety. However, together with our lived-experience advisory groups, we suggest that this decision-making context is a promising candidate that merits further investigation using well-powered, longitudinal designs. Such work should also examine whether biases in explore/exploit choices are amenable to intervention.
Affiliation(s)
- Alex Lloyd
- Clinical, Educational and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK.
- Jonathan P Roiser
- Institute of Cognitive Neuroscience, University College London, London, UK
- Sarah Skeen
- Institute for Life Course Health Research, Stellenbosch University, Stellenbosch, South Africa
- Ze Freeman
- Department of Psychology, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Aygun Badalova
- Institute of Neurology, University College London, London, UK
- Anna Marcu
- Young People's Advisor Group, London, UK
- Pasco Fearon
- Clinical, Educational and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK
- Centre for Family Research, Department of Psychology, University of Cambridge, Cambridge, UK
- Essi Viding
- Clinical, Educational and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK
7
Gilmour W, Mackenzie G, Feile M, Tayler-Grint L, Suveges S, Macfarlane JA, Macleod AD, Marshall V, Grunwald IQ, Steele JD, Gilbertson T. Impaired value-based decision-making in Parkinson's disease apathy. Brain 2024; 147:1362-1376. [PMID: 38305691 PMCID: PMC10994558 DOI: 10.1093/brain/awae025] [Received: 07/28/2023] [Revised: 12/07/2023] [Accepted: 01/13/2024] [Indexed: 02/03/2024]
Abstract
Apathy is a common and disabling complication of Parkinson's disease characterized by reduced goal-directed behaviour. Several studies have reported dysfunction within prefrontal cortical regions and projections from brainstem nuclei whose neuromodulators include dopamine, serotonin and noradrenaline. Work in animal and human neuroscience has confirmed contributions of these neuromodulators to aspects of motivated decision-making. Specifically, these neuromodulators make overlapping contributions to encoding the value of decisions, and influence whether to explore alternative courses of action or persist in an existing strategy to achieve a rewarding goal. Building upon this work, we hypothesized that apathy in Parkinson's disease should be associated with an impairment in value-based learning. Using a four-armed restless bandit reinforcement learning task, we studied decision-making in 75 volunteers: 53 patients with Parkinson's disease, with and without clinical apathy, and 22 age-matched healthy control subjects. Patients with apathy exhibited an impaired ability to choose the highest-value bandit. Task performance predicted an individual patient's apathy severity measured using the Lille Apathy Rating Scale (R = -0.46, P < 0.001). Computational modelling of the patients' choices confirmed that the apathy group made decisions that were indifferent to the learnt value of the options, consistent with previous reports of reward insensitivity. Further analysis demonstrated a shift away from exploiting the highest-value option and a reduction in perseveration, which also correlated with apathy scores (R = -0.5, P < 0.001). We went on to acquire functional MRI in 59 volunteers performing the Restless Bandit Task: 19 patients with apathy, 20 patients without apathy, and 20 age-matched controls.
Analysis of the functional MRI signal at the point of reward feedback confirmed diminished signal within ventromedial prefrontal cortex in Parkinson's disease, which was more marked in apathy but not predictive of individual apathy severity. Using a model-based categorization of choice type, decisions to explore lower-value bandits in the apathy group activated prefrontal cortex to a similar degree as in the age-matched controls. In contrast, Parkinson's patients without apathy demonstrated significantly increased activation across a distributed thalamo-cortical network. Enhanced activity in the thalamus predicted individual apathy severity across both patient groups and exhibited functional connectivity with dorsal anterior cingulate cortex and anterior insula. Given that task performance in patients without apathy did not differ from that of the age-matched control subjects, we interpret the recruitment of this network as a possible mechanism that compensates against the symptomatic manifestation of apathy in Parkinson's disease.
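The sketch below makes the task concrete. It simulates a generic four-armed restless bandit, in which each arm's hidden payoff mean follows an independent Gaussian random walk. A simple delta-rule learner with softmax choice plays the bandit. The sketch scores how often the learner picks the arm that is currently best, the behavioral measure on which the apathy group was impaired. This is not the computational model used in the study. The payoff range, noise levels, learning rate `alpha`, and inverse temperature `beta` are all hypothetical. Setting `beta=0.0` produces choices that ignore learned value.

```python
import math
import random


def simulate_restless_bandit(n_arms=4, n_trials=300, drift_sd=2.8, noise_sd=4.0,
                             alpha=0.3, beta=0.2, seed=0):
    """Simulate a delta-rule learner with softmax choice on a restless bandit.

    All parameter defaults are illustrative assumptions. Returns the fraction of
    trials on which the learner chose the arm with the highest hidden mean payoff.
    """
    rng = random.Random(seed)
    means = [rng.uniform(20.0, 80.0) for _ in range(n_arms)]  # hidden payoff means
    values = [50.0] * n_arms  # learner's value estimates, same start for every arm
    best_choices = 0
    for _ in range(n_trials):
        # Softmax over value estimates; subtract the max for numerical stability.
        top = max(values)
        weights = [math.exp(beta * (v - top)) for v in values]
        choice = rng.choices(range(n_arms), weights=weights)[0]
        if means[choice] == max(means):
            best_choices += 1
        reward = rng.gauss(means[choice], noise_sd)
        # Delta-rule (Rescorla-Wagner-style) update of the chosen arm only.
        values[choice] += alpha * (reward - values[choice])
        # Restless: every arm's hidden mean takes a Gaussian random-walk step.
        means = [m + rng.gauss(0.0, drift_sd) for m in means]
    return best_choices / n_trials


if __name__ == "__main__":
    for b in (0.0, 0.2):
        scores = [simulate_restless_bandit(beta=b, seed=s) for s in range(10)]
        print(f"beta={b}: mean fraction of best-arm choices = {sum(scores) / len(scores):.2f}")
```

Averaged over several seeds, a value-sensitive learner (`beta > 0`) should select the best arm more often than the chance level of 1/4. A value-indifferent learner (`beta = 0`) sits at that chance level.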
Affiliation(s)
- William Gilmour
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Department of Neurology, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK
- Graeme Mackenzie
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Department of Neurology, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK
- Mathias Feile
- Rehabilitation Psychiatry, Murray Royal Hospital, Perth PH2 7BH, UK
- Szabolcs Suveges
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Jennifer A Macfarlane
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Medical Physics, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK
- SINAPSE, University of Glasgow, Imaging Centre of Excellence, Level 2, Queen Elizabeth University Hospital, Glasgow G51 4TF, Scotland, UK
- Angus D Macleod
- Institute of Applied Health Sciences, School of Medicine, University of Aberdeen, Foresterhill, Aberdeen AB24 2ZD, UK
- Department of Neurology, Aberdeen Royal Infirmary, Foresterhill, Aberdeen AB24 2ZD, UK
- Vicky Marshall
- Institute of Neurological Sciences, Queen Elizabeth University Hospital, Glasgow G51 4TF, UK
- Iris Q Grunwald
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- J Douglas Steele
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Tom Gilbertson
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Department of Neurology, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK
8
Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024; 20:e1011950. [PMID: 38552190 PMCID: PMC10980507 DOI: 10.1371/journal.pcbi.1011950] [Received: 09/13/2023] [Accepted: 02/26/2024] [Indexed: 04/01/2024]
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions.
In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
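The sketch below shows one way to add action bias and hysteresis to a reinforcement-learning choice rule. Learned values, a fixed per-action bias, and an exponentially decaying trace of past actions are summed inside a softmax. This is a generic illustration, not the authors' mixture-of-experts model, and the names `bias`, `hysteresis_weight`, and `decay` are hypothetical. A positive `hysteresis_weight` produces repetition and a negative one produces alternation. `decay` sets how many previous actions still influence the current choice.

```python
import math


def choice_probabilities(q_values, bias, history, beta, hysteresis_weight, decay):
    """Softmax over learned values plus action bias and action hysteresis.

    q_values: learned action values, one per action.
    bias: fixed preference for each action per se, independent of reward.
    history: previously chosen action indices, oldest first.
    beta: inverse temperature applied to the learned values.
    hysteresis_weight: > 0 favours repeating recent actions, < 0 favours alternating.
    decay: in [0, 1]; fraction of the action trace that survives each later trial.
    """
    n_actions = len(q_values)
    trace = [0.0] * n_actions
    for action in history:
        # Decay the existing trace, then mark the action chosen on this trial.
        trace = [decay * t for t in trace]
        trace[action] += 1.0
    logits = [beta * q + b + hysteresis_weight * t
              for q, b, t in zip(q_values, bias, trace)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    equal_values = [0.0, 0.0]
    no_bias = [0.0, 0.0]
    recent = [1, 0, 0]  # action 0 was chosen on the last two trials
    print("repetition: ", choice_probabilities(equal_values, no_bias, recent, 1.0, 1.0, 0.5))
    print("alternation:", choice_probabilities(equal_values, no_bias, recent, 1.0, -1.0, 0.5))
```

Here the two actions have equal learned value and no bias. The repetition setting should therefore push the probability of action 0 above 0.5, and the alternation setting should pull it below 0.5.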
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
9
Sazhin D, Dachs A, Smith DV. Meta-Analysis Reveals That Explore-Exploit Decisions are Dissociable by Activation in the Dorsal Lateral Prefrontal Cortex and the Anterior Cingulate Cortex. bioRxiv 2024:2023.10.21.563317. [PMID: 37961286 PMCID: PMC10634720 DOI: 10.1101/2023.10.21.563317] [Indexed: 11/15/2023]
Abstract
Explore-exploit research faces challenges in generalizability owing to the limited theoretical basis of exploration and exploitation. Neuroimaging can help address this issue by identifying whether explore-exploit decisions rely on an opponent processing system. We therefore conducted a coordinate-based meta-analysis (N = 23 studies), in which we found activation in the dorsal lateral prefrontal cortex and anterior cingulate cortex during exploration versus exploitation, providing some evidence for opponent processing. However, the conjunction of explore and exploit decisions was associated with activation in the dorsal anterior cingulate cortex, dorsal medial prefrontal cortex, and anterior insula, suggesting that these brain regions do not engage in opponent processing. Further, exploratory analyses revealed heterogeneity in brain responses between task types during exploration and exploitation, respectively. Together with results suggesting that activation during exploration and exploitation decisions is generally more similar than different, these findings indicate that significant challenges remain in characterizing explore-exploit decision making. Nonetheless, dlPFC and ACC activation differentiate explore and exploit decisions, and identifying these responses may help inform targeted interventions aimed at manipulating these decisions.
10
Shen X, Helion C, Smith DV, Murty VP. Motivation as a Lens for Understanding Information-seeking Behaviors. J Cogn Neurosci 2024; 36:362-376. [PMID: 37944120 DOI: 10.1162/jocn_a_02083] [Indexed: 11/12/2023]
Abstract
Most prior research characterizes information-seeking behaviors as serving utilitarian purposes, such as whether the obtained information can help solve practical problems. However, information-seeking behaviors are sensitive to different contexts (e.g., threat vs. curiosity), despite having equivalent utility. Furthermore, these search behaviors can be modulated by individuals' life history and personality traits. Yet the emphasis on utilitarian utility has precluded the development of a unified model that explains when and how individuals actively seek information. To account for this variability and flexibility, we propose a unified information-seeking framework that examines information-seeking through the lens of motivation. This unified model accounts for integration across individuals' internal goal states and the salient features of the environment to influence information-seeking behavior. We propose that information-seeking is determined by motivation for information, invigorated by either instrumental utility or hedonic utility, with one's personal or environmental context moderating this relationship. Furthermore, we speculate that the final common denominator in guiding information-seeking is the engagement of different neuromodulatory circuits centered on dopaminergic and noradrenergic tone. Our framework offers a unified account of information-seeking behaviors and generates several testable predictions for future studies.
11
Wyatt LE, Hewan PA, Hogeveen J, Spreng RN, Turner GR. Exploration versus exploitation decisions in the human brain: A systematic review of functional neuroimaging and neuropsychological studies. Neuropsychologia 2024; 192:108740. [PMID: 38036246 DOI: 10.1016/j.neuropsychologia.2023.108740] [Received: 01/28/2023] [Revised: 10/15/2023] [Accepted: 11/21/2023] [Indexed: 12/02/2023]
Abstract
Thoughts and actions are often driven by a decision to either explore new avenues with unknown outcomes, or to exploit known options with predictable outcomes. Yet the neural mechanisms underlying this exploration-exploitation trade-off in humans remain poorly understood. This is attributable to variability in the operationalization of exploration and exploitation as psychological constructs, as well as the heterogeneity of experimental protocols and paradigms used to study these choice behaviours. To address this gap, here we present a comprehensive review of the literature to investigate the neural basis of explore-exploit decision-making in humans. We first conducted a systematic review of functional magnetic resonance imaging (fMRI) studies of exploration- versus exploitation-based decision-making in healthy adult humans during foraging, reinforcement learning, and information search. Eleven fMRI studies met the inclusion criteria for this review. Adopting a network neuroscience framework, synthesis of the findings across these studies revealed that exploration-based choice was associated with the engagement of attentional, control, and salience networks. In contrast, exploitation-based choice was associated with engagement of default network brain regions. We interpret these results in the context of a network architecture that supports flexible switching between externally and internally directed cognitive processes, necessary for adaptive, goal-directed behaviour. To further investigate potential neural mechanisms underlying the exploration-exploitation trade-off, we next surveyed studies involving neurodevelopmental, neuropsychological, and neuropsychiatric disorders, as well as lifespan development and neurodegenerative diseases. We observed striking differences in patterns of explore-exploit decision-making across these populations, again suggesting that these two decision-making modes are supported by independent neural circuits.
Taken together, our review highlights the need for precision-mapping of the neural circuitry and behavioural correlates associated with exploration and exploitation in humans. Characterizing exploration versus exploitation decision-making biases may offer a novel, trans-diagnostic approach to assessment, surveillance, and intervention for cognitive decline and dysfunction in normal development and clinical populations.
Affiliation(s)
- Lindsay E Wyatt
- Department of Psychology, York University, Toronto, ON, Canada
- Patrick A Hewan
- Department of Psychology, York University, Toronto, ON, Canada
- Jeremy Hogeveen
- Department of Psychology, The University of New Mexico, Albuquerque, NM, USA
- R Nathan Spreng
- Montréal Neurological Institute, Department of Neurology and Neurosurgery, McGill University, Montréal, QC, H3A 2B4, Canada; Department of Psychology, McGill University, Montréal, QC, Canada; Department of Psychiatry, McGill University, Montréal, QC, Canada; McConnell Brain Imaging Centre, Montréal Neurological Institute, McGill University, Montréal, QC, Canada.
- Gary R Turner
- Department of Psychology, York University, Toronto, ON, Canada.
12
Fujimoto A, Elorette C, Fujimoto SH, Fleysher L, Rudebeck PH, Russ BE. Pharmacological modulation of dopamine D1 and D2 receptors reveals distinct neural networks related to probabilistic learning in non-human primates. bioRxiv 2023:2023.12.27.573487. [PMID: 38234858 PMCID: PMC10793459 DOI: 10.1101/2023.12.27.573487] [Indexed: 01/19/2024]
Abstract
The neurotransmitter dopamine (DA) has a multifaceted role in healthy and disordered brains through its action on multiple subtypes of dopaminergic receptors. How modulation of these receptors controls behavior by altering connectivity across intrinsic brain-wide networks remains elusive. Here we performed parallel behavioral and resting-state functional MRI experiments after administration of two different DA receptor antagonists in macaque monkeys. Systemic administration of SCH-23390 (D1 antagonist) disrupted probabilistic learning when subjects had to learn new stimulus-reward associations and diminished functional connectivity (FC) in cortico-cortical and fronto-striatal connections. By contrast, haloperidol (D2 antagonist) improved learning and broadly enhanced FC in cortical connections. Further comparison between the effect of SCH-23390/haloperidol on behavioral and resting-state FC revealed specific cortical and subcortical networks associated with the cognitive and motivational effects of DA, respectively. Thus, we reveal the distinct brain-wide networks that are associated with the dopaminergic control of learning and motivation via DA receptors.
Affiliation(s)
- Atsushi Fujimoto
  - Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029
  - Lipschultz Center for Cognitive Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Catherine Elorette
  - Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029
  - Lipschultz Center for Cognitive Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Satoka H. Fujimoto
  - Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029
  - Lipschultz Center for Cognitive Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Lazar Fleysher
  - BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029
- Peter H. Rudebeck
  - Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029
  - Lipschultz Center for Cognitive Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Brian E. Russ
  - Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029
  - Center for Biomedical Imaging and Neuromodulation, Nathan Kline Institute, 140 Old Orangeburg Road, Orangeburg, NY 10962
  - Department of Psychiatry, New York University at Langone, One, 8, Park Ave, New York, NY 10016

13
Mathar D, Wiebe A, Tuzsus D, Knauth K, Peters J. Erotic cue exposure increases physiological arousal, biases choices toward immediate rewards, and attenuates model-based reinforcement learning. Psychophysiology 2023; 60:e14381. [PMID: 37435973 DOI: 10.1111/psyp.14381] [Received: 10/18/2022] [Revised: 04/21/2023] [Accepted: 06/17/2023] [Indexed: 07/13/2023]
Abstract
Computational psychiatry focuses on identifying core cognitive processes that appear altered across distinct psychiatric disorders. Temporal discounting of future rewards and model-based control during reinforcement learning have proven to be two promising candidates. Despite its trait-like stability, temporal discounting may be at least partly under contextual control. Highly arousing cues have been shown to increase discounting, although the evidence to date remains somewhat mixed. Whether model-based reinforcement learning is similarly affected by arousing cues remains unclear. Here, we tested cue-reactivity effects (erotic pictures) on subsequent temporal discounting and model-based reinforcement learning in a within-subjects design in n = 39 healthy heterosexual male participants. Self-reported and physiological arousal (cardiac activity and pupil dilation) were assessed before and during cue exposure. Arousal increased during exposure to erotic versus neutral cues at both the subjective and the autonomic level. Erotic cue exposure increased discounting, as reflected by more impatient choices. Hierarchical drift diffusion modeling (DDM) linked increased discounting to a shift in the starting-point bias of evidence accumulation toward immediate options. Model-based control during reinforcement learning was reduced following erotic cues according to model-agnostic analysis. Notably, DDM linked this effect to attenuated forgetting rates of unchosen options, leaving the model-based control parameter unchanged. Our findings replicate previous work on cue-reactivity effects in temporal discounting and, for the first time, show similar effects in model-based reinforcement learning in a heterosexual male sample. This highlights how environmental cues can impact core human decision processes and reveals that comprehensive modeling approaches can yield novel insights into reward-based decision processes.
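The starting-point account in the Mathar et al. abstract can be illustrated with a small sketch. It assumes hyperbolic discounting, SV = A / (1 + kD), a drift rate that is a hypothetical linear function (`v_scale`) of the value difference between the immediate and the delayed option, and the standard first-passage probability of a Wiener process with boundaries at 0 and a, starting point z, and noise s = 1. It is not the authors' hierarchical model; all function names and parameter values are illustrative.

```python
import math


def hyperbolic_value(amount: float, delay: float, k: float) -> float:
    """Subjective value of a delayed reward under hyperbolic discounting."""
    return amount / (1.0 + k * delay)


def p_upper(v: float, a: float, z: float, s: float = 1.0) -> float:
    """Probability that a Wiener process with drift v, starting at z,
    reaches the upper boundary a before the lower boundary 0."""
    if abs(v) < 1e-12:
        return z / a  # driftless limit of the expression below
    # 1 - exp(x) == -expm1(x); expm1 avoids cancellation for small drifts.
    num = -math.expm1(-2.0 * v * z / s**2)
    den = -math.expm1(-2.0 * v * a / s**2)
    return num / den


def p_choose_immediate(ss_amount: float, ll_amount: float, ll_delay: float,
                       k: float, a: float = 2.0, z_frac: float = 0.5,
                       v_scale: float = 0.1) -> float:
    """Upper boundary = immediate (smaller-sooner) option.

    Hypothetical drift: v_scale * (immediate amount - discounted delayed amount).
    z_frac > 0.5 places the starting point closer to the immediate boundary.
    """
    v = v_scale * (ss_amount - hyperbolic_value(ll_amount, ll_delay, k))
    return p_upper(v, a, z_frac * a)


# Equal subjective values (zero drift): only the starting point differs.
print(p_choose_immediate(10.0, 20.0, 10.0, k=0.1))              # 0.5
print(p_choose_immediate(10.0, 20.0, 10.0, k=0.1, z_frac=0.6))  # 0.6
```

With the values matched, shifting the start point from 0.5a to 0.6a alone raises the probability of an impatient choice, which is the kind of bias-only effect the abstract attributes to erotic cues.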
Affiliation(s)
- David Mathar
  - Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Annika Wiebe
  - Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
- Deniz Tuzsus
  - Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Kilian Knauth
  - Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Jan Peters
  - Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany

14
Daumas L, Zory R, Junquera-Badilla I, Ferrandez M, Ettore E, Robert P, Sacco G, Manera V, Ramanoël S. How does apathy impact exploration-exploitation decision-making in older patients with neurocognitive disorders? NPJ AGING 2023; 9:25. [PMID: 37903801 PMCID: PMC10616174 DOI: 10.1038/s41514-023-00121-5] [Received: 04/18/2023] [Accepted: 09/14/2023] [Indexed: 11/01/2023]
Abstract
Apathy is a pervasive clinical syndrome in neurocognitive disorders, characterized by a quantitative reduction in goal-directed behaviors. The brain structures involved in the pathophysiology of apathy have also been linked to those involved in probabilistic reward learning in the exploration-exploitation dilemma. This dilemma involves choosing between a familiar option with a more predictable outcome and another option whose outcome is uncertain but may yield greater rewards than the known option. The aim of this study was to combine experimental procedures and computational modeling to examine whether, in older adults with mild neurocognitive disorders, apathy affects performance in the exploration-exploitation dilemma. Using a four-armed bandit reinforcement-learning task, we showed that apathetic older adults explored more and performed worse than non-apathetic subjects. Moreover, mental flexibility, as assessed by the Trail-Making Test Part B, was negatively associated with the percentage of exploration. These results suggest that apathy is characterized by increased explorative behavior and inefficient decision-making, possibly due to weak mental flexibility when switching toward exploitation of the more rewarding options. Apathetic participants also took longer to make a choice and more often failed to respond in the allotted time, which could reflect difficulties in action initiation and selection. In conclusion, the present results suggest that apathy in participants with neurocognitive disorders is associated with specific disturbances in the exploration-exploitation trade-off, and they shed light on the disturbances in reward processing in patients with apathy.
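One common way to quantify a "percentage of exploration" in a four-armed bandit is to count choices of any arm other than the one with the highest current value estimate. The sketch below assumes a delta-rule learner with softmax choice and Gaussian payoffs; the payoff means, noise level (`payoff_sd`), and parameter values are hypothetical, and the study's own model and exploration criterion may differ.

```python
import math
import random


def softmax(q, beta):
    """Choice probabilities with inverse temperature beta (max-subtracted for stability)."""
    m = max(q)
    w = [math.exp(beta * (x - m)) for x in q]
    total = sum(w)
    return [x / total for x in w]


def run_bandit(means, n_trials, alpha, beta, payoff_sd=4.0, seed=0):
    """Delta-rule learner on a multi-armed bandit with Gaussian payoffs.

    Returns the chosen arms and a snapshot of the value estimates before each choice.
    """
    rng = random.Random(seed)
    q = [0.0] * len(means)
    choices, q_before = [], []
    for _ in range(n_trials):
        p = softmax(q, beta)
        c = rng.choices(range(len(q)), weights=p)[0]
        choices.append(c)
        q_before.append(list(q))
        reward = rng.gauss(means[c], payoff_sd)
        q[c] += alpha * (reward - q[c])  # prediction-error update
    return choices, q_before


def exploration_percentage(choices, q_before):
    """Percent of trials on which the chosen arm was not (tied for) the highest-valued arm."""
    explore = sum(1 for c, q in zip(choices, q_before) if q[c] < max(q))
    return 100.0 * explore / len(choices)


choices, q_before = run_bandit([20, 40, 60, 80], n_trials=200, alpha=0.3, beta=0.2)
print(f"{exploration_percentage(choices, q_before):.1f}% exploratory choices")
```

Ties count as exploitation here, so the very first trials (all estimates equal) never inflate the exploration score.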
Affiliation(s)
- Lyne Daumas
  - Université Côte d'Azur, LAMHESS, Nice, France
  - Université Côte d'Azur, CoBTeK, Nice, France
- Raphaël Zory
  - Université Côte d'Azur, LAMHESS, Nice, France
  - Institut Universitaire de France, Paris, France
- Marion Ferrandez
  - Université Côte d'Azur, CoBTeK, Nice, France
  - Université Côte d'Azur, Centre Hospitalier Universitaire de Nice, service Clinique Gériatrique de Soins Ambulatoires, Centre Mémoire de Ressources et de Recherche, Nice, France
- Eric Ettore
  - Université Côte d'Azur, CoBTeK, Nice, France
  - Université Côte d'Azur, Centre Hospitalier Universitaire de Nice, service Clinique Gériatrique de Soins Ambulatoires, Centre Mémoire de Ressources et de Recherche, Nice, France
  - Association Innovation Alzheimer, Nice, France
- Philippe Robert
  - Université Côte d'Azur, CoBTeK, Nice, France
  - Association Innovation Alzheimer, Nice, France
- Guillaume Sacco
  - Université Côte d'Azur, CoBTeK, Nice, France
  - Université Côte d'Azur, Centre Hospitalier Universitaire de Nice, service Clinique Gériatrique de Soins Ambulatoires, Centre Mémoire de Ressources et de Recherche, Nice, France
  - Association Innovation Alzheimer, Nice, France
  - Univ Angers, Université de Nantes, LPPL, SFR CONFLUENCES, 49000, Angers, France
- Valeria Manera
  - Université Côte d'Azur, CoBTeK, Nice, France
  - Association Innovation Alzheimer, Nice, France
- Stephen Ramanoël
  - Université Côte d'Azur, LAMHESS, Nice, France
  - Sorbonne Université, INSERM, CNRS, Institut de la Vision, 17 rue Moreau, 75012, Paris, France

15
Soussi C, Berthoz S, Chirokoff V, Chanraud S. Interindividual Brain and Behavior Differences in Adaptation to Unexpected Uncertainty. BIOLOGY 2023; 12:1323. [PMID: 37887033 PMCID: PMC10604029 DOI: 10.3390/biology12101323] [Received: 06/23/2023] [Revised: 09/25/2023] [Accepted: 10/03/2023] [Indexed: 10/28/2023]
Abstract
To adapt to a new environment, individuals must alternate between exploiting previously learned "action-consequence" combinations and exploring new actions whose consequences are unknown: they face an exploration/exploitation trade-off. The neural substrates of these behaviors, and the factors that may explain interindividual variability in their expression, have been largely overlooked, particularly with respect to neural connectivity patterns. Here, to induce environmental uncertainty, false feedback was introduced in the second phase of an associative learning task. Indices reflecting exploitation and the cost of uncertainty were computed. Changes in intrinsic connectivity were determined using resting-state functional connectivity (rFC) analyses before and after participants performed the "cheated" phase of the task in the MRI scanner. We explored the links between these changes and behavioral and psychological factors. The dispersion of participants' cost of uncertainty was used to define two groups, which showed different patterns of rFC changes. Moreover, in the overall sample, exploitation was correlated with rFC changes between (1) the anterior cingulate cortex and cerebellum region 3, and (2) the left inferior frontal gyrus (orbital part) and the right inferior frontal gyrus (triangular part). Anxiety and doubt about action propensity were weakly correlated with some rFC changes. These results demonstrate that the exploration/exploitation trade-off involves the modulation of cortico-cerebellar intrinsic connectivity.
Affiliation(s)
- Célia Soussi
  - INCIA CNRS 5287, University of Bordeaux, 33076 Bordeaux, France
  - UNICAEN, INSERM, U1237, PhIND “Physiopathology and Imaging of Neurological Disorders”, NeuroPresage Team, Cyceron, Normandy University, 14000 Caen, France
- Sylvie Berthoz
  - INCIA CNRS 5287, University of Bordeaux, 33076 Bordeaux, France
  - Department of Psychiatry for Adolescents and Young Adults, Institut Mutualiste Montsouris, 75014 Paris, France
- Valentine Chirokoff
  - INCIA CNRS 5287, University of Bordeaux, 33076 Bordeaux, France
  - Ecole Pratique des Hautes Etudes, Section of Life and Earth Sciences, PSL Research University, 75014 Paris, France
- Sandra Chanraud
  - INCIA CNRS 5287, University of Bordeaux, 33076 Bordeaux, France
  - Ecole Pratique des Hautes Etudes, Section of Life and Earth Sciences, PSL Research University, 75014 Paris, France

16
Ianni AM, Eisenberg DP, Boorman ED, Constantino SM, Hegarty CE, Gregory MD, Masdeu JC, Kohn PD, Behrens TE, Berman KF. PET-measured human dopamine synthesis capacity and receptor availability predict trading rewards and time-costs during foraging. Nat Commun 2023; 14:6122. [PMID: 37777515 PMCID: PMC10542376 DOI: 10.1038/s41467-023-41897-0] [Received: 01/03/2023] [Accepted: 09/18/2023] [Indexed: 10/02/2023]
Abstract
Foraging behavior requires weighing costs of time to decide when to leave one reward patch to search for another. Computational and animal studies suggest that striatal dopamine is key to this process; however, the specific role of dopamine in foraging behavior in humans is not well characterized. We use positron emission tomography (PET) imaging to directly measure dopamine synthesis capacity and D1 and D2/3 receptor availability in 57 healthy adults who complete a computerized foraging task. Using voxelwise data and principal component analysis to identify patterns of variation across PET measures, we show that striatal D1 and D2/3 receptor availability and a pattern of mesolimbic and anterior cingulate cortex dopamine function are important for adjusting the threshold for leaving a patch to explore, with specific sensitivity to changes in travel time. These findings suggest a key role for dopamine in trading reward benefits against temporal costs to modulate behavioral adaptations to changes in the reward environment critical for foraging.
Affiliation(s)
- Angela M Ianni
  - Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
  - Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- Daniel P Eisenberg
  - Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
- Erie D Boorman
  - Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
- Sara M Constantino
  - Department of Psychology, New York University, New York, NY, USA
  - School of Public Policy and Urban Affairs, Northeastern University, Boston, MA, USA
  - Department of Psychology, Northeastern University, Boston, MA, USA
  - School of Public and International Affairs, Princeton University, Princeton, NJ, USA
- Catherine E Hegarty
  - Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
- Michael D Gregory
  - Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
- Joseph C Masdeu
  - Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
  - Houston Methodist Institute for Academic Medicine, Houston, TX, USA
  - Weill Cornell Medicine, New York, NY, USA
- Philip D Kohn
  - Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA
- Timothy E Behrens
  - Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
- Karen F Berman
  - Clinical & Translational Neuroscience Branch, National Institute of Mental Health, Intramural Research Program, National Institutes of Health, Bethesda, MD, USA

17
Sidorenko N, Chung HK, Grueschow M, Quednow BB, Hayward-Könnecke H, Jetter A, Tobler PN. Acetylcholine and noradrenaline enhance foraging optimality in humans. Proc Natl Acad Sci U S A 2023; 120:e2305596120. [PMID: 37639601 PMCID: PMC10483619 DOI: 10.1073/pnas.2305596120] [Received: 04/12/2023] [Accepted: 07/26/2023] [Indexed: 08/31/2023]
Abstract
Foraging theory prescribes when optimal foragers should leave the current option for more rewarding alternatives. Actual foragers often exploit options longer than the theory prescribes, but it is unclear how this foraging suboptimality arises. We investigated whether upregulating the cholinergic, noradrenergic, and dopaminergic systems increases foraging optimality. In a double-blind, between-subject design, participants (N = 160) received placebo, the nicotinic acetylcholine receptor agonist nicotine, the noradrenaline reuptake inhibitor reboxetine, or the preferential dopamine reuptake inhibitor methylphenidate, and played the role of a farmer who collected milk from patches with different yields. Across all groups, participants on average overharvested. While methylphenidate had no effect on this bias, nicotine, and to some extent reboxetine, significantly reduced deviation from foraging optimality, which resulted in better performance compared with placebo. Consistent with amplified goal-directedness, and ruling out heuristic explanations, nicotine also independently improved trial initiation and time perception. Our findings elucidate the neurochemical basis of behavioral flexibility and decision optimality and open unique perspectives on psychiatric disorders affecting these functions.
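The marginal value theorem behind this "when to leave" prescription can be sketched for a patch whose yield shrinks by a fixed factor on every harvest. The sketch picks the number of harvests per patch that maximizes the long-run reward rate, so overharvesting corresponds to staying beyond that number. The exponential depletion, the harvest and travel times, and all parameter values are assumptions for illustration, not the task parameters used in the study.

```python
def long_run_rate(n_harvests, r0, decay, harvest_time, travel_time):
    """Average reward per unit time if the forager leaves every patch after n harvests.

    Harvest i (0-based) yields r0 * decay**i; each harvest takes harvest_time,
    and moving to a fresh patch costs travel_time.
    """
    gained = sum(r0 * decay**i for i in range(n_harvests))
    return gained / (n_harvests * harvest_time + travel_time)


def optimal_harvests(r0, decay, harvest_time, travel_time, max_n=200):
    """Number of harvests per patch that maximizes the long-run reward rate."""
    return max(range(1, max_n + 1),
               key=lambda n: long_run_rate(n, r0, decay, harvest_time, travel_time))


def overharvest(n_actual, r0, decay, harvest_time, travel_time):
    """Positive values: the forager stayed longer than the rate-maximizing policy."""
    return n_actual - optimal_harvests(r0, decay, harvest_time, travel_time)


print(optimal_harvests(10.0, 0.8, 1.0, 2.0))  # 4
print(optimal_harvests(10.0, 0.8, 1.0, 6.0))  # 6
```

Longer travel between patches lowers the achievable reward rate, so the rate-maximizing forager stays for more harvests; leaving when the next harvest's yield per unit time falls below that rate is the marginal-value rule.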
Affiliation(s)
- Nick Sidorenko
  - Department of Economics, Laboratory for Social and Neural Systems Research, University of Zurich, Zurich 8006, Switzerland
  - Department of Economics, Zurich Center for Neuroeconomics, University of Zurich, Zurich 8006, Switzerland
- Hui-Kuan Chung
  - Department of Economics, Laboratory for Social and Neural Systems Research, University of Zurich, Zurich 8006, Switzerland
  - Department of Economics, Zurich Center for Neuroeconomics, University of Zurich, Zurich 8006, Switzerland
- Marcus Grueschow
  - Department of Economics, Laboratory for Social and Neural Systems Research, University of Zurich, Zurich 8006, Switzerland
  - Department of Economics, Zurich Center for Neuroeconomics, University of Zurich, Zurich 8006, Switzerland
- Boris B. Quednow
  - Experimental and Clinical Pharmacopsychology, Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric University Hospital Zurich, University of Zurich, Zurich 8008, Switzerland
  - Neuroscience Center Zurich, ETH Zurich and University of Zurich, Zurich 8057, Switzerland
- Helen Hayward-Könnecke
  - Department of Neurology, Section of Neuroimmunology and Multiple Sclerosis Research, University Hospital Zurich, Zurich 8091, Switzerland
- Alexander Jetter
  - National Poisons Information Centre, Tox Info Suisse, Associated Institute of the University of Zurich, Zurich 8032, Switzerland
- Philippe N. Tobler
  - Department of Economics, Laboratory for Social and Neural Systems Research, University of Zurich, Zurich 8006, Switzerland
  - Department of Economics, Zurich Center for Neuroeconomics, University of Zurich, Zurich 8006, Switzerland
  - Neuroscience Center Zurich, ETH Zurich and University of Zurich, Zurich 8057, Switzerland

18
Chakroun K, Wiehler A, Wagner B, Mathar D, Ganzer F, van Eimeren T, Sommer T, Peters J. Dopamine regulates decision thresholds in human reinforcement learning in males. Nat Commun 2023; 14:5369. [PMID: 37666865 PMCID: PMC10477234 DOI: 10.1038/s41467-023-41130-y] [Received: 10/06/2022] [Accepted: 08/22/2023] [Indexed: 09/06/2023]
Abstract
Dopamine fundamentally contributes to reinforcement learning, but recent accounts also suggest a contribution to specific action selection mechanisms and the regulation of response vigour. Here, we examine dopaminergic mechanisms underlying human reinforcement learning and action selection via a combined pharmacological neuroimaging approach in male human volunteers (n = 31, within-subjects; placebo, 150 mg of the dopamine precursor L-dopa, 2 mg of the D2 receptor antagonist haloperidol). We found little credible evidence for previously reported beneficial effects of L-dopa vs. haloperidol on learning from gains and altered neural prediction error signals, which may be partly due to differences in experimental design and/or drug dosages. Reinforcement learning drift diffusion models account for learning-related changes in accuracy and response times, and reveal consistent decision threshold reductions under both drugs, which accords with the idea that lower dosages of D2 receptor antagonists increase striatal DA release via an autoreceptor-mediated feedback mechanism. These results are in line with the idea that dopamine regulates decision thresholds during reinforcement learning, and may help to bridge action selection and response vigor accounts of dopamine.
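In reinforcement learning drift diffusion models, the drift rate is commonly a scaled difference between learned Q-values, while the boundary separation a acts as the decision threshold. The sketch below uses closed-form predictions for an unbiased diffusion process (start point a/2, noise s = 1) to show why a lower threshold yields faster but less accurate choices. The linear drift mapping (`v_coeff`), the non-decision time `t0`, and all parameter values are illustrative assumptions, not the fitted model from the study.

```python
import math


def ddm_accuracy(v: float, a: float, s: float = 1.0) -> float:
    """P(correct) for an unbiased DDM (start at a/2) with drift v toward the correct bound."""
    return 1.0 / (1.0 + math.exp(-v * a / s**2))


def ddm_mean_decision_time(v: float, a: float, s: float = 1.0) -> float:
    """Mean decision time for an unbiased DDM; tends to a^2 / (4 s^2) as v -> 0."""
    if abs(v) < 1e-9:
        return a * a / (4.0 * s**2)
    return (a / (2.0 * v)) * math.tanh(v * a / (2.0 * s**2))


def rlddm_prediction(q_better, q_worse, v_coeff, a, t0):
    """Hypothetical RLDDM mapping: drift = v_coeff * (Q difference); RT = t0 + decision time."""
    v = v_coeff * (q_better - q_worse)
    return ddm_accuracy(v, a), t0 + ddm_mean_decision_time(v, a)


for a in (2.0, 1.5):  # a higher vs a lower (hypothetical) decision threshold
    acc, rt = rlddm_prediction(0.7, 0.2, v_coeff=2.0, a=a, t0=0.3)
    print(f"a={a}: accuracy={acc:.3f}, mean RT={rt:.3f} s")
```

Because accuracy and decision time both grow with a for a fixed drift, a threshold reduction shows up as a joint speed-up and accuracy drop even when the learned values are unchanged.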
Affiliation(s)
- Karima Chakroun
  - Institute for Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Antonius Wiehler
  - Motivation, Brain and Behavior Lab, Paris Brain Institute (ICM), Pitié-Salpêtrière Hospital, Paris, France
- Ben Wagner
  - Chair of Cognitive Computational Neuroscience, Technical University Dresden, Dresden, Germany
- David Mathar
  - Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Florian Ganzer
  - Integrated Psychiatry Winterthur, Winterthur, Switzerland
- Thilo van Eimeren
  - Multimodal Neuroimaging Group, Department of Nuclear Medicine, University Medical Center Cologne, Cologne, Germany
- Tobias Sommer
  - Institute for Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Jan Peters
  - Institute for Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany

19
Blackwell KT, Doya K. Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks. PLoS Comput Biol 2023; 19:e1011385. [PMID: 37594982 PMCID: PMC10479916 DOI: 10.1371/journal.pcbi.1011385] [Received: 08/24/2022] [Revised: 09/05/2023] [Accepted: 07/25/2023] [Indexed: 08/20/2023]
Abstract
A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which corresponds more closely to the basal ganglia, with two Q matrices: one representing direct-pathway neurons (G) and another representing indirect-pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is that the G and N matrices are updated using the temporal difference reward prediction error. A best action is selected separately for G and N using a softmax with a reward-dependent adaptive exploration parameter, and any disagreement is then resolved by a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks, including extinction, renewal, and discrimination; switching reward probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to those of rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices.
These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight as to the role of direct- and indirect-pathway striatal neurons.
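A heavily simplified, hypothetical two-matrix learner in the spirit of the TD2Q description above: G and N share one temporal difference reward prediction error (positive errors strengthen G, negative errors strengthen N), each matrix proposes an action via softmax, and a second step arbitrates when the two proposals disagree. The class name `TwoMatrixTD`, the fixed inverse temperature (the published model uses a reward-dependent adaptive one), the sign-split update, and the arbitration rule are assumptions made for illustration; this is not the published TD2Q implementation.

```python
import math
import random


def softmax(values, beta):
    """Softmax with inverse temperature beta (max-subtracted for numerical stability)."""
    scaled = [beta * v for v in values]
    m = max(scaled)
    w = [math.exp(x - m) for x in scaled]
    total = sum(w)
    return [x / total for x in w]


class TwoMatrixTD:
    """Hypothetical G/N learner sharing one TD reward prediction error."""

    def __init__(self, n_states, n_actions, alpha=0.2, gamma=0.9, beta=3.0, seed=0):
        self.G = [[0.0] * n_actions for _ in range(n_states)]  # "direct pathway"
        self.N = [[0.0] * n_actions for _ in range(n_states)]  # "indirect pathway"
        self.alpha, self.gamma, self.beta = alpha, gamma, beta
        self.rng = random.Random(seed)

    def net(self, s, a):
        return self.G[s][a] - self.N[s][a]

    def choose(self, s):
        p_g = softmax(self.G[s], self.beta)                # G favours high values
        p_n = softmax([-x for x in self.N[s]], self.beta)  # N disfavours high values
        actions = range(len(p_g))
        a_g = self.rng.choices(actions, weights=p_g)[0]
        a_n = self.rng.choices(actions, weights=p_n)[0]
        if a_g == a_n:
            return a_g
        # Second selection step: pick between the two proposals by their probabilities.
        p_pick_g = p_g[a_g] / (p_g[a_g] + p_n[a_n])
        return a_g if self.rng.random() < p_pick_g else a_n

    def update(self, s, a, r, s_next=None):
        """One TD update; s_next=None marks a terminal transition."""
        target = r
        if s_next is not None:
            target += self.gamma * max(self.net(s_next, b) for b in range(len(self.G[s_next])))
        delta = target - self.net(s, a)  # TD reward prediction error
        self.G[s][a] += self.alpha * max(delta, 0.0)   # positive errors strengthen G
        self.N[s][a] += self.alpha * max(-delta, 0.0)  # negative errors strengthen N
        return delta


# One-state, two-action example: only action 1 is rewarded.
agent = TwoMatrixTD(n_states=1, n_actions=2, seed=1)
for _ in range(200):
    a = agent.choose(0)
    agent.update(0, a, 1.0 if a == 1 else 0.0)
print([round(agent.net(0, b), 2) for b in range(2)])
```

With the sign-split rule, the net value G − N changes by exactly alpha × delta on every update, so the pair behaves like an ordinary TD learner in aggregate while keeping separate "go" and "no-go" traces.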
Affiliation(s)
- Kim T Blackwell
  - Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia, United States of America
- Kenji Doya
  - Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan

20
Sinclair AH, Wang YC, Adcock RA. Instructed motivational states bias reinforcement learning and memory formation. Proc Natl Acad Sci U S A 2023; 120:e2304881120. [PMID: 37490530 PMCID: PMC10401012 DOI: 10.1073/pnas.2304881120] [Received: 04/06/2023] [Accepted: 06/19/2023] [Indexed: 07/27/2023]
Abstract
Motivation influences goals, decisions, and memory formation. Imperative motivation links urgent goals to actions, narrowing the focus of attention and memory. Conversely, interrogative motivation integrates goals over time and space, supporting rich memory encoding for flexible future use. We manipulated motivational states via cover stories for a reinforcement learning task: The imperative group imagined executing a museum heist, whereas the interrogative group imagined planning a future heist. Participants repeatedly chose among four doors, representing different museum rooms, to sample trial-unique paintings with variable rewards (later converted to bonus payments). The next day, participants performed a surprise memory test. Crucially, only the cover stories differed between the imperative and interrogative groups; the reinforcement learning task was identical, and all participants had the same expectations about how and when bonus payments would be awarded. In an initial sample and a preregistered replication, we demonstrated that imperative motivation increased exploitation during reinforcement learning. Conversely, interrogative motivation increased directed (but not random) exploration, despite the cost to participants' earnings. At test, the interrogative group was more accurate at recognizing paintings and recalling associated values. In the interrogative group, higher value paintings were more likely to be remembered; imperative motivation disrupted this effect of reward modulating memory. Overall, we demonstrate that a prelearning motivational manipulation can bias learning and memory, bearing implications for education, behavior change, clinical interventions, and communication.
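Directed and random exploration are often distinguished in bandit models by two separate terms: an uncertainty bonus that steers choices toward less-sampled options (directed) and a softmax temperature that adds decision noise (random). The sketch below uses a hypothetical bonus proportional to 1/sqrt(n + 1), where n is how often a door has been sampled; the function name and all values are illustrative, and this is not the model fitted in the study.

```python
import math


def choice_probs(means, counts, info_bonus, temperature):
    """Softmax over estimated value plus an uncertainty bonus.

    info_bonus > 0 adds directed exploration toward rarely sampled options;
    a higher temperature adds random exploration (decision noise).
    """
    utilities = [m + info_bonus / math.sqrt(n + 1) for m, n in zip(means, counts)]
    top = max(utilities)
    w = [math.exp((u - top) / temperature) for u in utilities]
    total = sum(w)
    return [x / total for x in w]


# Four doors with equal estimated value; door 4 has never been sampled.
means, counts = [5.0, 5.0, 5.0, 5.0], [10, 10, 10, 0]
print(choice_probs(means, counts, info_bonus=0.0, temperature=1.0))  # uniform
print(choice_probs(means, counts, info_bonus=2.0, temperature=1.0))  # favours door 4
```

Raising `temperature` flattens all choice probabilities regardless of sampling history, whereas raising `info_bonus` shifts probability specifically toward unfamiliar options; that separation is what lets directed and random exploration be estimated independently.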
Affiliation(s)
- Alyssa H. Sinclair
  - Department of Psychology & Neuroscience, Duke University, Durham, NC 27710
- Yuxi C. Wang
  - Department of Psychology & Neuroscience, Duke University, Durham, NC 27710
- R. Alison Adcock
  - Department of Psychology & Neuroscience, Duke University, Durham, NC 27710
  - Department of Psychiatry & Behavioral Sciences, Duke University, Durham, NC 27710

21
Chen CS, Mueller D, Knep E, Ebitz RB, Grissom NM. Dopamine and norepinephrine differentially mediate the exploration-exploitation tradeoff. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.09.523322. [PMID: 36711959 PMCID: PMC9881999 DOI: 10.1101/2023.01.09.523322] [Indexed: 01/11/2023]
Abstract
The catecholamines dopamine (DA) and norepinephrine (NE) have been repeatedly implicated in neuropsychiatric vulnerability, in part via their roles in mediating decision-making processes. Although the two neuromodulators share a synthesis pathway and are co-activated under states of arousal, they engage distinct circuits and play distinct roles in modulating neural activity across the brain. However, in the computational neuroscience literature, they have been assigned similar roles in modulating the latent cognitive processes of decision-making, in particular the exploration-exploitation tradeoff. Revealing how each neuromodulator contributes to this explore-exploit process will be important in guiding mechanistic hypotheses emerging from computational psychiatric approaches. To understand the differences and overlaps in the roles of these two catecholamine systems in regulating exploration and exploitation, a direct comparison using the same dynamic decision-making task is needed. Here, we ran mice in a restless two-armed bandit task, which encourages both exploration and exploitation. We systemically administered a nonselective DA receptor antagonist (flupenthixol), a nonselective DA receptor agonist (apomorphine), an NE beta-receptor antagonist (propranolol), and an NE beta-receptor agonist (isoproterenol), and examined changes in exploration within subjects across sessions. We found a bidirectional modulatory effect of dopamine receptor activity on the level of exploration: increasing dopamine activity decreased exploration, and decreasing dopamine activity increased exploration. Beta-noradrenergic receptor activity also modulated exploration, but the modulatory effect was mediated by sex. Reinforcement learning model parameters suggested that dopamine modulation affected exploration via decision noise, whereas norepinephrine modulation affected exploration via outcome sensitivity.
Together, these findings suggested that the mechanisms that govern the transition between exploration and exploitation are sensitive to changes in both catecholamine functions and revealed differential roles for NE and DA in mediating exploration.
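Choice switching in a restless bandit is a crude but common behavioral proxy for exploration. The sketch below simulates a delta-rule learner with a two-option softmax in a two-armed restless bandit whose reward probabilities follow bounded Gaussian random walks, and shows how lowering the inverse temperature (more decision noise) raises the switch rate. The function `switch_rate`, the drift size, the learning rate, and the use of switching as the exploration measure are assumptions; the study's own model and exploration measure may differ.

```python
import math
import random


def p_choose_right(q_left, q_right, beta):
    """Two-option softmax; beta is the inverse temperature (lower beta = more decision noise)."""
    return 1.0 / (1.0 + math.exp(-beta * (q_right - q_left)))


def switch_rate(beta, alpha=0.3, n_trials=2000, drift_sd=0.05, seed=0):
    """Fraction of consecutive trial pairs on which the simulated agent switches arms."""
    rng = random.Random(seed)
    p_reward = [rng.random(), rng.random()]
    q = [0.5, 0.5]
    prev, switches = None, 0
    for _ in range(n_trials):
        c = 1 if rng.random() < p_choose_right(q[0], q[1], beta) else 0
        if prev is not None and c != prev:
            switches += 1
        prev = c
        r = 1.0 if rng.random() < p_reward[c] else 0.0
        q[c] += alpha * (r - q[c])  # delta-rule update of the chosen arm only
        # "Restless": each arm's reward probability takes a bounded random-walk step.
        p_reward = [min(1.0, max(0.0, p + rng.gauss(0.0, drift_sd))) for p in p_reward]
    return switches / (n_trials - 1)


for beta in (1.0, 5.0, 20.0):
    print(f"beta={beta}: switch rate={switch_rate(beta):.2f}")
```

In this toy model the learning rule is identical across runs, so any change in switching comes purely from the decision-noise term, which mirrors the abstract's attribution of dopaminergic effects to decision noise rather than to outcome sensitivity.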

22
He H, Hong L, Sajda P. Pupillary response is associated with the reset and switching of functional brain networks during salience processing. PLoS Comput Biol 2023; 19:e1011081. [PMID: 37172067 DOI: 10.1371/journal.pcbi.1011081] [Received: 11/21/2022] [Revised: 05/24/2023] [Accepted: 04/06/2023] [Indexed: 05/14/2023]
Abstract
The interface between processing internal goals and salient events in the environment involves various top-down processes. Previous studies have identified multiple brain areas for salience processing, including the salience network (SN), dorsal attention network, and the locus coeruleus-norepinephrine (LC-NE) system. However, interactions among these systems in salience processing remain unclear. Here, we simultaneously recorded pupillometry, EEG, and fMRI during an auditory oddball paradigm. The analyses of EEG and fMRI data uncovered spatiotemporally organized target-associated neural correlates. By modeling the target-modulated effective connectivity, we found that the target-evoked pupillary response is associated with the network directional couplings from late to early subsystems in the trial, as well as the network switching initiated by the SN. These findings indicate that the SN might cooperate with the pupil-indexed LC-NE system in the reset and switching of cortical networks, and shed light on their implications in various cognitive processes and neurological diseases.
Affiliation(s)
- Hengda He
  - Department of Biomedical Engineering, Columbia University, New York, New York, United States of America
- Linbi Hong
  - Department of Biomedical Engineering, Columbia University, New York, New York, United States of America
- Paul Sajda
  - Department of Biomedical Engineering, Columbia University, New York, New York, United States of America
  - Department of Electrical Engineering, Columbia University, New York, New York, United States of America
  - Department of Radiology, Columbia University, New York, New York, United States of America
  - Data Science Institute, Columbia University, New York, New York, United States of America
23
Jansen M, Lockwood PL, Cutler J, de Bruijn ERA. l-DOPA and oxytocin influence the neurocomputational mechanisms of self-benefitting and prosocial reinforcement learning. Neuroimage 2023; 270:119983. [PMID: 36848972 DOI: 10.1016/j.neuroimage.2023.119983] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/21/2022] [Revised: 02/03/2023] [Accepted: 02/23/2023] [Indexed: 02/27/2023]
Abstract
Humans learn through reinforcement, particularly when outcomes are unexpected. Recent research suggests similar mechanisms drive how we learn to benefit other people, that is, how we learn to be prosocial. Yet the neurochemical mechanisms underlying such prosocial computations remain poorly understood. Here, we investigated whether pharmacological manipulation of oxytocin and dopamine influences the neurocomputational mechanisms underlying self-benefitting and prosocial reinforcement learning. Using a double-blind placebo-controlled cross-over design, we administered intranasal oxytocin (24 IU), the dopamine precursor l-DOPA (100 mg + 25 mg carbidopa), or placebo over three sessions. Participants performed a probabilistic reinforcement learning task with potential rewards for themselves, another participant, or no one, during functional magnetic resonance imaging. Computational models of reinforcement learning were used to calculate prediction errors (PEs) and learning rates. Participants' behavior was best explained by a model with different learning rates for each recipient, but these were unaffected by either drug. On the neural level, however, both drugs blunted PE signaling in the ventral striatum and led to negative signaling of PEs in the anterior mid-cingulate cortex, dorsolateral prefrontal cortex, inferior parietal gyrus, and precentral gyrus, compared to placebo, and regardless of recipient. Oxytocin (versus placebo) administration was additionally associated with opposing tracking of self-benefitting versus prosocial PEs in dorsal anterior cingulate cortex, insula and superior temporal gyrus. These findings suggest that both l-DOPA and oxytocin induce a context-independent shift from positive towards negative tracking of PEs during learning. Moreover, oxytocin may have opposing effects on PE signaling when learning to benefit oneself versus another.
Affiliation(s)
- Myrthe Jansen
  - Department of Clinical Psychology, Leiden University, the Netherlands; Leiden Institute for Brain and Cognition (LIBC), Leiden, the Netherlands
- Patricia L Lockwood
  - Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, UK; Institute for Mental Health, School of Psychology, University of Birmingham, Birmingham, UK; Centre for Developmental Science, School of Psychology, University of Birmingham, UK
- Jo Cutler
  - Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, UK; Institute for Mental Health, School of Psychology, University of Birmingham, Birmingham, UK; Centre for Developmental Science, School of Psychology, University of Birmingham, UK
- Ellen R A de Bruijn
  - Department of Clinical Psychology, Leiden University, the Netherlands; Leiden Institute for Brain and Cognition (LIBC), Leiden, the Netherlands
24
Speers LJ, Bilkey DK. Maladaptive explore/exploit trade-offs in schizophrenia. Trends Neurosci 2023; 46:341-354. [PMID: 36878821 DOI: 10.1016/j.tins.2023.02.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/09/2022] [Revised: 01/30/2023] [Accepted: 02/08/2023] [Indexed: 03/07/2023]
Abstract
Schizophrenia is a complex disorder that remains poorly understood, particularly at the systems level. In this opinion article we argue that the explore/exploit trade-off concept provides a holistic and ecologically valid framework to resolve some of the apparent paradoxes that have emerged within schizophrenia research. We review recent evidence suggesting that fundamental explore/exploit behaviors may be maladaptive in schizophrenia during physical, visual, and cognitive foraging. We also describe how theories from the broader optimal foraging literature, such as the marginal value theorem (MVT), could provide valuable insight into how aberrant processing of reward, context, and cost/effort evaluations interact to produce maladaptive responses.
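The marginal value theorem (MVT) named above prescribes leaving a depleting patch once the gain from staying falls to the best long-run average intake rate the environment offers. A minimal discrete-time sketch, assuming (hypothetically) that each harvest takes one time unit, that the i-th harvest in a patch yields `r0 * decay**i`, and that moving to a fresh patch costs `travel_time` units; none of these values come from the article.

```python
def average_rate(n_harvests, r0, decay, travel_time):
    """Long-run intake rate if the forager always leaves after n_harvests.

    Each harvest takes one time unit and the i-th harvest (i = 0, 1, ...)
    yields r0 * decay**i; travelling to a fresh patch costs travel_time.
    """
    total = sum(r0 * decay ** i for i in range(n_harvests))
    return total / (n_harvests + travel_time)


def mvt_leave_point(r0, decay, travel_time, max_harvests=200):
    """MVT rule: keep harvesting while the next harvest beats the average rate."""
    n = 1
    while n < max_harvests and r0 * decay ** n > average_rate(n, r0, decay, travel_time):
        n += 1
    return n


def best_leave_point(r0, decay, travel_time, max_harvests=200):
    """Brute-force check: the harvest count that maximises the average rate."""
    return max(range(1, max_harvests + 1),
               key=lambda n: average_rate(n, r0, decay, travel_time))
```

Because rewards shrink monotonically in this sketch, the local stopping rule and the brute-force maximisation pick the same leaving point, and a longer `travel_time` never shortens the predicted patch residence time, which is the classic MVT prediction.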
Affiliation(s)
- Lucinda J Speers
  - Department of Psychology, University of Otago, Dunedin 9016, New Zealand
- David K Bilkey
  - Department of Psychology, University of Otago, Dunedin 9016, New Zealand
25
Recurrent networks endowed with structural priors explain suboptimal animal behavior. Curr Biol 2023; 33:622-638.e7. [PMID: 36657448 DOI: 10.1016/j.cub.2022.12.044] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/22/2022] [Revised: 10/03/2022] [Accepted: 12/16/2022] [Indexed: 01/19/2023]
Abstract
The strategies found by animals facing a new task are determined both by individual experience and by structural priors evolved to leverage the statistics of natural environments. Rats quickly learn to capitalize on the trial sequence correlations of two-alternative forced choice (2AFC) tasks after correct trials but consistently deviate from optimal behavior after error trials. To understand this outcome-dependent gating, we first show that recurrent neural networks (RNNs) trained in the same 2AFC task outperform rats because they can readily learn to use across-trial information after both correct and error trials. We hypothesize that, although RNNs can optimize their behavior in the 2AFC task without any a priori restrictions, rats' strategy is constrained by a structural prior adapted to a natural environment in which rewarded and non-rewarded actions provide largely asymmetric information. When RNNs are pre-trained in a more ecological task with more than two possible choices, the networks develop a strategy by which they gate off the across-trial evidence after errors, mimicking rats' behavior. Population analyses show that the pre-trained networks form an accurate representation of the sequence statistics independently of the outcome in the previous trial. After error trials, gating is implemented by a change in the network dynamics that temporarily decouples the categorization of the stimulus from the across-trial accumulated evidence. Our results suggest that the rats' suboptimal behavior reflects the influence of a structural prior that reacts to errors by isolating the network decision dynamics from the context, ultimately constraining the performance in a 2AFC laboratory task.
26
de A Marcelino AL, Gray O, Al-Fatly B, Gilmour W, Douglas Steele J, Kühn AA, Gilbertson T. Pallidal neuromodulation of the explore/exploit trade-off in decision-making. eLife 2023; 12:79642. [PMID: 36727860 PMCID: PMC9940911 DOI: 10.7554/elife.79642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 02/01/2023] [Indexed: 02/03/2023]
Abstract
Every decision that we make involves a conflict between exploiting our current knowledge of an action's value and exploring alternative courses of action that might lead to a better or worse outcome. The sub-cortical nuclei that make up the basal ganglia have been proposed as a neural circuit that may contribute to resolving this explore-exploit 'dilemma'. To test this hypothesis, we examined the effects of neuromodulating the basal ganglia's output nucleus, the globus pallidus interna, in patients who had undergone deep brain stimulation (DBS) for isolated dystonia. Neuromodulation enhanced the number of exploratory choices to the lower value option in a two-armed bandit probabilistic reversal-learning task. Enhanced exploration was explained by a reduction in the rate of evidence accumulation (drift rate) in a reinforcement learning drift diffusion model. We estimated the functional connectivity profile between the stimulating DBS electrode and the rest of the brain using a normative functional connectome derived from healthy controls. Variation between patients in the extent of neuromodulation-induced exploration was associated with functional connectivity from the stimulation electrode site to a distributed brain functional network. We conclude that the basal ganglia's output nucleus, the globus pallidus interna, can adaptively modify decision choice when faced with the dilemma to explore or exploit.
Affiliation(s)
- Ana Luisa de A Marcelino
  - Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Movement Disorder and Neuromodulation Unit, Department of Neurology, Charité Campus Mitte, Berlin, Germany
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Core Facility Genomics, Berlin, Germany
- Owen Gray
  - Division of Imaging Science and Technology, Medical School, University of Dundee, Dundee, United Kingdom
- Bassam Al-Fatly
  - Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Movement Disorder and Neuromodulation Unit, Department of Neurology, Charité Campus Mitte, Berlin, Germany
- William Gilmour
  - Division of Imaging Science and Technology, Medical School, University of Dundee, Dundee, United Kingdom
- J Douglas Steele
  - Division of Imaging Science and Technology, Medical School, University of Dundee, Dundee, United Kingdom
- Andrea A Kühn
  - Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Movement Disorder and Neuromodulation Unit, Department of Neurology, Charité Campus Mitte, Berlin, Germany
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Core Facility Genomics, Berlin, Germany
  - Berlin School of Mind and Brain, Charité - University Medicine Berlin, Berlin, Germany
  - NeuroCure, Charité - University Medicine Berlin, Berlin, Germany
  - DZNE, German Centre for Degenerative Diseases, Berlin, Germany
- Tom Gilbertson
  - Division of Imaging Science and Technology, Medical School, University of Dundee, Dundee, United Kingdom
  - Department of Neurology, Ninewells Hospital & Medical School, Dundee, United Kingdom
27
Mathar D, Erfanian Abdoust M, Marrenbach T, Tuzsus D, Peters J. The catecholamine precursor Tyrosine reduces autonomic arousal and decreases decision thresholds in reinforcement learning and temporal discounting. PLoS Comput Biol 2022; 18:e1010785. [PMID: 36548401 PMCID: PMC9822114 DOI: 10.1371/journal.pcbi.1010785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 01/06/2023] [Accepted: 12/01/2022] [Indexed: 12/24/2022]
Abstract
Supplementation with the catecholamine precursor L-Tyrosine might enhance cognitive performance, but overall findings are mixed. Here, we investigate the effect of a single dose of tyrosine (2 g) vs. placebo on two catecholamine-dependent trans-diagnostic traits: model-based control during reinforcement learning (2-step task) and temporal discounting, using a double-blind, placebo-controlled, within-subject design (n = 28 healthy male participants). We leveraged drift diffusion models in a hierarchical Bayesian framework to jointly model participants' choices and response times (RTs) in both tasks. Furthermore, comprehensive autonomic monitoring (heart rate, heart rate variability, pupillometry, spontaneous eye blink rate) was performed both pre- and post-supplementation, to explore potential physiological effects of supplementation. Across tasks, tyrosine consistently reduced participants' RTs without deteriorating task performance. Diffusion modeling linked this effect to attenuated decision thresholds in both tasks and further revealed increased model-based control (2-step task) and (if anything) attenuated temporal discounting. On the physiological level, participants' pupil dilation was predictive of the individual degree of temporal discounting. Tyrosine supplementation reduced physiological arousal as revealed by increases in pupil dilation variability and reductions in heart rate. Supplementation-related changes in physiological arousal predicted individual changes in temporal discounting. Our findings provide the first evidence that tyrosine supplementation might impact psychophysiological parameters, and suggest that modeling approaches based on sequential sampling models can yield novel insights into latent cognitive processes modulated by amino-acid supplementation.
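The abstract above attributes faster responses to attenuated decision thresholds in a drift diffusion model (DDM). A minimal Euler-simulation sketch of a basic two-boundary DDM, assuming unit noise, an unbiased starting point, a fixed non-decision time, and illustrative parameter values; it is not the hierarchical Bayesian model the authors fitted, and the function name `simulate_ddm` is hypothetical.

```python
import math
import random


def simulate_ddm(drift, threshold, n_trials=300, dt=0.002, noise=1.0,
                 non_decision=0.3, seed=0):
    """Euler simulation of a symmetric two-boundary drift diffusion model.

    Evidence starts at threshold / 2 and accumulates until it reaches 0
    (lower bound) or `threshold` (upper bound). Returns a tuple of
    (mean RT in seconds, proportion of upper-bound choices).
    """
    rng = random.Random(seed)
    step_sd = noise * math.sqrt(dt)  # SD of the Gaussian increment per step
    rts, upper = [], 0
    for _ in range(n_trials):
        x, t = threshold / 2.0, 0.0
        while 0.0 < x < threshold:
            x += drift * dt + rng.gauss(0.0, step_sd)
            t += dt
        rts.append(t + non_decision)
        if x >= threshold:
            upper += 1
    return sum(rts) / n_trials, upper / n_trials


fast_rt, _ = simulate_ddm(drift=0.5, threshold=1.0)
slow_rt, _ = simulate_ddm(drift=0.5, threshold=2.0)
```

Lowering `threshold` shortens the simulated decision times, while lowering `drift` (the parameter the pallidal DBS study above links to exploration) makes lower-boundary choices more frequent.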
Affiliation(s)
- David Mathar
  - Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Mani Erfanian Abdoust
  - Biological Psychology of Decision Making, Institute of Experimental Psychology, Heinrich Heine University Duesseldorf, Duesseldorf, Germany
- Tobias Marrenbach
  - Biological Psychology of Decision Making, Institute of Experimental Psychology, Heinrich Heine University Duesseldorf, Duesseldorf, Germany
- Deniz Tuzsus
  - Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Jan Peters
  - Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
28
Disentangling the roles of dopamine and noradrenaline in the exploration-exploitation tradeoff during human decision-making. Neuropsychopharmacology 2022; 48:1078-1086. [PMID: 36522404 PMCID: PMC10209107 DOI: 10.1038/s41386-022-01517-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/28/2022] [Revised: 11/29/2022] [Accepted: 11/30/2022] [Indexed: 12/23/2022]
Abstract
Balancing the exploration of new options and the exploitation of known options is a fundamental challenge in decision-making, yet the mechanisms involved in this balance are not fully understood. Here, we aimed to elucidate the distinct roles of dopamine and noradrenaline in the exploration-exploitation tradeoff during human choice. To this end, we used a double-blind, placebo-controlled design in which participants received either a placebo, 400 mg of the D2/D3 receptor antagonist amisulpride, or 40 mg of the β-adrenergic receptor antagonist propranolol before they completed a virtual patch-foraging task probing exploration and exploitation. We systematically varied the rewards associated with choice options, the rate at which rewards decreased over time, and the opportunity costs of switching to the next option to disentangle the contributions of dopamine and noradrenaline to specific choice aspects. Our data show that amisulpride increased the sensitivity to all three of these critical choice features, whereas propranolol was associated with a reduced tendency to use value information. Our findings provide novel insights into the specific roles of dopamine and noradrenaline in the regulation of human choice behavior, suggesting a critical involvement of dopamine in directed exploration and a role of noradrenaline in more random exploration.
29
Menon V, Palaniyappan L, Supekar K. Integrative Brain Network and Salience Models of Psychopathology and Cognitive Dysfunction in Schizophrenia. Biol Psychiatry 2022:S0006-3223(22)01637-7. [PMID: 36702660 DOI: 10.1016/j.biopsych.2022.09.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Revised: 08/09/2022] [Accepted: 09/06/2022] [Indexed: 01/28/2023]
Abstract
Brain network models of cognitive control are central to advancing our understanding of psychopathology and cognitive dysfunction in schizophrenia. This review examines the role of large-scale brain organization in schizophrenia, with a particular focus on a triple-network model of cognitive control and its role in aberrant salience processing. First, we provide an overview of the triple network involving the salience, frontoparietal, and default mode networks and highlight the central role of the insula-anchored salience network in the aberrant mapping of salient external and internal events in schizophrenia. We summarize the extensive evidence that has emerged from structural, neurochemical, and functional brain imaging studies for aberrancies in these networks and their dynamic temporal interactions in schizophrenia. Next, we consider the hypothesis that atypical striatal dopamine release results in misattribution of salience to irrelevant external stimuli and self-referential mental events. We propose an integrated triple-network salience-based model incorporating striatal dysfunction and sensitivity to perceptual and cognitive prediction errors in the insula node of the salience network and postulate that dysregulated dopamine modulation of salience network-centered processes contributes to the core clinical phenotype of schizophrenia. Thus, a powerful paradigm to characterize the neurobiology of schizophrenia emerges when we combine conceptual models of salience with large-scale cognitive control networks in a unified manner. We conclude by discussing potential therapeutic leads on restoring brain network dysfunction in schizophrenia.
Affiliation(s)
- Vinod Menon
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California; Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, California; Wu Tsai Neurosciences Institute, Stanford University School of Medicine, Stanford, California
- Lena Palaniyappan
  - Department of Psychiatry and Robarts Research Institute, University of Western Ontario, London, Ontario, Canada; Lawson Health Research Institute, London, Ontario, Canada; Douglas Mental Health University Institute, Montreal, Quebec, Canada; Department of Psychiatry, McGill University, Montreal, Quebec, Canada
- Kaustubh Supekar
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California; Wu Tsai Neurosciences Institute, Stanford University School of Medicine, Stanford, California
30
Demiral ŞB, Manza P, Biesecker E, Wiers C, Shokri-Kojori E, McPherson K, Dennis E, Johnson A, Tomasi D, Wang GJ, Volkow ND. Striatal D1 and D2 receptor availability are selectively associated with eye-blink rates after methylphenidate treatment. Commun Biol 2022; 5:1015. [PMID: 36163254 PMCID: PMC9513088 DOI: 10.1038/s42003-022-03979-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 09/12/2022] [Indexed: 11/18/2022]
Abstract
Eye-blink rate has been proposed as a biomarker of the brain dopamine system; however, findings have not been consistent. This study assessed the relationship between blink rates, measured after oral placebo (PL) and after a challenge with oral methylphenidate (MP; 60 mg), and striatal D1 receptor (D1R) availability (measured at baseline) and D2 receptor (D2R) availability (measured after PL and after MP) in healthy participants. PET measures of baseline D1R ([11C]NNC112) (BL-D1R) and D2R availability ([11C]raclopride) after PL (PL-D2R) and after MP (MP-D2R) were quantified in the striatum as non-displaceable binding potential. MP reduced the number of blinks and increased the time participants kept their eyes open. Correlations with dopamine receptors were significant only for the eye-blink measures obtained after MP: they were positive for BL-D1R in putamen and MP-D2R in caudate (correlations with PL-D2R were not significant). MP-induced changes in blink rates (PL minus MP) were negatively correlated with BL-D1R in caudate and putamen. Our findings suggest that eye-blink measures obtained while stressing the dopamine system might provide a more sensitive behavioral biomarker of striatal D1R or D2R in healthy volunteers than those obtained at baseline or after placebo. PET imaging in human participants revealed that D1 and D2 dopamine receptor availability was associated with eye-blink rates following treatment with oral methylphenidate, but not placebo.
Affiliation(s)
- Şükrü B Demiral
  - National Institute on Alcohol Abuse and Alcoholism, Bethesda, MD, USA
- Peter Manza
  - National Institute on Alcohol Abuse and Alcoholism, Bethesda, MD, USA
- Erin Biesecker
  - National Institute on Alcohol Abuse and Alcoholism, Bethesda, MD, USA
- Corinde Wiers
  - Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
- Evan Dennis
  - National Institute on Alcohol Abuse and Alcoholism, Bethesda, MD, USA
- Allison Johnson
  - National Institute on Alcohol Abuse and Alcoholism, Bethesda, MD, USA
- Dardo Tomasi
  - National Institute on Alcohol Abuse and Alcoholism, Bethesda, MD, USA
- Gene-Jack Wang
  - National Institute on Alcohol Abuse and Alcoholism, Bethesda, MD, USA
- Nora D Volkow
  - National Institute on Drug Abuse, Bethesda, MD, USA
31
Jepma M, Roy M, Ramlakhan K, van Velzen M, Dahan A. Different brain systems support learning from received and avoided pain during human pain-avoidance learning. eLife 2022; 11:74149. [PMID: 35731646 PMCID: PMC9217130 DOI: 10.7554/elife.74149] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/23/2021] [Accepted: 06/07/2022] [Indexed: 12/14/2022]
Abstract
Both unexpected pain and unexpected pain absence can drive avoidance learning, but whether they do so via shared or separate neural and neurochemical systems is largely unknown. To address this issue, we combined an instrumental pain-avoidance learning task with computational modeling, functional magnetic resonance imaging (fMRI), and pharmacological manipulations of the dopaminergic (100 mg levodopa) and opioidergic (50 mg naltrexone) systems (N = 83). Computational modeling provided evidence that untreated participants learned more from received than avoided pain. Our dopamine and opioid manipulations negated this learning asymmetry by selectively increasing learning rates for avoided pain. Furthermore, our fMRI analyses revealed that pain prediction errors were encoded in subcortical and limbic brain regions, whereas no-pain prediction errors were encoded in frontal and parietal cortical regions. However, we found no effects of our pharmacological manipulations on the neural encoding of prediction errors. Together, our results suggest that human pain-avoidance learning is supported by separate threat- and safety-learning systems, and that dopamine and endogenous opioids specifically regulate learning from successfully avoided pain.
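The central computational result above is a learning asymmetry: untreated participants learned more from received than from avoided pain, and both drug manipulations raised learning rates for avoided pain. A minimal delta-rule sketch with outcome-specific learning rates, assuming outcomes coded as 1 (pain) or 0 (no pain) and hypothetical parameter names `alpha_pain` and `alpha_no_pain`; it illustrates the idea and is not the authors' model.

```python
def update_pain_expectation(v, pain, alpha_pain, alpha_no_pain):
    """One delta-rule step on the expected probability of pain.

    v: current expectation that the chosen action leads to pain (0..1).
    pain: True if pain was received, False if it was avoided.
    The learning rate applied depends on which outcome occurred.
    """
    outcome = 1.0 if pain else 0.0
    prediction_error = outcome - v  # positive after pain, negative after avoided pain
    alpha = alpha_pain if pain else alpha_no_pain
    return v + alpha * prediction_error


def run_sequence(outcomes, alpha_pain, alpha_no_pain, v0=0.5):
    """Return the trajectory of pain expectations for a sequence of outcomes."""
    v = v0
    trajectory = [v]
    for pain in outcomes:
        v = update_pain_expectation(v, pain, alpha_pain, alpha_no_pain)
        trajectory.append(v)
    return trajectory
```

With `alpha_pain` larger than `alpha_no_pain`, an alternating sequence of pain and no-pain outcomes leaves the expectation biased towards pain; equal rates keep it hovering around 0.5, which is the kind of asymmetry the drug manipulations were reported to negate.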
Affiliation(s)
- Marieke Jepma
  - Department of Psychology, University of Amsterdam, Amsterdam, Netherlands; Department of Psychology, Leiden University, Leiden, Netherlands; Leiden Institute for Brain and Cognition, Leiden, Netherlands
- Mathieu Roy
  - Department of Psychology, McGill University, Montreal, Canada; Alan Edwards Centre for Research on Pain, McGill University, Montreal, Canada
- Kiran Ramlakhan
  - Department of Psychology, Leiden University, Leiden, Netherlands; Department of Research and Statistics, Municipality of Amsterdam, Amsterdam, Netherlands
- Monique van Velzen
  - Department of Anesthesiology, Leiden University Medical Center, Leiden, Netherlands
- Albert Dahan
  - Department of Anesthesiology, Leiden University Medical Center, Leiden, Netherlands
32
Smith E, Peters J. Motor response vigour and visual fixation patterns reflect subjective valuation during intertemporal choice. PLoS Comput Biol 2022; 18:e1010096. [PMID: 35687550 PMCID: PMC9187114 DOI: 10.1371/journal.pcbi.1010096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2021] [Accepted: 04/12/2022] [Indexed: 11/18/2022]
Abstract
Value-based decision-making is of central interest in cognitive neuroscience and psychology, as well as in the context of neuropsychiatric disorders characterised by decision-making impairments. Studies examining (neuro-)computational mechanisms underlying choice behaviour typically focus on participants' decisions. However, there is increasing evidence that option valuation might also be reflected in motor response vigour and eye movements, which are implicit measures of subjective utility. To examine motor response vigour and visual fixation correlates of option valuation in intertemporal choice, we set up a task in which participants selected an option by pressing a grip force transducer while we simultaneously tracked fixation shifts between options. As outlined in our preregistration (https://osf.io/k6jct), we used hierarchical Bayesian parameter estimation to model the choices assuming hyperbolic discounting, compared variants of the softmax and drift diffusion model, and assessed the relationship between response vigour and the estimated model parameters. The behavioural data were best explained by a drift diffusion model specifying a non-linear scaling of the drift rate by the subjective value differences. Replicating previous findings, we found a magnitude effect for temporal discounting, such that higher rewards were discounted less. This magnitude effect was further reflected in motor response vigour, such that stronger forces were exerted in the high vs. the low magnitude condition. Bayesian hierarchical linear regression further revealed higher grip forces, faster response times and fewer fixation shifts for trials with higher subjective value differences. An exploratory analysis revealed that subjective value sums across options showed an even more pronounced association with trial-wise grip force amplitudes. Our data suggest that subjective utility or implicit valuation is reflected in motor response vigour and visual fixation patterns during intertemporal choice. Taking into account response vigour might thus provide deeper insight into decision-making, reward valuation and maladaptive changes in these processes, e.g. in the context of neuropsychiatric disorders.
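The choices above are modelled assuming hyperbolic discounting, in which a reward of amount A delivered after delay D has subjective value A / (1 + kD) for an individual discount rate k. A minimal sketch of that valuation with a softmax (two-option logistic) choice rule, one of the rule families the authors compared, assuming illustrative parameter names `k` and `beta` and a smaller-sooner vs larger-later framing; the best-fitting drift diffusion model with non-linear drift scaling is not reproduced here.

```python
import math


def hyperbolic_value(amount, delay, k):
    """Subjective value of `amount` delayed by `delay` under discount rate k."""
    return amount / (1.0 + k * delay)


def logistic(x):
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def p_choose_larger_later(ss_amount, ll_amount, ll_delay, k, beta, ss_delay=0.0):
    """Softmax probability of choosing the larger-later over the smaller-sooner option."""
    v_ss = hyperbolic_value(ss_amount, ss_delay, k)
    v_ll = hyperbolic_value(ll_amount, ll_delay, k)
    return logistic(beta * (v_ll - v_ss))
```

A smaller `k` means less discounting, so the magnitude effect reported above would appear as a lower fitted `k` in the high-magnitude condition.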
Affiliation(s)
- Elke Smith
  - Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Jan Peters
  - Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
33
Dennison JB, Sazhin D, Smith DV. Decision neuroscience and neuroeconomics: Recent progress and ongoing challenges. Wiley Interdiscip Rev Cogn Sci 2022; 13:e1589. [PMID: 35137549 PMCID: PMC9124684 DOI: 10.1002/wcs.1589] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/05/2020] [Revised: 11/28/2021] [Accepted: 12/21/2021] [Indexed: 01/10/2023]
Abstract
In the past decade, decision neuroscience and neuroeconomics have developed many new insights in the study of decision making. This review provides an overarching update on how the field has advanced in this time period. Although our initial review a decade ago outlined several theoretical, conceptual, methodological, empirical, and practical challenges, there has only been limited progress in resolving these challenges. We summarize significant trends in decision neuroscience through the lens of the challenges outlined for the field and review examples where the field has had significant, direct, and applicable impacts across economics and psychology. First, we review progress on topics including reward learning, explore-exploit decisions, risk and ambiguity, intertemporal choice, and valuation. Next, we assess the impacts of emotion, social rewards, and social context on decision making. Then, we follow up with how individual differences impact choices and new exciting developments in the prediction and neuroforecasting of future decisions. Finally, we consider how trends in decision-neuroscience research reflect progress toward resolving past challenges, discuss new and exciting applications of recent research, and identify new challenges for the field. This article is categorized under: Psychology > Reasoning and Decision Making; Psychology > Emotion and Motivation.
Affiliation(s)
- Jeffrey B Dennison
  - Department of Psychology, Temple University, Philadelphia, Pennsylvania, USA
- Daniel Sazhin
  - Department of Psychology, Temple University, Philadelphia, Pennsylvania, USA
- David V Smith
  - Department of Psychology, Temple University, Philadelphia, Pennsylvania, USA
34
Wischnewski M, Compen B. Effects of theta transcranial alternating current stimulation (tACS) on exploration and exploitation during uncertain decision-making. Behav Brain Res 2022; 426:113840. [PMID: 35325684 DOI: 10.1016/j.bbr.2022.113840] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/26/2021] [Revised: 03/02/2022] [Accepted: 03/08/2022] [Indexed: 01/15/2023]
Abstract
Exploring one's surroundings may yield unexpected rewards but is associated with uncertainty and risk. Conversely, exploiting certain outcomes carries low risk, yet potentially better outcomes remain unexamined. As such, risk-taking behavior depends on perceived uncertainty and on the trade-off between exploration and exploitation. It has previously been suggested that risk-taking may relate to theta activity in the prefrontal cortex, and earlier studies hinted at a relationship between a right-hemispheric bias in frontal theta asymmetry and risky behavior. In the present double-blind, sham-controlled, within-subject study, we applied bifrontal transcranial alternating current stimulation (tACS) at the theta frequency (5 Hz) to eighteen healthy volunteers during a gambling task. Two tACS montages, with either left-right or posterior-anterior current flow, were employed at an intensity of 1 mA. Compared to sham, theta tACS increased perceived uncertainty irrespective of current flow direction. Despite this, no direct effect of tACS on exploration behavior or general risk-taking was observed. Furthermore, frontal theta asymmetry was more right-hemispherically biased after posterior-anterior tACS than after sham. Finally, we used electric field simulation to identify which regions were targeted by each tACS montage, because overlap in the targeted regions may explain why the two montages produced comparable outcomes. Our findings provide a first step towards understanding the relationship between frontal theta oscillations and different features of risk-taking.
Affiliation(s)
- Miles Wischnewski
- Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN, United States.
- Boukje Compen
- School of Health Professions Education, Maastricht University, Maastricht, the Netherlands
35
A neural and behavioral trade-off between value and uncertainty underlies exploratory decisions in normative anxiety. Mol Psychiatry 2022; 27:1573-1587. [PMID: 34725456 DOI: 10.1038/s41380-021-01363-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/23/2021] [Revised: 10/10/2021] [Accepted: 10/14/2021] [Indexed: 11/08/2022]
Abstract
Exploration reduces uncertainty about the environment and improves the quality of future decisions, but at the cost of provisionally uncertain and suboptimal outcomes. Although anxiety promotes intolerance of uncertainty, it remains unclear whether, and by which mechanisms, anxiety relates to exploratory decision-making. Using a dynamic three-armed bandit task, we find that higher trait anxiety is associated with increased exploration, which in turn harms overall performance. We identify two distinct behavioral sources: first, decisions made by anxious individuals are guided toward reducing uncertainty; second, their decisions are less guided by immediate value gains. These findings hold in both the loss and gain domains, and further show that an affective trait relates to exploration and yields an inverse-U-shaped relationship between anxiety and overall performance. Additional functional imaging (fMRI) data suggest that normative anxiety correlates negatively with the representation of expected value in the dorsal anterior cingulate cortex and, in contrast, positively with the representation of uncertainty in the anterior insula. We conclude that a trade-off between value gains and uncertainty reduction leads to maladaptive decision-making in individuals with higher normal-range anxiety.
36
Petzke TM, Schomaker J. A bias toward the unknown: individual and environmental factors influencing exploratory behavior. Ann N Y Acad Sci 2022; 1512:61-75. [PMID: 35218049 PMCID: PMC9306615 DOI: 10.1111/nyas.14757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2021] [Accepted: 01/21/2022] [Indexed: 11/29/2022]
Abstract
With limited resources, exploring new opportunities is crucial for survival. Exploring novel options, however, comes at the cost of uncertainty. Therefore, there is a trade‐off between exploiting options with a known beneficial outcome and exploring novel options with a potentially higher gain. Computational models have suggested that novelty may promote exploratory behavior by inducing a so‐called novelty bonus through reward‐related processes. So far, few studies have provided behavioral evidence for such a novelty bonus. In this study, we aimed to investigate whether spatial novelty can stimulate exploratory behavior (Experiment 1), and whether age, novelty‐seeking, and reduced action radius or social interactions due to COVID‐19 restrictions influenced the exploration–exploitation trade‐off (Experiment 2). In both experiments, we employed a novel paradigm in which participants made binary decisions between food items, while on rare trials, a surprise option was presented. Results from Experiment 1 are in line with a novelty bonus, with spatial novelty promoting exploratory behavior. In Experiment 2, we found that exploratory behavior declined with age, high novelty seekers made more exploratory choices than low novelty seekers, and participants with a smaller action radius made fewer exploratory choices. These findings are consistent with previous findings in animals and predictions from computational models.
Affiliation(s)
- Tara M Petzke
- Department of Health, Medical & Neuropsychology, Leiden University, Leiden, the Netherlands
- Judith Schomaker
- Department of Health, Medical & Neuropsychology, Leiden University, Leiden, the Netherlands; Leiden Institute for Brain and Cognition, Leiden, the Netherlands
37
Palaniyappan L. Subcortical Origin of Salience Processing Deficits in Schizophrenia. Biological Psychiatry Global Open Science 2022; 3:6-7. [PMID: 36712574 PMCID: PMC9874130 DOI: 10.1016/j.bpsgos.2021.12.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Accepted: 12/24/2021] [Indexed: 02/01/2023]
Affiliation(s)
- Lena Palaniyappan
- Address correspondence to Lena Palaniyappan, M.D., Ph.D., F.R.C.P.C.
38
Bağci B, Düsmez S, Zorlu N, Bahtiyar G, Isikli S, Bayrakci A, Heinz A, Schad DJ, Sebold M. Computational analysis of probabilistic reversal learning deficits in male subjects with alcohol use disorder. Front Psychiatry 2022; 13:960238. [PMID: 36339830 PMCID: PMC9626515 DOI: 10.3389/fpsyt.2022.960238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Accepted: 09/27/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Alcohol use disorder is characterized by perseverative alcohol use despite negative consequences. This hallmark feature of addiction potentially relates to impairments in behavioral flexibility, which can be measured by probabilistic reversal learning (PRL) paradigms. Here we aimed to examine the cognitive mechanisms underlying impaired PRL task performance in patients with alcohol use disorder (AUDP) using computational models of reinforcement learning. METHODS Twenty-eight early abstinent AUDP and 27 healthy controls (HC) performed an extensive PRL paradigm. We compared conventional behavioral variables of choice (perseveration; correct responses) between groups. Moreover, we fitted Bayesian computational models to the task data to compare latent cognitive variables, including reward and punishment learning and choice consistency, between groups. RESULTS AUDP and HC did not significantly differ in direct perseveration rates after reversals. However, AUDP made fewer correct responses overall and specifically showed decreased win-stay behavior compared to HC. Interestingly, AUDP showed premature switching after no or little negative feedback, but an elevated tendency to stay when accumulated negative feedback would have made switching the better option. Computational modeling revealed that, compared to HC, AUDP showed enhanced learning from punishment, a tendency to learn less from positive feedback, and lower choice consistency. CONCLUSION Our data do not support the assumption that AUDP are characterized by increased perseveration. Instead, our findings provide evidence that enhanced negative reinforcement, decreased non-drug-related reward learning, and diminished choice consistency underlie dysfunctional choice behavior in AUDP.
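The latent variables named in this abstract (learning from reward, learning from punishment, and choice consistency) are often formalised as a delta-rule learner with two learning rates and a softmax choice rule. The sketch below assumes that formulation; the function names, the +1/-1 outcome coding and the two-option setup are illustrative assumptions, and this is not the authors' hierarchical Bayesian model.

```python
import math


def softmax_probs(q_values, beta):
    """Softmax choice probabilities; beta acts as a choice-consistency (inverse temperature) parameter."""
    top = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - top)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def update_value(q, outcome, alpha_reward, alpha_punish):
    """Delta-rule update with one learning rate for reward (+1) and another for punishment (-1)."""
    prediction_error = outcome - q
    alpha = alpha_reward if outcome > 0 else alpha_punish
    return q + alpha * prediction_error


def log_likelihood(choices, outcomes, alpha_reward, alpha_punish, beta, n_options=2):
    """Log-likelihood of a choice sequence under the assumed model (the quantity a fitting routine would maximise)."""
    q = [0.0] * n_options
    total = 0.0
    for choice, outcome in zip(choices, outcomes):
        total += math.log(softmax_probs(q, beta)[choice])
        q[choice] = update_value(q[choice], outcome, alpha_reward, alpha_punish)
    return total


# Toy sequence: option 0 is rewarded twice, then punished once.
print(log_likelihood([0, 0, 0], [1, 1, -1], alpha_reward=0.3, alpha_punish=0.6, beta=3.0))
```

In this sketch, a larger `alpha_punish` relative to `alpha_reward` makes a value fall faster after negative feedback, one way to express switching after little negative feedback, while a lower `beta` makes choices less consistent for the same values.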
Affiliation(s)
- Başak Bağci
- Department of Psychiatry, Katip Celebi University Ataturk Education and Research Hospital, İzmir, Turkey
- Selin Düsmez
- Department of Psychiatry, Midyat State Hospital, Mardin, Turkey
- Nabi Zorlu
- Department of Psychiatry, Katip Celebi University Ataturk Education and Research Hospital, İzmir, Turkey
- Gökhan Bahtiyar
- Department of Psychiatry, Bingöl State Hospital, Bingöl, Turkey
- Serhan Isikli
- Department of Psychiatry, Katip Celebi University Ataturk Education and Research Hospital, İzmir, Turkey
- Adem Bayrakci
- Department of Psychiatry, Katip Celebi University Ataturk Education and Research Hospital, İzmir, Turkey
- Andreas Heinz
- Department of Psychiatry and Neurosciences, Charité Campus Mitte (CCM), Charité-Universitätsmedizin Berlin, Berlin, Germany
- Daniel J Schad
- Department of Psychology, Health and Medical University, Potsdam, Germany
- Miriam Sebold
- Department of Psychiatry and Neurosciences, Charité Campus Mitte (CCM), Charité-Universitätsmedizin Berlin, Berlin, Germany
39
Bond K, Dunovan K, Porter A, Rubin JE, Verstynen T. Dynamic decision policy reconfiguration under outcome uncertainty. eLife 2021; 10:e65540. [PMID: 34951589 PMCID: PMC8806193 DOI: 10.7554/elife.65540] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/07/2020] [Accepted: 12/23/2021] [Indexed: 11/18/2022]
Abstract
In uncertain or unstable environments, sometimes the best decision is to change your mind. To shed light on this flexibility, we evaluated how the underlying decision policy adapts when the most rewarding action changes. Human participants performed a dynamic two-armed bandit task that manipulated the certainty in relative reward (conflict) and the reliability of action-outcomes (volatility). Continuous estimates of conflict and volatility contributed to shifts in exploratory states by changing, respectively, the rate of evidence accumulation (drift rate) and the amount of evidence needed to make a decision (boundary height). At the trialwise level, following a switch in the optimal choice, the drift rate plummets and the boundary height weakly spikes, leading to a slow exploratory state. We find that the drift rate drives most of this response, with an unreliable contribution of boundary height across experiments. Surprisingly, we find no evidence that pupillary responses are associated with decision policy changes. We conclude that humans show a stereotypical shift in their decision policies in response to environmental changes.
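Drift rate and boundary height are parameters of a drift-diffusion model. As a hedged illustration only (a generic, unbiased two-boundary diffusion simulated with Euler-Maruyama steps, with arbitrary parameter values; not the authors' fitted model), the sketch below shows that lowering the drift rate lengthens decision times, the "slow exploratory state" described above, and that raising the boundary height also slows decisions.

```python
import math
import random


def simulate_ddm(drift, boundary, rng, dt=0.005, noise=1.0, max_time=20.0):
    """Simulate one trial of a basic drift-diffusion process with Euler-Maruyama steps.

    Evidence starts midway between 0 and `boundary`. Returns (choice, decision_time),
    where choice is 1 if the upper bound was reached and 0 for the lower bound
    (if `max_time` elapses first, the choice is the side the evidence is closer to).
    """
    x = boundary / 2.0
    t = 0.0
    step_sd = noise * math.sqrt(dt)
    while 0.0 < x < boundary and t < max_time:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return (1 if x >= boundary / 2.0 else 0), t


def summarize(drift, boundary, n_trials=500, seed=0):
    """Mean decision time and proportion of upper-bound choices over simulated trials."""
    rng = random.Random(seed)
    trials = [simulate_ddm(drift, boundary, rng) for _ in range(n_trials)]
    mean_time = sum(t for _, t in trials) / n_trials
    p_upper = sum(c for c, _ in trials) / n_trials
    return mean_time, p_upper


# High versus low drift rate at the same boundary height.
print(summarize(drift=2.0, boundary=2.0))
print(summarize(drift=0.2, boundary=2.0))
```

Decision time here excludes any non-decision component; a fitted model would normally add one.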
Affiliation(s)
- Krista Bond
- Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
- Center for the Neural Basis of Cognition, Pittsburgh, United States
- Carnegie Mellon Neuroscience Institute, Pittsburgh, United States
- Kyle Dunovan
- Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
- Alexis Porter
- Department of Psychology, Northwestern University, Evanston, United States
- Jonathan E Rubin
- Center for the Neural Basis of Cognition, Pittsburgh, United States
- Department of Mathematics, University of Pittsburgh, Pittsburgh, United States
- Timothy Verstynen
- Department of Psychology, Carnegie Mellon University, Pittsburgh, United States
- Center for the Neural Basis of Cognition, Pittsburgh, United States
- Carnegie Mellon Neuroscience Institute, Pittsburgh, United States
- Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, United States
40
Spreng RN, Turner GR. From exploration to exploitation: a shifting mental mode in late life development. Trends Cogn Sci 2021; 25:1058-1071. [PMID: 34593321 PMCID: PMC8844884 DOI: 10.1016/j.tics.2021.09.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/16/2021] [Revised: 08/30/2021] [Accepted: 09/01/2021] [Indexed: 12/31/2022]
Abstract
Changes in cognition, affect, and brain function combine to promote a shift in the nature of mentation in older adulthood, favoring exploitation of prior knowledge over exploratory search as the starting point for thought and action. Age-related exploitation biases result from the accumulation of prior knowledge, reduced cognitive control, and a shift toward affective goals. These are accompanied by changes in cortical networks, as well as attention and reward circuits. By incorporating these factors into a unified account, the exploration-to-exploitation shift offers an integrative model of cognitive, affective, and brain aging. Here, we review evidence for this model, identify determinants and consequences, and survey the challenges and opportunities posed by an exploitation-biased mental mode in later life.
Affiliation(s)
- R Nathan Spreng
- Laboratory of Brain and Cognition, Montreal Neurological Institute, Department of Neurology and Neurosurgery, McGill University, Montreal, QC H3A 2B4, Canada; McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada; Departments of Psychiatry and Psychology, McGill University, Montreal, QC H3A 0G4, Canada.
- Gary R Turner
- Department of Psychology, York University, Toronto, ON M3J 1P3, Canada
41
Better living through understanding the insula: Why subregions can make all the difference. Neuropharmacology 2021; 198:108765. [PMID: 34461066 DOI: 10.1016/j.neuropharm.2021.108765] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 04/30/2021] [Revised: 07/19/2021] [Accepted: 08/23/2021] [Indexed: 02/07/2023]
Abstract
Insula function is considered critical for many motivated behaviors, with proposed functions including attention, behavioral control, emotional regulation, and goal-directed and aversion-resistant responding. Further, the insula is implicated in many neuropsychiatric conditions, including substance abuse. More recently, multiple insula subregions have been distinguished based on anatomy, connectivity, and functional contributions. Generally, the posterior insula is thought to encode more somatosensory inputs, which are integrated with limbic/emotional information in the middle insula, which in turn is integrated with cognitive processes in the anterior insula. Together, these regions provide rapid interoceptive information about the current or predicted situation, facilitating autonomic recruitment and quick, flexible action. Here, we seek to create a robust foundation from which to understand potential subregion differences, and provide direction for future studies. We address subregion differences across humans and rodents, so that the latter's mechanistic interventions can best mesh with the clinical relevance of human conditions. We first consider the insula's suggested roles in humans, then compare subregional studies, and finally describe rodent work. One primary goal is to encourage precision in describing insula subregions, since imprecision (e.g., including both posterior and anterior studies when describing insula work) does a disservice to a larger understanding of insula contributions. Additionally, we note that specific task details can greatly affect the recruitment of various subregions, requiring care and nuance in the design and interpretation of studies. Nonetheless, the central ethological importance of the insula makes continued research to uncover its mechanistic, mood, and behavioral contributions of paramount importance and interest. This article is part of the special Issue on 'Neurocircuitry Modulating Drug and Alcohol Abuse'.
42
Zhen S, Yaple ZA, Eickhoff SB, Yu R. To learn or to gain: neural signatures of exploration in human decision-making. Brain Struct Funct 2021; 227:63-76. [PMID: 34596757 DOI: 10.1007/s00429-021-02389-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/19/2020] [Accepted: 09/19/2021] [Indexed: 11/26/2022]
Abstract
Individuals take actions not only to obtain immediate rewards but also to gain information that guides future choices. An ideal exploration-exploitation balance is crucial for maximizing reward over the long run. However, the neural signatures of exploration in humans remain unclear. Using quantitative meta-analyses of functional magnetic resonance imaging experiments on exploratory behaviors, we sought to identify concordant activity pertaining to exploration across a range of experiments. The results revealed that exploration elicits concordant activity in regions associated with risk (e.g., dorsal medial prefrontal cortex and anterior insula), cognitive control (e.g., dorsolateral prefrontal cortex and inferior frontal gyrus), and motor processing (e.g., premotor cortex). These stereotaxic maps of exploration may indicate that exploration is closely linked to risk processing but is also specifically associated with regions involved in executive control. Although this interpretation should be treated as exploratory, these findings support theories positing an important role for a prefrontal-insular-motor cortical network in exploration.
Affiliation(s)
- Shanshan Zhen
- Department of Management, Hong Kong Baptist University, Hong Kong, China
- Zachary A Yaple
- Department of Psychology, Faculty of Health, York University, Toronto, ON, Canada
- Simon B Eickhoff
- Medical Faculty, Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- Rongjun Yu
- Department of Management, Hong Kong Baptist University, Hong Kong, China
43
Koralek AC, Costa RM. Dichotomous dopaminergic and noradrenergic neural states mediate distinct aspects of exploitative behavioral states. Sci Adv 2021; 7:eabh2059. [PMID: 34301604 PMCID: PMC8302134 DOI: 10.1126/sciadv.abh2059] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/22/2021] [Accepted: 06/07/2021] [Indexed: 06/13/2023]
Abstract
The balance between exploiting known actions and exploring alternatives is critical for survival and is hypothesized to rely on shifts in neuromodulation. We developed a behavioral paradigm to capture exploitative and exploratory states and imaged calcium dynamics in genetically identified dopaminergic and noradrenergic neurons. During exploitative states, characterized by motivated repetition of the same action choice, dopamine neurons in the substantia nigra pars compacta (SNc) encoding movement vigor showed a sustained elevation of basal activity that lasted many seconds. This sustained activity emerged from longer positive responses, which accumulated during exploitative action-reward bouts, and from hysteretic dynamics. Conversely, noradrenergic neurons in the locus coeruleus (LC) showed sustained inhibition of basal activity due to the accumulation of longer negative responses. Chemogenetic manipulation of these sustained dynamics revealed that dopaminergic activity mediates action drive, whereas noradrenergic activity modulates choice diversity. These data uncover the emergence of sustained neural states in dopaminergic and noradrenergic networks that mediate dissociable aspects of exploitative bouts.
Affiliation(s)
- Aaron C Koralek
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
- Rui M Costa
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA.
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
44
Gilbertson T, Steele D. Tonic dopamine, uncertainty and basal ganglia action selection. Neuroscience 2021; 466:109-124. [PMID: 34015370 DOI: 10.1016/j.neuroscience.2021.05.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/18/2020] [Revised: 05/04/2021] [Accepted: 05/08/2021] [Indexed: 11/29/2022]
Abstract
To make optimal decisions in uncertain circumstances, flexible adaptation of behaviour is required: exploring alternatives when the best choice is unknown, and exploiting what is known when that is best. Using a computational model of the basal ganglia, we propose that switches between exploratory and exploitative decisions are mediated by the interaction between tonic dopamine and cortical input to the basal ganglia. We show that a biologically detailed action selection circuit model, endowed with dopamine-dependent striatal plasticity, can optimally solve the explore-exploit problem, estimating the true underlying state of a noisy Gaussian diffusion process. Critical to the model's performance was a fluctuating level of tonic dopamine, which increased under conditions of uncertainty. Within an optimal range of tonic dopamine, explore-exploit decisions were mediated by the effects of tonic dopamine on the precision of the model's action selection mechanism. Under conditions of uncertain reward pay-out, the model's reduced selectivity allowed disinhibition of multiple alternative actions to be explored at random. Conversely, when uncertainty about reward pay-out was low, enhanced selectivity of the action selection circuit facilitated exploitation of the high-value choice. Model performance was at the level of a Kalman filter, which provides an optimal solution for the task. These simulations support the idea that this subcortical neural circuit may have evolved to facilitate decision making in non-stationary reward environments. The model generates several experimental predictions relevant to abnormal decision making in neuropsychiatric and neurological disease.
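The abstract benchmarks the basal ganglia model against a Kalman filter tracking a noisy Gaussian diffusion process. As a hedged sketch of that benchmark only, below is the textbook scalar Kalman filter for a random walk observed with Gaussian noise; the noise variances, step count and function names are illustrative assumptions, and the code does not reproduce the authors' circuit model. The filter's posterior variance is its running estimate of uncertainty, the kind of quantity that, in the account above, is associated with higher tonic dopamine.

```python
import random


def kalman_step(mean, var, observation, process_var, obs_var):
    """One predict-update cycle of a scalar Kalman filter for a Gaussian random walk.

    Returns the updated mean, the updated variance (the filter's uncertainty) and the Kalman gain.
    """
    prior_var = var + process_var              # predict: the hidden state diffuses, so uncertainty grows
    gain = prior_var / (prior_var + obs_var)   # weight given to the new observation
    new_mean = mean + gain * (observation - mean)
    new_var = (1.0 - gain) * prior_var
    return new_mean, new_var, gain


def track_random_walk(n_steps=2000, process_var=1.0, obs_var=16.0, seed=0):
    """Simulate a noisy Gaussian random walk; return the mean squared error of raw observations and of the filter."""
    rng = random.Random(seed)
    state, mean, var = 0.0, 0.0, 1.0
    raw_se = filt_se = 0.0
    for _ in range(n_steps):
        state += rng.gauss(0.0, process_var ** 0.5)
        obs = state + rng.gauss(0.0, obs_var ** 0.5)
        mean, var, _ = kalman_step(mean, var, obs, process_var, obs_var)
        raw_se += (obs - state) ** 2
        filt_se += (mean - state) ** 2
    return raw_se / n_steps, filt_se / n_steps


print(track_random_walk())
```

With these assumed variances the filtered estimate tracks the hidden state far more closely than the raw observations do, because the gain settles at a value that balances diffusion of the state against observation noise.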
Affiliation(s)
- Tom Gilbertson
- Department of Neurology, Level 6, South Block, Ninewells Hospital & Medical School, Dundee DD2 4BF, UK; Division of Imaging Science and Technology, Medical School, University of Dundee, DD2 4BF, UK.
- Douglas Steele
- Division of Imaging Science and Technology, Medical School, University of Dundee, DD2 4BF, UK
45
Wilson RC, Bonawitz E, Costa VD, Ebitz RB. Balancing exploration and exploitation with information and randomization. Curr Opin Behav Sci 2021; 38:49-56. [PMID: 33184605 PMCID: PMC7654823 DOI: 10.1016/j.cobeha.2020.10.001] [Citation(s) in RCA: 62] [Impact Index Per Article: 20.7] [Indexed: 12/28/2022]
Abstract
Explore-exploit decisions require us to trade off the benefits of exploring unknown options to learn more about them against exploiting known options for immediate reward. Such decisions are ubiquitous in nature, but from a computational perspective they are notoriously hard. There is therefore much interest in how humans and animals make these decisions, and recently there has been an explosion of research in this area. Here we provide a biased and incomplete snapshot of this field, focusing on the major finding that many organisms use two distinct strategies to solve the explore-exploit dilemma: a bias for information ('directed exploration') and the randomization of choice ('random exploration'). We review evidence for the existence of these strategies, their computational properties, their neural implementations, and how directed and random exploration vary over the lifespan. We conclude by highlighting open questions in this field that are ripe to both explore and exploit.
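The two strategies named in this abstract are often modelled together in a single choice rule: an information bonus proportional to an option's uncertainty (directed exploration) and decision noise set by a softmax temperature (random exploration). The sketch below assumes that common formulation; the additive bonus, the parameter names and the example values are illustrative assumptions rather than a specific model from the review.

```python
import math


def choice_probabilities(means, uncertainties, info_bonus, temperature):
    """Softmax choice over expected value plus an uncertainty bonus.

    info_bonus > 0 adds directed exploration (a bias toward more uncertain options);
    a larger temperature adds random exploration (noisier, more uniform choices).
    temperature must be positive.
    """
    utilities = [m + info_bonus * u for m, u in zip(means, uncertainties)]
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Two options with equal expected value; option 1 is less well known.
print(choice_probabilities([1.0, 1.0], [0.0, 1.0], info_bonus=0.0, temperature=1.0))
print(choice_probabilities([1.0, 1.0], [0.0, 1.0], info_bonus=0.5, temperature=1.0))
```

With equal expected values, the bonus alone shifts choice toward the uncertain option, while raising `temperature` pushes all probabilities toward uniform regardless of value or uncertainty, so the two parameters capture separable effects.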
Affiliation(s)
- Robert C. Wilson
- Department of Psychology, University of Arizona, Tucson AZ USA
- Cognitive Science Program, University of Arizona, Tucson AZ USA
- Evelyn F. McKnight Brain Institute, University of Arizona, Tucson AZ USA
- Vincent D. Costa
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland OR USA
- R. Becket Ebitz
- Department of Neuroscience, University of Montréal, Montréal, Québec, Canada
46
Wiehler A, Chakroun K, Peters J. Attenuated Directed Exploration during Reinforcement Learning in Gambling Disorder. J Neurosci 2021; 41:2512-2522. [PMID: 33531415 PMCID: PMC7984586 DOI: 10.1523/jneurosci.1607-20.2021] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 06/26/2020] [Revised: 01/18/2021] [Accepted: 01/22/2021] [Indexed: 12/30/2022]
Abstract
Gambling disorder (GD) is a behavioral addiction associated with impairments in value-based decision-making and behavioral flexibility and might be linked to changes in the dopamine system. Maximizing long-term rewards requires a flexible trade-off between the exploitation of known options and the exploration of novel options for information gain. This exploration-exploitation trade-off is thought to depend on dopamine neurotransmission. We hypothesized that human gamblers would show a reduction in directed (uncertainty-based) exploration, accompanied by changes in brain activity in a fronto-parietal exploration-related network. Twenty-three frequent, non-treatment-seeking gamblers and twenty-three healthy matched controls (all male) performed a four-armed bandit task during functional magnetic resonance imaging (fMRI). Computational modeling using hierarchical Bayesian parameter estimation revealed signatures of directed exploration, random exploration, and perseveration in both groups. Gamblers showed a reduction in directed exploration, whereas random exploration and perseveration were similar between groups. Neuroimaging revealed no evidence for group differences in neural representations of basic task variables (expected value, prediction errors). Our hypothesis of reduced frontal pole (FP) recruitment in gamblers was not supported. Exploratory analyses showed that during directed exploration, gamblers showed reduced parietal cortex and substantia-nigra/ventral-tegmental-area activity. Cross-validated classification analyses revealed that connectivity in an exploration-related network was predictive of group status, suggesting that connectivity patterns might be more predictive of problem gambling than univariate effects. Findings reveal specific reductions of strategic exploration in gamblers that might be linked to altered processing in a fronto-parietal network and/or changes in dopamine neurotransmission implicated in GD.
SIGNIFICANCE STATEMENT Wiehler et al. (2021) report that gamblers rely less on the strategic exploration of unknown but potentially better rewards during reward learning. This is reflected in a related network of brain activity. Parameters of this network can be used to predict the presence of problem gambling behavior in participants.
Affiliation(s)
- A Wiehler
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Université de Paris, Paris F-75006, France
- Department of Psychiatry, Service Hospitalo-Universitaire, Groupe Hospitalier Universitaire Paris Psychiatrie & Neurosciences, Paris F-75014, France
- K Chakroun
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- J Peters
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Department of Psychology, Biological Psychology, University of Cologne, Cologne 50923, Germany
47
Pisupati S, Chartarifsky-Lynn L, Khanal A, Churchland AK. Lapses in perceptual decisions reflect exploration. eLife 2021; 10:e55490. [PMID: 33427198 PMCID: PMC7846276 DOI: 10.7554/elife.55490] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 01/26/2020] [Accepted: 01/10/2021] [Indexed: 12/17/2022]
Abstract
Perceptual decision-makers often display a constant rate of errors independent of evidence strength. These ‘lapses’ are treated as a nuisance arising from noise tangential to the decision, e.g. inattention or motor errors. Here, we use a multisensory decision task in rats to demonstrate that these explanations cannot account for lapses’ stimulus dependence. We propose a novel explanation: lapses reflect a strategic trade-off between exploiting known rewarding actions and exploring uncertain ones. We tested this model’s predictions by selectively manipulating one action’s reward magnitude or probability. As uniquely predicted by this model, changes were restricted to lapses associated with that action. Finally, we show that lapses are a powerful tool for assigning decision-related computations to neural structures based on disruption experiments (here, posterior striatum and secondary motor cortex). These results suggest that lapses reflect an integral component of decision-making and are informative about action values in normal and disrupted brain states.
Affiliation(s)
- Sashank Pisupati
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States; CSHL School of Biological Sciences, Cold Spring Harbor, New York, United States
- Lital Chartarifsky-Lynn
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States; CSHL School of Biological Sciences, Cold Spring Harbor, New York, United States
- Anup Khanal
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States
48
Dubois M, Habicht J, Michely J, Moran R, Dolan RJ, Hauser TU. Human complex exploration strategies are enriched by noradrenaline-modulated heuristics. eLife 2021; 10:e59907. [PMID: 33393461 PMCID: PMC7815309 DOI: 10.7554/elife.59907] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 06/12/2020] [Accepted: 01/03/2021] [Indexed: 01/15/2023]
Abstract
An exploration-exploitation trade-off, the arbitration between sampling a lesser-known option and a known rich one, is thought to be solved using computationally demanding exploration algorithms. Given known limitations in human cognitive resources, we hypothesised the presence of additional, cheaper strategies. We examined choice behaviour for such heuristics and show that they involve a value-free random exploration, which ignores all prior knowledge, and a novelty exploration that targets novel options alone. In a double-blind, placebo-controlled drug study assessing contributions of dopamine (400 mg amisulpride) and noradrenaline (40 mg propranolol), we show that value-free random exploration is attenuated under the influence of propranolol, but not under amisulpride. Our findings demonstrate that humans deploy distinct, computationally cheap exploration strategies and that value-free random exploration is under noradrenergic control.
Affiliation(s)
- Magda Dubois
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Johanna Habicht
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Jochen Michely
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Department of Psychiatry and Psychotherapy, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Rani Moran
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Ray J Dolan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
- Tobias U Hauser
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
49
Human Belief State-Based Exploration and Exploitation in an Information-Selective Symmetric Reversal Bandit Task. Computational Brain & Behavior 2021; 4:442-462. [PMID: 34368622 PMCID: PMC8327602 DOI: 10.1007/s42113-021-00112-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 05/24/2021] [Indexed: 02/07/2023]
Abstract
Humans often face sequential decision-making problems in which information about the environmental reward structure is detached from rewards for a subset of actions. In the current exploratory study, we introduce an information-selective symmetric reversal bandit task to model such situations and obtain choice data on this task from 24 participants. To arbitrate between different decision-making strategies that participants may use on this task, we developed a set of probabilistic agent-based behavioral models, including exploitative and explorative Bayesian agents, as well as heuristic control agents. After validating the model and parameter recovery properties of our model set and summarizing the participants' choice data descriptively, we used a maximum likelihood approach to evaluate the participants' choice data from the perspective of our model set. In brief, we provide quantitative evidence that participants employ a belief state-based hybrid explorative-exploitative strategy on the information-selective symmetric reversal bandit task, lending further support to the finding that humans are guided by their subjective uncertainty when solving exploration-exploitation dilemmas.
Supplementary information: The online version contains supplementary material available at 10.1007/s42113-021-00112-3.