1. Ramírez-Ruiz J, Grytskyy D, Mastrogiuseppe C, Habib Y, Moreno-Bote R. Complex behavior from intrinsic motivation to occupy future action-state path space. Nat Commun 2024;15:6368. PMID: 39075046; PMCID: PMC11286966; DOI: 10.1038/s41467-024-49711-1.
Abstract
Most theories of behavior posit that agents tend to maximize some form of reward or utility. However, animals very often move with curiosity and seem to be motivated in a reward-free manner. Here we abandon the idea of reward maximization and propose that the goal of behavior is maximizing occupancy of future paths of actions and states. According to this maximum occupancy principle, rewards are the means to occupy path space, not the goal per se; goal-directedness simply emerges as a rational way of searching for resources so that movement, understood broadly, never ends. We find that action-state path entropy is the only measure consistent with additivity and other intuitive properties of expected future action-state path occupancy. We provide analytical expressions that relate the optimal policy and state-value function and prove convergence of our value iteration algorithm. Using discrete and continuous state tasks, including a high-dimensional controller, we show that complex behaviors such as "dancing", hide-and-seek, and a basic form of altruistic behavior naturally result from the intrinsic motivation to occupy path space. All in all, we present a theory of behavior that generates both variability and goal-directedness in the absence of reward maximization.
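The paper's own algorithm is not reproduced here, but the flavor of reward-free, entropy-driven value iteration can be sketched with a soft (log-sum-exp) Bellman backup on a toy problem. The two-state MDP and the weights alpha, beta, gamma below are illustrative assumptions, not the paper's model:

```python
import numpy as np

# A two-state, two-action toy MDP. Action 0 stays put deterministically;
# action 1 switches state stochastically. All numbers are illustrative.
P = np.array([
    [[1.0, 0.0], [0.3, 0.7]],   # P[s=0, a, s']
    [[0.0, 1.0], [0.7, 0.3]],   # P[s=1, a, s']
])
alpha, beta, gamma = 1.0, 1.0, 0.9   # action/state entropy weights, discount

def entropy(p):
    """Shannon entropy (nats) of a transition distribution."""
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

# Reward-free soft value iteration: the "return" is discounted action entropy
# plus next-state entropy, so the backup is a log-sum-exp over actions.
V = np.zeros(2)
for _ in range(500):
    Q = np.array([[beta * entropy(P[s, a]) + gamma * P[s, a] @ V
                   for a in range(2)] for s in range(2)])
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))

policy = np.exp(Q / alpha)
policy /= policy.sum(axis=1, keepdims=True)   # softmax over Q / alpha
```

At convergence both states have equal value and the policy prefers the stochastic action (probability around 0.65 here), illustrating how entropy seeking alone yields a definite, non-uniform policy without any extrinsic reward.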
Affiliation(s)
- Jorge Ramírez-Ruiz
- Center for Brain and Cognition, Departament d'Enginyeria i Escola d'Enginyeria, Universitat Pompeu Fabra, Barcelona, Spain
- Dmytro Grytskyy
- Center for Brain and Cognition, Departament d'Enginyeria i Escola d'Enginyeria, Universitat Pompeu Fabra, Barcelona, Spain
- Chiara Mastrogiuseppe
- Center for Brain and Cognition, Departament d'Enginyeria i Escola d'Enginyeria, Universitat Pompeu Fabra, Barcelona, Spain
- Yamen Habib
- Center for Brain and Cognition, Departament d'Enginyeria i Escola d'Enginyeria, Universitat Pompeu Fabra, Barcelona, Spain
- Rubén Moreno-Bote
- Center for Brain and Cognition, Departament d'Enginyeria i Escola d'Enginyeria, Universitat Pompeu Fabra, Barcelona, Spain
- Serra Húnter Fellow Programme, Universitat Pompeu Fabra, Barcelona, Spain
2. Kobayashi K, Kable JW. Neural mechanisms of information seeking. Neuron 2024;112:1741-1756. PMID: 38703774; DOI: 10.1016/j.neuron.2024.04.008.
Abstract
We ubiquitously seek information to make better decisions. Particularly in the modern age, when more information is available at our fingertips than ever, the information we choose to collect determines the quality of our decisions. Decision neuroscience has long adopted empirical approaches in which the information available to decision-makers is fully controlled by the researchers, leaving the neural mechanisms of information seeking less well understood. Although information seeking has long been studied in the context of the exploration-exploitation trade-off, recent studies have widened the scope to investigate more overt information seeking in a way distinct from other decision processes. Insights gained from these studies, accumulated over the last few years, raise the possibility that information seeking is driven by the reward system signaling the subjective value of information. Here, we review findings from these recent studies, highlighting the conceptual and empirical relationships between distinct literatures, and discuss future research directions necessary to establish a more comprehensive understanding of how individuals seek information as part of value-based decision-making.
Affiliation(s)
- Kenji Kobayashi
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Joseph W Kable
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
3. Tang H, Bartolo-Orozco R, Averbeck BB. Ventral frontostriatal circuitry mediates the computation of reinforcement from symbolic gains and losses. bioRxiv 2024:2024.04.03.587097. PMID: 38617219; PMCID: PMC11014508; DOI: 10.1101/2024.04.03.587097.
Abstract
Reinforcement learning (RL), particularly in primates, is often driven by symbolic outcomes. However, it is usually studied with primary reinforcers. To examine the neural mechanisms underlying learning from symbolic outcomes, we trained monkeys on a task in which they learned to choose options that led to gains of tokens and avoid choosing options that led to losses of tokens. We then recorded simultaneously from the orbitofrontal cortex (OFC), ventral striatum (VS), amygdala (AMY), and the mediodorsal thalamus (MDt). We found that the OFC played a dominant role in coding token outcomes and token prediction errors. The other areas contributed complementary functions with the VS coding appetitive outcomes and the AMY coding the salience of outcomes. The MDt coded actions and relayed information about tokens between the OFC and VS. Thus, OFC leads the process of symbolic reinforcement learning in the ventral frontostriatal circuitry.
4. Alejandro RJ, Holroyd CB. Hierarchical control over foraging behavior by anterior cingulate cortex. Neurosci Biobehav Rev 2024;160:105623. PMID: 38490499; DOI: 10.1016/j.neubiorev.2024.105623.
Abstract
Foraging is a natural behavior that involves making sequential decisions to maximize rewards while minimizing the costs incurred in doing so. The prevalence of foraging across species suggests that a common brain computation underlies its implementation. Although the anterior cingulate cortex (ACC) is believed to contribute to foraging behavior, its specific role has been contentious, with predominant theories arguing either that it encodes environmental value or choice difficulty. Additionally, recent attempts to characterize foraging have taken place within the reinforcement learning framework, with increasingly complex models scaling with task complexity. Here we review reinforcement learning foraging models, highlighting the hierarchical structure of many foraging problems. We extend this literature by proposing that the ACC guides foraging according to principles of model-based hierarchical reinforcement learning. This idea holds that ACC function is organized hierarchically along a rostral-caudal gradient, with rostral structures monitoring the status and completion of high-level task goals (like finding food), and midcingulate structures overseeing the execution of task options (subgoals, like harvesting fruit) and lower-level actions (such as grabbing an apple).
Affiliation(s)
- Clay B Holroyd
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
5. Venditto SJC, Miller KJ, Brody CD, Daw ND. Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning. bioRxiv 2024:2024.02.28.582617. PMID: 38464244; PMCID: PMC10925334; DOI: 10.1101/2024.02.28.582617.
Abstract
Different brain systems have been hypothesized to subserve multiple "experts" that compete to generate behavior. In reinforcement learning, two general processes, one model-free (MF) and one model-based (MB), are often modeled as a mixture of agents (MoA) and hypothesized to capture differences between automaticity and deliberation. However, shifts in strategy cannot be captured by a static MoA. To investigate such dynamics, we present the mixture-of-agents hidden Markov model (MoA-HMM), which simultaneously learns inferred action values from a set of agents and the temporal dynamics of underlying "hidden" states that capture shifts in agent contributions over time. Applying this model to a multi-step, reward-guided task in rats reveals a progression of within-session strategies: a shift from initial MB exploration to MB exploitation, and finally to reduced engagement. The inferred states predict changes in both response time and orbitofrontal cortex (OFC) neural encoding during the task, suggesting that these states capture real shifts in dynamics.
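As a rough illustration of the inference at the heart of such a model, the sketch below runs a standard HMM forward pass in which each hidden state mixes two "experts" with different weights. All agent values, mixture weights, and transition probabilities are invented for the example; this is not the paper's fitted MoA-HMM:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical per-trial action values from two agents (2 actions per trial).
q_mb = np.array([[1.0, 0.0], [0.8, 0.2], [0.1, 0.9]])   # "model-based" values
q_mf = np.array([[0.2, 0.8], [0.1, 0.9], [0.7, 0.3]])   # "model-free" values
choices = [0, 0, 1]                                      # observed choices

W = np.array([[2.0, 0.0],    # hidden state 0: MB agent dominates
              [0.0, 2.0]])   # hidden state 1: MF agent dominates
T = np.array([[0.9, 0.1],    # sticky hidden-state transition matrix
              [0.1, 0.9]])

# Forward algorithm: track P(z_t | choices so far).
alpha = np.array([0.5, 0.5])                 # uniform prior over hidden states
for t, c in enumerate(choices):
    lik = np.array([softmax(W[z, 0] * q_mb[t] + W[z, 1] * q_mf[t])[c]
                    for z in range(2)])      # choice likelihood under each state
    alpha = lik * (alpha @ T) if t else lik * alpha
    alpha /= alpha.sum()                     # normalized forward message
```

Here the three choices are most consistent with the MB-weighted state, so the posterior ends up concentrated on hidden state 0.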
6. Giarrocco F, Costa VD, Basile BM, Pujara MS, Murray EA, Averbeck BB. Motor System-Dependent Effects of Amygdala and Ventral Striatum Lesions on Explore-Exploit Behaviors. J Neurosci 2024;44:e1206232023. PMID: 38296647; PMCID: PMC10860650; DOI: 10.1523/jneurosci.1206-23.2023.
Abstract
Deciding whether to forego immediate rewards or explore new opportunities is a key component of flexible behavior and is critical for the survival of the species. Although previous studies have shown that different cortical and subcortical areas, including the amygdala and ventral striatum (VS), are implicated in representing the immediate (exploitative) and future (explorative) value of choices, the effect of the motor system used to make choices has not been examined. Here, we tested male rhesus macaques with amygdala or VS lesions on two versions of a three-arm bandit task where choices were registered with either a saccade or an arm movement. In both tasks we presented the monkeys with explore-exploit tradeoffs by periodically replacing familiar options with novel options that had unknown reward probabilities. We found that monkeys explored more with saccades but showed better learning with arm movements. VS lesions caused the monkeys to be more explorative with arm movements and less explorative with saccades, although this may have been due to an overall decrease in performance. VS lesions affected the monkeys' ability to learn novel stimulus-reward associations in both tasks, while after amygdala lesions this effect was stronger when choices were made with saccades. Further, on average, VS and amygdala lesions reduced the monkeys' ability to choose better options only when choices were made with a saccade. These results show that learning reward value associations to manage explore-exploit behaviors is motor system dependent and they further define the contributions of amygdala and VS to reinforcement learning.
Affiliation(s)
- Franco Giarrocco
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415
- Vincent D Costa
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415
- Division of Neuroscience, Oregon National Primate Research Center, Beaverton, OR 97006
- Benjamin M Basile
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415
- Department of Psychology, Dickinson College, Carlisle, PA 17013
- Maia S Pujara
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415
- Elisabeth A Murray
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415
- Bruno B Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415
7. Rolls ET, Deco G, Huang CC, Feng J. The connectivity of the human frontal pole cortex, and a theory of its involvement in exploit versus explore. Cereb Cortex 2024;34:bhad416. PMID: 37991264; DOI: 10.1093/cercor/bhad416.
Abstract
The frontal pole is implicated in humans in decisions about whether to exploit resources versus explore alternatives. Effective connectivity, functional connectivity, and tractography were measured between six human frontal pole regions and, for comparison, 13 dorsolateral and dorsal prefrontal cortex regions, and the 360 cortical regions in the Human Connectome Project Multi-modal parcellation atlas in 171 HCP participants. The frontal pole regions have effective connectivity with dorsolateral prefrontal cortex regions and the dorsal prefrontal cortex, both implicated in working memory, and with the orbitofrontal and anterior cingulate cortex reward/non-reward system. There is also connectivity with temporal lobe, inferior parietal, and posterior cingulate regions. Given this new connectivity evidence, and evidence from activations and damage, it is proposed that the frontal pole cortex contains autoassociation attractor networks that are normally stable in a short-term memory state, and that maintain stability in the other prefrontal networks during stable exploitation of goals and strategies. However, if an input signaling unexpected reward, non-reward, or punishment is received from the orbitofrontal or anterior cingulate cortex, this destabilizes the frontal pole and thereby other prefrontal networks, enabling exploration of competing alternative goals and strategies. The frontal pole's connectivity with reward systems may be key in the choice between exploit and explore.
Affiliation(s)
- Edmund T Rolls
- Oxford Centre for Computational Neuroscience, Oxford, United Kingdom
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China
- Gustavo Deco
- Center for Brain and Cognition, Computational Neuroscience Group, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Roc Boronat 138, Barcelona 08018, Spain
- Brain and Cognition, Pompeu Fabra University, Barcelona 08018, Spain
- Institució Catalana de la Recerca i Estudis Avançats (ICREA), Universitat Pompeu Fabra, Passeig Lluís Companys 23, Barcelona 08010, Spain
- Chu-Chung Huang
- Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200602, China
- Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai 200602, China
- Jianfeng Feng
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom
- Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200403, China
8. Wyatt LE, Hewan PA, Hogeveen J, Spreng RN, Turner GR. Exploration versus exploitation decisions in the human brain: A systematic review of functional neuroimaging and neuropsychological studies. Neuropsychologia 2024;192:108740. PMID: 38036246; DOI: 10.1016/j.neuropsychologia.2023.108740.
Abstract
Thoughts and actions are often driven by a decision either to explore new avenues with unknown outcomes or to exploit known options with predictable outcomes. Yet the neural mechanisms underlying this exploration-exploitation trade-off in humans remain poorly understood. This is attributable to variability in the operationalization of exploration and exploitation as psychological constructs, as well as the heterogeneity of experimental protocols and paradigms used to study these choice behaviours. To address this gap, here we present a comprehensive review of the literature on the neural basis of explore-exploit decision-making in humans. We first conducted a systematic review of functional magnetic resonance imaging (fMRI) studies of exploration- versus exploitation-based decision-making in healthy adult humans during foraging, reinforcement learning, and information search. Eleven fMRI studies met the inclusion criteria for this review. Adopting a network neuroscience framework, we synthesized the findings across these studies and found that exploration-based choice was associated with engagement of attentional, control, and salience networks. In contrast, exploitation-based choice was associated with engagement of default network brain regions. We interpret these results in the context of a network architecture that supports flexible switching between externally and internally directed cognitive processes, necessary for adaptive, goal-directed behaviour. To further investigate potential neural mechanisms underlying the exploration-exploitation trade-off, we next surveyed studies involving neurodevelopmental, neuropsychological, and neuropsychiatric disorders, as well as lifespan development and neurodegenerative diseases. We observed striking differences in patterns of explore-exploit decision-making across these populations, again suggesting that these two decision-making modes are supported by independent neural circuits.
Taken together, our review highlights the need for precision-mapping of the neural circuitry and behavioural correlates associated with exploration and exploitation in humans. Characterizing exploration versus exploitation decision-making biases may offer a novel, trans-diagnostic approach to assessment, surveillance, and intervention for cognitive decline and dysfunction in normal development and clinical populations.
Affiliation(s)
- Lindsay E Wyatt
- Department of Psychology, York University, Toronto, ON, Canada
- Patrick A Hewan
- Department of Psychology, York University, Toronto, ON, Canada
- Jeremy Hogeveen
- Department of Psychology, The University of New Mexico, Albuquerque, NM, USA
- R Nathan Spreng
- Montréal Neurological Institute, Department of Neurology and Neurosurgery, McGill University, Montréal, QC, H3A 2B4, Canada
- Department of Psychology, McGill University, Montréal, QC, Canada
- Department of Psychiatry, McGill University, Montréal, QC, Canada
- McConnell Brain Imaging Centre, Montréal Neurological Institute, McGill University, Montréal, QC, Canada
- Gary R Turner
- Department of Psychology, York University, Toronto, ON, Canada
9. Xu Y, Harms MB, Green CS, Wilson RC, Pollak SD. Childhood unpredictability and the development of exploration. Proc Natl Acad Sci U S A 2023;120:e2303869120. PMID: 38011553; DOI: 10.1073/pnas.2303869120.
Abstract
Early in development, the process of exploration helps children gather new information that fosters learning about the world. Yet, it is unclear how childhood experiences may influence the way humans approach new learning. What influences decisions to exploit known, familiar options versus trying a novel alternative? We found that childhood unpredictability, characterized by unpredictable caregiving and unstable living environments, was associated with reduced exploratory behavior. This effect holds while controlling for individual differences, including anxiety and stress. Individuals who perceived their childhoods as unpredictable explored less and were instead more likely to repeat previous choices (habitual responding). They were also more sensitive to uncertainty than to potential rewards, even when the familiar options yielded lower rewards. We examined these effects across multiple task contexts and via both in-person (N = 78) and online replication (N = 84) studies among 10- to 13-y-olds. Results are discussed in terms of the potential cascading effects of unpredictable environments on the development of decision-making and the effects of early experience on subsequent learning.
Affiliation(s)
- Yuyan Xu
- Department of Psychology, University of Wisconsin-Madison, Madison, WI 53706
- Madeline B Harms
- Department of Psychology, University of Minnesota Duluth, Duluth, MN 55812
- C Shawn Green
- Department of Psychology, University of Wisconsin-Madison, Madison, WI 53706
- Robert C Wilson
- Department of Psychology, University of Arizona, Tucson, AZ 85721
- Cognitive Science Program, University of Arizona, Tucson, AZ 85716
- Seth D Pollak
- Department of Psychology, University of Wisconsin-Madison, Madison, WI 53706
10. Campbell EM, Singh G, Claus ED, Witkiewitz K, Costa VD, Hogeveen J, Cavanagh JF. Electrophysiological Markers of Aberrant Cue-Specific Exploration in Hazardous Drinkers. Comput Psychiatry 2023;7:47-59. PMID: 38774639; PMCID: PMC11104413; DOI: 10.5334/cpsy.96.
Abstract
Background: Hazardous drinking is associated with maladaptive alcohol-related decision-making. Existing studies have often focused on how participants learn to exploit familiar cues based on prior reinforcement, but little is known about the mechanisms that drive hazardous drinkers to explore novel alcohol cues when their value is not known.
Methods: We investigated exploration of novel alcohol and non-alcohol cues in hazardous drinkers (N = 27) and control participants (N = 26) during electroencephalography (EEG). A normative computational model with two free parameters was fit to estimate participants' weighting of the future value of exploration and the immediate value of exploitation.
Results: Hazardous drinkers demonstrated increased exploration of novel alcohol cues and, conversely, an increased probability of exploiting familiar alternatives instead of exploring novel non-alcohol cues. The motivation to explore novel alcohol stimuli in hazardous drinkers was driven by an elevated relative future valuation of uncertain alcohol cues. The P3a component predicted more exploratory decision policies driven by an enhanced relative future valuation of novel alcohol cues. The P3b did not predict choice behavior, but computational parameter estimates suggested that hazardous drinkers with an enhanced P3b to alcohol cues were likely to learn to exploit their immediate expected value.
Conclusions: Hazardous drinkers did not display atypical choice behavior, different P3a/P3b amplitudes, or computational estimates for novel non-alcohol cues, diverging from previous studies in addiction showing atypical generalized explore-exploit decisions with non-drug-related cues. These findings reveal that cue-specific neural computations may drive aberrant alcohol-related decision-making in hazardous drinkers, highlighting the importance of drug-relevant cues in studies of decision-making in addiction.
Affiliation(s)
- Ethan M. Campbell
- Department of Psychology & Psychology Clinical Neuroscience Center, University of New Mexico, US
- Garima Singh
- Department of Psychology & Psychology Clinical Neuroscience Center, University of New Mexico, US
- Eric D. Claus
- Department of Biobehavioral Health, Pennsylvania State University, US
- Katie Witkiewitz
- Department of Psychology & Psychology Clinical Neuroscience Center, University of New Mexico, US
- Vincent D. Costa
- Division of Neuroscience, Oregon National Primate Research Center, US
- Jeremy Hogeveen
- Department of Psychology & Psychology Clinical Neuroscience Center, University of New Mexico, US
- James F. Cavanagh
- Department of Psychology & Psychology Clinical Neuroscience Center, University of New Mexico, US
11. Lee JK, Rouault M, Wyart V. Adaptive tuning of human learning and choice variability to unexpected uncertainty. Sci Adv 2023;9:eadd0501. PMID: 36989365; PMCID: PMC10058239; DOI: 10.1126/sciadv.add0501.
Abstract
Human value-based decisions are notably variable under uncertainty. This variability is known to arise from two distinct sources: variable choices aimed at exploring available options, and imprecise learning of option values due to limited cognitive resources. However, whether these two sources of decision variability are tuned to their specific costs and benefits remains unclear. To address this question, we compared the effects of expected and unexpected uncertainty on decision-making in the same reinforcement learning task. Across two large behavioral datasets, we found that in response to unexpected uncertainty humans choose more variably between options but simultaneously learn option values more precisely. Using simulations of learning agents, we demonstrate that these opposite adjustments reflect adaptive tuning of exploration and learning precision to the structure of uncertainty. Together, these findings indicate that humans regulate not only how much they explore uncertain options but also how precisely they learn the values of these options.
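A minimal version of such a learning-agent simulation, assuming a noisy delta-rule learner with a softmax choice rule; the task parameters and values below are illustrative, not taken from the paper's datasets:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n_trials, p_reward, lr, learn_sd, temp):
    """Two-armed bandit: noisy delta-rule learning + softmax choice.
    learn_sd models learning imprecision; temp models choice variability."""
    q = np.zeros(2)
    choices = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        p_choose_1 = 1.0 / (1.0 + np.exp(-(q[1] - q[0]) / temp))
        c = int(rng.random() < p_choose_1)          # softmax (logistic) choice
        r = float(rng.random() < p_reward[c])       # Bernoulli reward
        q[c] += lr * (r - q[c]) + rng.normal(0.0, learn_sd)  # imprecise update
        choices[t] = c
    return choices

choices = simulate(1000, p_reward=[0.2, 0.8], lr=0.3, learn_sd=0.05, temp=0.1)
```

Raising `temp` makes choices more variable (exploration) while raising `learn_sd` makes value estimates more imprecise, so sweeping the two lets one separate the two sources of decision variability the authors distinguish.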
Affiliation(s)
- Junseok K. Lee
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale (Inserm), Paris, France
- Département d'Études Cognitives, École Normale Supérieure, Université PSL, Paris, France
- Marion Rouault
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale (Inserm), Paris, France
- Département d'Études Cognitives, École Normale Supérieure, Université PSL, Paris, France
- Valentin Wyart
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale (Inserm), Paris, France
- Département d'Études Cognitives, École Normale Supérieure, Université PSL, Paris, France
- Institut du Psychotraumatisme de l'Enfant et de l'Adolescent, Conseil Départemental Yvelines et Hauts-de-Seine, Versailles, France
12. Khatib D, Morris G. Spontaneous behaviour is shaped by dopamine in two ways. Nature 2023;614:36-37. PMID: 36653602; DOI: 10.1038/d41586-023-00004-5.
13. Burk DC, Averbeck BB. Environmental uncertainty and the advantage of impulsive choice strategies. PLoS Comput Biol 2023;19:e1010873. PMID: 36716320; PMCID: PMC9910799; DOI: 10.1371/journal.pcbi.1010873.
Abstract
Choice impulsivity is characterized by the choice of immediate, smaller reward options over future, larger reward options, and is often thought to be associated with negative life outcomes. However, some environments make future rewards more uncertain, and in these environments impulsive choices can be beneficial. Here we examined the conditions under which impulsive vs. non-impulsive decision strategies would be advantageous. We used Markov Decision Processes (MDPs) to model three common decision-making tasks: Temporal Discounting, Information Sampling, and an Explore-Exploit task. We manipulated environmental variables to create circumstances where future outcomes were relatively uncertain. We then manipulated the discount factor of an MDP agent, which affects the value of immediate versus future rewards, to model impulsive and non-impulsive behavior. This allowed us to examine the performance of impulsive and non-impulsive agents in more or less predictable environments. In Temporal Discounting, we manipulated the transition probability to delayed rewards and found that the agent with the lower discount factor (i.e. the impulsive agent) collected more average reward than the agent with a higher discount factor (the non-impulsive agent) by selecting immediate reward options when the probability of receiving the future reward was low. In the Information Sampling task, we manipulated the amount of information obtained with each sample. When sampling led to small information gains, the impulsive MDP agent collected more average reward than the non-impulsive agent. Third, in the Explore-Exploit task, we manipulated the substitution rate for novel options. When the substitution rate was high, the impulsive agent again performed better than the non-impulsive agent, as it explored the novel options less and instead exploited options with known reward values. The results of these analyses show that impulsivity can be advantageous in environments that are unexpectedly uncertain.
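The flavor of the Temporal Discounting result can be conveyed with a deliberately tiny numerical sketch rather than the paper's full MDPs: when an agent's model overestimates the probability of a delayed reward (the "unexpected" uncertainty), only a sufficiently patient agent waits for it, so the impulsive agent's immediate choices earn more on average. All numbers below are hypothetical:

```python
# Immediate option pays 1 now; delayed option pays R after D steps with true
# probability p_true, but the agent plans with an optimistic model p_model.
R, D = 3.0, 3
p_model, p_true = 0.8, 0.2

def subjective_value_delayed(gamma):
    """Discounted value the agent assigns to waiting for the delayed reward."""
    return (gamma ** D) * p_model * R

def realized_reward(gamma):
    """Expected reward actually collected by an agent with discount gamma."""
    waits = subjective_value_delayed(gamma) > 1.0   # 1.0 = immediate payoff
    return p_true * R if waits else 1.0

impulsive = realized_reward(0.5)    # heavy discounting: takes the sure 1.0
patient = realized_reward(0.95)     # waits, but the reward rarely arrives
```

With these numbers the patient agent waits (subjective value about 2.06 > 1) yet realizes only 0.6 on average, while the impulsive agent banks 1.0, mirroring the qualitative advantage of impulsivity under unexpectedly uncertain delayed rewards.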
Affiliation(s)
- Diana C. Burk
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America
- Bruno B. Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America
14. Schach S, Lindner A, Braun DA. Bounded rational decision-making models suggest capacity-limited concurrent motor planning in human posterior parietal and frontal cortex. PLoS Comput Biol 2022;18:e1010585. PMID: 36227842; PMCID: PMC9560147; DOI: 10.1371/journal.pcbi.1010585.
Abstract
While traditional theories of sensorimotor processing have often assumed a serial decision-making pipeline, more recent approaches have suggested that multiple actions may be planned concurrently and vie for execution. Evidence for the latter almost exclusively stems from electrophysiological studies in posterior parietal and premotor cortex of monkeys. Here we study concurrent prospective motor planning in humans by recording functional magnetic resonance imaging (fMRI) during a delayed response task engaging movement sequences towards multiple potential targets. We find that also in human posterior parietal and premotor cortex delay activity modulates both with sequence complexity and the number of potential targets. We tested the hypothesis that this modulation is best explained by concurrent prospective planning as opposed to the mere maintenance of potential targets in memory. We devise a bounded rationality model with information constraints that optimally assigns information resources for planning and memory for this task and determine predicted information profiles according to the two hypotheses. When regressing delay activity on these model predictions, we find that the concurrent prospective planning strategy provides a significantly better explanation of the fMRI-signal modulations. Moreover, we find that concurrent prospective planning is more costly and thus limited for most subjects, as expressed by the best fitting information capacities. We conclude that bounded rational decision-making models allow relating both behavior and neural representations to utilitarian task descriptions based on bounded optimal information-processing assumptions. When the future is uncertain, it can be beneficial to concurrently plan several action possibilities in advance. Electrophysiological research found evidence in monkeys that brain regions in posterior parietal and promotor cortex are indeed capable of planning several actions in parallel. 
We now used fMRI to study brain activity in these brain regions in humans. For our analyses we applied bounded rationality models that optimally assign information resources to fMRI activity in a complex motor planning task. We find that theoretical information costs of concurrent prospective planning explained fMRI activity profiles significantly better than assuming alternative memory-based strategies. Moreover, exploiting the model allowed us to quantify the individual capacity limit for concurrent planning and to relate these individual limits to both subjects’ behavior and to their neural representations of planning.
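The information-constrained policy family this abstract describes can be sketched with a standard bounded-rationality construction: a policy that tilts a prior over actions by utility, where an inverse-cost parameter controls how much information the agent spends deviating from that prior. This is an illustrative reconstruction of the model class, not the authors' implementation; `bounded_rational_policy`, `beta`, and the toy utility matrix are assumptions.

```python
import numpy as np

def bounded_rational_policy(U, beta, iters=100):
    """Information-constrained policy sketch (not the paper's code).

    U: (n_states, n_actions) utility matrix.
    beta: resource parameter; low beta -> near-uniform policy (cheap,
    uninformative), high beta -> near-deterministic policy (costly, precise).
    """
    n_states, n_actions = U.shape
    p0 = np.full(n_actions, 1.0 / n_actions)  # marginal (prior) over actions
    for _ in range(iters):
        # per-state policy: Boltzmann tilt of the prior by utility
        p = p0[None, :] * np.exp(beta * U)
        p /= p.sum(axis=1, keepdims=True)
        # re-estimate the action marginal (uniform state distribution assumed)
        p0 = p.mean(axis=0)
    return p, p0

# toy task: each of two states favors a different action
U = np.array([[1.0, 0.0], [0.0, 1.0]])
policy, prior = bounded_rational_policy(U, beta=5.0)
```

With this symmetric toy utility, a large `beta` yields nearly deterministic state-dependent choices while the action marginal stays uniform; shrinking `beta` collapses the policy toward the prior, mimicking a capacity limit.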
Affiliation(s)
- Sonja Schach: Institute of Neural Information Processing, University of Ulm, Ulm, Germany
- Axel Lindner: Tübingen Center for Mental Health, Department of Psychiatry and Psychotherapy, University of Tübingen, Tübingen, Germany; Centre of Neurology, Division of Neuropsychology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany

15
Pupil dilation and response slowing distinguish deliberate explorative choices in the probabilistic learning task. Cogn Affect Behav Neurosci 2022; 22:1108-1129. [PMID: 35359274 PMCID: PMC9458574 DOI: 10.3758/s13415-022-00996-z]
Abstract
This study examined whether pupil size and response time would distinguish directed exploration from random exploration and exploitation. Eighty-nine participants performed a two-choice probabilistic learning task while their pupil size and response time were continuously recorded. Using linear mixed-model (LMM) analysis, we estimated differences in pupil size and response time between advantageous and disadvantageous choices as a function of learning success, i.e., whether or not a participant had learned the probabilistic contingency between choices and their outcomes. We proposed that before the true value of each choice became known to a decision-maker, both advantageous and disadvantageous choices represented random exploration of two options with equally uncertain outcomes, whereas after learning the same choices manifested exploitation and directed exploration strategies, respectively. We found that disadvantageous choices were associated with increases in both response time and pupil size, but only after the participants had learned the choice-reward contingencies. For pupil size, this effect was strongly amplified for those disadvantageous choices that immediately followed gains, as compared to losses, in the preceding choice. Pupil size modulations were evident during the behavioral choice rather than during the pretrial baseline. These findings suggest that occasional disadvantageous choices, which violate the acquired internal utility model, represent directed exploration. This exploratory strategy shifts choice priorities in favor of information seeking, and its autonomic and behavioral concomitants are mainly driven by the conflict between the behavioral plan of the intended exploratory choice and its strong alternative, which has already proven to be more rewarding.
16
Rethinking delusions: A selective review of delusion research through a computational lens. Schizophr Res 2022; 245:23-41. [PMID: 33676820 PMCID: PMC8413395 DOI: 10.1016/j.schres.2021.01.023]
Abstract
Delusions are rigid beliefs held with high certainty despite contradictory evidence. Notwithstanding decades of research, we still have a limited understanding of the computational and neurobiological alterations giving rise to delusions. In this review, we highlight a selection of recent work in computational psychiatry aimed at developing quantitative models of inference and its alterations, with the goal of providing an explanatory account for the form of delusional beliefs in psychosis. First, we assess and evaluate the experimental paradigms most often used to study inferential alterations in delusions. Based on our review of the literature and theoretical considerations, we contend that classic draws-to-decision paradigms are not well-suited to isolate inferential processes, further arguing that the commonly cited 'jumping-to-conclusion' bias may reflect neither delusion-specific nor inferential alterations. Second, we discuss several enhancements to standard paradigms that show promise in more effectively isolating inferential processes and delusion-related alterations therein. We further draw on our recent work to build an argument for a specific failure mode for delusions consisting of prior overweighting in high-level causal inferences about partially observable hidden states. Finally, we assess plausible neurobiological implementations for this candidate failure mode of delusional beliefs and outline promising future directions in this area.
17
Abstract
The ancestors of macaques and humans separated into distinct lineages 25 million years ago. Despite this long separation, Hogeveen et al. (2022) show, in this issue of Neuron, that the two species manage the explore-exploit tradeoff, which confronts any agent adapting to a dynamic environment, using similar computational and neural mechanisms.
18
Hogeveen J, Mullins TS, Romero JD, Eversole E, Rogge-Obando K, Mayer AR, Costa VD. The neurocomputational bases of explore-exploit decision-making. Neuron 2022; 110:1869-1879.e5. [PMID: 35390278 PMCID: PMC9167768 DOI: 10.1016/j.neuron.2022.03.014]
Abstract
Flexible decision-making requires animals to forego immediate rewards (exploitation) and try novel choice options (exploration) to discover if they are preferable to familiar alternatives. Using the same task and a partially observable Markov decision process (POMDP) model to quantify the value of choices, we first determined that the computational basis for managing explore-exploit tradeoffs is conserved across monkeys and humans. We then used fMRI to identify where in the human brain the immediate value of exploitative choices and relative uncertainty about the value of exploratory choices were encoded. Consistent with prior neurophysiological evidence in monkeys, we observed divergent encoding of reward value and uncertainty in prefrontal and parietal regions, including frontopolar cortex, and parallel encoding of these computations in motivational regions including the amygdala, ventral striatum, and orbitofrontal cortex. These results clarify the interplay between prefrontal and motivational circuits that supports adaptive explore-exploit decisions in humans and nonhuman primates.
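The two quantities the study regressed against brain activity, the immediate value of exploitative choices and the relative uncertainty of exploratory ones, can be illustrated with a toy Bayesian bandit rather than the paper's full POMDP. Everything below (the Beta-Bernoulli posterior, the `bonus` weight, the reward probabilities) is an assumed, simplified stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the explore-exploit computation: choice value combines
# immediate expected reward (posterior mean) with an uncertainty bonus
# (posterior standard deviation). Not the paper's POMDP model.
true_p = [0.3, 0.6, 0.5]                  # hidden reward probabilities (assumed)
alpha = np.ones(3)                        # Beta(1, 1) prior per option
beta = np.ones(3)

def choose(bonus=1.0):
    mean = alpha / (alpha + beta)                              # exploitative value
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return int(np.argmax(mean + bonus * np.sqrt(var)))         # uncertainty-directed

for _ in range(500):
    a = choose()
    r = rng.random() < true_p[a]          # Bernoulli reward
    alpha[a] += r
    beta[a] += 1 - r

best = int(np.argmax(alpha / (alpha + beta)))
```

Early on the bonus term dominates and drives sampling of novel options; as posteriors sharpen, the mean term takes over and choice settles on the empirically best option.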
Affiliation(s)
- Jeremy Hogeveen: Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM 87131, USA
- Teagan S Mullins: Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM 87131, USA
- John D Romero: Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM 87131, USA
- Elizabeth Eversole: Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM 87131, USA
- Kimberly Rogge-Obando: Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA
- Andrew R Mayer: Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Department of Psychiatry & Behavioral Sciences, University of New Mexico School of Medicine, Albuquerque, NM 87131, USA; Department of Neurology, University of New Mexico School of Medicine, Albuquerque, NM 87131, USA; The Mind Research Network/Lovelace Biomedical Research Institute, Pete & Nancy Domenici Hall, Albuquerque, NM 87106, USA
- Vincent D Costa: Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR 97239, USA; Division of Neuroscience, Oregon National Primate Research Center, Beaverton, OR 97006, USA

19
Leopold DA, Averbeck BB. Self-tuition as an essential design feature of the brain. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200530. [PMID: 34957855 PMCID: PMC8710880 DOI: 10.1098/rstb.2020.0530]
Abstract
We are curious by nature, particularly when young. Evolution has endowed our brain with an inbuilt obligation to educate itself. In this perspectives article, we posit that self-tuition is an evolved principle of vertebrate brain design that is reflected in its basic architecture and critical for its normal development. Self-tuition involves coordination between functionally distinct components of the brain, with one set of areas motivating exploration that leads to the experiences that train another set. We review key hypothalamic and telencephalic structures involved in this interplay, including their anatomical connections and placement within the segmental architecture of conserved forebrain circuits. We discuss the nature of educative behaviours motivated by the hypothalamus, innate stimulus biases, the relationship to survival in early life, and mechanisms by which telencephalic areas gradually accumulate knowledge. We argue that this aspect of brain function is of paramount importance for systems neuroscience, as it confers neural specialization and allows animals to attain far more sophisticated behaviours than would be possible through genetic mechanisms alone. Self-tuition is of particular importance in humans and other primates, whose large brains and complex social cognition rely critically on experience-based learning during a protracted childhood period. This article is part of the theme issue ‘Systems neuroscience through the lens of evolutionary theory’.
Affiliation(s)
- David A Leopold: Section on Cognitive Neurophysiology and Imaging, Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA; Neurophysiology Imaging Facility, National Institute of Mental Health, National Institute of Neurological Disorders and Stroke, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
- Bruno B Averbeck: Section on Learning and Decision Making, Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA

20
Overcoming cognitive set bias requires more than seeing an alternative strategy. Sci Rep 2022; 12:2179. [PMID: 35140344 PMCID: PMC8828898 DOI: 10.1038/s41598-022-06237-0]
Abstract
Determining when to switch from one strategy to another is at the heart of adaptive decision-making. Previous research shows that humans exhibit a 'cognitive set' bias, which occurs when a familiar strategy occludes alternatives, even much better ones. Here we examined the mechanisms underlying cognitive set by investigating whether better solutions are visually overlooked, or fixated on but disregarded. We analyzed gaze data from 67 American undergraduates (91% female) while they completed the learned strategy-direct strategy (LS-DS) task, which measures the ability to switch from a learned strategy (LS) to a more efficient direct strategy (DS, or shortcut). We found that, in the first trial block, participants fixated on the location of the shortcut more when it was available, but most (89.6%) did not adopt it. Next, participants watched a video demonstrating either the DS (N = 34, Informed participants) or the familiar LS (N = 33, Controls). In post-video trials, Informed participants used the DS more than in pre-video trials and more than Controls. Notably, 29.4% of Informed participants continued to use the LS despite watching the DS video. We suggest that cognitive set in the LS-DS task stems not from an inability to see the shortcut but rather from a failure to try it.
21
Differential coding of goals and actions in ventral and dorsal corticostriatal circuits during goal-directed behavior. Cell Rep 2022; 38:110198. [PMID: 34986350 PMCID: PMC9608360 DOI: 10.1016/j.celrep.2021.110198]
Abstract
Goal-directed behavior requires identifying objects in the environment that can satisfy internal needs and executing actions to obtain those objects. The current study examines ventral and dorsal corticostriatal circuits that support complementary aspects of goal-directed behavior. We analyze activity from the amygdala, ventral striatum, orbitofrontal cortex, and lateral prefrontal cortex (LPFC) while monkeys perform a three-armed bandit task. Information about chosen stimuli and their value is primarily encoded in the amygdala, ventral striatum, and orbitofrontal cortex, while the spatial information is primarily encoded in the LPFC. Before the options are presented, information about the to-be-chosen stimulus is represented in the amygdala, ventral striatum, and orbitofrontal cortex; at the time of choice, the information is passed to the LPFC to direct a saccade. Thus, learned value information specifying behavioral goals is maintained throughout the ventral corticostriatal circuit, and it is routed through the dorsal circuit at the time actions are selected.
22
Averbeck B, O'Doherty JP. Reinforcement-learning in fronto-striatal circuits. Neuropsychopharmacology 2022; 47:147-162. [PMID: 34354249 PMCID: PMC8616931 DOI: 10.1038/s41386-021-01108-0]
Abstract
We review the current state of knowledge on the computational and neural mechanisms of reinforcement learning, with a particular focus on fronto-striatal circuits. We divide the literature in this area into five broad research themes: the target of the learning (whether it be learning about the value of stimuli or about the value of actions); the nature and complexity of the algorithm used to drive the learning and inference process; how learned values get converted into choices and associated actions; the nature of state representations, and of the other cognitive machinery that supports the implementation of various reinforcement-learning operations; and, an emerging fifth area, how the brain allocates or arbitrates control over different reinforcement-learning sub-systems or "experts". We outline what is known about the role of the prefrontal cortex and striatum in implementing each of these functions. We conclude by arguing that it will be necessary to build bridges from algorithmic-level descriptions of computational reinforcement learning to implementational-level models to better understand how reinforcement learning emerges from multiple distributed neural networks in the brain.
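The first themes the review names, learning action values by prediction error and converting values into choices, have a canonical minimal form: Q-learning with a softmax choice rule. The sketch below is a generic textbook instance with assumed task parameters, not a model taken from the review.

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimal action-value learning sketch: Rescorla-Wagner / TD(0)-style
# prediction-error updates, with values converted to choices by softmax.
# Task parameters are illustrative.
n_actions = 2
alpha = 0.1                    # learning rate
temp = 0.2                     # softmax temperature (choice stochasticity)
reward_prob = [0.2, 0.8]       # assumed Bernoulli reward probabilities
Q = np.zeros(n_actions)

def softmax(q, temp):
    z = np.exp((q - q.max()) / temp)   # max-subtraction for numerical stability
    return z / z.sum()

for _ in range(1000):
    p = softmax(Q, temp)
    a = rng.choice(n_actions, p=p)          # value-to-choice conversion
    r = float(rng.random() < reward_prob[a])
    Q[a] += alpha * (r - Q[a])              # prediction-error update
```

After training, the value estimate of the richer option approaches its true reward rate, and the softmax concentrates choice on it; raising `temp` would reintroduce exploratory variability.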
Affiliation(s)
- John P O'Doherty: Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA

23
Curiosity or savouring? Information seeking is modulated by both uncertainty and valence. PLoS One 2021; 16:e0257011. [PMID: 34559816 PMCID: PMC8462690 DOI: 10.1371/journal.pone.0257011]
Abstract
Curiosity is pervasive in our everyday lives, but we know little about the factors that contribute to this drive. In the current study, we assessed whether curiosity about uncertain outcomes is modulated by the valence of the information, i.e. whether the information is good or bad news. Using a lottery task in which outcome uncertainty, expected value and outcome valence (gain versus loss) were manipulated independently, we found that curiosity is overall higher for gains compared with losses and that curiosity increased with increasing outcome uncertainty for both gains and losses. These effects of uncertainty and valence did not interact, indicating that the motivation to reduce uncertainty and the motivation to maximize positive information represent separate, independent drives.
24
Petitet P, Attaallah B, Manohar SG, Husain M. The computational cost of active information sampling before decision-making under uncertainty. Nat Hum Behav 2021; 5:935-946. [PMID: 34045719 DOI: 10.1038/s41562-021-01116-6]
Abstract
Humans often seek information to minimize the pervasive effect of uncertainty on decisions. Current theories explain how much knowledge people should gather before a decision, based on the cost-benefit structure of the problem at hand. Here, we demonstrate that this framework omits a crucial agent-related factor: the cognitive effort expended while collecting information. Using an active sampling model, we unveil a speed-efficiency trade-off whereby more informative samples take longer to find. Crucially, under sufficient time pressure, humans can break this trade-off, sampling both faster and more efficiently. Computational modelling demonstrates the existence of a cost of cognitive effort which, when incorporated into theoretical models, provides a better account of people's behaviour and also relates to self-reported fatigue accumulated during active sampling. Thus, the way people seek knowledge to guide their decisions is shaped not only by task-related costs and benefits, but also crucially by the quantifiable computational costs incurred.
Affiliation(s)
- Pierre Petitet: Department of Experimental Psychology, University of Oxford, Oxford, UK
- Sanjay G Manohar: Department of Experimental Psychology, University of Oxford, Oxford, UK; Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK
- Masud Husain: Department of Experimental Psychology, University of Oxford, Oxford, UK; Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK

25
Livermore JJA, Holmes CL, Cutler J, Levstek M, Moga G, Brittain JRC, Campbell-Meiklejohn D. Selective effects of serotonin on choices to gather more information. J Psychopharmacol 2021; 35:631-640. [PMID: 33601931 PMCID: PMC8278551 DOI: 10.1177/0269881121991571]
Abstract
BACKGROUND: Gathering and evaluating information leads to better decisions, but often at a cost. The balance between information seeking and exploitation features in neurodevelopmental, mood, psychotic and substance-related disorders. Serotonin's role has been highlighted by experimental reduction of its precursor, tryptophan.
AIMS: We tested the boundaries and applicability of this role by asking whether changes to information sampling would be observed following acute doses of serotonergic and catecholaminergic clinical treatments. We used a variant of the Information Sampling Task (IST) to measure how much information a person requires before making a decision; the task allows participants to sample information until they are satisfied enough to choose.
METHODS: In separate double-blind placebo-controlled experiments, we tested 27 healthy participants on/off 20 mg of the serotonin reuptake inhibitor (SRI) citalopram, and 22 participants on/off 40 mg of the noradrenergic reuptake inhibitor atomoxetine. The IST variant minimised effects of temporal impulsivity and loss aversion. Analyses used a variety of participant prior expectations of sampling spaces in the IST, including a new prior that accounts for learning of likely states across trials. We analysed behaviour by a new method that also accounts for baseline individual differences in risk preference.
RESULTS: Baseline preferences demonstrated risk aversion. Citalopram decreased the expected utility of choices and the probability of being correct given the informational content of the samples collected, suggesting that participants collected less useful information before making a choice. Atomoxetine did not influence information seeking.
CONCLUSION: Acute changes in serotonin activity by way of a single SRI dose alter information-seeking behaviour.
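The core logic of an information-sampling task, keep drawing evidence until you are satisfied enough to commit, can be sketched as a stopping rule on a Bayesian posterior. This is a hypothetical illustration of the task family; the uniform prior, grid posterior, threshold, and bias values are all assumptions, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(2)

def p_majority(heads, tails):
    """Posterior P(bias > 0.5) for a Bernoulli bias under a uniform prior,
    computed by grid approximation."""
    grid = np.linspace(1e-6, 1 - 1e-6, 2001)
    log_like = heads * np.log(grid) + tails * np.log(1 - grid)
    post = np.exp(log_like - log_like.max())
    post /= post.sum()
    return post[grid > 0.5].sum()

def sample_until_confident(true_bias=0.7, threshold=0.9, max_samples=50):
    """Draw samples until confident either way, then return (n drawn, conf)."""
    heads = tails = 0
    for n in range(1, max_samples + 1):
        if rng.random() < true_bias:
            heads += 1
        else:
            tails += 1
        conf = p_majority(heads, tails)
        if conf > threshold or conf < 1 - threshold:
            return n, conf
    return max_samples, p_majority(heads, tails)

n_samples, confidence = sample_until_confident()
```

Lowering `threshold` models a participant who commits on less useful information, the direction of the citalopram effect reported above; raising it models more exhaustive sampling.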
Affiliation(s)
- James JA Livermore: Sussex Neuroscience/School of Psychology, University of Sussex, Brighton, UK; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Clare L Holmes: Sussex Neuroscience/School of Psychology, University of Sussex, Brighton, UK
- Jo Cutler: Sussex Neuroscience/School of Psychology, University of Sussex, Brighton, UK; School of Psychology, University of Birmingham, Birmingham, UK
- Maruša Levstek: Sussex Neuroscience/School of Psychology, University of Sussex, Brighton, UK
- Gyorgy Moga: Sussex Neuroscience/School of Psychology, University of Sussex, Brighton, UK
- James RC Brittain: Brighton and Sussex Medical School, Brighton, UK; Chelsea and Westminster Hospital, London, UK

26
Abstract
Theories of orbitofrontal cortex (OFC) function have evolved substantially over the last few decades. There is now a general consensus that the OFC is important for predicting aspects of future events and for using these predictions to guide behavior. Yet the precise content of these predictions and the degree to which OFC contributes to agency contingent upon them has become contentious, with several plausible theories advocating different answers to these questions. In this review we will focus on three of these ideas-the economic value, credit assignment, and cognitive map hypotheses-describing both their successes and failures. We will propose that these failures hint at a more nuanced and perhaps unique role for the OFC, particularly the lateral subdivision, in supporting the proposed functions when an underlying model or map of the causal structures in the environment must be constructed or updated. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
27
Wilson RC, Bonawitz E, Costa VD, Ebitz RB. Balancing exploration and exploitation with information and randomization. Curr Opin Behav Sci 2021; 38:49-56. [PMID: 33184605 PMCID: PMC7654823 DOI: 10.1016/j.cobeha.2020.10.001]
Abstract
Explore-exploit decisions require us to trade off the benefits of exploring unknown options to learn more about them, with exploiting known options, for immediate reward. Such decisions are ubiquitous in nature, but from a computational perspective, they are notoriously hard. There is therefore much interest in how humans and animals make these decisions and recently there has been an explosion of research in this area. Here we provide a biased and incomplete snapshot of this field focusing on the major finding that many organisms use two distinct strategies to solve the explore-exploit dilemma: a bias for information ('directed exploration') and the randomization of choice ('random exploration'). We review evidence for the existence of these strategies, their computational properties, their neural implementations, as well as how directed and random exploration vary over the lifespan. We conclude by highlighting open questions in this field that are ripe to both explore and exploit.
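The two strategies this review distinguishes map onto two knobs of a softmax choice rule: directed exploration adds an information bonus to uncertain options, while random exploration injects variability via the temperature. The sketch below is a schematic of that distinction; the bonus weight, temperature, and option values are illustrative assumptions.

```python
import numpy as np

def choice_probs(values, uncertainties, info_bonus=0.0, temperature=1.0):
    """Softmax choice rule with an optional uncertainty (information) bonus."""
    v = np.asarray(values, dtype=float) + info_bonus * np.asarray(uncertainties, dtype=float)
    z = np.exp((v - v.max()) / temperature)
    return z / z.sum()

values = [1.0, 0.8]   # option 1 looks slightly worse...
uncert = [0.0, 1.0]   # ...but is much more uncertain

greedy = choice_probs(values, uncert)                     # neither strategy
directed = choice_probs(values, uncert, info_bonus=0.5)   # bonus favors the unknown
random_x = choice_probs(values, uncert, temperature=5.0)  # noise flattens choice
```

The directed knob selectively shifts preference toward the uncertain option (information seeking), while the random knob makes all choices more uniform without caring which option is unknown, matching the behavioral signatures the review surveys.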
Affiliation(s)
- Robert C. Wilson: Department of Psychology, University of Arizona, Tucson, AZ, USA; Cognitive Science Program, University of Arizona, Tucson, AZ, USA; Evelyn F. McKnight Brain Institute, University of Arizona, Tucson, AZ, USA
- Vincent D. Costa: Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
- R. Becket Ebitz: Department of Neuroscience, University of Montréal, Montréal, Québec, Canada

28
Ferrari-Toniolo S, Bujold PM, Grabenhorst F, Báez-Mendoza R, Schultz W. Nonhuman Primates Satisfy Utility Maximization in Compliance with the Continuity Axiom of Expected Utility Theory. J Neurosci 2021; 41:2964-2979. [PMID: 33542082 PMCID: PMC8018892 DOI: 10.1523/jneurosci.0955-20.2020]
Abstract
Expected Utility Theory (EUT), the first axiomatic theory of risky choice, describes choices as a utility-maximization process: decision makers assign a subjective value (utility) to each choice option and choose the one with the highest utility. The continuity axiom, central to EUT and its modifications, is a necessary and sufficient condition for the definition of numerical utilities. The axiom requires decision makers to be indifferent between a gamble and a specific probabilistic combination of a more preferred and a less preferred gamble. While previous studies demonstrated that monkeys choose according to combinations of objective reward magnitude and probability, a concept-driven experimental approach for assessing the axiomatically defined conditions for utility maximization in animals has been missing. We experimentally tested the continuity axiom for a broad class of gamble types in four male rhesus macaque monkeys, showing that their choice behavior complied with the existence of a numerical utility measure as defined by economic theory. We used the numerical quantity specified in the continuity axiom to characterize subjective preferences in a magnitude-probability space. This mapping highlighted a trade-off relation between reward magnitudes and probabilities, compatible with the existence of a utility function underlying subjective value computation. These results support the existence of a numerical utility function able to describe choices, allowing for the investigation of the neuronal substrates responsible for coding such a rigorously defined quantity.

SIGNIFICANCE STATEMENT: A common assumption of several economic choice theories is that decisions result from the comparison of subjectively assigned values (utilities). This study demonstrated the compliance of monkey behavior with the continuity axiom of Expected Utility Theory, implying a subjective magnitude-probability trade-off relation, which supports the existence of numerical utility directly linked to the theoretical economic framework. We determined a numerical utility measure able to describe choices, which can serve as a correlate for neuronal activity in the quest for the brain structures and mechanisms guiding decisions.
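The continuity axiom can be made concrete with a small numeric example: for any three options ordered A > B > C, there must exist a unique mixing probability p at which the middle option B is exactly as good as the gamble "A with probability p, else C". The utility function and outcome magnitudes below are illustrative assumptions, not the study's data.

```python
# Worked example of the continuity axiom's indifference point
# (illustrative power-law utility, not fitted to the monkeys' choices).

def expected_utility(outcomes, probs, u=lambda x: x ** 0.8):
    """Expected utility of a gamble under an assumed concave utility u."""
    return sum(pi * u(x) for x, pi in zip(outcomes, probs))

uA = expected_utility([10.0], [1.0])   # best option (sure 10)
uB = expected_utility([4.0], [1.0])    # middle option (sure 4)
uC = expected_utility([1.0], [1.0])    # worst option (sure 1)

# Indifference requires uB = p * uA + (1 - p) * uC, so:
p = (uB - uC) / (uA - uC)
```

Because uA > uB > uC, the solution p always lies strictly between 0 and 1, which is exactly what the axiom asserts; the behavioral test is whether a measured indifference point of this form exists and is unique.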
Affiliation(s)
- Simone Ferrari-Toniolo: Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY, United Kingdom
- Philipe M Bujold: Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY, United Kingdom
- Fabian Grabenhorst: Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY, United Kingdom
- Raymundo Báez-Mendoza: Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY, United Kingdom; Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114
- Wolfram Schultz: Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY, United Kingdom

29
Feng SF, Wang S, Zarnescu S, Wilson RC. The dynamics of explore-exploit decisions reveal a signal-to-noise mechanism for random exploration. Sci Rep 2021; 11:3077. [PMID: 33542333 PMCID: PMC7862437 DOI: 10.1038/s41598-021-82530-8]
Abstract
Growing evidence suggests that behavioral variability plays a critical role in how humans manage the tradeoff between exploration and exploitation. In these decisions a little variability can help us to overcome the desire to exploit known rewards by encouraging us to randomly explore something else. Here we investigate how such 'random exploration' could be controlled using a drift-diffusion model of the explore-exploit choice. In this model, variability is controlled by either the signal-to-noise ratio with which reward is encoded (the 'drift rate'), or the amount of information required before a decision is made (the 'threshold'). By fitting this model to behavior, we find that while, statistically, both drift and threshold change when people randomly explore, numerically, the change in drift rate has by far the largest effect. This suggests that random exploration is primarily driven by changes in the signal-to-noise ratio with which reward information is represented in the brain.
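The drift-diffusion framing in this abstract can be illustrated by simulation: lowering the drift rate (signal-to-noise of the reward representation) degrades choice consistency even when the decision threshold is unchanged. The simulator below is a generic two-boundary DDM sketch with assumed parameters, not the authors' fitted model.

```python
import numpy as np

rng = np.random.default_rng(3)

def ddm_accuracy(drift, threshold, n_trials=2000, dt=0.01, sigma=1.0):
    """Fraction of trials absorbed at the upper (correct) boundary of a
    two-boundary drift-diffusion process with positive drift."""
    correct = 0
    for _ in range(n_trials):
        x = 0.0
        while abs(x) < threshold:
            # Euler-Maruyama step: deterministic drift plus Gaussian noise
            x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        correct += x >= threshold
    return correct / n_trials

high_snr = ddm_accuracy(drift=1.0, threshold=1.0)
low_snr = ddm_accuracy(drift=0.2, threshold=1.0)  # noisier value signal
```

With the threshold held fixed, the low-drift condition produces markedly more boundary crossings on the "wrong" side, i.e. more random exploration, which is the mechanism the paper argues dominates behaviorally.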
Affiliation(s)
- Samuel F Feng: Department of Mathematics, Khalifa University of Science and Technology, Abu Dhabi, UAE; Khalifa University Centre for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, UAE
- Siyu Wang: Department of Psychology, University of Arizona, Tucson, AZ, USA
- Sylvia Zarnescu: Department of Psychology, University of Arizona, Tucson, AZ, USA
- Robert C Wilson: Department of Psychology, University of Arizona, Tucson, AZ, USA; Cognitive Science Program, University of Arizona, Tucson, AZ, USA
30. van Lieshout LLF, de Lange FP, Cools R. Why so curious? Quantifying mechanisms of information seeking. Curr Opin Behav Sci 2020. [DOI: 10.1016/j.cobeha.2020.08.005]
31. Averbeck BB, Murray EA. Hypothalamic Interactions with Large-Scale Neural Circuits Underlying Reinforcement Learning and Motivated Behavior. Trends Neurosci 2020; 43:681-694. [PMID: 32762959] [PMCID: PMC7483858] [DOI: 10.1016/j.tins.2020.06.006]
Abstract
Biological agents adapt behavior to support the survival needs of the individual and the species. In this review we outline the anatomical, physiological, and computational processes that support reinforcement learning (RL). We describe two circuits in the primate brain that are linked to specific aspects of learning and goal-directed behavior. The ventral circuit, which includes the amygdala, ventral medial prefrontal cortex, and ventral striatum, has substantial connectivity with the hypothalamus. The dorsal circuit, which includes inferior parietal cortex, dorsal lateral prefrontal cortex, and the dorsal striatum, has minimal connectivity with the hypothalamus. This difference in hypothalamic connectivity suggests distinct roles for the two circuits. We propose that the ventral circuit defines behavioral goals, and the dorsal circuit orchestrates behavior to achieve those goals.
Affiliation(s)
- Bruno B Averbeck: Laboratory of Neuropsychology, National Institute of Mental Health (NIMH), National Institutes of Health, Bethesda, MD 20892-4415, USA
- Elisabeth A Murray: Laboratory of Neuropsychology, National Institute of Mental Health (NIMH), National Institutes of Health, Bethesda, MD 20892-4415, USA
32. Moreno-Bote R, Ramírez-Ruiz J, Drugowitsch J, Hayden BY. Heuristics and optimal solutions to the breadth-depth dilemma. Proc Natl Acad Sci U S A 2020; 117:19799-19808. [PMID: 32759219] [PMCID: PMC7443877] [DOI: 10.1073/pnas.2004929117]
Abstract
In multialternative risky choice, we are often faced with the opportunity to allocate our limited information-gathering capacity between several options before receiving feedback. In such cases, we face a natural trade-off between breadth (spreading our capacity across many options) and depth (gaining more information about a smaller number of options). Despite its broad relevance to daily life, including in many naturalistic foraging situations, the optimal strategy in the breadth-depth trade-off has not been delineated. Here, we formalize the breadth-depth dilemma through a finite-sample capacity model. We find that, if capacity is small (∼10 samples), it is optimal to draw one sample per alternative, favoring breadth. However, for larger capacities, a sharp transition is observed, and it becomes best to deeply sample a very small fraction of alternatives, which roughly decreases with the square root of capacity. Thus, ignoring most options, even when capacity is large enough to shallowly sample all of them, is a signature of optimal behavior. Our results also provide a rich casuistic for metareasoning in multialternative decisions with bounded capacity using close-to-optimal heuristics.
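The finite-sample capacity model lends itself to a small Monte Carlo sketch (generic setup with Uniform(0,1) Bernoulli success probabilities, not the paper's exact analysis): spread a fixed sample budget over a chosen number of options, commit to the best-looking one, and compare a breadth-heavy against a depth-heavy allocation.

```python
import random

def value_of_allocation(n_sampled, capacity, n_options=50, n_sims=2000, seed=1):
    """Spread `capacity` Bernoulli samples evenly over `n_sampled` of
    `n_options` options (true success probabilities ~ Uniform(0, 1)),
    then commit to the option with the most observed successes.
    Returns the mean true value of the committed option."""
    rng = random.Random(seed)
    per_option = capacity // n_sampled
    total = 0.0
    for _ in range(n_sims):
        ps = [rng.random() for _ in range(n_options)]
        sampled = rng.sample(range(n_options), n_sampled)
        best, best_score = None, -1.0
        for i in sampled:
            hits = sum(rng.random() < ps[i] for _ in range(per_option))
            score = hits + rng.random() * 1e-3  # break ties at random
            if score > best_score:
                best_score, best = score, i
        total += ps[best]
    return total / n_sims

# With a budget of 40 samples and 50 options, pure breadth (one sample
# each) loses to deeply sampling a small subset, even though the deep
# strategy ignores most options entirely.
breadth = value_of_allocation(n_sampled=40, capacity=40)
depth = value_of_allocation(n_sampled=8, capacity=40)
```

The numbers here only illustrate the qualitative transition the abstract describes; the paper derives where the optimal subset size sits as a function of capacity.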
Affiliation(s)
- Rubén Moreno-Bote: Center for Brain and Cognition, Universitat Pompeu Fabra, 08002 Barcelona, Spain; Department of Information and Communication Technologies, Universitat Pompeu Fabra, 08002 Barcelona, Spain; Serra Húnter Fellow Programme, Universitat Pompeu Fabra, 08002 Barcelona, Spain; Catalan Institution for Research and Advanced Studies-Academia, Universitat Pompeu Fabra, 08002 Barcelona, Spain
- Jorge Ramírez-Ruiz: Center for Brain and Cognition, Universitat Pompeu Fabra, 08002 Barcelona, Spain; Department of Information and Communication Technologies, Universitat Pompeu Fabra, 08002 Barcelona, Spain
- Jan Drugowitsch: Department of Neurobiology, Harvard Medical School, Boston, MA 02115
- Benjamin Y Hayden: Department of Neuroscience, University of Minnesota, Minneapolis, MN 55455; Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455; Center for Neural Engineering, University of Minnesota, Minneapolis, MN 55455
33. Soltani A, Izquierdo A. Adaptive learning under expected and unexpected uncertainty. Nat Rev Neurosci 2020; 20:635-644. [PMID: 31147631] [DOI: 10.1038/s41583-019-0180-y]
Abstract
The outcome of a decision is often uncertain, and outcomes can vary over repeated decisions. Whether decision outcomes should substantially affect behaviour and learning depends on whether they are representative of a typically experienced range of outcomes or signal a change in the reward environment. Successful learning and decision-making therefore require the ability to estimate expected uncertainty (related to the variability of outcomes) and unexpected uncertainty (related to the variability of the environment). Understanding the bases and effects of these two types of uncertainty and the interactions between them - at the computational and the neural level - is crucial for understanding adaptive learning. Here, we examine computational models and experimental findings to distil computational principles and neural mechanisms for adaptive learning under uncertainty.
Affiliation(s)
- Alireza Soltani: Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Alicia Izquierdo: Department of Psychology, The Brain Research Institute, University of California, Los Angeles, Los Angeles, CA, USA
34. Primate Orbitofrontal Cortex Codes Information Relevant for Managing Explore-Exploit Tradeoffs. J Neurosci 2020; 40:2553-2561. [PMID: 32060169] [DOI: 10.1523/jneurosci.2355-19.2020]
Abstract
Reinforcement learning (RL) refers to the behavioral process of learning to obtain reward and avoid punishment. An important component of RL is managing explore-exploit tradeoffs, which refers to the problem of choosing between exploiting options with known values and exploring unfamiliar options. We examined correlates of this tradeoff, as well as other RL related variables, in orbitofrontal cortex (OFC) while three male monkeys performed a three-armed bandit learning task. During the task, novel choice options periodically replaced familiar options. The values of the novel options were unknown, and the monkeys had to explore them to see if they were better than other currently available options. The identity of the chosen stimulus and the reward outcome were strongly encoded in the responses of single OFC neurons. These two variables define the states and state transitions in our model that are relevant to decision-making. The chosen value of the option and the relative value of exploring that option were encoded at intermediate levels. We also found that OFC value coding was stimulus specific, as opposed to coding value independent of the identity of the option. The location of the option and the value of the current environment were encoded at low levels. Therefore, we found encoding of the variables relevant to learning and managing explore-exploit tradeoffs in OFC. These results are consistent with findings in the ventral striatum and amygdala and show that this monosynaptically connected network plays an important role in learning based on the immediate and future consequences of choices.
SIGNIFICANCE STATEMENT: Orbitofrontal cortex (OFC) has been implicated in representing the expected values of choices. Here we extend these results and show that OFC also encodes information relevant to managing explore-exploit tradeoffs. Specifically, OFC encodes an exploration bonus, which characterizes the relative value of exploring novel choice options. OFC also strongly encodes the identity of the chosen stimulus, and reward outcomes, which are necessary for computing the value of novel and familiar options.
35. Ebitz RB, Sleezer BJ, Jedema HP, Bradberry CW, Hayden BY. Tonic exploration governs both flexibility and lapses. PLoS Comput Biol 2019; 15:e1007475. [PMID: 31703063] [PMCID: PMC6867658] [DOI: 10.1371/journal.pcbi.1007475]
Abstract
In many cognitive tasks, lapses (spontaneous errors) are tacitly dismissed as the result of nuisance processes like sensorimotor noise, fatigue, or disengagement. However, some lapses could also be caused by exploratory noise: randomness in behavior that facilitates learning in changing environments. If so, then strategic processes would need only up-regulate (rather than generate) exploration to adapt to a changing environment. This view predicts that more frequent lapses should be associated with greater flexibility because these behaviors share a common cause. Here, we report that when rhesus macaques performed a set-shifting task, lapse rates were negatively correlated with perseverative error frequency across sessions, consistent with a common basis in exploration. The results could not be explained by local failures to learn. Furthermore, chronic exposure to cocaine, which is known to impair cognitive flexibility, did increase perseverative errors, but, surprisingly, also improved overall set-shifting task performance by reducing lapse rates. We reconcile these results with a state-switching model in which cocaine decreases exploration by deepening attractor basins corresponding to rule states. These results support the idea that exploratory noise contributes to lapses, affecting rule-based decision-making even when it has no strategic value, and suggest that one key mechanism for regulating exploration may be the depth of rule states.
Affiliation(s)
- R. Becket Ebitz: Department of Neuroscience and Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN, United States of America
- Brianna J. Sleezer: Department of Neurobiology and Behavior, Cornell University, Ithaca, NY, United States of America
- Hank P. Jedema: NIDA Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, United States of America
- Charles W. Bradberry: NIDA Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, United States of America
- Benjamin Y. Hayden: Department of Neuroscience and Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN, United States of America
36. Reitich-Stolero T, Aberg KC, Paz R. Re-exploring Mechanisms of Exploration. Neuron 2019; 103:360-363. [PMID: 31394060] [DOI: 10.1016/j.neuron.2019.07.021]
Abstract
Deciding when to exploit what is already known and when to explore new possibilities is crucial for adapting to novel and dynamic environments. Using reinforcement-based decision making, Costa et al. (2019) in this issue of Neuron find that neurons in the amygdala and ventral-striatum differentially signal the benefit from exploring new options and exploiting familiar ones.
Affiliation(s)
- Kristoffer C Aberg: Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel
- Rony Paz: Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel
37. Costa VD, Mitz AR, Averbeck BB. Subcortical Substrates of Explore-Exploit Decisions in Primates. Neuron 2019; 103:533-545.e5. [PMID: 31196672] [PMCID: PMC6687547] [DOI: 10.1016/j.neuron.2019.05.017]
Abstract
The explore-exploit dilemma refers to the challenge of deciding when to forego immediate rewards and explore new opportunities that could lead to greater rewards in the future. While motivational neural circuits facilitate learning based on past choices and outcomes, it is unclear whether they also support computations relevant for deciding when to explore. We recorded neural activity in the amygdala and ventral striatum of rhesus macaques as they solved a task that required them to balance novelty-driven exploration with exploitation of what they had already learned. Using a partially observable Markov decision process (POMDP) model to quantify explore-exploit trade-offs, we identified that the ventral striatum and amygdala differ in how they represent the immediate value of exploitative choices and the future value of exploratory choices. These findings show that subcortical motivational circuits are important in guiding explore-exploit decisions.
Affiliation(s)
- Vincent D Costa: Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA; Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, OR 97239, USA; Division of Neuroscience, Oregon National Primate Research Center, Beaverton, OR 97006, USA
- Andrew R Mitz: Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA
- Bruno B Averbeck: Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA
38. Baker SC, Konova AB, Daw ND, Horga G. A distinct inferential mechanism for delusions in schizophrenia. Brain 2019; 142:1797-1812. [PMID: 30895299] [PMCID: PMC6644849] [DOI: 10.1093/brain/awz051]
Abstract
Delusions, a core symptom of psychosis, are false beliefs that are rigidly held with strong conviction despite contradictory evidence. Alterations in inferential processes have long been proposed to underlie delusional pathology, but previous attempts to show this have failed to yield compelling evidence for a specific relationship between inferential abnormalities and delusional severity in schizophrenia. Using a novel, incentivized information-sampling task (a modified version of the beads task), alongside well-characterized decision-making tasks, we sought a mechanistic understanding of delusions in a sample of medicated and unmedicated patients with schizophrenia who exhibited a wide range of delusion severity. In this novel task, participants chose whether to draw beads from one of two hidden jars or to guess the identity of the hidden jar, in order to minimize financial loss from a monetary endowment, and concurrently reported their probability estimates for the hidden jar. We found that patients with higher delusion severity exhibited increased information seeking (i.e. increased draws-to-decision behaviour). This increase was highly specific to delusion severity as compared to the severity of other psychotic symptoms, working-memory capacity, and other clinical and socio-demographic characteristics. Delusion-related increases in information seeking were present in unmedicated patients, indicating that they were unlikely due to antipsychotic medication. In addition, after adjusting for delusion severity, patients as a whole exhibited decreased information seeking relative to healthy individuals, a decrease that correlated with lower socioeconomic status. Computational analyses of reported probability estimates further showed that more delusional patients exhibited abnormal belief updating characterized by stronger reliance on prior beliefs formed early in the inferential process, a feature that correlated with increased information seeking in patients. Other decision-making parameters that could have theoretically explained the delusion effects, such as those related to subjective valuation, were uncorrelated with both delusional severity and information seeking among the patients. In turn, we found some preliminary evidence that subjective valuation (rather than belief updating) may explain group differences in information seeking unrelated to delusions. Together, these results suggest that abnormalities in belief updating, characterized by stronger reliance on prior beliefs formed by incorporating information presented earlier in the inferential process, may be a core computational mechanism of delusional ideation in psychosis. Our results thus provide direct empirical support for an inferential mechanism that naturally captures the characteristic rigidity associated with delusional beliefs.
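The beads task described in this abstract has a textbook ideal observer. A minimal sketch (the generic two-jar Bayes rule, not the authors' fitted model) shows how probability estimates should update with each draw:

```python
from math import prod

def jar_posterior(draws, p=0.8, prior=0.5):
    """Posterior probability that the hidden jar is the one emitting
    colour 1 with probability `p`, given a list of 0/1 bead draws."""
    like_a = prod(p if d == 1 else 1 - p for d in draws)        # jar A
    like_b = prod(1 - p if d == 1 else p for d in draws)        # jar B
    pa = prior * like_a
    return pa / (pa + (1 - prior) * like_b)

# Each consistent bead sharpens the posterior; contradictory beads cancel.
p0 = jar_posterior([])        # no draws: the prior, 0.5
p1 = jar_posterior([1])       # one colour-1 bead: 0.8
p2 = jar_posterior([1, 1])    # two consistent beads: > 0.9
```

The belief-updating abnormality the paper reports would correspond to weighting the first likelihood terms more heavily than this ideal observer, which treats every draw equally.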
Affiliation(s)
- Seth C Baker: Department of Psychiatry, New York State Psychiatric Institute, Columbia University Medical Center, 1051 Riverside Drive, New York, NY, USA
- Anna B Konova: Department of Psychiatry, University Behavioral Health Care, and Brain Health Institute, Rutgers University – New Brunswick, 671 Hoes Lane West, Piscataway, NJ, USA
- Nathaniel D Daw: Department of Psychology and Princeton Neuroscience Institute, Princeton University, South Drive, Princeton, NJ, USA
- Guillermo Horga: Department of Psychiatry, New York State Psychiatric Institute, Columbia University Medical Center, 1051 Riverside Drive, New York, NY, USA
39. Dopamine blockade impairs the exploration-exploitation trade-off in rats. Sci Rep 2019; 9:6770. [PMID: 31043685] [PMCID: PMC6494917] [DOI: 10.1038/s41598-019-43245-z]
Abstract
In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has theoretically been proposed that dopamine contributes to the control of this exploration-exploitation trade-off, specifically that the higher the level of tonic dopamine, the more exploitation is favored. We demonstrate here that there is a formal relationship between the rescaling of dopamine positive reward prediction errors and the exploration-exploitation trade-off in simple non-stationary multi-armed bandit tasks. We further show in rats performing such a task that systemically antagonizing dopamine receptors greatly increases the number of random choices without affecting learning capacities. Simulations and comparison of a set of different computational models (an extended Q-learning model, a directed exploration model, and a meta-learning model) fitted on each individual confirm that, independently of the model, decreasing dopaminergic activity does not affect learning rate but is equivalent to an increase in random exploration rate. This study shows that dopamine could adapt the exploration-exploitation trade-off in decision-making when facing changing environmental contingencies.
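The paper's "equivalent to an increase in random exploration rate" finding can be illustrated with a generic ε-greedy Q-learner on a two-armed bandit (a sketch with hypothetical parameters, not the authors' extended model): raising the random-choice rate degrades performance while leaving the learned values largely intact, mirroring the dissociation reported above.

```python
import random

def run_bandit(epsilon, alpha=0.1, p_reward=(0.8, 0.2), n_trials=5000, seed=0):
    """epsilon-greedy Q-learning on a two-armed Bernoulli bandit.
    Returns (fraction of best-arm choices, learned Q-values)."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    best_choices = 0
    for _ in range(n_trials):
        if rng.random() < epsilon:
            a = rng.randrange(2)            # random exploration
        else:
            a = 0 if q[0] >= q[1] else 1    # exploit current estimates
        r = 1.0 if rng.random() < p_reward[a] else 0.0
        q[a] += alpha * (r - q[a])          # reward prediction error update
        best_choices += (a == 0)
    return best_choices / n_trials, q

# More random exploration (the sketch's stand-in for dopamine blockade)
# lowers the best-arm choice rate, yet value learning still proceeds.
low_eps = run_bandit(epsilon=0.05)
high_eps = run_bandit(epsilon=0.5)
```

In the paper the manipulation is formalized through rescaled positive reward prediction errors rather than a literal ε parameter; the sketch only shows why "more random choices without affected learning" is a coherent pattern.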
40. Monkeys are curious about counterfactual outcomes. Cognition 2019; 189:1-10. [PMID: 30889493] [DOI: 10.1016/j.cognition.2019.03.009]
Abstract
Many non-human animals show exploratory behaviors. It remains unclear whether any possess human-like curiosity. We previously proposed three criteria for applying the term curiosity to animal behavior: (1) the subject is willing to sacrifice reward to obtain information, (2) the information provides no immediate instrumental or strategic benefit, and (3) the amount the subject is willing to pay depends systematically on the amount of information available. In previous work on information-seeking in animals, information generally predicts upcoming rewards, and animals' decisions may therefore be a byproduct of reinforcement processes. Here we get around this potential confound by taking advantage of macaques' ability to reason counterfactually (that is, about outcomes that could have occurred had the subject chosen differently). Specifically, macaques sacrificed fluid reward to obtain information about counterfactual outcomes. Moreover, their willingness to pay scaled with the information (Shannon entropy) offered by the counterfactual option. These results demonstrate the existence of human-like curiosity in non-human primates according to our criteria, which circumvent several confounds associated with less stringent criteria.
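The information measure that willingness to pay scaled with here is the standard Shannon entropy of the counterfactual outcome distribution. A tiny helper (illustrative probabilities only) makes the scaling concrete:

```python
from math import log2

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete outcome distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A counterfactual option with more uncertain outcomes offers more
# information, so under the paper's criteria it should command a larger
# sacrificed reward. The probabilities below are illustrative.
certain = shannon_entropy([1.0])        # 0 bits: nothing left to learn
skewed = shannon_entropy([0.9, 0.1])    # ~0.47 bits
even = shannon_entropy([0.5, 0.5])      # 1 bit: maximal for two outcomes
```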
41. Furl N, Averbeck BB, McKay RT. Looking for Mr(s) Right: Decision bias can prevent us from finding the most attractive face. Cogn Psychol 2019; 111:1-14. [PMID: 30826584] [DOI: 10.1016/j.cogpsych.2019.02.002]
Abstract
In realistic and challenging decision contexts, people may show biases that prevent them from choosing their favored options. For example, astronomer Johannes Kepler famously interviewed several candidate fiancées sequentially, but was rejected when attempting to return to a previous candidate. Similarly, we examined human performance on searches for attractive faces through fixed-length sequences by adapting optimal stopping computational theory developed from behavioral ecology and economics. Although economics studies have repeatedly found that participants sample too few options before choosing the best-ranked number from a series, we instead found overlong searches with many sequences ending without choice. Participants employed irrationally high choice thresholds, compared to the more lax, realistic standards of a Bayesian ideal observer, which achieved better-ranked faces. We consider several computational accounts and find that participants most resemble a Bayesian model that decides based on altered attractiveness values. These values may produce starkly different biases in the facial attractiveness domain than in other decision domains.
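The fixed-length search in this study is a finite-horizon optimal stopping problem. A minimal dynamic-programming sketch (Uniform(0,1) values as a stand-in for attractiveness ratings, not the authors' Bayesian ideal observer) computes rational acceptance thresholds, which relax as options run out, in contrast to the irrationally high thresholds the participants maintained:

```python
def stopping_thresholds(n):
    """Optimal thresholds for accepting one of n sequential i.i.d.
    Uniform(0, 1) values when the last option must be taken.
    thresholds[t] is the cutoff at position t (0-indexed)."""
    v = 0.5                        # expected value if forced to take the last
    thresholds = [0.0]             # final position: accept anything
    for _ in range(n - 1):
        thresholds.append(v)       # accept now only if value > continuation
        v = (1.0 + v * v) / 2.0    # E[max(X, v)] for X ~ Uniform(0, 1)
    return list(reversed(thresholds))

# An ideal searcher lowers its standards toward the end of the sequence
# rather than letting the search end without a choice.
ths = stopping_thresholds(8)
```

The recursion simply says the cutoff at each position equals the expected value of continuing, computed by backward induction from the forced final choice.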
Affiliation(s)
- Nicholas Furl: Department of Psychology, Royal Holloway, University of London, Egham TW20 0EX, United Kingdom
- Bruno B Averbeck: NIMH/NIH, 49 Convent Drive, MSC 4415, Bethesda, MD 20892-4415, United States
- Ryan T McKay: Department of Psychology, Royal Holloway, University of London, Egham TW20 0EX, United Kingdom
42.
43. Dale G, Sampers D, Loo S, Green CS. Individual differences in exploration and persistence: Grit and beliefs about ability and reward. PLoS One 2018; 13:e0203131. [PMID: 30180200] [PMCID: PMC6122809] [DOI: 10.1371/journal.pone.0203131]
Abstract
The tradeoff between knowing when to seek greater rewards (exploration), and knowing when to settle (exploitation), is critical to success. One dispositional factor that may modulate this tradeoff is "grit." Gritty individuals tend to persist in the face of difficulty and consequently experience greater life success. It is possible that they may also experience a greater tendency to explore in a reward task. However, although most exploration/exploitation tasks manipulate beliefs about the presence/magnitude of rewards in the environment, the belief of one's ability to actually achieve a reward is also critical. As such, we investigated whether individuals higher in grit were more likely to explore, and how beliefs about the magnitude/presence of rewards, and the perceived ability to achieve a reward, modulated their exploration tendencies. Over two experiments, participants completed 4 different exploration/persistence tasks: two that tapped into participant beliefs about the presence/magnitude of rewards, and two that tapped into participant beliefs about their ability to achieve a reward. Participants also completed measures of dispositional grit (Experiment 1a and 1b), conscientiousness (Experiment 1b), and working memory (Experiment 1a and 1b). In both experiments, we found a relationship between the two "belief of rewards" tasks, as well as between the two "belief of ability" tasks, but performance was unrelated across the two types of task. We also found that dispositional grit was strongly associated with greater exploration, but only on the "belief of ability" tasks. Finally, in Experiment 1b we showed that conscientiousness better predicted exploration on the "belief of ability" tasks than grit, suggesting that it is not grittiness per se that is associated with exploration. Overall, our findings showed that individuals high in grit/conscientiousness are more likely to explore, but only when there is a known reward available that they believe they have the ability to achieve.
Affiliation(s)
- Gillian Dale: Department of Psychology, University of Wisconsin-Madison, Madison, WI, United States of America
- Danielle Sampers: Department of Psychology, University of Wisconsin-Madison, Madison, WI, United States of America
- Stephanie Loo: Department of Psychology, University of Wisconsin-Madison, Madison, WI, United States of America
- C. Shawn Green: Department of Psychology, University of Wisconsin-Madison, Madison, WI, United States of America
44. Fetsch CR, Odean NN, Jeurissen D, El-Shamayleh Y, Horwitz GD, Shadlen MN. Focal optogenetic suppression in macaque area MT biases direction discrimination and decision confidence, but only transiently. eLife 2018; 7:e36523. [PMID: 30051817] [PMCID: PMC6086666] [DOI: 10.7554/elife.36523]
Abstract
Insights from causal manipulations of brain activity depend on targeting the spatial and temporal scales most relevant for behavior. Using a sensitive perceptual decision task in monkeys, we examined the effects of rapid, reversible inactivation on a spatial scale previously achieved only with electrical microstimulation. Inactivating groups of similarly tuned neurons in area MT produced systematic effects on choice and confidence. Behavioral effects were attenuated over the course of each session, suggesting compensatory adjustments in the downstream readout of MT over tens of minutes. Compensation also occurred on a sub-second time scale: behavior was largely unaffected when the visual stimulus (and concurrent suppression) lasted longer than 350 ms. These trends were similar for choice and confidence, consistent with the idea of a common mechanism underlying both measures. The findings demonstrate the utility of hyperpolarizing opsins for linking neural population activity at fine spatial and temporal scales to cognitive functions in primates.
Affiliation(s)
- Christopher R Fetsch: Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, United States; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, United States
- Naomi N Odean: Kavli Institute, Columbia University, New York, United States; Howard Hughes Medical Institute, Columbia University, New York, United States; Department of Neuroscience, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Danique Jeurissen: Kavli Institute, Columbia University, New York, United States; Howard Hughes Medical Institute, Columbia University, New York, United States; Department of Neuroscience, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Yasmine El-Shamayleh: Department of Physiology & Biophysics, Washington National Primate Research Center, University of Washington, Washington, United States
- Gregory D Horwitz: Department of Physiology & Biophysics, Washington National Primate Research Center, University of Washington, Washington, United States
- Michael N Shadlen: Kavli Institute, Columbia University, New York, United States; Howard Hughes Medical Institute, Columbia University, New York, United States; Department of Neuroscience, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
45. Martinelli C, Rigoli F, Averbeck B, Shergill SS. The value of novelty in schizophrenia. Schizophr Res 2018; 192:287-293. [PMID: 28495493] [PMCID: PMC5890442] [DOI: 10.1016/j.schres.2017.05.007]
Abstract
Influential models of schizophrenia suggest that patients experience incoming stimuli as excessively novel and motivating, with important consequences for hallucinatory experience and delusional belief. However, whether schizophrenia patients exhibit excessive novelty value and whether this interferes with adaptive behaviour has not yet been formally tested. Here, we employed a three-armed bandit task to investigate this hypothesis. Schizophrenia patients and healthy controls were first familiarised with a group of images and then asked to repeatedly choose between familiar and unfamiliar images associated with different monetary reward probabilities. By fitting a reinforcement-learning model we were able to estimate the values attributed to familiar and unfamiliar images when first presented in the context of the decision-making task. In line with our hypothesis, we found increased preference for newly introduced images (irrespective of whether these were familiar or unfamiliar) in patients compared to healthy controls, a preference that correlated with severity of hallucinatory experience. In addition, we found a correlation between value assigned to novel images and task performance, suggesting that excessive novelty value may interfere with optimal learning in patients, putatively through the disruption of the mechanisms regulating exploration versus exploitation. Our results suggest excessive novelty value in patients, whereby even previously seen stimuli acquire higher value as the result of their exposure in a novel context: a form of 'hyper novelty' that may explain why patients are often attracted by familiar stimuli experienced as new.
Collapse
Affiliation(s)
- Cristina Martinelli
- Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience, King's College London, De Crespigny Park, SE5 8AF London, United Kingdom.
| | - Francesco Rigoli
- Wellcome Trust Centre for Neuroimaging, University College London, 12 Queen's Square, WC1N 3BG London, United Kingdom
| | - Bruno Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institute of Health, Bethesda, MD 20892-4415, USA
| | - Sukhwinder S Shergill
- Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience, King's College London, De Crespigny Park, SE5 8AF London, United Kingdom
| |
Collapse
|
46
|
Cogliati Dezza I, Yu AJ, Cleeremans A, Alexander W. Learning the value of information and reward over time when solving exploration-exploitation problems. Sci Rep 2017; 7:16919. [PMID: 29209058 PMCID: PMC5717252 DOI: 10.1038/s41598-017-17237-w] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2017] [Accepted: 11/22/2017] [Indexed: 11/09/2022] Open
Abstract
To adapt flexibly to the demands of their environment, animals constantly face the conflict of having to choose between predictably rewarding familiar options (exploitation) and risky novel options, whose value consists essentially in the new information they provide about the space of possible rewards (exploration). Despite extensive research, the mechanisms by which animals solve this exploration-exploitation dilemma are still poorly understood. Here, we investigate human decision-making in a gambling task in which the informational value of each trial and the reward potential were manipulated separately. To better characterize the mechanisms underlying the observed behavioural choices, we introduce a computational model that augments the standard reward-based reinforcement-learning formulation by associating a value with information. We find that both reward and information gained during learning influence the balance between exploitation and exploration, and that this influence depends on the reward context. Our results shed light on the mechanisms that underpin decision-making under uncertainty and suggest new approaches for investigating the exploration-exploitation dilemma throughout the animal kingdom.
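The core idea — augmenting reward-based values with an explicit value of information — can be sketched as follows. The additive combination, the `beta_info` weight, and the decay of the bonus with sampling count are hypothetical functional forms chosen for illustration, not the paper's model.

```python
def choice_values(q_reward, sample_counts, beta_info=0.5):
    """Combine a learned reward value with an information bonus that shrinks
    as an option is sampled more often, so rarely tried options stay
    attractive (exploration) even at equal reward value."""
    return {a: q_reward[a] + beta_info / (1 + sample_counts[a])
            for a in q_reward}
```

With equal reward estimates, the never-sampled arm carries the full bonus and wins the comparison; as its count grows, choice reverts to reward value alone.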
Collapse
Affiliation(s)
- Irene Cogliati Dezza
- Centre for Research in Cognition & Neurosciences (CRCN), Université Libre de Bruxelles, Brussels, Belgium.
| | - Angela J Yu
- Department of Cognitive Science, University of California San Diego, La Jolla, CA, United States
| | - Axel Cleeremans
- Centre for Research in Cognition & Neurosciences (CRCN), Université Libre de Bruxelles, Brussels, Belgium
| | - William Alexander
- Department of Experimental Psychology, Ghent University, Gent, Belgium
| |
Collapse
|
47
|
Vicario-Feliciano R, Murray EA, Averbeck BB. Ventral striatum lesions do not affect reinforcement learning with deterministic outcomes on slow time scales. Behav Neurosci 2017; 131:385-91. [PMID: 28805428 DOI: 10.1037/bne0000211] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
A large body of work has implicated the ventral striatum (VS) in aspects of reinforcement learning (RL). However, less work has directly examined the effects of VS lesions, or other forms of inactivation, on 2-armed bandit RL tasks. We have recently found that VS lesions in macaque monkeys affect learning with stochastic schedules but have minimal effects with deterministic schedules. The reasons for this are not currently clear. Because our previous work used short intertrial intervals, one possibility is that the animals were using working memory to bridge stimulus-reward associations from one trial to the next. In the present study, we examined learning of 60 pairs of objects, in which the animals received only one trial per day with each pair. The large number of object pairs and the long interval (approximately 24 hr) between trials with a given pair minimized the chances that the animals could use working memory to bridge trials. We found that monkeys with VS lesions were unimpaired relative to controls, which suggests that animals with VS lesions can still learn to select rewarded objects even when they cannot make use of working memory.
Collapse
Affiliation(s)
- Raquel Vicario-Feliciano
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health
| | - Elisabeth A Murray
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health
| | - Bruno B Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health
| |
Collapse
|
48
|
Autonomous robotic exploration using a utility function based on Rényi’s general theory of entropy. Auton Robots 2017. [DOI: 10.1007/s10514-017-9662-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
49
|
van den Berg R, Zylberberg A, Kiani R, Shadlen MN, Wolpert DM. Confidence Is the Bridge between Multi-stage Decisions. Curr Biol 2016; 26:3157-3168. [PMID: 27866891 PMCID: PMC5154755 DOI: 10.1016/j.cub.2016.10.021] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2016] [Revised: 09/18/2016] [Accepted: 10/12/2016] [Indexed: 11/30/2022]
Abstract
Demanding tasks often require a series of decisions to reach a goal. Recent progress in perceptual decision-making has served to unite decision accuracy, speed, and confidence in a common framework of bounded evidence accumulation, furnishing a platform for the study of such multi-stage decisions. In many instances, the strategy applied to each decision, such as the speed-accuracy trade-off, ought to depend on the accuracy of the previous decisions. However, as the accuracy of each decision is often unknown to the decision maker, we hypothesized that subjects may carry forward a level of confidence in previous decisions to affect subsequent decisions. Subjects made two perceptual decisions sequentially and were rewarded only if they made both correctly. The speed and accuracy of individual decisions were explained by noisy evidence accumulation to a terminating bound. We found that subjects adjusted their speed-accuracy setting by elevating the termination bound on the second decision in proportion to their confidence in the first. The findings reveal a novel role for confidence and a degree of flexibility, hitherto unknown, in the brain's ability to rapidly and precisely modify the mechanisms that control the termination of a decision.
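The abstract's central mechanism — elevating the second decision's termination bound in proportion to first-decision confidence — can be sketched as a single mapping. The exponential confidence readout from decision time (faster first decisions taken as more confident) and the linear bound coupling are illustrative stand-ins, not the fitted model.

```python
import math

def second_stage_bound(t1, base_bound=1.5, k=1.0, tau=20.0):
    """Map first-decision time t1 to a confidence proxy (fast -> confident)
    and raise the second decision's termination bound in proportion.
    Functional forms and parameters are illustrative assumptions."""
    confidence = math.exp(-t1 / tau)   # in (0, 1]; decays with decision time
    return base_bound + k * confidence # higher confidence -> higher bound
```

A slow, low-confidence first decision thus leaves the second bound near `base_bound`, trading accuracy for speed when the joint reward is already unlikely.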
Collapse
Affiliation(s)
- Ronald van den Berg
- Computational and Biological Learning Laboratory, Department of Engineering, Cambridge University, Cambridge CB2 1PZ, UK
| | - Ariel Zylberberg
- Department of Neuroscience, Zuckerman Mind Brain Behavior Institute, Kavli Institute of Brain Science, and Howard Hughes Medical Institute, Columbia University, New York, NY 10032, USA
| | - Roozbeh Kiani
- Center for Neural Science, New York University, New York, NY 10003, USA
| | - Michael N Shadlen
- Department of Neuroscience, Zuckerman Mind Brain Behavior Institute, Kavli Institute of Brain Science, and Howard Hughes Medical Institute, Columbia University, New York, NY 10032, USA
| | - Daniel M Wolpert
- Computational and Biological Learning Laboratory, Department of Engineering, Cambridge University, Cambridge CB2 1PZ, UK.
| |
Collapse
|
50
|
Abstract
A key component of interacting with the world is how to direct one's sensors so as to extract task-relevant information - a process referred to as active sensing. In this review, we present a framework for active sensing that forms a closed loop between an ideal observer, which extracts task-relevant information from a sequence of observations, and an ideal planner, which specifies the actions that lead to the most informative observations. We discuss active sensing as an approximation to exploration in the wider framework of reinforcement learning and, conversely, discuss several sensory, perceptual, and motor processes as approximations to active sensing. Based on this framework, we introduce a taxonomy of sensing strategies, identify hallmarks of active sensing, and discuss recent advances in formalizing and quantifying active sensing.
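One standard way to formalize the ideal-planner step ("actions that lead to the most informative observations") is expected information gain: choose the action whose observation most reduces the entropy of the belief over hypotheses. The discrete toy form below is a generic sketch of that idea, not the review's specific formulation.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def expected_info_gain(prior, likelihood):
    """likelihood[o][h] = P(outcome o | hypothesis h) for one action.
    Returns prior entropy minus expected posterior entropy."""
    gain = entropy(prior)
    for lik in likelihood:                                # each outcome o
        p_o = sum(l * p for l, p in zip(lik, prior))      # P(o)
        if p_o > 0:
            post = [l * p / p_o for l, p in zip(lik, prior)]
            gain -= p_o * entropy(post)
    return gain

def best_action(prior, actions):
    """Ideal-planner step: pick the action expected to be most informative."""
    return max(actions, key=lambda a: expected_info_gain(prior, actions[a]))
```

A perfectly diagnostic observation yields the full prior entropy as gain; an observation independent of the hypotheses yields zero, so the planner always prefers the former.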
Collapse
Affiliation(s)
- Scott Cheng-Hsin Yang
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK
| | - Daniel M Wolpert
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK
| | - Máté Lengyel
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK; Department of Cognitive Science, Central European University, Budapest H-1051, Hungary
| |
Collapse
|