1
Qian P, Bridgers S, Taliaferro M, Parece K, Ullman TD. Ambivalence by design: A computational account of loopholes. Cognition 2024; 252:105914. [PMID: 39178715 DOI: 10.1016/j.cognition.2024.105914] [Received: 09/07/2023] [Revised: 07/31/2024] [Accepted: 08/02/2024] [Indexed: 08/26/2024]
Abstract
Loopholes offer an opening. Rather than comply or directly refuse, people can subvert an intended request by an intentional misunderstanding. Such behaviors exploit ambiguity and under-specification in language. Using loopholes is commonplace and intuitive in everyday social interaction, both familiar and consequential. Loopholes are also of concern in the law, and increasingly in artificial intelligence. However, the computational and cognitive underpinnings of loopholes are not well understood. Here, we propose a utility-theoretic recursive social reasoning model that formalizes and accounts for loophole behavior. The model captures the decision process of a loophole-aware listener, who trades off their own utility with that of the speaker, and considers an expected social penalty for non-cooperative behavior. The social penalty is computed through the listener's recursive reasoning about a virtual naive observer's inference of a naive listener's social intent. Our model captures qualitative patterns in previous data, and also generates new quantitative predictions consistent with novel studies (N = 265). We consider the broader implications of our model for other aspects of social reasoning, including plausible deniability and humor.
Affiliation(s)
- Peng Qian
  - Department of Brain and Cognitive Sciences, MIT, United States of America; Department of Psychology, Harvard University, United States of America
- Sophie Bridgers
  - Department of Brain and Cognitive Sciences, MIT, United States of America; Department of Psychology, Harvard University, United States of America
- Maya Taliaferro
  - Department of Brain and Cognitive Sciences, MIT, United States of America
- Kiera Parece
  - Department of Psychology, Harvard University, United States of America
- Tomer D Ullman
  - Department of Psychology, Harvard University, United States of America

2
Lamba A, Frank MJ, FeldmanHall O. Keeping an Eye Out for Change: Anxiety Disrupts Adaptive Resolution of Policy Uncertainty. Biol Psychiatry Cogn Neurosci Neuroimaging 2024; 9:1188-1198. [PMID: 39069235 DOI: 10.1016/j.bpsc.2024.07.015] [Received: 03/09/2024] [Revised: 07/17/2024] [Accepted: 07/17/2024] [Indexed: 07/30/2024]
Abstract
BACKGROUND: Human learning unfolds under uncertainty. Uncertainty is heterogeneous, with different forms exerting distinct influences on learning. While one can be uncertain about what to do to maximize rewarding outcomes, known as policy uncertainty, one can also be uncertain about general world knowledge, known as epistemic uncertainty (EU). In complex and naturalistic environments such as the social world, adaptive learning may hinge on striking a balance between attending to and resolving each type of uncertainty. Prior work illustrates that people with anxiety (those with increased threat and uncertainty sensitivity) learn less from aversive outcomes, particularly as outcomes become more uncertain. How does a learner adaptively trade off between attending to these distinct sources of uncertainty to successfully learn about their social environment?
METHODS: We developed a novel eye-tracking method to capture highly granular estimates of policy uncertainty and EU based on gaze patterns and pupil diameter (a physiological estimate of arousal).
RESULTS: These empirically derived uncertainty measures revealed that humans (N = 94) flexibly switched between resolving policy uncertainty and EU to adaptively learn about which individuals can be trusted and which should be avoided. However, those with increased anxiety (n = 49) did not flexibly switch between resolving policy uncertainty and EU and instead expressed less uncertainty overall.
CONCLUSIONS: Combining modeling and eye-tracking techniques, we show that altered learning in people with anxiety emerged from an insensitivity to policy uncertainty and rigid choice policies, leading to maladaptive behaviors with untrustworthy people.
Affiliation(s)
- Amrita Lamba
  - Department of Cognitive and Psychological Sciences, Brown University, Providence, Rhode Island; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Michael J Frank
  - Department of Cognitive and Psychological Sciences, Brown University, Providence, Rhode Island; Carney Institute of Brain Sciences, Brown University, Providence, Rhode Island
- Oriel FeldmanHall
  - Department of Cognitive and Psychological Sciences, Brown University, Providence, Rhode Island; Carney Institute of Brain Sciences, Brown University, Providence, Rhode Island

3
Toghi A, Chizari M, Khosrowabadi R. A causal role of the right dorsolateral prefrontal cortex in random exploration. Sci Rep 2024; 14:24796. [PMID: 39433838 PMCID: PMC11493979 DOI: 10.1038/s41598-024-76025-5] [Received: 05/01/2024] [Accepted: 10/09/2024] [Indexed: 10/23/2024] [Open Access]
Abstract
The decision to explore new options with uncertain outcomes or exploit familiar options with known outcomes is a fundamental challenge that the brain faces in almost all real-life decisions. Previous studies have shown that humans use two main exploratory strategies to negotiate this explore-exploit tradeoff. Exploration for the sake of information is called directed exploration, and exploration driven by behavioral variability is known as random exploration. While previous neuroimaging studies have identified different neural correlates for these exploratory strategies, including the right frontopolar cortex (FPC), right dorsolateral prefrontal cortex (DLPFC), and dorsal anterior cingulate cortex (dACC), causal evidence is still lacking for most of these brain regions. Here, we focused on the right DLPFC (rDLPFC), which previous work has implicated in exploration. Using continuous theta burst stimulation (cTBS) and the Horizon task in twenty-five healthy right-handed adult participants, we showed that inhibiting the rDLPFC did not change directed exploration but selectively reduced random exploration by increasing reward sensitivity to the average reward of each option. This suggests a causal role for the rDLPFC in random exploration and further supports dissociable neural implementations of these two exploratory strategies.
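Horizon-task analyses often separate the two strategies with a logistic choice model in which an information bonus captures directed exploration and a decision-noise parameter captures random exploration. The sketch below assumes that parameterization; the function and parameter names (`p_choose_right`, `info_bonus`, `noise`, `side_bias`) are illustrative, and this is not the model fitted in the study.

```python
import math


def p_choose_right(mean_right, mean_left, info_right, info_left,
                   info_bonus, noise, side_bias=0.0):
    """Probability of choosing the right-hand option on a two-armed trial.

    Assumed logistic model: directed exploration is an additive bonus
    (info_bonus) for the option with more information to gain; random
    exploration is the decision noise (noise) that flattens the curve.
    """
    delta_reward = mean_right - mean_left
    delta_info = info_right - info_left  # e.g. +1 if right is the less-sampled option
    decision_value = delta_reward + info_bonus * delta_info + side_bias
    return 1.0 / (1.0 + math.exp(-decision_value / noise))


# Right option is less sampled (delta_info = +1) but 4 points worse on average.
for noise in (8.0, 4.0):
    p = p_choose_right(mean_right=46, mean_left=50, info_right=1, info_left=0,
                       info_bonus=2.0, noise=noise)
    print(f"noise={noise}: P(choose informative right option) = {p:.3f}")
```

Lowering `noise` sharpens sensitivity to the reward difference, so the worse but more informative option is chosen less often while `info_bonus` is unchanged; in this parameterization, that pattern corresponds to a selective reduction in random exploration.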
Affiliation(s)
- Armin Toghi
  - Institute for Cognitive and Brain Science, Shahid Beheshti University, Tehran, Iran
- Mojtaba Chizari
  - Institute for Cognitive and Brain Science, Shahid Beheshti University, Tehran, Iran
- Reza Khosrowabadi
  - Institute for Cognitive and Brain Science, Shahid Beheshti University, Tehran, Iran

4
Parr AC, Sydnor VJ, Calabro FJ, Luna B. Adolescent-to-adult gains in cognitive flexibility are adaptively supported by reward sensitivity, exploration, and neural variability. Curr Opin Behav Sci 2024; 58:101399. [PMID: 38826569 PMCID: PMC11138371 DOI: 10.1016/j.cobeha.2024.101399] [Indexed: 06/04/2024]
Abstract
Cognitive flexibility exhibits dynamic changes throughout development, with different forms of flexibility showing dissociable developmental trajectories. In this review, we propose that an adolescent-specific mode of flexibility in the face of changing environmental contingencies supports the emergence of adolescent-to-adult gains in cognitive shifting efficiency. We first describe how cognitive shifting abilities monotonically improve from childhood to adulthood, accompanied by increases in brain state flexibility, neural variability, and excitatory/inhibitory balance. We next summarize evidence supporting the existence of a dopamine-driven, adolescent peak in flexible behavior that results in reward seeking, undirected exploration, and environmental sampling. We propose a neurodevelopmental framework that relates these adolescent behaviors to the refinement of neural phenotypes relevant to mature cognitive flexibility, and thus highlight the importance of the adolescent period in fostering healthy neurocognitive trajectories.
Affiliation(s)
- Ashley C. Parr
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Valerie J. Sydnor
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Finnegan J. Calabro
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Beatriz Luna
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15213, USA

5
Higashi H. Dynamics of visual attention in exploration and exploitation for reward-guided adjustment tasks. Conscious Cogn 2024; 123:103724. [PMID: 38996747 DOI: 10.1016/j.concog.2024.103724] [Received: 03/05/2024] [Revised: 06/24/2024] [Accepted: 06/26/2024] [Indexed: 07/14/2024]
Abstract
The learning process encompasses exploration and exploitation phases. While reinforcement learning models have revealed functional and neuroscientific distinctions between these phases, little is known about how they affect visual attention while observing the external environment. This study sought to elucidate the interplay between these learning phases and visual attention allocation using visual adjustment tasks combined with a two-armed bandit problem, tailored to detect serial effects only when attention is dispersed across both arms. We found that human participants exhibited a distinct serial effect only during the exploration phase, suggesting enhanced attention to the visual stimulus associated with the non-target arm. Remarkably, although rewards did not motivate attention dispersion in our task, during the exploration phase individuals engaged in active observation and searched for targets to observe. This behavior highlights a unique information-seeking process in exploration that is distinct from exploitation.
Affiliation(s)
- Hiroshi Higashi
  - Graduate School of Engineering, Osaka University, Suita, Osaka, Japan

6
Greco A, D'Alessandro M, Gallitto G, Rastelli C, Braun C, Caria A. Statistical Learning of Incidental Perceptual Regularities Induces Sensory Conditioned Cortical Responses. Biology (Basel) 2024; 13:576. [PMID: 39194514 DOI: 10.3390/biology13080576] [Received: 05/29/2024] [Revised: 07/24/2024] [Accepted: 07/29/2024] [Indexed: 08/29/2024]
Abstract
Statistical learning of sensory patterns can lead to predictive neural processes enhancing stimulus perception and enabling fast deviancy detection. Predictive processes have been extensively demonstrated when environmental statistical regularities are relevant to task execution. Preliminary evidence indicates that statistical learning can even occur independently of task relevance and top-down attention, although the temporal profile and neural mechanisms underlying sensory predictions and error signals induced by statistical learning of incidental sensory regularities remain unclear. In our study, we adopted an implicit sensory conditioning paradigm that elicited the generation of specific perceptual priors in relation to task-irrelevant audio-visual associations, while recording electroencephalography (EEG). Our results showed that learning task-irrelevant associations between audio-visual stimuli resulted in anticipatory neural responses to predictive auditory stimuli conveying anticipatory signals of expected visual stimulus presence or absence. Moreover, we observed specific modulation of cortical responses to probabilistic visual stimulus presentation or omission. Pattern similarity analysis indicated that responses to predictive auditory stimuli tended to resemble the response to expected visual stimulus presence or absence. Remarkably, hierarchical Gaussian filter (HGF) modeling, which estimated dynamic changes in prediction error signals in relation to the differential probabilistic occurrence of audio-visual stimuli, further demonstrated instantiation of predictive neural signals by showing distinct neural processing of prediction error in relation to violation of expected visual stimulus presence or absence.
Overall, our findings indicated that statistical learning of non-salient and task-irrelevant perceptual regularities could induce the generation of neural priors at the time of predictive stimulus presentation, possibly conveying sensory-specific information about the predicted consecutive stimulus.
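For readers unfamiliar with the hierarchical Gaussian filter, the sketch below implements the standard second-level update of a binary HGF with the volatility level held fixed, so each trial adds a constant `exp(omega)` to the level-2 variance. It is a simplified illustration of precision-weighted prediction errors, not the model configuration used in the study; the function name, starting values, and the outcome sequence are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_hgf_two_level(outcomes, mu2=0.0, sigma2=1.0, omega=-2.0):
    """Run a simplified two-level binary HGF over a sequence of 0/1 outcomes.

    Level 1: predicted probability of the outcome, mu1_hat = sigmoid(mu2).
    Level 2: log-odds belief mu2 with variance sigma2; the volatility level is
    fixed, so each trial adds exp(omega) to the level-2 variance.
    Returns per-trial predictions and outcome prediction errors.
    """
    predictions, errors = [], []
    for u in outcomes:
        sigma2_hat = sigma2 + math.exp(omega)                # predicted level-2 variance
        mu1_hat = sigmoid(mu2)                               # predicted outcome probability
        delta1 = u - mu1_hat                                 # outcome prediction error
        pi2 = 1.0 / sigma2_hat + mu1_hat * (1.0 - mu1_hat)   # posterior precision
        sigma2 = 1.0 / pi2
        mu2 = mu2 + sigma2 * delta1                          # precision-weighted update
        predictions.append(mu1_hat)
        errors.append(delta1)
    return predictions, errors


# Hypothetical cue followed by the visual stimulus on 8 of 10 trials (1 = present).
preds, pes = binary_hgf_two_level([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])
print([round(p, 2) for p in preds])
print([round(e, 2) for e in pes])
```

Once the model expects the stimulus, an omission (0) yields a negative prediction error and a presentation yields a small positive one. This sign asymmetry is one simple way to picture "violation of expected presence or absence".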
Affiliation(s)
- Antonino Greco
  - Department of Neural Dynamics and Magnetoencephalography, Hertie Institute for Clinical Brain Research, University of Tübingen, 72076 Tübingen, Germany
  - Werner Reichardt Centre for Integrative Neuroscience, University of Tübingen, 72076 Tübingen, Germany
  - MEG Center, University of Tübingen, 72076 Tübingen, Germany
- Marco D'Alessandro
  - Institute of Cognitive Sciences and Technologies, National Research Council, 00185 Rome, Italy
- Giuseppe Gallitto
  - Department of Neurology, University Hospital Essen, 45147 Essen, Germany
- Clara Rastelli
  - MEG Center, University of Tübingen, 72076 Tübingen, Germany
  - Department of Psychology and Cognitive Science, University of Trento, 38068 Rovereto, Italy
- Christoph Braun
  - Department of Neural Dynamics and Magnetoencephalography, Hertie Institute for Clinical Brain Research, University of Tübingen, 72076 Tübingen, Germany
  - Werner Reichardt Centre for Integrative Neuroscience, University of Tübingen, 72076 Tübingen, Germany
  - MEG Center, University of Tübingen, 72076 Tübingen, Germany
- Andrea Caria
  - Department of Psychology and Cognitive Science, University of Trento, 38068 Rovereto, Italy

7
Poli F, Li YL, Naidu P, Mars RB, Hunnius S, Ruggeri A. Toddlers strategically adapt their information search. Nat Commun 2024; 15:5780. [PMID: 38987261 PMCID: PMC11237003 DOI: 10.1038/s41467-024-48855-4] [Received: 03/21/2023] [Accepted: 05/14/2024] [Indexed: 07/12/2024] [Open Access]
Abstract
Adaptive information seeking is essential for humans to effectively navigate complex and dynamic environments. Here, we developed a gaze-contingent eye-tracking paradigm to examine the early emergence of adaptive information-seeking. Toddlers (N = 60, 18-36 months) and adults (N = 42) either learnt that an animal was equally likely to be found in any of four available locations, or that it was most likely to be found in one particular location. Afterwards, they were given control of a torchlight, which they could move with their eyes to explore the otherwise pitch-black task environment. Eye-movement data and Markov models show that, from 24 months of age, toddlers become more exploratory than adults, and start adapting their exploratory strategies to the information structure of the task. These results show that toddlers' search strategies are more sophisticated than previously thought, and identify the unique features that distinguish their information search from adults'.
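The abstract does not specify which Markov models were used. As a hedged illustration of how fixation sequences can be summarized with a first-order Markov model, the sketch below estimates a transition matrix over four locations from a hypothetical gaze sequence and computes its entropy rate, one possible index of how exploratory the search is. The location coding, the additive smoothing constant, and the use of entropy rate as the index are all assumptions.

```python
import math


def transition_matrix(sequence, n_states, smoothing=1.0):
    """Estimate first-order Markov transition probabilities with additive smoothing."""
    counts = [[smoothing] * n_states for _ in range(n_states)]
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return [[c / sum(row) for c in row] for row in counts]


def stationary_distribution(matrix, iterations=1000):
    """Approximate the stationary distribution by power iteration."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


def entropy_rate(matrix):
    """Entropy rate in bits: expected uncertainty about the next fixated location."""
    dist = stationary_distribution(matrix)
    return -sum(dist[i] * p * math.log2(p)
                for i, row in enumerate(matrix) for p in row if p > 0)


# Hypothetical fixation sequences over four hiding locations (0-3).
focused = [0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0]
spread = [0, 1, 2, 3, 1, 0, 3, 2, 0, 2, 1, 3]
for name, seq in (("focused", focused), ("spread", spread)):
    print(name, round(entropy_rate(transition_matrix(seq, 4)), 3))
```

The smoothing keeps every transition probability positive, which guarantees that the power iteration converges; with four locations the entropy rate is bounded above by 2 bits.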
Affiliation(s)
- Francesco Poli
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Yi-Lin Li
  - Wellcome Centre for Integrative Neuroimaging, Centre for Functional MRI of the Brain (FMRIB), Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Pravallika Naidu
  - Wellcome Centre for Integrative Neuroimaging, Centre for Functional MRI of the Brain (FMRIB), Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Rogier B Mars
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Sabine Hunnius
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Azzurra Ruggeri
  - Max Planck Research Group iSearch, Max Planck Institute for Human Development, Berlin, Germany
  - School of Social Sciences and Technology, Department of Education, Technical University Munich, Munich, Germany
  - Department of Cognitive Science, Central European University, Vienna, Austria

8
Li N, Lavalley CA, Chou KP, Chuning AE, Taylor S, Goldman CM, Torres T, Hodson R, Wilson RC, Stewart JL, Khalsa SS, Paulus MP, Smith R. Directed exploration is elevated in affective disorders but reduced by an aversive interoceptive state induction. medRxiv 2024:2024.06.19.24309110. [PMID: 38947082 PMCID: PMC11213056 DOI: 10.1101/2024.06.19.24309110] [Indexed: 07/02/2024]
Abstract
Elevated anxiety and uncertainty avoidance are known to exacerbate maladaptive choice in individuals with affective disorders. However, the differential roles of state vs. trait anxiety remain unclear, and underlying computational mechanisms have not been thoroughly characterized. In the present study, we investigated how a somatic (interoceptive) state anxiety induction influences learning and decision-making under uncertainty in individuals with clinically significant levels of trait anxiety. A sample of 58 healthy comparisons (HCs) and 61 individuals with affective disorders (iADs; i.e., depression and/or anxiety) completed a previously validated explore-exploit decision task, with and without an added breathing resistance manipulation designed to induce state anxiety. Computational modeling revealed a pattern in which iADs showed greater information-seeking (i.e., directed exploration; Cohen's d=.39, p=.039) in resting conditions, but that this was reduced by the anxiety induction. The affective disorders group also showed slower learning rates across conditions (Cohen's d=.52, p=.003), suggesting more persistent uncertainty. These findings highlight a complex interplay between trait anxiety and state anxiety. Specifically, while elevated trait anxiety is associated with persistent uncertainty, acute somatic anxiety can paradoxically curtail exploratory behaviors, potentially reinforcing maladaptive decision-making patterns in affective disorders.
Affiliation(s)
- Ning Li
  - Laureate Institute for Brain Research, Tulsa, OK
- Ko-Ping Chou
  - Laureate Institute for Brain Research, Tulsa, OK
- Rowan Hodson
  - Laureate Institute for Brain Research, Tulsa, OK
- Robert C. Wilson
  - Department of Psychology, University of Arizona, Tucson, AZ
  - Cognitive Science Program, University of Arizona, Tucson, AZ
- Sahib S. Khalsa
  - Laureate Institute for Brain Research, Tulsa, OK
  - Oxley College of Health and Natural Sciences, University of Tulsa, Tulsa, OK
- Martin P. Paulus
  - Laureate Institute for Brain Research, Tulsa, OK
  - Oxley College of Health and Natural Sciences, University of Tulsa, Tulsa, OK
- Ryan Smith
  - Laureate Institute for Brain Research, Tulsa, OK
  - Oxley College of Health and Natural Sciences, University of Tulsa, Tulsa, OK

9
Harms MB, Xu Y, Green CS, Woodard K, Wilson R, Pollak SD. The structure and development of explore-exploit decision making. Cogn Psychol 2024; 150:101650. [PMID: 38461609 PMCID: PMC11275514 DOI: 10.1016/j.cogpsych.2024.101650] [Received: 01/04/2023] [Revised: 03/05/2024] [Accepted: 03/06/2024] [Indexed: 03/12/2024]
Abstract
A critical component of human learning reflects the balance people must achieve between focusing on the utility of what they know versus openness to what they have yet to experience. How individuals decide whether to explore new options versus exploit known options has garnered growing interest in recent years. Yet, the component processes underlying decisions to explore and whether these processes change across development remain poorly understood. By contrasting a variety of tasks that measure exploration in slightly different ways, we found that decisions about whether to explore reflect (a) random exploration that is not explicitly goal-directed and (b) directed exploration to purposefully reduce uncertainty. While these components similarly characterized the decision-making of both youth and adults, younger participants made decisions that were less strategic, but more exploratory and flexible, than those of adults. These findings are discussed in terms of how people adapt to and learn from changing environments over time. Data have been made available on the Open Science Framework (osf.io).
Affiliation(s)
- Madeline B Harms
  - Department of Psychology, University of Wisconsin - Madison, 1202 West Johnson Street, Madison, WI 53706, United States
- Yuyan Xu
  - Department of Psychology, University of Wisconsin - Madison, 1202 West Johnson Street, Madison, WI 53706, United States
- C Shawn Green
  - Department of Psychology, University of Wisconsin - Madison, 1202 West Johnson Street, Madison, WI 53706, United States
- Kristina Woodard
  - Department of Psychology, University of Wisconsin - Madison, 1202 West Johnson Street, Madison, WI 53706, United States
- Robert Wilson
  - Department of Psychology, University of Arizona, 1503 E. University Blvd. (Building 68), Tucson, AZ 85721, United States
- Seth D Pollak
  - Department of Psychology, University of Wisconsin - Madison, 1202 West Johnson Street, Madison, WI 53706, United States

10
Arumugam D, Ho MK, Goodman ND, Van Roy B. Bayesian Reinforcement Learning With Limited Cognitive Load. Open Mind (Camb) 2024; 8:395-438. [PMID: 38665544 PMCID: PMC11045037 DOI: 10.1162/opmi_a_00132] [Received: 04/29/2023] [Accepted: 02/16/2024] [Indexed: 04/28/2024] [Open Access]
Abstract
All biological and artificial agents must act given limits on their ability to acquire and process information. As such, a general theory of adaptive behavior should be able to account for the complex interactions between an agent's learning history, decisions, and capacity constraints. Recent work in computer science has begun to clarify the principles that shape these dynamics by bridging ideas from reinforcement learning, Bayesian decision-making, and rate-distortion theory. This body of work provides an account of capacity-limited Bayesian reinforcement learning, a unifying normative framework for modeling the effect of processing constraints on learning and action selection. Here, we provide an accessible review of recent algorithms and theoretical results in this setting, paying special attention to how these ideas can be applied to studying questions in the cognitive and behavioral sciences.
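Rate-distortion theory is the piece of this framework that formalizes a capacity limit. As a hedged, self-contained illustration (a textbook building block, not one of the algorithms reviewed in the paper), the classic Blahut-Arimoto iteration below computes a capacity-limited response policy p(y | x) that trades expected distortion against the mutual information, or rate in bits, between states and responses. The trade-off parameter `beta` sets the price of information; the states, responses, and distortion values are hypothetical.

```python
import math


def blahut_arimoto(p_state, distortion, beta, iterations=200):
    """Blahut-Arimoto iteration for one point on the rate-distortion curve.

    p_state: source probabilities p(x).
    distortion: distortion[x][y] = cost of responding y to state x.
    beta: trade-off parameter; larger beta buys lower distortion with more bits.
    Returns (policy p(y|x), rate in bits, expected distortion).
    """
    n_x, n_y = len(p_state), len(distortion[0])
    q = [1.0 / n_y] * n_y  # marginal over responses, q(y)
    for _ in range(iterations):
        policy = []
        for x in range(n_x):
            weights = [q[y] * math.exp(-beta * distortion[x][y]) for y in range(n_y)]
            total = sum(weights)
            policy.append([w / total for w in weights])
        q = [sum(p_state[x] * policy[x][y] for x in range(n_x)) for y in range(n_y)]
    rate = sum(p_state[x] * policy[x][y] * math.log2(policy[x][y] / q[y])
               for x in range(n_x) for y in range(n_y) if policy[x][y] > 0)
    expected_distortion = sum(p_state[x] * policy[x][y] * distortion[x][y]
                              for x in range(n_x) for y in range(n_y))
    return policy, rate, expected_distortion


# Hypothetical task: two equally likely states; correct response costs 0, wrong costs 1.
p_x = [0.5, 0.5]
d = [[0.0, 1.0], [1.0, 0.0]]
for beta in (0.5, 5.0):
    _, rate, dist = blahut_arimoto(p_x, d, beta)
    print(f"beta={beta}: rate={rate:.3f} bits, expected distortion={dist:.3f}")
```

A low `beta` yields a nearly state-independent, cheap policy with many errors. A high `beta` spends close to the full 1 bit needed to track the state and makes few errors.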
Affiliation(s)
- Mark K. Ho
  - Center for Data Science, New York University
- Noah D. Goodman
  - Department of Computer Science, Stanford University
  - Department of Psychology, Stanford University
- Benjamin Van Roy
  - Department of Electrical Engineering, Stanford University
  - Department of Management Science & Engineering, Stanford University

11
Wiehler A, Peters J. Decomposition of Reinforcement Learning Deficits in Disordered Gambling via Drift Diffusion Modeling and Functional Magnetic Resonance Imaging. Comput Psychiatr 2024; 8:23-45. [PMID: 38774428 PMCID: PMC11104325 DOI: 10.5334/cpsy.104] [Received: 10/07/2023] [Accepted: 03/07/2024] [Indexed: 05/24/2024]
Abstract
Gambling disorder is associated with deficits in reward-based learning, but the underlying computational mechanisms are still poorly understood. Here, we examined this issue using a stationary reinforcement learning task in combination with computational modeling and functional magnetic resonance imaging (fMRI) in individuals who regularly participate in gambling (n = 23; seven fulfilled one to three DSM-5 criteria for gambling disorder, sixteen fulfilled four or more) and matched controls (n = 23). As predicted, the gambling group exhibited substantially reduced accuracy, whereas overall response times (RTs) were not reliably different between groups. We then applied comprehensive modeling with reinforcement learning drift diffusion models (RLDDMs) in combination with hierarchical Bayesian parameter estimation to shed light on the computational underpinnings of this performance deficit. In both groups, an RLDDM in which both non-decision time and decision threshold (boundary separation) changed over the course of the experiment accounted for the data best. The model showed good parameter and model recovery, and posterior predictive checks revealed that, in both groups, the model accurately reproduced the evolution of accuracies and RTs over time. Modeling revealed that, compared to controls, the learning impairment in the gambling group was linked to a more rapid reduction in decision thresholds over time and a reduced impact of value differences on the drift rate. The gambling group also showed shorter non-decision times. fMRI analyses replicated effects of prediction error coding in the ventral striatum and value coding in the ventromedial prefrontal cortex, but there was no credible evidence for group differences in these effects. Taken together, our findings show that reinforcement learning impairments in disordered gambling are linked to both maladaptive decision threshold adjustments and a reduced consideration of option values in the choice process.
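To make this model class concrete, here is a hedged simulation sketch of a reinforcement learning drift diffusion model. Q-values are learned with a delta rule, and the drift rate scales with their difference. As an assumption standing in for the paper's time-varying parameters, boundary separation and non-decision time follow a power function of trial number. The parameter names (`v_scale`, `a0`, `a_power`, `ndt0`, `ndt_power`) and their values are hypothetical, and the Euler-Maruyama step size `dt` is a simulation choice.

```python
import math
import random


def simulate_rlddm(n_trials=60, p_reward=(0.8, 0.2), alpha=0.1, v_scale=3.0,
                   a0=2.0, a_power=-0.2, ndt0=0.4, ndt_power=-0.1,
                   dt=0.001, seed=0):
    """Simulate choices and RTs from a simple RLDDM on a two-option task.

    Upper boundary = option 0, lower boundary = option 1. Boundary separation
    and non-decision time shrink as power functions of trial number
    (hypothetical parameterization). Returns (choice, rt, reward) tuples.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]
    trials = []
    for t in range(1, n_trials + 1):
        a = a0 * t ** a_power              # boundary separation on trial t
        ndt = ndt0 * t ** ndt_power        # non-decision time on trial t
        v = v_scale * (q[0] - q[1])        # drift scales with the value difference
        x, elapsed = a / 2.0, 0.0          # unbiased starting point
        while 0.0 < x < a:                 # Euler-Maruyama diffusion, unit noise
            x += v * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
            elapsed += dt
        choice = 0 if x >= a else 1
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        q[choice] += alpha * (reward - q[choice])  # delta-rule value update
        trials.append((choice, elapsed + ndt, reward))
    return trials


trials = simulate_rlddm()
accuracy = sum(1 for c, _, _ in trials if c == 0) / len(trials)
mean_rt = sum(rt for _, rt, _ in trials) / len(trials)
print(f"accuracy={accuracy:.2f}, mean RT={mean_rt:.3f} s")
```

In this toy setup, lowering `v_scale` weakens the influence of learned values on choices, and a more negative `a_power` makes thresholds collapse faster. These are the two directions in which the abstract reports that the gambling group differed from controls.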
Affiliation(s)
- Antonius Wiehler
  - Department of Systems Neuroscience, University Medical Centre Hamburg-Eppendorf, Hamburg, Germany
  - Institut du Cerveau et de la Moelle épinière (ICM), INSERM U 1127, CNRS UMR 7225, Sorbonne Universités, Paris, France
- Jan Peters
  - Department of Systems Neuroscience, University Medical Centre Hamburg-Eppendorf, Hamburg, Germany
  - Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany

12
Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024; 20:e1011950. [PMID: 38552190 PMCID: PMC10980507 DOI: 10.1371/journal.pcbi.1011950] [Received: 09/13/2023] [Accepted: 02/26/2024] [Indexed: 04/01/2024] [Open Access]
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions.
In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
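A minimal way to combine these components is a softmax over the sum of learned values, a per-action bias, and a hysteresis term. The hysteresis term is built from an exponentially decaying trace of past choices, where a positive weight produces repetition and a negative weight produces alternation. This is a hedged sketch under those assumptions; the names `bias`, `hyst_weight`, and `decay` are hypothetical, and the models compared in the paper are more elaborate.

```python
import math


def choice_probabilities(q_values, bias, trace, beta, hyst_weight):
    """Softmax over learned value + action bias + hysteresis for each action."""
    logits = [beta * q + b + hyst_weight * h
              for q, b, h in zip(q_values, bias, trace)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update_trace(trace, chosen, decay):
    """Decay the choice trace and mark the action just taken."""
    return [decay * h + (1.0 if a == chosen else 0.0) for a, h in enumerate(trace)]


q = [0.5, 0.5]             # equal learned values, so only bias and hysteresis matter
bias = [0.2, 0.0]          # hypothetical standing preference for action 0
trace = [0.0, 0.0]
for chosen in (0, 0, 1):   # hypothetical history of previous button presses
    trace = update_trace(trace, chosen, decay=0.5)
for w in (1.0, -1.0):      # repetition vs alternation tendency
    probs = choice_probabilities(q, bias, trace, beta=3.0, hyst_weight=w)
    print(f"hyst_weight={w:+.0f}: P = {[round(p, 3) for p in probs]}")
```

Because the trace decays rather than resetting, actions from several trials back still contribute. This matches the idea of biases persisting from multiple previous actions.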
Affiliation(s)
- Jaron T. Colas
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
  - Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
  - Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America

13
Li JJ, Shi C, Li L, Collins AGE. Dynamic noise estimation: A generalized method for modeling noise fluctuations in decision-making. bioRxiv 2024:2023.06.19.545524. [PMID: 38328176 PMCID: PMC10849494 DOI: 10.1101/2023.06.19.545524] [Indexed: 02/09/2024]
Abstract
Computational cognitive modeling is an important tool for understanding the processes supporting human and animal decision-making. Choice data in decision-making tasks are inherently noisy, and separating noise from signal can improve the quality of computational modeling. Common approaches to model decision noise often assume constant levels of noise or exploration throughout learning (e.g., the ϵ-softmax policy). However, this assumption is not guaranteed to hold; for example, a subject might disengage and lapse into an inattentive phase for a series of trials in the middle of otherwise low-noise performance. Here, we introduce a new, computationally inexpensive method to dynamically infer the levels of noise in choice behavior, under a model assumption that agents can transition between two discrete latent states (e.g., fully engaged and random). Using simulations, we show that modeling noise levels dynamically instead of statically can substantially improve model fit and parameter estimation, especially in the presence of long periods of noisy behavior, such as prolonged attentional lapses. We further demonstrate the empirical benefits of dynamic noise estimation at the individual and group levels by validating it on four published datasets featuring diverse populations, tasks, and models. Based on the theoretical and empirical evaluation of the method reported in the current work, we expect that dynamic noise estimation will improve modeling in many decision-making paradigms over the static noise estimation method currently used in the modeling literature, while keeping additional model complexity and assumptions minimal.
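The two-latent-state idea can be illustrated with a small hidden Markov model. On each trial the agent is either engaged, with choices following the model's policy, or disengaged, with choices uniformly random. The forward algorithm then marginalizes over the latent state sequence to give the likelihood of the observed choices. The sketch below assumes that structure; the transition probabilities `p_lapse` and `p_recover`, the initial engagement probability, and the example data are hypothetical and are not taken from the paper.

```python
import math


def dynamic_noise_log_likelihood(policy_probs, choices, n_actions,
                                 p_lapse, p_recover, p_engaged_init=0.9):
    """Log-likelihood of choices under a two-state (engaged/random) HMM.

    policy_probs[t][a]: probability that the engaged agent picks action a on trial t.
    p_lapse: P(engaged -> random); p_recover: P(random -> engaged).
    Returns (log-likelihood, per-trial filtered probability of being engaged).
    """
    alpha = [p_engaged_init, 1.0 - p_engaged_init]  # prior over [engaged, random]
    log_lik, posteriors = 0.0, []
    for probs, choice in zip(policy_probs, choices):
        # Probability of the observed choice under each latent state.
        emit = [probs[choice], 1.0 / n_actions]
        joint = [alpha[0] * emit[0], alpha[1] * emit[1]]
        evidence = sum(joint)
        log_lik += math.log(evidence)
        post = [j / evidence for j in joint]
        posteriors.append(post[0])
        # Propagate the filtered belief through the latent-state transitions.
        alpha = [post[0] * (1 - p_lapse) + post[1] * p_recover,
                 post[0] * p_lapse + post[1] * (1 - p_recover)]
    return log_lik, posteriors


# Hypothetical run: the engaged policy strongly prefers action 0,
# but trials 4-7 show the non-preferred action (a possible lapse).
policy = [[0.9, 0.1]] * 10
choices = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
ll, post = dynamic_noise_log_likelihood(policy, choices, 2, p_lapse=0.1, p_recover=0.2)
print(round(ll, 3), [round(p, 2) for p in post])
```

Setting `p_lapse=0` and `p_engaged_init=1` recovers a static model in which every choice comes from the engaged policy. In this example, the dynamic model assigns the lapse run a higher likelihood than that static model does.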
Affiliation(s)
- Jing-Jing Li
- Helen Wills Neuroscience Institute, University of California, Berkeley, 175 Li Ka Shing Center, Berkeley, 94720, CA, United States
- Chengchun Shi
- Department of Statistics, London School of Economics and Political Science, 69 Aldwych, London, WC2B 4RR, United Kingdom
- Lexin Li
- Helen Wills Neuroscience Institute, University of California, Berkeley, 175 Li Ka Shing Center, Berkeley, 94720, CA, United States
- Department of Biostatistics and Epidemiology, University of California, Berkeley, 2121 Berkeley Way, Berkeley, 94720, CA, United States
- Anne G E Collins
- Helen Wills Neuroscience Institute, University of California, Berkeley, 175 Li Ka Shing Center, Berkeley, 94720, CA, United States
- Department of Psychology, University of California, Berkeley, Berkeley, 94720, CA, United States
14
Gordon J, Chierichetti F, Panconesi A, Pezzulo G. Information foraging with an oracle. PLoS One 2023; 18:e0295005. [PMID: 38153955 PMCID: PMC10754449 DOI: 10.1371/journal.pone.0295005] [Received: 05/05/2023] [Accepted: 11/13/2023] [Indexed: 12/30/2023]
Abstract
During ecological decisions, such as when foraging for food or selecting a weekend activity, we often have to balance the costs and benefits of exploiting known options versus exploring novel ones. Here, we ask how individuals address such cost-benefit tradeoffs during tasks in which we can either explore by ourselves or seek external advice from an oracle (e.g., a domain expert or recommendation system). To answer this question, we designed two studies in which participants chose between inquiring (at a cost) for expert advice from an oracle and searching for options without guidance, under manipulations affecting the optimal choice. We found that participants showed a greater propensity to seek expert advice when it was instrumental in increasing payoff (study A), and when it reduced choice uncertainty, above and beyond payoff maximization (study B). This latter result was especially apparent in participants with greater trait-level intolerance of uncertainty. Taken together, these results suggest that we seek expert advice for both economic goals (i.e., payoff maximization) and epistemic goals (i.e., uncertainty minimization) and that our decisions to ask or not ask for advice are sensitive to cost-benefit tradeoffs.
Affiliation(s)
- Jeremy Gordon
- University of California, Berkeley, Berkeley, CA, United States of America
- Giovanni Pezzulo
- Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy
15
Modirshanechi A, Kondrakiewicz K, Gerstner W, Haesler S. Curiosity-driven exploration: foundations in neuroscience and computational modeling. Trends Neurosci 2023; 46:1054-1066. [PMID: 37925342 DOI: 10.1016/j.tins.2023.10.002] [Received: 06/21/2023] [Revised: 09/28/2023] [Accepted: 10/04/2023] [Indexed: 11/06/2023]
Abstract
Curiosity refers to the intrinsic desire of humans and animals to explore the unknown, even when there is no apparent reason to do so. Thus far, no single, widely accepted definition or framework for curiosity has emerged, but there is growing consensus that curious behavior is not goal-directed but related to seeking or reacting to information. In this review, we take a phenomenological approach and group behavioral and neurophysiological studies which meet these criteria into three categories according to the type of information seeking observed. We then review recent computational models of curiosity from the field of machine learning and discuss how they enable integrating different types of information seeking into one theoretical framework. Combinations of behavioral and neurophysiological studies along with computational modeling will be instrumental in demystifying the notion of curiosity.
Affiliation(s)
- Kacper Kondrakiewicz
- Neuroelectronics Research Flanders (NERF), Leuven, Belgium; VIB, Leuven, Belgium; Department of Neuroscience, KU Leuven, Leuven, Belgium
- Wulfram Gerstner
- École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.
- Sebastian Haesler
- Neuroelectronics Research Flanders (NERF), Leuven, Belgium; VIB, Leuven, Belgium; Department of Neuroscience, KU Leuven, Leuven, Belgium; Leuven Brain Institute, Leuven, Belgium.
16
Lloyd A, Viding E, McKay R, Furl N. Understanding patch foraging strategies across development. Trends Cogn Sci 2023; 27:1085-1098. [PMID: 37500422 DOI: 10.1016/j.tics.2023.07.004] [Received: 03/30/2023] [Revised: 07/05/2023] [Accepted: 07/06/2023] [Indexed: 07/29/2023]
Abstract
Patch foraging is a near-ubiquitous behaviour across the animal kingdom and characterises many decision-making domains encountered by humans. We review how a disposition to explore in adolescence may reflect the evolutionary conditions under which hunter-gatherers foraged for resources. We propose that neurocomputational mechanisms responsible for reward processing, learning, and cognitive control facilitate the transition from exploratory strategies in adolescence to exploitative strategies in adulthood - where individuals capitalise on known resources. This developmental transition may be disrupted by psychopathology, as there is emerging evidence of biases in explore/exploit choices in mental health problems. Explore/exploit choices may be an informative marker for mental health across development and future research should consider this feature of decision-making as a target for clinical intervention.
Affiliation(s)
- Alex Lloyd
- Clinical, Educational, and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK.
- Essi Viding
- Clinical, Educational, and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK
- Ryan McKay
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX, UK
- Nicholas Furl
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX, UK
17
Modirshanechi A, Becker S, Brea J, Gerstner W. Surprise and novelty in the brain. Curr Opin Neurobiol 2023; 82:102758. [PMID: 37619425 DOI: 10.1016/j.conb.2023.102758] [Received: 03/31/2023] [Revised: 06/30/2023] [Accepted: 07/20/2023] [Indexed: 08/26/2023]
Abstract
Notions of surprise and novelty have been used in various experimental and theoretical studies across multiple brain areas and species. However, 'surprise' and 'novelty' refer to different quantities in different studies, which raises concerns about whether these studies indeed relate to the same functionalities and mechanisms in the brain. Here, we address these concerns through a systematic investigation of how different aspects of surprise and novelty relate to different brain functions and physiological signals. We review recent classifications of definitions proposed for surprise and novelty along with links to experimental observations. We show that computational modeling and quantifiable definitions enable novel interpretations of previous findings and form a foundation for future theoretical and experimental studies.
Affiliation(s)
- Alireza Modirshanechi
- Brain-Mind Institute, School of Life Sciences, EPFL, Lausanne, Switzerland; School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland.
- Sophia Becker
- Brain-Mind Institute, School of Life Sciences, EPFL, Lausanne, Switzerland; School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland. https://twitter.com/sophiabecker95
- Johanni Brea
- Brain-Mind Institute, School of Life Sciences, EPFL, Lausanne, Switzerland; School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland
- Wulfram Gerstner
- Brain-Mind Institute, School of Life Sciences, EPFL, Lausanne, Switzerland; School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland.
18
Brändle F, Stocks LJ, Tenenbaum JB, Gershman SJ, Schulz E. Empowerment contributes to exploration behaviour in a creative video game. Nat Hum Behav 2023; 7:1481-1489. [PMID: 37488401 DOI: 10.1038/s41562-023-01661-2] [Received: 02/17/2022] [Accepted: 06/15/2023] [Indexed: 07/26/2023]
Abstract
Studies of human exploration frequently cast people as serendipitously stumbling upon good options. Yet these studies may not capture the richness of exploration strategies that people exhibit in more complex environments. Here we study behaviour in a large dataset of 29,493 players of the richly structured online game 'Little Alchemy 2'. In this game, players start with four elements, which they can combine to create up to 720 complex objects. We find that players are driven not only by external reward signals, such as an attempt to produce successful outcomes, but also by an intrinsic motivation to create objects that empower them to create even more objects. We find that this drive for empowerment is eliminated when playing a game variant that lacks recognizable semantics, indicating that people use their knowledge about the world and its possibilities to guide their exploration. Our results suggest that the drive for empowerment may be a potent source of intrinsic motivation in richly structured domains, particularly those that lack explicit reward signals.
Affiliation(s)
- Lena J Stocks
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- Joshua B Tenenbaum
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
- Samuel J Gershman
- Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA
- Eric Schulz
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
19
Yang Y, Ai C, Chen W, Zhen J, Kong X, Jiang Y. Recent Advances in Sources of Bio-Inspiration and Materials for Robotics and Actuators. Small Methods 2023; 7:e2300338. [PMID: 37381685 DOI: 10.1002/smtd.202300338] [Received: 03/16/2023] [Revised: 05/16/2023] [Indexed: 06/30/2023]
Abstract
Bionic robotics and actuators have made dramatic advancements in structural design, material preparation, and application owing to the richness of nature and innovative material design. Appropriate and ingenious sources of bio-inspiration can stimulate a large number of different bionic systems. After millennia of survival and evolutionary exploration, the mere existence of life confirms that nature is constantly moving in an evolutionary direction of optimization and improvement. To this end, bio-inspired robots and actuators can be constructed for the completion of a variety of artificial design instructions and requirements. In this article, the advances in bio-inspired materials for robotics and actuators with the sources of bio-inspiration are reviewed. The specific sources of inspiration in bionic systems and corresponding bio-inspired applications are summarized first. Then the basic functions of materials in bio-inspired robots and actuators are discussed. Moreover, a principle of matching biomaterials is proposed. Furthermore, the implementation of biological information extraction is discussed, and the preparation methods of bionic materials are reclassified. Finally, the challenges and potential opportunities involved in finding sources of bio-inspiration and materials for robotics and actuators in the future are discussed.
Affiliation(s)
- Yue Yang
- Hebei Provincial Key Laboratory of Heavy Machinery Fluid Power Transmission and Control, Yanshan University, Qinhuangdao, 066004, P.R. China
- School of Mechanical Engineering, Yanshan University, Qinhuangdao, 066004, P.R. China
- Chao Ai
- Hebei Provincial Key Laboratory of Heavy Machinery Fluid Power Transmission and Control, Yanshan University, Qinhuangdao, 066004, P.R. China
- School of Mechanical Engineering, Yanshan University, Qinhuangdao, 066004, P.R. China
- Key Laboratory of Advanced Forging & Stamping Technology and Science (Yanshan University), Ministry of Education of China, Qinhuangdao, 066004, P.R. China
- Wenting Chen
- Hebei Provincial Key Laboratory of Heavy Machinery Fluid Power Transmission and Control, Yanshan University, Qinhuangdao, 066004, P.R. China
- School of Mechanical Engineering, Yanshan University, Qinhuangdao, 066004, P.R. China
- Key Laboratory of Advanced Forging & Stamping Technology and Science (Yanshan University), Ministry of Education of China, Qinhuangdao, 066004, P.R. China
- Jinpeng Zhen
- Hebei Provincial Key Laboratory of Heavy Machinery Fluid Power Transmission and Control, Yanshan University, Qinhuangdao, 066004, P.R. China
- School of Mechanical Engineering, Yanshan University, Qinhuangdao, 066004, P.R. China
- Xiangdong Kong
- Hebei Provincial Key Laboratory of Heavy Machinery Fluid Power Transmission and Control, Yanshan University, Qinhuangdao, 066004, P.R. China
- School of Mechanical Engineering, Yanshan University, Qinhuangdao, 066004, P.R. China
- Key Laboratory of Advanced Forging & Stamping Technology and Science (Yanshan University), Ministry of Education of China, Qinhuangdao, 066004, P.R. China
- Yunhong Jiang
- Hub for Biotechnology in the Built Environment, Department of Applied Sciences, Northumbria University, Newcastle, NE1 8ST, UK
20
Fan H, Burke T, Sambrano DC, Dial E, Phelps EA, Gershman SJ. Pupil Size Encodes Uncertainty during Exploration. J Cogn Neurosci 2023; 35:1508-1520. [PMID: 37382476 DOI: 10.1162/jocn_a_02025] [Indexed: 06/30/2023]
Abstract
Exploration is an important part of decision making and is crucial to maximizing long-term rewards. Past work has shown that people use different forms of uncertainty to guide exploration. In this study, we investigate the role of the pupil-linked arousal system in uncertainty-guided exploration. We measured participants' (n = 48) pupil dilation while they performed a two-armed bandit task. Consistent with previous work, we found that people adopted a hybrid of directed, random, and undirected exploration, which are sensitive to relative uncertainty, total uncertainty, and value difference between options, respectively. We also found a positive correlation between pupil size and total uncertainty. Furthermore, augmenting the choice model with subject-specific total uncertainty estimates decoded from the pupil size improved predictions of held-out choices, suggesting that people used the uncertainty estimate encoded in pupil size to decide which option to explore. Together, the data shed light on the computations underlying uncertainty-driven exploration. Under the assumption that pupil size reflects locus coeruleus-norepinephrine neuromodulatory activity, these results also extend the theory of the locus coeruleus-norepinephrine function in exploration, highlighting its selective role in driving uncertainty-guided random exploration.
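The hybrid model mentioned above is often written, following Gershman's earlier work on exploration (an assumption of this sketch, not something the abstract states), as a probit regression on three quantities derived from Gaussian posterior beliefs about each arm: the value difference V, the relative uncertainty RU, and the value difference scaled by total uncertainty, V/TU. The minimal Python sketch below assumes a Kalman-filter learner for a two-armed Gaussian bandit; the function names and weights are illustrative placeholders, not fitted estimates from this study.

```python
import math


def kalman_update(mean, var, reward, obs_noise_var):
    """Posterior mean and variance of one arm after observing a reward (Gaussian bandit)."""
    gain = var / (var + obs_noise_var)
    return mean + gain * (reward - mean), var - gain * var


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_choose_first(means, variances, w_v, w_ru, w_tu):
    """Probit hybrid choice rule for two arms (assumed form, illustrative weights).

    w_v  weights the value difference V
    w_ru weights the relative uncertainty RU (directed exploration)
    w_tu weights V / TU, where TU is total uncertainty (random exploration)
    """
    v = means[0] - means[1]
    ru = math.sqrt(variances[0]) - math.sqrt(variances[1])
    tu = math.sqrt(variances[0] + variances[1])
    return normal_cdf(w_v * v + w_ru * ru + w_tu * v / tu)
```

With `w_tu` > 0, the same value difference yields choice probabilities closer to 0.5 when total uncertainty is high, which is how this form expresses uncertainty-scaled random exploration; a positive `w_ru` instead biases choice toward the more uncertain arm.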
Affiliation(s)
- Samuel J Gershman
- Harvard University, Cambridge, MA
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA
21
Blackwell KT, Doya K. Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks. PLoS Comput Biol 2023; 19:e1011385. [PMID: 37594982 PMCID: PMC10479916 DOI: 10.1371/journal.pcbi.1011385] [Received: 08/24/2022] [Revised: 09/05/2023] [Accepted: 07/25/2023] [Indexed: 08/20/2023]
Abstract
A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia with two Q matrices, one representing direct pathway neurons (G) and another representing indirect pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is to update the G and N matrices utilizing the temporal difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and then differences are resolved using a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks including extinction, renewal, discrimination; switching reward probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices. These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight into the role of direct- and indirect-pathway striatal neurons.
Affiliation(s)
- Kim T Blackwell
- Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia, United States of America
- Kenji Doya
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
22
Shamash P, Lee S, Saxe AM, Branco T. Mice identify subgoal locations through an action-driven mapping process. Neuron 2023; 111:1966-1978.e8. [PMID: 37119818 PMCID: PMC10636595 DOI: 10.1016/j.neuron.2023.03.034] [Received: 12/16/2021] [Revised: 10/12/2022] [Accepted: 03/27/2023] [Indexed: 05/01/2023]
Abstract
Mammals form mental maps of their environments by exploring their surroundings. Here, we investigate which elements of exploration are important for this process. We studied mouse escape behavior, in which mice are known to memorize subgoal locations (obstacle edges) to execute efficient escape routes to shelter. To test the role of exploratory actions, we developed closed-loop neural-stimulation protocols for interrupting various actions while mice explored. We found that blocking running movements directed at obstacle edges prevented subgoal learning; however, blocking several control movements had no effect. Reinforcement learning simulations and analysis of spatial data show that artificial agents can match these results if they have a region-level spatial representation and explore with object-directed movements. We conclude that mice employ an action-driven process for integrating subgoals into a hierarchical cognitive map. These findings broaden our understanding of the cognitive toolkit that mammals use to acquire spatial knowledge.
Affiliation(s)
- Philip Shamash
- UCL Sainsbury Wellcome Centre for Neural Circuits and Behaviour, London W1T 4JG, UK
- Sebastian Lee
- UCL Gatsby Computational Neuroscience Unit, London W1T 4JG, UK
- Andrew M Saxe
- UCL Gatsby Computational Neuroscience Unit, London W1T 4JG, UK
- Tiago Branco
- UCL Sainsbury Wellcome Centre for Neural Circuits and Behaviour, London W1T 4JG, UK.
23
Dubourg E, Thouzeau V, de Dampierre C, Mogoutov A, Baumard N. Exploratory preferences explain the human fascination for imaginary worlds in fictional stories. Sci Rep 2023; 13:8657. [PMID: 37246187 DOI: 10.1038/s41598-023-35151-2] [Received: 06/02/2022] [Accepted: 05/13/2023] [Indexed: 05/30/2023]
Abstract
Imaginary worlds are present and often central in many of the most culturally successful modern narrative fictions, be it in novels (e.g., Harry Potter), movies (e.g., Star Wars), video games (e.g., The Legend of Zelda), graphic novels (e.g., One Piece) and TV series (e.g., Game of Thrones). We propose that imaginary worlds are popular because they activate exploratory preferences that evolved to help us navigate the real world and find new fitness-relevant information. Therefore, we hypothesize that the attraction to imaginary worlds is intrinsically linked to the desire to explore novel environments and that both are influenced by the same underlying factors. Notably, the inter-individual and cross-cultural variability of the preference for imaginary worlds should follow the inter-individual and cross-cultural variability of exploratory preferences (with the personality trait Openness-to-experience, age, sex, and ecological conditions). We test these predictions with both experimental and computational methods. For experimental tests, we run a pre-registered online experiment about movie preferences (N = 230). For computational tests, we leverage two large cultural datasets, namely the Internet Movie Database (N = 9424 movies) and the Movie Personality Dataset (N = 3.5 million participants), and use machine-learning algorithms (i.e., random forest and topic modeling). In all, consistent with how the human preference for spatial exploration adaptively varies, we provide empirical evidence that imaginary worlds appeal more to more explorative people, people higher in Openness-to-experience, younger individuals, males, and individuals living in more affluent environments. We discuss the implications of these findings for our understanding of the cultural evolution of narrative fiction and, more broadly, the evolution of human exploratory preferences.
Affiliation(s)
- Edgar Dubourg
- Institut Jean Nicod, Département d'études cognitives, Ecole normale supérieure, Université PSL, EHESS, CNRS, Paris, France.
- Valentin Thouzeau
- Institut Jean Nicod, Département d'études cognitives, Ecole normale supérieure, Université PSL, EHESS, CNRS, Paris, France
- Charles de Dampierre
- Institut Jean Nicod, Département d'études cognitives, Ecole normale supérieure, Université PSL, EHESS, CNRS, Paris, France
- Andrei Mogoutov
- Institut Jean Nicod, Département d'études cognitives, Ecole normale supérieure, Université PSL, EHESS, CNRS, Paris, France
- Nicolas Baumard
- Institut Jean Nicod, Département d'études cognitives, Ecole normale supérieure, Université PSL, EHESS, CNRS, Paris, France
24
Turner MA, Moya C, Smaldino PE, Jones JH. The form of uncertainty affects selection for social learning. Evol Hum Sci 2023; 5:e20. [PMID: 37587949 PMCID: PMC10426062 DOI: 10.1017/ehs.2023.11] [Received: 10/24/2022] [Revised: 04/13/2023] [Accepted: 04/14/2023] [Indexed: 08/18/2023]
Abstract
Social learning is a critical adaptation for dealing with different forms of variability. Uncertainty is a severe form of variability where the space of possible decisions or probabilities of associated outcomes are unknown. We identified four theoretically important sources of uncertainty: temporal environmental variability; payoff ambiguity; selection-set size; and effective lifespan. When these combine, it is nearly impossible to fully learn about the environment. We develop an evolutionary agent-based model to test how each form of uncertainty affects the evolution of social learning. Agents perform one of several behaviours, modelled as a multi-armed bandit, to acquire payoffs. All agents learn about behavioural payoffs individually through an adaptive behaviour-choice model that uses a softmax decision rule. Use of vertical and oblique payoff-biased social learning evolved to serve as a scaffold for adaptive individual learning; they are not opposite strategies. Different types of uncertainty had varying effects. Temporal environmental variability suppressed social learning, whereas larger selection-set size promoted social learning, even when the environment changed frequently. Payoff ambiguity and lifespan interacted with other uncertainty parameters. This study begins to explain how social learning can predominate despite highly variable real-world environments when effective individual learning helps individuals recover from learning outdated social information.
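The individual-learning component described above (a softmax decision rule over estimated behavioural payoffs in a multi-armed bandit) can be sketched in Python as follows. The recency-weighted update, the learning rate `alpha`, the inverse temperature `inv_temp`, and the Gaussian payoff noise are illustrative assumptions; the abstract does not give the authors' exact update rule, and this sketch omits the social-learning and evolutionary parts of their model.

```python
import math
import random


def softmax_probs(estimates, inv_temp):
    """Softmax choice probabilities over payoff estimates (numerically stabilized)."""
    m = max(estimates)
    weights = [math.exp(inv_temp * (q - m)) for q in estimates]
    total = sum(weights)
    return [w / total for w in weights]


def run_individual_learner(true_payoffs, n_rounds, alpha=0.2, inv_temp=5.0, seed=0):
    """Individual learning in a multi-armed bandit with a softmax choice rule.

    Returns the final payoff estimates and the sequence of chosen behaviours.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(true_payoffs)
    history = []
    for _ in range(n_rounds):
        probs = softmax_probs(estimates, inv_temp)
        choice = rng.choices(range(len(estimates)), weights=probs)[0]
        payoff = rng.gauss(true_payoffs[choice], 0.1)  # noisy payoff (assumed Gaussian)
        # Recency-weighted update moves the chosen estimate toward the observed payoff
        estimates[choice] += alpha * (payoff - estimates[choice])
        history.append(choice)
    return estimates, history
```

A constant learning rate keeps weighting recent payoffs, which lets the estimates track a payoff landscape that shifts over time, one of the forms of uncertainty the abstract varies.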
Affiliation(s)
- Matthew A. Turner
- Department of Earth System Science, Stanford University, Stanford, CA 94305 USA
- Division of Social Sciences, Stanford Doerr School of Sustainability, Stanford University, Stanford, CA 94305 USA
- Cristina Moya
- Department of Anthropology, University of California at Davis, Davis, CA 95616 USA
- Paul E. Smaldino
- Cognitive and Information Sciences, University of California at Merced, Merced, CA 95340 USA
- Santa Fe Institute, Santa Fe, NM 87501 USA
- Center for Advanced Study in the Behavioral Sciences, Stanford University, Stanford, CA 94305 USA
- James Holland Jones
- Department of Earth System Science, Stanford University, Stanford, CA 94305 USA
- Division of Social Sciences, Stanford Doerr School of Sustainability, Stanford University, Stanford, CA 94305 USA
- Center for Advanced Study in the Behavioral Sciences, Stanford University, Stanford, CA 94305 USA
25
Roark CL, Chandrasekaran B. Stable, flexible, common, and distinct behaviors support rule-based and information-integration category learning. NPJ Sci Learn 2023; 8:14. [PMID: 37179364 PMCID: PMC10183008 DOI: 10.1038/s41539-023-00163-0] [Received: 09/26/2022] [Accepted: 04/21/2023] [Indexed: 05/15/2023]
Abstract
The ability to organize variable sensory signals into discrete categories is a fundamental process in human cognition thought to underlie many real-world learning problems. Decades of research suggests that two learning systems may support category learning and that categories with different distributional structures (rule-based, information-integration) optimally rely on different learning systems. However, it remains unclear how the same individual learns these different categories and whether the behaviors that support learning success are common or distinct across different categories. In two experiments, we investigate learning and develop a taxonomy of learning behaviors to investigate which behaviors are stable or flexible as the same individual learns rule-based and information-integration categories and which behaviors are common or distinct to learning success for these different types of categories. We found that some learning behaviors are stable in an individual across category learning tasks (learning success, strategy consistency), while others are flexibly task-modulated (learning speed, strategy, stability). Further, success in rule-based and information-integration category learning was supported by both common (faster learning speeds, higher working memory ability) and distinct factors (learning strategies, strategy consistency). Overall, these results demonstrate that even with highly similar categories and identical training tasks, individuals dynamically adjust some behaviors to fit the task and success in learning different kinds of categories is supported by both common and distinct factors. These results illustrate a need for theoretical perspectives of category learning to include nuances of behavior at the level of an individual learner.
Affiliation(s)
- Casey L Roark
- Department of Communication Science & Disorders, University of Pittsburgh, Pittsburgh, PA, USA.
- Center for the Neural Basis of Cognition, Pittsburgh, PA, USA.
- Bharath Chandrasekaran
- Department of Communication Science & Disorders, University of Pittsburgh, Pittsburgh, PA, USA.
- Center for the Neural Basis of Cognition, Pittsburgh, PA, USA.
26
Frankenhuis WE, Gopnik A. Early adversity and the development of explore-exploit tradeoffs. Trends Cogn Sci 2023:S1364-6613(23)00091-8. [PMID: 37142526 DOI: 10.1016/j.tics.2023.04.001] [Received: 01/07/2023] [Revised: 03/28/2023] [Accepted: 04/05/2023] [Indexed: 05/06/2023]
Abstract
Childhood adversity can have wide-ranging and long-lasting effects on later life. But what are the mechanisms that are responsible for these effects? This article brings together the cognitive science literature on explore-exploit tradeoffs, the empirical literature on early adversity, and the literature in evolutionary biology on 'life history' to explain how early experience influences later life. We propose one potential mechanism: early experiences influence 'hyperparameters' that determine the balance between exploration and exploitation. Adversity might accelerate a shift from exploration to exploitation, with broad and enduring effects on the adult brain and mind. These effects may be produced by life-history adaptations that use early experience to tailor development and learning to the likely future states of an organism and its environment.
Affiliation(s)
- Willem E Frankenhuis
- Department of Psychology, Utrecht University, Utrecht, The Netherlands; Max Planck Institute for the Study of Crime, Security and Law, Freiburg, Germany.
- Alison Gopnik
- Department of Psychology and Berkeley Artificial Intelligence Research, University of California at Berkeley, CA, USA
27
Cisler JM, Tamman AJF, Fonzo GA. Diminished prospective mental representations of reward mediate reward learning strategies among youth with internalizing symptoms. Psychol Med 2023; 53:1-11. [PMID: 36878892 PMCID: PMC10600826 DOI: 10.1017/s0033291723000478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 01/09/2023] [Accepted: 02/08/2023] [Indexed: 03/08/2023]
Abstract
BACKGROUND Adolescent internalizing symptoms and trauma exposure have been linked with altered reward learning processes and decreased ventral striatal responses to rewarding cues. Recent computational work on decision-making highlights an important role for prospective representations of the imagined outcomes of different choices. This study tested whether internalizing symptoms and trauma exposure among youth impact the generation of prospective reward representations during decision-making and potentially mediate altered behavioral strategies during reward learning. METHODS Sixty-one adolescent females with varying exposure to interpersonal violence (n = 31 with histories of physical or sexual assault) and varying severity of internalizing symptoms completed a social reward learning task during fMRI. Multivariate pattern analyses (MVPA) were used to decode neural reward representations at the time of choice. RESULTS MVPA demonstrated that rewarding outcomes could accurately be decoded within several large-scale distributed networks (e.g. frontoparietal and striatum networks), that these reward representations were reactivated prospectively at the time of choice in proportion to the expected probability of receiving reward, and that youth with behavioral strategies that favored exploiting high-reward options demonstrated greater prospective generation of reward representations. Youth internalizing symptoms, but not trauma exposure characteristics, were negatively associated with both the behavioral strategy of exploiting high-reward options and the prospective generation of reward representations in the striatum. CONCLUSIONS These data suggest diminished prospective mental simulation of reward as a mechanism of altered reward learning strategies among youth with internalizing symptoms.
Affiliation(s)
- Josh M. Cisler
- Department of Psychiatry and Behavioral Sciences, Dell Medical School, University of Texas at Austin, USA
- Institute for Early Life Adversity Research, Dell Medical School, University of Texas at Austin, USA
- Amanda J. F. Tamman
- Menninger Department of Psychiatry and Behavioral Sciences, Baylor College of Medicine, Houston, TX, USA
- Greg A. Fonzo
- Department of Psychiatry and Behavioral Sciences, Dell Medical School, University of Texas at Austin, USA
- Institute for Early Life Adversity Research, Dell Medical School, University of Texas at Austin, USA
- Center for Psychedelic Research and Therapy, Dell Medical School, University of Texas at Austin, USA
28
de A Marcelino AL, Gray O, Al-Fatly B, Gilmour W, Douglas Steele J, Kühn AA, Gilbertson T. Pallidal neuromodulation of the explore/exploit trade-off in decision-making. eLife 2023; 12:79642. [PMID: 36727860 PMCID: PMC9940911 DOI: 10.7554/elife.79642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 02/01/2023] [Indexed: 02/03/2023]
Abstract
Every decision that we make involves a conflict between exploiting our current knowledge of an action's value and exploring alternative courses of action that might lead to a better or worse outcome. The sub-cortical nuclei that make up the basal ganglia have been proposed as a neural circuit that may contribute to resolving this explore-exploit 'dilemma'. To test this hypothesis, we examined the effects of neuromodulating the basal ganglia's output nucleus, the globus pallidus interna, in patients who had undergone deep brain stimulation (DBS) for isolated dystonia. Neuromodulation increased the number of exploratory choices of the lower-value option in a two-armed bandit probabilistic reversal-learning task. Enhanced exploration was explained by a reduction in the rate of evidence accumulation (drift rate) in a reinforcement learning drift diffusion model. We estimated the functional connectivity profile between the stimulating DBS electrode and the rest of the brain using a normative functional connectome derived from healthy controls. Variation between patients in the extent of neuromodulation-induced exploration was associated with functional connectivity from the stimulation electrode site to a distributed brain functional network. We conclude that the basal ganglia's output nucleus, the globus pallidus interna, can adaptively modify decision choice when faced with the dilemma of whether to explore or exploit.
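The link between a lower drift rate and more exploratory choices can be made concrete with the standard first-passage result for a diffusion process between two boundaries. This sketch assumes, as in common reinforcement learning drift diffusion formulations, that drift is proportional to the learned value difference, with an unbiased starting point midway between the boundaries and no non-decision time; `drift_scale` and the example values are hypothetical, not estimates from the study.

```python
import math

def p_lower_value_choice(q_high, q_low, drift_scale, boundary, noise=1.0):
    """Probability that an unbiased diffusion process first hits the boundary of the
    lower-valued option.

    Assumes drift v = drift_scale * (q_high - q_low), boundaries at 0 and `boundary`,
    and a start point at boundary / 2. For that case the first-passage probability is
    P(lower) = 1 / (1 + exp(v * boundary / noise**2)).
    """
    v = drift_scale * (q_high - q_low)
    return 1.0 / (1.0 + math.exp(v * boundary / noise ** 2))

# Same learned values, two hypothetical drift scalings (e.g. stimulation off vs. on).
for scale in (1.0, 0.5):
    print(scale, round(p_lower_value_choice(0.7, 0.3, scale, boundary=2.0), 3))
```

Halving the drift scaling leaves the learned values untouched but makes the lower-value boundary easier to reach, so "exploratory" choices rise without any change in what has been learned.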
Affiliation(s)
- Ana Luisa de A Marcelino
- Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Movement Disorder and Neuromodulation Unit, Department of Neurology, Charité Campus Mitte, Berlin, Germany
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Core Facility Genomics, Berlin, Germany
- Owen Gray
- Division of Imaging Science and Technology, Medical School, University of Dundee, Dundee, United Kingdom
- Bassam Al-Fatly
- Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Movement Disorder and Neuromodulation Unit, Department of Neurology, Charité Campus Mitte, Berlin, Germany
- William Gilmour
- Division of Imaging Science and Technology, Medical School, University of Dundee, Dundee, United Kingdom
- J Douglas Steele
- Division of Imaging Science and Technology, Medical School, University of Dundee, Dundee, United Kingdom
- Andrea A Kühn
- Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Movement Disorder and Neuromodulation Unit, Department of Neurology, Charité Campus Mitte, Berlin, Germany
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Core Facility Genomics, Berlin, Germany
- Berlin School of Mind and Brain, Charité - University Medicine Berlin, Berlin, Germany
- NeuroCure, Charité - University Medicine Berlin, Berlin, Germany
- DZNE, German Centre for Degenerative Diseases, Berlin, Germany
- Tom Gilbertson
- Division of Imaging Science and Technology, Medical School, University of Dundee, Dundee, United Kingdom
- Department of Neurology, Ninewells Hospital & Medical School, Dundee, United Kingdom
29
Fan H, Gershman SJ, Phelps EA. Trait somatic anxiety is associated with reduced directed exploration and underestimation of uncertainty. Nat Hum Behav 2023; 7:102-113. [PMID: 36192493 DOI: 10.1038/s41562-022-01455-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 09/22/2021] [Accepted: 08/26/2022] [Indexed: 02/01/2023]
Abstract
Anxiety has been related to decreased physical exploration, but past findings on the interaction between anxiety and exploration during decision making were inconclusive. Here we examined how latent factors of trait anxiety relate to different exploration strategies when facing volatility-induced uncertainty. Across two studies (total N = 985), we demonstrated that people used a hybrid of directed, random and undirected exploration strategies, which were respectively sensitive to relative uncertainty, total uncertainty and value difference. Trait somatic anxiety, that is, the propensity to experience physical symptoms of anxiety, was inversely correlated with directed exploration and undirected exploration, manifesting as a lower likelihood of choosing the uncertain option and reduced choice stochasticity regardless of uncertainty. Somatic anxiety was also associated with underestimation of relative uncertainty. Together, these results reveal the selective role of trait somatic anxiety in modulating both uncertainty-driven and value-driven exploration strategies.
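One common way to formalize the three strategies named in this abstract is a two-option probit choice rule in which value difference V, relative uncertainty RU, and value scaled by total uncertainty V/TU each get their own weight. The sketch below assumes that form, with posterior means and standard deviations already computed; the weight names are illustrative, and the study's fitted model may include further terms.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_choose_first(mean1, mean2, sd1, sd2, w_value, w_ru, w_vtu):
    """Probit probability of choosing option 1 over option 2 (assumed form).

    V = mean1 - mean2           -> value-driven (undirected) term
    RU = sd1 - sd2              -> directed exploration toward the more uncertain option
    V / TU, TU = sqrt(sd1^2 + sd2^2) -> random exploration scaled by total uncertainty
    Standard deviations must be positive so that TU > 0.
    """
    v = mean1 - mean2
    ru = sd1 - sd2
    tu = math.sqrt(sd1 ** 2 + sd2 ** 2)
    return normal_cdf(w_value * v + w_ru * ru + w_vtu * v / tu)

# Equal means, option 1 more uncertain: only the directed term moves the choice.
print(p_choose_first(0.0, 0.0, 2.0, 1.0, w_value=1.0, w_ru=0.5, w_vtu=1.0))
```

In this parameterization, a smaller `w_ru` would correspond to the reduced directed exploration reported for somatic anxiety.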
Affiliation(s)
- Haoxue Fan
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Samuel J Gershman
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Center for Brains, Minds and Machines, Cambridge, MA, USA
- Elizabeth A Phelps
- Department of Psychology, Harvard University, Cambridge, MA, USA.
- Center for Brain Science, Harvard University, Cambridge, MA, USA.
30
Sharot T, Rollwage M, Sunstein CR, Fleming SM. Why and When Beliefs Change. Perspect Psychol Sci 2023; 18:142-151. [PMID: 35939828 DOI: 10.1177/17456916221082967] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 01/31/2023]
Abstract
Why people do or do not change their beliefs has been a long-standing puzzle. Sometimes people hold onto false beliefs despite ample contradictory evidence; sometimes they change their beliefs without sufficient reason. Here, we propose that the utility of a belief is derived from the potential outcomes associated with holding it. Outcomes can be internal (e.g., positive/negative feelings) or external (e.g., material gain/loss), and only some are dependent on belief accuracy. Belief change can then be understood as an economic transaction in which the multidimensional utility of the old belief is compared against that of the new belief. Change will occur when potential outcomes alter across attributes, for example because of changing environments or when certain outcomes are made more or less salient.
Affiliation(s)
- Tali Sharot
- Department of Experimental Psychology, University College London; Max Planck University College London Centre for Computational Psychiatry and Ageing Research; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Max Rollwage
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology; Wellcome Centre for Human Neuroimaging, University College London
- Stephen M Fleming
- Department of Experimental Psychology, University College London; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology; Wellcome Centre for Human Neuroimaging, University College London
31
Disentangling the roles of dopamine and noradrenaline in the exploration-exploitation tradeoff during human decision-making. Neuropsychopharmacology 2022; 48:1078-1086. [PMID: 36522404 PMCID: PMC10209107 DOI: 10.1038/s41386-022-01517-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/28/2022] [Revised: 11/29/2022] [Accepted: 11/30/2022] [Indexed: 12/23/2022]
Abstract
Balancing the exploration of new options and the exploitation of known options is a fundamental challenge in decision-making, yet the mechanisms involved in this balance are not fully understood. Here, we aimed to elucidate the distinct roles of dopamine and noradrenaline in the exploration-exploitation tradeoff during human choice. To this end, we used a double-blind, placebo-controlled design in which participants received either a placebo, 400 mg of the D2/D3 receptor antagonist amisulpride, or 40 mg of the β-adrenergic receptor antagonist propranolol before they completed a virtual patch-foraging task probing exploration and exploitation. We systematically varied the rewards associated with choice options, the rate at which rewards decreased over time, and the opportunity cost of switching to the next option to disentangle the contributions of dopamine and noradrenaline to specific choice aspects. Our data show that amisulpride increased the sensitivity to all three of these critical choice features, whereas propranolol was associated with a reduced tendency to use value information. Our findings provide novel insights into the specific roles of dopamine and noradrenaline in the regulation of human choice behavior, suggesting a critical involvement of dopamine in directed exploration and a role of noradrenaline in more random exploration.
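The three task manipulations map onto the marginal value theorem, the usual normative benchmark for patch foraging: stay in a patch until its instantaneous reward rate falls to the long-run average rate. This sketch is not the study's analysis; it assumes an exponentially depleting patch, and `r0`, `decay`, and `travel_time` are illustrative stand-ins for the reward, depletion rate, and switching cost.

```python
import math

def optimal_leave_time(r0, decay, travel_time, dt=0.01, t_max=50.0):
    """Grid-search the patch residence time that maximizes long-run reward rate.

    Assumes an in-patch reward rate r0 * exp(-decay * t) and a fixed travel cost
    `travel_time` before the next patch. By the marginal value theorem, the optimum is
    where the in-patch rate has dropped to the overall average rate.
    """
    best_t, best_rate = 0.0, -math.inf
    for i in range(1, int(t_max / dt) + 1):
        t = i * dt
        gain = r0 / decay * (1.0 - math.exp(-decay * t))  # reward collected by time t
        rate = gain / (t + travel_time)  # long-run average including travel
        if rate > best_rate:
            best_t, best_rate = t, rate
    return best_t, best_rate

for travel in (2.0, 8.0):
    t, rate = optimal_leave_time(r0=10.0, decay=0.5, travel_time=travel)
    print(travel, round(t, 2), round(rate, 2))
```

Raising `travel_time` lengthens the optimal stay; sensitivity to features like this is what the drug manipulations in the study were designed to probe.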
32
Colas JT, Dundon NM, Gerraty RT, Saragosa‐Harris NM, Szymula KP, Tanwisuth K, Tyszka JM, van Geen C, Ju H, Toga AW, Gold JI, Bassett DS, Hartley CA, Shohamy D, Grafton ST, O'Doherty JP. Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T. Hum Brain Mapp 2022; 43:4750-4790. [PMID: 35860954 PMCID: PMC9491297 DOI: 10.1002/hbm.25988] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/19/2022] [Revised: 05/20/2022] [Accepted: 06/10/2022] [Indexed: 11/12/2022]
Abstract
The model-free algorithms of "reinforcement learning" (RL) have gained clout across disciplines, but so too have model-based alternatives. The present study emphasizes other dimensions of this model space in consideration of associative or discriminative generalization across states and actions. This "generalized reinforcement learning" (GRL) model, a frugal extension of RL, parsimoniously retains the single reward-prediction error (RPE), but the scope of learning goes beyond the experienced state and action. Instead, the generalized RPE is efficiently relayed for bidirectional counterfactual updating of value estimates for other representations. Aided by structural information but as an implicit rather than explicit cognitive map, GRL provided the most precise account of human behavior and individual differences in a reversal-learning task with hierarchical structure that encouraged inverse generalization across both states and actions. Reflecting inference that could be true, false (i.e., overgeneralization), or absent (i.e., undergeneralization), state generalization distinguished those who learned well more so than action generalization. With high-resolution high-field fMRI targeting the dopaminergic midbrain, the GRL model's RPE signals (alongside value and decision signals) were localized within not only the striatum but also the substantia nigra and the ventral tegmental area, including specific effects of generalization that also extend to the hippocampus. Factoring in generalization as a multidimensional process in value-based learning, these findings shed light on complexities that, while challenging classic RL, can still be resolved within the bounds of its core computations.
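The central idea, one reward-prediction error relayed to representations that were not experienced, can be sketched as a tabular delta-rule update. This is a hypothetical simplification, not the published GRL model: `grl_update`, `g_action`, and `g_state` are assumed names for generalization weights, and negative values implement the inverse generalization the task encouraged.

```python
def grl_update(q, state, action, reward, alpha, g_action, g_state):
    """One hypothetical 'generalized' update on a value table q[state][action].

    A single reward-prediction error (RPE) is computed for the experienced pair and
    relayed counterfactually: other actions in the same state move by
    alpha * g_action * rpe, and the same action in other states by alpha * g_state * rpe.
    Negative weights give inverse (discriminative) generalization; zero weights reduce
    this to a plain delta-rule update of the chosen pair only.
    """
    rpe = reward - q[state][action]
    q[state][action] += alpha * rpe
    for other_action in range(len(q[state])):
        if other_action != action:
            q[state][other_action] += alpha * g_action * rpe
    for other_state in range(len(q)):
        if other_state != state:
            q[other_state][action] += alpha * g_state * rpe
    return rpe

q = [[0.0, 0.0], [0.0, 0.0]]
rpe = grl_update(q, state=0, action=0, reward=1.0, alpha=0.5, g_action=-1.0, g_state=0.5)
print(rpe, q)
```

A positive outcome for one action thus lowers the estimate of the alternative action in the same state while raising the same action elsewhere, without computing any additional prediction errors.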
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
- Neil M. Dundon
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Department of Child and Adolescent Psychiatry, Psychotherapy, and Psychosomatics, University of Freiburg, Freiburg im Breisgau, Germany
- Raphael T. Gerraty
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Center for Science and Society, Columbia University, New York, New York, USA
- Natalie M. Saragosa‐Harris
- Department of Psychology, New York University, New York, New York, USA
- Department of Psychology, University of California, Los Angeles, California, USA
- Karol P. Szymula
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Koranis Tanwisuth
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Department of Psychology, University of California, Berkeley, California, USA
- J. Michael Tyszka
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Camilla van Geen
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Harang Ju
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Arthur W. Toga
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA
- Joshua I. Gold
- Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Dani S. Bassett
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Santa Fe Institute, Santa Fe, New Mexico, USA
- Catherine A. Hartley
- Department of Psychology, New York University, New York, New York, USA
- Center for Neural Science, New York University, New York, New York, USA
- Daphna Shohamy
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Kavli Institute for Brain Science, Columbia University, New York, New York, USA
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- John P. O'Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
33
Dubois M, Bowler A, Moses-Payne ME, Habicht J, Moran R, Steinbeis N, Hauser TU. Exploration heuristics decrease during youth. Cogn Affect Behav Neurosci 2022; 22:969-983. [PMID: 35589910 PMCID: PMC9458685 DOI: 10.3758/s13415-022-01009-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Accepted: 04/22/2022] [Indexed: 01/01/2023]
Abstract
Deciding between exploring new avenues and exploiting known choices is central to learning, and this exploration-exploitation trade-off changes during development. Exploration is not a unitary concept, and humans deploy multiple distinct mechanisms, but little is known about their specific emergence during development. Using a previously validated task in adults, changes in exploration mechanisms were investigated between childhood (8-9 y/o, N = 26; 16 females), early (12-13 y/o, N = 38; 21 females), and late adolescence (16-17 y/o, N = 33; 19 females) in ethnically and socially diverse schools from disadvantaged areas. We find an increased usage of a computationally light exploration heuristic in younger groups, effectively accommodating their limited neurocognitive resources. Moreover, this heuristic was associated with self-reported attention-deficit/hyperactivity disorder symptoms in this population-based sample. This study enriches our mechanistic understanding of how exploration strategies mature during development.
Affiliation(s)
- Magda Dubois
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, WC1B 5EH, London, UK.
- Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG, London, UK.
- Aislinn Bowler
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, WC1B 5EH, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG, London, UK
- Centre for Brain and Cognitive Development, Birkbeck, University of London, WC1E 7HX, London, UK
- Madeleine E Moses-Payne
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, WC1B 5EH, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG, London, UK
- UCL Institute of Cognitive Neuroscience, WC1N 3AZ, London, UK
- Johanna Habicht
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, WC1B 5EH, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG, London, UK
- Rani Moran
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, WC1B 5EH, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG, London, UK
- Nikolaus Steinbeis
- Division of Psychology and Language Sciences, University College London, WC1H 0AP, London, UK
- Tobias U Hauser
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, WC1B 5EH, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG, London, UK
34
Abstract
Deciding whether to forgo a good choice in favour of exploring a potentially more rewarding alternative is one of the most challenging arbitrations both in human reasoning and in artificial intelligence. Humans show substantial variability in their exploration, and theoretical (but only limited empirical) work has suggested that excessive exploration is a critical mechanism underlying the psychiatric dimension of impulsivity. In this registered report, we put these theories to the test using large online samples, dimensional analyses, and computational modelling. Capitalising on recent advances in disentangling distinct human exploration strategies, we not only demonstrate that impulsivity is associated with a specific form of exploration, value-free random exploration, but also explore links between exploration and other psychiatric dimensions. The Stage 1 protocol for this Registered Report was accepted in principle on 19/03/2021. The protocol, as accepted by the journal, can be found at 10.6084/m9.figshare.14346506.v1. Deciding between known rewarding options and exploring novel avenues is central to decision making. Humans show variability in their exploration. Here, the authors show that impulsivity is associated with an increased usage of a cognitively cheap (and sometimes sub-optimal) exploration strategy.
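Value-free and value-based random exploration differ in whether choice noise depends on option values. In this literature they are commonly modeled as ε-greedy and softmax choice, respectively; the sketch below assumes those standard forms and is not the registered report's fitted model.

```python
import math

def epsilon_greedy_probs(values, epsilon):
    """Value-free random exploration: with probability epsilon pick uniformly at random,
    ignoring values; otherwise pick the best option (ties share the greedy mass)."""
    n = len(values)
    best = max(values)
    winners = [i for i, v in enumerate(values) if v == best]
    probs = [epsilon / n] * n
    for i in winners:
        probs[i] += (1.0 - epsilon) / len(winners)
    return probs

def softmax_probs(values, temperature):
    """Value-based random exploration: choice noise is graded by value differences."""
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

values = [1.0, 0.9, 0.0]
print(epsilon_greedy_probs(values, 0.3))  # both non-best options get the same probability
print(softmax_probs(values, 0.3))         # the near-best option is favored over the poor one
```

Under ε-greedy, an exploratory choice is as likely to land on a clearly poor option as on a near-best one; this value blindness makes it cheap to compute and is one reason it can be sub-optimal.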
35
Narita R, Kurashige K. Multi-Faceted Decision Making Using Multiple Reinforcement Learning to Reducing Wasteful Actions. Journal of Advanced Computational Intelligence and Intelligent Informatics 2022. [DOI: 10.20965/jaciii.2022.p0504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Reinforcement learning can lead to autonomous behavior depending on the environment. However, in complex and high-dimensional environments, such as real environments, a large number of trials are required for learning. In this paper, we propose a solution for the learning problem using local learning to select an action based on the surrounding environmental information. Simulation experiments were conducted using maze problems, pitfall problems, and environments with random agents. The actions that did not contribute to task accomplishment were compared between the proposed method and an ordinary reinforcement learning method.
36
Identifying control ensembles for information processing within the cortico-basal ganglia-thalamic circuit. PLoS Comput Biol 2022; 18:e1010255. [PMID: 35737720 PMCID: PMC9258830 DOI: 10.1371/journal.pcbi.1010255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Revised: 07/06/2022] [Accepted: 05/27/2022] [Indexed: 11/20/2022]
Abstract
In situations featuring uncertainty about action-reward contingencies, mammals can flexibly adopt strategies for decision-making that are tuned in response to environmental changes. Although the cortico-basal ganglia-thalamic (CBGT) network has been identified as contributing to the decision-making process, it features a complex synaptic architecture, composed of multiple feed-forward, reciprocal, and feedback pathways, that complicates efforts to elucidate the roles of specific CBGT populations in the process by which evidence is accumulated and influences behavior. In this paper we apply a strategic sampling approach, based on Latin hypercube sampling, to explore how variations in CBGT network properties, including subpopulation firing rates and synaptic weights, map to variability of parameters in a normative drift diffusion model (DDM), representing algorithmic aspects of information processing during decision-making. Through the application of canonical correlation analysis, we find that this relationship can be characterized in terms of three low-dimensional control ensembles within the CBGT network that impact specific qualities of the emergent decision policy: responsiveness (a measure of how quickly evidence evaluation gets underway, associated with overall activity in corticothalamic and direct pathways), pliancy (a measure of the standard of evidence needed to commit to a decision, associated largely with overall activity in components of the indirect pathway of the basal ganglia), and choice (a measure of commitment toward one available option, associated with differences in direct and indirect pathways across action channels). These analyses provide mechanistic predictions about the roles of specific CBGT network elements in tuning the way that information is accumulated and translated into decision-related behavior.
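Latin hypercube sampling, used in this study to explore CBGT network parameters, stratifies each parameter's range so that even a small number of samples covers every part of every range. A minimal pure-Python sketch follows; `latin_hypercube` is a hypothetical helper, and the two parameter ranges in the example are illustrative, not the study's firing rates or synaptic weights.

```python
import random

def latin_hypercube(n_samples, bounds, rng=None):
    """Latin hypercube sample over len(bounds) parameters.

    Each parameter range is split into n_samples equal-width strata, one point is drawn
    uniformly inside each stratum, and the strata are shuffled independently per
    parameter, so every stratum of every parameter is used exactly once.
    """
    if rng is None:
        rng = random.Random()
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        points = [low + (k + rng.random()) * width for k in range(n_samples)]
        rng.shuffle(points)  # decouple the strata of different parameters
        columns.append(points)
    # Transpose: one row per sample (network configuration), one column per parameter.
    return [list(row) for row in zip(*columns)]

samples = latin_hypercube(5, [(0.0, 1.0), (10.0, 50.0)], rng=random.Random(0))
for row in samples:
    print([round(x, 3) for x in row])
```

Compared with independent uniform draws, this guarantees that no stratum of any single parameter is left unsampled, which matters when each sample is an expensive network simulation.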
37
Wurm F, Steinhauser M. Why cognitive control matters in learning and decision-making. Neurosci Biobehav Rev 2022; 136:104636. [PMID: 35339485 DOI: 10.1016/j.neubiorev.2022.104636] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/15/2022] [Accepted: 03/03/2022] [Indexed: 10/18/2022]
Affiliation(s)
- Franz Wurm
- Department of Psychology, Leiden University, Leiden 2333 AK, The Netherlands; Leiden Institute for Brain and Cognition, Leiden 2333 AK, The Netherlands.
- Marco Steinhauser
- Department of Psychology, Catholic University of Eichstätt-Ingolstadt, Ostenstraße 25, 85072 Eichstätt, Germany
38
Abir Y, Marvin CB, van Geen C, Leshkowitz M, Hassin RR, Shohamy D. An energizing role for motivation in information-seeking during the early phase of the COVID-19 pandemic. Nat Commun 2022; 13:2310. [PMID: 35484153 PMCID: PMC9050882 DOI: 10.1038/s41467-022-30011-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/11/2021] [Accepted: 04/07/2022] [Indexed: 11/18/2022]
Abstract
The COVID-19 pandemic has highlighted the importance of understanding and managing information seeking behavior. Information-seeking in humans is often viewed as irrational rather than utility maximizing. Here, we hypothesized that this apparent disconnect between utility and information-seeking is due to a latent third variable, motivation. We quantified information-seeking, learning, and COVID-19-related concern (which we used as a proxy for motivation regarding COVID-19 and the changes in circumstance it caused) in a US-based sample (n = 5376) during spring 2020. We found that self-reported levels of COVID-19 concern were associated with directed seeking of COVID-19-related content and better memory for such information. Interestingly, this specific motivational state was also associated with a general enhancement of information-seeking for content unrelated to COVID-19. These effects were associated with commensurate changes to utility expectations and were dissociable from the influence of non-specific anxiety. Thus, motivation both directs and energizes epistemic behavior, linking together utility and curiosity. Information-seeking behavior in humans is often viewed as irrational rather than utility maximizing. Here the authors describe data obtained in Spring 2020 showing that participants’ concern about COVID-19 was related not only to their drive to seek information about the virus, but also to their curiosity about other more general topics.
Affiliation(s)
- Yaniv Abir
- Department of Psychology, Columbia University, New York, NY, USA.
- Camilla van Geen
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Maya Leshkowitz
- Department of Cognitive Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
- Ran R Hassin
- Department of Psychology and The Federmann Center for the Study of Rationality, The Hebrew University of Jerusalem, Jerusalem, Israel
- Daphna Shohamy
- Department of Psychology, Columbia University, New York, NY, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Kavli Institute for Brain Science, Columbia University, New York, NY, USA
39
Kaanders P, Sepulveda P, Folke T, Ortoleva P, De Martino B. Humans actively sample evidence to support prior beliefs. eLife 2022; 11:e71768. [PMID: 35404234 PMCID: PMC9038198 DOI: 10.7554/elife.71768] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/29/2021] [Accepted: 04/08/2022] [Indexed: 11/13/2022]
Abstract
No one likes to be wrong. Previous research has shown that participants may underweight information incompatible with previous choices, a phenomenon called confirmation bias. In this paper, we argue that a similar bias exists in the way information is actively sought. We investigate how choice influences information gathering using a perceptual choice task and find that participants sample more information from a previously chosen alternative. Furthermore, the higher the confidence in the initial choice, the more biased information sampling becomes. As a consequence, when faced with the possibility of revising an earlier decision, participants are more likely to stick with their original choice, even when incorrect. Critically, we show that agency controls this phenomenon. The effect disappears in a fixed sampling condition where presentation of evidence is controlled by the experimenter, suggesting that the way in which confirmatory evidence is acquired critically impacts the decision process. These results suggest active information acquisition plays a critical role in the propagation of strongly held beliefs over time.
Affiliation(s)
- Paula Kaanders
- Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
- Pradyumna Sepulveda
- Institute of Cognitive Neuroscience, University College London, London, United Kingdom
- Tomas Folke
- Department of Mathematics and Computer Science, Rutgers University, Newark, United States
- Centre for Business Research, Cambridge Judge Business School, University of Cambridge, Cambridge, United Kingdom
- Pietro Ortoleva
- Department of Economics and Woodrow Wilson School, Princeton University, Princeton, United States
- Benedetto De Martino
- Institute of Cognitive Neuroscience, University College London, London, United Kingdom
- Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
40
Time pressure changes how people explore and respond to uncertainty. Sci Rep 2022; 12:4122. [PMID: 35260717 PMCID: PMC8904509 DOI: 10.1038/s41598-022-07901-1] [Received: 08/18/2021] [Accepted: 02/28/2022] [Indexed: 12/25/2022]
Abstract
How does time pressure influence exploration and decision-making? We investigated this question with several four-armed bandit tasks manipulating (within subjects) expected reward, uncertainty, and time pressure (limited vs. unlimited). With limited time, people have less opportunity to perform costly computations, thus shifting the cost-benefit balance of different exploration strategies. Through behavioral, reinforcement learning (RL), reaction time (RT), and evidence accumulation analyses, we show that time pressure changes how people explore and respond to uncertainty. Specifically, participants reduced their uncertainty-directed exploration under time pressure, were less value-directed, and repeated choices more often. Since our analyses relate uncertainty to slower responses and dampened evidence accumulation (i.e., drift rates), this demonstrates a resource-rational shift towards simpler, lower-cost strategies under time pressure. These results shed light on how people adapt their exploration and decision-making strategies to externally imposed cognitive constraints.
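The abstract links uncertainty to slower responses through dampened drift rates. As a rough illustration of why a lower drift rate produces slower responses, here is a minimal drift-diffusion simulation. The boundary, noise, step size and non-decision time are illustrative assumptions, not the parameters fitted in the study.

```python
import random
import statistics


def simulate_ddm_trial(drift, threshold=1.0, noise=1.0, dt=0.001,
                       non_decision=0.3, rng=random):
    """Simulate one drift-diffusion trial with Euler-Maruyama steps.

    Evidence starts at 0 and accumulates until it reaches +threshold
    (upper response) or -threshold (lower response). Returns (choice, rt),
    where choice is 1 for the upper boundary and 0 for the lower one.
    """
    evidence = 0.0
    t = 0.0
    step_sd = noise * dt ** 0.5  # SD of the Gaussian noise added per step
    while abs(evidence) < threshold:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return (1 if evidence > 0 else 0), t + non_decision


def mean_rt(drift, n_trials=500, seed=0):
    """Average simulated response time (seconds) for a given drift rate."""
    rng = random.Random(seed)
    rts = [simulate_ddm_trial(drift, rng=rng)[1] for _ in range(n_trials)]
    return statistics.mean(rts)
```

With symmetric bounds at ±a and a start point of 0, the expected decision time is (a/v)·tanh(av/σ²). Under these settings that is about 0.48 s for v = 2 but about 0.92 s for v = 0.5, before adding the non-decision time. Dampening the drift rate therefore lengthens responses, even though the boundaries are unchanged.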
41
Wang D, Chen S, Hu Y, Liu L, Wang H. Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model. IEEE Trans Cogn Dev Syst 2022. [DOI: 10.1109/tcds.2020.3035778] [Indexed: 11/09/2022]
42
Smith R, Taylor S, Wilson RC, Chuning AE, Persich MR, Wang S, Killgore WDS. Lower Levels of Directed Exploration and Reflective Thinking Are Associated With Greater Anxiety and Depression. Front Psychiatry 2022; 12:782136. [PMID: 35126200 PMCID: PMC8808291 DOI: 10.3389/fpsyt.2021.782136] [Received: 09/23/2021] [Accepted: 12/07/2021] [Indexed: 01/15/2023]
Abstract
Anxiety and depression are often associated with strong beliefs that entering specific situations will lead to aversive outcomes, even when these situations are objectively safe and avoiding them reduces well-being. A possible mechanism underlying this maladaptive avoidance behavior is a failure to reflect on: (1) appropriate levels of uncertainty about the situation, and (2) how this uncertainty could be reduced by seeking further information (i.e., exploration). To test this hypothesis, we asked a community sample of 416 individuals to complete measures of reflective cognition, exploration, and symptoms of anxiety and depression. Consistent with our hypotheses, we found significant associations between each of these measures in the expected directions (i.e., positive relationships between reflective cognition and strategic information-seeking behavior or "directed exploration", and negative relationships between these measures and anxiety/depression symptoms). Further analyses suggested that the relationship between directed exploration and depression/anxiety was due in part to an ambiguity aversion promoting exploration in conditions where information-seeking was not beneficial (as opposed to only being due to under-exploration when more information would aid future choices). In contrast, reflectiveness was associated with greater exploration in appropriate settings and separately accounted for differences in reaction times, decision noise, and choice accuracy in the expected directions. These results shed light on the mechanisms underlying information-seeking behavior and how they may contribute to symptoms of emotional disorders. They also highlight the potential clinical relevance of individual differences in reflectiveness and exploration and should motivate future research on their possible contributions to vulnerability and/or maintenance of affective disorders.
Affiliation(s)
- Ryan Smith
- Laureate Institute for Brain Research, Tulsa, OK, United States
- Samuel Taylor
- Laureate Institute for Brain Research, Tulsa, OK, United States
- Robert C. Wilson
- Department of Psychology, University of Arizona, Tucson, AZ, United States
- Anne E. Chuning
- Laureate Institute for Brain Research, Tulsa, OK, United States
- Siyu Wang
- Department of Psychology, University of Arizona, Tucson, AZ, United States
- William D. S. Killgore
- Department of Psychology, University of Arizona, Tucson, AZ, United States
- Department of Psychiatry, University of Arizona, Tucson, AZ, United States
43
Collins AGE, Shenhav A. Advances in modeling learning and decision-making in neuroscience. Neuropsychopharmacology 2022; 47:104-118. [PMID: 34453117 PMCID: PMC8617262 DOI: 10.1038/s41386-021-01126-y] [Received: 04/03/2021] [Revised: 07/14/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023]
Abstract
An organism's survival depends on its ability to learn about its environment and to make adaptive decisions in the service of achieving the best possible outcomes in that environment. To study the neural circuits that support these functions, researchers have increasingly relied on models that formalize the computations required to carry them out. Here, we review the recent history of computational modeling of learning and decision-making, and how these models have been used to advance understanding of prefrontal cortex function. We discuss how such models have advanced from their origins in basic algorithms of updating and action selection to increasingly account for complexities in the cognitive processes required for learning and decision-making, and the representations over which they operate. We further discuss how a deeper understanding of the real-world complexities in these computations has shed light on the fundamental constraints on optimal behavior, and on the complex interactions between corticostriatal pathways to determine such behavior. The continuing and rapid development of these models holds great promise for understanding the mechanisms by which animals adapt to their environments, and what leads to maladaptive forms of learning and decision-making within clinical populations.
Affiliation(s)
- Anne G E Collins
- Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA.
- Amitai Shenhav
- Department of Cognitive, Linguistic, & Psychological Sciences and Carney Institute for Brain Science, Brown University, Providence, RI, USA.
44
Spreng RN, Turner GR. From exploration to exploitation: a shifting mental mode in late life development. Trends Cogn Sci 2021; 25:1058-1071. [PMID: 34593321 PMCID: PMC8844884 DOI: 10.1016/j.tics.2021.09.001] [Received: 05/16/2021] [Revised: 08/30/2021] [Accepted: 09/01/2021] [Indexed: 12/31/2022]
Abstract
Changes in cognition, affect, and brain function combine to promote a shift in the nature of mentation in older adulthood, favoring exploitation of prior knowledge over exploratory search as the starting point for thought and action. Age-related exploitation biases result from the accumulation of prior knowledge, reduced cognitive control, and a shift toward affective goals. These are accompanied by changes in cortical networks, as well as attention and reward circuits. By incorporating these factors into a unified account, the exploration-to-exploitation shift offers an integrative model of cognitive, affective, and brain aging. Here, we review evidence for this model, identify determinants and consequences, and survey the challenges and opportunities posed by an exploitation-biased mental mode in later life.
Affiliation(s)
- R Nathan Spreng
- Laboratory of Brain and Cognition, Montreal Neurological Institute, Department of Neurology and Neurosurgery, McGill University, Montreal, QC H3A 2B4, Canada; McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada; Departments of Psychiatry and Psychology, McGill University, Montreal, QC H3A 0G4, Canada.
- Gary R Turner
- Department of Psychology, York University, Toronto, ON M3J 1P3, Canada
45
Wiesner CD, Meyer J, Lindner C. Detours increase local knowledge: Exploring the hidden benefits of self-control failure. PLoS One 2021; 16:e0257717. [PMID: 34597326 PMCID: PMC8486128 DOI: 10.1371/journal.pone.0257717] [Received: 04/03/2021] [Accepted: 09/08/2021] [Indexed: 11/19/2022]
Abstract
Self-control enables people to override momentary thoughts, emotions, or impulses in order to pursue long-term goals. Good self-control predicts health, success, and subjective well-being, whereas poor self-control predicts the opposite. This raises the question of why evolution has not endowed us with perfect self-control. In this article, we draw attention to the hidden benefits of self-control failure and present a new experimental paradigm that captures both the costs and the benefits of self-control failure. In an experiment, participants worked on three consecutive tasks: 1) In a transcription task, we manipulated how much effortful self-control two groups of participants had to exert. 2) In a number-comparison task, participants of both groups were asked to compare numbers and ignore distracting neutral versus reward-related pictures. 3) After a pause for recreation, participants were confronted with an unannounced recognition task measuring whether they had incidentally encoded the distracting pictures during the previous number-comparison task. The results showed that participants who exerted a high amount of effortful self-control during the first task shifted their priorities and attention toward the distractors during the second, self-control-demanding task: the cost of self-control failure was reflected in worse performance in the number-comparison task. Moreover, the group that had exerted a high amount of self-control during the first task and showed self-control failure during the second task performed better in the unannounced third task. The benefit of self-control failure during number comparison was reflected in better performance during the recognition task. However, costs and benefits were not specific to reward-related distractors but also occurred with neutral pictures. We propose that the hidden benefit of self-control failure lies in the exploration of distractors present during goal pursuit, i.e., the collection of information about the environment and the potential discovery of new sources of reward. Detours increase local knowledge.
Affiliation(s)
- Christian Dirk Wiesner
- Department of Clinical Psychology and Psychotherapy, Institute of Psychology, Christian-Albrechts-University, Kiel, Germany
- Jennifer Meyer
- Leibniz-Institute for Science and Mathematics Education (IPN), Kiel, Germany
- Christoph Lindner
- Educational Psychology, Faculty of Education, University of Hamburg (UHH), Hamburg, Germany
46
Marković D, Stojić H, Schwöbel S, Kiebel SJ. An empirical evaluation of active inference in multi-armed bandits. Neural Netw 2021; 144:229-246. [PMID: 34507043 DOI: 10.1016/j.neunet.2021.08.018] [Received: 01/27/2021] [Revised: 07/07/2021] [Accepted: 08/11/2021] [Indexed: 10/20/2022]
Abstract
A key feature of sequential decision making under uncertainty is the need to balance exploiting (choosing the best action according to current knowledge) and exploring (obtaining information about the values of other actions). The multi-armed bandit problem, a classical task that captures this trade-off, served as a vehicle in machine learning for developing bandit algorithms that have proved useful in numerous industrial applications. The active inference framework, an approach to sequential decision making recently developed in neuroscience for understanding human and animal behaviour, is distinguished by its sophisticated strategy for resolving the exploration-exploitation trade-off. This makes active inference an exciting alternative to already established bandit algorithms. Here we derive an efficient and scalable approximate active inference algorithm and compare it to two state-of-the-art bandit algorithms: Bayesian upper confidence bound and optimistic Thompson sampling. This comparison is done on two types of bandit problems: a stationary and a dynamic switching bandit. Our empirical evaluation shows that the active inference algorithm does not produce efficient long-term behaviour in stationary bandits. However, in the more challenging switching bandit problem active inference performs substantially better than the two state-of-the-art bandit algorithms. The results open exciting avenues for further research in theoretical and applied machine learning, as well as lend additional credibility to active inference as a general framework for studying human and animal behaviour.
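The two baselines named in the abstract can be sketched on a stationary Bernoulli bandit with Beta posteriors. This is an illustration of the baseline family only. It does not implement active inference or the switching bandit. The quantile schedule 1 - 1/t and the Monte Carlo estimate of the quantile are simplifying assumptions, and the helper names are hypothetical.

```python
import random


def run_bandit(policy, probs, n_trials=1000, seed=0):
    """Simulate a stationary Bernoulli bandit with Beta(1, 1) priors per arm.

    Returns how often each arm was chosen and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(probs)
    alpha = [1.0] * k  # 1 + observed successes per arm
    beta = [1.0] * k   # 1 + observed failures per arm
    counts = [0] * k
    total_reward = 0
    for t in range(1, n_trials + 1):
        arm = policy(alpha, beta, t, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        counts[arm] += 1
        total_reward += reward
    return counts, total_reward


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def optimistic_thompson(alpha, beta, t, rng):
    """Thompson sampling that never scores an arm below its posterior mean."""
    scores = [max(rng.betavariate(a, b), a / (a + b))
              for a, b in zip(alpha, beta)]
    return argmax(scores)


def bayes_ucb(alpha, beta, t, rng, n_draws=100):
    """Score each arm by an upper quantile of its Beta posterior.

    The quantile level 1 - 1/t is a simplified schedule, and the quantile
    is estimated from sorted Monte Carlo draws rather than computed exactly.
    """
    level = 1.0 - 1.0 / t
    scores = []
    for a, b in zip(alpha, beta):
        draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
        scores.append(draws[min(n_draws - 1, int(level * n_draws))])
    return argmax(scores)
```

Both policies explore through the posterior. Optimistic Thompson sampling explores through randomness, clipped at the mean. Bayes-UCB explores through an optimism bonus that shrinks as an arm's posterior narrows. In a stationary task such as `run_bandit(bayes_ucb, [0.2, 0.8])`, either policy quickly settles on the better arm.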
Affiliation(s)
- Dimitrije Marković
- Faculty of Psychology, Technische Universität Dresden, 01062 Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, 01062 Dresden, Germany.
- Hrvoje Stojić
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London, WC1B 5EH, United Kingdom; Secondmind, 72 Hills Rd, Cambridge, CB2 1LA, United Kingdom
- Sarah Schwöbel
- Faculty of Psychology, Technische Universität Dresden, 01062 Dresden, Germany
- Stefan J Kiebel
- Faculty of Psychology, Technische Universität Dresden, 01062 Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, 01062 Dresden, Germany
47
Marković D, Goschke T, Kiebel SJ. Meta-control of the exploration-exploitation dilemma emerges from probabilistic inference over a hierarchy of time scales. Cogn Affect Behav Neurosci 2021; 21:509-533. [PMID: 33372237 PMCID: PMC8208938 DOI: 10.3758/s13415-020-00837-x] [Accepted: 09/17/2020] [Indexed: 12/12/2022]
Abstract
Cognitive control is typically understood as a set of mechanisms that enable humans to reach goals that require integrating the consequences of actions over longer time scales. Importantly, using routine behaviour or making choices beneficial only at short time scales would prevent one from attaining these goals. During the past two decades, researchers have proposed various computational cognitive models that successfully account for behaviour related to cognitive control in a wide range of laboratory tasks. As humans operate in a dynamic and uncertain environment, making elaborate plans and integrating experience over multiple time scales is computationally expensive. Importantly, it remains poorly understood how uncertain consequences at different time scales are integrated into adaptive decisions. Here, we pursue the idea that cognitive control can be cast as active inference over a hierarchy of time scales, where inference, i.e., planning, at higher levels of the hierarchy controls inference at lower levels. We introduce the novel concept of meta-control states, which link higher-level beliefs with lower-level policy inference. Specifically, we conceptualize cognitive control as inference over these meta-control states, where solutions to cognitive control dilemmas emerge through surprisal minimisation at different hierarchy levels. We illustrate this concept using the exploration-exploitation dilemma based on a variant of a restless multi-armed bandit task. We demonstrate that beliefs about contexts and meta-control states at a higher level dynamically modulate the balance of exploration and exploitation at the lower level of a single action. Finally, we discuss the generalisation of this meta-control concept to other control dilemmas.
Affiliation(s)
- Dimitrije Marković
- Chair of Neuroimaging, Faculty of Psychology, Technische Universität Dresden, 01062, Dresden, Germany
- Thomas Goschke
- Chair of General Psychology, Faculty of Psychology, Technische Universität Dresden, 01062, Dresden, Germany
- Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, 01062, Dresden, Germany
- Stefan J Kiebel
- Chair of Neuroimaging, Faculty of Psychology, Technische Universität Dresden, 01062, Dresden, Germany.
- Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, 01062, Dresden, Germany.
48
Xu HA, Modirshanechi A, Lehmann MP, Gerstner W, Herzog MH. Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making. PLoS Comput Biol 2021; 17:e1009070. [PMID: 34081705 PMCID: PMC8205159 DOI: 10.1371/journal.pcbi.1009070] [Received: 01/27/2021] [Revised: 06/15/2021] [Accepted: 05/12/2021] [Indexed: 11/19/2022]
Abstract
Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.
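The abstract separates two signals. Novelty drives exploration before the first reward, and surprise raises the learning rate. A minimal sketch of one common way to formalize each: count-based novelty, -log((count + 1) / (t + number of states)), and a surprise-dependent learning rate m·S / (1 + m·S). Both formulas and the parameter `m` are assumptions for illustration. They are not necessarily the exact definitions used by Xu et al.

```python
import math
from collections import Counter


class NoveltyTracker:
    """Count-based novelty: rarely visited states are more novel.

    novelty(s) = -log((count(s) + 1) / (t + n_states)), where t is the
    number of observations so far (an assumed, illustrative definition).
    """

    def __init__(self, n_states):
        self.n_states = n_states
        self.counts = Counter()
        self.t = 0

    def novelty(self, state):
        p = (self.counts[state] + 1) / (self.t + self.n_states)
        return -math.log(p)

    def observe(self, state):
        self.counts[state] += 1
        self.t += 1


def surprise_modulated_rate(surprise, m=0.1):
    """Map a non-negative surprise value to a learning rate in [0, 1).

    Larger surprise gives a larger rate, so the world-model adapts faster
    after surprising events (hypothetical helper; m is an assumed constant).
    """
    return m * surprise / (1.0 + m * surprise)
```

In this sketch, novelty could serve as an intrinsic reward that steers an agent toward unvisited states before any external reward has been found. The surprise-modulated rate only changes how fast existing estimates are revised.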
Affiliation(s)
- He A. Xu
- Laboratory of Psychophysics, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Alireza Modirshanechi
- Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Marco P. Lehmann
- Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Wulfram Gerstner
- Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Michael H. Herzog
- Laboratory of Psychophysics, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
49
Wilson RC, Bonawitz E, Costa VD, Ebitz RB. Balancing exploration and exploitation with information and randomization. Curr Opin Behav Sci 2021; 38:49-56. [PMID: 33184605 PMCID: PMC7654823 DOI: 10.1016/j.cobeha.2020.10.001] [Indexed: 12/28/2022]
Abstract
Explore-exploit decisions require us to trade off the benefits of exploring unknown options to learn more about them, with exploiting known options, for immediate reward. Such decisions are ubiquitous in nature, but from a computational perspective, they are notoriously hard. There is therefore much interest in how humans and animals make these decisions and recently there has been an explosion of research in this area. Here we provide a biased and incomplete snapshot of this field focusing on the major finding that many organisms use two distinct strategies to solve the explore-exploit dilemma: a bias for information ('directed exploration') and the randomization of choice ('random exploration'). We review evidence for the existence of these strategies, their computational properties, their neural implementations, as well as how directed and random exploration vary over the lifespan. We conclude by highlighting open questions in this field that are ripe to both explore and exploit.
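The two strategies in this abstract are often written into a single choice rule. Directed exploration is an information bonus added to each option's value. Random exploration is decision noise, here a softmax temperature. The sketch below is a generic formalization under those assumptions. The parameter names `info_bonus` and `temperature` are hypothetical, and the specific models reviewed differ in detail.

```python
import math


def choice_probabilities(means, uncertainties, info_bonus, temperature):
    """Softmax choice rule with a directed (information) bonus.

    Each option's utility is mean + info_bonus * uncertainty, which models
    directed exploration. The softmax temperature controls how noisy the
    choice is, which models random exploration.
    """
    utilities = [m + info_bonus * u for m, u in zip(means, uncertainties)]
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]
```

With two options of equal mean value, `info_bonus=0` leaves the choice at 50/50. A positive bonus tilts the choice toward the more uncertain option, which is directed exploration. Raising the temperature pushes the choice back toward uniform regardless of information, which is random exploration.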
Affiliation(s)
- Robert C. Wilson
- Department of Psychology, University of Arizona, Tucson AZ USA
- Cognitive Science Program, University of Arizona, Tucson AZ USA
- Evelyn F. McKnight Brain Institute, University of Arizona, Tucson AZ USA
- Vincent D. Costa
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland OR USA
- R. Becket Ebitz
- Department of Neuroscience, University of Montréal, Montréal, Québec, Canada
50
Wiehler A, Chakroun K, Peters J. Attenuated Directed Exploration during Reinforcement Learning in Gambling Disorder. J Neurosci 2021; 41:2512-2522. [PMID: 33531415 PMCID: PMC7984586 DOI: 10.1523/jneurosci.1607-20.2021] [Received: 06/26/2020] [Revised: 01/18/2021] [Accepted: 01/22/2021] [Indexed: 12/30/2022]
Abstract
Gambling disorder (GD) is a behavioral addiction associated with impairments in value-based decision-making and behavioral flexibility and might be linked to changes in the dopamine system. Maximizing long-term rewards requires a flexible trade-off between the exploitation of known options and the exploration of novel options for information gain. This exploration-exploitation trade-off is thought to depend on dopamine neurotransmission. We hypothesized that human gamblers would show a reduction in directed (uncertainty-based) exploration, accompanied by changes in brain activity in a fronto-parietal exploration-related network. Twenty-three frequent, non-treatment-seeking gamblers and twenty-three healthy matched controls (all male) performed a four-armed bandit task during functional magnetic resonance imaging (fMRI). Computational modeling using hierarchical Bayesian parameter estimation revealed signatures of directed exploration, random exploration, and perseveration in both groups. Gamblers showed a reduction in directed exploration, whereas random exploration and perseveration were similar between groups. Neuroimaging revealed no evidence for group differences in neural representations of basic task variables (expected value, prediction errors). Our hypothesis of reduced frontal pole (FP) recruitment in gamblers was not supported. Exploratory analyses showed that during directed exploration, gamblers showed reduced parietal cortex and substantia-nigra/ventral-tegmental-area activity. Cross-validated classification analyses revealed that connectivity in an exploration-related network was predictive of group status, suggesting that connectivity patterns might be more predictive of problem gambling than univariate effects. Findings reveal specific reductions of strategic exploration in gamblers that might be linked to altered processing in a fronto-parietal network and/or changes in dopamine neurotransmission implicated in GD.
SIGNIFICANCE STATEMENT: Wiehler et al. (2021) report that gamblers rely less on the strategic exploration of unknown but potentially better rewards during reward learning. This is reflected in a related network of brain activity. Parameters of this network can be used to predict the presence of problem gambling behavior in participants.
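The three model signatures named in this abstract are directed exploration, random exploration and perseveration. They are often formalized with a Kalman-filter learner for a restless bandit plus a softmax choice rule. Below is a minimal sketch under that assumption. The parameter names (`beta`, `phi`, `rho`) and all default values are illustrative, and this is not the authors' fitted hierarchical Bayesian model.

```python
import math


class KalmanBandit:
    """Gaussian beliefs (mean, variance) about each arm's payoff.

    All default values are illustrative assumptions, not fitted parameters.
    """

    def __init__(self, n_arms, mu0=50.0, var0=100.0, decay=0.98,
                 decay_center=50.0, diffusion_var=8.0, obs_var=16.0):
        self.mu = [mu0] * n_arms
        self.var = [var0] * n_arms
        self.decay = decay
        self.decay_center = decay_center
        self.diffusion_var = diffusion_var
        self.obs_var = obs_var

    def update(self, arm, reward):
        # Kalman update of the chosen arm: the gain is large when uncertain.
        gain = self.var[arm] / (self.var[arm] + self.obs_var)
        self.mu[arm] += gain * (reward - self.mu[arm])
        self.var[arm] *= 1.0 - gain
        # Payoffs are assumed to drift, so every arm relaxes toward the
        # center and grows more uncertain before the next trial.
        for i in range(len(self.mu)):
            self.mu[i] = (self.decay * self.mu[i]
                          + (1.0 - self.decay) * self.decay_center)
            self.var[i] = self.decay ** 2 * self.var[i] + self.diffusion_var


def kalman_softmax_choice(model, beta, phi, rho, last_choice=None):
    """P(arm) proportional to exp(beta * (mu + phi * sd + rho * repeat)).

    phi > 0 adds directed (uncertainty-seeking) exploration, a lower beta
    means more random exploration, and rho > 0 adds perseveration, a bonus
    for repeating the previous choice.
    """
    utilities = []
    for i, (m, v) in enumerate(zip(model.mu, model.var)):
        repeat = 1.0 if i == last_choice else 0.0
        utilities.append(beta * (m + phi * math.sqrt(v) + rho * repeat))
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]
```

In this parameterization, "reduced directed exploration" corresponds to a smaller `phi`. Random exploration and perseveration are carried separately by `beta` and `rho`. This separation is what lets a fitted model attribute a group difference to one signature rather than the others.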
Affiliation(s)
- A Wiehler
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Université de Paris, Paris F-75006, France
- Department of Psychiatry, Service Hospitalo-Universitaire, Groupe Hospitalier Universitaire Paris Psychiatrie & Neurosciences, Paris F-75014, France
- K Chakroun
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- J Peters
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Department of Psychology, Biological Psychology, University of Cologne, Cologne 50923, Germany