1. Ramírez-Ruiz J, Grytskyy D, Mastrogiuseppe C, Habib Y, Moreno-Bote R. Complex behavior from intrinsic motivation to occupy future action-state path space. Nat Commun 2024; 15:6368. [PMID: 39075046 PMCID: PMC11286966 DOI: 10.1038/s41467-024-49711-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 06/13/2024] [Indexed: 07/31/2024] Open
Abstract
Most theories of behavior posit that agents tend to maximize some form of reward or utility. However, animals very often move with curiosity and seem to be motivated in a reward-free manner. Here we abandon the idea of reward maximization and propose that the goal of behavior is maximizing occupancy of future paths of actions and states. According to this maximum occupancy principle, rewards are the means to occupy path space, not the goal per se; goal-directedness simply emerges as rational ways of searching for resources so that movement, understood amply, never ends. We find that action-state path entropy is the only measure consistent with additivity and other intuitive properties of expected future action-state path occupancy. We provide analytical expressions that relate the optimal policy and state-value function and prove convergence of our value iteration algorithm. Using discrete and continuous state tasks, including a high-dimensional controller, we show that complex behaviors such as "dancing", hide-and-seek, and a basic form of altruistic behavior naturally result from the intrinsic motivation to occupy path space. All in all, we present a theory of behavior that generates both variability and goal-directedness in the absence of reward maximization.
Affiliation(s)
- Jorge Ramírez-Ruiz
  - Center for Brain and Cognition, Departament d'Enginyeria i Escola d'Enginyeria, Universitat Pompeu Fabra, Barcelona, Spain
- Dmytro Grytskyy
  - Center for Brain and Cognition, Departament d'Enginyeria i Escola d'Enginyeria, Universitat Pompeu Fabra, Barcelona, Spain
- Chiara Mastrogiuseppe
  - Center for Brain and Cognition, Departament d'Enginyeria i Escola d'Enginyeria, Universitat Pompeu Fabra, Barcelona, Spain
- Yamen Habib
  - Center for Brain and Cognition, Departament d'Enginyeria i Escola d'Enginyeria, Universitat Pompeu Fabra, Barcelona, Spain
- Rubén Moreno-Bote
  - Center for Brain and Cognition, Departament d'Enginyeria i Escola d'Enginyeria, Universitat Pompeu Fabra, Barcelona, Spain
  - Serra Húnter Fellow Programme, Universitat Pompeu Fabra, Barcelona, Spain
2. Lamba A, Frank MJ, FeldmanHall O. Keeping an eye out for change: Anxiety disrupts adaptive resolution of policy uncertainty. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 2024:S2451-9022(24)00203-9. [PMID: 39069235 DOI: 10.1016/j.bpsc.2024.07.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Revised: 07/17/2024] [Accepted: 07/17/2024] [Indexed: 07/30/2024]
Abstract
BACKGROUND: Human learning unfolds under uncertainty. Uncertainty is heterogeneous, with different forms exerting distinct influences on learning. While one can be uncertain about what to do to maximize rewarding outcomes, known as policy uncertainty, one can also be uncertain about general world knowledge, known as epistemic uncertainty. In complex and naturalistic environments such as the social world, adaptive learning may hinge on striking a balance between attending to and resolving each type of uncertainty. Prior work illustrates that people with anxiety (those with increased threat and uncertainty sensitivity) learn less from aversive outcomes, particularly as outcomes become more uncertain. How does a learner adaptively trade off between attending to these distinct sources of uncertainty to successfully learn about their social environment? METHODS: We developed a novel eye-tracking method to capture highly granular estimates of policy and epistemic uncertainty based on gaze patterns and pupil diameter (a physiological estimate of arousal). RESULTS: These empirically derived uncertainty measures reveal that humans (N = 94) flexibly switch between resolving policy and epistemic uncertainty to adaptively learn about which individuals can be trusted and which should be avoided. However, those with increased anxiety (N = 49) do not flexibly switch between resolving policy and epistemic uncertainty, and instead express less uncertainty overall. CONCLUSIONS: Combining modeling and eye-tracking techniques, we show that altered learning in people with anxiety emerges from an insensitivity to policy uncertainty and rigid choice policies, leading to maladaptive behaviors with untrustworthy people.
Affiliation(s)
- Amrita Lamba
  - Department of Cognitive & Psychological Sciences, Brown University, Providence, RI; Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA
- Michael J Frank
  - Department of Cognitive & Psychological Sciences, Brown University, Providence, RI; Carney Institute of Brain Sciences, Brown University, Providence, RI
- Oriel FeldmanHall
  - Department of Cognitive & Psychological Sciences, Brown University, Providence, RI; Carney Institute of Brain Sciences, Brown University, Providence, RI
3. Goudar V, Kim JW, Liu Y, Dede AJO, Jutras MJ, Skelin I, Ruvalcaba M, Chang W, Ram B, Fairhall AL, Lin JJ, Knight RT, Buffalo EA, Wang XJ. A Comparison of Rapid Rule-Learning Strategies in Humans and Monkeys. J Neurosci 2024; 44:e0231232024. [PMID: 38871463 PMCID: PMC11236592 DOI: 10.1523/jneurosci.0231-23.2024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 05/28/2024] [Accepted: 05/31/2024] [Indexed: 06/15/2024] Open
Abstract
Interspecies comparisons are key to deriving an understanding of the behavioral and neural correlates of human cognition from animal models. We perform a detailed comparison of the strategies of female macaque monkeys to male and female humans on a variant of the Wisconsin Card Sorting Test (WCST), a widely studied and applied task that provides a multiattribute measure of cognitive function and depends on the frontal lobe. WCST performance requires the inference of a rule change given ambiguous feedback. We found that well-trained monkeys infer new rules three times more slowly than minimally instructed humans. Input-dependent hidden Markov model-generalized linear models were fit to their choices, revealing hidden states akin to feature-based attention in both species. Decision processes resembled a win-stay, lose-shift strategy with interspecies similarities as well as key differences. Monkeys and humans both test multiple rule hypotheses over a series of rule-search trials and perform inference-like computations to exclude candidate choice options. We quantitatively show that perseveration, random exploration, and poor sensitivity to negative feedback account for the slower task-switching performance in monkeys.
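The win-stay, lose-shift strategy named in this abstract can be illustrated with a minimal Python sketch. This is the generic textbook heuristic, not the hidden Markov model-generalized linear model fitted in the paper. The function name `wsls_next_rule`, the three-rule set `RULES`, and the uniform random shift are assumptions made for illustration only.

```python
import random

# Hypothetical rule set for a WCST-like task (an assumption, not taken from the paper).
RULES = ("color", "shape", "number")


def wsls_next_rule(current_rule, rewarded, rng=random):
    """Win-stay, lose-shift: keep the rule after a reward, otherwise switch."""
    if rewarded:
        # Win-stay: positive feedback means the current rule is retained.
        return current_rule
    # Lose-shift: negative feedback triggers a move to a different rule,
    # chosen uniformly at random here (an illustrative simplification).
    alternatives = [rule for rule in RULES if rule != current_rule]
    return rng.choice(alternatives)
```

Under this sketch, `wsls_next_rule("color", True)` keeps `"color"`, while `wsls_next_rule("color", False)` moves to one of the two remaining rules.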
Affiliation(s)
- Vishwa Goudar
  - Center for Neural Science, New York University, New York 10003
- Jeong-Woo Kim
  - Center for Neural Science, New York University, New York 10003
- Yue Liu
  - Center for Neural Science, New York University, New York 10003
- Adam J O Dede
  - Department of Physiology and Biophysics, University of Washington, Seattle, Washington 98195
- Michael J Jutras
  - Department of Physiology and Biophysics, University of Washington, Seattle, Washington 98195
- Ivan Skelin
  - Department of Neurology, University of California, Davis, California 95616
  - The Center for Mind and Brain, University of California, Davis, California 95616
- Michael Ruvalcaba
  - Helen Wills Neuroscience Institute, University of California, Berkeley, California 94720
- William Chang
  - Helen Wills Neuroscience Institute, University of California, Berkeley, California 94720
- Bhargavi Ram
  - Department of Neurology, University of California, Davis, California 95616
  - The Center for Mind and Brain, University of California, Davis, California 95616
- Adrienne L Fairhall
  - Department of Physiology and Biophysics, University of Washington, Seattle, Washington 98195
- Jack J Lin
  - Department of Neurology, University of California, Davis, California 95616
  - The Center for Mind and Brain, University of California, Davis, California 95616
- Robert T Knight
  - Helen Wills Neuroscience Institute, University of California, Berkeley, California 94720
  - Department of Psychology, University of California, Berkeley, California 94720
- Elizabeth A Buffalo
  - Department of Physiology and Biophysics, University of Washington, Seattle, Washington 98195
  - Washington Primate Research Center, University of Washington, Seattle, Washington 98195
- Xiao-Jing Wang
  - Center for Neural Science, New York University, New York 10003
4. Zid M, Laurie VJ, Levine-Champagne A, Shourkeshti A, Harrell D, Herman AB, Ebitz RB. Humans forage for reward in reinforcement learning tasks. bioRxiv 2024:2024.07.08.602539. [PMID: 39026817 PMCID: PMC11257465 DOI: 10.1101/2024.07.08.602539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
How do we make good decisions in uncertain environments? In psychology and neuroscience, the classic answer is that we calculate the value of each option and then compare the values to choose the most rewarding, modulo some exploratory noise. An ethologist, conversely, would argue that we commit to one option until its value drops below a threshold, at which point we start exploring other options. In order to determine which view better describes human decision-making, we developed a novel, foraging-inspired sequential decision-making model and used it to ask whether humans compare to threshold ("Forage") or compare alternatives ("Reinforcement-Learn" [RL]). We found that the foraging model was a better fit for participant behavior, better predicted the participants' tendency to repeat choices, and predicted the existence of held-out participants with a pattern of choice that was almost impossible under RL. Together, these results suggest that humans use foraging computations, rather than RL, even in classic reinforcement learning tasks.
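The two decision rules contrasted in this abstract can be sketched in a few lines of Python. This is a simplified illustration, not the authors' model: value learning and exploratory noise are omitted, and the names `rl_choice`, `forage_choice`, and the `threshold` parameter are hypothetical.

```python
import random


def rl_choice(values):
    """Compare alternatives: pick the option with the highest estimated value."""
    return max(range(len(values)), key=lambda i: values[i])


def forage_choice(values, current, threshold, rng=random):
    """Compare to threshold: stay with the current option until its value
    drops below the threshold, then explore a different option."""
    if values[current] >= threshold:
        return current
    # Value fell below the threshold: leave and sample another option at random
    # (the uniform choice is an assumption made for this sketch).
    others = [i for i in range(len(values)) if i != current]
    return rng.choice(others)
```

With estimated values `[0.4, 0.6]`, a current option of `0`, and a threshold of `0.3`, the foraging rule stays on option `0` because it is still above threshold, whereas the comparison rule switches to option `1`. This difference in the tendency to repeat choices is the kind of behavior the abstract says distinguishes the two accounts.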
Affiliation(s)
- Meriam Zid
  - Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
- Veldon-James Laurie
  - Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
- Akram Shourkeshti
  - Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
- Dameon Harrell
  - Department of Psychiatry, University of Minnesota, Minneapolis, MN, 55455, USA
- Alexander B. Herman
  - Department of Psychiatry, University of Minnesota, Minneapolis, MN, 55455, USA
- R. Becket Ebitz
  - Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
5. Chen CS, Vinogradov S. Personalized Cognitive Health in Psychiatry: Current State and the Promise of Computational Methods. Schizophr Bull 2024:sbae108. [PMID: 38934792 DOI: 10.1093/schbul/sbae108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/28/2024]
Abstract
BACKGROUND: Decades of research have firmly established that cognitive health and cognitive treatment services are a key need for people living with psychosis. However, many current clinical programs do not address this need, despite the essential role that an individual's cognitive and social cognitive capacities play in determining their real-world functioning. Preliminary practice-based research in the Early Psychosis Intervention Network shows that it is possible to develop and implement tools that delineate an individual's cognitive health profile and that help engage the client and the clinician in shared decision-making and treatment planning that includes cognitive treatments. These findings signify a promising shift toward personalized cognitive health. STUDY DESIGN: Extending upon this early progress, we review the concept of interindividual variability in cognitive domains/processes in psychosis as the basis for offering personalized treatment plans. We present evidence from studies that have used traditional neuropsychological measures as well as findings from emerging computational studies that leverage trial-by-trial behavior data to illuminate the different latent strategies that individuals employ. STUDY RESULT: We posit that these computational techniques, when combined with traditional cognitive assessments, can enrich our understanding of individual differences in treatment needs, which in turn can guide ever more personalized interventions. CONCLUSION: As we find clinically relevant ways to decompose maladaptive behaviors into separate latent cognitive elements captured by model parameters, the ultimate goal is to develop and implement approaches that empower clients and their clinical providers to leverage individuals' existing learning capacities to improve their cognitive health and well-being.
Affiliation(s)
- Cathy S Chen
  - Department of Psychiatry & Behavioral Sciences, University of Minnesota Medical School, Minneapolis, MN, USA
- Sophia Vinogradov
  - Department of Psychiatry & Behavioral Sciences, University of Minnesota Medical School, Minneapolis, MN, USA
6. Nick Q, Gale DJ, Areshenkoff C, De Brouwer A, Nashed J, Wammes J, Zhu T, Flanagan R, Smallwood J, Gallivan J. Reconfigurations of cortical manifold structure during reward-based motor learning. eLife 2024; 12:RP91928. [PMID: 38916598 PMCID: PMC11198988 DOI: 10.7554/elife.91928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024] Open
Abstract
Adaptive motor behavior depends on the coordinated activity of multiple neural systems distributed across the brain. While the role of sensorimotor cortex in motor learning has been well established, how higher-order brain systems interact with sensorimotor cortex to guide learning is less well understood. Using functional MRI, we examined human brain activity during a reward-based motor task where subjects learned to shape their hand trajectories through reinforcement feedback. We projected patterns of cortical and striatal functional connectivity onto a low-dimensional manifold space and examined how regions expanded and contracted along the manifold during learning. During early learning, we found that several sensorimotor areas in the dorsal attention network exhibited increased covariance with areas of the salience/ventral attention network and reduced covariance with areas of the default mode network (DMN). During late learning, these effects reversed, with sensorimotor areas now exhibiting increased covariance with DMN areas. However, areas in posteromedial cortex showed the opposite pattern across learning phases, with its connectivity suggesting a role in coordinating activity across different networks over time. Our results establish the neural changes that support reward-based motor learning and identify distinct transitions in the functional coupling of sensorimotor to transmodal cortex when adapting behavior.
Affiliation(s)
- Qasem Nick
  - Centre for Neuroscience Studies, Queen’s University, Kingston, Canada
  - Department of Psychology, Queen’s University, Kingston, Canada
- Daniel J Gale
  - Centre for Neuroscience Studies, Queen’s University, Kingston, Canada
- Corson Areshenkoff
  - Centre for Neuroscience Studies, Queen’s University, Kingston, Canada
  - Department of Psychology, Queen’s University, Kingston, Canada
- Anouk De Brouwer
  - Centre for Neuroscience Studies, Queen’s University, Kingston, Canada
- Joseph Nashed
  - Centre for Neuroscience Studies, Queen’s University, Kingston, Canada
  - Department of Medicine, Queen’s University, Kingston, Canada
- Jeffrey Wammes
  - Centre for Neuroscience Studies, Queen’s University, Kingston, Canada
  - Department of Psychology, Queen’s University, Kingston, Canada
- Tianyao Zhu
  - Centre for Neuroscience Studies, Queen’s University, Kingston, Canada
- Randy Flanagan
  - Centre for Neuroscience Studies, Queen’s University, Kingston, Canada
  - Department of Psychology, Queen’s University, Kingston, Canada
- Jonny Smallwood
  - Centre for Neuroscience Studies, Queen’s University, Kingston, Canada
  - Department of Psychology, Queen’s University, Kingston, Canada
- Jason Gallivan
  - Centre for Neuroscience Studies, Queen’s University, Kingston, Canada
  - Department of Psychology, Queen’s University, Kingston, Canada
  - Department of Biomedical and Molecular Sciences, Queen’s University, Kingston, Canada
7. Li N, Lavalley CA, Chou KP, Chuning AE, Taylor S, Goldman CM, Torres T, Hodson R, Wilson RC, Stewart JL, Khalsa SS, Paulus MP, Smith R. Directed exploration is elevated in affective disorders but reduced by an aversive interoceptive state induction. medRxiv 2024:2024.06.19.24309110. [PMID: 38947082 PMCID: PMC11213056 DOI: 10.1101/2024.06.19.24309110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Elevated anxiety and uncertainty avoidance are known to exacerbate maladaptive choice in individuals with affective disorders. However, the differential roles of state vs. trait anxiety remain unclear, and underlying computational mechanisms have not been thoroughly characterized. In the present study, we investigated how a somatic (interoceptive) state anxiety induction influences learning and decision-making under uncertainty in individuals with clinically significant levels of trait anxiety. A sample of 58 healthy comparisons (HCs) and 61 individuals with affective disorders (iADs; i.e., depression and/or anxiety) completed a previously validated explore-exploit decision task, with and without an added breathing resistance manipulation designed to induce state anxiety. Computational modeling revealed a pattern in which iADs showed greater information-seeking (i.e., directed exploration; Cohen's d=.39, p=.039) in resting conditions, but that this was reduced by the anxiety induction. The affective disorders group also showed slower learning rates across conditions (Cohen's d=.52, p=.003), suggesting more persistent uncertainty. These findings highlight a complex interplay between trait anxiety and state anxiety. Specifically, while elevated trait anxiety is associated with persistent uncertainty, acute somatic anxiety can paradoxically curtail exploratory behaviors, potentially reinforcing maladaptive decision-making patterns in affective disorders.
Affiliation(s)
- Ning Li
  - Laureate Institute for Brain Research, Tulsa, OK
- Ko-Ping Chou
  - Laureate Institute for Brain Research, Tulsa, OK
- Rowan Hodson
  - Laureate Institute for Brain Research, Tulsa, OK
- Robert C. Wilson
  - Department of Psychology, University of Arizona, Tucson, AZ
  - Cognitive Science Program, University of Arizona, Tucson, AZ
- Sahib S. Khalsa
  - Laureate Institute for Brain Research, Tulsa, OK
  - Oxley College of Health and Natural Sciences, University of Tulsa, Tulsa, OK
- Martin P. Paulus
  - Laureate Institute for Brain Research, Tulsa, OK
  - Oxley College of Health and Natural Sciences, University of Tulsa, Tulsa, OK
- Ryan Smith
  - Laureate Institute for Brain Research, Tulsa, OK
  - Oxley College of Health and Natural Sciences, University of Tulsa, Tulsa, OK
8. Zuo L, Ai K, Liu W, Qiu B, Tang R, Fu J, Yang P, Kong Z, Song H, Zhu X, Zhang X. Navigating Exploitative Traps: Unveiling the Uncontrollable Reward Seeking of Individuals With Internet Gaming Disorder. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 2024:S2451-9022(24)00138-1. [PMID: 38839035 DOI: 10.1016/j.bpsc.2024.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 04/17/2024] [Accepted: 05/19/2024] [Indexed: 06/07/2024]
Abstract
BACKGROUND: Internet gaming disorder (IGD) involves an imbalance in the brain's dual system, characterized by heightened reward seeking and diminished cognitive control, which lead to decision-making challenges. The exploration-exploitation strategy is key to decision making, but how IGD affects this process is unclear. METHODS: To investigate the impact of IGD on decision making, a modified version of the 2-armed bandit task was employed. Participants included 41 individuals with IGD and 44 healthy control individuals. The study assessed the strategies used by participants in the task, particularly focusing on the exploitation-exploration strategy. Additionally, functional magnetic resonance imaging was used to examine brain activation patterns during decision-making and estimation phases. RESULTS: The study found that individuals with IGD demonstrated greater reliance on exploitative strategies in decision making due to their elevated value-seeking tendencies and decreased cognitive control. Individuals with IGD also displayed heightened activation in the presupplementary motor area and the ventral striatum compared with the healthy control group in both decision-making and estimation phases. Meanwhile, the prefrontal cortex showed more inhibition in individuals with IGD than in the healthy control group during exploitative strategies. This inhibition decreased as cognitive control diminished. CONCLUSIONS: The imbalance in the development of the dual system in individuals with IGD may lead to an overreliance on exploitative strategies. This imbalance, marked by increased reward seeking and reduced cognitive control, contributes to difficulties in decision making and value-related behavioral processes in individuals with IGD.
Affiliation(s)
- Lin Zuo
  - Hefei National Research Center for Physical Sciences at the Microscale, and Department of Hematology, the First Affiliated Hospital of USTC, Division of Life Science and Medicine, University of Science and Technology of China, Anhui, China
- Kedan Ai
  - Hefei National Research Center for Physical Sciences at the Microscale, and Department of Hematology, the First Affiliated Hospital of USTC, Division of Life Science and Medicine, University of Science and Technology of China, Anhui, China
- Weili Liu
  - Hefei National Research Center for Physical Sciences at the Microscale, and Department of Hematology, the First Affiliated Hospital of USTC, Division of Life Science and Medicine, University of Science and Technology of China, Anhui, China
- Bensheng Qiu
  - Centers for Biomedical Engineering, USTC, Anhui, China
- Rui Tang
  - Hefei National Research Center for Physical Sciences at the Microscale, and Department of Hematology, the First Affiliated Hospital of USTC, Division of Life Science and Medicine, University of Science and Technology of China, Anhui, China
- Jiaxin Fu
  - Hefei National Research Center for Physical Sciences at the Microscale, and Department of Hematology, the First Affiliated Hospital of USTC, Division of Life Science and Medicine, University of Science and Technology of China, Anhui, China
- Ping Yang
  - Department of Psychology, School of Humanities & Social Science, USTC, Anhui, China
- Zhuo Kong
  - Hefei National Research Center for Physical Sciences at the Microscale, and Department of Hematology, the First Affiliated Hospital of USTC, Division of Life Science and Medicine, University of Science and Technology of China, Anhui, China
- Hongwen Song
  - Hefei National Research Center for Physical Sciences at the Microscale, and Department of Hematology, the First Affiliated Hospital of USTC, Division of Life Science and Medicine, University of Science and Technology of China, Anhui, China; Key Laboratory of Philosophy and Social Science of Anhui Province on Adolescent Mental Health and Crisis Intelligence Intervention, Anhui, China
- Xiaoyu Zhu
  - Department of Hematology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, USTC, Anhui, China
- Xiaochu Zhang
  - Hefei National Research Center for Physical Sciences at the Microscale, and Department of Hematology, the First Affiliated Hospital of USTC, Division of Life Science and Medicine, University of Science and Technology of China, Anhui, China; Department of Psychology, School of Humanities & Social Science, USTC, Anhui, China; Business School, Guizhou Education University, Guiyang, China; Institute of Health and Medicine, Hefei Comprehensive Science Center, Anhui, China
9. Selbing I, Skewes J. The expression of decision and learning variables in movement patterns related to decision actions. Exp Brain Res 2024; 242:1311-1325. [PMID: 38551690 PMCID: PMC11108959 DOI: 10.1007/s00221-024-06805-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 02/09/2024] [Indexed: 05/23/2024]
Abstract
Decisions are not necessarily easy to separate into a planning and an execution phase and the decision-making process can often be reflected in the movement associated with the decision. Here, we used formalized definitions of concepts relevant in decision-making and learning to explore if and how these concepts correlate with decision-related movement paths, both during and after a choice is made. To this end, we let 120 participants (46 males, mean age = 24.5 years) undergo a repeated probabilistic two-choice task with changing probabilities where we used mouse-tracking, a simple non-invasive technique, to study the movements related to decisions. The decisions of the participants were modelled using Bayesian inference which enabled the computation of variables related to decision-making and learning. Analyses of the movement during the decision showed effects of relevant decision variables, such as confidence, on aspects related to, for instance, timing and pausing, range of movement and deviation from the shortest distance. For the movements after a decision there were some effects of relevant learning variables, mainly related to timing and speed. We believe our findings can be of interest for researchers within several fields, spanning from social learning to experimental methods and human-machine/robot interaction.
Affiliation(s)
- Ida Selbing
  - Division of Psychology, Karolinska Institutet, Nobels väg 9, Solna, Stockholm, Sweden
  - Interacting Minds Centre, Aarhus University, Aarhus, Denmark
- Joshua Skewes
  - Department for Linguistics, Cognitive Science, and Semiotics, Aarhus University, Aarhus, Denmark
  - Interacting Minds Centre, Aarhus University, Aarhus, Denmark
10. Bahuguna J, Verstynen T, Rubin JE. How cortico-basal ganglia-thalamic subnetworks can shift decision policies to maximize reward rate. bioRxiv 2024:2024.05.21.595174. [PMID: 38826315 PMCID: PMC11142098 DOI: 10.1101/2024.05.21.595174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
All mammals exhibit flexible decision policies that depend, at least in part, on the cortico-basal ganglia-thalamic (CBGT) pathways. Yet understanding how the complex connectivity, dynamics, and plasticity of CBGT circuits translates into experience-dependent shifts of decision policies represents a longstanding challenge in neuroscience. Here we used a computational approach to address this problem. Specifically, we simulated decisions driven by CBGT circuits under baseline, unrewarded conditions using a spiking neural network, and fit the resulting behavior to an evidence accumulation model. Using canonical correlation analysis, we then replicated the existence of three recently identified control ensembles (responsiveness, pliancy and choice) within CBGT circuits, with each ensemble mapping to a specific configuration of the evidence accumulation process. We subsequently simulated learning in a simple two-choice task with one optimal (i.e., rewarded) target. We find that value-based learning, via dopaminergic signals acting on cortico-striatal synapses, effectively manages the speed-accuracy tradeoff so as to increase reward rate over time. Within this process, learning-related changes in decision policy can be decomposed in terms of the contributions of each control ensemble, and these changes are driven by sequential reward prediction errors on individual trials. Our results provide a clear and simple mechanism for how dopaminergic plasticity shifts specific subnetworks within CBGT circuits so as to strategically modulate decision policies in order to maximize effective reward rate.
Affiliation(s)
- Jyotika Bahuguna
  - Department of Psychology & Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Timothy Verstynen
  - Department of Psychology & Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
  - Center for the Neural Basis of Cognition, Pittsburgh, Pennsylvania, United States of America
- Jonathan E Rubin
  - Center for the Neural Basis of Cognition, Pittsburgh, Pennsylvania, United States of America
  - Department of Mathematics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
11. Goldman CM, Takahashi T, Lavalley CA, Li N, Taylor S, Chuning AE, Hodson R, Stewart JL, Wilson RC, Khalsa SS, Paulus MP, Smith R. Individuals with Methamphetamine Use Disorder Show Reduced Directed Exploration and Learning Rates Independent of an Aversive Interoceptive State Induction. medRxiv 2024:2024.05.17.24307491. [PMID: 38826438 PMCID: PMC11142260 DOI: 10.1101/2024.05.17.24307491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Methamphetamine Use Disorder (MUD) is associated with substantially reduced quality of life. Yet, decisions to use persist, due in part to avoidance of anticipated withdrawal states. However, the specific cognitive mechanisms underlying this decision process, and possible modulatory effects of aversive states, remain unclear. Here, 56 individuals with MUD and 58 healthy comparisons (HCs) performed a decision task, both with and without an aversive interoceptive state induction. Computational modeling measured the tendency to test beliefs about uncertain outcomes (directed exploration) and the ability to update beliefs in response to outcomes (learning rates). Compared to HCs, those with MUD exhibited less directed exploration and slower learning rates, but these differences were not affected by aversive state induction. These results suggest novel, state-independent computational mechanisms whereby individuals with MUD may have difficulties in testing beliefs about the tolerability of abstinence and in adjusting behavior in response to consequences of continued use.
Affiliation(s)
| | - Toru Takahashi
- Laureate Institute for Brain Research, Tulsa, OK
- Japan Society for the Promotion of Science, Tokyo, Japan
| | | | - Ning Li
- Laureate Institute for Brain Research, Tulsa, OK
| | | | | | - Rowan Hodson
- Laureate Institute for Brain Research, Tulsa, OK
| | - Jennifer L. Stewart
- Laureate Institute for Brain Research, Tulsa, OK
- Oxley College of Health and Natural Sciences, University of Tulsa, Tulsa, OK
| | - Robert C. Wilson
- Department of Psychology, University of Arizona, Tucson, AZ
- Cognitive Science Program, University of Arizona, Tucson, AZ
| | - Sahib S. Khalsa
- Laureate Institute for Brain Research, Tulsa, OK
- Oxley College of Health and Natural Sciences, University of Tulsa, Tulsa, OK
| | - Martin P. Paulus
- Laureate Institute for Brain Research, Tulsa, OK
- Oxley College of Health and Natural Sciences, University of Tulsa, Tulsa, OK
| | - Ryan Smith
- Laureate Institute for Brain Research, Tulsa, OK
- Oxley College of Health and Natural Sciences, University of Tulsa, Tulsa, OK
12
Zhou D, Bornstein AM. Expanding horizons in reinforcement learning for curious exploration and creative planning. Behav Brain Sci 2024; 47:e118. [PMID: 38770877 DOI: 10.1017/s0140525x23003394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Curiosity and creativity are expressions of the trade-off between leveraging that with which we are familiar or seeking out novelty. Through the computational lens of reinforcement learning, we describe how formulating the value of information seeking and generation via their complementary effects on planning horizons formally captures a range of solutions to striking this balance.
Affiliation(s)
- Dale Zhou
- Neurobiology and Behavior, 519 Biological Sciences Quad, University of California, Irvine, CA, USA ://dalezhou.com
- Center for the Neurobiology of Learning and Memory, Qureshey, Research Laboratory, University of California, Irvine, CA, USA ://aaron.bornstein.org/
| | - Aaron M Bornstein
- Center for the Neurobiology of Learning and Memory, Qureshey, Research Laboratory, University of California, Irvine, CA, USA ://aaron.bornstein.org/
- Department of Cognitive Sciences, 2318 Social & Behavioral Sciences Gateway, University of California, Irvine, CA, USA
13
Barbier-Chebbah A, Vestergaard CL, Masson JB. Approximate information for efficient exploration-exploitation strategies. Phys Rev E 2024; 109:L052105. [PMID: 38907409 DOI: 10.1103/physreve.109.l052105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 01/29/2024] [Indexed: 06/24/2024]
Abstract
This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multiarmed bandit problems. These involve an agent deciding whether to exploit current knowledge for immediate gains or explore new avenues for potential long-term rewards. We here introduce a class of algorithms, approximate information maximization (AIM), which employs a carefully chosen analytical approximation to the gradient of the entropy to choose which arm to pull at each point in time. AIM matches the performance of Thompson sampling, which is known to be asymptotically optimal, as well as that of Infomax from which it derives. AIM thus retains the advantages of Infomax while also offering enhanced computational speed, tractability, and ease of implementation. In particular, we demonstrate how to apply it to a 50-armed bandit game. Its expression is tunable, which allows for specific optimization in various settings, making it possible to surpass the performance of Thompson sampling at short and intermediary times.
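As a point of reference for the Thompson sampling baseline named in this abstract, the following is a minimal sketch for a Bernoulli multi-armed bandit with uniform Beta(1, 1) priors. It is not the AIM algorithm, whose analytical approximation is not given in the abstract; the function name `thompson_sampling`, the arm probabilities, and the seed are illustrative assumptions.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Thompson sampling on a Bernoulli bandit and return pull counts per arm.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta posterior (Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Pull the arm whose sampled success rate is highest.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000))
```

Sampling from the posterior, rather than taking its mean, is what produces exploration here: arms with few observations have wide posteriors and are occasionally selected even when their estimated mean is lower.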
Affiliation(s)
- Alex Barbier-Chebbah
- Institut Pasteur, Université Paris Cité, CNRS UMR 3571, Decision and Bayesian Computation, 75015 Paris, France
- Épimethée, Inria, 75012 Paris, France
| | - Christian L Vestergaard
- Institut Pasteur, Université Paris Cité, CNRS UMR 3571, Decision and Bayesian Computation, 75015 Paris, France
- Épimethée, Inria, 75012 Paris, France
| | - Jean-Baptiste Masson
- Institut Pasteur, Université Paris Cité, CNRS UMR 3571, Decision and Bayesian Computation, 75015 Paris, France
- Épimethée, Inria, 75012 Paris, France
14
Hagan KE, Aimufua I, Haynos AF, Walsh BT. The explore/exploit trade-off: An ecologically valid and translational framework that can advance mechanistic understanding of eating disorders. Int J Eat Disord 2024; 57:1102-1108. [PMID: 38385592 DOI: 10.1002/eat.24173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 01/26/2024] [Accepted: 02/08/2024] [Indexed: 02/23/2024]
Abstract
The explore/exploit trade-off is a decision-making process that is conserved across species and balances exploring unfamiliar choices of unknown value with choosing familiar options of known value to maximize reward. This framework is rooted in behavioral ecology and has traditionally been used to study maladaptive versus adaptive non-human animal foraging behavior. Researchers have begun to recognize the potential utility of understanding human decision-making and psychopathology through the explore/exploit trade-off. In this article, we propose that explore/exploit trade-off holds promise for advancing our mechanistic understanding of decision-making processes that confer vulnerability for and maintain eating pathology due to its neurodevelopmental bases, conservation across species, and ability to be mathematically modeled. We present a model for how suboptimal explore/exploit decision-making can promote disordered eating and present recommendations for future research applying this framework to eating pathology. Taken together, the explore/exploit trade-off provides a translational framework for expanding etiologic and maintenance models of eating pathology, given developmental changes in explore/exploit decision-making that coincide in time with the emergence of eating pathology and evidence of biased explore/exploit decision-making in psychopathology. Additionally, understanding explore/exploit decision-making in eating disorders may improve knowledge of their underlying pathophysiology, informing targeted clinical interventions such as neuromodulation and pharmacotherapy. PUBLIC SIGNIFICANCE STATEMENT: The explore/exploit trade-off is a cross-species decision-making process whereby organisms choose between a known option with a known reward or sampling unfamiliar options. We hypothesize that imbalanced explore/exploit decision-making can promote disordered eating and present preliminary data. 
We propose that explore/exploit trade-off has significant potential to advance understanding of the neurocognitive and neurodevelopmental mechanisms of eating pathology, which could ultimately guide revisions of etiologic models and inform novel interventions.
Affiliation(s)
- Kelsey E Hagan
- Department of Psychiatry, Virginia Commonwealth University, Richmond, Virginia, USA
- Institute for Women's Health, Virginia Commonwealth University, Richmond, Virginia, USA
| | - Ivieosa Aimufua
- Department of Psychiatry, New York State Psychiatric Institute, Columbia University Irving Medical Center, New York, New York, USA
| | - Ann F Haynos
- Department of Psychiatry, Virginia Commonwealth University, Richmond, Virginia, USA
- Department of Psychology, Virginia Commonwealth University, Richmond, Virginia, USA
- Department of Psychiatry and Behavioral Sciences, University of Minnesota, Minneapolis, Minnesota, USA
| | - B Timothy Walsh
- Department of Psychiatry, New York State Psychiatric Institute, Columbia University Irving Medical Center, New York, New York, USA
15
Harms MB, Xu Y, Green CS, Woodard K, Wilson R, Pollak SD. The structure and development of explore-exploit decision making. Cogn Psychol 2024; 150:101650. [PMID: 38461609 PMCID: PMC11275514 DOI: 10.1016/j.cogpsych.2024.101650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 03/05/2024] [Accepted: 03/06/2024] [Indexed: 03/12/2024]
Abstract
A critical component of human learning reflects the balance people must achieve between focusing on the utility of what they know versus openness to what they have yet to experience. How individuals decide whether to explore new options versus exploit known options has garnered growing interest in recent years. Yet, the component processes underlying decisions to explore and whether these processes change across development remain poorly understood. By contrasting a variety of tasks that measure exploration in slightly different ways, we found that decisions about whether to explore reflect (a) random exploration that is not explicitly goal-directed and (b) directed exploration to purposefully reduce uncertainty. While these components similarly characterized the decision-making of both youth and adults, younger participants made decisions that were less strategic, but more exploratory and flexible, than those of adults. These findings are discussed in terms of how people adapt to and learn from changing environments over time. Data has been made available in the Open Science Foundation platform (osf.io).
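The two components distinguished in this abstract are often written as a single choice rule in the exploration literature: an uncertainty bonus added to each option's value (directed exploration) and a softmax temperature that scales decision noise (random exploration). The sketch below illustrates that generic rule only; it is an assumption, not the model fitted in the study, and the names `choice_probabilities`, `info_bonus`, and `temperature` are hypothetical.

```python
import math


def choice_probabilities(values, uncertainties, info_bonus, temperature):
    """Softmax choice probabilities with an uncertainty bonus.

    info_bonus  > 0 favours uncertain options (directed exploration).
    temperature > 0 flattens the distribution as it grows (random exploration).
    """
    scores = [
        (v + info_bonus * u) / temperature
        for v, u in zip(values, uncertainties)
    ]
    # Subtract the maximum score for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Equal values, but the second option is more uncertain.
    print(choice_probabilities([1.0, 1.0], [0.0, 1.0], info_bonus=0.5, temperature=1.0))
    # Unequal values with a high temperature: choices become nearly random.
    print(choice_probabilities([1.0, 2.0], [0.0, 0.0], info_bonus=0.0, temperature=100.0))
```

Keeping the two parameters separate is what lets a fitted model attribute a given exploratory choice to information seeking or to noise.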
Affiliation(s)
- Madeline B Harms
- Department of Psychology, University of Wisconsin - Madison, 1202 West Johnson Street, Madison, WI 53706, United States.
| | - Yuyan Xu
- Department of Psychology, University of Wisconsin - Madison, 1202 West Johnson Street, Madison, WI 53706, United States
| | - C Shawn Green
- Department of Psychology, University of Wisconsin - Madison, 1202 West Johnson Street, Madison, WI 53706, United States
| | - Kristina Woodard
- Department of Psychology, University of Wisconsin - Madison, 1202 West Johnson Street, Madison, WI 53706, United States
| | - Robert Wilson
- Department of Psychology, University of Arizona, 1503 E. University Blvd. (Building 68), Tucson, AZ 85721, United States
| | - Seth D Pollak
- Department of Psychology, University of Wisconsin - Madison, 1202 West Johnson Street, Madison, WI 53706, United States
16
Wiehler A, Peters J. Decomposition of Reinforcement Learning Deficits in Disordered Gambling via Drift Diffusion Modeling and Functional Magnetic Resonance Imaging. COMPUTATIONAL PSYCHIATRY (CAMBRIDGE, MASS.) 2024; 8:23-45. [PMID: 38774428 PMCID: PMC11104325 DOI: 10.5334/cpsy.104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 03/07/2024] [Indexed: 05/24/2024]
Abstract
Gambling disorder is associated with deficits in reward-based learning, but the underlying computational mechanisms are still poorly understood. Here, we examined this issue using a stationary reinforcement learning task in combination with computational modeling and functional magnetic resonance imaging (fMRI) in individuals who regularly participate in gambling (n = 23, seven fulfilled one to three DSM-5 criteria for gambling disorder, sixteen fulfilled four or more) and matched controls (n = 23). As predicted, the gambling group exhibited substantially reduced accuracy, whereas overall response times (RTs) were not reliably different between groups. We then used comprehensive modeling using reinforcement learning drift diffusion models (RLDDMs) in combination with hierarchical Bayesian parameter estimation to shed light on the computational underpinnings of this performance deficit. In both groups, an RLDDM in which both non-decision time and decision threshold (boundary separation) changed over the course of the experiment accounted for the data best. The model showed good parameter and model recovery, and posterior predictive checks revealed that, in both groups, the model accurately reproduced the evolution of accuracies and RTs over time. Modeling revealed that, compared to controls, the learning impairment in the gambling group was linked to a more rapid reduction in decision thresholds over time, and a reduced impact of value-differences on the drift rate. The gambling group also showed shorter non-decision times. FMRI analyses replicated effects of prediction error coding in the ventral striatum and value coding in the ventro-medial prefrontal cortex, but there was no credible evidence for group differences in these effects. Taken together, our findings show that reinforcement learning impairments in disordered gambling are linked to both maladaptive decision threshold adjustments and a reduced consideration of option values in the choice process.
Affiliation(s)
- Antonius Wiehler
- Department of Systems Neuroscience, University Medical Centre Hamburg-Eppendorf, Hamburg, Germany
- Institut du Cerveau et de la Moelle épinière (ICM), INSERM U 1127, CNRS UMR 7225, Sorbonne Universités Paris, France
| | - Jan Peters
- Department of Systems Neuroscience, University Medical Centre Hamburg-Eppendorf, Hamburg, Germany
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
17
Venditto SJC, Miller KJ, Brody CD, Daw ND. Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.28.582617. [PMID: 38464244 PMCID: PMC10925334 DOI: 10.1101/2024.02.28.582617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Different brain systems have been hypothesized to subserve multiple "experts" that compete to generate behavior. In reinforcement learning, two general processes, one model-free (MF) and one model-based (MB), are often modeled as a mixture of agents (MoA) and hypothesized to capture differences between automaticity vs. deliberation. However, shifts in strategy cannot be captured by a static MoA. To investigate such dynamics, we present the mixture-of-agents hidden Markov model (MoA-HMM), which simultaneously learns inferred action values from a set of agents and the temporal dynamics of underlying "hidden" states that capture shifts in agent contributions over time. Applying this model to a multi-step, reward-guided task in rats reveals a progression of within-session strategies: a shift from initial MB exploration to MB exploitation, and finally to reduced engagement. The inferred states predict changes in both response time and OFC neural encoding during the task, suggesting that these states are capturing real shifts in dynamics.
18
Shen X, Helion C, Smith DV, Murty VP. Motivation as a Lens for Understanding Information-seeking Behaviors. J Cogn Neurosci 2024; 36:362-376. [PMID: 37944120 DOI: 10.1162/jocn_a_02083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2023]
Abstract
Most prior research characterizes information-seeking behaviors as serving utilitarian purposes, such as whether the obtained information can help solve practical problems. However, information-seeking behaviors are sensitive to different contexts (i.e., threat vs. curiosity), despite having equivalent utility. Furthermore, these search behaviors can be modulated by individuals' life history and personality traits. Yet the emphasis on utilitarian utility has precluded the development of a unified model, which explains when and how individuals actively seek information. To account for this variability and flexibility, we propose a unified information-seeking framework that examines information-seeking through the lens of motivation. This unified model accounts for integration across individuals' internal goal states and the salient features of the environment to influence information-seeking behavior. We propose that information-seeking is determined by motivation for information, invigorated either by instrumental utility or hedonic utility, wherein one's personal or environmental context moderates this relationship. Furthermore, we speculate that the final common denominator in guiding information-seeking is the engagement of different neuromodulatory circuits centered on dopaminergic and noradrenergic tone. Our framework provides a unified framework for information-seeking behaviors and generates several testable predictions for future studies.
19
Gillespie B, Houghton MJ, Ganio K, McDevitt CA, Bennett D, Dunn A, Raju S, Schroeder A, Hill RA, Cardoso BR. Maternal selenium dietary supplementation alters sociability and reinforcement learning deficits induced by in utero exposure to maternal immune activation in mice. Brain Behav Immun 2024; 116:349-361. [PMID: 38142918 DOI: 10.1016/j.bbi.2023.12.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/24/2023] [Accepted: 12/19/2023] [Indexed: 12/26/2023] Open
Abstract
Maternal immune activation (MIA) during pregnancy increases the risk for the unborn foetus to develop neurodevelopmental conditions such as autism spectrum disorder and schizophrenia later in life. MIA mouse models recapitulate behavioural and biological phenotypes relevant to both conditions, and are valuable models to test novel treatment approaches. Selenium (Se) has potent anti-inflammatory properties suggesting it may be an effective prophylactic treatment against MIA. The aim of this study was to determine if Se supplementation during pregnancy can prevent adverse effects of MIA on offspring brain and behaviour in a mouse model. Selenium was administered via drinking water (1.5 ppm) to pregnant dams from gestational day (GD) 9 to birth, and MIA was induced at GD17 using polyinosinic:polycytidylic acid (poly-I:C, 20 mg/kg via intraperitoneal injection). Foetal placenta and brain cytokine levels were assessed using a Luminex assay and brain elemental nutrients assessed using inductively coupled plasma mass spectrometry. Adult offspring were behaviourally assessed using a reinforcement learning paradigm, the three-chamber sociability test and the open field test. MIA elevated placental IL-1β and IL-17, and Se supplementation successfully prevented this elevation. MIA caused an increase in foetal brain calcium, which was prevented by Se supplementation. MIA caused in offspring a female-specific reduction in sociability, which was recovered by Se, and a male-specific reduction in social memory, which was not recovered by Se. Exposure to poly-I:C or selenium, but not both, reduced performance in the reinforcement learning task. Computational modelling indicated that this was predominantly due to increased exploratory behaviour, rather than reduced rate of learning the location of the food reward. This study demonstrates that while Se may be beneficial in ameliorating sociability deficits caused by MIA, it may have negative effects in other behavioural domains.
Caution in the use of Se supplementation during pregnancy is therefore warranted.
Affiliation(s)
- Brendan Gillespie
- Department of Psychiatry, Monash University, Clayton, VIC 3168, Australia
| | - Michael J Houghton
- Department of Nutrition, Dietetics and Food, Monash University, Notting Hill, VIC 3168, Australia; Victorian Heart Institute, Monash University, Clayton, VIC 3168, Australia
| | - Katherine Ganio
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC 3000, Australia
| | - Christopher A McDevitt
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC 3000, Australia
| | - Daniel Bennett
- Department of Psychiatry, Monash University, Clayton, VIC 3168, Australia
| | - Ariel Dunn
- Department of Psychiatry, Monash University, Clayton, VIC 3168, Australia
| | - Sharvada Raju
- Department of Psychiatry, Monash University, Clayton, VIC 3168, Australia
| | - Anna Schroeder
- Department of Psychiatry, Monash University, Clayton, VIC 3168, Australia.
| | - Rachel A Hill
- Department of Psychiatry, Monash University, Clayton, VIC 3168, Australia.
| | - Barbara R Cardoso
- Department of Nutrition, Dietetics and Food, Monash University, Notting Hill, VIC 3168, Australia.
20
Mizell JM, Wang S, Frisvold A, Alvarado L, Farrell-Skupny A, Keung W, Phelps CE, Sundman MH, Franchetti MK, Chou YH, Alexander GE, Wilson RC. Differential impacts of healthy cognitive aging on directed and random exploration. Psychol Aging 2024; 39:88-101. [PMID: 38358695 PMCID: PMC10871551 DOI: 10.1037/pag0000791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2024]
Abstract
Deciding whether to explore unknown opportunities or exploit well-known options is a ubiquitous part of our everyday lives. Extensive work in college students suggests that young people make explore-exploit decisions using a mixture of information seeking and random behavioral variability. Whether, and to what extent, older adults use the same strategies is unknown. To address this question, 51 older adults (ages 65-74) and 32 younger adults (ages 18-25) completed the Horizon Task, a gambling task that quantifies information seeking and behavioral variability as well as how these strategies are controlled for the purposes of exploration. Qualitatively, we found that older adults performed similarly to younger adults on this task, increasing both their information seeking and behavioral variability when it was adaptive to explore. Quantitatively, however, there were substantial differences between the age groups, with older adults showing less information seeking overall and less reliance on variability as a means to explore. In addition, we found a subset of approximately 26% of older adults whose information seeking was close to zero, avoiding informative options even when they were clearly the better choice. Unsurprisingly, these "information avoiders" performed worse on the task. In contrast, task performance in the remaining "information seeking" older adults was comparable to that of younger adults, suggesting that age-related differences in explore-exploit decision making may be adaptive except when they are taken to extremes. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
Affiliation(s)
| | - Siyu Wang
- University of Arizona, Department of Psychology
21
Wyatt LE, Hewan PA, Hogeveen J, Spreng RN, Turner GR. Exploration versus exploitation decisions in the human brain: A systematic review of functional neuroimaging and neuropsychological studies. Neuropsychologia 2024; 192:108740. [PMID: 38036246 DOI: 10.1016/j.neuropsychologia.2023.108740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 10/15/2023] [Accepted: 11/21/2023] [Indexed: 12/02/2023]
Abstract
Thoughts and actions are often driven by a decision to either explore new avenues with unknown outcomes, or to exploit known options with predictable outcomes. Yet, the neural mechanisms underlying this exploration-exploitation trade-off in humans remain poorly understood. This is attributable to variability in the operationalization of exploration and exploitation as psychological constructs, as well as the heterogeneity of experimental protocols and paradigms used to study these choice behaviours. To address this gap, here we present a comprehensive review of the literature to investigate the neural basis of explore-exploit decision-making in humans. We first conducted a systematic review of functional magnetic resonance imaging (fMRI) studies of exploration- versus exploitation-based decision-making in healthy adult humans during foraging, reinforcement learning, and information search. Eleven fMRI studies met inclusion criteria for this review. Adopting a network neuroscience framework, synthesis of the findings across these studies revealed that exploration-based choice was associated with the engagement of attentional, control, and salience networks. In contrast, exploitation-based choice was associated with engagement of default network brain regions. We interpret these results in the context of a network architecture that supports the flexible switching between externally and internally directed cognitive processes, necessary for adaptive, goal-directed behaviour. To further investigate potential neural mechanisms underlying the exploration-exploitation trade-off we next surveyed studies involving neurodevelopmental, neuropsychological, and neuropsychiatric disorders, as well as lifespan development, and neurodegenerative diseases. We observed striking differences in patterns of explore-exploit decision-making across these populations, again suggesting that these two decision-making modes are supported by independent neural circuits.
Taken together, our review highlights the need for precision-mapping of the neural circuitry and behavioural correlates associated with exploration and exploitation in humans. Characterizing exploration versus exploitation decision-making biases may offer a novel, trans-diagnostic approach to assessment, surveillance, and intervention for cognitive decline and dysfunction in normal development and clinical populations.
Affiliation(s)
- Lindsay E Wyatt
- Department of Psychology, York University, Toronto, ON, Canada
| | - Patrick A Hewan
- Department of Psychology, York University, Toronto, ON, Canada
| | - Jeremy Hogeveen
- Department of Psychology, The University of New Mexico, Albuquerque, NM, USA
| | - R Nathan Spreng
- Montréal Neurological Institute, Department of Neurology and Neurosurgery, McGill University, Montréal, QC, H3A 2B4, Canada; Department of Psychology, McGill University, Montréal, QC, Canada; Department of Psychiatry, McGill University, Montréal, QC, Canada; McConnell Brain Imaging Centre, Montréal Neurological Institute, McGill University, Montréal, QC, Canada.
| | - Gary R Turner
- Department of Psychology, York University, Toronto, ON, Canada.
22
Bass I, Mahaffey E, Bonawitz E. Children Use Teachers' Beliefs About Their Abilities to Calibrate Explore-Exploit Decisions. Top Cogn Sci 2023. [PMID: 38033200 DOI: 10.1111/tops.12714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 11/14/2023] [Accepted: 11/17/2023] [Indexed: 12/02/2023]
Abstract
Models of the explore-exploit problem have explained how children's decision making is weighed by a bias for information (directed exploration), randomness, and generalization. These behaviors are often tested in domains where a choice to explore (or exploit) is guaranteed to reveal an outcome. An often overlooked but critical component of the assessment of explore-exploit decisions lies in the expected success of taking actions in the first place-and, crucially, how such decisions might be carried out when learning from others. Here, we examine how children consider an informal teacher's beliefs about the child's competence when deciding how difficult a task they want to pursue. We present a simple model of this problem that predicts that while learners should follow the recommendation of an accurate teacher, they should exploit easier games when a teacher overestimates their abilities, and explore harder games when she underestimates them. We tested these predictions in two experiments with adults (Experiment 1) and 6- to 8-year-old children (Experiment 2). In our task, participants' performance on a picture-matching game was either overestimated, underestimated, or accurately represented by a confederate (the "Teacher"), who then presented three new matching games of varying assessed difficulty (too easy, too hard, just right) at varying potential reward (low, medium, high). In line with our model's predictions, we found that both adults and children calibrated their choices to the teacher's representation of their competence. That is, to maximize expected reward, when she underestimated them, participants chose games the teacher evaluated as being too hard for them; when she overestimated them, they chose games she evaluated as being too easy; and when she was accurate, they chose games she assessed as being just right. 
This work provides insight into the early-emerging ability to calibrate explore-exploit decisions to others' knowledge when learning in informal pedagogical contexts.
Affiliation(s)
- Ilona Bass
- Department of Psychology, Harvard University
- Graduate School of Education, Harvard University
23
Lloyd A, Viding E, McKay R, Furl N. Understanding patch foraging strategies across development. Trends Cogn Sci 2023; 27:1085-1098. [PMID: 37500422 DOI: 10.1016/j.tics.2023.07.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 07/05/2023] [Accepted: 07/06/2023] [Indexed: 07/29/2023]
Abstract
Patch foraging is a near-ubiquitous behaviour across the animal kingdom and characterises many decision-making domains encountered by humans. We review how a disposition to explore in adolescence may reflect the evolutionary conditions under which hunter-gatherers foraged for resources. We propose that neurocomputational mechanisms responsible for reward processing, learning, and cognitive control facilitate the transition from exploratory strategies in adolescence to exploitative strategies in adulthood - where individuals capitalise on known resources. This developmental transition may be disrupted by psychopathology, as there is emerging evidence of biases in explore/exploit choices in mental health problems. Explore/exploit choices may be an informative marker for mental health across development and future research should consider this feature of decision-making as a target for clinical intervention.
Affiliation(s)
- Alex Lloyd
- Clinical, Educational, and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK.
| | - Essi Viding
- Clinical, Educational, and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK
| | - Ryan McKay
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX, UK
| | - Nicholas Furl
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX, UK
24
Daumas L, Zory R, Junquera-Badilla I, Ferrandez M, Ettore E, Robert P, Sacco G, Manera V, Ramanoël S. How does apathy impact exploration-exploitation decision-making in older patients with neurocognitive disorders? NPJ AGING 2023; 9:25. [PMID: 37903801 PMCID: PMC10616174 DOI: 10.1038/s41514-023-00121-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 09/14/2023] [Indexed: 11/01/2023]
Abstract
Apathy is a pervasive clinical syndrome in neurocognitive disorders, characterized by a quantitative reduction in goal-directed behaviors. The brain structures involved in the physiopathology of apathy have also been connected to the brain structures involved in probabilistic reward learning in the exploration-exploitation dilemma. This dilemma involves the challenge of selecting between a familiar option with a more predictable outcome and another option whose outcome is uncertain and may yield greater rewards than the known option. The aim of this study was to combine experimental procedures and computational modeling to examine whether, in older adults with mild neurocognitive disorders, apathy affects performance in the exploration-exploitation dilemma. Using a four-armed bandit reinforcement-learning task, we showed that apathetic older adults explored more and performed worse than non-apathetic subjects. Moreover, the mental flexibility assessed by the Trail-making test-B was negatively associated with the percentage of exploration. These results suggest that apathy is characterized by increased explorative behavior and inefficient decision-making, possibly due to weak mental flexibility to switch toward the exploitation of the more rewarding options. Apathetic participants also took longer to make a choice and failed more often to respond in the allotted time, which could reflect difficulties in action initiation and selection. In conclusion, the present results suggest that apathy in participants with neurocognitive disorders is associated with specific disturbances in the exploration-exploitation trade-off, and they shed light on the disturbances in reward processing in patients with apathy.
Affiliation(s)
- Lyne Daumas
- Université Côte d'Azur, LAMHESS, Nice, France.
- Université Côte d'Azur, CoBTeK, Nice, France.
| | - Raphaël Zory
- Université Côte d'Azur, LAMHESS, Nice, France
- Institut Universitaire de France, Paris, France
| | | | - Marion Ferrandez
- Université Côte d'Azur, CoBTeK, Nice, France
- Université Côte d'Azur, Centre Hospitalier Universitaire de Nice, service Clinique Gériatrique de Soins Ambulatoires, Centre Mémoire de Ressources et de Recherche, Nice, France
| | - Eric Ettore
- Université Côte d'Azur, CoBTeK, Nice, France
- Université Côte d'Azur, Centre Hospitalier Universitaire de Nice, service Clinique Gériatrique de Soins Ambulatoires, Centre Mémoire de Ressources et de Recherche, Nice, France
- Association Innovation Alzheimer, Nice, France
| | - Philippe Robert
- Université Côte d'Azur, CoBTeK, Nice, France
- Association Innovation Alzheimer, Nice, France
| | - Guillaume Sacco
- Université Côte d'Azur, CoBTeK, Nice, France
- Université Côte d'Azur, Centre Hospitalier Universitaire de Nice, service Clinique Gériatrique de Soins Ambulatoires, Centre Mémoire de Ressources et de Recherche, Nice, France
- Association Innovation Alzheimer, Nice, France
- Univ Angers, Université de Nantes, LPPL, SFR CONFLUENCES, 49000, Angers, France
| | - Valeria Manera
- Université Côte d'Azur, CoBTeK, Nice, France
- Association Innovation Alzheimer, Nice, France
| | - Stephen Ramanoël
- Université Côte d'Azur, LAMHESS, Nice, France
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, 17 rue Moreau, 75012, Paris, France
|
25
|
Lapidow E, Bonawitz E. What's in the Box? Preschoolers Consider Ambiguity, Expected Value, and Information for Future Decisions in Explore-Exploit Tasks. Open Mind (Camb) 2023; 7:855-878. [PMID: 37946850 PMCID: PMC10631797 DOI: 10.1162/opmi_a_00110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Accepted: 09/16/2023] [Indexed: 11/12/2023] Open
Abstract
Self-directed exploration in childhood appears driven by a desire to resolve uncertainties in order to learn more about the world. However, in adult decision-making, the choice to explore new information rather than exploit what is already known takes many factors beyond uncertainty (such as expected utilities and costs) into account. The evidence for whether young children are sensitive to complex, contextual factors in making exploration decisions is limited and mixed. Here, we investigate whether modifying uncertain options influences explore-exploit behavior in preschool-aged children (48-68 months). Over the course of three experiments, we manipulate uncertain options' ambiguity, expected value, and potential to improve epistemic state for future exploration in a novel forced-choice design. We find evidence that young children are influenced by each of these factors, suggesting that early, self-directed exploration involves sophisticated, context-sensitive decision-making under uncertainty.
Affiliation(s)
- Elizabeth Lapidow
- Department of Psychology, University of Waterloo, Waterloo, ON, Canada
|
26
|
Brändle F, Stocks LJ, Tenenbaum JB, Gershman SJ, Schulz E. Empowerment contributes to exploration behaviour in a creative video game. Nat Hum Behav 2023; 7:1481-1489. [PMID: 37488401 DOI: 10.1038/s41562-023-01661-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2022] [Accepted: 06/15/2023] [Indexed: 07/26/2023]
Abstract
Studies of human exploration frequently cast people as serendipitously stumbling upon good options. Yet these studies may not capture the richness of exploration strategies that people exhibit in more complex environments. Here we study behaviour in a large dataset of 29,493 players of the richly structured online game 'Little Alchemy 2'. In this game, players start with four elements, which they can combine to create up to 720 complex objects. We find that players are driven not only by external reward signals, such as an attempt to produce successful outcomes, but also by an intrinsic motivation to create objects that empower them to create even more objects. We find that this drive for empowerment is eliminated when playing a game variant that lacks recognizable semantics, indicating that people use their knowledge about the world and its possibilities to guide their exploration. Our results suggest that the drive for empowerment may be a potent source of intrinsic motivation in richly structured domains, particularly those that lack explicit reward signals.
Affiliation(s)
| | - Lena J Stocks
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
| | - Joshua B Tenenbaum
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Samuel J Gershman
- Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA
| | - Eric Schulz
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
|
27
|
Zhuang W, Niebaum J, Munakata Y. Changes in adaptation to time horizons across development. Dev Psychol 2023; 59:1532-1542. [PMID: 37166865 PMCID: PMC10524449 DOI: 10.1037/dev0001529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract
When making decisions, the amount of time remaining matters. When time horizons are long, exploring unknown options can inform later decisions, but when time horizons are short, exploiting known options should be prioritized. While adults and adolescents adapt their exploration in this way, it is unclear when such adaptation emerges and how individuals behave when time horizons are ambiguous, as in many real-life situations. We examined these questions by having 5- to 6-year-olds (N = 43), 11- to 12-year-olds (N = 40), and adult college students (N = 49) in the United States complete a Simplified Horizons Task under short, long, and ambiguous time horizons. Adaptation to time horizons increased with age: older children and adults explored more when horizons were long than when short, and while some younger children adapted to time horizons, younger children overall did not show strong evidence of adapting. Under ambiguous horizons, older children and adults preferred to exploit over explore, while younger children did not show this preference. Thus, adaptation to time horizons is evident by ages 11-12 and may begin to emerge around 5-6 years, and children decrease their tendencies to explore under short and ambiguous time horizons with development. This developmental shift may lead to less learning but more adaptive decision making. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
Affiliation(s)
- Winnie Zhuang
- Department of Psychology and Center for Mind and Brain, University of California, Davis
- Department of Psychology, University of Colorado Boulder
| | - Jesse Niebaum
- Department of Psychology and Center for Mind and Brain, University of California, Davis
| | - Yuko Munakata
- Department of Psychology and Center for Mind and Brain, University of California, Davis
- Department of Psychology, University of Colorado Boulder
|
28
|
Sinclair AH, Wang YC, Adcock RA. Instructed motivational states bias reinforcement learning and memory formation. Proc Natl Acad Sci U S A 2023; 120:e2304881120. [PMID: 37490530 PMCID: PMC10401012 DOI: 10.1073/pnas.2304881120] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/06/2023] [Accepted: 06/19/2023] [Indexed: 07/27/2023] Open
Abstract
Motivation influences goals, decisions, and memory formation. Imperative motivation links urgent goals to actions, narrowing the focus of attention and memory. Conversely, interrogative motivation integrates goals over time and space, supporting rich memory encoding for flexible future use. We manipulated motivational states via cover stories for a reinforcement learning task: The imperative group imagined executing a museum heist, whereas the interrogative group imagined planning a future heist. Participants repeatedly chose among four doors, representing different museum rooms, to sample trial-unique paintings with variable rewards (later converted to bonus payments). The next day, participants performed a surprise memory test. Crucially, only the cover stories differed between the imperative and interrogative groups; the reinforcement learning task was identical, and all participants had the same expectations about how and when bonus payments would be awarded. In an initial sample and a preregistered replication, we demonstrated that imperative motivation increased exploitation during reinforcement learning. Conversely, interrogative motivation increased directed (but not random) exploration, despite the cost to participants' earnings. At test, the interrogative group was more accurate at recognizing paintings and recalling associated values. In the interrogative group, higher value paintings were more likely to be remembered; imperative motivation disrupted this effect of reward modulating memory. Overall, we demonstrate that a prelearning motivational manipulation can bias learning and memory, bearing implications for education, behavior change, clinical interventions, and communication.
Affiliation(s)
- Alyssa H. Sinclair
- Department of Psychology & Neuroscience, Duke University, Durham, NC 27710
| | - Yuxi C. Wang
- Department of Psychology & Neuroscience, Duke University, Durham, NC 27710
| | - R. Alison Adcock
- Department of Psychology & Neuroscience, Duke University, Durham, NC 27710
- Department of Psychiatry & Behavioral Sciences, Duke University, Durham, NC 27710
|
29
|
Campbell EM, Singh G, Claus ED, Witkiewitz K, Costa VD, Hogeveen J, Cavanagh JF. Electrophysiological Markers of Aberrant Cue-Specific Exploration in Hazardous Drinkers. COMPUTATIONAL PSYCHIATRY (CAMBRIDGE, MASS.) 2023; 7:47-59. [PMID: 38774639 PMCID: PMC11104413 DOI: 10.5334/cpsy.96] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2023] [Accepted: 06/28/2023] [Indexed: 05/24/2024]
Abstract
Background Hazardous drinking is associated with maladaptive alcohol-related decision-making. Existing studies have often focused on how participants learn to exploit familiar cues based on prior reinforcement, but little is known about the mechanisms that drive hazardous drinkers to explore novel alcohol cues when their value is not known. Methods We investigated exploration of novel alcohol and non-alcohol cues in hazardous drinkers (N = 27) and control participants (N = 26) during electroencephalography (EEG). A normative computational model with two free parameters was fit to estimate participants' weighting of the future value of exploration and immediate value of exploitation. Results Hazardous drinkers demonstrated increased exploration of novel alcohol cues, and conversely, increased probability of exploiting familiar alternatives instead of exploring novel non-alcohol cues. The motivation to explore novel alcohol stimuli in hazardous drinkers was driven by an elevated relative future valuation of uncertain alcohol cues. P3a predicted more exploratory decision policies driven by an enhanced relative future valuation of novel alcohol cues. P3b did not predict choice behavior, but computational parameter estimates suggested that hazardous drinkers with enhanced P3b to alcohol cues were likely to learn to exploit their immediate expected value. Conclusions Hazardous drinkers did not display atypical choice behavior, different P3a/P3b amplitudes, or computational estimates to novel non-alcohol cues-diverging from previous studies in addiction showing atypical generalized explore-exploit decisions with non-drug-related cues. These findings reveal that cue-specific neural computations may drive aberrant alcohol-related decision-making in hazardous drinkers-highlighting the importance of drug-relevant cues in studies of decision-making in addiction.
Affiliation(s)
- Ethan M. Campbell
- Department of Psychology & Psychology Clinical Neuroscience Center, University of New Mexico, US
| | - Garima Singh
- Department of Psychology & Psychology Clinical Neuroscience Center, University of New Mexico, US
| | - Eric D. Claus
- Department of Biobehavioral Health, Pennsylvania State University, US
| | - Katie Witkiewitz
- Department of Psychology & Psychology Clinical Neuroscience Center, University of New Mexico, US
| | - Vincent D. Costa
- Division of Neuroscience, Oregon National Primate Research Center, US
| | - Jeremy Hogeveen
- Department of Psychology & Psychology Clinical Neuroscience Center, University of New Mexico, US
| | - James F. Cavanagh
- Department of Psychology & Psychology Clinical Neuroscience Center, University of New Mexico, US
|
30
|
Chen CS, Mueller D, Knep E, Ebitz RB, Grissom NM. Dopamine and norepinephrine differentially mediate the exploration-exploitation tradeoff. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.09.523322. [PMID: 36711959 PMCID: PMC9881999 DOI: 10.1101/2023.01.09.523322] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/11/2023]
Abstract
The catecholamines dopamine (DA) and norepinephrine (NE) have been repeatedly implicated in neuropsychiatric vulnerability, in part via their roles in mediating the decision making processes. Although the two neuromodulators share a synthesis pathway and are co-activated under states of arousal, they engage in distinct circuits and roles in modulating neural activity across the brain. However, in the computational neuroscience literature, they have been assigned similar roles in modulating the latent cognitive processes of decision making, in particular the exploration-exploitation tradeoff. Revealing how each neuromodulator contributes to this explore-exploit process will be important in guiding mechanistic hypotheses emerging from computational psychiatric approaches. To understand the differences and overlaps of the roles of these two catecholamine systems in regulating exploration and exploitation, a direct comparison using the same dynamic decision making task is needed. Here, we ran mice in a restless two-armed bandit task, which encourages both exploration and exploitation. We systemically administered a nonselective DA receptor antagonist (flupenthixol), a nonselective DA receptor agonist (apomorphine), a NE beta-receptor antagonist (propranolol), and a NE beta-receptor agonist (isoproterenol), and examined changes in exploration within subjects across sessions. We found a bidirectional modulatory effect of dopamine receptor activity on the level of exploration. Increasing dopamine activity decreased exploration and decreasing dopamine activity increased exploration. Beta-noradrenergic receptor activity also modulated exploration, but the modulatory effect was mediated by sex. Reinforcement learning model parameters suggested that dopamine modulation affected exploration via decision noise and norepinephrine modulation affected exploration via outcome sensitivity. 
Together, these findings suggested that the mechanisms that govern the transition between exploration and exploitation are sensitive to changes in both catecholamine functions and revealed differential roles for NE and DA in mediating exploration.
|
31
|
Xiaobao P, Hongyu C, Horsey EM. The predictive effect of relative intuition on social entrepreneurship orientation: How do exploratory and exploitative learning and personal identity interact? Acta Psychol (Amst) 2023; 237:103951. [PMID: 37279622 DOI: 10.1016/j.actpsy.2023.103951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 05/24/2023] [Accepted: 05/24/2023] [Indexed: 06/08/2023] Open
Abstract
This study complements the stream of psychology studies on the effects of an individual's intuition on strategic decisions and how it shapes behavioral tendencies by extending how these effects evolve social entrepreneurship orientation in social entrepreneurship. Theoretically, we establish the nexus between relative intuition and social entrepreneurship orientation as well as the moderating roles of exploratory and exploitative learning and personal identity. Empirical validation of these nexuses was based on a cross-section of 276 certified social enterprises in China. The findings indicate that social entrepreneurs' relative intuition has a positive association with social entrepreneurship orientation. Exploratory and exploitative learning positively mediate the nexus between relative intuition and social entrepreneurship orientation. In addition, personal identity positively moderates the effects of exploratory and exploitative learning on social entrepreneurship orientation. Subsequently, we found that the link between relative intuition and social entrepreneurship orientation strengthens as the social entrepreneurs' personal identity increases. In this light, we identify relative intuition as the foundation of exploratory and exploitative learning for the development of social entrepreneurship orientation. Similarly, we shed light on how personal identity positively facilitates the roles of these factors by arousing dedication to the processes/stages of the pursuit of social entrepreneurship orientation goal attainment.
Affiliation(s)
- Peng Xiaobao
- School of Public Affairs, University of Science and Technology of China, Hefei, Anhui Province, China.
| | - Chen Hongyu
- School of Public Affairs, University of Science and Technology of China, Hefei, Anhui Province, China.
| | - Emmanuel Mensah Horsey
- School of Public Affairs, University of Science and Technology of China, Hefei, Anhui Province, China.
|
32
|
Rischall I, Hunter L, Jensen G, Gottlieb J. Inefficient prioritization of task-relevant attributes during instrumental information demand. Nat Commun 2023; 14:3174. [PMID: 37264004 DOI: 10.1038/s41467-023-38821-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Accepted: 05/17/2023] [Indexed: 06/03/2023] Open
Abstract
In natural settings, people evaluate complex multi-attribute situations and decide which attribute to request information about. Little is known about how people make this selection and, specifically, how they identify individual observations that best predict the value of a multi-attribute situation. Here we show that, in a simple task of information demand, participants inefficiently query attributes that have high individual value but are relatively uninformative about a total payoff. This inefficiency is robust in two instrumental conditions in which gathering less informative observations leads to significantly lower rewards. Across individuals, variations in the sensitivity to informativeness are associated with personality metrics, showing negative associations with extraversion and thrill seeking and positive associations with stress tolerance and need for cognition. Thus, people select informative queries using sub-optimal strategies that are associated with personality traits and influence consequential choices.
Affiliation(s)
- Isabella Rischall
- Department of Neuroscience, Columbia University, New York, NY, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Laura Hunter
- Department of Neuroscience, Columbia University, New York, NY, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Greg Jensen
- Department of Neuroscience, Columbia University, New York, NY, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Department of Psychology, Reed College, Portland, OR, USA
| | - Jacqueline Gottlieb
- Department of Neuroscience, Columbia University, New York, NY, USA.
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA.
- Kavli Institute for Brain Science, Columbia University, New York, NY, USA.
|
33
|
Dubourg E, Thouzeau V, de Dampierre C, Mogoutov A, Baumard N. Exploratory preferences explain the human fascination for imaginary worlds in fictional stories. Sci Rep 2023; 13:8657. [PMID: 37246187 DOI: 10.1038/s41598-023-35151-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/02/2022] [Accepted: 05/13/2023] [Indexed: 05/30/2023] Open
Abstract
Imaginary worlds are present and often central in many of the most culturally successful modern narrative fictions, be it in novels (e.g., Harry Potter), movies (e.g., Star Wars), video games (e.g., The Legend of Zelda), graphic novels (e.g., One Piece) and TV series (e.g., Game of Thrones). We propose that imaginary worlds are popular because they activate exploratory preferences that evolved to help us navigate the real world and find new fitness-relevant information. Therefore, we hypothesize that the attraction to imaginary worlds is intrinsically linked to the desire to explore novel environments and that both are influenced by the same underlying factors. Notably, the inter-individual and cross-cultural variability of the preference for imaginary worlds should follow the inter-individual and cross-cultural variability of exploratory preferences (with the personality trait Openness-to-experience, age, sex, and ecological conditions). We test these predictions with both experimental and computational methods. For experimental tests, we run a pre-registered online experiment about movie preferences (N = 230). For computational tests, we leverage two large cultural datasets, namely the Internet Movie Database (N = 9424 movies) and the Movie Personality Dataset (N = 3.5 million participants), and use machine-learning algorithms (i.e., random forest and topic modeling). In all, consistent with how the human preference for spatial exploration adaptively varies, we provide empirical evidence that imaginary worlds appeal more to more explorative people, people higher in Openness-to-experience, younger individuals, males, and individuals living in more affluent environments. We discuss the implications of these findings for our understanding of the cultural evolution of narrative fiction and, more broadly, the evolution of human exploratory preferences.
Affiliation(s)
- Edgar Dubourg
- Institut Jean Nicod, Département d'études cognitives, Ecole normale supérieure, Université PSL, EHESS, CNRS, Paris, France.
| | - Valentin Thouzeau
- Institut Jean Nicod, Département d'études cognitives, Ecole normale supérieure, Université PSL, EHESS, CNRS, Paris, France
| | - Charles de Dampierre
- Institut Jean Nicod, Département d'études cognitives, Ecole normale supérieure, Université PSL, EHESS, CNRS, Paris, France
| | - Andrei Mogoutov
- Institut Jean Nicod, Département d'études cognitives, Ecole normale supérieure, Université PSL, EHESS, CNRS, Paris, France
| | - Nicolas Baumard
- Institut Jean Nicod, Département d'études cognitives, Ecole normale supérieure, Université PSL, EHESS, CNRS, Paris, France
|
34
|
Shourkeshti A, Marrocco G, Jurewicz K, Moore T, Ebitz RB. Pupil size predicts the onset of exploration in brain and behavior. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.24.541981. [PMID: 37292773 PMCID: PMC10245915 DOI: 10.1101/2023.05.24.541981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
In uncertain environments, intelligent decision-makers exploit actions that have been rewarding in the past, but also explore actions that could be even better. Several neuromodulatory systems are implicated in exploration, based, in part, on work linking exploration to pupil size-a peripheral correlate of neuromodulatory tone and index of arousal. However, pupil size could instead track variables that make exploration more likely, like volatility or reward, without directly predicting either exploration or its neural bases. Here, we simultaneously measured pupil size, exploration, and neural population activity in the prefrontal cortex while two rhesus macaques explored and exploited in a dynamic environment. We found that pupil size under constant luminance specifically predicted the onset of exploration, beyond what could be explained by reward history. Pupil size also predicted disorganized patterns of prefrontal neural activity at both the single neuron and population levels, even within periods of exploitation. Ultimately, our results support a model in which pupil-linked mechanisms promote the onset of exploration via driving the prefrontal cortex through a critical tipping point where prefrontal control dynamics become disorganized and exploratory decisions are possible.
Affiliation(s)
- Akram Shourkeshti
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
| | - Gabriel Marrocco
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
| | - Katarzyna Jurewicz
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
- Department of Physiology, McGill University, Montréal, QC, Canada
| | - Tirin Moore
- Department of Neurobiology, Stanford University School of Medicine, Stanford, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - R. Becket Ebitz
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
|
35
|
Maith O, Baladron J, Einhäuser W, Hamker FH. Exploration behavior after reversals is predicted by STN-GPe synaptic plasticity in a basal ganglia model. iScience 2023; 26:106599. [PMID: 37250300 PMCID: PMC10214406 DOI: 10.1016/j.isci.2023.106599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 02/02/2023] [Accepted: 03/29/2023] [Indexed: 05/31/2023] Open
Abstract
Humans can quickly adapt their behavior to changes in the environment. Classical reversal learning tasks mainly measure how well participants can disengage from a previously successful behavior but not how alternative responses are explored. Here, we propose a novel 5-choice reversal learning task with alternating position-reward contingencies to study exploration behavior after a reversal. We compare human exploratory saccade behavior with a prediction obtained from a neuro-computational model of the basal ganglia. A new synaptic plasticity rule for learning the connectivity between the subthalamic nucleus (STN) and external globus pallidus (GPe) results in exploration biases to previously rewarded positions. The model simulations and human data both show that during experimental experience exploration becomes limited to only those positions that have been rewarded in the past. Our study demonstrates how quite complex behavior may result from a simple sub-circuit within the basal ganglia pathways.
Affiliation(s)
- Oliver Maith
- Department of Computer Science, Chemnitz University of Technology, Chemnitz, Germany
| | - Javier Baladron
- Department of Computer Science, Chemnitz University of Technology, Chemnitz, Germany
- Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Santiago, Chile
| | - Wolfgang Einhäuser
- Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany
| | - Fred H. Hamker
- Department of Computer Science, Chemnitz University of Technology, Chemnitz, Germany
|
36
|
Chernyshev BV, Pultsina KI, Tretyakova VD, Miasnikova AS, Prokofyev AO, Kozunova GL, Stroganova TA. Losses resulting from deliberate exploration trigger beta oscillations in frontal cortex. Front Neurosci 2023; 17:1152926. [PMID: 37250414 PMCID: PMC10211346 DOI: 10.3389/fnins.2023.1152926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Accepted: 04/24/2023] [Indexed: 05/31/2023] Open
Abstract
We examined the neural signature of directed exploration by contrasting MEG beta (16-30 Hz) power changes between disadvantageous and advantageous choices in the two-choice probabilistic reward task. We analyzed the choices made after the participants had learned the probabilistic contingency between choices and their outcomes, i.e., acquired the inner model of choice values. Therefore, rare disadvantageous choices might serve explorative, environment-probing purposes. The study brought two main findings. Firstly, decision making leading to disadvantageous choices took more time and evidenced greater large-scale suppression of beta oscillations than its advantageous alternative. Additional neural resources recruited during disadvantageous decisions strongly suggest their deliberately explorative nature. Secondly, the outcomes of disadvantageous and advantageous choices had a qualitatively different impact on feedback-related beta oscillations. After the disadvantageous choices, only losses (but not gains) were followed by late beta synchronization in frontal cortex. Our results are consistent with the role of frontal beta oscillations in the stabilization of neural representations for the selected behavioral rule when an explorative strategy conflicts with value-based behavior. Punishment for an explorative choice, being congruent with its low value in the reward history, is more likely to strengthen, through punishment-related beta oscillations, the representation of exploitative choices consistent with the inner utility model.
Affiliation(s)
- Boris V. Chernyshev
- Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
- Department of Higher Nervous Activity, Lomonosov Moscow State University, Moscow, Russia
- Department of Psychology, Higher School of Economics, Moscow, Russia
| | - Kristina I. Pultsina
- Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
| | - Vera D. Tretyakova
- Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
| | - Aleksandra S. Miasnikova
- Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
| | - Andrey O. Prokofyev
- Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
| | - Galina L. Kozunova
- Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
| | - Tatiana A. Stroganova
- Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
| |
|
37
|
Wang S, Gerken B, Wieland JR, Wilson RC, Fellous JM. The effects of time horizon and guided choices on explore-exploit decisions in rodents. Behav Neurosci 2023; 137:127-142. [PMID: 36633987 PMCID: PMC10787949 DOI: 10.1037/bne0000549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
Abstract
Humans and animals have to balance the need for exploring new options with exploiting known options that yield good outcomes. This tradeoff is known as the explore-exploit dilemma. To better understand the neural mechanisms underlying how humans and animals address the explore-exploit dilemma, a good animal behavioral model is critical. Most previous rodent explore-exploit studies used ethologically unrealistic operant boxes and reversal learning paradigms in which the decision to abandon a bad option is confounded by the need for exploring a novel option for information collection, making it difficult to separate different drives and heuristics for exploration. In this study, we investigated how rodents make explore-exploit decisions using a spatial navigation horizon task (Wilson et al., 2014) adapted to rats to address the above limitations. We compared the rats' performance to that of humans using identical measures. We showed that rats use prior information to effectively guide exploration. In addition, rats use information-driven directed exploration like humans, but the extent to which they explore has the opposite dependence on time horizon to that of humans. Moreover, we found that free choices and guided choices have different influences on exploration in rodents, a finding that has not yet been tested in humans. This study reveals that the explore-exploit spatial behavior of rats is more complex than previously thought.
|
38
|
Xu T, Singh K, Rajivan P. Personalized persuasion: Quantifying susceptibility to information exploitation in spear-phishing attacks. Applied Ergonomics 2023; 108:103908. [PMID: 36403509 DOI: 10.1016/j.apergo.2022.103908] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/09/2022] [Revised: 06/10/2022] [Accepted: 09/20/2022] [Indexed: 06/16/2023]
Abstract
Many cyberattacks begin with a malicious email message, known as spear phishing, targeted at unsuspecting victims. Although security technologies have improved significantly in recent years, spear phishing continues to be successful due to the bespoke nature of such attacks. Crafting such emails requires attackers to conduct careful research about their victims and collect personal information about them and their acquaintances. Despite the widespread nature of spear-phishing attacks, little is understood about the human factors behind them. This is particularly the case when considering the role of attack personalization on end-user vulnerability. To study spear-phishing attacks in the laboratory, we developed a simulation environment called SpearSim that simulates the tasks involved in the generation and reception of spear-phishing messages. Using SpearSim, we conducted a laboratory experiment with human subjects to study the effect of information availability and information exploitation on end-user vulnerability. The results of the experiment show that end-users in the high information-availability condition were 2.97 times more vulnerable to spear-phishing attacks than those in the low information-availability condition. We found that access to more personal information about targets can result in attacks involving contextually meaningful impersonation and narratives. We discuss the implications of this research for the design of anti-phishing training solutions.
Affiliation(s)
- Tianhao Xu
- University of Washington, Department of Industrial and System Engineering, United States
| | - Kuldeep Singh
- The University of Texas at El Paso, Department of Computer Science, United States
| | - Prashanth Rajivan
- University of Washington, Department of Industrial and System Engineering, United States.
| |
|
39
|
Lee JK, Rouault M, Wyart V. Adaptive tuning of human learning and choice variability to unexpected uncertainty. Science Advances 2023; 9:eadd0501. [PMID: 36989365 PMCID: PMC10058239 DOI: 10.1126/sciadv.add0501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 02/28/2023] [Indexed: 06/19/2023]
Abstract
Human value-based decisions are notably variable under uncertainty. This variability is known to arise from two distinct sources: variable choices aimed at exploring available options and imprecise learning of option values due to limited cognitive resources. However, whether these two sources of decision variability are tuned to their specific costs and benefits remains unclear. To address this question, we compared the effects of expected and unexpected uncertainty on decision-making in the same reinforcement learning task. Across two large behavioral datasets, we found that humans choose more variably between options but simultaneously learn less imprecisely their values in response to unexpected uncertainty. Using simulations of learning agents, we demonstrate that these opposite adjustments reflect adaptive tuning of exploration and learning precision to the structure of uncertainty. Together, these findings indicate that humans regulate not only how much they explore uncertain options but also how precisely they learn the values of these options.
Affiliation(s)
- Junseok K. Lee
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale (Inserm), Paris, France
- Département d’Études Cognitives, École Normale Supérieure, Université PSL, Paris, France
| | - Marion Rouault
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale (Inserm), Paris, France
- Département d’Études Cognitives, École Normale Supérieure, Université PSL, Paris, France
| | - Valentin Wyart
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale (Inserm), Paris, France
- Département d’Études Cognitives, École Normale Supérieure, Université PSL, Paris, France
- Institut du Psychotraumatisme de l’Enfant et de l’Adolescent, Conseil Départemental Yvelines et Hauts-de-Seine, Versailles, France
| |
|
40
|
Barnett WH, Kuznetsov A, Lapish CC. Distinct cortico-striatal compartments drive competition between adaptive and automatized behavior. PLoS One 2023; 18:e0279841. [PMID: 36943842 PMCID: PMC10030038 DOI: 10.1371/journal.pone.0279841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Accepted: 12/15/2022] [Indexed: 03/23/2023] Open
Abstract
Cortical and basal ganglia circuits play a crucial role in the formation of goal-directed and habitual behaviors. In this study, we investigate the cortico-striatal circuitry involved in learning and the role of this circuitry in the emergence of inflexible behaviors such as those observed in addiction. Specifically, we develop a computational model of cortico-striatal interactions that performs concurrent goal-directed and habit learning. The model accomplishes this by distinguishing learning processes in the dorsomedial striatum (DMS) that rely on reward prediction error signals as distinct from the dorsolateral striatum (DLS) where learning is supported by salience signals. These striatal subregions each operate on unique cortical input: the DMS receives input from the prefrontal cortex (PFC) which represents outcomes, and the DLS receives input from the premotor cortex which determines action selection. Following an initial learning of a two-alternative forced choice task, we subjected the model to reversal learning, reward devaluation, and learning a punished outcome. Behavior driven by stimulus-response associations in the DLS resisted goal-directed learning of new reward feedback rules despite devaluation or punishment, indicating the expression of habit. We repeated these simulations after the impairment of executive control, which was implemented as poor outcome representation in the PFC. The degraded executive control reduced the efficacy of goal-directed learning, and stimulus-response associations in the DLS were even more resistant to the learning of new reward feedback rules. In summary, this model describes how circuits of the dorsal striatum are dynamically engaged to control behavior and how the impairment of executive control by the PFC enhances inflexible behavior.
Affiliation(s)
- William H. Barnett
- Department of Psychology, Indiana University—Purdue University Indianapolis, Indianapolis, Indiana, United States of America
| | - Alexey Kuznetsov
- Department of Mathematics, Indiana University—Purdue University Indianapolis, Indianapolis, Indiana, United States of America
| | - Christopher C. Lapish
- Department of Psychology, Indiana University—Purdue University Indianapolis, Indianapolis, Indiana, United States of America
- Stark Neurosciences Research Institute, Indiana University—Purdue University Indianapolis, Indianapolis, Indiana, United States of America
| |
|
41
|
Lyu J, Yang H, Christie S. Mommy, Can I Play Outside? How Urban Design Influences Parental Attitudes on Play. International Journal of Environmental Research and Public Health 2023; 20:4909. [PMID: 36981816 PMCID: PMC10048976 DOI: 10.3390/ijerph20064909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 02/27/2023] [Accepted: 03/01/2023] [Indexed: 06/18/2023]
Abstract
Although play results in physical, social, and cognitive benefits, there is a consensus that children's opportunities to play have been reduced, particularly for those who live in urban environments. What are the barriers to play, and how can we mitigate them? This review examines a critical factor in play opportunities: parents as the decision-makers with regard to children's play. Using perspectives from psychology, urban design, and cognitive science, we analyze the relationships between the design of built environments, parental attitudes and beliefs, and parental decisions on allowing children to play. For example, can a new implementation of children-centered urban design change parents' skeptical attitude toward play? By drawing from global studies, we chart (1) the three key beliefs of parents regarding play and built environments: play should benefit learning, be safe, and match the child's competence and (2) the design principles that can foster these beliefs: learning, social, and progressive challenge designs. By making the link between parents, urban design, and play explicit, this paper aims to inform parents, educators, policymakers, urban planners, and architects on the evidence-based measures for creating and increasing opportunities to play.
Affiliation(s)
- Jinyun Lyu
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Department of Psychology, Tsinghua University, Beijing 100084, China
| | - Huiying Yang
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Department of Psychology, Tsinghua University, Beijing 100084, China
| | - Stella Christie
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Department of Psychology, Tsinghua University, Beijing 100084, China
| |
|
42
|
Cisler JM, Tamman AJF, Fonzo GA. Diminished prospective mental representations of reward mediate reward learning strategies among youth with internalizing symptoms. Psychol Med 2023; 53:1-11. [PMID: 36878892 PMCID: PMC10600826 DOI: 10.1017/s0033291723000478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 01/09/2023] [Accepted: 02/08/2023] [Indexed: 03/08/2023]
Abstract
BACKGROUND Adolescent internalizing symptoms and trauma exposure have been linked with altered reward learning processes and decreased ventral striatal responses to rewarding cues. Recent computational work on decision-making highlights an important role for prospective representations of the imagined outcomes of different choices. This study tested whether internalizing symptoms and trauma exposure among youth impact the generation of prospective reward representations during decision-making and potentially mediate altered behavioral strategies during reward learning. METHODS Sixty-one adolescent females with varying exposure to interpersonal violence (n = 31 with histories of physical or sexual assault) and severity of internalizing symptoms completed a social reward learning task during fMRI. Multivariate pattern analyses (MVPA) were used to decode neural reward representations at the time of choice. RESULTS MVPA demonstrated that rewarding outcomes could accurately be decoded within several large-scale distributed networks (e.g. frontoparietal and striatum networks), that these reward representations were reactivated prospectively at the time of choice in proportion to the expected probability of receiving reward, and that youth with behavioral strategies that favored exploiting high reward options demonstrated greater prospective generation of reward representations. Youth internalizing symptoms, but not trauma exposure characteristics, were negatively associated with both the behavioral strategy of exploiting high reward options and the prospective generation of reward representations in the striatum. CONCLUSIONS These data suggest diminished prospective mental simulation of reward as a mechanism of altered reward learning strategies among youth with internalizing symptoms.
Affiliation(s)
- Josh M. Cisler
- Department of Psychiatry and Behavioral Sciences, Dell Medical School, University of Texas at Austin, USA
- Institute for Early Life Adversity Research, Dell Medical School, University of Texas at Austin, USA
| | - Amanda J. F. Tamman
- Menninger Department of Psychiatry and Behavioral Sciences, Baylor College of Medicine, Houston, TX, USA
| | - Greg A. Fonzo
- Department of Psychiatry and Behavioral Sciences, Dell Medical School, University of Texas at Austin, USA
- Institute for Early Life Adversity Research, Dell Medical School, University of Texas at Austin, USA
- Center for Psychedelic Research and Therapy, Dell Medical School, University of Texas at Austin, USA
| |
|
43
|
Speers LJ, Bilkey DK. Maladaptive explore/exploit trade-offs in schizophrenia. Trends Neurosci 2023; 46:341-354. [PMID: 36878821 DOI: 10.1016/j.tins.2023.02.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/09/2022] [Revised: 01/30/2023] [Accepted: 02/08/2023] [Indexed: 03/07/2023]
Abstract
Schizophrenia is a complex disorder that remains poorly understood, particularly at the systems level. In this opinion article we argue that the explore/exploit trade-off concept provides a holistic and ecologically valid framework to resolve some of the apparent paradoxes that have emerged within schizophrenia research. We review recent evidence suggesting that fundamental explore/exploit behaviors may be maladaptive in schizophrenia during physical, visual, and cognitive foraging. We also describe how theories from the broader optimal foraging literature, such as the marginal value theorem (MVT), could provide valuable insight into how aberrant processing of reward, context, and cost/effort evaluations interact to produce maladaptive responses.
Affiliation(s)
- Lucinda J Speers
- Department of Psychology, University of Otago, Dunedin 9016, New Zealand
| | - David K Bilkey
- Department of Psychology, University of Otago, Dunedin 9016, New Zealand.
| |
|
44
|
Yang CY, Shiranthika C, Wang CY, Chen KW, Sumathipala S. Reinforcement learning strategies in cancer chemotherapy treatments: A review. Computer Methods and Programs in Biomedicine 2023; 229:107280. [PMID: 36529000 DOI: 10.1016/j.cmpb.2022.107280] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 01/31/2022] [Revised: 11/20/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
BACKGROUND AND OBJECTIVE Cancer is one of the major causes of death worldwide, and chemotherapies remain the most significant anti-cancer therapy in spite of the precision cancer medicines that have emerged in the last 2 decades. The growing interest in developing effective chemotherapy regimens with optimal drug dosing schedules to benefit clinical cancer patients has spawned innovative solutions involving mathematical modeling, since chemotherapy regimens are administered cyclically until futility or the occurrence of intolerable adverse events. Thus, in the present work, we reviewed the emerging trends involved in forming a computational solution from the perspective of reinforcement learning. METHODS Initially, this survey focused in depth on the details of dynamic treatment regimens from a broad perspective and then narrowed down to inspirations from reinforcement learning that were advantageous to chemotherapy dosing, including both offline reinforcement learning and supervised reinforcement learning. RESULTS The insights established in the chemotherapy-planning problem associated with Reinforcement Learning (RL) have been discussed in this study. It showed that researchers were able to widen their perspectives in comprehending the theoretical basis, dynamic treatment regimens (DTR), the use of adaptive control on DTR, and the associated RL techniques. CONCLUSIONS This study reviewed the recent research relevant to the topic, and highlighted the challenges, open questions, possible solutions, and future steps in inventing a realistic solution for the aforementioned problem.
Affiliation(s)
- Chan-Yun Yang
- Department of Electrical Engineering, National Taipei University, New Taipei City, Taiwan
| | - Chamani Shiranthika
- Department of Electrical Engineering, National Taipei University, New Taipei City, Taiwan
| | - Chung-Yih Wang
- Department of Radiation Oncology, Cheng Hsin General Hospital, Taipei City, Taiwan
| | - Kuo-Wei Chen
- Section of Hematology and Oncology, Department of Internal Medicine, Cheng Hsin General Hospital, Taipei City, Taiwan.
| | - Sagara Sumathipala
- Faculty of Information Technology, University of Moratuwa, Katubedda, Moratuwa, Sri Lanka
| |
|
45
|
Jahn CI, Grohn J, Cuell S, Emberton A, Bouret S, Walton ME, Kolling N, Sallet J. Neural responses in macaque prefrontal cortex are linked to strategic exploration. PLoS Biol 2023; 21:e3001985. [PMID: 36716348 PMCID: PMC9910800 DOI: 10.1371/journal.pbio.3001985] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/23/2022] [Revised: 02/09/2023] [Accepted: 01/03/2023] [Indexed: 02/01/2023] Open
Abstract
Humans have been shown to strategically explore. They can identify situations in which gathering information about distant and uncertain options is beneficial for the future. Because primates rely on scarce resources when they forage, they are also thought to strategically explore, but whether they use the same strategies as humans and the neural bases of strategic exploration in monkeys are largely unknown. We designed a sequential choice task to investigate whether monkeys mobilize strategic exploration based on whether information can improve subsequent choice, but also to ask the novel question about whether monkeys adjust their exploratory choices based on the contingency between choice and information, by sometimes providing the counterfactual feedback about the unchosen option. We show that monkeys decreased their reliance on expected value when exploration could be beneficial, but this was not mediated by changes in the effect of uncertainty on choices. We found strategic exploratory signals in anterior and mid-cingulate cortex (ACC/MCC) and dorsolateral prefrontal cortex (dlPFC). This network was most active when a low value option was chosen, which suggests a role in counteracting expected value signals, when exploration away from value should be considered. Such strategic exploration was abolished when the counterfactual feedback was available. Learning from counterfactual outcome was associated with the recruitment of a different circuit centered on the medial orbitofrontal cortex (OFC), where we showed that monkeys represent chosen and unchosen reward prediction errors. Overall, our study shows how ACC/MCC-dlPFC and OFC circuits together could support exploitation of available information to the fullest and drive behavior towards finding more information through exploration when it is beneficial.
Affiliation(s)
- Caroline I. Jahn
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Motivation, Brain and Behavior Team, Institut du Cerveau et de la Moelle Epinière, Paris, France
- Sorbonne Paris Cité universités, Université Paris Descartes, Frontières du Vivant, Paris, France
| | - Jan Grohn
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
| | - Steven Cuell
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
| | - Andrew Emberton
- Biomedical Science Services, University of Oxford, Oxford, United Kingdom
| | - Sebastien Bouret
- Motivation, Brain and Behavior Team, Institut du Cerveau et de la Moelle Epinière, Paris, France
| | - Mark E. Walton
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
| | - Nils Kolling
- Wellcome Centre for Integrative Neuroimaging, OBHA, University of Oxford, Headington, United Kingdom
- Univ Lyon, Université Lyon 1, Inserm, Stem Cell and Brain Research Institute U1208, Bron, France
| | - Jérôme Sallet
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Univ Lyon, Université Lyon 1, Inserm, Stem Cell and Brain Research Institute U1208, Bron, France
| |
|
46
|
Goudar V, Kim JW, Liu Y, Dede AJO, Jutras MJ, Skelin I, Ruvalcaba M, Chang W, Fairhall AL, Lin JJ, Knight RT, Buffalo EA, Wang XJ. Comparing rapid rule-learning strategies in humans and monkeys. bioRxiv 2023:2023.01.10.523416. [PMID: 36711889 PMCID: PMC9882042 DOI: 10.1101/2023.01.10.523416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Inter-species comparisons are key to deriving an understanding of the behavioral and neural correlates of human cognition from animal models. We perform a detailed comparison of macaque monkey and human strategies on an analogue of the Wisconsin Card Sort Test, a widely studied and applied multi-attribute measure of cognitive function, wherein performance requires the inference of a changing rule given ambiguous feedback. We found that well-trained monkeys rapidly infer rules but are three times slower than humans. Model fits to their choices revealed hidden states akin to feature-based attention in both species, and decision processes that resembled a Win-stay lose-shift strategy with key differences. Monkeys and humans test multiple rule hypotheses over a series of rule-search trials and perform inference-like computations to exclude candidates. An attention-set based learning stage categorization revealed that perseveration, random exploration and poor sensitivity to negative feedback explain the under-performance in monkeys.
Affiliation(s)
- Vishwa Goudar
- Center for Neural Science, New York University, NY, USA
| | - Jeong-Woo Kim
- Center for Neural Science, New York University, NY, USA
| | - Yue Liu
- Center for Neural Science, New York University, NY, USA
| | - Adam J. O. Dede
- Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA
| | - Michael J. Jutras
- Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA
| | - Ivan Skelin
- Department of Neurology, University of California, Davis, Davis, CA, USA
- The Center for Mind and Brain, University of California, Davis, Davis, CA, USA
| | - Michael Ruvalcaba
- Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA
| | - William Chang
- Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA
| | - Adrienne L. Fairhall
- Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA
| | - Jack J. Lin
- Department of Neurology, University of California, Davis, Davis, CA, USA
- The Center for Mind and Brain, University of California, Davis, Davis, CA, USA
| | - Robert T. Knight
- Department of Psychology, University of California Berkeley, Berkeley, CA, USA
- Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA
| | - Elizabeth A. Buffalo
- Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA
- Washington Primate Research Center, University of Washington, Seattle, WA, USA
| | | |
|
47
|
Trait somatic anxiety is associated with reduced directed exploration and underestimation of uncertainty. Nat Hum Behav 2023; 7:102-113. [PMID: 36192493 DOI: 10.1038/s41562-022-01455-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/22/2021] [Accepted: 08/26/2022] [Indexed: 02/01/2023]
Abstract
Anxiety has been related to decreased physical exploration, but past findings on the interaction between anxiety and exploration during decision making were inconclusive. Here we examined how latent factors of trait anxiety relate to different exploration strategies when facing volatility-induced uncertainty. Across two studies (total N = 985), we demonstrated that people used a hybrid of directed, random and undirected exploration strategies, which were respectively sensitive to relative uncertainty, total uncertainty and value difference. Trait somatic anxiety, that is, the propensity to experience physical symptoms of anxiety, was inversely correlated with directed exploration and undirected exploration, manifesting as a lesser likelihood for choosing the uncertain option and reducing choice stochasticity regardless of uncertainty. Somatic anxiety is also associated with underestimation of relative uncertainty. Together, these results reveal the selective role of trait somatic anxiety in modulating both uncertainty-driven and value-driven exploration strategies.
|
48
|
Rizk-Allah RM, Hagag EA, El-Fergany AA. Chaos-enhanced multi-objective tunicate swarm algorithm for economic-emission load dispatch problem. Soft Comput 2022. [DOI: 10.1007/s00500-022-07794-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/01/2023]
Abstract
Climate change and environmental protection have a significant impact on thermal plants. The main aims of the combined economic-emission dispatch (CEED) problem are therefore to reduce greenhouse gas emissions and fuel costs. Many approaches have demonstrated their efficacy in addressing the CEED problem. However, designing a robust algorithm capable of achieving the Pareto optimal solutions under the multimodality and non-convexity caused by valve ripple effects is a true challenge. In this paper, a chaos-enhanced multi-objective tunicate swarm algorithm (CMOTSA) is proposed for the CEED problem. To promote the exploration and exploitation abilities of the basic tunicate swarm algorithm (TSA), an exponential strategy based on the chaotic logistic map (ESCL) is incorporated. With ESCL, CMOTSA first favors diversification, searching different areas within the solution space, and then, as the iterative process progresses, gradually shifts to emphasize intensification. The efficacy of CMOTSA is confirmed by applying it to several multi-objective benchmark functions with different Pareto front characteristics, including convex, discrete, and non-convex. The inverted generational distance (IGD) and generational distance (GD) are employed to assess the robustness and quality of CMOTSA against some successful algorithms. Additionally, the computational time is evaluated; CMOTSA consumes less time for most functions. CMOTSA is then applied to a practical engineering problem, combined economic and emission dispatch (CEED) including the valve ripples. The methodology is validated using three different systems (IEEE 30-bus with 6 generators, a 10-unit system, and IEEE 118-bus with 14 generating units). For the large-scale case of the 118-bus system, the results of CMOTSA are 8741.3 $/h for the minimum cost and 2747.6 ton/h for the minimum emission, which compare favorably with other methods. The results obtained demonstrate that the proposed CMOTSA-based methodology is an efficient tool for CEED.
|
49
|
Grilli MD, Sheldon S. Autobiographical event memory and aging: older adults get the gist. Trends Cogn Sci 2022; 26:1079-1089. [PMID: 36195539 PMCID: PMC9669242 DOI: 10.1016/j.tics.2022.09.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/11/2022] [Revised: 08/15/2022] [Accepted: 09/07/2022] [Indexed: 01/12/2023]
Abstract
We propose that older adults' ability to retrieve episodic autobiographical events, although often viewed through a lens of decline, reveals much about what is preserved and prioritized in cognitive aging. Central to our proposal is the idea that the so-called gist of an autobiographical event is not only spared with normal aging but also well adapted to serve memory-guided behavior in older age. To support our proposal, we review cognitive and brain evidence indicating an age-related shift toward gist memory. We then discuss why this shift likely arises from more than age-related decline and instead partly reflects a natural, arguably adaptive, outcome of experience, motivation, and mode-of-thinking factors. Our proposal reveals an upside of age-related memory changes and identifies important research questions.
Affiliation(s)
- Matthew D Grilli
- Department of Psychology, The University of Arizona, Tucson, AZ 85721, USA.
- Signy Sheldon
- Department of Psychology, McGill University, Montreal, QC, H3A 1G1, Canada.
|
50
|
Gijsen S, Grundei M, Blankenburg F. Active inference and the two-step task. Sci Rep 2022; 12:17682. [PMID: 36271279 PMCID: PMC9586964 DOI: 10.1038/s41598-022-21766-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Accepted: 09/30/2022] [Indexed: 01/18/2023] Open
Abstract
Sequential decision problems distill important challenges frequently faced by humans. Through repeated interactions with an uncertain world, unknown statistics need to be learned while balancing exploration and exploitation. Reinforcement learning is a prominent method for modeling such behaviour, with a prevalent application being the two-step task. However, recent studies indicate that the standard reinforcement learning model sometimes describes features of human task behaviour inaccurately and incompletely. We investigated whether active inference, a framework proposing a trade-off to the exploration-exploitation dilemma, could better describe human behaviour. Therefore, we re-analysed four publicly available datasets of the two-step task, performed Bayesian model selection, and compared behavioural model predictions. Two datasets, which revealed more model-based inference and behaviour indicative of directed exploration, were better described by active inference, while the models scored similarly for the remaining datasets. Learning using probability distributions appears to contribute to the improved model fits. Further, approximately half of all participants showed sensitivity to information gain as formulated under active inference, although behavioural exploration effects were not fully captured. These results contribute to the empirical validation of active inference as a model of human behaviour and the study of alternative models for the influential two-step task.
Affiliation(s)
- Sam Gijsen
- Neurocomputation and Neuroimaging Unit, Freie Universität Berlin, 14195 Berlin, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, 10117 Berlin, Germany
- Miro Grundei
- Neurocomputation and Neuroimaging Unit, Freie Universität Berlin, 14195 Berlin, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, 10117 Berlin, Germany
- Felix Blankenburg
- Neurocomputation and Neuroimaging Unit, Freie Universität Berlin, 14195 Berlin, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, 10117 Berlin, Germany
|