1. Gershman SJ, Lak A. Policy complexity suppresses dopamine responses. bioRxiv [Preprint] 2024:2024.09.15.613150. PMID: 39345642; PMCID: PMC11429712; DOI: 10.1101/2024.09.15.613150.
Abstract
Limits on information processing capacity impose limits on task performance. We show that animals achieve performance on a perceptual decision task that is near-optimal given their capacity limits, as measured by policy complexity (the mutual information between states and actions). This behavioral profile could be achieved by reinforcement learning with a penalty on high complexity policies, realized through modulation of dopaminergic learning signals. In support of this hypothesis, we find that policy complexity suppresses midbrain dopamine responses to reward outcomes, thereby reducing behavioral sensitivity to these outcomes. Our results suggest that policy compression shapes basic mechanisms of reinforcement learning in the brain.
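Policy complexity, the key quantity in this abstract, is the mutual information I(S;A) between states and actions. As an illustrative sketch (not the authors' code; function and variable names are mine), it can be computed for a discrete policy as follows:

```python
import numpy as np

def policy_complexity(p_state, policy):
    """Policy complexity I(S;A) in bits: the mutual information between
    states and actions induced by a stochastic policy P(a|s)."""
    p_state = np.asarray(p_state, float)
    policy = np.asarray(policy, float)
    marginal = p_state @ policy                # P(a) = sum_s P(s) P(a|s)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = p_state[:, None] * policy * np.log2(policy / marginal)
    return float(np.nansum(np.where(policy > 0, terms, 0.0)))

uniform_states = np.array([0.5, 0.5])
deterministic = np.array([[1.0, 0.0],          # a different action in each state:
                          [0.0, 1.0]])         # maximally state-dependent
state_blind = np.array([[0.5, 0.5],            # same action distribution everywhere:
                        [0.5, 0.5]])           # zero complexity
```

A fully state-dependent policy over two equiprobable states carries 1 bit, while a state-blind policy carries 0 bits, which is the sense in which a capacity limit penalizes state-dependent responding.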
2. Paunov A, L’Hôtellier M, Guo D, He Z, Yu A, Meyniel F. Multiple and subject-specific roles of uncertainty in reward-guided decision-making. bioRxiv [Preprint] 2024:2024.03.27.587016. PMID: 38585958; PMCID: PMC10996615; DOI: 10.1101/2024.03.27.587016.
Abstract
Decision-making in noisy, changing, and partially observable environments entails a basic tradeoff between immediate reward and longer-term information gain, known as the exploration-exploitation dilemma. Computationally, an effective way to balance this tradeoff is to leverage uncertainty to guide exploration. Yet empirical findings in humans are mixed, ranging from uncertainty-seeking to indifference and avoidance. In a novel bandit task that better captures uncertainty-driven behavior, we find multiple roles for uncertainty in human choices. First, stable and psychologically meaningful individual differences in uncertainty preferences range from seeking to avoidance, which can manifest as null group-level effects. Second, uncertainty modulates the use of two basic decision heuristics that imperfectly exploit immediate rewards, a repetition bias and a win-stay-lose-shift rule, with heuristic choices favored under higher uncertainty. These results, highlighting the rich and varied structure of reward-based choice, are a step toward understanding its functional basis and its dysfunction in psychopathology.
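The win-stay-lose-shift heuristic named in the abstract has a standard minimal form; the sketch below is a generic textbook version, not the authors' implementation:

```python
import random

def wsls_choice(prev_choice, prev_reward, n_arms=2, rng=random):
    """Win-stay-lose-shift: repeat a rewarded choice, otherwise switch arms."""
    if prev_choice is None:                       # first trial: choose at random
        return rng.randrange(n_arms)
    if prev_reward:
        return prev_choice                        # win -> stay
    # lose -> shift to a different arm
    return rng.choice([a for a in range(n_arms) if a != prev_choice])
```

A repetition bias, the other heuristic discussed, would instead favor `prev_choice` regardless of the outcome; mixing such rules with a value-based policy is one way to model the uncertainty-dependent heuristic use reported here.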
Affiliation(s)
- Alexander Paunov
- INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France
- Institut de Neuromodulation, GHU Paris, Psychiatrie et Neurosciences, Centre Hospitalier Sainte-Anne, Pôle Hospitalo-Universitaire 15, Université Paris Cité, Paris, France
- Maëva L’Hôtellier
- INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France
- Dalin Guo
- Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
- Zoe He
- Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
- Angela Yu
- Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
- Centre for Cognitive Science & Hessian AI Center, Technical University of Darmstadt, Darmstadt, Germany
- Florent Meyniel
- INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France
- Institut de Neuromodulation, GHU Paris, Psychiatrie et Neurosciences, Centre Hospitalier Sainte-Anne, Pôle Hospitalo-Universitaire 15, Université Paris Cité, Paris, France
3. Fang Z, Zhao M, Xu T, Li Y, Xie H, Quan P, Geng H, Zhang RY. Individuals with anxiety and depression use atypical decision strategies in an uncertain world. eLife 2024; 13:RP93887. PMID: 39255007; PMCID: PMC11386953; DOI: 10.7554/elife.93887.
Abstract
Previous studies on reinforcement learning have identified three prominent phenomena: (1) individuals with anxiety or depression exhibit a reduced learning rate compared to healthy subjects; (2) learning rates may increase or decrease in environments with rapidly changing (i.e. volatile) or stable feedback conditions, a phenomenon termed learning rate adaptation; and (3) reduced learning rate adaptation is associated with several psychiatric disorders. In other words, multiple learning rate parameters are needed to account for behavioral differences across participant populations and volatility contexts in this flexible learning rate (FLR) model. Here, we propose an alternative explanation, suggesting that behavioral variation across participant populations and volatile contexts arises from the use of mixed decision strategies. To test this hypothesis, we constructed a mixture-of-strategies (MOS) model and used it to analyze the behaviors of 54 healthy controls and 32 patients with anxiety and depression in volatile reversal learning tasks. Compared to the FLR model, the MOS model can reproduce the three classic phenomena by using a single set of strategy preference parameters without introducing any learning rate differences. In addition, the MOS model can successfully account for several novel behavioral patterns that cannot be explained by the FLR model. Preferences for different strategies also predict individual variations in symptom severity. These findings underscore the importance of considering mixed strategy use in human learning and decision-making and suggest atypical strategy preference as a potential mechanism for learning deficits in psychiatric disorders.
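The mixture-of-strategies idea can be illustrated with a toy blend of strategy-specific choice distributions; the strategy set and weights below are hypothetical stand-ins, not the authors' MOS parameterization:

```python
import numpy as np

def mos_choice_probs(p_reward, magnitude, prev_choice, w):
    """Blend strategy-specific choice distributions with preference weights.

    w = (w_eu, w_mag, w_habit): preferences for an expected-utility strategy,
    a magnitude-only strategy, and a habitual repeat-last-choice strategy;
    the weights should sum to 1.
    """
    eu = p_reward * magnitude                     # expected utility per option
    p_eu = eu / eu.sum()                          # expected-utility strategy
    p_mag = magnitude / magnitude.sum()           # magnitude-only strategy
    p_hab = np.zeros_like(p_eu)                   # habit strategy:
    p_hab[prev_choice] = 1.0                      # repeat the previous choice
    return w[0] * p_eu + w[1] * p_mag + w[2] * p_hab
```

Shifts in the weights `w` across individuals or volatility contexts can then mimic apparent learning-rate differences without any learning-rate parameter changing, which is the alternative explanation the abstract proposes.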
Affiliation(s)
- Zeming Fang
- Shanghai Mental Health Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- School of Psychology, Shanghai Jiao Tong University, Shanghai, China
- Meihua Zhao
- School of Psychology, South China Normal University, Guangzhou, China
- Ting Xu
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Yuhang Li
- Centre for Cognitive and Brain Sciences, Institute of Collaborative Innovation, University of Macau, Macau, China
- Hanbo Xie
- Department of Psychology, University of Arizona, Tucson, United States
- Peng Quan
- School of Humanities and Management, Guangdong Medical University, Dongguan, China
- Haiyang Geng
- Tianqiao and Chrissy Chen Institute for Translational Research, Shanghai, China
- Ru-Yuan Zhang
- Shanghai Mental Health Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- School of Psychology, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Key Laboratory of Mental Health and Psychological Crisis Intervention, School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
4. Arumugam D, Ho MK, Goodman ND, Van Roy B. Bayesian Reinforcement Learning With Limited Cognitive Load. Open Mind (Camb) 2024; 8:395-438. PMID: 38665544; PMCID: PMC11045037; DOI: 10.1162/opmi_a_00132.
Abstract
All biological and artificial agents must act given limits on their ability to acquire and process information. As such, a general theory of adaptive behavior should be able to account for the complex interactions between an agent's learning history, decisions, and capacity constraints. Recent work in computer science has begun to clarify the principles that shape these dynamics by bridging ideas from reinforcement learning, Bayesian decision-making, and rate-distortion theory. This body of work provides an account of capacity-limited Bayesian reinforcement learning, a unifying normative framework for modeling the effect of processing constraints on learning and action selection. Here, we provide an accessible review of recent algorithms and theoretical results in this setting, paying special attention to how these ideas can be applied to studying questions in the cognitive and behavioral sciences.
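A core computational object in the rate-distortion strand of this literature is the capacity-limited policy, typically computed with Blahut-Arimoto-style alternating updates. A minimal sketch under assumed inputs (the reward matrix, state distribution, and β are illustrative, and this is one generic formulation rather than the review's algorithms):

```python
import numpy as np

def blahut_arimoto_policy(p_s, R, beta, n_iter=200):
    """Alternating updates for a reward/complexity tradeoff.

    Iterates pi(a|s) proportional to P(a) * exp(beta * R[s, a]) with
    P(a) = sum_s p_s[s] * pi(a|s), converging toward the policy that
    maximizes E[R] - (1/beta) * I(S;A).
    """
    n_s, n_a = R.shape
    p_a = np.full(n_a, 1.0 / n_a)                    # initial action marginal
    for _ in range(n_iter):
        logits = beta * R + np.log(p_a)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        policy = np.exp(logits)
        policy /= policy.sum(axis=1, keepdims=True)
        p_a = p_s @ policy                           # refit the marginal
    return policy

p_s = np.full(2, 0.5)
R = np.eye(2)                    # each state rewards a distinct action
greedy = blahut_arimoto_policy(p_s, R, beta=10.0)   # high capacity
soft = blahut_arimoto_policy(p_s, R, beta=0.01)     # severe capacity limit
```

Large β recovers a nearly deterministic, reward-maximizing policy; small β collapses toward a state-independent (low-complexity) policy.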
Affiliation(s)
- Mark K. Ho
- Center for Data Science, New York University
- Noah D. Goodman
- Department of Computer Science, Stanford University
- Department of Psychology, Stanford University
- Benjamin Van Roy
- Department of Electrical Engineering, Stanford University
- Department of Management Science & Engineering, Stanford University
5. Lai L, Gershman SJ. Human decision making balances reward maximization and policy compression. PLoS Comput Biol 2024; 20:e1012057. PMID: 38669280; PMCID: PMC11078408; DOI: 10.1371/journal.pcbi.1012057.
Abstract
Policy compression is a computational framework that describes how capacity-limited agents trade reward for simpler action policies to reduce cognitive cost. In this study, we present behavioral evidence that humans prefer simpler policies, as predicted by a capacity-limited reinforcement learning model. Across a set of tasks, we find that people exploit structure in the relationships between states, actions, and rewards to "compress" their policies. In particular, compressed policies are systematically biased towards actions with high marginal probability, thereby discarding some state information. This bias is greater when there is redundancy in the reward-maximizing action policy across states, and increases with memory load. These results could not be explained qualitatively or quantitatively by models that did not make use of policy compression under a capacity limit. We also confirmed the prediction that time pressure should further reduce policy complexity and increase action bias, based on the hypothesis that actions are selected via time-dependent decoding of a compressed code. These findings contribute to a deeper understanding of how humans adapt their decision-making strategies under cognitive resource constraints.
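The bias toward actions with high marginal probability follows from compressed policies of the form π(a|s) ∝ P(a)·exp(βQ(s,a)). A toy illustration (the Q-values, marginal, and β are mine, not from the paper) of how redundancy in the optimal policy pulls choices toward the globally common action:

```python
import numpy as np

def compressed_policy(Q, p_a, beta):
    """Compressed-policy rule: pi(a|s) proportional to P(a) * exp(beta * Q)."""
    logits = beta * Q + np.log(p_a)
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

# Three states; states 0 and 1 both reward action 0, state 2 rewards action 1,
# so the reward-maximizing policy is redundant across states 0 and 1.
Q = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
p_a = np.array([2 / 3, 1 / 3])       # action marginal under the optimal policy
pi = compressed_policy(Q, p_a, beta=1.0)
```

Under compression, the rare context (state 2) chooses its correct action less reliably than the common contexts choose theirs: the policy discards some state information in favor of the high-marginal-probability action.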
Affiliation(s)
- Lucy Lai
- Program in Neuroscience, Harvard University, Cambridge, Massachusetts, United States of America
- Theoretical Sciences Visiting Program, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa, Japan
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
6. Webb J, Steffan P, Hayden BY, Lee D, Kemere C, McGinley M. Foraging Under Uncertainty Follows the Marginal Value Theorem with Bayesian Updating of Environment Representations. bioRxiv [Preprint] 2024:2024.03.30.587253. PMID: 38585964; PMCID: PMC10996644; DOI: 10.1101/2024.03.30.587253.
Abstract
Foraging theory has been a remarkably successful approach to understanding the behavior of animals in many contexts. In patch-based foraging contexts, the marginal value theorem (MVT) shows that the optimal strategy is to leave a patch when the marginal rate of return declines to the average for the environment. However, the MVT is only valid in deterministic environments whose statistics are known to the forager; naturalistic environments seldom meet these strict requirements. As a result, the strategies used by foragers in naturalistic environments must be empirically investigated. We developed a novel behavioral task and a corresponding computational framework for studying patch-leaving decisions in head-fixed and freely moving mice. We varied between-patch travel time, as well as within-patch reward depletion rate, both deterministically and stochastically. We found that mice adopt patch residence times in a manner consistent with the MVT and not explainable by simple ethologically motivated heuristic strategies. Critically, behavior was best accounted for by a modified form of the MVT wherein environment representations were updated based on local variations in reward timing, captured by a Bayesian estimator and dynamic prior. Thus, we show that mice can strategically attend to, learn from, and exploit task structure on multiple timescales simultaneously, thereby efficiently foraging in volatile environments. The results provide a foundation for applying the systems neuroscience toolkit in freely moving and head-fixed mice to understand the neural basis of foraging under uncertainty.
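The MVT leave rule, depart when the marginal intake rate falls to the environment's average rate, can be checked numerically for a toy depleting patch (the exponential gain function and all parameter values here are hypothetical, not the task's):

```python
import numpy as np

A, tau, travel = 10.0, 2.0, 4.0          # patch asymptote, depletion rate, travel time

def gain(t):
    """Cumulative reward after residence time t: diminishing returns."""
    return A * (1 - np.exp(-t / tau))

t = np.linspace(0.01, 20, 20000)
rate = gain(t) / (t + travel)            # long-run reward rate for residence t
t_opt = t[np.argmax(rate)]               # optimal patch residence time

# MVT check: the marginal (instantaneous) rate at t_opt should equal the
# maximized long-run average rate.
marginal = (gain(t_opt + 1e-4) - gain(t_opt - 1e-4)) / 2e-4
```

Increasing `travel` raises `t_opt`, the classic MVT prediction that longer travel times produce longer patch residences.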
Affiliation(s)
- James Webb
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, USA
- Paul Steffan
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Benjamin Y. Hayden
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Daeyeol Lee
- The Zanvyl Krieger Mind/Brain Institute, The Solomon H Snyder Department of Neuroscience, Department of Psychological and Brain Sciences, Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, USA
- Caleb Kemere
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Matthew McGinley
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
7. Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024; 20:e1011950. PMID: 38552190; PMCID: PMC10980507; DOI: 10.1371/journal.pcbi.1011950.
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions. In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
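A common way to capture action bias and hysteresis alongside learned values is a logistic choice model with an intercept (bias) and choice-history regressors (hysteresis). The sketch below is one generic formulation with made-up coefficients, not the authors' model:

```python
import numpy as np

def p_choose_right(q_diff, prev_choices, beta=3.0, bias=0.5, kappa=(0.8, 0.4)):
    """P(choose right) from values plus action bias and hysteresis.

    q_diff: Q(right) - Q(left) from a reinforcement-learning module.
    prev_choices: recent choices coded +1 (right) / -1 (left), most recent first.
    bias: static preference for 'right'.
    kappa: per-lag hysteresis weights; positive values produce repetition,
           negative values produce alternation.
    """
    drive = beta * q_diff + bias
    drive += sum(k * c for k, c in zip(kappa, prev_choices))
    return 1.0 / (1.0 + np.exp(-drive))
```

Fitting `bias` and `kappa` per participant is one way to expose the individual differences in repetition versus alternation tendencies described here, with negative `kappa` giving the alternation pattern the authors report as more common.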
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
8. Valentin S, Kleinegesse S, Bramley NR, Seriès P, Gutmann MU, Lucas CG. Designing optimal behavioral experiments using machine learning. eLife 2024; 13:e86224. PMID: 38261382; PMCID: PMC10805374; DOI: 10.7554/elife.86224.
Abstract
Computational models are powerful tools for understanding human cognition and behavior. They let us express our theories clearly and precisely and offer predictions that can be subtle and often counter-intuitive. However, this same richness and capacity to surprise means that our scientific intuitions and traditional tools are ill-suited to designing experiments that test and compare these models. To avoid these pitfalls and realize the full potential of computational modeling, we require tools to design experiments that provide clear answers about which models explain human behavior and the auxiliary assumptions those models must make. Bayesian optimal experimental design (BOED) formalizes the search for optimal experimental designs by identifying experiments that are expected to yield informative data. In this work, we provide a tutorial on leveraging recent advances in BOED and machine learning to find optimal experiments for any model from which we can simulate data, and show how by-products of this procedure allow for quick and straightforward evaluation of models and their parameters against real experimental data. As a case study, we consider theories of how people balance exploration and exploitation in multi-armed bandit decision-making tasks. We validate the presented approach using simulations and a real-world experiment. Compared to experimental designs commonly used in the literature, our optimal designs more efficiently determine which of a set of models best accounts for individual human behavior, and more efficiently characterize behavior given a preferred model. At the same time, formalizing a scientific question such that it can be adequately addressed with BOED can be challenging, and we discuss several potential caveats and pitfalls that practitioners should be aware of. We provide code to replicate all analyses, as well as tutorial notebooks and pointers for adapting the methodology to different experimental settings.
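At the heart of BOED is scoring each candidate design by its expected information gain: the mutual information between the model indicator and the data the design would produce. For a single binary outcome and a uniform prior over models this can be computed exactly; the two "models" and designs below are hypothetical, and the function name is mine:

```python
import numpy as np

def eig(design_p_y):
    """Expected information gain (bits) about the model from one binary
    observation, assuming a uniform prior over models.

    design_p_y: per-model probability of observing y = 1 under this design.
    """
    p_y1 = np.asarray(design_p_y, float)
    prior = np.full(len(p_y1), 1.0 / len(p_y1))
    mi = 0.0
    for y_prob in (p_y1, 1 - p_y1):              # outcomes y = 1 and y = 0
        marg = prior @ y_prob                    # marginal probability of y
        with np.errstate(divide="ignore", invalid="ignore"):
            term = prior * y_prob * np.log2(y_prob / marg)
        mi += np.nansum(term)
    return mi

design_A = [0.9, 0.1]    # the two models predict very different outcomes
design_B = [0.5, 0.5]    # the two models predict identical outcomes
```

An optimizer would then search the design space for the maximizer of this score; in practice the mutual information is estimated with simulation-based (e.g., neural) bounds rather than computed exactly, which is where the machine-learning advances in this tutorial come in.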
Affiliation(s)
- Simon Valentin
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Neil R Bramley
- Department of Psychology, University of Edinburgh, Edinburgh, United Kingdom
- Peggy Seriès
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Michael U Gutmann
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
9. Futrell R. An Information-Theoretic Account of Availability Effects in Language Production. Top Cogn Sci 2024; 16:38-53. PMID: 38145974; DOI: 10.1111/tops.12716.
Abstract
I present a computational-level model of language production in terms of a combination of information theory and control theory in which words are chosen incrementally in order to maximize communicative value subject to an information-theoretic capacity constraint. The theory generally predicts a tradeoff between ease of production and communicative accuracy. I apply the theory to two cases of apparent availability effects in language production, in which words are selected on the basis of their accessibility to a speaker who has not yet perfectly planned the rest of the utterance. Using corpus data on English relative clause complementizer dropping and experimental data on Mandarin noun classifier choice, I show that the theory reproduces the observed phenomena, providing an alternative account to Uniform Information Density and a promising general model of language production which is tightly linked to emerging theories in computational neuroscience.
Affiliation(s)
- Richard Futrell
- Department of Language Science, University of California, Irvine
10. Futrell R. Information-theoretic principles in incremental language production. Proc Natl Acad Sci U S A 2023; 120:e2220593120. PMID: 37725652; PMCID: PMC10523564; DOI: 10.1073/pnas.2220593120.
Abstract
I apply a recently emerging perspective on the complexity of action selection, the rate-distortion theory of control, to provide a computational-level model of errors and difficulties in human language production, which is grounded in information theory and control theory. Language production is cast as the sequential selection of actions to achieve a communicative goal subject to a capacity constraint on cognitive control. In a series of calculations, simulations, corpus analyses, and comparisons to experimental data, I show that the model directly predicts some of the major known qualitative and quantitative phenomena in language production, including semantic interference and predictability effects in word choice; accessibility-based ("easy-first") production preferences in word order alternations; and the existence and distribution of disfluencies including filled pauses, corrections, and false starts. I connect the rate-distortion view to existing models of human language production, to probabilistic models of semantics and pragmatics, and to proposals for controlled language generation in the machine learning and reinforcement learning literature.
Affiliation(s)
- Richard Futrell
- Department of Language Science, University of California, Irvine, CA 92617
11. Gong T, Gerstenberg T, Mayrhofer R, Bramley NR. Active causal structure learning in continuous time. Cogn Psychol 2023; 140:101542. PMID: 36586246; DOI: 10.1016/j.cogpsych.2022.101542.
Abstract
Research on causal cognition has largely focused on learning and reasoning about contingency data aggregated across discrete observations or experiments. However, this setting represents only the tip of the causal cognition iceberg. A more general problem lurking beneath is that of learning the latent causal structure that connects events and actions as they unfold in continuous time. In this paper, we examine how people actively learn about causal structure in a continuous-time setting, focusing on when and where they intervene and how this shapes their learning. Across two experiments, we find that participants' accuracy depends on both the informativeness and evidential complexity of the data they generate. Moreover, participants' intervention choices strike a balance between maximizing expected information and minimizing inferential complexity. People time and target their interventions to create simple yet informative causal dynamics. We discuss how the continuous-time setting challenges existing computational accounts of active causal learning, and argue that metacognitive awareness of one's inferential limitations plays a critical role for successful learning in the wild.
Affiliation(s)
- Tianwei Gong
- Department of Psychology, University of Edinburgh, United Kingdom
- Ralf Mayrhofer
- Department of Psychology, University of Göttingen, Germany
- Neil R Bramley
- Department of Psychology, University of Edinburgh, United Kingdom
12. Bari BA, Gershman SJ. Undermatching Is a Consequence of Policy Compression. J Neurosci 2023; 43:447-457. PMID: 36639891; PMCID: PMC9864556; DOI: 10.1523/jneurosci.1003-22.2022.
Abstract
The matching law describes the tendency of agents to match the ratio of choices allocated to the ratio of rewards received when choosing among multiple options (Herrnstein, 1961). Perfect matching, however, is infrequently observed. Instead, agents tend to undermatch or bias choices toward the poorer option. Overmatching, or the tendency to bias choices toward the richer option, is rarely observed. Despite the ubiquity of undermatching, it has received an inadequate normative justification. Here, we assume agents not only seek to maximize reward, but also seek to minimize cognitive cost, which we formalize as policy complexity (the mutual information between actions and states of the environment). Policy complexity measures the extent to which the policy of an agent is state dependent. Our theory states that capacity-constrained agents (i.e., agents that must compress their policies to reduce complexity) can only undermatch or perfectly match, but not overmatch, consistent with the empirical evidence. Moreover, using mouse behavioral data (male), we validate a novel prediction about which task conditions exaggerate undermatching. Finally, in patients with Parkinson's disease (male and female), we argue that a reduction in undermatching with higher dopamine levels is consistent with an increased policy complexity.

SIGNIFICANCE STATEMENT: The matching law describes the tendency of agents to match the ratio of choices allocated to different options to the ratio of reward received. For example, if option a yields twice as much reward as option b, matching states that agents will choose option a twice as much. However, agents typically undermatch: they choose the poorer option more frequently than expected. Here, we assume that agents seek to simultaneously maximize reward and minimize the complexity of their action policies. We show that this theory explains when and why undermatching occurs. Neurally, we show that policy complexity, and by extension undermatching, is controlled by tonic dopamine, consistent with other evidence that dopamine plays an important role in cognitive resource allocation.
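Undermatching is conventionally quantified with the generalized matching law, log(c1/c2) = s·log(r1/r2) + log b, where a sensitivity exponent s below 1 indicates undermatching and s = 1 is perfect matching. A minimal numeric illustration (the exponent value is arbitrary, chosen only to show the direction of the effect):

```python
def choice_ratio(reward_ratio, sensitivity):
    """Generalized matching law with unit bias: c1/c2 = (r1/r2) ** s.

    sensitivity s = 1 is perfect matching; 0 < s < 1 is undermatching
    (choices biased toward the poorer option relative to matching).
    """
    return reward_ratio ** sensitivity

rr = 4.0                               # option 1 earns 4x the reward of option 2
perfect = choice_ratio(rr, 1.0)        # choice ratio matches the 4:1 reward ratio
under = choice_ratio(rr, 0.6)          # compressed choice ratio, closer to 1:1
```

The paper's claim can be read in these terms: a capacity limit on policy complexity bounds s at or below 1, so compressed policies can undermatch or match but never overmatch (s > 1).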
Affiliation(s)
- Bilal A Bari
- Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts 02114
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138
- Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
13. Dubois M, Bowler A, Moses-Payne ME, Habicht J, Moran R, Steinbeis N, Hauser TU. Exploration heuristics decrease during youth. Cogn Affect Behav Neurosci 2022; 22:969-983. PMID: 35589910; PMCID: PMC9458685; DOI: 10.3758/s13415-022-01009-9.
Abstract
Deciding between exploring new avenues and exploiting known choices is central to learning, and this exploration-exploitation trade-off changes during development. Exploration is not a unitary concept, and humans deploy multiple distinct mechanisms, but little is known about their specific emergence during development. Using a previously validated task in adults, changes in exploration mechanisms were investigated between childhood (8-9 y/o, N = 26; 16 females), early (12-13 y/o, N = 38; 21 females), and late adolescence (16-17 y/o, N = 33; 19 females) in ethnically and socially diverse schools from disadvantaged areas. We find an increased usage of a computationally light exploration heuristic in younger groups, effectively accommodating their limited neurocognitive resources. Moreover, this heuristic was associated with self-reported, attention-deficit/hyperactivity disorder symptoms in this population-based sample. This study enriches our mechanistic understanding about how exploration strategies mature during development.
Affiliation(s)
- Magda Dubois
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, WC1B 5EH, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG, London, UK
- Aislinn Bowler
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, WC1B 5EH, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG, London, UK
- Centre for Brain and Cognitive Development, Birkbeck, University of London, WC1E 7HX, London, UK
- Madeleine E Moses-Payne
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, WC1B 5EH, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG, London, UK
- UCL Institute of Cognitive Neuroscience, WC1N 3AZ, London, UK
- Johanna Habicht
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, WC1B 5EH, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG, London, UK
- Rani Moran
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, WC1B 5EH, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG, London, UK
- Nikolaus Steinbeis
- Division of Psychology and Language Sciences, University College London, WC1H 0AP, London, UK
- Tobias U Hauser
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, WC1B 5EH, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG, London, UK
14
|
Beron CC, Neufeld SQ, Linderman SW, Sabatini BL. Mice exhibit stochastic and efficient action switching during probabilistic decision making. Proc Natl Acad Sci U S A 2022; 119:e2113961119. [PMID: 35385355] [PMCID: PMC9169659] [DOI: 10.1073/pnas.2113961119] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/28/2021] [Accepted: 03/03/2022] [Indexed: 12/05/2022]
Abstract
In probabilistic and nonstationary environments, individuals must use internal and external cues to flexibly make decisions that lead to desirable outcomes. To gain insight into the process by which animals choose between actions, we trained mice in a task with time-varying reward probabilities. In our implementation of such a two-armed bandit task, thirsty mice use information about recent action and action–outcome histories to choose between two ports that deliver water probabilistically. Here we comprehensively modeled choice behavior in this task, including the trial-to-trial changes in port selection, i.e., action switching behavior. We find that mouse behavior is, at times, deterministic and, at others, apparently stochastic. The behavior deviates from that of a theoretically optimal agent performing Bayesian inference in a hidden Markov model (HMM). We formulate a set of models based on logistic regression, reinforcement learning, and sticky Bayesian inference that we demonstrate are mathematically equivalent and that accurately describe mouse behavior. The switching behavior of mice in the task is captured in each model by a stochastic action policy, a history-dependent representation of action value, and a tendency to repeat actions despite incoming evidence. The models parsimoniously capture behavior across different environmental conditions by varying the stickiness parameter, and like the mice, they achieve nearly maximal reward rates. These results indicate that mouse behavior reaches near-maximal performance with reduced action switching and can be described by a set of equivalent models with a small number of relatively fixed parameters.
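The sticky Bayesian account the abstract describes can be illustrated with a minimal sketch: a Bayesian observer tracks which port is currently the high-reward port under a hidden Markov world model, and a stickiness bonus biases choice toward repetition. The reward probabilities, hazard rate, and stickiness value below are illustrative assumptions, not the authors' fitted parameters.

```python
# Sketch of a Bayesian hidden-Markov observer for a two-armed bandit with a
# "sticky" choice rule. Parameter values (p_high, p_low, hazard, stickiness)
# are made up for illustration.

def update_belief(belief, choice, reward, p_high=0.8, p_low=0.2, hazard=0.05):
    """belief = P(left port is currently the high-reward port).
    First apply the hidden-state transition (hazard of a block switch),
    then condition on the observed choice/reward outcome via Bayes' rule."""
    prior = belief * (1 - hazard) + (1 - belief) * hazard
    # Likelihood of the outcome under each hidden state.
    if choice == "left":
        lik_left_high = p_high if reward else 1 - p_high
        lik_right_high = p_low if reward else 1 - p_low
    else:
        lik_left_high = p_low if reward else 1 - p_low
        lik_right_high = p_high if reward else 1 - p_high
    post = prior * lik_left_high
    norm = post + (1 - prior) * lik_right_high
    return post / norm

def choose(belief, prev_choice, stickiness=0.1):
    """Pick the port with the higher posterior value, biased toward repeating
    the previous choice by a fixed stickiness bonus."""
    value_left = belief + (stickiness if prev_choice == "left" else 0.0)
    value_right = (1 - belief) + (stickiness if prev_choice == "right" else 0.0)
    return "left" if value_left >= value_right else "right"
```

Repeated rewarded left choices push the belief toward the left port, while the stickiness bonus delays switching until the evidence against the current port is strong, which is the qualitative signature the paper's equivalent models share.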
Affiliation(s)
- Celia C. Beron
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115
- HHMI, Harvard Medical School, Boston, MA 02115
- Shay Q. Neufeld
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115
- HHMI, Harvard Medical School, Boston, MA 02115
- Scott W. Linderman
- Department of Statistics, Stanford University, Stanford, CA 94305
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA 94305
- Bernardo L. Sabatini
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115
- HHMI, Harvard Medical School, Boston, MA 02115
|
15
|
Time pressure changes how people explore and respond to uncertainty. Sci Rep 2022; 12:4122. [PMID: 35260717] [PMCID: PMC8904509] [DOI: 10.1038/s41598-022-07901-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/18/2021] [Accepted: 02/28/2022] [Indexed: 12/25/2022]
Abstract
How does time pressure influence exploration and decision-making? We investigated this question with several four-armed bandit tasks manipulating (within subjects) expected reward, uncertainty, and time pressure (limited vs. unlimited). With limited time, people have less opportunity to perform costly computations, thus shifting the cost-benefit balance of different exploration strategies. Through behavioral, reinforcement learning (RL), reaction time (RT), and evidence accumulation analyses, we show that time pressure changes how people explore and respond to uncertainty. Specifically, participants reduced their uncertainty-directed exploration under time pressure, were less value-directed, and repeated choices more often. Our analyses also relate uncertainty to slower responses and dampened evidence accumulation (i.e., lower drift rates), demonstrating a resource-rational shift towards simpler, lower-cost strategies under time pressure. These results shed light on how people adapt their exploration and decision-making strategies to externally imposed cognitive constraints.
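Uncertainty-directed exploration of the kind manipulated here is commonly modeled as an upper-confidence-bound (UCB) rule, where each option's value receives an uncertainty bonus. A minimal sketch, in which shrinking the bonus weight stands in for the effect of time pressure; the arm values and weights are invented for illustration:

```python
# UCB-style choice: pick the arm maximizing mean value plus a weighted
# uncertainty bonus. Reducing uncertainty_weight models the shift away from
# uncertainty-directed exploration under time pressure.

def ucb_choice(means, stds, uncertainty_weight):
    """Return the index of the arm with the highest score
    mean + uncertainty_weight * std."""
    scores = [m + uncertainty_weight * s for m, s in zip(means, stds)]
    return max(range(len(scores)), key=lambda i: scores[i])
```

With a high bonus weight the uncertain arm wins; with the weight driven toward zero (as under time pressure) choice collapses onto the higher-mean arm.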
|
16
|
Ashwood ZC, Roy NA, Stone IR, Urai AE, Churchland AK, Pouget A, Pillow JW. Mice alternate between discrete strategies during perceptual decision-making. Nat Neurosci 2022; 25:201-212. [PMID: 35132235] [PMCID: PMC8890994] [DOI: 10.1038/s41593-021-01007-z] [Citation(s) in RCA: 59] [Impact Index Per Article: 29.5] [Received: 05/10/2021] [Accepted: 12/17/2021] [Indexed: 12/21/2022]
Abstract
Classical models of perceptual decision-making assume that subjects use a single, consistent strategy to form decisions, or that decision-making strategies evolve slowly over time. Here we present new analyses suggesting that this common view is incorrect. We analyzed data from mouse and human decision-making experiments and found that choice behavior relies on an interplay among multiple interleaved strategies. These strategies, characterized by states in a hidden Markov model, persist for tens to hundreds of trials before switching, and often switch multiple times within a session. The identified decision-making strategies were highly consistent across mice and comprised a single 'engaged' state, in which decisions relied heavily on the sensory stimulus, and several biased states in which errors frequently occurred. These results provide a powerful alternate explanation for 'lapses' often observed in rodent behavioral experiments, and suggest that standard measures of performance mask the presence of major changes in strategy across trials.
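The strategy-switching analysis rests on a hidden Markov model whose states emit choices through state-specific observation models (a GLM-HMM). A minimal sketch of the filtering (forward) pass with a toy two-state parameterization, an "engaged" state whose choices track the stimulus and a chance-level "lapse" state; the transition matrix and weights are illustrative, not the paper's fitted values:

```python
import numpy as np

# Forward (filtering) pass of an HMM over decision-making "strategy" states.
# Each state k emits choices via a logistic model: P(choice=1) = sigmoid(w_k * stim).

def forward_posterior(choices, stimuli, trans, weights):
    """Return filtered state probabilities P(state_t | data up to trial t).
    trans[i, j] = P(state j at t+1 | state i at t); weights[k] is state k's
    stimulus weight in its logistic choice model."""
    n_states = trans.shape[0]
    alpha = np.full(n_states, 1.0 / n_states)  # uniform initial state belief
    out = []
    for c, s in zip(choices, stimuli):
        p_right = 1.0 / (1.0 + np.exp(-weights * s))   # per-state emission prob
        lik = np.where(c == 1, p_right, 1.0 - p_right)
        alpha = lik * (trans.T @ alpha)                # predict, then update
        alpha /= alpha.sum()
        out.append(alpha.copy())
    return np.array(out)
```

When the observed choices closely track the stimulus, the filtered posterior concentrates on the engaged state; chance-level stretches pull it toward the lapse state, mirroring the within-session switches the abstract describes.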
Affiliation(s)
- Zoe C Ashwood
- Department of Computer Science, Princeton University, Princeton, NJ, USA
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Iris R Stone
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Anne E Urai
- Cognitive Psychology Unit, Leiden University, Leiden, Netherlands
- Anne K Churchland
- David Geffen School of Medicine, The University of California, Los Angeles, Los Angeles, CA, USA
- Alexandre Pouget
- Faculty of Medicine & Department of Basic Neurosciences, University of Geneva, Geneva, Switzerland
- Jonathan W Pillow
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Department of Psychology, Princeton University, Princeton, NJ, USA
|
18
|
Bongiorno C, Zhou Y, Kryven M, Theurel D, Rizzo A, Santi P, Tenenbaum J, Ratti C. Vector-based pedestrian navigation in cities. Nat Comput Sci 2021; 1:678-685. [PMID: 38217198] [DOI: 10.1038/s43588-021-00130-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/17/2021] [Accepted: 08/12/2021] [Indexed: 01/15/2024]
Abstract
How do pedestrians choose their paths within city street networks? Researchers have tried to shed light on this matter through strictly controlled experiments, but an ultimate answer based on real-world mobility data is still lacking. Here, we analyze salient features of human path planning through a statistical analysis of a massive dataset of GPS traces, which reveals that (1) people increasingly deviate from the shortest path when the distance between origin and destination increases and (2) chosen paths are statistically different when origin and destination are swapped. We posit that direction to goal is a main driver of path planning and develop a vector-based navigation model; the resulting trajectories, which we have termed pointiest paths, are a statistically better predictor of human paths than a model based on minimizing distance with stochastic effects. Our findings generalize across two major US cities with different street networks, hinting that vector-based navigation might be a universal property of human path planning.
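The "pointiest path" idea can be sketched as a greedy rule: at each intersection, follow the outgoing edge whose bearing deviates least from the straight-line bearing to the destination. The coordinates in the example are invented for illustration and are not from the paper's street networks:

```python
import math

# Vector-based ("pointiest path") navigation sketch: choose the next street
# segment that points most directly toward the goal.

def bearing(a, b):
    """Angle of the vector from point a to point b, in radians."""
    return math.atan2(b[1] - a[1], b[0] - a[0])

def angular_deviation(a, b, goal):
    """Absolute angle between edge a->b and the direct line a->goal."""
    d = bearing(a, b) - bearing(a, goal)
    return abs(math.atan2(math.sin(d), math.cos(d)))  # wrap into [-pi, pi]

def pointiest_step(node, neighbors, goal):
    """Greedy step: the neighbor minimizing angular deviation toward the goal."""
    return min(neighbors, key=lambda nb: angular_deviation(node, nb, goal))
```

Unlike shortest-path search, this rule uses only local geometry plus the goal direction, which is what makes swapping origin and destination produce different trajectories.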
Affiliation(s)
- Christian Bongiorno
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Université Paris-Saclay, CentraleSupélec, Mathématiques et Informatique pour la Complexité et les Systèmes, Gif-sur-Yvette, France
- Yulun Zhou
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Urban Planning and Design, Faculty of Architecture, The University of Hong Kong, Pokfulam, Hong Kong, China
- Marta Kryven
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- David Theurel
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Alessandro Rizzo
- Dipartimento di Elettronica e Telecomunicazioni, Politecnico di Torino, Torino, Italy
- Office of Innovation, New York University Tandon School of Engineering, Six MetroTech Center, New York, NY, USA
- Paolo Santi
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Istituto di Informatica e Telematica del CNR, Pisa, Italy
- Joshua Tenenbaum
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Carlo Ratti
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
|
19
|
Futrell R. An Information-Theoretic Account of Semantic Interference in Word Production. Front Psychol 2021; 12:672408. [PMID: 34135832] [PMCID: PMC8200775] [DOI: 10.3389/fpsyg.2021.672408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2021] [Accepted: 04/27/2021] [Indexed: 11/30/2022]
Abstract
I present a computational-level model of semantic interference effects in online word production within a rate-distortion framework. I consider a bounded-rational agent trying to produce words. The agent's action policy is determined by maximizing accuracy in production subject to computational constraints. These computational constraints are formalized using mutual information. I show that semantic similarity-based interference among words falls out naturally from this setup, and I present a series of simulations showing that the model captures some of the key empirical patterns observed in Stroop and Picture-Word Interference paradigms, including comparisons to human data from previous experiments.
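The mutual-information constraint can be made concrete with a policy-compression sketch in the rate-distortion style: a Blahut-Arimoto-type fixed point yielding pi(a|s) proportional to p(a)·exp(beta·Q(s,a)), where beta trades accuracy against policy complexity I(S;A). This is a generic sketch of the framework, not the paper's word-production model, and the Q-values below are illustrative:

```python
import numpy as np

# Capacity-limited policy sketch: alternate between the softmax policy
# pi(a|s) ∝ p(a) * exp(beta * Q(s,a)) and the implied action marginal p(a).

def compressed_policy(Q, p_states, beta, n_iters=50):
    """Blahut-Arimoto-style fixed point for the compressed policy.
    Q is an (n_states, n_actions) payoff matrix; beta is the inverse
    temperature controlling how much complexity the agent can afford."""
    n_s, n_a = Q.shape
    p_a = np.full(n_a, 1.0 / n_a)
    for _ in range(n_iters):
        logits = np.log(p_a) + beta * Q
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
        p_a = p_states @ pi  # update the marginal over actions
    return pi

def policy_complexity(pi, p_states):
    """Mutual information I(S;A) in bits between states and actions."""
    p_a = p_states @ pi
    ratio = np.where(pi > 0, pi / p_a, 1.0)
    return float(np.sum(p_states[:, None] * pi * np.log2(ratio)))
```

At beta = 0 the policy ignores the state entirely (zero complexity, maximal interference between similar states); as beta grows, the policy differentiates states at the cost of higher I(S;A), which is the trade-off the account uses to explain interference effects.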
Affiliation(s)
- Richard Futrell
- Department of Language Science, University of California, Irvine, Irvine, CA, United States
|
20
|
Ruel A, Devine S, Eppinger B. Resource-rational approach to meta-control problems across the lifespan. Wiley Interdiscip Rev Cogn Sci 2021; 12:e1556. [DOI: 10.1002/wcs.1556] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/30/2020] [Revised: 12/09/2020] [Accepted: 01/19/2021] [Indexed: 01/26/2023]
Affiliation(s)
- Alexa Ruel
- Department of Psychology, Concordia University, Montreal, Quebec, Canada
- Sean Devine
- Department of Psychology, McGill University, Montreal, Quebec, Canada
- Ben Eppinger
- Department of Psychology, Concordia University, Montreal, Quebec, Canada
- Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- PERFORM Center, Concordia University, Montreal, Quebec, Canada
|
21
|
Lai L, Gershman SJ. Policy compression: An information bottleneck in action selection. Psychology of Learning and Motivation 2021. [DOI: 10.1016/bs.plm.2021.02.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/27/2022]
|