1
Velázquez-Vargas CA, Taylor JA. Learning to Move and Plan like the Knight: Sequential Decision Making with a Novel Motor Mapping. bioRxiv 2024:2024.08.29.610359. [PMID: 39257833] [PMCID: PMC11383687] [DOI: 10.1101/2024.08.29.610359]
Abstract
Many skills that humans acquire throughout their lives, such as playing video games or sports, require substantial motor learning and multi-step planning. While both processes are typically studied separately, they are likely to interact during the acquisition of complex motor skills. In this work, we studied this interaction by assessing human performance in a sequential decision-making task that requires the learning of a non-trivial motor mapping. Participants were tasked with moving a cursor from start to target locations in a grid world, using a standard keyboard. Notably, the specific keys were arbitrarily mapped to a movement rule resembling the Knight chess piece. In Experiment 1, we showed that learning this mapping in the absence of planning led to significant improvements in the task when presented with sequential decisions at a later stage. Computational modeling analysis revealed that such improvements resulted from an increased learning rate about the state transitions of the motor mapping, which also resulted in more flexible planning from trial to trial (less perseverative or habitual responding). In Experiment 2, we showed that incorporating mapping learning into the planning process allows us to capture (1) differential task improvements for distinct planning horizons and (2) overall lower performance for longer horizons. Additionally, model analysis suggested that participants may limit their search to three steps ahead. We hypothesize that this limitation in planning horizon arises from capacity constraints in working memory, and may be the reason complex skills are often broken down into individual subroutines or components during learning.
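The Knight-like movement rule described above is easy to make concrete. Below is a minimal sketch of multi-step planning under such a mapping via breadth-first search; it is not the authors' task code, and the grid size, coordinates, and move set are assumptions for illustration.

```python
from collections import deque

# Knight-style moves: two steps along one axis, one step along the other.
KNIGHT_MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1),
                (1, -2), (2, -1), (-1, -2), (-2, -1)]

def shortest_knight_path(start, target, width=8, height=8):
    """Breadth-first search for the fewest knight-style moves from start to target."""
    frontier = deque([(start, [start])])
    visited = {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if (x, y) == target:
            return path
        for dx, dy in KNIGHT_MOVES:
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height and nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None  # target unreachable from start

path = shortest_knight_path((0, 0), (1, 2))
```

Exhaustive search like this is tractable for a small grid; the modeling result that participants appear to search only about three steps ahead suggests human planning truncates such a search far earlier.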
2
Le Denmat P, Verguts T, Desender K. A low-dimensional approximation of optimal confidence. PLoS Comput Biol 2024; 20:e1012273. [PMID: 39047032] [PMCID: PMC11299811] [DOI: 10.1371/journal.pcbi.1012273]
Abstract
Human decision making is accompanied by a sense of confidence. According to Bayesian decision theory, confidence reflects the learned probability of making a correct response, given available data (e.g., accumulated stimulus evidence and response time). Although optimal, independently learning these probabilities for all possible data combinations is computationally intractable. Here, we describe a novel model of confidence implementing a low-dimensional approximation of this optimal yet intractable solution. This model allows efficient estimation of confidence, while at the same time accounting for idiosyncrasies, different kinds of biases, and deviations from the optimal probability correct. Our model dissociates confidence biases resulting from individuals' estimates of the reliability of evidence (captured by parameter α) from confidence biases resulting from general, stimulus-independent under- and overconfidence (captured by parameter β). We provide empirical evidence that this model accurately fits both choice data (accuracy, response time) and trial-by-trial confidence ratings simultaneously. Finally, we test and empirically validate two novel predictions of the model, namely that 1) changes in confidence can be independent of performance and 2) selectively manipulating each parameter of our model leads to distinct patterns of confidence judgments. As a tractable and flexible account of the computation of confidence, our model offers a clear framework to interpret and further resolve different forms of confidence biases.
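To illustrate how the two bias parameters could enter a confidence computation, consider the toy logistic readout below. This is a schematic, not the authors' model: the normalization of evidence by response time and the exact functional form are assumptions made for illustration.

```python
import math

def confidence(evidence, response_time, alpha=1.0, beta=0.0):
    """Toy confidence readout: a probability-correct estimate from signed
    evidence and elapsed time, scaled by a reliability parameter (alpha)
    and shifted by a stimulus-independent bias (beta)."""
    # Evidence accumulated over longer times is noisier per unit time,
    # so normalize by the square root of elapsed time (an assumption).
    drift_estimate = evidence / math.sqrt(response_time)
    return 1.0 / (1.0 + math.exp(-(alpha * drift_estimate + beta)))

# alpha < 1 shrinks confidence toward 0.5 (underweighting evidence
# reliability); beta > 0 inflates confidence regardless of the stimulus.
```

The key property this sketch shares with the model described in the abstract is the dissociation: α modulates how strongly evidence moves confidence, while β shifts confidence independently of performance.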
Affiliation(s)
- Tom Verguts
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
3
McCarthy WP, Kirsh D, Fan JE. Consistency and Variation in Reasoning About Physical Assembly. Cogn Sci 2023; 47:e13397. [PMID: 38146204] [DOI: 10.1111/cogs.13397]
Abstract
The ability to reason about how things were made is a pervasive aspect of how humans make sense of physical objects. Such reasoning is useful for a range of everyday tasks, from assembling a piece of furniture to making a sandwich and knitting a sweater. What enables people to reason in this way even about novel objects, and how do people draw upon prior experience with an object to continually refine their understanding of how to create it? To explore these questions, we developed a virtual task environment to investigate how people come up with step-by-step procedures for recreating block towers whose composition was not readily apparent, and analyzed how the procedures they used to build them changed across repeated attempts. Specifically, participants (N = 105) viewed 2D silhouettes of eight unique block towers in a virtual environment simulating rigid-body physics, and aimed to reconstruct each one in less than 60 s. We found that people built each tower more accurately and quickly across repeated attempts, and that this improvement reflected both group-level convergence on a tiny fraction of all possible viable procedures and error-dependent updating across successive attempts by the same individual. Together, our study presents a scalable approach to measuring consistency and variation in how people infer solutions to physical assembly problems.
Affiliation(s)
- David Kirsh
- Department of Cognitive Science, University of California San Diego
- Judith E Fan
- Department of Psychology, University of California San Diego
- Department of Psychology, Stanford University
4
Barron AB, Mourmourakis F. The Relationship between Cognition and Brain Size or Neuron Number. Brain Behav Evol 2023; 99:109-122. [PMID: 37487478] [DOI: 10.1159/000532013]
Abstract
The comparative approach is a powerful way to explore the relationship between brain structure and cognitive function. Thus far, the field has been dominated by the assumption that a bigger brain somehow means better cognition. Correlations between differences in brain size or neuron number between species and differences in specific cognitive abilities exist, but these correlations are very noisy. Extreme differences exist between clades in the relationship between either brain size or neuron number and specific cognitive abilities. This means that correlations become weaker, not stronger, as the taxonomic diversity of sampled groups increases. Cognition is the outcome of neural networks. Here we propose that considering plausible neural network models will advance our understanding of the complex relationships between neuron number and different aspects of cognition. Computational modelling of networks suggests that adding pathways, or layers, or changing patterns of connectivity in a network can all have different specific consequences for cognition. Consequently, models of computational architecture can help us hypothesise how and why differences in neuron number might be related to differences in cognition. As methods in connectomics continue to improve and more structural information on animal brains becomes available, we are learning more about natural network structures in brains, and we can develop more biologically plausible models of cognitive architecture. Natural animal diversity then becomes a powerful resource to both test the assumptions of these models and explore hypotheses for how neural network structure and network size might delimit cognitive function.
Affiliation(s)
- Andrew B Barron
- School of Natural Sciences, Macquarie University, Sydney, New South Wales, Australia
- Faelan Mourmourakis
- School of Natural Sciences, Macquarie University, Sydney, New South Wales, Australia
5
Abstract
Neural mechanisms of perceptual decision making have been extensively studied in experimental settings that mimic stable environments with repeating stimuli, fixed rules, and payoffs. In contrast, we live in an ever-changing environment and have varying goals and behavioral demands. To accommodate variability, our brain flexibly adjusts decision-making processes depending on context. Here, we review a growing body of research that explores the neural mechanisms underlying this flexibility. We highlight diverse forms of context dependency in decision making implemented through a variety of neural computations. Context-dependent neural activity is observed in a distributed network of brain structures, including posterior parietal, sensory, motor, and subcortical regions, as well as the prefrontal areas classically implicated in cognitive control. We propose that investigating the distributed network underlying flexible decisions is key to advancing our understanding and discuss a path forward for experimental and theoretical investigations.
Affiliation(s)
- Gouki Okazawa
- Center for Neural Science, New York University, New York, NY, USA
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- Roozbeh Kiani
- Center for Neural Science, New York University, New York, NY, USA
- Department of Psychology, New York University, New York, NY, USA
6
Cristín J, Méndez V, Campos D. Informational Entropy Threshold as a Physical Mechanism for Explaining Tree-like Decision Making in Humans. Entropy (Basel) 2022; 24:1819. [PMID: 36554223] [PMCID: PMC9778513] [DOI: 10.3390/e24121819]
Abstract
While approaches based on physical grounds (such as the drift-diffusion model, DDM) have been exhaustively used in psychology and neuroscience to describe perceptual decision making in humans, similar approaches to complex situations, such as sequential (tree-like) decisions, are still scarce. For such scenarios that involve a reflective prospection of future options, we offer a plausible mechanism based on the idea that subjects internally compute the uncertainty about the different options available, quantified by the corresponding Shannon entropy. When the information gathered through sensory evidence is enough for the entropy to reach a given threshold, the decision is triggered. Experimental evidence in favor of this entropy-based mechanism was provided by exploring human performance during navigation through a maze on a computer screen monitored with the help of eye trackers. In particular, our analysis allows us to prove that (i) prospection is effectively used by humans during such navigation tasks, and an indirect quantification of the level of prospection used is attainable; in addition, (ii) the distribution of decision times during the task exhibits power-law tails, a feature that our entropy-based mechanism is able to explain, unlike traditional (DDM-like) frameworks.
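The proposed triggering mechanism can be sketched in a few lines. This is a schematic, not the paper's implementation; the belief trace and the threshold value below are invented for illustration.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a probability distribution over options."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def decide_when_certain(belief_updates, threshold=0.5):
    """Scan a sequence of beliefs over options; commit as soon as the
    entropy falls below the threshold. Returns (decision time, chosen option)."""
    for t, belief in enumerate(belief_updates):
        if shannon_entropy(belief) <= threshold:
            return t, max(range(len(belief)), key=lambda i: belief[i])
    return None  # no decision triggered within the observation window

# Beliefs over three options sharpening as evidence accumulates:
trace = [[1/3, 1/3, 1/3], [0.5, 0.3, 0.2], [0.8, 0.15, 0.05], [0.95, 0.04, 0.01]]
```

Because decision time here depends on how quickly entropy collapses rather than on a fixed evidence bound, heavy-tailed decision-time distributions of the kind the authors report can arise when uncertainty resolves slowly.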
Affiliation(s)
- Javier Cristín
- Istituto Sistemi Complessi, Consiglio Nazionale delle Ricerche, UOS Sapienza, 00185 Rome, Italy
- Dipartimento di Fisica, Università Sapienza, 00185 Rome, Italy
- Vicenç Méndez
- Grup de Física Estadística, Departament de Física, Facultat de Ciències, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Daniel Campos
- Grup de Física Estadística, Departament de Física, Facultat de Ciències, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
7
Ho MK, Saxe R, Cushman F. Planning with Theory of Mind. Trends Cogn Sci 2022; 26:959-971. [PMID: 36089494] [DOI: 10.1016/j.tics.2022.08.003]
Abstract
Understanding Theory of Mind should begin with an analysis of the problems it solves. The traditional answer is that Theory of Mind is used for predicting others' thoughts and actions. However, the same Theory of Mind is also used for planning to change others' thoughts and actions. Planning requires that Theory of Mind consist of abstract structured causal representations and support efficient search and selection from innumerable possible actions. Theory of Mind contrasts with less cognitively demanding alternatives: statistical predictive models of other people's actions, or model-free reinforcement of actions by their effects on other people. Theory of Mind is likely used to plan novel interventions and predict their effects, for example, in pedagogy, emotion regulation, and impression management.
Affiliation(s)
- Mark K Ho
- Department of Computer Science, Princeton University, Princeton, NJ, USA; Department of Psychology, Princeton University, Princeton, NJ, USA.
- Rebecca Saxe
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA
- Fiery Cushman
- Department of Psychology, Harvard University, Cambridge, MA, USA
8
Li Y, McClelland JL. A weighted constraint satisfaction approach to human goal-directed decision making. PLoS Comput Biol 2022; 18:e1009553. [PMID: 35709299] [PMCID: PMC9255770] [DOI: 10.1371/journal.pcbi.1009553]
Abstract
When we plan for long-range goals, proximal information cannot be exploited in a blindly myopic way, as relevant future information must also be considered. But when a subgoal must be resolved first, irrelevant future information should not interfere with the processing of more proximal, subgoal-relevant information. We explore the idea that decision making in both situations relies on the flexible modulation of the degree to which different pieces of information under consideration are weighted, rather than explicitly decomposing a problem into smaller parts and solving each part independently. We asked participants to find the shortest goal-reaching paths in mazes and modeled their initial path choices as a noisy, weighted information integration process. In a base task where choosing the optimal initial path required weighting starting-point and goal-proximal factors equally, participants did take both constraints into account, with participants who made more accurate choices tending to exhibit more balanced weighting. The base task was then embedded as an initial subtask in a larger maze, where the same two factors constrained the optimal path to a subgoal, and the final goal position was irrelevant to the initial path choice. In this more complex task, participants' choices reflected predominant consideration of the subgoal-relevant constraints, but also some influence of the initially-irrelevant final goal. More accurate participants placed much less weight on the optimality-irrelevant goal and again tended to weight the two initially-relevant constraints more equally. These findings suggest that humans may rely on a graded, task-sensitive weighting of multiple constraints to generate approximately optimal decision outcomes in both hierarchical and non-hierarchical goal-directed tasks.

Different problems require the consideration of different information sources, including long-range information about the future that may impact our immediate decisions. However, when future information is irrelevant to a key subgoal, it can be desirable to focus on achieving the subgoal first. We suggest that humans rely on appropriately weighting relevant information over irrelevant information to generate decision outcomes in both types of situations. We conducted behavioral experiments and fitted models of decision processes to understand to what extent people considered various task factors in choosing the initial path in different mazes, both when a simple maze occurred alone and when it was embedded as the initial part of a larger maze. Our results show that people approximate the optimal decision outcomes in both tasks by modulating the weighting of different factors during planning, and that people who made more accurate initial path choices modulated these weightings more successfully than those who made less accurate choices.
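The noisy, weighted information integration described above can be sketched as a simple choice rule. This is illustrative only: the constraint names, scores, and weights below are invented, whereas the actual model fits its weights to participants' choices.

```python
import random

def choose_initial_path(paths, weights, noise_sd=0.1, rng=random.Random(0)):
    """Pick the path maximizing a noisy weighted sum of constraint scores.
    Each path is a dict mapping constraint names (e.g. start- or
    goal-proximal factors) to how well the path satisfies that constraint."""
    def utility(path):
        signal = sum(weights[c] * path[c] for c in weights)
        return signal + rng.gauss(0.0, noise_sd)  # trial-to-trial noise
    return max(paths, key=utility)

paths = [
    {"name": "left",  "start_factor": 0.9, "subgoal_factor": 0.2, "goal_factor": 0.8},
    {"name": "right", "start_factor": 0.6, "subgoal_factor": 0.9, "goal_factor": 0.1},
]
# Balanced weighting of the initially relevant constraints, with the final
# goal nearly (but not fully) ignored, as in the embedded-maze condition:
weights = {"start_factor": 1.0, "subgoal_factor": 1.0, "goal_factor": 0.1}
```

Modulating the weight vector, rather than decomposing the maze into separate subproblems, is the design choice that distinguishes this account from explicit hierarchical decomposition.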
Affiliation(s)
- Yuxuan Li
- Department of Psychology, Stanford University, Stanford, California, United States of America
- James L. McClelland
- Department of Psychology, Stanford University, Stanford, California, United States of America
9
Lei Y, Solway A. Conflict and competition between model-based and model-free control. PLoS Comput Biol 2022; 18:e1010047. [PMID: 35511764] [PMCID: PMC9070915] [DOI: 10.1371/journal.pcbi.1010047]
Abstract
A large literature has accumulated suggesting that human and animal decision making is driven by at least two systems, and that important functions of these systems can be captured by reinforcement learning algorithms. The "model-free" system caches and uses stimulus-value or stimulus-response associations, and the "model-based" system implements more flexible planning using a model of the world. However, it is not clear how the two systems interact during deliberation and how a single decision emerges from this process, especially when they disagree. Most previous work has assumed that while the systems operate in parallel, they do so independently, and they combine linearly to influence decisions. Using an integrated reinforcement learning/drift-diffusion model, we tested the hypothesis that the two systems interact in a non-linear fashion similar to other situations with cognitive conflict. We differentiated two forms of conflict: action conflict, a binary state representing whether the systems disagreed on the best action, and value conflict, a continuous measure of the extent to which the two systems disagreed on the difference in value between the available options. We found that decisions with greater value conflict were characterized by reduced model-based control and increased caution both with and without action conflict. Action conflict itself (the binary state) acted in the opposite direction, although its effects were less prominent. We also found that between-system conflict was highly correlated with within-system conflict, and although it is less clear a priori why the latter might influence the strength of each system above its standard linear contribution, we could not rule it out. Our work highlights the importance of non-linear conflict effects, and provides new constraints for more detailed process models of decision making. It also presents new avenues to explore with relation to disorders of compulsivity, where an imbalance between systems has been implicated.
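The two conflict variables have simple definitions for a binary choice. The sketch below assumes each system supplies a value for each of two options; the variable names are ours, not the paper's.

```python
def conflict_measures(q_mb, q_mf):
    """Compute conflict between model-based (q_mb) and model-free (q_mf)
    values for a two-option choice. Returns (action_conflict, value_conflict)."""
    # Action conflict: binary, do the systems disagree on the best option?
    action_conflict = int((q_mb[0] - q_mb[1]) * (q_mf[0] - q_mf[1]) < 0)
    # Value conflict: continuous, how much do the systems disagree about
    # the value difference between the options?
    value_conflict = abs((q_mb[0] - q_mb[1]) - (q_mf[0] - q_mf[1]))
    return action_conflict, value_conflict
```

Note that value conflict can be large even when both systems prefer the same option, which is why the two measures can dissociate in the analyses described above.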
Affiliation(s)
- Yuqing Lei
- Department of Psychology, University of Maryland-College Park, College Park, Maryland, United States of America
- Alec Solway
- Department of Psychology, University of Maryland-College Park, College Park, Maryland, United States of America
- Program in Neuroscience and Cognitive Science, University of Maryland-College Park, College Park, Maryland, United States of America
10
Callaway F, van Opheusden B, Gul S, Das P, Krueger PM, Lieder F, Griffiths TL. Rational use of cognitive resources in human planning. Nat Hum Behav 2022; 6:1112-1125. [PMID: 35484209] [DOI: 10.1038/s41562-022-01332-8]
Abstract
Making good decisions requires thinking ahead, but the huge number of actions and outcomes one could consider makes exhaustive planning infeasible for computationally constrained agents, such as humans. How people are nevertheless able to solve novel problems when their actions have long-reaching consequences is thus a long-standing question in cognitive science. To address this question, we propose a model of resource-constrained planning that allows us to derive optimal planning strategies. We find that previously proposed heuristics such as best-first search are near optimal under some circumstances but not others. In a mouse-tracking paradigm, we show that people adapt their planning strategies accordingly, planning in a manner that is broadly consistent with the optimal model but not with any single heuristic model. We also find systematic deviations from the optimal model that might result from additional cognitive constraints that are yet to be uncovered.
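Best-first search with a hard cap on node expansions gives a feel for what resource-constrained planning means computationally. This is a generic sketch, not the authors' optimal model; the expansion budget stands in for the cost of planning, and the successor/heuristic interface is an assumption.

```python
import heapq

def best_first_plan(start, successors, heuristic, budget=20):
    """Best-first search with a fixed expansion budget, a crude stand-in
    for resource-limited planning. `successors(s)` yields (next_state,
    reward) pairs; `heuristic(s)` optimistically scores states. Returns
    the best reward-accumulating path found within the budget."""
    frontier = [(-heuristic(start), 0.0, [start])]  # (-priority, reward so far, path)
    best_reward, best_path = float("-inf"), [start]
    for _ in range(budget):
        if not frontier:
            break
        _, acc, path = heapq.heappop(frontier)  # expand most promising node
        if acc > best_reward:
            best_reward, best_path = acc, path
        for nxt, r in successors(path[-1]):
            heapq.heappush(frontier, (-(acc + r + heuristic(nxt)), acc + r, path + [nxt]))
    return best_path
```

Varying `budget` and the greediness of `heuristic` recovers the space of strategies the paper evaluates: pure best-first search is one corner of it, and the optimal resource-rational strategy need not coincide with it.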
Affiliation(s)
- Sayan Gul
- Department of Psychology, University of California, Berkeley, CA, USA
- Priyam Das
- Department of Cognitive Sciences, University of California, Irvine, CA, USA
- Paul M Krueger
- Department of Psychology, Princeton University, Princeton, NJ, USA
- Falk Lieder
- Max Planck Institute for Intelligent Systems, Tübingen, Germany
11
Abstract
Recent breakthroughs in artificial intelligence (AI) have enabled machines to plan in tasks previously thought to be uniquely human. Meanwhile, the planning algorithms implemented by the brain itself remain largely unknown. Here, we review neural and behavioral data in sequential decision-making tasks that elucidate the ways in which the brain does (and does not) plan. To systematically review available biological data, we create a taxonomy of planning algorithms by summarizing the relevant design choices for such algorithms in AI. Across species, recording techniques, and task paradigms, we find converging evidence that the brain represents future states consistent with a class of planning algorithms within our taxonomy: focused, depth-limited, and serial. However, we argue that current data are insufficient for addressing more detailed algorithmic questions. We propose a new approach leveraging AI advances to drive experiments that can adjudicate between competing candidate algorithms.
12
Collins AGE, Shenhav A. Advances in modeling learning and decision-making in neuroscience. Neuropsychopharmacology 2022; 47:104-118. [PMID: 34453117] [PMCID: PMC8617262] [DOI: 10.1038/s41386-021-01126-y]
Abstract
An organism's survival depends on its ability to learn about its environment and to make adaptive decisions in the service of achieving the best possible outcomes in that environment. To study the neural circuits that support these functions, researchers have increasingly relied on models that formalize the computations required to carry them out. Here, we review the recent history of computational modeling of learning and decision-making, and how these models have been used to advance understanding of prefrontal cortex function. We discuss how such models have advanced from their origins in basic algorithms of updating and action selection to increasingly account for complexities in the cognitive processes required for learning and decision-making, and the representations over which they operate. We further discuss how a deeper understanding of the real-world complexities in these computations has shed light on the fundamental constraints on optimal behavior, and on the complex interactions between corticostriatal pathways to determine such behavior. The continuing and rapid development of these models holds great promise for understanding the mechanisms by which animals adapt to their environments, and what leads to maladaptive forms of learning and decision-making within clinical populations.
Affiliation(s)
- Anne G E Collins
- Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA.
- Amitai Shenhav
- Department of Cognitive, Linguistic, & Psychological Sciences and Carney Institute for Brain Science, Brown University, Providence, RI, USA.
13
Zylberberg A. Decision prioritization and causal reasoning in decision hierarchies. PLoS Comput Biol 2021; 17:e1009688. [PMID: 34971552] [PMCID: PMC8719712] [DOI: 10.1371/journal.pcbi.1009688]
Abstract
From cooking a meal to finding a route to a destination, many real life decisions can be decomposed into a hierarchy of sub-decisions. In a hierarchy, choosing which decision to think about requires planning over a potentially vast space of possible decision sequences. To gain insight into how people decide what to decide on, we studied a novel task that combines perceptual decision making, active sensing and hierarchical and counterfactual reasoning. Human participants had to find a target hidden at the lowest level of a decision tree. They could solicit information from the different nodes of the decision tree to gather noisy evidence about the target's location. Feedback was given only after errors at the leaf nodes and provided ambiguous evidence about the cause of the error. Despite the complexity of the task (with 10^7 latent states), participants were able to plan efficiently. A computational model of this process identified a small number of heuristics of low computational complexity that accounted for human behavior. These heuristics include making categorical decisions at the branching points of the decision tree rather than carrying forward entire probability distributions, discarding sensory evidence deemed unreliable to make a choice, and using choice confidence to infer the cause of the error after an initial plan failed. Plans based on probabilistic inference or myopic sampling norms could not capture participants' behavior. Our results show that it is possible to identify hallmarks of heuristic planning with sensing in human behavior and that the use of tasks of intermediate complexity helps identify the rules underlying human ability to reason over decision hierarchies.
Affiliation(s)
- Ariel Zylberberg
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York, United States of America
14
Abstract
Active inference offers a first principle account of sentient behavior, from which special and important cases (for example, reinforcement learning, active learning, Bayes optimal inference, and Bayes optimal design) can be derived. Active inference finesses the exploitation-exploration dilemma in relation to prior preferences by placing information gain on the same footing as reward or value. In brief, active inference replaces value functions with functionals of (Bayesian) beliefs, in the form of an expected (variational) free energy. In this letter, we consider a sophisticated kind of active inference using a recursive form of expected free energy. Sophistication describes the degree to which an agent has beliefs about beliefs. We consider agents with beliefs about the counterfactual consequences of action for states of affairs and beliefs about those latent states. In other words, we move from simply considering beliefs about "what would happen if I did that" to "what I would believe about what would happen if I did that." The recursive form of the free energy functional effectively implements a deep tree search over actions and outcomes in the future. Crucially, this search is over sequences of belief states as opposed to states per se. We illustrate the competence of this scheme using numerical simulations of deep decision problems.
Affiliation(s)
- Karl Friston
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London WC1N 3AR, U.K.
- Lancelot Da Costa
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London WC1N 3AR, U.K., and Department of Mathematics, Imperial College London, U.K.
- Danijar Hafner
- Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada, and Google Research, Brain Team, Toronto, ON MSH 153, Canada
- Casper Hesp
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London WC1N 3AR, U.K., and Amsterdam Brain and Cognition Center, University of Amsterdam, Amsterdam 1001 NK, The Netherlands
- Thomas Parr
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London WC1N 3AR, U.K.
15
16
Solway A, Lohrenz T, Montague PR. Loss Aversion Correlates With the Propensity to Deploy Model-Based Control. Front Neurosci 2019; 13:915. [PMID: 31555082] [PMCID: PMC6743018] [DOI: 10.3389/fnins.2019.00915]
Abstract
Reward-based decision making is thought to be driven by at least two different types of decision systems: a simple stimulus–response cache-based system which embodies the common-sense notion of “habit,” for which model-free reinforcement learning serves as a computational substrate, and a more deliberate, prospective, model-based planning system. Previous work has shown that loss aversion, a well-studied measure of how much more on average individuals weigh losses relative to gains during decision making, is reduced when participants take all possible decisions and outcomes into account including future ones, relative to when they myopically focus on the current decision. Model-based control offers a putative mechanism for implementing such foresight. Using a well-powered data set (N = 117) in which participants completed two different tasks designed to measure each of the two quantities of interest, and four models of choice data for these tasks, we found consistent evidence of a relationship between loss aversion and model-based control but in the direction opposite to that expected based on previous work: loss aversion had a positive relationship with model-based control. We did not find evidence for a relationship between either decision system and risk aversion, a related aspect of subjective utility.
Affiliation(s)
- Alec Solway
- Virginia Tech Carilion Research Institute, Roanoke, VA, United States
- Terry Lohrenz
- Virginia Tech Carilion Research Institute, Roanoke, VA, United States
- P Read Montague
- Virginia Tech Carilion Research Institute, Roanoke, VA, United States
- Department of Physics, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
17
McNamee D, Wolpert DM. Internal Models in Biological Control. Annu Rev Control Robot Auton Syst 2019; 2:339-364. [PMID: 31106294] [PMCID: PMC6520231] [DOI: 10.1146/annurev-control-060117-105206]
Abstract
Rationality principles such as optimal feedback control and Bayesian inference underpin a probabilistic framework that has accounted for a range of empirical phenomena in biological sensorimotor control. To facilitate the optimization of flexible and robust behaviors consistent with these theories, the ability to construct internal models of the motor system and environmental dynamics can be crucial. In the context of this theoretic formalism, we review the computational roles played by such internal models and the neural and behavioral evidence for their implementation in the brain.
Affiliation(s)
- Daniel McNamee
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
- Institute of Neurology, University College London, London WC1E 6BT, United Kingdom
- Daniel M. Wolpert
- Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
- Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York 10027, United States
18
Dissociable components of the reward circuit are involved in appraisal versus choice. Sci Rep 2019; 9:1958. [PMID: 30760824] [PMCID: PMC6374444] [DOI: 10.1038/s41598-019-38927-7]
Abstract
People can evaluate a set of options as a whole, or they can approach those same options with the purpose of making a choice between them. A common network has been implicated across these two types of evaluations, including regions of ventromedial prefrontal cortex and the posterior midline. We test the hypothesis that sub-components of this reward circuit are differentially involved in triggering more automatic appraisal of one’s options (Dorsal Value Network) versus explicitly comparing between those options (Ventral Value Network). Participants undergoing fMRI were instructed to appraise how much they liked a set of products (Like) or to choose the product they most preferred (Choose). Activity in the Dorsal Value Network consistently tracked set liking, across both task-relevant (Like) and task-irrelevant (Choose) trials. In contrast, the Ventral Value Network was particularly sensitive to evaluation condition (more active during Choose than Like trials). Within vmPFC, anatomically distinct regions were dissociated in their sensitivity to choice (ventrally, in medial OFC) versus appraisal (dorsally, in pregenual ACC). Dorsal regions additionally tracked decision certainty across both types of evaluation. These findings suggest that separable mechanisms drive decisions about how good one’s options are versus decisions about which option is best.
19
Generalization guides human exploration in vast decision spaces. Nat Hum Behav 2018; 2:915-924. [PMID: 30988442] [DOI: 10.1038/s41562-018-0467-4]
Abstract
From foraging for food to learning complex games, many aspects of human behaviour can be framed as a search problem with a vast space of possible actions. Under finite search horizons, optimal solutions are generally unobtainable. Yet, how do humans navigate vast problem spaces, which require intelligent exploration of unobserved actions? Using various bandit tasks with up to 121 arms, we study how humans search for rewards under limited search horizons, in which the spatial correlation of rewards (in both generated and natural environments) provides traction for generalization. Across various probabilistic and heuristic models, we find evidence that Gaussian process function learning, combined with an optimistic upper confidence bound sampling strategy, provides a robust account of how people use generalization to guide search. Our modelling results and parameter estimates are recoverable and can be used to simulate human-like performance, providing insights about human behaviour in complex environments.
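The favored model class, function learning paired with upper-confidence-bound (UCB) sampling, can be caricatured in a few lines. The sketch below uses simple RBF kernel smoothing as a crude stand-in for full Gaussian process inference; all function names, the prior shrinkage, and the uncertainty heuristic are illustrative assumptions, not the paper's model.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential similarity between two arm positions."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def ucb_choice(arms, observed, beta=0.5, length=1.0, prior=0.01):
    """Pick the arm maximizing estimated mean + beta * uncertainty.
    The mean generalizes observed (position, reward) pairs to nearby
    arms via the RBF kernel, shrunk toward 0 by a small prior weight;
    uncertainty is high for arms far from every observation."""
    best, best_score = None, float("-inf")
    for arm in arms:
        weights = [rbf(arm, x, length) for x, _ in observed]
        mean = sum(w * r for w, (_, r) in zip(weights, observed)) / (sum(weights) + prior)
        uncertainty = 1.0 - max(weights)  # low near observed arms
        score = mean + beta * uncertainty
        if score > best_score:
            best, best_score = arm, score
    return best
```

With beta = 0 the rule purely exploits and re-picks the best observed arm; with beta > 0 the uncertainty bonus draws it toward unobserved arms adjacent to known high rewards, which is the optimistic exploration the paper describes.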
20
Schulz E, Wu CM, Huys QJM, Krause A, Speekenbrink M. Generalization and Search in Risky Environments. Cogn Sci 2018; 42:2592-2620. [DOI: 10.1111/cogs.12695]
Affiliation(s)
- Charley M. Wu
- Center for Adaptive Rationality, Max Planck Institute for Human Development
21
Kaplan R, Friston KJ. Planning and navigation as active inference. Biol Cybern 2018; 112:323-343. [PMID: 29572721] [PMCID: PMC6060791] [DOI: 10.1007/s00422-018-0753-2]
Abstract
This paper introduces an active inference formulation of planning and navigation. It illustrates how the exploitation-exploration dilemma is dissolved by acting to minimise uncertainty (i.e. expected surprise or free energy). We use simulations of a maze problem to illustrate how agents can solve quite complicated problems using context-sensitive prior preferences to form subgoals. Our focus is on how epistemic behaviour, driven by novelty and the imperative to reduce uncertainty about the world, contextualises pragmatic or goal-directed behaviour. Using simulations, we illustrate the underlying process theory with synthetic behavioural and electrophysiological responses during exploration of a maze and subsequent navigation to a target location. An interesting phenomenon that emerged from the simulations was a putative distinction between 'place cells', which fire when a subgoal is reached, and 'path cells', which fire until a subgoal is reached.
Affiliation(s)
- Raphael Kaplan
- Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London (UCL), 12 Queen Square, London, WC1N 3BG, UK
- Karl J Friston
- Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London (UCL), 12 Queen Square, London, WC1N 3BG, UK.
22
Friston K. Am I Self-Conscious? (Or Does Self-Organization Entail Self-Consciousness?). Front Psychol 2018; 9:579. [PMID: 29740369] [PMCID: PMC5928749] [DOI: 10.3389/fpsyg.2018.00579]
Abstract
Is self-consciousness necessary for consciousness? The answer is yes. So there you have it: the answer is yes. This was my response to a question I was asked to address in a recent AEON piece (https://aeon.co/essays/consciousness-is-not-a-thing-but-a-process-of-inference). What follows is based upon the notes for that essay, with a special focus on self-organization, self-evidencing and self-modeling. I will try to substantiate my (polemic) answer from the perspective of a physicist. In brief, the argument goes as follows: if we want to talk about creatures, like ourselves, then we have to identify the characteristic behaviors they must exhibit. This is fairly easy to do by noting that living systems return to a set of attracting states time and time again. Mathematically, this implies the existence of a Lyapunov function that turns out to be model evidence (i.e., self-evidence) in Bayesian statistics or surprise (i.e., self-information) in information theory. This means that all biological processes can be construed as performing some form of inference, from evolution through to conscious processing. If this is the case, at what point do we invoke consciousness? The proposal on offer here is that the mind comes into being when self-evidencing has a temporal thickness or counterfactual depth, which grounds inferences about the consequences of my action. On this view, consciousness is nothing more than inference about my future; namely, the self-evidencing consequences of what I could do.
Affiliation(s)
- Karl Friston
- Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London (UCL), London, United Kingdom
23
Continuous track paths reveal additive evidence integration in multistep decision making. Proc Natl Acad Sci U S A 2017; 114:10618-10623. [PMID: 28923918] [DOI: 10.1073/pnas.1710913114]
Abstract
Multistep decision making pervades daily life, but its underlying mechanisms remain obscure. We distinguish four prominent models of multistep decision making, namely serial stage, hierarchical evidence integration, hierarchical leaky competing accumulation (HLCA), and probabilistic evidence integration (PEI). To empirically disentangle these models, we design a two-step reward-based decision paradigm and implement it in a reaching task experiment. In a first step, participants choose between two potential upcoming choices, each associated with two rewards. In a second step, participants choose between the two rewards selected in the first step. Strikingly, as predicted by the HLCA and PEI models, the first-step decision dynamics were initially biased toward the choice representing the highest sum/mean before being redirected toward the choice representing the maximal reward (i.e., initial dip). Only HLCA and PEI predicted this initial dip, suggesting that first-step decision dynamics depend on additive integration of competing second-step choices. Our data suggest that potential future outcomes are progressively unraveled during multistep decision making.
24
Tartaglia EM, Clarke AM, Herzog MH. What to Choose Next? A Paradigm for Testing Human Sequential Decision Making. Front Psychol 2017; 8:312. [PMID: 28326050] [PMCID: PMC5339299] [DOI: 10.3389/fpsyg.2017.00312]
Abstract
Many of the decisions we make in our everyday lives are sequential and entail sparse rewards. While sequential decision-making has been extensively investigated in theory (e.g., by reinforcement learning models), there is no systematic experimental paradigm to test it. Here, we developed such a paradigm and investigated key components of reinforcement learning models: the eligibility trace (i.e., the memory trace of previous decision steps), the external reward, and the ability to exploit the statistics of the environment's structure (model-free vs. model-based mechanisms). We show that the eligibility trace decays not with sheer time, but rather with the number of discrete decision steps made by the participants. We further show that, unexpectedly, neither monetary rewards nor the environment's spatial regularity significantly modulate behavioral performance. Finally, we found that model-free learning algorithms describe human performance better than model-based algorithms.
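The step-based decay finding maps directly onto the standard eligibility-trace mechanism, with the decay applied once per decision rather than per unit of elapsed time. A minimal tabular sketch (function names, the decay rate lam, and the learning rate alpha are illustrative, not the paper's fitted model):

```python
def credit_assignment(trajectory, reward, alpha=0.5, lam=0.8):
    """Credit a sparse terminal reward back to every visited state.
    Each state's eligibility trace decays by a factor lam per decision
    step taken after the visit, so recency is counted in discrete
    steps, not clock time."""
    traces = {}
    values = {}
    for state in trajectory:
        for s in traces:
            traces[s] *= lam      # one decay per decision step
        traces[state] = 1.0       # refresh the current state's trace
    for s, e in traces.items():
        values[s] = values.get(s, 0.0) + alpha * e * reward
    return values
```

States visited closer to the reward receive exponentially more credit: for the path A -> B -> C ending in reward 1, the value updates are 0.32, 0.4, and 0.5 respectively.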
Affiliation(s)
- Elisa M. Tartaglia
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Aging in Vision and Action Lab, Sorbonne Universités, UPMC Univ Paris 06, INSERM, CNRS, Institut de la Vision, Paris, France
- Aaron M. Clarke
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Psychology Department and Neuroscience Department, Aysel Sabuncu Brain Research Center, Bilkent University, Ankara, Turkey
- Michael H. Herzog
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
25
Solway A, Lohrenz T, Montague PR. Simulating future value in intertemporal choice. Sci Rep 2017; 7:43119. [PMID: 28225034] [PMCID: PMC5320483] [DOI: 10.1038/srep43119]
Abstract
The laboratory study of how humans and other animals trade off value and time has a long and storied history, and is the subject of a vast literature. However, despite this long history of study, there is no agreed-upon mechanistic explanation of how intertemporal choice preferences arise. Several theorists have recently proposed model-based reinforcement learning as a candidate framework. This framework describes a suite of algorithms by which a model of the environment, in the form of a state transition function and reward function, can be converted on-line into a decision. The state transition function allows the model-based system to make decisions based on projected future states, while the reward function assigns value to each state, together capturing the necessary components for successful intertemporal choice. Empirical work has also pointed to a possible relationship between increased prospection and reduced discounting. In the current paper, we look for direct evidence of a relationship between temporal discounting and model-based control in a large new data set (n = 168). However, testing the relationship under several different modeling formulations revealed no indication that the two quantities are related.
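As a concrete illustration of the candidate mechanism described above, a model-based system can evaluate a delayed reward by stepping its transition model forward and discounting at each projected step; intertemporal preference then falls out of comparing projected values. A minimal sketch under simple exponential discounting (names and parameters are illustrative, not the paper's fitted model, which also considers hyperbolic forms):

```python
def projected_value(reward, delay, gamma=0.95):
    """Simulate a 'wait' transition `delay` times, applying the discount
    factor at each projected step, then collect the reward."""
    value = float(reward)
    for _ in range(delay):
        value *= gamma
    return value

def intertemporal_choice(options, gamma=0.95):
    """Choose the (reward, delay) option with the highest projected value."""
    return max(options, key=lambda opt: projected_value(*opt, gamma=gamma))
```

A patient agent (gamma = 0.95) waits five steps for the larger-later reward, while a steep discounter (gamma = 0.8) takes the smaller-sooner one, reproducing the classic trade-off in a few lines.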
Affiliation(s)
- Alec Solway
- Virginia Tech Carilion Research Institute, Roanoke, VA, USA
- Terry Lohrenz
- Virginia Tech Carilion Research Institute, Roanoke, VA, USA
- P Read Montague
- Virginia Tech Carilion Research Institute, Roanoke, VA, USA
- Department of Physics, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Wellcome Trust Centre for Neuroimaging, University College London, London, UK
26
Kaplan R, King J, Koster R, Penny WD, Burgess N, Friston KJ. The Neural Representation of Prospective Choice during Spatial Planning and Decisions. PLoS Biol 2017; 15:e1002588. [PMID: 28081125] [PMCID: PMC5231323] [DOI: 10.1371/journal.pbio.1002588]
Abstract
We are remarkably adept at inferring the consequences of our actions, yet the neuronal mechanisms that allow us to plan a sequence of novel choices remain unclear. We used functional magnetic resonance imaging (fMRI) to investigate how the human brain plans the shortest path to a goal in novel mazes with one (shallow maze) or two (deep maze) choice points. We observed two distinct anterior prefrontal responses to demanding choices at the second choice point: one in rostrodorsal medial prefrontal cortex (rd-mPFC)/superior frontal gyrus (SFG) that was also sensitive to (deactivated by) demanding initial choices and another in lateral frontopolar cortex (lFPC), which was only engaged by demanding choices at the second choice point. Furthermore, we identified hippocampal responses during planning that correlated with subsequent choice accuracy and response time, particularly in mazes affording sequential choices. Psychophysiological interaction (PPI) analyses showed that coupling between the hippocampus and rd-mPFC increases during sequential (deep versus shallow) planning and is higher before correct versus incorrect choices. In short, using a naturalistic spatial planning paradigm, we reveal how the human brain represents sequential choices during planning without extensive training. Our data highlight a network centred on the cortical midline and hippocampus that allows us to make prospective choices while maintaining initial choices during planning in novel environments.

Author summary: Using neuroimaging and computational modelling, this study explains how the human brain represents initial versus subsequent choices during spatial planning in novel environments. We are remarkably adept at inferring the consequences of our actions, even in novel situations. However, the neuronal mechanisms that allow us to plan a sequence of novel choices remain a mystery. One hypothesis is that anterior prefrontal brain regions can jump ahead from an initial decision to evaluate subsequent choices. Here, we examine how the brain represents initial versus subsequent choices of varying difficulty during spatial planning in novel environments. Specifically, participants visually searched for the shortest path to a goal in pictures of novel mazes that contained one or two path junctions. We monitored the participants' brain activity during the task with functional magnetic resonance imaging (fMRI). We observed, in the anterior prefrontal brain, two distinct responses to demanding choices at the second junction: one in the rostrodorsal medial prefrontal cortex (rd-mPFC), which also signalled less demanding initial choices, and another one in the lateral frontopolar cortex (lFPC), which was only engaged by demanding choices at the second junction. Notably, interactions of the rd-mPFC with the hippocampus, a region associated with memory, increased when planning required extensive deliberation and particularly when planning led to accurate choices. Our findings show how humans can rapidly formulate a plan in novel environments. More broadly, these data uncover potential neural mechanisms underlying how we make inferences about states beyond a current subjective state.
Affiliation(s)
- Raphael Kaplan
- Wellcome Trust Centre for Neuroimaging, UCL Institute of Neurology, University College London, London, United Kingdom
- John King
- UCL Institute of Cognitive Neuroscience, University College London, London, United Kingdom
- Clinical, Education and Health Psychology, University College London, London, United Kingdom
- Raphael Koster
- Wellcome Trust Centre for Neuroimaging, UCL Institute of Neurology, University College London, London, United Kingdom
- UCL Institute of Cognitive Neuroscience, University College London, London, United Kingdom
- William D. Penny
- Wellcome Trust Centre for Neuroimaging, UCL Institute of Neurology, University College London, London, United Kingdom
- Neil Burgess
- Wellcome Trust Centre for Neuroimaging, UCL Institute of Neurology, University College London, London, United Kingdom
- UCL Institute of Cognitive Neuroscience, University College London, London, United Kingdom
- UCL Institute of Neurology, University College London, London, United Kingdom
- Karl J. Friston
- Wellcome Trust Centre for Neuroimaging, UCL Institute of Neurology, University College London, London, United Kingdom
27
Gershman SJ, Daw ND. Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework. Annu Rev Psychol 2017; 68:101-128.
Abstract
We review the psychology and neuroscience of reinforcement learning (RL), which has experienced significant progress in the past two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, one challenge in the study of RL is computational: The simplicity of these tasks ignores important aspects of reinforcement learning in the real world: (a) State spaces are high-dimensional, continuous, and partially observable; this implies that (b) data are relatively sparse and, indeed, precisely the same situation may never be encountered twice; furthermore, (c) rewards depend on the long-term consequences of actions in ways that violate the classical assumptions that make RL tractable. A seemingly distinct challenge is that, cognitively, theories of RL have largely involved procedural and semantic memory, the way in which knowledge about action values or world models extracted gradually from many experiences can drive choice. This focus on semantic memory leaves out many aspects of memory, such as episodic memory, related to the traces of individual events. We suggest that these two challenges are related. The computational challenge can be dealt with, in part, by endowing RL systems with episodic memory, allowing them to (a) efficiently approximate value functions over complex state spaces, (b) learn with very little data, and (c) bridge long-term dependencies between actions and rewards. We review the computational theory underlying this proposal and the empirical evidence to support it. Our proposal suggests that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system.
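The proposal that episodic memory can approximate a value function over complex state spaces is often formalized as similarity-weighted averaging over stored episodes. A minimal sketch with one-dimensional states for clarity (function names, the RBF similarity, and the length scale are illustrative assumptions, not a specific model from the review):

```python
import math

def episodic_value(state, episodes, length=1.0):
    """Estimate a state's value as a similarity-weighted average of the
    rewards attached to individually stored (state, reward) episodes,
    instead of reading a gradually learned value table."""
    weights = [math.exp(-((state - s) ** 2) / (2 * length ** 2))
               for s, _ in episodes]
    return sum(w * r for w, (_, r) in zip(weights, episodes)) / sum(weights)
```

Because estimates interpolate between memories, even a single pair of episodes generalizes: a query state midway between a rewarded and an unrewarded episode is valued at their average, illustrating how episodic traces support learning from very little data.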
Affiliation(s)
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138
- Nathaniel D Daw
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, New Jersey 08544