1
Higashi H. Dynamics of visual attention in exploration and exploitation for reward-guided adjustment tasks. Conscious Cogn 2024; 123:103724. [PMID: 38996747 DOI: 10.1016/j.concog.2024.103724] [Received: 03/05/2024] [Revised: 06/24/2024] [Accepted: 06/26/2024] [Indexed: 07/14/2024]
Abstract
The learning process encompasses exploration and exploitation phases. While reinforcement learning models have revealed functional and neuroscientific distinctions between these phases, knowledge regarding how they affect visual attention while observing the external environment is limited. This study sought to elucidate the interplay between these learning phases and visual attention allocation using visual adjustment tasks combined with a two-armed bandit problem tailored to detect serial effects only when attention is dispersed across both arms. Per our findings, human participants exhibited a distinct serial effect only during the exploration phase, suggesting enhanced attention to the visual stimulus associated with the non-target arm. Remarkably, although rewards did not motivate attention dispersion in our task, during the exploration phase, individuals engaged in active observation and searched for targets to observe. This behavior highlights a unique information-seeking process in exploration that is distinct from exploitation.
Affiliation(s)
- Hiroshi Higashi
- Graduate School of Engineering, Osaka University, Suita, Osaka, Japan.
2
Allen K, Brändle F, Botvinick M, Fan JE, Gershman SJ, Gopnik A, Griffiths TL, Hartshorne JK, Hauser TU, Ho MK, de Leeuw JR, Ma WJ, Murayama K, Nelson JD, van Opheusden B, Pouncy T, Rafner J, Rahwan I, Rutledge RB, Sherson J, Şimşek Ö, Spiers H, Summerfield C, Thalmann M, Vélez N, Watrous AJ, Tenenbaum JB, Schulz E. Using games to understand the mind. Nat Hum Behav 2024; 8:1035-1043. [PMID: 38907029 DOI: 10.1038/s41562-024-01878-9] [Received: 06/15/2022] [Accepted: 04/03/2024] [Indexed: 06/23/2024]
Abstract
Board, card or video games have been played by virtually every individual in the world. Games are popular because they are intuitive and fun. These distinctive qualities of games also make them ideal for studying the mind. By being intuitive, games provide a unique vantage point for understanding the inductive biases that support behaviour in more complex, ecological settings than traditional laboratory experiments. By being fun, games allow researchers to study new questions in cognition such as the meaning of 'play' and intrinsic motivation, while also supporting more extensive and diverse data collection by attracting many more participants. We describe the advantages and drawbacks of using games relative to standard laboratory-based experiments and lay out a set of recommendations on how to gain the most from using games to study cognition. We hope this Perspective will lead to a wider use of games as experimental paradigms, elevating the ecological validity, scale and robustness of research on the mind.
Affiliation(s)
- Alison Gopnik
- University of California, Berkeley, Berkeley, CA, USA
- Tobias U Hauser
- University College London, London, UK
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, UK
- University of Tübingen, Tübingen, Germany
- Mark K Ho
- Princeton University, Princeton, NJ, USA
- Wei Ji Ma
- New York University, New York, NY, USA
- Iyad Rahwan
- Center for Humans and Machines, Max Planck Institute for Human Development, Berlin, Germany
- Mirko Thalmann
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- Eric Schulz
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany.
3
Schurr R, Reznik D, Hillman H, Bhui R, Gershman SJ. Dynamic computational phenotyping of human cognition. Nat Hum Behav 2024; 8:917-931. [PMID: 38332340 PMCID: PMC11132988 DOI: 10.1038/s41562-024-01814-x] [Received: 09/30/2023] [Accepted: 12/21/2023] [Indexed: 02/10/2024]
Abstract
Computational phenotyping has emerged as a powerful tool for characterizing individual variability across a variety of cognitive domains. An individual's computational phenotype is defined as a set of mechanistically interpretable parameters obtained from fitting computational models to behavioural data. However, the interpretation of these parameters hinges critically on their psychometric properties, which are rarely studied. To identify the sources governing the temporal variability of the computational phenotype, we carried out a 12-week longitudinal study using a battery of seven tasks that measure aspects of human learning, memory, perception and decision making. To examine the influence of state effects, each week, participants provided reports tracking their mood, habits and daily activities. We developed a dynamic computational phenotyping framework, which allowed us to tease apart the time-varying effects of practice and internal states such as affective valence and arousal. Our results show that many phenotype dimensions covary with practice and affective factors, indicating that what appears to be unreliability may reflect previously unmeasured structure. These results support a fundamentally dynamic understanding of cognitive variability within an individual.
Affiliation(s)
- Roey Schurr
- Department of Psychology, Center for Brain Sciences, Harvard University, Cambridge, MA, USA.
- Daniel Reznik
- Department of Psychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.
- Hanna Hillman
- Department of Psychology, Yale University, New Haven, CT, USA
- Rahul Bhui
- Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA
- Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA, USA
- Samuel J Gershman
- Department of Psychology, Center for Brain Sciences, Harvard University, Cambridge, MA, USA
- Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
4
Lloyd A, Roiser JP, Skeen S, Freeman Z, Badalova A, Agunbiade A, Busakhwe C, DeFlorio C, Marcu A, Pirie H, Saleh R, Snyder T, Fearon P, Viding E. Reviewing explore/exploit decision-making as a transdiagnostic target for psychosis, depression, and anxiety. Cogn Affect Behav Neurosci 2024:10.3758/s13415-024-01186-9. [PMID: 38653937 DOI: 10.3758/s13415-024-01186-9] [Accepted: 03/27/2024] [Indexed: 04/25/2024]
Abstract
In many everyday decisions, individuals choose between trialling something novel or something they know well. Deciding when to try a new option or stick with an option that is already known to you, known as the "explore/exploit" dilemma, is an important feature of cognition that characterises a range of decision-making contexts encountered by humans. Recent evidence has suggested preferences in explore/exploit biases are associated with psychopathology, although this has typically been examined within individual disorders. The current review examined whether explore/exploit decision-making represents a promising transdiagnostic target for psychosis, depression, and anxiety. A systematic search of academic databases was conducted, yielding a total of 29 studies. Studies examining psychosis were mostly consistent in showing that individuals with psychosis explored more compared with individuals without psychosis. The literature on anxiety and depression was more heterogenous; some studies found that anxiety and depression were associated with more exploration, whereas other studies demonstrated reduced exploration in anxiety and depression. However, examining a subset of studies that employed case-control methods, there was some evidence that both anxiety and depression also were associated with increased exploration. Due to the heterogeneity across the literature, we suggest that there is insufficient evidence to conclude whether explore/exploit decision-making is a transdiagnostic target for psychosis, depression, and anxiety. However, alongside our advisory groups of lived experience advisors, we suggest that this context of decision-making is a promising candidate that merits further investigation using well-powered, longitudinal designs. Such work also should examine whether biases in explore/exploit choices are amenable to intervention.
Affiliation(s)
- Alex Lloyd
- Clinical, Educational and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK.
- Jonathan P Roiser
- Institute of Cognitive Neuroscience, University College London, London, UK
- Sarah Skeen
- Institute for Life Course Health Research, Stellenbosch University, Stellenbosch, South Africa
- Ze Freeman
- Department of Psychology, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Aygun Badalova
- Institute of Neurology, University College London, London, UK
- Anna Marcu
- Young People's Advisor Group, London, UK
- Pasco Fearon
- Clinical, Educational and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK
- Centre for Family Research, Department of Psychology, University of Cambridge, Cambridge, UK
- Essi Viding
- Clinical, Educational and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK
5
Arumugam D, Ho MK, Goodman ND, Van Roy B. Bayesian Reinforcement Learning With Limited Cognitive Load. Open Mind (Camb) 2024; 8:395-438. [PMID: 38665544 PMCID: PMC11045037 DOI: 10.1162/opmi_a_00132] [Received: 04/29/2023] [Accepted: 02/16/2024] [Indexed: 04/28/2024]
Abstract
All biological and artificial agents must act given limits on their ability to acquire and process information. As such, a general theory of adaptive behavior should be able to account for the complex interactions between an agent's learning history, decisions, and capacity constraints. Recent work in computer science has begun to clarify the principles that shape these dynamics by bridging ideas from reinforcement learning, Bayesian decision-making, and rate-distortion theory. This body of work provides an account of capacity-limited Bayesian reinforcement learning, a unifying normative framework for modeling the effect of processing constraints on learning and action selection. Here, we provide an accessible review of recent algorithms and theoretical results in this setting, paying special attention to how these ideas can be applied to studying questions in the cognitive and behavioral sciences.
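Thompson sampling (posterior sampling) is a canonical Bayesian reinforcement learning algorithm and a natural unconstrained baseline for the capacity-limited setting reviewed here. As a point of reference, here is a minimal sketch of it on a two-armed Bernoulli bandit, assuming uniform Beta(1, 1) priors and fixed reward probabilities. It does not implement the rate-distortion constraint the review describes, and the function name is our own.

```python
import random

def thompson_sampling(true_probs, n_trials, seed=0):
    """Bernoulli bandit with Beta(1, 1) priors on each arm's reward probability.

    Returns the per-arm [alpha, beta] posterior parameters and the choice history.
    Illustrative sketch only; not the capacity-limited algorithms of the review.
    """
    rng = random.Random(seed)
    # Beta(1, 1) is a uniform prior over each arm's success probability
    posteriors = [[1, 1] for _ in true_probs]
    choices = []
    for _ in range(n_trials):
        # Draw one plausible success rate per arm from its current posterior...
        samples = [rng.betavariate(a, b) for a, b in posteriors]
        # ...and act greedily with respect to that single draw
        arm = max(range(len(samples)), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: a success increments alpha, a failure increments beta
        posteriors[arm][0] += reward
        posteriors[arm][1] += 1 - reward
        choices.append(arm)
    return posteriors, choices

posteriors, choices = thompson_sampling([0.3, 0.7], n_trials=500)
print(posteriors)
print("share of choices on arm 1:", sum(choices) / len(choices))
```

Each trial acts greedily on a single posterior draw, so exploration tapers off on its own as the posteriors sharpen around the better arm.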
Affiliation(s)
- Mark K. Ho
- Center for Data Science, New York University
- Noah D. Goodman
- Department of Computer Science, Stanford University
- Department of Psychology, Stanford University
- Benjamin Van Roy
- Department of Electrical Engineering, Stanford University
- Department of Management Science & Engineering, Stanford University
6
Claydon J, James WRG, Clarke ADF, Hunt AR. The role of framing, agency and uncertainty in a focus-divide dilemma. Mem Cognit 2024; 52:574-594. [PMID: 37922110 PMCID: PMC11021327 DOI: 10.3758/s13421-023-01484-6] [Accepted: 10/13/2023] [Indexed: 11/05/2023]
Abstract
How to prioritise multiple objectives is a common dilemma of daily life. A simple and effective decision rule is to focus resources when the tasks are difficult, and divide when tasks are easy. Nonetheless, in experimental paradigms of this dilemma, participants make highly variable and suboptimal strategic decisions when asked to allocate resources to two competing goals that vary in difficulty. We developed a new version in which participants had to choose where to park a fire truck between houses of varying distances apart. Unlike in the previous versions of the dilemma, participants approached the optimal strategy in this task. Three key differences between the fire truck version and previous versions of the task were investigated: (1) Framing (whether the objectives are familiar or abstract), by comparing a group who placed cartoon trucks between houses to a group performing the same task with abstract shapes; (2) Agency (how much of the task is under the participants' direct control), by comparing groups who controlled the movement of the truck to those who did not; (3) Uncertainty, by adding variability to the driving speed of the truck to make success or failure on a given trial more difficult to predict. Framing and agency did not influence strategic decisions. When adding variability to outcomes, however, decisions shifted away from optimal. The results suggest choices become more variable when the outcome is less certain, consistent with exploration of response alternatives triggered by an inability to predict success.
Affiliation(s)
- Justin Claydon
- School of Psychology, University of Aberdeen, Aberdeen, AB24 3UB, UK.
- Warren R G James
- School of Medical Sciences, University of Aberdeen, Aberdeen, UK
- Amelia R Hunt
- School of Psychology, University of Aberdeen, Aberdeen, AB24 3UB, UK
7
Paunov A, L'Hôtellier M, Guo D, He Z, Yu A, Meyniel F. Multiple and subject-specific roles of uncertainty in reward-guided decision-making. bioRxiv 2024:2024.03.27.587016. [PMID: 38585958 PMCID: PMC10996615 DOI: 10.1101/2024.03.27.587016] [Indexed: 04/09/2024]
Abstract
Decision-making in noisy, changing, and partially observable environments entails a basic tradeoff between immediate reward and longer-term information gain, known as the exploration-exploitation dilemma. Computationally, an effective way to balance this tradeoff is by leveraging uncertainty to guide exploration. Yet, in humans, empirical findings are mixed, from suggesting uncertainty-seeking to indifference and avoidance. In a novel bandit task that better captures uncertainty-driven behavior, we find multiple roles for uncertainty in human choices. First, stable and psychologically meaningful individual differences in uncertainty preferences actually range from seeking to avoidance, which can manifest as null group-level effects. Second, uncertainty modulates the use of basic decision heuristics that imperfectly exploit immediate rewards: a repetition bias and win-stay-lose-shift heuristic. These heuristics interact with uncertainty, favoring heuristic choices under higher uncertainty. These results, highlighting the rich and varied structure of reward-based choice, are a step to understanding its functional basis and dysfunction in psychopathology.
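The ingredients named in this abstract (an uncertainty term whose sign can differ between individuals, a repetition bias, and a win-stay-lose-shift heuristic) can be combined in a single softmax choice rule. The sketch below is a generic illustration, not the authors' model: the parameter names `beta`, `phi`, `rho` and `omega` and the additive way they are combined are assumptions made for exposition.

```python
import math

def choice_probabilities(means, sds, prev_choice, prev_won,
                         beta=3.0, phi=0.0, rho=0.0, omega=0.0):
    """Softmax over arms combining value, uncertainty and two heuristics.

    means, sds : current value estimate and uncertainty (e.g. posterior SD) per arm
    beta       : inverse temperature; higher means more deterministic choices
    phi        : uncertainty weight; > 0 seeks uncertainty, < 0 avoids it
    rho        : repetition bias toward the previously chosen arm
    omega      : win-stay-lose-shift weight (stay after a win, shift after a loss)
    All parameter names are hypothetical, not taken from the paper.
    """
    utilities = []
    for arm, (m, s) in enumerate(zip(means, sds)):
        u = m + phi * s
        if arm == prev_choice:
            u += rho
            u += omega if prev_won else -omega
        utilities.append(beta * u)
    # Subtract the maximum before exponentiating for numerical stability
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Two arms with equal estimated value; arm 1 is more uncertain.
means, sds = [0.5, 0.5], [0.05, 0.25]
print(choice_probabilities(means, sds, prev_choice=0, prev_won=True, phi=1.0))
print(choice_probabilities(means, sds, prev_choice=0, prev_won=True, phi=-1.0))
```

With identical value estimates, a positive `phi` favours the more uncertain arm and a negative `phi` favours the other one by the same margin, so opposite-signed individual preferences can average out to a null group-level effect, in line with the abstract.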
Affiliation(s)
- Alexander Paunov
- INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France
- Institut de Neuromodulation, GHU Paris, Psychiatrie et Neurosciences, Centre Hospitalier Sainte-Anne, Pôle Hospitalo-universitaire 15, Université Paris Cité, Paris, France
- Maëva L'Hôtellier
- INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France
- Dalin Guo
- Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
- Zoe He
- Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
- Angela Yu
- Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
- Centre for Cognitive Science & Hessian AI Center, Technical University of Darmstadt, Germany
- Florent Meyniel
- INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France
- Institut de Neuromodulation, GHU Paris, Psychiatrie et Neurosciences, Centre Hospitalier Sainte-Anne, Pôle Hospitalo-universitaire 15, Université Paris Cité, Paris, France
8
Olschewski S, Scheibehenne B. What's in a sample? Epistemic uncertainty and metacognitive awareness in risk taking. Cogn Psychol 2024; 149:101642. [PMID: 38401485 DOI: 10.1016/j.cogpsych.2024.101642] [Received: 05/12/2023] [Revised: 02/01/2024] [Accepted: 02/13/2024] [Indexed: 02/26/2024]
Abstract
In a fundamentally uncertain world, sound information processing is a prerequisite for effective behavior. Given that information processing is subject to inevitable cognitive imprecision, decision makers should adapt to this imprecision and to the resulting epistemic uncertainty when taking risks. We tested this metacognitive ability in two experiments in which participants estimated the expected value of different number distributions from sequential samples and then bet on their own estimation accuracy. Results show that estimates were imprecise, and this imprecision increased with higher distributional standard deviations. Importantly, participants adapted their risk-taking behavior to this imprecision and hence deviated from the predictions of Bayesian models of uncertainty that assume perfect integration of information. To explain these results, we developed a computational model that combines Bayesian updating with a metacognitive awareness of cognitive imprecision in the integration of information. Modeling results were robust to the inclusion of an empirical measure of participants' perceived variability. In sum, we show that cognitive imprecision is crucial to understanding risk taking in decisions from experience. The results further demonstrate the importance of metacognitive awareness as a cognitive building block for adaptive behavior under (partial) uncertainty.
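The contrast between perfect Bayesian integration and integration by an agent that knows its own processing is imprecise can be illustrated with a normal-normal (Kalman-style) update of a belief about a distribution's mean. This is a minimal sketch under our own assumptions, not the authors' computational model: Gaussian samples with known variance, a hypothetical internal-noise term `noise_sd` added to each encoded sample, and a hypothetical flag `aware` for whether the agent accounts for that noise.

```python
import random

def update_mean_belief(samples, prior_mean=0.0, prior_var=100.0,
                       sample_var=1.0, noise_sd=0.0, aware=True, seed=0):
    """Sequential normal-normal updating of a belief about a distribution's mean.

    noise_sd : SD of internal (encoding) noise added to each sample (assumption)
    aware    : if True, the agent inflates its likelihood variance by its own
               noise; if False, it assumes perfect encoding
    Returns (posterior_mean, posterior_var).
    """
    rng = random.Random(seed)
    mean, var = prior_mean, prior_var
    # Variance the agent *believes* each observation carries
    assumed_var = sample_var + (noise_sd ** 2 if aware else 0.0)
    for x in samples:
        # Corrupt the sample with internal noise, if any
        encoded = x + rng.gauss(0.0, noise_sd) if noise_sd > 0 else x
        # Precision-weighted combination of current belief and new observation
        gain = var / (var + assumed_var)
        mean = mean + gain * (encoded - mean)
        var = (1 - gain) * var
    return mean, var

samples = [4.0, 6.0, 5.0, 5.5, 4.5]
print(update_mean_belief(samples))                             # ideal observer
print(update_mean_belief(samples, noise_sd=1.0))               # imprecise, aware
print(update_mean_belief(samples, noise_sd=1.0, aware=False))  # imprecise, unaware
```

In this sketch the posterior variance does not depend on the sample values. The aware agent ends with a wider posterior than the ideal observer, a natural cue to bet less on its own estimate, whereas the unaware agent reports the ideal observer's variance while its mean is corrupted by noise, i.e. it is overconfident.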
Affiliation(s)
- Sebastian Olschewski
- Department of Psychology, University of Basel, Switzerland; Warwick Business School, University of Warwick, United Kingdom.
9
Olschewski S, Luckman A, Mason A, Ludvig EA, Konstantinidis E. The Future of Decisions From Experience: Connecting Real-World Decision Problems to Cognitive Processes. Perspect Psychol Sci 2024; 19:82-102. [PMID: 37390328 PMCID: PMC10790535 DOI: 10.1177/17456916231179138] [Indexed: 07/02/2023]
Abstract
In many important real-world decision domains, such as finance, the environment, and health, behavior is strongly influenced by experience. Renewed interest in studying this influence led to important advancements in the understanding of these decisions from experience (DfE) in the last 20 years. Building on this literature, we suggest ways the standard experimental design should be extended to better approach important real-world DfE. These extensions include, for example, introducing more complex choice situations, delaying feedback, and including social interactions. When acting upon experiences in these richer and more complicated environments, extensive cognitive processes go into making a decision. Therefore, we argue for integrating cognitive processes more explicitly into experimental research in DfE. These cognitive processes include attention to and perception of numeric and nonnumeric experiences, the influence of episodic and semantic memory, and the mental models involved in learning processes. Understanding these basic cognitive processes can advance the modeling, understanding and prediction of DfE in the laboratory and in the real world. We highlight the potential of experimental research in DfE for theory integration across the behavioral, decision, and cognitive sciences. Furthermore, this research could lead to new methodology that better informs decision-making and policy interventions.
Affiliation(s)
- Sebastian Olschewski
- Department of Psychology, University of Basel
- Warwick Business School, University of Warwick
- Ashley Luckman
- Warwick Business School, University of Warwick
- University of Exeter Business School, University of Exeter
- Alice Mason
- Department of Psychology, University of Bath
- Department of Psychology, University of Warwick
10
Pouchon A, Vinckier F, Dondé C, Gueguen MC, Polosan M, Bastin J. Reward and punishment learning deficits among bipolar disorder subtypes. J Affect Disord 2023; 340:694-702. [PMID: 37591352 DOI: 10.1016/j.jad.2023.08.075] [Received: 01/24/2023] [Revised: 07/24/2023] [Accepted: 08/14/2023] [Indexed: 08/19/2023]
Abstract
BACKGROUND: Reward sensitivity is an essential dimension related to mood fluctuations in bipolar disorder (BD), but there is currently a debate around hypersensitivity or hyposensitivity hypotheses to reward in BD during remission, probably related to a heterogeneous population within the BD spectrum and a lack of reward bias evaluation. Here, we examine reward maximization vs. punishment avoidance learning within the BD spectrum during remission.
METHODS: Patients with BD-I (n = 45), BD-II (n = 34) and matched (n = 30) healthy controls (HC) were included. They performed an instrumental learning task designed to dissociate reward-based from punishment-based reinforcement learning. Computational modeling was used to identify the mechanisms underlying reinforcement learning performances.
RESULTS: Behavioral results showed a significant reward learning deficit across BD subtypes compared to HC, captured at the computational level by a lower sensitivity to rewards compared to punishments in both BD subtypes. Computational modeling also revealed a higher choice randomness in BD-II compared to BD-I that reflected a tendency of BD-I to perform better during punishment avoidance learning than BD-II.
LIMITATIONS: Our patients were not naive to antipsychotic treatment and were not euthymic (but in syndromic remission) according to the International Society for Bipolar Disorder definition.
CONCLUSIONS: Our results are consistent with the reward hyposensitivity theory in BD. Computational modeling suggests distinct underlying mechanisms that produce similar observable behaviors, making it a useful tool for distinguishing how symptoms interact in BD versus other disorders. In the long run, a better understanding of these processes could contribute to better prevention and management of BD.
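The computational account in this abstract (a lower sensitivity to rewards than to punishments, plus a choice-randomness parameter) belongs to a common family of instrumental learning models. A minimal sketch, assuming a Rescorla-Wagner/Q-learning update with outcome-specific sensitivities `r_sens` and `p_sens` and a softmax inverse temperature `beta`; these names are illustrative, and the model actually fitted in the paper may differ.

```python
import math

def softmax(values, beta):
    """Choice probabilities; a lower beta means more random choices."""
    top = max(values)
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def update_q(q, choice, outcome, alpha=0.3, r_sens=1.0, p_sens=1.0):
    """One learning step with separate reward and punishment sensitivities.

    outcome > 0 is a reward, outcome < 0 a punishment, 0 a neutral outcome.
    The outcome is scaled by the relevant sensitivity before computing the
    prediction error. Returns a new list of values (q is not modified).
    """
    scale = r_sens if outcome > 0 else p_sens
    prediction_error = scale * outcome - q[choice]
    q = list(q)
    q[choice] += alpha * prediction_error
    return q

# Reward context: option 0 always yields +1. Compare two reward sensitivities.
for r_sens in (1.0, 0.5):
    q = [0.0, 0.0]
    for _ in range(20):
        q = update_q(q, choice=0, outcome=1.0, r_sens=r_sens)
    print(r_sens, [round(v, 3) for v in q], softmax(q, beta=3.0))
```

After 20 rewarded trials, halving `r_sens` halves the learned value of the rewarded option and makes choosing it less consistent under the same `beta`. Lowering `beta` would also reduce consistency, which is one reason tasks that contrast reward and punishment contexts help separate outcome sensitivity from choice randomness.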
Affiliation(s)
- Arnaud Pouchon
- Univ. Grenoble Alpes, Inserm, U1216, CHU Grenoble Alpes, Grenoble Institut Neurosciences, 38000 Grenoble, France; Department of Psychiatry, CHU Grenoble Alpes, 38000 Grenoble, France.
- Fabien Vinckier
- Motivation, Brain & Behavior (MBB) lab, Institut du Cerveau (ICM), Hôpital Pitié-Salpêtrière, F-75013 Paris, France; Université Paris Cité, F-75006 Paris, France; Department of Psychiatry, Service Hospitalo-Universitaire, GHU Paris Psychiatrie & Neurosciences, F-75014 Paris, France
- Clément Dondé
- Univ. Grenoble Alpes, Inserm, U1216, CHU Grenoble Alpes, Grenoble Institut Neurosciences, 38000 Grenoble, France; Department of Psychiatry, CHU Grenoble Alpes, 38000 Grenoble, France; Department of Psychiatry, CH Alpes-Isère, 38000 Saint-Egrève, France
- Maëlle Cm Gueguen
- Department of Psychiatry, University Behavioral Health Care & the Brain Health Institute, Rutgers University-New Brunswick, Piscataway, USA; Laureate Institute for Brain Research, Tulsa, OK 74136 USA
- Mircea Polosan
- Univ. Grenoble Alpes, Inserm, U1216, CHU Grenoble Alpes, Grenoble Institut Neurosciences, 38000 Grenoble, France; Department of Psychiatry, CHU Grenoble Alpes, 38000 Grenoble, France
- Julien Bastin
- Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut Neurosciences, 38000 Grenoble, France.
11
Lloyd A, Viding E, McKay R, Furl N. Understanding patch foraging strategies across development. Trends Cogn Sci 2023; 27:1085-1098. [PMID: 37500422 DOI: 10.1016/j.tics.2023.07.004] [Received: 03/30/2023] [Revised: 07/05/2023] [Accepted: 07/06/2023] [Indexed: 07/29/2023]
Abstract
Patch foraging is a near-ubiquitous behaviour across the animal kingdom and characterises many decision-making domains encountered by humans. We review how a disposition to explore in adolescence may reflect the evolutionary conditions under which hunter-gatherers foraged for resources. We propose that neurocomputational mechanisms responsible for reward processing, learning, and cognitive control facilitate the transition from exploratory strategies in adolescence to exploitative strategies in adulthood, where individuals capitalise on known resources. This developmental transition may be disrupted by psychopathology, as there is emerging evidence of biases in explore/exploit choices in mental health problems. Explore/exploit choices may be an informative marker for mental health across development and future research should consider this feature of decision-making as a target for clinical intervention.
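Patch-leaving decisions are commonly benchmarked against the marginal value theorem: leave a patch when its instantaneous reward rate drops to the long-run average rate available in the environment. The sketch below is not taken from the review; it assumes exponentially depleting patches and a fixed travel time between patches, and all function names and parameter values are hypothetical, chosen only to show how a longer travel time pushes the optimal leaving time later.

```python
import math

def patch_gain(t, r0=10.0, decay=0.5):
    """Cumulative reward after t time units in an exponentially depleting patch."""
    return r0 / decay * (1.0 - math.exp(-decay * t))

def intake_rate(t, r0=10.0, decay=0.5):
    """Instantaneous reward rate at time t in the patch (derivative of patch_gain)."""
    return r0 * math.exp(-decay * t)

def optimal_residence_time(travel_time, r0=10.0, decay=0.5, step=1e-3, t_max=50.0):
    """Grid search for the residence time maximizing the long-run reward rate
    gain / (residence + travel). At that optimum the marginal value theorem
    predicts instantaneous rate == long-run average rate."""
    best_t, best_rate = 0.0, 0.0
    for i in range(1, int(t_max / step) + 1):
        t = i * step
        rate = patch_gain(t, r0, decay) / (t + travel_time)
        if rate > best_rate:
            best_t, best_rate = t, rate
    return best_t, best_rate

for travel in (1.0, 5.0):
    t_star, avg_rate = optimal_residence_time(travel)
    print(f"travel={travel}: leave after {t_star:.2f}, "
          f"instantaneous rate {intake_rate(t_star):.3f} vs average {avg_rate:.3f}")
```

With these made-up parameters, the longer travel time yields a later leaving time, and at each optimum the instantaneous and average rates agree to within the grid resolution. An exploratory bias in this framing would appear as systematically leaving patches earlier than the benchmark.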
Affiliation(s)
- Alex Lloyd
- Clinical, Educational, and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK.
- Essi Viding
- Clinical, Educational, and Health Psychology, Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK
- Ryan McKay
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX, UK
- Nicholas Furl
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX, UK
12
Brändle F, Stocks LJ, Tenenbaum JB, Gershman SJ, Schulz E. Empowerment contributes to exploration behaviour in a creative video game. Nat Hum Behav 2023; 7:1481-1489. [PMID: 37488401 DOI: 10.1038/s41562-023-01661-2] [Received: 02/17/2022] [Accepted: 06/15/2023] [Indexed: 07/26/2023]
Abstract
Studies of human exploration frequently cast people as serendipitously stumbling upon good options. Yet these studies may not capture the richness of exploration strategies that people exhibit in more complex environments. Here we study behaviour in a large dataset of 29,493 players of the richly structured online game 'Little Alchemy 2'. In this game, players start with four elements, which they can combine to create up to 720 complex objects. We find that players are driven not only by external reward signals, such as an attempt to produce successful outcomes, but also by an intrinsic motivation to create objects that empower them to create even more objects. We find that this drive for empowerment is eliminated when playing a game variant that lacks recognizable semantics, indicating that people use their knowledge about the world and its possibilities to guide their exploration. Our results suggest that the drive for empowerment may be a potent source of intrinsic motivation in richly structured domains, particularly those that lack explicit reward signals.
Affiliation(s)
- Lena J Stocks
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- Joshua B Tenenbaum
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
- Samuel J Gershman
- Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA
- Eric Schulz
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
13
Sinclair AH, Wang YC, Adcock RA. Instructed motivational states bias reinforcement learning and memory formation. Proc Natl Acad Sci U S A 2023; 120:e2304881120. [PMID: 37490530 PMCID: PMC10401012 DOI: 10.1073/pnas.2304881120] [Received: 04/06/2023] [Accepted: 06/19/2023] [Indexed: 07/27/2023]
Abstract
Motivation influences goals, decisions, and memory formation. Imperative motivation links urgent goals to actions, narrowing the focus of attention and memory. Conversely, interrogative motivation integrates goals over time and space, supporting rich memory encoding for flexible future use. We manipulated motivational states via cover stories for a reinforcement learning task: The imperative group imagined executing a museum heist, whereas the interrogative group imagined planning a future heist. Participants repeatedly chose among four doors, representing different museum rooms, to sample trial-unique paintings with variable rewards (later converted to bonus payments). The next day, participants performed a surprise memory test. Crucially, only the cover stories differed between the imperative and interrogative groups; the reinforcement learning task was identical, and all participants had the same expectations about how and when bonus payments would be awarded. In an initial sample and a preregistered replication, we demonstrated that imperative motivation increased exploitation during reinforcement learning. Conversely, interrogative motivation increased directed (but not random) exploration, despite the cost to participants' earnings. At test, the interrogative group was more accurate at recognizing paintings and recalling associated values. In the interrogative group, higher value paintings were more likely to be remembered; imperative motivation disrupted this effect of reward modulating memory. Overall, we demonstrate that a prelearning motivational manipulation can bias learning and memory, bearing implications for education, behavior change, clinical interventions, and communication.
Affiliation(s)
- Alyssa H. Sinclair
- Department of Psychology & Neuroscience, Duke University, Durham, NC 27710
- Yuxi C. Wang
- Department of Psychology & Neuroscience, Duke University, Durham, NC 27710
- R. Alison Adcock
- Department of Psychology & Neuroscience, Duke University, Durham, NC 27710
- Department of Psychiatry & Behavioral Sciences, Duke University, Durham, NC 27710
14
Shourkeshti A, Marrocco G, Jurewicz K, Moore T, Ebitz RB. Pupil size predicts the onset of exploration in brain and behavior. bioRxiv: the preprint server for biology 2023:2023.05.24.541981. [PMID: 37292773 PMCID: PMC10245915 DOI: 10.1101/2023.05.24.541981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
In uncertain environments, intelligent decision-makers exploit actions that have been rewarding in the past, but also explore actions that could be even better. Several neuromodulatory systems are implicated in exploration, based, in part, on work linking exploration to pupil size, a peripheral correlate of neuromodulatory tone and index of arousal. However, pupil size could instead track variables that make exploration more likely, like volatility or reward, without directly predicting either exploration or its neural bases. Here, we simultaneously measured pupil size, exploration, and neural population activity in the prefrontal cortex while two rhesus macaques explored and exploited in a dynamic environment. We found that pupil size under constant luminance specifically predicted the onset of exploration, beyond what could be explained by reward history. Pupil size also predicted disorganized patterns of prefrontal neural activity at both the single neuron and population levels, even within periods of exploitation. Ultimately, our results support a model in which pupil-linked mechanisms promote the onset of exploration via driving the prefrontal cortex through a critical tipping point where prefrontal control dynamics become disorganized and exploratory decisions are possible.
Affiliation(s)
- Akram Shourkeshti
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
- Gabriel Marrocco
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
- Katarzyna Jurewicz
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
- Department of Physiology, McGill University, Montréal, QC, Canada
- Tirin Moore
- Department of Neurobiology, Stanford University School of Medicine, Stanford, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- R. Becket Ebitz
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
15
Turner MA, Moya C, Smaldino PE, Jones JH. The form of uncertainty affects selection for social learning. Evolutionary Human Sciences 2023; 5:e20. [PMID: 37587949 PMCID: PMC10426062 DOI: 10.1017/ehs.2023.11] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/24/2022] [Revised: 04/13/2023] [Accepted: 04/14/2023] [Indexed: 08/18/2023]
Abstract
Social learning is a critical adaptation for dealing with different forms of variability. Uncertainty is a severe form of variability where the space of possible decisions or probabilities of associated outcomes are unknown. We identified four theoretically important sources of uncertainty: temporal environmental variability; payoff ambiguity; selection-set size; and effective lifespan. When these combine, it is nearly impossible to fully learn about the environment. We develop an evolutionary agent-based model to test how each form of uncertainty affects the evolution of social learning. Agents perform one of several behaviours, modelled as a multi-armed bandit, to acquire payoffs. All agents learn about behavioural payoffs individually through an adaptive behaviour-choice model that uses a softmax decision rule. Use of vertical and oblique payoff-biased social learning evolved to serve as a scaffold for adaptive individual learning; they are not opposite strategies. Different types of uncertainty had varying effects. Temporal environmental variability suppressed social learning, whereas larger selection-set size promoted social learning, even when the environment changed frequently. Payoff ambiguity and lifespan interacted with other uncertainty parameters. This study begins to explain how social learning can predominate despite highly variable real-world environments when effective individual learning helps individuals recover from learning outdated social information.
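The abstract names a softmax decision rule for individual learning on a multi-armed bandit. A minimal Python sketch of that kind of individual learner follows, assuming a simple delta-rule (Rescorla-Wagner style) payoff update and Gaussian payoff noise; the update rule, parameter names, and defaults are illustrative assumptions, not the authors' agent-based model, and social learning is not modelled here.

```python
import math
import random

def softmax_probs(values, temperature):
    """Softmax choice probabilities; a lower temperature means more exploitation."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def run_individual_learner(payoff_means, n_trials=200, temperature=0.5,
                           learning_rate=0.1, noise_sd=1.0, seed=0):
    """Hypothetical individual learner: softmax choice plus delta-rule payoff estimates."""
    rng = random.Random(seed)
    estimates = [0.0] * len(payoff_means)
    choices = []
    for _ in range(n_trials):
        probs = softmax_probs(estimates, temperature)
        arm = rng.choices(range(len(estimates)), weights=probs)[0]
        payoff = rng.gauss(payoff_means[arm], noise_sd)          # noisy payoff draw
        estimates[arm] += learning_rate * (payoff - estimates[arm])  # delta-rule update
        choices.append(arm)
    return estimates, choices
```

Lowering `temperature` concentrates choices on the arm with the highest estimate; raising it spreads choices across arms, which is one simple way such models express random exploration.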
Affiliation(s)
- Matthew A. Turner
- Department of Earth System Science, Stanford University, Stanford, CA 94305 USA
- Division of Social Sciences, Stanford Doerr School of Sustainability, Stanford University, Stanford, CA 94305 USA
- Cristina Moya
- Department of Anthropology, University of California at Davis, Davis, CA 95616 USA
- Paul E. Smaldino
- Cognitive and Information Sciences, University of California at Merced, Merced, CA 95340 USA
- Santa Fe Institute, Santa Fe, NM 87501 USA
- Center for Advanced Study in the Behavioral Sciences, Stanford University, Stanford, CA 94305 USA
- James Holland Jones
- Department of Earth System Science, Stanford University, Stanford, CA 94305 USA
- Division of Social Sciences, Stanford Doerr School of Sustainability, Stanford University, Stanford, CA 94305 USA
- Center for Advanced Study in the Behavioral Sciences, Stanford University, Stanford, CA 94305 USA
16
Garvert MM, Saanum T, Schulz E, Schuck NW, Doeller CF. Hippocampal spatio-predictive cognitive maps adaptively guide reward generalization. Nat Neurosci 2023; 26:615-626. [PMID: 37012381 PMCID: PMC10076220 DOI: 10.1038/s41593-023-01283-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/18/2021] [Accepted: 02/15/2023] [Indexed: 04/05/2023]
Abstract
The brain forms cognitive maps of relational knowledge, an organizing principle thought to underlie our ability to generalize and make inferences. However, how can a relevant map be selected in situations where a stimulus is embedded in multiple relational structures? Here, we find that both spatial and predictive cognitive maps influence generalization in a choice task, where spatial location determines reward magnitude. Mirroring behavior, the hippocampus not only builds a map of spatial relationships but also encodes the experienced transition structure. As the task progresses, participants' choices become more influenced by spatial relationships, reflected in a strengthening of the spatial map and a weakening of the predictive map. This change is driven by orbitofrontal cortex, which represents the degree to which an outcome is consistent with the spatial rather than the predictive map and updates hippocampal representations accordingly. Taken together, this demonstrates how hippocampal cognitive maps are used and updated flexibly for inference.
Affiliation(s)
- Mona M Garvert
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.
- Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Berlin, Germany.
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany.
- Tankred Saanum
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Berlin, Germany
- Eric Schulz
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Berlin, Germany
- Nicolas W Schuck
- Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Berlin, Germany
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- Institute of Psychology, Universität Hamburg, Hamburg, Germany
- Christian F Doeller
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.
- Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease NTNU, Trondheim, Norway.
- Wilhelm Wundt Institute of Psychology, Leipzig University, Leipzig, Germany.
17
Wang S, Gerken B, Wieland JR, Wilson RC, Fellous JM. The effects of time horizon and guided choices on explore-exploit decisions in rodents. Behav Neurosci 2023; 137:127-142. [PMID: 36633987 PMCID: PMC10787949 DOI: 10.1037/bne0000549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
Abstract
Humans and animals have to balance the need for exploring new options with exploiting known options that yield good outcomes. This tradeoff is known as the explore-exploit dilemma. To better understand the neural mechanisms underlying how humans and animals address the explore-exploit dilemma, a good animal behavioral model is critical. Most previous rodent explore-exploit studies used ethologically unrealistic operant boxes and reversal learning paradigms in which the decision to abandon a bad option is confounded by the need to explore a novel option for information collection, making it difficult to separate different drives and heuristics for exploration. In this study, we investigated how rodents make explore-exploit decisions using a spatial navigation horizon task (Wilson et al., 2014) adapted to rats to address the above limitations. We compared the rats' performance to that of humans using identical measures. We showed that rats use prior information to effectively guide exploration. In addition, rats use information-driven directed exploration like humans, but the extent to which they explore depends on time horizon in the opposite direction from humans. Moreover, we found that free choices and guided choices have different influences on exploration in rodents, a finding that has not yet been tested in humans. This study reveals that the explore-exploit spatial behavior of rats is more complex than previously thought.
18
Speers LJ, Bilkey DK. Maladaptive explore/exploit trade-offs in schizophrenia. Trends Neurosci 2023; 46:341-354. [PMID: 36878821 DOI: 10.1016/j.tins.2023.02.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/09/2022] [Revised: 01/30/2023] [Accepted: 02/08/2023] [Indexed: 03/07/2023]
Abstract
Schizophrenia is a complex disorder that remains poorly understood, particularly at the systems level. In this opinion article we argue that the explore/exploit trade-off concept provides a holistic and ecologically valid framework to resolve some of the apparent paradoxes that have emerged within schizophrenia research. We review recent evidence suggesting that fundamental explore/exploit behaviors may be maladaptive in schizophrenia during physical, visual, and cognitive foraging. We also describe how theories from the broader optimal foraging literature, such as the marginal value theorem (MVT), could provide valuable insight into how aberrant processing of reward, context, and cost/effort evaluations interact to produce maladaptive responses.
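The marginal value theorem (MVT) mentioned above states that a forager should leave a depleting patch when its instantaneous intake rate falls to the long-run average rate for the environment, i.e. when g'(t*) = g(t*) / (t* + T). The Python sketch below finds that leaving time for a single patch type, assuming an exponentially saturating gain function g(t) = A(1 - e^(-rt)) and a fixed travel time T between patches; the gain function and parameter names are illustrative assumptions, not taken from the article.

```python
import math

def optimal_residence_time(travel_time, gain_max=1.0, depletion_rate=1.0, tol=1e-9):
    """Leaving time t* for one patch type under the marginal value theorem.

    Assumes a saturating gain g(t) = gain_max * (1 - exp(-depletion_rate * t))
    and travel_time > 0. t* solves g'(t) = g(t) / (t + travel_time): the
    instantaneous intake rate equals the long-run average rate including travel.
    """
    def gain(t):
        return gain_max * (1.0 - math.exp(-depletion_rate * t))

    def marginal_gain(t):
        return gain_max * depletion_rate * math.exp(-depletion_rate * t)

    def excess(t):
        # Positive while staying in the patch beats the long-run average rate;
        # strictly decreasing in t because g is concave.
        return marginal_gain(t) * (t + travel_time) - gain(t)

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:  # grow the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:   # bisection on the monotone function excess(t)
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Longer travel times yield longer optimal residence times, the classic MVT prediction against which observed (possibly maladaptive) patch-leaving could be compared.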
Affiliation(s)
- Lucinda J Speers
- Department of Psychology, University of Otago, Dunedin 9016, New Zealand
- David K Bilkey
- Department of Psychology, University of Otago, Dunedin 9016, New Zealand.
19
Culbreth AJ, Schwartz EK, Frank MJ, Brown EC, Xu Z, Chen S, Gold JM, Waltz JA. A computational neuroimaging study of reinforcement learning and goal-directed exploration in schizophrenia spectrum disorders. Psychol Med 2023; 53:1-11. [PMID: 36752156 DOI: 10.1017/s0033291722003993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2023]
Abstract
BACKGROUND Prior evidence indicates that negative symptom severity and cognitive deficits, in people with schizophrenia (PSZ), relate to measures of reward-seeking and loss-avoidance behavior (implicating the ventral striatum/VS), as well as uncertainty-driven exploration (reliant on rostrolateral prefrontal cortex/rlPFC). While neural correlates of reward-seeking and loss-avoidance have been examined in PSZ, neural correlates of uncertainty-driven exploration have not. Understanding neural correlates of uncertainty-driven exploration is an important next step that could reveal how this mechanism of cognitive and negative symptoms manifests at a neural level. METHODS We acquired fMRI data from 29 PSZ and 36 controls performing the Temporal Utility Integration decision-making task. Computational analyses estimated parameters corresponding to learning rates for both positive and negative reward prediction errors (RPEs) and the degree to which participants relied on representations of relative uncertainty. Trial-wise estimates of expected value, certainty, and RPEs were generated to model fMRI data. RESULTS Behaviorally, PSZ demonstrated reduced reward-seeking behavior compared to controls, and negative symptoms were positively correlated with loss-avoidance behavior. This finding of a bias toward loss avoidance learning in PSZ is consistent with previous work. Surprisingly, neither behavioral measures of exploration nor neural correlates of uncertainty in the rlPFC differed significantly between groups. However, we showed that trial-wise estimates of relative uncertainty in the rlPFC distinguished participants who engaged in exploratory behavior from those who did not. rlPFC activation was positively associated with intellectual function. CONCLUSIONS These results further elucidate the nature of reinforcement learning and decision-making in PSZ and healthy volunteers.
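The computational analysis described above estimates separate learning rates for positive and negative reward prediction errors (RPEs). A minimal Python sketch of such a dual-learning-rate delta rule is shown below; the function and parameter names are hypothetical, and the relative-uncertainty exploration term of the authors' model is deliberately left out.

```python
def update_value(value, reward, alpha_pos, alpha_neg):
    """Return (new_value, rpe) using separate rates for positive and negative RPEs."""
    rpe = reward - value                          # reward prediction error
    alpha = alpha_pos if rpe > 0 else alpha_neg   # pick the rate by the RPE's sign
    return value + alpha * rpe, rpe

def learn_values(rewards, alpha_pos, alpha_neg, initial_value=0.5):
    """Apply the update over a reward sequence and return the trial-wise values."""
    values = []
    value = initial_value
    for reward in rewards:
        value, _ = update_value(value, reward, alpha_pos, alpha_neg)
        values.append(value)
    return values
```

On a 50% reward schedule, `alpha_pos > alpha_neg` pushes learned values above 0.5 and the reverse pushes them below it, a simplified picture of how asymmetric learning rates can express reward-seeking versus loss-avoidance biases.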
Affiliation(s)
- A J Culbreth
- Department of Psychiatry, Maryland Psychiatric Research Center (MPRC), University of Maryland School of Medicine, Baltimore, MD, USA
- M J Frank
- Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, RI, USA
- Department of Psychiatry and Brown Institute for Brain Science, Brown University, Providence, RI, USA
- E C Brown
- School of Health and Care Management, Arden University, Berlin, Germany
- Z Xu
- Applied LifeSciences & Systems, Morrisville, NC, USA
- S Chen
- Department of Psychiatry, Maryland Psychiatric Research Center (MPRC), University of Maryland School of Medicine, Baltimore, MD, USA
- Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD, USA
- J M Gold
- Department of Psychiatry, Maryland Psychiatric Research Center (MPRC), University of Maryland School of Medicine, Baltimore, MD, USA
- J A Waltz
- Department of Psychiatry, Maryland Psychiatric Research Center (MPRC), University of Maryland School of Medicine, Baltimore, MD, USA
20
Bergenholtz C, Vuculescu O, Amidi A. Microfoundations of Adaptive Search in Complex Tasks: The Role of Cognitive Abilities and Styles. Organization Science 2023. [DOI: 10.1287/orsc.2023.1654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2023]
Abstract
Problem-solving in complex environments requires a cognitively demanding search for task solutions. Managing this search process presents a major challenge in organizations. We contribute to the literature on this topic by providing new evidence on the cognitive antecedents that shape how individuals search when engaged in complex problem-solving tasks. We present results from three laboratory studies, wherein 335 individuals solved a complex task. In doing so, they generated behavioral data coupled with survey-based measurements of the individuals’ cognitive styles and performance-based tests of their cognitive abilities. Our data analysis contributes to the current literature by documenting systematic heterogeneity in the persistence and distance of search that can be explained by the participants’ level of creativity, attention to detail, and executive functions. We extend the research on the microfoundations of adaptive search by linking cognitive antecedents with a complex search task, widening our insight into what search behavior certain cognitive microfoundations lead to, and showing how managers can more effectively shape organizational search. History: This paper has been accepted for the Organization Science Special Issue on Experiments in Organizational Theory. Supplemental Material: The online appendix is available at https://doi.org/10.1287/orsc.2023.1654.
Affiliation(s)
- Oana Vuculescu
- Department of Management, Aarhus University, Aarhus, Denmark
- Ali Amidi
- Department of Psychology and Behavioural Sciences, Aarhus University, Aarhus, Denmark
21
Trait somatic anxiety is associated with reduced directed exploration and underestimation of uncertainty. Nat Hum Behav 2023; 7:102-113. [PMID: 36192493 DOI: 10.1038/s41562-022-01455-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/22/2021] [Accepted: 08/26/2022] [Indexed: 02/01/2023]
Abstract
Anxiety has been related to decreased physical exploration, but past findings on the interaction between anxiety and exploration during decision making were inconclusive. Here we examined how latent factors of trait anxiety relate to different exploration strategies when facing volatility-induced uncertainty. Across two studies (total N = 985), we demonstrated that people used a hybrid of directed, random and undirected exploration strategies, which were respectively sensitive to relative uncertainty, total uncertainty and value difference. Trait somatic anxiety, that is, the propensity to experience physical symptoms of anxiety, was inversely correlated with directed exploration and undirected exploration, manifesting as a lower likelihood of choosing the uncertain option and reduced choice stochasticity regardless of uncertainty. Somatic anxiety was also associated with underestimation of relative uncertainty. Together, these results reveal the selective role of trait somatic anxiety in modulating both uncertainty-driven and value-driven exploration strategies.
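The three strategies named above are often formalized with a probit choice rule over two options A and B, in the style of Gershman (2018): the value difference V = Q_A - Q_B carries undirected exploration, the relative uncertainty RU = σ_A - σ_B carries directed exploration, and V/TU, with total uncertainty TU = sqrt(σ_A² + σ_B²), carries random exploration. The Python sketch below assumes that form; the weight names are hypothetical and the authors' exact model may differ.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_choose_a(mean_a, mean_b, sd_a, sd_b, w_value, w_ru, w_vtu):
    """Probability of choosing option A under an assumed hybrid probit exploration model.

    Posterior SDs must be positive so that total uncertainty is nonzero.
    """
    v = mean_a - mean_b                    # value difference (undirected exploration term)
    ru = sd_a - sd_b                       # relative uncertainty (directed exploration term)
    tu = math.sqrt(sd_a ** 2 + sd_b ** 2)  # total uncertainty
    # v / tu: value difference scaled by total uncertainty (random exploration term)
    return normal_cdf(w_value * v + w_ru * ru + w_vtu * v / tu)
```

In this parametrization, a smaller `w_ru` means the more uncertain option is chosen less often, the direction of the effect the abstract reports for trait somatic anxiety.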
22
Sharot T, Rollwage M, Sunstein CR, Fleming SM. Why and When Beliefs Change. Perspectives on Psychological Science 2023; 18:142-151. [PMID: 35939828 DOI: 10.1177/17456916221082967] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 01/31/2023]
Abstract
Why people do or do not change their beliefs has been a long-standing puzzle. Sometimes people hold onto false beliefs despite ample contradictory evidence; sometimes they change their beliefs without sufficient reason. Here, we propose that the utility of a belief is derived from the potential outcomes associated with holding it. Outcomes can be internal (e.g., positive/negative feelings) or external (e.g., material gain/loss), and only some are dependent on belief accuracy. Belief change can then be understood as an economic transaction in which the multidimensional utility of the old belief is compared against that of the new belief. Change will occur when potential outcomes alter across attributes, for example because of changing environments or when certain outcomes are made more or less salient.
Affiliation(s)
- Tali Sharot
- Department of Experimental Psychology, University College London
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Max Rollwage
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Wellcome Centre for Human Neuroimaging, University College London
- Stephen M Fleming
- Department of Experimental Psychology, University College London
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Wellcome Centre for Human Neuroimaging, University College London
23
Cogliati Dezza I, Maher C, Sharot T. People adaptively use information to improve their internal states and external outcomes. Cognition 2022; 228:105224. [PMID: 35850045 PMCID: PMC10510028 DOI: 10.1016/j.cognition.2022.105224] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/10/2021] [Revised: 07/05/2022] [Accepted: 07/06/2022] [Indexed: 11/23/2022]
Abstract
Information can strongly impact people's affect, their level of uncertainty and their decisions. It is assumed that people seek information with the goal of improving all three. But are they successful at achieving this goal? Answering this question is important for assessing the impact of self-driven information consumption on people's well-being. Here, over five experiments (total N = 727) we show that participants accurately predict the impact of information on their internal states (e.g., affect and cognition) and external outcomes (e.g., material rewards), and use these predictions to guide information-seeking choices. A model incorporating participants' subjective expectations regarding the impact of information on their affective, cognitive, and material outcomes accounted for information-seeking choices better than a model that included only objective proxies of those measures. This model also accounted for individual differences in information-seeking choices. By balancing considerations of the impact of information on affective, cognitive and material outcomes when seeking knowledge, participants became happier, more certain and made better decisions when they sought information relative to when they did not, suggesting that the actual consequences of receiving information aligned with their subjective expectations.
Affiliation(s)
- I Cogliati Dezza
- Department of Experimental Psychology, Faculty of Brain Sciences, University College London, 26 Bedford Way, London WC1H 0AP, UK; Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, Ghent, BE, Belgium; The Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London, WC1B 5EH, UK.
- C Maher
- Department of Experimental Psychology, Faculty of Brain Sciences, University College London, 26 Bedford Way, London WC1H 0AP, UK; The Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London, WC1B 5EH, UK
- T Sharot
- Department of Experimental Psychology, Faculty of Brain Sciences, University College London, 26 Bedford Way, London WC1H 0AP, UK; The Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London, WC1B 5EH, UK; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St, Cambridge, MA 02139, USA.
24
Speekenbrink M. Chasing Unknown Bandits: Uncertainty Guidance in Learning and Decision Making. Current Directions in Psychological Science 2022. [DOI: 10.1177/09637214221105051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In repeated decision problems for which it is possible to learn from experience, people should actively seek out uncertain options, rather than avoid ambiguity or uncertainty, in order to learn and improve future decisions. Research on human behavior in a variety of multiarmed-bandit tasks supports this prediction. Multiarmed-bandit tasks involve repeated decisions between options with initially unknown reward distributions and require a careful balance between learning about relatively unknown options (exploration) and obtaining high immediate rewards (exploitation). Resolving this exploration-exploitation dilemma optimally requires considering not only the estimated value of each option, but also the uncertainty in these estimations. Bayesian learning naturally quantifies uncertainty and hence provides a principled framework to study how humans resolve this dilemma. On the basis of computational modeling and behavioral results in bandit tasks, I argue that human learning, attention, and exploration are guided by uncertainty. These results support Bayesian theories of cognition and underpin the fundamental role of subjective uncertainty in both learning and decision making.
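Bayesian learning in a bandit with Gaussian rewards can be implemented with a Kalman filter that tracks a posterior mean and variance for each option; an uncertainty-guided rule such as an upper confidence bound (UCB) then adds a bonus proportional to the posterior standard deviation. The Python sketch below assumes Gaussian observation noise, an optional random-walk drift, and a UCB choice rule; these are illustrative choices, not necessarily the models used in the article.

```python
from dataclasses import dataclass

@dataclass
class ArmBelief:
    mean: float = 0.0   # posterior mean of the arm's expected reward
    var: float = 100.0  # posterior variance; a large value encodes high initial uncertainty

def kalman_update(beliefs, chosen, reward, obs_var=1.0, drift_var=0.0):
    """Update per-arm Gaussian beliefs in place after observing `reward` from arm `chosen`."""
    for i, belief in enumerate(beliefs):
        prior_var = belief.var + drift_var  # random-walk drift inflates every arm's uncertainty
        if i == chosen:
            gain = prior_var / (prior_var + obs_var)  # Kalman gain: weight given to the new reward
            belief.mean += gain * (reward - belief.mean)
            belief.var = (1.0 - gain) * prior_var
        else:
            belief.var = prior_var  # unchosen arms: no new data, so uncertainty does not shrink

def ucb_choice(beliefs, beta=1.0):
    """Pick the arm with the highest mean plus an uncertainty bonus beta * posterior SD."""
    scores = [b.mean + beta * b.var ** 0.5 for b in beliefs]
    return max(range(len(scores)), key=scores.__getitem__)
```

After one rewarded pull of arm 0, a positive `beta` steers the next choice to the still-uncertain arm 1 (directed exploration), while `beta=0.0` reduces the rule to greedy exploitation.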
Affiliation(s)
- Maarten Speekenbrink
- Department of Experimental Psychology, University College London, and The Alan Turing Institute, London, England
25
A Path-Curvature Measure for Word-Based Strategy Searches in Semantic Networks. Symmetry (Basel) 2022. [DOI: 10.3390/sym14081737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Building on a modified version of the Haantjes path-based curvature, this article offers a novel measure that considers the direction of a stream of associations in a semantic network and estimates the extent to which any single association attracts the upcoming associations to its environment—in other words, to what degree one explores that environment. We demonstrate that our measure differs from Haantjes curvature and confirm that it expresses the extent to which a stream of associations remains close to its starting point. Finally, we examine the relationship between our measure and accessibility to knowledge stored in memory. We demonstrate that a high degree of attraction facilitates the retrieval of upcoming words in the stream. By applying methods from differential geometry to semantic networks, this study contributes to our understanding of strategic search in memory.
26
Rojas GR, Curry-Pochy LS, Chen CS, Heller AT, Grissom NM. Sequential delay and probability discounting tasks in mice reveal anchoring effects partially attributable to decision noise. Behav Brain Res 2022; 431:113951. [PMID: 35661751 PMCID: PMC9844124 DOI: 10.1016/j.bbr.2022.113951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2021] [Revised: 05/20/2022] [Accepted: 05/29/2022] [Indexed: 01/19/2023]
Abstract
Delay discounting and probability discounting decision making tasks in rodent models have high translational potential. However, it is unclear whether the discounted value of the large reward option is the main contributor to variability in animals' choices in either task, which may limit translation to humans. Male and female mice underwent sessions of delay and probability discounting in sequence to assess how choice behavior adapts over experience with each task. To control for "anchoring" (persistent choices based on the initial delay or probability), mice experienced "Worsening" schedules where the large reward was offered under initially favorable conditions that became less favorable during testing, followed by "Improving" schedules where the large reward was offered under initially unfavorable conditions that improved over a session. During delay discounting, both male and female mice showed elimination of anchoring effects over training. In probability discounting, both sexes of mice continued to show some anchoring even after months of training. One possibility is that "noisy", exploratory choices could contribute to these persistent anchoring effects, rather than constant fluctuations in value discounting. We fit choice behavior in individual animals using models that included both a value-based discounting parameter and a decision noise parameter that captured variability in choices deviating from value maximization. Changes in anchoring behavior over time were tracked by changes in both the value and decision noise parameters in delay discounting, but by the decision noise parameter in probability discounting. Exploratory decision making was also reflected in choice response times that tracked the degree of conflict caused by both uncertainty and temporal cost, but was not linked with differences in locomotor activity reflecting chamber exploration. 
Thus, variable discounting behavior in mice can result from changes in exploration of the decision options rather than changes in reward valuation.
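The models described above pair a value-based discounting parameter with a decision-noise parameter. A minimal Python sketch is shown below, assuming the common hyperbolic form V = A / (1 + kD) for delay discounting, the odds-against form V = A / (1 + hθ) with θ = (1 - p) / p for probability discounting, and a logistic choice rule whose `noise` parameter flattens choices toward 50/50; these functional forms are assumptions and may not match the authors' fitted models.

```python
import math

def delay_discounted_value(amount, delay, k):
    """Hyperbolic delay discounting: V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

def probability_discounted_value(amount, p_reward, h):
    """Hyperbolic discounting over odds against: V = A / (1 + h * (1 - p) / p), p > 0."""
    odds_against = (1.0 - p_reward) / p_reward
    return amount / (1.0 + h * odds_against)

def p_choose_large(v_large, v_small, noise):
    """Logistic choice rule; larger noise pushes choices toward chance (0.5)."""
    x = (v_large - v_small) / noise
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # numerically stable branch for large negative x
    return z / (1.0 + z)
```

Raising `noise` makes choices less value-maximizing without changing the discount rates `k` or `h`, which is the kind of dissociation between valuation and exploration the abstract describes.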
27
Time pressure changes how people explore and respond to uncertainty. Sci Rep 2022; 12:4122. [PMID: 35260717 PMCID: PMC8904509 DOI: 10.1038/s41598-022-07901-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/18/2021] [Accepted: 02/28/2022] [Indexed: 12/25/2022]
Abstract
How does time pressure influence exploration and decision-making? We investigated this question with several four-armed bandit tasks manipulating (within subjects) expected reward, uncertainty, and time pressure (limited vs. unlimited). With limited time, people have less opportunity to perform costly computations, thus shifting the cost-benefit balance of different exploration strategies. Through behavioral, reinforcement learning (RL), reaction time (RT), and evidence accumulation analyses, we show that time pressure changes how people explore and respond to uncertainty. Specifically, participants reduced their uncertainty-directed exploration under time pressure, were less value-directed, and repeated choices more often. Since our analyses relate uncertainty to slower responses and dampened evidence accumulation (i.e., drift rates), this demonstrates a resource-rational shift towards simpler, lower-cost strategies under time pressure. These results shed light on how people adapt their exploration and decision-making strategies to externally imposed cognitive constraints.
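Evidence accumulation analyses of this kind typically rest on a drift-diffusion model, in which a lower drift rate produces both slower and less consistent choices. The Python sketch below simulates a basic symmetric two-boundary diffusion process to illustrate that link; the parameter values are arbitrary, and this is not the authors' fitting procedure.

```python
import math
import random

def simulate_ddm(drift, threshold=1.0, noise_sd=1.0, dt=0.005, n_trials=300, seed=0):
    """Simulate a symmetric two-boundary drift-diffusion process.

    Evidence starts at 0 and gains drift * dt plus Gaussian noise each step
    until it reaches +threshold (upper choice) or -threshold (lower choice).
    Returns (mean decision time, proportion of upper-boundary choices).
    """
    rng = random.Random(seed)
    step_sd = noise_sd * math.sqrt(dt)  # Euler-Maruyama noise scale per step
    times, upper = [], 0
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < threshold:
            x += drift * dt + rng.gauss(0.0, step_sd)
            t += dt
        times.append(t)
        upper += x >= threshold  # True counts as 1
    return sum(times) / n_trials, upper / n_trials
```

Comparing a low drift (for example 0.5) with a high drift (2.0) gives longer mean decision times and fewer upper-boundary choices for the low drift, the qualitative pattern the abstract associates with uncertainty.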
28
Mikhael JG, Gershman SJ. Impulsivity and risk-seeking as Bayesian inference under dopaminergic control. Neuropsychopharmacology 2022; 47:465-476. [PMID: 34376813 PMCID: PMC8674258 DOI: 10.1038/s41386-021-01125-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/01/2020] [Revised: 07/17/2021] [Accepted: 07/21/2021] [Indexed: 02/07/2023]
Abstract
Bayesian models successfully account for several of dopamine (DA)'s effects on contextual calibration in interval timing and reward estimation. In these models, tonic levels of DA control the precision of stimulus encoding, which is weighed against contextual information when making decisions. When DA levels are high, the animal relies more heavily on the (highly precise) stimulus encoding, whereas when DA levels are low, the context affects decisions more strongly. Here, we extend this idea to intertemporal choice and probability discounting tasks. In intertemporal choice tasks, agents must choose between a small reward delivered soon and a large reward delivered later, whereas in probability discounting tasks, agents must choose between a small reward that is always delivered and a large reward that may be omitted with some probability. Beginning with the principle that animals will seek to maximize their reward rates, we show that the Bayesian model predicts a number of curious empirical findings in both tasks. First, the model predicts that higher DA levels should normally promote selection of the larger/later option, which is often taken to imply that DA decreases 'impulsivity,' and promote selection of the large/risky option, often taken to imply that DA increases 'risk-seeking.' However, if the temporal precision is sufficiently decreased, higher DA levels should have the opposite effect, promoting selection of the smaller/sooner option (higher impulsivity) and the small/safe option (lower risk-seeking). Second, high enough levels of DA can result in preference reversals. Third, selectively decreasing the temporal precision, without manipulating DA, should promote selection of the larger/later and large/risky options. Fourth, when a different post-reward delay is associated with each option, animals will not learn the option-delay contingencies, but this learning can be salvaged when the post-reward delays are made more salient. 
Finally, the Bayesian model predicts correlations among behavioral phenotypes: Animals that are better timers will also appear less impulsive.
Affiliation(s)
- John G. Mikhael
- Program in Neuroscience, Harvard Medical School, Boston, MA, USA
- MD-PhD Program, Harvard Medical School, Boston, MA, USA
- Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
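The core computation in this abstract is precision weighting: a noisy stimulus measurement is combined with a contextual prior in proportion to their precisions, and DA sets the stimulus precision. The following is a minimal Python sketch. It assumes Gaussian stimulus and context distributions and treats DA as a simple multiplicative gain on stimulus precision. That mapping is our illustrative assumption, not the paper's exact parameterization, and all function and parameter names are hypothetical.

```python
def precision_weighted_estimate(measurement, context_mean,
                                stimulus_precision, context_precision):
    """Posterior mean when a Gaussian measurement is combined with a Gaussian contextual prior."""
    # The weight on the stimulus grows with its precision relative to the context's precision.
    w = stimulus_precision / (stimulus_precision + context_precision)
    return w * measurement + (1.0 - w) * context_mean


def estimate_under_da(measurement, context_mean, da_gain,
                      base_precision=1.0, context_precision=1.0):
    """Hypothetical mapping: tonic DA scales stimulus precision multiplicatively."""
    return precision_weighted_estimate(measurement, context_mean,
                                       da_gain * base_precision, context_precision)


# A 10 s interval encoded against a context centred on 5 s.
low_da = estimate_under_da(10.0, 5.0, da_gain=1.0)   # pulled halfway toward the context
high_da = estimate_under_da(10.0, 5.0, da_gain=9.0)  # stays close to the measurement
```

With these toy numbers, raising the DA gain moves the estimate from 7.5 toward the measured 10. This corresponds to the "rely more on the stimulus encoding" behaviour the abstract attributes to high DA.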
29
Liquin EG, Gopnik A. Children are more exploratory and learn more than adults in an approach-avoid task. Cognition 2021; 218:104940. [PMID: 34715584 DOI: 10.1016/j.cognition.2021.104940] [Received: 11/21/2020] [Revised: 08/31/2021] [Accepted: 10/08/2021] [Indexed: 02/06/2023]
Abstract
Intuitively, children appear to be more exploratory than adults, and this exploration seems to help children learn. However, there have been few clear tests of these ideas. We test whether exploration and learning change across development using a task that presents a "learning trap." In this task, exploitation (maximizing immediate reward and avoiding costs) may lead the learner to draw incorrect conclusions, while exploration may lead to better learning but be more costly. In Studies 1, 2, and 3 we find that preschoolers and early school-aged children explore more than adults and learn the true structure of the environment better. Study 3 demonstrates that children explore more than adults even though they, like adults, predict that exploration will be costly, and it shows that exploration and learning are correlated. Study 4 shows that children's and adults' learning depends on the evidence they generate during exploration: children exposed to adult-like evidence learn like adults, and adults exposed to child-like evidence learn like children. Together, these studies support the idea that children may be more exploratory than adults, and this increased exploration influences learning.
Affiliation(s)
- Emily G Liquin
- Department of Psychology, 2121 Berkeley Way, University of California, Berkeley, Berkeley, CA, USA.
- Alison Gopnik
- Department of Psychology, 2121 Berkeley Way, University of California, Berkeley, Berkeley, CA, USA.
30
Ghambaryan A, Gutkin B, Klucharev V, Koechlin E. Additively Combining Utilities and Beliefs: Research Gaps and Algorithmic Developments. Front Neurosci 2021; 15:704728. [PMID: 34658760 PMCID: PMC8517513 DOI: 10.3389/fnins.2021.704728] [Received: 05/11/2021] [Accepted: 09/13/2021] [Indexed: 11/20/2022]
Abstract
Value-based decision making in complex environments, such as those with an uncertain and volatile mapping of reward probabilities onto options, may engender computational strategies that are not necessarily optimal in terms of normative frameworks but may ensure effective learning and behavioral flexibility under limited neural computational resources. In this article, we review a suboptimal strategy: additively combining the reward magnitude and reward probability attributes of options for value-based decision making. In addition, we present the computational intricacies of a recently developed model (named the MIX model), an algorithmic implementation of the additive strategy in sequential decision making with two options. We also discuss its opportunities, as well as conceptual, inferential, and generalization issues. Furthermore, we suggest future studies that will reveal the potential of the MIX model and support its further development as a general model of value-based choice.
Affiliation(s)
- Anush Ghambaryan
- Centre for Cognition and Decision Making, HSE University, Moscow, Russia
- Ecole Normale Supérieure, PSL Research University, Paris, France
- Boris Gutkin
- Centre for Cognition and Decision Making, HSE University, Moscow, Russia
- Ecole Normale Supérieure, PSL Research University, Paris, France
- Vasily Klucharev
- Centre for Cognition and Decision Making, HSE University, Moscow, Russia
- Etienne Koechlin
- Ecole Normale Supérieure, PSL Research University, Paris, France
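The sketch below makes the contrast in this abstract concrete. It compares the normative expected-value rule (probability times magnitude) with a generic additive rule that weights the two attributes separately. This is a simplified illustration, not the MIX model itself, whose full specification the abstract does not give. The weight `w` and the assumption that magnitudes are rescaled to [0, 1] are ours.

```python
def expected_value(prob, magnitude):
    """Normative multiplicative combination of probability and magnitude."""
    return prob * magnitude


def additive_value(prob, magnitude, w):
    """Generic additive combination; w weights probability against (rescaled) magnitude."""
    return w * prob + (1.0 - w) * magnitude


def choose(options, value_fn):
    """Index of the option with the highest value under value_fn."""
    return max(range(len(options)), key=lambda i: value_fn(*options[i]))


# Each option is (reward probability, reward magnitude rescaled to [0, 1]).
options = [(0.9, 0.2), (0.3, 0.8)]
ev_choice = choose(options, expected_value)                              # values 0.18 vs 0.24
add_choice = choose(options, lambda p, m: additive_value(p, m, w=0.7))   # values 0.69 vs 0.45
```

The two rules disagree here. Expected value favours the large but unlikely reward (option 1), while the probability-weighted additive rule favours the likely but small one (option 0).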
31
Abstract
Modulation of cognitive control by emotion and motivation has become a major topic in cognition research; however, characterizing the extent to which these influences may dissociate has proved challenging. Here, I examine recent advances in this literature, focusing on: (1) neuromodulator mechanisms underlying positive affect and reward motivation effects on cognitive control; (2) contingency and associative learning in interactions between affect/reward and cognitive control; (3) aspects of task design, unrelated to affect/reward, that may have acted as confounding influences on cognitive control in prior work. I suggest that positive affect and reward should not be considered singular in their effects on cognitive control, but instead varying on multiple parameters and interacting with task demands, to determine goal-directed, adaptive behavior.
32
Barnes K, Rottman BM, Colagiuri B. The placebo effect: To explore or to exploit? Cognition 2021; 214:104753. [PMID: 34023671 DOI: 10.1016/j.cognition.2021.104753] [Received: 08/20/2020] [Revised: 04/21/2021] [Accepted: 04/26/2021] [Indexed: 11/26/2022]
Abstract
How people choose between options with differing outcomes (explore-exploit) is a central question in understanding human behaviour. However, the standard explore-exploit paradigm relies on gamified tasks with low-stakes outcomes. Consequently, little is known about decision making for biologically relevant stimuli. Here, we combined placebo and explore-exploit paradigms to examine detection and selection of the most effective treatment in a pain model. During conditioning, where 'optimal' and 'suboptimal' sham treatments were paired with a reduction in electrical pain stimulation, participants learnt which treatment most successfully reduced pain. Modelling participant responses revealed three important findings. First, participants' choices reflected both directed and random exploration. Second, expectancy modulated pain, indicative of recursive placebo effects. Third, individual differences in expectancy during conditioning predicted placebo effects during a subsequent test phase. These findings reveal directed and random exploration when the outcome is biologically relevant. Moreover, this research shows how placebo and explore-exploit literatures can be unified.
33
Gilbertson T, Steele D. Tonic dopamine, uncertainty and basal ganglia action selection. Neuroscience 2021; 466:109-124. [PMID: 34015370 DOI: 10.1016/j.neuroscience.2021.05.010] [Received: 11/18/2020] [Revised: 05/04/2021] [Accepted: 05/08/2021] [Indexed: 11/29/2022]
Abstract
To make optimal decisions in uncertain circumstances, flexible adaptation of behaviour is required: exploring alternatives when the best choice is unknown, and exploiting what is known when that is best. Using a computational model of the basal ganglia, we propose that switches between exploratory and exploitative decisions are mediated by the interaction between tonic dopamine and cortical input to the basal ganglia. We show that a biologically detailed action selection circuit model, endowed with dopamine-dependent striatal plasticity, can optimally solve the explore-exploit problem, estimating the true underlying state of a noisy Gaussian diffusion process. Critical to the model's performance was a fluctuating level of tonic dopamine, which increased under conditions of uncertainty. Within an optimal range of tonic dopamine, explore-exploit decisions were mediated by the effects of tonic dopamine on the precision of the model's action selection mechanism. Under conditions of uncertain reward payout, the model's reduced selectivity allowed disinhibition of multiple alternative actions to be explored at random. Conversely, when uncertainty about reward payout was low, enhanced selectivity of the action selection circuit facilitated exploitation of the high-value choice. Model performance was at the level of a Kalman filter, which provides an optimal solution for the task. These simulations support the idea that this subcortical neural circuit may have evolved to facilitate decision making in non-stationary reward environments. The model generates several experimental predictions relevant to abnormal decision making in neuropsychiatric and neurological disease.
Affiliation(s)
- Tom Gilbertson
- Department of Neurology, Level 6, South Block, Ninewells Hospital & Medical School, Dundee DD2 4BF, UK; Division of Imaging Science and Technology, Medical School, University of Dundee, DD2 4BF, UK.
- Douglas Steele
- Division of Imaging Science and Technology, Medical School, University of Dundee, DD2 4BF, UK
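This abstract uses a Kalman filter as the optimal benchmark for tracking a noisy Gaussian diffusion process. Below is a minimal sketch of such a filter for a multi-armed restless bandit. It assumes that each arm's mean reward follows an independent Gaussian random walk with per-trial variance `diffusion_var` and is observed with Gaussian noise of variance `noise_var`. The names and structure are ours; this is not the authors' basal ganglia model.

```python
from dataclasses import dataclass


@dataclass
class ArmBelief:
    mean: float = 0.0  # posterior mean of the arm's current reward
    var: float = 1.0   # posterior variance (uncertainty) about that mean


def kalman_trial(beliefs, chosen, reward, diffusion_var, noise_var):
    """Update beliefs after observing `reward` from arm `chosen`, then propagate the random walk."""
    arm = beliefs[chosen]
    gain = arm.var / (arm.var + noise_var)  # Kalman gain: trust the observation more when uncertain
    arm.mean += gain * (reward - arm.mean)  # correct the mean by a fraction of the prediction error
    arm.var *= (1.0 - gain)                 # observing the arm shrinks its uncertainty
    for b in beliefs:                       # every arm may drift before the next trial
        b.var += diffusion_var
    return beliefs


beliefs = [ArmBelief(), ArmBelief()]
kalman_trial(beliefs, chosen=0, reward=2.0, diffusion_var=0.5, noise_var=1.0)
# Arm 0: mean 1.0, var 1.0; the unsampled arm 1 keeps mean 0.0 but its var grows to 1.5.
```

Unsampled arms steadily accumulate uncertainty. In the abstract's account, tonic dopamine is proposed to rise with this kind of uncertainty.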
34
Human Belief State-Based Exploration and Exploitation in an Information-Selective Symmetric Reversal Bandit Task. Computational Brain & Behavior 2021; 4:442-462. [PMID: 34368622 PMCID: PMC8327602 DOI: 10.1007/s42113-021-00112-3] [Accepted: 05/24/2021] [Indexed: 02/07/2023]
Abstract
Humans often face sequential decision-making problems in which information about the environmental reward structure is detached from rewards for a subset of actions. In the current exploratory study, we introduce an information-selective symmetric reversal bandit task to model such situations and obtain choice data on this task from 24 participants. To arbitrate between different decision-making strategies that participants may use on this task, we developed a set of probabilistic agent-based behavioral models, including exploitative and explorative Bayesian agents, as well as heuristic control agents. Upon validating the model and parameter recovery properties of our model set and summarizing the participants' choice data descriptively, we used a maximum likelihood approach to evaluate the participants' choice data from the perspective of our model set. In brief, we provide quantitative evidence that participants employ a belief state-based hybrid explorative-exploitative strategy on the information-selective symmetric reversal bandit task, lending further support to the finding that humans are guided by their subjective uncertainty when solving exploration-exploitation dilemmas.
Supplementary information: The online version contains supplementary material available at 10.1007/s42113-021-00112-3.
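To illustrate what a "belief state" means here, the sketch below tracks the probability that arm 0 is currently the better arm in a symmetric two-armed reversal task. It applies Bayes' rule and then allows for a possible reversal. The reward probability `p_good`, the reversal hazard `hazard`, and the binary outcome coding are illustrative assumptions; the paper's agents and task parameters may differ.

```python
def update_belief(belief, arm, outcome, p_good=0.85, hazard=0.05):
    """Return P(arm 0 is the good arm) after observing `outcome` (1 or 0) from `arm` (0 or 1).

    Assumed symmetric task: the good arm pays off with probability p_good, the other with
    1 - p_good, and the arms swap roles with probability `hazard` between trials.
    """
    # Probability of a positive outcome on the observed arm under each hidden state.
    p_pos_if_arm0_good = p_good if arm == 0 else 1.0 - p_good
    p_pos_if_arm1_good = 1.0 - p_pos_if_arm0_good
    # Likelihood of the observed outcome under each hidden state.
    lik0 = p_pos_if_arm0_good if outcome else 1.0 - p_pos_if_arm0_good
    lik1 = p_pos_if_arm1_good if outcome else 1.0 - p_pos_if_arm1_good
    # Bayes' rule.
    posterior = lik0 * belief / (lik0 * belief + lik1 * (1.0 - belief))
    # Account for a possible reversal before the next trial.
    return (1.0 - hazard) * posterior + hazard * (1.0 - posterior)


b = 0.5
for arm, outcome in [(0, 1), (0, 1), (1, 0)]:
    b = update_belief(b, arm, outcome)
```

Because the design is symmetric, a reward on arm 0 and a non-reward on arm 1 are equally informative, so both push the belief toward arm 0. Under this sketch, a trial that yields no observation would apply only the reversal step.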
35
Candelieri A, Perego R, Giordani I, Ponti A, Archetti F. Modelling human active search in optimizing black-box functions. Soft Comput 2020. [DOI: 10.1007/s00500-020-05398-2] [Indexed: 10/23/2022]
Abstract
Modelling human function learning has been the subject of intense research in cognitive sciences. The topic is relevant in black-box optimization, where information about the objective and/or constraints is not available and must be learned through function evaluations. In this paper, we focus on the relation between the behaviour of humans searching for the maximum and the probabilistic model used in Bayesian optimization. Both Gaussian processes and random forests have been considered as surrogate models of the unknown function: the Bayesian learning paradigm is central in the development of active learning approaches that balance exploration and exploitation under uncertainty, towards effective generalization in large decision spaces. We analyse experimentally how Bayesian optimization compares to humans searching for the maximum of an unknown 2D function. A set of controlled experiments with 60 subjects, using both surrogate models, confirms that Bayesian optimization provides a general model to represent individual patterns of active learning in humans.
36
37
Waltz JA, Wilson RC, Albrecht MA, Frank MJ, Gold JM. Differential Effects of Psychotic Illness on Directed and Random Exploration. Computational Psychiatry (Cambridge, Mass.) 2020; 4:18-39. [PMID: 33768158 PMCID: PMC7990386 DOI: 10.1162/cpsy_a_00027] [Received: 08/13/2019] [Accepted: 04/27/2020] [Indexed: 12/25/2022]
Abstract
Schizophrenia is associated with a number of deficits in decision-making, but the scope, nature, and cause of these deficits are not completely understood. Here we focus on a particular type of decision, known as the explore/exploit dilemma, in which people must choose between exploiting options that yield relatively known rewards and exploring more ambiguous options of uncertain reward probability or magnitude. Previous work has shown that healthy people use two distinct strategies to decide when to explore: directed exploration, which involves choosing options that would reduce uncertainty about the reward values (information seeking), and random exploration (exploring by chance), which describes behavioral variability that is not goal directed. We administered a recently developed gambling task designed to quantify both directed and random exploration to 108 patients with schizophrenia (PSZ) and 33 healthy volunteers (HVs). We found that PSZ show reduced directed exploration relative to HVs, but no difference in random exploration. Moreover, patients' directed exploration behavior clusters into two qualitatively different behavioral phenotypes. In the first phenotype, which accounts for the majority of the patients (79%) and is consistent with previously reported behavior, directed exploration is only marginally (but significantly) reduced, suggesting that these patients can use directed exploration, but at a slightly lower level than community controls. In contrast, the second phenotype, comprising 21% of patients, exhibits a form of "extreme ambiguity aversion," in which these patients almost never choose more informative options, even when they are clearly of higher value. In addition, in PSZ, deficits in directed exploration were related to measures of intellectual function, whereas random exploration was related to positive symptoms. Taken together, our results suggest that schizophrenia has differential effects on directed and random exploration and that investigating the explore/exploit dilemma in psychosis patients may reveal subgroups of patients with qualitatively different patterns of exploration.
Affiliation(s)
- James A. Waltz
- Maryland Psychiatric Research Center, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Robert C. Wilson
- Department of Psychology and Cognitive Science Program, University of Arizona, Tucson, Arizona, USA
- Matthew A. Albrecht
- Maryland Psychiatric Research Center, University of Maryland School of Medicine, Baltimore, Maryland, USA
- School of Public Health, Curtin Health Innovation Research Institute, Curtin University, Perth, Western Australia, Australia
- Michael J. Frank
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, Rhode Island, USA
- Department of Psychiatry and Brown Institute for Brain Science, Brown University, Providence, Rhode Island, USA
- James M. Gold
- Maryland Psychiatric Research Center, University of Maryland School of Medicine, Baltimore, Maryland, USA
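The distinction between directed and random exploration is often formalized with a logistic choice rule of the kind fit to horizon-style gambling tasks. In that rule, an information bonus adds value to the more informative option (directed exploration), while a noise parameter flattens the choice function (random exploration). The sketch below is a simplified version built on those assumptions. The parameter names are ours, and it does not reproduce the exact model fit in this study.

```python
import math


def p_choose_informative(delta_reward, delta_info, info_bonus, bias, noise):
    """Probability of choosing the more informative option (hypothetical parameterization).

    delta_reward: observed mean reward of the informative option minus that of the other option
    delta_info:   1 if the option is more informative (less sampled), else 0
    info_bonus:   weight on information (directed exploration)
    noise:        decision noise (random exploration); larger values push choices toward 0.5
    """
    decision_value = delta_reward + info_bonus * delta_info + bias
    return 1.0 / (1.0 + math.exp(-decision_value / noise))


# The informative option looks 2 points worse on observed reward.
reduced_directed = p_choose_informative(-2.0, 1, info_bonus=1.0, bias=0.0, noise=2.0)
typical_directed = p_choose_informative(-2.0, 1, info_bonus=4.0, bias=0.0, noise=2.0)
```

Lowering `info_bonus` while leaving `noise` unchanged reduces choices of the informative option without altering choice variability. This resembles the PSZ-versus-HV pattern the abstract reports.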
38
Tomov MS, Truong VQ, Hundia RA, Gershman SJ. Dissociable neural correlates of uncertainty underlie different exploration strategies. Nat Commun 2020; 11:2371. [PMID: 32398675 PMCID: PMC7217879 DOI: 10.1038/s41467-020-15766-z] [Received: 02/12/2019] [Accepted: 03/12/2020] [Indexed: 01/27/2023]
Abstract
Most real-world decisions involve a delicate balance between exploring unfamiliar alternatives and committing to the best known option. Previous work has shown that humans rely on different forms of uncertainty to negotiate this "explore-exploit" trade-off, yet the neural basis of the underlying computations remains unclear. Using fMRI (n = 31), we find that relative uncertainty is represented in right rostrolateral prefrontal cortex and drives directed exploration, while total uncertainty is represented in right dorsolateral prefrontal cortex and drives random exploration. The decision value signal combining relative and total uncertainty to compute choice is reflected in motor cortex activity. The variance of this signal scales with total uncertainty, consistent with a sampling mechanism for random exploration. Overall, these results are consistent with a hybrid computational architecture in which different uncertainty computations are performed separately and then combined by downstream decision circuits to compute choice.
Affiliation(s)
- Momchil S Tomov
- Program in Neuroscience, Harvard Medical School, Boston, MA, 02115, USA.
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, 02138, USA.
- Van Q Truong
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, 02138, USA
- Rohan A Hundia
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, 02138, USA
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, 02138, USA
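The hybrid architecture in this abstract can be written as a probit choice rule. Relative uncertainty (RU) adds a directed bonus, and total uncertainty (TU) scales down the value difference, which produces more random choices when TU is high. The sketch below follows that general form for two arms with Gaussian posteriors (means `q1`, `q2`; standard deviations `s1`, `s2`). The weights `w_v`, `w_ru` and `w_vtu` are free parameters introduced here for illustration, not fitted values from the study.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_choose_arm1(q1, q2, s1, s2, w_v, w_ru, w_vtu):
    """Hybrid probit rule combining value, relative uncertainty, and value scaled by total uncertainty."""
    v = q1 - q2                    # estimated value difference
    ru = s1 - s2                   # relative uncertainty: drives directed exploration
    tu = math.sqrt(s1**2 + s2**2)  # total uncertainty: divides V, driving random exploration
    return normal_cdf(w_v * v + w_ru * ru + w_vtu * v / tu)


# Same value difference, low versus high total uncertainty (only the V/TU term is active).
low_tu = p_choose_arm1(1.0, 0.0, 0.5, 0.5, w_v=0.0, w_ru=0.0, w_vtu=1.0)
high_tu = p_choose_arm1(1.0, 0.0, 3.0, 3.0, w_v=0.0, w_ru=0.0, w_vtu=1.0)
```

Higher total uncertainty pushes the choice probability toward 0.5 without changing which arm is favoured. A positive `w_ru`, by contrast, favours whichever arm is more uncertain.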
39
Abstract
Midbrain dopamine signals are widely thought to report reward prediction errors that drive learning in the basal ganglia. However, dopamine has also been implicated in various probabilistic computations, such as encoding uncertainty and controlling exploration. Here, we show how these different facets of dopamine signalling can be brought together under a common reinforcement learning framework. The key idea is that multiple sources of uncertainty impinge on reinforcement learning computations: uncertainty about the state of the environment, the parameters of the value function and the optimal action policy. Each of these sources plays a distinct role in the prefrontal cortex-basal ganglia circuit for reinforcement learning and is ultimately reflected in dopamine activity. The view that dopamine plays a central role in the encoding and updating of beliefs brings the classical prediction error theory into alignment with more recent theories of Bayesian reinforcement learning.
Affiliation(s)
- Samuel J Gershman
- Department of Psychology, Center for Brain Science, Harvard University, Cambridge, MA, USA.
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA
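The classical prediction error account that this abstract starts from can be summarized by a temporal-difference (TD) update, in which a dopamine-like error signal corrects a value estimate. The following is a minimal tabular TD(0) sketch. The learning rate `alpha`, discount `gamma`, and state names are illustrative assumptions, and the Bayesian extensions the abstract discusses are not modelled here.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update in place and return the reward prediction error."""
    # Prediction error: what happened (reward plus discounted future value) minus what was expected.
    delta = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * delta
    return delta


values = {"cue": 0.0, "outcome": 0.0}
rpe = td0_update(values, "cue", reward=1.0, next_state="outcome", alpha=0.5)
# rpe is 1.0 and values["cue"] becomes 0.5
```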