1
Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024; 20:e1011950. [PMID: 38552190 PMCID: PMC10980507 DOI: 10.1371/journal.pcbi.1011950] [Received: 09/13/2023] [Accepted: 02/26/2024]
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions.
In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
2
Dundon NM, Colas JT, Garrett N, Babenko V, Rizor E, Yang D, MacNamara M, Petzold L, Grafton ST. Decision heuristics in contexts integrating action selection and execution. Sci Rep 2023; 13:6486. [PMID: 37081031 PMCID: PMC10119283 DOI: 10.1038/s41598-023-33008-2] [Received: 09/27/2022] [Accepted: 04/05/2023]
Abstract
Heuristics can inform human decision making in complex environments through a reduction of computational requirements (accuracy-resource trade-off) and a robustness to overparameterisation (less-is-more). However, tasks capturing the efficiency of heuristics typically ignore action proficiency in determining rewards. The requisite movement parameterisation in sensorimotor control raises the question of whether heuristics preserve efficiency when actions are nontrivial. We developed a novel action selection-execution task requiring joint optimisation of action selection and spatio-temporal skillful execution. State-appropriate choices could be determined by a simple spatial heuristic, or by more complex planning. Computational models of action selection parsimoniously distinguished human participants who adopted the heuristic from those using a more complex planning strategy. Broader comparative analyses then revealed that participants using the heuristic showed combined decisional (selection) and skill (execution) advantages, consistent with a less-is-more framework. In addition, the skill advantage of the heuristic group was predominantly in the core spatial features that also shaped their decision policy, evidence that the dimensions of information guiding action selection might be yoked to salient features in skill learning.
Affiliation(s)
- Neil M Dundon
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, CA, 93106, USA
- Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, University of Freiburg, 79104, Freiburg, Germany
- Jaron T Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, CA, 93106, USA
- Neil Garrett
- School of Psychology, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Viktoriya Babenko
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, CA, 93106, USA
- Elizabeth Rizor
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, CA, 93106, USA
- Dengxian Yang
- Department of Computer Science, University of California, Santa Barbara, CA, 93106, USA
- Linda Petzold
- Department of Computer Science, University of California, Santa Barbara, CA, 93106, USA
- Scott T Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, CA, 93106, USA
3
Colas JT, Dundon NM, Gerraty RT, Saragosa‐Harris NM, Szymula KP, Tanwisuth K, Tyszka JM, van Geen C, Ju H, Toga AW, Gold JI, Bassett DS, Hartley CA, Shohamy D, Grafton ST, O'Doherty JP. Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T. Hum Brain Mapp 2022; 43:4750-4790. [PMID: 35860954 PMCID: PMC9491297 DOI: 10.1002/hbm.25988] [Received: 01/19/2022] [Revised: 05/20/2022] [Accepted: 06/10/2022]
Abstract
The model-free algorithms of "reinforcement learning" (RL) have gained clout across disciplines, but so too have model-based alternatives. The present study emphasizes other dimensions of this model space in consideration of associative or discriminative generalization across states and actions. This "generalized reinforcement learning" (GRL) model, a frugal extension of RL, parsimoniously retains the single reward-prediction error (RPE), but the scope of learning goes beyond the experienced state and action. Instead, the generalized RPE is efficiently relayed for bidirectional counterfactual updating of value estimates for other representations. Aided by structural information but as an implicit rather than explicit cognitive map, GRL provided the most precise account of human behavior and individual differences in a reversal-learning task with hierarchical structure that encouraged inverse generalization across both states and actions. Reflecting inference that could be true, false (i.e., overgeneralization), or absent (i.e., undergeneralization), state generalization distinguished those who learned well more so than action generalization. With high-resolution high-field fMRI targeting the dopaminergic midbrain, the GRL model's RPE signals (alongside value and decision signals) were localized within not only the striatum but also the substantia nigra and the ventral tegmental area, including specific effects of generalization that also extend to the hippocampus. Factoring in generalization as a multidimensional process in value-based learning, these findings shed light on complexities that, while challenging classic RL, can still be resolved within the bounds of its core computations.
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
- Neil M. Dundon
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Department of Child and Adolescent Psychiatry, Psychotherapy, and Psychosomatics, University of Freiburg, Freiburg im Breisgau, Germany
- Raphael T. Gerraty
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Center for Science and Society, Columbia University, New York, New York, USA
- Natalie M. Saragosa‐Harris
- Department of Psychology, New York University, New York, New York, USA
- Department of Psychology, University of California, Los Angeles, California, USA
- Karol P. Szymula
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Koranis Tanwisuth
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Department of Psychology, University of California, Berkeley, California, USA
- J. Michael Tyszka
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Camilla van Geen
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Harang Ju
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Arthur W. Toga
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA
- Joshua I. Gold
- Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Dani S. Bassett
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Santa Fe Institute, Santa Fe, New Mexico, USA
- Catherine A. Hartley
- Department of Psychology, New York University, New York, New York, USA
- Center for Neural Science, New York University, New York, New York, USA
- Daphna Shohamy
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Kavli Institute for Brain Science, Columbia University, New York, New York, USA
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- John P. O'Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
4
Copeland A, Stafford T, Field M. Methodological issues with value-based decision-making (VBDM) tasks: The effect of trial wording on evidence accumulation outputs from the EZ drift-diffusion model. Cogent Psychology 2022. [DOI: 10.1080/23311908.2022.2079801]
Affiliation(s)
- Amber Copeland
- Department of Psychology, University of Sheffield, Sheffield, UK
- Tom Stafford
- Department of Psychology, University of Sheffield, Sheffield, UK
- Matt Field
- Department of Psychology, University of Sheffield, Sheffield, UK
5
Budaev S, Kristiansen TS, Giske J, Eliassen S. Computational animal welfare: towards cognitive architecture models of animal sentience, emotion and wellbeing. R Soc Open Sci 2020; 7:201886. [PMID: 33489298 PMCID: PMC7813262 DOI: 10.1098/rsos.201886] [Received: 10/21/2020] [Accepted: 12/04/2020]
Abstract
To understand animal wellbeing, we need to consider subjective phenomena and sentience. This is challenging, since these properties are private and cannot be observed directly. Certain motivations, emotions and related internal states can be inferred in animals through experiments that involve choice, learning, generalization and decision-making. Yet, even though there is significant progress in elucidating the neurobiology of human consciousness, animal consciousness is still a mystery. We propose that computational animal welfare science emerges at the intersection of animal behaviour, welfare and computational cognition. By using ideas from cognitive science, we develop a functional and generic definition of subjective phenomena as any process or state of the organism that exists from the first-person perspective and cannot be isolated from the animal subject. We then outline a general cognitive architecture to model simple forms of subjective processes and sentience. This includes evolutionary adaptation which contains top-down attention modulation, predictive processing and subjective simulation by re-entrant (recursive) computations. Thereafter, we show how this approach uses major characteristics of the subjective experience: elementary self-awareness, global workspace and qualia with unity and continuity. This provides a formal framework for process-based modelling of animal needs, subjective states, sentience and wellbeing.
Affiliation(s)
- Sergey Budaev
- Department of Biological Sciences, University of Bergen, PO Box 7803, 5020 Bergen, Norway
- Tore S. Kristiansen
- Research Group Animal Welfare, Institute of Marine Research, PO Box 1870, 5817 Bergen, Norway
- Jarl Giske
- Department of Biological Sciences, University of Bergen, PO Box 7803, 5020 Bergen, Norway
- Sigrunn Eliassen
- Department of Biological Sciences, University of Bergen, PO Box 7803, 5020 Bergen, Norway
6
Frömer R, Dean Wolf CK, Shenhav A. Goal congruency dominates reward value in accounting for behavioral and neural correlates of value-based decision-making. Nat Commun 2019; 10:4926. [PMID: 31664035 PMCID: PMC6820735 DOI: 10.1038/s41467-019-12931-x] [Received: 04/29/2019] [Accepted: 10/08/2019]
Abstract
When choosing between options, whether menu items or career paths, we can evaluate how rewarding each one will be, or how congruent it is with our current choice goal (e.g., to point out the best option or the worst one). Past decision-making research interpreted findings through the former lens, but in these experiments the most rewarding option was always most congruent with the task goal (choosing the best option). It is therefore unclear to what extent expected reward vs. goal congruency can account for choice value findings. To deconfound these two variables, we performed three behavioral studies and an fMRI study in which the task goal varied between identifying the best vs. the worst option. Contrary to prevailing accounts, we find that goal congruency dominates choice behavior and neural activity. We separately identify dissociable signals of expected reward. Our findings call for a reinterpretation of previous research on value-based choice. Decision-making research has confounded the reward value of options with their goal congruency, as the task goal was always to pick the most rewarding option. Here, the authors separately asked participants to select the least rewarding of a set of options, revealing a dominant role for goal congruency.
Affiliation(s)
- Romy Frömer
- Cognitive, Linguistic, and Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI, USA
- Carolyn K Dean Wolf
- Cognitive, Linguistic, and Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI, USA
- Amitai Shenhav
- Cognitive, Linguistic, and Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI, USA
7
Krajbich I. Accounting for attention in sequential sampling models of decision making. Curr Opin Psychol 2019; 29:6-11. [DOI: 10.1016/j.copsyc.2018.10.008] [Received: 09/02/2018] [Accepted: 10/09/2018]
8
Budaev S, Jørgensen C, Mangel M, Eliassen S, Giske J. Decision-Making From the Animal Perspective: Bridging Ecology and Subjective Cognition. Front Ecol Evol 2019. [DOI: 10.3389/fevo.2019.00164]
9
Colas JT. Correction: Value-based decision making via sequential sampling with hierarchical competition and attentional modulation. PLoS One 2018; 13:e0203093. [PMID: 30138375 PMCID: PMC6107256 DOI: 10.1371/journal.pone.0203093]