1. Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024;20:e1011950. PMID: 38552190; PMCID: PMC10980507; DOI: 10.1371/journal.pcbi.1011950.
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions.
In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
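The abstract's central idea, a mixture of an RL "expert" with nonexpert bias and hysteresis controllers, can be sketched as a single softmax over combined decision weights. The parameterization and all parameter values below are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def choice_probs(q, bias, prev_action, beta=3.0, w_bias=1.0, w_rep=0.5):
    """Mix an RL 'expert' (Q-values) with 'nonexpert' action bias and
    hysteresis terms inside one softmax. w_rep > 0 favors repeating the
    previous action; w_rep < 0 favors alternating away from it."""
    hysteresis = np.zeros_like(q)
    if prev_action is not None:
        hysteresis[prev_action] = 1.0
    return softmax(beta * q + w_bias * bias + w_rep * hysteresis)

# Two actions: Q-values favor action 1, but an alternation tendency
# (negative w_rep) after choosing action 1 pulls probability back toward action 0.
q = np.array([0.2, 0.6])
bias = np.array([0.0, 0.0])
p_no_hyst = choice_probs(q, bias, prev_action=None)
p_alt = choice_probs(q, bias, prev_action=1, w_rep=-1.5)
```

With a hysteresis weight of zero the model collapses back to ordinary softmax RL, which is what makes the nonexpert terms testable by model comparison.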
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
2. Mathar D, Wiebe A, Tuzsus D, Knauth K, Peters J. Erotic cue exposure increases physiological arousal, biases choices toward immediate rewards, and attenuates model-based reinforcement learning. Psychophysiology 2023;60:e14381. PMID: 37435973; DOI: 10.1111/psyp.14381.
Abstract
Computational psychiatry focuses on identifying core cognitive processes that appear altered across distinct psychiatric disorders. Temporal discounting of future rewards and model-based control during reinforcement learning have proven to be two promising candidates. Despite its trait-like stability, temporal discounting may be at least partly under contextual control. Highly arousing cues were shown to increase discounting, although evidence to date remains somewhat mixed. Whether model-based reinforcement learning is similarly affected by arousing cues remains unclear. Here, we tested cue-reactivity effects (erotic pictures) on subsequent temporal discounting and model-based reinforcement learning in a within-subjects design in n = 39 healthy heterosexual male participants. Self-reported and physiological arousal (cardiac activity and pupil dilation) were assessed before and during cue exposure. Arousal was increased during exposure to erotic versus neutral cues on both the subjective and autonomic levels. Erotic cue exposure increased discounting as reflected by more impatient choices. Hierarchical drift diffusion modeling (DDM) linked increased discounting to a shift in the starting-point bias of evidence accumulation toward immediate options. Model-based control during reinforcement learning was reduced following erotic cues according to model-agnostic analysis. Notably, DDM linked this effect to attenuated forgetting rates of unchosen options, leaving the model-based control parameter unchanged. Our findings replicate previous work on cue-reactivity effects in temporal discounting and for the first time show similar effects in model-based reinforcement learning in a heterosexual male sample. This highlights how environmental cues can impact core human decision processes and reveals that comprehensive modeling approaches can yield novel insights into reward-based decision processes.
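The cue-induced shift toward impatient choices can be illustrated with standard hyperbolic discounting (the paper's full analysis uses drift diffusion models; the discount rates here are hypothetical values, not the study's estimates).

```python
def discounted_value(amount, delay, k):
    """Hyperbolic discounting: V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

def chooses_delayed(immediate, delayed, delay, k):
    """True if the discounted value of the delayed reward beats the immediate one."""
    return discounted_value(delayed, delay, k) > immediate

# A cue-induced increase in the discount rate k flips the choice toward the
# immediate option for the same offer: 20 now vs. 50 in 30 days.
baseline_k, cue_k = 0.02, 0.08           # hypothetical discount rates
choice_neutral = chooses_delayed(20, 50, 30, baseline_k)  # 50 / 1.6 = 31.25 > 20
choice_cued = chooses_delayed(20, 50, 30, cue_k)          # 50 / 3.4 ≈ 14.7 < 20
```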
Affiliation(s)
- David Mathar
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Annika Wiebe
- Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
- Deniz Tuzsus
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Kilian Knauth
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Jan Peters
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
3. Le NM, Yildirim M, Wang Y, Sugihara H, Jazayeri M, Sur M. Mixtures of strategies underlie rodent behavior during reversal learning. PLoS Comput Biol 2023;19:e1011430. PMID: 37708113; PMCID: PMC10501641; DOI: 10.1371/journal.pcbi.1011430.
Abstract
In reversal learning tasks, the behavior of humans and animals is often assumed to be uniform within single experimental sessions to facilitate data analysis and model fitting. However, the behavior of agents can display substantial variability in single experimental sessions, as they execute different blocks of trials with different transition dynamics. Here, we observed that in a deterministic reversal learning task, mice display noisy and sub-optimal choice transitions even at the expert stages of learning. We investigated two sources of the sub-optimality in the behavior. First, we found that mice exhibit a high lapse rate during task execution, as they reverted to unrewarded directions after choice transitions. Second, we unexpectedly found that a majority of mice did not execute a uniform strategy, but rather mixed between several behavioral modes with different transition dynamics. We quantified the use of such mixtures with a state-space model, the block hidden Markov model (blockHMM), to dissociate the mixtures of dynamic choice transitions in individual blocks of trials. Additionally, we found that blockHMM transition modes in rodent behavior can be accounted for by two different types of behavioral algorithms, model-free or inference-based learning, that might be used to solve the task. Combining these approaches, we found that mice used a mixture of both exploratory, model-free strategies and deterministic, inference-based behavior in the task, explaining their overall noisy choice sequences. Together, our combined computational approach highlights intrinsic sources of noise in rodent reversal learning behavior and provides a richer description of behavior than conventional techniques, while uncovering the hidden states that underlie the block-by-block transitions.
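A minimal sketch of per-block choice-transition modes of the kind blockHMM dissociates, assuming a lapse-bounded sigmoid parameterization of the transition curve (the paper's exact emission model may differ; all parameter values are illustrative).

```python
import math

def transition_curve(t, offset, slope, lapse):
    """Probability of choosing the newly rewarded side t trials after a
    reversal: a sigmoid bounded away from 0 and 1 by a lapse rate."""
    p = 1.0 / (1.0 + math.exp(-slope * (t - offset)))
    return lapse + (1.0 - 2.0 * lapse) * p

# A fast, inference-based-like mode switches quickly with few lapses;
# an exploratory, model-free-like mode switches slowly and keeps reverting.
fast = [transition_curve(t, offset=1.0, slope=2.0, lapse=0.02) for t in range(10)]
slow = [transition_curve(t, offset=5.0, slope=0.5, lapse=0.20) for t in range(10)]
```

A blockHMM-style analysis would then infer, block by block, which latent mode generated the observed transition curve.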
Affiliation(s)
- Nhat Minh Le
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Murat Yildirim
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Neurosciences, Cleveland Clinic Lerner Research Institute, Cleveland, Ohio, United States of America
- Yizhi Wang
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Hiroki Sugihara
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Mehrdad Jazayeri
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Mriganka Sur
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
4. Mathar D, Erfanian Abdoust M, Marrenbach T, Tuzsus D, Peters J. The catecholamine precursor tyrosine reduces autonomic arousal and decreases decision thresholds in reinforcement learning and temporal discounting. PLoS Comput Biol 2022;18:e1010785. PMID: 36548401; PMCID: PMC9822114; DOI: 10.1371/journal.pcbi.1010785.
Abstract
Supplementation with the catecholamine precursor L-tyrosine might enhance cognitive performance, but overall findings are mixed. Here, we investigate the effect of a single dose of tyrosine (2 g) vs. placebo on two catecholamine-dependent transdiagnostic traits: model-based control during reinforcement learning (2-step task) and temporal discounting, using a double-blind, placebo-controlled, within-subject design (n = 28 healthy male participants). We leveraged drift diffusion models in a hierarchical Bayesian framework to jointly model participants' choices and response times (RTs) in both tasks. Furthermore, comprehensive autonomic monitoring (heart rate, heart rate variability, pupillometry, spontaneous eye blink rate) was performed both pre- and post-supplementation, to explore potential physiological effects of supplementation. Across tasks, tyrosine consistently reduced participants' RTs without deteriorating task performance. Diffusion modeling linked this effect to attenuated decision thresholds in both tasks and further revealed increased model-based control (2-step task) and (if anything) attenuated temporal discounting. On the physiological level, participants' pupil dilation was predictive of the individual degree of temporal discounting. Tyrosine supplementation reduced physiological arousal as revealed by increases in pupil dilation variability and reductions in heart rate. Supplementation-related changes in physiological arousal predicted individual changes in temporal discounting. Our findings provide the first evidence that tyrosine supplementation might impact psychophysiological parameters, and suggest that modeling approaches based on sequential sampling models can yield novel insights into latent cognitive processes modulated by amino-acid supplementation.
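The reported threshold effect can be reproduced in a toy Euler simulation of the drift diffusion process: lowering the decision threshold shortens first-passage times, i.e., speeds RTs. All parameter values are illustrative, not the study's estimates.

```python
import random

def simulate_ddm(drift, threshold, dt=0.001, noise=1.0, max_t=10.0, seed=None):
    """Euler simulation of one drift diffusion trial: evidence starts at 0 and
    accumulates until it hits +threshold (correct) or -threshold (error)."""
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    sd = noise * dt ** 0.5
    while abs(x) < threshold and t < max_t:
        x += drift * dt + rng.gauss(0.0, sd)
        t += dt
    return t, x >= threshold

# Same drift and same noise (matched seeds), only the threshold differs:
# the lower threshold is hit earlier on every trial, so mean RT drops.
rts_high = [simulate_ddm(1.0, 2.0, seed=i)[0] for i in range(200)]
rts_low = [simulate_ddm(1.0, 1.0, seed=i)[0] for i in range(200)]
mean_high = sum(rts_high) / len(rts_high)
mean_low = sum(rts_low) / len(rts_low)
```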
Affiliation(s)
- David Mathar
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Mani Erfanian Abdoust
- Biological Psychology of Decision Making, Institute of Experimental Psychology, Heinrich Heine University Duesseldorf, Duesseldorf, Germany
- Tobias Marrenbach
- Biological Psychology of Decision Making, Institute of Experimental Psychology, Heinrich Heine University Duesseldorf, Duesseldorf, Germany
- Deniz Tuzsus
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Jan Peters
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
5. Colas JT, Dundon NM, Gerraty RT, Saragosa-Harris NM, Szymula KP, Tanwisuth K, Tyszka JM, van Geen C, Ju H, Toga AW, Gold JI, Bassett DS, Hartley CA, Shohamy D, Grafton ST, O'Doherty JP. Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T. Hum Brain Mapp 2022;43:4750-4790. PMID: 35860954; PMCID: PMC9491297; DOI: 10.1002/hbm.25988.
Abstract
The model-free algorithms of "reinforcement learning" (RL) have gained clout across disciplines, but so too have model-based alternatives. The present study emphasizes other dimensions of this model space in consideration of associative or discriminative generalization across states and actions. This "generalized reinforcement learning" (GRL) model, a frugal extension of RL, parsimoniously retains the single reward-prediction error (RPE), but the scope of learning goes beyond the experienced state and action. Instead, the generalized RPE is efficiently relayed for bidirectional counterfactual updating of value estimates for other representations. Aided by structural information but as an implicit rather than explicit cognitive map, GRL provided the most precise account of human behavior and individual differences in a reversal-learning task with hierarchical structure that encouraged inverse generalization across both states and actions. Reflecting inference that could be true, false (i.e., overgeneralization), or absent (i.e., undergeneralization), state generalization distinguished those who learned well more so than action generalization. With high-resolution high-field fMRI targeting the dopaminergic midbrain, the GRL model's RPE signals (alongside value and decision signals) were localized within not only the striatum but also the substantia nigra and the ventral tegmental area, including specific effects of generalization that also extend to the hippocampus. Factoring in generalization as a multidimensional process in value-based learning, these findings shed light on complexities that, while challenging classic RL, can still be resolved within the bounds of its core computations.
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
- Neil M. Dundon
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Department of Child and Adolescent Psychiatry, Psychotherapy, and Psychosomatics, University of Freiburg, Freiburg im Breisgau, Germany
- Raphael T. Gerraty
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Center for Science and Society, Columbia University, New York, New York, USA
- Natalie M. Saragosa-Harris
- Department of Psychology, New York University, New York, New York, USA
- Department of Psychology, University of California, Los Angeles, California, USA
- Karol P. Szymula
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Koranis Tanwisuth
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Department of Psychology, University of California, Berkeley, California, USA
- J. Michael Tyszka
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Camilla van Geen
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Harang Ju
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Arthur W. Toga
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA
- Joshua I. Gold
- Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Dani S. Bassett
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Santa Fe Institute, Santa Fe, New Mexico, USA
- Catherine A. Hartley
- Department of Psychology, New York University, New York, New York, USA
- Center for Neural Science, New York University, New York, New York, USA
- Daphna Shohamy
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Kavli Institute for Brain Science, Columbia University, New York, New York, USA
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- John P. O'Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
6. O'Connell K, Walsh M, Padgett B, Connell S, Marsh AA. Modeling variation in empathic sensitivity using go/no-go social reinforcement learning. Affective Science 2022;3:603-615. PMID: 36385908; PMCID: PMC9537390; DOI: 10.1007/s42761-022-00119-4.
Abstract
Recent advances in computational behavioral modeling can help rigorously quantify differences in how individuals learn behaviors that affect both themselves and others. But social learning remains understudied in the context of understanding individual variation in social phenomena like aggression, which is defined by persistent engagement in behaviors that harm others. We adapted a go/no-go reinforcement learning task across social and non-social contexts such that monetary gains and losses explicitly impacted the subject, a study partner, or no one. We then quantified participants' (n = 61) sensitivity to others' rewards, sensitivity to others' losses, and the Pavlovian influence of expected outcomes on approach and avoidance behavior. Results showed that subjects learned in response to punishments and rewards that affected their partner in a way that was computationally similar to how they learned for themselves, consistent with the possibility that social learning engages empathic processes. Further supporting this interpretation, an individualized model parameter that indexed sensitivity to others' punishments was inversely associated with trait antisociality. Modeled sensitivity to others' losses also mapped onto post-task motivation ratings, but was not associated with self-reported trait empathy. This work is the first to apply a social reinforcement learning task that spans affect and action requirement (go/no-go) to measure multiple facets of empathic sensitivity.
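The Pavlovian influence described above is often parameterized as in the following sketch, a common go/no-go RL formulation in which state value is coupled to the "go" weight; the paper's exact model and parameter values may differ.

```python
import math

def p_go(q_go, q_nogo, v_state, go_bias=0.3, pavlovian=0.5):
    """Probability of emitting 'go'. The Pavlovian term couples state value
    to approach: expected rewards boost 'go', expected losses suppress it,
    independently of the instrumental Q-values."""
    w_go = q_go + go_bias + pavlovian * v_state
    return 1.0 / (1.0 + math.exp(-(w_go - q_nogo)))

# With identical instrumental values, a positive state value promotes 'go'
# and a negative one (an expected loss) promotes withholding the response.
p_reward_cue = p_go(0.0, 0.0, v_state=+1.0)
p_loss_cue = p_go(0.0, 0.0, v_state=-1.0)
```

In a social variant, separate sensitivity parameters would scale outcomes delivered to the self versus the partner before they enter the value updates.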
Affiliation(s)
- Katherine O’Connell
- Interdisciplinary Program in Neuroscience, Georgetown University, Washington, DC, USA
- Marissa Walsh
- Department of Psychology, Georgetown University, Washington, DC, USA
- Brandon Padgett
- Department of Psychology, Georgetown University, Washington, DC, USA
- Sarah Connell
- Department of Psychology, Georgetown University, Washington, DC, USA
- Abigail A. Marsh
- Interdisciplinary Program in Neuroscience, Georgetown University, Washington, DC, USA
- Department of Psychology, Georgetown University, Washington, DC, USA
7. Wagner B, Mathar D, Peters J. Gambling environment exposure increases temporal discounting but improves model-based control in regular slot-machine gamblers. Computational Psychiatry 2022;6:142-165. PMID: 38774777; PMCID: PMC11104401; DOI: 10.5334/cpsy.84.
Abstract
Gambling disorder is a behavioral addiction that negatively impacts personal finances, work, relationships and mental health. In this pre-registered study (https://osf.io/5ptz9/) we investigated the impact of real-life gambling environments on two computational markers of addiction, temporal discounting and model-based reinforcement learning. Gambling disorder is associated with increased temporal discounting and reduced model-based learning. Regular gamblers (n = 30, DSM-5 score range 3-9) performed both tasks in a neutral (café) and a gambling-related environment (slot-machine venue) in counterbalanced order. Data were modeled using drift diffusion models for temporal discounting and reinforcement learning via hierarchical Bayesian estimation. Replicating previous findings, gamblers discounted rewards more steeply in the gambling-related context. This effect was positively correlated with gambling-related cognitive distortions (pre-registered analysis). In contrast to our pre-registered hypothesis, model-based reinforcement learning was improved in the gambling context. Here we show that temporal discounting and model-based reinforcement learning are modulated in opposite ways by real-life gambling cue exposure. Results challenge aspects of habit theories of addiction, and reveal that laboratory-based computational markers of psychopathology are under substantial contextual control.
Affiliation(s)
- Ben Wagner
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Faculty of Psychology, Chair of Neuroimaging, Technical University of Dresden, Dresden, Germany
- David Mathar
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Jan Peters
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
8. Yalnizyan-Carson A, Richards BA. Forgetting enhances episodic control with structured memories. Front Comput Neurosci 2022;16:757244. PMID: 35399916; PMCID: PMC8991683; DOI: 10.3389/fncom.2022.757244.
Abstract
Forgetting is a normal process in healthy brains, and evidence suggests that the mammalian brain forgets more than is required based on limitations of mnemonic capacity. Episodic memories, in particular, are liable to be forgotten over time. Researchers have hypothesized that it may be beneficial for decision making to forget episodic memories over time. Reinforcement learning offers a normative framework in which to test such hypotheses. Here, we show that a reinforcement learning agent that uses an episodic memory cache to find rewards in maze environments can forget a large percentage of older memories without any performance impairment, provided it utilizes mnemonic representations that contain structural information about space. Moreover, we show that some forgetting can actually provide a benefit in performance compared to agents with unbounded memories. Our analyses of the agents show that forgetting reduces the influence of outdated information and of infrequently visited states on the policies produced by the episodic control system. These results support the hypothesis that some degree of forgetting can be beneficial for decision making, which can help to explain why the brain forgets more than is required by capacity limitations.
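The core mechanism, forgetting the oldest entries of a bounded episodic cache, can be sketched as follows. This is an illustrative store, not the paper's agent; in the paper the keys would be structured state representations, which is what makes aggressive forgetting safe.

```python
from collections import OrderedDict

class EpisodicCache:
    """Minimal episodic-control store: maps state keys to the highest return
    observed there, evicting the oldest memory once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def write(self, state, value):
        # Keep the best return seen for this state; re-inserting refreshes recency.
        if state in self.store:
            value = max(value, self.store.pop(state))
        self.store[state] = value
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)   # forget the oldest memory

    def read(self, state, default=0.0):
        return self.store.get(state, default)

cache = EpisodicCache(capacity=3)
for s, v in [("A", 1.0), ("B", 0.5), ("C", 2.0), ("D", 1.5)]:
    cache.write(s, v)
# "A" was the oldest entry and has been forgotten; recent states remain.
```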
Affiliation(s)
- Annik Yalnizyan-Carson
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, ON, Canada
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
- Montreal Institute for Learning Algorithms (MILA), Montreal, QC, Canada
- Blake A. Richards
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
- Montreal Institute for Learning Algorithms (MILA), Montreal, QC, Canada
- Montreal Neurological Institute, Montreal, QC, Canada
- Department of Neurology and Neurosurgery, McGill University, Montreal, QC, Canada
- School of Computer Science, McGill University, Montreal, QC, Canada
9. Optimism and pessimism in optimised replay. PLoS Comput Biol 2022;18:e1009634. PMID: 35020718; PMCID: PMC8809607; DOI: 10.1371/journal.pcbi.1009634.
Abstract
The replay of task-relevant trajectories is known to contribute to memory consolidation and improved task performance. A wide variety of experimental data show that the content of replayed sequences is highly specific and can be modulated by reward as well as other prominent task variables. However, the rules governing the choice of sequences to be replayed still remain poorly understood. One recent theoretical suggestion is that the prioritization of replay experiences in decision-making problems is based on their effect on the choice of action. We show that this implies that subjects should replay sub-optimal actions that they dysfunctionally choose rather than optimal ones, when, by being forgetful, they experience large amounts of uncertainty in their internal models of the world. We use this to account for recent experimental data demonstrating exactly pessimal replay, fitting model parameters to the individual subjects’ choices.

When animals are asleep or restfully awake, populations of neurons in their brains recapitulate activity associated with extended behaviourally-relevant experiences. This process is called replay, and it has been established for a long time in rodents, and very recently in humans, to be important for good performance in decision-making tasks. The specific experiences which are replayed during those epochs follow highly ordered patterns, but the mechanisms which establish their priority are still not fully understood. One promising theoretical suggestion is that each replay experience is chosen in such a way that the learning that ensues is most helpful for the subsequent performance of the animal. A very recent study reported a surprising result that humans who achieved high performance in a planning task tended to replay actions they found to be sub-optimal, and that this was associated with a useful deprecation of those actions in subsequent performance.
In this study, we examine the nature of this pessimized form of replay and show that it is exactly appropriate for forgetful agents. We analyse the role of forgetting for replay choices of our model, and verify our predictions using human subject data.
10. Revisiting the importance of model fitting for model-based fMRI: it does matter in computational psychiatry. PLoS Comput Biol 2021;17:e1008738. PMID: 33561125; PMCID: PMC7899379; DOI: 10.1371/journal.pcbi.1008738.
Abstract
Computational modeling has been applied for data analysis in psychology, neuroscience, and psychiatry. One of its important uses is to infer the latent variables underlying behavior, by which researchers can evaluate corresponding neural, physiological, or behavioral measures. This feature is especially crucial for computational psychiatry, in which altered computational processes underlying mental disorders are of interest. For instance, several studies employing model-based fMRI, a method for identifying brain regions correlated with latent variables, have shown that patients with mental disorders (e.g., depression) exhibit diminished neural responses to reward prediction errors (RPEs), which are the differences between experienced and predicted rewards. Such model-based analysis has the drawback that the parameter estimates and inference of latent variables are not necessarily correct; rather, they usually contain some errors. A previous study theoretically and empirically showed that the error in model fitting does not necessarily cause a serious error in model-based fMRI. However, the study did not deal with certain situations relevant to psychiatry, such as group comparisons between patients and healthy controls. We developed a theoretical framework to explore such situations. We demonstrate that parameter misspecification can critically affect the results of group comparison. We demonstrate that even if the RPE response in patients is completely intact, a spurious difference from healthy controls is observable. Such a situation occurs when the ground-truth learning rate differs between groups but a common learning rate is used, as per previous studies. Furthermore, even if the parameters are appropriately fitted to individual participants, spurious group differences in RPE responses are observable when the model lacks a component that differs between groups.
These results highlight the importance of appropriate model-fitting and the need for caution when interpreting the results of model-based fMRI.
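The spurious group difference can be demonstrated in a few lines with a Rescorla-Wagner toy example (all values hypothetical): generate "neural" RPEs with each group's true learning rate, then regress them on model RPEs computed with one common learning rate, as the criticized analysis would.

```python
import random

def rpe_series(rewards, alpha):
    """Rescorla-Wagner prediction errors for one option with learning rate alpha."""
    v, rpes = 0.0, []
    for r in rewards:
        rpe = r - v
        rpes.append(rpe)
        v += alpha * rpe
    return rpes

def slope(x, y):
    """OLS slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

random.seed(1)
rewards = [1.0 if random.random() < 0.7 else 0.0 for _ in range(500)]

# "Neural" RPEs follow each group's true learning rate, but the analyst's
# regressor uses one common alpha = 0.3 for everyone.
common = rpe_series(rewards, alpha=0.3)
patients = rpe_series(rewards, alpha=0.1)   # true alpha differs...
controls = rpe_series(rewards, alpha=0.3)   # ...only for the patient group

b_patients = slope(common, patients)
b_controls = slope(common, controls)
# b_patients < b_controls despite both groups having fully intact RPE responses.
```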