1. Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024;20:e1011950. PMID: 38552190; PMCID: PMC10980507; DOI: 10.1371/journal.pcbi.1011950.
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions.
In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
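The abstract's central idea, a mixture of an RL "expert" with nonexpert bias and hysteresis controllers, can be sketched as a single softmax over combined decision weights. The parameterization and all parameter values below are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def choice_probs(q, bias, prev_action, beta=3.0, w_bias=1.0, w_rep=0.5):
    """Mix an RL 'expert' (Q-values) with 'nonexpert' action bias and
    hysteresis terms inside one softmax. w_rep > 0 favors repeating the
    previous action; w_rep < 0 favors alternating away from it."""
    hysteresis = np.zeros_like(q)
    if prev_action is not None:
        hysteresis[prev_action] = 1.0
    return softmax(beta * q + w_bias * bias + w_rep * hysteresis)

# Two actions: Q-values favor action 1, but an alternation tendency
# (negative w_rep) after choosing action 1 pulls probability back toward action 0.
q = np.array([0.2, 0.6])
bias = np.array([0.0, 0.0])
p_no_hyst = choice_probs(q, bias, prev_action=None)
p_alt = choice_probs(q, bias, prev_action=1, w_rep=-1.5)
```

With a hysteresis weight of zero the model collapses back to ordinary softmax RL, which is what makes the nonexpert terms testable by model comparison.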
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
2. Mathar D, Wiebe A, Tuzsus D, Knauth K, Peters J. Erotic cue exposure increases physiological arousal, biases choices toward immediate rewards, and attenuates model-based reinforcement learning. Psychophysiology 2023;60:e14381. PMID: 37435973; DOI: 10.1111/psyp.14381.
Abstract
Computational psychiatry focuses on identifying core cognitive processes that appear altered across distinct psychiatric disorders. Temporal discounting of future rewards and model-based control during reinforcement learning have proven to be two promising candidates. Despite its trait-like stability, temporal discounting may be at least partly under contextual control. Highly arousing cues were shown to increase discounting, although evidence to date remains somewhat mixed. Whether model-based reinforcement learning is similarly affected by arousing cues remains unclear. Here, we tested cue-reactivity effects (erotic pictures) on subsequent temporal discounting and model-based reinforcement learning in a within-subjects design in n = 39 healthy heterosexual male participants. Self-reported and physiological arousal (cardiac activity and pupil dilation) were assessed before and during cue exposure. Arousal was increased during exposure to erotic versus neutral cues on both the subjective and autonomic levels. Erotic cue exposure increased discounting as reflected by more impatient choices. Hierarchical drift diffusion modeling (DDM) linked increased discounting to a shift in the starting-point bias of evidence accumulation toward immediate options. Model-based control during reinforcement learning was reduced following erotic cues according to model-agnostic analysis. Notably, DDM linked this effect to attenuated forgetting rates of unchosen options, leaving the model-based control parameter unchanged. Our findings replicate previous work on cue-reactivity effects in temporal discounting and for the first time show similar effects in model-based reinforcement learning in a heterosexual male sample. This highlights how environmental cues can impact core human decision processes and reveals that comprehensive modeling approaches can yield novel insights into reward-based decision processes.
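The cue-induced shift toward impatient choices can be illustrated with standard hyperbolic discounting (the paper's full analysis uses drift diffusion models; the discount rates here are hypothetical values, not the study's estimates).

```python
def discounted_value(amount, delay, k):
    """Hyperbolic discounting: V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

def chooses_delayed(immediate, delayed, delay, k):
    """True if the discounted value of the delayed reward beats the immediate one."""
    return discounted_value(delayed, delay, k) > immediate

# A cue-induced increase in the discount rate k flips the choice toward the
# immediate option for the same offer: 20 now vs. 50 in 30 days.
baseline_k, cue_k = 0.02, 0.08           # hypothetical discount rates
choice_neutral = chooses_delayed(20, 50, 30, baseline_k)  # 50 / 1.6 = 31.25 > 20
choice_cued = chooses_delayed(20, 50, 30, cue_k)          # 50 / 3.4 ≈ 14.7 < 20
```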
Affiliation(s)
- David Mathar
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Annika Wiebe
- Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
- Deniz Tuzsus
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Kilian Knauth
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Jan Peters
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
3. Le NM, Yildirim M, Wang Y, Sugihara H, Jazayeri M, Sur M. Mixtures of strategies underlie rodent behavior during reversal learning. PLoS Comput Biol 2023;19:e1011430. PMID: 37708113; PMCID: PMC10501641; DOI: 10.1371/journal.pcbi.1011430.
Abstract
In reversal learning tasks, the behavior of humans and animals is often assumed to be uniform within single experimental sessions to facilitate data analysis and model fitting. However, the behavior of agents can display substantial variability in single experimental sessions, as they execute different blocks of trials with different transition dynamics. Here, we observed that in a deterministic reversal learning task, mice display noisy and sub-optimal choice transitions even at the expert stages of learning. We investigated two sources of the sub-optimality in the behavior. First, we found that mice exhibit a high lapse rate during task execution, as they reverted to unrewarded directions after choice transitions. Second, we unexpectedly found that a majority of mice did not execute a uniform strategy, but rather mixed between several behavioral modes with different transition dynamics. We quantified the use of such mixtures with a state-space model, the block hidden Markov model (blockHMM), to dissociate the mixtures of dynamic choice transitions in individual blocks of trials. Additionally, we found that blockHMM transition modes in rodent behavior can be accounted for by two different types of behavioral algorithms, model-free or inference-based learning, that might be used to solve the task. Combining these approaches, we found that mice used a mixture of both exploratory, model-free strategies and deterministic, inference-based behavior in the task, explaining their overall noisy choice sequences. Together, our combined computational approach highlights intrinsic sources of noise in rodent reversal learning behavior and provides a richer description of behavior than conventional techniques, while uncovering the hidden states that underlie the block-by-block transitions.
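A minimal sketch of per-block choice-transition modes of the kind blockHMM dissociates, assuming a lapse-bounded sigmoid parameterization of the transition curve (the paper's exact emission model may differ; all parameter values are illustrative).

```python
import math

def transition_curve(t, offset, slope, lapse):
    """Probability of choosing the newly rewarded side t trials after a
    reversal: a sigmoid bounded away from 0 and 1 by a lapse rate."""
    p = 1.0 / (1.0 + math.exp(-slope * (t - offset)))
    return lapse + (1.0 - 2.0 * lapse) * p

# A fast, inference-based-like mode switches quickly with few lapses;
# an exploratory, model-free-like mode switches slowly and keeps reverting.
fast = [transition_curve(t, offset=1.0, slope=2.0, lapse=0.02) for t in range(10)]
slow = [transition_curve(t, offset=5.0, slope=0.5, lapse=0.20) for t in range(10)]
```

A blockHMM-style analysis would then infer, block by block, which latent mode generated the observed transition curve.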
Affiliation(s)
- Nhat Minh Le
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Murat Yildirim
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Neurosciences, Cleveland Clinic Lerner Research Institute, Cleveland, Ohio, United States of America
- Yizhi Wang
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Hiroki Sugihara
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Mehrdad Jazayeri
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Mriganka Sur
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
4. Mathar D, Erfanian Abdoust M, Marrenbach T, Tuzsus D, Peters J. The catecholamine precursor tyrosine reduces autonomic arousal and decreases decision thresholds in reinforcement learning and temporal discounting. PLoS Comput Biol 2022;18:e1010785. PMID: 36548401; PMCID: PMC9822114; DOI: 10.1371/journal.pcbi.1010785.
Abstract
Supplementation with the catecholamine precursor L-tyrosine might enhance cognitive performance, but overall findings are mixed. Here, we investigate the effect of a single dose of tyrosine (2 g) vs. placebo on two catecholamine-dependent transdiagnostic traits: model-based control during reinforcement learning (2-step task) and temporal discounting, using a double-blind, placebo-controlled, within-subject design (n = 28 healthy male participants). We leveraged drift diffusion models in a hierarchical Bayesian framework to jointly model participants' choices and response times (RTs) in both tasks. Furthermore, comprehensive autonomic monitoring (heart rate, heart rate variability, pupillometry, spontaneous eye blink rate) was performed both pre- and post-supplementation, to explore potential physiological effects of supplementation. Across tasks, tyrosine consistently reduced participants' RTs without deteriorating task performance. Diffusion modeling linked this effect to attenuated decision thresholds in both tasks and further revealed increased model-based control (2-step task) and (if anything) attenuated temporal discounting. On the physiological level, participants' pupil dilation was predictive of the individual degree of temporal discounting. Tyrosine supplementation reduced physiological arousal as revealed by increases in pupil dilation variability and reductions in heart rate. Supplementation-related changes in physiological arousal predicted individual changes in temporal discounting. Our findings provide the first evidence that tyrosine supplementation might impact psychophysiological parameters, and suggest that modeling approaches based on sequential sampling models can yield novel insights into latent cognitive processes modulated by amino-acid supplementation.
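The reported threshold effect can be reproduced in a toy Euler simulation of the drift diffusion process: lowering the decision threshold shortens first-passage times, i.e., speeds RTs. All parameter values are illustrative, not the study's estimates.

```python
import random

def simulate_ddm(drift, threshold, dt=0.001, noise=1.0, max_t=10.0, seed=None):
    """Euler simulation of one drift diffusion trial: evidence starts at 0 and
    accumulates until it hits +threshold (correct) or -threshold (error)."""
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    sd = noise * dt ** 0.5
    while abs(x) < threshold and t < max_t:
        x += drift * dt + rng.gauss(0.0, sd)
        t += dt
    return t, x >= threshold

# Same drift and same noise (matched seeds), only the threshold differs:
# the lower threshold is hit earlier on every trial, so mean RT drops.
rts_high = [simulate_ddm(1.0, 2.0, seed=i)[0] for i in range(200)]
rts_low = [simulate_ddm(1.0, 1.0, seed=i)[0] for i in range(200)]
mean_high = sum(rts_high) / len(rts_high)
mean_low = sum(rts_low) / len(rts_low)
```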
Affiliation(s)
- David Mathar
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Mani Erfanian Abdoust
- Biological Psychology of Decision Making, Institute of Experimental Psychology, Heinrich Heine University Duesseldorf, Duesseldorf, Germany
- Tobias Marrenbach
- Biological Psychology of Decision Making, Institute of Experimental Psychology, Heinrich Heine University Duesseldorf, Duesseldorf, Germany
- Deniz Tuzsus
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Jan Peters
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
5. Colas JT, Dundon NM, Gerraty RT, Saragosa-Harris NM, Szymula KP, Tanwisuth K, Tyszka JM, van Geen C, Ju H, Toga AW, Gold JI, Bassett DS, Hartley CA, Shohamy D, Grafton ST, O'Doherty JP. Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T. Hum Brain Mapp 2022;43:4750-4790. PMID: 35860954; PMCID: PMC9491297; DOI: 10.1002/hbm.25988.
Abstract
The model-free algorithms of "reinforcement learning" (RL) have gained clout across disciplines, but so too have model-based alternatives. The present study emphasizes other dimensions of this model space in consideration of associative or discriminative generalization across states and actions. This "generalized reinforcement learning" (GRL) model, a frugal extension of RL, parsimoniously retains the single reward-prediction error (RPE), but the scope of learning goes beyond the experienced state and action. Instead, the generalized RPE is efficiently relayed for bidirectional counterfactual updating of value estimates for other representations. Aided by structural information but as an implicit rather than explicit cognitive map, GRL provided the most precise account of human behavior and individual differences in a reversal-learning task with hierarchical structure that encouraged inverse generalization across both states and actions. Reflecting inference that could be true, false (i.e., overgeneralization), or absent (i.e., undergeneralization), state generalization distinguished those who learned well more so than action generalization. With high-resolution high-field fMRI targeting the dopaminergic midbrain, the GRL model's RPE signals (alongside value and decision signals) were localized within not only the striatum but also the substantia nigra and the ventral tegmental area, including specific effects of generalization that also extend to the hippocampus. Factoring in generalization as a multidimensional process in value-based learning, these findings shed light on complexities that, while challenging classic RL, can still be resolved within the bounds of its core computations.
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
- Neil M. Dundon
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Department of Child and Adolescent Psychiatry, Psychotherapy, and Psychosomatics, University of Freiburg, Freiburg im Breisgau, Germany
- Raphael T. Gerraty
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Center for Science and Society, Columbia University, New York, New York, USA
- Natalie M. Saragosa-Harris
- Department of Psychology, New York University, New York, New York, USA
- Department of Psychology, University of California, Los Angeles, California, USA
- Karol P. Szymula
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Koranis Tanwisuth
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Department of Psychology, University of California, Berkeley, California, USA
- J. Michael Tyszka
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Camilla van Geen
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Harang Ju
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Arthur W. Toga
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA
- Joshua I. Gold
- Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Dani S. Bassett
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Santa Fe Institute, Santa Fe, New Mexico, USA
- Catherine A. Hartley
- Department of Psychology, New York University, New York, New York, USA
- Center for Neural Science, New York University, New York, New York, USA
- Daphna Shohamy
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Kavli Institute for Brain Science, Columbia University, New York, New York, USA
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- John P. O'Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
6. O'Connell K, Walsh M, Padgett B, Connell S, Marsh AA. Modeling variation in empathic sensitivity using go/no-go social reinforcement learning. Affective Science 2022;3:603-615. PMID: 36385908; PMCID: PMC9537390; DOI: 10.1007/s42761-022-00119-4.
Abstract
Recent advances in computational behavioral modeling can help rigorously quantify differences in how individuals learn behaviors that affect both themselves and others. But social learning remains understudied in the context of understanding individual variation in social phenomena like aggression, which is defined by persistent engagement in behaviors that harm others. We adapted a go/no-go reinforcement learning task across social and non-social contexts such that monetary gains and losses explicitly impacted the subject, a study partner, or no one. We then quantified participants' (n = 61) sensitivity to others' rewards, sensitivity to others' losses, and the Pavlovian influence of expected outcomes on approach and avoidance behavior. Results showed that subjects learned in response to punishments and rewards that affected their partner in a way that was computationally similar to how they learned for themselves, consistent with the possibility that social learning engages empathic processes. Further supporting this interpretation, an individualized model parameter that indexed sensitivity to others' punishments was inversely associated with trait antisociality. Modeled sensitivity to others' losses also mapped onto post-task motivation ratings, but was not associated with self-reported trait empathy. This work is the first to apply a social reinforcement learning task that spans affect and action requirement (go/no-go) to measure multiple facets of empathic sensitivity.
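The Pavlovian influence described above is often parameterized as in the following sketch, a common go/no-go RL formulation in which state value is coupled to the "go" weight; the paper's exact model and parameter values may differ.

```python
import math

def p_go(q_go, q_nogo, v_state, go_bias=0.3, pavlovian=0.5):
    """Probability of emitting 'go'. The Pavlovian term couples state value
    to approach: expected rewards boost 'go', expected losses suppress it,
    independently of the instrumental Q-values."""
    w_go = q_go + go_bias + pavlovian * v_state
    return 1.0 / (1.0 + math.exp(-(w_go - q_nogo)))

# With identical instrumental values, a positive state value promotes 'go'
# and a negative one (an expected loss) promotes withholding the response.
p_reward_cue = p_go(0.0, 0.0, v_state=+1.0)
p_loss_cue = p_go(0.0, 0.0, v_state=-1.0)
```

In a social variant, separate sensitivity parameters would scale outcomes delivered to the self versus the partner before they enter the value updates.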
Affiliation(s)
- Katherine O’Connell
- Interdisciplinary Program in Neuroscience, Georgetown University, Washington, DC, USA
- Marissa Walsh
- Department of Psychology, Georgetown University, Washington, DC, USA
- Brandon Padgett
- Department of Psychology, Georgetown University, Washington, DC, USA
- Sarah Connell
- Department of Psychology, Georgetown University, Washington, DC, USA
- Abigail A. Marsh
- Interdisciplinary Program in Neuroscience, Georgetown University, Washington, DC, USA
- Department of Psychology, Georgetown University, Washington, DC, USA
7. Wagner B, Mathar D, Peters J. Gambling environment exposure increases temporal discounting but improves model-based control in regular slot-machine gamblers. Computational Psychiatry 2022;6:142-165. PMID: 38774777; PMCID: PMC11104401; DOI: 10.5334/cpsy.84.
Abstract
Gambling disorder is a behavioral addiction that negatively impacts personal finances, work, relationships and mental health. In this pre-registered study (https://osf.io/5ptz9/) we investigated the impact of real-life gambling environments on two computational markers of addiction, temporal discounting and model-based reinforcement learning. Gambling disorder is associated with increased temporal discounting and reduced model-based learning. Regular gamblers (n = 30, DSM-5 score range 3-9) performed both tasks in a neutral (café) and a gambling-related environment (slot-machine venue) in counterbalanced order. Data were modeled using drift diffusion models for temporal discounting and reinforcement learning via hierarchical Bayesian estimation. Replicating previous findings, gamblers discounted rewards more steeply in the gambling-related context. This effect was positively correlated with gambling-related cognitive distortions (pre-registered analysis). In contrast to our pre-registered hypothesis, model-based reinforcement learning was improved in the gambling context. Here we show that temporal discounting and model-based reinforcement learning are modulated in opposite ways by real-life gambling cue exposure. Results challenge aspects of habit theories of addiction, and reveal that laboratory-based computational markers of psychopathology are under substantial contextual control.
Affiliation(s)
- Ben Wagner
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Faculty of Psychology, Chair of Neuroimaging, Technical University of Dresden, Dresden, Germany
- David Mathar
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
- Jan Peters
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
8. Yalnizyan-Carson A, Richards BA. Forgetting enhances episodic control with structured memories. Front Comput Neurosci 2022;16:757244. PMID: 35399916; PMCID: PMC8991683; DOI: 10.3389/fncom.2022.757244.
Abstract
Forgetting is a normal process in healthy brains, and evidence suggests that the mammalian brain forgets more than is required based on limitations of mnemonic capacity. Episodic memories, in particular, are liable to be forgotten over time. Researchers have hypothesized that it may be beneficial for decision making to forget episodic memories over time. Reinforcement learning offers a normative framework in which to test such hypotheses. Here, we show that a reinforcement learning agent that uses an episodic memory cache to find rewards in maze environments can forget a large percentage of older memories without any performance impairment, provided it utilizes mnemonic representations that contain structural information about space. Moreover, we show that some forgetting can actually provide a benefit in performance compared to agents with unbounded memories. Our analyses of the agents show that forgetting reduces the influence of outdated information and of infrequently visited states on the policies produced by the episodic control system. These results support the hypothesis that some degree of forgetting can be beneficial for decision making, which can help to explain why the brain forgets more than is required by capacity limitations.
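The core mechanism, forgetting the oldest entries of a bounded episodic cache, can be sketched as follows. This is an illustrative store, not the paper's agent; in the paper the keys would be structured state representations, which is what makes aggressive forgetting safe.

```python
from collections import OrderedDict

class EpisodicCache:
    """Minimal episodic-control store: maps state keys to the highest return
    observed there, evicting the oldest memory once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def write(self, state, value):
        # Keep the best return seen for this state; re-inserting refreshes recency.
        if state in self.store:
            value = max(value, self.store.pop(state))
        self.store[state] = value
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)   # forget the oldest memory

    def read(self, state, default=0.0):
        return self.store.get(state, default)

cache = EpisodicCache(capacity=3)
for s, v in [("A", 1.0), ("B", 0.5), ("C", 2.0), ("D", 1.5)]:
    cache.write(s, v)
# "A" was the oldest entry and has been forgotten; recent states remain.
```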
Affiliation(s)
- Annik Yalnizyan-Carson
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, ON, Canada
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
- Montreal Institute for Learning Algorithms (MILA), Montreal, QC, Canada
- Blake A. Richards
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
- Montreal Institute for Learning Algorithms (MILA), Montreal, QC, Canada
- Montreal Neurological Institute, Montreal, QC, Canada
- Department of Neurology and Neurosurgery, McGill University, Montreal, QC, Canada
- School of Computer Science, McGill University, Montreal, QC, Canada
9. Optimism and pessimism in optimised replay. PLoS Comput Biol 2022;18:e1009634. PMID: 35020718; PMCID: PMC8809607; DOI: 10.1371/journal.pcbi.1009634.
Abstract
The replay of task-relevant trajectories is known to contribute to memory consolidation and improved task performance. A wide variety of experimental data show that the content of replayed sequences is highly specific and can be modulated by reward as well as other prominent task variables. However, the rules governing the choice of sequences to be replayed still remain poorly understood. One recent theoretical suggestion is that the prioritization of replay experiences in decision-making problems is based on their effect on the choice of action. We show that this implies that subjects should replay sub-optimal actions that they dysfunctionally choose rather than optimal ones, when, by being forgetful, they experience large amounts of uncertainty in their internal models of the world. We use this to account for recent experimental data demonstrating exactly pessimal replay, fitting model parameters to the individual subjects’ choices.

When animals are asleep or restfully awake, populations of neurons in their brains recapitulate activity associated with extended behaviourally-relevant experiences. This process is called replay, and it has been established for a long time in rodents, and very recently in humans, to be important for good performance in decision-making tasks. The specific experiences which are replayed during those epochs follow highly ordered patterns, but the mechanisms which establish their priority are still not fully understood. One promising theoretical suggestion is that each replay experience is chosen in such a way that the learning that ensues is most helpful for the subsequent performance of the animal. A very recent study reported a surprising result that humans who achieved high performance in a planning task tended to replay actions they found to be sub-optimal, and that this was associated with a useful deprecation of those actions in subsequent performance.
In this study, we examine the nature of this pessimized form of replay and show that it is exactly appropriate for forgetful agents. We analyse the role of forgetting for replay choices of our model, and verify our predictions using human subject data.
10. Revisiting the importance of model fitting for model-based fMRI: it does matter in computational psychiatry. PLoS Comput Biol 2021;17:e1008738. PMID: 33561125; PMCID: PMC7899379; DOI: 10.1371/journal.pcbi.1008738.
Abstract
Computational modeling has been applied for data analysis in psychology, neuroscience, and psychiatry. One of its important uses is to infer the latent variables underlying behavior, by which researchers can evaluate corresponding neural, physiological, or behavioral measures. This feature is especially crucial for computational psychiatry, in which altered computational processes underlying mental disorders are of interest. For instance, several studies employing model-based fMRI, a method for identifying brain regions correlated with latent variables, have shown that patients with mental disorders (e.g., depression) exhibit diminished neural responses to reward prediction errors (RPEs), which are the differences between experienced and predicted rewards. Such model-based analysis has the drawback that the parameter estimates and inference of latent variables are not necessarily correct; rather, they usually contain some errors. A previous study theoretically and empirically showed that the error in model fitting does not necessarily cause a serious error in model-based fMRI. However, the study did not deal with certain situations relevant to psychiatry, such as group comparisons between patients and healthy controls. We developed a theoretical framework to explore such situations. We demonstrate that parameter misspecification can critically affect the results of group comparison. We demonstrate that even if the RPE response in patients is completely intact, a spurious difference from healthy controls is observable. Such a situation occurs when the ground-truth learning rate differs between groups but a common learning rate is used, as per previous studies. Furthermore, even if the parameters are appropriately fitted to individual participants, spurious group differences in RPE responses are observable when the model lacks a component that differs between groups.
These results highlight the importance of appropriate model-fitting and the need for caution when interpreting the results of model-based fMRI.
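The spurious group difference can be demonstrated in a few lines with a Rescorla-Wagner toy example (all values hypothetical): generate "neural" RPEs with each group's true learning rate, then regress them on model RPEs computed with one common learning rate, as the criticized analysis would.

```python
import random

def rpe_series(rewards, alpha):
    """Rescorla-Wagner prediction errors for one option with learning rate alpha."""
    v, rpes = 0.0, []
    for r in rewards:
        rpe = r - v
        rpes.append(rpe)
        v += alpha * rpe
    return rpes

def slope(x, y):
    """OLS slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

random.seed(1)
rewards = [1.0 if random.random() < 0.7 else 0.0 for _ in range(500)]

# "Neural" RPEs follow each group's true learning rate, but the analyst's
# regressor uses one common alpha = 0.3 for everyone.
common = rpe_series(rewards, alpha=0.3)
patients = rpe_series(rewards, alpha=0.1)   # true alpha differs...
controls = rpe_series(rewards, alpha=0.3)   # ...only for the patient group

b_patients = slope(common, patients)
b_controls = slope(common, controls)
# b_patients < b_controls despite both groups having fully intact RPE responses.
```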