Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Balcarras M, Ardid S, Kaping D, Everling S, Womelsdorf T. Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness. J Cogn Neurosci 2015;28:333-49. [PMID: 26488586 DOI: 10.1162/jocn_a_00894] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 12/13/2022]

Cited by Other Article(s)

Paunov A, L’Hôtellier M, Guo D, He Z, Yu A, Meyniel F. Multiple and subject-specific roles of uncertainty in reward-guided decision-making. bioRxiv 2024:2024.03.27.587016. [PMID: 38585958 PMCID: PMC10996615 DOI: 10.1101/2024.03.27.587016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]

Webb J, Steffan P, Hayden BY, Lee D, Kemere C, McGinley M. Foraging Under Uncertainty Follows the Marginal Value Theorem with Bayesian Updating of Environment Representations. bioRxiv 2024:2024.03.30.587253. [PMID: 38585964 PMCID: PMC10996644 DOI: 10.1101/2024.03.30.587253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]

Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024;20:e1011950. [PMID: 38552190 PMCID: PMC10980507 DOI: 10.1371/journal.pcbi.1011950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 02/26/2024] [Indexed: 04/01/2024] [Open access]

Abstract

Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions.
In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.

Beron CC, Neufeld SQ, Linderman SW, Sabatini BL. Mice exhibit stochastic and efficient action switching during probabilistic decision making. Proc Natl Acad Sci U S A 2022;119:e2113961119. [PMID: 35385355 PMCID: PMC9169659 DOI: 10.1073/pnas.2113961119] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/28/2021] [Accepted: 03/03/2022] [Indexed: 12/05/2022] [Open access]

Banaie Boroujeni K, Tiesinga P, Womelsdorf T. Interneuron-specific gamma synchronization indexes cue uncertainty and prediction errors in lateral prefrontal and anterior cingulate cortex. eLife 2021;10:69111. [PMID: 34142661 PMCID: PMC8248985 DOI: 10.7554/elife.69111] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/05/2021] [Accepted: 06/17/2021] [Indexed: 12/27/2022] [Open access]

Iyer ES, Kairiss MA, Liu A, Otto AR, Bagot RC. Probing relationships between reinforcement learning and simple behavioral strategies to understand probabilistic reward learning. J Neurosci Methods 2020;341:108777. [PMID: 32417532 DOI: 10.1016/j.jneumeth.2020.108777] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/12/2019] [Revised: 04/22/2020] [Accepted: 05/11/2020] [Indexed: 11/18/2022]

Abstract

BACKGROUND

Reinforcement learning (RL) and win stay/lose shift model accounts of decision making are both widely used to describe how individuals learn about and interact with rewarding environments. Though mutually informative, these accounts are often conceptualized as independent processes and so the potential relationships between win stay/lose shift tendencies and RL parameters have not been explored.

NEW METHOD

We introduce a methodology to directly relate RL parameters to behavioral strategy. Specifically, by calculating a truncated multivariate normal distribution of RL parameters given win stay/lose shift tendencies from simulating these tendencies across the parameter space, we maximize the normal distribution for a given set of win stay/lose shift tendencies to approximate reinforcement learning parameters.

RESULTS

We demonstrate novel relationships between win stay/lose shift tendencies and RL parameters that challenge conventional interpretations of lose shift as a metric of loss sensitivity. Further, we demonstrate in both simulated and empirical data that this method of parameter approximation yields reliable parameter recovery.

COMPARISON WITH EXISTING METHOD

We compare this method against the conventionally used maximum likelihood estimation method for parameter approximation in simulated noisy and empirical data. For simulated noisy data, we show that this method performs similarly to maximum likelihood estimation. For empirical data, however, this method provides a more reliable approximation of reinforcement learning parameters than maximum likelihood estimation.

CONCLUSIONS

We demonstrate the existence of relationships between win stay/lose shift tendencies and RL parameters and introduce a method that leverages these relationships to enable recovery of RL parameters exclusively from win stay/lose shift tendencies.
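The win stay/lose shift tendencies this abstract refers to can be illustrated with a minimal sketch. It assumes a two-choice task with binary rewards, and it takes win stay as the fraction of rewarded trials followed by a repeated choice and lose shift as the fraction of unrewarded trials followed by a switched choice. The function name `wsls_tendencies` and the input encoding are illustrative assumptions, not the authors' implementation, and the truncated multivariate normal approximation of RL parameters described above is not reproduced here.

```python
from typing import Sequence, Tuple


def wsls_tendencies(choices: Sequence[int], rewards: Sequence[int]) -> Tuple[float, float]:
    """Return (P(stay | win), P(shift | loss)) for one choice sequence.

    choices: chosen option on each trial (any hashable labels, e.g. 0/1).
    rewards: 1 if the trial was rewarded, 0 otherwise (assumed binary).
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")

    wins = stays_after_win = 0
    losses = shifts_after_loss = 0

    # The last trial has no following choice, so it is not scored.
    for t in range(len(choices) - 1):
        stayed = choices[t + 1] == choices[t]
        if rewards[t]:
            wins += 1
            stays_after_win += int(stayed)
        else:
            losses += 1
            shifts_after_loss += int(not stayed)

    # NaN marks a tendency that cannot be estimated (no wins or no losses).
    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    print(wsls_tendencies([0, 1, 1, 1], [1, 1, 0, 0]))
```

In this sketch the two tendencies are summary statistics of behavior only; relating them to learning rate or choice stochasticity would require simulating an RL model across its parameter space, as the abstract describes.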

Fast spiking interneuron activity in primate striatum tracks learning of attention cues. Proc Natl Acad Sci U S A 2020;117:18049-18058. [PMID: 32661170 DOI: 10.1073/pnas.2001348117] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/21/2022] [Open access]

Azimi M, Oemisch M, Womelsdorf T. Dissociation of nicotinic α7 and α4/β2 sub-receptor agonists for enhancing learning and attentional filtering in nonhuman primates. Psychopharmacology (Berl) 2020;237:997-1010. [PMID: 31865424 DOI: 10.1007/s00213-019-05430-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/07/2018] [Accepted: 12/11/2019] [Indexed: 12/22/2022]

Abstract

RATIONALE

Nicotinic acetylcholine receptors (nAChRs) modulate attention, memory, and higher executive functioning, but it is unclear how nACh sub-receptors mediate different mechanisms supporting these functions.

OBJECTIVES

We investigated whether selective agonists for the alpha-7 nAChR versus the alpha-4/beta-2 nAChR have unique functional contributions for value learning and attentional filtering of distractors in the nonhuman primate.

METHODS

Two adult rhesus macaque monkeys performed reversal learning following systemic administration of either the alpha-7 nAChR agonist PHA-543613 or the alpha-4/beta-2 nAChR agonist ABT-089 or a vehicle control. Behavioral analysis quantified performance accuracy, speed of processing, reversal learning speed, the control of distractor interference, perseveration tendencies, and motivation.

RESULTS

We found that the alpha-7 nAChR agonist PHA-543613 enhanced the learning speed of feature values but did not modulate how salient distracting information was filtered from ongoing choice processes. In contrast, the selective alpha-4/beta-2 nAChR agonist ABT-089 did not affect learning speed but reduced distractibility. This dissociation was dose-dependent and evident in the absence of systematic changes in overall performance, reward intake, motivation to perform the task, perseveration tendencies, or reaction times.

CONCLUSIONS

These results suggest nicotinic sub-receptor specific mechanisms consistent with (1) alpha-4/beta-2 nAChR specific amplification of cholinergic transients in prefrontal cortex linked to enhanced cue detection in light of interferences, and (2) alpha-7 nAChR specific activation prolonging cholinergic transients, which could facilitate subjects to follow-through with newly established attentional strategies when outcome contingencies change. These insights will be critical for developing function-specific drugs alleviating attention and learning deficits in neuro-psychiatric diseases.

Miller KJ, Shenhav A, Ludvig EA. Habits without values. Psychol Rev 2019;126:292-311. [PMID: 30676040 DOI: 10.1037/rev0000120] [Citation(s) in RCA: 99] [Impact Index Per Article: 19.8] [Indexed: 11/08/2022]

Oemisch M, Westendorff S, Azimi M, Hassani SA, Ardid S, Tiesinga P, Womelsdorf T. Feature-specific prediction errors and surprise across macaque fronto-striatal circuits. Nat Commun 2019;10:176. [PMID: 30635579 PMCID: PMC6329800 DOI: 10.1038/s41467-018-08184-9] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 05/01/2018] [Accepted: 12/20/2018] [Indexed: 01/23/2023] [Open access]

Havenith MN, Zijderveld PM, van Heukelum S, Abghari S, Glennon JC, Tiesinga P. The Virtual-Environment-Foraging Task enables rapid training and single-trial metrics of attention in head-fixed mice. Sci Rep 2018;8:17371. [PMID: 30478333 PMCID: PMC6255915 DOI: 10.1038/s41598-018-34966-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/19/2016] [Accepted: 10/25/2018] [Indexed: 01/12/2023] [Open access]

Talmi D, Slapkova M, Wieser MJ. Testing the Possibility of Model-based Pavlovian Control of Attention to Threat. J Cogn Neurosci 2018;31:36-48. [PMID: 30156504 DOI: 10.1162/jocn_a_01329] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]

Hassani SA, Oemisch M, Balcarras M, Westendorff S, Ardid S, van der Meer MA, Tiesinga P, Womelsdorf T. A computational psychiatry approach identifies how alpha-2A noradrenergic agonist Guanfacine affects feature-based reinforcement learning in the macaque. Sci Rep 2017;7:40606. [PMID: 28091572 PMCID: PMC5238510 DOI: 10.1038/srep40606] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 04/11/2016] [Accepted: 12/08/2016] [Indexed: 01/05/2023] [Open access]

Balcarras M, Womelsdorf T. A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning. Front Neurosci 2016;10:125. [PMID: 27064794 PMCID: PMC4811957 DOI: 10.3389/fnins.2016.00125] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/04/2015] [Accepted: 03/14/2016] [Indexed: 11/13/2022] [Open access]

Abstract

Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behavior of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. 
In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context-specific selections to drive responses.
