1
Paunov A, L’Hôtellier M, Guo D, He Z, Yu A, Meyniel F. Multiple and subject-specific roles of uncertainty in reward-guided decision-making. bioRxiv: the preprint server for biology 2024:2024.03.27.587016. [PMID: 38585958 PMCID: PMC10996615 DOI: 10.1101/2024.03.27.587016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Decision-making in noisy, changing, and partially observable environments entails a basic tradeoff between immediate reward and longer-term information gain, known as the exploration-exploitation dilemma. Computationally, an effective way to balance this tradeoff is by leveraging uncertainty to guide exploration. Yet, in humans, empirical findings are mixed, ranging from uncertainty seeking to indifference and avoidance. In a novel bandit task that better captures uncertainty-driven behavior, we find multiple roles for uncertainty in human choices. First, stable and psychologically meaningful individual differences in uncertainty preferences actually range from seeking to avoidance, which can manifest as null group-level effects. Second, uncertainty modulates the use of basic decision heuristics that imperfectly exploit immediate rewards: a repetition bias and a win-stay-lose-shift heuristic. These heuristics interact with uncertainty, favoring heuristic choices under higher uncertainty. These results, highlighting the rich and varied structure of reward-based choice, are a step toward understanding its functional basis and dysfunction in psychopathology.
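To make the ingredients named in this abstract concrete, here is a minimal Python sketch, not the authors' model, of a softmax choice rule for a two-armed Bernoulli bandit that combines an estimated reward value, an uncertainty term whose sign sets seeking versus avoidance, a repetition bias, and a win-stay-lose-shift term. The Beta(1, 1) prior, the use of the posterior standard deviation as the uncertainty measure, and the parameter names `beta_value`, `beta_unc`, `rep_bias`, and `wsls` are all assumptions for illustration; the interaction between heuristics and uncertainty reported in the paper is not modeled.

```python
import math

def choice_probs(successes, failures, prev_choice=None, prev_reward=None,
                 beta_value=5.0, beta_unc=0.0, rep_bias=0.0, wsls=0.0):
    """Softmax choice probabilities for a two-armed Bernoulli bandit.

    Hypothetical parameterization (not the paper's model):
      beta_value - weight on each arm's posterior mean reward
      beta_unc   - weight on posterior SD; > 0 seeks, < 0 avoids uncertainty
      rep_bias   - bonus for repeating the previous choice
      wsls       - bonus for staying after a win, penalty after a loss
    """
    utilities = []
    for arm in range(2):
        a = successes[arm] + 1  # Beta(1, 1) prior (assumption)
        b = failures[arm] + 1
        mean = a / (a + b)
        sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))  # Beta posterior SD
        u = beta_value * mean + beta_unc * sd
        if prev_choice is not None and arm == prev_choice:
            u += rep_bias  # repeat the last action regardless of outcome
            u += wsls if prev_reward else -wsls  # win-stay / lose-shift
        utilities.append(u)
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `choice_probs([1, 5], [1, 5], beta_unc=5.0)` gives both arms the same posterior mean but favors the less-sampled arm 0, and flipping the sign of `beta_unc` reverses that preference. Individuals with opposite signs could therefore average out to a null group-level effect.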
Affiliation(s)
- Alexander Paunov
- INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France
- Institut de Neuromodulation, GHU Paris, Psychiatrie et Neurosciences, Centre Hospitalier Sainte-Anne, Pôle Hospitalo-Universitaire 15, Université Paris Cité, Paris, France
- Maëva L’Hôtellier
- INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France
- Dalin Guo
- Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
- Zoe He
- Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
- Angela Yu
- Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
- Centre for Cognitive Science & Hessian AI Center, Technical University of Darmstadt, Germany
- Florent Meyniel
- INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France
- Institut de Neuromodulation, GHU Paris, Psychiatrie et Neurosciences, Centre Hospitalier Sainte-Anne, Pôle Hospitalo-Universitaire 15, Université Paris Cité, Paris, France
2
Webb J, Steffan P, Hayden BY, Lee D, Kemere C, McGinley M. Foraging Under Uncertainty Follows the Marginal Value Theorem with Bayesian Updating of Environment Representations. bioRxiv: the preprint server for biology 2024:2024.03.30.587253. [PMID: 38585964 PMCID: PMC10996644 DOI: 10.1101/2024.03.30.587253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Foraging theory has been a remarkably successful approach to understanding the behavior of animals in many contexts. In patch-based foraging contexts, the marginal value theorem (MVT) shows that the optimal strategy is to leave a patch when the marginal rate of return declines to the average for the environment. However, the MVT is only valid in deterministic environments whose statistics are known to the forager; naturalistic environments seldom meet these strict requirements. As a result, the strategies used by foragers in naturalistic environments must be empirically investigated. We developed a novel behavioral task and a corresponding computational framework for studying patch-leaving decisions in head-fixed and freely moving mice. We varied between-patch travel time, as well as within-patch reward depletion rate, both deterministically and stochastically. We found that mice adopt patch residence times in a manner consistent with the MVT and not explainable by simple ethologically motivated heuristic strategies. Critically, behavior was best accounted for by a modified form of the MVT wherein environment representations were updated based on local variations in reward timing, captured by a Bayesian estimator and dynamic prior. Thus, we show that mice can strategically attend to, learn from, and exploit task structure on multiple timescales simultaneously, thereby efficiently foraging in volatile environments. The results provide a foundation for applying the systems neuroscience toolkit in freely moving and head-fixed mice to understand the neural basis of foraging under uncertainty.
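The classical MVT leaving rule can be sketched numerically. The example below assumes, purely for illustration, an exponentially depleting patch with instantaneous reward rate `r0 * exp(-decay * t)` and a fixed travel time `T` between patches, and finds by bisection the residence time t* at which the marginal rate equals the long-run average rate G(t*)/(T + t*). It covers only the deterministic, known-statistics case; the Bayesian updating of environment representations that the authors found best accounts for mouse behavior is not modeled. The function names and parameter values are hypothetical.

```python
import math

def gain(t, r0, decay):
    """Cumulative reward after t time units in a patch (assumed exponential depletion)."""
    return r0 / decay * (1.0 - math.exp(-decay * t))

def rate(t, r0, decay):
    """Marginal (instantaneous) reward rate after t time units in the patch."""
    return r0 * math.exp(-decay * t)

def mvt_residence_time(travel_time, r0=1.0, decay=0.1, tol=1e-9):
    """Residence time t* at which rate(t*) equals gain(t*) / (travel_time + t*).

    Assumes travel_time > 0, so the marginal rate starts above the average rate.
    """
    def excess(t):
        # Positive while staying in the patch beats the long-run average rate
        return rate(t, r0, decay) - gain(t, r0, decay) / (travel_time + t)

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:  # grow the bracket until the sign flips
        hi *= 2.0
    while hi - lo > tol:  # bisection on the MVT leaving condition
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for T in (5.0, 20.0):
    print(T, round(mvt_residence_time(T), 2))
```

With these assumed parameters, a longer travel time yields a longer t*, the qualitative MVT prediction that costlier travel between patches should lengthen patch residence.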
Affiliation(s)
- James Webb
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, USA
- Paul Steffan
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Benjamin Y. Hayden
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Daeyeol Lee
- The Zanvyl Krieger Mind/Brain Institute, The Solomon H Snyder Department of Neuroscience, Department of Psychological and Brain Sciences, Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, USA
- Caleb Kemere
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Matthew McGinley
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
3
Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024; 20:e1011950. [PMID: 38552190 PMCID: PMC10980507 DOI: 10.1371/journal.pcbi.1011950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 02/26/2024] [Indexed: 04/01/2024]
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions.
In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
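A minimal sketch of how a learned-value module can be combined with a static action bias and multi-trial hysteresis in a single softmax policy. This is a generic illustration under assumed names (`q_values`, `bias`, `hysteresis_weights`, `beta`), not the specific model family assembled in the study; in particular, the learning module is reduced here to fixed action values.

```python
import math

def action_probs(q_values, bias, hysteresis_weights, past_actions, beta=3.0):
    """Softmax over actions combining learned values, a static action bias,
    and hysteresis from the last few actions (hypothetical parameterization).

    hysteresis_weights[k] applies to the action taken k + 1 trials ago:
    positive weights favor repetition, negative weights favor alternation.
    """
    n = len(q_values)
    utilities = [beta * q_values[a] + bias[a] for a in range(n)]
    recent = list(reversed(past_actions))  # most recent action first
    for k, w in enumerate(hysteresis_weights):
        if k < len(recent):
            utilities[recent[k]] += w
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    z = sum(exps)
    return [e / z for e in exps]
```

A negative weight on the most recent action, as in `action_probs([0, 0], [0, 0], [-1.0], [0])`, makes switching more likely than repeating, corresponding to the alternation bias that the abstract reports as the more common direction of hysteresis.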
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
4
Beron CC, Neufeld SQ, Linderman SW, Sabatini BL. Mice exhibit stochastic and efficient action switching during probabilistic decision making. Proc Natl Acad Sci U S A 2022; 119:e2113961119. [PMID: 35385355 PMCID: PMC9169659 DOI: 10.1073/pnas.2113961119] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/28/2021] [Accepted: 03/03/2022] [Indexed: 12/05/2022]
Abstract
In probabilistic and nonstationary environments, individuals must use internal and external cues to flexibly make decisions that lead to desirable outcomes. To gain insight into the process by which animals choose between actions, we trained mice in a task with time-varying reward probabilities. In our implementation of such a two-armed bandit task, thirsty mice use information about recent action and action–outcome histories to choose between two ports that deliver water probabilistically. Here we comprehensively modeled choice behavior in this task, including the trial-to-trial changes in port selection, i.e., action switching behavior. We find that mouse behavior is, at times, deterministic and, at others, apparently stochastic. The behavior deviates from that of a theoretically optimal agent performing Bayesian inference in a hidden Markov model (HMM). We formulate a set of models based on logistic regression, reinforcement learning, and sticky Bayesian inference that we demonstrate are mathematically equivalent and that accurately describe mouse behavior. The switching behavior of mice in the task is captured in each model by a stochastic action policy, a history-dependent representation of action value, and a tendency to repeat actions despite incoming evidence. The models parsimoniously capture behavior across different environmental conditions by varying the stickiness parameter, and like the mice, they achieve nearly maximal reward rates. These results indicate that mouse behavior reaches near-maximal performance with reduced action switching and can be described by a set of equivalent models with a small number of relatively fixed parameters.
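The abstract contrasts mouse behavior with an ideal observer performing Bayesian inference in an HMM, then adds a tendency to repeat actions. A minimal sketch of both pieces follows: a belief update about which port is currently the high-reward one, and a logistic policy on that belief with a stickiness term. The reward probabilities, the switch probability, and the exact form of the stickiness term are assumptions, not the values or parameterization used in the paper.

```python
import math

def update_belief(b_right, choice, reward, p_high=0.8, p_low=0.2, p_switch=0.02):
    """One Bayesian filtering step for a two-state hidden Markov bandit.

    b_right: current P(right port is the high-reward port).
    choice: 0 = left, 1 = right; reward: 0 or 1.
    p_high, p_low, p_switch are illustrative assumptions, not task values.
    """
    # Likelihood of the observed outcome under each hidden state
    p_if_right_high = p_high if choice == 1 else p_low
    p_if_left_high = p_low if choice == 1 else p_high
    lik_right = p_if_right_high if reward else 1.0 - p_if_right_high
    lik_left = p_if_left_high if reward else 1.0 - p_if_left_high
    # Bayes rule: posterior after observing the outcome
    post = b_right * lik_right / (b_right * lik_right + (1.0 - b_right) * lik_left)
    # The high-reward port may switch sides before the next trial
    return post * (1.0 - p_switch) + (1.0 - post) * p_switch

def p_choose_right(b_right, prev_choice=None, stickiness=0.0, temperature=1.0):
    """Logistic policy on the belief log-odds plus a hypothetical stickiness term."""
    eps = 1e-12  # guards against log(0) when the belief is exactly 0 or 1
    log_odds = math.log((b_right + eps) / (1.0 - b_right + eps))
    if prev_choice is None:
        repeat = 0.0
    else:
        repeat = stickiness if prev_choice == 1 else -stickiness
    x = log_odds / temperature + repeat
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)
```

With `stickiness=0` and a low `temperature`, the policy approaches the ideal observer that picks whichever port it currently believes is better; a positive `stickiness` adds a pull toward the previous choice that persists despite incoming evidence.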
Affiliation(s)
- Celia C. Beron
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115
- HHMI, Harvard Medical School, Boston, MA 02115
- Shay Q. Neufeld
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115
- HHMI, Harvard Medical School, Boston, MA 02115
- Scott W. Linderman
- Department of Statistics, Stanford University, Stanford, CA 94305
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA 94305
- Bernardo L. Sabatini
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115
- HHMI, Harvard Medical School, Boston, MA 02115
5
Banaie Boroujeni K, Tiesinga P, Womelsdorf T. Interneuron-specific gamma synchronization indexes cue uncertainty and prediction errors in lateral prefrontal and anterior cingulate cortex. eLife 2021; 10:69111. [PMID: 34142661 PMCID: PMC8248985 DOI: 10.7554/elife.69111] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/05/2021] [Accepted: 06/17/2021] [Indexed: 12/27/2022]
Abstract
Inhibitory interneurons are believed to realize critical gating functions in cortical circuits, but it has been difficult to ascertain the content of gated information for well-characterized interneurons in primate cortex. Here, we address this question by characterizing putative interneurons in primate prefrontal and anterior cingulate cortex while monkeys engaged in attention-demanding reversal learning. We find that subclasses of narrow-spiking neurons have a relative suppressive effect on the local circuit, indicating they are inhibitory interneurons. One of these interneuron subclasses showed prominent firing rate modulations and gamma-synchronous (35–45 Hz) spiking during periods of uncertainty in both lateral prefrontal cortex (LPFC) and anterior cingulate cortex (ACC). In LPFC, this interneuron subclass activated when the uncertainty of attention cues was resolved during flexible learning, whereas in ACC it fired and gamma-synchronized when outcomes were uncertain and prediction errors were high during learning. Computational modeling of this interneuron-specific gamma band activity in simple circuit motifs suggests it could reflect a soft winner-take-all gating of information having a high degree of uncertainty. Together, these findings elucidate an electrophysiologically characterized interneuron subclass in the primate that forms gamma-synchronous networks in two different areas when resolving uncertainty during adaptive goal-directed behavior.
Affiliation(s)
- Paul Tiesinga
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands
- Thilo Womelsdorf
- Department of Psychology, Vanderbilt University, Nashville, United States; Department of Biology, Centre for Vision Research, York University, Toronto, Canada
6
Iyer ES, Kairiss MA, Liu A, Otto AR, Bagot RC. Probing relationships between reinforcement learning and simple behavioral strategies to understand probabilistic reward learning. J Neurosci Methods 2020; 341:108777. [PMID: 32417532 DOI: 10.1016/j.jneumeth.2020.108777] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/12/2019] [Revised: 04/22/2020] [Accepted: 05/11/2020] [Indexed: 11/18/2022]
Abstract
BACKGROUND: Reinforcement learning (RL) and win-stay/lose-shift model accounts of decision making are both widely used to describe how individuals learn about and interact with rewarding environments. Though mutually informative, these accounts are often conceptualized as independent processes, and so the potential relationships between win-stay/lose-shift tendencies and RL parameters have not been explored. NEW METHOD: We introduce a methodology to directly relate RL parameters to behavioral strategy. Specifically, we simulate win-stay/lose-shift tendencies across the RL parameter space, calculate from these simulations a truncated multivariate normal distribution of RL parameters given win-stay/lose-shift tendencies, and maximize this distribution for a given set of win-stay/lose-shift tendencies to approximate the RL parameters. RESULTS: We demonstrate novel relationships between win-stay/lose-shift tendencies and RL parameters that challenge conventional interpretations of lose-shift as a metric of loss sensitivity. Further, we demonstrate in both simulated and empirical data that this method of parameter approximation yields reliable parameter recovery. COMPARISON WITH EXISTING METHOD: We compare this method against the conventionally used maximum likelihood estimation method for parameter approximation in simulated noisy and empirical data. For simulated noisy data, we show that this method performs similarly to maximum likelihood estimation. For empirical data, however, this method provides a more reliable approximation of reinforcement learning parameters than maximum likelihood estimation. CONCLUSIONS: We demonstrate the existence of relationships between win-stay/lose-shift tendencies and RL parameters and introduce a method that leverages these relationships to enable recovery of RL parameters exclusively from win-stay/lose-shift tendencies.
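The forward step this method relies on, simulating an RL agent and summarizing its behavior as win-stay and lose-shift rates, can be sketched as below. The delta-rule Q-learner with softmax choice, its reward probabilities, and the function names are illustrative assumptions; the truncated multivariate normal inversion described in the abstract is not reproduced here.

```python
import math
import random

def wsls_tendencies(choices, rewards):
    """Return (P(stay | previous win), P(shift | previous loss)) for one session."""
    stay_after_win = wins = shift_after_loss = losses = 0
    for t in range(1, len(choices)):
        stayed = choices[t] == choices[t - 1]
        if rewards[t - 1]:
            wins += 1
            stay_after_win += stayed
        else:
            losses += 1
            shift_after_loss += not stayed
    win_stay = stay_after_win / wins if wins else float("nan")
    lose_shift = shift_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift

def simulate_q_learner(alpha, beta, n_trials=500, p_reward=(0.7, 0.3), seed=0):
    """Simulate a two-armed bandit Q-learner with softmax choice (illustrative)."""
    rng = random.Random(seed)
    q = [0.5, 0.5]
    choices, rewards = [], []
    for _ in range(n_trials):
        p_arm1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        c = 1 if rng.random() < p_arm1 else 0
        r = 1 if rng.random() < p_reward[c] else 0
        q[c] += alpha * (r - q[c])  # delta-rule update of the chosen arm only
        choices.append(c)
        rewards.append(r)
    return choices, rewards
```

Sweeping `alpha` and `beta` through `simulate_q_learner` and recording `wsls_tendencies` for each setting gives a simulated map from RL parameters to win-stay/lose-shift tendencies, the kind of simulation across the parameter space that the abstract describes.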
Affiliation(s)
- Eshaan S Iyer
- Integrated Program in Neuroscience, McGill University, 3801 Rue University, Montréal, QC H3A 2B4, Canada
- Megan A Kairiss
- Department of Psychology, McGill University, 1205 Ave Dr. Penfield, Montréal, QC H3A 1B1, Canada
- Adrian Liu
- Department of Physics, McGill University, 3600 Rue University, Montréal, QC H3A 2T8, Canada
- A Ross Otto
- Department of Psychology, McGill University, 1205 Ave Dr. Penfield, Montréal, QC H3A 1B1, Canada
- Rosemary C Bagot
- Department of Psychology, McGill University, 1205 Ave Dr. Penfield, Montréal, QC H3A 1B1, Canada; Ludmer Centre for Neuroinformatics and Mental Health, 3661 Rue University, Montréal, QC H3A 2B3, Canada.
7
Fast spiking interneuron activity in primate striatum tracks learning of attention cues. Proc Natl Acad Sci U S A 2020; 117:18049-18058. [PMID: 32661170 DOI: 10.1073/pnas.2001348117] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/21/2022]
Abstract
Cognitive flexibility depends on a fast neural learning mechanism for enhancing momentarily relevant over irrelevant information. A possible neural mechanism realizing this enhancement uses fast spiking interneurons (FSIs) in the striatum to train striatal projection neurons to gate relevant and suppress distracting cortical inputs. We found support for such a mechanism in nonhuman primates during the flexible adjustment of visual attention in a reversal learning task. FSI activity was modulated by visual attention cues during feature-based learning. One FSI subpopulation showed stronger activation during learning, while another FSI subpopulation showed response suppression after learning, which could indicate a disinhibitory effect on the local circuit. Additionally, FSIs that showed response suppression to learned attention cues were activated by salient distractor events, suggesting they contribute to suppressing bottom-up distraction. These findings suggest that striatal fast spiking interneurons play an important role when cues are learned that redirect attention away from previously relevant to newly relevant visual information. This cue-specific activity was independent of motor-related activity and thus specifically tracked the learning of reward-predictive visual features.
8
Azimi M, Oemisch M, Womelsdorf T. Dissociation of nicotinic α7 and α4/β2 sub-receptor agonists for enhancing learning and attentional filtering in nonhuman primates. Psychopharmacology (Berl) 2020; 237:997-1010. [PMID: 31865424 DOI: 10.1007/s00213-019-05430-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/07/2018] [Accepted: 12/11/2019] [Indexed: 12/22/2022]
Abstract
RATIONALE: Nicotinic acetylcholine receptors (nAChRs) modulate attention, memory, and higher executive functioning, but it is unclear how nACh sub-receptors mediate different mechanisms supporting these functions. OBJECTIVES: We investigated whether selective agonists for the alpha-7 nAChR versus the alpha-4/beta-2 nAChR have unique functional contributions for value learning and attentional filtering of distractors in the nonhuman primate. METHODS: Two adult rhesus macaque monkeys performed reversal learning following systemic administration of either the alpha-7 nAChR agonist PHA-543613, the alpha-4/beta-2 nAChR agonist ABT-089, or a vehicle control. Behavioral analysis quantified performance accuracy, speed of processing, reversal learning speed, the control of distractor interference, perseveration tendencies, and motivation. RESULTS: We found that the alpha-7 nAChR agonist PHA-543613 enhanced the learning speed of feature values but did not modulate how salient distracting information was filtered from ongoing choice processes. In contrast, the selective alpha-4/beta-2 nAChR agonist ABT-089 did not affect learning speed but reduced distractibility. This dissociation was dose-dependent and evident in the absence of systematic changes in overall performance, reward intake, motivation to perform the task, perseveration tendencies, or reaction times. CONCLUSIONS: These results suggest nicotinic sub-receptor-specific mechanisms consistent with (1) alpha-4/beta-2 nAChR-specific amplification of cholinergic transients in prefrontal cortex linked to enhanced cue detection in the face of interference, and (2) alpha-7 nAChR-specific activation prolonging cholinergic transients, which could help subjects follow through with newly established attentional strategies when outcome contingencies change. These insights will be critical for developing function-specific drugs alleviating attention and learning deficits in neuropsychiatric diseases.
Affiliation(s)
- Marzyeh Azimi
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario, M6J 1P3, Canada
- Mariann Oemisch
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario, M6J 1P3, Canada; The Zanvyl Krieger Mind/Brain Institute, Department of Neuroscience, Johns Hopkins University, Baltimore, MD, 21218, USA
- Thilo Womelsdorf
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario, M6J 1P3, Canada; Department of Psychology, Vanderbilt University, PMB 407817, 2301 Vanderbilt Place, Nashville, TN, 37240-7817, USA
9
Abstract
Habits form a crucial component of behavior. In recent years, key computational models have conceptualized habits as arising from model-free reinforcement learning mechanisms, which typically select between available actions based on the future value expected to result from each. Traditionally, however, habits have been understood as behaviors that can be triggered directly by a stimulus, without requiring the animal to evaluate expected outcomes. Here, we develop a computational model instantiating this traditional view, in which habits develop through the direct strengthening of recently taken actions rather than through the encoding of outcomes. We demonstrate that this model accounts for key behavioral manifestations of habits, including insensitivity to outcome devaluation and contingency degradation, as well as the effects of reinforcement schedule on the rate of habit formation. The model also explains the prevalent observation of perseveration in repeated-choice tasks as an additional behavioral manifestation of the habit system. We suggest that mapping habitual behaviors onto value-free mechanisms provides a parsimonious account of existing behavioral and neural data. This mapping may provide a new foundation for building robust and comprehensive models of the interaction of habits with other, more goal-directed types of behaviors and help to better guide research into the neural mechanisms underlying control of instrumental behavior more generally. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
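The core idea, habits strengthened by repetition rather than by outcomes, can be sketched in a few lines. In the sketch below, each action's habit strength moves toward 1 if it was just taken and toward 0 otherwise, and a weighted sum with goal-directed values sets choice. The specific update rule, the arbitration by fixed weights `w_goal` and `w_habit`, and the devaluation demonstration are assumptions for illustration, not the paper's exact equations.

```python
import math

def update_habits(habits, chosen, alpha_h=0.1):
    """Value-free habit update: strengthen the action just taken.

    No reward or outcome enters this update (hypothetical sketch). The chosen
    action's strength moves toward 1 and every other action's toward 0.
    """
    return [h + alpha_h * ((1.0 if a == chosen else 0.0) - h)
            for a, h in enumerate(habits)]

def mixed_policy(q_values, habits, w_goal=1.0, w_habit=1.0):
    """Softmax over a weighted sum of goal-directed values and habit strengths."""
    utilities = [w_goal * q + w_habit * h for q, h in zip(q_values, habits)]
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    z = sum(exps)
    return [e / z for e in exps]

# Repeatedly take action 0, then "devalue" its outcome (its value drops to 0).
habits = [0.0, 0.0]
for _ in range(50):
    habits = update_habits(habits, chosen=0)
probs = mixed_policy(q_values=[0.0, 0.2], habits=habits, w_goal=1.0, w_habit=3.0)
```

Because `update_habits` never sees the outcome, devaluing action 0 changes `q_values` but not `habits`, so with a large `w_habit` the policy still favors action 0: a toy version of insensitivity to outcome devaluation.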
Affiliation(s)
- Amitai Shenhav
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown Institute for Brain Science, Brown University
10
Oemisch M, Westendorff S, Azimi M, Hassani SA, Ardid S, Tiesinga P, Womelsdorf T. Feature-specific prediction errors and surprise across macaque fronto-striatal circuits. Nat Commun 2019; 10:176. [PMID: 30635579 PMCID: PMC6329800 DOI: 10.1038/s41467-018-08184-9] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 05/01/2018] [Accepted: 12/20/2018] [Indexed: 01/23/2023]
Abstract
To adjust expectations efficiently, prediction errors need to be associated with the precise features that gave rise to the unexpected outcome, but this credit assignment may be problematic if stimuli differ on multiple dimensions and it is ambiguous which feature dimension caused the outcome. Here, we report a potential solution: neurons in four recorded areas of the anterior fronto-striatal networks encode prediction errors that are specific to feature values of different dimensions of attended multidimensional stimuli. The most ubiquitous prediction error occurred for the reward-relevant dimension. Feature-specific prediction error signals a) emerge on average shortly after non-specific prediction error signals, b) arise earliest in the anterior cingulate cortex and later in dorsolateral prefrontal cortex, caudate and ventral striatum, and c) contribute to feature-based stimulus selection after learning. Thus, a widely-distributed feature-specific eligibility trace may be used to update synaptic weights for improved feature-based attention.
Affiliation(s)
- Mariann Oemisch
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada; Department of Neuroscience, Yale University School of Medicine, New Haven, CT, 06510, USA
- Stephanie Westendorff
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada; Institute of Neurobiology, University of Tübingen, Tübingen, 72076, Germany
- Marzyeh Azimi
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada
- Seyed Alireza Hassani
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada; Department of Psychology, Vanderbilt University, Nashville, TN, 37240, USA
- Salva Ardid
- Department of Mathematics and Statistics, Boston University, Boston, MA, 02215, USA
- Paul Tiesinga
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, 6525 EN, Netherlands
- Thilo Womelsdorf
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, ON, M6J 1P3, Canada; Department of Psychology, Vanderbilt University, Nashville, TN, 37240, USA
11
Havenith MN, Zijderveld PM, van Heukelum S, Abghari S, Glennon JC, Tiesinga P. The Virtual-Environment-Foraging Task enables rapid training and single-trial metrics of attention in head-fixed mice. Sci Rep 2018; 8:17371. [PMID: 30478333 PMCID: PMC6255915 DOI: 10.1038/s41598-018-34966-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/19/2016] [Accepted: 10/25/2018] [Indexed: 01/12/2023]
Abstract
Attention, the flexible allocation of processing resources based on behavioural demands, is essential to survival. Mouse research offers unique tools to dissect the underlying pathways, but is hampered by the difficulty of accurately measuring attention in mice. Current attention tasks for mice face several limitations: binary (hit/miss) and temporally imprecise metrics, behavioural confounds, and overtraining. Thus, despite the increasing scope of neuronal population measurements, insights are limited without equally precise behavioural measures. Here we present a virtual-environment task for head-fixed mice based on 'foraging-like' navigation. The task requires animals to discriminate gratings at orientation differences from 90° to 5°, and can be learned in only 3–5 sessions (<550 trials). It yields single-trial, non-binary metrics of response speed and accuracy, which generate secondary metrics of choice certainty, visual acuity, and most importantly, of sustained and cued attention, two attentional components studied extensively in humans. This allows us to examine single-trial dynamics of attention in mice, independently of confounds like rule learning. With this approach, we show that C57BL/6 mice have better visual acuity than previously measured, that they rhythmically alternate between states of high and low alertness, and that they can be prompted to adopt different performance strategies using minute changes in reward contingencies.
Affiliation(s)
- Martha N Havenith
- Donders Institute for Brain, Cognition and Behaviour, Kapittelweg 29, 6525 EN Nijmegen, The Netherlands
- Peter M Zijderveld
- Donders Institute for Brain, Cognition and Behaviour, Kapittelweg 29, 6525 EN Nijmegen, The Netherlands
- Sabrina van Heukelum
- Donders Institute for Brain, Cognition and Behaviour, Kapittelweg 29, 6525 EN Nijmegen, The Netherlands
- Shaghayegh Abghari
- Donders Institute for Brain, Cognition and Behaviour, Kapittelweg 29, 6525 EN Nijmegen, The Netherlands
- Jeffrey C Glennon
- Donders Institute for Brain, Cognition and Behaviour, Kapittelweg 29, 6525 EN Nijmegen, The Netherlands
- Paul Tiesinga
- Donders Institute for Brain, Cognition and Behaviour, Kapittelweg 29, 6525 EN Nijmegen, The Netherlands
12
Talmi D, Slapkova M, Wieser MJ. Testing the Possibility of Model-based Pavlovian Control of Attention to Threat. J Cogn Neurosci 2018; 31:36-48. [PMID: 30156504 DOI: 10.1162/jocn_a_01329] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]
Abstract
Signals for reward or punishment attract attention preferentially, a principle termed value-modulated attention capture (VMAC). The mechanisms that govern the allocation of attention can be described with a terminology that is more often applied to the control of overt behaviors, namely, the distinction between instrumental and Pavlovian control, and between model-free and model-based control. Although instrumental control of VMAC can be either model-free or model-based, it is not known whether Pavlovian control of VMAC can be model-based. To decide whether this is possible, we measured steady-state visual evoked potentials (SSVEPs) while 20 healthy adults took part in a novel task. During the learning stage, participants underwent aversive threat conditioning with two conditioned stimuli (CSs): one that predicted pain (CS+) and one that predicted safety (CS-). Instructions given before the test stage allowed participants to infer whether novel, ambiguous CSs (new_CS+/new_CS-) were threatening or safe. Correct inference required combining stored internal representations and new propositional information, the hallmark of model-based control. SSVEP amplitudes quantified the amount of attention allocated to novel CSs on their very first presentation, before they were ever reinforced. We found that SSVEPs were higher for new_CS+ than new_CS-. This result is potentially indicative of model-based Pavlovian control of VMAC, but additional controls are necessary to verify this conclusively. This result underlines the potential transformative role of information and inference in emotion regulation.
|
13
|
Hassani SA, Oemisch M, Balcarras M, Westendorff S, Ardid S, van der Meer MA, Tiesinga P, Womelsdorf T. A computational psychiatry approach identifies how alpha-2A noradrenergic agonist Guanfacine affects feature-based reinforcement learning in the macaque. Sci Rep 2017; 7:40606. [PMID: 28091572 PMCID: PMC5238510 DOI: 10.1038/srep40606] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 04/11/2016] [Accepted: 12/08/2016] [Indexed: 01/05/2023]
Abstract
Noradrenaline is believed to support cognitive flexibility through the alpha 2A noradrenergic receptor (α2A-NAR) acting in prefrontal cortex. Enhanced flexibility has been inferred from improved working memory with the α2A-NA agonist Guanfacine. But it has been unclear whether Guanfacine improves specific attention and learning mechanisms beyond working memory, and whether the drug effects can be formalized computationally to allow single-subject predictions. We tested and confirmed these suggestions in a case study with a healthy nonhuman primate performing a feature-based reversal learning task, evaluating performance using Bayesian and reinforcement learning models. In an initial dose-testing phase, we found a Guanfacine dose that increased performance accuracy, decreased distractibility and improved learning. In a second experimental phase using only that dose, we used single-subject computational modeling to examine the faster feature-based reversal learning with Guanfacine. Parameter estimation suggested that improved learning is not accounted for by varying a single reinforcement learning mechanism, but by changing the set of parameter values to higher learning rates and stronger suppression of non-chosen over chosen feature information. These findings provide an important starting point for developing nonhuman primate models to discern the synaptic mechanisms of attention and learning functions within the context of a computational neuropsychiatry framework.
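The abstract attributes faster reversal learning to higher learning rates combined with stronger suppression of non-chosen feature information. As a rough illustration only, the Python sketch below implements one common way of formalizing that idea: a feature-based Rescorla-Wagner learner in which a stimulus's value is the sum of its feature values, features of the chosen stimulus are updated by the prediction error, and all other feature values decay toward zero. The class `FeatureRL` and the parameters `alpha`, `decay`, and `beta` are hypothetical names; this is not the model fitted by the authors, whose exact equations are not given here.

```python
import math


def softmax(values, beta):
    """Return choice probabilities from a softmax with inverse temperature beta."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class FeatureRL:
    """Hypothetical feature-based learner (illustrative, not the authors' model).

    A stimulus is a tuple of feature indices (e.g., one colour and one shape);
    its value is the sum of the values of its features.
    """

    def __init__(self, n_features, alpha, decay, beta):
        self.w = [0.0] * n_features  # one value per feature
        self.alpha = alpha            # learning rate for features of the chosen stimulus
        self.decay = decay            # rate at which non-chosen feature values shrink toward 0
        self.beta = beta              # softmax inverse temperature

    def stimulus_value(self, features):
        return sum(self.w[f] for f in features)

    def choice_probs(self, stimuli):
        return softmax([self.stimulus_value(s) for s in stimuli], self.beta)

    def update(self, chosen_features, reward):
        # Prediction error is computed on the summed value of the chosen stimulus.
        delta = reward - self.stimulus_value(chosen_features)
        for f in range(len(self.w)):
            if f in chosen_features:
                self.w[f] += self.alpha * delta
            else:
                # Assumed form of "suppression": multiplicative decay of non-chosen features.
                self.w[f] *= 1.0 - self.decay
        return delta


# Features: 0 = red, 1 = blue, 2 = circle, 3 = square.
model = FeatureRL(n_features=4, alpha=0.5, decay=0.2, beta=3.0)
model.update((0, 2), reward=1.0)  # chose the red circle and was rewarded
print(model.w)
print(model.choice_probs([(0, 2), (1, 3)]))
```

In this sketch, raising `alpha` speeds value acquisition for chosen features, while raising `decay` makes unchosen feature values fade faster; together they are one possible way that the parameter shifts described in the abstract could speed reversals.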
Affiliation(s)
- S. A. Hassani
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario M6J 1P3, Canada
- M. Oemisch
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario M6J 1P3, Canada
- M. Balcarras
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario M6J 1P3, Canada
- S. Westendorff
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario M6J 1P3, Canada
- S. Ardid
- Department of Mathematics, Boston University, Boston, MA 02215, USA
- M. A. van der Meer
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA
- P. Tiesinga
- Department of Neuroinformatics, Donders Centre for Neuroscience, Radboud University Nijmegen, Nijmegen, AJ 6525, The Netherlands
- T. Womelsdorf
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario M6J 1P3, Canada
|
14
|
Balcarras M, Womelsdorf T. A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning. Front Neurosci 2016; 10:125. [PMID: 27064794 PMCID: PMC4811957 DOI: 10.3389/fnins.2016.00125] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/04/2015] [Accepted: 03/14/2016] [Indexed: 11/13/2022]
Abstract
Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes than relying on learning strategies that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study, we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task-naive subjects will show enhanced learning of feature-specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis, we designed a decision-making task in which subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped by block into two contexts: in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and, following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism, these subjects showed significantly enhanced performance in feature-reward blocks and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behavior of the other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context-specific selections to drive responses.
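The idea of switching to an abstract rule that "restricts selections to that dimension" can be illustrated with a two-level learner: one set of values tracks how rewarding it is to apply each rule (choose by color or choose by shape), and a second set tracks feature values within each dimension, with both updated by simple model-free prediction errors. The Python sketch below is a hypothetical simplification under those assumptions, not the hierarchical feature-rule model fitted in the study. `FeatureRuleLearner`, `alpha_rule`, `alpha_feat`, and `beta` are illustrative names, and updating the chosen feature on every dimension is a design choice of this sketch.

```python
import math
import random


def softmax(values, beta):
    """Return choice probabilities from a softmax with inverse temperature beta."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class FeatureRuleLearner:
    """Hypothetical two-level learner: rule values over dimensions, feature values within each."""

    def __init__(self, dims, alpha_rule, alpha_feat, beta, seed=0):
        # dims maps a dimension name to its feature labels, e.g. {"color": ["red", "blue"]}.
        self.rule_v = {d: 0.0 for d in dims}
        self.feat_v = {d: {f: 0.0 for f in feats} for d, feats in dims.items()}
        self.alpha_rule = alpha_rule  # learning rate for the value of applying each rule
        self.alpha_feat = alpha_feat  # learning rate for individual feature values
        self.beta = beta              # softmax inverse temperature (shared, for simplicity)
        self.rng = random.Random(seed)

    def choose(self, stimuli):
        """stimuli: list of dicts {dimension: feature}. Returns (rule, chosen index)."""
        rules = list(self.rule_v)
        rule_p = softmax([self.rule_v[r] for r in rules], self.beta)
        rule = self.rng.choices(rules, weights=rule_p)[0]
        # Under the selected rule, only that dimension's feature values drive the choice.
        vals = [self.feat_v[rule][s[rule]] for s in stimuli]
        idx = self.rng.choices(range(len(stimuli)), weights=softmax(vals, self.beta))[0]
        return rule, idx

    def update(self, rule, stimulus, reward):
        # Delta-rule update for the rule that was applied on this trial.
        self.rule_v[rule] += self.alpha_rule * (reward - self.rule_v[rule])
        # Delta-rule update for the chosen stimulus's feature on each dimension.
        for d, f in stimulus.items():
            self.feat_v[d][f] += self.alpha_feat * (reward - self.feat_v[d][f])


learner = FeatureRuleLearner(
    {"color": ["red", "blue"], "shape": ["circle", "square"]},
    alpha_rule=0.5, alpha_feat=0.5, beta=2.0,
)
pair = [{"color": "red", "shape": "square"}, {"color": "blue", "shape": "circle"}]
rule, idx = learner.choose(pair)
learner.update(rule, pair[idx], reward=1.0)
print(rule, idx, learner.rule_v)
```

In a block where only color predicts reward, choices made under the color rule would tend to be rewarded more often, so its rule value would rise and selections would become restricted to the color dimension; in blocks without a feature-reward relationship, neither rule gains a consistent advantage.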
Affiliation(s)
- Matthew Balcarras
- Department of Biology, Centre for Vision Research, York University, Toronto, ON, Canada
- Thilo Womelsdorf
- Department of Biology, Centre for Vision Research, York University, Toronto, ON, Canada
|