1
Nicholas J, Amlang C, Lin CYR, Montaser-Kouhsari L, Desai N, Pan MK, Kuo SH, Shohamy D. The Role of the Cerebellum in Learning to Predict Reward: Evidence from Cerebellar Ataxia. Cerebellum (London, England) 2024; 23:1355-1368. [PMID: 38066397 PMCID: PMC11161554 DOI: 10.1007/s12311-023-01633-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/02/2023] [Indexed: 01/25/2024]
Abstract
Recent findings in animals have challenged the traditional view of the cerebellum solely as the site of motor control, suggesting that the cerebellum may also be important for learning to predict reward from trial-and-error feedback. Yet, evidence for the role of the cerebellum in reward learning in humans is lacking. Moreover, open questions remain about which specific aspects of reward learning the cerebellum may contribute to. Here we address this gap through an investigation of multiple forms of reward learning in individuals with cerebellum dysfunction, represented by cerebellar ataxia cases. Nineteen participants with cerebellar ataxia and 57 age- and sex-matched healthy controls completed two separate tasks that required learning about reward contingencies from trial-and-error. To probe the selectivity of reward learning processes, the tasks differed in their underlying structure: while one task measured incremental reward learning ability alone, the other allowed participants to use an alternative learning strategy based on episodic memory alongside incremental reward learning. We found that individuals with cerebellar ataxia were profoundly impaired at reward learning from trial-and-error feedback on both tasks, but retained the ability to learn to predict reward based on episodic memory. These findings provide evidence from humans for a specific and necessary role for the cerebellum in incremental learning of reward associations based on reinforcement. More broadly, the findings suggest that alongside its role in motor learning, the cerebellum likely operates in concert with the basal ganglia to support reinforcement learning from reward.
Affiliation(s)
- Jonathan Nicholas
- Department of Psychology, Columbia University, New York, NY, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, Quad 3D, 3227 Broadway, New York, NY, 10027, USA
- Christian Amlang
- Department of Neurology, Columbia University Medical Center, 650 W. 168th St, Rm 305, New York, NY, 10032, USA
- Initiative for Columbia Ataxia and Tremor, Columbia University Medical Center, New York, NY, USA
- Chi-Ying R Lin
- Department of Neurology, Baylor College of Medicine, Houston, TX, USA
- Natasha Desai
- Department of Neurology, Columbia University Medical Center, 650 W. 168th St, Rm 305, New York, NY, 10032, USA
- Initiative for Columbia Ataxia and Tremor, Columbia University Medical Center, New York, NY, USA
- Ming-Kai Pan
- Department of Medical Research, National Taiwan University Hospital, 100, Taipei, Taiwan
- Department and Graduate Institute of Pharmacology, National Taiwan University College of Medicine, 100, Taipei, Taiwan
- Cerebellar Research Center, National Taiwan University Hospital, Yun-Lin Branch, Yun-Lin, Taiwan
- Sheng-Han Kuo
- Department of Neurology, Columbia University Medical Center, 650 W. 168th St, Rm 305, New York, NY, 10032, USA
- Initiative for Columbia Ataxia and Tremor, Columbia University Medical Center, New York, NY, USA
- Daphna Shohamy
- Department of Psychology, Columbia University, New York, NY, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, Quad 3D, 3227 Broadway, New York, NY, 10027, USA
- Kavli Institute for Brain Science, Columbia University, New York, NY, USA
2
Zid M, Laurie VJ, Levine-Champagne A, Shourkeshti A, Harrell D, Herman AB, Ebitz RB. Humans forage for reward in reinforcement learning tasks. bioRxiv 2024:2024.07.08.602539. [PMID: 39026817 PMCID: PMC11257465 DOI: 10.1101/2024.07.08.602539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
How do we make good decisions in uncertain environments? In psychology and neuroscience, the classic answer is that we calculate the value of each option and then compare the values to choose the most rewarding, modulo some exploratory noise. An ethologist, conversely, would argue that we commit to one option until its value drops below a threshold, at which point we start exploring other options. In order to determine which view better describes human decision-making, we developed a novel, foraging-inspired sequential decision-making model and used it to ask whether humans compare to threshold ("Forage") or compare alternatives ("Reinforcement-Learn" [RL]). We found that the foraging model was a better fit for participant behavior, better predicted the participants' tendency to repeat choices, and predicted the existence of held-out participants with a pattern of choice that was almost impossible under RL. Together, these results suggest that humans use foraging computations, rather than RL, even in classic reinforcement learning tasks.
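The contrast drawn in this abstract between comparing alternatives and comparing to a threshold can be illustrated with a minimal sketch. This is not the authors' model: the function names, the softmax form of the "compare alternatives" rule, and the parameters `beta` and `threshold` are all assumptions chosen for illustration.

```python
import math
import random


def rl_choice(values, beta=5.0, rng=random):
    """Compare alternatives: sample an option from a softmax over all values.

    `beta` (inverse temperature) is an illustrative parameter that sets how
    much exploratory noise is mixed into the comparison.
    """
    weights = [math.exp(beta * v) for v in values]
    draw = rng.random() * sum(weights)
    cumulative = 0.0
    for option, weight in enumerate(weights):
        cumulative += weight
        if draw < cumulative:
            return option
    return len(values) - 1  # guard against floating-point round-off


def forage_choice(values, current, threshold=0.5, rng=random):
    """Compare to threshold: stay with the current option while its value
    is at or above `threshold`; otherwise leave and explore another option."""
    if values[current] >= threshold:
        return current
    others = [i for i in range(len(values)) if i != current]
    return rng.choice(others)
```

Under these assumptions the foraging rule never consults the value of the alternatives while the current option is above threshold, which is one way such a rule can produce longer runs of repeated choices than a value-comparison rule.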
Affiliation(s)
- Meriam Zid
- Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
- Veldon-James Laurie
- Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
- Akram Shourkeshti
- Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
- Dameon Harrell
- Department of Psychiatry, University of Minnesota, Minneapolis, MN, 55455, USA
- Alexander B. Herman
- Department of Psychiatry, University of Minnesota, Minneapolis, MN, 55455, USA
- R. Becket Ebitz
- Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
3
Miller JA, Constantinidis C. Timescales of learning in prefrontal cortex. Nat Rev Neurosci 2024:10.1038/s41583-024-00836-8. [PMID: 38937654 DOI: 10.1038/s41583-024-00836-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/03/2024] [Indexed: 06/29/2024]
Abstract
The lateral prefrontal cortex (PFC) in humans and other primates is critical for immediate, goal-directed behaviour and working memory, which are classically considered distinct from the cognitive and neural circuits that support long-term learning and memory. Over the past few years, a reconsideration of this textbook perspective has emerged, in that different timescales of memory-guided behaviour are in constant interaction during the pursuit of immediate goals. Here, we will first detail how neural activity related to the shortest timescales of goal-directed behaviour (which requires maintenance of current states and goals in working memory) is sculpted by long-term knowledge and learning - that is, how the past informs present behaviour. Then, we will outline how learning across different timescales (from seconds to years) drives plasticity in the primate lateral PFC, from single neuron firing rates to mesoscale neuroimaging activity patterns. Finally, we will review how, over days and months of learning, dense local and long-range connectivity patterns in PFC facilitate longer-lasting changes in population activity by changing synaptic weights and recruiting additional neural resources to inform future behaviour. Our Review sheds light on how the machinery of plasticity in PFC circuits facilitates the integration of learned experiences across time to best guide adaptive behaviour.
Affiliation(s)
- Jacob A Miller
- Wu Tsai Institute, Yale University, New Haven, CT, USA
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Christos Constantinidis
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Neuroscience Program, Vanderbilt University, Nashville, TN, USA
- Department of Ophthalmology and Visual Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
4
Hore A, Bandyopadhyay S, Chakrabarti S. Persistent spiking activity in neuromorphic circuits incorporating post-inhibitory rebound excitation. J Neural Eng 2024; 21:036048. [PMID: 38861961 DOI: 10.1088/1741-2552/ad56c8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 06/11/2024] [Indexed: 06/13/2024]
Abstract
Objective. This study introduces a novel approach for integrating the post-inhibitory rebound excitation (PIRE) phenomenon into a neuronal circuit. Excitatory and inhibitory synapses are designed to establish a connection between two hardware neurons, effectively forming a network. The model demonstrates the occurrence of PIRE under strong inhibitory input. Emphasizing the significance of incorporating PIRE in neuromorphic circuits, the study showcases generation of persistent activity within cyclic and recurrent spiking neuronal networks. Approach. The neuronal and synaptic circuits are designed and simulated in Cadence Virtuoso using TSMC 180 nm technology. The operating mechanism of the PIRE phenomenon integrated into a hardware neuron is discussed. The proposed circuit encompasses several parameters for effectively controlling multiple electrophysiological features of a neuron. Main results. The neuronal circuit has been tuned to match the response of a biological neuron. The efficiency of this circuit is evaluated by computing the average power dissipation and energy consumption per spike through simulation. The sustained firing of neural spikes is observed till 1.7 s using the two neuronal networks. Significance. Persistent activity has significant implications for various cognitive functions such as working memory, decision-making, and attention. Therefore, hardware implementation of these functions will require our PIRE-integrated model. Energy-efficient neuromorphic systems are useful in many artificial intelligence applications, including human-machine interaction, IoT devices, autonomous systems, and brain-computer interfaces.
5
Bays PM, Schneegans S, Ma WJ, Brady TF. Representation and computation in visual working memory. Nat Hum Behav 2024; 8:1016-1034. [PMID: 38849647 DOI: 10.1038/s41562-024-01871-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 03/22/2024] [Indexed: 06/09/2024]
Abstract
The ability to sustain internal representations of the sensory environment beyond immediate perception is a fundamental requirement of cognitive processing. In recent years, debates regarding the capacity and fidelity of the working memory (WM) system have advanced our understanding of the nature of these representations. In particular, there is growing recognition that WM representations are not merely imperfect copies of a perceived object or event. New experimental tools have revealed that observers possess richer information about the uncertainty in their memories and take advantage of environmental regularities to use limited memory resources optimally. Meanwhile, computational models of visuospatial WM formulated at different levels of implementation have converged on common principles relating capacity to variability and uncertainty. Here we review recent research on human WM from a computational perspective, including the neural mechanisms that support it.
Affiliation(s)
- Paul M Bays
- Department of Psychology, University of Cambridge, Cambridge, UK
- Wei Ji Ma
- Center for Neural Science and Department of Psychology, New York University, New York, NY, USA
- Timothy F Brady
- Department of Psychology, University of California, San Diego, La Jolla, CA, USA
6
Schaaf JV, Johansson A, Visser I, Huizenga HM. What's in a name: The role of verbalization in reinforcement learning. Psychon Bull Rev 2024:10.3758/s13423-024-02506-3. [PMID: 38769270 DOI: 10.3758/s13423-024-02506-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/25/2024] [Indexed: 05/22/2024]
Abstract
Abstract (e.g., characters or fractals) and concrete stimuli (e.g., pictures of everyday objects) are used interchangeably in the reinforcement-learning literature. Yet, it is unclear whether the same learning processes underlie learning from these different stimulus types. In two preregistered experiments (N = 50 each), we assessed whether abstract and concrete stimuli yield different reinforcement-learning performance and whether this difference can be explained by verbalization. We argued that concrete stimuli are easier to verbalize than abstract ones, and that people therefore can appeal to the phonological loop, a subcomponent of the working-memory system responsible for storing and rehearsing verbal information, while learning. To test whether this verbalization aids reinforcement-learning performance, we administered a reinforcement-learning task in which participants learned either abstract or concrete stimuli while verbalization was hindered or not. In the first experiment, results showed a more pronounced detrimental effect of hindered verbalization for concrete than abstract stimuli on response times, but not on accuracy. In the second experiment, in which we reduced the response window, results showed the differential effect of hindered verbalization between stimulus types on accuracy, not on response times. These results imply that verbalization aids learning for concrete, but not abstract, stimuli and therefore that different processes underlie learning from these types of stimuli. This emphasizes the importance of carefully considering stimulus types. We discuss these findings in light of generalizability and validity of reinforcement-learning research.
Affiliation(s)
- Jessica V Schaaf
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Cognitive Neuroscience Department, Radboud University Medical Centre, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
- Annie Johansson
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Ingmar Visser
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Yield, Research Institute for Child Development and Education, Amsterdam, The Netherlands
- ABC, Amsterdam Brain and Cognition Centre, Amsterdam, The Netherlands
- Hilde M Huizenga
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Yield, Research Institute for Child Development and Education, Amsterdam, The Netherlands
- ABC, Amsterdam Brain and Cognition Centre, Amsterdam, The Netherlands
7
Ghaderi S, Amani Rad J, Hemami M, Khosrowabadi R. Dysfunctional feedback processing in male methamphetamine abusers: Evidence from neurophysiological and computational approaches. Neuropsychologia 2024; 197:108847. [PMID: 38460774 DOI: 10.1016/j.neuropsychologia.2024.108847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 01/24/2024] [Accepted: 02/28/2024] [Indexed: 03/11/2024]
Abstract
Methamphetamine use disorder (MUD), a major public health risk, is associated with dysfunctional neural feedback processing. Although dysfunctional feedback processing in people who are substance dependent has been explored in several behavioral, computational, and electrocortical studies, this mechanism in MUDs is not yet well understood. Furthermore, the current understanding of latent components of their behavior, such as learning speed and the exploration-exploitation dilemma, is still limited. In addition, the association between the latent cognitive components and the related neural mechanisms also needs to be explored. Therefore, in this study, the underlying neurocognitive mechanisms of feedback processing in individuals with MUD and in age/gender-matched healthy controls are evaluated within a probabilistic learning task with rewards and punishments. Mathematical modeling results based on the Q-learning paradigm suggested that MUDs show less sensitivity in distinguishing optimal options. Additionally, MUDs exhibited a slight decrease in their ability to learn from negative feedback compared to healthy controls. At the level of underlying neural mechanisms, MUDs showed lower theta power at the medial-frontal areas while responding to negative feedback. However, other EEG measures of reinforcement learning, including feedback-related negativity, parietal-P300, and activity flow from the medial frontal to lateral prefrontal regions, remained intact in MUDs. On the other hand, the linkage between value sensitivity and medial-frontal theta activity was absent in MUDs. The observed dysfunction could be due to the adverse effects of methamphetamine on the cortico-striatal dopamine circuit, which is reflected in the anterior cingulate cortex activity as the most likely region responsible for efficient behavior adjustment. These findings could help pave the way toward tailored therapeutic approaches.
Affiliation(s)
- Sadegh Ghaderi
- Institute for Cognitive and Brain Sciences, Shahid Beheshti University, Tehran, Iran
- Jamal Amani Rad
- Institute for Cognitive and Brain Sciences, Shahid Beheshti University, Tehran, Iran
- Mohammad Hemami
- Institute for Cognitive and Brain Sciences, Shahid Beheshti University, Tehran, Iran
- Reza Khosrowabadi
- Institute for Cognitive and Brain Sciences, Shahid Beheshti University, Tehran, Iran
8
Hassanzadeh Z, Bahrami F, Dortaj F. Exploring the dynamic interplay between learning and working memory within various cognitive contexts. Front Behav Neurosci 2024; 18:1304378. [PMID: 38420348 PMCID: PMC10899440 DOI: 10.3389/fnbeh.2024.1304378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 01/23/2024] [Indexed: 03/02/2024] Open
Abstract
Introduction: The intertwined relationship between reinforcement learning and working memory in the brain is a complex subject, widely studied across various domains in neuroscience. Research efforts have focused on identifying the specific brain areas responsible for these functions, understanding their contributions in accomplishing the related tasks, and exploring their adaptability under conditions such as cognitive impairment or aging. Methods: Numerous models have been introduced to formulate either these two subsystems of reinforcement learning and working memory separately or their combination and relationship in executing cognitive tasks. This study adopts the RLWM model as a computational framework to analyze the behavioral parameters of subjects with varying cognitive abilities due to age or cognitive status. A related RLWM task is employed to assess a group of subjects across different age groups and cognitive abilities, as measured by the Montreal Cognitive Assessment tool (MoCA). Results: Analysis reveals a decline in overall performance accuracy and speed with differing age groups (young vs. middle-aged). Significant differences are observed in model parameters such as learning rate, WM decay, and decision noise. Furthermore, among the middle-aged group, distinctions emerge between subjects categorized as normal vs. MCI based on MoCA scores, notably in speed, performance accuracy, and decision noise.
Affiliation(s)
- Zakieh Hassanzadeh
- Faculty of Psychology and Educational Sciences, Allameh Tabataba'i University, Tehran, Iran
- Fariba Bahrami
- School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
- Fariborz Dortaj
- Faculty of Psychology and Educational Sciences, Allameh Tabataba'i University, Tehran, Iran
9
Kirschner H, Nassar MR, Fischer AG, Frodl T, Meyer-Lotz G, Froböse S, Seidenbecher S, Klein TA, Ullsperger M. Transdiagnostic inflexible learning dynamics explain deficits in depression and schizophrenia. Brain 2024; 147:201-214. [PMID: 38058203 PMCID: PMC10766268 DOI: 10.1093/brain/awad362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 09/25/2023] [Accepted: 10/10/2023] [Indexed: 12/08/2023] Open
Abstract
Deficits in reward learning are core symptoms across many mental disorders. Recent work suggests that such learning impairments arise by a diminished ability to use reward history to guide behaviour, but the neuro-computational mechanisms through which these impairments emerge remain unclear. Moreover, limited work has taken a transdiagnostic approach to investigate whether the psychological and neural mechanisms that give rise to learning deficits are shared across forms of psychopathology. To provide insight into this issue, we explored probabilistic reward learning in patients diagnosed with major depressive disorder (n = 33) or schizophrenia (n = 24) and 33 matched healthy controls by combining computational modelling and single-trial EEG regression. In our task, participants had to integrate the reward history of a stimulus to decide whether it is worthwhile to gamble on it. Adaptive learning in this task is achieved through dynamic learning rates that are maximal on the first encounters with a given stimulus and decay with increasing stimulus repetitions. Hence, over the course of learning, choice preferences would ideally stabilize and be less susceptible to misleading information. We show evidence of reduced learning dynamics, whereby both patient groups demonstrated hypersensitive learning (i.e. less decaying learning rates), rendering their choices more susceptible to misleading feedback. Moreover, there was a schizophrenia-specific approach bias and a depression-specific heightened sensitivity to disconfirmational feedback (factual losses and counterfactual wins). The inflexible learning in both patient groups was accompanied by altered neural processing, including no tracking of expected values in either patient group. Taken together, our results thus provide evidence that reduced trial-by-trial learning dynamics reflect a convergent deficit across depression and schizophrenia. Moreover, we identified disorder distinct learning deficits.
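The idea of a learning rate that is maximal on first encounters and decays with stimulus repetitions can be sketched as follows. The power-law form `alpha0 / n**decay` and the names `learning_rate`, `learn_value`, `alpha0`, and `decay` are assumptions made for illustration; the abstract does not specify the functional form used in the paper's model.

```python
def learning_rate(n, alpha0=1.0, decay=1.0):
    """Learning rate on the n-th encounter (n starts at 1).

    decay = 0 gives a constant rate (no decay); larger values make the
    rate fall off faster with repetition. The form is an assumption.
    """
    return alpha0 / (n ** decay)


def learn_value(outcomes, alpha0=1.0, decay=1.0):
    """Delta-rule value estimate for one stimulus over its outcome history."""
    value = 0.0
    for n, outcome in enumerate(outcomes, start=1):
        prediction_error = outcome - value
        value += learning_rate(n, alpha0, decay) * prediction_error
    return value


# Three wins followed by one misleading loss.
history = [1.0, 1.0, 1.0, 0.0]
stable = learn_value(history, decay=1.0)     # decaying rate: running average
sensitive = learn_value(history, decay=0.0)  # constant rate of 1: tracks last outcome
```

With these illustrative settings the decaying rate leaves the estimate at 0.75 after the misleading loss, whereas the non-decaying rate drops it to 0.0, which mirrors the abstract's point that less decaying learning rates make choices more susceptible to misleading feedback.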
Affiliation(s)
- Hans Kirschner
- Institute of Psychology, Otto-von-Guericke University, D-39106 Magdeburg, Germany
- Matthew R Nassar
- Robert J. and Nancy D. Carney Institute for Brain Science, Brown University, Providence, RI 02912-1821, USA
- Department of Neuroscience, Brown University, Providence, RI 02912-1821, USA
- Adrian G Fischer
- Department of Education and Psychology, Freie Universität Berlin, D-14195 Berlin, Germany
- Thomas Frodl
- Department of Psychiatry and Psychotherapy, Otto-von-Guericke University, D-39106 Magdeburg, Germany
- Department of Psychiatry, Psychotherapy and Psychosomatics, RWTH Aachen University, Aachen 52074, Germany
- German Center for Mental Health (DZPG), D-39106 Magdeburg, Germany
- Center for Intervention and Research on adaptive and maladaptive brain Circuits underlying mental health (C-I-R-C), Jena-Magdeburg-Halle, D-39106 Magdeburg, Germany
- Gabriela Meyer-Lotz
- Department of Psychiatry and Psychotherapy, Otto-von-Guericke University, D-39106 Magdeburg, Germany
- Sören Froböse
- Department of Psychiatry and Psychotherapy, Otto-von-Guericke University, D-39106 Magdeburg, Germany
- Stephanie Seidenbecher
- Department of Psychiatry and Psychotherapy, Otto-von-Guericke University, D-39106 Magdeburg, Germany
- Tilmann A Klein
- Institute of Psychology, Otto-von-Guericke University, D-39106 Magdeburg, Germany
- Center for Behavioral Brain Sciences, D-39106 Magdeburg, Germany
- Markus Ullsperger
- Institute of Psychology, Otto-von-Guericke University, D-39106 Magdeburg, Germany
- German Center for Mental Health (DZPG), D-39106 Magdeburg, Germany
- Center for Intervention and Research on adaptive and maladaptive brain Circuits underlying mental health (C-I-R-C), Jena-Magdeburg-Halle, D-39106 Magdeburg, Germany
- Center for Behavioral Brain Sciences, D-39106 Magdeburg, Germany
10
Liuzzi L, Pine DS, Fox NA, Averbeck BB. Changes in Behavior and Neural Dynamics across Adolescent Development. J Neurosci 2023; 43:8723-8732. [PMID: 37848282 PMCID: PMC10727120 DOI: 10.1523/jneurosci.0462-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 08/28/2023] [Accepted: 09/19/2023] [Indexed: 10/19/2023] Open
Abstract
Adolescence is an important developmental period, during which substantial changes occur in brain function and behavior. Several aspects of executive function, including response inhibition, improve during this period. Correspondingly, structural imaging studies have documented consistent decreases in cortical and subcortical gray matter volume, and postmortem histologic studies have found substantial (∼40%) decreases in excitatory synapses in prefrontal cortex. Recent computational modeling work suggests that changes in synaptic density underlie improvements in task performance. These models also predict changes in neural dynamics related to the depth of attractor basins, where deeper basins can underlie better task performance. In this study, we analyzed task-related neural dynamics in a large cohort of longitudinally followed subjects (male and female) spanning early to late adolescence. We found that age correlated positively with behavioral performance in the Eriksen Flanker task. Older subjects were also characterized by deeper attractor basins around task-related evoked EEG potentials during specific cognitive operations. Thus, consistent with computational models examining the effects of excitatory synaptic pruning, older adolescents showed stronger attractor dynamics during task performance.
Significance statement: There are well-documented changes in brain and behavior during adolescent development. However, there are few mechanistic theories that link changes in the brain to changes in behavior. Here, we tested a hypothesis, put forward on the basis of computational modeling, that pruning of excitatory synapses in cortex during adolescence changes neural dynamics. We found, consistent with the hypothesis, that variability around event-related potentials shows faster decay dynamics in older adolescent subjects. The faster decay dynamics are consistent with the hypothesis that synaptic pruning during adolescent development leads to stronger attractor basins in task-related neural activity.
Affiliation(s)
- Lucrezia Liuzzi
- Emotion and Development Branch, National Institute of Mental Health, Bethesda, MD 20892
- Daniel S Pine
- Emotion and Development Branch, National Institute of Mental Health, Bethesda, MD 20892
- Nathan A Fox
- Department of Human Development and Quantitative Methodology, University of Maryland, College Park, MD 20742
- Bruno B Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health, Bethesda, MD 20892
11
Sato Y, Sakai Y, Hirata S. State-transition-free reinforcement learning in chimpanzees (Pan troglodytes). Learn Behav 2023; 51:413-427. [PMID: 37369920 DOI: 10.3758/s13420-023-00591-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/07/2023] [Indexed: 06/29/2023]
Abstract
The outcome of an action often occurs after a delay. One solution for learning appropriate actions from delayed outcomes is to rely on a chain of state transitions. Another solution, which does not rest on state transitions, is to use an eligibility trace (ET) that directly bridges a current outcome and multiple past actions via transient memories. Previous studies revealed that humans (Homo sapiens) learned appropriate actions in a behavioral task in which solutions based on the ET were effective but transition-based solutions were ineffective. This suggests that ET may be used in human learning systems. However, no studies have examined nonhuman animals with an equivalent behavioral task. We designed a task for nonhuman animals following a previous human study. In each trial, participants chose one of two stimuli that were randomly selected from three stimulus types: a stimulus associated with a food reward delivered immediately, a stimulus associated with a reward delivered after a few trials, and a stimulus associated with no reward. The presented stimuli did not vary according to the participants' choices. To maximize the total reward, participants had to learn the value of the stimulus associated with a delayed reward. Five chimpanzees (Pan troglodytes) performed the task using a touchscreen. Two chimpanzees were able to learn successfully, indicating that learning mechanisms that do not depend on state transitions were involved in the learning processes. The current study extends previous ET research by proposing a behavioral task and providing empirical data from chimpanzees.
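How an eligibility trace can bridge a current outcome and multiple past actions without any state transitions can be sketched as below. This is a generic illustration, not the model used in the study: the function name `run_trace_learning`, the accumulating-trace update, and the parameters `alpha` and `trace_decay` are assumptions.

```python
def run_trace_learning(trials, n_stimuli, alpha=0.1, trace_decay=0.8):
    """Learn stimulus values from (chosen_stimulus, reward) pairs.

    Each stimulus keeps a transient memory (its trace) that is bumped when
    the stimulus is chosen and fades by `trace_decay` on every trial. The
    reward prediction error on each trial is shared among all stimuli in
    proportion to their traces, so earlier choices receive delayed credit.
    """
    values = [0.0] * n_stimuli
    traces = [0.0] * n_stimuli
    for chosen, reward in trials:
        traces = [trace_decay * e for e in traces]  # older choices fade
        traces[chosen] += 1.0                       # mark the current choice
        prediction_error = reward - values[chosen]
        for s in range(n_stimuli):
            values[s] += alpha * prediction_error * traces[s]
    return values


# Stimulus 0 pays off one trial late; stimulus 1 pays nothing itself.
# The reward arrives on the trial where stimulus 1 was chosen.
values = run_trace_learning([(0, 0.0), (1, 1.0)], n_stimuli=2)
```

In this toy run stimulus 0 ends with a positive value even though no reward was delivered on the trial it was chosen, because its trace was still active when the delayed reward arrived.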
Grants
- 16H06283 Ministry of Education, Culture, Sports, Science, Japan Society for the Promotion of Science
- 18H05524 Ministry of Education, Culture, Sports, Science, Japan Society for the Promotion of Science
- 19J22889 Ministry of Education, Culture, Sports, Science, Japan Society for the Promotion of Science
- 26245069 Ministry of Education, Culture, Sports, Science, Japan Society for the Promotion of Science
- U04 Program for Leading Graduate Schools
Affiliation(s)
- Yutaro Sato
- Wildlife Research Center, Kyoto University, Kyoto, Japan
- University Administration Office, Headquarters for Management Strategy, Niigata University, Niigata, Japan
- Yutaka Sakai
- Brain Science Institute, Tamagawa University, Tokyo, Japan
- Satoshi Hirata
- Wildlife Research Center, Kyoto University, Kyoto, Japan
12
Park H, Doh H, Lee E, Park H, Ahn WY. The neurocognitive role of working memory load when Pavlovian motivational control affects instrumental learning. PLoS Comput Biol 2023; 19:e1011692. [PMID: 38064498 PMCID: PMC10732416 DOI: 10.1371/journal.pcbi.1011692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 12/20/2023] [Accepted: 11/15/2023] [Indexed: 12/21/2023] Open
Abstract
Research suggests that a fast, capacity-limited working memory (WM) system and a slow, incremental reinforcement learning (RL) system jointly contribute to instrumental learning. Thus, situations that strain WM resources alter instrumental learning: under WM loads, learning becomes slow and incremental, the reliance on computationally efficient learning increases, and action selection becomes more random. It is also suggested that Pavlovian learning influences people's behavior during instrumental learning by providing hard-wired instinctive responses including approach to reward predictors and avoidance of punishment predictors. However, it remains unknown how constraints on WM resources affect instrumental learning under Pavlovian influence. Thus, we conducted a functional magnetic resonance imaging (fMRI) study (N = 49) in which participants completed an instrumental learning task with Pavlovian-instrumental conflict (the orthogonalized go/no-go task) both with and without extra WM load. Behavioral and computational modeling analyses revealed that WM load reduced the learning rate and increased random choice, without affecting Pavlovian bias. Model-based fMRI analysis revealed that WM load strengthened RPE signaling in the striatum. Moreover, under WM load, the striatum showed weakened connectivity with the ventromedial and dorsolateral prefrontal cortex when computing reward expectations. These results suggest that the limitation of cognitive resources by WM load promotes slow and incremental learning through the weakened cooperation between WM and RL; such limitation also makes action selection more random, but it does not directly affect the balance between instrumental and Pavlovian systems.
Affiliation(s)
- Heesun Park
  - Department of Psychology, Seoul National University, Seoul, Korea
- Hoyoung Doh
  - Department of Psychology, Seoul National University, Seoul, Korea
- Eunhwi Lee
  - Department of Psychology, Seoul National University, Seoul, Korea
- Harhim Park
  - Department of Psychology, Seoul National University, Seoul, Korea
- Woo-Young Ahn
  - Department of Psychology, Seoul National University, Seoul, Korea
  - Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Korea
13
|
Yoo AH, Keglovits H, Collins AGE. Lowered inter-stimulus discriminability hurts incremental contributions to learning. Cognitive, Affective & Behavioral Neuroscience 2023; 23:1346-1364. [PMID: 37656373 PMCID: PMC10545593 DOI: 10.3758/s13415-023-01104-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/13/2023] [Indexed: 09/02/2023]
Abstract
How does the similarity between stimuli affect our ability to learn appropriate response associations for them? In typical laboratory experiments, learning is investigated under somewhat ideal circumstances, where stimuli are easily discriminable. This is not representative of most real-life learning, where overlapping "stimuli" can result in different "rewards" and may be learned simultaneously (e.g., you may learn over repeated interactions that a specific dog is friendly, but that a very similar looking one isn't). With two experiments, we test how humans learn in three stimulus conditions: one "best case" condition in which stimuli have idealized and highly discriminable visual and semantic representations, and two in which stimuli have overlapping representations, making them less discriminable. We find that, unsurprisingly, decreasing stimulus discriminability decreases performance. We develop computational models to test different hypotheses about how reinforcement learning (RL) and working memory (WM) processes are affected by different stimulus conditions. Our results replicate earlier studies demonstrating the importance of both processes to capture behavior. However, our results extend previous studies by demonstrating that RL, and not WM, is affected by stimulus distinctness: people learn more slowly and have higher across-stimulus value confusion at decision when stimuli are more similar to each other. These results illustrate strong effects of stimulus type on learning and demonstrate the importance of considering parallel contributions of different cognitive processes when studying behavior.
Affiliation(s)
- Aspen H Yoo
  - Department of Psychology, University of California, Berkeley, USA
  - Helen Wills Neuroscience Institute, University of California, Berkeley, USA
- Haley Keglovits
  - Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, USA
- Anne G E Collins
  - Department of Psychology, University of California, Berkeley, USA
  - Helen Wills Neuroscience Institute, University of California, Berkeley, USA
14
|
Tichelaar JG, Sayalı C, Helmich RC, Cools R. Impulse control disorder in Parkinson's disease is associated with abnormal frontal value signalling. Brain 2023; 146:3676-3689. [PMID: 37192341 PMCID: PMC10473575 DOI: 10.1093/brain/awad162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 04/18/2023] [Accepted: 04/26/2023] [Indexed: 05/18/2023] Open
Abstract
Dopaminergic medication is well established to boost reward- versus punishment-based learning in Parkinson's disease. However, there is tremendous variability in dopaminergic medication effects across different individuals, with some patients exhibiting much greater cognitive sensitivity to medication than others. We aimed to unravel the mechanisms underlying this individual variability in a large heterogeneous sample of early-stage patients with Parkinson's disease as a function of comorbid neuropsychiatric symptomatology, in particular impulse control disorders and depression. One hundred and ninety-nine patients with Parkinson's disease (138 ON medication and 61 OFF medication) and 59 healthy controls were scanned with functional MRI while they performed an established probabilistic instrumental learning task. Reinforcement learning model-based analyses revealed medication group differences in learning from gains versus losses, but only in patients with impulse control disorders. Furthermore, expected-value related brain signalling in the ventromedial prefrontal cortex was increased in patients with impulse control disorders ON medication compared with those OFF medication, while striatal reward prediction error signalling remained unaltered. These data substantiate the hypothesis that dopamine's effects on reinforcement learning in Parkinson's disease vary with individual differences in comorbid impulse control disorder and suggest they reflect deficient computation of value in medial frontal cortex, rather than deficient reward prediction error signalling in striatum. See Michael Browning (https://doi.org/10.1093/brain/awad248) for a scientific commentary on this article.
Affiliation(s)
- Jorryt G Tichelaar
  - Radboud University Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, 6525EN Nijmegen, The Netherlands
  - Radboud University Medical Center, Department of Neurology, Centre of Expertise for Parkinson and Movement Disorders, 6525GA Nijmegen, The Netherlands
- Ceyda Sayalı
  - The Johns Hopkins University School of Medicine, Center for Psychedelic and Consciousness Research, Baltimore, MD 21224, USA
- Rick C Helmich
  - Radboud University Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, 6525EN Nijmegen, The Netherlands
  - Radboud University Medical Center, Department of Neurology, Centre of Expertise for Parkinson and Movement Disorders, 6525GA Nijmegen, The Netherlands
- Roshan Cools
  - Radboud University Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, 6525EN Nijmegen, The Netherlands
  - Radboud University Medical Center, Department of Psychiatry, 6525GA Nijmegen, The Netherlands
15
|
Sun Y, Kang P, Huang L, Wang H, Ku Y. Reward advantage over punishment for incentivizing visual working memory. Psychophysiology 2023; 60:e14300. [PMID: 36966450 DOI: 10.1111/psyp.14300] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/04/2022] [Revised: 01/26/2023] [Accepted: 03/01/2023] [Indexed: 03/27/2023]
Abstract
The prospects of gaining reward and avoiding punishment widely influence human behavior. Despite numerous attempts to investigate the influence of motivational signals on working memory (WM), whether the valence and the magnitude of motivational signals interactively influence WM performance remains unclear. To investigate this, the present study used a free-recall working memory task with EEG recording to compare the effects of incentive valence (reward or punishment) and incentive magnitude on visual WM. Behavioral results revealed that the presence of incentive signals improved WM precision compared with the no-incentive condition, and that, compared with punishing cues, rewarding cues led to greater facilitation in WM precision, as well as in confidence ratings afterward. Moreover, event-related potential (ERP) results suggested that, compared with punishment, reward led to an earlier latency of the late positive component (LPC), a larger amplitude of the contingent negative variation (CNV) during the expectation period, and a larger P300 amplitude during the sample and delay periods. Furthermore, the reward advantage over punishment in behavioral and neural results was correlated, such that individuals with a larger CNV difference between reward and punishment conditions also reported a greater distinction in confidence ratings between the two conditions. In sum, our results demonstrate that, and how, rewarding cues produce more beneficial effects than punishing cues when incentivizing visual WM.
Affiliation(s)
- Yurong Sun
  - Guangdong Provincial Key Laboratory of Brain Function and Disease, Center for Brain and Mental Well-Being, Department of Psychology, Sun Yat-sen University, Guangzhou, China
  - School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
- Pyungwon Kang
  - Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, Zurich, Switzerland
- Leyu Huang
  - School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
- Huimin Wang
  - School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
- Yixuan Ku
  - Guangdong Provincial Key Laboratory of Brain Function and Disease, Center for Brain and Mental Well-Being, Department of Psychology, Sun Yat-sen University, Guangzhou, China
  - Peng Cheng Laboratory, Shenzhen, China
16
|
Schumacher L, Bürkner PC, Voss A, Köthe U, Radev ST. Neural superstatistics for Bayesian estimation of dynamic cognitive models. Sci Rep 2023; 13:13778. [PMID: 37612320 PMCID: PMC10447473 DOI: 10.1038/s41598-023-40278-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/15/2023] [Accepted: 08/08/2023] [Indexed: 08/25/2023] Open
Abstract
Mathematical models of cognition are often memoryless and ignore potential fluctuations of their parameters. However, human cognition is inherently dynamic. Thus, we propose to augment mechanistic cognitive models with a temporal dimension and estimate the resulting dynamics from a superstatistics perspective. Such a model entails a hierarchy between a low-level observation model and a high-level transition model. The observation model describes the local behavior of a system, and the transition model specifies how the parameters of the observation model evolve over time. To overcome the estimation challenges resulting from the complexity of superstatistical models, we develop and validate a simulation-based deep learning method for Bayesian inference, which can recover both time-varying and time-invariant parameters. We first benchmark our method against two existing frameworks capable of estimating time-varying parameters. We then apply our method to fit a dynamic version of the diffusion decision model to long time series of human response times data. Our results show that the deep learning approach is very efficient in capturing the temporal dynamics of the model. Furthermore, we show that the erroneous assumption of static or homogeneous parameters will hide important temporal information.
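The two-level structure described in this abstract, a transition model that moves parameters over time and an observation model that generates data from the current parameters, can be sketched in a few lines. The sketch below is a forward simulator only, not the authors' estimation method: the Gaussian random walk, the choice of letting only the drift rate vary, the bounds, and all numeric values are assumptions made for illustration.

```python
import math
import random


def random_walk_transition(theta, step_sd, lower, upper, rng):
    """High-level transition model (assumed here to be a Gaussian
    random walk): perturb the parameter, then clip it to its bounds."""
    proposal = theta + rng.gauss(0.0, step_sd)
    return min(max(proposal, lower), upper)


def simulate_ddm_trial(drift, threshold, ndt, rng, dt=0.001, max_time=5.0):
    """Low-level observation model: one diffusion decision trial,
    simulated with Euler-Maruyama steps from an unbiased start point.
    Returns (response_time, choice); choice is 1 only if the upper
    boundary was reached, so timed-out trials are reported as 0."""
    evidence = threshold / 2.0
    elapsed = 0.0
    while 0.0 < evidence < threshold and elapsed < max_time:
        evidence += drift * dt + rng.gauss(0.0, math.sqrt(dt))
        elapsed += dt
    choice = 1 if evidence >= threshold else 0
    return ndt + elapsed, choice


def simulate_session(n_trials, seed=0):
    """Let the drift rate evolve trial by trial while threshold and
    non-decision time stay fixed (time-invariant parameters)."""
    rng = random.Random(seed)
    drift = 1.0
    drifts, data = [], []
    for _ in range(n_trials):
        drift = random_walk_transition(drift, 0.1, -3.0, 3.0, rng)
        drifts.append(drift)
        data.append(simulate_ddm_trial(drift, 1.5, 0.3, rng))
    return drifts, data


drifts, data = simulate_session(n_trials=20)
```

A simulator of this kind is what a simulation-based inference method would be trained on; fitting a static model to `data` would ignore the trajectory stored in `drifts`.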
Affiliation(s)
- Lukas Schumacher
  - Institute of Psychology, Heidelberg University, Heidelberg, Germany
- Andreas Voss
  - Institute of Psychology, Heidelberg University, Heidelberg, Germany
- Ullrich Köthe
  - Computer Vision and Learning Lab, Heidelberg University, Heidelberg, Germany
- Stefan T Radev
  - Cluster of Excellence STRUCTURES, Heidelberg University, Heidelberg, Germany
17
|
Rac-Lubashevsky R, Cremer A, Collins AGE, Frank MJ, Schwabe L. Neural Index of Reinforcement Learning Predicts Improved Stimulus-Response Retention under High Working Memory Load. J Neurosci 2023; 43:3131-3143. [PMID: 36931706 PMCID: PMC10146488 DOI: 10.1523/jneurosci.1274-22.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 01/19/2023] [Accepted: 02/20/2023] [Indexed: 03/19/2023] Open
Abstract
Human learning and decision-making are supported by multiple systems operating in parallel. Recent studies isolating the contributions of reinforcement learning (RL) and working memory (WM) have revealed a trade-off between the two. An interactive WM/RL computational model predicts that although high WM load slows behavioral acquisition, it also induces larger prediction errors in the RL system that enhance robustness and retention of learned behaviors. Here, we tested this account by parametrically manipulating WM load during RL in conjunction with EEG in both male and female participants and administered two surprise memory tests. We further leveraged single-trial decoding of EEG signatures of RL and WM to determine whether their interaction predicted robust retention. Consistent with the model, behavioral learning was slower for associations acquired under higher load but showed parametrically improved future retention. This paradoxical result was mirrored by EEG indices of RL, which were strengthened under higher WM loads and predictive of more robust future behavioral retention of learned stimulus-response contingencies. We further tested whether stress alters the ability to shift between the two systems strategically to maximize immediate learning versus retention of information and found that induced stress had only a limited effect on this trade-off. The present results offer a deeper understanding of the cooperative interaction between WM and RL and show that relying on WM can benefit the rapid acquisition of choice behavior during learning but impairs retention.
SIGNIFICANCE STATEMENT Successful learning is achieved by the joint contribution of the dopaminergic RL system and WM. The cooperative WM/RL model was productive in improving our understanding of the interplay between the two systems during learning, demonstrating that reliance on RL computations is modulated by WM load. However, the role of WM/RL systems in the retention of learned stimulus-response associations remained unestablished. Our results show that increased neural signatures of learning, indicative of greater RL computation, under high WM load also predicted better stimulus-response retention. This result supports a trade-off between the two systems, where degraded WM increases RL processing, which improves retention. Notably, we show that this cooperative interplay remains largely unaffected by acute stress.
Affiliation(s)
- Rachel Rac-Lubashevsky
  - Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, Rhode Island 02912
  - Carney Institute for Brain Science, Brown University, Providence, Rhode Island 02912
- Anna Cremer
  - Department of Cognitive Psychology, Universität Hamburg, 20146 Hamburg, Germany
- Anne G E Collins
  - Department of Psychology, University of California, Berkeley, Berkeley, California 94720-1650
  - Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, California 94720
- Michael J Frank
  - Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, Rhode Island 02912
  - Carney Institute for Brain Science, Brown University, Providence, Rhode Island 02912
- Lars Schwabe
  - Department of Cognitive Psychology, Universität Hamburg, 20146 Hamburg, Germany
18
|
Zeng J, Meng J, Wang C, Leng W, Zhong X, Gong A, Bo S, Jiang C. High vagally mediated resting-state heart rate variability is associated with superior working memory function. Front Neurosci 2023; 17:1119405. [PMID: 36891458 PMCID: PMC9986304 DOI: 10.3389/fnins.2023.1119405] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/09/2022] [Accepted: 01/30/2023] [Indexed: 02/22/2023] Open
Abstract
Background: Heart rate variability (HRV), a cardiac vagal tone indicator, has been proven to predict performance on some cognitive tasks that rely on the prefrontal cortex. However, the relationship between vagal tone and working memory remains understudied. This study explores the link between vagal tone and working memory function, combined with behavioral tasks and functional near-infrared spectroscopy (fNIRS).
Methods: A total of 42 undergraduate students were tested for 5-min resting-state HRV to obtain the root mean square of successive differences (rMSSD) data, and then divided into high and low vagal tone groups according to the median of rMSSD data. The two groups underwent the n-back test, and fNIRS was used to measure the neural activity in the test state. ANOVA and the independent sample t-test were performed to compare group mean differences, and the Pearson correlation coefficient was used for correlation analysis.
Results: The high vagal tone group had a shorter reaction time, higher accuracy, lower inverse efficiency score, and lower oxy-Hb concentration in the bilateral prefrontal cortex in the working memory tasks state. Furthermore, there were associations between behavioral performance, oxy-Hb concentration, and resting-state rMSSD.
Conclusion: Our findings suggest that high vagally mediated resting-state HRV is associated with working memory performance. High vagal tone means a higher efficiency of neural resources, beneficial to presenting a better working memory function.
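The measures named in this abstract have simple definitions, sketched below. rMSSD is the square root of the mean of squared differences between successive RR intervals, and the inverse efficiency score is commonly computed as mean reaction time divided by proportion correct. The function names, the millisecond units, the sample values, and the rule that values equal to the median fall in the "low" group are assumptions for illustration; the abstract does not state how ties were handled.

```python
import math
import statistics


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between
    consecutive RR intervals (in milliseconds)."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def median_split(values):
    """Label each value 'high' or 'low' relative to the sample median.
    Assumption: values equal to the median are labelled 'low'."""
    cutoff = statistics.median(values)
    return ["high" if v > cutoff else "low" for v in values]


def inverse_efficiency(mean_rt_ms, proportion_correct):
    """Inverse efficiency score: mean reaction time divided by
    proportion correct (lower is better)."""
    if not 0.0 < proportion_correct <= 1.0:
        raise ValueError("proportion_correct must be in (0, 1]")
    return mean_rt_ms / proportion_correct


# Hypothetical per-participant rMSSD values, then a median split.
sample_rmssd = [rmssd([800, 810, 790, 800]), 25.0, 40.0, 55.0]
groups = median_split(sample_rmssd)
```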
Affiliation(s)
- Jia Zeng
  - The Center of Neuroscience and Sports, Capital University of Physical Education and Sports, Beijing, China
- Jiao Meng
  - The Center of Neuroscience and Sports, Capital University of Physical Education and Sports, Beijing, China
- Chen Wang
  - The Center of Neuroscience and Sports, Capital University of Physical Education and Sports, Beijing, China
- Wenwu Leng
  - The Center of Neuroscience and Sports, Capital University of Physical Education and Sports, Beijing, China
- Xiaoke Zhong
  - The Center of Neuroscience and Sports, Capital University of Physical Education and Sports, Beijing, China
- Anmin Gong
  - School of Information Engineering, Engineering University of People's Armed Police, Xi'an, China
- Shumin Bo
  - School of Kinesiology and Health, Capital University of Physical Education and Sports, Beijing, China
- Changhao Jiang
  - The Center of Neuroscience and Sports, Capital University of Physical Education and Sports, Beijing, China
  - School of Kinesiology and Health, Capital University of Physical Education and Sports, Beijing, China
19
|
Abstract
In reinforcement learning (RL) experiments, participants learn to make rewarding choices in response to different stimuli; RL models use outcomes to estimate stimulus-response values that change incrementally. RL models consider any response type indiscriminately, ranging from more concretely defined motor choices (pressing a key with the index finger), to more general choices that can be executed in a number of ways (selecting dinner at the restaurant). However, does the learning process vary as a function of the choice type? In Experiment 1, we show that it does: Participants were slower and less accurate in learning correct choices of a general format compared with learning more concrete motor actions. Using computational modeling, we show that two mechanisms contribute to this. First, there was evidence of irrelevant credit assignment: The values of motor actions interfered with the values of other choice dimensions, resulting in more incorrect choices when the correct response was not defined by a single motor action; second, information integration for relevant general choices was slower. In Experiment 2, we replicated and further extended the findings from Experiment 1 by showing that slowed learning was attributable to weaker working memory use, rather than slowed RL. In both experiments, we ruled out the explanation that the difference in performance between two condition types was driven by difficulty/different levels of complexity. We conclude that defining a more abstract choice space used by multiple learning systems for credit assignment recruits executive resources, limiting how much such processes then contribute to fast learning.
Affiliation(s)
- Amy Zou
  - University of California, Berkeley
- Anne G E Collins
  - University of California, Berkeley
  - Helen Wills Neuroscience Institute, Berkeley, CA
20
|
Working memory capacity estimates moderate value learning for outcome-irrelevant features. Sci Rep 2022; 12:19677. [PMID: 36385131 PMCID: PMC9669000 DOI: 10.1038/s41598-022-21832-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Accepted: 10/04/2022] [Indexed: 11/17/2022] Open
Abstract
To establish accurate action-outcome associations in the environment, individuals must refrain from assigning value to outcome-irrelevant features. However, studies have largely ignored the role of attentional control processes in action value updating. In the current study, we examined the extent to which working memory, a system that can filter and block the processing of irrelevant information in one's mind, also filters outcome-irrelevant information during value-based learning. To this end, 174 individuals completed a well-established working memory capacity measurement and a reinforcement learning task designed to estimate outcome-irrelevant learning. We replicated previous studies showing a group-level tendency to assign value to the task's response keys, despite clear instructions and practice suggesting they are irrelevant to the prediction of monetary outcomes. Importantly, individuals with higher working memory capacity were less likely to assign value to the outcome-irrelevant response keys, suggesting a significant moderation effect of working memory capacity on outcome-irrelevant learning. We discuss the role of working memory processing in value-based learning through the lens of a cognitive control failure.
21
|
Nicholas J, Daw ND, Shohamy D. Uncertainty alters the balance between incremental learning and episodic memory. eLife 2022; 11:81679. [PMID: 36458809 PMCID: PMC9810331 DOI: 10.7554/elife.81679] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/07/2022] [Accepted: 12/01/2022] [Indexed: 12/04/2022] Open
Abstract
A key question in decision-making is how humans arbitrate between competing learning and memory systems to maximize reward. We address this question by probing the balance between the effects, on choice, of incremental trial-and-error learning versus episodic memories of individual events. Although a rich literature has studied incremental learning in isolation, the role of episodic memory in decision-making has only recently drawn focus, and little research disentangles their separate contributions. We hypothesized that the brain arbitrates rationally between these two systems, relying on each in circumstances to which it is most suited, as indicated by uncertainty. We tested this hypothesis by directly contrasting contributions of episodic and incremental influence to decisions, while manipulating the relative uncertainty of incremental learning using a well-established manipulation of reward volatility. Across two large, independent samples of young adults, participants traded these influences off rationally, depending more on episodic information when incremental summaries were more uncertain. These results support the proposal that the brain optimizes the balance between different forms of learning and memory according to their relative uncertainties and elucidate the circumstances under which episodic memory informs decisions.
Affiliation(s)
- Jonathan Nicholas
  - Department of Psychology, Columbia University, New York, United States
  - Mortimer B. Zuckerman Mind, Brain, Behavior Institute, Columbia University, New York, United States
- Nathaniel D Daw
  - Department of Psychology, Princeton University, Princeton, United States
  - Princeton Neuroscience Institute, Princeton University, Princeton, United States
- Daphna Shohamy
  - Department of Psychology, Columbia University, New York, United States
  - Mortimer B. Zuckerman Mind, Brain, Behavior Institute, Columbia University, New York, United States
  - The Kavli Institute for Brain Science, Columbia University, New York, United States