1
Vázquez D, Maulhardt SR, Stalnaker TA, Solway A, Charpentier CJ, Roesch MR. Optogenetic inhibition of rat anterior cingulate cortex impairs the ability to initiate and stay on task. J Neurosci 2024:e1850232024. [PMID: 38569923 DOI: 10.1523/jneurosci.1850-23.2024] [Received: 09/29/2023] [Revised: 01/16/2024] [Accepted: 01/20/2024] [Indexed: 04/05/2024]
Abstract
Our prior research identified neural correlates of cognitive control in the anterior cingulate cortex (ACC), leading us to hypothesize that the ACC is necessary for increasing attention as rats flexibly learn new contingencies during a complex reward-guided decision-making task. Here, we tested this hypothesis by using optogenetics to transiently inhibit the ACC while rats of either sex performed the same two-choice task. ACC inhibition had a profound impact on behavior that extended beyond deficits in attention during learning when expected outcomes were uncertain. ACC inactivation slowed trial initiation, reduced the number of trials rats initiated, and impaired both their accuracy and their ability to complete sessions. Further, drift-diffusion model analysis suggested that free-choice performance and evidence accumulation were degraded during initial learning (i.e., reduced drift rates), leading to weaker associations that were more easily overridden in later trial blocks (i.e., stronger bias). Together, these results suggest that, in addition to attention-related functions, the ACC contributes to the ability to initiate trials and generally stay on task.
Significance Statement
Attentional deficits and difficulty staying on task are defining hallmarks of some of the most prevalent and disruptive neuropsychiatric disorders. Here, we used an optogenetic approach and computational modeling to study how within-subject modulation of the anterior cingulate cortex (ACC) affects the ability of rats to initiate and complete a complex reward-guided decision-making task. On days on which the ACC was inhibited, the rats' ability to initiate trials and stay on task was impaired, as were their task accuracy and their ability to complete sessions.
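In the drift-diffusion model, the drift rate sets how quickly evidence accumulates toward a response boundary, and a bias tilts choices toward one response. A minimal sketch of how both shape choice is the standard closed-form probability of reaching the upper boundary. It assumes that bias is implemented as a starting point `z` shifted away from the midpoint, which is only one common way to model it. The parameter names `v`, `a`, `z`, and `s` are illustrative. This is not the authors' fitting code.

```python
import math


def p_upper(v: float, a: float, z: float, s: float = 1.0) -> float:
    """Probability that a Wiener diffusion starting at z reaches the upper
    boundary a before the lower boundary 0.

    v: drift rate (mean rate of evidence accumulation)
    a: boundary separation
    z: starting point, 0 < z < a (z = a/2 is unbiased; assumed bias mechanism)
    s: diffusion noise scale
    """
    if not 0.0 < z < a:
        raise ValueError("starting point must lie strictly between 0 and a")
    if v == 0.0:
        # With zero drift the process is a martingale, so P(upper) = z / a.
        return z / a
    k = 2.0 * v / s**2
    return (1.0 - math.exp(-k * z)) / (1.0 - math.exp(-k * a))


if __name__ == "__main__":
    a = 1.5
    for v in (0.5, 1.0, 2.0):
        unbiased = p_upper(v, a, z=a / 2)
        biased = p_upper(v, a, z=0.3 * a)  # start closer to the lower boundary
        print(f"v={v}: unbiased={unbiased:.3f}, biased toward lower={biased:.3f}")
```

Lower drift pulls the unbiased probability toward chance. In the biased case, a starting point near one boundary lets that response win even when the drift favors the other.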
Affiliation(s)
- Daniela Vázquez
- Department of Psychology, University of Maryland, College Park, MD 20742, USA
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD 20742, USA
- Sean R Maulhardt
- Department of Psychology, University of Maryland, College Park, MD 20742, USA
- Thomas A Stalnaker
- Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
- Alec Solway
- Department of Psychology, University of Maryland, College Park, MD 20742, USA
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD 20742, USA
- Caroline J Charpentier
- Department of Psychology, University of Maryland, College Park, MD 20742, USA
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD 20742, USA
- Matthew R Roesch
- Department of Psychology, University of Maryland, College Park, MD 20742, USA
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD 20742, USA
2
Solway A, Schneider I, Lei Y. The relationships between subclinical OCD symptoms, beta/gamma-band power, and the rate of evidence integration during perceptual decision making. Neuroimage Clin 2022; 34:102975. [PMID: 35255416 PMCID: PMC8904622 DOI: 10.1016/j.nicl.2022.102975] [Received: 08/17/2021] [Revised: 01/25/2022] [Accepted: 02/25/2022] [Indexed: 11/25/2022]
Abstract
Previous studies have demonstrated that the rate of evidence integration during perceptual decision making, a specific computationally defined parameter, is negatively correlated both with subclinical OCD symptoms measured on a continuum and with categorically diagnosed patient status. However, the neural mechanisms underlying this deficit are unknown. Separate work has shown that both gamma- and beta-band power are related to evidence integration, and differences in beta-band power in particular have been hypothesized to hinder flexible behavioral control. We sought to unify these two disparate literatures: one on OCD-related differences in information processing, constrained by behavioral data alone, and the other on the neural correlates of evidence integration. Using computational modeling and scalp EEG in a sample of 67 participants, we tested the relationships between subclinical symptom scores, drift rate, and gamma/beta-band activity during perceptual decision making. We replicated both prior work showing deficits in evidence integration as a function of OCD symptoms and work showing a relationship between evidence integration and gamma- and beta-band power. As predicted, the slope of beta-band power was correlated with OCD symptoms. However, the relationship between OCD symptoms and drift rate, and the relationships between the slopes of gamma- and beta-band power and drift rate, remained unchanged when all variables were accounted for simultaneously, speaking against the hypothesis that differences in beta-band power explain drift rate deficits.
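Drift rate, the quantity related here to OCD symptoms, governs how quickly noisy evidence is integrated toward a decision threshold. The Euler-Maruyama simulation below is a hedged illustration of that idea. It is not the study's estimation procedure, which the abstract does not describe. The step size `dt`, noise scale `s`, non-decision time `t0`, and trial count are illustrative assumptions.

```python
import math
import random


def simulate_ddm(v, a, t0=0.3, s=1.0, dt=0.001, n_trials=1000, seed=0):
    """Simulate an unbiased drift-diffusion process that starts at a/2.

    Returns (proportion of upper-boundary responses, mean response time), where
    response time = decision time + non-decision time t0 (in seconds).
    """
    rng = random.Random(seed)
    noise_sd = s * math.sqrt(dt)  # SD of the Gaussian noise added per time step
    n_upper = 0
    total_rt = 0.0
    for _ in range(n_trials):
        x, t = a / 2.0, 0.0
        while 0.0 < x < a:  # integrate evidence until a boundary is crossed
            x += v * dt + rng.gauss(0.0, noise_sd)
            t += dt
        if x >= a:
            n_upper += 1
        total_rt += t + t0
    return n_upper / n_trials, total_rt / n_trials


if __name__ == "__main__":
    for v in (0.5, 1.0, 2.0):
        acc, rt = simulate_ddm(v, a=1.0)
        analytic = 1.0 / (1.0 + math.exp(-v * 1.0))  # closed form for s = 1, z = a/2
        print(f"v={v}: accuracy={acc:.3f} (analytic {analytic:.3f}), mean RT={rt:.3f}s")
```

A lower drift rate produces both less accurate and slower decisions. This is why drift rate is interpreted as the efficiency of evidence integration, rather than as caution, which the boundary separation `a` captures.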
Affiliation(s)
- Alec Solway
- Department of Psychology, University of Maryland-College Park, United States
- Program in Neuroscience and Cognitive Science, University of Maryland-College Park, United States
- Isabella Schneider
- Department of Psychology, University of Maryland-College Park, United States
- Yuqing Lei
- Department of Psychology, University of Maryland-College Park, United States
3
Lei Y, Solway A. Conflict and competition between model-based and model-free control. PLoS Comput Biol 2022; 18:e1010047. [PMID: 35511764 PMCID: PMC9070915 DOI: 10.1371/journal.pcbi.1010047] [Received: 09/28/2021] [Accepted: 03/22/2022] [Indexed: 11/25/2022]
Abstract
A large literature has accumulated suggesting that human and animal decision making is driven by at least two systems, and that important functions of these systems can be captured by reinforcement learning algorithms. The "model-free" system caches and uses stimulus-value or stimulus-response associations, and the "model-based" system implements more flexible planning using a model of the world. However, it is not clear how the two systems interact during deliberation and how a single decision emerges from this process, especially when they disagree. Most previous work has assumed that while the systems operate in parallel, they do so independently and combine linearly to influence decisions. Using an integrated reinforcement learning/drift-diffusion model, we tested the hypothesis that the two systems interact in a non-linear fashion, similar to other situations involving cognitive conflict. We differentiated two forms of conflict: action conflict, a binary state representing whether the systems disagreed on the best action, and value conflict, a continuous measure of the extent to which the two systems disagreed on the difference in value between the available options. We found that decisions with greater value conflict were characterized by reduced model-based control and increased caution, both with and without action conflict. Action conflict itself (the binary state) acted in the opposite direction, although its effects were less prominent. We also found that between-system conflict was highly correlated with within-system conflict, and although it is less clear a priori why the latter might influence the strength of each system above its standard linear contribution, we could not rule out this possibility. Our work highlights the importance of non-linear conflict effects and provides new constraints for more detailed process models of decision making. It also presents new avenues to explore in relation to disorders of compulsivity, where an imbalance between systems has been implicated.
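For a two-option choice, the standard linear account and the two conflict measures can be sketched as follows. Each system is assumed to supply a value difference between the options. The linear account combines these differences with a weight `w` into a single drift signal. Value conflict is assumed to be the absolute gap between the two systems' value differences. These operationalizations, and the `SystemValues` structure, are illustrative assumptions and are not necessarily the paper's exact definitions.

```python
from dataclasses import dataclass


@dataclass
class SystemValues:
    """Values one controller assigns to options A and B (hypothetical structure)."""
    a: float
    b: float

    @property
    def diff(self) -> float:
        return self.a - self.b


def linear_drift(mb: SystemValues, mf: SystemValues, w: float) -> float:
    """Linear account: w weights the model-based difference, 1 - w the model-free one."""
    return w * mb.diff + (1.0 - w) * mf.diff


def action_conflict(mb: SystemValues, mf: SystemValues) -> bool:
    """Binary state: the systems prefer different options (ties count as no preference)."""
    return mb.diff * mf.diff < 0


def value_conflict(mb: SystemValues, mf: SystemValues) -> float:
    """Continuous measure (assumed form): gap between the systems' value differences."""
    return abs(mb.diff - mf.diff)


if __name__ == "__main__":
    mb = SystemValues(a=0.8, b=0.2)  # model-based system prefers A
    mf = SystemValues(a=0.3, b=0.5)  # model-free system prefers B
    print(linear_drift(mb, mf, w=0.5), action_conflict(mb, mf), value_conflict(mb, mf))
```

Value conflict can be large even when both systems favor the same option. This is what allows its effects to be separated from those of binary action conflict. The non-linear hypothesis, in which conflict also modulates `w` and the decision threshold, is not shown here.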
Affiliation(s)
- Yuqing Lei
- Department of Psychology, University of Maryland-College Park, College Park, Maryland, United States of America
- Alec Solway
- Department of Psychology, University of Maryland-College Park, College Park, Maryland, United States of America
- Program in Neuroscience and Cognitive Science, University of Maryland-College Park, College Park, Maryland, United States of America
4
Brown VM, Zhu L, Solway A, Wang JM, McCurry KL, King-Casas B, Chiu PH. Reinforcement Learning Disruptions in Individuals With Depression and Sensitivity to Symptom Change Following Cognitive Behavioral Therapy. JAMA Psychiatry 2021; 78:1113-1122. [PMID: 34319349 PMCID: PMC8319827 DOI: 10.1001/jamapsychiatry.2021.1844] [Indexed: 01/07/2023]
Abstract
IMPORTANCE: Major depressive disorder is prevalent and impairing. Parsing neurocomputational substrates of reinforcement learning in individuals with depression may facilitate a mechanistic understanding of the disorder and suggest new cognitive therapeutic targets.
OBJECTIVE: To determine associations among computational model-derived reinforcement learning parameters, depression symptoms, and symptom changes after treatment.
DESIGN, SETTING, AND PARTICIPANTS: In this mixed cross-sectional-cohort study, individuals performed reward and loss variants of a probabilistic learning task during functional magnetic resonance imaging at baseline and follow-up. A volunteer sample with and without a depression diagnosis was recruited from the community. Participants were assessed from July 2011 to February 2017, and data were analyzed from May 2017 to May 2021.
MAIN OUTCOMES AND MEASURES: Computational model-based analyses of participants' choices assessed a priori hypotheses about associations between components of reward-based and loss-based learning with depression symptoms. Changes in both learning parameters and symptoms were then assessed in a subset of participants who received cognitive behavioral therapy (CBT).
RESULTS: Of 101 included adults, 69 (68.3%) were female, and the mean (SD) age was 34.4 (11.2) years. A total of 69 participants with a depression diagnosis and 32 participants without a depression diagnosis were included at baseline; 48 participants (28 with depression who received CBT and 20 without depression) were included at follow-up (mean [SD] of 115.1 [15.6] days). Computational model-based analyses of behavioral choices and neural data identified associations of learning with symptoms during reward learning and loss learning, respectively. During reward learning only, anhedonia (and not negative affect or arousal) was associated with model-derived learning parameters (learning rate: posterior mean regression β = -0.14; 95% credible interval [CrI], -0.12 to -0.03; outcome sensitivity: posterior mean regression β = 0.18; 95% CrI, 0.02 to 0.37) and neural learning signals (moderation of association between striatal prediction error and expected value signals: t(97) = -2.10; P = .04). During loss learning only, negative affect (and not anhedonia or arousal) was associated with learning parameters (outcome shift: posterior mean regression β = -0.11; 95% CrI, -0.20 to -0.01) and disrupted neural encoding of learning signals (association with subgenual anterior cingulate prediction error signals: r = -0.28; P = .005). Symptom improvement following CBT was associated with normalization of learning parameters that were disrupted at baseline (reward learning rate: posterior mean regression β = 0.15; 90% CrI, 0.001 to 0.41; loss outcome shift: posterior mean regression β = 0.42; 90% CrI, 0.09 to 0.77).
CONCLUSIONS AND RELEVANCE: In this study, the mapping of reinforcement learning components to symptoms of major depression revealed mechanistic features associated with these symptoms and points to possible learning-based therapeutic processes and targets.
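Two of the model-derived parameters named above can be illustrated with a delta-rule (Rescorla-Wagner style) learner. The learning rate α scales how much each prediction error updates the expected value. Outcome sensitivity ρ scales the subjective impact of each outcome. This is a minimal sketch under those assumptions. The study's actual models, including its outcome-shift parameter, are not specified in the abstract and are not reproduced here.

```python
def delta_rule_learning(outcomes, alpha, rho, q0=0.0):
    """Track the expected value of one option across trials.

    outcomes: sequence of observed outcomes (e.g. 1 for reward, 0 for none)
    alpha:    learning rate in [0, 1]
    rho:      outcome sensitivity (scales the subjective value of each outcome)
    Returns (final expected value, list of prediction errors per trial).
    """
    q = q0
    errors = []
    for r in outcomes:
        delta = rho * r - q  # prediction error: subjective outcome minus expectation
        errors.append(delta)
        q += alpha * delta   # move the expectation a fraction alpha toward the outcome
    return q, errors


if __name__ == "__main__":
    rewards = [1, 1, 0, 1, 1]
    for alpha, rho in [(0.1, 1.0), (0.5, 1.0), (0.5, 2.0)]:
        q, errs = delta_rule_learning(rewards, alpha, rho)
        print(f"alpha={alpha}, rho={rho}: final value={q:.3f}")
```

The two parameters have distinct signatures. A lower learning rate slows how quickly values approach their asymptote, whereas a lower outcome sensitivity lowers the asymptote itself.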
Affiliation(s)
- Vanessa M. Brown
- Department of Psychology, Virginia Tech, Blacksburg
- Fralin Biomedical Research Institute at VTC, Virginia Tech, Roanoke
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Lusha Zhu
- Fralin Biomedical Research Institute at VTC, Virginia Tech, Roanoke
- School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, PKU-IDG/McGovern Institute for Brain Research, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Alec Solway
- Fralin Biomedical Research Institute at VTC, Virginia Tech, Roanoke
- John M. Wang
- Department of Psychology, Virginia Tech, Blacksburg
- Fralin Biomedical Research Institute at VTC, Virginia Tech, Roanoke
- Katherine L. McCurry
- Department of Psychology, Virginia Tech, Blacksburg
- Fralin Biomedical Research Institute at VTC, Virginia Tech, Roanoke
- Brooks King-Casas
- Department of Psychology, Virginia Tech, Blacksburg
- Fralin Biomedical Research Institute at VTC, Virginia Tech, Roanoke
- Virginia Tech-Wake Forest University School of Biomedical Engineering and Sciences, Blacksburg
- Pearl H. Chiu
- Department of Psychology, Virginia Tech, Blacksburg
- Fralin Biomedical Research Institute at VTC, Virginia Tech, Roanoke
5
Solway A, Lin Z, Kaplan CM. Revisiting verbal recognition memory in obsessive-compulsive disorder: A computational approach. J Psychiatr Res 2021; 138:428-435. [PMID: 33962130 DOI: 10.1016/j.jpsychires.2021.04.015] [Received: 12/17/2020] [Revised: 03/23/2021] [Accepted: 04/12/2021] [Indexed: 12/15/2022]
Abstract
Deficits in primary recognition memory and confidence have previously been tested as potential contributors to excessive checking behavior in obsessive-compulsive disorder. Studies have tested both recognition for actions and, hypothesizing that recognition may be disrupted more generally across content domains, verbal recognition memory. However, studies of verbal recognition memory have yielded mixed results. We revisited this work with the benefit of hindsight, running two new experiments with larger samples, the manipulation of recognition difficulty, and a computational model-based approach to data analysis. In both datasets, we found that discriminability, defined as the difference in drift rate for old versus new stimuli in the drift-diffusion model, was reduced as a function of subclinical OCD symptoms in the general population. Paralleling work on drift rate deficits in perceptual decision making in OCD, these reductions were larger for easier recognition decisions. We also asked participants about their confidence in each recognition decision and parcellated confidence into bias, or the difference in overall confidence, and sensitivity, which represents the ability to appropriately map confidence to objective accuracy. We found no consistent evidence of a relationship between OCD symptoms and either quantity.
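Discriminability as defined above can be illustrated with two drift rates. Studied ("old") items have a positive drift toward the "old" boundary, and new items have a negative drift. The sketch assumes an unbiased starting point and unit diffusion noise. Under those assumptions, the probability of an "old" response has the closed form 1/(1 + exp(-v·a)). The drift rates and boundary separation used below are hypothetical.

```python
import math


def p_respond_old(v: float, a: float) -> float:
    """P(reaching the "old" boundary) for an unbiased DDM with unit noise, start at a/2."""
    return 1.0 / (1.0 + math.exp(-v * a))


def recognition_summary(v_old: float, v_new: float, a: float) -> dict:
    """Discriminability (drift difference) plus the hit and false-alarm rates it implies."""
    return {
        "discriminability": v_old - v_new,
        "hit_rate": p_respond_old(v_old, a),          # "old" responses to old items
        "false_alarm_rate": p_respond_old(v_new, a),  # "old" responses to new items
    }


if __name__ == "__main__":
    a = 1.2
    # Hypothetical drift rates: easier recognition decisions have larger |v|.
    for label, v_old, v_new in [("hard", 0.6, -0.6), ("easy", 1.8, -1.8)]:
        summary = recognition_summary(v_old, v_new, a)
        print(label, {k: round(x, 3) for k, x in summary.items()})
```

Defining discriminability on drift rates, rather than on raw accuracy, separates the quality of memory evidence from response caution (boundary separation `a`).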
Affiliation(s)
- Alec Solway
- Department of Psychology, University of Maryland-College Park, United States
- Program in Neuroscience and Cognitive Science, University of Maryland-College Park, United States
- Zhen Lin
- Department of Psychology, University of Maryland-College Park, United States
- Claire M Kaplan
- Department of Psychology, University of Maryland-College Park, United States
6
Solway A, Lohrenz T, Montague PR. Loss Aversion Correlates With the Propensity to Deploy Model-Based Control. Front Neurosci 2019; 13:915. [PMID: 31555082 PMCID: PMC6743018 DOI: 10.3389/fnins.2019.00915] [Received: 04/23/2019] [Accepted: 08/16/2019] [Indexed: 11/13/2022]
Abstract
Reward-based decision making is thought to be driven by at least two different types of decision systems: a simple stimulus–response cache-based system which embodies the common-sense notion of “habit,” for which model-free reinforcement learning serves as a computational substrate, and a more deliberate, prospective, model-based planning system. Previous work has shown that loss aversion, a well-studied measure of how much more on average individuals weigh losses relative to gains during decision making, is reduced when participants take all possible decisions and outcomes into account including future ones, relative to when they myopically focus on the current decision. Model-based control offers a putative mechanism for implementing such foresight. Using a well-powered data set (N = 117) in which participants completed two different tasks designed to measure each of the two quantities of interest, and four models of choice data for these tasks, we found consistent evidence of a relationship between loss aversion and model-based control but in the direction opposite to that expected based on previous work: loss aversion had a positive relationship with model-based control. We did not find evidence for a relationship between either decision system and risk aversion, a related aspect of subjective utility.
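Loss aversion and risk aversion are commonly estimated from accept/reject choices on mixed gambles using a prospect-theory-style utility function. In that function, λ scales losses relative to gains, ρ captures value-function curvature (risk attitude), and μ sets choice consistency. The sketch below adopts that common parametrization as an assumption. The abstract does not specify which utility form or gamble set the study used.

```python
import math


def utility(x: float, lam: float, rho: float) -> float:
    """Prospect-theory-style value: outcomes curved by rho, losses also scaled by lam."""
    return x**rho if x >= 0 else -lam * (-x) ** rho


def p_accept(gain: float, loss: float, sure: float, lam: float, rho: float, mu: float) -> float:
    """Probability of taking a 50/50 gamble (win `gain`, lose `loss`) over a sure amount.

    `loss` is a positive magnitude; choice follows a logistic rule with consistency mu.
    """
    u_gamble = 0.5 * utility(gain, lam, rho) + 0.5 * utility(-loss, lam, rho)
    u_sure = utility(sure, lam, rho)
    return 1.0 / (1.0 + math.exp(-mu * (u_gamble - u_sure)))


if __name__ == "__main__":
    # A symmetric +10/-10 gamble versus nothing, for increasing loss aversion.
    for lam in (1.0, 1.5, 2.0):
        print(lam, round(p_accept(10, 10, 0, lam=lam, rho=1.0, mu=1.0), 3))
```

Because λ enters only through losses and ρ affects gains and losses alike, mixed gambles and gain-only gambles help separate the two quantities. This is one reason the abstract treats loss aversion and risk aversion as distinct aspects of subjective utility.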
Affiliation(s)
- Alec Solway
- Virginia Tech Carilion Research Institute, Roanoke, VA, United States
- Terry Lohrenz
- Virginia Tech Carilion Research Institute, Roanoke, VA, United States
- P Read Montague
- Virginia Tech Carilion Research Institute, Roanoke, VA, United States
- Department of Physics, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
7
Solway A, Gu X, Montague PR. Forgetting to Be Addicted: Reconsolidation and the Disconnection of Things Past. Biol Psychiatry 2017; 82:774-775. [PMID: 29110816 DOI: 10.1016/j.biopsych.2017.09.017] [Received: 09/16/2017] [Accepted: 09/20/2017] [Indexed: 11/15/2022]
Affiliation(s)
- Alec Solway
- Virginia Tech Carilion Research Institute, Virginia Tech, Blacksburg, Virginia
- Xiaosi Gu
- Center for Brain Health, University of Texas at Dallas, Richardson, Texas
- P Read Montague
- Virginia Tech Carilion Research Institute, Virginia Tech, Blacksburg, Virginia
- Department of Physics, Virginia Tech, Blacksburg, Virginia
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
8
Abstract
The laboratory study of how humans and other animals trade off value and time has a long and storied history and is the subject of a vast literature. However, there is still no agreed-upon mechanistic explanation of how intertemporal choice preferences arise. Several theorists have recently proposed model-based reinforcement learning as a candidate framework. This framework describes a suite of algorithms by which a model of the environment, in the form of a state transition function and a reward function, can be converted online into a decision. The state transition function allows the model-based system to make decisions based on projected future states, while the reward function assigns value to each state; together, they capture the necessary components for successful intertemporal choice. Empirical work has also pointed to a possible relationship between increased prospection and reduced discounting. In the current paper, we look for direct evidence of a relationship between temporal discounting and model-based control in a large new data set (n = 168). However, testing the relationship under several different modeling formulations revealed no indication that the two quantities are related.
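The model-based framework described above converts a transition function and a reward function into a decision. This can be illustrated with value iteration on a toy choice between a smaller-sooner and a larger-later reward. The states, rewards, and per-step discount factor `gamma` are hypothetical. This sketch is not one of the modeling formulations tested in the paper.

```python
def value_iteration(T, R, gamma, n_iter=100):
    """Compute state values from a model of the environment.

    T[s][a] -> {next_state: probability}; R[s][a] -> immediate reward.
    States with no actions are terminal and keep value 0.
    """
    V = {s: 0.0 for s in T}
    for _ in range(n_iter):
        V = {
            s: max(
                (R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items()) for a in T[s]),
                default=0.0,
            )
            for s in T
        }
    return V


def q_values(T, R, V, gamma, s):
    """Action values at state s, given state values V."""
    return {a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items()) for a in T[s]}


# Hypothetical intertemporal choice: take 1 unit now, or wait two steps for 3 units.
T = {
    "start": {"now": {"end": 1.0}, "wait": {"wait1": 1.0}},
    "wait1": {"continue": {"wait2": 1.0}},
    "wait2": {"collect": {"end": 1.0}},
    "end": {},
}
R = {
    "start": {"now": 1.0, "wait": 0.0},
    "wait1": {"continue": 0.0},
    "wait2": {"collect": 3.0},
    "end": {},
}

if __name__ == "__main__":
    for gamma in (0.5, 0.9):
        V = value_iteration(T, R, gamma)
        q = q_values(T, R, V, gamma, "start")
        print(gamma, q, "->", max(q, key=q.get))
```

Waiting is worth 3·γ², so the planner prefers the larger-later option only when γ > 1/√3. A constant per-step γ implies exponential discounting. Behavioral discounting is often better described by hyperbolic forms, so linking a planner's horizon to fitted discount rates is itself a modeling choice.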
Affiliation(s)
- Alec Solway
- Virginia Tech Carilion Research Institute, Roanoke, VA, USA
- Terry Lohrenz
- Virginia Tech Carilion Research Institute, Roanoke, VA, USA
- P Read Montague
- Virginia Tech Carilion Research Institute, Roanoke, VA, USA
- Department of Physics, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Wellcome Trust Centre for Neuroimaging, University College London, London, UK
9
10
Abstract
Human behavior has long been recognized to display hierarchical structure: actions fit together into subtasks, which cohere into extended goal-directed activities. Arranging actions hierarchically has well-established benefits, allowing behaviors to be represented efficiently by the brain and allowing solutions to new tasks to be discovered easily. However, these payoffs depend on the particular way in which actions are organized into a hierarchy, that is, the specific way in which tasks are carved up into subtasks. We provide a mathematical account of what makes some hierarchies better than others, an account that allows an optimal hierarchy to be identified for any set of tasks. We then present results from four behavioral experiments suggesting that human learners spontaneously discover optimal action hierarchies.

To accomplish everyday tasks, we often divide them into subtasks: to make spaghetti, we (1) get out a pot, (2) fill it with water, (3) bring the water to a boil, and so forth. But how do we learn to subdivide our goals in this way? Work from computer science suggests that the way a task is subdivided, or decomposed, can have a dramatic impact on how easy the task is to accomplish: certain decompositions speed learning and planning compared with others. Moreover, some decompositions allow behaviors to be represented more simply. Despite this general insight, little work has been done to formalize these ideas. We outline a mathematical framework to address this question, based on methods for comparing statistical models. We then present four behavioral experiments showing that human learners spontaneously discover optimal task decompositions.
Affiliation(s)
- Alec Solway
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- Carlos Diuk
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- Natalia Córdova
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- Debbie Yee
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- Andrew G. Barto
- School of Computer Science, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
- Yael Niv
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- Department of Psychology, Princeton University, Princeton, New Jersey, United States of America
- Matthew M. Botvinick
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States of America
- Department of Psychology, Princeton University, Princeton, New Jersey, United States of America
11
Miller JF, Neufang M, Solway A, Brandt A, Trippel M, Mader I, Hefft S, Merkow M, Polyn SM, Jacobs J, Kahana MJ, Schulze-Bonhage A. Neural activity in human hippocampal formation reveals the spatial context of retrieved memories. Science 2013; 342:1111-4. [PMID: 24288336 DOI: 10.1126/science.1244056] [Indexed: 11/02/2022]
Abstract
In many species, spatial navigation is supported by a network of place cells that exhibit increased firing whenever an animal is in a certain region of an environment. Does this neural representation of location form part of the spatiotemporal context into which episodic memories are encoded? We recorded medial temporal lobe neuronal activity as epilepsy patients performed a hybrid spatial and episodic memory task. We identified place-responsive cells active during virtual navigation and then asked whether the same cells activated during the subsequent recall of navigation-related memories without actual navigation. Place-responsive cell activity was reinstated during episodic memory retrieval. Neuronal firing during the retrieval of each memory was similar to the activity that represented the locations in the environment where the memory was initially encoded.
12
Westerman S, Sutherland E, Gardner P, Baig N, Critchley C, Hickey C, Mehigan S, Solway A, Zervos Z. The design of consumer packaging: Effects of manipulations of shape, orientation, and alignment of graphical forms on consumers’ assessments. Food Qual Prefer 2013. [DOI: 10.1016/j.foodqual.2012.05.007] [Indexed: 11/25/2022]
13
Abstract
Recent work has given rise to the view that reward-based decision making is governed by two key controllers: a habit system, which stores stimulus-response associations shaped by past reward, and a goal-oriented system that selects actions based on their anticipated outcomes. The current literature provides a rich body of computational theory addressing habit formation, centering on temporal-difference learning mechanisms. Less progress has been made toward formalizing the processes involved in goal-directed decision making. We draw on recent work in cognitive neuroscience, animal conditioning, cognitive and developmental psychology, and machine learning to outline a new theory of goal-directed decision making. Our basic proposal is that the brain, within an identifiable network of cortical and subcortical structures, implements a probabilistic generative model of reward, and that goal-directed decision making is effected through Bayesian inversion of this model. We present a set of simulations implementing the account, which address benchmark behavioral and neuroscientific findings, and give rise to a set of testable predictions. We also discuss the relationship between the proposed framework and other models of decision making, including recent models of perceptual choice, to which our theory bears a direct connection.
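The core idea of goal-directed choice as Bayesian inversion can be shown in its simplest one-step form. A generative model specifies a prior over actions and the probability of reward under each action. Conditioning on reward having occurred yields a posterior over actions, which can then guide selection. This toy sketch assumes a discrete action set and a binary reward variable. The account in the paper embeds this inference in a richer model over states and outcomes, which is not reproduced here.

```python
def posterior_over_actions(prior, p_reward_given_action):
    """P(action | reward = 1) by Bayes' rule.

    prior:                 {action: P(action)}
    p_reward_given_action: {action: P(reward = 1 | action)}
    """
    joint = {a: prior[a] * p_reward_given_action[a] for a in prior}
    evidence = sum(joint.values())  # P(reward = 1), marginalizing over actions
    if evidence == 0.0:
        raise ValueError("reward is impossible under this model")
    return {a: j / evidence for a, j in joint.items()}


if __name__ == "__main__":
    prior = {"left": 0.5, "right": 0.5}
    p_reward = {"left": 0.2, "right": 0.6}
    print(posterior_over_actions(prior, p_reward))  # favors "right"
```

Under a uniform prior, the posterior simply tracks which action makes reward more likely. A strong prior, for example one shaped by habit, can outweigh that evidence, which is one way such a scheme could relate goal-directed and habitual control.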
Affiliation(s)
- Alec Solway
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08540, USA
14
Abstract
Human behavior displays hierarchical structure: simple actions cohere into subtask sequences, which work together to accomplish overall task goals. Although the neural substrates of such hierarchy have been the target of increasing research, they remain poorly understood. We propose that the computations supporting hierarchical behavior may relate to those in hierarchical reinforcement learning (HRL), a machine-learning framework that extends reinforcement-learning mechanisms into hierarchical domains. To test this, we leveraged a distinctive prediction arising from HRL. In ordinary reinforcement learning, reward prediction errors are computed when there is an unanticipated change in the prospects for accomplishing overall task goals. HRL entails that prediction errors should also occur in relation to task subgoals. In three neuroimaging studies we observed neural responses consistent with such subgoal-related reward prediction errors, within structures previously implicated in reinforcement learning. The results reported support the relevance of HRL to the neural processes underlying hierarchical behavior.
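The distinctive HRL prediction can be illustrated with a hypothetical delivery-style scenario. An agent must first reach a subgoal (picking up a package) and then the final goal. For illustration only, values are assumed to equal the negative number of grid steps remaining, and an unexpected event is scored as the change in value it causes. Under these assumptions, an event that moves the subgoal without changing the total path length produces no top-level prediction error but a nonzero subgoal-level one. The grid, coordinates, and distance-based value function are simplifications introduced here.

```python
def manhattan(p, q):
    """Grid distance between two (x, y) positions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def prediction_errors(agent, subgoal_before, subgoal_after, goal):
    """Prediction errors caused by an instantaneous jump of the subgoal.

    Top level:     value = -(steps to subgoal + steps from subgoal to goal).
    Subtask level: value = -(steps to subgoal), i.e. progress toward the subgoal.
    No reward is delivered at the jump, so each error is V_after - V_before.
    """
    def top_value(sg):
        return -(manhattan(agent, sg) + manhattan(sg, goal))

    def sub_value(sg):
        return -manhattan(agent, sg)

    top_pe = top_value(subgoal_after) - top_value(subgoal_before)
    sub_pe = sub_value(subgoal_after) - sub_value(subgoal_before)
    return top_pe, sub_pe


if __name__ == "__main__":
    # The package jumps farther from the agent but equally closer to the goal.
    print(prediction_errors(agent=(0, 0), subgoal_before=(4, 0), subgoal_after=(7, 0), goal=(10, 0)))
```

Standard reinforcement learning would register no surprise in this scenario, because the prospects for reaching the overall goal are unchanged. HRL, by contrast, predicts a negative subgoal-level error. Dissociations of this kind are what the neuroimaging studies were designed to detect.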