1. The utility of a latent-cause framework for understanding addiction phenomena. Addiction Neuroscience 2024; 10:100143. PMID: 38524664; PMCID: PMC10959497; DOI: 10.1016/j.addicn.2024.100143.
Abstract
Computational models of addiction often rely on a model-free reinforcement learning (RL) formulation, owing to the close associations between model-free RL, habitual behavior and the dopaminergic system. However, such formulations typically do not capture key recurrent features of addiction phenomena such as craving and relapse. Moreover, they cannot account for goal-directed aspects of addiction that necessitate contrasting, model-based formulations. Here we synthesize a growing body of evidence and propose that a latent-cause framework can help unify our understanding of several recurrent phenomena in addiction, by viewing them as the inferred return of previous, persistent "latent causes". We demonstrate that applying this framework to Pavlovian and instrumental settings can help account for defining features of craving and relapse such as outcome-specificity, generalization, and cyclical dynamics. Finally, we argue that this framework can bridge model-free and model-based formulations, and account for individual variability in phenomenology by accommodating the memories, beliefs, and goals of those living with addiction, motivating a centering of the individual, subjective experience of addiction and recovery.
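As a toy illustration of the core inferential move in such a framework (a sketch with assumed parameters, not the model from this article): under a Chinese-restaurant-process prior, a new observation is attributed either to a previously inferred, persistent latent cause or to a brand-new one, so a test cue can be explained by the "return" of an old cause.

```python
import numpy as np

# Hypothetical illustration (not the article's actual model): under a
# Chinese-restaurant-process (CRP) prior, a new observation is attributed
# either to a previously inferred latent cause (in proportion to how often
# that cause was inferred before) or to a brand-new cause (in proportion
# to a concentration parameter alpha).
def crp_assignment_probs(counts, alpha):
    """Prior probability that the next observation belongs to each existing
    latent cause, with the last entry reserved for a new cause."""
    weights = np.append(np.asarray(counts, dtype=float), alpha)
    return weights / weights.sum()

# Example: 10 past trials attributed to an "acquisition" cause (cause 0)
# and 5 extinction trials to a newer "extinction" cause (cause 1).
probs = crp_assignment_probs([10, 5], alpha=1.0)
```

Because persistent old causes retain high prior weight, the acquisition cause remains the most probable explanation of a new observation, reinstating the expectations (e.g. craving) learned under it.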

2. Affect-congruent attention modulates generalized reward expectations. PLoS Comput Biol 2023; 19:e1011707. PMID: 38127874; PMCID: PMC10781156; DOI: 10.1371/journal.pcbi.1011707.
Abstract
Positive and negative affective states are respectively associated with optimistic and pessimistic expectations regarding future reward. One mechanism that might underlie these affect-related expectation biases is attention to positive- versus negative-valence features (e.g., attending to the positive reviews of a restaurant versus its expensive price). Here we tested the effects of experimentally induced positive and negative affect on feature-based attention in 120 participants completing a compound-generalization task with eye-tracking. We found that participants' reward expectations for novel compound stimuli were modulated in an affect-congruent way: positive affect induction increased reward expectations for compounds, whereas negative affect induction decreased reward expectations. Computational modelling and eye-tracking analyses each revealed that these effects were driven by affect-congruent changes in participants' allocation of attention to high- versus low-value features of compounds. These results provide mechanistic insight into a process by which affect produces biases in generalized reward expectations.
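The attentional mechanism described here can be sketched minimally (the feature values and the single attention parameter are illustrative assumptions, not the fitted model): the reward expectation for a novel compound is an attention-weighted mix of its features' learned values, and affect shifts the weights.

```python
# Illustrative sketch (values and the attention parameter are assumptions,
# not the paper's fitted model): reward expectation for a novel compound
# as an attention-weighted mix of its features' learned values.
def compound_expectation(v_high, v_low, attention_high):
    """Expected reward = attention to the high-value feature times its value,
    plus the remaining attention times the low-value feature's value."""
    return attention_high * v_high + (1.0 - attention_high) * v_low

v_high, v_low = 1.0, 0.0            # learned feature values
neutral  = compound_expectation(v_high, v_low, 0.5)   # balanced attention
positive = compound_expectation(v_high, v_low, 0.7)   # positive-affect shift
negative = compound_expectation(v_high, v_low, 0.3)   # negative-affect shift
```

Shifting attention toward the high-value feature raises the compound's expected reward above the balanced baseline; shifting it toward the low-value feature lowers it, reproducing the affect-congruent bias.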

3. Inattentive responding can induce spurious associations between task behaviour and symptom measures. Nat Hum Behav 2023; 7:1667-1681. PMID: 37414886; DOI: 10.1038/s41562-023-01640-7.
Abstract
Although online samples have many advantages for psychiatric research, some potential pitfalls of this approach are not widely understood. Here we detail circumstances in which spurious correlations may arise between task behaviour and symptom scores. The problem arises because many psychiatric symptom surveys have asymmetric score distributions in the general population, meaning that careless responders on these surveys will show apparently elevated symptom levels. If these participants are similarly careless in their task performance, this may result in a spurious association between symptom scores and task behaviour. We demonstrate this pattern of results in two samples of participants recruited online (total N = 779) who performed one of two common cognitive tasks. False-positive rates for these spurious correlations increase with sample size, contrary to common assumptions. Excluding participants flagged for careless responding on surveys abolished the spurious correlations, but exclusion based on task performance alone was less effective.
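A minimal simulation of the mechanism described above (all rates and distributions are assumed for illustration, not taken from the paper) shows how careless responding alone can manufacture a symptom-behaviour correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 800
careless = rng.random(n) < 0.1  # ~10% careless responders (assumed rate)

# Symptom scores: attentive responders drawn from an asymmetric (mostly-low)
# distribution; careless responders answer at random, landing near mid-scale,
# i.e. apparently *elevated* relative to the skewed population.
symptoms = np.where(careless,
                    rng.uniform(0, 1, n),     # random responding
                    rng.beta(1.5, 8.0, n))    # right-skewed, mostly low

# Task accuracy: attentive participants perform well; careless ones at chance.
accuracy = np.where(careless,
                    rng.normal(0.5, 0.05, n),
                    rng.normal(0.85, 0.05, n))

# Neither group has any true symptom-accuracy relationship, yet pooling them
# produces a negative correlation driven entirely by the careless subgroup.
r_all = np.corrcoef(symptoms, accuracy)[0, 1]
r_clean = np.corrcoef(symptoms[~careless], accuracy[~careless])[0, 1]
```

Excluding the flagged careless responders (the `r_clean` line) removes the association, mirroring the paper's conclusion that survey-based exclusion abolishes the spurious correlation.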

4. Multiple routes to enhanced memory for emotionally relevant events. Trends Cogn Sci 2023; 27:867-882. PMID: 37479601; DOI: 10.1016/j.tics.2023.06.006.
Abstract
Events associated with aversive or rewarding outcomes are prioritized in memory. This memory boost is commonly attributed to the elicited affective response, closely linked to noradrenergic and dopaminergic modulation of hippocampal plasticity. Herein we review and compare this 'affect' mechanism to an additional, recently discovered, 'prediction' mechanism whereby memories are strengthened by the extent to which outcomes deviate from expectations, that is, by prediction errors (PEs). The mnemonic impact of PEs is separate from the affective outcome itself and has a distinct neural signature. While both routes enhance memory, these mechanisms are linked to different - and sometimes opposing - predictions for memory integration. We discuss new findings that highlight mechanisms by which emotional events strengthen, integrate, and segment memory.

5. Improving the Reliability of Cognitive Task Measures: A Narrative Review. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 2023; 8:789-797. PMID: 36842498; PMCID: PMC10440239; DOI: 10.1016/j.bpsc.2023.02.004.
Abstract
Cognitive tasks are capable of providing researchers with crucial insights into the relationship between cognitive processing and psychiatric phenomena. However, many recent studies have found that task measures exhibit poor reliability, which hampers their usefulness for individual differences research. Here, we provide a narrative review of approaches to improve the reliability of cognitive task measures. Specifically, we introduce a taxonomy of experiment design and analysis strategies for improving task reliability. Where appropriate, we highlight studies that are exemplary for improving the reliability of specific task measures. We hope that this article can serve as a helpful guide for experimenters who wish to design a new task, or improve an existing one, to achieve sufficient reliability for use in individual differences research.

6. A practical guide for studying human behavior in the lab. Behav Res Methods 2023; 55:58-76. PMID: 35262897; DOI: 10.3758/s13428-022-01793-9.
Abstract
In the last few decades, the field of neuroscience has witnessed major technological advances that have allowed researchers to measure and control neural activity in great detail. Yet, behavioral experiments in humans remain an essential approach to investigating the mysteries of the mind. Their relatively modest technological and economic requisites make behavioral research an attractive and accessible experimental avenue for neuroscientists with very diverse backgrounds. However, like any experimental enterprise, it has its own inherent challenges that may pose practical hurdles, especially to less experienced behavioral researchers. Here, we aim to provide a practical guide for a steady walk through the workflow of a typical behavioral experiment with human subjects. This primer concerns the design of an experimental protocol, research ethics, and subject care, as well as best practices for data collection, analysis, and sharing. The goal is to provide clear instructions for both beginners and experienced researchers from diverse backgrounds in planning behavioral experiments.

7. The challenges of lifelong learning in biological and artificial systems. Trends Cogn Sci 2022; 26:1051-1053. PMID: 36335012; PMCID: PMC9676180; DOI: 10.1016/j.tics.2022.09.022.
Abstract
How do biological systems learn continuously throughout their lifespans, adapting to change while retaining old knowledge, and how can these principles be applied to artificial learning systems? In this Forum article we outline challenges and strategies of 'lifelong learning' in biological and artificial systems, and argue that a collaborative study of each system's failure modes can benefit both.

8. Humans combine value learning and hypothesis testing strategically in multi-dimensional probabilistic reward learning. PLoS Comput Biol 2022; 18:e1010699. PMID: 36417419; PMCID: PMC9683628; DOI: 10.1371/journal.pcbi.1010699.
Abstract
Realistic and complex decision tasks often allow for many possible solutions. How do we find the correct one? Introspection suggests a process of trying out solutions one after the other until success. However, such methodical serial testing may be too slow, especially in environments with noisy feedback. Alternatively, the underlying learning process may involve implicit reinforcement learning that learns about many possibilities in parallel. Here we designed a multi-dimensional probabilistic active-learning task tailored to study how people learn to solve such complex problems. Participants configured three-dimensional stimuli by selecting features for each dimension and received probabilistic reward feedback. We manipulated task complexity by changing how many feature dimensions were relevant to maximizing reward, as well as whether this information was provided to the participants. To investigate how participants learn the task, we examined models of serial hypothesis testing, feature-based reinforcement learning, and combinations of the two strategies. Model comparison revealed evidence for hypothesis testing that relies on reinforcement learning when selecting which hypothesis to test. The extent to which participants engaged in hypothesis testing depended on the instructed task complexity: people tended to serially test hypotheses when instructed that there were fewer relevant dimensions, and relied more on gradual and parallel learning of feature values when the task was more complex. This demonstrates a strategic use of task information to balance the costs and benefits of the two methods of learning.

9. The effects of induced positive and negative affect on Pavlovian-instrumental interactions. Cogn Emot 2022; 36:1343-1360. PMID: 35929878; PMCID: PMC9852069; DOI: 10.1080/02699931.2022.2109600.
Abstract
Across species, animals have an intrinsic drive to approach appetitive stimuli and to withdraw from aversive stimuli. In affective science, influential theories of emotion link positive affect with strengthened behavioural approach and negative affect with avoidance. Based on these theories, we predicted that individuals' positive and negative affect levels should particularly influence their behaviour when innate Pavlovian approach/avoidance tendencies conflict with learned instrumental behaviours. Here, across two experiments - exploratory Experiment 1 (N = 91) and a preregistered confirmatory Experiment 2 (N = 335) - we assessed how induced positive and negative affect influenced Pavlovian-instrumental interactions in a reward/punishment Go/No-Go task. Contrary to our hypotheses, we found no evidence for a main effect of positive/negative affect on either approach/avoidance behaviour or Pavlovian-instrumental interactions. However, we did find evidence that the effects of induced affect on behaviour were moderated by individual differences in self-reported behavioural inhibition and gender. Exploratory computational modelling analyses explained these demographic moderating effects as arising from positive correlations between demographic factors and individual differences in the strength of Pavlovian-instrumental interactions. These findings serve to sharpen our understanding of the effects of positive and negative affect on instrumental behaviour.

10.
Abstract
How does rumination affect reinforcement learning-the ubiquitous process by which we adjust behavior after error in order to behave more effectively in the future? In a within-subject design (n=49), we tested whether experimentally manipulated rumination disrupts reinforcement learning in a multidimensional learning task previously shown to rely on selective attention. Rumination impaired performance, yet unexpectedly this impairment could not be attributed to decreased attentional breadth (quantified using a "decay" parameter in a computational model). Instead, trait rumination (between subjects) was associated with higher decay rates (implying narrower attention), yet not with impaired performance. Our task-performance results accord with the possibility that state rumination promotes stress-generating behavior in part by disrupting reinforcement learning. The trait-rumination finding accords with the predictions of a prominent model of trait rumination (the attentional-scope model). More work is needed to understand the specific mechanisms by which state rumination disrupts reinforcement learning.

11. Minimal cross-trial generalization in learning the representation of an odor-guided choice task. PLoS Comput Biol 2022; 18:e1009897. PMID: 35333867; PMCID: PMC8986096; DOI: 10.1371/journal.pcbi.1009897.
Abstract
There is no single way to represent a task. Indeed, despite experiencing the same task events and contingencies, different subjects may form distinct task representations. As experimenters, we often assume that subjects represent the task as we envision it. However, such a representation cannot be taken for granted, especially in animal experiments where we cannot deliver explicit instruction regarding the structure of the task. Here, we tested how rats represent an odor-guided choice task in which two odor cues indicated which of two responses would lead to reward, whereas a third odor indicated free choice among the two responses. A parsimonious task representation would allow animals to learn from the forced trials what is the better option to choose in the free-choice trials. However, animals may not necessarily generalize across odors in this way. We fit reinforcement-learning models that use different task representations to trial-by-trial choice behavior of individual rats performing this task, and quantified the degree to which each animal used the more parsimonious representation, generalizing across trial types. Model comparison revealed that most rats did not acquire this representation despite extensive experience. Our results demonstrate the importance of formally testing possible task representations that can afford the observed behavior, rather than assuming that animals’ task representations abide by the generative task structure that governs the experimental design.

To study how animals learn and make decisions, scientists design experiments, train animals to perform them, and observe how they behave. During this process, an important but rarely asked question is how animals understand the experiment. Merely through observing animals’ behavior in a task, it is often hard to determine if they understand the task in the same way as the experimenter expects. Assuming that animals represent tasks differently than they actually do may lead to incorrect interpretations of behavioral or neural results. Here, we compared different possible representations for a simple reward-learning task in terms of how well these alternative models explain animals’ choice behavior. We found that rats did not represent the task in the most parsimonious way, thereby failing to learn from forced-choice trials what rewards are available on free-choice trials, despite extensive training on the task. These results caution against simply assuming that animals’ understanding of a task corresponds to the way the task was designed.

12. Corrigendum: Gradual extinction prevents the return of fear: implications for the discovery of state. Front Behav Neurosci 2021; 15:786900. PMID: 34912199; PMCID: PMC8667957; DOI: 10.3389/fnbeh.2021.786900.

13.
Abstract
Mood is an integrative and diffuse affective state that is thought to exert a pervasive effect on cognition and behavior. At the same time, mood itself is thought to fluctuate slowly as a product of feedback from interactions with the environment. Here we present a new computational theory of the valence of mood-the Integrated Advantage model-that seeks to account for this bidirectional interaction. Adopting theoretical formalisms from reinforcement learning, we propose to conceptualize the valence of mood as a leaky integral of an agent's appraisals of the Advantage of its actions. This model generalizes and extends previous models of mood wherein affective valence was conceptualized as a moving average of reward prediction errors. We give a full theoretical derivation of the Integrated Advantage model and provide a functional explanation of how an integrated-Advantage variable could be deployed adaptively by a biological agent to accelerate learning in complex and/or stochastic environments. Specifically, drawing on stochastic optimization theory, we propose that an agent can utilize our hypothesized form of mood to approximate a momentum-based update to its behavioral policy, thereby facilitating rapid learning of optimal actions. We then show how this model of mood provides a principled and parsimonious explanation for a number of contextual effects on mood from the affective science literature, including expectation- and surprise-related effects, counterfactual effects from information about foregone alternatives, action-typicality effects, and action/inaction asymmetry. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
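The leaky-integral idea at the heart of the Integrated Advantage model can be sketched in a few lines (the decay rate and the Advantage inputs are illustrative assumptions, not the paper's derivation or fitted parameters):

```python
# Sketch of the leaky-integrator idea (decay rate and Advantage inputs are
# illustrative, not the paper's fitted parameters): mood decays toward a
# neutral zero while accumulating appraisals of the Advantage of actions
# (how much better an action was than expected under the current policy).
def update_mood(mood, advantage, decay=0.9):
    return decay * mood + (1.0 - decay) * advantage

mood, moods = 0.0, []
# A run of better-than-expected outcomes, then a run of worse-than-expected
# outcomes: mood is lifted by the first and dragged down by the second.
for advantage in [1.0] * 20 + [-1.0] * 20:
    mood = update_mood(mood, advantage)
    moods.append(mood)
```

The lag introduced by the leaky integration is what gives mood its slow, history-dependent dynamics: it rises during the positive run, then takes many trials to cross back below neutral once outcomes turn negative.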

14. Orbitofrontal cortex and learning predictions of state transitions. Behav Neurosci 2021; 135:487-497. PMID: 34291969; DOI: 10.1037/bne0000461.
Abstract
The orbitofrontal cortex (OFC) has been implicated in goal-directed planning and model-based decision-making. One key prerequisite for model-based decision-making is learning the transition structure of the environment-the probabilities of transitioning from one environmental state to another. In this work, we investigated how the OFC might be involved in learning this transition structure, by using fMRI to assess OFC activity while humans experienced probabilistic cue-outcome transitions. We found that OFC activity was indeed correlated with behavioral measures of learning about transition structure. On a trial-by-trial basis, OFC activity was associated with subsequently increased expectation of the more probable outcome; that is, with subsequently more optimal cue-outcome predictions. Interestingly, this relationship was observed no matter what outcome occurred at the time of the OFC activity, and thus is inconsistent with an interpretation of the OFC activity as representing a "state prediction error" that would facilitate learning transitions via error-correcting mechanisms. Finally, OFC activity was related to more optimal predictions only for subsequent trials involving the same cue that was observed at the time of OFC activity-this relationship was not observed for subsequent trials involving a different cue. All together, these results indicate that the OFC is involved in updating or reinforcing a learned transition model on a trial-by-trial basis, specifically for the currently observed cue-outcome associations. (PsycInfo Database Record (c) 2021 APA, all rights reserved).

15.
Abstract
Understanding the brain requires us to answer both what the brain does, and how it does it. Using a series of examples, I make the case that behavior is often more useful than neuroscientific measurements for answering the first question. Moreover, I show that even for "how" questions that pertain to neural mechanism, a well-crafted behavioral paradigm can offer deeper insight and stronger constraints on computational and mechanistic models than do many highly challenging (and very expensive) neural studies. I conclude that purely behavioral research is essential for understanding the brain-especially its cognitive functions-contrary to the opinion of prominent funding bodies and some scientific journals, who erroneously place neural data on a pedestal and consider behavior to be subsidiary. (PsycInfo Database Record (c) 2021 APA, all rights reserved).

16. The case against economic values in the orbitofrontal cortex (or anywhere else in the brain). Behav Neurosci 2021; 135:192-201. PMID: 34060875; DOI: 10.1037/bne0000448.
Abstract
Much of traditional neuroeconomics proceeds from the hypothesis that value is reified in the brain, that is, that there are neurons or brain regions whose responses serve the discrete purpose of encoding value. This hypothesis is supported by the finding that the activity of many neurons covaries with subjective value as estimated in specific tasks, and has led to the idea that the primary function of the orbitofrontal cortex is to compute and signal economic value. Here we consider an alternative: That economic value, in the cardinal, common-currency sense, is not represented in the brain and used for choice by default. This idea is motivated by consideration of the economic concept of value, which places important epistemic constraints on our ability to identify its neural basis. It is also motivated by the behavioral economics literature, especially work on heuristics, which proposes value-free process models for much if not all of choice. Finally, it is buoyed by recent neural and behavioral findings regarding how animals and humans learn to choose between options. In light of our hypothesis, we critically reevaluate putative neural evidence for the representation of value and explore an alternative: direct learning of action policies. We delineate how this alternative can provide a robust account of behavior that concords with existing empirical data. (PsycInfo Database Record (c) 2021 APA, all rights reserved).

17.
Abstract
The central theme of this review is the dynamic interaction between information selection and learning. We pose a fundamental question about this interaction: How do we learn what features of our experiences are worth learning about? In humans, this process depends on attention and memory, two cognitive functions that together constrain representations of the world to features that are relevant for goal attainment. Recent evidence suggests that the representations shaped by attention and memory are themselves inferred from experience with each task. We review this evidence and place it in the context of work that has explicitly characterized representation learning as statistical inference. We discuss how inference can be scaled to real-world decisions by approximating beliefs based on a small number of experiences. Finally, we highlight some implications of this inference process for human decision-making in social environments.

18. Signed and unsigned reward prediction errors dynamically enhance learning and memory. eLife 2021; 10:e61077. PMID: 33661094; PMCID: PMC8041467; DOI: 10.7554/elife.61077.
Abstract
Memory helps guide behavior, but which experiences from the past are prioritized? Classic models of learning posit that events associated with unpredictable outcomes as well as, paradoxically, predictable outcomes, deploy more attention and learning for those events. Here, we test reinforcement learning and subsequent memory for those events, and treat signed and unsigned reward prediction errors (RPEs), experienced at the reward-predictive cue or reward outcome, as drivers of these two seemingly contradictory signals. By fitting reinforcement learning models to behavior, we find that both RPEs contribute to learning by modulating a dynamically changing learning rate. We further characterize the effects of these RPE signals on memory and show that both signed and unsigned RPEs enhance memory, in line with midbrain dopamine and locus-coeruleus modulation of hippocampal plasticity, thereby reconciling separate findings in the literature.
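A dynamically changing learning rate driven by unsigned RPEs is often formalized in a Pearce-Hall style; the sketch below (assumed parameters, not necessarily the authors' fitted model) shows associability rising with surprise and decaying as rewards become predictable:

```python
# A common formalization of a dynamically changing learning rate (a
# Pearce-Hall-style sketch; parameters are assumed, not the authors' fits):
# the signed RPE drives the value update, while the unsigned RPE |RPE|
# updates an "associability" term that scales learning on later trials.
def ph_update(value, associability, reward, eta=0.3, kappa=0.5):
    rpe = reward - value                                        # signed RPE
    value += kappa * associability * rpe                        # value update
    associability = (1 - eta) * associability + eta * abs(rpe)  # unsigned RPE
    return value, associability

v, a = 0.0, 1.0
history = []
for reward in [1.0] * 30:    # reward becomes fully predictable over trials
    v, a = ph_update(v, a, reward)
    history.append((v, a))
```

Early trials carry large unsigned RPEs, so associability (and hence the effective learning rate) stays high; as the value estimate converges on the constant reward, both the errors and the associability decay.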

19. A recurring reproduction error in the administration of the Generalized Anxiety Disorder scale. Lancet Psychiatry 2021; 8:180-181. PMID: 33610220; DOI: 10.1016/s2215-0366(21)00001-8.

20.

21. Reward prediction errors create event boundaries in memory. Cognition 2020; 203:104269. PMID: 32563083; DOI: 10.1016/j.cognition.2020.104269.
Abstract
We remember when things change. Particularly salient are experiences where there is a change in rewards, eliciting reward prediction errors (RPEs). How do RPEs influence our memory of those experiences? One idea is that this signal directly enhances the encoding of memory. Another, not mutually exclusive, idea is that the RPE signals a deeper change in the environment, leading to the mnemonic separation of subsequent experiences from what came before, thereby creating a new latent context and a more separate memory trace. We tested this in four experiments where participants learned to predict rewards associated with a series of trial-unique images. High-magnitude RPEs indicated a change in the underlying distribution of rewards. To test whether these large RPEs created a new latent context, we first assessed recognition priming for sequential pairs that included a high-RPE event or not (Exp. 1: n = 27 & Exp. 2: n = 83). We found evidence of recognition priming for the high-RPE event, indicating that the high-RPE event is bound to its predecessor in memory. Given that high-RPE events are themselves preferentially remembered (Rouhani, Norman, & Niv, 2018), we next tested whether there was an event boundary across a high-RPE event (i.e., excluding the high-RPE event itself; Exp. 3: n = 85). Here, sequential pairs across a high RPE no longer showed recognition priming whereas pairs within the same latent reward state did, providing initial evidence for an RPE-modulated event boundary. We then investigated whether RPE event boundaries disrupt temporal memory by asking participants to order and estimate the distance between two events that had either included a high-RPE event between them or not (Exp. 4). We found (n = 49) and replicated (n = 77) worse sequence memory for events across a high RPE. 
In line with our recognition priming results, we did not find sequence memory to be impaired between the high-RPE event and its predecessor, but instead found worse sequence memory for pairs across a high-RPE event. Moreover, greater distance between events at encoding led to better sequence memory for events across a low-RPE event, but not a high-RPE event, suggesting separate mechanisms for the temporal ordering of events within versus across a latent reward context. Altogether, these findings demonstrate that high-RPE events are more strongly encoded, retain intact links with their predecessors, and act as event boundaries that interrupt the sequential integration of events. We captured these effects in a variant of the Context Maintenance and Retrieval model (CMR; Polyn, Norman, & Kahana, 2009), modified to incorporate RPEs into the encoding process.

22. Dopamine transients do not act as model-free prediction errors during associative learning. Nat Commun 2020; 11:106. PMID: 31913274; PMCID: PMC6949299; DOI: 10.1038/s41467-019-13953-1.
Abstract
Dopamine neurons are proposed to signal the reward prediction error in model-free reinforcement learning algorithms. This term represents the unpredicted or 'excess' value of the rewarding event, value that is then added to the intrinsic value of any antecedent cues, contexts or events. To support this proposal, proponents cite evidence that artificially-induced dopamine transients cause lasting changes in behavior. Yet these studies do not generally assess learning under conditions where an endogenous prediction error would occur. Here, to address this, we conducted three experiments where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into associations with the later events, whether valueless cues or valued rewards. These results show that in learning situations appropriate for the appearance of a prediction error, dopamine transients support associative, rather than model-free, learning.
|
23
|
Complementary Task Structure Representations in Hippocampus and Orbitofrontal Cortex during an Odor Sequence Task. Curr Biol 2019; 29:3402-3409.e3. [PMID: 31588004 DOI: 10.1016/j.cub.2019.08.040] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 05/18/2019] [Revised: 07/31/2019] [Accepted: 08/16/2019] [Indexed: 11/29/2022]
Abstract
Both hippocampus (HPC) and orbitofrontal cortex (OFC) have been shown to be critical for behavioral tasks that require use of an internal model or cognitive map, composed of the states and the relationships between them, which define the current environment or task at hand. One general idea is that the HPC provides the cognitive map, which is then transformed by OFC to emphasize information of relevance to current goals. Our previous analysis of ensemble activity in OFC in rats performing an odor sequence task revealed a rich representation of behaviorally relevant task structure, consistent with this proposal. Here, we compared those data to recordings from single units in area CA1 of the HPC of rats performing the same task. Contrary to expectations that HPC ensembles would represent detailed, even incidental, information defining the full task space, we found that HPC ensembles, like those in OFC, failed to distinguish states when it was not behaviorally necessary. However, hippocampal ensembles were better than those in OFC at distinguishing task states in which prospective memory was necessary for future performance. These results suggest that, in familiar environments, the HPC and OFC may play complementary roles, with the OFC maintaining the subjects' current position on the cognitive map or state space, supported by HPC when memory demands are high.
|
24
|
Uncovering the 'state': Tracing the hidden state representations that structure learning and decision-making. Behav Processes 2019; 167:103891. [PMID: 31381985 DOI: 10.1016/j.beproc.2019.103891] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/30/2019] [Revised: 05/23/2019] [Accepted: 06/21/2019] [Indexed: 02/02/2023]
Abstract
We review the abstract concept of a 'state' - an internal representation posited by reinforcement learning theories to be used by an agent, whether animal, human or artificial, to summarize the features of the external and internal environment that are relevant for future behavior on a particular task. Armed with this summary representation, an agent can make decisions and perform actions to interact effectively with the world. Here, we review recent findings from the neurobiological and behavioral literature to ask: 'what is a state?' with respect to the internal representations that organize learning and decision making across a range of tasks. We find that state representations include information beyond a straightforward summary of the immediate cues in the environment, providing timing or contextual information from the recent or more distant past, which allows these additional factors to influence decision making and other goal-directed behaviors in complex and perhaps unexpected ways.
|
25
|
Depressive symptoms bias the prediction-error enhancement of memory towards negative events in reinforcement learning. Psychopharmacology (Berl) 2019; 236:2425-2435. [PMID: 31346654 PMCID: PMC6697578 DOI: 10.1007/s00213-019-05322-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/22/2019] [Accepted: 06/30/2019] [Indexed: 01/09/2023]
Abstract
RATIONALE Depression is a disorder characterized by sustained negative affect and blunted positive affect, suggesting potential abnormalities in reward learning and its interaction with episodic memory. OBJECTIVES This study investigated how reward prediction errors experienced during learning modulate memory for rewarding events in individuals with depressive and non-depressive symptoms. METHODS Across three experiments, participants learned the average values of two scene categories in two learning contexts. Each learning context had either high or low outcome variance, allowing us to test the effects of small and large prediction errors on learning and memory. Participants were later tested for their memory of trial-unique scenes that appeared alongside outcomes. We compared learning and memory performance of individuals with self-reported depressive symptoms (N = 101) to those without (N = 184). RESULTS Although there were no overall differences in reward learning between the depressive and non-depressive group, depression severity within the depressive group predicted greater error in estimating the values of the scene categories. Similarly, there were no overall differences in memory performance. However, in depressive participants, negative prediction errors enhanced episodic memory more so than did positive prediction errors, and vice versa for non-depressive participants who showed a larger effect of positive prediction errors on memory. These results reflected differences in memory both within group and across groups. CONCLUSIONS Individuals with self-reported depressive symptoms showed relatively intact reinforcement learning, but demonstrated a bias for encoding events that accompanied surprising negative outcomes versus surprising positive ones. We discuss a potential neural mechanism supporting these effects, which may underlie or contribute to the excessive negative affect observed in depression.
|
26
|
Sequential replay of nonspatial task states in the human hippocampus. Science 2019; 364:eaaw5181. [PMID: 31249030 PMCID: PMC7241311 DOI: 10.1126/science.aaw5181] [Citation(s) in RCA: 113] [Impact Index Per Article: 22.6] [Received: 12/31/2018] [Accepted: 04/26/2019] [Indexed: 12/25/2022]
Abstract
Sequential neural activity patterns related to spatial experiences are "replayed" in the hippocampus of rodents during rest. We investigated whether replay of nonspatial sequences can be detected noninvasively in the human hippocampus. Participants underwent functional magnetic resonance imaging (fMRI) while resting after performing a decision-making task with sequential structure. Hippocampal fMRI patterns recorded at rest reflected sequentiality of previously experienced task states, with consecutive patterns corresponding to nearby states. Hippocampal sequentiality correlated with the fidelity of task representations recorded in the orbitofrontal cortex during decision-making, which were themselves related to better task performance. Our findings suggest that hippocampal replay may be important for building representations of complex, abstract tasks elsewhere in the brain and establish feasibility of investigating fast replay signals with fMRI.
|
27
|
State representation in mental illness. Curr Opin Neurobiol 2019; 55:160-166. [PMID: 31051434 DOI: 10.1016/j.conb.2019.03.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/31/2018] [Revised: 03/10/2019] [Accepted: 03/25/2019] [Indexed: 10/26/2022]
Abstract
Reinforcement learning theory provides a powerful set of computational ideas for modeling human learning and decision making. Reinforcement learning algorithms rely on state representations that enable efficient behavior by focusing only on aspects relevant to the task at hand. Forming such representations often requires selective attention to the sensory environment, and recalling memories of relevant past experiences. A striking range of psychiatric disorders, including bipolar disorder and schizophrenia, involve changes in these cognitive processes. We review and discuss evidence that these changes can be cast as altered state representation, with the goal of providing a useful transdiagnostic dimension along which mental disorders can be understood and compared.
|
28
|
Neural Signatures of Prediction Errors in a Decision-Making Task Are Modulated by Action Execution Failures. Curr Biol 2019; 29:1606-1613.e5. [PMID: 31056386 DOI: 10.1016/j.cub.2019.04.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/19/2018] [Revised: 03/04/2019] [Accepted: 04/04/2019] [Indexed: 11/24/2022]
Abstract
Decisions must be implemented through actions, and actions are prone to error. As such, when an expected outcome is not obtained, an individual should be sensitive to not only whether the choice itself was suboptimal but also whether the action required to indicate that choice was executed successfully. The intelligent assignment of credit to action execution versus action selection has clear ecological utility for the learner. To explore this, we used a modified version of a classic reinforcement learning task in which feedback indicated whether negative prediction errors were, or were not, associated with execution errors. Using fMRI, we asked if prediction error computations in the human striatum, a key substrate in reinforcement learning and decision making, are modulated when a failure in action execution results in the negative outcome. Participants were more tolerant of non-rewarded outcomes when these resulted from execution errors versus when execution was successful, but reward was withheld. Consistent with this behavior, a model-driven analysis of neural activity revealed an attenuation of the signal associated with negative reward prediction errors in the striatum following execution failures. These results converge with other lines of evidence suggesting that prediction errors in the mesostriatal dopamine system integrate high-level information during the evaluation of instantaneous reward outcomes.
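The credit-assignment idea can be sketched as a simple gating of negative prediction errors by execution outcome. The attenuation factor `kappa` is a hypothetical discount for illustration, not the paper's fitted parameter:

```python
def rpe_with_credit(reward, expected, exec_error, kappa=0.3):
    """Reward prediction error with credit assignment: a negative error
    is attenuated when the miss is attributable to motor execution
    rather than to the choice itself (kappa is a hypothetical discount)."""
    delta = reward - expected
    if delta < 0 and exec_error:
        delta *= kappa  # discount: don't blame the choice for a slip
    return delta

# Same unrewarded outcome, different credit assignment:
print(rpe_with_credit(0.0, 1.0, exec_error=False))  # -1.0 (blame the choice)
print(rpe_with_credit(0.0, 1.0, exec_error=True))   # -0.3 (blame the hand)
```

This mirrors the reported striatal result: the negative-RPE signal (and the behavioral penalty on the choice) is smaller when the failure was an execution error.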
|
29
|
Representational structure or task structure? Bias in neural representational similarity analysis and a Bayesian method for reducing bias. PLoS Comput Biol 2019; 15:e1006299. [PMID: 31125335 PMCID: PMC6553797 DOI: 10.1371/journal.pcbi.1006299] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 06/11/2018] [Revised: 06/06/2019] [Accepted: 04/06/2019] [Indexed: 11/18/2022]
Abstract
The activity of neural populations in the brains of humans and animals can exhibit vastly different spatial patterns when faced with different tasks or environmental stimuli. The degrees of similarity between these neural activity patterns in response to different events are used to characterize the representational structure of cognitive states in a neural population. The dominant methods of investigating this similarity structure first estimate neural activity patterns from noisy neural imaging data using linear regression, and then examine the similarity between the estimated patterns. Here, we show that this approach introduces spurious bias structure in the resulting similarity matrix, in particular when applied to fMRI data. This problem is especially severe when the signal-to-noise ratio is low and in cases where experimental conditions cannot be fully randomized in a task. We propose Bayesian Representational Similarity Analysis (BRSA), an alternative method for computing representational similarity, in which we treat the covariance structure of neural activity patterns as a hyper-parameter in a generative model of the neural data. By marginalizing over the unknown activity patterns, we can directly estimate this covariance structure from imaging data. This method offers significant reductions in bias and allows estimation of neural representational similarity with previously unattained levels of precision at low signal-to-noise ratio, without losing the possibility of deriving an interpretable distance measure from the estimated similarity. The method is closely related to Pattern Component Model (PCM), but instead of modeling the estimated neural patterns as in PCM, BRSA models the imaging data directly and is suited for analyzing data in which the order of task conditions is not fully counterbalanced. The probabilistic framework allows for jointly analyzing data from a group of participants. The method can also simultaneously estimate a signal-to-noise ratio map that shows where the learned representational structure is supported more strongly. Both this map and the learned covariance matrix can be used as a structured prior for maximum a posteriori estimation of neural activity patterns, which can be further used for fMRI decoding. Our method therefore paves the way towards a more unified and principled analysis of neural representations underlying fMRI signals. We make our tool freely available in Brain Imaging Analysis Kit (BrainIAK).
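The bias identified here can be demonstrated in a few lines: with pure-noise data and two temporally overlapping (non-counterbalanced) regressors, the naive estimate-then-correlate pipeline produces structured similarity inherited from the design covariance (X'X)^-1. This is an illustrative simulation of the problem, not the BRSA implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_vox = 200, 500

# Two temporally overlapping condition regressors (not counterbalanced)
X = np.zeros((T, 2))
X[20:80, 0] = 1.0
X[60:140, 1] = 1.0  # overlaps condition 0 during samples 60-79

# Pure-noise data: there is no true signal, so any "similarity" is spurious
Y = rng.standard_normal((T, n_vox))

B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # naive per-voxel GLM estimates
r = np.corrcoef(B)[0, 1]                   # naive RSA: correlate the patterns

# The estimates inherit the structure of (X'X)^-1, so the two conditions
# appear anti-correlated even though the data contain no signal at all.
print(r)  # reliably negative (about -0.29 in expectation for this design)
```

BRSA avoids this by modeling the imaging data directly and marginalizing over the unknown patterns, rather than correlating the noisy point estimates `B`.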
|
30
|
Rat Orbitofrontal Ensemble Activity Contains Multiplexed but Dissociable Representations of Value and Task Structure in an Odor Sequence Task. Curr Biol 2019; 29:897-907.e3. [PMID: 30827919 DOI: 10.1016/j.cub.2019.01.048] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 04/30/2018] [Revised: 09/14/2018] [Accepted: 01/18/2019] [Indexed: 11/26/2022]
Abstract
The orbitofrontal cortex (OFC) has long been implicated in signaling information about expected outcomes to facilitate adaptive or flexible behavior. Current proposals focus on signaling of expected value versus the representation of a value-agnostic cognitive map of the task. While often suggested as mutually exclusive, these alternatives may represent extreme ends of a continuum determined by task complexity and experience. As learning proceeds, an initial, detailed cognitive map might be acquired, based largely on external information. With more experience, this hypothesized map can then be tailored to include relevant abstract hidden cognitive constructs. The map would default to an expected value in situations where other attributes are largely irrelevant, but, in richer tasks, a more detailed structure might continue to be represented, at least where relevant to behavior. Here, we examined this by recording single-unit activity from the OFC in rats navigating an odor sequence task analogous to a spatial maze. The odor sequences provided a mappable state space, with 24 unique "positions" defined by sensory information, likelihood of reward, or both. Consistent with the hypothesis that the OFC represents a cognitive map tailored to the subjects' intentions or plans, we found a close correspondence between how subjects were using the sequences and the neural representations of the sequences in OFC ensembles. Multiplexed with this value-invariant representation of the task, we also found a representation of the expected value at each location. Thus, the value and task structure co-existed as dissociable components of the neural code in OFC.
|
31
|
Holistic Reinforcement Learning: The Role of Structure and Attention. Trends Cogn Sci 2019; 23:278-292. [PMID: 30824227 DOI: 10.1016/j.tics.2019.01.010] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 10/10/2018] [Revised: 01/20/2019] [Accepted: 01/24/2019] [Indexed: 10/27/2022]
Abstract
Compact representations of the environment allow humans to behave efficiently in a complex world. Reinforcement learning models capture many behavioral and neural effects but do not explain recent findings showing that structure in the environment influences learning. In parallel, Bayesian cognitive models predict how humans learn structured knowledge but do not have a clear neurobiological implementation. We propose an integration of these two model classes in which structured knowledge learned via approximate Bayesian inference acts as a source of selective attention. In turn, selective attention biases reinforcement learning towards relevant dimensions of the environment. An understanding of structure learning will help to resolve the fundamental challenge in decision science: explaining why people make the decisions they do.
|
32
|
Abstract
Making decisions in environments with few choice options is easy. We select the action that results in the most valued outcome. Making decisions in more complex environments, where the same action can produce different outcomes in different conditions, is much harder. In such circumstances, we propose that accurate action selection relies on top-down control from the prelimbic and orbitofrontal cortices over striatal activity through distinct thalamostriatal circuits. We suggest that the prelimbic cortex exerts direct influence over medium spiny neurons in the dorsomedial striatum to represent the state space relevant to the current environment. Conversely, the orbitofrontal cortex is argued to track a subject's position within that state space, likely through modulation of cholinergic interneurons.
|
33
|
Author Correction: Dopamine transients are sufficient and necessary for acquisition of model-based associations. Nat Neurosci 2018; 21:1493. [PMID: 30018354 DOI: 10.1038/s41593-018-0202-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
In the version of this article initially published, the laser activation at the start of cue X in experiment 1 was described in the first paragraph of the Results and in the third paragraph of the Experiment 1 section of the Methods as lasting 2 s; in fact, it lasted only 1 s. The error has been corrected in the HTML and PDF versions of the article.
|
34
|
Model-based predictions for dopamine. Curr Opin Neurobiol 2018; 49:1-7. [PMID: 29096115 PMCID: PMC6034703 DOI: 10.1016/j.conb.2017.10.006] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 08/10/2017] [Revised: 10/07/2017] [Accepted: 10/09/2017] [Indexed: 01/16/2023]
Abstract
Phasic dopamine responses are thought to encode a prediction-error signal consistent with model-free reinforcement learning theories. However, a number of recent findings highlight the influence of model-based computations on dopamine responses, and suggest that dopamine prediction errors reflect more dimensions of an expected outcome than scalar reward value. Here, we review a selection of these recent results and discuss the implications and complications of model-based predictions for computational theories of dopamine and learning.
|
35
|
Dissociable effects of surprising rewards on learning and memory. J Exp Psychol Learn Mem Cogn 2018; 44:1430-1443. [PMID: 29553767 DOI: 10.1037/xlm0000518] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Indexed: 01/18/2023]
Abstract
Reward-prediction errors track the extent to which rewards deviate from expectations, and aid in learning. How do such errors in prediction interact with memory for the rewarding episode? Existing findings point to both cooperative and competitive interactions between learning and memory mechanisms. Here, we investigated whether learning about rewards in a high-risk context, with frequent, large prediction errors, would give rise to higher fidelity memory traces for rewarding events than learning in a low-risk context. Experiment 1 showed that recognition was better for items associated with larger absolute prediction errors during reward learning. Larger prediction errors also led to higher rates of learning about rewards. Interestingly, we did not find a relationship between learning rate for reward and recognition-memory accuracy for items, suggesting that these two effects of prediction errors were caused by separate underlying mechanisms. In Experiment 2, we replicated these results with a longer task that posed stronger memory demands and allowed for more learning. We also showed improved source and sequence memory for items within the high-risk context. In Experiment 3, we controlled for the difficulty of reward learning in the risk environments, again replicating the previous results. Moreover, this control revealed that the high-risk context enhanced item-recognition memory beyond the effect of prediction errors. In summary, our results show that prediction errors boost both episodic item memory and incremental reward learning, but the two effects are likely mediated by distinct underlying systems.
|
36
|
Human Orbitofrontal Cortex Represents a Cognitive Map of State Space. Neuron 2017; 91:1402-1412. [PMID: 27657452 DOI: 10.1016/j.neuron.2016.08.019] [Citation(s) in RCA: 265] [Impact Index Per Article: 37.9] [Received: 03/27/2016] [Revised: 07/11/2016] [Accepted: 08/08/2016] [Indexed: 11/17/2022]
Abstract
Although the orbitofrontal cortex (OFC) has been studied intensely for decades, its precise functions have remained elusive. We recently hypothesized that the OFC contains a "cognitive map" of task space in which the current state of the task is represented, and this representation is especially critical for behavior when states are unobservable from sensory input. To test this idea, we apply pattern-classification techniques to neuroimaging data from humans performing a decision-making task with 16 states. We show that unobservable task states can be decoded from activity in OFC, and decoding accuracy is related to task performance and the occurrence of individual behavioral errors. Moreover, similarity between the neural representations of consecutive states correlates with behavioral accuracy in corresponding state transitions. These results support the idea that OFC represents a cognitive map of task space and establish the feasibility of decoding state representations in humans using non-invasive neuroimaging.
|
37
|
Abstract
Theories of episodic memory have generally proposed that individual memory traces are linked together by a representation of context that drifts slowly over time. Recent data challenge the notion that contextual drift is always slow and passive. In particular, changes in one's external environment or internal model induce discontinuities in memory that are reflected in sudden changes in neural activity, suggesting that context can shift abruptly. Furthermore, context change effects are sensitive to top-down goals, suggesting that contextual drift may be an active process. These findings call for revising models of the role of context in memory, in order to account for abrupt contextual shifts and the controllable nature of context change.
|
38
|
Should you trust your RSA result? A Bayesian method for reducing bias in neural representational similarity analysis. J Vis 2017. [DOI: 10.1167/17.10.571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
39
|
Predicting trial-by-trial attention dynamics during human reinforcement learning. J Vis 2017. [DOI: 10.1167/17.10.1098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
40
|
Feature-based reward learning biases dimensional attention. J Vis 2017. [DOI: 10.1167/17.10.1297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
41
|
Computational approaches to fMRI analysis. Nat Neurosci 2017; 20:304-313. [PMID: 28230848 DOI: 10.1038/nn.4499] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Received: 08/22/2016] [Accepted: 01/12/2017] [Indexed: 12/14/2022]
Abstract
Analysis methods in cognitive neuroscience have not always matched the richness of fMRI data. Early methods focused on estimating neural activity within individual voxels or regions, averaged over trials or blocks and modeled separately in each participant. This approach mostly neglected the distributed nature of neural representations over voxels, the continuous dynamics of neural activity during tasks, the statistical benefits of performing joint inference over multiple participants and the value of using predictive models to constrain analysis. Several recent exploratory and theory-driven methods have begun to pursue these opportunities. These methods highlight the importance of computational techniques in fMRI analysis, especially machine learning, algorithmic optimization and parallel computing. Adoption of these techniques is enabling a new generation of experiments and analyses that could transform our understanding of some of the most complex, and distinctly human, signals in the brain: acts of cognition such as thoughts, intentions and memories.
|
42
|
Lateral Hypothalamic GABAergic Neurons Encode Reward Predictions that Are Relayed to the Ventral Tegmental Area to Regulate Learning. Curr Biol 2017; 27:2089-2100.e5. [PMID: 28690111 PMCID: PMC5564224 DOI: 10.1016/j.cub.2017.06.024] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 03/27/2017] [Revised: 05/08/2017] [Accepted: 06/09/2017] [Indexed: 12/27/2022]
Abstract
Eating is a learned process. Our desires for specific foods arise through experience. Both electrical stimulation and optogenetic studies have shown that increased activity in the lateral hypothalamus (LH) promotes feeding. Current dogma is that these effects reflect a role for LH neurons in the control of the core motivation to feed, and their activity comes under control of forebrain regions to elicit learned food-motivated behaviors. However, these effects could also reflect the storage of associative information about the cues leading to food in LH itself. Here, we present data from several studies that are consistent with a role for LH in learning. In the first experiment, we use a novel GAD-Cre rat to show that optogenetic inhibition of LH γ-aminobutyric acid (GABA) neurons restricted to cue presentation disrupts the rats' ability to learn that a cue predicts food without affecting subsequent food consumption. In the second experiment, we show that this manipulation also disrupts the ability of a cue to promote food seeking after learning. Finally, we show that inhibition of the terminals of the LH GABA neurons in the ventral tegmental area (VTA) facilitates learning about reward-paired cues. These results suggest that the LH GABA neurons are critical for storing and later disseminating information about reward-predictive cues.
|
43
|
Correction: The computational nature of memory modification. eLife 2017; 6. [PMID: 28530550 PMCID: PMC5440165 DOI: 10.7554/elife.28693] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/16/2017] [Accepted: 05/16/2017] [Indexed: 11/24/2022]
|
44
|
Dopamine transients are sufficient and necessary for acquisition of model-based associations. Nat Neurosci 2017; 20:735-742. [PMID: 28368385 PMCID: PMC5413864 DOI: 10.1038/nn.4538] [Citation(s) in RCA: 144] [Impact Index Per Article: 20.6] [Received: 06/03/2016] [Accepted: 02/28/2017] [Indexed: 12/12/2022]
Abstract
Associative learning is driven by prediction errors. Dopamine transients correlate with these errors, which current interpretations limit to endowing cues with a scalar quantity reflecting the value of future rewards. We tested whether dopamine might act more broadly to support learning of an associative model of the environment. Using sensory preconditioning, we show that prediction errors underlying stimulus-stimulus learning can be blocked behaviorally and reinstated by optogenetically activating dopamine neurons. We further show that suppressing the firing of these neurons across the transition prevents normal stimulus-stimulus learning. These results establish that the acquisition of model-based information about transitions between nonrewarding events is also driven by prediction errors and that, contrary to existing canon, dopamine transients are both sufficient and necessary to support this type of learning. Our findings open new possibilities for how these biological signals might support associative learning in the mammalian brain in these and other contexts.
|
45
|
Abstract
Retrieving a memory can modify its influence on subsequent behavior. We develop a computational theory of memory modification, according to which modification of a memory trace occurs through classical associative learning, but which memory trace is eligible for modification depends on a structure learning mechanism that discovers the units of association by segmenting the stream of experience into statistically distinct clusters (latent causes). New memories are formed when the structure learning mechanism infers that a new latent cause underlies current sensory observations. By the same token, old memories are modified when old and new sensory observations are inferred to have been generated by the same latent cause. We derive this framework from probabilistic principles, and present a computational implementation. Simulations demonstrate that our model can reproduce the major experimental findings from studies of memory modification in the Pavlovian conditioning literature. DOI: http://dx.doi.org/10.7554/eLife.23763.001
Our memories contain our expectations about the world that we can retrieve to make predictions about the future. For example, most people would expect a chocolate bar to taste good, because they have previously learned to associate chocolate with pleasure. When a surprising event occurs, such as tasting an unpalatable chocolate bar, the brain therefore faces a dilemma. Should it update the existing memory and overwrite the association between chocolate and pleasure? Or should it create an additional memory? In the latter case, the brain would form a new association between chocolate and displeasure that competes with, but does not overwrite, the original one between chocolate and pleasure. Previous studies have shown that surprising events tend to create new memories unless the existing memory is briefly reactivated before the surprising event occurs. In other words, retrieving old memories makes them more malleable. Gershman et al. have now developed a computational model for how the brain decides whether to update an old memory or create a new one. The idea at the heart of the model is that the brain will attempt to infer what caused the surprising event. The reason the chocolate bar tastes unpalatable, for example, might be because it was old and had spoiled. Every time the brain infers a new possible cause for a surprising event, it will create an additional memory to store this new set of expectations. In the future we will know that spoiled chocolate bars taste bad. However, if the brain cannot infer a new cause for the surprising event – because, for example, there appears to be nothing unusual about the unpalatable chocolate bar – it will instead opt to update the existing memory. The next time we buy a chocolate bar, we will have slightly lower expectations about how good it will taste. The dilemma of whether to update an existing memory or create a new one thus boils down to the question: is the surprising event the consequence of a new cause or an old one? This theory implies that retrieving a memory nudges the brain to infer that its associated cause is once again active and, since this is an old cause, it means that the memory will be eligible for updating. Many experiments have been performed on the topic of modifying memories, but this is the first computational model that offers a unifying explanation for the results. The next step is to work out how to apply the model, which is phrased in abstract terms, to networks of neurons that are more biologically realistic. DOI: http://dx.doi.org/10.7554/eLife.23763.002
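The old-cause-versus-new-cause decision described in this abstract can be sketched as posterior inference under a Chinese restaurant process prior, which is the prior typically used in latent-cause models. The sketch below is a minimal illustration, not the paper's full implementation; the likelihood values and the concentration parameter `alpha` are made-up numbers for demonstration.

```python
import numpy as np

def latent_cause_posterior(counts, likelihoods, alpha):
    """Posterior over latent causes given one new observation.

    counts: number of past observations assigned to each old cause
    likelihoods: p(observation | cause) for each old cause, plus a
                 final entry for a brand-new cause
    alpha: CRP concentration; larger values favor positing new causes
    """
    n = sum(counts)
    # CRP prior: old causes in proportion to their counts,
    # a new cause in proportion to alpha
    prior = np.array(list(counts) + [alpha]) / (n + alpha)
    post = prior * np.array(likelihoods)
    return post / post.sum()

# Unsurprising observation: the familiar cause explains it well,
# so the posterior favors updating the old memory.
p_old = latent_cause_posterior([10], [0.9, 0.1], alpha=1.0)

# Surprising observation: the familiar cause explains it poorly,
# so the posterior favors creating a new latent cause (a new memory).
p_new = latent_cause_posterior([10], [0.01, 0.5], alpha=1.0)
```

A memory-reactivation manipulation would, on this view, raise the prior on the old cause, tipping the posterior back toward modification of the existing trace.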
Collapse
|
46
|
Reconsolidation-Extinction Interactions in Fear Memory Attenuation: The Role of Inter-Trial Interval Variability. Front Behav Neurosci 2017; 11:2. [PMID: 28174526 PMCID: PMC5258753 DOI: 10.3389/fnbeh.2017.00002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2016] [Accepted: 01/04/2017] [Indexed: 01/28/2023] Open
Abstract
Fear extinction typically results in the formation of a new inhibitory memory that suppresses the original conditioned response. Evidence also suggests that extinction training during a retrieval-induced labile period results in integration of the extinction memory into the original fear memory, rendering the fear memory less susceptible to reinstatement. Here we investigated the parameters by which the retrieval-extinction paradigm was most effective in memory updating. Specifically, we manipulated the inter-trial intervals (ITIs) between conditioned stimulus (CS) presentations during extinction, examining how different degrees of ITI variability affected the strength of memory updating. We showed that randomizing the ITI of CS presentations during extinction led to less return of fear via reinstatement than extinction with a fixed ITI. Subjects who received variable ITIs during extinction also showed higher freezing during the ITI, indicating that the randomization of CS presentations led to a higher general reactivity during extinction, which may be one potential mechanism for memory updating.
Collapse
|
47
|
Do You See the Forest or the Tree? Neural Gain and Breadth Versus Focus in Perceptual Processing. Psychol Sci 2016; 27:1632-1643. [DOI: 10.1177/0956797616665578] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
When perceiving rich sensory information, some people may integrate its various aspects, whereas other people may selectively focus on its most salient aspects. We propose that neural gain modulates the trade-off between breadth and selectivity, such that high gain focuses perception on those aspects of the information that have the strongest, most immediate influence, whereas low gain allows broader integration of different aspects. We illustrate our hypothesis using a neural-network model of ambiguous-letter perception. We then report an experiment demonstrating that, as predicted by the model, pupil-diameter indices of higher gain are associated with letter perception that is more selectively focused on the letter’s shape or, if primed, its semantic content. Finally, we report a recognition-memory experiment showing that the relationship between gain and selective processing also applies when the influence of different stimulus features is voluntarily modulated by task demands.
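The gain hypothesis in this abstract has a standard formalization: gain acts like an inverse temperature on feature evidence, so high gain concentrates processing on the strongest aspect of the input while low gain spreads it across aspects. The snippet below is a minimal illustration of that idea, not the paper's neural-network model; the evidence values are invented for demonstration.

```python
import numpy as np

def feature_weights(evidence, gain):
    """Softmax over feature evidence; gain acts as inverse temperature.

    High gain approaches winner-take-all selection of the strongest
    feature; low gain yields broad, near-uniform integration.
    """
    z = gain * np.asarray(evidence, dtype=float)
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

evidence = [1.0, 0.8, 0.5]                     # hypothetical strengths of three stimulus aspects
broad = feature_weights(evidence, gain=1.0)    # low gain: aspects weighted fairly evenly
narrow = feature_weights(evidence, gain=10.0)  # high gain: strongest aspect dominates
```

On this reading, pupil diameter serves as a proxy for gain, predicting that larger pupil-linked gain should correlate with more selective (narrower) perceptual weighting.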
Collapse
|
48
|
Abstract
To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning: a computational theory of reward optimization, which readily prescribes algorithmic solutions that bear a striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.
Collapse
|
49
|
The effects of aging on the interaction between reinforcement learning and attention. Psychol Aging 2016; 31:747-757. [PMID: 27599017 DOI: 10.1037/pag0000112] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Reinforcement learning (RL) in complex environments relies on selective attention to uncover those aspects of the environment that are most predictive of reward. Whereas previous work has focused on age-related changes in RL, it is not known whether older adults learn differently from younger adults when selective attention is required. In 2 experiments, we examined how aging affects the interaction between RL and selective attention. Younger and older adults performed a learning task in which only 1 stimulus dimension was relevant to predicting reward, and within it, 1 "target" feature was the most rewarding. Participants had to discover this target feature through trial and error. In Experiment 1, stimuli varied on 1 or 3 dimensions and participants received hints that revealed the target feature, the relevant dimension, or gave no information. Group-related differences in accuracy and RTs differed systematically as a function of the number of dimensions and the type of hint available. In Experiment 2 we used trial-by-trial computational modeling of the learning process to test for age-related differences in learning strategies. Behavior of both young and older adults was explained well by a reinforcement-learning model that uses selective attention to constrain learning. However, the model suggested that older adults restricted their learning to fewer features, employing more focused attention than younger adults. Furthermore, this difference in strategy predicted age-related deficits in accuracy. We discuss these results, suggesting that a narrower filter of attention may reflect an adaptation to the reduced capabilities of the reinforcement learning system.
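The class of model described in this abstract — RL in which attention weights gate both valuation and learning across stimulus dimensions — can be sketched very simply. The code below is a minimal illustration under assumed parameter values (feature names, learning rate, and attention weights are all invented), not the authors' fitted model.

```python
def update_values(values, features, attention, reward, lr=0.3):
    """One trial of attention-weighted feature RL (minimal sketch).

    values: dict mapping feature name -> learned value
    features: the chosen stimulus's feature on each dimension
    attention: weight per dimension (sums to 1); narrower attention
               concentrates learning on fewer features
    """
    # Value prediction is an attention-weighted sum over feature values
    v = sum(w * values[f] for w, f in zip(attention, features))
    delta = reward - v  # prediction error
    for w, f in zip(attention, features):
        values[f] += lr * w * delta  # attention gates each update
    return delta

values = {"red": 0.0, "square": 0.0, "dotted": 0.0}
# Broad attention spreads credit across all three dimensions;
# focused attention (e.g. [0.8, 0.1, 0.1]) would concentrate it.
delta = update_values(values, ["red", "square", "dotted"],
                      [1 / 3, 1 / 3, 1 / 3], reward=1.0)
```

The paper's age-related finding corresponds, in this sketch, to older adults behaving as if their attention vector were more peaked, so fewer features accumulate value.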
Collapse
|
50
|
Temporal Specificity of Reward Prediction Errors Signaled by Putative Dopamine Neurons in Rat VTA Depends on Ventral Striatum. Neuron 2016; 91:182-93. [PMID: 27292535 DOI: 10.1016/j.neuron.2016.05.015] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2015] [Revised: 03/30/2016] [Accepted: 04/27/2016] [Indexed: 10/21/2022]
Abstract
Dopamine neurons signal reward prediction errors. This requires accurate reward predictions. It has been suggested that the ventral striatum provides these predictions. Here we tested this hypothesis by recording from putative dopamine neurons in the VTA of rats performing a task in which prediction errors were induced by shifting reward timing or number. In controls, the neurons exhibited error signals in response to both manipulations. However, dopamine neurons in rats with ipsilateral ventral striatal lesions exhibited error signals only in response to changes in number and failed to respond to changes in the timing of reward. These results, supported by computational modeling, indicate that predictions about the timing and the number of expected rewards are dissociable, and that dopaminergic prediction-error signals rely on the ventral striatum for the former but not the latter.
Collapse
|