1. Kalbe F, Schwabe L. Prediction Errors for Aversive Events Shape Long-Term Memory Formation through a Distinct Neural Mechanism. Cereb Cortex 2021; 32:3081-3097. PMID: 34849622. DOI: 10.1093/cercor/bhab402.
Abstract
Prediction errors (PEs) have been known for decades to guide associative learning, but their role in episodic memory formation has been discovered only recently. To identify the neural mechanisms underlying the impact of aversive PEs on long-term memory formation, we used functional magnetic resonance imaging, while participants saw a series of unique stimuli and estimated the probability that an aversive shock would follow. Our behavioral data showed that negative PEs (i.e., omission of an expected outcome) were associated with superior recognition of the predictive stimuli, whereas positive PEs (i.e., presentation of an unexpected outcome) impaired subsequent memory. While medial temporal lobe (MTL) activity during stimulus encoding was overall associated with enhanced memory, memory-enhancing effects of negative PEs were linked to even decreased MTL activation. Additional large-scale network analyses showed PE-related increases in crosstalk between the "salience network" and a frontoparietal network commonly implicated in memory formation for expectancy-congruent events. These effects could not be explained by mere changes in physiological arousal or the prediction itself. Our results suggest that the superior memory for events associated with negative aversive PEs is driven by a potentially distinct neural mechanism that might serve to set these memories apart from those with expected outcomes.
Affiliation(s)
- Felix Kalbe: Department of Cognitive Psychology, Institute of Psychology, Universität Hamburg, Hamburg 20146, Germany
- Lars Schwabe: Department of Cognitive Psychology, Institute of Psychology, Universität Hamburg, Hamburg 20146, Germany
2. Cutler J, Wittmann MK, Abdurahman A, Hargitai LD, Drew D, Husain M, Lockwood PL. Ageing is associated with disrupted reinforcement learning whilst learning to help others is preserved. Nat Commun 2021; 12:4440. PMID: 34290236. PMCID: PMC8295324. DOI: 10.1038/s41467-021-24576-w.
Abstract
Reinforcement learning is a fundamental mechanism displayed by many species. However, adaptive behaviour depends not only on learning about actions and outcomes that affect ourselves, but also those that affect others. Using computational reinforcement learning models, we tested whether young (age 18-36) and older (age 60-80, total n = 152) adults learn to gain rewards for themselves, another person (prosocial), or neither individual (control). Detailed model comparison showed that a model with separate learning rates for each recipient best explained behaviour. Young adults learned faster when their actions benefitted themselves, compared to others. Compared to young adults, older adults showed reduced self-relevant learning rates but preserved prosocial learning. Moreover, levels of subclinical self-reported psychopathic traits (including lack of concern for others) were lower in older adults and the core affective-interpersonal component of this measure negatively correlated with prosocial learning. These findings suggest learning to benefit others is preserved across the lifespan with implications for reinforcement learning and theories of healthy ageing.
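The winning model described here (separate learning rates for the self, prosocial, and no-one conditions) is a standard Rescorla-Wagner learner applied per recipient. A minimal sketch, with illustrative parameter values rather than the authors' fitted estimates:

```python
def update_value(value, reward, alpha):
    """Rescorla-Wagner update: move the value estimate toward the
    observed reward by a fraction alpha of the prediction error."""
    prediction_error = reward - value
    return value + alpha * prediction_error

# One learning rate per recipient condition, as in the winning model.
# The numeric values are illustrative, not the paper's fitted estimates.
alphas = {"self": 0.30, "other": 0.15, "no_one": 0.10}
values = {cond: 0.5 for cond in alphas}

rewards = [1, 1, 0, 1]  # hypothetical outcome sequence, same for each condition
for cond, alpha in alphas.items():
    for r in rewards:
        values[cond] = update_value(values[cond], r, alpha)

# A higher learning rate pulls the estimate further toward recent
# outcomes, mimicking faster self-relevant learning.
assert values["self"] > values["other"] > values["no_one"]
```

Fitting such a model then amounts to finding, per participant, the alphas (plus a choice-noise parameter) that best explain the observed choices.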
Affiliation(s)
- Jo Cutler: Centre for Human Brain Health and Institute for Mental Health, School of Psychology, University of Birmingham, Birmingham, UK; Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK; Department of Experimental Psychology, University of Oxford, Oxford, UK
- Marco K Wittmann: Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK; Department of Experimental Psychology, University of Oxford, Oxford, UK
- Ayat Abdurahman: Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK; Department of Experimental Psychology, University of Oxford, Oxford, UK; Department of Psychology, University of Cambridge, Cambridge, UK
- Luca D Hargitai: Department of Experimental Psychology, University of Oxford, Oxford, UK
- Daniel Drew: Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK; Department of Experimental Psychology, University of Oxford, Oxford, UK; Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK
- Masud Husain: Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK; Department of Experimental Psychology, University of Oxford, Oxford, UK; Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK
- Patricia L Lockwood: Centre for Human Brain Health and Institute for Mental Health, School of Psychology, University of Birmingham, Birmingham, UK; Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK; Department of Experimental Psychology, University of Oxford, Oxford, UK; Christ Church, University of Oxford, Oxford, UK
3. Metha JA, Brian ML, Oberrauch S, Barnes SA, Featherby TJ, Bossaerts P, Murawski C, Hoyer D, Jacobson LH. Separating Probability and Reversal Learning in a Novel Probabilistic Reversal Learning Task for Mice. Front Behav Neurosci 2020; 13:270. PMID: 31998088. PMCID: PMC6962304. DOI: 10.3389/fnbeh.2019.00270.
Abstract
The exploration/exploitation tradeoff – pursuing a known reward vs. sampling from lesser known options in the hope of finding a better payoff – is a fundamental aspect of learning and decision making. In humans, this has been studied using multi-armed bandit tasks. The same processes have also been studied using simplified probabilistic reversal learning (PRL) tasks with binary choices. Our investigations suggest that protocols previously used to explore PRL in mice may be beyond their cognitive capacities, with animals performing at a no-better-than-chance level. We sought a novel probabilistic learning task to improve behavioral responding in mice, whilst allowing the investigation of the exploration/exploitation tradeoff in decision making. To achieve this, we developed a two-lever operant chamber task with levers corresponding to different probabilities (high/low) of receiving a saccharin reward, reversing the reward contingencies associated with levers once animals reached a threshold of 80% responding at the high rewarding lever. We found that, unlike in existing PRL tasks, mice are able to learn and behave near optimally with 80% high/20% low reward probabilities. Altering the reward contingencies towards equality showed that some mice displayed preference for the high rewarding lever with probabilities as close as 60% high/40% low. Additionally, we show that animal choice behavior can be effectively modelled using reinforcement learning (RL) models incorporating learning rates for positive and negative prediction error, a perseveration parameter, and a noise parameter. This new decision task, coupled with RL analyses, provides a tractable means of investigating the neuroscience of the exploration/exploitation tradeoff in decision making.
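The RL model components named above (separate learning rates for positive and negative prediction errors, a perseveration parameter, and choice noise) can be sketched as below; the softmax-plus-perseveration form and the parameter values are illustrative assumptions, not the paper's exact specification:

```python
import math

def choice_probabilities(q, last_choice, beta, kappa):
    """Softmax over lever values with a perseveration bonus kappa added
    to the previously chosen lever; beta is inverse choice noise."""
    logits = [beta * q[a] + (kappa if a == last_choice else 0.0)
              for a in range(len(q))]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_q(q, action, reward, alpha_pos, alpha_neg):
    """Separate learning rates for positive vs. negative prediction errors."""
    delta = reward - q[action]
    alpha = alpha_pos if delta >= 0 else alpha_neg
    q[action] += alpha * delta
    return q

# Illustrative parameters, not the paper's fitted values.
q = [0.0, 0.0]
q = update_q(q, 0, 1.0, alpha_pos=0.4, alpha_neg=0.1)  # rewarded press
q = update_q(q, 1, 0.0, alpha_pos=0.4, alpha_neg=0.1)  # unrewarded press
probs = choice_probabilities(q, last_choice=0, beta=3.0, kappa=0.5)
```

In fitting, the perseveration bonus captures lever-repetition that value alone cannot explain, while the asymmetric alphas capture differential sensitivity to wins and omissions.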
Affiliation(s)
- Jeremy A Metha: Sleep and Cognition, The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia; Translational Neuroscience, Department of Pharmacology and Therapeutics, School of Biomedical Sciences, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC, Australia; Brain, Mind and Markets Laboratory, Department of Finance, Faculty of Business and Economics, The University of Melbourne, Parkville, VIC, Australia
- Maddison L Brian: Sleep and Cognition, The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia; Translational Neuroscience, Department of Pharmacology and Therapeutics, School of Biomedical Sciences, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC, Australia
- Sara Oberrauch: Sleep and Cognition, The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia; Translational Neuroscience, Department of Pharmacology and Therapeutics, School of Biomedical Sciences, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC, Australia
- Samuel A Barnes: Department of Psychiatry, School of Medicine, University of California, San Diego, La Jolla, CA, United States
- Travis J Featherby: Behavioral Core, The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia
- Peter Bossaerts: Brain, Mind and Markets Laboratory, Department of Finance, Faculty of Business and Economics, The University of Melbourne, Parkville, VIC, Australia
- Carsten Murawski: Brain, Mind and Markets Laboratory, Department of Finance, Faculty of Business and Economics, The University of Melbourne, Parkville, VIC, Australia
- Daniel Hoyer: Sleep and Cognition, The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia; Translational Neuroscience, Department of Pharmacology and Therapeutics, School of Biomedical Sciences, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC, Australia; Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, United States
- Laura H Jacobson: Sleep and Cognition, The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia; Translational Neuroscience, Department of Pharmacology and Therapeutics, School of Biomedical Sciences, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC, Australia
4. Brydevall M, Bennett D, Murawski C, Bode S. The neural encoding of information prediction errors during non-instrumental information seeking. Sci Rep 2018; 8:6134. PMID: 29666461. PMCID: PMC5904167. DOI: 10.1038/s41598-018-24566-x.
Abstract
In a dynamic world, accurate beliefs about the environment are vital for survival, and individuals should therefore regularly seek out new information with which to update their beliefs. This aspect of behaviour is not well captured by standard theories of decision making, and the neural mechanisms of information seeking remain unclear. One recent theory posits that valuation of information results from representation of informative stimuli within canonical neural reward-processing circuits, even if that information lacks instrumental use. We investigated this question by recording EEG from twenty-three human participants performing a non-instrumental information-seeking task. In this task, participants could pay a monetary cost to receive advance information about the likelihood of receiving reward in a lottery at the end of each trial. Behavioural results showed that participants were willing to incur considerable monetary costs to acquire early but non-instrumental information. Analysis of the event-related potential elicited by informative cues revealed that the feedback-related negativity independently encoded both an information prediction error and a reward prediction error. These findings are consistent with the hypothesis that information seeking results from processing of information within neural reward circuits, and suggest that information may represent a distinct dimension of valuation in decision making under uncertainty.
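One hedged way to formalize the two error signals the feedback-related negativity was found to encode: the reward prediction error is the outcome minus its expected value, while an information prediction error can be cast as the information a cue actually delivers (its reduction of outcome entropy) minus the information gain expected from it. This operationalization is an illustration, not the authors' exact model:

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a Bernoulli outcome with win probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def reward_prediction_error(outcome, p_win):
    """Actual outcome minus the expected reward."""
    return outcome - p_win

def information_prediction_error(p_before, p_after, expected_gain):
    """Entropy reduction actually delivered by a cue, minus the
    information gain expected before seeing the cue."""
    delivered = entropy(p_before) - entropy(p_after)
    return delivered - expected_gain

# A cue shifts the estimated win probability from 0.5 to 0.9
# (illustrative numbers, not task parameters from the study).
ipe = information_prediction_error(0.5, 0.9, expected_gain=0.3)
rpe = reward_prediction_error(1.0, 0.9)
```

Under this framing, an uninformative cue (no entropy change) when information was expected would yield a negative information prediction error, analogous to a reward omission.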
Affiliation(s)
- Maja Brydevall: The University of Melbourne, School of Psychological Sciences, Parkville, 3010, Australia; The University of Melbourne, Department of Finance, Parkville, 3010, Australia
- Daniel Bennett: The University of Melbourne, School of Psychological Sciences, Parkville, 3010, Australia; The University of Melbourne, Department of Finance, Parkville, 3010, Australia
- Carsten Murawski: The University of Melbourne, Department of Finance, Parkville, 3010, Australia
- Stefan Bode: The University of Melbourne, School of Psychological Sciences, Parkville, 3010, Australia
5. Lee JC, Mueller KL, Tomblin JB. Examining Procedural Learning and Corticostriatal Pathways for Individual Differences in Language: Testing Endophenotypes of DRD2/ANKK1. Lang Cogn Neurosci 2016; 31:1098-1114. PMID: 31768398. PMCID: PMC6876848. DOI: 10.1080/23273798.2015.1089359.
Abstract
The aim of the study was to explore whether genetic variation in the dopaminergic system is associated with procedural learning and the corticostriatal pathways in individuals with developmental language impairment (DLI). We viewed these two systems as endophenotypes and hypothesized that they would be more sensitive indicators of genetic effects than the language phenotype itself. Thus, we genotyped two SNPs in the DRD2/ANKK1 gene complex, and tested for their associations to the phenotype of DLI and the two endophenotypes. Results showed that individuals with DLI revealed poor procedural learning abilities and abnormal structures of the basal ganglia. Genetic variation in DRD2/ANKK1 was associated with procedural learning abilities and with microstructural differences of the caudate nucleus. The association of the language phenotype with these DRD2/ANKK1 polymorphisms was not significant, but the phenotype was significantly associated with the two endophenotypes. We suggest that procedural learning and the corticostriatal pathways could be used as effective endophenotypes to aid molecular genetic studies searching for genes predisposing to DLI.
Affiliation(s)
- Joanna C. Lee: Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA, USA
- Kathryn L. Mueller: Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA, USA
- J. Bruce Tomblin: Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA, USA
6. Lim MSM, Jocham G, Hunt LT, Behrens TEJ, Rogers RD. Impulsivity and predictive control are associated with suboptimal action-selection and action-value learning in regular gamblers. Int Gambl Stud 2015; 15:489-505. PMID: 27274706. DOI: 10.1080/14459795.2015.1078835.
Abstract
Heightened impulsivity and cognitive biases are risk factors for gambling problems. However, little is known about precisely how these factors increase the risks of gambling-related harm in vulnerable individuals. Here, we modelled the behaviour of eighty-seven community-recruited regular, but not clinically problematic, gamblers during a binary-choice reinforcement-learning game, to characterise the relationships between impulsivity, cognitive biases, and the capacity to make optimal action selections and learn about action-values. Impulsive gamblers showed diminished use of an optimal (Bayesian-derived) probability estimate when selecting between candidate actions, and showed slower learning rates and enhanced non-linear probability weighting while learning action values. Critically, gamblers who believed that it is possible to predict winning outcomes (as 'predictive control') failed to use the game's reinforcement history to guide their action selections. Extensive evidence attests to the ease with which gamblers can erroneously perceive structure in the reinforcement history of games when there is none. Our findings demonstrate that the generic and specific risk factors of impulsivity and cognitive biases can interfere with the capacity of some gamblers to utilise structure when it is available in the reinforcement history of games, potentially increasing their risks of sustaining gambling-related harms.
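The "non-linear probability weighting" estimated in this study is commonly modelled with the one-parameter Tversky-Kahneman weighting function; whether the authors used this exact form is an assumption here, so the sketch below is illustrative:

```python
def weight(p, gamma):
    """One-parameter probability weighting function (Tversky & Kahneman,
    1992): gamma < 1 bends the curve away from the identity line,
    overweighting small and underweighting large probabilities."""
    num = p ** gamma
    return num / (num + (1 - p) ** gamma) ** (1 / gamma)

# With gamma below 1, rare wins loom larger and near-certain wins are
# discounted relative to their objective probabilities; gamma = 1
# recovers linear (objective) weighting.
assert weight(0.1, 0.6) > 0.1
assert weight(0.9, 0.6) < 0.9
```

In model fitting, a more strongly curved weighting function (smaller gamma) would distort how learned action probabilities translate into choice, one candidate route from cognitive bias to suboptimal play.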
Affiliation(s)
- Matthew S M Lim: Research Department of Clinical, Educational and Health Psychology, University College London, UK
- Gerhard Jocham: Centre for Functional Magnetic Resonance Imaging of the Brain (fMRIB), University of Oxford, John Radcliffe Hospital, Oxford, UK
- Laurence T Hunt: Centre for Functional Magnetic Resonance Imaging of the Brain (fMRIB), University of Oxford, John Radcliffe Hospital, Oxford, UK
- Timothy E J Behrens: Centre for Functional Magnetic Resonance Imaging of the Brain (fMRIB), University of Oxford, John Radcliffe Hospital, Oxford, UK
7. Apitz T, Bunzeck N. Early effects of reward anticipation are modulated by dopaminergic stimulation. PLoS One 2014; 9:e108886. PMID: 25285436. PMCID: PMC4186816. DOI: 10.1371/journal.pone.0108886.
Abstract
The abilities to predict future rewards and assess the value of reward delivery are crucial aspects of adaptive behavior. While the mesolimbic system, including dopaminergic midbrain, ventral striatum and prefrontal cortex have long been associated with reward processing, recent studies also indicate a prominent role of early visual brain regions. However, the precise underlying neural mechanisms still remain unclear. To address this issue, we presented participants with visual cues predicting rewards of high and low magnitudes and probability (2×2 factorial design), while neural activity was scanned using magnetoencephalography. Importantly, one group of participants received 150 mg of the dopamine precursor levodopa prior to the experiment, while another group received a placebo. For the placebo group, neural signals of reward probability (but not magnitude) emerged at ∼100 ms after cue presentation at occipital sensors in the event-related magnetic fields. Importantly, these probability signals were absent in the levodopa group indicating a close link. Moreover, levodopa administration reduced oscillatory power in the high (20–30 Hz) and low (13–20 Hz) beta band during both reward anticipation and delivery. Taken together, our findings indicate that visual brain regions are involved in coding prospective reward probability but not magnitude and that these effects are modulated by dopamine.
Affiliation(s)
- Thore Apitz: Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Nico Bunzeck: Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; Department of Psychology, University of Lübeck, Lübeck, Germany
8. Daniel R, Pollmann S. A universal role of the ventral striatum in reward-based learning: evidence from human studies. Neurobiol Learn Mem 2014; 114:90-100. PMID: 24825620. DOI: 10.1016/j.nlm.2014.05.002.
Abstract
Reinforcement learning enables organisms to adjust their behavior in order to maximize rewards. Electrophysiological recordings of dopaminergic midbrain neurons have shown that they code the difference between actual and predicted rewards, i.e., the reward prediction error, in many species. This error signal is conveyed to both the striatum and cortical areas and is thought to play a central role in learning to optimize behavior. However, in human daily life rewards are diverse and often only indirect feedback is available. Here we explore the range of rewards that are processed by the dopaminergic system in human participants, and examine whether it is also involved in learning in the absence of explicit rewards. While results from electrophysiological recordings in humans are sparse, evidence linking dopaminergic activity to the metabolic signal recorded from the midbrain and striatum with functional magnetic resonance imaging (fMRI) is available. Results from fMRI studies suggest that the human ventral striatum (VS) receives valuation information for a diverse set of rewarding stimuli. These range from simple primary reinforcers such as juice rewards over abstract social rewards to internally generated signals on perceived correctness, suggesting that the VS is involved in learning from trial-and-error irrespective of the specific nature of provided rewards. In addition, we summarize evidence that the VS can also be implicated when learning from observing others, and in tasks that go beyond simple stimulus-action-outcome learning, indicating that the reward system is also recruited in more complex learning tasks.
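The reward prediction error described here (actual minus predicted reward) drives a simple value update and shrinks as outcomes become predictable, which is the signature carried by dopaminergic neurons. A minimal sketch with illustrative parameters:

```python
def simulate_learning(rewards, alpha=0.2):
    """Track the value estimate and the reward prediction error
    (actual minus predicted reward) across trials."""
    value, errors = 0.0, []
    for r in rewards:
        delta = r - value          # reward prediction error
        errors.append(delta)
        value += alpha * delta     # value moves toward the reward
    return value, errors

# With a constant reward, the prediction error decays geometrically as
# the outcome becomes fully predicted (illustrative learning rate).
value, errors = simulate_learning([1.0] * 10)
assert errors[0] == 1.0 and errors[-1] < 0.2
```

The review's point is that the "reward" entering this delta can be juice, money, social approval, or internally generated correctness signals, with the ventral striatum tracking the error in each case.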
Affiliation(s)
- Reka Daniel: Department of Experimental Psychology, Otto-von-Guericke-Universität Magdeburg, D-39016 Magdeburg, Germany; Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Stefan Pollmann: Department of Experimental Psychology, Otto-von-Guericke-Universität Magdeburg, D-39016 Magdeburg, Germany; Center for Behavioral Brain Sciences, D-39016 Magdeburg, Germany
9. Learning from feedback: The neural mechanisms of feedback processing facilitating better performance. Behav Brain Res 2014; 261:356-68. DOI: 10.1016/j.bbr.2013.12.043.
10. Hillman KL. Cost-benefit analysis: the first real rule of fight club? Front Neurosci 2013; 7:248. PMID: 24391531. PMCID: PMC3867679. DOI: 10.3389/fnins.2013.00248.
Abstract
Competition is ubiquitous among social animals. Vying against a conspecific to achieve a particular outcome often requires one to act aggressively, but this is a costly and inherently risky behavior. So why do we aggressively compete, or at the extreme, fight against others? Early work suggested that competitive aggression might stem from an innate aggressive tendency, emanating from subcortical structures. Later work highlighted key cortical regions that contribute toward an instrumental aggression network, one that is recruited or suppressed as needed to achieve a goal. Recent neuroimaging work hints that competitive aggression is upmost a cost-benefit decision, in that it appears to recruit many components of traditional, non-social decision-making networks. This review provides a historical glimpse into the neuroscience of competitive aggression, and proposes a conceptual advancement for studying competitive behavior by outlining how utility calculations of contested-for resources are skewed, pre- and post-competition. A basic multi-factorial model of utility assessment is proposed to account for competitive endowment effects that stem from the presence of peers, peer salience and disposition, and the tactical effort required for victory. In part, competitive aggression is a learned behavior that should only be repeated if positive outcomes are achieved. However, due to skewed utility assessments, deviations of associative learning occur. Hence truly careful cost-benefit analysis is warranted before choosing to vie against another.
11. High-learners present larger mid-frontal theta power and connectivity in response to incorrect performance feedback. J Neurosci 2013; 33:2029-38. PMID: 23365240. DOI: 10.1523/jneurosci.2565-12.2013.
Abstract
A crucial aspect of cognitive control and learning is the ability to integrate feedback, that is, to evaluate action outcomes and their deviations from the intended goals and to adjust behavior accordingly. However, how high-learners differ from low-learners in relation to feedback processing has not been characterized. Further, little is known about the underlying brain connectivity patterns during feedback processing. This study aimed to fill these gaps by analyzing electrical brain responses from healthy adult human participants while they performed a time estimation task with correct and incorrect feedback. As compared with low-learners, high-learners presented larger mid-frontal theta (4-8 Hz) oscillations and lower sensorimotor beta (17-24 Hz) oscillations in response to incorrect feedback. Further, high-learners showed larger theta connectivity from left central, associated with motor activity, to mid-frontal, associated with performance monitoring, immediately after feedback (0-0.3 s), followed by (from 0.3 to 0.6 s after feedback) a flux from mid-frontal to prefrontal, associated with executive functioning. We suggest that these results reflect two cognitive processes related to successful feedback processing: first, the obtained feedback is compared with the expected one, and second, the feedback history is updated based on this information. Our results also indicate that high- and low-learners differ not only on how they react to incorrect feedback, but also in relation to how their distant brain areas interact while processing both correct and incorrect feedback. This study demonstrates the neural underpinnings of individual differences in goal-directed adaptive behavior.
12. Neural correlates of reinforcement learning and social preferences in competitive bidding. J Neurosci 2013; 33:2137-46. PMID: 23365249. DOI: 10.1523/jneurosci.3095-12.2013.
Abstract
In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using MRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.
13. From modulated Hebbian plasticity to simple behavior learning through noise and weight saturation. Neural Netw 2012; 34:28-41. DOI: 10.1016/j.neunet.2012.06.005.
14. Janssen CP, Gray WD. When, what, and how much to reward in reinforcement learning-based models of cognition. Cogn Sci 2012; 36:333-58. PMID: 22257174. DOI: 10.1111/j.1551-6709.2011.01222.x.
Abstract
Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other interval of task performance), what (the objective function: e.g., performance time or performance accuracy), and how much (the magnitude: with binary, categorical, or continuous values). In this article, we explore the problem space of these three parameters in the context of a task whose completion entails some combination of 36 state-action pairs, where all intermediate states (i.e., after the initial state and prior to the end state) represent progressive but partial completion of the task. Different choices produce profoundly different learning paths and outcomes, with the strongest effect for moment. Unfortunately, there is little discussion in the literature of the effect of such choices. This absence is disappointing, as the choice of when, what, and how much needs to be made by a modeler for every learning model.
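The reward-design choices discussed (when and how much to reward) can be made concrete in a toy tabular Q-learning setup; the chain task, reward schemes, and parameters below are illustrative assumptions, not the 36-state task analyzed in the article:

```python
import random

def q_learn(reward_fn, n_states=5, episodes=300, alpha=0.1, gamma=0.9,
            eps=0.1, seed=0):
    """Tabular Q-learning on a chain: action 1 advances one state,
    action 0 stays put. reward_fn(old_state, new_state) sets the scheme."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            greedy = int(q[s][1] >= q[s][0])
            a = rng.randrange(2) if rng.random() < eps else greedy
            s_next = s + 1 if a == 1 else s
            r = reward_fn(s, s_next)
            bootstrap = max(q[s_next]) if s_next < n_states - 1 else 0.0
            q[s][a] += alpha * (r + gamma * bootstrap - q[s][a])
            s = s_next
    return q

# Two choices of "when": reward only at the end of the trial...
q_end = q_learn(lambda s, s2: 1.0 if s2 == 4 else 0.0)
# ...or after every subtask (each step of progress).
q_step = q_learn(lambda s, s2: 0.25 if s2 == s + 1 else 0.0)

# Early states carry more learned value under stepwise reward, since
# credit does not have to propagate back from the end of the trial.
assert q_step[0][1] > q_end[0][1]
```

Even in this toy, the moment of reward reshapes the learned value landscape, which is the article's core warning: these modeler choices are consequential and rarely reported.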
Affiliation(s)
- Christian P Janssen: UCL Interaction Centre, University College London, Gower Street, London, UK
15. Morris SE, Holroyd CB, Mann-Wrobel MC, Gold JM. Dissociation of response and feedback negativity in schizophrenia: electrophysiological and computational evidence for a deficit in the representation of value. Front Hum Neurosci 2011; 5:123. PMID: 22065618. PMCID: PMC3203413. DOI: 10.3389/fnhum.2011.00123.
Abstract
Contrasting theories of schizophrenia propose that the disorder is characterized by a deficit in phasic changes in dopamine activity in response to ongoing events or, alternatively, by a weakness in the representation of the value of responses. Schizophrenia patients have reliably reduced brain activity following incorrect responses but other research suggests that they may have intact feedback-related potentials, indicating that the impairment may be specifically response-related. We used event-related brain potentials and computational modeling to examine this issue by comparing the neural response to outcomes with the neural response to behaviors that predict outcomes in patients with schizophrenia and psychiatrically healthy comparison subjects. We recorded feedback-related activity in a passive gambling task and a time estimation task and error-related activity in a flanker task. Patients' brain activity following an erroneous response was reduced compared to comparison subjects but feedback-related activity did not differ between groups. To test hypotheses about the possible causes of this pattern of results, we used computational modeling of the electrophysiological data to simulate the effects of an overall reduction in patients' sensitivity to feedback, selective insensitivity to positive or negative feedback, reduced learning rate, and a decreased representation of the value of the response given the stimulus on each trial. The results of the computational modeling suggest that schizophrenia patients exhibit weakened representation of response values, possibly due to failure of the basal ganglia to strongly associate stimuli with appropriate response alternatives.
Affiliation(s)
- Sarah E. Morris
- VISN 5 Mental Illness Research, Education, and Clinical Center, Baltimore, MD, USA
- Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA
- Clay B. Holroyd
- Department of Psychology, University of Victoria, Victoria, BC, Canada
- James M. Gold
- Maryland Psychiatric Research Center, University of Maryland School of Medicine, Catonsville, MD, USA
16
Rametti G, Carrillo B, Gómez-Gil E, Junque C, Zubiarre-Elorza L, Segovia S, Gomez Á, Guillamon A. The microstructure of white matter in male to female transsexuals before cross-sex hormonal treatment. A DTI study. J Psychiatr Res 2011; 45:949-54. [PMID: 21195418 DOI: 10.1016/j.jpsychires.2010.11.007] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/03/2010] [Revised: 10/26/2010] [Accepted: 11/10/2010] [Indexed: 11/24/2022]
Abstract
BACKGROUND Diffusion tensor imaging (DTI) has been shown to be sensitive in detecting white matter differences between sexes. Before cross-sex hormone treatment, female to male transsexuals (FtM) differ from females but not from males in several brain fibers. The purpose of this paper is to investigate whether white matter patterns in male to female (MtF) transsexuals before commencing cross-sex hormone treatment are more similar to those of their biological sex or to those of their gender identity. METHOD DTI was performed in 18 MtF transsexuals and 19 male and 19 female controls scanned with a 3 T Trio Tim Magnetom. Fractional anisotropy (FA) was computed for the white matter of the whole brain and spatially analyzed using Tract-Based Spatial Statistics. RESULTS MtF transsexuals differed from both male and female controls bilaterally in the superior longitudinal fasciculus, the right anterior cingulum, the right forceps minor, and the right corticospinal tract. CONCLUSIONS Our results show that the white matter microstructure pattern in untreated MtF transsexuals falls halfway between the patterns of male and female controls. The nature of these differences suggests that some fasciculi do not complete the masculinization process in MtF transsexuals during brain development.
17
Mars RB, Shea NJ, Kolling N, Rushworth MFS. Model-based analyses: Promises, pitfalls, and example applications to the study of cognitive control. Q J Exp Psychol (Hove) 2011; 65:252-67. [PMID: 20437297 PMCID: PMC3335278 DOI: 10.1080/17470211003668272] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
We discuss a recent approach to investigating cognitive control, which has the potential to deal with some of the challenges inherent in this endeavour. In a model-based approach, the researcher defines a formal, computational model that performs the task at hand and whose performance matches that of a research participant. The internal variables in such a model might then be taken as proxies for latent variables computed in the brain. We discuss the potential advantages and pitfalls of such an approach for the study of the neural underpinnings of cognitive control, and we make explicit the assumptions underlying the interpretation of data obtained using this approach.
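The model-based workflow described here can be sketched end-to-end for a simple reinforcement-learning case: simulate choice behavior, fit the model's free parameter to the observed choices, then regenerate the model's internal variable (the trial-wise prediction error) for use as a proxy regressor against neural data. The bandit task, softmax policy, and grid-search fit below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def run_agent(alpha, beta=3.0, n_trials=300, p_reward=(0.8, 0.2)):
    """Simulate a learner on a two-armed bandit with a softmax policy."""
    q = np.zeros(2)
    choices, rewards = [], []
    for _ in range(n_trials):
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))   # P(choose arm 1)
        c = int(rng.random() < p1)
        r = float(rng.random() < p_reward[c])
        choices.append(c)
        rewards.append(r)
        q[c] += alpha * (r - q[c])
    return np.array(choices), np.array(rewards)

choices, rewards = run_agent(alpha=0.3)    # "participant" with true learning rate 0.3

def neg_log_lik(alpha, beta=3.0):
    """Likelihood of the observed choices under a candidate learning rate."""
    q = np.zeros(2)
    nll = 0.0
    for c, r in zip(choices, rewards):
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        nll -= np.log((p1 if c == 1 else 1.0 - p1) + 1e-12)
        q[c] += alpha * (r - q[c])
    return nll

alphas = np.linspace(0.05, 0.95, 19)
alpha_hat = float(alphas[np.argmin([neg_log_lik(a) for a in alphas])])

# Trial-wise prediction errors under the fitted model: the latent variables
# that would be entered as parametric regressors for EEG/fMRI analysis.
q = np.zeros(2)
pes = []
for c, r in zip(choices, rewards):
    pes.append(r - q[c])
    q[c] += alpha_hat * (r - q[c])
pes = np.array(pes)
```

The key assumption the paper makes explicit is the last step: treating `pes` as a stand-in for a quantity the brain actually computes is only licensed insofar as the fitted model captures the participant's behavior.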
Affiliation(s)
- Rogier B Mars
- Department of Experimental Psychology, University of Oxford, Oxford, UK.
18
Mahmoudi B, Sanchez JC. A symbiotic brain-machine interface through value-based decision making. PLoS One 2011; 6:e14760. [PMID: 21423797 PMCID: PMC3056711 DOI: 10.1371/journal.pone.0014760] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2010] [Accepted: 01/23/2011] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND In the development of Brain Machine Interfaces (BMIs), there is a great need to enable users to interact with changing environments during the activities of daily life. It is expected that the number and scope of the learning tasks encountered during interaction with the environment, as well as the pattern of brain activity, will vary over time. These conditions, in addition to neural reorganization, pose a challenge to decoding neural commands for BMIs. We have developed a new BMI framework in which a computational agent symbiotically decoded users' intended actions by utilizing both motor commands and goal information directly from the brain through a continuous Perception-Action-Reward Cycle (PARC). METHODOLOGY The control architecture designed was based on Actor-Critic learning, which is a PARC-based reinforcement learning method. Our neurophysiology studies in rat models suggested that the Nucleus Accumbens (NAcc) contained a rich representation of goal information in terms of predicting the probability of earning reward, and that this could be translated into an evaluative feedback for adaptation of the decoder with high precision. Simulated neural control experiments showed that the system was able to maintain high performance in decoding neural motor commands during novel tasks or in the presence of reorganization in the neural input. We then implanted a dual micro-wire array in the primary motor cortex (M1) and the NAcc of the rat brain and implemented a full closed-loop system in which robot actions were decoded from the single unit activity in M1 based on an evaluative feedback that was estimated from the NAcc. CONCLUSIONS Our results suggest that adapting the BMI decoder with an evaluative feedback that is directly extracted from the brain is a possible solution to the problem of operating BMIs in changing environments with dynamic neural signals. During closed-loop control, the agent was able to solve a reaching task by capturing the action and reward interdependency in the brain.
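The Actor-Critic arrangement can be caricatured in a few lines: an "actor" decodes actions from discrete neural patterns, and a scalar evaluative signal (standing in for the NAcc-derived critic) adapts it. The one-hot patterns, learning rate, and feedback rule below are illustrative assumptions, not the paper's decoder:

```python
import numpy as np

rng = np.random.default_rng(2)

n_states, n_actions = 4, 4
targets = rng.permutation(n_actions)    # which action each neural pattern should trigger
W = np.zeros((n_states, n_actions))     # the "actor": maps M1 pattern -> action values

def evaluative_feedback(state, action):
    """Stand-in for the critic: +1 if the decoded action earns reward, -1 if not.
    In the real closed-loop system this evaluation is estimated from NAcc
    activity rather than supplied by the environment."""
    return 1.0 if action == targets[state] else -1.0

n_trials, lr, eps = 400, 0.2, 0.1
hits_late = 0
for trial in range(n_trials):
    s = int(rng.integers(n_states))                  # observed "M1 pattern"
    if rng.random() < eps:                           # occasional exploration
        a = int(rng.integers(n_actions))
    else:                                            # otherwise act greedily
        a = int(np.argmax(W[s]))
    fb = evaluative_feedback(s, a)
    W[s, a] += lr * fb                               # reinforce or punish the taken action
    if trial >= n_trials - 100:
        hits_late += fb > 0
```

Because the critic's signal, not a hand-labeled target, drives the updates, the same loop keeps working if `targets` changes mid-session, which is the adaptation property the framework is after.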
Affiliation(s)
- Babak Mahmoudi
- Department of Biomedical Engineering, University of Miami, Coral Gables, Florida, United States of America.
19
Decoding different roles for vmPFC and dlPFC in multi-attribute decision making. Neuroimage 2010; 56:709-15. [PMID: 20510371 DOI: 10.1016/j.neuroimage.2010.05.058] [Citation(s) in RCA: 115] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2010] [Revised: 04/30/2010] [Accepted: 05/20/2010] [Indexed: 11/21/2022] Open
Abstract
In everyday life, successful decision making requires precise representations of expected values. However, for most behavioral options more than one attribute can be relevant in order to predict the expected reward. Thus, to make good or even optimal choices the reward predictions of multiple attributes need to be integrated into a combined expected value. Importantly, the individual attributes of such multi-attribute objects can agree or disagree in their reward prediction. Here we address where the brain encodes the combined reward prediction (averaged across attributes) and where it encodes the variability of the value predictions of the individual attributes. We acquired fMRI data while subjects performed a task in which they had to integrate reward predictions from multiple attributes into a combined value. Using time-resolved pattern recognition techniques (support vector regression) we find that (1) the combined value is encoded in distributed fMRI patterns in the ventromedial prefrontal cortex (vmPFC) and that (2) the variability of value predictions of the individual attributes is encoded in the dorsolateral prefrontal cortex (dlPFC). The combined value could be used to guide choices, whereas the variability of the value predictions of individual attributes indicates an ambiguity that results in an increased difficulty of the value-integration. These results demonstrate that the different features defining multi-attribute objects are encoded in non-overlapping brain regions and therefore suggest different roles for vmPFC and dlPFC in multi-attribute decision making.
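The pattern-recognition step can be sketched in a dependency-free way; here ridge regression stands in for the support vector regression used in the study, and the simulated "voxel" patterns encoding a combined value are an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(3)

n_trials, n_voxels = 200, 50
attr = rng.uniform(0.0, 1.0, size=(n_trials, 2))   # two reward-predictive attributes
combined = attr.mean(axis=1)                        # the integrated value to be decoded

# Simulated "vmPFC" data: a fixed multivoxel code for combined value plus noise.
code = rng.normal(size=n_voxels)
X = np.outer(combined, code) + 0.5 * rng.normal(size=(n_trials, n_voxels))

train, test = np.arange(150), np.arange(150, 200)

# Linear decoder (ridge regression as a stand-in for support vector regression).
lam = 1.0
Xt = X[train]
w = np.linalg.solve(Xt.T @ Xt + lam * np.eye(n_voxels), Xt.T @ combined[train])
pred = X[test] @ w
r = float(np.corrcoef(pred, combined[test])[0, 1])  # held-out decoding accuracy
```

The same train/predict logic, applied separately at each time point and to a second target variable (the across-attribute variability), is what lets the study dissociate what vmPFC and dlPFC patterns carry.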
20
Hämmerer D, Li SC, Müller V, Lindenberger U. Life span differences in electrophysiological correlates of monitoring gains and losses during probabilistic reinforcement learning. J Cogn Neurosci 2010; 23:579-92. [PMID: 20377358 DOI: 10.1162/jocn.2010.21475] [Citation(s) in RCA: 136] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
By recording the feedback-related negativity (FRN) in response to gains and losses, we investigated the contribution of outcome monitoring mechanisms to age-associated differences in probabilistic reinforcement learning. Specifically, we assessed the difference of the monitoring reactions to gains and losses to investigate the monitoring of outcomes according to task-specific goals across the life span. The FRN and the behavioral indicators of learning were measured in a sample of 44 children, 45 adolescents, 46 younger adults, and 44 older adults. The amplitude of the FRN after gains and losses was found to decrease monotonically from childhood to old age. Furthermore, relative to adolescents and younger adults, both children and older adults (a) showed smaller differences between the FRN after losses and the FRN after gains, indicating a less differentiated classification of outcomes on the basis of task-specific goals; (b) needed more trials to learn from choice outcomes, particularly when differences in reward likelihood between the choices were small; and (c) learned less from gains than from losses. We suggest that the relatively greater loss sensitivity among children and older adults may reflect ontogenetic changes in dopaminergic neuromodulation.
21
Seger CA, Peterson EJ, Cincotta CM, Lopez-Paniagua D, Anderson CW. Dissociating the contributions of independent corticostriatal systems to visual categorization learning through the use of reinforcement learning modeling and Granger causality modeling. Neuroimage 2009; 50:644-56. [PMID: 19969091 DOI: 10.1016/j.neuroimage.2009.11.083] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2009] [Revised: 11/02/2009] [Accepted: 11/26/2009] [Indexed: 11/15/2022] Open
Abstract
We dissociated the contributions to learning of four corticostriatal "loops" (interacting striatal and cortical regions): motor (putamen and motor cortex), visual (posterior caudate and visual cortex), executive (anterior caudate and prefrontal cortex), and motivational (ventral striatum and ventromedial frontal cortex). Subjects learned to categorize individual repeated images into one of two arbitrary categories via trial and error. We identified (1) regions sensitive to correct categorization, categorization learning, and feedback valence; (2) regions sensitive to prediction error (violation of feedback expectancy) and reward prediction (expected feedback associated with category response) using reinforcement learning modeling; and (3) directed influences between regions using Granger causality modeling. Each loop showed a unique pattern of sensitivity to each of these factors. Both the motor and visual loops were involved in acquisition of categorization ability: activity during correct categorization increased across learning and was sensitive to reward prediction. However, the posterior caudate received directed influence from visual cortex, whereas the putamen exerted directed influence on motor cortex. The motivational and executive loops were involved in feedback processing: both regions were sensitive to feedback valence, which interacted with learning across scans. However, the motivational loop activity reflected prediction error, whereas executive loop activity reflected reward prediction, consistent with the executive loop role in integrating reward and action. Granger causality modeling found directed influences between striatal and cortical regions within each loop. Across loops, the motor loop exerted directed influence on the executive loop which is consistent with the role of the executive loop in integrating feedback with stimulus-response history.
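Granger causality modeling of the kind used here asks whether the past of one regional signal improves prediction of another beyond that signal's own past. A minimal lag-1 version on simulated "regional" time series (the variance-ratio statistic and toy signals are illustrative, not the study's fMRI pipeline):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate two regional time series where x drives y at a lag of one sample.
n = 2000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

def granger_gain(src, dst, lag=1):
    """Log ratio of residual variances: autoregressive model of `dst` alone
    versus the same model augmented with the past of `src`. Values clearly
    above zero suggest a directed influence in the Granger sense."""
    d = dst[lag:]
    # Restricted model: dst's own past only.
    b0 = np.linalg.lstsq(dst[:-lag, None], d, rcond=None)[0]
    res0 = d - dst[:-lag] * b0[0]
    # Full model: dst's past plus src's past.
    s = np.column_stack([dst[:-lag], src[:-lag]])
    b1 = np.linalg.lstsq(s, d, rcond=None)[0]
    res1 = d - s @ b1
    return float(np.log(res0.var() / res1.var()))

gain_xy = granger_gain(x, y)   # x -> y: should be clearly positive
gain_yx = granger_gain(y, x)   # y -> x: should be near zero
```

The asymmetry between `gain_xy` and `gain_yx` is the directed-influence signature; the study applies this logic between striatal and cortical regions within each loop.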
Affiliation(s)
- Carol A Seger
- Department of Psychology, Colorado State University, Fort Collins, CO 80523, USA.
22
van den Bos W, Güroğlu B, van den Bulk BG, Rombouts SARB, Crone EA. Better than expected or as bad as you thought? The neurocognitive development of probabilistic feedback processing. Front Hum Neurosci 2009; 3:52. [PMID: 20140268 PMCID: PMC2816174 DOI: 10.3389/neuro.09.052.2009] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2009] [Accepted: 11/03/2009] [Indexed: 11/30/2022] Open
Abstract
Learning from feedback lies at the foundation of adaptive behavior. Two prior neuroimaging studies have suggested that there are qualitative differences in how children and adults use feedback by demonstrating that dorsolateral prefrontal cortex (DLPFC) and parietal cortex were more active after negative feedback for adults, but after positive feedback for children. In the current study we used functional magnetic resonance imaging (fMRI) to test whether this difference is related to valence or informative value of the feedback by examining neural responses to negative and positive feedback while applying probabilistic rules. In total, 67 healthy volunteers between ages 8 and 22 participated in the study (8–11 years, n = 18; 13–16 years, n = 27; 18–22 years, n = 22). Behavioral comparisons showed that all participants were able to learn probabilistic rules equally well. DLPFC and dorsal anterior cingulate cortex were more active in younger children following positive feedback and in adults following negative feedback, but only when exploring alternative rules, not when applying the most advantageous rules. These findings suggest that developmental differences in neural responses to feedback are not related to valence per se, but that there is an age-related change in processing learning signals with different informative value.
23
Cohen MX, van Gaal S, Ridderinkhof KR, Lamme VAF. Unconscious errors enhance prefrontal-occipital oscillatory synchrony. Front Hum Neurosci 2009; 3:54. [PMID: 19956401 PMCID: PMC2786300 DOI: 10.3389/neuro.09.054.2009] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2009] [Accepted: 11/05/2009] [Indexed: 12/03/2022] Open
Abstract
The medial prefrontal cortex (MFC) is critical for our ability to learn from previous mistakes. Here we provide evidence that neurophysiological oscillatory long-range synchrony is a mechanism of post-error adaptation that occurs even without conscious awareness of the error. During a visually signaled Go/No-Go task in which half of the No-Go cues were masked and thus not consciously perceived, response errors enhanced tonic (i.e., over 1–2 s) oscillatory synchrony between MFC and occipital cortex (OCC) leading up to and during the subsequent trial. Spectral Granger causality analyses demonstrated that MFC → OCC directional synchrony was enhanced during trials following both conscious and unconscious errors, whereas transient stimulus-induced occipital → MFC directional synchrony was independent of errors in the previous trial. Further, the strength of pre-trial MFC-occipital synchrony predicted individual differences in task performance. Together, these findings suggest that synchronous neurophysiological oscillations are a plausible mechanism of MFC-driven cognitive control that is independent of conscious awareness.
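Oscillatory long-range synchrony of this kind is commonly quantified with a phase-locking value (PLV). A self-contained sketch using an FFT-based Hilbert transform; the simulated 6 Hz "MFC" and "occipital" channels and their noise levels are illustrative assumptions, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(5)

fs = 250
t = np.arange(0, 4.0, 1 / fs)            # 4 s of toy "EEG" at 250 Hz

# Two channels sharing a 6 Hz rhythm at a fixed phase lag, plus noise;
# a third channel is pure noise and serves as a no-synchrony control.
ch_mfc = np.sin(2 * np.pi * 6 * t) + 0.5 * rng.normal(size=t.size)
ch_occ = np.sin(2 * np.pi * 6 * t - 0.8) + 0.5 * rng.normal(size=t.size)
ch_noise = rng.normal(size=t.size)

def analytic_phase(sig):
    """Instantaneous phase via an FFT-based Hilbert transform."""
    n = sig.size
    spec = np.fft.fft(sig)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:n // 2] = 2.0
    h[n // 2] = 1.0                       # n is even here
    return np.angle(np.fft.ifft(spec * h))

def plv(a, b):
    """Phase-locking value: 1 = perfectly consistent phase lag, ~0 = none."""
    dphi = analytic_phase(a) - analytic_phase(b)
    return float(np.abs(np.mean(np.exp(1j * dphi))))

plv_sync = plv(ch_mfc, ch_occ)
plv_null = plv(ch_mfc, ch_noise)
```

A consistent phase lag yields a PLV well above the noise-channel baseline even though the two channels are not in phase; directed measures such as the spectral Granger causality used in the study additionally ask which signal leads.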
Affiliation(s)
- Michael X Cohen
- Amsterdam Center for the Study of Adaptive Control in Brain and Behavior, Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands.
24
Brain and autonomic association accompanying stochastic decision-making. Neuroimage 2009; 49:1024-37. [PMID: 19647796 DOI: 10.1016/j.neuroimage.2009.07.060] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2008] [Revised: 07/16/2009] [Accepted: 07/23/2009] [Indexed: 12/30/2022] Open
Abstract
To examine the functional association between brain and autonomic activities accompanying decision-making, we simultaneously recorded regional cerebral blood flow using (15)O-water positron emission tomography and event-related brain potentials (ERPs) time-locked to feedback of reward and punishment, as well as cardiovascular parameters, during a stochastic decision-making task. We manipulated the uncertainty of outcomes in the task; specifically, we compared a condition with high predictability of reward/punishment (contingent-reward condition) and a condition with low predictability of reward/punishment (random-reward condition). The anterior cingulate cortex (ACC) was commonly activated in both conditions. Compared with the contingent-reward condition, the orbitofrontal and right dorsolateral prefrontal cortices and dorsal striatum were activated in the random-reward condition, where subjects had to continue to seek contingency between stimuli and reward/punishment. Activation of these brain regions correlated with a positive component of ERPs locked to feedback signals (feedback-related positivity), which showed an association with behavioral decision-making in the contingent-reward condition. Furthermore, cardiovascular responses were attenuated in the random-reward condition, where continuous attention and contingency monitoring were needed, and such attenuation of cardiovascular responses was mediated by vagal activity that was governed by the rostral ACC. These findings suggest that the prefrontal-striatal network provides a neural basis for decision-making and modulation over the peripheral autonomic activity accompanying decision-making.
25
Schmid M. Reinforcing Motor Re-Training and Rehabilitation through Games: A Machine-Learning Perspective. FRONTIERS IN NEUROENGINEERING 2009; 2:3. [PMID: 19430596 PMCID: PMC2679159 DOI: 10.3389/neuro.16.003.2009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/24/2008] [Indexed: 01/19/2023]
Affiliation(s)
- Maurizio Schmid
- Department of Applied Electronics, Roma Tre University, Rome, Italy
26
Ragland JD, Cools R, Frank M, Pizzagalli DA, Preston A, Ranganath C, Wagner AD. CNTRICS final task selection: long-term memory. Schizophr Bull 2009; 35:197-212. [PMID: 18927344 PMCID: PMC2643960 DOI: 10.1093/schbul/sbn134] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
Abstract
Long-term memory (LTM) is a multifactorial construct, composed of different stages of information processing and different cognitive operations that are mediated by distinct neural systems, some of which may be more responsible for the marked memory problems that limit the daily function of individuals with schizophrenia. From the outset of the CNTRICS initiative, this multidimensionality was appreciated, and an effort was made to identify the specific memory constructs and task paradigms that hold the most promise for immediate translational development. During the second CNTRICS meeting, the LTM group identified item encoding and retrieval and relational encoding and retrieval as key constructs. This article describes the process that the LTM group went through in the third and final CNTRICS meeting to select nominated tasks within the 2 LTM constructs and within a reinforcement learning construct that were judged most promising for immediate development. This discussion is followed by each nominating authors' description of their selected task paradigm, ending with some thoughts about future directions.
Affiliation(s)
- John D. Ragland
- Department of Psychiatry and Behavioral Sciences, UC Davis Imaging Research Center, University of California at Davis, 4701 X Street, Sacramento, CA 95817. To whom correspondence should be addressed; tel: 916-734-5802, fax: 916-734-8750, e-mail:
- Alison Preston
- Department of Psychology and Center for Learning and Memory, University of Texas at Austin
- Anthony D. Wagner
- Department of Psychology and Neurosciences Program, Stanford University