51
|
Walsh MM, Anderson JR. Navigating complex decision spaces: Problems and paradigms in sequential choice. Psychol Bull 2014; 140:466-86. [PMID: 23834192 PMCID: PMC4309984 DOI: 10.1037/a0033455] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 01/26/2023]
Abstract
To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides 2 general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior but they also provide a useful framework for understanding neural reward valuation and action selection.
Affiliation(s)
- Matthew M. Walsh
- Air Force Research Laboratory, Wright-Patterson Air Force Base, OH 45433
- John R. Anderson
- Carnegie Mellon University, Department of Psychology, Pittsburgh, PA 15213
|
52
|
The topographical N170: Electrophysiological evidence of a neural mechanism for human spatial navigation. Biol Psychol 2013; 94:90-105. [DOI: 10.1016/j.biopsycho.2013.05.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 06/11/2012] [Revised: 05/02/2013] [Accepted: 05/03/2013] [Indexed: 11/24/2022]
|
53
|
High temporal discounters overvalue immediate rewards rather than undervalue future rewards: an event-related brain potential study. Cogn Affect Behav Neurosci 2013; 13:36-45. [PMID: 22983745 DOI: 10.3758/s13415-012-0122-x] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 11/08/2022]
Abstract
Impulsivity is characterized in part by heightened sensitivity to immediate relative to future rewards. Although previous research has suggested that "high discounters" in intertemporal choice tasks tend to prefer immediate over future rewards because they devalue the latter, it remains possible that they instead overvalue immediate rewards. To investigate this question, we recorded the reward positivity, a component of the event-related brain potential (ERP) associated with reward processing, with participants engaged in a task in which they received both immediate and future rewards and nonrewards. The participants also completed a temporal discounting task without ERP recording. We found that immediate but not future rewards elicited the reward positivity. High discounters also produced larger reward positivities to immediate rewards than did low discounters, indicating that high discounters relatively overvalued immediate rewards. These findings suggest that high discounters may be more motivated than low discounters to work for monetary rewards, irrespective of the time of arrival of the incentives.
|
54
|
Schulreich S, Pfabigan DM, Derntl B, Sailer U. Fearless Dominance and reduced feedback-related negativity amplitudes in a time-estimation task - further neuroscientific evidence for dual-process models of psychopathy. Biol Psychol 2013; 93:352-63. [PMID: 23607997 PMCID: PMC3688084 DOI: 10.1016/j.biopsycho.2013.04.004] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 08/27/2012] [Revised: 03/19/2013] [Accepted: 04/10/2013] [Indexed: 10/27/2022]
Abstract
Dual-process models of psychopathy postulate two etiologically relevant processes. Their involvement in feedback processing and its neural correlates has not been investigated so far. Multi-channel EEG was collected while healthy female volunteers performed a time-estimation task and received negative or positive feedback in the form of signs or emotional faces. The affective-interpersonal factor Fearless Dominance, but not Self-Centered Impulsivity, was associated with reduced feedback-related negativity (FRN) amplitudes. This neural dissociation extends previous findings on the impact of psychopathy on feedback processing and further highlights the importance of distinguishing psychopathic traits and extending previous (neuroscientific) models of psychopathy.
Affiliation(s)
- Stefan Schulreich
- Languages of Emotion, Cluster of Excellence at Freie Universität Berlin, 14195 Berlin, Germany.
|
55
|
Osinsky R, Mussel P, Ohrlein L, Hewig J. A neural signature of the creation of social evaluation. Soc Cogn Affect Neurosci 2013; 9:731-6. [PMID: 23547246 DOI: 10.1093/scan/nst051] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 12/22/2022] Open
Abstract
Previous research has shown that receiving an unfair monetary offer in economic bargaining elicits a so-called feedback negativity (FN). This scalp-recorded brain potential probably reflects a bad-vs-good evaluation in the medial frontal cortex and has been linked to fundamental processes of reinforcement learning. In the present study, we investigated whether the evaluative mechanism indexed by the FN is also involved in learning who is an unfair vs fair bargaining partner. An electroencephalogram was recorded while participants completed a computerized version of the Ultimatum Game, repeatedly receiving fair or unfair monetary offers from alleged other participants. Some of these proposers were either always fair or always unfair in their offers. In each trial, participants first saw a portrait picture of the respective proposer before the monetary offer was presented. Therefore, the faces could be used as predictive cues for the fairness of the pending offers. We found that not only unfair offers themselves induced a FN, but also (over the task) faces of unfair proposers. Thus, when interaction partners repeatedly behave in an unfair way, their faces acquire a negative valence, which manifests in a basal neural mechanism of bad-vs-good evaluation.
Affiliation(s)
- Roman Osinsky
- Department of Psychology I, Julius-Maximilians-University Würzburg, 97070 Würzburg, Germany and Department of Psychology, Friedrich-Schiller-University Jena, 07743 Jena, Germany
- Patrick Mussel
- Department of Psychology I, Julius-Maximilians-University Würzburg, 97070 Würzburg, Germany and Department of Psychology, Friedrich-Schiller-University Jena, 07743 Jena, Germany
- Linda Ohrlein
- Department of Psychology I, Julius-Maximilians-University Würzburg, 97070 Würzburg, Germany and Department of Psychology, Friedrich-Schiller-University Jena, 07743 Jena, Germany
- Johannes Hewig
- Department of Psychology I, Julius-Maximilians-University Würzburg, 97070 Würzburg, Germany and Department of Psychology, Friedrich-Schiller-University Jena, 07743 Jena, Germany
|
56
|
An electrophysiological monetary incentive delay (e-MID) task: A way to decompose the different components of neural response to positive and negative monetary reinforcement. J Neurosci Methods 2012; 209:40-9. [DOI: 10.1016/j.jneumeth.2012.05.015] [Citation(s) in RCA: 103] [Impact Index Per Article: 8.6] [Received: 01/10/2012] [Revised: 05/02/2012] [Accepted: 05/15/2012] [Indexed: 11/23/2022]
|
57
|
Walsh MM, Anderson JR. Learning from experience: event-related potential correlates of reward processing, neural adaptation, and behavioral choice. Neurosci Biobehav Rev 2012; 36:1870-84. [PMID: 22683741 DOI: 10.1016/j.neubiorev.2012.05.008] [Citation(s) in RCA: 366] [Impact Index Per Article: 30.5] [Received: 02/26/2012] [Revised: 05/17/2012] [Accepted: 05/21/2012] [Indexed: 11/30/2022]
Abstract
To behave adaptively, we must learn from the consequences of our actions. Studies using event-related potentials (ERPs) have been informative with respect to the question of how such learning occurs. These studies have revealed a frontocentral negativity termed the feedback-related negativity (FRN) that appears after negative feedback. According to one prominent theory, the FRN tracks the difference between the values of actual and expected outcomes, or reward prediction errors. As such, the FRN provides a tool for studying reward valuation and decision making. We begin this review by examining the neural significance of the FRN. We then examine its functional significance. To understand the cognitive processes that occur when the FRN is generated, we explore variables that influence its appearance and amplitude. Specifically, we evaluate four hypotheses: (1) the FRN encodes a quantitative reward prediction error; (2) the FRN is evoked by outcomes and by stimuli that predict outcomes; (3) the FRN and behavior change with experience; and (4) the system that produces the FRN is maximally engaged by volitional actions.
Affiliation(s)
- Matthew M Walsh
- Carnegie Mellon University, Department of Psychology, Baker Hall 342c, Pittsburgh, PA 15213, United States.
|
58
|
Herting MM, Nagel BJ. Aerobic fitness relates to learning on a virtual Morris Water Task and hippocampal volume in adolescents. Behav Brain Res 2012; 233:517-25. [PMID: 22610054 DOI: 10.1016/j.bbr.2012.05.012] [Citation(s) in RCA: 90] [Impact Index Per Article: 7.5] [Received: 03/27/2012] [Revised: 05/08/2012] [Accepted: 05/10/2012] [Indexed: 11/19/2022]
Abstract
In rodents, exercise increases hippocampal neurogenesis and allows for better learning and memory performance on water maze tasks. While exercise has also been shown to be beneficial for the brain and behavior in humans, no study has examined how exercise impacts spatial learning using a directly translational water maze task, or if these relationships exist during adolescence--a developmental period which the animal literature has shown to be especially vulnerable to exercise effects. In this study, we investigated the influence of aerobic fitness on hippocampal size and subsequent learning and memory, including visuospatial memory using a human analogue of the Morris Water Task, in 34 adolescents. Results showed that higher aerobic fitness predicted better learning on the virtual Morris Water Task and larger hippocampal volumes. No relationship between virtual Morris Water Task memory recall and aerobic fitness was detected. Aerobic fitness, however, did not relate to global brain volume or verbal learning, which might suggest some specificity of the influence of aerobic fitness on the adolescent brain. This study provides a direct translational approach to the existing animal literature on exercise, as well as adds to the sparse research that exists on how aerobic exercise impacts the developing human brain and memory.
Affiliation(s)
- Megan M Herting
- Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, OR 97239, USA.
|
59
|
Osinsky R, Hewig J, Alexander N, Hennig J. COMT Val158Met genotype and the common basis of error and conflict monitoring. Brain Res 2012; 1452:108-18. [DOI: 10.1016/j.brainres.2012.02.054] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 12/21/2011] [Revised: 02/14/2012] [Accepted: 02/22/2012] [Indexed: 10/28/2022]
|
60
|
Warren CM, Holroyd CB. The Impact of Deliberative Strategy Dissociates ERP Components Related to Conflict Processing vs. Reinforcement Learning. Front Neurosci 2012; 6:43. [PMID: 22493568 PMCID: PMC3318225 DOI: 10.3389/fnins.2012.00043] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 11/07/2011] [Accepted: 03/19/2012] [Indexed: 11/13/2022] Open
Abstract
We applied the event-related brain potential (ERP) technique to investigate the involvement of two neuromodulatory systems in learning and decision making: The locus coeruleus-norepinephrine system (NE system) and the mesencephalic dopamine system (DA system). We have previously presented evidence that the N2, a negative deflection in the ERP elicited by task-relevant events that begins approximately 200 ms after onset of the eliciting stimulus and that is sensitive to low-probability events, is a manifestation of cortex-wide noradrenergic modulation recruited to facilitate the processing of unexpected stimuli. Further, we hold that the impact of DA reinforcement learning signals on the anterior cingulate cortex (ACC) produces a component of the ERP called the feedback-related negativity (FRN). The N2 and the FRN share a similar time range, a similar topography, and similar antecedent conditions. We varied factors related to the degree of cognitive deliberation across a series of experiments to dissociate these two ERP components. Across four experiments we varied the demand for a deliberative strategy, from passively watching feedback, to more complex/challenging decision tasks. Consistent with our predictions, the FRN was largest in the experiment involving active learning and smallest in the experiment involving passive learning whereas the N2 exhibited the opposite effect. Within each experiment, when subjects attended to color, the N2 was maximal at frontal-central sites, and when they attended to gender it was maximal over lateral-occipital areas, whereas the topology of the FRN was frontal-central in both task conditions. We conclude that both the DA system and the NE system act in concert when learning from rewards that vary in expectedness, but that the DA system is relatively more exercised when subjects are relatively more engaged by the learning task.
|
61
|
Holroyd CB, HajiHosseini A, Baker TE. ERPs and EEG oscillations, best friends forever: comment on Cohen et al. Trends Cogn Sci 2012; 16:192; author reply 193. [DOI: 10.1016/j.tics.2012.02.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 02/23/2012] [Accepted: 02/23/2012] [Indexed: 11/28/2022]
|
62
|
Cavanagh JF, Zambrano-Vazquez L, Allen JJB. Theta lingua franca: a common mid-frontal substrate for action monitoring processes. Psychophysiology 2011; 49:220-38. [PMID: 22091878 DOI: 10.1111/j.1469-8986.2011.01293.x] [Citation(s) in RCA: 451] [Impact Index Per Article: 34.7] [Received: 03/19/2011] [Accepted: 07/25/2011] [Indexed: 01/10/2023]
Abstract
We present evidence that a multitude of mid-frontal event-related potential (ERP) components partially reflect a common theta band oscillatory process. Specifically, mid-frontal ERP components in the N2 time range and error-related negativity time range are parsimoniously characterized as reflections of theta band activities. Forty participants completed three different tasks with varying stimulus-response demands. Permutation tests were used to identify the dominant time-frequency responses of stimulus- and response-locked conditions as well as the enhanced responses to novelty, conflict, punishment, and error. A dominant theta band feature was found in all conditions, and both ERP component amplitudes and theta power measures were similarly modulated by novelty, conflict, punishment, and error. The findings support the hypothesis that generic and reactive medial prefrontal cortex processes are parsimoniously reflected by theta band activities.
Affiliation(s)
- James F Cavanagh
- Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, Rhode Island 02906, USA.
|
63
|
Learning from delayed feedback: neural responses in temporal credit assignment. Cogn Affect Behav Neurosci 2011; 11:131-43. [PMID: 21416212 DOI: 10.3758/s13415-011-0027-0] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 11/08/2022]
Abstract
When feedback follows a sequence of decisions, relationships between actions and outcomes can be difficult to learn. We used event-related potentials (ERPs) to understand how people overcome this temporal credit assignment problem. Participants performed a sequential decision task that required two decisions on each trial. The first decision led to an intermediate state that was predictive of the trial outcome, and the second decision was followed by positive or negative trial feedback. The feedback-related negativity (fERN), a component thought to reflect reward prediction error, followed negative feedback and negative intermediate states. This suggests that participants evaluated intermediate states in terms of expected future reward, and that these evaluations supported learning of earlier actions within sequences. We examine the predictions of several temporal-difference models to determine whether the behavioral and ERP results reflected a reinforcement-learning process.
|
64
|
Talmi D, Fuentemilla L, Litvak V, Duzel E, Dolan RJ. An MEG signature corresponding to an axiomatic model of reward prediction error. Neuroimage 2011; 59:635-45. [PMID: 21726648 PMCID: PMC3200436 DOI: 10.1016/j.neuroimage.2011.06.051] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 10/28/2010] [Revised: 06/17/2011] [Accepted: 06/20/2011] [Indexed: 10/26/2022] Open
Abstract
Optimal decision-making is guided by evaluating the outcomes of previous decisions. Prediction errors are theoretical teaching signals which integrate two features of an outcome: its inherent value and prior expectation of its occurrence. To uncover the magnetic signature of prediction errors in the human brain we acquired magnetoencephalographic (MEG) data while participants performed a gambling task. Our primary objective was to use formal criteria, based upon an axiomatic model (Caplin and Dean, 2008a), to determine the presence and timing profile of MEG signals that express prediction errors. We report analyses at the sensor level, implemented in SPM8, time locked to outcome onset. We identified, for the first time, a MEG signature of prediction error, which emerged approximately 320 ms after an outcome and expressed as an interaction between outcome valence and probability. This signal followed earlier, separate signals for outcome valence and probability, which emerged approximately 200 ms after an outcome. Strikingly, the time course of the prediction error signal, as well as the early valence signal, resembled the Feedback-Related Negativity (FRN). In simultaneously acquired EEG data we obtained a robust FRN, but the win and loss signals that comprised this difference wave did not comply with the axiomatic model. Our findings motivate an explicit examination of the critical issue of timing embodied in computational models of prediction errors as seen in human electrophysiological data.
Affiliation(s)
- Deborah Talmi
- Wellcome Trust Centre for Neuroimaging, UCL, 12 Queen Square, London WC1N 3BG, UK.
|
65
|
Baker TE, Stockwell T, Barnes G, Holroyd CB. Individual differences in substance dependence: at the intersection of brain, behaviour and cognition. Addict Biol 2011; 16:458-66. [PMID: 20731633 DOI: 10.1111/j.1369-1600.2010.00243.x] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 11/30/2022]
Abstract
Recent theories of drug dependence propose that the transition from occasional recreational substance use to harmful use and dependence results from the impact of disrupted midbrain dopamine signals for reinforcement learning on frontal brain areas that implement cognitive control and decision-making. We investigated this hypothesis in humans using electrophysiological and behavioral measures believed to assay the integrity of the midbrain dopamine system and its neural targets. Our investigation revealed two groups of dependent individuals, one characterized by disrupted dopamine-dependent reward learning and the other by disrupted error learning associated with depression-proneness. These results highlight important neurobiological and behavioral differences between two classes of dependent users that can inform the development of individually tailored treatment programs.
Affiliation(s)
- Travis E Baker
- Department of Psychology, Center of Addiction Research of British Columbia, Child and Youth Care, University of Victoria, Victoria, BC, Canada.
|
66
|
Liao Y, Gramann K, Feng W, Deák GO, Li H. This ought to be good: brain activity accompanying positive and negative expectations and outcomes. Psychophysiology 2011; 48:1412-9. [PMID: 21517899 DOI: 10.1111/j.1469-8986.2011.01205.x] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
The current study employed a modified gambling task, in which probabilistic cues were provided to elicit positive or negative expectations. Event-related potentials (ERPs) to "final outcome" and "probabilistic cues" were analyzed. Difference waves between the negative condition and the corresponding positive condition were examined. The results confirm that feedback-related negativity (FRN) amplitude is modulated by the interaction of outcome valence and expectancy by showing larger FRN difference waves for unexpected than expected outcomes. More interestingly, the difference wave between ERPs elicited by positive and negative expectations showed a negative deflection, with a frontal midline source density around 280 ms after onset of the predictive cue. Negative expectations were associated with larger FRN amplitudes than positive expectations. This suggests that FRN is elicited by probabilistic cues to pending outcomes.
Affiliation(s)
- Yu Liao
- Key Laboratory of Cognition and Personality (SWU), Ministry of Education, Chongqing, China
|
67
|
Baker TE, Holroyd CB. Dissociated roles of the anterior cingulate cortex in reward and conflict processing as revealed by the feedback error-related negativity and N200. Biol Psychol 2011; 87:25-34. [PMID: 21295109 DOI: 10.1016/j.biopsycho.2011.01.010] [Citation(s) in RCA: 141] [Impact Index Per Article: 10.8] [Received: 07/06/2010] [Revised: 01/21/2011] [Accepted: 01/25/2011] [Indexed: 10/18/2022]
Abstract
The reinforcement learning theory of the error-related negativity (ERN) holds that the impact of reward signals carried by the midbrain dopamine system modulates activity of the anterior cingulate cortex (ACC), alternatively disinhibiting and inhibiting the ACC following unpredicted error and reward events, respectively. According to a recent formulation of the theory, activity that is intrinsic to the ACC produces a component of the event-related brain potential (ERP) called the N200, and following unpredicted rewards, the N200 is suppressed by extrinsically applied positive dopamine reward signals, resulting in an ERP component called the feedback-ERN (fERN). Here we demonstrate that, despite extensive spatial and temporal overlap between the two ERP components, the functional processes indexed by the N200 (conflict) and the fERN (reward) are dissociable. These results point toward avenues for future investigation.
Affiliation(s)
- Travis E Baker
- Department of Psychology, University of Victoria, BC, Canada.
|
68
|
Human reversal learning under conditions of certain versus uncertain outcomes. Neuroimage 2011; 56:315-22. [PMID: 21281720 DOI: 10.1016/j.neuroimage.2011.01.068] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 10/09/2010] [Revised: 01/19/2011] [Accepted: 01/25/2011] [Indexed: 11/20/2022] Open
Abstract
Reversal learning tasks assess behavioral flexibility by requiring subjects to switch from one learned response choice to a different response choice when task contingencies change. This requires both the processing of negative feedback once a learned response is no longer reinforced, and the capacity for flexible response selection. In 2-choice reversal learning tasks, subjects switch between only two responses. Multiple choice reversal learning is qualitatively different in that at reversal, it requires subjects to respond to non-reinforcement of a learned response by selecting a new response from among several alternatives that have uncertain consequences. While activity in brain regions responsible for processing unexpected negative feedback is known to increase in relation to the hedonic value of the reward itself, it is not known whether the uncertainty of reinforcement for future response choices also modulates these responses. In an fMRI study, 15 participants performed 2- and 4-choice reversal learning tasks. Upon reversal in both tasks, activation was observed in brain regions associated with processing changing reinforcement contingencies (midbrain, ventral striatum, insula), as well as in neocortical regions that support cognitive control and behavioral planning (prefrontal, premotor, posterior parietal, and anterior cingulate cortices). Activation in both systems was greater in the 4- than in the 2-choice task. Therefore, reinforcement uncertainty for future responses enhanced activity in brain systems that process performance feedback, as well as in areas supporting behavioral planning of future response choices. A mutually facilitative integration of responses in motivational and cognitive brain systems might enhance behavioral flexibility and decision making in conditions for which outcomes for future response choices are uncertain.
|
69
|
When is an error not a prediction error? An electrophysiological investigation. Cogn Affect Behav Neurosci 2009; 9:59-70. [PMID: 19246327 DOI: 10.3758/cabn.9.1.59] [Citation(s) in RCA: 137] [Impact Index Per Article: 9.1] [Indexed: 11/08/2022]
Abstract
A recent theory holds that the anterior cingulate cortex (ACC) uses reinforcement learning signals conveyed by the midbrain dopamine system to facilitate flexible action selection. According to this position, the impact of reward prediction error signals on ACC modulates the amplitude of a component of the event-related brain potential called the error-related negativity (ERN). The theory predicts that ERN amplitude is monotonically related to the expectedness of the event: It is larger for unexpected outcomes than for expected outcomes. However, a recent failure to confirm this prediction has called the theory into question. In the present article, we investigated this discrepancy in three trial-and-error learning experiments. All three experiments provided support for the theory, but the effect sizes were largest when an optimal response strategy could actually be learned. This observation suggests that ACC utilizes dopamine reward prediction error signals for adaptive decision making when the optimal behavior is, in fact, learnable.
|