1
Qin L, Wang Z, Yan R, Tang H. Attention-Based Deep Spiking Neural Networks for Temporal Credit Assignment Problems. IEEE Trans Neural Netw Learn Syst 2024; 35:10301-10311. [PMID: 37022405 DOI: 10.1109/tnnls.2023.3240176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
The temporal credit assignment (TCA) problem, which aims to detect predictive features hidden in distracting background streams, remains a core challenge in biological and machine learning. Aggregate-label (AL) learning is proposed by researchers to resolve this problem by matching spikes with delayed feedback. However, the existing AL learning algorithms only consider the information of a single timestep, which is inconsistent with the real situation. Meanwhile, there is no quantitative evaluation method for TCA problems. To address these limitations, we propose a novel attention-based TCA (ATCA) algorithm and a minimum editing distance (MED)-based quantitative evaluation method. Specifically, we define a loss function based on the attention mechanism to deal with the information contained within the spike clusters and use MED to evaluate the similarity between the spike train and the target clue flow. Experimental results on musical instrument recognition (MedleyDB), speech recognition (TIDIGITS), and gesture recognition (DVS128-Gesture) show that the ATCA algorithm can reach the state-of-the-art (SOTA) level compared with other AL learning algorithms.
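The MED-based evaluation mentioned in this abstract can be illustrated with a minimal sketch of a minimum edit (Levenshtein) distance between two label sequences. The representation of the decoded spike train and the target clue flow as lists of clue labels, the unit edit costs, and the function name `min_edit_distance` are assumptions made for illustration; the paper's exact formulation may differ.

```python
def min_edit_distance(predicted, target):
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions (each with cost 1) turning `predicted` into `target`."""
    m, n = len(predicted), len(target)
    prev = list(range(n + 1))  # distances from the empty prefix of `predicted`
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if predicted[i - 1] == target[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete from predicted
                          curr[j - 1] + 1,     # insert into predicted
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[n]


# Hypothetical clue labels: one spurious "B" in the decoded output.
decoded = ["A", "B", "B", "C"]
clues = ["A", "B", "C"]
print(min_edit_distance(decoded, clues))  # 1
```

A distance of 0 means the decoded sequence matches the target clue sequence exactly; larger values indicate more missed, spurious, or misclassified clues.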
2
Sato Y, Sakai Y, Hirata S. State-transition-free reinforcement learning in chimpanzees (Pan troglodytes). Learn Behav 2023; 51:413-427. [PMID: 37369920 DOI: 10.3758/s13420-023-00591-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/07/2023] [Indexed: 06/29/2023]
Abstract
The outcome of an action often occurs after a delay. One solution for learning appropriate actions from delayed outcomes is to rely on a chain of state transitions. Another solution, which does not rest on state transitions, is to use an eligibility trace (ET) that directly bridges a current outcome and multiple past actions via transient memories. Previous studies revealed that humans (Homo sapiens) learned appropriate actions in a behavioral task in which solutions based on the ET were effective but transition-based solutions were ineffective. This suggests that ET may be used in human learning systems. However, no studies have examined nonhuman animals with an equivalent behavioral task. We designed a task for nonhuman animals following a previous human study. In each trial, participants chose one of two stimuli that were randomly selected from three stimulus types: a stimulus associated with a food reward delivered immediately, a stimulus associated with a reward delivered after a few trials, and a stimulus associated with no reward. The presented stimuli did not vary according to the participants' choices. To maximize the total reward, participants had to learn the value of the stimulus associated with a delayed reward. Five chimpanzees (Pan troglodytes) performed the task using a touchscreen. Two chimpanzees were able to learn successfully, indicating that learning mechanisms that do not depend on state transitions were involved in the learning processes. The current study extends previous ET research by proposing a behavioral task and providing empirical data from chimpanzees.
Grants
- 16H06283 Ministry of Education, Culture, Sports, Science, Japan Society for the Promotion of Science
- 18H05524 Ministry of Education, Culture, Sports, Science, Japan Society for the Promotion of Science
- 19J22889 Ministry of Education, Culture, Sports, Science, Japan Society for the Promotion of Science
- 26245069 Ministry of Education, Culture, Sports, Science, Japan Society for the Promotion of Science
- U04 Program for Leading Graduate Schools
Affiliation(s)
- Yutaro Sato
- Wildlife Research Center, Kyoto University, Kyoto, Japan.
- University Administration Office, Headquarters for Management Strategy, Niigata University, Niigata, Japan.
- Yutaka Sakai
- Brain Science Institute, Tamagawa University, Tokyo, Japan
- Satoshi Hirata
- Wildlife Research Center, Kyoto University, Kyoto, Japan
3
Bikute K, Di Bernardi Luft C, Beyer F. The value of an action: Impact of motor behaviour on outcome processing and stimulus preference. Eur J Neurosci 2022; 56:5823-5835. [PMID: 36114689 PMCID: PMC9828266 DOI: 10.1111/ejn.15826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Revised: 08/11/2022] [Accepted: 09/13/2022] [Indexed: 01/12/2023]
Abstract
While influences of Pavlovian associations on instrumental behaviour are well established, we still do not know how motor actions affect the formation of Pavlovian associations. To address this question, we designed a task in which participants were presented with neutral stimuli, half of which were paired with an active response, half with a passive waiting period. Stimuli had an 80% chance of predicting either a monetary gain or loss. We compared the feedback-related negativity (FRN) in response to predictive stimuli and outcomes, as well as directed phase synchronization before and after outcome presentation between trials with versus without a motor response. We found a larger FRN amplitude in response to outcomes presented after a motor response (active trials). This effect was driven by a positive deflection in active reward trials, which was absent in passive reward trials. Connectivity analysis revealed that the motor action reversed the direction of the phase synchronization at the time of the feedback presentation: Top-down information flow during the outcome anticipation phase in active trials, but bottom-up information flow in passive trials. This main effect of action was mirrored in behavioural data showing that participants preferred stimuli associated with an active response. Our findings suggest an influence of neural systems that initiate motor actions on neural systems involved in reward processing. We suggest that motor actions might modulate the brain responses to feedback by affecting the dynamics of brain activity towards optimizing the processing of the resulting action outcome.
Affiliation(s)
- Kotryna Bikute
- Department of Biological and Experimental Psychology, Queen Mary University of London, London, UK
- Frederike Beyer
- Department of Biological and Experimental Psychology, Queen Mary University of London, London, UK
4
Stewardson HJ, Sambrook TD. Reward, Salience, and Agency in Event-Related Potentials for Appetitive and Aversive Contexts. Cereb Cortex 2021; 31:5006-5014. [PMID: 34023899 DOI: 10.1093/cercor/bhab137] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/09/2021] [Revised: 03/31/2021] [Accepted: 04/23/2021] [Indexed: 11/13/2022] Open
Abstract
Cognitive architectures tasked with swiftly and adaptively processing biologically important events are likely to classify these on two central axes: motivational salience, that is, those events' importance and unexpectedness, and motivational value, the utility they hold, relative to that expected. Because of its temporal precision, electroencephalography provides an opportunity to resolve processes associated with these two axes. A focus of attention for the last two decades has been the feedback-related negativity (FRN), a frontocentral component occurring 240-340 ms after valenced events that are not fully predicted. Both motivational salience and value are present in such events and competing claims have been made for which of these is encoded by the FRN. The present study suggests that motivational value, in the form of a reward prediction error, is the primary determinant of the FRN in active contexts, while in both passive and active contexts, a weaker and earlier overlapping motivational salience component may be present.
Affiliation(s)
- Thomas D Sambrook
- School of Psychology, University of East Anglia, Norwich NR4 7TJ, UK
5
Chidharom M, Krieg J, Pham BT, Bonnefond A. Conjoint fluctuations of PFC-mediated processes and behavior: An investigation of error-related neural mechanisms in relation to sustained attention. Cortex 2021; 143:69-79. [PMID: 34391083 DOI: 10.1016/j.cortex.2021.07.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/16/2021] [Revised: 06/07/2021] [Accepted: 07/09/2021] [Indexed: 11/30/2022]
Abstract
The ability to detect errors, which derives from the medial prefrontal cortex (mPFC), is crucial to maintain attention over a long period of time. While impairment of this ability has been reported in patients with sustained attention disruption, the role mPFC-mediated processes play in the intra-individual fluctuation of sustained attention remains an open question. In this context, we computed the variance time course of reaction time (RT) of 42 healthy individuals to distinguish intra-individual periods of low and high performance instability, assumed to represent optimal and suboptimal attentional states, when performing a sustained Go/NoGo task. Analysis of the neurophysiological mechanisms of response monitoring revealed a specific reduction in the error-related negativity (ERN) amplitude and frontal midline theta power during periods of high compared to low RT variability, but only in individuals with a higher standard deviation of reaction time (SD-RT). Concerning post-error adaptation, an increase in the correct-related negativity (CRN) amplitude as well as the frontal lateral theta power on trials following errors was observed in individuals with lower SD-RT but not in those with higher SD-RT. Our results thus show that individuals with poor sustained attention ability exhibit altered post-error adaptation and attentional state-dependent efficiency of error monitoring. Conversely, individuals with good sustained attention performances retained their post-error adaptation and response monitoring regardless of the attentional periods. These findings reveal the critical role of the action-monitoring system in intra-individual behavioral stability and highlight the importance of considering attentional states when studying mPFC-mediated processes, especially in subjects with low sustained attention ability.
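The idea of a variance time course that separates periods of low and high RT instability can be sketched as follows. This is a simplified illustration only: the sliding-window population variance, the window size of 5 trials, the median split, and the function names are all assumptions, not the procedure reported by the authors.

```python
from statistics import median, pvariance


def variance_time_course(rts, window=5):
    """Sliding-window variance of reaction times, one value per trial.
    Windows are truncated at the edges of the series."""
    half = window // 2
    vtc = []
    for i in range(len(rts)):
        lo, hi = max(0, i - half), min(len(rts), i + half + 1)
        vtc.append(pvariance(rts[lo:hi]))
    return vtc


def label_periods(vtc):
    """Median split: trials above the median variance are 'high' instability."""
    cut = median(vtc)
    return ["high" if v > cut else "low" for v in vtc]


# Hypothetical RTs in ms: a stable stretch followed by an erratic one.
rts = [400] * 6 + [300, 600, 250, 700, 350, 650]
labels = label_periods(variance_time_course(rts))
print(labels[0], labels[9])  # low high
```

Neural measures such as ERN amplitude could then be averaged separately over the trials labeled "low" and "high", which is the kind of state-dependent comparison the abstract describes.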
Affiliation(s)
- Matthieu Chidharom
- INSERM U1114, Strasbourg, France; University of Strasbourg, Strasbourg, France.
- Julien Krieg
- INSERM U1114, Strasbourg, France; University of Strasbourg, Strasbourg, France
- Bich-Thuy Pham
- INSERM U1114, Strasbourg, France; University of Strasbourg, Strasbourg, France
- Anne Bonnefond
- INSERM U1114, Strasbourg, France; University of Strasbourg, Strasbourg, France
6
Dewiputri WI, Schweizer R, Auer T. Brain Networks Underlying Strategy Execution and Feedback Processing in an Efficient Functional Magnetic Resonance Imaging Neurofeedback Training Performed in a Parallel or a Serial Paradigm. Front Hum Neurosci 2021; 15:645048. [PMID: 34113243 PMCID: PMC8185020 DOI: 10.3389/fnhum.2021.645048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2020] [Accepted: 04/06/2021] [Indexed: 11/13/2022] Open
Abstract
Neurofeedback (NF) is a complex learning scenario, as the task consists of trying out mental strategies while processing a feedback signal that signifies activation in the brain area to be self-regulated and acts as a potential reward signal. In an attempt to dissect these subcomponents, we obtained whole-brain networks associated with efficient self-regulation in two paradigms: parallel, where the task was performed concurrently, combining feedback with strategy execution; and serial, where the task was performed consecutively, separating feedback processing from strategy execution. Twenty participants attempted to control their anterior midcingulate cortex (aMCC) using functional magnetic resonance imaging (fMRI) NF in 18 sessions over 2 weeks, using cognitive and emotional mental strategies. We analyzed whole-brain fMRI activations in the NF training runs with the largest aMCC activation for the serial and parallel paradigms. The equal length of the strategy execution and the feedback processing periods in the serial paradigm allows a description of the two task subcomponents with equal power. The resulting activation maps were spatially correlated with functionally annotated intrinsic connectivity brain maps (BMs). Brain activation in the parallel condition correlates with the basal ganglia (BG) network, the cingulo-opercular network (CON), and the frontoparietal control network (FPCN); brain activation in the serial strategy execution condition with the default mode network (DMN), the FPCN, and the visual processing network; while brain activation in the serial feedback processing condition predominantly with the CON, the DMN, and the FPCN. Additional comparisons indicate that BG activation is characteristic to the parallel paradigm, while supramarginal gyrus (SMG) and superior temporal gyrus (STG) activations are characteristic to the serial paradigm. 
The multifaceted view of the subcomponents allows describing the cognitive processes associated with strategy execution and feedback processing independently in the serial feedback task and as combined processes in the multitasking scenario of the conventional parallel feedback task.
Affiliation(s)
- Wan Ilma Dewiputri
- International Max Planck Research School for Neurosciences, Georg-August-University, Göttingen, Göttingen, Germany
- Renate Schweizer
- Functional Imaging Laboratory, German Primate Center, Göttingen, Germany; Leibniz Science Campus Primate Cognition, Göttingen, Germany
- Tibor Auer
- School of Psychology, Faculty of Health and Medical Sciences, University of Surrey, Guildford, United Kingdom
7
Sosa JLR, Buonomano D, Izquierdo A. The orbitofrontal cortex in temporal cognition. Behav Neurosci 2021; 135:154-164. [PMID: 34060872 DOI: 10.1037/bne0000430] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/08/2022]
Abstract
One of the most important factors in decision-making is estimating the value of available options. Subregions of the prefrontal cortex, including the orbitofrontal cortex (OFC), have been deemed essential for this process. Value computations require a complex integration across numerous dimensions, including reward magnitude, effort, internal state, and time. The importance of the temporal dimension is well illustrated by temporal discounting tasks, in which subjects select between smaller-sooner and larger-later rewards. The specific role of OFC in telling time and integrating temporal information into decision-making remains unclear. Based on the current literature, in this review we reevaluate current theories of OFC function, accounting for the influence of time. Incorporating temporal information into value estimation and decision-making requires distinct, yet interrelated, forms of temporal information, including the ability to tell time, represent time, and create temporal expectations, and the ability to use this information for optimal decision-making in a wide range of tasks, including temporal discounting and wagering. We use the term "temporal cognition" to refer to the integrated use of these different aspects of temporal information. We suggest that the OFC may be a critical site for the integration of reward magnitude and delay, and thus important for temporal cognition. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
Affiliation(s)
- Dean Buonomano
- Department of Psychology, University of California-Los Angeles
8
Rawls E, Lamm C. The aversion positivity: Mediofrontal cortical potentials reflect parametric aversive prediction errors and drive behavioral modification following negative reinforcement. Cortex 2021; 140:26-39. [PMID: 33905968 DOI: 10.1016/j.cortex.2021.03.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/04/2020] [Revised: 02/07/2021] [Accepted: 03/17/2021] [Indexed: 11/19/2022]
Abstract
Reinforcement learning capitalizes on prediction errors (PEs), representing the deviation of received outcomes from expected outcomes. Mediofrontal event-related potentials (ERPs), in particular the feedback-related negativity (FRN)/reward positivity (RewP), are related to PE signaling, but there is disagreement as to whether the FRN/RewP encode signed or unsigned PEs. PE encoding can potentially be dissected by time-frequency analysis, as frontal theta [4-8 Hz] might represent poor outcomes, while central delta [1-3 Hz] might instead represent rewarding outcomes. However, cortical PE signaling in negative reinforcement is still poorly understood, and the role of cortical PE representations in behavioral reinforcement learning following negative reinforcement is relatively unexplored. We recorded EEG while participants completed a task with matched positive and negative reinforcement outcome modalities, with parametrically manipulated single-trial outcomes producing positive and negative PEs. We first demonstrated that PEs systematically influence future behavior in both positive and negative reinforcement conditions. In negative reinforcement conditions, mediofrontal ERPs positively signaled unsigned PEs in a time window encompassing the P2 potential, and negatively signaled signed PEs for a time window encompassing the FRN/RewP and frontal P3 (an "aversion positivity"). Central delta power increased parametrically with increasingly aversive outcomes, contributing to the "aversion positivity". Finally, negative reinforcement ERPs correlated with RTs on the following trial, suggesting cortical PEs guide behavioral adaptations. Positive reinforcement PEs did not influence ERP or time-frequency activity, despite significant behavioral effects. These results demonstrate that mediofrontal PE signals are a mechanism underlying negative reinforcement learning, and that delta power increases for aversive outcomes might contribute to the "aversion positivity."
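The distinction between signed and unsigned prediction errors drawn in this abstract can be made concrete with a small sketch. It uses only the definition given above (a PE is the deviation of the received outcome from the expected outcome); the function name and the example values are hypothetical.

```python
def prediction_errors(received, expected):
    """Return (signed PE, unsigned PE) for a single outcome.
    Signed PE keeps the direction of the surprise (better or worse than
    expected); unsigned PE keeps only its magnitude."""
    signed = received - expected
    return signed, abs(signed)


# Hypothetical outcomes on an arbitrary value scale.
print(prediction_errors(1.0, 0.25))  # (0.75, 0.75)  better than expected
print(prediction_errors(0.0, 0.5))   # (-0.5, 0.5)   worse than expected
```

A neural signal that tracks the first element distinguishes better-than-expected from worse-than-expected outcomes, whereas one that tracks the second responds to any surprising outcome regardless of direction.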
Affiliation(s)
- Eric Rawls
- Department of Psychiatry and Behavioral Sciences, University of Minnesota Health, USA.
- Connie Lamm
- Department of Psychological Sciences, University of Arkansas, USA
9
Kelly MA, Arora N, West RL, Reitter D. Holographic Declarative Memory: Distributional Semantics as the Architecture of Memory. Cogn Sci 2020; 44:e12904. [PMID: 33140517 DOI: 10.1111/cogs.12904] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/27/2019] [Revised: 03/30/2020] [Accepted: 08/31/2020] [Indexed: 11/29/2022]
Abstract
We demonstrate that the key components of cognitive architectures (declarative and procedural memory) and their key capabilities (learning, memory retrieval, probability judgment, and utility estimation) can be implemented as algebraic operations on vectors and tensors in a high-dimensional space using a distributional semantics model. High-dimensional vector spaces underlie the success of modern machine learning techniques based on deep learning. However, while neural networks have an impressive ability to process data to find patterns, they do not typically model high-level cognition, and it is often unclear how they work. Symbolic cognitive architectures can capture the complexities of high-level cognition and provide human-readable, explainable models, but scale poorly to naturalistic, non-symbolic, or big data. Vector-symbolic architectures, where symbols are represented as vectors, bridge the gap between the two approaches. We posit that cognitive architectures, if implemented in a vector-space model, represent a useful, explanatory model of the internal representations of otherwise opaque neural architectures. Our proposed model, Holographic Declarative Memory (HDM), is a vector-space model based on distributional semantics. HDM accounts for primacy and recency effects in free recall, the fan effect in recognition, probability judgments, and human performance on an iterated decision task. HDM provides a flexible, scalable alternative to symbolic cognitive architectures at a level of description that bridges symbolic, quantum, and neural models of cognition.
Affiliation(s)
- Mary Alexandria Kelly
- Department of Computer Science, Bucknell University
- College of Information Sciences and Computing, The Pennsylvania State University
- Nipun Arora
- Department of Cognitive Science, Carleton University
- Robert L West
- Department of Cognitive Science, Carleton University
- David Reitter
- College of Information Sciences and Computing, The Pennsylvania State University
- Google Research
10
Wurm F, Ernst B, Steinhauser M. The influence of internal models on feedback-related brain activity. Cogn Affect Behav Neurosci 2020; 20:1070-1089. [PMID: 32812148 PMCID: PMC7497542 DOI: 10.3758/s13415-020-00820-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/22/2022]
Abstract
Decision making relies on the interplay between two distinct learning mechanisms, namely habitual model-free learning and goal-directed model-based learning. Recent literature suggests that this interplay is significantly shaped by the environmental structure as represented by an internal model. We employed a modified two-stage but one-decision Markov decision task to investigate how two internal models differing in the predictability of stage transitions influence the neural correlates of feedback processing. Our results demonstrate that fronto-central theta and the feedback-related negativity (FRN), two correlates of reward prediction errors in the medial frontal cortex, are independent of the internal representations of the environmental structure. In contrast, centro-parietal delta and the P3, two correlates possibly reflecting feedback evaluation in working memory, were highly susceptible to the underlying internal model. Model-based analyses of single-trial activity showed a comparable pattern, indicating that while the computation of unsigned reward prediction errors is represented by theta and the FRN irrespective of the internal models, the P3 adapts to the internal representation of an environment. Our findings further substantiate the assumption that the feedback-locked components under investigation reflect distinct mechanisms of feedback processing and that different internal models selectively influence these mechanisms.
Affiliation(s)
- Franz Wurm
- Catholic University of Eichstätt-Ingolstadt, Ostenstraße 27, 85072, Eichstätt, Germany.
- Benjamin Ernst
- Catholic University of Eichstätt-Ingolstadt, Ostenstraße 27, 85072, Eichstätt, Germany
- Marco Steinhauser
- Catholic University of Eichstätt-Ingolstadt, Ostenstraße 27, 85072, Eichstätt, Germany
11
Starita F, Pietrelli M, Bertini C, di Pellegrino G. Aberrant reward prediction error during Pavlovian appetitive learning in alexithymia. Soc Cogn Affect Neurosci 2020; 14:1119-1129. [PMID: 31820808 PMCID: PMC6970149 DOI: 10.1093/scan/nsz089] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/29/2018] [Revised: 09/02/2019] [Accepted: 09/30/2019] [Indexed: 12/31/2022] Open
Abstract
Extensive literature shows that alexithymia, a subclinical trait defined by difficulties in identifying and describing feelings, is characterized by multifaceted impairments in processing emotional stimuli. Nevertheless, its underlying mechanisms remain elusive. Here, we hypothesize that alexithymia may be characterized by an alteration in learning the emotional value of encountered stimuli and test this by assessing differences between individuals with low (LA) and high (HA) levels of alexithymia in the computation of reward prediction errors (RPEs) during Pavlovian appetitive conditioning. As a marker of RPE, the amplitude of the feedback-related negativity (FRN) event-related potential was assessed while participants were presented with two conditioned stimuli (CS) associated with expected or unexpected feedback, indicating delivery of reward or no-reward. No-reward (vs reward) feedback elicited the FRN both in LA and HA. However, unexpected (vs expected) feedback enhanced the FRN in LA but not in HA, indicating impaired computation of RPE in HA. Thus, although HA show preserved sensitivity to rewards, they cannot use this response to update the value of CS that predict them. This impairment may hinder the construction of internal representations of emotional stimuli, leaving individuals with alexithymia unable to effectively recognize, respond and regulate their response to emotional stimuli.
Affiliation(s)
- Giuseppe di Pellegrino
- Department of Psychology, Center for Studies and Research in Cognitive Neuroscience, University of Bologna, 40126 Bologna (BO), Italy
12
Lehmann MP, Xu HA, Liakoni V, Herzog MH, Gerstner W, Preuschoff K. One-shot learning and behavioral eligibility traces in sequential decision making. eLife 2019; 8:e47463. [PMID: 31709980 PMCID: PMC6897511 DOI: 10.7554/elife.47463] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/05/2019] [Accepted: 11/01/2019] [Indexed: 11/13/2022] Open
Abstract
In many daily tasks, we make multiple decisions before reaching a goal. In order to learn such sequences of decisions, a mechanism to link earlier actions to later reward is necessary. Reinforcement learning (RL) theory suggests two classes of algorithms solving this credit assignment problem: In classic temporal-difference learning, earlier actions receive reward information only after multiple repetitions of the task, whereas models with eligibility traces reinforce entire sequences of actions from a single experience (one-shot). Here, we show one-shot learning of sequences. We developed a novel paradigm to directly observe which actions and states along a multi-step sequence are reinforced after a single reward. By focusing our analysis on those states for which RL with and without eligibility trace make qualitatively distinct predictions, we find direct behavioral (choice probability) and physiological (pupil dilation) signatures of reinforcement learning with eligibility trace across multiple sensory modalities.
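The contrast between the two classes of algorithms described in this abstract can be sketched with a tabular value learner on a short chain of states that ends in a single reward. This is a generic textbook-style TD(λ) illustration with accumulating traces, not the authors' model; the chain length, learning rate `alpha`, discount `gamma`, trace decay `lam`, and the function name are assumptions chosen for illustration.

```python
def run_episode(n_states=4, alpha=0.5, gamma=1.0, lam=0.0):
    """One pass through a chain s0 -> s1 -> ... -> terminal, with reward 1
    on the final transition. Returns the learned state values after this
    single episode. lam=0 gives one-step temporal-difference learning;
    lam>0 adds an eligibility trace."""
    values = [0.0] * (n_states + 1)  # last index is the terminal state
    traces = [0.0] * (n_states + 1)
    for s in range(n_states):
        s_next = s + 1
        reward = 1.0 if s_next == n_states else 0.0
        delta = reward + gamma * values[s_next] - values[s]  # TD error
        traces[s] += 1.0  # mark the current state as eligible
        for i in range(n_states):
            values[i] += alpha * delta * traces[i]
            traces[i] *= gamma * lam  # eligibility decays each step
    return values[:n_states]


print(run_episode(lam=0.0))  # [0.0, 0.0, 0.0, 0.5]
print(run_episode(lam=0.9))  # every visited state gets a positive value
```

Without a trace, only the state immediately preceding the reward is updated after one episode, so earlier states would need repeated episodes to acquire value. With a trace, all states along the sequence are reinforced from the single rewarded experience, which is the one-shot signature the study looks for.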
Affiliation(s)
- Marco P Lehmann
- Brain-Mind-Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- He A Xu
- Laboratory of Psychophysics, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Vasiliki Liakoni
- Brain-Mind-Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Michael H Herzog
- Laboratory of Psychophysics, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Wulfram Gerstner
- Brain-Mind-Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Kerstin Preuschoff
- Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland
13
Gao T, Zhou Y, Li W, Pfabigan DM, Han S. Neural mechanisms of reinforcement learning under mortality threat. Soc Neurosci 2019; 15:170-185. [PMID: 31526160 DOI: 10.1080/17470919.2019.1668846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Reinforcement learning, the adjustment of behavior in response to feedback regarding reward and punishment, is pivotal to our survival. The present work investigated whether and how reinforcement learning is affected by thoughts of mortality that endanger one's survival. We recorded electroencephalography (EEG) while adults performed a probabilistic learning task that required a forced choice between two visual patterns for monetary reward for different beneficiaries (i.e., self, stranger, or no one), followed by reward or no-reward feedback. We found that verbal reminders of mortality (vs. negative emotion) enlarged an early positive component (P1) at the occipital electrodes but decreased a late positive potential (LPP) at the frontocentral electrodes in response to learning stimuli. While no-reward feedback relative to reward feedback stimuli elicited a feedback-related negativity (FRN) and increased non-phase-locked theta band (4-8 Hz) activity at the frontocentral electrodes during reward learning for all beneficiaries, verbal reminders of mortality (vs. negative emotion) significantly reduced the FRN amplitude but failed to modulate the theta band activity. These results suggest that mortality salience enhances early attentional processing but dampens late cognitive evaluation of the learning stimuli during reinforcement learning. Moreover, mortality salience decreases the neural sensitivity to feedback signaling the absence of monetary reward.
Affiliation(s)
- Tianyu Gao
- School of Psychological and Cognitive Sciences, PKU-IDG/McGovern Institute for Brain Research, Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Yuqing Zhou
- School of Psychological and Cognitive Sciences, PKU-IDG/McGovern Institute for Brain Research, Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Wenxin Li
- School of Psychological and Cognitive Sciences, PKU-IDG/McGovern Institute for Brain Research, Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Daniela M Pfabigan
- School of Psychological and Cognitive Sciences, PKU-IDG/McGovern Institute for Brain Research, Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Shihui Han
- School of Psychological and Cognitive Sciences, PKU-IDG/McGovern Institute for Brain Research, Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
| |
Collapse
|
14
|
Garr E. Contributions of the basal ganglia to action sequence learning and performance. Neurosci Biobehav Rev 2019; 107:279-295. [PMID: 31541637 DOI: 10.1016/j.neubiorev.2019.09.017] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/02/2019] [Revised: 07/22/2019] [Accepted: 09/11/2019] [Indexed: 12/12/2022]
Abstract
Animals engage in intricately woven and choreographed action sequences that are constructed from trial-and-error learning. The mechanisms by which the brain links together individual actions which are later recalled as fluid chains of behavior are not fully understood, but there is broad consensus that the basal ganglia play a crucial role in this process. This paper presents a comprehensive review of the role of the basal ganglia in action sequencing, with a focus on whether the computational framework of reinforcement learning can capture key behavioral features of sequencing and the neural mechanisms that underlie them. While a simple neurocomputational model of reinforcement learning can capture key features of action sequence learning, this model is not sufficient to capture goal-directed control of sequences or their hierarchical representation. The hierarchical structure of action sequences, in particular, poses a challenge for building better models of action sequencing, and it is in this regard that further investigations into basal ganglia information processing may be informative.
Affiliation(s)
- Eric Garr
  - Graduate Center, City University of New York, 365 5th Avenue, New York, NY 10016, United States.
|
15
|
Rosenbaum GM, Hartley CA. Developmental perspectives on risky and impulsive choice. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180133. [PMID: 30966918 PMCID: PMC6335462 DOI: 10.1098/rstb.2018.0133] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Accepted: 11/20/2018] [Indexed: 12/28/2022] Open
Abstract
Epidemiological data suggest that risk taking in the real world increases from childhood into adolescence and declines into adulthood. However, developmental patterns of behaviour in laboratory assays of risk taking and impulsive choice are inconsistent. In this article, we review a growing literature using behavioural economic approaches to understand developmental changes in risk taking and impulsivity. We present findings that have begun to elucidate both the cognitive and neural processes that contribute to risky and impulsive choice, as well as how age-related changes in these neurocognitive processes give rise to shifts in choice behaviour. We highlight how variability in task parameters can be used to identify specific aspects of decision contexts that may differentially influence risky and impulsive choice behaviour across development. This article is part of the theme issue 'Risk taking and impulsive behaviour: fundamental discoveries, theoretical perspectives and clinical implications'.
Affiliation(s)
- Gail M. Rosenbaum
  - Department of Psychology, New York University, New York, NY 10003, USA
- Catherine A. Hartley
  - Department of Psychology, New York University, New York, NY 10003, USA
  - Center for Neural Science, New York University, New York, NY 10003, USA
|
16
|
Tinga AM, de Back TT, Louwerse MM. Non-invasive neurophysiological measures of learning: A meta-analysis. Neurosci Biobehav Rev 2019; 99:59-89. [PMID: 30735681 DOI: 10.1016/j.neubiorev.2019.02.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/17/2018] [Revised: 12/22/2018] [Accepted: 02/04/2019] [Indexed: 01/09/2023]
Abstract
In a meta-analysis of 113 experiments we examined neurophysiological outcomes of learning, and the relationship between neurophysiological and behavioral outcomes of learning. Findings showed neurophysiology yielding large effect sizes, with the majority of studies examining electroencephalography and eye-related outcome measures. Effect sizes on neurophysiological outcomes were smaller than effect sizes on behavioral outcomes, however. Neurophysiological outcomes were, but behavioral outcomes were not, influenced by several modulating factors. These factors included the sensory system in which learning took place, number of learning days, whether feedback on performance was provided, and age of participants. Controlling for these factors caused the effect size differences between behavior and neurophysiology to disappear. The findings of the current meta-analysis demonstrate that neurophysiology is an appropriate measure in assessing learning, particularly when taking into account factors that could have an influence on neurophysiology. We propose a first model to aid further studies that are needed to examine the exact interplay between learning, neurophysiology, behavior, individual differences, and task-related aspects.
Affiliation(s)
- Angelica M Tinga
  - Department of Cognitive Science & Artificial Intelligence, Tilburg University, Dante Building, Room D 330, Warandelaan 2, 5037 AB Tilburg, The Netherlands.
- Tycho T de Back
  - Department of Cognitive Science & Artificial Intelligence, Tilburg University, Dante Building, Room D 330, Warandelaan 2, 5037 AB Tilburg, The Netherlands
- Max M Louwerse
  - Department of Cognitive Science & Artificial Intelligence, Tilburg University, Dante Building, Room D 330, Warandelaan 2, 5037 AB Tilburg, The Netherlands
|
17
|
Wang Y, Luo Y, Wang M, Miao H. Time-invariant biological networks with feedback loops: structural equation models and structural identifiability. IET Syst Biol 2018; 12:264-272. [PMID: 30472690 DOI: 10.1049/iet-syb.2018.5004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022] Open
Abstract
Quantitative analyses of biological networks such as key biological parameter estimation necessarily call for the use of graphical models. While biological networks with feedback loops are common in reality, the development of graphical model methods and tools that are capable of dealing with feedback loops is still in its infancy. Particularly, inadequate attention has been paid to the parameter identifiability problem for biological networks with feedback loops such that unreliable or even misleading parameter estimates may be obtained. In this study, the structural identifiability analysis problem of time-invariant linear structural equation models (SEMs) with feedback loops is addressed, resulting in a general and efficient solution. The key idea is to combine Mason's gain with Wright's path coefficient method to generate identifiability equations, from which identifiability matrices are then derived to examine the structural identifiability of every single unknown parameter. The proposed method does not involve symbolic or expensive numerical computations, and is applicable to a broad range of time-invariant linear SEMs with or without explicit latent variables, presenting a remarkable breakthrough in terms of generality. Finally, a subnetwork structure of the C. elegans neural network is used to illustrate the application of the authors' method in practice.
Affiliation(s)
- Yulin Wang
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, People's Republic of China
- Yu Luo
  - School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, Sichuan, People's Republic of China
- Mingwen Wang
  - School of Mathematics, Southwest Jiaotong University, Chengdu 611756, Sichuan, People's Republic of China
- Hongyu Miao
  - Department of Biostatistics and Data Science, School of Public Health, University of Texas Health Science Center, Houston 77030, TX, USA.
|
18
|
Ryman SG, Cavanagh JF, Wertz CJ, Shaff NA, Dodd AB, Stevens B, Ling J, Yeo RA, Hanlon FM, Bustillo J, Stromberg SF, Lin DS, Abrams S, Mayer AR. Impaired Midline Theta Power and Connectivity During Proactive Cognitive Control in Schizophrenia. Biol Psychiatry 2018; 84:675-683. [PMID: 29921417 PMCID: PMC7654098 DOI: 10.1016/j.biopsych.2018.04.021] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 02/06/2018] [Revised: 04/17/2018] [Accepted: 04/17/2018] [Indexed: 12/20/2022]
Abstract
BACKGROUND: Disrupted proactive cognitive control, a form of early selection and active goal maintenance, is hypothesized to underlie the broad cognitive deficits observed in patients with schizophrenia (SPs). Current research suggests that the disrupted activation within and connectivity between regions of the cognitive control network contribute to disrupted proactive cognitive control; however, no study has examined these mechanisms using an AX Continuous Performance Test task in schizophrenia. METHODS: Twenty-six SPs (17 male subjects; mean age 34.46 ± 8.77 years) and 28 healthy control participants (HCs; 16 male subjects; mean age 31.43 ± 7.23 years) underwent an electroencephalogram while performing the AX Continuous Performance Test. To examine the extent of activation and level of connectivity within the cognitive control network, power, intertrial phase clustering, and intersite phase clustering metrics were calculated and analyzed. RESULTS: SPs exhibited expected general decrements in behavioral performance relative to HCs and a more selective deficit in conditions requiring proactive cognitive control. Additionally, SPs exhibited deficits in midline theta power and connectivity during proactive cognitive control trials. Specifically, HCs exhibited significantly greater theta power for B cues relative to A cues, whereas SPs exhibited no significant differences between A- and B-cue theta power. Additionally, differential theta connectivity patterns were observed in SPs and HCs. Behavioral measures of proactive cognitive control predicted functional outcomes in SPs. CONCLUSIONS: This study suggests that low-frequency midline theta activity is selectively disrupted during proactive cognitive control in SPs. The disrupted midline theta activity may reflect a failure of SPs to proactively recruit cognitive control processes.
|
19
|
Feedback is the breakfast of champions: the significance of self-controlled formal feedback for autonomous task engagement. Neuroreport 2018; 29:13-18. [PMID: 29112673 DOI: 10.1097/wnr.0000000000000921] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/27/2022]
Abstract
With the aim of examining the positive effect of the formal feedback mechanism itself beyond its informational aspect, we engaged participants in the stopwatch task and recorded their electroencephalogram throughout the experiment. This task requires a button press to stop the watch within a given time interval, the completion of which is simultaneously accompanied by adequate information on task performance. In the self-controlled feedback mode, participants could freely choose whether to request formal feedback after completing the task. In another mode, additional feedback was not provided. The 'non-choice' cue was found to elicit a more negative cue-elicited feedback negativity compared with 'choice', suggesting that the opportunity to solicit formal feedback was perceived as more desirable. In addition, a more enhanced stimulus-preceding negativity was observed prior to the task initiation cue in the self-controlled feedback condition, indicating that participants paid more sustained anticipatory attention during task preparation. Taken together, these electrophysiological results suggested an inherent reward within the formal feedback mechanism itself and the significance of self-controlled formal feedback for autonomous task engagement.
|
20
|
Krugliakova E, Klucharev V, Fedele T, Gorin A, Kuznetsova A, Shestakova A. Correlation of cue-locked FRN and feedback-locked FRN in the auditory monetary incentive delay task. Exp Brain Res 2017; 236:141-151. [PMID: 29196772 DOI: 10.1007/s00221-017-5113-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/22/2016] [Accepted: 10/24/2017] [Indexed: 02/02/2023]
Abstract
Reflecting the discrepancy between received and predicted outcomes, the reward prediction error (RPE) plays an important role in learning in a dynamic environment. A number of studies suggested that the feedback-related negativity (FRN) component of an event-related potential, known to be associated with unexpected outcomes, encodes RPEs. While FRN was clearly shown to be sensitive to the probability of outcomes, the effect of outcome magnitude on FRN remains to be further clarified. In studies on the neural underpinnings of reward anticipation and outcome evaluation, a monetary incentive delay (MID) task proved to be particularly useful. We investigated whether feedback-locked FRN and cue-locked dN200 responses recorded during an auditory MID task were sensitive to the probability and magnitude of outcomes. The cue-locked dN200 is associated with the update of information about the magnitude of prospective outcomes. Overall, we showed that feedback-locked FRN was modulated by both the magnitude and the probability of outcomes during an auditory version of MID task, whereas no such effect was found for cue-locked dN200. Furthermore, the cue-locked dN200, which is associated with the update of information about the magnitude of prospective outcomes, correlated with the standard feedback-locked FRN, which is associated with a negative RPE. These results further expand our knowledge on the interplay between the processing of predictive cues that forecast future outcomes and the subsequent revision of these predictions during outcome delivery.
Affiliation(s)
- Elena Krugliakova
  - Centre for Cognition and Decision Making, National Research University Higher School of Economics, 3a Krivokolenniy sidewalk, Moscow, 101000, Russian Federation.
- Vasily Klucharev
  - Centre for Cognition and Decision Making, National Research University Higher School of Economics, 3a Krivokolenniy sidewalk, Moscow, 101000, Russian Federation
- Tommaso Fedele
  - Neurosurgery Department, University Hospital Zürich, Frauenklinikstrasse 10, 8091, Zurich, Switzerland
- Alexey Gorin
  - Centre for Cognition and Decision Making, National Research University Higher School of Economics, 3a Krivokolenniy sidewalk, Moscow, 101000, Russian Federation
- Aleksandra Kuznetsova
  - Centre for Cognition and Decision Making, National Research University Higher School of Economics, 3a Krivokolenniy sidewalk, Moscow, 101000, Russian Federation
- Anna Shestakova
  - Centre for Cognition and Decision Making, National Research University Higher School of Economics, 3a Krivokolenniy sidewalk, Moscow, 101000, Russian Federation
|
21
|
Moënne-Loccoz C, Vergara RC, López V, Mery D, Cosmelli D. Modeling Search Behaviors during the Acquisition of Expertise in a Sequential Decision-Making Task. Front Comput Neurosci 2017; 11:80. [PMID: 28943847 PMCID: PMC5596102 DOI: 10.3389/fncom.2017.00080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/27/2017] [Accepted: 08/04/2017] [Indexed: 11/13/2022] Open
Abstract
Our daily interaction with the world is full of situations in which we develop expertise through self-motivated repetition of the same task. In many of these interactions, and especially when dealing with computer and machine interfaces, we must deal with sequences of decisions and actions. For instance, when drawing cash from an ATM, choices are presented in a step-by-step fashion and a specific sequence of choices must be performed in order to produce the expected outcome. But, as we become experts in the use of such interfaces, is it possible to identify specific search and learning strategies? And if so, can we use this information to predict future actions? In addition to better understanding the cognitive processes underlying sequential decision making, this could allow building adaptive interfaces that can facilitate interaction at different moments of the learning curve. Here we tackle the question of modeling sequential decision-making behavior in a simple human-computer interface that instantiates a 4-level binary decision tree (BDT) task. We record behavioral data from voluntary participants while they attempt to solve the task. Using a Hidden Markov Model-based approach that capitalizes on the hierarchical structure of behavior, we then model their performance during the interaction. Our results show that partitioning the problem space into a small set of hierarchically related stereotyped strategies can potentially capture a host of individual decision making policies. This allows us to follow how participants learn and develop expertise in the use of the interface. Moreover, using a Mixture of Experts based on these stereotyped strategies, the model is able to predict the behavior of participants that master the task.
Affiliation(s)
- Cristóbal Moënne-Loccoz
  - Department of Computer Science, School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
- Rodrigo C. Vergara
  - Facultad de Medicina, Biomedical Neuroscience Institute, Universidad de Chile, Santiago, Chile
- Vladimir López
  - Center for Interdisciplinary Neuroscience, Pontificia Universidad Católica de Chile, Santiago, Chile
  - School of Psychology, Pontificia Universidad Católica de Chile, Santiago, Chile
- Domingo Mery
  - Department of Computer Science, School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
- Diego Cosmelli
  - Center for Interdisciplinary Neuroscience, Pontificia Universidad Católica de Chile, Santiago, Chile
  - School of Psychology, Pontificia Universidad Católica de Chile, Santiago, Chile
|
22
|
Li D, Meng L, Ma Q. Who Deserves My Trust? Cue-Elicited Feedback Negativity Tracks Reputation Learning in Repeated Social Interactions. Front Hum Neurosci 2017; 11:307. [PMID: 28663727 PMCID: PMC5471337 DOI: 10.3389/fnhum.2017.00307] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/20/2016] [Accepted: 05/29/2017] [Indexed: 12/27/2022] Open
Abstract
Trust and trustworthiness contribute to reciprocal behavior and social relationship development. To make better decisions, people need to evaluate others’ trustworthiness. They often assess this kind of reputation by learning through repeated social interactions. The present event-related potential (ERP) study explored the reputation learning process in a repeated trust game where subjects made multi-round decisions of investment to different partners. We found that subjects gradually learned to discriminate trustworthy partners from untrustworthy ones based on how often their partners reciprocated the investment, which was indicated by their own investment decisions. Besides, electrophysiological data showed that the faces of the untrustworthy partners induced larger feedback negativity (FN) amplitude than those of the trustworthy partners, but only in the late phase of the game. The ERP results corresponded with the behavioral pattern and revealed that the learned trustworthiness differentiation was coded by the cue-elicited FN component. Consistent with previous research, our findings suggest that the anterior cue-elicited FN reflects the reputation appraisal and tracks the reputation learning process in social interactions.
Affiliation(s)
- Diandian Li
  - School of Management, Zhejiang University, Hangzhou, China
  - Beijing Xinsight Technology Co. Ltd., Beijing, China
  - Neuromanagement Lab, Zhejiang University, Hangzhou, China
- Liang Meng
  - School of Business and Management, Shanghai International Studies University, Shanghai, China
  - Laboratory of Applied Brain and Cognitive Sciences, Shanghai International Studies University, Shanghai, China
- Qingguo Ma
  - School of Management, Zhejiang University, Hangzhou, China
  - Neuromanagement Lab, Zhejiang University, Hangzhou, China
  - Institute of Neural Management Sciences, Zhejiang University of Technology, Hangzhou, China
|
23
|
Valentin VV, Maddox WT, Ashby FG. Dopamine dependence in aggregate feedback learning: A computational cognitive neuroscience approach. Brain Cogn 2016; 109:1-18. [PMID: 27596541 DOI: 10.1016/j.bandc.2016.06.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/13/2015] [Revised: 06/07/2016] [Accepted: 06/13/2016] [Indexed: 01/10/2023]
Abstract
Procedural learning of skills depends on dopamine-mediated striatal plasticity. Most prior work investigated single stimulus-response procedural learning followed by feedback. However, many skills include several actions that must be performed before feedback is available. A new procedural-learning task is developed in which three independent and successive unsupervised categorization responses receive aggregate feedback indicating either that all three responses were correct, or at least one response was incorrect. Experiment 1 showed superior learning of stimuli in position 3, and that learning in the first two positions was initially compromised, and then recovered. An extensive theoretical analysis that used parameter space partitioning found that a large class of procedural-learning models, which predict propagation of dopamine release from feedback to stimuli, and/or an eligibility trace, fail to fully account for these data. The analysis also suggested that any dopamine released to the second or third stimulus impaired categorization learning in the first and second positions. A second experiment tested and confirmed a novel prediction of this large class of procedural-learning models that if the to-be-learned actions are introduced one-by-one in succession then learning is much better if training begins with the first action (and works forwards) than if it begins with the last action (and works backwards).
Affiliation(s)
- Vivian V Valentin
  - Department of Psychological & Brain Sciences, University of California, Santa Barbara, United States.
- W Todd Maddox
  - Department of Psychology, University of Texas, 108 E. Dean Keeton, Stop A8000, Austin, TX 78712-1043, United States.
- F Gregory Ashby
  - Department of Psychological & Brain Sciences, University of California, Santa Barbara, United States.
|
24
|
Weismüller B, Bellebaum C. Expectancy affects the feedback-related negativity (FRN) for delayed feedback in probabilistic learning. Psychophysiology 2016; 53:1739-1750. [DOI: 10.1111/psyp.12738] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 01/19/2016] [Accepted: 07/26/2016] [Indexed: 12/31/2022]
Affiliation(s)
- Benjamin Weismüller
  - Institute for Experimental Psychology, Heinrich-Heine University Düsseldorf, Düsseldorf, Germany
- Christian Bellebaum
  - Institute for Experimental Psychology, Heinrich-Heine University Düsseldorf, Düsseldorf, Germany
|
25
|
Role of Reversal Learning Impairment in Social Disinhibition following Severe Traumatic Brain Injury. J Int Neuropsychol Soc 2016; 22:303-13. [PMID: 26754292 DOI: 10.1017/s1355617715001277] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Abstract
OBJECTIVES: The current study aimed to determine whether reversal learning impairments and feedback-related negativity (FRN), reflecting reward prediction error signals generated by negative feedback during the reversal learning tasks, were associated with social disinhibition in a group of participants with traumatic brain injury (TBI). METHODS: Number of reversal errors on a social and a non-social reversal learning task and FRN were examined for 21 participants with TBI and 21 control participants matched for age. Participants with TBI were also divided into low and high disinhibition groups based on rated videotaped interviews. RESULTS: Participants with TBI made more reversal errors and produced smaller amplitude FRNs than controls. Furthermore, participants with TBI high on social disinhibition made more reversal errors on the social reversal learning task than did those low on social disinhibition. FRN amplitude was not related to disinhibition. CONCLUSIONS: These results suggest that impairment in the ability to update behavior when social reinforcement contingencies change plays a role in social disinhibition after TBI. Furthermore, the social reversal learning task used in this study may be a useful neuropsychological tool for detecting susceptibility to acquired social disinhibition following TBI. Finally, that the FRN amplitude was not associated with social disinhibition suggests that reward prediction error signals are not critical for behavioral adaptation in the social domain.
|
26
|
Face-induced expectancies influence neural mechanisms of performance monitoring. Cogn Affect Behav Neurosci 2015; 16:261-75. [DOI: 10.3758/s13415-015-0387-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]
|
27
|
Dyson M, Thomas E, Casini L, Burle B. Online extraction and single trial analysis of regions contributing to erroneous feedback detection. Neuroimage 2015; 121:146-58. [PMID: 26093326 DOI: 10.1016/j.neuroimage.2015.06.041] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/25/2014] [Revised: 06/12/2015] [Accepted: 06/13/2015] [Indexed: 11/29/2022] Open
Abstract
Understanding how the brain processes errors is an essential and active field of neuroscience. Real time extraction and analysis of error signals provide an innovative method of assessing how individuals perceive ongoing interactions without recourse to overt behaviour. This area of research is critical in modern Brain-Computer Interface (BCI) design, but may also open fruitful perspectives in cognitive neuroscience research. In this context, we sought to determine whether we can extract discriminatory error-related activity in the source space, online, and on a trial by trial basis from electroencephalography data recorded during motor imagery. Using a data driven approach, based on interpretable inverse solution algorithms, we assessed the extent to which automatically extracted error-related activity was physiologically and functionally interpretable according to performance monitoring literature. The applicability of inverse solution based methods for automatically extracting error signals, in the presence of noise generated by motor imagery, was validated by simulation. Representative regions of interest, outlining the primary generators contributing to classification, were found to correspond closely to networks involved in error detection and performance monitoring. We observed discriminative activity in non-frontal areas, demonstrating that areas outside of the medial frontal cortex can contribute to the classification of error feedback activity.
Affiliation(s)
- Matthew Dyson
  - Aix-Marseille Université, CNRS, LNC UMR 7291, 3 Place Victor Hugo, 13331 Marseille Cedex 3, France.
- Eoin Thomas
  - Athena, INRIA, 2004, Route des Lucioles, 06902 Sophia Antipolis, France
- Laurence Casini
  - Aix-Marseille Université, CNRS, LNC UMR 7291, 3 Place Victor Hugo, 13331 Marseille Cedex 3, France
- Boris Burle
  - Aix-Marseille Université, CNRS, LNC UMR 7291, 3 Place Victor Hugo, 13331 Marseille Cedex 3, France.
|
28
|
Valence-separated representation of reward prediction error in feedback-related negativity and positivity. Neuroreport 2015; 26:157-62. [PMID: 25634316 DOI: 10.1097/wnr.0000000000000318] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 11/25/2022]
Abstract
Feedback-related negativity (FRN) is an event-related brain potential (ERP) component elicited by errors and negative outcomes. Previous studies proposed that FRN reflects the activity of a general error-processing system that incorporates reward prediction error (RPE). However, other studies reported inconsistent results on this issue - namely, that FRN only reflects the valence of feedback and that the magnitude of RPE is reflected by the other ERP component called P300. The present study focused on the relationship between the FRN amplitude and RPE. ERPs were recorded during a reversal learning task performed by the participants, and a computational model was used to estimate trial-by-trial RPEs, which we correlated with the ERPs. The results indicated that FRN and P300 reflected the magnitude of RPE in negative outcomes and positive outcomes, respectively. In addition, the correlation between RPE and the P300 amplitude was stronger than the correlation between RPE and the FRN amplitude. These differences in the correlation between ERP and RPE components may explain the inconsistent results reported by previous studies; the asymmetry in the correlations might make it difficult to detect the effect of the RPE magnitude on the FRN and makes it appear that the FRN only reflects the valence of feedback.
|
29
|
Cavanagh JF, Frank MJ. Frontal theta as a mechanism for cognitive control. Trends Cogn Sci 2014; 18:414-21. [PMID: 24835663 PMCID: PMC4112145 DOI: 10.1016/j.tics.2014.04.012] [Citation(s) in RCA: 1415] [Impact Index Per Article: 141.5] [Received: 01/16/2014] [Revised: 04/21/2014] [Accepted: 04/22/2014] [Indexed: 12/31/2022]
Abstract
Recent advancements in cognitive neuroscience have afforded a description of neural responses in terms of latent algorithmic operations. However, the adoption of this approach to human scalp electroencephalography (EEG) has been more limited, despite the ability of this methodology to quantify canonical neuronal processes. Here, we provide evidence that theta band activities over the midfrontal cortex appear to reflect a common computation used for realizing the need for cognitive control. Moreover, by virtue of inherent properties of field oscillations, these theta band processes may be used to communicate this need and subsequently implement such control across disparate brain regions. Thus, frontal theta is a compelling candidate mechanism by which emergent processes, such as 'cognitive control', may be biophysically realized.
Affiliation(s)
- James F Cavanagh
- Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA.
| | - Michael J Frank
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI 02915, USA
| |
|
30
|
Frontal midline theta reflects anxiety and cognitive control: meta-analytic evidence. J Physiol Paris 2014; 109:3-15. [PMID: 24787485 DOI: 10.1016/j.jphysparis.2014.04.003] [Citation(s) in RCA: 348] [Impact Index Per Article: 34.8] [Received: 01/20/2014] [Revised: 03/20/2014] [Accepted: 04/15/2014] [Indexed: 12/18/2022]
Abstract
Evidence from imaging and anatomical studies suggests that the midcingulate cortex (MCC) is a dynamic hub lying at the interface of affect and cognition. In particular, this neural system appears to integrate information about conflict and punishment in order to optimize behavior in the face of action-outcome uncertainty. In a series of meta-analyses, we show how recent human electrophysiological research provides compelling evidence that frontal-midline theta signals reflecting MCC activity are moderated by anxiety and predict adaptive behavioral adjustments. These findings underscore the importance of frontal theta activity to a broad spectrum of control operations. We argue that frontal-midline theta provides a neurophysiologically plausible mechanism for optimally adjusting behavior to uncertainty, a hallmark of situations that elicit anxiety and demand cognitive control. These observations compel a new perspective on the mechanisms guiding motivated learning and behavior and provide a framework for understanding the role of the MCC in temperament and psychopathology.
|
31
|
Osinsky R, Walter H, Hewig J. What is and what could have been: An ERP study on counterfactual comparisons. Psychophysiology 2014; 51:773-81. [DOI: 10.1111/psyp.12221] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 09/26/2013] [Accepted: 03/03/2014] [Indexed: 11/29/2022]
Affiliation(s)
- Roman Osinsky
- Department of Psychology I; Julius Maximilians University; Würzburg Germany
| | - Helen Walter
- Department of Psychology I; Julius Maximilians University; Würzburg Germany
| | - Johannes Hewig
- Department of Psychology I; Julius Maximilians University; Würzburg Germany
- Department of Psychology; Friedrich Schiller University; Jena Germany
| |
|
32
|
Walsh MM, Anderson JR. Navigating complex decision spaces: Problems and paradigms in sequential choice. Psychol Bull 2014; 140:466-86. [PMID: 23834192 PMCID: PMC4309984 DOI: 10.1037/a0033455] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 01/26/2023]
Abstract
To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides 2 general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior but they also provide a useful framework for understanding neural reward valuation and action selection.
Affiliation(s)
- Matthew M. Walsh
- Air Force Research Laboratory, Wright-Patterson Air Force Base, OH 45433
| | - John R. Anderson
- Carnegie Mellon University, Department of Psychology, Pittsburgh, PA 15213
| |
|
33
|
Mehlhorn K, Ben-Asher N, Dutt V, Gonzalez C. Observed Variability and Values Matter: Toward a Better Understanding of Information Search and Decisions from Experience. JOURNAL OF BEHAVIORAL DECISION MAKING 2013. [DOI: 10.1002/bdm.1809] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Affiliation(s)
- Katja Mehlhorn
- Dynamic Decision Making Laboratory, Department of Social and Decision Sciences; Carnegie Mellon University; Pittsburgh PA USA
| | - Noam Ben-Asher
- Dynamic Decision Making Laboratory, Department of Social and Decision Sciences; Carnegie Mellon University; Pittsburgh PA USA
| | - Varun Dutt
- School of Computing and Electrical Engineering, School of Humanities and Social Sciences; Indian Institute of Technology; Mandi India
| | - Cleotilde Gonzalez
- Dynamic Decision Making Laboratory, Department of Social and Decision Sciences; Carnegie Mellon University; Pittsburgh PA USA
| |
|
34
|
Walsh MM, Anderson JR. Electrophysiological responses to feedback during the application of abstract rules. J Cogn Neurosci 2013; 25:1986-2002. [PMID: 23915052 PMCID: PMC5476962 DOI: 10.1162/jocn_a_00454] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/04/2022]
Abstract
Much research focuses on how people acquire concrete stimulus-response associations from experience; however, few neuroscientific studies have examined how people learn about and select among abstract rules. To address this issue, we recorded ERPs as participants performed an abstract rule-learning task. In each trial, they viewed a sample number and two test numbers. Participants then chose a test number using one of three abstract mathematical rules they freely selected from: greater than the sample number, less than the sample number, or equal to the sample number. No one rule was always rewarded, but some rules were rewarded more frequently than others. To maximize their earnings, participants needed to learn which rules were rewarded most frequently. All participants learned to select the best rules for repeating and novel stimulus sets that obeyed the overall reward probabilities. Participants differed, however, in the extent to which they overgeneralized those rules to repeating stimulus sets that deviated from the overall reward probabilities. The feedback-related negativity (FRN), an ERP component thought to reflect reward prediction error, paralleled behavior. The FRN was sensitive to item-specific reward probabilities in participants who detected the deviant stimulus set, and the FRN was sensitive to overall reward probabilities in participants who did not. These results show that the FRN is sensitive to the utility of abstract rules and that the individual's representation of a task's states and actions shapes behavior as well as the FRN.
Affiliation(s)
- Matthew M. Walsh
- Air Force Research Laboratory, Wright-Patterson Air Force Base, OH
| | | |
|
35
|
Better late than never? The effect of feedback delay on ERP indices of reward processing. COGNITIVE AFFECTIVE & BEHAVIORAL NEUROSCIENCE 2013; 12:671-7. [PMID: 22752976 DOI: 10.3758/s13415-012-0104-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Indexed: 11/08/2022]
Abstract
The feedback negativity (FN), an early neural response that differentiates rewards from losses, appears to be generated in part by reward circuits in the brain. A prominent model of the FN suggests that it reflects learning processes by which environmental feedback shapes behavior. Although there is evidence that human behavior is more strongly influenced by rewards that quickly follow actions, in nonlaboratory settings, optimal behaviors are not always followed by immediate rewards. However, it is not clear how the introduction of a delay between response selection and feedback impacts the FN. Thus, the present study used a simple forced choice gambling task to elicit the FN, in which feedback about rewards and losses was presented after either 1 or 6 s. Results suggest that, at short delays (1 s), participants clearly differentiated losses from rewards, as evidenced in the magnitude of the FN. At long delays (6 s), on the other hand, the difference between losses and rewards was negligible. Results are discussed in terms of eligibility traces and the reinforcement learning model of the FN.
|
36
|
Osinsky R, Mussel P, Ohrlein L, Hewig J. A neural signature of the creation of social evaluation. Soc Cogn Affect Neurosci 2013; 9:731-6. [PMID: 23547246 DOI: 10.1093/scan/nst051] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 12/22/2022]
Abstract
Previous research has shown that receiving an unfair monetary offer in economic bargaining elicits a so-called feedback negativity (FN). This scalp-recorded brain potential probably reflects a bad-vs-good evaluation in the medial frontal cortex and has been linked to fundamental processes of reinforcement learning. In the present study, we investigated whether the evaluative mechanism indexed by the FN is also involved in learning who is an unfair vs fair bargaining partner. An electroencephalogram was recorded while participants completed a computerized version of the Ultimatum Game, repeatedly receiving fair or unfair monetary offers from alleged other participants. Some of these proposers were either always fair or always unfair in their offers. In each trial, participants first saw a portrait picture of the respective proposer before the monetary offer was presented. Therefore, the faces could be used as predictive cues for the fairness of the pending offers. We found that not only unfair offers themselves induced a FN, but also (over the task) faces of unfair proposers. Thus, when interaction partners repeatedly behave in an unfair way, their faces acquire a negative valence, which manifests in a basal neural mechanism of bad-vs-good evaluation.
Affiliation(s)
- Roman Osinsky
- Department of Psychology I, Julius-Maximilians-University Würzburg, 97070 Würzburg, Germany and Department of Psychology, Friedrich-Schiller-University Jena, 07743 Jena, Germany
| | - Patrick Mussel
- Department of Psychology I, Julius-Maximilians-University Würzburg, 97070 Würzburg, Germany and Department of Psychology, Friedrich-Schiller-University Jena, 07743 Jena, Germany
| | - Linda Ohrlein
- Department of Psychology I, Julius-Maximilians-University Würzburg, 97070 Würzburg, Germany and Department of Psychology, Friedrich-Schiller-University Jena, 07743 Jena, Germany
| | - Johannes Hewig
- Department of Psychology I, Julius-Maximilians-University Würzburg, 97070 Würzburg, Germany and Department of Psychology, Friedrich-Schiller-University Jena, 07743 Jena, Germany
| |
|
37
|
Walsh MM, Anderson JR. Learning from experience: event-related potential correlates of reward processing, neural adaptation, and behavioral choice. Neurosci Biobehav Rev 2012; 36:1870-84. [PMID: 22683741 DOI: 10.1016/j.neubiorev.2012.05.008] [Citation(s) in RCA: 366] [Impact Index Per Article: 30.5] [Received: 02/26/2012] [Revised: 05/17/2012] [Accepted: 05/21/2012] [Indexed: 11/30/2022]
Abstract
To behave adaptively, we must learn from the consequences of our actions. Studies using event-related potentials (ERPs) have been informative with respect to the question of how such learning occurs. These studies have revealed a frontocentral negativity termed the feedback-related negativity (FRN) that appears after negative feedback. According to one prominent theory, the FRN tracks the difference between the values of actual and expected outcomes, or reward prediction errors. As such, the FRN provides a tool for studying reward valuation and decision making. We begin this review by examining the neural significance of the FRN. We then examine its functional significance. To understand the cognitive processes that occur when the FRN is generated, we explore variables that influence its appearance and amplitude. Specifically, we evaluate four hypotheses: (1) the FRN encodes a quantitative reward prediction error; (2) the FRN is evoked by outcomes and by stimuli that predict outcomes; (3) the FRN and behavior change with experience; and (4) the system that produces the FRN is maximally engaged by volitional actions.
Affiliation(s)
- Matthew M Walsh
- Carnegie Mellon University, Department of Psychology, Baker Hall 342c, Pittsburgh, PA 15213, United States.
| | | |
|
38
|
Modulation of the feedback-related negativity by instruction and experience. Proc Natl Acad Sci U S A 2011; 108:19048-53. [PMID: 22065792 DOI: 10.1073/pnas.1117189108] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Indexed: 11/18/2022]
Abstract
A great deal of research focuses on how humans and animals learn from trial-and-error interactions with the environment. This research has established the viability of reinforcement learning as a model of behavioral adaptation and neural reward valuation. Error-driven learning is inefficient and dangerous, however. Fortunately, humans learn from nonexperiential sources of information as well. In the present study, we focused on one such form of information, instruction. We recorded event-related potentials as participants performed a probabilistic learning task. In one experiment condition, participants received feedback only about whether their responses were rewarded. In the other condition, they also received instruction about reward probabilities before performing the task. We found that instruction eliminated participants' reliance on feedback as evidenced by their immediate asymptotic performance in the instruction condition. In striking contrast, the feedback-related negativity, an event-related potential component thought to reflect neural reward prediction error, continued to adapt with experience in both conditions. These results show that, whereas instruction may immediately control behavior, certain neural responses must be learned from experience.
|