1
|
Clairis N, Lopez-Persem A. Debates on the dorsomedial prefrontal/dorsal anterior cingulate cortex: insights for future research. Brain 2023; 146:4826-4844. [PMID: 37530487 PMCID: PMC10690029 DOI: 10.1093/brain/awad263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 07/19/2023] [Accepted: 07/22/2023] [Indexed: 08/03/2023] Open
Abstract
The dorsomedial prefrontal cortex/dorsal anterior cingulate cortex (dmPFC/dACC) is a brain area subject to many theories and debates over its function(s). Even its precise anatomical borders are subject to much controversy. In the past decades, the dmPFC/dACC has been associated with more than 15 different cognitive processes, which sometimes appear quite unrelated (e.g. body perception, cognitive conflict). As a result, understanding what the dmPFC/dACC does has become a real challenge for many neuroscientists. Several theories of this brain area's function(s) have been developed, leading to successive and competitive publications bearing different models, which sometimes contradict each other. During the last two decades, the lively scientific exchanges around the dmPFC/dACC have promoted fruitful research in cognitive neuroscience. In this review, we provide an overview of the anatomy of the dmPFC/dACC, summarize the state of the art of functions that have been associated with this brain area and present the main theories aiming at explaining the dmPFC/dACC function(s). We explore the commonalities and the arguments between the different theories. Finally, we explain what can be learned from these debates for future investigations of the dmPFC/dACC and other brain regions' functions.
Affiliation(s)
- Nicolas Clairis
- Laboratory of Behavioral Genetics (LGC)- Brain Mind Institute (BMI)- Sciences de la Vie (SV), École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
| | - Alizée Lopez-Persem
- FrontLab, Institut du Cerveau - Paris Brain Institute - ICM, Inserm, CNRS, Sorbonne University, AP HP, Hôpital de la Pitié Salpêtrière, 75013 Paris, France
| |
|
2
|
van de Groep IH, Bos MGN, Popma A, Crone EA, Jansen LMC. A neurocognitive model of early onset persistent and desistant antisocial behavior in early adulthood. Front Hum Neurosci 2023; 17:1100277. [PMID: 37533586 PMCID: PMC10392129 DOI: 10.3389/fnhum.2023.1100277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Accepted: 06/22/2023] [Indexed: 08/04/2023] Open
Abstract
It remains unclear which functional and neurobiological mechanisms are associated with persistent and desistant antisocial behavior in early adulthood. We reviewed the empirical literature and propose a neurocognitive social information processing model for early onset persistent and desistant antisocial behavior in early adulthood, focusing on how young adults evaluate, act upon, monitor, and learn about their goals and self traits. Based on the reviewed literature, we propose that persistent antisocial behavior is characterized by domain-general impairments in self-relevant and goal-related information processing, regulation, and learning, which are accompanied by altered activity in fronto-limbic brain areas. We propose that desistant antisocial development is associated with more effortful information processing, regulation and learning, that possibly balances self-relevant goals and specific situational characteristics. The proposed framework advances insights by considering individual differences such as psychopathic personality traits, and specific emotional characteristics (e.g., valence of social cues), to further illuminate functional and neural mechanisms underlying heterogeneous developmental pathways. Finally, we address important open questions and offer suggestions for future research to improve scientific knowledge on general and context-specific expression and development of antisocial behavior in early adulthood.
Affiliation(s)
- Ilse H. van de Groep
- Erasmus School of Social and Behavioral Sciences, Erasmus University Rotterdam, Rotterdam, Netherlands
- Leiden Institute for Brain and Cognition, Leiden University, Leiden, Netherlands
- Department of Child and Adolescent Psychiatry and Psychosocial Care, Amsterdam University Medical Center, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
| | - Marieke G. N. Bos
- Leiden Institute for Brain and Cognition, Leiden University, Leiden, Netherlands
- Department of Developmental and Educational Psychology, Institute of Psychology, Leiden University, Leiden, Netherlands
| | - Arne Popma
- Department of Child and Adolescent Psychiatry and Psychosocial Care, Amsterdam University Medical Center, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Amsterdam Public Health, Mental Health, Amsterdam, Netherlands
| | - Eveline A. Crone
- Erasmus School of Social and Behavioral Sciences, Erasmus University Rotterdam, Rotterdam, Netherlands
- Leiden Institute for Brain and Cognition, Leiden University, Leiden, Netherlands
- Department of Developmental and Educational Psychology, Institute of Psychology, Leiden University, Leiden, Netherlands
| | - Lucres M. C. Jansen
- Department of Child and Adolescent Psychiatry and Psychosocial Care, Amsterdam University Medical Center, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Amsterdam Public Health, Mental Health, Amsterdam, Netherlands
| |
|
3
|
Prescott TJ, Wilson SP. Understanding brain functional architecture through robotics. Sci Robot 2023; 8:eadg6014. [PMID: 37256968 DOI: 10.1126/scirobotics.adg6014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 05/05/2023] [Indexed: 06/02/2023]
Abstract
Robotics is increasingly seen as a useful test bed for computational models of the brain functional architecture underlying animal behavior. We provide an overview of past and current work, focusing on probabilistic and dynamical models, including approaches premised on the free energy principle, situating this endeavor in relation to evidence that the brain constitutes a layered control system. We argue that future neurorobotic models should integrate multiple neurobiological constraints and be hybrid in nature.
Affiliation(s)
- Tony J Prescott
- Department of Computer Science, University of Sheffield, Sheffield, UK
| | - Stuart P Wilson
- Department of Computer Science, University of Sheffield, Sheffield, UK
| |
|
4
|
A Reinforcement Meta-Learning framework of executive function and information demand. Neural Netw 2023; 157:103-113. [DOI: 10.1016/j.neunet.2022.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Revised: 09/05/2022] [Accepted: 10/06/2022] [Indexed: 11/09/2022]
|
5
|
Brain-inspired meta-reinforcement learning cognitive control in conflictual inhibition decision-making task for artificial agents. Neural Netw 2022; 154:283-302. [DOI: 10.1016/j.neunet.2022.06.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2021] [Revised: 06/09/2022] [Accepted: 06/16/2022] [Indexed: 11/21/2022]
|
6
|
Wang D, Chen S, Hu Y, Liu L, Wang H. Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model. IEEE Trans Cogn Dev Syst 2022. [DOI: 10.1109/tcds.2020.3035778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
7
|
|
8
|
Bouchacourt F, Palminteri S, Koechlin E, Ostojic S. Temporal chunking as a mechanism for unsupervised learning of task-sets. eLife 2020; 9:50469. [PMID: 32149602 PMCID: PMC7108869 DOI: 10.7554/elife.50469] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/23/2019] [Accepted: 02/24/2020] [Indexed: 12/26/2022] Open
Abstract
Depending on environmental demands, humans can learn and exploit multiple concurrent sets of stimulus-response associations. Mechanisms underlying the learning of such task-sets remain unknown. Here we investigate the hypothesis that task-set learning relies on unsupervised chunking of stimulus-response associations that occur in temporal proximity. We examine behavioral and neural data from a task-set learning experiment using a network model. We first show that task-set learning can be achieved provided the timescale of chunking is slower than the timescale of stimulus-response learning. Fitting the model to behavioral data on a subject-by-subject basis confirmed this expectation and led to specific predictions linking chunking and task-set retrieval that were borne out by behavioral performance and reaction times. Comparing the model activity with BOLD signal allowed us to identify neural correlates of task-set retrieval in a functional network involving ventral and dorsal prefrontal cortex, with the dorsal system preferentially engaged when retrievals are used to improve performance.
Affiliation(s)
- Flora Bouchacourt
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Sante et de la Recherche Medicale, Paris, France; Departement d'Etudes Cognitives, Ecole Normale Superieure, Paris, France
| | - Stefano Palminteri
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Sante et de la Recherche Medicale, Paris, France; Departement d'Etudes Cognitives, Ecole Normale Superieure, Paris, France; Institut d'Etudes de la Cognition, Universite de Recherche Paris Sciences et Lettres, Paris, France
| | - Etienne Koechlin
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Sante et de la Recherche Medicale, Paris, France; Departement d'Etudes Cognitives, Ecole Normale Superieure, Paris, France
| | - Srdjan Ostojic
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Sante et de la Recherche Medicale, Paris, France; Departement d'Etudes Cognitives, Ecole Normale Superieure, Paris, France; Institut d'Etudes de la Cognition, Universite de Recherche Paris Sciences et Lettres, Paris, France
| |
|
9
|
Dopamine blockade impairs the exploration-exploitation trade-off in rats. Sci Rep 2019; 9:6770. [PMID: 31043685 PMCID: PMC6494917 DOI: 10.1038/s41598-019-43245-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 10/16/2018] [Accepted: 04/18/2019] [Indexed: 01/30/2023] Open
Abstract
In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has theoretically been proposed that dopamine contributes to the control of this exploration-exploitation trade-off, specifically that the higher the level of tonic dopamine, the more exploitation is favored. We demonstrate here that there is a formal relationship between the rescaling of dopamine positive reward prediction errors and the exploration-exploitation trade-off in simple non-stationary multi-armed bandit tasks. We further show in rats performing such a task that systemically antagonizing dopamine receptors greatly increases the number of random choices without affecting learning capacities. Simulations and comparison of a set of different computational models (an extended Q-learning model, a directed exploration model, and a meta-learning model) fitted on each individual confirm that, independently of the model, decreasing dopaminergic activity does not affect learning rate but is equivalent to an increase in random exploration rate. This study shows that dopamine could adapt the exploration-exploitation trade-off in decision-making when facing changing environmental contingencies.
|
10
|
Impacts of inter-trial interval duration on a computational model of sign-tracking vs. goal-tracking behaviour. Psychopharmacology (Berl) 2019; 236:2373-2388. [PMID: 31367850 PMCID: PMC6695359 DOI: 10.1007/s00213-019-05323-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/20/2019] [Accepted: 07/01/2019] [Indexed: 01/15/2023]
Abstract
In the context of Pavlovian conditioning, two types of behaviour may emerge within the population (Flagel et al. Nature, 469(7328): 53-57, 2011). Animals may choose to engage either with the conditioned stimulus (CS), a behaviour known as sign-tracking (ST) which is sensitive to dopamine inhibition for its acquisition, or with the food cup in which the reward or unconditioned stimulus (US) will eventually be delivered, a behaviour known as goal-tracking (GT) which is dependent on dopamine for its expression only. Previous work by Lesaint et al. (PLoS Comput Biol, 10(2), 2014) offered a computational explanation for these phenomena and led to the prediction that varying the duration of the inter-trial interval (ITI) would change the relative ST-GT proportion in the population as well as phasic dopamine responses. A recent study verified this prediction, but also found a rich variance of ST and GT behaviours within the trial which goes beyond the original computational model. In this paper, we provide a computational perspective on these novel results.
|
11
|
Khamassi M, Velentzas G, Tsitsimis T, Tzafestas C. Robot Fast Adaptation to Changes in Human Engagement During Simulated Dynamic Social Interaction With Active Exploration in Parameterized Reinforcement Learning. IEEE Trans Cogn Dev Syst 2018. [DOI: 10.1109/tcds.2018.2843122] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/10/2022]
|
12
|
Wang JX, Kurth-Nelson Z, Kumaran D, Tirumala D, Soyer H, Leibo JZ, Hassabis D, Botvinick M. Prefrontal cortex as a meta-reinforcement learning system. Nat Neurosci 2018; 21:860-868. [DOI: 10.1038/s41593-018-0147-8] [Citation(s) in RCA: 258] [Impact Index Per Article: 43.0] [Received: 12/13/2017] [Accepted: 04/05/2018] [Indexed: 11/09/2022]
|
13
|
Stolyarova A. Solving the Credit Assignment Problem With the Prefrontal Cortex. Front Neurosci 2018; 12:182. [PMID: 29636659 PMCID: PMC5881225 DOI: 10.3389/fnins.2018.00182] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/27/2017] [Accepted: 03/06/2018] [Indexed: 12/13/2022] Open
Abstract
In naturalistic multi-cue and multi-step learning tasks, where outcomes of behavior are delayed in time, discovering which choices are responsible for rewards can present a challenge, known as the credit assignment problem. In this review, I summarize recent work that highlighted a critical role for the prefrontal cortex (PFC) in assigning credit where it is due in tasks where only a few of the multitude of cues or choices are relevant to the final outcome of behavior. Collectively, these investigations have provided compelling support for specialized roles of the orbitofrontal (OFC), anterior cingulate (ACC), and dorsolateral prefrontal (dlPFC) cortices in contingent learning. However, recent work has similarly revealed shared contributions and emphasized rich and heterogeneous response properties of neurons in these brain regions. Such functional overlap is not surprising given the complexity of reciprocal projections spanning the PFC. In the concluding section, I overview the evidence suggesting that the OFC, ACC and dlPFC communicate extensively, sharing the information about presented options, executed decisions and received rewards, which enables them to assign credit for outcomes to choices on which they are contingent. This account suggests that lesion or inactivation/inhibition experiments targeting a localized PFC subregion will be insufficient to gain a fine-grained understanding of credit assignment during learning and instead poses refined questions for future research, shifting the focus from focal manipulations to experimental techniques targeting cortico-cortical projections.
Affiliation(s)
- Alexandra Stolyarova
- Department of Psychology, University of California, Los Angeles, Los Angeles, CA, United States
| |
|
14
|
Cogliati Dezza I, Yu AJ, Cleeremans A, Alexander W. Learning the value of information and reward over time when solving exploration-exploitation problems. Sci Rep 2017; 7:16919. [PMID: 29209058 PMCID: PMC5717252 DOI: 10.1038/s41598-017-17237-w] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 08/07/2017] [Accepted: 11/22/2017] [Indexed: 11/09/2022] Open
Abstract
To flexibly adapt to the demands of their environment, animals are constantly exposed to the conflict resulting from having to choose between predictably rewarding familiar options (exploitation) and risky novel options, the value of which essentially consists of obtaining new information about the space of possible rewards (exploration). Despite extensive research, the mechanisms that subtend the manner in which animals solve this exploitation-exploration dilemma are still poorly understood. Here, we investigate human decision-making in a gambling task in which the informational value of each trial and the reward potential were separately manipulated. To better characterize the mechanisms that underlie the observed behavioural choices, we introduce a computational model that augments the standard reward-based reinforcement learning formulation by associating a value to information. We find that both reward and information gained during learning influence the balance between exploitation and exploration, and that this influence depends on the reward context. Our results shed light on the mechanisms that underpin decision-making under uncertainty, and suggest new approaches for investigating the exploration-exploitation dilemma throughout the animal kingdom.
Affiliation(s)
- Irene Cogliati Dezza
- Centre for Research in Cognition & Neurosciences (CRCN), Université Libre de Bruxelles, Brussels, Belgium
| | - Angela J Yu
- Department of Cognitive Science, University of California San Diego, La Jolla, CA, United States
| | - Axel Cleeremans
- Centre for Research in Cognition & Neurosciences (CRCN), Université Libre de Bruxelles, Brussels, Belgium
| | - William Alexander
- Department of Experimental Psychology, Ghent University, Gent, Belgium
| |
|
15
|
Vassena E, Holroyd CB, Alexander WH. Computational Models of Anterior Cingulate Cortex: At the Crossroads between Prediction and Effort. Front Neurosci 2017. [PMID: 28634438 PMCID: PMC5459890 DOI: 10.3389/fnins.2017.00316] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Indexed: 11/28/2022] Open
Abstract
In the last two decades the anterior cingulate cortex (ACC) has become one of the most investigated areas of the brain. Extensive neuroimaging evidence suggests countless functions for this region, ranging from conflict and error coding, to social cognition, pain and effortful control. In response to this burgeoning amount of data, a proliferation of computational models has tried to characterize the neurocognitive architecture of ACC. Early seminal models provided a computational explanation for a relatively circumscribed set of empirical findings, mainly accounting for EEG and fMRI evidence. More recent models have focused on ACC's contribution to effortful control. In parallel to these developments, several proposals attempted to explain within a single computational framework a wider variety of empirical findings that span different cognitive processes and experimental modalities. Here we critically evaluate these modeling attempts, highlighting the continued need to reconcile the array of disparate ACC observations within a coherent, unifying framework.
Affiliation(s)
- Eliana Vassena
- Donders Center for Cognitive Neuroimaging, Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands; Department of Experimental Psychology, Ghent University, Ghent, Belgium
| | - Clay B Holroyd
- Department of Psychology, University of Victoria, Victoria, BC, Canada
| | | |
|
16
|
Vinckier F, Gaillard R, Palminteri S, Rigoux L, Salvador A, Fornito A, Adapa R, Krebs MO, Pessiglione M, Fletcher PC. Confidence and psychosis: a neuro-computational account of contingency learning disruption by NMDA blockade. Mol Psychiatry 2016; 21:946-55. [PMID: 26055423 PMCID: PMC5414075 DOI: 10.1038/mp.2015.73] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 10/21/2014] [Revised: 03/28/2015] [Accepted: 04/13/2015] [Indexed: 02/02/2023]
Abstract
A state of pathological uncertainty about environmental regularities might represent a key step in the pathway to psychotic illness. Early psychosis can be investigated in healthy volunteers under ketamine, an NMDA receptor antagonist. Here, we explored the effects of ketamine on contingency learning using a placebo-controlled, double-blind, crossover design. During functional magnetic resonance imaging, participants performed an instrumental learning task, in which cue-outcome contingencies were probabilistic and reversed between blocks. Bayesian model comparison indicated that in such an unstable environment, reinforcement learning parameters are downregulated depending on confidence level, an adaptive mechanism that was specifically disrupted by ketamine administration. Drug effects were underpinned by altered neural activity in a fronto-parietal network, which reflected the confidence-based shift to exploitation of learned contingencies. Our findings suggest that an early characteristic of psychosis lies in a persistent doubt that undermines the stabilization of behavioral policy resulting in a failure to exploit regularities in the environment.
Affiliation(s)
- F Vinckier
- Service de Psychiatrie, Centre Hospitalier Sainte-Anne, Université Paris Descartes, Sorbonne Paris Cité, Faculté de Médecine Paris Descartes, Paris, France
- Motivation, Brain, and Behavior Lab, Centre de Neuro-Imagerie de Recherche, Institut du Cerveau et de la Moelle épinière, Groupe Hospitalier Pitié-Salpêtrière, Paris, France
- INSERM U975, CNRS UMR 7225, UPMC-P6, UMR S 1127, Paris Cedex 13, France
| | - R Gaillard
- Service de Psychiatrie, Centre Hospitalier Sainte-Anne, Université Paris Descartes, Sorbonne Paris Cité, Faculté de Médecine Paris Descartes, Paris, France
- Department of Psychiatry and Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, UK
- Laboratoire de "Physiopathologie des maladies Psychiatriques", Centre de Psychiatrie et Neurosciences U894, INSERM; Université Paris Descartes, Sorbonne Paris Cité, Paris, France
| | - S Palminteri
- Laboratoire de Neurosciences Cognitives (LNC), INSERM U960, Ecole Normale Supérieure (ENS), Paris, France
- Institute of Cognitive Neurosciences (ICN), University College London (UCL), London, UK
| | - L Rigoux
- Motivation, Brain, and Behavior Lab, Centre de Neuro-Imagerie de Recherche, Institut du Cerveau et de la Moelle épinière, Groupe Hospitalier Pitié-Salpêtrière, Paris, France
- INSERM U975, CNRS UMR 7225, UPMC-P6, UMR S 1127, Paris Cedex 13, France
| | - A Salvador
- Service de Psychiatrie, Centre Hospitalier Sainte-Anne, Université Paris Descartes, Sorbonne Paris Cité, Faculté de Médecine Paris Descartes, Paris, France
- Laboratoire de "Physiopathologie des maladies Psychiatriques", Centre de Psychiatrie et Neurosciences U894, INSERM; Université Paris Descartes, Sorbonne Paris Cité, Paris, France
| | - A Fornito
- Monash Clinical and Imaging Neuroscience, School of Psychological Sciences and Monash Biomedical Imaging, Monash University, Victoria, Australia
| | - R Adapa
- Division of Anaesthesia, University of Cambridge, Cambridge, UK
- Addenbrooke's Hospital, Cambridge, UK
| | - M O Krebs
- Service de Psychiatrie, Centre Hospitalier Sainte-Anne, Université Paris Descartes, Sorbonne Paris Cité, Faculté de Médecine Paris Descartes, Paris, France
- Laboratoire de "Physiopathologie des maladies Psychiatriques", Centre de Psychiatrie et Neurosciences U894, INSERM; Université Paris Descartes, Sorbonne Paris Cité, Paris, France
| | - M Pessiglione
- Motivation, Brain, and Behavior Lab, Centre de Neuro-Imagerie de Recherche, Institut du Cerveau et de la Moelle épinière, Groupe Hospitalier Pitié-Salpêtrière, Paris, France
- INSERM U975, CNRS UMR 7225, UPMC-P6, UMR S 1127, Paris Cedex 13, France
| | - P C Fletcher
- Department of Psychiatry and Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, UK
- Cambridge and Peterborough Foundation Trust, Cambridge, UK
| |
|
17
|
Westendorff S, Kaping D, Everling S, Womelsdorf T. Prefrontal and anterior cingulate cortex neurons encode attentional targets even when they do not apparently bias behavior. J Neurophysiol 2016; 116:796-811. [PMID: 27193317 DOI: 10.1152/jn.00027.2016] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 01/08/2016] [Accepted: 05/18/2016] [Indexed: 11/22/2022] Open
Abstract
Neurons in anterior cingulate and prefrontal cortex (ACC/PFC) carry information about behaviorally relevant target stimuli. This information is believed to affect behavior by exerting a top-down attentional bias on stimulus selection. However, attention information may not necessarily be a biasing signal but could be a corollary signal that is not directly related to ongoing behavioral success, or it could reflect the monitoring of targets similar to an eligibility trace useful for later attentional adjustment. To test this suggestion we quantified how attention information relates to behavioral success in neurons recorded in multiple subfields in macaque ACC/PFC during a cued attention task. We found that attention cues activated three separable neuronal groups that encoded spatial attention information but were differently linked to behavioral success. A first group encoded attention targets on correct and error trials. This group spread across ACC/PFC and represented targets transiently after cue onset, irrespective of behavior. A second group encoded attention targets on correct trials only, closely predicting behavior. These neurons were not only prevalent in lateral prefrontal but also in anterior cingulate cortex. A third group encoded target locations only on error trials. This group was evident in ACC and PFC and was activated in error trials "as if" attention was shifted to the target location but without evidence for such behavior. These results show that only a portion of neuronally available information about attention targets biases behavior. We speculate that additionally a unique neural subnetwork encodes counterfactual attention information.
Affiliation(s)
- Stephanie Westendorff
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario, Canada
| | - Daniel Kaping
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario, Canada; National Institute of Mental Health, Klecany, Czech Republic
| | - Stefan Everling
- Department of Physiology and Pharmacology, Centre for Functional and Metabolic Mapping, Western University, Ontario, Canada
| | - Thilo Womelsdorf
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario, Canada; Department of Physiology and Pharmacology, Centre for Functional and Metabolic Mapping, Western University, Ontario, Canada
| |
|
18
|
Chakraborty S, Kolling N, Walton ME, Mitchell AS. Critical role for the mediodorsal thalamus in permitting rapid reward-guided updating in stochastic reward environments. eLife 2016; 5. [PMID: 27136677 PMCID: PMC4887209 DOI: 10.7554/elife.13588] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 12/06/2015] [Accepted: 05/01/2016] [Indexed: 11/13/2022] Open
Abstract
Adaptive decision-making uses information gained when exploring alternative options to decide whether to update the current choice strategy. Magnocellular mediodorsal thalamus (MDmc) supports adaptive decision-making, but its causal contribution is not well understood. Monkeys with excitotoxic MDmc damage were tested on probabilistic three-choice decision-making tasks. They could learn and track the changing values in object-reward associations, but they were severely impaired at updating choices after reversals in reward contingencies or when there were multiple options associated with reward. These deficits were not caused by perseveration or insensitivity to negative feedback though. Instead, monkeys with MDmc lesions exhibited an inability to use reward to promote choice repetition after switching to an alternative option due to a diminished influence of recent past choices and the last outcome to guide future behavior. Together, these data suggest MDmc allows for the rapid discovery and persistence with rewarding options, particularly in uncertain or changing environments. DOI: http://dx.doi.org/10.7554/eLife.13588.001
A small structure deep inside the brain, called the mediodorsal thalamus, is a critical part of a brain network that is important for learning new information and making decisions. However, the exact role of this brain area is still not understood, and there is little evidence showing that this area is actually needed to make the best choices. To explore the role of this area further, Chakraborty et al. trained macaque monkeys to choose between three colorful objects displayed on a touchscreen that was controlled by a computer. Some of their choices resulted in the monkeys getting a tasty food pellet as a reward. However, the probability of receiving a reward changed during testing, and in some cases, reversed, meaning that the highest rewarded object was no longer rewarded when chosen and vice versa.
While at first the monkeys did not know which choice was the right one, they quickly learned and changed their choices during the test according to which option resulted in them receiving the most reward. Next, the mediodorsal thalamus in each monkey was damaged and the tests were repeated. Previous research had suggested that such damage might result in animals repeatedly choosing the same option, even though it is clearly the wrong choice. However, Chakraborty et al. showed that it is not as simple as that. Instead monkeys with damage to the mediodorsal thalamus could make different choices but they struggled to use information from their most recent choices to best guide their future behavior. Specifically, the pattern of the monkeys’ choices suggests that the mediodorsal thalamus helps to quickly link recent choices that resulted in a reward in order to allow an individual to choose the best option as their next choice. Further studies are now needed to understand the messages that are relayed between the mediodorsal thalamus and interconnected areas during this rapid linking of recent choices, rewards and upcoming decisions. This will help reveal how these brain areas support normal thought processes and how these processes might be altered in mental health disorders involving learning information and making decisions. DOI:http://dx.doi.org/10.7554/eLife.13588.002
Affiliation(s)
- Nils Kolling
- Department of Experimental Psychology, Oxford University, Oxford, United Kingdom
- Mark E Walton
- Department of Experimental Psychology, Oxford University, Oxford, United Kingdom
- Anna S Mitchell
- Department of Experimental Psychology, Oxford University, Oxford, United Kingdom
|
19
|
Loncar-Turukalo T, Mijatovic G, Bozanic N, Stoll FM, Bajic D, Procyk E. Time-frequency characterization of local field potential in a decision making task. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2016; 2015:5565-8. [PMID: 26737553 DOI: 10.1109/embc.2015.7319653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
This study seeks to characterize the neuronal mechanisms underlying voluntary decisions to check/verify. In order to describe and potentially decode decisions from brain signals, we analyzed intracortical recordings from monkey prefrontal regions obtained during a cognitive task requiring self-initiated as well as cue-instructed decisions. Using local field potentials (LFP) and single units, we analyzed power spectral density, oscillatory modes, power profiles in time, single unit firing rate, and spike-phase relationships in the β band. Our results point toward specific but variable activation patterns of oscillations in the β band from separate recordings, with task-dependent frequency preference and amplitude modulation of power. The results suggest relationships between particular LFP oscillations and functions engaged at specific times in the task.
|
20
|
Logiaco L, Quilodran R, Procyk E, Arleo A. Spatiotemporal Spike Coding of Behavioral Adaptation in the Dorsal Anterior Cingulate Cortex. PLoS Biol 2015; 13:e1002222. [PMID: 26266537 PMCID: PMC4534466 DOI: 10.1371/journal.pbio.1002222] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/06/2015] [Accepted: 07/06/2015] [Indexed: 11/18/2022] Open
Abstract
The frontal cortex controls behavioral adaptation in environments governed by complex rules. Many studies have established the relevance of firing rate modulation after informative events signaling whether and how to update the behavioral policy. However, whether the spatiotemporal features of these neuronal activities contribute to encoding imminent behavioral updates remains unclear. We investigated this issue in the dorsal anterior cingulate cortex (dACC) of monkeys while they adapted their behavior based on their memory of feedback from past choices. We analyzed spike trains of both single units and pairs of simultaneously recorded neurons using an algorithm that emulates different biologically plausible decoding circuits. This method permits the assessment of the performance of both spike-count and spike-timing sensitive decoders. In response to the feedback, single neurons emitted stereotypical spike trains whose temporal structure identified informative events with higher accuracy than mere spike count. The optimal decoding time scale was in the range of 70-200 ms, which is significantly shorter than the memory time scale required by the behavioral task. Importantly, the temporal spiking patterns of single units were predictive of the monkeys' behavioral response time. Furthermore, some features of these spiking patterns often varied between jointly recorded neurons. Altogether, our results suggest that dACC drives behavioral adaptation through complex spatiotemporal spike coding. They also indicate that downstream networks, which decode dACC feedback signals, are unlikely to act as mere neural integrators.
Affiliation(s)
- Laureline Logiaco
- INSERM, U968, Paris, France
- Sorbonne Universités, UPMC Univ Paris 06, UMR_S 968, Institut de la Vision, Paris, France
- CNRS, UMR_7210, Paris, France
- * E-mail: (LL); (AA)
- René Quilodran
- Escuela de Medicina, Departamento de Pre-clínicas, Universidad de Valparaíso, Hontaneda, Valparaíso, Chile
- Emmanuel Procyk
- Stem Cell and Brain Research Institute, Institut National de la Santé et de la Recherche Médicale U846, 69500 Bron, France
- Université de Lyon, Université Lyon 1, Lyon, France
- Angelo Arleo
- INSERM, U968, Paris, France
- Sorbonne Universités, UPMC Univ Paris 06, UMR_S 968, Institut de la Vision, Paris, France
- CNRS, UMR_7210, Paris, France
- * E-mail: (LL); (AA)
|
21
|
Learning to minimize efforts versus maximizing rewards: computational principles and neural correlates. J Neurosci 2015; 34:15621-30. [PMID: 25411490 DOI: 10.1523/jneurosci.1350-14.2014] [Citation(s) in RCA: 104] [Impact Index Per Article: 11.6] [Indexed: 11/21/2022] Open
Abstract
The mechanisms of reward maximization have been extensively studied at both the computational and neural levels. By contrast, little is known about how the brain learns to choose the options that minimize action cost. In principle, the brain could have evolved a general mechanism that applies the same learning rule to the different dimensions of choice options. To test this hypothesis, we scanned healthy human volunteers while they performed a probabilistic instrumental learning task that varied in both the physical effort and the monetary outcome associated with choice options. Behavioral data showed that the same computational rule, using prediction errors to update expectations, could account for both reward maximization and effort minimization. However, these learning-related variables were encoded in partially dissociable brain areas. In line with previous findings, the ventromedial prefrontal cortex was found to positively represent expected and actual rewards, regardless of effort. A separate network, encompassing the anterior insula, the dorsal anterior cingulate, and the posterior parietal cortex, correlated positively with expected and actual efforts. These findings suggest that the same computational rule is applied by distinct brain systems, depending on the choice dimension (cost or benefit) that has to be learned.
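The idea of one prediction-error rule serving both choice dimensions can be illustrated with a minimal sketch. This is a generic delta rule, not the model fitted in the study; the function name `delta_update`, the learning rate of 0.3, and the example outcomes are all assumptions made for illustration.

```python
def delta_update(expectation, outcome, learning_rate=0.3):
    """Shift an expectation toward the observed outcome.

    The prediction error is the gap between what happened and what was
    expected; the learning rate (an assumed value) sets the step size.
    """
    prediction_error = outcome - expectation
    return expectation + learning_rate * prediction_error


# Hypothetical trial outcomes for one option: (reward obtained, effort spent).
trials = [(1.0, 0.2), (0.0, 0.8), (1.0, 0.2)]

expected_reward = 0.0
expected_effort = 0.0
for reward, effort in trials:
    # The same rule is applied to both dimensions of the option.
    expected_reward = delta_update(expected_reward, reward)
    expected_effort = delta_update(expected_effort, effort)
```

A chooser built on this sketch would prefer options with high `expected_reward` and low `expected_effort`; how the two are combined is left open here.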
|
22
|
Khamassi M, Quilodran R, Enel P, Dominey PF, Procyk E. Behavioral Regulation and the Modulation of Information Coding in the Lateral Prefrontal and Cingulate Cortex. Cereb Cortex 2014; 25:3197-218. [PMID: 24904073 DOI: 10.1093/cercor/bhu114] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Indexed: 01/11/2023] Open
Abstract
To explain the high level of flexibility in primate decision-making, theoretical models often invoke reinforcement-based mechanisms, performance monitoring functions, and core neural features within frontal cortical regions. However, the underlying biological mechanisms remain unknown. In recent models, part of the regulation of behavioral control is based on meta-learning principles, for example, driving exploratory actions by varying a meta-parameter, the inverse temperature, which regulates the contrast between competing action probabilities. Here we investigate how complementary processes between lateral prefrontal cortex (LPFC) and dorsal anterior cingulate cortex (dACC) implement decision regulation during exploratory and exploitative behaviors. Model-based analyses of unit activity recorded in these 2 areas in monkeys first revealed that adaptation of the decision function is reflected in a covariation between LPFC neural activity and the control level estimated from the animal's behavior. Second, dACC more prominently encoded a reflection of outcome uncertainty useful for control regulation based on task monitoring. Model-based analyses also revealed higher information integration before feedback in LPFC, and after feedback in dACC. Overall the data support a role of dACC in integrating reinforcement-based information to regulate decision functions in LPFC. Our results thus provide biological evidence on how prefrontal cortical subregions may cooperate to regulate decision-making.
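The role of the inverse temperature mentioned above can be sketched with a standard softmax choice rule. This is a textbook formulation rather than the authors' model code; `softmax_policy` and the example action values are hypothetical.

```python
import math


def softmax_policy(action_values, inverse_temperature):
    """Turn action values into choice probabilities.

    A higher inverse temperature sharpens the contrast between competing
    actions (exploitation); a lower one flattens it (exploration).
    """
    scaled = [inverse_temperature * value for value in action_values]
    # Subtract the maximum before exponentiating for numerical stability.
    max_scaled = max(scaled)
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


values = [1.0, 0.5, 0.0]                      # assumed action values
exploratory = softmax_policy(values, 0.5)     # low contrast between actions
exploitative = softmax_policy(values, 10.0)   # best action dominates
```

In a meta-learning scheme of the kind the abstract describes, the inverse temperature would itself be adjusted over time, for example from recent outcome history; that adjustment is not modelled in this sketch.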
Affiliation(s)
- Mehdi Khamassi
- Stem Cell and Brain Research Institute, INSERM U846, 69500 Bron, France; Université de Lyon, Lyon 1, UMR-S 846, 69003 Lyon, France; Institut des Systèmes Intelligents et de Robotique, Université Pierre et Marie Curie-Paris 6, F-75252, Paris Cedex 05, France; Centre National de la Recherche Scientifique UMR 7222, F-75005, Paris Cedex 05, France
- René Quilodran
- Stem Cell and Brain Research Institute, INSERM U846, 69500 Bron, France; Université de Lyon, Lyon 1, UMR-S 846, 69003 Lyon, France; Escuela de Medicina, Departamento de Pre-clínicas, Universidad de Valparaíso, Hontaneda 2653, Valparaíso, Chile
- Pierre Enel
- Stem Cell and Brain Research Institute, INSERM U846, 69500 Bron, France; Université de Lyon, Lyon 1, UMR-S 846, 69003 Lyon, France
- Peter F Dominey
- Stem Cell and Brain Research Institute, INSERM U846, 69500 Bron, France; Université de Lyon, Lyon 1, UMR-S 846, 69003 Lyon, France
- Emmanuel Procyk
- Stem Cell and Brain Research Institute, INSERM U846, 69500 Bron, France; Université de Lyon, Lyon 1, UMR-S 846, 69003 Lyon, France
|
23
|
Durning SJ, Capaldi VF, Artino AR, Graner J, van der Vleuten C, Beckman TJ, Costanzo M, Holmboe E, Schuwirth L. A pilot study exploring the relationship between internists' self-reported sleepiness, performance on multiple-choice exam items and prefrontal cortex activity. Medical Teacher 2014; 36:434-440. [PMID: 24593696 DOI: 10.3109/0142159x.2014.888408] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 06/03/2023]
Abstract
BACKGROUND: Studies of resident fatigue and performance have shown mixed results. However, research has not examined daytime sleepiness and performance among attending physicians. The purpose of this study was to explore the relationship between sleep, performance and prefrontal cortex (PFC) activity. We hypothesized that sleepiness scores would negatively correlate with multiple-choice question (MCQ) performance and would also correlate with PFC activity.
METHODS: Board-certified physicians completed an Epworth Sleepiness Scale (ESS) and then answered MCQs from licensing examinations while in a functional Magnetic Resonance Imaging (fMRI) scanner.
RESULTS: Seventeen board-certified internists completed the study. The mean number of correct responses was 18.5/32. The correlation between the ESS and MCQ score was -0.30, and higher ESS scores were negatively associated with statistically significant changes in medial PFC (mPFC) activity.
CONCLUSIONS: Attending physicians who reported higher sleepiness scores performed worse on licensing exam questions. Notably, our cohort had normal to mild sleepiness scores. Moreover, higher sleepiness scores were negatively associated with changes in mPFC activity on fMRI, which is consistent with emerging work implicating the PFC in fatigue-related cognitive impairment. Our findings have implications regarding the impact of sleep on physician performance during examinations and potentially on their care of patients.
|
24
|
Berger-Tal O, Nathan J, Meron E, Saltz D. The exploration-exploitation dilemma: a multidisciplinary framework. PLoS One 2014; 9:e95693. [PMID: 24756026 PMCID: PMC3995763 DOI: 10.1371/journal.pone.0095693] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 10/27/2013] [Accepted: 03/30/2014] [Indexed: 11/18/2022] Open
Abstract
The trade-off between the need to obtain new knowledge and the need to use that knowledge to improve performance is one of the most basic trade-offs in nature, and optimal performance usually requires some balance between exploratory and exploitative behaviors. Researchers in many disciplines have been searching for the optimal solution to this dilemma. Here we present a novel model in which the exploration strategy itself is dynamic and varies with time in order to optimize a definite goal, such as the acquisition of energy, money, or prestige. Our model produced four very distinct phases: Knowledge establishment, Knowledge accumulation, Knowledge maintenance, and Knowledge exploitation, giving rise to a multidisciplinary framework that applies equally to humans, animals, and organizations. The framework can be used to explain a multitude of phenomena in various disciplines, such as the movement of animals in novel landscapes, the most efficient resource allocation for a start-up company, or the effects of old age on knowledge acquisition in humans.
Affiliation(s)
- Oded Berger-Tal
- Mitrani Department of Desert Ecology, Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Midreshet Ben-Gurion, Israel
- Jonathan Nathan
- Department of Solar Energy and Environmental Physics, Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Midreshet Ben-Gurion, Israel
- Ehud Meron
- Department of Solar Energy and Environmental Physics, Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Midreshet Ben-Gurion, Israel
- Physics Department, Ben-Gurion University of the Negev, Beer Sheva, Israel
- David Saltz
- Mitrani Department of Desert Ecology, Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Midreshet Ben-Gurion, Israel
|
25
|
A spiking neural integrator model of the adaptive control of action by the medial prefrontal cortex. J Neurosci 2014; 34:1892-902. [PMID: 24478368 DOI: 10.1523/jneurosci.2421-13.2014] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 12/30/2022] Open
Abstract
Subjects performing simple reaction-time tasks can improve reaction times by learning the expected timing of action-imperative stimuli and preparing movements in advance. Success or failure on the previous trial is often an important factor for determining whether a subject will attempt to time the stimulus or wait for it to occur before initiating action. The medial prefrontal cortex (mPFC) has been implicated in enabling the top-down control of action depending on the outcome of the previous trial. Analysis of spike activity from the rat mPFC suggests that neural integration is a key mechanism for adaptive control in precisely timed tasks. We show through simulation that a spiking neural network consisting of coupled neural integrators captures the neural dynamics of the experimentally recorded mPFC. Errors lead to deviations in the normal dynamics of the system, a process that could enable learning from past mistakes. We expand on this coupled integrator network to construct a spiking neural network that performs a reaction-time task by following either a cue-response or timing strategy, and show that it performs the task with similar reaction times as experimental subjects while maintaining the same spiking dynamics as the experimentally recorded mPFC.
|
26
|
Shen C, Ardid S, Kaping D, Westendorff S, Everling S, Womelsdorf T. Anterior Cingulate Cortex Cells Identify Process-Specific Errors of Attentional Control Prior to Transient Prefrontal-Cingulate Inhibition. Cereb Cortex 2014; 25:2213-28. [PMID: 24591526 DOI: 10.1093/cercor/bhu028] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Indexed: 01/04/2023] Open
Abstract
Errors indicate the need to adjust attention for improved future performance. Detecting errors is thus a fundamental step to adjust and control attention. These functions have been associated with the dorsal anterior cingulate cortex (dACC), predicting that dACC cells should track the specific processing states giving rise to errors in order to identify which processing aspects need readjustment. Here, we tested this prediction by recording cells in the dACC and lateral prefrontal cortex (latPFC) of macaques performing an attention task that dissociated 3 processing stages. We found that, across prefrontal subareas, the dACC contained the largest cell populations encoding errors indicating (1) failures of inhibitory control of the attentional focus, (2) failures to prevent bottom-up distraction, and (3) lapses when implementing a choice. Error-locked firing in the dACC showed the earliest latencies across the PFC, emerged earlier than reward omission signals, and involved a significant proportion of putative inhibitory interneurons. Moreover, early onset error-locked response enhancement in the dACC was followed by transient prefrontal-cingulate inhibition, possibly reflecting active disengagement from task processing. These results suggest a functional specialization of the dACC to track and identify the actual processes that give rise to erroneous task outcomes, emphasizing its role to control attentional performance.
Affiliation(s)
- Chen Shen
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario, Canada M6J 1P3
- Salva Ardid
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario, Canada M6J 1P3
- Daniel Kaping
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario, Canada M6J 1P3
- Stephanie Westendorff
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario, Canada M6J 1P3
- Stefan Everling
- Department of Physiology and Pharmacology, Western University, London, Ontario, Canada N6A 5K8
- Thilo Womelsdorf
- Department of Biology, Centre for Vision Research, York University, Toronto, Ontario, Canada M6J 1P3; Department of Physiology and Pharmacology, Western University, London, Ontario, Canada N6A 5K8
|
27
|
Cazé RD, van der Meer MAA. Adaptive properties of differential learning rates for positive and negative outcomes. Biological Cybernetics 2013; 107:711-719. [PMID: 24085507 DOI: 10.1007/s00422-013-0571-5] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 03/20/2013] [Accepted: 09/14/2013] [Indexed: 06/02/2023]
Abstract
The concept of the reward prediction error (the difference between reward obtained and reward predicted) continues to be a focal point for much theoretical and experimental work in psychology, cognitive science, and neuroscience. Models that rely on reward prediction errors typically assume a single learning rate for positive and negative prediction errors. However, behavioral data indicate that better-than-expected and worse-than-expected outcomes often do not have symmetric impacts on learning and decision-making. Furthermore, distinct circuits within cortico-striatal loops appear to support learning from positive and negative prediction errors, respectively. Such differential learning rates would be expected to lead to biased reward predictions and therefore suboptimal choice performance. Contrary to this intuition, we show that on static "bandit" choice tasks, differential learning rates can be adaptive. This occurs because asymmetric learning enables a better separation of learned reward probabilities. We show analytically how the optimal learning rate asymmetry depends on the reward distribution and implement a biologically plausible algorithm that adapts the balance of positive and negative learning rates from experience. These results suggest specific adaptive advantages for separate, differential learning rates in simple reinforcement learning settings and provide a novel, normative perspective on the interpretation of associated neural data.
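The separation effect described above can be sketched for binary (0/1) rewards. This is an illustration of the standard fixed-point argument, not the authors' code: setting the mean update to zero for an arm that pays 1 with probability p gives the steady-state estimate alpha_pos * p / (alpha_pos * p + alpha_neg * (1 - p)). The function names and the probabilities and learning rates used below are assumptions.

```python
def update(q, reward, alpha_pos, alpha_neg):
    """Delta rule with separate learning rates for positive and negative errors."""
    delta = reward - q
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def expected_fixed_point(p, alpha_pos, alpha_neg):
    """Steady-state estimate for an arm that pays 1 with probability p, else 0.

    Derived by setting the mean update to zero:
    p * alpha_pos * (1 - q) = (1 - p) * alpha_neg * q.
    """
    return alpha_pos * p / (alpha_pos * p + alpha_neg * (1.0 - p))


# Two poorly rewarded arms (assumed probabilities 0.1 and 0.2).
symmetric_gap = (expected_fixed_point(0.2, 0.1, 0.1)
                 - expected_fixed_point(0.1, 0.1, 0.1))
optimistic_gap = (expected_fixed_point(0.2, 0.4, 0.1)
                  - expected_fixed_point(0.1, 0.4, 0.1))
```

With equal learning rates the estimates sit at the true probabilities, so the gap is 0.1. With a larger positive learning rate both estimates are biased upward, yet they lie further apart, which makes the better arm easier to pick out in this low-reward example.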
Affiliation(s)
- Romain D Cazé
- Department of Bioengineering, Imperial College, London, UK
|
28
|
Sallet J, Camille N, Procyk E. Modulation of feedback-related negativity during trial-and-error exploration and encoding of behavioral shifts. Front Neurosci 2013; 7:209. [PMID: 24294190 PMCID: PMC3827557 DOI: 10.3389/fnins.2013.00209] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 01/01/2013] [Accepted: 10/19/2013] [Indexed: 11/25/2022] Open
Abstract
The feedback-related negativity (FRN) is a mid-frontal event-related potential (ERP) recorded in various cognitive tasks and associated with the onset of sensory feedback signaling decision outcome. Some properties of the FRN are still debated, notably its sensitivity to positive and negative reward prediction error (RPE), i.e., the discrepancy between the expectation and the actual occurrence of a particular feedback, and its role in triggering the post-feedback adjustment. In the present study we tested whether the FRN is modulated by both positive and negative RPE. We also tested whether an instruction cue indicating the need for behavioral adjustment elicited the FRN. We asked 12 human subjects to perform a problem-solving task where they had to search by trial and error which of five visual targets, presented on a screen, was associated with a correct feedback. After exploration and discovery of the correct target, subjects could repeat their correct choice until the onset of a visual signal to change (SC) indicative of a new search. Analyses showed that the FRN was modulated by both negative and positive RPE. Finally, we found that the SC elicited an FRN-like potential on the frontal midline electrodes that was not modulated by the probability of that event. Collectively, these results suggest the FRN may reflect a mechanism that evaluates any event (outcome, instruction cue) signaling the need to engage adaptive actions.
Affiliation(s)
- Jérôme Sallet
- INSERM U846, Stem Cell and Brain Research Institute, Bron, France; Université Lyon 1, Université de Lyon, Lyon, France; Decision and Action Laboratory, Department of Experimental Psychology, University of Oxford, Oxford, UK
|
29
|
Cos I, Khamassi M, Girard B. Modelling the learning of biomechanics and visual planning for decision-making of motor actions. J Physiol Paris 2013; 107:399-408. [PMID: 23973913 DOI: 10.1016/j.jphysparis.2013.07.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/14/2013] [Revised: 06/11/2013] [Accepted: 07/31/2013] [Indexed: 10/26/2022]
Abstract
Recent experiments showed that the biomechanical ease and end-point stability associated with reaching movements are predicted prior to movement onset, and that these factors exert a significant influence on the choice of movement. As an extension of these results, here we investigate whether the knowledge about biomechanical costs and their influence on decision-making are the result of an adaptation process taking place during each experimental session or whether this knowledge was learned at an earlier stage of development. Specifically, we analysed both the pattern of decision-making and its fluctuations during each session in several human subjects making free choices between two reaching movements that varied in path distance (target relative distance), biomechanical cost, aiming accuracy and stopping requirement. Our main result shows that the effect of biomechanics is well established at the start of the session, and that, consequently, the learning of biomechanical costs in decision-making occurred at an earlier stage of development. As a means to characterise the dynamics of this learning process, we also developed a model-based reinforcement learning model, which generates a possible account of how biomechanics may be incorporated into the motor plan to select between reaching movements. Results obtained in simulation showed that, after some pre-training corresponding to a motor babbling phase, the model can reproduce the subjects' overall movement preferences. Although preliminary, this supports the idea that the knowledge about biomechanical costs may have been learned in this manner, and supports the hypothesis that the fluctuations observed in the subjects' behaviour may adapt in a similar fashion.
Affiliation(s)
- Ignasi Cos
- Institut des Systèmes Intelligents et de Robotique (ISIR), Université Pierre et Marie Curie - Paris 6, Paris, France; Centre National de la Recherche Scientifique, UMR 7222, Paris, France; Département de Physiologie, Université de Montréal, Montréal, Canada
|