1
Parrini M, Tricot G, Caroni P, Spolidoro M. Circuit mechanisms of navigation strategy learning in mice. Curr Biol 2024; 34:79-91.e4. [PMID: 38101403 DOI: 10.1016/j.cub.2023.11.047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 10/09/2023] [Accepted: 11/22/2023] [Indexed: 12/17/2023]
Abstract
Navigation tasks involve the gradual selection and deployment of increasingly effective searching procedures to reach targets. The brain mechanisms underlying such complex behavior are poorly understood, but their elucidation might provide insights into the systems linking exploration and decision making in complex learning. Here, we developed a trial-by-trial goal-related search strategy analysis as mice learned to navigate identical water mazes encompassing distinct goal-related rules and monitored the strategy deployment process throughout learning. We found that navigation learning involved the following three distinct phases: an early phase during which maze-specific search strategies are deployed in a minority of trials, a second phase of preferential increasing deployment of one search strategy, and a final phase of increasing commitment to this strategy only. The three maze learning phases were affected differently by inhibition of retrosplenial cortex (RSC), dorsomedial striatum (DMS), or dorsolateral striatum (DLS). Through brain region-specific inactivation experiments and gain-of-function experiments involving activation of learning-related cFos+ ensembles, we unraveled how goal-related strategy selection relates to deployment throughout these sequential processes. We found that RSC is critically important for search strategy selection, DMS mediates strategy deployment, and DLS ensures searching consistency throughout maze learning. Notably, activation of specific learning-related ensembles was sufficient to direct strategy selection (RSC) or strategy deployment (DMS) in a different maze. Our results establish a goal-related search strategy deployment approach to dissect unsupervised navigation learning processes and suggest that effective searching in navigation involves evidence-based goal-related strategy direction by RSC, reinforcement-modulated strategy deployment through DMS, and online guidance through DLS.
Affiliation(s)
- Martina Parrini
  - Friedrich Miescher Institute for Biomedical Research, 4058 Basel, Switzerland
- Guillaume Tricot
  - Friedrich Miescher Institute for Biomedical Research, 4058 Basel, Switzerland
- Pico Caroni
  - Friedrich Miescher Institute for Biomedical Research, 4058 Basel, Switzerland
- Maria Spolidoro
  - Friedrich Miescher Institute for Biomedical Research, 4058 Basel, Switzerland
2
Park H, Doh H, Lee E, Park H, Ahn WY. The neurocognitive role of working memory load when Pavlovian motivational control affects instrumental learning. PLoS Comput Biol 2023; 19:e1011692. [PMID: 38064498 PMCID: PMC10732416 DOI: 10.1371/journal.pcbi.1011692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 12/20/2023] [Accepted: 11/15/2023] [Indexed: 12/21/2023] Open
Abstract
Research suggests that a fast, capacity-limited working memory (WM) system and a slow, incremental reinforcement learning (RL) system jointly contribute to instrumental learning. Thus, situations that strain WM resources alter instrumental learning: under WM load, learning becomes slow and incremental, the reliance on computationally efficient learning increases, and action selection becomes more random. It is also suggested that Pavlovian learning influences people's behavior during instrumental learning by providing hard-wired instinctive responses, including approach to reward predictors and avoidance of punishment predictors. However, it remains unknown how constraints on WM resources affect instrumental learning under Pavlovian influence. Thus, we conducted a functional magnetic resonance imaging (fMRI) study (N = 49) in which participants completed an instrumental learning task with Pavlovian-instrumental conflict (the orthogonalized go/no-go task) both with and without extra WM load. Behavioral and computational modeling analyses revealed that WM load reduced the learning rate and increased random choice, without affecting Pavlovian bias. Model-based fMRI analysis revealed that WM load strengthened reward prediction error (RPE) signaling in the striatum. Moreover, under WM load, the striatum showed weakened connectivity with the ventromedial and dorsolateral prefrontal cortex when computing reward expectations. These results suggest that the limitation of cognitive resources by WM load promotes slow and incremental learning through weakened cooperation between WM and RL; such limitation also makes action selection more random, but it does not directly affect the balance between instrumental and Pavlovian systems.
Affiliation(s)
- Heesun Park
  - Department of Psychology, Seoul National University, Seoul, Korea
- Hoyoung Doh
  - Department of Psychology, Seoul National University, Seoul, Korea
- Eunhwi Lee
  - Department of Psychology, Seoul National University, Seoul, Korea
- Harhim Park
  - Department of Psychology, Seoul National University, Seoul, Korea
- Woo-Young Ahn
  - Department of Psychology, Seoul National University, Seoul, Korea
  - Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Korea
3
Navidi P, Saeedpour S, Ershadmanesh S, Hossein MM, Bahrami B. Prosocial learning: Model-based or model-free? PLoS One 2023; 18:e0287563. [PMID: 37352225 PMCID: PMC10289351 DOI: 10.1371/journal.pone.0287563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 06/07/2023] [Indexed: 06/25/2023] Open
Abstract
Prosocial learning involves the acquisition of knowledge and skills necessary for making decisions that benefit others. We asked whether, in the context of value-based decision-making, there is any difference between learning strategies for oneself vs. for others. We implemented a 2-step reinforcement learning paradigm in which participants learned, in separate blocks, to make decisions for themselves or for another person, a confederate who was present and evaluated their performance. We replicated the canonical features of model-based (MB) and model-free (MF) reinforcement learning in our results. The behaviour of the majority of participants was best explained by a mixture of MB and MF control, while most participants relied more heavily on MB control, and this strategy enhanced their learning success. Regarding our key self-other hypothesis, we did not find any significant difference in behavioural performance or in the model-based parameters of learning when comparing the self and other conditions.
Affiliation(s)
- Parisa Navidi
  - Department of Cognitive Psychology, Institute for Cognitive Science Studies, Tehran, Iran
- Sepehr Saeedpour
  - Department of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
- Sara Ershadmanesh
  - School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
  - Department of Computational Neuroscience, MPI for Biological Cybernetics, Tuebingen, Germany
- Bahador Bahrami
  - Crowd Cognition Group, Department of General Psychology and Education, Ludwig Maximilians University, Munich, Germany
4
Munuera J, Burguière E. Can we tackle climate change by behavioral hacking of the dopaminergic system? Front Behav Neurosci 2022; 16:996955. [PMID: 36311863 PMCID: PMC9606619 DOI: 10.3389/fnbeh.2022.996955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022] Open
Abstract
Climate change is an undeniable fact that will certainly affect millions of people in the following decades. Despite this danger threatening our economies, wellbeing and our lives in general, there is a lack of immediate response at both the institutional and individual level. How can it be that the human brain cannot interpret this threat and act against it to avoid the immense negative consequences that may ensue? Here we argue that this paradox could be explained by the fact that some key brain mechanisms are potentially poorly tuned to take action against a threat that would take full effect only in the long-term. We present neuro-behavioral evidence in favor of this proposal and discuss the role of the dopaminergic (DA) system in learning accurate prediction of the value of an outcome, and its consequences regarding the climate issue. We discuss how this system discounts the value of delayed outcomes and, consequently, does not favor action against the climate crisis. Finally, according to this framework, we suggest that this view may be reconsidered and, on the contrary, that the DA reinforcement learning system could be a powerful ally if adapted to short-term incentives which promote climate-friendly behaviors. Additionally, the DA system interacts with multiple brain systems, in particular those related to higher cognitive functions, which can adjust its functions depending on psychological, social, or other complex contextual information. Thus, we propose several generic action plans that could help to hack these neuro-behavioral processes to promote climate-friendly actions.
Affiliation(s)
- Jérôme Munuera
  - Sorbonne Université, Institut du Cerveau–Paris Brain Institute–ICM, Inserm, CNRS, AP-HP, Hôpital de la Pitié Salpêtrière, Paris, France
  - Institut Jean Nicod, Département d’Études Cognitives, École Normale Supérieure (ENS), EHESS, CNRS, PSL University, Paris, France
  - *Correspondence: Jérôme Munuera
- Eric Burguière
  - Sorbonne Université, Institut du Cerveau–Paris Brain Institute–ICM, Inserm, CNRS, AP-HP, Hôpital de la Pitié Salpêtrière, Paris, France
  - *Correspondence: Eric Burguière
5
Kavroulakis E, van Kemenade BM, Arikan BE, Kircher T, Straube B. The effect of self-generated versus externally generated actions on timing, duration, and amplitude of blood oxygen level dependent response for visual feedback processing. Hum Brain Mapp 2022; 43:4954-4969. [PMID: 36056611 PMCID: PMC9582366 DOI: 10.1002/hbm.26053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 07/22/2022] [Accepted: 07/30/2022] [Indexed: 12/03/2022] Open
Abstract
It has been widely assumed that internal forward models use efference copies to create predictions about the sensory consequences of our own actions. While these predictions have frequently been associated with a reduced blood oxygen level dependent (BOLD) response in sensory cortices, the timing and duration of the hemodynamic response for the processing of video feedback of self‐generated (active) versus externally generated (passive) movements are poorly understood. In the present study, we tested the hypothesis that predictive mechanisms for self‐generated actions lead to earlier and shorter neural processing compared with externally generated movements. We investigated active and passive movements using a custom‐made fMRI‐compatible movement device. Visual video feedback of the active and passive movements was presented in real time or with variable delays. Participants had to judge whether the feedback was delayed. The timing and duration of the BOLD impulse response were calculated using a first‐order (temporal derivative [TD]) and second‐order (dispersion derivative [DD]) Taylor approximation. Our reanalysis confirmed our previous finding of a reduced BOLD response for active compared to passive movements. Moreover, we found positive effects of the TD and DD in the supplementary motor area, cerebellum, visual cortices, and subcortical structures, indicating earlier and shorter hemodynamic responses for active compared to passive movements. Furthermore, earlier activation in the putamen for active compared to passive conditions was associated with reduced delay detection performance. These findings indicate that efference copy‐based predictive mechanisms enable earlier processing of action feedback, which might have reduced the ability to detect short delays between action and feedback.
Affiliation(s)
- Bianca M van Kemenade
  - Department of Psychiatry and Psychotherapy, Philipps University Marburg, Marburg, Germany
  - Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
- Belkis Ezgi Arikan
  - Department of Psychology, Justus-Liebig University Giessen, Giessen, Germany
- Tilo Kircher
  - Department of Psychiatry and Psychotherapy, Philipps University Marburg, Marburg, Germany
- Benjamin Straube
  - Department of Psychiatry and Psychotherapy, Philipps University Marburg, Marburg, Germany
6
Abstract
This commentary reviews a novel model of learned helplessness proposed by Boddez et al. in this issue of Cognition and Emotion. Combining operant and goal-directed perspectives, Boddez et al. suggest that helplessness stems from a lack of reinforcement when striving toward a goal, with the degree of generalisation dependent on subjective perceptions of goal similarity. We begin by reviewing the theoretical model, describe possible expansions from a cognitive perspective, and discuss several considerations. We finish with a brief discussion of possible directions for future work.
Affiliation(s)
- Jessica M Duda
  - Department of Psychology, Yale University, New Haven, United States
- Jutta Joormann
  - Department of Psychology, Yale University, New Haven, United States
7
Georgiev D, Christie R, Torkamani M, Song R, Limousin P, Jahanshahi M. Development and Validation of a Daily Habit Scale. Front Neurosci 2022; 16:880023. [PMID: 35873816 PMCID: PMC9298974 DOI: 10.3389/fnins.2022.880023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Accepted: 06/17/2022] [Indexed: 11/25/2022] Open
Abstract
Habits are defined as automatic behaviours triggered by cues and performed without awareness. They are difficult to control and mentally efficient, in contrast to goal-directed behaviour, which is characterised by active thought, high computational effort, and the ability to modify behaviour in response to a changing environment and contextual demands. Habits are not only defined by the frequency with which a behaviour is performed but represent a complex construct that also includes the strength and automaticity of the habitual behaviour. We report here the development and validation of a Daily Habit Scale (DHS) to assess the frequency, automaticity, and strength of daily habits in healthy individuals. Item reduction based on factor analysis resulted in a scale with 38 items grouped into eight factors explaining 52.91% of the variance. The DHS showed very good internal consistency (Cronbach's alpha = 0.738) and test-retest reliability (intraclass correlation coefficient = 0.892, p < 0.001) as well as convergent and divergent reliability compared to other scales measuring habits. We found a significant effect of age, gender, anxiety, and depression on the DHS. Despite certain limitations of the DHS, such as not considering the context in which habits are performed and the absence of certain items, such as transportation use, the results of this study suggest that the DHS is a reliable and valid measure of daily habits that can be used by both clinicians and researchers.
Affiliation(s)
- Dejan Georgiev
  - Department Clinical and Motor Neurosciences, Institute of Neurology, University College London, London, United Kingdom
  - Department of Neurology, University Medical Centre Ljubljana, Ljubljana, Slovenia
  - Artificial Intelligence Lab, Faculty of Computer and Information Sciences, University of Ljubljana, Ljubljana, Slovenia
- Rosie Christie
  - Department Clinical and Motor Neurosciences, Institute of Neurology, University College London, London, United Kingdom
- Mariam Torkamani
  - Department Clinical and Motor Neurosciences, Institute of Neurology, University College London, London, United Kingdom
- Ruifeng Song
  - Department Clinical and Motor Neurosciences, Institute of Neurology, University College London, London, United Kingdom
- Patricia Limousin
  - Department Clinical and Motor Neurosciences, Institute of Neurology, University College London, London, United Kingdom
- Marjan Jahanshahi
  - Department Clinical and Motor Neurosciences, Institute of Neurology, University College London, London, United Kingdom
8
Corticotropin-releasing factor receptor 1 in infralimbic cortex modulates social stress-altered decision-making. Prog Neuropsychopharmacol Biol Psychiatry 2022; 116:110523. [PMID: 35122897 DOI: 10.1016/j.pnpbp.2022.110523] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/13/2021] [Revised: 01/05/2022] [Accepted: 01/31/2022] [Indexed: 11/21/2022]
Abstract
Chronic stress can bias behavioral strategies toward habits. However, it remains unclear which neuronal system modulates stress-induced behavioral abnormality during decision making. The corticotropin-releasing factor (CRF) system in the medial prefrontal cortex (mPFC), which has been implicated in governing strategy choice, is involved in the response to stress. The present study aimed to clarify whether altered function of cortical CRF receptors is linked to abnormal behaviors after chronic stress. Mice subjected to a 10-day social defeat preferred to use a habitual strategy. The infralimbic cortex (IL), but not the prelimbic cortex (PL) or anterior cingulate cortex (ACC), showed higher cFos expression in stress-subjected mice than in control mice, which may be associated with habitual behavior choice. Furthermore, infusion of a CRF receptor 1 (CRFR1) agonist or antagonist into the IL during behavioral training respectively mimicked and rescued the stress-induced behavioral change in the decision-making assessment. An electrophysiological approach showed that the frequencies of both spontaneous IPSCs and spontaneous EPSCs, but not their amplitudes, increased after stress and were modulated by CRFR1 agents. Further recordings revealed that the stress-induced increase in the ratio of excitation to inhibition (E/I ratio) in the IL was rescued by a CRFR1 antagonist. Collectively, these data indicate that CRFR1 plays a critical role, directly or indirectly, in stress-permitted or stress-enhanced glutamatergic and GABAergic presynaptic transmission, as well as in the modulation of the E/I ratio in the IL. Thus, CRFR1 in the mPFC may be a suitable target for treating chronic stress-altered behavior.
9
10
Abstract
Recent breakthroughs in artificial intelligence (AI) have enabled machines to plan in tasks previously thought to be uniquely human. Meanwhile, the planning algorithms implemented by the brain itself remain largely unknown. Here, we review neural and behavioral data in sequential decision-making tasks that elucidate the ways in which the brain does, and does not, plan. To systematically review available biological data, we create a taxonomy of planning algorithms by summarizing the relevant design choices for such algorithms in AI. Across species, recording techniques, and task paradigms, we find converging evidence that the brain represents future states consistent with a class of planning algorithms within our taxonomy: focused, depth-limited, and serial. However, we argue that current data are insufficient for addressing more detailed algorithmic questions. We propose a new approach leveraging AI advances to drive experiments that can adjudicate between competing candidate algorithms.
11
Abstract
Purpose of Review
Current theories of alcohol use disorders (AUD) highlight the importance of Pavlovian and instrumental learning processes mainly based on preclinical animal studies. Here, we summarize available evidence for alterations of those processes in human participants with AUD with a focus on habitual versus goal-directed instrumental learning, Pavlovian conditioning, and Pavlovian-to-instrumental transfer (PIT) paradigms.
Recent Findings
The balance between habitual and goal-directed control in AUD participants has been studied using outcome devaluation or sequential decision-making procedures, which have found some evidence of reduced goal-directed/model-based control, but little evidence for stronger habitual responding. The employed Pavlovian learning and PIT paradigms have shown considerable differences regarding experimental procedures, e.g., alcohol-related or conventional reinforcers or stimuli.
Summary
While studies of basic learning processes in human participants with AUD support a role of Pavlovian and instrumental learning mechanisms in the development and maintenance of drug addiction, current studies are characterized by large variability regarding methodology, sample characteristics, and results, and translation from animal paradigms to human research remains challenging. Longitudinal approaches with reliable and ecologically valid paradigms of Pavlovian and instrumental processes, including alcohol-related cues and outcomes, are warranted and should be combined with state-of-the-art imaging techniques, computational approaches, and ecological momentary assessment methods.
12
Overmeyer R, Berghäuser J, Dieterich R, Wolff M, Goschke T, Endrass T. The Error-Related Negativity Predicts Self-Control Failures in Daily Life. Front Hum Neurosci 2021; 14:614979. [PMID: 33584226 PMCID: PMC7873054 DOI: 10.3389/fnhum.2020.614979] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/07/2020] [Accepted: 12/18/2020] [Indexed: 11/13/2022] Open
Abstract
Adaptive behavior critically depends on performance monitoring (PM), the ability to monitor action outcomes and the need to adapt behavior. PM-related brain activity has been linked to guiding decisions about whether action adaptation is warranted. The present study examined whether PM-related brain activity in a flanker task, as measured by electroencephalography (EEG), was associated with adaptive behavior in daily life. Specifically, we were interested in the employment of self-control, operationalized as self-control failures (SCFs), and measured using ecological momentary assessment. Analyses were conducted using an adaptive elastic net regression to predict SCFs from EEG in a sample of 131 participants. The model was fit using within-subject averaged response-locked EEG activity at each electrode and time point within an epoch surrounding the response. We found that higher amplitudes of the error-related negativity (ERN) were related to fewer SCFs. This suggests that lower error-related activity may relate to lower recruitment of interventive self-control in daily life. Altered cognitive control processes, like PM, have been proposed as underlying mechanisms for various mental disorders. Understanding how alterations in PM relate to regulatory control might therefore aid in delineating how these alterations contribute to different psychopathologies.
Affiliation(s)
- Rebecca Overmeyer
  - Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Julia Berghäuser
  - Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Raoul Dieterich
  - Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Max Wolff
  - Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
  - Department of Psychiatry and Psychotherapy, Technische Universität Dresden, Dresden, Germany
- Thomas Goschke
  - Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
  - Neuroimaging Centre, Technische Universität Dresden, Dresden, Germany
- Tanja Endrass
  - Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
  - Neuroimaging Centre, Technische Universität Dresden, Dresden, Germany
13
Fine JM, Zarr N, Brown JW. Computational Neural Mechanisms of Goal-Directed Planning and Problem Solving. 2020. [DOI: 10.1007/s42113-020-00095-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/27/2023]
14
Shine JM. The thalamus integrates the macrosystems of the brain to facilitate complex, adaptive brain network dynamics. Prog Neurobiol 2020; 199:101951. [PMID: 33189781 DOI: 10.1016/j.pneurobio.2020.101951] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Received: 06/20/2020] [Revised: 10/29/2020] [Accepted: 11/08/2020] [Indexed: 01/20/2023]
Abstract
The human brain is a complex, adaptive system comprised of billions of cells with trillions of connections. The interactions between the elements of the system oppose this seemingly limitless capacity by constraining the system's dynamic repertoire, enforcing distributed neural states that balance integration and differentiation. How this trade-off is mediated by the brain, and how the emergent, distributed neural patterns give rise to cognition and awareness, remains poorly understood. Here, I argue that the thalamus is well-placed to arbitrate the interactions between distributed neural assemblies in the cerebral cortex. Different classes of thalamocortical connections are hypothesized to promote either feed-forward or feedback processing modes in the cerebral cortex. This activity can be conceptualized as emerging dynamically from an evolving attractor landscape, with the relative engagement of distinct distributed circuits providing differing constraints over the manner in which brain state trajectories change over time. In addition, inputs to the distinct thalamic populations from the cerebellum and basal ganglia, respectively, are proposed to differentially shape the attractor landscape, and hence, the temporal evolution of cortical assemblies. The coordinated engagement of these neural macrosystems is then shown to share key characteristics with prominent models of cognition, attention and conscious awareness. In this way, the crucial role of the thalamus in mediating the distributed, multi-scale network organization of the central nervous system can be related to higher brain function.
Affiliation(s)
- James M Shine
  - Sydney Medical School, The University of Sydney, Australia
15
Monosov IE. How Outcome Uncertainty Mediates Attention, Learning, and Decision-Making. Trends Neurosci 2020; 43:795-809. [PMID: 32736849 PMCID: PMC8153236 DOI: 10.1016/j.tins.2020.06.009] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 04/06/2020] [Revised: 06/16/2020] [Accepted: 06/24/2020] [Indexed: 01/24/2023]
Abstract
Animals and humans evolved sophisticated nervous systems that endowed them with the ability to form internal models or beliefs and make predictions about the future to survive and flourish in a world in which future outcomes are often uncertain. Crucial to this capacity is the ability to adjust behavioral and learning policies in response to the level of uncertainty. Until recently, the neuronal mechanisms that could underlie such uncertainty-guided control have been largely unknown. In this review, I discuss newly discovered neuronal circuits in primates that represent uncertainty about future rewards and propose how they guide information-seeking, attention, decision-making, and learning to help us survive in an uncertain world. Lastly, I discuss the possible relevance of these findings to learning in artificial systems.
Affiliation(s)
- Ilya E Monosov
  - Department of Neuroscience and Neurosurgery, Washington University School of Medicine in St. Louis, MO, USA; Department of Biomedical Engineering, Washington University School of Medicine in St. Louis, MO, USA; Washington University Pain Center, Washington University School of Medicine in St. Louis, MO, USA
16
Overmeyer R, Fürtjes S, Ersche KD, Ehrlich S, Endrass T. Self-regulation is negatively associated with habit tendencies: A validation of the German Creature of Habit Scale. Personality and Individual Differences 2020. [DOI: 10.1016/j.paid.2020.110029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
17
Collins AGE, Cockburn J. Beyond dichotomies in reinforcement learning. Nat Rev Neurosci 2020; 21:576-586. [PMID: 32873936 DOI: 10.1038/s41583-020-0355-6] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Accepted: 07/20/2020] [Indexed: 11/09/2022]
Abstract
Reinforcement learning (RL) is a framework of particular importance to psychology, neuroscience and machine learning. Interactions between these fields, as promoted through the common hub of RL, have facilitated paradigm shifts that relate multiple levels of analysis in a singular framework (for example, relating dopamine function to a computationally defined RL signal). Recently, more sophisticated RL algorithms have been proposed to better account for human learning, and in particular its oft-documented reliance on two separable systems: a model-based (MB) system and a model-free (MF) system. However, along with many benefits, this dichotomous lens can distort questions, and may contribute to an unnecessarily narrow perspective on learning and decision-making. Here, we outline some of the consequences that come from overconfidently mapping algorithms, such as MB versus MF RL, onto putative cognitive processes. We argue that the field is well positioned to move beyond simplistic dichotomies, and we propose a means of refocusing research questions towards the rich and complex components that comprise learning and decision-making.
Affiliation(s)
- Anne G E Collins
  - Department of Psychology and the Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA
- Jeffrey Cockburn
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
18
Fidgeting as self-evidencing: A predictive processing account of non-goal-directed action. New Ideas in Psychology 2020. [DOI: 10.1016/j.newideapsych.2019.100750] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
19
|
Electrophysiology of goal-directed versus habitual control during outcome devaluation. Cortex 2019; 119:401-416. [DOI: 10.1016/j.cortex.2019.08.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/20/2019] [Revised: 07/16/2019] [Accepted: 08/02/2019] [Indexed: 01/08/2023]
|
20
|
Revisiting the relationship between the P3b and working memory updating. Biol Psychol 2019; 148:107769. [PMID: 31525391 DOI: 10.1016/j.biopsycho.2019.107769] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 12/12/2018] [Revised: 08/04/2019] [Accepted: 09/10/2019] [Indexed: 12/29/2022]
Abstract
The P3b is an extensively studied neurophysiological phenomenon that is predominantly explained in the cognitive neuroscience literature as reflecting context updating, presumably in working memory (WM). Despite the prevalence and influence of the context updating hypothesis, direct empirical support for the role of WM updating in eliciting the P3b is still missing. The present study was designed to address the empirical gap in understanding the functional role of P3b in general, and specifically in relation to WM updating. A mass-univariate approach was used to test the unique contribution of WM updating, categorization, and stimulus probability to the P3b. The results indicated that the P3b is only modulated by the categorization process, a finding that challenges the WM updating hypothesis. Taking these results together, we suggest that the P3b reflects a WM-guided target identification mechanism, which operates as part of a goal-directed learning strategy.
|
21
|
Moens V, Zénon A. Learning and forgetting using reinforced Bayesian change detection. PLoS Comput Biol 2019; 15:e1006713. [PMID: 30995214 PMCID: PMC6488101 DOI: 10.1371/journal.pcbi.1006713] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/05/2018] [Revised: 04/29/2019] [Accepted: 12/09/2018] [Indexed: 12/17/2022]
Abstract
Agents living in volatile environments must be able to detect changes in contingencies while refraining from adapting to unexpected events that are caused by noise. In Reinforcement Learning (RL) frameworks, this requires learning rates that adapt to past reliability of the model. The observation that behavioural flexibility in animals tends to decrease following prolonged training in a stable environment provides experimental evidence for such adaptive learning rates. However, in classical RL models, the learning rate is either fixed or scheduled and thus cannot adapt dynamically to environmental changes. Here, we propose a new Bayesian learning model, using variational inference, that achieves adaptive change detection by the use of Stabilized Forgetting, updating its current belief based on a mixture of fixed, initial priors and previous posterior beliefs. The weight given to these two sources is optimized alongside the other parameters, allowing the model to adapt dynamically to changes in environmental volatility and to unexpected observations. This approach is used to implement the "critic" of an actor-critic RL model, while the actor samples the resulting value distributions to choose which action to undertake. We show that our model can emulate different adaptation strategies to contingency changes, depending on its prior assumptions of environmental stability, and that model parameters can be fit to real data with high accuracy. The model also exhibits trade-offs between flexibility and computational costs that mirror those observed in real data. Overall, the proposed method provides a general framework to study learning flexibility and decision making in RL contexts.
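The mixing idea in this abstract (a belief update that starts from a blend of the fixed initial prior and the previous posterior) can be illustrated with a toy sketch. This is not the authors' model: it assumes a simple Beta-Bernoulli learner instead of variational inference, and the mixing weight `w` is fixed by hand rather than optimized. The function names and the 50/50 trial schedule are hypothetical.

```python
def stabilized_update(alpha, beta, outcome, w, alpha0=1.0, beta0=1.0):
    """One Beta-Bernoulli update whose prior mixes the previous posterior
    (weight w) with the fixed initial prior (weight 1 - w)."""
    prior_alpha = w * alpha + (1.0 - w) * alpha0
    prior_beta = w * beta + (1.0 - w) * beta0
    # Standard conjugate update applied to the mixed prior.
    return prior_alpha + outcome, prior_beta + (1 - outcome)


def run(outcomes, w):
    """Return the posterior mean reward probability after all outcomes."""
    alpha, beta = 1.0, 1.0
    for outcome in outcomes:
        alpha, beta = stabilized_update(alpha, beta, outcome, w)
    return alpha / (alpha + beta)


# Hypothetical contingency change: 50 rewarded trials, then 50 unrewarded.
outcomes = [1] * 50 + [0] * 50
mean_no_forgetting = run(outcomes, w=1.0)  # plain Bayesian accumulation
mean_forgetting = run(outcomes, w=0.8)     # partial reset toward the prior
print(mean_no_forgetting, mean_forgetting)
```

With `w=1.0` the learner averages over the whole history, so its estimate lags far behind the change; with `w<1` old evidence decays and the estimate follows the new contingency more closely.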
Affiliation(s)
- Vincent Moens
- CoAction Lab, Institute of Neuroscience, Université Catholique de Louvain, Bruxelles, Belgium
- Alexandre Zénon
- CoAction Lab, Institute of Neuroscience, Université Catholique de Louvain, Bruxelles, Belgium
- INCIA, Université de Bordeaux, Bordeaux, France
|
22
|
Chatila R, Renaudo E, Andries M, Chavez-Garcia RO, Luce-Vayrac P, Gottstein R, Alami R, Clodic A, Devin S, Girard B, Khamassi M. Toward Self-Aware Robots. Front Robot AI 2018; 5:88. [PMID: 33500967 PMCID: PMC7805649 DOI: 10.3389/frobt.2018.00088] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/01/2018] [Accepted: 07/03/2018] [Indexed: 11/13/2022]
Abstract
Despite major progress in Robotics and AI, robots are still basically "zombies" repeatedly achieving actions and tasks without understanding what they are doing. Deep-Learning AI programs classify tremendous amounts of data without grasping the meaning of their inputs or outputs. We still lack a genuine theory of the underlying principles and methods that would enable robots to understand their environment, to be cognizant of what they do, to take appropriate and timely initiatives, to learn from their own experience and to show that they know that they have learned and how. The rationale of this paper is that the understanding of its environment by an agent (the agent itself and its effects on the environment included) requires its self-awareness, which actually is itself emerging as a result of this understanding and the distinction that the agent is capable of making between its own mind-body and its environment. The paper develops along five issues: agent perception and interaction with the environment; learning actions; agent interaction with other agents, specifically humans; decision-making; and the cognitive architecture integrating these capacities.
Affiliation(s)
- Raja Chatila
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Erwan Renaudo
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
- Mihai Andries
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Institute for Systems and Robotics, Instituto Superior Técnico, Lisbon, Portugal
- Ricardo-Omar Chavez-Garcia
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Università della Svizzera Italiana - Scuola universitaria professionale della Svizzera italiana (USI-SUPSI), Lugano, Switzerland
- Pierre Luce-Vayrac
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Raphael Gottstein
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Rachid Alami
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
- Aurélie Clodic
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Sandra Devin
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
- Benoît Girard
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Mehdi Khamassi
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
|
23
|
Haith AM, Krakauer JW. The multiple effects of practice: skill, habit and reduced cognitive load. Curr Opin Behav Sci 2018; 20:196-201. [PMID: 30944847 PMCID: PMC6443249 DOI: 10.1016/j.cobeha.2018.01.015] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Indexed: 11/29/2022]
Abstract
When learning a new skill, even if we have been instructed exactly what to do, it is often necessary to practice for hours or even weeks before we achieve proficient and fluid performance. Practice has a multitude of effects on behavior, including increasing the speed of performance, rendering the practiced behavior habitual and reducing the cognitive load required to perform the task. These effects are often collectively referred to as automaticity. Here, we argue that these effects can be explained as multiple consequences of a single principle: caching of the outcome of frequently occurring computations. We further argue that, in the context of more complex task representations, caching different intermediate computations can give rise to more nuanced behavioral signatures, including dissociation between skill, habit and cognitive load.
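The caching principle named in this abstract can be loosely illustrated with ordinary memoization. The sketch below is only an analogy under stated assumptions, not the authors' model: the `plan` and `habit` functions, the `rewards` table, and the call counter are all hypothetical. It shows two signatures the abstract mentions: a cached response avoids repeated computation (lower load), and it persists after outcomes change (habit).

```python
from functools import lru_cache

rewards = {"left": 1.0, "right": 0.0}  # hypothetical action outcomes
calls = {"plan": 0}                    # counts costly deliberate evaluations


def plan(state: str) -> str:
    """Deliberate choice: re-evaluate current outcomes on every call."""
    calls["plan"] += 1
    return max(rewards, key=rewards.get)


@lru_cache(maxsize=None)
def habit(state: str) -> str:
    """Cached choice: computed once per state, then simply retrieved."""
    return plan(state)


# Practice: 100 repetitions trigger only one deliberate evaluation.
for _ in range(100):
    practiced = habit("start")
calls_during_practice = calls["plan"]

# Outcomes change: the cached response persists, deliberation adapts.
rewards["left"], rewards["right"] = 0.0, 1.0
after_change_habit = habit("start")
after_change_plan = plan("start")
print(practiced, calls_during_practice, after_change_habit, after_change_plan)
```

Clearing the cache (`habit.cache_clear()`) would play the role of relearning in this analogy.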
Affiliation(s)
- Adrian M Haith
- Department of Neurology, Johns Hopkins University, Baltimore, MD, USA
- John W Krakauer
- Department of Neurology, Johns Hopkins University, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
|
24
|
Pirolli P, Mohan S, Venkatakrishnan A, Nelson L, Silva M, Springer A. Implementation Intention and Reminder Effects on Behavior Change in a Mobile Health System: A Predictive Cognitive Model. J Med Internet Res 2017; 19:e397. [PMID: 29191800 PMCID: PMC5730820 DOI: 10.2196/jmir.8217] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 06/21/2017] [Revised: 09/03/2017] [Accepted: 10/05/2017] [Indexed: 11/13/2022]
Abstract
BACKGROUND: Implementation intentions are mental representations of simple plans to translate goal intentions into behavior under specific conditions. Studies show implementation intentions can produce moderate to large improvements in behavioral goal achievement. Human associative memory mechanisms have been implicated in the processes by which implementation intentions produce effects. On the basis of the adaptive control of thought-rational (ACT-R) theory of cognition, we hypothesized that the strength of implementation intention effect could be manipulated in predictable ways using reminders delivered by a mobile health (mHealth) app.
OBJECTIVE: The aim of this experiment was to manipulate the effects of implementation intentions on daily behavioral goal success in ways predicted by the ACT-R theory concerning mHealth reminder scheduling.
METHODS: An incomplete factorial design was used in this mHealth study. All participants were asked to choose a healthy behavior goal associated with eating slowly, walking, or eating more vegetables and were asked to set implementation intentions. N=64 adult participants were in the study for 28 days. Participants were stratified by self-efficacy and assigned to one of two reminder conditions: reminders-presented versus reminders-absent. Self-efficacy and reminder conditions were crossed. Nested within the reminders-presented condition was a crossing of frequency of reminders sent (high, low) by distribution of reminders sent (distributed, massed). Participants in the low frequency condition got 7 reminders over 28 days; those in the high frequency condition were sent 14. Participants in the distributed conditions were sent reminders at uniform intervals. Participants in the massed distribution conditions were sent reminders in clusters.
RESULTS: There was a significant overall effect of reminders on achieving a daily behavioral goal (coefficient=2.018, standard error [SE]=0.572, odds ratio [OR]=7.52, 95% CI 0.9037-3.2594, P<.001). As predicted by ACT-R, using default theoretical parameters, there was an interaction of reminder frequency by distribution on daily goal success (coefficient=0.7994, SE=0.2215, OR=2.2242, 95% CI 0.3656-1.2341, P<.001). The total number of times a reminder was acknowledged as received by a participant had a marginal effect on daily goal success (coefficient=0.0694, SE=0.0410, OR=1.0717, 95% CI -0.01116 to 0.1505, P=.09), and the time since acknowledging receipt of a reminder was highly significant (coefficient=-0.0490, SE=0.0104, OR=0.9522, 95% CI -0.0700 to -0.2852, P<.001). A dual system ACT-R mathematical model was fit to individuals' daily goal successes and reminder acknowledgments: a goal-striving system dependent on declarative memory plus a habit-forming system that acquires automatic procedures for performance of behavioral goals.
CONCLUSIONS: Computational cognitive theory such as ACT-R can be used to make precise quantitative predictions concerning daily health behavior goal success in response to implementation intentions and the dosing schedules of reminders.
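The coefficients and odds ratios in the results above appear to be related by exponentiation, as they would be if the coefficients are log-odds from a logistic-type regression. The abstract does not name the model, so that reading is an assumption. A short check, using values copied from the abstract (the `reported` list and `odds_ratio` helper are illustrative names):

```python
import math

# (label, coefficient, reported odds ratio), copied from the abstract above.
reported = [
    ("reminders", 2.018, 7.52),
    ("frequency x distribution", 0.7994, 2.2242),
    ("time since acknowledgment", -0.0490, 0.9522),
]


def odds_ratio(coefficient: float) -> float:
    """Convert a log-odds coefficient to an odds ratio (assumed relation)."""
    return math.exp(coefficient)


for label, coef, reported_or in reported:
    print(f"{label}: exp({coef}) = {odds_ratio(coef):.4f} (reported {reported_or})")
```

The same reading implies that the quoted confidence intervals are on the coefficient scale rather than the odds-ratio scale, since the interval for the first effect brackets 2.018 rather than 7.52.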
Affiliation(s)
- Peter Pirolli
- Institute for Human and Machine Cognition, Pensacola, FL, United States
- Shiwali Mohan
- Palo Alto Research Center, Palo Alto, CA, United States
- Les Nelson
- Palo Alto Research Center, Palo Alto, CA, United States
- Michael Silva
- Palo Alto Research Center, Palo Alto, CA, United States
- Aaron Springer
- University of California, Santa Cruz, Santa Cruz, CA, United States
|
25
|
Herbort O, Mathew H, Kunde W. Habit outweighs planning in grasp selection for object manipulation. Cogn Psychol 2016; 92:127-140. [PMID: 27951435 DOI: 10.1016/j.cogpsych.2016.11.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 11/06/2015] [Revised: 11/18/2016] [Accepted: 11/23/2016] [Indexed: 10/20/2022]
Abstract
Object-directed grasping movements are adapted to intended interactions with an object. We address whether adjusting the grasp for object manipulation is controlled habitually, based on past experiences, or by goal-directed planning, based on an evaluation of the expected action outcomes. Therefore, we asked participants to grasp and rotate a dial. In such tasks, participants typically grasp the dial with an excursed, uncomfortable arm posture, which then allows them to complete the dial rotation in a comfortable end-state. We extended this task by manipulating the contingency between the orientation of the grasp and the resulting end-state of the arm. A one-step (control) group rotated the dial to a single target. A two-step group rotated the dial to an initial target and then in the opposite direction. A three-step group rotated the dial to the initial target, then in the opposite direction, and then back to the initial target. During practice, the two-step and three-step groups reduced the excursion of their grasps, thus avoiding overly excursed arm postures after the second rotation. When the two-step and three-step groups were asked to execute one-step rotations, their grasps resembled those that were acquired during the two-step and three-step rotations, respectively. However, the carry-over was not complete. This suggests that adjusting grasps for forthcoming object manipulations is controlled by a mixture of habitual and goal-directed processes. In the present experiment, the former contributed approximately twice as much to grasp selection as the latter.
Affiliation(s)
- Oliver Herbort
- Department of Psychology, Julius-Maximilians-Universität Würzburg, Röntgenring 11, 97070 Würzburg, Germany.
- Hanna Mathew
- Department of Psychology, Julius-Maximilians-Universität Würzburg, Röntgenring 11, 97070 Würzburg, Germany.
- Wilfried Kunde
- Department of Psychology, Julius-Maximilians-Universität Würzburg, Röntgenring 11, 97070 Würzburg, Germany.
|
26
|
Fermin ASR, Yoshida T, Yoshimoto J, Ito M, Tanaka SC, Doya K. Model-based action planning involves cortico-cerebellar and basal ganglia networks. Sci Rep 2016; 6:31378. [PMID: 27539554 PMCID: PMC4990901 DOI: 10.1038/srep31378] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 02/16/2016] [Accepted: 07/19/2016] [Indexed: 11/29/2022]
Abstract
Humans can select actions by learning, planning, or retrieving motor memories. Reinforcement Learning (RL) associates these processes with three major classes of strategies for action selection: exploratory RL learns state-action values by exploration, model-based RL uses internal models to simulate future states reached by hypothetical actions, and motor-memory RL selects past successful state-action mapping. In order to investigate the neural substrates that implement these strategies, we conducted a functional magnetic resonance imaging (fMRI) experiment while humans performed a sequential action selection task under conditions that promoted the use of a specific RL strategy. The ventromedial prefrontal cortex and ventral striatum increased activity in the exploratory condition; the dorsolateral prefrontal cortex, dorsomedial striatum, and lateral cerebellum in the model-based condition; and the supplementary motor area, putamen, and anterior cerebellum in the motor-memory condition. These findings suggest that a distinct prefrontal-basal ganglia and cerebellar network implements the model-based RL action selection strategy.
Affiliation(s)
- Alan S. R. Fermin
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0192, Japan
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa 904-0495, Japan
- Brain Science Institute, Tamagawa University, Tokyo 194-8610, Japan
- Takehiko Yoshida
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0192, Japan
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa 904-0495, Japan
- Junichiro Yoshimoto
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0192, Japan
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa 904-0495, Japan
- Makoto Ito
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa 904-0495, Japan
- Saori C. Tanaka
- ATR Brain Information Communication Research Lab, Kyoto 619-0288, Japan
- Kenji Doya
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0192, Japan
- Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa 904-0495, Japan
- Brain Science Institute, Tamagawa University, Tokyo 194-8610, Japan
- ATR Brain Information Communication Research Lab, Kyoto 619-0288, Japan
|
27
|
Pirolli P. From good intentions to healthy habits: towards integrated computational models of goal striving and habit formation. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2016; 2016:181-185. [PMID: 28268309 DOI: 10.1109/embc.2016.7590670] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Computational models were developed in the ACT-R neurocognitive architecture to address some aspects of the dynamics of behavior change. The simulations aim to address the day-to-day goal achievement data available from mobile health systems. The models refine current psychological theories of self-efficacy, intended effort, and habit formation, and provide an account for the mechanisms by which goal personalization, implementation intentions, and remindings work.
|
28
|
Chersi F, Burgess N. The Cognitive Architecture of Spatial Navigation: Hippocampal and Striatal Contributions. Neuron 2016; 88:64-77. [PMID: 26447573 DOI: 10.1016/j.neuron.2015.09.021] [Citation(s) in RCA: 134] [Impact Index Per Article: 16.8] [Indexed: 12/27/2022]
Abstract
Spatial navigation can serve as a model system in cognitive neuroscience, in which specific neural representations, learning rules, and control strategies can be inferred from the vast experimental literature that exists across many species, including humans. Here, we review this literature, focusing on the contributions of hippocampal and striatal systems, and attempt to outline a minimal cognitive architecture that is consistent with the experimental literature and that synthesizes previous related computational modeling. The resulting architecture includes striatal reinforcement learning based on egocentric representations of sensory states and actions, incidental Hebbian association of sensory information with allocentric state representations in the hippocampus, and arbitration of the outputs of both systems based on confidence/uncertainty in medial prefrontal cortex. We discuss the relationship between this architecture and learning in model-free and model-based systems, episodic memory, imagery, and planning, including some open questions and directions for further experiments.
Affiliation(s)
- Fabian Chersi
- Institute of Cognitive Neuroscience & Institute of Neurology, University College London, 17 Queen Square, London, WC1N 3AZ, UK.
- Neil Burgess
- Institute of Cognitive Neuroscience & Institute of Neurology, University College London, 17 Queen Square, London, WC1N 3AZ, UK.
|
29
|
Pezzulo G, Rigoli F, Friston K. Active Inference, homeostatic regulation and adaptive behavioural control. Prog Neurobiol 2015; 134:17-35. [PMID: 26365173 PMCID: PMC4779150 DOI: 10.1016/j.pneurobio.2015.09.001] [Citation(s) in RCA: 299] [Impact Index Per Article: 33.2] [Received: 03/12/2014] [Revised: 07/20/2015] [Accepted: 09/08/2015] [Indexed: 11/30/2022]
Abstract
We review a theory of homeostatic regulation and adaptive behavioural control within the Active Inference framework. Our aim is to connect two research streams that are usually considered independently; namely, Active Inference and associative learning theories of animal behaviour. The former uses a probabilistic (Bayesian) formulation of perception and action, while the latter calls on multiple (Pavlovian, habitual, goal-directed) processes for homeostatic and behavioural control. We offer a synthesis of these classical processes and cast them as successive hierarchical contextualisations of sensorimotor constructs, using the generative models that underpin Active Inference. This dissolves any apparent mechanistic distinction between the optimization processes that mediate classical control or learning. Furthermore, we generalize the scope of Active Inference by emphasizing interoceptive inference and homeostatic regulation. The ensuing homeostatic (or allostatic) perspective provides an intuitive explanation for how priors act as drives or goals to enslave action, and emphasises the embodied nature of inference.
Affiliation(s)
- Giovanni Pezzulo
- Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy.
|
30
|
Stoianov I, Genovesio A, Pezzulo G. Prefrontal Goal Codes Emerge as Latent States in Probabilistic Value Learning. J Cogn Neurosci 2015; 28:140-57. [PMID: 26439267 DOI: 10.1162/jocn_a_00886] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Indexed: 01/29/2023]
Abstract
The prefrontal cortex (PFC) supports goal-directed actions and exerts cognitive control over behavior, but the underlying coding and mechanism are heavily debated. We present evidence for the role of goal coding in PFC from two converging perspectives: computational modeling and neuronal-level analysis of monkey data. We show that neural representations of prospective goals emerge by combining a categorization process that extracts relevant behavioral abstractions from the input data and a reward-driven process that selects candidate categories depending on their adaptive value; both forms of learning have a plausible neural implementation in PFC. Our analyses demonstrate a fundamental principle: goal coding represents an efficient solution to cognitive control problems, analogous to efficient coding principles in other (e.g., visual) brain areas. The novel analytical-computational approach is of general interest because it applies to a variety of neurophysiological studies.
Affiliation(s)
- Ivilin Stoianov
- National Research Council, Rome, Italy; CNRS and Aix-Marseille University, France
|
31
|
Dual-process decomposition in human sensorimotor adaptation. Curr Opin Neurobiol 2015; 33:71-7. [DOI: 10.1016/j.conb.2015.03.003] [Citation(s) in RCA: 115] [Impact Index Per Article: 12.8] [Received: 01/06/2015] [Revised: 03/05/2015] [Accepted: 03/09/2015] [Indexed: 11/23/2022]
|
32
|
Abstract
Following a change in the environment or motor apparatus, human subjects are able to rapidly compensate their movements to recover accurate performance. This ability to adapt is thought to be achieved through multiple, qualitatively distinct learning processes acting in parallel. It is unclear, however, what the relative contributions of these multiple processes are during learning. In particular, long-term memories in such paradigms have been extensively studied through the phenomenon of savings (faster adaptation to a given perturbation the second time it is experienced), but it is unclear which components of learning contribute to this effect. Here we show that distinct components of learning in an adaptation task can be dissociated based on the amount of preparation time they require. During adaptation, we occasionally forced subjects to generate movements at very low preparation times. Early in learning, subjects expressed only a limited amount of their prior learning in these trials, though performance improved gradually with further practice. Following washout, subjects exhibited a strong and persistent aftereffect in trials in which preparation time was limited. When subjects were exposed to the same perturbation twice in successive days, they adapted faster the second time. This savings effect was, however, not seen in movements generated at low preparation times. These results demonstrate that preparation time plays a critical role in the expression of some components of learning but not others. Savings is restricted to those components that require prolonged preparation to be expressed and might therefore reflect a declarative rather than procedural form of memory.
|
33
|
Chen C, Takahashi T, Nakagawa S, Inoue T, Kusumi I. Reinforcement learning in depression: A review of computational research. Neurosci Biobehav Rev 2015; 55:247-67. [PMID: 25979140 DOI: 10.1016/j.neubiorev.2015.05.005] [Citation(s) in RCA: 116] [Impact Index Per Article: 12.9] [Received: 10/23/2014] [Revised: 04/20/2015] [Accepted: 05/04/2015] [Indexed: 01/05/2023]
Abstract
Despite being considered primarily a mood disorder, major depressive disorder (MDD) is characterized by cognitive and decision making deficits. Recent research has employed computational models of reinforcement learning (RL) to address these deficits. The computational approach has the advantage of making explicit predictions about learning and behavior, specifying the process parameters of RL, differentiating between model-free and model-based RL, and the computational model-based functional magnetic resonance imaging and electroencephalography. With these merits there has been an emerging field of computational psychiatry and here we review specific studies that focused on MDD. Considerable evidence suggests that MDD is associated with impaired brain signals of reward prediction error and expected value ('wanting'), decreased reward sensitivity ('liking') and/or learning (be it model-free or model-based), etc., although the causality remains unclear. These parameters may serve as valuable intermediate phenotypes of MDD, linking general clinical symptoms to underlying molecular dysfunctions. We believe future computational research at clinical, systems, and cellular/molecular/genetic levels will propel us toward a better understanding of the disease.
Affiliation(s)
- Chong Chen
- Department of Psychiatry, Hokkaido University Graduate School of Medicine, Sapporo 060-8638, Japan.
- Taiki Takahashi
- Department of Behavioral Science/Center for Experimental Research in Social Sciences, Hokkaido University, Sapporo 060-0810, Japan
- Shin Nakagawa
- Department of Psychiatry, Hokkaido University Graduate School of Medicine, Sapporo 060-8638, Japan
- Takeshi Inoue
- Department of Psychiatry, Hokkaido University Graduate School of Medicine, Sapporo 060-8638, Japan
- Ichiro Kusumi
- Department of Psychiatry, Hokkaido University Graduate School of Medicine, Sapporo 060-8638, Japan
|
34
|
Vlaev I, Dolan P. Action Change Theory: A Reinforcement Learning Perspective on Behavior Change. Review of General Psychology 2015. [DOI: 10.1037/gpr0000029] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 11/08/2022]
Affiliation(s)
- Ivo Vlaev
- Warwick Business School, University of Warwick
- Paul Dolan
- Department of Social Policy, London School of Economics
|
35
|
Mengov G. Person-by-person prediction of intuitive economic choice. Neural Netw 2014; 60:232-45. [PMID: 25278217 DOI: 10.1016/j.neunet.2014.09.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/13/2014] [Revised: 07/09/2014] [Accepted: 09/05/2014] [Indexed: 11/17/2022]
Abstract
Decision making is an interdisciplinary field, which is explored with methods spanning from economic experiments to brain scanning. Its dominant paradigms such as utility theory, prospect theory, and the modern dual-process theories all resort to formal algebraic models or non-mathematical postulates, and remain purely phenomenological. An approach introduced by Grossberg deployed differential equations describing neural networks and bridged the gap between decision science and the psychology of cognitive-emotional interactions. However, the limits within which neural models can explain data from real people's actions are virtually untested and remain unknown. Here we show that a model built around a recurrent gated dipole can successfully forecast individual economic choices in a complex laboratory experiment. Unlike classical statistical and econometric techniques or machine learning algorithms, our method calibrates the equations for each individual separately, and carries out prediction person-by-person. It predicted very well the behaviour of 15%-20% of the participants in the experiment (half of them extremely well) and was overall useful for two thirds of all 211 subjects. The model succeeded with people who were guided by gut feelings and failed with those who had sophisticated strategies. One hypothesis is that this neural network is the biological substrate of the cognitive system for primitive-intuitive thinking, and so we believe that we have a model of how people choose economic options by a simple form of intuition. We anticipate our study to be useful for further studies of human intuitive thinking as well as for analyses of economic systems populated by heterogeneous agents.
Affiliation(s)
- George Mengov
- Faculty of Economics and Business Administration, Sofia University St Kliment Ohridski, 125 Tzarigradsko Chaussee Blvd., Bl. 3, 1113 Sofia, Bulgaria.
|
36
|
Abstract
We propose and develop a hierarchical approach to network control of complex tasks. In this approach, a low-level controller directs the activity of a "plant," the system that performs the task. However, the low-level controller may be able to solve only fairly simple problems involving the plant. To accomplish more complex tasks, we introduce a higher-level controller that controls the lower-level controller. We use this system to direct an articulated truck to a specified location through an environment filled with static or moving obstacles. The final system consists of networks that have memorized associations between the sensory data they receive and the commands they issue. These networks are trained on a set of optimal associations generated by minimizing cost functions. Cost function minimization requires predicting the consequences of sequences of commands, which is achieved by constructing forward models, including a model of the lower-level controller. The forward models and cost minimization are used only during training, allowing the trained networks to respond rapidly. In general, the hierarchical approach can be extended to larger numbers of levels, dividing complex tasks into more manageable subtasks. The optimization procedure and the construction of the forward models and controllers can be performed in similar ways at each level of the hierarchy, which allows the system to be modified to perform other tasks or to be extended for more complex tasks without retraining lower levels.
Affiliation(s)
- Greg Wayne
- Department of Neuroscience and Department of Physiology and Cellular Biophysics, Columbia University College of Physicians and Surgeons, New York, NY 10032-2695, U.S.A.
|
37
|
Dayan P. Rationalizable irrationalities of choice. Top Cogn Sci 2014; 6:204-28. [PMID: 24648392 DOI: 10.1111/tops.12082] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 09/13/2012] [Revised: 02/19/2013] [Accepted: 08/14/2013] [Indexed: 11/28/2022]
Abstract
Although seemingly irrational choice abounds, the rules governing these mis-steps that might provide hints about the factors limiting normative behavior are unclear. We consider three experimental tasks, which probe different aspects of non-normative choice under uncertainty. We argue for systematic statistical, algorithmic, and implementational sources of irrationality, including incomplete evaluation of long-run future utilities, Pavlovian actions, and habits, together with computational and statistical noise and uncertainty. We suggest structural and functional adaptations that minimize their maladaptive effects.
Affiliation(s)
- Peter Dayan
- Gatsby Computational Neuroscience Unit, University College London
|
38
|
Abstract
An enduring and richly elaborated dichotomy in cognitive neuroscience is that of reflective versus reflexive decision making and choice. Other literatures refer to the two ends of what is likely to be a spectrum with terms such as goal-directed versus habitual, model-based versus model-free or prospective versus retrospective. One of the most rigorous traditions of experimental work in the field started with studies in rodents and graduated via human versions and enrichments of those experiments to a current state in which new paradigms are probing and challenging the very heart of the distinction. We review four generations of work in this tradition and provide pointers to the forefront of the field's fifth generation.
|
39
|
Prediction error in reinforcement learning: a meta-analysis of neuroimaging studies. Neurosci Biobehav Rev 2013; 37:1297-310. [PMID: 23567522 DOI: 10.1016/j.neubiorev.2013.03.023] [Citation(s) in RCA: 278] [Impact Index Per Article: 25.3] [Received: 10/29/2012] [Revised: 03/19/2013] [Accepted: 03/27/2013] [Indexed: 01/19/2023]
Abstract
Activation likelihood estimation (ALE) meta-analyses were used to examine the neural correlates of prediction error in reinforcement learning. The findings are interpreted in the light of current computational models of learning and action selection. In this context, particular consideration is given to the comparison of activation patterns from studies using instrumental and Pavlovian conditioning, and where reinforcement involved rewarding or punishing feedback. The striatum was the key brain area encoding for prediction error, with activity encompassing dorsal and ventral regions for instrumental and Pavlovian reinforcement alike, a finding which challenges the functional separation of the striatum into a dorsal 'actor' and a ventral 'critic'. Prediction error activity was further observed in diverse areas of predominantly anterior cerebral cortex including medial prefrontal cortex and anterior cingulate cortex. Distinct patterns of prediction error activity were found for studies using rewarding and aversive reinforcers; reward prediction errors were observed primarily in the striatum while aversive prediction errors were found more widely including insula and habenula.
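For readers unfamiliar with the terminology, the prediction error discussed in the abstract above is commonly formalized as a temporal-difference error, which a "critic" uses to update state values and an "actor" uses to update action preferences. The following is a minimal sketch of that textbook formulation, not the analysis performed in the meta-analysis; the function name `td_update`, the learning rates, the discount factor, and the toy numbers are all illustrative assumptions.

```python
def td_update(value, preference, reward, next_value,
              alpha_critic=0.1, alpha_actor=0.1, gamma=0.9):
    """One actor-critic step driven by a single prediction error.

    All parameter names and default values are illustrative assumptions.
    """
    # Prediction error: what was received (plus discounted future value)
    # minus what was expected.
    delta = reward + gamma * next_value - value
    # Critic: move the value estimate toward the observed outcome.
    new_value = value + alpha_critic * delta
    # Actor: strengthen the chosen action if delta > 0, weaken it if delta < 0.
    new_preference = preference + alpha_actor * delta
    return delta, new_value, new_preference


# Unexpected reward gives a positive prediction error.
delta_pos, v_pos, p_pos = td_update(value=0.0, preference=0.0,
                                    reward=1.0, next_value=0.0)
# Omission of an expected reward gives a negative prediction error.
delta_neg, v_neg, p_neg = td_update(value=1.0, preference=0.0,
                                    reward=0.0, next_value=0.0)
```

In this sketch the same scalar `delta` drives both updates, which is the sense in which an actor and a critic can share one prediction-error signal.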
|
40
|
Otto AR, Gershman SJ, Markman AB, Daw ND. The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive. Psychol Sci 2013; 24:751-61. [PMID: 23558545 DOI: 10.1177/0956797612463080] [Citation(s) in RCA: 204] [Impact Index Per Article: 18.5] [Indexed: 11/17/2022]
Abstract
A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior, and under what circumstances, are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people's choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
Affiliation(s)
- A Ross Otto
- Department of Psychology, University of Texas at Austin, USA.
|
41
|
Pezzulo G, Rigoli F, Chersi F. The mixed instrumental controller: using value of information to combine habitual choice and mental simulation. Front Psychol 2013; 4:92. [PMID: 23459512 PMCID: PMC3586710 DOI: 10.3389/fpsyg.2013.00092] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Received: 07/10/2012] [Accepted: 02/08/2013] [Indexed: 11/13/2022]
Abstract
Instrumental behavior depends on both goal-directed and habitual mechanisms of choice. Normative views cast these mechanisms in terms of model-based and model-free methods of reinforcement learning, respectively. An influential proposal hypothesizes that model-free and model-based mechanisms coexist and compete in the brain according to their relative uncertainty. In this paper we propose a novel view in which a single Mixed Instrumental Controller produces both goal-directed and habitual behavior by flexibly balancing and combining model-based and model-free computations. The Mixed Instrumental Controller performs a cost-benefit analysis to decide whether to choose an action immediately based on the available "cached" value of actions (linked to model-free mechanisms) or to improve value estimation by mentally simulating the expected outcome values (linked to model-based mechanisms). Since mental simulation entails cognitive effort and increases the reward delay, it is activated only when the associated "Value of Information" exceeds its costs. The model proposes a method to compute the Value of Information, based on the uncertainty of action values and on the distance of alternative cached action values. Overall, the model by default chooses on the basis of lighter model-free estimates, and integrates them with costly model-based predictions only when useful. Mental simulation uses a sampling method to produce reward expectancies, which are used to update the cached value of one or more actions; in turn, this updated value is used for the choice. The key predictions of the model are tested in different settings of a double T-maze scenario. Results are discussed in relation to neurobiological evidence on the hippocampus-ventral striatum circuit in rodents, which has been linked to goal-directed spatial navigation.
Affiliation(s)
- Giovanni Pezzulo
- Istituto di Linguistica Computazionale, "Antonio Zampolli," Consiglio Nazionale delle Ricerche Pisa, Italy ; Istituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche Roma, Italy
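The arbitration rule described in the Mixed Instrumental Controller abstract above (simulate only when the Value of Information exceeds its cost, then choose from the updated cached values) can be illustrated with a rough sketch. The abstract does not give the formula, so the ratio used in `value_of_information` is an assumption chosen only to grow with uncertainty and shrink as the cached values move apart; the function names, the `cost`, `n_samples`, and `rate` parameters, and the toy maze values are likewise hypothetical.

```python
def value_of_information(uncertainty, gap, eps=1e-6):
    # ASSUMED form (not taken from the paper): higher when the action value
    # is uncertain, lower when the competing cached values are far apart.
    return uncertainty / (gap + eps)


def choose(cached, uncertainty, simulate, cost=1.0, n_samples=20, rate=0.5):
    """Pick an action, mentally simulating only when it seems worth the cost.

    cached:      dict mapping action -> cached (model-free) value, >= 2 actions
    uncertainty: dict mapping action -> non-negative uncertainty of that value
    simulate:    callable(action) -> one sampled reward (model-based rollout)
    """
    values = dict(cached)
    ranked = sorted(values, key=values.get, reverse=True)
    # Distance between the two best cached values, fixed before any update.
    gap = abs(values[ranked[0]] - values[ranked[1]])
    simulated = []
    for action in cached:
        if value_of_information(uncertainty[action], gap) > cost:
            # Sampling-based mental simulation yields a reward expectancy...
            expectancy = sum(simulate(action) for _ in range(n_samples)) / n_samples
            # ...which updates the cached value used for the final choice.
            values[action] += rate * (expectancy - values[action])
            simulated.append(action)
    return max(values, key=values.get), simulated


def toy_model(action):
    # Hypothetical deterministic world model: only "right" is rewarded.
    return 1.0 if action == "right" else 0.0


# Close cached values and high uncertainty: simulation is triggered.
close_choice, close_simulated = choose(
    {"left": 0.50, "right": 0.45}, {"left": 0.2, "right": 0.2}, toy_model)
# Well-separated cached values: the cached (habitual) choice is used directly.
far_choice, far_simulated = choose(
    {"left": 0.9, "right": 0.1}, {"left": 0.2, "right": 0.2}, toy_model)
```

In the first call the simulated expectancies overturn the slightly higher cached value of "left"; in the second call no simulation is run and the cached preference decides.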
|
42
|
Model-based and model-free mechanisms of human motor learning. Adv Exp Med Biol 2013; 782:1-21. [PMID: 23296478 DOI: 10.1007/978-1-4614-5465-6_1] [Citation(s) in RCA: 127] [Impact Index Per Article: 11.5] [Indexed: 12/03/2022]
|
43
|
Abstract
Recent research suggests that novelty has an influence on reward-related learning. Here, we showed that novel stimuli presented from a pre-familiarized category can accelerate or decelerate learning of the most rewarding category, depending on the condition. The extent of this influence depended on the individual trait of novelty seeking. Different reinforcement learning models were developed to quantify subjects' choices. We introduced a bias parameter to model explorative behavior toward novel stimuli and characterize individual variation in novelty response. The theoretical framework allowed us to test different assumptions concerning the motivational value of novelty. The best-fitting model combined all novelty components and had a significant positive correlation with both the experimentally measured novelty bias and the independent novelty-seeking trait. Altogether, we have not only shown that novelty by itself enhances behavioral responses underlying reward processing, but also that novelty has a direct influence on reward-dependent learning processes, consistent with computational predictions.
|
44
|
Dayan P. How to set the switches on this thing. Curr Opin Neurobiol 2012; 22:1068-74. [DOI: 10.1016/j.conb.2012.05.011] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Received: 02/21/2012] [Revised: 05/10/2012] [Accepted: 05/28/2012] [Indexed: 11/26/2022]
|
45
|
Shizgal P. Scarce means with alternative uses: Robbins' definition of economics and its extension to the behavioral and neurobiological study of animal decision making. Front Neurosci 2012; 6:20. [PMID: 22363253 PMCID: PMC3275781 DOI: 10.3389/fnins.2012.00020] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/07/2011] [Accepted: 01/23/2012] [Indexed: 11/13/2022]
Abstract
Almost 80 years ago, Lionel Robbins proposed a highly influential definition of the subject matter of economics: the allocation of scarce means that have alternative ends. Robbins confined his definition to human behavior, and he strove to separate economics from the natural sciences in general and from psychology in particular. Nonetheless, I extend his definition to the behavior of non-human animals, rooting my account in psychological processes and their neural underpinnings. Some historical developments are reviewed that render such a view more plausible today than would have been the case in Robbins’ time. To illustrate a neuroeconomic perspective on decision making in non-human animals, I discuss research on the rewarding effect of electrical brain stimulation. Central to this discussion is an empirically based, functional/computational model of how the subjective intensity of the electrical reward is computed and combined with subjective costs so as to determine the allocation of time to the pursuit of reward. Some successes achieved by applying the model are discussed, along with limitations, and evidence is presented regarding the roles played by several different neural populations in processes posited by the model. I present a rationale for marshaling convergent experimental methods to ground psychological and computational processes in the activity of identified neural populations, and I discuss the strengths, weaknesses, and complementarity of the individual approaches. I then sketch some recent developments that hold great promise for advancing our understanding of structure–function relationships in neuroscience in general and in the neuroeconomic study of decision making in particular.
Affiliation(s)
- Peter Shizgal
- Department of Psychology, Center for Studies in Behavioral Neurobiology, Concordia University Montréal, QC, Canada
|
46
|
Solway A, Botvinick MM. Goal-directed decision making as probabilistic inference: a computational framework and potential neural correlates. Psychol Rev 2012; 119:120-54. [PMID: 22229491 PMCID: PMC3767755 DOI: 10.1037/a0026435] [Citation(s) in RCA: 99] [Impact Index Per Article: 8.3] [Indexed: 11/08/2022]
Abstract
Recent work has given rise to the view that reward-based decision making is governed by two key controllers: a habit system, which stores stimulus-response associations shaped by past reward, and a goal-oriented system that selects actions based on their anticipated outcomes. The current literature provides a rich body of computational theory addressing habit formation, centering on temporal-difference learning mechanisms. Less progress has been made toward formalizing the processes involved in goal-directed decision making. We draw on recent work in cognitive neuroscience, animal conditioning, cognitive and developmental psychology, and machine learning to outline a new theory of goal-directed decision making. Our basic proposal is that the brain, within an identifiable network of cortical and subcortical structures, implements a probabilistic generative model of reward, and that goal-directed decision making is effected through Bayesian inversion of this model. We present a set of simulations implementing the account, which address benchmark behavioral and neuroscientific findings, and give rise to a set of testable predictions. We also discuss the relationship between the proposed framework and other models of decision making, including recent models of perceptual choice, to which our theory bears a direct connection.
Affiliation(s)
- Alec Solway
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08540, USA
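The core idea in the abstract above, choosing actions by Bayesian inversion of a generative model of reward, can be illustrated with a deliberately tiny example. This is a one-step sketch under stated assumptions, not the authors' model: the function name `action_posterior`, the two-action setting, and the probabilities are hypothetical.

```python
def action_posterior(prior, p_reward_given_action):
    """Infer which action most plausibly leads to reward.

    The generative model says how likely reward is under each action;
    conditioning on "reward is obtained" and applying Bayes' rule inverts it
    into a distribution over actions.
    """
    joint = {a: prior[a] * p_reward_given_action[a] for a in prior}
    evidence = sum(joint.values())  # P(reward) under the prior policy
    return {a: joint[a] / evidence for a in joint}


# Hypothetical two-armed choice with a flat prior over actions.
posterior = action_posterior(
    prior={"left": 0.5, "right": 0.5},
    p_reward_given_action={"left": 0.2, "right": 0.8},
)
best_action = max(posterior, key=posterior.get)
```

With a flat prior the posterior simply renormalizes the reward likelihoods, so the action with the higher reward probability receives the higher posterior mass.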
|
47
|
Abstract
In this paper, we review the current literature to highlight relations between age-associated declines in dopaminergic and serotonergic neuromodulation and adult age differences in adaptive goal-directed behavior. Specifically, we focus on evidence suggesting that deficits in neuromodulation contribute to older adults' behavioral disadvantages in learning and decision making. These deficits are particularly pronounced when reward information is uncertain or the task context requires flexible adaptations to changing stimulus-reward contingencies. Moreover, emerging evidence points to age-related differences in the sensitivity to rewarding and aversive outcomes during learning and decision making if the acquisition of behavior critically depends on outcome processing. These age-related asymmetries in outcome valuation may be explained by age differences in the interplay of dopaminergic and serotonergic neuromodulation. This hypothesis is based on recent neurocomputational and psychopharmacological approaches, which suggest that dopamine and serotonin serve opponent roles in regulating the balance between approach behavior and inhibitory control. Studying adaptive regulation of behavior across the adult life span may shed new light on how the aging brain changes functionally in response to its diminishing resources.
Affiliation(s)
- Ben Eppinger
- Center for Lifespan Psychology, Max Planck Institute for Human Development, Berlin, Germany.
|
48
|
Rethinking motor learning and savings in adaptation paradigms: model-free memory for successful actions combines with internal models. Neuron 2011; 70:787-801. [PMID: 21609832 DOI: 10.1016/j.neuron.2011.04.012] [Citation(s) in RCA: 313] [Impact Index Per Article: 24.1] [Accepted: 04/12/2011] [Indexed: 01/06/2023]
Abstract
Although motor learning is likely to involve multiple processes, phenomena observed in error-based motor learning paradigms tend to be conceptualized in terms of only a single process: adaptation, which occurs through updating an internal model. Here we argue that fundamental phenomena like movement direction biases, savings (faster relearning), and interference do not relate to adaptation but instead are attributable to two additional learning processes that can be characterized as model-free: use-dependent plasticity and operant reinforcement. Although usually "hidden" behind adaptation, we demonstrate, with modified visuomotor rotation paradigms, that these distinct model-based and model-free processes combine to learn an error-based motor task. (1) Adaptation of an internal model channels movements toward successful error reduction in visual space. (2) Repetition of the newly adapted movement induces directional biases toward the repeated movement. (3) Operant reinforcement through association of the adapted movement with successful error reduction is responsible for savings.
|
49
|
Pezzulo G, Rigoli F. The value of foresight: how prospection affects decision-making. Front Neurosci 2011; 5:79. [PMID: 21747755 PMCID: PMC3129535 DOI: 10.3389/fnins.2011.00079] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 12/29/2010] [Accepted: 06/06/2011] [Indexed: 11/20/2022]
Abstract
Traditional theories of decision-making assume that utilities are based on the intrinsic value of outcomes; in turn, these values depend on associations between expected outcomes and the current motivational state of the decision-maker. This view disregards the fact that humans (and possibly other animals) have prospection abilities, which permit anticipating future mental processes and motivational and emotional states. For instance, we can evaluate future outcomes in light of the motivational state we expect to have when the outcome is collected, not (only) when we make a decision. Consequently, we can plan for the future and choose to store food to be consumed when we expect to be hungry, not immediately. Furthermore, similarly to any expected outcome, we can assign a value to our anticipated mental processes and emotions. It has been reported that (in some circumstances) human subjects prefer to receive an unavoidable punishment immediately, probably because they are anticipating the dread associated with the time spent waiting for the punishment. This article offers a formal framework to guide neuroeconomic research on how prospection affects decision-making. The model has two characteristics. First, it uses model-based Bayesian inference to describe anticipation of cognitive and motivational processes. Second, the utility-maximization process considers these anticipations in two ways: to evaluate outcomes (e.g., the pleasure of eating a pie is evaluated differently at the beginning of a dinner, when one is hungry, and at the end of the dinner, when one is satiated), and as outcomes having a value themselves (e.g., the case of dread as a cost of waiting for punishment). By explicitly accounting for the relationship between prospection and value, our model provides a framework to reconcile the utility-maximization approach with psychological phenomena such as planning for the future and dread.
Affiliation(s)
- Giovanni Pezzulo
- Istituto di Linguistica Computazionale "Antonio Zampolli," Consiglio Nazionale delle Ricerche Pisa, Italy
|
50
|
Democratic population decisions result in robust policy-gradient learning: a parametric study with GPU simulations. PLoS One 2011; 6:e18539. [PMID: 21572529 PMCID: PMC3087717 DOI: 10.1371/journal.pone.0018539] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 12/01/2010] [Accepted: 03/03/2011] [Indexed: 11/28/2022]
Abstract
High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task and moreover architectural limitations raise the question of whether investing effort in this direction may be worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which the said architecture and learning rule demonstrate best performance. Our work indicates that networks featuring strong Mexican-Hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a “non-democratic” mechanism), achieve mediocre learning results at best. In absence of recurrent connections, where all neurons “vote” independently (“democratic”) for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult on a desktop computer without the use of GPU programming. We present the routines developed for this purpose and show that a speed improvement of 5x up to 42x is provided versus optimised Python code. The higher speed is achieved when we exploit the parallelism of the GPU in the search of learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated.
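The "democratic" population vector readout mentioned in the abstract above can be sketched in a few lines: each neuron votes for its preferred direction with a weight equal to its activity, and the decision is the direction of the summed vote. This is a generic illustration of the readout, not the paper's GPU routines; the function name `population_vector`, the four-neuron layout, and the spike counts are hypothetical.

```python
import math


def population_vector(preferred_angles, spike_counts):
    """Decode a movement direction (radians) from independent neuron votes."""
    # Each neuron contributes a unit vector along its preferred direction,
    # scaled by how many spikes it fired.
    x = sum(n * math.cos(a) for a, n in zip(preferred_angles, spike_counts))
    y = sum(n * math.sin(a) for a, n in zip(preferred_angles, spike_counts))
    # The decision is the angle of the summed vector.
    return math.atan2(y, x)


# Four hypothetical neurons preferring east, north, west, and south.
angles = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
# The "north" neuron is the most active; east and west cancel each other.
counts = [1, 5, 1, 0]
decoded = population_vector(angles, counts)
```

Because the east and west votes cancel, the decoded direction points north (an angle of pi/2), determined by the most active neuron without any recurrent competition.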
|