1
Torresan F, Baltieri M. Disentangled representations for causal cognition. Phys Life Rev 2024; 51:343-381. PMID: 39500032; DOI: 10.1016/j.plrev.2024.10.003.
Abstract
Complex adaptive agents consistently achieve their goals by solving problems that seem to require an understanding of causal information, information pertaining to the causal relationships that exist among elements of combined agent-environment systems. Causal cognition studies and describes the main characteristics of causal learning and reasoning in human and non-human animals, offering a conceptual framework to discuss cognitive performances based on the level of apparent causal understanding of a task. Despite the use of formal intervention-based models of causality, including causal Bayesian networks, psychological and behavioural research on causal cognition does not yet offer a computational account that operationalises how agents acquire a causal understanding of the world seemingly from scratch, i.e. without a priori knowledge of relevant features of the environment. Research on causality in machine and reinforcement learning, especially involving disentanglement as a candidate process to build causal representations, represents, on the other hand, a concrete attempt at designing artificial agents that can learn about causality, shedding light on the inner workings of natural causal cognition. In this work, we connect these two areas of research to build a unifying framework for causal cognition that will offer a computational perspective on studies of animal cognition, and provide insights into the development of new algorithms for causal reinforcement learning in AI.
Affiliation(s)
- Filippo Torresan
- University of Sussex, Falmer, Brighton, BN1 9RH, United Kingdom.
- Manuel Baltieri
- University of Sussex, Falmer, Brighton, BN1 9RH, United Kingdom; Araya Inc., Chiyoda City, Tokyo, 101 0025, Japan.
2
Mahmoodi A, Luo S, Harbison C, Piray P, Rushworth MFS. Human hippocampus and dorsomedial prefrontal cortex infer and update latent causes during social interaction. Neuron 2024; 112:3796-3809.e9. PMID: 39353432; DOI: 10.1016/j.neuron.2024.09.001.
Abstract
Latent-cause inference is the process of identifying features of the environment that have caused an outcome. This problem is especially important in social settings where individuals may not make equal contributions to the outcomes they achieve together. Here, we designed a novel task in which participants inferred which of two characters was more likely to have been responsible for outcomes achieved by working together. Using computational modeling, univariate and multivariate analysis of human fMRI, and continuous theta-burst stimulation, we identified two brain regions that solved the task. Notably, as each outcome occurred, it was possible to decode the inference of its cause (the responsible character) from hippocampal activity. Activity in dorsomedial prefrontal cortex (dmPFC) updated estimates of the association between the cause (the responsible character) and the outcome. Disruption of dmPFC activity impaired participants' ability to update their estimate as a function of inferred responsibility but spared their ability to infer responsibility.
Affiliation(s)
- Ali Mahmoodi
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, UK.
- Shuyi Luo
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, UK
- Caroline Harbison
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, UK
- Payam Piray
- Department of Psychology, University of Southern California, Los Angeles, CA, USA
- Matthew F S Rushworth
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, UK
3
Lee RS, Sagiv Y, Engelhard B, Witten IB, Daw ND. A feature-specific prediction error model explains dopaminergic heterogeneity. Nat Neurosci 2024; 27:1574-1586. PMID: 38961229; DOI: 10.1038/s41593-024-01689-1.
Abstract
The hypothesis that midbrain dopamine (DA) neurons broadcast a reward prediction error (RPE) is among the great successes of computational neuroscience. However, recent results contradict a core aspect of this theory: specifically that the neurons convey a scalar, homogeneous signal. While the predominant family of extensions to the RPE model replicates the classic model in multiple parallel circuits, we argue that these models are ill suited to explain reports of heterogeneity in task variable encoding across DA neurons. Instead, we introduce a complementary 'feature-specific RPE' model, positing that individual ventral tegmental area DA neurons report RPEs for different aspects of an animal's moment-to-moment situation. Further, we show how our framework can be extended to explain patterns of heterogeneity in action responses reported among substantia nigra pars compacta DA neurons. This theory reconciles new observations of DA heterogeneity with classic ideas about RPE coding while also providing a new perspective of how the brain performs reinforcement learning in high-dimensional environments.
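To make the contrast concrete, the toy sketch below (not the paper's fitted model; all names and values are illustrative) computes a single scalar RPE from a linear value function alongside one RPE per feature, the kind of per-feature error signal the feature-specific account assigns to individual DA neurons.

```python
import numpy as np

# Toy contrast between a scalar RPE and per-feature RPEs under a linear value
# function V(s) = w . x(s); illustrative only, not the model fitted in the paper.

def scalar_rpe(r, x, x_next, w, gamma=0.95):
    """Classic temporal-difference error using the full value estimate."""
    return r + gamma * w @ x_next - w @ x

def feature_rpes(r, x, x_next, w, gamma=0.95):
    """One error per feature: reward is shared, predictions are per-feature terms."""
    return r + gamma * (w * x_next) - (w * x)

rng = np.random.default_rng(0)
w = rng.normal(size=4)                    # feature weights
x, x_next, r = rng.random(4), rng.random(4), 1.0
print(scalar_rpe(r, x, x_next, w))
print(feature_rpes(r, x, x_next, w))      # heterogeneous errors across features
```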
Affiliation(s)
- Rachel S Lee
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Yotam Sagiv
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Ben Engelhard
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Nathaniel D Daw
- Princeton Neuroscience Institute, Princeton, NJ, USA.
- Department of Psychology, Princeton University, Princeton, NJ, USA.
4
Lu Q, Nguyen TT, Zhang Q, Hasson U, Griffiths TL, Zacks JM, Gershman SJ, Norman KA. Reconciling shared versus context-specific information in a neural network model of latent causes. Sci Rep 2024; 14:16782. PMID: 39039131; PMCID: PMC11263346; DOI: 10.1038/s41598-024-64272-5.
Abstract
It has been proposed that, when processing a stream of events, humans divide their experiences in terms of inferred latent causes (LCs) to support context-dependent learning. However, when shared structure is present across contexts, it is still unclear how the "splitting" of LCs and learning of shared structure can be simultaneously achieved. Here, we present the Latent Cause Network (LCNet), a neural network model of LC inference. Through learning, it naturally stores structure that is shared across tasks in the network weights. Additionally, it represents context-specific structure using a context module, controlled by a Bayesian nonparametric inference algorithm, which assigns a unique context vector for each inferred LC. Across three simulations, we found that LCNet could (1) extract shared structure across LCs in a function learning task while avoiding catastrophic interference, (2) capture human data on curriculum effects in schema learning, and (3) infer the underlying event structure when processing naturalistic videos of daily events. Overall, these results demonstrate a computationally feasible approach to reconciling shared structure and context-specific structure in a model of LCs that is scalable from laboratory experiment settings to naturalistic settings.
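The Bayesian nonparametric step described above can be illustrated with a generic Chinese-restaurant-process assignment rule (a sketch of the general idea, not LCNet's implementation; the likelihood assigned to a brand-new latent cause is an assumed placeholder value).

```python
import numpy as np

# Sketch of latent-cause (LC) assignment with a Chinese-restaurant-process prior.
# Not LCNet itself: the new-LC log-likelihood below is an assumed placeholder.

def latent_cause_posterior(loglik_per_cause, counts, alpha=1.0, new_lc_loglik=np.log(1e-3)):
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    prior = np.append(counts, alpha) / (n + alpha)        # CRP prior, last entry = new LC
    loglik = np.append(np.asarray(loglik_per_cause, float), new_lc_loglik)
    logpost = np.log(prior) + loglik
    post = np.exp(logpost - logpost.max())
    return post / post.sum()                              # posterior over existing LCs + new LC

# The observation fits LC 0 much better than LC 1, so most mass goes to LC 0.
print(latent_cause_posterior([-1.0, -4.0], counts=[10, 2]))
```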
Affiliation(s)
- Qihong Lu
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, USA.
- Tan T Nguyen
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, USA
- Qiong Zhang
- Department of Psychology and Department of Computer Science, Rutgers University, New Brunswick, USA
- Uri Hasson
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, USA
- Thomas L Griffiths
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, USA
- Department of Computer Science, Princeton University, Princeton, USA
- Jeffrey M Zacks
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, USA
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, USA
- Kenneth A Norman
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, USA
5
Crego AC, Amaya KA, Palmer JA, Smith KS. A role for the dorsolateral striatum in prospective action control. iScience 2024; 27:110044. PMID: 38883824; PMCID: PMC11176669; DOI: 10.1016/j.isci.2024.110044.
Abstract
The dorsolateral striatum (DLS) is important for performing actions persistently, even when doing so becomes suboptimal, reflecting a function that is reflexive and habitual. However, persistent behaviors can also result from a more prospective, planning mode of behavior. To help tease apart these possibilities for DLS function, we trained animals to perform a lever press for reward and then inhibited the DLS in key test phases: as the task shifted from a one-press to a three-press rule (upshift), as the task was maintained, as the task shifted back to the one-press rule (downshift), and when rewards came independent of pressing. During DLS inhibition, animals always favored their initially learned strategy of pressing just once, particularly during the free-reward period. DLS inhibition also, surprisingly, changed performance speed bidirectionally depending on the task shift. DLS inhibition thus encouraged habitual behavior, suggesting that the DLS normally helps behavior adapt to changing conditions.
Affiliation(s)
- Adam C.G. Crego
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA
- Kenneth A. Amaya
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA
- Jensen A. Palmer
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA
- Kyle S. Smith
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA
6
Rohe T. Complex multisensory causal inference in multi-signal scenarios (commentary on Kayser, Debats & Heuer, 2024). Eur J Neurosci 2024; 59:2890-2893. PMID: 38706126; DOI: 10.1111/ejn.16388.
Affiliation(s)
- Tim Rohe
- Institute of Psychology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
7
Avraham G, Ivry RB. Interference underlies attenuation upon relearning in sensorimotor adaptation. bioRxiv 2024:2024.05.27.596118 [Preprint]. PMID: 38853972; PMCID: PMC11160603; DOI: 10.1101/2024.05.27.596118.
Abstract
Savings refers to the gain in performance upon relearning a task. In sensorimotor adaptation, savings is tested by having participants adapt to perturbed feedback and, following a washout block during which the system resets to baseline, presenting the same perturbation again. While savings has been observed with these tasks, we have shown that the contribution from implicit sensorimotor adaptation, a process that uses sensory prediction errors to recalibrate the sensorimotor map, is actually attenuated upon relearning (Avraham et al., 2021). In the present study, we test the hypothesis that this attenuation is due to interference arising from the washout block, and more generally, from experience with a different relationship between the movement and the feedback. In standard adaptation studies, removing the perturbation at the start of the washout block results in a salient error signal in the opposite direction to that observed during learning. As a starting point, we replicated the finding that implicit adaptation is attenuated following a washout period in which the feedback now signals a salient opposite error. When we eliminated visual feedback during washout, implicit adaptation was no longer attenuated upon relearning, consistent with the interference hypothesis. Next, we eliminated the salient error during washout by gradually decreasing the perturbation, creating a scenario in which the perceived errors fell within the range associated with motor noise. Nonetheless, attenuation was still prominent. Inspired by this observation, we tested participants with an extended experience with veridical feedback during an initial baseline phase and found that this was sufficient to cause robust attenuation of implicit adaptation during the first exposure to the perturbation. This effect was context-specific: It did not generalize to movements that were not associated with the interfering feedback. Taken together, these results show that the implicit sensorimotor adaptation system is highly sensitive to memory interference from a recent experience with a discrepant action-outcome contingency.
Affiliation(s)
- Guy Avraham
- Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA
- Richard B Ivry
- Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA
8
Dillon DG, Belleau EL, Origlio J, McKee M, Jahan A, Meyer A, Souther MK, Brunner D, Kuhn M, Ang YS, Cusin C, Fava M, Pizzagalli DA. Using Drift Diffusion and RL Models to Disentangle Effects of Depression on Decision-Making vs. Learning in the Probabilistic Reward Task. Comput Psychiatry 2024; 8:46-69. PMID: 38774430; PMCID: PMC11104335; DOI: 10.5334/cpsy.108.
Abstract
The Probabilistic Reward Task (PRT) is widely used to investigate the impact of Major Depressive Disorder (MDD) on reinforcement learning (RL), and recent studies have used it to provide insight into decision-making mechanisms affected by MDD. The current project used PRT data from unmedicated, treatment-seeking adults with MDD to extend these efforts by: (1) providing a more detailed analysis of standard PRT metrics (response bias and discriminability) to better understand how the task is performed; (2) analyzing the data with two computational models and providing psychometric analyses of both; and (3) determining whether response bias, discriminability, or model parameters predicted responses to treatment with placebo or the atypical antidepressant bupropion. Analysis of standard metrics replicated recent work by demonstrating a dependency between response bias and response time (RT), and by showing that reward totals in the PRT are governed by discriminability. Behavior was well-captured by the Hierarchical Drift Diffusion Model (HDDM), which models decision-making processes; the HDDM showed excellent internal consistency and acceptable retest reliability. A separate "belief" model reproduced the evolution of response bias over time better than the HDDM, but its psychometric properties were weaker. Finally, the predictive utility of the PRT was limited by small samples; nevertheless, depressed adults who responded to bupropion showed larger pre-treatment starting point biases in the HDDM than non-responders, indicating greater sensitivity to the PRT's asymmetric reinforcement contingencies. Together, these findings enhance our understanding of reward and decision-making mechanisms that are implicated in MDD and probed by the PRT.
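For readers unfamiliar with the drift diffusion framework used here, the sketch below simulates single trials of a basic DDM (illustrative only; the paper fits a hierarchical DDM to PRT data rather than simulating, and all parameter values are arbitrary). A positive starting point plays the role of the starting-point bias mentioned in the final result.

```python
import numpy as np

# Minimal drift-diffusion simulation: evidence drifts toward one of two bounds,
# yielding a choice and a response time. Parameters are illustrative, not fitted values.

def simulate_ddm(drift, bound=1.0, start=0.0, noise=1.0, dt=0.001, max_t=5.0, rng=None):
    rng = rng or np.random.default_rng()
    x, t = start, 0.0
    while abs(x) < bound and t < max_t:
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    return ("upper" if x >= bound else "lower"), round(t, 3)

rng = np.random.default_rng(1)
# start > 0 acts like a starting-point bias toward the "rich" (upper) response.
print([simulate_ddm(drift=0.8, start=0.2, rng=rng) for _ in range(3)])
```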
Affiliation(s)
- Daniel G. Dillon
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Harvard Medical School, Boston MA, USA
- Emily L. Belleau
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Harvard Medical School, Boston MA, USA
- Julianne Origlio
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Madison McKee
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Aava Jahan
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Ashley Meyer
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Min Kang Souther
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Devon Brunner
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Manuel Kuhn
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Yuen Siang Ang
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Cristina Cusin
- Harvard Medical School, Boston MA, USA
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Maurizio Fava
- Harvard Medical School, Boston MA, USA
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Diego A. Pizzagalli
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
9
Loetscher KB, Goldfarb EV. Integrating and fragmenting memories under stress and alcohol. Neurobiol Stress 2024; 30:100615. PMID: 38375503; PMCID: PMC10874731; DOI: 10.1016/j.ynstr.2024.100615.
Abstract
Stress can powerfully influence the way we form memories, particularly the extent to which they are integrated or situated within an underlying spatiotemporal and broader knowledge architecture. These different representations in turn have significant consequences for the way we use these memories to guide later behavior. Puzzlingly, although stress has historically been argued to promote fragmentation, leading to disjoint memory representations, more recent work suggests that stress can also facilitate memory binding and integration. Understanding the circumstances under which stress fosters integration will be key to resolving this discrepancy and unpacking the mechanisms by which stress can shape later behavior. Here, we examine memory integration at multiple levels: linking together the content of an individual experience, threading associations between related but distinct events, and binding an experience into a pre-existing schema or sense of causal structure. We discuss neural and cognitive mechanisms underlying each form of integration as well as findings regarding how stress, aversive learning, and negative affect can modulate each. In this analysis, we uncover that stress can indeed promote each level of integration. We also show how memory integration may apply to understanding effects of alcohol, highlighting extant clinical and preclinical findings and opportunities for further investigation. Finally, we consider the implications of integration and fragmentation for later memory-guided behavior, and the importance of understanding which type of memory representation is potentiated in order to design appropriate interventions.
Affiliation(s)
- Elizabeth V. Goldfarb
- Department of Psychiatry, Yale University, USA
- Department of Psychology, Yale University, USA
- Wu Tsai Institute, Yale University, USA
- National Center for PTSD, West Haven VA, USA
10
Wurm F, Ernst B, Steinhauser M. Surprise-minimization as a solution to the structural credit assignment problem. PLoS Comput Biol 2024; 20:e1012175. PMID: 38805546; PMCID: PMC11175464; DOI: 10.1371/journal.pcbi.1012175.
Abstract
The structural credit assignment problem arises when the causal structure between actions and subsequent outcomes is hidden from direct observation. To solve this problem and enable goal-directed behavior, an agent has to infer that structure and form a representation of it. In the scope of this study, we investigate a possible solution in the human brain. We recorded behavioral and electrophysiological data from human participants in a novel variant of the bandit task, where multiple actions lead to multiple outcomes. Crucially, the mapping between actions and outcomes was hidden and not instructed to the participants. Human choice behavior revealed clear hallmarks of credit assignment and learning. Moreover, a computational model that formalizes action selection as a competition between multiple representations of the hidden structure was fit to account for participants' data. Starting in a state of uncertainty about the correct representation, the central mechanism of this model is the arbitration of action control towards the representation that minimizes surprise about outcomes. Crucially, single-trial latent-variable analysis reveals that the neural patterns clearly support central quantitative predictions of this surprise-minimization model. The results suggest that neural activity is not only related to reinforcement learning under both correct and incorrect task representations but also reflects central mechanisms of credit assignment and behavioral arbitration.
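A minimal sketch of the arbitration idea (not the paper's fitted model; likelihood values are made up): control is weighted toward whichever candidate representation makes the observed outcomes least surprising.

```python
import numpy as np

# Arbitration between candidate task representations by accumulated outcome
# log-likelihood: the less surprising model gains control weight. Illustrative values.

def update_arbitration(log_evidence, outcome_loglik_per_model):
    log_evidence = log_evidence + np.asarray(outcome_loglik_per_model, float)
    w = np.exp(log_evidence - log_evidence.max())
    return log_evidence, w / w.sum()

log_evidence = np.zeros(2)
for loglik in [(-0.2, -1.5), (-0.3, -1.2), (-0.1, -2.0)]:   # model 0 explains outcomes better
    log_evidence, weights = update_arbitration(log_evidence, loglik)
print(weights)   # control weight shifts toward the surprise-minimizing representation
```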
Affiliation(s)
- Franz Wurm
- Catholic University of Eichstätt-Ingolstadt, Eichstätt, Germany
- Leiden University, Leiden, the Netherlands
- Leiden Institute for Brain and Cognition, Leiden University, Leiden, the Netherlands
- Benjamin Ernst
- Catholic University of Eichstätt-Ingolstadt, Eichstätt, Germany
11
Pisupati S, Langdon A, Konova AB, Niv Y. The utility of a latent-cause framework for understanding addiction phenomena. Addict Neurosci 2024; 10:100143. PMID: 38524664; PMCID: PMC10959497; DOI: 10.1016/j.addicn.2024.100143.
Abstract
Computational models of addiction often rely on a model-free reinforcement learning (RL) formulation, owing to the close associations between model-free RL, habitual behavior and the dopaminergic system. However, such formulations typically do not capture key recurrent features of addiction phenomena such as craving and relapse. Moreover, they cannot account for goal-directed aspects of addiction that necessitate contrasting, model-based formulations. Here we synthesize a growing body of evidence and propose that a latent-cause framework can help unify our understanding of several recurrent phenomena in addiction, by viewing them as the inferred return of previous, persistent "latent causes". We demonstrate that applying this framework to Pavlovian and instrumental settings can help account for defining features of craving and relapse such as outcome-specificity, generalization, and cyclical dynamics. Finally, we argue that this framework can bridge model-free and model-based formulations, and account for individual variability in phenomenology by accommodating the memories, beliefs, and goals of those living with addiction, motivating a centering of the individual, subjective experience of addiction and recovery.
Affiliation(s)
- Sashank Pisupati
- Limbic Limited, London UK
- Princeton Neuroscience Institute & Department of Psychology, Princeton University, Princeton NJ, USA
- Angela Langdon
- National Institute of Mental Health & National Institute on Drug Abuse, National Institutes of Health, Bethesda MD, USA
- Anna B Konova
- Department of Psychiatry, University Behavioral Health Care & Brain Health Institute, Rutgers University, New Brunswick NJ, USA
- Yael Niv
- Princeton Neuroscience Institute & Department of Psychology, Princeton University, Princeton NJ, USA
12
Kleindorfer S, Brouwer L, Hauber ME, Teunissen N, Peters A, Louter M, Webster MS, Katsis AC, Sulloway FJ, Common LK, Austin VI, Colombelli-Négrel D. Nestling Begging Calls Resemble Maternal Vocal Signatures When Mothers Call Slowly to Embryos. Am Nat 2024; 203:267-283. PMID: 38306283; DOI: 10.1086/728105.
Abstract
Vocal production learning (the capacity to learn to produce vocalizations) is a multidimensional trait that involves different learning mechanisms during different temporal and socioecological contexts. Key outstanding questions are whether vocal production learning begins during the embryonic stage and whether mothers play an active role in this through pupil-directed vocalization behaviors. We examined variation in vocal copy similarity (an indicator of learning) in eight species from the songbird family Maluridae, using comparative and experimental approaches. We found that (1) incubating females from all species vocalized inside the nest and produced call types including a signature "B element" that was structurally similar to their nestlings' begging call; (2) in a prenatal playback experiment using superb fairy-wrens (Malurus cyaneus), embryos showed a stronger heart rate response to playbacks of the B element than to another call element (A); and (3) mothers that produced slower calls had offspring with greater similarity between their begging call and the mother's B element vocalization. We conclude that malurid mothers display behaviors concordant with pupil-directed vocalizations and may actively influence their offspring's early life through sound learning shaped by maternal call tempo.
13
Cisler JM, Dunsmoor JE, Fonzo GA, Nemeroff CB. Latent-state and model-based learning in PTSD. Trends Neurosci 2024; 47:150-162. PMID: 38212163; PMCID: PMC10923154; DOI: 10.1016/j.tins.2023.12.002.
Abstract
Post-traumatic stress disorder (PTSD) is characterized by altered emotional and behavioral responding following a traumatic event. In this article, we review the concepts of latent-state and model-based learning (i.e., learning and inferring abstract task representations) and discuss their relevance for clinical and neuroscience models of PTSD. Recent data demonstrate evidence for brain and behavioral biases in these learning processes in PTSD. These new data potentially recast excessive fear towards trauma cues as a problem in learning and updating abstract task representations, as opposed to traditional conceptualizations focused on stimulus-specific learning. Biases in latent-state and model-based learning may also be a common mechanism targeted in common therapies for PTSD. We highlight key knowledge gaps that need to be addressed to further elaborate how latent-state learning and its associated neurocircuitry mechanisms function in PTSD and how to optimize treatments to target these processes.
Affiliation(s)
- Josh M Cisler
- Department of Psychiatry and Behavioral Sciences, University of Texas at Austin, Austin, TX, USA; Institute for Early Life Adversity Research, University of Texas at Austin, Austin, TX, USA.
- Joseph E Dunsmoor
- Department of Psychiatry and Behavioral Sciences, University of Texas at Austin, Austin, TX, USA; Institute for Early Life Adversity Research, University of Texas at Austin, Austin, TX, USA
- Gregory A Fonzo
- Department of Psychiatry and Behavioral Sciences, University of Texas at Austin, Austin, TX, USA; Institute for Early Life Adversity Research, University of Texas at Austin, Austin, TX, USA
- Charles B Nemeroff
- Department of Psychiatry and Behavioral Sciences, University of Texas at Austin, Austin, TX, USA; Institute for Early Life Adversity Research, University of Texas at Austin, Austin, TX, USA
14
Hajnal MA, Tran D, Einstein M, Martelo MV, Safaryan K, Polack PO, Golshani P, Orbán G. Continuous multiplexed population representations of task context in the mouse primary visual cortex. Nat Commun 2023; 14:6687. PMID: 37865648; PMCID: PMC10590415; DOI: 10.1038/s41467-023-42441-w.
Abstract
Effective task execution requires the representation of multiple task-related variables that determine how stimuli lead to correct responses. Even the primary visual cortex (V1) represents other task-related variables such as expectations, choice, and context. However, it is unclear how V1 can flexibly accommodate these variables without interfering with visual representations. We trained mice on a context-switching cross-modal decision task, where performance depends on inferring task context. We found that the context signal that emerged in V1 was behaviorally relevant as it strongly covaried with performance, independent from movement. Importantly, this signal was integrated into V1 representation by multiplexing visual and context signals into orthogonal subspaces. In addition, auditory and choice signals were also multiplexed as these signals were orthogonal to the context representation. Thus, multiplexing allows V1 to integrate visual inputs with other sensory modalities and cognitive variables to avoid interference with the visual representation while ensuring the maintenance of task-relevant variables.
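The orthogonal-subspace idea can be illustrated in a few lines (a geometric sketch with made-up numbers, not an analysis of the paper's data): adding a context signal along a direction orthogonal to the visual direction leaves a visual readout unchanged.

```python
import numpy as np

# Multiplexing sketch: visual and context signals live along orthogonal population
# directions, so adding context activity does not disturb a visual readout.

rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(6, 2)))       # two orthonormal directions in a 6-D population
visual_axis, context_axis = basis[:, 0], basis[:, 1]

visual_only = 2.0 * visual_axis
with_context = visual_only + 1.5 * context_axis         # context added in the orthogonal subspace

print(visual_axis @ visual_only, visual_axis @ with_context)   # visual readout unchanged (both ~2.0)
```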
Affiliation(s)
- Márton Albert Hajnal
- Department of Computational Sciences, Wigner Research Center for Physics, Budapest, 1121, Hungary
- Duy Tran
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Albert Einstein College of Medicine, New York, NY, 10461, USA
- Michael Einstein
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Mauricio Vallejo Martelo
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Karen Safaryan
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Pierre-Olivier Polack
- Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, NJ, 07102, USA
- Peyman Golshani
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Integrative Center for Learning and Memory, Brain Research Institute, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- West Los Angeles VA Medical Center, CA, 90073, Los Angeles, USA.
- Gergő Orbán
- Department of Computational Sciences, Wigner Research Center for Physics, Budapest, 1121, Hungary.
15
Lamba A, Nassar MR, FeldmanHall O. Prefrontal cortex state representations shape human credit assignment. eLife 2023; 12:e84888. PMID: 37399050; PMCID: PMC10351919; DOI: 10.7554/elife.84888.
Abstract
People learn adaptively from feedback, but the rate of such learning differs drastically across individuals and contexts. Here, we examine whether this variability reflects differences in what is learned. Leveraging a neurocomputational approach that merges fMRI and an iterative reward learning task, we link the specificity of credit assignment (how well people are able to appropriately attribute outcomes to their causes) to the precision of neural codes in the prefrontal cortex (PFC). Participants credit task-relevant cues more precisely in social compared to nonsocial contexts, a process that is mediated by high-fidelity (i.e., distinct and consistent) state representations in the PFC. Specifically, the medial PFC and orbitofrontal cortex work in concert to match the neural codes from feedback to those at choice, and the strength of these common neural codes predicts credit assignment precision. Together this work provides a window into how neural representations drive adaptive learning.
Affiliation(s)
- Amrita Lamba
- Department of Cognitive Linguistic & Psychological Sciences, Brown University, Providence, United States
- Matthew R Nassar
- Department of Neuroscience, Brown University, Providence, United States
- Carney Institute of Brain Sciences, Brown University, Providence, United States
- Oriel FeldmanHall
- Department of Cognitive Linguistic & Psychological Sciences, Brown University, Providence, United States
- Carney Institute of Brain Sciences, Brown University, Providence, United States
16
Barack DL, Bakkour A, Shohamy D, Salzman CD. Visuospatial information foraging describes search behavior in learning latent environmental features. Sci Rep 2023; 13:1126. PMID: 36670132; PMCID: PMC9860038; DOI: 10.1038/s41598-023-27662-9.
Abstract
In the real world, making sequences of decisions to achieve goals often depends upon the ability to learn aspects of the environment that are not directly perceptible. Learning these so-called latent features requires seeking information about them. Prior efforts to study latent feature learning often used single decisions, used few features, and failed to distinguish between reward-seeking and information-seeking. To overcome this, we designed a task in which humans and monkeys made a series of choices to search for shapes hidden on a grid. On our task, the effects of reward and information outcomes from uncovering parts of shapes could be disentangled. Members of both species adeptly learned the shapes and preferred to select tiles expected to be informative earlier in trials than previously rewarding ones, searching a part of the grid until their outcomes dropped below the average information outcome, a pattern consistent with foraging behavior. In addition, how quickly humans learned the shapes was predicted by how well their choice sequences matched the foraging pattern, revealing an unexpected connection between foraging and learning. This adaptive search for information may underlie the ability in humans and monkeys to learn latent features to support goal-directed behavior in the long run.
Affiliation(s)
- David L Barack
- Department of Neuroscience, Columbia University, New York, USA.
- Mortimer B. Zuckerman Mind Brain and Behavior Institute, Columbia University, New York, USA.
- Akram Bakkour
- Department of Psychology, University of Chicago, Chicago, USA
- Daphna Shohamy
- Mortimer B. Zuckerman Mind Brain and Behavior Institute, Columbia University, New York, USA
- Department of Psychology, Columbia University, New York, USA
- Kavli Institute for Brain Sciences, Columbia University, New York, USA
- C Daniel Salzman
- Department of Neuroscience, Columbia University, New York, USA
- Mortimer B. Zuckerman Mind Brain and Behavior Institute, Columbia University, New York, USA
- Kavli Institute for Brain Sciences, Columbia University, New York, USA
- Department of Psychiatry, Columbia University, New York, USA
- New York State Psychiatric Institute, New York, USA
17
Gallistel CR, Latham PE. Bringing Bayes and Shannon to the Study of Behavioural and Neurobiological Timing and Associative Learning. Timing & Time Perception 2022. DOI: 10.1163/22134468-bja10069.
Abstract
Bayesian parameter estimation and Shannon’s theory of information provide tools for analysing and understanding data from behavioural and neurobiological experiments on interval timing—and from experiments on Pavlovian and operant conditioning, because timing plays a fundamental role in associative learning. In this tutorial, we explain basic concepts behind these tools and show how to apply them to estimating, on a trial-by-trial, reinforcement-by-reinforcement and response-by-response basis, important parameters of timing behaviour and of the neurobiological manifestations of timing in the brain. These tools enable quantification of relevant variables in the trade-off between acting as an ideal observer should act and acting as an ideal agent should act, which is also known as the trade-off between exploration (information gathering) and exploitation (information utilization) in reinforcement learning. They enable comparing the strength of the evidence for a measurable association to the strength of the behavioural evidence that the association has been perceived. A GitHub site and an OSF site give public access to well-documented Matlab and Python code and to raw data to which these tools have been applied.
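As a minimal illustration of the combination the tutorial describes (not the authors' code; prior pseudo-counts and outcomes are made up), the sketch below updates a Beta-Bernoulli estimate of a reinforcement probability trial by trial and scores each outcome's Shannon surprise in bits.

```python
import numpy as np

# Trial-by-trial Bayesian estimate of a reinforcement probability (Beta-Bernoulli)
# plus the Shannon surprise of each outcome, in bits. Values are illustrative.

a, b = 1.0, 1.0                              # Beta(1, 1) prior pseudo-counts
for outcome in [1, 0, 1, 1, 0, 1]:
    p = a / (a + b)                          # predictive probability of reinforcement
    surprise = -np.log2(p if outcome else 1 - p)
    a, b = a + outcome, b + (1 - outcome)
    print(f"p={p:.2f}  outcome={outcome}  surprise={surprise:.2f} bits")
```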
Affiliation(s)
- C. Randy Gallistel
- Professor Emeritus, Rutgers University, 252 7th Ave 10D, New York, NY 10001, USA
- Peter E. Latham
- Gatsby Computational Neuroscience Unit, Sainsbury Wellcome Centre for Neural Circuits and Behaviour, 25 Howland St., London W1T 4JG, UK
18
Pisupati S, Niv Y. The challenges of lifelong learning in biological and artificial systems. Trends Cogn Sci 2022; 26:1051-1053. PMID: 36335012; PMCID: PMC9676180; DOI: 10.1016/j.tics.2022.09.022.
Abstract
How do biological systems learn continuously throughout their lifespans, adapting to change while retaining old knowledge, and how can these principles be applied to artificial learning systems? In this Forum article we outline challenges and strategies of 'lifelong learning' in biological and artificial systems, and argue that a collaborative study of each system's failure modes can benefit both.
Affiliation(s)
- Sashank Pisupati
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA.
- Yael Niv
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
19
Ben-Artzi I, Luria R, Shahar N. Working memory capacity estimates moderate value learning for outcome-irrelevant features. Sci Rep 2022; 12:19677. PMID: 36385131; PMCID: PMC9669000; DOI: 10.1038/s41598-022-21832-x.
Abstract
To establish accurate action-outcome associations in the environment, individuals must refrain from assigning value to outcome-irrelevant features. However, studies have largely ignored the role of attentional control processes on action value updating. In the current study, we examined the extent to which working memory, a system that can filter and block the processing of irrelevant information in one's mind, also filters outcome-irrelevant information during value-based learning. For this aim, 174 individuals completed a well-established working memory capacity measurement and a reinforcement learning task designed to estimate outcome-irrelevant learning. We replicated previous studies showing a group-level tendency to assign value to the task's response keys, despite clear instructions and practice suggesting they are irrelevant to the prediction of monetary outcomes. Importantly, individuals with higher working memory capacity were less likely to assign value to the outcome-irrelevant response keys, thus suggesting a significant moderation effect of working memory capacity on outcome-irrelevant learning. We discuss the role of working memory processing on value-based learning through the lens of a cognitive control failure.
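The sketch below illustrates the kind of outcome-irrelevant value updating this task is designed to expose (a toy model, not the paper's fitted model; the parameter names alpha and eta are illustrative): reward prediction errors spill over onto the response key, and the spill-over weight is what effective filtering would reduce.

```python
import numpy as np

# Toy outcome-irrelevant learning: the chosen stimulus and the (reward-irrelevant)
# response key both receive value updates. eta = 0 would mean perfect filtering.
# Illustrative parameters, not the paper's fitted model.

def update(q_stim, q_key, stim, key, reward, alpha=0.3, eta=0.1):
    delta = reward - (q_stim[stim] + q_key[key])   # prediction error
    q_stim[stim] += alpha * delta
    q_key[key] += alpha * eta * delta              # unwanted credit to the response key
    return q_stim, q_key

q_stim, q_key = np.zeros(2), np.zeros(2)
q_stim, q_key = update(q_stim, q_key, stim=0, key=1, reward=1.0)
print(q_stim, q_key)
```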
Affiliation(s)
- Ido Ben-Artzi
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel.
- Roy Luria
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Nitzan Shahar
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
20
Török B, Nagy DG, Kiss M, Janacsek K, Németh D, Orbán G. Tracking the contribution of inductive bias to individualised internal models. PLoS Comput Biol 2022; 18:e1010182. PMID: 35731822; PMCID: PMC9255757; DOI: 10.1371/journal.pcbi.1010182.
Abstract
Internal models capture the regularities of the environment and are central to understanding how humans adapt to environmental statistics. In general, the correct internal model is unknown to observers; instead they rely on an approximate model that is continually adapted throughout learning. However, experimenters assume an ideal observer model, which captures stimulus structure but ignores the diverging hypotheses that humans form during learning. We combine non-parametric Bayesian methods and probabilistic programming to infer rich and dynamic individualised internal models from response times. We demonstrate that the approach is capable of characterizing the discrepancy between the internal model maintained by individuals and the ideal observer model, and of tracking the evolution of the contribution of the ideal observer model to the internal model throughout training. In particular, in an implicit visuomotor sequence learning task the identified discrepancy revealed an inductive bias that was consistent across individuals but varied in strength and persistence.
Affiliation(s)
- Balázs Török
- Department of Computational Sciences, Wigner Research Centre for Physics, Budapest, Hungary
- Department of Cognitive Science, Faculty of Natural Sciences, Budapest University of Technology and Economics, Műegyetem rkp. 3., H-1111 Budapest, Hungary
- Brain, Memory and Language Research Group, Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary
- David G. Nagy
- Department of Computational Sciences, Wigner Research Centre for Physics, Budapest, Hungary
- Institute of Physics, Eötvös Loránd University, Budapest, Hungary
- Mariann Kiss
- Department of Cognitive Science, Faculty of Natural Sciences, Budapest University of Technology and Economics, Műegyetem rkp. 3., H-1111 Budapest, Hungary
- Brain, Memory and Language Research Group, Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary
- Karolina Janacsek
- Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary
- Centre for Thinking and Learning, Institute for Lifecourse Development, School of Human Sciences, Faculty of Education, Health and Human Sciences, University of Greenwich, London, United Kingdom
- Dezső Németh
- Brain, Memory and Language Research Group, Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary
- Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary
- Lyon Neuroscience Research Center (CRNL), Université Claude Bernard Lyon 1, Lyon, France
- Gergő Orbán
- Department of Computational Sciences, Wigner Research Centre for Physics, Budapest, Hungary
21
Horing B, Büchel C. The human insula processes both modality-independent and pain-selective learning signals. PLoS Biol 2022; 20:e3001540. PMID: 35522696; PMCID: PMC9116652; DOI: 10.1371/journal.pbio.3001540.
Abstract
Prediction errors (PEs) are generated when there are differences between an expected and an actual event or sensory input. The insula is a key brain region involved in pain processing, and studies have shown that the insula encodes the magnitude of an unexpected outcome (unsigned PEs). In addition to signaling this general magnitude information, PEs can give specific information on the direction of this deviation, i.e., whether an event is better or worse than expected. It is unclear whether the unsigned PE responses in the insula are selective for pain or reflective of a more general processing of aversive events irrespective of modality. It is also unknown whether the insula can process signed PEs at all. Understanding these specific mechanisms has implications for understanding how pain is processed in the brain in both health and chronic pain conditions. In this study, 47 participants learned associations between 2 conditioned stimuli (CS) and 4 unconditioned stimuli (US; painful heat or loud sound, of one low and one high intensity each) while undergoing functional magnetic resonance imaging (fMRI) and skin conductance response (SCR) measurements. We demonstrate that activation in the anterior insula correlated with unsigned intensity PEs, irrespective of modality, indicating an unspecific aversive surprise signal. Conversely, signed intensity PE signals were modality specific, with signed PEs following pain but not sound located in the dorsal posterior insula, an area implicated in pain intensity processing. Previous studies have identified abnormal insula function and abnormal learning as potential causes of pain chronification. Our findings link these results and suggest that a misrepresentation of learning-relevant PEs in the insular cortex may serve as an underlying factor in chronic pain.
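The distinction at the heart of this design reduces to two quantities (a sketch with made-up numbers, not the study's analysis code):

```python
# Signed vs. unsigned prediction errors for an outcome x and expectation x_hat (sketch).
def signed_pe(x, x_hat):
    return x - x_hat            # direction: better or worse than expected

def unsigned_pe(x, x_hat):
    return abs(x - x_hat)       # magnitude of surprise, blind to direction

print(signed_pe(60, 48), unsigned_pe(60, 48))   # e.g., stimulus more intense than expected
print(signed_pe(40, 48), unsigned_pe(40, 48))   # less intense than expected, same magnitude
```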
Affiliation(s)
- Björn Horing
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Christian Büchel
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
22
A computational theory of the subjective experience of flow. Nat Commun 2022; 13:2252. PMID: 35474044; PMCID: PMC9042870; DOI: 10.1038/s41467-022-29742-2.
Abstract
Flow is a subjective state characterized by immersion and engagement in one's current activity. The benefits of flow for productivity and health are well-documented, but a rigorous description of the flow-generating process remains elusive. Here we develop and empirically test a theory of flow's computational substrates: the informational theory of flow. Our theory draws on the concept of mutual information, a fundamental quantity in information theory that quantifies the strength of association between two variables. We propose that the mutual information between desired end states (E) and the means of attaining them (M), I(M; E), gives rise to flow. We support our theory across five experiments (four preregistered) by showing, across multiple activities, that increasing I(M; E) increases flow and has important downstream benefits, including enhanced attention and enjoyment. We rule out alternative constructs including alternative metrics of associative strength, psychological constructs previously shown to predict flow, and various forms of instrumental value.
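To make the central quantity concrete, the sketch below computes I(M; E) from an assumed joint probability table over means (rows) and ends (columns); the tables are illustrative numbers, not the paper's data.

```python
import numpy as np

# Mutual information I(M; E) between means (rows) and desired ends (columns),
# computed from a joint probability table. Tables below are illustrative.

def mutual_information(p_joint):
    p = np.asarray(p_joint, dtype=float)
    p_m = p.sum(axis=1, keepdims=True)
    p_e = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (p_m @ p_e)[nz])))

print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # each means maps to one end: 1 bit
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # means uninformative about ends: 0 bits
```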
23
Abstract
In cognitive psychology, a recent perspective based on the notion of latent cause (LC) has offered new insight on how learning and memory work. Here I explore the implications of this novel perspective to understand posttraumatic stress disorder (PTSD). The proposal is that, because of a propensity to interpret events as manifestations of multiple LCs (a propensity facilitated by experiencing traumas in childhood), PTSD patients form an LC associated with the trauma and that this LC is responsible for typical symptoms of the illness (specifically, intrusive symptoms and associated fear). Later, after the trauma, some patients develop a second LC, now associated with the presence of trauma-related cues combined with absence of danger. Development of the latter LC would interfere with extinction and explain why, for some patients, exposure to trauma-related cues (even when supported by interventions such as exposure protocols) fails to provide much improvement. This proposal has potential clinical implications, raising the possibility that some patients might benefit from exposure to mildly painful aspects of the trauma in conjunction with trauma-related cues.
24
Lu Q, Hasson U, Norman KA. A neural network model of when to retrieve and encode episodic memories. eLife 2022; 11:e74445. PMID: 35142289; PMCID: PMC9000961; DOI: 10.7554/elife.74445.
Abstract
Recent human behavioral and neuroimaging results suggest that people are selective in when they encode and retrieve episodic memories. To explain these findings, we trained a memory-augmented neural network to use its episodic memory to support prediction of upcoming states in an environment where past situations sometimes reoccur. We found that the network learned to retrieve selectively as a function of several factors, including its uncertainty about the upcoming state. Additionally, we found that selectively encoding episodic memories at the end of an event (but not mid-event) led to better subsequent prediction performance. In all of these cases, the benefits of selective retrieval and encoding can be explained in terms of reducing the risk of retrieving irrelevant memories. Overall, these modeling results provide a resource-rational account of why episodic retrieval and encoding should be selective and lead to several testable predictions.
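One of the factors mentioned above, uncertainty-gated retrieval, can be sketched directly (illustrative threshold and probabilities; the paper's network learns such a policy rather than hard-coding it):

```python
import numpy as np

# Gate episodic retrieval on uncertainty (entropy) about the upcoming state.
# Threshold and probabilities are illustrative; the model learns such a policy.

def should_retrieve(p_next_state, entropy_threshold=1.0):
    p = np.clip(np.asarray(p_next_state, dtype=float), 1e-12, 1.0)
    entropy = -np.sum(p * np.log2(p))
    return entropy > entropy_threshold

print(should_retrieve([0.9, 0.05, 0.05]))   # confident prediction -> skip retrieval (False)
print(should_retrieve([0.4, 0.3, 0.3]))     # uncertain -> retrieve (True)
```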
Affiliation(s)
- Qihong Lu
- Department of Psychology, Princeton University, Princeton, United States
- Princeton Neuroscience Institute, Princeton University, Princeton, United States
- Uri Hasson
- Department of Psychology, Princeton University, Princeton, United States
- Princeton Neuroscience Institute, Princeton University, Princeton, United States
- Kenneth A Norman
- Department of Psychology, Princeton University, Princeton, United States
- Princeton Neuroscience Institute, Princeton University, Princeton, United States
25
Shamay-Tsoory SG, Hertz U. Adaptive Empathy: A Model for Learning Empathic Responses in Response to Feedback. Perspect Psychol Sci 2022; 17:1008-1023. PMID: 35050819; DOI: 10.1177/17456916211031926.
Abstract
Empathy is usually deployed in social interactions. Nevertheless, common measures and examinations of empathy study this construct in isolation from the person in distress. In this article we seek to extend the field of examination to include both empathizer and target to determine whether and how empathic responses are affected by feedback and learned through interaction. Building on computational approaches in feedback-based adaptations (e.g., no feedback, model-free and model-based learning), we propose a framework for understanding how empathic responses are learned on the basis of feedback. In this framework, adaptive empathy, defined as the ability to adapt one's empathic responses, is a central aspect of empathic skills and can provide a new dimension to the evaluation and investigation of empathy. By extending existing neural models of empathy, we suggest that adaptive empathy may be mediated by interactions between the neural circuits associated with valuation, shared distress, observation-execution, and mentalizing. Finally, we propose that adaptive empathy should be considered a prominent facet of empathic capabilities with the potential to explain empathic behavior in health and psychopathology.
Affiliation(s)
- Simone G Shamay-Tsoory
- Department of Psychology, University of Haifa; Integrated Brain and Behavior Research Center (IBBRC), University of Haifa
- Uri Hertz
- Integrated Brain and Behavior Research Center (IBBRC), University of Haifa; Department of Cognitive Sciences, University of Haifa
26
Subramanian A, Chitlangia S, Baths V. Reinforcement learning and its connections with neuroscience and psychology. Neural Netw 2021; 145:271-287. PMID: 34781215; DOI: 10.1016/j.neunet.2021.10.003.
Abstract
Reinforcement learning methods have recently been very successful at performing complex sequential tasks like playing Atari games, Go and Poker. These algorithms have outperformed humans in several tasks by learning from scratch, using only scalar rewards obtained through interaction with their environment. While there certainly has been considerable independent innovation to produce such results, many core ideas in reinforcement learning are inspired by phenomena in animal learning, psychology and neuroscience. In this paper, we comprehensively review a large number of findings in both neuroscience and psychology that evidence reinforcement learning as a promising candidate for modeling learning and decision making in the brain. In doing so, we construct a mapping between various classes of modern RL algorithms and specific findings in both neurophysiological and behavioral literature. We then discuss the implications of this observed relationship between RL, neuroscience and psychology and its role in advancing research in both AI and brain science.
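Much of the mapping the review describes runs through the temporal-difference (TD) prediction error. The sketch below is a standard tabular TD(0) learner on a hypothetical two-state chain (not code from the paper), included only to make explicit the error term usually compared with phasic dopamine.

```python
import numpy as np

# A tiny Markov chain: state 0 -> state 1 -> terminal, reward 1.0 at the end.
# The TD(0) error `delta` is the quantity classically compared with phasic
# dopamine responses in the literature the review maps onto RL algorithms.
n_states, alpha, gamma = 2, 0.1, 0.95
V = np.zeros(n_states)

for episode in range(200):
    # transition 0 -> 1 with no reward
    delta = 0.0 + gamma * V[1] - V[0]
    V[0] += alpha * delta
    # transition 1 -> terminal with reward 1.0 (terminal value = 0)
    delta = 1.0 + gamma * 0.0 - V[1]
    V[1] += alpha * delta

print(V)   # V[1] converges toward 1.0, V[0] toward gamma * 1.0
```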
Collapse
Affiliation(s)
- Ajay Subramanian
- Department of Psychology, New York University, New York, New York, 10003, USA; Cognitive Neuroscience Lab, BITS Pilani K K Birla Goa Campus, NH-17B, Zuarinagar, Goa, 403726, India.
| | - Sharad Chitlangia
- Amazon; Cognitive Neuroscience Lab, BITS Pilani K K Birla Goa Campus, NH-17B, Zuarinagar, Goa, 403726, India.
| | - Veeky Baths
- Cognitive Neuroscience Lab, BITS Pilani K K Birla Goa Campus, NH-17B, Zuarinagar, Goa, 403726, India; Department of Biological Sciences, BITS Pilani K K Birla Goa Campus, NH-17B, Zuarinagar, Goa, 403726, India.
| |
Collapse
|
27
|
Wang S, Feng SF, Bornstein AM. Mixing memory and desire: How memory reactivation supports deliberative decision-making. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2021; 13:e1581. [PMID: 34665529 DOI: 10.1002/wcs.1581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/07/2020] [Revised: 08/24/2021] [Accepted: 09/16/2021] [Indexed: 11/09/2022]
Abstract
Memories affect nearly every aspect of our mental life. They allow us to both resolve uncertainty in the present and to construct plans for the future. Recently, renewed interest in the role memory plays in adaptive behavior has led to new theoretical advances and empirical observations. We review key findings, with particular emphasis on how the retrieval of many kinds of memories affects deliberative action selection. These results are interpreted in a sequential inference framework, in which reinstatements from memory serve as "samples" of potential action outcomes. The resulting model suggests a central role for the dynamics of memory reactivation in determining the influence of different kinds of memory in decisions. We propose that representation-specific dynamics can implement a bottom-up "product of experts" rule that integrates multiple sets of action-outcome predictions weighted based on their uncertainty. We close by reviewing related findings and identifying areas for further research. This article is categorized under: Psychology > Reasoning and Decision Making; Neuroscience > Cognition; Neuroscience > Computation.
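As a hedged illustration of the "product of experts" integration rule mentioned above (an illustrative stand-in, not the authors' implementation), the snippet below combines Gaussian action-outcome predictions from different memory sources by weighting each with its precision; the means, variances, and source labels are hypothetical.

```python
import numpy as np

def product_of_experts(means, variances):
    """Combine Gaussian action-outcome predictions from different memory
    sources; each expert is weighted by its precision (inverse variance)."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    combined_precision = precisions.sum()
    combined_mean = float((precisions * means).sum() / combined_precision)
    return combined_mean, 1.0 / combined_precision

# hypothetical predictions for one action's outcome:
# a precise recent episode vs. a vaguer semantic/statistical estimate
mean, var = product_of_experts(means=[2.0, 0.5], variances=[0.2, 1.0])
print(mean, var)   # combined estimate is pulled toward the more certain prediction
```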
Collapse
Affiliation(s)
- Shaoming Wang
- Department of Psychology, New York University, New York, New York, USA
| | - Samuel F Feng
- Department of Mathematics, Khalifa University of Science and Technology, Abu Dhabi, UAE; Khalifa University Centre for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, UAE
| | - Aaron M Bornstein
- Department of Cognitive Sciences, University of California-Irvine, Irvine, California, USA; Center for the Neurobiology of Learning & Memory, University of California-Irvine, Irvine, California, USA; Institute for Mathematical Behavioral Sciences, University of California-Irvine, Irvine, California, USA
| |
Collapse
|
28
|
Hamid AA. Dopaminergic specializations for flexible behavioral control: linking levels of analysis and functional architectures. Curr Opin Behav Sci 2021. [DOI: 10.1016/j.cobeha.2021.07.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
|
29
|
Knudsen EB, Wallis JD. Hippocampal neurons construct a map of an abstract value space. Cell 2021; 184:4640-4650.e10. [PMID: 34348112 DOI: 10.1016/j.cell.2021.07.010] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2020] [Revised: 05/11/2021] [Accepted: 07/08/2021] [Indexed: 02/06/2023]
Abstract
The hippocampus is thought to encode a "cognitive map," a structural organization of knowledge about relationships in the world. Place cells, spatially selective hippocampal neurons that have been extensively studied in rodents, are one component of this map, describing the relative position of environmental features. However, whether this map extends to abstract, cognitive information remains unknown. Using the relative reward value of cues to define continuous "paths" through an abstract value space, we show that single neurons in primate hippocampus encode this space through value place fields, much like a rodent's place neurons encode paths through physical space. Value place fields remapped when cues changed but also became increasingly correlated across contexts, allowing maps to become generalized. Our findings help explain the critical contribution of the hippocampus to value-based decision-making, providing a mechanism by which knowledge of relationships in the world can be incorporated into reward predictions for guiding decisions.
Collapse
Affiliation(s)
- Eric B Knudsen
- Helen Wills Neuroscience Institute, University of California at Berkeley, Berkeley, CA, USA.
| | - Joni D Wallis
- Helen Wills Neuroscience Institute, University of California at Berkeley, Berkeley, CA, USA; Department of Psychology, University of California at Berkeley, Berkeley, CA, USA
| |
Collapse
|
30
|
Hamid AA, Frank MJ, Moore CI. Wave-like dopamine dynamics as a mechanism for spatiotemporal credit assignment. Cell 2021; 184:2733-2749.e16. [PMID: 33861952 PMCID: PMC8122079 DOI: 10.1016/j.cell.2021.03.046] [Citation(s) in RCA: 91] [Impact Index Per Article: 22.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2019] [Revised: 12/31/2020] [Accepted: 03/23/2021] [Indexed: 12/17/2022]
Abstract
Significant evidence supports the view that dopamine shapes learning by encoding reward prediction errors. However, it is unknown whether striatal targets receive tailored dopamine dynamics based on regional functional specialization. Here, we report wave-like spatiotemporal activity patterns in dopamine axons and release across the dorsal striatum. These waves switch between activational motifs and organize dopamine transients into localized clusters within functionally related striatal subregions. Notably, wave trajectories were tailored to task demands, propagating from dorsomedial to dorsolateral striatum when rewards are contingent on animal behavior and in the opponent direction when rewards are independent of behavioral responses. We propose a computational architecture in which striatal dopamine waves are sculpted by inference about agency and provide a mechanism to direct credit assignment to specialized striatal subregions. Supporting model predictions, dorsomedial dopamine activity during reward-pursuit signaled the extent of instrumental control and interacted with reward waves to predict future behavioral adjustments.
Collapse
Affiliation(s)
- Arif A Hamid
- Department of Neuroscience, Brown University, Providence, RI 02912, USA; Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA.
| | - Michael J Frank
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, Providence, RI 02912, USA; Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA.
| | - Christopher I Moore
- Department of Neuroscience, Brown University, Providence, RI 02912, USA; Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA.
| |
Collapse
|
31
|
Smith R, Moutoussis M, Bilek E. Simulating the computational mechanisms of cognitive and behavioral psychotherapeutic interventions: insights from active inference. Sci Rep 2021; 11:10128. [PMID: 33980875 PMCID: PMC8115057 DOI: 10.1038/s41598-021-89047-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2020] [Accepted: 04/15/2021] [Indexed: 11/08/2022] Open
Abstract
Cognitive-behavioral therapy (CBT) leverages interactions between thoughts, feelings, and behaviors. To deepen understanding of these interactions, we present a computational (active inference) model of CBT that allows formal simulations of interactions between cognitive interventions (i.e., cognitive restructuring) and behavioral interventions (i.e., exposure) in producing adaptive behavior change (i.e., reducing maladaptive avoidance behavior). Using spider phobia as a concrete example of maladaptive avoidance more generally, we show simulations indicating that when conscious beliefs about safety/danger have strong interactions with affective/behavioral outcomes, behavioral change during exposure therapy is mediated by changes in these beliefs, preventing generalization. In contrast, when these interactions are weakened, and cognitive restructuring only induces belief uncertainty (as opposed to strong safety beliefs), behavior change leads to generalized learning (i.e., "over-writing" the implicit beliefs about action-outcome mappings that directly produce avoidance). The individual is therefore equipped to face any new context, safe or dangerous, remaining in a content state without the need for avoidance behavior-increasing resilience from a CBT perspective. These results show how the same changes in behavior during CBT can be due to distinct underlying mechanisms; they predict lower rates of relapse when cognitive interventions focus on inducing uncertainty and on reducing the effects of automatic negative thoughts on behavior.
Collapse
Affiliation(s)
- Ryan Smith
- Laureate Institute for Brain Research, 6655 S Yale Ave, Tulsa, OK, 74136, USA.
| | - Michael Moutoussis
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London, UK
- The Max Planck-University College London Centre for Computational Psychiatry and Ageing, London, UK
| | - Edda Bilek
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London, UK
| |
Collapse
|
32
|
The self in context: brain systems linking mental and physical health. Nat Rev Neurosci 2021; 22:309-322. [PMID: 33790441 PMCID: PMC8447265 DOI: 10.1038/s41583-021-00446-8] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/22/2021] [Indexed: 02/01/2023]
Abstract
Increasing evidence suggests that mental health and physical health are linked by neural systems that jointly regulate somatic physiology and high-level cognition. Key systems include the ventromedial prefrontal cortex and the related default-mode network. These systems help to construct models of the 'self-in-context', compressing information across time and sensory modalities into conceptions of the underlying causes of experience. Self-in-context models endow events with personal meaning and allow predictive control over behaviour and peripheral physiology, including autonomic, neuroendocrine and immune function. They guide learning from experience and the formation of narratives about the self and one's world. Disorders of mental and physical health, especially those with high co-occurrence and convergent alterations in the functionality of the ventromedial prefrontal cortex and the default-mode network, could benefit from interventions focused on understanding and shaping mindsets and beliefs about the self, illness and treatment.
Collapse
|
33
|
Goldfarb EV, Blow T, Dunsmoor JE, Phelps EA. Elemental and configural threat learning bias extinction generalization. Neurobiol Learn Mem 2021; 180:107405. [PMID: 33609739 PMCID: PMC8076085 DOI: 10.1016/j.nlm.2021.107405] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2020] [Revised: 02/08/2021] [Accepted: 02/11/2021] [Indexed: 11/26/2022]
Abstract
Emotional experiences often contain a multitude of details that may be represented in memory as individual elements or integrated into a single representation. How details associated with a negative emotional event are represented in memory can have important implications for extinction strategies designed to reduce emotional responses. For example, is extinguishing one cue associated with an aversive outcome sufficient to reduce learned behavior to other cues present at the time of learning that were not directly extinguished? Here, we used a between-subjects multi-day threat conditioning and extinction task to assess whether participants generalize extinction from one cue to unextinguished cues. On Day 1, one group of participants learned that a compound conditioned stimulus, composed of a tone and colored square, predicted an uncomfortable shock to the wrist (Compound group). A second group learned that the tone and square separately predicted shock (Separate group). On Day 2, participants in both groups were exposed to the tone in the absence of shocks (cue extinction). On Day 3, we tested whether extinction generalized from the extinguished to the unextinguished cue, as well as to a compound composed of both cues. Results showed that configural and elemental learning had unique and opposite effects on extinction generalization. Subjects who initially learned that a compound cue predicted shock successfully generalized extinction learning from the tone to the square, but exhibited threat relapse to the compound cue. In contrast, subjects who initially learned that each cue individually predicted shock did not generalize extinction learning from the tone to the square, but threat responses to the compound were low. These results highlight the importance of whether details of an aversive event are represented as integrated or separated memories, as these representations affect the success or limits of extinction generalization.
Collapse
Affiliation(s)
| | - Tahj Blow
- Weill Cornell College of Medicine, New York, NY, USA
| | - Joseph E Dunsmoor
- Department of Psychiatry, University of Texas at Austin, Austin, TX, USA
| | | |
Collapse
|
34
|
Abstract
The central theme of this review is the dynamic interaction between information selection and learning. We pose a fundamental question about this interaction: How do we learn what features of our experiences are worth learning about? In humans, this process depends on attention and memory, two cognitive functions that together constrain representations of the world to features that are relevant for goal attainment. Recent evidence suggests that the representations shaped by attention and memory are themselves inferred from experience with each task. We review this evidence and place it in the context of work that has explicitly characterized representation learning as statistical inference. We discuss how inference can be scaled to real-world decisions by approximating beliefs based on a small number of experiences. Finally, we highlight some implications of this inference process for human decision-making in social environments.
Collapse
Affiliation(s)
- Angela Radulescu
- Department of Psychology, Princeton University, Princeton, New Jersey 08544, USA; Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey 08544, USA
| | - Yeon Soon Shin
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey 08544, USA
| | - Yael Niv
- Department of Psychology, Princeton University, Princeton, New Jersey 08544, USA; Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey 08544, USA
| |
Collapse
|
35
|
Norbury A, Brinkman H, Kowalchyk M, Monti E, Pietrzak RH, Schiller D, Feder A. Latent cause inference during extinction learning in trauma-exposed individuals with and without PTSD. Psychol Med 2021; 52:1-12. [PMID: 33682653 DOI: 10.1017/s0033291721000647] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
BACKGROUND Problems in learning that sights, sounds, or situations that were once associated with danger have become safe (extinction learning) may explain why some individuals suffer prolonged psychological distress following traumatic experiences. Although simple learning models have been unable to provide a convincing account of why this learning fails, it has recently been proposed that this may be explained by individual differences in beliefs about the causal structure of the environment. METHODS Here, we tested two competing hypotheses as to how differences in causal inference might be related to trauma-related psychopathology, using extinction learning data collected from clinically well-characterised individuals with varying degrees of post-traumatic stress (N = 56). Model parameters describing individual differences in causal inference were related to multiple post-traumatic stress disorder (PTSD) and depression symptom dimensions via network analysis. RESULTS Individuals with more severe PTSD were more likely to assign observations from conditioning and extinction stages to a single underlying cause. Specifically, greater re-experiencing symptom severity was associated with a lower likelihood of inferring that multiple causes were active in the environment. CONCLUSIONS We interpret these results as providing evidence of a primary deficit in discriminative learning in participants with more severe PTSD. Specifically, a tendency to attribute a greater diversity of stimulus configurations to the same underlying cause resulted in greater uncertainty about stimulus-outcome associations, impeding learning both that certain stimuli were safe, and that certain stimuli were no longer dangerous. In the future, better understanding of the role of causal inference in trauma-related psychopathology may help refine cognitive therapies for these disorders.
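The model class used in this literature (latent-cause frameworks in the spirit of Gershman and colleagues) assigns each trial to an old or a new latent cause under a Chinese-restaurant-process prior. The sketch below shows how a single concentration parameter, here called alpha, controls the tendency to infer new causes, the kind of individual-difference parameter related to symptom severity above; the likelihood values and counts are invented for illustration and this is not the fitted model from the study.

```python
import numpy as np

def assign_latent_cause(likelihoods_per_cause, cause_counts, alpha):
    """Posterior over 'which latent cause generated this trial?'
    CRP prior: existing cause k gets weight N_k, a brand-new cause gets weight alpha.
    A small alpha pushes all trials (conditioning and extinction alike) onto a
    single cause, the pattern associated above with more severe PTSD."""
    prior = np.append(np.array(cause_counts, dtype=float), alpha)
    post = prior * np.asarray(likelihoods_per_cause, dtype=float)
    return post / post.sum()

# one extinction trial: the observation fits the old 'danger' cause poorly (0.1)
# and a potential new 'safety' cause well (0.8) -- hypothetical likelihoods
for alpha in (0.1, 2.0):
    post = assign_latent_cause(likelihoods_per_cause=[0.1, 0.8],
                               cause_counts=[20], alpha=alpha)
    print(f"alpha={alpha}: P(old cause)={post[0]:.2f}, P(new cause)={post[1]:.2f}")
```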
Collapse
Affiliation(s)
- Agnes Norbury
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Hannah Brinkman
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Mary Kowalchyk
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Elisa Monti
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Robert H Pietrzak
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- United States Department of Veterans Affairs, National Center for Posttraumatic Stress Disorder, Clinical Neurosciences Division, VA Connecticut Healthcare System, West Haven, CT, USA
| | - Daniela Schiller
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Adriana Feder
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
Collapse
|
36
|
Baram AB, Muller TH, Nili H, Garvert MM, Behrens TEJ. Entorhinal and ventromedial prefrontal cortices abstract and generalize the structure of reinforcement learning problems. Neuron 2021; 109:713-723.e7. [PMID: 33357385 PMCID: PMC7889496 DOI: 10.1016/j.neuron.2020.11.024] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2020] [Revised: 10/09/2020] [Accepted: 11/19/2020] [Indexed: 11/25/2022]
Abstract
Knowledge of the structure of a problem, such as relationships between stimuli, enables rapid learning and flexible inference. Humans and other animals can abstract this structural knowledge and generalize it to solve new problems. For example, in spatial reasoning, shortest-path inferences are immediate in new environments. Spatial structural transfer is mediated by cells in entorhinal and (in humans) medial prefrontal cortices, which maintain their co-activation structure across different environments and behavioral states. Here, using fMRI, we show that entorhinal and ventromedial prefrontal cortex (vmPFC) representations perform a much broader role in generalizing the structure of problems. We introduce a task-remapping paradigm, where subjects solve multiple reinforcement learning (RL) problems differing in structural or sensory properties. We show that, as with space, entorhinal representations are preserved across different RL problems only if task structure is preserved. In vmPFC and ventral striatum, representations of prediction error also depend on task structure.
Collapse
Affiliation(s)
- Alon Boaz Baram
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK.
| | - Timothy Howard Muller
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
| | - Hamed Nili
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
| | - Mona Maria Garvert
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK; Max-Planck-Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103, Leipzig, Germany
| | - Timothy Edward John Behrens
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK; Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3AR, UK
| |
Collapse
|
37
|
Gijsen S, Grundei M, Lange RT, Ostwald D, Blankenburg F. Neural surprise in somatosensory Bayesian learning. PLoS Comput Biol 2021; 17:e1008068. [PMID: 33529181 PMCID: PMC7880500 DOI: 10.1371/journal.pcbi.1008068] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2020] [Revised: 02/12/2021] [Accepted: 12/18/2020] [Indexed: 02/08/2023] Open
Abstract
Tracking statistical regularities of the environment is important for shaping human behavior and perception. Evidence suggests that the brain learns environmental dependencies using Bayesian principles. However, much remains unknown about the employed algorithms, for somesthesis in particular. Here, we describe the cortical dynamics of the somatosensory learning system to investigate both the form of the generative model as well as its neural surprise signatures. Specifically, we recorded EEG data from 40 participants subjected to a somatosensory roving-stimulus paradigm and performed single-trial modeling across peri-stimulus time in both sensor and source space. Our Bayesian model selection procedure indicates that evoked potentials are best described by a non-hierarchical learning model that tracks transitions between observations using leaky integration. From around 70ms post-stimulus onset, secondary somatosensory cortices are found to represent confidence-corrected surprise as a measure of model inadequacy. Indications of Bayesian surprise encoding, reflecting model updating, are found in primary somatosensory cortex from around 140ms. This dissociation is compatible with the idea that early surprise signals may control subsequent model update rates. In sum, our findings support the hypothesis that early somatosensory processing reflects Bayesian perceptual learning and contribute to an understanding of its underlying mechanisms. Our environment features statistical regularities, such as a drop of rain predicting imminent rainfall. Despite the importance for behavior and survival, much remains unknown about how these dependencies are learned, particularly for somatosensation. As surprise signalling about novel observations indicates a mismatch between one’s beliefs and the world, it has been hypothesized that surprise computation plays an important role in perceptual learning. By analyzing EEG data from human participants receiving sequences of tactile stimulation, we compare different formulations of surprise and investigate the employed underlying learning model. Our results indicate that the brain estimates transitions between observations. Furthermore, we identified different signatures of surprise computation and thereby provide a dissociation of the neural correlates of belief inadequacy and belief updating. Specifically, early surprise responses from around 70ms were found to signal the need for changes to the model, with encoding of its subsequent updating occurring from around 140ms. These results provide insights into how somatosensory surprise signals may contribute to the learning of environmental statistics.
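A hedged sketch of the general model family compared in this study: an observer that tracks stimulus transition probabilities with leaky (exponentially discounted) counts, emitting a predictive-surprise (model inadequacy) signal and a belief-updating signal on each trial. For simplicity the updating signal below is a KL divergence between predictive distributions rather than the full Dirichlet-based Bayesian surprise used in the paper; the leak factor and stimulus sequence are made up.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two categorical distributions."""
    p, q = np.clip(p, 1e-12, 1.0), np.clip(q, 1e-12, 1.0)
    return float(np.sum(p * np.log(p / q)))

n_stim, leak = 2, 0.9                      # two tactile stimuli, leaky integration factor
counts = np.ones((n_stim, n_stim))         # leaky transition counts (from -> to)
sequence = [0, 0, 0, 1, 0, 0, 1, 1, 0, 0]  # hypothetical roving-stimulus sequence

prev = sequence[0]
for obs in sequence[1:]:
    prior_pred = counts[prev] / counts[prev].sum()
    predictive_surprise = -np.log(prior_pred[obs])   # model-inadequacy signal
    counts *= leak                                    # forget old evidence
    counts[prev, obs] += 1.0                          # incorporate the new transition
    post_pred = counts[prev] / counts[prev].sum()
    updating_signal = kl(post_pred, prior_pred)       # belief-updating signal
    print(f"obs={obs}  predictive={predictive_surprise:.3f}  updating={updating_signal:.3f}")
    prev = obs
```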
Collapse
Affiliation(s)
- Sam Gijsen
- Neurocomputation and Neuroimaging Unit, Freie Universität Berlin, Germany
- Humboldt-Universität zu Berlin, Faculty of Philosophy, Berlin School of Mind and Brain, Berlin, Germany
| | - Miro Grundei
- Neurocomputation and Neuroimaging Unit, Freie Universität Berlin, Germany
- Humboldt-Universität zu Berlin, Faculty of Philosophy, Berlin School of Mind and Brain, Berlin, Germany
| | - Robert T. Lange
- Berlin Institute of Technology, Berlin, Germany
- Einstein Center for Neurosciences, Berlin, Germany
| | - Dirk Ostwald
- Computational Cognitive Neuroscience, Freie Universität Berlin, Germany
| | - Felix Blankenburg
- Neurocomputation and Neuroimaging Unit, Freie Universität Berlin, Germany
| |
Collapse
|
38
|
Walther T, Diekmann N, Vijayabaskaran S, Donoso JR, Manahan-Vaughan D, Wiskott L, Cheng S. Context-dependent extinction learning emerging from raw sensory inputs: a reinforcement learning approach. Sci Rep 2021; 11:2713. [PMID: 33526840 PMCID: PMC7851139 DOI: 10.1038/s41598-021-81157-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2020] [Accepted: 12/08/2020] [Indexed: 11/09/2022] Open
Abstract
The context-dependence of extinction learning has been well studied and requires the hippocampus. However, the underlying neural mechanisms are still poorly understood. Using memory-driven reinforcement learning and deep neural networks, we developed a model that learns to navigate autonomously in biologically realistic virtual reality environments based on raw camera inputs alone. Neither is context represented explicitly in our model, nor is context change signaled. We find that memory-intact agents learn distinct context representations, and develop ABA renewal, whereas memory-impaired agents do not. These findings reproduce the behavior of control and hippocampal animals, respectively. We therefore propose that the role of the hippocampus in the context-dependence of extinction learning might stem from its function in episodic-like memory and not in context-representation per se. We conclude that context-dependence can emerge from raw visual inputs.
Collapse
Affiliation(s)
- Thomas Walther
- Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany
| | - Nicolas Diekmann
- Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany
| | | | - José R Donoso
- Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany
| | | | - Laurenz Wiskott
- Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany
| | - Sen Cheng
- Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany.
| |
Collapse
|
39
|
Levy I, Schiller D. Neural Computations of Threat. Trends Cogn Sci 2021; 25:151-171. [PMID: 33384214 PMCID: PMC8084636 DOI: 10.1016/j.tics.2020.11.007] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2020] [Revised: 11/16/2020] [Accepted: 11/18/2020] [Indexed: 12/26/2022]
Abstract
A host of learning, memory, and decision-making processes form the individual's response to threat and may be disrupted in anxiety and post-trauma psychopathology. Here we review the neural computations of threat, from the first encounter with a dangerous situation, through learning, storing, and updating cues that predict it, to making decisions about the optimal course of action. The overview highlights the interconnected nature of these processes and their reliance on shared neural and computational mechanisms. We propose an integrative approach to the study of threat-related processes, in which specific computations are studied across the various stages of threat experience rather than in isolation. This approach can generate new insights about the evolution, diagnosis, and treatment of threat-related psychopathology.
Collapse
Affiliation(s)
- Ifat Levy
- Departments of Comparative Medicine, Neuroscience, and Psychology, Yale University, New Haven, CT, USA.
| | - Daniela Schiller
- Department of Psychiatry, Department of Neuroscience, and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| |
Collapse
|
40
|
Sajid N, Ball PJ, Parr T, Friston KJ. Active Inference: Demystified and Compared. Neural Comput 2021; 33:674-712. [PMID: 33400903 DOI: 10.1162/neco_a_01357] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Active inference is a first principle account of how autonomous agents operate in dynamic, nonstationary environments. This problem is also considered in reinforcement learning, but limited work exists on comparing the two approaches on the same discrete-state environments. In this letter, we provide (1) an accessible overview of the discrete-state formulation of active inference, highlighting natural behaviors in active inference that are generally engineered in reinforcement learning, and (2) an explicit discrete-state comparison between active inference and reinforcement learning on an OpenAI gym baseline. We begin by providing a condensed overview of the active inference literature, in particular viewing the various natural behaviors of active inference agents through the lens of reinforcement learning. We show that by operating in a pure belief-based setting, active inference agents can carry out epistemic exploration-and account for uncertainty about their environment-in a Bayes-optimal fashion. Furthermore, we show that the reliance on an explicit reward signal in reinforcement learning is removed in active inference, where reward can simply be treated as another observation we have a preference over; even in the total absence of rewards, agent behaviors are learned through preference learning. We make these properties explicit by showing two scenarios in which active inference agents can infer behaviors in reward-free environments compared to both Q-learning and Bayesian model-based reinforcement learning agents and by placing zero prior preferences over rewards and learning the prior preferences over the observations corresponding to reward. We conclude by noting that this formalism can be applied to more complex settings (e.g., robotic arm movement, Atari games) if appropriate generative models can be formulated. In short, we aim to demystify the behavior of active inference agents by presenting an accessible discrete state-space and time formulation and demonstrate these behaviors in a OpenAI gym environment, alongside reinforcement learning agents.
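To make the contrast concrete, the sketch below scores two candidate actions by a one-step caricature of (negative) expected free energy: an extrinsic term given by prior preferences over observations (playing the role of reward) plus an epistemic term given by expected information gain about a hidden state. This is a simplified illustration under assumed likelihoods and preferences, not the discrete-state machinery or the OpenAI gym comparison from the letter.

```python
import numpy as np

def neg_expected_free_energy(q_s, lik, log_pref):
    """One-step (negative) expected free energy for a single action.
    q_s: prior belief over hidden states; lik[o, s] = p(o | s, action);
    log_pref[o]: log prior preference over observations (reward-like term)."""
    p_o = lik @ q_s                                    # predicted observation probabilities
    extrinsic = float(p_o @ log_pref)                  # expected preference satisfaction
    epistemic = 0.0                                    # expected information gain about s
    for o, po in enumerate(p_o):
        if po < 1e-12:
            continue
        q_post = lik[o] * q_s
        q_post /= q_post.sum()
        epistemic += po * np.sum(q_post * np.log(np.clip(q_post, 1e-12, 1) /
                                                 np.clip(q_s, 1e-12, 1)))
    return extrinsic + epistemic

q_s = np.array([0.5, 0.5])                 # uncertain about the hidden context
log_pref = np.log(np.array([0.8, 0.2]))    # prefers observation 0

# action A: informative cue (observation reveals the state), neutral on preference
lik_A = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
# action B: uninformative but mildly preference-satisfying observations
lik_B = np.array([[0.6, 0.6],
                  [0.4, 0.4]])

for name, lik in (("A (epistemic)", lik_A), ("B (exploitative)", lik_B)):
    print(name, round(neg_expected_free_energy(q_s, lik, log_pref), 3))
```

With the hypothetical numbers above, the epistemic action wins while the agent is uncertain about the hidden state, illustrating how exploration falls out of the objective without a separate exploration bonus.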
Collapse
Affiliation(s)
- Noor Sajid
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, U.K.
| | - Philip J Ball
- Machine Learning Research Group, Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, U.K.
| | - Thomas Parr
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, U.K.
| | - Karl J Friston
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, U.K.
| |
Collapse
|
41
|
Collins AGE, Cockburn J. Beyond dichotomies in reinforcement learning. Nat Rev Neurosci 2020; 21:576-586. [PMID: 32873936 DOI: 10.1038/s41583-020-0355-6] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/20/2020] [Indexed: 11/09/2022]
Abstract
Reinforcement learning (RL) is a framework of particular importance to psychology, neuroscience and machine learning. Interactions between these fields, as promoted through the common hub of RL, have facilitated paradigm shifts that relate multiple levels of analysis in a singular framework (for example, relating dopamine function to a computationally defined RL signal). Recently, more sophisticated RL algorithms have been proposed to better account for human learning, and in particular its oft-documented reliance on two separable systems: a model-based (MB) system and a model-free (MF) system. However, along with many benefits, this dichotomous lens can distort questions, and may contribute to an unnecessarily narrow perspective on learning and decision-making. Here, we outline some of the consequences that come from overconfidently mapping algorithms, such as MB versus MF RL, with putative cognitive processes. We argue that the field is well positioned to move beyond simplistic dichotomies, and we propose a means of refocusing research questions towards the rich and complex components that comprise learning and decision-making.
Collapse
Affiliation(s)
- Anne G E Collins
- Department of Psychology and the Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA.
| | - Jeffrey Cockburn
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
| |
Collapse
|
42
|
Sanders H, Wilson MA, Gershman SJ. Hippocampal remapping as hidden state inference. eLife 2020; 9:51140. [PMID: 32515352 PMCID: PMC7282808 DOI: 10.7554/elife.51140] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2019] [Accepted: 05/09/2020] [Indexed: 11/13/2022] Open
Abstract
Cells in the hippocampus tuned to spatial location (place cells) typically change their tuning when an animal changes context, a phenomenon known as remapping. A fundamental challenge to understanding remapping is the fact that what counts as a "context change" has never been precisely defined. Furthermore, different remapping phenomena have been classified on the basis of how much the tuning changes after different types and degrees of context change, but the relationship between these variables is not clear. We address these ambiguities by formalizing remapping in terms of hidden state inference. According to this view, remapping does not directly reflect objective, observable properties of the environment, but rather subjective beliefs about the hidden state of the environment. We show how the hidden state framework can resolve a number of puzzles about the nature of remapping.
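A minimal, hedged rendering of the framework's core move (not the authors' code): maintain a posterior over discrete hidden contexts, update it with each observation, and switch ("remap") the active map when the posterior favors a different context. The likelihoods below are hypothetical; the point is that remapping tracks beliefs rather than the raw size of the environmental change.

```python
import numpy as np

def update_context_belief(belief, likelihood_per_context):
    """Bayesian update of the belief over hidden contexts given one observation."""
    post = belief * np.asarray(likelihood_per_context, dtype=float)
    return post / post.sum()

belief = np.array([0.95, 0.05])   # prior: almost surely still in context 0
# a run of observations that are moderately more likely under context 1
# (e.g., after a change in wall colour) -- likelihood values are hypothetical
for t in range(6):
    belief = update_context_belief(belief, likelihood_per_context=[0.3, 0.7])
    active_map = int(np.argmax(belief))
    print(f"t={t}  P(context 1)={belief[1]:.2f}  active map={active_map}")
```

Because the switch depends on accumulated evidence, the same physical manipulation can produce gradual, abrupt, or no remapping depending on the animal's prior beliefs.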
Collapse
Affiliation(s)
- Honi Sanders
- Center for Brains Minds and Machines, Harvard University, Cambridge, United States; Picower Institute for Learning and Memory and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, United States
| | - Matthew A Wilson
- Center for Brains Minds and Machines, Harvard University, Cambridge, United States; Picower Institute for Learning and Memory and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, United States
| | - Samuel J Gershman
- Center for Brains Minds and Machines, Harvard University, Cambridge, United States; Department of Psychology, Harvard University, Cambridge, United States
| |
Collapse
|
43
|
Sathiyakumar S, Skromne Carrasco S, Saad L, Richards BA. Systems consolidation impairs behavioral flexibility. Learn Mem 2020; 27:201-208. [PMID: 32295840 PMCID: PMC7164516 DOI: 10.1101/lm.051243.119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2019] [Accepted: 02/11/2020] [Indexed: 10/26/2022]
Abstract
Behavioral flexibility is important in a changing environment. Previous research suggests that systems consolidation, a long-term poststorage process that alters memory traces, may reduce behavioral flexibility. However, exactly how systems consolidation affects flexibility is unknown. Here, we tested how systems consolidation affects: (1) flexibility in response to value changes and (2) flexibility in response to changes in the optimal sequence of actions. Mice were trained to obtain food rewards in a Y-maze by switching nose pokes between three arms. During initial training, all arms were rewarded and mice simply had to switch arms in order to maximize rewards. Then, after either a 1 or 28 d delay, we either devalued one arm, or we reinforced a specific sequence of pokes. We found that after a 1 d delay mice adapted relatively easily to the changes. In contrast, mice given a 28 d delay struggled to adapt, especially for changes to the optimal sequence of actions. Immediate early gene imaging suggested that the 28 d mice were less reliant on their hippocampus and more reliant on their medial prefrontal cortex. These data suggest that systems consolidation reduces behavioral flexibility, particularly for changes to the optimal sequence of actions.
Collapse
Affiliation(s)
- Sankirthana Sathiyakumar
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada; Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario M5S 3G5, Canada
| | - Sofia Skromne Carrasco
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada
| | - Lydia Saad
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada
| | - Blake A Richards
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada; Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario M5S 3G5, Canada; Mila, Montréal, Quebec H2S 3H1, Canada; Department of Neurology and Neurosurgery, McGill University, Montréal, Quebec H3A 2B4, Canada; School of Computer Science, McGill University, Montréal, Quebec H3A 2A7, Canada
| |
Collapse
|
44
|
Tomov MS, Yagati S, Kumar A, Yang W, Gershman SJ. Discovery of hierarchical representations for efficient planning. PLoS Comput Biol 2020; 16:e1007594. [PMID: 32251444 PMCID: PMC7162548 DOI: 10.1371/journal.pcbi.1007594] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2019] [Revised: 04/16/2020] [Accepted: 12/10/2019] [Indexed: 12/12/2022] Open
Abstract
We propose that humans spontaneously organize environments into clusters of states that support hierarchical planning, enabling them to tackle challenging problems by breaking them down into sub-problems at various levels of abstraction. People constantly rely on such hierarchical representations to accomplish tasks big and small-from planning one's day, to organizing a wedding, to getting a PhD-often succeeding on the very first attempt. We formalize a Bayesian model of hierarchy discovery that explains how humans discover such useful abstractions. Building on principles developed in structure learning and robotics, the model predicts that hierarchy discovery should be sensitive to the topological structure, reward distribution, and distribution of tasks in the environment. In five simulations, we show that the model accounts for previously reported effects of environment structure on planning behavior, such as detection of bottleneck states and transitions. We then test the novel predictions of the model in eight behavioral experiments, demonstrating how the distribution of tasks and rewards can influence planning behavior via the discovered hierarchy, sometimes facilitating and sometimes hindering performance. We find evidence that the hierarchy discovery process unfolds incrementally across trials. Finally, we propose how hierarchy discovery and hierarchical planning might be implemented in the brain. Together, these findings present an important advance in our understanding of how the brain might use Bayesian inference to discover and exploit the hidden hierarchical structure of the environment.
Collapse
Affiliation(s)
- Momchil S. Tomov
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
| | - Samyukta Yagati
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Agni Kumar
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Wanqian Yang
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
| | - Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
| |
Collapse
|
45
|
Nussenbaum K, Hartley CA. Reinforcement learning across development: What insights can we draw from a decade of research? Dev Cogn Neurosci 2019; 40:100733. [PMID: 31770715 PMCID: PMC6974916 DOI: 10.1016/j.dcn.2019.100733] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2019] [Revised: 10/24/2019] [Accepted: 11/04/2019] [Indexed: 01/02/2023] Open
Abstract
The past decade has seen the emergence of the use of reinforcement learning models to study developmental change in value-based learning. It is unclear, however, whether these computational modeling studies, which have employed a wide variety of tasks and model variants, have reached convergent conclusions. In this review, we examine whether the tuning of model parameters that govern different aspects of learning and decision-making processes vary consistently as a function of age, and what neurocognitive developmental changes may account for differences in these parameter estimates across development. We explore whether patterns of developmental change in these estimates are better described by differences in the extent to which individuals adapt their learning processes to the statistics of different environments, or by more static learning biases that emerge across varied contexts. We focus specifically on learning rates and inverse temperature parameter estimates, and find evidence that from childhood to adulthood, individuals become better at optimally weighting recent outcomes during learning across diverse contexts and less exploratory in their value-based decision-making. We provide recommendations for how these two possibilities - and potential alternative accounts - can be tested more directly to build a cohesive body of research that yields greater insight into the development of core learning processes.
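The two parameter families discussed above belong to the standard delta-rule-plus-softmax model; as a hedged reference point (the bandit task, reward probabilities, and parameter values below are hypothetical), this sketch shows where the learning rate alpha and the inverse temperature beta enter such a model.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(q, beta):
    """Choice probabilities; beta is the inverse temperature
    (higher beta means less exploratory choices)."""
    z = beta * (q - q.max())
    p = np.exp(z)
    return p / p.sum()

def run_agent(alpha, beta, reward_probs=(0.8, 0.2), n_trials=200):
    """Delta-rule learner on a two-armed bandit; alpha is the learning rate."""
    q = np.zeros(2)
    best_choices = 0
    for _ in range(n_trials):
        choice = rng.choice(2, p=softmax(q, beta))
        reward = float(rng.random() < reward_probs[choice])
        q[choice] += alpha * (reward - q[choice])   # prediction-error update
        best_choices += (choice == 0)
    return best_choices / n_trials

# hypothetical parameter settings at two points along a developmental continuum
print("low alpha/beta :", run_agent(alpha=0.1, beta=1.0))
print("high alpha/beta:", run_agent(alpha=0.4, beta=5.0))
```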
Collapse
|
46
|
Higashi H, Minami T, Nakauchi S. Cooperative update of beliefs and state-transition functions in human reinforcement learning. Sci Rep 2019; 9:17704. [PMID: 31776353 PMCID: PMC6881319 DOI: 10.1038/s41598-019-53600-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2019] [Accepted: 10/25/2019] [Indexed: 11/09/2022] Open
Abstract
It is widely known that reinforcement learning systems in the brain contribute to learning via interactions with the environment. These systems are capable of solving multidimensional problems, in which some dimensions are relevant to a reward, while others are not. To solve these problems, computational models use Bayesian learning, a strategy supported by behavioral and neural evidence in human. Bayesian learning takes into account beliefs, which represent a learner’s confidence in a particular dimension being relevant to the reward. Beliefs are given as a posterior probability of the state-transition (reward) function that maps the optimal actions to the states in each dimension. However, when it comes to implementing this learning strategy, the order in which beliefs and state-transition functions update remains unclear. The present study investigates this update order using a trial-by-trial analysis of human behavior and electroencephalography signals during a task in which learners have to identify the reward-relevant dimension. Our behavioral and neural results reveal a cooperative update—within 300 ms after the outcome feedback, the state-transition functions are updated, followed by the beliefs for each dimension.
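A toy, hedged sketch of the two quantities at issue (not the authors' fitted model): each stimulus dimension keeps its own reward-function estimate, and a belief over which dimension is reward-relevant is updated from how well each dimension predicted the outcome. The ordering below (value update first, then belief update) mirrors the cooperative update reported; the generative rule and parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n_dims, alpha = 3, 0.3
values = np.zeros((n_dims, 2))         # per-dimension reward-function estimates
belief = np.ones(n_dims) / n_dims      # P(dimension d is reward-relevant)

for _ in range(60):
    features = rng.integers(0, 2, size=n_dims)               # observed feature per dimension
    reward = int(features[0] == 1 and rng.random() < 0.9)    # dimension 0 is secretly relevant
    # 1) update each dimension's state-transition (reward) function first ...
    pred = values[np.arange(n_dims), features]
    likelihood = np.where(reward, pred, 1.0 - pred) + 1e-6
    values[np.arange(n_dims), features] += alpha * (reward - pred)
    # 2) ... then update the beliefs about which dimension is reward-relevant
    belief = belief * likelihood
    belief /= belief.sum()

print(np.round(belief, 2))   # most of the probability mass should sit on dimension 0
```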
Collapse
Affiliation(s)
- Hiroshi Higashi
- Graduate School of Informatics, Kyoto University, Kyoto, Japan.
| | - Tetsuto Minami
- Electronics-Inspired Interdisciplinary Research Institute, Toyohashi University of Technology, Toyohashi, Japan; Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Japan
| | - Shigeki Nakauchi
- Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Japan
| |
Collapse
|
47
|
Seymour B. Pain: A Precision Signal for Reinforcement Learning and Control. Neuron 2019; 101:1029-1041. [PMID: 30897355 DOI: 10.1016/j.neuron.2019.01.055] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2018] [Revised: 01/18/2019] [Accepted: 01/27/2019] [Indexed: 12/18/2022]
Abstract
Since noxious stimulation usually leads to the perception of pain, pain has traditionally been considered sensory nociception. But its variability and sensitivity to a broad array of cognitive and motivational factors have meant it is commonly viewed as inherently imprecise and intangibly subjective. However, the core function of pain is motivational-to direct both short- and long-term behavior away from harm. Here, we illustrate that a reinforcement learning model of pain offers a mechanistic understanding of how the brain supports this, illustrating the underlying computational architecture of the pain system. Importantly, it explains why pain is tuned by multiple factors and necessarily supported by a distributed network of brain regions, recasting pain as a precise and objectifiable control signal.
Collapse
Affiliation(s)
- Ben Seymour
- Center for Information and Neural Networks, National Institute of Information and Communications Technology, 1-4 Yamadaoka, Suita, Osaka 565-0871, Japan; Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK.
| |
Collapse
|
48
|
Abstract
Midbrain dopamine signals are widely thought to report reward prediction errors that drive learning in the basal ganglia. However, dopamine has also been implicated in various probabilistic computations, such as encoding uncertainty and controlling exploration. Here, we show how these different facets of dopamine signalling can be brought together under a common reinforcement learning framework. The key idea is that multiple sources of uncertainty impinge on reinforcement learning computations: uncertainty about the state of the environment, the parameters of the value function and the optimal action policy. Each of these sources plays a distinct role in the prefrontal cortex-basal ganglia circuit for reinforcement learning and is ultimately reflected in dopamine activity. The view that dopamine plays a central role in the encoding and updating of beliefs brings the classical prediction error theory into alignment with more recent theories of Bayesian reinforcement learning.
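One concrete way to combine prediction errors with uncertainty, in the spirit of the Bayesian reinforcement learning views discussed here, is Kalman-filter value learning, where the learning rate is not fixed but set by the current estimation uncertainty. The snippet below is a generic sketch under assumed noise parameters, not a specific model from the review.

```python
import numpy as np

# Kalman-filter value learning for a single cue: the prediction error is scaled
# by a gain that reflects uncertainty about the cue's value, one way in which
# belief updating and dopamine-like error signals can be combined.
w, P = 0.0, 1.0                       # value estimate and its uncertainty (variance)
q, r = 0.01, 0.1                      # assumed process and observation noise
rewards = [1.0] * 10 + [0.0] * 10     # acquisition then extinction (hypothetical)

for t, reward in enumerate(rewards):
    P += q                            # uncertainty grows between trials
    gain = P / (P + r)                # effective learning rate set by uncertainty
    delta = reward - w                # prediction error (dopamine-like signal)
    w += gain * delta
    P *= (1.0 - gain)                 # uncertainty shrinks after observing
    print(f"trial {t:2d}  gain={gain:.2f}  delta={delta:+.2f}  value={w:.2f}")
```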
Collapse
Affiliation(s)
- Samuel J Gershman
- Department of Psychology, Center for Brain Science, Harvard University, Cambridge, MA, USA.
| | - Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA
| |
Collapse
|
49
|
Abstract
Arguably, the most difficult part of learning is deciding what to learn about. Should I associate the positive outcome of safely completing a street-crossing with the situation 'the car approaching the crosswalk was red' or with 'the approaching car was slowing down'? In this Perspective, we summarize our recent research into the computational and neural underpinnings of 'representation learning'-how humans (and other animals) construct task representations that allow efficient learning and decision-making. We first discuss the problem of learning what to ignore when confronted with too much information, so that experience can properly generalize across situations. We then turn to the problem of augmenting perceptual information with inferred latent causes that embody unobservable task-relevant information, such as contextual knowledge. Finally, we discuss recent findings regarding the neural substrates of task representations that suggest the orbitofrontal cortex represents 'task states', deploying them for decision-making and learning elsewhere in the brain.
Collapse
|
50
|
Petter EA, Gershman SJ, Meck WH. Integrating Models of Interval Timing and Reinforcement Learning. Trends Cogn Sci 2019; 22:911-922. [PMID: 30266150 DOI: 10.1016/j.tics.2018.08.004] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2018] [Revised: 07/23/2018] [Accepted: 08/13/2018] [Indexed: 10/28/2022]
Abstract
We present an integrated view of interval timing and reinforcement learning (RL) in the brain. The computational goal of RL is to maximize future rewards, and this depends crucially on a representation of time. Different RL systems in the brain process time in distinct ways. A model-based system learns 'what happens when', employing this internal model to generate action plans, while a model-free system learns to predict reward directly from a set of temporal basis functions. We describe how these systems are subserved by a computational division of labor between several brain regions, with a focus on the basal ganglia and the hippocampus, as well as how these regions are influenced by the neuromodulator dopamine.
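The model-free route described above can be sketched as TD learning over a set of Gaussian temporal basis functions ("microstimuli") spanning the cue-reward interval, so that the learned value signal ramps toward the expected reward time. The basis shape, interval length, and learning parameters below are illustrative assumptions rather than the specific model in the article.

```python
import numpy as np

T, n_basis = 20, 8
centers = np.linspace(0, T - 1, n_basis)
width = T / n_basis

def features(t):
    """Gaussian temporal basis functions ('microstimuli') active at time t."""
    x = np.exp(-0.5 * ((t - centers) / width) ** 2)
    return x / x.sum()

alpha, gamma = 0.2, 0.98
w = np.zeros(n_basis)
reward_time = 15                      # reward arrives 15 steps after cue onset

for episode in range(300):
    for t in range(T - 1):
        r = 1.0 if t + 1 == reward_time else 0.0
        v_now, v_next = w @ features(t), w @ features(t + 1)
        delta = r + gamma * v_next - v_now       # TD error over elapsed time
        w += alpha * delta * features(t)

values = [round(float(w @ features(t)), 2) for t in range(T)]
print(values)    # the value estimate ramps up toward the time of reward
```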
Collapse
Affiliation(s)
- Elijah A Petter
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
| | - Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA
| | - Warren H Meck
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA.
| |
Collapse
|